Anomaly detection method based on semi-supervised context representation learning
By combining set-level hierarchical anomaly learning with context-calibrated anomaly scoring, this approach addresses shortcomings of existing methods such as the lack of context modeling and the low utilization of supervisory signals. It detects complex anomalies robustly and improves data efficiency.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- BEIHANG UNIV
- Filing Date
- 2026-03-13
- Publication Date
- 2026-06-19
AI Technical Summary
Existing semi-supervised anomaly detection methods lack context modeling, cannot capture high-order interactions among the points of a set, make little use of supervision signals, and resist noise poorly, which results in unstable anomaly scores.
The method employs set-level hierarchical anomaly learning and context-based anomaly score calibration. A neural network architecture captures high-order dependencies, a small number of labeled anomalies is used to generate rich supervisory signals, and a context calibration mechanism removes noise bias.
The method detects complex anomalies robustly, improves data efficiency, performs well with a small amount of labeled data, and resists noise strongly.
[Figure: abstract drawing, CN122241507A_ABST]
Abstract
Description
Technical Field
[0001] This disclosure relates to the field of computer science, and more specifically, to an anomaly detection method based on semi-supervised context representation learning.
Background Technology
[0002] Anomaly detection (AD) is the process of identifying data points that significantly deviate from the majority of samples. This research field is crucial in data mining and information retrieval, and is widely used in scenarios such as industrial equipment monitoring, fraud detection, and network intrusion detection. The core challenge of anomaly detection lies in how to define and capture this "deviation." Traditional definitions consider anomalies not as an intrinsic property of a point, but as a relative concept related to its surrounding context.
[0003] In recent years, semi-supervised anomaly detection (SSAD) has attracted attention because it can exploit a small amount of labeled anomaly data together with a large amount of unlabeled data. Existing deep SSAD methods mainly include end-to-end regression methods, reconstruction-based methods (such as autoencoders or GANs), distribution-based methods, one-class learning-based methods, and distance-based methods. Although these methods have made progress, they still have significant limitations. First, most existing methods rely on point-wise or pair-wise scoring mechanisms. Some learn a function that maps a single point to an anomaly score and ignore explicit interactions between points. Others introduce pairwise comparisons (such as the distance to a center point or to random points), but a single reference point cannot provide a contextual view complete enough to identify complex anomalies. This point-centric or pair-centric perspective ignores the context-dependent nature of anomalies (an anomaly is defined relative to a collective) and fails to exploit the high-order interaction information latent in the data.
[0004] Second, existing methods make insufficient use of the supervision signal. Focusing only on a single labeled point or pair limits the richness of the supervision information that can be extracted from limited labeled data. In fact, a small number of labeled anomalies can be combined into a large number of sets with different anomaly grades, providing a denser training signal.
[0005] Finally, existing methods are often not robust enough to handle noise (contamination) in unlabeled data. Unlabeled data pools often contain unknown anomalies, which can bias the scores if the pool is used directly as a normal reference. Existing scoring mechanisms lack an effective calibration method to eliminate the systematic bias introduced by such a "dirty" context.
Summary of the Invention
[0006] The purpose of this disclosure is to provide an anomaly detection method based on semi-supervised context representation learning that solves the following problems of existing methods: (i) a lack of context modeling, in that scoring relies on isolated points or simple point pairs, cannot capture the high-order interactions among the points of a set, and therefore struggles to identify complex anomalies that disrupt group structure; (ii) low utilization of supervision signals, in that the rich supervision information offered by combining samples into sets goes unused, which limits performance when labeled samples are very scarce; and (iii) weak noise resistance and calibration ability, in that there is no calibration mechanism for contamination in the unlabeled data, which makes anomaly scores unstable and prone to false alarms.
[0007] In general, this disclosure provides an anomaly detection method based on semi-supervised context representation learning. The method takes as input multidimensional feature data collected by an industrial equipment monitoring system and obtains a final score through two stages: set-level hierarchical anomaly learning and context anomaly score calibration. The final score is compared with a preset threshold; when the threshold is exceeded, the target monitoring object is determined to be in an abnormal state, and a corresponding equipment alarm signal or control command is output. The set-level hierarchical anomaly learning stage is implemented as follows. First, a training dataset and hierarchical supervision signals are constructed: multidimensional feature data pre-collected from the industrial equipment monitoring system are acquired to build the training dataset $\mathcal{D}$, which is divided into a labeled anomaly dataset $\mathcal{D}_A$ and an unlabeled dataset $\mathcal{D}_U$. The labeled anomaly dataset $\mathcal{D}_A$ contains a small number of anomalous samples explicitly labeled as system malfunctions, intrusions, or violations; the unlabeled dataset $\mathcal{D}_U$ contains a large number of unlabeled samples, which physically are mainly data of the system's normal operating state and may include a small number of unidentified anomalous samples. In each training batch, samples are drawn from $\mathcal{D}_A$ and $\mathcal{D}_U$ and mixed to construct several training sets $S$, each containing $m$ samples, which serve as the basic units for training the set scoring function $F$; the absolute number of labeled anomalous samples contained in each training set $S$ is counted and defined as the hierarchical supervision label $y_S$ of that set. Second, an attention-based set scoring model is constructed by designing a neural network architecture that satisfies permutation invariance. Then, model optimization training is performed. The context anomaly score calibration stage is implemented as follows. First, context-based scoring and calibration is performed: a context-calibrated anomaly scoring strategy is designed for the final scoring. Then, marginalization and final scoring are performed, with computation accelerated by expectation decoupling.
[0008] The set scoring function $F$ formed by the permutation-invariant neural network architecture comprises the following main operations: Embedding: a shared embedding function $\phi$ projects each point $x_i$ in the set $S$ into the latent space, yielding $h_i = \phi(x_i)$; Interaction: a multi-head self-attention mechanism is applied to the vector set $H = \{h_1, \dots, h_m\}$ obtained from the embedding projection of the set $S$, capturing higher-order dependencies between points and yielding the context embeddings $Z$: $Z = \mathrm{MHSA}(H)$; Aggregation: permutation-invariant pooling aggregates the features of all points into a set-level representation vector $z_S = \mathrm{Pool}(Z)$; Regression: $z_S$ is fed into the regression head $\rho$, which outputs the scalar score predicted for the set, $\hat{y}_S = \rho(z_S)$.
[0009] Specifically, given a target set $S$, the set scoring function $F$ is defined as:
$$F(S) = \rho\Big(\mathrm{Pool}\big(\mathrm{MHSA}(\{\phi(x_i)\}_{x_i \in S})\big)\Big)$$
where the specific form of the embedding function $\phi$ depends on the data structure of the object being detected, and a multilayer perceptron (MLP) is generally used; $\mathrm{MHSA}$ denotes the multi-head self-attention operation; $\mathrm{Pool}$ denotes the aggregation operation, typically average pooling; and the regression head $\rho$ is implemented as a multilayer perceptron.
[0010] The model is optimized and trained as follows: the training objective of set-level hierarchical anomaly learning is to minimize the mean absolute error between the predicted number of anomalies $F(S_n)$ and the actual number of anomalies $y_n$ over the $N$ sampled training sets:
$$\mathcal{L} = \frac{1}{N}\sum_{n=1}^{N}\left|F(S_n) - y_n\right|$$
[0011] The context-calibrated anomaly scoring strategy is implemented as follows. First, context sampling is performed: a context set $C$ of a preset size is randomly sampled from the unlabeled dataset $\mathcal{D}_U$. Next, the raw score is calculated: the test point $x$ is placed in the context and the set anomaly score is computed, $s(x \mid C) = F(C \cup \{x\})$, where $F$ is the trained set scoring function. Then, the baseline score of the given context $C$ is computed: several reference points $r$ are randomly sampled from $\mathcal{D}_U$, and the expected anomaly score of these reference points in the given context is calculated, $b(C) = \mathbb{E}_{r \sim \mathcal{D}_U}\left[F(C \cup \{r\})\right]$. Finally, the anomaly score is normalized by subtracting the baseline score from the raw score of the test point: $\tilde{s}(x \mid C) = s(x \mid C) - b(C)$.
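[0012] Marginalization and final scoring, accelerated by expectation decoupling, proceed as follows. Multiple context sets $C$ are randomly sampled from the unlabeled dataset $\mathcal{D}_U$, and the expectation of the normalized anomaly score $\tilde{s}(x \mid C) = F(C \cup \{x\}) - \mathbb{E}_{r}\left[F(C \cup \{r\})\right]$ is calculated, where $F$ is the trained set scoring function, $x$ is the test point, and $r$ is a reference point sampled from $\mathcal{D}_U$: $f(x) = \mathbb{E}_{C}\left[\tilde{s}(x \mid C)\right]$. The computation is accelerated by decoupling the difference of expectations using the linearity of expectation:
$$f(x) = \mathbb{E}_{C}\left[F(C \cup \{x\})\right] - \mathbb{E}_{C}\,\mathbb{E}_{r}\left[F(C \cup \{r\})\right]$$
The first term is the target term that must be computed to assess the anomaly of the target point $x$; the second term contains no variable related to $x$ and is a global constant that can be computed and stored in advance. In engineering implementation, the contexts $C$ and reference points $r$ are presampled, and this constant is calculated and stored offline for the online evaluation of any data to be tested.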
[0013] The Monte Carlo approximation of the expectation-based anomaly score is implemented in the following steps. First, $K$ context sets $C_1, \dots, C_K$ of a preset size are randomly sampled from the unlabeled dataset $\mathcal{D}_U$. Next, $R$ reference points $r_1, \dots, r_R$ are randomly sampled from $\mathcal{D}_U$; the reference points are placed into the different context sets, and the global reference score is calculated with the trained set scoring function $F$ and stored: $\bar{b} = \frac{1}{KR}\sum_{k=1}^{K}\sum_{j=1}^{R} F(C_k \cup \{r_j\})$. Finally, each test point $x$ to be evaluated online is placed into each of the context sets sampled in the first step, the set anomaly scores are calculated, and the result is normalized with the precomputed global reference score: $f(x) \approx \frac{1}{K}\sum_{k=1}^{K} F(C_k \cup \{x\}) - \bar{b}$.
The technical effects to be achieved by the embodiments of the present invention are as follows:
(1) A set-level graded learning framework is proposed. Anomaly detection is recast as a set scoring problem. Instead of scoring individual points, the model learns to quantify the "collective anomaly degree" of a set (i.e., the number of anomalies the set contains), thereby directly modeling the complex group-level interactions that define anomalies.
[0014] (2) An attention-based set encoder was designed. By capturing the dependencies between all points in the set through a self-attention mechanism, it can perceive how abnormal samples distort the overall structure of the set.
[0015] (3) A context-calibrated anomaly scoring mechanism is proposed. During the inference phase, the scores of a test point in multiple randomly sampled contexts are aggregated, each normalized against the baseline of its own context. This effectively eliminates the bias caused by sampling noise and data contamination and provides robust, calibrated anomaly scores.
Attached Figure Description
[0016] The above and other objects and features of this disclosure will become clearer from the following description taken in conjunction with the accompanying drawings.
[0017] Figure 1 is a schematic diagram illustrating the architecture of an anomaly detection method based on semi-supervised context representation learning according to an embodiment of the present disclosure.
Detailed Implementation
[0018] The following detailed embodiments are provided to aid the reader in gaining a comprehensive understanding of the methods, apparatus, and / or systems described herein. However, various changes, modifications, and equivalents of the methods, apparatus, and / or systems described herein will become apparent upon understanding this disclosure. For example, the order of operations described herein is merely illustrative and is not limited to those orders set forth herein, but may be changed as will become clear upon understanding this disclosure, except for operations that must occur in a specific order. Furthermore, for clarity and conciseness, descriptions of features known in the art may be omitted.
[0019] The features described herein may be implemented in different forms and should not be construed as limited to the examples described herein. Rather, the examples described herein are provided only to illustrate some of the many feasible ways of implementing the methods, apparatus, and / or systems described herein, which will become clear upon understanding the disclosure of this application.
[0020] As used herein, the term “and / or” includes any one of the associated listed items and any combination of any two or more.
[0021] Although terms such as “first,” “second,” and “third” may be used herein to describe various components, assemblies, regions, layers, or parts, these components, assemblies, regions, layers, or parts should not be limited by these terms. Rather, these terms are used only to distinguish one component, assembly, region, layer, or part from another. Thus, without departing from the teaching of the examples described herein, a first component, first assembly, first region, first layer, or first part referred to in the examples may also be referred to as a second component, second assembly, second region, second layer, or second part.
[0022] In the specification, when an element (such as a layer, region, or substrate) is described as being "on" another element, "connected to," or "bonded to" another element, the element may be directly "on" another element, directly "connected to," or "bonded to" the other element, or one or more other elements may be present in between. Conversely, when an element is described as being "directly on" another element, "directly connected to," or "directly bonded to" another element, no other elements may be present in between.
[0023] The terminology used herein is for the purpose of describing various examples only and is not intended to limit this disclosure. Unless the context clearly indicates otherwise, the singular form is intended to include the plural form as well. The terms “comprising,” “including,” and “having” indicate the presence of the described features, quantities, operations, components, elements, and / or combinations thereof, but do not preclude the presence or addition of one or more other features, quantities, operations, components, elements, and / or combinations thereof.
[0024] Unless otherwise defined, all terms used herein (including technical and scientific terms) shall have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains upon understanding this disclosure. Unless expressly defined herein, terms (such as those defined in a general dictionary) shall be interpreted as having a meaning consistent with their meaning in the context of the relevant field and in this disclosure, and shall not be interpreted in an idealized or overly formalistic manner.
[0025] Furthermore, in the description of the examples, detailed descriptions of well-known related structures or functions will be omitted when it is believed that such detailed descriptions would lead to a vague interpretation of this disclosure.
[0026] Figure 1 is a schematic diagram illustrating an anomaly detection method based on semi-supervised context representation learning according to an embodiment of the present disclosure.
[0027] To achieve the aforementioned objectives, this invention proposes a semi-supervised anomaly detection framework called SetAD, applied to anomaly identification in target monitoring systems (such as industrial equipment monitoring systems, computer network security systems, or financial risk control systems). The method takes as input multidimensional feature data collected by the target monitoring system, where the input data includes a large number of unlabeled samples representing the normal operating state of the system and a small number of labeled anomaly samples representing system malfunctions, intrusions, or violations. The method outputs an anomaly score for the detected object and compares it with a preset threshold. When the score exceeds the threshold, the target monitoring object is determined to be in an abnormal state, and a corresponding alarm signal or control command is output. Specifically, the method comprises "set-level hierarchical anomaly learning" in the training phase and "context-calibrated anomaly scoring" in the inference phase.
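Problem Definition and Core Concepts
[0028] Given a dataset $\mathcal{D} = \mathcal{D}_U \cup \mathcal{D}_A$, where $\mathcal{D}_U$ is the unlabeled data (mainly normal samples) and $\mathcal{D}_A$ is the set of known anomalous samples, with $|\mathcal{D}_A| \ll |\mathcal{D}_U|$.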
[0029] The goal of this invention is to learn an anomaly scoring function $f$ such that, for any abnormal sample $x_a$ and any normal sample $x_n$, $f(x_a) > f(x_n)$ holds as far as possible. Unlike traditional methods, this invention introduces the set $S$ as the basic processing unit, where each set contains a certain number of labeled anomalies.
Context-based anomaly learning computation method
[0030] The main framework of this invention is shown in Figure 1. To obtain the anomaly scoring function $f$ for a single point, SetAD designs and trains an auxiliary set anomaly scoring function $F$ that measures the degree of anomaly of a given set $S$. The design of the set anomaly scoring function follows the core principle that "anomalies are not characteristics of the data points themselves, but rather the degree to which the data deviates from its context." Subsequently, to eliminate the influence that anomalies possibly included in the sampled set have on the evaluation of the target point during inference, this invention designs a context-calibrated anomaly scoring method that accurately evaluates the degree of anomaly of the target point in data that may contain contamination.
[0031] Specifically, the computation process of SetAD semi-supervised anomaly detection is divided into two main stages:
Phase One: Set-Level Hierarchical Anomaly Learning (Training Phase)
Step (1): Constructing the training sets and hierarchical supervision signals
This step samples from the dataset $\mathcal{D}$ to construct the basic units used to train the set scoring function $F$. In each training batch, several sets $S$ of size $m$ are sampled from the training data. Each set $S$ is a mixture of known anomalous samples from the labeled anomaly set $\mathcal{D}_A$ and unlabeled samples from the unlabeled dataset $\mathcal{D}_U$. The supervision label $y_S$ of the set is defined as the number of known anomalies in the set (e.g., 0, 1, 2). This sampling strategy introduces graded supervision, forcing the model to learn how the anomaly severity of a set changes as the number of anomalous instances increases.
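The set construction in Step (1) can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: the function name `build_training_sets`, the parameter `max_anomalies`, and the choice to draw the anomaly count uniformly at random are all assumptions, since the text does not state how the number of anomalies per set is chosen.

```python
import random

def build_training_sets(labeled_anomalies, unlabeled, set_size, num_sets, max_anomalies, rng):
    """Sample mixed sets and label each with its count of known anomalies."""
    sets, labels = [], []
    for _ in range(num_sets):
        # Assumption: the anomaly count is drawn uniformly from 0..max_anomalies.
        k = rng.randint(0, min(max_anomalies, len(labeled_anomalies), set_size))
        members = rng.sample(labeled_anomalies, k) + rng.sample(unlabeled, set_size - k)
        rng.shuffle(members)  # the order inside a set carries no information
        sets.append(members)
        labels.append(k)      # graded supervision label y_S
    return sets, labels

rng = random.Random(0)
# Synthetic 2-D data: a large unlabeled pool and five labeled anomalies.
unlabeled = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
anomalies = [[rng.gauss(6.0, 1.0), rng.gauss(6.0, 1.0)] for _ in range(5)]
sets, labels = build_training_sets(anomalies, unlabeled, set_size=8,
                                   num_sets=32, max_anomalies=2, rng=rng)
```

Because the same few labeled anomalies can be recombined with different unlabeled samples, even five anomalies yield many distinct labeled sets, which is the source of the denser supervision signal described above.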
[0032] Step (2): Attention-based set scoring model
Since the degree of anomaly of a set should not depend on the order of the points within it, a permutation-invariant neural network architecture is designed to score the anomaly of a set $S$. It comprises the following main operations:
Embedding: a shared embedding function $\phi$ (e.g., an MLP) projects each point $x_i$ in the set into the latent space, yielding $h_i = \phi(x_i)$.
[0033] Interaction: a multi-head self-attention mechanism is applied to the vector set $H = \{h_1, \dots, h_m\}$ obtained from the embedding projection of the set $S$, capturing higher-order dependencies between points and yielding the context embeddings $Z$:
$$Z = \mathrm{MHSA}(H) \tag{1}$$
Aggregation: a permutation-invariant pooling operation (such as sum pooling) aggregates the features of all points into a set-level representation vector $z_S = \mathrm{Pool}(Z)$.
[0034] Regression: $z_S$ is fed into a regression head $\rho$ (e.g., an MLP), which outputs the scalar score predicted for the set, $\hat{y}_S = \rho(z_S)$.
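The four operations (embedding, interaction, aggregation, regression) can be illustrated with a small standard-library sketch. It makes several simplifying assumptions that are not in the source: a single attention head instead of multi-head attention, a one-layer tanh embedding in place of the MLP, a linear regression head, and random untrained weights. The class name `SetScorer` and the helper functions are hypothetical. The only purpose is to show that such a pipeline is insensitive to the order of the input points.

```python
import math
import random

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

class SetScorer:
    """Toy set scorer: embed, self-attend, sum-pool, then apply a linear head."""

    def __init__(self, in_dim, hidden_dim, rng):
        self.d = hidden_dim
        self.W_embed = random_matrix(in_dim, hidden_dim, rng)  # stands in for the embedding function
        self.W_q = random_matrix(hidden_dim, hidden_dim, rng)
        self.W_k = random_matrix(hidden_dim, hidden_dim, rng)
        self.W_v = random_matrix(hidden_dim, hidden_dim, rng)
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]  # stands in for the regression head

    def __call__(self, points):
        # Embedding: the same map is applied to every point.
        H = [[math.tanh(v) for v in row] for row in matmul(points, self.W_embed)]
        # Interaction: scaled dot-product self-attention over all points (single head).
        Q, K, V = matmul(H, self.W_q), matmul(H, self.W_k), matmul(H, self.W_v)
        K_T = [list(col) for col in zip(*K)]
        weights = [softmax([s / math.sqrt(self.d) for s in row]) for row in matmul(Q, K_T)]
        Z = matmul(weights, V)
        # Aggregation: sum pooling does not depend on the order of the rows of Z.
        pooled = [sum(col) for col in zip(*Z)]
        # Regression: a linear head maps the pooled vector to a scalar score.
        return sum(w * p for w, p in zip(self.w_out, pooled))

rng = random.Random(0)
scorer = SetScorer(in_dim=3, hidden_dim=4, rng=rng)
points = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(6)]
shuffled = points[:]
rng.shuffle(shuffled)
same = math.isclose(scorer(points), scorer(shuffled), rel_tol=1e-9, abs_tol=1e-9)
```

Self-attention permutes its output rows when the input rows are permuted, and the pooling step then removes that ordering, so the two scores should agree up to floating-point rounding.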
[0035] Step (3): Model Optimization
The training objective of set-level hierarchical anomaly learning is to minimize the mean absolute error (MAE) between the predicted number of anomalies $F(S_n)$ and the actual number of anomalies $y_n$:
$$\mathcal{L} = \frac{1}{N}\sum_{n=1}^{N}\left|F(S_n) - y_n\right| \tag{2}$$
where $S_n$ is the $n$-th sampled set, $y_n$ is the number of labeled anomalies it contains, and $N$ is the number of sampled sets. By iteratively sampling sets and optimizing this objective, the model learns to quantify the degree of anomaly within a set.
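As a small worked example of the MAE objective, the following sketch evaluates the loss for four hypothetical predictions. The function name `mae_loss` and the numbers are illustrative only.

```python
def mae_loss(predicted_counts, true_counts):
    """Mean absolute error between predicted and actual anomaly counts."""
    if len(predicted_counts) != len(true_counts):
        raise ValueError("need exactly one label per set")
    return sum(abs(p - y) for p, y in zip(predicted_counts, true_counts)) / len(true_counts)

predictions = [0.2, 1.1, 1.6, 0.0]  # hypothetical set scores F(S_n)
labels = [0, 1, 2, 0]               # number of labeled anomalies in each set
loss = mae_loss(predictions, labels)
```

Here the absolute errors are 0.2, 0.1, 0.4, and 0, so the loss is 0.7 / 4 = 0.175 up to floating-point rounding.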
[0036] Phase Two: Context Anomaly Score Calibration (Inference Phase)
Step (4): Context-based scoring and calibration
Because the unlabeled dataset $\mathcal{D}_U$ may contain anomalies, directly sampling a context $C$ from it, forming a set with the test point $x$, and taking the output of the trained set scoring function $F$ as the anomaly score of $x$ can give a test point that is actually normal a high anomaly score, owing to anomalies present in $C$. This invention therefore designs a context-calibrated anomaly scoring strategy consisting of the following steps:
Context sampling: a context set $C$ of a preset size is randomly sampled from $\mathcal{D}_U$.
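[0037] Calculate the raw score: the test point $x$ is placed in the context $C$ and the set anomaly score is calculated with the trained set scoring function $F$:
$$s(x \mid C) = F(C \cup \{x\}) \tag{3}$$
Calculate the baseline score of the given context $C$: several reference points $r$ are randomly sampled from the unlabeled dataset $\mathcal{D}_U$, and the expected anomaly score of these reference points in the given context is calculated:
$$b(C) = \mathbb{E}_{r \sim \mathcal{D}_U}\left[F(C \cup \{r\})\right] \tag{4}$$
Because $\mathcal{D}_U$ consists mainly of normal points, equation (4) represents the expected anomaly score of a normal point in the given context $C$.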
[0038] Anomaly score normalization: the test point $x$ is calibrated by subtracting the baseline score of the context $C$ from its raw score, where $F$ is the trained set scoring function and $r$ is a reference point sampled from the unlabeled dataset $\mathcal{D}_U$:
$$\tilde{s}(x \mid C) = F(C \cup \{x\}) - \mathbb{E}_{r \sim \mathcal{D}_U}\left[F(C \cup \{r\})\right] \tag{5}$$
Step (5): Marginalization and Final Scoring
The context calibration strategy applied in step (4) effectively mitigates the impact of noise in the context set on the anomaly assessment of the target point. However, a single randomly sampled context set $C$ may not allow the degree of anomaly of the test point $x$ to be assessed accurately. To obtain stable results, multiple context sets are randomly sampled, and the expected value of the normalized anomaly score from step (4) is calculated:
$$f(x) = \mathbb{E}_{C}\left[\tilde{s}(x \mid C)\right] \tag{6}$$
Computational acceleration based on expectation decoupling
A Monte Carlo implementation of the above inference process involves nested loops over the $K$ sampled context sets and the $R$ sampled reference points, which is computationally expensive in practice. Note that the difference of expectations in equation (6) can be decoupled using the linearity of expectation:
$$f(x) = \mathbb{E}_{C}\left[F(C \cup \{x\})\right] - \mathbb{E}_{C}\,\mathbb{E}_{r}\left[F(C \cup \{r\})\right] \tag{7}$$
The first term in equation (7) is the target term that must be computed to assess the anomaly of the target point $x$, while the second term contains no variable related to $x$ and is a global constant that can be computed and stored in advance. In engineering implementation, the contexts $C$ and reference points $r$ can be presampled, and this constant is calculated and stored offline for the online evaluation of any data to be detected. This decoupling reduces the inference cost per test point from the quadratic order $O(K \cdot R)$ to the linear order $O(K)$, significantly improving online inference speed.
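The equivalence between the nested estimator and the decoupled estimator can be checked numerically. The sketch below is an illustration under stated assumptions: `toy_set_score` is a hypothetical stand-in for the trained set scoring function (it simply counts points far from the origin), the data is synthetic one-dimensional data, and the context size, number of contexts, and number of reference points are arbitrary choices.

```python
import random

def toy_set_score(points):
    """Hypothetical stand-in for the trained set scorer: counts points far from the origin."""
    return float(sum(1 for p in points if abs(p) > 3.0))

def nested_score(F, x, contexts, references):
    """Average over contexts of F(C + x) minus that context's own baseline."""
    total = 0.0
    for C in contexts:
        baseline = sum(F(C + [r]) for r in references) / len(references)
        total += F(C + [x]) - baseline
    return total / len(contexts)

def precompute_baseline(F, contexts, references):
    """Offline step: global constant that does not depend on the test point."""
    total = sum(F(C + [r]) for C in contexts for r in references)
    return total / (len(contexts) * len(references))

def decoupled_score(F, x, contexts, global_baseline):
    """Online step: one evaluation of F per context, then subtract the stored constant."""
    return sum(F(C + [x]) for C in contexts) / len(contexts) - global_baseline

rng = random.Random(0)
# Contaminated unlabeled pool: mostly N(0, 1) plus a few far-away points.
pool = [rng.gauss(0.0, 1.0) for _ in range(190)] + [rng.uniform(5.0, 8.0) for _ in range(10)]
contexts = [rng.sample(pool, 7) for _ in range(20)]
references = rng.sample(pool, 15)
baseline = precompute_baseline(toy_set_score, contexts, references)

results = {}
for x in (0.1, 6.5):
    slow = nested_score(toy_set_score, x, contexts, references)
    fast = decoupled_score(toy_set_score, x, contexts, baseline)
    results[x] = (slow, fast)
```

For each test point the two estimators should agree up to floating-point rounding, while the decoupled version evaluates the scorer once per context online instead of once per context and reference point.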
[0039] Compared with the prior art, the present invention has the following significant advantages:
Capturing higher-order interactions and contextual structures: through a set-level encoder and a self-attention mechanism, SetAD can capture how anomalies disrupt the collective structure of their local contexts, enabling the detection of more subtle and complex anomalies than point-level or pair-level methods.
[0040] Extremely high data efficiency: By combining a limited number of labeled anomalies, a large number of sets with different anomaly levels are generated, greatly enriching the supervision signal. Experiments show that even with very little labeled data (e.g., 1%), SetAD significantly outperforms existing state-of-the-art models.
[0041] High robustness (resistant to contamination): The context-calibrated anomaly scoring mechanism effectively offsets biases caused by potential anomalies in unlabeled data by subtracting the context benchmark score. In experiments with noise levels as high as 20%, SetAD exhibits the smallest performance degradation and the most stable performance compared to state-of-the-art methods.
[0042] Flexible and efficient inference: the proposed expectation decoupling strategy eliminates the need for repeated baseline calculations during inference, ensuring the scalability and real-time performance of the algorithm on large-scale datasets.
[0043] While some embodiments of this disclosure have been shown and described, those skilled in the art will understand that modifications may be made to these embodiments without departing from the principles and spirit of this disclosure, which are defined by the claims and their equivalents.
Claims
1. An anomaly detection method based on semi-supervised context representation learning, characterized in that: the method takes as input multidimensional feature data collected by an industrial equipment monitoring system and obtains a final score through two stages, namely set-level hierarchical anomaly learning and context anomaly score calibration; the final score is compared with a preset threshold, and when the threshold is exceeded, the target monitoring object is determined to be in an abnormal state and a corresponding equipment alarm signal or control command is output; the set-level hierarchical anomaly learning stage is implemented as follows: first, a training dataset and hierarchical supervision signals are constructed: multidimensional feature data pre-collected from the industrial equipment monitoring system are obtained to build the training dataset $\mathcal{D}$, and the training dataset $\mathcal{D}$ is divided into a labeled anomaly dataset $\mathcal{D}_A$ and an unlabeled dataset $\mathcal{D}_U$; the labeled anomaly dataset $\mathcal{D}_A$ contains a small number of anomalous samples explicitly labeled as system malfunctions, intrusions, or violations; the unlabeled dataset $\mathcal{D}_U$ contains a large number of unlabeled samples, which physically are mainly data of the system's normal operating state and also include a small number of unidentified anomalous samples; in each training batch, samples are drawn from the labeled anomaly dataset $\mathcal{D}_A$ and the unlabeled dataset $\mathcal{D}_U$ and mixed to construct several training sets $S$, each containing $m$ samples, which serve as the basic units for training the set scoring function $F$; and the absolute number of labeled anomalous samples contained in each training set $S$ is counted and defined as the hierarchical supervision label $y_S$ of that training set; second, an attention-based set scoring model is constructed by designing a neural network architecture that satisfies permutation invariance; then, model optimization training is performed; the context anomaly score calibration stage is implemented as follows: first, context-based scoring and calibration is performed, in which a context-calibrated anomaly scoring strategy is designed for the final scoring; then, marginalization and final scoring are performed, with computation accelerated by expectation decoupling.
2. The anomaly detection method based on semi-supervised context representation learning as described in claim 1, characterized in that the set scoring function $F$ formed by the neural network architecture that satisfies permutation invariance comprises the following main operations: Embedding: a shared embedding function $\phi$ projects each point $x_i$ in the set $S$ into the latent space, yielding $h_i = \phi(x_i)$; Interaction: a multi-head self-attention mechanism is applied to the vector set $H = \{h_1, \dots, h_m\}$ obtained from the embedding projection of the set $S$, capturing higher-order dependencies between points and yielding the context embeddings $Z$: $Z = \mathrm{MHSA}(H) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d}}\right)V$, where $Q = HW^{Q}$, $K = HW^{K}$, $V = HW^{V}$; $W^{Q}$, $W^{K}$, $W^{V}$ denote the learnable projection matrices applied to the vector set $H$; $Q$, $K$, $V$ denote the query, key, and value feature matrices obtained by linear transformation; and $d$ denotes the hidden dimension; Aggregation: permutation-invariant pooling aggregates the features of all points into a set-level representation vector $z_S = \mathrm{Pool}(Z)$; Regression: $z_S$ is fed into the regression head $\rho$, which outputs the scalar score predicted for the set, $\hat{y}_S = \rho(z_S)$; specifically, given a target set $S$, the set scoring function $F$ is defined as $F(S) = \rho\Big(\mathrm{Pool}\big(\mathrm{MHSA}(\{\phi(x_i)\}_{x_i \in S})\big)\Big)$, where $\phi$ is the embedding function; $\mathrm{MHSA}$ denotes the multi-head self-attention operation; $\mathrm{Pool}$ is the set aggregation operation; and the regression head $\rho$ is implemented as a multilayer perceptron.
3. The anomaly detection method based on semi-supervised context representation learning as described in claim 2, characterized in that the model is optimized and trained as follows: given the set of learnable parameters of the model $\theta = \{\theta_{\phi}, \theta_{\rho}, W^{Q}, W^{K}, W^{V}\}$, where $\theta_{\phi}$ and $\theta_{\rho}$ denote the learnable parameters of the embedding function $\phi$ and of the regression head $\rho$, respectively, and $W^{Q}$, $W^{K}$, $W^{V}$ denote the learnable projection matrices of the query, key, and value matrices of the multi-head self-attention mechanism; a training collection containing $N$ set samples is constructed according to the method of claim 1, where the training label $y_n$ of the $n$-th sample $S_n$ is the number of labeled anomalies it contains; the training objective of set-level hierarchical anomaly learning is to minimize the mean absolute error between the predicted number of anomalies and the actual number of anomalies: $\mathcal{L}(\theta) = \frac{1}{N}\sum_{n=1}^{N}\left|F_{\theta}(S_n) - y_n\right|$.
4. The anomaly detection method based on semi-supervised context representation learning as described in claim 3, characterized in that the context-calibrated anomaly scoring strategy is implemented as follows: first, context sampling is performed, in which a context set $C$ of a preset size is randomly sampled from the unlabeled dataset $\mathcal{D}_U$; next, the raw score is calculated, in which the sample $x$ to be detected in the actual scenario is placed in the context and the set anomaly score is calculated: $s(x \mid C) = F(C \cup \{x\})$, where $C$ is the given context; then, the baseline score of the given context $C$ is computed, in which several reference points $r$ are randomly sampled from $\mathcal{D}_U$ and the expected anomaly score of these reference points in the given context is calculated: $b(C) = \mathbb{E}_{r \sim \mathcal{D}_U}\left[F(C \cup \{r\})\right]$; finally, the anomaly score is normalized by subtracting the baseline score from the raw score of the test point: $\tilde{s}(x \mid C) = s(x \mid C) - b(C)$.
5. The anomaly detection method based on semi-supervised context representation learning as described in claim 4, characterized in that marginalization and final scoring, accelerated by expectation decoupling, are performed by randomly sampling multiple context sets and calculating the expectation of the normalized anomaly score: $f(x) = \mathbb{E}_{C}\left[\tilde{s}(x \mid C)\right]$, where $\mathbb{E}[\cdot]$ denotes the expectation operator, and by decoupling the difference of expectations using the linearity of expectation to accelerate computation: $f(x) = \mathbb{E}_{C}\left[F(C \cup \{x\})\right] - \mathbb{E}_{C}\,\mathbb{E}_{r}\left[F(C \cup \{r\})\right]$; the first term is the target term that must be computed to assess the anomaly of the target point $x$, and the second term contains no variable related to $x$ and is a global constant that can be computed and stored in advance; in engineering implementation, the contexts $C$ and reference points $r$ are presampled, and the constant is calculated and stored offline for the online evaluation of any data to be tested.
6. The computation acceleration method based on expectation decoupling as described in claim 5, characterized in that the Monte Carlo approximation of the expectation-based anomaly score is implemented in the following steps: first, $K$ context sets $C_1, \dots, C_K$ of a preset size are randomly sampled from the unlabeled dataset $\mathcal{D}_U$; next, $R$ reference points are randomly sampled from $\mathcal{D}_U$, where the $j$-th reference point is denoted $r_j$; the reference points are placed into the different context sets, and the global reference score is calculated with the trained set scoring function and stored: $\bar{b} = \frac{1}{KR}\sum_{k=1}^{K}\sum_{j=1}^{R} F(C_k \cup \{r_j\})$; finally, each test point $x$ to be evaluated online is placed into each of the context sets sampled in the first step, the set anomaly scores are calculated, and the result is normalized with the precomputed global reference score: $f(x) \approx \frac{1}{K}\sum_{k=1}^{K} F(C_k \cup \{x\}) - \bar{b}$.