Memory-guided dynamic heterogeneous graph representation learning denoising method and device

By using a memory-guided two-stage loss calibration framework to mine reliable interaction sets and bucket statistics in dynamic heterogeneous graphs, the problem of identifying and suppressing structural noise in dynamic heterogeneous graphs is solved, and efficient and stable anomaly scoring for target system anomaly detection is achieved.

CN122285604A · Pending · Publication Date: 2026-06-26 · NAT UNIV OF DEFENSE TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
NAT UNIV OF DEFENSE TECH
Filing Date
2026-06-01
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

In anomaly detection of target systems, structural noise exists in dynamic heterogeneous graphs. Existing methods struggle to effectively identify and suppress this noise, resulting in insufficient accuracy and stability in anomaly detection.

Method used

A memory-guided two-stage loss calibration framework is adopted. By mining reliable interaction sets and bucket statistics, the influence of structural noise is suppressed, and the trained model is used to generate warning signals through anomaly scoring.

Benefits of technology

It can improve the practicality and robustness of denoising methods without noise annotation, overcome scale bias across types and time series, enhance robustness in dynamic heterogeneous environments, and improve the performance of downstream tasks and the accuracy of anomaly detection.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention belongs to the field of graph data processing technology. Addressing the problem of structural noise such as spurious interactions, missing edges, and temporal inconsistencies in dynamic heterogeneous graphs constructed from real logs, which leads to misleading representation learning and degrades the performance of downstream tasks, this invention provides a memory-guided denoising method and apparatus for dynamic heterogeneous graph representation learning. The method proposes a model-independent two-stage loss-calibrated memory-guided denoising framework: the first stage utilizes the memory effect of the dynamic heterogeneous graph backbone model, mining reliable interaction sets based solely on link prediction supervision signals and ranking criteria, without the need for noise annotation; the second stage performs type-time bucket loss calibration on reliable interactions, estimates the probability of soft noise, and weights the training interactions to suppress the influence of noisy interactions. Experiments show that on multiple real-world dynamic heterogeneous graph datasets, this invention significantly improves the robustness of the backbone model and link prediction performance, demonstrating effectiveness and broad applicability.

Description

Technical Field

[0001] This invention belongs to the field of graph data processing technology, specifically relating to a memory-guided dynamic heterogeneous graph representation learning denoising method and apparatus.

Background Technology

[0002] Graph structures, as a data structure used to organize and store complex data, can provide a structured description of the actors and their actions within a system. In the security monitoring and analysis of target systems such as server systems, network devices, or cloud platforms, graph structures reflecting system behavior patterns can be constructed by mapping subjects (such as processes, users, IP addresses, or container instances) in system operation logs or network traffic logs to nodes, and mapping their operations or communication behaviors (such as file read / write, network connections, process creation, or API calls) to typed relationship edges.

[0003] In fields such as machine learning, graph representation learning is a fundamental technique for modeling graph-structured data. In recent years, Graph Neural Networks (GNNs) have achieved significant success in a wide range of graph learning tasks, including node classification, link prediction, and clustering. In security applications such as anomaly detection in target systems, modeling entity interactions in system logs or network traffic using graph structures has become an important means of detecting anomalous behavior. Meanwhile, many real-world graphs are neither homogeneous nor static: multiple node and relation types coexist, and interaction patterns evolve over time. To jointly model categorical association structures and temporal evolution, Dynamic Heterogeneous Graphs (DHGs) have emerged as a powerful paradigm, providing richer representations of real-world systems than static or homogeneous graphs.

[0004] While DHGs possess powerful expressive capabilities, those built from real-world logs inevitably contain structural noise, primarily stemming from imperfect data generation and collection processes that vary significantly across disciplines. For instance, in e-commerce DHGs, accidental clicks, bot behavior, and brief exposures may be recorded as meaningful user-item interactions, creating spurious edges. In academic DHGs such as Aminer / DBLP, ambiguous author names, missing metadata, and indexing delays can lead to incorrect co-authorship or citation relationships, or shift interaction timestamps. In financial DHGs, transient transactions and anomalous events may manifest as structural outliers, while reporting delays can result in missing or delayed edges. In DHGs designed for anomaly detection in target systems, when constructing graph structures from system operation logs or network traffic logs, incomplete log collection, false positives and false negatives, timestamp discrepancies, and attacker evasion tactics all introduce structural noise into the graph, blurring the boundaries between normal and anomalous behavior and increasing the risk of false positives and false negatives. Learning directly from noisy DHGs risks capturing spurious structures and temporal correlations, thereby impairing representation quality and downstream task performance. Therefore, an effective denoising mechanism is a crucial prerequisite for reliable learning on dynamic heterogeneous graphs.

[0005] A natural approach to addressing the prevalence of noisy structures is to introduce denoising or robustness mechanisms into graph representation learning. Existing research has proposed several useful building blocks, but most are designed for simpler scenarios or single-dimensional complexity. In simple (static and homogeneous) graphs, noise is typically modeled as spurious edges or feature contamination and mitigated through edge pruning, graph sparsity, or robust learning objectives. For complex scenarios beyond simple graphs, previous research has generally advanced denoising techniques along two relatively independent directions: temporal dynamics and structural heterogeneity. For dynamic graphs, representative methods utilize temporal smoothing, evolutionary constraints, or temporally aware attention mechanisms to reduce the impact of transient or unreliable structures between snapshots; while for (static) heterogeneous graphs, denoising methods explicitly model node and relation types, using meta-path semantics or relation-aware modeling to identify noise in cross-type connections. However, DHGs couple temporal drift with heterogeneous semantics, making noise inherently dependent on both temporal context and relation type; therefore, methods addressing only one aspect often perform poorly in joint settings. This limitation is particularly prominent in target system anomaly detection scenarios, because the abnormal behavior itself may only manifest as a structural deviation in a specific relationship type within a specific time window. If the denoising mechanism cannot capture the coupling features of time and type at the same time, it is difficult to effectively distinguish noise from real abnormal signals.

[0006] Specifically, existing methods face the following technical bottlenecks in practice: (1) Loss scales are not comparable across time and type: Heterogeneous relation semantics and temporal drift cause large differences in loss magnitude and convergence speed between relation types and time snapshots. As a result, denoising strategies based on a global threshold or on naive heuristics (such as "low loss equals clean") are systematically biased and may unfairly suppress relation types or recent snapshots that are inherently harder to learn.

[0007] (2) Context-dependent and propagating noise: Noise in DHGs rarely satisfies the independent and identically distributed (i.i.d.) assumption. An interaction may appear as noise only in a specific temporal context or under a specific relation type. Moreover, such noise can propagate through temporal message passing and cross-type dependencies, making denoising decisions unstable.

[0008] (3) Loss fluctuations caused by optimization: Link prediction training relies on negative sampling and mini-batch organization, which introduces additional variance in the loss of each edge; without proper grouping and calibration, criteria based purely on loss may not be able to generalize reliably across different backbone models and datasets.
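Bottleneck (1) can be illustrated with a toy example. The following Python sketch is not part of the disclosed method; the loss values, the bucket keys, the global threshold of 0.5, and the z-score cut-off of 1.5 are all assumptions chosen for illustration. It shows how one global loss threshold flags an entire hard bucket as noisy while missing the outlier in an easy bucket, whereas a comparison within each (snapshot, relation type) bucket flags only the outliers.

```python
from statistics import mean, pstdev

# Hypothetical per-edge losses, keyed by (snapshot index, relation type).
losses = {
    (0, "reads"): [0.10, 0.12, 0.11, 0.45],     # easy bucket with one outlier
    (5, "connects"): [0.80, 0.85, 0.82, 1.60],  # hard, recent bucket with one outlier
}

GLOBAL_THRESHOLD = 0.5  # assumed cut-off for the "low loss equals clean" heuristic


def flag_global(losses, threshold):
    """Flag every edge whose raw loss exceeds one global threshold."""
    return {b: [l > threshold for l in ls] for b, ls in losses.items()}


def flag_per_bucket(losses, z_cut=1.5):
    """Flag edges whose loss is unusually high relative to their own bucket."""
    flags = {}
    for bucket, ls in losses.items():
        mu, sigma = mean(ls), pstdev(ls)
        flags[bucket] = [(l - mu) / (sigma + 1e-8) > z_cut for l in ls]
    return flags


global_flags = flag_global(losses, GLOBAL_THRESHOLD)
bucket_flags = flag_per_bucket(losses)
```

Under the global rule every edge of the hard bucket is treated as noise and the outlier of the easy bucket is missed; the per-bucket rule flags exactly one edge in each bucket.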

[0009] In summary, in security applications that rely on log graph structured data, such as target system anomaly detection, how to achieve accurate and stable structural noise identification and suppression in dynamic heterogeneous graphs that couple temporal dynamics and structural heterogeneity has become an urgent problem to be solved in this field.

Summary of the Invention

[0010] To address the aforementioned technical challenges, and specifically for anomaly detection in target systems, this invention proposes a memory-guided denoising method and apparatus for dynamic heterogeneous graph representation learning. The method acquires a sequence of noisy time snapshots, constructs training interactions, and applies a two-stage, loss-calibrated, memory-guided denoising framework. The first stage uses the memory effect to mine a reliable set from the positive interactions (warm-up, i.e. preheating, training; ranking-based evaluation; and a sliding-window memory history). The second stage performs noise-aware training based on the reliable interactions (bucket statistics, loss calibration, and weighted training) to suppress the influence of structural noise. The trained model then scores interactions constructed from operation logs or network traffic data collected in real time; when the anomaly score exceeds a preset threshold, a system anomaly warning signal is generated.

[0011] This invention provides a memory-guided denoising method for dynamic heterogeneous graph representation learning, applied to anomaly detection in target systems. The method includes:

Obtaining a time snapshot sequence of a dynamic heterogeneous graph constructed from the target system's operation logs or network traffic logs. Each time snapshot in the sequence contains typed nodes and typed relation edges; each relation edge serves as a positive interaction and is associated with a relation type. The dynamic heterogeneous graph, as constructed from the operation logs or network traffic logs, contains structural noise;

Constructing a training interaction set, wherein the training interactions comprise the positive interactions and negative samples generated through negative sampling;

Applying memory-guided denoising with a two-stage loss calibration framework to achieve robust representation learning on the dynamic heterogeneous graph. The first stage uses the memory effect of the dynamic heterogeneous graph backbone model to mine a reliable interaction set from the positive interactions without noise annotation. It includes: performing warm-up training on the noisy dynamic heterogeneous graph; after each round of warm-up training, evaluating with a ranking criterion whether each positive interaction was memorized by the backbone model in that round; and constructing the reliable interaction set from the history of memorization states over multiple training rounds within a sliding window. The second stage performs noise-aware training based on the reliable interaction set. It includes: bucketing the training interactions by time snapshot and relation type; computing bucket-level statistics within each bucket from the loss distribution of the positive interactions in the reliable interaction set; calibrating the loss of each training interaction and estimating its noise probability from the bucket-level statistics; and using the noise probability to weight the training interactions so as to suppress the influence of noisy interactions;

Using the dynamic heterogeneous graph backbone model obtained from noise-aware training to compute anomaly scores for the interactions under test, which are constructed from operation logs or network traffic data collected in real time. If the anomaly score exceeds a preset threshold, a system anomaly warning signal is generated.

[0012] In another aspect, this invention provides a memory-guided denoising device for dynamic heterogeneous graph representation learning, applied to anomaly detection in a target system, comprising:

A first module, configured to obtain a time snapshot sequence of a dynamic heterogeneous graph constructed from the target system's operation logs or network traffic logs. Each time snapshot in the sequence contains typed nodes and typed relation edges; each relation edge serves as a positive interaction and is associated with a relation type. The dynamic heterogeneous graph, as constructed from the operation logs or network traffic logs, contains structural noise;

A second module, configured to construct a training interaction set, wherein the training interactions comprise the positive interactions and negative samples generated through negative sampling;

A third module, configured to apply memory-guided denoising with a two-stage loss calibration framework to achieve robust representation learning on the dynamic heterogeneous graph. The first stage uses the memory effect of the dynamic heterogeneous graph backbone model to mine a reliable interaction set from the positive interactions without noise annotation. It includes: performing warm-up training on the noisy dynamic heterogeneous graph; after each round of warm-up training, evaluating with a ranking criterion whether each positive interaction was memorized by the backbone model in that round; and constructing the reliable interaction set from the history of memorization states over multiple training rounds within a sliding window. The second stage performs noise-aware training based on the reliable interaction set. It includes: bucketing the training interactions by time snapshot and relation type; computing bucket-level statistics within each bucket from the loss distribution of the positive interactions in the reliable interaction set; calibrating the loss of each training interaction and estimating its noise probability from the bucket-level statistics; and using the noise probability to weight the training interactions so as to suppress the influence of noisy interactions;

A fourth module, configured to use the dynamic heterogeneous graph backbone model obtained from noise-aware training to compute anomaly scores for the interactions under test, which are constructed from operation logs or network traffic data collected in real time. If the anomaly score exceeds a preset threshold, a system anomaly warning signal is generated.

[0013] Compared with the prior art, the present invention achieves the following beneficial effects: 1. Eliminates reliance on noise labeling and reduces application costs: This invention utilizes the inherent memory effect of the dynamic heterogeneous graph backbone model to mine reliable interactions, eliminating the need for manual labeling of noise samples. This solves the problem of scarce and expensive noise labeling in real-world scenarios, significantly improving the practicality and deployment efficiency of the method.

[0014] 2. Overcomes scale bias across types and time series, and improves denoising fairness: By introducing a type-time bucket-aware loss calibration mechanism, the problem of incomparable loss scale caused by semantic differences in relations and time series drift is solved, and the systematic misjudgment of difficult samples (such as specific types or recent snapshots) by traditional global thresholds is avoided.

[0015] 3. Suppresses noise propagation and enhances robustness in dynamic heterogeneous environments: By jointly considering temporal context and relation type in bucket statistics and weighted training, the method handles context-dependent noise that is not independent and identically distributed, blocks the cascading propagation of noise through temporal message passing and cross-type dependencies, and keeps denoising decisions stable in dynamic heterogeneous graphs.

[0016] 4. Model-agnostic, general, and highly adaptable: The proposed weighted denoising mechanism is a lightweight wrapper that can be integrated into various existing dynamic heterogeneous graph backbone models with minimal modifications and without redesigning the backbone architecture, giving it good generality and transferability.

[0017] 5. Improves the performance of downstream tasks in noisy environments and directly supports anomaly detection applications: Experiments on real-world datasets show that the method consistently improves the performance of downstream tasks such as link prediction under controlled structural noise, achieving robust dynamic heterogeneous graph representation learning. On this basis, the model obtained from noise-aware training scores the interactions under test, which are constructed from operation logs or network traffic data collected in real time. This allows system anomaly warning signals to be generated effectively in target system anomaly detection, improving the accuracy of security monitoring.

Description of the Attached Drawings

[0018] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on the structures shown in these drawings without creative effort.

[0019] Figure 1 is a flowchart of the steps of a memory-guided dynamic heterogeneous graph representation learning denoising method in one embodiment of the present invention; Figure 2 is a schematic diagram of the sensitivity analysis of the trust factor on the Aminer dataset used in the experiments of this invention, in which Figures 2(a) and 2(c) plot the Hits@1 metric against the trust factor and Figures 2(b) and 2(d) plot the MRR metric against the trust factor, each panel at its own noise ratio setting; HGT, HLinear, and HTGNN are three representative backbone models for dynamic heterogeneous graphs.

Detailed Implementation

[0020] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the invention.

[0021] In real-world computer / network systems, dynamic heterogeneous graphs constructed from logs, as complex data structures, suffer from structural noise that directly impacts the accuracy of system behavior analysis. For example, this noise can severely reduce the output accuracy of machine learning models like graph neural networks in downstream tasks such as link prediction. Nodes and edges in a dynamic heterogeneous graph correspond to actual interaction records. Due to errors in the log collection process, false or erroneous structural information (i.e., structural noise) is mixed into the constructed dynamic heterogeneous graph. This noise can cause subsequent graph neural networks to misclassify non-existent interactions as existing ones when performing link prediction, thus reducing the precision and recall of the prediction results. This invention aims to address how to effectively identify and suppress structural noise in the entity interaction data structure of a target system, thereby improving the output accuracy of graph neural network architectures / models, such as improving evaluation metrics like the mean reciprocal rank (MRR) and hit rate (Hits@N) of positive edges in link prediction results.

[0022] In one embodiment, as shown in Figure 1, a memory-guided denoising method for dynamic heterogeneous graph representation learning is provided for anomaly detection in a target system, including:

Obtaining a time snapshot sequence of a dynamic heterogeneous graph constructed from the target system's operation logs or network traffic logs. Each time snapshot in the sequence contains typed nodes and typed relation edges; each relation edge serves as a positive interaction and is associated with a relation type. The dynamic heterogeneous graph, as constructed from the operation logs or network traffic logs, contains structural noise;

Constructing a training interaction set, wherein the training interactions comprise the positive interactions and negative samples generated through negative sampling;

Applying memory-guided denoising with a two-stage loss calibration framework to achieve robust representation learning on the dynamic heterogeneous graph. The first stage uses the memory effect of the dynamic heterogeneous graph backbone model to mine a reliable interaction set from the positive interactions without noise annotation. It includes: performing warm-up training on the noisy dynamic heterogeneous graph; after each round of warm-up training, evaluating with a ranking criterion whether each positive interaction was memorized by the backbone model in that round; and constructing the reliable interaction set from the history of memorization states over multiple training rounds within a sliding window. The second stage performs noise-aware training based on the reliable interaction set. It includes: bucketing the training interactions by time snapshot and relation type; computing bucket-level statistics within each bucket from the loss distribution of the positive interactions in the reliable interaction set; calibrating the loss of each training interaction and estimating its noise probability from the bucket-level statistics; and using the noise probability to weight the training interactions so as to suppress the influence of noisy interactions;

Using the dynamic heterogeneous graph backbone model obtained from noise-aware training to compute anomaly scores for the interactions under test, which are constructed from operation logs or network traffic data collected in real time. If the anomaly score exceeds a preset threshold, a system anomaly warning signal is generated.
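The final scoring step can be sketched as follows. This text does not specify how the anomaly score is derived from the backbone model's output, so the sketch assumes that the score is one minus the predicted link probability; the function names, the toy scores, and the threshold value of 0.9 are likewise hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def anomaly_score(link_score):
    # Assumption: an interaction the model considers unlikely is anomalous.
    return 1.0 - sigmoid(link_score)


def detect(interactions, score_fn, threshold):
    """Return (interaction, anomaly score) pairs whose score exceeds the threshold."""
    alerts = []
    for interaction in interactions:
        score = anomaly_score(score_fn(interaction))
        if score > threshold:
            alerts.append((interaction, score))
    return alerts


# Hypothetical link scores standing in for the trained backbone's decoder output.
toy_scores = {
    ("proc:42", "reads", "file:/var/log/app.log"): 4.0,
    ("proc:42", "connects", "ip:203.0.113.9"): -3.0,
}
alerts = detect(toy_scores, toy_scores.__getitem__, threshold=0.9)
```

In this toy run only the low-scoring `connects` interaction exceeds the threshold and would trigger a warning signal.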

[0023] Specifically, a time snapshot sequence of the dynamic heterogeneous graph is first obtained. Each time snapshot in the sequence contains typed nodes and typed relation edges; each relation edge serves as a positive interaction and is associated with a relation type. The dynamic heterogeneous graph, as constructed from the operation logs or network traffic logs, contains structural noise. Nodes correspond to the behavioral entities in the logs, such as processes, user accounts, or IP addresses; relation edges correspond to the operations or communication behaviors between these entities, such as file read / write, network connections, or API calls.

[0024] To study denoising-aware representation learning on dynamic heterogeneous graphs (DHGs), note that in target systems such as computers and networks a DHG is stored in hardware memory as a data structure such as an adjacency matrix or an edge list, and is then read and processed by a graph neural network model. Formally, a dynamic heterogeneous graph can be represented as a sequence of time snapshots of heterogeneous graphs, $\mathcal{G} = \{G^{(1)}, G^{(2)}, \dots, G^{(T)}\}$, where $G^{(t)}$ denotes the snapshot of the dynamic heterogeneous graph at time $t$ (snapshot index / time step), i.e., a (static) heterogeneous graph $G^{(t)} = (\mathcal{V}^{(t)}, \mathcal{E}^{(t)}, \mathcal{A}, \mathcal{R}, \phi, \psi)$, and $T$ is the total number of time snapshots. Each time snapshot (or simply snapshot) $G^{(t)}$ contains a set of typed nodes and typed relation edges: $\mathcal{V}^{(t)}$ and $\mathcal{E}^{(t)}$ are the node set and the edge set of snapshot $G^{(t)}$; $\mathcal{A}$ is the set of node types; $\mathcal{R}$ is the set of relation (edge) types; $\phi: \mathcal{V}^{(t)} \to \mathcal{A}$ is the node type mapping function, which maps each node in $\mathcal{V}^{(t)}$ to its node type in $\mathcal{A}$; and $\psi: \mathcal{E}^{(t)} \to \mathcal{R}$ is the edge type mapping function, which maps each edge in $\mathcal{E}^{(t)}$ to its relation type in $\mathcal{R}$.
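As a non-limiting illustration of the edge-list storage mentioned above, the following Python sketch groups log records into a sequence of typed snapshots. The record layout, the `Snapshot` class, the `build_snapshots` function, and the fixed window width are assumptions introduced here; the original text does not prescribe them.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """One static heterogeneous graph: typed nodes and typed relation edges."""
    node_type: dict = field(default_factory=dict)  # node id -> node type
    edges: list = field(default_factory=list)      # (source, relation type, target)


def build_snapshots(records, window):
    """Group log records into time snapshots of width `window` (same unit as timestamps).

    Each record is (timestamp, source, source type, relation, target, target type).
    Windows that contain no record are skipped.
    """
    snaps = defaultdict(Snapshot)
    for ts, src, src_type, rel, dst, dst_type in records:
        snap = snaps[int(ts // window)]
        snap.node_type[src] = src_type
        snap.node_type[dst] = dst_type
        snap.edges.append((src, rel, dst))
    return [snaps[t] for t in sorted(snaps)]


# Hypothetical log records.
records = [
    (3.0, "proc:42", "process", "reads", "file:/etc/passwd", "file"),
    (7.5, "user:alice", "user", "spawns", "proc:42", "process"),
    (12.1, "proc:42", "process", "connects", "ip:10.0.0.5", "ip"),
]
snapshots = build_snapshots(records, window=10)
```

Here the `node_type` dictionary plays the role of the node type mapping, and the relation stored in each edge tuple plays the role of the edge type mapping.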

[0025] Then, a training interaction set is constructed, wherein the training interactions include the positive example interactions and negative samples generated through negative sampling.

[0026] For a relation type $r \in \mathcal{R}$ at time $t$, the observed set of positive interactions is defined as $\mathcal{E}_r^{(t)} = \{(u, r, v, t) \mid e = (u, v) \in \mathcal{E}^{(t)},\ \psi(e) = r\}$, where $(u, r, v, t)$ denotes an interaction of type $r$ observed at time $t$ from source node $u$ to target node $v$, also called a positive interaction. In practical applications, because the dynamic heterogeneous graph (DHG) is constructed from actual logs, the data may contain structural noise: some recorded interactions are spurious, while some real interactions may be missing or delayed. This motivates the denoising mechanism that the present invention introduces into the learning process.

[0027] To perform supervised learning, a training interaction set must be constructed that contains both positive interactions and negative interactions (negative samples). Specifically, for each time snapshot $t$ and each relation type $r$, the present invention takes each observed interaction $(u, r, v, t)$ in $\mathcal{E}_r^{(t)}$ as a positive interaction (label $y = 1$) and constructs corresponding negative samples (label $y = 0$) for it. Negative sampling is constrained by relation type: a relation type $r$ connects a specific source node type $a_{\mathrm{src}}(r)$ to a specific target node type $a_{\mathrm{dst}}(r)$, so during negative sampling a node $v'$ is drawn at random only from the set of nodes of target type $a_{\mathrm{dst}}(r)$, giving a negative interaction $(u, r, v', t)$. This keeps the negative samples semantically plausible. Sampling can be uniform at random or popularity-based (by node degree) to balance training difficulty. Each positive interaction typically corresponds to a fixed number $K$ of negative samples, and together they form the training interaction set for supervised learning, $\mathcal{D} = \bigcup_{t, r} \bigcup_{(u, r, v, t) \in \mathcal{E}_r^{(t)}} \big( \{((u, r, v, t), 1)\} \cup \{((u, r, v', t), 0) : v' \in \mathcal{N}_{u, r}^{(t)}\} \big)$, where $\mathcal{N}_{u, r}^{(t)}$ denotes the negative sample set obtained by negative sampling for node $u$, relation type $r$, and time $t$. The training interaction set $\mathcal{D}$ serves as the basis for the subsequent model training and denoising processes.
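The type-constrained negative sampling described above can be sketched in Python as follows. This is a minimal sketch using uniform sampling without replacement; the function names, the `target_type_of` lookup, and the choice to exclude targets that already appear in an observed positive interaction are assumptions made for illustration.

```python
import random


def sample_negatives(positive, nodes_by_type, target_type_of, k, rng, observed=frozenset()):
    """Draw up to k negative targets of the correct node type for one positive edge."""
    u, r, v, t = positive
    candidates = [
        n for n in nodes_by_type[target_type_of[r]]
        # Assumption: skip the true target and any other observed positive.
        if n != v and (u, r, n, t) not in observed
    ]
    chosen = rng.sample(candidates, min(k, len(candidates)))
    return [(u, r, n, t) for n in chosen]


def build_training_set(positives, nodes_by_type, target_type_of, k, seed=0):
    """Label positives with 1 and their sampled negatives with 0."""
    rng = random.Random(seed)
    observed = frozenset(positives)
    data = []
    for pos in positives:
        data.append((pos, 1))
        for neg in sample_negatives(pos, nodes_by_type, target_type_of, k, rng, observed):
            data.append((neg, 0))
    return data


# Hypothetical toy graph: the relation "reads" always targets nodes of type "file".
nodes_by_type = {"process": ["p1", "p2"], "file": ["f1", "f2", "f3", "f4"]}
target_type_of = {"reads": "file"}
positives = [("p1", "reads", "f1", 0), ("p1", "reads", "f2", 0)]
data = build_training_set(positives, nodes_by_type, target_type_of, k=2)
```

Popularity-based sampling could be obtained by replacing `rng.sample` with a degree-weighted draw; that variant is not shown here.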

[0028] To more clearly illustrate the application of the method of this invention to the denoising process of dynamic heterogeneous graph learning tasks, in a typical example, the following general temporal link prediction task is used as the basis for backbone model training, but this invention does not depend on a specific link prediction implementation method.

[0029] Given the DHG context observed up to a cutoff time step $t$ and a set of query edges of relation type $r$, temporal link prediction aims to estimate the likelihood that each interaction $(u, r, v)$ occurs at time step $t + 1$. Formally, let $\mathcal{G}^{\le t}$ denote the supporting graph context (e.g., a sliding window of size $w$, $\{G^{(t-w+1)}, \dots, G^{(t)}\}$, or all historical snapshots up to $t$). A DHG model parameterized by $\theta$ consists of an encoder and a decoder: $H^{(t)} = f_\theta(\mathcal{G}^{\le t})$ and $s_r(u, v) = g_\theta(h_u^{(t)}, h_v^{(t)}, r)$, where $f_\theta$ denotes the backbone encoder parameterized by $\theta$, which maps the supporting graph context to the node representations $H^{(t)}$ at time $t$ (for the nodes that participate in interactions at time $t + 1$), and $g_\theta$ denotes the backbone decoder parameterized by $\theta$, which outputs the interaction score of the query node pair $(u, v)$ under relation type $r$ (optionally conditioned on the time $t + 1$). Given a set of labeled query edges $\mathcal{Q}_r^{(t+1)}$, this invention trains the model with a supervised edge-level objective function (e.g., binary cross-entropy): $\mathcal{L}_r^{(t+1)} = -\frac{1}{|\mathcal{Q}_r^{(t+1)}|} \sum_{(u, v, y) \in \mathcal{Q}_r^{(t+1)}} \big[ y \log \sigma(s_r(u, v)) + (1 - y) \log(1 - \sigma(s_r(u, v))) \big]$, where $\mathcal{L}_r^{(t+1)}$ denotes the loss at time $t + 1$ for relation type $r$, expressed here as binary cross-entropy; $\sigma$ denotes the sigmoid activation function, which maps scores to the interval $(0, 1)$ so that they can be read as probabilities; and $|\mathcal{Q}_r^{(t+1)}|$ denotes the number of samples (interactions) in the query edge set.
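The edge-level binary cross-entropy objective can be written out directly. The following Python sketch is a plain reference implementation over a list of raw scores and 0/1 labels; the function names and the small `eps` constant used to avoid taking the logarithm of zero are choices made here, not part of the original text.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function mapping a score to (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bce_loss(scores, labels, eps=1e-12):
    """Mean binary cross-entropy over a set of scored query edges."""
    total = 0.0
    for s, y in zip(scores, labels):
        p = sigmoid(s)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps)
    return total / len(scores)


# A confident, correct model yields a small loss; a confident, wrong one a large loss.
good = bce_loss([10.0, -10.0], [1, 0])
bad = bce_loss([10.0, -10.0], [0, 1])
```

In a real backbone model this computation would normally be carried out by the training framework's own loss function on batched tensors.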

[0030] Based on the above analysis, this invention selects memory-guided denoising with a two-stage loss calibration framework to achieve robust representation learning on dynamic heterogeneous graphs.

[0031] The denoising strategy employed in this invention is model-agnostic: it does not depend on the architecture of a specific DHG backbone model. The DHG backbone model refers to a neural network used for representation learning and relation scoring of nodes and interactions in a dynamic heterogeneous graph. Any DHG encoder-decoder structure that supports edge scoring (e.g., temporal graph neural networks or heterogeneous graph neural networks) can serve as the encoder and the decoder. The two-stage denoising process proposed in this invention operates only on the training dynamics (the memory effect) and on the training loss of each edge.

[0032] Structural noise in DHGs can severely impair the quality of graph representation learning and degrade the performance of downstream tasks. Although existing research has focused on denoising dynamic graphs and heterogeneous graphs respectively, effective denoising methods for DHGs are still in their early stages of development. To bridge this gap, this invention proposes a denoising-aware learning framework suitable for DHGs, which jointly models temporal dynamics and structural heterogeneity.

[0033] The memory-guided denoising framework employs a two-stage design. In the first stage, the memory effect of the dynamic heterogeneous graph backbone model is used to mine a reliable set of interactions in a self-supervised manner; this set serves as "clean proxy" data that requires no additional annotation. In the second stage, this invention performs noise-aware training by calibrating the interaction loss per type-time bucket and transforming each interaction loss into a soft noise probability, thereby down-weighting suspicious interactions during optimization.

[0034] Based on the estimated noise probability, this invention employs a weighted denoising strategy to suppress the influence of questionable interactions while preserving potentially informative but difficult-to-learn samples. This two-stage framework provides a fundamental method for suppressing structural noise across time snapshots and relation types, significantly improving the robustness of DHG representation learning.

[0035] The goal of the first stage is to construct a reliable (trustworthy) set of interactions without supervision, serving as a "clean proxy" for subsequent noise modeling. This invention is based on a widely validated phenomenon in deep learning: in the early stages of training, models tend to fit simple and consistent patterns first, while the memorization of noisy or inconsistent samples occurs later. Therefore, interactions that are repeatedly ranked high by the model in the early stages are more likely to be clean and structurally stable, serving as reliable signals.

[0036] (1) Perform warm-up training on the noisy dynamic heterogeneous graph; after each round of warm-up training, evaluate, based on the ranking criterion, whether each positive interaction has been memorized by the dynamic heterogeneous graph backbone model in that round.

[0037] This invention first trains the backbone on the noisy DHG for a preset number of warm-up rounds using a standard edge-level loss function (e.g., binary cross-entropy). After each warm-up round, the invention evaluates whether each positive interaction has been "memorized" by the dynamic heterogeneous graph backbone model using a ranking-based criterion. Consider a positive interaction (edge) from a source node to a target node in the time snapshot at a given time and under a given relation type, and let its model score be the score produced by the encoder-decoder backbone model. To approximate efficiently the rank of the true target node among the candidate targets of the source node, this invention samples a set of temporary negative targets from the target node set of the relation type and computes the rank of the positive interaction among these samples (the sampled rank).

[0038] Each relation type connects a specific source node type to a specific target node type; the target node set of a relation type is the set of all nodes belonging to that target node type. During negative sampling, negatives are drawn only from this set so that the sampled pairs remain semantically valid.
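
The type-constrained negative sampling described above might look like the following sketch. The function name `sample_typed_negatives`, the string node identifiers, and the choice to exclude the observed target from the candidates are illustrative assumptions rather than details stated in the text.

```python
import random

def sample_typed_negatives(target_nodes, true_target, num_negatives, rng):
    """Draw negative targets only from the relation's target node set.

    target_nodes:  all nodes of the relation's target node type.
    true_target:   observed target of the positive interaction
                   (excluded here; this exclusion is an assumption).
    num_negatives: number of negatives requested (a hyperparameter).
    """
    candidates = [v for v in target_nodes if v != true_target]
    # Sample without replacement, capped at the number of candidates.
    return rng.sample(candidates, min(num_negatives, len(candidates)))

rng = random.Random(0)
items = ["item_a", "item_b", "item_c", "item_d"]
negatives = sample_typed_negatives(items, "item_b", 2, rng)
```

A production sampler would typically also filter out other observed positives of the same source node; that refinement is omitted here for brevity.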

[0039] In one embodiment, the sampled rank is computed by counting the sampled negative targets whose scores outrank that of the positive interaction, with the count expressed through an indicator function.

[0040] If the sampled rank of the positive interaction falls within the preset Hit@k threshold, the positive interaction is considered to have been memorized in this round of training; otherwise, it is considered not to have been memorized in this round of training.

[0041] The resulting memory state (memorized or not memorized) is recorded.

[0042] In other words, if a positive interaction appears in the top k of the sampled ranking in a given training round, it is regarded as "memorized" in that round, and a binary memory state is recorded accordingly, where k is the preset Hit@k threshold and the number of negative samples is a hyperparameter. Hit@k is a commonly used evaluation metric in information retrieval, recommendation systems, and link prediction tasks; it measures the ability of a model to rank correct results within the top k positions. Compared with using the raw loss value directly, this Hit@k-style criterion is less sensitive to scale differences between time snapshots and relation types, which is especially important in dynamic heterogeneous graphs.
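
As an illustration of the ranking criterion above, the following sketch computes a sampled rank and the resulting binary memory state. The function names are hypothetical, and the tie rule (a negative whose score equals the positive's score counts against the positive) is an assumption, since the original formula is not reproduced in this text.

```python
def sampled_rank(pos_score, neg_scores):
    """Rank of the positive among itself and the sampled negatives (1 = best)."""
    # Assumption: ties are resolved pessimistically for the positive.
    return 1 + sum(1 for s in neg_scores if s >= pos_score)

def memory_state(pos_score, neg_scores, k):
    """Return 1 if the positive lands in the top k of the sampled ranking, else 0."""
    return 1 if sampled_rank(pos_score, neg_scores) <= k else 0

# One sampled negative (0.95) outranks the positive (0.9), so the rank is 2.
rank_example = sampled_rank(0.9, [0.1, 0.95, 0.5])
```

Because only the ordering of scores matters, the criterion is unaffected by the absolute loss scale of a given snapshot or relation type.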

[0043] (2) Construct a reliable interaction set based on the memory history of memory states in multiple training rounds within a sliding window.

[0044] The memory state from a single training round (e.g., the current round) may be unstable because of stochastic optimization and negative sampling. To obtain a robust estimate, this invention maintains, for each positive interaction, a sliding window of preset length that records its memory states over the most recent training rounds; these states form its memory history. The degree of memory is then defined as the average memorization frequency within the window. Finally, when an interaction is consistently memorized within the window, that is, when its degree of memory reaches the credibility threshold, the positive interaction is judged to be reliable and is included in the reliable interaction set. Intuitively, this rule selects the interactions that have been repeatedly ranked near the top in recent training rounds; such interactions are more likely to be clean and structurally consistent within the DHG. The process is fully self-supervised, and the resulting set of reliable interactions guides the training in the second stage.
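
The sliding-window bookkeeping described above can be sketched as follows. The class name `MemoryTracker`, the use of `>=` against the credibility threshold, and the requirement that the window be full before an interaction can qualify are assumptions made for this illustration.

```python
from collections import defaultdict, deque

class MemoryTracker:
    """Per-interaction sliding window of binary memory states."""

    def __init__(self, window, threshold):
        self.window = window        # window length (preset)
        self.threshold = threshold  # credibility threshold
        self.history = defaultdict(lambda: deque(maxlen=window))

    def record(self, edge, memorized):
        # deque(maxlen=...) silently drops the oldest state when full.
        self.history[edge].append(1 if memorized else 0)

    def degree(self, edge):
        """Average memorization frequency within the window."""
        h = self.history.get(edge)
        return sum(h) / len(h) if h else 0.0

    def reliable_set(self):
        # Assumption: only interactions with a full window may qualify.
        return {e for e, h in self.history.items()
                if len(h) == self.window and sum(h) / len(h) >= self.threshold}

tracker = MemoryTracker(window=3, threshold=1.0)
for state in (1, 1, 1):
    tracker.record("e1", state)
for state in (1, 0, 1):
    tracker.record("e2", state)
tracker.record("e3", 1)  # too little history to qualify
```

With a threshold of 1.0 the rule demands memorization in every round of the window; a lower threshold tolerates occasional misses caused by negative sampling noise.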

[0045] In the second stage, noise-aware training is performed based on the reliable interaction set, and the noise-aware training process is implemented through type-time bucket loss calibration.

[0046] The second stage uses the reliable interaction set discovered in the first stage to improve training robustness by suppressing noisy interactions. In a DHG, temporal drift and heterogeneous relation semantics often cause loss scales to differ substantially across snapshots and relation types, so a global loss threshold may be systematically biased toward certain buckets. To avoid such bias, this invention calibrates the loss of each interaction relative to its own interaction bucket and converts the calibrated deviation into a soft noise probability for weighted training.

[0047] (1) First, the training interactions are bucketed according to time snapshots and relation types, including calculating the interaction loss and constructing interaction buckets.

[0048] Due to temporal drift and structural heterogeneity, interactions in dynamic heterogeneous graphs (DHGs) exhibit significantly different learning difficulties under different time snapshots and relation types. Therefore, this invention does not employ a globally unified judgment criterion; instead, it divides the training interactions by time snapshot and relation type and organizes them into fine-grained interaction buckets. Because each time step corresponds one-to-one to a time snapshot, the time step is also used directly below as shorthand for the corresponding time snapshot.

[0049] Formally, for each time snapshot and relation type, this invention defines the interaction bucket as the set of training interactions that belong to that snapshot and that relation type. This bucketing mechanism serves as the basic unit of the denoising process that follows: the loss statistics of each bucket are fitted on its reliable interaction subset and are then used to calibrate the noise probability for weighted training.

[0050] For a training interaction in a given time snapshot and under a given relation type, its interaction loss is calculated from the interaction score output by the DHG backbone model; in the implementation of this invention, the loss function is binary cross-entropy with logits.

[0051] According to the definition of interaction buckets, training interactions are then grouped into interaction buckets by relation type and snapshot label (time step).
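
A minimal sketch of the bucketing step follows. It assumes each training interaction is stored as a `(source, target, relation, time)` tuple and that buckets hold interaction indices; both the tuple layout and the function name `build_buckets` are hypothetical.

```python
from collections import defaultdict

def build_buckets(interactions):
    """Group interaction indices by (time step, relation type)."""
    buckets = defaultdict(list)
    for idx, (_src, _dst, rel, t) in enumerate(interactions):
        buckets[(t, rel)].append(idx)
    return dict(buckets)

interactions = [
    ("u1", "i1", "click", 0),
    ("u1", "i2", "purchase", 0),
    ("u2", "i1", "click", 0),
    ("u2", "i3", "click", 1),
]
buckets = build_buckets(interactions)
```

Storing indices rather than the interactions themselves makes it straightforward to look up per-interaction losses and reliability flags held in parallel arrays.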

[0052] (2) Calculate bucket-level statistics based on the loss distribution of positive interactions in the reliable interaction set within each interaction bucket, that is, perform bucket-level loss statistics on reliable interactions.

[0053] For each interaction bucket, the intersection of the bucket with the reliable interaction set is taken, yielding the reliable interaction subset within that bucket.

[0054] In the second stage, the losses of the reliable interactions in an interaction bucket provide the reference against which the other interactions in the same bucket are calibrated. It is therefore necessary to calculate bucket-level statistics for each interaction bucket, namely the mean and the standard deviation of the losses of its reliable interactions.

[0055] Given the reliable interaction subset of a bucket, this invention summarizes its loss distribution by the empirical mean and the empirical standard deviation of the losses. To avoid degenerate estimates when the losses are highly concentrated, a lower bound is applied to the standard deviation. Sparse bucket handling: some buckets may contain too few reliable interactions to support stable estimates.
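
The bucket-level statistics can be sketched as below. The function name `reliable_loss_stats`, the default floor value `sigma_min=1e-3`, and the choice of the population standard deviation are assumptions for illustration; the text only states that an empirical mean and a lower-bounded standard deviation are used.

```python
import statistics

def reliable_loss_stats(reliable_losses, sigma_min=1e-3):
    """Mean and floored standard deviation of a bucket's reliable losses.

    reliable_losses must be non-empty; sparse buckets are expected to be
    handled by a separate pooling step before this function is called.
    """
    mean = statistics.fmean(reliable_losses)
    # Population standard deviation (an assumption), floored so that a
    # tightly concentrated bucket does not yield a near-zero scale.
    std = max(statistics.pstdev(reliable_losses), sigma_min)
    return mean, std

mean_a, std_a = reliable_loss_stats([1.0, 3.0])
mean_b, std_b = reliable_loss_stats([0.2, 0.2, 0.2], sigma_min=0.05)
```

Without the floor, a bucket whose reliable losses are nearly identical would make every other interaction in the bucket look like an extreme outlier.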

[0056] When the number of reliable interactions in a bucket is below a preset threshold, this invention recomputes the mean and the standard deviation from a pooled estimate over a larger set of reliable interactions. In practice, reliable interactions that share the same relation type or the same snapshot are merged, with a fallback to the global pool if necessary. This step stabilizes the calibration process while keeping the backbone model and the training objective unchanged. Specific pooling estimation strategies include at least one of the following. Proximity pooling: the reliable interaction subset of the current time snapshot is merged with the reliable interaction subsets of the same relation type in the adjacent (preceding and following) time snapshots, and the mean and standard deviation of the loss are recalculated on the merged subset. Relational semantic pooling: the reliable interaction subset of the current bucket is merged with the reliable interaction subset, in the same time snapshot, of the relation type most similar to the current relation type, and the mean and standard deviation are recalculated on the merged subset. The above process is repeated, gradually expanding the merged reliable interaction subset, until its number of reliable interactions reaches the preset threshold; the mean and standard deviation of the loss are then recalculated on the final merged subset. The most similar relation type is determined from a similarity matrix of relation types, which is constructed in advance from the semantic co-occurrence frequency of relation types or from expert knowledge. If the above pooling strategies still cannot produce a reliable interaction subset whose size reaches the preset threshold, the global mean and standard deviation of the loss over the entire reliable interaction set C are used as the pooled estimate.
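
One possible reading of the sparse-bucket fallback is sketched below. The order in which the strategies are tried (proximity pooling, then relational semantic pooling, then the global pool), integer time steps, and the `similar_relations` mapping (relation type to a list of other relation types ordered by decreasing similarity) are all assumptions; the text only requires that at least one pooling strategy be used.

```python
def pooled_losses(reliable, key, min_count, similar_relations):
    """Return reliable losses for bucket `key`, pooling when it is sparse.

    reliable: dict mapping (time, relation) -> list of reliable losses.
    """
    t, rel = key
    pool = list(reliable.get(key, []))
    if len(pool) >= min_count:
        return pool
    # Proximity pooling: same relation type, adjacent snapshots.
    for neighbor_t in (t - 1, t + 1):
        pool += reliable.get((neighbor_t, rel), [])
    if len(pool) >= min_count:
        return pool
    # Relational semantic pooling: same snapshot, most similar relations first.
    for other in similar_relations.get(rel, []):
        pool += reliable.get((t, other), [])
        if len(pool) >= min_count:
            return pool
    # Global fallback: every reliable loss in the reliable interaction set.
    return [x for losses in reliable.values() for x in losses]

reliable = {
    (1, "click"): [0.1],
    (0, "click"): [0.2],
    (2, "click"): [0.3],
    (1, "purchase"): [0.4, 0.5],
    (5, "favorite"): [0.9],
}
similar = {"click": ["purchase"]}
```

The mean and standard deviation would then be recomputed on the returned pool, so a sparse bucket inherits its reference scale from the most closely related buckets available.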

[0057] (3) The loss of each training interaction is calibrated and the probability of noise is estimated based on the bucket statistics.

[0058] Given the bucket statistics and the interaction loss of an interaction in its interaction bucket, the upper confidence boundary is first calculated from the bucket mean and standard deviation using a trust factor, which is a quantile of the standard normal distribution. Next, a transition band symmetric about the upper confidence boundary is defined; the transition band control coefficients control the width of this band. The noise probability is then obtained through a piecewise linear mapping of the interaction loss across the band, taking low values for losses below the band and high values for losses above it.

[0059] This mapping ensures that interactions whose loss is consistent with the bucket reference receive a low noise probability, while the noise probability increases gradually for high-loss outliers rather than being set by a hard decision. Each training interaction is therefore evaluated within the context of its own bucket, which avoids comparing losses across buckets.
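
The calibration step can be illustrated with the sketch below. It assumes the upper confidence boundary has the form mean plus trust factor times standard deviation, that the transition band extends `delta` standard deviations on each side of that boundary, and that the mapping runs from 0 to 1; the names `z` and `delta` and their default values are hypothetical, since the original formulas are not reproduced in this text.

```python
def noise_probability(loss, mean, std, z=1.645, delta=0.5):
    """Piecewise linear soft noise probability for one interaction.

    mean, std: statistics of the reliable losses in the interaction's bucket.
    z:         trust factor (1.645 is the one-sided 95% normal quantile).
    delta:     half-width of the transition band in units of std (assumed).
    """
    upper = mean + z * std            # upper confidence boundary (assumed form)
    low, high = upper - delta * std, upper + delta * std
    if loss <= low:
        return 0.0                    # consistent with the bucket reference
    if loss >= high:
        return 1.0                    # high-loss outlier
    return (loss - low) / (high - low)

# Bucket with mean 1 and std 1, trust factor 2: the band is [2.5, 3.5].
p_mid = noise_probability(3.0, mean=1.0, std=1.0, z=2.0, delta=0.5)
```

A larger trust factor shifts the band toward higher losses, so fewer interactions receive a nonzero noise probability.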

[0060] (4) The training interaction is weighted using the noise probability to suppress the influence of noise interaction.

[0061] The loss calibration and noise probability estimation described above are performed automatically by the system's program, which applies the piecewise linear mapping to pre-stored bucket statistics. The resulting weight coefficients are used directly in the backpropagation of the next training round. Without changing the graph neural network structure, this reduces the contribution of noisy interactions to gradient updates, thereby lowering the risk that the model overfits to false structural information in the logs, such as false reports or collection errors, and improving accuracy in real system behavior analysis tasks.

[0062] Suspicious interactions are downweighted using noise-aware weights. Specifically, the weight of each training interaction is calculated from its noise probability and is bounded below by a preset minimum weight. Training then minimizes a weighted loss function in which each interaction loss is multiplied by its weight and the result is normalized over the set of labeled training interactions used in the current training round, that is, by the number of training interactions in that set. In the implementation, weights are applied only to training interactions for which bucket statistics are available, while negative samples drawn dynamically for efficient ranking-based training remain unweighted to ensure stability.
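
A sketch of the weighting step follows. The specific weight form `max(w_min, 1 - p)` and the normalization by the number of interactions are assumptions consistent with the description (weights decrease with noise probability and are bounded below by a preset minimum); the exact formula of the invention is not reproduced in this text.

```python
def noise_aware_weight(noise_prob, w_min=0.1):
    """Weight that shrinks as noise probability grows, floored at w_min."""
    return max(w_min, 1.0 - noise_prob)

def weighted_loss(losses, noise_probs, w_min=0.1):
    """Weighted average of per-interaction losses for one training round."""
    weights = [noise_aware_weight(p, w_min) for p in noise_probs]
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)

# A clean interaction keeps weight 1.0; a certain-noise one keeps only w_min.
round_loss = weighted_loss([1.0, 2.0], [0.0, 1.0], w_min=0.1)
```

Keeping a nonzero minimum weight means that hard but informative interactions still contribute some gradient instead of being discarded outright.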

[0063] In one embodiment, because the loss scale may drift during training, the bucket-level statistics of each interaction bucket are re-estimated at fixed epoch intervals during the second stage of training, using the same reliable interaction set and the updated model predictions. This keeps the calibration aligned with the current training dynamics while preserving the unsupervised nature of the reliable interaction mining process.

[0064] For the complexity analysis, the following quantities are used: the total number of labeled training interactions across all snapshots and relation types; the number of positive interactions; the total number of interaction buckets; and the computational cost (including forward and backward propagation) of one training round of the selected dynamic heterogeneous graph (DHG) backbone model.

[0065] The framework of this invention is model-independent and does not change the dominant per-round training cost of the selected DHG backbone model. The additional overhead in the first stage comes mainly from the memory tracking process. For each positive interaction, this invention samples a fixed number of negative target nodes to estimate the sampled rank of the positive edge, which adds the same number of extra scoring operations per positive edge. Over the warm-up rounds of the first stage, the total time is therefore the backbone training cost plus the cost of these extra scoring operations (in practice, decoder-level scoring operations).

[0066] In the second stage, estimating the loss statistics (mean and standard deviation) of each bucket from the reliable interaction set takes time linear in the number of reliable interactions. During training, calculating the noise probability and applying the noise-aware weight for every training interaction in each round requires a single linear traversal, so the time complexity per round is linear in the number of training interactions.

[0067] Storing the memory history requires maintaining a sliding window of preset length for each positive interaction, so the required space is proportional to the number of positive interactions multiplied by the window length. If a moving average is used instead of the complete history, the space requirement is reduced to a single value per positive interaction. The second stage requires maintaining the statistics (mean and standard deviation) of each bucket, so its space complexity is linear in the total number of buckets. Noise-aware weights can be calculated on the fly during training, which eliminates the need for persistent storage over all interactions and does not significantly increase memory overhead. In one embodiment, after the noise-aware training is completed, the method further includes an online application step for anomaly detection of the target system: the operation logs or network traffic data generated by the target system are collected in real time and parsed into the current interaction to be tested; the interaction to be tested is represented as a node pair with a relation type and follows the same data structure as the positive interactions in the training stage.

[0068] Specifically, real-time log data is mapped to a triple consisting of a source node, a target node, and a relation type; the mapping is performed by querying the node set and the relation type set of the constructed dynamic heterogeneous graph.

[0069] Then, the interaction to be tested is forward-propagated through the dynamic heterogeneous graph backbone model obtained from noise-aware training, and the resulting interaction score is used as the anomaly score.

[0070] The anomaly score is compared with a preset threshold. If the anomaly score exceeds the preset threshold, a system anomaly warning signal is generated for the target system.
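
The online application step can be sketched as follows. The log record fields (`"src"`, `"rel"`, `"dst"`), the lookup dictionaries, and the `score_fn` callable standing in for the trained backbone are all hypothetical; the sketch follows the text in treating the interaction score as the anomaly score and raising a warning when it exceeds the threshold.

```python
def to_triple(record, node_index, relation_index):
    """Map a parsed log record to (source, relation, target) identifiers.

    Returns None when a node or relation is not part of the constructed graph.
    """
    try:
        return (node_index[record["src"]],
                relation_index[record["rel"]],
                node_index[record["dst"]])
    except KeyError:
        return None

def check_interaction(triple, score_fn, threshold):
    """Score one interaction and flag it when the score exceeds the threshold."""
    score = score_fn(triple)
    return score, score > threshold

node_index = {"host_1": 0, "proc_7": 1}
relation_index = {"spawns": 0}
record = {"src": "host_1", "rel": "spawns", "dst": "proc_7"}
triple = to_triple(record, node_index, relation_index)
# Stub scorer standing in for the forward pass of the trained model.
score, warn = check_interaction(triple, lambda t: 0.9, threshold=0.5)
```

How to treat records whose nodes or relation types are unknown to the graph is a deployment decision; returning `None` here simply makes that case explicit.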

[0071] In addition, to evaluate the effectiveness of the proposed method, experiments were conducted on three real-world dynamic heterogeneous graph datasets.

[0072] (i) Dataset selection. The selected real-world dynamic heterogeneous graph datasets are EComm, Aminer, and MovieLens-100k (ML100K).

[0073] EComm: A dynamic heterogeneous graph dataset from a real e-commerce platform. This dataset primarily records user shopping behavior across 11 daily snapshots, containing two types of nodes (users and products) and four edge types (click, purchase, add to cart, and add to favorites). This invention employs a sliding window setup in which the model uses the interactions within a preceding time window as context to predict the interactions at the query time step.

[0074] Aminer: An academic citation network dataset where nodes include authors, papers, and conference venues. Edges represent collaborations and citation relationships, evolving over time. This invention employs a time-order partitioning strategy, utilizing historical publication records to predict future co-authorship relationships or paper-conference publication links.

[0075] MovieLens-100k (ML100k): A classic movie recommendation dataset. To adapt to dynamic graph learning tasks, this invention discretizes continuous timestamps into 16 time intervals to simulate temporal evolution. Similar to EComm, this invention also employs a sliding window strategy for training and evaluation.
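
The timestamp discretization mentioned above could be done as in the sketch below. Equal-width intervals are an assumption (the text only states that continuous timestamps are discretized into 16 time intervals), and the function name `discretize_timestamps` is hypothetical.

```python
def discretize_timestamps(timestamps, num_bins=16):
    """Assign each timestamp to one of num_bins equal-width intervals."""
    lo, hi = min(timestamps), max(timestamps)
    span = hi - lo
    if span == 0:
        return [0] * len(timestamps)
    # The maximum timestamp would fall into bin num_bins, so clip it.
    return [min(int((t - lo) / span * num_bins), num_bins - 1)
            for t in timestamps]

snapshot_ids = discretize_timestamps([0, 50, 100], num_bins=4)
```

Equal-frequency (quantile) binning is an alternative when interactions are unevenly distributed over time, since it keeps snapshot sizes comparable.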

[0076] (ii) Backbone model selection. To verify the model-agnostic nature of the proposed memory-guided denoising framework, this invention applies the framework to a range of backbone encoder / decoder architectures for dynamic heterogeneous graphs, covering structure-free baseline models, homogeneous graph neural networks (GNNs), relation-aware heterogeneous graph neural networks, and temporal heterogeneous graph backbone models. The backbone models are as follows. HLinear is a structure-free baseline model that maps node features to embeddings through linear layers and uses an MLP decoder for edge scoring. This model ignores message passing and temporal context information.

[0077] GCN applies homogeneous neighborhood aggregation over the graph. When it is used on heterogeneous data, this invention follows common practice and treats all observed relations as a single set of untyped edges after type-specific feature projection.

[0078] GraphSAGE (SAGE) is another typical homogeneous GNN backbone model. It aggregates sampled neighbors through learnable pooling / mean operations, has strong expressive power and good scalability, and serves as a strong baseline model.

[0079] RGCN explicitly models relation types and achieves relation-aware message passing through relation-specific transformations, making it suitable for heterogeneous graph scenarios.

[0080] HGT is a heterogeneous graph neural network based on Transformer, which introduces type-dependent attention mechanisms and transformation operations, enabling it to capture and support complex cross-type interactions in graphs.

[0081] HGT+ introduces relative temporal encoding (RTE) on the basis of HGT to more effectively integrate timing information in dynamic environments.

[0082] HTGNN is a temporal heterogeneous graph backbone model that directly processes snapshot windows (i.e., a sequence of support graphs) and performs time-aware message passing within the window, which gives it stronger temporal modeling capability.

[0083] All backbone models use the same link decoding interface and are trained under the same data partitioning and optimization settings, thereby ensuring a controlled comparison between different heterogeneous and temporal modeling options, and comparing the effects of "Original" and "Denoising" training.

[0084] (iii) Experimental setup. Configure noise injection settings: Since real-world graphs typically lack accurate labels for noisy edges, this invention evaluates the robustness of the model by injecting synthetic label noise into the training set, while keeping the validation and test sets clean.

[0085] The link prediction task of this invention is built on a bipartite user-item structure with two node sets, a user node set and an item node set. The candidate edge space is the set of all user-item pairs, and the observed positive training edges form a subset of this space.

[0086] False positives (0→1): This invention uniformly samples a number of non-existent edges from the candidate edge space, that is, from pairs outside the observed positive training edges, and assigns them positive labels. Given a contamination ratio, the number of injected false positive edges is determined by that ratio and the number of observed positive training edges. Optional false negatives (1→0): This invention further considers the false negative situation by randomly flipping a portion of the positive training labels to zero, where a separate ratio specifies the proportion of positive samples that are flipped.

[0087] Note: When the false negative ratio is zero, the protocol reduces to contamination with false positives only, simulating spurious interactions introduced during data acquisition or preprocessing. Because the number of injected edges scales with the number of observed positive edges, the contamination level remains controllable under graph sparsity.
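
The noise injection protocol can be sketched as follows. The function name `inject_noise`, the rounding of the injected count with `int(...)` (floor), and the fixed seed are assumptions for illustration; the sketch enumerates all non-edges, which is only practical for small graphs.

```python
import random

def inject_noise(users, items, positives, fp_ratio, fn_ratio=0.0, seed=0):
    """Return (edge, label) pairs for a contaminated training set.

    fp_ratio: false positive ratio relative to the number of positives.
    fn_ratio: fraction of positive labels flipped to 0 (optional).
    """
    rng = random.Random(seed)
    pos = sorted(set(positives))
    pos_set = set(pos)
    non_edges = [(u, i) for u in users for i in items if (u, i) not in pos_set]
    # Number of injected false positives scales with the positive set size.
    n_fp = min(int(fp_ratio * len(pos)), len(non_edges))
    false_positives = rng.sample(non_edges, n_fp)
    flipped = set(rng.sample(pos, int(fn_ratio * len(pos))))
    labeled = [(e, 0 if e in flipped else 1) for e in pos]
    labeled += [(e, 1) for e in false_positives]  # spurious edges labeled 1
    return labeled

users, items = [0, 1, 2], [10, 11]
positives = [(0, 10), (1, 11), (2, 10), (0, 11)]
noisy = inject_noise(users, items, positives, fp_ratio=0.5)
```

With `fn_ratio=0.0` the result contains false positives only, which corresponds to the reduced protocol described in the note above.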

[0088] Configure the training protocol: For each backbone model, this invention compares two training settings under the same data partitioning and optimization hyperparameters. Original training: the backbone model is trained directly on the noisy training interactions with the standard edge-level binary cross-entropy loss function. Each training interaction incurs the binary cross-entropy loss of its model score, and the overall training objective is the average of these losses over the training interactions. Ours (Denoised), the denoising training of this invention: the same backbone model is trained within the memory-guided denoising framework of this invention. Specifically, warm-up training is first performed on the model with the standard loss for a preset number of rounds. The memory effect is then used to mine the reliable interaction set, and the noise probability of each interaction is estimated from the loss pattern of its type-time bucket. During the noise-aware training stage, suspicious interactions are downweighted and a weighted objective is optimized, in which a small constant is used to ensure training stability. An alternative is a truncation strategy that retains only the edges satisfying a retention condition and trains on those edges.

[0089] Design evaluation metrics: This invention evaluates the performance of all methods on a temporal link prediction task. For each positive test edge, this invention samples a set of negative edges and ranks the candidate edges by their predicted scores.

[0090] Let the rank of each positive test edge among its candidates be given (lower is better). Two commonly used performance metrics are reported in the experiments. MRR is the mean reciprocal rank, that is, the average over positive test edges of the reciprocal of the rank. Hits@k (analogous to the Hit@k threshold) is a commonly used evaluation metric in information retrieval, recommender systems, and link prediction tasks; it is the fraction of positive test edges ranked within the top k, where k is a positive integer, for example 10 or 50.
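
The two metrics follow their standard definitions and can be computed as in this short sketch; the function names are hypothetical, and ranks are assumed to start at 1.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1 / rank over all positive test edges."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_k(ranks, k):
    """Fraction of positive test edges ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

ranks = [1, 2, 4]
mrr_value = mean_reciprocal_rank(ranks)   # (1 + 1/2 + 1/4) / 3
hits_value = hits_at_k(ranks, 2)          # two of three ranks are <= 2
```

Both functions return fractions in [0, 1]; multiplying by 100 gives percentage-style values.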

[0091] All experiments were repeated five times using different random seeds; this invention reports average performance (with standard deviation if necessary).

[0092] (iv) Experimental results and analysis. Table 1. Comparison of the performance of different backbone networks on the EComm dataset under the influence of noise.

[0093] Table 1 compares performance on the EComm dataset across backbone networks and noise ratios. The performance metrics are Hits@10, Hits@50, and MRR (higher values indicate better performance). For each backbone model and noise ratio, the original training method is compared with the denoising training method of this invention. The method of this invention achieves significant performance improvements under most backbone network and noise conditions, especially when noise interference is severe. Taking GCN as an example, when the noise ratio reaches 60% (i.e., 0.6), Hits@10 is only 2.49 under the original method but rises to 20.72 with the method of this invention, more than 8 times higher; Hits@50 increases from 10.70 to 48.13, and MRR increases from 1.38 to 9.22. On RGCN, the method of this invention also outperforms the original method across all noise levels. For example, at a noise level of 60%, Hits@10 increases from 37.19 to 50.14, Hits@50 increases from 62.01 to 70.05, and MRR increases from 19.25 to 28.17, which supports the robustness of the method of this invention in high-noise environments.

[0094] Furthermore, for backbone networks such as HLinear, HGT, and HGT+, the method of this invention outperforms the original method at almost all noise levels. For example, on HGT, with 40% noise, Hits@10 increases from 13.42 to 16.49, and MRR increases from 6.31 to 7.66; with 60% noise, Hits@10 increases from 12.90 to 17.68, Hits@50 increases from 35.60 to 40.19, and MRR increases from 5.64 to 7.79. On HGT+, with 20% noise, Hits@10 rises from 10.31 to 19.41, Hits@50 increases from 26.64 to 42.23, and MRR increases from 4.81 to 8.67, with relative improvements exceeding 50% on all three metrics.

[0095] Table 2. Comparison of the performance of different backbone networks on the ML100K dataset under the influence of noise.

[0096] Table 2 presents the experimental results on the ML100K dataset. Due to the small size and discretized timestamps of this dataset, the absolute performance metrics are relatively low. However, the method of this invention still achieves consistent and moderate performance improvements on most backbone models. In particular, for the temporal backbone HTGNN, the method of this invention shows significant improvement under high noise conditions, indicating that it can effectively complement the temporal aggregation process by preventing the window encoder from repeatedly aggregating noisy edges.

[0097] Table 3. Comparison of the performance of different backbone networks on the Aminer dataset under the influence of noise.

[0098] Table 3 presents the experimental results on the Aminer dataset. On this dataset, the performance improvement of the proposed method exhibits a certain backbone dependency: for most message-passing backbone models (such as RGCN and SAGE), the proposed method brings stable gains; while for some backbones (such as HLinear and HGT+), the performance fluctuates slightly under certain configurations. This phenomenon indicates that the memory-based loss calibration strategy is sensitive to the candidate edge ranking space and the organization of heterogeneous relationships.

[0099] Tables 1-3 summarize the temporal link prediction performance under different noise ratios on the three benchmark datasets. Overall, applying the memory-guided denoising (MGD) framework to various backbone models significantly improves robustness on the EComm and ML100K datasets, and this gain remains consistent across most backbone model architectures and noise levels, while the performance improvement on the Aminer dataset shows a stronger backbone dependency. These results support the following two conclusions: (i) the loss dynamics contain actionable signals that can be used to identify suspicious interactions; and (ii) the framework of this invention can be integrated into heterogeneous / dynamic graph backbone models with minimal assumptions.

[0100] (1) Experiments on the EComm dataset show that the memory-guided dynamic heterogeneous graph representation learning denoising method of the present invention has significantly improved robustness, especially in high-noise environments.

[0101] On the EComm dataset, as the noise ratio increases, the performance of most backbone models under original training decreases significantly, reflecting their high vulnerability to injected false positive interactions. In contrast, after adopting the MGD method, both MRR and the Hits@k metrics improve steadily, and the advantage is even more pronounced under high noise conditions.

[0102] A typical example is GCN: when the noise ratio is increased to 40% or 60%, the original training almost collapses (e.g., MRR drops to 1.36–1.38), while MGD still maintains strong performance (MRR of 8.93–9.22), and Hits@50 also improves significantly (e.g., from 10.81 to 51.40). This indicates that denoising mechanisms are particularly beneficial for backbone models whose message passing processes are susceptible to interference from spurious edges.

[0103] For relation-aware backbone models, MGD also brings significant improvements. For example, the MRR of RGCN increases from 31.89 to 34.18 at one noise ratio and from 19.25 to 28.17 at a noise ratio of 60%, indicating that even with explicit modeling of relational semantics, the loss calibration weighting strategy of this invention can still effectively suppress noisy interactions. Similarly, HGT / HGT+ continues to benefit at medium to high noise levels: for HGT at a noise ratio of 60%, the MRR improves from 5.64 to 7.79, while HGT+ shows significant improvements in both MRR and Hits metrics at its corresponding noise ratio.

[0104] The present invention also observed that the gain of HLinear is relatively small, which is reasonable because the model does not propagate noise through graph structures; this phenomenon further indicates that the main advantage of MGD lies in mitigating the problem of structural noise amplification during message passing.

[0105] (2) Experiments on the ML100K dataset show that the memory-guided dynamic heterogeneous graph representation learning denoising method of the present invention achieves consistent but moderate improvement across backbone models.

[0106] On the ML100K dataset, due to the small data size and the discretization of timestamps into time buckets, the absolute performance metrics are relatively low, but MGD still achieves stable gains on most backbone models. For example, HGT / HGT+ generally improves MRR and Hits@50 across all noise levels. Notably, the temporal backbone HTGNN shows significant performance enhancement in noisy environments (e.g., its MRR increases from 0.78 to 1.89 in one noise setting, with a further improvement from 0.72 to 1.48 in another), which indicates that the denoising mechanism effectively complements the temporal aggregation process by preventing the window encoder from repeatedly aggregating false edges.

[0107] Overall, the results of ML100K verify that the proposed loss calibration reweighting strategy remains effective in scenarios with timestamp discretization and sparse interactions.

[0108] (3) Backbone-dependent behavior of the memory-guided dynamic heterogeneous graph representation learning denoising method of the present invention on the Aminer dataset, and its analysis.

[0109] On the Aminer dataset, this invention observes a more complex trend: MGD brings performance improvements to most message-passing backbone models (such as RGCN and SAGE), but in certain configurations (especially HLinear / HGT / HGT+), it leads to performance degradation. This backbone-dependent behavior suggests that the memory-based reliable interaction mining criterion and the subsequent loss calibration process may be sensitive to: (i) the candidate edge ranking space induced by a specific backbone / decoder; and (ii) the organization and query mechanism of heterogeneous relationships in temporal link prediction protocols.

[0110] In summary, the memory-guided dynamic heterogeneous graph representation learning denoising method of this invention demonstrates the effectiveness of jointly modeling temporal dynamics and heterogeneity for denoising tasks on multiple datasets and evaluation metrics, as well as the improved performance of Aminer and DBLP using HGT / HGT+.

[0111] (4) Parameter analysis and analysis of experimental results. Trust factor (confidence boundary) parameter analysis: This invention investigates how sensitive the noise-aware weighting mechanism is to the trust factor, the confidence boundary parameter used in bucket-level loss calibration. To recap, for each type-time bucket, this invention uses the mean and standard deviation to summarize the statistics of the reliable losses and defines an upper bound from them. Intuitively, a larger trust factor corresponds to a more conservative "clean loss" region (i.e., fewer interactions are considered high-loss outliers), while a smaller trust factor leads to a more aggressive weighting strategy. Therefore, this invention evaluates several values of the trust factor that roughly cover the commonly used one-sided confidence levels of the Gaussian distribution. The other hyperparameters are kept constant, and the reported metrics are Hits@1 and MRR. The experimental results are shown in Figure 2, which gives the sensitivity analysis of the trust factor on the Aminer dataset for two noise ratios and three representative backbone models (HGT, HLinear, and HTGNN).
Figure 2(a) shows how the Hits@1 index varies with the trust factor at noise ratio ...
Figure 2(b) shows how the MRR index varies with the trust factor at noise ratio ...
Figure 2(c) shows how the Hits@1 index varies with the trust factor at noise ratio ...
Figure 2(d) shows how the MRR index varies with the trust factor at noise ratio ...

[0112] Overall, performance consistently improves as the trust factor increases, indicating that moderately widening the confidence margin helps suppress false-positive noise judgments while preserving informative interactions. For HGT, both Hits@1 and MRR continue to improve as the trust factor increases from 1.64 to higher values, and the best performance is reached at the maximum tested value. HTGNN exhibits a similar trend: its performance increases monotonically with the trust factor, indicating that this backbone model benefits from a stricter criterion for identifying high-loss anomalies. For HLinear, the sensitivity to the trust factor is more pronounced. At one noise ratio, Hits@1 and MRR improve significantly as the trust factor increases from 1.64 to 2.58, but performance decreases slightly at ...; at the other noise ratio, the optimal result still appears at a larger trust factor, but the gain tends to saturate, indicating that the performance improvement is limited once the boundary is sufficiently conservative.

[0113] These observations are consistent with the role of the trust factor in the calibration rule. When the trust factor is too small, the upper bound becomes too tight, and the soft likelihood function assigns non-trivial noise probabilities to a large number of interactions, which may unintentionally down-weight samples in some buckets that should be regarded as "hard but clean." Increasing the trust factor expands the confidence region of the bucket-level loss, making the weighting mechanism more moderate and thus improving optimization stability, especially under high noise levels. Based on the overall trend across backbones and noise ratios, this invention adopts ... as the default trust factor in the main experiments.
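The effect of the trust factor described above can be illustrated with a minimal Python sketch. It rests on assumptions that are not stated in this passage: the upper bound is taken to be the bucket mean plus the trust factor times the bucket standard deviation, the half-width of the transition zone is taken to be `delta_factor` times the standard deviation, and the ramp inside the zone is linear. The function name `noise_probability` is hypothetical.

```python
from statistics import NormalDist


def noise_probability(loss, mean, std, trust_factor, delta_factor):
    """Map one interaction loss to a soft noise probability in [0, 1].

    Assumed forms (illustrative only, not confirmed by the text):
      upper bound     = mean + trust_factor * std
      band half-width = delta_factor * std
    """
    upper = mean + trust_factor * std
    half_width = delta_factor * std
    low, high = upper - half_width, upper + half_width
    if high <= low:
        # Degenerate band (zero width): fall back to a hard threshold.
        return 1.0 if loss > upper else 0.0
    if loss <= low:
        return 0.0  # inside the "clean loss" region
    if loss >= high:
        return 1.0  # clear high-loss outlier
    # Linear ramp across the transition zone.
    return (loss - low) / (high - low)


# A trust factor can be read off the standard normal distribution.
# The 0.95 quantile is close to the value 1.64 mentioned in the text.
z_95 = NormalDist().inv_cdf(0.95)
```

Under these assumptions, raising the trust factor moves the whole transition zone to higher losses, so the same loss receives a lower or equal noise probability, which matches the "more moderate weighting" behavior described above.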

[0114] Parameter analysis of the transition zone control coefficient and the reliable interaction set threshold: This invention analyzes two hyperparameters that directly affect the behavior of the bucket-calibrated weighting mechanism. The first is the transition zone control coefficient (implemented as delta_factor), which determines the width of the soft transition region in the piecewise noise likelihood mapping. A larger coefficient creates a wider transition zone, so that the noise probability changes more smoothly near the bucket-level loss boundary, while a smaller coefficient makes the weighting steeper. The second is the reliable interaction set threshold (implemented as thresholded), which is used in the first stage to build the reliable interaction set. This threshold controls the trade-off between purity and coverage: a more lenient threshold yields higher coverage but possibly lower purity, while a stricter threshold improves purity but may leave some buckets sparse and make the bucket-level statistical estimates in the second stage unstable. This invention reports Hits@10 and MRR on Aminer for the three representative backbone models (HGT, HLinear, and HTGNN), with the other settings remaining unchanged.
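The purity versus coverage trade-off controlled by the reliable interaction set threshold can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the class name `MemoryTracker`, the rule that an interaction qualifies only after a full window has been observed, and the use of "greater than or equal to" for the comparison are all assumptions.

```python
from collections import defaultdict, deque


class MemoryTracker:
    """Tracks per-interaction memory states over a sliding window of rounds."""

    def __init__(self, window, threshold):
        self.window = window          # number of most recent rounds kept
        self.threshold = threshold    # minimum average memory frequency
        # deque(maxlen=...) silently drops the oldest state when full.
        self.history = defaultdict(lambda: deque(maxlen=window))

    def record(self, interaction_id, remembered):
        """Store whether the interaction was remembered in the current round."""
        self.history[interaction_id].append(1 if remembered else 0)

    def reliable_set(self):
        """Return interactions whose windowed memory frequency meets the threshold."""
        reliable = set()
        for key, states in self.history.items():
            # Assumption: require a full window before trusting the frequency.
            if len(states) == self.window:
                if sum(states) / len(states) >= self.threshold:
                    reliable.add(key)
        return reliable
```

With a window of 3 rounds, an interaction remembered in 2 of the last 3 rounds is kept at a threshold of 0.6 but dropped at 0.9, which is the coverage versus purity effect described above.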

[0115] In another embodiment of the present invention, a memory-guided dynamic heterogeneous graph representation learning denoising device is provided for anomaly detection in a target system. The device includes:
A first module, used to obtain a time snapshot sequence of a dynamic heterogeneous graph constructed from the operation logs or network traffic logs of the target system, where each time snapshot in the sequence contains typed nodes and typed relation edges, each relation edge serves as a positive interaction and is associated with a relation type, and the dynamic heterogeneous graph contains structural noise introduced when it is constructed from the operation logs or network traffic logs;
A second module, used to construct a training interaction set, wherein the training interactions include the positive interactions and negative samples generated through negative sampling;
A third module, used to achieve robust representation learning on the dynamic heterogeneous graph by employing memory-guided denoising with a two-stage loss calibration framework. The first stage utilizes the memory effect of the dynamic heterogeneous graph backbone model to mine a reliable interaction set from the positive interactions without the need for noise annotation, including: performing warm-up training on the noisy dynamic heterogeneous graph; after each round of warm-up training, evaluating, based on a ranking criterion, whether each positive interaction was remembered by the dynamic heterogeneous graph backbone model in the current training round; and constructing the reliable interaction set based on the history of memory states over multiple training rounds within a sliding window. The second stage performs noise-aware training based on the reliable interaction set, including: bucketing the training interactions according to time snapshots and relation types; calculating bucket-level statistics within each interaction bucket based on the loss distribution of the positive interactions in the reliable interaction set; calibrating the loss of each training interaction and estimating its noise probability based on the bucket-level statistics; and using the noise probability to perform weighted training on the training interactions so as to suppress the influence of noisy interactions;
A fourth module, used to perform anomaly scoring, with the dynamic heterogeneous graph backbone model obtained by noise-aware training, on interactions to be tested that are constructed from operation logs or network traffic data collected in real time, and to generate a system anomaly warning signal if the anomaly score exceeds a preset threshold.
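The bucketing and bucket-level statistics steps of the second stage can be sketched in Python as follows. The `Interaction` record, its field names, and the choice of the population standard deviation are assumptions made for illustration; the text only specifies that statistics are computed per time snapshot and relation type from the losses of reliable positive interactions.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import fmean, pstdev


@dataclass(frozen=True)
class Interaction:
    uid: int           # hypothetical unique id of the interaction
    snapshot: int      # index of the time snapshot
    relation: str      # relation type
    loss: float        # current loss under the backbone model
    is_positive: bool  # True for observed edges, False for negative samples


def bucket_statistics(interactions, reliable_ids):
    """Return {(snapshot, relation): (mean, std, count)} over reliable positives."""
    losses = defaultdict(list)
    for it in interactions:
        # Only positive interactions in the reliable set contribute.
        if it.is_positive and it.uid in reliable_ids:
            losses[(it.snapshot, it.relation)].append(it.loss)
    return {
        key: (fmean(vals), pstdev(vals), len(vals))
        for key, vals in losses.items()
    }
```

Keeping the count next to the mean and standard deviation makes it easy to detect sparse buckets, for which a pooled estimate may be preferable.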

[0116] In one embodiment, the third module is constructed as a model-independent weighted denoising wrapper that can be wrapped around different dynamic heterogeneous graph backbone models; the dynamic heterogeneous graph backbone model adopts a dynamic heterogeneous graph encoder-decoder structure that supports edge scoring, and includes at least a temporal graph neural network or a heterogeneous graph neural network.

[0117] In summary, the beneficial effects achieved by this invention through the design of a memory-guided dynamic heterogeneous graph representation learning denoising method and apparatus are as follows: 1. Eliminates reliance on noise labeling and reduces application costs: This invention utilizes the inherent memory effect of the dynamic heterogeneous graph backbone model to mine reliable interactions, eliminating the need for manual labeling of noise samples. This solves the problem of scarce and expensive noise labeling in real-world scenarios, significantly improving the practicality and deployment efficiency of the method.

[0118] 2. Overcomes scale bias across types and time series, and improves denoising fairness: By introducing a type-time bucket-aware loss calibration mechanism, the problem of incomparable loss scale caused by semantic differences in relations and time series drift is solved, and the systematic misjudgment of difficult samples (such as specific types or recent snapshots) by traditional global thresholds is avoided.

[0119] 3. Suppresses noise propagation and enhances robustness in dynamic heterogeneous environments: By jointly considering temporal context and relation type in bucket statistics and weighted training, the method can effectively handle context-dependent noise that is not independent and identically distributed, block the cascading propagation of noise through temporal message passing and cross-type dependencies, and keep denoising decisions stable in dynamic heterogeneous graphs.

[0120] 4. Is model-independent, general, and highly adaptable: The proposed weighted denoising mechanism is a lightweight wrapper that can be integrated into various existing dynamic heterogeneous graph backbone models with minimal modifications, without the need to redesign the backbone model architecture, and has good generality and transferability.

[0121] 5. Effectively improves the performance of downstream tasks in noisy environments and directly supports anomaly detection applications: Experiments on real-world datasets show that this method can consistently improve the performance of downstream tasks such as link prediction under controlled structural noise conditions, achieving robust dynamic heterogeneous graph representation learning. On this basis, the model obtained by noise-aware training is used to perform anomaly scoring on interactions to be tested, which are constructed from operation logs or network traffic data collected in real time. This can effectively generate system anomaly warning signals in the anomaly detection scenario of the target system, improving the accuracy of security monitoring.

[0122] In one embodiment, the present invention provides a computer device, which may be a server, comprising a processor, a memory, a network interface, and a database connected via a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium. The database stores memory-guided dynamic heterogeneous graph representation learning denoising data. The network interface communicates with external terminals via a network connection. When the computer program is executed by the processor, it implements the memory-guided dynamic heterogeneous graph representation learning denoising method.

[0123] Those skilled in the art will understand that the description of the device technical features in the above embodiments does not constitute a limitation on all devices to which the present invention is applied. Specific devices may include more or fewer components, or combinations of certain components, or different component arrangements.

[0124] In another embodiment, a storage medium is provided on which a computer program is stored, which, when executed by a processor, implements the steps of the aforementioned memory-guided dynamic heterogeneous graph representation learning denoising method.

[0125] Those skilled in the art will understand that all or part of the processes of the methods described in the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it can include the processes of the embodiments of the methods described above. Any references to memory, storage, databases, or other media used in the embodiments provided in this application can include non-volatile and / or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

[0126] Matters not described in detail in this invention are common knowledge in the art.

[0127] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.

[0128] The embodiments described above are merely examples of several implementations of the present invention, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of the present invention, and these modifications and improvements all fall within the scope of protection of the present invention.

Claims

1. A memory-guided dynamic heterogeneous graph representation learning denoising method, applied to anomaly detection in a target system, characterized in that the method includes:
obtaining a time snapshot sequence of a dynamic heterogeneous graph constructed from the operation logs or network traffic logs of the target system, where each time snapshot in the sequence contains typed nodes and typed relation edges, each relation edge serves as a positive interaction and is associated with a relation type, and the dynamic heterogeneous graph contains structural noise introduced when it is constructed from the operation logs or network traffic logs;
constructing a training interaction set, wherein the training interactions include the positive interactions and negative samples generated through negative sampling;
employing memory-guided denoising with a two-stage loss calibration framework to achieve robust representation learning on the dynamic heterogeneous graph, wherein:
the first stage utilizes the memory effect of the dynamic heterogeneous graph backbone model to mine a reliable interaction set from the positive interactions without the need for noise annotation, including: performing warm-up training on the noisy dynamic heterogeneous graph; after each round of warm-up training, evaluating, based on a ranking criterion, whether each positive interaction was remembered by the dynamic heterogeneous graph backbone model in the current training round; and constructing the reliable interaction set based on the history of memory states over multiple training rounds within a sliding window;
the second stage performs noise-aware training based on the reliable interaction set, including: bucketing the training interactions according to time snapshots and relation types; calculating bucket-level statistics within each interaction bucket based on the loss distribution of the positive interactions in the reliable interaction set; calibrating the loss of each training interaction and estimating its noise probability based on the bucket-level statistics; and using the noise probability to perform weighted training on the training interactions so as to suppress the influence of noisy interactions;
using the dynamic heterogeneous graph backbone model obtained by noise-aware training to perform anomaly scoring on interactions to be tested, which are constructed from operation logs or network traffic data collected in real time, and generating a system anomaly warning signal if the anomaly score exceeds a preset threshold.

2. The memory-guided dynamic heterogeneous graph representation learning denoising method according to claim 1, characterized in that the step of evaluating, based on the ranking criterion and after each round of warm-up training, whether each positive interaction was remembered by the dynamic heterogeneous graph backbone model in the current training round includes: for each positive interaction, after each round of warm-up training, temporarily sampling negative samples from the target node set of the corresponding relation type; calculating the rank of the positive interaction among the temporarily sampled negative samples under the current parameters of the dynamic heterogeneous graph backbone model; if the rank does not exceed a preset Hit@k threshold $k$, determining that the positive interaction was remembered in this round of training, and otherwise determining that it was not remembered in this round of training; and recording the resulting memory state (remembered or not remembered).
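The ranking criterion of claim 2 can be illustrated with a minimal Python sketch. The function names, the uniform sampling of negative targets, and the optimistic handling of ties (a negative sample only outranks the positive interaction when its score is strictly higher) are assumptions, not details taken from the claim.

```python
import random
from typing import Hashable, List, Sequence


def sample_negative_targets(candidates: Sequence[Hashable], true_target: Hashable,
                            num_negatives: int, rng: random.Random) -> List[Hashable]:
    """Temporarily sample negative targets from the node set of one relation type."""
    pool = [c for c in candidates if c != true_target]
    return rng.sample(pool, min(num_negatives, len(pool)))


def is_remembered(pos_score: float, neg_scores: Sequence[float], k: int) -> bool:
    """Hit@k check: the positive interaction must rank within the top k."""
    # Rank 1 means no sampled negative scores strictly higher than the positive.
    rank = 1 + sum(1 for s in neg_scores if s > pos_score)
    return rank <= k
```

For example, a positive interaction scored 0.9 against negatives scored 0.95, 0.5 and 0.3 has rank 2, so it counts as remembered for k = 2 but not for k = 1.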

3. The memory-guided dynamic heterogeneous graph representation learning denoising method according to claim 2, characterized in that constructing the reliable interaction set based on the history of memory states over multiple training rounds within a sliding window includes: maintaining, for each positive interaction, a sliding window of preset length that records the memory states of that positive interaction over the most recent training rounds up to the current training round; calculating the average memory frequency within the window; and, if the average memory frequency reaches a preset credibility threshold, determining that the positive interaction is a reliable interaction and including it in the reliable interaction set.

4. The memory-guided dynamic heterogeneous graph representation learning denoising method according to claim 1, characterized in that bucketing the training interactions according to time snapshots and relation types includes: organizing the training interactions of each time snapshot $t$ and each relation type $r$ into an interaction bucket $B_{t,r}$, where $t$ indexes the time snapshots, $r \in \mathcal{R}$, $T$ is the total number of time snapshots, and $\mathcal{R}$ is the set of relation types.

5. The memory-guided dynamic heterogeneous graph representation learning denoising method according to claim 4, characterized in that calculating the bucket-level statistics within each interaction bucket based on the loss distribution of the reliable interaction subset includes:
for each interaction bucket $B_{t,r}$, taking its intersection with the reliable interaction set $C$ to obtain a reliable interaction subset $C_{t,r}$;
computing the bucket-level statistics of the interaction bucket $B_{t,r}$, including the mean loss $\mu_{t,r}$ and the standard deviation $\sigma_{t,r}$ of the reliable interactions;
when $|C_{t,r}|$ is less than a preset threshold, using pooling estimation to recalculate the mean loss $\mu_{t,r}$ and the standard deviation $\sigma_{t,r}$, where $|C_{t,r}|$ denotes the number of reliable interactions in $C_{t,r}$;
the pooling estimation includes at least one of the following strategies:
temporal neighbor pooling: merging the reliable interaction subset $C_{t,r}$ of the current time snapshot with the reliable interaction subsets $C_{t-1,r}$ and $C_{t+1,r}$ of the same relation type in the adjacent time snapshots, and recalculating the mean and standard deviation of the loss on the merged reliable interaction subset;
relational semantic pooling: merging the reliable interaction subset $C_{t,r}$ of the current time snapshot with the reliable interaction subset $C_{t,r'}$ of the relation type $r'$ that is most similar to relation type $r$ in time snapshot $t$, and recalculating the mean and standard deviation on the merged reliable interaction subset;
the above process is repeated until the number of reliable interactions in the gradually expanded, merged reliable interaction subset reaches the preset threshold, at which point the mean and standard deviation of the loss are recalculated based on the finally merged reliable interaction subset; the relation type $r'$ most similar to relation type $r$ is determined from a relation type similarity matrix, which is pre-constructed based on the semantic co-occurrence frequency of relation types or on expert knowledge;
if the above pooling estimation strategies still cannot yield a reliable interaction subset whose number of reliable interactions reaches the preset threshold, the global mean and standard deviation of the loss over the entire reliable interaction set $C$ are used as the pooling estimation result.
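The fallback chain of claim 5 can be sketched in Python as below. This is a simplified illustration under stated assumptions: temporal neighbor pooling is tried before relational semantic pooling, each strategy is applied once rather than expanded repeatedly, the most similar relation type is supplied as a precomputed lookup (`most_similar`, hypothetical), and the population standard deviation is used. The global loss list is assumed to be non-empty.

```python
from statistics import fmean, pstdev


def pooled_statistics(reliable_losses, t, rel, min_count, most_similar, global_losses):
    """Return (mean, std) for bucket (t, rel), pooling when the bucket is sparse.

    reliable_losses: dict mapping (snapshot, relation) -> list of reliable losses
    most_similar:    dict mapping relation -> its most similar relation (assumed given)
    global_losses:   losses of all reliable interactions (assumed non-empty)
    """
    merged = list(reliable_losses.get((t, rel), []))
    if len(merged) >= min_count:
        return fmean(merged), pstdev(merged)

    # Temporal neighbor pooling: same relation type, adjacent snapshots.
    for neighbor in (t - 1, t + 1):
        merged += reliable_losses.get((neighbor, rel), [])
    if len(merged) >= min_count:
        return fmean(merged), pstdev(merged)

    # Relational semantic pooling: most similar relation type, same snapshot.
    similar = most_similar.get(rel)
    if similar is not None:
        merged += reliable_losses.get((t, similar), [])
    if len(merged) >= min_count:
        return fmean(merged), pstdev(merged)

    # Global fallback over the whole reliable interaction set.
    return fmean(global_losses), pstdev(global_losses)
```

Trying the narrowest pool first keeps the statistics as local as possible, so the global estimate is only used when neither neighboring snapshots nor a similar relation type supply enough reliable interactions.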

6. The memory-guided dynamic heterogeneous graph representation learning denoising method according to claim 5, characterized in that calibrating the loss of each training interaction and estimating its noise probability based on the bucket-level statistics includes:
constructing, for each interaction bucket, an upper confidence bound on the loss, formed from the bucket-level mean loss and the standard deviation scaled by a trust factor, and setting a symmetric transition zone around the upper confidence bound, where the trust factor is taken as a quantile of the standard normal distribution and the width of the transition zone is governed by the transition zone control coefficient;
mapping the interaction loss of each training interaction to a noise probability through a piecewise linear function: if the interaction loss lies below the transition zone, the noise probability is 0; if the interaction loss lies above the transition zone, the noise probability is 1; otherwise, the noise probability increases linearly with the interaction loss across the transition zone;
wherein the interaction loss $\ell_i$ of training interaction $i$ is given by $\ell_i = \mathrm{BCEWithLogits}(s_i, y_i)$, where $\mathrm{BCEWithLogits}$ is the binary cross-entropy with logits, $s_i$ is the interaction score output by the dynamic heterogeneous graph backbone model, and $y_i$ is the label indicating whether training interaction $i$ is a positive interaction, with $y_i = 1$ denoting a positive interaction and $y_i = 0$ denoting a negative sample.

7. The memory-guided dynamic heterogeneous graph representation learning denoising method according to claim 6, characterized in that performing weighted training on the training interactions using the noise probabilities includes: calculating a weight for each training interaction based on its noise probability, the weight being bounded below by a preset minimum weight; and minimizing a weighted loss function, in which the weighted interaction losses are accumulated over the set of training interactions used in the current training round and normalized by the number of training interactions in that set.
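The weighted training objective of claim 7 can be illustrated in plain Python as follows. The exact weight formula is not recoverable from the text, so the form `max(min_weight, 1 - noise_prob)` used here is an assumption, as is the normalization by the number of interactions. The binary cross-entropy with logits is written out in its standard numerically stable form.

```python
import math


def bce_with_logits(score, label):
    """Binary cross-entropy on a raw score (logit); label is 1.0 or 0.0."""
    # Stable form: max(s, 0) - s * y + log(1 + exp(-|s|)).
    return max(score, 0.0) - score * label + math.log1p(math.exp(-abs(score)))


def interaction_weight(noise_prob, min_weight):
    """Assumed weighting rule: down-weight by noise probability, with a floor."""
    return max(min_weight, 1.0 - noise_prob)


def weighted_loss(losses, noise_probs, min_weight):
    """Average of weighted interaction losses over one training batch."""
    if not losses:
        raise ValueError("empty batch")
    total = sum(
        interaction_weight(p, min_weight) * loss
        for loss, p in zip(losses, noise_probs)
    )
    return total / len(losses)
```

The floor keeps every interaction in the objective with a small weight, so an interaction that is wrongly flagged as noisy still contributes a gradient instead of being discarded outright.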

8. The memory-guided dynamic heterogeneous graph representation learning denoising method according to claim 1, characterized in that, during the second stage of training, at fixed epoch intervals, the bucket-level statistics of each interaction bucket are re-estimated using the same reliable interaction set and the updated model predictions, so as to keep the calibration in sync with the current training dynamics.

9. A memory-guided dynamic heterogeneous graph representation learning denoising device, applied to anomaly detection in a target system, characterized in that the device includes:
a first module, used to obtain a time snapshot sequence of a dynamic heterogeneous graph constructed from the operation logs or network traffic logs of the target system, where each time snapshot in the sequence contains typed nodes and typed relation edges, each relation edge serves as a positive interaction and is associated with a relation type, and the dynamic heterogeneous graph contains structural noise introduced when it is constructed from the operation logs or network traffic logs;
a second module, used to construct a training interaction set, wherein the training interactions include the positive interactions and negative samples generated through negative sampling;
a third module, used to achieve robust representation learning on the dynamic heterogeneous graph by employing memory-guided denoising with a two-stage loss calibration framework, wherein:
the first stage utilizes the memory effect of the dynamic heterogeneous graph backbone model to mine a reliable interaction set from the positive interactions without the need for noise annotation, including: performing warm-up training on the noisy dynamic heterogeneous graph; after each round of warm-up training, evaluating, based on a ranking criterion, whether each positive interaction was remembered by the dynamic heterogeneous graph backbone model in the current training round; and constructing the reliable interaction set based on the history of memory states over multiple training rounds within a sliding window;
the second stage performs noise-aware training based on the reliable interaction set, including: bucketing the training interactions according to time snapshots and relation types; calculating bucket-level statistics within each interaction bucket based on the loss distribution of the positive interactions in the reliable interaction set; calibrating the loss of each training interaction and estimating its noise probability based on the bucket-level statistics; and using the noise probability to perform weighted training on the training interactions so as to suppress the influence of noisy interactions;
a fourth module, used to perform anomaly scoring, with the dynamic heterogeneous graph backbone model obtained by noise-aware training, on interactions to be tested that are constructed from operation logs or network traffic data collected in real time, and to generate a system anomaly warning signal if the anomaly score exceeds a preset threshold.

10. The memory-guided dynamic heterogeneous graph representation learning denoising device according to claim 9, characterized in that the third module is constructed as a model-independent weighted denoising wrapper that can be wrapped around different dynamic heterogeneous graph backbone models; the dynamic heterogeneous graph backbone model adopts a dynamic heterogeneous graph encoder-decoder structure that supports edge scoring, and includes at least a temporal graph neural network or a heterogeneous graph neural network.