Few-shot fraud type identification method with spectrum adaptive prompting, and related apparatus

By constructing high- and low-frequency feature sample sets and filters through a spectrum adaptive prompting method, and by combining node-level spectrum weight coefficients with band-aware contrastive learning, the method addresses insufficient spectrum adaptation and few-sample transfer in graph neural network fraud detection, thereby improving the accuracy and generalization ability of fraud type identification.

CN122241485A · Pending · Publication Date: 2026-06-19 · SOUTHWESTERN UNIV OF FINANCE & ECONOMICS

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Application (China)
Current Assignee / Owner: SOUTHWESTERN UNIV OF FINANCE & ECONOMICS
Filing Date: 2026-05-20
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing fraud detection methods based on graph neural networks suffer from insufficient spectrum adaptation capabilities, difficulties in transferring spectrum knowledge under few sample conditions, and loss of structural information due to sparse supervision signals. These problems make it difficult to effectively capture high-frequency abnormal patterns of fraudulent behavior and improve recognition accuracy in scenarios with few samples.

Method used

A spectrum-adaptive cueing method is constructed: high-frequency and low-frequency feature sample sets are built from the adjacency matrix and the updated feature matrix, and filters are constructed from parameterized Chebyshev polynomials. The low-frequency and high-frequency contrastive losses of band-aware contrastive learning are adaptively weighted and fused using node-level spectrum weight coefficients. A spectrum characteristic adaptive adapter is then introduced to fine-tune the model and optimize node embeddings for fraud type identification.

Benefits of technology

It achieves efficient transfer and adaptive adjustment of spectral knowledge in low-sample scenarios, improves the accuracy and generalization ability of fraud type identification, significantly alleviates the overfitting problem, and enhances the ability to capture local common and global difference patterns.

Abstract

This invention discloses a few-shot fraud type identification method and related apparatus with spectrum adaptive prompting, belonging to the field of electronic digital data processing. The method includes: preprocessing unlabeled graph data, extracting high and low frequency features of nodes using a Chebyshev model, and training a spectrum perception model through adaptive weighted contrastive learning using node-level spectrum weight coefficients. The high-frequency energy proportion of downstream graph data is calculated, an adapter is introduced to generate enhancement filter coefficients, high and low frequency features are extracted and weighted fused, and optimization is performed using clustering constraints, updating only a small number of parameters. This invention achieves dynamic adaptation and efficient transfer of pre-trained spectrum knowledge to few-shot downstream tasks, significantly reducing dependence on labeled data, alleviating the few-shot overfitting problem, and improving the accuracy and generalization ability of fraud identification.

Description

Technical Field

[0001] This invention relates to the field of electronic digital data processing, and in particular to a method and apparatus for identifying few-sample fraud types using spectrum adaptive prompting.

Background Technology

[0002] With the rapid development of the internet and fintech, fraud has become increasingly complex, covert, diversified, and large-scale. Fraud types have evolved from single forms to a complex landscape encompassing multiple types such as credit card fraud, account theft, identity fraud, illegal fund transfers, cash-out schemes, and phishing attacks. Traditional rule-based or single-point detection methods are no longer sufficient to address these evolving fraud tactics, necessitating the development of more intelligent and efficient fraud detection technologies.

[0003] Graph neural networks (GNNs), with their strong relational modeling capabilities, can effectively capture complex patterns of association in fraudulent activities, making them a research hotspot in fraud detection. However, existing GNN-based fraud detection methods face the following key challenges in practical applications: (1) Insufficient spectrum adaptation capability. Fraudulent behavior often exhibits low homophily, meaning that fraudulent nodes differ significantly from the surrounding normal nodes. This difference corresponds to high-frequency components in graph signal processing. However, most graph neural networks overemphasize low-frequency information, making it difficult to capture the high-frequency anomaly patterns unique to fraudulent behavior. More importantly, homophily differs significantly between fraudulent subgraphs, and even between nodes within the same graph, so any method with a fixed frequency band preference struggles to adapt to diverse local structures.

[0004] (2) Difficulty in transferring spectral knowledge under few-sample conditions. In real-world financial scenarios, fraudulent samples are costly to acquire and difficult to label, so labeled data is extremely scarce. Existing prompt-tuning methods are mainly designed in the spatial domain and cannot explicitly modulate key spectral components, which means that the spectral knowledge learned in the pre-training stage cannot be effectively transferred to downstream few-sample tasks.

[0005] (3) Loss of structural information due to sparse supervision signals. In scenarios with few samples, only a very small number of labeled nodes exist. Existing methods lack a mechanism to propagate the limited category semantic information of labeled nodes to a large number of unlabeled nodes based on spectral structural similarity.

Summary of the Invention

[0006] The purpose of this invention is to overcome the problems of the prior art and provide a method and related apparatus for identifying few-sample fraud types with spectral adaptive prompting.

[0007] The objective of this invention is achieved through the following technical solution: a few-sample fraud type identification method with spectral adaptive cueing, the method comprising the following steps:
The original feature matrix of the unlabeled graph data used for pre-training is sequentially subjected to dimension mapping, normalization, and smoothing to obtain the updated feature matrix; the adjacency matrix and the updated feature matrix are used to construct, for each node, a first positive sample set and a first negative sample set of high-frequency features and a second positive sample set and a second negative sample set of low-frequency features;
The normalized Laplacian matrix is calculated based on the adjacency matrix and the updated feature matrix. Complementary low-frequency and high-frequency filters are constructed using parameterized Chebyshev polynomials to form a Chebyshev model. Low-frequency and high-frequency features of each node are extracted based on the Chebyshev model. An adaptive band modulation mechanism based on the node spectral energy distribution is used to construct node-level spectral weight coefficients. Band-aware contrastive learning is used to jointly optimize low-frequency and high-frequency features. The low-frequency contrastive loss function and the high-frequency contrastive loss function are adaptively weighted and fused using the node-level spectral weight coefficients to construct the total loss function. The Chebyshev model is trained using the total loss function until convergence to obtain a pre-trained spectrum-aware model;
The original feature matrix of the downstream graph data related to fraud type identification is sequentially subjected to dimension mapping, normalization, and smoothing to obtain the downstream updated feature matrix;
The pre-trained spectrum-aware model is frozen, and spectral characteristics are analyzed on the downstream graph data to calculate the proportion of high-frequency energy. A spectral characteristic adaptive adapter is introduced as a spectral domain cue. An adaptive adjustment is generated using the original filter coefficients and the proportion of high-frequency energy as inputs and added to the original filter coefficients to obtain the cue-enhanced filter coefficients. An enhanced filter is constructed using the cue-enhanced filter coefficients. The downstream updated feature matrix is convolved with the cue-enhanced filter to obtain optimized low-frequency and high-frequency features. These features are then weighted and combined according to learnable fusion coefficients to form node embeddings. Clustering target constraints are introduced to calculate the clustering loss, which is then weighted and fused with the classification loss to form the downstream total loss function. Only the cue adapter parameters, cluster centers, and classification head parameters are updated;
The optimized low-frequency and high-frequency features are combined according to the fusion coefficient to obtain the final embedded representation of each node, which is then input into the classifier to output the fraud type identification result.

[0008] In one embodiment, constructing a first positive sample set and a first negative sample set of high-frequency features and a second positive sample set and a second negative sample set of low-frequency features for each node includes: For high-frequency features or downstream high-frequency features, the nodes with the highest similarity to the node in the feature space are used as the first positive sample set, and the nodes randomly selected that do not belong to the first positive sample set are used as the first negative sample set; for low-frequency features or downstream low-frequency features, the nodes of the node's first-order neighbors are used as the second positive sample set, and the nodes randomly selected that do not belong to the second positive sample set are used as the second negative sample set.
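The sampling rule of this embodiment can be sketched as follows. This is a minimal illustration, not the patented implementation: cosine similarity as the feature-space similarity, and the hyperparameters `k`, `num_neg`, and `seed`, are assumptions that the text does not specify.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two feature vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def build_sample_sets(adj, feats, k=2, num_neg=2, seed=0):
    """Return one (hf_pos, hf_neg, lf_pos, lf_neg) tuple of node indices per node.

    adj   -- adjacency matrix as a list of lists
    feats -- updated feature matrix, one row per node
    k, num_neg, seed -- hypothetical hyperparameters, not taken from the source
    """
    rng = random.Random(seed)
    n = len(adj)
    sets = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # High-frequency positives: the k most similar nodes in feature space.
        ranked = sorted(others, key=lambda j: cosine(feats[i], feats[j]), reverse=True)
        hf_pos = ranked[:k]
        hf_pool = [j for j in others if j not in hf_pos]
        hf_neg = rng.sample(hf_pool, min(num_neg, len(hf_pool)))
        # Low-frequency positives: first-order (topological) neighbors.
        lf_pos = [j for j in others if adj[i][j]]
        lf_pool = [j for j in others if j not in lf_pos]
        lf_neg = rng.sample(lf_pool, min(num_neg, len(lf_pool)))
        sets.append((hf_pos, hf_neg, lf_pos, lf_neg))
    return sets
```

The two positive sets deliberately come from different notions of closeness: feature-space neighbors for the high-frequency channel and graph neighbors for the low-frequency channel.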

[0009] In one embodiment, constructing node-level spectral weight coefficients includes: The squared L2 norm of the low-frequency feature vector of the node is used as the low-frequency energy of the node, and the squared L2 norm of the high-frequency feature vector of the node is used as the high-frequency energy of the node. The ratio of the low-frequency energy of the node to the sum of the low-frequency energy and the high-frequency energy is used as the low-frequency weight coefficient of the node, and the ratio of the high-frequency energy of the node to the sum of the low-frequency energy and the high-frequency energy is used as the high-frequency weight coefficient of the node.
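The energy-ratio rule above amounts to a few lines of arithmetic. The sketch below computes the two weight coefficients for a single node; the small `eps` term is an assumption added only to avoid division by zero and is not part of the source.

```python
def spectral_weights(h_low, h_high, eps=1e-12):
    """Return (w_low, w_high) for one node from its low- and high-frequency feature vectors."""
    e_low = sum(x * x for x in h_low)    # squared L2 norm = low-frequency energy
    e_high = sum(x * x for x in h_high)  # squared L2 norm = high-frequency energy
    total = e_low + e_high + eps         # eps (assumed) guards against a zero denominator
    return e_low / total, e_high / total
```

By construction the two coefficients are non-negative and sum to (approximately) one, so a node dominated by high-frequency energy shifts its weight toward the high-frequency loss.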

[0010] In one embodiment, constructing the total loss function includes: The low-frequency contrastive loss function is calculated based on the low-frequency features and the second positive sample set of low-frequency features of each node, and a temperature coefficient is introduced to adjust the sharpness of the contrastive learning distribution. The high-frequency contrastive loss function is calculated based on the high-frequency features and the first positive sample set of the high-frequency features of each node, and a temperature coefficient is introduced to adjust the sharpness of the contrastive learning distribution. The total loss function is constructed by multiplying the low-frequency contrastive loss of each node by the low-frequency weight coefficient of that node, multiplying the high-frequency contrastive loss of each node by the high-frequency weight coefficient of that node, and summing the weighted losses.
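A sketch of how such a weighted loss could be assembled is shown below. The text only states that a contrastive loss with a temperature coefficient is used; the InfoNCE form, dot-product similarity, and averaging over nodes are assumptions made here for illustration.

```python
import math


def dot(u, v):
    """Dot-product similarity between two vectors (assumed similarity measure)."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(anchor, positives, negatives, tau=0.5):
    """InfoNCE-style contrastive loss for one node (assumed form; tau is the temperature)."""
    pos = [math.exp(dot(anchor, p) / tau) for p in positives]
    neg = [math.exp(dot(anchor, q) / tau) for q in negatives]
    denom = sum(pos) + sum(neg)
    # Average the negative log-probability over the positive set.
    return -sum(math.log(p / denom) for p in pos) / len(pos)


def total_loss(per_node):
    """Combine per-node losses.

    per_node -- list of (w_low, loss_low, w_high, loss_high) tuples, one per node.
    Averaging over nodes is an assumption; the source only states a weighted fusion.
    """
    return sum(wl * ll + wh * lh for wl, ll, wh, lh in per_node) / len(per_node)
```

A smaller `tau` sharpens the softmax over candidates, which is the role the paragraph assigns to the temperature coefficient.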

[0011] In one embodiment, the step of performing spectral characteristic analysis on the downstream graph data and calculating the high-frequency energy ratio includes: Calculate the total variation of the graph signal in the downstream graph data, and calculate the ratio of the total variation of the graph signal to the square of the Frobenius norm of the feature matrix of the downstream graph data. Map the ratio to the interval between 0 and 1 to obtain the high-frequency energy ratio.
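The graph-level indicator can be sketched as follows. Two details are assumptions, because the paragraph does not fix them: the total variation is taken as the quadratic form of the symmetric normalized Laplacian, and the ratio is mapped to the interval from 0 to 1 by dividing by 2 (the upper bound of that Laplacian's spectrum). Other mappings, such as a sigmoid, would fit the text equally well.

```python
import math


def high_freq_energy_ratio(adj, feats):
    """Ratio of graph total variation to squared Frobenius norm, mapped to [0, 1].

    adj   -- symmetric adjacency matrix (list of lists), no self-loops assumed
    feats -- downstream feature matrix, one row per node
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    tv = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if adj[i][j]:
                # Edge term of x^T L x for L = I - D^{-1/2} A D^{-1/2}.
                for a, b in zip(feats[i], feats[j]):
                    diff = a / math.sqrt(deg[i]) - b / math.sqrt(deg[j])
                    tv += adj[i][j] * diff * diff
    frob_sq = sum(x * x for row in feats for x in row)
    if frob_sq == 0:
        return 0.0
    # The Rayleigh quotient lies in [0, 2]; dividing by 2 is the assumed mapping.
    return (tv / frob_sq) / 2.0
```

On a single edge, a signal that agrees across the edge gives 0 and a signal that flips sign across it gives 1, which matches the intuition of low and high graph frequency.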

[0012] In one embodiment, the spectral characteristic adaptive adapter adopts a bottleneck structure design, which encodes the original filter coefficients into low-dimensional features through downward projection, maps the high-frequency energy ratio into a modulation vector, multiplies the modulation vector with the low-dimensional features element by element, and then restores the original dimension through upward projection to generate an adaptive adjustment amount.
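The bottleneck adapter can be sketched as below. The class name `SpectralAdapter`, the sigmoid used to turn the scalar high-frequency energy ratio into a modulation vector, the random initialization, and the zero-initialized upward projection are all illustrative assumptions; the source specifies only the down-projection, modulation, element-wise product, and up-projection sequence.

```python
import math
import random


def matvec(mat, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in mat]


class SpectralAdapter:
    """Bottleneck adapter producing an adjustment for the frozen filter coefficients."""

    def __init__(self, num_coeffs, bottleneck=2, seed=0):
        rng = random.Random(seed)
        self.w_down = [[rng.gauss(0.0, 0.1) for _ in range(num_coeffs)]
                       for _ in range(bottleneck)]
        self.w_mod = [rng.gauss(0.0, 0.1) for _ in range(bottleneck)]
        self.b_mod = [0.0] * bottleneck
        # Zero-initialized up-projection (assumed) so the adapter starts as a no-op.
        self.w_up = [[0.0] * bottleneck for _ in range(num_coeffs)]

    def delta(self, theta, rho):
        """Adaptive adjustment from coefficients theta and high-frequency ratio rho."""
        z = matvec(self.w_down, theta)                       # downward projection
        m = [1.0 / (1.0 + math.exp(-(w * rho + b)))          # modulation vector
             for w, b in zip(self.w_mod, self.b_mod)]
        gated = [zi * mi for zi, mi in zip(z, m)]            # element-wise product
        return matvec(self.w_up, gated)                      # upward projection

    def enhanced(self, theta, rho):
        """Cue-enhanced coefficients: original coefficients plus the adjustment."""
        return [t + d for t, d in zip(theta, self.delta(theta, rho))]
```

Only `w_down`, `w_mod`, `b_mod`, and `w_up` would be trainable in this sketch, which keeps the number of new parameters small relative to the frozen model.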

[0013] In one embodiment, the step of introducing clustering objective constraints to calculate clustering loss includes: Based on node embeddings and learnable cluster centers, the soft assignment probability of each node to each cluster center is calculated using the t-distribution to obtain the soft assignment distribution. The soft assignment probabilities are then squared and normalized to construct a sharpened target distribution. The clustering loss is the KL divergence between the target distribution and the soft assignment distribution, which is minimized during training.
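The clustering objective described here resembles the one used in deep embedded clustering. The sketch below follows that pattern under stated assumptions: a Student's t kernel with one degree of freedom, and a target distribution that divides the squared assignments by the soft cluster frequency before normalizing. The source only says "squared and normalized", so the frequency term is an assumption.

```python
import math


def soft_assign(embeds, centers):
    """Soft assignment q[i][j] of node i to center j via a Student's t kernel (1 d.o.f.)."""
    q = []
    for z in embeds:
        kern = [1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(z, c))) for c in centers]
        total = sum(kern)
        q.append([v / total for v in kern])
    return q


def target_dist(q):
    """Sharpened target p: square q, divide by soft cluster frequency, renormalize rows."""
    freq = [sum(row[j] for row in q) for j in range(len(q[0]))]
    p = []
    for row in q:
        w = [row[j] ** 2 / freq[j] for j in range(len(row))]
        total = sum(w)
        p.append([v / total for v in w])
    return p


def clustering_loss(embeds, centers):
    """KL(P || Q) summed over nodes and clusters."""
    q = soft_assign(embeds, centers)
    p = target_dist(q)
    return sum(pij * math.log(pij / qij)
               for p_row, q_row in zip(p, q)
               for pij, qij in zip(p_row, q_row))
```

Because the target distribution is more peaked than the soft assignment, minimizing the divergence pulls each embedding toward its most likely cluster center, including embeddings of unlabeled nodes.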

[0014] It should be further noted that the technical features corresponding to the above examples can be combined or replaced to form new technical solutions.

[0015] The present invention also includes a computer program product comprising a computer program that, when executed by a processor, implements the steps of the spectrum adaptive prompting few-sample fraud type identification method formed by any or a combination of the above examples.

[0016] The present invention also includes a storage medium storing computer instructions that, when executed, perform the steps of the spectrum adaptive prompting few-sample fraud type identification method formed by any or more of the above examples.

[0017] The present invention also includes a terminal comprising a memory and a processor, the memory storing computer instructions executable on the processor, wherein the processor, when executing the computer instructions, performs the steps of the spectrum adaptive prompting few-sample fraud type identification method formed by any or more of the above examples.

[0018] Compared with the prior art, the beneficial effects of the present invention are: 1. By combining node-level spectral weight coefficients in the pre-training phase with graph-level spectral characteristic indicators in the fine-tuning phase, the spectral structure learned in the pre-training phase can be dynamically adapted in the fine-tuning phase, avoiding the problem of the pre-trained model and fine-tuning being independent of each other and the difficulty in effectively transferring spectral knowledge. Simultaneously, the cueing mechanism is directly applied to the Chebyshev filter coefficients, and the cueing intensity is dynamically adjusted through graph-level spectral characteristic indicators (high-frequency energy proportion), avoiding the destruction of pre-trained spectral knowledge and achieving efficient transfer of pre-trained spectral knowledge in downstream tasks. Furthermore, by using clustering objective constraints to propagate the supervision signals from finite labeled nodes to unlabeled nodes, the utilization efficiency of sparse supervision signals is significantly improved, alleviating the overfitting problem in low-sample scenarios, thereby improving the accuracy of fraud type identification results.

[0019] 2. Differentiated positive and negative sample sets are constructed for high-frequency and low-frequency features respectively. This allows high-frequency contrastive learning to focus on semantic neighbors in the feature space to discover global abnormal patterns, while low-frequency contrastive learning focuses on topological neighbors to capture local common structures, thus providing accurate sample support for subsequent frequency band-aware contrastive learning.

[0020] 3. By calculating the squared L2 norm of nodes on low-frequency and high-frequency features as spectral energy, and constructing node-level spectral weight coefficients based on the energy ratio, the model can distinguish the individual differences of different nodes in frequency band distribution, providing an accurate node-level adjustment basis for adaptive weighted fusion.

[0021] 4. By calculating the low-frequency and high-frequency contrast loss functions and using the node-level spectral weight coefficients to adaptively weight and sum the high-low frequency contrast loss of each node, nodes with dominant low-frequency energy automatically strengthen their learning of local common patterns, and nodes with dominant high-frequency energy automatically strengthen their learning of global difference patterns, thus fundamentally solving the spectral bias problem caused by fixed weights.

[0022] 5. By calculating the proportion of high-frequency energy, an accurate modulation signal can be provided for the spectrum characteristic adaptive adapter, enabling the spectral domain cue to be dynamically adjusted according to the spectrum characteristics of the downstream graph.

[0023] 6. The bottleneck-structured spectral characteristic adaptive adapter generates an adaptive adjustment amount that matches the downstream spectral characteristics. Efficient adaptation of the spectral domain cue is thereby achieved with a very small number of trainable parameters, without destroying the spectral knowledge already learned by the pre-trained model.

[0024] 7. By minimizing the KL divergence between the target distribution and the soft assignment distribution, the clustering loss is calculated, which makes the node embedding in the spectral feature space compact and consistent. The category semantic information of the limited labeled nodes is efficiently propagated to the unlabeled nodes based on spectral similarity, which significantly improves the generalization ability in the few-sample scenario.

Attached Figure Description

[0025] The specific embodiments of the present invention will be further described in detail below with reference to the accompanying drawings. The accompanying drawings provide a further understanding of the present application and constitute a part of the present application. The same reference numerals are used in these drawings to denote the same or similar parts. The illustrative embodiments of the present application and their descriptions are used to explain the present application and do not constitute an improper limitation of the present application.

[0026] Figure 1 is a flowchart of a method provided in an embodiment of the present invention.

Detailed Implementation

[0027] The technical solution of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0028] In the description of this invention, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance. Furthermore, the technical features involved in the different embodiments of this invention described below can be combined with each other as long as they do not conflict with each other.

[0029] In one embodiment, as shown in Figure 1, a few-sample fraud type identification method with spectral adaptive cueing is proposed, which includes the following steps: S1: Graph structure preprocessing: The original feature matrix of the unlabeled graph data used for pre-training is sequentially subjected to dimension mapping, normalization, and smoothing to obtain the updated feature matrix; the adjacency matrix and the updated feature matrix are used to construct the first positive sample set and the first negative sample set of high-frequency features and the second positive sample set and the second negative sample set of low-frequency features for each node.

[0030] In step S1, the unlabeled graph data is denoted $G = (V, E, X)$ and includes $N$ nodes and $M$ edges. $V = \{v_1, \dots, v_N\}$ is the node set, where $v_i$ is the $i$-th node; each node represents a transaction entity to be detected, such as a credit card transaction, an account, a user, or a device identifier. $E$ is the edge set, where each edge represents at least one connection between two nodes. The node attribute set corresponds to the original feature matrix $X$, where each row holds the feature values of one node. The connection relationships between nodes are represented by the adjacency matrix $A$. A connection includes one or more of the following: sharing the same device identifier, the same IP address, the same shipping address, or the same payment account.

[0031] Furthermore, by using the adjacency matrix and the updated feature matrix, the similarity distance between each node and other nodes is measured, thereby constructing positive and negative sample sets for high-frequency features and positive and negative sample sets for low-frequency features.

[0032] In this embodiment, step S1 addresses the differences in feature dimensions and scales of multi-source heterogeneous graph data through feature projection and normalization. At the same time, graph signal smoothness analysis is introduced to align and sort the node feature dimensions, providing a structurally consistent feature foundation for subsequent spectrum modeling.

[0033] S2: Spectrum-aware graph pre-training: Based on the adjacency matrix and the updated feature matrix, the normalized Laplacian matrix is calculated. Parameterized Chebyshev polynomials are used to construct complementary low-frequency and high-frequency filters, forming a Chebyshev model. Low-frequency and high-frequency features of each node are extracted based on the Chebyshev model. An adaptive band modulation mechanism based on the node's spectral energy distribution is used to construct node-level spectral weight coefficients. Band-aware contrastive learning is used to jointly optimize low-frequency and high-frequency features. The low-frequency and high-frequency contrastive loss functions are adaptively weighted and fused using the node-level spectral weight coefficients to construct the total loss function. The Chebyshev model is trained using the total loss function until convergence, resulting in a pre-trained spectrum-aware model.

[0034] In step S2, an adaptive frequency band modulation mechanism based on node spectral energy distribution is proposed. Specifically, by calculating the energy proportion of each node in low-frequency and high-frequency features, node-level spectral weight coefficients are constructed. These coefficients are used during pre-training to dynamically adjust the weights of low-frequency and high-frequency contrast losses, enabling the model to adaptively allocate attention to low-frequency and high-frequency information for different nodes. This significantly overcomes the spectral bias problem caused by fixed-band modeling in traditional methods. The final output of this adaptive frequency band modulation mechanism is the skeleton of the spectrum-aware model after training convergence. The spectrum-aware model is a graph filter bank built based on Chebyshev polynomials, such as a graph convolutional network including learnable low-pass and high-pass filters. Its core parameters are the Chebyshev filter coefficients and the corresponding multilayer perceptron weights, which carry the general spectral structure knowledge learned from multi-source graph data. Meanwhile, this method designs a dual-channel frequency feature learning mechanism, namely: constructing low-frequency and high-frequency filters based on parameterized Chebyshev polynomials, and extracting low-frequency common features and high-frequency difference features respectively by combining multilayer perceptron. Through joint optimization of the contrast loss of low-frequency and high-frequency features, the spectrum sensing model can learn global and local modes simultaneously.
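The Chebyshev filter bank described in step S2 can be illustrated with a small dense-matrix sketch. It assumes a symmetric adjacency matrix without self-loops and a largest Laplacian eigenvalue of 2 for the spectrum shift. The coefficient sets `THETA_LOW` and `THETA_HIGH` are hypothetical examples of a complementary pair (frequency responses 2 - λ and λ); in the described method the coefficients are learned.

```python
import math


def normalized_laplacian(adj):
    """L = I - D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix without self-loops."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    lap = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and adj[i][j]:
                lap[i][j] = -adj[i][j] / math.sqrt(deg[i] * deg[j])
    return lap


def matmul(a, b):
    """Plain dense matrix product for lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def chebyshev_filter(lap, x, theta):
    """Apply sum_k theta[k] * T_k(L - I) to the feature matrix x."""
    n = len(lap)
    # Shift the spectrum from [0, 2] to [-1, 1]; assumes lambda_max = 2.
    shifted = [[lap[i][j] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    terms = [x]
    if len(theta) > 1:
        terms.append(matmul(shifted, x))
    for _ in range(2, len(theta)):
        prod = matmul(shifted, terms[-1])
        # Chebyshev recurrence: T_k = 2 * L~ * T_{k-1} - T_{k-2}
        terms.append([[2.0 * p - q for p, q in zip(p_row, q_row)]
                      for p_row, q_row in zip(prod, terms[-2])])
    return [[sum(t * term[i][j] for t, term in zip(theta, terms))
             for j in range(len(x[0]))] for i in range(len(x))]


# Hypothetical complementary coefficients (not learned values from the source).
THETA_LOW = [1.0, -1.0]   # response 2 - lambda: passes smooth signals
THETA_HIGH = [1.0, 1.0]   # response lambda: passes oscillating signals
```

On a two-node graph, a signal that agrees across the edge survives only the low-pass filter, and a signal that flips sign across the edge survives only the high-pass filter.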

[0035] S3: Few-shot feature engineering and transfer: The original feature matrix of the downstream graph data related to fraud type identification is sequentially subjected to dimension mapping, normalization, and smoothing to obtain the downstream updated feature matrix.

[0036] The downstream graph data is few-sample graph data, that is, graph data whose amount of labeled data is below a preset threshold. In step S3, the few-sample graph data related to fraud type identification undergoes the same preprocessing as in step S1, which aligns the feature distribution of the downstream graph data with that of the pre-training data and thereby supports effective transfer of pre-trained spectral knowledge to few-sample scenarios.

[0037] S4: Clustering Enhancement Cue Fine-tuning: Freeze the pre-trained spectrum-aware model, perform spectral characteristic analysis on the downstream graph data, calculate the high-frequency energy ratio, introduce a spectral characteristic adaptive adapter as a spectral domain cue, generate an adaptive adjustment amount using the original filter coefficients and the high-frequency energy ratio as input, and add it to the original filter coefficients to obtain the cue-enhanced filter coefficients; construct an enhanced filter using the enhanced filter coefficients, convolve the downstream updated feature matrix with the cue-enhanced filter to obtain optimized low-frequency and high-frequency features, and weight and combine them according to learnable fusion coefficients to form node embeddings; introduce clustering target constraints to calculate clustering loss, and weight and fuse it with classification loss to form the downstream total loss function; only update the cue adapter parameters, cluster centers, and classification head parameters, and keep the pre-trained spectrum-aware model parameters frozen.

[0038] In step S4, starting with the frozen model skeleton (i.e., Chebyshev filter coefficients) output from the pre-training stage, fine-tuning of spectral characteristics is performed for the downstream few-sample node classification task. Specifically, the pre-trained spectrum-aware skeleton is first frozen, and a spectral characteristic adaptive adapter is introduced as a spectral domain cue to its Chebyshev coefficients. The core improvement of this adapter lies in that it not only receives the original frozen filter coefficients as input, but also receives the high-frequency energy proportion calculated from the downstream graph data as a modulation signal, generating an adaptive adjustment amount that matches the downstream spectral characteristics to correct the filter coefficients. In this process, the filter coefficients learned in the pre-training stage provide the adapter with high-quality spectral priors, while the calculation result of the high-frequency energy proportion drives the adapter to generate an adaptive adjustment amount that matches the spectral characteristics of the downstream data, achieving fine-tuning of low-frequency and high-frequency features. This design breaks through the technical limitation of existing spatial domain cueing, which is only added to node features and cannot explicitly control spectral components, enabling the model to dynamically adjust the attention to low-frequency and high-frequency information according to the high-frequency energy proportion of the downstream graph data. Simultaneously, the clustering objective constraints introduced during the fine-tuning phase are functionally coupled with the spectral domain cue. The clustering constraints act on the node embeddings enhanced by the spectral domain cue. 
Through the soft assignment distribution of node embeddings and learnable cluster centers, using KL divergence as a regularization term, the node representations maintain compactness and consistency in the global structure. The synergistic optimization of clustering loss and classification loss, on the one hand, indirectly transmits the category semantic information of finite labeled nodes to unlabeled nodes through cluster centers; on the other hand, the distribution of node embeddings after clustering can serve as a feedback signal to indirectly verify the effectiveness of the spectral domain cue adjustment. The entire fine-tuning process only updates the cue adapter, cluster centers, and classification head parameters, while keeping the pre-trained skeleton frozen. This preserves the spectral knowledge learned by the pre-trained model and achieves efficient adaptation to downstream tasks.

[0039] S5: Node classification prediction: Combine the optimized low-frequency and high-frequency features according to the fusion coefficient to obtain the final embedded representation of each node, and input this representation into the classifier, which outputs the fraud type identification result.

[0040] Based on steps S1-S4, the optimized low-frequency features and high-frequency features are weighted and combined according to the fusion coefficient, so that each node obtains a final embedded representation that integrates low-frequency common information and high-frequency difference information. The final embedded representation is input into the classifier, and the classifier outputs the probability of the node belonging to each type of fraud. The category with the highest probability is taken as the fraud type identification result of the node.
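The fusion and prediction step can be sketched as follows. The convex combination with a single scalar `alpha` and the linear-plus-softmax classification head are assumptions chosen for brevity; the source states only that the features are weighted by learnable fusion coefficients and passed to a classifier.

```python
import math


def fuse(h_low, h_high, alpha):
    """Weighted combination of low- and high-frequency features (assumed convex form)."""
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(h_low, h_high)]


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(h_low, h_high, alpha, weights, bias):
    """Return (class probabilities, index of the most probable fraud type) for one node.

    weights -- one row of classifier weights per fraud type (hypothetical linear head)
    bias    -- one bias per fraud type
    """
    z = fuse(h_low, h_high, alpha)
    logits = [sum(w * v for w, v in zip(row, z)) + b for row, b in zip(weights, bias)]
    probs = softmax(logits)
    return probs, max(range(len(probs)), key=probs.__getitem__)
```

The returned index corresponds to the category with the highest probability, which the paragraph takes as the fraud type of the node.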

[0041] This invention trains a spectrum-aware model skeleton using an adaptive frequency band modulation mechanism. This skeleton serves as the starting point for fine-tuning. Based on this skeleton, a spectral domain cue adapter is introduced, and adaptive adjustment amounts are generated from the spectral characteristics of the downstream graph data, achieving transfer and adaptation of spectral knowledge. The clustering constraints optimize node embeddings and at the same time functionally couple cluster distribution information with the optimization of the spectral domain cue, enhancing the global structural consistency of node embeddings. These three elements form a collaborative mechanism: pre-training learns the spectral structure, fine-tuning adaptively matches the spectrum, and the clustering constraints propagate supervision signals in the spectral space. This supports stable generalization across heterogeneous graph data even with limited sample sizes.

[0042] It should be noted that the method of this invention is not limited to fraud type identification scenarios, but can also be widely applied to other graph-based node classification scenarios, such as IoT device fault diagnosis scenarios and network security intrusion detection scenarios. In IoT device fault diagnosis scenarios, nodes represent sensor devices, edges represent physical connections or communication links between devices, and the characteristics of each node include device operating parameters, such as temperature, voltage, and current, used to predict device failure risks. In network security intrusion detection scenarios, nodes represent IP addresses or hosts, edges represent network communication traffic, and the characteristics of each node include packet size, protocol type, connection time, etc., used to identify network attack behaviors. The above application scenarios all belong to graph structure data processing. The spectrum-sensing graph pre-training and clustering enhancement prompting method of this invention is also applicable to the above scenarios. Through node-level spectrum adaptation and spectral domain prompting transfer, it achieves accurate identification and classification of various abnormal nodes under conditions of few samples.

[0043] In one embodiment, the original feature matrix of the unlabeled graph data used for pre-training is sequentially subjected to dimensionality mapping, normalization, and smoothing, including: S11: A feature projection block maps the feature matrix of each graph, whatever its original feature dimension, to a unified feature dimension. The feature projection block is expressed as a function from the real-valued space of the original feature dimension to the real-valued space of the unified feature dimension, where the real-valued space is a Euclidean space. Applying the feature projection block to each graph yields a feature matrix of unified dimension. The feature projection block sequentially performs a linear transformation, a nonlinear activation function, and normalization; when implemented with a fully connected layer of a neural network, it is parameterized by a weight matrix, a bias vector, and a nonlinear activation function such as ReLU. In this step, the feature projection block adopts the ReLU activation function to introduce non-linearity, and a batch normalization layer is used to accelerate training and improve generalization ability.

[0044] S12: The unified-dimension feature matrix of the graph data is normalized to obtain the normalized feature matrix, so that the L2 norm of each feature vector is 1. The normalization formula scales the feature matrix using its Frobenius norm. In step S12, dimensionality unification refers to unifying the overall feature dimension of each dataset. This step uses L2-norm normalization to standardize the feature matrix: by adjusting the scale of the feature vectors, it ensures that all feature vectors are compared and analyzed on a uniform scale, thereby avoiding analytical bias caused by differences in feature scale.

[0045] S13: A smoothness score is calculated from the normalized feature matrix, and the feature dimensions are sorted in descending order of this score, so that features with similar smoothness are aligned in the dimensional space. The formula for the smoothness score of the graph on the k-th feature dimension involves the number of edges and, for each edge, the values of its two endpoint nodes on the k-th dimension of the normalized feature matrix. The feature dimensions are then sorted by smoothness from highest to lowest, and rearranging the order of the features gives the updated feature matrix. In step S13, each feature dimension corresponds to one feature of every node, so the number of features per node equals the number of dimensions. By calculating the smoothness score of each feature dimension, this step quantifies how the features are distributed over the graph structure and identifies features that are highly consistent with it. These features better reflect the inherent structure and patterns of the graph data.

[0046] From a graph signal processing perspective, the reciprocal of the low-frequency energy ratio increases monotonically with the degree of node anomaly, indicating that high-frequency graph signals play a more important role in anomaly detection. From the perspective of spatial graph neural networks, heterogeneity information also plays an important role in distinguishing anomalous nodes. The lower the smoothness score of a feature dimension, the greater the difference between the values of connected nodes on that dimension, the higher the frequency of the corresponding graph signal, and the stronger its heterogeneity. Based on this, the projected features of different datasets are rearranged in descending order of smoothness score, aligning the high-frequency features of each dataset to the same positions. This allows focused attention on high-frequency heterogeneous information within a unified feature space, thereby improving the model's sensitivity to outliers and its ability to align features across datasets.

[0047] Specifically, based on the smoothness score calculated in step S13, all feature dimensions are sorted in descending order, aligning features with similar smoothness in the dimensional space and further optimizing the feature organization. For each attribute, the smoothness score is derived from the squared differences between the attribute values of the two endpoint nodes of every edge: these squared differences are summed over all edges and then averaged, giving an index that measures how smoothly the attribute is distributed over the graph. Ordering the features makes it easier to identify and select the most representative features and improves the interpretability of the feature space. This feature alignment strategy provides a clearer and more ordered feature foundation for subsequent feature analysis and graph data modeling. In this way, graph neural network-based models can learn to automatically filter graph signals at different smoothness levels and predict anomalies accordingly.
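As a non-limiting illustration, the smoothness-based reordering of steps S13 may be sketched as follows. The sketch assumes the score is the negated mean squared difference over edges, a sign convention inferred from the statement that a lower score corresponds to larger differences between connected nodes; the function name `smoothness_sort` and the edge-list input format are hypothetical.

```python
import numpy as np

def smoothness_sort(X, edges):
    """Reorder feature dimensions by smoothness score, highest first.

    X     : (n_nodes, n_features) normalized feature matrix
    edges : (n_edges, 2) integer node index pairs
    """
    edges = np.asarray(edges)
    diff = X[edges[:, 0]] - X[edges[:, 1]]       # per-edge feature differences
    # Assumed sign convention: negated mean squared difference, so that a
    # lower score means larger differences between connected nodes.
    scores = -np.mean(diff ** 2, axis=0)
    order = np.argsort(-scores, kind="stable")   # descending smoothness
    return X[:, order], scores, order

# Path graph 0-1-2: column 0 varies across edges, column 1 is constant.
X = np.array([[0.0, 1.0], [1.0, 1.0], [0.0, 1.0]])
X_sorted, scores, order = smoothness_sort(X, [(0, 1), (1, 2)])
```

In this example the constant column is the smoother one, so it is moved to the first position of the updated feature matrix.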

[0048] In one embodiment, constructing a first positive sample set and a first negative sample set for high-frequency features and a second positive sample set and a second negative sample set for low-frequency features for each node includes: For high-frequency features, a preset number of nodes with the highest similarity to the node in the feature space are selected as the first positive sample set, and a preset number of nodes randomly selected from the nodes that do not belong to the first positive sample set are used as the first negative sample set. For low-frequency features, multiple first-order neighbor nodes of the node are used as the second positive sample set, and a preset number of nodes randomly selected from the nodes that do not belong to the second positive sample set are used as the second negative sample set. The similarity is measured by calculating the cosine similarity between each node and the other nodes.
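As a non-limiting illustration, the construction of the four sample sets may be sketched as follows. The function name `build_sample_sets` and the parameters `k` (number of feature-space nearest neighbors) and `n_neg` (number of negatives) are hypothetical; excluding the node itself from its own negative sets is also an assumption of this sketch.

```python
import numpy as np

def build_sample_sets(X, A, k=2, n_neg=2, seed=0):
    """Build per-node positive/negative sets for both frequency paths."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    sim = Xn @ Xn.T                      # pairwise cosine similarity
    np.fill_diagonal(sim, -np.inf)       # a node is never its own neighbor
    all_nodes = np.arange(n)

    def sample_negatives(i, positives):
        # Candidates: every node outside the positive set (and not i itself).
        candidates = np.setdiff1d(all_nodes, np.append(positives, i))
        size = min(n_neg, candidates.size)
        return rng.choice(candidates, size=size, replace=False)

    sets = []
    for i in range(n):
        pos_high = np.argsort(-sim[i])[:k]   # most similar nodes in feature space
        pos_low = np.flatnonzero(A[i])       # first-order topological neighbors
        sets.append({
            "pos_high": pos_high, "neg_high": sample_negatives(i, pos_high),
            "pos_low": pos_low, "neg_low": sample_negatives(i, pos_low),
        })
    return sets
```

The high-frequency positives depend only on the features, while the low-frequency positives depend only on the adjacency matrix, which mirrors the path-specific choice of positive samples described above.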

[0049] In one embodiment, spectrum-aware graph pre-training includes the following sub-steps: S21: Input the updated feature matrix and the adjacency matrix into the Chebyshev model to obtain the low-frequency and high-frequency filters for each node. The formula for the weight coefficients of the Chebyshev model involves the terms representing the node features, the number of coefficients in the Chebyshev polynomial, the learnable parameter associated with each term, and the Chebyshev polynomial of order k, which is defined recursively as $T_0(x) = 1$, $T_1(x) = x$, and $T_k(x) = 2x\,T_{k-1}(x) - T_{k-2}(x)$. The low-pass filter parameters and the high-pass filter parameters are generated using a learnable sliding cosine parameterization method with four learnable bias parameters: the first and second learnable bias parameters of the low-frequency filter, and the first and second learnable bias parameters of the high-frequency filter. During initialization, two of these parameters are set to 0 and 2, respectively, and the other two are set to 2. All four parameters can be trained during the learning process. In summary, the learnable parameters in the weight coefficient formula are replaced by the learnable parameters of the low-pass filter and of the high-pass filter, respectively, and the graph filters are expressed in terms of the following quantities: the low-frequency features extracted by the low-pass filter; the high-frequency features extracted by the high-pass filter; a multilayer perceptron for low-frequency feature extraction; a multilayer perceptron for high-frequency feature extraction; the k-th order Chebyshev polynomial weighting coefficients of the low-frequency filter; the k-th order Chebyshev polynomial weighting coefficients of the high-frequency filter; and the normalized Laplacian matrix $L = I - D^{-1/2} A D^{-1/2}$, where $D$ is the degree matrix, $A$ is the adjacency matrix, and $I$ is the identity matrix. In step S21, after the data preprocessing is completed, the Chebyshev model receives the feature data describing the attributes of the nodes in the graph and the adjacency matrix describing the connections between nodes, and feeds them into a graph filter based on Chebyshev polynomials. This filter employs the learnable sliding cosine parameterization method to construct the low-frequency filter coefficients and the high-frequency filter coefficients, and the corresponding weight coefficients are then calculated using Chebyshev interpolation. The updated feature matrix is convolved with both the low-frequency and high-frequency filters to obtain the low-frequency filter output and the high-frequency filter output. The low-frequency output emphasizes the low-frequency common signals that change gradually between nodes and their neighbors, reflecting the structural consistency of the local neighborhood; the high-frequency output highlights the high-frequency difference signals that distinguish nodes from their neighbors, capturing the specificity and boundary information in the graph.
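As a non-limiting illustration, applying a Chebyshev polynomial graph filter to a feature matrix may be sketched as follows. The weight coefficients `w` are taken here as a plain array; how they are produced by the sliding cosine parameterization and Chebyshev interpolation is not reproduced. Shifting the normalized Laplacian by the identity, which assumes its largest eigenvalue is at most 2, is a common convention and an assumption of this sketch, as are the function names.

```python
import numpy as np

def normalized_laplacian(A):
    """L = I - D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix A."""
    deg = A.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    return np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def chebyshev_filter(A, X, w):
    """Return sum_k w[k] * T_k(L_hat) @ X using the three-term recursion."""
    # Assumed rescaling: eigenvalues of L lie in [0, 2], so L - I lies in [-1, 1].
    L_hat = normalized_laplacian(A) - np.eye(A.shape[0])
    T_prev, T_curr = X, L_hat @ X            # T_0(L_hat) X and T_1(L_hat) X
    out = w[0] * T_prev
    if len(w) > 1:
        out = out + w[1] * T_curr
    for k in range(2, len(w)):
        # T_k = 2 * L_hat * T_{k-1} - T_{k-2}
        T_prev, T_curr = T_curr, 2.0 * (L_hat @ T_curr) - T_prev
        out = out + w[k] * T_curr
    return out
```

Running the same routine once with low-pass coefficients and once with high-pass coefficients gives the two filter outputs that are subsequently passed to the respective multilayer perceptrons.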

[0050] S22: Further, the spectral energy of each node's low-frequency and high-frequency features is calculated to construct a node-level spectral energy representation, and the node's spectral weight coefficients are obtained based on the ratio of low-frequency energy to high-frequency energy; these coefficients characterize how the node's information is distributed across the frequency bands. In the corresponding formulas, the inputs are the low-frequency features and the high-frequency features of a node. The energy of the node on its low-frequency and high-frequency features is measured by the squared L2 norm of the respective feature vector and characterizes the response intensity of the node in the corresponding spectral space. The resulting low-frequency weighting coefficient and high-frequency weighting coefficient of the node characterize the proportion of the node's information in each frequency band.
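As a non-limiting illustration, the node-level spectral weight coefficients may be sketched as follows. The sketch assumes each coefficient is the share of the node's total energy falling in the corresponding band, so that the two coefficients sum to approximately one; the exact normalization, the function name `spectral_weights`, and the stabilizing constant `eps` are assumptions.

```python
import numpy as np

def spectral_weights(Z_low, Z_high, eps=1e-12):
    """Per-node low/high-frequency weights from squared L2 energies."""
    e_low = np.sum(Z_low ** 2, axis=1)     # low-frequency energy per node
    e_high = np.sum(Z_high ** 2, axis=1)   # high-frequency energy per node
    total = e_low + e_high + eps           # eps guards against all-zero features
    return e_low / total, e_high / total

Z_low = np.array([[3.0, 4.0], [1.0, 0.0]])
Z_high = np.array([[0.0, 0.0], [0.0, 1.0]])
alpha_low, alpha_high = spectral_weights(Z_low, Z_high)
```

In this example the first node carries all of its energy in the low-frequency band, while the second node splits its energy evenly between the two bands.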

[0051] This invention proposes an adaptive band modulation mechanism based on node spectral energy distribution, which works in conjunction with the aforementioned Chebyshev band decoupling module. For the low-frequency and high-frequency features of each node obtained during the pre-training phase, the spectral energy at low and high frequencies is first calculated using the squared L2 norm, and the node-level low-frequency and high-frequency spectral weighting coefficients are then derived from these energies. These weighting coefficients reflect the proportion of a node's information in the different frequency bands and are used to dynamically adjust the contribution weights of the low-frequency loss and the high-frequency loss during contrastive learning.

[0052] It should be noted that existing Chebyshev high-low frequency contrastive learning methods typically use fixed weights to weight the low-frequency and high-frequency losses, meaning that all nodes share the same loss weight coefficients. This makes it impossible to distinguish the individual differences of different nodes in frequency band distribution. This approach results in spectral bias at the node level of the model.

[0053] This invention introduces node-level spectral weighting coefficients, enabling the model to independently calculate the energy proportion of each node at low and high frequencies and to dynamically adjust the loss weight of that node in contrastive learning based on this proportion. Specifically, for nodes with a higher proportion of low-frequency energy (a larger low-frequency weighting coefficient), the model automatically increases the weight of the low-frequency contrast loss, so that these nodes focus more on learning common patterns in the local neighborhood; for nodes with a higher proportion of high-frequency energy (a larger high-frequency weighting coefficient), the model automatically increases the weight of the high-frequency contrast loss, so that these nodes focus more on learning cross-regional difference patterns. This node-level adaptive weighting mechanism allows the model to dynamically balance the learning intensity of low-frequency structural information and high-frequency difference information based on the spectral characteristics of each node during training, overcoming the spectral bias problem caused by fixed weights in traditional methods. Through this mechanism, the model can simultaneously capture local consistency and global differences between nodes, achieving more accurate representation learning on heterogeneous graph data. In addition, the node-level spectral weight coefficients output by this mechanism serve as prior knowledge of each node's spectral distribution: they provide node-level spectral priors for the spectral domain cues in the fine-tuning stage, aligning spectral knowledge at node granularity between pre-training and fine-tuning.

[0054] S23: The low-frequency feature loss function is calculated from the low-frequency features and the second positive sample set of low-frequency features of each node. During the calculation, the spectral weighting coefficients of the nodes are introduced to adjust the contrastive learning intensity of the different frequency bands. The formula for the low-frequency contrast loss of a node involves the following quantities: the node set, over which the node ranges; the positive sample set of low-frequency features of the node, from which each positive sample is drawn; the negative sample set of low-frequency features, that is, nodes randomly selected from outside the node's positive sample set of low-frequency features, from which each negative sample is drawn; the temperature coefficient for low-frequency features, which is set manually during model training and controls the sharpness of the distribution, that is, whether the predicted probability is concentrated on a certain class; the cosine similarity, which measures the similarity between vectors; and the low-frequency features of the node, of its positive samples, and of its negative samples.

[0055] The high-frequency feature loss function is calculated from the high-frequency features and the positive sample set of high-frequency features of each node. The formula for the high-frequency contrast loss of a node involves the following quantities: the positive sample set of high-frequency features of the node, from which each positive sample is drawn; the negative sample set of high-frequency features, that is, nodes randomly selected from outside the node's positive sample set of high-frequency features, from which each negative sample is drawn; the temperature coefficient for high-frequency features, which is set manually during model training and controls the sharpness of the distribution; and the high-frequency features of the node, of its positive samples, and of its negative samples.

[0056] The low-frequency contrast loss function and the high-frequency contrast loss function are adaptively weighted and fused using the node-level spectral weighting coefficients to construct the total loss function. In the total loss function, the low-frequency loss and the high-frequency loss of each node are weighted by the node-level adaptive weight coefficients calculated in step S22, which enables different nodes to dynamically adjust the weights of the low-frequency loss and the high-frequency loss during the optimization process. This adaptive weighting mechanism dynamically balances structural information and difference information according to the spectral characteristics of the nodes, thereby improving the generalization ability of the model under different graph structures.
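As a non-limiting illustration, the dual-path contrast losses and their node-level weighted fusion may be sketched as follows. The sketch assumes an InfoNCE-style loss built from cosine similarity and a temperature, averaged over the positive samples of a node, and a total loss averaged over nodes; these forms, the function names, and the default temperatures are assumptions rather than the exact formulas of this embodiment.

```python
import numpy as np

def cosine_sim(a, B):
    """Cosine similarity between vector a and every row of B."""
    a = a / (np.linalg.norm(a) + 1e-12)
    B = B / (np.linalg.norm(B, axis=1, keepdims=True) + 1e-12)
    return B @ a

def node_contrast_loss(Z, i, pos, neg, tau):
    """Assumed InfoNCE form for node i, averaged over its positives."""
    s_pos = np.exp(cosine_sim(Z[i], Z[pos]) / tau)
    s_neg = np.exp(cosine_sim(Z[i], Z[neg]) / tau)
    denom = s_pos + s_neg.sum()          # each positive against all negatives
    return float(-np.mean(np.log(s_pos / denom)))

def total_loss(Z_low, Z_high, sets, alpha_low, alpha_high,
               tau_low=0.5, tau_high=0.5):
    """Fuse both paths with per-node spectral weights, then average."""
    losses = []
    for i, s in enumerate(sets):
        l_low = node_contrast_loss(Z_low, i, s["pos_low"], s["neg_low"], tau_low)
        l_high = node_contrast_loss(Z_high, i, s["pos_high"], s["neg_high"], tau_high)
        losses.append(alpha_low[i] * l_low + alpha_high[i] * l_high)
    return float(np.mean(losses))
```

Here `sets` is a hypothetical list of per-node dictionaries holding the index arrays of the four sample sets, and `alpha_low` and `alpha_high` are the node-level weight coefficients of step S22.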

[0057] S24: Train the Chebyshev model using the total loss function until the model converges. The convergence determination method during training may include: (1) Slowing loss decrease: During training, when the decrease of the loss value in multiple consecutive iterations is less than the preset threshold, the model is judged to have converged. (2) Loss stability: When the loss fluctuates slightly around a certain value and no longer decreases significantly, the model is judged to have converged.
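As a non-limiting illustration, the two convergence criteria may be combined into a single check as sketched below. The function name `has_converged` and the parameters `threshold` and `patience` are hypothetical; using the absolute change treats both a slowing decrease and a small fluctuation around a stable value as convergence.

```python
def has_converged(losses, threshold=1e-4, patience=5):
    """Return True when the last `patience` loss changes are all below threshold."""
    if len(losses) < patience + 1:
        return False                      # not enough history yet
    recent = losses[-(patience + 1):]
    changes = [abs(recent[i] - recent[i + 1]) for i in range(patience)]
    return all(c < threshold for c in changes)
```

For example, a loss history that is still halving at every iteration is not judged converged, whereas one that oscillates within the threshold over the last `patience` iterations is.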

[0058] In steps S22 to S24 of this embodiment, the decoupled low-frequency features and high-frequency features are jointly optimized through a dual-path contrastive learning framework, and the end-to-end training of the model is achieved by combining a node-level adaptive weighting mechanism.

[0059] On the low-frequency feature path, based on the comparison target of direct neighbor nodes, the low-frequency representation of each node is made as close as possible to the low-frequency representation of its first-order neighbor in the feature space. This comparison target is achieved by calculating the cosine similarity between the node and its neighbors, and a temperature parameter is introduced to adjust the difficulty of the comparison learning. At the same time, non-neighbor nodes are randomly selected globally as negative samples for optimization. Compared with the prior art, this invention introduces for the first time the node-level spectral weight coefficients calculated in step S22, using the energy proportion of the node on the low-frequency features as a dynamic adjustment factor. This allows different nodes to adaptively allocate the learning intensity of low-frequency information during training, thereby ensuring the consistency of adjacent nodes in the smooth feature space, effectively capturing stable common patterns in local regions, and enhancing the model's ability to express homogeneous structures and community aggregation characteristics in the graph.

[0060] On the high-frequency feature path, this invention employs a differentiated comparison strategy: the high-frequency representation of each node establishes a positive association with its nearest-neighbor nodes, which are pre-selected based on feature similarity. These positive samples do not come from topological neighbors, but from nodes that are semantically similar in the feature space yet may be topologically distant. Through this mechanism, high-frequency contrastive learning can discover the unique attributes of nodes beyond the regular neighborhood range and capture the heterogeneous patterns and boundary information in the graph.

[0061] Node-level spectral weight coefficients are also used to dynamically adjust the contrastive learning intensity of high-frequency features, achieving adaptive weighted fusion of low-frequency and high-frequency feature optimization processes. This allows the model to dynamically balance structural and differential information based on the spectral energy distribution of each node during training. Ultimately, this invention forms a total loss function by weighting the low-frequency and high-frequency feature losses. This enables the model to maintain learning of local common patterns while also keenly capturing cross-regional differential features during end-to-end training, significantly improving the model's generalization ability under different graph structures and heterogeneous conditions. It is particularly suitable for complex scenarios with heterogeneous features, such as few-shot classification and fraud detection, achieving the effects of adaptive spectral optimization and dynamic balance of structural and differential information.

[0062] Compared to existing Chebyshev high-low frequency contrastive learning methods (such as S3GCL), the core innovation of this invention in the pre-training stage lies in upgrading the graph-level unified contrastive paradigm to a node-level adaptive collaborative paradigm. Specifically, existing methods use globally fixed weights to uniformly weight all nodes, and their contrastive objective is essentially to allow the all-pass representation of the multilayer perceptron to learn the output of the Chebyshev biased filter, forming a teacher-student distillation framework. This design cannot distinguish the individual differences of different nodes in frequency band distribution, and low-frequency and high-frequency features use the same topological neighbors as positive samples, ignoring the fundamental difference in their physical meaning. To address the aforementioned technical shortcomings, this invention introduces node-level spectral weighting coefficients. By calculating the energy proportion of each node in low-frequency and high-frequency features, independent weighting is achieved. This allows nodes with dominant low-frequency energy to automatically strengthen their focus on common patterns in their local neighborhoods during contrastive learning, while nodes with dominant high-frequency energy automatically strengthen their focus on cross-regional differential patterns. Simultaneously, a fundamental breakthrough is achieved in the design of contrast targets: low-frequency paths use topological neighbors as positive samples to capture homogeneous structures, while high-frequency paths use semantic nearest neighbors in the feature space as positive samples to mine heterogeneous patterns. This forms a path-specific differential contrast architecture that is fundamentally different from existing methods' unified distillation across channels. 
Through the collaborative design of node-level adaptive weighting and path-specific differential contrast, this invention enables the model to dynamically balance the learning intensity of low-frequency common information and high-frequency differential information based on the spectral energy distribution of each node. This overcomes the spectral bias problem caused by fixed weights and unified positive sample strategies in existing methods, achieving adaptive spectral optimization at the node level and improving the model's generalization ability on graph data with different degrees of homophily.

[0063] In one embodiment, cluster enhancement cue fine-tuning includes the following sub-steps: S41: Freeze the pre-trained spectrum sensing model, retaining its Chebyshev polynomial order K and corresponding filter coefficients. A spectrum-aware bottleneck adapter is introduced onto the frozen Chebyshev coefficients as a spectral domain cue to generate an adaptive adjustment amount.

[0064] First, for the downstream graph data, which consists of a node set, an edge set, and a node feature matrix, the total variation of the graph signal is calculated. For the node feature matrix of the downstream task, the total variation of its graph signal is defined in terms of the feature vectors of the two nodes connected by each edge and the degree values corresponding to those two nodes in the degree matrix. The total variation reflects the overall smoothness of the graph signal: the larger the total variation, the higher the energy proportion of high-frequency components in the graph signal.

[0065] Secondly, the normalized high-frequency energy proportion is calculated as $\rho = \sigma\left(\mathrm{TV}(X) / \lVert X \rVert_F^2\right)$, where $\mathrm{TV}(X)$ is the total variation of the graph signal, $\lVert X \rVert_F^2$ is the squared Frobenius norm of the feature matrix, used for normalization, and $\sigma(\cdot)$ is the Sigmoid function, which maps the ratio to the interval [0,1]. A higher value indicates a higher proportion of high-frequency energy in the graph data, signifying stronger heterogeneity in the graph structure. In fraud detection scenarios, fraudulent nodes exhibit significant feature differences from the surrounding normal nodes, so the proportion of high-frequency energy is typically higher. In other words, the high-frequency energy proportion is obtained by calculating the total variation of the graph signal in the downstream graph data, then calculating the ratio of this total variation to the squared Frobenius norm of the feature matrix of the downstream graph data, and finally mapping this ratio to the 0-1 range using the Sigmoid function.
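As a non-limiting illustration, the total variation and the normalized high-frequency energy proportion may be sketched as follows. The sketch assumes an undirected graph with a symmetric adjacency matrix, counts each edge once, and uses the degree-normalized form of the total variation (each feature vector divided by the square root of its node degree); the function name `high_freq_ratio` is hypothetical.

```python
import numpy as np

def high_freq_ratio(A, X, eps=1e-12):
    """Return (total variation, sigmoid(TV / ||X||_F^2)) for graph signal X."""
    deg = A.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, eps)), 0.0)
    Xs = X * d_inv_sqrt[:, None]               # x_i / sqrt(d_i)
    src, dst = np.nonzero(np.triu(A, k=1))     # each undirected edge once
    tv = float(np.sum((Xs[src] - Xs[dst]) ** 2))
    ratio = tv / (float(np.sum(X ** 2)) + eps) # normalize by squared Frobenius norm
    rho = 1.0 / (1.0 + np.exp(-ratio))         # Sigmoid mapping
    return tv, rho

A = np.array([[0.0, 1.0], [1.0, 0.0]])
tv_smooth, rho_smooth = high_freq_ratio(A, np.array([[1.0], [1.0]]))
tv_rough, rho_rough = high_freq_ratio(A, np.array([[1.0], [-1.0]]))
```

Because the ratio is non-negative, the Sigmoid output under this sketch lies between 0.5 and 1, with 0.5 corresponding to a perfectly smooth signal.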

[0066] A spectrum-aware adaptive adapter for spectral characteristics is introduced onto the frozen Chebyshev coefficients. This adapter receives not only the original filter coefficients as input, but also the spectral characteristic indices calculated above. This generates an adaptive adjustment amount that matches the spectral characteristics of the downstream graph.

[0067] The cue function adopts a bottleneck structure. Its formula involves the following quantities: an encoding function for the weight information; the learnable weight matrix of the down-projection layer; the learnable weight matrix of the up-projection layer; the dimension of the original filter; and the lower feature dimension d′ of the bottleneck layer. To maintain training stability, the up-projection weight matrix is initialized to zero, ensuring minimal initial perturbation to the pre-trained filter. A lightweight multilayer perceptron maps the scalar spectral characteristic index to a d′-dimensional modulation vector. The adaptive adjustment is added to the original coefficients to obtain the cue-enhanced filter coefficients. Based on the enhanced filter coefficients, a cue-enhanced filter is constructed, graph filtering is performed on the updated feature matrix, and the filtering result is input into a multilayer perceptron to obtain the enhanced low-frequency and high-frequency features. The enhanced low-frequency and high-frequency features are weighted and combined according to a learnable fusion coefficient to form the final node embedding; the fusion coefficient balances the relative contribution of low-frequency homogeneous information and high-frequency heterogeneous information.

[0068] In step S41, for downstream few-sample node classification tasks, an efficient spectral domain fine-tuning method based on a lightweight spectral characteristic adaptive adapter is proposed to optimize the parameters of the pre-trained spectrum sensing model. Specifically, the pre-trained spectrum sensing model is first frozen, retaining its Chebyshev polynomial order and corresponding filter coefficients. A spectral domain cue adapter is then introduced onto the frozen coefficients to achieve dynamic control of the spectral response. For downstream graph data, the ratio of the total graph variation of the node feature matrix to the square of the Frobenius norm is used to obtain the normalized high-frequency energy proportion through a Sigmoid function mapping, quantifying the heterogeneity of the graph structure. A larger high-frequency energy proportion indicates richer high-frequency components in the graph data, thus providing the spectral domain adapter with a downstream task-specific modulation signal to achieve spectral adaptation.

[0069] In this embodiment, the spectral cue adapter employs a bottleneck structure. It maps the input frozen Chebyshev coefficients to a low-dimensional space via downward projection, obtaining low-dimensional features. The high-frequency energy proportion is mapped to a modulation vector via a lightweight multilayer perceptron, multiplied element-wise with the low-dimensional features, and then restored to its original dimension via upward projection. This ensures that the generated adaptive adjustment accurately reflects the spectral characteristics of the downstream graph. To guarantee training stability, the parameters of the upward projection layer are initialized to zero, minimizing disturbance to the pre-trained filter in the initial stage and effectively preserving the general spectral knowledge learned by the pre-trained model. Subsequently, the adaptive adjustment is added to the original Chebyshev coefficients to obtain the cue-enhanced filter coefficients, which are applied to graph filtering in the low-frequency and high-frequency paths, respectively, generating cue-enhanced low-frequency and high-frequency features. Finally, learnable fusion coefficients are used to weightedly combine the enhanced dual-path features to form node embeddings. These fusion coefficients dynamically balance low-frequency homogeneous information and high-frequency heterogeneous information, enabling intelligent control of node representation between local common patterns and cross-regional differential features.
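As a non-limiting illustration, the bottleneck adapter described in this embodiment may be sketched as follows. The class name `SpectralPromptAdapter`, the hidden width of the modulation perceptron, its tanh activation, and the random initialization scale are assumptions; only the down-projection, element-wise modulation, zero-initialized up-projection, and residual addition follow the description above.

```python
import numpy as np

class SpectralPromptAdapter:
    """Bottleneck adapter producing an adjustment for frozen filter coefficients."""

    def __init__(self, d, d_low, hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W_down = rng.normal(scale=0.1, size=(d_low, d))  # down-projection
        self.W_up = np.zeros((d, d_low))                      # zero-initialized up-projection
        # Lightweight perceptron mapping the scalar ratio to a d_low vector.
        self.M1 = rng.normal(scale=0.1, size=(hidden, 1))
        self.M2 = rng.normal(scale=0.1, size=(d_low, hidden))

    def delta(self, coeffs, rho):
        low = self.W_down @ coeffs                            # low-dimensional features
        mod = self.M2 @ np.tanh(self.M1 @ np.array([rho]))    # modulation vector
        return self.W_up @ (low * mod)                        # restore original dimension

    def enhanced(self, coeffs, rho):
        # The adjustment is added to the frozen coefficients.
        return coeffs + self.delta(coeffs, rho)

adapter = SpectralPromptAdapter(d=4, d_low=2)
frozen = np.array([1.0, 0.5, 0.25, 0.125])
out = adapter.enhanced(frozen, rho=0.7)   # equals `frozen` before any training
```

With the up-projection at zero, the adapter leaves the pre-trained coefficients unchanged at the start of fine-tuning, and only the adapter parameters would be updated afterwards.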

[0070] It should be noted that the pre-training and fine-tuning stages of this invention are not implemented independently, but rather achieve deep collaboration through a spectral characteristic consistency mechanism. Specifically, on the one hand, the spectrum-aware model skeleton trained through a node-level adaptive weighting mechanism in the pre-training stage carries the general spectral structure knowledge learned from multi-source graph data in its Chebyshev filter coefficients. This skeleton is fully preserved in the fine-tuning stage and serves as the initialization starting point for spectrum adaptation. On the other hand, the spectral characteristic adaptive adapter introduced in the fine-tuning stage is not designed independently. The filter adjustment amount it generates is based on the Chebyshev coefficients frozen in the pre-training stage and uses the spectral energy distribution law established in the pre-training stage as a priori. At the same time, the high-frequency energy ratio calculated from the downstream graph data drives the adapter to generate an adaptive adjustment amount that matches the downstream spectral characteristics, thereby achieving fine-grained adaptation of spectral characteristics while retaining the pre-training knowledge. Through the collaborative mechanism of learning the general spectral structure in pre-training and adaptive matching of fine-tuning based on downstream spectral characteristics, this invention constructs an effective transmission channel for spectral knowledge from pre-training to fine-tuning, enabling the model to maintain stable few-sample generalization ability on graph data with different levels of heterogeneity.

[0071] Compared to existing cue fine-tuning methods and clustering enhancement methods, this invention deeply integrates spectral domain cueing with unsupervised clustering regularization during the fine-tuning stage, forming a collaborative fine-tuning paradigm that combines spectrum-aware cueing with clustering structure constraints. Specifically, existing cue fine-tuning methods mainly design learnable cue vectors in the node feature space or graph structure space and superimpose them onto the node features or graph embeddings. These methods cannot explicitly modulate the spectral components already learned by the pre-trained model, so the low-frequency commonalities and high-frequency differences established during the pre-training stage are destroyed or weakened during fine-tuning. While existing clustering enhancement methods can optimize the global structure of node embeddings through clustering constraints, their clustering objectives are independent of the spectral characteristics of the pre-trained model and do not form a synergy with the spectrum-aware module.

[0072] To address these shortcomings, this invention applies the cueing mechanism directly to the Chebyshev filter coefficients of the pre-trained model. It introduces a spectral characteristic adaptive adapter with a bottleneck structure, which uses the high-frequency energy proportion calculated from the downstream graph data as the modulation signal and generates an adaptive adjustment amount, matched to the downstream spectral characteristics, that corrects the filter coefficients. The cueing enhancement therefore acts at the spectral response level rather than the node feature level, which avoids the destruction of pre-trained spectral knowledge caused by spatial domain cueing. Furthermore, this invention introduces clustering target constraints. Unlike traditional clustering enhancement methods, these constraints do not operate independently but are applied to the node embeddings after spectral domain cueing enhancement. KL divergence optimization keeps the node embeddings compact and consistent in the spectral feature space. At the same time, the category semantic information of the limited number of labeled nodes is indirectly transferred to unlabeled nodes through learnable cluster centers, achieving structured diffusion of the supervisory signal in the spectral space. Through this functional coupling of spectral domain cueing and clustering constraints, the general spectral knowledge learned in the pre-training stage is preserved in the fine-tuning stage and adaptively adjusted according to the characteristics of the downstream data, while the clustering constraint, as a structural regularization term, both enhances the global consistency of the node embeddings and diffuses the classification supervision signal from sparse labeled nodes to a large number of unlabeled nodes. The result is mutual support and synergistic optimization between cue adaptation and clustering enhancement, which improves generalization ability and recognition accuracy in low-sample scenarios.
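The adapter described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation of the filing: the class name `SpectralAdapter`, the bottleneck width, the ReLU and sigmoid nonlinearities, the random initialization, and the zero-initialized up-projection are all hypothetical choices. The description only specifies a down-projection of the filter coefficients, a modulation vector derived from the high-frequency energy proportion, an element-wise product, an up-projection, and addition of the result to the original coefficients.

```python
import numpy as np


class SpectralAdapter:
    """Hypothetical bottleneck adapter that corrects Chebyshev filter coefficients."""

    def __init__(self, order, bottleneck=4, seed=0):
        rng = np.random.default_rng(seed)
        n_coeff = order + 1                                        # theta_0 .. theta_order
        self.w_down = rng.normal(0.0, 0.1, (n_coeff, bottleneck))  # down-projection
        self.w_mod = rng.normal(0.0, 0.1, (1, bottleneck))         # ratio -> modulation vector
        self.b_mod = np.zeros(bottleneck)
        # Assumption: zero-initialized up-projection, so the adapter starts as a no-op.
        self.w_up = np.zeros((bottleneck, n_coeff))

    def delta(self, theta, hf_ratio):
        """Adaptive adjustment for coefficients `theta`, given a ratio in [0, 1]."""
        low_dim = np.maximum(theta @ self.w_down, 0.0)             # encode (ReLU assumed)
        logits = np.array([hf_ratio]) @ self.w_mod + self.b_mod
        gate = 1.0 / (1.0 + np.exp(-logits))                       # modulation vector (sigmoid assumed)
        return (low_dim * gate) @ self.w_up                        # modulate, restore dimension

    def __call__(self, theta, hf_ratio):
        return theta + self.delta(theta, hf_ratio)                 # cue-enhanced coefficients


theta = np.array([0.5, -0.3, 0.1])        # hypothetical pre-trained coefficients
adapter = SpectralAdapter(order=2)
enhanced = adapter(theta, hf_ratio=0.7)   # equals theta until w_up is trained
```

Zero-initializing the up-projection is one common way to make fine-tuning start from the pre-trained filter response; the description does not state how the adapter is initialized.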

[0073] S42: To enhance the consistency of the spectral structure of the node embeddings, a clustering objective constraint is introduced. Unlike existing clustering methods, the clustering objective of this invention acts directly on the spectral feature space: the node embedding Z obtained in step S41 integrates low-frequency common information and high-frequency difference information, and therefore already contains the spectral characteristic distribution of the nodes in the graph structure. Let Z be the final node embedding, and let a set of learnable cluster centers be defined, one for each category. First, the soft assignment probability of each node to each cluster center is computed using the Student-t distribution, based on the node's embedded representation and the cluster centers; a temperature parameter controls the smoothness of the distribution, and the number of cluster centers equals the number of categories. To enhance cluster discriminative power, a sharpened target distribution is then constructed from the soft assignment probabilities. The clustering loss is obtained by minimizing the KL divergence between the target distribution and the soft assignment distribution, taken over all categories. This clustering objective requires no additional supervision signal; it serves only as a structural regularization term that promotes the compactness and consistency of the node embedding space.
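The soft assignment, the target distribution, and the clustering loss described in step S42 can be written as follows. This is one form consistent with the description, following the Student-t kernel of deep embedded clustering; the symbols $z_i$, $\mu_k$, $\tau$, $q_{ik}$, $p_{ik}$, $K$, $N$ and $\mathcal{L}_{clu}$, the placement of the temperature, and the averaging over nodes are assumptions for illustration rather than the notation of the filing.

```latex
q_{ik} = \frac{\left(1 + \lVert z_i - \mu_k \rVert^2 / \tau\right)^{-\frac{\tau + 1}{2}}}
              {\sum_{k'=1}^{K} \left(1 + \lVert z_i - \mu_{k'} \rVert^2 / \tau\right)^{-\frac{\tau + 1}{2}}},
\qquad
p_{ik} = \frac{q_{ik}^2}{\sum_{k'=1}^{K} q_{ik'}^2},
\qquad
\mathcal{L}_{clu} = \frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{K} p_{ik} \log \frac{p_{ik}}{q_{ik}}
```

Here $z_i$ is the embedding of node $i$ (row $i$ of $Z$), $\mu_k$ is the $k$-th learnable cluster center, $\tau$ is the temperature parameter, $K$ is the number of categories, and $N$ is the number of nodes. Deep embedded clustering additionally divides $q_{ik}^2$ by the soft cluster frequency $\sum_i q_{ik}$ before normalizing; the description here mentions only squaring and normalization.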

[0074] In step S42, the node embeddings are structurally regularized by introducing clustering objective constraints, achieving collaborative optimization between labeled and unlabeled nodes. Specifically, based on the final node embeddings obtained in step S41, a set of learnable cluster centers is initialized. The Student-t distribution is used to calculate the soft assignment probability of each node to each cluster center, with a temperature parameter controlling the smoothness of the distribution. To enhance cluster discriminative power, a sharpened target distribution is constructed by squaring the soft assignment probabilities and normalizing them, which strengthens high-confidence assignments and suppresses low-confidence ones. The clustering loss is then obtained by minimizing the KL divergence between the target distribution and the soft assignment distribution.
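A minimal NumPy sketch of the clustering loss in step S42 follows. It assumes the Student-t kernel of deep embedded clustering with the temperature used as the degrees-of-freedom parameter, a per-node squared renormalization for the target distribution, and averaging over nodes; the function names and the toy data are hypothetical, and the filing may define these details differently.

```python
import numpy as np


def soft_assign(z, centers, tau=1.0):
    """Student-t soft assignment q[i, k] of node i to cluster center k."""
    sq_dist = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    kernel = (1.0 + sq_dist / tau) ** (-(tau + 1.0) / 2.0)
    return kernel / kernel.sum(axis=1, keepdims=True)


def target_distribution(q):
    """Sharpened target p[i, k]: square the soft assignments, renormalize per node."""
    q_sq = q ** 2
    return q_sq / q_sq.sum(axis=1, keepdims=True)


def clustering_loss(z, centers, tau=1.0, eps=1e-12):
    """KL(P || Q) averaged over nodes (averaging is an assumption)."""
    q = soft_assign(z, centers, tau)
    p = target_distribution(q)
    return float((p * (np.log(p + eps) - np.log(q + eps))).sum(axis=1).mean())


# Toy example: six 2-D embeddings around two hypothetical cluster centers.
z = np.array([[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0],
              [3.0, 3.1], [2.9, 3.0], [3.2, 2.8]])
centers = np.array([[0.0, 0.0], [3.0, 3.0]])
q = soft_assign(z, centers)
p = target_distribution(q)
loss = clustering_loss(z, centers)
```

In deep embedded clustering the target distribution is typically held fixed while gradients flow through the soft assignments to the embeddings and cluster centers; whether the filing does the same is not stated.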

[0075] Compared to existing clustering enhancement methods, this invention deeply couples clustering constraints with the spectral feature space in its clustering objective design, rather than operating them as isolated regularization terms. Existing methods typically act directly on the original node features, and their clustering objectives are unrelated to the spectral characteristics of the pre-trained model, failing to utilize learned low-frequency commonalities and high-frequency differences to guide the clustering process. To address these shortcomings, the clustering objective introduced in step S42 of this invention targets the node embedding Z, which has been enhanced with spectral domain cueing. This embedding fully carries the spectral characteristic distribution of the nodes. Based on this, a clustering loss is constructed through learnable cluster centers, soft assignment of the Student-t distribution, and sharpening of the target distribution. Essentially, guided by spectral structural similarity, it clusters nodes with similar spectral characteristics in the feature space. This design achieves deep coupling between the clustering objective and the spectral feature space. Specifically, clustering constraints force nodes with similar spectral distribution patterns closer together, enhancing the separability of spectral features. Learnable cluster centers obtain category semantic supervision from labeled nodes through classification loss, which indirectly propagates this semantic information to unlabeled nodes, realizing the structured diffusion of supervision signals in the spectral space. 
Compared to existing methods, this invention functionally couples the clustering objective with spectral-aware pre-training and spectral-domain cueing fine-tuning, enabling the model to efficiently transfer category semantic information from labeled nodes to unlabeled nodes based on spectral similarity under limited sample conditions, significantly improving the utilization efficiency of sparse supervision signals.

[0076] S43: Calculate the total downstream loss function. The overall optimization objective of the downstream task is a weighted combination of the supervised classification loss and the clustering loss. The classification loss is the cross-entropy loss computed for labeled nodes after they pass through the MLP classification head, and a balance coefficient adjusts the strength of the clustering regularization.
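The total downstream loss can be written compactly. The additive form and the symbols below are assumptions consistent with the description (a classification loss plus a clustering loss scaled by a balance coefficient), not notation recovered from the filing:

```latex
\mathcal{L}_{down} = \mathcal{L}_{cls} + \lambda \, \mathcal{L}_{clu},
\qquad
\mathcal{L}_{cls} = -\frac{1}{\lvert \mathcal{V}_L \rvert} \sum_{i \in \mathcal{V}_L} \sum_{k=1}^{K} y_{ik} \log \hat{y}_{ik}
```

Here $\mathcal{V}_L$ is the set of labeled nodes, $y_{ik}$ is the one-hot label of node $i$, $\hat{y}_{ik}$ is the softmax output of the MLP classification head, $K$ is the number of categories, and $\lambda$ is the balance coefficient.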

[0077] Throughout the fine-tuning process, only the prompt adapter parameters, the cluster centers, and the classification head parameters are updated; the pre-trained spectrum-aware backbone remains frozen. This design achieves efficient and stable few-shot adaptation: it preserves the spectral knowledge learned by the pre-trained model while introducing only a small number of trainable parameters.
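The selective update can be illustrated with a small sketch in which parameters are held in named groups and a gradient step touches only the trainable ones. The group names, the plain gradient-descent rule, and the placeholder gradients are assumptions; the description does not name an optimizer.

```python
import numpy as np

# Hypothetical group names for the parameters that fine-tuning is allowed to change.
TRAINABLE = {"adapter", "cluster_centers", "classifier_head"}


def sgd_step(params, grads, lr=0.01, trainable=TRAINABLE):
    """Return updated parameters; groups outside `trainable` are passed through unchanged."""
    return {
        name: (value - lr * grads[name]) if name in trainable else value
        for name, value in params.items()
    }


params = {
    "backbone": np.ones(4),               # frozen pre-trained spectrum-aware model
    "adapter": np.zeros(3),
    "cluster_centers": np.zeros((2, 2)),
    "classifier_head": np.zeros(2),
}
grads = {name: np.ones_like(value) for name, value in params.items()}  # placeholder gradients
updated = sgd_step(params, grads, lr=0.1)
```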

[0078] The clustering objective and the classification loss work synergistically to optimize the node embedding space. Based on the category supervision signals provided by labeled nodes, the classification loss transmits category semantic information to the corresponding cluster center, so that the k-th cluster center carries the typical characteristics of the k-th class of nodes. Guided by the classification loss, the clustering loss propagates this category semantic information to unlabeled nodes, prompting them to move towards the cluster centers that have similar features and carry specific category semantics. Through this mechanism, labeled nodes are clearly assigned to their categories, unlabeled nodes are attracted to the vicinity of labeled nodes of the same category, and node groups of different categories are naturally separated in the feature space.

[0079] Through the synergistic optimization of classification and clustering losses, the node embedding space, guided by both types of losses, forms a discriminative feature distribution where nodes of the same type are tightly clustered and nodes of different types are separated. Specifically, the classification loss ensures that the embedding of labeled nodes accurately maps to the corresponding category region, while the clustering loss groups labeled and unlabeled nodes of the same category together in the embedding space, allowing unlabeled nodes to obtain reliable category assignments. Ultimately, for any input node, the embedding representation obtained after pre-training, adaptive adaptation of spectral characteristics, and clustering enhancement fine-tuning can effectively distinguish different fraud types and output accurate fraud type identification results through the classifier, thereby achieving accurate detection and identification of multiple fraud behaviors under limited sample conditions. This invention significantly reduces the dependence on labeled data; under limited sample conditions, only 5 to 10 labeled samples are needed per category, while still maintaining a high fraud identification accuracy, providing an efficient and transferable technical solution for fraud prevention in complex financial scenarios.

[0080] The present invention also provides a computer program product comprising a computer program that, when executed by a processor, implements the steps of the spectrum-adaptive prompting few-sample fraud type identification method formed by any one or a combination of the above examples. The processor may be a single-core or multi-core central processing unit, an application-specific integrated circuit, or one or more integrated circuits configured to implement the present invention.

[0081] The present invention also provides a storage medium that shares the same inventive concept as the spectrum-adaptive prompting few-sample fraud type identification method formed by any one or more of the above examples. Computer instructions are stored on the storage medium and, when executed, perform the steps of that method.

[0082] Based on this understanding, the technical solution of this embodiment, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0083] This invention also provides a terminal that shares the same inventive concept as the spectrum-adaptive prompting few-sample fraud type identification method formed by any one or a combination of the above examples. The terminal includes a memory and a processor. The memory stores computer instructions executable on the processor. When the processor executes the computer instructions, it performs the steps of the aforementioned spectrum-adaptive prompting few-sample fraud type identification method. The processor may be a single-core or multi-core central processing unit, an application-specific integrated circuit, or one or more integrated circuits configured to implement this invention.

[0084] In one example, the terminal, i.e., the electronic device, is represented in the form of a general-purpose computing device. The components of the electronic device may include, but are not limited to: at least one processing unit (processor) mentioned above, at least one storage unit mentioned above, and a bus connecting different system components (including storage units and processing units).

[0085] The storage unit stores program code that can be executed by the processing unit, causing the processing unit to perform the steps described in the "Exemplary Methods" section above, based on various exemplary embodiments of the present invention. For example, the processing unit can perform the aforementioned spectrum-adaptive cueing method for identifying few-sample fraud types.

[0086] The storage unit may include readable media in the form of volatile storage units, such as random access memory (RAM) and / or cache storage units, and may further include read-only memory (ROM).

[0087] The storage unit may also include a program / utility having a set (at least one) of program modules, including but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of these examples may include an implementation of a network environment.

[0088] A bus can represent one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor bus or local bus that uses any of a variety of bus architectures.

[0089] The electronic device can also communicate with one or more external devices (e.g., keyboards, pointing devices, Bluetooth devices, etc.), one or more devices that enable a user to interact with the electronic device, and / or any device that enables the electronic device to communicate with one or more other computing devices (e.g., routers, modems, etc.). This communication can be performed via input / output (I / O) interfaces. Furthermore, the electronic device can communicate with one or more networks (e.g., local area networks (LANs), wide area networks (WANs), and / or public networks, such as the Internet) via a network adapter. The network adapter communicates with other modules of the electronic device via a bus. It should be understood that other hardware and / or software modules can be used in conjunction with the electronic device, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

[0090] Through the above description, those skilled in the art will readily understand that the exemplary embodiments described herein can be implemented by software or by combining software with necessary hardware. Therefore, the technical solution according to this exemplary embodiment can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, external hard drive, etc.) or on a network, including several instructions to cause a computing device (such as a personal computer, server, terminal device, or network device, etc.) to execute the method of the exemplary embodiment of this application.

[0091] The above detailed embodiments are a description of the present invention. It should not be considered that the specific embodiments of the present invention are limited to these descriptions. For those skilled in the art, several simple deductions and substitutions can be made without departing from the concept of the present invention, and all of these should be considered to fall within the protection scope of the present invention.

Claims

1. A method for identifying few-sample fraud types using spectral adaptive cues, characterized in that it includes the following steps: the original feature matrix of the unlabeled graph data used for pre-training is sequentially subjected to dimension mapping, normalization, and smoothing to obtain the updated feature matrix; the adjacency matrix and the updated feature matrix are used to construct, for each node, a first positive sample set and a first negative sample set of high-frequency features and a second positive sample set and a second negative sample set of low-frequency features; the normalized Laplacian matrix is calculated based on the adjacency matrix and the updated feature matrix; complementary low-frequency and high-frequency filters are constructed using parameterized Chebyshev polynomials to form a Chebyshev model; low-frequency and high-frequency features of each node are extracted based on the Chebyshev model; an adaptive band modulation mechanism based on the node spectral energy distribution is used to construct node-level spectral weight coefficients; band-aware contrastive learning is used to jointly optimize the low-frequency and high-frequency features; the low-frequency contrastive loss function and the high-frequency contrastive loss function are adaptively weighted and fused through the node-level spectral weight coefficients to construct the total loss function; the Chebyshev model is trained using the total loss function until convergence to obtain the pre-trained spectrum-aware model; the original feature matrix of the downstream graph data related to fraud type identification is sequentially subjected to dimension mapping, normalization, and smoothing to obtain the downstream updated feature matrix; the pre-trained spectrum-aware model is frozen, and spectral characteristics are analyzed on the downstream graph data to calculate the proportion of high-frequency energy; a spectral characteristic adaptive adapter is introduced as a spectral domain cue; an adaptive adjustment is generated using the original filter coefficients and the proportion of high-frequency energy as inputs and is added to the original filter coefficients to obtain the cue-enhanced filter coefficients; a cue-enhanced filter is constructed using the cue-enhanced filter coefficients; the downstream updated feature matrix is convolved with the cue-enhanced filter to obtain optimized low-frequency and high-frequency features, which are weighted and combined according to learnable fusion coefficients to form node embeddings; clustering target constraints are introduced to calculate the clustering loss, which is weighted and fused with the classification loss to form the downstream total loss function; only the cue adapter parameters, cluster centers, and classification head parameters are updated; and the optimized low-frequency and high-frequency features are combined according to the fusion coefficients to obtain the final embedded representation of each node, which is input into the classifier to output the fraud type identification result.

2. The few-sample fraud type identification method with spectral adaptive prompting according to claim 1, characterized in that, The construction of a first positive sample set and a first negative sample set for high-frequency features and a second positive sample set and a second negative sample set for low-frequency features for each node includes: For high-frequency features or downstream high-frequency features, the nodes with the highest similarity to the node in the feature space are used as the first positive sample set, and the nodes randomly selected that do not belong to the first positive sample set are used as the first negative sample set; for low-frequency features or downstream low-frequency features, the nodes of the node's first-order neighbors are used as the second positive sample set, and the nodes randomly selected that do not belong to the second positive sample set are used as the second negative sample set.

3. The method for identifying few-sample fraud types using spectral adaptive prompting according to claim 1, characterized in that, The construction of node-level spectral weighting coefficients includes: The squared L2 norm of the low-frequency feature vector of the node is used as the low-frequency energy of the node, and the squared L2 norm of the high-frequency feature vector of the node is used as the high-frequency energy of the node. The ratio of the low-frequency energy of the node to the sum of the low-frequency energy and the high-frequency energy is used as the low-frequency weight coefficient of the node, and the ratio of the high-frequency energy of the node to the sum of the low-frequency energy and the high-frequency energy is used as the high-frequency weight coefficient of the node.

4. The method for identifying few-sample fraud types using spectral adaptive prompting according to claim 1, characterized in that, The construction of the total loss function includes: The low-frequency contrastive loss function is calculated based on the low-frequency features and the second positive sample set of low-frequency features of each node, and a temperature coefficient is introduced to adjust the sharpness of the contrastive learning distribution. The high-frequency contrastive loss function is calculated based on the high-frequency features and the first positive sample set of the high-frequency features of each node, and a temperature coefficient is introduced to adjust the sharpness of the contrastive learning distribution. The total loss function is constructed by multiplying the low-frequency contrast loss of each node by the low-frequency weight coefficient of that node, and multiplying the high-frequency contrast loss of each node by the high-frequency weight coefficient of that node.

5. The few-sample fraud type identification method with spectral adaptive prompting according to claim 1, characterized in that, The step of performing spectral characteristic analysis on downstream graph data and calculating the proportion of high-frequency energy includes: Calculate the total variation of the graph signal in the downstream graph data, and calculate the ratio of the total variation of the graph signal to the square of the Frobenius norm of the feature matrix of the downstream graph data. Map the ratio to the interval between 0 and 1 to obtain the high-frequency energy ratio.

6. The method for identifying few-sample fraud types using spectrum adaptive prompting according to claim 1, characterized in that, The adaptive adapter for spectral characteristics adopts a bottleneck structure design, which encodes the original filter coefficients into low-dimensional features through downward projection, maps the high-frequency energy ratio into a modulation vector, and multiplies the modulation vector with the low-dimensional features element by element and then restores the original dimension through upward projection to generate an adaptive adjustment amount.

7. The method for identifying few-sample fraud types using spectral adaptive prompting according to claim 1, characterized in that, The calculation of clustering loss by introducing clustering objective constraints includes: Based on node embeddings and learnable cluster centers, the soft assignment probability of each node to each cluster center is calculated using the Student-t distribution to obtain the soft assignment distribution. The soft assignment probabilities are then squared and normalized to construct a sharpened target distribution. The clustering loss is obtained by minimizing the KL divergence between the target distribution and the soft assignment distribution.

8. A computer program product, comprising a computer program, characterized in that, When the computer program is executed by a processor, it implements the steps of the few-sample fraud type identification method with spectral adaptive prompting as described in any one of claims 1-7.

9. A storage medium storing computer instructions thereon, characterized in that, When the computer instructions are executed, they perform the steps of the few-sample fraud type identification method with spectral adaptive prompting as described in any one of claims 1-7.

10. A terminal comprising a memory and a processor, wherein the memory stores computer instructions executable on the processor, characterized in that, When the processor executes the computer instructions, it performs the steps of the few-sample fraud type identification method with spectral adaptive prompting as described in any one of claims 1-7.
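The quantities recited in claims 1 to 5 can be illustrated with a small NumPy sketch. It is a hedged illustration, not the claimed implementation. The following are assumptions: the filter coefficients `[0.5, -0.5]` and `[0.5, 0.5]` (one simple complementary pair), a largest Laplacian eigenvalue of 2, cosine similarity with an InfoNCE-style loss, the use of all non-positive nodes as negatives instead of a random subset, division by 2 to map the ratio into [0, 1], and all function names.

```python
import numpy as np


def normalized_laplacian(adj):
    """L = I - D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix."""
    deg = adj.sum(axis=1)
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    return np.eye(adj.shape[0]) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]


def chebyshev_filter(lap, x, theta):
    """Apply sum_k theta[k] * T_k(L - I) to features x (lambda_max = 2 assumed)."""
    lap_scaled = lap - np.eye(lap.shape[0])          # eigenvalues mapped from [0, 2] to [-1, 1]
    t_prev, t_curr = x, lap_scaled @ x               # T_0 x and T_1 x
    out = theta[0] * t_prev
    if len(theta) > 1:
        out = out + theta[1] * t_curr
    for k in range(2, len(theta)):
        t_prev, t_curr = t_curr, 2.0 * lap_scaled @ t_curr - t_prev   # Chebyshev recurrence
        out = out + theta[k] * t_curr
    return out


def spectral_weights(h_low, h_high, eps=1e-12):
    """Claim 3: per-node energy ratios of the low- and high-frequency features."""
    e_low = (h_low ** 2).sum(axis=1)                 # squared L2 norm per node
    e_high = (h_high ** 2).sum(axis=1)
    total = e_low + e_high + eps
    return e_low / total, e_high / total


def feature_knn(x, k=1):
    """Claim 2: indices of the k most similar nodes in feature space (cosine assumed)."""
    xn = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-12)
    sim = xn @ xn.T
    np.fill_diagonal(sim, -np.inf)                   # a node is not its own positive
    return np.argsort(-sim, axis=1)[:, :k]


def node_contrastive_loss(h, positives, negatives, temperature=0.5):
    """Claim 4: per-node InfoNCE-style loss with a temperature (assumed form)."""
    h = h / (np.linalg.norm(h, axis=1, keepdims=True) + 1e-12)
    losses = np.zeros(h.shape[0])
    for i in range(h.shape[0]):
        pos = np.exp(h[positives[i]] @ h[i] / temperature).sum()
        neg = np.exp(h[negatives[i]] @ h[i] / temperature).sum()
        losses[i] = -np.log(pos / (pos + neg))
    return losses


def high_frequency_ratio(lap, x):
    """Claim 5: total variation tr(X^T L X) over ||X||_F^2, scaled into [0, 1]."""
    total_variation = np.trace(x.T @ lap @ x)
    ratio = total_variation / (np.linalg.norm(x, "fro") ** 2 + 1e-12)
    return float(np.clip(ratio / 2.0, 0.0, 1.0))     # eigenvalues of L lie in [0, 2]


# Toy example: a 4-node path graph with random features (all values hypothetical).
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
x = np.random.default_rng(0).normal(size=(4, 3))
lap = normalized_laplacian(adj)
h_low = chebyshev_filter(lap, x, [0.5, -0.5])        # response 1 - lambda/2 (low-pass)
h_high = chebyshev_filter(lap, x, [0.5, 0.5])        # response lambda/2 (high-pass)
w_low, w_high = spectral_weights(h_low, h_high)

nodes = np.arange(4)
pos_low = [np.flatnonzero(adj[i]) for i in nodes]    # first-order neighbors
pos_high = feature_knn(x, k=1)                       # most similar node in feature space
neg_low = [np.setdiff1d(nodes, np.append(pos_low[i], i)) for i in nodes]
neg_high = [np.setdiff1d(nodes, np.append(pos_high[i], i)) for i in nodes]

loss_low = node_contrastive_loss(h_low, pos_low, neg_low)
loss_high = node_contrastive_loss(h_high, pos_high, neg_high)
total_loss = float(np.mean(w_low * loss_low + w_high * loss_high))
hf_ratio = high_frequency_ratio(lap, x)
```

With this coefficient pair the two filter outputs sum back to the input features, which is one simple reading of "complementary"; the claimed filters are parameterized and learned, so their actual responses may differ.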