Self-supervised graph condensation

The self-supervised graph condensation technique addresses the reliance on costly labeled data by using embeddings, clusters, and pseudo-labels to create efficient, versatile condensed graphs suitable for diverse downstream tasks.

WO2026139970A1 · PCT designated stage · Publication Date: 2026-07-02 · FUJITSU LTD +1

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: FUJITSU LTD
Filing Date: 2025-07-02
Publication Date: 2026-07-02

Application Information

Patent Timeline

02 Jul 2025: Application
02 Jul 2026: Publication (WO2026139970A1)

IPC: G06N3/08; G06N3/045; G06F16/901; G06N3/084

AI Tagging

Technology Topics

Algorithm Theoretical computer science

AI Technical Summary

Technical Problem

Existing graph condensation methods rely on supervised node-level information, which is costly, incomplete, or unavailable, leading to unreliable condensed graphs and compromised downstream performance, especially in large-scale or dynamic environments with noisy labels.

Method used

A self-supervised graph condensation technique that uses embeddings, clusters, and latent pseudo-labels to create condensed graphs without requiring explicit labels, utilizing a swapped prediction loss and similarity metric to update the condensed graphs, ensuring they retain essential structural and attribute properties.

Benefits of technology

The method efficiently reduces graph complexity while preserving key features, making it suitable for various downstream tasks like node classification and community detection, even in noisy or label-sparse environments, and is applicable in distributed data centers.

✦ Generated by Eureka AI based on patent content.

Patent Text Reader

Abstract

In an embodiment, operations include determining first condensed graphs and perturbed graphs, based on a received input graph. The operations include determining first embeddings for perturbed graphs and second embeddings for first condensed graphs. Clusters associated with the first embeddings are determined. Latent representations associated with the first condensed graphs are determined, based on the second embeddings. Swapped prediction loss associated with perturbed graphs is determined, based on the first embeddings and the clusters. Latent pseudo-labels associated with the perturbed graphs are determined, based on the swapped prediction loss. A similarity metric between the latent representations and the latent pseudo-labels is determined to update the first condensed graphs. The operations further include rendering of information related to the updated first condensed graph.

Description

SELF-SUPERVISED GRAPH CONDENSATION

FIELD

[0001] The embodiments discussed in the present disclosure are related to self-supervised graph condensation.

BACKGROUND

[0002] Graph neural networks (GNNs) have become prominent models for graph data learning, achieving significant success. However, real-world graph data often consists of vast numbers of nodes and edges, presenting diverse node attributes and complex structural connections. Modeling such extensive graphs poses substantial challenges in data storage and GNN model design, limiting their application in many industrial contexts. Designing GNN models typically involves repeated training to fine-tune hyper-parameters and develop optimal architectures. When large-scale graphs are used as training data, this iterative, trial-and-error training process becomes highly computation-intensive and time-consuming. Graph condensation offers a promising solution by reducing the size of graphs, thereby improving storage efficiency and accelerating training, visualization, and retrieval tasks for graph-related analysis. Traditionally, graph condensation techniques rely on supervised node-level information to encode the topology structure of the original graph into the synthetic node attributes of the condensed graph. However, obtaining correctly labelled data in large-scale or dynamic graph environments is often costly and time-consuming, with labels frequently incomplete or entirely unavailable. Additionally, graphs obtained from sensors can be affected by different environmental factors or mechanical failures, leading to noisy labels. As a result, existing supervised graph condensation methods become unreliable with inaccurately labelled condensed graphs, ultimately compromising downstream performance.

[0003] The subject matter claimed in the present disclosure is not limited to embodiments that solve any disadvantages or that operate only in environments such as those described above. Rather, this background is only provided to illustrate one example technology area where some embodiments described in the present disclosure may be practiced.

[0004] SUMMARY

[0005] According to an aspect of an embodiment, a method may include a set of operations which may include receiving an input graph associated with an application domain and determining a plurality of first condensed graphs and a plurality of perturbed graphs, based on the input graph. The set of operations may further include determining a plurality of first embeddings for the plurality of perturbed graphs and a set of second embeddings for the plurality of first condensed graphs. The set of operations may further include determining a plurality of clusters associated with the plurality of first embeddings. The set of operations may further include determining a set of latent representations associated with the plurality of first condensed graphs, based on the set of second embeddings. The set of operations may further include determining a swapped prediction loss associated with the plurality of perturbed graphs, based on the plurality of first embeddings and the plurality of clusters. The set of operations may further include determining a set of first latent pseudo-labels associated with the plurality of perturbed graphs, based on the swapped prediction loss. The set of operations may further include determining a first similarity metric between the set of latent representations and the set of first latent pseudo-labels to update the plurality of first condensed graphs based on the first similarity metric. The set of operations may further include rendering information related to the updated plurality of first condensed graphs.

[0006] The objects and advantages of the embodiments will be realized and achieved at least by the elements, features, and combinations particularly pointed out in the claims.

[0007] Both the foregoing general description and the following detailed description are given as examples and are explanatory and are not restrictive of the invention, as claimed.

BRIEF DESCRIPTION OF THE DRAWINGS

[0008] Example embodiments will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:

[0009] FIG. 1 is a diagram representing an example environment related to self-supervised graph condensation;

[0010] FIG. 2 is a block diagram that illustrates an exemplary electronic device of FIG. 1 for self-supervised graph condensation;

[0011] FIG. 3 is a diagram that illustrates a pipeline diagram of an example method for self-supervised graph condensation;

[0012] FIG. 4A and FIG. 4B are diagrams that collectively illustrate an exemplary scenario for graph condensation and graph neural network training for a downstream task based on the graph condensation;

[0013] FIG. 5A and FIG. 5B are diagrams that collectively illustrate an exemplary scenario for use of condensed graphs to fine-tune graph neural network model for downstream tasks;

[0014] FIG. 6 is a diagram that illustrates an exemplary scenario for determination of latent representations of an original graph and update of condensed graphs;

[0015] FIG. 7 is a diagram that illustrates an exemplary scenario for producing condensed graphs and latent pseudo-labels for self-supervised graph condensation;

[0016] FIG. 8 is a diagram that illustrates an exemplary scenario for training of a GNN model for downstream tasks, based on the condensed graphs and latent pseudo labels;

[0017] FIG. 9 is a diagram that illustrates a flowchart of an exemplary method for creating the condensed graphs for received input graphs;

[0018] FIG. 10 is a diagram that illustrates a flowchart for an exemplary method for training a GNN model for a downstream task; and

[0019] FIG. 11 is a diagram that illustrates a flowchart for an exemplary method of self-supervised graph condensation, all according to at least one embodiment described in the present disclosure.

DESCRIPTION OF EMBODIMENTS

[0020] Some embodiments described in the present disclosure may relate to methods and electronic devices for self-supervised graph condensation. In the present disclosure, an input graph associated with an application domain (for example, but not limited to, a financial domain, a social network domain, a recommendation system domain, or a medical domain) may be received. A plurality of first condensed graphs and a plurality of perturbed graphs may be determined based on the input graph. A plurality of first embeddings may be determined for the plurality of perturbed graphs and a set of second embeddings may be determined for the plurality of first condensed graphs. A plurality of clusters associated with the plurality of first embeddings may be determined. A set of latent representations associated with the plurality of first condensed graphs may be determined, based on the set of second embeddings. A swapped prediction loss associated with the plurality of perturbed graphs may be determined, based on the plurality of first embeddings and the plurality of clusters. A set of first latent pseudo-labels associated with the plurality of perturbed graphs may be determined, based on the swapped prediction loss. A first similarity metric between the set of latent representations and the set of first latent pseudo-labels may be determined. The plurality of first condensed graphs may be updated based on the first similarity metric. Information related to the updated plurality of first condensed graphs may be rendered.

[0021] Existing graph condensation methods may be used in the field of graph representation learning to reduce the complexity of large graphs while preserving their essential structural and attribute properties. The graph condensation methods may rely on supervised information and training for specific downstream tasks. The graph condensation may focus on creating a condensed version of the graph that can be used for various applications without prior task-specific training. The process involves identifying and merging similar nodes and edges based on their structural and attribute similarities, effectively creating a smaller, more manageable graph that retains the key characteristics of the original graph. The condensed graphs may then be used for a range of downstream tasks such as node classification, link prediction, and community detection. However, the traditional graph condensation methods may be supervised and may require node / graph level labelling information to condense large graphs. The supervision requirement for existing graph condensation methods may restrict their generalization in noisy environments for different downstream tasks.

[0022] To address these challenges, a task-agnostic Self-Supervised Graph Condensation (SSGC) technique that efficiently condenses an input graph within a small memory budget without requiring any node-level label information is proposed in this disclosure. This technique may be particularly useful in scenarios where input graphs arrive at different timestamps, making it impractical to store or process historical graphs jointly with new incoming graphs, or when input graphs are distributed across different data centers, preventing joint training using all graphs. The SSGC method may involve producing a condensed graph from a large input graph using back-propagation by matching the set of first latent pseudo-labels learned from the original graph. For downstream tasks, a predictive graph neural network (GNN) model may be trained using the condensed graph, followed by fine-tuning with limited supervision corresponding to the specific tasks.
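The idea of producing a condensed graph by back-propagation, as described in the paragraph above, can be illustrated with a minimal sketch. The sketch makes several assumptions that are not stated in the disclosure: a fixed linear map stands in for the GNN encoder, a squared-error loss stands in for the similarity-based objective, and only the synthetic node features (not the condensed adjacency) are optimized. All names (`matching_loss`, `condense_step`, `encoder`, `targets`) are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matching_loss(condensed_features, weights, pseudo_labels):
    """Squared error between projected condensed features and pseudo-label targets."""
    preds = matmul(condensed_features, weights)
    return sum((p - t) ** 2
               for pred_row, target_row in zip(preds, pseudo_labels)
               for p, t in zip(pred_row, target_row))


def condense_step(condensed_features, weights, pseudo_labels, lr=0.1):
    """One gradient step on the condensed node features; the encoder stays fixed."""
    preds = matmul(condensed_features, weights)
    residual = [[2.0 * (p - t) for p, t in zip(pred_row, target_row)]
                for pred_row, target_row in zip(preds, pseudo_labels)]
    # Gradient of the squared error with respect to the features: 2 (XW - Y) W^T.
    grad = matmul(residual, transpose(weights))
    return [[x - lr * g for x, g in zip(x_row, g_row)]
            for x_row, g_row in zip(condensed_features, grad)]


# Two synthetic nodes, two latent dimensions, identity encoder (illustrative values).
features = [[0.5, 0.5], [0.5, 0.5]]
encoder = [[1.0, 0.0], [0.0, 1.0]]
targets = [[1.0, 0.0], [0.0, 1.0]]  # one-hot pseudo-labels
for _ in range(50):
    features = condense_step(features, encoder, targets)
```

In this toy setting the synthetic features move toward values whose projections match the pseudo-label targets, which is the role the latent pseudo-labels play as a training signal in place of ground-truth labels.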

[0023] The technological field of graph condensation may be improved by configuring an electronic device to perform self-supervised graph condensation. The electronic device may receive an input graph associated with an application domain and determine a plurality of first condensed graphs and a plurality of perturbed graphs, based on the input graph. The electronic device may determine a plurality of first embeddings for the plurality of perturbed graphs and a set of second embeddings for the plurality of first condensed graphs. The electronic device may determine a plurality of clusters associated with the plurality of first embeddings. The electronic device may determine a set of latent representations associated with the plurality of first condensed graphs, based on the set of second embeddings. The electronic device may determine a swapped prediction loss associated with the plurality of perturbed graphs, based on the plurality of first embeddings and the plurality of clusters. The electronic device may determine a set of first latent pseudo-labels associated with the plurality of perturbed graphs, based on the swapped prediction loss. Further, the electronic device may determine a first similarity metric between the set of latent representations and the set of first latent pseudo-labels. The electronic device may update the plurality of first condensed graphs based on the first similarity metric and render information related to the updated plurality of first condensed graphs.

[0024] The disclosed approach may offer several advantages. The condensed graph obtained based on the disclosed approach may retain essential structural and attribute properties, making the condensed graph applicable to a variety of downstream tasks such as node classification, link prediction, and community detection, without the need for task-specific training. Since graph condensation does not rely on labelled data, it is particularly useful in environments where data is noisy, or labels are sparse or unavailable. This makes it suitable for distributed data centers where data quality and labelling can vary significantly. The electronic device may receive an input graph associated with the application domain. The electronic device may then determine a plurality of first condensed graphs and perturbed graphs based on the input graph. This reduces the complexity of the original graph while preserving its essential features. The electronic device may compute embeddings for both the perturbed graphs and the condensed graphs. Such embeddings may capture the structural and attribute information of the graphs in a lower-dimensional space. The electronic device may determine clusters based on the embeddings of the perturbed graphs, which helps in identifying similar nodes and structures within the graph. The electronic device may calculate latent representations for the condensed graphs based on their embeddings. The disclosed approach also provides for the determination of the swapped prediction loss for the perturbed graphs, which is used to generate the set of first latent pseudo-labels. These pseudo-labels may act as a form of weak supervision, guiding the learning process without requiring explicit labels. The electronic device may compute a similarity metric between the latent representations and the pseudo-labels. The similarity metric may help in refining the condensed graphs to better capture the underlying structure of the graph (for example, the input graph).
The condensed graphs may then be updated based on this similarity metric. Finally, the electronic device may render the information related to the updated condensed graphs, making them available for various downstream tasks. The described approach effectively utilizes the advantages of graph condensation by generalizing the condensed graphs to any downstream task and making them suitable for distributed data centers with noisy or no labels. By creating embeddings, clusters, and latent representations, and using pseudo-labels for weak supervision, the method ensures that the condensed graphs retain essential properties of the original graph, enabling efficient and versatile graph representation learning.
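As an illustration of the similarity metric between latent representations and pseudo-labels mentioned in the paragraph above, the following sketch uses cosine similarity averaged over nodes. The choice of cosine similarity is an assumption; this section of the disclosure does not name a specific metric. The function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def mean_similarity(latent_representations, pseudo_labels):
    """Average per-node similarity between latent vectors and pseudo-label vectors."""
    scores = [cosine_similarity(z, y)
              for z, y in zip(latent_representations, pseudo_labels)]
    return sum(scores) / len(scores)


# Hypothetical values: the first node agrees with its pseudo-label, the second does not.
latents = [[1.0, 0.0], [1.0, 0.0]]
labels = [[1.0, 0.0], [0.0, 1.0]]
score = mean_similarity(latents, labels)
```

A higher score would indicate that the condensed graphs reproduce the latent structure captured by the pseudo-labels more closely, so an update rule could aim to increase it.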

[0025] Embodiments of the present disclosure are explained with reference to the accompanying drawings.

[0026] FIG. 1 is a diagram representing an example environment related to self-supervised graph condensation, arranged in accordance with at least one embodiment described in the present disclosure. With reference to FIG. 1, there is shown an environment 100. The environment 100 may include an electronic device 102, a graph neural network (GNN) model 104, a server 106, a database 108, a communication network 110, a display device 112, and an input graph 114. The server 106 may host the database 108. Further, the electronic device 102 may be communicatively coupled to the server 106, via the communication network 110. The electronic device 102 may include the GNN model 104, the display device 112, and the input graph 114. Though not shown in FIG. 1, in another embodiment, the database 108 may include the input graph 114.

[0027] The electronic device 102 may include suitable logic, circuitry, interfaces and / or code that may be configured to receive the input graph 114 associated with an application domain. The electronic device 102 may determine a plurality of first condensed graphs and a plurality of perturbed graphs, based on the input graph 114. The electronic device 102 may further determine a plurality of first embeddings for the plurality of perturbed graphs and a set of second embeddings for the plurality of first condensed graphs. Also, the electronic device 102 may determine a plurality of clusters associated with the plurality of first embeddings and may further determine a set of latent representations associated with the plurality of first condensed graphs (based on the set of second embeddings). The electronic device 102 may determine a swapped prediction loss associated with the plurality of perturbed graphs, based on the plurality of first embeddings and the plurality of clusters. The electronic device 102 may determine a set of first latent pseudo-labels associated with the plurality of perturbed graphs, based on the swapped prediction loss and may further determine a first similarity metric between the set of latent representations and the set of first latent pseudo-labels. The electronic device 102 may update the plurality of first condensed graphs based on the first similarity metric and render information related to the updated plurality of first condensed graphs. In an embodiment, the electronic device 102 may control the display device 112. The display device 112 may be communicatively coupled to the electronic device 102 or may be a standalone device. In an embodiment, the electronic device 102 may render the information on the display device 112.
Examples of the electronic device 102 may include, but may not be limited to, a computing device, a smartphone, a mainframe machine, a server, a consumer electronic (CE) device, a computer workstation, and / or a device with a graph-processing capability (such as, a device with a set of graphic processor units (GPU)).

[0028] The server 106 may include logic, circuitry, interfaces, and / or code configured to store an input graph (e.g., the input graph 114) associated with a certain application domain. In an example, the server 106 may store the input graph 114 on the database 108. In some embodiments, the server 106 may also store the GNN model 104 and / or a test graph on the database 108. The server 106 may be configured to retrieve data (for example, the input graph 114, the GNN model 104, and / or the test graph) from the database 108 and transmit the retrieved data to the electronic device 102.

[0029] The server 106 may be implemented as a cloud server and may execute operations through web applications, cloud applications, Hypertext Transfer Protocol (HTTP) requests, repository operations, file transfer, and the like. Other example implementations of the server 106 may include, but are not limited to, a database server, a file server, a web server, a media server, an application server, a mainframe server, a cloud computing server, and / or any device with a graph-processing capability (such as, a device with a set of graphic processor units (GPU)).

[0030] In at least one embodiment, the server 106 may be implemented as a plurality of distributed cloud-based resources by use of several technologies that are well known to those ordinarily skilled in the art. In certain embodiments, the functionalities of the server 106 may be incorporated in its entirety or at least partially in the electronic device 102, without a departure from the scope of the disclosure.

[0031] The database 108 may include suitable logic, circuitry, interfaces, and / or code that may be configured to store graph data. For example, the graph data stored on the database 108 may include the input graph 114 and / or the test graph. The database 108 may further store the GNN model 104. The database 108 may be derived from data off a relational or non-relational database, or a set of comma-separated values (csv) files in a conventional storage or a big-data storage. The database 108 may be stored or cached on a device, such as, the server 106 or the electronic device 102. The device storing the database 108 may be configured to receive a query for the graph data or the GNN model 104. In response, the device storing the database 108 may be configured to retrieve and transmit the graph data or the GNN model 104 to the electronic device 102.

[0032] In accordance with an embodiment, the database 108 may be hosted on a plurality of servers located at the same or different locations. The operations of the database 108 may be executed using hardware including a processor, a microprocessor (for example, to perform or control performance of one or more operations), a field-programmable gate array (FPGA), or an application-specific integrated circuit (ASIC). In some other instances, the database 108 may be implemented using software.

[0033] A person with ordinary skill in the art will understand that the scope of the disclosure may not be limited to the implementation of the database 108 and the server 106 (or the electronic device 102) as two separate entities. In certain embodiments, the functionalities of the database 108 can be incorporated in its entirety or at least partially in the server 106 (or the electronic device 102), without a departure from the scope of the disclosure.

[0034] The communication network 110 may include various communication media through which the electronic device 102 may communicate with the server 106 or devices storing the input graph 114. Examples of the communication network 110 may include, but are not limited to, the Internet, a cloud network, a Wireless Fidelity (Wi-Fi) network, a Personal Area Network (PAN), a Local Area Network (LAN), a cellular network (such as, a Long-Term Evolution (or 4G) cellular network or a 5G cellular network), a satellite network (such as a network of low earth orbit satellites), and / or a Metropolitan Area Network (MAN). Various devices in the environment 100 may connect to the communication network 110 using various wired and wireless communication protocols, including TCP / IP, UDP, HTTP, FTP, ZigBee, EDGE, IEEE 802.11, Li-Fi, IEEE 802.16, multi-hop communication, wireless access point (AP), device-to-device communication, cellular communication protocols, and Bluetooth.

[0035] The display device 112 may include logic, circuitry, interfaces, and / or code configured to display information, such as, the information related to the updated plurality of first condensed graphs, information related to a total loss, and a prediction result for a downstream task. The display device 112 may be a touch screen, which may enable a user to provide a user-input via the display device 112. The touch screen may be at least one of a resistive touch screen, a capacitive touch screen, or a thermal touch screen. The display device 112 may be realized through several known technologies such as, but not limited to, at least one of a Liquid Crystal Display (LCD) display, a Light Emitting Diode (LED) display, a plasma display, or an Organic LED (OLED) display technology, or other display devices. In accordance with an embodiment, the display device 112 may refer to a display screen of a head mounted device (HMD), a smart-glass device, a see-through display, a projection-based display, an electro-chromic display, or a transparent display.

[0036] The GNN model 104 may include suitable logic, circuitry, interfaces, and / or code that may be configured to classify or analyze input graph data to generate an output result for a particular real-time application. For example, a trained GNN model may recognize different nodes in the input graph data, and edges between each node in the input graph data. The edges may correspond to different connections or relationships between the nodes in the input graph data. Based on the recognized nodes and edges, the trained GNN model may classify different nodes within the input graph data, into different labels or classes. In an example, a particular node of the input graph data may include a set of features (e.g., a node feature matrix) associated therewith. The set of features may include, but is not limited to, a matrix indicative of the incoming and outgoing edges and respective edge weights for a particular node. Further, each edge may connect with different nodes having a similar set of features. The electronic device 102 may be configured to encode the set of features to generate a feature vector using the GNN model 104. After the encoding, information may be passed between the particular node and the neighboring nodes connected through the edges. Based on the information passed to the neighboring nodes, a final vector may be generated for each node. Such final vector may include information associated with the set of features for the particular node as well as the neighboring nodes, thereby providing reliable and accurate information associated with the particular node. As a result, the GNN model 104 may analyze the information represented as the input graph data.
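The passing of information between a node and its neighbors, described in the paragraph above, can be sketched as a single round of message passing. The sketch assumes mean aggregation over a node and its neighbors with no learned weights; the disclosure does not specify the aggregation function, and the names `message_passing_step`, `features`, and `neighbors` are hypothetical.

```python
def message_passing_step(features, neighbors):
    """Replace each node's vector with the mean of its own and its neighbors' vectors.

    features:  list of feature vectors, one per node.
    neighbors: list of neighbor index lists, one per node.
    """
    updated = []
    for node, own_vector in enumerate(features):
        group = [own_vector] + [features[n] for n in neighbors[node]]
        # Average each feature dimension across the node and its neighbors.
        updated.append([sum(values) / len(group) for values in zip(*group)])
    return updated


# A three-node path graph 0 - 1 - 2 with two features per node (illustrative values).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = [[1], [0, 2], [1]]
final_vectors = message_passing_step(features, neighbors)
```

After one round, each resulting vector mixes the features of the node with those of its direct neighbors, which corresponds to the "final vector" carrying information about both the node and its neighborhood. Practical GNN layers typically add learned weight matrices and non-linearities on top of this aggregation.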

[0037] The analysis of the input graph data by the GNN model 104 may include various stages, such as, but not limited to, data preparation, graph representation, message passing, node representation update, and graph-level prediction. An output of the analysis may be pseudo-labelled condensed graphs, latent node representations, and the like. For instance, when training the GNN model 104 using graphs from multiple distributed sources or time-evolving contexts (e.g., continual or dynamic graph learning), label distributions can vary significantly. Examples include differing criteria for loan approvals in financial sectors or varying definitions of anomalies in a time-evolving social network. The GNN model 104 may determine condensed graphs (for example, the plurality of first condensed graphs) and the plurality of perturbed graphs based on the input graphs (such as, the input graph 114). The GNN model 104 may also determine embeddings (for example, the plurality of first embeddings) for the perturbed graphs and embeddings (for example, the set of second embeddings) for the plurality of first condensed graphs. The GNN model 104 may be implemented using hardware including a processor, a microprocessor (e.g., to perform or control performance of one or more operations), a field-programmable gate array (FPGA), or an application-specific integrated circuit (ASIC). In some other instances, the GNN model 104 may be a code, a program, or a set of software instructions. The GNN model 104 may be implemented using a combination of hardware and software.

[0038] In some embodiments, the GNN model 104 may correspond to multiple classification layers for classification of different nodes in the input graph data, where each successive layer may use an output of a previous layer as input. Each classification layer may be associated with a plurality of edges, each of which may be further associated with a plurality of weights. During training, the GNN model 104 may be configured to filter or remove the edges or the nodes based on the input graph data and further provide an output result (i.e. a graph representation) of the GNN model 104. Examples of the GNN model 104 may include, but are not limited to, a graph convolution network (GCN), a graph spatial-temporal network with GCN, a recurrent neural network (RNN), a deep Bayesian neural network, and / or a combination of such networks.

[0039] In operation, the electronic device 102 may receive the input graph 114 that may be associated with an application domain. The input graph 114 may include nodes and relationships between the nodes. To process the input graph 114, the electronic device 102 may perform various operations such as, but not limited to, determination of a node feature matrix. The node feature matrix may capture features or attributes of each node in the input graph 114. For instance, in a social network graph, the node feature matrix may include information such as age, gender, and interests of each user. Graphs may be highly versatile data structures that may be prevalent in various real-world applications, such as social networks, bioinformatics, and knowledge graphs. These applications may often involve graphs with millions of nodes and edges, characterized by diverse node attributes and complex structural connections. The reception of the input graph is described further, for example, in FIG. 3.
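To make the node feature matrix from the social network example above concrete, the following sketch encodes one row per user from a numeric age and a multi-hot encoding of interests. The record layout, the interest vocabulary, and the function name `build_node_feature_matrix` are hypothetical and chosen only for illustration.

```python
def build_node_feature_matrix(users, interest_vocabulary):
    """Return one feature row per user: [age, interest_1, ..., interest_k]."""
    matrix = []
    for user in users:
        # 1.0 marks an interest the user has, 0.0 one the user lacks.
        interest_flags = [1.0 if interest in user["interests"] else 0.0
                          for interest in interest_vocabulary]
        matrix.append([float(user["age"])] + interest_flags)
    return matrix


# Illustrative user records; each user is one node of the social network graph.
users = [
    {"age": 31, "interests": {"music", "hiking"}},
    {"age": 24, "interests": {"chess"}},
]
vocabulary = ["chess", "hiking", "music"]
node_features = build_node_feature_matrix(users, vocabulary)
```

Each row of `node_features` then describes one node, and the relationships between users would be stored separately, for example as an edge list or adjacency matrix.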

[0040] The electronic device 102 may determine a plurality of first condensed graphs and a plurality of perturbed graphs, based on the input graph 114. One or more graph reduction methods may be used to efficiently accumulate knowledge from multiple sources by reducing graph size, thereby facilitating effective GNN model training. The graph reduction methods may be, for instance, graph sparsification, graph coarsening, graph sketching, and graph condensation. The graph sparsification may select a subset of existing nodes or edges from the input graph 114. The graph coarsening may learn a surjective mapping from the original graph to a coarse graph by merging multiple nodes into super-nodes. The graph sketching may summarize an original graph (or the input graph 114) into a compact representation while preserving key structural properties, often through node and edge selection based on criteria such as centrality or random sampling, and further compression. The graph reduction methods may be unsupervised or only require limited supervision. However, the following limitations of these methods may restrict their performance for real-world applications. The graph sparsification may become less effective in size reduction when nodes are associated with attributes. The graph reduction methods may focus on preserving certain graph properties within the small graph that may not be optimal for downstream GNN tasks. In practice, these methods may typically produce significantly lower performance than the existing graph condensation methods. The perturbed graphs may correspond to altered or modified graphs. The altered or modified graphs may include addition or removal of vertices or edges within the input graph, or adjustments to weights of edges in a weighted graph corresponding to the input graph 114. The determination of the plurality of first condensed graphs and the plurality of perturbed graphs is described further, for example, in FIG. 3.
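The perturbations listed at the end of the paragraph above (edge removal, edge addition, and edge weight adjustment) can be sketched as follows. The sketch assumes a weighted undirected graph stored as a dictionary of edges and uses uniformly random choices; the probabilities, the default weight of added edges, and the function name `perturb_graph` are hypothetical and not taken from the disclosure.

```python
import random


def perturb_graph(edges, num_nodes, drop_prob=0.2, add_count=1,
                  weight_noise=0.1, seed=0):
    """Return a perturbed copy of a weighted undirected graph.

    edges: dict mapping (u, v) with u < v to a float edge weight.
    """
    rng = random.Random(seed)
    perturbed = {}
    for (u, v), weight in edges.items():
        if rng.random() < drop_prob:
            continue  # edge removal
        # Edge weight adjustment by a small uniform offset.
        perturbed[(u, v)] = weight + rng.uniform(-weight_noise, weight_noise)
    added, attempts = 0, 0
    while added < add_count and attempts < 100:
        attempts += 1
        u, v = sorted(rng.sample(range(num_nodes), 2))
        if (u, v) not in edges and (u, v) not in perturbed:
            perturbed[(u, v)] = 1.0  # edge addition with an assumed default weight
            added += 1
    return perturbed


# Two differently seeded perturbed views of the same small input graph.
input_edges = {(0, 1): 1.0, (1, 2): 2.0, (2, 3): 1.5}
view_a = perturb_graph(input_edges, num_nodes=4, seed=1)
view_b = perturb_graph(input_edges, num_nodes=4, seed=2)
```

Generating several views with different seeds yields a plurality of perturbed graphs from a single input graph, which is the kind of input a swapped prediction loss operates on.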

[0041] The electronic device 102 may determine a plurality of embeddings (for example, the plurality of first embeddings) for the plurality of perturbed graphs and a set of embeddings (for example, the set of second embeddings) for the plurality of first condensed graphs. The determination of the plurality of embeddings is described further, for example, in FIG. 3.

[0042] The electronic device 102 may determine the plurality of clusters associated with the plurality of first embeddings. A cluster type may be determined for each of the plurality of cluster nodes. The type of cluster may be assigned to each embedding node of the plurality of embedding nodes, within the plurality of first embeddings or the set of second embeddings. The electronic device 102 may determine a swapped prediction loss associated with the plurality of perturbed graphs, based on the plurality of first embeddings and a type of cluster node assigned to a plurality of embedding nodes (within the plurality of first embeddings or the set of second embeddings). The electronic device 102 may determine a set of latent representations associated with the plurality of first condensed graphs, based on the set of second embeddings. The set of latent representations may be determined based on Dirac delta functions. The determination of the plurality of clusters and the determination of the set of latent representations are described further, for example, in FIG. 3.

[0043] The electronic device 102 may determine a swapped prediction loss associated with the plurality of perturbed graphs, based on the plurality of first embeddings and the plurality of clusters. For a given node, a clustering loss may be designed to bring the node’s embeddings closer irrespective of node augmentations by predicting their assigned pseudo-labels while repelling the other pseudo-labels. The determination of the swapped prediction loss is described in detail, for example, in FIG. 3 and FIG. 7.

[0044] The electronic device 102 may determine a set of first latent pseudo-labels associated with the plurality of perturbed graphs, based on the swapped prediction loss. Given the input graph 114, a pseudo-labelling technique may be used to assign the same pseudo-labels to similar nodes based on their representations in a latent embedding space. By clustering in this latent space, latent properties of the input graph 114 may be efficiently captured. The input graph 114 may be updated to map augmented nodes to pseudo-labels, followed by an adjustment to the pseudo-labels and the underlying GNN model 104 (where the GNN model 104 may be configured to predict pseudo-labels for augmented node embeddings using the embeddings of other augmented nodes). The determination of the set of first latent pseudo-labels is described further, for example, in FIG. 3.

[0045] The electronic device 102 may determine a first similarity metric between the set of latent representations and the set of first latent pseudo-labels. The electronic device 102 may update the plurality of first condensed graphs based on the first similarity metric. The electronic device 102 may render information related to the updated plurality of first condensed graphs. The determination of the first similarity metric, the update of the plurality of first condensed graphs, and the rendering of the information are described further, for example, in FIG. 3.

[0046] Modifications, additions, or omissions may be made to FIG. 1 without departing from the scope of the present disclosure. For example, the environment 100 may include more or fewer elements than those illustrated and described in the present disclosure. For instance, in some embodiments, the environment 100 may include the electronic device 102 but not the database 108 and / or the server 106. In addition, in some embodiments, the functionality of each of the database 108 and / or the server 106 may be incorporated into the electronic device 102, without a deviation from the scope of the disclosure.

[0047] FIG. 2 is a block diagram that illustrates an exemplary electronic device of FIG. 1 for self-supervised graph condensation, arranged in accordance with at least one embodiment described in the present disclosure. FIG. 2 is explained in conjunction with elements from FIG. 1. With reference to FIG. 2, there is shown a block diagram 200 of the electronic device 102. The electronic device 102 may include a processor 202, a memory 204, the GNN model 104, an input / output (I / O) device 206, and a network interface 208. The I / O device 206 may include the display device 112. The memory 204 may include the input graph 114, condensed graphs 204A, and perturbed graphs 204B.

[0048] The processor 202 may include suitable logic, circuitry, interfaces, and / or code that may be configured to execute program instructions associated with different operations to be executed by the electronic device 102. The operations may include, but are not limited to, input graph reception, first condensed graph and perturbed graph determination, first embeddings determination, clusters determination, latent representations determination, swapped prediction loss determination, first latent pseudo-labels determination, first similarity metric determination, first condensed graphs update, and rendering information. The processor 202 may include any suitable special-purpose or general-purpose computer, computing entity, or processing device, including various computer hardware or software modules, and may be configured to execute instructions stored on any applicable computer-readable storage media. For example, the processor 202 may include a microprocessor, a microcontroller, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a Field-Programmable Gate Array (FPGA), or any other digital or analog circuitry configured to interpret and / or to execute program instructions and / or to process data.

[0049] In some embodiments, the processor 202 may be configured to interpret and / or execute program instructions and / or process data stored in the memory 204. In some embodiments, the processor 202 may fetch program instructions from the GNN model 104 and load the program instructions in the memory 204. After the program instructions are loaded into memory 204, the processor 202 may execute the program instructions. Some of the examples of the processor 202 may be a Graphical Processing Unit (GPU), a Central Processing Unit (CPU), a Reduced Instruction Set Computer (RISC) processor, an Application-Specific Integrated Circuit (ASIC) processor, a Complex Instruction Set Computer (CISC) processor, a coprocessor, and / or a combination thereof.

[0050] The memory 204 may include suitable logic, circuitry, interfaces, and / or code that may be configured to store program instructions executable by the processor 202. In certain embodiments, the memory 204 may be configured to store information, such as, but not limited to, the input graph 114, the condensed graphs 204A, and the perturbed graphs 204B. The memory 204 may further store information related to the updated plurality of first condensed graphs, information related to a determined total loss, and prediction results for a downstream task.

[0051] By way of example, and not limitation, such computer-readable storage media may include tangible or non-transitory computer-readable storage media, including Random Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM) or other optical disk storage, magnetic disk storage or other magnetic storage devices, or flash memory devices (e.g., solid state memory devices). The computer-readable storage media may also include any other storage medium which may be used to carry or store particular program code in the form of computer-executable instructions or data structures, and which may be accessed by a general-purpose or special-purpose computer. Combinations of the above may also be included within the scope of computer-readable storage media. Computer-executable instructions may include, for example, instructions and data configured to cause the processor 202 to perform a certain operation or group of operations associated with the electronic device 102.

[0052] The I / O device 206 may include suitable logic, circuitry, interfaces, and / or code that may be configured to receive a user input. For example, the user input may indicate a selection of (or instructions to create) the input graph 114. The I / O device 206 may be further configured to provide an output in response to the user input. For example, the output may correspond to the information related to the updated condensed graph, determined total loss, and prediction results. The I / O device 206 may include various input and output devices, which may be configured to communicate with the processor 202 and other components, such as the network interface 208. Examples of the input devices may include, but are not limited to, a touch screen, a keyboard, a mouse, a joystick, and / or a microphone. Examples of the output devices may include, but are not limited to, the display device 112 and a speaker. The I / O device 206 may be configured within the electronic device 102 or outside of the electronic device 102.

[0053] The network interface 208 may communicate via wireless communication with networks, such as the Internet, an Intranet, and / or a wireless network, such as a cellular telephone network, a wireless local area network (LAN) and / or a metropolitan area network (MAN). The wireless communication may use any of a plurality of communication standards, protocols and technologies, such as Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), wideband code division multiple access (W-CDMA), Long Term Evolution (LTE), code division multiple access (CDMA), time division multiple access (TDMA), Bluetooth, Wireless Fidelity (Wi-Fi) (such as IEEE 802.11a, IEEE 802.11b, IEEE 802.11g and / or IEEE 802.11n), voice over Internet Protocol (VoIP), light fidelity (Li-Fi), or Wi-MAX.

[0054] In certain embodiments, the electronic device 102 may include the display device 112, the server 106, and the database 108. Modifications, additions, or omissions may be made to the electronic device 102, without departing from the scope of the present disclosure. For example, in some embodiments, the electronic device 102 may include any number of other components that may not be explicitly illustrated or described. Operations of the processor 202 for self-supervised graph condensation are described in detail, for example, in FIG. 3, FIG. 4A, FIG. 4B, FIG. 5A, FIG. 5B, FIG. 6, FIG. 7, FIG. 8, FIG. 9, FIG. 10, and FIG. 11.

[0055] FIG. 3 is a diagram that illustrates a pipeline diagram of an example method for self-supervised graph condensation, in accordance with an embodiment of the disclosure. FIG. 3 is described in conjunction with elements from FIG. 1 and FIG. 2. With reference to FIG. 3, an exemplary execution pipeline 300 is shown. The execution pipeline 300 may include a set of operations that may be executed by one or more components of FIG. 1, such as the electronic device 102. The operations may include, but are not limited to, input graph reception, first condensed graph and perturbed graph determination, first embeddings determination, clusters determination, latent representations determination, swapped prediction loss determination, first latent pseudo-labels determination, first similarity metric determination, first condensed graphs update and rendering information. The electronic device 102 may perform the set of operations for self-supervised graph condensation.

[0056] At 302, an operation of input graph reception may be performed. The electronic device 102 may be configured to receive an input graph (for example, an original graph, such as the input graph 114) associated with an application domain. Examples of the application domain may include, but are not limited to, financial domain, social network domain, recommendation system domain, biological domain, bio-informatics domain, chemistry domain, bio-chemistry domain, material science domain, or citation network domain. In an embodiment, the electronic device 102 may receive the input graph 114 from the database 108, via the server 106. The received input graph 114 may be stored on the memory 204. In another embodiment, the input graph 114 may be pre-stored in the memory 204 and retrieved from the memory 204 for further processing.

[0057] At 304A and 304B, an operation of determination of perturbed graphs (e.g., “G1” and “G2”) may be performed. The electronic device 102 may be configured to determine the perturbed graphs “G1” and “G2”, for example. The perturbed graphs may be obtained by making modifications to a structure of the original graph (or the input graph 114). The modification may include addition or removal of edges or nodes or alteration of the weights of edges of the input graph 114. Exemplary methods to determine perturbed graphs may include, but are not limited to, edge perturbation, node perturbation, weight perturbation, and subgraph extraction. The perturbations may be made to evaluate a robustness and reliability of GNNs, especially in scenarios where the graph data is noisy or subject to adversarial attacks.
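
By way of a non-limiting sketch, edge perturbation may be illustrated in Python as follows. The function name perturb_edges, the drop probability, the number of added edges, and the random seeds are assumptions chosen for illustration; the disclosure does not specify these values or this particular procedure.

```python
import random


def perturb_edges(num_nodes, edges, drop_prob, num_added, rng):
    """Return a perturbed edge list: drop existing edges at random, then add new ones.

    Illustrative edge perturbation only; the original edge list is left unmodified.
    """
    # Edge removal: keep each edge with probability (1 - drop_prob).
    kept = [edge for edge in edges if rng.random() >= drop_prob]
    existing = {frozenset(edge) for edge in kept}
    # Edge addition: sample node pairs that are not already connected.
    candidates = [
        (u, v)
        for u in range(num_nodes)
        for v in range(u + 1, num_nodes)
        if frozenset((u, v)) not in existing
    ]
    added = rng.sample(candidates, min(num_added, len(candidates)))
    return kept + added


edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
# Two differently seeded views of the same input graph.
view_1 = perturb_edges(5, edges, drop_prob=0.3, num_added=1, rng=random.Random(1))
view_2 = perturb_edges(5, edges, drop_prob=0.3, num_added=1, rng=random.Random(2))
```

Using a separate seeded generator for each view keeps the two perturbed graphs reproducible while still differing from each other in general.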

[0058] At 306A and 306B, an operation of determination of first embeddings for the perturbed graphs may be performed. The electronic device 102 may be configured to determine a plurality of embeddings (for example, the plurality of first embeddings, such as, “Z1” and “Z2”) for the plurality of perturbed graphs and a set of second embeddings for condensed graphs (for example, the plurality of first condensed graphs). The GNN model 104 may be trained using the input graph 114. The determination of the plurality of first embeddings may involve passing the plurality of perturbed graphs (e.g., “G1” and “G2”) through GNN layers, which may aggregate and transform node features based on the graph structure. Once the GNN model 104 is trained, the trained GNN model 104 may be used to generate embeddings for the nodes in the plurality of perturbed graphs. Typically, the embeddings may be extracted from the last layer of the GNN model 104 before a final classification or regression step. The embeddings may be used for various downstream tasks, such as node classification, link prediction, and the like.
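
The aggregate-and-transform behaviour of a GNN layer may be sketched as below. This is a simplified mean-aggregation layer with a ReLU non-linearity, offered only as an assumption about what such a layer could look like; the layer type, the weight values, and the helper names (normalized_adjacency, gnn_layer, encode) are hypothetical and are not specified by the disclosure. The same weights are applied to both perturbed views, mirroring the parameter sharing between instances of the model.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(num_nodes, edges):
    """Adjacency with self-loops, row-normalized so each node averages over its neighbourhood."""
    adj = [[1.0 if i == j else 0.0 for j in range(num_nodes)] for i in range(num_nodes)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1.0
    return [[value / sum(row) for value in row] for row in adj]


def gnn_layer(adj, features, weight):
    """One layer: aggregate neighbour features, apply a linear transform, then ReLU."""
    transformed = matmul(matmul(adj, features), weight)
    return [[max(0.0, value) for value in row] for row in transformed]


def encode(num_nodes, edges, features, weights):
    """Stack layers; the output of the last layer is used as the node embedding."""
    adj = normalized_adjacency(num_nodes, edges)
    hidden = features
    for weight in weights:
        hidden = gnn_layer(adj, hidden, weight)
    return hidden


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
shared_weights = [[[0.5, -0.5, 1.0], [1.0, 0.5, -1.0]]]  # one 2x3 layer, illustrative values

# Two perturbed views of the same three-node graph, encoded with shared parameters.
z_1 = encode(3, [(0, 1), (1, 2)], features, shared_weights)
z_2 = encode(3, [(0, 1)], features, shared_weights)
```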

[0059] The electronic device 102 may be further configured to determine a plurality of first condensed graphs (such as, the condensed graphs 204A, for example, “X_CG”). The plurality of first condensed graphs may be trained for use in downstream tasks for specific prediction models. One or more graph reduction methods may be used to efficiently accumulate knowledge from multiple sources by reducing graph size, thereby facilitating effective training of the GNN model 104. The graph reduction methods may be, for instance, graph sparsification, graph coarsening, graph sketching, and graph condensation. The graph sparsification may select a subset from existing nodes or edges from the input graph 114. The graph coarsening may learn a surjective mapping from the original graph to a coarse graph by merging multiple nodes into super-nodes. The graph sketching may summarize an original graph (or the input graph 114) into a compact representation while preserving key structural properties, often through node and edge selection based on criteria such as centrality or random sampling, and further compression. The graph reduction methods may be unsupervised or only require limited supervision.
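
To make the coarsening step concrete, the following Python sketch merges nodes into super-nodes under a given surjective assignment. The assignment is supplied by hand here, whereas a coarsening method would learn it; averaging the member features and the function name coarsen are assumptions made for illustration.

```python
def coarsen(features, edges, assignment):
    """Merge nodes into super-nodes given a surjective node -> super-node assignment.

    Super-node features are the mean of member features (an illustrative choice);
    edges inside a super-node are dropped and parallel edges are merged.
    """
    num_super = max(assignment) + 1
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_super)]
    counts = [0] * num_super
    for node, super_node in enumerate(assignment):
        counts[super_node] += 1
        for j, value in enumerate(features[node]):
            sums[super_node][j] += value
    # Surjectivity: every super-node must receive at least one original node.
    if any(count == 0 for count in counts):
        raise ValueError("assignment is not surjective")
    super_features = [[v / counts[s] for v in sums[s]] for s in range(num_super)]
    super_edges = sorted(
        {
            tuple(sorted((assignment[u], assignment[v])))
            for u, v in edges
            if assignment[u] != assignment[v]
        }
    )
    return super_features, super_edges


features = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
edges = [(0, 1), (1, 2), (2, 3)]
assignment = [0, 0, 1, 1]  # nodes 0,1 -> super-node 0; nodes 2,3 -> super-node 1

coarse_features, coarse_edges = coarsen(features, edges, assignment)
```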

[0060] The electronic device 102 may be configured to feed the plurality of first condensed graphs to the GNN model 104. Similar to the determination of the plurality of first embeddings (e.g., “Z1” and “Z2”), the electronic device 102 may determine a set of second embeddings (e.g., “Z_CG”) for the plurality of first condensed graphs, using the GNN model 104. A person having ordinary skill in the art may understand that multiple instances of the GNN model 104 may be used for the determination of the plurality of first embeddings and the set of second embeddings. Further, each instance of the GNN model 104 may share parameters with the other instances of the GNN model 104.

[0061] At 308A and 308B, an operation of determination of clusters (e.g., “Q1” and “Q2”) associated with the plurality of first embeddings (e.g., “Z1” and “Z2”) may be performed. The electronic device 102 may be configured to determine the plurality of clusters associated with the plurality of first embeddings. The clusters may be determined based on the determined embeddings. In some embodiments, the clustering may be performed using the embedding “Z1” and cluster “Q2” and the embedding “Z2” and cluster “Q1”. The determination of the clusters is explained further, for example, in FIG. 7 and FIG. 9.

[0062] At 310, an operation of determination of latent pseudo-labels may be performed. The electronic device 102 may be configured to determine latent pseudo-labels (e.g., a set of first latent pseudo-labels associated with the plurality of perturbed graphs). The latent pseudo-labels may efficiently cluster similar input nodes of node embeddings among ‘K’ distinct groups such that each latent pseudo-label captures representative properties of its corresponding group of similar nodes. In order to capture different aspects of their properties in the latent space, the latent pseudo-labels may be represented as d-dimensional vectors, i.e., $Y \in \mathbb{R}^{K \times d}$. The latent pseudo-labels may be mapped to at least one node of a condensed graph (for example, the plurality of first condensed graphs). The determination of the latent pseudo-labels is described further, for example, in FIG. 7.
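
The grouping of node embeddings among K pseudo-labels, each stored as a d-dimensional vector, may be sketched as follows. A k-means-style nearest-vector assignment and mean update is used here purely as a stand-in for the clustering of the disclosure, which is described with reference to FIG. 7; the function names and toy values are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_pseudo_labels(embeddings, pseudo_labels):
    """Assign each node embedding to the index of its nearest pseudo-label vector."""
    return [
        min(range(len(pseudo_labels)), key=lambda k: squared_distance(z, pseudo_labels[k]))
        for z in embeddings
    ]


def update_pseudo_labels(embeddings, assignment, pseudo_labels):
    """Move each pseudo-label to the mean of its assigned embeddings (kept as-is if empty)."""
    updated = []
    for k, old in enumerate(pseudo_labels):
        members = [z for z, label in zip(embeddings, assignment) if label == k]
        if not members:
            updated.append(list(old))
            continue
        updated.append([sum(column) / len(members) for column in zip(*members)])
    return updated


# Four node embeddings with d = 2, grouped among K = 2 pseudo-labels.
embeddings = [[0.0, 0.0], [0.2, 0.0], [4.0, 4.0], [4.0, 4.2]]
Y = [[0.0, 0.0], [5.0, 5.0]]  # K x d matrix of pseudo-label vectors

assignment = assign_pseudo_labels(embeddings, Y)
Y = update_pseudo_labels(embeddings, assignment, Y)
```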

[0063] At 312, an operation of determination of swapped prediction loss may be performed. The electronic device 102 may be configured to determine the swapped prediction loss associated with the plurality of perturbed graphs, based on the plurality of first embeddings and the plurality of clusters. The swapped prediction loss may correspond to a first loss (e.g., “Loss 1”) associated with the plurality of perturbed graphs. The swapped prediction loss associated with the plurality of perturbed graphs may be determined, based on the plurality of first embeddings and a type of cluster node assigned to the plurality of embedding nodes. The determination of the swapped prediction loss is described further, for example, in FIG. 7.
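
One possible reading of a swapped prediction loss is sketched below: the cluster assignment computed from one perturbed view is predicted from the embedding of the other view, and vice versa. The hard (one-hot) assignment, the softmax temperature, the prototype vectors, and all function names are assumptions for illustration; the exact formulation of the disclosure is the one described with reference to FIG. 7.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cluster_probs(z, prototypes, temperature):
    """Softmax over similarity scores between an embedding and each cluster prototype."""
    scores = [dot(z, c) / temperature for c in prototypes]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hard_code(z, prototypes):
    """One-hot cluster assignment (a simple stand-in for the clustering step)."""
    scores = [dot(z, c) for c in prototypes]
    best = scores.index(max(scores))
    return [1.0 if k == best else 0.0 for k in range(len(prototypes))]


def cross_entropy(code, probs):
    return -sum(q * math.log(p) for q, p in zip(code, probs) if q > 0.0)


def swapped_prediction_loss(z_1, z_2, prototypes, temperature=0.1):
    """Predict the assignment of view 2 from view 1 and the assignment of view 1 from view 2."""
    total = 0.0
    for a, b in zip(z_1, z_2):
        q_1, q_2 = hard_code(a, prototypes), hard_code(b, prototypes)
        p_1 = cluster_probs(a, prototypes, temperature)
        p_2 = cluster_probs(b, prototypes, temperature)
        total += cross_entropy(q_2, p_1) + cross_entropy(q_1, p_2)
    return total / (2 * len(z_1))


prototypes = [[1.0, 0.0], [0.0, 1.0]]
z_1 = [[0.9, 0.1], [0.1, 0.9]]
z_2_consistent = [[0.8, 0.2], [0.2, 0.8]]    # views agree on the cluster of each node
z_2_inconsistent = [[0.2, 0.8], [0.8, 0.2]]  # views disagree on the cluster of each node

loss_consistent = swapped_prediction_loss(z_1, z_2_consistent, prototypes)
loss_inconsistent = swapped_prediction_loss(z_1, z_2_inconsistent, prototypes)
```

In this toy setting the loss is small when both views of a node fall in the same cluster and large when they do not, which is the behaviour that pulls a node's embeddings together across augmentations.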

[0064] At 314, an operation of latent representation determination may be performed. The electronic device 102 may be configured to determine a set of latent representations associated with the plurality of first condensed graphs, based on the set of second embeddings. The set of latent representations may be determined based on the GNN model 104. A pseudo-label learning may be used as a proxy of latent representative of the original graph. This step involves learning pseudo-labels in a self-supervised manner. These pseudo-labels may be inferred from a condensed graph’s inherent structure and features. The set of latent representations may be determined based on Dirac delta functions. The determination of latent representations is described further, for example, in FIG. 7.

[0065] At 316, an operation of first similarity metric determination may be performed. The electronic device 102 may be configured to determine a similarity metric (for example, a first similarity metric) between the set of latent representations and the set of first latent pseudo-labels. The first similarity metric may correspond to a second loss (e.g., “Loss 2”) between the set of latent representations (determined, for example, at 314) and the set of first latent pseudo-labels (determined, for example, at 310). The latent representations may be associated with the plurality of first condensed graphs. In an example, each of the set of latent representations and the set of first latent pseudo-labels may correspond to vectors in a multi-dimensional latent space. The first similarity metric may correspond to a distance metric (such as, for example, a Euclidean distance, a Euler distance, a Manhattan distance, or any other vector distance metric) between first vectors of the set of latent representations and second vectors of the set of first latent pseudo-labels. The first similarity metric may help in refining the plurality of first condensed graphs to better capture the underlying structure of the input graph 114. The plurality of first condensed graphs may then be updated based on the first similarity metric. Finally, the electronic device 102 may render, on the display device 112, information related to the updated condensed graphs, making the updated condensed graphs available for various downstream tasks.
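
As a brief sketch of a distance-based similarity metric, the Python below averages a vector distance between paired latent representations and pseudo-labels. The one-to-one pairing by index and the function name matching_loss are assumptions made for illustration; only the Euclidean and Manhattan distances named above are shown.

```python
import math


def euclidean(a, b):
    """Euclidean (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan (L1) distance between two vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def matching_loss(latents, pseudo_labels, distance=euclidean):
    """Mean distance between each latent representation and its paired pseudo-label.

    Assumes the i-th latent representation is matched with the i-th pseudo-label.
    """
    pairs = list(zip(latents, pseudo_labels))
    return sum(distance(h, y) for h, y in pairs) / len(pairs)


latents = [[0.0, 0.0], [3.0, 4.0]]        # from the condensed graphs
pseudo_labels = [[3.0, 4.0], [3.0, 4.0]]  # learned from the perturbed graphs

loss_l2 = matching_loss(latents, pseudo_labels, euclidean)
loss_l1 = matching_loss(latents, pseudo_labels, manhattan)
```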

[0066] Existing graph condensation methods may be used in the field of graph representation learning to reduce the complexity of large graphs while preserving their essential structural and attribute properties. The graph condensation methods may rely on supervised information and training for specific downstream tasks. The graph condensation may focus on creating a condensed version of the graph that can be used for various applications without prior task-specific training. The process involves identifying and merging similar nodes and edges based on their structural and attribute similarities, effectively creating a smaller, more manageable graph that retains the key characteristics of the original graph. The condensed graphs may then be used for a range of downstream tasks such as node classification, link prediction, and community detection. However, the traditional graph condensation methods may be supervised and may require node / graph level labelling information to condense large graphs. The supervision requirement for existing graph condensation methods may restrict their generalization in noisy environments for different downstream tasks.

[0067] To address these challenges, a task-agnostic Self-Supervised Graph Condensation (SSGC) technique that efficiently condenses an input graph within a small memory budget without requiring any node-level label information is proposed in this disclosure. This technique may be particularly useful in scenarios where input graphs arrive at different timestamps, making it impractical to store or process historical graphs jointly with new incoming graphs, or when input graphs are distributed across different data centers, preventing joint training using all graphs. The SSGC method may involve producing a condensed graph from a large input graph using back-propagation by matching the set of first latent pseudo-labels learned from the original graph. For downstream tasks, a predictive graph neural network (GNN) model may be trained using the condensed graph, followed by fine-tuning with limited supervision corresponding to the specific tasks.
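
The idea of producing a condensed graph by back-propagation, so that its latent representations match the learned pseudo-labels, may be sketched as a small gradient descent loop. For brevity the encoder is assumed to be a fixed linear map W with no neighbourhood aggregation, and the gradient is written out by hand; the learning rate, step count, and all names are hypothetical and are not taken from the disclosure.

```python
def encode_row(x, weight):
    """Apply a fixed linear encoder (an assumed stand-in for the GNN) to one feature row."""
    return [sum(x[i] * weight[i][j] for i in range(len(x))) for j in range(len(weight[0]))]


def matching_loss(condensed, weight, targets):
    """Mean squared Euclidean distance between encoded condensed nodes and pseudo-labels."""
    total = 0.0
    for x, y in zip(condensed, targets):
        total += sum((h - t) ** 2 for h, t in zip(encode_row(x, weight), y))
    return total / len(condensed)


def gradient_step(condensed, weight, targets, learning_rate):
    """One gradient descent update of the condensed node features (encoder kept fixed)."""
    updated = []
    scale = 2.0 / len(condensed)
    for x, y in zip(condensed, targets):
        residual = [h - t for h, t in zip(encode_row(x, weight), y)]
        # d(loss)/d(x_i) = (2 / n) * sum_j residual_j * W[i][j]
        grad = [
            scale * sum(residual[j] * weight[i][j] for j in range(len(residual)))
            for i in range(len(x))
        ]
        updated.append([xi - learning_rate * g for xi, g in zip(x, grad)])
    return updated


W = [[1.0, 0.5], [0.0, 1.0]]          # fixed encoder weights, illustrative values
Y = [[1.0, 0.0], [0.0, 1.0]]          # pseudo-labels, one per condensed node
X_cg = [[0.0, 0.0], [0.0, 0.0]]       # condensed node features, initialized to zero

initial_loss = matching_loss(X_cg, W, Y)
for _ in range(200):
    X_cg = gradient_step(X_cg, W, Y, learning_rate=0.2)
final_loss = matching_loss(X_cg, W, Y)
```

Only the condensed features are updated in this sketch; in a fuller treatment the condensed structure could be updated as well.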

[0068] The disclosed approach may offer several advantages. The condensed graph obtained based on the disclosed approach may retain essential structural and attribute properties, making it applicable to a variety of downstream tasks such as node classification, link prediction, and community detection, without the need for task-specific training. Since graph condensation does not rely on labelled data, it may be particularly useful in environments where data is noisy, or labels are sparse or unavailable. This makes it suitable for distributed data centers where data quality and labelling can vary significantly. The electronic device 102 may receive an input graph associated with an application domain. It then determines a plurality of first condensed graphs and perturbed graphs based on the input graph. This reduces the complexity of the original input graph while preserving its essential features. The electronic device 102 may compute embeddings for both the perturbed graphs and the first condensed graphs. Such embeddings may capture the structural and attribute information of the graphs in a lower-dimensional space. The electronic device 102 may determine clusters based on the embeddings of the perturbed graphs, which helps in identifying similar nodes and structures within the graph. The electronic device 102 may calculate latent representations for the condensed graphs based on their embeddings. The disclosed approach may also provide for the determination of the swapped prediction loss for the perturbed graphs, which is used to generate latent pseudo-labels. The pseudo-labels may act as a form of weak supervision, guiding the learning process without requiring explicit labels. The electronic device 102 may compute the first similarity metric between the set of latent representations and the set of pseudo-labels. The first similarity metric may help in refining the plurality of first condensed graphs to better capture the underlying structure of the original graph (for example, the input graph 114). The plurality of first condensed graphs may then be updated based on the first similarity metric. Finally, the electronic device 102 may render the information related to the updated condensed graphs, making them available for various downstream tasks. The described approach effectively utilizes the advantages of graph condensation by generalizing the condensed graphs for any downstream tasks and making them suitable for distributed data centers with noisy or no labels. By creating embeddings, clusters, and latent representations, and using pseudo-labels for weak supervision, the method ensures that the condensed graphs retain essential properties of the original graph, enabling efficient and versatile graph representation learning.

[0069] FIG. 4A and FIG. 4B are diagrams that collectively illustrate an exemplary scenario for graph condensation and graph neural network training for a downstream task based on the graph condensation, in accordance with an embodiment of the disclosure. FIG. 4A and FIG. 4B are described in conjunction with elements from FIG. 1, FIG. 2, and FIG. 3. With reference to FIG. 4A and FIG. 4B, an exemplary scenario 400 is shown.

[0070] With reference to FIG. 4A, the scenario 400 includes a set of source input graphs (for example, a source input graph 1,... a source input graph n,...and a source input graph N), which may correspond to a set of original graphs (for example, an original graph 402A,...an original graph 402n,...and an original graph 402N, respectively). Hereinafter, the source input graph 1,... the source input graph n,...and the source input graph N may be interchangeably referred to as a source 1 402A,...a source n 402n,...and a source N 402N, respectively. FIG. 4A further includes the plurality of first condensed graphs (e.g., a condensed graph 404A,...a condensed graph 404n,...and a condensed graph 404N). FIG. 4A illustrates a first training phase of the GNN model 104 in which condensed graphs (i.e., the plurality of first condensed graphs) and corresponding latent pseudo-labels may be learned from the set of source input graphs. Herein, the processor 202 may apply graph condensation techniques on the set of source input graphs to obtain the plurality of first condensed graphs and latent cluster prototypes (i.e., the plurality of clusters and / or the set of latent representations) for each of the source input graphs.

[0071] The processor 202 of the electronic device 102 may be configured to determine each of the condensed graphs (e.g., the plurality of first condensed graphs), from the set of source input graphs (i.e., the set of original graphs). The plurality of first condensed graphs may be determined based on an application of one or more graph condensation techniques on the set of source input graphs. For example, the one or more graph condensation techniques may include, but may not be limited to, graph sparsification, graph coarsening, graph sketching, and graph reduction. The graph sparsification may select a subset from existing nodes or edges from each of the set of source input graphs. The graph coarsening may learn a surjective mapping from each of the set of source input graphs to a coarse graph by merging multiple nodes into super-nodes. The graph sketching may summarize an original graph (e.g., the original graph 402A) into a compact representation while preserving key structural properties, often through node and edge selection based on criteria such as centrality or random sampling, and further compression. The graph reduction methods may be unsupervised or only require limited supervision.

[0072] The one or more graph condensation techniques may be applied on the set of source input graphs to create smaller graphs (i.e., condensed graphs). Each condensed graph may retain essential structural and feature information of the corresponding original graph, making it useful for various downstream tasks while reducing the computational complexity. For a large source input graph, the plurality of first condensed graphs may be produced using a back-propagation technique by matching the latent pseudo-labels (as described further, for example, in FIG. 3 at 310), learned from the corresponding source input graph.

[0073] With reference to FIG. 4B, the scenario 400 may include a memory buffer 406, a first labelled graph 408A, a second labelled graph 410A, node prediction tasks 408, and link prediction tasks 410. In an embodiment, the memory buffer 406 may be included in the memory 204. Alternatively, the memory buffer 406 may be separate from the memory 204 and may be communicatively coupled and associated with the memory 204. The processor 202 may be configured to store the plurality of first condensed graphs along with the corresponding latent pseudo-labels in the memory buffer 406. The first labelled graph 408A and the second labelled graph 410A may correspond to a downstream labelled graph for the node prediction tasks 408 and another downstream labelled graph for the link prediction tasks 410. FIG. 4B illustrates a second training phase including a fine-tuning of the GNN model 104 on downstream tasks, based on the plurality of first condensed graphs (and the latent cluster prototypes, i.e., the plurality of clusters and / or the set of latent representations) and downstream labelled graphs (such as, the first labelled graph 408A and the second labelled graph 410A).

[0074] Examples of the downstream tasks may include, but are not limited to, fraud detection and user profiling in a financial application domain, and customized user profiling for advertisement selection and product recommendations in an e-commerce application domain. The downstream tasks may also be applicable to a medical application domain, which may include, but is not limited to, a first task for learning from multiple data sources where data sharing may be prohibited due to users’ privacy or large size of input data, and a second task where inputs are not labelled or incomplete.

[0075] In an example, for the downstream tasks, a predictive graph neural network (GNN) model (e.g., the GNN model 104) may be trained using the plurality of first condensed graphs, followed by fine-tuning with limited supervision corresponding to the specific tasks. For example, the processor 202 may train and fine-tune a first instance of the GNN model 104 for the node prediction tasks 408, based on the plurality of first condensed graphs (and corresponding latent pseudo-labels) and the first labelled graph 408A. Further, the processor 202 may train and fine-tune a second instance of the GNN model 104 for the link prediction tasks 410, based on the plurality of first condensed graphs (and corresponding latent pseudo-labels) and the second labelled graph 410A. In the current scenario, the trained first instance of the GNN model 104 may be applied on an input graph for the node prediction tasks 408, while the trained second instance of the GNN model 104 may be applied on an input graph for the link prediction tasks 410.

[0076] Typically, the node prediction tasks 408 and the link prediction tasks 410 may suffer significant performance degradation in noisy conditions. In contrast, the disclosure proposes removal of a dependency on supervised labels for graph condensation. The proposed pseudo-labelled graph condensation (PLGC) technique may consistently deliver superior and stable performance in noisy circumstances, and still remain competitive with the supervised method when trained on clean node labels.

[0077] It should be noted that the scenario 400 of FIG. 4A and FIG. 4B is for exemplary purposes and should not be construed as limiting the scope of the disclosure.

[0078] FIG. 5A and FIG. 5B are diagrams that collectively illustrate an exemplary scenario for use of condensed graphs to fine-tune a graph neural network model for downstream tasks, in accordance with an embodiment of the disclosure. FIG. 5A and FIG. 5B are described in conjunction with elements from FIG. 1, FIG. 2, FIG. 3, FIG. 4A, and FIG. 4B. With reference to FIG. 5A and FIG. 5B, an exemplary scenario 500 is shown.

[0079] With reference to FIG. 5A, the scenario 500 includes the source input graphs (for example, a source input graph 502A,...a source input graph 502n,...and a source input graph 502N). The scenario 500 further includes the plurality of first condensed graphs (for example, a condensed graph 504A,...a condensed graph 504n,...and a condensed graph 504N) and latent cluster prototypes 506A,...506n,...and 506N. The scenario 500 further includes a labelled graph 508 associated with a downstream task with limited supervision. The scenario 500 further includes a GNN downstream model 510, a classifier head 512, and a supervised loss 514.

[0080] The processor 202 of the electronic device 102 may be configured to receive the source input graphs 502A,...502n,...and 502N. The processor 202 may determine the plurality of first condensed graphs (for example, the condensed graphs 504A,...504n,...and 504N) based on an application of one or more graph condensation techniques on the source input graphs 502A,...502n,...and 502N, respectively. The determination of the plurality of first condensed graphs is described further, for example, in FIG. 3 and FIG. 4A. The processor 202 may be configured to determine the latent cluster prototypes 506A,...506n,...and 506N associated with the plurality of first condensed graphs (for example, the condensed graphs 504A,...504n,...and 504N). The determination of the latent cluster prototypes is described further, for example, in FIG. 3 and FIG. 4A.

[0081] The processor 202 may be configured to fine-tune the GNN downstream model 510 on a downstream task with limited supervision, based on the plurality of first condensed graphs (for example, the condensed graphs 504A,...504n,...and 504N), the latent cluster prototypes 506A,...506n,...and 506N, and downstream labelled graphs (such as, the labelled graph 508). The processor 202 may train the classifier head 512, based on the fine-tuning of the GNN downstream model 510. The classifier head 512 may correspond to a task-specific module (e.g., a set of dense neural network layers) for specific supervision tasks with a limited amount of labelled data. The processor 202 may train the classifier head 512 by minimizing the supervised loss 514. Herein, the supervised loss 514 may include an objective function corresponding to the labelled information of the downstream task for limited supervised learning of the classifier head 512. The objective function may correspond to an error or a difference between the set of latent representations and the set of first latent pseudo-labels. The minimization of the supervised loss 514 may lead to a maximization of a similarity (e.g., the first similarity metric) between the set of latent representations and the set of first latent pseudo-labels. The supervised loss 514 may correspond to a difference between a labelled input graph (e.g., the labelled graph 508) of the downstream task and a third embedding (i.e., a node embedding of the labelled input graph).

[0082] With reference to FIG. 5B, the scenario 500 further includes an input graph 516, the GNN downstream model 510, the classifier head 512, and a prediction 518 (for a downstream task). The processor 202 may receive the input graph 516 from a plurality of sources including, but not limited to, the database 108 or the memory 204, for example. The processor 202 may receive the supervised loss 514. The processor 202 may feed the input graph 516 to the trained (and fine-tuned) GNN downstream model 510 (which may correspond to an inference model). The inference model may also receive the supervised loss 514. The inference model (i.e., the GNN downstream model 510) may also receive information related to a total loss. The determination of the total loss is described further, for example, in FIG. 10. Further, based on the fed input graph 516, the supervised loss 514, and the total loss, the processor 202 may determine an output from the trained (and fine-tuned) GNN downstream model 510. The processor 202 may feed the output to the classifier head 512. The processor 202 may apply the classifier head 512 on the output of the trained (and fine-tuned) GNN downstream model 510 to obtain the prediction 518 (i.e., a prediction result for a downstream task). The processor 202 may also render the prediction 518 (i.e., the prediction result) for the downstream task.

[0083] It should be noted that the scenario 500 of FIG. 5A and FIG. 5B is for exemplary purposes and should not be construed as limiting the scope of the disclosure.

[0084] FIG. 6 is a diagram that illustrates an exemplary scenario for determination of latent representations of an original graph and update of condensed graphs, in accordance with an embodiment of the disclosure. FIG. 6 is described in conjunction with elements from FIG. 1, FIG. 2, FIG. 3, FIG. 4A, FIG. 4B, FIG. 5A, and FIG. 5B. With reference to FIG. 6, an exemplary scenario 600 is shown.

[0085] The scenario 600 may include a set of original graphs (for example, the original graph 402A) and a plurality of first condensed graphs (e.g., a condensed graph 404A, which may be a condensed version of the original graph 402A). The scenario 600 may further include latent node-level embeddings 602A (determined by the GNN model 104) associated with the original graph 402A and a latent distribution 604A associated with the original graph 402A. Further, the scenario 600 may include latent node-level embeddings 602B (determined by the GNN model 104) associated with the condensed graph 404A and a latent distribution 604B associated with the condensed graph 404A.

[0086] The processor 202 may be configured to determine the latent node-level embeddings 602A from the original graph 402A, based on the use of the GNN model 104. The latent node-level embeddings 602A may correspond to the plurality of first embeddings (e.g., “Z1” and “Z2”) for the plurality of perturbed graphs (e.g., “G1” and “G2”), as described further, for example, in FIG. 3. Similarly, the processor 202 may determine the latent node-level embeddings 602B from the condensed graph 404A, based on the use of the GNN model 104. The latent node-level embeddings 602B may correspond to the set of second embeddings (e.g., “Z_CG”) for the plurality of first condensed graphs (e.g., “X_CG”), as described further, for example, in FIG. 3.

[0087] The processor 202 may determine the latent distribution 604A from the latent node-level embeddings 602A and may determine the latent distribution 604B from the latent node-level embeddings 602B. The latent distribution 604A may correspond to the plurality of clusters (e.g., “Q1” and “Q2”) associated with the plurality of first embeddings. The latent distribution 604B may correspond to the set of latent representations (determined at 314 of FIG. 3) associated with the plurality of first condensed graphs. The determination of the plurality of clusters and the set of latent representations is described further, for example, in FIG. 3. A count of the clusters may correspond to a count of nodes in each first condensed graph and may depend on a predefined budget.

[0088] At 606, the condensed graph may be updated. The processor 202 may be configured to update the condensed graph 404A such that a divergence between the latent distribution 604A associated with the original graph 402A and the latent distribution 604B associated with the condensed graph 404A is minimized. The divergence may correspond to a Kullback-Leibler (KL) divergence or a maximum mean discrepancy (MMD) technique. The proposed objective function for supervised graph condensation (for update of the condensed graph) may be denoted by expression (1), as follows:

$$\min_{S}\ \mathcal{L}\big(\mathrm{GNN}_{\theta_S}(A, X),\ Q_T\hat{Y}\big) \quad \text{such that} \quad \theta_S = \arg\min_{\theta}\ \mathcal{L}\big(\mathrm{GNN}_{\theta}(A', X'),\ Q_S\hat{Y}\big) \quad \text{and} \quad Q_T, \hat{Y} = \arg\min_{Q_T, \hat{Y}}\ \mathcal{L}_{pseudo}\big(\mathrm{GNN}_{\theta'}(A, X),\ Q_T,\ \hat{Y}\big) \qquad (1)$$

where, $\mathcal{L}(\cdot)$ may represent a loss function, $\mathrm{GNN}_{\theta_S}$ may represent the GNN model 104 with $\theta_S$ as parameters of the GNN model, $Q_T \in \{0,1\}^{N \times K}$ may represent a learnable assignment matrix that may map each node of the original graph 402A, “T”, to exactly one pseudo-label, and “X” may represent a data point.

[0089] The objective function for the existing gradient or latent representation matching may be based on a supervised graph condensation as shown in expression (1). Herein, $T = \{(A, X), Y\}$ and $\{S, Y'\} = \{(A', X'), Y'\}$ may respectively denote the original large input graph (e.g., the original graph 402A) and the condensed graph (e.g., the condensed graph 404A) to be learned. $A \in \mathbb{R}^{N \times N}$ and $A' \in \mathbb{R}^{N' \times N'}$ may be the corresponding adjacency matrices with $N$ and $N'$ nodes, respectively, such that $N \gg N'$. $X \in \mathbb{R}^{N \times D}$ and $X' \in \mathbb{R}^{N' \times D}$ may denote the D-dimensional node features of the original large input graph and the condensed graph. $\mathrm{GNN}_{\theta}$ may be the GNN model 104 with parameters $\theta$, where $\theta_S$ may be the parameters of the GNN model 104 trained using a graph “S”. $\mathcal{L}_{node}$ may denote a supervised loss function for a specific downstream node classification (for example, a cross-entropy loss). The expression (1) may be used to learn a condensed graph, “S”, such that a GNN model (i.e., $\mathrm{GNN}_{\theta_S}$) trained using “S” may also minimize $\mathcal{L}_{node}$ for the input graph (or the original large graph) “T”. $Y$ may denote the node labels, $\hat{Y}$ may denote the pseudo-labels, and $\mathcal{L}$ may denote a loss function. However, the supervised graph condensation methods may be difficult to train when labelling the nodes is expensive or in case the graphs are obtained from noisy data sources. In contrast, the self-supervised graph condensation method may define the loss objective function to be minimized without any supervision of the labels $Y$ for the specific downstream node classification task. The pseudo-labelled graph condensation addresses these problems by artificially producing the pseudo-labels, $\hat{Y}$, corresponding to the node embeddings of the original input graph and generating the condensed graphs corresponding to these pseudo-labels using back-propagation. The condensed graphs may be converted into the latent node-level embeddings 602B using the GNN model 104. The original graphs may be converted into the latent node-level embeddings 602A using the GNN model 104. The pseudo-labels may be determined by clustering similar nodes of the input graphs or the original graph. The condensed graphs may be clustered in the latent space based on the similarity of nodes to determine the pseudo-labels. The latent distributions (604A and 604B, respectively) that may be generated for the input graph and the condensed graph may be similar, as shown in FIG. 6.

[0090] With reference to FIG. 6, at 606, the processor 202 may determine whether statistics associated with the pseudo-labels, $\hat{Y}$, are sufficient (also known as “sufficiency statistics”). The sufficiency statistics may be statistical information indicating that a model's parameters can be inferred with sufficient accuracy from a sample dataset. This may allow data reduction. Thus, a small set of labels may be used to determine the pseudo-labels. The processor 202 may then assign the pseudo-labels to nodes (i.e., $\hat{Y}$, $Q_T$) based on the sufficiency statistics. The processor 202 may update the condensed graph 404A based on the sufficiency statistics and the assigned pseudo-labels.

[0091] It should be noted that the scenario 600 of FIG. 6 is for exemplary purposes and should not be construed to limit the scope of the disclosure.

[0092] FIG. 7 is a diagram that illustrates an exemplary scenario for producing condensed graphs and latent pseudo-labels for self-supervised graph condensation, in accordance with an embodiment of the disclosure. FIG. 7 may be described in conjunction with elements from FIG. 1, FIG. 2, FIG. 3, FIG. 4A, FIG. 4B, FIG. 5A, FIG. 5B, and FIG. 6. With reference to FIG. 7, an exemplary scenario 700 is shown.

[0093] The scenario 700 may include a set of source graphs (such as, input graphs received from the source 1 402A,...the source n 402n,...and the source N 402N) and the condensed graphs 204A. The scenario 700 may further include the input graph 114 that may correspond to the source n 402n. The scenario 700 may further include a plurality of perturbed graphs (e.g., “G1” and “G2”) 702A-702N for the input graph 114. The scenario 700 may further include the GNN model 104, a set of first node embeddings (such as, “Z1” and “Z2”) 704A-704N for the respective perturbed graphs 702A-702N, and a set of second embeddings (such as, “Z_CG”) for the condensed graph 204A. The scenario 700 may further include a plurality of clusters (e.g., “Q1” and “Q2”) 706A-706N associated with the set of first node embeddings 704A-704N. The scenario 700 may further include a set of latent representations 708 associated with the set of second embeddings and a set of first latent pseudo-labels 710 associated with the plurality of perturbed graphs 702A-702N. The scenario 700 may further include a loss 1 (i.e., a swapped prediction loss) 712 and a loss 2 (i.e., a similarity maximization loss) 714.

[0094] Referring to FIG. 7, the input graphs 114 may be received from the plurality of sources 402A-402N. The input graphs 114 may be converted to perturbed graphs 702A-702N. Though FIG. 7 shows only two perturbed graphs, the scope of the disclosure should not be limited to only two perturbed graphs, and more than two perturbed graphs 702A-702N may be generated based on the input graphs 114, without departure from the scope of the disclosure. The perturbed graphs 702A-702N may be fed to the GNN model 104 to determine node embeddings (such as, “Z1”, “Z2”, and so on) 704A-704N for the respective perturbed graphs 702A-702N. The GNN model 104 may use shared parameters for training the condensed graphs 204A. The pseudo-labels may be assigned to each of the node embeddings based on vector nodes (for example, color vector). The assignment of clusters (referred to as “QT”) may be denoted by 706A-706N. The latent pseudo-labels (e.g., of dimension “100x256”) may indicate that there are “100” pseudo-labels, each of which may include “256” dimensions. In an embodiment, based on an assignment of first clusters (for example, “Q1”), node embeddings (for example, “Z2”) may be learned and, based on the assignment of second clusters (for example, “Q2”), other node embeddings (for example, “Z1”) may be learned. The operation of learning of the node embeddings from the plurality of clusters may correspond to the swapped prediction loss 712 (a loss determined based on, for instance, “Z1”, “Q2”, “Z2”, and “Q1”). For example, the set of first latent pseudo-labels 710 associated with the plurality of perturbed graphs 702A-702N may be determined based on the swapped prediction loss 712.

[0095] In an embodiment, the condensed graph 204A may be used for determination of the node embeddings (for instance, the second embeddings). The node embeddings may be used to determine the set of latent representations 708 of the condensed graphs 204A. The condensed graphs 204A may be trained by updating the pseudo-labels $\hat{Y}$. The condensed graphs 204A may be trained until a first similarity metric between the set of latent representations of the condensed graphs 204A and the set of first latent pseudo-labels 710 is maximized.

[0096] The processor 202 may apply a graph condensation method on the input graph 114 (obtained from a given source) to train the GNN model 104 to determine a downstream task-specific prediction model. The processor 202 may use the expression (1) (of FIG. 6) to apply the graph condensation method. According to expression (1), the processor 202 may learn the condensed graphs 204A based on the training of the GNN model 104 to minimize a loss (and maximize a similarity metric) between the condensed graph 204A and the input graph 114. The self-supervised graph condensation method may pose challenges such as learning “K” distinct pseudo-labels, $\hat{Y}$, where $K \ll N$, and learning an assignment matrix, $Q_T$, to map each node of the input graph 114 to exactly one pseudo-label. The goal of learning pseudo-labels may be to efficiently cluster ‘similar’ input nodes among “K” distinct groups such that each pseudo-label captures the representative properties of its corresponding group of ‘similar’ nodes. To capture the different aspects of their properties in the latent space, the ‘pseudo-labels’ may be represented as d-dimensional vectors, i.e., $\hat{Y} \in \mathbb{R}^{K \times d}$. The learned pseudo-labels may be distinct. That is, $\hat{Y}_i \neq \hat{Y}_j\ \forall\, i \neq j$, where $\hat{Y}_i$ represents the pseudo-label for the i-th cluster of nodes.

[0097] The learnable assignment matrix, $Q_T \in \{0, 1\}^{N \times K}$, may map each node of the original graph, “T”, to exactly one pseudo-label. Therefore, each row of $Q_T$ may be a one-hot vector, i.e., $\sum_j Q_{ij} = 1$ for every row $i$ of $Q_T$, and the pseudo-labels corresponding to the nodes may be given as $Q_T\hat{Y}$. Given $Q_T$ and $\hat{Y}$, the objective of the expression (1) of FIG. 6 may be to determine the condensed graphs 204A such that the GNN model 104, trained on a condensed graph “S” by minimizing the distance $\mathcal{L}$, should also minimize the distance $\mathcal{L}$ between the node embeddings of “T” and $Q_T\hat{Y}$. Hence, in the absence of $Y$, the self-supervised objective may be formulated for the pseudo-labelled graph condensation (PLGC) method by incorporating $Q_T$ and $\hat{Y}$ as per expression (1).

[0098] Herein, the pseudo-labels may be distinct, that is, $\hat{Y}_i \neq \hat{Y}_j\ \forall\, i \neq j$. $\mathcal{L}$ may map the node embeddings to their corresponding pseudo-labels. $Q_S \in \{0, 1\}^{N' \times K}$ may be the assignment matrix for “S” with one-hot rows that manually map each node to exactly one pseudo-label. That is, $\sum_j Q_{ij} = 1$ for every row $i$ of $Q_S$. Each pseudo-label may be mapped to at least one node of the condensed graph, i.e., $N' \geq K$. The maximum permissible number of pseudo-labels may be set (i.e., $N' = K$) to efficiently capture the granular properties of the original graphs. Hence, $Q_S = I_K$ is predefined as a K×K-dimensional identity matrix. $\mathcal{L}_{pseudo}$ may denote an objective function for the pseudo-label learning objective that produces both $Q_T$ and $\hat{Y}$ for the input graph 114. The self-supervised condensed graph learning may be executed using $Q_T$ and $\hat{Y}$ in an iterative fashion.
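
The relationship between a one-hot assignment matrix and the pseudo-labels described above may be illustrated by the following minimal Python sketch. The helper names (`one_hot_rows`, `matmul`) and all toy values are assumptions made for illustration only and do not appear in the disclosure.

```python
def one_hot_rows(cluster_ids, num_clusters):
    """Build a one-hot assignment matrix with one row per node."""
    return [[1 if k == c else 0 for k in range(num_clusters)] for c in cluster_ids]


def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


# Toy setting (assumed values): N = 5 nodes, K = 2 pseudo-labels, d = 3 dimensions.
pseudo_labels = [[1.0, 0.0, 0.5],   # pseudo-label of cluster 0
                 [0.0, 2.0, 1.0]]   # pseudo-label of cluster 1
q_t = one_hot_rows([0, 1, 1, 0, 1], num_clusters=2)

# Each row of Q_T sums to one, so Q_T times Y_hat selects one pseudo-label per node.
node_targets = matmul(q_t, pseudo_labels)

# With N' = K, the condensed-graph assignment Q_S is the K x K identity matrix,
# so each condensed node is paired with exactly one pseudo-label.
q_s = one_hot_rows(range(2), num_clusters=2)
condensed_targets = matmul(q_s, pseudo_labels)
```

In this sketch, `node_targets` holds the per-node pseudo-label for the original graph, while `condensed_targets` equals the pseudo-label matrix itself because the assignment for the condensed graph is the identity.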

[0099] The method of learning the pseudo-labels and the assignment matrix may include receiving an input graph “T” and using a pseudo-labelling technique to assign the same pseudo-labels to similar nodes based on the respective representations in a latent embedding space. By clustering the node embeddings in the latent space, latent properties may be efficiently captured. $Q_T$ may be iteratively updated to map augmented nodes to pseudo-labels, followed by adjusting the pseudo-labels and the underlying GNN model 104 (e.g., $\mathrm{GNN}_{\theta'}$), where the GNN model 104 may predict pseudo-labels for augmented node embeddings using the embeddings of the other augmented nodes. For two augmented views of “T”, denoted as $T_i$ and $T_j$, normalized node embeddings may be obtained (which may be denoted as $Z_i$ and $Z_j$). The processor 202 may optimize an assignment function, $\mathcal{L}_{assign}$, to obtain $Q_i$ for $T_i$ by equally distributing the node embeddings among $\hat{Y}$. The processor 202 may determine a clustering function, $\mathcal{L}_{clus}$, that may correspond to a swapped prediction loss function that simultaneously updates $\hat{Y}$. For an n-th node, for each augmented pair, $T_i$ and $T_j$, the processor 202 may use $\mathcal{L}_{clus}$ to predict the label assignments (i.e., $q_{j,n}$) of the n-th node of $T_j$, based on the normalized node embedding (i.e., $z_{i,n}$) of the n-th node of $T_i$. $\mathcal{L}_{pseudo}$ may be determined based on expression (2), as follows:

$$\mathcal{L}_{pseudo} = \mathcal{L}_{clus}\big(z_{i,n},\ q_{j,n},\ \hat{Y}\big) \quad \text{such that} \quad Q_j = \arg\max_{Q_j}\ \mathcal{L}_{assign}\big(Z_j,\ Q_j,\ \hat{Y}\big) \qquad (2)$$

where, $Z_i = \mathrm{GNN}_{\theta'}(A_i, X_i)$ may denote the $\ell_2$-normalized node embeddings, and $z_{i,n}$ may denote the $\ell_2$-normalized node-embedding vector for the n-th node of the i-th augmented graph, $T_i = (A_i, X_i)$.

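[0100] The swapped prediction loss may be determined using the expression (3), as follows:

$$\mathcal{L}_{clus}\big(z_{i,n},\ q_{j,n}\big) = -\sum_{k} q_{j,n}^{<k>}\ \log \frac{\exp\big(z_{i,n}^{\top}\hat{Y}^{<k>}\big)}{\sum_{k'} \exp\big(z_{i,n}^{\top}\hat{Y}^{<k'>}\big)} \qquad (3)$$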

[0101] For a given node, the clustering loss, $\mathcal{L}_{clus}$, may be designed to bring the embeddings of the node closer irrespective of their augmentations by predicting their assigned pseudo-labels while repelling the other pseudo-labels. Therefore, $\mathcal{L}_{clus}(z_{i,n}, q_{j,n}, \hat{Y})$ may be defined as a cross-entropy loss by applying a soft-max activation over the similarities between the node embeddings, $z_{i,n}$, and the pseudo-labels, $\hat{Y}$. Herein, $z_{i,n}^{\top}$ may be the transpose of $z_{i,n}$. $\hat{Y}^{<k>}$ may denote the k-th pseudo-label.
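
The swapped prediction loss described above may be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the function names (`softmax`, `clus_loss`, `swapped_prediction_loss`) and the toy embeddings are hypothetical, no temperature scaling is applied, and the toy embeddings are not normalized for brevity.

```python
import math


def softmax(scores):
    """Numerically stable soft-max over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def clus_loss(z, q, pseudo_labels):
    """Cross-entropy between an assignment q and soft-max(z . y_k) over k."""
    sims = [sum(zi * yi for zi, yi in zip(z, y)) for y in pseudo_labels]
    probs = softmax(sims)
    return -sum(qk * math.log(pk) for qk, pk in zip(q, probs))


def swapped_prediction_loss(z1, q1, z2, q2, pseudo_labels):
    """Predict the assignment of view 2 from view 1's embedding, and vice versa."""
    return clus_loss(z1, q2, pseudo_labels) + clus_loss(z2, q1, pseudo_labels)


# Toy values (assumed): two pseudo-labels in 2-D, one node seen in two views.
y_hat = [[1.0, 0.0], [0.0, 1.0]]
z_view1, z_view2 = [0.9, 0.1], [0.8, 0.2]
q_cluster0 = [1.0, 0.0]
q_cluster1 = [0.0, 1.0]

# Both views lie close to pseudo-label 0, so assigning them to cluster 0
# yields a lower loss than assigning them to cluster 1.
agree = swapped_prediction_loss(z_view1, q_cluster0, z_view2, q_cluster0, y_hat)
disagree = swapped_prediction_loss(z_view1, q_cluster1, z_view2, q_cluster1, y_hat)
```

The comparison of `agree` and `disagree` shows the intended behaviour: the loss pulls an embedding toward its assigned pseudo-label and pushes it away from the others.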

[0102] $\mathcal{L}_{assign}$ may be designed to produce $Q_i$, such that, for each augmented graph, $T_i$, the similarity between the node embeddings, $Z_i$, and their assigned pseudo-labels, $\hat{Y}$, may be maximized and the node embeddings $Z_i$ may be equally partitioned among $\hat{Y}$. The constraint of “equal partition” may ensure a learning of distinct pseudo-labels without collapsing into a trivial solution. In a case where the input graph 114 is divided into mini-batches of “B” nodes, the “equal partition” constraint may be enforced by constraining $Q_i$ as a transportation polytope (i.e., each of the “K” pseudo-labels may be assigned at least “B / K” times). Based on the “equal partition” constraint, the objective function may be defined as an expression (4), as follows:

$$\mathcal{L}_{assign} = \mathrm{Tr}\big(Q_i\,\hat{Y}\,Z_i^{\top}\big) + \epsilon H(Q_i) \quad \text{such that} \quad Q_i \in \Big\{ Q_i \in \mathbb{R}_{+}^{B \times K} \ \Big|\ Q_i^{\top}\mathbf{1}_B = \tfrac{1}{K}\mathbf{1}_K,\ \ Q_i\mathbf{1}_K = \tfrac{1}{B}\mathbf{1}_B \Big\} \qquad (4)$$

where, $\mathbf{1}_K$ and $\mathbf{1}_B$ may be “K” and “B” dimensional vectors of ones, respectively, and $H(Q_i)$ may denote an entropy term weighted by $\epsilon$.

[0103] For a complete graph, $T_i$, the above expression (4) may be modified with B=N. The assignment matrix takes the form of a normalized exponential matrix: $Q_i^{*} = \mathrm{diag}(u)\,\exp\big(Z_i\hat{Y}^{\top}/\epsilon\big)\,\mathrm{diag}(v)$. Here, $u \in \mathbb{R}^{N}$ and $v \in \mathbb{R}^{K}$ may be renormalization vectors that can be obtained using an iterative Sinkhorn-Knopp algorithm. “Q” may be normalized and discretized across each row to obtain one-hot vector representations, ensuring each node is assigned to one pseudo-label.
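
The iterative Sinkhorn-Knopp renormalization and the row-wise discretization mentioned above may be sketched as follows. This is a minimal plain-Python illustration, not the implementation of the disclosure: the function names, the value of `epsilon`, the iteration count, and the toy similarity scores are all assumptions.

```python
import math


def sinkhorn_assign(scores, epsilon=0.5, num_iters=50):
    """Balance exp(scores / epsilon) so that every cluster receives equal mass.

    scores[n][k] is the similarity between node n and pseudo-label k.
    Returns a soft assignment matrix whose rows each sum to one.
    """
    n_nodes, n_clusters = len(scores), len(scores[0])
    top = max(max(row) for row in scores)  # shift for numerical stability
    q = [[math.exp((s - top) / epsilon) for s in row] for row in scores]
    for _ in range(num_iters):
        # Column step: give each cluster a total mass of 1 / K.
        for k in range(n_clusters):
            col = sum(q[n][k] for n in range(n_nodes))
            for n in range(n_nodes):
                q[n][k] /= col * n_clusters
        # Row step: give each node a total mass of 1 / N.
        for n in range(n_nodes):
            row = sum(q[n])
            q[n] = [v / (row * n_nodes) for v in q[n]]
    # Rescale so that each row is a probability vector over pseudo-labels.
    return [[v * n_nodes for v in row] for row in q]


def discretize(q):
    """Turn each soft row into a one-hot row by taking its arg-max."""
    return [[1 if k == row.index(max(row)) else 0 for k in range(len(row))] for row in q]


# Toy similarities (assumed): every node slightly prefers pseudo-label 0.
toy_scores = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.5], [0.55, 0.5]]
soft_q = sinkhorn_assign(toy_scores)
hard_q = discretize(soft_q)
```

In this toy case a plain arg-max over `toy_scores` would place all four nodes in cluster 0, whereas the balanced assignment splits them evenly between the two pseudo-labels, which is the purpose of the “equal partition” constraint.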

[0104] After obtaining “Q” and $\hat{Y}$ by optimizing $\mathcal{L}_{pseudo}$ at each iteration step, an existing graph condensation technique may be applied to achieve the objective of expression (1). For example, two well-known strategies to tackle this problem may be intermediate gradient matching and latent representation matching to align the model parameters. Intermediate gradient matching techniques may compute their loss as a distance between model parameter gradients with respect to the condensed graph, “S”, and the input graph 114, “T”. Similarly, representation-matching techniques attempt to minimize the divergence between the class-conditional node representations of two graphs, “S” and “T”. Concretely, a maximum mean discrepancy (MMD) technique may be applied to minimize the discrepancy between class-wise representations via matching their first-order class-conditional moments as given in expression (5), as follows:

$$\min_{S} \sum_{y \in \mathcal{Y}} r_y\ \big\| \mathbb{E}[Z_{T|y}] - \mathbb{E}[Z_{S|y}] \big\|^2 \qquad (5)$$

where, $Z_{T|y}$ and $Z_{S|y}$ may denote the node embeddings of graphs “T” and “S”, respectively, with supervised class-labels, “y”, $r_y = \frac{|Y == y|}{|V|}$ may denote a class ratio, $|\cdot|$ may be a cardinality of a set, and $|Y == y|$ may be a number of nodes with the label “y”.
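
The first-order class-conditional moment matching described above may be illustrated with a short plain-Python sketch. The function names and toy embeddings are assumptions for illustration, and the sketch assumes that every class present in “T” also appears in “S”.

```python
def class_mean(embeddings, labels, target):
    """Mean embedding of all nodes whose label equals `target`."""
    rows = [z for z, y in zip(embeddings, labels) if y == target]
    dim = len(embeddings[0])
    return [sum(r[j] for r in rows) / len(rows) for j in range(dim)]


def first_moment_mmd(z_t, y_t, z_s, y_s):
    """Class-ratio-weighted squared distance between class-conditional means."""
    total = 0.0
    for label in sorted(set(y_t)):
        ratio = y_t.count(label) / len(y_t)           # class ratio in T
        mu_t = class_mean(z_t, y_t, label)
        mu_s = class_mean(z_s, y_s, label)
        total += ratio * sum((a - b) ** 2 for a, b in zip(mu_t, mu_s))
    return total


# Toy embeddings (assumed): three original nodes, two condensed nodes.
z_orig, y_orig = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]], [0, 0, 1]
z_good, y_cond = [[2.0, 0.0], [0.0, 2.0]], [0, 1]    # matches both class means
z_poor = [[0.0, 0.0], [0.0, 2.0]]                    # misses the mean of class 0

matched = first_moment_mmd(z_orig, y_orig, z_good, y_cond)
mismatched = first_moment_mmd(z_orig, y_orig, z_poor, y_cond)
```

A condensed graph whose per-class mean embeddings coincide with those of the original graph drives this quantity to zero, while any gap in a class mean is penalized in proportion to how common that class is.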

[0105] For the PLGC method, the pseudo-labels, $\hat{Y}$, and the node assignments, $Q_T$, may be obtained along with the trained GNN model 104 (i.e., $\mathrm{GNN}_{\theta'}$) by optimizing $\mathcal{L}_{pseudo}$. Hence, optimization of parameters of the GNN model 104 may be excluded and the pseudo-labels may be directly applied as the representative statistics to optimize “S” using the MMD loss. Therefore, $\mathbb{E}[Z_{T|\hat{y}}]$ for each $\hat{y} \in \hat{Y}$ may be replaced with $\hat{y}$, with $\mathrm{GNN}_{\theta'}$ as the GNN model 104. Unlike the supervised techniques, the granular representation may be captured by learning the maximum number of pseudo-labels. The number of pseudo-labels may not exceed the number of nodes in “S” as each pseudo-label should be mapped to at least one node of “S”. Hence, $N' = K$ and $Q_S = I$ may be chosen. It may lead to modifying the MMD loss to the mean-square loss as per expression (6), as follows:

$$\min_{S} \sum_{\hat{y} \in \hat{Y}} \big\| \mathbb{E}[Z_{T|\hat{y}}] - \mathbb{E}[Z_{S|\hat{y}}] \big\|^2 = \min_{S} \sum_{\hat{y} \in \hat{Y}} \big\| \hat{y} - Z_{S|\hat{y}} \big\|^2 \qquad (6)$$

where, $Z_{S|\hat{y}}$ may denote the embedding of the single node of “S” whose label is preselected to be $\hat{y}$.
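
The mean-square objective described above, in which the encoder is kept fixed and only the condensed node features are updated by gradient steps, may be sketched as follows. This is an illustrative simplification under stated assumptions: a single frozen linear map (`w_frozen`) stands in for the trained GNN model, the adjacency of the condensed graph is taken as the identity so that no neighbours are mixed in, and all names, the learning rate, and the toy values are hypothetical.

```python
def encode(x, w):
    """Stand-in encoder: one linear map applied to a single node's features."""
    return [sum(x[d] * w[d][j] for d in range(len(x))) for j in range(len(w[0]))]


def condensation_loss(x_cond, w, pseudo_labels):
    """Sum over pseudo-labels of || y_k - encode(x'_k) ||^2."""
    total = 0.0
    for x, y in zip(x_cond, pseudo_labels):
        z = encode(x, w)
        total += sum((yj - zj) ** 2 for yj, zj in zip(y, z))
    return total


def update_condensed_features(x_cond, w, pseudo_labels, lr=0.1, steps=200):
    """Gradient descent on the condensed features; the encoder stays frozen."""
    x_cond = [row[:] for row in x_cond]
    for _ in range(steps):
        for x, y in zip(x_cond, pseudo_labels):
            z = encode(x, w)
            residual = [zj - yj for zj, yj in zip(z, y)]
            for d in range(len(x)):
                # d/dx_d of ||z - y||^2 equals 2 * sum_j residual_j * w[d][j].
                grad = 2.0 * sum(r * w[d][j] for j, r in enumerate(residual))
                x[d] -= lr * grad
    return x_cond


# Toy values (assumed): D = 2 input features, d = 2 latent dimensions, K = N' = 2.
w_frozen = [[1.0, 0.5], [0.0, 1.0]]
y_hat = [[1.0, 0.0], [0.0, 1.0]]
x_init = [[0.0, 0.0], [0.0, 0.0]]

loss_before = condensation_loss(x_init, w_frozen, y_hat)
x_learned = update_condensed_features(x_init, w_frozen, y_hat)
loss_after = condensation_loss(x_learned, w_frozen, y_hat)
```

Each condensed node is paired with exactly one pseudo-label, so the update moves the embedding of the k-th condensed node toward the k-th pseudo-label until the loss is close to zero.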

[0106] In an embodiment, a fully connected adjacency matrix $A'$ may be learned along with $X'$. However, learning any edge connections for the condensed graphs 204A may be omitted without degrading the downstream predictive performance. Also, it may reduce the memory storage. Therefore, $A'$ may be selected as an identity matrix.

[0107] The proposed self-supervised condensation method may produce the condensed graph “S” along with the pseudo-labels, $\hat{Y}$. For “M” data sources, “M” such pairs of condensed graph and pseudo-labels may be obtained, i.e., $CG = \{(S_m, \hat{Y}_m)\}_{m=1}^{M}$. Next, for a given downstream task, the prediction model may be learned based on the following two steps. A first step may include a learning of a back-bone representation of the prediction model using the condensed graph “S”. A second step may include fine-tuning the GNN model 104 with a prediction head, using a task-specific graph, $G_{test} = (A_{test}, X_{test})$, with a set of clean labels, $Y_{test}$.

[0108] For the learning of the back-bone representation, given a set of condensed graphs, $\{(S_m, \hat{Y}_m)\}_{m=1}^{M}$, an objective may be to map the node embeddings to their corresponding pseudo-labels for all condensed graphs. Since $\hat{Y}$ may be the representative of latent node embeddings, $S_i$ may be mapped to $\hat{Y}_i$ for each $S_i$ using a mean square loss of expression (7), as follows:

$$\min_{\theta} \sum_{(S_i, \hat{Y}_i) \in CG}\ \sum_{\hat{y} \in \hat{Y}_i} \big\| \hat{y} - Z_{S_i|\hat{y}} \big\|^2 \qquad (7)$$

[0109] Unlike existing supervised graph condensation techniques, the PLGC does not require any supervised labels for the graph condensation. In the proposed PLGC technique, task-specific labelled data may be required to fine-tune the GNN model 104 with an additional prediction head for a given downstream task. Given a task-specific graph $G_{test}$ and its corresponding labels $Y_{test}$, the GNN model 104 may be fine-tuned with a prediction head ($f_{\phi} \circ \mathrm{GNN}_{\theta}$) by choosing a supervised loss $\mathcal{L}_{supervised}$ depending on the downstream tasks, as per expression (8), as follows:

$$\min_{\theta, \phi}\ \mathcal{L}_{supervised}\big(f_{\phi} \circ \mathrm{GNN}_{\theta}(A_{test}, X_{test}),\ Y_{test}\big) \qquad (8)$$

where, $\mathcal{L}_{supervised}$ may correspond to a cross-entropy loss for node-classification or link prediction tasks. Herein, parameters corresponding to the prediction head (i.e., $\phi$) may be modified for partial fine-tuning.

[0110] It should be noted that the scenario 700 of FIG. 7 is for exemplary purposes and should not be construed to limit the scope of the disclosure.

[0111] FIG. 8 is a diagram that illustrates an exemplary scenario for training of a GNN model for downstream tasks, based on the condensed graphs and latent pseudo-labels, in accordance with an embodiment of the disclosure. FIG. 8 may be described in conjunction with elements from FIG. 1, FIG. 2, FIG. 3, FIG. 4A, FIG. 4B, FIG. 5A, FIG. 5B, FIG. 6, and FIG. 7. With reference to FIG. 8, an exemplary scenario 800 is shown. The scenario 800 may include condensed graphs 204A, a downstream test graph 802, a downstream task-specific GNN model (e.g., the GNN model 104), node embeddings 804, a classifier head 806, node classifications 806A, and a supervised loss 808.

[0112] The processor 202 may receive condensed graphs (for example, second condensed graphs) associated with the multiple sources (for example, various application domains) and the set of latent pseudo-labels associated with the plurality of perturbed graphs. The GNN model 104 may be trained based on the second condensed graphs and the set of first latent pseudo-labels. The GNN model 104 may output the node embeddings 804. A dimension of the node embeddings may be “1x256”, for example. The trained GNN model 104 may update the classifier head 806 (or a prediction head). The classifier head may be a final layer or set of layers associated with the GNN model 104 that takes node embeddings (e.g., the node embeddings 804) as input and produces a desired output, such as class labels for the nodes or edges. It typically consists of one or more fully connected layers followed by an activation function appropriate for the task (for example, a soft-max function for a classification task). The labelled data may be used to fine-tune the GNN model 104. The fine-tuning may involve a forward pass, a loss calculation, and a parameter update. The predictions may be determined using a backbone representation model and the prediction head or the classifier head 806. The supervised loss 808 may be determined based on the predictions and true labels to update parameters of the GNN model 104 using backpropagation to minimize the supervised loss 808.
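
The forward pass, loss calculation, and parameter update of the fine-tuning described above may be sketched for a classifier head as follows. This is a minimal plain-Python illustration under stated assumptions: the backbone embeddings are treated as fixed inputs (a partial fine-tuning in which only the head is updated), the head is a single dense layer followed by soft-max, and the function names, learning rate, and toy data are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable soft-max over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def head_forward(embedding, weights):
    """Classifier head: one dense layer (one weight row per class) plus soft-max."""
    logits = [sum(e * w for e, w in zip(embedding, row)) for row in weights]
    return softmax(logits)


def fine_tune_head(embeddings, labels, num_classes, lr=0.5, epochs=100):
    """Stochastic gradient descent on the cross-entropy loss of the head."""
    dim = len(embeddings[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    history = []
    for _ in range(epochs):
        epoch_loss = 0.0
        for z, y in zip(embeddings, labels):
            probs = head_forward(z, weights)          # forward pass
            epoch_loss -= math.log(probs[y])          # loss calculation
            for c in range(num_classes):              # parameter update
                error = probs[c] - (1.0 if c == y else 0.0)
                for d in range(dim):
                    weights[c][d] -= lr * error * z[d]
        history.append(epoch_loss / len(labels))
    return weights, history


# Toy node embeddings (assumed) as they might be produced by a trained backbone.
node_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
node_labels = [0, 0, 1, 1]
head, losses = fine_tune_head(node_embeddings, node_labels, num_classes=2)
predictions = [max(range(2), key=lambda c: head_forward(z, head)[c])
               for z in node_embeddings]
```

In a full fine-tuning, the gradient of the same loss would also be propagated into the backbone parameters; the sketch leaves that step out to keep the example short.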

[0113] It should be noted that the scenario 800 of FIG. 8 is for exemplary purposes and should not be construed to limit the scope of the disclosure.

[0114] FIG. 9 is a diagram that illustrates a flowchart of an exemplary method for creating the condensed graphs for received input graphs, in accordance with an embodiment of the disclosure. FIG. 9 may be described in conjunction with elements from FIG. 1, FIG. 2, FIG. 3, FIG. 4A, FIG. 4B, FIG. 5A, FIG. 5B, FIG. 6, FIG. 7, and FIG. 8. With reference to FIG. 9, an exemplary flowchart 900 is shown. The flowchart 900 may include operations 902 to 922 that may be executed by one or more components of FIG. 1, such as, the electronic device 102 or the processor 202 of FIG. 2. Control may start at 902 and proceed to 904.

[0115] At 904, an operation of loading incoming graphs (input graphs) may be performed. The electronic device 102 may be configured to load incoming graphs. The incoming graphs may be the input graphs received from various sources. The various sources may be the plurality of application domains.

[0116] At 906A and 906B, an operation of perturbed graph determination may be performed. The electronic device 102 may determine the perturbed graphs (such as, perturbed graph 1 and perturbed graph 2). The perturbed graph may correspond to altered or modified graphs. The altered or modified graphs may include addition or removal of vertices or edges within the input graph, or adjustments to weights of edges in a weighted graph corresponding to the input graph.
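
The perturbation of an input graph by removal of edges and adjustment of edge weights may be sketched as follows. This is an illustrative example only: the function name `perturb_graph`, the edge-list representation, the probabilities, and the toy graph are assumptions, and addition of vertices or edges is not shown.

```python
import random


def perturb_graph(weighted_edges, drop_prob=0.2, weight_noise=0.1, seed=0):
    """Return a perturbed copy of a weighted edge list.

    Each edge (u, v, w) is removed with probability `drop_prob`; the weight of
    every kept edge is scaled by a random factor in [1 - noise, 1 + noise].
    """
    rng = random.Random(seed)
    perturbed = []
    for u, v, w in weighted_edges:
        if rng.random() < drop_prob:
            continue  # remove this edge
        factor = 1.0 + rng.uniform(-weight_noise, weight_noise)
        perturbed.append((u, v, w * factor))
    return perturbed


# Toy weighted graph (assumed): four nodes connected in a ring.
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 0.5), (3, 0, 1.5)]

# Two different seeds give two perturbed views of the same input graph.
view_1 = perturb_graph(edges, seed=1)
view_2 = perturb_graph(edges, seed=2)
```

Using a separate seed per view keeps each perturbed graph reproducible while still making the two views differ from each other.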

[0117] At 908A and 908B, an operation of determination of perturbed graph embedding may be performed. The electronic device 102 may be configured to determine the perturbed graph embeddings. The electronic device 102 may determine a plurality of embeddings (for example, the set of first node embeddings) for the plurality of perturbed graphs and the set of embeddings (for example, the set of second embeddings) for the plurality of first condensed graphs. The condensed graph 204A may be fed to the GNN model 104 to determine the node embeddings (for instance, the set of second embeddings) and the node embeddings may be used to determine the latent representations 708 of the condensed graph 204A. The perturbed graphs 702A-702N may be passed to the GNN model 104 to determine the set of first node embeddings, such as, node embeddings (referred to as Z1, Z2,...) 704A-704N for the respective perturbed graphs 702A-702N.

[0118] At 910, an operation of latent pseudo-label determination may be performed. The electronic device 102 may be configured to determine the latent pseudo-labels. The pseudo-labels may be assigned to each of the node embeddings based on vector nodes (for example, color vector). The electronic device 102 may determine a set of latent representations associated with the plurality of first condensed graphs, based on the set of second embeddings. The set of latent representations may be determined based on Dirac delta functions. The plurality of first condensed graphs may correspond to matching the set of latent representations of the condensed graph to the latent pseudo-labels associated with the plurality of perturbed graphs.

[0119] At 912A and 912B, an operation of assigning clusters for the perturbed graphs may be performed. The electronic device 102 may be configured to perform cluster assignment for the perturbed graphs. The assignment of clusters (referred to as "QT") may be performed at 706A-706N. For example, there may be 100 latent pseudo-labels, each with 256 dimensions. In an embodiment, based on the assignment of first clusters (for example, "Q1"), second node embeddings (for example, "Z2") may be learned or trained, and based on the assignment of second clusters (for example, "Q2"), first node embeddings (for example, "Z1") may be learned or trained.
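
One possible way to compute a soft cluster assignment is sketched below, assuming each latent pseudo-label acts as a prototype vector and that assignment is a temperature-scaled softmax over dot-product scores. The text does not state the assignment rule, so the softmax, the `temperature` parameter, and the name `assign_clusters` are assumptions.

```python
import math


def assign_clusters(embeddings, prototypes, temperature=0.1):
    """Soft-assign each node embedding to prototype (pseudo-label) vectors.

    Returns one probability row per embedding. The softmax rule and the
    temperature value are illustrative assumptions.
    """
    assignments = []
    for z in embeddings:
        scores = [sum(a * b for a, b in zip(z, c)) / temperature
                  for c in prototypes]
        top = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        assignments.append([e / total for e in exps])
    return assignments


# Two 2-dimensional embeddings and three prototypes (the text's example
# would instead use 100 prototypes of 256 dimensions each).
q = assign_clusters([[1.0, 0.0], [0.0, 1.0]],
                    [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
```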

[0120] At 914, an operation of error determination may be performed. The error determination may be associated with the swapped prediction loss. The electronic device 102 may be configured to determine the swapped prediction loss. The electronic device 102 may determine the swapped prediction loss associated with the plurality of perturbed graphs, based on the plurality of first embeddings and the plurality of clusters. For a given node, the clustering loss may be designed to bring the embeddings of its augmentations closer together by predicting their assigned pseudo-labels while repelling the other pseudo-labels.
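
The swapped prediction idea can be sketched as a cross-entropy in which the cluster assignment of one perturbed view is predicted from the embedding of the other view. The exact loss form is not given in this passage, so the following is a minimal sketch under assumptions: dot-product scores against prototype vectors, a `temperature` parameter, and the helper names `log_softmax` and `swapped_prediction_loss` are all hypothetical.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax of a list of scores."""
    top = max(scores)
    lse = top + math.log(sum(math.exp(s - top) for s in scores))
    return [s - lse for s in scores]


def swapped_prediction_loss(z1, z2, q1, q2, prototypes, temperature=0.1):
    """Average cross-entropy of predicting q2 from z1 and q1 from z2."""
    total = 0.0
    for a, b, qa, qb in zip(z1, z2, q1, q2):
        log_p1 = log_softmax([sum(x * c for x, c in zip(a, proto)) / temperature
                              for proto in prototypes])
        log_p2 = log_softmax([sum(x * c for x, c in zip(b, proto)) / temperature
                              for proto in prototypes])
        total -= sum(t * lp for t, lp in zip(qb, log_p1))  # view 1 predicts Q2
        total -= sum(t * lp for t, lp in zip(qa, log_p2))  # view 2 predicts Q1
    return total / (2 * len(z1))


prototypes = [[1.0, 0.0], [0.0, 1.0]]
z = [[1.0, 0.0], [0.0, 1.0]]
one_hot = [[1.0, 0.0], [0.0, 1.0]]
flipped = [[0.0, 1.0], [1.0, 0.0]]
aligned_loss = swapped_prediction_loss(z, z, one_hot, one_hot, prototypes)
mismatched_loss = swapped_prediction_loss(z, z, flipped, flipped, prototypes)
```

When both views agree with their assigned pseudo-labels the loss is near zero; when the assignments contradict the embeddings it is large, which is the signal that pulls augmentations of the same node toward the same pseudo-label.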

[0121] At 916, an operation of determination of whether a predetermined number of epochs has been reached may be performed. The electronic device 102 may determine whether the predetermined number of epochs of the process of creating the condensed graphs for received input graphs has been performed. In case the maximum number of (i.e., the predetermined number of) epochs is reached, control may pass to end. Otherwise, control may be passed to 918, 906A, and 906B.

[0122] At 918, an operation of initialization of a condensed graph may be performed. The electronic device 102 may be configured to initialize the condensed graph. For example, the condensed graph may be initialized based on the input graph 114. In an example, an adjacency matrix of the input graph 114 may be simplified to obtain an initialized condensed graph.
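
The text does not say how the adjacency matrix is simplified. One simple possibility, shown purely as an assumption, is to sample a small subset of nodes and keep the induced sub-matrix and the matching feature rows. The function name `init_condensed_graph` and the parameter `k` (the number of condensed nodes) are hypothetical.

```python
import random


def init_condensed_graph(adj_matrix, features, k, seed=0):
    """Initialize a k-node condensed graph by random node sampling.

    Returns the kept node indices, the induced adjacency sub-matrix, and
    the corresponding feature rows. Random sampling is an assumed strategy.
    """
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(adj_matrix)), k))
    sub_adj = [[adj_matrix[i][j] for j in keep] for i in keep]
    sub_feats = [list(features[i]) for i in keep]
    return keep, sub_adj, sub_feats


# A four-node ring graph condensed to two nodes.
adj = [[0, 1, 0, 1],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [1, 0, 1, 0]]
feats = [[0.1], [0.2], [0.3], [0.4]]
keep, cond_adj, cond_feats = init_condensed_graph(adj, feats, k=2)
```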

[0123] At 920, an operation of determination of the embedded condensed graphs may be performed. The electronic device 102 may be configured to determine the embedded condensed graphs. The node embeddings of the embedded condensed graphs may be used for various downstream tasks and may be determined using methods such as GNN, matrix factorization, random walk-based methods, and the like. The quality of node embeddings may be evaluated using metrics such as classification accuracy, link prediction AUC, clustering metrics, and the like.

[0124] At 922, an operation of similarity metric determination may be performed. The electronic device 102 may be configured to determine the similarity metric. The similarity metric (for example, the first similarity metric) may be determined between the set of latent representations of the condensed graphs and the set of latent pseudo-labels. The first similarity metric may help ensure that the condensed graph (corresponding to the set of latent representations) accurately represents the structure and features of the original input graph 114 (corresponding to the set of latent pseudo-labels). The latent pseudo-labels may be updated based on the first similarity metric. The GNN model 104 may be trained until the similarity metric between the latent representations of the condensed graphs and the latent pseudo-labels is maximized. Control may be passed to end when the maximum number of epochs has been reached.
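
The form of the similarity metric is not specified in this passage. As one hedged possibility, the sketch below scores each latent representation by its highest cosine similarity to any latent pseudo-label and averages the result. The choice of cosine similarity, the best-match rule, and the name `similarity_metric` are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def similarity_metric(latents, pseudo_labels):
    """Mean over latent representations of the best cosine match."""
    best = [max(cosine(h, c) for c in pseudo_labels) for h in latents]
    return sum(best) / len(best)


pseudo_labels = [[1.0, 0.0], [0.0, 1.0]]
well_matched = similarity_metric([[1.0, 0.0], [0.0, 2.0]], pseudo_labels)
poorly_matched = similarity_metric([[1.0, 1.0]], pseudo_labels)
```

A value near 1 indicates that every condensed-graph representation lines up with some pseudo-label, while lower values indicate a mismatch to be reduced by further training.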

[0125] FIG. 10 is a diagram that illustrates a flowchart for an exemplary method for training a GNN model for a downstream task, in accordance with an embodiment of the disclosure. FIG. 10 may be described in conjunction with elements from FIG. 1, FIG. 2, FIG. 3, FIG. 4A, FIG. 4B, FIG. 5A, FIG. 5B, FIG. 6, FIG. 7, FIG. 8, and FIG. 9. With reference to FIG. 10, a flowchart 1000 is shown. The flowchart 1000 may include operations 1002 to 1018. The exemplary flowchart 1000 may include a set of operations that may be executed by one or more components of FIG. 1, such as the electronic device 102 or the processor 202 of FIG. 2. Control may start at 1002 and proceed to 1004.

[0126] At 1004, an operation of reception of condensed graphs and latent pseudo-labels may be performed. The electronic device 102 may be configured to receive a plurality of second condensed graphs and the set of first latent pseudo-labels as input. The plurality of second condensed graphs may be associated with the plurality of sources. The plurality of sources may include the plurality of application domains. A first GNN model (such as the GNN model 104) may be trained based on the plurality of second condensed graphs and the set of first latent pseudo-labels. The training of the first GNN model is described further, for example, in FIG. 3, FIG. 5A, and FIG. 5B.

[0127] At 1006, an operation of condensed graph embedding may be performed. The electronic device 102 may be configured to determine the node embeddings of the plurality of second condensed graphs. The node embeddings of a condensed graph may correspond to latent representations of the condensed graphs. The first GNN model (e.g., the GNN model 104) may be used to determine the condensed graph embedding.

[0128] At 1008, an operation of error computation to match the node embeddings and the cluster prototypes (or the latent pseudo-labels) may be performed. The electronic device 102 may be configured to compute the error to match the node embeddings and the cluster prototypes. The electronic device 102 may determine a similarity metric (for example, the second similarity metric) between the plurality of second condensed graphs and the set of second latent pseudo-labels, based on the first GNN model (for example, the GNN model 104). The determination of the similarity metric is described further, for example, in FIG. 3 and FIG. 7.

[0129] At 1010, an operation of reception of a labelled input graph, for downstream tasks, may be performed. The electronic device 102 may be configured to receive a labelled input graph for the downstream tasks. The labelled input graph may be a graph in which each node may be associated with a label. These labels may provide additional information about the node classification, link prediction, and graph classification. The labelled input graph may include nodes, edges, node features, edge features, and labels. The labelled input graph may be processed using the GNN model 104 to learn node embeddings and make predictions based on the graph structure and features.

[0130] At 1012, an operation of determination of embeddings of the received labelled input graph may be performed. The electronic device 102 may be configured to determine embeddings (e.g., a third embedding) of the received labelled input graph. The determination of the embedded labelled graph may be associated with third embeddings. Embedding the labelled input graph may involve transforming the nodes of the labelled input graph into low-dimensional vector representations while preserving structural and feature information of the labelled input graph.

[0131] At 1014, an operation of determination of a supervised prediction loss may be performed. The electronic device 102 may be configured to determine the supervised prediction loss for downstream tasks for a limited / predetermined number of labels. The supervised loss may be determined for the third embedding, based on the third embedding and the classifier head 512 (of FIG. 5A). The determination of the supervised prediction loss is described further, for example, in FIG. 5A.
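
A supervised prediction loss over a limited number of labels could look like the following sketch, assuming the classifier head is a linear layer followed by a softmax and the loss is a mean cross-entropy over the labelled nodes only. The structure of the classifier head 512 is not given here, so the linear form and the names `supervised_loss`, `class_weights`, and `class_bias` are assumptions.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax of a list of scores."""
    top = max(scores)
    lse = top + math.log(sum(math.exp(s - top) for s in scores))
    return [s - lse for s in scores]


def supervised_loss(embeddings, labels, class_weights, class_bias):
    """Mean cross-entropy over the labelled nodes only.

    labels maps node index -> class index and may cover only a few nodes.
    class_weights holds one weight vector per class (assumed linear head).
    """
    total = 0.0
    for node, label in labels.items():
        logits = [sum(e * w for e, w in zip(embeddings[node], w_c)) + b
                  for w_c, b in zip(class_weights, class_bias)]
        total -= log_softmax(logits)[label]
    return total / len(labels)


third_embedding = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
labels = {0: 0, 1: 1}  # node 2 is unlabelled
untrained = supervised_loss(third_embedding, labels,
                            [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
trained = supervised_loss(third_embedding, labels,
                          [[5.0, 0.0], [0.0, 5.0]], [0.0, 0.0])
```

With all-zero head weights the loss equals the logarithm of the number of classes; a head that separates the labelled nodes drives it toward zero.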

[0132] At 1016, an operation of total loss computation may be performed. The electronic device 102 may be configured to compute the total loss. The total loss may be determined or computed based on an output of the supervised loss and the received second similarity metric. For example, the total loss may be a loss function computed based on a normalized weighted sum of the supervised loss and the received second similarity metric. The electronic device 102 may be configured to render second information related to the total loss on the display device 112.
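
The normalized weighted sum can be written out as a small helper. This is a sketch under assumptions: the second term is taken to be expressed as an error (for example, one minus a similarity value) so that lower is better for both terms, and the weight names `w_sup` and `w_match` are hypothetical.

```python
def total_loss(supervised, matching_error, w_sup=1.0, w_match=1.0):
    """Normalized weighted sum of the supervised loss and a matching term.

    matching_error is assumed to be derived from the second similarity
    metric so that smaller values mean a better match.
    """
    if w_sup + w_match <= 0:
        raise ValueError("weights must sum to a positive value")
    return (w_sup * supervised + w_match * matching_error) / (w_sup + w_match)


combined = total_loss(0.4, 0.8, w_sup=1.0, w_match=3.0)
```

Dividing by the sum of the weights keeps the total on the same scale as its inputs, so changing the relative weighting does not change the overall magnitude of the loss.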

[0133] At 1018, an operation of determination of whether a predetermined number of epochs has been reached may be performed. The electronic device 102 may determine whether the predetermined number of epochs of the process of training the GNN model 104 for a downstream task has been performed. In case the maximum number of (i.e., the predetermined number of) epochs is reached, control may pass to end. Otherwise, control may be passed to 1004.

[0134] FIG. 11 is a diagram that illustrates a flowchart for an exemplary method of self-supervised graph condensation, in accordance with an embodiment of the disclosure. FIG. 11 may be described in conjunction with elements from FIG. 1, FIG. 2, FIG. 3, FIG. 4A, FIG. 4B, FIG. 5A, FIG. 5B, FIG. 6, FIG. 7, FIG. 8, FIG. 9, and FIG. 10. With reference to FIG. 11, an exemplary flowchart 1100 is shown. The flowchart 1100 may include operations 1102 to 1120. The exemplary flowchart 1100 may include a set of operations that may be executed by one or more components of FIG. 1, such as the electronic device 102 or the processor 202 of FIG. 2.

[0135] At 1102, an operation of reception of an input graph associated with an application domain may be performed. The processor 202 of the electronic device 102 may be configured to receive the input graph associated with the application domain. The input graph may be received from various sources and application domains. Examples of the application domain may include, but are not limited to, a financial domain, a social network domain, a recommendation system domain, a biological domain, a bio-informatics domain, a chemistry domain, a bio-chemistry domain, a material science domain, or a citation network domain.

[0136] At 1104, an operation of determination of a plurality of first condensed graphs and a plurality of perturbed graphs may be performed. The processor 202 of the electronic device 102 may be configured to determine the plurality of first condensed graphs and the plurality of perturbed graphs, based on the input graph 114. The processor 202 may use a graph condensation method for determination of the plurality of condensed graphs. For example, the graph condensation method may include, but is not limited to, a pseudo-labelled graph condensation, a graph coarsening, a graph sampling, a spectral sparsification, or a graph summarization. The plurality of perturbed graphs may correspond to altered or modified graphs. The altered or modified graphs may include addition or removal of vertices or edges within the input graph 114, or adjustments to weights of edges in a weighted graph corresponding to the input graph 114. The determination of the plurality of first condensed graphs and the plurality of perturbed graphs is described further, for example, in FIG. 3.

[0137] At 1106, an operation of determination of a plurality of first embeddings for the plurality of perturbed graphs and determination of a set of second embeddings for the plurality of first condensed graphs may be performed. The processor 202 of the electronic device 102 may be configured to determine the plurality of first embeddings for the plurality of perturbed graphs and the set of second embeddings for the plurality of first condensed graphs. The determination of the plurality of first embeddings for the plurality of perturbed graphs and the set of second embeddings for the plurality of first condensed graphs may include training a second GNN model (e.g., the GNN model 104). The determination of the plurality of first embeddings and the set of second embeddings is described further, for example, in FIG. 3.

[0138] At 1108, an operation of determination of a plurality of clusters may be performed. The processor 202 of the electronic device 102 may be configured to determine the plurality of clusters associated with the plurality of first embeddings. The processor 202 may determine at least one type of cluster node of the plurality of cluster nodes and may assign the at least one type of cluster node to each node embedding. Each embedding node may correspond to a plurality of embedding nodes, within the plurality of first embeddings or the set of second embeddings. The swapped prediction loss associated with the perturbed graphs may be determined, based on the plurality of first embeddings and the at least one type of cluster node assigned to the plurality of embedding nodes. The determination of the plurality of clusters is described further, for example, in FIG. 3.

[0139] At 1110, an operation of determination of a set of latent representations may be performed. The processor 202 of the electronic device 102 may be configured to determine the set of latent representations associated with the plurality of first condensed graphs, based on the set of second embeddings. The set of latent representations may be determined based on the Dirac delta functions. The set of latent representations may be determined based on the GNN model 104 (for example, a first GNN model, a second GNN model, or a third GNN model). The pseudo-label learning may be used as a proxy for a latent representation of the original input graph 114. This step involves learning pseudo-labels in a self-supervised manner. These pseudo-labels may not be derived from any external supervision but may be inferred from the inherent structure and features of the original input graph 114. The determination of the set of latent representations is described further, for example, in FIG. 3.

[0140] At 1112, an operation of determination of a swapped prediction loss may be performed. The processor 202 of the electronic device 102 may be configured to determine the swapped prediction loss associated with the plurality of perturbed graphs, based on the plurality of first embeddings and the plurality of clusters. In an embodiment, based on the assignment of first clusters (for example, "Q1"), second node embeddings (for example, "Z2") may be learned or trained, and based on the assignment of second clusters (for example, "Q2"), first node embeddings (for example, "Z1") may be learned or trained. The determination of the swapped prediction loss is described further, for example, in FIG. 3.

[0141] At 1114, an operation of determination of a set of first latent pseudo-labels may be performed. The processor 202 of the electronic device 102 may be configured to determine the set of first latent pseudo-labels associated with the plurality of perturbed graphs, based on the swapped prediction loss. Given the input graph 114, a pseudo-labelling technique may assign the same pseudo-labels to similar nodes based on their representations in a latent embedding space. By performing clustering in this latent embedding space, latent properties of the input graph 114 may be efficiently captured. The pseudo-labelling technique may map augmented nodes to pseudo-labels, followed by adjusting the pseudo-labels and an underlying GNN model (e.g., the GNN model 104), where the GNN model may predict pseudo-labels for augmented node embeddings using the embeddings of other augmented nodes. The determination of the set of first latent pseudo-labels is described further, for example, in FIG. 3.

[0142] At 1116, an operation of determination of a first similarity metric may be performed. The processor 202 of the electronic device 102 may be configured to determine the first similarity metric between the set of latent representations and the set of first latent pseudo-labels. The similarity metric (for example, the first similarity metric) may be determined between the set of latent representations of the plurality of first condensed graphs and the set of latent pseudo-labels. The first similarity metric may help ensure that the condensed graph (corresponding to the set of latent representations) accurately represents the structure and features of the original input graph 114 (corresponding to the set of latent pseudo-labels). The latent pseudo-labels may be updated based on the first similarity metric. The GNN model 104 may be trained until the similarity metric between the latent representations of the plurality of first condensed graphs and the latent pseudo-labels is maximized. The determination of the first similarity metric is described further, for example, in FIG. 3.

[0143] At 1118, an operation of update of the plurality of first condensed graphs may be performed. The processor 202 of the electronic device 102 may be configured to update the plurality of first condensed graphs based on the first similarity metric. The updating of the first condensed graphs may correspond to a matching of the set of latent representations of the condensed graph to the set of first latent pseudo-labels associated with the plurality of perturbed graphs. The update of the plurality of first condensed graphs may be iterated based on training of the GNN model 104 until the first similarity metric between the latent representations of the plurality of first condensed graphs and the latent pseudo-labels is maximized. The update of the plurality of first condensed graphs is described further, for example, in FIG. 3 and FIG. 6.
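
The iterative matching can be illustrated with a toy gradient loop. The sketch makes several simplifying assumptions that are not taken from the disclosure: a fixed linear map `weight` stands in for the GNN model 104, each condensed node is paired with one target pseudo-label vector, and a squared mismatch is minimized in place of maximizing the similarity metric. The names `encode`, `matching_error`, and `update_step` are hypothetical.

```python
def encode(x, weight):
    """Linear stand-in for the GNN: latent = x @ weight."""
    return [sum(x[k] * weight[k][c] for k in range(len(x)))
            for c in range(len(weight[0]))]


def matching_error(feats, weight, targets):
    """Mean squared mismatch between latents and target pseudo-labels."""
    total = 0.0
    for x, t in zip(feats, targets):
        total += sum((zc - tc) ** 2 for zc, tc in zip(encode(x, weight), t))
    return total / len(feats)


def update_step(feats, weight, targets, lr=0.1):
    """One gradient-descent step on the condensed node features."""
    n = len(feats)
    updated = []
    for x, t in zip(feats, targets):
        resid = [zc - tc for zc, tc in zip(encode(x, weight), t)]
        grad = [2.0 / n * sum(resid[c] * weight[k][c] for c in range(len(resid)))
                for k in range(len(x))]
        updated.append([xk - lr * gk for xk, gk in zip(x, grad)])
    return updated


weight = [[1.0, 0.0], [0.0, 1.0]]
feats = [[0.0, 0.0], [0.5, 0.5]]    # condensed node features
targets = [[1.0, 0.0], [0.0, 1.0]]  # assigned pseudo-label vectors
before = matching_error(feats, weight, targets)
for _ in range(50):
    feats = update_step(feats, weight, targets)
after = matching_error(feats, weight, targets)
```

Each step moves the condensed features so that their latent representations sit closer to the pseudo-labels, which mirrors the idea of iterating until the match can no longer be improved.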

[0144] At 1120, an operation of information rendering may be performed. The processor 202 of the electronic device 102 may be configured to render information (e.g., first information) related to the update of the plurality of first condensed graphs. The processor 202 may render the first information related to the updated plurality of first condensed graphs on the display device 112. Control may pass to end.

[0145] Various embodiments of the disclosure may provide one or more non-transitory computer-readable storage media configured to store instructions that, in response to being executed, cause an electronic device (such as the electronic device 102) to perform a set of operations. The set of operations may include receiving an input graph associated with an application domain and determining a plurality of first condensed graphs and a plurality of perturbed graphs, based on the input graph. The set of operations may further include determining a plurality of first embeddings for the plurality of perturbed graphs and a set of second embeddings for the plurality of first condensed graphs and determining a plurality of clusters associated with the plurality of first embeddings. The set of operations may further include determining a set of latent representations associated with the plurality of first condensed graphs, based on the set of second embeddings. The set of operations may further include determining a swapped prediction loss associated with the plurality of perturbed graphs, based on the plurality of first embeddings and the plurality of clusters. The set of operations may further include determining a set of first latent pseudo-labels associated with the plurality of perturbed graphs, based on the swapped prediction loss. The set of operations may further include determining a first similarity metric between the set of latent representations and the set of first latent pseudo-labels, updating the plurality of first condensed graphs based on the first similarity metric, and rendering information related to the updated plurality of first condensed graphs.

[0146] As used in the present disclosure, the terms “module” or “component” may refer to specific hardware implementations configured to perform the actions of the module or component and/or software objects or software routines that may be stored on and/or executed by general purpose hardware (e.g., computer-readable media, processing devices, etc.) of the computing system. In some embodiments, the different components, modules, engines, and services described in the present disclosure may be implemented as objects or processes that execute on the computing system (e.g., as separate threads). While some of the systems and methods described in the present disclosure are generally described as being implemented in software (stored on and/or executed by general purpose hardware), specific hardware implementations or a combination of software and specific hardware implementations are also possible and contemplated. In this description, a “computing entity” may be any computing system as previously defined in the present disclosure, or any module or combination of modules running on a computing system.

[0147] Terms used in the present disclosure and especially in the appended claims (e.g., bodies of the appended claims) are generally intended as “open” terms (e.g., the term “including” should be interpreted as “including, but not limited to,” the term “having” should be interpreted as “having at least,” the term “includes” should be interpreted as “includes, but is not limited to,” etc.).

[0148] Additionally, if a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation no such intent is present. For example, as an aid to understanding, the following appended claims may contain usage of the introductory phrases "at least one" and "one or more" to introduce claim recitations. However, the use of such phrases should not be construed to imply that the introduction of a claim recitation by the indefinite articles "a" or "an" limits any particular claim containing such introduced claim recitation to embodiments containing only one such recitation, even when the same claim includes the introductory phrases "one or more" or "at least one" and indefinite articles such as "a" or "an" (e.g., “a” and / or “an” should be interpreted to mean “at least one” or “one or more”); the same holds true for the use of definite articles used to introduce claim recitations.

[0149] In addition, even if a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such recitation should be interpreted to mean at least the recited number (e.g., the bare recitation of "two recitations," without other modifiers, means at least two recitations, or two or more recitations). Furthermore, in those instances where a convention analogous to “at least one of A, B, and C, etc.” or “one or more of A, B, and C, etc.” is used, in general such a construction is intended to include A alone, B alone, C alone, A and B together, A and C together, B and C together, or A, B, and C together, etc.

[0150] Further, any disjunctive word or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase “A or B” should be understood to include the possibilities of “A” or “B” or “A and B.”

[0151] All examples and conditional language recited in the present disclosure are intended for pedagogical objects to aid the reader in understanding the present disclosure and the concepts contributed by the inventor to furthering the art and are to be construed as being without limitation to such specifically recited examples and conditions. Although embodiments of the present disclosure have been described in detail, various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the present disclosure.

Claims

What is claimed is:

1. A method, executed by a processor, comprising: receiving an input graph associated with an application domain; determining a plurality of first condensed graphs and a plurality of perturbed graphs, based on the input graph; determining a plurality of first embeddings for the plurality of perturbed graphs and a set of second embeddings for the plurality of first condensed graphs; determining a plurality of clusters associated with the plurality of first embeddings; determining a set of latent representations associated with the plurality of first condensed graphs, based on the set of second embeddings; determining a swapped prediction loss associated with the plurality of perturbed graphs, based on the plurality of first embeddings and the plurality of clusters; determining a set of first latent pseudo-labels associated with the plurality of perturbed graphs, based on the swapped prediction loss; determining a first similarity metric between the set of latent representations and the set of first latent pseudo-labels; updating the plurality of first condensed graphs based on the first similarity metric; and rendering information related to the updated plurality of first condensed graphs.

2. The method according to claim 1, further comprising: receiving a plurality of second condensed graphs associated with a plurality of sources; receiving the set of first latent pseudo-labels associated with the plurality of perturbed graphs; training a first Graph Neural Network (GNN) model, based on the plurality of second condensed graphs and the set of first latent pseudo-labels; determining a second similarity metric between the plurality of second condensed graphs and the set of first latent pseudo-labels, based on the first GNN model; receiving a labelled input graph associated with a downstream task, based on the application domain; determining a third embedding for the labelled input graph; determining a supervised loss for the third embedding, based on the third embedding and a classifier head; determining a total loss, based on an output of the supervised loss and the second similarity metric; and rendering information related to the total loss.

3. The method according to claim 2, further comprising: receiving the input graph from the plurality of sources; receiving, by an inference model, the supervised loss, the inference model corresponding to the trained first GNN model; receiving the information related to the total loss; determining a prediction result for the downstream task based on the input graph, the supervised loss, and the total loss; and rendering the prediction result for the downstream task.

4. The method according to claim 2, wherein the plurality of sources include a plurality of application domains.

5. The method as claimed in claim 2, wherein the supervised loss corresponds to a minimized loss for the labelled input graph.

6. The method as claimed in claim 2, wherein the classifier head includes training a classifier head model to obtain a minimized supervised loss, wherein the supervised loss corresponds to difference between the labelled input graph of the downstream task and the third embedding.

7. The method according to claim 1, wherein determination of the plurality of first embeddings for the plurality of perturbed graphs and the set of second embeddings for the plurality of first condensed graphs includes training a second GNN model.

8. The method according to claim 1, wherein the plurality of perturbed graphs corresponds to altered or modified graphs, and the altered or modified graphs include addition or removal of vertices or edges within the input graph, or adjustments to weights of edges in a weighted graph corresponding to the input graph.

9. The method according to claim 1, further comprising: determining at least one type of cluster node of a plurality of cluster nodes of the plurality of clusters; assigning at least one type of cluster node to each embedding node of a plurality of embedding nodes, within the plurality of first embeddings or the set of second embeddings; and determining the swapped prediction loss associated with the plurality of perturbed graphs, based on the plurality of first embeddings and the type of cluster node assigned to the plurality of embedding nodes.

10. The method according to claim 1, wherein the set of latent representations is determined based on Dirac delta functions.

11. The method according to claim 1, wherein updating the plurality of first condensed graphs corresponds to matching the set of latent representations of the condensed graph to the set of first latent pseudo-labels associated with the plurality of perturbed graphs.

12. A non-transitory computer-readable storage medium configured to store instructions that, in response to being executed, cause an electronic device to perform operations, the operations comprising: receiving an input graph associated with an application domain; determining a plurality of first condensed graphs and a plurality of perturbed graphs, based on the input graph; determining a plurality of first embeddings for the plurality of perturbed graphs and a set of second embeddings for the plurality of first condensed graphs; determining a plurality of clusters associated with the plurality of first embeddings; determining a set of latent representations associated with the plurality of first condensed graphs, based on the set of second embeddings; determining a swapped prediction loss associated with the plurality of perturbed graphs, based on the plurality of first embeddings and the plurality of clusters; determining a set of first latent pseudo-labels associated with the plurality of perturbed graphs, based on the swapped prediction loss; determining a first similarity metric between the set of latent representations and the set of first latent pseudo-labels; updating the plurality of first condensed graphs based on the first similarity metric; and rendering information related to the updated plurality of first condensed graphs.

13. The non-transitory computer-readable storage medium according to claim 12, the operations further comprising: receiving a plurality of second condensed graphs associated with a plurality of sources; determining a set of second latent pseudo-labels based on the plurality of second condensed graphs; training a first Graph Neural Network (GNN) model, based on the plurality of second condensed graphs and the set of second latent pseudo-labels; determining a second similarity metric between the plurality of second condensed graphs and the set of second latent pseudo-labels, based on the first GNN model; receiving a labelled input graph associated with a downstream task, based on the application domain; determining a third embedding for the labelled input graph; determining a supervised loss for the third embedding, based on the third embedding and a classifier head; determining a total loss, based on an output of the supervised loss and the second similarity metric; and rendering information related to the total loss.

14. The non-transitory computer-readable storage medium according to claim 13, the operations further comprising: receiving the input graph from the plurality of sources; receiving, by an inference model, the supervised loss, the inference model corresponding to the trained first GNN model; receiving the information related to the total loss; determining a prediction result for the downstream task based on the input graph, the supervised loss, and the total loss; and rendering the prediction result for the downstream task.

15. The non-transitory computer-readable storage medium according to claim 13, wherein the plurality of sources include a plurality of application domains.

16. The non-transitory computer-readable storage medium according to claim 13, wherein the classifier head includes training a classifier head model to obtain a minimized supervised loss, and wherein the supervised loss corresponds to a difference between the labelled input graph of the downstream task and the third embedding.

17. The non-transitory computer-readable storage medium according to claim 12, wherein determination of the plurality of first embeddings for the plurality of perturbed graphs and the set of second embeddings for the plurality of first condensed graphs includes training a second GNN model.

18. The non-transitory computer-readable storage medium according to claim 12, further comprising:
determining at least one type of cluster node of the plurality of cluster nodes;
assigning at least one type of cluster node to each embedding node of a plurality of embedding nodes, within the plurality of first embeddings or the set of second embeddings; and
determining the swapped prediction loss associated with the plurality of perturbed graphs, based on the plurality of first embeddings and the type of cluster node assigned to the plurality of embedding nodes.

19. The non-transitory computer-readable storage medium according to claim 12, wherein updating the plurality of first condensed graphs corresponds to matching the set of latent representations of the condensed graph to the set of first latent pseudo-labels associated with the plurality of perturbed graphs.

20. An electronic device, comprising:
a memory configured to store instructions; and
a processor, coupled to the memory, configured to execute the instructions to perform a process comprising:
receiving an input graph associated with an application domain;
determining a plurality of first condensed graphs and a plurality of perturbed graphs, based on the input graph;
determining a plurality of first embeddings for the plurality of perturbed graphs and a set of second embeddings for the plurality of first condensed graphs;
determining a plurality of clusters associated with the plurality of first embeddings;
determining a set of latent representations associated with the plurality of first condensed graphs, based on the set of second embeddings;
determining a swapped prediction loss associated with the plurality of perturbed graphs, based on the plurality of first embeddings and the plurality of clusters;
determining a set of first latent pseudo-labels associated with the plurality of perturbed graphs, based on the swapped prediction loss;
determining a first similarity metric between the set of latent representations and the set of first latent pseudo-labels;
updating the plurality of first condensed graphs based on the first similarity metric; and
rendering information related to the updated plurality of first condensed graphs.
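
Illustrative sketch (not part of the claims). The claims recite a swapped prediction loss, latent pseudo-labels, and a similarity metric without fixing any formula. The following Python sketch shows one possible reading, assuming a SwAV-style formulation: embeddings of two perturbed views are scored against cluster prototypes, a Sinkhorn-Knopp step produces balanced soft assignments that serve as pseudo-labels, and each view predicts the other view's assignment. The cosine similarity, the one-to-one pairing of latent representations with pseudo-labels, and all function names, parameters, and values here are assumptions made for illustration and do not come from the patent.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    # Scale a vector to unit length (zero vectors are left unchanged).
    n = math.sqrt(dot(v, v)) or 1.0
    return [x / n for x in v]

def softmax(scores, temperature=0.1):
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sinkhorn(score_rows, epsilon=0.05, iterations=3):
    # Assumed Sinkhorn-Knopp step: balanced soft cluster assignments,
    # used here as the latent pseudo-labels.
    m = max(max(row) for row in score_rows)
    q = [[math.exp((s - m) / epsilon) for s in row] for row in score_rows]
    n, k = len(q), len(q[0])
    for _ in range(iterations):
        # Column step: every cluster receives the same total mass.
        for j in range(k):
            col = sum(q[i][j] for i in range(n)) or 1.0
            for i in range(n):
                q[i][j] /= col * k
        # Row step: every sample's assignment becomes a distribution.
        for i in range(n):
            row = sum(q[i]) or 1.0
            q[i] = [x / row for x in q[i]]
    return q

def swapped_prediction_loss(z_a, z_b, prototypes, temperature=0.1):
    # z_a, z_b: embeddings of two perturbed views of the same graphs.
    protos = [normalize(c) for c in prototypes]
    s_a = [[dot(normalize(z), c) for c in protos] for z in z_a]
    s_b = [[dot(normalize(z), c) for c in protos] for z in z_b]
    q_a, q_b = sinkhorn(s_a), sinkhorn(s_b)
    loss = 0.0
    for i in range(len(z_a)):
        p_a = softmax(s_a[i], temperature)
        p_b = softmax(s_b[i], temperature)
        # "Swapped": view A predicts B's assignment and vice versa.
        loss -= sum(q * math.log(p) for q, p in zip(q_b[i], p_a))
        loss -= sum(q * math.log(p) for q, p in zip(q_a[i], p_b))
    return loss / (2 * len(z_a)), q_a, q_b

def mean_similarity(latents, pseudo_labels):
    # Assumed similarity metric: mean cosine similarity over paired rows.
    pairs = list(zip(latents, pseudo_labels))
    return sum(dot(normalize(u), normalize(v)) for u, v in pairs) / len(pairs)

# Toy data: two graphs, two perturbed views each, two cluster prototypes.
z_a = [[1.0, 0.0], [0.0, 1.0]]
z_b = [[0.9, 0.1], [0.1, 0.9]]
prototypes = [[1.0, 0.0], [0.0, 1.0]]
loss, q_a, q_b = swapped_prediction_loss(z_a, z_b, prototypes)
# Hypothetical latent representations of the condensed graphs.
condensed_latents = [softmax([0.8, 0.2]), softmax([0.2, 0.8])]
similarity = mean_similarity(condensed_latents, q_a)
```

In a full pipeline under the same assumptions, the condensed graphs would be updated by gradient steps that increase `similarity`, which corresponds to the matching described in claim 19. An automatic-differentiation framework would replace the plain lists used here.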