Intelligent prediction method based on graph neural network and transformer

By combining graph neural networks and Transformers, a KNN spatial graph is constructed and graph convolution and Transformer encoders are used to solve the difficulty of modeling spatial topology and global dependencies in geochemical exploration, and achieve efficient anomaly identification and stable spatial pattern characterization.

CN121725939B · Active · Publication Date: 2026-06-16 · JILIN UNIVERSITY

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: JILIN UNIVERSITY
Filing Date: 2026-02-14
Publication Date: 2026-06-16

Abstract

The application discloses an intelligent prediction method based on a graph neural network and a Transformer, relating to the technical field of geochemical exploration. The method comprises: acquiring a spatial observation dataset and performing data preprocessing, wherein the spatial observation dataset comprises a plurality of sampling points and each sampling point comprises a spatial position and gold ore grade data; constructing a KNN spatial graph and generating an adjacency matrix from the spatial positions using the K-nearest neighbor algorithm; adding an identity matrix to the adjacency matrix to form self-loop connections; and performing symmetric normalization on the adjacency matrix based on a degree matrix to obtain a normalized adjacency matrix. By taking the latitude and longitude of the sampling points and the gold ore grade data as inputs and learning jointly on the spatial graph, the application improves the ability to characterize anomalous spatial patterns and the stability of their identification, and presents the anomaly distribution intuitively in a two-dimensional anomaly map, improving the efficiency of interpretation and delineation. The latitude and longitude are used only for graph construction and position encoding and do not participate in the reconstruction loss or anomaly scoring, which avoids interference from position reconstruction.

Description

Technical Field

[0001] This invention relates to the field of geochemical exploration technology, and in particular to an intelligent prediction method based on graph neural networks and Transformers.

Background Technology

[0002] Geochemical exploration, by analyzing the distribution characteristics of elemental content in surface or near-surface media, is a key method for discovering deep concealed mineral deposits (especially high-value metal deposits such as gold). The core challenge lies in accurately detecting geochemical anomalies. When spatial features such as latitude and longitude are incorporated, the task becomes even more difficult. At the same time, geochemical data are usually characterized by sparsity, noise, and irregular distribution, which poses challenges to traditional interpolation and statistical methods and limits the ability of existing methods to characterize multi-scale spatial dependencies.

[0003] Existing methods have several shortcomings. Manually identifying anomalies point by point on large geochemical datasets is impractical because it is slow and error-prone. Furthermore, these methods tend to oversimplify the complex, non-stationary relationships between geochemical features and geological structures. Univariate delineation methods such as CA and SA struggle to exploit spatial relationships between sampling points, particularly long-range spatial dependencies. Traditional machine learning is limited by dataset size and the complexity of feature engineering. Existing deep learning methods such as CNNs, GNNs, and autoencoders, unless specifically designed for the task, struggle to model spatial relationships inherently, are costly to tune, and rely on labeled data, which leads to insufficient generalization in unsupervised anomaly detection. Additionally, GNNs may be unstable on small datasets because of data-splitting bias and potential data leakage, and they struggle to capture long-range spatial dependencies. Transformers remain difficult to adapt to tasks that require preserving spatial context, and their large parameter count makes training on small geochemical datasets difficult.

Summary of the Invention

[0004] In view of the aforementioned existing problems, the present invention is proposed.

[0005] Therefore, this invention provides an intelligent prediction method based on graph neural networks and Transformers to solve the problems of ineffective modeling of spatial topology and global dependencies, difficulty in training on small datasets, and lack of generalization ability under unsupervised conditions in geochemical anomaly identification.

[0006] To solve the above-mentioned technical problems, the present invention provides the following technical solution:

[0007] This invention provides an intelligent prediction method based on graph neural networks and Transformers, comprising: acquiring a spatial observation dataset and performing data preprocessing, wherein the spatial observation dataset contains multiple sampling points, each sampling point including spatial location and gold ore grade data; constructing a KNN spatial graph based on spatial location using the K-nearest neighbor algorithm and generating an adjacency matrix; adding an identity matrix to the adjacency matrix to form self-loop connections; and performing symmetric normalization on the adjacency matrix based on the degree matrix to obtain a normalized adjacency matrix; constructing a node feature matrix from the gold ore grade data and inputting it into a graph convolutional network encoder; performing graph convolution aggregation on the node feature matrix based on the normalized adjacency matrix to obtain local encoding; inputting the local encoding into a Transformer encoder to obtain a latent node representation matrix; inputting the latent node representation matrix into a decoder to reconstruct features; determining node-level anomaly scores based on the differences between the reconstructed features and the node feature matrix; and outputting a two-dimensional anomaly map based on the node-level anomaly scores.

[0008] As a preferred embodiment of the intelligent prediction method based on graph neural networks and Transformers described in this invention, the spatial observation dataset is organized in units of sampling points, the spatial location of the sampling points is latitude and longitude, and the gold ore grade data of the sampling points is used as node features.

[0009] As a preferred embodiment of the intelligent prediction method based on graph neural networks and Transformers described in this invention, the data preprocessing refers to handling missing values in the gold ore grade data by deletion and local-mean filling, and performing a logarithmic transformation on the skewed gold ore grade data.

[0010] Outlier and noise filtering is performed before constructing the KNN spatial graph, and min-max normalization is performed on the spatial locations.

[0011] As a preferred embodiment of the intelligent prediction method based on graph neural networks and Transformers described in this invention, wherein: the KNN spatial graph includes a set of nodes and a set of edges;

[0012] The edge set is constructed using the K-nearest neighbor algorithm based on the Euclidean distance of the spatial locations of the sampling points;

[0013] Using the latitude and longitude coordinates from the spatial observation dataset, a sparse matrix of the graph is constructed using the K-nearest neighbor algorithm, generating an adjacency matrix;

[0014] The adjacency matrix is subjected to symmetric normalization to generate a normalized adjacency matrix.

[0015] As a preferred embodiment of the intelligent prediction method based on graph neural networks and Transformers described in this invention, the specific steps for constructing a node feature matrix from gold ore grade data are as follows:

[0016] Arrange the gold ore grade data corresponding to each sampling point in the spatial observation dataset according to the sampling point index, and use the gold ore grade data corresponding to each sampling point as the feature vector of the corresponding node.

[0017] The feature vectors of all nodes are combined to form the node feature matrix.

[0018] As a preferred embodiment of the intelligent prediction method based on graph neural networks and Transformers described in this invention, the graph convolutional network encoder uses two layers of GCN to perform local structure encoding on the node feature matrix and outputs the local encoding.

[0019] As a preferred embodiment of the intelligent prediction method based on graph neural networks and Transformers described in this invention, the Transformer encoder captures global contextual relationships through a multi-head self-attention mechanism, and each attention head calculates the query matrix, key matrix, and value matrix from the local encoding.

[0020] The local encoding is passed through the Transformer encoder, which outputs the latent node representation matrix.

[0021] As a preferred embodiment of the intelligent prediction method based on graph neural networks and Transformers described in this invention, the decoder adopts a multilayer perceptron (MLP) and includes two fully connected layers with ReLU activation functions.

[0022] The latent node representation matrix is input into the decoder for reconstruction to obtain the reconstructed features.

[0023] As a preferred embodiment of the intelligent prediction method based on graph neural networks and Transformers described in this invention, the specific steps for determining the node-level anomaly score based on the difference between the reconstructed features and the node feature matrix are as follows:

[0024] Calculate the reconstruction error of the $i$-th node between the reconstructed features and the node feature matrix to generate a node-level anomaly score. The node-level anomaly score is the per-node squared error, the node-level counterpart of the mean squared error (MSE), and the expression is as follows:

[0025] $$s_i = \lVert \hat{x}_i - x_i \rVert^2;$$

[0026] where $s_i$ denotes the node-level anomaly score of the $i$-th node; $\hat{x}_i$ denotes the reconstructed value of the $i$-th node; $x_i$ denotes the original feature value of the $i$-th node;

[0027] The mean squared error (MSE) is obtained by averaging the node-level anomaly scores of all nodes over the total number of nodes. The expression for MSE is:

[0028] $$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} s_i;$$

[0029] where $\mathrm{MSE}$ denotes the mean squared error; $N$ denotes the total number of nodes in the reconstructed features and the node feature matrix;

[0030] The node feature matrix and reconstructed features are reshaped into a two-dimensional grid similar to an image according to their spatial location. Before gridding, the unsampled positions are interpolated to obtain the original two-dimensional grid and the reconstructed two-dimensional grid.

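[0031] The original two-dimensional grid and the reconstructed two-dimensional grid are compared using the Structural Similarity Index (SSIM), expressed as follows:

[0032] $$\mathrm{SSIM}(g, \hat{g}) = \frac{(2\mu_g \mu_{\hat{g}} + C_1)(2\sigma_{g\hat{g}} + C_2)}{(\mu_g^2 + \mu_{\hat{g}}^2 + C_1)(\sigma_g^2 + \sigma_{\hat{g}}^2 + C_2)};$$

[0033] $$C_1 = (0.01\,L)^2;$$

[0034] $$C_2 = (0.03\,L)^2;$$

[0035] where $\mathrm{SSIM}(g, \hat{g})$ denotes the structural similarity index between the original two-dimensional grid and the reconstructed two-dimensional grid; $g$ and $\hat{g}$ denote local windows of the original two-dimensional grid and the reconstructed two-dimensional grid, respectively; $\mu_g$ and $\mu_{\hat{g}}$ denote the local means of the two local windows; $\sigma_g^2$ and $\sigma_{\hat{g}}^2$ denote the variances of the two local windows; $\sigma_{g\hat{g}}$ denotes the covariance of the two local windows; $C_1$ and $C_2$ denote the stability constants; $L$ denotes the dynamic range of gold ore grade on the two-dimensional grid.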
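
The SSIM comparison above can be illustrated for a single local window. The following is a minimal sketch, not the patented implementation: the function name `ssim_window` is hypothetical, the stability constants are assumed to take the conventional values $(0.01L)^2$ and $(0.03L)^2$, and population (not sample) variance and covariance are assumed.

```python
import numpy as np

def ssim_window(g, g_hat, dynamic_range):
    """SSIM between two equally sized local windows (illustrative only)."""
    g = np.asarray(g, dtype=float).ravel()
    g_hat = np.asarray(g_hat, dtype=float).ravel()
    c1 = (0.01 * dynamic_range) ** 2      # assumption: conventional constant
    c2 = (0.03 * dynamic_range) ** 2      # assumption: conventional constant
    mu_g, mu_h = g.mean(), g_hat.mean()
    var_g, var_h = g.var(), g_hat.var()   # population variance (ddof=0)
    cov = ((g - mu_g) * (g_hat - mu_h)).mean()
    numerator = (2 * mu_g * mu_h + c1) * (2 * cov + c2)
    denominator = (mu_g ** 2 + mu_h ** 2 + c1) * (var_g + var_h + c2)
    return numerator / denominator

original = [[1.0, 2.0], [3.0, 4.0]]
reconstructed = [[1.0, 2.0], [3.0, 5.0]]
same = ssim_window(original, original, dynamic_range=5.0)
diff = ssim_window(original, reconstructed, dynamic_range=5.0)
```

Identical windows give a value of 1, and any mismatch in mean, variance, or covariance lowers it. A full-grid SSIM would average this quantity over sliding windows.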

[0036] As a preferred embodiment of the intelligent prediction method based on graph neural networks and Transformers described in this invention, the step of outputting a two-dimensional anomaly map based on node-level anomaly scores refers to mapping the node-level anomaly scores corresponding to each sampling point to the grid cells corresponding to the spatial positions in a two-dimensional grid, thereby obtaining a two-dimensional anomaly map.
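
The mapping of node-level anomaly scores to grid cells can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `scores_to_grid` is hypothetical, coordinates are assumed to be min-max normalized to [0, 1], cells without a sampling point are left as NaN, and when two points fall into the same cell the later one overwrites the earlier one.

```python
import numpy as np

def scores_to_grid(coords_scaled, scores, n_rows, n_cols):
    """Place each node's anomaly score in the grid cell containing its location."""
    coords_scaled = np.asarray(coords_scaled, dtype=float)
    grid = np.full((n_rows, n_cols), np.nan)   # NaN marks cells with no sample
    # Column index from the first coordinate, row index from the second;
    # a coordinate of exactly 1.0 is clipped into the last cell.
    cols = np.minimum((coords_scaled[:, 0] * n_cols).astype(int), n_cols - 1)
    rows = np.minimum((coords_scaled[:, 1] * n_rows).astype(int), n_rows - 1)
    grid[rows, cols] = scores
    return grid

xy = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.25]]
anomaly_map = scores_to_grid(xy, [1.0, 2.0, 3.0], n_rows=4, n_cols=4)
```

The row orientation (second coordinate increasing with the row index) is an arbitrary choice here; a plotting routine may need the rows flipped so that north is at the top.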

[0037] The beneficial effects of this invention are as follows. By using the latitude and longitude of the sampling points and the gold ore grade data as inputs and jointly learning local spatial relationships and long-distance correlations on a spatial graph, the ability to characterize and identify anomalous spatial patterns is improved, and the anomaly distribution is presented intuitively in a two-dimensional anomaly map, improving the efficiency of interpretation and delineation. Latitude and longitude are used only for graph construction and position encoding and do not participate in the reconstruction loss or anomaly scoring, which avoids interference from meaningless position reconstruction. Unsupervised reconstruction is adopted, node-level anomaly scores are calculated from the reconstruction differences, and, after a regular grid is generated by IDW interpolation, global evaluation is performed using SSIM, thereby improving the spatial continuity and structural consistency of the two-dimensional anomaly map.

Attached Figure Description

[0038] Figure 1 is a flowchart of the intelligent prediction method based on graph neural networks and Transformers.

[0039] Figure 2 is a diagram of the GeoTransGNN model architecture.

[0040] Figure 3 is a schematic diagram illustrating the performance of KNN with different K values.

[0041] Figure 4 is a schematic diagram illustrating the performance of GCN encoders with different numbers of hidden units.

[0042] Figure 5 is a schematic diagram showing the mean squared error of three models at different network depths.

[0043] Figure 6 is a schematic diagram illustrating the model performance for different batch sizes.

[0044] Figure 7 is a schematic diagram illustrating the model performance at different dropout rates.

[0045] Figure 8 is a schematic diagram illustrating the performance of the Transformer with different numbers of attention heads.

[0046] Figure 9 shows the training loss as a function of epochs for different numbers of layers.

[0047] Figure 10 is a bar chart showing the model's MSE performance on the gold mine dataset.

[0048] Figure 11 is a bar chart showing the model's SSIM performance on the gold mine dataset.

[0049] Figure 12 is a two-dimensional schematic diagram of gold mine anomalies in the original dataset.

[0050] Figure 13 is a schematic diagram of the final anomaly reconstructed by the Transformer encoder, based on squared error.

[0051] Figure 14 is a schematic diagram of the final anomaly reconstructed by the GNN model, based on squared error.

[0052] Figure 15 is a schematic diagram of the final anomaly reconstructed by the GeoTransGNN model, based on squared error.

[0053] Figure 16 shows the model's ROC curve and success rate curve.

Detailed Implementation

[0054] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, the specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

[0055] Many specific details are set forth in the following description in order to provide a full understanding of the invention. However, the invention may also be practiced in other ways different from those described herein, and those skilled in the art can make similar extensions without departing from the spirit of the invention. Therefore, the invention is not limited to the specific embodiments disclosed below.

[0056] Furthermore, the term "one embodiment" or "embodiment" as used herein refers to a specific feature, structure, or characteristic that may be included in at least one implementation of the present invention. The phrase "in one embodiment" appearing in different places in this specification does not necessarily refer to the same embodiment, nor does it refer to a separate or alternative embodiment that is mutually exclusive with other embodiments.

[0057] Example 1. Referring to Figure 1 and Figure 2, this is the first embodiment of the present invention, which provides an intelligent prediction method based on graph neural networks and Transformers, including the following steps:

[0058] S1. Obtain the spatial observation dataset and perform data preprocessing. The spatial observation dataset contains multiple sampling points, and each sampling point includes spatial location and gold ore grade data.

[0059] It should be noted that this invention proposes a hybrid architecture, the GeoTransGNN model (geochemical graph neural network Transformer), which integrates the topological aggregation capability of graph neural networks (GNNs) with the global context modeling advantages of Transformers. Through joint learning of spatial coordinates and gold ore grade data, it captures the geometric features and geochemical essence of the metallogenic system. The name GeoTransGNN combines "Geo" (geochemistry), "Trans" (Transformer), and "GNN" (graph neural network). Specifically, the GNN is implemented as a graph convolutional network (GCN) consisting of two GCN layers. The encoders of the GeoTransGNN model comprise the GCN encoder and the Transformer encoder, which combine spatial information with gold ore grade data for gold anomaly detection. The GeoTransGNN model improves interpretability and predictive performance under different geochemical environments while addressing the scalability and oversmoothing problems of GNNs. Oversmoothing refers to the phenomenon that, as the number of layers in a graph neural network increases, the representations of different nodes gradually converge, weakening the discriminative ability. Scalability refers to the ability to remain trainable and effective when the number of nodes increases, the graph structure becomes more complex, or longer-distance dependencies need to be modeled. The GeoTransGNN model adopts an unsupervised training method that requires no anomaly labels, avoiding the high cost of labeled data while learning complex and robust associations within geochemical datasets.

[0060] The expression for the spatial observation dataset is:

[0061] $$\mathcal{D} = \{(x_i, p_i)\}_{i=1}^{N};$$

[0062] where $\mathcal{D}$ denotes the spatial observation dataset; $x_i$ denotes the node feature data of the $i$-th sampling point, such as mineral content and elevation; $p_i$ denotes the spatial location of the $i$-th sampling point; $N$ denotes the total number of sampling points; $i$ denotes the sampling point index, with a value range of 1 to $N$.

[0063] Data preprocessing refers to handling missing values in the gold ore grade data by deletion and local-mean filling, and performing a logarithmic transformation on the skewed gold ore grade data to reduce the impact of extreme values.

[0064] Outlier and noise filtering is performed before constructing the KNN spatial graph to ensure spatial consistency and improve the reproducibility of results.

[0065] Spatial location refers to latitude and longitude. Min-max normalization is performed on the spatial locations so that the coordinate dimensions share a consistent scale.
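
The preprocessing steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `preprocess` is hypothetical, only the deletion branch of missing-value handling is shown (local-mean filling and outlier filtering are omitted), and `log1p` is an assumed choice of logarithmic transformation.

```python
import numpy as np

def preprocess(coords, grade):
    """Drop samples with missing grade, log-transform grade, min-max scale coordinates."""
    coords = np.asarray(coords, dtype=float)
    grade = np.asarray(grade, dtype=float)
    keep = ~np.isnan(grade)                  # deletion of samples with missing grade
    coords, grade = coords[keep], grade[keep]
    grade_log = np.log1p(grade)              # assumption: log(1 + x) for skewed grades
    lo = coords.min(axis=0)
    span = coords.max(axis=0) - lo
    span[span == 0] = 1.0                    # guard against a constant coordinate axis
    coords_scaled = (coords - lo) / span     # min-max normalization to [0, 1]
    return coords_scaled, grade_log

coords = [[125.30, 43.80], [125.40, 43.90], [125.50, 43.85], [125.35, 43.95]]
grade = [0.2, np.nan, 3.1, 0.7]
xy, g = preprocess(coords, grade)
```

The sample with a missing grade is removed, so three sampling points remain, each with coordinates in [0, 1] and a log-transformed grade.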

[0066] The spatial observation dataset is organized by sampling point, with the spatial location of the sampling point being latitude and longitude, and the gold ore grade data of the sampling point serving as node features.

[0067] S2. Based on spatial location, construct a KNN spatial graph using the K-nearest neighbor algorithm and generate an adjacency matrix. Add an identity matrix to the adjacency matrix to form self-loop connections. Then, perform symmetric normalization on the adjacency matrix based on the degree matrix to obtain a normalized adjacency matrix.

[0068] It should be noted that the expression for the KNN spatial graph is:

[0069] $$G = (V, E);$$

[0070] where $G$ denotes the KNN spatial graph; $V$ denotes the set of nodes, where each node corresponds to a sampling point; $E$ denotes the set of edges.

[0071] KNN spatial graphs use sampling points as nodes in a graph. Based on the Euclidean distance of the sampling points' spatial locations (latitude and longitude), the K-Nearest Neighbors (KNN) algorithm selects the K nearest neighbors for each node and establishes edges between the nodes, forming a graph structure that reflects the spatial proximity of the sampling points. KNN spatial graphs are used to characterize the spatial topological adjacency between samples, ensuring that geographically adjacent samples remain close in the graph space, facilitating subsequent aggregation of neighborhood information through graph convolution. When the spatial location $p_j$ belongs to the K nearest neighbors of node $v_i$, an edge is established between $v_i$ and $v_j$, and the expression is:

[0072] $$E = \{(v_i, v_j) \mid p_j \in \mathrm{KNN}(p_i)\};$$

[0073] where $j$ denotes the node index.

[0074] Using the latitude and longitude coordinates from the spatial observation dataset, a sparse matrix of the graph is constructed using the K-nearest neighbor algorithm, generating an adjacency matrix $A$ composed of 0s and 1s. If an edge exists from node $v_i$ to node $v_j$, then $A_{ij} = 1$; otherwise $A_{ij} = 0$.
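
The construction of the 0/1 adjacency matrix from coordinates can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `knn_adjacency` is hypothetical, a dense distance matrix is used for brevity (a sparse representation would be preferable for large datasets), and the final symmetrization step is an assumption, since the text does not state how directed KNN edges are made bidirectional.

```python
import numpy as np

def knn_adjacency(coords, k):
    """Binary adjacency matrix linking each point to its k nearest neighbours."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))      # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)                # a node is not its own neighbour
    nbrs = np.argsort(dist, axis=1)[:, :k]        # indices of the k nearest points
    A = np.zeros((n, n), dtype=int)
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1
    return np.maximum(A, A.T)                     # assumption: make edges undirected

points = [[0.0, 0.0], [1.0, 0.0], [2.5, 0.0], [10.0, 0.0]]
A = knn_adjacency(points, k=1)
```

With `k=1` the four collinear points form a chain 0-1-2-3: the isolated point at x = 10 still receives one edge because every node selects its own nearest neighbour.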

[0075] The adjacency matrix is symmetrically normalized, and its expression is:

[0076] $$\hat{A} = D^{-1/2} (A + I) D^{-1/2};$$

[0077] where $\hat{A}$ denotes the normalized adjacency matrix; $D$ denotes the degree matrix; $I$ denotes the identity matrix, used for self-loop connections.
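
The self-loop addition and symmetric normalization can be sketched as follows. The function name `normalize_adjacency` is hypothetical, and the degree matrix is assumed to be computed from the adjacency matrix after self-loops are added, as the description of node degree in this document indicates.

```python
import numpy as np

def normalize_adjacency(A):
    """Return D^{-1/2} (A + I) D^{-1/2}, with D the degree matrix of A + I."""
    A_tilde = np.asarray(A, dtype=float) + np.eye(len(A))   # add self-loops
    deg = A_tilde.sum(axis=1)                                # degree incl. self-loop
    d_inv_sqrt = 1.0 / np.sqrt(deg)                          # deg >= 1, so no division by zero
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
A_hat = normalize_adjacency(A)
```

For this three-node path graph the degrees including self-loops are 2, 3, and 2, so the entry linking the first two nodes is $1/\sqrt{2 \cdot 3}$ and the self-loop weight of the first node is $1/2$. The higher-degree middle node therefore contributes with a smaller weight per edge.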

[0078] An adjacency matrix is a matrix representation used to represent the connection relationships between nodes. The matrix elements are used to characterize whether there is an edge connection between any two nodes. A sparse matrix is a storage / representation form in which most elements of the adjacency matrix are zero when the node scale is large, which is used to reduce storage and computational overhead. Self-loop connection means that each node establishes a connection with itself, and retains the node's own information when aggregating the neighborhood. A normalized adjacency matrix is a matrix representation obtained by introducing self-loop connections into the adjacency matrix and then symmetrically normalizing it in combination with the degree matrix. It is used to stabilize training and balance the differences in aggregation contributions caused by different node degrees.

[0079] The adjacency matrix reflects the topological structure of the graph and captures spatial proximity, and symmetric normalization ensures bidirectional consistency of edges. This step enables the GeoTransGNN model to respect the topological structure of the data, which is crucial for geochemical pattern recognition in local areas. Latitude and longitude, as structural attributes, define the spatial adjacency relationships and edge connectivity in the KNN spatial graph, ensuring that samples that are geographically close remain close in the graph as well. This location information helps preserve spatial continuity and improves the localization accuracy of anomaly detection. Nodes in the KNN spatial graph aggregate features from neighboring nodes; if only neighboring features were used, a node's own feature information could be lost. Therefore, the identity matrix $I$ is added to the adjacency matrix $A$ to ensure that each node is connected to itself. The degree matrix $D$ is a diagonal matrix containing the number of edges (neighbors) of each node, including self-loops. Symmetric normalization balances the neighborhood contributions of nodes of different degrees, preventing the features of high-degree nodes from dominating the aggregation. The GCN layer uses the normalized adjacency matrix so that each node sends and receives messages (including its own) in a balanced way, making the GeoTransGNN model balanced, stable and efficient.

[0080] The degree of a node refers to the number of edges a node is connected to in a graph, i.e., the number of neighboring nodes a node has. When self-loop connections are introduced, the degree of a node can be understood as "the total number of connections between a node and its neighbors and itself", and can be represented by the diagonal elements of the degree matrix. A node with a high degree is a node with a large number of connected edges and a large neighborhood range. If no constraints are imposed, the features of such a node tend to dominate the weighted aggregation over neighborhoods, resulting in an imbalance in the contributions of different nodes.

[0081] S3. Construct a node feature matrix from the gold ore grade data and input it into a graph convolutional network encoder. Perform graph convolutional aggregation on the node feature matrix based on the normalized adjacency matrix to obtain local encoding. Input the local encoding into the Transformer encoder to obtain the latent node representation matrix.

[0082] It should be noted that the gold ore grade data corresponding to each sampling point in the spatial observation dataset are arranged in the order of the sampling point index, and the gold ore grade data corresponding to each sampling point is used as the feature vector of the corresponding node.

[0083] The feature vectors of all nodes are combined to form the node feature matrix, expressed as follows:

[0084] $$X \in \mathbb{R}^{N \times d};$$

[0085] where $X$ denotes the node feature matrix; $d$ denotes the input feature dimension, i.e., the number of features per sampling point.

[0086] The graph convolutional network encoder uses a GCN layer with a normalized adjacency matrix to perform neighborhood aggregation on the node feature matrix. That is, based on the normalized adjacency matrix, the feature vectors corresponding to each node and its neighboring nodes are aggregated to capture the local graph structure and neighborhood features. The GCN layer aggregates the features of adjacent nodes, thereby simultaneously encoding the gold grade data and the local geochemical background of each sample. This is particularly useful in structurally complex or sparsely sampled areas, where local spatial gradients can reveal the mineralization system.

[0087] Two GCN layers are used to encode the local structure of the node feature matrix, and the output local encoding is expressed as follows:

[0088] $$H^{(1)} = \sigma\left(\hat{A} X W^{(1)}\right);$$

[0089] $$H^{(2)} = \sigma\left(\hat{A} H^{(1)} W^{(2)}\right);$$

[0090] where $H^{(1)}$ denotes the node representation output by the first GCN layer; $W^{(1)}$ denotes the trainable weight matrix corresponding to the first GCN layer, used to map features to a new feature space; $H^{(2)}$ denotes the node representation output by the second GCN layer, i.e., the local encoding; $W^{(2)}$ denotes the trainable weight matrix corresponding to the second GCN layer; $\sigma$ denotes a non-linear activation function.
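
The forward pass of the two-layer GCN encoder can be sketched as follows. This is an untrained illustration under stated assumptions: ReLU is assumed as the non-linear activation, the weights are random, the hidden sizes (8 and 4) are arbitrary, and the names `gcn_encoder`, `W1`, and `W2` are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def gcn_encoder(A_hat, X, W1, W2):
    """Two GCN layers: aggregate over neighbours, project, apply the activation."""
    H1 = relu(A_hat @ X @ W1)    # first layer: one-hop neighbourhood
    H2 = relu(A_hat @ H1 @ W2)   # second layer: two-hop neighbourhood (local encoding)
    return H2

# Normalized adjacency of a three-node path graph with self-loops.
A_tilde = np.array([[1.0, 1.0, 0.0],
                    [1.0, 1.0, 1.0],
                    [0.0, 1.0, 1.0]])
deg = A_tilde.sum(axis=1)
A_hat = A_tilde / np.sqrt(np.outer(deg, deg))

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 1))      # one feature per node (e.g. log gold grade)
W1 = rng.normal(size=(1, 8))
W2 = rng.normal(size=(8, 4))
H2 = gcn_encoder(A_hat, X, W1, W2)
```

Each multiplication by the normalized adjacency matrix mixes a node's features with those of its direct neighbours, so two layers give every node a view of its two-hop neighbourhood. Replacing the normalized adjacency matrix with the identity reduces the encoder to an ordinary two-layer perceptron.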

[0091] The number of layers and the hidden dimension of the GCN layer are hyperparameters, which are fine-tuned through grid search during the learning process.

[0092] GCN layers excel at capturing local neighborhood information but are limited in modeling long-range dependencies. To address this issue, the Transformer encoder treats the output of the GCN layers (the local encoding $H^{(2)}$) as an input token sequence (each node corresponds to a token), which improves the GeoTransGNN model's ability to capture long-distance dependencies. For example, anomalies may cross faults or strata and, although spatially far apart, be geologically related.

[0093] The input token sequence refers to the sequence formed by arranging the local codes output by the GCN layer in the order of node indices. Each node corresponds to a token, and the representation vector of each token is the local code of the node. The Transformer encoder takes the token sequence as input and models the global and long-distance dependencies between all nodes through a multi-head self-attention mechanism, thereby making up for the shortcomings of the GCN layer in long-distance dependency modeling.

[0094] The Transformer encoder captures global contextual relationships through a multi-head self-attention mechanism. Each attention head computes the query matrix, key matrix, and value matrix from the local encoding, expressed as follows:

[0095] $$Q = H^{(2)} W_Q, \quad K = H^{(2)} W_K, \quad V = H^{(2)} W_V;$$

[0096] where $Q$ denotes the query matrix; $K$ denotes the key matrix; $V$ denotes the value matrix; $W_Q$, $W_K$ and $W_V$ denote the trainable weight matrices corresponding to the query matrix, key matrix, and value matrix, respectively.

[0097] The self-attention mechanism is computed using standard multi-head attention, and the expression is:

[0098] $$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V;$$

[0099] where $\mathrm{Attention}(\cdot)$ denotes the self-attention mechanism; $d_k$ denotes the dimension of the key vector.
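
The scaled dot-product attention of a single head can be sketched as follows. Only one head is shown, whereas a multi-head layer would run several such heads in parallel and concatenate their outputs. The weights are random and untrained, and the names `self_attention`, `W_q`, `W_k`, and `W_v` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(H, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention over node tokens."""
    Q, K, V = H @ W_q, H @ W_k, H @ W_v
    d_k = K.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))  # (N, N): every node attends to every node
    return weights @ V, weights

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))                    # 5 node tokens with 8-dim local encodings
W_q, W_k, W_v = (rng.normal(size=(8, 4)) for _ in range(3))
out, weights = self_attention(H, W_q, W_k, W_v)
```

Each row of `weights` sums to 1 and spans all nodes, which is why attention can relate sampling points that are not neighbours in the KNN spatial graph.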

[0100] Self-attention is a computational mechanism used by Transformer encoders to model the internal relationships within an input token sequence. Its core principle is: for each token in the input token sequence (which in this invention can correspond to the local encoding of each sampling point / node), different attention weights are assigned based on the relevance between the token and all tokens in the input token sequence. The feature representations of all tokens are then weighted and aggregated according to these attention weights to obtain an updated representation that incorporates global contextual information. The relevance is typically computed as the similarity between the query matrix and the key matrix and then normalized to obtain the attention weights; the value matrix is then weighted by these attention weights and summed to obtain the output representation. In this way, the self-attention mechanism can capture the dependencies between distant nodes without relying on a fixed neighborhood, thereby enhancing the Transformer encoder's ability to characterize global spatial patterns and long-distance geological correlations.

[0101] The local encoding is passed through the Transformer encoder, which outputs a latent node representation matrix that captures global and long-range interactions between nodes, expressed as follows:

[0102] $$Z = \mathrm{TransformerEncoder}\left(H^{(2)}\right);$$

[0103] where $Z$ denotes the latent node representation matrix.

[0104] The goal of the decoder is to reconstruct the node feature matrix $X$ from the latent node representation matrix $Z$. This enables the GeoTransGNN model to learn normal patterns and deviations from normal patterns. If the GeoTransGNN model has difficulty accurately reconstructing the features of a node, it indicates that the current node may be anomalous (i.e., its feature pattern differs from the "normal" pattern learned by the GeoTransGNN model).

[0105] The decoder reconstructs the latent node representation matrix, enabling the GeoTransGNN model to form a representation of common patterns (normal patterns) in the data during unsupervised training. Here, "normal pattern" refers to the dominant spatial-geochemical feature distribution pattern learned by the GeoTransGNN model in the training data. "Deviating from normal pattern" means that the feature patterns of some nodes differ from the common patterns in the data, making it difficult for the GeoTransGNN model to accurately reconstruct the node features, thus exhibiting a large reconstruction error and being judged as a potential anomaly.

[0106] The decoder uses a multilayer perceptron (MLP) and contains two fully connected layers with ReLU activation functions.

[0107] The latent node representation matrix is input into the decoder for reconstruction, yielding the reconstructed features, expressed as follows:

[0108] $$\hat{X} = \mathrm{MLP}(Z) = \mathrm{ReLU}(Z W_1 + b_1)\, W_2 + b_2;$$

[0109] where $\hat{X}$ denotes the reconstructed features; $\mathrm{MLP}(\cdot)$ denotes the multilayer perceptron, whose output dimension is $d$, the same as the input feature dimension of the node feature matrix; $W_1$ denotes the weight matrix of the first layer; $b_1$ denotes the bias vector of the first layer; $W_2$ denotes the weight matrix of the second layer; $b_2$ denotes the bias vector of the second layer.
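
The two-layer MLP decoder can be sketched as follows. This is an untrained illustration: the function name `mlp_decoder` and the hidden width of 16 are hypothetical, and leaving the output layer without an activation is an assumption made so that reconstructed values are not restricted to be non-negative.

```python
import numpy as np

def mlp_decoder(Z, W1, b1, W2, b2):
    """Two fully connected layers mapping latent node vectors back to feature space."""
    hidden = np.maximum(Z @ W1 + b1, 0.0)   # first layer with ReLU
    return hidden @ W2 + b2                 # second layer (assumed linear output)

rng = np.random.default_rng(0)
Z = rng.normal(size=(5, 8))                 # latent representations of 5 nodes
W1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 1)), np.zeros(1)
X_hat = mlp_decoder(Z, W1, b1, W2, b2)      # shape (5, 1): one reconstructed grade per node
```

The output width equals the input feature dimension, so the reconstruction can be compared element by element with the node feature matrix.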

[0110] Latitude and longitude are used only for graph construction and for the position encoding of the Transformer encoder, and are not involved in the reconstruction loss calculation or anomaly scoring, because reconstructing location information has no practical significance in anomaly detection. This avoids interference from meaningless position reconstruction. Only gold ore grade data is used for the subsequent reconstruction error and anomaly score calculation. In this way the GeoTransGNN model retains the explicit spatial information of the coordinates, which is crucial for modeling the spatial relationships between geochemical samples, while avoiding the loss of location information during GCN layer embedding.

[0111] S4. Input the latent node representation matrix into the decoder to obtain the reconstructed features. Determine the node-level anomaly score based on the difference between the reconstructed features and the node feature matrix. Output a two-dimensional anomaly map based on the node-level anomaly score.

[0112] It should be noted that the reconstruction error between the reconstructed features and the node feature matrix is calculated for the $i$-th node and used to generate the node-level anomaly score. The node-level anomaly score is calculated as the node-level squared error, i.e., the per-node term of the mean squared error (MSE), and the expression is as follows:

[0113] $s_i = (\hat{x}_i - x_i)^2$;

[0114] where $s_i$ represents the node-level anomaly score of the $i$-th node; $\hat{x}_i$ represents the reconstructed value of the $i$-th node; $x_i$ represents the original feature value of the $i$-th node.

[0115] The mean squared error (MSE) is obtained by averaging the node-level anomaly scores over the total number of nodes. The expression for MSE is:

[0116] $\mathrm{MSE} = \dfrac{1}{N} \sum_{i=1}^{N} s_i$;

[0117] where $\mathrm{MSE}$ represents the mean squared error; $N$ represents the total number of nodes in the reconstructed features and the node feature matrix, i.e., the total number of sampling points.
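As a small worked illustration of the node-level score and its average, the following plain-Python sketch computes the squared reconstruction error per node and the resulting MSE. The grade values are invented toy numbers, and the function names are hypothetical.

```python
def node_anomaly_scores(original, reconstructed):
    """Squared reconstruction error per node (one gold grade value per node)."""
    return [(r - x) ** 2 for x, r in zip(original, reconstructed)]


def mean_squared_error(scores):
    """Average of the node-level scores over all nodes."""
    return sum(scores) / len(scores)


# Toy, made-up normalized gold grades and their reconstructions.
grades = [0.10, 0.20, 0.90, 0.15]
recon = [0.12, 0.18, 0.40, 0.15]

scores = node_anomaly_scores(grades, recon)
mse = mean_squared_error(scores)
# Index of the node that deviates most from the learned "normal pattern".
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
print(scores, mse, most_anomalous)
```

In this toy case the third node is reconstructed poorly and therefore receives the largest score, which is the behavior the description attributes to nodes that deviate from the normal pattern.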

[0118] Since geochemical sampling points are typically irregularly distributed, directly rearranging gold grade data into a grid may lead to spatial discontinuity. Therefore, before applying SSIM, inverse distance weighted (IDW) interpolation is used to estimate the gold grade at unsampled points, generating a grid with regular intervals while preserving the spatial coherence of gold grades and ensuring that adjacent grid cells correspond to geographically adjacent locations. IDW interpolation estimates each grid cell not directly covered by a sampling point as a weighted average of the surrounding sampling points, with weights inversely related to distance, to obtain a spatially coherent, regular grid. The grid resolution is selected based on the distribution of sampling points and the size of the study area to ensure accurate characterization of the geochemical data. The node feature matrix and the reconstructed features are each reshaped into an image-like two-dimensional grid according to spatial location, with interpolation performed on unsampled locations before gridding, resulting in the original two-dimensional grid and the reconstructed two-dimensional grid. The expression is as follows:

[0119] $G_{i,j} = f(\mathrm{lat}_{i,j}, \mathrm{lon}_{i,j})$;

[0120] where $G$ represents the two-dimensional grid; $G_{i,j}$ represents the element in row $i$, column $j$ of the two-dimensional grid; $f(\cdot)$ represents the function giving the gold ore grade value; $\mathrm{lat}_{i,j}$ represents the latitude of row $i$, column $j$ of the two-dimensional grid; $\mathrm{lon}_{i,j}$ represents the longitude of row $i$, column $j$ of the two-dimensional grid.
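The IDW gridding step can be sketched as follows in plain Python. The distance exponent `power=2.0`, the use of Euclidean distance directly on latitude and longitude, and the function names are assumptions made for illustration; the embodiment does not specify them.

```python
import math


def idw_value(lat, lon, samples, power=2.0):
    """Inverse-distance-weighted grade at (lat, lon).

    `samples` is a list of (lat, lon, grade) tuples; `power` is an assumed
    distance exponent.
    """
    numerator = denominator = 0.0
    for s_lat, s_lon, grade in samples:
        dist = math.hypot(lat - s_lat, lon - s_lon)
        if dist == 0.0:
            return grade  # the cell coincides with a sampling point
        weight = 1.0 / dist ** power
        numerator += weight * grade
        denominator += weight
    return numerator / denominator


def idw_grid(samples, lat_axis, lon_axis, power=2.0):
    """Regular grid with one row per latitude and one column per longitude."""
    return [[idw_value(lat, lon, samples, power) for lon in lon_axis]
            for lat in lat_axis]


# Two toy sampling points; the midpoint receives the average of both grades.
toy_samples = [(0.0, 0.0, 1.0), (0.0, 2.0, 3.0)]
grid = idw_grid(toy_samples, lat_axis=[0.0], lon_axis=[0.0, 1.0, 2.0])
print(grid)
```

Cells that coincide with a sampling point keep the measured grade, so the interpolated grid agrees with the observations wherever they exist.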

[0121] The node with the highest node-level anomaly score (i.e., the largest reconstruction error) is identified as a potential anomaly because its feature pattern deviates from the normal pattern learned by the GeoTransGNN model. Unlike simple pointwise comparison methods such as MSE, SSIM measures the similarity of spatial anomaly maps or images by evaluating structural information. In anomaly detection based on spatial features, spatial concentration distribution patterns are more important than individual values. For example, two reconstructions may have similarly low MSE but significantly different spatial patterns. In this case, SSIM can detect the difference, while MSE may not reflect it.

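[0122] Each grid cell corresponds to a spatial location. The original two-dimensional grid and the reconstructed two-dimensional grid are compared using the Structural Similarity Index (SSIM), and the SSIM is used as a global evaluation metric. The expression is as follows:

[0123] $\mathrm{SSIM}(x, y) = \dfrac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$;

[0124] $C_1 = (k_1 L)^2$;

[0125] $C_2 = (k_2 L)^2$;

[0126] $L = G_{\max} - G_{\min}$;

[0127] where $\mathrm{SSIM}(x, y)$ represents the structural similarity index between the original two-dimensional grid and the reconstructed two-dimensional grid; $x$ and $y$ represent local windows of the original two-dimensional grid and the reconstructed two-dimensional grid, respectively; $\mu_x$ and $\mu_y$ represent the local means of the two local windows; $\sigma_x^2$ and $\sigma_y^2$ represent the variances of the two local windows; $\sigma_{xy}$ represents the covariance of the two local windows; $C_1$ and $C_2$ represent the stability constants, with $k_1$ and $k_2$ being small positive constants; $L$ represents the dynamic range of the gold ore grade data on the two-dimensional grid, i.e., the difference between the maximum pixel value $G_{\max}$ and the minimum pixel value $G_{\min}$ within the two-dimensional grid.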
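A windowed SSIM of this form can be sketched in plain Python as follows. The sketch assumes uniform (unweighted) sliding windows with stride 1, takes the dynamic range from the original grid, and uses the conventional constants k1 = 0.01 and k2 = 0.03; none of these choices is stated in the embodiment, and the function names are hypothetical.

```python
def ssim_window(win_x, win_y, c1, c2):
    """SSIM of two flattened local windows of equal length."""
    n = len(win_x)
    mu_x, mu_y = sum(win_x) / n, sum(win_y) / n
    var_x = sum((a - mu_x) ** 2 for a in win_x) / n
    var_y = sum((b - mu_y) ** 2 for b in win_y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(win_x, win_y)) / n
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))


def mean_ssim(grid_x, grid_y, window=3, k1=0.01, k2=0.03):
    """Average SSIM over all window x window patches (stride 1).

    Assumes both grids have the same shape, are at least `window` cells wide
    and tall, and that the original grid is not constant.
    """
    values = [v for row in grid_x for v in row]
    dyn_range = max(values) - min(values)
    c1, c2 = (k1 * dyn_range) ** 2, (k2 * dyn_range) ** 2
    rows, cols = len(grid_x), len(grid_x[0])
    offsets = [(a, b) for a in range(window) for b in range(window)]
    scores = []
    for i in range(rows - window + 1):
        for j in range(cols - window + 1):
            win_x = [grid_x[i + a][j + b] for a, b in offsets]
            win_y = [grid_y[i + a][j + b] for a, b in offsets]
            scores.append(ssim_window(win_x, win_y, c1, c2))
    return sum(scores) / len(scores)


original = [[0.0, 1.0, 2.0], [1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
inverted = [[4.0 - v for v in row] for row in original]
print(mean_ssim(original, original), mean_ssim(original, inverted))
```

The second call illustrates why structure matters: the inverted grid has the same mean and variance as the original, but its covariance with the original is negative, so the SSIM value drops below zero.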

[0128] Outputting a two-dimensional anomaly map based on node-level anomaly scores means mapping the node-level anomaly score of each sampling point to the corresponding grid cell of the two-dimensional grid, so that the pixel value of each grid cell represents the node-level anomaly score of the corresponding spatial location, thus obtaining the two-dimensional anomaly map.
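The mapping from node-level scores to grid cells might look like the following plain-Python sketch. Assigning each sampling point to its nearest grid cell and keeping the largest score when several points share a cell are assumptions made here for illustration; the function name `anomaly_map` is hypothetical.

```python
def anomaly_map(points, scores, lat_axis, lon_axis):
    """Place each node-level score into the nearest grid cell.

    `points` is a list of (lat, lon) pairs. Cells without a sampling point
    stay None. When several points share a cell, the largest score is kept
    (an assumed tie-breaking policy).
    """
    grid = [[None] * len(lon_axis) for _ in lat_axis]
    for (lat, lon), score in zip(points, scores):
        i = min(range(len(lat_axis)), key=lambda r: abs(lat_axis[r] - lat))
        j = min(range(len(lon_axis)), key=lambda c: abs(lon_axis[c] - lon))
        if grid[i][j] is None or score > grid[i][j]:
            grid[i][j] = score
    return grid


toy_points = [(0.1, 0.1), (0.9, 1.1), (0.0, 0.0)]
toy_scores = [0.2, 0.7, 0.5]
amap = anomaly_map(toy_points, toy_scores, lat_axis=[0.0, 1.0], lon_axis=[0.0, 1.0])
print(amap)
```

Empty cells are left as None in this sketch; in practice they could be filled by the same interpolation used for the grade grids.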

[0129] SSIM is typically computed locally over an N×N window and averaged across the entire two-dimensional grid. SSIM values normally range from 0.0 (no similarity) to 1.0 (completely identical). If the two-dimensional grids differ significantly, the SSIM value may be negative. In image-based anomaly detection, low MSE and high SSIM together indicate that the GeoTransGNN model achieves accurate pixel-level reconstruction while preserving the overall image structure. The GeoTransGNN model employs unsupervised training, with no anomaly labels in the data. This unsupervised anomaly detection strategy is suitable for mineral exploration, where labeled data is often scarce or inaccurate. During training, the GeoTransGNN model converts the original input into a hidden layer representation and then maps the hidden layer representation back to the original input space, completing the learning process by evaluating the difference between the original input and the reconstructed data.

[0130] In summary, this invention improves the ability to characterize and identify anomalous spatial patterns and enhances identification stability by using the latitude and longitude of sampling points and gold ore grade data as inputs and jointly learning local spatial relationships and long-distance correlations on a spatial graph. It also presents the anomalous distribution intuitively in a two-dimensional anomaly map to improve interpretation and delineation efficiency. Latitude and longitude are used only for graph construction and position encoding, and do not participate in reconstruction loss or anomaly scoring, avoiding interference from meaningless location reconstruction. Unsupervised reconstruction is employed, and node-level anomaly scores are calculated based on reconstruction differences. After generating a regular grid using IDW interpolation, global evaluation is performed using SSIM, thereby improving the spatial continuity and structural consistency of the two-dimensional anomaly map.

[0131] Example 2. Referring to Figures 3-16, as a second embodiment of the present invention, and to further verify the technical solution of the present invention, a parameterized analysis process applicable to the intelligent prediction method based on graph neural networks and Transformers is provided.

[0132] The study area of this invention is the Hatu goldfield, located in the northwestern Xinjiang Uygur Autonomous Region, China. The Hatu Fault is the main tectonic feature of this region and plays a key controlling role in mineralization: it provides migration channels for the hydrothermal fluids required for mineralization, thereby controlling the distribution of gold deposits. In addition to the fault, the Tailegula Formation and the Lower Baogutu Formation contain a variety of lithological units closely related to mineralization. These strata have high gold content, providing important evidence for studying favorable geological and geochemical conditions for mineralization, and are crucial for understanding the regional gold deposit distribution patterns.

[0133] This invention preprocesses the spatial observation dataset: the spatial coordinates (latitude and longitude) and gold ore grade values are normalized, and a node feature matrix is constructed. A KNN spatial graph is then employed to capture the spatial proximity between samples, and the adjacency matrix is symmetrically normalized and input into a two-layer GCN. The GCN encoder uses a 64-dimensional hidden layer and the ReLU activation function. The GCN output is fed into a two-layer Transformer encoder (2 attention heads, model dimension 64, feedforward network dimension 128), preserving the geospatial structure. The decoder uses a two-layer MLP with ReLU activation to reconstruct the input features. The GeoTransGNN model is trained in an unsupervised manner by minimizing the MSE loss between the original features and the reconstructed features for 100 epochs with the Adam optimizer (initial learning rate 0.01, batch size 16). An early stopping strategy based on validation loss (patience of 10 epochs) is adopted to prevent overfitting. The following hyperparameters were optimized through grid search: number of layers (1-5), number of Transformer attention heads (1-8), model dimension (8-128), batch size (4-64), dropout rate (0.1-0.9), number of training epochs (10-400), and number of neighbors K in the KNN spatial graph (3-17). The dropout rate refers to the proportion of hidden layer units that are randomly deactivated during training, which is used to suppress overfitting and improve generalization ability. Table 1 summarizes the key architecture and training hyperparameters of GeoTransGNN to ensure the clarity and reproducibility of the experiment.
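For reference, the settings stated in this paragraph can be gathered into a small configuration object. The following plain-Python sketch is one possible way to do so; the class name, field names, and the helper `should_stop` are hypothetical, and only the numeric values are taken from the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoTransGNNConfig:
    """Settings stated in the description; field names are illustrative."""
    gcn_layers: int = 2
    gcn_hidden_dim: int = 64
    transformer_layers: int = 2
    attention_heads: int = 2
    model_dim: int = 64
    feedforward_dim: int = 128
    decoder_layers: int = 2
    learning_rate: float = 0.01
    batch_size: int = 16
    max_epochs: int = 100
    early_stopping_patience: int = 10


# Grid-search ranges (min, max) as listed in the description.
SEARCH_RANGES = {
    "layers": (1, 5),
    "attention_heads": (1, 8),
    "model_dim": (8, 128),
    "batch_size": (4, 64),
    "dropout": (0.1, 0.9),
    "epochs": (10, 400),
    "k_neighbors": (3, 17),
}


def should_stop(val_losses, patience):
    """True once the best validation loss is at least `patience` epochs old."""
    best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
    return len(val_losses) - 1 - best_epoch >= patience


config = GeoTransGNNConfig()
print(config, should_stop([1.0, 0.5, 0.6, 0.7], patience=2))
```

The `should_stop` helper shows one common reading of patience-based early stopping: training ends when no new best validation loss has been observed for the given number of epochs.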

[0134] The configuration in Table 1 ensures that the GeoTransGNN model possesses both spatial awareness and contextual robustness in geochemical anomaly detection. During hyperparameter tuning, MSE is used to evaluate model performance. MSE is the average of the squared differences between the observed gold grade and the corresponding reconstructed output, which is well suited to unsupervised learning frameworks such as graph-based models, where the goal is to minimize the reconstruction error of spatial-geochemical data. In this study, gold anomalies are identified by ranking nodes according to their node-level squared-error scores; higher scores indicate a greater likelihood of mineralization anomalies. The MSE method, combined with geochemical concentration data and spatial relationships, provides a robust means for detecting gold enrichment zones.

[0135] Table 1. Key architecture and training hyperparameters of the GeoTransGNN model:

[0136]

[0137] The results of the proposed anomaly detection method are evaluated through quantitative indicators and visualization analysis. In machine learning, hyperparameters are parameters set before training to control the behavior of the training algorithm. The goal of hyperparameter optimization is to find the parameter settings that minimize the reconstruction error during the training of the GeoTransGNN model. The performance of the GeoTransGNN model is affected by various factors, including network depth, batch size, dropout rate, and number of attention heads. To achieve high performance, the hyperparameters of the proposed model and the benchmark models need to be tuned to find the optimal values. Although hyperparameter tuning is time-consuming, this invention uses grid search to tune the parameters on the data. The initial parameters of all models are shown in Table 1 and are kept consistent across models. This invention uses a pure graph neural network (GNN) and a pure Transformer as benchmark models and compares them with the proposed GeoTransGNN model. To ensure a fair comparison, the benchmark models use the same preprocessing procedures, data splitting methods, and hyperparameter ranges.
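The grid search described above can be sketched generically in plain Python. The `evaluate` callback stands in for training a model and returning its reconstruction MSE; the toy objective and the candidate values below are placeholders invented for illustration and are not experimental results.

```python
from itertools import product


def grid_search(search_space, evaluate):
    """Try every combination and keep the one with the lowest MSE."""
    names = list(search_space)
    best_params, best_mse = None, float("inf")
    for combo in product(*(search_space[name] for name in names)):
        params = dict(zip(names, combo))
        mse = evaluate(params)
        if mse < best_mse:
            best_params, best_mse = params, mse
    return best_params, best_mse


def toy_reconstruction_mse(params):
    """Placeholder objective, NOT a trained model: a bowl with a known minimum."""
    return (params["k_neighbors"] - 13) ** 2 * 1e-3 + abs(params["dropout"] - 0.5)


# Hypothetical candidate values inside the ranges given in the description.
space = {
    "k_neighbors": [3, 5, 7, 9, 11, 13, 15, 17],
    "dropout": [0.1, 0.3, 0.5, 0.7, 0.9],
}
best, best_mse = grid_search(space, toy_reconstruction_mse)
print(best, best_mse)
```

In a real run, `evaluate` would build the graph, train the model with the given parameters, and return the validation reconstruction MSE, which makes the search cost grow with the product of the candidate list lengths.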

[0138] In this invention, "network depth" refers to the number of encoder layers in each model configuration. When tuning a given parameter (such as the number of hidden units), the other parameters remain unchanged. Based on spatial features, a KNN spatial graph is constructed from the sample points using the KNN algorithm. The number of neighbors K is a hyperparameter of the KNN algorithm, and the choice of K affects the model output.

[0139] This invention tested multiple values of K from 3 to 17 and found that the KNN spatial graph constructed with K = 13 has the best performance (lowest reconstruction error). Figure 3 shows the MSE results for the different values of K.
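To make the role of K concrete, the following plain-Python sketch builds a KNN adjacency matrix from coordinates, adds self-loops, and applies symmetric degree normalization. Making the KNN graph undirected by symmetrizing it is an assumption of this sketch, and the function names and toy coordinates are hypothetical.

```python
import math


def knn_adjacency(coords, k):
    """0/1 adjacency from Euclidean K-nearest neighbours, made symmetric."""
    n = len(coords)
    adj = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(coords):
        nearest = sorted((math.hypot(xi - xj, yi - yj), j)
                         for j, (xj, yj) in enumerate(coords) if j != i)
        for _, j in nearest[:k]:
            adj[i][j] = adj[j][i] = 1.0  # assumed: undirected edges
    return adj


def normalize_with_self_loops(adj):
    """D^(-1/2) (A + I) D^(-1/2), with D the degree matrix of A + I."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    degree = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]


# Four toy points on a line; with k=1 they form the path 0-1-2-3.
toy_coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
a_norm = normalize_with_self_loops(knn_adjacency(toy_coords, k=1))
print(a_norm)
```

A larger K adds more edges per node, so each graph convolution averages over a wider neighbourhood; this is the trade-off that the K sweep explores.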

[0140] In GCNs, the number of hidden units has a significant impact on model capacity and performance. As shown in Figure 4, the more hidden units a model has, the stronger its ability to capture complex patterns and the richer its feature representation, but too many may lead to overfitting; conversely, too few hidden units may lead to underfitting, limiting the performance of the GCN. In addition, increasing the number of hidden units increases training complexity, which may make optimization difficult. Therefore, choosing the optimal number of hidden units is crucial to balancing complexity, performance, and generalization ability. Figure 4 shows the MSE scores of the GCN with different numbers of hidden units. Five values (8, 16, 32, 64, and 128) were tested, and the GCN was found to perform best with 64 hidden units. Figure 5 compares the performance of GeoTransGNN, GNN, and Transformer at different network depths (1-5 layers). The results show that deep models (4-5 layers) did not consistently outperform shallow models (1-2 layers) in geochemical anomaly identification. Deep models face convergence difficulties, possibly due to increased training complexity. For example, setting a suitable learning rate for deep models is challenging, and even if convergence is achieved, deep models may get stuck in poor local minima, resulting in insufficient diversity of learned filters. The study found that the MSE was minimized when the GCN depth was set to 2 layers, indicating that increasing the number of layers beyond this depth does not significantly improve model performance.

[0141] Batch size affects a model's memory usage, performance, and generalization ability: large batch sizes improve gradient estimation and training speed but require more memory and may limit generalization ability; small batch sizes reduce memory requirements but increase gradient noise, potentially improving generalization ability but hindering training. This invention tested five different batch sizes. Figure 6 shows the MSE scores for each batch size. Both excessively small and excessively large batch sizes fail to effectively improve model performance: the GNN and GeoTransGNN models achieve the minimum MSE with a batch size of 16, while the Transformer achieves the minimum MSE with a batch size of 32. Larger batch sizes consume more memory and reduce processing speed, so a batch size of 16 was used for all three models in the final experiment.

[0142] Figure 7 compares the MSE scores of the three models under different dropout rates. Dropout helps reduce network noise and overfitting: if the model is trained on an insufficient dataset, overfitting may occur; solutions include increasing the dataset size or reducing the number of hidden units used for feature computation. Dropout randomly deactivates hidden layer units of the model, so that these units are ignored in subsequent calculations. The GeoTransGNN model consistently outperformed the other two models, achieving the lowest MSE (0.14) at a dropout rate of 0.5 and demonstrating robustness at moderate dropout rates. The GNN model also performs well (but not as well as the GeoTransGNN model), with an MSE of 0.16 at a dropout rate of 0.4. The Transformer and GNN models only show a significant decrease in MSE at a dropout rate of 0.4, with higher MSEs at other dropout rates, indicating that these two models are more prone to underfitting or require different hyperparameter settings to achieve optimal performance. Overall, the GeoTransGNN model performs best in balancing low MSE and dropout regularization. Optimizing the number of attention heads in the multi-head attention is likewise crucial for maximizing the generalization ability and gold mine anomaly prediction accuracy of the GeoTransGNN model.
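The dropout mechanism referred to here can be illustrated with a short plain-Python sketch. It uses the common "inverted dropout" convention, in which surviving activations are rescaled by 1 / (1 - rate) during training; that convention and the function name are assumptions of the sketch rather than details given in the embodiment.

```python
import random


def dropout(values, rate, rng, training=True):
    """Randomly zero a fraction `rate` of activations during training.

    Survivors are scaled by 1 / (1 - rate) so the expected activation is
    unchanged (inverted dropout, assumed convention). Requires rate < 1.
    """
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(0)
activations = [1.0] * 1000
dropped = dropout(activations, rate=0.5, rng=rng)
zero_fraction = dropped.count(0.0) / len(dropped)
print(zero_fraction)
```

At inference time (`training=False`) the activations pass through unchanged, which is why the rescaling is applied during training instead.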

[0143] Figure 8 shows the impact of varying the number of attention heads in the Transformer's multi-head attention on model performance. More attention heads help capture diverse representations, especially in complex tasks, as different attention heads can focus on different contexts, enhancing the Transformer's ability to capture subtle differences in the data (such as long-range dependencies). The results show that using two attention heads minimizes the MSE, indicating that the two-head configuration improves the Transformer's predictive ability and successfully captures complex relationships in the data. A single attention head has a higher MSE, indicating limited ability to extract diverse information. Performance degrades once the number of attention heads exceeds two, indicating overfitting and reduced feature extraction efficiency. In each epoch, the Transformer traverses the entire training set once, performing forward and backward computations for each batch to update the parameters. In unsupervised learning, choosing an appropriate number of training epochs is crucial, as overtraining may lead to overfitting (capturing noise rather than meaningful patterns), reducing the ability to generalize to new data.

[0144] Figure 9 shows the training loss of the three models as a function of epochs: the y-axis represents the loss value, and the x-axis represents the number of training epochs (maximum 400). All three models initially have high losses, which decrease rapidly in the first few epochs, indicating good learning performance. After 50 epochs, the losses of all models stabilize and approach zero, indicating that the models have converged and the weights no longer change significantly. The loss curves of all models overlap and show similar trends, indicating comparable training performance. All three models are trained on the same dataset using an unsupervised method, employing the same optimizer and loss function: the optimizer is based on adaptive learning rate adjustment, and the loss function is based on mean squared error. All three models have the same initial parameters, using the Adam optimizer and a learning rate of 0.01.

[0145] The performance of the proposed model was compared with that of the baseline model using bar charts and two-dimensional anomaly maps: the anomaly maps can reveal the spatial distribution of anomalies and their relationship with input features; to ensure fair comparison and geological consistency, all anomaly maps were drawn using the same classification threshold and geological overlays (faults and lithology), improving the interpretability of the visualization.

[0146] Figure 10 shows the MSE values of the three models, and Figure 11 shows the SSIM values. The GeoTransGNN model is the optimal model for gold anomaly detection that integrates spatial features and geochemical (gold deposit) features, achieving a balance between low error and structural integrity: MSE quantifies the pixel-level differences between the original and reconstructed geochemical features, and SSIM captures perceived similarity and is especially sensitive to structural patterns and spatial coherence, which are crucial for anomaly detection in mineral exploration.

[0147] The MSE value reflects the degree of agreement between the reconstructed anomaly map and the original gold ore grade values. The GeoTransGNN model has the lowest MSE (0.072), indicating the best reconstruction fidelity and the ability to effectively capture the geochemical gradient and local variation of the gold ore distribution, which is crucial for identifying subtle anomalies in mineralized areas. The GNN model has a moderate MSE of 0.12, benefiting from the spatial relationships in the graph structure but lacking the depth to model complex geospatial interactions. The Transformer model has the highest MSE (0.19), highlighting its limitations in geoscience settings, where spatial continuity and neighborhood association are crucial. The results show that a model such as GeoTransGNN, which integrates the spatial graph structure with attention-based contextual modeling, achieves more accurate gold ore anomaly modeling, reducing the risk of false positives or missed mineralization locations.

[0148] SSIM further tests the model's ability to preserve spatial structure when reconstructing geochemical patterns. The GeoTransGNN model has the highest SSIM (0.85), indicating its advantage in preserving the local spatial texture and global structural layout of gold anomalies. The GNN model has an SSIM of 0.82, performing similarly and maintaining good spatial consistency, but it may lack the deep context captured by the Transformer-enhanced architecture. The Transformer alone performs the weakest (SSIM = 0.78) because, lacking explicit graph modeling, it has difficulty capturing the inherent topological features of geospatial data. Overall, the high SSIM of the GeoTransGNN model confirms the preservation of key spatial relationships in gold anomaly detection. In mineral exploration tasks, the structural distribution of elemental content is an indicator of the mineralization process, and the GeoTransGNN model, through its balanced architecture design, improves the reconstruction quality relative to the tested GNN and Transformer benchmark models. In addition to the quantitative indicators, the spatial patterns of anomalies are also compared using the reconstructed two-dimensional maps.

[0149] Figures 12-15 show the geochemical maps generated by the various models in this invention. Figure 12 is the two-dimensional gold mine anomaly map of the original dataset. Figures 13-15 show the final anomaly maps reconstructed based on squared errors using the Transformer, GNN, and GeoTransGNN models, respectively. The maps illustrate the distribution of gold grades in the study area, with color gradients representing gold content: warm colors (red and yellow) indicate high content, and cool colors (green and blue) indicate low content. The coordinate axes are geographic coordinates, providing a spatial context for the gold distribution. All maps are classified using the same quantile threshold (0-1 normalized fraction), and lithological boundaries and Hatu fault traces are overlaid. The GeoTransGNN model highlights more anomaly patterns.

[0150] Figure 13, the map reconstructed by the Transformer, shows a visual improvement over the original data: the Transformer captures complex spatial relationships, making the representation of gold content more coherent, but there is some bias in data-sparse areas, indicating that the Transformer can handle large datasets but may be affected by data gaps. In contrast, Figure 14, the map generated by the GNN, reflects a different reconstruction behavior: the GNN effectively utilizes the spatial relationships between geological features and maintains spatial context and local correlations, but it may not fully capture the broad trends in structurally complex areas, and there is a risk of overfitting local features rather than representing the overall distribution of gold deposits. Figure 15, generated by the GeoTransGNN model, combines the advantages of the Transformer and GNN architectures: it optimizes the spatial representation of gold deposits and improves the overall fidelity of the reconstruction. By combining geospatial data with graph-based learning, the GeoTransGNN model reveals the underlying geological structure more deeply.

[0151] The high-anomaly areas highlighted in red are highly correlated with known gold deposits, demonstrating the predictive accuracy of the GeoTransGNN model. The low-anomaly areas, displayed in cool tones, accurately reflect areas with extremely low gold content, closely matching the actual data. The maps generated by the GeoTransGNN model closely resemble the original data, accurately capturing local and global trends, highlighting the robustness and effectiveness of the GeoTransGNN model in geochemical anomaly prediction. By accurately identifying high-gold-content areas, the model improves the effectiveness of mineral resource assessment while ensuring that low-gold-content areas are also correctly characterized.

[0152] The interpretability of the GeoTransGNN model is demonstrated by the correspondence between the model's attention patterns and geological features. The high-anomaly areas generated by the GeoTransGNN model spatially correspond to known gold deposits and fault traces, confirming geological consistency. Visualization of the self-attention distribution further shows that the GeoTransGNN model focuses on spatially influential nodes and structurally important regions, establishing an interpretable link between the learned representation and the actual geological factors controlling mineralization. Compared with the baseline models, the GeoTransGNN model delineates anomaly areas more clearly and accurately. Analysis of the reconstructed maps reveals the strengths of the evaluated models: the Transformer effectively captures complex spatial relationships, but its performance degrades in data-sparse regions; the GNN preserves local correlations, but may not represent broad trends in tectonically complex regions; the GeoTransGNN model combines the advantages of both, providing a comprehensive reconstruction highly consistent with the original map. The results highlight the value of hybrid architectures that integrate complementary advantages, which can be extended to other ore-rich regions to improve exploration efficiency and prediction accuracy.

[0153] To quantitatively evaluate model performance, known gold deposits were used as positive samples and background samples as negative samples, and Receiver Operating Characteristic (ROC) curves and success rate curves were calculated. Positive samples represented gold deposits as point locations within mineralized zones, with a single point or centroid marking the positive sample location within each deposit region. For large deposits, the positive sample definition was expanded by including more geologically relevant points within a predefined radius (e.g., 500 meters). Background samples were defined as areas where gold content was absent or negligible (i.e., unmineralized areas), selected from locations far from known gold deposits or where gold content was below the anomaly threshold, ensuring that background samples truly represented the "non-anomaly" areas in the ROC analysis. As shown in Figure 16, the area under the curve (AUC) for each model was as follows: 0.79 for the gold deposit baseline model, 0.81 for the Transformer, 0.88 for the GNN, and 0.93 for GeoTransGNN. The results confirm that the GeoTransGNN model has the strongest anomaly discrimination ability, indicating superior performance compared to the baseline models in gold deposit anomaly detection.
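The AUC used in this evaluation can be computed directly from scores and labels. The following plain-Python sketch uses the rank interpretation of AUC (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative sample, with ties counted as one half); the scores and labels are toy values invented for illustration.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    `labels` uses 1 for known deposits and 0 for background samples; ties
    between a positive and a negative score count as half a correct pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    correct = sum(1.0 if p > n else 0.5 if p == n else 0.0
                  for p in positives for n in negatives)
    return correct / (len(positives) * len(negatives))


# Toy anomaly scores: two deposit locations and two background locations.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 0, 1, 0]
auc = roc_auc(toy_scores, toy_labels)
print(auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of deposits from background, which gives a scale for reading the reported values.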

[0154] The GeoTransGNN model achieves superior results because it simultaneously captures both local spatial continuity and global contextual dependencies in geochemical data: GCN aggregates neighborhood node information, preserving short-distance geological connectivity and ensuring smooth transitions between adjacent sampling points. In contrast, the Transformer encoder models long-distance interactions through a self-attention mechanism, enabling the detection of patterns in spatially distant but geologically related areas. By integrating these two mechanisms, the GeoTransGNN model alleviates the oversmoothing problem typical of deep GNNs while enhancing the feature representation capabilities that are often reduced in pure attention models. This complementary representation improves reconstruction fidelity and anomaly discrimination. Experimental observations further support this explanation: the visualization of attention weights highlights fault-controlled mineralization zones, while the graph aggregation pattern emphasizes lithological coherence. These mechanisms collectively explain why GeoTransGNN consistently outperforms standalone GNN and Transformer benchmark models in reconstruction accuracy and ROC-based quantitative validation.
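The complementary roles of the two mechanisms can be seen in a toy forward pass. The plain-Python sketch below applies one graph convolution (neighbourhood aggregation) followed by single-head scaled dot-product self-attention. Using identity query/key/value projections, a single head, and the small hand-written matrices are simplifying assumptions; the actual encoder uses learned projections, two heads, and position encoding.

```python
import math


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(a_norm, features, weight):
    """ReLU(A_norm X W): each node mixes features of its graph neighbours."""
    mixed = matmul(matmul(a_norm, features), weight)
    return [[max(0.0, v) for v in row] for row in mixed]


def softmax(row):
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(h):
    """Single-head attention with identity projections (simplification)."""
    dim = len(h[0])
    scores = matmul(h, [list(col) for col in zip(*h)])  # H H^T
    weights = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    return matmul(weights, h), weights


# Toy graph: nodes 0 and 1 are neighbours, node 2 is isolated.
a_norm = [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]
grades = [[1.0], [3.0], [10.0]]
local = gcn_layer(a_norm, grades, weight=[[1.0, -1.0]])
latent, attn = self_attention(local)
print(local, attn[0])
```

In the graph convolution, node 0 only sees node 1, whereas in the attention step node 0 assigns nonzero weight to node 2 even though no edge connects them. This is the local-versus-global split described in the paragraph above.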

[0155] It should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, and all such modifications or substitutions should be covered within the scope of the claims of the present invention.

Claims

1. An intelligent prediction method based on graph neural networks and Transformers, characterized in that it comprises:
acquiring a spatial observation dataset and performing data preprocessing, wherein the spatial observation dataset contains multiple sampling points, and each sampling point includes a spatial location and gold ore grade data;
constructing a KNN spatial graph based on the spatial locations using the K-nearest neighbor algorithm and generating an adjacency matrix, adding an identity matrix to the adjacency matrix to form self-loop connections, and performing symmetric normalization on the adjacency matrix based on the degree matrix to obtain a normalized adjacency matrix;
constructing a node feature matrix from the gold ore grade data and inputting it into a graph convolutional network encoder, performing graph convolutional aggregation on the node feature matrix based on the normalized adjacency matrix to obtain a local encoding, and inputting the local encoding into a Transformer encoder to obtain a latent node representation matrix;
inputting the latent node representation matrix into a decoder to obtain reconstructed features, wherein the decoder uses a multilayer perceptron (MLP) and contains two fully connected layers with ReLU activation functions;
determining a node-level anomaly score based on the difference between the reconstructed features and the node feature matrix, the specific steps being as follows:
calculating the reconstruction error of the $i$-th node between the reconstructed features and the node feature matrix to generate the node-level anomaly score;
the node-level anomaly score is calculated as the node-level squared error, i.e., the per-node term of the mean squared error (MSE), and the expression is as follows: $s_i = (\hat{x}_i - x_i)^2$; where $s_i$ represents the node-level anomaly score of the $i$-th node, $\hat{x}_i$ represents the reconstructed value of the $i$-th node, and $x_i$ represents the original feature value of the $i$-th node;
the mean squared error (MSE) is obtained by averaging the node-level anomaly scores over the total number of nodes, and the expression for MSE is: $\mathrm{MSE} = \dfrac{1}{N} \sum_{i=1}^{N} s_i$; where $\mathrm{MSE}$ represents the mean squared error, and $N$ represents the total number of nodes in the reconstructed features and the node feature matrix;
reshaping the node feature matrix and the reconstructed features into image-like two-dimensional grids according to spatial location, with the unsampled positions interpolated before gridding, to obtain an original two-dimensional grid and a reconstructed two-dimensional grid;
comparing the original two-dimensional grid and the reconstructed two-dimensional grid using the Structural Similarity Index (SSIM), expressed as follows: $\mathrm{SSIM}(x, y) = \dfrac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$; $C_1 = (k_1 L)^2$; $C_2 = (k_2 L)^2$; where $\mathrm{SSIM}(x, y)$ represents the structural similarity index between the original two-dimensional grid and the reconstructed two-dimensional grid, $x$ and $y$ represent local windows of the original two-dimensional grid and the reconstructed two-dimensional grid, respectively, $\mu_x$ and $\mu_y$ represent the local means of the two local windows, $\sigma_x^2$ and $\sigma_y^2$ represent the variances of the two local windows, $\sigma_{xy}$ represents the covariance of the two local windows, $C_1$ and $C_2$ represent the stability constants, with $k_1$ and $k_2$ being small positive constants, and $L$ represents the dynamic range of the gold ore grade on the two-dimensional grid;
outputting a two-dimensional anomaly map based on the node-level anomaly scores, which means mapping the node-level anomaly score of each sampling point to the corresponding grid cell of the two-dimensional grid based on the spatial location of the sampling point, thus obtaining the two-dimensional anomaly map.

2. The intelligent prediction method based on graph neural networks and Transformers as described in claim 1, characterized in that: The spatial observation dataset is organized by sampling points, with the spatial location of the sampling points defined by latitude and longitude, and the gold ore grade data of the sampling points serving as node features.

3. The intelligent prediction method based on graph neural networks and Transformers as described in claim 1, characterized in that: The data preprocessing comprises handling missing values in the gold ore grade data by deletion and by filling with local mean values, and performing a logarithmic transformation on the skewed gold ore grade data. Outlier and noise filtering is performed before the KNN spatial graph is constructed, and min-max normalization is performed on the spatial locations.
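
As a hedged illustration of the preprocessing in claim 3, the following Python sketch fills a missing grade with the mean of its nearest valid neighbours, applies a logarithmic transformation, and min-max normalizes the coordinates. The helper names `min_max` and `fill_with_local_mean`, the neighbourhood size `k=2`, the use of `np.log1p` as the logarithmic transformation, and the sample values are assumptions made for this example; outlier and noise filtering is not shown.

```python
import numpy as np

def min_max(coords):
    """Scale each coordinate column to [0, 1]."""
    coords = np.asarray(coords, dtype=float)
    lo, hi = coords.min(axis=0), coords.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    return (coords - lo) / span

def fill_with_local_mean(coords, grades, k=2):
    """Replace NaN grades with the mean of the k nearest valid grades."""
    coords = np.asarray(coords, dtype=float)
    grades = np.asarray(grades, dtype=float).copy()
    valid = ~np.isnan(grades)
    for i in np.flatnonzero(~valid):
        dist = np.linalg.norm(coords[valid] - coords[i], axis=1)
        nearest = np.argsort(dist)[:k]
        grades[i] = grades[valid][nearest].mean()
    return grades

# Toy data: (longitude, latitude) and gold grade with one missing value.
coords = np.array([[120.0, 40.0], [120.2, 40.0], [120.4, 40.0], [121.0, 41.0]])
grades = np.array([1.0, np.nan, 3.0, 50.0])

coords01 = min_max(coords)
filled = fill_with_local_mean(coords01, grades, k=2)
log_grades = np.log1p(filled)  # log(1 + x), an assumed choice that tolerates zeros
```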

4. The intelligent prediction method based on graph neural networks and Transformers as described in claim 1, characterized in that: The KNN spatial graph contains a set of nodes and a set of edges; The edge set is constructed using the K-nearest neighbor algorithm based on the Euclidean distance between the spatial locations of the sampling points; Using the latitude and longitude coordinates from the spatial observation dataset, a sparse matrix of the graph is constructed using the K-nearest neighbor algorithm, generating an adjacency matrix; The adjacency matrix is subjected to symmetric normalization to generate a normalized adjacency matrix.
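
The graph construction in claim 4, together with the self-loops and degree-based symmetric normalization of claim 1, can be sketched as follows. This is a minimal dense NumPy sketch under assumptions: the function names `knn_adjacency` and `normalize_adjacency` are hypothetical, the adjacency matrix is made symmetric by taking the union of neighbour relations (the claim does not state this), and a dense array is used where the claim describes a sparse matrix. The normalization computed is $D^{-1/2}(A + I)D^{-1/2}$, with $D$ the degree matrix of $A + I$.

```python
import numpy as np

def knn_adjacency(coords, k):
    """Binary adjacency matrix linking each point to its k nearest neighbours."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))      # pairwise Euclidean distance
    np.fill_diagonal(dist, np.inf)                # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), neighbours.ravel()] = 1.0
    return np.maximum(adj, adj.T)                 # symmetrize (assumption)

def normalize_adjacency(adj):
    """Add self-loops, then apply symmetric degree normalization."""
    a_tilde = adj + np.eye(len(adj))              # A + I
    deg = a_tilde.sum(axis=1)                     # diagonal of the degree matrix
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# Toy data: three clustered points and one distant point.
coords = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0]])
adj = knn_adjacency(coords, k=2)
a_hat = normalize_adjacency(adj)
```

Because every node carries a self-loop, no degree is zero and the inverse square root is always defined.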

5. The intelligent prediction method based on graph neural networks and Transformers as described in claim 1, characterized in that: The specific steps for constructing a node feature matrix from gold ore grade data are as follows. Arrange the gold ore grade data corresponding to each sampling point in the spatial observation dataset according to the sampling point index, and use the gold ore grade data corresponding to each sampling point as the feature vector of the corresponding node. The feature vectors of all nodes are combined to form the node feature matrix.

6. The intelligent prediction method based on graph neural networks and Transformers as described in claim 1, characterized in that: The graph convolutional network encoder uses two layers of GCN to perform local structure encoding on the node feature matrix and outputs the local encoding.
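
A two-layer GCN encoder of the kind named in claim 6 can be sketched in NumPy as shown here. The sketch assumes the common propagation rule in which each layer multiplies the node features by the normalized adjacency matrix and a weight matrix, with a ReLU between the two layers; the claim does not specify the activation or the layer widths. The function name `gcn_encoder`, the random weights, and the placeholder adjacency matrix are hypothetical.

```python
import numpy as np

def gcn_encoder(a_hat, x, w0, w1):
    """Two graph-convolution layers: A_hat @ ReLU(A_hat @ X @ W0) @ W1."""
    hidden = np.maximum(a_hat @ x @ w0, 0.0)  # layer 1 with ReLU (assumed)
    return a_hat @ hidden @ w1                # layer 2, linear output

rng = np.random.default_rng(0)
n, hidden_dim, out_dim = 4, 8, 4

# Placeholder normalized adjacency: a fully connected graph with self-loops,
# for which D^{-1/2} (A + I) D^{-1/2} equals 1/n in every entry.
a_hat = np.full((n, n), 1.0 / n)

x = rng.random((n, 1))                        # one grade feature per node
w0 = rng.normal(size=(1, hidden_dim))         # untrained, random weights
w1 = rng.normal(size=(hidden_dim, out_dim))

local_encoding = gcn_encoder(a_hat, x, w0, w1)
```

In practice the weights would be learned by minimizing the reconstruction loss, and `a_hat` would come from the KNN spatial graph.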

7. The intelligent prediction method based on graph neural networks and Transformers as described in claim 6, characterized in that: The Transformer encoder captures global contextual relationships through a multi-head self-attention mechanism, with each attention head calculating its query matrix, key matrix, and value matrix from the local encoding. The Transformer encoder takes the local encoding as input and outputs the latent node representation matrix.
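
The multi-head self-attention step of claim 7 can be illustrated with a short NumPy sketch of scaled dot-product attention. It covers only the attention sublayer; a full Transformer encoder layer would also include residual connections, layer normalization, and a feed-forward network. The function names, the output projection `w_o`, the dimensions, and the random weights are assumptions made for this example.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_self_attention(h, w_q, w_k, w_v, w_o, num_heads):
    """Scaled dot-product self-attention over all nodes, split into heads."""
    n, d_model = h.shape
    d_head = d_model // num_heads

    def split(m):
        # (n, d_model) -> (num_heads, n, d_head)
        return m.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    # Queries, keys, and values are all computed from the local encoding h.
    q, k, v = split(h @ w_q), split(h @ w_k), split(h @ w_v)
    weights = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))
    heads = weights @ v                                   # (num_heads, n, d_head)
    concat = heads.transpose(1, 0, 2).reshape(n, d_model)
    return concat @ w_o, weights

rng = np.random.default_rng(0)
n, d_model, num_heads = 5, 8, 2
local_encoding = rng.normal(size=(n, d_model))
w_q, w_k, w_v, w_o = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4))

latent, weights = multi_head_self_attention(
    local_encoding, w_q, w_k, w_v, w_o, num_heads
)
```

Each row of `weights` sums to one, so every node's output is a weighted average over all nodes, which is how the attention captures dependencies beyond the K nearest neighbours.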