Knowledge graph constraint-based remote sensing image interpretability interpretation method and system

By fusing adaptive inductive graph attention networks and remote sensing domain knowledge graphs, the problems of insufficient domain adaptability and weak interpretability in remote sensing image interpretation are solved. Dynamic interactive optimization and interpretable interpretation of remote sensing images are realized, improving the accuracy and interpretability of remote sensing image interpretation.

CN122198166A · Pending · Publication Date: 2026-06-12 · BEIJING WEITE SPACE TECH CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: BEIJING WEITE SPACE TECH CO LTD
Filing Date: 2026-03-13
Publication Date: 2026-06-12

Patent Timeline

13 Mar 2026: Application
12 Jun 2026: Publication (CN122198166A)

IPC: G06N5/045; G06N5/025; G06V10/80; G06V10/44; G06N3/042; G06N3/045; G06V10/77

AI Tagging

Application Domain

Character and pattern recognition; Biological models


AI Technical Summary

⚠Technical Problem

Existing remote sensing image interpretation methods suffer from insufficient domain adaptability, lack of dynamic interaction, and weak interpretability. In particular, they lack effective semantic constraints and dynamic update mechanisms in the fusion of visual features and domain knowledge of remote sensing images.

⚗Method used

An adaptive inductive graph attention network and a pre-built remote sensing domain knowledge graph are employed. Visual features and knowledge graph entity embeddings are fused through a cross-modal alignment mechanism. By combining spatial adjacency, spectral similarity and object hierarchy constraints, knowledge graph entity embeddings are dynamically generated, and node representations are updated based on image observation evidence, forming an end-to-end interpretable interpretation method.

🎯Benefits of technology

It achieves improved domain adaptability, dynamic interactive optimization, and enhanced interpretability of remote sensing image interpretation. It can continuously optimize knowledge representation based on actual observation data, forming a cognitive closed loop of "observation-reasoning-correction" and improving interpretation accuracy and interpretability.

✦ Generated by Eureka AI based on patent content.


Figure CN122198166A_ABST


Abstract

The application discloses a method for interpretable remote sensing image interpretation based on knowledge graph constraints, comprising: obtaining original remote sensing image data and performing data preprocessing; inputting the preprocessed remote sensing image data into a visual feature extraction module to extract a visual feature representation with rich spatial-spectral information; using an adaptive inductive graph attention network and a pre-constructed remote sensing domain knowledge graph to dynamically generate knowledge graph entity embeddings, and fusing the knowledge graph entity embeddings with the visual feature representation to obtain fused features, wherein the remote sensing domain knowledge graph includes spatial adjacency constraints, spectral similarity constraints and object hierarchy constraints; and mapping the fused features to a specific interpretation task. By constructing a domain-adaptive multi-constraint knowledge graph and combining it with a dynamic inductive interactive fusion mechanism, the method solves the problems of missing domain knowledge, static interaction and weak interpretability in the prior art.


Description

Technical Field

[0001] This application relates to the field of remote sensing image interpretation technology, specifically to a method and system for interpretable remote sensing image interpretation based on knowledge graph constraints.

Background Technology

[0002] Remote sensing image interpretation is one of the core tasks of geographic information science, aiming to extract semantically meaningful ground feature information from images. Traditional methods mainly rely on expert experience to construct rule systems or manually defined ontology frameworks. While these methods offer strong interpretability, they struggle to adapt to large-scale, multi-source, and heterogeneous remote sensing data. In recent years, deep learning methods, such as convolutional neural networks (CNNs) and graph neural networks (GNNs), have significantly improved classification accuracy by learning feature representations in a data-driven manner. However, they still suffer from the following problems:

[0003] 1. Semantic fragmentation: Visual features and domain knowledge (e.g., spatial distribution patterns of ground features, spectral characteristics) are not effectively integrated, resulting in interpretation results that are inconsistent with geographical logic;

[0004] 2. Static modeling: Existing knowledge graph embedding methods assume that the graph structure is fixed and cannot dynamically adapt to entities or relationships not seen in remote sensing scenes;

[0005] 3. Lack of constraints: Although probabilistic graphical models and Markov random fields (MRF) attempt to introduce spatial constraints, they are not optimized in conjunction with knowledge graph representation learning.

[0006] To address these challenges, existing research has explored the integration of knowledge graphs and deep learning. For example, patent document CN120542945A discloses a multimodal remote sensing and knowledge graph-based urban planning decision-making method and system. This method integrates multi-source heterogeneous data and uses deep learning technology to align and fuse features of optical images and SAR data from satellite remote sensing imagery, generating urban feature vectors. It then associates these urban feature vectors with an urban planning policy database, outputting a structured early warning report containing a set of illegal construction warning events and policy compliance labels, forming dynamic policy constraints for subsequent multi-objective optimization. Based on a dynamic graph model, it processes historical traffic flow data and outputs a spatiotemporal distribution prediction of future traffic conditions. Multi-objective optimization, combined with the spatiotemporal distribution prediction of future traffic conditions, generates a Pareto-optimal urban planning scheme. However, the core of this method lies in multimodal data fusion, and the cross-modal alignment during data fusion only reaches the numerical feature level, failing to establish consistency constraints at the semantic level, and does not involve dynamic graph generation or interpretability.

[0007] Existing inductive graph representation learning frameworks such as GraphSAGE and Graph Attention Networks (GAT) support dynamic node embedding, but they are not tailored to the constraints specific to the remote sensing domain, such as spatial adjacency and spectral similarity. Hybrid methods (e.g., KG-enhanced CNNs) integrate graph information but neglect semantic consistency in cross-modal alignment. Furthermore, existing work indicates that knowledge graphs are primarily used for post-processing inference in remote sensing interpretation, rather than for constrained embedding during end-to-end training.

[0008] In summary, existing remote sensing image interpretation methods mainly suffer from the following shortcomings:

[0009] 1. Insufficient domain adaptability: General graph representation learning methods do not explicitly encode the spatial-spectral constraints specific to remote sensing;

[0010] 2. Lack of dynamic interaction: The alignment of the knowledge graph and visual features mostly adopts static mapping, which cannot optimize the graph structure in real time based on image evidence;

[0011] 3. Weak interpretability: Although self-supervised methods based on contrastive learning can improve representation ability, they struggle to provide semantic-level explanations for classification decisions.

Summary of the Invention

[0012] To address these issues, this application provides a method and system for interpretable remote sensing image interpretation based on knowledge graph constraints, thereby solving the problems of insufficient domain adaptability, lack of dynamic interaction, and weak interpretability in existing remote sensing image interpretation methods.

[0013] To achieve the above objectives, this application provides the following technical solution:

[0014] Firstly, a method for interpretable remote sensing image processing based on knowledge graph constraints includes:

[0015] Step 1: Acquire raw remote sensing image data; the raw remote sensing image data includes multispectral data or panchromatic band data;

[0016] Step 2: Perform data preprocessing on the raw remote sensing image data; the data preprocessing includes radiometric correction and geometric correction;

[0017] Step 3: Input the preprocessed raw remote sensing image data into the visual feature extraction module to extract visual feature representations with rich spatial-spectral information; the visual feature extraction module adopts the Swin Transformer architecture;

[0018] Step 4: Dynamically generate knowledge graph entity embeddings using an adaptive inductive graph attention network and a pre-constructed remote sensing domain knowledge graph, and fuse the knowledge graph entity embeddings with the visual feature representation through a cross-modal alignment mechanism to obtain fused features; wherein, the attention coefficients of the adaptive inductive graph attention network are jointly modulated by spatial adjacency constraints, spectral similarity constraints and object hierarchy constraints;

[0019] Step 5: Map the fused features to specific interpretation tasks.

[0020] Preferably, the calculation formula for dynamically generated knowledge graph entity embedding is:

[0021] $$h_i' = \sigma\left(\sum_{j \in \mathcal{N}(i)} \alpha_{ij} W h_j\right)$$

[0022] where $h_i'$ represents the embedding vector of the target entity $e_i$; $\sigma$ represents the LeakyReLU activation function; $j$ represents the index of a neighboring entity of the target entity $e_i$; $i$ represents the index of the target entity; $\mathcal{N}(i)$ represents the set of neighbors of entity $e_i$; $\alpha_{ij}$ represents the attention coefficient of the adaptive inductive graph attention network; $W$ represents the learnable weight matrix; and $h_j$ represents the original embedding vector of the neighboring entity $e_j$;

[0023]

[0024] where k indexes all neighboring entities of the target entity; MLP stands for multilayer perceptron; and the remaining terms are the original embedding vector of a neighboring entity and the constraint encoding vectors between the target entity and each of its neighboring entities.

[0025] Preferably, the adaptive inductive graph attention network employs a multi-head attention mechanism, where each attention head focuses on a specific type of constraint relation, ultimately generating a comprehensive entity representation through weighted summation.

[0026] Preferably, the knowledge graph entity embedding and the visual feature representation are fused through a cross-modal alignment mechanism, as expressed by the formula:

[0027]

[0028] where the terms denote, in order: the fused features; the visual feature representation; the knowledge graph entity embeddings; the projection matrices of the query, key, and value, respectively; and the dimension of the feature vector. T denotes the matrix transpose.

[0029] Preferably, the node representation of the remote sensing domain knowledge graph is also dynamically updated based on image observation evidence.

[0030] Preferably, the dynamic updating of the node representation of the remote sensing domain knowledge graph specifically includes: calculating the observation confidence of an entity in the image, balancing the contribution weights of prior knowledge and new observations based on the confidence, and updating the entity embedding.

[0031] Preferably, the calculation formula for updating entity embedding is:

[0032]

[0033] where the terms denote, in order: the updated, more accurate entity features; the balance coefficient; the entity features of the original knowledge graph; the fused features; and the evidence observed in the image.

[0034] Secondly, a remote sensing image interpretability interpretation system based on knowledge graph constraints includes:

[0035] The data input module is used to acquire raw remote sensing image data; the raw remote sensing image data includes multispectral data or panchromatic band data.

[0036] The preprocessing module is used to perform data preprocessing on the original remote sensing image data; the data preprocessing includes radiometric correction and geometric correction.

[0037] The feature extraction module is used to input the preprocessed raw remote sensing image data into the visual feature extraction module to extract visual feature representations with rich spatial-spectral information; the visual feature extraction module adopts the Swin Transformer architecture;

[0038] The dynamic fusion module is used to dynamically generate knowledge graph entity embeddings using an adaptive inductive graph attention network and a pre-built remote sensing domain knowledge graph, and to fuse the knowledge graph entity embeddings with the visual feature representations through a cross-modal alignment mechanism to obtain fused features; wherein, the attention coefficients of the adaptive inductive graph attention network are jointly modulated by spatial adjacency constraints, spectral similarity constraints and object hierarchy constraints;

[0039] The interpretation module is used to map the fused features to specific interpretation tasks.

[0040] Thirdly, a computer device includes a memory and a processor, the memory storing a computer program, the processor executing the computer program to implement the steps of a remote sensing image interpretability interpretation method based on knowledge graph constraints.

[0041] Fourthly, a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of a remote sensing image interpretability interpretation method based on knowledge graph constraints.

[0042] Compared with the prior art, this application has at least the following beneficial effects:

[0043] 1. This application provides a remote sensing image interpretability interpretation method based on knowledge graph constraints, comprising: acquiring raw remote sensing image data and performing data preprocessing; inputting the preprocessed raw remote sensing image data into a visual feature extraction module to extract visual feature representations with rich spatial-spectral information; dynamically generating knowledge graph entity embeddings using an adaptive inductive graph attention network and a pre-constructed remote sensing domain knowledge graph, and fusing the knowledge graph entity embeddings with the visual feature representations to obtain fused features; wherein, the remote sensing domain knowledge graph includes spatial adjacency constraints, spectral similarity constraints, and object hierarchy constraints; and mapping the fused features to specific interpretation tasks. This method, by constructing a domain-adaptive multi-constraint knowledge graph and combining it with a dynamically inductive interactive fusion mechanism, transforms traditional "black box interpretation" into semantically transparent "interpretable reasoning," systematically solving the problems of domain knowledge gaps, static interaction, and weak interpretability in existing technologies.

[0044] 2. This application can dynamically update the node representation of the remote sensing knowledge graph based on image observation evidence, so that the knowledge representation of the remote sensing knowledge graph can be continuously optimized and updated according to actual observation data, forming a complete cognitive closed loop of "observation-reasoning-correction".

Attached Figure Description

[0045] To more intuitively illustrate the prior art and this application, exemplary drawings are provided below. It should be understood that the specific shapes and structures shown in the drawings should not generally be regarded as limiting conditions for implementing this application; for example, based on the technical concept disclosed in this application and the exemplary drawings, those skilled in the art are able to easily make conventional adjustments or further optimizations to the addition / reduction / classification, specific shapes, positional relationships, connection methods, size ratios, etc. of certain units (components).

[0046] Figure 1 is a flowchart of the remote sensing image interpretability interpretation method based on knowledge graph constraints provided in Embodiment 1 of this application;

[0047] Figure 2 is a schematic diagram of the structure of the remote sensing image interpretability interpretation method based on knowledge graph constraints provided in Embodiment 1 of this application;

[0048] Figure 3 is a schematic diagram of the structure of the remote sensing image interpretability interpretation system based on knowledge graph constraints provided in Embodiment 2 of this application.

Detailed Implementation

[0049] The present application will be further described in detail below with reference to the accompanying drawings and specific embodiments.

[0050] In the description of this application: unless otherwise stated, "a plurality of" means two or more. The terms "first," "second," "third," etc., in this application are intended to distinguish the objects referred to and do not have any special meaning in terms of technical connotation (e.g., they should not be construed as an emphasis on importance or order). Expressions such as "including," "comprising," and "having" also mean "not limited to" (certain units, components, materials, steps, etc.).

[0051] The terms used in this application, such as "upper," "lower," "left," "right," and "middle," are generally used to indicate the general relative positional relationship for the purpose of intuitive understanding by referring to the accompanying drawings, and are not absolute limitations on the positional relationship in the actual product.

[0052] Embodiment 1

[0053] This embodiment provides a remote sensing image interpretability interpretation method based on knowledge graph constraints. The core of this method is to integrate domain knowledge into a deep learning model in a dynamic and learnable manner, which breaks through the limitation of the separation of visual features and semantic knowledge in traditional methods.

[0054] Referring to Figure 1 and Figure 2, this embodiment provides a method for interpretable interpretation of remote sensing images based on knowledge graph constraints, including:

[0055] S1: Acquire raw remote sensing image data; raw remote sensing image data includes multispectral data or panchromatic data;

[0056] S2: Perform data preprocessing on the raw remote sensing image data; data preprocessing includes radiometric correction and geometric correction.

[0057] Specifically, radiometric correction is used to eliminate sensor noise and atmospheric effects; geometric correction is used to eliminate terrain and projection distortions.

[0058] S3: Input the preprocessed raw remote sensing image data into the visual feature extraction module to extract visual feature representations with rich spatial-spectral information; the visual feature extraction module adopts the Swin Transformer architecture.

[0059] Specifically, the visual feature extraction module, based on the Swin Transformer architecture, is responsible for extracting multi-scale visual feature representations from the raw remote sensing image data. These visual feature representations are feature maps with rich spatial-spectral information, which can provide a foundation for subsequent knowledge fusion.

[0060] S4: An adaptive inductive graph attention network and a pre-constructed remote sensing domain knowledge graph are used to dynamically generate knowledge graph entity embeddings. The knowledge graph entity embeddings and visual feature representations are then fused through a cross-modal alignment mechanism to obtain fused features. The attention coefficients of the adaptive inductive graph attention network are modulated by spatial adjacency constraints, spectral similarity constraints, and object hierarchy constraints.

[0061] Specifically, in this embodiment, a knowledge graph in the remote sensing field is pre-constructed through a knowledge graph processing module. This knowledge graph processing module includes functions for storing, representing, and updating knowledge graphs, supports standard knowledge graph input formats (such as RDF), and can automatically convert them into graph structure data suitable for neural network processing.
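The conversion from triple-based knowledge graph input to graph structure data can be sketched as follows. This is a minimal illustration and not the module described in this application: it assumes the RDF input has already been parsed into (subject, predicate, object) tuples, and the function name `triples_to_graph`, the example triples, and the choice to store every edge in both directions are all assumptions made for the example.

```python
from collections import defaultdict

def triples_to_graph(triples):
    """Convert (subject, predicate, object) triples to index-based graph data."""
    entity_ids, relation_ids = {}, {}
    neighbors = defaultdict(list)  # entity index -> [(neighbor index, relation index)]
    for subj, pred, obj in triples:
        # Assign the next free integer index the first time a name is seen.
        s = entity_ids.setdefault(subj, len(entity_ids))
        o = entity_ids.setdefault(obj, len(entity_ids))
        r = relation_ids.setdefault(pred, len(relation_ids))
        neighbors[s].append((o, r))
        neighbors[o].append((s, r))  # assumption: edges are treated as undirected
    return entity_ids, relation_ids, dict(neighbors)

# Hypothetical triples built from the entity and relation examples in the text.
triples = [
    ("water_body", "adjacent", "wetland"),
    ("residential_area", "belongs_to", "artificial_surface"),
    ("city", "adjacent", "forest"),
]
entity_ids, relation_ids, neighbors = triples_to_graph(triples)
```

The resulting neighbor lists are the kind of structure a graph attention layer can iterate over when aggregating neighbor embeddings.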

[0062] Adaptive Inductive Graph Attention Network (I-GAT):

[0063] Adaptive inductive graph attention networks enable the dynamic generation of entity embeddings in knowledge graphs. Given a knowledge graph $G = (\mathcal{E}, \mathcal{R})$, where $\mathcal{E}$ represents the set of entities (e.g., "city", "forest") and $\mathcal{R}$ represents the relationships between entities (e.g., "adjacent", "belonging to"), the dynamic embedding $h_i'$ of each entity $e_i \in \mathcal{E}$ is calculated using the following formula:

[0064] $$h_i' = \sigma\left(\sum_{j \in \mathcal{N}(i)} \alpha_{ij} W h_j\right)$$

[0065] where $h_i'$ represents the embedding vector of the target entity $e_i$; $W$ represents the learnable weight matrix; $\sigma$ represents the LeakyReLU activation function; $j$ represents the index of a neighboring entity of the target entity $e_i$; $i$ represents the index of the target entity; $\mathcal{N}(i)$ represents the set of neighbors of entity $e_i$; $h_j$ represents the original embedding vector of the neighboring entity $e_j$; and $\alpha_{ij}$ represents the attention coefficient, which is calculated as follows:

[0066]

[0067] where k indexes all neighboring entities of the target entity; MLP stands for multilayer perceptron; and the remaining terms are the original embedding vector of a neighboring entity and the constraint encoding vectors between the target entity and each of its neighboring entities.

[0068] In this embodiment, the constraint encoding in the attention coefficient is specifically designed for the remote sensing domain and includes three types:

[0069] ① Spatial adjacency constraints: encode typical spatial relationships between entities (e.g., "water bodies are usually adjacent to wetlands");

[0070] ② Spectral similarity constraints: reflect the similarity of the spectral characteristics of entities (e.g., "spectral differences between coniferous forests and broad-leaved forests");

[0071] ③ Hierarchical constraints: indicate the hierarchical relationship in the classification system (e.g., "residential areas belong to artificial surfaces").

[0072] As can be seen, this embodiment innovatively modulates the attention coefficients of the adaptive inductive graph attention network by spatial adjacency constraints, spectral similarity constraints, and object hierarchy constraints, thereby achieving adaptive embedding of domain knowledge.
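A constraint-modulated attention step of this kind can be sketched as follows. The sketch rests on several assumptions that the text does not confirm: the attention coefficient is taken to be a softmax, over all neighbors, of a score computed from each neighbor's original embedding and its constraint encoding vector; the MLP is replaced by a single linear layer (`score`); the constraint vector is assumed to hold one value each for spatial adjacency, spectral similarity, and hierarchy; and the LeakyReLU slope of 0.2 is an arbitrary choice. All function and variable names are hypothetical.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def score(h_neighbor, c, w_score):
    """Stand-in for the MLP: one linear layer over the concatenation [h_neighbor, c]."""
    features = h_neighbor + c
    return sum(w * x for w, x in zip(w_score, features))

def igat_embedding(neighbor_embs, constraints, W, w_score):
    """Aggregate the neighbors of one target entity with constraint-modulated attention.

    neighbor_embs: original embedding vector of each neighbor
    constraints:   one [spatial, spectral, hierarchy] encoding per neighbor (assumed layout)
    """
    scores = [score(h, c, w_score) for h, c in zip(neighbor_embs, constraints)]
    alphas = softmax(scores)  # attention coefficients, normalized over all neighbors
    agg = [0.0] * len(W)
    for alpha, h in zip(alphas, neighbor_embs):
        for d, value in enumerate(matvec(W, h)):
            agg[d] += alpha * value
    return [leaky_relu(x) for x in agg], alphas

neighbor_embs = [[1.0, 0.0], [0.0, 1.0]]
constraints = [[1.0, 0.5, 0.0], [0.0, 0.1, 0.0]]  # first neighbor is spatially adjacent
W = [[1.0, 0.0], [0.0, 1.0]]
w_score = [0.0, 0.0, 1.0, 1.0, 1.0]  # this toy scorer looks only at the constraints
embedding, alphas = igat_embedding(neighbor_embs, constraints, W, w_score)
```

In this toy setup the neighbor with the stronger constraint encoding receives the larger attention coefficient, so it dominates the aggregated embedding of the target entity.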

[0073] It should be noted that in this embodiment, the adaptive inductive graph attention network adopts a multi-head attention mechanism. Each attention head of the multi-head attention mechanism focuses on a specific type of constraint relationship, and finally generates a comprehensive entity representation through weighted summation.

[0074] This embodiment is the first to realize the real-time dynamic generation of knowledge graph embedding, and it can adaptively adjust according to the content of the input image.

[0075] Specifically, this embodiment employs a cross-modal alignment mechanism when fusing knowledge graph entity embeddings and visual feature representations. Taking the feature map output by the visual feature extraction module and the knowledge graph entity embeddings as inputs, the fusion process can be expressed by the following formula:

[0076]

[0077] where the terms denote, in order: the fused features; the visual feature representation; the knowledge graph entity embeddings; the projection matrices of the query, key, and value, respectively; and the dimension of the feature vector. T denotes the matrix transpose. This process reweights the visual feature representations in the knowledge space through the query, key, and value projections, thereby enhancing visual features with strong semantic relationships.
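One plausible reading of this fusion is scaled dot-product cross-attention, sketched below. The arrangement is an assumption: queries are taken from the visual features, and keys and values from the knowledge graph entity embeddings, with a softmax over entities. The text names the query, key, and value projections, the feature dimension, and the transpose, but does not state which modality feeds which projection. All names are hypothetical.

```python
import math

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def softmax_rows(M):
    out = []
    for row in M:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def cross_modal_fusion(V, E, W_q, W_k, W_v):
    """Cross-attention in which visual features attend to entity embeddings (assumed)."""
    Q = matmul(V, W_q)    # queries from visual features, one row per image location
    K = matmul(E, W_k)    # keys from entity embeddings, one row per entity
    Val = matmul(E, W_v)  # values from entity embeddings
    d = len(K[0])         # feature dimension used for scaling
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(Q, transpose(K))]
    attn = softmax_rows(scores)  # each image location distributes weight over entities
    return matmul(attn, Val), attn

identity = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0]]              # two image locations
E = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # three entities
F, attn = cross_modal_fusion(V, E, identity, identity, identity)
```

Each row of `attn` shows which entities a given image location aligns with, which is the kind of intermediate quantity that can be inspected when explaining a prediction.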

[0078] The fusion method provided in this embodiment replaces the traditional feature concatenation method, achieving semantic space alignment and enhancement.

[0079] S5: Map the fused features to specific interpretation tasks.

[0080] Specifically, this step maps the fused features to specific interpretation tasks (e.g., classification, segmentation, detection). This step employs a pluggable design, allowing for rapid adjustments based on different application scenarios.

[0081] This embodiment provides a remote sensing image interpretability interpretation method based on knowledge graph constraints, which further includes dynamically updating the node representation of the remote sensing domain knowledge graph based on image observation evidence. Specifically, dynamically updating the node representation of the remote sensing domain knowledge graph includes: calculating the observation confidence of an entity in the image, and balancing the contribution weights of prior knowledge and new observations based on the confidence, thereby achieving progressive optimization of the knowledge representation.

[0082] In other words, for each predicted entity, its embedded representation is updated based on the evidence observed for it in the image:

[0083]

[0084] where the terms denote, in order: the updated, more accurate entity features; the entity features of the original knowledge graph; the fused features; the evidence observed in the image; and the balance coefficient, which controls the weighting of prior knowledge and new observations. This design allows the remote sensing domain knowledge graph to continuously revise its knowledge representations to adapt to new observational data.
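The confidence-balanced update can be illustrated with a simple convex combination. This is only a sketch under stated assumptions: the text does not give the exact form of the update, so the blend below, the rule that the balance coefficient equals one minus the observation confidence, and the name `update_entity_embedding` are all hypothetical.

```python
def update_entity_embedding(h_prior, evidence, confidence):
    """Blend a prior entity embedding with an evidence vector derived from the image.

    A higher observation confidence shifts weight from prior knowledge to the
    new observation.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    lam = 1.0 - confidence  # assumed balance coefficient: weight kept on the prior
    return [lam * p + (1.0 - lam) * e for p, e in zip(h_prior, evidence)]

h_prior = [1.0, 0.0]
evidence = [0.0, 1.0]
low = update_entity_embedding(h_prior, evidence, confidence=0.1)
high = update_entity_embedding(h_prior, evidence, confidence=0.9)
```

With low confidence the embedding stays close to the prior, and with high confidence it moves toward the observed evidence, which matches the stated goal of balancing prior knowledge against new observations.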

[0085] This embodiment establishes a reverse propagation path from image evidence to knowledge representation, enabling the continuous evolution of the knowledge base.

[0086] The following examples will further illustrate the interpretability interpretation method for remote sensing images based on knowledge graph constraints provided in this embodiment.

[0087] Example 1: Based on multi-source data fusion

[0088] This example addresses a scenario involving multi-source remote sensing data (such as the collaborative interpretation of optical imagery and SAR data). The method achieves cross-modal knowledge fusion through the following improvements:

[0089] (1) Multimodal feature extractor: In the visual feature extraction module, Swin Transformer (processing optical images) and 3D convolutional network (processing SAR time series data) are deployed in parallel, and the output features are concatenated along the channel dimension;

[0090] (2) Modality-aware constraint encoding: In the knowledge graph processing module, multiple sets of embedding vectors are maintained for each entity, corresponding to the characteristics of different data modalities. Modality weight factors are introduced when calculating the attention coefficient to dynamically adjust the contribution of each modality;

[0091] (3) Cross-modal consistency loss: Add a loss term during the training phase to force the prediction results of the same entity to remain semantically consistent in different modalities.
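A cross-modal consistency term of the kind listed in item (3) could take many forms. The sketch below uses a symmetric Kullback-Leibler divergence between the class distributions predicted for the same entity from two modalities. The choice of divergence, the function names, and the example probabilities are assumptions and are not taken from this application.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    # eps guards against log(0) when a class has zero probability
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def consistency_loss(p_optical, p_sar):
    """Symmetric KL divergence between per-entity class distributions."""
    return 0.5 * (kl_divergence(p_optical, p_sar) + kl_divergence(p_sar, p_optical))

# Hypothetical class probabilities for one entity seen in both modalities.
agree = consistency_loss([0.7, 0.2, 0.1], [0.7, 0.2, 0.1])
disagree = consistency_loss([0.7, 0.2, 0.1], [0.1, 0.2, 0.7])
```

The term is zero when both modalities predict the same distribution and grows as they diverge, so adding it to the training loss penalizes semantically inconsistent predictions.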

[0092] Typical application scenarios include: optical-SAR co-classification in cloudy areas and joint analysis of nighttime urban light and daytime thermal infrared data.

[0093] Example 2: For real-time updates

[0094] This example is suitable for scenarios that require continuous learning of new land cover types. Key improvements include:

[0095] (1) Incremental graph expansion: A new entity detector is added to the graph optimization module. When a region with a confidence level below the threshold appears in the input image and continues to appear, the knowledge graph expansion process is automatically triggered.

[0096] (2) Memory replay mechanism: The system maintains a typical sample library and replays historical data and new data for training when the model is updated to avoid catastrophic forgetting;

[0097] (3) Lightweight fine-tuning strategy: Only the parameters of the last two layers of the dynamic fusion module are incrementally updated to maintain the stability of the basic feature extractor.
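The trigger condition in item (1), a region that falls below the confidence threshold and keeps doing so, can be sketched as a streak counter. The class name `ExpansionTrigger`, the threshold of 0.5, and the patience of three consecutive images are hypothetical values chosen for illustration.

```python
from collections import defaultdict

class ExpansionTrigger:
    """Flags a region once it is low-confidence in `patience` consecutive images."""

    def __init__(self, threshold=0.5, patience=3):
        self.threshold = threshold
        self.patience = patience
        self.streaks = defaultdict(int)  # region id -> consecutive low-confidence count

    def observe(self, region_id, confidence):
        if confidence < self.threshold:
            self.streaks[region_id] += 1
        else:
            self.streaks[region_id] = 0  # a confident observation resets the streak
        return self.streaks[region_id] >= self.patience

trigger = ExpansionTrigger(threshold=0.5, patience=3)
results = [trigger.observe("tile_17", c) for c in [0.4, 0.3, 0.8, 0.2, 0.1, 0.3]]
```

Requiring a sustained streak, rather than a single low-confidence observation, keeps transient noise such as cloud cover from triggering knowledge graph expansion.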

[0098] Typical application scenarios include: rapid adaptation to changes in land cover after disasters and continuous tracking of seasonal vegetation phenology.

[0099] Example 3: Support for multi-task collaboration

[0100] This example enables the system to simultaneously handle tasks such as classification, segmentation, and change detection. Key technical improvements include:

[0101] (1) Separation of shared and dedicated features: In the interpretation module, the fused features are decomposed into task-independent shared representations and task-specific dedicated representations;

[0102] (2) Gradient routing mechanism: During backpropagation, gradient update weights are dynamically allocated based on the importance of the task loss;

[0103] (3) Hierarchical knowledge query: For tasks of different granularities (such as global classification vs. pixel-level segmentation), the knowledge graph processing module provides multi-level entity relationship query interfaces from coarse to fine.
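The loss-dependent weighting in item (2) can be illustrated as follows. The text does not define how task importance is measured, so this sketch assumes a softmax over the current task losses, giving tasks with larger loss a larger share of the gradient. The function names, the temperature parameter, and the loss values are hypothetical.

```python
import math

def task_weights(losses, temperature=1.0):
    """Softmax over task losses: a larger loss receives a larger weight."""
    names = list(losses)
    m = max(losses.values())
    exps = [math.exp((losses[n] - m) / temperature) for n in names]
    total = sum(exps)
    return {n: e / total for n, e in zip(names, exps)}

def weighted_total_loss(losses, weights):
    # Scaling each task loss scales its gradient contribution by the same factor.
    return sum(weights[n] * losses[n] for n in losses)

losses = {"classification": 0.4, "segmentation": 1.2, "change_detection": 0.8}
weights = task_weights(losses)
total = weighted_total_loss(losses, weights)
```

In a training framework the weights would normally be treated as constants during backpropagation so that they only rescale each task's gradient.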

[0104] Typical application scenarios include: simultaneously outputting land use classification maps and building vector outlines, and synchronously identifying change types and degrees during change detection.

[0105] Example 4: Edge computing optimization

[0106] This example addresses deployment requirements for mobile or spaceborne devices, and the main optimization methods include:

[0107] (1) Knowledge graph pruning: Based on the characteristics of the target region, only relevant subgraphs are loaded into memory to reduce computational overhead;

[0108] (2) Attention approximation calculation: Locality-sensitive hashing (LSH) is used in the dynamic fusion module to accelerate the nearest neighbor search of attention coefficients;

[0109] (3) Quantization-aware training: An 8-bit integer quantization strategy is adopted to compress the model size to 1/4 of the original while ensuring that the accuracy loss is less than 2%.
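The 1/4 size figure in item (3) follows from storing each weight in 8 bits instead of the 32 bits of a single-precision float. The sketch below shows that arithmetic with a simple symmetric per-tensor quantizer. It illustrates the storage ratio and rounding error only; it is a post-hoc quantization of fixed values, not quantization-aware training, and it says nothing about the stated accuracy bound. The function names and weight values are hypothetical.

```python
import array

def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to signed 8-bit integers."""
    scale = max(abs(v) for v in values) / 127.0 or 1.0  # fall back to 1.0 if all zero
    q = array.array("b", (max(-127, min(127, round(v / scale))) for v in values))
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

weights = array.array("f", [0.5, -1.0, 0.25, 0.75])  # single-precision storage
q, scale = quantize_int8(weights)
ratio = (len(q) * q.itemsize) / (len(weights) * weights.itemsize)
restored = dequantize(q, scale)
```

Here `ratio` compares the bytes used by the two arrays, and the difference between `restored` and `weights` is the rounding error introduced by quantization, which is at most half a quantization step per weight.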

[0110] Typical application scenarios include: real-time land cover mapping by drones and intelligent on-orbit processing by satellites.

[0111] Example 5: Enhanced Interactive Annotation

[0112] This example extends the method to a semi-supervised learning framework, with key improvements including:

[0113] (1) Uncertainty-guided annotation: Regions with low confidence in the prediction results are automatically identified and submitted for expert annotation first;

[0114] (2) Label propagation algorithm: A small number of manually annotated labels are propagated to similar entities through the relational network of the knowledge graph;

[0115] (3) Active learning strategy: Design sample selection criteria based on information entropy and diversity to maximize annotation efficiency.
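The uncertainty-guided selection in items (1) and (3) can be sketched with information entropy as the only criterion. The diversity criterion mentioned in item (3) is not modeled here, and the region identifiers, probabilities, and annotation budget are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(predictions, budget):
    """Return the `budget` region ids with the highest prediction entropy."""
    ranked = sorted(predictions, key=lambda rid: entropy(predictions[rid]), reverse=True)
    return ranked[:budget]


# Hypothetical class probabilities predicted for four image regions.
predictions = {
    "region_a": [0.98, 0.01, 0.01],
    "region_b": [0.40, 0.35, 0.25],
    "region_c": [0.70, 0.20, 0.10],
    "region_d": [0.34, 0.33, 0.33],
}
queue = select_for_annotation(predictions, budget=2)
```

Here the two regions with near-uniform predictions are queued for expert annotation first, while the confidently classified region is left alone.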

[0116] Typical application scenarios include: interpretation of land features in regions with scarce samples (such as Antarctica) and specialized identification of rare land feature types (such as archaeological sites).

[0117] The remote sensing image interpretability interpretation method based on knowledge graph constraints provided in this embodiment has the following advantages:

[0118] (1) Dynamic knowledge fusion rather than static mapping: Although existing technologies (such as the patent document with publication number CN120542945A) construct urban knowledge graphs, their knowledge representations are usually static or only updated through post-processing, failing to achieve real-time collaborative optimization with visual features. This embodiment uses an adaptive inductive graph attention network to dynamically generate knowledge graph entity embeddings based on the content of the input remote sensing image, realizing end-to-end collaborative adjustment of knowledge representation and visual features, significantly improving the adaptability to unseen scenes.

[0119] (2) Remote Sensing-Specific Multi-Constraint Encoding: The knowledge graph in the patent document with publication number CN120542945A is mainly based on policy rules and entity relationships, without explicitly encoding the spatial adjacency, spectral similarity, and object hierarchical relationships unique to the remote sensing field. This embodiment introduces these three types of constraints as attention modulation factors into the graph attention network for the first time, so that the interpretation results simultaneously conform to the statistical laws of data and geographical logic, solving the problem of "semantic fragmentation" in the prior art.

[0120] (3) Explainable reasoning closed loop: Existing technologies mostly use knowledge graphs for post-processing reasoning or rule matching, lacking a feedback mechanism from image evidence to knowledge representation. This embodiment uses a graph optimization module to dynamically update the knowledge graph node representation based on image observation confidence, forming a cognitive closed loop of "observation-reasoning-correction". This not only improves the interpretation accuracy, but also provides a traceable semantic explanation for each decision, truly realizing interpretable interpretation.

[0121] (4) Cross-modal semantic alignment enhancement: Compared with the feature-level fusion of multimodal data in the patent document CN120542945A, this embodiment adopts a cross-modal attention mechanism to semantically reweight visual features in the knowledge space. This enhances visual features with strong semantic associations and establishes consistency constraints at the semantic level, avoiding the semantic misalignment caused by purely numerical feature fusion.
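The confidence-weighted correction described in advantage (3) can be sketched as a convex combination of the prior entity embedding and the evidence derived from the image. The mapping from confidence to balance coefficient (balance = 1 - confidence) and all names below are assumptions made for illustration, not the definition used in this embodiment.

```python
def update_entity(prior, evidence, confidence):
    """Blend a prior embedding with image evidence according to confidence."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    balance = 1.0 - confidence  # assumed weight retained by the prior knowledge
    return [balance * p + (1.0 - balance) * e for p, e in zip(prior, evidence)]


prior = [1.0, 0.0]      # hypothetical entity embedding from the knowledge graph
evidence = [0.0, 1.0]   # hypothetical evidence vector derived from the image

unchanged = update_entity(prior, evidence, confidence=0.0)
replaced = update_entity(prior, evidence, confidence=1.0)
blended = update_entity(prior, evidence, confidence=0.25)
```

With zero confidence the prior is kept as is, and with full confidence it is replaced by the observation. Intermediate values move the node representation part of the way toward the evidence.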

[0122] These advantages make this method particularly effective in interpreting complex scenarios. For example, in urban-rural transition zones, it can simultaneously consider the spatial distribution patterns of buildings (adjacency constraints), the spectral characteristics of vegetation (similarity constraints), and the hierarchical relationships between land use types (hierarchical constraints), thus producing interpretation results that are more consistent with geographical common sense.
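How the three constraints might jointly modulate the attention coefficients can be sketched as follows. The sketch replaces the learned scoring network with a fixed linear bias, which is a simplification. The neighbor names, constraint values, and weights are hypothetical and describe the neighbors of a single target entity such as a building.

```python
import math


def modulated_attention(base_scores, constraints, constraint_weights):
    """Softmax over neighbors of base score plus a weighted constraint bias."""
    logits = {}
    for j, base in base_scores.items():
        bias = sum(w * c for w, c in zip(constraint_weights, constraints[j]))
        logits[j] = base + bias
    peak = max(logits.values())  # subtract the maximum for numerical stability
    exps = {j: math.exp(v - peak) for j, v in logits.items()}
    total = sum(exps.values())
    return {j: e / total for j, e in exps.items()}


# Equal base scores, so differences come from the constraints alone.
base_scores = {"road": 0.2, "farmland": 0.2, "lake": 0.2}
# Each tuple: (spatial adjacency, spectral similarity, hierarchy relation).
constraints = {
    "road": (1.0, 0.3, 1.0),
    "farmland": (1.0, 0.8, 0.0),
    "lake": (0.0, 0.1, 0.0),
}
alpha = modulated_attention(base_scores, constraints, (1.0, 1.0, 1.0))
```

Neighbors that satisfy more constraints receive larger coefficients, so their embeddings contribute more to the aggregated representation of the target entity.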

[0123] In summary, the remote sensing image interpretability interpretation method based on knowledge graph constraints provided in this embodiment transforms the traditional "black box interpretation" into semantically transparent "interpretable reasoning" by constructing a domain-adaptive multi-constraint knowledge graph and combining it with a dynamic inductive interactive fusion mechanism. This systematically solves the problems of domain knowledge deficiency, static interaction, and weak interpretability in the prior art.

[0124] The remote sensing image interpretability interpretation method based on knowledge graph constraints provided in this embodiment constitutes a brand-new remote sensing image interpretation paradigm. It transforms static knowledge application into a dynamic knowledge co-evolution process, significantly improving interpretation capability and interpretability in complex scenarios and breaking through the limitations of traditional static knowledge fusion.

[0125] Example 2

[0126] Please see Figure 3. This embodiment provides a remote sensing image interpretability interpretation system based on knowledge graph constraints, namely a remote sensing image interpretation framework referred to as the inductive remote sensing embedding system based on knowledge graph constraints (IKG-RS). The IKG-RS system consists of the following main modules, which together form an end-to-end processing flow:

[0127] The data input module is used to acquire raw remote sensing image data; the raw remote sensing image data includes multispectral data or panchromatic band data.

[0128] The preprocessing module is used to perform data preprocessing on the original remote sensing image data; the data preprocessing includes radiometric correction and geometric correction.

[0129] The feature extraction module is used to input the preprocessed original remote sensing image data into the visual feature extraction module to extract visual feature representations with rich spatial-spectral information; the visual feature extraction module uses a Swin Transformer and a 3D convolutional network in parallel;

[0130] The dynamic fusion module is used to dynamically generate knowledge graph entity embeddings using an adaptive inductive graph attention network and a pre-constructed remote sensing domain knowledge graph, and to fuse the knowledge graph entity embeddings with the visual feature representations through a cross-modal alignment mechanism to obtain fused features; wherein, the attention coefficients of the adaptive inductive graph attention network are jointly modulated by spatial adjacency constraints, spectral similarity constraints and object hierarchy constraints;

[0131] The interpretation module is used to map the fused features to specific interpretation tasks.

[0132] It should be noted that the parameters of all modules in this system can be jointly optimized.
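The cross-modal alignment performed by the dynamic fusion module can be sketched as scaled dot-product attention, with visual features acting as queries and knowledge graph entity embeddings acting as keys and values. For brevity the learnable query, key, and value projections are taken to be identity matrices, which is an assumption, and all vectors are hypothetical.

```python
import math


def softmax(xs):
    peak = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(visual, entities):
    """For each visual feature, return an attention-weighted mix of entity embeddings."""
    d = len(entities[0])
    fused = []
    for q in visual:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in entities]
        weights = softmax(scores)
        fused.append([sum(w * v[i] for w, v in zip(weights, entities)) for i in range(d)])
    return fused


visual = [[1.0, 0.0], [0.0, 1.0]]    # two hypothetical visual feature vectors
entities = [[4.0, 0.0], [0.0, 4.0]]  # two hypothetical entity embeddings
fused = cross_modal_attention(visual, entities)
```

Each fused vector is dominated by the entity embedding that best matches the corresponding visual feature, which is the semantic reweighting effect the module relies on.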

[0133] For details on the specific implementation of each module in the remote sensing image interpretability interpretation system based on knowledge graph constraints, please refer to the above description of the remote sensing image interpretability interpretation method based on knowledge graph constraints, which will not be repeated here.

[0134] Example 3

[0135] This embodiment provides a computer device, including a memory and a processor. The memory stores a computer program, and the processor executes the computer program to implement the steps of a remote sensing image interpretability interpretation method based on knowledge graph constraints.

[0136] Example 4

[0137] This embodiment provides a computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements the steps of a remote sensing image interpretability interpretation method based on knowledge graph constraints.

[0138] The technical features of the above embodiments can be combined in any way (as long as there is no contradiction in the combination of these technical features). For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; combinations not explicitly described should also be considered to be within the scope of this specification.

Claims

1. A method for interpretability interpretation of remote sensing images based on knowledge graph constraints, characterized in that it comprises:
Step 1: Acquire raw remote sensing image data; the raw remote sensing image data includes multispectral data or panchromatic band data;
Step 2: Perform data preprocessing on the raw remote sensing image data; the data preprocessing includes radiometric correction and geometric correction;
Step 3: Input the preprocessed raw remote sensing image data into the visual feature extraction module to extract visual feature representations with rich spatial-spectral information; the visual feature extraction module adopts the Swin Transformer architecture;
Step 4: Dynamically generate knowledge graph entity embeddings using an adaptive inductive graph attention network and a pre-constructed remote sensing domain knowledge graph, and fuse the knowledge graph entity embeddings with the visual feature representation through a cross-modal alignment mechanism to obtain fused features; wherein the attention coefficients of the adaptive inductive graph attention network are jointly modulated by spatial adjacency constraints, spectral similarity constraints and object hierarchy constraints;
Step 5: Map the fused features to specific interpretation tasks.

2. The remote sensing image interpretability interpretation method based on knowledge graph constraints according to claim 1, characterized in that the knowledge graph entity embeddings are dynamically generated according to the following formulas:

$$h_i' = \sigma\Big(\sum_{j \in \mathcal{N}(i)} \alpha_{ij}\, W h_j\Big)$$

where $h_i'$ represents the embedding vector of the target entity $i$, $\sigma$ represents the LeakyReLU activation function, $j$ represents the index of a neighboring entity of the target entity, $i$ represents the target entity index, $\mathcal{N}(i)$ represents the neighborhood set of entity $i$, $\alpha_{ij}$ represents the attention coefficient of the adaptive inductive graph attention network, $W$ represents the learnable weight matrix, and $h_j$ represents the original embedding vector of neighboring entity $j$;

$$\alpha_{ij} = \frac{\exp\big(\mathrm{MLP}(h_j, c_{ij})\big)}{\sum_{k \in \mathcal{N}(i)} \exp\big(\mathrm{MLP}(h_k, c_{ik})\big)}$$

where $k$ ranges over the indices of all neighboring entities of the target entity $i$, MLP stands for multilayer perceptron, $h_k$ represents the original embedding vector of neighboring entity $k$, $c_{ij}$ represents the constraint encoding vector between the target entity $i$ and neighboring entity $j$, and $c_{ik}$ represents the constraint encoding vector between the target entity $i$ and neighboring entity $k$.

3. The remote sensing image interpretability interpretation method based on knowledge graph constraints according to claim 1, characterized in that the adaptive inductive graph attention network employs a multi-head attention mechanism, where each attention head focuses on a specific type of constraint relation, and a comprehensive entity representation is ultimately generated through weighted summation.

4. The remote sensing image interpretability interpretation method based on knowledge graph constraints according to claim 1, characterized in that the process of fusing the knowledge graph entity embeddings with the visual feature representation through a cross-modal alignment mechanism is expressed by the following formula:

$$F_{fused} = \mathrm{softmax}\left(\frac{(F_v W_Q)(E_{kg} W_K)^{T}}{\sqrt{d}}\right) E_{kg} W_V$$

where $F_{fused}$ represents the fused features, $F_v$ represents the visual feature representation, $E_{kg}$ represents the knowledge graph entity embeddings, $W_Q$, $W_K$ and $W_V$ represent the projection matrices of the query, key, and value, respectively, $d$ represents the dimension of the feature vectors, and $T$ represents the matrix transpose.

5. The remote sensing image interpretability interpretation method based on knowledge graph constraints according to claim 1, characterized in that it further comprises dynamically updating the node representations of the remote sensing domain knowledge graph based on image observation evidence.

6. The remote sensing image interpretability interpretation method based on knowledge graph constraints according to claim 5, characterized in that the dynamic updating of the node representations of the remote sensing domain knowledge graph specifically includes: calculating the observation confidence of an entity in the image, balancing the contribution weights of prior knowledge and new observations based on the confidence, and updating the entity embedding.

7. The remote sensing image interpretability interpretation method based on knowledge graph constraints according to claim 6, characterized in that the entity embedding is updated according to the following formula:

$$E_{kg}' = \lambda\, E_{kg} + (1 - \lambda)\, g(F_{fused})$$

where $E_{kg}'$ represents the updated, more accurate entity features, $\lambda$ represents the balance coefficient, $E_{kg}$ represents the entity features of the original knowledge graph, $F_{fused}$ represents the fused features, and $g(F_{fused})$ represents the evidence observed in the image.

8. A remote sensing image interpretability interpretation system based on knowledge graph constraints, characterized in that it comprises:
a data input module, used to acquire raw remote sensing image data; the raw remote sensing image data includes multispectral data or panchromatic band data;
a preprocessing module, used to perform data preprocessing on the raw remote sensing image data; the data preprocessing includes radiometric correction and geometric correction;
a feature extraction module, used to input the preprocessed raw remote sensing image data into the visual feature extraction module to extract visual feature representations with rich spatial-spectral information; the visual feature extraction module adopts the Swin Transformer architecture;
a dynamic fusion module, used to dynamically generate knowledge graph entity embeddings using an adaptive inductive graph attention network and a pre-constructed remote sensing domain knowledge graph, and to fuse the knowledge graph entity embeddings with the visual feature representations through a cross-modal alignment mechanism to obtain fused features; wherein the attention coefficients of the adaptive inductive graph attention network are jointly modulated by spatial adjacency constraints, spectral similarity constraints and object hierarchy constraints;
an interpretation module, used to map the fused features to specific interpretation tasks.

9. A computer device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that when the processor executes the computer program, it implements the steps of the method according to any one of claims 1 to 7.

10. A computer-readable storage medium having a computer program stored thereon, characterized in that when the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 7.