Knowledge graph completion method based on dynamic routing and dual-path reasoning

By employing dynamic routing and dual-path reasoning methods, the issues of modality alignment, reasoning efficiency, and robustness in multimodal knowledge graph completion are addressed, resulting in more efficient and accurate knowledge graph completion.

CN122287828A · Pending · Publication Date: 2026-06-26 · CHINA JILIANG UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
CHINA JILIANG UNIV
Filing Date
2026-05-26
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing multimodal knowledge graph completion methods have shortcomings in modality alignment, inference efficiency and accuracy, robustness and information fusion, making them difficult to apply effectively in real-world scenarios.

Method used

We adopt a method based on dynamic routing and dual-path reasoning, which improves the adaptability and reasoning effect of multimodal knowledge graph completion by homogeneity-heterogeneity decoupling encoding, cross-modal semantic alignment, query uncertainty quantification and dual-path reasoning fusion.

Benefits of technology

It improves the reasoning reliability and computational efficiency in multimodal knowledge graph completion tasks, enhances the adaptability to complex structural patterns and multimodal semantic differences, and achieves more stable and accurate prediction results.

✦ Generated by Eureka AI based on patent content.

Patent Text Reader

Abstract

A knowledge graph completion method based on dynamic routing and dual-path reasoning, belonging to the field of artificial intelligence and knowledge graph technology, includes the following steps: First, extract modality-specific feature representations from the structural data, entity association visual data, and text description data of the knowledge graph, and unify the representations of each modality to the same dimension; Second, use the topological structure representation as a semantic anchor point to guide the visual and text modalities to perform asymmetric alignment towards the anchor point, obtaining multimodal entity representations in a unified semantic space; Third, evaluate the structural determinism of the query based on the multimodal entity representations to obtain the query confidence, and dynamically allocate reasoning paths to the query based on the confidence; Fourth, predict the query through two complementary reasoning paths, dynamically fuse the prediction results of each path, and output the completion result. This invention improves the reasoning reliability and computational efficiency in multimodal knowledge graph completion tasks.

Description

Technical Field

[0001] This invention belongs to the field of artificial intelligence and knowledge graph technology, specifically relating to a knowledge graph completion method based on dynamic routing and dual-path reasoning.

Background Technology

[0002] Knowledge graphs, as structured semantic representations, accurately depict semantic relationships between entities through triples (entity-relationship-entity) or attribute pairs (entity-attribute-attribute value), serving as a core infrastructure supporting deep semantic understanding and intelligent reasoning in artificial intelligence systems. Knowledge graph completion, a crucial step in knowledge graph construction and application, aims to predict missing triples in the knowledge graph, compensating for incompleteness and improving the performance of downstream tasks. With the increasing scale of multimodal data such as images and text, traditional knowledge graph completion methods relying solely on graph topology are insufficient for practical needs. Multimodal knowledge graph completion has become a significant research direction in recent years, aiming to improve the accuracy and generalization ability of completion reasoning by integrating multi-source information. However, existing multimodal knowledge graph completion methods still face numerous technical challenges, hindering their practical application.

[0003] Existing knowledge graph completion methods still have many shortcomings. Some methods rely solely on the topological structure of the knowledge graph for reasoning and fail to use multimodal information such as vision and text, which limits their reasoning effectiveness. For example, TransE and its variants model entities and relationships through vector space mapping, making it difficult to capture the multimodal semantic features of entities. Methods based on graph neural networks, such as R-GCN and CompGCN, are susceptible to "over-smoothing" and "over-squashing" problems when extracting structural features through neighborhood aggregation. They struggle to capture long-distance dependencies and cannot leverage complementary information from unstructured modalities, resulting in poor performance in scenarios with low entity semantic discriminability.

Other multimodal completion methods suffer from adaptability defects. Some adopt a symmetric alignment strategy that treats all modalities equally without considering the differences in semantic stability between modalities. Visual features are easily affected by shooting angle and image noise, and text features are ambiguous, while topological structures are directly constrained by triple relationships and have higher semantic consistency. Symmetric alignment is therefore prone to aligning features that are semantically similar but structurally contradictory, reducing the reliability of inference. Others adopt a uniform fusion strategy without targeted adaptation mechanisms for the different structural roles of entities, such as homogeneity and heterogeneity. This makes it difficult to balance local semantic coherence and global functional complementarity, resulting in poor adaptation in complex scenarios such as N-to-N (many-to-many) relationships.

[0004] In summary, the core technical challenges currently facing the field of multimodal knowledge graph completion include: biases in modal alignment, with existing methods often employing symmetrical alignment or anchorless alignment strategies, failing to clearly define semantic priorities between modalities, resulting in underutilization of topological stability and susceptibility to visual and textual modal noise; an imbalance between inference efficiency and accuracy, with single-path inference strategies struggling to balance efficiency and accuracy, and pure structural methods being highly efficient but with limited accuracy, lacking a mechanism for dynamically allocating computational resources based on query complexity; insufficient robustness, with weak resistance to modal noise and a lack of differentiated processing mechanisms for entities with different structural roles, leading to poor adaptability; and poor fusion of structural and multimodal information, with simplistic fusion methods that fail to fully model deep interactions between modalities, hindering the synergistic effect of multi-source information. These issues constitute the core technical challenges that current multimodal knowledge graph completion technologies need to overcome in practical applications.

Summary of the Invention

[0005] To overcome the shortcomings of existing technologies, this invention provides a knowledge graph completion method based on dynamic routing and dual-path reasoning. This method improves the adaptability and reasoning performance of existing multimodal knowledge graph completion methods by performing multimodal feature extraction, cross-modal semantic alignment, query uncertainty quantification, and dual-path reasoning fusion in stages. The overall process of this invention comprises four parts: a perception layer, a fusion layer, a dynamic routing module, and a dual-path reasoning fusion module. The core idea is as follows: First, at the perception layer, a homogeneous-heterogeneous dual-branch decoupled encoding structure is constructed based on a graph neural network (GNN) to perform topological encoding of homogeneous-heterogeneous decoupling. Combined with visual and text feature encoding, multimodal-specific feature representations of knowledge graph entities are extracted. Then, at the fusion layer, the topological representation is used as a semantic anchor to achieve cross-modal asymmetric contrast alignment, reducing the interference of external modal noise on the structural semantic space and obtaining multimodal entity representations in a unified semantic space. Subsequently, dynamic routing decisions are completed by quantifying the structural determinism of queries, and corresponding inference paths are allocated for query tasks of different complexities. On this basis, a fast and slow dual-path collaborative inference system is constructed, and the prediction results are fused by combining the characteristics of different inference strategies, thereby improving the inference reliability and computational efficiency in the multimodal knowledge graph completion task.

[0006] The technical solution adopted by this invention to solve its technical problem is a knowledge graph completion method based on dynamic routing and dual-path reasoning, comprising the following steps:

Step 1: Multimodal entity feature perception and unified encoding: extract modality-specific feature representations from the structural data of the knowledge graph, the visual data associated with entities, and the text description data, and unify the representations of each modality to the same dimension;

Step 2: Cross-modal semantic alignment based on topological constraints: using the topological representation as a semantic anchor point, guide the visual modality and the text modality to align asymmetrically to the anchor point, obtaining multimodal entity representations in a unified semantic space;

Step 3: Dynamic routing based on query uncertainty awareness: evaluate the structural determinism of the query based on the multimodal entity representation to obtain the query confidence, and dynamically allocate inference paths to the query based on the confidence;

Step 4: Dual-path collaborative prediction and result fusion: predict the query through two complementary reasoning paths, dynamically fuse the prediction results of each path, and output the completion result.

[0007] Furthermore, the process of the first step is as follows:

Step 1.1 Topological encoding with homogeneity-heterogeneity decoupling: extract the neighborhood subgraph of the target entity, divide the edges in the subgraph into a homogeneous edge set and a heterogeneous edge set according to structural attributes, encode them using differentiated message aggregation mechanisms, and then adaptively weight and fuse the two encoding results through a gating unit to obtain the topological representation;

Step 1.2 Visual feature encoding: extract visual semantic features from the image data associated with the entity and, after dimensional mapping and normalization, generate a visual representation with the same dimension as the topological representation;

Step 1.3 Text feature encoding: extract global semantic information from the natural language description data of the entity and, after linear projection and normalization, generate a text representation with the same dimension as the topological representation.

[0008] Furthermore, the process of the second step is as follows:

Step 2.1 Topology anchor construction: use the topological representation obtained in Step 1 as the unique anchor point for cross-modal alignment;

Step 2.2 Asymmetric contrastive alignment: using the topological representation as the alignment benchmark, adaptively assign the alignment weights of the visual and textual modalities according to the topological roles of entities to achieve differentiated asymmetric alignment;

Step 2.3 Relation-aware similarity measurement: adopt a relation-aware similarity mechanism so that cross-modal alignment adapts to the semantic features of different relation types;

Step 2.4 Topology-aware weighted constraints: weight the alignment process based on the topological roles of entities to obtain multimodal entity representations in a unified semantic space.

[0009] Furthermore, the process of the third step is as follows:

Step 3.1 Query confidence calculation: based on the multimodal entity representation, make a preliminary prediction for the query to obtain the probability distribution over candidate entities; evaluate the structural determinism of the query from the dispersion of this probability distribution to obtain the normalized query confidence;

Step 3.2 Dynamic routing decision: compare the query confidence with a preset threshold and dynamically assign the corresponding inference path to the current query;

Step 3.3 Multimodal prefix adaptation: construct a multimodal prefix adaptation structure that maps topological, visual, and textual modal features into the input space of the large language model, forming multimodal virtual prefixes that the large language model can interpret and that guide subsequent semantic reasoning.
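Steps 3.1 and 3.2 can be illustrated with a minimal sketch. The text only says that confidence is derived from the dispersion of the candidate distribution and compared with a preset threshold; using normalized Shannon entropy as the dispersion measure, and the names `query_confidence` and `route`, are assumptions made here for illustration.

```python
import math


def softmax(scores):
    """Turn raw candidate scores into a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def query_confidence(probs):
    """Normalized confidence in [0, 1]: 1 minus entropy / log(n).

    Assumption: dispersion is measured by Shannon entropy. A peaked
    distribution gives a value near 1, a uniform one gives 0.
    """
    n = len(probs)
    if n < 2:
        return 1.0
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return 1.0 - entropy / math.log(n)


def route(probs, threshold):
    """Send confident queries to the fast path, the rest to the slow path."""
    return "fast" if query_confidence(probs) >= threshold else "slow"
```

With a threshold of 0.5, a distribution such as `[0.97, 0.01, 0.01, 0.01]` is routed to the fast path, while a uniform distribution over four candidates is routed to the slow path.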

[0010] Furthermore, the process of the fourth step is as follows:

Step 4.1 Fast-path inference: for queries routed to the fast path, use an efficient structured reasoning method that models the high-order interactions between entities and relations to complete fast link prediction;

Step 4.2 Slow-path inference: for queries routed to the slow path, perform deep inference based on multimodal prefix guidance and the semantic understanding capability of the large language model, using structured prompts and output constraints to ensure that the predicted entities lie within the knowledge graph;

Step 4.3 Dual-path collaborative reasoning: for queries that trigger collaborative reasoning, adaptively weight and fuse the prediction results obtained from the fast path and the slow path to obtain the final knowledge graph completion result.
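The output constraint of Step 4.2 and the fusion of Step 4.3 can be sketched as follows. The text does not specify the fusion rule; the convex combination with weight `alpha` (for example, the query confidence) and the function names are hypothetical choices for illustration only.

```python
def constrain_to_graph(slow_scores, graph_entities):
    """Drop slow-path candidates that are not entities of the knowledge graph."""
    return {e: s for e, s in slow_scores.items() if e in graph_entities}


def fuse_predictions(fast_scores, slow_scores, alpha):
    """Rank candidates by alpha * fast + (1 - alpha) * slow.

    Assumption: a candidate missing from one path contributes a score of 0
    for that path. Returns (entity, score) pairs, best first.
    """
    candidates = set(fast_scores) | set(slow_scores)
    fused = {
        e: alpha * fast_scores.get(e, 0.0) + (1.0 - alpha) * slow_scores.get(e, 0.0)
        for e in candidates
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)
```

A larger `alpha` trusts the structural fast path more; a smaller one defers to the language-model path, which matches the intent of weighting the two paths adaptively per query.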

[0011] The beneficial effects of this invention are mainly reflected in the following:

1. A topological encoding mechanism that decouples homogeneity and heterogeneity models the entity neighborhood subgraph in a differentiated way and, combined with visual and textual feature encoding, constructs a multimodal entity representation of unified dimension. At the same time, the topological representation is used as a semantic anchor point to guide the visual and textual modalities through asymmetric cross-modal alignment, which keeps noise from the external modalities from contaminating the structural semantic space and thereby improves the consistency and stability of the multimodal representation.

[0012] 2. By introducing a topology-aware weighted alignment strategy, the alignment weights of the visual and textual modalities are dynamically adjusted according to the homogeneity preference of entities, enabling different modal information to be differentiated and fused according to the entity's structural role, thereby enhancing the model's adaptability to complex structural patterns and multimodal semantic differences.

[0013] 3. By constructing an uncertainty-aware routing mechanism based on query confidence, inference paths are dynamically allocated to queries of varying complexity. Fast inference is used for simple queries with high structural determinism, while deep generative inference is employed for queries with complex structures or ambiguous semantics. This approach reduces unnecessary computational overhead and improves overall inference efficiency while ensuring inference accuracy.

[0014] 4. By constructing a dual-path collaborative prediction framework that combines structural reasoning and multimodal semantic reasoning, and by leveraging the efficient relation modeling capability of the structural model together with the semantic understanding capability of the large language model, deep collaboration between multimodal information and graph structure information is achieved. In complex knowledge graph completion scenarios with multiple relations and multiple semantic dependencies, this yields more stable and accurate prediction results.

Brief Description of the Drawings

[0015] Figure 1 is a diagram of the overall framework of the knowledge graph completion method based on dynamic routing and dual-path reasoning.

Figure 2 is a schematic diagram of the topology-guided cross-modal alignment process.

Figure 3 is a schematic diagram of the distribution of query routing decisions.

Figure 4 is a schematic diagram of the performance-efficiency trade-off curves under different gating thresholds.

Detailed Implementation

[0016] The present invention will now be further described with reference to the accompanying drawings.

[0017] Referring to Figures 1-4, a knowledge graph completion method based on dynamic routing and dual-path reasoning includes the following steps:

Step 1: Multimodal entity feature perception and unified encoding: extract modality-specific feature representations from the structural data of the knowledge graph, the visual data associated with entities, and the text description data, and unify the representations of each modality to the same dimension.

In this embodiment, high-quality modality-specific feature representations are extracted from the structural data, the visual data associated with entities, and the textual description data of the knowledge graph, respectively, and the final embedding dimension of all representations is uniformly set to 1. This provides a dimensionally consistent and feature-robust multimodal embedding foundation for subsequent cross-modal alignment and inference.

The first step of this embodiment is as follows:

Step 1.1 Topological encoding with homogeneity-heterogeneity decoupling: taking target entities and their neighborhood structures in the knowledge graph as the object, a topological representation that is both structurally discriminative and multi-scale is extracted through a three-stage process of subgraph partitioning, differentiated aggregation, and gated fusion. The edge set of the entity neighborhood subgraph is classified according to structural attributes, differentiated message aggregation mechanisms model the two types of subgraphs, and the two encoding results are then adaptively weighted and fused through a learnable gating unit to generate a stable and discriminative topological representation.

[0018] The processing procedure for step 1.1 is as follows:

Step 1.1.1 Neighborhood subgraph extraction: The knowledge graph stores facts as triples $(h, r, t)$. For a target head entity $h$ to be encoded, extract its $k$-hop neighborhood subgraph $\mathcal{G}_h^{k} = (\mathcal{V}_h^{k}, \mathcal{E}_h^{k})$, where "$k$-hop" means expanding outward from the head entity $h$ to its directly connected first-order neighbors, its indirectly connected second-order neighbors, and so on, up to the $k$-th-order neighbors. $k$ is a preset hyperparameter that can be adapted to the topological sparsity of different knowledge graphs so that the local topological context of the entity is covered. $\mathcal{V}_h^{k}$ is the set of all entities within the $k$-hop neighborhood, and $\mathcal{E}_h^{k}$ is the set of edges connecting the entities within this range, so that subsequent encoding can fully capture the structural relationship information of the entity.

Step 1.1.2 Edge set partitioning and subgraph construction: To distinguish homogeneous from heterogeneous relationships between entities, a composite similarity metric $s(e_i, e_j)$ is designed that takes into account both the structural similarity and the semantic consistency of the entities. $\mathbf{x}_i$ and $\mathbf{x}_j$ denote the comprehensive feature vectors of entities $e_i$ and $e_j$, respectively; they integrate the topological structure information and semantic association information of the entities and serve as the basis for the subsequent similarity measurement.

[0019] Structural similarity is quantified by the degree of overlap between entity neighbor sets using Jaccard similarity, a commonly used measure of the overlap between two sets. For entities $e_i$ and $e_j$ with neighbor sets $\mathcal{N}(e_i)$ and $\mathcal{N}(e_j)$, the ratio of the size of the intersection to the size of the union of the neighbor sets describes their proximity in the topological structure:

$$J(e_i, e_j) = \frac{|\mathcal{N}(e_i) \cap \mathcal{N}(e_j)|}{|\mathcal{N}(e_i) \cup \mathcal{N}(e_j)|}$$

[0020] Semantic consistency is measured by the difference between the probability distributions with which entities participate in the various relations. First, based on the observed triple set $\mathcal{T}$ of the knowledge graph, the probability $P_i(r)$ (or $P_j(r)$) that entity $e_i$ (or $e_j$) participates in relation $r$ is computed by Laplace-smoothed frequency normalization, a common smoothing technique that avoids zero probabilities. Specifically, 1 is added to the numerator and the total number of relation types in the knowledge graph is added to the denominator, so that a reasonable probability distribution is still obtained when relations are sparse:

$$P_i(r) = \frac{n_i(r) + 1}{n_i + |\mathcal{R}|}$$

where $n_i(r)$ is the number of triples in which entity $e_i$ participates in relation $r$, $n_i$ is the total number of triples in which entity $e_i$ participates over all relations, and $|\mathcal{R}|$ is the total number of relation types in the knowledge graph. Then the Manhattan distance between the two probability distributions over all relation types, that is, the sum of the absolute differences of the probability values under each relation, quantifies the semantic difference between the entities. Finally, this difference is mapped to the [0,1] interval by an exponential function to give the semantic consistency measure.

[0021] Structural similarity and semantic consistency are combined by multiplication, so that two entities are judged to be in a homogeneous relationship only when both are high. The calculation formula is as follows:

$$s(e_i, e_j) = \frac{|\mathcal{N}(e_i) \cap \mathcal{N}(e_j)|}{|\mathcal{N}(e_i) \cup \mathcal{N}(e_j)|} \cdot \exp\Big(-\sum_{r \in \mathcal{R}} \big|P_i(r) - P_j(r)\big|\Big)$$

where $\mathcal{N}(e_i)$ and $\mathcal{N}(e_j)$ are the neighbor sets of entities $e_i$ and $e_j$, and $P_i(r)$ and $P_j(r)$ are the probability distributions with which entities $e_i$ and $e_j$ participate in relation $r$.

[0022] Using a preset structural similarity threshold $\delta$, the edge set $\mathcal{E}_h^{k}$ is divided into a homogeneous edge set $\mathcal{E}_{\mathrm{homo}}$ and a heterogeneous edge set $\mathcal{E}_{\mathrm{hete}}$: edges satisfying $s(e_i, e_j) \ge \delta$ are placed in $\mathcal{E}_{\mathrm{homo}}$, forming the homogeneous subgraph $\mathcal{G}_{\mathrm{homo}}$, and the remaining edges are placed in $\mathcal{E}_{\mathrm{hete}}$, forming the heterogeneous subgraph $\mathcal{G}_{\mathrm{hete}}$.
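The edge partitioning of Step 1.1.2 can be sketched in plain Python. Several details are assumptions made for illustration: an entity "participates" in a triple when it is the head or the tail, the exponential mapping is `exp(-distance)` with no scale factor, and an edge at exactly the threshold counts as homogeneous. The helper names are hypothetical.

```python
import math
from collections import Counter


def jaccard(neigh_a, neigh_b):
    """Structural similarity: overlap of two neighbor sets."""
    union = neigh_a | neigh_b
    return len(neigh_a & neigh_b) / len(union) if union else 0.0


def relation_distribution(entity, triples, relations):
    """Laplace-smoothed P(r | entity) = (count + 1) / (total + |R|)."""
    counts = Counter(r for h, r, t in triples if entity in (h, t))
    total = sum(counts.values())
    return {r: (counts[r] + 1) / (total + len(relations)) for r in relations}


def semantic_consistency(p, q):
    """Map the Manhattan distance between two distributions into (0, 1]."""
    distance = sum(abs(p[r] - q[r]) for r in p)
    return math.exp(-distance)


def composite_similarity(a, b, triples, neighbors, relations):
    """Product of structural similarity and semantic consistency."""
    return jaccard(neighbors[a], neighbors[b]) * semantic_consistency(
        relation_distribution(a, triples, relations),
        relation_distribution(b, triples, relations),
    )


def partition_edges(triples, neighbors, relations, threshold):
    """Split triples into (homogeneous, heterogeneous) edge lists."""
    homo, hete = [], []
    for h, r, t in triples:
        score = composite_similarity(h, t, triples, neighbors, relations)
        (homo if score >= threshold else hete).append((h, r, t))
    return homo, hete
```

Because the two factors are multiplied, an edge between entities that share no neighbors is always heterogeneous, however similar their relation profiles are.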

[0023] Step 1.1.3 Differentiated message passing: To address the structural differences between homogeneous and heterogeneous subgraphs, different message passing mechanisms are designed for semantically coherent homogeneous associations and functionally complementary heterogeneous associations, so that the features of both structural patterns are captured. The process is as follows:

Step 1.1.3.1 Standard aggregation mechanism for the homogeneous subgraph: The entities in the homogeneous subgraph $\mathcal{G}_{\mathrm{homo}}$ are semantically coherent and have similar features, so a standard aggregation mechanism is used to strengthen local semantic coherence. First, the relation embedding $\mathbf{r}_{ht}$ between the target entity $h$ and a neighboring entity $t$ is mapped by a relation-specific transformation $\phi(\cdot)$, implemented by a learnable weight matrix $\mathbf{W}_{\phi}$. The transformed relation is then multiplied element-wise (Hadamard product $\circ$) with the $(l-1)$-th layer embedding $\mathbf{e}_t^{(l-1)}$ of the neighboring entity, that is, each element of the relation transformation result is multiplied by the corresponding element of the neighbor embedding. The fused features of all neighbors are integrated by an aggregation function AGG, which is either mean aggregation or attention aggregation, chosen on the validation set. The aggregated features undergo a linear transformation by a learnable weight matrix $\mathbf{W}_{\mathrm{homo}}$ and are passed through the activation function to obtain the $l$-th layer message $\mathbf{m}_{h,\mathrm{homo}}^{(l)}$, where $\mathrm{ReLU}(\cdot)$ denotes the rectified linear unit activation function and $\cdot$ is a placeholder for the function input. Finally, the message is combined with the $(l-1)$-th layer embedding of the target entity through a residual connection, and layer normalization (LayerNorm) outputs the $l$-th layer embedding of the homogeneous subgraph. The calculation method is as follows:

$$\mathbf{m}_{h,\mathrm{homo}}^{(l)} = \mathrm{ReLU}\Big(\mathbf{W}_{\mathrm{homo}} \cdot \mathrm{AGG}_{t \in \mathcal{N}_{\mathrm{homo}}(h)}\big(\phi(\mathbf{r}_{ht}) \circ \mathbf{e}_t^{(l-1)}\big)\Big)$$

$$\mathbf{e}_{h,\mathrm{homo}}^{(l)} = \mathrm{LayerNorm}\big(\mathbf{e}_{h,\mathrm{homo}}^{(l-1)} + \mathbf{m}_{h,\mathrm{homo}}^{(l)}\big)$$

where $\mathcal{N}_{\mathrm{homo}}(h)$ is the set of neighbors of the target entity $h$ in the homogeneous subgraph.

[0024] Step 1.1.3.2 Deviation aggregation mechanism for the heterogeneous subgraph: The entities in the heterogeneous subgraph $\mathcal{G}_{\mathrm{hete}}$ have complementary functions and significantly different features, so a deviation aggregation mechanism is used to model the functional differences between entities explicitly. First, the feature deviation between the $(l-1)$-th layer embedding $\mathbf{e}_t^{(l-1)}$ of the neighboring entity and the $(l-1)$-th layer embedding $\mathbf{e}_h^{(l-1)}$ of the target entity is computed. The deviation is then fused by Hadamard product with the relation-specific transformation $\phi(\mathbf{r}_{ht})$, whose parameters are shared with the homogeneous subgraph. After integration by the aggregation function AGG, a linear transformation by the learnable weight matrix $\mathbf{W}_{\mathrm{hete}}$ dedicated to heterogeneous aggregation and the ReLU activation give the $l$-th layer message $\mathbf{m}_{h,\mathrm{hete}}^{(l)}$. Finally, a residual connection with the $(l-1)$-th layer embedding of the target entity followed by normalization outputs the $l$-th layer embedding of the heterogeneous subgraph. The calculation method is as follows:

$$\mathbf{m}_{h,\mathrm{hete}}^{(l)} = \mathrm{ReLU}\Big(\mathbf{W}_{\mathrm{hete}} \cdot \mathrm{AGG}_{t \in \mathcal{N}_{\mathrm{hete}}(h)}\big(\phi(\mathbf{r}_{ht}) \circ (\mathbf{e}_t^{(l-1)} - \mathbf{e}_h^{(l-1)})\big)\Big)$$

$$\mathbf{e}_{h,\mathrm{hete}}^{(l)} = \mathrm{LayerNorm}\big(\mathbf{e}_{h,\mathrm{hete}}^{(l-1)} + \mathbf{m}_{h,\mathrm{hete}}^{(l)}\big)$$

where $\mathcal{N}_{\mathrm{hete}}(h)$ is the set of neighbors of the target entity $h$ in the heterogeneous subgraph.

Step 1.1.4 Gated adaptive fusion: To balance the encoding contributions of the homogeneous and heterogeneous subgraphs dynamically, a learnable sigmoid gate unit is designed. Here $L$ is the total number of aggregation layers used for subgraph encoding, and $l$ is the index of the current aggregation layer used in the per-layer aggregation calculations above, ranging from 1 to $L$. After $L$ layers of aggregation, the final encodings of the two subgraphs, $\mathbf{e}_{h,\mathrm{homo}}^{(L)}$ and $\mathbf{e}_{h,\mathrm{hete}}^{(L)}$, are obtained. The two vectors are concatenated, transformed linearly by the gating weight matrix $\mathbf{W}_g$, shifted by the bias term $\mathbf{b}_g$, and passed through the sigmoid activation function $\sigma$ to produce the gating coefficient $\mathbf{g}_h$. This coefficient adaptively adjusts the weight ratio of the two encodings, and a stable topological representation is obtained by weighted fusion using the Hadamard product $\circ$. The calculation method is as follows:

$$\mathbf{g}_h = \sigma\big(\mathbf{W}_g\,[\mathbf{e}_{h,\mathrm{homo}}^{(L)} \,\|\, \mathbf{e}_{h,\mathrm{hete}}^{(L)}] + \mathbf{b}_g\big)$$

$$\mathbf{e}_h^{\mathrm{topo}} = \mathbf{g}_h \circ \mathbf{e}_{h,\mathrm{homo}}^{(L)} + (1 - \mathbf{g}_h) \circ \mathbf{e}_{h,\mathrm{hete}}^{(L)}$$

where the value range of $\mathbf{g}_h$ is $(0, 1)$; the closer it is to 1, the more pronounced the homogeneity of the entity.
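The following sketch shows one layer of differentiated message passing and the gated fusion of Step 1.1.4 using plain Python lists. It is a simplification under stated assumptions: AGG is mean aggregation, the relation vectors passed in are taken to be already transformed, LayerNorm has no learnable scale or shift, and all function names are hypothetical.

```python
import math


def matvec(matrix, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no affine params)."""
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    return [(xi - mean) / math.sqrt(var + eps) for xi in x]


def mean_agg(vectors, dim):
    """Mean aggregation; an empty neighborhood yields a zero vector."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def aggregate(h_emb, neighbors, weight, deviation):
    """One message-passing layer for the target entity.

    neighbors: list of (transformed_relation_vector, neighbor_embedding).
    deviation=False is the standard (homogeneous) mechanism;
    deviation=True aggregates neighbor-minus-target differences instead.
    """
    fused = []
    for rel, t_emb in neighbors:
        src = [t - h for t, h in zip(t_emb, h_emb)] if deviation else t_emb
        fused.append([r * s for r, s in zip(rel, src)])  # Hadamard product
    message = [max(0.0, z) for z in matvec(weight, mean_agg(fused, len(h_emb)))]
    return layer_norm([h + m for h, m in zip(h_emb, message)])  # residual + norm


def gated_fusion(e_homo, e_hete, w_gate, b_gate):
    """Sigmoid gate over the concatenated encodings, then a convex mix."""
    logits = matvec(w_gate, e_homo + e_hete)  # list concatenation
    gate = [1.0 / (1.0 + math.exp(-(z + b))) for z, b in zip(logits, b_gate)]
    fused = [g * a + (1.0 - g) * b for g, a, b in zip(gate, e_homo, e_hete)]
    return fused, gate
```

Note that when a heterogeneous neighbor has the same embedding as the target, the deviation is zero and the layer reduces to normalizing the target embedding, which is the intended behavior of modeling differences rather than similarities.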

[0025] Step 1.2 Visual feature encoding: Robust visual semantic features are extracted from the image data associated with the entity. After dimensional mapping and normalization, a visual representation with the same dimension as the topological representation is generated, providing visual semantic information for multimodal fusion. To extract robust visual features from entity-related images, a VGG-16 model pre-trained on the ImageNet dataset is used as the encoder. The original visual features are first extracted from the last convolutional layer for the image and then pooled to obtain $\mathbf{v}_h^{\mathrm{raw}}$. This vector is mapped to the target dimension by a two-layer MLP projection network with Gaussian Error Linear Unit (GELU) activation, whose first layer has weight matrix $\mathbf{W}_1$ and bias term $\mathbf{b}_1$ and whose second layer has weight matrix $\mathbf{W}_2$ and bias term $\mathbf{b}_2$. Finally, LayerNorm normalization removes feature distribution differences, giving the visual representation $\mathbf{e}_h^{\mathrm{vis}}$. The calculation method is as follows:

$$\mathbf{e}_h^{\mathrm{vis}} = \mathrm{LayerNorm}\big(\mathbf{W}_2 \cdot \mathrm{GELU}(\mathbf{W}_1 \mathbf{v}_h^{\mathrm{raw}} + \mathbf{b}_1) + \mathbf{b}_2\big)$$

Step 1.3 Text feature encoding: Global semantic information is captured from the natural language description data of the entity and, after linear projection and normalization, a text representation with the same dimension as the topological and visual representations is generated, providing textual semantic information for multimodal fusion. To capture the global semantic information of entity text descriptions, a pre-trained BERT-Base model is used as the encoder. After the entity text description is input into the model, the global context embedding $\mathbf{c}_h^{\mathrm{CLS}}$ of the [CLS] token is extracted and mapped to the target dimension by a linear projection layer with weight matrix $\mathbf{W}_t$ and bias term $\mathbf{b}_t$. After layer normalization, the text representation $\mathbf{e}_h^{\mathrm{txt}}$ is obtained. The calculation method is as follows:

$$\mathbf{e}_h^{\mathrm{txt}} = \mathrm{LayerNorm}\big(\mathbf{W}_t \mathbf{c}_h^{\mathrm{CLS}} + \mathbf{b}_t\big)$$
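The two projection heads of Steps 1.2 and 1.3 can be sketched without any deep learning framework. The pooled VGG-16 feature and the BERT [CLS] embedding are taken as given input vectors here; the function names are hypothetical, GELU is the exact erf form, and LayerNorm omits learnable scale and shift for brevity.

```python
import math


def gelu(x):
    """Exact GELU: x times the standard normal CDF at x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(weight, bias, x):
    """Affine map: weight (list of rows) times x, plus bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    return [(xi - mean) / math.sqrt(var + eps) for xi in x]


def project_visual(v_raw, w1, b1, w2, b2):
    """Two-layer MLP with GELU, then LayerNorm (visual branch)."""
    hidden = [gelu(z) for z in linear(w1, b1, v_raw)]
    return layer_norm(linear(w2, b2, hidden))


def project_text(c_cls, w_t, b_t):
    """Single linear projection, then LayerNorm (text branch)."""
    return layer_norm(linear(w_t, b_t, c_cls))
```

Both heads end in the same normalization and the same output width, which is what lets the later alignment step compare the visual, textual, and topological vectors directly.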

[0026] The second step is cross-modal semantic representation alignment based on topological constraints: using the topological representation as a semantic anchor point, the visual modality and the text modality are guided to align asymmetrically to the anchor point, obtaining multimodal entity representations in a unified semantic space. In this embodiment, the topological representation serves as a stable anchor point for establishing a robust cross-modal semantic space, so that multimodal information serves structural reasoning and interference from external modal noise is avoided.

[0027] The second step of this embodiment is as follows:

Step 2.1 Topology anchor point construction: The topological representation $\mathbf{e}^{\mathrm{topo}}$ obtained in Step 1 is established as the sole anchor point for cross-modal alignment. This anchor is derived directly from the triple facts of the graph. Through homogeneity-heterogeneity decoupled encoding and gated fusion, it combines semantic stability, structural discriminativeness, and multi-scale expressive ability, and provides a reliable and consistent semantic benchmark for cross-modal alignment.

Step 2.2 Asymmetric contrastive alignment: Based on the homogeneity-heterogeneity decoupled topological encoding results obtained in Step 1.1, the topological roles of entities are analyzed quantitatively. The proportions of the homogeneous and heterogeneous subgraphs in the entity's total neighborhood subgraph are computed to obtain the entity role ratios, which reflect the entity's structural preference for homogeneous or heterogeneous associations in the knowledge graph. From these role ratios, entity-specific role weights are generated to build a topology-aware weighted alignment strategy, in which the alignment weights of the visual and textual modalities are adjusted dynamically to achieve differentiated modality fusion. To keep noise from the visual and textual modalities from contaminating the structural semantic space, an asymmetric contrastive alignment strategy is adopted: only the visual and textual modalities are guided to align unidirectionally towards the topological anchor, and a set of modality pairs (topology-visual and topology-text) is defined. A contrastive learning framework based on the InfoNCE loss quantifies the alignment error between each modality pair. A temperature coefficient controls the discrimination between positive and negative samples, amplifying the similarity of positive samples and suppressing interference from negative samples. The calculation method is as follows:

$$\mathcal{L}_{\mathrm{NCE}}(\mathbf{a}, \mathbf{m}) = -\log \frac{\exp\big(\mathrm{sim}_r(\mathbf{a}, \mathbf{m}) / \tau\big)}{\exp\big(\mathrm{sim}_r(\mathbf{a}, \mathbf{m}) / \tau\big) + \sum_{\mathbf{m}^{-} \in \mathcal{N}^{-}} \exp\big(\mathrm{sim}_r(\mathbf{a}, \mathbf{m}^{-}) / \tau\big)}$$

where $\mathbf{a}$ is fixed as the topological representation $\mathbf{e}^{\mathrm{topo}}$, $\mathbf{m}$ is the visual representation $\mathbf{e}^{\mathrm{vis}}$ or the text representation $\mathbf{e}^{\mathrm{txt}}$, $\tau$ is the temperature coefficient, $\mathcal{N}^{-}$ is the negative sample set, and $\mathrm{sim}_r$ is the relation-aware similarity measure. During alignment, the parameters of the topological branch are kept fixed, and only the parameters of the visual and textual branches are optimized to minimize the loss.
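A minimal sketch of the InfoNCE term for one anchor follows. Plain cosine similarity stands in for the relation-aware similarity, and the names are hypothetical. The asymmetry of the method (a frozen anchor) is not visible in a forward computation like this one; in an autograd framework it would correspond to treating the anchor as a constant so that no gradient flows into the topological branch.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def info_nce(anchor, positive, negatives, temperature, sim=cosine):
    """InfoNCE loss of one (anchor, positive) pair against negatives.

    anchor: topological representation (treated as a fixed target).
    positive: visual or text representation of the same entity.
    negatives: representations of other entities in the same modality.
    """
    pos = math.exp(sim(anchor, positive) / temperature)
    neg = sum(math.exp(sim(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))
```

Lowering the temperature sharpens the contrast: the same gap in similarity between the positive and a negative produces a smaller loss for a well-aligned pair and a larger one for a misaligned pair.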

[0028] Step 2.3 Relationship-Aware Similarity Measurement: Different relation semantics have different requirements for entity features. To adapt the alignment process to relation characteristics, a relation-specific learnable matrix is ​​introduced. The entity embeddings are linearly transformed using this matrix, and the similarity of the transformed embeddings is then calculated to achieve a relationship-aware similarity measure. This allows the model to dynamically adjust the importance of feature dimensions based on the relationship type. The calculation method is as follows: ; Step 2.4 Topology-Aware Weighted Loss Optimization: The topological role of an entity directly determines its dependence on visual and textual modalities. Homogeneity dominates the local structural semantic coherence of an entity, and visual modal features are more likely to complement it; heterogeneity dominates the significant global functional association of an entity, and textual modality can more accurately express its abstract semantic association. Therefore, it is necessary to dynamically adjust the loss weights for cross-modal alignment based on the entity's topological role to make the alignment process more adaptable to the entity's structural characteristics. First, calculate the entity's topological role. Homogeneity preference ratio This ratio is achieved through the entity Homogeneous subgraph encoding embedding With heterogeneous subgraph encoding embedding The modulus ratio is obtained, that is Used to quantify entities The proportion of homogeneous characteristics is such that the closer the ratio is to 1, the more homogeneous the entity tends to be, while the closer it is to 0, the more heterogeneous the entity tends to be.

[0029] Based on this ratio, modal alignment weights are assigned. Specifically, for homogeneity-dominated entities, the visual alignment weight is set so as to strengthen the alignment between the visual and topological modalities; for heterogeneity-dominated entities, the text alignment weight is set so as to strengthen the alignment between the text and topological modalities. The weights involve a balance coefficient that satisfies a stated constraint and is determined through validation-set optimization; it coordinates the overall contribution ratio of the visual and text modalities. For each training batch $\mathcal{B}$, the modal alignment losses of all entities in the batch are weighted and summed, then divided by the total number of entities in the batch, to obtain the overall alignment loss. The specific calculation method is as follows: $\mathcal{L}_{\mathrm{total}} = \frac{1}{|\mathcal{B}|} \sum_{e \in \mathcal{B}} \left[ w_e^{v}\, \mathcal{L}(h_e^{s}, h_e^{v}) + w_e^{t}\, \mathcal{L}(h_e^{s}, h_e^{t}) \right]$; where $w_e^{v}$ and $w_e^{t}$ are the visual and text alignment weights of entity $e$, and $h_e^{s}$, $h_e^{v}$, and $h_e^{t}$ are the topological, visual, and text representations of entity $e$.
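The batch-level weighting described above can be sketched as follows. The specific weight formulas used here (visual weight proportional to the homogeneity preference ratio, text weight proportional to its complement, both scaled by a balance coefficient) are a hypothetical instantiation, since the exact expressions are not reproduced in this text; the per-entity loss values are toy numbers.

```python
def alignment_weights(ratio, balance=0.5):
    """Hypothetical weights: the visual weight grows with the homogeneity
    preference ratio, the text weight with its complement."""
    visual_weight = balance * ratio
    text_weight = (1.0 - balance) * (1.0 - ratio)
    return visual_weight, text_weight

def batch_alignment_loss(ratios, visual_losses, text_losses, balance=0.5):
    """Weighted sum of per-entity alignment losses, divided by batch size."""
    total = 0.0
    for ratio, loss_v, loss_t in zip(ratios, visual_losses, text_losses):
        w_v, w_t = alignment_weights(ratio, balance)
        total += w_v * loss_v + w_t * loss_t
    return total / len(ratios)

# Toy batch: one purely homogeneous entity and one purely heterogeneous entity.
overall = batch_alignment_loss(
    ratios=[1.0, 0.0],
    visual_losses=[2.0, 4.0],
    text_losses=[6.0, 8.0],
)
```

With these toy values only the visual loss of the first entity and the text loss of the second entity contribute, giving (0.5 * 2.0 + 0.5 * 8.0) / 2 = 2.5.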

[0030] Cross-modal semantic alignment training is completed by minimizing the overall alignment loss, which yields multimodal entity representations in a unified semantic space. This aligned multimodal fusion representation serves as the basic feature for subsequent inference. However, the contribution of each modality varies across actual query scenarios, so the reliability of each modality's information still needs to be assessed and the modalities adaptively fused.

[0031] The third step is dynamic reasoning routing based on query uncertainty awareness: the structural determinism of the query is evaluated based on the multimodal entity representation to obtain the query confidence, and a reasoning path is dynamically allocated to the query based on the confidence. In this embodiment, after completing cross-modal semantic alignment, in order to solve the problem of the difference in reliability of information from different modalities, this step dynamically adjusts the participation level of multimodal information by quantifying query uncertainty, and achieves adaptive fusion through a gating mechanism, thereby reducing computational complexity while ensuring inference accuracy.

[0032] The third step in this embodiment is as follows: Step 3.1 Query Confidence Calculation: To evaluate the structural determinism of the query, the query triple is processed with the Tucker tensor fusion method on the basis of the multimodal entity representations. This method models the high-order interaction features of the head entity and the relation through a core tensor, which enables rapid probabilistic prediction over candidate tail entities and yields the initial predicted probability distribution $p(t)$ of each candidate tail entity $t$. The entropy of this distribution is then calculated as $H = -\sum_{t \in \mathcal{E}} p(t) \ln p(t)$, where $\mathcal{E}$ is the entity set of the knowledge graph and $t$ is any candidate tail entity in it. The lower the entropy, the more concentrated the prediction and the higher the confidence. The entropy is normalized by dividing it by the natural logarithm of the total number of entities $|\mathcal{E}|$, and the query confidence is obtained by subtracting the normalized entropy from 1, so that the query confidence lies within the range [0,1]. The calculation formula is as follows: $c = 1 - \frac{H}{\ln |\mathcal{E}|}$. Step 3.2 Dynamic Routing Decision: A gating threshold and a buffer are set. Based on the query confidence calculated in Step 3.1, an inference path is dynamically allocated to the query triple, and a different inference strategy is applied on each path. The specific routing rules are as follows: if the confidence lies above the buffer zone around the gating threshold, the query structure is clear and the prediction certainty is high, so the query is routed to the fast path in Step 4.1 for efficient reasoning; if the confidence lies below the buffer zone, the query is structurally and semantically ambiguous, so it is routed to the slow path in Step 4.2 for deep generative reasoning; if the confidence falls within the buffer zone, the uncertainty of the query is in the boundary range, which triggers the dual-path collaborative reasoning in Step 4.3 and combines the advantages of both paths to improve the reliability of the prediction.
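A minimal sketch of Steps 3.1 and 3.2 is given below for illustration. The confidence is one minus the entropy normalized by the natural logarithm of the number of entities, as described above. The threshold value 0.6, the buffer value 0.1, the symmetric buffer zone, and the strict inequalities at its edges are assumptions, as are the function names and the toy distributions.

```python
import math

def query_confidence(probabilities):
    """1 minus the entropy normalized by ln(number of entities).

    `probabilities` is the base predictor's distribution over all entities
    (at least two entries, summing to 1).
    """
    entropy = -sum(p * math.log(p) for p in probabilities if p > 0.0)
    return 1.0 - entropy / math.log(len(probabilities))

def route(confidence, threshold=0.6, buffer=0.1):
    """Three-way routing around an assumed symmetric buffer zone."""
    if confidence > threshold + buffer:
        return "fast"   # clear structure: Tucker-based fast path
    if confidence < threshold - buffer:
        return "slow"   # ambiguous query: LLM-based slow path
    return "dual"       # boundary case: fuse both paths

peaked = [1.0, 0.0, 0.0, 0.0]         # all mass on one entity
uniform = [0.25, 0.25, 0.25, 0.25]    # maximally uncertain
boundary = [0.85, 0.05, 0.05, 0.05]   # fairly but not fully concentrated

decisions = [route(query_confidence(p)) for p in (peaked, uniform, boundary)]
```

The one-hot distribution has confidence 1 and goes to the fast path, the uniform distribution has confidence 0 and goes to the slow path, and the intermediate distribution (confidence of roughly 0.58) falls inside the buffer zone.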

[0033] Step 3.3 Multimodal Prefix Adaptation: Multimodal prefix adaptation is performed by the Relation-Aware Alignment Layer (RAAL). For complex queries routed to the slow path in Step 3.2, the multimodal representations must be converted into an input format that the Large Language Model (LLM) can understand. To this end, a multimodal prefix adapter is designed. It contains three independent modality-specific projection networks, which map the topological, visual, and text representations, respectively, to the token embedding space of the LLM. Each projection network adapts the feature dimension through two linear transformations with a GELU activation function between them to enhance feature expressiveness, and layer normalization is applied at the end to stabilize the feature distribution, so that the transformed multimodal features fit the LLM input space. The calculation method is as follows: $p^{m} = \mathrm{LayerNorm}\left(W_2^{m}\, \mathrm{GELU}(W_1^{m} h^{m} + b_1^{m}) + b_2^{m}\right)$ for each modality $m$; where $W_1^{m}$ and $W_2^{m}$ are the weight matrices and $b_1^{m}$ and $b_2^{m}$ are the bias terms. After mapping, the outputs are concatenated into a sequence of continuous virtual prefix tokens, which serves as the prompt prefix and guides the LLM to focus on the multimodal context relevant to the query.
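The projection networks of the prefix adapter can be illustrated with the following dependency-free sketch: linear layer, GELU, second linear layer, then layer normalization without learnable scale and shift. The toy dimensions (entity dimension 2, hidden and token dimension 3), the weight values, the modality keys, and the ordering of the prefix tokens are all assumptions made for the example.

```python
import math

def linear(weights, bias, x):
    """Affine map: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def gelu(x):
    """Exact GELU: x times the standard normal CDF at x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def layer_norm(x, eps=1e-5):
    """Layer normalization without learnable scale/shift (a simplification)."""
    mean = sum(x) / len(x)
    variance = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(variance + eps) for v in x]

def project(x, w1, b1, w2, b2):
    """Linear -> GELU -> Linear -> LayerNorm."""
    hidden = [gelu(v) for v in linear(w1, b1, x)]
    return layer_norm(linear(w2, b2, hidden))

def build_prefix(representations, adapters, order=("topology", "visual", "text")):
    """One virtual prefix token per modality, in an assumed fixed order."""
    return [project(representations[m], *adapters[m]) for m in order]

# Hypothetical toy parameters; each modality has its own bias terms.
w1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w2 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
adapters = {
    "topology": (w1, [0.0, 0.0, 0.0], w2, [0.1, 0.0, -0.1]),
    "visual": (w1, [0.5, 0.0, 0.0], w2, [0.0, 0.0, 0.0]),
    "text": (w1, [0.0, 0.5, 0.0], w2, [0.0, 0.2, 0.0]),
}
representations = {"topology": [1.0, -1.0], "visual": [0.5, 2.0], "text": [-1.0, 0.3]}

prefix = build_prefix(representations, adapters)
```

Each resulting token has zero mean across its components because of the final normalization, which is the stabilizing effect the description attributes to the layer normalization step.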

[0034] Step 4: Dual-path collaborative prediction and result fusion: The query is predicted through two complementary inference paths, and the prediction results of each path are dynamically fused to output the completed result. In this embodiment, the dynamic fusion of two complementary reasoning strategies balances reasoning efficiency and accuracy, providing the optimal reasoning solution for queries of varying complexity.

[0035] The fourth step in this embodiment is as follows: Step 4.1 Fast Path Inference: For the query triples routed to the fast path in Step 3.2, an efficient model based on Tucker tensor decomposition models the high-order interactions of the head entity, relation, and tail entity. This model is suited to simple queries with clear structure. The process is as follows: Step 4.1.1 Tensor Fusion Calculation: First, the multimodal entity representation is processed by LayerNorm and linearly projected to the target dimension to obtain the multimodal fusion representation of the head entity. This representation is then input into the Tucker tensor model together with the relation embedding and the tail entity embedding. The core tensor models the third-order interaction among the three, with the tensor product used to fuse the relation embedding with the tail entity embedding. Finally, the plausibility score of each candidate tail entity is calculated. The calculation method is as follows: $\phi(h, r, t) = \sum_{i,j,k} \mathcal{W}_{ijk}\, \tilde{h}_i\, (r \otimes t)_{jk}$; where $\tilde{h}$ is the projected head representation, $r$ and $t$ are the relation and tail entity embeddings, $\mathcal{W}$ is the core tensor used to model the third-order interactions between entities and relations, and $\otimes$ denotes the tensor product, $(r \otimes t)_{jk} = r_j t_k$.
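As a small worked example of the third-order scoring in Step 4.1.1, the sketch below contracts a core tensor with head, relation, and tail vectors. The LayerNorm and linear projection of the head representation are omitted, and the core tensor, embeddings, and candidate names are hypothetical toy values. A superdiagonal core is used because the score then reduces to the sum of element-wise products, which makes the result easy to check by hand.

```python
def tucker_score(core, head, relation, tail):
    """Sum over i, j, k of core[i][j][k] * head[i] * relation[j] * tail[k]."""
    return sum(
        core[i][j][k] * head[i] * relation[j] * tail[k]
        for i in range(len(head))
        for j in range(len(relation))
        for k in range(len(tail))
    )

def superdiagonal_core(dim):
    """Core tensor with ones where i == j == k and zeros elsewhere."""
    return [
        [[1.0 if i == j == k else 0.0 for k in range(dim)] for j in range(dim)]
        for i in range(dim)
    ]

# Toy query (head, relation, ?) scored against two candidate tails.
head, relation = [1.0, 2.0], [3.0, 4.0]
candidates = {"tail_a": [5.0, 6.0], "tail_b": [-1.0, 0.5]}
core = superdiagonal_core(2)

scores = {
    name: tucker_score(core, head, relation, embedding)
    for name, embedding in candidates.items()
}
```

For `tail_a` the score is 1*3*5 + 2*4*6 = 63, and for `tail_b` it is 1*3*(-1) + 2*4*0.5 = 1, so `tail_a` would be ranked first. A learned, dense core tensor would additionally mix different index combinations.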

[0036] Step 4.1.2 Complexity Optimization: To reduce the storage and computational overhead of the core tensor $\mathcal{W}$, Tucker decomposition is used to factor it into the product of a small core tensor $\mathcal{G}$ and three factor matrices $A$, $B$, and $C$. The decomposition is as follows: $\mathcal{W} = \mathcal{G} \times_1 A \times_2 B \times_3 C$; where $\times_1$, $\times_2$, and $\times_3$ denote matrix multiplication of the core tensor along its first, second, and third dimensions, respectively. The dimensions of $\mathcal{G}$ are the low-rank dimensions predefined for the Tucker decomposition, each smaller than the corresponding dimension of $\mathcal{W}$. This significantly reduces computational complexity while retaining modeling capability.
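The saving from the factored form can be checked with a short sketch. It rebuilds the full core tensor from a small core and three factor matrices, then confirms that scoring through the small core (after projecting each vector with its factor matrix) gives the same value as scoring with the full tensor. All tensor values, the rank 2, and the dimension 3 are toy assumptions; the parameter counts use hypothetical sizes of 200 and 20.

```python
def contract(core, x, y, z):
    """Sum over a, b, c of core[a][b][c] * x[a] * y[b] * z[c]."""
    return sum(
        core[a][b][c] * x[a] * y[b] * z[c]
        for a in range(len(x))
        for b in range(len(y))
        for c in range(len(z))
    )

def expand_core(small_core, A, B, C):
    """Full tensor W[i][j][k] = sum_{a,b,c} G[a][b][c] * A[i][a] * B[j][b] * C[k][c]."""
    return [[[contract(small_core, A[i], B[j], C[k]) for k in range(len(C))]
             for j in range(len(B))] for i in range(len(A))]

def project(factor, vector):
    """Map a length-d vector to length r using a d-by-r factor matrix."""
    rank = len(factor[0])
    return [sum(factor[i][a] * vector[i] for i in range(len(vector))) for a in range(rank)]

def factored_score(small_core, A, B, C, h, r, t):
    """Score computed without ever materializing the full tensor."""
    return contract(small_core, project(A, h), project(B, r), project(C, t))

G = [[[1, 0], [0, 2]], [[0, 1], [1, 0]]]   # 2 x 2 x 2 small core
A = [[1, 0], [0, 1], [1, 1]]               # 3 x 2 factor matrices
B = [[1, 1], [0, 1], [1, 0]]
C = [[2, 0], [0, 1], [1, 1]]
h, r, t = [1, 2, 3], [1, 0, 2], [0, 1, 1]

full_score = contract(expand_core(G, A, B, C), h, r, t)
small_score = factored_score(G, A, B, C, h, r, t)

full_params = 200 * 200 * 200                    # dense core tensor
factored_params = 20 * 20 * 20 + 3 * (200 * 20)  # small core plus three factors
```

Under these assumed sizes the dense core needs 8,000,000 parameters while the factored form needs 20,000, and the two scoring routes agree exactly on the toy integers.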

[0037] Step 4.1.3 Prediction Distribution Generation: Softmax normalization is applied to the plausibility scores of the candidate entities. To further reduce computational complexity, the candidate entities are taken from a set $\mathcal{C}$ pre-selected by rules or retrieval, and the probability distribution is computed over this set. The calculation method is as follows: $P_{\mathrm{fast}}(t) = \frac{\exp(\phi(h, r, t))}{\sum_{t' \in \mathcal{C}} \exp(\phi(h, r, t'))}$, where $\phi(h, r, t)$ is the plausibility score of candidate tail entity $t$. Step 4.2 Slow Path Inference: This step is carried out by the Subgraph Prompt Preparation Module (SPPM). For the query triples routed to the slow path in Step 3.2, deep inference is performed with the semantic modeling capabilities of the LLM, in combination with the multimodal prefix tokens obtained in Step 3.3. Multimodal feature alignment and structured subgraph prompt construction match the multimodal information to the model embedding space and convert it into structured prompts. Finally, enhanced prompts are constructed and the output space is constrained. The process is as follows: Step 4.2.1 Structured Subgraph Construction: Based on the multimodal entity representations and the homogeneity-heterogeneity decoupled encoding results, two types of complementary triples are extracted for the head entity: first, tightly connected local-neighborhood triples in the homogeneous subgraph, which capture the local semantic coherence of the entity; and second, heterogeneous triples with the same relation $r$ that connect different communities, which reveal global relational patterns. The total subgraph size is limited to a preset threshold to balance structural information coverage and computational efficiency.
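For Step 4.1.3 above, restricting the Softmax to a pre-selected candidate set can be sketched as follows. The entity names, scores, and the candidate list are hypothetical; how the candidates are pre-selected (rules or retrieval) is outside the scope of this fragment.

```python
import math

def candidate_softmax(scores, candidates):
    """Softmax over the plausibility scores of the pre-selected candidates only."""
    selected = {entity: scores[entity] for entity in candidates}
    # Subtract the maximum score for numerical stability.
    top = max(selected.values())
    exps = {entity: math.exp(score - top) for entity, score in selected.items()}
    total = sum(exps.values())
    return {entity: value / total for entity, value in exps.items()}

# Hypothetical plausibility scores for four entities; two are pre-selected.
scores = {"Paris": 2.0, "Berlin": 1.0, "Tokyo": -3.0, "Lima": 0.0}
distribution = candidate_softmax(scores, candidates=["Paris", "Berlin"])
```

Only the two pre-selected entities receive probability mass, so the normalization cost scales with the size of the candidate set rather than with the size of the whole entity set.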

[0038] Step 4.2.2 Multimodal Context Transformation: Information from the different modalities is converted into a unified text format that the LLM can understand more easily. Structural triples are converted into the canonical textual form "entity A-relation-entity B"; visual features are converted into concise semantic descriptions by a pre-trained visual model; and core attribute keywords are extracted from the text features to form a concise semantic overview.

[0039] Step 4.2.3 Hierarchical Enhanced Prompts: Hierarchical enhanced prompts are constructed so that the LLM can make full use of multi-source information and focus on the context relevant to the query. The prompt structure comprises: a system inference instruction that states the task objective of predicting the tail entity in the knowledge graph; the multimodal prefix tokens; the structured subgraph triples and multimodal descriptions; and the query triple to be completed.

[0040] Step 4.2.4 Constrained Generation and Distribution Output: The enhanced prompts are input into the LLM, and a two-level constraint mechanism limits the output range so that no entities outside the knowledge graph are generated. First, logit masking is applied at the LLM output layer, setting the logits of non-entity tokens to a specific value. Second, through retrieval enhancement, the cosine similarity between the generated text and the entity embeddings in the knowledge graph is calculated, and only the knowledge graph entities whose cosine similarity is higher than a threshold are retained. This yields the final predicted distribution over the candidate entities.
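The two-level constraint can be illustrated with a dependency-free sketch. Using negative infinity as the masking value is an assumption (the description only says "a specific value"), as are the similarity threshold 0.8, the toy logits, and the entity names and embeddings. In practice the masking would be applied to the LLM's output logits at each decoding step.

```python
import math

def mask_logits(logits, allowed_token_ids, mask_value=float("-inf")):
    """Replace the logits of tokens outside the allowed set with mask_value."""
    return [x if i in allowed_token_ids else mask_value for i, x in enumerate(logits)]

def softmax(logits):
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def retain_entities(generated_embedding, entity_embeddings, threshold=0.8):
    """Keep only entities whose embedding is close enough to the generated text."""
    return [
        name for name, embedding in entity_embeddings.items()
        if cosine(generated_embedding, embedding) > threshold
    ]

# Level 1: only token ids 0 and 2 belong to entity names in this toy vocabulary.
probabilities = softmax(mask_logits([2.0, 1.0, 0.5, 3.0], allowed_token_ids={0, 2}))

# Level 2: filter knowledge graph entities by similarity to the generated text.
entity_embeddings = {"Paris": [1.0, 0.0], "Berlin": [0.9, 0.1], "Tokyo": [0.0, 1.0]}
kept = retain_entities([1.0, 0.0], entity_embeddings)
```

After masking, the non-entity tokens receive exactly zero probability even though one of them had the largest raw logit, and the similarity filter discards the entity whose embedding is orthogonal to the generated text.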

[0041] Step 4.3 Dual-Path Collaborative Inference: For the query triples that trigger dual-path collaborative inference in Step 3.2, the fast-path prediction distribution obtained in Step 4.1 and the slow-path prediction distribution obtained in Step 4.2 are combined to achieve a smooth transition between, and dynamic fusion of, the two inference results. A gating weight $g$ is defined to quantify the fusion ratio of the fast and slow paths. It is constructed from the Sigmoid activation function $\sigma$, specifically $g = \sigma(k\,(c - \tau))$, where $\tau$ is the gating threshold defined in Step 3.2 and $c$ is the query confidence calculated in Step 3.1. The core function of the gating weight is to allocate inference weights dynamically according to the relative relationship between the query confidence $c$ and the preset gating threshold $\tau$. The transition sharpness parameter $k$ controls the smoothness of the switching: the larger its value, the more abruptly the weight changes when the confidence crosses the threshold; the smaller its value, the smoother the switching. The final prediction distribution is obtained as the weighted sum of the prediction results of the two paths, calculated as follows: $P(t) = g\, P_{\mathrm{fast}}(t) + (1 - g)\, P_{\mathrm{slow}}(t)$. Based on the final prediction distribution, the entity with the highest probability is selected as the tail entity, and the completed triple is output.
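The gated fusion of Step 4.3 can be sketched in a few lines. The sketch assumes that the gating weight multiplies the fast-path distribution and its complement multiplies the slow-path distribution, so that higher confidence shifts weight toward the fast path; the threshold 0.6, the sharpness 10, and the toy distributions over entities "a" and "b" are hypothetical.

```python
import math

def gate_weight(confidence, threshold=0.6, sharpness=10.0):
    """Sigmoid of the scaled gap between confidence and gating threshold."""
    return 1.0 / (1.0 + math.exp(-sharpness * (confidence - threshold)))

def fuse(fast, slow, weight):
    """Weighted sum of two distributions given as entity -> probability dicts."""
    entities = set(fast) | set(slow)
    return {
        e: weight * fast.get(e, 0.0) + (1.0 - weight) * slow.get(e, 0.0)
        for e in entities
    }

fast = {"a": 0.8, "b": 0.2}   # fast-path (Tucker) distribution
slow = {"a": 0.4, "b": 0.6}   # slow-path (LLM) distribution

g = gate_weight(confidence=0.6)         # confidence exactly at the threshold
fused = fuse(fast, slow, g)
prediction = max(fused, key=fused.get)  # entity with the highest probability
```

At the threshold the gate is exactly 0.5 and both paths contribute equally, giving a fused probability of 0.6 for "a". Raising the sharpness makes the gate approach a hard switch around the threshold.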

[0042] Figure 1 illustrates the overall framework of the knowledge graph completion method based on dynamic routing and dual-path reasoning of this invention. It shows the entire process from multimodal input to completion result output, which mainly comprises a perception layer, a fusion layer, a dynamic routing module, and a dual-path reasoning fusion module. The perception layer corresponds to step 1, the fusion layer to step 2, the dynamic routing module to step 3, and the dual-path reasoning fusion module to step 4. The perception layer processes structural information through homogeneous and heterogeneous GNN streams to generate the topological representation, processes image information with a VGG-16 model followed by a two-layer MLP projection to generate the visual representation, and processes text information with BERT-[CLS] followed by a linear projection to generate the text representation. All three outputs are embedding vectors of the same dimension d. The fusion layer uses the topological representation as the semantic anchor and outputs the aligned multimodal representation through modality-specific projection and asymmetric topology-guided contrastive alignment. The dynamic routing module calculates the query confidence with the base predictor, generates the gating weight according to the gating formula, and performs dynamic routing allocation for queries. The dual-path reasoning fusion module uses the gating weight to fuse the Tucker tensor fusion prediction scores of the fast path with the generative inference results of the slow path. Model training and the completion result are then obtained through a scoring function and a loss module. The figure also marks two key components: RAAL (Relation-Aware Alignment Layer) and SPPM (Subgraph Prompt Preparation Module). RAAL corresponds to step 3.3, multimodal prefix adaptation, and is responsible for mapping multimodal representations to the LLM embedding space. SPPM corresponds to steps 4.2.1-4.2.3 and completes the construction of homogeneous and heterogeneous subgraphs and their multimodal descriptions, context generation, and candidate entity selection.

[0043] Figure 2 illustrates the topology-guided cross-modal alignment process and shows the core implementation logic of the fusion layer (corresponding to step 2). From the multimodal inputs (visual features, text features, and the structural knowledge graph), the topological anchor and the visual and text representations are generated through modality-specific projections. Homogeneity and heterogeneity are first decoupled through topological role analysis to obtain the entity role ratio and generate entity-specific role weights. Relation-aware similarity is then calculated, and a basic alignment loss is constructed based on asymmetric InfoNCE. Finally, the overall alignment loss is minimized in combination with the topology-aware weighting strategy, and the aligned multimodal embedding is output. The two figures complement each other and together show the core implementation logic of this invention, from structural decoupling encoding and topology-guided alignment to dynamic routing and dual-path fusion.

[0044] This embodiment analyzes the actual effects of the present invention using specific data, as follows: 1) Baseline experiment; This invention conducts link prediction experiments on three publicly available standard multimodal knowledge graph datasets: FB15K-237-MM, WN18RR-MM, and MKG-2030, to verify the effectiveness of the proposed method. The datasets cover different sizes, relation complexities, and modal richness, comprehensively reflecting the applicability and robustness of the method across various scenarios. The experiments employ common evaluation metrics in the knowledge graph completion field. Mean Reciprocal Rank (MRR) measures the overall prediction accuracy of the model; Hit@1, Hit@3, and Hit@10 represent the percentage of correct entities appearing in the top 1, top 3, and top 10 positions of the prediction results, used to evaluate the model's ranking ability and inference reliability.
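For reference, the evaluation metrics named above can be computed from the rank of the correct entity in each test query, as in the following sketch. The rank values are made-up toy numbers and do not correspond to any experimental result in this document.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank of the correct entity over all queries (rank 1 is best)."""
    return sum(1.0 / rank for rank in ranks) / len(ranks)

def hit_at(ranks, k):
    """Fraction of queries whose correct entity appears in the top k."""
    return sum(1 for rank in ranks if rank <= k) / len(ranks)

# Hypothetical ranks of the correct tail entity for four test queries.
ranks = [1, 2, 4, 10]

mrr = mean_reciprocal_rank(ranks)
hit1, hit3, hit10 = hit_at(ranks, 1), hit_at(ranks, 3), hit_at(ranks, 10)
```

For these toy ranks, MRR is (1 + 0.5 + 0.25 + 0.1) / 4 = 0.4625, Hit@1 is 0.25, Hit@3 is 0.5, and Hit@10 is 1.0.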

[0045] To fully verify the advancement of the method in this invention, mainstream technical approaches in the current field were selected as baseline methods for comparison, including traditional structural knowledge graph reasoning methods, multimodal knowledge graph reasoning methods, and large language model-enhanced reasoning methods. In traditional structural knowledge graph reasoning methods, TransE learns entity and relation embeddings based solely on translation hypotheses, failing to handle complex semantic relationships; RotE uses rotational embeddings to model relational patterns, but relies entirely on topological structures, failing to utilize multimodal information, thus limiting overall performance. In multimodal knowledge graph reasoning methods, MKGC achieves multimodal fusion through simple concatenation, lacking a targeted alignment mechanism; MMKG employs a unified cross-modal strategy, unable to adapt to different structural features of entities, resulting in insufficient modal complementarity. In large language model-enhanced methods, LKGC introduces a large model to improve reasoning accuracy, but the reasoning process is complex and computationally expensive; PromptKG optimizes input formats based on prompt learning, but still fails to solve the core problems of dynamic routing and efficient fusion, limiting its practicality.

[0046] Table 1 records the link prediction performance of each method on the different datasets. URD-KGC is the knowledge graph completion method based on dynamic routing and dual-path reasoning proposed in this invention. Experimental results show that URD-KGC achieves the best performance on all datasets; in terms of MRR in particular, it improves on the second-best method by 0.5 percentage points on average. On the MKG-2030 dataset, which has the most complex structure and the richest relation types, the performance advantage of URD-KGC is even more pronounced, with an MRR of 0.568. This indicates that the multi-scale topological representation of this invention can effectively capture the semantics of entities and relations under complex structures. A comparison across method categories shows that traditional structural knowledge graph models rely solely on topological information and do not incorporate the visual and text modalities, which results in a clear performance ceiling. Multimodal knowledge graph methods often employ fixed cross-modal alignment, which makes it difficult to adapt to heterogeneous structural entities. Large model enhancement methods improve inference accuracy but suffer from high computational cost and slow inference. In contrast, URD-KGC relies on topological anchors to achieve multi-scale semantic representation and, through its dynamic routing and dual-path fusion mechanisms, integrates structural and multimodal information efficiently. While ensuring completion accuracy, it better balances the needs of structural semantic learning and modal information complementarity, resulting in more stable and superior overall performance.

[0047] Table 1 shows the link prediction performance of each method on different datasets;

[0048] 2) Analysis of reasoning efficiency and robustness; Table 2 compares the inference efficiency of URD-KGC with mainstream baseline methods on three datasets. URD-KGC achieves a processing speed of 347.6 ± 15.2 queries / second on FB15K-237-MM, significantly higher than the average level of LLM augmentation methods, while maintaining high prediction accuracy. Further analysis shows that approximately 80.3% of queries are routed to the fast path, indicating that the uncertainty gating mechanism effectively diverts high-confidence queries, reducing computational costs while maintaining accuracy.

[0049] Figure 3 shows the overall distribution of routing decisions: the fast path handles 55% of queries, primarily simple scenarios with clear structure; the slow path handles 20% of queries, used for scenarios with complex relational semantics; and the dual paths collaboratively handle 25% of boundary queries to balance inference reliability. In terms of query volume, the fast path handles significantly more queries than the other two types. This is consistent with the characteristics of actual knowledge graph inference, where simple queries account for a high proportion and complex queries account for a low proportion but require guaranteed accuracy, and it also demonstrates that the gating mechanism allocates computational resources rationally.

[0050] Figure 4 shows the performance-efficiency trade-off curves under different gating threshold conditions. By adjusting the threshold, the model can transition smoothly between an MRR of 0.51-0.54 and a query speed of 250-380 queries / second to suit different application scenarios. In contrast, methods such as GLTW are fixed in the high-precision, low-efficiency region, and methods such as R-GCN are fixed in the low-precision, high-efficiency region; both lack this dynamic adaptability.

[0051] Table 2 compares the inference efficiency of URD-KGC with mainstream baseline methods on three datasets;

[0052] 3) Ablation experiment; All ablation experiments were conducted on FB15K-237-MM, a dataset with diverse structural patterns and balanced multimodal coverage, suitable for component analysis. The contributions of each component to overall performance were evaluated by removing or replacing key modules one by one; the remaining configurations were consistent with the full model.

[0053] 3.1 Impact of Topology-Guided Alignment Mechanism: Table 3 compares the effects of different alignment strategies. Under noise-free conditions, the MRR of the asymmetric alignment strategy reaches 0.540, which is better than that of symmetric alignment (0.519). Notably, symmetric alignment produces misalignment ("semantically similar but structurally contradictory") in approximately 31.6% of queries, while the asymmetric design reduces this proportion to 8.2%. This difference stems from the fundamental difference between the two strategies: symmetric alignment allows bidirectional interaction between all modalities but is susceptible to noise interference, whereas the asymmetric design uses the topological modality as the anchor and guides only the non-topological modalities to align with the topological representation. This unidirectional constraint improves the robustness of the model representation. The experimental results corresponding to Figure 3 and Figure 4 further confirm this: the MRR of asymmetric alignment decreased by only 6.8%, significantly less than the 15.3% degradation of symmetric alignment. This indicates that using the topological modality as the alignment anchor maintains high accuracy while effectively handling modal noise and structural complexity in real-world scenarios.

[0054] Table 3 shows the effects of different alignment strategies;

[0055] 3.2 Impact of Relationship-Aware Similarity: Table 4 evaluates the role of relationship-aware similarity metrics. Experimental results show that the basic cosine similarity performs the weakest, with an MRR of only 0.521, making it difficult to adapt to different relational semantics. The fixed bilinear transformation shows some improvement, achieving an MRR of 0.529, but still does not fully consider relational specificity. In contrast, our relationship-aware design achieves the best performance, with an MRR of 0.540, showing particularly significant improvement in complex relations such as located_in.

[0056] Experimental results show that introducing the relation-aware matrix further improves the matching performance of entity pairs with the same relation, indicating that relation-specific similarity modeling helps capture the semantic differences between relations. This capability enables the model to capture semantically specific patterns, such as spatial and occupational relations, more accurately. Notably, the relation-aware matrix improves performance significantly while adding only 0.8% to the parameter count, which preserves computational efficiency and supports the design of the relation-specific measure in Step 2.3.

[0057] Table 4 shows the role of the relation-aware similarity measure;

[0058] 3.3 Impact of Topology-Aware Weighting: Table 5 shows the impact of the topology-aware weighting mechanism. We evaluated three weight allocation strategies: (1) uniform weighting, where all entities use the same weight; (2) random weighting, where weights are randomly assigned; and (3) topology-aware weighting, where weights are assigned based on the homogeneity or heterogeneity of entities. Experimental results show that the topology-aware weighting strategy achieves an MRR of 0.540, comparable to uniform weighting, but outperforms in terms of correct route ratio and result stability.

[0059] Analysis shows that this advantage stems from the adaptive nature of the weighting mechanism. When the homogeneity preference of entities is close to the median, the model can balance the contributions of visual and textual modalities. When the preference is extreme, the weight allocation will more specifically select the modality most suitable for the structural role. This dynamic adjustment allows the model to avoid the limitations of a uniform weighting strategy and adopt a better representation for different structural patterns. At the same time, this mechanism only introduces lightweight weight modulation at the entity level, without involving additional graph propagation or cross-modal interaction, and the computational overhead is negligible.

[0060] Table 5 shows the impact of the topology-aware weighting mechanism.

[0061] Comprehensive ablation experiments show that the model achieves optimal performance when the three mechanisms—topology-guided alignment, relation-aware similarity, and topology-aware weighting—work synergistically. This demonstrates that the components complement each other, jointly constructing a structure-aware multimodal alignment framework. The alignment mechanism ensures the dominance of structural semantics, the similarity metric adapts to different relational characteristics, and the weighting is optimized for entity structural roles. This synergistic effect validates the rationality of our design.

[0062] Ablation studies validated the positive effects of each module, but URD-KGC still exhibits performance limitations in certain specific scenarios. Analysis of erroneous prediction examples revealed two main failure modes: The first type of failure occurs in queries where both structure and semantics are highly ambiguous. In these samples, the local topological structure of entities lacks clear patterns, and the semantic information provided by textual and visual modalities is sparse or generalized. For example, for queries with abstract relational semantics and a limited number of connected entities, the topology guidance mechanism struggles to form stable anchor points, while multimodal information is insufficient to provide clear distinguishing signals, leading to low-confidence predictions in both fast and slow paths. The second type of failure mainly involves boundary samples where the roles of entity structures change rapidly. When entities exhibit both homogeneous and heterogeneous characteristics in different subgraphs or relational contexts, the topology-aware weighting mechanism may be insufficiently adapted in modal allocation, thus affecting the final prediction results. This is particularly evident when entity connections are highly diverse, but a single query only exposes a local structure.

[0063] It should be noted that the aforementioned failure cases account for a relatively small percentage of the test set, and most are concentrated in high-uncertainty regions. This aligns with the design intent of the uncertainty gating mechanism, which allocates these cases to slow or dual-path processing. Overall, these analyses reveal that the current method still has room for improvement in scenarios with extreme structural ambiguity or rapid switching of structural roles, providing a reference for future research in dynamic structural modeling and context-aware alignment.

[0064] The embodiments described in this specification are merely examples of implementations of the inventive concept and are for illustrative purposes only. The scope of protection of this invention should not be considered limited to the specific forms described in these embodiments; rather, it extends to equivalent technical means conceived by those skilled in the art based on the inventive concept.

Claims

1. A knowledge graph completion method based on dynamic routing and dual-path reasoning, characterized in that, The method includes the following steps: Step 1: Multimodal entity feature perception and unified encoding: Extract modality-specific feature representations from the structural data of the knowledge graph, the visual data of entity associations, and the text description data, and unify the representations of each modality to the same dimension; Step 2: Cross-modal semantic alignment based on topological constraints: Using the topological representation as a semantic anchor point, the visual modality and the text modality are guided to perform asymmetric alignment to the anchor point to obtain multimodal entity representations in a unified semantic space; Step 3: Dynamic routing based on query uncertainty awareness: The structural determinism of the query is evaluated based on the multimodal entity representations to obtain the query confidence, and inference paths are dynamically allocated to the query based on the confidence; Step 4: Dual-path collaborative prediction and result fusion: The query is predicted through two complementary reasoning paths, and the prediction results of each path are dynamically fused to output the completion result.

2. The knowledge graph completion method based on dynamic routing and dual-path reasoning as described in claim 1, characterized in that, The process of the first step is as follows: Step 1.1 Topological coding for homogeneity-heterogeneity decoupling: Extract the neighborhood subgraph of the target entity, divide the edges in the subgraph into homogeneous edge sets and heterogeneous edge sets according to structural attributes, encode them using differentiated message aggregation mechanisms, and then adaptively weight and fuse the two types of encoding results through a gating unit to obtain the topological representation; Step 1.2 Visual Feature Encoding: Visual semantic features are extracted from the image data of entity association, and after dimensional mapping and normalization, a visual representation consistent with the topological representation dimension is generated; Step 1.3 Text Feature Encoding: Extract global semantic information from the natural language description data of entities, and generate a text representation consistent with the topological representation dimension after linear projection and normalization.

3. The knowledge graph completion method based on dynamic routing and dual-path reasoning as described in claim 2, characterized in that, The process of step 1.1 is as follows: Step 1.1.1 Neighborhood Subgraph Extraction: Extract the multi-hop neighborhood subgraph of the target head entity. The multi-hop neighborhood subgraph includes entities within a preset number of hops centered on the head entity and the edges connecting these entities. Step 1.1.2 Edge set partitioning and subgraph construction: Calculate the comprehensive similarity based on the structural similarity and semantic consistency between entities, and divide the edges in the neighborhood subgraph into homogeneous edge sets and heterogeneous edge sets according to the preset similarity threshold, forming homogeneous subgraphs and heterogeneous subgraphs respectively; Step 1.1.3 Differentiated message passing: Homogeneous subgraphs are encoded using a standard message aggregation mechanism, and heterogeneous subgraphs are encoded using a biased message aggregation mechanism, capturing semantically coherent homogeneous associations and functionally complementary heterogeneous associations respectively; Step 1.1.4 Gated Adaptive Fusion: The encoding results of homogeneous and heterogeneous subgraphs are adaptively weighted and fused through learnable gating units to obtain the topological representation.

4. The knowledge graph completion method based on dynamic routing and dual-path reasoning as described in claim 3, characterized in that the process of step 1.1.3 is as follows: Step 1.1.3.1 Standard aggregation mechanism for homogeneous subgraphs: Perform a relation-specific transformation on the embeddings of the neighbor entities and merge the transformation result with the neighbor entity embeddings; then integrate all neighbor information through an aggregation function, and residually fuse the aggregation result with the current embedding of the target entity to output the homogeneous subgraph encoding; Step 1.1.3.2 Bias aggregation mechanism for heterogeneous subgraphs: Calculate the feature bias between each neighbor entity embedding and the target entity embedding, fuse the bias with the relation-specific transformation result and then aggregate, and finally perform residual fusion with the current embedding of the target entity to output the heterogeneous subgraph encoding.
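The two aggregation mechanisms can be contrasted in a short sketch. It assumes that the relation-specific transformation is an element-wise scaling, that "merge" and "fuse" mean vector addition, that the feature bias is the difference between neighbor and target embeddings, and that the aggregation function is a mean; none of these choices is fixed by the claim.

```python
def add(a, b):
    return [x + y for x, y in zip(a, b)]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def rel_transform(h, rel_scale):
    """Assumed relation-specific transformation: element-wise scaling."""
    return [s * x for s, x in zip(rel_scale, h)]

def standard_aggregate(h_target, neighbors):
    """Homogeneous case: transformed neighbor merged with the neighbor itself."""
    messages = [add(rel_transform(h, r), h) for h, r in neighbors]
    return add(h_target, mean(messages))  # residual fusion with the target

def bias_aggregate(h_target, neighbors):
    """Heterogeneous case: transformed neighbor fused with (neighbor - target)."""
    messages = [add(rel_transform(h, r), sub(h, h_target)) for h, r in neighbors]
    return add(h_target, mean(messages))  # residual fusion with the target

# Hypothetical target embedding and (neighbor embedding, relation scale) pairs.
h_target = [1.0, 0.0]
neighbors = [([1.0, 1.0], [2.0, 2.0]), ([3.0, 1.0], [1.0, 1.0])]

homo_encoding = standard_aggregate(h_target, neighbors)
hetero_encoding = bias_aggregate(h_target, neighbors)
```

The only difference between the two functions is the term added to the transformed neighbor, which is what lets the heterogeneous path encode how a neighbor differs from the target rather than how it resembles it.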

5. The knowledge graph completion method based on dynamic routing and dual-path reasoning as described in any one of claims 2 to 4, characterized in that the process of the second step is as follows: Step 2.1 Topology Anchor Construction: Use the topological representation obtained in the first step as the sole anchor point for cross-modal alignment; Step 2.2 Asymmetric Contrastive Alignment: Using the topological representation as the alignment reference, adaptively assign the alignment weights of the visual and text modalities according to the topological roles of entities to achieve differentiated asymmetric alignment; Step 2.3 Relation-aware similarity measurement: Adopt a relation-aware similarity mechanism so that cross-modal alignment adapts to the semantic features of different relation types; Step 2.4 Topology-aware weighted constraints: Weight the alignment process based on the topological roles of entities to finally obtain multimodal entity representations in a unified semantic space.
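A minimal sketch of anchor-based asymmetric alignment is given below. It assumes an InfoNCE-style contrastive loss in which the topological representations act as fixed anchors and only the visual and text representations are scored against them; the per-entity role weights `w_visual` and `w_text` and the temperature `tau` are hypothetical, and the relation-aware similarity of Step 2.3 is replaced by a plain dot product for brevity.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def anchor_loss(modal, anchors, i, tau=0.1):
    """Contrastive loss pulling modal[i] toward anchors[i], away from other anchors."""
    logits = [dot(modal[i], anchor) / tau for anchor in anchors]
    peak = max(logits)  # subtract the maximum for numerical stability
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - logits[i]

def alignment_loss(topo, visual, text, w_visual, w_text, tau=0.1):
    """Role-weighted sum of the visual-to-anchor and text-to-anchor losses."""
    total = 0.0
    for i in range(len(topo)):
        total += w_visual[i] * anchor_loss(visual, topo, i, tau)
        total += w_text[i] * anchor_loss(text, topo, i, tau)
    return total / len(topo)

# Hypothetical two-entity example with unit-length representations.
topo = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
weights = [1.0, 1.0]

loss_aligned = alignment_loss(topo, aligned, aligned, weights, weights)
loss_misaligned = alignment_loss(topo, swapped, swapped, weights, weights)
```

The loss is near zero when each modality vector matches its own anchor and grows when it matches another entity's anchor, which is the behavior the alignment step relies on.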

6. The knowledge graph completion method based on dynamic routing and dual-path reasoning as described in claim 5, characterized in that the process of the third step is as follows: Step 3.1 Query confidence calculation: Based on the multimodal entity representations, make a preliminary prediction for the query to obtain the probability distribution over candidate entities, and evaluate the structural determinism of the query based on the dispersion of the probability distribution to obtain the normalized query confidence; Step 3.2 Dynamic routing decision: Based on a comparison between the query confidence and a preset threshold, dynamically assign the corresponding reasoning path to the current query; Step 3.3 Multimodal prefix adaptation: Construct a multimodal prefix adaptation structure that maps topological, visual, and text modality features to the input space of the large language model, forming multimodal virtual prefixes suited to the large language model, which are used to guide subsequent semantic reasoning.
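Steps 3.1 and 3.2 can be sketched as follows. The sketch assumes that dispersion is measured by Shannon entropy normalized by its maximum, and that routing uses two thresholds (`low`, `high`) so that a middle band triggers collaborative reasoning; the claim itself only states a dispersion-based confidence and a comparison with a preset threshold, so both choices and the threshold values are hypothetical.

```python
import math

def query_confidence(probs):
    """1 minus the normalized Shannon entropy, so the result lies in [0, 1]."""
    n = len(probs)
    if n < 2:
        return 1.0
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return 1.0 - entropy / math.log(n)

def route(confidence, low=0.3, high=0.7):
    """Assumed three-way routing on two thresholds."""
    if confidence >= high:
        return "fast"
    if confidence < low:
        return "slow"
    return "collaborative"

# A peaked distribution is structurally determinate; a flat one is not.
peaked = query_confidence([1.0, 0.0, 0.0, 0.0])
flat = query_confidence([0.25, 0.25, 0.25, 0.25])
decision_peaked = route(peaked)
decision_flat = route(flat)
```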

7. The knowledge graph completion method based on dynamic routing and dual-path reasoning as described in claim 6, characterized in that the process of the fourth step is as follows: Step 4.1 Fast Path Inference: For queries routed to the fast path, use an efficient structured reasoning method to model the high-order interactions between entities and relations and complete fast link prediction; Step 4.2 Slow Path Inference: For queries routed to the slow path, perform deep reasoning based on multimodal prefix guidance and the semantic understanding capabilities of the large language model, using structured prompts and output constraints to ensure that the prediction results fall within the knowledge graph; Step 4.3 Dual-path collaborative reasoning: For queries that trigger collaborative reasoning, adaptively weight and fuse the prediction results obtained from the fast path and the slow path to obtain the final knowledge graph completion result.

8. The knowledge graph completion method based on dynamic routing and dual-path reasoning as described in claim 7, characterized in that the process of step 4.1 is as follows: Step 4.1.1 Multimodal Fusion Representation: Project the multimodal entity representations in the unified semantic space to obtain the multimodal fusion representation of the head entity; Step 4.1.2 Higher-order interaction modeling: Based on the multimodal fusion representation, the relation features and the candidate tail entity features, construct a higher-order interaction model to obtain the plausibility score of each candidate tail entity; Step 4.1.3 Prediction Distribution Generation: Normalize the plausibility scores of the candidate entities to obtain the prediction probability distribution over candidate entities under the fast path.
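The scoring and normalization of Steps 4.1.2 and 4.1.3 can be illustrated with a deliberately simple stand-in. The sketch assumes a trilinear (DistMult-style) product as the higher-order interaction and a softmax as the normalization; the claim does not name a specific interaction model, and the embeddings and entity names below are hypothetical. The head vector is assumed to be the already fused representation from Step 4.1.1.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def trilinear_score(head, relation, tail):
    """Assumed interaction: sum over dimensions of head * relation * tail."""
    return sum(h * r * t for h, r, t in zip(head, relation, tail))

def fast_path(head, relation, candidates):
    """Score every candidate tail and normalize into a distribution."""
    names = list(candidates)
    scores = [trilinear_score(head, relation, candidates[n]) for n in names]
    return dict(zip(names, softmax(scores)))

# Hypothetical fused head representation, relation features and candidates.
head = [1.0, 0.0]
relation = [1.0, 1.0]
candidates = {"Paris": [1.0, 0.0], "Berlin": [0.0, 1.0]}

p_fast = fast_path(head, relation, candidates)
best = max(p_fast, key=p_fast.get)
```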

9. The knowledge graph completion method based on dynamic routing and dual-path reasoning as described in claim 7, characterized in that the process of step 4.2 is as follows: Step 4.2.1 Structured Subgraph Construction: Based on the multimodal entity representations and the topological decoupling features, extract the local neighborhood triples and global relation triples corresponding to the head entity to form complementary structural information; Step 4.2.2 Multimodal context transformation: Convert topological, visual, and text modality information into a unified textual context to meet semantic understanding requirements; Step 4.2.3 Hierarchical Enhanced Prompts: Combine task instructions, multimodal prefixes, and structured context to construct hierarchical enhanced prompts; Step 4.2.4 Constrained Generation and Output Distribution: Perform semantic reasoning based on the hierarchical enhanced prompts, and use a two-layer constraint mechanism to restrict the prediction results to the entity set of the knowledge graph, obtaining the prediction probability distribution under the slow path.
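The output constraint of Step 4.2.4 can be sketched as a post-processing step on the language model output. The claim does not say what the two constraint layers are; the sketch assumes that the first layer discards any output that is not a knowledge graph entity and the second renormalizes over valid entities, with a uniform fallback when nothing survives. The function name, the score format and the example entities are all hypothetical.

```python
def constrain_to_entities(raw_scores, kg_entities):
    """Map hypothetical LLM output scores onto the knowledge graph entity set."""
    entities = sorted(kg_entities)
    # Layer 1 (assumed): drop every output string that is not a known entity.
    kept = {e: s for e, s in raw_scores.items() if e in kg_entities}
    total = sum(kept.values())
    # Layer 2 (assumed): renormalize; fall back to uniform if nothing is valid.
    if total <= 0.0:
        return {e: 1.0 / len(entities) for e in entities}
    return {e: kept.get(e, 0.0) / total for e in entities}

kg_entities = {"Paris", "Berlin", "Rome"}

# "Atlantis" is not in the graph, so its mass is removed and redistributed.
p_slow = constrain_to_entities(
    {"Paris": 0.6, "Atlantis": 0.3, "Berlin": 0.1}, kg_entities
)
p_fallback = constrain_to_entities({"Atlantis": 1.0}, kg_entities)
```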

10. The knowledge graph completion method based on dynamic routing and dual-path reasoning as described in claim 7, characterized in that the process of step 4.3 is as follows: Step 4.3.1 Fusion Weight Generation: Based on the relationship between the query confidence and the preset threshold, dynamically generate fusion weights for the fast path and the slow path; Step 4.3.2 Adaptive Result Fusion: Based on the fusion weights, adaptively weight and fuse the prediction results of the two paths to obtain the final prediction distribution; Step 4.3.3 Output the completion result: Select the entity with the highest probability in the final prediction distribution as the completion result, and output the complete knowledge graph triple.
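The fusion described in Steps 4.3.1 to 4.3.3 can be sketched as follows. The linear ramp between two thresholds used to derive the fast-path weight is an assumption, as are the threshold values and the example distributions; the claim only requires that the weights depend on the query confidence and the preset threshold.

```python
def fusion_weight(confidence, low=0.3, high=0.7):
    """Assumed fast-path weight: 0 below `low`, 1 above `high`, linear between."""
    if confidence >= high:
        return 1.0
    if confidence <= low:
        return 0.0
    return (confidence - low) / (high - low)

def fuse(p_fast, p_slow, weight):
    """Convex combination of the two path distributions."""
    entities = set(p_fast) | set(p_slow)
    return {
        e: weight * p_fast.get(e, 0.0) + (1.0 - weight) * p_slow.get(e, 0.0)
        for e in entities
    }

def complete(head, relation, p_fast, p_slow, confidence):
    """Return the completed triple with the highest fused probability."""
    fused = fuse(p_fast, p_slow, fusion_weight(confidence))
    tail = max(fused, key=fused.get)
    return (head, relation, tail)

# Hypothetical path outputs that disagree on the answer.
p_fast = {"Paris": 0.6, "Berlin": 0.4}
p_slow = {"Paris": 0.2, "Berlin": 0.8}

triple_mid = complete("Germany", "capital", p_fast, p_slow, confidence=0.5)
triple_high = complete("Germany", "capital", p_fast, p_slow, confidence=0.9)
```

With a mid-range confidence the two paths are weighted roughly equally and the slow path's stronger preference decides the result; with a high confidence the fast path alone determines it.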