A knowledge graph-based image diagnosis auxiliary decision-making method, device, and medium

By preprocessing and extracting features from image and text data, and combining knowledge graph instances for multimodal feature mapping and fusion, the problems of insufficient multimodal medical information fusion and insufficient interpretable reasoning under knowledge guidance are solved, thus achieving efficient support for the diagnosis of complex diseases.

CN122290975A · Pending · Publication Date: 2026-06-26 · The First Affiliated Hospital of Nanchang University (南昌大学第一附属医院)


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: The First Affiliated Hospital of Nanchang University (南昌大学第一附属医院)
Filing Date: 2026-04-30
Publication Date: 2026-06-26

Application Information

Patent Timeline

30 Apr 2026: Application
26 Jun 2026: Publication (CN122290975A)

IPC: G16H50/20; G16H30/40; G16H30/20; G16H50/70; G16H10/60; G06N5/025; G06N5/045; G06F18/25; G06F40/30; G06F18/213; G06F40/295; G06N3/0895

AI Tagging

Technology Topics

Medical record; Semantic vector


AI Technical Summary

Technical Problem

Existing technologies have room for improvement in terms of deep fusion of multimodal medical information and interpretable reasoning guided by knowledge. Image features and text semantics are usually simply spliced together after independent encoding, lacking modeling of cross-modal semantic consistency. This makes it difficult to effectively overcome the semantic gap between modalities. Furthermore, the application of knowledge graphs has failed to fully realize the dynamic alignment and deep interaction between individual patient features and graph structure, limiting the reasoning ability for differential diagnosis of complex diseases.

Method used

By collecting image data and medical record text data, and performing preprocessing, image feature data and text semantic data are generated. A structured medical knowledge association dataset is established, and a contrastive learning method is used to map it to a unified semantic space to generate a fused trimodal feature matrix. This matrix is then combined with knowledge graph instances for matching and reasoning to output a set of candidate diagnostic paths, ultimately generating a structured diagnostic result.

Benefits of technology

It achieves deep alignment and collaborative characterization of multi-source medical data, improves the coverage and accuracy of auxiliary diagnosis in complex cases, and enhances the practicality of clinical auxiliary decision-making and physician trust.



Abstract

This invention discloses a knowledge graph-based image diagnosis auxiliary decision-making method, device, and medium, relating to the field of medical artificial intelligence technology. The method includes: performing entity recognition and semantic relation extraction on a cleaned medical record text dataset and medical knowledge resources to establish a structured medical knowledge association dataset and generate knowledge graph instances; extracting features from image feature data and text semantic data to obtain image feature vectors, text semantic vectors, and graph node vectors, and mapping them to a unified semantic space using a contrastive learning method to obtain a fused trimodal feature matrix; performing knowledge reasoning on a set of candidate diagnostic paths, calculating the confidence level of disease nodes based on weighted calculations, and generating a candidate disease list and reasoning chain; by mapping the image, text, and knowledge trimodal features to a unified semantic space through contrastive learning and adaptively fusing them, the method improves the coverage and accuracy of auxiliary diagnosis in complex cases and enhances the practicality of clinical auxiliary decision-making.


Description

Technical Field

[0001] This invention relates to the field of medical artificial intelligence technology, and in particular to a knowledge graph-based image diagnosis auxiliary decision-making method, device, and medium.

Background Technology

[0002] With the rapid development of medical imaging technology, computer-aided diagnosis (CAD) based on artificial intelligence is increasingly widely used in clinical diagnosis and treatment. Traditional CAD systems mainly rely on single-modal image data, using deep learning models such as convolutional neural networks to detect, segment, and classify lesion areas. In recent years, multimodal fusion methods have gradually become a research hotspot. Some advanced systems attempt to combine image data with text information in electronic medical records, using natural language processing technology to extract key clinical features. Knowledge-driven methods have also received widespread attention. Among them, medical knowledge graphs, as a structured semantic network, can effectively organize entities such as diseases, symptoms, examinations, and treatments and their interrelationships, providing prior knowledge support for clinical decision-making.

[0003] Existing technologies still have room for improvement in the deep fusion of multimodal medical information and in knowledge-guided interpretable reasoning. Image features and text semantics are usually fused by simply splicing them together after independent encoding, without modeling cross-modal semantic consistency, which makes it difficult to overcome the semantic gap between modalities. The application of knowledge graphs is mostly limited to static retrieval or shallow matching and fails to fully realize dynamic alignment and deep interaction between individual patient characteristics and the graph structure, which limits reasoning ability in the differential diagnosis of complex diseases. When different diseases have similar imaging manifestations, accurately locating the relevant paths in massive medical knowledge and generating clinically interpretable diagnostic suggestions that reflect the specific patient's characteristics remains a challenge for current technologies.

Summary of the Invention

[0004] In view of the aforementioned existing problems, the present invention is proposed.

[0005] Therefore, this invention provides an image diagnosis auxiliary decision-making method based on knowledge graphs to solve the problems of insufficient fusion of multimodal medical information and insufficient interpretable reasoning ability under knowledge guidance.

[0006] To solve the above-mentioned technical problems, the present invention provides the following technical solution:

[0007] In a first aspect, the present invention provides an image-based diagnostic decision support method based on knowledge graphs, comprising: collecting image data and medical record text data, and preprocessing them to obtain image feature data and text semantic data; performing entity recognition and semantic relationship extraction on the cleaned medical record text dataset and medical knowledge resources to establish a structured medical knowledge association dataset and generate knowledge graph instances; extracting features from the image feature data and text semantic data to obtain image feature vectors, text semantic vectors, and graph node vectors, and mapping them to a unified semantic space using a contrastive learning method to obtain a fused trimodal feature matrix; matching the fused trimodal feature matrix with the knowledge graph instances to retrieve the most relevant disease nodes, image manifestation nodes, and relationship paths, and outputting a candidate diagnostic path set; performing knowledge reasoning on the candidate diagnostic path set, calculating the confidence of disease nodes based on weighted calculations, and generating a candidate disease list and reasoning chain; and performing assisted diagnosis through the candidate disease list and reasoning chain to generate structured diagnostic results.

[0008] As a preferred embodiment of the knowledge graph-based image diagnosis auxiliary decision-making method of the present invention, the specific steps for obtaining image feature data and text semantic data are as follows:

[0009] The raw image data files are obtained from the medical image acquisition equipment, decoded in DICOM format, and normalized in size according to the spatial proportion information of the lesion area to generate a standardized image dataset.

[0010] Based on the standardized image dataset, Gaussian filtering is used to remove image noise, and contrast stretching is performed on the image edge regions to obtain an enhanced image dataset. Preliminary region segmentation is then performed on the enhanced image dataset, and lesion boundary coordinates, shape features, and texture statistics are extracted to generate a structured image matrix.

[0011] Patient medical record text data is extracted from electronic medical records, and sentence segmentation, word segmentation and stop word removal are performed to form a cleaned medical record text dataset. Then, medical natural language processing algorithms are used to perform named entity recognition and dependency parsing to generate a structured medical semantic entity set.

[0012] The structured image matrix is input into a feature encoding network to extract image depth features and generate image feature data; the structured medical semantic entity set is vectorized to generate text semantic data.

[0013] As a preferred embodiment of the knowledge graph-based image diagnostic auxiliary decision-making method of the present invention, the specific steps for generating the knowledge graph instance are as follows:

[0014] Entity recognition is performed on the cleaned medical record text dataset and medical knowledge resources to extract medical entities related to diseases, imaging signs and anatomical locations, and they are classified according to semantic categories to generate a structured set of medical entities.

[0015] The key attributes of each entity are extracted from the structured medical entity set and semantically normalized to generate a standardized entity set. Dependency parsing and semantic role labeling are performed on the standardized entity set to identify the semantic relationships between entities and generate a medical relationship set.

[0016] The structured medical entity set, standardized entity set, and medical relation set are structured in the form of triples to generate a knowledge triple dataset. The knowledge triple dataset is vectorized and trained using a graph embedding algorithm, and node and edge structures are constructed based on entity connection relationships to generate knowledge graph instances.

[0017] As a preferred embodiment of the knowledge graph-based image diagnosis auxiliary decision-making method of the present invention, the specific steps for obtaining the image feature vector, text semantic vector, and knowledge graph node vector are as follows:

[0018] Image feature data is input into a convolutional neural network model, and multi-layer convolution, pooling and non-linear activation operations are performed on the structured image matrix to generate a primary image feature map. Then, a fully connected layer and global average pooling operation are used to generate an image feature vector.

[0019] Text semantic data is input into a medical language understanding model. Word dependencies are calculated through word embedding, positional encoding, and self-attention mechanisms to generate a sequence of text context semantic vectors. Sentence-level and discourse-level semantic aggregation is then performed to obtain text semantic vectors.

[0020] Entity nodes from knowledge graph instances are input into the knowledge graph embedding model, and entities and relations are trained to generate graph node vectors.
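
The text does not name the knowledge graph embedding model used in [0020]. As one possible illustration, the sketch below assumes a TransE-style model, in which a triple (head, relation, tail) is considered plausible when head + relation lies close to tail. The function names, hyperparameters, and the random tail-corruption strategy are all assumptions made for this example.

```python
import random

def transe_score(h, r, t):
    """Squared L2 distance ||h + r - t||^2; lower means more plausible."""
    return sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t))

def train_transe(triples, dim=8, epochs=200, lr=0.05, margin=1.0, seed=0):
    """Train entity and relation vectors with a margin ranking loss (assumed setup)."""
    rng = random.Random(seed)
    ents = sorted({x for h, _, t in triples for x in (h, t)})
    rels = sorted({r for _, r, _ in triples})
    E = {e: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for e in ents}
    R = {r: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for r in rels}
    known = set(triples)
    for _ in range(epochs):
        for h, r, t in triples:
            t_neg = rng.choice(ents)                  # corrupt the tail entity
            if (h, r, t_neg) in known:
                continue                              # not a true negative
            pos = transe_score(E[h], R[r], E[t])
            neg = transe_score(E[h], R[r], E[t_neg])
            if pos + margin <= neg:
                continue                              # margin already satisfied
            for k in range(dim):
                g_pos = 2 * (E[h][k] + R[r][k] - E[t][k])
                g_neg = 2 * (E[h][k] + R[r][k] - E[t_neg][k])
                # Gradient step on loss = pos - neg + margin
                E[h][k] -= lr * (g_pos - g_neg)
                R[r][k] -= lr * (g_pos - g_neg)
                E[t][k] += lr * g_pos
                E[t_neg][k] -= lr * g_neg
    return E, R
```

The returned dictionary `E` plays the role of the graph node vectors; a production system would more likely rely on an established graph embedding library than on this hand-written loop.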

[0021] As a preferred embodiment of the knowledge graph-based image diagnosis auxiliary decision-making method of the present invention, the specific steps for obtaining the fused trimodal feature matrix are as follows:

[0022] Fully connected mapping and nonlinear activation are performed on the image feature vectors, text semantic vectors, and graph node vectors respectively, and the results are embedded into a unified latent semantic space to generate an embedded feature representation set.

[0023] Based on the embedded feature representation set, and according to the correspondence between medical record text and structured medical semantic entity set, positive sample pairs and negative sample pairs are constructed, and the cosine similarity of image feature vector, text semantic vector and graph node vector is calculated to generate a cross-modal similarity matrix.

[0024] By minimizing the distance between positive samples and maximizing the distance between negative samples, the distribution relationship of the cross-modal similarity matrix in the semantic space is optimized, and a trimodal feature parameter set is obtained.

[0025] Adaptive weights are assigned to image, text, and knowledge features in the trimodal feature parameter set, weighted fusion is performed, and the samples are arranged in sequence to form a high-dimensional matrix, resulting in the fused trimodal feature matrix.
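
Steps [0023] to [0025] can be illustrated with a small sketch. The text specifies cosine similarity, positive and negative pairs, and adaptive weights, but not the exact loss or weighting scheme. The example below therefore assumes an InfoNCE-style contrastive loss and softmax-normalized modality weights; `temperature` and `gate_logits` are hypothetical parameters introduced for illustration.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def cosine_matrix(A, B):
    """Cross-modal similarity matrix S[i][j] = cos(A[i], B[j])."""
    A = [l2_normalize(a) for a in A]
    B = [l2_normalize(b) for b in B]
    return [[sum(x * y for x, y in zip(a, b)) for b in B] for a in A]

def info_nce(A, B, temperature=0.1):
    """Mean InfoNCE loss: (A[i], B[i]) are positives, (A[i], B[j != i]) negatives."""
    S = cosine_matrix(A, B)
    loss = 0.0
    for i, row in enumerate(S):
        logits = [s / temperature for s in row]
        m = max(logits)                               # log-sum-exp stabilisation
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denominator - logits[i]
    return loss / len(S)

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    total = sum(e)
    return [v / total for v in e]

def fuse(image_vec, text_vec, graph_vec, gate_logits):
    """Adaptive weighted fusion with softmax weights over the three modalities."""
    w = softmax(gate_logits)
    return [w[0] * i + w[1] * t + w[2] * g
            for i, t, g in zip(image_vec, text_vec, graph_vec)]
```

Stacking the fused vector of each sample row by row would give a matrix in the spirit of the fused trimodal feature matrix. In a trained system the gate logits would typically be learned rather than supplied by hand.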

[0026] As a preferred embodiment of the knowledge graph-based image diagnostic auxiliary decision-making method of the present invention, the specific steps for outputting the candidate diagnostic path set are as follows:

[0027] L2 normalization is performed on each type of feature vector in the fused trimodal feature matrix to generate a fused feature vector set; embedding vectors of disease nodes, image manifestation nodes and anatomical site nodes are extracted from the knowledge graph instances, and a high-dimensional index structure is established over the node vectors to generate a node vector index set.

[0028] The cosine similarity between each fused feature vector and the node vector index set is calculated to obtain a candidate node set. Starting from the candidate node set, multi-hop queries are performed along the edge relationships of the knowledge graph instances, and intermediate relationship entities are extracted to generate a candidate relationship path set.

[0029] Based on the semantic similarity, relation weight, and path length among nodes in the candidate relation path set, a comprehensive relevance score is calculated for each path to obtain a weighted candidate path list. Structural information of disease nodes, imaging manifestation nodes, and relation chains is extracted to obtain a candidate diagnostic path set.
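
The retrieval and path-scoring steps in [0027] to [0029] could look roughly like the sketch below. The brute-force search stands in for the high-dimensional index structure, and the scoring formula (mean node similarity times the product of relation weights times a length decay) is an assumed form, since the text lists the three factors without saying how they are combined. The graph layout and the `length_penalty` parameter are hypothetical.

```python
import math
from collections import deque

def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0

def top_k_nodes(query, node_vectors, k=2):
    """Brute-force nearest neighbours; a stand-in for a real vector index."""
    ranked = sorted(node_vectors,
                    key=lambda n: cosine(query, node_vectors[n]), reverse=True)
    return ranked[:k]

def multi_hop_paths(graph, start, max_hops=2):
    """Breadth-first enumeration of simple paths with 1..max_hops edges.
    `graph` maps node -> list of (relation, neighbour, weight)."""
    paths, queue = [], deque([[(None, start, 1.0)]])
    while queue:
        path = queue.popleft()
        if len(path) > 1:
            paths.append(path)
        if len(path) - 1 == max_hops:
            continue
        visited = {node for _, node, _ in path}
        for rel, nxt, w in graph.get(path[-1][1], []):
            if nxt not in visited:                    # avoid cycles
                queue.append(path + [(rel, nxt, w)])
    return paths

def path_score(path, query, node_vectors, length_penalty=0.8):
    """Assumed relevance score combining similarity, relation weight and length."""
    nodes = [n for _, n, _ in path]
    sim = sum(cosine(query, node_vectors[n]) for n in nodes) / len(nodes)
    weight = math.prod(w for _, _, w in path[1:])
    return sim * weight * length_penalty ** (len(path) - 1)
```

Sorting the enumerated paths by `path_score` gives a weighted candidate path list from which disease and imaging manifestation nodes can be read off.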

[0030] As a preferred embodiment of the knowledge graph-based image diagnosis auxiliary decision-making method of the present invention, the specific steps for generating the candidate disease list and reasoning chain are as follows:

[0031] The attribute information of disease, image and symptom nodes within the candidate diagnostic path set is extracted to generate a path feature description set, which is then transmitted and attenuated through the attention propagation mechanism to obtain a weighted node feature set.

[0032] The weighted node feature set is input into the graph neural network model. Through adjacency propagation and aggregation operations, the semantic correlation between disease nodes and adjacent nodes is calculated, a disease node representation vector set is generated, and logical constraint verification is performed to filter out semantic conflict paths and adjust abnormal edge weights to form a disease node set.

[0033] Based on the disease node set, the comprehensive confidence score is calculated and sorted from high to low to generate a candidate disease list. Relevant image manifestation nodes, symptom nodes and relationship information are extracted from the candidate disease list to construct a disease-image-symptom-location inference chain and output the inference chain.
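
As a heavily simplified illustration of [0031] to [0033], the sketch below aggregates path evidence into a confidence score per disease node and attaches the supporting chains. It replaces the attention propagation mechanism and graph neural network with a plain hop-based decay, so it shows only the ranking and chain-building idea; `decay` and the input layout are assumptions.

```python
from collections import defaultdict

def rank_diseases(scored_paths, node_types, decay=0.7):
    """Aggregate path evidence into a normalized confidence per disease node.
    `scored_paths` is a list of (path_nodes, path_score) pairs."""
    raw = defaultdict(float)
    chains = defaultdict(list)
    for nodes, score in scored_paths:
        for hops, node in enumerate(nodes):
            if node_types.get(node) == "disease":
                raw[node] += score * decay ** hops    # attenuate distant evidence
                chains[node].append(" -> ".join(nodes))
    total = sum(raw.values()) or 1.0
    ranked = sorted(raw.items(), key=lambda kv: kv[1], reverse=True)
    return [{"disease": d, "confidence": s / total, "reasoning_chain": chains[d]}
            for d, s in ranked]
```

Each returned entry pairs a candidate disease with the node chains that support it, which is the information a disease-image-symptom-location reasoning chain would be built from.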

[0034] As a preferred embodiment of the knowledge graph-based image diagnostic auxiliary decision-making method of the present invention, the specific steps for generating structured diagnostic results are as follows:

[0035] The candidate disease list and reasoning chain are analyzed to extract the confidence level, image performance features and symptom association information of disease nodes, and matched with image feature data to generate a disease confidence information set.

[0036] Based on the disease confidence information set, the disease name, confidence level, key imaging evidence, and semantic interpretation are structured, organized, and visualized to output structured diagnostic results.
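
One way to picture the structured diagnostic result of [0036] is a small record type serialized to JSON. The field names and the `min_confidence` filter below are hypothetical; the text only lists the kinds of information the result contains (disease name, confidence level, key imaging evidence, and semantic interpretation).

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class DiagnosisEntry:
    disease: str
    confidence: float
    key_imaging_evidence: list = field(default_factory=list)
    interpretation: str = ""

def to_structured_result(entries, min_confidence=0.0):
    """Sort candidates by confidence (high to low) and serialize them as JSON."""
    kept = sorted((e for e in entries if e.confidence >= min_confidence),
                  key=lambda e: e.confidence, reverse=True)
    return json.dumps({"candidates": [asdict(e) for e in kept]},
                      indent=2, ensure_ascii=False)
```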

[0037] In a second aspect, the present invention provides a computer device including a memory and a processor, wherein the memory stores a computer program, wherein when the computer program is executed by the processor, it implements any step of the knowledge graph-based image diagnosis auxiliary decision-making method as described in the first aspect of the present invention.

[0038] In a third aspect, the present invention provides a computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, it implements any step of the knowledge graph-based image diagnosis auxiliary decision-making method as described in the first aspect of the present invention.

[0039] The beneficial effects of this invention are as follows: by mapping the trimodal features of images, text and knowledge to a unified semantic space through contrastive learning and adaptive fusion, deep alignment and collaborative representation of multi-source medical data are achieved. This not only improves the coverage and accuracy of auxiliary diagnosis in complex cases, but also enhances the practicality of clinical auxiliary decision-making and doctors' trust.

Description of the Drawings

[0040] To more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the following description of the embodiments will be briefly introduced. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0041] Figure 1 is a flowchart of the knowledge graph-based image diagnosis auxiliary decision-making method.

[0042] Figure 2 is a flowchart of data preprocessing.

[0043] Figure 3 is a flowchart of knowledge graph instance generation.

[0044] Figure 4 is a flowchart of feature extraction and fusion.

Detailed Implementation

[0045] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, the specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

[0046] Many specific details are set forth in the following description in order to provide a full understanding of the invention. However, the invention may also be practiced in other ways different from those described herein, and those skilled in the art can make similar extensions without departing from the spirit of the invention. Therefore, the invention is not limited to the specific embodiments disclosed below.

[0047] Furthermore, the term "one embodiment" or "an embodiment" as used herein refers to a specific feature, structure, or characteristic that may be included in at least one implementation of the present invention. The phrase "in one embodiment" appearing in different places in this specification does not necessarily refer to the same embodiment, nor does it refer to separate or alternative embodiments that are mutually exclusive of other embodiments.

[0048] Referring to Figures 1-4, one embodiment of the present invention provides a knowledge graph-based image diagnosis auxiliary decision-making method, including the following steps:

[0049] S1. Collect image data and medical record text data, and perform preprocessing to obtain image feature data and text semantic data.

[0050] S1.1. Obtain the original image data file from the medical image acquisition device, decode it in DICOM format, and perform size normalization processing based on the spatial proportion information of the lesion area to generate a standardized image dataset.

[0051] Specifically, the raw image data files collected by the medical image acquisition device are exported to the data processing environment through the image acquisition interface. These files are usually in Digital Imaging and Communications in Medicine (DICOM) format and contain the image pixel matrix, the image acquisition parameters, and metadata about the examined body part.

[0052] The DICOM files are then decoded. The file header is parsed to extract the image frame sequence, resolution, gray-level depth, and spatial coordinate information, and the pixel data is restored to a processable two-dimensional or three-dimensional matrix. The gray-level values of the decoded image matrix are then unified to a common range, for example by standardizing the original gray-level range to floating-point numbers between 0 and 1.

[0053] Size normalization is performed based on the spatial proportion information of the lesion area in the image. The scaling factor in the longitudinal and transverse directions is obtained by extracting the spatial coordinate range of the lesion area and the spatial range of the overall structured image matrix. The structured image matrix is then spatially resampled according to the scaling factor to obtain a standardized image dataset.
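
A minimal sketch of the gray-level standardization and size normalization described in [0052] and [0053] follows. It assumes min-max scaling and nearest-neighbour resampling, and it derives the scaling factors so that the lesion spans a fixed fraction of the image; `target_fraction` is a hypothetical parameter, since the text only says the factors are obtained from the lesion range and the overall image range.

```python
def normalize_gray(matrix):
    """Min-max scale a 2D list of gray values into floats in [0, 1]."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:                       # constant image: avoid division by zero
        return [[0.0 for _ in row] for row in matrix]
    return [[(v - lo) / (hi - lo) for v in row] for row in matrix]

def scale_factors(lesion_box, image_shape, target_fraction=0.5):
    """Return (row_scale, col_scale) so the lesion spans `target_fraction`
    of the image along each axis (assumed rule, not taken from the source)."""
    r0, c0, r1, c1 = lesion_box
    rows, cols = image_shape
    return (target_fraction * rows / (r1 - r0),
            target_fraction * cols / (c1 - c0))

def resample_nearest(matrix, row_scale, col_scale):
    """Nearest-neighbour spatial resampling by the given scale factors."""
    rows, cols = len(matrix), len(matrix[0])
    new_rows = max(1, round(rows * row_scale))
    new_cols = max(1, round(cols * col_scale))
    return [
        [matrix[min(rows - 1, int(i / row_scale))][min(cols - 1, int(j / col_scale))]
         for j in range(new_cols)]
        for i in range(new_rows)
    ]
```

Decoding the DICOM file itself is left out here; in practice a dedicated DICOM library would supply the pixel matrix and header fields.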

[0054] S1.2. Based on the standardized image dataset, Gaussian filtering is used to remove image noise, and contrast stretching is performed on the image edge region to obtain an enhanced image dataset; preliminary region segmentation is performed on the enhanced image dataset, and lesion boundary coordinates, shape features and texture statistics are extracted to generate a structured image matrix.

[0055] Specifically, Gaussian filtering is used to smooth the standardized image dataset. High-frequency noise interference is reduced by calculating the weighted average value in the neighborhood of the image pixels, and image noise is removed. The Gaussian kernel parameter, such as the kernel radius or standard deviation, is selected based on the edge detail features of the lesion area in the image data, so as to reduce background noise while preserving the clarity of the lesion edge structure.

[0056] Contrast stretching is performed on the edge regions of images in the standardized image dataset. The dynamic range of image grayscale is expanded by linear grayscale transformation, making the grayscale difference between the lesion edge and the surrounding tissue more obvious, enhancing the visual contrast of the image and highlighting the lesion feature information, thus forming an enhanced image dataset.
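
The smoothing and enhancement steps in [0055] and [0056] can be sketched in plain Python as below. The kernel radius and standard deviation are illustrative defaults, border pixels are handled by clamping indices, and for simplicity the linear stretch is applied to the whole image rather than only to edge regions as the text describes.

```python
import math

def gaussian_kernel(radius, sigma):
    """Build a normalized (2*radius+1) x (2*radius+1) Gaussian kernel."""
    size = 2 * radius + 1
    k = [[math.exp(-((i - radius) ** 2 + (j - radius) ** 2) / (2 * sigma ** 2))
          for j in range(size)] for i in range(size)]
    total = sum(sum(row) for row in k)
    return [[v / total for v in row] for row in k]

def gaussian_filter(img, radius=1, sigma=1.0):
    """Weighted neighbourhood average; borders are handled by clamping indices."""
    k = gaussian_kernel(radius, sigma)
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for i in range(-radius, radius + 1):
                for j in range(-radius, radius + 1):
                    rr = min(max(r + i, 0), rows - 1)
                    cc = min(max(c + j, 0), cols - 1)
                    acc += k[i + radius][j + radius] * img[rr][cc]
            out[r][c] = acc
    return out

def contrast_stretch(img, out_lo=0.0, out_hi=1.0):
    """Linear gray-level transform mapping [min, max] onto [out_lo, out_hi]."""
    flat = [v for row in img for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[out_lo for _ in row] for row in img]
    gain = (out_hi - out_lo) / (hi - lo)
    return [[out_lo + (v - lo) * gain for v in row] for row in img]
```

A larger `sigma` suppresses more noise but also blurs lesion edges, which is the trade-off the text refers to when it ties the kernel parameters to edge detail.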

[0057] Based on the enhanced image dataset, pixel-by-pixel aggregation is performed on regions in the image with significant differences in brightness, texture, or edge features to distinguish lesion regions from background regions, and the image pixel set corresponding to the lesion region is extracted. After region differentiation, the coordinate data of the outer boundary of the lesion is extracted based on the gray-level change information of the lesion region and the background region. Geometric feature parameters of the lesion region are calculated based on the boundary coordinates, including shape indicators such as boundary length, lesion area, roundness, and aspect ratio. At the same time, the gray-level distribution of the lesion region is statistically analyzed to calculate the gray-level co-occurrence matrix features, including texture statistics such as contrast, energy, homogeneity, and entropy. By extracting the lesion boundary coordinates, shape features, and texture statistics, a structured image matrix is generated.
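
The shape indicators and gray-level co-occurrence matrix (GLCM) statistics named in [0057] can be computed as in the following sketch. It assumes the lesion boundary is an ordered, closed polygon and that gray levels have already been quantized to small integers; roundness is taken as 4πA/P², and energy as the sum of squared co-occurrence probabilities. These are common definitions, but the text does not state which ones it uses.

```python
import math
from collections import Counter

def shape_features(boundary):
    """Shape indicators from an ordered, closed polygon of (x, y) boundary points."""
    n = len(boundary)
    area2 = 0.0
    perimeter = 0.0
    for i in range(n):
        x0, y0 = boundary[i]
        x1, y1 = boundary[(i + 1) % n]
        area2 += x0 * y1 - x1 * y0                    # shoelace formula
        perimeter += math.hypot(x1 - x0, y1 - y0)
    area = abs(area2) / 2.0
    xs = [p[0] for p in boundary]
    ys = [p[1] for p in boundary]
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    return {
        "perimeter": perimeter,
        "area": area,
        "roundness": 4 * math.pi * area / perimeter ** 2,
        "aspect_ratio": width / height,
    }

def glcm_features(img, dx=1, dy=0):
    """Texture statistics from a normalized gray-level co-occurrence matrix.
    `img` holds integer gray levels; (dx, dy) is the pixel offset."""
    counts = Counter()
    rows, cols = len(img), len(img[0])
    for r in range(rows):
        for c in range(cols):
            rr, cc = r + dy, c + dx
            if 0 <= rr < rows and 0 <= cc < cols:
                counts[(img[r][c], img[rr][cc])] += 1
    total = sum(counts.values())
    feats = {"contrast": 0.0, "energy": 0.0, "homogeneity": 0.0, "entropy": 0.0}
    for (i, j), n in counts.items():
        p = n / total
        feats["contrast"] += (i - j) ** 2 * p
        feats["energy"] += p ** 2
        feats["homogeneity"] += p / (1 + abs(i - j))
        feats["entropy"] -= p * math.log2(p)
    return feats
```

Concatenating the boundary coordinates with these two feature dictionaries gives one row of something like the structured image matrix.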

[0058] S1.3. Extract patient medical record text data from electronic medical records, perform sentence segmentation, word segmentation and stop word removal to form a cleaned medical record text dataset, and use medical natural language processing algorithms to perform named entity recognition and dependency parsing to generate a structured medical semantic entity set.

[0059] Specifically, the electronic medical record file is parsed through the database reading interface, and the text paragraphs containing the patient's chief complaint, imaging examination description, laboratory examination results, past medical history, and the doctor's diagnostic conclusion are identified. The extracted medical record text is integrated in paragraph order to form a continuous text input.

[0060] Based on the continuous text input, the long text is divided into independent semantic sentences by identifying the positions of full stops, semicolons, line breaks, and other delimiters common in medical reports. Word segmentation is performed on each semantic sentence, using dictionary matching and probability statistics to divide the continuous character sequence into lexical units with medical significance. The word segmentation process separates medical terms, disease names, symptom descriptions, and anatomical site words into individual tokens. Common function words, conjunctions, and punctuation marks with no actual semantic meaning, such as the Chinese words "de" (的), "le" (了), and "bingqie" (并且, "and"), are filtered out through a stop word list to form a cleaned medical record text dataset.

[0061] Based on the cleaned medical record text dataset, medical natural language processing algorithms are used to perform named entity recognition. Entity recognition labels the medical concept types in the text, identifying medical semantic entities such as disease names, symptom manifestations, imaging signs, anatomical sites, and examination items, and records the start and end positions and category label of each entity. Dependency parsing then identifies syntactic structures such as subject-predicate, verb-object, modification, and coordination to establish the relationships between medical semantic entities. The medical semantic entities and semantic relationships extracted from the text are organized in a structured form to generate a structured medical semantic entity set.
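
The text-cleaning pipeline of [0059] to [0061] can be approximated with a dictionary-only sketch. It covers sentence splitting, forward maximum matching for word segmentation, stop word filtering, and dictionary-based entity tagging with character spans. The probability-statistics component and dependency parsing are omitted, and the stop word list and medical dictionary are tiny hypothetical samples.

```python
import re

STOP_WORDS = {"的", "了", "并且"}             # illustrative stop word list
MEDICAL_DICT = {                              # hypothetical term -> category
    "肺结节": "imaging_sign",
    "咳嗽": "symptom",
    "右肺上叶": "anatomical_site",
    "肺癌": "disease",
}

def split_sentences(text):
    """Split on full stops, semicolons and line breaks (Chinese and ASCII)."""
    return [s.strip() for s in re.split(r"[。；;\n]+", text) if s.strip()]

def segment(sentence, dictionary, max_len=4):
    """Forward maximum matching: take the longest dictionary word at each
    position, falling back to a single character."""
    tokens, i = [], 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append((piece, i, i + size))
                i += size
                break
    return tokens

def clean_and_tag(text):
    """Return per-sentence entity records with spans and category labels."""
    results = []
    vocab = set(MEDICAL_DICT) | STOP_WORDS
    for sent in split_sentences(text):
        tokens = [t for t in segment(sent, vocab) if t[0] not in STOP_WORDS]
        entities = [
            {"text": w, "start": s, "end": e, "label": MEDICAL_DICT[w]}
            for (w, s, e) in tokens if w in MEDICAL_DICT
        ]
        results.append({"sentence": sent, "entities": entities})
    return results
```

Calling `clean_and_tag("患者咳嗽并且发现右肺上叶肺结节。考虑肺癌；")` yields two sentences, with a symptom, an anatomical site, and an imaging sign tagged in the first and a disease tagged in the second.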

[0062] S1.4. Input the structured image matrix into the feature encoding network, extract the image depth features and generate image feature data; generate text semantic data by vectorizing and encoding the structured medical semantic entity set.

[0063] Specifically, the structured image matrix is fed to the input layer of the feature encoding network with a unified channel count and spatial resolution. The feature encoding network performs multi-layer convolution operations on the structured image matrix: a convolution kernel slides over the two-dimensional matrix and computes local weighted sums, extracting low-level gray-scale change and edge feature information. Each convolution operation is followed by a pooling operation, in which max pooling selects a representative feature value within a fixed window. As the number of layers increases, the stacked convolution and pooling operations yield image feature representations at different levels. The high-level feature map is flattened from a two-dimensional matrix into a one-dimensional vector, and a fully connected layer applies a linear transformation and non-linear activation to obtain the image depth features. The image depth features of all samples are arranged in sequence and formatted to form the image feature data.

[0064] For each medical semantic entity, a word embedding operation is performed. The word embedding operation is based on the co-occurrence relationship in the medical corpus, converting discrete words into dense continuous vectors. After vectorizing each medical semantic entity, a semantic vector sequence is generated. All semantic vectors are arranged and merged according to the entity order to form text semantic data.
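
To make the encoding pipeline of [0063] concrete, the sketch below runs a single convolution, ReLU, max-pooling, flatten, and fully connected stage in plain Python. It is a toy single-layer, single-channel version with hand-supplied weights, and ReLU is an assumed choice of activation; a real feature encoding network would stack many such layers and learn its parameters.

```python
def conv2d(img, kernel):
    """Valid 2D cross-correlation: slide the kernel and take local weighted sums."""
    kh, kw = len(kernel), len(kernel[0])
    rows, cols = len(img) - kh + 1, len(img[0]) - kw + 1
    return [
        [sum(kernel[i][j] * img[r + i][c + j] for i in range(kh) for j in range(kw))
         for c in range(cols)]
        for r in range(rows)
    ]

def relu(fmap):
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows/cols that do not fill a window are dropped."""
    rows, cols = len(fmap) // size, len(fmap[0]) // size
    return [
        [max(fmap[r * size + i][c * size + j] for i in range(size) for j in range(size))
         for c in range(cols)]
        for r in range(rows)
    ]

def dense(vec, weights, bias):
    """Fully connected layer (linear transformation) followed by ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(row, vec)) + b)
            for row, b in zip(weights, bias)]

def encode(img, kernel, weights, bias):
    fmap = max_pool(relu(conv2d(img, kernel)))
    flat = [v for row in fmap for v in row]           # flatten 2D -> 1D
    return dense(flat, weights, bias)
```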

[0065] S2. Perform entity recognition and semantic relation extraction on the cleaned medical record text dataset and medical knowledge resources, establish a structured medical knowledge association dataset, and generate knowledge graph instances.

[0066] S2.1. Perform entity recognition on the cleaned medical record text dataset and medical knowledge resources, extract medical entities such as diseases, symptoms, imaging signs and anatomical locations, classify them according to semantic categories, and generate a structured medical entity set.

[0067] Specifically, the natural language recognition method based on word matching and contextual feature analysis locates and annotates medical terms in the cleaned medical record text dataset. It identifies candidate words that may contain medical meanings through medical dictionaries or domain terminology databases, obtains the grammatical roles of candidate words in sentences, filters out medical terms with independent semantic meanings, performs semantic classification on the identified medical terms, marks words with disease meanings as disease entities, marks words describing clinical manifestations as symptom entities, marks words describing image observation features as image sign entities, and marks words representing the location of human body structures as anatomical site entities.

[0068] For each medical entity, the location index, contextual syntactic relationship, and semantic boundary information in the cleaned medical record text dataset are recorded, and a corresponding entity attribute table is established. The entity attribute table includes the entity name, category label, frequency of occurrence, and semantic context information. Through entity recognition and semantic classification, a complete set of disease entities, symptom entities, imaging sign entities, and anatomical site entities is extracted from the cleaned medical record text dataset and medical knowledge resources. All extracted entities are grouped and encoded according to semantic category, uniformly formatted into a structured data structure, and organized by entity type and attribute fields to generate a structured medical entity set.

[0069] S2.2. Extract the key attributes of each entity from the structured medical entity set and perform semantic normalization to generate a standardized entity set; perform dependency parsing and semantic role labeling on the standardized entity set to identify the semantic relationships between entities and generate a medical relationship set.

[0070] Specifically, key attribute information for each medical entity is extracted from the structured medical entity set. This key attribute information includes the entity name, entity category, semantic context, location of occurrence, syntactic dependency tags, and co-occurrence frequency with other entities. The corresponding attribute values are read through the field index of the entity attribute table and uniformly mapped to the standard attribute template. For entity names with multiple expressions under the same semantic category, such as "hypertension" and "primary hypertension", the semantically consistent entity names are normalized, mapping multiple language expressions to unified standard medical terms and replacing non-standard expressions with standardized names, while keeping the original entity category, context location, and attribute information unchanged, thus generating a standardized entity set.
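
The normalization step in [0070] amounts to a lookup from surface forms to a standard term while leaving other attributes untouched. A minimal sketch follows, assuming a hand-written synonym table and assuming that "hypertension" is the chosen standard name; the text does not say which form is treated as standard, and the `original_name` field is added here purely for traceability.

```python
SYNONYMS = {  # hypothetical mapping from surface forms to standard terms
    "primary hypertension": "hypertension",
    "high blood pressure": "hypertension",
}

def normalize_entities(entities, synonyms=SYNONYMS):
    """Replace non-standard names; all other attributes are copied unchanged."""
    normalized = []
    for ent in entities:
        key = ent["name"].strip().lower()
        std = synonyms.get(key, key)
        normalized.append({**ent, "name": std, "original_name": ent["name"]})
    return normalized
```

In practice the synonym table would come from a standard medical terminology rather than being hand-written.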

[0071] Dependency parsing and semantic role labeling are performed on the standardized entity set. The syntactic dependency relationship between entities in the sentence is identified by the syntactic parsing algorithm, and the syntactic hierarchical information of the subject, predicate, object and modification structure is extracted. Based on the dependency structure, the semantic role labeling identifies the semantic function of each entity in the sentence. For example, entities that represent diseases are labeled as central entities, entities that describe symptoms or imaging signs are labeled as feature entities, and entities that describe anatomical locations are labeled as location entities.

[0072] By performing dependency parsing and semantic role labeling on all sentences, the semantic relationship types between entities are identified, including semantic association forms such as disease-symptom relationship, disease-imaging sign relationship, and symptom-location relationship. Based on the analysis results, entity pairs and their corresponding semantic relationships are recorded as relation entries in a structured form, and all relation entries are organized into a medical relation set.

[0073] S2.3. The structured medical entity set, standardized entity set, and medical relation set are structured in the form of triples to generate a knowledge triple dataset; the knowledge triple dataset is vectorized and trained using a graph embedding algorithm, and a node and edge structure is constructed based on the entity connection relationship to generate a knowledge graph instance.

[0074] Specifically, after obtaining the structured medical entity set, the standardized entity set, and the medical relation set, the three types of data are formatted and standardized. Based on the semantic relation type in the medical relation set, the corresponding entity pairs are extracted from the structured medical entity set and the standardized entity set as the two ends of the relation. Each medical relation consists of a head entity, a relation type, and a tail entity; combining the three in a fixed structure forms a knowledge triple of the form entity-relation-entity. Taking the association between disease entities and symptom entities as an example, when the medical relation set records a disease-symptom relationship, the disease entity is used as the head entity, the symptom entity as the tail entity, and "manifestation" as the relation item, forming a complete triple record. All triple records together constitute the knowledge triple dataset.
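The triple construction described above can be sketched as follows. The relation labels other than "manifestation", the category names, and the sample entities are hypothetical and serve only to make the example self-contained.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    head: str
    relation: str
    tail: str

# Hypothetical mapping from (head category, tail category) to a relation label.
# Only "manifestation" is named in the text; the other labels are assumed.
RELATION_LABEL = {
    ("disease", "symptom"): "manifestation",
    ("disease", "imaging_sign"): "imaging_finding",
    ("symptom", "anatomical_site"): "located_at",
}

def build_triples(relations, categories):
    """Turn recorded entity pairs into (head, relation, tail) triples."""
    triples = []
    for head, tail in relations:
        label = RELATION_LABEL.get((categories[head], categories[tail]))
        if label is not None:  # skip pairs with no recognized relation type
            triples.append(Triple(head, label, tail))
    return triples

categories = {"pneumonia": "disease", "cough": "symptom", "lung": "anatomical_site"}
relations = [("pneumonia", "cough"), ("cough", "lung")]
triples = build_triples(relations, categories)
```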

[0075] Vectorization training is performed on the knowledge triple dataset. A graph embedding algorithm is used to map each entity and relation in the knowledge triple dataset into a continuous low-dimensional vector representation. The graph embedding algorithm initializes the entities and relations in the knowledge triples with vectors, calculates the distance between the head entity vector and the tail entity vector in the relation transformation space, optimizes the relative positions of entities and relations in the vector space, and continuously updates the parameters of the entity vectors and relation vectors. Each entity vector is used as a node vector and each relation vector is used as an edge vector. The association structure of nodes and edges is established based on the connection information between entities, and a complete knowledge topology is constructed. By combining the entity nodes with the relation edges that connect them, graph-structured data containing node attributes, relation types, and semantic weights is formed, and a knowledge graph instance is output.

[0076] S3. Extract features from image feature data and text semantic data to obtain image feature vectors, text semantic vectors and graph node vectors, and use contrastive learning methods to map them to a unified semantic space to obtain a fused trimodal feature matrix.

[0077] S3.1. Input the image feature data into the convolutional neural network model, perform multi-layer convolution, pooling and non-linear activation operations on the structured image matrix to generate a primary image feature map, and use fully connected layers and global average pooling operations to generate image feature vectors.

[0078] Specifically, the structured image matrix is input into the convolutional neural network model. The convolutional neural network model receives the pixel intensity distribution information of the structured image matrix at the input layer, and uses multiple convolutional kernels to perform convolution calculations on local regions in the first convolutional layer to capture low-level feature information in the image, such as edge texture and brightness changes. The convolutional layer outputs a feature map, and spatial downsampling is performed through the pooling layer. Max pooling is usually used to reduce the dimension of the feature map and retain the main feature responses. After the first round of convolution and pooling, the output result is input into a higher-level convolutional structure to extract intermediate and high-level semantic features of the image layer by layer, such as shape patterns, texture distribution and anatomical structure boundaries. After each convolution operation, a non-linear activation function, such as the ReLU function, is applied to the convolution result to perform non-linear mapping to obtain the primary image feature map.
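A minimal NumPy sketch of one convolution, activation, and max pooling stage is given below to make the data flow concrete. The 6x6 test image and the 3x3 vertical-edge kernel are assumptions chosen for illustration; a trained model would learn its kernels and stack many such stages.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid (no padding) 2-D cross-correlation, as used in CNN layers."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Non-linear activation applied after the convolution."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Non-overlapping max pooling; trailing rows/columns that do not fit are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

# Illustrative input: dark left half, bright right half (a vertical edge).
image = np.zeros((6, 6))
image[:, 3:] = 1.0
edge_kernel = np.array([[-1.0, 0.0, 1.0]] * 3)  # assumed hand-made kernel

feature_map = max_pool(relu(conv2d(image, edge_kernel)))
```

The kernel responds only where brightness changes from left to right, which is the kind of low-level edge feature the first convolutional layer is described as capturing.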

[0079] The primary image feature map is input into the fully connected layer, where the feature information of each convolutional channel is linearly combined and compressed. Global average pooling is then applied to the output of the fully connected layer to average the feature responses of each channel, integrating the global feature information from the different convolutional layers into a one-dimensional vector, which is output as the image feature vector.

[0080] It should be noted that the training of the convolutional neural network model uses a standardized image dataset with labels. The standardized image dataset is divided into a training set and a validation set. During training, the training set is input, and the convolutional neural network model performs forward propagation to calculate the prediction results. The difference between the prediction results and the true labels is obtained through a loss function (such as cross-entropy loss). The gradient of the loss function with respect to the parameters of each layer of the convolutional neural network model is calculated using the backpropagation algorithm. The convolutional kernel weights and bias parameters are iteratively updated using an optimization algorithm (such as stochastic gradient descent) to minimize the loss function, so that the convolutional neural network model can accurately extract image features. Training continues until the convolutional neural network model converges on the validation set, resulting in the trained convolutional neural network model.

[0081] S3.2. Input the text semantic data into the medical language understanding model, calculate the inter-word dependency relationship through word embedding, position encoding and self-attention mechanism, generate the text context semantic vector sequence, and perform sentence-level and discourse-level semantic aggregation to obtain the text semantic vector.

[0082] Specifically, the text semantic data is input into the medical language understanding model. The medical language understanding model performs word embedding operation on the text semantic data, mapping each word in the medical record text to a fixed-dimensional word vector representation. Position encoding information is added to the word vector representation, and each word is assigned an order identifier in the sentence, so that the medical language understanding model can distinguish the positional relationship of different words in the sentence structure.

[0083] The dependency relationship between words is calculated through a self-attention mechanism: the attention weight between each word vector and every other word vector in the sequence is computed, the context words most relevant to the current semantics are identified, and a word dependency relationship matrix is generated. The medical language understanding model performs a weighted summation of word features based on the attention weights to obtain the semantic representation of each word in its context, and generates a text context semantic vector sequence.
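The attention computation described above can be sketched with scaled dot-product self-attention. The source does not state which attention variant is used, so the scaling by the square root of the key dimension, the projection matrices `Wq`, `Wk`, `Wv`, and all dimensions below are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (n_words, d_model). Returns context vectors and the attention weights."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Row i holds the attention weights of word i over every word in the sequence.
    weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))  # 5 word vectors with position encoding already added
Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))
context, weights = self_attention(X, Wq, Wk, Wv)
```

Here `weights` plays the role of the word dependency relationship matrix, and each row of `context` is the weighted sum of word features for one word.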

[0084] Based on the text context semantic vector sequence, a weighted average or attention aggregation is performed on the context semantic vectors of all words in the same sentence to form a sentence-level semantic representation. All sentence-level semantic representations are aggregated according to the text semantic order. By global averaging or weighted summation, the information of the paragraph or the whole text is summarized into a unified high-dimensional semantic vector representation, and the text semantic vector is output.

[0085] It should be noted that, for training, medical text semantic data is collected and the text context in that data is used as input to establish the association between medical terms and their semantic context. The medical language understanding model adopts a neural network structure based on the self-attention mechanism. During the training phase, semantic understanding and context learning are carried out through a semantic mask prediction task and an inter-sentence relation prediction task, and the parameters are continuously adjusted with minimization of the prediction error as the objective. During the training iterations, the weights of the medical language understanding model are optimized through backpropagation, so that the model can accurately identify the semantic dependency relationships between disease names, imaging features, and symptom descriptions, yielding the trained medical language understanding model.

[0086] The medical language understanding model is based on a deep semantic coding network framework. It achieves semantic representation and contextual understanding of medical text through hierarchical semantic modeling. The model mainly consists of an input layer, a word embedding layer, a positional encoding layer, a self-attention computation layer, a multi-layer semantic coding layer, and an output layer. Features are transferred and semantics are aggregated between these layers via non-linear connections. The input layer receives the medical text sequence after word segmentation and text cleaning, mapping each word or medical term to a corresponding word index vector; the word embedding layer converts the input word index vectors into fixed-dimensional word vector representations, capturing the semantic similarity between medical terms; the positional encoding layer encodes the position of each word in the input sequence; the self-attention computation layer calculates the attention weights between words in the sequence to determine the semantic influence of different words in the context, realizing the weighted combination of semantic features to capture semantic associations across sentences or paragraphs; the multi-layer semantic coding layer uses multi-head self-attention structures and feedforward networks stacked alternately to perform contextual fusion and hierarchical aggregation on word-level semantic information, generating semantically complete sentence-level and document-level representations; the output layer converts the encoded semantic vectors into fixed-dimensional text semantic vectors through linear transformation and normalization.

[0087] It should also be noted that medical natural language processing algorithms automatically identify, extract, and structure medical semantic information from medical texts. By combining technologies such as word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, and semantic relation extraction, they achieve semantic parsing and knowledge representation of medical text content such as medical records, examination reports, and image descriptions.

[0088] S3.3. Input the entity nodes in the knowledge graph instance into the knowledge graph embedding model, perform vectorization training on the entities and relations, and generate graph node vectors.

[0089] Specifically, the entity nodes and relation edges in the knowledge graph instance are input into the knowledge graph embedding model, mapping each entity and relation to a low-dimensional dense vector. A negative sampling method is used to generate erroneous triples as negative samples. An optimization algorithm widens the gap between the scores of positive and negative sample triples, ensuring that in the vector space, entities with relations are close to each other, while entities without relations are far apart. After training, the knowledge graph embedding model generates a numerical vector representation for each entity node in the knowledge graph instance, i.e., a graph node vector.

[0090] It should be noted that the knowledge graph embedding model is trained using triples in the knowledge graph as the basic training unit. Entities and relations are initialized as vector representations. Positive sample triples are taken from the knowledge graph, and negative sample triples are generated by randomly replacing entities or relations; the semantic similarity difference between entities and relations in the triples is then calculated. With the score defined as a distance in the vector space (a lower score meaning a more plausible triple), the optimization objective is to minimize the positive sample score and maximize the negative sample score. The distribution of entity vectors and relation vectors is continuously adjusted using a gradient descent-based parameter update method, so that real triples are ranked as more plausible than fake triples in the vector space. After multiple rounds of iterative training, the knowledge graph embedding model can learn the potential semantic associations between entities and relations, resulting in the trained knowledge graph embedding model.
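The training scheme described above can be sketched with a translation-based score, in which a triple is scored by the distance between (head vector + relation vector) and the tail vector. The source does not name a specific embedding model, so this TransE-style score, the margin loss, the learning rate, and the toy entities are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
entities = ["pneumonia", "cough", "lung", "fracture"]  # toy vocabulary (assumed)
relations = ["manifestation"]
dim = 8
E = rng.normal(scale=0.1, size=(len(entities), dim))   # entity vectors
R = rng.normal(scale=0.1, size=(len(relations), dim))  # relation vectors

def distance(h, r, t):
    """Translation-based score: lower means a more plausible triple."""
    return np.linalg.norm(E[h] + R[r] - E[t])

def margin_loss(pos, neg, margin=1.0):
    """Zero once the negative triple is at least `margin` farther than the positive."""
    return max(0.0, margin + distance(*pos) - distance(*neg))

def sgd_step(pos, neg, lr=0.05, margin=1.0):
    """One gradient step: pull the positive triple together, push the negative apart."""
    if margin_loss(pos, neg, margin) == 0.0:
        return
    updates = []
    for (h, r, t), sign in ((pos, 1.0), (neg, -1.0)):
        diff = E[h] + R[r] - E[t]
        updates.append((h, r, t, sign * diff / (np.linalg.norm(diff) + 1e-12)))
    for h, r, t, grad in updates:
        E[h] -= lr * grad
        R[r] -= lr * grad
        E[t] += lr * grad

pos = (0, 0, 1)  # (pneumonia, manifestation, cough): a real triple
neg = (0, 0, 3)  # tail replaced at random by "fracture": a negative sample
for _ in range(200):
    sgd_step(pos, neg)
```

After training, the positive triple has the smaller distance, which matches the stated goal that related entities end up close together and unrelated ones far apart.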

[0091] S3.4. Perform fully connected mapping and nonlinear activation processing on the image feature vector, text semantic vector, and graph node vector respectively, and embed them into a unified latent semantic space to generate an embedded feature representation set.

[0092] Specifically, fully connected mapping is performed on the image feature vectors, text semantic vectors, and graph node vectors, respectively. That is, each vector is projected into a new space through an independent linear transformation layer. Non-linear activation processing, such as the ReLU activation function, is applied to the projected results. The three modal vectors after fully connected mapping and non-linear activation processing are embedded into a latent semantic space of unified dimension. In this example, the dimension of the unified semantic space is 256. In this space, vectors from different modalities but with semantic relevance are geometrically close to each other, generating an embedded feature representation set.
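A minimal sketch of the per-modality projection follows. Only the 256-dimensional target space and the use of ReLU come from the text; the input dimensions (512, 768, 100), the random initialization, and the function names are assumptions, and in practice the projection weights would be learned.

```python
import numpy as np

rng = np.random.default_rng(0)
UNIFIED_DIM = 256  # dimension of the unified semantic space given in the example

def make_projection(in_dim, out_dim=UNIFIED_DIM):
    """Independent linear layer per modality (random init; learned in practice)."""
    W = rng.normal(scale=np.sqrt(2.0 / in_dim), size=(in_dim, out_dim))
    b = np.zeros(out_dim)
    return W, b

def project(x, params):
    """Fully connected mapping followed by ReLU."""
    W, b = params
    return np.maximum(x @ W + b, 0.0)

# Hypothetical input dimensions for the three modalities.
input_dims = {"image": 512, "text": 768, "graph": 100}
params = {name: make_projection(d) for name, d in input_dims.items()}
vectors = {name: rng.normal(size=d) for name, d in input_dims.items()}
embedded = {name: project(vectors[name], params[name]) for name in input_dims}
```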

[0093] S3.5. Based on the embedded feature representation set, construct positive sample pairs and negative sample pairs according to the correspondence between medical record text and structured medical semantic entity set, calculate the cosine similarity of image feature vector, text semantic vector and graph node vector, and generate cross-modal similarity matrix.

[0094] Specifically, based on the correspondence between the medical record text and the structured medical semantic entity set obtained from the medical record text through named entity recognition and dependency parsing, positive sample pairs and negative sample pairs are constructed. Positive sample pairs consist of different modal vectors describing the same medical fact. For example, a positive sample pair is formed by extracting a text semantic vector pointing to a specific disease from the same medical record and the graph node vector corresponding to that disease. Negative sample pairs are constructed by randomly replacing one of the vectors in the positive sample pairs. For example, a negative sample pair is formed by combining the text semantic vector of a correct medical record with the graph node vector of a randomly selected unrelated disease. The cosine similarity between the image feature vector, text semantic vector, and graph node vector is calculated, that is, the direction cosine value of each pair of different modal vectors in the unified latent semantic space is calculated to generate a cross-modal similarity matrix.

[0095] It should be noted that the expression for calculating the cosine similarity between image feature vectors, text semantic vectors, and graph node vectors is as follows:

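[0096] $\mathrm{sim}(\mathbf{u}, \mathbf{v}) = \dfrac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert}$;

[0097] where $\mathrm{sim}(\mathbf{u}, \mathbf{v})$ represents the cosine similarity between vector $\mathbf{u}$ and vector $\mathbf{v}$; $\mathbf{u}$ represents any vector among the image feature vectors, text semantic vectors, or graph node vectors; $\mathbf{v}$ represents any vector among the image feature vectors, text semantic vectors, or graph node vectors; $\lVert \mathbf{u} \rVert$ indicates the Euclidean norm of vector $\mathbf{u}$; and $\lVert \mathbf{v} \rVert$ indicates the Euclidean norm of vector $\mathbf{v}$.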
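The cross-modal similarity matrix built from this cosine similarity can be sketched as below. The function name and the small `eps` guard against zero-length vectors are assumptions added for the example.

```python
import numpy as np

def cosine_similarity_matrix(A, B, eps=1e-12):
    """Entry (i, j) is the cosine similarity between row i of A and row j of B."""
    A_unit = A / np.maximum(np.linalg.norm(A, axis=1, keepdims=True), eps)
    B_unit = B / np.maximum(np.linalg.norm(B, axis=1, keepdims=True), eps)
    return A_unit @ B_unit.T

# Toy vectors from two modalities already embedded in the same space.
text_vectors = np.array([[1.0, 0.0], [0.0, 2.0]])
graph_vectors = np.array([[2.0, 0.0], [1.0, 1.0]])
similarity = cosine_similarity_matrix(text_vectors, graph_vectors)
```

Because each row is divided by its own norm first, the result depends only on the direction of the vectors, not on their length.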

[0098] S3.6. By minimizing the distance between positive samples and maximizing the distance between negative samples, the distribution relationship of the cross-modal similarity matrix in the semantic space is optimized to obtain the trimodal feature parameter set.

[0099] Specifically, based on the matching relationship between image feature vectors, text semantic vectors and graph node vectors in the cross-modal similarity matrix, the corresponding combinations of positive sample pairs and negative sample pairs are determined. Positive sample pairs refer to sample combinations in which image feature vectors, text semantic vectors and graph node vectors have a high degree of semantic correlation, while negative sample pairs refer to sample combinations in which the semantic correlation is low or there is no direct semantic correspondence.

[0100] For each pair of samples, the cosine similarity value between the image feature vector, text semantic vector, and graph node vector is calculated. The distribution relationship between positive and negative sample pairs in the semantic space is controlled by optimizing the objective function. That is, during the optimization process, the distance between the feature vectors of positive sample pairs is minimized, so that the image feature vector, text semantic vector, and graph node vector cluster more tightly in the semantic space. At the same time, the distance between the feature vectors of negative sample pairs is maximized, so that semantically irrelevant features are more dispersed in the semantic space. During the optimization process, the embedding parameters of the feature vectors are updated by iterative calculation, and the semantic mapping direction of the image feature vector, text semantic vector, and graph node vector is continuously adjusted. After multiple rounds of optimization iterations, a trimodal feature parameter set is formed.
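One common way to realize such an objective is an InfoNCE-style contrastive loss, sketched below for two modalities. The source does not specify the loss function, so the loss form, the temperature value 0.07, and the convention that row i of each matrix forms a positive pair are assumptions.

```python
import numpy as np

def info_nce(A, B, temperature=0.07):
    """Contrastive loss: row i of A and row i of B are a positive pair;
    every other row of B acts as a negative for row i of A."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    logits = A @ B.T / temperature                      # scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

# Toy check: matched pairs give a low loss, mismatched pairs a high one.
A = np.eye(3)
aligned_loss = info_nce(A, A)
shuffled_loss = info_nce(A, np.roll(A, 1, axis=0))
```

Minimizing this loss raises the similarity of positive pairs relative to negative pairs, which corresponds to pulling positives together and pushing negatives apart in the semantic space.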

[0101] S3.7. Assign adaptive weights to image, text and knowledge features in the trimodal feature parameter set, perform weighted fusion, and arrange them according to the sample sequence to form a high-dimensional matrix to obtain the fused trimodal feature matrix.

[0102] Specifically, based on the feature distribution of image feature vectors, text semantic vectors and graph node vectors in the trimodal feature parameter set, the relevance weight of each feature in the semantic space is calculated, and the adaptive weight allocation ratio is determined by the similarity and variance distribution between features.

[0103] The adaptive weights are dynamically adjusted based on the semantic contribution of each feature in the trimodal feature parameter set. The semantic contribution can be estimated by the mean squared error, average similarity, or information entropy of the feature vector. For example, when the feature variance is large, the adaptive weight can be relatively reduced to limit the fluctuation that the feature introduces into the fusion result. The image feature vector, text semantic vector, and knowledge graph node vector are weighted according to the weight ratio. The feature values of each corresponding dimension are linearly combined by element-wise weighted summation to generate the fused feature vector. Following the order of the original sample input sequence, all fused feature vectors are arranged by sample number to construct the fused trimodal feature matrix. In the fused trimodal feature matrix, each row or column corresponds to the fused feature expression of one sample, and each element represents the weighted fusion value of image features, text semantic features, and knowledge graph node features in the semantic space.
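The weighted fusion can be sketched with one scalar weight per modality, set inversely proportional to that modality's feature variance so that a high-variance modality contributes less. The inverse-variance rule, the per-modality (rather than per-feature) granularity, and the synthetic data are assumptions; the text leaves the exact weighting formula open.

```python
import numpy as np

def adaptive_weights(image, text, graph, eps=1e-8):
    """One weight per modality, inversely proportional to its variance."""
    variances = np.array([m.var() for m in (image, text, graph)])
    inverse = 1.0 / (variances + eps)
    return inverse / inverse.sum()  # weights sum to 1

def fuse(image, text, graph):
    """Element-wise weighted sum; row i of the result is the fused vector of sample i."""
    w = adaptive_weights(image, text, graph)
    return w[0] * image + w[1] * text + w[2] * graph, w

rng = np.random.default_rng(0)
n_samples, dim = 4, 256
image = rng.normal(scale=1.0, size=(n_samples, dim))
text = rng.normal(scale=2.0, size=(n_samples, dim))   # noisiest modality
graph = rng.normal(scale=0.5, size=(n_samples, dim))  # most stable modality
fused_matrix, weights = fuse(image, text, graph)
```

In this toy run the text modality has the largest variance and therefore receives the smallest weight, while the graph modality receives the largest.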

[0104] S4. Match the fused trimodal feature matrix with knowledge graph instances to retrieve the most relevant disease nodes, image manifestation nodes, and relationship paths, and output a set of candidate diagnostic paths.

[0105] S4.1. Perform L2 normalization on the various types of feature vectors in the fused trimodal feature matrix to generate a fused feature vector set; extract the embedding vectors of disease nodes, image manifestation nodes and anatomical site nodes from the knowledge graph instance, and establish a high-dimensional index structure for each node vector to generate a node vector index set.

[0106] Specifically, L2 normalization is performed on the image feature vector, text semantic vector, and knowledge graph node feature vector in the fused trimodal feature matrix. The L2 normalization process calculates the Euclidean norm of each feature vector and divides all components of the feature vector by that norm, bringing the features of the different modalities to a consistent numerical scale and avoiding the impact of differences in their units and magnitudes.

[0107] When performing L2 normalization, if the Euclidean norm of the feature vector is zero, a very small positive number can be set as a substitute value to complete the normalization operation. After L2 normalization, a fused feature vector set is obtained, consisting of standardized image feature vectors, standardized text semantic vectors, and standardized knowledge graph node feature vectors. Disease node embedding vectors, image manifestation node embedding vectors, and anatomical site node embedding vectors are extracted from the knowledge graph instances. Each node embedding vector contains structural expression information in the semantic space of the knowledge graph. A high-dimensional index structure is established for each extracted node embedding vector. The high-dimensional index structure is constructed using existing approximate nearest neighbor indexing methods, such as tree-based partitioning or hash mapping-based indexing. By spatially partitioning and indexing the node embedding vectors in the high-dimensional semantic space, the disease node embedding vector index, image manifestation node embedding vector index, and anatomical site node embedding vector index are organized and summarized according to node category and index number to generate a node vector index set.

[0108] S4.2. Calculate the cosine similarity between the fused feature vector and the node vector index set to obtain a candidate node set. Starting from the candidate node set, perform multi-hop queries along the edge relationships of the knowledge graph instances and extract intermediate relationship entities to generate a candidate relationship path set.

[0109] Specifically, the cosine similarity between the fused feature vector and the node vector index set is calculated. That is, for each fused feature vector obtained after L2 normalization of the fused trimodal feature matrix, the cosine similarity between it and the embedding vector of each disease node, image manifestation node and anatomical site node in the node vector index set is calculated. The nodes are sorted from high to low according to the cosine similarity value, and the nodes with the highest similarity (the top 10 nodes in this example) are selected to form a candidate node set.
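The normalization and retrieval steps can be sketched as follows. A brute-force scan stands in here for the approximate nearest neighbor index, and the function names, the `eps` substitute value, and the toy node vectors are assumptions for illustration.

```python
import numpy as np

def l2_normalize(X, eps=1e-12):
    """Scale vectors to unit length; a zero norm is replaced by a tiny positive value."""
    norms = np.linalg.norm(X, axis=-1, keepdims=True)
    return X / np.where(norms == 0.0, eps, norms)

def top_k_nodes(query, node_vectors, node_ids, k=10):
    """Rank nodes by cosine similarity to the query and keep the k best."""
    sims = l2_normalize(node_vectors) @ l2_normalize(query)
    order = np.argsort(-sims)[:k]  # indices sorted from high to low similarity
    return [(node_ids[i], float(sims[i])) for i in order]

node_ids = ["a", "b", "c", "zero"]
node_vectors = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
query = np.array([1.0, 0.1])
candidates = top_k_nodes(query, node_vectors, node_ids, k=2)
```

After L2 normalization the dot product equals the cosine similarity, so a single matrix-vector product scores every node. For a large graph, the scan would be replaced by a tree-based or hash-based index as the text describes.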

[0110] Using each node in the candidate node set as the starting point for the query, multi-hop queries are performed along the edge relationships in the knowledge graph instance. For example, starting from a certain image representation node, the directly connected disease nodes are traversed (one-hop query), and from the disease node, the connected symptom nodes or anatomical site nodes are traversed (two-hop query). During the multi-hop query process, all intermediate relation entities passed through the path are extracted, including the relation edges between intermediate nodes and connecting nodes, to generate a candidate relation path set.
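The multi-hop query can be sketched as a breadth-first traversal that records the relation edges along each path. The adjacency-list format, the relation names, and the two-hop limit are assumptions chosen to mirror the example in the paragraph above.

```python
from collections import deque

def multi_hop_paths(graph, start, max_hops=2):
    """graph: {node: [(relation, neighbor), ...]}.
    Returns paths as alternating lists [node, relation, node, ...]."""
    paths = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        hops = len(path) // 2  # number of edges traversed so far
        if hops > 0:
            paths.append(path)
        if hops == max_hops:
            continue
        for relation, neighbor in graph.get(path[-1], []):
            if neighbor not in path[::2]:  # do not revisit a node on this path
                queue.append(path + [relation, neighbor])
    return paths

# Hypothetical knowledge graph fragment.
graph = {
    "ground_glass_opacity": [("indicates", "pneumonia")],
    "pneumonia": [("manifestation", "cough"), ("located_at", "lung")],
}
paths = multi_hop_paths(graph, "ground_glass_opacity")
```

Starting from the image manifestation node, the traversal yields one one-hop path to the disease node and two two-hop paths that continue to the symptom and anatomical site nodes.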

[0111] S4.3. Based on the semantic similarity, relation weight, and path length between nodes in the candidate relation path set, calculate the comprehensive relevance score of each path to obtain a weighted candidate path list, and extract the structural information of disease nodes, imaging manifestation nodes, and relation chains to obtain a candidate diagnostic path set.

[0112] Specifically, based on the candidate relation path set, for each path in the candidate relation path set, the semantic similarity between adjacent nodes in the path is calculated. The similarity is based on the vector representation of the nodes in the unified semantic space. The cosine value of the angle between the node vectors is calculated by the cosine similarity method. The relation weight of each relation edge in the path is obtained. The relation weight reflects the strength of the connection in the medical relation. For example, the relation weight of "diagnosed as" is higher than the relation weight of "may cause". The length of the path is recorded, that is, the number of edges traversed by the path from the starting point to the ending point.

[0113] Based on semantic similarity, relation weights, and path length, a comprehensive relevance score for each path is calculated through weighted summation. Semantic similarity reflects the degree of semantic matching between disease nodes and image manifestation nodes, relation weights reflect the importance of the relation edges in the path, and path length describes the semantic distance between disease nodes and image manifestation nodes. The path length enters the score in reciprocal form to penalize long paths. Each weight parameter can be learned automatically by minimizing the path relevance error during the training phase. Example values are a semantic similarity weight of 0.5, a relation weight of 0.3, and a path length weight of 0.2. The paths are sorted in descending order of comprehensive relevance score to generate a weighted candidate path list. The structural information of each path is extracted from the weighted candidate path list, including the disease nodes and image manifestation nodes that appear in the path and the relation chains (i.e., relation edge sequences) connecting them, to generate a candidate diagnostic path set.
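The scoring rule can be sketched with the example weights 0.5, 0.3, and 0.2 from the text. Averaging the per-edge similarities and relation weights along a path is an assumption, as are the path names and the numeric values below.

```python
def path_score(similarities, relation_weights, w_sim=0.5, w_rel=0.3, w_len=0.2):
    """Weighted sum of mean adjacent-node similarity, mean relation weight,
    and the reciprocal of the path length (number of edges)."""
    length = len(relation_weights)
    mean_sim = sum(similarities) / len(similarities)
    mean_rel = sum(relation_weights) / length
    return w_sim * mean_sim + w_rel * mean_rel + w_len * (1.0 / length)

# Hypothetical paths: (per-edge similarities, per-edge relation weights).
candidate_paths = {
    "one_hop": ([0.9], [0.8]),
    "two_hop": ([0.9, 0.7], [0.8, 0.6]),
}
scores = {name: path_score(s, r) for name, (s, r) in candidate_paths.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

For the one-hop path the score is 0.5 x 0.9 + 0.3 x 0.8 + 0.2 x 1 = 0.89; for the two-hop path it is 0.5 x 0.8 + 0.3 x 0.7 + 0.2 x 0.5 = 0.71, so the shorter path ranks first.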

[0114] S5. Perform knowledge reasoning on the candidate diagnostic path set, calculate the confidence of disease nodes based on weighted calculations, and generate a candidate disease list and reasoning chain.

[0115] S5.1. Extract the attribute information of disease, image and symptom nodes within the candidate diagnostic path set, generate the path feature description set, and pass and attenuate it through the attention propagation mechanism to obtain the weighted node feature set.

[0116] Specifically, each path in the candidate diagnostic path set is traversed, and the attribute information of disease nodes, imaging manifestation nodes, and symptom nodes contained within each path is extracted. The attribute information of disease nodes includes disease name, disease category, and semantic identifier of the disease in the knowledge graph instance; the attribute information of imaging manifestation nodes includes image manifestation name, image feature description, and information on the corresponding anatomical location of the image; the attribute information of symptom nodes includes symptom name, symptom manifestation features, and the association weight between the symptom and the disease; the attribute information is integrated according to node category and path order to form a structured path feature description set.

[0117] Based on the connection relationships between nodes in the path, an attention propagation mechanism is applied to transmit and attenuate information about path features. By calculating the importance of the association between different nodes, attention weights are assigned to the information propagation between nodes. During the propagation process, nodes closer to the starting point of the path have relatively higher weights, while the weights of nodes farther away gradually decrease, thereby achieving the orderly diffusion and importance stratification of semantic information in the path. The attention weights are determined by the semantic similarity and connection strength between nodes, and the node features are updated in a weighted manner according to the weight coefficients after each propagation. The update results of the node features in all paths are summarized to generate a weighted node feature set.
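A simplified sketch of the transmit-and-attenuate behaviour follows. The geometric decay factor of 0.8, the multiplication by edge strength, and the normalization of the weights are assumptions; the source describes the mechanism only qualitatively.

```python
import numpy as np

def propagate(features, edge_strengths, decay=0.8):
    """features: (n, d) node features in path order.
    edge_strengths[i]: connection strength of the edge from node i to node i + 1.
    Returns weighted node features and the normalized attention weights."""
    features = np.asarray(features, dtype=float)
    n = len(features)
    weights = np.ones(n)
    for i in range(1, n):
        # Each hop attenuates the weight by the decay factor and the edge strength.
        weights[i] = weights[i - 1] * decay * edge_strengths[i - 1]
    weights /= weights.sum()
    return features * weights[:, None], weights

path_features = np.ones((3, 2))  # three nodes on one path, 2-dim features
weighted_features, attention = propagate(path_features, edge_strengths=[0.9, 0.5])
```

With edge strengths in the range (0, 1], the weights decrease monotonically along the path, so nodes near the starting point keep more influence than distant ones.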

[0118] S5.2. Input the weighted node feature set into the graph neural network model, calculate the semantic correlation between disease nodes and adjacent nodes through adjacency propagation and aggregation operations, generate a disease node representation vector set, perform logical constraint verification, filter out semantic conflict paths and adjust abnormal edge weights to form a disease node set.

[0119] Specifically, the weighted node feature set is input into the graph neural network model. The graph neural network model takes the weighted node feature set as input and the node connection relationship in the knowledge graph instance as the adjacency structure. Through multi-layer propagation and aggregation operations, feature information is transmitted in the graph structure. Based on the adjacency relationship between nodes, feature information is collected from the neighboring nodes of each node, and the features of neighboring nodes are weighted and aggregated to generate an intermediate representation containing neighborhood semantic features.

[0120] In the aggregation phase, the graph neural network model fuses the features of the current node with the features of its neighbors to reflect the semantic dependence between disease nodes, image manifestation nodes, and symptom nodes. After multiple layers of adjacency propagation and aggregation, the feature representation results of each disease node are obtained, forming a disease node representation vector set.
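One propagation and aggregation layer can be sketched as a mean over each node and its neighbors, followed by a linear transform and ReLU. The source does not fix the aggregation rule, so the mean aggregator with self-loops, the identity weight matrix, and the three-node graph are assumptions.

```python
import numpy as np

def gnn_layer(H, A, W):
    """H: (n, d) node features, A: (n, n) adjacency matrix, W: (d, d_out) weights.
    Each node averages its own features with those of its neighbors."""
    A_hat = A + np.eye(A.shape[0])  # self-loops keep the node's own features
    D_inv = 1.0 / A_hat.sum(axis=1, keepdims=True)
    return np.maximum((D_inv * A_hat) @ H @ W, 0.0)

# Toy chain: image manifestation (0) - disease (1) - symptom (2).
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
H = np.eye(3)  # one-hot features make the mixing easy to read
W = np.eye(3)  # identity weights (learned in practice)
H_next = gnn_layer(H, A, W)
```

After one layer, the disease node's representation is an equal mix of all three nodes, reflecting its dependence on both the image manifestation and the symptom. Stacking layers widens the neighborhood each node can see.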

[0121] Logical constraint verification is performed on the disease node representation vector set. By detecting the logical consistency of node relationships in the knowledge graph instance, semantic conflict paths are filtered out. Semantic conflict paths refer to paths in which the node relationships have directional errors, semantic reversals, or repeated loops. During the verification process, the logical constraint verification compares the relationship types, connection directions, and semantic consistency between nodes. After anomalies are found, conflict paths are removed or adjusted, and nodes and connection relationships with reasonable semantic structure and high relevance are retained to form a disease node set.

[0122] It should be noted that, in training the graph neural network model, the initial feature vector of each node in the graph is used as input, and the connection relationships between nodes are given by the adjacency matrix. In each propagation stage, feature information from neighboring nodes is aggregated, combined with the current node's own features by weighting, and passed through a nonlinear transformation to update the node features, so that they incorporate the semantic information of the local neighborhood. Stacking multiple graph convolutional layers or attention layers gradually expands each node's receptive field, so that the node representation can reflect global structural features. The training objective is a node classification, relationship prediction, or graph-level classification task. A loss function is constructed to measure the error between the prediction result and the true label, and the backpropagation algorithm is used to update the network parameters so as to minimize this loss. After iterative training, the trained graph neural network model is obtained.

[0123] S5.3. Based on the disease node set, calculate the comprehensive confidence scores and sort them from high to low to generate a candidate disease list; extract the relevant image manifestation nodes, symptom nodes and relationship information for the candidate disease list, construct a disease-image-symptom-site reasoning chain, and output the reasoning chain.

[0124] Specifically, the comprehensive confidence of each disease node in the disease node set is calculated from its semantic relevance features, node attribute features, and edge weight information. The comprehensive confidence is obtained as a weighted sum of the semantic similarity and relationship weight between the disease node and its adjacent image manifestation nodes, symptom nodes, and anatomical site nodes. After the comprehensive confidence of all disease nodes has been calculated, the disease nodes are sorted from high to low confidence to generate a candidate disease list.
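A minimal sketch of this scoring and ranking step is given below. It assumes that each neighbor contributes its semantic similarity weighted by the relationship weight, and that the sum is normalised by the total weight; the normalisation, the function names, and the sample values are assumptions, not the formula of the invention.

```python
def comprehensive_confidence(neighbours):
    """Weighted average of semantic similarity, weighted by relation weight.

    `neighbours` is a list of (semantic_similarity, relation_weight) pairs,
    one per adjacent image manifestation, symptom, or anatomical site node.
    """
    total_weight = sum(weight for _, weight in neighbours)
    if total_weight == 0:
        return 0.0
    return sum(sim * weight for sim, weight in neighbours) / total_weight

def rank_diseases(disease_neighbours):
    """Return (disease, confidence) pairs sorted from high to low confidence."""
    scores = {d: comprehensive_confidence(n) for d, n in disease_neighbours.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical input: two disease nodes with two neighbours each.
candidates = {
    "tuberculosis": [(0.4, 1.0), (0.6, 1.0)],
    "pneumonia": [(0.9, 1.0), (0.7, 1.0)],
}
candidate_disease_list = rank_diseases(candidates)
```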

[0125] Based on the semantic relationships of each disease node in the candidate disease list, image manifestation nodes, symptom nodes, and anatomical site nodes directly connected to the disease node are extracted. At the same time, the connection relationship information between the disease node and the image manifestation node, symptom node, and anatomical site node is extracted. The disease node, image manifestation node, symptom node, and anatomical site node are integrated in the order of relational connection to construct a disease-image-symptom-site reasoning chain. The reasoning chain takes the disease node as the core and displays the correspondence between disease-related image features and symptoms through the semantic relationship paths between nodes. The disease-image-symptom-site reasoning chain is output to assist subsequent diagnostic reasoning and result interpretation.
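One way to picture the reasoning chain is as a record that groups the nodes directly connected to a disease node by node type, together with the relation that links them. The sketch below assumes edges are stored as (head, relation, tail) triples; the function `build_reasoning_chain`, the type names, and the sample data are hypothetical.

```python
def build_reasoning_chain(disease, edges, node_types):
    """Collect nodes directly connected to `disease`, grouped by node type."""
    chain = {
        "disease": disease,
        "image_manifestation": [],
        "symptom": [],
        "anatomical_site": [],
    }
    for head, relation, tail in edges:
        if head != disease:
            continue  # only direct connections of this disease node
        kind = node_types.get(tail)
        if kind in ("image_manifestation", "symptom", "anatomical_site"):
            chain[kind].append({"node": tail, "relation": relation})
    return chain

node_types = {
    "pneumonia": "disease",
    "consolidation": "image_manifestation",
    "cough": "symptom",
    "left lower lobe": "anatomical_site",
    "tuberculosis": "disease",
}
edges = [
    ("pneumonia", "has_manifestation", "consolidation"),
    ("pneumonia", "has_symptom", "cough"),
    ("pneumonia", "affects", "left lower lobe"),
    ("tuberculosis", "has_symptom", "cough"),
]
chain = build_reasoning_chain("pneumonia", edges, node_types)
```

The dictionary keys follow the disease, image, symptom, site order of the chain, so serialising the record preserves that order.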

[0126] S6. Use the candidate disease list and the reasoning chain to assist in diagnosis and generate structured diagnostic results.

[0127] S6.1. Analyze the candidate disease list and reasoning chain, extract the confidence level, image manifestation features and symptom association information of the disease nodes, and match them with the image feature data to generate a disease confidence information set.

[0128] Specifically, the candidate disease list is analyzed item by item, extracting the confidence value, semantic attribute information, and unique identifier of each disease node in the knowledge graph instance. The disease-image-symptom-site reasoning chain is structured and analyzed. Based on the connection order of the nodes in the reasoning chain, the image manifestation nodes and symptom nodes directly associated with the disease nodes are extracted. The image manifestation node information includes image feature description, lesion location description, and image type information. The symptom node information includes symptom name, symptom manifestation characteristics, and semantic association strength with the disease node. The confidence information of the disease node is integrated with the corresponding image manifestation characteristics and symptom association information to form a set of disease semantic association features.

[0129] The disease semantic association feature set is matched with the image feature data. The correspondence is determined by comparing the image manifestation features with the shape features, texture statistics and spatial location information contained in the image feature data. The lesion area description in the image manifestation features is used as the key matching basis, and the similarity is judged by combining it with the lesion boundary coordinates and spatial proportion information in the image feature data. After all matching is completed, the disease node confidence, the corresponding image features and the related symptom information contained in the matching results are integrated to generate a disease confidence information set.
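The text does not define how the similarity is computed. Purely as an assumed illustration, the sketch below scores a match by combining the intersection-over-union (IoU) of an expected lesion region and the detected lesion boundary box with the closeness of their spatial proportions. The choice of IoU, the weight `alpha`, and the function names are all hypothetical.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def match_score(expected_box, expected_ratio, lesion_box, lesion_ratio, alpha=0.7):
    """Blend box overlap with spatial-proportion closeness (alpha is assumed)."""
    ratio_similarity = 1.0 - min(1.0, abs(expected_ratio - lesion_ratio))
    return alpha * iou(expected_box, lesion_box) + (1.0 - alpha) * ratio_similarity

# Hypothetical values: region implied by the lesion area description versus
# the lesion boundary coordinates extracted from the image.
score = match_score((0, 0, 2, 2), 0.10, (1, 1, 3, 3), 0.12)
```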

[0130] S6.2. Based on the disease confidence information set, the disease name, confidence level, key imaging evidence and semantic interpretation content are structured and visualized, and structured diagnostic results are output.

[0131] Specifically, based on the disease node attributes in the disease confidence information set, the disease name, disease confidence level, key imaging evidence, and semantic interpretation content are classified and organized. Each disease node in the disease confidence information set is used as a basic unit, and the disease name and confidence level value are extracted accordingly. At the same time, the key imaging evidence information associated with the disease node is extracted, including the lesion area features in the image feature data, the image manifestation description, and the corresponding spatial location information. In the disease semantic interpretation part, the semantic relationship between the disease node, symptom node, and image manifestation node is combined to extract the logical association description content from the disease-image-symptom-site reasoning chain, and present the semantic logic of the disease occurrence and the source of the diagnostic evidence in a textual form.

[0132] The compiled disease names, confidence levels, key imaging evidence, and semantic interpretations are structured and organized. A unified hierarchical display format is established, placing the disease name in the primary structure, with disease confidence and key imaging evidence as secondary information, and semantic interpretations and symptom descriptions categorized as supplementary information. The structured content is then input into the visualization process, where the disease diagnosis results are presented using a combination of tables, text, and graphics. Disease confidence can be visually expressed through numerical bars or color gradients, and imaging evidence can be displayed through associated image annotations or feature description text. The output structured diagnostic results include multiple layers of information, such as disease name, disease confidence, key imaging evidence, and semantic interpretations, forming a diagnostic output that can be used to assist in decision-making.
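The hierarchical display format can be sketched as a nested record with the disease name at the primary level, confidence and key imaging evidence at the secondary level, and interpretation and symptoms as supplementary information. The field names, the text-based confidence bar, and the sample entry below are assumptions chosen for illustration.

```python
import json

def structure_result(entry):
    """Arrange one disease entry into the primary/secondary/supplementary layout."""
    return {
        "disease_name": entry["disease"],
        "secondary": {
            "confidence": round(entry["confidence"], 3),
            "key_imaging_evidence": entry["evidence"],
        },
        "supplementary": {
            "semantic_interpretation": entry["interpretation"],
            "symptoms": entry["symptoms"],
        },
    }

def confidence_bar(confidence, width=20):
    """Render a confidence value in [0, 1] as a simple text bar."""
    filled = round(confidence * width)
    return "#" * filled + "-" * (width - filled)

# Hypothetical entry from the disease confidence information set.
entry = {
    "disease": "pneumonia",
    "confidence": 0.8,
    "evidence": ["consolidation in left lower lobe"],
    "interpretation": "pneumonia -> consolidation -> left lower lobe",
    "symptoms": ["cough"],
}
result = structure_result(entry)
report = json.dumps(result, indent=2, ensure_ascii=False)
bar = confidence_bar(result["secondary"]["confidence"])
```

A graphical front end could replace the text bar with a numerical bar or color gradient while consuming the same nested record.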

[0133] This embodiment also provides a computer device applicable to the image diagnosis auxiliary decision-making method based on knowledge graphs, including: a memory and a processor; the memory is used to store computer-executable instructions, and the processor is used to execute the computer-executable instructions to implement the image diagnosis auxiliary decision-making method based on knowledge graphs as proposed in the above embodiment.

[0134] The computer device can be a terminal, comprising a processor, memory, communication interface, display screen, and input devices connected via a system bus. The processor provides computing and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system and computer programs. The internal memory provides an environment for the operation of the operating system and computer programs stored in the non-volatile storage media. The communication interface is used for wired or wireless communication with external terminals; wireless communication can be achieved through Wi-Fi, carrier networks, NFC (Near Field Communication), or other technologies. The display screen can be an LCD screen or an e-ink screen. The input devices can be a touch layer covering the display screen, buttons, a trackball, or a touchpad on the computer device's casing, or an external keyboard, touchpad, or mouse.

[0135] This embodiment also provides a storage medium storing a computer program that, when executed by a processor, implements the knowledge graph-based image diagnostic auxiliary decision-making method as proposed in the above embodiments. The storage medium can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk.

[0136] In summary, this invention achieves deep alignment and collaborative representation of multi-source medical data by mapping the features of images, text, and knowledge to a unified semantic space through contrastive learning and adaptive fusion. This not only improves the coverage and accuracy of assisted diagnosis in complex cases but also enhances the practicality of clinical decision support and doctors' trust.

[0137] It should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, and all such modifications or substitutions should be covered within the scope of the claims of the present invention.

Claims

1. An image diagnostic auxiliary decision-making method based on knowledge graph, characterized in that it comprises: collecting and preprocessing image data and medical record text data to obtain image feature data and text semantic data; performing entity recognition and semantic relation extraction on the cleaned medical record text dataset and medical knowledge resources to establish a structured medical knowledge association dataset and generate knowledge graph instances; performing feature extraction on the image feature data and text semantic data to obtain image feature vectors, text semantic vectors and graph node vectors, and then using a contrastive learning method to map them to a unified semantic space to obtain a fused three-modal feature matrix; fusing and matching the three-modal feature matrix with the knowledge graph instances to retrieve the most relevant disease nodes, image manifestation nodes and relationship paths, and outputting a candidate diagnostic path set; performing knowledge reasoning on the candidate diagnostic path set, and obtaining the confidence of disease nodes by weighted calculation to generate a candidate disease list and a reasoning chain; and using the candidate disease list and the reasoning chain to assist in diagnosis and generate structured diagnostic results.

2. The image diagnostic auxiliary decision-making method based on knowledge graph as described in claim 1, characterized in that the specific steps for obtaining image feature data and text semantic data are as follows: The raw image data files are obtained from the medical image acquisition equipment, decoded in DICOM format, and normalized in size according to the spatial proportion information of the lesion area to generate a standardized image dataset. Based on the standardized image dataset, Gaussian filtering is used to remove image noise, and contrast stretching is performed on the image edge regions to obtain an enhanced image dataset. The enhanced image dataset is then subjected to preliminary region segmentation, and lesion boundary coordinates, shape features, and texture statistics are extracted to generate a structured image matrix. Patient medical record text data is extracted from electronic medical records, and sentence segmentation, word segmentation and stop word removal are performed to form a cleaned medical record text dataset. Then, medical natural language processing algorithms are used to perform named entity recognition and dependency parsing to generate a structured medical semantic entity set. The structured image matrix is input into the feature coding network to extract deep image features and generate image feature data. The structured medical semantic entity set is used to generate text semantic data through vectorized encoding.

3. The image diagnostic auxiliary decision-making method based on knowledge graph as described in claim 2, characterized in that: The specific steps for generating the knowledge graph instance are as follows: Entity recognition is performed on the cleaned medical record text dataset and medical knowledge resources to extract medical entities related to diseases, imaging signs and anatomical locations, and they are classified according to semantic categories to generate a structured set of medical entities. The key attributes of each entity are extracted from the structured medical entity set and semantically normalized to generate a standardized entity set. Dependency parsing and semantic role labeling are performed on the standardized entity set to identify the semantic relationships between entities and generate a medical relationship set. The structured medical entity set, standardized entity set, and medical relation set are structured in the form of triples to generate a knowledge triple dataset. A graph embedding algorithm is used to vectorize the knowledge triple dataset and to construct node and edge structures based on entity connection relationships to generate knowledge graph instances.

4. The image diagnostic auxiliary decision-making method based on knowledge graph as described in claim 3, characterized in that: The specific steps for obtaining the image feature vector, text semantic vector, and graph node vector are as follows. Image feature data is input into a convolutional neural network model, and multi-layer convolution, pooling and non-linear activation operations are performed on the structured image matrix to generate a primary image feature map. Then, a fully connected layer and global average pooling operation are used to generate an image feature vector. Text semantic data is input into a medical language understanding model. Word dependencies are calculated through word embedding, positional encoding, and self-attention mechanisms to generate a sequence of text context semantic vectors. Sentence-level and discourse-level semantic aggregation is then performed to obtain text semantic vectors. Entity nodes from knowledge graph instances are input into the knowledge graph embedding model, and entities and relations are trained to generate graph node vectors.

5. The image diagnostic auxiliary decision-making method based on knowledge graph as described in claim 4, characterized in that the specific steps for obtaining the fused three-modal feature matrix are as follows: Fully connected mapping and nonlinear activation are performed on the image feature vectors, text semantic vectors, and graph node vectors respectively, which are then embedded into a unified latent semantic space to generate an embedded feature representation set. Based on the embedded feature representation set, and according to the correspondence between the medical record text and the structured medical semantic entity set, positive sample pairs and negative sample pairs are constructed, and the cosine similarity of the image feature vectors, text semantic vectors and graph node vectors is calculated to generate a cross-modal similarity matrix. By minimizing the distance between positive sample pairs and maximizing the distance between negative sample pairs, the distribution of the cross-modal similarity matrix in the semantic space is optimized, and a trimodal feature parameter set is obtained. Adaptive weights are assigned to the image, text, and knowledge features in the trimodal feature parameter set, weighted fusion is performed, and the samples are arranged in sequence to form a high-dimensional matrix, resulting in the fused three-modal feature matrix.

6. The image diagnostic auxiliary decision-making method based on knowledge graph as described in claim 5, characterized in that the specific steps for outputting the candidate diagnostic path set are as follows: L2 normalization is performed on the feature vectors of each type in the fused three-modal feature matrix to generate a fused feature vector set. Embedding vectors of disease nodes, image manifestation nodes and anatomical site nodes are extracted from the knowledge graph instances, and a high-dimensional index structure is established for the node vectors to generate a node vector index set. The cosine similarity between the fused feature vectors and the node vector index set is calculated to obtain a candidate node set. Starting from the candidate node set, multi-hop queries are performed along the edge relationships of the knowledge graph instances, and intermediate relationship entities are extracted to generate a candidate relationship path set. Based on the semantic similarity, relation weight, and path length among nodes in the candidate relationship path set, a comprehensive relevance score is calculated for each path to obtain a weighted candidate path list. Structural information of the disease nodes, image manifestation nodes, and relation chains is extracted to obtain the candidate diagnostic path set.

7. The image diagnostic auxiliary decision-making method based on knowledge graph as described in claim 1, characterized in that the specific steps for generating the candidate disease list and reasoning chain are as follows: The attribute information of the disease, image and symptom nodes within the candidate diagnostic path set is extracted to generate a path feature description set, which is then transmitted and attenuated through the attention propagation mechanism to obtain a weighted node feature set. The weighted node feature set is input into the graph neural network model. Through adjacency propagation and aggregation operations, the semantic correlation between disease nodes and adjacent nodes is calculated, a disease node representation vector set is generated, and logical constraint verification is performed to filter out semantic conflict paths and adjust abnormal edge weights to form a disease node set. Based on the disease node set, the comprehensive confidence scores are calculated and sorted from high to low to generate a candidate disease list. The relevant image manifestation nodes, symptom nodes and relationship information are extracted for the candidate disease list to construct a disease-image-symptom-site reasoning chain, and the reasoning chain is output.

8. The image diagnostic auxiliary decision-making method based on knowledge graph as described in claim 7, characterized in that the specific steps for generating structured diagnostic results are as follows: The candidate disease list and reasoning chain are analyzed to extract the confidence level, image manifestation features and symptom association information of the disease nodes, which are matched with the image feature data to generate a disease confidence information set. Based on the disease confidence information set, the disease name, confidence level, key imaging evidence, and semantic interpretation are structured, organized, and visualized to output structured diagnostic results.

9. A computer device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that: When the processor executes the computer program, it implements the steps of the knowledge graph-based image diagnosis auxiliary decision-making method according to any one of claims 1 to 8.

10. A computer-readable storage medium having a computer program stored thereon, characterized in that: When the computer program is executed by the processor, it implements the steps of the knowledge graph-based image diagnosis auxiliary decision-making method according to any one of claims 1 to 8.