Radiology report generation method fusing clinical semantic modulation and hyperbolic prototype classification

By integrating clinical semantic modulation and hyperbolic prototype classification, this method addresses the issues of independence between visual features and clinical knowledge and imbalance between disease categories in existing technologies. It enables the generation of high-quality radiology reports, improves the detection capability of rare diseases, and enhances the clinical relevance of reports.

CN121964039B · Active · Publication Date: 2026-06-26 · EAST CHINA JIAOTONG UNIVERSITY


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: EAST CHINA JIAOTONG UNIVERSITY
Filing Date: 2026-04-01
Publication Date: 2026-06-26


IPC: G16H15/00; G06V10/44; G06V10/82; G06F18/213; G06F18/2415; G06V10/764; G06N3/045; G06N3/0464; G06N3/0442; G06N3/0499; G06F18/27

Technology Topics: Radiology report; Radiology studies


AI Technical Summary

Technical Problem

Existing automated radiology report generation methods fail to effectively utilize clinical context knowledge, resulting in visual features being independent of prior clinical knowledge and difficulty in handling disease category imbalances, thus affecting diagnostic efficacy and report quality.

Method used

A method integrating clinical semantic modulation and hyperbolic prototype classification is adopted. Features are extracted by a visual encoder and a medical language encoder. A dual-pathway clinical semantic modulation module enhances the diagnostic relevance of the visual features, and a hyperbolic prototype classification module improves the detection of rare diseases, so that the generated radiology reports are rich in clinical information.

Benefits of technology

The method improves the quality and clinical efficacy of radiology reports. In particular, the average F1 score for rare diseases improves by 6.2%, while performance on common diseases remains competitive, which alleviates the class imbalance problem.


Abstract

The present application provides a radiology report generation method fusing clinical semantic modulation and hyperbolic prototype classification, which comprises: acquiring chest X-ray images and their associated clinical context knowledge, extracting visual features of the images using a visual encoder, extracting clinical semantic embedding using a medical language encoder, and retrieving relevant report features from a reference report database; constructing and utilizing a clinical semantic modulation module to generate modulated visual representation; constructing and utilizing a hyperbolic prototype classification module to generate diagnosis awareness prompts; inputting the modulated visual features and diagnosis awareness prompts into a decoder to complete the autoregressive generation of the radiology report. The present application can fully utilize the cross-modal fusion of clinical knowledge and visual features, and utilize the exponential expansion embedding capacity of hyperbolic space to improve the disease detection ability under the condition of class imbalance, and improve the clinical accuracy of radiology report generation.


Description

Technical Field

[0001] This invention relates to the fields of medical image processing and natural language generation, and in particular to a method for generating radiology reports that integrates clinical semantic modulation and hyperbolic prototype classification.

Background Technology

[0002] Radiology reports are crucial for clinical diagnosis and treatment planning; however, manually drafting these reports is time-consuming and labor-intensive, relies heavily on the radiologist's professional experience, and can easily lead to diagnostic delays and inconsistencies in medical decisions. Automated Radiology Report Generation (RRG) technology has emerged as an effective solution to reduce the workload of radiologists and improve diagnostic efficiency.

[0003] Existing RRG methods typically employ an encoder-decoder architecture to extract visual features from medical images and generate text reports using language models. Recent research has further incorporated techniques such as retrieval enhancement, knowledge graphs, and large language models to improve generation quality. However, existing work suffers from the following drawbacks:

[0004] On the one hand, there are fundamental flaws in how clinical contextual knowledge (such as examination indications and patient history) is utilized. Current mainstream paradigms typically treat clinical contextual knowledge merely as input-level textual prompts or simply pieced-together contextual information, over-relying on the implicit alignment capabilities of large language model decoders. This not only incurs significant computational overhead, but more importantly, it fails to fundamentally change the behavior of the visual encoder, resulting in visual feature extraction being independent of prior clinical knowledge. The resulting visual representations lack diagnostic relevance to specific cases and struggle to capture fine-grained visual evidence highly relevant to current clinical indications. Another type of approach completely ignores this crucial prior knowledge.

[0005] On the other hand, the inherent characteristics of disease distribution present significant challenges. Unlike general image description tasks that treat all visual concepts equally, radiology report generation must prioritize anomalies, which are often subtle and exhibit severe class imbalances compared to normal observations. Medical datasets exhibit a severe long-tail distribution, with a few common diseases dominating while a large number of rare diseases are severely underrepresented. In RRG, disease classifiers are often used to guide report generation, but traditional classifiers operating in Euclidean space tend to favor high-frequency classes, and their limited embedding capacity makes it difficult to provide sufficient separation for rare classes, resulting in predictions that overlook subtle rare lesions. Existing methods typically address this imbalance through loss reweighting or retrieval augmentation, but these operate at the output or decision level and fail to fundamentally improve the discriminability of rare disease representations.

Summary of the Invention

[0006] In view of the above, the main objective of this invention is to propose a radiology report generation method that integrates clinical semantic modulation and hyperbolic prototype classification to solve the aforementioned technical problems.

[0007] This invention proposes a method for generating radiology reports that integrates clinical semantic modulation and hyperbolic prototype classification. The method includes the following steps:

[0008] Step 1: Obtain chest X-ray images and clinical context knowledge, preprocess the chest X-ray images to obtain preprocessed images;

[0009] Step 2: Use a visual encoder to extract features from the preprocessed image to obtain block-level visual features and global image representation; use a medical language encoder to extract features from clinical context knowledge to obtain clinical semantic embedding; retrieve reference report features related to the global image representation from a pre-set reference report database to obtain a reference report feature set.

[0010] Step 3: Construct a clinical semantic modulation module based on a dual-pathway architecture and gating fusion mechanism using spatial semantic injection pathway and channel feature recalibration pathway; input block-level visual features and clinical semantic embeddings into the clinical semantic modulation module for processing to obtain modulated visual representations;

[0011] Step 4: Construct a hyperbolic prototype classification module based on the Poincaré ball and the Softmax function; input the global image representation and reference report feature set into the hyperbolic prototype classification module for processing to generate diagnostic perception prompts;

[0012] Step 5: Input the modulated visual representation and diagnostic perception cues into the BERT-based decoder for processing to achieve autoregressive generation of radiology reports.

[0013] Compared with the prior art, the beneficial effects of the present invention are as follows:

[0014] 1. This invention designs a Clinical Semantic Modulation (CSM) module, which employs a dual-pathway architecture of spatial semantic injection and channel feature recalibration to explicitly modulate visual features at the feature level using clinical knowledge, rather than relying solely on implicit alignment by the decoder. The spatial pathway injects clinical semantics into each visual block, achieving fine-grained alignment between clinical intent and visual patterns; the channel pathway recalibrates feature channels by learning to amplify or suppress specific activation patterns. A learnable gating mechanism adaptively fuses the two pathways to generate visual representations rich in clinical information, enhancing the case-specific diagnostic relevance of visual features.

[0015] 2. This invention designs a hyperbolic prototype classification (HPC) module that transfers disease classification from Euclidean space to hyperbolic space (the Poincaré ball model). Because volume in hyperbolic space grows exponentially with radius, the module provides greater separation capacity for rare disease categories. A prototype-based classification strategy replaces the traditional linear classifier: through metric learning, it compares the distance between the input embedding and each prototype embedding, which is more robust to class imbalance and effectively improves the detection capability for underrepresented rare diseases.

[0016] 3. This invention comprehensively utilizes a visual encoder, a medical language encoder, and a report retrieval mechanism to extract visual features of images, semantic embedding of clinical context knowledge, and textual features of relevant historical reports, thereby fully exploring the complementary semantic information between multimodal information and providing rich and comprehensive feature representations for subsequent clinical semantic modulation and disease classification.

[0017] 4. This invention employs a diagnostic perception prompting mechanism to discretize the disease prediction results of the hyperbolic prototype classification module into classification status labels, constructing an ordered diagnostic perception prompting sequence as an explicit input guide for the report decoder. This enables the report generation process to perceive the diagnostic status of each disease, thereby generating radiological reports that are consistent with the diagnostic results and have stronger clinical relevance.

[0018] 5. This invention, through the collaborative work of a clinical semantic modulation module and a hyperbolic prototype classification module, significantly enhances clinical efficacy indicators while improving language generation quality, achieving state-of-the-art performance on both the MIMIC-CXR and IU X-Ray datasets. Long-tail analysis shows that this invention achieves an average F1 improvement of 6.2% for rare diseases, while maintaining competitive performance for common categories, effectively alleviating the class imbalance problem.

[0019] Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by means of embodiments of the invention.

Attached Figure Description

[0020] Figure 1 is a flowchart of the steps of the radiology report generation method that integrates clinical semantic modulation and hyperbolic prototype classification proposed in this invention;

[0021] Figure 2 is a diagram of the overall structure of the radiology report generation method that integrates clinical semantic modulation and hyperbolic prototype classification proposed in this invention;

[0022] Figure 3 is a schematic diagram of the structure of the clinical semantic modulation module in this invention;

[0023] Figure 4 is a schematic diagram of the structure of the hyperbolic prototype classification module in this invention.

Detailed Implementation

[0024] Embodiments of the present invention are described in detail below. Examples of these embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and are only used to explain the present invention, and should not be construed as limiting the present invention.

[0025] These and other aspects of the embodiments of the present invention will become clear from the following description and accompanying drawings. In these descriptions and drawings, some specific embodiments of the present invention are specifically disclosed to illustrate some ways of implementing the principles of the embodiments of the present invention; however, it should be understood that the scope of the embodiments of the present invention is not limited thereto.

[0026] Referring to Figure 1, this embodiment provides a method for generating radiology reports that integrates clinical semantic modulation and hyperbolic prototype classification. The method includes the following steps:

[0027] Step 1: Obtain chest X-ray images and clinical context knowledge, preprocess the chest X-ray images to obtain preprocessed images.

[0028] Referring to Figure 2, in step 1, a chest X-ray image and clinical context knowledge are acquired, and the chest X-ray image is preprocessed to obtain a preprocessed image. The chest X-ray image is a two-dimensional image. The clinical context knowledge includes examination indications, patient history, comparative examination information, and examination technique information. The preprocessing includes resizing the chest X-ray image to a fixed resolution of 224×224.

[0029] Step 2: Use a visual encoder to extract features from the preprocessed image to obtain block-level visual features and global image representation; use a medical language encoder to extract features from clinical context knowledge to obtain clinical semantic embedding; retrieve reference report features related to the global image representation from a pre-set reference report database to obtain a reference report feature set.

[0030] In step 2, a visual encoder is used to extract features from the preprocessed image to obtain block-level visual features and a global image representation; a medical language encoder is used to extract features from clinical context knowledge to obtain clinical semantic embedding; and reference report features related to the global image representation are retrieved from a pre-set reference report database to obtain a reference report feature set. This process includes the following sub-steps:

[0031] Feature extraction is performed on the preprocessed image using a visual encoder (ResNet-101) to obtain block-level visual features, according to the following relationship:

[0032] $$F_v = E_v(I);$$

[0033] where $F_v$ denotes the block-level visual features, $E_v(\cdot)$ denotes the visual encoder, and $I$ denotes the preprocessed image;

[0034] Global average pooling is applied to the block-level visual features along the spatial dimension to obtain a global image representation, according to the following relationship:

[0035] $$f_g = \mathrm{GAP}(F_v);$$

[0036] where $f_g$ denotes the global image representation, $F_v$ denotes the block-level visual features, and $\mathrm{GAP}(\cdot)$ denotes global average pooling.

[0037] A medical language encoder (CXR-BERT) is used to extract features from the clinical context knowledge to obtain the clinical semantic embedding, according to the following relationship:

[0038] $$s = E_l(C);$$

[0039] where $s$ denotes the clinical semantic embedding, $E_l(\cdot)$ denotes the medical language encoder, and $C$ denotes the clinical context knowledge;

[0040] Based on the preset reference report database, the global image representation is used as the query vector. Its cosine similarity to the visual features associated with every report in the database (usually the global image representations stored during training) is calculated, and the $k$ reports with the highest similarity are selected;

[0041] The MedKLIP encoder encodes the selected top-$k$ reports into text features to obtain the reference report feature set $R = \{r_i\}_{i=1}^{k}$;

[0042] where $r_i$ denotes the $i$-th reference report feature, $i$ denotes the index of the reference report features, and $k$ denotes the total number of reference report features.

[0043] It should be noted that, to ensure compatibility with subsequent modules, the clinical semantic embedding is typically projected into a 512-dimensional feature space.
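The retrieval in this step can be sketched as follows. This is a minimal illustration in plain Python, not the patented implementation: the function names `cosine` and `retrieve_top_k` and the toy vectors are hypothetical, and the encoding of the retrieved reports into text features is omitted.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve_top_k(query, database, k):
    """Return the indices of the k stored vectors most similar to the query."""
    ranked = sorted(range(len(database)),
                    key=lambda i: cosine(query, database[i]),
                    reverse=True)
    return ranked[:k]

# Hypothetical stored global image representations, one per database report.
stored = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
top = retrieve_top_k([1.0, 0.05], stored, k=2)
```

The returned indices identify the reports whose text would then be encoded into the reference report feature set.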

[0044] Step 3: Construct a clinical semantic modulation module based on a dual-pathway architecture and gating fusion mechanism of spatial semantic injection pathway and channel feature recalibration pathway; input block-level visual features and clinical semantic embeddings into the clinical semantic modulation module for processing to obtain modulated visual representations.

[0045] Referring to Figure 3, in step 3, a clinical semantic modulation module is constructed based on a dual-pathway architecture, consisting of the spatial semantic injection pathway and the channel feature recalibration pathway, and a gated fusion mechanism. Block-level visual features and clinical semantic embeddings are input into the clinical semantic modulation module for processing to obtain modulated visual representations. This process includes the following sub-steps:

[0046] In the spatial semantic injection pathway, the clinical semantic embedding is injected into the block-level visual features through an attention mechanism to obtain the attention output features, according to the following relationship:

[0047] $$F_{att} = \mathrm{Softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V;$$

[0048] where $Q$ denotes the query matrix, $K$ denotes the key matrix, and $V$ denotes the value matrix; they are obtained with the learnable projection matrices $W_Q$, $W_K$, and $W_V$ from the block-level visual features and the clinical semantic embedding, the latter being expanded by $\mathrm{Expand}(\cdot)$ to match the dimensions of the block-level visual features; $F_{att}$ denotes the features output by the attention mechanism; $\mathrm{Softmax}(\cdot)$ denotes the Softmax function; $\top$ denotes matrix transpose; and $d_k$ denotes the dimension of the key vectors;

[0049] The features output by the attention mechanism are processed sequentially by regularization, a residual connection, and layer normalization to obtain the output of the spatial semantic injection pathway, according to the following relationship:

[0050] $$F_s = \mathrm{LN}\big(F_v + \mathrm{Dropout}(F_{att})\big);$$

[0051] where $F_s$ denotes the output of the spatial semantic injection pathway, $F_v$ denotes the block-level visual features, $F_{att}$ denotes the features output by the attention mechanism, $\mathrm{LN}(\cdot)$ denotes layer normalization, and $\mathrm{Dropout}(\cdot)$ denotes the regularization applied to prevent overfitting.
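A minimal sketch of the spatial pathway follows, in plain Python. Several points are assumptions made only for illustration: the visual patches act as queries while the expanded clinical embedding supplies keys and values, the projection matrices are taken as identity, and dropout is treated as the identity (as at inference time). All names and values are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: Softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    out, weights = [], []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        weights.append(w)
        out.append([sum(w[j] * V[j][c] for j in range(len(V)))
                    for c in range(len(V[0]))])
    return out, weights

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

patches = [[1.0, 0.0], [0.0, 1.0]]          # two visual patches (queries)
clinical = [0.5, -0.5]                      # one clinical embedding
expanded = [clinical[:] for _ in patches]   # expanded to one row per patch
attended, weights = attention(patches, expanded, expanded)
# Residual connection, then layer normalization.
injected = [layer_norm([p + a for p, a in zip(patch, att)])
            for patch, att in zip(patches, attended)]
```

In this toy setup all expanded rows are identical, so every patch attends uniformly and receives the same clinical vector through the residual connection.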

[0052] In the channel feature recalibration pathway, a multilayer perceptron processes the clinical semantic embedding to generate scaling and offset parameters, according to the following relationship:

[0053] $$[\gamma, \beta] = \mathrm{MLP}(s);$$

[0054] where $\gamma$ denotes the scaling parameter, $\beta$ denotes the offset parameter, $s$ denotes the clinical semantic embedding, and $\mathrm{MLP}(\cdot)$ denotes processing by the multilayer perceptron.

[0055] It should be noted that, in Figure 3, the multilayer perceptron $\mathrm{MLP}$ serves as the calibration generator.

[0056] Layer normalization is performed on the block-level visual features to obtain normalized block-level visual features. The scaling and offset parameters are then used as affine transformation coefficients to transform the normalized block-level visual features, which yields the output of the channel feature recalibration pathway, according to the following relationship:

[0057] $$F_c = \gamma \odot \mathrm{LN}(F_v) + \beta;$$

[0058] where $F_c$ denotes the output of the channel feature recalibration pathway, $F_v$ denotes the block-level visual features, $\gamma$ and $\beta$ denote the scaling and offset parameters, $\mathrm{LN}(\cdot)$ denotes layer normalization, and $\odot$ denotes element-wise multiplication;
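The channel recalibration described above amounts to a conditional affine transform of the normalized features. The sketch below illustrates it in plain Python; the fixed `gamma` and `beta` values stand in for the output of the multilayer perceptron and are hypothetical, as is the function name `recalibrate`.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def recalibrate(patch, gamma, beta):
    """Channel-wise affine transform of the normalized patch: gamma * LN(x) + beta."""
    normed = layer_norm(patch)
    return [g * n + b for g, n, b in zip(gamma, normed, beta)]

# In the method, gamma and beta are generated from the clinical embedding;
# here they are fixed by hand to show amplification, suppression, and shifting.
gamma = [2.0, 0.0, 1.0]
beta = [0.0, 0.5, -1.0]
out = recalibrate([1.0, 2.0, 3.0], gamma, beta)
```

A scale of 2.0 amplifies the first channel, a scale of 0.0 suppresses the second channel so that only its offset remains, and the third channel is shifted by its offset.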

[0059] The output of the spatial semantic injection pathway is concatenated with the output of the channel feature recalibration pathway to obtain the concatenated features, according to the following relationship:

[0060] $$F_{cat} = \mathrm{Concat}(F_s, F_c);$$

[0061] where $F_{cat}$ denotes the concatenated features, $F_s$ and $F_c$ denote the outputs of the spatial semantic injection pathway and the channel feature recalibration pathway, and $\mathrm{Concat}(\cdot)$ denotes feature concatenation.

[0062] The concatenated features are processed sequentially by a learnable projection matrix and a Sigmoid activation function to obtain a gate tensor, according to the following relationship:

[0063] $$G = \sigma(F_{cat} W_g);$$

[0064] where $G$ denotes the gate tensor, $F_{cat}$ denotes the concatenated features, $\sigma(\cdot)$ denotes the Sigmoid activation function, and $W_g$ denotes a learnable projection matrix;

[0065] The concatenated features are mapped by a learnable projection matrix and multiplied element-wise with the gate tensor. A residual connection and layer normalization are then applied in sequence to obtain the modulated visual representation, according to the following relationship:

[0066] $$\tilde{F}_v = \mathrm{LN}\big(F_v + G \odot (F_{cat} W_o)\big);$$

[0067] where $\tilde{F}_v$ denotes the modulated visual representation, $F_v$ denotes the block-level visual features, $G$ denotes the gate tensor, $F_{cat}$ denotes the concatenated features, and $W_o$ denotes a learnable projection matrix.
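The gated fusion can be illustrated for a single patch as follows. This is a sketch under stated assumptions: the residual is taken with the original block-level feature, the weight matrices `W_g` and `W_o` are hand-set rather than learned, and all names are hypothetical.

```python
import math

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def layer_norm(x, eps=1e-5):
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def gated_fusion(f_v, f_s, f_c, W_g, W_o):
    """Fuse the two pathway outputs for one patch with a sigmoid gate."""
    cat = f_s + f_c                                   # feature concatenation
    gate = [sigmoid(v) for v in matvec(W_g, cat)]     # gate values in (0, 1)
    mapped = matvec(W_o, cat)                         # project back to patch size
    fused = [x + g * m for x, g, m in zip(f_v, gate, mapped)]  # gated residual
    return layer_norm(fused), gate

W_g = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]   # zero weights: gate = 0.5
W_o = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]   # sums the two pathways
modulated, gate = gated_fusion([1.0, 0.0], [2.0, 0.0], [0.0, 2.0], W_g, W_o)
```

With zero gate weights the sigmoid returns 0.5 everywhere, so half of the projected pathway mixture is added to the original feature before normalization.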

[0068] Step 4: Construct a hyperbolic prototype classification module based on the Poincaré ball and the Softmax function. Input the global image representation and reference report feature set into the hyperbolic prototype classification module for processing to generate diagnostic perception prompts.

[0069] Referring to Figure 4, in step 4, a hyperbolic prototype classification module is constructed based on the Poincaré ball and the Softmax function. The global image representation and reference report feature set are input into the hyperbolic prototype classification module for processing to generate diagnostic perception prompts. Specifically, this includes the following sub-steps:

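[0070] Mean pooling is applied to the reference report feature set to obtain the aggregated report features, according to the following relationship:

[0071] $$\bar{r} = \frac{1}{k}\sum_{i=1}^{k} r_i;$$

[0072] where $\bar{r}$ denotes the aggregated report features, $r_i$ denotes the $i$-th reference report feature, and $k$ denotes the total number of reference report features;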

[0073] The aggregated report features are concatenated with the global image representation to obtain the initial fused features, according to the following relationship:

[0074] $$h = \mathrm{Concat}(\bar{r}, f_g);$$

[0075] where $h$ denotes the initial fused features, $\bar{r}$ denotes the aggregated report features, and $f_g$ denotes the global image representation;

[0076] It should be noted that, in Figure 4, C denotes feature concatenation.

[0077] The initial fused features are processed sequentially by a first MLP layer, a GELU activation function, and a second MLP layer to obtain the case embedding features in Euclidean space, according to the following relationship:

[0078] $$z_E = W_2\,\mathrm{GELU}(W_1 h + b_1) + b_2;$$

[0079] where $z_E$ denotes the case embedding features in Euclidean space, $h$ denotes the initial fused features, $W_1$ and $W_2$ denote weight matrices, $b_1$ and $b_2$ denote bias vectors, and $\mathrm{GELU}(\cdot)$ denotes the GELU activation function;
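The pooling, concatenation, and two-layer mapping above can be sketched in plain Python as follows. The weights, the toy feature sizes, and the function names are hypothetical; the exact GELU (based on the error function) is used here, although an approximation would serve equally well.

```python
import math

def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def linear(W, b, x):
    """Affine layer W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]

def mean_pool(vectors):
    """Average a list of equal-length vectors component-wise."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def case_embedding(report_feats, global_feat, W1, b1, W2, b2):
    fused = mean_pool(report_feats) + global_feat      # concatenation
    hidden = [gelu(v) for v in linear(W1, b1, fused)]  # first layer + GELU
    return linear(W2, b2, hidden)                      # second layer

reports = [[1.0, 3.0], [3.0, 1.0]]   # two retrieved report features
image = [0.5]                        # global image representation
W1, b1 = [[1.0, 0.0, 0.0], [0.0, 0.0, -1.0]], [0.0, 0.0]
W2, b2 = [[1.0, 1.0]], [0.1]
z = case_embedding(reports, image, W1, b1, W2, b2)
```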

[0080] The case embedding features in Euclidean space are linearly projected to the target hyperbolic dimension and then subjected to L2 normalization and scaling to obtain the normalized case embedding vector in Euclidean space, according to the following relationship:

[0081] $$u = \alpha \cdot \frac{W_p z_E}{\lVert W_p z_E \rVert_2};$$

[0082] where $u$ denotes the normalized case embedding vector in Euclidean space, with $u \in \mathbb{R}^{d_h}$; $z_E$ denotes the case embedding features in Euclidean space; $d_h$ denotes the target hyperbolic dimension; $\alpha$ denotes a fixed scaling factor; $W_p$ denotes the projection matrix; and $\lVert \cdot \rVert_2$ denotes the L2 norm;

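[0083] The normalized case embedding vector in Euclidean space is mapped by the exponential map at the origin onto the Poincaré ball with curvature parameter $c$ to obtain the hyperbolic embedding of the current case, according to the following relationship:

[0084] $$z_H = \exp_0^{c}(u) = \tanh\!\big(\sqrt{c}\,\lVert u \rVert\big)\,\frac{u}{\sqrt{c}\,\lVert u \rVert};$$

[0085] where $z_H$ denotes the hyperbolic embedding of the current case, with $z_H \in \mathbb{B}_c^{d_h}$; $u$ denotes the normalized case embedding vector in Euclidean space; $\tanh(\cdot)$ denotes the hyperbolic tangent function; $c$ denotes the curvature parameter; and $\mathbb{B}_c^{d_h}$ denotes the Poincaré ball;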
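A short sketch of the normalization and the exponential map at the origin follows. It uses the standard closed form of the map on the Poincaré ball; the function names, the scaling factor, and the input vector are hypothetical illustration values, and the linear projection is omitted.

```python
import math

def norm(x):
    return math.sqrt(sum(v * v for v in x))

def normalize_and_scale(x, alpha):
    """L2-normalize x, then multiply by the fixed scaling factor alpha."""
    n = norm(x)
    return [alpha * v / n for v in x]

def expmap0(u, c):
    """Exponential map at the origin of the Poincare ball with curvature parameter c."""
    n = norm(u)
    if n == 0.0:
        return list(u)
    factor = math.tanh(math.sqrt(c) * n) / (math.sqrt(c) * n)
    return [factor * v for v in u]

u = normalize_and_scale([3.0, 4.0], alpha=2.0)   # direction kept, length set to alpha
z = expmap0(u, c=1.0)                            # point strictly inside the unit ball
```

Because the input always has length `alpha` after normalization, the resulting point always lies at radius tanh(sqrt(c) * alpha) / sqrt(c), which stays inside the ball regardless of the magnitude of the Euclidean features.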

[0086] Building the learnable hyperbolic prototypes:

[0087] A set of learnable prototype embeddings is initialized on the Poincaré ball, one for each disease-state pair; $N_d$ denotes the number of diseases and $N_s$ denotes the number of states, which comprise four types: blank, positive, negative, and uncertain.

[0088] A prototype tensor $P = \{p_{i,j}\} \in \mathbb{R}^{N_d \times N_s \times d_h}$ is maintained;

[0089] where $p_{i,j}$ denotes the prototype embedding of disease $i$ in state $j$, and $d_h$ denotes the hyperbolic embedding dimension;

[0090] The prototypes are implemented as manifold parameters and optimized using Riemannian gradient descent, which converts Euclidean gradients into Riemannian gradients and projects the updated embeddings back onto the manifold, ensuring that all prototypes remain within the valid region of the Poincaré ball throughout the training process.
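One way to picture a single prototype update is the sketch below. On the Poincaré ball with curvature parameter c, the conformal factor is 2 / (1 - c‖x‖²), so the Euclidean gradient is rescaled by (1 - c‖x‖²)² / 4. The update here is a simplified first-order step followed by a projection back into the ball, which is an assumption; an exact exponential-map update is also possible. The function name, learning rate, and `eps` margin are hypothetical.

```python
import math

def riemannian_sgd_step(p, egrad, lr, c, eps=1e-5):
    """One simplified Riemannian gradient step for a point p on the Poincare ball."""
    sq = sum(v * v for v in p)
    scale = ((1.0 - c * sq) ** 2) / 4.0          # inverse of the squared conformal factor
    updated = [v - lr * scale * g for v, g in zip(p, egrad)]
    # Project back so the point stays strictly inside the ball of radius 1/sqrt(c).
    max_norm = (1.0 - eps) / math.sqrt(c)
    n = math.sqrt(sum(v * v for v in updated))
    if n > max_norm:
        updated = [v * max_norm / n for v in updated]
    return updated

inside = riemannian_sgd_step([0.5, 0.0], [1.0, 0.0], lr=0.1, c=1.0)
clipped = riemannian_sgd_step([0.0, 0.0], [-8.0, 0.0], lr=1.0, c=1.0)
```

The rescaling shrinks steps near the boundary, where hyperbolic distances grow quickly, and the projection handles any step that would still leave the ball.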

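[0091] Given the case embedding $z_H$ and a prototype embedding $p_{i,j}$, classification uses the Poincaré distance. The distance between an embedding $x$ and an embedding $y$ on the Poincaré ball is defined as:

[0092] $$d_c(x, y) = \frac{2}{\sqrt{c}}\,\mathrm{artanh}\!\big(\sqrt{c}\,\lVert (-x) \oplus_c y \rVert\big);$$

[0093] where $d_c(x, y)$ denotes the distance between the embedding $x$ and the embedding $y$ on the Poincaré ball, $c$ denotes the curvature parameter, $\oplus_c$ denotes Möbius addition, and $\mathrm{artanh}(\cdot)$ denotes the inverse hyperbolic tangent function;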

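[0094] Möbius addition is defined as follows:

[0095] $$x \oplus_c y = \frac{\big(1 + 2c\langle x, y\rangle + c\lVert y \rVert^2\big)\,x + \big(1 - c\lVert x \rVert^2\big)\,y}{1 + 2c\langle x, y\rangle + c^2\lVert x \rVert^2\lVert y \rVert^2};$$

[0096] where $\langle \cdot, \cdot \rangle$ denotes the standard Euclidean inner product, $\lVert \cdot \rVert$ denotes the standard Euclidean norm, and $c$ denotes the curvature parameter;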
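The distance and Möbius addition can be checked numerically with a few lines of plain Python. This is an illustrative sketch using the standard Poincaré ball formulas; the function names and test points are hypothetical.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def mobius_add(x, y, c):
    """Mobius addition of x and y on the Poincare ball with curvature parameter c."""
    xy, x2, y2 = dot(x, y), dot(x, x), dot(y, y)
    coef_x = 1.0 + 2.0 * c * xy + c * y2
    coef_y = 1.0 - c * x2
    denom = 1.0 + 2.0 * c * xy + c * c * x2 * y2
    return [(coef_x * a + coef_y * b) / denom for a, b in zip(x, y)]

def poincare_dist(x, y, c):
    """Geodesic distance between x and y on the Poincare ball."""
    diff = mobius_add([-a for a in x], y, c)
    n = math.sqrt(dot(diff, diff))
    return 2.0 / math.sqrt(c) * math.atanh(math.sqrt(c) * n)

d_origin = poincare_dist([0.0, 0.0], [0.5, 0.0], c=1.0)   # equals 2 * artanh(0.5)
d_xy = poincare_dist([0.1, 0.2], [-0.3, 0.4], c=1.0)
d_yx = poincare_dist([-0.3, 0.4], [0.1, 0.2], c=1.0)
```

From the origin the distance reduces to 2 artanh(sqrt(c)‖y‖) / sqrt(c), which grows without bound as the point approaches the boundary; this is the property that gives rare classes more room to separate.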

[0097] For each disease $i$ and each state $j$, the squared hyperbolic distance between the case embedding $z_H$ and the prototype embedding $p_{i,j}$ is calculated, and the negative squared distance is scaled by a temperature parameter to obtain the classification score of disease $i$ in state $j$, according to the following relationship:

[0098] $$\ell_{i,j} = -\frac{d_c(z_H, p_{i,j})^2}{\tau};$$

[0099] where $\ell_{i,j}$ denotes the classification score of disease $i$ in state $j$; $\tau$ denotes the temperature parameter, with $\tau > 0$; and $d_c(z_H, p_{i,j})$ denotes the hyperbolic distance between the case embedding $z_H$ and the prototype embedding $p_{i,j}$;

[0100] The probability of disease $i$ being in state $j$ is calculated by applying the Softmax function to the classification scores of all states, according to the following relationship:

[0101] $$q_{i,j} = \frac{\exp(\ell_{i,j})}{\sum_{j'=1}^{N_s} \exp(\ell_{i,j'})};$$

[0102] where $q_{i,j}$ denotes the probability of disease $i$ being in state $j$, $\exp(\cdot)$ denotes the exponential function, $\ell_{i,j}$ denotes the classification score of disease $i$ in state $j$, and $N_s$ denotes the number of states;

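[0103] During inference, for each disease $i$, the state with the highest probability is selected as the prediction result, according to the following relationship:

[0104] $$\hat{y}_i = \arg\max_{j}\, q_{i,j};$$

[0105] where $\hat{y}_i$ denotes the prediction result for disease $i$, $q_{i,j}$ denotes the probability of disease $i$ being in state $j$, and $\arg\max$ denotes selecting the state with the maximum value;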
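The scoring, Softmax, and state selection for one disease can be sketched as below. The hyperbolic distances to the four state prototypes are passed in as plain numbers so that the example stays self-contained; the distances, the temperature value, and the function names are hypothetical.

```python
import math

STATES = ["blank", "positive", "negative", "uncertain"]

def state_probabilities(distances, tau):
    """Softmax over scores -(d^2)/tau, one score per state prototype."""
    scores = [-(d * d) / tau for d in distances]
    m = max(scores)                                # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_state(distances, tau=1.0):
    """Return the most probable state and the full probability vector."""
    probs = state_probabilities(distances, tau)
    best = max(range(len(probs)), key=probs.__getitem__)
    return STATES[best], probs

# Distances from one case embedding to the four prototypes of one disease.
state, probs = predict_state([2.0, 0.5, 1.5, 3.0], tau=0.5)
```

The nearest prototype receives the highest probability, and a smaller temperature makes the distribution sharper.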

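[0106] The prediction result $\hat{y}_i$ is mapped to a specific disease marker $m_i$, so that each of the four states (blank, positive, negative, and uncertain) corresponds to its own marker. The mapping rule is:

[0107] $$m_i = \begin{cases} m^{\text{blank}}, & \hat{y}_i = \text{blank} \\ m^{\text{pos}}, & \hat{y}_i = \text{positive} \\ m^{\text{neg}}, & \hat{y}_i = \text{negative} \\ m^{\text{unc}}, & \hat{y}_i = \text{uncertain} \end{cases};$$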

[0108] All disease markers are assembled in a fixed order to obtain the diagnostic perception prompt, according to the following relationship:

[0109] $$P_{diag} = [m_1, m_2, \ldots, m_{N_d}];$$

[0110] where $P_{diag}$ denotes the diagnostic perception prompt, $m_1$, $m_2$, and $m_{N_d}$ all denote specific disease markers, and $N_d$ denotes the number of diseases.

[0111] It should be noted that, in Figure 4, the disease markers $m_1$, $m_2$, ..., $m_{N_d}$ denote the marking results for the specific diseases.
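The discretization into markers and the fixed-order assembly can be sketched as follows. The marker strings and the disease names are hypothetical placeholders chosen for illustration; the source does not state the actual marker vocabulary.

```python
# Hypothetical marker strings, one per state; not taken from the source.
STATE_TOKENS = {
    "blank": "[BLA]",
    "positive": "[POS]",
    "negative": "[NEG]",
    "uncertain": "[UNC]",
}

def build_prompt(predictions, disease_order):
    """Map each disease's predicted state to its marker, in a fixed disease order."""
    return [STATE_TOKENS[predictions[name]] for name in disease_order]

# Hypothetical disease list and predicted states.
order = ["cardiomegaly", "edema", "pneumothorax"]
predicted = {"edema": "uncertain", "pneumothorax": "negative", "cardiomegaly": "positive"}
prompt = build_prompt(predicted, order)
```

Because the order is fixed, the position of each marker in the prompt identifies the disease it refers to.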

[0112] Step 5: Input the modulated visual representation and diagnostic perception cues into the BERT-based decoder for processing to achieve autoregressive generation of radiology reports.

[0113] In step 5, the modulated visual representation and diagnostic perception cues are input into a BERT-based decoder for processing to achieve autoregressive generation of the radiology report. This includes the following sub-steps:

[0114] The modulated visual representation serves as the encoder hidden state and is attended to through the cross-attention layers, so that the decoder can dynamically focus on the visual regions most relevant to the currently generated content at each step of the report generation process.

[0115] The diagnostic perception prompt is tokenized and prepended to the input sequence to guide the autoregressive generation of the report, so that the decoder can perceive the diagnostic status of each disease when generating the report;

[0116] The report generation model is based on a conditional distribution and follows the standard autoregressive formula:

[0117] $$p(Y \mid I, C) = \prod_{t=1}^{T} p(y_t \mid y_{<t}, I, C);$$

[0118] where $p(Y \mid I, C)$ denotes the joint conditional probability distribution of generating the target report token sequence $Y$ given the image $I$ and the clinical information $C$; $\prod$ denotes the product over time steps; $T$ denotes the length of the target report token sequence; $Y$ denotes the target report token sequence, with $Y = \{y_1, y_2, \ldots, y_T\}$; $y_t$ denotes the token generated from a fixed vocabulary at time step $t$; and $y_{<t}$ denotes the previously generated tokens;

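[0119] Furthermore, during training, the sum of the state-level cross-entropy losses over all diseases is minimized, according to the following relationship:

[0120] $$\mathcal{L}_{cls} = -\sum_{i=1}^{N_d} \sum_{j=1}^{N_s} y_{i,j} \log q_{i,j};$$

[0121] where $\mathcal{L}_{cls}$ denotes the disease classification loss, $y_{i,j}$ denotes the $j$-th component of the one-hot encoded label of disease $i$, $q_{i,j}$ denotes the predicted probability of disease $i$ being in state $j$, $N_d$ and $N_s$ denote the numbers of diseases and states, and $\log$ denotes the logarithm;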

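[0122] The report generator is optimized by minimizing the negative log-likelihood loss (i.e., the autoregressive language modeling loss), according to the following relationship:

[0123] $$\mathcal{L}_{gen} = -\sum_{t=1}^{T} \log p(y_t \mid y_{<t}, I, C);$$

[0124] where $\mathcal{L}_{gen}$ denotes the report generation loss, $y_t$ denotes the target token at time step $t$, $y_{<t}$ denotes the preceding tokens, $I$ denotes the image, $C$ denotes the clinical information, and $T$ denotes the length of the target report token sequence;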

[0125] The overall training loss of the model combines the report generation loss $\mathcal{L}_{gen}$ and the disease classification loss $\mathcal{L}_{cls}$, and the two losses are optimized jointly during training.
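The two losses can be illustrated numerically as follows. This sketch assumes an unweighted sum for the overall loss, which the text does not specify; the probabilities, labels, and function names are hypothetical.

```python
import math

def classification_loss(state_probs, labels):
    """Sum over diseases of the cross-entropy against the one-hot state label.

    labels[i] is the index of the true state of disease i, so only the
    log-probability of the true state contributes for each disease.
    """
    return -sum(math.log(p[y]) for p, y in zip(state_probs, labels))

def generation_loss(token_probs):
    """Negative log-likelihood of the target tokens, one probability per step."""
    return -sum(math.log(p) for p in token_probs)

# Two diseases with four state probabilities each, and their true state indices.
state_probs = [[0.7, 0.1, 0.1, 0.1], [0.25, 0.25, 0.25, 0.25]]
labels = [0, 2]
# Probability the decoder assigned to each target token of a three-token report.
token_probs = [0.5, 0.5, 0.25]

l_cls = classification_loss(state_probs, labels)
l_gen = generation_loss(token_probs)
total = l_gen + l_cls   # assumption: unweighted sum of the two losses
```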

[0126] It should be understood that although the steps in the flowcharts of the various embodiments of the present invention are shown sequentially according to the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the various embodiments may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily completed at the same time, but can be executed at different times. The execution order of these sub-steps or stages is not necessarily sequential, but can be performed alternately or in turn with other steps or at least a portion of the sub-steps or stages of other steps.

[0127] It should be understood that various parts of the present invention can be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods can be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, it can be implemented using any one or a combination of the following techniques known in the art: discrete logic circuits having logic gates for implementing logical functions on data signals, application-specific integrated circuits (ASICs) having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), etc.

[0128] In the description of this specification, references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples.

[0129] The embodiments described above are merely illustrative of several implementations of the present invention, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of the present invention, and these modifications and improvements all fall within the scope of protection of the present invention. Therefore, the scope of protection of this patent should be determined by the appended claims.

Claims

1. A radiology report generation method integrating clinical semantic modulation and hyperbolic prototype classification, characterized in that the method comprises the following steps: Step 1: acquire a chest X-ray image and clinical context knowledge, and preprocess the chest X-ray image to obtain a preprocessed image; Step 2: use a visual encoder to extract features from the preprocessed image to obtain block-level visual features and a global image representation; use a medical language encoder to extract features from the clinical context knowledge to obtain a clinical semantic embedding; and retrieve reference report features related to the global image representation from a preset reference report database to obtain a reference report feature set; Step 3: construct a clinical semantic modulation module based on a dual-pathway architecture and a gated fusion mechanism, using a spatial semantic injection pathway and a channel feature recalibration pathway; input the block-level visual features and the clinical semantic embedding into the clinical semantic modulation module to obtain a modulated visual representation; Step 4: construct a hyperbolic prototype classification module based on the Poincaré ball and the Softmax function; input the global image representation and the reference report feature set into the hyperbolic prototype classification module to generate a diagnostic perception prompt; Step 5: input the modulated visual representation and the diagnostic perception prompt into a BERT-based decoder to autoregressively generate the radiology report.

2. The radiology report generation method integrating clinical semantic modulation and hyperbolic prototype classification according to claim 1, characterized in that, in step 1: the chest X-ray image is a two-dimensional chest X-ray image; the clinical context knowledge includes the examination indication, patient history, comparison examination information, and examination technique information; and the preprocessing includes resizing the chest X-ray image to a fixed resolution of 224×224.

3. The radiology report generation method integrating clinical semantic modulation and hyperbolic prototype classification according to claim 2, characterized in that step 2 (feature extraction and reference report retrieval) specifically comprises the following sub-steps: using the visual encoder to extract features from the preprocessed image to obtain the block-level visual features; applying global average pooling to the block-level visual features along the spatial dimension to obtain the global image representation; using the medical language encoder to extract features from the clinical context knowledge to obtain the clinical semantic embedding; based on the preset reference report database, using the global image representation as the query vector, computing the cosine similarity between the global image representation and the visual features corresponding to every report in the database, and selecting the $N_r$ reports with the highest similarity; and using the MedKLIP encoder to encode each of the selected top-$N_r$ reports into text features to obtain the reference report feature set $R = \{r_i\}_{i=1}^{N_r}$; where $r_i$ denotes the feature of the $i$-th reference report, $i$ denotes the index of the reference report feature, and $N_r$ denotes the total number of reference report features.
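
As a non-limiting illustration, the cosine-similarity retrieval in this sub-step can be sketched as follows. The function name `retrieve_top_k`, the toy two-dimensional feature bank, and the choice of two retrieved reports are assumptions for this example; encoding the retrieved reports with the MedKLIP encoder is not shown.

```python
import numpy as np

def retrieve_top_k(query, bank, k):
    """Return indices and similarities of the k rows of `bank`
    that are most cosine-similar to `query`.

    query: (d,)   global image representation
    bank:  (M, d) visual features paired with the stored reports
    """
    q = query / np.linalg.norm(query)
    b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    sims = b @ q                      # cosine similarity per report
    order = np.argsort(-sims)[:k]     # highest similarity first
    return order, sims[order]

# Toy database of four reports with 2-D visual features.
bank = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
query = np.array([2.0, 0.1])
idx, sims = retrieve_top_k(query, bank, k=2)
print(idx, sims)
```

Normalizing both sides first makes the dot product equal to the cosine similarity, so the ranking is independent of feature magnitude.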

4. The radiology report generation method integrating clinical semantic modulation and hyperbolic prototype classification according to claim 3, characterized in that: in the step of using the visual encoder to extract features from the preprocessed image to obtain the block-level visual features, the following relationship holds: $F_v = E_v(I)$; where $F_v$ denotes the block-level visual features, $E_v(\cdot)$ denotes the visual encoder, and $I$ denotes the preprocessed image; in the step of applying global average pooling to the block-level visual features along the spatial dimension to obtain the global image representation, the following relationship holds: $f_g = \mathrm{GAP}(F_v)$; where $f_g$ denotes the global image representation and $\mathrm{GAP}(\cdot)$ denotes global average pooling; in the step of using the medical language encoder to extract features from the clinical context knowledge to obtain the clinical semantic embedding, the following relationship holds: $F_c = E_t(C)$; where $F_c$ denotes the clinical semantic embedding, $E_t(\cdot)$ denotes the medical language encoder, and $C$ denotes the clinical context knowledge.

5. The radiology report generation method integrating clinical semantic modulation and hyperbolic prototype classification according to claim 4, characterized in that step 3 (constructing the clinical semantic modulation module and obtaining the modulated visual representation) specifically comprises the following sub-steps: in the spatial semantic injection pathway, injecting the clinical semantic embedding into the block-level visual features through an attention mechanism to obtain the attention output features; processing the attention output features sequentially by regularization, a residual connection, and layer normalization to obtain the output of the spatial semantic injection pathway; in the channel feature recalibration pathway, processing the clinical semantic embedding with a multilayer perceptron to generate a scaling parameter and an offset parameter; applying layer normalization to the block-level visual features to obtain normalized block-level visual features; using the scaling and offset parameters as affine transformation coefficients to apply an affine transformation to the normalized block-level visual features, thereby obtaining the output of the channel feature recalibration pathway; concatenating the output of the spatial semantic injection pathway with the output of the channel feature recalibration pathway to obtain the concatenated features; processing the concatenated features sequentially by a learnable projection matrix and a Sigmoid activation function to obtain a gate tensor; mapping the concatenated features with a learnable projection matrix and multiplying the result element-wise by the gate tensor; and finally applying a residual connection and layer normalization in sequence to obtain the modulated visual representation.

6. The radiology report generation method integrating clinical semantic modulation and hyperbolic prototype classification according to claim 5, characterized in that: in the step of injecting the clinical semantic embedding into the block-level visual features through the attention mechanism in the spatial semantic injection pathway to obtain the attention output features, the following relationships hold: $Q = F_v W_Q$, $K = \mathrm{Expand}(F_c)\, W_K$, $V = \mathrm{Expand}(F_c)\, W_V$, $F_{att} = \mathrm{Softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V$; where $Q$ denotes the query matrix computed from the block-level visual features $F_v$, $K$ denotes the key matrix and $V$ the value matrix computed from the clinical semantic embedding $F_c$, $W_Q$, $W_K$ and $W_V$ all denote learnable projection matrices, $\mathrm{Expand}(\cdot)$ denotes expanding the input features to match the dimensions of the block-level visual features, $F_{att}$ denotes the attention output features, $\mathrm{Softmax}(\cdot)$ denotes the Softmax function, $\top$ denotes matrix transpose, and $d_k$ denotes the dimension of the key vectors; in the step of processing the attention output features by regularization, a residual connection, and layer normalization to obtain the output of the spatial semantic injection pathway, the following relationship holds: $F_s = \mathrm{LN}\big(F_v + \mathrm{Dropout}(F_{att})\big)$; where $F_s$ denotes the output of the spatial semantic injection pathway, $\mathrm{LN}(\cdot)$ denotes layer normalization, and $\mathrm{Dropout}(\cdot)$ denotes the regularization (dropout) operation.
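
As a non-limiting illustration, the spatial semantic injection pathway can be sketched as a single-head cross-attention in NumPy. Several points are assumptions for this example: the clinical semantic embedding is treated as a short sequence of token embeddings (so the expansion step is not modelled), layer normalization is applied without learnable gain and bias, and the regularization step is omitted as it would be at inference time. All shapes and names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    # Layer normalization without learnable gain/bias (assumption).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def spatial_semantic_injection(F_v, F_c, W_q, W_k, W_v):
    """Cross-attention: visual blocks query the clinical embedding.

    F_v: (N, d) block-level visual features
    F_c: (M, d) clinical semantic token embeddings (assumed sequence)
    """
    Q = F_v @ W_q                                   # (N, d_k)
    K = F_c @ W_k                                   # (M, d_k)
    V = F_c @ W_v                                   # (M, d)
    attn = softmax(Q @ K.T / np.sqrt(K.shape[-1]))  # (N, M)
    F_att = attn @ V                                # (N, d)
    # Residual connection followed by layer normalization.
    return layer_norm(F_v + F_att), attn

rng = np.random.default_rng(0)
N, M, d, d_k = 6, 3, 8, 4
F_v = rng.normal(size=(N, d))
F_c = rng.normal(size=(M, d))
W_q = rng.normal(size=(d, d_k))
W_k = rng.normal(size=(d, d_k))
W_v = rng.normal(size=(d, d))
F_s, attn = spatial_semantic_injection(F_v, F_c, W_q, W_k, W_v)
print(F_s.shape, attn.sum(axis=1))
```

Each row of `attn` sums to one, so every visual block receives a convex combination of the clinical value vectors before the residual connection.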

7. The radiology report generation method integrating clinical semantic modulation and hyperbolic prototype classification according to claim 6, characterized in that: in the step of processing the clinical semantic embedding with the multilayer perceptron in the channel feature recalibration pathway to generate the scaling and offset parameters, the following relationship holds: $[\gamma, \beta] = \mathrm{MLP}(F_c)$; where $\gamma$ denotes the scaling parameter, $\beta$ denotes the offset parameter, $F_c$ denotes the clinical semantic embedding, and $\mathrm{MLP}(\cdot)$ denotes the multilayer perceptron; in the steps of applying layer normalization to the block-level visual features to obtain the normalized block-level visual features, and then using the scaling and offset parameters as affine transformation coefficients to transform the normalized block-level visual features into the output of the channel feature recalibration pathway, the following relationship holds: $F_{ch} = \gamma \odot \mathrm{LN}(F_v) + \beta$; where $F_{ch}$ denotes the output of the channel feature recalibration pathway, $F_v$ denotes the block-level visual features, $\mathrm{LN}(\cdot)$ denotes layer normalization, and $\odot$ denotes element-wise multiplication.
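
As a non-limiting illustration, the channel feature recalibration pathway can be sketched as a feature-wise affine modulation. The two-layer ReLU perceptron, the pooled single-vector clinical embedding `f_c`, and all shapes are assumptions for this example; the claim only requires that a multilayer perceptron produce the scaling and offset parameters.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Layer normalization without learnable gain/bias (assumption).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def channel_recalibration(F_v, f_c, W1, b1, W2, b2):
    """Scale and shift normalized visual features per channel.

    F_v: (N, d) block-level visual features
    f_c: (d_c,) pooled clinical semantic embedding (assumed single vector)
    The MLP outputs 2*d values, split into gamma and beta.
    """
    hidden = np.maximum(f_c @ W1 + b1, 0.0)        # ReLU hidden layer
    gamma, beta = np.split(hidden @ W2 + b2, 2)    # each of shape (d,)
    return gamma * layer_norm(F_v) + beta          # broadcast over blocks

rng = np.random.default_rng(0)
N, d, d_c, d_hid = 5, 8, 6, 16
F_v = rng.normal(size=(N, d))
f_c = rng.normal(size=d_c)
W1, b1 = rng.normal(size=(d_c, d_hid)), np.zeros(d_hid)
W2, b2 = 0.1 * rng.normal(size=(d_hid, 2 * d)), np.zeros(2 * d)
F_ch = channel_recalibration(F_v, f_c, W1, b1, W2, b2)
print(F_ch.shape)
```

Because the same scaling and offset vectors are broadcast over all blocks, the clinical context changes the weighting of channels rather than the spatial layout.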

8. The radiology report generation method integrating clinical semantic modulation and hyperbolic prototype classification according to claim 7, characterized in that: in the step of concatenating the output of the spatial semantic injection pathway with the output of the channel feature recalibration pathway to obtain the concatenated features, the following relationship holds: $F_{cat} = \mathrm{Concat}(F_s, F_{ch})$; where $F_{cat}$ denotes the concatenated features, $F_s$ and $F_{ch}$ denote the outputs of the spatial semantic injection pathway and the channel feature recalibration pathway respectively, and $\mathrm{Concat}(\cdot)$ denotes feature concatenation; in the step of processing the concatenated features sequentially by a learnable projection matrix and a Sigmoid activation function to obtain the gate tensor, the following relationship holds: $G = \sigma(F_{cat} W_g)$; where $G$ denotes the gate tensor, $\sigma(\cdot)$ denotes the Sigmoid activation function, and $W_g$ denotes a learnable projection matrix; in the step of mapping the concatenated features with a learnable projection matrix, multiplying the result element-wise by the gate tensor, and finally applying a residual connection and layer normalization to obtain the modulated visual representation, the following relationship holds: $F_m = \mathrm{LN}\big(F_v + G \odot (F_{cat} W_p)\big)$; where $F_m$ denotes the modulated visual representation, $F_v$ denotes the block-level visual features, $\mathrm{LN}(\cdot)$ denotes layer normalization, $\odot$ denotes element-wise multiplication, and $W_p$ denotes a learnable projection matrix.
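
As a non-limiting illustration, the gated fusion of the two pathway outputs can be sketched as follows. Taking the block-level visual features as the residual branch is an assumption of this example, as are the shapes, the names, and the use of layer normalization without learnable parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def layer_norm(x, eps=1e-5):
    # Layer normalization without learnable gain/bias (assumption).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def gated_fusion(F_v, F_s, F_ch, W_g, W_p):
    """Fuse the two pathway outputs with a learned sigmoid gate.

    F_v, F_s, F_ch: (N, d); W_g, W_p: (2*d, d)
    """
    F_cat = np.concatenate([F_s, F_ch], axis=-1)   # (N, 2d)
    G = sigmoid(F_cat @ W_g)                       # gate values in (0, 1)
    fused = G * (F_cat @ W_p)                      # gated projection
    # Residual on the input visual features is an assumption here.
    return layer_norm(F_v + fused), G

rng = np.random.default_rng(0)
N, d = 5, 8
F_v, F_s, F_ch = (rng.normal(size=(N, d)) for _ in range(3))
W_g = 0.1 * rng.normal(size=(2 * d, d))
W_p = 0.1 * rng.normal(size=(2 * d, d))
F_m, G = gated_fusion(F_v, F_s, F_ch, W_g, W_p)
print(F_m.shape, float(G.min()), float(G.max()))
```

The gate is computed per block and per channel, so the module can favor one pathway in some regions and the other elsewhere.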

9. The radiology report generation method integrating clinical semantic modulation and hyperbolic prototype classification according to claim 8, characterized in that step 4 (constructing the hyperbolic prototype classification module and generating the diagnostic perception prompt) specifically comprises the following sub-steps:
applying mean pooling to the reference report feature set to obtain the aggregated report feature, where the following relationship holds: $\bar{r} = \frac{1}{N_r} \sum_{i=1}^{N_r} r_i$; where $\bar{r}$ denotes the aggregated report feature, $r_i$ denotes the feature of the $i$-th reference report, and $N_r$ denotes the number of reference report features;
concatenating the aggregated report feature with the global image representation to obtain the initial fused feature, where the following relationship holds: $h_0 = \mathrm{Concat}(\bar{r}, f_g)$; where $h_0$ denotes the initial fused feature and $f_g$ denotes the global image representation;
processing the initial fused feature sequentially by a first MLP layer, a GELU activation function, and a second MLP layer to obtain the case embedding in Euclidean space, where the following relationship holds: $z_E = W_2\, \mathrm{GELU}(W_1 h_0 + b_1) + b_2$; where $z_E$ denotes the case embedding in Euclidean space, $W_1$ and $W_2$ both denote weight matrices, $b_1$ and $b_2$ both denote bias vectors, and $\mathrm{GELU}(\cdot)$ denotes the GELU activation function;
linearly projecting the Euclidean case embedding to the target hyperbolic dimension and then applying L2 normalization and scaling to obtain the normalized Euclidean case embedding vector, where the following relationship holds: $\tilde{z} = \alpha \cdot \frac{W_h z_E}{\lVert W_h z_E \rVert_2}$; where $\tilde{z}$ denotes the normalized case embedding vector in Euclidean space, $\alpha$ denotes a fixed scaling factor, $W_h$ denotes the projection matrix, and $\lVert \cdot \rVert_2$ denotes the L2 norm;
mapping the normalized Euclidean case embedding vector, by the exponential map at the origin, onto the Poincaré ball with curvature parameter $c$ to obtain the hyperbolic embedding of the current case, where the following relationship holds: $z_H = \exp_0^c(\tilde{z}) = \tanh\!\big(\sqrt{c}\, \lVert \tilde{z} \rVert\big) \frac{\tilde{z}}{\sqrt{c}\, \lVert \tilde{z} \rVert} \in \mathbb{B}_c^{d_h}$; where $z_H$ denotes the hyperbolic embedding of the current case, $\tanh(\cdot)$ denotes the hyperbolic tangent function, $c$ denotes the curvature parameter, and $\mathbb{B}_c^{d_h}$ denotes the Poincaré ball;
building learnable hyperbolic prototypes: a set of learnable prototype embeddings is initialized on the Poincaré ball, one for each disease-state pair; $N_d$ denotes the number of diseases and $N_s$ the number of states, the states comprising four types: blank, positive, negative, and uncertain; a prototype tensor $P = \{p_{j,s}\} \in \mathbb{R}^{N_d \times N_s \times d_h}$ is maintained; where $p_{j,s}$ denotes the prototype embedding of disease $j$ in state $s$, and $d_h$ denotes the target hyperbolic dimension; the prototypes are implemented as manifold parameters and optimized by Riemannian gradient descent, which converts the Euclidean gradient into the Riemannian gradient and projects the updated embeddings back onto the manifold, ensuring that all prototypes remain within the valid region of the Poincaré ball throughout training;
given the case embedding $z_H$ and a prototype embedding $p_{j,s}$, classifying by the Poincaré distance; the distance between an embedding $x$ and an embedding $y$ in the Poincaré ball is defined as: $d_c(x, y) = \frac{2}{\sqrt{c}}\, \operatorname{artanh}\!\big(\sqrt{c}\, \lVert (-x) \oplus_c y \rVert\big)$; where $d_c(x, y)$ denotes the distance between embedding $x$ and embedding $y$ in the Poincaré ball, $\oplus_c$ denotes Möbius addition, and $\operatorname{artanh}(\cdot)$ denotes the inverse hyperbolic tangent function; Möbius addition is defined as: $x \oplus_c y = \frac{\big(1 + 2c \langle x, y \rangle + c \lVert y \rVert^2\big)\, x + \big(1 - c \lVert x \rVert^2\big)\, y}{1 + 2c \langle x, y \rangle + c^2 \lVert x \rVert^2 \lVert y \rVert^2}$; where $\langle \cdot, \cdot \rangle$ denotes the standard Euclidean inner product and $\lVert \cdot \rVert$ denotes the standard Euclidean norm;
for each disease $j$ and each state $s$, computing the squared hyperbolic distance between the case embedding $z_H$ and the prototype embedding $p_{j,s}$, and scaling the negative squared distance by a temperature parameter to obtain the classification score of disease $j$ in state $s$, where the following relationship holds: $s_{j,s} = -\frac{d_c(z_H, p_{j,s})^2}{\tau}$; where $s_{j,s}$ denotes the classification score of disease $j$ in state $s$, $\tau$ denotes the temperature parameter with $\tau > 0$, and $d_c(z_H, p_{j,s})$ denotes the hyperbolic distance between the case embedding $z_H$ and the prototype embedding $p_{j,s}$;
computing the probability of disease $j$ being in state $s$ by applying the Softmax function to the classification scores of all states, where the following relationship holds: $P_{j,s} = \frac{\exp(s_{j,s})}{\sum_{s'=1}^{N_s} \exp(s_{j,s'})}$; where $P_{j,s}$ denotes the probability of disease $j$ being in state $s$, $\exp(\cdot)$ denotes the exponential function, and $s_{j,s}$ denotes the classification score of disease $j$ in state $s$;
at inference, selecting for each disease $j$ the state with the highest probability as the prediction result, where the following relationship holds: $\hat{y}_j = \arg\max_{s} P_{j,s}$; where $\hat{y}_j$ denotes the prediction result and $\arg\max$ denotes selecting the state that attains the maximum value;
mapping the prediction result $\hat{y}_j$ to a disease-specific token $t_j$ according to a mapping rule [mapping rule not reproduced in the source text];
assembling all disease-specific tokens in a fixed order to obtain the diagnostic perception prompt, where the following relationship holds: $D = [t_1, t_2, \dots, t_{N_d}]$; where $D$ denotes the diagnostic perception prompt, and $t_1$, $t_2$ and $t_{N_d}$ all denote disease-specific tokens.
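
As a non-limiting illustration, the hyperbolic operations named in this claim (exponential map at the origin, Möbius addition, Poincaré distance, temperature-scaled Softmax over states) can be sketched in NumPy using their standard textbook forms. The toy sizes, the random prototypes, and the values of the curvature parameter and temperature are assumptions for this example; Riemannian optimization of the prototypes is not shown.

```python
import numpy as np

STATES = ["blank", "positive", "negative", "uncertain"]

def expmap0(v, c):
    # Exponential map at the origin of the Poincare ball.
    sqrt_c = np.sqrt(c)
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), 1e-12)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def mobius_add(x, y, c):
    xy = (x * y).sum(axis=-1, keepdims=True)
    x2 = (x * x).sum(axis=-1, keepdims=True)
    y2 = (y * y).sum(axis=-1, keepdims=True)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c ** 2 * x2 * y2
    return num / den

def poincare_dist(x, y, c):
    sqrt_c = np.sqrt(c)
    diff = np.linalg.norm(mobius_add(-x, y, c), axis=-1)
    # Clip to keep artanh finite near the boundary of the ball.
    return 2.0 / sqrt_c * np.arctanh(np.clip(sqrt_c * diff, 0.0, 1 - 1e-7))

def classify(z_h, prototypes, c, tau):
    """z_h: (d,) case embedding; prototypes: (n_diseases, n_states, d)."""
    dist = poincare_dist(z_h, prototypes, c)          # (n_diseases, n_states)
    scores = -(dist ** 2) / tau
    scores = scores - scores.max(axis=-1, keepdims=True)
    probs = np.exp(scores)
    probs = probs / probs.sum(axis=-1, keepdims=True)
    return probs, probs.argmax(axis=-1)

rng = np.random.default_rng(0)
c, tau = 1.0, 0.5
# Two diseases, four states, 3-D ball; prototypes drawn at random.
prototypes = expmap0(rng.normal(scale=0.5, size=(2, 4, 3)), c)
z_h = prototypes[0, 1].copy()   # case placed exactly on one prototype
probs, pred = classify(z_h, prototypes, c, tau)
print([STATES[s] for s in pred])
```

Placing the case embedding exactly on the "positive" prototype of the first disease gives distance zero to that prototype, so that state receives the highest probability for that disease.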

10. The radiology report generation method integrating clinical semantic modulation and hyperbolic prototype classification according to claim 9, characterized in that step 5 (inputting the modulated visual representation and the diagnostic perception prompt into the BERT-based decoder to autoregressively generate the radiology report) specifically comprises the following sub-steps: using the modulated visual representation as the encoder hidden state and attending to it through a cross-attention layer, so that at each step of report generation the decoder can dynamically focus on the visual regions most relevant to the content currently being generated; tokenizing the diagnostic perception prompt and prepending it to the input sequence to guide the autoregressive generation of the report, so that the decoder is aware of the diagnostic status of each disease when generating the report; the report generation model is based on a conditional distribution and follows the standard autoregressive formula: $p(Y \mid I, C) = \prod_{t=1}^{T} p(y_t \mid y_{<t}, I, C)$; where $p(Y \mid I, C)$ denotes the joint conditional probability of generating the target report token sequence $Y$ given the image $I$ and the clinical information $C$, $\prod$ denotes the product operation, $T$ denotes the length of the target report token sequence, $Y$ denotes the target report token sequence, $y_t$ denotes the token generated from a fixed vocabulary at time step $t$, and $y_{<t}$ denotes the previously generated tokens.
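
As a non-limiting illustration, prompt-prefixed autoregressive generation can be sketched with a toy stand-in for the decoder. Everything in this example is hypothetical: the vocabulary, the token names `[EFFUSION_POS]` and `[EFFUSION_NEG]`, and the lookup-table `toy_next_token_probs` function, which replaces the BERT-based decoder and its cross-attention over the modulated visual representation. Greedy selection is also an assumption; the claim does not fix a decoding strategy.

```python
import math

VOCAB = ["<bos>", "<eos>", "[EFFUSION_POS]", "[EFFUSION_NEG]",
         "no", "small", "pleural", "effusion", "."]
FOLLOW = {"no": "pleural", "small": "pleural", "pleural": "effusion",
          "effusion": ".", ".": "<eos>"}

def toy_next_token_probs(prefix):
    # Hypothetical stand-in for p(y_t | y_<t, ...): the first word
    # depends on the diagnostic prompt, later words on the last token.
    last = prefix[-1]
    if last == "<bos>":
        likely = "small" if "[EFFUSION_POS]" in prefix else "no"
    else:
        likely = FOLLOW[last]
    rest = 0.1 / (len(VOCAB) - 1)
    return {tok: (0.9 if tok == likely else rest) for tok in VOCAB}

def greedy_generate(prompt_tokens, max_len=10):
    # The diagnostic prompt is prepended to the decoder input sequence.
    prefix = list(prompt_tokens) + ["<bos>"]
    report, log_prob = [], 0.0
    for _ in range(max_len):
        probs = toy_next_token_probs(prefix)
        token = max(probs, key=probs.get)      # greedy choice
        log_prob += math.log(probs[token])     # chain-rule accumulation
        if token == "<eos>":
            break
        report.append(token)
        prefix.append(token)
    return report, log_prob

report, log_prob = greedy_generate(["[EFFUSION_POS]"])
print(" ".join(report), log_prob)
```

The accumulated `log_prob` is the logarithm of the product of per-step conditional probabilities, which mirrors the autoregressive factorization stated in the claim.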

Citation Information

Patent Citations

Radiology report generation method based on semantic alignment
CN119517277A
Chest radiation medical report generation method based on mapping knowledge domain
CN120376026A
