Medical information processing device, medical imaging diagnostic device, medical information processing method, and medical information processing program

The medical information processing device enhances VQA reliability by using two generation models to generate and display findings from medical images, addressing the challenge of inaccurate answers in existing systems.

JP2026109613A · Pending · Publication Date: 2026-07-01 · CANON KK

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
CANON KK
Filing Date
2025-12-19
Publication Date
2026-07-01

AI Technical Summary

Technical Problem

Existing Visual Question Answering (VQA) systems in the medical field lack reliability in generating accurate answers to questions about medical images.

Method used

A medical information processing device and method that utilizes a first generation model to generate findings from medical image data, which are then input into a second generation model to output answer information, accompanied by displaying this information alongside the medical image data, enhancing reliability through evidence-based responses.

Benefits of technology

Improves the reliability of VQA systems by generating accurate and consistent answers to medical image-related questions, leveraging evidence generation and structured reporting to enhance accuracy and consistency.

✦ Generated by Eureka AI based on patent content.

Abstract

This invention provides a medical information processing device, method, and program that enable more reliable visual question answering (VQA) compared to conventional methods, when using VQA in the medical field. [Solution] The medical information processing device comprises at least one processing unit and a display control unit. The processing unit inputs medical image data 12 generated by a medical imaging device to a first generation model 16 that generates one or more findings based on the medical image data, inputs the medical image data, question information 14, and the generated one or more findings to a second generation model that outputs answer information related to the medical image data, and the display control unit displays the answer information 18 and at least a portion of the medical image data generated by the medical imaging device on a display device.

Description

Technical Field

[0001] The embodiments described in this specification generally relate to a medical information processing device, a medical imaging diagnostic device, a medical information processing method, and a medical information processing program, and particularly relate to the automatic determination of medical information by a text generation model, but are not limited thereto.

Background Art

[0002] Visual Question Answering (VQA) relates to answering questions composed of natural language regarding the content of an image. In a deep learning system, Visual Question Answering (VQA) is generally executed by training an Answer Generator (AG) model that generates a text answer for a text question input together with a target image. In VQA, answers are given to text questions regarding the content of an image, such as "How many cars are shown in this image?" or "What color is the man's coat?". In the medical field, the questions usually relate to medical images, such as "How many nodules are there in the left lung?" or "Is the heart enlarged?".

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] When using VQA in the medical field, it would be beneficial to be able to realize a more reliable VQA compared to the prior art.

[0005] One of the problems to be solved by the embodiments disclosed in this specification and the drawings is to realize a more reliable VQA compared to the prior art when using VQA in the medical field.

Means for Solving the Problem

[0006] A medical information processing device according to one embodiment comprises at least one processing unit and a display control unit. The processing unit inputs medical image data generated by a medical imaging device to a first generation model that generates one or more findings based on the medical image data, inputs the medical image data, question information, and the generated one or more findings to a second generation model that outputs answer information related to the medical image data, and the display control unit displays the answer information and at least a portion of the medical image data generated by the medical imaging device.

Brief Description of the Drawings

[0007] Here, embodiments are described as non-limiting examples and are shown in the following figures.
[Figure 1] Figure 1 is a schematic diagram of a visual question answering method.
[Figure 2] Figure 2 is a schematic diagram of a medical information processing device according to the embodiment.
[Figure 3] Figure 3 is a schematic diagram of a medical information processing method according to the embodiment.
[Figure 4] Figure 4 is a schematic diagram of the method according to the embodiment.
[Figure 5] Figure 5 is a schematic diagram of a method according to a further embodiment.
[Figure 6] Figure 6 is a schematic diagram of a method according to another embodiment.

Modes for Carrying Out the Invention

[0008] One embodiment provides a medical information processing device comprising at least one processing unit and a display control unit, wherein the processing unit inputs medical image data generated by a medical imaging device to a first generation model that generates one or more findings based on the medical image data, inputs the medical image data, question information, and the generated one or more findings to a second generation model that outputs answer information related to the medical image data, and the display control unit displays the answer information and at least a portion of the medical image data generated by the medical imaging device.

[0009] One embodiment provides a medical imaging diagnostic device comprising a scanner for scanning a patient to obtain a medical image dataset, at least one processing unit, and a display control unit, wherein the processing unit inputs the medical image data to a first generation model that generates one or more findings based on the medical image data, inputs the medical image data, question information, and the generated one or more findings to a second generation model that outputs answer information related to the medical image data, and the display control unit displays the answer information and at least a portion of the medical image data generated by the medical imaging device.

[0010] One embodiment provides a medical information processing method that includes inputting medical image data generated by a medical imaging device into a first generation model that generates one or more findings based on the medical image data; inputting the medical image data, question information, and the generated one or more findings into a second generation model that outputs answer information related to the medical image data; and displaying the answer information and at least a portion of the medical image data generated by the medical imaging device on a display device.

[0011] Figure 1 shows an overview of the VQA deep learning process (Method 10). In Figure 1, medical image data 12 and a question 14 submitted by the user in natural language are provided to the answer generation model 16. The answer generation model 16 determines the answer 18. In Figure 1, the medical image data 12 includes images of the patient's lungs taken at multiple different time points, and the question 14 asks whether there has been any change in the lungs between the acquisition of the images (i.e., any change compared to a reference image). The answer generation model 16 responds to the question 14 with answer 18, specifically reporting that "the right pneumothorax has disappeared."

[0012] The VQA task can use the "chain-of-thought" prompting technique. This technique includes the following steps:
a) Users manually write a "rationale" containing the reasoning behind each question-answer pair.
b) A first generative model is trained to generate "evidence" from the input image data and a question posed in natural language.
c) A second generative model is trained to answer the question from the input image data, the question, and the evidence generated in (b).

[0013] Alternatively, image data can be processed using computer-aided detection or diagnosis (CAD) algorithms before the results are refined using machine learning models such as large language models (LLMs). While LLMs struggle with image processing, CAD algorithms have achieved significant success in this area. Generative models can be trained to produce CAD outputs such as classifications, lesion segmentations, and reports. The LLMs can then be used to reorganize the generated outputs into natural language text.

[0014] Report generation, the task of creating accurate reports based on medical imaging data, is a further possible approach.

[0015] A data processing apparatus 20 according to an embodiment is schematically shown in FIG. 2. In the present embodiment, the data processing apparatus 20 is configured to process medical information including image data and semantic data. In other embodiments, the data processing apparatus 20 may be configured to process any other suitable medical information.

[0016] The data processing apparatus 20 includes a computing device 22 which is a personal computer (PC) or a workstation in this example. The computing device 22 is connected to one or more output devices 26 such as a display (screen) or other display devices, and one or more input devices 28 such as a computer keyboard and a mouse.

[0017] The computing device 22 is configured to acquire a data set from the data storage unit 30. At least a part of the data acquired from the data storage unit includes medical information, and the medical information includes, for example, medical imaging data such as imaging data acquired using a scanner 32. The medical image data may include two-dimensional, three-dimensional, or four-dimensional data in any medical imaging device. For example, the scanner 32 as a medical imaging device may include a magnetic resonance (MR or MRI) scanner, a computed tomography (CT) scanner, a cone beam CT scanner, an X-ray scanner, an ultrasonic scanner, a positron emission tomography (PET) scanner, or a single photon emission computed tomography (SPECT) scanner. Therefore, the scanner 32 is a medical imaging device. The medical image data 52 is generated by the medical imaging device.

[0018] Computing device 22 may receive data from one or more additional data storage units (not shown) instead of, or in addition to, data storage unit 30. For example, computing device 22 may receive image data from one or more remote data storage units (not shown) that may form part of a Picture Archiving and Communication System (PACS) or other information system.

[0019] Computing device 22 provides processing resources for automatically or semi-automatically processing data. Computing device 22 includes a processing device 34. Processing device 34 includes a model training circuit 36 configured to train one or more models such as a machine learning model and a generation model, a data processing circuit 38 configured to apply the trained model and perform other processes such as image classification, visual question answering, image captioning, and automatic reporting, and an interface circuit 40 configured to obtain user input or other input and output data processing results. Note that data processing circuit 38 is an example of a processing unit.

[0020] In the present embodiment, circuits 36, 38, and 40 are implemented in computing device 22 by a computer program having computer-readable instructions executable by a computer for performing the methods of the embodiment. However, in other embodiments, various circuits may be implemented as one or more Application Specific Integrated Circuits (ASICs) or Field Programmable Gate Arrays (FPGAs).

[0021] Furthermore, the computing device 22 includes a hard drive and other PC components, including RAM, ROM, a data bus, an operating system including various device drivers, and hardware devices such as a graphics card. Such components are not shown in Figure 2 for clarity.

[0022] The data processing device 20 in Figure 2 is configured to perform the methods shown and / or described below.

[0023] Figure 3 shows a method 50 for processing medical information, including medical image data. In Figure 3, medical image data 52 and text data 54 are provided to a report generator 58. The report generator 58 is a generation model, also called the first generation model. The medical image data 52 includes an image of the chest obtained using X-ray imaging. In other embodiments, any other form of medical image data may be used. The image data includes two images: one obtained before the medical procedure and one obtained after the medical procedure. The text data 54 includes instructions and indications for the report generator 58 (e.g., the first generation model) to generate a report, specifically including indications such as "After chest tube insertion. Position confirmed by CXR." The report generator 58 generates one or more findings in the form of a report 60, based on the medical image data 52 and text data 54 as question information, for example. The report 60 includes findings / evidence, stating, "Right chest tube in place. Right pneumothorax resolved. Heart size normal. Left lung clear." Image data and text data may be input as vector representations.

[0024] Next, medical image data 52, a report 60 containing one or more pieces of evidence / findings generated by the report generator 58, and a question 56 are provided to the answer generator 62. The question 56 is in text format and may be composed of natural language. In Figure 3, the question 56 is written as, "What has changed compared to the reference image?" The answer generator 62 is a generative model, also called a second generative model. The answer generator 62 generates an answer 64 (answer information) based on the medical image data 52, the report 60, and the question 56. The question, report or other findings, and image data may be input as vector representations.
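The two-stage flow of Method 50 described above can be sketched as follows. This is a minimal illustration only, not the implementation of the embodiment: the function and class names (`answer_with_evidence`, `VqaResult`, and the two stub models) are hypothetical, and the stubs stand in for trained generation models so that the data flow can be run end to end.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical signatures: images are treated as opaque objects here.
FirstModel = Callable[[Sequence[object], str], str]
SecondModel = Callable[[Sequence[object], str, str], str]


@dataclass
class VqaResult:
    findings: str  # intermediate evidence (corresponds to report 60)
    answer: str    # final answer (corresponds to answer 64)


def answer_with_evidence(
    images: Sequence[object],
    indication: str,
    question: str,
    first_model: FirstModel,
    second_model: SecondModel,
) -> VqaResult:
    """Generate findings first, then answer using images, findings, and question."""
    findings = first_model(images, indication)
    answer = second_model(images, findings, question)
    return VqaResult(findings=findings, answer=answer)


def stub_report_generator(images: Sequence[object], indication: str) -> str:
    # Stand-in for the first generation model; returns the example report text.
    return ("Right chest tube in place. Right pneumothorax resolved. "
            "Heart size normal. Left lung clear.")


def stub_answer_generator(images: Sequence[object], findings: str, question: str) -> str:
    # Stand-in for the second generation model. Toy rule (an assumption):
    # return the first sentence of the findings that mentions "resolved".
    for sentence in findings.split(". "):
        if "resolved" in sentence:
            return sentence.rstrip(".") + "."
    return "No change."


result = answer_with_evidence(
    images=["pre-procedure image", "post-procedure image"],
    indication="After chest tube insertion. Position confirmed by CXR.",
    question="What has changed compared to the reference image?",
    first_model=stub_report_generator,
    second_model=stub_answer_generator,
)
print(result.answer)
```

Keeping the findings in the returned object alongside the answer makes it straightforward to display the evidence next to the answer and the image data.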

[0025] The output device 26 in Figure 2 (an example of an output unit, such as a display device) may be used to display the answer 64 to the user. The output device 26 may also display medical image data 52 or a portion thereof. The output device 26 may display the answer 64 and the medical image data 52 or a portion thereof simultaneously (side by side). The output device 26 and / or the processing device 34 may select a portion of the medical image data 52 based on the answer 64 and display the selected portion of the medical image data 52. The output device 26 and / or the processing device 34 may select one or more display parameters, such as rendering parameters, and display at least a portion of the medical image data 52 according to the selected one or more display parameters.

[0026] As explained, Method 50 in Figure 3 includes an intermediate step in which the first generation model generates a report 60 before the answer generator 62 generates the answer to the question. This can be thought of as an intermediate “evidence generation” step, and the answer 64 can be considered grounded in the predicted report.

[0027] In some embodiments, the medical image data 52 provided to the report generator may include one image or multiple images, and the multiple images may or may not have a temporal relationship with each other. Any number of images obtained by any number of means may be included in the medical image data 52. Similarly, the text data 54 may include indications related to the medical image data and is not limited to the example in Figure 3. In some embodiments, only the medical image data 52 is provided to the report generator 58, and no text data 54 is provided. In some embodiments, the text data 54 is not required, and the report generator 58 may generate the report 60 without receiving the text data 54. In some embodiments, the need for the text data 54 may be eliminated by training the report generator.

[0028] In some embodiments, the medical image data 52 may be multimodal and may include semantic data or any other additional information in addition to the image data. In such examples, the multimodal data may be provided to the report generator 58 to obtain the report 60.

[0029] The report generator 58 and the answer generator 62 may each comprise a machine learning model and / or a generative model such as an artificial neural network, a large language model (LLM), a generative pre-trained transformer (GPT) network, or any combination thereof. Any other text generation model may be used. The report generator 58 may generate text autoregressively. A transformer network and / or a long short-term memory (LSTM) network may be used. In the embodiment of Figure 3, the report generator 58 and the answer generator 62 have the same architecture but are trained separately and therefore have different weights. In some embodiments, the report generator 58 and the answer generator 62 may have different architectures. In some embodiments, the report generator 58 (or the first generative model) may comprise a report generator. The report generator may be trained on the MIMIC-CXR dataset or a similar dataset. MIMIC-CXR is a large, publicly available dataset of chest X-ray images with free-text reports.

[0030] The table below lists the various types of findings or evidence that may constitute Report 60. [Table 1]

[0031] In addition to the table above, the report may consist of findings / evidence included in the report. Findings / evidence may include one or more of the following: semantic data, image data, and one or more segmentation masks.

[0032] Figure 4 shows a method 70 for processing medical image data. Features already described in relation to Figure 3 will not be described in detail in relation to Figure 4.

[0033] Figure 4 shows a structured report generator 72 and a structured report 74, in addition to the elements previously described in relation to the embodiment in Figure 3. The structured report generator 72 receives a report 60 containing evidence / findings as input from the report generator 58 or the first generating model. The structured report generator 72 is a generating model that edits the report 60 into a more structured form in the structured report 74. In the example in Figure 4, the structured report 74 includes a list of symptoms, their presence or absence, and / or their condition. The structured report 74 further includes any medical devices identified in the medical image data 52.

[0034] The structured report 74 in the example shown in Figure 4 is written as follows:
Atelectasis: None
Cardiac hypertrophy: None
Infiltrative shadow: None
Edema: None
Cardiac septum dilation: None
Fractures: None
Lung lesions: None
Lung shadows: None
Pleural effusion: None
Pneumonia: None
Pneumothorax: Right side disappeared
Auxiliary device: Chest drain
Other: No special notes.
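The fixed layout of the structured report 74 can be illustrated with a small sketch. In the embodiment the structured report generator 72 is a generative model; the code below swaps in a deterministic template purely to show what a consistent field order looks like. The names `REPORT_FIELDS` and `to_structured_report` are hypothetical, and the per-field statuses are assumed to have been extracted already.

```python
from typing import Dict

# Field order taken from the example structured report; the tuple itself is
# an illustrative assumption, not a format defined by the embodiment.
REPORT_FIELDS = (
    "Atelectasis",
    "Cardiac hypertrophy",
    "Infiltrative shadow",
    "Edema",
    "Cardiac septum dilation",
    "Fractures",
    "Lung lesions",
    "Lung shadows",
    "Pleural effusion",
    "Pneumonia",
    "Pneumothorax",
    "Auxiliary device",
    "Other",
)


def to_structured_report(statuses: Dict[str, str]) -> str:
    """Render statuses in a fixed field order, filling defaults for absent fields."""
    lines = []
    for name in REPORT_FIELDS:
        default = "No special notes" if name == "Other" else "None"
        lines.append(f"{name}: {statuses.get(name, default)}")
    return "\n".join(lines)


structured = to_structured_report(
    {"Pneumothorax": "Right side disappeared", "Auxiliary device": "Chest drain"}
)
print(structured)
```

Because every field is always emitted in the same position, two reports on different images differ only in their status values, which is the consistency property the structured report is meant to give the second generation model.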

[0035] Subsequently, medical image data 52 and questions 56, along with a structured report 74, are provided to the answer generator 62, i.e., the second generation model. The answer generator 62 generates the answer 64 based on its input.

[0036] Method 70 adds the feature of “evidence sanitization,” i.e., report structuring, to Method 50. The report 60 generated by the report generator 58 may differ from one instance to the next in how its contents are presented and ordered. Method 70 provides the answer generator 62, i.e., the second generation model, with reports / findings / evidence in a consistent format. The structured report 74 contains only information that is known to be reliably generated by the report generator 58. A processing circuit may be used to assess reliability by evaluating a metric such as “Clinical Efficacy” (also rendered “Clinical Efficiency”), which assesses the recall / precision of each label, such as “pleural effusion,” “cardiac hypertrophy,” and “assistive devices,” detected in the generated text report. A user review may then be performed to select reliable information. “Clinical Efficacy” is an example and is given for illustrative purposes only. Any other appropriate metric may be used.

[0037] Regarding "clinical efficiency" or "clinical efficacy," see "Jeremy Irvin et al, 2019. CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence (AAAI'19/IAAI'19/EAAI'19). AAAI Press, Article 73, 590-597. https://doi.org/10.1609/aaai.v33i01.3301590".

[0038] The structured report generator 72 is configured to include or exclude data received in its input based on the expected accuracy and / or type of information.
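The reliability assessment and the include / exclude decision can be sketched as a per-label precision / recall computation followed by a threshold filter. This is a minimal sketch under stated assumptions: labels are assumed to have been extracted from the generated and expert reports beforehand (for example by a labeler), the threshold values are arbitrary, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Set, Tuple


def per_label_precision_recall(
    predicted: Sequence[Set[str]],
    reference: Sequence[Set[str]],
    labels: Iterable[str],
) -> Dict[str, Tuple[float, float]]:
    """Compute (precision, recall) for each label over paired reports."""
    scores = {}
    for label in labels:
        tp = fp = fn = 0
        for pred, ref in zip(predicted, reference):
            if label in pred and label in ref:
                tp += 1
            elif label in pred:
                fp += 1
            elif label in ref:
                fn += 1
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[label] = (precision, recall)
    return scores


def reliable_labels(
    scores: Dict[str, Tuple[float, float]],
    min_precision: float,
    min_recall: float,
) -> List[str]:
    """Keep only labels whose precision and recall both meet the thresholds."""
    return sorted(
        label
        for label, (precision, recall) in scores.items()
        if precision >= min_precision and recall >= min_recall
    )


# Toy data: labels found in four generated reports vs. four expert reports.
predicted = [{"pleural effusion"}, {"pleural effusion", "cardiac hypertrophy"},
             set(), {"cardiac hypertrophy"}]
reference = [{"pleural effusion"}, {"pleural effusion"},
             {"cardiac hypertrophy"}, set()]

scores = per_label_precision_recall(
    predicted, reference, ["pleural effusion", "cardiac hypertrophy"]
)
kept = reliable_labels(scores, min_precision=0.8, min_recall=0.8)
print(kept)
```

Labels that fail the threshold would simply be left out of the structured report, and the surviving list could then be passed to a user for review.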

[0039] The structured report generator 72 may include one or more of the following: a generative model, a machine learning model, a deep learning model, an LLM, and a transformer. Any other text generation model may be used. The structured report generator 72 may generate text autoregressively. A transformer network and / or a long short-term memory (LSTM) network may be used. In other embodiments, the structured report may follow a format different from the format shown in Figure 4. The format in which the structured report 74 is composed may depend on the training of the structured report generator. In other embodiments, the structured report may include a list and / or status of anatomical features, pathological features, or any other clinically relevant medical information.

[0040] Figure 5 shows a method 80 for processing medical image data. Features already described in relation to Figure 3 or Figure 4 will not be described in detail in relation to Figure 5.

[0041] Figure 5 shows a report generator 58 that generates three reports 82, 84, and 60, in contrast to the embodiment in Figure 3 in which only one report 60 is generated. In other embodiments, the number of reports generated by the report generator 58 may be four or more, or two or fewer. Report 60 is the same as report 60 in Figure 3, but reports 84 and 82 are different.

[0042] Report 82 states that "the lungs are clear and the heart is of normal size," while Report 84 states that "the right pneumothorax has cleared and there are no abnormalities in the left lung."

[0043] As can be seen from the comparison of reports 60, 82, and 84, the reports generated by Method 80 may have different levels of completeness and may differ in content.

[0044] Subsequently, the medical image data 52 and the question 56, along with the generated reports 60, 82, and 84, are provided to the answer generator 62. The answer generator 62 generates an answer 64 based on its input.

[0045] Method 80 adds the feature of ensemble evidence to Method 50. Multiple reports 82, 84, and 60 generated by the report generator 58 may be produced by varying the medical image data 52 and / or text data 54 provided to the report generator 58. Variations of the image data 52 may include using rendering techniques to modify the medical image data 52 for each resulting report. Rendering techniques may include one or more of intensity transformations, spatial transformations, and other rendering parameters. Variations of the image data 52 may include selecting one or more subsets of the medical image data 52. Variations may be achieved by masking all or part of the clinical indications in the text data 54. Variations may be achieved by providing the report generator 58 with various text data 54, such as different instructions and / or different indications. Variations may be the result of using multiple differently trained report generators (not shown in Figure 5).
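The ensemble-evidence idea can be sketched by producing several input variants and calling the report generator once per variant. The sketch below is illustrative only: it uses a gamma curve as one possible intensity transformation and an empty string as a fully masked indication, both of which are assumptions, and the report generator is a hypothetical callable supplied by the caller.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]  # toy image: rows of pixel values in [0, 1]


def gamma_transform(image: Image, gamma: float) -> Image:
    """Apply a simple intensity transformation (gamma curve) to every pixel."""
    return [[pixel ** gamma for pixel in row] for row in image]


def make_variants(
    image: Image,
    indication: str,
    gammas: Sequence[float] = (0.8, 1.0, 1.2),
    mask_indication: bool = True,
) -> List[Tuple[Image, str]]:
    """Build (image, text) input variants for the report generator."""
    variants = [(gamma_transform(image, g), indication) for g in gammas]
    if mask_indication:
        # Variant with the clinical indication fully masked.
        variants.append((image, ""))
    return variants


def ensemble_reports(
    image: Image,
    indication: str,
    report_generator: Callable[[Image, str], str],
) -> List[str]:
    """Generate one report per input variant."""
    return [report_generator(img, text) for img, text in make_variants(image, indication)]


toy_image = [[0.0, 0.25], [0.5, 1.0]]
variants = make_variants(toy_image, "After chest tube insertion.")
reports = ensemble_reports(toy_image, "x", lambda img, text: f"report({len(text)})")
print(len(variants), reports)
```

All generated reports would then be passed together to the answer generator 62, which can weigh findings that recur across variants.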

[0046] Figure 6 shows a method 90 for processing medical image data. Features already described in relation to Figures 3, 4, or 5 will not be described in detail in relation to Figure 6.

[0047] Method 90 comprises a QA generator 92 that generates a set of multiple question-answer pairs 94. The QA generator 92 is a generative model that uses report 60 as input. The question-answer pairs 94 generated by the generator 92 based on report 60 are described as follows:

[0048] Q1: Do you have any medical devices or assistive devices? A1: Yes, a right chest drain has been inserted.

[0049] Q2: Has anything changed since the last scan? A2: Yes, the right pneumothorax has disappeared.

[0050] Q3: What is the size of your heart? A3: The size of the heart is normal.

[0051] The question-answer pairs 94 may be generated using any suitable deep learning technique or natural language processing (NLP) technique. In other embodiments, the generator 92 may generate four or more or two or fewer question-answer pairs 94. The number of question-answer pairs 94 generated may depend on the training of the generator 92 and / or input from the user. The question-answer pairs 94 are provided to a generative model, the question similarity matcher 96. The similarity matcher 96 further takes the question 56 as input and selects one of the generated question-answer pairs as its output. From the question-answer pairs 94, the similarity matcher 96 selects the question-answer pair 98 whose question is closest to the question 56 posed by the user. The similarity matcher 96 may select the question-answer pair 98 based on semantic similarity with the question 56 or on other measures of similarity.

[0052] Matching the user's question 56 directly against the generated questions may be computationally simpler than answering the question directly. The QA generator 92 generates the question-answer pairs 94 in anticipation of the user's question 56 rather than with knowledge of it. This means that question 56 can be answered efficiently without using the answer generator 62 of Method 50.

[0053] The QA generator 92 and the similarity matcher 96 may comprise one or more of the following: machine learning models, neural networks, deep learning models, and transformers. The QA generator 92 may generate text autoregressively. The QA generator 92 may comprise any text generation model. A transformer network and / or a long short-term memory (LSTM) network may be used. The similarity matcher 96 may comprise a text classification model such as Bidirectional Encoder Representations from Transformers (BERT) or a text dual encoder model that evaluates similarity using cosine similarity or other similarity techniques.
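The selection step of Method 90 can be sketched with cosine similarity. For illustration only, the sketch replaces a learned encoder (such as BERT or a dual encoder) with plain token counts, so it measures word overlap rather than semantic similarity; the function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: lower-cased token counts (stand-in for a learned encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_question(user_question: str, qa_pairs: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Return the generated question-answer pair closest to the user's question."""
    query = embed(user_question)
    return max(qa_pairs, key=lambda pair: cosine_similarity(query, embed(pair[0])))


qa_pairs = [
    ("Do you have any medical devices or assistive devices?",
     "Yes, a right chest drain has been inserted."),
    ("Has anything changed since the last scan?",
     "Yes, the right pneumothorax has disappeared."),
    ("What is the size of your heart?",
     "The size of the heart is normal."),
]

selected = match_question("What has changed compared to the reference image?", qa_pairs)
print(selected[1])
```

With these inputs the second pair shares the most tokens with the user's question and is selected; a learned encoder would be needed to match paraphrases that share few words.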

[0054] In some embodiments, the scanner 32 may be configured to scan a patient to obtain a medical image dataset, and the processing circuit 38 inputs the medical image data to a first generation model 58 to generate one or more findings based on the medical image data, and then inputs the medical image data, question information, and the generated one or more findings to a second generation model 62 to output answer information related to the medical image data.

[0055] The processing circuit may be configured to acquire at least one previously acquired and stored medical image dataset for comparison with the medical image dataset in question, and to input the previously acquired medical image dataset into a first generative model 58 and / or a second generative model 62.

Experimental Results

[0056] Method 50 was evaluated using the publicly available Medical-Diff-VQA dataset, which consists of chest X-ray images and associated question-answer pairs. The Medical-Diff-VQA dataset contains 700,000 question-answer pairs and approximately 220,000 images. 10% of the question-answer pairs and their corresponding images were used for testing, 10% for validation, and the remaining 80% were used to train the answer generator.
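An 80 / 10 / 10 partition of this kind can be sketched as follows. The source does not state how the split was drawn, so the seeded random shuffle here is an assumption, as is the function name; in practice a split for medical data is often grouped by patient to avoid leakage, which this sketch does not do.

```python
import random
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    items: Iterable[T],
    test_fraction: float = 0.1,
    val_fraction: float = 0.1,
    seed: int = 0,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle with a fixed seed and split into (train, validation, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))
```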

[0057] To calculate performance metrics, questions were divided into “short-answer” and “long-answer” questions. For short-answer questions, such as YES / NO questions, prediction accuracy was calculated against known ground truth. The accuracy shown in the table below counts a prediction as correct only when the predicted answer exactly matches the ground truth answer. For long-answer questions, which exhibit higher variability, natural language generation metrics including BLEU, ROUGE, METEOR, and CIDEr, which measure the similarity between the predicted answer and the ground truth answer, are reported. The table below reports the performance metrics for Method 50 in three cases.

[0058] The three cases are as follows:
a) The report generator 58 does not generate any report, and the answer generator 62 receives only the medical image data 52 and the question 56 as input.
b) The report generator 58 receives the medical image data 52 and the text data 54 and generates a text-format report 60 containing findings / evidence, which is provided to the answer generator 62. The answer generator 62 also receives the medical image data 52 and the question 56 as input.
c) The answer generator 62 receives as input the medical image data 52, the question 56, and a "ground truth" report, i.e., text from an expert-written report.

[Table 2]

[0059] The table above shows that feeding the generated report to the answer generation model improves performance across all metrics. However, when the Ground Truth report is fed to answer generator 62, perfect accuracy is not achieved. Examination of the results reveals that the Ground Truth report does not always contain complete information for answering question-answer pairs, and the answer generation model does not always perfectly interpret the text report input.
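The exact-match accuracy used for short-answer questions can be sketched as below. The normalization step (case folding, whitespace collapsing, and stripping a trailing period) is an assumption added for illustration, since the source only states that the predicted and ground truth answers must match; the function names are hypothetical.

```python
from typing import Sequence


def normalize(text: str) -> str:
    """Lower-case, trim, drop a trailing period, and collapse whitespace."""
    return " ".join(text.lower().strip().rstrip(".").split())


def exact_match_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions that exactly match their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    matches = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return matches / len(predictions)


accuracy = exact_match_accuracy(["Yes", "no", "yes."], ["yes", "yes", "Yes"])
print(f"{accuracy:.3f}")
```

Long-answer questions are not scored this way because valid answers vary in wording, which is why overlap-based metrics such as BLEU, ROUGE, METEOR, and CIDEr are reported for them instead.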

[0060] The following table shows several specific examples of applying Method 50 with and without evidence / findings. In the table, the first column, titled "Question," refers to the question 56 asked by the user. The second column, titled "Ground Truth Response," refers to the text from the expert-written report associated with the given medical image data 52. The third column, titled "Predictive Response Without Evidence," refers to the results of the known method described in relation to Figure 1. This method does not use the report generator 58 to generate findings / evidence. In the third column, errors relative to the ground truth response in the second column are highlighted in bold. The fourth column, titled "Evidence (Generated Report)," refers to the evidence / findings in the report 60 output by the report generator 58. In the fourth column, the portion of the generated report related to the "Ground Truth Response" from the second column is highlighted in bold. The fifth and final column, titled "Predictive Response Using Evidence," refers to the answer 64 provided by the answer generator 62 of Method 50.

[0061] By providing a report 60 containing findings / evidence, the accuracy of the predicted responses 64 in the fifth column is improved. [Table 3] [Table 4] [Table 5]

[0062] According to various embodiments, a medical information processing device equipped with a processing circuit is provided. The processing circuit is configured to input medical image data into a first generation model that generates one or more findings based on the medical image data, and to input the medical image data, question information, and the generated one or more findings into a second generation model that outputs answer information related to the medical image data.

[0063] The one or more findings may include clinical findings and / or evidence related to the medical image data and / or one or more reports. The question information may include questions in text format. The question information may be composed of natural language.

[0064] The second generation model may include a response generation model. The second generation model may include an LLM. The second generation model may be configured to generate output response information related to the medical image data. The output response information may include text data, which may be in natural language and may semantically respond to the question information.

[0065] The one or more generated findings may include at least one of the following: (a) at least a portion of a report, which may be generated by a generative model; (b) location information representing the location of one or more landmarks, anatomical features, or other notable (medically noteworthy) features, where the location information may be in the form of one or more coordinates; (c) a segmentation of one or more anatomical features or other notable features, where the segmentation may define the area or volume of the feature; (d) a measurement of one or more anatomical features or other features of interest, where the measurement may define the dimensions or other quantitative measures of the feature; (e) one or more selected portions of the medical image data, which may be selected based on clinical relevance; and (f) one or more enhanced or derived images obtained based on the medical image data, such as a rendered image.

[0066] The one or more generated findings may include at least one of the following: (a) text data, which may contain natural language; (b) coordinates, such as two-dimensional or three-dimensional spatial coordinates; (c) at least one segmentation mask; and (d) at least one image.
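One possible way to hold findings of these kinds in a single container is sketched below in Python. The Finding class, its field names, and the mask_area helper are assumptions made for illustration only; the disclosure does not specify a data structure. The helper shows how a measurement could be derived from a binary segmentation mask, assuming a known pixel area.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Coordinate = Tuple[float, ...]  # 2D or 3D spatial coordinate
Mask = List[List[int]]          # binary segmentation mask (1 = inside the feature)


@dataclass
class Finding:
    """Hypothetical container for one finding; every field is optional."""
    text: Optional[str] = None
    coordinates: List[Coordinate] = field(default_factory=list)
    mask: Optional[Mask] = None
    image: Optional[List[List[float]]] = None

    def kinds(self) -> List[str]:
        """List which kinds of data this finding actually carries."""
        present = []
        if self.text:
            present.append("text")
        if self.coordinates:
            present.append("coordinates")
        if self.mask is not None:
            present.append("mask")
        if self.image is not None:
            present.append("image")
        return present


def mask_area(mask: Mask, pixel_area_mm2: float = 1.0) -> float:
    """Area of a segmented feature: number of mask pixels times the area per pixel."""
    return sum(sum(row) for row in mask) * pixel_area_mm2


nodule = Finding(text="Nodule", coordinates=[(12.0, 30.5)], mask=[[0, 1], [1, 1]])
print(nodule.kinds(), mask_area(nodule.mask, pixel_area_mm2=0.25))
```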

[0067] The medical information processing device may include a display system. The display system may be configured to display the response information and at least a portion of the medical image data together. The response information may include natural language text. The displayed medical image data may include at least a portion of the medical image data provided to the first and / or second generative models. The display system may be suitable for displaying, to the user, text that includes the response information together with at least a portion of the medical image data.

[0068] The processing circuit may be configured to select at least a portion of the medical image data based on the response information. The display system or device may further display the selected portion of the medical image data to the user.

[0069] The processing circuit may be configured to select one or more display parameters based on the response information and to display at least a portion of the medical image data according to one or more display parameters. The display parameters may include rendering parameters applied to the medical image data. The display of at least a portion of the medical image data may include at least a portion of the medical image data rendered according to the selected one or more rendering parameters.
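As a hedged illustration of selecting display parameters from the response information, the Python sketch below picks an intensity window by keyword and applies it to CT-like intensity values. The keyword matching, the PRESETS table, and the window values are illustrative assumptions, not parameters taken from the disclosure; a real system might derive such parameters in an entirely different way.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Window:
    center: float  # intensity shown as mid-gray
    width: float   # intensity range mapped onto the full gray scale


# Illustrative presets only; the numbers are assumptions, not disclosed values.
PRESETS: Dict[str, Window] = {
    "lung": Window(center=-600, width=1500),
    "bone": Window(center=300, width=1500),
    "default": Window(center=40, width=400),
}


def select_window(response_text: str) -> Window:
    """Pick a display window from keywords found in the response text."""
    text = response_text.lower()
    for keyword in ("lung", "bone"):
        if keyword in text:
            return PRESETS[keyword]
    return PRESETS["default"]


def apply_window(values: List[float], window: Window) -> List[float]:
    """Map raw intensities to display values in [0, 1], clamping outside the window."""
    low = window.center - window.width / 2
    return [min(1.0, max(0.0, (v - low) / window.width)) for v in values]


window = select_window("A nodule is seen in the right lung.")
print(apply_window([-1350, -600, 150], window))
```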

[0070] The processing circuit and / or the first generation model may use a predetermined format for the output of the first generation model. The predetermined format may be applied to one or more of the generated reports. The predetermined format may present the output of the first generation model in a structured and / or consistent format.
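A predetermined format could, for example, be enforced by normalizing a free-text report into fixed sections. The Python sketch below assumes two hypothetical section names ("findings" and "impression") and a simple "Header: text" layout; neither is specified in the disclosure.

```python
from typing import Dict, List

SECTIONS = ("findings", "impression")  # hypothetical fixed section order


def to_structured_report(raw: str) -> Dict[str, str]:
    """Normalize a free-text report into the fixed sections listed in SECTIONS.

    Lines of the form "Header: text" start a section (case-insensitive); later
    lines without a known header continue the current section. Text appearing
    before any known header is dropped, and missing sections get a placeholder.
    """
    parsed: Dict[str, List[str]] = {name: [] for name in SECTIONS}
    current = None
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        head, sep, rest = line.partition(":")
        if sep and head.strip().lower() in parsed:
            current = head.strip().lower()
            line = rest.strip()
        if current and line:
            parsed[current].append(line)
    return {name: " ".join(parts) or "Not reported." for name, parts in parsed.items()}


report = to_structured_report(
    "Impression: no acute disease\nFINDINGS: Lungs are clear.\nHeart size normal."
)
print(report)
```

Because the output always contains the same keys in the same order, a downstream model receives the evidence in a consistent layout regardless of how the first model phrased its report.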

[0071] The processing circuit and / or the second generation model may include or exclude information received at its input based, for example, on the expected accuracy of that information. The processing circuit and / or the second generation model may further include or exclude information based on its type.
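Such inclusion or exclusion might be sketched in Python as a simple filter. The ScoredFinding class, the expected_accuracy score in [0, 1], and the threshold value are hypothetical; the disclosure does not state how expected accuracy is quantified.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass
class ScoredFinding:
    kind: str                 # e.g. "text", "measurement" (hypothetical labels)
    text: str
    expected_accuracy: float  # hypothetical reliability score in [0, 1]


def filter_findings(findings: List[ScoredFinding],
                    min_accuracy: float = 0.7,
                    excluded_kinds: FrozenSet[str] = frozenset()) -> List[ScoredFinding]:
    """Keep findings that are reliable enough and whose type is not excluded."""
    return [
        f for f in findings
        if f.expected_accuracy >= min_accuracy and f.kind not in excluded_kinds
    ]


candidates = [
    ScoredFinding("text", "No pleural effusion.", 0.92),
    ScoredFinding("text", "Possible rib fracture.", 0.40),
    ScoredFinding("measurement", "Nodule diameter 8 mm.", 0.85),
]
kept = filter_findings(candidates, min_accuracy=0.7,
                       excluded_kinds=frozenset({"measurement"}))
print([f.text for f in kept])
```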

[0072] The processing circuit may be configured to modify the medical image data provided to the first and / or second generative model in order to obtain multiple findings. The processing circuit may, instead of or in addition to the above, modify other inputs to the first generative model in order to obtain multiple findings. The processing circuit may also use multiple different first generative models or differently trained first generative models in order to obtain multiple findings. The multiple findings may include multiple different reports.

[0073] The aforementioned multiple findings, for example, the aforementioned multiple reports, may include an ensemble of multiple predictions for a given item.

[0074] The aforementioned multiple findings, for example, the aforementioned multiple reports, may contain at least some different information.
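The acquisition of multiple findings by modifying the input and by using several first models can be sketched as follows in Python. The toy threshold models, the brightness perturbation, and the majority-vote summary are all assumptions for illustration; the disclosure does not specify how the ensemble is combined.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Image = Sequence[float]  # flattened toy image
Model = Callable[[Image], str]
Perturbation = Callable[[Image], Image]


def ensemble_predictions(image: Image, models: List[Model],
                         perturbations: List[Perturbation]) -> List[str]:
    """Collect one prediction per (model, perturbed input) combination."""
    return [model(perturb(image)) for model in models for perturb in perturbations]


def majority_vote(predictions: List[str]) -> Tuple[str, float]:
    """Return the most common prediction and the fraction of votes it received."""
    label, count = Counter(predictions).most_common(1)[0]
    return label, count / len(predictions)


def make_threshold_model(threshold: float) -> Model:
    """Toy stand-in for a differently trained first generation model."""
    def model(image: Image) -> str:
        return "present" if sum(image) / len(image) > threshold else "absent"
    return model


def identity(image: Image) -> Image:
    return list(image)


def brighten(image: Image) -> Image:
    return [v * 1.1 for v in image]


predictions = ensemble_predictions(
    [0.4, 0.5, 0.6],
    [make_threshold_model(0.45), make_threshold_model(0.52)],
    [identity, brighten],
)
print(predictions, majority_vote(predictions))
```

The agreement fraction returned by majority_vote is one simple way to expose how consistent the ensemble was for a given item.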

[0075] The processing circuit may, for example, use a trained model to generate a number of possible questions and their corresponding answers from the generated report. The processing circuit may, for example, use a trained model to select one or more of the possible questions based on how well they match the question information, and output answers corresponding to the selected one or more questions.
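The selection step can be illustrated with a small Python sketch that matches a user question against candidate questions by token overlap. Jaccard similarity is an assumption chosen for simplicity (the disclosure mentions a trained model for this purpose), and the question-answer pairs are invented examples.

```python
import re
from typing import List, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens of a question."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-set overlap in [0, 1]; a simple stand-in for a learned similarity."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def best_answer(user_question: str,
                qa_pairs: List[Tuple[str, str]]) -> Tuple[str, str, float]:
    """Return the generated question closest to the user question, with its answer."""
    question, answer = max(qa_pairs, key=lambda qa: jaccard(user_question, qa[0]))
    return question, answer, jaccard(user_question, question)


# Hypothetical question-answer pairs, as if derived from a generated report.
generated_pairs = [
    ("Is there a pleural effusion?", "No pleural effusion is seen."),
    ("Is the heart enlarged?", "The heart size is normal."),
]
match = best_answer("Is the heart size enlarged?", generated_pairs)
print(match)
```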

[0076] The medical image data may include a multimodal dataset containing semantic information and / or other additional information, and inputting the medical image data to the first generative model may include inputting the multimodal dataset to the first generative model.

[0077] At least one of the first generative model and the second generative model may include at least one of the following: a Bidirectional Encoder Representations from Transformers (BERT) model, a Generative Pre-trained Transformer (GPT) model, another transformer network, or a large language model (LLM).

[0078] Inputting the medical image data into the first generative model may include inputting a vector representation of the medical image data into the first generative model. Inputting the medical image data, question information, and one or more generated reports into the second generative model may include inputting at least one of the medical image data, question information, and one or more generated reports as a vector representation.
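The disclosure does not state how the vector representation is obtained. One common approach, shown here purely as an assumption, is to split a 2D image into fixed-size patches and flatten each patch into a vector, in the style of vision transformers. The function name patch_vectors is hypothetical.

```python
from typing import List, Sequence


def patch_vectors(image: Sequence[Sequence[float]], patch: int) -> List[List[float]]:
    """Split a 2D image into non-overlapping patch x patch tiles, one flat vector each.

    Tiles are visited left to right, top to bottom; pixels within a tile are
    flattened row by row.
    """
    rows, cols = len(image), len(image[0])
    if rows % patch or cols % patch:
        raise ValueError("image dimensions must be multiples of the patch size")
    vectors = []
    for r in range(0, rows, patch):
        for c in range(0, cols, patch):
            vectors.append(
                [image[r + i][c + j] for i in range(patch) for j in range(patch)]
            )
    return vectors


vectors = patch_vectors([[1, 2, 3, 4], [5, 6, 7, 8]], patch=2)
print(vectors)
```

In practice each flattened patch would typically be passed through a learned projection before entering the model; that step is omitted here.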

[0079] In various embodiments, a medical image diagnostic apparatus is provided, comprising: a scanner that scans a patient to obtain a set of medical image data; and a processing circuit configured to input the medical image data into a first generation model that generates one or more findings based on the medical image data, and to input the medical image data, question information, and the one or more generated findings into a second generation model that outputs response information related to the medical image data.

[0080] The processing circuit may be configured to acquire at least one previously acquired and stored medical image dataset and to input the previously acquired medical image dataset to the first generative model and / or the second generative model in order to compare the medical image dataset with the previously acquired medical image dataset.

[0081] In various embodiments, a medical information processing method is provided, the method including: inputting medical image data into a first generation model that generates one or more findings based on the medical image data; and inputting the medical image data, question information, and the one or more generated findings into a second generation model that outputs response information related to the medical image data.

[0082] In various embodiments, a computer program product is provided which includes a computer-readable medium storing executable instructions, wherein the instructions, when executed, perform a method that includes: inputting medical image data into a first generation model that generates one or more findings based on the medical image data; and inputting the medical image data, question information, and the one or more generated findings into a second generation model that outputs response information related to the medical image data.

[0083] In various embodiments, the present invention provides a method for evidence chaining in a medical VQA system, comprising: a) a set of medical image data annotated with relevant questions and corresponding answers; b) a set of reports or other data corresponding to the medical image data, which may be predicted from the images by a deep learning model; c) an evidence generation model that takes a target image as input; and d) an answer generation model that takes the target image, a text question, and the generated evidence as input and outputs a predicted answer to the question.

[0084] The evidence generation model c) may include a transformer network, such as a GPT model. The answer generation model d) may include a transformer network, such as a GPT model. The evidence generated by c) may include a report. The evidence may be further processed to ensure a consistent format. The evidence may be further processed to retain information that is known to be reported reliably and to remove information that is known to be reported unreliably. One or more of the models may be LLMs.

[0085] An ensemble of multiple predictions may be generated for a given evidence item, and the ensemble may be passed to the answer generation model. Potential question-answer pairs may be generated from the generated report, and similarity matching may be used to match a user's question to the closest generated question.

[0086] While specific circuits are described herein, in alternative embodiments, one or more functions of these circuits may be provided by a single processing resource or other component, or a function provided by a single circuit may be provided by a combination of two or more processing resources or other components. A reference to a single circuit encompasses multiple components that provide the functionality of that circuit, regardless of whether such components are separated from each other. A reference to multiple circuits encompasses a single component that provides the functionality of those circuits.

[0087] While certain embodiments are described, these embodiments are presented for illustrative purposes only and are not intended to limit the scope of the invention. In practice, the novel methods and systems described herein can be embodied in a variety of other forms. Furthermore, various omissions, substitutions, and modifications in the forms of the methods and systems described herein may be made without departing from the spirit of the invention. The appended claims and their equivalents are intended to cover such forms and modifications as fall within the scope of the invention.

Claims

1. A medical information processing device comprising at least one processing unit and a display control unit, wherein the processing unit inputs medical image data generated by a medical imaging device into a first generation model that generates one or more findings based on the medical image data, and inputs the medical image data, question information, and the one or more generated findings into a second generation model that outputs response information related to the medical image data, and the display control unit displays the response information and at least a portion of the medical image data generated by the medical imaging device on a display device.

2. The medical information processing device according to claim 1, wherein the one or more generated findings include at least one of the following: at least a portion of a report; location information representing the location of one or more landmarks, anatomical features, or other notable features; a segmentation of one or more anatomical features or other notable features; a measurement of one or more anatomical features or other notable features; one or more selected portions of the medical image data; and one or more enhanced or derived images obtained based on the medical image data.

3. The medical information processing apparatus according to claim 1, wherein the one or more generated findings include at least one of text, coordinates, at least one segmentation mask, and at least one image.

4. The medical information processing apparatus according to claim 1, wherein the display control unit causes the response information and at least a portion of the medical image data to be displayed on the same screen of the display device.

5. The medical information processing apparatus according to claim 4, wherein the display control unit selects a portion of the medical image data based on the response information and displays the selected portion of the medical image data on the display device based on the response information.

6. The medical information processing apparatus according to claim 4, wherein the display control unit selects one or more display parameters based on the response information and displays at least a portion of the medical image data on the display device according to the selected one or more display parameters.

7. The medical information processing apparatus according to any one of claims 1 to 6, wherein the processing unit or the second generation model uses a predetermined format for the one or more findings generated.

8. The medical information processing apparatus according to any one of claims 1 to 6, wherein the processing unit or the second generation model is configured to include or exclude information of at least one type of information from the findings based on the expected accuracy or type of information of the response information.

9. The medical information processing apparatus according to any one of claims 1 to 6, wherein the processing unit acquires a plurality of findings by using at least one of the following: a process for changing the medical image data; a process for changing other inputs to the first generative model; or a process for using a plurality of different first generative models or a first generative model that has been trained in a different way.

10. The medical information processing device according to claim 9, wherein the aforementioned multiple findings include an ensemble of multiple predictions for a given item.

11. The medical information processing apparatus according to claim 9, wherein the aforementioned multiple findings have at least some different information content.

12. The medical information processing apparatus according to any one of claims 1 to 6, wherein the processing unit generates a plurality of possible questions and corresponding answers from the generated findings.

13. The medical information processing apparatus according to claim 12, wherein the processing unit selects one or more of the possible questions based on the degree to which they match the question information, and outputs answers corresponding to the selected one or more questions.

14. The medical information processing device according to any one of claims 1 to 6, wherein the medical image data includes a multimodal dataset containing at least one of semantic information and other additional information, and inputting the medical image data to the first generative model includes inputting the multimodal dataset to the first generative model.

15. The medical information processing device according to any one of claims 1 to 6, wherein the first generative model, the second generative model, or both include at least one of a GPT model or other transformer network, or a large language model (LLM).

16. The medical information processing device according to any one of claims 1 to 6, wherein at least one of the following applies: inputting the medical image data to the first generative model includes inputting a vector representation of the medical image data to the first generative model; and inputting the medical image data, the question information, and the one or more generated findings to the second generative model includes inputting at least one of the medical image data, the question information, and the one or more generated findings as a vector representation.

17. A medical imaging diagnostic apparatus comprising a scanner that scans a patient to obtain a set of medical image data, and at least one processing unit, wherein the processing unit inputs the medical image data into a first generation model that generates one or more findings based on the medical image data, and inputs the medical image data, question information, and the one or more generated findings into a second generation model that outputs response information related to the medical image data.

18. The medical imaging diagnostic apparatus according to claim 17, wherein the processing unit obtains at least one previously acquired and stored medical image dataset, and inputs the previously acquired medical image dataset into at least one of the first generation model and the second generation model in order to compare the medical image data with the previously acquired medical image dataset.

19. A medical information processing method including: inputting medical image data generated by a medical imaging device into a first generation model that generates one or more findings based on the medical image data; inputting the medical image data, question information, and the one or more generated findings into a second generation model that outputs response information related to the medical image data; and displaying the response information and at least a portion of the medical image data generated by the medical imaging device on a display device.

20. A medical information processing program that causes a computer to execute: inputting medical image data generated by a medical imaging device into a first generation model that generates one or more findings based on the medical image data; inputting the medical image data, question information, and the one or more generated findings into a second generation model that outputs response information related to the medical image data; and displaying the response information and at least a portion of the medical image data generated by the medical imaging device on a display device.