Medical image data processing method, medical image data processing device and program

The method improves medical image data processing by displaying interactive maps for selecting image conditions, using machine learning models to generate and visualize semantic data, addressing the challenge of generating geometric regions in volume rendering.

JP2026096944A · Pending · Publication Date: 2026-06-15 · CANON KK


Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: CANON KK
Filing Date: 2025-11-27
Publication Date: 2026-06-15

Patent Timeline

27 Nov 2025: Application
15 Jun 2026: Publication (JP2026096944A)

IPC: G06T11/00; G16H30/20

CPC: G06V20/70; G06V10/34; G06V10/25; G06V2201/03; G06V10/7715; G16H30/40; G06V10/945

AI Tagging

Application Domain

2D-image generation; Medical images


Abstract

This invention provides a medical image data processing method, a medical image data processing device, and a program that improve user convenience in a technology that generates geometric regions corresponding to semantic concepts within a space based on medical image data. [Solution] The medical image data processing method according to the embodiment includes displaying a map representing multiple image generation conditions, setting an indicator in the map for selecting one or more of the multiple image generation conditions, inputting medical image data and the selected image generation condition into a model, and outputting medical image data generated based on the selected image generation condition.


Description

Technical Field

[0001] The embodiments disclosed in this specification generally relate to a medical image data processing method, a medical image data processing apparatus, and a program.

Background Art

[0002] Volume rendering is used in many clinical applications. Generally, in the application of volume rendering, most of the rendering parameters are set using interactive screen operations, input boxes, sliders, and the like.

[0003] An image caption generation network including a multi-modal model (MMM) is known as a suitable tool for extracting semantic information from images including rendered images and the like.

Prior Art Documents

Patent Documents

[0004]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0005] One of the problems to be solved by the embodiments disclosed in this specification and the drawings is to improve the convenience of the user for a technique of generating a geometric region corresponding to a semantic concept in a space based on medical image data. However, the problems to be solved by the embodiments disclosed in this specification and the drawings are not limited to the above problems. It is also possible to position the problems corresponding to the respective effects of each configuration shown in the embodiments described later as other problems.

Means for Solving the Problems

[0006] The medical image data processing method according to the embodiment includes displaying a map representing multiple image generation conditions, setting indicators in the map for selecting one or more of the multiple image generation conditions, inputting medical image data and the selected image generation condition into a model, and outputting medical image data generated based on the selected image generation condition.

[0007] Next, embodiments will be described as non-limiting examples with reference to the attached drawings shown below.

Brief Description of the Drawings

[0008]

[Figure 1] Figure 1 is a schematic diagram showing an example of a medical imaging data processing device according to one embodiment.
[Figure 2] Figure 2 is a flowchart illustrating an example of a method according to one embodiment.
[Figure 3] Figure 3 shows an overview of an example of a medical image data processing method according to one embodiment.
[Figure 4] Figure 4 shows a grid of a rendering image according to one embodiment.
[Figure 5] Figure 5 shows an example of a parameter space according to one embodiment.
[Figure 6] Figure 6 shows three examples of how the parameter space according to one embodiment can be represented.
[Figure 7] Figure 7 shows an example of an image of a user interface according to one embodiment.
[Figure 8] Figure 8 shows an example of a parameter space according to one embodiment.
[Figure 9] Figure 9 shows an overview of an example of a method according to one embodiment.

Modes for Carrying Out the Invention

[0009] The medical image data processing method according to the embodiment includes displaying a map representing multiple image generation conditions, setting indicators in the map for selecting one or more of the multiple image generation conditions, inputting medical image data and the selected image generation condition into a model, and outputting medical image data generated based on the selected image generation condition.

[0010] The medical image processing apparatus according to the embodiment comprises a display control unit, a setting unit, an input unit, and an output unit. The display control unit causes a map representing multiple image generation conditions to be displayed on the display unit. The setting unit sets indicators on the map for selecting one or more of the multiple image generation conditions. The input unit inputs medical image data and one selected image generation condition into the model. The output unit outputs medical image data generated based on the selected image generation condition.

[0011] Figure 1 is a schematic diagram showing an example of a medical imaging data processing device 20 according to one embodiment. In this embodiment, the medical imaging data processing device 20 processes medical imaging data. In other embodiments, the medical imaging data processing device 20 may process other data as appropriate.

[0012] The medical imaging data processing device 20 includes a computing device 22. In this case, the computing device 22 is a personal computer (PC), a workstation, or the like. The computing device 22 is connected to a display screen 26 or other display device, and to one or more input devices 28 such as a computer keyboard or mouse. The display screen 26 or other display device is an example of a display unit.

[0013] The arithmetic unit 22 acquires a dataset from the data storage unit 30. The dataset is acquired or generated as appropriate using any device or from any source.

[0014] In embodiments, at least a portion of the data may include, or be identifiable from, medical imaging data acquired by, for example, the scanner 24. The scanner 24 generates medical imaging data that includes two-dimensional, three-dimensional, or four-dimensional data of any imaging modality. For example, the scanner 24 may include a magnetic resonance (MR or MRI) scanner, a computed tomography (CT) scanner, a cone-beam CT scanner, an X-ray scanner, an ultrasound scanner, a positron emission tomography (PET) scanner, or a SPECT (single-photon emission computed tomography) scanner. The medical imaging data may include, or be associated with, other adjustment data, such as non-imaging data.

[0015] In place of or in addition to the data storage unit 30, the arithmetic unit 22 may receive data from one or more other data storage units (not shown). For example, the arithmetic unit 22 may receive medical imaging data from one or more remote data storage units (not shown) that constitute part of a picture archiving and communication system (PACS) or other information system.

[0016] The arithmetic unit 22 provides processing resources for automatically or semi-automatically processing data. The arithmetic unit 22 includes a processing unit 32. The processing unit 32 includes a model training circuit 34 for training one or more models, a data processing circuit 36 for applying the trained models to perform other processing, and an interface circuit 38 for acquiring user input and / or outputting the results of data processing. The interface circuit 38 also generates a user interface and processes user input when the user operates the user interface using the input device 28 or other input devices. The interface circuit 38 is an example of a display control unit, setting unit, input unit, and output unit.

[0017] In the present embodiment, the circuits 34, 36, and 38 are each implemented in the arithmetic unit 22 by a computer program including computer-readable instructions for executing the method of the present embodiment. However, in another embodiment, these circuits may be implemented as one or more application specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs).

[0018] Further, the arithmetic unit 22 includes a hard drive and other components of a PC, including RAM, ROM, a data bus, an operating system with various device drivers, and hardware devices including a graphics card. For clarity, FIG. 1 omits the illustration of these components.

[0019] The medical imaging data processing apparatus 20 of FIG. 1 executes the methods illustrated and / or described below.

[0020] FIG. 2 is a flowchart showing an example of a method 100 according to an embodiment, which is executed by the apparatus of FIG. 1, for example. In the method of FIG. 2, the user can interactively operate the arithmetic unit 22 to automatically or semi-automatically process medical image data.

[0021] In step 102 of the method, the arithmetic unit 22 displays a map representing one or more rendering parameters and / or image generation conditions on the display screen 26. In the embodiment of FIG. 2, the map includes a plurality of rendering images in a grid pattern as shown in FIG. 4 described later. In other embodiments, other maps can be displayed as appropriate. Also, the map is not limited to a grid arrangement.

[0022] In the process shown in Figure 2, the images within the map are rendered based on multiple different rendering parameters / image generation conditions. The map provides a clinically useful starting point for the user when analyzing and manipulating medical image data. In the embodiment shown in Figure 2, the generated map is interactive, allowing the user to select one or more rendering parameters, image generation conditions, rendered image data, and values associated with the rendering parameters and image generation conditions.

[0023] The map contents may be updated by further selecting parameters, conditions, or images and applying the newly selected parameters or conditions to the same or new images. The map may also include indicators. Indicators are set by the operation of the arithmetic unit 22 via an input device 28 such as a mouse or keyboard. The indicators mark the selected rendering parameters, image generation conditions, or values associated with one or the other. The map display on the display screen 26 is initiated by the user causing the arithmetic unit 22 to display rendering parameters and / or image generation conditions, or a subset thereof selected by the user. The displayed parameters and / or conditions may include some or all of the rendering parameters and / or image generation conditions used in the medical image data processing of the arithmetic unit 22.

[0024] In step 104, the user selects one or more rendering parameters and / or image generation conditions and provides the selected parameters / conditions to the arithmetic unit 22. This is achieved by setting one or more indicators associated with one or more rendering parameters and / or image generation conditions. The indicators include adding a "selected" mark to the display of the parameters / conditions, or interactive indicators such as sliders or rotatable "knobs" for setting values associated with the parameters / conditions or other indicator systems. The user may also use the indicators to select a subrange of values associated with each parameter / condition.
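
As a minimal sketch of how an indicator might select a subrange of values for one image generation condition, consider the following. The `Indicator` class, its field names, and the example values are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Indicator:
    """Hypothetical indicator set on the map for one condition."""
    condition: str   # e.g. "threshold"
    low: float       # lower end of the selected subrange
    high: float      # upper end of the selected subrange

    def select(self, values: Sequence[float]) -> list:
        # Keep only the candidate values that fall inside the subrange.
        return [v for v in values if self.low <= v <= self.high]


# Candidate threshold values shown on the map (illustrative numbers only).
thresholds = [0.0, 0.25, 0.5, 0.75, 1.0]
indicator = Indicator(condition="threshold", low=0.2, high=0.8)
selected = indicator.select(thresholds)
print(selected)
```

A slider or knob in a real interface would presumably update `low` and `high`; the selected values would then be passed on together with the medical image data in step 106.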

[0025] In step 106, the computing unit 22 provides the medical image data and one or more rendering parameters and / or image generation conditions to the trained machine learning model.

[0026] In step 108, the arithmetic unit 22 displays at least one rendered image and at least some medical image data, such as semantic data, in caption format on the display screen 26. It may also display semantic data associated with one or more images on the display screen 26. Captions containing semantic data are superimposed on the corresponding rendered images.

[0027] Figure 3 is a schematic diagram of an example of another processing method 210 for medical image data according to one embodiment. In step 212, medical imaging data is supplied to the processing unit 32 (Figure 1). The medical imaging data may include three-dimensional volume data and may be of any imaging modality, including, but not limited to, data obtained from magnetic resonance (MR or MRI) scanners, computed tomography (CT) scanners, cone-beam CT scanners, X-ray scanners, ultrasound scanners, positron emission tomography (PET) scanners, or SPECT (single-photon emission computed tomography) scanners. The imaging data may also include two-dimensional and / or one-dimensional data. The imaging data may be a sequence of three-dimensional, two-dimensional, or one-dimensional images over a certain time period, such as a video or an animation. In the embodiment, the images used do not need to be high-resolution images. An advantage of the present invention is that clinically useful visual representations can be output for the user based on low-resolution images. In step 212, one or more rendering parameters or other image generation conditions may also be provided to the processing unit 32. Alternatively, one or more sets of values or ranges of values for each rendering parameter or other image generation condition may be provided to the device.

[0028] The image rendering parameters in the embodiments described can be replaced with other image generation conditions as appropriate in other embodiments.

[0029] In various embodiments, the image generation conditions may include conditions relating to segmentation, or conditions relating to at least one of image rotation, scaling, or viewing direction, or conditions relating to rendering. Multiple image generation conditions may include multiple types of image generation conditions.

[0030] If a range of rendering parameter values is specified, the processing circuit obtains a set of rendering parameter values for rendering the image. The rendering parameter values do not need to be discontinuous.

[0031] In step 214, medical imaging data is rendered based on rendering parameters given to or obtained by the device. Image data rendering involves processing image data to obtain other image data. Rendering includes filtering of image data. Rendering may also enhance or suppress the appearance of image data. The imaging data is rendered by batch rendering on a graphics processing unit (GPU). The medical imaging data is rendered for one or more values of the rendering parameters given to or obtained by the device. The medical imaging data is also rendered for each value within a set of values or discrete values given to or obtained by the device for each rendering parameter. Furthermore, the medical imaging data is rendered for all combinations of the given / obtained values of the rendering parameters. Each rendered image obtained in step 214 is generated using different values of each rendering parameter given to or obtained by the device. Thus, each medical image is rendered for all values of all given rendering parameters and for all combinations of given values and rendering parameters. For example, if N discrete values are given / obtained for each of M different rendering parameters to render one original image, the total number of rendered images will be at least N! / [M! (N - M)!], for example N^M.
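
The enumeration of all parameter combinations in step 214 can be sketched as follows. A full grid over M parameters with N values each contains N^M combinations. The parameter names, the values, and the `render` placeholder are assumptions made for illustration, and no real renderer is called.

```python
from itertools import product

# Hypothetical rendering parameters, each sampled at N = 3 discrete values.
parameter_values = {
    "rotation_deg": [0, 45, 90],
    "threshold": [0.2, 0.5, 0.8],
}


def render(volume, **params):
    """Placeholder for a real volume renderer; returns a text label only."""
    settings = ", ".join(f"{k}={v}" for k, v in sorted(params.items()))
    return f"image({settings})"


names = list(parameter_values)
grid = {}
# Visit every combination of the given values (the Cartesian product).
for combo in product(*(parameter_values[n] for n in names)):
    params = dict(zip(names, combo))
    grid[combo] = render(None, **params)

n_values = 3
n_params = len(names)
print(len(grid))  # number of rendered images in the full grid
```

In practice the loop body would be replaced by a batched GPU call; the dictionary keyed by parameter tuples is only one convenient way to keep each image tied to the values that produced it.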

[0032] In step 216, the rendered image is processed by a trained machine learning model to generate semantic data associated with the rendered image. The semantic data describes and / or represents one or more features in the rendered image. The model identifies these features in the rendered image and assigns semantic data to each identified feature. Features include anatomical features and / or specific pathologies.

[0033] Pre-trained machine learning models include generative large-scale language models (LLMs) and other models. Pre-trained machine learning models perform caption generation processing on rendered image data. They also perform caption generation processing on identified features within rendered images, generating semantic data in caption format. Pre-trained machine learning models are trained on medical data and present medical or clinical semantic data. These models may include LLMs that use caption generation and visualization models such as CLIP / BLIP. These models are directly available. Models include medically trained caption generation / visualization / multimodal LLMs. Alternatively or additionally, models may include at least one of GPT-2, GPT-3.5, GPT-4, PaLM, LLaMa, BLOOM, Ernie, T5, Claude or Claude2, or their derivatives or extensions. Furthermore, pre-trained machine learning models include generative LLMs that receive image data or combinations of image and text data as input and return semantic data. In this embodiment, the LLM is located on a server 25, which is remote from the computing unit 22 in Figure 1. Communication between the computing unit 22 and the trained model takes place via the internet or other communication or network methods. In this embodiment, the data processing circuit 36 provides an application programming interface (API) that receives prompts and other inputs, transmits them to the LLM or other models, and receives responses from the LLM or other models.

[0034] In another embodiment, the trained model is stored or implemented locally in the arithmetic unit 22. The trained model may also be implemented by the data processing circuit 36.

[0035] The processing method 210 includes receiving a prompt from the user in step 217. The prompt is given to the LLM via the input device 28. The prompt also adjusts the processing task assigned to the LLM. The prompt includes at least one of text data and image data, which are processed by the LLM along with the rendered images. The prompt can also guide the LLM on the processing task by providing it with additional content. For example, the prompt instructs the LLM to search for specific anatomical features or specific pathologies. In an embodiment, the prompt instructs the LLM to search for major biological structures in one or more images. The prompt also adjusts the LLM's output format, such as the number of words and / or characters. The prompt also instructs the LLM to generate a specific number of semantic descriptions from the rendered images. The prompt also instructs the interface circuit 38 to generate output displays and / or a user interface that are provided to the user during and after the execution of the processing method 210. Furthermore, the prompt instructs the interface circuit 38 to generate a specific user interface and define the elements that the interface should generate.
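
By way of illustration only, a prompt that names a search target and constrains the output format could be assembled as below. The function name `build_prompt`, its arguments, and the prompt wording are hypothetical; the application does not specify any prompt text.

```python
def build_prompt(target: str, max_words: int, n_descriptions: int) -> str:
    """Assemble a text prompt; the wording is illustrative only."""
    return (
        f"Describe the {target} visible in the rendered image. "
        f"Return {n_descriptions} descriptions "
        f"of at most {max_words} words each."
    )


prompt = build_prompt(
    "major anatomical structures", max_words=12, n_descriptions=3
)
print(prompt)
```

The resulting string would be sent to the model together with the rendered images.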

[0036] In step 218, the semantic data obtained from the LLM is further processed. This processing is performed by the LLM, a second LLM trained specifically for this processing, or another pre-trained model. Alternatively, this processing is performed by the processing unit 32 according to a predefined set of instructions. This processing of semantic data includes simplification of the LLM's semantic output. Simplification includes reduction and editing of text data within the LLM's semantic output. Alternatively, a user may perform the processing in step 218.

[0037] In step 240, the semantic data output from step 218 is stored in a dataset. The dataset includes rendered images and semantic data associated with one or more features identified by the LLM within each rendered image. The dataset also includes rendering parameters used to plot medical imaging data and the values of the rendering parameters used in the rendering process. Furthermore, the dataset includes the correspondence between the rendering parameter values, the rendered images obtained for each rendering parameter value, the identified features of each rendered image, and the semantic data generated to identify the features. The dataset may also include information indicating the presence or absence of one or more features of the rendered image, expressed in the semantic data as a function of the rendering parameter values used to generate the rendered image. The dataset is in a multidimensional parameter space format, containing the rendering parameters used to render the medical image for each dimension. Coordinate points along each dimension of the multidimensional parameter space indicate the values of the rendering parameters used to render the image. For example, a rendered image is plotted for discrete values of a first rendering parameter, while the values of other rendering parameters are zero. Such a rendered image is associated with coordinate points on a single dimension of the first rendering parameter. Other rendered images are drawn simultaneously for specific non-zero values of two or more rendering parameters. These rendered images are associated with coordinate points that do not lie on any single dimension of the multidimensional coordinate space.
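
One possible in-memory layout for such a parameter-space dataset is sketched below, assuming two rendering parameters and invented captions. Each coordinate pair maps to the features named in the captions for the image rendered at those values, and a per-feature presence grid is derived from that mapping.

```python
rotations = [0, 45, 90]        # first rendering parameter (degrees)
thresholds = [0.2, 0.5, 0.8]   # second rendering parameter

# Hypothetical features identified per (rotation, threshold) coordinate.
captions = {
    (0, 0.2): ["skin"],
    (0, 0.5): ["skull"],
    (45, 0.5): ["skull", "teeth"],
    (90, 0.8): ["teeth"],
}


def presence_mask(feature):
    """Rows follow `rotations`, columns follow `thresholds`."""
    return [
        [feature in captions.get((rot, thr), []) for thr in thresholds]
        for rot in rotations
    ]


skull = presence_mask("skull")
for row in skull:
    print(row)
```

Coordinates without any caption are treated as absence of the feature, which is an assumption of this sketch.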

[0038] In step 244, the completeness of the semantic data collected in the preceding steps is evaluated. This evaluation is performed by the LLM, a second LLM, or a third LLM or other model trained specifically for completeness evaluation. Alternatively, this evaluation is performed by the processing unit 32 according to a predefined set of instructions, or by the user. The user operates the device using the display screen 26 and the input device 28.

[0039] For example, once semantic data or at least a portion of the dataset from step 240 is displayed on the display screen 26, the user uses the input device 28 to input information about the completeness of the semantic data. This completeness is defined as containing sufficient semantic information to convey clinically useful identified features to the user of the device. If the semantic information is assessed as incomplete, the method returns to step 216 and reprocesses the rendered image data as before using an LLM that has been adjusted to obtain a more complete semantic dataset from the rendered image. The more complete semantic dataset contains more identified features and / or more detailed descriptions of the identified features. If the semantic information is assessed as complete, the method proceeds to step 246.

[0040] In step 246, the data collected in the preceding steps is evaluated, for example, based on resolution. This evaluation is performed by the LLM, a second LLM, a third LLM, or a fourth model or other model trained specifically for resolution evaluation. Alternatively, this evaluation is performed by the processing unit 32 according to a predefined set of instructions, or by the user. The user operates the device using the display screen 26 and the input device 28. For example, once at least a portion of the dataset from step 240 is displayed on the display screen 26, the user inputs the resolution of the semantic data using the input device 28. In this case, resolution is defined as the device having assigned / acquired discrete values of one or more rendering parameters sufficient to convey clinically useful identified features to the user of the device. If the data resolution is evaluated as lower than the required resolution, the method returns to step 214 and reprocesses the image data as before, based on the imaging data rendered with respect to more discrete values of one or more rendering parameters. Image data reprocessing is applied to one or more of the rendering parameters. For example, the image is redrawn at a higher resolution using one or more rendering parameters. In another example, the image is redrawn at a higher resolution using all the provided rendering parameters. If the data resolution is deemed sufficient, the method proceeds to step 248.
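
The return from step 246 to step 214 can be pictured as a refinement loop over the sampled parameter values. The sketch below assumes a simple spacing criterion for "sufficient resolution" and midpoint insertion as the refinement rule; both are illustrative choices, not rules stated in the application.

```python
def refine(values):
    """Insert the midpoint between each pair of neighbouring samples."""
    refined = []
    for a, b in zip(values, values[1:]):
        refined += [a, (a + b) / 2]
    refined.append(values[-1])
    return refined


def resolution_ok(values, max_step):
    # Assumed criterion: neighbouring samples are at most max_step apart.
    return all(b - a <= max_step for a, b in zip(values, values[1:]))


thresholds = [0.0, 0.5, 1.0]
rounds = 0
while not resolution_ok(thresholds, max_step=0.125):
    thresholds = refine(thresholds)  # corresponds to returning to step 214
    rounds += 1

print(rounds, len(thresholds))
```

After each refinement, the images for the newly added values would be rendered and captioned before the resolution is evaluated again.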

[0041] In step 248, filtering is performed on the semantic data associated with the identified features of the rendered image based on incidence and / or relevance. Filtering based on incidence includes removing semantic data with low incidence. The incidence threshold is set by the user, and the user can update the incidence threshold during operation. The incidence threshold is also an adaptive threshold calculated by the processor. The adaptive incidence threshold is calculated for the incidence of all semantic data associated with a given medical image input dataset. In this way, the processor can identify key semantic data related to key features identified in the rendered image while removing questionable semantic data. Similarly, the relevance threshold is set based on an understanding of the semantics of the generated semantic dataset. The relevance of a given semantic data to all generated semantic datasets is verified using one of the aforementioned LLMs or other models. This provides an additional method for removing erroneous semantic data from the generated semantic data. The output of step 248 is a dataset containing semantic data associated with one or more features identified in each rendered image. The semantic data is simplified (step 218), evaluated and corrected for completeness (step 244) and resolution (step 246), and filtered based on incidence and / or relevance (step 248).
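
A minimal sketch of incidence-based filtering follows. It assumes that incidence is the number of rendered images in which a feature is named and that the adaptive threshold is the mean incidence over all features; the application does not state how the adaptive threshold is computed, and the caption list is invented.

```python
from collections import Counter

# Hypothetical features collected over all rendered images.
captions = [
    "skull", "skull", "teeth", "skull",
    "teeth", "spine", "skull", "artifact",
]

counts = Counter(captions)

# Assumed adaptive rule: keep features whose count reaches the mean count.
adaptive_threshold = sum(counts.values()) / len(counts)
kept = sorted(f for f, c in counts.items() if c >= adaptive_threshold)

print(adaptive_threshold, kept)
```

A user-set threshold would simply replace `adaptive_threshold` with a fixed number.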

[0042] In step 250, an example of semantic data corresponding to a feature identified in the rendered image is selected. The semantic data may include text captions generated by the LLM in step 216.

[0043] In step 252, the device generates a visual representation of the semantic data as a function of rendering parameters used to generate a rendering image associated with the semantic data.

[0044] The visual representation of the parameter space includes multiple dimensions. Each dimension represents one or more of the rendering parameters. In this embodiment, the visual representation is a region form of the parameter space representing the semantic data. The region corresponds to a coordinate position in the parameter space that corresponds to the rendered image in which the semantic data was identified by the LLM in step 216.

[0045] Semantic data representations in a parameter space can also be referred to as parameter space datasets.

[0046] A region represents the presence or absence of one or more features in the rendered image, as indicated by semantic data. In another embodiment, other forms of visual representation are used. The visual representation includes visually marking the coordinates where one or more features of the rendered image are identified in a multidimensional parameter space. In another embodiment, the visual representation includes visually marking the coordinates where one or more features in the rendered image are absent. In an embodiment, all coordinate points where a particular feature is identified are marked identically by coloring, shading, other visual representations, etc. In an embodiment, all coordinate points indicating the location of a particular feature in the parameter space are marked in such a way that adjacent visual representations are linked together to create a region indicating the location of the feature. Such a region represents the presence or absence of a particular feature at each coordinate point in the multidimensional parameter space that the region encompasses. This is performed for one or more features, including the dataset generated in the output of step 248. The region may also be referred to as a mask. Each mask is associated with a semantic description of one feature identified in one or more rendered images. Multiple masks showing the same semantic description may be adjacent or not adjacent.
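
Linking adjacent marked coordinate points into regions can be sketched as connected-component labelling on a two-dimensional presence grid. The choice of 4-connectivity and the example grid are assumptions of this sketch.

```python
def label_regions(mask):
    """Group 4-connected True cells of a 2-D boolean grid into regions."""
    rows, cols = len(mask), len(mask[0])
    seen = set()
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or (r, c) in seen:
                continue
            # Flood-fill from this cell to collect one region.
            stack, region = [(r, c)], []
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            regions.append(sorted(region))
    return regions


mask = [
    [True,  True,  False, False],
    [False, True,  False, True],
    [False, False, False, True],
]
regions = label_regions(mask)
print(regions)
```

Here the same feature yields two non-adjacent masks, matching the case in which masks with the same semantic description are not adjacent.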

[0047] In step 254, one or more masks created in step 252 are smoothed. Smoothing can be achieved by a filtering process. The smoothing process may include morphological filtering.
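
As one example of morphological filtering, a binary opening (erosion followed by dilation) removes isolated marked points while keeping larger regions. The 3x3 structuring element, the border handling, and the example mask below are assumptions for illustration.

```python
def _window(mask, r, c):
    """Values in the 3x3 window around (r, c); outside the grid is False."""
    rows, cols = len(mask), len(mask[0])
    return [
        mask[y][x] if 0 <= y < rows and 0 <= x < cols else False
        for y in (r - 1, r, r + 1)
        for x in (c - 1, c, c + 1)
    ]


def erode(mask):
    # A cell survives only if its whole 3x3 window is set.
    return [[all(_window(mask, r, c)) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def dilate(mask):
    # A cell is set if any cell in its 3x3 window is set.
    return [[any(_window(mask, r, c)) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def opening(mask):
    return dilate(erode(mask))


rows = [
    "......",
    ".###..",
    ".###.#",   # the lone '#' on the right is an isolated point
    ".###..",
    "......",
]
mask = [[ch == "#" for ch in row] for row in rows]
smoothed = opening(mask)
print(sum(v for row in smoothed for v in row))
```

The 3x3 block is preserved and the isolated point is removed; a closing (dilation followed by erosion) would instead fill small holes.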

[0048] In step 256, one or more filtered masks are visually simplified. The simplification process involves approximating the visual shape of the mask to one of several predefined shapes.
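
A minimal sketch of such a simplification, assuming that an axis-aligned rectangle is one of the predefined shapes: the mask is replaced by its bounding rectangle, and the fraction of the rectangle actually covered by the mask indicates how good the approximation is. The function names and the example mask are hypothetical.

```python
def bounding_rectangle(mask):
    """Return (top, left, bottom, right) of the True cells, inclusive."""
    cells = [(r, c) for r, row in enumerate(mask)
             for c, v in enumerate(row) if v]
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return min(rows), min(cols), max(rows), max(cols)


def fill_ratio(mask, rect):
    """Fraction of the rectangle's cells that are set in the mask."""
    top, left, bottom, right = rect
    area = (bottom - top + 1) * (right - left + 1)
    return sum(bool(v) for row in mask for v in row) / area


mask = [
    [False, True, True, False],
    [True,  True, True, False],
    [False, True, True, False],
]
rect = bounding_rectangle(mask)
ratio = fill_ratio(mask, rect)
print(rect, round(ratio, 3))
```

Other predefined shapes, such as ellipses, could be scored in the same way and the best-fitting one chosen.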

[0049] In step 258, a visual representation of semantic data in a multidimensional parameter space is presented to the user. This representation is stored in memory or displayed on the display screen 26. In addition to the visual features described above, the representation can take several other forms. The processing circuit generates a visual representation of the parameter space dataset on the display screen. Semantic data associated with each region within the representation may be added. When only one rendering parameter is used to plot medical image data, the output includes a bar graph with a one-dimensional axis. The length of the bar graph is divided into segments. Each segment represents a region consisting of the values of the rendering parameter, and semantic data associated with a predetermined feature is identified within that region.

[0050] Segments in an image where two or more features are identified may overlap at a specific y-coordinate point on the axis. This y-coordinate point represents the value of one rendering parameter used to plot the medical image data. When two rendering parameters are used to plot the input medical image data, the output includes a two-dimensional graph, where each rendering parameter is represented as one of the graph's dimensions. The coordinate points of such a graph represent the simultaneous values of the two rendering parameters used to plot the input medical image data.
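
For the single-parameter case described above, the segments of the bar can be derived from per-value presence flags, and overlapping segments can be found by comparing the flags of two features. The sampled values and the flags below are invented for illustration.

```python
def segments(values, present):
    """Turn a presence list into (start_value, end_value) segments."""
    out, start = [], None
    for i, flag in enumerate(present):
        if flag and start is None:
            start = i                      # a new segment begins here
        if start is not None and (not flag or i == len(present) - 1):
            end = i if flag else i - 1     # last index still inside the run
            out.append((values[start], values[end]))
            start = None
    return out


values = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]   # one rendering parameter
skull = [False, True, True, True, False, False]
teeth = [False, False, False, True, True, True]

skull_segments = segments(values, skull)
teeth_segments = segments(values, teeth)
# Parameter values at which both features are identified.
overlap = [v for v, a, b in zip(values, skull, teeth) if a and b]

print(skull_segments, teeth_segments, overlap)
```

With two parameters, the same presence flags would form a two-dimensional grid rather than a list.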

[0051] Regions representing semantic data that identify features within the rendered image in a two-dimensional parameter space are displayed as two-dimensional masks in the graph. When rendering input medical image data based on three rendering parameters, the output may include a three-dimensional graph. Each rendering parameter is represented as one dimension of the graph. The coordinate points of such a graph represent the simultaneous values of the three rendering parameters used to render the input medical image data.

[0052] Regions in the three-dimensional parameter space that represent semantic data identifying features within the rendered image are displayed as three-dimensional objects or voxel masks. When rendering medical image data input based on four or more rendering parameters, the visual representation involves reducing dimensionality and projecting the parameter space and masks or geometric shapes into a two-dimensional or three-dimensional coordinate space. The resulting two-dimensional or three-dimensional coordinate space will possess the characteristics described above.
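
The application does not name a particular dimensionality-reduction technique. As a deliberately simple stand-in, the sketch below projects marked coordinates in a four-parameter space onto two chosen axes, so that a projected point is marked if the feature is present for any value of the dropped parameters. The coordinates are invented, and methods such as principal component analysis would be alternatives.

```python
def project(points, keep_axes):
    """Project N-D coordinate tuples onto the chosen axes.

    Points that differ only in the dropped axes collapse to one point.
    """
    return sorted({tuple(p[a] for a in keep_axes) for p in points})


# Hypothetical grid indices (4 rendering parameters) where a feature is seen.
points = [
    (0, 1, 0, 2),
    (0, 1, 1, 2),
    (1, 1, 0, 0),
    (1, 2, 3, 0),
]
projected = project(points, keep_axes=(0, 1))
print(projected)
```

The projected points can then be displayed as a two-dimensional mask in the same manner as in the two-parameter case.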

[0053] Figure 4 shows an example of a two-dimensional grid 300 of rendered images. The grid may be referred to as a map. In other embodiments, other forms of maps are available. In some embodiments, the two-dimensional grid 300 is presented to the user as output, in which case it is displayed on the display screen 26. In other embodiments, no image is output. In the embodiment of Figure 4, the input imaging data is a three-dimensional volume dataset containing voxel data. In other embodiments, a series of images can be obtained from a two-dimensional image set or from imaging data with more than three dimensions. The images in Figure 4 are rendered using two rendering parameters: rotation and thresholding. In other embodiments, the input imaging data is rendered based on other rendering parameters. The images show a human head and neck. In this embodiment, the rotation of the biological structure is sagittal rotation. Sagittal rotation changes vertically on the grid shown in Figure 4. Rendering parameters may be controlled by an input box into which a rotation angle can be entered, a slider to change the rotation angle, or other interactive elements that allow the user to manipulate the image data. The same applies to the second rendering parameter used to obtain the images of Figure 4, namely the thresholding parameter. In this embodiment, the threshold is related to the absorption of imaging light by the material of the human head and neck. A higher threshold allows the biological structures of the head and neck to be observed at a greater depth. The depth depends on the light absorption rate. As is evident from the increase in imaging depth from left to right in Figure 4, the threshold increases horizontally from left to right.
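The layout of the grid in Figure 4 can be sketched as follows, assuming evenly spaced samples of each rendering parameter. The sampling ranges, the number of samples and the helper names `linspace` and `build_grid` are assumptions made for illustration. The embodiment does not specify them.

```python
from typing import List, Tuple


def linspace(start: float, stop: float, n: int) -> List[float]:
    """Return n evenly spaced values from start to stop inclusive."""
    if n == 1:
        return [start]
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


def build_grid(rotations: List[float],
               thresholds: List[float]) -> List[List[Tuple[float, float]]]:
    """Rows vary sagittal rotation (vertical), columns vary threshold (horizontal)."""
    return [[(rotation, threshold) for threshold in thresholds]
            for rotation in rotations]


# Assumed sampling: rotation in degrees, threshold as a normalized value.
rotations = linspace(-90.0, 90.0, 5)
thresholds = linspace(0.0, 1.0, 4)
grid = build_grid(rotations, thresholds)
```

Each cell of `grid` holds the pair of parameter values with which one image of the map would be rendered.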

[0054] Figure 4 shows the rendered image as a function of the rendering parameters used to draw it. As can be seen from Figure 4, the visual content of the image changes as a function of the rendering parameters used, particularly the values of the rendering parameters. Furthermore, Figure 4 shows that a trained LLM given an image as input can generate various semantic data and identify features of the rendered image.

[0055] Figure 5 shows a two-dimensional parameter space, where the rendering parameters change along the dimensions of the graph. The parameter space shown in Figure 5 may be presented to the user as a visual output. This output is displayed on display screen 26.

[0056] In Figure 5, sagittal rotation changes vertically, and the threshold changes horizontally. The graph in Figure 5 contains two regions, region 1 402 and region 2 404. Each region contains all combinations of rendering parameter values that generate a rendered image for which the LLM generates a specific semantic description. As mentioned above, the semantic description generated by the LLM identifies visual features within the rendered image. Regions 402 and 404 therefore consist of coordinate points that each correspond to one rendered image, and for every rendered image within a given region the LLM generates the same semantic description, which identifies one feature within that rendered image. In Figure 5, region 1 402 is labeled with the semantic description "head and neck". In other embodiments, a region may carry no semantic description label. In some embodiments, a legend may be provided on the display screen 26 as a visual element to identify the semantic description corresponding to each region. Furthermore, in the embodiment, the regions may be color-coded and shaded according to the colors and shading shown in the legend, so that the legend identifies the semantic description corresponding to each region. The annotation for region 1 402 is the semantic description generated by the LLM for one or more features it has identified within all rendered images at coordinate locations within region 1 402. In other words, the LLM identifies all rendered images at coordinates within region 1 402 as images containing the head and neck. Similarly, since the annotation for region 2 404 in Figure 5 is "skeleton," the LLM identifies all rendered images at coordinates within region 2 404 as images containing the skeleton. Region 1 402 and region 2 404 intersect across part of the parameter space shown in Figure 5. That is, in the rendered images within the area where region 1 402 and region 2 404 intersect, the skeleton and the head and neck are identified simultaneously. The parameter space in Figure 5 is the same as the parameter space in Figure 4, so the correspondence between the identified semantic descriptions and the rendered images can be understood by comparing the two figures.
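The relationship between coordinate points, semantic descriptions and regions can be sketched as below. The sketch assumes that each coordinate point of the parameter space is already associated with the set of semantic descriptions generated for its rendered image. The function `regions_from_captions` and the example data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Coord = Tuple[int, int]  # (row index for rotation, column index for threshold)


def regions_from_captions(captions: Dict[Coord, Set[str]]) -> Dict[str, Set[Coord]]:
    """Group coordinate points by the semantic description attached to them."""
    regions: Dict[str, Set[Coord]] = defaultdict(set)
    for coord, labels in captions.items():
        for label in labels:
            regions[label].add(coord)
    return dict(regions)


# Hypothetical 3 x 3 parameter space: low thresholds (columns 0 and 1) yield
# "head and neck", high thresholds (columns 1 and 2) yield "skeleton".
captions: Dict[Coord, Set[str]] = {}
for row in range(3):
    for col in range(3):
        labels = set()
        if col <= 1:
            labels.add("head and neck")
        if col >= 1:
            labels.add("skeleton")
        captions[(row, col)] = labels

regions = regions_from_captions(captions)
# Points where both features are identified simultaneously.
both = regions["head and neck"] & regions["skeleton"]
```

Here `both` plays the role of the area in which region 1 402 and region 2 404 intersect.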

[0057] Figure 6 shows three representations of an example of a two-dimensional parameter space according to one embodiment. Each representation includes a visual representation, in the form of a two-dimensional graph, of the region related to a semantic description. One or more of these graphs are presented to the user as output. The output is displayed on the display screen 26. In each of Figures 6(a), 6(b), and 6(c), the coordinate space is a two-dimensional parameter space. The dimensions of this parameter space are rendering parameters, for example the sagittal rotation and threshold parameters mentioned above. Both axes of the graphs in Figures 6(a), 6(b), and 6(c) are labeled with dimensionless values indicating how the rendering parameter changes along the axis. In other embodiments, actual values of the rendering parameters, such as the sagittal rotation angle in degrees or radians, may be shown on the axes. In each figure, the region indicating the existence of a unique semantic description is visually superimposed on the parameter space. In embodiments, these regions are color-coded in contrast to the background. Each figure indicates the existence of one unique semantic description, but in other embodiments, the existence of multiple semantic descriptions may be displayed in a single figure.

[0058] Figure 6(a) shows a region representing a unique semantic description as a pixel-based mask within a parameter space. Each pixel represents a coordinate point in the parameter space, which represents a rendered image containing features identified by the LLM as having the same semantic description. Some pixels appear discontinuous, while others are continuous, forming larger regions. Figure 6(a) is generated by starting with the dataset generated in step 248 of processing method 210 and creating a mask from this dataset that covers all coordinate points representing a particular semantic description.

[0059] The graph in Figure 6(b) is generated by processing the graph in Figure 6(a). This processing includes morphological filtering. Morphological filtering can be automated, semi-automatic, or manual, requiring user input. Morphological filtering filters the image to approximate the morphology of human anatomical structures, as instructed, when the medical imaging data is data of human anatomical structures. This processing may include generating a mask by filtering or smoothing the graph in Figure 6(a). The region showing a unique semantic description in Figure 6(b) is more continuous and has smoother boundaries than the corresponding region in Figure 6(a). In other embodiments or other instances of automated filtering, the generated shapes are generally continuous and the boundaries of the regions are generally smooth.
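One common form of morphological filtering is a binary opening followed by a binary closing. The sketch below is a minimal pure-Python illustration under that assumption, using a cross-shaped structuring element and treating everything outside the mask as background. The embodiment does not specify the structuring element or the border handling, and the function names are hypothetical.

```python
from typing import List

Mask = List[List[bool]]
# Cross-shaped structuring element (an assumption; the source does not name one).
CROSS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def _at(mask: Mask, r: int, c: int) -> bool:
    """Read a pixel, treating everything outside the mask as background."""
    return 0 <= r < len(mask) and 0 <= c < len(mask[0]) and mask[r][c]


def dilate(mask: Mask) -> Mask:
    """Set a pixel if any pixel under the structuring element is set."""
    return [[any(_at(mask, r + dr, c + dc) for dr, dc in CROSS)
             for c in range(len(mask[0]))] for r in range(len(mask))]


def erode(mask: Mask) -> Mask:
    """Keep a pixel only if every pixel under the structuring element is set."""
    return [[all(_at(mask, r + dr, c + dc) for dr, dc in CROSS)
             for c in range(len(mask[0]))] for r in range(len(mask))]


def open_then_close(mask: Mask) -> Mask:
    opened = dilate(erode(mask))   # opening removes isolated specks
    return erode(dilate(opened))   # closing fills small holes and gaps


# Hypothetical raw mask: a 5 x 5 block with a one-pixel hole and a stray pixel.
raw: Mask = [[False] * 9 for _ in range(9)]
for r in range(2, 7):
    for c in range(2, 7):
        raw[r][c] = True
raw[4][4] = False   # small hole inside the region
raw[0][8] = True    # isolated speck far from the region
cleaned = open_then_close(raw)
```

In this example the stray pixel is removed and the hole is filled, which corresponds to the more continuous region described for Figure 6(b).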

[0060] The graph in Figure 6(c) is generated by processing the graph in Figure 6(b) with the aim of obtaining a simplified final mask 502 that depicts the region representing a specific semantic concept. This processing may include further smoothing and filtering. It also includes discarding one or more discontinuous regions of Figure 6(b) from the final mask, and adding to the final mask coordinate points for which no semantic description exists. It can be seen that the final mask in Figure 6(c) overlaps the generated mask in Figure 6(b). As is clear from Figure 6(c), in this particular embodiment the boundary of the final mask is smoother than the boundary of the mask in Figure 6(b). Regions included in the mask in Figure 6(b) may be absent from the final mask, and regions not included in the mask in Figure 6(b) may be present in it. In another embodiment, the edges of the final mask are generally smooth, and the final mask includes some regions that are not in the mask in Figure 6(b) and excludes some regions that are. While the final mask in Figure 6(c) fits within a single boundary, in other embodiments the mask may include two or more discontinuous regions whose boundaries do not overlap.
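Discarding discontinuous regions can be illustrated with connected-component labeling. The sketch below assumes 4-connectivity and keeps only the largest component. Both choices, and the names `connected_components` and `keep_largest`, are assumptions. The embodiment does not state how the regions to be discarded are chosen.

```python
from collections import deque
from typing import List, Set, Tuple

Coord = Tuple[int, int]


def connected_components(points: Set[Coord]) -> List[Set[Coord]]:
    """Split mask points into 4-connected components using breadth-first search."""
    remaining = set(points)
    components: List[Set[Coord]] = []
    while remaining:
        seed = remaining.pop()
        component = {seed}
        queue = deque([seed])
        while queue:
            r, c = queue.popleft()
            for neighbor in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if neighbor in remaining:
                    remaining.remove(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


def keep_largest(points: Set[Coord]) -> Set[Coord]:
    """Return the largest connected component, or an empty set for an empty mask."""
    components = connected_components(points)
    return max(components, key=len) if components else set()


# Hypothetical mask: one 3 x 3 block plus two stray points.
block = {(r, c) for r in range(1, 4) for c in range(1, 4)}
mask = block | {(6, 6), (0, 8)}
final_mask = keep_largest(mask)
```

A size threshold could be used instead of keeping a single component, which would allow a final mask with two or more discontinuous regions.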

[0061] Figure 7 is an image showing an example of a user interface according to one embodiment. Figure 7 shows an image of the user interface 600. The user interface 600 displays one or more visual results to the user on the display screen 26 or other display device and receives user input via the input device 28 or other input device. The display content of the display screen 26 or other display device is adjusted according to the user input.

[0062] The embodiment in Figure 7 includes, for example, four main elements (612, 614, 616, 618) and legends (602, 604, 606) in the form of windows or other area formats. Other embodiments may include more or fewer elements and elements not shown in Figure 7. The sidebar 612 shown in Figure 7 functions as an input and / or output element of the user interface. The sidebar provides detailed information about other elements of the user interface. It also allows for the selection of other elements on the display screen and the modification of element configurations on the screen. Furthermore, the sidebar allows for the selection of semantic information to be displayed on the screen and the selection of subsets to be displayed from a dataset. Medical image data 614 and 616 show input medical image data related to a specific embodiment of Figure 7.

[0063] In Figure 7, the input data is a three-dimensional image, but in other embodiments, imaging data from other modalities may be displayed. The input data can be navigated by user operations such as clicking and dragging with a mouse or rotating using keyboard arrow buttons. The input data can also be processed before being displayed on the screen. For example, the input data can be drawn before being displayed on the screen. The input data, medical image data 614, shows a rendered image drawn with at least a high threshold. On the other hand, the medical image data 616 shows a rendered image drawn with at least a low threshold and rotated relative to the input data, medical image data 614. The semantic domain graph 618 shows a two-dimensional parameter space with both axes representing changes in sagittal rotation values and rendering parameter thresholds. The vertical axis of graph 618 represents changes in sagittal rotation, and the horizontal axis represents changes in the rendering parameter threshold values. The domains showing three unique semantic descriptions are layered in the parameter space. A legend including semantic captions (602, 604, 606) indicating features in the input data identified by LLM is overlaid on the display. The semantic domain graph 618 is an interactive graph in which one or more semantic captions selected by the user move to the front of the layered structure, or the visibility of captions is switched based on user selection.

[0064] The semantic domain graph 618 in Figure 7 includes an indicator 620, shown as an arrow. The indicator 620 can be placed at any point on the screen. The indicator 620 is used for area selection on the semantic domain graph 618. The semantic domain graph 618 in Figure 7 includes or corresponds to the mask shown in Figure 6 and includes three regions representing features obtained from medical image data and the associated semantic descriptions. In Figure 7, the indicator 620 is located near the region associated with the semantic concept "human head and neck" based on legend item 604. When the indicator 620 is placed near a region labeled with a semantic concept, the arithmetic unit 22 can display only those rendered images (and optionally the corresponding semantic data) from the processed data that are associated with that semantic concept. Alternatively, it may display non-rendered input images associated with the semantic concept. This allows the user to access a subset of rendered or non-rendered images that is visually selected on the semantic domain graph 618 with the indicator 620 based on the relevant semantic concept. For example, by moving the indicator 620 to a different region on the semantic domain graph 618, it is possible to display an image related to a "human head" on the arithmetic unit 22, and by moving the indicator 620 to yet another region, it is possible to display an image related to a "human skeleton".
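Selection with the indicator can be sketched as a nearest-region lookup. The sketch assumes that each region is stored as a set of grid coordinates and that rendered images are looked up by coordinate. The distance measure, the tie-breaking rule and the names `nearest_region` and `images_for_region` are assumptions.

```python
from typing import Dict, List, Set, Tuple

Coord = Tuple[int, int]


def nearest_region(indicator: Coord, regions: Dict[str, Set[Coord]]) -> str:
    """Return the label of the region closest to the indicator position.

    Distance is the squared Euclidean distance to the nearest point of each
    region, so an indicator inside a region has distance zero. If several
    regions are equally close, the first one in dictionary order wins.
    """
    def distance(label: str) -> int:
        return min((r - indicator[0]) ** 2 + (c - indicator[1]) ** 2
                   for r, c in regions[label])
    return min(regions, key=distance)


def images_for_region(label: str, regions: Dict[str, Set[Coord]],
                      images: Dict[Coord, str]) -> List[str]:
    """Collect the rendered images whose coordinates lie inside the region."""
    return [images[coord] for coord in sorted(regions[label]) if coord in images]


# Hypothetical regions and image identifiers.
regions = {
    "human head and neck": {(0, 0), (0, 1), (1, 0), (1, 1)},
    "human skeleton": {(4, 4), (4, 5), (5, 4), (5, 5)},
}
images = {(r, c): f"render_{r}_{c}.png" for r in range(6) for c in range(6)}

selected = nearest_region((2, 1), regions)
subset = images_for_region(selected, regions, images)
```

Moving the indicator to `(4, 3)` in this example would select "human skeleton" instead.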

[0065] One or more images from the input image set and / or rendered images can be displayed in the user interface. Any of the graphs in Figures 4, 5, and 6 can be elements of the user interface display. For example, when the mouse pointer is moved over one or more coordinate points in the parameter space, one or both of the medical image data 614 and medical image data 616 will display thumbnails of the corresponding input or rendered images.

[0066] In the above embodiment, the rendering parameter threshold / level is shown on the horizontal axis and sagittal rotation on the vertical axis in the map. However, in another embodiment, other desired image generation parameters can be shown on both axes, or in a map or other representation format. For example, segmentation parameters (e.g., presence or absence of specific anatomical features or parameters) or one or more of image rotation, scaling, and viewing direction can be shown on both axes of the map or other representation format.

[0067] In other words, the map contains anatomical information, and the multiple image generation conditions include conditions related to segmentation. The multiple image generation conditions include conditions related to at least one of the following: image rotation, scaling, and viewing direction. The multiple image generation conditions also include conditions related to rendering. The multiple image generation conditions in the present invention may be a combination of the above conditions. For example, image generation conditions can be set by combining conditions related to segmentation, conditions related to rendering (transparency), and conditions related to image rotation. Furthermore, image generation conditions can be set by combining segmentation conditions and rendering conditions (transparency). Additionally, image generation conditions can be set by combining rendering conditions (transparency) and viewing direction.

[0068] Figures 8(a) to 8(d) show four representations of the parameter space according to one embodiment, each in the form of a graph of the two-dimensional parameter space. The output may be displayed on the display screen 26. In each of Figures 8(a) to 8(d), the coordinate space is a two-dimensional parameter space. The dimensions of this parameter space are rendering parameters, such as the sagittal rotation and threshold parameters in the example described above. Both axes of the graphs in Figures 8(a) to 8(d) are labeled with dimensionless values indicating how the rendering parameter changes along the axis. In other embodiments, actual values of the rendering parameters, such as the sagittal rotation angle in degrees or radians, may be shown on the axes. In each figure, areas indicating the existence of a unique semantic description are visually superimposed on the parameter space. In embodiments, these areas are color-coded in contrast to the background. Each figure shows the existence of one unique semantic description, but in other embodiments, the existence of multiple semantic descriptions may be displayed in a single figure.

[0069] Figure 8(a) shows the final mask 72, including the generated mask with smoothed boundaries placed in the parameter space of Figure 6(c).

[0070] Figure 8(b) shows another mask superimposed on the final mask 72 of Figure 8(a). The other mask 74 has a simpler shape than the mask in Figure 8(a), for example, because it is symmetrical and has smooth edges. In Figure 8(b), the other mask 74 is shown as an elliptical mask, but there are no restrictions on its shape, and it may have a more complex shape than the underlying mask (the mask in Figure 8(a)). However, preferably, the other mask covers part or most of the area of the underlying mask, and the mask is suitable for the function described below with reference to Figures 8(c) and 8(d).
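The embodiment does not describe how the elliptical mask is derived. One simple possibility, sketched below, is an axis-aligned ellipse inscribed in the bounding box of the underlying mask, together with a measure of how much of the underlying mask it covers. The functions `fit_ellipse`, `in_ellipse` and `coverage` are hypothetical.

```python
from typing import Set, Tuple

Coord = Tuple[int, int]
Ellipse = Tuple[float, float, float, float]  # center row, center col, semi-axes


def fit_ellipse(points: Set[Coord]) -> Ellipse:
    """Axis-aligned ellipse inscribed in the bounding box of the mask points."""
    rows = [p[0] for p in points]
    cols = [p[1] for p in points]
    center_r = (min(rows) + max(rows)) / 2
    center_c = (min(cols) + max(cols)) / 2
    # Semi-axes are clamped so that a one-pixel-wide mask still gives a valid ellipse.
    semi_r = max((max(rows) - min(rows)) / 2, 0.5)
    semi_c = max((max(cols) - min(cols)) / 2, 0.5)
    return center_r, center_c, semi_r, semi_c


def in_ellipse(point: Tuple[float, float], ellipse: Ellipse) -> bool:
    center_r, center_c, semi_r, semi_c = ellipse
    return (((point[0] - center_r) / semi_r) ** 2
            + ((point[1] - center_c) / semi_c) ** 2) <= 1.0


def coverage(points: Set[Coord], ellipse: Ellipse) -> float:
    """Fraction of the underlying mask that lies inside the ellipse."""
    return sum(in_ellipse(p, ellipse) for p in points) / len(points)


# Hypothetical underlying mask: a filled rectangle in grid coordinates.
underlying = {(r, c) for r in range(2, 7) for c in range(1, 10)}
ellipse = fit_ellipse(underlying)
covered = coverage(underlying, ellipse)
```

The coverage value gives a rough check of the preference that the other mask covers part or most of the underlying mask.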

[0071] Figures 8(c) and 8(d) show trajectories contained within another mask of Figure 8(b). One trajectory 76 is zigzag-shaped (Figure 8(c)), and the other trajectory 78 is roughly spiral-shaped (Figure 8(d)). There are no restrictions on the configuration of the trajectories, but it is preferable that the trajectories are smooth and traverse most of the other mask. It is also preferable that they extend over substantially the entire area of the other mask. Using the trajectories, a series of rendered images corresponding to coordinate points that coincide with the trajectory traversing the parameter space can be animated. The series of images follows a sequence of coordinate points that coincide with or substantially coincide with the coordinate points of the trajectory. A series of images consisting of rendered images corresponding to coordinate points that coincide with or substantially coincide with the coordinate points of the trajectory may also follow other sequences. The trajectories and the series of images that constitute the animation are automatically generated by the processing unit or depend on user input. In this way, the user can select a specific semantic caption and view an animation of rendered images that show the features of a biological structure described exclusively, or at least substantially exclusively, by that semantic caption. Furthermore, since there are no restrictions on the shape or range of the trajectory, the user can use the processing unit to define the trajectory in a parameter space where rendering parameters change predictably, while viewing the features identified by the semantic caption of the target. For example, the user may have the processing circuit generate an animation consisting substantially of rendered images of a human head, with a varying threshold but sagittal rotation kept constant or within a predetermined range.
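The two trajectory shapes and the resulting image sequence can be sketched as follows. The sketch assumes that rendered images are stored by integer grid coordinate and that the spiral is laid out inside an axis-aligned ellipse. The parameterization, the number of turns and steps, and the names `zigzag`, `spiral` and `frames` are assumptions.

```python
import math
from typing import Dict, List, Set, Tuple

Coord = Tuple[int, int]
Point = Tuple[float, float]


def zigzag(points: Set[Coord]) -> List[Coord]:
    """Visit the mask row by row, reversing direction on every other row."""
    by_row: Dict[int, List[int]] = {}
    for r, c in points:
        by_row.setdefault(r, []).append(c)
    path: List[Coord] = []
    for i, r in enumerate(sorted(by_row)):
        cols = sorted(by_row[r], reverse=(i % 2 == 1))
        path.extend((r, c) for c in cols)
    return path


def spiral(center: Point, semi_r: float, semi_c: float,
           turns: int = 3, steps: int = 60) -> List[Point]:
    """Spiral from the center outward that stays inside the given ellipse."""
    path: List[Point] = []
    for i in range(steps + 1):
        t = i / steps                      # normalized radius, 0 at the center
        angle = 2 * math.pi * turns * t
        path.append((center[0] + semi_r * t * math.sin(angle),
                     center[1] + semi_c * t * math.cos(angle)))
    return path


def frames(path: List[Point], images: Dict[Coord, str]) -> List[str]:
    """Map trajectory points to the nearest grid image, skipping repeats."""
    sequence: List[str] = []
    for r, c in path:
        key = (round(r), round(c))
        if key in images and (not sequence or sequence[-1] != images[key]):
            sequence.append(images[key])
    return sequence


# Hypothetical grid of rendered images.
images = {(r, c): f"render_{r}_{c}.png" for r in range(9) for c in range(11)}
zigzag_path = zigzag({(r, c) for r in range(2) for c in range(3)})
spiral_path = spiral((4.0, 5.0), 2.0, 4.0)
animation = frames(spiral_path, images)
```

Playing the entries of `animation` in order gives an animation of rendered images along the spiral trajectory.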

[0072] Figure 9 is a schematic diagram of an example of a method according to one embodiment. Figure 9 shows a method 800 for generating a visual representation of clinically useful rendered image data for a user. Method 800 is used as an extension of processing method 210 and depends on the data generated during the execution of processing method 210. Alternatively, method 800 may be an independent method that does not include processing method 210.

[0073] In method 800, the processing unit is given a medical report 812. The medical report 812 is a clinical report and may include semantic data and / or image data. The medical report 812 is also provided to a processing circuit 814. Method 800 may use its own processing circuit or share a processing circuit with the processing unit 32 in Figure 1. The processing circuit 814 includes an LLM trained to generate semantic data from semantic data and / or image data inputs. The LLM may be the LLM used in step 216 of processing method 210 or another LLM. The processing circuit 814 also receives data including a semantic description 816 as input. The semantic description 816 is obtained by processing method 210. The semantic description 816 provided to the processing circuit 814 may be generated directly by the LLM of processing method 210. Alternatively, before being provided to the processing circuit 814, the semantic description 816 may be filtered in the same manner as in step 248 of processing method 210. The processing circuit 814 searches for semantic overlap between the contents of the medical report and the given semantic description 816. In one embodiment, the determination of semantic overlap includes matching semantic data between the semantic descriptions 816 and the medical report 812. The processing circuit 814 processes the medical report and obtains a set of semantic descriptions for the text and / or image data within the medical report. The output of the processing circuit 814 includes a semantic dataset, which is the subset of the semantic descriptions 816 that relate to or match the semantic data contained in the medical report 812. Thus, method 800 can be viewed as a process of filtering the semantic descriptions obtained by processing method 210 based on their relevance to the medical report.
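The filtering performed by method 800 can be illustrated with a deliberately naive stand-in. In the embodiment the matching is carried out by an LLM. The sketch below replaces that with simple keyword overlap, which cannot recognize synonyms or paraphrases and is shown only to make the data flow concrete. The stop-word list, the function names and the example report are hypothetical.

```python
import re
from typing import List, Set

# Small stop-word list chosen for this example only.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "with"}


def tokens(text: str) -> Set[str]:
    """Lower-case word tokens with stop words removed."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOP_WORDS


def relevant_descriptions(report: str, descriptions: List[str],
                          min_overlap: int = 1) -> List[str]:
    """Keep the semantic descriptions that share words with the report.

    Keyword overlap is a stand-in for the LLM-based semantic matching of
    the embodiment, not a description of it.
    """
    report_tokens = tokens(report)
    return [d for d in descriptions
            if len(tokens(d) & report_tokens) >= min_overlap]


# Hypothetical medical report text and semantic descriptions.
report = "CT of the skull shows a fracture of the cervical spine."
descriptions = ["head and neck", "skeleton", "cervical spine"]
semantic_dataset = relevant_descriptions(report, descriptions)
```

The resulting `semantic_dataset` corresponds to the subset of semantic descriptions 816 that is judged relevant to the medical report 812.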

[0074] Figure 9 also includes a display of a two-dimensional parameter space. This parameter space is similar to, or generated based on, any of the parameter spaces in Figures 5-8. The parameter space is two-dimensional and includes a mask 802 generated by the aforementioned method. The processing circuit 814 uses a semantic dataset filtered based on its relevance to the medical report 812 to select or visually mark a portion of the parameter space within the parameter space map 818. This is shown in Figure 9 as a selection point 804. In another embodiment, other indicators may be used as selection points 804 as appropriate. For example, indicator settings on the map may allow for the selection of one or more image rendering parameters or other image generation conditions in an appropriate manner.

[0075] The selection point 804 is placed at a coordinate point in the parameter space that corresponds to semantic data filtered based on its relevance to the medical report 812. Although Figure 9 shows only one selection point 804, in other embodiments, two or more selection points may be generated, such as when there are many matches between semantic information and semantic descriptions contained in the medical report 812. It is also possible to automate the screen display to the user using one or more selection points. The processing device may generate an image sequence that periodically repeats an image corresponding to the coordinate position of the selection point. Alternatively, the processing device may generate an image to be displayed to the user with the selection point marked, or generate an image sequence that magnifies the relevant portion of the parameter space. Furthermore, one or more selection points may be used as coordinate points traversed by a trajectory (as described with reference to Figure 8) in order to automatically animate semantic concepts that are considered relevant based on the medical report.
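Placement of a selection point can be sketched as below, assuming that the point is placed at the member of the relevant region that lies closest to the region's centroid. Snapping to a member point keeps the selection inside the region even when the region is not convex. The function `selection_point` and the example regions are hypothetical.

```python
from typing import Set, Tuple

Coord = Tuple[int, int]


def selection_point(region: Set[Coord]) -> Coord:
    """Return the region point closest to the centroid of the region."""
    count = len(region)
    mean_r = sum(r for r, _ in region) / count
    mean_c = sum(c for _, c in region) / count
    # Sorting first makes the result deterministic when distances tie.
    return min(sorted(region),
               key=lambda p: (p[0] - mean_r) ** 2 + (p[1] - mean_c) ** 2)


# Hypothetical regions in grid coordinates.
square_region = {(r, c) for r in range(1, 4) for c in range(2, 5)}
l_shaped_region = {(0, 0), (0, 1), (0, 2), (1, 0), (2, 0)}

square_point = selection_point(square_region)
l_shaped_point = selection_point(l_shaped_region)
```

When several regions match the medical report, the function can be applied to each region to obtain two or more selection points.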

[0076] Based on the embodiments, a medical visualization apparatus or method is provided comprising an image caption generation model or multimodal LLM and a sampled one-dimensional or multi-dimensional rendering parameter set for creating a grid image representing parameter changes. Samples of the image grid are supplied via an image caption generation process and plotted as semantic concepts on the original grid after simplification. Each associated unique image concept is converted into a mask and optionally into a set of geometric patterns mapping each concept to a parameter region.

[0077] Multimask graphs and geometric graphs may be provided to the user as user interface (UI) elements. Parameter combinations can be set by clicking on the graph. Graphs may include, for example, 1D graphs such as bar graphs of semantic concepts on a 1D axis. Graphs may also include 2D graphs such as image-based graphs where semantic concepts are represented as 2D geometric shapes or 2D masks. Furthermore, graphs may include 3D graphs such as navigable 3D scenes where semantic concepts are represented as 3D geometric shapes or voxel masks. Semantic concepts may exist as N-dimensional objects (e.g., N>3), and the space may be displayed as an active dimensionality reduction view that projects the N-dimensional geometric shapes into 2D / 3D.
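Setting a parameter combination by clicking on a 2D graph can be sketched as a linear mapping from pixel position to parameter values. The sketch assumes that the graph fills a rectangle of known pixel size, that screen y coordinates grow downward, and that the vertical parameter increases upward. None of these conventions is stated in the embodiment, and `click_to_parameters` is a hypothetical name.

```python
from typing import Tuple


def click_to_parameters(x_px: float, y_px: float, width: int, height: int,
                        x_range: Tuple[float, float],
                        y_range: Tuple[float, float]) -> Tuple[float, float]:
    """Convert a click position inside the graph to a pair of parameter values.

    width and height are the graph size in pixels and must be greater than 1.
    Clicks outside the graph are clamped to its border.
    """
    fx = min(max(x_px / (width - 1), 0.0), 1.0)
    fy = min(max(y_px / (height - 1), 0.0), 1.0)
    x_value = x_range[0] + fx * (x_range[1] - x_range[0])
    # Screen y grows downward, so the top of the graph maps to the maximum value.
    y_value = y_range[1] - fy * (y_range[1] - y_range[0])
    return x_value, y_value


# Assumed axes: threshold on the horizontal axis, sagittal rotation in degrees
# on the vertical axis, drawn in a 101 x 101 pixel graph.
threshold, rotation = click_to_parameters(50, 50, 101, 101,
                                          (0.0, 1.0), (-90.0, 90.0))
```

The returned pair can then be used as the selected image generation condition.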

[0078] Multiple prompts and / or instructions are given to the caption generation model. The output may be extended to a multi-layered geometric shape and displayed to the user as a multi-layered composite screen. Corresponding images are displayed as thumbnail images when the mouse hovers over the parameter geometric shape graph. The semantic concepts of the text may be graphed as geometric regions. The mapping from text to regions may be provided as a separate legend. To create a more consistent space, the mask may be filtered before being converted to a geometric shape. For example, a morphological filter consisting of an opening operation followed by a closing operation can be used.

[0079] To reduce the visual complexity of the graph, the fitted geometric shapes may be simplified. Furthermore, the geometric shapes can be further simplified, and the trajectories can be graphed within the semantic conceptual space to create an automated animation. The animation begins by creating a path of parameters that is continuous between concepts and smoothly connected along the boundaries of semantic concepts. The subject of the parameter graph is selected based on its relevance to the text / report. The central point of the selected semantic concepts can serve as the starting point for automatically generated images that are displayed or appended to the text / report. Both the parameter trajectories and the geometric shapes of the automatically selected semantic concepts may be used to create the automated animation.

[0080] Based on the embodiment, a medical image processing apparatus is provided comprising a method having the following operations, or a processing circuit that performs said operations: receiving medical image data, receiving one or more image rendering parameters, processing the medical image data to obtain a set of rendering images, each generated based on different values of the one or more rendering parameters, processing each of the set of rendering images using a trained machine learning model to obtain semantic data representing one or more features in each rendering image, and generating a parameter space dataset that expresses the presence or absence of the one or more features in the rendering image as a function of the values of the rendering parameters used to generate the rendering image.
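The sequence of operations in [0080] can be sketched end to end as follows. The renderer and the trained machine learning model are replaced by hypothetical stub functions (`render_stub` and `caption_stub`), since the embodiment does not define their interfaces. Only the structure of the parameter space dataset is meant to be illustrative.

```python
from itertools import product
from typing import Any, Callable, Dict, Iterable, Sequence, Set, Tuple

Params = Tuple[float, ...]


def build_parameter_space(volume: Any,
                          parameter_values: Sequence[Sequence[float]],
                          render: Callable[[Any, Params], Any],
                          caption: Callable[[Any], Iterable[str]]
                          ) -> Dict[Params, Set[str]]:
    """Render one image per parameter combination and record its features."""
    dataset: Dict[Params, Set[str]] = {}
    for combo in product(*parameter_values):
        image = render(volume, combo)          # one rendered image per combination
        dataset[combo] = set(caption(image))   # semantic data for that image
    return dataset


def presence(dataset: Dict[Params, Set[str]], feature: str) -> Dict[Params, bool]:
    """Presence or absence of a feature as a function of the parameter values."""
    return {combo: feature in features for combo, features in dataset.items()}


# Hypothetical stand-ins for the renderer and the trained model.
def render_stub(volume: Any, combo: Params) -> Dict[str, float]:
    return {"rotation": combo[0], "threshold": combo[1]}


def caption_stub(image: Dict[str, float]) -> Iterable[str]:
    return ["skeleton"] if image["threshold"] >= 0.5 else ["head and neck"]


rotations = [-45, 0, 45]      # assumed sagittal rotation samples in degrees
thresholds = [0.2, 0.8]       # assumed normalized threshold samples
dataset = build_parameter_space(None, [rotations, thresholds],
                                render_stub, caption_stub)
skeleton_map = presence(dataset, "skeleton")
```

The dictionary `skeleton_map` is one possible form of the parameter space dataset for a single feature, from which a mask such as the one in Figure 6(a) could be drawn.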

[0081] While specific circuits have been described in this specification, in other embodiments, one or more functions of these circuits can be implemented by a single processing resource or other component. Alternatively, functions implemented by a single circuit can be implemented by combining two or more processing resources or other components. A reference to a single circuit encompasses multiple components, whether or not geographically separated, that together implement the functions of that circuit. A reference to multiple circuits encompasses a single component that implements the functions of those circuits.

[0082] While several embodiments have been described, these embodiments are presented as examples only and are not intended to limit the scope of the invention. The novel methods and systems can be implemented in a variety of other forms, and various omissions, substitutions, and modifications can be made without departing from the spirit of the invention. These embodiments and their variations are included in the scope and spirit of the invention, as well as in the claims and their equivalents.

[Explanation of Symbols]

[0083]
12, 22 Arithmetic unit
20 Medical imaging data processing equipment
24 Scanners
25 Servers
26 Display screen
28 Input devices
30 Data storage unit
32 Processing unit
34 Model training circuits
36 Data processing circuits
38 Interface circuit

Claims

1. A medical image data processing method including: a step of displaying a map representing multiple image generation conditions; a step of setting, in the map, an indicator for selecting one of the multiple image generation conditions; a step of inputting medical image data and the selected image generation condition into a model; and a step of outputting medical image data generated based on the selected image generation condition.

2. The aforementioned map includes anatomical information, and the aforementioned multiple image generation conditions include conditions related to segmentation. The medical image data processing method according to claim 1.

3. The aforementioned plurality of image generation conditions include conditions relating to at least one of image rotation, scaling, and viewing direction. The medical image data processing method according to claim 1.

4. The aforementioned multiple image generation conditions include conditions related to rendering. The medical image data processing method according to claim 2 or 3.

5. The aforementioned map includes a set of rendered images, each generated based on different values of one or more rendering parameters. The medical image data processing method according to claim 2.

6. The method includes obtaining semantic data representing one or more features within each rendered image, generating a parameter space dataset that represents the presence or absence of the one or more features in the rendered image as a function of the values of the rendering parameters used to generate the rendered image, and providing an output that includes a visual representation of the parameter space dataset. The medical image data processing method according to claim 3.

7. The visual representation of the parameter space dataset includes multiple dimensions, each dimension representing one or more of the rendering parameters. The medical image data processing method according to claim 6.

8. The visual representation of the parameter space dataset includes regions that indicate the presence or absence of one or more features in the rendered image. The medical image data processing method according to claim 6.

9. The method includes displaying, in response to the user's selection of at least one feature, at least one rendered image corresponding to at least one point in the parameter space where the at least one selected feature exists. The medical image data processing method according to claim 6.

10. The at least one rendered image includes a series of rendered images that correspond to a series of points in the parameter space dataset and form a trajectory within the parameter space dataset. The medical image data processing method according to claim 9.

11. The series of rendered images is displayed in an order corresponding to the sequence of points constituting the trajectory through the parameter space dataset. The medical image data processing method according to claim 10.

12. The method includes providing a user interface configured to display a corresponding screen of the parameter space dataset in response to the user's selection of one or more rendering parameters. The medical image data processing method according to claim 6.

13. The selection of one or more rendering parameters includes selecting a set of values from one or more of the rendering parameters. The medical image data processing method according to claim 12.

14. The selection of one or more rendering parameters is performed by the user manipulating the displayed parameter space dataset screen. The medical image data processing method according to claim 12.

15. The medical image data processing method according to claim 6, wherein the features represented in the semantic data are one or more of anatomical features, pathological features, and other features.

16. The medical image data processing method according to claim 6, wherein the model includes at least one of the following: a multimodal language model, a large language model (LLM) utilizing a caption generation and visualization model, GPT-2, GPT-3.5, GPT-4, PaLM, LLaMA, BLOOM, ERNIE, T5, Claude or Claude 2, and their derivatives or developments.

17. A medical image data processing device comprising: a display control unit that displays, on a display unit, a map representing multiple image generation conditions; a setting unit that sets, in the map, an indicator for selecting one of the multiple image generation conditions; an input unit that inputs medical image data and the selected image generation condition into a model; and an output unit that outputs medical image data generated based on the selected image generation condition.

18. The medical image data processing device according to claim 17, wherein the model is stored on a remote server or in the cloud, and the input of the medical image data and the selected one or more image rendering parameters or other image generation conditions includes transmitting the medical image data and the selected one or more image rendering parameters or other image generation conditions to the remote server or cloud.

19. A program that causes a computer to execute: a step of displaying a map representing multiple image generation conditions; a step of setting, in the map, an indicator for selecting one of the multiple image generation conditions; a step of inputting medical image data and the selected image generation condition into a model; and a step of outputting medical image data generated based on the selected image generation condition.
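
Illustrative note (not part of the claims): the parameter-space dataset of claims 6 to 11 can be pictured with a minimal Python sketch. Everything named here is an assumption made for illustration only: the two rendering parameters (`opacity`, `threshold`), their value grids, the feature names, and the `detect_features` rule, which merely stands in for the model that would supply semantic data for each rendered image. The sketch builds a map from parameter values to the features present, prints a presence/absence region map, and lists the points where a selected feature exists in a fixed order, as a simple trajectory.

```python
from itertools import product

# Hypothetical rendering parameters and value grids (not taken from the source).
OPACITY = [0.2, 0.4, 0.6, 0.8]
THRESHOLD = [100, 200, 300]


def detect_features(opacity, threshold):
    """Stand-in for the model that returns semantic data for one rendered image.

    A real system would render the volume with these parameters and have a
    model describe the result; this rule is invented so the example runs.
    """
    features = set()
    if threshold >= 200:
        features.add("bone")
    if opacity <= 0.4 and threshold <= 200:
        features.add("vessel")
    return features


def build_parameter_space(opacities, thresholds):
    """Map each (opacity, threshold) point to the set of features present there."""
    return {
        (o, t): detect_features(o, t)
        for o, t in product(opacities, thresholds)
    }


def points_with_feature(dataset, feature):
    """Points where the feature is present, sorted to give a simple trajectory."""
    return sorted(p for p, feats in dataset.items() if feature in feats)


def render_text_map(dataset, feature, opacities, thresholds):
    """Text view of the presence/absence regions: '#' present, '.' absent.

    Columns follow the opacity grid, rows follow the threshold grid.
    """
    rows = []
    for t in thresholds:
        rows.append(
            "".join("#" if feature in dataset[(o, t)] else "." for o in opacities)
        )
    return "\n".join(rows)


dataset = build_parameter_space(OPACITY, THRESHOLD)
vessel_map = render_text_map(dataset, "vessel", OPACITY, THRESHOLD)
vessel_trajectory = points_with_feature(dataset, "vessel")

print(vessel_map)
print(vessel_trajectory)
```

In this sketch, selecting the feature "vessel" corresponds to the user selection in claim 9, and iterating over `vessel_trajectory` in order corresponds to displaying the series of rendered images along a trajectory as in claims 10 and 11. A sorted order is only one possible choice of trajectory; the claims do not prescribe how the points are sequenced.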