Methods, devices, and virtual display equipment for displaying regions of interest
By determining the gaze point position and segmenting the region of interest in a virtual display scene, the problems of resource waste and low convenience in existing technologies are solved, achieving efficient display of the region of interest and resource conservation.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SHANGHAI LIANYING ZHIYUAN MEDICAL TECH CO LTD
- Filing Date
- 2024-12-31
- Publication Date
- 2026-06-30
AI Technical Summary
Existing technologies require processing the entire image with the same quality when displaying 3D images, which leads to a waste of graphics processing unit and memory resources, and reduces the ease of observing the region of interest.
By determining the user's gaze point position in the virtual display scene, the region of interest is determined based on the gaze point position, and the region of interest is highlighted or masked in the virtual display scene to reduce resource consumption on non-regions of interest. The region of interest is quickly segmented and displayed using a segmentation model, combined with an adaptive display method that incorporates annotation information.
It improves the ease of displaying images in regions of interest, reduces the consumption of graphics processing units and memory resources, and enhances the display effect of 3D images, especially in virtual display scenes, significantly reducing resource waste.
Figure CN122308595A_ABST
Description
Technical Field
[0001] This application belongs to the field of virtual reality technology, and in particular relates to a method, apparatus and virtual display device for displaying a region of interest.
Background Technology
[0002] With the development of visualization technology, three-dimensional images of objects are usually displayed on the two-dimensional display screen of the device for users to observe intuitively.
[0003] Currently, 3D images are usually the overall picture generated by the device through graphics rendering technology. When users are interested in a certain area of the 3D image, they manually zoom in on the overall 3D image to observe the 3D image of the area of interest.
[0004] However, this method requires processing the entire 3D image at the same quality, which can easily lead to a waste of resources such as the device's graphics processing unit and memory. Furthermore, it is less convenient to observe the 3D image of the region of interest.
Summary of the Invention
[0005] This application provides a method, apparatus, and virtual display device for displaying regions of interest, which can solve the problem that displaying regions of interest in an image requires a large amount of resources such as graphics processing units and memory.
[0006] In a first aspect, embodiments of this application provide a method for displaying a region of interest, the method comprising:
[0007] In a virtual display scene, determine the location of the user's gaze point;
[0008] Determine the region of interest based on the gaze point location;
[0009] Present the area of interest in the virtual display scene.
[0010] In one embodiment, presenting the region of interest in a virtual display scene includes:
[0011] In a virtual display scene, highlight the edge outline of the region of interest.
[0012] In one embodiment, presenting the region of interest in a virtual display scene includes:
[0013] In a virtual display scene, areas of non-interest are masked.
[0014] In one embodiment, presenting the region of interest in a virtual display scene includes:
[0015] Segment the region of interest and display it.
[0016] In one embodiment, it further includes:
[0017] Obtain the preset area centered on the gaze point;
[0018] Display annotation information for annotation points located in the preset area.
[0019] In one embodiment, displaying annotation information for annotation points located in a preset area includes:
[0020] The text display method of the annotation information of each annotation point is determined based on the distance between each annotation point and the gaze point;
[0021] Display the annotation information according to the determined text display method.
[0022] In one embodiment, the text display method of the annotation information of each annotation point is determined based on the distance between each annotation point and the gaze point, including:
[0023] Sort the distances between each annotation point and the gaze point from closest to farthest to obtain the sorting result;
[0024] Based on the sorting results, the text display method of the annotation information of the annotation points is adaptively adjusted.
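The annotation steps above (collect annotation points in the preset area, sort them by distance to the gaze point, then adapt the text display) can be sketched as follows. This is only an illustration under stated assumptions: the preset area is taken to be a sphere of radius `radius`, and the adaptive rule (font size shrinking with rank) is hypothetical, since the application does not fix either choice. All function and parameter names are invented for the example.

```python
import math

def annotations_near_gaze(gaze, points, radius):
    """Return (label, distance) pairs inside the preset area, nearest first.

    Assumption: the preset area is a sphere of `radius` centered on the gaze point.
    """
    inside = []
    for label, pos in points.items():
        d = math.dist(gaze, pos)
        if d <= radius:
            inside.append((label, d))
    return sorted(inside, key=lambda item: item[1])

def text_styles(sorted_annotations, max_size=24, step=4, min_size=12):
    """Hypothetical adaptive rule: the nearest annotation gets the largest font."""
    return {
        label: max(min_size, max_size - rank * step)
        for rank, (label, _) in enumerate(sorted_annotations)
    }

points = {
    "tarsal": (0.0, 0.0, 1.0),
    "metatarsal": (0.0, 3.0, 0.0),
    "toe": (10.0, 0.0, 0.0),  # outside the preset area, so not displayed
}
ranked = annotations_near_gaze((0.0, 0.0, 0.0), points, radius=5.0)
styles = text_styles(ranked)
```

Other adaptive rules (opacity, abbreviation, hiding distant labels) would fit the same sorted list.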
[0025] In a second aspect, embodiments of this application provide another method for displaying a region of interest, the method comprising:
[0026] In a virtual display scene, determine the location of the user's gaze point;
[0027] Determine the region of interest based on the gaze point location;
[0028] Present the area of interest in the virtual display scene;
[0029] If the user confirms the presented region of interest, obtain a preset region centered on the gaze point, and display annotation information of the annotation points located in the preset region;
[0030] Otherwise, repeat the above steps until the user's confirmation of the presented region of interest is received.
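The confirm-or-repeat flow of this second method can be sketched as a loop. The four callbacks and the `max_rounds` safety cap are assumptions made for the example; the application only states that the steps repeat until the user confirms.

```python
def present_until_confirmed(get_gaze, find_roi, present, confirmed, max_rounds=10):
    """Repeat gaze -> ROI -> present until the user confirms the presented ROI.

    All four callbacks are hypothetical hooks into the virtual display device.
    Returns (gaze, roi) on confirmation, or None if max_rounds is exhausted.
    """
    for _ in range(max_rounds):
        gaze = get_gaze()
        roi = find_roi(gaze)
        present(roi)
        if confirmed(roi):
            return gaze, roi
    return None

# Toy run: the user rejects the first ROI and confirms the second.
gazes = iter([(0, 0, 0), (1, 1, 1)])
shown = []
result = present_until_confirmed(
    get_gaze=lambda: next(gazes),
    find_roi=lambda g: "toe" if g == (1, 1, 1) else "tarsal",
    present=shown.append,
    confirmed=lambda roi: roi == "toe",
)
```

After confirmation, the returned gaze point would be the center of the preset region used for annotation display.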
[0031] In a third aspect, embodiments of this application provide a display device for a region of interest, the device comprising:
[0032] The first determining module is used to determine the position of the user's gaze point in the virtual display scene;
[0033] The second determination module is used to determine the region of interest based on the gaze point location;
[0034] The first display module is used to present the area of interest in the virtual display scene.
[0035] In a fourth aspect, embodiments of this application provide another display device for a region of interest, the device comprising:
[0036] The third determining module is used to determine the position of the user's gaze point in the virtual display scene;
[0037] The fourth determination module is used to determine the region of interest based on the gaze point location;
[0038] The second display module is used to present the area of interest in the virtual display scene;
[0039] The third display module is used to obtain a preset area centered on the gaze point and display annotation information of the annotation points located in the preset area if the user confirms the presented area of interest;
[0040] The execution module is used to otherwise repeat the above steps until the user confirms the presented area of interest.
[0041] In a fifth aspect, embodiments of this application provide a virtual display device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the methods described in the first and / or second aspects above.
[0042] In a sixth aspect, embodiments of this application provide a computer-readable storage medium storing a computer program that, when executed by a processor, implements the methods described in the first and / or second aspects above.
[0043] In a seventh aspect, embodiments of this application provide a computer program product that, when run on a virtual display device, causes the virtual display device to perform the methods described in the first and / or second aspects.
[0044] The beneficial effects of this application's embodiments compared to existing technologies are as follows: By determining the user's gaze point position in a virtual display scene, the region of interest (ROI) corresponding to that gaze point position can be determined, so that only the ROI image needs to be displayed in the virtual display scene. This eliminates the need to manually segment the entire 3D image and improves the convenience of displaying the ROI image. Furthermore, displaying the 3D image of the ROI in a 3D virtual display scene is generally more effective than displaying it on a 2D display screen. Also, since only the image corresponding to the ROI is displayed, the virtual display scene does not require a large amount of computing resources.
Attached Figure Description
[0045] To more clearly illustrate the technical solutions in the embodiments of this application, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0046] Figure 1 is a flowchart illustrating the implementation of a method for displaying a region of interest according to an embodiment of this application;
[0047] Figure 2 is a schematic diagram illustrating an application scenario of an image in a method for displaying a region of interest according to an embodiment of this application;
[0048] Figure 3 is a schematic diagram illustrating one implementation of a method for training a segmentation model in a region of interest display method provided in an embodiment of this application;
[0049] Figure 4 is a schematic diagram illustrating one implementation of training a segmentation model in a method for displaying a region of interest according to another embodiment of this application;
[0050] Figure 5 is a flowchart illustrating the implementation of a method for displaying a region of interest according to another embodiment of this application;
[0051] Figure 6 is a flowchart illustrating the implementation of a method for displaying a region of interest according to another embodiment of this application;
[0052] Figure 7 is a structural block diagram of an image display system provided in an embodiment of this application;
[0053] Figure 8 is a schematic diagram of the structure of a display device for a region of interest according to an embodiment of this application;
[0054] Figure 9 is a schematic diagram of the structure of a display device for a region of interest according to another embodiment of this application;
[0055] Figure 10 is a schematic diagram of the structure of a virtual display device provided in an embodiment of this application.
Detailed Implementation
[0056] In the following description, specific details such as particular system architectures and techniques are set forth for illustrative purposes and not for limitation, in order to provide a thorough understanding of the embodiments of this application. However, those skilled in the art will understand that this application may also be implemented in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, apparatuses, circuits, and methods have been omitted so as not to obscure the description of this application with unnecessary detail.
[0057] It should be understood that, when used in this application specification and the appended claims, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements and / or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and / or a collection thereof.
[0058] It should be noted that the information collection process (such as the facial image collection process, fingerprint information collection process, etc.) / feature extraction process involved in this application is carried out with the user's knowledge and permission. That is, the information collection process / feature extraction process complies with the requirements of laws and regulations and does not constitute an act that harms the public interest.
[0059] Furthermore, in the description of this application and the appended claims, the terms "first," "second," "third," etc., are used only to distinguish descriptions and should not be construed as indicating or implying relative importance.
[0060] With the development of visualization technology, three-dimensional images of objects are usually displayed on the two-dimensional display screen of the device for users to observe intuitively.
[0061] Currently, 3D images are usually the overall picture generated by the device through graphics rendering technology. When users are interested in a certain area of the 3D image, they manually zoom in on the overall 3D image to observe the 3D image of the area of interest.
[0062] However, this method requires processing the entire 3D image at the same quality, which can easily lead to a waste of resources such as the device's graphics processing unit and memory. Furthermore, it is less convenient to observe the 3D image of the region of interest.
[0063] For example, the aforementioned three-dimensional images include, but are not limited to, three-dimensional images of buildings and characters in the gaming field, and three-dimensional images of biological tissues that provide doctors with intuitive visualizations in the medical field; there is no limitation on these.
[0064] For ease of explanation, we will take three-dimensional images of biological tissues as an example. In disease diagnosis, treatment planning, and clinical research, we are increasingly relying on imaging information of biological tissues.
[0065] For example, CT and MR (Magnetic Resonance) scanners are typically used to scan biological tissues and generate three-dimensional models of the organs and tissues. These models are then displayed on a two-dimensional screen in an electronic device. To reduce the waste of image processing units and memory resources by displaying only the regions of interest within the organ or tissue, doctors typically segment the overall three-dimensional model based on the regions of interest and display the segmented three-dimensional models corresponding to those regions on the two-dimensional screen.
[0066] However, segmentation using the above method requires a significant amount of time and effort, and is not very convenient. Furthermore, displaying a three-dimensional model on a two-dimensional plane has certain limitations, failing to provide doctors with an intuitive understanding.
[0067] Based on this, in order to improve the convenience of observing 3D images of regions of interest and reduce the waste of resources such as graphics processing units and memory, this application provides a method for displaying regions of interest, which can be applied to virtual display devices such as augmented reality (AR) / virtual reality (VR) devices. This application does not impose any restrictions on the specific type of virtual display device.
[0068] Please refer to Figure 1, which is a flowchart illustrating the implementation of a method for displaying a region of interest according to an embodiment of this application. The method includes the following steps:
[0069] S101. In a virtual display scene, determine the position of the user's gaze point.
[0070] In one embodiment, the aforementioned virtual display scene is an immersive digital scene environment generated through computer technology, which can be a scene constructed by a virtual display device. The virtual display device includes, but is not limited to, the AR device and VR device described above.
[0071] The virtual display scene may pre-display images, allowing users to select areas of interest from these images for display. These images can be 3D images of buildings, characters, etc., in the game field, or 3D images of biological tissues in the medical field, without limitation. For ease of explanation, the following description uses 3D images corresponding to biological tissues as an example.
[0072] In one embodiment, the biological tissue can be an animal organ or a human organ, and there is no limitation in this regard. For ease of explanation, this embodiment will use a human organ as an example for illustration.
[0073] In one embodiment, the biological tissue can be tissue scanned by imaging equipment such as CT or MR devices. The imaging equipment can image the scanned biological tissue to obtain the aforementioned image.
[0074] Understandably, after biological tissue is scanned by an imaging device, volumetric data of the biological tissue is obtained. At this point, the imaging device can directly input the volumetric data into a virtual display device for imaging, so that an image is displayed on the virtual display device. Alternatively, the volumetric data can be imaged first to obtain an image, and then the image can be transmitted to the virtual display device for display; there is no limitation on which method is used.
[0075] In another embodiment, after obtaining the volume data, the volume data can be stored on a server or in the cloud. When the above method needs to be performed on the biological tissue, the server or cloud can send the volume data to a virtual display device for reconstruction and rendering to display the image of the region of interest. Alternatively, the server or cloud can perform reconstruction and rendering based on the volume data to obtain the image, and then send it to a virtual display device for display.
[0076] In one embodiment, the aforementioned volumetric data is typically primarily tomographic planar data. Through preprocessing such as image analysis, histogram window adaptation, and material remapping, the volumetric data can be used for three-dimensional reconstruction to obtain the aforementioned medical images. The reconstruction methods include, but are not limited to, traditional mesh rendering or volumetric data rendering using ray tracing, etc., and are not limited to any particular method.
[0077] In this virtual display scenario, images of multiple different biological tissues of a human body can be displayed simultaneously to create a unified virtual human body or organ. Alternatively, images of the same biological tissues from multiple human bodies can be displayed simultaneously, allowing users to compare and contrast them in a teaching setting. This embodiment does not limit the number or types of biological tissues that can be displayed simultaneously in the virtual display scenario.
[0078] It is understandable that in the medical field, when images of biological tissues or regions of interest are displayed, multiple virtual display scenes contained in a virtual display device can be merged and displayed together. Users can then perform operations such as surgical planning and simulated surgery in advance in a virtual environment, and medical education gains additional teaching tools and methods, which broadens the application scenarios of this embodiment.
[0079] It should be noted that the images described above are three-dimensional images, and the virtual display scene is also a three-dimensional scene. Therefore, displaying images within a virtual display scene allows users to intuitively experience the three-dimensional structure of biological tissues.
[0080] In another embodiment, to adapt the subsequently displayed image of the region of interest to scenarios such as disease diagnosis, treatment planning, and clinical research, the virtual display device can also pre-store multiple display scenes. For example, virtual clinic scenes, conference room scenes, classroom scenes, etc., can be pre-stored, without limitation. In this case, the virtual display device can first display the corresponding virtual display scene based on the user's selection command, and then display the aforementioned image within the virtual display scene.
[0081] In one embodiment, the aforementioned gaze point location can be considered as the location where the user's gaze is focused within the virtual display scene. As an example, the virtual display device can determine the user's gaze position in the screen coordinate system of the virtual display device. Then, the gaze point location is determined based on the gaze position.
[0082] The aforementioned virtual display device can be a head-mounted device, such as a VR headset, and has eye-tracking functionality. In a virtual display scenario, the virtual display device can collect eye-tracking information such as the user's head position, head rotation angle, and eye movement data to determine the aforementioned gaze position.
[0083] It should be noted that due to differences in wearing habits (e.g., whether or not glasses are worn) or interpupillary distance among different wearers, there may be some deviation in the gaze position. Therefore, the collected eye movement information needs to be corrected before generating the gaze position.
[0084] The correction methods include, but are not limited to, the five-point correction method, viewpoint calibration, delay alignment, and data filtering, to ensure the accuracy and reliability of eye tracking. Furthermore, the method for determining the gaze position based on the aforementioned eye movement information is an existing method and will not be described in detail.
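Of the listed corrections, data filtering is the simplest to illustrate. The sketch below uses exponential smoothing of raw gaze samples; the choice of filter and the smoothing factor `alpha` are assumptions, since the application does not say which filter is used.

```python
def smooth_gaze(samples, alpha=0.5):
    """Exponentially smooth a sequence of (x, y) gaze samples.

    alpha close to 1 follows the raw signal; alpha close to 0 smooths heavily.
    """
    smoothed = []
    prev = None
    for x, y in samples:
        if prev is None:
            prev = (x, y)  # first sample passes through unchanged
        else:
            prev = (alpha * x + (1 - alpha) * prev[0],
                    alpha * y + (1 - alpha) * prev[1])
        smoothed.append(prev)
    return smoothed

filtered = smooth_gaze([(0.0, 0.0), (2.0, 4.0), (2.0, 4.0)])
```

A jittery sample is pulled toward the running estimate, which reduces flicker in the resulting gaze position at the cost of a small lag.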
[0085] In one embodiment, the virtual display device can convert the gaze position into three-dimensional coordinates based on a preset mapping relationship between the screen coordinate system and the three-dimensional coordinate system of the virtual display scene, thereby obtaining the aforementioned gaze point position.
[0086] For example, the relationship between the gaze position and the gaze point position can be represented as follows:
[0087] $P(x, y, z) = f(S_x, S_y)$;
[0088] where $(S_x, S_y)$ is the gaze position, $P(x, y, z)$ is the transformed gaze point position, and $f$ is the mapping relationship between the screen coordinate system and the three-dimensional coordinate system of the virtual display scene.
[0089] In another embodiment, the virtual display device determines the gaze point position in the virtual display scene from the gaze point provided by the user's eyes. For example, the virtual display device can emit a virtual detection ray from the virtual helmet toward the gaze point. The point at which the ray collides with the image is the position of the gaze point in the three-dimensional volumetric data, i.e., the gaze point position.
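The ray detection described here can be sketched as fixed-step ray marching through a voxel volume. This is a simplified stand-in: the volume layout (`volume[z][y][x]`), the intensity `threshold` that counts as a collision, and the step size are all assumptions for illustration, and a real engine would typically use its own collision or volume-sampling routines.

```python
import math

def first_hit(volume, origin, direction, threshold=0.5, step=0.25, max_dist=100.0):
    """March along a ray and return the first voxel (x, y, z) at or above threshold.

    volume is indexed as volume[z][y][x]; returns None if nothing is hit.
    """
    norm = math.sqrt(sum(c * c for c in direction))
    d = [c / norm for c in direction]
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    t = 0.0
    while t <= max_dist:
        x, y, z = (int(math.floor(origin[i] + t * d[i])) for i in range(3))
        inside = 0 <= x < nx and 0 <= y < ny and 0 <= z < nz
        if inside and volume[z][y][x] >= threshold:
            return (x, y, z)
        t += step
    return None

# 4x4x4 empty volume with a single dense voxel at x=1, y=1, z=2.
volume = [[[0.0] * 4 for _ in range(4)] for _ in range(4)]
volume[2][1][1] = 1.0
hit = first_hit(volume, origin=(1.5, 1.5, -1.0), direction=(0.0, 0.0, 1.0))
miss = first_hit(volume, origin=(-1.0, 3.5, 3.5), direction=(1.0, 0.0, 0.0))
```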
[0090] As explained above, the image can be obtained through volumetric data rendering using ray tracing. Therefore, for volumetric data rendered using ray tracing or similar methods, ray detection can also determine the gaze point position using a depth map. First, the depth map of the camera device in the virtual display device is acquired. Then, the depth map is sampled based on the gaze point's position on the depth map. Finally, the 3D position of the sampled point is calculated using the camera device's projection and rotation matrices, and this position is used as the gaze point position.
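The depth-map variant can be sketched as: sample the depth at the gaze pixel, back-project it into camera space, then transform it into scene coordinates. The pinhole intrinsics (`fx`, `fy`, `cx`, `cy`) and the camera-to-world `rotation` and `translation` are assumed stand-ins for the projection and rotation matrices the paragraph mentions; the actual matrix conventions depend on the rendering engine.

```python
def unproject(u, v, depth_map, fx, fy, cx, cy, rotation, translation):
    """Convert gaze pixel (u, v) plus sampled depth into a 3D scene position.

    depth_map[v][u] holds the depth along the camera z axis (assumed convention).
    rotation is a 3x3 camera-to-world matrix; translation is the camera position.
    """
    z = depth_map[v][u]
    cam = ((u - cx) * z / fx, (v - cy) * z / fy, z)  # camera-space point
    return tuple(
        sum(rotation[r][c] * cam[c] for c in range(3)) + translation[r]
        for r in range(3)
    )

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
depth_map = [[2.0] * 3 for _ in range(3)]
gaze_point = unproject(2, 1, depth_map, fx=1.0, fy=1.0, cx=1.0, cy=1.0,
                       rotation=identity, translation=(0.0, 0.0, 5.0))
```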
[0091] In this embodiment, the method for determining the location of the gaze point is not limited.
[0092] S102. Determine the region of interest based on the gaze point location.
[0093] In one embodiment, after determining the gaze point location, the area where the gaze point is located can be designated as the region of interest. For example, taking the display of images in the medical field as an example, the area corresponding to the tissue where the gaze point is located can be designated as the region of interest. For example, the overall image displayed in the virtual display device can be pre-divided into multiple areas corresponding to biological tissues. In this case, the virtual display device can use the area corresponding to the biological tissue where the gaze point is located as the region of interest.
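When the overall image has been pre-divided into tissue areas, finding the region of interest reduces to a label lookup at the gaze point. The sketch below assumes a hypothetical label volume indexed as `labels[z][y][x]`, with `0` meaning background; both conventions are illustrative only.

```python
def roi_at_gaze(labels, gaze_voxel):
    """Return (label, voxel set) for the pre-divided area containing the gaze point.

    labels[z][y][x] holds an integer tissue label; 0 is treated as background.
    """
    x, y, z = gaze_voxel
    target = labels[z][y][x]
    if target == 0:
        return None, set()
    voxels = {
        (xx, yy, zz)
        for zz, plane in enumerate(labels)
        for yy, row in enumerate(plane)
        for xx, lab in enumerate(row)
        if lab == target
    }
    return target, voxels

labels = [
    [[0, 1], [0, 1]],  # z = 0
    [[2, 2], [0, 1]],  # z = 1
]
label, roi = roi_at_gaze(labels, (1, 0, 0))
```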
[0094] It should be noted that since the images corresponding to the biological tissues mentioned above are rendered based on three-dimensional volume data, it can be assumed that the area corresponding to the biological tissues where the gaze point is located is also three-dimensional data, so that users can intuitively understand the structure of the biological tissues in the area of interest.
[0095] S103. Present the area of interest in the virtual display scene.
[0096] In one embodiment, when displaying an image of a region of interest, the image of the region of interest may be highlighted, or only the image of the region of interest may be displayed; there is no limitation on this.
[0097] For example, a virtual display device can highlight the edge outline of a region of interest in a virtual display scene. For instance, it can use a highlighting method or preset colors to display the edge outline of the image in the region of interest, making it easier for users to distinguish the image in the region of interest from the images of other regions.
[0098] For example, refer to Figure 2, which is a schematic diagram illustrating an application scenario of an image display method for a region of interest provided in an embodiment of this application. Taking the human foot bones as an example, the corresponding region of interest includes, but is not limited to, any region such as the tarsal bones, metatarsal bones, and toes.
[0099] In panel (a) of Figure 2, the highlighted edge outline corresponds to one of the toes, specifically the first metatarsal bone (i.e., toe 1).
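One way to find the edge outline to highlight, assuming the region of interest is available as a set of voxel coordinates, is to keep the voxels that have at least one face neighbor outside the region. The voxel-set representation and the 6-neighborhood are assumptions for this sketch; a mesh-based renderer would more likely use an outline shader.

```python
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def boundary_voxels(roi):
    """Return the ROI voxels that touch at least one voxel outside the ROI."""
    return {
        v for v in roi
        if any((v[0] + dx, v[1] + dy, v[2] + dz) not in roi
               for dx, dy, dz in NEIGHBORS)
    }

# A solid 3x3x3 cube: every voxel except the center lies on the outline.
cube = {(x, y, z) for x in range(3) for y in range(3) for z in range(3)}
outline = boundary_voxels(cube)
```

The outline voxels could then be drawn in a preset color while the interior keeps its normal appearance.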
[0100] In another embodiment, displaying only the region of interest can be achieved by masking the non-region of interest in the virtual display scene; or by segmenting the region of interest in the virtual display scene and displaying it, without limitation.
[0101] The methods for masking non-regions of interest in a virtual display scene include, but are not limited to, adjusting their transparency or color, blurring them, adding noise, adjusting lighting parameters, and creating occlusions that cover them.
[0102] For example, the transparency of the area of non-interest can be adjusted to semi-transparent or completely transparent, or the color of the area of non-interest can be adjusted to pure black or pure white to cover the image corresponding to the area of non-interest. Alternatively, the area of non-interest can be weakened by adjusting the lighting effects. For example, the light intensity of the area of non-interest can be reduced, or shadows can be used to hide part of the area. In a virtual display scene, the ambient light intensity around the area of interest can be reduced to darken the background, thereby covering the area of non-interest. Furthermore, parameters such as the attenuation radius and angle of the light can be adjusted to precisely control the lighting range, so that the area of non-interest is covered by shadows.
[0103] In this embodiment, the method of masking non-regions of interest is not limited.
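As one illustration of the transparency option, the sketch below scales the alpha channel of every voxel outside the region of interest. The per-voxel RGBA dictionary and the `alpha_scale` parameter are hypothetical; setting `alpha_scale=0.0` corresponds to making non-regions of interest completely transparent.

```python
def mask_non_roi(voxel_rgba, roi, alpha_scale=0.2):
    """Return a copy of the voxel colors with non-ROI voxels made more transparent.

    voxel_rgba maps (x, y, z) to an (r, g, b, a) tuple with a in [0, 1].
    """
    masked = {}
    for pos, (r, g, b, a) in voxel_rgba.items():
        if pos in roi:
            masked[pos] = (r, g, b, a)                # ROI keeps full opacity
        else:
            masked[pos] = (r, g, b, a * alpha_scale)  # fade everything else
    return masked

colors = {(0, 0, 0): (1.0, 0.0, 0.0, 1.0), (1, 0, 0): (0.0, 1.0, 0.0, 1.0)}
masked = mask_non_roi(colors, roi={(0, 0, 0)}, alpha_scale=0.5)
```

Note that the faded voxels are still processed by the renderer, which is the resource cost discussed in the next paragraph.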
[0104] It should be noted that while highlighting the edges of the region of interest and masking the non-region of interest can enhance the user's attention to that region, the virtual display device still requires significant resources, such as graphics processing units and memory, to process the images of the non-regions of interest when displaying the images of the region of interest.
[0105] Therefore, in order to reduce the required graphics processing units and memory resources, virtual display devices can also segment and display regions of interest. For example, this can be achieved by segmenting the image corresponding to the region of interest from the image corresponding to the region of non-interest, and then deleting the image corresponding to the region of non-interest.
[0106] For example, the virtual display device can pre-store the images corresponding to each region after the user has manually segmented the overall image. In this case, within the virtual display scene, the virtual display device can display only the image corresponding to the region of interest. Consequently, it does not need to render the images corresponding to non-regions of interest, and so does not consume resources such as graphics processing units and memory on them.
[0107] In another embodiment, in order to improve the user's viewing experience, when segmenting the region of interest, the virtual display scene can be divided into two sub-scenes: one sub-scene for displaying the image of the whole object, and the other sub-scene for displaying the segmented image of interest.
[0108] Therefore, displaying images by dividing them into sub-scenes can more effectively highlight the segmented regions of interest and compare them with the overall image. This allows users to easily switch their focus between the two sub-scenes and observe the style and details of individual regions of interest and the overall image, thus improving the user's viewing experience.
[0109] For example, in a virtual display scene, sub-scenes can be divided by setting different rendering layers or rendering channels. For instance, the image of the entire object can be rendered in one sub-scene, and the segmented regions of interest can be displayed through separate rendering channels.
[0110] In segmenting regions of interest (ROIs), users typically need to manually segment and label the entire image beforehand, allowing the virtual display device to segment and display the ROI. However, manually segmenting and labeling the entire image requires considerable time and effort. Therefore, to quickly segment and display the ROI from the entire image, the virtual display device can input the gaze point position and the image into a preset segmentation model to segment the ROI.
[0111] The segmentation model can be a neural network model obtained by training on data such as training regions of interest and training images. This neural network model can be any model capable of image processing.
[0112] For example, the model structure of the above segmentation model can be a Convolutional Neural Network (CNN) or a Recurrent Neural Network (RNN), and there is no limitation on this.
[0113] As an example, the virtual display device can train the segmentation model according to steps S301-S305 shown in Figure 3, or according to Figure 4, as detailed below:
[0114] S301. Obtain training data.
[0115] In one embodiment, the training data described above may include the training gaze point location, training image, and training region of interest shown in Figure 3. Taking medical images as an example, the training image can be an image obtained after imaging and scanning an existing patient. The training gaze point location and training region of interest can be obtained by the user by annotating the training image.
[0116] It should be noted that the training data required during the training process needs to be labeled by the user. However, after the segmentation model is trained, all image segmentation processing can be performed by this model. This reduces the manual and time costs of image segmentation.
[0117] S302. Input the training gaze point location and training image into the initial segmentation model for processing to obtain the predicted region of interest.
[0118] In one embodiment, the segmentation model has already been described above and will not be explained further. For example, the initial segmentation model may use a preset convolutional module or a 3D image encoding module to process the image to obtain the corresponding image vector. Furthermore, a preset position encoder is used to encode the training gaze point position to obtain a position encoding vector. Then, based on the position encoding vector and the image vector, model prediction is performed to obtain the predicted region of interest.
[0119] As an example, referring to Figure 4, the initial segmentation model can use a 3D image encoder to encode the training medical images, obtaining initial image vectors. Then, to reduce the dimensionality of the initial image vectors and make them usable for subsequent machine learning tasks (e.g., classification, clustering), the initial segmentation model can also perform image embedding processing on the initial image vectors, obtaining the aforementioned image vectors. Furthermore, a 3D position encoder is used to encode the training gaze point positions, obtaining position encoding vectors. Finally, the initial segmentation model can use a 3D data decoder to decode the image vectors and position encoding vectors, outputting the predicted region of interest.
[0120] Image embedding is a method for transforming image data (e.g., an initial image vector) into a continuous, low-dimensional vector representation, which will not be described in detail.
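The source does not specify how the preset position encoder works. As a minimal sketch only, the following assumes a sinusoidal (Fourier-feature) encoding of a gaze point whose coordinates have been normalized to [0, 1]; the function name `encode_gaze_point` and the parameter `num_frequencies` are hypothetical.

```python
import math


def encode_gaze_point(point, num_frequencies=4):
    """Encode a normalized 3D gaze point (x, y, z) as a list of sin/cos features.

    Assumption: a sinusoidal encoding stands in for the unspecified
    "preset position encoder" of the source.
    """
    encoded = []
    for coord in point:
        for k in range(num_frequencies):
            angle = (2 ** k) * math.pi * coord
            encoded.append(math.sin(angle))
            encoded.append(math.cos(angle))
    return encoded


# 3 coordinates * 4 frequencies * 2 functions = 24 features
position_vector = encode_gaze_point((0.5, 0.25, 0.0))
print(len(position_vector))
```

The resulting vector would then be combined with the image vector by the decoder; how they are combined is not described in the source and is left out of this sketch.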
[0121] S303. Generate training loss based on the predicted region of interest and the training region of interest.
[0122] In one embodiment, the training loss can be computed from the similarity or degree of overlap between the predicted region of interest and the training region of interest.
[0123] In another embodiment, a loss function between image data, such as mean squared error (MSE), cross-entropy loss, or another loss function, can be used to calculate the aforementioned training loss value, thereby accurately measuring the difference between the predicted region of interest and the training region of interest. In this embodiment, the method for calculating the training loss value is not limited.
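As an illustration of the two kinds of loss mentioned above, the sketch below computes an overlap-based loss and an MSE loss on flattened masks. The Dice coefficient is used here as one possible overlap measure; the source does not name a specific one, and the function names are hypothetical.

```python
def dice_loss(pred, target, eps=1e-6):
    """Return 1 - Dice coefficient for two flattened masks with values in [0, 1].

    Assumption: Dice is chosen as the overlap measure; the source only
    says "similarity or overlap".
    """
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


def mse_loss(pred, target):
    """Mean squared error between two flattened masks of equal length."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


predicted_roi = [1, 1, 0, 0]   # predicted region of interest (flattened)
training_roi = [1, 0, 0, 0]    # annotated training region of interest
print(dice_loss(predicted_roi, training_roi))  # about 0.333
print(mse_loss(predicted_roi, training_roi))   # 0.25
```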
[0124] S304. If the training loss does not converge, update the model parameters of the initial segmentation model and return to the steps of obtaining training data and subsequent steps until the training loss converges.
[0125] S305. If the training loss converges, then the current initial segmentation model is taken as the trained segmentation model.
[0126] In one embodiment, training loss convergence can be determined when the number of training iterations reaches a preset number, or the training loss is less than a preset loss, or the training loss is less than the preset loss for a consecutive preset number of iterations. Otherwise, training loss non-convergence is determined when the number of training iterations is less than a preset number, or the training loss is greater than or equal to the preset loss, or there is no consecutive preset number of training loss values less than the preset loss.
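The convergence criteria above can be sketched as a single check. This is a minimal illustration under assumptions: `max_iterations`, `loss_threshold`, and `patience` are hypothetical names for the preset number of iterations, the preset loss, and the consecutive preset number of iterations, respectively.

```python
def has_converged(loss_history, max_iterations, loss_threshold, patience=1):
    """Decide convergence from the losses recorded so far (one per iteration).

    patience=1 corresponds to "the training loss is less than a preset loss";
    patience>1 corresponds to the loss staying below the preset loss for that
    many consecutive iterations.
    """
    if len(loss_history) >= max_iterations:
        return True  # the preset number of iterations has been reached
    if len(loss_history) < patience:
        return False  # not enough iterations to judge yet
    return all(loss < loss_threshold for loss in loss_history[-patience:])


print(has_converged([0.9, 0.4, 0.05], max_iterations=100, loss_threshold=0.1))
print(has_converged([0.9, 0.05, 0.04], max_iterations=100, loss_threshold=0.1, patience=3))
```

The first call reports convergence because the latest loss is below the threshold; the second does not, because only two of the last three losses are below it.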
[0127] When updating the model parameters of the initial segmentation model, methods such as gradient descent and adaptive learning rates can be used, and this is not limited thereto.
[0128] It should be noted that if the training loss does not converge, the training data needs to be obtained again and the above steps S302-S305 need to be executed until the training loss converges and the segmentation model is obtained.
[0129] For example, referring to Figure 4, the initial segmentation model can calculate the training loss based on the predicted region of interest and the training region of interest. If the training loss does not converge, the model parameters of the initial segmentation model are iteratively updated, and model training continues. Finally, when the training loss converges, the aforementioned segmentation model is obtained.
[0130] In one embodiment, the training data described above are all three-dimensional data. However, the region of interest for training can be a tissue region composed of preset pixel values. The preset pixel values can be set according to actual conditions and are not limited thereto.
[0131] For example, in training medical images, the training region of interest (ROI) can be a tissue region composed of pixel values of 1, and the region outside the training ROI can be composed of pixel values of 0, for differentiation. Based on this, when the initial segmentation model learns the training ROI, it does not need to consider the individual pixel values within the training ROI, which reduces the learning difficulty for the initial segmentation model and improves the accuracy of the ROIs it subsequently predicts. Furthermore, the predicted ROIs output during training can also all be represented by 1.
[0132] It should be added that, based on the segmentation model obtained from the above training data, when performing step S103, the pixel values in the first 3D data coordinates corresponding to the image of the region of interest can also consist only of 1s. In subsequent steps, the virtual display device can display the image of the region of interest in the virtual display scene according to the acquired first 3D data coordinates and the second 3D data coordinates corresponding to the image. For example, displaying the pixel values corresponding to the 3D data coordinates that are the same as the first 3D data coordinates in the second 3D data coordinates can achieve the display of the image of the region of interest.
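The coordinate matching described above can be sketched as follows. This is an illustration under assumptions: the image is represented as a dictionary from 3D data coordinates to pixel values (the "second 3D data coordinates"), the segmentation output as a dictionary from coordinates to 0 or 1 (the "first 3D data coordinates" are those marked 1), and the name `extract_roi_voxels` is hypothetical.

```python
def extract_roi_voxels(image_voxels, roi_mask):
    """Keep only the image voxels whose coordinates are marked 1 in the ROI mask.

    image_voxels: dict mapping (x, y, z) -> original pixel value
    roi_mask:     dict mapping (x, y, z) -> 1 inside the ROI, 0 outside
    """
    return {
        coord: value
        for coord, value in image_voxels.items()
        if roi_mask.get(coord, 0) == 1
    }


image = {(0, 0, 0): 120, (0, 0, 1): 80, (0, 1, 0): 200}
mask = {(0, 0, 0): 1, (0, 0, 1): 0}  # (0, 1, 0) is absent, so treated as outside
roi_image = extract_roi_voxels(image, mask)
print(roi_image)  # {(0, 0, 0): 120}
```

Because the mask carries only 0s and 1s, the original pixel values are always read back from the image itself rather than from the segmentation output.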
[0133] In another embodiment, during the training of the segmentation model based on steps S301-S305 above, the predicted region of interest can also be input into a preset multi-classifier to output the predicted region name. Then, the classification loss is calculated based on the predicted region name and the training region name corresponding to the trained region of interest to update the model parameters of the multi-classifier, thereby achieving the training of the multi-classifier.
[0134] When training the multi-classifier, the training data must also include the training region name corresponding to the training region of interest.
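The source does not state which classification loss is used for the multi-classifier. As a sketch, the following assumes a softmax cross-entropy loss over a small set of region names; the names in `REGION_NAMES` and the logits are made-up illustration values.

```python
import math

# Hypothetical label set; the source gives "Toe" only as an example name.
REGION_NAMES = ["toe", "heel", "ankle"]


def softmax(logits):
    """Numerically stable softmax over a list of raw classifier scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classification_loss(logits, true_index):
    """Cross-entropy between the predicted distribution and the true region name."""
    return -math.log(softmax(logits)[true_index])


logits = [2.0, 0.5, -1.0]                    # classifier output for one predicted ROI
true_index = REGION_NAMES.index("toe")       # training region name
print(classification_loss(logits, true_index))
```

A lower value means the classifier assigns more probability to the annotated region name; this value would drive the parameter update of the multi-classifier.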
[0135] It should be noted that the region of interest included in the training can be one or more regions; there is no limitation on this.
[0136] In another embodiment, the predicted region of interest can be input into a neural network model for image classification and recognition to obtain the region name. The model structure of the neural network model for image classification and recognition can be similar to that of the segmentation model, and is not limited thereto.
[0137] Therefore, in practical applications, virtual display devices can not only display the image of the region of interest corresponding to the gaze point position based on the segmentation model, but also label the names of the regions of interest. This further reduces the annotation costs for users in practical applications.
[0138] In this embodiment, by determining the gaze point position of the user's line of sight in the virtual display scene, the region of interest (ROI) corresponding to the gaze point position can be determined. Therefore, only the ROI image can be displayed in the virtual display scene. Based on this, it eliminates the need for manually segmenting the entire 3D image, improving the convenience of displaying the ROI image. Furthermore, the effect of displaying the 3D image of the ROI in a 3D virtual display scene is generally superior to displaying the 3D image of the ROI on a 2D display screen. Also, since only the image corresponding to the ROI is displayed, the virtual display scene does not require a large amount of graphics processing unit and memory resources.
[0139] In another embodiment, after displaying the image of the region of interest, the virtual display device may also display corresponding annotation information in the virtual display scene to facilitate users or others to understand the information in the image of the region of interest.
[0140] For example, taking medical imaging as an example, the above annotation information includes, but is not limited to, the names, labels, and remarks of each sub-tissue covered by the region of interest, etc. The remarks can be used to describe one or more types of information such as the physiological condition of the tissue, etiology, and treatment plan, etc., and are not limited thereto.
[0141] It should be noted that, as explained above, tissue names can be obtained through processing by a multi-classifier or a neural network model for image classification and recognition. Therefore, the above annotation information need not include the names of the tissue or its sub-tissues.
[0142] For example, referring to panel (b) of Figure 2, the annotation information shown is: Name: Toe; Label: Toe 1; Remarks.
[0143] In another embodiment, the above-mentioned annotation information may be only the annotation information of annotation points existing in the region of interest, or it may be the annotation information of annotation points located in a preset area of the gaze point position, and there is no limitation thereto.
[0144] For example, a virtual display device can acquire a preset area centered on a gaze point and display annotation information for annotation points located within that preset area. It is understood that, since the gaze point is the location of the user's line of sight, the annotation information described by the annotation points located within the preset area is generally considered to be information of greater interest to the user. This, in turn, can improve the user's viewing experience.
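Selecting the annotation points inside the preset area can be sketched as below. The source does not fix the shape of the preset area; this illustration assumes a sphere of radius `radius` around the gaze point, and the data layout (a list of dictionaries with `label` and `position`) is hypothetical.

```python
import math


def annotations_in_preset_area(gaze_point, annotation_points, radius):
    """Return annotation points whose distance to the gaze point is within radius.

    Assumption: the preset area is a sphere centered on the gaze point.
    """
    return [
        point
        for point in annotation_points
        if math.dist(point["position"], gaze_point) <= radius
    ]


annotations = [
    {"label": "Toe 1", "position": (1.0, 0.0, 0.0)},
    {"label": "Toe 2", "position": (0.0, 3.0, 0.0)},
    {"label": "Heel", "position": (10.0, 0.0, 0.0)},
]
visible = annotations_in_preset_area((0.0, 0.0, 0.0), annotations, radius=5.0)
print([point["label"] for point in visible])  # ['Toe 1', 'Toe 2']
```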
[0145] The preset area can be set according to the actual situation and is not limited in this regard. It should be noted that users can annotate the image in the virtual display scene in advance, that is, add annotation points and corresponding annotation information. Alternatively, after displaying the image of the region of interest, the virtual display device can add annotation points or annotation information in the region of interest based on the user's annotation operation, and this is not limited in this regard.
[0146] It should be noted that while all annotations can be displayed in the same way—for example, the font, size, and format can be identical—the importance or user interest of different annotations may vary. In such cases, displaying each annotation in the same way will fail to differentiate between importance and level of interest.
[0147] Therefore, in order to distinguish the importance or level of interest of different annotation information, when displaying the annotation information corresponding to each annotation point, the text display method of the annotation information of the annotation point can be determined based on the distance between each annotation point and the gaze point, so as to display the annotation information based on the text display method of the annotation information.
[0148] For example, annotations closer to the gaze point are generally more likely to attract user attention. That is, they are more important or more interesting. Based on this, the virtual display device can sort the annotation points from closest to furthest from the gaze point to obtain a sorting result; then, based on the sorting result, adaptively adjust the text display method of the annotation information at each annotation point.
[0149] For example, adaptively adjusting the text display method of the annotation information can give annotation points that are closer to the gaze point a stronger display effect than those that are farther away. In other words, the closer the annotation point is to the gaze point, the stronger the display effect of its annotation information.
[0150] The display effects include, but are not limited to, font, size, and formatting effects. For example, multiple display effects can be set for different text display methods. Then, multiple preset distance ranges can be set, each corresponding to a different text display method.
[0151] For example, taking a set of three annotation points as an example, the text displayed for the closest annotation point in the sorting result could be in SimSun font, size 3, bold and highlighted in red. The text displayed for the second closest annotation point in the sorting result could be in SimSun font, size 12, bold and not highlighted in red. And the text displayed for the farthest annotation point in the sorting result could be in SimSun font, size 4, not bold and not highlighted in red.
[0152] As an example, referring to panel (a) of Figure 2, the distance between the annotation point corresponding to toe 1 and the gaze point is the shortest; therefore, the font size of toe 1 is the largest. That is, its annotation is the most prominent (the display effect is the strongest).
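The distance-based sorting and style assignment described above can be sketched as follows. The style table `STYLES` is a hypothetical stand-in ordered from strongest to weakest display effect; it deliberately avoids concrete font sizes, which the source leaves to the implementer.

```python
import math

# Hypothetical style table, strongest display effect first.
STYLES = [
    {"size": "large", "bold": True, "highlight": True},
    {"size": "medium", "bold": True, "highlight": False},
    {"size": "small", "bold": False, "highlight": False},
]


def assign_text_styles(gaze_point, annotation_points, styles=STYLES):
    """Sort annotation points from closest to farthest and pair each with a style.

    Points ranked beyond the end of the style table reuse the weakest style.
    """
    ranked = sorted(
        annotation_points,
        key=lambda point: math.dist(point["position"], gaze_point),
    )
    return [
        (point["label"], styles[min(rank, len(styles) - 1)])
        for rank, point in enumerate(ranked)
    ]


points = [
    {"label": "Toe 3", "position": (4.0, 0.0, 0.0)},
    {"label": "Toe 1", "position": (1.0, 0.0, 0.0)},
    {"label": "Toe 2", "position": (2.0, 0.0, 0.0)},
]
styled = assign_text_styles((0.0, 0.0, 0.0), points)
for label, style in styled:
    print(label, style)
```

A variant keyed on preset distance ranges rather than rank would replace the `enumerate` step with a lookup of the range that contains each distance.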
[0153] It should be added that when displaying the above annotation information, the annotation information can be displayed in a region of non-interest within the virtual scene. Therefore, when users are viewing the image in the region of interest, they will not be affected by the annotation information, further improving the user's viewing experience.
[0154] In another embodiment, the user can also set the display method of the annotation information when adding annotation points and corresponding annotation information. Then, the virtual display device can display it based on the preset display method.
[0155] Users can add annotation points and corresponding annotation information to the image using a virtual input device (e.g., a virtual keyboard) provided by the virtual display device, or a physical input device (e.g., a physical keyboard, or a device with an input device) connected to the virtual display device, without any limitation.
[0156] It should be added that, as explained above, the virtual display device determines the gaze point position based on the aforementioned eye-tracking information. However, when implementing the method for displaying the region of interest, users may not maintain a single posture for an extended period, resulting in fluctuations in the user's gaze point position. When the gaze point position changes, the region of interest also changes, consequently altering the annotation information displayed in the virtual scene and affecting the user's viewing experience.
[0157] Therefore, in order to display the image of the region of interest and the corresponding annotation information in a fixed position, refer to Figure 5. Figure 5 is a flowchart illustrating an implementation of a method for displaying a region of interest according to another embodiment of this application. The method includes the following steps:
[0158] S501. In a virtual display scene, determine the position of the user's gaze point.
[0159] S502. Determine the region of interest based on the gaze point location.
[0160] S503. Present the region of interest in the virtual display scene.
[0161] S504. If a confirmation operation by the user for the presented region of interest is received, obtain the preset area centered on the gaze point and display the annotation information of the annotation points located in the preset area.
[0162] S505. Otherwise, repeat the above steps until the user's confirmation operation for the presented region of interest is received.
[0163] In one embodiment, steps S501-S503 are the same as steps S101-S103. For the method of obtaining a preset area centered on the gaze point and displaying the annotation information of the annotation points located in the preset area, refer to the description of the above examples, which will not be repeated.
[0164] It should be noted that, in the absence of a confirmation operation, the virtual display device may display only the region of interest (ROI) for the user to view. In this case, to avoid the displayed annotation information interfering with the user's viewing experience, the annotation information of the annotation points in the preset area may be withheld, and steps S501 to S505 are repeated. Conversely, upon receiving a confirmation operation, it can be assumed that the user wants to view the ROI in detail; therefore, the annotation information of the annotation points in the preset area may be displayed to help the user understand the information corresponding to the image in the ROI.
[0165] Based on the above steps, the virtual display device can present only the corresponding region of interest (ROI) when the gaze point position changes, without simultaneously displaying the annotation information of the annotation points within the ROI. This avoids changes in the displayed annotation information caused by frequent changes of the ROI within the virtual display scene, thus improving the user's viewing experience.
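The control flow of steps S501-S505 can be sketched as a loop over injected callbacks. All callback names are hypothetical, and `max_rounds` is an added safety bound that the source does not mention (the source repeats until confirmation is received).

```python
def display_until_confirmed(get_gaze_point, segment_roi, present_roi,
                            is_confirmed, show_annotations, max_rounds=100):
    """Repeat S501-S503 until the user confirms, then show annotations (S504)."""
    for _ in range(max_rounds):
        gaze_point = get_gaze_point()      # S501: determine the gaze point position
        roi = segment_roi(gaze_point)      # S502: determine the region of interest
        present_roi(roi)                   # S503: present the region of interest
        if is_confirmed():                 # S504: confirmation received
            show_annotations(gaze_point)
            return gaze_point, roi
        # S505: no confirmation yet, so the steps are repeated
    return None


# Stub callbacks standing in for the eye tracker, segmentation model, and renderer.
gaze_samples = iter([(1, 1, 1), (2, 2, 2), (3, 3, 3)])
confirmations = iter([False, False, True])
presented, annotated = [], []

result = display_until_confirmed(
    get_gaze_point=lambda: next(gaze_samples),
    segment_roi=lambda gaze: {"center": gaze},
    present_roi=presented.append,
    is_confirmed=lambda: next(confirmations),
    show_annotations=annotated.append,
)
print(result)
```

In this run the ROI is presented three times, but annotation information is shown only once, for the gaze point that was current when the user confirmed.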
[0166] The aforementioned confirmation operation includes, but is not limited to, the user's blinking operation, or the click operation for confirmation on the virtual input device or the aforementioned physical input device, and there are no limitations on these.
[0167] For example, when the virtual display device detects that the user has blinked, it determines that the user has performed a confirmation operation; otherwise, if no blinking operation is detected, it determines that the user has not performed a confirmation operation. This is not limited.
[0168] Since blinking is usually an unconscious and normal eye movement, relying on a single blink to confirm a user's action could lead to false positives. Therefore, to improve the accuracy of confirmation recognition, a blink count can be set. When the blink count reaches a first preset number, the user is confirmed to have performed a confirmation action. This first preset number of blinks can be set according to specific circumstances and is not limited to any particular number.
[0169] In one embodiment, upon receiving a cancellation operation from the user, the virtual display device may execute steps S101-S103 or S501-S505 again to re-display the region of interest and the corresponding annotation information. The cancellation operation may be similar to the confirmation operation described above, for example, reaching a second preset number of blinks, or a cancellation click operation performed on the virtual input device or the aforementioned physical input device; there is no limitation on this.
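Distinguishing confirmation from cancellation by blink count can be sketched as below. The source specifies only a first and a second preset number of blinks; counting blinks inside a sliding time window (`window_seconds`) and requiring an exact match are assumptions made for this illustration, as are the default values.

```python
def classify_blink_gesture(blink_times, now, window_seconds=1.0,
                           confirm_count=2, cancel_count=3):
    """Map the number of recent blinks to 'confirm', 'cancel', or 'none'.

    blink_times: timestamps (seconds) of detected blinks
    now:         current timestamp (seconds)
    Assumption: only blinks within the last window_seconds are counted, so
    occasional unconscious blinks do not accumulate into a gesture.
    """
    recent = sum(1 for t in blink_times if 0.0 <= now - t <= window_seconds)
    if recent == confirm_count:
        return "confirm"
    if recent == cancel_count:
        return "cancel"
    return "none"


print(classify_blink_gesture([0.1, 0.5], now=1.0))        # confirm
print(classify_blink_gesture([0.1, 0.4, 0.8], now=1.0))   # cancel
print(classify_blink_gesture([0.9], now=1.0))             # none
```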
[0170] As an example, refer to Figure 6. Figure 6 is a flowchart illustrating the implementation of a method for displaying a region of interest (ROI) according to another embodiment of this application. In this embodiment, a VR device is used as an example for explanation. The VR device tracks the user's eyes and performs eye-tracking correction to obtain eye-tracking information, thereby generating the aforementioned gaze position. Then, based on the mapping relationship between a preset screen coordinate system and the three-dimensional coordinate system of the virtual display scene, the gaze position is transformed to obtain the gaze point position. Next, the gaze point position and the image are input into a preset segmentation model to obtain the ROI. The virtual display device can also input the image of the ROI into a neural network model for image classification and recognition to obtain the corresponding region name. Finally, when the image of the ROI is displayed, the region name can be displayed synchronously as part of the annotation information.
[0171] In another embodiment, refer to Figure 7. Figure 7 is a structural block diagram of an image display system provided in an embodiment of this application. The image display system 1 includes a virtual display device 11 and a processing device 12 that interacts with the virtual display device 11. The virtual display device 11 is used at least to display images (e.g., images of biological tissues in the medical field) and images of regions of interest.
[0172] In one embodiment, the processing device 12 can be a physical terminal device, such as a tablet computer or a laptop computer. It can also be a virtual terminal device, such as a cloud host; there is no limitation on this. The processing device 12 is used to interact with the virtual display device 11 to display an area of interest on the virtual display device.
[0173] In one embodiment, the virtual display device may include a gaze point position capture module 111, a scene display module 112, a medical image rendering module 113, a region of interest display module 114, and an annotation information display module 115.
[0174] The gaze point position capture module 111 captures the user's eye movement information to determine the gaze point position. The scene display module 112 stores virtual scenes such as a clinic, conference room, and classroom, and displays the scene selected by the user. The medical image rendering module 113 renders images for display in the virtual scene. The region of interest display module 114 displays images of regions of interest within the virtual scene. The annotation information display module 115 displays annotation information corresponding to annotation points.
[0175] Each module is used to execute the steps in the embodiments corresponding to Figure 1, Figure 5, and Figure 6. For details, refer to Figure 1, Figure 5, and Figure 6 and the relevant descriptions in the corresponding embodiments.
[0176] In another embodiment, taking the medical field as an example, the aforementioned image display system 1 can be a GazeSam system with image segmentation capabilities. GazeSam is a screen-based eye-tracking system capable of processing images commonly used in the medical field. As a collaborative human-computer interaction system, it combines eye-tracking technology with image segmentation capabilities and, by performing the steps in the embodiments corresponding to Figure 1, Figure 5, and Figure 6, can achieve automatic segmentation and 3D display of medical images. This eliminates the need for extensive time and effort in manually annotating and segmenting images, thus improving work efficiency.
[0177] Refer to Figure 8. Figure 8 is a structural block diagram of a region of interest display device provided in an embodiment of this application. The modules included in the region of interest display device in this embodiment are used to perform the steps in the embodiment corresponding to Figure 1. For details, refer to Figure 1 and the relevant descriptions in the embodiment corresponding to Figure 1. For ease of explanation, only the parts relevant to this embodiment are shown. Referring to Figure 8, the display device 800 for the region of interest may include: a first determining module 810, a second determining module 820, and a first display module 830, wherein:
[0178] The first determining module 810 is used to determine the position of the user's gaze point in a virtual display scene.
[0179] The second determining module 820 is used to determine the region of interest based on the gaze point position.
[0180] The first display module 830 is used to present the area of interest in the virtual display scene.
[0181] In one embodiment, the first display module 830 is used for:
[0182] In a virtual display scene, highlight the edge outline of the region of interest.
[0183] In one embodiment, the first display module 830 is used for:
[0184] In a virtual display scene, areas of non-interest are masked.
[0185] In one embodiment, the first display module 830 is used for:
[0186] Segment the region of interest and display it.
[0187] In one embodiment, the display device 800 for the region of interest further includes:
[0188] The fourth display module is used to acquire a preset area centered on the gaze point.
[0189] The fifth display module is used to display annotation information for annotation points located in the preset area.
[0190] In one embodiment, the fourth display module is used for:
[0191] The text display method of the annotation information of each annotation point is determined based on the distance between each annotation point and the gaze point; the annotation information is displayed based on the text display method of the annotation information.
[0192] In one embodiment, the fourth display module is used for:
[0193] The distances between each annotation point and the gaze point are sorted from closest to farthest to obtain a sorting result; based on the sorting result, the text display method of the annotation information of the annotation points is adaptively adjusted.
[0194] It should be understood that, in the structural block diagram of the display device for the region of interest shown in Figure 8, each module is used to perform the steps in the embodiment corresponding to Figure 1. These steps have been explained in detail in the above embodiments; for details, refer to Figure 1 and the relevant descriptions in the embodiment corresponding to Figure 1, which will not be repeated here.
[0195] Refer to Figure 9. Figure 9 is a structural block diagram of a region of interest display device according to another embodiment of this application. The modules included in the region of interest display device in this embodiment are used to perform the steps in the embodiment corresponding to Figure 5. For details, refer to Figure 5 and the relevant descriptions in the embodiment corresponding to Figure 5. For ease of explanation, only the parts relevant to this embodiment are shown. Referring to Figure 9, the display device 900 for the region of interest may include: a third determining module 910, a fourth determining module 920, a second display module 930, a third display module 940, and an execution module 950, wherein:
[0196] The third determining module 910 is used to determine the position of the user's gaze point in the virtual display scene.
[0197] The fourth determination module 920 is used to determine the region of interest based on the gaze point location.
[0198] The second display module 930 is used to present the area of interest in the virtual display scene.
[0199] The third display module 940 is used to, upon receiving a confirmation operation from the user for the presented region of interest, obtain a preset area centered on the gaze point and display the annotation information of the annotation points located in the preset area.
[0200] The execution module 950 is used to otherwise repeat the above steps until the user's confirmation operation for the presented region of interest is received.
[0201] It should be understood that, in the structural block diagram of the display device for the region of interest shown in Figure 9, each module is used to perform the steps in the embodiment corresponding to Figure 5. These steps have been explained in detail in the above embodiments; for details, refer to Figure 5 and the relevant descriptions in the embodiment corresponding to Figure 5, which will not be repeated here.
[0202] Figure 10 is a structural block diagram of a virtual display device provided in another embodiment of this application. As shown in Figure 10, the virtual display device 1000 of this embodiment includes: a processor 1010, a memory 1020, and a computer program 1030 stored in the memory 1020 and executable on the processor 1010, such as a program for displaying a region of interest. When the processor 1010 executes the computer program 1030, it implements the steps of each embodiment of the above-described region of interest display method, for example S101 to S103 shown in Figure 1 and/or S501 to S505 shown in Figure 5. Alternatively, when executing the computer program 1030, the processor 1010 implements the functions of each module in the embodiments corresponding to Figure 8 and Figure 9. For details on the functions of each module, refer to the relevant descriptions in the embodiments corresponding to Figure 8 and Figure 9.
[0203] For example, the computer program 1030 can be divided into one or more modules, one or more of which are stored in the memory 1020 and executed by the processor 1010 to implement the region of interest display method provided in this application embodiment. One or more modules can be a series of computer program instruction segments capable of performing specific functions, which describe the execution process of the computer program 1030 in the virtual display device 1000. For example, the computer program 1030 can implement the region of interest display method provided in this application embodiment.
[0204] The virtual display device 1000 may include, but is not limited to, a processor 1010 and a memory 1020. Those skilled in the art will understand that Figure 10 is merely an example of a virtual display device 1000 and does not constitute a limitation on the virtual display device 1000. It may include more or fewer components than shown, combine certain components, or use different components. For example, a virtual display device may also include input/output devices, network access devices, buses, etc.
[0205] The processor 1010 may be a central processing unit, or it may be another general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor or any conventional processor, etc.
[0206] The memory 1020 can be an internal storage unit of the virtual display device 1000, such as a hard disk or RAM of the virtual display device 1000. The memory 1020 can also be an external storage device of the virtual display device 1000, such as a plug-in hard disk, smart memory card, flash memory card, etc., equipped on the virtual display device 1000. Furthermore, the memory 1020 can include both internal storage units and external storage devices of the virtual display device 1000.
[0207] This application provides a computer-readable storage medium, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the method for displaying the region of interest as described in the above embodiments.
[0208] This application provides a computer program product that, when run on a virtual display device, causes the virtual display device to execute the display method of the region of interest in the above embodiments.
[0209] The above embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this application, and should all be included within the protection scope of this application.
Claims
1. A method for displaying a region of interest, characterized in that the method includes: determining, in a virtual display scene, the location of the user's gaze point; determining the region of interest based on the location of the gaze point; and presenting the region of interest in the virtual display scene.
2. The method according to claim 1, characterized in that presenting the region of interest in the virtual display scene includes: highlighting, in the virtual display scene, the edge outline of the region of interest.
3. The method according to claim 1, characterized in that presenting the region of interest in the virtual display scene includes: masking, in the virtual display scene, areas of non-interest.
4. The method according to claim 1, characterized in that presenting the region of interest in the virtual display scene includes: segmenting the region of interest and displaying it.
5. The method according to any one of claims 1-4, characterized in that the method further includes: obtaining a preset area centered on the gaze point; and displaying annotation information for annotation points located in the preset area.
6. The method according to claim 5, characterized in that displaying annotation information for annotation points located in the preset area includes: determining the text display method of the annotation information of each annotation point based on the distance between each annotation point and the gaze point; and displaying the annotation information based on the text display method of the annotation information.
7. The method according to claim 6, characterized in that determining the text display method of the annotation information of each annotation point based on the distance between each annotation point and the gaze point includes: sorting the distances between each annotation point and the gaze point from closest to farthest to obtain a sorting result; and adaptively adjusting, based on the sorting result, the text display method of the annotation information of the annotation points.
8. A method for displaying a region of interest, characterized in that the method includes: determining, in a virtual display scene, the location of the user's gaze point; determining the region of interest based on the location of the gaze point; presenting the region of interest in the virtual display scene; if a confirmation operation by the user for the presented region of interest is received, obtaining a preset area centered on the gaze point and displaying annotation information of the annotation points located in the preset area; and otherwise, repeating the above steps until the user's confirmation operation for the presented region of interest is received.
9. A display device for a region of interest, characterized in that the device includes: a first determining module, used to determine the position of the user's gaze point in the virtual display scene; a second determining module, used to determine the region of interest based on the position of the gaze point; and a first display module, used to present the region of interest in the virtual display scene.
10. A virtual display device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the method as described in any one of claims 1 to 8.