Display control device, method, program, and storage medium

The display control device addresses the challenge of intuitive prompt representation and efficient image display by generating and switching between low- and high-resolution images using multimodal AI, enhancing user convenience and responsiveness.

WO2026121147A1 · PCT designated stage · Publication Date: 2026-06-11 · CANON KK

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: CANON KK
Filing Date: 2025-11-28
Publication Date: 2026-06-11

AI Technical Summary

Technical Problem

Existing image generation technologies using AI struggle with intuitive representation of prompts and efficient display of high-resolution images, leading to reduced user convenience due to time-consuming resolution increases and insufficient thumbnail resolution.

Method used

A display control device that acquires prompts and associated low-resolution images, generates high-resolution images using multimodal AI, and switches displays between thumbnail and high-resolution images to improve response while maintaining image quality.

Benefits of technology

Enhances display responsiveness and maintains image resolution by seamlessly transitioning between low-resolution thumbnails and high-resolution images, improving user convenience.



Abstract

The present disclosure provides a display control device capable of improving a display response while suppressing any reduction in the resolution of a display image. This display control device comprises: an acquisition unit that acquires a prompt, which is text information describing the content of a first image, and a second image that is associated with the prompt, is based on the first image, and has a lower resolution than the first image; a generation unit that generates, from the prompt, a third image that reproduces the content described by the prompt and has a higher resolution than the second image; and a control unit that, when an image selected from a plurality of second images displayed in list form is displayed on a display unit, performs control such that the second image corresponding to the selected image is resized and displayed on the display unit, and the third image that corresponds to the selected image and is generated by the generation unit is subsequently displayed on the display unit.

Description

Display control device and method, program, storage medium

[0001] The present disclosure relates to a display control device that controls the display of information related to an image.

[0002] Conventionally, digital image data captured and acquired by a digital camera, smartphone, etc. has been used to share experiences and emotions.

[0003] In addition, as a technology related to digital images, the progress in the field of image generation AI that utilizes high-performance CPUs and cloud services is remarkable.

[0004] For example, when an image is generated by inputting a combination of words written in a text-based form called a "prompt" as input data into a generation AI, an image that expresses the description written in the prompt can be generated.

[0005] In recent years, a technology (Non-Patent Document 1) has been proposed in which a generative AI model called a diffusion model is made to learn the process of removing noise from an image, and the process is controlled by a prompt to generate a new image. This technology also contributes to the evolution of image generation technology by AI.

[0006] In addition, as shown by recently published research on multimodal AI technology (Non-Patent Document 2), a single AI model can now accurately perform both image generation that reproduces the content of text and text generation that describes the scene shown in an image. As a result, bidirectional conversion between images and text has become possible.

[0007] Furthermore, as the performance of image generation AI models improves and images and prompts come to correspond in a roughly one-to-one manner, it is expected that prompts will be handled as content in place of images in order to reduce the amount of image data.

[0008] Non-Patent Document 1: Jonathan Ho, Ajay Jain, Pieter Abbeel, “Denoising Diffusion Probabilistic Models”, in NeurIPS (2020)
Non-Patent Document 2: Meta AI Research, “Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning”, https://ai.meta.com/research/publications/scaling-autoregressive-multi-modal-models-pretraining-and-instruction-tuning

[0009] However, when treating prompts as content, it is difficult for humans to intuitively grasp the image that can be generated from a prompt simply by looking at the prompt itself.

[0010] Furthermore, as prompts are optimized to allow image generation AI models to more faithfully reproduce the content of the prompt as an image, even text-based prompts may not necessarily be easy for humans to read.

[0011] Therefore, it is conceivable to improve the visibility of prompt data by recording thumbnail images generated from prompts in association with those prompts.

[0012] However, in cases such as full-screen playback of a single image, thumbnail images lack sufficient resolution. On the other hand, generating and displaying images from a prompt to increase resolution takes time to display, which impairs user convenience.

[0013] This disclosure has been made in view of the above-mentioned problems, and provides a display control device that can improve display response while suppressing a decrease in the resolution of the displayed image.

[0014] The display control device relating to this disclosure is characterized by comprising: an acquisition means for acquiring a prompt which is text information describing the content of a first image, and a second image associated with the prompt which is based on the first image and has a lower resolution than the first image; a generation means for generating a third image from the prompt which is an image that reproduces the content described by the prompt and has a higher resolution than the second image; and a control means for, when displaying an image selected from a plurality of the second images displayed in a list on a display means, resizing the second image corresponding to the selected image and displaying it on the display means, and then controlling the display means to display the third image corresponding to the selected image, which was generated by the generation means.

[0015] According to this disclosure, it is possible to improve display response while suppressing a decrease in the resolution of the displayed image.

[0016] Other features and advantages of the technical ideas derived from this disclosure will become apparent from the following description with reference to the attached drawings. In the attached drawings, the same or similar components are given the same reference numeral.

[0017] The attached drawings are included in the specification and constitute a part thereof, illustrating embodiments in this disclosure and used to explain the technical ideas derived from this disclosure together with their descriptions.
Diagram showing the configuration of a recording device according to the first embodiment of this disclosure.
Diagram showing the configuration of a recording device according to the first embodiment of this disclosure.
Diagram showing the configuration of a recording device according to the first embodiment of this disclosure.
Diagram showing the configuration of a thumbnail generation unit of the first embodiment.
Flowchart showing the operation of the recording process in the recording device.
Diagram showing an input image, prompt and thumbnail image.
Diagram showing an input image, prompt and thumbnail image.
Diagram showing an input image, prompt and thumbnail image.
Flowchart showing the display process of the first embodiment.
Diagram explaining the display process of the first embodiment.
Diagram explaining the display process of the first embodiment.
Diagram explaining the display process of the first embodiment.
Diagram explaining the display process of the first embodiment.
Diagram showing the configuration of a recording device according to the second embodiment.
Diagram showing the configuration of a recording device according to the second embodiment.
Diagram showing the configuration of a recording device according to the second embodiment.
Diagram showing the configuration of an image generation unit of the second embodiment.
Flowchart showing the shooting process of the second embodiment.
Flowchart showing the prompt editing process of the second embodiment.
Diagram showing the prompt editing process of the second embodiment.
Flowchart showing the display process of the second embodiment.

[0018] The embodiments will be described in detail below with reference to the attached drawings. Note that the following embodiments do not limit the scope of the claims. While the embodiments describe multiple features, not all of these features are necessary, and the features may be combined in any way. Furthermore, in the attached drawings, identical or similar configurations are given the same reference numerals, and redundant descriptions are omitted.

[0019] (First Embodiment) Figures 1A to 1C show the configuration of a recording device 100, which is a first embodiment of the display control device of the present disclosure.

[0020] Figure 1A is a block diagram showing an example of the functional configuration of the recording device 100. Throughout the drawing, functional blocks can be implemented by software, or a combination of software and hardware, except for parts that are clearly only implementable by hardware (e.g., recording media). For example, a functional block may be implemented by dedicated hardware such as an ASIC. Alternatively, a functional block may be implemented by one or more program-executable processors, such as a CPU, executing a program stored in memory. Multiple functional blocks may be implemented by a common configuration (e.g., one ASIC). Furthermore, hardware that implements some functions of one functional block may be included in hardware that implements other functional blocks.

[0021] Figure 1A shows only the image input unit 101, prompt generation unit 102, thumbnail generation unit 103, and recording unit 104 as functional blocks of the recording device 100, in order to facilitate the explanation and understanding of this embodiment. However, the functions of the recording device 100 are not limited to the functions realized by the functional blocks shown in Figure 1A.

[0022] The image input unit 101 acquires image data recorded on a recording medium attached to the recording device 100 as input image data. Here, the image data is, for example, a captured image taken with a digital camera or smartphone and recorded on a recording medium. The image input unit 101 supplies the acquired image data to the prompt generation unit 102. In addition to the image data, the image input unit 101 may also supply the prompt generation unit 102 with some of the supplementary information recorded in the data file in which the input image data is stored (for example, one or more tags related to the shooting conditions and shooting status of the captured image).

[0023] In this embodiment, the case of acquiring image data from a recording medium installed in the recording device 100 will be described as an example. However, the method of acquiring image data is not limited to this, and for example, image data recorded on an external server or cloud that the recording device 100 can communicate with may be acquired. Alternatively, image data recorded on an external storage device such as a USB memory or external HDD connected to the recording device 100 may be acquired.

[0024] The prompt generation unit 102 has the function of converting image data into prompts, which are text information that describes the scene represented by the image data. It generates prompts based on the image data supplied from the image input unit 101 and various other information.

[0025] The prompt generation unit 102 uses a multimodal AI learning model (generative AI model) to convert image data into text information that describes the scene represented by the image data. The multimodal AI learning model may be pre-stored in the recording device 100, for example, or it may reside in an external device that the recording device 100 can communicate with. The multimodal AI learning model in this embodiment is a neural network that takes image data as input and is trained using, as training data, text data such as captions and tags about the scene associated with the image data.

[0026] A multimodal AI learning model outputs text data corresponding to input image data. This text data describes the scene represented by the image data. The multimodal AI learning model also outputs image data corresponding to input text data. This image data reproduces the content described by the text data. Such a multimodal AI learning model capable of bidirectional conversion between image data and text data can be realized, for example, using known technologies such as those described in Non-Patent Document 2.

[0027] The prompt generation unit 102 may also obtain prompts from an external device with which the recording device 100 can communicate. In this case, the prompt generation unit 102 transmits image data for prompt generation (and other information as needed) to the external device. The prompt generation unit 102 then receives the prompt generated by the external device based on the image data.

[0028] The thumbnail generation unit 103 generates a thumbnail image from the prompt supplied by the prompt generation unit 102. This thumbnail image is an image that reproduces the content described by the prompt and is smaller in size than the input image used to generate the prompt.

[0029] Figure 2 is a block diagram showing an example of the functional configuration of the thumbnail generation unit 103. The thumbnail generation unit 103 includes at least a prompt / image conversion unit 201.

[0030] The prompt / image conversion unit 201 uses a multimodal AI learning model to convert a prompt, which is text information, into an image that reproduces the content described by the prompt. The multimodal AI learning model in this embodiment is the same model as the multimodal AI learning model used in the prompt generation unit 102. However, the multimodal AI learning model used in the prompt / image conversion unit 201 is not limited to the same model as the multimodal AI learning model used in the prompt generation unit 102, and may be a different model. In that case, for example, it may be pre-stored in the recording device 100, or it may reside in an external device that the recording device 100 can communicate with.

[0031] Furthermore, when the prompt / image conversion unit 201 converts a prompt into an image that reproduces the content described by the prompt, it may refer to the input image data used to generate the prompt. By referring to the input image data, it becomes possible to convert the prompt into an image in which the elements constituting the image, such as composition and subject matter, are closer to the input image data.

[0032] The prompt / image conversion unit 201 may also acquire the converted image from an external device with which the recording device 100 can communicate. In this case, the prompt / image conversion unit 201 transmits prompt data for image generation (and other information as needed) to the external device. The prompt / image conversion unit 201 then receives the image generated by the external device in response to the prompt.

[0033] The recording unit 104 records the prompt output by the prompt generation unit 102 and the thumbnail image output by the thumbnail generation unit 103 to the recording destination. The recording destination may be a recording medium mounted on the recording device 100, a connected storage device, or a communication-capable external device.

[0034] Images recorded on the recording destination by the recording device 100 are played back on the playback device as needed by the user. When the playback device is instructed to display a screen for selecting the image the user wants to play back, a list of prompts recorded on the recording destination and their corresponding thumbnail images are displayed. Alternatively, only thumbnail images may be displayed without the prompts. By looking at this list, the user can easily select the image they want to play back. If only the text data of the prompts is displayed in the list, the user cannot intuitively select the image they want from the prompts. In contrast, by displaying thumbnail images along with the prompts, or only thumbnail images, as in this embodiment, the user can intuitively understand the prompts and select an image.

[0035] Furthermore, the recording device 100 records not the original image data but the prompt and thumbnail image corresponding to that image data on a recording medium or other recording destination. Therefore, the amount of data to be recorded can be significantly reduced compared to when the original image data is recorded. The playback device uses a multimodal AI learning model to convert the prompt, which is text information recorded on the recording destination, into an image that reproduces the content that the prompt describes. In this way, the playback device can reproduce an image close to the original image, which has a large amount of data, from the small amount of data that is the prompt. However, if there is sufficient recording capacity, the original image data may also be recorded on the recording medium in addition to the prompt and thumbnail image.

[0036] Figure 1C is a block diagram showing the functional configuration of the display function portion of the recording device 100 that displays images reproduced from the recording unit 104.

[0037] Figure 1C shows only the recording unit 104, display instruction selection unit 121, list display creation unit 122, simplified single-image display creation unit 123, prompt / image conversion unit 124, display unit 125, and display switching unit 126 as functional blocks related to the display function of the recording device 100, in order to facilitate the explanation and understanding of the embodiment. However, the display function of the recording device 100 is not limited to the functions realized by the functional blocks shown in Figure 1C.

[0038] The recording unit 104 records the prompt and thumbnail image in association with each other on the recording medium 115 of the recording device 100, which will be described later. The recording destination may be an external device capable of communication.

[0039] The display instruction selection unit 121 receives display instructions such as list display and single-image playback based on operations from the input device 117, which will be described later, of the recording device 100. For example, if the display device 119 is a touch panel, touch operations on the display device 119 are also included in operations from the input device 117.

[0040] The list display creation unit 122 and the simplified single-image display creation unit 123 resize the thumbnail images recorded on the recording medium 115 and output an image of any size. Resizing is performed by calculations such as bilinear or bicubic methods.
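
As an illustration of the bilinear calculation mentioned above, the following is a minimal sketch, not the implementation used by the list display creation unit 122 or the simplified single-image display creation unit 123. It assumes a grayscale image stored as a list of rows and a corner-aligned coordinate mapping; the function name `resize_bilinear` is hypothetical.

```python
from typing import List


def resize_bilinear(img: List[List[float]], out_w: int, out_h: int) -> List[List[float]]:
    """Resize a grayscale image (list of rows) using bilinear interpolation.

    Assumption: corner-aligned mapping, so the four corner pixels of the
    source image are preserved in the output.
    """
    in_h, in_w = len(img), len(img[0])
    out = []
    for y in range(out_h):
        # Map the output row to a fractional source row.
        sy = y * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(sy)
        y1 = min(y0 + 1, in_h - 1)
        fy = sy - y0
        row = []
        for x in range(out_w):
            # Map the output column to a fractional source column.
            sx = x * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(sx)
            x1 = min(x0 + 1, in_w - 1)
            fx = sx - x0
            # Interpolate horizontally on the two neighboring rows, then vertically.
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out
```

Enlarging a small thumbnail this way is fast but cannot add detail, which is why the thumbnail alone is insufficient for full-screen playback.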

[0041] The prompt / image conversion unit 124 converts the prompts recorded on the recording medium 115 into images that reproduce the content described by the prompts, for example, using a multimodal AI learning model.

[0042] The display unit 125 displays images created by the list display creation unit 122, the simplified single-image display creation unit 123, and the prompt / image conversion unit 124 on the display device 119.

[0043] The display switching unit 126 outputs an instruction to the display unit 125 to switch the image displayed on the display device 119 according to the program.

[0044] Figure 1B is a block diagram showing an example of the hardware configuration of the recording device 100. Each block is connected to the others via the system bus 111 so that they can communicate with each other.

[0045] The CPU (Central Processing Unit) 112 and the GPU (Graphics Processing Unit) 116 are each one or more processors capable of executing programs. The GPU 116 is configured to execute specific operations faster than the CPU 112 and, in recent years, has often been used in particular for executing inference processing using neural networks at high speed. Instead of the GPU 116, an NPU (Neural Processing Unit) specialized for executing learning and inference processing using neural networks may be used. Instead of a separate CPU 112 and GPU 116, an SoC (System on Chip) integrating a CPU and a GPU (and, in some cases, an NPU) may also be used.

[0046] The CPU 112 realizes each functional block described in Figures 1A to 1C and Figure 2 in cooperation with the GPU 116 by, for example, reading a program stored in the ROM 113 into the RAM 114 and executing it. The CPU 112 achieves high-speed processing by using the GPU 116 for processing using neural networks. Note that the GPU 116 may have a dedicated RAM separate from the RAM 114.

[0047] The ROM 113 is, for example, an electrically rewritable non-volatile memory. The ROM 113 stores programs executable by the CPU 112, set values, GUI (Graphical User Interface) data, parameters for realizing a learned neural network (learning model), and the like.

[0048] The RAM 114 is used to read a program executed by the CPU 112 and to store values necessary during the execution of the program.

[0049] The recording medium 115 is, for example, a semiconductor memory card or an SSD (Solid State Drive), and is used as a recording location for the prompts generated by the prompt generation unit 102 and the thumbnail images generated by the thumbnail generation unit 103. It is also used as a recording location for the input image data that formed the basis of the prompts.

[0050] The input device 117 includes multiple operating members such as buttons, touch panels, and switches that receive operation input to the recording device 100. The input device 117 may also include one or more devices (such as sensors) for acquiring additional information when prompts are generated.

[0051] This input device 117 may include, but is not limited to, a GPS receiver for acquiring location information of the recording device 100, or a clock for acquiring the date and time of creation.

[0052] The communication interface 118 is a circuit for communicating with an external device in accordance with one or more communication standards. It includes a connector for wired communication, an antenna for wireless communication, and a transmitting / receiving circuit. The recording device 100 can transmit image data to an external device and receive data from an external device through the communication interface 118. Typical communication standards that the communication interface 118 conforms to include, but are not limited to, HDMI®, USB, Bluetooth®, and wireless LAN (Wi-Fi).

[0053] The display device 119 is a liquid crystal display provided on the surface of the housing of the recording device 100. The display device 119 may also be a touch display. The display device displays live view images or images read from the recording medium 115, menu screens, and information about the recording device 100 (e.g., setting values, battery level, remaining number of shots, etc.).

[0054] The functional blocks shown in Figures 1A, 1C, and 2 are implemented by one or more of the hardware components shown in Figure 1B. For example, the prompt generation unit 102 is mainly implemented by the CPU 112 and GPU 116. The RAM 114 is used as a temporary storage location for data to be processed, data being processed, and processing result data, while the ROM 113 is used as a reference point for pre-stored settings and programs for various processes.

[0055] <Operation during recording> Next, the recording operation of the recording device 100 will be explained with reference to Figures 1A to 4C.

[0056] Figure 3 is a flowchart showing the operation of the recording process in the recording device 100. The processing of each step in Figure 3 is achieved by the CPU 112 or GPU 116 executing a program stored in the ROM 113 and controlling other hardware as needed. "S" indicates the step number.

[0057] Furthermore, the recording operation by the recording device 100 may be performed in response to instructions via the input device 117, or it may be performed according to predetermined conditions other than instructions. For example, the recording operation may be performed sequentially on multiple image data received from an external device via the communication interface 118.

[0058] In Figure 3, at S301, the image input unit 101 acquires input image data. The image input unit 101 outputs the acquired input image data and the associated information described above to the prompt generation unit 102. The image input unit 101 may also process the input image data so that it is suitable for use by the prompt generation unit 102 before outputting it.

[0059] In S302, the prompt generation unit 102 generates text information (prompt) that describes the scene of the image represented by the input image data from the input image data output by the image input unit 101.

[0060] Here, we will further explain the operation of the prompt generation unit 102.

[0061] The prompt generation unit 102 generates text information (prompts) from the input image data that describes the scene of the image represented by the input image data. The prompt generation unit 102 stores the generated prompts in the RAM 114.

[0062] As described above, the prompt generation unit 102 can obtain a prompt by inputting image data into the learning model stored in the ROM 113. Alternatively, the prompt generation unit 102 may transmit image data to an external device via the communication interface 118 and receive a prompt from the external device.

[0063] The level of detail of the prompts generated by the prompt generation unit 102 can be changed by setting. For example, if the level of detail is set lower than a predetermined value, a prompt describing only the gender of the person subject can be generated, while if the level of detail is set higher than a predetermined value, a prompt describing gender, age, hair color and length, etc. can be generated.

[0064] Furthermore, the prompt generation unit 102 may generate prompts that include not only descriptions of elements included in the image (positive prompts) but also descriptions of elements not included in the image (negative prompts).

[0065] Furthermore, the prompt generation unit 102 may generate a prompt that includes supplementary information of the input image data or information based on the supplementary information.

[0066] Figure 4A shows an example of an image represented by the input image data. Figure 4B shows an example of a prompt generated from the input image data. The prompt in Figure 4B includes not only the elements contained in the image in Figure 4A, but also information based on the supplementary information of the input image data, such as the date and time and location of the image.
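
The prompt contents described above (a configurable level of detail, positive and negative prompts, and supplementary information) can be sketched as a simple data structure. This is a hypothetical illustration only: the class `Prompt`, the function `describe_person`, the text layout produced by `to_text`, the threshold value, and the sample values are all assumptions, and the actual prompt format of Figure 4B is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Prompt:
    positive: List[str]                                            # elements included in the image
    negative: List[str] = field(default_factory=list)              # elements not included in the image
    supplementary: Dict[str, str] = field(default_factory=dict)    # e.g. date and time, location

    def to_text(self) -> str:
        """Serialize to one text string (the layout is an assumption)."""
        parts = [", ".join(self.positive)]
        if self.supplementary:
            parts.append(", ".join(f"{k}: {v}" for k, v in self.supplementary.items()))
        text = "; ".join(parts)
        if self.negative:
            text += " | negative: " + ", ".join(self.negative)
        return text


def describe_person(attributes: Dict[str, str], detail_level: int, threshold: int = 5) -> List[str]:
    """Select which attributes of a person to describe, depending on the detail setting.

    Assumption: a level equal to the threshold is treated as low detail.
    """
    if detail_level > threshold:
        keys = ["gender", "age", "hair color", "hair length"]
    else:
        keys = ["gender"]
    return [f"{k}: {attributes[k]}" for k in keys if k in attributes]
```

For example, `describe_person(person, detail_level=1)` yields only the gender entry, while a level above the threshold adds age, hair color, and hair length.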

[0067] Returning to Figure 3, in S303, the thumbnail generation unit 103 generates a thumbnail image from the prompt output by the prompt generation unit 102. This thumbnail image is an image that reproduces the content described by the prompt and is smaller in size than the input image used to generate the prompt.

[0068] Here, the operation of the thumbnail generation unit 103 will be explained further.

[0069] The prompt / image conversion unit 201 of the thumbnail generation unit 103 generates a thumbnail image from the prompt, which is an image that reproduces the content described by the prompt and is smaller in size than the input image used to generate the prompt. The thumbnail generation unit 103 stores the generated thumbnail image in the RAM 114.

[0070] As described above, the prompt / image conversion unit 201 can acquire a thumbnail image by inputting a prompt to the learning model stored in the ROM 113. Alternatively, the prompt / image conversion unit 201 may send a prompt to an external device via the communication interface 118 and receive a thumbnail image from the external device.

[0071] Figure 4C shows an example of a thumbnail image generated from a prompt.

[0072] In this embodiment, the prompt / image conversion unit 201 generates a thumbnail image by referring to the input image data used to generate the prompt. As a result, as shown in Figure 4C, it is possible to generate an image in which the elements constituting the image, such as composition and subject matter, are closer to the input image data.

[0073] Furthermore, the thumbnail image generated by the prompt / image conversion unit 201 is smaller in size than the input image data. This is because thumbnail images are used as a simplified playback method for displaying prompt data, which is text, in a visually appealing manner, and also to reduce the amount of data recorded. For example, when displaying multiple prompt data in a list, thumbnail images can be displayed side by side instead of the prompt content.

[0074] In step S304, the recording unit 104 associates the prompt generated by the prompt generation unit 102 in step S302 with the thumbnail image generated by the thumbnail generation unit 103 in step S303 and records it on the recording medium 115.
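
Steps S301 to S304 can be summarized in a short sketch. The model calls are passed in as plain functions because the disclosure leaves open whether they run on the device or on an external device; `record_image`, `Record`, and the stand-in callables are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Record:
    prompt: str
    thumbnail: bytes


def record_image(
    image: bytes,
    generate_prompt: Callable[[bytes], str],
    generate_thumbnail: Callable[[str, bytes], bytes],
    storage: List[Record],
) -> Record:
    """Run the recording flow for one input image acquired in S301."""
    prompt = generate_prompt(image)                 # S302: image -> prompt
    thumbnail = generate_thumbnail(prompt, image)   # S303: prompt -> thumbnail (may refer to the input image)
    record = Record(prompt=prompt, thumbnail=thumbnail)
    storage.append(record)                          # S304: record the associated pair
    return record
```

Note that only the prompt and thumbnail are stored in this sketch; the original image is discarded unless the caller keeps it.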

[0075] Any known method can be used to associate the prompt with the thumbnail image. In this embodiment, the prompt and the thumbnail image are included in the same file container and recorded as a single container file.
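
One way to hold a prompt and its thumbnail in a single container file is sketched below. The disclosure does not specify the container format, so the use of a ZIP archive and the member names `prompt.txt` and `thumbnail.bin` are assumptions chosen only because they are available in the Python standard library.

```python
import io
import zipfile
from typing import Tuple


def pack_container(prompt: str, thumbnail: bytes) -> bytes:
    """Bundle a prompt and its thumbnail into one in-memory container."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("prompt.txt", prompt.encode("utf-8"))
        zf.writestr("thumbnail.bin", thumbnail)
    return buf.getvalue()


def unpack_container(data: bytes) -> Tuple[str, bytes]:
    """Recover the prompt and thumbnail from a container produced by pack_container."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        prompt = zf.read("prompt.txt").decode("utf-8")
        thumbnail = zf.read("thumbnail.bin")
    return prompt, thumbnail
```

Because both items travel in one file, copying or deleting the container keeps the prompt and thumbnail associated without a separate index.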

[0076] Furthermore, the recording unit 104 may record a container file containing the prompt and thumbnail image in association with the input image data used to generate the prompt. For example, the container file may be recorded as a separate file with a filename that is consistent with the input image data.
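
A filename that is consistent with the input image data could, for instance, share the image's base name. The sketch below assumes this convention; the `.prm` extension and the function name `container_path_for` are hypothetical and do not come from the disclosure.

```python
from pathlib import PurePosixPath


def container_path_for(image_path: str, suffix: str = ".prm") -> str:
    """Derive the container filename from the input image filename.

    Assumption: the container sits next to the image and differs only in its extension.
    """
    return str(PurePosixPath(image_path).with_suffix(suffix))
```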

[0077] The recording unit 104 may also perform digital certification processing on the prompt and thumbnail image before recording. The digital certification processing can be any process that serves to guarantee the content of the prompt and thumbnail image. For example, it may be a process that assigns an NFT (Non-Fungible Token).
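
The disclosure leaves the certification process open and names NFT assignment as one example. As a much simpler stand-in, not an NFT mechanism, the sketch below computes a SHA-256 digest over the prompt and thumbnail so that later modification of either can be detected. The function names and the length-prefixed layout are assumptions.

```python
import hashlib
import hmac


def content_digest(prompt: str, thumbnail: bytes) -> str:
    """Return a SHA-256 hex digest covering both the prompt and the thumbnail."""
    h = hashlib.sha256()
    prompt_bytes = prompt.encode("utf-8")
    # Length prefixes keep the boundary between the two fields unambiguous.
    h.update(len(prompt_bytes).to_bytes(8, "big"))
    h.update(prompt_bytes)
    h.update(len(thumbnail).to_bytes(8, "big"))
    h.update(thumbnail)
    return h.hexdigest()


def verify_content(prompt: str, thumbnail: bytes, expected_digest: str) -> bool:
    """Check that the prompt and thumbnail still match a previously recorded digest."""
    return hmac.compare_digest(content_digest(prompt, thumbnail), expected_digest)
```

A bare digest only detects changes; proving who created the content would additionally require a signature or a registry, which is outside this sketch.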

[0078] <Operation during display processing> The display operation of the recording device 100 will be explained with reference to Figures 1A, 1B, 5, and 6A-6D.

[0079] The operation of each step in the flowchart in Figure 5 is achieved by the CPU 112 or GPU 116 executing a program stored in the ROM 113 and controlling other hardware as needed.

[0080] Furthermore, the display operation by the recording device 100 may be performed in response to instructions via the input device 117, or it may be performed according to predetermined conditions other than instructions.

[0081] In S501, the display instruction selection unit 121 determines whether or not it has received a list display instruction from the input device 117 based on user operation. If it has received the instruction, it proceeds to S502; otherwise, it proceeds to S504.

[0082] In S502, the list display creation unit 122 obtains thumbnail images corresponding to predetermined display settings from the recording medium 115 and creates a list display image.

[0083] For example, when displaying a list in a 2x4 grid as shown in Figure 6A, eight thumbnail images associated with the prompt are retrieved in chronological order of the recorded date and time, and a list display image is created. Alternatively, the last displayed image may be stored and used as the basis for retrieving the list display image. Furthermore, there may be multiple layouts for the list display, such as a 4x8 grid.
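The retrieval of thumbnail images for one list display page can be sketched as follows. The `ThumbnailRecord` type, the function name, the oldest-first ordering, and the `page` parameter are assumptions introduced for illustration and are not part of the disclosure.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ThumbnailRecord:
    # Hypothetical record type: a file name plus its recorded date and time.
    name: str
    recorded_at: datetime


def select_for_grid(records, rows=2, cols=4, page=0):
    """Return the records for one list display page, oldest first."""
    ordered = sorted(records, key=lambda r: r.recorded_at)
    per_page = rows * cols
    start = page * per_page
    return ordered[start:start + per_page]
```

With the default 2x4 layout the function returns at most eight records; passing `rows=4, cols=8` would cover the 4x8 layout mentioned above.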

[0084] In S503, the display unit 125 displays the list display image created by the list display creation unit 122 on the display device 119, and returns the process to S501.

[0085] In S504, the display instruction selection unit 121 determines whether or not it has received a single-image display instruction from the input device 117 based on user operation. If it has received the instruction, it proceeds to S505; otherwise, it returns to S501. For example, the user instructs the display of a single image by performing an operation such as tapping the image of a dog as shown in Figure 6B.

[0086] In S505, the simplified single-page display creation unit 123 acquires the thumbnail image tapped by the user from the recording medium 115, resizes it to match the resolution of the display device 119, and creates a simplified single-page display image. Alternatively, a portion of the display area of the display device 119 may be used as the image display area, and the image may be resized to match the resolution of that portion.
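One way to determine the target size for the resize in S505 is sketched below. It assumes that the aspect ratio of the thumbnail image is preserved and that the image is fitted inside the display area; the disclosure states only that the image is resized to match the resolution, so both points, and the function name `fit_size`, are assumptions.

```python
def fit_size(src_w, src_h, area_w, area_h):
    """Largest size with the source aspect ratio that fits inside the area."""
    scale = min(area_w / src_w, area_h / src_h)
    return max(1, round(src_w * scale)), max(1, round(src_h * scale))
```

The same function applies when only a portion of the display area is used: the width and height of that portion are passed as `area_w` and `area_h`.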

[0087] In S506, the display unit 125 displays the simplified single-page display image created by the simplified single-page display creation unit 123 on the display device 119 as shown in Figure 6C.

[0088] In S507, the prompt / image conversion unit 124 obtains a prompt associated with the thumbnail image for which a display instruction was received in S504 from the recording medium 115 and generates an image. In this case, the image can be obtained by inputting the prompt into the learning model stored in the ROM 113. Alternatively, the prompt / image conversion unit 124 may send the prompt to an external device via the communication interface 118 and receive the generated image from the external device. Here, the generated image is approximately the same as the image from which the prompt was generated.

[0089] In S508, the display switching unit 126 receives confirmation that the prompt / image conversion unit 124 has completed image generation and instructs the display unit 125 to switch the display.

[0090] In S509, the display unit 125 receives a display switching instruction from the display switching unit 126 and displays an image generated by the prompt / image conversion unit 124 on the display device 119 as shown in Figure 6D, instead of the resized thumbnail image. The image displayed here is almost identical to the image from which the prompt was generated, and therefore has a higher resolution than the resized thumbnail image.

[0091] In S510, the display instruction selection unit 121 determines whether or not it has received a single-page display termination instruction from the input device 117 based on user operation. If it has received the instruction, it returns to S502; otherwise, it repeats the process in S510 and continues single-page display.

[0092] In this embodiment, the display switching unit 126 has been described as switching from a simplified single-image display generated from a thumbnail image to a single-image display generated from a prompt. However, if the resolution of the display area of the display device 119 is lower than a predetermined threshold, the thumbnail image may still have sufficient resolution for a single-image display. In that case, the processing from S507 to S509 may be skipped.

[0093] As explained above, in this embodiment, a simplified single image, which is a resized thumbnail image, is displayed until the generation of a high-resolution image from the prompt is complete and ready for display. This enables image display that suppresses a decrease in resolution while improving display response.
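The order of operations in S505 to S509, including the optional skip for low-resolution displays, can be summarized by the following Python sketch. The callables `resize`, `generate`, and `display` are hypothetical stand-ins for the simplified single-page display creation unit 123, the prompt / image conversion unit 124, and the display unit 125, and expressing resolution as a pixel count is an assumption.

```python
def show_single_image(thumbnail, prompt, resize, generate, display,
                      display_pixels, threshold_pixels):
    """Show a resized thumbnail first, then switch to the generated image."""
    # S505-S506: show the resized thumbnail immediately.
    display(resize(thumbnail))
    # Low-resolution display: the thumbnail may suffice, so S507-S509 are skipped.
    if display_pixels < threshold_pixels:
        return
    # S507: generate the high-resolution image from the prompt (the slow step).
    generated = generate(prompt)
    # S508-S509: generation has finished, so switch the display.
    display(generated)
```

In an actual device the generation step would likely run asynchronously so that user input remains responsive; the sketch keeps it sequential only to make the ordering of the steps explicit.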

[0094] (Second Embodiment) The functional configuration of the second embodiment of this disclosure will be described below with reference to the block diagram in Figures 7A-7C.

[0095] The second embodiment relates to the operation of a digital camera 700 in which the imaging unit 701 acquires captured image data, the prompt generation unit 102 converts it into a prompt, and the prompt is recorded and displayed as the main data corresponding to the shooting operation.

[0096] Figures 7A and 7C are block diagrams showing examples of the functional configuration of the digital camera 700 in the second embodiment. Figure 7A shows the functional block for recording processing, and Figure 7C shows the functional block for display processing. Since some of the functional configurations of the digital camera 700 can be implemented using the functional configuration of the recording device 100 described in the first embodiment, the same reference numerals are used for the parts described in the first embodiment, and their description is omitted. Also, since Figure 7C is the same as Figure 1C described in the first embodiment, its description is omitted.

[0097] Figure 7A shows only the imaging unit 701, prompt generation unit 102, image generation unit 702, prompt editing unit 703, and recording unit 104 as functional blocks of the digital camera 700, in order to facilitate the explanation and understanding of this embodiment. However, the functions of the digital camera 700 are not limited to the functions realized by the functional blocks shown in Figure 7A.

[0098] The imaging unit 701 acquires RAW image data corresponding to the optical image of the subject using a lens, image sensor, etc. The imaging unit 701 also applies predetermined image processing to the RAW image data to generate image data according to its intended use. Here, the intended use may be, for example, recording, display, or prompt generation. The image data for prompt generation may be reused from the image data for recording or display, or generated based on the image data for recording or display.

[0099] Furthermore, the imaging unit 701 may supply the prompt generation unit 102 with some of the supplementary information to be recorded in the data file that stores the image data for recording (for example, one or more tags related to shooting conditions and shooting status). In addition, the imaging unit 701 can supply the prompt generation unit 102 with any information it can acquire, such as information regarding the characteristics of the image sensor and evaluation values used for exposure control.

[0100] The prompt generation unit 102 has the same configuration as described in the first embodiment, except that it is supplied with image data and various information from the imaging unit 701.

[0101] In addition to the operations described for the thumbnail generation unit 103 in the first embodiment, the image generation unit 702 generates an image of a different size from the thumbnail image. The image generation unit 702 also supplies generation information to the prompt editing unit 703 when generating the image. Here, the image of a different size from the thumbnail image is an image smaller than the input image data before prompt generation. This image, which is an intermediate size between the thumbnail image and the input image, can be used depending on the application even if its resolution is slightly lower than the original data (input image data), and can therefore be used in place of the original data.

[0102] Figure 8 is a block diagram showing an example of the functional configuration of the image generation unit 702 in the second embodiment. The image generation unit 702 includes at least a prompt / image conversion unit 201.

[0103] The prompt / image conversion unit 201, similar to the operation described in the first embodiment, uses a multimodal AI learning model to convert the text information, which is a prompt, into an image that reproduces the content described by the prompt. However, in the second embodiment, in addition to converting the prompt into a thumbnail image, the prompt is also converted into a reduced image with a different image size than the thumbnail image. The image size of the reduced image generated by the prompt / image conversion unit 201 can be changed by setting.

[0104] Furthermore, the prompt / image conversion unit 201 outputs some of the generation information when generating an image. This generation information includes, for example, a seed value for controlling the randomness of image generation, the model name and version of the multimodal AI learning model used for generation, and hyperparameters representing the learning rate and batch size used in the generation process.

[0105] The prompt editing unit 703 edits the prompt generated by the prompt generation unit 102 based on the generation information output when the image generation unit 702 generates the thumbnail image, and outputs the resulting final prompt (text data) to the recording unit 104.

[0106] The recording unit 104 records the prompts output by the prompt editing unit 703 and the thumbnail and reduced images output by the image generation unit 702 to the recording destination.

[0107] Figure 7B is a block diagram showing an example of the hardware configuration of the digital camera 700.

[0108] The display device 119 is, for example, a liquid crystal display provided on the surface of the housing of the digital camera 700. The display device 119 may also be a touch display. The display device 119 displays the live view image or an image read from the recording medium 115, the menu screen, and information about the digital camera 700 (for example, settings, battery level, remaining number of shots, etc.).

[0109] The imaging device 712 includes, for example, an optical system unit such as a lens, aperture, and shutter, and an image sensor. The optical system unit may have a compound lens or a multi-lens system. The optical system unit may also be able to change optical characteristics such as zoom and aperture depending on the image content to be acquired. The image sensor may be, for example, a CMOS color image sensor having a primary color Bayer array color filter.

[0110] <Operation during recording> Next, the shooting operation of the digital camera 700 will be explained with reference to Figures 7A to 11.

[0111] Figure 9 is a flowchart showing the shooting operation of the digital camera 700. Figure 10 is a flowchart showing the prompt editing process in the prompt editing unit 703.

[0112] The operation of each step in the flowcharts in Figures 9 and 10 is realized by the CPU 112 or GPU 116 executing a program stored in ROM 113 and controlling other hardware as needed. Since some of the shooting operations of the digital camera 700 are the same as the recording operations of the recording device 100 described in the first embodiment, the operations described in the first embodiment will not be explained.

[0113] Furthermore, still image capture by the digital camera 700 may be performed in response to instructions via the input device 117, or it may be performed according to predetermined conditions other than instructions. For example, shooting may be performed continuously at regular time intervals, or still image capture may be performed when information obtained from video being shot for live view display meets predetermined conditions.

[0114] In S901, the imaging unit 701 performs still image capture and acquires captured image data. Alternatively, the imaging unit 701 may perform video recording and use the video frame images as still image data. The imaging unit 701 outputs the acquired captured image data and the aforementioned supplementary information to the prompt generation unit 102. The imaging unit 701 outputs so-called developed image data as the captured image data.

[0115] The image data after development processing has pixel data composed of three components (RGB or YCbCr). The imaging unit 701 may also process the image data to make it suitable for use by, for example, the prompt generation unit 102 before outputting it.

[0116] If both prompts and captured image data are to be recorded, the imaging unit 701 also outputs the captured image data and associated information to the recording unit 104. The imaging unit 701 may also output to the recording unit 104 image data to which processing (e.g., encoding processing) corresponding to the data format to be recorded in the recording unit 104 has been applied.

[0117] In step S902, the prompt generation unit 102 uses the captured image data output by the imaging unit 701 as input image data to generate text information (prompts) that describe the image scene represented by the input image data. The operation of the prompt generation unit 102 is the same as in step S302 of the first embodiment.

[0118] In S903, the image generation unit 702 generates a thumbnail image from the prompt output by the prompt generation unit 102. This thumbnail image is an image that reproduces the content described by the prompt and is smaller in size than the input image used to generate the prompt.

[0119] The process by which the prompt / image conversion unit 201 of the image generation unit 702 generates a thumbnail image from a prompt is the same as the process in step S303 of the first embodiment.

[0120] In S904, the image generation unit 702 generates a reduced image with a different image size from the thumbnail image generated in S903. This image reproduces the content described by the prompt, based on the prompt output by the prompt generation unit 102.

[0121] The operation of the image generation unit 702 will now be explained further.

[0122] The prompt / image conversion unit 201 of the image generation unit 702 refers to the input image data used to generate the prompt and generates a reduced image from the prompt that reproduces the content described by the prompt and has a different image size than the thumbnail image. The image generation unit 702 stores the generated reduced image, for example an MPF (Multi-Picture Format) image, in the RAM 114.

[0123] The prompt / image conversion unit 201 can acquire a reduced-size image using the same process as the process for acquiring a thumbnail image.

[0124] Here, when inputting a prompt to the learning model stored in ROM 113, it is possible to set the seed value used when generating the thumbnail image, thereby generating a reduced-size image with the same content as the thumbnail image but with a different image size.
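The idea of reusing the seed value so that the reduced image has the same content as the thumbnail image can be illustrated with a toy stand-in for the learning model. The function `toy_generate`, the constant `BASE`, and the hashing scheme are all assumptions made purely for illustration; they are not the learning model of the disclosure, and an actual generative model may not reproduce content across image sizes this exactly.

```python
import hashlib
import random

BASE = 4  # side length of the toy "content" grid (illustrative value)


def toy_generate(prompt: str, seed: int, size: int):
    """Toy stand-in for a text-to-image model: a size x size grid of 0-255.

    The content is fixed by (prompt, seed); size only changes the scale.
    size must be a multiple of BASE.
    """
    if size % BASE:
        raise ValueError("size must be a multiple of BASE")
    digest = hashlib.sha256(f"{seed}:{prompt}".encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    base = [[rng.randrange(256) for _ in range(BASE)] for _ in range(BASE)]
    step = size // BASE
    # Nearest-neighbour upscaling of the base grid to the requested size.
    return [[base[y // step][x // step] for x in range(size)] for y in range(size)]
```

Calling `toy_generate` twice with the same prompt and seed but different sizes yields grids that differ only in scale, which mirrors the relationship described between the thumbnail image and the reduced image.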

[0125] In this embodiment, the image size of the reduced image generated by the prompt / image conversion unit 201 is smaller than the input image data but larger than the thumbnail image data; however, the image size is not limited to this and may be any size. Furthermore, the prompt / image conversion unit 201 may generate multiple reduced images, each with a different image size.

[0126] In S905, the prompt editing unit 703 edits the prompt based on the prompt and the generation information for the thumbnail image.

[0127] Here, the prompt editing process will be further explained using the flowchart shown in Figure 10 and Figure 11.

[0128] In step S1001, the prompt / image conversion unit 201 of the image generation unit 702 outputs the model name, version, and seed value information of the multimodal AI learning model, which are part of the generation information when the thumbnail image was generated in step S903, to the prompt editing unit 703.

[0129] Furthermore, the generation information supplied to the prompt editing unit 703 is not limited to the information described above; it may also include one or more other pieces of the generation information used to create the thumbnail image.

[0130] In step S1002, the prompt editing unit 703 edits the prompt generated in step S902 by adding the generation information output by the prompt / image conversion unit 201, and outputs the final prompt (text data).

[0131] Figure 11 shows an example of an edited prompt. If the prompt generated in step S902 is the prompt shown in Figure 4B, the edited prompt, as shown in Figure 11, further includes the model name, version information, and seed value information of the multimodal AI learning model.
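The editing in S1002 amounts to appending generation information to the prompt text. A minimal Python sketch follows; the `GenerationInfo` type, the function name, and the `key: value` line layout are assumptions, since the exact text format of the edited prompt is defined only by the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationInfo:
    # Hypothetical container for the generation information output in S1001.
    model_name: str
    version: str
    seed: int


def edit_prompt(prompt: str, info: GenerationInfo) -> str:
    """Append generation information to the prompt as extra text lines."""
    lines = [
        prompt.rstrip(),
        f"model: {info.model_name}",
        f"version: {info.version}",
        f"seed: {info.seed}",
    ]
    return "\n".join(lines)
```

Keeping the model name, version, and seed value inside the prompt text means that a later image generation step can read them back from the recorded prompt alone.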

[0132] Returning to Figure 9, in S906, the recording unit 104 associates the prompt generated by the prompt generation unit 102 in step S902, the thumbnail image generated by the image generation unit 702 in S903, and a reduced image such as an MPF image generated by the image generation unit 702 in S904, and records them on the recording medium 115.

[0133] In S904, it was explained that the image generation unit 702 generates multiple MPF images, which are reduced images with different image sizes from the thumbnail image. However, one of the multiple MPF images may be a resized image generated without using a learning model. Specifically, it may be an image obtained by resizing the captured image data that the imaging unit 701 captures and which is the source for creating the prompt, using calculations such as bilinear or bicubic methods. Similarly, the thumbnail image may also be an image obtained by resizing the captured image data that is the source for creating the prompt using calculations such as bilinear or bicubic methods.
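For reference, resizing by the bilinear method without a learning model can be sketched as below. The sketch handles a single-channel image represented as a list of rows, which is an assumption made to keep the example self-contained; the function name `resize_bilinear` and the pixel-centre alignment are likewise illustrative choices.

```python
def resize_bilinear(src, out_w, out_h):
    """Resize a grayscale image (list of rows of numbers) by bilinear interpolation."""
    src_h, src_w = len(src), len(src[0])
    out = []
    for j in range(out_h):
        # Map the output pixel centre back into source coordinates, clamped to the image.
        fy = min(max((j + 0.5) * src_h / out_h - 0.5, 0.0), src_h - 1)
        y0 = int(fy)
        y1 = min(y0 + 1, src_h - 1)
        wy = fy - y0
        row = []
        for i in range(out_w):
            fx = min(max((i + 0.5) * src_w / out_w - 0.5, 0.0), src_w - 1)
            x0 = int(fx)
            x1 = min(x0 + 1, src_w - 1)
            wx = fx - x0
            # Interpolate horizontally on the two neighbouring rows, then vertically.
            top = src[y0][x0] * (1 - wx) + src[y0][x1] * wx
            bottom = src[y1][x0] * (1 - wx) + src[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out
```

A color image would be processed by applying the same interpolation to each component. For strong reductions, practical implementations usually low-pass filter the image first to limit aliasing, which this sketch omits.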

[0134] In this embodiment, the prompt, thumbnail image, and reduced image are associated by including them in the same file container and recording them as a single container file. However, other methods of association may be used. For example, the text file containing the prompt and the image file in which the reduced image is recorded in the main image area may be recorded with the same name. Alternatively, the prompt may be recorded in the metadata area of the image file in which the reduced image is recorded in the main image area.

[0135] Furthermore, the recording unit 104 may record a container file containing the prompt, thumbnail image, and reduced image, associating it with the input image data used to generate the prompt. For example, the container file may be recorded as a separate file with a filename that shares commonality with the input image data.

[0136] The recording unit 104 may also perform digital certification processing on the prompt, thumbnail image, and reduced image before recording. The digital certification processing can be any process that serves to guarantee the content of the prompt, thumbnail image, and reduced image. For example, it may be a process that assigns an NFT (Non-Fungible Token).

[0137] <Operation during display processing> The display operation of the digital camera 700 will be explained with reference to Figures 7A-7C and 12.

[0138] The operation of each step in the flowchart of Figure 12 is realized by the CPU 112 or GPU 116 executing a program stored in ROM 113 and controlling other hardware as needed. Since some of the display operations of the digital camera 700 are the same as the display operations of the recording device 100 described in the first embodiment, the details described in the first embodiment will not be explained.

[0139] Furthermore, the display operation of the digital camera 700 may be performed in response to instructions via the input device 117, or according to predetermined conditions other than instructions.

[0140] Steps S1201 to S1204 are the same as steps S501 to S504 described in the first embodiment.

[0141] In S1205, the simplified single-page display creation unit 123 determines whether the resolution of the display device 119 is lower than a preset threshold. If it is lower, the process proceeds to S1206; if it is equal to or higher than the threshold, the process proceeds to S1207. Alternatively, a portion of the display area of the display device 119 may be used as the image display area, and the determination may be made based on the resolution of that portion.

[0142] In S1206, the simplified single-page display creation unit 123 acquires the tapped thumbnail image from the recording medium 115, resizes it to match the resolution of the display device 119, and creates a simplified single-page display image.

[0143] In S1207, the simplified single-page display creation unit 123 acquires the MPF image associated with the tapped thumbnail image from the recording medium 115, resizes it to match the resolution of the display device 119, and creates a simplified single-page display image. The image displayed here has a higher resolution than the thumbnail image, but a lower resolution than the image used to create the prompt.

[0144] S1208 is the same as S506 described in the first embodiment.

[0145] In S1209, the prompt / image conversion unit 124 generates the main image from the prompt associated with the thumbnail image for which a display instruction was received in S1204.

[0146] Steps S1210 to S1212 are the same as steps S508 to S510 described in the first embodiment.

[0147] As described above, in this embodiment, a simplified single image, which is a resized thumbnail image, is displayed until the generation of a high-resolution image from the prompt is complete and ready for display. This enables image display that suppresses a decrease in resolution while improving the display response. In addition, multiple MPF images with different resolutions can also be associated with the prompt, and the image quality of the simplified single image can be improved by selecting and using an image with an appropriate resolution from the thumbnail image or MPF image according to the display resolution.
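When multiple MPF images are associated with the prompt, the selection of the source image for the simplified single image could be sketched as follows. The rule used here (the smallest candidate at least as wide as the display area, otherwise the largest available) and the function name are assumptions; the embodiment itself describes only a comparison of the display resolution with a preset threshold.

```python
def choose_preview_source(candidates, display_width):
    """Pick the source image for the simplified single-image display.

    candidates is a list of (width, image) pairs covering the thumbnail
    image and any MPF images associated with the same prompt.
    """
    ordered = sorted(candidates, key=lambda c: c[0])
    for width, image in ordered:
        # The first candidate wide enough for the display avoids upscaling.
        if width >= display_width:
            return image
    # No candidate is wide enough: fall back to the largest one.
    return ordered[-1][1]
```

With a single MPF image, this rule reduces to a two-way choice comparable to S1205, where the thumbnail width plays the role of the threshold.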

[0148] In the above-described embodiments, steps S507 and S1209 were explained using an example of generating the final image from the prompt only. However, the final image may also be generated using image data in addition to the prompt. For example, if the prompt is associated with an image (MPF image) obtained by resizing the image data that was used to generate the prompt using an arithmetic process such as the bilinear or bicubic method, then in addition to the prompt, this image is also acquired from the recording medium 115 and referenced to generate the image. In this case, the image is acquired by inputting the prompt and the MPF image into the learning model stored in the ROM 113, and the learning model is a model that has been trained in advance using not only prompts but also images as input. Alternatively, the prompt / image conversion unit 124 may transmit the prompt and the MPF image to an external device via the communication interface 118 and receive the generated image from the external device. Here, the generated image is approximately the same as the image that was used to generate the prompt.

[0149] In this way, by referencing MPF images based on captured image data, in addition to prompts, during image generation in the learning model, it becomes possible to generate images that closely resemble the original image in terms of composition, subject matter, and other elements that make up the image at the time of shooting. This reduces the impact of model version upgrades and minimizes the differences between images generated from prompts.

[0150] Furthermore, in image generation using the learning model, not only the prompt itself but also the thumbnail image generated from the prompt may be used as input. This allows the main image to be similar to the thumbnail even if, for example, the learning model is updated and can no longer output the same thumbnail. In other words, the possibility of a main image being displayed that differs significantly from the thumbnail viewed on the list screen is reduced, thereby minimizing the chances of users experiencing a sense of incongruity.

[0151] (Other Embodiments) The present disclosure can also be realized by supplying a program that implements one or more of the functions of the above-described embodiments to a system or device via a network or storage medium, and by having one or more processors in the computer of that system or device read and execute the program. It can also be realized by a circuit (e.g., ASIC) that implements one or more functions.

[0152] The technical ideas derived from this disclosure are not limited to the exemplary embodiments disclosed, but are intended to encompass various modifications of the exemplary embodiments, or substitutions with equivalent structures or functions. The scope of the following claims should be interpreted in the broadest way to encompass all such modifications and equivalent structures and functions.

[0153] This application claims priority based on Japanese Patent Application No. 2024-212427, filed on December 5, 2024, and all of its contents are incorporated herein by reference.

Claims

1. A display control device comprising: an acquisition means for acquiring a prompt which is text information describing the content of a first image, and a second image associated with the prompt which is based on the first image and has a lower resolution than the first image; a generation means for generating a third image from the prompt which is an image that reproduces the content described by the prompt and has a higher resolution than the second image; and a control means for controlling the display means to resize the second image corresponding to the selected image and display it on the display means when an image selected from a list of the second images is displayed on the display means, and then display the third image corresponding to the selected image, which was generated by the generation means, on the display means.

2. The display control device according to claim 1, characterized in that the control means selects whether or not to display the third image according to the resolution of the display means.

3. The display control device according to claim 2, characterized in that the control means continues to display the second image without displaying the third image when the resolution of the display means is lower than a predetermined threshold.

4. The display control device according to claim 3, characterized in that the acquisition means further acquires a fourth image with a higher resolution than the second image associated with the prompt, and the control means controls the display means to display the fourth image on the display means without displaying the second image on the display means when the resolution of the display means is equal to or greater than a predetermined threshold.

5. The display control device according to claim 4, characterized in that the fourth image has a lower resolution than the third image.

6. The display control device according to claim 4, characterized in that the second image or the fourth image is created by resizing the first image, and the generation means generates the third image from the prompt by referring to the second image or the fourth image created by resizing.

7. The display control device according to claim 1, characterized in that the generation means generates the third image from the prompt using a generation AI model.

8. The display control device according to claim 1, further comprising a second generation means for generating the prompt from the first image.

9. The display control device according to claim 1, further comprising a third generation means for generating the second image from the prompt.

10. The display control device according to claim 1, further comprising recording means for recording the prompt and the second image in association.

11. The display control device according to claim 1, further comprising a means for capturing the first image.

12. A display control method characterized by comprising: an acquisition step of acquiring a prompt which is text information describing the content of a first image, and a second image associated with the prompt which is based on the first image and has a lower resolution than the first image; a generation step of generating a third image from the prompt which is an image that reproduces the content described by the prompt and has a higher resolution than the second image; and a control step of, when displaying an image selected from a plurality of the second images displayed in a list on a display means, resizing the second image corresponding to the selected image and displaying it on the display means, and then controlling the display means to display the third image corresponding to the selected image which was generated in the generation step.

13. A program for causing a computer to function as one of the means of a display control device according to any one of claims 1 to 11.

14. A computer-readable storage medium storing a program for causing the computer to function as each of the means of the display control device described in any one of claims 1 to 11.