Display method and display apparatus
The display method ensures consistent image features by updating templates with changing text, improving display effects and user experience in electronic devices.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- HUAWEI TECH CO LTD
- Filing Date
- 2024-12-20
- Publication Date
- 2026-06-25
AI Technical Summary
Current image generation methods for text-to-image synthesis in electronic devices often produce inconsistent features of objects and backgrounds in images corresponding to the same story, leading to inferior display effects and user experience.
A display method that generates images based on templates updated with changing text, ensuring consistency in object and background features across multiple images by updating parameters in the template as the text progresses, using parameters such as scene and object attributes.
Maintains consistent background and object features across images, enhancing display quality and user experience by aligning images with the narrative context of the story.
DISPLAY METHOD AND DISPLAY APPARATUS
TECHNICAL FIELD
[0001] The present disclosure relates to the field of computer technology, and in particular, to a display method and a display apparatus.
BACKGROUND
[0002] Text-to-image generation is a research hotspot in the field of artificial intelligence, which involves the intersection of natural language processing (NLP) and computer vision (CV). This technology aims to have an electronic device generate images corresponding to a given text description. With the development of deep learning, especially the application of generative adversarial networks (GANs), variational autoencoders (VAEs), transformers and diffusion models, the quality of text-to-image generation has improved significantly.
[0003] In current image generation methods, for the same input (i.e., prompt text), the output images produced by running image generation multiple times may be different. For example, features of an object, such as the face, hairstyle or clothing of a person, may differ across the multiple images. Similarly, the background, such as the location it depicts, may differ across the multiple images.
[0004] In one possible application, an image generation algorithm may be used by the electronic device to display images corresponding to content being read by the user (e.g., a story), which may help the user gain a more intuitive feeling for and understanding of the current story. Generally, if an object stays in the same location throughout the same story, then unless the text content of the story changes significantly, the same features of the object should be maintained and the same background should be presented in each generated image. However, current technology may not guarantee this result, and the same object in the same story may have different features or backgrounds in the images, thereby leading to inferior display effects of the images on the electronic device and degrading the reading experience of users.
SUMMARY
[0005] The present disclosure provides a display method and apparatus, which may make the features of the same objects and the background, in multiple images corresponding to the same story, as consistent as possible, thereby improving display effects of the images on an electronic device.
[0006] According to a first aspect, a display method is described. The method may be performed by an electronic device, or by a chip, circuit, or module within the electronic device. The display method includes: displaying a first interface of a reading application, where the first interface includes a first text of a story and a first image corresponding to the first text; and displaying a second interface of the reading application in response to a page-turning operation performed by a first user on the first interface, where the second interface includes a second text of the story and a second image corresponding to the second text. At least part of a background of the second image is the same as a background of the first image, and at least part of features of one or more objects in the second image are the same as features of the one or more objects in the first image.
[0007] In the embodiments of the present disclosure, an image corresponding to a current text of a story is generated based on the current text, and the image corresponding to the current text of the story may be kept as consistent as possible with the features of one or more objects and the background in the images corresponding to previous or following text of the story, thereby improving the display effects of the images on the electronic devices and the reading experience of users.
[0008] In a possible design, the first interface is displayed in response to an operation of the first user, and the method further includes: displaying a third interface of the reading application in response to an operation of a second user, where the third interface includes the first text of the story and a third image corresponding to the first text, a background of the third image is different from the background of the first image, and / or features of the one or more objects in the third image are different from the features of the one or more objects in the first image.
[0009] Based on this design, for the same story, different images may be displayed for different users, thereby enhancing the flexibility and diversity of the image display method and improving the reading experience of users.
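As a non-limiting illustration of how the same story could yield different images for different users, the following Python sketch keeps one independent set of image parameters per user and story. The class name `PerUserTemplates`, the dictionary layout, and the example region values are hypothetical and are not taken from the disclosure.

```python
from typing import Any, Dict, Tuple


class PerUserTemplates:
    """Keeps one independent parameter set per (user, story) pair (hypothetical helper)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], Dict[str, Any]] = {}

    def get(self, user_id: str, story_id: str) -> Dict[str, Any]:
        # Create an empty parameter set on first access for this user and story.
        return self._store.setdefault((user_id, story_id), {"scene": {}, "objects": {}})


store = PerUserTemplates()
# The two users read the same story but hold different scene parameters,
# so images generated from these parameters may differ between them.
store.get("first_user", "story_1")["scene"]["region"] = "mountain village"
store.get("second_user", "story_1")["scene"]["region"] = "seaside town"
```

Because each user has a separate parameter set, a change made for one user does not affect the images shown to another user reading the same text.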
[0010] In a possible design, the first image is generated based on a first template of the story, the first template includes parameters of one or more scenes and parameters of the one or more objects, the background of the first image is generated based on the parameters of the one or more scenes included in the first template, and the features of the one or more objects in the first image are generated based on the parameters of the one or more objects included in the first template. The second image is generated based on a second template of the story, and the second template is obtained by updating values of one or more parameters in the first template with the second text.
[0011] Based on this design, an image corresponding to a current text of a story is generated based on the current text and parameters in the template of the story, and the parameters in the template are updated as the current text changes. The parameters in the template retain parameters of one or more scenes and parameters of the one or more objects. Thus, the background and the features of one or more objects in the image corresponding to the current text of the story may be kept as consistent as possible with the background and the features of the one or more objects in the previously generated image, thereby improving the display effects of the images on the electronic devices and the reading experience of users.
[0012] In a possible design, parameters of a scene of the one or more scenes include at least one of: a region, location or time corresponding to the scene; and / or parameters of an object of the one or more objects include at least one of: a name, gender, role, facial features, costume, hair, size, or age of the object, or a spatial position of the object in the first image.
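For illustration only, the scene and object parameters listed above might be held in a data structure along the following lines. This is a minimal Python sketch: the class names, field types, the use of `None` for unknown values, and the normalized `(x, y)` position are assumptions, not details given in the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SceneParams:
    # Scene parameters named in the text; None means "not yet known".
    region: Optional[str] = None
    location: Optional[str] = None
    time: Optional[str] = None


@dataclass
class ObjectParams:
    # Object parameters named in the text; only the name is required here (assumption).
    name: str
    gender: Optional[str] = None
    role: Optional[str] = None
    facial_features: Optional[str] = None
    costume: Optional[str] = None
    hair: Optional[str] = None
    size: Optional[str] = None
    age: Optional[str] = None
    # Spatial position in the image; a normalized (x, y) pair is an assumption.
    position: Optional[Tuple[float, float]] = None


@dataclass
class StoryTemplate:
    scenes: List[SceneParams] = field(default_factory=list)
    objects: Dict[str, ObjectParams] = field(default_factory=dict)


# Hypothetical example values for a story with one scene and one object.
template = StoryTemplate(
    scenes=[SceneParams(region="coastal village", location="harbor", time="dusk")],
    objects={"Mia": ObjectParams(name="Mia", role="protagonist",
                                 hair="short black", costume="red raincoat")},
)
```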
[0013] In a possible design, the first image is a first generated image for the story, and the first template is extracted based on the first text.
[0014] In a possible design, the first image is not a first generated image for the story, and the first template is generated by updating a template corresponding to a previous image of the first image based on the first text.
[0015] Based on this design, values of one or more parameters in the first template may be updated as the text of the story changes, thereby ensuring that the generated image matches the text of the story, and the background and features of the one or more objects in the image corresponding to the context of the story may be maintained as consistent as possible, thereby improving the display effects of the images on the electronic device.
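The update of a template as the text changes can be sketched as a merge in which only the parameters affected by the new text are overwritten and all other parameters are carried over. The Python below is one hypothetical way to express this; the dictionary layout, the function name `update_template`, and the `changes_from_second_text` value (which stands in for whatever is extracted from the second text) are assumptions.

```python
import copy
from typing import Any, Dict


def update_template(previous: Dict[str, Any], changes: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new template: previous values, overridden only where `changes` says so."""
    updated = copy.deepcopy(previous)
    for key, value in changes.items():
        if isinstance(value, dict) and isinstance(updated.get(key), dict):
            # Merge nested parameter groups instead of replacing them wholesale.
            updated[key] = update_template(updated[key], value)
        else:
            updated[key] = copy.deepcopy(value)
    return updated


first_template = {
    "scene": {"location": "forest", "time": "morning"},
    "objects": {"Mia": {"hair": "short black", "costume": "red raincoat"}},
}
# Hypothetical result of analyzing the second text: night falls, Mia changes coat.
changes_from_second_text = {
    "scene": {"time": "night"},
    "objects": {"Mia": {"costume": "blue coat"}},
}
second_template = update_template(first_template, changes_from_second_text)
```

In this sketch the location and hair parameters survive unchanged into the second template, which is what allows the background and object features to remain consistent across images.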
[0016] In a possible design, the first template is determined based on a user input for the first image, and the user input is used to set at least one of the features of the one or more objects, or the one or more scenes.
[0017] Based on this design, the background or features of the object in the image displayed on a reading application may be set by users based on their own needs, and the reading application on the electronic device may display more personalized images for users, thereby further improving the reading experience of users.
[0018] In a possible design, the user input is used to set the features of an object in the one or more objects, and before displaying the first interface of the reading application, the method further includes: displaying a fourth interface of the reading application, where the fourth interface includes a first control, and the first control is used to select an image of the object, and the image of the object indicates the features of the object; displaying a fifth interface of the reading application in response to a user operation on the first control, where the fifth interface includes at least one of one or more second controls for selecting one or more candidate images of the object, or a third control for selecting an image of the object captured by a camera, and the user input is a user operation on any one of the one or more second controls or the third control.
[0019] Based on this design, there may be multiple ways for users to set the features of one or more objects, for example, choosing an image from the device gallery or capturing an image with a camera on the electronic device, thereby improving the diversity of device interactions.
[0020] In a possible design, the user input is used to set a scene in the one or more scenes, and before displaying the first interface of the reading application, the method further includes: displaying a sixth interface of the reading application, where the sixth interface includes at least one of one or more fourth controls for selecting one or more candidate scenes, or an input box for inputting parameters of the scene, and the user input is a user operation on any of the one or more fourth controls or an input operation in the input box.
[0021] Based on this design, there may be multiple ways for users to set the parameters of the one or more scenes of the story. For example, the current location of the user may be set as a region where the scene occurs or a region corresponding to a text input by the user may be set as the region where the scene occurs, thereby improving the diversity of device interactions.
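As a hedged illustration of the two input paths described above (selecting a candidate scene or typing into an input box), the following Python sketch resolves the region of a scene from exactly one of the two inputs. The function name, the candidate list, and the rule that exactly one input must be supplied are assumptions made for this example.

```python
from typing import List, Optional


def scene_region_from_input(candidates: List[str],
                            selected: Optional[int] = None,
                            typed: Optional[str] = None) -> str:
    """Return the region chosen via a candidate control or via the input box."""
    if (selected is None) == (typed is None):
        raise ValueError("provide exactly one of 'selected' or 'typed'")
    if typed is not None:
        region = typed.strip()
        if not region:
            raise ValueError("typed region is empty")
        return region
    return candidates[selected]


candidates = ["Current location", "Paris", "Kyoto"]  # hypothetical candidate scenes
template = {"scene": {"region": None}}

# The user taps the third candidate control.
template["scene"]["region"] = scene_region_from_input(candidates, selected=2)
region_after_tap = template["scene"]["region"]

# Alternatively, the user types a region into the input box.
template["scene"]["region"] = scene_region_from_input(candidates, typed="  Cairo ")
region_after_typing = template["scene"]["region"]
```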
[0022] In a possible design, before displaying the first interface, the method further includes: obtaining prompt parameters, where the prompt parameters are related to the first text; generating, based on the prompt parameters, a first candidate image using an image generation algorithm; and obtaining the first image by processing the first candidate image based on the first template.
[0023] Based on this design, processing the first candidate image based on the first template may make the features of the one or more objects and background in the first image more consistent with the parameters in the first template, thereby improving the consistency of the features of the one or more objects and background in multiple images displayed by the electronic device and enhancing the user reading experience.
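The order of operations in this design (obtain prompt parameters, generate a candidate image, then process the candidate based on the template) can be sketched as follows. The Python below only shows the control flow: the three callables are placeholders supplied by the caller, and the toy stand-ins operate on dictionaries rather than real image data. None of these names come from the disclosure.

```python
from typing import Any, Callable, Dict

Image = Dict[str, Any]      # placeholder for real image data
Template = Dict[str, Any]


def make_image(text: str,
               template: Template,
               get_prompt_params: Callable[[str], Dict[str, Any]],
               generate: Callable[[Dict[str, Any]], Image],
               apply_template: Callable[[Image, Template], Image]) -> Image:
    prompt_params = get_prompt_params(text)      # step 1: prompt parameters from the text
    candidate = generate(prompt_params)          # step 2: candidate image
    return apply_template(candidate, template)   # step 3: align candidate with the template


# Toy stand-ins so the sketch runs; a real system would call actual models here.
def toy_prompt_params(text: str) -> Dict[str, Any]:
    return {"prompt": text.strip()}


def toy_generate(params: Dict[str, Any]) -> Image:
    return {"prompt": params["prompt"], "background": "unconstrained", "objects": {}}


def toy_apply_template(candidate: Image, template: Template) -> Image:
    image = dict(candidate)
    image["background"] = template["scene"]
    image["objects"] = dict(template["objects"])
    return image


template = {"scene": "harbor at dusk", "objects": {"Mia": "red raincoat"}}
image = make_image("Mia waited by the boats. ", template,
                   toy_prompt_params, toy_generate, toy_apply_template)
```

Passing the three steps in as callables keeps the sketch independent of any particular image generation algorithm.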
[0024] According to a second aspect, a display apparatus is described, including a processing unit and a display unit for performing the display method as described in the first aspect.
[0025] According to a third aspect, another display apparatus is described. The display apparatus includes a memory and one or more processors. The memory stores a part or all of a necessary computer program or instructions for implementing a function in the first aspect. The one or more processors may execute the computer program or the instructions, and when the computer program or the instructions is / are executed, the display apparatus is enabled to implement the method in any possible design or implementation of the method of the first aspect.
[0026] In a possible design, the display apparatus may further include an interface circuit, and the one or more processors are configured to communicate with another apparatus or component through the interface circuit.
[0027] In a possible design, the memory may be integrated with the processor or implemented as a separate component.
[0028] According to a fourth aspect, a computer-readable storage medium is described. The computer-readable storage medium has instructions stored thereon which, when executed by a display apparatus, cause the display apparatus to perform the method in any one of the possible designs of the first aspect.
[0029] According to a fifth aspect, a computer program product is described. The computer program product stores instructions which, when executed, cause a display apparatus to perform the method in any one of the possible designs of the first aspect.
BRIEF DESCRIPTION OF THE DRAWINGS
[0030] For a better understanding of the various described embodiments, reference should be made to the Detailed Description below, in conjunction with the following drawings in which like reference numerals refer to corresponding parts throughout the figures.
[0031] FIG. 1 illustrates a hardware structure of an example electronic device in which one or more embodiments of the present disclosure may be implemented.
[0032] FIG. 2 illustrates a software architecture diagram of an example electronic device in which one or more embodiments of the present disclosure may be implemented.
[0033] FIG. 3 illustrates a schematic diagram of an example process of a user opening and reading an e-book.
[0034] FIG. 4 illustrates a schematic diagram of example interfaces for reading.
[0035] FIG. 5 illustrates a flowchart of an image generation method in accordance with some embodiments of the present disclosure.
[0036] FIG. 6 illustrates a schematic diagram of a process for a user to read an e-book in accordance with some embodiments of the present disclosure.
[0037] FIGS. 7A and 7B illustrate a schematic diagram of a process of customizing images of objects in accordance with some embodiments of the present disclosure.
[0038] FIG. 8 illustrates a schematic diagram of a process of customizing a region where a story occurs in accordance with some embodiments of the present disclosure.
[0039] FIG. 9 illustrates a schematic diagram of example interfaces for a user reading an e-book in accordance with some embodiments of the present disclosure.
[0040] FIG. 10 illustrates a schematic diagram of a software architecture in accordance with some embodiments of the present disclosure.
[0041] FIGS. 11 to 12 are schematic diagrams of display apparatuses in accordance with some embodiments of the present disclosure.
DETAILED DESCRIPTION
[0042] Technical solutions in some embodiments of the present disclosure will be described clearly below with reference to the accompanying drawings.
[0043] To better understand the embodiments of the present disclosure, an example electronic device in which one or more embodiments of the present disclosure may be implemented is introduced first.
[0044] An image generation method according to some embodiments of the present disclosure may be performed at the electronic device. Such electronic devices may be terminal devices with shooting functions, such as mobile phones, tablet computers, smart screens, laptops, vehicle-mounted devices, wearable devices (like smart watches), ultra-mobile personal computers (UMPCs), netbooks, personal digital assistants (PDAs), artificial intelligence (AI) devices, smart home appliances, etc.
[0045] The hardware structure and software architecture of the electronic device will be introduced in combination with FIGS. 1 and 2.
[0046] FIG. 1 illustrates a hardware structure of an example electronic device in which one or more embodiments of the present disclosure may be implemented. The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a sensor module 180, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, etc.
[0047] It is understood that the hardware structure illustrated in FIG. 1 is only an example, and does not constitute a limitation on the electronic device 100. In some other embodiments, the electronic device 100 may include more, fewer, or different components than shown in FIG. 1. In some other embodiments, some of the components may be split or may be combined. The illustrated components may be implemented in hardware, software, or a combination thereof.
[0048] The processor 110 may include one or more processing units, such as an application processor (AP), a modem (modulator / demodulator) processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and / or a neural network processing unit (NPU). Herein, different processing units may be independent devices or integrated within one or more processors.
[0049] The controller may be a neural center and command center of the electronic device 100. The controller may generate control signals based on instructions and timing signals to complete the control of fetching instructions and executing instructions.
[0050] In addition, the memory may be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The cache memory may store instructions or data that the processor 110 has just used or repeatedly used.
[0051] The internal memory 121 may be used to store computer executable program code, which includes instructions. The processor 110 may execute various functions and data processing of the electronic device 100 by running instructions stored in the internal memory 121. The internal memory 121 may include a program storage region and a data storage region. In addition, the internal memory 121 may include a high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, and universal flash storage (UFS).
[0052] The external memory interface 120 may be used for connecting an external storage card, such as a Micro SD card, to expand the storage capacity of the electronic device 100. The external storage card communicates with the processor 110 through the external memory interface 120 to achieve data storage functions. For example, music, videos, and other files may be saved in the external memory card. The external memory interface 120 may include connectors, data transmission lines, control lines, power lines, interface chips, etc.
[0053] The USB interface 130 may be an interface compliant with USB standard specifications, which may be a Mini USB interface, Micro USB interface, or USB Type C interface, etc. The USB interface 130 may be used to connect a charger for charging the electronic device 100, and it may also be used for data transmission between the electronic device 100 and peripheral devices or other devices, such as headphones, AR devices, etc.
[0054] The charging management module 140 may be used to receive charging input from a charger. While charging the battery 142, the charging management module 140 may also supply power to the electronic device 100 through the power management module 141. The power management module 141 is connected to the battery 142 and receives input from the battery 142, and / or is connected to the charging management module 140 to supply power to the components in the electronic device 100. The charging management module 140 may include a power management chip, a current and voltage detection circuit, a temperature sensor, a charging control circuit, a communication interface, etc. The power management module 141 may include power conversion circuits, voltage regulation circuits, current limiting circuits, battery charging management circuits, power monitoring circuits, power factor correction circuits, and communication interfaces, etc. The battery 142 may include positive and negative electrode materials, electrolytes, and electrode leads, etc.
[0055] The wireless communication function of the electronic device 100 may be achieved through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modulator / demodulator processor, and the baseband processor, etc.
[0056] The antenna 1 and antenna 2 may be used for transmitting and receiving electromagnetic wave signals. The antenna may include radiation elements, feed networks, reflectors or directors, support structures, matching networks, etc.
[0057] The mobile communication module 150 may be used for providing wireless communication solutions including 2G / 3G / 4G / 5G for use on the electronic device 100. The mobile communication module 150 may include at least one filter, switch, power amplifier, and low noise amplifier (LNA).
[0058] The wireless communication module 160 may provide solutions for wireless communication including wireless local area networks (WLANs) such as wireless fidelity (Wi-Fi) networks, Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR) technology and other wireless communication technologies, for use in the electronic device 100. The wireless communication module 160 may include a radio frequency transmitter and receiver, a baseband processor, a power amplifier, a low noise amplifier, an antenna interface, a control unit, a storage unit, and a power management unit, etc.
[0059] The electronic device 100 achieves display functionality through the GPU, the display screen 194, and the application processor, etc. The GPU may be a microprocessor for image processing, connecting the display screen 194 and the application processor. The GPU may be used to perform mathematical and geometric calculations for graphical rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or modify display information.
[0060] The display screen 194 may be used to display images, videos, etc. The display screen 194 includes a display panel. In some embodiments, the electronic device 100 may include one or more display screens.
[0061] The electronic device 100 may achieve the shooting function through the ISP, the DSP, the camera 193, the video codec, the GPU, the display screen 194, and the application processor, etc.
[0062] The camera 193 may be used to capture still images or videos. For example, when taking a picture, a shutter of the camera 193 may be opened and light passes through one or more lenses of the camera 193 to a photosensitive element of the camera 193. The photosensitive element may convert the light signal into an electrical signal and send the electrical signal to the ISP for processing. The photosensitive element may be a charge-coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. In some embodiments, the electronic device 100 may include one or more cameras 193.
[0063] The electronic device 100 may achieve audio functions, such as music playback, recording, etc., through the audio module 170 and the application processor. The audio module 170 may be used to convert digital audio information into analog audio signals for output, and also to convert analog audio input into digital audio signals. The audio module 170 may also be used for audio signal encoding and decoding. In some embodiments, the audio module 170 may be located within the processor 110, or some functions of the audio module 170 may be integrated into the processor 110. The audio module 170 may include an audio codec, an audio amplifier, audio input interfaces, audio output interfaces, a digital signal processor (DSP), an audio storage unit, and a control unit, etc.
[0064] The sensor module 180 may include a pressure sensor, a gyroscope sensor, a barometric pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, an ambient light sensor, a fingerprint sensor, a temperature sensor, a touch sensor, and / or a bone conduction sensor, etc. The gyroscope sensor may be used to determine the posture of the electronic device 100. The ambient light sensor may be used to sense the brightness of the ambient light.
[0065] The SIM card interface 195 may be connected to a SIM card. The SIM card may be connected to or disconnected from the electronic device 100 by inserting or removing it from the SIM card interface 195.
[0066] It is understood that the above examples do not constitute limitations on the electronic device 100. In some other implementations, the electronic device 100 may also have other components.
[0067] The electronic device 100 may run an operating system (OS), which may be an operating system used in the industry, for example, an operating system developed based on OpenHarmony, such as HarmonyOS; another operating system such as an Android™ or iOS mobile operating system; an open-source operating system or a derivative thereof, such as Linux OS; another embedded operating system; or a future new type of operating system, such as an operating system based on artificial intelligence. The operating system includes a set of interrelated system software or programs that manage and control the operation of the electronic device, use and run hardware and software resources, and provide public services to organize user interactions. The operating system in the electronic device 100 may connect downward to physical devices in a hardware layer, and may also provide a running environment for application software.
[0068] The operating system may include a kernel layer, a middleware layer, and an application layer. The application layer may include applications, and the applications may include system applications and third-party applications. The middleware layer may include a series of software that provides various services to application developers, or frameworks that provide services such as databases, multimedia, and graphics, or capabilities such as distributed scheduling and system expansion. For example, the middleware layer may include a framework layer and / or a system service layer. The framework layer may provide application programming interfaces (APIs) and programming frameworks for applications of the application layer. The system service layer may include a set of system core capabilities, which provide services to the applications of the application layer through the framework layer. The kernel layer is a layer between hardware and software. The kernel layer may include hardware drivers and an operating system kernel. In addition to providing hardware drivers, the kernel layer may also support memory management and system process management functions.
[0069] There are various types and shapes of electronic devices used in our daily lives, and the scenarios in which electronic devices are used are also very diverse. Therefore, based on the shapes and functions of different electronic devices, different application scenarios and the needs of different users, the operating systems used in electronic devices may also be different. The basic functions implemented by the electronic device according to the embodiments of the present disclosure may be realized by a general operating system or a dedicated operating system. In order to more clearly introduce the implementation of the embodiments of the present disclosure in a specific operating system, the architecture of HarmonyOS is shown below. Based on this, a person skilled in the art may infer the implementation of the embodiments of the present disclosure in other specific operating systems, such as Android™.
[0070] FIG. 2 illustrates a software architecture diagram of an example electronic device. In this example, the software architecture of the electronic device 200 may be divided into several layers, which in some implementations are, from bottom to top: a kernel layer, a system service layer, a framework layer, and an application layer. These layers communicate with each other through software interfaces. System functions may be trimmed, added, or combined at subsystem granularity in different deployment scenarios for device form factors, and each subsystem may be trimmed, added, or combined at the granularity of functions.
[0071] 1. Kernel layer: the kernel layer is also known as a kernel abstraction layer (KAL), which may shield differences in kernel implementations and provide the upper layers with basic kernel capabilities, including but not limited to process / thread management, memory management, file system, network management, and peripheral management. The kernel layer may include kernel subsystems and driver subsystems.
[0072] Kernel subsystem: the kernel subsystem may support the selection of suitable OS kernels for electronic devices with various resource limitations, including but not limited to Linux kernel, HarmonyOS kernel, Lite Operating System (LiteOS), etc.
[0073] Driver subsystem: the driver subsystem is the foundation for an open hardware ecosystem, and may allow a unified access from peripheral devices and provide a driver framework for driver development and management. The driver framework may include display driver, camera driver, audio driver, Bluetooth driver, sensor driver, etc.
[0074] 2. System Service Layer: the system service layer may include a set of system core capabilities, which may provide services for the applications through the framework layer. The system service layer may include multiple subsystem sets, including but not limited to:
[0075] Basic capability subsystem set: which may provide basic capabilities for running, scheduling, migration, etc. of distributed applications across multiple devices; the basic capability subsystem set may include distributed soft (DSoft) bus, distributed data management, distributed task scheduling, Ark multi-language runtime; it may also include a multi-modal input subsystem, graphics subsystem, security subsystem, AI subsystem, etc.
[0076] Basic software service subsystem set: which may provide common and universal software services, and it may include an event and notification subsystem, a telephony subsystem, a multimedia subsystem, etc.
[0077] Enhanced software service subsystem set: which may provide differentiated and enhanced software services dedicated to electronic devices, and it may include proprietary business subsystems for smart screens, wearables, and IoT devices, etc.
[0078] Hardware service subsystem set: which may provide hardware-related services, and it may include location service subsystem, user identity and access management (IAM) subsystem, wearable proprietary hardware service subsystem, biometric recognition subsystem, IoT proprietary hardware service subsystem, etc.
[0079] The distributed task scheduling described above may achieve distributed service management (including discovery, synchronization, registration, and invocation), supporting operations such as remote startup, remote invocation, remote connection, and migration for cross-device applications.
[0080] The distributed data management may achieve full-scene, cross-device data synchronization, data storage, data sharing, and data access functions.
[0081] The distributed soft (DSoft) bus may provide communication-related capabilities for seamless interconnection between multiple electronic devices, including: WLAN service capabilities, Bluetooth service capabilities, soft bus, remote procedure call (RPC) (i.e., a paradigm of inter-process communication (IPC)), and NearLink communication capabilities.
[0082] The Ark multi-language runtime may be a unified compiler and runtime platform designed to support joint compilation and execution of multiple programming languages and multiple chip platforms.
[0083] 3. Framework layer: the framework layer provides application programming interfaces (APIs) and programming frameworks for the applications of the application layer. The framework layer may include the ArkUI framework (which provides a complete infrastructure for user interface (UI) development of system applications, including UI functionalities such as components, layouts, animations, and interaction events, as well as real-time interface preview tools), a user program framework, and the Ability framework (which is a lightweight application, and schedules and manages the running and life cycle of Ability). Different electronic devices may have different operating systems, and the APIs supported by them may also be different.
[0084] HarmonyOS API may provide a set of open capabilities for supporting the development of HarmonyOS applications. The HarmonyOS API may be provided at the framework layer or independently of the framework layer. The HarmonyOS APIs may include Audio API (audio service) , Push API (push service) , Account API (account service) , etc.
[0085] 4. Application layer: the application layer may include multiple applications. The applications may include system applications and extended applications / third-party applications. The system applications may include desktop, notification bar, settings, contacts, phone, camera, etc. The extended applications / third-party applications may include social applications, travel applications, reading applications, etc.
[0086] It is understood that the software architecture shown in FIG. 2 is only an example, and in some other implementations, the electronic device 200 may include more or fewer modules to achieve corresponding functions.
[0087] In order to facilitate the understanding of the technical solutions in the embodiments of the present disclosure, relevant terminologies herein will be briefly introduced first.
[0088] 1. Natural Language Processing
[0089] Natural language processing (NLP) is a technique that enables interaction and communication between humans and machines using natural language employed in human communication. It is an important direction in the fields of computer science and artificial intelligence. NLP research involves the development of theories and methods that enable effective communication between humans and machines using natural language. The NLP involves multidimensional operations such as speech, grammar, semantics, and pragmatics, and the basic task of NLP is to segment input text based on lexicon, term frequency statistics, and contextual semantic analysis, resulting in word items that are semantically rich and expressed at the minimum grammatical level. NLP is mainly used in machine translation, sentiment monitoring, automatic summarization, opinion extraction, text classification, question answering, text semantic comparison, speech recognition, and Chinese OCR.
[0090] 2. Large Language Model
[0091] A large language model (LLM) is an artificial intelligence model designed to understand and generate human language. LLMs are trained on large amounts of text data and can perform a wide range of tasks, including text summarization, translation, sentiment analysis, etc. The key feature of LLMs is the large scale, with billions of parameters, which may help them learn complex patterns in language data. LLMs are typically based on deep learning architectures such as transformers, which may help them achieve impressive performance on various NLP tasks.
[0092] 3. Named Entity Recognition
[0093] Named entity recognition (NER), also referred to as entity identification, entity chunking, or entity extraction, is a technique in NLP and a component of information extraction that focuses on identifying named entities in unstructured text and classifying them into a set of predefined categories. The purpose of NER is to automatically extract structured information from unstructured text, enabling machines to understand and categorize entities in a meaningful manner for applications such as text summarization, question answering, and knowledge graph construction. An entity is a thing that is consistently talked about or referred to in the text, such as a person name, organization, location, time expression, quantity, or percentage, among other predefined categories.
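As a rough illustration of the NER idea described above, the following sketch tags entities by looking them up in a small hand-written lexicon. The lexicon contents, the category labels, and the function name `tag_entities` are assumptions made only for this example; practical NER systems typically rely on trained models rather than lookups.

```python
import re

# Hypothetical lexicon mapping surface forms to predefined categories.
LEXICON = {
    "Ada Chen": "PERSON",
    "David": "PERSON",
    "Paris": "LOCATION",
}


def tag_entities(text):
    """Return (surface form, category, start offset) for each lexicon hit."""
    hits = []
    for surface, category in LEXICON.items():
        for match in re.finditer(r"\b" + re.escape(surface) + r"\b", text):
            hits.append((surface, category, match.start()))
    # Report entities in the order in which they appear in the text.
    return sorted(hits, key=lambda hit: hit[2])


entities = tag_entities("Ada Chen met David at her salon in Paris.")
```

The result is a list of categorized entities extracted from unstructured text, which is the kind of structured information from which template parameters could be filled.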
[0094] Text-to-image generation is a research topic in the field of artificial intelligence, which involves the intersection of NLP and computer vision (CV) . The technology aims to synthesize images based on a given description in natural language. For example, when a user is reading text, an image generation algorithm that displays, for a user, images corresponding to a story being read may help the user understand and feel the current story more intuitively. For example, in an example of an e-book application (also known as a reading application) , the user may open the e-book application on his electronic device and read a story. The electronic device may display the text of the e-book and images generated by the image generation algorithm based on the text during the reading process.
[0095] The process of opening the e-book application in the electronic device is described below, using the electronic device being a smartphone as an example.
[0096] FIG. 3 illustrates a schematic diagram of a process of a user opening and reading an e-book. As shown in FIG. 3, interface 301 is a main interface of the smartphone, which displays icons of applications such as camera, address book, call, SMS, clock, and e-book. The E-book icon on the interface 301 may be clicked by a user, which may be referred to as operation 302. In response to the operation 302, the smartphone opens the e-book application and displays interface 303. The interface 303 is an interface of the e-book application, which displays names of 6 e-books, corresponding icons of the 6 e-books, and a setting control. Among them, the setting control is used to manage the e-books and set parameters of the e-book application. The Book2 icon on the interface 303 may be clicked by the user, which may be referred to as operation 304. In response to the operation 304, the smartphone opens book2 and displays interface 305. The interface 305 includes text of a first page of book2 and an image generated based on the text of the first page. Furthermore, a page-turning operation may be performed on the interface 305 by the user to make the smartphone display text and an image of another page.
[0097] With the development of deep learning, especially the application of GANs, VAEs, transformers, and diffusion models, the quality of text-to-image generation has improved significantly. For example, some image generation methods, such as latent diffusion models, recover sampled variables from Gaussian noise to the distribution of sample data by denoising. These latent diffusion models accept auxiliary text input and represent it as a vector in the latent space to generate latent vectors. Then the latent vectors serve as inputs to the diffusion network, so the latent diffusion model may generate high-quality images based on text input.
[0098] However, the images generated by a text-to-image generation algorithm are subject to randomness. Because Gaussian noise is randomly sampled in each iteration and each step, running image generation multiple times for the same input (i.e., prompt text) may produce images that differ from one another. For example, features of an object in the multiple images may be different, such as the gender, hairstyle, or clothing of the object (e.g., a character). Similarly, the background of the multiple images may be different, such as the location shown in the background. For example, as shown in FIG. 4, for the same text, the images generated in interface 401 and interface 402 have different features of objects (i.e., characters in FIG. 4). Both characters in the interface 401 are female, while one character in the interface 402 is female and the other is male. The backgrounds of the images generated for the same text are also different. In the interface 401, the two characters are walking against a background with vehicles, and one of them is holding an umbrella. In the interface 402, the two characters are walking against a background with the moon and no vehicles, and neither character is holding an umbrella.
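The effect of random noise sampling can be pictured with a toy stand-in. The sketch below is not a diffusion model: `toy_generate` is a hypothetical function that merely adds Gaussian noise to a prompt-dependent value, which is enough to show that the same prompt yields different outputs unless the random state is identical.

```python
import random


def toy_generate(prompt, seed, size=4):
    """Stand-in for a generator: a prompt-dependent bias plus Gaussian noise."""
    rng = random.Random(seed)
    bias = (sum(ord(ch) for ch in prompt) % 100) / 100.0
    return [bias + rng.gauss(0.0, 1.0) for _ in range(size)]


prompt = "two people walking at night"
image_a = toy_generate(prompt, seed=1)
image_b = toy_generate(prompt, seed=2)        # same prompt, different noise
image_a_again = toy_generate(prompt, seed=1)  # same prompt, same noise
```

Here `image_a` and `image_b` differ although the prompt is unchanged, while `image_a_again` reproduces `image_a` because the noise is identical.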
[0099] In this case, the consistency of the features of the object and background within the same story may not be ensured. For example, where the object is a person, if the text on two consecutive pages does not change the features of the person but the images present different faces or hairstyles, the user may be confused. If the text describes a location and a corresponding background image for the location was created on the previous page, the user may expect to see the same background in the image displayed on the current page. However, it may be difficult for current image generation algorithms to maintain consistency in the features of the object and background based on context throughout the story, which may lead to inferior display effects of the images and degrade the user reading experience.
[0100] In light of this, some embodiments of the present disclosure provide a display method, where an image corresponding to current text of a story is generated based on the current text, and features of one or more objects and background in the image corresponding to the current text of the story may be kept as consistent as possible with features of the one or more objects and background in images corresponding to previous and following text of the story, thereby improving the display effects of the images on the electronic devices and the reading experience of users.
[0101] FIG. 5 illustrates a flowchart of a display method in accordance with some embodiments of the present disclosure. The display method 500 is performed at an electronic device (e.g., one or more processors and related components of the electronic device), which may have, for example, the hardware structure shown in FIG. 1 and / or the software architecture shown in FIG. 2, but the embodiments of the present disclosure are not limited thereto. The display method 500 includes steps 501 to 502.
[0102] In step 501, the electronic device displays a first interface of a reading application, where the first interface includes a first text of a story and a first image corresponding to the first text.
[0103] In one possible implementation, the first image may be generated based on a first template of the story, and the first template includes parameters of one or more scenes and parameters of one or more objects. The background of the first image is generated based on the parameters of the one or more scenes included in the first template, and features of the one or more objects in the first image are generated based on the parameters of the one or more objects included in the first template.
[0104] In one possible implementation, the first template is an initial template. For example, the first text is a text on a first page of a current story of an e-book, and no other template exists before step 501; the first template is then determined based on the first text. In another possible implementation, if there is a previous text for the story before the first text, the first template may be obtained by updating a template corresponding to the previous text using the first text. For example, the first template is obtained by updating a template corresponding to a previous page of the current story of the e-book by using the first text on the current page of the current story.
[0105] In step 502, the electronic device displays a second interface of the reading application in response to a page-turning operation performed by a first user on the first interface, where the second interface includes a second text of the story and a second image corresponding to the second text, at least part of a background of the second image is the same as a background of the first image, and at least part of features of one or more objects in the second image are the same as features of the one or more objects in the first image.
[0106] In one possible implementation, the second image is generated based on a second template of the story, and the second template is obtained by updating values of one or more parameters in the first template with the second text.
[0107] Based on this design, when current text of a story changes, an image corresponding to the changed text of the story is generated based on parameters in a template of the story, which are obtained by updating corresponding parameters of a previous template with the changed text. The parameters in the updated template retain parameters of one or more scenes and parameters of one or more objects. Thus, a background and features of one or more objects in the image corresponding to the changed text of the story may be kept as consistent as possible with a background and features of the one or more objects in the previously generated image, thereby improving the display effects of the images on the electronic devices and the reading experience of users.
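The update described above can be sketched as a merge in which values derived from the new text override the corresponding old values and all other parameters are carried over. The dictionary layout, the function name `update_template`, and all parameter values below are assumptions for illustration; the disclosure does not prescribe a concrete data format.

```python
import copy


def update_template(previous, changes):
    """Return a new template: previous values are kept unless `changes` overrides them."""
    updated = copy.deepcopy(previous)
    for group in ("scenes", "objects"):
        for key, params in changes.get(group, {}).items():
            updated[group].setdefault(key, {}).update(params)
    return updated


# Hypothetical first template.
first_template = {
    "scenes": {"scene1": {"location": "street", "time": "night"}},
    "objects": {"obj1": {"gender": "female", "hair": "long"}},
}
# Hypothetical values extracted from the second text.
changes_from_second_text = {"scenes": {"scene1": {"location": "park"}}}
second_template = update_template(first_template, changes_from_second_text)
```

Only the location changes in `second_template`; the time of the scene and every object parameter are retained, which is what allows the second image to stay consistent with the first.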
[0108] The second text is a text after the first text. For example, the first text may be a text on the first page, and the second text may be a text on a second page. For another example, the first text may be the text on the second page, and the second text may be a text on a fifth page.
[0109] In addition, the page-turning operation performed by the first user may turn one or more pages while the first user is reading the story. Turning multiple pages may be achieved, for example, by performing multiple single-page-turning operations, which means that the page-turning operation may include multiple single-page-turning operations. The page-turning operation performed by the first user on the first interface (e.g., the user input) and other operations referred to herein may be single-click, double-click, sliding, pinching, etc., and the operations are not limited thereto.
[0110] In some embodiments, parameters of a scene of the one or more scenes include at least one of: a region, location or time corresponding to the scene; and / or parameters of an object of the one or more objects include at least one of: a name, gender, role, facial features, costume, hair, size, or age of the object, or a spatial position of the object in the first image. The facial features (for example, chin, eyebrow, bridge of the nose, nose tip, top lip, bottom lip, etc.) may be extracted from a face image or a representation of the face image. The spatial position of the object indicates the ordinal position of the object in the first image, counted from left to right. For example, the object may be a person, and the parameters of the object may be the gender, hairstyle and clothing of the person. For another example, the object may be a car, and the parameters of the object may be a shape, color, and size of the car. In addition, one object may correspond to one role or multiple roles.
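One possible in-memory representation of these parameters is sketched below. The class names, field names, and the choice of a 1-based left-to-right index are assumptions for illustration, and only a subset of the listed parameters is shown.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SceneParams:
    region: Optional[str] = None
    location: Optional[str] = None
    time: Optional[str] = None


@dataclass
class ObjectParams:
    name: str
    gender: Optional[str] = None
    roles: list = field(default_factory=list)  # one object may have several roles
    costume: Optional[str] = None
    hair: Optional[str] = None
    age: Optional[int] = None
    position: Optional[int] = None  # assumed convention: 1 = leftmost object


def left_to_right(objects):
    """Order objects by spatial position; objects without a position come last."""
    return sorted(objects, key=lambda o: (o.position is None, o.position or 0))


# Hypothetical positions for two characters.
objs = [ObjectParams("David", position=2), ObjectParams("Ada Chen", position=1)]
ordered_names = [o.name for o in left_to_right(objs)]
```

Leaving fields optional allows a template to start sparse and be filled in as later text reveals more about a scene or an object.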
[0111] Based on this design, values of one or more parameters in the first template may be updated as the text of the story changes, thereby ensuring that the generated image matches the text of the story, and the background and features of the one or more objects in the image corresponding to the context of the story may be maintained as consistent as possible, thereby improving the display effects of the images on the electronic device.
[0112] For example, in the case of reading an e-book, Table 1 shows the first template in a form of table, and Table 2 shows the second template in a form of table. In Tables 1 and 2, parameters of one or more scenes and parameters of one or more objects are represented by entity attributes, and each entity attribute is presented in the form of key-value pairs. In Table 1, the entities include story tag, scenario, character, story region, and location, where the scenario, story region, and location, along with their corresponding entity attributes represent the parameters of the one or more scenes. The character and its corresponding entity attributes represent the parameters of the one or more objects.
[0113] In these embodiments, the first template includes a tag of the story (e.g., the story tag) .
[0114] Based on this design, the first template may include more relevant information about the story, thereby making the generated image more consistent with the story.
[0115] As shown in the entities included in Table 1, the entity attributes for the story tag are used to introduce the title, author, type of the story, etc. The entity attributes corresponding to the character describe one character: the ID of the character is ID 1, the name of the character is Ada Chen, the gender of the character is female, and the role of the character is hairstylist. Table 1
[0116] The parameters of the first template shown in Table 1 are determined based on the first text. The electronic device may update the values of the parameters in the first template shown in Table 1 based on the second text to obtain the second template shown in Table 2. In Table 2, since a new character appears in the second text and the role of the character is a customer, a character is added to the entity attributes in Table 2 with ID 2. The name of the character is David, and the gender of the character is male. Table 2
[0117] By updating Table 1 to obtain Table 2, it may be ensured that the features of the object and background of the second image generated based on Table 2 are as consistent as possible with those of the first image generated based on Table 1.
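The step from Table 1 to Table 2 can be sketched with key-value pairs as follows. Only the character entity described in the text is reproduced; the other entities are omitted, and the dictionary layout and the function name `add_or_update_character` are assumptions for illustration.

```python
# Character entity of the first template, as described for Table 1.
table_1 = {
    "character": [
        {"id": 1, "name": "Ada Chen", "gender": "female", "role": "hairstylist"},
    ],
}


def add_or_update_character(template, character):
    """Return a copy of the template with the character merged in by ID."""
    characters = [dict(c) for c in template["character"]]
    for existing in characters:
        if existing["id"] == character["id"]:
            existing.update(character)
            break
    else:
        # No character with this ID yet, so it is a newly appearing character.
        characters.append(dict(character))
    return {**template, "character": characters}


# The second text introduces a new character, as described for Table 2.
table_2 = add_or_update_character(
    table_1, {"id": 2, "name": "David", "gender": "male", "role": "customer"}
)
```

The entry for ID 1 is carried over unchanged, so an image generated from `table_2` can keep the features previously fixed for Ada Chen.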
[0118] Therefore, the display method of the embodiments of the present disclosure may continuously update values of one or more parameters in the template corresponding to the current text as the current text of the story changes, thereby retaining parameters of one or more scenes and parameters of one or more objects of the generated image of the story within the template.
[0119] As an example of the method 500, FIG. 6 illustrates a schematic diagram of an example process of a user reading an e-book on a mobile phone. As shown in FIG. 6, the mobile phone displays a first interface 601, which includes a first text 6011 of a first page of a story and a first image 6012 generated based on the first text 6011 of the first page. The user slides up on the first interface 601, which may be referred to as a user input 602. In response to the user input 602, the mobile phone displays a second interface 603. The second interface 603 includes a second text 6031 of a second page of the story and a second image 6032 generated based on the second text 6031 of the second page. It can be observed from FIG. 6 that the first text 6011 in the first interface 601 is different from the second text 6031 in the second interface 603, but the features (e.g., gender, dressing, hairstyle, body shape, etc. ) of objects (i.e., characters) of the first image 6012 in the first interface 601 are consistent with the features of the objects (i.e., characters) of the second image 6032 in the second interface 603. Furthermore, the background of the first image 6012 in the first interface 601 is consistent with the background of the second image 6032 in the second interface 603. For example, both characters are in a background with the moon, and the shape, size, and position of the moon are the same in the first image 6012 and the second image 6032.
[0120] It is understood that all images corresponding to a story are generated in accordance with the order of the story's timeline. In a display interface of an e-book, one page may display one image or multiple images, and the embodiments of the present disclosure do not limit this. With reference to the example shown in FIG. 6, in some other embodiments, the first interface 601 may include the first image and a third image, where the third image is above the first image, and a third text corresponding to the third image and the first text corresponding to the first image both belong to the text of the first page of the story, but the third text corresponding to the third image is before the first text corresponding to the first image. Therefore, the electronic device first generates the third image, updates values of one or more parameters in a template corresponding to the third image, based on the text corresponding to the first image, and then generates the first image based on the updated values of the one or more parameters in the template. The background and features of objects of the first image are consistent with those of the third image. After that, the electronic device updates the values of the one or more parameters in the template once again based on the second text of the second page and generates the second image based on the updated values of the one or more parameters in the template. Then the electronic device displays the second interface 603 including the second image 6032.
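The ordering described above amounts to a loop over text segments in story order, where each segment first updates the template and an image is then generated from the updated template. In the sketch below, `extract_changes` and `generate_image` are hypothetical placeholders (a trivial `key=value` parser and a string formatter), not the extraction model or image generator of the disclosure.

```python
def extract_changes(text):
    """Placeholder extractor (assumption): parses 'key=value' tokens from the text."""
    return dict(token.split("=", 1) for token in text.split() if "=" in token)


def generate_image(template):
    """Placeholder generator (assumption): returns a description instead of pixels."""
    return ", ".join(f"{k}={v}" for k, v in sorted(template.items()))


def images_in_story_order(segments):
    template, images = {}, []
    for text in segments:  # segments are already in story order
        template = {**template, **extract_changes(text)}
        images.append(generate_image(template))
    return images


# Third text and first text share page 1; the second text is on page 2.
images = images_in_story_order(
    ["sky=moon hair=short", "coat=red", "sky=moon umbrella=none"]
)
```

Because the template is threaded through the loop, parameters set by the third text are still present when the first and second images are generated.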
[0121] In one possible implementation, the first image is a first generated image for the story, and the first template is extracted based on the first text.
[0122] In one possible implementation, the first image is not a first generated image for the story, and the first template is generated by updating a template corresponding to a previous image of the first image based on the first text.
[0123] Based on this design, values of one or more parameters in a template may be updated as the text of the story changes, thereby ensuring that the generated image matches the text of the story, and the background and features of the one or more objects in the image corresponding to the context of the story are maintained as consistent as possible, thereby improving the display effects of the images on the electronic device.
[0124] The parameters of the one or more scenes and parameters of the one or more objects in the template in the embodiments of the present disclosure may be obtained based on the information extracted from the first text, or determined by the electronic device based on a user input. Herein, the process of generating parameters of one or more scenes and parameters of one or more objects in the template will be described in detail, using the first template as an example.
[0125] In one possible implementation, when the user does not set the content of the first image, the first template is obtained based on the information extracted from the first text by using language generation models (e.g., a natural language processing (NLP) model).
[0126] In another possible implementation, the first template is determined based on a user input for the first image, and the user input is used to set at least one of: the features of the one or more objects, or the one or more scenes.
[0127] Based on this design, the background or features of the object in the image displayed on the reading application may be set by users based on their own needs, and the reading application may display more personalized images for the users, thereby increasing the variety of settings and further improving reading experience of the users.
[0128] In some embodiments, the user input is used to set features of an object in the one or more objects, and before step 501, i.e., displaying the first interface, the method further includes: displaying a fourth interface of the reading application, where the fourth interface includes a first control, and the first control is used to select an image of the object, and the image of the object indicates the features of the object; displaying a fifth interface of the reading application in response to a user operation on the first control, where the fifth interface includes at least one of: one or more second controls for selecting one or more candidate images of the object, or a third control for selecting an image of the object captured by a camera, and the user input is a user operation on any one of the one or more second controls or the third control.
[0129] Based on this design, there may be multiple ways to set features of one or more objects, thereby improving the diversity of device interactions.
[0130] The UI interaction process for users to set features of one or more objects will be described in detail below.
[0131] FIGS. 7A and 7B illustrate a schematic diagram of a process of setting images of objects in accordance with some embodiments of the present disclosure. FIGS. 7A and 7B are illustrated using a user reading an e-book on a mobile phone as an example. As shown in FIGS. 7A and 7B, an interface 701 is an interface of an e-book application, which displays names of 6 e-books, corresponding icons of the 6 e-books, and a setting control, where the setting control may be used not only for managing the e-books and setting parameters of the e-book application but also for customizing image information displayed for the e-books. The user clicks the setting control on the interface 701, which may be referred to as an operation 702. In response to the operation 702, the mobile phone displays an interface 703. The interface 703 includes a Book1 control, Book2 control, Book3 control, Book4 control, Book5 control, and Book6 control, which allow the user to select the book whose images are to be set. The user clicks the Book4 control on the interface 703, which may be referred to as an operation 704, indicating that the user has chosen to set the images of book4. In response to the operation 704, the mobile phone displays an interface 705 shown in FIG. 7B. The interface 705 includes a “choose characters” control, a “choose story location” control and / or other controls such as a “show your choices” control and a “do not show again” control. The “choose characters” control is used to set images of one or more objects. The “choose story location” control is used to set a region where the story occurs. The “show your choices” control is used to display setting results of the user. When the user clicks the “show your choices” control, the mobile phone will display images corresponding to the one or more objects set by the user. Additionally, the “do not show again” control is used to choose not to set image information; when the user clicks the “do not show again” control, it means that the user does not want to set images.
[0132] Subsequently, the user clicks the “choose characters” control on the interface 705, which may be referred to as an operation 706. In response to the operation 706, the mobile phone displays an interface 707 (i.e., an example of the fourth interface). The interface 707 includes an “Ada Chen” control and a “David” control (i.e., an example of the first control). Ada Chen and David (i.e., an example of the object) are objects extracted from the first text.
[0133] Furthermore, the user clicks the “Ada Chen” control on the interface 707, which may be referred to as an operation 708. In response to the operation 708, the mobile phone displays an interface 709 (i.e., an example of the fifth interface) . The interface 709 includes three controls of face1, face2, and face3 images (i.e., an example of the one or more second controls) for selecting candidate images of the object “Ada Chen” , and / or a camera control (i.e., an example of the third control) for selecting an image of the object “Ada Chen” captured by the camera. When the user clicks the control of face3 image on the interface 709, an operation 710 (i.e., an example of the user input) occurs. In response to the operation 710, the mobile phone stores the face3 image as the image of the object “Ada Chen” in the first template.
[0134] In some other embodiments, the user may click the camera control to select a captured image as the image of the object.
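The outcome of the interaction above, namely storing the selected image for an object in the first template, might be handled as in the following sketch. The function name `set_object_image`, the use of string identifiers in place of image data, and the error handling are assumptions for illustration.

```python
def set_object_image(template, object_name, selection, candidates, captured=None):
    """Store the chosen image reference for an object in the template."""
    if selection == "camera":
        if captured is None:
            raise ValueError("no captured image available")
        image = captured
    elif selection in candidates:
        image = selection
    else:
        raise ValueError(f"unknown selection: {selection}")
    template.setdefault("objects", {}).setdefault(object_name, {})["image"] = image
    return template


# The user picks the face3 candidate for the object "Ada Chen".
template = set_object_image({}, "Ada Chen", "face3", ["face1", "face2", "face3"])
```

Passing `"camera"` as the selection together with a captured image reference would cover the case in which the user takes a photo instead of choosing a candidate.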
[0135] In some embodiments, the user input is used to set a scene in the one or more scenes, and before step 501, i.e., displaying the first interface, the method further includes: displaying a sixth interface of the reading application, where the sixth interface includes at least one of: one or more fourth controls for selecting one or more candidate scenes, or an input box for inputting parameters of the scene, and the user input is a user operation on any one of the one or more fourth controls or an input operation in the input box.
[0136] User interface (UI) interaction process for users to set a scene of the story is described in detail below.
[0137] FIG. 8 illustrates a schematic diagram of a process of setting a scene where a story occurs in accordance with some embodiments of the present disclosure. As shown in FIG. 8, a user reading an e-book on a mobile phone is taken as an example. Interfaces 801, 803 and 805 shown in FIG. 8 are the same as the interfaces 701, 703 and 705 shown in FIGS. 7A and 7B, and will not be described again here. In response to the operation 804 of the user clicking the Book4 control, the mobile phone displays the interface 805. The user clicks the “choose story location” control on the interface 805, which may be referred to as an operation 806. In response to the operation 806, the mobile phone displays an interface 807 (i.e., an example of the sixth interface). The interface 807 includes a “choose region” control (i.e., an example of the fourth control) and an input box (i.e., an example of the input box). In one implementation, the user may click the “choose region” control, and the mobile phone may display one or more region controls (e.g., a drop-down list) corresponding to multiple different regions (i.e., an example of the one or more candidate scenes), such as New York, Paris, Beijing, etc. When the user clicks any one of the one or more region controls (i.e., an example of the user input), in response to the user click, the mobile phone may take the selected region as the region where the story occurs. For another example, when the user clicks the “choose region” control, the mobile phone may display a “choose mine” control. The user may click the “choose mine” control, and in response to the user click, the mobile phone will take its own current location as the region where the story occurs. In another implementation, the user may input a text of any region (i.e., an example of the parameters of the scene) in the input box, as shown in operation 808 on the interface 807. In response to the user input, the mobile phone may take the region corresponding to the text input by the user as the region where the story occurs.
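The three ways of setting the region described above (selecting a listed region, selecting the device's own location, or typing free text) could be resolved as in the sketch below. The function name `resolve_region`, its parameters, and the rule that typed text takes precedence over a list selection are assumptions for illustration.

```python
def resolve_region(choice=None, typed_text=None, device_region=None):
    """Pick the story region from a list choice, 'choose mine', or typed text."""
    if typed_text and typed_text.strip():
        return typed_text.strip()  # assumed precedence: typed text wins
    if choice == "choose mine":
        if device_region is None:
            raise ValueError("device location unavailable")
        return device_region
    if choice:
        return choice
    return None  # nothing was set by the user


region_from_list = resolve_region(choice="Paris")
region_from_device = resolve_region(choice="choose mine", device_region="Beijing")
region_from_text = resolve_region(typed_text=" New York ")
```

The resolved value would then be written into the region parameter of the scene in the first template.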
[0138] In some embodiments, before displaying the first interface, the method further includes: obtaining prompt parameters, where the prompt parameters are related to the first text; generating, based on the prompt parameters, a first candidate image using an image generation algorithm; and obtaining the first image by processing the first candidate image based on the first template.
[0139] Based on this design, processing the first candidate image based on the first template may make the features of the one or more objects and background in the first image more consistent with the parameters in the first template, thereby improving the consistency of the features of the one or more objects and background in multiple images displayed by the electronic device and enhancing the user reading experience.
[0140] The prompt parameters may be generated based on the first text or the first template by using a prompt generator model (e.g., a large language model (LLM)), and the embodiments of the present disclosure do not limit this. The prompt parameters may include the parameters in the template mentioned above, and may be formatted as plain text or as a template. The prompt parameters may also be used for fine-tuning an LLM; in that case, the prompt parameters may follow the prompt structure used during training or fine-tuning.
[0141] Processing the first candidate image based on the first template may include at least one of feature fixing and background fixing.
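As a non-limiting illustration of prompt parameters formatted as plain text, the following sketch flattens a template into a single prompt string. The dictionary layout of the template, the key names and the function name build_prompt are assumptions for this sketch; the disclosure leaves the prompt format open and contemplates an LLM-based prompt generator rather than fixed string formatting.

```python
def build_prompt(page_summary: str, template: dict) -> str:
    """Combine a page summary with scene and object parameters from a template."""
    parts = [page_summary.strip().rstrip(".")]

    # Scene parameters named in the disclosure: region, location, time.
    scene = template.get("scene", {})
    scene_bits = [f"{key}: {scene[key]}" for key in ("region", "location", "time") if key in scene]
    if scene_bits:
        parts.append("Scene (" + ", ".join(scene_bits) + ")")

    # Object parameters: every attribute other than the name is listed as-is.
    for obj in template.get("objects", []):
        attrs = ", ".join(f"{k}: {v}" for k, v in obj.items() if k != "name")
        parts.append(f"{obj['name']} ({attrs})" if attrs else obj["name"])

    return ". ".join(parts)


template = {
    "scene": {"region": "Paris", "time": "night"},
    "objects": [{"name": "Anna", "hair": "red", "costume": "raincoat"}],
}
print(build_prompt("Two friends walk home.", template))
```

Because the same template values are repeated in every prompt, each page's prompt carries the same object and scene description even when the page text itself does not mention them.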
[0142] For the feature fixing, instance segmentation models (e.g., Faster Region-based Convolutional Neural Network (Faster R-CNN) ) may be applied on the first candidate image. First, an input image's feature map is extracted by a convolutional neural network, and then candidate regions are generated on the feature map by a region proposal network (RPN) . Features corresponding to the candidate regions are extracted from the feature map. These features are input into a classification and bounding box regression network for target category judgment and adjustment of bounding box position.
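As a non-limiting illustration of the bounding box adjustment step, the following sketch applies regression offsets to a candidate region using the center and size parameterization commonly used in the R-CNN family of detectors. The disclosure does not state which parameterization is used, so the formula and the function name decode_box are assumptions for this sketch.

```python
import math


def decode_box(proposal, deltas):
    """Apply regression offsets (dx, dy, dw, dh) to a proposal (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = proposal
    dx, dy, dw, dh = deltas

    # Convert the proposal to center and size form.
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h

    # Shift the center in units of the proposal size; scale the size exponentially.
    new_cx = cx + dx * w
    new_cy = cy + dy * h
    new_w = w * math.exp(dw)
    new_h = h * math.exp(dh)

    # Convert back to corner form.
    return (
        new_cx - 0.5 * new_w,
        new_cy - 0.5 * new_h,
        new_cx + 0.5 * new_w,
        new_cy + 0.5 * new_h,
    )


# Zero offsets leave the candidate region unchanged.
print(decode_box((0, 0, 10, 20), (0, 0, 0, 0)))
```

Expressing the offsets relative to the proposal size keeps the regression targets comparable across large and small candidate regions.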
[0143] For example, in the first candidate image, once a character is located by using the Faster R-CNN, the next step is to identify features (e.g., hair, costume, gender, etc.) of the character. Semantic segmentation models (e.g., a UNet model) may be applied to identify the features of the character and extract them from the background. Once an image of the features is extracted, the next step is to swap it with the character’s features included in the first template by using a technique such as an encoder-decoder-based deep face-swap GAN. After the first candidate image is processed, the features of the character may be consistent with the features specified in the first template.
[0144] For the background fixing, similar to the feature fixing, instance segmentation is applied to extract the background and the foreground objects of the first candidate image. The foreground objects may be the characters specified in the first template. The background may be the items in the first candidate image other than the characters.
[0145] Once the background is extracted, there may be gaps where the objects were present. These gaps may be filled through methods such as a Wiener filter or a Kalman filter. The extracted background image may be saved under a unique location ID in the first template.
[0146] After the first candidate image is generated, the foreground characters are extracted, and feature fixing is performed, the foreground characters are placed on the stored background image if the location ID already has a background image constructed in a previous step. The first template may be searched for this location ID to obtain the background image, which then replaces the original background of the first candidate image.
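As a non-limiting illustration of the background fixing described above, the following sketch stores a background under a location ID and reuses it for later candidate images. Images are modeled as small grids of pixel labels and the foreground mask is given as input, both of which are simplifications. The class BackgroundStore, the function names and the gap_value parameter are assumptions for this sketch; in particular, filling gaps with a single value is only a stand-in for the filter-based gap filling mentioned in the disclosure.

```python
from typing import Dict, List, Optional

Image = List[List[str]]   # toy image: rows of pixel labels
Mask = List[List[bool]]   # True where a foreground character is present


class BackgroundStore:
    """Keeps one background image per location ID, as a template might."""

    def __init__(self) -> None:
        self._by_location: Dict[str, Image] = {}

    def get(self, location_id: str) -> Optional[Image]:
        return self._by_location.get(location_id)

    def save(self, location_id: str, background: Image) -> None:
        self._by_location[location_id] = background


def extract_background(image: Image, mask: Mask, gap_value: str) -> Image:
    """Remove foreground pixels and fill the gaps (stand-in for gap filling)."""
    return [
        [gap_value if mask[r][c] else image[r][c] for c in range(len(row))]
        for r, row in enumerate(image)
    ]


def fix_background(candidate: Image, mask: Mask, location_id: str,
                   store: BackgroundStore, gap_value: str) -> Image:
    stored = store.get(location_id)
    if stored is None:
        # First image at this location: keep it and remember its background.
        store.save(location_id, extract_background(candidate, mask, gap_value))
        return candidate
    # Later image: keep the foreground characters, swap in the stored background.
    return [
        [candidate[r][c] if mask[r][c] else stored[r][c] for c in range(len(row))]
        for r, row in enumerate(candidate)
    ]


store = BackgroundStore()
first = [["sky", "hero"], ["grass", "grass"]]
fix_background(first, [[False, True], [False, False]], "loc-1", store, gap_value="sky")
second = [["sea", "sea"], ["hero", "sand"]]
print(fix_background(second, [[False, False], [True, False]], "loc-1", store, gap_value="sea"))
```

Keying the store by location ID means a later page set at the same location reuses the earlier background, while a page set at a new location starts a new entry.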
[0147] It is noted that the features of the objects and the background in the first candidate image may not be consistent with those in a previously generated image. By performing the above processing, it is possible to make the features of the objects and the background in the processed first image as similar as possible to those in the previous image, thereby improving the user reading experience.
[0148] In one possible implementation, the first interface is displayed in response to an operation of the first user, and the method further includes: displaying a third interface of the reading application in response to an operation of the second user, where the third interface includes the first text of the story and a third image corresponding to the first text, a background of the third image is different from the background of the first image, and / or features of the one or more objects in the third image are different from the features of the one or more objects in the first image.
[0149] The electronic device may distinguish different users based on the accounts logged in by the users. In a possible implementation, the first user may log in to his or her account on an electronic device or a reading application, and this account may be referred to as a first account. The second user may log in to his or her account on an electronic device or a reading application, and this account may be referred to as a second account. The electronic device may distinguish the first user from the second user based on the first account and the second account.
[0150] Based on this design, for the same story, different images may be displayed for different users, thereby enhancing the flexibility and diversity of the display method for images and improving the reading experience of users.
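As a non-limiting illustration, keeping one template per account and per story is one way in which different users could see different images for the same text while each user's own images remain consistent. The class TemplateRegistry, its method name and the identifiers used below are assumptions for this sketch.

```python
from typing import Dict, Tuple


class TemplateRegistry:
    """Holds a separate story template for each (account, story) pair."""

    def __init__(self) -> None:
        self._templates: Dict[Tuple[str, str], dict] = {}

    def template_for(self, account_id: str, story_id: str) -> dict:
        # Create an empty template the first time an account opens a story.
        key = (account_id, story_id)
        if key not in self._templates:
            self._templates[key] = {"scenes": {}, "objects": {}}
        return self._templates[key]


registry = TemplateRegistry()

# The first user's template records a feature of a character.
first = registry.template_for("first-account", "story-1")
first["objects"]["Anna"] = {"hair": "red"}

# The second user reading the same story starts from an independent template.
second = registry.template_for("second-account", "story-1")
print(second["objects"])
```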
[0151] FIG. 9 illustrates a schematic diagram of example interfaces for a user reading an e-book in accordance with some embodiments of the present disclosure. As shown in FIG. 9, the mobile phone may display a first interface, which includes a first text of a first page of a story and a first image generated based on the first text of the first page. The mobile phone may also display a third interface, which includes the first text of the first page of the story and a third image generated based on the first text of the first page. The background of the third image is different from the background of the first image, and features of characters in the third image are different from features of the characters in the first image. For example, in the first image, the two characters are walking against a background with a moon, and one of them is holding an umbrella. In the third image, the two characters are walking against a background with a vehicle, and neither character is holding an umbrella. In addition, the face and hair of the female character in the first interface are different from those of the female character in the third interface. However, the features of the one or more objects and the background of subsequent images corresponding to subsequent texts of the story may be kept as consistent as possible.
[0152] A process of an electronic device performing the display method according to the embodiments of the present disclosure is illustrated below, using generation of images for stories in a reading application by an electronic device as an example, in combination with FIG. 10 and the method 500.
[0153] FIG. 10 illustrates a schematic diagram of a software architecture of an electronic device in accordance with some embodiments of the present disclosure.
[0154] As shown in FIG. 10, the software architecture 1000 may include a read application 1001. The read application is used by users to read stories, for example, which may be the reading application described with reference to FIG. 2. The read application 1001 is able to display interfaces involved in the embodiments of the present disclosure, such as the first interface, the second interface, the third interface, the fourth interface, the fifth interface, etc.
[0155] The software architecture 1000 may further include a book info module 1002. The book info module 1002 is used to generate basic book information based on the story text. The basic book information may include at least one of an author’s name, a story location, a story summary, etc. The generation process may be realized using an NLP model.
[0156] The software architecture 1000 may further include a text summarizer module 1003. The text summarizer module 1003 is used to condense the story text into a short summary, which may be passed to later modules (for example, the template management module). The summarization process may be performed using an NLP model. The text summarizer module 1003 may also summarize the current page displayed to the user.
[0157] The software architecture 1000 may further include a page NER extractor module 1004. The page NER extractor module 1004 is used for recognizing named entities in the story text. For example, the named entities may include a name, gender, age, region, location, etc. The recognition process may be performed using an NLP model.
[0158] The book info module 1002, text summarizer module 1003 and page NER extractor module 1004 may transmit the obtained information to a template management module 1005.
[0159] The software architecture 1000 may further include the template management module 1005. The template management module 1005 is used to manage one or more story templates, where a story template contains parameters of one or more scenes and parameters of one or more objects in a story, such as the parameters in the first template and the second template mentioned above.
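As a non-limiting illustration, a story template holding parameters of scenes and parameters of objects may be modeled with a few data classes, with a later template obtained by overwriting only the values that a new page changes. The class names, the subset of fields shown and the function update_template are assumptions for this sketch; the disclosure does not prescribe a storage format.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Optional


@dataclass
class SceneParams:
    region: Optional[str] = None
    location: Optional[str] = None
    time: Optional[str] = None


@dataclass
class ObjectParams:
    name: str
    gender: Optional[str] = None
    role: Optional[str] = None
    costume: Optional[str] = None
    hair: Optional[str] = None
    age: Optional[int] = None


@dataclass
class StoryTemplate:
    scenes: Dict[str, SceneParams] = field(default_factory=dict)
    objects: Dict[str, ObjectParams] = field(default_factory=dict)


def update_template(first: StoryTemplate, scene_id: str, scene_changes: dict,
                    object_changes: Dict[str, dict]) -> StoryTemplate:
    """Return a new template; values not mentioned in the changes carry over."""
    scenes = dict(first.scenes)
    scenes[scene_id] = replace(scenes.get(scene_id, SceneParams()), **scene_changes)

    objects = dict(first.objects)
    for name, changes in object_changes.items():
        objects[name] = replace(objects.get(name, ObjectParams(name=name)), **changes)

    return StoryTemplate(scenes=scenes, objects=objects)


first = StoryTemplate(
    scenes={"s1": SceneParams(region="Paris", time="night")},
    objects={"Anna": ObjectParams(name="Anna", hair="red", costume="raincoat")},
)
second = update_template(first, "s1", {"time": "morning"}, {"Anna": {"costume": "dress"}})
print(second.scenes["s1"], second.objects["Anna"])
```

Carrying over every value that the new page does not change is what keeps features such as hair or region stable from one image to the next.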
[0160] The software architecture 1000 may further include a context memory module 1006. The context memory module 1006 is used to store the parameters in the templates.
[0161] The software architecture 1000 may further include a user selection module 1007. The user selection module 1007 is used to provide choices to users and generate parameters of one or more scenes and parameters of one or more objects in a template of a story based on user operations on the interface.
[0162] The software architecture 1000 may further include a prompt generator module 1008. The prompt generator module 1008 is used to generate prompt parameters based on the template and deliver the prompt parameters to an image generator module 1009.
[0163] The software architecture 1000 may further include the image generator module 1009. The image generator module 1009 is used to generate an image based on the prompt parameters. The image generator module 1009 may use a deep learning text-to-image model, such as a latent diffusion model, to generate the image. The images generated by the image generator module 1009 may be delivered to an image post-processing module 1011 or to an object extractor module 1010.
[0164] The software architecture 1000 may further include the object extractor module 1010. The object extractor module 1010 is used to extract feature information of one or more objects and scene information in an image generated by the image generator module 1009, and deliver the feature information and scene information to the template management module 1005 to update values of one or more parameters in the template of the story.
[0165] The software architecture 1000 may further include the image post-processing module 1011. The image post-processing module 1011 is used to correct the image generated by the image generator module 1009 based on the template and deliver the resulting image to the read application 1001 so that the read application 1001 may display the image to the user.
[0166] Based on the software architecture 1000 of the electronic device illustrated in FIG. 10, the display method in the embodiments of this disclosure may include the following steps. The read application 1001 displays the first text of the story. In a case where the user does not set the features of the one or more objects or the one or more scenes, the book info module 1002, the text summarizer module 1003, and the page NER extractor module 1004 extract relevant feature information and scene information from the first text by using language generation models (e.g., an NLP model), and deliver this information to the template management module 1005 to obtain the one or more parameters in the first template. If the user sets the features of the one or more objects or the one or more scenes through the user selection module 1007, the features of the one or more objects or the one or more scenes set by the user are delivered to the template management module 1005 to obtain the one or more parameters in the first template. The prompt generator module 1008 uses the first template from the template management module 1005 to generate prompt parameters by using a prompt generator model (e.g., a large language model). The image generator module 1009 generates a first candidate image based on the prompt parameters by using a large image generation model (e.g., a latent diffusion model). The object extractor module 1010 extracts one or more parameters of one or more objects and the background from a previous image of the first image by using an NLP module, and then delivers the parameters to the template management module 1005 to update the first template. The image post-processing module 1011 obtains the parameters in the updated first template from the template management module 1005 and processes the first candidate image based on the updated first template by using instance segmentation models (e.g., Faster R-CNN) and semantic segmentation models (e.g., a UNet model) to obtain a first image. Similarly, the book info module 1002, the text summarizer module 1003, and the page NER extractor module 1004 extract one or more parameters from the second text and deliver them to the template management module 1005 to obtain the second template. The prompt generator module 1008 uses the second template to generate prompt parameters, and the image generator module 1009 generates a second candidate image based on the prompt parameters. The image post-processing module 1011 processes the second candidate image based on the one or more parameters in the second template to obtain a second image.
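As a non-limiting illustration, the sequence of steps above may be sketched as one function that wires the modules together, with each module passed in as a plain callable. The function name, the parameter names and the flat dictionary used as a template are assumptions for this sketch, and the lambdas in the usage example are placeholders standing in for the NLP, prompt generator, image generation and segmentation models.

```python
from typing import Any, Callable, Dict, Tuple

Template = Dict[str, Any]


def generate_page_image(
    page_text: str,
    template: Template,
    extract_params: Callable[[str], Template],
    make_prompt: Callable[[str, Template], str],
    generate_image: Callable[[str], Any],
    postprocess: Callable[[Any, Template], Any],
) -> Tuple[Any, Template]:
    """Run one page through extraction, prompting, generation and correction."""
    # 1. Text modules extract parameters; the template manager merges them in.
    updated = {**template, **extract_params(page_text)}
    # 2. The prompt generator builds prompt parameters from the template.
    prompt = make_prompt(page_text, updated)
    # 3. The image generator produces a candidate image.
    candidate = generate_image(prompt)
    # 4. The post-processing step corrects the candidate against the template.
    image = postprocess(candidate, updated)
    return image, updated


# Placeholder modules: the generator "forgets" the hair color, and the
# post-processing step restores it from the template.
first_template = {"hero_hair": "red", "location": "Paris"}
image, second_template = generate_page_image(
    "The hero walks to the park.",
    first_template,
    extract_params=lambda text: {"location": "park"},
    make_prompt=lambda text, tpl: f"{text} {sorted(tpl.items())}",
    generate_image=lambda prompt: {"prompt": prompt, "hair": "blond"},
    postprocess=lambda img, tpl: {**img, "hair": tpl["hero_hair"]},
)
print(second_template, image["hair"])
```

Returning the updated template alongside the image lets the caller pass it into the next page, which mirrors how the second template is derived from the first.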
[0167] In the embodiments of the present disclosure, without special instructions and logical conflicts, the terms and / or descriptions between different embodiments are consistent and can be referenced to each other, and the technical features in different embodiments can be combined to form new embodiments according to their inherent logical relationships.
[0168] It will be understood that, in order to achieve the above functions, the electronic device includes corresponding hardware and / or software modules for implementing various functions. Persons skilled in the art should easily realize that the embodiments of the present disclosure can be implemented in the form of hardware or a combination of hardware and computer software in combination with the units and algorithm steps described in the embodiments of the present disclosure. Whether a certain function is executed by hardware or by computer software driving hardware depends on the specific application and design constraint conditions of the technical solutions.
[0169] The display method provided in the embodiments of the present disclosure is described in detail above with reference to FIGS. 1 to 10. The display apparatuses in the embodiments of the present disclosure will be described in detail below with reference to FIGS. 11 to 12.
[0170] FIGS. 11 to 12 are schematic structural diagrams of display apparatuses in accordance with some embodiments of the present disclosure. These display apparatuses can be used to realize the functions of the electronic device in the above method embodiments, and therefore can also achieve the beneficial effects of the above method embodiments. In the embodiments of the present disclosure, the display apparatus may be an electronic device as shown in FIG. 1 or FIG. 2.
[0171] As shown in FIG. 11, the display apparatus 1100 may include a processing unit 1110 and a display unit 1120. The display apparatus 1100 is used to implement the functions of the electronic device in the method embodiments shown in FIG. 5.
[0172] In some embodiments, the processing unit 1110 is coupled to the display unit 1120 and configured to: display, using the display unit 1120, a first interface of a reading application, where the first interface includes a first text of a story and a first image corresponding to the first text; display, using the display unit 1120, a second interface of the reading application, in response to a page-turning operation performed by a first user on the first interface, where the second interface includes a second text of the story and a second image corresponding to the second text; at least part of a background of the second image is same as a background of the first image, and at least part of features of one or more objects in the second image are same as features of the one or more objects in the first image.
[0173] In some embodiments, the first interface is displayed in response to an operation of the first user, and the processing unit 1110 is further configured to display, using the display unit 1120, a third interface of the reading application, in response to an operation of the second user, where the third interface includes the first text of the story and a third image corresponding to the first text, a background of the third image is different from the background of the first image, and / or features of the one or more objects in the third image are different from the features of the one or more objects in the first image.
[0174] In some embodiments, the first image is generated based on a first template of the story. The first template includes parameters of one or more scenes and parameters of the one or more objects, the background of the first image is generated based on the parameters of the one or more scenes included in the first template, and the features of the one or more objects in the first image are generated based on the parameters of the one or more objects included in the first template. The second image is generated based on a second template of the story, and the second template is obtained by updating values of one or more parameters in the first template with the second text.
[0175] In some embodiments, the parameters of a scene of the one or more scenes include at least one of: a region, location or time corresponding to the scene; and / or parameters of an object of the one or more objects include at least one of: a name, gender, role, facial features, costume, hair, size, or age of the object and a spatial position of the object in the image frame.
[0176] In some embodiments, the first image is a first generated image for the story, and the first template is extracted based on the first text.
[0177] In some embodiments, the first image is not a first generated image for the story, and the first template is generated by updating a template corresponding to a previous image of the first image based on the first text.
[0178] In some embodiments, the first template is determined based on a user input for the first image, and the user input is used to set at least one of the features of the one or more objects, or the one or more scenes.
[0179] In some embodiments, the user input is used to set the features of an object in the one or more objects, and the processing unit 1110 is further configured to: display, using the display unit 1120, a fourth interface of the reading application, where the fourth interface includes a first control, and the first control is used to select an image of the object, and the image of the object indicates the features of the object; and display, using the display unit 1120, a fifth interface of the reading application, in response to a user operation on the first control, where the fifth interface includes at least one of: one or more second controls for selecting one or more candidate images of the object, or a third control for selecting an image of the object captured by a camera, and the user input is a user operation on any one of the one or more second controls or the third control.
[0180] In some embodiments, the user input is used to set a scene in the one or more scenes, and the processing unit 1110 is further configured to display, using the display unit 1120, a sixth interface of the reading application, where the sixth interface includes at least one of: one or more fourth controls for selecting one or more candidate scenes, or an input box for inputting parameters of the scene, and the user input is a user operation on any of the one or more fourth controls or an input operation in the input box.
[0181] In some embodiments, the processing unit 1110 is further configured to obtain prompt parameters, where the prompt parameters are related to the first text; generate, based on the prompt parameters, a first candidate image using an image generation algorithm; and obtain the first image by processing the first candidate image based on the first template.
[0182] As for the detailed description of the steps performed by the processing unit 1110 and the display unit 1120, reference can be made to the relevant description in the method embodiments shown in FIG. 5.
[0183] As shown in FIG. 12, the display apparatus 1200 includes a processor 1210 and an interface circuit 1220. The processor 1210 and the interface circuit 1220 are coupled to each other. It will be understood that the interface circuit 1220 may be a transceiver or an input / output interface. In some examples, the display apparatus 1200 may further include a memory 1230 for storing instructions executed by the processor 1210, or storing input data required by the processor 1210 for executing the instructions, or storing data generated after the processor 1210 executes the instructions. In some examples, the interface circuit 1220 may be understood as part of the processor 1210, and thus the display apparatus 1200 includes the processor 1210.
[0184] In a case where the display apparatus 1200 is used to implement the method shown in FIG. 5, the processor 1210 is used to implement the functions of the processing unit 1110.
[0185] In a case where the display apparatus is a chip applied to the electronic device, the chip realizes the functions of the electronic device in the method embodiments. The chip receives information from the electronic device. It will be understood that the information is first received by other modules (such as a radio frequency module or antenna) in the electronic device, and then sent to the chip by these modules.
[0186] It will be understood that, the processor in the embodiments of the present disclosure may be a central processing unit, or may be a general-purpose processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array, or any other programmable logic device, a transistor logic device, a hardware component or any combination thereof. The general-purpose processor may be a microprocessor, or any conventional processor.
[0187] Some embodiments of the present disclosure provide a computer-readable storage medium (e.g., a non-transitory computer-readable storage medium) . The computer-readable storage medium has instructions stored thereon which, when executed by a display apparatus, cause the display apparatus to perform the display method corresponding to the electronic device in any of the above embodiments.
[0188] Some embodiments of the present disclosure provide a computer program product. The computer program product stores instructions which, when executed, cause a display apparatus to perform the display method corresponding to the electronic device in any of the above embodiments.
[0189] Some embodiments of the present disclosure provide a computer program. The computer program stores instructions which, when executed, cause a display apparatus to perform the display method corresponding to the electronic device in any of the above embodiments.
[0190] In the present disclosure, the terms “a” , “an” and “one” are defined to mean “at least one” , that is, these terms do not exclude a plural number of items, unless stated otherwise.
[0191] In the present disclosure, terms such as “substantially” , “generally” and “about” , which modify a value, condition or characteristic of a feature of an exemplary embodiment, should be understood to mean that the value, condition or characteristic is defined within tolerances that are acceptable for the proper operation of this exemplary embodiment for its intended application.
[0192] In the present disclosure, unless stated otherwise, the terms “connected” and “coupled” , and derivatives and variants thereof, refer herein to any structural or functional connection or coupling, either direct or indirect, between two or more elements. For example, the connection or coupling between the elements can be acoustical, mechanical, optical, electrical, thermal, logical, or any combinations thereof.
[0193] In the present disclosure, expressions such as “match” , “matching” and “matched” , including variants and derivatives thereof, are intended to refer herein to a condition in which two or more elements are either the same or within some predetermined tolerance of each other. That is, these terms are meant to encompass not only “exactly” or “identically” matching the two elements but also “substantially” , “approximately” or “subjectively” matching the two or more elements, as well as providing a higher or best match among a plurality of matching possibilities.
[0194] In the present disclosure, the expression “based on” is intended to mean “based at least partly on” , that is, this expression can mean “based solely on” or “based partially on” , and so should not be interpreted in a limited manner. More particularly, the expression “based on” can also be understood as meaning “depending on” , “representative of” , “indicative of”, “associated with” or similar expressions.
[0195] In the present disclosure, the terms "system" and "network" may be used interchangeably. "At least one" means one or more, and "a plurality of" means two or more. The term "and / or" describes an association relationship of associated objects, and indicates that three relationships may exist. For example, A and / or B may indicate the following three cases: only A exists, both A and B exist, and only B exists, where A and B may be singular or plural. The character " / " usually indicates an "or" relationship between associated objects. "At least one of the following items (pieces)" or a similar expression thereof indicates any combination of these items, including a single item (piece) or any combination of a plurality of items (pieces). For example, "at least one of A, B, or C" includes A, B, C, A and B, A and C, B and C, or A, B, and C, and "at least one of A, B, and C" may also be understood as including A, B, C, A and B, A and C, B and C, or A, B, and C. In addition, unless otherwise specified, ordinal numbers such as "first" and "second" in embodiments of this disclosure are used to distinguish between a plurality of objects, and are not used to limit a sequence, a time sequence, priorities, or importance of the plurality of objects.
[0196] A person skilled in the art should understand that embodiments of the present disclosure may be provided as a method, an apparatus (or system), a computer-readable storage medium, or a computer program product. Therefore, this disclosure may use a form of a hardware-only embodiment, a software-only embodiment, or an embodiment with a combination of software and hardware. Moreover, this disclosure may use a form of a computer program product that is implemented on one or more computer-usable storage media (including but not limited to a disk memory, an optical memory, and the like) that include computer-usable program code.
[0197] This disclosure is described with reference to the flowcharts and / or block diagrams of the method, the device (system), and the computer program product. It should be understood that computer program instructions may be used to implement each process and / or each block in the flowcharts and / or the block diagrams and a combination of a process and / or a block in the flowcharts and / or the block diagrams. The computer program instructions may be provided for a general-purpose computer, a dedicated computer, an embedded processor, or a processor of another programmable data processing device to generate a machine, so that the instructions executed by the computer or the processor of the other programmable data processing device generate an apparatus for implementing a specific function in one or more procedures in the flowcharts and / or in one or more blocks in the block diagrams.
[0198] The computer program instructions may alternatively be stored in a computer-readable memory that can direct a computer or another programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory generate an artifact that includes an instruction apparatus. The instruction apparatus implements a specific function in one or more procedures in the flowcharts and / or in one or more blocks in the block diagrams.
[0199] The computer program instructions may alternatively be loaded onto a computer or another programmable data processing device, so that a series of operations and steps are performed on the computer or the other programmable device to generate computer-implemented processing. Therefore, the instructions executed on the computer or the other programmable device provide steps for implementing a specific function in one or more procedures in the flowcharts and / or in one or more blocks in the block diagrams.
[0200] It is clear that a person skilled in the art can make various modifications and variations to this disclosure without departing from the scope of this disclosure. This disclosure is intended to cover these modifications and variations of this disclosure provided that they fall within the scope of protection defined by the following claims and their equivalent technologies.
Claims
1.A display method, comprising:displaying a first interface of a reading application, wherein the first interface comprises a first text of a story and a first image corresponding to the first text;displaying a second interface of the reading application in response to a page-turning operation performed by a first user on the first interface, wherein the second interface comprises a second text of the story and a second image corresponding to the second text;at least part of a background of the second image is same as a background of the first image, and at least part of features of one or more objects in the second image are same as features of the one or more objects in the first image.2.The display method of claim 1, wherein the first interface is displayed in response to an operation of the first user, and the method further includes:displaying a third interface of the reading application in response to an operation of the second user, wherein the third interface comprises the first text of the story and a third image corresponding to the first text, a background of the third image is different from the background of the first image, and / or features of the one or more objects in the third image are different from the features of the one or more objects in the first image.3.The display method of claim 1, wherein the first image is generated based on a first template of the story, the first template comprises parameters of one or more scenes and parameters of the one or more objects, the background of the first image is generated based on the parameters of the one or more scenes included in the first template, and the features of the one or more objects in the first image are generated based on the parameters of the one or more objects included in the first template;the second image is generated based on a second template of the story, and the second template is obtained by updating values of one or more parameters in the first template with the second 
text.4.The display method of claim 3, wherein parameters of a scene of the one or more scenes comprise at least one of: a region, location or time corresponding to the scene; and / orparameters of an object of the one or more objects comprise at least one of: a name, gender, role, facial features, costume, hair, size, or age of the object, or a spatial position of the object in the first image.5.The display method of claim 3 or 4, wherein the first image is a first generated image for the story, and the first template is extracted based on the first text.6.The display method of claim 3 or 4, wherein the first image is not a first generated image for the story, and the first template is generated by updating a template corresponding to a previous image of the first image based on the first text.7.The display method of any of claims 3 to 6, wherein the first template is determined based on a user input for the first image, and the user input is used to set at least one of: the features of the one or more objects, or the one or more scenes.8.The display method of claim 7, wherein the user input is used to set features of an object in the one or more objects, and before displaying the first interface of the reading application, the method further comprises:displaying a fourth interface of the reading application, wherein the fourth interface comprises a first control, and the first control is used to select an image of the object, and the image of the object indicates the features of the object;displaying a fifth interface of the reading application in response to a user operation on the first control, wherein the fifth interface comprises at least one of: one or more second controls for selecting one or more candidate images of the object, or a third control for selecting an image of the object captured by a camera, and the user input is a user operation on any one of the one or more second controls or the third control.9.The display method of claim 7 or 8, wherein 
the user input is used to set a scene in the one or more scenes, and before displaying the first interface of the reading application, the method further comprises:
displaying a sixth interface of the reading application, wherein the sixth interface comprises at least one of: one or more fourth controls for selecting one or more candidate scenes, or an input box for inputting parameters of the scene, and the user input is a user operation on any one of the one or more fourth controls or an input operation in the input box.

10. The display method of any of claims 3 to 9, wherein before displaying the first interface, the method further comprises:
obtaining prompt parameters, wherein the prompt parameters are related to the first text;
generating, based on the prompt parameters, a first candidate image using an image generation algorithm; and
obtaining the first image by processing the first candidate image based on the first template.

11. A display apparatus, comprising:
a display unit; and
a processing unit coupled to the display unit and configured to:
display, using the display unit, a first interface of a reading application, wherein the first interface comprises a first text of a story and a first image corresponding to the first text;
display, using the display unit, a second interface of the reading application, in response to a page-turning operation performed by a first user on the first interface, wherein the second interface comprises a second text of the story and a second image corresponding to the second text;
wherein at least part of a background of the second image is the same as a background of the first image, and at least part of features of one or more objects in the second image are the same as features of the one or more objects in the first image.

12. The display apparatus of claim 11, wherein the first interface is displayed in response to an operation of the first user, and wherein the processing unit is further configured to display, using the display unit, a third interface
of the reading application, in response to an operation of a second user, wherein the third interface comprises the first text of the story and a third image corresponding to the first text, a background of the third image is different from the background of the first image, and/or features of the one or more objects in the third image are different from the features of the one or more objects in the first image.

13. The display apparatus of claim 11, wherein the first image is generated based on a first template of the story, the first template comprises parameters of one or more scenes and parameters of the one or more objects, the background of the first image is generated based on the parameters of the one or more scenes included in the first template, and the features of the one or more objects in the first image are generated based on the parameters of the one or more objects included in the first template; and
the second image is generated based on a second template of the story, and the second template is obtained by updating values of one or more parameters in the first template with the second text.

14. The display apparatus of claim 13, wherein parameters of a scene of the one or more scenes comprise at least one of: a region, location or time corresponding to the scene; and/or
parameters of an object of the one or more objects comprise at least one of: a name, gender, role, facial features, costume, hair, size, or age of the object, or a spatial position of the object in the first image.

15. The display apparatus of claim 13 or 14, wherein the first image is a first generated image for the story, and the first template is extracted based on the first text.

16. The display apparatus of claim 13 or 14, wherein the first image is not a first generated image for the story, and the first template is generated by updating a template corresponding to a previous image of the first image based on the first text.

17. The display apparatus of any of claims 13 to 16, wherein the
first template is determined based on a user input for the first image, and the user input is used to set at least one of: the features of the one or more objects, or the one or more scenes.

18. The display apparatus of claim 17, wherein the user input is used to set the features of an object in the one or more objects, and the processing unit is further configured to:
display, using the display unit, a fourth interface of the reading application, wherein the fourth interface comprises a first control, and the first control is used to select an image of the object, and the image of the object indicates the features of the object;
display, using the display unit, a fifth interface of the reading application, in response to a user operation on the first control, wherein the fifth interface comprises at least one of: one or more second controls for selecting one or more candidate images of the object, or a third control for selecting an image of the object captured by a camera, and the user input is a user operation on any one of the one or more second controls or the third control.

19. The display apparatus of claim 17 or 18, wherein the user input is used to set a scene in the one or more scenes, and wherein the processing unit is further configured to display, using the display unit, a sixth interface of the reading application, wherein the sixth interface comprises at least one of: one or more fourth controls for selecting one or more candidate scenes, or an input box for inputting parameters of the scene, and the user input is a user operation on any one of the one or more fourth controls or an input operation in the input box.

20. The display apparatus of any of claims 13 to 19, wherein the processing unit is further configured to:
obtain prompt parameters, wherein the prompt parameters are related to the first text;
generate, based on the prompt parameters, a first candidate image using an image generation algorithm; and
obtain the first image by processing the first candidate image
based on the first template.

21. A display apparatus comprising:
one or more processors; and
a memory storing instructions which, when executed by the one or more processors, cause the display apparatus to perform the method of any one of claims 1 to 10.

22. A computer-readable storage medium having instructions stored thereon which, when executed by a display apparatus, cause the display apparatus to perform the method of any one of claims 1 to 10.

23. A computer program product storing instructions which, when executed, cause a display apparatus to perform the method of any one of claims 1 to 10.
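
By way of illustration only, the template update recited in claims 3 to 6 might be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the names `SceneParams`, `ObjectParams`, `Template` and `update_template` are hypothetical, only a subset of the claimed parameters is modeled, and the step that derives parameter changes from the second text is assumed to exist upstream and is represented here by plain dictionaries. The sketch shows only that values absent from the update carry over unchanged, which is what would keep the background and object features consistent between the first image and the second image.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SceneParams:
    # Subset of the scene parameters listed in claim 4 (hypothetical layout).
    region: str = ""
    location: str = ""
    time: str = ""


@dataclass(frozen=True)
class ObjectParams:
    # Subset of the object parameters listed in claim 4 (hypothetical layout).
    name: str
    role: str = ""
    costume: str = ""
    hair: str = ""


@dataclass(frozen=True)
class Template:
    scene: SceneParams
    objects: dict  # maps an object name to its ObjectParams


def update_template(prev, scene_changes, object_changes):
    """Return a new template; parameters not named in the changes are kept."""
    scene = replace(prev.scene, **scene_changes)
    objects = dict(prev.objects)  # copy so the previous template is untouched
    for name, changes in object_changes.items():
        base = objects.get(name, ObjectParams(name=name))
        objects[name] = replace(base, **changes)
    return Template(scene=scene, objects=objects)


# First template, assumed to have been extracted from the first text.
first = Template(
    scene=SceneParams(region="forest", location="cabin", time="morning"),
    objects={"Red": ObjectParams(name="Red", role="protagonist",
                                 costume="red hood", hair="brown")},
)

# Changes assumed to have been derived from the second text:
# the time of day moves on and a new character appears.
second = update_template(
    first,
    scene_changes={"time": "night"},
    object_changes={"Wolf": {"role": "antagonist"}},
)
```

In this sketch the second template keeps the location and the existing character's costume and hair, so an image generated from it would share those features with the first image, while the updated time and the new character reflect the second text.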