Information display method and electronic device

By employing a multimodal fusion recommendation method that integrates semantics and images, this approach addresses the problem of image recommendations not matching user intent in existing technologies, achieving more accurate and narrative-driven image recommendation results.

WO2026123859A1 · PCT designated stage · Publication Date: 2026-06-18 · HUAWEI TECH CO LTD

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: HUAWEI TECH CO LTD
Filing Date: 2025-09-15
Publication Date: 2026-06-18

Abstract

Embodiments of the present invention disclose an information display method and an electronic device. The method comprises: in response to an operation of inputting first information in a search control, searching an image library on the basis of a multi-modal fusion of semantics of the first information and images to obtain second information, the first information being a text segment or a document-type file, and the second information comprising a plurality of first images found from the image library and matching the first information; and, displaying at least one image set on the basis of the second information, the images in the image set being from the plurality of first images in the second information. Multi-modal fusion based on semantics and images is used for recommendation, thereby making the recommendation results more consistent with a user's intent.

Description

An information display method and electronic device

[0001] This application claims priority to Chinese Patent Application No. 202411844251.8, filed on December 13, 2024, entitled "An Information Display Method and Electronic Device", the entire contents of which are incorporated herein by reference.

Technical Field

[0002] This invention relates to the field of computer technology, and in particular to an information display method and an electronic device.

Background Technology

[0003] When electronic devices recommend images to users, they typically retrieve results based on inherent image attributes such as time, location, or tags. Relying on these inherent attributes or tags for identification results in insufficient richness of the retrieval sources, often leading to recommendations that do not include the images the user most desires. Therefore, the recommended results usually do not match the user's intent.

Summary of the Invention

[0004] In view of this, embodiments of the present invention provide an information display method and an electronic device that make recommendation results more consistent with user intent based on multimodal fusion recommendation of semantics and images.

[0005] In a first aspect, embodiments of the present invention provide an information display method, the method comprising:

[0006] In response to an operation of inputting first information in a search control, searching an image library based on multimodal fusion of the semantics of the first information and images to obtain second information, where the first information is a text segment or a document-type file, and the second information includes multiple first images that are found in the image library and match the first information; and

[0007] Displaying at least one image set based on the second information, where the images in the image set come from the multiple first images in the second information.

[0008] This invention enables multimodal fusion search based on semantics and images, rather than simply matching basic image attributes to keywords. It can therefore take a passage of text or a document entered by the user and retrieve relevant images, so that the second information is more relevant to the first information and better aligned with the user's intent. Because the second information better matches the user's intent, the at least one image set displayed based on the second information also better matches that intent.

[0009] In conjunction with the first aspect, in some implementations of the first aspect, the second information is obtained by searching the image library based on the semantics of the first information and the multimodal fusion of images, including:

[0010] Based on the semantics of the first information, search the image library for the first image corresponding to the image features that match the semantics of the first information. The image features include the first type of features, which includes the scene features of the first image.

[0011] Compared to image features that only include basic image attributes, the image features in this embodiment of the invention also include scene features. Scene features are more flexible and contain richer scene information, allowing for the association and generation of more scenes, thus providing users with images that are more scene-related.

[0012] Compared to keyword retrieval based solely on the basic attributes of images, the scene features of images in this embodiment of the invention enhance the user's search context, making the scene stories that users can retrieve and piece together richer, thus enriching the data sources for retrieval or recommendation.
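
The matching of query semantics against scene features can be illustrated with a minimal sketch. It assumes, purely for illustration, that the first information and each image's scene features have already been encoded as vectors in a shared space; the text does not specify the encoder or the similarity measure, so the cosine similarity, the `threshold` parameter, and the names `cosine` and `search_library` are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_library(
    query_vec: Sequence[float],
    scene_vecs: Dict[str, Sequence[float]],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the source
) -> List[Tuple[str, float]]:
    """Return (image_id, similarity) pairs at or above the threshold, best first."""
    hits = [(image_id, cosine(query_vec, vec)) for image_id, vec in scene_vecs.items()]
    hits = [hit for hit in hits if hit[1] >= threshold]
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits
```

In this sketch the returned list plays the role of the second information: the matched first images together with a per-image relevance value.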

[0013] In conjunction with the first aspect, in some implementations of the first aspect, there are multiple image sets;

[0014] Displaying at least one image set based on the second information includes:

[0015] Based on the second information, at least one image set is displayed in order of first relevance, where the first relevance is the degree of relevance between the image set and the first information.

[0016] In this embodiment of the invention, the search results based on the first information include at least one image set, and the image sets can be sorted according to the relevance between the image sets and the first information. Compared to sorting by time or basic attributes, this embodiment of the invention provides a new sorting method that can improve the user's search efficiency and enable the user to quickly locate the image set they want to view.

[0017] In conjunction with the first aspect, some implementations of the first aspect also include: displaying a first number of second images based on the second information, wherein the second images are a subset of images selected from the image set based on a recommendation strategy.

[0018] In this embodiment of the invention, the search results based on the first information also include a first number of second images, which are a subset of images selected from the image set based on a recommendation strategy. This embodiment of the invention prioritizes displaying the selected images to the user, creating a more memorable and preferred result for the user.

[0019] In conjunction with the first aspect, some implementations of the first aspect also include:

[0020] At least one image set is obtained by clustering multiple first images according to at least one dimension.

[0021] In this embodiment of the invention, search results are clustered and grouped according to at least one dimension to obtain at least one image set. The dimension can be time, location, theme, or style, etc.

[0022] For example, the clustering method can be a clustering algorithm, such as the K-means++ algorithm.
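
As a rough illustration of clustering along a single dimension, the sketch below groups one-dimensional values (for example capture timestamps) using k-means with k-means++ seeding. The function name `kmeans_pp_1d`, the fixed `seed`, and the iteration cap are assumptions for the example; the text names K-means++ only as one possible algorithm and gives no parameters.

```python
import random
from typing import List


def kmeans_pp_1d(values: List[float], k: int, seed: int = 0, iters: int = 20) -> List[List[float]]:
    """Cluster 1-D values into at most k groups using k-means++ seeding."""
    if not values or k <= 0:
        return []
    rng = random.Random(seed)
    k = min(k, len(values))

    # k-means++ seeding: the first centre is uniform at random; each further
    # centre is drawn with probability proportional to the squared distance
    # from the nearest centre chosen so far.
    centres = [rng.choice(values)]
    while len(centres) < k:
        d2 = [min((v - c) ** 2 for c in centres) for v in values]
        if sum(d2) == 0:  # every value already coincides with a centre
            break
        centres.append(rng.choices(values, weights=d2, k=1)[0])

    # Lloyd iterations: assign each value to its nearest centre, then move
    # each centre to the mean of its group, until nothing changes.
    groups: List[List[float]] = []
    for _ in range(max(1, iters)):
        groups = [[] for _ in centres]
        for v in values:
            nearest = min(range(len(centres)), key=lambda i: (v - centres[i]) ** 2)
            groups[nearest].append(v)
        new_centres = [sum(g) / len(g) if g else centres[i] for i, g in enumerate(groups)]
        if new_centres == centres:
            break
        centres = new_centres
    return [sorted(g) for g in groups if g]
```

Each returned group corresponds to one image set. Clustering on location, theme, or style would need a multi-dimensional distance, which this sketch does not cover.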

[0023] In conjunction with the first aspect, some implementations of the first aspect also include:

[0024] The allocation quota for each image set is obtained by allocating quotas to each image set based on the first quantity;

[0025] The images in the image set are scored to obtain image scores;

[0026] Based on the allocated quota, the highest-scoring images are selected from the image set as the second images.

[0027] Since the recommended results contain multiple image sets, only the first number of results can be displayed on the first screen or primary page. Therefore, the recommendation strategy of this embodiment first clusters the second information according to certain dimensions, and allocates quotas to each group based on the first number. Then, based on the image ratings within each group, images with higher ratings are selected from each group according to the allocated quotas and displayed to the user first. Compared to sorting by basic dimensions, the images optimized by the recommendation strategy in this embodiment can be grouped based on dimensions such as time uniformity, location uniformity, and theme uniformity before selection from each group, creating a more memory-based selection result for the user.
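
The selection step described above can be sketched as follows, assuming the quotas and the image scores have already been computed. The function name `pick_second_images` and the data layout (a mapping from set identifier to scored image identifiers) are hypothetical.

```python
from typing import Dict, List, Tuple


def pick_second_images(
    image_sets: Dict[str, List[Tuple[str, float]]],
    quotas: Dict[str, int],
) -> List[str]:
    """From each image set, keep the top-scoring images up to that set's quota."""
    picked: List[str] = []
    for set_id, images in image_sets.items():
        # Highest score first; a set without a quota entry contributes nothing.
        ranked = sorted(images, key=lambda item: item[1], reverse=True)
        picked.extend(image_id for image_id, _ in ranked[: quotas.get(set_id, 0)])
    return picked
```

Because every set with a non-zero quota contributes, the first screen draws from several groups instead of filling up with near-duplicates from a single top-scoring group.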

[0028] In conjunction with the first aspect, some implementations of the first aspect also include the following methods:

[0029] The second correlation degree is obtained based on the first information and the first image. The second correlation degree is the correlation degree between the first image and the first information.

[0030] The first correlation of the image set is obtained based on the second correlation of the images in the image set.

[0031] In this embodiment of the invention, since image features include scene features, the correlation between the image and the first information, i.e., the correlation between the image and the scene, can be obtained by calculating the correlation between the image features and the first information. Then, based on the correlation between the image and the first information, the correlation between the image set and the first information, i.e., the correlation between the image set and the scene, can be obtained.
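
A minimal sketch of deriving the first relevance of each image set from the second relevance of its images, and ordering the sets by it, is given below. The text does not state how per-image values are aggregated; the arithmetic mean used here is an assumption, as is the function name `rank_image_sets`.

```python
from statistics import mean
from typing import Dict, List, Tuple


def rank_image_sets(
    second_relevance: Dict[str, float],
    image_sets: Dict[str, List[str]],
) -> List[Tuple[str, float]]:
    """Return (set_id, first_relevance) pairs, most relevant set first."""
    first_relevance = {
        set_id: mean(second_relevance[image_id] for image_id in image_ids)
        for set_id, image_ids in image_sets.items()
        if image_ids  # skip empty sets; mean() is undefined for them
    }
    return sorted(first_relevance.items(), key=lambda pair: pair[1], reverse=True)
```

Other aggregations (maximum, or a mean over the top few images) would fit the same interface; which one reflects the intended behavior is not determined by the text.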

[0032] In conjunction with the first aspect, in certain implementations of the first aspect, clustering multiple first images according to at least one dimension to obtain at least one image set includes:

[0033] In response to a user's setting operation for at least one dimension, multiple first images are clustered and grouped according to at least one dimension to obtain at least one image set.

[0034] In this embodiment of the invention, users can customize the dimensions of clustering groups according to their needs.

[0035] In conjunction with the first aspect, in certain implementations of the first aspect, images in the image set are scored to obtain image scores, including:

[0036] In response to a user's setting operation on at least one image parameter, the images in the image set are rated based on at least one image parameter to obtain an image rating.

[0037] In this embodiment of the invention, users can customize the image parameters used for image scoring according to their needs, thereby flexibly setting the scoring criteria as required.

[0038] In conjunction with the first aspect, in certain implementations of the first aspect, the allocation quota for each image set is obtained by allocating quotas according to the first quantity, including:

[0039] Input the first quantity and at least one image set into the seat allocation model to obtain the allocation quota of the image set output by the seat allocation model.

[0040] For example, when a user performs a photo search, if the first-level search results page can only display a first number of images, then a quota needs to be allocated to each image set based on the first number. Furthermore, in most scenarios, a specific number of images need to be selected to prioritize and present them to the user. Image selection can also satisfy a sense of nostalgia to a certain extent (e.g., even distribution of time and location), giving the image set a narrative quality.

[0041] For example, the seat allocation model may be the Q-value method or the largest remainder method.
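
To make the quota allocation concrete, the sketch below applies the largest remainder method, treating each image set's size as its "votes" and the first number as the number of "seats". Using set size as the weight is an assumption for illustration; the text does not say which quantity the seat allocation model weighs, and the function name `largest_remainder` is hypothetical.

```python
from typing import Dict


def largest_remainder(set_sizes: Dict[str, int], first_number: int) -> Dict[str, int]:
    """Split first_number display slots across image sets in proportion to size."""
    total = sum(set_sizes.values())
    if total == 0 or first_number <= 0:
        return {set_id: 0 for set_id in set_sizes}
    seats = min(first_number, total)  # cannot show more images than exist

    # Integer arithmetic: quotient is the guaranteed share, remainder ranks
    # the sets competing for the slots left over after rounding down.
    shares = {set_id: divmod(size * seats, total) for set_id, size in set_sizes.items()}
    quotas = {set_id: quotient for set_id, (quotient, _) in shares.items()}
    leftover = seats - sum(quotas.values())
    by_remainder = sorted(set_sizes, key=lambda set_id: shares[set_id][1], reverse=True)
    for set_id in by_remainder[:leftover]:
        quotas[set_id] += 1
    return quotas
```

For sets of 50, 30, and 20 images and a first number of 6, the exact shares are 3.0, 1.8, and 1.2, so the single leftover slot goes to the second set, giving quotas of 3, 2, and 1.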

[0042] In conjunction with the first aspect, in some implementations of the first aspect, before the operation of inputting the first information in the search control is responded to, the method further includes:

[0043] Obtaining an image;

[0044] Obtaining the first type of feature based on the image.

[0045] In this embodiment of the invention, the first type of feature of the image, namely the scene feature, is obtained based on the image itself.

[0046] In conjunction with the first aspect, in some implementations of the first aspect, image features also include a second type of features, which includes the image's CV features and basic attribute features; the first type of features obtained from the image include:

[0047] The second type of feature is obtained from the image;

[0048] Based on the image and the second type of features, the first type of features are obtained.

[0049] In this embodiment of the invention, the image features include a first type of feature and a second type of feature. The second type of feature is obtained based on the image itself, while the first type of feature is obtained based on both the image itself and the second type of feature.

[0050] In conjunction with the first aspect, in some implementations of the first aspect, the second type of features are obtained from the image, including:

[0051] Input the image into the AI model to obtain the second type of features output by the AI model.

[0052] In this embodiment of the invention, an AI model is used to preprocess and extract a second type of feature that represents CV features such as face and aesthetics, as well as basic image attributes (such as shooting time, location, and tone).

[0053] In conjunction with the first aspect, in some implementations of the first aspect, the first type of features are obtained based on the image and the second type of features, including:

[0054] The image and the second type of features are input into the multimodal large model to obtain the first type of features output by the multimodal large model.

[0055] In this embodiment of the invention, the encoded image and the second type of features of the image are input into the multimodal large model to obtain the first type of features output by the multimodal large model.
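
The dependency between the two feature types can be sketched as a two-stage pipeline. Both models are passed in as plain callables because the text does not identify them; the names `ImageFeatures`, `extract_features`, `ai_model`, and `multimodal_model`, and the choice of a dictionary and a string as feature containers, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ImageFeatures:
    second_type: Dict[str, str]  # CV features and basic attributes
    first_type: str              # scene features, e.g. a scene description


def extract_features(
    image: bytes,
    ai_model: Callable[[bytes], Dict[str, str]],
    multimodal_model: Callable[[bytes, Dict[str, str]], str],
) -> ImageFeatures:
    """Stage 1 derives second-type features; stage 2 feeds them, with the image, to the large model."""
    second = ai_model(image)
    first = multimodal_model(image, second)
    return ImageFeatures(second_type=second, first_type=first)
```

Since this extraction runs before any search is issued, the resulting features could be computed once per image and stored alongside the image library.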

[0056] In conjunction with the first aspect, some implementations of the first aspect also include:

[0057] In response to an operation on the second control, the interface switches from the first interface to the second interface. The first interface is used to display at least one set of images, and the second interface is used to display multiple first images in a sorted order based on a second degree of relevance, where the second degree of relevance is the degree of relevance between the first image and the first information.

[0058] In this embodiment of the invention, the primary interface of the search results also includes a second control, which allows users to operate on the second control to jump from the primary interface to the secondary interface of the search results.

[0059] For example, the first-level search results interface displays only a portion of the images in the search results, while the second-level search results interface displays all the images in the search results.

[0060] Since the correlation between the first image and the first piece of information has been calculated, the images in the secondary interface can be sorted according to their correlation, which helps improve the accuracy of search results and retrieval efficiency.

[0061] In a second aspect, embodiments of the present invention provide an electronic device, including a processor and a memory, wherein the memory is used to store a computer program, the computer program including program instructions, and when the processor executes the program instructions, the electronic device performs the steps of the method described above.

[0062] In a third aspect, embodiments of the present invention provide a computer-readable storage medium storing a computer program, the computer program including program instructions that, when run by a computer, cause the computer to perform the method described above.

[0063] In a fourth aspect, embodiments of the present invention provide a computer program product comprising instructions that, when the computer program product is run on a computer or on at least one processor, cause the computer to perform the functions / steps described above.

[0064] The present invention provides an information display method and an electronic device. The method includes: responding to an operation of inputting first information (a text or document file) into a search control; searching an image library based on the semantics of the first information and multimodal fusion of images to obtain second information, the second information including multiple first images found in the image library that match the first information; and displaying at least one image set based on the second information, the images in the image set being derived from the multiple first images in the second information. This multimodal fusion recommendation based on semantics and images makes the recommendation results more consistent with the user's intent.

Attached Figure Description

[0065] Figure 1 is a schematic diagram of an image recommendation interface;

[0066] Figure 2 is a schematic diagram of an image search interface;

[0067] Figure 3 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present invention;

[0068] Figure 4 is a software structure block diagram of the electronic device 100 according to an embodiment of the present invention;

[0069] Figure 5 is an architecture diagram of an information display system provided in an embodiment of the present invention;

[0070] Figure 6 is a schematic diagram of an image recommendation interface in an embodiment of the present invention;

[0071] Figure 7 is a schematic diagram of an image search interface according to an embodiment of the present invention;

[0072] Figure 8 is a schematic diagram of the internal implementation of an information display method provided in an embodiment of the present invention;

[0073] Figure 9 is a schematic diagram of the information display method flow in an embodiment of the present invention;

[0074] Figure 10 is a schematic diagram of inputting an image and second type of features into a multimodal large model to obtain the first type of features in an embodiment of the present invention;

[0075] Figure 11 is a schematic diagram of the multimodal large model outputting scene descriptions based on images and formatted prompts in an embodiment of the present invention;

[0076] Figure 12 is a schematic diagram of the internal implementation of the retrieval engine in Figure 8, which searches for second information from the image library based on the semantics of the first information and the multimodal fusion of the images. The second information contains multiple first images related to the first information.

[0077] Figure 13 is a schematic diagram of the internal implementation of the filtering and sorting module in Figure 8 to obtain at least one set of images based on the second information;

[0078] Figure 14 is a schematic diagram of an image search interface according to an embodiment of the present invention;

[0079] Figure 15 is a flowchart of the sorting module in Figure 13 sorting at least one set of images according to the correlation between the first image and the first information;

[0080] Figure 16 is a schematic diagram of the internal implementation of the filtering and sorting module in Figure 8, which obtains at least one image set and a first number of second images based on the second information.

[0081] Figure 17 is a schematic diagram of the process of selecting images using a recommendation strategy in an embodiment of the present invention;

[0082] Figure 18 is a schematic diagram of clustering multiple first images according to the time dimension in an embodiment of the present invention;

[0083] Figure 19 is a flowchart of the recommendation strategy module in Figure 16 selecting at least one second image from at least one image set according to the recommendation strategy.

[0084] Figure 20 is a schematic diagram of the allocation quota for each image set in an embodiment of the present invention;

[0085] Figure 21 is a schematic diagram of another image search interface in an embodiment of the present invention;

[0086] Figure 22 is a schematic diagram of selecting the second image from the image set according to the recommended quota in an embodiment of the present invention;

[0087] Figure 23 is a schematic diagram of the process of selecting images using a recommendation strategy in an embodiment of the present invention;

[0088] Figure 24 is a schematic diagram of the search results first-level interface displayed after the user inputs the first information in an embodiment of the present invention;

[0089] Figure 25 is a schematic diagram of user operation of the first control in an embodiment of the present invention;

[0090] Figure 26 is a schematic diagram of switching from the first interface to the second interface in an embodiment of the present invention;

[0091] Figure 27 is a schematic diagram of user operation of the third control in an embodiment of the present invention;

[0092] Figure 28 is a schematic diagram of the structure of a device provided in an embodiment of the present invention.

Detailed Implementation

[0093] To better understand the technical solution of the present invention, the embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

[0094] It should be understood that the described embodiments are merely some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort are within the scope of protection of the present invention.

[0095] The terminology used in the embodiments of this invention is for the purpose of describing particular embodiments only and is not intended to limit the invention. The singular forms “a,” “said,” and “the” as used in the embodiments of this invention and the appended claims are also intended to include the plural forms unless the context clearly indicates otherwise.

[0096] It should be understood that the term "and / or" used in this article is merely a description of the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A existing alone, A and B existing simultaneously, or B existing alone. Additionally, the character " / " in this article generally indicates that the preceding and following related objects have an "or" relationship.

[0097] Today, mobile phone manufacturers offer a wide variety of gallery apps with diverse functions, providing users with various memory recommendation scenarios and keyword search capabilities, offering a convenient image management platform. On the one hand, in photo recommendation scenarios, such as the "Highlights" section of the gallery app, the app typically uses time, location, or people as themes, aggregates images into groups with auxiliary image tags, and then selects high-quality photos based on existing comprehensive scores. These selections often do not conform to the logic of personal memories, have limited scenario coverage, and lack narrative. On the other hand, in photo search scenarios, the app matches people, tags, locations, and Optical Character Recognition (OCR) text based on user-input keywords. This is mostly limited to image feature text matching or vector retrieval, relying on inherent attributes, and the search dimensions are not sufficient. The search results are also sorted only according to a single dimension (such as reverse chronological order).

[0098] Figure 1 is a schematic diagram of an image recommendation interface. As shown in Figure 1, the "Highlights" interface of the photo library application sorts the images recommended to the user according to inherent file attributes such as time, location, and tags, and thereby recommends many memory-related images. However, sorting only by inherent file attributes such as time and location has limited coverage and lacks narrative appeal.

[0099] Figure 2 is a schematic diagram of an image search interface. As shown in Figure 2, the image library application recommends images and albums related to "birthday" to the user based on the keyword "birthday" entered in the search control. However, the recommendation process is based on keyword matching, that is, retrieving results by text matching based on basic attributes such as people, time, location, and tags, which cannot meet the user's requirements for specific scenarios. Albums are only clustered according to the inherent attributes of files such as time and location, which has limited scenario coverage and lacks narrative. Search results are retrieved according to a fixed sorting method, resulting in a large number of similar and low-quality images, leading to low retrieval efficiency. Image selection is based solely on a predetermined scoring model, assigning scores to each image and selecting the highest-scoring photos, which easily leads to images with similar time periods and styles, lacking narrative and a sense of nostalgia. For example, the six images displayed under "Search Results" in Figure 2 are all from the same time period, showing similar time periods and styles, lacking narrative and a sense of nostalgia.

[0100] To address the low efficiency of image recommendation, one approach uses an event parsing module to obtain file attribute parameters and image attribute parameters for images. Based on these parameters, photo files are aggregated and categorized, resulting in a collection of photos with high event relevance, which improves the efficiency of subsequent photo search and location. However, image attribute parameters only consider basic features such as hue and contrast, failing to accurately describe the content and events of the images, resulting in limited aggregated scene information. Furthermore, the lack of optimization or ranking strategies for search results means that user retrieval efficiency remains low in scenarios with a large number of similar or duplicate images.

[0101] Therefore, image recommendation has the following technical problems:

[0102] Problem 1: Often, search results are simply matched to different image features (such as person ID, tag category, time, location, album name, OCR-recognized text, etc.) based on user-input keywords, which may not reflect the user's intent. Firstly, these text matching or vector retrieval methods, often limited to image features, rely on inherent attributes or tag recognition, resulting in insufficient richness of search sources. For example, users can only find relevant images by entering specific keywords (such as "dinner party" or "food"); however, if the user wants to enter a phrase (such as "this year's highlight moment") or unspecified keywords (such as "delicious"), no relevant images will be found. Secondly, search results are mostly sorted simply in reverse chronological order, and the top results are often not the most relevant to the search scenario.

[0103] Problem 2: Applications typically generate memory themes based on time, location, or people, and group images with the help of auxiliary image tag information. After comprehensive scoring based on existing features, images are selected from the highest score down. The selection results often do not conform to the logic of people's memories (such as an even spread over time and location), have limited coverage of scenarios, and lack narrative.

[0104] To address the aforementioned technical issues, this invention provides an information display system that uses multimodal fusion recommendation based on semantics and images to make the recommendation results more aligned with the user's intent.

[0105] The information display system provided in this embodiment of the invention can be entirely deployed in electronic devices. Figure 3 shows a schematic diagram of the structure of the electronic device 100.

[0106] Electronic device 100 may include processor 110, external memory interface 120, internal memory 121, universal serial bus (USB) interface 130, charging management module 140, power management module 141, battery 142, antenna 1, antenna 2, mobile communication module 150, wireless communication module 160, audio module 170, speaker 170A, receiver 170B, microphone 170C, headphone jack 170D, sensor module 180, button 190, motor 191, indicator 192, camera 193, display screen 194, and subscriber identification module (SIM) card interface 195, etc. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, a barometric pressure sensor 180C, a magnetic sensor 180D, an accelerometer sensor 180E, a distance sensor 180F, a proximity sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.

[0107] It is understood that the structures illustrated in the embodiments of the present invention do not constitute a specific limitation on the electronic device 100. In other embodiments of this application, the electronic device 100 may include more or fewer components than illustrated, or combine some components, or split some components, or have different component arrangements. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.

[0108] Processor 110 may include one or more processing units, such as application processor (AP), modem processor, graphics processing unit (GPU), image signal processor (ISP), controller, video codec, digital signal processor (DSP), baseband processor, and / or neural network processing unit (NPU). These different processing units may be independent devices or integrated into one or more processors.

[0109] The controller can generate operation control signals based on the instruction opcode and timing signals to complete the control of instruction fetching and execution.

[0110] The processor 110 may also include a memory for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. This memory can store instructions or data that the processor 110 has just used or that are used repeatedly. If the processor 110 needs to use the instruction or data again, it can retrieve it directly from the memory. This avoids repeated accesses, reduces the waiting time of the processor 110, and thus improves the efficiency of the system.

[0111] In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver / transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input / output (GPIO) interface, a subscriber identity module (SIM) interface, and / or a universal serial bus (USB) interface, etc.

[0112] It is understood that the interface connection relationships between the modules illustrated in the embodiments of the present invention are merely illustrative and do not constitute a structural limitation on the electronic device 100. In other embodiments of this application, the electronic device 100 may also employ different interface connection methods or combinations of multiple interface connection methods as described in the above embodiments.

[0113] The charging management module 140 receives charging input from a charger. The charger can be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 receives charging input from the wired charger via the USB interface 130. In some wireless charging embodiments, the charging management module 140 receives wireless charging input via the wireless charging coil of the electronic device 100. While charging the battery 142, the charging management module 140 can also supply power to the electronic device via the power management module 141.

[0114] The power management module 141 connects the battery 142, the charging management module 140, and the processor 110. The power management module 141 receives input from the battery 142 and / or the charging management module 140, providing power to the processor 110, internal memory 121, display screen 194, camera 193, and wireless communication module 160, etc. The power management module 141 can also monitor parameters such as battery capacity, battery cycle count, and battery health status (leakage current, impedance). In some other embodiments, the power management module 141 may also be located within the processor 110. In other embodiments, the power management module 141 and the charging management module 140 may be located in the same device.

[0115] The wireless communication function of electronic device 100 can be realized through antenna 1, antenna 2, mobile communication module 150, wireless communication module 160, modem processor and baseband processor, etc.

[0116] Antenna 1 and antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in electronic device 100 can be used to cover one or more communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization. For example, antenna 1 can be multiplexed as a diversity antenna for a wireless local area network. In some other embodiments, the antennas can be used in conjunction with tuning switches.

[0117] The mobile communication module 150 can provide solutions for wireless communication, including 2G / 3G / 4G / 5G, applied to the electronic device 100. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), etc. The mobile communication module 150 can receive electromagnetic waves via antenna 1, and perform filtering, amplification, and other processing on the received electromagnetic waves before transmitting them to a modem processor for demodulation. The mobile communication module 150 can also amplify the signal modulated by the modem processor and convert it into electromagnetic waves for radiation via antenna 1. In some embodiments, at least some functional modules of the mobile communication module 150 may be housed in the processor 110. In some embodiments, at least some functional modules of the mobile communication module 150 and at least some modules of the processor 110 may be housed in the same device.

[0118] The wireless communication module 160 can provide solutions for wireless communication applications on the electronic device 100, including wireless local area networks (such as Wi-Fi), Bluetooth, global navigation satellite systems, frequency modulation, short-range wireless communication technologies, and infrared technologies. The wireless communication module 160 can be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via antenna 2, performs frequency modulation and filtering of the electromagnetic wave signals, and sends the processed signal to processor 110. The wireless communication module 160 can also receive signals to be transmitted from processor 110, perform frequency modulation and amplification, and convert them into electromagnetic waves for radiation via antenna 2.

[0119] Electronic device 100 implements display functions through a GPU, a display screen 194, and an application processor. The GPU is a microprocessor for image processing, connected to the display screen 194 and the application processor. The GPU is used to perform mathematical and geometric calculations and for graphics rendering. Processor 110 may include one or more GPUs, which execute program instructions to generate or modify display information.

[0120] Electronic device 100 can perform shooting functions through ISP, camera 193, video codec, GPU, display 194 and application processor.

[0121] The external storage interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100. The external memory card communicates with the processor 110 through the external storage interface 120 to perform data storage functions. For example, music, video, and other files can be saved on the external memory card.

[0122] Internal memory 121 can be used to store computer executable program code, which includes instructions. Internal memory 121 may include a program storage area and a data storage area. The program storage area may store the operating system, at least one application program required for a function (such as sound playback, image playback, etc.), etc. The data storage area may store data created during the use of electronic device 100 (such as audio data, phonebook, etc.). Furthermore, internal memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as at least one disk storage device, flash memory device, general-purpose flash memory, etc. Processor 110 executes various functional applications and data processing of electronic device 100 by running instructions stored in internal memory 121 and / or instructions stored in memory located in the processor.

[0123] Electronic device 100 can implement audio functions, such as music playback and recording, through audio module 170, speaker 170A, receiver 170B, microphone 170C, headphone jack 170D, and application processor.

[0124] Buttons 190 include a power button, volume buttons, etc. Buttons 190 can be mechanical buttons or touch-sensitive buttons. Electronic device 100 can receive button input and generate key signal inputs related to user settings and function control of electronic device 100.

[0125] Motor 191 can generate vibration alerts. Motor 191 can be used for incoming call vibration alerts or for touch vibration feedback. For example, different vibration feedback effects can correspond to touch operations performed on different applications (such as taking photos, playing audio, etc.). Motor 191 can also correspond to different vibration feedback effects for touch operations performed on different areas of the display screen 194. Different application scenarios (such as time reminders, receiving messages, alarm clocks, games, etc.) can also correspond to different vibration feedback effects. The touch vibration feedback effect can also be customized.

[0126] Indicator 192 can be an indicator light, used to indicate charging status, power changes, or to indicate messages, missed calls, notifications, etc.

[0127] The SIM card interface 195 is used to connect a SIM card. The SIM card can be inserted into or removed from the SIM card interface 195 to make contact with and separate from the electronic device 100. The electronic device 100 interacts with the network through the SIM card to realize functions such as making calls and data communication.

[0128] The software system of electronic device 100 can adopt a layered architecture, event-driven architecture, microkernel architecture, microservice architecture, or cloud architecture. This embodiment of the invention takes the Android system with a layered architecture as an example to illustrate the software structure of electronic device 100.

[0129] Figure 4 is a software structure block diagram of an electronic device 100 according to an embodiment of the present invention.

[0130] A layered architecture divides software into several layers, each with a clear role and function. Layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers, from top to bottom: the application layer, the application framework layer, the Android runtime and system libraries, and the kernel layer.

[0131] The application layer can include a series of application packages.

[0132] As shown in Figure 4, the application package may include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, and SMS.

[0133] The application framework layer provides application programming interfaces (APIs) and a programming framework for applications in the application layer. The application framework layer includes some predefined functions.

[0134] As shown in Figure 4, the application framework layer may include a window manager, content provider, view system, phone manager, resource manager, notification manager, etc.

[0135] The window manager is used to manage windowed applications. It can retrieve screen size, determine the presence of a status bar, lock the screen, and capture screenshots, among other things.

[0136] Content providers store and retrieve data, making that data accessible to applications. This data may include videos, images, audio, made and received phone calls, browsing history and bookmarks, phone books, etc.

[0137] A view system includes visual controls, such as controls for displaying text and controls for displaying images. View systems can be used to build applications. A display interface can consist of one or more views. For example, a display interface including a text notification icon could include views for displaying text and views for displaying images.

[0138] The phone manager is used to provide communication functions for electronic device 100. For example, it manages call status (including connection and disconnection).

[0139] The resource manager provides applications with various resources, such as localized strings, icons, images, layout files, video files, and more.

[0140] The notification manager allows applications to display notifications in the status bar. These notifications can be used to deliver informational messages and can disappear automatically after a short pause, requiring no user interaction. For example, the notification manager can be used to notify users of completed downloads or message alerts. The notification manager can also display notifications as icons or scrolling text in the top status bar, such as notifications from background applications, or as dialog boxes on the screen. Examples include displaying text messages in the status bar, emitting sounds, vibrating electronic devices, and flashing indicator lights.

[0141] The Android Runtime consists of core libraries and a virtual machine. The Android runtime is responsible for the scheduling and management of the Android system.

[0142] The core library consists of two parts: one part comprises the functions that the Java language needs to call, and the other part is the Android core library.

[0143] The application layer and application framework layer run in a virtual machine. The virtual machine executes the Java files of the application layer and application framework layer as binary files. The virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection.

[0144] System libraries can include multiple functional modules. For example: surface manager, media libraries, 3D graphics processing libraries (e.g., OpenGL ES), 2D graphics engines (e.g., SGL), etc.

[0145] The Surface Manager is used to manage the display subsystem and provides the blending of 2D and 3D layers for multiple applications.

[0146] The media library supports playback and recording of various common audio and video formats, as well as still image files. It supports multiple audio and video encoding formats, such as MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG.

[0147] The 3D graphics processing library is used to implement 3D graphics drawing, image rendering, compositing, and layer processing.

[0148] A 2D graphics engine is a graphics engine for 2D drawing.

[0149] The kernel layer is the layer between hardware and software. The kernel layer contains at least the display driver, camera driver, audio driver, and sensor driver.

[0150] Figure 5 is an architecture diagram of an information display system provided in an embodiment of the present invention.

[0151] As shown in Figure 5, the information display system includes a storage module 210, a preprocessing module 220, a multi-modal scene analysis module 230, a first application 240, a retrieval engine 250, and a filtering and sorting module 260.

[0152] For example, storage module 210 is used to store the source data (also called "image library source data") of images in the image library and to store image features. Image features include a first type of feature x1 and a second type of feature x2 of the image.

[0153] For example, the preprocessing module 220 inputs the image from the storage module 210 into an artificial intelligence (AI) model to extract the second type of features x2 from the image. The second type of features x2 includes the image's computer vision (CV) features and basic attribute features. The preprocessing module 220 uses the AI model to analyze objective features such as people and tags in the image, and combines these with the image's basic attributes to generate the second type of features x2.

[0154] For example, the multi-modal scene analysis module 230 is used to prepare data for retrieval and recommendation scenarios. Specifically, it inputs the image and the second type of feature x2 into a multimodal large language model (MLLM) to obtain the first type of feature x1 of the image, which contains the scene features of the image. Taking the encoded image itself together with the generated second type of feature x2, the multimodal large language model interprets and describes the scene depicted in the image, generates a scene description, and extracts richer contextual information as the first type of feature x1 of the image.

[0155] For example, the first application 240 includes a first interface. The first interface is an image search interface or an image recommendation interface.

[0156] When the first interface is an image search interface, it includes a search control in which the user enters the first information S. The first information S is a text segment or a document file.

[0157] When the first interface is an image recommendation interface, the first information S is the scenario information configured by the operation and maintenance department.

[0158] For example, the retrieval engine 250 is used to search the image library based on the first information S to obtain the second information, which includes multiple first images found in the image library that are associated with the first information S and their first type of feature x1 and second type of feature x2.

[0159] For example, the retrieval engine 250 includes a Natural Language Processing (NLP) module 251 and a result recall module 252.

[0160] For example, NLP module 251 is used to parse the first information S through NLP to obtain the semantics of the first information S.

[0161] For example, the result recall module 252 is used to retrieve the second information from the image library using a vector retrieval method based on the image features of the images in the image library. The second information includes the first images whose image features semantically match the first information S.

[0162] For example, the filtering and sorting module 260 includes a correlation calculation module 261, a recommendation strategy module 262, and a sorting module 263.

[0163] For example, the correlation calculation module 261 is used to calculate the semantic similarity between the image features of the first image and the first information S through a semantic similarity calculation model (such as cosine similarity of vector embedding), and use the calculated semantic similarity as the correlation α between the first image and the first information S.
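The cosine similarity mentioned above can be sketched as follows. This is a minimal illustration only: the function name `cosine_similarity` and the 3-dimensional example vectors are assumptions made for this sketch, and the source does not state which embedding model produces the vectors.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical embeddings of an image's features and of the first information S.
image_embedding = [1.0, 0.0, 1.0]
query_embedding = [1.0, 0.0, 0.0]
alpha = cosine_similarity(image_embedding, query_embedding)
```

Under these assumptions, `alpha` plays the role of the correlation α between a first image and the first information S; values closer to 1 indicate a closer semantic match.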

[0164] For example, the recommendation strategy module 262 is used to recommend or retrieve image sets that better follow memory logic for the scenario, enhancing the scenario relevance and narrative quality of the resulting image set. Specifically, it groups the multiple first images in the second information to obtain at least one image set, and uses a data optimization strategy based on the seat allocation model to cluster and select the images in each image set in line with memory logic.

[0165] For example, the sorting module 263 is used to sort the results output by the recommendation strategy module 262 according to predetermined rules.

[0166] For example, the first application 240 is also used to display at least one image set in a first interface based on the sorting result output by the sorting module 263, wherein the images in the image set are from multiple images in the second information.

[0167] The information display system provided in this embodiment of the invention can also be deployed in multiple different devices.

[0168] For example, the storage module 210, preprocessing module 220, first application 240, retrieval engine 250, and filtering and sorting module 260 are deployed in an electronic device, and the multi-modal scene analysis module 230 is deployed in a server, with a communication connection between the electronic device and the server.

[0169] For example, electronic devices include, but are not limited to, mobile phones, personal computers (PCs), tablets, watches, smart screens, in-vehicle systems, and all other terminal devices with image management functions.

[0170] The hardware and software structures of the electronic device provided in this embodiment of the invention can be found in the relevant descriptions of the electronic device 100 in Figures 3 and 4.

[0171] Based on the above information display system, this embodiment of the invention provides an information display method that uses multimodal fusion recommendation based on semantics and images to make the recommendation results more in line with the user's intent.

[0172] The information display method provided in this embodiment of the invention has the following two application scenarios:

[0173] Scenario 1: Image Recommendation Scenario. Figure 6 is a schematic diagram of an image recommendation interface in an embodiment of the present invention. When a user opens the image management program, as shown in Figure 6, the electronic device can use a data optimization strategy based on seat allocation to recommend images related to the user's memories in the image recommendation interface. First, images related to the scene are selected from the image set. The selected images are clustered according to at least one dimension. Then, a seat allocation model is used to allocate a specified number of seats to each group of images. Based on the number of seats in each group, the image with the highest score is selected from each group of images to ensure that the selected images are of high quality and evenly distributed in dimensions. Finally, the selected images are sorted according to their relevance to the scene, thereby selecting and sorting photo groups that are more in line with the memory logic for the user.
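The selection flow in Scenario 1 (cluster, allocate seats, pick the highest-scoring images per group, sort by relevance) can be sketched as below. The source does not define the seat allocation model at this point, so the sketch assumes a largest-remainder allocation proportional to group size; `Candidate`, `allocate_seats`, `select_images`, and all field names and values are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    image_id: str
    group_key: str    # clustering dimension, e.g. capture day (assumed)
    score: float      # quality score, e.g. an aesthetic score (assumed)
    relevance: float  # correlation with the scene


def allocate_seats(group_sizes: dict[str, int], total_seats: int) -> dict[str, int]:
    """Largest-remainder allocation proportional to group size (an assumption)."""
    total = sum(group_sizes.values())
    if total == 0 or total_seats <= 0:
        return {key: 0 for key in group_sizes}
    total_seats = min(total_seats, total)  # cannot seat more images than exist
    quotas = {k: n * total_seats / total for k, n in group_sizes.items()}
    seats = {k: int(q) for k, q in quotas.items()}
    leftover = total_seats - sum(seats.values())
    # Remaining seats go to the largest fractional remainders; ties break by key.
    order = sorted(quotas, key=lambda k: (-(quotas[k] - seats[k]), k))
    for k in order[:leftover]:
        seats[k] += 1
    return seats


def select_images(candidates: list[Candidate], total_seats: int) -> list[Candidate]:
    # Cluster the scene-related images along one dimension.
    groups: dict[str, list[Candidate]] = defaultdict(list)
    for c in candidates:
        groups[c.group_key].append(c)
    seats = allocate_seats({k: len(v) for k, v in groups.items()}, total_seats)
    # Within each group, keep the highest-scoring images up to its seat count.
    chosen: list[Candidate] = []
    for key, members in groups.items():
        best_first = sorted(members, key=lambda c: c.score, reverse=True)
        chosen.extend(best_first[: seats[key]])
    # Finally, order the selection by relevance to the scene.
    return sorted(chosen, key=lambda c: c.relevance, reverse=True)


candidates = [
    Candidate("a", "day1", 0.9, 0.50),
    Candidate("b", "day1", 0.8, 0.90),
    Candidate("c", "day1", 0.2, 0.70),
    Candidate("d", "day1", 0.1, 0.60),
    Candidate("e", "day2", 0.7, 0.95),
    Candidate("f", "day2", 0.3, 0.40),
]
selected_ids = [c.image_id for c in select_images(candidates, total_seats=3)]
```

With three seats, the four-image group receives two seats and the two-image group receives one, so the selection stays spread across groups rather than being dominated by the largest one.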

[0174] Scenario 2: Image Search Scenario. Figure 7 is a schematic diagram of an image search interface in an embodiment of the present invention. When a user opens the image management program, as shown in Figure 7, the electronic device can return richer image results in the image search interface: results can be retrieved even when the user searches for a scene depicted in the images, which broadens the search sources, and the results are displayed sorted by scene relevance. In addition, according to the data optimization strategy shown in Figure 7, a specified number of images can be selected from all the retrieved images and displayed first in the first-level interface of the search results, while all the retrieved images are displayed in the second-level interface of the search results.

[0175] Figure 8 is a schematic diagram of the internal implementation of an information display method provided in an embodiment of the present invention. As shown in Figure 8, the method includes steps 302-306.

[0176] Step 302: The retrieval engine searches the image library based on a multi-modal fusion of the semantics of the first information and the images to obtain second information. The second information contains multiple first images related to the first information.

[0177] For example, the first information is a text segment or document file entered by the user, or scenario information configured by the operation and maintenance department.

[0178] The second information includes multiple first images related to the first information, or it can be understood as: the second information includes multiple first images that match the first information.

[0179] In this embodiment of the invention, the search is performed based on the semantics of the first information and the multimodal fusion of the image, so that the searched second information is more relevant to the first information and better matches the user's intent.

[0180] Step 304: The filtering and sorting module obtains at least one image set based on the second information. The images in the image set come from multiple first images in the second information.

[0181] Step 306: The first application displays at least one set of images.

[0182] For example, the first application is a photo gallery application.

[0183] In this embodiment of the invention, at least one set of images is displayed based on the second information. Since the searched second information is more in line with the user's intent, the at least one set of images displayed based on the second information is also more in line with the user's intent.

[0184] Optionally, as shown in Figure 8, steps 402-418 are included before step 302.

[0185] Step 402: The storage module stores the acquired images.

[0186] For example, images can be acquired by electronic devices through a camera or by receiving images sent by other devices.

[0187] Step 404: The storage module sends the image to the preprocessing module.

[0188] Step 406: The preprocessing module preprocesses the image to obtain the second type of features of the image.

[0189] For example, the second type of features are the image's CV features and basic attribute features.

[0190] CV features can represent information such as faces and aesthetics in an image, while basic attribute features include information such as the time, location, and color tone of the image.

[0191] In some possible embodiments, step 406 includes: the preprocessing module inputs the image into the AI model to obtain the second type of features output by the AI model.

[0192] For example, Figure 9 is a schematic diagram of the information display method flow in an embodiment of the present invention. As shown in Figure 9, image preprocessing involves inputting the image into an AI model, and the AI model outputting the second type of features x2 of the image. In this embodiment of the present invention, the AI model is used to preprocess and extract the second type of features that characterize CV features such as face and aesthetics, as well as basic image attributes (such as shooting time, location, and tone).

[0193] For example, the second type of feature x2 = [cv features: {recognition label, face ID, aesthetic score}, image basic attributes: {capture time, capture location, capture mode}].
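The structure of the second type of feature x2 given above can be sketched as a small data model. This is only an illustration: the class names, field types, and example values (including the aesthetic score and capture mode) are assumptions, since the source lists the field names but not their types or encodings.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class CvFeatures:
    recognition_labels: list[str] = field(default_factory=list)
    face_ids: list[str] = field(default_factory=list)
    aesthetic_score: Optional[float] = None


@dataclass
class BasicAttributes:
    capture_time: Optional[str] = None
    capture_location: Optional[str] = None
    capture_mode: Optional[str] = None


@dataclass
class SecondTypeFeature:
    cv_features: CvFeatures
    basic_attributes: BasicAttributes


# Hypothetical values for one image.
x2 = SecondTypeFeature(
    cv_features=CvFeatures(
        recognition_labels=["landscape", "portrait"],
        face_ids=["sister"],
        aesthetic_score=0.87,
    ),
    basic_attributes=BasicAttributes(
        capture_time="2019-09",
        capture_location="Jiuzhaigou, Sichuan Province",
        capture_mode="portrait",
    ),
)
x2_as_dict = asdict(x2)  # nested dict, convenient for storage or prompting
```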

[0194] Step 408: The preprocessing module sends the image and the second type of features to the multi-modal scene analysis module.

[0195] Step 410: The multi-modal scene analysis module enhances the scene description of the image based on the image and the second type of features to obtain the first type of features of the image.

[0196] For example, the first type of feature is the scene feature of the image (also known as "scene description").

[0197] In some possible embodiments, step 410 includes: the multimodal scene analysis module inputs the image and the second type of features into the multimodal large model to obtain the first type of features output by the multimodal large model.

[0198] For example, as shown in Figure 9, scene description enhancement for an image involves inputting the image and the second type of feature x2 into a multimodal large model, and the multimodal large model outputs the first type of feature x1.

[0199] For example, Figure 10 is a schematic diagram of inputting an image and second-type features into a multimodal large model to obtain the first-type features in an embodiment of the present invention. As shown in Figure 10, in order to enhance the scene relevance of the recommended data source, the electronic device uses text data output by the AI model, including shooting time, location, tag theme, and people, as the second-type feature x2, combined with the image itself (such as a BASE64 encoded image), and passes it to the multimodal large model (which can be deployed on the edge or in the cloud). Based on the auxiliary description of the second-type feature x2, the multimodal large model can generate a scene description x1 of the image as the first-type feature of the image.

[0200] For example, Figure 11 is a schematic diagram of the multimodal large model outputting a scene description based on an image and a formatted Prompt in an embodiment of the present invention. As shown in Figure 11, the second type of feature includes the following information: {Shooting time: September 2019, Shooting location: Jiuzhaigou, Sichuan Province, Photo source: Fujifilm XS10 camera, Tag parsing result: Landscape, Portrait, Person: Sister}. Thus, the following formatted Prompt can be obtained: [This is a photo I took in Jiuzhaigou in September 2019 using a Fujifilm XS10 camera, containing "landscape and portrait". The person in the photo is "Sister". Please use a short and accurate paragraph to describe the scene in this photo, requiring strong narrative.] Then, combined with the base64 encoding of the image itself, it is passed to the multimodal large model to generate a scene description with strong narrative as follows:

[0201] "On a clear morning in September 2019, I set foot in Jiuzhaigou, Sichuan Province, and captured this tranquil and beautiful scene with my camera. In the photo, an elegant lady sits on a wooden platform, with a clear lake behind her. The lake water blends blue and green, as crystal clear as a gemstone. Beside her, lush trees surround an ancient bridge, and the flowing water under the bridge is like a symphony of nature.

[0202] "The lady was dressed traditionally and elegantly; her white blouse and red skirt stood out strikingly in the sunlight. A delicate floral wreath adorned her head, complementing her surroundings. Her hair was carefully braided into two plaits, swaying gently in the breeze. Her gaze was fixed on the distance, as if admiring the picturesque lake view, or perhaps awaiting some important moment.

[0203] "In the lower right corner of the image, a hand holding a black smartphone is faintly visible, as if silently recording everything. This scene is full of life, while also showcasing the beauty and harmony of nature. I can almost feel the refreshing breeze caressing my face, and hear the sound of water flowing and leaves rustling. This photo is unforgettable and has inspired an endless love for nature and life."

[0204] As shown in Figure 11, this scene description provides more objective descriptions of the scene itself, and combines shooting information and people clustering data, thus possessing richer scene information features.
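The prompt construction in the Figure 11 example can be sketched as follows. The template text is taken from the example above; the function name `build_request`, the dictionary keys, and the shape of the returned payload are assumptions, because the source does not specify how the prompt and the Base64-encoded image are passed to the multimodal large model.

```python
from __future__ import annotations

import base64

PROMPT_TEMPLATE = (
    "This is a photo I took in {location} in {time} using a {source}, "
    'containing "{tags}". The person in the photo is "{person}". '
    "Please use a short and accurate paragraph to describe the scene in "
    "this photo, requiring strong narrative."
)


def build_request(image_bytes: bytes, x2: dict[str, str]) -> dict[str, str]:
    """Combine the formatted prompt with the Base64-encoded image."""
    prompt = PROMPT_TEMPLATE.format(
        location=x2["location"],
        time=x2["time"],
        source=x2["source"],
        tags=x2["tags"],
        person=x2["person"],
    )
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    # Hypothetical payload shape; a real model API will define its own.
    return {"prompt": prompt, "image_base64": image_b64}


x2 = {
    "time": "September 2019",
    "location": "Jiuzhaigou",
    "source": "Fujifilm XS10 camera",
    "tags": "landscape and portrait",
    "person": "Sister",
}
# Three placeholder bytes stand in for real image data.
request = build_request(b"\xff\xd8\xff", x2)
```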

[0205] In this embodiment of the invention, the features of the images are not limited to basic image attributes and tag features, but also incorporate multi-modal scene descriptions of the images themselves. Compared to keyword retrieval based solely on the second type of features, these scene features are more flexible and contain richer scene information.

[0206] Because the image features in the embodiments of the present invention also include scene features, a richer range of scenes can be associated and generated, and images with stronger scene relevance can be provided to users. The recommended data source is therefore scene-enhanced. In user retrieval scenarios in particular, instead of matching queries only against specified keywords, the method can take a text segment or document file entered by the user and retrieve relevant results, which broadens the hit range of the search results and enriches the scene stories that users can retrieve and piece together.

[0207] Step 412: The multi-modal scene analysis module sends the first type of features to the preprocessing module.

[0208] Step 414: The preprocessing module sends the first type of features and the second type of features to the storage module.

[0209] Step 416: The storage module stores the first and second types of features of the image.
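The data flow of steps 402-416 can be sketched as a single function. This is a simplified illustration in which the modules are plain callables and the storage module is a dictionary; `ingest_image`, `fake_ai_model`, and `fake_mllm` are hypothetical stand-ins, not the AI model or multimodal large model of the embodiment.

```python
from __future__ import annotations

from typing import Callable


def ingest_image(
    image_id: str,
    image_bytes: bytes,
    extract_x2: Callable[[bytes], dict],
    describe_scene: Callable[[bytes, dict], str],
    storage: dict[str, dict],
) -> dict:
    storage[image_id] = {"image": image_bytes}  # step 402: store the image
    x2 = extract_x2(image_bytes)                # steps 404-406: preprocessing
    x1 = describe_scene(image_bytes, x2)        # steps 408-410: scene description
    storage[image_id]["x1"] = x1                # steps 412-416: store both
    storage[image_id]["x2"] = x2                # feature types with the image
    return storage[image_id]


def fake_ai_model(image_bytes: bytes) -> dict:
    # Stand-in for the AI model that outputs the second type of features.
    return {"labels": ["landscape"], "capture_time": "2019-09"}


def fake_mllm(image_bytes: bytes, x2: dict) -> str:
    # Stand-in for the multimodal large model that outputs a scene description.
    return "A " + x2["labels"][0] + " photo taken in " + x2["capture_time"] + "."


storage: dict[str, dict] = {}
record = ingest_image("img-001", b"raw-bytes", fake_ai_model, fake_mllm, storage)
```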

[0210] Step 418: The first application enters the first interface.

[0211] For example, the first interface is an image search interface or an image recommendation interface.

[0212] In some possible embodiments, the first interface is an image search interface, the first interface includes search controls, and the first information is a piece of text or a document file entered by the user; before step 302, step 420 is also included.

[0213] Step 420: In response to the user's operation of entering the first information in the search control, the first application sends the first information to the retrieval engine.

[0214] In some possible embodiments, the first interface is an image recommendation interface, and the first information is the scenario information of operation and maintenance settings; after step 418, step 420 does not need to be executed, and step 302 is executed instead.

[0215] For example, as shown in Figure 9, based on the first information S configured by the operation and maintenance department or input by the user, the retrieval engine retrieves the second information R containing multiple first images from the image library.

[0216] In some possible embodiments, as shown in Figure 12, step 302 includes:

[0217] Step 302a: The natural language processing module performs natural language processing on the first information to obtain the semantics of the first information.

[0218] In this step, the natural language processing module of the retrieval engine performs natural language processing on the first information to obtain the semantics of the first information.

[0219] Step 302b: The natural language processing module sends the semantics of the first information to the result recall module.

[0220] In this step, the natural language processing module of the retrieval engine sends the semantics of the first information to the result recall module.

[0221] Step 302c: The result recall module recalls the first image from the image library that corresponds to the image features that are semantically related to the first information.

[0222] In this step, the result recall module, when recalling images, matches the first type of features and the second type of features mentioned above with the first information, i.e. the scenario actively input by the user or the scenario configured by operation and maintenance, to obtain the second information, i.e. the initial search results.
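The recall in step 302c can be sketched as a similarity search over stored image features. The sketch below substitutes a toy bag-of-words vector for a learned embedding so that it stays self-contained; `embed`, `cosine`, `recall`, the `top_k` parameter, and the sample library are all assumptions, and a real system would use the semantics produced by the natural language processing module.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a learned text encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def recall(query: str, library: dict[str, dict[str, str]], top_k: int = 2) -> list[str]:
    """Return up to top_k image ids whose features best match the query."""
    q = embed(query)
    scored = []
    for image_id, feats in library.items():
        # Match against both feature types: scene description x1 and attributes x2.
        doc = embed(feats["x1"] + " " + feats["x2"])
        scored.append((cosine(q, doc), image_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [image_id for score, image_id in scored[:top_k] if score > 0.0]


library = {
    "img1": {
        "x1": "a lady sits by a clear lake in the mountains",
        "x2": "landscape portrait Jiuzhaigou 2019",
    },
    "img2": {
        "x1": "children blow out candles at a birthday party",
        "x2": "indoor party 2021",
    },
    "img3": {"x1": "a cat sleeps on a sofa", "x2": "pet indoor 2022"},
}
initial_results = recall("lake trip in Jiuzhaigou", library)
```

Because the scene description x1 is part of the matched text, a query about a lake can reach an image whose tags alone ("landscape", "portrait") never mention one.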

[0223] In this embodiment of the invention, the first image retrieved is related to the first information, specifically, the image features of the first image are related to the first information. The image features include a first type of feature and a second type of feature. Because the scene features of the first image retrieved are related to the first information, compared to retrieving results solely based on basic attributes such as people, time, location, and tags using text matching, this embodiment of the invention can meet the user's requirements for the scene and retrieve images that better match the user's intent.

[0224] Optionally, as shown in Figure 8, step 502 is included before step 304.

[0225] Step 502: The search engine sends the first and second information to the filtering and sorting module.

[0226] In some possible embodiments, as shown in Figure 13, step 304 includes: steps 304a-304e.

[0227] Step 304a: The correlation calculation module obtains the correlation between the first image and the first information based on the first type of features, the second type of features, and the first information of the first image.

[0228] In this step, the correlation calculation module in the filtering and sorting module obtains the correlation between the first image and the first information based on the first type of features, the second type of features, and the first information of the first image.

[0229] In some possible embodiments, step 304a includes: the correlation calculation module inputs the first type of features, the second type of features, and the first information of the first image into the semantic similarity calculation model to obtain the correlation between the first image and the first information output by the semantic similarity calculation model.

[0230] For example, as shown in Figure 9, the first type of feature x2, the second type of feature x1, and the first information S of the first image are input into the semantic similarity calculation model, and the semantic similarity calculation model outputs the correlation degree α between the first image and the first information S.

[0231] Step 304b: The correlation calculation module sends the correlation between the first image and the first information, and the second information, to the recommendation strategy module.

[0232] In this step, the correlation calculation module of the filtering and sorting module sends the correlation between the first image and the first information, and the second information to the recommendation strategy module of the filtering and sorting module.

[0233] Step 304c: The recommendation strategy module clusters the second information according to at least one dimension to obtain at least one image set.

[0234] In this step, the recommendation strategy module of the filtering and sorting module clusters the second information according to at least one dimension to obtain at least one image set.

[0235] For example, dimensions can be time, location, theme, or style.

[0236] At least one dimension can be either set by operations and maintenance or selected by the user.

[0237] In some possible embodiments, at least one dimension is selected by the user, and before step 304c, the method further includes: the first application sending the at least one dimension to the retrieval engine in response to the user's operation of selecting the at least one dimension from multiple dimensions; the retrieval engine sending the at least one dimension to the correlation calculation module; and the correlation calculation module sending the at least one dimension to the recommendation strategy module.

[0238] For example, multiple dimension options can be displayed next to the search control in the image search interface, allowing the user to select at least one dimension. After the user selects at least one dimension and enters initial information in the search control, the search can be performed based on the initial information, and clustering can be performed based on at least one dimension.

[0239] For example, Figure 14 is a schematic diagram of an image search interface in an embodiment of the present invention. As shown in Figure 14, the image search interface includes not only the search control A, but also multiple dimension selection controls. Users can select a dimension by clicking the selection control corresponding to the dimension. For example, if a user selects the "time" dimension and the "topic" dimension, the searched images can be grouped using the "time" dimension and the "topic" dimension when performing clustering and grouping later.

[0240] Step 304d: The recommendation strategy module sends the correlation between the first image and the first information, and at least one image set, to the sorting module.

[0241] In this step, the recommendation strategy module of the filtering and sorting module sends the correlation between the first image and the first information, and at least one image set, to the sorting module of the filtering and sorting module.

[0242] Step 304e: The sorting module sorts at least one set of images based on the correlation between the first image and the first information.

[0243] In this step, the sorting module of the filtering and sorting module sorts at least one set of images based on the correlation between the first image and the first information.

[0244] In some possible embodiments, as shown in Figure 15, step 304e includes: steps 3042-3044.

[0245] Step 3042: The sorting module determines the correlation between the image set and the first information based on the correlation between the first image and the first information.

[0246] For example, the correlation between the image set and the first information can be the average correlation between all the first images in the image set and the first information.

[0247] Step 3044: The sorting module sorts at least one image set based on the correlation between the image set and the first information.

[0248] For example, the sorting module sorts the at least one image set in descending order of the relevance between each image set and the first information, obtaining at least one sorted image set. Because this embodiment of the invention sorts image sets by relevance, the ranking of the recommendation results improves, which enhances the user experience.
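Steps 3042 and 3044 can be sketched as follows, using the averaging example given above for the correlation of an image set. The function names and the correlation values are hypothetical and chosen only for illustration.

```python
from statistics import mean

def set_correlation(image_correlations):
    """Correlation of an image set: the mean of its images' correlations."""
    return mean(image_correlations)

def sort_image_sets(image_sets):
    """Return the set names ordered from most to least relevant.

    `image_sets` maps a set name to the list of per-image correlations.
    """
    return sorted(
        image_sets,
        key=lambda name: set_correlation(image_sets[name]),
        reverse=True,
    )

# Invented correlation values for three image sets.
image_sets = {
    "at the company": [0.4, 0.5],
    "at home": [0.9, 0.8, 0.7],
    "with friends": [0.6, 0.7],
}
ranked = sort_image_sets(image_sets)
```

With these values the mean correlations are about 0.8, 0.65, and 0.45, so the sets are ranked "at home", "with friends", "at the company".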

[0249] Optionally, the first application may also display a first number of second images based on the second information, wherein the second images are selected from the image set based on a recommendation strategy.

[0250] For example, the recommendation strategy (also known as the "data optimization strategy based on seat allocation model") is a strategy that allocates a first number of slots to at least one image set through a seat allocation model, scores the images based on the relevance of the images to the scene, and selects the highest-scoring image from the image set according to the allocated slots of the image set.

[0251] For example, as shown in Figure 9, the data optimization strategy based on the seat allocation model selects a first number L of second images from multiple first images as the scene recall recommendation candidate group R'.

[0252] In some possible embodiments, step 304 includes: the filtering and sorting module obtaining at least one image set and a first number of second images based on the second information.

[0253] In some possible embodiments, as shown in Figure 16, step 304 includes: steps 304A-304E.

[0254] Step 304A: The correlation calculation module obtains the correlation between the first image and the first information based on the first type of features, the second type of features, and the first information of the first image.

[0255] For a description of step 304A, please refer to the description of step 304a in the embodiment shown in Figure 13 above, which will not be repeated here.

[0256] Step 304B: The correlation calculation module sends the correlation between the first image and the first information, and the second information, to the recommendation strategy module.

[0257] For a description of step 304B, please refer to the description of step 304b in the embodiment shown in Figure 13 above, which will not be repeated here.

[0258] Step 304C: The recommendation strategy module clusters the second information according to at least one dimension to obtain at least one image set, and selects a first number of second images from the at least one image set according to the recommendation strategy.

[0259] For an explanation of step 304C, in which the recommendation strategy module clusters the second information according to at least one dimension to obtain at least one image set, please refer to the description of step 304c in the embodiment shown in Figure 13 above, and it will not be repeated here.

[0260] Optionally, before clustering, the recommendation strategy module also performs preliminary data screening on the multiple first images to remove similar or low-quality images. Assume the second information is a dataset R = {r1, r2, ..., rn}, where each data point ri carries a multi-dimensional feature vector E = {ε1, ε2, ..., εm} whose components are the normalized features of that data point; the screening removes similar or low-quality images from R.
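One possible form of this preliminary screening is sketched below. It assumes each image carries a quality score and a normalized feature vector, and treats two images as similar when the cosine similarity of their vectors exceeds a threshold. The field names, both thresholds, and the toy data are hypothetical; the application does not specify how similarity or quality is measured.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def prescreen(images, min_quality=30.0, max_similarity=0.98):
    """Drop low-quality images, then drop near-duplicates of images already kept."""
    kept = []
    # Visit better images first so that the best copy of a near-duplicate survives.
    for img in sorted(images, key=lambda i: i["quality"], reverse=True):
        if img["quality"] < min_quality:
            continue  # low-quality image
        if any(cosine(img["features"], k["features"]) > max_similarity for k in kept):
            continue  # too similar to an image that was already kept
        kept.append(img)
    return kept

# Toy dataset R with two-dimensional feature vectors, invented for illustration.
R = [
    {"id": "r1", "quality": 80.0, "features": [1.0, 0.0]},
    {"id": "r2", "quality": 75.0, "features": [0.99, 0.01]},  # near-duplicate of r1
    {"id": "r3", "quality": 10.0, "features": [0.0, 1.0]},    # low quality
    {"id": "r4", "quality": 60.0, "features": [0.0, 1.0]},
]
screened_ids = [img["id"] for img in prescreen(R)]
```

Here `r2` is removed as a near-duplicate of `r1` and `r3` is removed for low quality, leaving `r1` and `r4`.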

[0261] For example, Figure 17 is a schematic diagram of the process of selecting images using a recommendation strategy in an embodiment of the present invention. As shown in Figure 17, the multiple first images in the second information are first screened; then, based on at least one dimension, the multiple first images are clustered and grouped using a clustering method to obtain at least one image set, that is, k groups G = {g1, g2, ..., gk}, where each group gi contains data with similar features.

[0262] For example, the clustering method can be a clustering algorithm, such as the K-means++ algorithm.

[0263] For example, taking the scenario "Parent-Child | Growth" as an example, Figure 18 is a schematic diagram of clustering multiple first images according to the time dimension in an embodiment of the present invention. As shown in Figure 18, combined with the scenario features of the above images, semantic vector retrieval technology is used to retrieve the image data source of parent-child relationships owned by the user. First, after initial data screening, similar and low-quality images are removed. In addition, in order to satisfy the user's sense of memory and the narrative of the recommended images, this embodiment of the present invention considers clustering and grouping according to the time of image shooting, and obtains image sets for January 2023, March 2023, April 2023, etc.
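The time-dimension grouping of Figure 18 can be approximated by bucketing images by the calendar month in which they were taken. The sketch below uses this simple bucketing rather than a clustering algorithm such as K-means++; the function name and the sample dates are hypothetical.

```python
from collections import defaultdict
from datetime import date

def group_by_month(images):
    """Bucket (image id, shooting date) pairs by calendar month."""
    groups = defaultdict(list)
    for image_id, taken_on in images:
        groups[(taken_on.year, taken_on.month)].append(image_id)
    # Return the buckets in chronological order.
    return dict(sorted(groups.items()))

# Invented shooting dates for four images.
images = [
    ("a", date(2023, 3, 5)),
    ("b", date(2023, 1, 20)),
    ("c", date(2023, 3, 18)),
    ("d", date(2023, 4, 2)),
]
image_sets = group_by_month(images)
```

The result contains three image sets, keyed (2023, 1), (2023, 3), and (2023, 4), with images `a` and `c` sharing the March 2023 set.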

[0264] In some possible embodiments, as shown in Figure 19, the recommendation strategy module selects a first number of second images from at least one image set according to the recommendation strategy, including: steps 602-606.

[0265] Step 602: The recommendation strategy module allocates a quota to each image set based on the first quantity to obtain the allocation quota for the image set.

[0266] For example, when a user performs a photo search, if the top-level search results page can only display a first number of images, then a quota needs to be allocated to each image set based on that first number. In most scenarios, a specific number of images need to be selected to prioritize and display them to the user.

[0267] Table 1. Allocation quota for each image set

[0268] For example, Figure 20 is a schematic diagram of the allocation quota for each image set in an embodiment of the present invention. Assuming the first quantity is 13, as shown in Figure 17, the seat allocation model of the Q-value method is used to allocate quotas for the image sets in each time period shown in Figure 18, and finally the recommended quotas for each image set are obtained. As shown in Figure 20 and Table 1, the allocation quota for the image set in January 2023 is 1, the allocation quota for the image set in March 2023 is 2, the allocation quota for the image set in April 2023 is 1, and so on.
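The application does not spell out the Q-value formula. The sketch below assumes the common textbook form, in which every non-empty group first receives one seat and each remaining seat goes to the group with the largest Q = p² / (n(n + 1)), where p is the number of images in the group and n is the number of seats it already holds. The function name, the cap that keeps a group from receiving more seats than it has images, and the group sizes in the example are assumptions for illustration.

```python
def q_value_allocation(group_sizes, total_seats):
    """Allocate seats to groups with the Q-value method (assumed textbook form)."""
    groups = [i for i, p in enumerate(group_sizes) if p > 0]
    if total_seats < len(groups):
        raise ValueError("not enough seats to give every non-empty group one seat")
    # Every non-empty group starts with one seat.
    seats = [1 if p > 0 else 0 for p in group_sizes]
    for _ in range(total_seats - len(groups)):
        # Assumption: a group cannot hold more seats than it has images.
        candidates = [i for i in groups if seats[i] < group_sizes[i]]
        if not candidates:
            break  # every image is already selected
        best = max(
            candidates,
            key=lambda i: group_sizes[i] ** 2 / (seats[i] * (seats[i] + 1)),
        )
        seats[best] += 1
    return seats

# Three groups with 10, 20, and 30 images sharing 6 seats (invented sizes).
quotas = q_value_allocation([10, 20, 30], 6)
```

For these sizes the remaining three seats go to the third, second, and third group in turn, giving quotas of 1, 2, and 3.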

[0269] In some possible embodiments, step 602 includes: the recommendation strategy module inputs the first quantity and the at least one image set into the seat allocation model to obtain the recommended number of seats for each image set output by the seat allocation model.

[0270] The images selected by the data optimization strategy based on the seat allocation model in this embodiment of the invention are evenly distributed in a certain dimension, thus satisfying the sense of recollection in a certain dimension and giving the image set a certain narrative quality.

[0271] For example, as shown in Figure 17, after clustering, the first number, L, of seats is allocated to at least one image set, i.e., k sets of data, using a seat allocation model, to obtain the recommended seats C = {c1, c2, ..., ck} for each group, where ci represents the recommended seats for the i-th group.

[0272] For example, the seat allocation model could be the Q-value method or the maximum remainder method, etc.
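As an illustration of the maximum remainder method, the sketch below allocates L seats in proportion to the group sizes and hands any leftover seats to the groups with the largest remainders. The function name and the group sizes are hypothetical, and the sketch does not cap a group's seats at its image count.

```python
def largest_remainder_allocation(group_sizes, total_seats):
    """Allocate seats in proportion to group size (largest remainder method)."""
    total = sum(group_sizes)
    if total == 0:
        return [0] * len(group_sizes)
    # Split each exact share p * L / total into its integer part and remainder.
    # Integer arithmetic avoids floating-point ties.
    seats = [p * total_seats // total for p in group_sizes]
    remainders = [p * total_seats % total for p in group_sizes]
    leftover = total_seats - sum(seats)
    # Hand the leftover seats to the groups with the largest remainders.
    order = sorted(range(len(group_sizes)), key=lambda i: remainders[i], reverse=True)
    for i in order[:leftover]:
        seats[i] += 1
    return seats

# Three groups with 10, 20, and 30 images sharing L = 13 seats (invented sizes).
C = largest_remainder_allocation([10, 20, 30], 13)
```

The exact shares are 2.17, 4.33, and 6.5, so the integer parts give 2, 4, and 6 seats and the single leftover seat goes to the third group, yielding C = [2, 4, 7].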

[0273] Step 604: The recommendation strategy module scores the first image based on the correlation between the first image and the first information to obtain the score of the first image.

[0274] This step evaluates the quality of images by assigning them a score; the higher the score, the better the image quality, and vice versa.

[0275] In some possible embodiments, step 604 includes: the recommendation strategy module scores the images in the image set based on at least one image parameter to obtain a score for each image, wherein the at least one image parameter includes the correlation between the first image and the first information.

[0276] At least one of the image parameters can be set by the maintenance team or by the user.

[0277] In some possible embodiments, step 604 includes: in response to a user's setting operation on at least one image parameter, the recommendation strategy module scores the images in each image set according to the at least one image parameter to obtain image scores, the at least one image parameter including the correlation between the first image and the first information.

[0278] For example, in the image search interface, multiple image parameter options can be displayed next to the search control, allowing the user to select at least one image parameter. After the user selects at least one image parameter and enters initial information in the search control, the search can be performed based on the initial information, and the image can be rated based on at least one image parameter.

[0279] For example, image parameters can also be a first parameter, an aesthetic score, and a second parameter of the image.

[0280] For example, the first parameter is used to indicate whether the image has been saved.

[0281] For example, aesthetic scores belong to the second type of features output by the AI model.

[0282] For example, the second parameter is used to indicate whether the image contains a human figure.

[0283] For example, Figure 21 is a schematic diagram of another image search interface in an embodiment of the present invention. As shown in Figure 21, the image search interface includes not only search controls and selection controls for multiple dimensions, but also selection controls for multiple image parameters. Users can select an image parameter by clicking the selection control corresponding to the image parameter. For example, if a user selects the image parameters "relevance" and "aesthetic score", the image parameters "relevance" and "aesthetic score" can be used to score the searched images when scoring them later.

[0284] In some possible embodiments, step 604 includes: the recommendation strategy module inputs at least one image parameter of the image into the scoring model to obtain the score of the image output by the scoring model.

[0285] For example, the scoring model can be the following formula: f(x) = α * (isFavorites * 10 + 0.4 * aesthetics + isPortraits * 10)

[0286] Here, α represents the correlation between the image and the first information, with a value range of [0, 1]. isFavorites represents the first parameter, that is, whether the user has actively saved the image: its value is 1 if the image is saved and 0 otherwise. aesthetics represents the aesthetic score obtained through AI model analysis; each image has an aesthetic score in the range [0, 100]. isPortraits represents the second parameter, with a value of 1 for images containing people and 0 for images without people.
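The example formula can be transcribed directly. In the sketch below the function name, the argument names, and the range checks are illustrative additions; only the formula itself comes from the text.

```python
def score_image(alpha, is_favorite, aesthetics, is_portrait):
    """f(x) = alpha * (isFavorites * 10 + 0.4 * aesthetics + isPortraits * 10)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not 0.0 <= aesthetics <= 100.0:
        raise ValueError("aesthetics must lie in [0, 100]")
    # Booleans are converted to 1 or 0, matching isFavorites and isPortraits.
    return alpha * (int(is_favorite) * 10 + 0.4 * aesthetics + int(is_portrait) * 10)

# A saved portrait with a perfect aesthetic score and full relevance.
best_possible = score_image(1.0, True, 100.0, True)
# An unsaved portrait with aesthetic score 80 and correlation 0.5.
typical = score_image(0.5, False, 80.0, True)
```

Under this formula the score ranges from 0 to 60: the bracketed term is at most 10 + 40 + 10, and α scales it down according to relevance. The second example evaluates to 0.5 * (0 + 32 + 10) = 21.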

[0287] For example, the images in each image set are scored to obtain image scores, as shown in Figure 20. The image scores are displayed in the overlay below each image.

[0288] Step 606: The recommendation strategy module selects the highest-rated image from the image set as the second image based on the allocated quota of the image set.

[0289] For example, as shown in Figure 17, after the seats are allocated, the scores of each dataset are calculated according to the scoring method, and the data with higher scores in each group are selected first according to the recommended number of seats in each group.

[0290] For example, Figure 22 is a schematic diagram of selecting the second images from the image sets according to the recommendation quota in an embodiment of the present invention. As shown in Figure 22, the highest-scoring images are selected from each image set according to its recommendation quota, and 13 images are finally recommended as memories.

[0291] For example, taking the selection of 8 images as an example, Figure 23 is a schematic diagram of the image selection process using the recommendation strategy in this embodiment of the invention. As shown in Figure 23, the recalled second information is first screened and clustered (different textures represent different cluster groups). Then, through the seat allocation model, the recommendation quotas for each group are obtained. After the images in each group are scored, data selection is performed according to the recommendation quotas of each group, and a total of 8 images are selected. These 8 images satisfy the requirement of uniform distribution within different groups to satisfy the sense of recall, and the selected images are the images with the best scores in each group.
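Step 606 can be sketched as a per-group top-N selection. The quotas in the example follow the allocation described for January, March, and April 2023; the function name and the image scores are invented for illustration.

```python
def select_by_quota(scored_sets, quotas):
    """Pick the highest-scoring images from each image set, up to its quota.

    `scored_sets` maps a set name to a list of (image id, score) pairs;
    `quotas` maps a set name to its allocated number of seats.
    """
    selected = []
    for name, images in scored_sets.items():
        best_first = sorted(images, key=lambda pair: pair[1], reverse=True)
        selected.extend(image_id for image_id, _ in best_first[: quotas.get(name, 0)])
    return selected

# Invented scores for three image sets.
scored_sets = {
    "2023-01": [("a", 12.0), ("b", 30.5)],
    "2023-03": [("c", 44.0), ("d", 18.0), ("e", 51.0)],
    "2023-04": [("f", 9.0)],
}
quotas = {"2023-01": 1, "2023-03": 2, "2023-04": 1}
second_images = select_by_quota(scored_sets, quotas)
```

The selection is `b` from January, `e` and `c` from March, and `f` from April, so every group is represented even though April's only image scores lower than the unselected March image `d`.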

[0292] Step 304D: The recommendation strategy module sends the correlation between the first image and the first information, at least one image set, and a first number of second images to the sorting module.

[0293] Step 304E: The sorting module sorts at least one set of images and a first number of second images according to the correlation between the first image and the first information.

[0294] For an explanation of how the sorting module sorts at least one set of images based on the correlation between the first image and the first information in step 304E, please refer to the description of step 304e in the embodiment shown in Figure 13 above, which will not be repeated here.

[0295] In this embodiment of the invention, the method by which the sorting module sorts the first number of second images is not specifically limited; the images can be sorted by time, relevance, score, and so on.

[0296] Optionally, as shown in Figure 8, step 702 is included before step 306.

[0297] Step 702: The sorting module sends at least one sorted image set to the first application.

[0298] In this step, the sorting module of the filtering and sorting module sends at least one sorted set of images to the first application.

[0299] In some possible embodiments, step 702 includes: the sorting module sending at least one sorted set of images and a first number of sorted second images to the first application.

[0300] Optionally, the first application also displays a first control for switching the sorting of at least one image set to chronological order in response to a user action. For example, at least one image set may be arranged in reverse chronological order.

[0301] In some possible embodiments, the first application responds to the user's operation on the first control by sending a first instruction to the sorting module. The sorting module sorts at least one set of images in reverse chronological order according to the first instruction. Then, the sorting module sends the at least one set of images in reverse chronological order to the first application. The first application refreshes the order of at least one set of images in the first interface according to the at least one set of images in reverse chronological order.

[0302] Optionally, the first application also displays a second control, which is used to switch from the first interface to the second interface in response to a user's operation. The first interface is used to display at least one set of images, and the second interface is used to display multiple first images in order of their relevance to the first information.

[0303] For example, the sorting module further sorts the multiple first images in descending order of their relevance to the first information, and sends the sorted multiple first images to the first application.

[0304] For example, the first application displays the sorting of the multiple first images on the second interface.

[0305] Optionally, the second interface also includes a third control, which is used to switch the sorting of multiple first images to chronological order. For example, multiple first images can be arranged in reverse chronological order.

[0306] In some possible embodiments, the first application responds to the user's operation on the third control by sending a second instruction to the sorting module. The sorting module sorts multiple first images in reverse chronological order according to the second instruction. Then, the sorting module sends the multiple first images in reverse chronological order to the first application. The first application refreshes the order of the multiple first images in the second interface according to the multiple first images in reverse chronological order.
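The switch between relevance order and reverse chronological order can be sketched as a single sort function driven by a mode. The mode strings stand in for the first and second instructions and, like the function name and the sample data, are hypothetical.

```python
from datetime import date

def sort_first_images(images, mode="relevance"):
    """Sort (image id, correlation, shooting date) tuples, highest or newest first."""
    if mode == "relevance":
        key = lambda img: img[1]
    elif mode == "time":
        key = lambda img: img[2]
    else:
        raise ValueError(f"unknown sort mode: {mode}")
    return [img[0] for img in sorted(images, key=key, reverse=True)]

# Invented correlations and shooting dates.
images = [
    ("a", 0.9, date(2023, 4, 1)),
    ("b", 0.6, date(2023, 11, 3)),
    ("c", 0.8, date(2023, 8, 15)),
]
by_relevance = sort_first_images(images)      # default order on the second interface
by_time = sort_first_images(images, "time")   # after the user operates the control
```

The default order is `a`, `c`, `b`; after switching to reverse chronological order it becomes `b`, `c`, `a`, and the first application would refresh the interface with the new order.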

[0307] The search results are displayed in two main sections: a primary screen showing at least one set of images and a secondary screen displaying all results. The primary screen displays a portion of the search results data, while the secondary screen displays the full results.

[0308] The embodiments of the present invention employ screening, clustering, seat allocation, and scoring optimization to select a set of images with more memorable and narrative qualities from a large number of images. This strategy can be applied to data recommendation or search result scenarios, and the search results can include recommended displays on the first-level page and full displays on the second-level page.

[0309] For example, taking the image search interface as the first interface, Figure 24 is a schematic diagram of the first-level interface of search results displayed after the user inputs the first information in an embodiment of the present invention. As shown in Figure 24, after the user inputs the text "A warm birthday party in 2023" into the search control A of the image search interface, the image search interface displays at least one set of images, namely "Searched Memories", and a first number of second images, namely "8 images selected for you". The image search interface also displays a first control B and a second control C. Among them, the first control B is a "sorting method" control, and the second control C is a "view more" control.

[0310] As shown in Figure 24, the mobile phone recommends three memory sets to the user based on the text "A warm birthday party in 2023". In descending order of relevance, the three memory sets are: "Xiao Hong's birthday party at home in August 2023", "A birthday party with friends in Tianjin in April 2023", and "A birthday party at the company in November 2023". Because the first information contains "birthday party", all three recommended memory sets relate to a birthday party. Because the first information specifies "2023", all three memory sets fall within 2023. The three memory sets are nevertheless not in chronological order, because the first information also describes the birthday party as "warm", so the user expects recommendations that are likewise warm. As shown in Figure 24, the three memory sets are ordered by degree of warmth: "at home" is warmer than "with friends", and "with friends" is warmer than "at the company". The three recommended memory sets are therefore sorted by their relevance to "A warm birthday party in 2023". Compared with simply sorting by time, sorting by relevance to the first information better matches the user's intent.

[0311] For example, Figure 25 is a schematic diagram of a user operating the first control in an embodiment of the present invention. As shown in Figure 25, after the user clicks the first control B, the image search interface pops up the sorting options of the first control B, which include "Reverse Time" and "Reverse Relevance". Since the memory sets are sorted by relevance by default, the "Reverse Relevance" option is grayed out and cannot be selected at present, while the "Reverse Time" option is black and can be selected at present. The user can switch the sorting of the memory sets to reverse time by clicking the "Reverse Time" option. As shown in Figure 25, when the user clicks the "Reverse Time" option, the sorting of the three memory sets changes. They are no longer sorted by relevance to "A Warm Birthday Party in 2023", but by reverse time.

[0312] For example, Figure 26 is a schematic diagram of switching from the first interface to the second interface in an embodiment of the present invention. As shown in Figure 26, after the user clicks the second control C, the second interface is displayed. The second interface contains multiple first images, namely "all images found". The second interface also includes a third control D. The third control D is a "sorting method" control.

[0313] In Figure 26, all the images found are sorted by default according to their relevance to "A warm birthday party in 2023". Since "at home" is warmer than "with friends", "with friends" is warmer than "at the company", and images with more people are warmer than images with fewer people, all the images on the second interface can be sorted according to the following rules:

[0314] 1. Among the images with the most people, the "at home" images are ranked first, followed by the "with friends" images, and the "at the company" images are ranked last.

[0315] 2. Among the images with fewer people, the "at home" images are ranked first, followed by the "with friends" images, and the "at the company" images are ranked last.

[0316] 3. Among the images without people, the "at home" images are ranked first, followed by the "with friends" images, and the "at the company" images are ranked last.

[0317] For example, Figure 27 is a schematic diagram of a user operating the third control in an embodiment of the present invention. As shown in Figure 27, after the user clicks the third control D, the second interface pops up the sorting options of the third control D, which include "Reverse Time" and "Reverse Relevance". Since the images in the second interface are sorted by relevance by default, the "Reverse Relevance" option is grayed out and cannot be selected at present, while the "Reverse Time" option is black and can be selected at present. The user can switch the sorting of the images in the second interface to reverse time by clicking the "Reverse Time" option.

[0318] Therefore, users can click the "View More" control to jump to the second-level search results page. Since the relevance of each image to the first information has been calculated, an optional sorting display method is provided. Users can use the third control to select sorting by relevance in reverse order or by time in reverse order, so that users can quickly locate the information and improve the accuracy and efficiency of search results.

[0319] The information display method provided in this invention uses an AI model to analyze the basic image features of an image, and then utilizes a multimodal large model to add a more narrative scene description as the image's scene feature, thereby enhancing the image's data features. In image search and recommendation scenarios, the relevance of the image to the user's search scenario or operation and maintenance setting scenario is considered to enhance the narrative and relevance of the image search or recommendation result set. On the user's search results page, the sorting method can be selected to be sorted by relevance. The data sources available to users are richer, improving the hit range of search results; at the same time, the search results can be sorted according to the relevance combined with the first type of image features, improving user search efficiency.

[0320] For recommendation result sets of image library applications, because there are multiple recall recommendation scenarios, only a fixed number of results can be displayed on the first screen or first-level page. In the information display method provided in this embodiment of the invention, the recommendation strategy first clusters the dataset according to certain dimensions and then allocates a specified number of slots to each group using a seat allocation method. Within each group, images are selected according to their relevance to the search or recommendation scenario, and are then sorted by the sorting module according to the operation and maintenance strategy before being displayed to the user. Compared with sorting on basic dimensions, the images selected by the recommendation strategy in this embodiment of the invention are more relevant to the scenario. Grouping on dimensions such as time, location, or theme before selecting from each group also distributes the selection evenly across that dimension, producing a result that better evokes the user's memories.

[0321] In summary, the information display method provided by the embodiments of the present invention has at least the following beneficial effects:

[0322] 1. In the embodiments of the present invention, the features of the images are not limited to basic image attributes and tag features, but also combine the multi-modal scene description of the image itself. Compared with keyword retrieval based solely on the second type of features, the scene features are more flexible and contain richer scene information.

[0323] 2. Since the features of the images in the embodiments of the present invention also include scene features, richer scenes can be associated and generated, providing users with images of stronger scene relevance. The recommended data source is therefore scene-enhanced, which improves scene-based search and enriches the scene story memories that the user can search for and piece together.

[0324] 3. Compared to the search result sorting method based on time and size attributes, the embodiments of the present invention add a partial search data display on the first-level page (including dataset display, dataset relevance sorting method, and data selection display) and a full search data display on the second-level page (including all data and its relevance sorting method), which can improve the accuracy of search results and the user's retrieval efficiency.

[0325] 4. Compared to the recommendation method that only obtains high-quality images according to a predetermined scoring method, the recommendation strategy based on scene relevance in this embodiment of the invention can ensure that the recommendation results are more narrative and uniform, while avoiding the problem that some scenes have high image scores and thus the recommendations are more concentrated.

[0326] 5. Compared with matching queries based on specified keywords, this embodiment of the invention incorporates a first type of feature that includes enhanced scenario descriptions in the search scenario, making the data sources available to users richer and improving the hit range of search results; and it supports searching for a piece of text or document-type file input by the user and retrieving relevant results.

[0327] 6. The embodiments of the present invention select a set of images that are more in line with the scene theme and the logic of memory by calculating the relevance of the scene and using a seat allocation-based image selection strategy, thereby improving the scene narrative and relevance of the image recommendation and retrieval results.
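
As an illustration only, the quota allocation and image selection referred to in item 6 can be sketched in Python as follows. This passage does not fix a particular seat allocation rule, so the sketch assumes a D'Hondt-style highest-averages rule weighted by each image set's relevance. The function names `allocate_seats` and `select_images`, the input layout, and the example values are hypothetical.

```python
from typing import Dict, List, Tuple


def allocate_seats(weights: Dict[str, float],
                   capacities: Dict[str, int],
                   total: int) -> Dict[str, int]:
    """Split `total` display slots among image sets.

    Assumption: a D'Hondt-style highest-averages rule, where each slot goes
    to the set with the largest weight / (slots already held + 1).
    A set never receives more slots than it has images (`capacities`).
    """
    quotas = {name: 0 for name in weights}
    for _ in range(total):
        # Only sets that still have unselected images can take another slot.
        open_sets = [n for n in weights if quotas[n] < capacities[n]]
        if not open_sets:
            break
        best = max(open_sets, key=lambda n: weights[n] / (quotas[n] + 1))
        quotas[best] += 1
    return quotas


def select_images(image_sets: Dict[str, List[Tuple[str, float]]],
                  relevance: Dict[str, float],
                  first_quantity: int) -> Dict[str, List[str]]:
    """Pick the highest-scored images of each set, up to the set's quota.

    `image_sets` maps a set name to (image_id, image_score) pairs and
    `relevance` maps a set name to its relevance to the query.
    """
    capacities = {name: len(images) for name, images in image_sets.items()}
    quotas = allocate_seats(relevance, capacities, first_quantity)
    chosen = {}
    for name, images in image_sets.items():
        ranked = sorted(images, key=lambda pair: pair[1], reverse=True)
        chosen[name] = [image_id for image_id, _ in ranked[:quotas[name]]]
    return chosen
```

With hypothetical relevance weights of 0.5, 0.3 and 0.2 for three image sets and four slots to fill, this rule assigns two slots to the first set and one to each of the others, so lower-weighted scenes still appear rather than being crowded out by a single high-scoring scene.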

[0328] The above embodiments use image scenes as an example to introduce the information display method provided by the embodiments of the present invention. However, the information display method provided by the embodiments of the present invention is not limited to image scenes, but can also be used in multi-modal scenes such as video, audio, and files, and can be combined with multi-modal large models.

[0329] Figure 28 is a schematic diagram of the structure of a device provided in an embodiment of the present invention. It should be understood that the device 700 is capable of performing the steps in the above method embodiments; to avoid repetition, these steps are not described again here. The device 700 includes a processor 701 and a memory 702.

[0330] This application also provides an electronic device, including at least one processor 701 and at least one memory 702, wherein the at least one memory 702 is used to store at least one program, and when the at least one processor 701 runs the at least one program, the electronic device performs the operations as described in the above method embodiments.

[0331] This application also provides a readable storage medium storing a program that, when run by an electronic device, causes the electronic device to perform the operations described in the above method embodiments.

[0332] This application also provides a program product that, when run on an electronic device or at least one processor, causes the electronic device to perform the operations described in the above method embodiments.

[0333] Those skilled in the art will understand that, for convenience and brevity, the specific working processes of the systems, devices, and units described above are not repeated here; reference may be made to the corresponding processes in the foregoing method embodiments.

[0334] In the several embodiments provided in this application, any function, if implemented as a software functional unit and sold or used as an independent product, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause an electronic device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0335] The above description is merely a specific embodiment of this application. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the protection scope of this application. The protection scope of this application should be determined by the protection scope of the claims.

Claims

1. An information display method, characterized in that the method comprises: in response to an operation of inputting first information in a search control, the first information being a text segment or a document-type file, searching an image library based on a multi-modal fusion of the semantics of the first information and images to obtain second information, the second information comprising a plurality of first images that are found in the image library and match the first information; and displaying at least one image set based on the second information, wherein the images in the image set are from the plurality of first images in the second information.

2. The method according to claim 1, characterized in that searching the image library based on the multi-modal fusion of the semantics of the first information and images to obtain the second information comprises: searching the image library, based on the semantics of the first information, for the first images whose image features match the semantics of the first information, wherein the image features comprise a first type of feature, and the first type of feature comprises scene features of the first image.

3. The method according to claim 1, characterized in that the at least one image set may be a plurality of image sets; and displaying the at least one image set based on the second information comprises: displaying, based on the second information, the at least one image set sorted by a first relevance, wherein the first relevance is the relevance between the image set and the first information.

4. The method according to any one of claims 1-3, characterized in that the method further comprises: displaying, based on the second information, a first quantity of second images, wherein the second images are selected from the image set based on a recommendation strategy.

5. The method according to any one of claims 1-4, characterized in that the method further comprises: clustering and grouping the plurality of first images according to at least one dimension to obtain the at least one image set.

6. The method according to claim 4, characterized in that the method further comprises: allocating a quota to each image set according to the first quantity to obtain an allocated quota of each image set; scoring the images in the image set to obtain image scores; and selecting, based on the allocated quota, the images with the highest image scores from the image set as the second images.

7. The method according to claim 3, characterized in that the method further comprises: obtaining a second relevance based on the first information and the first image, wherein the second relevance is the relevance between the first image and the first information; and obtaining the first relevance of the image set based on the second relevance of the images in the image set.

8. The method according to claim 5, characterized in that clustering and grouping the plurality of first images according to the at least one dimension to obtain the at least one image set comprises: in response to a user's setting operation for the at least one dimension, clustering and grouping the plurality of first images according to the at least one dimension to obtain the at least one image set.

9. The method according to claim 6, characterized in that scoring the images in the image set to obtain the image scores comprises: in response to a user's setting operation for at least one image parameter, scoring the images in the image set according to the at least one image parameter to obtain the image scores.

10. The method according to claim 6, characterized in that allocating a quota to each image set according to the first quantity comprises: inputting the first quantity and the at least one image set into a seat allocation model to obtain the allocated quota of the image set output by the seat allocation model.

11. The method according to any one of claims 1-10, characterized in that, prior to a confirmation operation on the first information input in the search control, the method further comprises: obtaining an image; and obtaining the first type of feature based on the image.

12. The method according to claim 11, characterized in that the image features further comprise a second type of feature, the second type of feature comprising CV features and basic attribute features of the image; and obtaining the first type of feature based on the image comprises: obtaining the second type of feature based on the image; and obtaining the first type of feature based on the image and the second type of feature.

13. The method according to claim 12, characterized in that obtaining the second type of feature based on the image comprises: inputting the image into an AI model to obtain the second type of feature output by the AI model.

14. The method according to claim 12 or 13, characterized in that obtaining the first type of feature based on the image and the second type of feature comprises: inputting the image and the second type of feature into a multi-modal large model to obtain the first type of feature output by the multi-modal large model.

15. The method according to any one of claims 1-14, characterized in that the method further comprises: in response to an operation on a second control, switching from a first interface to a second interface, wherein the first interface is used to display the at least one image set, and the second interface is used to display the plurality of first images sorted by a second relevance, the second relevance being the relevance between the first image and the first information.

16. An electronic device, characterized in that the electronic device comprises a processor and a memory, wherein the memory is configured to store a computer program, the computer program comprising program instructions that, when executed by the processor, cause the electronic device to perform the steps of the method according to any one of claims 1-15.

17. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform the method according to any one of claims 1-15.

18. A computer program product, characterized in that the computer program product stores a computer program, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform the method according to any one of claims 1-15.
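
As a non-limiting illustration of the grouping and relevance sorting recited in claims 3, 5, 7 and 15, the following Python sketch groups first images by one dimension, derives each image set's relevance from the per-image relevance of its members, and sorts both the image sets and the full image list. The claims do not say how the per-image relevance values are combined, so the arithmetic mean used here is an assumption. The function names, the dictionary keys (`id`, `relevance`, and the grouping key), and the use of exact-match grouping as a simple stand-in for clustering are also assumptions.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def group_and_rank(images: List[Dict], dimension: str) -> List[Tuple[str, float, List[str]]]:
    """Group images by `dimension` and rank the resulting image sets.

    Each image is a dict with an "id", a per-image "relevance" to the query,
    and a value for the grouping dimension (for example "location").
    Returns (set_name, set_relevance, image_ids) tuples, most relevant first.
    """
    groups = defaultdict(list)
    for image in images:
        groups[image[dimension]].append(image)

    ranked_sets = []
    for name, members in groups.items():
        # Within a set, show the most relevant images first.
        members.sort(key=lambda m: m["relevance"], reverse=True)
        # Assumption: set relevance is the mean of its images' relevance.
        set_relevance = mean(m["relevance"] for m in members)
        ranked_sets.append((name, set_relevance, [m["id"] for m in members]))

    ranked_sets.sort(key=lambda s: s[1], reverse=True)
    return ranked_sets


def flat_ranking(images: List[Dict]) -> List[str]:
    """Second-level view: all images sorted by per-image relevance."""
    ordered = sorted(images, key=lambda m: m["relevance"], reverse=True)
    return [m["id"] for m in ordered]
```

In this sketch, `group_and_rank` corresponds to the first interface, which shows image sets in order of set relevance, and `flat_ranking` corresponds to the second interface, which shows every matched image in order of its own relevance.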