Electronic device, method, and non-transitory computer-readable storage medium for generating media collection including media contents

WO2026134524A1 · PCT designated stage · Publication Date: 2026-06-25 · SAMSUNG ELECTRONICS CO LTD

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
SAMSUNG ELECTRONICS CO LTD
Filing Date
2025-08-25
Publication Date
2026-06-25

AI Technical Summary

Technical Problem

Existing systems lack an efficient method for creating a media collection that accurately selects and arranges media content based on user input, particularly when dealing with large datasets and complex user-defined criteria.

Method used

An electronic device and method that utilize a processor to identify media content corresponding to user-defined keywords, adjust similarity thresholds, and create a media collection using filtering information to select and arrange media items based on user input, including text, images, and audio.

Benefits of technology

Enables the creation of personalized media collections that reflect user intent, enhancing user experience by accurately selecting and arranging media content according to specified criteria, even in large datasets.

✦ Generated by Eureka AI based on patent content.


Abstract

This electronic device may comprise a memory for storing instructions, and at least one processor. The instructions may cause the electronic device to: receive a user input for generating a media collection including media contents; identify first media contents corresponding to a first keyword included in the user input from among the media contents stored in the memory; identify a second keyword from among keywords allocated to the first media contents on the basis of identifying the number of the first media contents smaller than a reference number; identify one or more second media contents corresponding to the second keyword; and generate the media collection by using the first media contents and the one or more second media contents.

Description

Electronic device, method, and non-transitory computer-readable storage medium for generating a media collection including media content

[0001] The present disclosure relates to an electronic device, a method, and a non-transitory computer-readable storage medium for creating a media collection comprising media contents.

[0002] An electronic device may receive user input representing at least one of an image, video, audio, and text. For example, the electronic device may receive said user input through a touch-sensitive display. For example, the electronic device may receive said user input representing audio through a camera. Based on receiving said user input, the electronic device may execute a function corresponding to the user input.

[0003] The information described above may be provided as related art for the purpose of aiding understanding of the present disclosure. No claim or determination is made as to whether any of the foregoing may be applied as prior art related to the present disclosure.

[0004] Aspects of the present disclosure address at least the problems and/or disadvantages mentioned above and provide at least the advantages described below. Accordingly, an aspect of the present disclosure is to provide an electronic device, a method, and a non-transitory computer-readable storage medium for creating a media collection containing media content.

[0005] Additional aspects will be set forth in part in the description that follows and, in part, will be apparent from that description or may be learned through practice of the presented embodiments.

[0006] According to aspects of the present disclosure, an electronic device is described. The electronic device may include a memory comprising one or more storage media for storing instructions. The electronic device may include at least one processor comprising a processing circuit. The instructions may cause the electronic device to receive user input for creating a media collection containing media content when executed individually or collectively by the at least one processor. The instructions may cause the electronic device to identify first media content corresponding to a first keyword included in the user input among the media content stored in the memory when executed individually or collectively by the at least one processor. The instructions may cause the electronic device to identify a second keyword among the keywords assigned to the first media content based on identifying a number of the first media content that is smaller than a reference number when executed individually or collectively by the at least one processor. The above instructions may cause the electronic device to identify one or more second media contents corresponding to the second keyword when executed individually or collectively by the at least one processor. The above instructions may cause the electronic device to create the media collection using the first media contents and the one or more second media contents when executed individually or collectively by the at least one processor.

[0007] According to aspects of the present disclosure, a method is provided. The method may be executed within an electronic device having a memory. The method may include receiving user input for creating a media collection containing media contents. The method may include identifying first media contents corresponding to a first keyword included in the user input among the media contents stored in the memory. The method may include identifying a second keyword among the keywords assigned to the first media contents based on identifying a number of the first media contents that is smaller than a reference number. The method may include identifying one or more second media contents corresponding to the second keyword. The method may include creating the media collection using the first media contents and the one or more second media contents.

[0008] According to an aspect of the present disclosure, a non-transitory computer-readable storage medium is provided. The non-transitory computer-readable storage medium may store one or more programs. The one or more programs may include instructions that, when executed by an electronic device having a memory, cause the electronic device to receive user input for creating a media collection containing media contents. The one or more programs may include instructions that, when executed by the electronic device, cause the electronic device to identify first media contents corresponding to a first keyword included in the user input among the media contents stored in the memory. The one or more programs may include instructions that, when executed by the electronic device, cause the electronic device to identify a second keyword among the keywords assigned to the first media contents based on identifying that the number of the first media contents is smaller than a reference number. The one or more programs may include instructions that, when executed by the electronic device, cause the electronic device to identify one or more second media contents corresponding to the second keyword. The one or more programs may include instructions that, when executed by the electronic device, cause the electronic device to create the media collection using the first media contents and the one or more second media contents.

[0009] Other aspects, advantages, and key features of the present disclosure will become apparent to those skilled in the art from the following detailed description disclosing various embodiments together with the accompanying drawings.

[0010] The above and other aspects, features, and advantages of specific embodiments of the present disclosure will become more apparent from the following description together with the accompanying drawings.

[0011] FIG. 1 illustrates an example of an electronic device displaying a media collection according to an embodiment of the present disclosure.

[0012] FIG. 2 is a simplified block diagram of an electronic device according to an embodiment of the present disclosure.

[0013] FIG. 3 is a flowchart illustrating the operation of an electronic device that generates a media collection including media content according to an embodiment of the present disclosure.

[0014] FIG. 4 illustrates the operation of an electronic device that identifies first media contents using filtering information according to an embodiment of the present disclosure.

[0015] FIG. 5 is a flowchart illustrating the operation of an electronic device that identifies a first keyword based on user input according to an embodiment of the present disclosure.

[0016] FIG. 6a illustrates the operation of an electronic device that obtains filtering information based on user input according to an embodiment of the present disclosure.

[0017] FIG. 6b is a flowchart illustrating the operation of an electronic device that determines media content using an embedding vector according to an embodiment of the present disclosure.

[0018] FIG. 7 is a flowchart illustrating the operation of an electronic device that generates a media collection based on the number of one or more media contents according to an embodiment of the present disclosure.

[0019] FIG. 8 illustrates the operation of an electronic device that generates a media collection by arranging one or more second media contents according to an embodiment of the present disclosure.

[0020] FIGS. 9a to 9c illustrate the operation of an electronic device displaying a User Interface (UI) for receiving user input according to various embodiments of the present disclosure.

[0021] FIG. 10 is a block diagram of an electronic device in a network environment according to various embodiments.

[0022] FIG. 11 is a schematic diagram of an exemplary artificial intelligence (AI) system.

[0023] It should be noted that throughout the drawings, reference numbers are used to describe identical or similar elements, features, and structures.

[0024] The following description is provided to facilitate a comprehensive understanding of the various embodiments of the present disclosure as defined by the claims and their equivalents, with reference to the accompanying drawings. While this description includes various specific details to aid understanding, they are to be considered merely illustrative. Accordingly, those skilled in the art will recognize that various changes and modifications to the various embodiments of this specification are possible without departing from the scope and spirit of the present disclosure. Additionally, descriptions of well-known functions and configurations may be omitted for clarity and brevity.

[0025] The terms and words used in the following description and claims are not limited to their bibliographical meanings, but are merely used by the inventor to enable a clear and consistent understanding of the present disclosure. Accordingly, it will be apparent to those skilled in the art that the following description of various embodiments of the present disclosure is provided only for illustrative purposes and is not intended to limit the present disclosure as defined by the appended claims and their equivalents.

[0026] Singular forms should be understood to include the plural unless the context clearly indicates otherwise. Thus, for example, a reference to "a component surface" includes a reference to one or more such surfaces.

[0027] It should be appreciated that the blocks of each flowchart, and combinations of the flowcharts, can be executed by one or more computer programs containing instructions. The entire set of one or more computer programs may be stored in a single memory device, or the one or more computer programs may be divided into several parts and stored in several memory devices.

[0028] The functions or operations described herein may be processed by a single processor or a combination of processors. A single processor or a combination of processors is a circuit that performs processing and includes circuits such as an application processor (AP, e.g., a central processing unit (CPU)), a communication processor (CP, e.g., a modem), a graphics processing unit (GPU), a neural processing unit (NPU) (e.g., an artificial intelligence (AI) chip), a wireless fidelity (Wi-Fi) chip, a Bluetooth chip, a global positioning system (GPS) chip, a near field communication (NFC) chip, a connectivity chip, a sensor controller, a touch controller, a fingerprint sensor controller, a display driver integrated circuit (IC), an audio codec chip, a universal serial bus (USB) controller, a camera controller, an image processing IC, a microprocessor unit (MPU), a system on chip (SoC), and the like.

[0029] FIG. 1 illustrates an example of an electronic device displaying a media collection according to an embodiment of the present disclosure.

[0030] Referring to FIG. 1, an electronic device (100) may be used to create a media collection (140) containing at least some of the images (132, 134, 136, 138). For example, the media collection (140) may be described as a set of media contents (e.g., images, videos) arranged in order. For example, the media collection may include audio associated with the media contents along with the media contents. For example, the media collection may be referred to as a story, a collection of images, a collection of videos, or a collection of media contents. For example, the electronic device (100) may create the media collection (140) using images (132, 134, 136, 138) stored in memory (e.g., memory (206) in FIG. 2) based on identifying an event to create the media collection (140). For example, the electronic device (100) may include a display (not shown). For example, the electronic device (100) may display or provide a media collection (140) through the display. For example, the electronic device (100) may include a speaker (e.g., the audio module (1070) of FIG. 10). For example, when providing or playing the media collection (140), the electronic device (100) may output audio associated with (or related to) the media collection (140) through the speaker.

[0031] According to one embodiment, when an electronic device (100) displays a media collection (140), a visual object (145) for representing media content included in the media collection (140) can be displayed through the display. For example, the electronic device (100) can display the visual object (145) together with the media collection (140) through the display. For example, the visual object (145) can be used to represent (or guide) the media content being displayed through the display. For example, the visual object (145) can be used to represent (or display) the time for playing the media collection.

[0032] According to one embodiment, a user of an electronic device (100) can recall memories using a media collection (140) that includes images (132, 134, 136, 138). The user of the electronic device (100) can enjoy viewing the media collection (140).

[0033] According to one embodiment, the electronic device (100) may generate a media collection (140) based on keywords. For example, the electronic device (100) may generate a media collection (140) using images containing a specific visual object (e.g., a visual object representing a specific person) among images stored in memory (e.g., memory (206) of FIG. 2). For example, the electronic device (100) may be required to generate a media collection (140) based on user input (e.g., user input (405) of FIG. 4). For example, the user input may represent at least one of an image, video, audio, and text. For example, the electronic device (100) may be required to identify media contents to be included in a media collection (e.g., a media collection (820) of FIG. 8) from among media contents (e.g., media contents (420) of FIG. 4) stored in memory (e.g., memory (206) of FIG. 2) based on the user input. For example, the electronic device (100) may be required to identify a number of media contents within a reference range from among the media contents in the memory in order to create the media collection.

[0034] For example, the electronic device (100) can use the user input to identify or determine media contents to be included in a media collection (e.g., media collection (820) of FIG. 8) among the media contents stored in the memory (e.g., media contents (420) of FIG. 4). For example, the electronic device (100) can determine the order among the media contents based on the user input. For example, the electronic device (100) can create the media collection including media contents arranged based on the determined order.

[0035] For example, the electronic device (100) may include hardware components used to perform or execute the above operations. The hardware components are described and illustrated with reference to FIG. 2.

[0036] FIG. 2 is a simplified block diagram of an electronic device according to an embodiment of the present disclosure.

[0037] Referring to FIG. 2, the electronic device (100) may include at least one processor (207) and memory (206).

[0038] At least one processor (207) may include a hardware component for processing data using instructions stored in memory (206). The hardware component for processing data may include a CPU (central processing unit) (e.g., including processing circuits). The hardware component for processing data may include a GPU (graphics processing unit) (e.g., including processing circuits). The hardware component for processing data may include a DPU (display processing unit) (e.g., including processing circuits). The hardware component for processing data may include an NPU (neural processing unit) (e.g., including processing circuits).

[0039] At least one processor (207) may include one or more cores. For example, at least one processor (207) may have the structure of a multi-core processor such as a dual core, a quad core, or a hexa core.

[0040] Memory (206) may include a hardware component for storing data and / or instructions that are input to and / or output from at least one processor (207). Memory (206) may include, for example, volatile memory such as RAM (random-access memory) and / or non-volatile memory such as ROM (read-only memory). Volatile memory may include, for example, at least one of DRAM (dynamic RAM), SRAM (static RAM), cache RAM, and PSRAM (pseudo SRAM). Non-volatile memory may include, for example, at least one of PROM (programmable ROM), EPROM (erasable PROM), EEPROM (electrically erasable PROM), flash memory, hard disk, compact disk, and EMMC (embedded multimedia card).

[0041] At least one processor (207) may receive user input (e.g., user input (405) of FIG. 4) for creating a media collection (e.g., media collection (820) of FIG. 8) containing media content. For example, at least one processor (207) may identify first media content (e.g., first media content (430) of FIG. 4) corresponding to a first keyword (e.g., keyword (412-1) of FIG. 4) included in the user input, among the media content (e.g., media content (420) of FIG. 4) stored in memory (206). At least one processor (207) may identify a second keyword among the keywords assigned to the first media content based on identifying that the number of the first media content is smaller than a reference number. At least one processor (207) can identify one or more second media contents (e.g., one or more second media contents (810) of FIG. 8) corresponding to the second keyword. For example, at least one processor (207) can create the media collection using the first media contents and the one or more second media contents.

[0042] FIG. 3 is a flowchart illustrating the operation of an electronic device for generating a media collection including media content according to an embodiment of the present disclosure. This method may be executed by the electronic device (100) illustrated in FIG. 2 or by at least one processor (207) of the electronic device (100). In the following embodiments, each operation may be performed sequentially, but is not necessarily performed sequentially. For example, the order of each operation may be changed, and at least two operations may be performed in parallel.

[0043] Referring to FIG. 3, in operation 310, at least one processor (207) may receive user input (e.g., user input (405) of FIG. 4) for creating a media collection (e.g., media collection (820) of FIG. 8) containing media content. For example, the user input may include at least one of an image, video, audio, and text. For example, the electronic device (100) may include a microphone (e.g., input module (1050) of FIG. 10). For example, at least one processor (207) may receive user input representing audio through the microphone. For example, the electronic device (100) may include a display (e.g., display module (1060) of FIG. 10). For example, the display may include a touch-sensitive display. For example, at least one processor (207) may receive user input representing text through the display. For example, at least one processor (207) may receive multiple user inputs for generating the media collection. For example, the type of each of the multiple user inputs may be different. For example, a first user input included in the multiple user inputs may represent text based on natural language. For example, a second user input included in the multiple user inputs may represent an image.

[0044] In operation 320, at least one processor (207) can identify first media contents (e.g., first media contents (430) of FIG. 4) corresponding to a first keyword (e.g., keyword (412-1) of FIG. 4) included in a user input (e.g., user input (405) of FIG. 4) among media contents (e.g., media contents (420) of FIG. 4) stored in memory (206). For example, at least one processor (207) can identify the first keyword based on the user input. For example, the first keyword can be described as a keyword, among the keywords (e.g., keywords (415) of FIG. 4), corresponding to a value indicated by the user input (e.g., user input (405) of FIG. 4). For example, the identification of the first keyword will be described later with reference to FIG. 4.

[0045] In operation 330, at least one processor (207) can identify a second keyword among the keywords (415) assigned to the first media contents (430) based on identifying that the number of first media contents (430) is smaller than a reference number. For example, the second keyword may be described as one of the keywords that remain after excluding the keywords (412-1, 412-2, ..., 412-N) corresponding to at least one value representing the user input (405). For example, the second keyword may be described as one of the keywords (415) that remain after excluding the keywords (412-1, 412-2, ..., 412-N) identified through the filtering information (410). For example, the second keyword may be described as a keyword different from the first keyword associated with at least one value representing the user input (405). For example, at least one processor (207) can identify or determine a second keyword among the remaining keywords using filtering information (410). For example, at least one processor (207) can randomly determine or identify a second keyword among the remaining keywords.
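
The random selection of a second keyword described in paragraph [0045] can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name `pick_second_keyword` and the representation of keywords as plain strings are assumptions made for the example.

```python
import random


def pick_second_keyword(assigned_keywords, filtered_keywords, rng=None):
    """Pick a second keyword at random from the keywords that remain.

    assigned_keywords: keywords assigned to the first media contents.
    filtered_keywords: keywords that already received a value from the
        user input (the keywords in the filtering information).
    Both are hypothetical plain-string representations.
    """
    rng = rng or random.Random()
    # Remaining keywords = assigned keywords minus those already used.
    remaining = sorted(set(assigned_keywords) - set(filtered_keywords))
    if not remaining:
        return None  # nothing left to expand the search with
    return rng.choice(remaining)


if __name__ == "__main__":
    print(pick_second_keyword(["event", "place", "pet"], ["event"]))
```

The sort before the random choice only makes the result reproducible for a seeded generator; the source does not prescribe any ordering.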

[0046] According to one embodiment, at least one processor (207) may change the method for identifying media content corresponding to a keyword (412-1) based on identifying that the number of first media contents (430) is greater than a reference number. For example, when comparing similarities of embedding vectors, at least one processor (207) may change the threshold similarity so that the number of media contents identified as corresponding to the keyword (412-1) is smaller than the reference number. For example, based on identifying that the number of first media contents (430) is greater than the reference number, at least one processor (207) may change the threshold similarity used to identify media content from the first threshold similarity (e.g., 0.6) used to identify the first media contents (430) to a second threshold similarity (e.g., 0.7) higher than the first threshold similarity. For example, at least one processor (207) can identify, based on the second threshold similarity, second media contents among the media contents (420) whose number is smaller than the number of first media contents (430).
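
The threshold adjustment in paragraph [0046] can be illustrated with a short sketch. It assumes cosine similarity between embedding vectors, which the text does not specify, and uses the example thresholds 0.6 and 0.7 from the paragraph. The function names and the dictionary-based library are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, library, threshold):
    """Return ids of media whose embedding meets the threshold similarity."""
    return [mid for mid, vec in library.items()
            if cosine_similarity(query_vec, vec) >= threshold]


def search_with_cap(query_vec, library, reference_number,
                    first_threshold=0.6, second_threshold=0.7):
    """Search once; if too many items match, retry with a higher threshold."""
    hits = search(query_vec, library, first_threshold)
    if len(hits) > reference_number:
        # More results than the reference number: tighten the search.
        hits = search(query_vec, library, second_threshold)
    return hits


if __name__ == "__main__":
    library = {"a": (1.0, 0.0), "b": (0.8, 0.6), "c": (0.65, 0.76), "d": (0.0, 1.0)}
    print(search_with_cap((1.0, 0.0), library, reference_number=2))
```

A single retry is shown for brevity; whether the device repeats the adjustment until the count falls below the reference number is not stated in this paragraph.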

[0047] According to one embodiment, the reference number may be a first reference number (e.g., 500). For example, at least one processor (207) may identify a second keyword among the keywords (415) assigned to the first media contents (430) based on identifying that the number of first media contents (430) is smaller than the first reference number and larger than a second reference number (e.g., 10). For example, at least one processor (207) may newly (or again) identify media contents corresponding to the keywords (412-1, 412-2, ..., 412-N) among the media contents (420) based on identifying that the number of first media contents (430) is smaller than the second reference number. For example, at least one processor (207) may change the threshold similarity for the identification so that the number of identified media contents is larger than the second reference number. For example, at least one processor (207) may change the threshold similarity for the identification from a first threshold similarity to a second threshold similarity lower than the first threshold similarity. However, the disclosure is not limited thereto. For example, at least one processor (207) may change the method of searching (or identifying) media content. For example, when identifying media content corresponding to a keyword (412-1) to which a first value and a second value are assigned, at least one processor (207) may change the method for identifying media content from a first method that identifies media content corresponding to both the first value and the second value to a second method that identifies media content corresponding to the first value or the second value.
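
The branching on the two reference numbers, and the switch from "first value and second value" matching to "first value or second value" matching, can be sketched as below. The step labels, function names, and the handling of counts exactly equal to a reference number are assumptions; the source gives only the example values 500 and 10.

```python
def next_step(count, first_reference=500, second_reference=10):
    """Decide what to do based on how many first media contents were found.

    Boundary handling (count equal to a reference number) is an assumption;
    the source describes only 'greater than' and 'smaller than'.
    """
    if count > first_reference:
        return "raise_threshold"             # too many results: tighten
    if count < second_reference:
        return "relax_search"                # too few: lower threshold or use OR
    return "expand_with_second_keyword"      # in between: add a second keyword


def matches(metadata_values, wanted_values, require_all):
    """AND matching (first method) versus OR matching (second method)."""
    metadata_values = set(metadata_values)
    wanted_values = set(wanted_values)
    if require_all:
        return wanted_values <= metadata_values   # every wanted value present
    return bool(wanted_values & metadata_values)  # at least one value present


if __name__ == "__main__":
    print(next_step(3))
    print(matches({"dog"}, {"dog", "beach"}, require_all=False))
```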

[0048] In operation 340, at least one processor (207) can identify one or more second media contents corresponding to the second keyword (e.g., one or more second media contents (810) of FIG. 8). For example, at least one processor (207) can identify the number of one or more second media contents corresponding to the second keyword. For example, at least one processor (207) can identify values represented by a set of metadata corresponding to the first media contents (430) for the second keyword. For example, at least one processor (207) can randomly determine or identify one of the values. For example, at least one processor (207) can determine or identify one or more media contents corresponding to the identified value as one or more second media contents (e.g., one or more second media contents (810) of FIG. 8).
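
Operation 340 can be sketched as picking one value of the second keyword from the metadata of the first media contents and then collecting the media that share that value. The metadata layout (a dictionary per media item) and the function name are hypothetical, and excluding items already among the first media contents is an assumption that the source does not state.

```python
import random


def pick_second_contents(library, first_ids, second_keyword, rng=None):
    """Return (chosen value, ids of second media contents).

    library: hypothetical mapping of media id -> {keyword: value} metadata.
    first_ids: ids of the first media contents.
    """
    rng = rng or random.Random()
    # Values that the first media contents carry for the second keyword.
    values = sorted({library[i][second_keyword]
                     for i in first_ids if second_keyword in library[i]})
    if not values:
        return None, []
    value = rng.choice(values)  # random choice, as in the description
    # Assumption: skip items that are already first media contents.
    second = [i for i, meta in library.items()
              if meta.get(second_keyword) == value and i not in first_ids]
    return value, second


if __name__ == "__main__":
    library = {
        "img1": {"event": "travel", "place": "Seoul"},
        "img2": {"event": "travel", "place": "Busan"},
        "img3": {"event": "picnic", "place": "Seoul"},
    }
    print(pick_second_contents(library, ["img1", "img2"], "place"))
```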

[0049] In operation 350, at least one processor (207) can generate a media collection (e.g., a media collection (820) of FIG. 8) using first media contents (430) and one or more second media contents (e.g., one or more second media contents (810) of FIG. 8). For example, at least one processor (207) can identify or determine a method for determining the order of media contents to be included in the media collection (e.g., a media collection (820) of FIG. 8) using filtering information (410) based on user input (405). For example, at least one processor (207) can generate a media collection (e.g., a media collection (820) of FIG. 8) containing media contents arranged based on the identified order. For example, at least one processor (207) can display the generated media collection through a display (e.g., a display module (1060) of FIG. 10). The creation of a media collection based on the second keyword (e.g., the media collection (820) of FIG. 8) will be described later with reference to FIG. 7.
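
Operations 320 to 350 of FIG. 3 can be combined into one hedged end-to-end sketch. Exact-match lookup on metadata stands in for whatever matching the device actually performs, and the final ordering (first media contents followed by second media contents) is an assumption; the description derives the order from the filtering information. All names are hypothetical.

```python
import random


def generate_collection(library, first_keyword, first_value,
                        reference_number, rng=None):
    """Sketch of operations 320-350 over a {media id: metadata} library."""
    rng = rng or random.Random()
    # Operation 320: first media contents matching the first keyword's value.
    first = [m for m, meta in library.items()
             if meta.get(first_keyword) == first_value]
    collection = list(first)
    # Operation 330: too few results, so pick a second keyword at random.
    if len(first) < reference_number:
        remaining = sorted({k for m in first for k in library[m]} - {first_keyword})
        if remaining:
            second_keyword = rng.choice(remaining)
            values = sorted({library[m][second_keyword]
                             for m in first if second_keyword in library[m]})
            value = rng.choice(values)
            # Operation 340: second media contents sharing the chosen value.
            collection += [m for m, meta in library.items()
                           if meta.get(second_keyword) == value and m not in first]
    # Operation 350: the collection, here in discovery order (assumption).
    return collection


if __name__ == "__main__":
    library = {
        "a": {"event": "travel", "place": "Jeju"},
        "b": {"event": "travel", "place": "Jeju"},
        "c": {"event": "birthday", "place": "Jeju"},
        "d": {"event": "birthday", "place": "Seoul"},
    }
    print(generate_collection(library, "event", "travel", reference_number=5))
```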

[0050] FIG. 4 illustrates the operation of an electronic device that identifies first media contents using filtering information according to an embodiment of the present disclosure.

[0051] Referring to FIG. 4, at least one processor (207) may identify or obtain filtering information (410) based on user input (405). For example, the filtering information (410) may be obtained based on content (e.g., at least one of images, videos, text, and audio) that appears through user input (405), and may be described as information for searching media content within a database of media content (420). For example, the filtering information (410) may represent at least one value identified based on user input (405) for at least some of the keywords (415). For example, the filtering information (410) may be referred to as a story clue.

[0052] Keywords (415) may be used to represent media content. For example, keywords (415) may be referred to as categories. For example, keywords (415) may include time, place, person, person relationship, pet, object, background, media type, person description, object description, action, event, caption, subject of media collection, and arrangement of media content, etc. For example, the media type may represent a type of media content. For example, the type may represent a video, image, spherical image (or 360-degree image), and spherical video (or 360-degree video). For example, the relationship between a keyword and a value for the keyword may be described as a key-value relationship (or structure). For example, a key-value relationship may be described as storing data as key and value pairs.
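
The key-value structure described above can be pictured as a dictionary that maps keywords to values. The sketch below is illustrative only: the keyword names come from the list in paragraph [0052], while the function `build_filtering_info` and the rule that keywords without a value are dropped are assumptions.

```python
# Keyword (category) names taken from the list in paragraph [0052].
KEYWORDS = (
    "time", "place", "person", "person relationship", "pet", "object",
    "background", "media type", "person description", "object description",
    "action", "event", "caption", "subject of media collection",
    "arrangement of media content",
)


def build_filtering_info(extracted):
    """Keep only known keywords that actually received a value.

    extracted: hypothetical {keyword: value or None} mapping produced by
    analyzing the user input.
    """
    unknown = set(extracted) - set(KEYWORDS)
    if unknown:
        raise ValueError(f"unknown keywords: {sorted(unknown)}")
    return {k: v for k, v in extracted.items() if v is not None}


if __name__ == "__main__":
    print(build_filtering_info({"place": "Jeju", "event": "travel", "pet": None}))
```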

[0053] For example, at least one processor (207) can identify or obtain values for each of the keywords (415) based on user input (405). For example, at least one processor (207) can identify keywords (e.g., keyword (412-1), keyword (412-2) to keyword (412-N)) (N is a natural number greater than or equal to 1) among the keywords (415) based on user input (405). For example, at least one processor (207) can obtain or identify filtering information (410) including keyword (412-1), keyword (412-2) to keyword (412-N) whose values are identified (or assigned) based on user input (405). For example, at least one processor (207) can use the filtering information (410) to identify or determine keyword (412-1) among the keywords whose values are identified. For example, the identification of the first keyword will be described later with reference to FIG. 5.

[0054] At least one processor (207) can identify or determine a keyword (412-1) corresponding to a value representing the user input (405) among the keywords (415) by using filtering information (410) obtained based on the user input (405). For example, at least one processor (207) can identify or determine keywords (e.g., keyword (412-1), keyword (412-2) to keyword (412-N)) corresponding to one or more values representing the user input (405) among the keywords (415) by using the filtering information (410).

[0055] At least one processor (207) can identify first media contents (430) corresponding to the keyword (412-1) among media contents (420) stored in memory (206) using the keyword (412-1). For example, at least one processor (207) can identify first media contents (430) among media contents (420) using values representing user input (405) identified using filtering information (410). For example, at least one processor (207) can determine media contents related to the values among media contents (420) as first media contents (430). For example, at least one processor (207) can determine media contents corresponding to metadata representing other values related to the values as first media contents (430). For example, at least one processor (207) can use the values to identify metadata corresponding to the values within a set of metadata for media contents (420). For example, the operation of determining the first media contents (430) will be described later with reference to FIG. 6b.
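
The metadata lookup in paragraph [0055] can be sketched as follows, including the case where media whose metadata holds a related value are also accepted. The `related` mapping is a hypothetical stand-in for however the device determines that two values are related (the description defers the details to FIG. 6b), and exact string comparison is an assumption.

```python
def identify_first_contents(library, keyword, value, related=None):
    """Return ids of media whose metadata matches the value or a related one.

    library: hypothetical {media id: {keyword: value}} metadata set.
    related: hypothetical {value: [related values]} mapping.
    """
    wanted = {value} | set((related or {}).get(value, ()))
    return [mid for mid, meta in library.items()
            if meta.get(keyword) in wanted]


if __name__ == "__main__":
    library = {
        "img1": {"place": "Jeju"},
        "img2": {"place": "Seogwipo"},
        "img3": {"place": "Seoul"},
    }
    print(identify_first_contents(library, "place", "Jeju",
                                  related={"Jeju": ["Seogwipo"]}))
```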

[0056] FIG. 5 is a flowchart illustrating the operation of an electronic device for identifying a first keyword based on user input according to an embodiment of the present disclosure. This method may be executed by the electronic device (100) illustrated in FIG. 2 or by at least one processor (207) of the electronic device (100). In the following embodiments, each operation may be performed sequentially, but is not necessarily performed sequentially. For example, the order of each operation may be changed, and at least two operations may be performed in parallel.

[0057] Referring to FIG. 5, in operation 510, at least one processor (207) can obtain text data (e.g., text data (620) of FIG. 6a) representing text corresponding to at least one keyword based on user input (405). For example, the text data (e.g., text data (620) of FIG. 6a) can be described as data obtained by analyzing the user input (405). For example, the text data (e.g., text data (620) of FIG. 6a) can be obtained based on user input (405) representing natural-language text. For example, the text data (e.g., text data (620) of FIG. 6a) can be described as data classified according to keywords (415) based on user input (405) representing natural-language text. For example, at least one processor (207) may obtain the text data corresponding to the user input (405) based on receiving the user input (405). For example, the text data may be referred to as a rough clue.

[0058] In operation 520, at least one processor (207) can obtain filtering information (410) representing at least one value corresponding to at least one keyword. For example, at least one processor (207) can obtain or identify filtering information (410) representing at least one keyword in which at least one value is identified by using text data (e.g., text data (620) of FIG. 6a). For example, the acquisition of the filtering information (410) is described and illustrated in more detail with reference to FIG. 6a.

[0059] In operation 530, at least one processor (207) can identify a first keyword (e.g., keyword (412-1)) to which a value is assigned among keywords (415) using filtering information (410). For example, at least one processor (207) can identify or determine a keyword (412-1) corresponding to an identified value among keywords (415) based on user input (405). For example, keyword (412-1) can be described as a keyword to which a value representing user input (405) is identified (or assigned) among keywords (415). For example, the first keyword (e.g., keyword (412-1)) can be described as a keyword (e.g., story theme) corresponding to a value representing user input (405) (e.g., travel). For example, the first keyword can be described as a keyword related to a value representing user input (405).
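Operation 530 can be illustrated with a minimal sketch, assuming the filtering information is held as a mapping from keyword to assigned value. The keyword names in `KEYWORDS`, the function `first_keyword`, and the rule of taking the first assigned keyword in list order are hypothetical choices for illustration and are not taken from the disclosure.

```python
from typing import Optional

# Hypothetical keyword categories standing in for the keywords (415).
KEYWORDS = ["time", "place", "people", "story_topic"]


def first_keyword(filtering_info: dict[str, Optional[str]]) -> Optional[str]:
    """Return the first keyword, in KEYWORDS order, that has a value assigned."""
    for keyword in KEYWORDS:
        if filtering_info.get(keyword) is not None:
            return keyword
    return None  # no keyword received a value from the user input


# The user input "travel" assigns a value only to the story-topic keyword.
filtering_info = {"time": None, "place": None, "people": None, "story_topic": "travel"}
selected = first_keyword(filtering_info)
```

In this sketch `selected` is the story-topic keyword, mirroring the example in which the value "travel" corresponds to the keyword "story theme".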

[0060] FIG. 6a illustrates the operation of an electronic device that obtains filtering information based on user input according to an embodiment of the present disclosure.

[0061] Referring to FIG. 6a, at least one processor (207) may receive one or more user inputs to create a media collection. For example, at least one processor (207) may obtain text data (620) based on user input (605). At least one processor (207) may obtain text data (625) based on user input (610). For example, user input (605) and user input (610) may each be an example of user input (405) of FIG. 4. For example, the type of user input (605) (e.g., image, video, text, or audio) may be different from the type of user input (610). However, it is not limited thereto. For example, the type of user input (605) may be the same as the type of user input (610).

[0062] At least one processor (207) can obtain text data (620) by analyzing user input (605) for each of the keywords (415). At least one processor (207) can obtain text data (625) by analyzing user input (610) for each of the keywords (415). For example, text data may be referred to as a rough clue.

[0063] Table 1

User Input: I want to create a story about Sam and Kim having a meal together in Seoul in 2023, but I want to exclude media content featuring Park and also exclude their meeting in Gangnam-gu.
Text Data:
  Positive Value: Time: 2023; Place: Seoul; People: Sam and Kim; Story Topic: Meal
  Negative Value: Place: Gangnam-gu, Seoul; People: Park

User Input: I want to create a story about a trip to Jeju Island with my daughter last summer.
Text Data:
  Positive Value: Time: Last Summer; Place: Jeju Island; People Relationship: Daughter; Story Topic: Travel
  Negative Value: (none)

[0064] Referring to Table 1, examples of user input for text based on natural language and text data corresponding to said user input may be described. For example, the positive value of said text data may be used to search for or identify media content that represents (or is related to) a value for a keyword. For example, the negative value of said text data may be used to search for or identify content that is different from the media content that represents (or is related to) a value for a keyword. At least one processor (207) may obtain filtering information (410) by combining text data (620) and text data (625). For example, the filtering information (410) may be described as information for searching for media content within a database of media content (420). For example, at least one processor (207) may obtain filtering information (410) by combining text data (620) and text data (625) and then converting the text represented by the text data into data for searching within a database. Although the operation of obtaining filtering information (410) based on the combination of text data (620) and text data (625) has been described above, the embodiments are not limited thereto. For example, at least one processor (207) may obtain other filtering information by combining filtering information (410) based on a first user input and text data based on a second user input. For example, at least one processor (207) may obtain filtering information (410) without obtaining text data (620) based on the type of user input received. For example, at least one processor (207) may obtain third filtering information for identifying first media contents (430) by combining first filtering information based on a first user input and second filtering information based on a second user input.
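The combination of text data (620) and text data (625) can be sketched as a merge of positive and negative values per keyword. This is only an illustration under assumptions: the `TextData` class, its field names, and the set-union merge rule are hypothetical, and the disclosure does not specify how conflicts between inputs are resolved.

```python
from dataclasses import dataclass, field


@dataclass
class TextData:
    """Values per keyword: 'positive' to include, 'negative' to exclude."""
    positive: dict[str, set[str]] = field(default_factory=dict)
    negative: dict[str, set[str]] = field(default_factory=dict)


def combine(first: TextData, second: TextData) -> TextData:
    """Merge two text data objects by taking the union of values per keyword."""
    merged = TextData()
    for source in (first, second):
        for keyword, values in source.positive.items():
            merged.positive.setdefault(keyword, set()).update(values)
        for keyword, values in source.negative.items():
            merged.negative.setdefault(keyword, set()).update(values)
    return merged


text_620 = TextData(positive={"place": {"Seoul"}, "people": {"Sam", "Kim"}})
text_625 = TextData(positive={"story_topic": {"Meal"}}, negative={"people": {"Park"}})
combined = combine(text_620, text_625)
```

The merged object keeps the exclusion of Park separate from the inclusion of Sam and Kim, so a later search step can apply the two kinds of values differently.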

[0065] Although the operation of obtaining filtering information (410) based on multiple user inputs has been described above, the embodiment is not limited thereto. For example, at least one processor (207) may obtain filtering information (410) based on a single user input.

[0066] Table 2

User Input: I want to create a story about Sam and Kim having a meal together in Seoul in 2023, but I want to exclude media content featuring Park and also exclude their meeting in Gangnam-gu.
Filtering Information:
  Positive Value: Time: 1672531200; Location: 37.5665, 126.978; Characters: Sam and Kim; Story Topic: Meal
  Negative Value: Location: 37.5172, 127.0473; Character: Park

User Input: I want to create a story about a trip to Jeju Island with my daughter last summer.
Filtering Information:
  Positive Value: Time: 1719792000; Location: 33.4996, 126.5312; Character Relationship: Daughter; Story Topic: Travel
  Negative Value: (none)

[0067] Referring to Table 2, examples of user input for text based on natural language and filtering information corresponding to said user input may be described. For example, positive values of said filtering information may be used to search for or identify media content that represents (or is related to) a value for a keyword. For example, negative values of said filtering information may be used to search for or identify content that is different from media content that represents (or is related to) a value for a keyword. For example, said negative values may be used to exclude specific media content. The time in Table 1 (e.g., 2023) may be converted into a Unix timestamp in Table 2 for the filtering information (410). For example, a Unix timestamp may be described as a number representing the number of seconds elapsed since 00:00:00 (UTC) on January 1, 1970. However, it is not limited thereto. For example, in Table 2 for filtering information (410), time can be expressed in the form of year-month-day (e.g., yyyymmdd).

[0068] The place in Table 1 (e.g., Seoul) can be converted into latitude and longitude in Table 2 for filtering information (410). However, it is not limited thereto. For example, the place in Table 2 for filtering information (410) can be represented as text indicating the name of the region.
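The conversions of time and place described above can be sketched as follows, assuming UTC for the timestamp and a small lookup table for place names. The `GAZETTEER` table and both function names are hypothetical; the coordinates follow the Seoul and Jeju Island examples given in the text.

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical lookup table from place name to (latitude, longitude).
GAZETTEER = {
    "Seoul": (37.5665, 126.978),
    "Jeju Island": (33.4996, 126.5312),
}


def year_to_unix_range(year: int) -> tuple[int, int]:
    """Return the first and last Unix timestamp (seconds, UTC) of a calendar year."""
    start = int(datetime(year, 1, 1, tzinfo=timezone.utc).timestamp())
    next_start = int(datetime(year + 1, 1, 1, tzinfo=timezone.utc).timestamp())
    return start, next_start - 1


def place_to_coordinates(name: str) -> Optional[tuple[float, float]]:
    """Return coordinates for a known place name, or None to keep the text form."""
    return GAZETTEER.get(name)


time_range = year_to_unix_range(2023)      # (1672531200, 1704067199)
seoul = place_to_coordinates("Seoul")      # (37.5665, 126.978)
```

Returning `None` for an unknown place leaves room for the alternative mentioned above, in which the place is represented as text indicating the name of the region.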

[0069] According to one embodiment, at least one processor (207) may obtain text data (620) in which "when going out" is assigned to a keyword for a story topic based on receiving user input (405) indicating "when going out." For example, at least one processor (207) may obtain filtering information (410) in which "travel, outing" is assigned to a keyword for a story topic in order to change the text data (620) into a format for searching a database using the text data (620) indicating "when going out." For example, at least one processor (207) may identify media content indicating "travel, outing" among the media content (420).

[0070] Text data (620) based on user input (605) may represent text based on natural language. For example, the text represented by the text data (620) may correspond to at least one keyword among the keywords (415). For example, the text represented by the text data (620) may be identified or assigned to at least one keyword among the keywords (415).

[0071] Filtering information (410) based on text data (620) may represent a value identified (or assigned) for at least one of the keywords (415). For example, the value may represent natural language, but is not limited thereto. For example, the value may represent at least one of a number, an embedding vector, and text. For example, the value may be used to search for media content within a database for media content (420). For example, the electronic device (100) may identify or determine keywords related to the filtering information (410) by using a database representing a set of metadata for each of the media content (420). For example, each metadata for each media content may represent at least one value for at least one of the keywords (415). For example, the electronic device (100) may store first metadata for the first media content in memory (206) together with the first media content included in the media content (420). For example, the first metadata may represent at least one value for at least one keyword among the keywords (415). For example, the first metadata may represent the time when the first media content was acquired (e.g., 1717200000). For example, the first metadata may represent the type of visual object included in the first media content (e.g., sea, boat, or hamburger, etc.). For example, the first metadata may represent the type of the first media content (e.g., image or video).

[0072] For example, metadata for media content may be obtained based on a plurality of trained models included in the electronic device (100). For example, when at least one processor (207) obtains media content, it may analyze the media content using the plurality of trained models. For example, based on the analysis, at least one processor (207) may obtain or identify metadata representing at least one value for keywords (415) for the media content. For example, the plurality of trained models may include a model for identifying faces within the media content, a model for identifying backgrounds within the media content, and a model for identifying poses within the media content. However, it is not limited thereto.

[0073] According to one embodiment, metadata for media content may represent a value (e.g., Christmas) assigned to a keyword (e.g., story topic) for determining the topic of a media collection. For example, at least one processor (207) may determine the value assigned to the keyword for determining the topic of a media collection based on analyzing the media content through a plurality of trained models for acquiring metadata. For example, when creating a media collection, at least one processor (207) may create a media collection for a specific topic (e.g., Christmas) using the value assigned to the keyword that appears in the metadata. However, it is not limited thereto. For example, at least one processor (207) may create a media collection for a topic different from the topic corresponding to the value based on user input (405). For example, at least one processor (207) may determine the arrangement order of media content using filtering information (410) based on user input (405). For example, the determination of the above arrangement order will be described later with reference to FIG. 8.

[0074] According to one embodiment, an electronic device (100) may use a trained model to obtain text data (620) based on user input (405). For example, the text data (620) may be obtained through a language model trained to output at least one word corresponding to at least one of the keywords (415) using text. For example, the electronic device (100) may include the trained model. For example, the trained model may include a model trained through machine learning techniques (or deep learning techniques). For example, the trained model may include a Large Language Model (LLM). For example, the trained model may include a Large Multi-modal Model (LMM). For example, the trained model may be described as a model trained to assign text to at least some of the keywords (415) using at least one of text, images, videos, and audio. For example, the above-mentioned trained model may be described as a model trained to identify at least one word for at least one category of keywords (415) using at least one of text, images, videos, and audio. For example, at least one processor (207) may obtain text data (620) from user input (405) through the above-mentioned trained model using a specified prompt. For example, the above-mentioned specified prompt may be obtained through a Chain of Thought (CoT) technique or a few-shot example technique. For example, the CoT technique may be described as a technique that generates intermediate reasoning steps to solve complex problems. For example, the few-shot example technique may be described as a technique that helps the model understand its task by providing a small number of examples when the model performs a new task.

[0075] For example, the electronic device (100) may use a Parameter-Efficient Fine-Tuning (PEFT) technique to classify words for keywords (415) based on user input (405). For example, the PEFT technique may be described as a technique for fine-tuning an LLM to perform a specified task. For example, the PEFT technique may include a Low-Rank Adaptation (LoRA) technique.

[0076] According to one embodiment, when at least one processor (207) receives user input (605) representing text, it may divide the text represented by the user input (605) into a plurality of parts. For example, each of the plurality of parts may be one of a word, a phrase, and a morpheme. For example, at least one processor (207) may identify one or more parts by dividing the text based on a predetermined unit (e.g., a morpheme). For example, at least one processor (207) may determine whether the identified one or more parts are found within a set of metadata corresponding to media content (420) stored in memory (206). For example, at least one processor (207) may determine whether each of the identified one or more parts is identical to one of the values represented by the set of metadata. For example, at least one processor (207) can obtain or generate filtering information (410) corresponding to a user input (605) representing text without using a trained model (e.g., a language model) based on a determination that all of the identified one or more parts are found within the set of metadata. For example, at least one processor (207) can obtain or generate text data (620) corresponding to the user input (605) through the trained model based on a determination that at least one of the identified one or more parts is not found within the set of metadata. For example, at least one processor (207) can reduce the power consumed to generate a media collection (e.g., the media collection (820) of FIG. 8) when not using the trained model. For example, the power consumed to generate the media collection using the trained model may be greater than the power consumed to generate the media collection without using the trained model. For example, at least one processor (207) can reduce the time required to generate the media collection when the trained model is not used.
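The decision of whether the trained model is needed can be sketched as a membership test of each part against the metadata values. In this illustration, whitespace splitting stands in for division into morphemes, words, or phrases, and the function names are hypothetical.

```python
def parts_of(text: str) -> list[str]:
    """Divide text into parts; whitespace splitting is a stand-in for morpheme analysis."""
    return text.casefold().split()


def needs_trained_model(text: str, metadata_values: set[str]) -> bool:
    """Return True if at least one part is not found among the metadata values."""
    known = {value.casefold() for value in metadata_values}
    return any(part not in known for part in parts_of(text))


metadata_values = {"Seoul", "travel", "sea", "boat"}

# Every part matches a metadata value, so the model can be skipped.
skip_model = not needs_trained_model("Seoul travel", metadata_values)

# "daughter" is not a metadata value, so the model would be used.
use_model = needs_trained_model("travel with my daughter", metadata_values)
```

Skipping the model on the first input corresponds to the reduced power and time described above, since only set lookups are performed.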

[0077] According to one embodiment, at least one processor (207) can identify whether a noun requiring matching is included in the text by using a user input (605) representing text. For example, at least one processor (207) can identify or obtain one or more parts of the text based on the user input (605). For example, each of the one or more parts may be based on one of a morpheme, a word, and a phrase. For example, at least one processor (207) can identify whether matching information is required for each of the one or more parts. For example, at least one processor (207) may decide to obtain the matching information based on a determination that at least one of the one or more parts corresponds to a preset keyword. For example, the matching information may be described as information for matching at least one of the one or more parts with a visual object identified within the media content. For example, at least one processor (207) may determine to obtain the matching information based on identifying that at least one of the one or more parts corresponds to a keyword regarding a person relationship (or pet). For example, at least one processor (207) may display a User Interface (UI) for obtaining the matching information through a display (not shown). For example, at least one processor (207) may obtain the matching information based on receiving another user input to the UI. For example, the other user input may be described as an input for matching a visual object identified within the media content with at least one of the one or more parts.

[0078]

[0079] FIG. 6b is a flowchart illustrating the operation of an electronic device for determining media content using an embedding vector according to an embodiment of the present disclosure. This method may be executed by the electronic device (100) illustrated in FIG. 2 or at least one processor (207) of the electronic device (100).

[0080] Referring to FIG. 6b, in operation 660, at least one processor (207) can determine whether to obtain an embedding vector using filtering information (410). For example, at least one processor (207) can execute operation 670 based on a decision to obtain an embedding vector using filtering information (410), and execute operation 680 based on a decision not to obtain an embedding vector using filtering information (410).

[0081] According to one embodiment, at least one processor (207) can identify the first media contents (430) among the media contents (420) by using similarity between embedding vectors. For example, at least one processor (207) can identify similarities between embedding vectors corresponding to values identified by filtering information (410) and user input (405) and other embedding vectors represented by a set of metadata for the media contents (420). For example, at least one processor (207) can calculate or obtain an embedding vector corresponding to a text (or word) when identifying, through text data (e.g., text data (620) of FIG. 6a), that the text corresponds to a predetermined keyword (e.g., a keyword for person description or a keyword for object description). For example, at least one processor (207) may determine to obtain a similarity for an embedding vector corresponding to the text based on identifying the text corresponding to the predetermined keyword. For example, the similarity may be obtained through one of the Jaccard similarity method, the cosine similarity method, the Euclidean similarity method, and the Manhattan similarity method.
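Two of the similarity methods named above can be sketched in a few lines. These are the standard textbook definitions of cosine similarity (on vectors) and Jaccard similarity (on sets); the disclosure does not state which formulation or which library the electronic device (100) uses, and returning 0.0 for empty inputs is a choice made here.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def jaccard_similarity(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union; 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


same_direction = cosine_similarity([1.0, 0.0], [1.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
overlap = jaccard_similarity({"yellow", "shirt"}, {"shirt", "hat"})
```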

[0082] According to one embodiment, at least one processor (207) can calculate or identify an embedding vector corresponding to the text when identifying text (or words) (e.g., wearing a yellow shirt) assigned to the predetermined keyword within the text data (620). At least one processor (207) can convert the text into tokens of a predetermined vocabulary through a tokenizer. For example, at least one processor (207) can calculate or identify an embedding vector corresponding to the text by performing positional encoding using an encoder.

[0083] In operation 670, at least one processor (207) may add media content corresponding to the metadata to the first media content (430) based on identifying metadata corresponding to a similarity that exceeds a threshold similarity. For example, the metadata may correspond to each media content stored in memory (206) and may represent each media content. For example, at least one processor (207) may calculate or identify similarities between an embedding vector representing a value corresponding to the keyword (412-1) and other embedding vectors represented by a set of metadata of media content (420). For example, at least one processor (207) may add at least one media content corresponding to the similarity to the first media content (430) based on identifying, among the similarities, a similarity that exceeds a threshold similarity. For example, at least one processor (207) can obtain a first embedding vector representing a first value corresponding to a first keyword (e.g., keyword (412-1)) based on user input (405) representing an image. For example, at least one processor (207) can obtain similarities between the first embedding vector and second embedding vectors representing second values (e.g., Seoul, Busan, Jeju Island) corresponding to the first keyword (e.g., place), each corresponding to media contents (420) stored in memory (206). For example, at least one processor (207) can determine the media contents corresponding to similarities exceeding a threshold similarity among the similarities as the first media contents (430).
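Operation 670 can be sketched as a filter that keeps media whose embedding similarity to the query exceeds a threshold. This is an illustration under assumptions: cosine similarity is picked from the methods listed earlier, and the `Media` type, the two-dimensional vectors, and the threshold 0.8 are hypothetical.

```python
import math
from typing import NamedTuple


class Media(NamedTuple):
    name: str
    embedding: tuple[float, ...]  # embedding vector carried by the metadata


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_by_similarity(query: tuple[float, ...], library: list[Media],
                         threshold: float) -> list[Media]:
    """Keep media whose similarity to the query strictly exceeds the threshold."""
    return [m for m in library if cosine(query, m.embedding) > threshold]


library = [
    Media("a.jpg", (1.0, 0.1)),
    Media("b.jpg", (0.0, 1.0)),
    Media("c.jpg", (0.7, 0.7)),
]
first_media = select_by_similarity((1.0, 0.0), library, threshold=0.8)
```

Only `a.jpg` passes in this example, since the other two vectors point too far from the query direction.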

[0084] According to one embodiment, at least one processor (207) may change the method for identifying media content based on the determination that the number of first media content (430) identified using keywords (412-1, 412-2, ..., 412-N) corresponding to values representing user input (405) is outside a reference range (e.g., 20 to 500). For example, if a first value and a second value are assigned to the keyword (412-1), at least one processor (207) may change the method for identifying (or searching) media content from a first method for searching media content representing said first value and said second value to a second method for searching media content representing said first value or said second value. For example, at least one processor (207) can identify media contents (420) in the second manner using keywords (412-1, 412-2, ..., 412-N) corresponding to some of the values representing user input (405). For example, at least one processor (207) can create a media collection (e.g., the media collection (820) of FIG. 8) using the identified media contents based on a determination that the number of the identified media contents is within a reference range.

[0085] According to one embodiment, at least one processor (207) can identify, based on user input (405), third media contents corresponding to a first value and a second value for a first keyword among the media contents (420) stored in memory (206). For example, at least one processor (207) can identify fourth media contents corresponding to the first value or the second value among the media contents (420) stored in memory (206) based on identifying the number of third media contents that is smaller than another reference number. For example, at least one processor (207) can determine the fourth media contents as the first media contents (430) based on identifying the number of fourth media contents that is smaller than the reference number and larger than the other reference number.
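The change from the first method (both values) to the second method (either value) can be sketched as follows. The function names and the tag-set representation of metadata are hypothetical, `lower` stands for the other reference number and `upper` for the reference number, and the fallback on the last line is an assumption because the text does not say what happens when the relaxed result is also out of bounds.

```python
def search_all(library: dict[str, set[str]], values: set[str]) -> list[str]:
    """First method: media whose metadata contains every value (AND)."""
    return [name for name, tags in library.items() if values <= tags]


def search_any(library: dict[str, set[str]], values: set[str]) -> list[str]:
    """Second method: media whose metadata contains at least one value (OR)."""
    return [name for name, tags in library.items() if values & tags]


def identify_first_media(library: dict[str, set[str]], values: set[str],
                         lower: int, upper: int) -> list[str]:
    third = search_all(library, values)
    if len(third) >= lower:
        return third  # enough results, so the search method is not changed
    fourth = search_any(library, values)
    if lower < len(fourth) < upper:
        return fourth
    # Assumption: keep the strict result when the relaxed one is out of bounds.
    return third


library = {"a": {"Sam", "Kim"}, "b": {"Sam"}, "c": {"Kim"}, "d": {"Park"}}
result = identify_first_media(library, {"Sam", "Kim"}, lower=2, upper=5)
```

Here the strict search finds only one item, so the relaxed search is used and returns the three items featuring Sam or Kim.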

[0086] According to one embodiment, a change in the method for searching or identifying media content has been described, but the embodiment is not limited thereto. For example, at least one processor (207) can identify whether the number of identified media content (e.g., first media content (430)) among the media content (420) falls within a reference range within a preset number of times (e.g., 10). For example, if the preset number of times is 10, at least one processor (207) can search the media content only up to 10 times to identify media content representing a number within the reference range among the media content (420). For example, based on the determination that at least one processor (207) failed to identify or search for media content representing a number within the reference range during the 10 searches, the processor (207) can create a media collection (e.g., media collection (820) of FIG. 8) using media content representing a number close to the reference range.
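The bounded number of searches can be sketched as a loop that stops at the first result whose count lies within the reference range and otherwise keeps the result closest to that range. The names below are hypothetical, each search is modeled as a zero-argument callable, and measuring closeness by distance to the nearer bound is an assumption.

```python
from itertools import islice
from typing import Callable, Iterable


def distance_to_range(count: int, lower: int, upper: int) -> int:
    """How far a count lies outside [lower, upper]; 0 when inside."""
    if count < lower:
        return lower - count
    if count > upper:
        return count - upper
    return 0


def search_with_limit(searches: Iterable[Callable[[], list]], lower: int, upper: int,
                      max_attempts: int = 10) -> list:
    """Run at most max_attempts searches; return the first in-range result,
    or the result whose count is closest to the range if none is in range."""
    best: list = []
    best_distance = None
    for search in islice(searches, max_attempts):
        result = search()
        distance = distance_to_range(len(result), lower, upper)
        if distance == 0:
            return result
        if best_distance is None or distance < best_distance:
            best, best_distance = result, distance
    return best


searches = [lambda: list(range(3)), lambda: list(range(15)), lambda: list(range(25))]
in_range = search_with_limit(searches, lower=20, upper=500)
closest = search_with_limit(searches, lower=20, upper=500, max_attempts=2)
```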

[0087] In operation 680, at least one processor (207) may add media content related to a value indicated by filtering information (410) to the first media content (430) based on a decision not to obtain an embedding vector using filtering information (410). For example, at least one processor (207) may identify media content corresponding to metadata related to said value among the media content stored in memory (206) based on a determination that the value indicated by filtering information (410) does not correspond to a predetermined keyword (e.g., person description and object description). For example, at least one processor (207) may add media content corresponding to metadata indicating a value (substantially) identical to said value to the first media content (430). For example, at least one processor (207) can add media content corresponding to metadata representing a value within a reference range for a value represented by filtering information (410) to the first media content (430).
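Operation 680 can be sketched as a direct comparison against metadata without embedding vectors. In this illustration, text values match when they are equal ignoring case and numeric values match when they lie within a tolerance of the requested value. The function name, the metadata layout, and the `tolerance` parameter are hypothetical stand-ins for "substantially identical" and "within a reference range".

```python
from typing import Union

Value = Union[int, float, str]


def select_by_metadata(library: dict[str, dict[str, Value]], keyword: str,
                       value: Value, tolerance: float = 0) -> list[str]:
    """Return names of media whose metadata value for the keyword matches the value."""
    selected = []
    for name, metadata in library.items():
        candidate = metadata.get(keyword)
        if candidate is None:
            continue  # this media has no value for the keyword
        if isinstance(value, (int, float)) and isinstance(candidate, (int, float)):
            if abs(candidate - value) <= tolerance:  # numeric: within the range
                selected.append(name)
        elif str(candidate).casefold() == str(value).casefold():  # text: same value
            selected.append(name)
    return selected


library = {
    "a": {"time": 1717200000, "place": "Seoul"},
    "b": {"time": 1717286400, "place": "seoul"},
    "c": {"place": "Busan"},
}
by_place = select_by_metadata(library, "place", "Seoul")
by_time = select_by_metadata(library, "time", 1717200000, tolerance=3600)
```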

[0088] FIG. 7 is a flowchart illustrating the operation of an electronic device that generates a media collection based on the number of one or more media contents according to an embodiment of the present disclosure. This method may be executed by the electronic device (100) illustrated in FIG. 2 or by at least one processor (207) of the electronic device (100). In the following embodiments, each operation may be performed sequentially, but is not necessarily performed sequentially. For example, the order of each operation may be changed, and at least two operations may be performed in parallel.

[0089] Referring to FIG. 7, in operation 710, at least one processor (207) can identify a second keyword among the remaining keywords for the identified keyword representing the user input (405). For example, at least one processor (207) can identify or determine at least one keyword corresponding to at least one value representing the user input (405) from among the keywords (415) by using filtering information (410) based on the user input (405). For example, at least one processor (207) can determine the remaining keywords for at least one keyword corresponding to the at least one value from among the keywords (415). For example, at least one processor (207) can randomly determine the second keyword from among the remaining keywords. For example, the second keyword can be described as one of the remaining keywords. For example, the remaining keywords mentioned above may be described as at least one keyword that is not a keyword corresponding to at least one value representing user input (405) among the keywords (415).

[0090] In operation 720, at least one processor (207) may execute operation 730 based on a determination that the number of one or more second media contents (e.g., one or more second media contents (810) of FIG. 8) associated with one of the values for the second keyword (e.g., place) (e.g., Seoul, Busan, Jeju Island, etc.) is within a reference range (e.g., 10 to 30), and may execute operation 740 based on a determination that the number of one or more second media contents (810) is not within the reference range. For example, at least one processor (207) may identify the values for the second keyword using a set of metadata representing the first media contents (430). For example, the values may be represented by the set. For example, at least one processor (207) can search for or identify media content related to the values among the first media content (430) using a set of metadata representing the first media content (430).

[0091] At least one processor (207) may randomly determine one of the values. For example, at least one processor (207) may identify the number of one or more second media contents (e.g., one or more second media contents (810) of FIG. 8) associated with the determined one. At least one processor (207) may identify the number of one or more second media contents (e.g., one or more second media contents (810) of FIG. 8) corresponding to a value substantially identical to the determined one. For example, at least one processor (207) may determine or identify one or more media contents among the first media contents (430) corresponding to metadata representing the substantially identical value as one or more second media contents (810). Although the operation of identifying the number of one or more second media contents (e.g., one or more second media contents (810) of FIG. 8) corresponding to the same value as the one determined above has been described above, the embodiments are not limited thereto. For example, at least one processor (207) can identify the number of one or more second media contents (e.g., one or more second media contents (810) of FIG. 8) corresponding to a value similar to the one determined above (e.g., when the similarity between embedding vectors exceeds a threshold similarity). For example, at least one processor (207) can identify the number of one or more second media contents (e.g., one or more second media contents (810) of FIG. 8) corresponding to values included within a reference range (e.g., [1672531200, 1704067199] expressed in Unix timestamps) for the one determined above (e.g., 1700000000). For example, at least one processor (207) can identify whether the number of identified one or more second media contents (e.g., one or more second media contents (810) of FIG. 8) is within a reference range (e.g., 30 to 100). For example, the reference range may be pre-set. For example, the reference range may be changed based on user input.

[0092] In operation 730, at least one processor (207) may generate a media collection (e.g., a media collection (820) of FIG. 8) comprising one or more second media contents (e.g., one or more second media contents (810) of FIG. 8). For example, at least one processor (207) may generate the media collection comprising the one or more second media contents based on a determination that the number of the one or more second media contents is within a reference range. For example, at least one processor (207) may display the media collection through a display (not shown). For example, the generation of the media collection will be described later with reference to FIG. 8.

[0093] In operation 740, at least one processor (207) can identify a third keyword among the remaining keywords based on a determination that the number of one or more second media contents (810) is not within a reference range. For example, at least one processor (207) can randomly identify or determine a third keyword different from the second keyword among the remaining keywords for at least one keyword for which at least one value representing the user input (405) has been identified, based on a determination that the number of one or more second media contents (810) is outside the reference range.

[0094] In operation 750, at least one processor (207) can identify the number of one or more third media contents associated with one of the values for the third keyword. For example, at least one processor (207) can randomly identify one of the values for the third keyword using a set of metadata corresponding to the first media contents (430).

[0095] In operation 760, at least one processor (207) can generate a media collection (e.g., the media collection (820) of FIG. 8) containing the one or more third media contents based on a determination that the number of the one or more third media contents is within a reference range.
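The flow of operations 710 to 760 can be sketched as a loop over the remaining keywords. The sketch below is illustrative only: it assumes each media content is represented by a metadata dictionary, uses exact value matching, and introduces the hypothetical function name pick_collection; none of these details are stated in the disclosure.

```python
import random
from typing import Optional


def pick_collection(
    items: list[dict],
    keywords: list[str],
    low: int = 30,
    high: int = 100,
    rng: Optional[random.Random] = None,
) -> Optional[tuple[str, object, list[dict]]]:
    """Try the remaining keywords in random order until the match count fits.

    For each keyword, one value is drawn at random from the metadata of the
    first media contents, and the contents carrying that value are counted.
    """
    rng = rng or random.Random()
    remaining = list(keywords)
    rng.shuffle(remaining)  # random choice of the second, third, ... keyword
    for keyword in remaining:
        values = [item[keyword] for item in items if keyword in item]
        if not values:
            continue  # no metadata value is available for this keyword
        value = rng.choice(values)
        matched = [item for item in items if item.get(keyword) == value]
        if low <= len(matched) <= high:
            return keyword, value, matched  # count is within the reference range
    return None  # no keyword produced an acceptable number of contents
```

When the count for one keyword falls outside the reference range, the loop moves on to another keyword, mirroring the transition from operation 720 to operations 740 and 750.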

[0096] According to one embodiment, at least one processor (207) can identify a value corresponding to a keyword for a story topic using filtering information (410). For example, at least one processor (207) can identify whether a value is assigned to a keyword for a story topic using filtering information (410). For example, when a value representing a keyword for a story topic is identified through the filtering information (410), at least one processor (207) can identify media content corresponding to the value among the first media content (430). For example, at least one processor (207) can create a media collection (820) using the identified media content.

[0097] According to one embodiment, at least one processor (207) can determine a value corresponding to a keyword for a story topic based on the inability to identify a value corresponding to a keyword for a story topic using filtering information (410). For example, at least one processor (207) can determine a value corresponding to a keyword for a story topic as a predetermined value. For example, at least one processor (207) can identify media content corresponding to the predetermined value among the first media content (430). For example, at least one processor (207) can create a media collection (820) using the identified media content. Although the operation of creating a media collection (820) using a value of a keyword for a story topic has been described above, the embodiment is not limited thereto. For example, based on identifying that no value is assigned to the keyword for the story topic, at least one processor (207) may generate a media collection (820) using a predetermined value for the keyword for the story topic, or generate a media collection (820) using a keyword different from the keyword for the story topic. For example, the operation of generating a media collection (820) using the predetermined value and the operation of generating a media collection (820) using the different keyword may be performed randomly. For example, based on identifying that no value is assigned to the keyword for the story topic, at least one processor (207) may randomly identify a second keyword among the remaining keywords (412-1, 412-2, ..., 412-N) corresponding to at least one value representing the user input (405) in order to identify media content to be included in the media collection (820). For example, the operation of identifying the second keyword above may be referred to as operation 710 of FIG. 7.
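The fallback to a predetermined value can be sketched as follows. The dictionary key "story_topic", the default value "daily life", and both function names are hypothetical placeholders chosen for illustration; the disclosure does not specify them.

```python
DEFAULT_STORY_TOPIC = "daily life"  # hypothetical predetermined value


def resolve_story_topic(filtering_info: dict, default: str = DEFAULT_STORY_TOPIC) -> str:
    """Return the story-topic value from the filtering information, or the default."""
    value = filtering_info.get("story_topic")
    return value if value else default


def select_by_topic(items: list[dict], filtering_info: dict) -> list[dict]:
    """Keep the media contents whose metadata matches the resolved story topic."""
    topic = resolve_story_topic(filtering_info)
    return [item for item in items if item.get("story_topic") == topic]
```

This sketch covers only the predetermined-value branch; the alternative branch, in which a different keyword is selected at random, corresponds to operation 710 of FIG. 7.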

[0098] According to one embodiment, the electronic device (100) may include a display (not shown). For example, the electronic device (100) may receive user input (405) through the display. For example, the electronic device (100) may display a User Interface (UI) for receiving user input (405) through the display. For example, the electronic device (100) may display visual objects for receiving user input (405) (e.g., visual objects (912, 914, 916) of FIG. 9a) through the display. For example, the visual objects for receiving user input (405) will be described later with reference to FIGS. 9a to 9c.

[0099] FIG. 8 illustrates the operation of an electronic device that generates a media collection by arranging one or more second media contents according to an embodiment of the present disclosure.

[0100] Referring to FIG. 8, at least one processor (207) can generate a media collection (820) using one or more second media contents (810) and filtering information (410). For example, at least one processor (207) can use the filtering information (410) to identify whether an arrangement order for one or more second media contents (810) is indicated by the user input (405). For example, at least one processor (207) can use the filtering information (410) to identify an arrangement order for one or more second media contents (810) and generate a media collection (820) in which one or more second media contents (810) are arranged according to the arrangement order. For example, at least one processor (207) can use the filtering information (410) to identify a value for a keyword (e.g., arrangement order of media contents) indicating the order of media contents. For example, at least one processor (207) can generate a media collection (820) in which one or more second media contents (810) are arranged based on the identified value. For example, at least one processor (207) can determine an order for arranging media contents to be included in the media collection (820) based on user input (405). For example, at least one processor (207) can generate a media collection (820) including media contents arranged based on the order using the first media contents (430) and one or more second media contents (810).

[0101] According to one embodiment, at least one processor (207) may not identify a method for arranging one or more second media contents (810) using filtering information (410). For example, a value indicating a method for arranging one or more second media contents (810) may not appear in the user input (405) (or filtering information (410)). For example, if the method is not identified, at least one processor (207) may arrange one or more second media contents (810) according to a preset order. For example, the preset order may include an order according to the time the media contents were acquired, an order according to the flow of time identified within the media contents (e.g., the order of morning, lunch, and evening), an order according to the change of seasons (e.g., the order of spring, summer, autumn, and winter), an order according to the age of a person identified within the media contents, an order according to user preference, and an order according to keywords.

[0102] According to one embodiment, when at least one processor (207) acquires a user input (405) based on natural language, one or more second media contents (810) may be arranged according to the order in which words identified through the user input (405) are acquired. For example, when at least one processor (207) acquires the user input (405), a first word and a second word may be identified or acquired sequentially. For example, when at least one processor (207) creates a media collection using one or more second media contents (810) related to the first word and the second word, at least one processor (207) may create or acquire a media collection in which at least one media content related to the first word and at least one media content related to the second word are arranged sequentially.
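The two arrangement behaviors described above (a preset order when no arrangement method is identified, and the order of words in the user input) can be sketched together. The field names "keyword" and "taken_at" and the function name arrange are assumptions made for illustration, and acquisition time is used here as just one of the preset orders listed above.

```python
from typing import Optional


def arrange(items: list[dict], word_order: Optional[list[str]] = None) -> list[dict]:
    """Arrange media contents for a collection.

    If word_order is given, contents are grouped in the order in which the
    related words were acquired from the user input, and sorted by
    acquisition time within each group. Otherwise a preset order
    (acquisition time) is used.
    """
    if word_order:
        rank = {word: index for index, word in enumerate(word_order)}
        # Contents related to unknown words are placed after all known words.
        return sorted(
            items,
            key=lambda item: (rank.get(item["keyword"], len(rank)), item["taken_at"]),
        )
    return sorted(items, key=lambda item: item["taken_at"])
```

For example, with the word order ["beach", "daughter"], every content related to "beach" precedes the contents related to "daughter", regardless of when each was acquired.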

[0104] FIGS. 9a to 9c illustrate the operation of an electronic device displaying a User Interface (UI) for receiving user input according to various embodiments of the present disclosure.

[0105] Referring to FIG. 9a, the state (910) can be described as a state in which visual objects (912, 914, 916) for receiving user input (405) are displayed. For example, the electronic device (100) may include the display. For example, at least one processor (207) may display visual objects (912, 914, 916) for receiving user input (405) through the display. For example, the visual objects (912, 914, 916) may represent a set of recommended text. For example, at least one processor (207) may generate recommended text by analyzing media content (420) stored in memory (206). For example, at least one processor (207) may identify the distribution of keywords and the distribution of values corresponding to the keywords using filtering information (410). For example, at least one processor (207) can obtain, identify, or generate recommended text using the distribution of the keywords and the distribution of the values. For example, based on receiving user input for at least one of the visual objects (912, 914, 916), at least one processor (207) can generate a media collection (820) corresponding to the at least one visual object.
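One possible way to derive recommended text from the distribution of keywords and values is to count how often each keyword-value pair occurs and surface the most frequent pairs. The sketch below assumes the distributions are computed over a list of keyword-to-value dictionaries; the function name recommend_texts and the "keyword: value" output format are hypothetical.

```python
from collections import Counter


def recommend_texts(records: list[dict], top_n: int = 3) -> list[str]:
    """Return the top_n most frequent keyword-value pairs as short texts."""
    counts = Counter(
        (keyword, value) for record in records for keyword, value in record.items()
    )
    return [f"{keyword}: {value}" for (keyword, value), _ in counts.most_common(top_n)]
```

Each returned string could back one of the visual objects (912, 914, 916); a practical implementation would likely turn the pair into a natural-language phrase rather than display it verbatim.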

[0106] According to one embodiment, at least one processor (207) may receive user input (405) representing text. For example, at least one processor (207) may display a UI object (905) for receiving user input representing text based on natural language through the display. For example, at least one processor (207) may display the text on the UI object (905) based on receiving user input representing text based on natural language. For example, at least one processor (207) may generate a media collection (820) corresponding to the text based on receiving the user input.

[0107] According to one embodiment, the electronic device (100) may include a microphone. For example, at least one processor (207) may acquire audio through the microphone. For example, at least one processor (207) may receive user input representing the audio. For example, at least one processor (207) may receive user input representing the audio based on receiving user input regarding a visual object (907). For example, at least one processor (207) may acquire text corresponding to the audio by performing speech recognition on the audio represented by the user input. For example, at least one processor (207) may perform speech-to-text (STT) on the audio. For example, at least one processor (207) may acquire filtering information (410) based on the text corresponding to the audio. For example, at least one processor (207) may generate a media collection (820) corresponding to the audio.

[0109] Referring to FIG. 9b, the state (920) can be described as a state in which visual objects (922, 924, 926, 928) are displayed to receive user input (405). For example, at least one processor (207) can display the visual objects (922, 924, 926, 928) through the display. For example, the visual objects (922, 924, 926, 928) may correspond to each of the recommended images. For example, at least one processor (207) can identify or obtain the distribution of keywords and the distribution of values identified within the keywords by analyzing filtering information (410) obtained based on the user input (405). For example, at least one processor (207) can identify or determine the recommended images among the media contents (420) using the distribution of the keywords and the distribution of the values. For example, at least one processor (207) can generate a media collection (820) using the identified recommended image.

[0110] According to one embodiment, at least one processor (207) may receive a user input (405) representing at least one of the media contents (420) stored in memory (206). For example, at least one processor (207) may generate a media collection (820) based on generating filtering information (410) using the at least one.

[0111] According to one embodiment, the electronic device (100) may include a Video See-Through (VST) device. For example, the VST device may include a first camera (not shown) for acquiring images of the environment surrounding the VST device, and a second camera (not shown) for acquiring images of the face (or expression) of the user of the VST device. For example, the VST device may include a display (not shown). For example, the VST device may create a media collection (820) using a screen displayed through the display. For example, the screen may display images acquired through the first camera. For example, the VST device may create a media collection (820) by acquiring filtering information (410) based on user input (405) representing the screen. For example, the VST device may create a media collection (820) by acquiring filtering information (410) based on user input (405) representing images acquired through the second camera.

[0113] Referring to FIG. 9c, the state (930) can be described as a state in which a visual object (935) representing text data based on user input (405) is displayed. For example, at least one processor (207) may display a visual object (935) representing text data (620) obtained based on user input (405) through the display based on receiving user input (405). For example, at least one processor (207) may display a visual object (935) through the display based on receiving user input (405) representing natural language, such as "I want to create a story about traveling to Jeju Island with my daughter last summer." For example, at least one processor (207) may provide feedback to the user of the electronic device (100) to create a media collection by displaying the visual object (935). For example, the user can understand how the generated media collection (820) is identified within the electronic device (100) through a visual object (935).

[0115] FIG. 10 is a block diagram of an electronic device in a network environment according to various embodiments.

[0116] Referring to FIG. 10, in a network environment (1000), an electronic device (1001) may communicate with an electronic device (1002) through a first network (1098) (e.g., a short-range wireless communication network) or with at least one of an electronic device (1004) or a server (1008) through a second network (1099) (e.g., a long-range wireless communication network). According to one embodiment, the electronic device (1001) may communicate with the electronic device (1004) through a server (1008). According to one embodiment, the electronic device (1001) may include a processor (1020), memory (1030), input module (1050), sound output module (1055), display module (1060), audio module (1070), sensor module (1076), interface (1077), connection terminal (1078), haptic module (1079), camera module (1080), power management module (1088), battery (1089), communication module (1090), subscriber identification module (1096), or antenna module (1097). In some embodiments, at least one of these components (e.g., connection terminal (1078)) may be omitted from the electronic device (1001), or one or more other components may be added. In some embodiments, some of these components (e.g., sensor module (1076), camera module (1080), or antenna module (1097)) may be integrated into a single component (e.g., display module (1060)).

[0117] The processor (1020) can, for example, execute software (e.g., program (1040)) to control at least one other component (e.g., hardware or software component) of the electronic device (1001) connected to the processor (1020) and can perform various data processing or operations. According to one embodiment, as at least part of the data processing or operations, the processor (1020) can store commands or data received from other components (e.g., sensor module (1076) or communication module (1090)) in volatile memory (1032), process the commands or data stored in volatile memory (1032), and store the resulting data in non-volatile memory (1034). According to one embodiment, the processor (1020) may include a main processor (1021) (e.g., a central processing unit or an application processor) or an auxiliary processor (1023) that can operate independently or together with it (e.g., a graphics processing unit, a neural processing unit (NPU), an image signal processor, a sensor hub processor, or a communication processor). For example, if the electronic device (1001) includes a main processor (1021) and an auxiliary processor (1023), the auxiliary processor (1023) may be configured to use lower power than the main processor (1021) or to be specialized for a specified function. The auxiliary processor (1023) may be implemented separately from the main processor (1021) or as part thereof.

[0118] The auxiliary processor (1023) may control at least some of the functions or states associated with at least one component of the electronic device (1001) (e.g., display module (1060), sensor module (1076), or communication module (1090)) on behalf of the main processor (1021) while the main processor (1021) is in an inactive (e.g., sleep) state, or together with the main processor (1021) while the main processor (1021) is in an active (e.g., application execution) state. According to one embodiment, the auxiliary processor (1023) (e.g., image signal processor or communication processor) may be implemented as part of another functionally related component (e.g., camera module (1080) or communication module (1090)). According to one embodiment, the auxiliary processor (1023) (e.g., neural processing unit) may include a hardware structure specialized for processing an artificial intelligence model. The artificial intelligence model may be generated through machine learning. Such learning may be performed, for example, on the electronic device (1001) itself where the artificial intelligence model is executed, or through a separate server (e.g., server (1008)). The learning algorithm may include, for example, supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning, but is not limited to the examples described above. The artificial intelligence model may include a plurality of artificial neural network layers. An artificial neural network may be a deep neural network (DNN), a convolutional neural network (CNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), a deep Q-network, or a combination of two or more of the above, but is not limited to the examples described above. In addition to, or instead of, the hardware structure, the artificial intelligence model may include a software structure.

[0119] The memory (1030) can store various data used by at least one component of the electronic device (1001) (e.g., a processor (1020) or a sensor module (1076)). The data may include, for example, input data or output data for software (e.g., a program (1040)) and related commands. The memory (1030) may include volatile memory (1032) or non-volatile memory (1034). The non-volatile memory may include internal memory (1036) or external memory (1038).

[0120] The program (1040) may be stored as software in memory (1030) and may include, for example, an operating system (1042), middleware (1044), or an application (1046).

[0121] The input module (1050) can receive commands or data to be used for a component of the electronic device (1001) (e.g., processor (1020)) from outside the electronic device (1001) (e.g., user). The input module (1050) may include, for example, a microphone, a mouse, a keyboard, a key (e.g., a button), or a digital pen (e.g., a stylus pen).

[0122] The sound output module (1055) can output a sound signal to the outside of the electronic device (1001). The sound output module (1055) may include, for example, a speaker or a receiver. The speaker may be used for general purposes, such as multimedia playback or recording playback. The receiver may be used to receive incoming calls. According to one embodiment, the receiver may be implemented separately from the speaker or as part thereof.

[0123] The display module (1060) can visually provide information to the outside (e.g., a user) of the electronic device (1001). The display module (1060) may include, for example, a display, a holographic device, or a projector, and a control circuit for controlling the corresponding device. According to one embodiment, the display module (1060) may include a touch sensor configured to detect a touch, or a pressure sensor configured to measure the intensity of the force generated by the touch.

[0124] The audio module (1070) can convert sound into an electrical signal or, conversely, convert an electrical signal into sound. According to one embodiment, the audio module (1070) can acquire sound through the input module (1050) or output sound through the sound output module (1055) or an external electronic device (e.g., electronic device (1002)) (e.g., speaker or headphones) connected directly or wirelessly to the electronic device (1001).

[0125] The sensor module (1076) can detect the operating state of the electronic device (1001) (e.g., power or temperature) or the external environmental state (e.g., user state) and generate an electrical signal or data value corresponding to the detected state. According to one embodiment, the sensor module (1076) may include, for example, a gesture sensor, a gyroscope sensor, a barometric pressure sensor, a magnetic sensor, an accelerometer sensor, a grip sensor, a proximity sensor, a color sensor, an IR (infrared) sensor, a biosensor, a temperature sensor, a humidity sensor, or an illuminance sensor.

[0126] The interface (1077) may support one or more specified protocols that can be used for the electronic device (1001) to be connected directly or wirelessly to an external electronic device (e.g., electronic device (1002)). According to one embodiment, the interface (1077) may include, for example, a high definition multimedia interface (HDMI), a universal serial bus (USB) interface, an SD card interface, or an audio interface.

[0127] The connection terminal (1078) may include a connector through which the electronic device (1001) can be physically connected to an external electronic device (e.g., electronic device (1002)). According to one embodiment, the connection terminal (1078) may include, for example, an HDMI connector, a USB connector, an SD card connector, or an audio connector (e.g., a headphone connector).

[0128] The haptic module (1079) can convert an electrical signal into a mechanical stimulus (e.g., vibration or movement) or an electrical stimulus that the user can perceive through tactile or kinesthetic senses. According to one embodiment, the haptic module (1079) may include, for example, a motor, a piezoelectric element, or an electric stimulation device.

[0129] The camera module (1080) can capture still images and video. According to one embodiment, the camera module (1080) may include one or more lenses, image sensors, image signal processors, or flashes.

[0130] The power management module (1088) can manage power supplied to the electronic device (1001). According to one embodiment, the power management module (1088) can be implemented, for example, as at least part of a power management integrated circuit (PMIC).

[0131] The battery (1089) can supply power to at least one component of the electronic device (1001). According to one embodiment, the battery (1089) may include, for example, a non-rechargeable primary battery, a rechargeable secondary battery, or a fuel cell.

[0132] The communication module (1090) can support the establishment of a direct (e.g., wired) communication channel or a wireless communication channel between the electronic device (1001) and an external electronic device (e.g., electronic device (1002), electronic device (1004), or server (1008)), and the performance of communication through the established communication channel. The communication module (1090) may include one or more communication processors that operate independently of the processor (1020) (e.g., application processor) and support direct (e.g., wired) communication or wireless communication. According to one embodiment, the communication module (1090) may include a wireless communication module (1092) (e.g., cellular communication module, short-range wireless communication module, or GNSS (global navigation satellite system) communication module) or a wired communication module (1094) (e.g., LAN (local area network) communication module, or power line communication module). The corresponding communication module among these communication modules can communicate with an external electronic device (1004) through a first network (1098) (e.g., a short-range communication network such as Bluetooth, WiFi (wireless fidelity) direct, or IrDA (infrared data association)) or a second network (1099) (e.g., a legacy cellular network, a 5G network, a next-generation communication network, the Internet, or a computer network (e.g., a LAN or WAN)). These various types of communication modules may be integrated into a single component (e.g., a single chip) or implemented as multiple separate components (e.g., multiple chips). The wireless communication module (1092) can identify or authenticate the electronic device (1001) within a communication network such as the first network (1098) or the second network (1099) using subscriber information (e.g., international mobile subscriber identity (IMSI)) stored in the subscriber identification module (1096).

[0133] The wireless communication module (1092) can support 5G networks and next-generation communication technologies following 4G networks, for example, new radio (NR) access technology. NR access technology can support high-speed transmission of high-capacity data (enhanced mobile broadband (eMBB)), minimization of terminal power and connection of multiple terminals (massive machine type communications (mMTC)), or high reliability and low latency (ultra-reliable and low-latency communications (URLLC)). The wireless communication module (1092) can support a high-frequency band (e.g., mmWave band) to achieve a high data transmission rate, for example. The wireless communication module (1092) can support various technologies for securing performance in the high-frequency band, such as beamforming, massive MIMO (multiple-input and multiple-output), full-dimensional MIMO (FD-MIMO), array antenna, analog beam-forming, or large-scale antenna. The wireless communication module (1092) can support various requirements specified in the electronic device (1001), external electronic device (e.g., electronic device (1004)), or network system (e.g., second network (1099)). According to one embodiment, the wireless communication module (1092) can support a peak data rate (e.g., 20 Gbps or more) for realizing eMBB, loss coverage (e.g., 164 dB or less) for realizing mMTC, or U-plane latency (e.g., downlink (DL) and uplink (UL) each 0.5 ms or less, or round trip 1 ms or less) for realizing URLLC.

[0134] The antenna module (1097) can transmit a signal or power to the outside (e.g., an external electronic device) or receive a signal or power from the outside. According to one embodiment, the antenna module (1097) may include an antenna comprising a radiator made of a conductor or a conductive pattern formed on a substrate (e.g., a PCB). According to one embodiment, the antenna module (1097) may include a plurality of antennas (e.g., an array antenna). In this case, at least one antenna suitable for a communication method used in a communication network, such as the first network (1098) or the second network (1099), may be selected from the plurality of antennas, for example, by the communication module (1090). A signal or power may be transmitted or received between the communication module (1090) and an external electronic device through the selected at least one antenna. According to some embodiments, in addition to the radiator, other components (e.g., a radio frequency integrated circuit (RFIC)) may be additionally formed as part of the antenna module (1097).

[0135] According to various embodiments, the antenna module (1097) may form a mmWave antenna module. According to one embodiment, the mmWave antenna module may include a printed circuit board, an RFIC disposed on or adjacent to a first surface (e.g., bottom surface) of the printed circuit board and capable of supporting a specified high frequency band (e.g., mmWave band), and a plurality of antennas (e.g., array antennas) disposed on or adjacent to a second surface (e.g., top surface or side surface) of the printed circuit board and capable of transmitting or receiving a signal of the specified high frequency band.

[0136] At least some of the above components can be connected to each other via a communication method between peripheral devices (e.g., bus, GPIO (general purpose input and output), SPI (serial peripheral interface), or MIPI (mobile industry processor interface)) and exchange signals (e.g., commands or data) with each other.

[0137] According to one embodiment, commands or data may be transmitted or received between the electronic device (1001) and an external electronic device (1004) through a server (1008) connected to a second network (1099). Each of the external electronic devices (1002 or 1004) may be a device of the same type as, or a different type from, the electronic device (1001). According to one embodiment, all or part of the operations performed on the electronic device (1001) may be performed on one or more of the external electronic devices (1002, 1004, or 1008). For example, if the electronic device (1001) needs to perform a function or service automatically or in response to a request from a user or another device, the electronic device (1001) may request one or more external electronic devices to perform at least part of the function or service instead of, or in addition to, performing the function or service itself. One or more external electronic devices that receive the request may execute at least part of the requested function or service, or an additional function or service related to the request, and transmit the result of the execution to the electronic device (1001). The electronic device (1001) may provide the result, as is or after additional processing, as at least part of the response to the request. For this purpose, for example, cloud computing, distributed computing, mobile edge computing (MEC), or client-server computing technology may be used. The electronic device (1001) may provide ultra-low latency services using, for example, distributed computing or mobile edge computing. In another embodiment, the external electronic device (1004) may include an Internet of Things (IoT) device. The server (1008) may be an intelligent server using machine learning and/or neural networks. According to one embodiment, an external electronic device (1004) or server (1008) may be included within the second network (1099). The electronic device (1001) may be applied to intelligent services (e.g., smart home, smart city, smart car, or healthcare) based on 5G communication technology and IoT-related technology.

[0138] Some of the operations described above may be executed (or performed) through an artificial intelligence (AI) system described with reference to FIG. 11.

[0139] FIG. 11 is a schematic diagram of an exemplary artificial intelligence (AI) system.

[0140] Referring to FIG. 11, the AI system (1100) may include an input/output interface (1110), an AI framework (1120), a generative AI model (1130), and/or a knowledge repository (1190).

[0141] The input/output interface (1110) can receive input. The input may include user input and/or data obtained or generated by an electronic device (e.g., the electronic device (100) or electronic device (1001) described above). The data may include images, videos, and/or sensor data generated by at least one processor of the electronic device (e.g., at least one processor (207) or processor (1020)) (e.g., illumination data around the electronic device obtained from a sensor or sensor hub (e.g., auxiliary processor (1023)), posture data (or orientation data) of the electronic device, temperature inside the electronic device (e.g., temperature of the display or temperature of at least one processor (207)), size information of the display area of the display, and/or images obtained through an image sensor of the electronic device (e.g., included in a camera module (1080))). The user input may include natural language, touch data obtained through a touch circuit included in the display panel (e.g., used to identify input from a finger and/or stylus), images displayed (and/or to be displayed) on the display panel, and/or videos. As an example, without limitation, the user input may be received by the input/output interface (1110) along with context information. The context information may be described as additional information obtained in relation to the user input. The context information may be related to the state at the time the user input is received (e.g., the state of the electronic device and/or the state of the surroundings of the electronic device (e.g., user state)). For example, the context information may include information about one or more software applications executed within the electronic device at the time the user input is received. For example, the context information may include information about the location of the electronic device (or the location of the user of the electronic device) when the user input is received. For example, the user input may be integrated with the context information. For example, the user input integrated with the context information may be received as the input by the input/output interface (1110).

[0142] The input/output interface (1110) may transmit (or provide) an output. The output may include a result (or result information) generated or obtained by the AI system (1100) based on at least part of the input. The format of the output may vary. For example, the output may include natural language. For example, the output may include content (e.g., media content and/or multimedia content). For example, the output may include an action related to the user of the electronic device. For example, the output may have a format according to the user settings of the electronic device.

[0143] The input/output interface (1110) may be described as a user question/response interface (1110).

[0144] The AI framework (1120) may be used to obtain information (or data) about the input from the input/output interface (1110) and to control one or more components related to the AI system (1100) using the obtained information.

[0145] For example, a prompt design component (1121) within the AI framework (1120) may generate or obtain prompts for the generative AI model (1130) (e.g., including a large language model (LLM) or a large multimodal model (LMM)) using the obtained information. For example, the prompt design component (1121) may be described as an AI component that uses a learning algorithm and/or a neural network to provide prompts that improve over time. For example, the prompt design component (1121) may generate or obtain prompts by using the obtained information to access a knowledge component (e.g., the knowledge repository (1190)) containing user preference data, a prompt library, and/or prompt examples. The generated prompts may be provided to the generative AI model (1130) (e.g., including an LLM or an LMM).

[0146] For example, an API/plugin management component (1122) within the AI framework (1120) may be used to support communication for additional information requested (or induced) in relation to the prompt provided (or to be provided) to the generative AI model (1130). For example, the API/plugin management component (1122) may be used to create or establish a channel for communication with various data sources (e.g., the knowledge repository (1190)). For example, the API/plugin management component (1122) may support access to at least some of the data sources. For example, the API/plugin management component (1122) may be used to send a request to another component (e.g., an application/service component (1180)) that provides feedback (or a response) according to the prompt. As a non-limiting example, information obtained (or generated) through the API/plugin management component (1122) may be provided to the prompt design component (1121) for generating a prompt. As a non-limiting example, information obtained (or generated) through the API/plugin management component (1122) may be provided to the generative AI model (1130).

[0147] For example, an improvement component (1123) within the AI framework (1120) may at least partially tune (or adjust or change) the result (e.g., content) obtained (or output) from the generative AI model (1130). For example, the improvement component (1123) may determine or verify whether the content obtained from the generative AI model (1130) is related to the input. For example, the improvement component (1123) may determine or verify whether the content obtained from the generative AI model (1130) contains biased content. For example, the improvement component (1123) may determine or verify whether the content obtained from the generative AI model (1130) contains harmful content. For example, the improvement component (1123) may support or assist in performing additional processing to improve the content obtained from the generative AI model (1130). For example, the improvement component (1123) may support providing a hint to the user to improve the content.
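As an illustration only, the flow from the prompt design component (1121) through the generative AI model (1130) to the improvement component (1123) might be sketched as follows. All function names, the prompt layout, the blocked-term list, and the word-overlap relevance check are hypothetical placeholders that do not come from the disclosure; `stub_model` stands in for a real LLM or LMM.

```python
def design_prompt(user_input, context, examples):
    """Assemble a prompt from the input, its context, and stored prompt examples."""
    lines = [f"Context: {key}={value}" for key, value in sorted(context.items())]
    lines += [f"Example: {example}" for example in examples]
    lines.append(f"Request: {user_input}")
    return "\n".join(lines)


def review_output(output, user_input, blocked_terms=()):
    """Return (accepted, reason). The checks are crude placeholders."""
    words = set(output.lower().split())
    for term in blocked_terms:
        # Placeholder for a harmful or biased content check.
        if term.lower() in words:
            return False, f"blocked term: {term}"
    # Placeholder relevance check: the output shares at least one word with the input.
    if not words & set(user_input.lower().split()):
        return False, "unrelated to input"
    return True, "ok"


def run_pipeline(user_input, context, examples, model, blocked_terms=()):
    """Design a prompt, call the model, and keep the output only if it passes review."""
    prompt = design_prompt(user_input, context, examples)
    output = model(prompt)
    accepted, _reason = review_output(output, user_input, blocked_terms)
    return output if accepted else None


def stub_model(prompt):
    # Stand-in for a generative model; a real LLM or LMM call would go here.
    return "summer beach collection"


print(run_pipeline("beach photos", {"app": "gallery"}, [], stub_model))
# prints: summer beach collection
```

In this sketch a rejected output is simply dropped; an actual improvement component could instead request regeneration or offer the user a hint, as the paragraph above suggests.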

[0148] The generative AI model (1130) may be described as an artificial intelligence neural network that generates feedback in response to a prompt. For example, the feedback may include data and/or information that is additional to, but related to, the prompt. For example, the feedback may include new content relative to the prompt. For example, the generative AI model (1130) may include a model that generates images and/or a model that generates language. For example, the model that generates images may include a generative adversarial network (GAN) and/or a variational autoencoder (VAE). For example, the model that generates images may include a diffusion-based generative model (e.g., a transformer VAE). For example, the model that generates language may include CHAT-GPT 3 and/or CHAT-GPT 4. For example, the generative AI model (1130) may include an LMM that generates the feedback by recognizing text, images, and/or speech.

[0149] As a non-limiting example, the AI framework (1120) and/or the generative AI model (1130) may be included within an AI module (e.g., including a processing circuit) within the electronic device. For example, the AI module may be operatively coupled with at least one processor of the electronic device (e.g., the at least one processor (207) or the processor (1020)). For example, the AI module may be operatively coupled with a display driving circuit of the electronic device (e.g., a display driving circuit or a DDI). For example, the AI module may be operatively coupled with a sensor hub of the electronic device for one or more sensors within the electronic device.

[0150]

[0151] The technical problems to be solved in this disclosure are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood by those skilled in the art to which this disclosure pertains.

[0152]

[0153] An electronic device as described above (e.g., the electronic device (100)) may include a memory (e.g., the memory (206)) for storing instructions. The electronic device may include at least one processor (e.g., the at least one processor (207)). The instructions, when executed individually or collectively by the at least one processor, may cause the electronic device to: receive a user input for generating a media collection including media contents; identify, from among the media contents stored in the memory, first media contents corresponding to a first keyword included in the user input; identify, based on identifying that the number of the first media contents is smaller than a reference number, a second keyword from among keywords assigned to the first media contents; identify one or more second media contents corresponding to the second keyword; and generate the media collection using the first media contents and the one or more second media contents.
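The operations in the paragraph above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the `Media` type, the tag data, and the rule for choosing the second keyword (the most frequent co-assigned keyword) are all hypothetical, since the disclosure only says that the second keyword is identified among the keywords assigned to the first media contents.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Media:
    name: str
    keywords: frozenset  # keywords assigned to this item (hypothetical tagging)


def generate_collection(library, first_keyword, reference_number):
    """Return media for the collection, widening the search when matches are scarce."""
    first = [m for m in library if first_keyword in m.keywords]
    if len(first) >= reference_number:
        return first

    # Too few matches: look for a second keyword among those assigned to the matches.
    counts = Counter(k for m in first for k in m.keywords if k != first_keyword)
    if not counts:
        return first
    # Assumption: pick the most frequent co-assigned keyword (ties broken alphabetically).
    second_keyword = min(counts, key=lambda k: (-counts[k], k))

    second = [m for m in library if second_keyword in m.keywords and m not in first]
    return first + second


library = [
    Media("a.jpg", frozenset({"beach", "summer"})),
    Media("b.jpg", frozenset({"beach", "summer", "family"})),
    Media("c.jpg", frozenset({"summer"})),
    Media("d.jpg", frozenset({"city"})),
]
collection = generate_collection(library, "beach", reference_number=3)
print([m.name for m in collection])
# prints: ['a.jpg', 'b.jpg', 'c.jpg']
```

Here only two items carry "beach", which is below the reference number of 3, so the sketch adds the item tagged with the co-assigned keyword "summer".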

[0154] According to one embodiment, the first keyword may correspond to a value representing the user input. The second keyword may be one of the remaining keywords, other than the first keyword, among the keywords assigned for representing media contents. The instructions, when executed individually or collectively by the at least one processor, may cause the electronic device to identify, based on identifying the second keyword, the number of the one or more second media contents associated with a value corresponding to the second keyword. The instructions, when executed individually or collectively by the at least one processor, may cause the electronic device to generate the media collection including the one or more second media contents based on a determination that the number of the one or more second media contents is within a reference range.

[0155] According to one embodiment, the instructions, when executed individually or collectively by the at least one processor, may cause the electronic device to identify a third keyword from among the remaining keywords based on a determination that the number of the one or more second media contents is outside the reference range. The instructions, when executed individually or collectively by the at least one processor, may cause the electronic device to identify, based on identifying the third keyword, the number of one or more third media contents associated with other values corresponding to the third keyword. The instructions, when executed individually or collectively by the at least one processor, may cause the electronic device to generate the media collection including the one or more third media contents based on a determination that the number of the one or more third media contents is within the reference range.
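The fallback from the second keyword to a third keyword can be read as a loop over the remaining keywords that stops at the first one whose match count falls within the reference range. The sketch below assumes inclusive bounds `low` and `high` and a fixed candidate order; neither detail is specified in the disclosure, and the function name and tag data are hypothetical.

```python
def select_by_reference_range(library, remaining_keywords, low, high):
    """Return the first keyword whose match count lies in [low, high], with its matches."""
    for keyword in remaining_keywords:
        matches = [name for name, tags in library.items() if keyword in tags]
        if low <= len(matches) <= high:
            return keyword, matches
    # No candidate keyword produced a count within the reference range.
    return None, []


library = {
    "a.jpg": {"beach", "summer"},
    "b.jpg": {"beach", "summer", "family"},
    "c.jpg": {"summer"},
    "d.jpg": {"city"},
}
keyword, matches = select_by_reference_range(library, ["summer", "family"], low=1, high=2)
print(keyword, matches)
# prints: family ['b.jpg']
```

"summer" matches three items, which is outside the range of 1 to 2, so the loop moves on to "family", which matches one.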

[0156] According to one embodiment, the instructions, when executed individually or collectively by the at least one processor, may cause the electronic device to obtain, based on the user input, filtering information for identifying the first media contents, the filtering information representing at least one value corresponding to at least one of the keywords assigned for representing media contents. The instructions, when executed individually or collectively by the at least one processor, may cause the electronic device to identify, using the filtering information, the first keyword corresponding to the user input from among the assigned keywords.

[0157] According to one embodiment, the filtering information can be obtained by combining first text data obtained through the user input representing text based on natural language, and second text data obtained through another user input representing other text based on natural language.
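One way to picture combining text data from two user inputs into a single set of filtering information is a per-keyword union of the requested values. The dictionary layout, the keyword names ("location", "season"), and the union rule are assumptions made for this sketch; the disclosure does not state how the two pieces of text data are combined.

```python
def combine_filters(first, second):
    """Merge two keyword -> allowed-values mappings by taking the union per keyword."""
    combined = {keyword: set(values) for keyword, values in first.items()}
    for keyword, values in second.items():
        combined.setdefault(keyword, set()).update(values)
    return combined


def matches(media_tags, filters):
    """True if the item's value for every filtered keyword is among the allowed values."""
    return all(media_tags.get(keyword) in allowed for keyword, allowed in filters.items())


# Hypothetical text data extracted from two successive natural-language inputs.
first_text_data = {"location": {"Jeju"}}
second_text_data = {"location": {"Busan"}, "season": {"summer"}}

filters = combine_filters(first_text_data, second_text_data)
print(matches({"location": "Jeju", "season": "summer"}, filters))   # prints: True
print(matches({"location": "Seoul", "season": "summer"}, filters))  # prints: False
```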

[0158] According to one embodiment, the instructions may cause the electronic device to obtain text data representing text corresponding to at least one keyword among the assigned keywords, using the user input representing text based on natural language when executed individually or collectively by the at least one processor. The instructions may cause the electronic device to obtain filtering information for searching a database of media content using the text data when executed individually or collectively by the at least one processor.

[0159] According to one embodiment, the text data can be obtained through a language model trained to output at least one word corresponding to at least one of the assigned keywords using the text.

[0160] According to one embodiment, the instructions, when executed individually or collectively by the at least one processor, may cause the electronic device to obtain, based on the user input representing an image, a first embedding vector representing a first value corresponding to the first keyword. The instructions, when executed individually or collectively by the at least one processor, may cause the electronic device to obtain similarities between the first embedding vector and second embedding vectors, the second embedding vectors respectively corresponding to the media contents stored in the memory and representing second values corresponding to the first keyword. The instructions, when executed individually or collectively by the at least one processor, may cause the electronic device to determine, as the first media contents, the media contents whose similarity, among the similarities, exceeds a threshold similarity.
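The embedding comparison described above might look like the following sketch. Cosine similarity is an assumption here, since the disclosure speaks only of "similarities"; the two-dimensional vectors and file names are made-up illustration data, and a real system would obtain the embeddings from an image encoder.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_similar(query_vector, stored_vectors, threshold):
    """Keep the stored items whose similarity to the query exceeds the threshold."""
    return [
        name
        for name, vector in stored_vectors.items()
        if cosine_similarity(query_vector, vector) > threshold
    ]


# Hypothetical embeddings: one for the input image, one per stored media item.
query = [1.0, 0.0]
stored = {"x.jpg": [1.0, 0.0], "y.jpg": [0.0, 1.0], "z.jpg": [1.0, 1.0]}

print(select_similar(query, stored, threshold=0.5))
# prints: ['x.jpg', 'z.jpg']
```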

[0161] According to one embodiment, the instructions may cause the electronic device to determine an order for arranging media contents to be included in the media collection based on the user input when executed individually or collectively by the at least one processor. The instructions may cause the electronic device to generate the media collection including the media contents arranged based on the order, using the first media contents and the one or more second media contents when executed individually or collectively by the at least one processor.
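As a simple illustration of arranging the collected media contents in an order derived from the user input, the sketch below sorts items by capture date. The `order` values, the `captured` field, and the idea that the order is expressed as a chronological direction are all assumptions for this example; the disclosure does not specify how the order is represented.

```python
from datetime import date


def arrange(items, order="oldest_first"):
    """Sort media items by capture date in the direction requested by the user."""
    if order not in ("oldest_first", "newest_first"):
        raise ValueError(f"unsupported order: {order}")
    return sorted(items, key=lambda item: item["captured"], reverse=(order == "newest_first"))


items = [
    {"name": "b.jpg", "captured": date(2024, 8, 2)},
    {"name": "a.jpg", "captured": date(2024, 7, 30)},
    {"name": "c.mp4", "captured": date(2024, 8, 5)},
]
print([item["name"] for item in arrange(items, "newest_first")])
# prints: ['c.mp4', 'b.jpg', 'a.jpg']
```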

[0162] According to one embodiment, the user input may represent at least some of text, images, videos, and audio.

[0163] According to one embodiment, the instructions, when executed individually or collectively by the at least one processor, may cause the electronic device to identify, from among the media contents stored in the memory, third media contents corresponding to both a first value and a second value for the first keyword. The instructions, when executed individually or collectively by the at least one processor, may cause the electronic device to identify, from among the media contents stored in the memory, fourth media contents corresponding to the first value or the second value, based on identifying that the number of the third media contents is smaller than another reference number. The instructions, when executed individually or collectively by the at least one processor, may cause the electronic device to determine the fourth media contents as the first media contents based on identifying that the number of the fourth media contents is smaller than the reference number and larger than the other reference number.
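The relaxation from "both values" to "either value" can be sketched as below. The disclosure states only what happens when the strict match count is below the other reference number and the relaxed count falls between the two reference numbers; what this sketch returns in the remaining cases (the strict matches) is an assumption, as are the function name and sample data.

```python
def identify_first_media(library, value_a, value_b, reference_number, other_reference_number):
    """Match on both values first; relax to either value when too few items match."""
    both = [name for name, values in library.items() if value_a in values and value_b in values]
    if len(both) >= other_reference_number:
        return both  # assumption: enough strict matches, so no relaxation is needed

    either = [name for name, values in library.items() if value_a in values or value_b in values]
    if other_reference_number < len(either) < reference_number:
        return either  # the relaxed matches become the first media contents
    return both  # assumption: the disclosure does not cover this case


library = {
    "p.jpg": {"dog", "park"},
    "q.jpg": {"dog"},
    "r.jpg": {"park"},
    "s.jpg": {"cat"},
}
print(identify_first_media(library, "dog", "park", reference_number=5, other_reference_number=2))
# prints: ['p.jpg', 'q.jpg', 'r.jpg']
```

Because the relaxed result is still below the reference number, it would then be a candidate for the second-keyword expansion described earlier in this section.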

[0164]

[0165] A method performed by an electronic device (e.g., the electronic device (100)) having a memory (e.g., the memory (206)) as described above may include an operation of receiving a user input for generating a media collection including media contents. The method may include an operation of identifying, from among the media contents stored in the memory, first media contents corresponding to a first keyword included in the user input. The method may include an operation of identifying, based on identifying that the number of the first media contents is smaller than a reference number, a second keyword from among keywords assigned to the first media contents. The method may include an operation of identifying one or more second media contents corresponding to the second keyword. The method may include an operation of generating the media collection using the first media contents and the one or more second media contents.

[0166] According to one embodiment, the first keyword may correspond to a value representing the user input. The second keyword may be one of the remaining keywords, other than the first keyword, among the keywords assigned for representing media contents. The method may include an operation of identifying, based on identifying the second keyword, the number of the one or more second media contents associated with a value corresponding to the second keyword. The method may include an operation of generating the media collection including the one or more second media contents based on a determination that the number of the one or more second media contents is within a reference range.

[0167] According to one embodiment, the method may include an operation of identifying a third keyword from among the remaining keywords based on a determination that the number of the one or more second media contents is outside the reference range. The method may include an operation of identifying, based on identifying the third keyword, the number of one or more third media contents associated with other values corresponding to the third keyword. The method may include an operation of generating the media collection including the one or more third media contents based on a determination that the number of the one or more third media contents is within the reference range.

[0168] According to one embodiment, the method may include an operation of obtaining, based on the user input, filtering information for identifying the first media contents, the filtering information representing at least one value corresponding to at least one of the keywords assigned for representing media contents. The method may include an operation of identifying, using the filtering information, the first keyword corresponding to the user input from among the assigned keywords.

[0169] According to one embodiment, the filtering information can be obtained by combining first text data obtained through the user input representing text based on natural language, and second text data obtained through another user input representing other text based on natural language.

[0170] According to one embodiment, the method may include an operation of obtaining text data representing text corresponding to at least one keyword among the assigned keywords, using the user input representing text based on natural language. The method may include an operation of obtaining filtering information for searching a database of media content using the text data.

[0171] According to one embodiment, the text data can be obtained through a language model trained to output at least one word corresponding to at least one of the assigned keywords using the text.

[0172] According to one embodiment, the method may include an operation of obtaining, based on the user input representing an image, a first embedding vector representing a first value corresponding to the first keyword. The method may include an operation of obtaining similarities between the first embedding vector and second embedding vectors, the second embedding vectors respectively corresponding to the media contents stored in the memory and representing second values corresponding to the first keyword. The method may include an operation of determining, as the first media contents, the media contents whose similarity, among the similarities, exceeds a threshold similarity.

[0173] According to one embodiment, the method may include an operation of determining an order for arranging media contents to be included in the media collection based on the user input. The method may include an operation of generating the media collection including the media contents arranged based on the order using the first media contents and the one or more second media contents.

[0174] According to one embodiment, the user input may represent at least some of text, images, videos, and audio.

[0175] According to one embodiment, the method may include an operation of identifying, from among the media contents stored in the memory, third media contents corresponding to both a first value and a second value for the first keyword. The method may include an operation of identifying, from among the media contents stored in the memory, fourth media contents corresponding to the first value or the second value, based on identifying that the number of the third media contents is smaller than another reference number. The method may include an operation of determining the fourth media contents as the first media contents based on identifying that the number of the fourth media contents is smaller than the reference number and larger than the other reference number.

[0176]

[0177] In a computer-readable storage medium storing one or more programs as described above, the one or more programs may include instructions that, when executed by an electronic device (e.g., the electronic device (100)) having a memory (e.g., the memory (206)), cause the electronic device to receive a user input for generating a media collection including media contents. The one or more programs may include instructions that, when executed by the electronic device, cause the electronic device to identify, from among the media contents stored in the memory, first media contents corresponding to a first keyword included in the user input. The one or more programs may include instructions that, when executed by the electronic device, cause the electronic device to identify, based on identifying that the number of the first media contents is smaller than a reference number, a second keyword from among keywords assigned to the first media contents. The one or more programs may include instructions that, when executed by the electronic device, cause the electronic device to identify one or more second media contents corresponding to the second keyword.

[0178] The one or more programs may include instructions that, when executed by the electronic device, cause the electronic device to generate the media collection using the first media contents and the one or more second media contents.

[0179] According to one embodiment, the first keyword may correspond to a value representing the user input. The second keyword may be one of the remaining keywords, other than the first keyword, among the keywords assigned for representing media contents. The one or more programs may include instructions that, when executed by the electronic device, cause the electronic device to identify, based on identifying the second keyword, the number of the one or more second media contents associated with a value corresponding to the second keyword. The one or more programs may include instructions that, when executed by the electronic device, cause the electronic device to generate the media collection including the one or more second media contents based on a determination that the number of the one or more second media contents is within a reference range.

[0180] According to one embodiment, the one or more programs may include instructions that, when executed by the electronic device, cause the electronic device to identify a third keyword from among the remaining keywords based on a determination that the number of the one or more second media contents is outside the reference range. The one or more programs may include instructions that, when executed by the electronic device, cause the electronic device to identify, based on identifying the third keyword, the number of one or more third media contents associated with other values corresponding to the third keyword. The one or more programs may include instructions that, when executed by the electronic device, cause the electronic device to generate the media collection including the one or more third media contents based on a determination that the number of the one or more third media contents is within the reference range.

[0181] According to one embodiment, the one or more programs may include instructions that, when executed by the electronic device, cause the electronic device to obtain, based on the user input, filtering information for identifying the first media contents, the filtering information representing at least one value corresponding to at least one of the keywords assigned for representing media contents. The one or more programs may include instructions that, when executed by the electronic device, cause the electronic device to identify, using the filtering information, the first keyword corresponding to the user input from among the assigned keywords.

[0182] According to one embodiment, the filtering information can be obtained by combining first text data obtained through the user input representing text based on natural language, and second text data obtained through another user input representing other text based on natural language.

[0183] According to one embodiment, the one or more programs may include instructions that cause the electronic device to obtain text data representing text corresponding to at least one of the assigned keywords, using the user input representing text based on natural language when executed by the electronic device. The one or more programs may include instructions that cause the electronic device to obtain filtering information for searching a database of media content using the text data when executed by the electronic device.

[0184] According to one embodiment, the text data can be obtained through a language model trained to output at least one word corresponding to at least one of the assigned keywords using the text.

[0185] According to one embodiment, the one or more programs may include instructions that, when executed by the electronic device, cause the electronic device to obtain, based on the user input representing an image, a first embedding vector representing a first value corresponding to the first keyword. The one or more programs may include instructions that, when executed by the electronic device, cause the electronic device to obtain similarities between the first embedding vector and second embedding vectors, the second embedding vectors respectively corresponding to the media contents stored in the memory and representing second values corresponding to the first keyword. The one or more programs may include instructions that, when executed by the electronic device, cause the electronic device to determine, as the first media contents, the media contents whose similarity, among the similarities, exceeds a threshold similarity.

[0186] According to one embodiment, the one or more programs may include instructions that cause the electronic device to determine an order for arranging media contents to be included in the media collection based on the user input when executed by the electronic device. The one or more programs may include instructions that cause the electronic device to generate the media collection including the media contents arranged based on the order using the first media contents and the one or more second media contents when executed by the electronic device.

[0187] According to one embodiment, the user input may represent at least some of text, images, videos, and audio.

[0188] According to one embodiment, the one or more programs may include instructions that, when executed by the electronic device, cause the electronic device to identify, from among the media contents stored in the memory, third media contents corresponding to both a first value and a second value for the first keyword. The one or more programs may include instructions that, when executed by the electronic device, cause the electronic device to identify, from among the media contents stored in the memory, fourth media contents corresponding to the first value or the second value, based on identifying that the number of the third media contents is smaller than another reference number. The one or more programs may include instructions that, when executed by the electronic device, cause the electronic device to determine the fourth media contents as the first media contents based on identifying that the number of the fourth media contents is smaller than the reference number and larger than the other reference number.

[0189] The effects obtainable from the present disclosure are not limited to those mentioned above, and other unmentioned effects will be clearly understood by those skilled in the art to which the present disclosure belongs.

[0190] The device described above may be implemented as a hardware component, a software component, and/or a combination of a hardware component and a software component. For example, the device and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing unit may execute an operating system (OS) and one or more software applications executed on the operating system. Additionally, the processing unit may access, store, manipulate, process, and generate data in response to the execution of the software. For ease of understanding, the processing unit may be described as being used as a single unit, but those skilled in the art will understand that the processing unit may include multiple processing elements and/or multiple types of processing elements. For example, the processing unit may include multiple processors or one processor and one controller. In addition, other processing configurations, such as parallel processors, are also possible.

[0191] Software may include computer programs, code, instructions, or a combination of one or more of these, and may configure a processing unit to operate as desired or instruct the processing unit independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, computer storage medium, or device so as to be interpreted by the processing unit or to provide instructions or data to the processing unit. Software may be distributed over networked computer systems and may be stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

[0192] The method according to the embodiment may be implemented in the form of program instructions that can be executed through various computer means and recorded on a computer-readable medium. In this case, the medium may continuously store a computer-executable program, or temporarily store it for execution or download. Additionally, the medium may be any of various recording or storage means in the form of a single piece of hardware or a combination of several pieces of hardware, and is not limited to a medium directly connected to a computer system but may be distributed over a network. Examples of media may include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and media configured to store program instructions, including ROM, RAM, and flash memory. Additionally, other examples of media may include recording or storage media managed by app stores that distribute applications, or by sites and servers that supply or distribute various other software.

[0193] Although the embodiments have been described above with reference to limited examples and drawings, those skilled in the art can make various modifications and variations from the description above. For example, suitable results may be achieved even if the described techniques are performed in a different order than described, and/or the components of the described system, structure, device, circuit, etc. are combined or assembled in a form different from that described, or replaced or substituted by other components or equivalents.

[0194] Therefore, other implementations, other embodiments, and equivalents to the claims set forth below are also within the scope of the claims. According to one embodiment, the method according to the various embodiments disclosed herein may be provided as a computer program product. The computer program product may be traded between a seller and a buyer as a product. The computer program product may be distributed in the form of a device-readable storage medium (e.g., compact disc read-only memory (CD-ROM)), or distributed online (e.g., download or upload) through an application store (e.g., Play Store™) or directly between two user devices (e.g., smartphones). In the case of online distribution, at least a portion of the computer program product may be temporarily stored or temporarily created in a device-readable storage medium, such as the memory of a manufacturer's server, an application store's server, or a relay server.

[0195] According to various embodiments, each component (e.g., a module or a program) of the components described above may include a single entity or multiple entities, and some of the multiple entities may be separately placed in other components. According to various embodiments, one or more of the aforementioned components or operations may be omitted, or one or more other components or operations may be added. Alternatively or additionally, multiple components (e.g., modules or programs) may be integrated into a single component. In this case, the integrated component may perform one or more functions of each of the multiple components in the same or similar manner as they were performed by the corresponding component among the multiple components prior to integration. According to various embodiments, operations performed by the module, program, or other components may be executed sequentially, in parallel, iteratively, or heuristically, or one or more of the operations may be executed in a different order or omitted, or one or more other operations may be added.

[0196] Although the present disclosure has been illustrated and described with reference to various embodiments, those skilled in the art will understand that various modifications of form and detail are possible without departing from the spirit and scope of the present disclosure as defined by the appended claims and their equivalents.

Claims

1. An electronic device comprising: a memory comprising one or more storage media storing instructions; and at least one processor comprising processing circuitry, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: receive a user input for generating a media collection including media contents; identify, from among media contents stored in the memory, first media contents corresponding to a first keyword included in the user input; based on identifying that the number of the first media contents is smaller than a reference number, identify a second keyword from among keywords assigned to the first media contents; identify one or more second media contents corresponding to the second keyword; and generate the media collection by using the first media contents and the one or more second media contents.
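
The flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the `library` layout (media id mapped to a set of assigned keywords), the function name `build_collection`, and the rule of choosing the most frequently co-assigned keyword as the second keyword are all assumptions made for illustration; the claim does not state how the second keyword is chosen.

```python
from collections import Counter


def build_collection(library, first_keyword, reference_number):
    """Sketch of the claim 1 flow.

    library: dict mapping a media id to the set of keywords assigned to it
             (hypothetical layout, not taken from the disclosure).
    """
    # First media contents: items to which the first keyword is assigned.
    first = [m for m, kws in library.items() if first_keyword in kws]
    if len(first) >= reference_number:
        return first

    # Fewer matches than the reference number: look for a second keyword
    # among the keywords assigned to the first media contents.
    counts = Counter(
        k for m in first for k in library[m] if k != first_keyword
    )
    if not counts:
        return first

    # Assumption: take the most frequent co-assigned keyword, breaking
    # ties alphabetically so the result is deterministic.
    second_keyword = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))[0]

    # Second media contents: other items carrying the second keyword.
    second = [
        m for m, kws in library.items()
        if second_keyword in kws and m not in first
    ]
    return first + second


library = {
    "a": {"beach", "sunset"},
    "b": {"beach", "sunset", "dog"},
    "c": {"sunset"},
    "d": {"city"},
}
collection = build_collection(library, "beach", reference_number=3)
```

In this example only two items carry "beach", which is below the reference number of 3, so the sketch expands the collection with the item that shares the co-assigned keyword "sunset".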

2. The electronic device of claim 1, wherein the first keyword corresponds to a value representing the user input, wherein the second keyword is included in remaining keywords, other than the first keyword, among assigned keywords for representing media contents, and wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: based on identifying the second keyword, identify the number of the one or more second media contents associated with a value corresponding to the second keyword; and based on determining that the number of the one or more second media contents is within a reference range, generate the media collection including the one or more second media contents.

3. The electronic device of claim 2, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: based on determining that the number of the one or more second media contents is outside the reference range, identify a third keyword from among the remaining keywords; based on identifying the third keyword, identify the number of one or more third media contents associated with another value corresponding to the third keyword; and based on determining that the number of the one or more third media contents is within the reference range, generate the media collection including the one or more third media contents.
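
Claims 2 and 3 together describe trying one remaining keyword after another until the matching item count falls within the reference range. A minimal Python sketch of that loop follows. The data layout (media id mapped to a keyword-to-value dictionary), the name `select_by_reference_range`, the inclusive range bounds, and the idea of passing candidates as an ordered list are assumptions for illustration only.

```python
def select_by_reference_range(library, candidates, reference_range):
    """Try candidate (keyword, value) pairs in order until one yields a
    number of matching media contents within the reference range.

    library: dict mapping a media id to a dict of keyword -> value
             (hypothetical layout).
    candidates: ordered list of (keyword, value) pairs drawn from the
                remaining keywords (ordering is an assumption).
    reference_range: (low, high), treated here as inclusive.
    """
    low, high = reference_range
    for keyword, value in candidates:
        matches = [
            m for m, attrs in library.items() if attrs.get(keyword) == value
        ]
        if low <= len(matches) <= high:
            return keyword, matches
    # No candidate produced a count within the range.
    return None, []


library = {
    "p1": {"place": "Seoul", "person": "Mina"},
    "p2": {"place": "Seoul", "person": "Mina"},
    "p3": {"place": "Seoul", "person": "Joon"},
    "p4": {"place": "Busan", "person": "Mina"},
}
candidates = [("place", "Seoul"), ("person", "Joon"), ("person", "Mina")]
chosen = select_by_reference_range(library, candidates, (1, 2))
```

Here the second keyword ("place" with value "Seoul") matches three items, which is outside the range of 1 to 2, so the sketch moves on to a third keyword ("person" with value "Joon"), which matches one item.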

4. The electronic device of claim 1, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: based on the user input, obtain filtering information that represents at least one value corresponding to at least one of assigned keywords for representing media contents and that is used for identifying the first media contents; and using the filtering information, identify the first keyword corresponding to the user input from among the assigned keywords.

5. The electronic device of claim 4, wherein the filtering information is obtained by combining first text data, obtained through the user input representing text based on natural language, with second text data, obtained through another user input representing another text based on natural language.

6. The electronic device of claim 4, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: using the user input representing text based on natural language, obtain text data representing text corresponding to at least one keyword among the assigned keywords; and using the text data, obtain the filtering information for searching a database of the media contents.

7. The electronic device of claim 6, wherein the text data is obtained through a language model trained to output, using text, at least one word corresponding to at least one of the assigned keywords.

8. The electronic device of claim 1, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: based on the user input representing an image, obtain a first embedding vector representing a first value corresponding to the first keyword; obtain similarities between the first embedding vector and second embedding vectors, the second embedding vectors respectively corresponding to the media contents stored in the memory and representing second values corresponding to the first keyword; and determine, as the first media contents, media contents corresponding to similarities exceeding a threshold similarity among the similarities.
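
The threshold comparison in claim 8 can be sketched in a few lines of Python. Cosine similarity is used here as an assumption, since the claim refers only to "similarities" without naming a metric; the function names and the toy two-dimensional vectors are likewise hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_similar(first_vector, second_vectors, threshold):
    """Return ids of media contents whose similarity to the query exceeds
    the threshold.

    first_vector: embedding obtained from the user's image input.
    second_vectors: dict mapping a media id to its stored embedding.
    """
    return [
        media_id
        for media_id, vector in second_vectors.items()
        if cosine_similarity(first_vector, vector) > threshold
    ]


stored = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
matches = select_similar([1.0, 0.0], stored, threshold=0.5)
```

Raising the threshold narrows the result set, and lowering it widens the set, which is the lever an implementation could use when the first media contents are too few or too many.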

9. The electronic device of claim 1, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: based on the user input, determine an order for arranging media contents to be included in the media collection; and using the first media contents and the one or more second media contents, generate the media collection including the media contents arranged based on the order.

10. The electronic device of claim 1, wherein the user input represents at least one of text, an image, a video, or audio.

11. The electronic device of claim 1, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: based on the user input, identify, from among the media contents stored in the memory, third media contents corresponding to both a first value and a second value for the first keyword; based on identifying that the number of the third media contents is smaller than another reference number, identify, from among the media contents stored in the memory, fourth media contents corresponding to the first value or the second value; and based on identifying that the number of the fourth media contents is smaller than the reference number and larger than the other reference number, determine the fourth media contents to be the first media contents.
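
Claim 11 relaxes a conjunctive match (first value AND second value) to a disjunctive one (first value OR second value) when the conjunctive match returns too few items. The Python sketch below illustrates that relaxation under stated assumptions: each media item carries a set of values for the first keyword, the function name is hypothetical, and the behavior when neither condition of the claim is met (returning the conjunctive result) is a choice made here, not something the claim specifies.

```python
def relax_and_to_or(values_by_media, first_value, second_value,
                    reference_number, other_reference_number):
    """Sketch of the AND-to-OR relaxation in claim 11.

    values_by_media: dict mapping a media id to the set of values it holds
                     for the first keyword (hypothetical layout).
    """
    # Third media contents: items matching both values.
    both = [
        m for m, vals in values_by_media.items()
        if first_value in vals and second_value in vals
    ]
    if len(both) >= other_reference_number:
        return both

    # Fourth media contents: items matching either value.
    either = [
        m for m, vals in values_by_media.items()
        if first_value in vals or second_value in vals
    ]
    if other_reference_number < len(either) < reference_number:
        return either

    # Assumption: fall back to the conjunctive result otherwise.
    return both


values = {
    "a": {"Mina", "Joon"},
    "b": {"Mina"},
    "c": {"Joon"},
    "d": {"Sora"},
}
first_media = relax_and_to_or(values, "Mina", "Joon",
                              reference_number=5, other_reference_number=2)
```

With these inputs only one item matches both values, which is below the other reference number of 2, while three items match either value, which lies between 2 and 5, so the disjunctive result becomes the first media contents.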

12. A non-transitory computer-readable storage medium storing one or more programs, wherein the one or more programs comprise instructions that, when executed by an electronic device having a memory, cause the electronic device to: receive a user input for generating a media collection including media contents; identify, from among media contents stored in the memory, first media contents corresponding to a first keyword included in the user input; based on identifying that the number of the first media contents is smaller than a reference number, identify a second keyword from among keywords assigned to the first media contents; identify one or more second media contents corresponding to the second keyword; and generate the media collection by using the first media contents and the one or more second media contents.

13. The non-transitory computer-readable storage medium of claim 12, wherein the first keyword corresponds to a value representing the user input, wherein the second keyword is included in remaining keywords, other than the first keyword, among assigned keywords for representing media contents, and wherein the one or more programs comprise instructions that, when executed by the electronic device, cause the electronic device to: based on identifying the second keyword, identify the number of the one or more second media contents associated with a value corresponding to the second keyword; and based on determining that the number of the one or more second media contents is within a reference range, generate the media collection including the one or more second media contents.

14. The non-transitory computer-readable storage medium of claim 13, wherein the one or more programs comprise instructions that, when executed by the electronic device, cause the electronic device to: based on determining that the number of the one or more second media contents is outside the reference range, identify a third keyword from among the remaining keywords; based on identifying the third keyword, identify the number of one or more third media contents associated with another value corresponding to the third keyword; and based on determining that the number of the one or more third media contents is within the reference range, generate the media collection including the one or more third media contents.

15. A method executed in an electronic device having a memory, the method comprising: receiving a user input for generating a media collection including media contents; identifying, from among media contents stored in the memory, first media contents corresponding to a first keyword included in the user input; based on identifying that the number of the first media contents is smaller than a reference number, identifying a second keyword from among keywords assigned to the first media contents; identifying one or more second media contents corresponding to the second keyword; and generating the media collection by using the first media contents and the one or more second media contents.