Electronic device and method for processing natural language input related to video, and non-transitory computer-readable storage medium

A hybrid AI architecture in electronic devices processes natural language inputs by using local databases and models to generate responses, addressing power and privacy concerns, and ensuring efficient and timely interactions.

WO2026134679A1 · PCT designated stage · Publication Date: 2026-06-25 · SAMSUNG ELECTRONICS CO LTD

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
SAMSUNG ELECTRONICS CO LTD
Filing Date
2025-11-11
Publication Date
2026-06-25

AI Technical Summary

Technical Problem

Existing electronic devices face challenges in efficiently processing natural language inputs related to video content due to increased power consumption and privacy concerns when communicating with external servers for response generation, leading to delays and potential data leakage.

Method used

Implementing a hybrid artificial intelligence architecture within the device that includes a local database and a local language model to process user inputs, allowing for on-device response generation and reducing the need for external server communication.

Benefits of technology

This approach reduces power consumption, minimizes communication delays, and mitigates privacy issues by enabling efficient and immediate response generation directly on the device, while still leveraging high-performance models when necessary.



Abstract

The present disclosure relates to an artificial intelligence (AI) system utilizing a machine learning algorithm, and an application thereof. An electronic device according to an embodiment may control a display to display a video. If information corresponding to a user input related to the video is obtained while the video is displayed, the electronic device may obtain, from a database, first response information corresponding to the user input on the basis of the obtained information corresponding to the user input. If information corresponding to the user input based on the video is not obtained from the database, the electronic device may transmit a request for second response information with respect to the user input, to an external electronic device through a communication circuit.

Description

Electronic device, method, and non-transient computer-readable storage medium for processing natural language input related to video

[0001] The present disclosure relates to an electronic device, a method, and a non-transient computer-readable storage medium for processing natural language input related to video.

[0002] An artificial intelligence (AI) system is a computer system that implements human-level intelligence, in which the machine learns and makes judgments autonomously, and its recognition rate improves with use. AI technology may include machine learning (deep learning) technology that utilizes algorithms to classify and learn the characteristics of input data autonomously, and elemental technologies that utilize machine learning algorithms to mimic functions such as cognition and judgment of the human brain. The aforementioned elemental technologies may include, for example, at least one of linguistic understanding technology that recognizes human language / characters, visual understanding technology that perceives objects like human vision, reasoning / prediction technology that judges information to logically infer and predict, knowledge representation technology that processes human experience information into knowledge data, and motion control technology that controls autonomous driving of vehicles and the movement of robots.

[0003] Electronic devices are becoming more sophisticated due to advancements in electronic technology. To provide clearer images, there is an increasing demand for larger-sized electronic devices. To support a variety of functions, the number and complexity of electronic components included in devices are increasing. In addition to functions for playing images and / or videos, various features for user interaction are being added to electronic devices.

[0004] The information described above may be provided as related art for the purpose of aiding understanding of the present disclosure. No claim or determination is made as to whether any of the foregoing may be applied as prior art related to the present disclosure.

[0005] According to one embodiment, an electronic device may include a display, a communication circuit, a memory comprising one or more storage media for storing instructions, and at least one processor comprising a processing circuit. When executed individually or collectively by the at least one processor, the instructions may cause the electronic device to control the display to display a video. When executed individually or collectively by the at least one processor, the instructions may cause the electronic device, if information corresponding to a user input related to the video is obtained while the video is being displayed, to obtain first response information corresponding to the user input from a database based on the obtained information corresponding to the user input. When executed individually or collectively by the at least one processor, the instructions may cause the electronic device, if information corresponding to the user input based on the video is not obtained from the database, to transmit a request for second response information for the user input to an external electronic device through the communication circuit.

[0006] In one embodiment, a method of an electronic device comprising a display, a communication circuit, and a memory may be provided. The method may include an operation of controlling the display to display a video. The method may include an operation of, when information corresponding to a user input related to the video is obtained while the video is being displayed, obtaining first response information corresponding to the user input from a database based on the obtained information corresponding to the user input. The method may include an operation of, when information corresponding to the user input based on the video is not obtained from the database, transmitting a request for second response information for the user input to an external electronic device through the communication circuit.
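The operations of this method can be sketched as a local-first lookup with a remote fallback. The following Python sketch is illustrative only and rests on assumptions: `LocalDatabase`, `respond`, the `request_remote` callback, and the keyword-matching lookup are hypothetical names and simplifications that do not appear in the disclosure, which does not specify at this point how the database is searched.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class LocalDatabase:
    """Hypothetical on-device store: (video_id, keyword) -> answer text."""
    records: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def lookup(self, video_id: str, user_input: str) -> Optional[str]:
        # Simplified matching (an assumption): return the first record for
        # this video whose keyword occurs in the lowercased user input.
        text = user_input.lower()
        for (vid, keyword), answer in self.records.items():
            if vid == video_id and keyword in text:
                return answer
        return None


def respond(
    db: LocalDatabase,
    video_id: str,
    user_input: str,
    request_remote: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Return (origin, response) for a user input received during playback."""
    first_response = db.lookup(video_id, user_input)
    if first_response is not None:
        # Information was obtained from the database: answer on-device.
        return "local", first_response
    # Nothing was obtained from the database: request second response
    # information from the external electronic device.
    return "remote", request_remote(video_id, user_input)
```

In this sketch the external device is contacted only on a database miss, so an input that the local database can answer never leaves the device.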

[0007] In one embodiment, a non-transient computer-readable storage medium for storing instructions may be provided. When executed by an electronic device comprising a display, a communication circuit, and a memory, the instructions may cause the electronic device to control the display to display a video. When executed by the electronic device, the instructions may cause the electronic device, if information corresponding to a user input related to the video is obtained while the video is being displayed, to obtain first response information corresponding to the user input from a database based on the obtained information corresponding to the user input. When executed by the electronic device, the instructions may cause the electronic device, if information corresponding to the user input based on the video is not obtained from the database, to transmit a request for second response information for the user input to an external electronic device through the communication circuit.

[0008] According to one embodiment, an electronic device may include a display, a communication circuit, a memory comprising one or more storage media for storing instructions, and at least one processor comprising a processing circuit. When executed individually or collectively by the at least one processor, the instructions may cause the electronic device to display a video through the display. When executed individually or collectively by the at least one processor, the instructions may cause the electronic device to identify, based on receiving a user input while displaying the video, information related to the user input from a database stored in the memory in relation to the video. When executed individually or collectively by the at least one processor, the instructions may cause the electronic device to obtain first response information for the user input using the information, based on identifying the information related to the user input from the database. When executed individually or collectively by the at least one processor, the instructions may cause the electronic device to request second response information regarding the user input from an external electronic device through the communication circuit, depending on whether the information related to the user input is identified in the database.

[0009] In one embodiment, a non-transient computer-readable storage medium for storing instructions may be provided. The instructions may be executed by an electronic device comprising a display, a communication circuit, and a memory. When executed by the electronic device, the instructions may cause the electronic device to display a video through the display. When executed by the electronic device, the instructions may cause the electronic device to identify, based on receiving a user input while displaying the video, information related to the user input from a database stored in the memory in relation to the video. When executed by the electronic device, the instructions may cause the electronic device to obtain first response information for the user input using the information, based on identifying the information related to the user input from the database. When executed by the electronic device, the instructions may cause the electronic device to request second response information regarding the user input from an external electronic device through the communication circuit, depending on whether the information related to the user input is identified in the database.

[0010] In one embodiment, a method of an electronic device may be provided. The electronic device may include a first communication circuit configured to be connected to a display device, a second communication circuit connectable to a server, and a memory. The method may include an operation of transmitting a signal representing a video through the first communication circuit. The method may include an operation of identifying, based on receiving a user input while transmitting the signal through the first communication circuit, information related to the user input from a database stored in the memory in relation to the video. The method may include an operation of obtaining first response information for the user input using the information, based on identifying the information related to the user input from the database. The method may include an operation of requesting second response information for the user input from the server through the second communication circuit, based on determining that the information related to the user input was not identified from the database.

[0011] FIG. 1 illustrates an electronic device according to one embodiment.

[0012] FIG. 2 schematically illustrates the hardware components and programs of an electronic device according to one embodiment.

[0013] FIG. 3 illustrates an exemplary flowchart for explaining the operation of an electronic device according to one embodiment.

[0014] FIG. 4 is a schematic diagram illustrating a database stored in an electronic device according to one embodiment.

[0015] FIG. 5 illustrates an exemplary operation of an electronic device that searches a database based on user input.

[0016] FIG. 6 illustrates an exemplary operation of an electronic device for determining at least one record related to user input from a plurality of records obtained from a database.

[0017] FIG. 7 illustrates an exemplary graph for explaining the weights used to retrieve at least one record related to user input from a database.

[0018] FIG. 8 illustrates an exemplary operation of an electronic device that searches a database based on playback history.

[0019] FIG. 9 illustrates an exemplary operation of a server for updating a database.

[0020] FIG. 10 illustrates an exemplary operation of an electronic device that displays information related to video based on a database.

[0021] FIG. 11 illustrates an exemplary flowchart for explaining the operation of a server that receives a request from an external electronic device to generate response information for user input.

[0022] FIG. 12 illustrates an exemplary UI output by an electronic device before transmitting information related to user input from the electronic device to a server.

[0023] Hereinafter, various embodiments of this document will be described with reference to the attached drawings.

[0024] The various embodiments of this document and the terms used therein are not intended to limit the technology described in this document to specific embodiments and should be understood to include various modifications, equivalents, and / or substitutions of such embodiments. In connection with the description of the drawings, similar reference numerals may be used for similar components. A singular expression may include a plural expression unless the context clearly indicates otherwise. In this document, expressions such as “A or B,” “at least one of A and / or B,” “A, B or C,” or “at least one of A, B and / or C” may include all possible combinations of items listed together. Expressions such as “first,” “second,” “1st,” or “2nd” may modify the components regardless of order or importance, and are used only to distinguish one component from another and do not limit the components. When it is mentioned that a certain (e.g., 1st) component is “(functionally or communicatively) coupled” or “connected” to another (e.g., 2nd) component, said certain component may be directly connected to said other component or connected through another component (e.g., 3rd component).

[0025] As used in this document, the term “module” includes a unit composed of hardware and may be used interchangeably with terms such as, for example, component, and / or circuit. A module may be a component formed as a whole, or a minimum unit or part thereof that performs one or more functions. For example, a module may be composed of an application-specific integrated circuit (ASIC).

[0026] FIG. 1 illustrates an electronic device (101) according to one embodiment. The electronic device (101) may be described as an electronic device capable of providing and / or outputting (or displaying) video, audio, or any combination thereof. For example, the electronic device (101) may include a TV (television), an STB (set-top box) (125), a monitor, a computer, a smartphone, a tablet, a portable media player, a wearable device, a video wall, a digital photo frame, etc. The electronic device (101) may be referred to as a display device.

[0027] For convenience of explanation, the following description assumes that the electronic device (101) is implemented as a TV, but the embodiments are not limited thereto. For example, the STB (125) may be configured to perform the operations of the present disclosure. When the STB (125) is connected to a display device including a TV and / or a monitor, it may transmit information representing video, audio, or a combination thereof (e.g., media content) to the display device. Based on receiving the information, the display device connected to the STB (125) may display the video represented by the information and / or output the audio represented by the information.

[0028] Referring to FIG. 1, the electronic device (101) may receive user input for executing the function of the electronic device (101) through an input means of the electronic device (101). The input means may include at least one of a switch (or button) that is at least partially visible through the housing of the electronic device (101), a touch sensor (e.g., a pressure-sensitive touch sensor and / or a capacitive touch sensor) for detecting touch input on the housing (and / or display panel) of the electronic device (101), a microphone, or a motion sensor (e.g., LiDAR (light detection and ranging), and / or ToF (time-of-flight) sensor) for detecting motion and / or gesture of a user separated from the electronic device (101).

[0029] In one embodiment, user input may be received through an input means of the electronic device (101) as well as (indirectly) through another electronic device connected to the electronic device (101) (e.g., STB (125) and / or remote controller (120)). For example, the remote controller (120) may transmit to the electronic device (101) information indicating at least one of a button press of the remote controller (120), a touch gesture performed on one side of the remote controller (120), a physical movement of the remote controller (120), and / or an audio signal received through the microphone of the remote controller (120). The electronic device (101) may detect or identify user input using the information received from the remote controller (120). In the present disclosure, “user input” may include inputs identified through an external electronic device connected to the electronic device (101), such as a remote controller (120), as well as inputs identified through the input means of the electronic device (101). In the present disclosure, “user input” may include inputs identified by an external electronic device, such as a smartphone. For example, a software application for controlling the electronic device (101) may be installed on an external electronic device, such as a smartphone. In the above example, the electronic device (101) may receive a signal related to user input from the external electronic device on which the software application is executed.
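One way to picture this broad definition of "user input" is a single record type that tags each input with where it was identified. The Python sketch below is an assumption-laden illustration: `InputSource`, `UserInput`, `from_remote_event`, and the dictionary message layout are hypothetical and are not described in the disclosure.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class InputSource(Enum):
    DEVICE = "device"             # input means of the electronic device itself
    REMOTE_CONTROLLER = "remote"  # e.g., the remote controller (120)
    COMPANION_APP = "app"         # e.g., an application on a smartphone


@dataclass(frozen=True)
class UserInput:
    source: InputSource
    kind: str        # "button", "touch", "motion", or "audio"
    payload: object  # raw data carried by the message


def from_remote_event(event: dict) -> Optional[UserInput]:
    """Map a hypothetical remote-controller message to a UserInput.

    The message is assumed to carry one of the four kinds of information
    listed in the paragraph above, keyed by its kind.
    """
    for kind in ("button", "touch", "motion", "audio"):
        if kind in event:
            return UserInput(InputSource.REMOTE_CONTROLLER, kind, event[kind])
    return None  # message carried nothing recognizable as user input
```

Tagging the source keeps later processing uniform: the same handler can treat an audio signal from the device microphone and one relayed by the remote controller as natural language input.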

[0030] Referring to FIG. 1, an electronic device (101) may be connected to a remote controller (120), an STB (125), and / or a server (110). For example, the connection between the electronic device (101) and the STB (125) may include a wired connection based on HDMI (high-definition multimedia interface), RGB, DVI (digital visual interface), DP (DisplayPort), component video, Thunderbolt, and / or an Aux (Auxiliary) cable. Embodiments are not limited thereto, and the connection between the electronic device (101) and the STB (125) may include a wireless connection such as Wi-Fi (wireless fidelity), Wi-Fi-direct, and / or Wi-Di (Wireless Display). For example, the connection between the electronic device (101) and the remote controller (120) may include a wireless connection based on infrared (IR). The embodiments are not limited thereto, and the remote controller (120) may be connected to the electronic device (101) based on Bluetooth, BLE (Bluetooth low energy), NFC (near-field communication), UWB (ultra-wideband), Wi-Fi, Wi-Fi-Direct, and / or other wireless short-range communication protocols. For example, the connection between the electronic device (101) and the server (110) may be established based on a LAN (local area network).

[0031] According to one embodiment, an electronic device (101) may be configured to receive user input. The user input may be input based on natural language (e.g., natural language input). For example, the user input may include a user’s utterance identified from an audio signal output from a microphone (included in the electronic device (101) and / or the remote controller (120)). For example, the user input may include text entered via a keypad of the remote controller (120), a hardware keyboard connected to the electronic device (101), and / or a software keyboard (e.g., a software keyboard displayed through the display panel of the electronic device (101)). Referring to FIG. 1, an utterance (130) identified from the user input is illustrated.

[0032] Referring to FIG. 1, while displaying media content including images and / or videos, the electronic device (101) may receive or identify user input, such as an utterance (130). Based on user input received while displaying media content, the electronic device (101) may generate or output a response corresponding to said user input. User input based on natural language may indicate the execution of a specific function of the electronic device (101) (e.g., a function to control the playback of the video being displayed through the electronic device (101), such as a function to adjust the volume and / or change the channel). Embodiments are not limited thereto, and user input based on natural language may be performed to identify information related to the media content being displayed through the electronic device (101).

[0033] For example, when receiving user input including an utterance (130) such as “What is the male lead’s name?”, the electronic device (101) may generate or output a response containing information related to the user input. Referring to FIG. 1, the electronic device (101) may generate or output an audio signal indicating an utterance (140) such as “It is Leslie Cheung.” (e.g., through the speaker of the electronic device (101)). The user input identified by the electronic device (101) while displaying media content is not limited to the utterance (130). For example, a user viewing media content may make utterances regarding various information related to the media content (e.g., actors, plot, and / or one or more products related to said media content). The electronic device (101) may display or output a response to said utterance in the format of an audio signal and / or text (e.g., a prompt) displayed along with the media content.

[0034] In order to respond to natural language input such as an utterance (130), an electronic device (101) and a system configured to provide media content through the electronic device (101) (e.g., electronic device (101), STB (125), server (110), or any combination thereof) may support natural language-based interaction based on an artificial intelligence model. In the present disclosure, the artificial intelligence model may include a computational model that simulates or mimics the neural activity of a living organism, program(s) for executing said computational model, hardware for executing said program(s), or any combination thereof. The server (110) may be configured to execute the artificial intelligence model, referred to as a large language model (LLM). An artificial intelligence model running on the server (110) may generate or output response information (e.g., text and / or audio signals representing an utterance (140)) for a natural language input, including an utterance (130), identified by the electronic device (101). The language model may include an artificial intelligence model designed for linguistic understanding. Here, linguistic understanding is a technology for recognizing and applying / processing human language / characters, including natural language processing, machine translation, conversational systems, question answering, speech recognition / synthesis, etc.

[0035] In one embodiment, a method for efficiently performing a process of generating a natural language response including an utterance (140) from a natural language input including an utterance (130) may be required. For example, in order to generate a natural language response corresponding to an utterance (130) identified by an electronic device (101), the electronic device (101) transmitting information related to the utterance (130) (e.g., an audio signal representing the utterance (130) and / or a speech-to-text (STT) result for said audio signal) to a server (110) may cause delay (or an increase in response time) depending on the network environment (e.g., Internet and / or LAN (local area network)) between the electronic device (101) and the server (110). For example, the server (110) consumes a relatively large amount of power due to the hardware and software resources required to run a large language model. As the workload of the server (110) is reduced, the power consumption of the entire system including the server (110) may be reduced. For example, natural language inputs detected from each of the various user devices, including the electronic device (101), can be accumulated in the server (110). The natural language inputs accumulated in the server (110) may cause side effects such as leakage of personal information.

[0036] According to one embodiment, the electronic device (101) may determine or select a device to process the user input (i.e., natural language input) including an utterance (130) by using a database related to the media content being displayed by the electronic device (101). For example, the electronic device (101) that receives the user input while displaying the media content may identify or search for information related to the user input from a database stored in the electronic device (101) in relation to the media content.

[0037] Based on identifying information related to user input from a database, the electronic device (101) can use said information to obtain or generate response information for said user input directly. For example, the electronic device (101) can obtain response information for said user input (e.g., information for outputting an utterance (140)) without communicating with the server (110). Based on determining that information related to the user input is not identified from said database, the electronic device (101) can request response information for said user input from the server (110). In other words, if information related to user input is stored in the database of the electronic device (101), the electronic device (101) can generate or output response information without transmitting the user input to the server (110).

[0038] When the electronic device (101) directly generates response information (e.g., utterance (140)) from user input (e.g., utterance (130)) using a database, the workload of the server (110) is not increased, so the power consumption of the system including the electronic device (101) and the server (110) can be reduced. Since the electronic device (101) directly generates response information, communication between the electronic device (101) and the server (110) can be omitted, and no delay is caused by said communication. When the electronic device (101) directly generates response information, no signal related to user input is transmitted to the server (110), so privacy issues related to the server (110) can be mitigated.
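The effect described above can be made concrete with a back-of-the-envelope estimate. The Python sketch below is not from the disclosure: the cost model, the function names `expected_response_time` and `server_requests`, and every number in the usage note are hypothetical assumptions chosen only to show how the fraction of inputs answered from the local database drives both response time and server workload.

```python
def expected_response_time(hit_rate: float, t_local: float,
                           t_network: float, t_server: float) -> float:
    """Mean response time when a fraction `hit_rate` of inputs is answered locally.

    Assumed cost model: every input pays the local lookup cost `t_local`;
    only database misses additionally pay the network round trip
    `t_network` and the server-side generation time `t_server`.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return t_local + (1.0 - hit_rate) * (t_network + t_server)


def server_requests(total_inputs: int, hit_rate: float) -> int:
    """Number of inputs forwarded to the server under the same assumption."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return round(total_inputs * (1.0 - hit_rate))
```

With illustrative values of 0.05 s for the local lookup, 0.2 s for the network round trip, and 1.0 s for server-side generation, a hit rate of 0.5 gives a mean of 0.05 + 0.5 × 1.2 = 0.65 s instead of 1.25 s with no local hits, and halves the number of requests that reach the server.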

[0039] In the following, with reference to FIG. 2 and / or FIG. 3, hardware included in an electronic device (101) and / or server (110), and software executed on said hardware for processing user input such as an utterance (130), are described by way of example (or schematically).

[0040] FIG. 2 schematically illustrates hardware components and programs of an electronic device according to one embodiment. The client device (201) of FIG. 2 may include the electronic device (101) and / or STB (125) of FIG. 1. The server (202) of FIG. 2 may include the server (110) of FIG. 1.

[0041] Referring to FIG. 2, according to one embodiment, a client device (201) may include a processor (210) and / or memory (220). The client device (201) may further include an input circuit (230) and / or an output circuit (232). For example, in one embodiment where the client device (201) is the electronic device (101) of FIG. 1, the client device (201) may include a processor (210), memory (220), input circuit (230), and output circuit (232). For example, in one embodiment where the client device (201) is the STB (125) of FIG. 1, the client device (201) may include a processor (210), memory (220), and a communication circuit (e.g., a wired communication interface, a communication modem, and / or a wireless communication circuit) for communication with an external display (or speaker). The processor (210) can be connected (electrically and / or operationally) to the memory (220), input circuit (230), and / or output circuit (232) through electronic components such as a communication bus.

[0042] Referring to FIG. 2, the processor (210) of the client device (201) may include a circuit (e.g., a processing circuit) for processing data based on instructions. The circuit for processing data may include, for example, an arithmetic and logic unit (ALU), a floating point unit (FPU), a field programmable gate array (FPGA), a central processing unit (CPU), a graphics processing unit (GPU), a neural processing unit (NPU), and / or an application processor (AP). For example, the number of processors (210) included in the client device (201) may be one or more. The processing circuit of the processor (210) that loads (or fetches) instructions and performs calculations corresponding to the loaded instructions may be referred to as a core circuit (or core). For example, the processor (210) may have the structure of a multi-core processor including multiple core circuits, such as a dual core, quad core, hexa core, or octa core. The functions and / or operations described with reference to the present disclosure may be performed individually or collectively by one or more processing circuits included in the processor (210).

[0043] Referring to FIG. 2, the memory (220) of the client device (201) may include circuitry for storing data and / or instructions that are input to or output from the processor (210). The memory (220) may include, for example, volatile memory such as RAM (random-access memory) and / or non-volatile memory such as ROM (read-only memory). Non-volatile memory may be referred to as storage. Volatile memory may include, for example, at least one of DRAM (dynamic RAM), SRAM (static RAM), Cache RAM, and PSRAM (pseudo SRAM). Non-volatile memory may include, for example, at least one of PROM (programmable ROM), EPROM (erasable PROM), EEPROM (electrically erasable PROM), flash memory, hard disk, compact disk, SSD (solid state drive), and eMMC (embedded multi media card). The memory (220) may include one or more storage media (e.g., the volatile memory and / or non-volatile memory described above) distributedly located on the client device (201). The processor (210) may execute instructions in the memory (220) to perform functions and / or operations indicated by said instructions. For example, if the client device (201) includes at least one processor, said at least one processor may be configured to execute said instructions collectively or individually.

[0044] The input circuit (230) of the client device (201) may be configured to receive user input associated with the client device (201). For example, the input circuit (230) may be connected to at least one button visible through at least a portion of the housing of the client device (201). In the above example, the input circuit (230) may be configured to detect the pressing of the at least one button. For example, the input circuit (230) may include a touch sensor for detecting contact (e.g., contact of a finger) on one side of the housing of the client device (201).

[0045] For example, the input circuit (230) may include a communication circuit (e.g., an IR sensor) for communicating with a remote controller (e.g., a remote controller (120) of FIG. 1) wirelessly connected to the client device (201). The circuit for communicating with the remote controller is not limited to an IR sensor and may include a circuit based on Wi-Fi, Wi-Fi-Direct, NFC (near field communication), UWB (ultra-wideband), Bluetooth, and / or BLE (Bluetooth low energy). For example, the input circuit (230) may include a microphone. The input circuit (230) may transmit signals and / or information related to user input to the processor (210). Using the signals and / or information transmitted from the input circuit (230), the processor (210) may detect or identify user input.

[0046] The output circuit (232) of the client device (201) may be configured to output information in the form of video and / or audio. For example, the output circuit (232) may include a display panel. The display panel may be visible from one side (e.g., the front side) of the client device (201). The display panel may include a liquid crystal display (LCD), a plasma display panel (PDP), and / or a plurality of LEDs. The display panel may include an organic LED (OLED). In one embodiment, the display panel may include electronic paper. If the display panel has a flat shape, the display panel may be referred to as a flat panel display (FPD). If the display panel has a curved shape, the display panel may be referred to as a curved display. If the display panel has a deformable shape, the display panel may be referred to as a bendable display, a flexible display, and / or a rollable display. The display panel may include a beam projector (e.g., a DLP (digital light processing) and / or LCD (liquid crystal display) projector). For example, the output circuit (232) may include a speaker. For the playback of multi-channel audio, the client device (201) may include a plurality of speakers. The processor (210) may control the output circuit (232) to output, display, or play images, videos, and / or audio included in the media content.

[0047] The hardware components included in the client device (201) are not limited to the processor (210), memory (220), input circuit (230), and output circuit (232) shown in FIG. 2. For example, the client device (201) may include a communication circuit, which is hardware for communicating with an external electronic device (e.g., a server (202)). The communication circuit may include at least one of a modem, an antenna, and an optic / electronic converter.

[0048] In one embodiment where the client device (201) is the STB (125) of FIG. 1, the client device (201) may be implemented or produced without at least one of the processor (210), memory (220), input circuit (230), and output circuit (232) shown in FIG. 2. For example, the client device (201) may include a wired interface (e.g., HDMI, DVI, and / or DP) for transmitting information and / or signals representing images and / or videos to an external display instead of the output circuit (232). For example, the client device (201) may include a wired interface (e.g., AUX terminal, HDMI, and / or DP) for transmitting information and / or signals representing audio to an external speaker instead of the output circuit (232). The embodiments are not limited thereto, and the client device (201) may include a communication circuit (e.g., a communication circuit based on Bluetooth, BLE, Wi-Di, and / or Wi-Fi-direct) for wirelessly transmitting information and / or signals to an external display and / or external speaker.

[0049] Hardware components included in the client device (201), such as a processor (210), memory (220), input circuit (230), and output circuit (232), are illustrated, but the hardware components included in the client device (201) are not limited to the hardware component(s) illustrated in FIG. 2. For example, the client device (201) may (further) include a power circuit configured to obtain power from a power system (e.g., infrastructure for providing power, including a power plant).

[0050] FIG. 2 shows a program (e.g., an assistant program (224)) and / or an artificial intelligence model (e.g., a first language model (226)) installed in the memory (220) of the client device (201). Information for the execution of said program and / or said artificial intelligence model (e.g., playback history information (222) and / or a first database (228)) may be stored or accumulated in the memory (220) of the client device (201). The client device (201) may provide an interactive service related to media content being output through the output circuit (232). As described above with reference to FIG. 1, the client device (201) may directly execute a language model independent of the language model (e.g., a second language model (240)) executed on the server (202). For example, the client device (201) and the server (202) may form a hybrid AI (artificial intelligence) architecture and / or a hybrid AI system in which each executes its own language model.

[0051] Referring to FIG. 2, the assistant program (224) may be executed in a background mode so as not to interrupt the playback of media content through the output circuit (232). For example, the processor (210) may execute the assistant program (224) (substantially simultaneously) while playing media content using the output circuit (232). By executing the assistant program (224), the processor (210) may monitor, identify, or track user input through the input circuit (230). If user input (e.g., a query for media content through the output circuit (232), including the remark (130) of FIG. 1) is identified through the input circuit (230), the processor (210) may check or obtain playback history information (222) based on the user input.

[0052] While media content is being played through the output circuit (232), information related to the playback time and / or playback section of the media content may be stored in the memory (220) as playback history information (222). For example, the playback history information (222) may represent the playback history of the media content through the output circuit (232). For example, if a user jumps to a specific playback section of the media content, the history of the jump to the playback section may be stored in the playback history information (222). For example, if a user at least temporarily pauses the playback of the media content at a specific playback point, the history of the pause regarding the playback of the media content may be stored in the playback history information (222). The playback history information (222) may be configured to store information regarding the playback time of the media content output through the output circuit (232) according to the time (in a time series) while the media content is being output through the output circuit (232). The processor (210) can obtain data to be used for searching the first database (228) from playback history information (222) based on user input (e.g., data indicating a playback section of media content that was output through the output circuit (232)).
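The bookkeeping described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the class name `PlaybackHistory`, its methods, and the use of seconds as the time unit are all assumptions introduced here. It records play, pause, and jump events and derives the playback sections that were actually output, which is the kind of data a search of the first database (228) could use.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Section = Tuple[float, float]  # (start_s, end_s) within the media content


@dataclass
class PlaybackHistory:
    """Hypothetical stand-in for the playback history information (222)."""
    sections: List[Section] = field(default_factory=list)  # closed sections
    open_start: Optional[float] = None  # start of the section now playing

    def play(self, position_s: float) -> None:
        """Playback starts or resumes at position_s."""
        if self.open_start is None:
            self.open_start = position_s

    def pause(self, position_s: float) -> None:
        """Playback stops at position_s; close the currently open section."""
        if self.open_start is not None and position_s > self.open_start:
            self.sections.append((self.open_start, position_s))
        self.open_start = None

    def jump(self, from_s: float, to_s: float) -> None:
        """A jump closes the section at from_s and opens a new one at to_s."""
        self.pause(from_s)
        self.play(to_s)

    def watched(self, now_s: float) -> List[Section]:
        """Merged, sorted sections output so far, including the open one."""
        spans = list(self.sections)
        if self.open_start is not None and now_s > self.open_start:
            spans.append((self.open_start, now_s))
        merged: List[Section] = []
        for start, end in sorted(spans):
            if merged and start <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        return merged
```

Keeping the jump and pause events as closed sections means a section that was skipped never appears in `watched`, so a later database search can be restricted to what the user has actually seen.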

[0053] By using playback history information (222) to search the first database (228), the processor (210) may obtain or search for information in the first database (228) related to user input. The first database (228) may include a set of systematized information and / or one or more applications managing said information. The set of information may be shared among independent applications (e.g., the assistant program (224)) running on the client device (201) and / or among a plurality of electronic devices (e.g., the electronic device (101)). In said set of information, different pieces of information may be combined with each other based on units such as types, columns, records, and / or tables. The combination of information may be used for adding, deleting, updating, and searching for information within the first database (228).

[0054] The first database (228) may contain information related to media content being output through the output circuit (232). For example, at least one record may be stored in the first database (228), which is a combination of a query related to the media content, information corresponding to the query, and data (e.g., timestamp(s)) representing a part of the media content (e.g., a playback section) related to the query. The organized information within the first database (228) is described by example with reference to FIG. 4.

[0055] For example, when a client device (201) searches the first database (228), the client device (201) can extract, filter, or search for at least one record among the records of the first database (228) that satisfies the conditions indicated by the playback history information (222) and user input. The operation of the client device (201) searching the first database (228) using user input and / or playback history information (222) is described with reference to FIG. 5 and / or FIG. 6.
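One way to picture such a search is the sketch below. The `Record` layout (query, paired information, start and end timestamps) follows the description of the first database (228), but the field names, the word-overlap test, and the `min_shared` parameter are assumptions made for illustration only; the disclosure does not specify how the condition on user input is evaluated.

```python
import re
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical record: a query, its paired information, and a playback section."""
    query: str
    info: str
    start_s: float
    end_s: float


def tokens(text: str) -> Set[str]:
    """Lowercased words of the text, with punctuation dropped."""
    return set(re.findall(r"\w+", text.lower()))


def search(records: List[Record], watched: List[Tuple[float, float]],
           user_input: str, min_shared: int = 2) -> List[Record]:
    """Keep records whose section overlaps a watched section and whose
    query shares at least min_shared words with the user input."""
    asked = tokens(user_input)
    hits = []
    for rec in records:
        seen = any(rec.start_s < end and start < rec.end_s for start, end in watched)
        if seen and len(asked & tokens(rec.query)) >= min_shared:
            hits.append(rec)
    return hits
```

Filtering on the watched sections first means a record tied to a later, unwatched scene is never returned, even when its query text matches the user input exactly.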

[0056] By using the results of searching the first database (228), the processor (210) of the client device (201) may determine whether to use the first language model (226) to generate response information corresponding to user input. For example, the client device (201) may determine whether to use the first language model (226) to generate response information by using information (e.g., relevance and / or similarity) indicating the relationship between at least one record included in the first database (228) and the user input. The operation of the client device (201) acquiring and / or calculating the information indicating the relationship between the records of the first database (228) and the user input is described with reference to FIG. 7 and / or FIG. 8.
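As a hedged illustration of this decision, the sketch below scores the user input against each record with cosine similarity and uses the first language model only when the best score reaches a threshold. Cosine similarity over embedding vectors and the threshold value 0.8 are assumptions chosen for the example; the disclosure mentions relevance and / or similarity without fixing a measure or a cutoff.

```python
import math
from typing import Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def should_use_local_model(input_vec: Sequence[float],
                           record_vecs: Sequence[Sequence[float]],
                           threshold: float = 0.8) -> Tuple[bool, float]:
    """Return (decision, best_score) for the hypothetical relevance check."""
    best = max((cosine_similarity(input_vec, v) for v in record_vecs), default=0.0)
    return best >= threshold, best
```

An empty search result yields a best score of 0.0, so the decision falls through to the server path without a special case.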

[0057] Depending on the decision to generate response information corresponding to user input using the first language model (226), the client device (201) may apply input data (e.g., embedding vector, feature vector, and / or feature information) based on user input and / or playback history information (222) to the first language model (226). The client device (201) may perform a plurality of calculations represented by the first language model (226) using the user input and / or playback history information (222). The result of performing the plurality of calculations may include response information for said user input. The client device (201) that has obtained response information using the first language model (226) may output said response information through the output circuit (232). For example, the utterance (140) of FIG. 1 may be included in the response information obtained using the first language model (226).

[0058] If response information corresponding to user input cannot be generated using the first language model (226), the client device (201) may transmit at least a portion of user input and / or playback history information (222) to the server (202). For example, if information related to user input is not identified from the first database (228), the client device (201) may request the server (202) to transmit response information regarding user input. For example, if no record retrieved from the first database (228) is related to user input, the client device (201) may discard the results of the search of the first database (228) and transmit a signal indicating the request to the server (202). In one embodiment, the client device (201) may display a user interface (UI) to the user of the client device (201) to confirm whether to transmit the signal before transmitting the signal to the server (202). The UI is described with reference to FIG. 12.

[0059] Referring to FIG. 2, a program (e.g., an assistant server program (260)) and / or an artificial intelligence model (e.g., a second language model (240)) installed on the server (202) is illustrated. Information for the execution of said program and / or said artificial intelligence model (e.g., a second database (250)) may be stored on the server (202). Although not illustrated, the server (202) may include one or more processors and memory, similar to the processor (210) and the memory (220) of the client device (201). For example, the assistant server program (260), the second database (250), and / or the second language model (240) may be installed or stored in the memory of the server (202). The processor of the server (202) may execute the assistant server program (260) to perform the operations of the present disclosure. The server (202) may be implemented as a cluster of one or more computing devices, or may be implemented in the form of a virtual machine running on the cluster based on virtualization.

[0060] In one embodiment, the server (202) may receive or identify a request from the client device (201) to generate response information for user input. In response to the request, the server (202) may search through a plurality of records stored in the second database (250) to identify or filter at least one record related to the user input. If at least one record is identified from the second database (250), the server (202) may transmit the identified at least one record to the client device (201). If at least one record is received from the server (202), the processor (210) of the client device (201) may execute the first language model (226) to obtain or generate response information based on the at least one record received from the server (202).

[0061] If no record related to user input is identified from the second database (250), the server (202) may use the second language model (240) to generate or process response information corresponding to the user input. The second language model (240) may be an artificial intelligence model (e.g., a large language model) designed to perform higher-performance calculations than the first language model (226). For example, artificial intelligence models may be classified according to the number of coefficients (or weights) used in their calculations. In the above example, the number of coefficients defining the calculations of the first language model (226) (e.g., less than one billion) may be less than the number of coefficients defining the calculations of the second language model (240) (e.g., more than one billion). The first language model (226) and / or the second language model (240) may have the structure of an LLM, a CNN (convolutional neural network), a Transformer (or other suitable encoder-decoder based computational model), an RNN (recurrent neural network), and / or an FNN (feed-forward neural network). A server (202) running the second language model (240) may be configured to perform (additional) operations not supported by the first language model (226), such as the operation of searching the internet.

[0062] A server (202) that has obtained response information from a second language model (240) may transmit at least a portion of the obtained response information to a client device (201). The server (202) may modify at least a portion of the second database (250) using the obtained response information (e.g., add at least one record corresponding to the response information). The server (202) may transmit the second database (250) (or at least a portion of the second database (250) modified by the response information) to the client device (201) (or another client device connected to the server (202)) to update the first database (228) of the client device (201). An operation in which the server (202) transmits at least a portion of the second database (250) to one or more client devices including the client device (201) is described with reference to FIG. 9. The operation of the server (202) generating response information based on a request from the client device (201) is described with reference to FIG. 11.
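The server-side behavior described in the preceding paragraphs can be sketched as below. This is a simplified illustration under stated assumptions: `AssistantServer`, `handle_request`, and `drain_sync` are hypothetical names, the second database (250) is reduced to a dictionary keyed by a normalized question string (an exact-match stand-in for the record search), and the second language model (240) is passed in as a plain callable.

```python
from typing import Callable, Dict, List, Tuple


class AssistantServer:
    """Hypothetical sketch: second database lookup, then large-model fallback."""

    def __init__(self, large_model: Callable[[str], str]) -> None:
        self.large_model = large_model                 # stands in for model (240)
        self.second_db: Dict[str, str] = {}            # question key -> response
        self.pending_sync: List[Tuple[str, str]] = []  # records not yet sent out

    @staticmethod
    def _key(question: str) -> str:
        """Normalize case and whitespace so repeated questions hit the same key."""
        return " ".join(question.lower().split())

    def handle_request(self, question: str) -> Dict[str, str]:
        key = self._key(question)
        if key in self.second_db:
            # A stored record answers the request without running the model.
            return {"source": "database", "answer": self.second_db[key]}
        answer = self.large_model(question)
        self.second_db[key] = answer            # add a record for this response
        self.pending_sync.append((key, answer))
        return {"source": "model", "answer": answer}

    def drain_sync(self) -> List[Tuple[str, str]]:
        """Return the new records to push to client databases, then clear them."""
        batch, self.pending_sync = self.pending_sync, []
        return batch
```

In this sketch the large model runs once per distinct question; later requests for the same question are served from the stored record, and `drain_sync` yields only the records added since the last transmission to clients.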

[0063] As described above, according to one embodiment, a client device (201) and a server (202) may be included in a hybrid artificial intelligence architecture for responding to user input (or natural language input) related to media content being played through an output circuit (232). The hybrid artificial intelligence architecture may be configured to efficiently execute a first language model (226) executable in a low-power computing environment (e.g., a computing environment provided by the client device (201)) and a second language model (240) executable in a high-performance computing environment (e.g., a computing environment provided by the server (202)). The hybrid artificial intelligence architecture may be implemented based on information that is dynamically managed based on the playback history of the media content (or the user's viewing history of the media content), such as playback history information (222), a first database (228), and / or a second database (250). When generating response information corresponding to user input, information stored in the playback history information (222), the first database (228), and / or the second database (250) can be used for the efficient execution of a language model (e.g., the first language model (226) and / or the second language model (240)).

[0064] For example, information stored in the first database (228) can be input into the first language model (226) along with user input to generate more accurate response information. Since the first language model (226) is used preferentially over the second language model (240) of the server (202), delays occurring in communication with the second language model (240) and / or the server (202) can be reduced or prevented. For example, information stored in the second database (250) can be used to generate response information without the execution of the second language model (240), which requires more power and / or resources than the first language model (226). In other words, based on the information stored in the playback history information (222), the first database (228), and / or the second database (250), the power and / or resources consumed by the hybrid artificial intelligence architecture can be reduced and the response time can be shortened.

[0065] Although a hybrid artificial intelligence architecture is described in which a first language model (226) is installed in an electronic device (101) and a second language model (240) is installed in a server (202), embodiments are not limited thereto. For example, based on user input, the electronic device (101) can generate or output response information for said user input (independently of the server (202)) using the first language model (226) without communication with the server (202).

[0066] Below, with reference to FIG. 3, exemplary operations performed by the client device (201) and / or processor (210) of FIG. 2 are described.

[0067] FIG. 3 illustrates an exemplary flowchart for explaining the operation of an electronic device according to one embodiment. The electronic device of FIG. 3 may include the electronic device (101) of FIG. 1, an STB (125), and / or the client device (201) of FIG. 2. For example, the client device (201) and / or processor (210) of FIG. 2 may be configured to perform the operations of FIG. 3. The order in which the operations of FIG. 3 are performed is not limited to the order shown in FIG. 3. For example, the operations of FIG. 3 may be performed in an order different from the order shown in FIG. 3. For example, at least two of the operations of FIG. 3 may be performed substantially simultaneously (e.g., multithreading and / or multitasking). The operations of FIG. 3 may be performed based on the execution of the assistant program (224) of FIG. 2.

[0068] Referring to FIG. 3, in operation (310), a processor of an electronic device according to one embodiment may display video. In one embodiment where the electronic device is the electronic device (101) of FIG. 1, the electronic device may display video of operation (310) through a display. In one embodiment where the electronic device is the STB (125) of FIG. 1, the electronic device may transmit signals and / or information representing video to an external display (e.g., TV) connected to the STB. In one embodiment where the electronic device is the STB (125) of FIG. 1, the electronic device may cause the display of the external electronic device to display video. Although one embodiment of displaying video is described, the embodiment is not limited thereto, and the electronic device may display or output media content including images, video, and / or audio signals. While performing the operation (310), the electronic device may store information about the playback section of the video that is displayed or is being displayed based on the operation (310) within the entire playback section of the video (e.g., playback history information (222) of FIG. 2).

[0069] In one embodiment, the electronic device may perform an operation (310) based on user input for playing a video. The user input may include an input for selecting a video provided through an OTT (over the top) application and / or OTT service. The user input may include an input indicating the selection of one channel among a plurality of channels. The user input may include an input identified by an external electronic device connected to the electronic device based on mirroring. The user input may include an input identified by a remote controller connected to the electronic device (e.g., the remote controller (120) of FIG. 1).

[0070] An electronic device that identifies a video to be displayed based on operation (310) may obtain or request a database corresponding to the video (e.g., the first database (228) of FIG. 2) from an external electronic device (e.g., the server (110) of FIG. 1 and / or the server (202) of FIG. 2). For example, the electronic device may receive information representing said database from the external electronic device. Based on said information received from the external electronic device, the electronic device may create or store a database corresponding to the video in memory (e.g., the memory (220) of FIG. 2). For example, based on user input for playing the video, the electronic device may obtain said database corresponding to the video from said external electronic device via a communication circuit. The database corresponding to the video stored in the electronic device may be a copy of a database stored in said external electronic device. For example, the database stored in the electronic device may include one or more records related to said video from among the records of a database stored in the external electronic device (e.g., the second database (250) of FIG. 2).

[0071] Referring to FIG. 3, within an operation (320), a processor of an electronic device according to one embodiment may receive user input. In one embodiment where the electronic device includes a microphone, the electronic device may identify user input of the operation (320) using an audio signal obtained from the microphone. The electronic device may identify or detect user input of the operation (320) by executing the assistant program (224) of FIG. 2. For example, the electronic device may identify user input of the operation (320) by performing STT on the audio signal. To perform STT, the electronic device may execute an artificial intelligence model and / or algorithm related to STT.

[0072] Referring to FIG. 3, within operation (330), according to one embodiment, the processor of the electronic device may determine whether it has identified information related to user input from a database stored in memory in relation to the video. For example, the electronic device may identify information related to user input by searching the database of operation (330) based on a portion displayed based on operation (310) during the playback section of the video of operation (310). The portion of the playback section of the video displayed based on operation (310) may be defined based on the playback history of the video. For example, while displaying the video based on operation (310), the electronic device may identify or receive input to change (e.g., jump) the playback section of the video or to control (e.g., pause) the playback of the video. Based on the input, the electronic device may obtain or search for information related to user input based on at least a portion of the video displayed through the electronic device.

[0073] For example, the electronic device may identify, among a plurality of records included in the database, a set of records related to at least a portion of the video displayed through the electronic device. The electronic device may search for or filter at least one record related to user input among the records included in the set. Each of the plurality of records included in the database may include a playback section of the video associated with the record, at least one keyword to be compared with the user input (or vector data representing said at least one keyword, which may be referred to as an embedding vector), and data regarding said video (e.g., data related to both said at least one keyword and said video). For example, each of the plurality of records included in the database may include data describing a specific playback section of the video and / or a scene at a specific playback point. The information of operation (330) may include at least one record related to user input among the records stored in the database. Information included in a record may include, for example but not limited to, a counter value (e.g., a hit counter) indicating the number of times the record was used to generate response information for user input.

[0074] If information related to user input is identified from the database based on operation (330) (330-Yes), the electronic device may perform operation (340). If information related to user input is not identified from the database based on operation (330) (330-No), the electronic device may perform operation (350).
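The branch between operation (340) and operation (350) can be summarized in a short sketch. The function name `respond` and its three callable parameters are hypothetical placeholders for the database search, the first language model, and the request to the external electronic device; only the branching itself is taken from the description.

```python
from typing import Callable, List, Tuple


def respond(user_input: str,
            search_db: Callable[[str], List[dict]],
            local_model: Callable[[str, List[dict]], str],
            request_remote: Callable[[str], str]) -> Tuple[str, str]:
    """Return ("first", ...) for the on-device path or ("second", ...) otherwise."""
    records = search_db(user_input)                       # operation (330)
    if records:                                           # 330-Yes
        return "first", local_model(user_input, records)  # operation (340)
    return "second", request_remote(user_input)           # operation (350), 330-No
```

Because the remote callable is invoked only on the 330-No branch, nothing about the user input leaves the device when a matching record exists.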

[0075] Referring to FIG. 3, within an operation (340), a processor of an electronic device according to one embodiment may obtain first response information for user input by using information identified from a database. For example, the electronic device may generate or obtain first response information for operation (340) by using a first language model (e.g., the first language model (226) of FIG. 2) configured to be executed by at least one processor of the electronic device. For example, the electronic device may input user input for operation (320) and / or information for operation (330) into the first language model. From the first language model into which user input for operation (320) and / or information for operation (330) is input, the electronic device may obtain or identify first response information for operation (340).

[0076] Since the information of the operation (330) is input into the first language model along with the user input of the operation (320), the electronic device can obtain first response information for the user input more quickly from the first language model. Since the first language model generates the first response information more accurately using the information, the electronic device can obtain or provide high-quality first response information. If the information of the operation (330) is at least one record included in the database, the electronic device can increase the counter value of the at least one record (e.g., by 1) based on generating the first response information using the at least one record. For example, the counter value of a specific record may indicate the number of times the specific record was used to generate response information based on the electronic device.
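The counter update can be illustrated with a few lines. The dictionary layout and the field name `hit_count` are assumptions for this sketch; the description only states that the counter value of each record used to generate the first response information is increased (e.g., by 1).

```python
from typing import Dict, List


def count_use(database: Dict[str, dict], used_ids: List[str]) -> None:
    """Increase the hit counter of each record used for a response by 1."""
    for record_id in used_ids:
        record = database[record_id]
        # A record that has never been used starts from an implicit count of 0.
        record["hit_count"] = record.get("hit_count", 0) + 1
```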

[0077] Based on obtaining first response information of an operation (340), the electronic device may display or output said first response information at least partially. In one embodiment where the electronic device of FIG. 3 is the electronic device (101) of FIG. 1, the electronic device may display the first response information of the operation (340) through a display. In one embodiment where the electronic device of FIG. 3 is the STB (125) of FIG. 1, the electronic device may transmit information and / or signals for displaying said first response information to an external display connected to the STB. The electronic device may display or output the first response information of the operation (340) while maintaining the display of a video based on the operation (310). For example, the first response information may be output in the format of a visual object (e.g., a pop-up window and / or a prompt) superimposed on the video displayed based on the operation (310). For example, the first response information may be output in the format of an audio signal transmitted to a speaker. The above audio signal can be played together with an audio signal linked to the video.

[0078] Referring to FIG. 3, within operation (350), according to one embodiment, the processor of an electronic device may request second response information regarding user input from an external electronic device (e.g., the server (110) of FIG. 1 and / or the server (202) of FIG. 2). For example, the electronic device may transmit information related to the user input of operation (320) to the external electronic device via a communication circuit. The information related to the user input may include text related to the user input and / or a playback history of a video that was displayed while receiving the user input (e.g., playback history information (222) of FIG. 2). After transmitting the information to the external electronic device, based on receiving the second response information from the external electronic device, the electronic device may display or output the second response information at least partially. As described above with reference to operation (340), the electronic device may display or output the second response information in the format of a visual object superimposed on the video. As described above with reference to operation (340), the electronic device may output the second response information in the format of an audio signal transmitted to a speaker. The audio signal can be played substantially simultaneously with an audio signal corresponding to the video.
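For illustration, the information transmitted in operation (350) might be serialized as in the sketch below. The JSON encoding and the field names `text`, `playback_history`, `start_s`, and `end_s` are assumptions introduced here; the description states only that the text related to the user input and / or the playback history may be included.

```python
import json
from typing import List, Tuple


def build_request(text: str, watched: List[Tuple[float, float]]) -> str:
    """Serialize the user input text and the watched sections as JSON."""
    payload = {
        "text": text,
        "playback_history": [{"start_s": s, "end_s": e} for s, e in watched],
    }
    return json.dumps(payload, sort_keys=True)
```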

[0079] As described above, the electronic device may operate as at least part of a hybrid artificial intelligence architecture (or hybrid artificial intelligence system) based on language models placed in the electronic device and the external electronic device, respectively. By performing operation (340), the electronic device may rely less on the language model of the external electronic device, which consumes more power than the language model of the electronic device. Thus, the overall power consumption of the hybrid artificial intelligence system including the electronic device may be reduced. The language model installed in the electronic device may be a lightweight artificial intelligence model (e.g., a model reduced by quantization and / or knowledge distillation) so as to be executed using fewer resources than the language model of the external electronic device. To generate more accurate response information (e.g., the first response information of operation (340)) using the language model installed in the electronic device, the electronic device may utilize a database related to the video (e.g., the database of operation (330)). When the first response information is obtained based on operation (340), no information and / or signals related to the user input of operation (320) are transmitted outside the electronic device (e.g., to the external electronic device of operation (350)). The electronic device can therefore prevent the leakage of personal information that may be included in the user input and can provide a user experience based on enhanced security.

[0080] Below, with reference to FIG. 4, an exemplary structure of the database of the operation (330) is described.

[0081] FIG. 4 is a schematic diagram illustrating a database stored in an electronic device according to one embodiment. The electronic device of FIG. 4 may include the electronic device (101) of FIG. 1, the STB (125) of FIG. 1, and / or the client device (201) of FIG. 2. The database of FIG. 4 may include the first database (228) and / or the second database (250) of FIG. 2.

[0082] Referring to FIG. 4, records (431, 432, 433) included in a database corresponding to a video (410) are illustrated. A database containing records (431, 432, 433) and / or information representing said database may be transmitted from a server (e.g., server (202) of FIG. 2) to a client device (e.g., client device (201) of FIG. 2). For example, a client device that has identified an input for playing a video (410) may request a database corresponding to the video (410) from the server. In response to said request, the server may transmit information and / or signals including said database to the client device.

[0083] Records (431, 432, 433), and / or a database containing records (431, 432, 433), may be created and / or published by a server included in the hybrid artificial intelligence architecture. For example, a database may be created when a video (410) is registered on the server (e.g., a database containing no records). For example, a database (intentionally) created by the publisher of the video (410) along with the video (410) may be registered on the server. For example, a database provided with the video (410) may include questions that may be raised in relation to the video (410) (e.g., anticipated questions), and information (or answers) paired with said questions. After the database (or the records (431, 432, 433) contained in the database) is registered on the server, it may be dynamically managed by the server (e.g., addition, deletion, and / or modification of records).

[0084] For example, it is assumed that the server has created an empty database containing no records based on the registration of the video (410). In the assumed case, a client device that receives a request for playback of the video (410) may request the video (410) and the database from the server. Based on the request, the server may transmit the empty database along with the video (410) (or a communication link for streaming the video (410)) to the client device. While displaying the video (410) received from the server, the client device may receive or identify user input related to the video (410) (e.g., by performing the operation (320) of FIG. 3). Because the database contains no records, the client device that identifies the user input may request response information for the user input from the server, along with information indicating the user input.

[0085] In the above-mentioned case, the server that has identified user input from the client device can obtain or generate response information for said user input by executing a large language model installed on the server (e.g., the second language model (240) of FIG. 2) because the database stored on the server also does not contain any information related to the video (410). The server can store the response information obtained from the large language model in the database along with the question (of the user of the client device) that appears through the user input.

[0086] For example, a record may be created or stored in a database that combines information corresponding to the question, the response information, and the playback time of the video (410) linked to the user input (e.g., the playback time corresponding to the frame(s) of the video (410) displayed through the client device when the user input is received). The columns (or fields) of the record may be defined as shown in Table 1.

[0087] Table 1
Column name | Description of information stored in the column
Start time | A timestamp indicating the start time of the playback interval corresponding to the record
Duration | The length of the playback interval corresponding to the record
Search key | Information for searching one or more natural language inputs (e.g., natural language sentences identified from user input) associated with the record
Information | Information used to generate response information
Cache hit count | The number of times the record has been used to generate response information and / or the number of times natural language input(s) associated with the record have been identified

[0088] The column name(s) of the database are not limited to each column name exemplified in Table 1. In the column named “search key” in Table 1, keyword(s) common to natural language inputs corresponding to records and / or vector(s) corresponding to said keyword(s) (e.g., vectors contained in a vector space representing relationships between words and / or embedding vectors) may be stored. In the column named “information” in Table 1, one or more words used to generate response information and / or one or more vectors corresponding to said one or more words may be stored.

[0089] In the above assumed case, response information generated using a large language model and / or one or more words included in said response information, or one or more vectors representing said one or more words, may be stored in the “information” column of the record. In the above assumed case, keyword(s) included in the question identified from user input and / or vector(s) representing said keyword(s) may be stored in the “search key” column of the record. In the above assumed case, the hit count of the record (e.g., “cache hit count” column) may be initialized to or set to the natural number 1. In the above assumed case, the “start time” column of the record created in the database may store a timestamp indicating the playback time of the video (410) displayed on the client device at the time the question corresponding to said record (or one or more questions similar to said question) was received. In the above assumed case, the “duration” column of the record created in the database may be initialized to or set to 0 because said record was created by a single question.
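As a non-limiting illustration, a record having the columns of Table 1 and its initialization from a single question can be sketched as follows. The class name Record, the function name new_record, the use of seconds for timestamps, and the storage of keywords as plain strings (rather than vectors) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    start_time: float       # seconds; start of the playback interval
    duration: float         # seconds; length of the playback interval
    search_key: List[str]   # keywords (vectors could be stored instead)
    information: str        # text used to generate response information
    cache_hit_count: int    # how often the record's questions were identified


def new_record(playback_time: float, keywords: List[str], response: str) -> Record:
    """Create a record from one question: duration 0, cache hit count 1."""
    return Record(
        start_time=playback_time,
        duration=0.0,
        search_key=list(keywords),
        information=response,
        cache_hit_count=1,
    )
```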

[0090] As the server provides the video (410) and a database (e.g., an empty database within the assumed case above) to a plurality of electronic devices, user inputs received from each of the plurality of electronic devices may be transmitted to the server. The server may add records to the database or update records based on the user inputs. For example, among the user inputs, user inputs representing similar questions may be used to update a specific record in the database. Whenever a record is updated, the server may increase the “cache hit count” column (by 1). In the “start time” column of the record, a timestamp representing the earliest playback point among the playback points of the video (410) where the user inputs associated with the record (e.g., user inputs representing similar questions) were identified may be stored. In the “duration” column of the record, a numerical value representing the difference between the earliest and latest playback points among the playback points of the user inputs may be stored. For example, among the above user inputs, a user input representing a question that is not similar to any of the record(s) stored in the database can be used to add a new record to the database.
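As a non-limiting illustration, the update of a record by a similar question received at another playback point can be sketched as follows. The names Record and update_record are hypothetical, timestamps are assumed to be expressed in seconds, and only the columns affected by the update are modeled.

```python
from dataclasses import dataclass


@dataclass
class Record:
    start_time: float      # earliest playback point of the associated inputs
    duration: float        # latest playback point minus earliest playback point
    cache_hit_count: int


def update_record(rec: Record, playback_time: float) -> None:
    """Fold one more similar question, asked at playback_time, into the record."""
    earliest = min(rec.start_time, playback_time)
    latest = max(rec.start_time + rec.duration, playback_time)
    rec.start_time = earliest
    rec.duration = latest - earliest
    rec.cache_hit_count += 1  # increased by 1 on every update
```

For example, a record created at 100 seconds and then updated by questions at 130 seconds and 90 seconds would cover the interval from 90 seconds to 130 seconds with a cache hit count of 3.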

[0091] Referring to FIG. 4, records (431, 432, 433) updated based on questions (or user inputs) detected by a server and client devices connected to the server are illustrated as examples. The records (431, 432, 433) can each be matched to time intervals (421, 422, 423) where user inputs associated with each record were received. Each of the time intervals (421, 422, 423) can be defined by the “start time” column and the “duration” column of the corresponding record.

[0092] For example, if natural language inputs related to the actors of the video (410) (e.g., “Who is the male lead?” and / or “Tell me the actor playing the police.”) are detected by client devices within a time interval (422) of the video (410), words common to the natural language inputs (e.g., male, actor, real name), and / or response information for the natural language inputs (e.g., “Sam K”) may be stored in a record (432) corresponding to the time interval (422). For example, vectors representing the words common to the natural language inputs, and / or word(s) included in the response information (or vector(s) corresponding to said word(s)) may be stored in the record (432). Referring to the “cache hit count” column of the record (432), the natural language inputs were detected 127 times. The time interval (422) may be the time interval between the earliest and latest playback points among the playback points of the video (410) where each of the natural language inputs associated with the record (432) was detected.

[0093] For example, a record (433) corresponding to a time interval (423) of a video (410) may be generated based on natural language inputs (e.g., “Who is the actress?”) that were frequently entered (via the client device(s)) during the time interval (423). Vectors representing words commonly identified from the natural language inputs (e.g., woman, actress, real name) may be stored in the “search key” column of the record (433). Response information (e.g., “Jane M”) corresponding to the natural language inputs may be stored in the “information” column of the record (433). The number of the natural language inputs may be stored in the “cache hit count” column of the record (433).

[0094] Although records (e.g., records (432, 433)) corresponding to a portion (e.g., playback segments (422, 423)) of the entire playback segment (421) of the video (410) have been described, embodiments are not limited thereto. For example, a record (431) corresponding to the entire playback segment (421) may be stored in a database corresponding to the video (410). The record (431) may respond to questions that may be raised at any time while playing the video (410). For example, if the video (410) is a movie, the record (431) may store information about the director of the movie of the video (410). A timestamp (e.g., 00:00:00.0) indicating the start time of the playback segment of the video (410) may be stored in the “start time” column of the record (431). In the “duration” column of the record (431), a timestamp (e.g., 01:45:17.0) indicating the length of the entire playback section (421) of the video (410) may be stored. In the “search key” column of the record (431), vector(s) indicating word(s) (e.g., director, and / or filmography) included in the question related to the director may be stored. In the “information” column of the record (431), word(s) included in the response information related to the director and / or vector(s) corresponding to said word(s) may be stored.

[0095] Referring to FIG. 4, the number of records stored in the database (e.g., N), such as records (431, 432, 433), may be stored in the database. The sum of the numerical values stored in the “cache hit count” column of the records stored in the database (e.g., total hit count = 2000) may be stored in the database. The sum may correspond to the number of natural language inputs received to create the database. For example, the ratio of the numerical value in the “cache hit count” column of a record to the sum may (statistically) represent the probability that user input received in relation to the video (410) corresponds to the record. For example, the ratio may represent the probability that user input (or natural language input) related to the record will occur. For example, the ratio may represent the importance of the information contained in the record (e.g., information related to the video (410)).

[0096] According to one embodiment, a server may transmit a database containing records (431, 432, 433) to a client device requesting video (410). In one embodiment, the server may maintain the number of records (e.g., N) stored in the database at a number less than or equal to a threshold size. For example, the server may determine at least one record (to be removed first) from the database using the numerical values of the “cache hit count” column of the records (431, 432, 433). For example, the server may remove records from the database starting with the record with the smallest numerical value of the “cache hit count” column.
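As a non-limiting illustration, the hit-count ratio and the threshold-based removal of records described above can be sketched as follows. The function names hit_ratios and evict are hypothetical, and records are represented only by an identifier mapped to its cache hit count.

```python
from typing import Dict


def hit_ratios(hit_counts: Dict[str, int]) -> Dict[str, float]:
    """Ratio of each record's cache hit count to the total hit count."""
    total = sum(hit_counts.values())
    return {rid: (count / total if total else 0.0) for rid, count in hit_counts.items()}


def evict(hit_counts: Dict[str, int], max_records: int) -> Dict[str, int]:
    """Keep at most max_records records, removing the smallest hit counts first."""
    ranked = sorted(hit_counts.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:max_records])
```

In this sketch, records whose questions are rarely asked are the first to be dropped when the database exceeds the threshold size, so the database sent to a client device retains the records most likely to be needed.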

[0097] A database generated based on the operation of the server described above with reference to FIG. 4 can be transmitted to at least one client device among the client devices connected to the server that is outputting a video (410). Below, with reference to FIG. 5, an exemplary operation of a client device that generates response information for user input using the database transmitted from the server is described.

[0098] FIG. 5 illustrates an exemplary operation of an electronic device (101) that searches a database based on user input. The electronic device (101) of FIG. 5 may include the electronic device (101) of FIG. 1, the STB (125) of FIG. 2, and / or the client device (201) of FIG. 2. The database of FIG. 5 may correspond to the first database (228) of FIG. 2. The database of FIG. 5 may be generated by a server connected to the electronic device (101) (e.g., the server (202) of FIG. 2 and / or the server (110) of FIG. 1) based on the operation described above with reference to FIG. 4.

[0099] Referring to FIG. 5, an exemplary state of an electronic device (101) that plays a video (510) is illustrated. It is assumed that the video (510) is a movie having a running time of 1 hour 45 minutes 17 seconds. Based on receiving user input to play the video (510), the electronic device (101) may acquire or receive a database corresponding to the video (510). The electronic device (101) may store the acquired database as the first database (228) of FIG. 2 in memory (e.g., memory (220) of FIG. 2).

[0100] For example, a user watching a video (510) being played (or displayed) through an electronic device (101) may speak a question related to the video (510). The electronic device (101) may identify or detect the user's natural language input (or user input) containing the question by using an audio signal output from a microphone. As another example, the user may input text containing the question by using a remote controller connected to the electronic device (101) (e.g., the remote controller (120) of FIG. 1), a keyboard connected to the electronic device (101), and / or a software keyboard displayed through the display of the electronic device (101). The electronic device (101) may identify or detect the natural language input (or user input) containing the question by using the text input by the user.

[0101] While displaying the video (510), the electronic device (101) that detects the natural language input can search a database using at least one word (or a vector corresponding to said at least one word) included in the question to identify, extract, or filter at least one record related to said natural language input within the database. The operation of searching for a plurality of records included in the database may include comparing a vector included in the plurality of records (e.g., a vector(s) stored in a “search key” column) with at least one vector corresponding to the at least one word identified from the natural language input.

[0102] In one embodiment, the operation of searching for a plurality of records included in a database may include the operation of obtaining distances between a playback time t related to natural language input (e.g., a playback time t of a video (510) that was being displayed through the electronic device (101) at the time the natural language input was received) and playback segments corresponding to each of the plurality of records (e.g., playback segments (521, 522)). For example, since a user watching the video (510) is likely to ask a question related to a portion of the playback segment of the video (510) recognized (e.g., viewed) by the user, the electronic device (101) may preferentially extract records corresponding to a playback segment preceding the playback time t among the plurality of records. For example, the operation of searching may include an operation of extracting record(s) that are relatively close to the playback time t among the records stored in the database using a weight, which is a function of the distance between the playback time t and the playback segments. Referring to FIG. 5, among the playback intervals (521, 522), a record (532) corresponding to the playback interval (522) adjacent to the playback time t may be assigned a weight greater than that of the record (531) corresponding to the playback interval (521).

[0103] Hereinafter, with reference to FIG. 6 and / or FIG. 7, a weight determined based on a playback time t associated with a natural language input, and / or a function used to calculate said weight are described.

[0104] FIG. 6 illustrates an exemplary operation of an electronic device (101) that determines at least one record related to user input from a plurality of records obtained from a database. The electronic device (101) of FIG. 6 may include the electronic device (101) of FIG. 1, the STB (125) of FIG. 2, and / or the client device (201) of FIG. 2. Referring to FIG. 6, an exemplary operation of the electronic device (101) that identifies user input while outputting an image (e.g., image frame, frame image, and / or frame) corresponding to a playback time t of the video (510) of FIG. 5 is described.

[0105] Based on user input identified while outputting a portion of the video (510) (e.g., image frames and / or audio signals) corresponding to playback time t, the electronic device (101) may obtain or calculate scores (e.g., matching coefficients) for each of the records (e.g., records (531, 532)) stored in the database. Each score may be a combination (e.g., a combination based on multiplication) of the numerical value of the record’s “cache hit count” column (e.g., the number of natural language input(s) corresponding to the record and / or the number of times the record was used to generate response information) and / or a weight based on the distance between the playback time t and the playback segment corresponding to the record (e.g., playback segments (521, 522) of each of the records (531, 532)).

[0106] Referring to FIG. 6, exemplary records (531, 532) stored in a database corresponding to a video (510), and playback intervals (521, 522) corresponding to each of the records (531, 532) are illustrated. The playback intervals (521, 522) may be indicated by information stored in each of the records (531, 532) (e.g., information stored in a “start time” column and / or a “duration” column). The distance between a playback time t and a playback interval (521) may be used to determine the score of the record (531) corresponding to the playback interval (521). Similarly, the distance between a playback time t and a playback interval (522) may be used to determine the score of the record (532) corresponding to the playback interval (522).

[0107] According to one embodiment, the electronic device (101) can calculate or determine the weights used to calculate the score using a function (610) formed on a time axis with respect to playback time t. The function (610) may be referred to as a neighbor relevance function (NRF). The function (610) may have the form of a convex function that peaks at playback time t and decreases with distance from playback time t on the time axis. The function (610) may have an asymmetric form with respect to playback time t such that the weights assigned to playback times before playback time t exceed the weights assigned to playback times after playback time t. For example, for a video (510) being viewed by a user, the probability of a question occurring about content before playback time t (which the user has already viewed) is greater than the probability of a question occurring about content after playback time t, so the function (610) may have the form of an asymmetric convex function with respect to playback time t. Different examples of functions (610) having the form of asymmetric convex functions are described with reference to FIG. 7.

[0108] According to one embodiment, an electronic device (101) can calculate or determine the scores of each of the records (531, 532) using a function (610) determined based on playback intervals (521, 522) corresponding to the records (531, 532) and a playback time t. For example, the score corresponding to the record (531) may be a value obtained by applying (e.g., multiplying) the numeric value of the “cache hit count” column of the record (531) to the output value of the function (610) into which the playback time t1 (e.g., the median value, maximum value, and / or minimum value of the timestamps included in the playback interval (521)) is input. For example, the score of the record (532) may be a combination of the output value of the function (610) for the playback time t2 within the playback interval (522) and the numeric value of the “cache hit count” column of the record (532).

[0109] The electronic device (101) may determine or select a record having a relatively large score among the scores of records (531, 532) as the record associated with user input. Referring to FIG. 6, the score may be dependent on the distance from playback time t on the time axis due to a weight calculated by a function (610) mapped to a maximum value at playback time t. Referring to FIG. 6, the score may be dependent on the frequency and / or number of natural language input(s) associated with the record due to the value of the record’s “cache hit count” column. On the time axis, the distance between the playback interval (521) corresponding to record (531) and playback time t may be greater than the distance between the playback interval (522) corresponding to record (532) and playback time t. In other words, the weight corresponding to record (531) represented by the function (610) may be smaller than the weight corresponding to record (532). However, because the value of the “cache hit count” column of record (531) is larger than that of record (532), the score of record (531) may still be greater than the score of record (532).
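As a non-limiting illustration, the selection of a record by multiplying the cache hit count by the weight of the function (610) can be sketched as follows. The function name select_record and the numerical values in the usage note are hypothetical; the weight is assumed to have been calculated beforehand.

```python
from typing import Dict, Tuple


def select_record(candidates: Dict[str, Tuple[int, float]]) -> str:
    """Pick the record with the largest score.

    candidates maps a record identifier to (cache hit count, weight),
    and the score is their product.
    """
    return max(candidates, key=lambda rid: candidates[rid][0] * candidates[rid][1])
```

For example, with a hypothetical cache hit count of 300 and weight of 0.2 for record (531), and a cache hit count of 40 and weight of 0.9 for record (532), select_record({"531": (300, 0.2), "532": (40, 0.9)}) selects record (531), even though record (532) is closer to playback time t.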

[0110] As described above, when the score of record (531) is greater than the score of record (532) among records (531, 532) adjacent to playback time t, the electronic device (101) may use the information stored in record (531) (e.g., information stored in the “information” column) to generate response information corresponding to user input. For example, the electronic device (101) may input the information of record (531) and a natural language sentence (e.g., text representing a question) identified from user input into the first language model (226) of FIG. 2. For example, the information of record (531) may be input into the first language model (226) to obtain, from the first language model (226), a natural language sentence and / or text biased toward the information of record (531). Based on obtaining response information from the first language model (226), the electronic device (101) may output at least part of the obtained response information. For example, the electronic device (101) may display a visual object, including text, an image, and / or a video, that appears in the response information, on a display. For example, the electronic device (101) may output an audio signal representing speech based on the response information through a speaker.

[0111] Although an embodiment has been described in which a score is determined based on a function (610) based on a playback time t corresponding to a natural language input, and on playback intervals (e.g., playback intervals (521, 522)) of each of the records, the embodiment is not limited thereto. For example, because a record contains keyword(s) associated with the record, the electronic device (101) can determine a score by using the similarity between the keyword and the natural language input. For example, the electronic device (101) can calculate, within a vector space, the distance between a vector representing the keyword, stored in the record’s “search key” column, and a vector identified from the natural language input. For example, the electronic device (101) can obtain or calculate the distances between the vectors stored in each of the records and the vector identified from the natural language input. Among the calculated distances, the electronic device (101) can assign the largest weight and / or score to the record in which the vector having the minimum distance is stored. In other words, a record containing keyword(s) having a meaning similar to at least one word included in the natural language input can be determined as the record with the highest score and can be used to generate response information together with user input.
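As a non-limiting illustration, the selection of the record whose stored vector has the minimum distance to the vector identified from the natural language input can be sketched as follows. The function name nearest_record is hypothetical, and the Euclidean distance is an assumption; the present disclosure does not specify the distance measure.

```python
import math
from typing import Dict, Sequence


def nearest_record(query_vec: Sequence[float],
                   search_keys: Dict[str, Sequence[float]]) -> str:
    """Return the identifier of the record whose search-key vector is closest."""
    return min(search_keys, key=lambda rid: math.dist(query_vec, search_keys[rid]))
```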

[0112] Although an embodiment has been described in which a “cache hit count” column, indicating the number of times natural language input(s) corresponding to a record have been identified (or the frequency in which a record has been used to generate response information), is used to search for at least one record associated with user input, the embodiment is not limited thereto. For example, the user may repeat the natural language input if the response output from the electronic device (101) based on the natural language input is incorrect (e.g., hallucination). In the above example, the numerical value of the “cache hit count” column may be gradually increased even though the electronic device (101) repeatedly outputs an incorrect response.

[0113] According to one embodiment, when an electronic device (101) receives a natural language input, it may determine or decide whether the received natural language input is similar to another natural language input that was previously received. For example, by using the distance between a vector corresponding to the other natural language input and a vector corresponding to the received natural language input, the electronic device (101) may determine or calculate the similarity between the natural language input and the other natural language input. If a similarity greater than a threshold similarity is identified, the electronic device (101) may transmit the natural language input to a server without searching a database based on the natural language input. For example, to prevent the regeneration of response information similar to the response information generated based on the other natural language input, the electronic device (101) may request the server to generate response information for the natural language input.
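As a non-limiting illustration, the decision to bypass the database when a natural language input repeats a previous one can be sketched as follows. The function names, the use of cosine similarity, and the threshold value of 0.9 are assumptions made for illustration.

```python
import math
from typing import Iterable, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def should_escalate(new_vec: Sequence[float],
                    previous_vecs: Iterable[Sequence[float]],
                    threshold: float = 0.9) -> bool:
    """True if the new input is similar to an earlier one, i.e. the user is
    likely repeating a question; the input is then sent to the server
    without searching the database."""
    return any(cosine_similarity(new_vec, prev) > threshold for prev in previous_vecs)
```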

[0114] FIG. 7 illustrates an exemplary graph for explaining weights used to retrieve at least one record related to user input from a database. Referring to FIG. 7, graphs (720, 730, 740) of functions defined to output a maximum value at a reference point of 0 seconds are shown on a coordinate plane (710) where the x-axis is the time axis. The y-axis of the coordinate plane (710) can be defined to represent the output value of the function, i.e., the weight. The functions corresponding to each of the graphs (720, 730, 740) can be used to determine the weights described with reference to FIG. 6. For example, an electronic device can shift any one of the functions according to the playback point t of a video being displayed through the electronic device. Using the shifted function, the electronic device can calculate or obtain weights corresponding to each of the records stored in the database.

[0115] Each of the functions corresponding to the graphs (720, 730, 740) may be an NRF function of Equation 1. The NRF function can be defined based on the chi-square probability density function f(x; k).

[0116]

[0117] Γ in Equation 1 can be the gamma function. In the functions corresponding to graphs (720, 730, 740), k can be defined as 6. s in Equation 1 can be a scale factor. In the function of graph (720), s can be defined as 0.5. In the function of graph (730), s can be defined as 1.0. In the function of graph (740), s can be defined as 2.0.
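As a non-limiting reference, the standard chi-square probability density function with k degrees of freedom is reproduced below, together with one shifted and scaled form that is consistent with the convergence points described for the graphs (720, 730). The shifted form, and the symbol τ denoting the offset from playback time t (negative for the past), are assumptions introduced for illustration and are not taken from Equation 1.

```latex
f(x; k) = \frac{x^{k/2 - 1}\, e^{-x/2}}{2^{k/2}\, \Gamma(k/2)}, \quad x > 0
\qquad
\mathrm{NRF}(\tau) = f\big((k - 2) - s\,\tau;\ k\big)
```

With k = 6, the density peaks at x = k - 2 = 4, so this form peaks at τ = 0 and reaches 0 at τ = +4 / s (i.e., +4 seconds for s = 1.0 and +8 seconds for s = 0.5).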

[0118] s in Equation 1 may be related to the width of the graphs (720, 730, 740). When the function corresponding to the graph (730) where s = 1.0 is used to determine the weight, the weight may converge to 0 at -20 seconds (e.g., 20 seconds before the playback time t) and may converge to 0 at +4 seconds (e.g., 4 seconds after the playback time t). When the function is used to search for records in a database, one or more records corresponding to a 24-second playback interval between 20 seconds before the playback time t and 4 seconds after the playback time t may be filtered. In other words, the value of s in Equation 1 may represent the size of the window for filtering records in the database based on the playback time t.
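As a non-limiting illustration, a weight function based on the chi-square probability density function with k = 6 can be sketched as follows. The mapping of the time offset to the argument of the density, (k - 2) - s × offset, is an assumption chosen so that the maximum falls at playback time t and the weight reaches 0 at +4 seconds for s = 1.0; the function names chi2_pdf and nrf are hypothetical.

```python
import math


def chi2_pdf(x: float, k: int) -> float:
    """Chi-square probability density function with k degrees of freedom."""
    if x <= 0:
        return 0.0
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))


def nrf(offset: float, s: float = 1.0, k: int = 6) -> float:
    """Weight for a record located offset seconds from playback time t.

    A negative offset is in the past; a positive offset is in the future.
    """
    return chi2_pdf((k - 2) - s * offset, k)
```

In this sketch, nrf(-3.0) exceeds nrf(3.0), reflecting the larger weight given to content before playback time t, and nrf(-20.0) is less than one percent of the peak value nrf(0.0).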

[0119] Referring to FIG. 7, the function of the graph (720) where s = 0.5 can converge to 0 at -40 seconds and to 0 at +8 seconds. When the function of the graph (720) is used to determine weights, records of a playback interval of 48 seconds between a point 40 seconds before the playback point t and a point 8 seconds after the playback point t can be filtered. When the function of the graph (720) is used to determine weights, the operation of an electronic device to search for one or more records related to user input from a database can be represented by the pseudo-code of Table 2.

[0120]

[0121] Referring to Table 2, the playback time t of a video being displayed on an electronic device may be stored in the curr_time variable. Records related to the playback interval from 40 seconds before to 8 seconds after the playback time t stored in the curr_time variable may be stored in the Record[diff] variable. The result of calculating the score of the record may be stored in the Relevance_index[diff] variable. Referring to Table 2, the score may be the product of the numerical value of the “cache hit count” column stored in the record, the output value of the NRF function based on the playback time corresponding to the record (e.g., the output value of the function of graph (720)), and the similarity between the information of the “search key” column included in the record and the words included in the natural language input.

[0122] Referring to Table 2, the electronic device that has calculated the scores of all records can determine, as the result of searching the database, as many records as the number stored in the context_size_max variable, taken in descending order of the scores. The number stored in the context_size_max variable may be related to the performance of the language model running on the electronic device. The electronic device can execute a language model (e.g., the first language model (226) of FIG. 2) using the record(s) determined as the result of searching the database to obtain or generate response information for user input. The response information can be output through the output circuit of the electronic device (e.g., the output circuit (232) of FIG. 2).
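As a non-limiting illustration, a search of the kind described with reference to Table 2 can be sketched as follows. This sketch is not the pseudo-code of Table 2. The names Record, nrf, similarity, and search, the use of the record's start time as its representative playback time, the keyword-overlap similarity, and the selection of the highest-scoring records are assumptions made for illustration; only curr_time and context_size_max are names taken from the description.

```python
import math
from typing import List, NamedTuple, Set


class Record(NamedTuple):
    start_time: float
    duration: float
    search_key: Set[str]
    information: str
    cache_hit_count: int


def nrf(offset: float, s: float = 0.5, k: int = 6) -> float:
    """Chi-square based weight; the argument mapping is an assumption."""
    x = (k - 2) - s * offset
    if x <= 0:
        return 0.0
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))


def similarity(keys: Set[str], words: Set[str]) -> float:
    """Keyword overlap (Jaccard index), standing in for a vector similarity."""
    union = keys | words
    return len(keys & words) / len(union) if union else 0.0


def search(records: List[Record], curr_time: float, words: Set[str],
           context_size_max: int = 2) -> List[Record]:
    scored = []
    for rec in records:
        diff = rec.start_time - curr_time
        if not -40.0 <= diff <= 8.0:   # window for s = 0.5
            continue
        relevance = rec.cache_hit_count * nrf(diff) * similarity(rec.search_key, words)
        if relevance > 0:
            scored.append((relevance, rec))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest score first
    return [rec for _, rec in scored[:context_size_max]]
```

The records returned by search would then be supplied, together with the natural language input, to the language model running on the electronic device.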

[0123] As described above, the electronic device can search records within a database to extract information to be used for the execution of a low-performance, low-power language model running on the electronic device. Since information related to user input is extracted preferentially, the electronic device can obtain or generate high-quality response information related to user input from the low-performance, low-power language model. Because the electronic device obtains response information without communicating with a server, the electronic device can obtain response information relatively quickly. Since information and / or signals related to user input are not transmitted to a server, privacy issues related to user input may not arise.

[0124] Although exemplary operations of an electronic device searching for records of a database stored in the electronic device have been described, the embodiments are not limited thereto. For example, regarding the second database (250) of FIG. 2, the server (202) may also perform operations based on the operations described with reference to FIG. 7. In the above example, the server (202) may search the second database (250) stored in the server (202) based on user input received from the client device (201). Searching the second database (250) may be performed similarly to the operation of the electronic device searching a database described with reference to FIGS. 5 through 7. Based on retrieving one or more records from the second database (250), the server (202) may transmit the one or more retrieved records to the client device (201). The client device (201) can input the one or more records transmitted from the server (202) into the first language model (226) (e.g., input along with user input) to obtain or output response information for user input. If a record related to user input is not found in the second database (250), the server (202) can obtain response information for user input using the second language model (240).

[0125] Although an embodiment based on a chi-square probability density function has been described, the embodiment is not limited thereto. To calculate the weights to be applied to records in a database, other functions may be used that have a peak value (or maximum value) at playback time t corresponding to the natural language input and converge to 0 as they move away from playback time t.
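As a minimal sketch of one such alternative function, the Python snippet below uses a Gaussian-shaped kernel that has its peak at playback time t and approaches 0 as the distance from t grows. The function name `gaussian_weight`, the width parameter `sigma`, and the example times are illustrative assumptions and are not taken from the disclosure.

```python
import math

def gaussian_weight(record_time: float, t: float, sigma: float = 30.0) -> float:
    """Weight for a record located at record_time, given input time t (seconds).

    Hypothetical kernel: exactly 1.0 when record_time == t, approaching 0
    as |record_time - t| grows. sigma is an assumed width in seconds.
    """
    distance = record_time - t
    return math.exp(-(distance * distance) / (2.0 * sigma * sigma))

# Assumed example: natural language input received at t = 120 s.
weights = [gaussian_weight(rt, t=120.0) for rt in (0.0, 90.0, 120.0, 150.0)]
```

Any other function with the same two properties (a peak at t and convergence to 0 away from t) could be substituted without changing how the weights are used.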

[0126] According to one embodiment, an electronic device may receive user input to omit playback of a specific playback section of a video. Hereinafter, with reference to FIG. 8, the operation of an electronic device to search for at least one record related to the user input from a database while playback of a specific playback section is omitted from the entire playback section is described.

[0127] FIG. 8 illustrates an exemplary operation of an electronic device that searches a database based on playback history. The electronic device of FIG. 8 may include the electronic device (101) of FIG. 1, the STB (125) of FIG. 2, and / or the client device (201) of FIG. 2.

[0128] Referring to FIG. 8, the electronic device can acquire the video (810) and a database corresponding to the video (810) based on user input for playing the video (810). Time point t1 may correspond to the starting position of the playback section of the video (810). The electronic device can output the frames of the video (810) sequentially through a display, starting from the frame of the video (810) corresponding to time point t1.

[0129] Referring to FIG. 8, an embodiment is illustrated in which a database containing 13 records is received along with a video (810). In FIG. 8, a white box marked with the number k may indicate a playback section of the video (810) corresponding to the k-th record among the records. For example, a playback section of the first record may be included in a playback section (821) between time t1 and time t2, and a playback section of the second record may be at least partially included in the playback section (821).

[0130] It is assumed that while playing a frame corresponding to time point t2 within the entire playback period of the video (810), the electronic device receives user input to skip (or jump) the playback of the video (810). For example, the electronic device may receive user input at time point t2 to display a frame of the video (810) corresponding to time point t3 after time point t2. Based on the user input, the electronic device may output the frames of the video (810) after time point t3 sequentially through the display, starting from the frame of the video (810) corresponding to time point t3. For example, frames of the video (810) located in the playback period between time point t2 and time point t3 may not be output through the display.

[0131] In the above-mentioned assumed case, at time t5 after time t3, the electronic device may receive user input related to the video (810) (e.g., user input including a remark (830)). The remark (830) may include a question related to the content of the video (810), such as “What is the main character doing now?” The electronic device, having identified a natural language question such as the remark (830) from the user input, may search for at least one record among 13 records stored in the database to be input into the electronic device’s language model (e.g., the first language model (226) of FIG. 2). To search for records, the electronic device may determine weights to be applied to each of the records using a function (850) having a peak value at time t5.

[0132] Referring to FIG. 8, in an exemplary case where playback of a playback interval between time point t2 and time point t3 of the video (810) is omitted, the electronic device may remove records associated with said playback interval (e.g., third to eighth records) from the search range of records to be searched based on user input including a remark (830). For example, records within the playback interval between time point t2 and time point t3 may not be searched as records associated with user input. The electronic device may identify that playback of the playback interval between time point t2 and time point t3 has been omitted based on a playback history (e.g., playback history information (222) of FIG. 2). The electronic device that has identified that playback of the playback interval between time point t2 and time point t3 has been omitted may merge the playback interval before time point t2 (821) and the playback interval after time point t3 (822) to identify a timeline (840) for applying weights based on a function (850). On the timeline (840), time t2 and time t3 may coincide.

[0133] Referring to FIG. 8, when a weight based on the function (850) is determined on the timeline (840), a weight based on the function (850) (e.g., a weight that does not converge to 0) may be applied to a playback section (821) that is spaced apart from time point t5 within the entire playback section of the video (810). For example, the weight to be applied to a first record corresponding to the playback section (821) may be determined as the output value of the function (850) for an input obtained by subtracting the time difference between time point t2 and time point t3 from the time difference between time point t5 and time point 1 associated with the first record. Likewise, the weight to be applied to a second record corresponding to the playback section (821) may be determined as the output value of the function (850) for an input obtained by subtracting the time difference between time point t2 and time point t3 from the time difference between time point t5 and time point 2 associated with the second record.
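The merged-timeline calculation of FIG. 8 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the helper names (`skipped_between`, `merged_distance`, `in_skipped`) and the numeric time values are hypothetical, and each record is reduced to a single representative time.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds; hypothetical representation

def skipped_between(a: float, b: float, skipped: List[Interval]) -> float:
    """Total skipped duration that lies between times a and b (a <= b)."""
    total = 0.0
    for start, end in skipped:
        overlap = min(b, end) - max(a, start)
        if overlap > 0:
            total += overlap
    return total

def merged_distance(record_time: float, query_time: float,
                    skipped: List[Interval]) -> float:
    """Distance on the merged timeline: raw distance minus skipped time in between."""
    lo, hi = sorted((record_time, query_time))
    return (hi - lo) - skipped_between(lo, hi, skipped)

def in_skipped(start: float, end: float, skipped: List[Interval]) -> bool:
    """True if a record's playback section lies entirely inside a skipped interval."""
    return any(s <= start and end <= e for s, e in skipped)

# Assumed example values: t2 = 100 s, t3 = 400 s, t5 = 450 s.
skipped = [(100.0, 400.0)]
d_first = merged_distance(20.0, 450.0, skipped)  # record time assumed at 20 s
```

The resulting distance would then be passed to the weight function (850), and records for which `in_skipped` is true would be left out of the search range.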

[0134] As described above, the electronic device can search for a record based on user input within playback segments (821, 822) substantially played through the electronic device. If at least one record is identified from the database based on the search, the electronic device can execute a language model (e.g., the first language model (226) of FIG. 2) using the identified at least one record. Response information output from the language model can be output (directly) by the electronic device without communication with a server.

[0135] Although an operation in which an electronic device searches a database using playback segments (821, 822) that were played through the electronic device has been described, the embodiments are not limited thereto. For example, if no record is identified from the database, the electronic device may request a server (e.g., server (202) in FIG. 2) to transmit response information for user input along with information indicating the playback segments (821, 822) that were played through the electronic device. The server may search a database (e.g., database (250) in FIG. 2) stored on the server using the information indicating the playback segments (821, 822) to obtain at least one record related to user input among the records related to the playback segments (821, 822) that were played through the electronic device.

[0136] FIG. 9 illustrates an exemplary operation of a server (e.g., server (202) of FIG. 2) for updating a database. The server (110) of FIG. 1 and / or the server (202) of FIG. 2 may perform the operation of the server described with reference to FIG. 9. The first client device and the second client device of FIG. 9 may be included in the electronic device (101) of FIG. 1, the STB (125), and / or the client device (201) of FIG. 2.

[0137] Referring to FIG. 9, 13 records may be stored in the database of the server for a video corresponding to the database. A box marked with the number k may indicate a playback section within the video corresponding to the k-th record among the records. As described above with reference to FIG. 2, the server may be connected to a plurality of client devices, including the two client devices of FIG. 9 (e.g., a first client device and a second client device). Based on user input received from at least one of the plurality of client devices, the server may search the database stored in the server. Based on the search of the database, the server may update at least one of the records in the database.

[0138] Referring to FIG. 9, an exemplary state is illustrated in which the second record (910) and the thirteenth record (920) are updated after the database corresponding to the video is transmitted to the first client device and the second client device. The server may determine whether to update the database of each of the client devices based on the playback times of each of the client devices (e.g., the first client device and the second client device) that are playing the video.

[0139] For example, the server may determine whether to transmit each of the second record (910) and the thirteenth record (920) to the first client device, based on the playback time t1 of the video being displayed on the first client device. Referring to FIG. 9, the playback time t1 may be located before the second record (910) on the time axis. In other words, the first client device is likely to play the playback segment corresponding to the second record (910) after the playback time t1 (e.g., when it does not receive user input to skip the playback segment corresponding to the second record (910)). Based on the update of the second record (910), the server may transmit the updated second record (910) to a client device (e.g., the first client device), among the client devices connected to the server, whose playback time (e.g., playback time t1) is before the playback segment of the second record (910). In contrast, since the playback time t2 of the video being played on the second client device is located after the playback interval corresponding to the second record (910) on the time axis, the server may not transmit the updated second record (910) to the second client device.

[0140] A first client device that receives an updated second record (910) can store the received second record (910) in a database. For example, the second record (910) that was stored in the database can be changed to an updated version of the second record (910) received from the server.

[0141] Referring to FIG. 9, of the updated second record (910) and thirteenth record (920), the server may transmit to the second client device the thirteenth record (920), which corresponds to a playback interval after the playback time t2 of the video being played through the second client device. The second client device may update the database at least partially using the thirteenth record (920) transmitted from the server. Based on the update, the second client device may perform a search of the database based on user input.

[0142] As described above, the server can continuously update the databases of multiple client devices connected to the server. The server can select, from among the client devices connected to the server, a client device to transmit the updated record(s) by comparing the playback time of a video being played through a client device with the playback section of the updated record(s) in the database.
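The selection of client devices described with reference to FIG. 9 can be sketched in Python as below. The class and function names, the client identifiers, and the time values are hypothetical. The treatment of a client whose playback time falls inside the updated record's playback section is not specified in the description; this sketch assumes such a client is not selected.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class UpdatedRecord:
    record_id: int
    section_start: float  # start of the record's playback section, in seconds
    section_end: float    # end of the record's playback section, in seconds

def clients_to_notify(record: UpdatedRecord,
                      playback_times: Dict[str, float]) -> List[str]:
    """Return the clients whose playback time is before the record's section."""
    return [cid for cid, t in playback_times.items() if t < record.section_start]

# Assumed example: the first client is at t1 = 10 s, the second at t2 = 500 s.
playback_times = {"first": 10.0, "second": 500.0}
second_record = UpdatedRecord(record_id=2, section_start=60.0, section_end=120.0)
thirteenth_record = UpdatedRecord(record_id=13, section_start=700.0, section_end=760.0)

targets_2 = clients_to_notify(second_record, playback_times)
targets_13 = clients_to_notify(thirteenth_record, playback_times)
```

With these assumed values, only the first client would receive the updated second record, while both clients would receive the updated thirteenth record.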

[0143] The operation of a server and / or client device that generates response information for user input using a database has been described, but the embodiments are not limited thereto. Below, with reference to FIG. 10, an exemplary operation of a client device that recommends information related to video using a database is described.

[0144] FIG. 10 illustrates an exemplary operation of an electronic device (101) that displays information related to a video (1010) based on a database. The electronic device (101) of FIG. 1, the STB (125) of FIG. 2, and / or the client device (201) of FIG. 2 can perform the operation of the electronic device (101) described with reference to FIG. 10.

[0145] Referring to FIG. 10, an exemplary state of an electronic device (101) playing a video (1010) is illustrated. For example, an exemplary state of an electronic device (101) playing a frame corresponding to a playback time tc of the video (1010) is illustrated. A database corresponding to the video (1010) may be stored in the electronic device (101). Referring to FIG. 10, exemplary records (1031, 1032, 1033) included in the database, and time intervals (1021, 1022, 1023) corresponding to the records (1031, 1032, 1033) are illustrated.

[0146] Referring to FIG. 10, the electronic device (101) may display information related to the video (1010) together with the video (1010) while playing the video (1010). For example, the electronic device (101) may display, on the display, a visual object (1040) (e.g., an assistant UI (user interface)) and a visual object (1045) containing information related to the video (1010). Referring to FIG. 10, an exemplary screen is illustrated in which an avatar represented by the visual object (1040) speaks the information contained in the visual object (1045). The UI displayed by the electronic device (101) to provide information about the video (1010) is not limited to the embodiment of FIG. 10.

[0147] Information to be provided with the video (1010) can be retrieved from records in a database. For example, among records (1031, 1032, 1033) corresponding to playback intervals (1021, 1022, 1023) that are adjacent, on the time axis, to playback time tc of the video (1010) being played through the electronic device (101), the electronic device (101) can identify a record (1031) associated with a relatively large number of natural language inputs (e.g., a relatively high numerical value stored in the “cache hit count” column). The electronic device (101) can display a visual object (1045) containing information stored in the identified record (1031). The search of the database by the electronic device (101) can be performed based on a function defined for calculating weights, as described with reference to FIGS. 5 through 7, and / or the distance between the playback time tc and the playback interval of the record.
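A minimal Python sketch of this recommendation step is shown below. The `Record` fields, the `window` parameter that defines which playback intervals count as adjacent, and the sample data are assumptions made for illustration; the description does not fix how adjacency is measured.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Record:
    start: float            # start of the playback interval, in seconds
    end: float              # end of the playback interval, in seconds
    data: str               # information about the video stored in the record
    cache_hit_count: int    # number of times the record was used for responses

def distance_to_interval(tc: float, rec: Record) -> float:
    """Distance from playback time tc to the record's interval (0 if inside)."""
    if rec.start <= tc <= rec.end:
        return 0.0
    return min(abs(tc - rec.start), abs(tc - rec.end))

def recommend(records: List[Record], tc: float, window: float = 60.0) -> Optional[Record]:
    """Among records near tc, pick the one with the highest cache hit count."""
    nearby = [r for r in records if distance_to_interval(tc, r) <= window]
    return max(nearby, key=lambda r: r.cache_hit_count, default=None)

# Assumed sample data standing in for records (1031, 1032, 1033).
records = [
    Record(100.0, 130.0, "The lead actor also directed this scene.", 12),
    Record(140.0, 170.0, "The scene was filmed on location.", 3),
    Record(180.0, 210.0, "The soundtrack is an original score.", 5),
]
picked = recommend(records, tc=150.0)
```

The text of `picked.data` would then be shown in a visual object such as the visual object (1045).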

[0148] FIG. 11 illustrates an exemplary flowchart for explaining the operation of a server that receives a request from an external electronic device to generate response information for user input. The operations of FIG. 11 may be performed by the server (110) of FIG. 1 and / or the server (202) of FIG. 2. The external electronic device of FIG. 11 may include the electronic device (101) of FIG. 1, the STB (125), and / or the client device (201) of FIG. 2. The order in which the operations of FIG. 11 are performed is not limited to the order shown in FIG. 11. For example, the server may perform the operations of FIG. 11 in an order different from the order shown in FIG. 11. For example, at least two of the operations of FIG. 11 may be performed substantially simultaneously.

[0149] Referring to FIG. 11, in operation (1110), according to one embodiment, a server may receive a request from an external electronic device to generate response information for a video-based user input. For example, based on receiving the user input, the external electronic device may obtain or search for information related to the user input (e.g., at least one record stored in the database) from the external electronic device's database (e.g., the first database (228) of FIG. 1). If information related to the user input is not obtained from the database, the external electronic device may transmit a signal indicating the request of operation (1110) to the server. Upon receiving the request of operation (1110), the server may perform operation (1120).

[0150] Referring to FIG. 11, in operation (1120), according to one embodiment, a server may identify at least one record related to user input from a database corresponding to a video. The database of operation (1120) may include the second database (250) of FIG. 2. The server may perform operation (1120) by performing the operation described above with reference to FIG. 5 through FIG. 8. If at least one record related to user input is identified from the database (1120-Yes), the server may perform operation (1130). If no record related to user input is identified from the database (1120-No), the server may perform operation (1150).

[0151] Referring to FIG. 11, in operation (1130), according to one embodiment, a server may generate response information corresponding to user input using at least one identified record. For example, the server may generate the response information of operation (1130) by executing a language model using the at least one record. The language model executed to generate the response information of operation (1130) may include not only the server's language model (e.g., the second language model (240) of FIG. 2) but also the language model of an external electronic device (e.g., the first language model (226) of FIG. 2).

[0152] Referring to FIG. 11, in operation (1140), according to one embodiment, the server may update the at least one identified record. In one embodiment where the record(s) of the database include the columns of Table 1, the server may increase the numerical value stored in the “cache hit count” column of the at least one record (e.g., the number of times the at least one record was used to generate response information). In one embodiment, the order in which operations (1130, 1140) are performed may be the opposite of the order shown in FIG. 11. In one embodiment, operations (1130, 1140) may be performed substantially simultaneously.

[0153] Referring to FIG. 11, in operation (1150), according to one embodiment, a server can generate response information related to user input using a language model. The server can obtain response information of operation (1150) by inputting information representing user input of operation (1110) into a language model installed on the server (e.g., language model (240) of FIG. 2).

[0154] Referring to FIG. 11, in operation (1160), according to one embodiment, a server may add a record based on user input and response information to a database. For example, the server may add the record of operation (1160) to a database stored on the server (e.g., the second database (250) of FIG. 2). The record may be transmitted to one or more client devices connected to the server, including the external electronic device of operation (1110), based on the operation described with reference to FIG. 9. In one embodiment, the order in which operations (1150, 1160) are performed may be the opposite of the order shown in FIG. 11. In one embodiment, operations (1150, 1160) may be performed substantially simultaneously.

[0155] Referring to FIG. 11, in operation (1170), according to one embodiment, a server may transmit generated response information to an external electronic device. For example, a server that has obtained the response information of operation (1150) may transmit a signal to the external electronic device that causes the external electronic device to output said response information. For example, the server may transmit to the external electronic device, along with the at least one record identified based on operation (1120), a signal that causes the external electronic device to execute a language model using said at least one record. An external electronic device that has received response information may at least partially display or output the response information through a display and / or speaker of the external electronic device.

[0156] Referring to FIG. 11, in operation (1180), according to one embodiment, a server may transmit at least a portion of an updated database to at least one external electronic device playing a video. For example, based on at least one of operations (1140, 1160), a portion of the updated database (e.g., at least one record) may be transmitted from the server to one or more client devices connected to the server. For example, the server may perform operation (1180) of FIG. 11 in a manner similar to the operation of the server described with reference to FIG. 9.
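The server-side flow of operations (1110) through (1170) can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the `Record` fields, the keyword-overlap search standing in for operation (1120), and the stub `echo_model` standing in for a language model are all hypothetical, and operation (1180) is omitted.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Record:
    keywords: List[str]
    data: str
    cache_hit_count: int = 0

def handle_request(user_input: str, database: List[Record],
                   language_model: Callable[[str, List[Record]], str]) -> str:
    """Handle one request (1110) and return response information (1170)."""
    words = set(re.findall(r"\w+", user_input.lower()))
    hits = [r for r in database if words & set(r.keywords)]  # operation (1120)
    if hits:
        response = language_model(user_input, hits)          # operation (1130)
        for r in hits:
            r.cache_hit_count += 1                           # operation (1140)
    else:
        response = language_model(user_input, [])            # operation (1150)
        database.append(Record(sorted(words), response))     # operation (1160)
    return response

def echo_model(question: str, records: List[Record]) -> str:
    """Stub standing in for a language model; it only reports its inputs."""
    return f"{len(records)} record(s) used"

db = [Record(["hero"], "The hero is repairing a bicycle.")]
hit_response = handle_request("What is the hero doing?", db, echo_model)
miss_response = handle_request("Who composed the music?", db, echo_model)
```

After the two calls, the first record's count has increased by one and a new record based on the second question has been added, mirroring the two branches of FIG. 11.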

[0157] FIG. 12 illustrates an exemplary UI output by an electronic device (101) prior to transmitting information related to user input from the electronic device (101) to a server (110). The electronic device (101) of FIG. 1, the STB (125), and / or the client device (201) of FIG. 2 may include the electronic device (101) of FIG. 12. The server (110) of FIG. 1 and / or the server (202) of FIG. 2 may include the server (110) of FIG. 12.

[0158] Referring to FIG. 12, an exemplary state of an electronic device (101) receiving natural language input (or user input) related to the video while displaying the video is illustrated. Based on the natural language input, the electronic device (101) may search a database stored in the electronic device (101) related to the video (e.g., the first database (228) of FIG. 1). The search of the database by the electronic device (101) may include the operations described with reference to FIGS. 5 through 8.

[0159] For example, if no record related to natural language input is identified from the database, the electronic device (101) may display the visual object (1220) of FIG. 12. For example, the electronic device (101) may display the visual object (1220) containing specified text, such as “Shall I send a question to the server?” together with the visual object (1210) (e.g., Assistant UI). At least one of the visual objects (1210, 1220) may be displayed to receive user input confirming whether to transmit information related to natural language input to the server (110).

[0160] With the visual object (1220) displayed, the electronic device (101) may receive a first user input that allows the transmission of information related to natural language input to the server (110). For example, the first user input may be identified based on a user's statement that has the meaning of allowing the transmission of information, such as “Yes, send it.” For example, the first user input may include an input that presses a specific button (e.g., “OK” button) of a remote controller (e.g., the remote controller (120) of FIG. 1). Based on receiving the first user input, the electronic device (101) may transmit information related to natural language input to the server (110). The electronic device (101) may request the server (110) to transmit response information regarding the natural language input. After receiving the first user input, the electronic device (101) may display a visual object (1230) indicating that information is being transmitted to the server (110). A visual object (1230) having the form of an icon is illustrated as an example, but the embodiment is not limited thereto. The server (110) that receives the information may perform the operation described above with reference to FIGS. 1 to 11 to transmit response information and / or information available to generate the response information (e.g., at least one record retrieved from the database of the server (110)) to the electronic device (101).

[0161] With the visual object (1220) displayed, the electronic device (101) may receive a second user input that blocks the transmission of information related to natural language input to the server (110). For example, the second user input may be identified based on a user's statement that has the meaning of not allowing the transmission of information, such as “No, do not send.” For example, the second user input may include an input of pressing a specific button on a remote controller (e.g., a “Cancel” button and / or a “No” button). Based on identifying the second user input, the electronic device (101) may not transmit information related to natural language input to the server (110). After receiving the second user input, the electronic device (101) may display a visual object (1240) indicating that the transmission of information to the server (110) has been blocked. A visual object (1240) having the form of an icon is illustrated as an example, but the embodiment is not limited thereto. Upon receiving the second user input, the electronic device (101) may stop generating or outputting response information based on natural language input.

[0162] Before requesting response information for natural language input from the server (110), the electronic device (101) can display at least one of the visual objects (1210, 1220) to prevent privacy issues that may occur when the natural language input is transmitted to the server (110).
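The confirmation step of FIG. 12 can be sketched in Python as a gate placed in front of the server request. The names `Consent`, `request_with_consent`, and `fake_server` are hypothetical, and the callbacks stand in for the UI prompt and the network call, neither of which is specified at this level of detail.

```python
from enum import Enum
from typing import Callable, List, Optional

class Consent(Enum):
    ALLOW = "allow"  # e.g., "Yes, send it." or an "OK" button press
    DENY = "deny"    # e.g., "No, do not send." or a "Cancel" button press

def request_with_consent(question: str,
                         ask_user: Callable[[str], Consent],
                         send_to_server: Callable[[str], str]) -> Optional[str]:
    """Send the question to the server only if the user allows it."""
    decision = ask_user("Shall I send a question to the server?")
    if decision is Consent.ALLOW:
        return send_to_server(question)
    return None  # transmission blocked; nothing leaves the device

sent: List[str] = []

def fake_server(question: str) -> str:
    """Stub standing in for the server; records what was transmitted."""
    sent.append(question)
    return "answer"

allowed = request_with_consent("Who is this actor?", lambda _: Consent.ALLOW, fake_server)
denied = request_with_consent("Who is this actor?", lambda _: Consent.DENY, fake_server)
```

In the denied case the stub server is never called, which corresponds to the electronic device (101) not transmitting information related to the natural language input.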

[0163] In one embodiment, a method for generating and / or outputting a response to user input related to video more quickly may be required. In one embodiment, a method for generating a response to user input using an artificial intelligence model (e.g., an on-device model) installed in an electronic device may be required. An electronic device (e.g., the electronic device (101) of FIG. 1) according to one embodiment as described above may include a display, a communication circuit, a memory including one or more storage media for storing instructions, and at least one processor including a processing circuit. When executed individually or collectively by the at least one processor, the instructions may cause the electronic device to control the display to display a video. When executed individually or collectively by the at least one processor, the instructions may cause the electronic device to, if information corresponding to a user input related to the video is obtained while the video is being displayed, obtain first response information corresponding to the user input from a database based on the obtained information corresponding to the user input. When executed individually or collectively by the at least one processor, the instructions may cause the electronic device to, if information corresponding to the user input based on the video is not obtained from the database, transmit a request for second response information for the user input to an external electronic device through the communication circuit.
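The local-first behavior with a fallback to the external electronic device can be sketched in Python as below. The function `respond` and its three callbacks are hypothetical names, and the dictionary-based lookup is only a stand-in for the database search described earlier.

```python
from typing import Callable, Dict, List

def respond(user_input: str,
            search_local: Callable[[str], List[str]],
            run_local_model: Callable[[str, List[str]], str],
            request_external: Callable[[str], str]) -> str:
    """Return first response information if the local database has related
    information, otherwise request second response information externally."""
    records = search_local(user_input)
    if records:
        return run_local_model(user_input, records)
    return request_external(user_input)

# Assumed stand-ins for the database, the on-device model, and the server.
local_db: Dict[str, List[str]] = {"hero": ["The hero is repairing a bicycle."]}

def search_local(question: str) -> List[str]:
    return [r for key, recs in local_db.items() if key in question.lower() for r in recs]

def run_local_model(question: str, records: List[str]) -> str:
    return "local: " + records[0]

def request_external(question: str) -> str:
    return "remote: " + question

first = respond("What is the hero doing?", search_local, run_local_model, request_external)
second = respond("Who composed the music?", search_local, run_local_model, request_external)
```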

[0164] For example, when the above instructions are executed individually or collectively by the at least one processor, the electronic device may be caused to obtain the first response information through a language model configured to be executed by the at least one processor based on the user input and the information.

[0165] For example, when the above instructions are executed individually or collectively by the at least one processor, the electronic device may be caused to search the database based on a section displayed on the display among the playback sections of the video to obtain the information.

[0166] For example, when the above instructions are executed individually or collectively by the at least one processor, the electronic device may be caused to acquire, as information, at least one record related to the user input among a plurality of records stored in the database. Each of the plurality of records may include a playback section of the video associated with the record, at least one keyword to be compared with the user input, and data regarding the video.

[0167] For example, when the above instructions are executed individually or collectively by the at least one processor, the electronic device may acquire a playback point of the video corresponding to the user input when it receives the user input. When the above instructions are executed individually or collectively by the at least one processor, the electronic device may be caused to acquire the at least one record based on the playback intervals of the plurality of records and the playback point.

[0168] For example, when the above instructions are executed individually or collectively by the at least one processor, the electronic device may be caused to acquire the at least one record having a playback interval adjacent to the playback time point, using weights based on the distance of each of the playback intervals from the playback time point. The weights may be determined based on a function having a peak value at the acquired playback time point.

[0169] For example, when the above instructions are executed individually or collectively by the at least one processor, the electronic device may be caused to obtain the at least one record by comparing the at least one keyword stored in each of the plurality of records with a word included in the user input. When the above instructions are executed individually or collectively by the at least one processor, the electronic device may be caused to obtain the first response information using the data regarding the video, which is stored as information about the video regarding the at least one keyword within the at least one record.

[0170] For example, when the above instructions are executed individually or collectively by the at least one processor, the electronic device may cause at least one record related to the user input among a plurality of records stored in the database to be obtained as the information. Each of the plurality of records may include a counter value corresponding to the number of times the record is used in relation to the generation of response information to the user input.

[0171] For example, when the above instructions are executed individually or collectively by the at least one processor, the electronic device may cause the counter value of the at least one record to be increased based on identifying the at least one record related to the user input from the database.
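The record structure and selection described in the preceding paragraphs can be sketched in Python as follows. How the keyword comparison and the time-based weight are combined is not specified in the description; this sketch assumes a simple product and a triangular weight, and the names `Record`, `time_weight`, `select_records`, and `span` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List

@dataclass
class Record:
    start: float          # playback section start, in seconds
    end: float            # playback section end, in seconds
    keywords: List[str]   # keywords to be compared with the user input
    data: str             # data regarding the video
    counter: int = 0      # number of times the record was used

def time_weight(rec: Record, playback_point: float, span: float = 120.0) -> float:
    """Assumed triangular weight: 1.0 at the playback point, 0.0 beyond span seconds."""
    midpoint = (rec.start + rec.end) / 2.0
    return max(0.0, 1.0 - abs(midpoint - playback_point) / span)

def select_records(records: List[Record], user_input: str,
                   playback_point: float) -> List[Record]:
    """Select records whose keywords match and whose section is near the playback point."""
    words = set(re.findall(r"\w+", user_input.lower()))
    selected = []
    for rec in records:
        overlap = len(words & {k.lower() for k in rec.keywords})
        if overlap * time_weight(rec, playback_point) > 0:
            rec.counter += 1  # the record was used for this user input
            selected.append(rec)
    return selected

records = [
    Record(0.0, 60.0, ["chef", "kitchen"], "The chef is preparing soup."),
    Record(600.0, 660.0, ["chef"], "The chef closes the restaurant."),
]
selected = select_records(records, "What is the chef cooking?", playback_point=40.0)
```

With these assumed values, only the first record is selected, because the second record matches a keyword but lies too far from the playback point to receive a non-zero weight.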

[0172] For example, the user input may be a first user input. When the instructions are executed individually or collectively by the at least one processor, the electronic device may be caused to obtain the database corresponding to the video from the external electronic device through the communication circuit, based on a second user input that is received prior to the first user input and that corresponds to displaying the video on the display.

[0173] In one embodiment as described above, a method of an electronic device comprising a display, a communication circuit, and a memory may be provided. The method may include an operation of controlling the display to display a video. The method may include an operation of, when information corresponding to a user input related to the video is obtained while the video is being displayed, obtaining first response information corresponding to the user input from a database based on the obtained information corresponding to the user input. The method may include an operation of, when information corresponding to the user input based on the video is not obtained from the database, transmitting a request for second response information for the user input to an external electronic device through the communication circuit.

[0174] For example, the acquiring operation may include an operation of acquiring the first response information through a language model configured to be executed by the at least one processor based on the user input and the information.

[0175] For example, the acquiring operation may include an operation of acquiring the information by searching the database based on a section displayed on the display among the playback sections of the video.

[0176] For example, the acquisition operation may include the operation of acquiring at least one record related to the user input among a plurality of records stored in the database as the information. Each of the plurality of records may include a playback section of the video associated with the record, at least one keyword to be compared with the user input, and data regarding the video.

[0177] For example, the operation of acquiring at least one record may include, upon receiving the user input, an operation of acquiring a playback point of the video corresponding to the user input. The operation of acquiring at least one record may include an operation of acquiring the at least one record based on the playback intervals of the plurality of records and the playback point.

[0178] For example, the operation of obtaining the at least one record may include an operation of obtaining the at least one record having a playback section adjacent to the playback time, using weights based on the distance of each of the playback sections from the playback time. The weights may be determined based on a function having a peak value at the obtained playback time.
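
Paragraph [0178] only requires a weighting function that peaks at the playback time. A Gaussian is one function with that property and is used here purely as an assumption, as are the distance definition, the `sigma_s` width, the `min_weight` cutoff, and the dictionary keys.

```python
import math
from typing import Dict, List


def section_distance(start_s: float, end_s: float, t: float) -> float:
    """Distance in seconds from time t to a section; 0 if t lies inside it."""
    if start_s <= t <= end_s:
        return 0.0
    return start_s - t if t < start_s else t - end_s


def weight(start_s: float, end_s: float, t: float, sigma_s: float = 30.0) -> float:
    """Gaussian weight (an assumed choice) that peaks at 1.0 when the distance is 0."""
    d = section_distance(start_s, end_s, t)
    return math.exp(-(d * d) / (2.0 * sigma_s * sigma_s))


def nearby_records(
    records: List[Dict], t: float, min_weight: float = 0.5, sigma_s: float = 30.0
) -> List[Dict]:
    """Return records weighted at least min_weight, highest weight first."""
    scored = [(weight(r["start_s"], r["end_s"], t, sigma_s), r) for r in records]
    kept = [pair for pair in scored if pair[0] >= min_weight]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in kept]
```

Unlike a strict containment test, this also returns records for sections just before or after the playback time, which may help when a question refers to a scene that has just ended.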

[0179] For example, the operation of obtaining the at least one record may include an operation of obtaining the at least one record by comparing the at least one keyword stored in each of the plurality of records with a word included in the user input. The operation of obtaining the at least one record may include an operation of obtaining the first response information using the data regarding the video that is stored within the at least one record as information about the video associated with the at least one keyword.
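
The keyword comparison in [0179] can be sketched as a set intersection between the words of the user input and each record's keywords. The tokenization rule, the case-insensitive matching, and the names below are assumptions; single-word keywords are assumed, so multi-word keywords would need a different comparison.

```python
import re
from typing import Dict, List, Optional, Set


def words_of(text: str) -> Set[str]:
    """Split text into lowercase words (assumed tokenization)."""
    return set(re.findall(r"\w+", text.lower()))


def match_records(records: List[Dict], user_input: str) -> List[Dict]:
    """Return records with at least one keyword that appears in the user input."""
    words = words_of(user_input)
    return [r for r in records if words & {k.lower() for k in r["keywords"]}]


def first_response_data(records: List[Dict], user_input: str) -> Optional[str]:
    """Return the stored video data of the first matching record, if any."""
    hits = match_records(records, user_input)
    return hits[0]["data"] if hits else None
```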

[0180] For example, the obtaining operation may include an operation of obtaining, as the information, at least one record related to the user input among a plurality of records stored in the database. Each of the plurality of records may include a counter value corresponding to the number of times the record is used in relation to the generation of response information for the user input.

[0181] For example, the obtaining operation may include an operation of increasing the counter value of the at least one record based on identifying the at least one record related to the user input from the database.
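
The counter handling in [0180] and [0181] can be sketched as an increment applied to every record identified for a user input. The dictionary keys (`keywords`, `use_count`) and the matching rule are hypothetical; this section does not state how the counter value is used afterwards.

```python
from typing import Dict, List, Set


def identify_and_count(records: List[Dict], input_words: Set[str]) -> List[Dict]:
    """Identify records related to the input and increase their counter values."""
    hits = [r for r in records if input_words & set(r["keywords"])]
    for r in hits:
        # Increase the per-record counter each time the record is identified.
        r["use_count"] = r.get("use_count", 0) + 1
    return hits
```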

[0182] In one embodiment as described above, a non-transitory computer-readable storage medium storing instructions may be provided. The instructions, when executed by an electronic device comprising a display, a communication circuit, and a memory, may cause the electronic device to control the display to display a video. The instructions, when executed by the electronic device, may cause the electronic device to, when information corresponding to a user input related to the video is obtained while the video is being displayed, obtain first response information corresponding to the user input from a database based on the obtained information corresponding to the user input. The instructions, when executed by the electronic device, may cause the electronic device to, when information corresponding to the user input based on the video is not obtained from the database, transmit a request for second response information for the user input to an external electronic device through the communication circuit.

[0183] According to one embodiment as described above, an electronic device (e.g., electronic device (101) of FIG. 1) may include a display, a communication circuit, a memory comprising one or more storage media for storing instructions, and at least one processor comprising a processing circuit. The instructions, when executed individually or collectively by the at least one processor, may cause the electronic device to display a video (e.g., video (410) of FIG. 4, video (510) of FIG. 5, video (810) of FIG. 8, and / or video (1010) of FIG. 10) through the display. The instructions, when executed individually or collectively by the at least one processor, may cause the electronic device to identify information related to a user input from a database stored in the memory (e.g., first database (228) of FIG. 2) in relation to the video, based on receiving the user input while displaying the video. The instructions, when executed individually or collectively by the at least one processor, may cause the electronic device to obtain first response information for the user input using the information, based on identifying the information related to the user input from the database. The instructions, when executed individually or collectively by the at least one processor, may cause the electronic device to request second response information for the user input from an external electronic device via the communication circuit, based on determining that the information related to the user input is not identified from the database. According to one embodiment, the electronic device may generate or output a response to a user input related to the video (e.g., a response based on the first response information) more quickly. According to one embodiment, the electronic device may generate or output a response to a user input by using an artificial intelligence model (e.g., an on-device model) installed in the electronic device.

[0184] For example, the instructions, when executed individually or collectively by the at least one processor, may cause the electronic device to input the user input and the information into a language model configured to be executed by the at least one processor, thereby obtaining the first response information.

[0185] For example, the instructions, when executed individually or collectively by the at least one processor, may cause the electronic device to identify the information by searching the database based on the section, among the playback sections of the video, that was displayed through the display.

[0186] For example, the instructions, when executed individually or collectively by the at least one processor, may cause the electronic device to identify, as the information, at least one record related to the user input among a plurality of records stored in the database. Each of the plurality of records may include a playback section of the video associated with the record, at least one keyword to be compared with the user input, and data regarding the video.

[0187] For example, the instructions, when executed individually or collectively by the at least one processor, may cause the electronic device to identify, as the information, at least one record related to the user input among a plurality of records stored in the database. Each of the plurality of records may include a counter value indicating the number of times the record was used to generate response information for the user input.

[0188] For example, the instructions, when executed individually or collectively by the at least one processor, may cause the electronic device to increase the counter value of the at least one record based on identifying the at least one record related to the user input from the database.

[0189] For example, the user input may be a first user input, and the instructions, when executed individually or collectively by the at least one processor, may cause the electronic device to obtain the database corresponding to the video from the external electronic device through the communication circuit, based on a second user input for playback of the video that is received prior to the first user input.
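
The ordering in [0189], where the database is obtained when playback is requested and is therefore already present when a question arrives, can be sketched as follows. The class `PlaybackSession`, its method names, and the `download_database` callable are hypothetical, and the database is simplified to a dictionary for illustration.

```python
from typing import Callable, Dict, Optional


class PlaybackSession:
    """Holds the per-video database obtained at playback time (assumed design)."""

    def __init__(self, download_database: Callable[[str], Dict[str, str]]) -> None:
        self._download = download_database
        self.database: Optional[Dict[str, str]] = None

    def on_play_request(self, video_id: str) -> None:
        # Second user input (playback request): obtain the database for this video.
        self.database = self._download(video_id)

    def on_question(self, question: str) -> Optional[str]:
        # First user input (question): answer from the database if it is present.
        if self.database is None:
            return None
        return self.database.get(question)
```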

[0190] For example, the electronic device may include a microphone. The instructions, when executed individually or collectively by the at least one processor, may cause the electronic device to identify the user input using an audio signal obtained from the microphone.

[0191] For example, the instructions, when executed individually or collectively by the at least one processor, may cause the electronic device to display the first response information through the display based on obtaining the first response information.

[0192] For example, the instructions, when executed individually or collectively by the at least one processor, may cause the electronic device to display the second response information through the display based on receiving the second response information from the external electronic device through the communication circuit.

[0193] In one embodiment as described above, a non-transitory computer-readable storage medium storing instructions may be provided. The instructions, when executed by an electronic device comprising a display, a communication circuit, and a memory, may cause the electronic device to display a video through the display. The instructions, when executed by the electronic device, may cause the electronic device to identify information related to a user input from a database stored in the memory in relation to the video, based on receiving the user input while displaying the video. The instructions, when executed by the electronic device, may cause the electronic device to obtain first response information for the user input using the information, based on identifying the information related to the user input from the database. The instructions, when executed by the electronic device, may cause the electronic device to request second response information for the user input from an external electronic device through the communication circuit, depending on whether the information related to the user input is identified in the database.

[0194] For example, the instructions, when executed by the electronic device, may cause the electronic device to input the user input and the information into a language model configured to be executed by at least one processor of the electronic device, thereby obtaining the first response information.

[0195] For example, the instructions, when executed by the electronic device, may cause the electronic device to identify the information by searching the database based on the section, among the playback sections of the video, that was displayed through the display.

[0196] For example, the instructions, when executed by the electronic device, may cause the electronic device to identify, as the information, at least one record related to the user input among a plurality of records stored in the database. Each of the plurality of records may include a playback section of the video associated with the record, at least one keyword to be compared with the user input, and data regarding the video.

[0197] For example, the instructions, when executed by the electronic device, may cause the electronic device to identify, as the information, at least one record related to the user input among a plurality of records stored in the database. Each of the plurality of records may include a counter value indicating the number of times the record was used to generate response information for the user input.

[0198] For example, the instructions, when executed by the electronic device, may cause the electronic device to increase the counter value of the at least one record based on identifying the at least one record related to the user input from the database.

[0199] In one embodiment as described above, a method of an electronic device may be provided. The electronic device may include a first communication circuit configured to be connected to a display device, a second communication circuit connectable to a server, and a memory. The method may include an operation of transmitting a signal representing a video through the first communication circuit. The method may include an operation of identifying information related to a user input from a database stored in the memory in relation to the video, based on receiving the user input while transmitting the signal through the first communication circuit. The method may include an operation of obtaining first response information for the user input using the information, based on identifying the information related to the user input from the database. The method may include an operation of requesting second response information for the user input from the server through the second communication circuit, based on determining that the information related to the user input is not identified from the database.

[0200] For example, the operation of obtaining the first response information may include an operation of transmitting, through the first communication circuit, another signal for displaying the first response information.

[0201] For example, the operation of requesting the second response information may include an operation of transmitting, through the first communication circuit, another signal for displaying the second response information, in response to receiving the second response information from the server through the second communication circuit.

[0202] For example, the identifying operation may include an operation of identifying the information by searching the database based on the section, among the playback sections of the video, that corresponds to the signal being transmitted through the first communication circuit.
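
The two-link arrangement of [0199] to [0201] can be sketched as a device that sends both the video signal and any response overlay over the display link, and uses the server link only when the local database has no matching information. The class `MediaSource`, the message dictionaries, and the callables are hypothetical and stand in for the first and second communication circuits.

```python
from typing import Callable, Dict, Optional


class MediaSource:
    """Device with a display link and a server link (assumed design)."""

    def __init__(
        self,
        send_to_display: Callable[[Dict[str, str]], None],  # first communication circuit
        ask_server: Callable[[str], str],                    # second communication circuit
        local_db: Dict[str, str],
    ) -> None:
        self._send = send_to_display
        self._ask = ask_server
        self._db = local_db

    def stream(self, frame: str) -> None:
        # Signal representing the video, sent over the display link.
        self._send({"kind": "video", "payload": frame})

    def handle_input(self, user_input: str) -> str:
        info: Optional[str] = self._db.get(user_input)
        if info is not None:
            # First response: another signal on the display link.
            self._send({"kind": "response", "payload": info})
            return "first"
        # Second response: requested from the server, then forwarded for display.
        answer = self._ask(user_input)
        self._send({"kind": "response", "payload": answer})
        return "second"
```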

[0203] As used herein, the term “if” will be understood, depending on the context, to mean “when,” “upon,” “in response to determining,” or “in response to detecting.” Similarly, the phrase “if it is determined” or “if [the mentioned condition or event] is detected” will be understood, depending on the context, to mean “upon determining,” “in response to determining,” “upon detecting [the mentioned condition or event],” or “in response to detecting [the mentioned condition or event].”

[0204] The device described above may be implemented as a hardware component, a software component, and / or a combination of a hardware component and a software component. For example, the device and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing unit may execute an operating system (OS) and one or more software applications executed on said operating system. Additionally, the processing unit may access, store, manipulate, process, and generate data in response to the execution of the software. For ease of understanding, the processing unit may be described as being used as a single unit, but those skilled in the art will understand that the processing unit may include multiple processing elements and / or multiple types of processing elements. For example, the processing unit may include multiple processors or one processor and one controller. In addition, other processing configurations, such as parallel processors, are also possible.

[0205] Software may include computer programs, code, instructions, or a combination of one or more of these, and may configure a processing unit to operate as desired or instruct the processing unit independently or collectively. Software and / or data may be embodied in any type of machine, component, physical device, computer storage medium, or device so as to be interpreted by the processing unit or to provide instructions or data to the processing unit. Software may be distributed over networked computer systems and may be stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

[0206] The method according to the embodiment may be implemented in the form of program instructions that can be executed through various computer means and recorded on a computer-readable medium. In this case, the medium may continuously store a program executable by a computer, or temporarily store it for execution or download. Additionally, the medium may be any of various recording or storage means in the form of a single piece of hardware or several pieces of hardware combined, and is not limited to a medium directly connected to a computer system but may be distributed over a network. Examples of media may include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and media configured to store program instructions, including ROM, RAM, and flash memory. Additionally, other examples of media may include recording or storage media managed by app stores that distribute applications, or by sites and servers that supply or distribute various other software.

[0207] Although the embodiments have been described above with reference to limited examples and drawings, those skilled in the art can make various modifications and variations from the description above. For example, suitable results can be achieved even if the described techniques are performed in a different order than described, and / or the components of the described system, structure, device, circuit, etc. are combined or assembled in a form different from described, or replaced or substituted by other components or equivalents.

[0208] Therefore, other implementations, other embodiments, and equivalents to the claims also fall within the scope of the claims set forth below.

Claims

1. An electronic device comprising: a display; a communication circuit; a memory comprising one or more storage media storing instructions; and at least one processor comprising a processing circuit, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: control the display to display a video; if information corresponding to a user input related to the video is obtained while the video is being displayed, obtain first response information corresponding to the user input from a database based on the obtained information corresponding to the user input; and if information corresponding to the user input based on the video is not obtained from the database, transmit a request for second response information for the user input to an external electronic device through the communication circuit.

2. The electronic device of claim 1, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to obtain the first response information through a language model configured to be executed by the at least one processor, based on the user input and the information.

3. The electronic device of claim 1, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to obtain the information by searching the database based on the section displayed on the display among the playback sections of the video.

4. The electronic device of claim 1, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to obtain, as the information, at least one record related to the user input among a plurality of records stored in the database, and wherein each of the plurality of records includes a playback section of the video linked to the record, at least one keyword to be compared with the user input, and data regarding the video.

5. The electronic device of claim 4, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: upon receiving the user input, obtain a playback time of the video corresponding to the user input; and obtain the at least one record based on the playback sections of the plurality of records and the playback time.

6. The electronic device of claim 5, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to obtain the at least one record having a playback section adjacent to the playback time, using weights based on the distance of each of the playback sections from the playback time, and wherein the weights are determined based on a function having a peak value at the playback time.

7. The electronic device of claim 4, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: obtain the at least one record by comparing the at least one keyword stored in each of the plurality of records with a word included in the user input; and obtain the first response information using the data regarding the video that is stored within the at least one record as information about the video associated with the at least one keyword.

8. The electronic device of claim 1, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to obtain, as the information, at least one record related to the user input among a plurality of records stored in the database, and wherein each of the plurality of records includes a counter value corresponding to the number of times the record has been used in generating response information for a user input.

9. The electronic device of claim 8, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to increase the counter value of the at least one record based on identifying the at least one record related to the user input from the database.

10. The electronic device of claim 1, wherein the user input is a first user input, and wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to obtain the database corresponding to the video from the external electronic device through the communication circuit, based on a second user input that is received prior to the first user input and corresponds to the display of the video on the display.

11. A method of an electronic device comprising a display, a communication circuit, and a memory, the method comprising: an operation of controlling the display to display a video; an operation of, if information corresponding to a user input related to the video is obtained while the video is being displayed, obtaining first response information corresponding to the user input from a database based on the obtained information corresponding to the user input; and an operation of, if information corresponding to the user input based on the video is not obtained from the database, transmitting a request for second response information for the user input to an external electronic device through the communication circuit.

12. The method of claim 11, wherein the obtaining operation includes an operation of obtaining the first response information through a language model configured to be executed by the at least one processor, based on the user input and the information.

13. The method of claim 11, wherein the obtaining operation includes an operation of obtaining the information by searching the database based on the section displayed on the display among the playback sections of the video.

14. The method of claim 11, wherein the obtaining operation includes an operation of obtaining, as the information, at least one record related to the user input among a plurality of records stored in the database, and wherein each of the plurality of records includes a playback section of the video linked to the record, at least one keyword to be compared with the user input, and data regarding the video.

15. The method of claim 14, wherein the operation of obtaining the at least one record includes: an operation of, upon receiving the user input, obtaining a playback time of the video corresponding to the user input; and an operation of obtaining the at least one record based on the playback sections of the plurality of records and the playback time.