Electronic apparatus and control method therefor

The electronic device handles objects that are temporarily absent from, or not displayed in, an image: a processor acquires and stores object information and generates a second image that includes the missing object information, so that information about the object continues to be provided.

WO2026127588A1 · PCT designated stage · Publication Date: 2026-06-18 · SAMSUNG ELECTRONICS CO LTD

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
SAMSUNG ELECTRONICS CO LTD
Filing Date
2025-12-09
Publication Date
2026-06-18

AI Technical Summary

Technical Problem

Existing electronic devices fail to continuously provide information about objects that are temporarily absent or not displayed in an image due to temporary obscuration or recognition failures.

Method used

The electronic device employs a processor to acquire and store object information from previous frames, identify potential overlaps or recognition failures, and generate a second image that includes missing object information, ensuring continuous information provision.

Benefits of technology

Enables continuous provision of object information even when objects are temporarily obscured or unrecognized, enhancing user engagement and information accessibility.



Abstract

This electronic device comprises: a communication unit; a memory for storing at least one instruction; and at least one processor, wherein the at least one processor individually or collectively executes the at least one instruction so that the at least one processor: acquires at least one object image from a first image; acquires object information associated with each of the at least one object image; generates a second image on the basis of object information associated with a previous frame image and the object information associated with each of the at least one object image acquired from the first image; and outputs the second image through a display.

Description

Electronic device and control method thereof

[0001] The present disclosure relates to an electronic device capable of continuously providing information about an object even when the object is temporarily absent or temporarily not displayed within an image, and a method for controlling the same.

[0002] An electronic device can perform operations such as generating images corresponding to content or displaying images. Such an electronic device can provide the user not only with images corresponding to the content selected by the user, but also with various information related to the content.

[0003] Embodiments of the present disclosure may solve at least one of the previously described problems and / or disadvantages and provide the advantages described below. Accordingly, the embodiments of the present disclosure provide an electronic device and a method for controlling the same that can continuously provide information about an object even when the object is temporarily not displayed in an image.

[0004] Additional embodiments will be presented in the detailed description below; some will be apparent from the detailed description, and others may be learned by practicing the presented embodiments.

[0005] An electronic device according to an embodiment of the present disclosure is disclosed. The electronic device comprises a communication unit, a memory for storing at least one instruction, and one or more processors, wherein the one or more processors execute the at least one instruction individually or collectively to acquire at least one object image in a first image, acquire object information associated with each of the at least one object image, generate a second image based on object information associated with a previous frame image and the object information associated with each of the at least one object image acquired from the first image, and output the second image through a display.

[0006] The one or more processors may execute the at least one instruction individually or collectively to obtain missing object information, that is, object information that is among the at least one piece of object information stored in advance but is not included in the object information obtained from the current frame image, and to store the missing object information together with the object information obtained from the current frame image in the memory as object information of the current frame image.
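As a minimal, non-limiting sketch of the operation in paragraph [0006], the missing object information can be viewed as the stored entries that have no counterpart in the current frame. The `ObjectInfo` type, its `object_id` and `label` fields, and the function name `merge_with_missing` are hypothetical names introduced only for illustration; the disclosure does not specify a data layout.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ObjectInfo:
    object_id: str  # hypothetical identifier for a recognized object
    label: str      # hypothetical information shown to the user


def merge_with_missing(
    stored: List[ObjectInfo], current: List[ObjectInfo]
) -> Tuple[List[ObjectInfo], List[ObjectInfo]]:
    """Return (frame_info, missing).

    missing    : stored entries that were not obtained from the current frame
    frame_info : current entries followed by the missing entries
    """
    current_ids = {info.object_id for info in current}
    missing = [info for info in stored if info.object_id not in current_ids]
    return list(current) + missing, missing


# Previous frame: two objects were recognized. Current frame: only one.
stored = [ObjectInfo("object_10", "Actor A"), ObjectInfo("object_20", "Actor B")]
current = [ObjectInfo("object_20", "Actor B")]
frame_info, missing = merge_with_missing(stored, current)
```

In this sketch `frame_info` is what would be stored as the object information of the current frame, so the entry for the object that is no longer recognized is carried forward.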

[0007] The one or more processors may execute the at least one instruction individually or collectively to store the missing object information as object information associated with the current frame image, based on an object associated with the missing object information overlapping another object.

[0008] The one or more processors may execute the at least one instruction individually or collectively to store the missing object information as object information associated with the current frame image, based on the one or more processors being unable to identify an object associated with the missing object information.

[0009] The one or more processors may execute the at least one instruction individually or collectively to delete the missing object information from the memory, based on the object associated with the missing object information leaving a preset area.
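Paragraphs [0007] to [0009] together describe when missing object information is kept and when it is deleted. The following is a non-limiting sketch under several assumptions that are not stated in the disclosure: object positions are axis-aligned boxes, "leaving a preset area" means the box no longer intersects that area, and the names `Decision`, `boxes_overlap`, and `decide` are hypothetical.

```python
from enum import Enum
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


class Decision(Enum):
    KEEP_OVERLAPPED = "keep: overlapped by another object"
    KEEP_UNIDENTIFIED = "keep: object could not be identified"
    DELETE_LEFT_AREA = "delete: object left the preset area"


def boxes_overlap(a: Box, b: Box) -> bool:
    """True if the two boxes share any interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def decide(missing_box: Box, other_boxes: List[Box], preset_area: Box) -> Decision:
    """Decide what to do with the information of an object missing from the frame.

    missing_box is the last known (or expected) position of the missing object.
    """
    if not boxes_overlap(missing_box, preset_area):
        return Decision.DELETE_LEFT_AREA
    if any(boxes_overlap(missing_box, other) for other in other_boxes):
        return Decision.KEEP_OVERLAPPED
    # Still inside the area and not hidden by another object: treat it as a
    # temporary recognition failure (for example, the face is turned away).
    return Decision.KEEP_UNIDENTIFIED


area = (0, 0, 1920, 1080)  # assumed preset area: the whole frame
hidden = decide((100, 100, 200, 200), [(150, 150, 300, 300)], area)
turned_away = decide((100, 100, 200, 200), [(500, 500, 600, 600)], area)
gone = decide((2000, 100, 2100, 200), [], area)
```

Checking the preset area first means that information for an object that has left the area is deleted even if its last known box happens to overlap another object.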

[0010] The one or more processors may execute the at least one instruction individually or collectively to obtain the missing object information if the number of pieces of object information obtained from the current frame is less than the number of pieces of stored object information.

[0011] The one or more processors may execute the at least one instruction individually or collectively to obtain, based on a plurality of objects included in the first image, information regarding the possibility of collision between the plurality of objects, and to store object information related to the collision in the memory based on the information regarding the possibility of collision.

[0012] The one or more processors may execute the at least one instruction individually or collectively to obtain, for each of the plurality of objects, a plurality of expected positions in the next frame based on the current position of the object in the current frame and the position of each of the one or more objects identified in the previous frame, and to obtain information related to the possibility of collision among the plurality of objects based on the plurality of expected positions and the expected position of another object in the current frame image.
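The expected-position step of paragraph [0012] can be sketched as follows. This is only an illustration under assumptions the disclosure does not state: positions are axis-aligned boxes, the next-frame position is obtained by linear extrapolation (current position plus the displacement from a previous-frame position), and two objects are flagged as possibly colliding when any of their expected boxes overlap. The function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def extrapolate(previous: Box, current: Box) -> Box:
    """Assumed motion model: next = current + (current - previous), per coordinate."""
    x0, y0, x1, y1 = (2 * c - p for p, c in zip(previous, current))
    return (x0, y0, x1, y1)


def boxes_overlap(a: Box, b: Box) -> bool:
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def collision_candidates(
    current: Dict[str, Box], previous: List[Box]
) -> List[Tuple[str, str]]:
    """Return pairs of current objects whose expected positions may overlap.

    Each current object gets one expected position per previous-frame object,
    because the correspondence between frames is not assumed to be known.
    """
    expected = {
        name: [extrapolate(prev, box) for prev in previous]
        for name, box in current.items()
    }
    pairs = []
    for first, second in combinations(sorted(expected), 2):
        if any(boxes_overlap(a, b) for a in expected[first] for b in expected[second]):
            pairs.append((first, second))
    return pairs


previous_boxes = [(0, 0, 10, 10), (60, 0, 70, 10)]
current_boxes = {"object_10": (20, 0, 30, 10), "object_20": (46, 0, 56, 10)}
pairs = collision_candidates(current_boxes, previous_boxes)
```

Here the two objects move toward each other along the x axis, so one expected box of `object_10` overlaps one expected box of `object_20` and the pair is reported.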

[0013] The position of each object has a plurality of coordinate values corresponding to the outer position of the object, and the one or more processors may execute the at least one instruction individually or collectively to obtain, for each of the expected positions of each object, probability information that is inversely proportional to the average change in movement of each coordinate.
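One possible reading of the probability information in paragraph [0013] is sketched below. It assumes, beyond what the text states, that the "average change in movement" is the mean absolute difference of the box coordinates between a previous-frame object and the current object, that a small constant `eps` guards against division by zero, and that the weights are normalized to sum to 1. All names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def candidate_probabilities(
    current: Box, previous_boxes: List[Box], eps: float = 1e-6
) -> List[float]:
    """Weight each previous-frame candidate inversely to its mean coordinate change."""
    weights = []
    for prev in previous_boxes:
        mean_change = sum(abs(c - p) for c, p in zip(current, prev)) / len(current)
        weights.append(1.0 / (mean_change + eps))
    total = sum(weights)
    return [w / total for w in weights]


# The first candidate moved 5 units per coordinate on average, the second 20,
# so the first receives roughly four times the weight of the second.
probabilities = candidate_probabilities(
    (20, 0, 30, 10), [(10, 0, 20, 10), (60, 0, 70, 10)]
)
```

A candidate that requires a smaller movement to reach the current position is therefore treated as the more plausible origin, and its expected position carries more weight.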

[0014] The one or more processors may execute the at least one instruction individually or collectively to control the communication unit to transmit an object image corresponding to a face in the first image to a server, and to obtain, from the server, object information including at least one of person information and work information corresponding to the face.
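The exchange in paragraph [0014] can be illustrated with a stand-in for the server. The disclosure does not define a protocol, so everything here is an assumption: the transport is abstracted as a callable `send`, the response is a dictionary, the keys `"person"` and `"work"` are hypothetical, and `fake_server` merely returns canned data.

```python
from typing import Callable, Dict


def request_object_info(
    face_image: bytes, send: Callable[[bytes], Dict[str, str]]
) -> Dict[str, str]:
    """Send a cropped face image and keep only person and work information."""
    response = send(face_image)
    # The server may return either field, both, or neither.
    return {key: response[key] for key in ("person", "work") if key in response}


def fake_server(face_image: bytes) -> Dict[str, str]:
    """Stand-in for a real server (hypothetical); not a real API."""
    if not face_image:
        return {}
    return {"person": "Actor A", "work": "Example Drama", "debug": "ignored"}


info = request_object_info(b"fake-face-crop", fake_server)
```

Passing the transport in as a parameter keeps the sketch independent of any particular communication module of the communication unit.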

[0015] A control method for an electronic device, executed individually or collectively by one or more processors according to one embodiment of the present disclosure, comprises the steps of: acquiring at least one object image in a first image; acquiring object information associated with each of the at least one object image; generating a second image based on object information associated with a previous frame image and the object information associated with each of the at least one object image acquired from the first image; and outputting the second image through a display.

[0016] The control method comprises the steps of: acquiring missing object information that includes information not included in the object information acquired from the current frame image among at least one object information stored in advance; and storing the missing object information and the object information acquired from the current frame image as object information of the current frame image.

[0017] The step of storing the missing object information and object information obtained from the current frame image involves storing the missing object information as object information associated with the current frame image based on the fact that an object associated with the missing object information overlaps with another object.

[0018] The step of storing the missing object information and the object information obtained from the current frame image involves storing the missing object information as object information associated with the current frame image based on the inability to identify an object associated with the missing object information.

[0019] A non-transitory computer-readable recording medium according to one embodiment of the present disclosure stores instructions that, when executed by one or more processors, cause the one or more processors to perform, individually or collectively, operations comprising: acquiring at least one object image in a first image; acquiring object information associated with each of the at least one object image; generating a second image based on object information associated with a previous frame image and the object information associated with each of the at least one object image acquired from the first image; and outputting the second image through a display.

[0020] The above-described or other aspects, features, and benefits of embodiments of the present disclosure will become more apparent from the following description with reference to the accompanying drawings. In the accompanying drawings:

[0021] FIG. 1 is an example of a view providing object information according to one embodiment of the present disclosure,

[0022] FIG. 2 is a block diagram illustrating the configuration of an electronic device according to one embodiment of the present disclosure,

[0023] FIG. 3 is a block diagram illustrating the configuration of an electronic device according to one embodiment of the present disclosure,

[0024] FIG. 4 is a drawing illustrating the process of acquiring an object and acquiring object information according to one embodiment of the present disclosure,

[0025] FIG. 5 is a drawing illustrating an example of the location of a detected object according to one embodiment of the present disclosure,

[0026] FIG. 6 is a drawing illustrating a method for calculating the probability of collision of an object according to one embodiment of the present disclosure,

[0027] FIG. 7 is a drawing illustrating a method for calculating the probability of collision of an object according to one embodiment of the present disclosure,

[0028] FIG. 8 is a drawing illustrating an example of a screen according to one embodiment of the present disclosure,

[0029] FIG. 9 is a drawing illustrating an example of operation according to one embodiment of the present disclosure, and

[0030] FIG. 10 is a drawing illustrating a processor of an electronic device according to one embodiment of the present disclosure.

[0031] The embodiments described herein are subject to various modifications and may have various forms; specific embodiments are illustrated in the drawings and described in detail in the detailed description. However, this is not intended to limit the scope of specific embodiments and should be understood to include various modifications, equivalents, and / or alternatives of the embodiments of the present disclosure. In relation to the description of the drawings, similar reference numerals may be used for similar components.

[0032] In describing the present disclosure, if it is determined that a detailed description of related known functions or configurations could unnecessarily obscure the essence of the present disclosure, such detailed description is omitted.

[0033] Additionally, the following embodiments may be modified in various other forms, and the scope of the technical concept of the present disclosure is not limited to the following embodiments. Rather, these embodiments are provided to make the present disclosure more thorough and complete and to fully convey the technical concept of the present disclosure to those skilled in the art.

[0034] The terms used in this disclosure are used merely to describe specific embodiments and are not intended to limit the scope of the rights. The singular expression includes the plural expression unless the context clearly indicates otherwise.

[0035] In the present disclosure, expressions such as “have,” “may have,” “include,” or “may include” indicate the presence of such features (e.g., numerical values, functions, actions, or components, etc.) and do not exclude the presence of additional features.

[0036] In the present disclosure, expressions such as “A or B,” “at least one of A or / and B,” or “one or more of A or / and B” may include all possible combinations of items listed together. For example, “A or B,” “at least one of A and B,” or “at least one of A or B” may refer to cases including (1) at least one A, (2) at least one B, or (3) both at least one A and at least one B.

[0037] Expressions such as “1st,” “2nd,” “first,” or “second” used in this disclosure may modify various components regardless of order and / or importance, and are used only to distinguish one component from another and do not limit said components.

[0038] Where it is stated that a component (e.g., a first component) is "(operatively or communicatively) coupled with / to" or "connected to" another component (e.g., a second component), it should be understood that the component may be directly connected to the other component or connected through yet another component (e.g., a third component).

[0039] On the other hand, when it is stated that a certain component (e.g., a first component) is "directly connected" or "directly coupled" to another component (e.g., a second component), it may be understood that no other component (e.g., a third component) exists between said certain component and said other component.

[0040] As used in this disclosure, the expression “configured to” may be replaced, depending on the context, with, for example, “suitable for,” “having the capacity to,” “designed to,” “adapted to,” “made to,” or “capable of.” The term “configured to” does not imply “specifically designed to” in hardware.

[0041] Instead, in some situations, the expression “device configured to do something” may mean that the device is “capable of doing something” together with other devices or components. For example, the phrase “processor configured (or set) to perform A, B, and C” may mean a dedicated processor for performing those operations (e.g., an embedded processor), or a general-purpose processor (e.g., a CPU or application processor) capable of performing those operations by executing one or more software programs stored in a memory device.

[0042] In the embodiments, a 'module' or 'part' performs at least one function or operation and may be implemented in hardware or software, or a combination of hardware and software. Additionally, a plurality of 'modules' or a plurality of 'parts' may be integrated into at least one module and implemented by at least one processor, except for the 'module' or 'part' that needs to be implemented in specific hardware.

[0043] Operations performed by a module, program, or other component according to various embodiments may be executed sequentially, in parallel, iteratively, or heuristically, or at least some operations may be executed in a different order, omitted, or other operations may be added.

[0044] Meanwhile, the various elements and areas in the drawings are depicted schematically. Accordingly, the technical concept of the present invention is not limited by the relative sizes or spacing depicted in the attached drawings.

[0045] Meanwhile, an electronic device according to various embodiments of the present disclosure may include, for example, at least one of a smartphone, a tablet PC, a desktop PC, a laptop PC, a server, or a wearable device. The wearable device may include at least one of an accessory type (e.g., a watch, ring, bracelet, anklet, necklace, glasses, contact lens, or head-mounted device (HMD)), a fabric or clothing integrated type (e.g., electronic clothing), a body-attached type (e.g., a skin pad or tattoo), or a bio-implantable circuit.

[0046] In some embodiments, the electronic device may include, for example, at least one of a television, a DVD (digital video disk) player, an audio device, a refrigerator, an air conditioner, a vacuum cleaner, an oven, a microwave, a washing machine, an air purifier, a set-top box, a home automation control panel, a security control panel, a media box (e.g., Samsung HomeSync™, Apple TV™, or Google TV™), a game console (e.g., Xbox™ or PlayStation™), an electronic dictionary, an electronic key, a camcorder, or an electronic photo frame. Meanwhile, among the electronic devices described above, a device having a display may be referred to as a display device. Meanwhile, even if the electronic device of the present disclosure does not have a display, it may be a set-top box or a PC that provides images to a display device.

[0047] Hereinafter, embodiments according to the present disclosure are described in detail with reference to the attached drawings so that those skilled in the art can easily implement them.

[0048] FIG. 1 is a drawing for explaining an object information providing operation according to one embodiment of the present disclosure.

[0049] Referring to FIG. 1, the electronic device (100) can display a first image (101) corresponding to content and a second image (103) containing object information in response to a user's request. Meanwhile, in the illustrated example, for ease of explanation, it is assumed that the electronic device directly displays the image, but in implementation, the electronic device may be a device that outputs an image without having a display.

[0050] Here, the first image (101) is an image corresponding to the content, and the second image (103) is an image including the first image (101) and an information display area (102) that displays information about objects within the image. In the illustrated example, the image (101) corresponding to the content and the information display area (102) are displayed together, but in implementation, the second image may display only object information. Also, the information display area may not display all acquired object information, but may display only the object information of some objects.

[0051] Here, the screen includes an image displayed on the display of the electronic device (100). The image may also be referred to by terms such as a frame. Various types of objects, such as icons, text, photos, videos, widgets, etc., may be displayed on the screen.

[0052] An object is an identifiable entity that exists physically or can be conceived of abstractly. Such an object may also be referred to as a target, thing, character, person, figure, etc.

[0053] Furthermore, content refers to movies, music, plays, photographs, comics, animations, computer games, text, shapes, colors, sounds, movements, or images, or combinations of these, provided through electronic devices. For content such as movies, plays, and dramas, users may be interested in the people appearing in the content; in specific animations, they may be interested in the characters; and in documentaries, they may be interested in specific buildings, animals, etc.

[0054] For example, if the object were a person, the user might be curious about other works by the actor currently playing the role, or about the costumes the character is wearing.

[0055] Here, a character refers to a figure appearing in the relevant content, such as novels, plays, movies, etc. The term "character" is not limited to human beings and can include anthropomorphic animals, mythical beings, and even inanimate objects within the content. In other words, in games, animations, etc., non-human figures such as animals and mascots can also be considered characters. Such characters may be referred to as protagonists, players, narrators, personas, or characters.

[0056] In an example implementation, if a user is curious about a character, they previously had to search for the character's name on a search site or look up information related to the content on a content provider server. However, the present disclosure can provide information about the object through object recognition.

[0057] Here, object information may include the name of the object, and if the object is a person, it may be referred to as character information. Within the content, character information may include role names, actual cast member names, or arbitrary identifiers, character information, actor information, etc. Additionally, object information may include work information. Here, work information may include the release date and time, cast member information, work title, writer, director information, etc.

[0058] Meanwhile, among object recognition methods, human identification can be performed by detecting a human face. For example, as shown in FIG. 1, a first object (10) and a second object (20) are present in the first image (101), and then, temporarily, only the second object (20) may be displayed, as in the third image (104). For example, in an embodiment, when the first object (10) and the second object (20) are moving toward each other, if the second object (20) is positioned closer to the screen and the first object (10) is positioned behind it, the first object (10) may not be displayed on the screen.

[0059] In the same or different embodiments, if a person corresponding to a specific object turns around or looks to the side or downward rather than directly at the front of the screen, that is, if only an area insufficient for face recognition is exposed in the image, object recognition for the first object (10) may not be performed in the image at that point in time. Accordingly, the fourth image may not display information about the first object (10).

[0060] However, the viewer is aware that the first object (10) is still located within a specific space in the video and is only temporarily obscured by another object or has its face hidden, so the viewer finds it hard to accept the electronic device's processing result that the first object (10) is not present in the video.

[0061] To solve these problems, the electronic device of the present disclosure may provide a user with a fourth image (106) containing object information by using together object information identified in a third image (104) corresponding to the current frame and object information identified in a first image (101) corresponding to the previous frame.

[0062] For example, the possibility of overlap for each of the first object (10) and the second object (20) detected in the first image (101) can be checked, object information regarding the object with the possibility of overlap can be stored, and the previously stored object information can be used in the next frame. In the same or different embodiments, it is also possible to store all objects detected in the first image (101), and to check whether the previous object overlaps or is temporarily unrecognized in the third image (104) and to use the object information of the previous screen. The specific configuration and operation of such an electronic device will be described later in FIG. 2.

[0063] In addition, although the illustrated example shows the electronic device directly displaying the image, in implementation, the electronic device may be a device that performs only the image generation operation as described above and provides the generated image to a separate display.

[0064] As described above, the electronic device according to the present disclosure can provide various information about a specific object within a screen, and can also continue to provide information about the object even when the object is temporarily not displayed within the screen.

[0065] The above describes an example in which an overlap (or collision) of the movement paths of objects within an image occurs while information about a specific object on the screen is being provided. However, in implementation, the present disclosure can also be applied where, when the movement paths of objects overlap, the electronic device does not provide object information but only stores or identifies the object information included in each image.

[0066] For example, in a scenario where time information regarding an actor named A's appearance in a video is stored, there may be cases where some appearance times (or scenes) are omitted because actor A is temporarily obscured by overlapping with another actor or because character recognition is not performed due to the actor looking in a direction other than the front. However, since the present disclosure allows for the identification of temporary omission times, it is possible to record that the actor is continuously present in the scene even in such cases.
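The appearance-time scenario of paragraph [0066] can be sketched as follows. As an assumption made only for illustration, each frame is given one of three hypothetical status labels for actor A: `"seen"` (recognized), `"omitted"` (identified as a temporary omission caused by overlap or a turned face), or `"absent"`. Frames marked `"omitted"` are counted as part of a continuous appearance.

```python
from typing import List, Tuple


def appearance_intervals(statuses: List[str]) -> List[Tuple[int, int]]:
    """Return inclusive (start_frame, end_frame) intervals in which the actor is present.

    A frame counts as present when its status is "seen" or "omitted".
    """
    intervals = []
    start = None
    for index, status in enumerate(statuses):
        present = status in ("seen", "omitted")
        if present and start is None:
            start = index
        elif not present and start is not None:
            intervals.append((start, index - 1))
            start = None
    if start is not None:
        intervals.append((start, len(statuses) - 1))
    return intervals


statuses = ["seen", "seen", "omitted", "omitted", "seen", "absent", "seen"]
intervals = appearance_intervals(statuses)
```

Without the `"omitted"` label, frames 2 and 3 would split the first appearance into two shorter ones; with it, the actor is recorded as continuously present from frame 0 to frame 4.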

[0067] Here, a scene can be a basic unit of composition in dramas, films, literary works, etc. However, a scene can also be a shorter unit aligned with the point of change of a character within the screen, or a frame unit.

[0068] FIG. 2 is a block diagram illustrating the configuration of an electronic device according to one embodiment of the present disclosure.

[0069] Referring to FIG. 2, the electronic device (100) may include a communication unit (110), a memory (120), and a processor (130).

[0070] The communication unit (110) is a configuration that performs communication with various types of external devices according to various types of communication methods. The communication unit (110) may include a Wi-Fi module, a Bluetooth module, an infrared communication module, and a wireless communication module, etc. Here, each communication module may include at least one hardware chip or hardware circuit.

[0071] Wi-Fi modules and Bluetooth modules can perform communication via Wi-Fi and Bluetooth methods, respectively. When using a Wi-Fi module or a Bluetooth module, various connection information, such as SSID and session key, is transmitted and received first; after establishing a communication connection using this information, various types of information can be transmitted and received.

[0072] The infrared communication module performs communication according to infrared communication (IrDA, Infrared Data Association) technology, which uses infrared rays located between visible light and millimeter waves to wirelessly transmit data over short distances.

[0073] In addition to the communication method described above, the wireless communication module may include at least one communication chip that performs communication according to various wireless communication standards such as Zigbee, 3G (3rd Generation), 3GPP (3rd Generation Partnership Project), LTE (Long Term Evolution), LTE-A (LTE Advanced), 4G (4th Generation), and 5G (5th Generation).

[0074] In addition, the communication unit (110) may include at least one wired communication module that performs communication using a LAN (Local Area Network) module, an Ethernet module, a twisted-pair cable, a coaxial cable, a fiber optic cable, or a UWB (Ultra Wide-Band) module.

[0075] According to one example, the communication unit (110) may use the same communication module (e.g., Wi-Fi module) to communicate with external devices such as a remote control and an external server.

[0076] According to other examples, the communication unit (110) may use different communication modules (e.g., Wi-Fi modules) to communicate with external devices such as a remote control and external servers. For example, the communication unit (110) may use at least one of an Ethernet module or a Wi-Fi module to communicate with an external server, and may use a Bluetooth module to communicate with an external device such as a remote control. However, this is merely one embodiment, and the communication unit (110) may use at least one of various communication modules when communicating with multiple external devices or external servers.

[0077] The communication unit (110) can receive content. This content can be diverse, such as movies, music videos, dramas, short videos, etc. Although the content is assumed to be video, it may also be an image, which may likewise be referred to as video.

[0078] The communication unit (110) can receive information from a website or social media. For example, the communication unit (110) can provide a query (or search term, etc.) to a specific site under the control of the processor (130) and receive response information corresponding to the query. Here, the query may be a combination of keywords necessary to obtain search results for a person displayed in the content, such as "title and person" or "title and performer" corresponding to the specific content.

[0079] The communication unit (110) can receive not only content but also information necessary for providing various applications and services of the electronic device (100) from an external device. For example, in cases where a database of people is not created directly, the communication unit (110) can obtain a database stored on an external server.

[0080] When the communication unit (110) uses an external DB or external module for person search, it can transmit an image for the search and receive corresponding result information.

[0081] The communication unit (110) can also transmit the database generated in the process described later to an external server. In this way, by registering the database generated by one device to the server, other devices can use the database registered to the server.

[0082] The memory (120) may be implemented as internal memory such as ROM (e.g., EEPROM (electrically erasable programmable read-only memory)) or RAM included in the processor (130), or as memory separate from the processor (130). In this case, the memory (120) may be implemented in the form of memory embedded in the electronic device (100) or in the form of memory that can be attached to and detached from the electronic device (100), depending on the purpose of data storage. For example, data for operating the electronic device (100) may be stored in memory embedded in the electronic device (100), and data for the expansion function of the electronic device (100) may be stored in memory that can be attached to and detached from the electronic device (100).

[0083] The memory (120) can store a database created in the process described below. Although the above description assumes that the electronic device (100) directly creates and uses a database, it is also possible to receive and use a database created by an external device.

[0084] Additionally, the memory (120) can store object images or object information generated during the object recognition process.

[0085] And the memory (120) can store various contents (e.g., broadcast content, applications, etc.) received through the communication unit (110) described above.

[0086] Meanwhile, the memory embedded in the electronic device (100) may be implemented as at least one of volatile memory (e.g., DRAM (dynamic RAM), SRAM (static RAM), or SDRAM (synchronous dynamic RAM)), non-volatile memory (e.g., OTPROM (one time programmable ROM), PROM (programmable ROM), EPROM (erasable and programmable ROM), EEPROM (electrically erasable and programmable ROM), mask ROM, flash ROM, or flash memory (e.g., NAND flash or NOR flash)), a hard drive, or a solid state drive (SSD), and the memory that can be attached to and detached from the electronic device (100) can be implemented in the form of a memory card (e.g., CF (compact flash), SD (secure digital), Micro-SD (micro secure digital), Mini-SD (mini secure digital), xD (extreme digital), MMC (multi-media card), etc.) or external memory that can be connected to a USB port (e.g., USB memory).

[0087] Meanwhile, in the illustrated example, the electronic device (100) is described as including a single memory, but when distinguishing between volatile memory and non-volatile memory, the electronic device (100) may be described as including multiple memories.

[0088] The processor (130) controls the overall operation of the electronic device (100). Specifically, the processor (130) is connected to the components of the electronic device (100), including the communication unit (110) and the memory (120), and can control the overall operation of the electronic device (100) by executing at least one instruction stored in the memory (120) as described above. In particular, the processor (130) can be implemented as a single processor or as a plurality of processors.

[0089] The processor (130) may be implemented as one or more IC (integrated circuit (or circuitry)) chips and may perform various data processing operations. The processor (130) may include at least one electrical circuit and may process instructions (or programs, data, etc.) stored in memory individually or collectively in a distributed manner.

[0090] The processor (130) may include a processor assembly comprising one or more processing circuits. The processor (130) may include any processing circuit that is operative to control the performance and operation of one or more components of an electronic device (e.g., memory and / or driving device (sensor)). For example, the processor (130) (e.g., AP) may be implemented as a system on chip (SoC) (e.g., a single chip or a chipset). For example, the processor (130) may be implemented as multiple cores (or at least one core circuit), multiple chips, or multiple chipsets.

[0091] For example, the processor (130) may include one or more processing circuits configured to perform various functions of the present disclosure individually and / or collectively. As an example without limitation, at least a portion of the processor (130) may be included in a first chip of the electronic device (100), and at least another portion of the processor (130) may be included in a second chip of the electronic device (100) that is different from the first chip.

[0092] For example, the processor (130) may include a central processing unit (CPU), a graphics processing unit (GPU), a neural processing unit (NPU), an image signal processor (ISP), a display controller, a memory controller, a storage controller, a communication processor (CP), and / or a sensor interface. These components of the processor (130) are merely exemplary. The processor (130) may include additional components other than those described above. Additionally, some components of the processor (130) may be omitted. Furthermore, some components of the processor (130) may be included as separate components of the electronic device (100) outside of the processor (130). For example, some components of the processor (130) (e.g., a memory controller) may be included within other components (e.g., at least a portion of memory, an interface (e.g., available for connection to at least one component of the electronic device (100)), a display).

[0093] The processor (130) can cause other components of the electronic device (100) to perform various operations by executing instructions stored in memory (120). The processor (130) processes setting values, function commands, etc. according to a stored control program or control data, and can output control signals related to functions that the electronic device (100) can perform or communication signals for communicating with an external electronic device.

[0094] The processor (130) acquires content using the communication unit (110). For example, when the processor (130) receives a playback command for specific content from a user, it can control the communication unit (110) to receive the content. Such content may be a video such as VOD, but may also be a real-time streaming video provided by a specific server, or broadcast content transmitted by a broadcasting station.

[0095] The processor (130) generates a first image corresponding to the acquired content. For example, if the content is real-time streaming, the processor (130) can construct a screen using video data from the received streaming data and generate a first image using the screen.

[0096] The processor (130) acquires at least one object image from the first image. For example, the processor (130) can use an object recognition model to recognize objects of a preset type within the image and acquire coordinate information of the objects. Here, the objects of a preset type may be human faces, but are not limited thereto. Once such coordinate information is acquired, the processor (130) can extract a region corresponding to the coordinates to acquire an object image.

[0097] Meanwhile, although the illustrated example describes generating an object image corresponding to each object, coordinate values may be used during implementation, and it is also possible to use an image in which only the object image remains after blanking out the area outside the region corresponding to the object in the image. Here, blanking involves covering the area outside the aforementioned region with black (or white) or removing the area outside the aforementioned region so that pixel information, etc., is not present.
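
The two options described above, extracting the region at the detected coordinates and blanking everything outside it, can be sketched as follows. This is a minimal illustration only: the image is assumed to be a row-major list of grayscale pixel values, the box is assumed to be (left, top, right, bottom) with exclusive right and bottom edges, and the names `crop_object` and `blank_outside` are hypothetical rather than taken from the disclosure.

```python
from typing import List, Tuple

# Assumed representation: grayscale pixels stored row by row.
Image = List[List[int]]
# Assumed box layout: (left, top, right, bottom), right/bottom exclusive.
Box = Tuple[int, int, int, int]


def crop_object(image: Image, box: Box) -> Image:
    """Return only the pixels inside the box (the object image)."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def blank_outside(image: Image, box: Box, fill: int = 0) -> Image:
    """Keep the full frame size but cover everything outside the box.

    fill=0 corresponds to covering with black; a white fill would use
    the maximum pixel value instead.
    """
    left, top, right, bottom = box
    return [
        [
            pixel if (top <= y < bottom and left <= x < right) else fill
            for x, pixel in enumerate(row)
        ]
        for y, row in enumerate(image)
    ]
```

Cropping yields a smaller image per object, whereas blanking preserves the original frame size and therefore the position of the object within the frame.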

[0098] Meanwhile, the processor (130) may acquire an object image upon a user's request, or automatically acquire an object image during implementation. That is, it may acquire an object image and acquire object information in advance, and provide the object information acquired in advance in response to a user's request for object information.

[0099] The processor (130) obtains object information corresponding to each of at least one object image based on at least one object image. For example, the processor (130) controls the communication unit (110) to transmit the obtained object image to an external server and can obtain information corresponding to the object image from the external server. For example, if the object described above is a human face, the processor (130) can obtain information such as the name of the person and works in which the person appeared.

[0100] In implementation, the processor (130) may obtain object information using multiple servers. For example, the processor (130) may control the communication unit (110) to transmit an object image to a first server and obtain the name (or identifier) of the object image from the first server.

[0101] The processor (130) may then control the communication unit (110) to transmit the acquired name (or identifier) to a second server and acquire additional information about the object from the second server. In implementation, one of the two operations described above (i.e., acquiring the name or identifier) may be performed independently by the electronic device (100).

[0102] At this time, the processor (130) can store the acquired object information. The processor (130) may store the object information of all acquired objects, or may store only the object information of objects that are likely to collide. That is, the processor (130) may identify only the objects that are likely not to be recognized in the next frame and store only the object information of those objects. Although it has been described here as storing only object information, in implementation, the processor (130) may also store information such as object images, object coordinates, and frame information for the corresponding objects.

[0103] The processor (130) can obtain the possibility of collision between each object when multiple objects are identified in the first image. For example, for each of the multiple objects, the processor (130) can obtain multiple expected positions in the next frame based on the current object's position and each of the multiple object positions obtained in the previous frame, and can determine the possibility of collision between the multiple objects based on the obtained multiple expected positions and the expected positions of other objects in the current frame.

[0104] Information that is inversely proportional to the amount of movement change of each object can be used as such collision possibility information. The processor (130) can calculate the amount of movement change of each object as the average of the amounts of movement change of the multiple coordinate values of that object.
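
The averaging described above can be illustrated with a short sketch. The disclosure does not state how the movement change of a single coordinate is measured, so Euclidean distance between the previous and current coordinate is assumed here, and the function name `average_movement_change` is hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def average_movement_change(prev_coords: List[Point],
                            curr_coords: List[Point]) -> float:
    """Average displacement of an object's coordinates between two frames.

    Assumption: the movement change of one coordinate is the Euclidean
    distance between its previous and current position.
    """
    if not prev_coords or len(prev_coords) != len(curr_coords):
        raise ValueError("both frames must provide the same coordinates")
    total = sum(
        math.hypot(cx - px, cy - py)
        for (px, py), (cx, cy) in zip(prev_coords, curr_coords)
    )
    return total / len(prev_coords)
```

For a rectangle whose four corners all shift by (3, 4), each corner moves by 5, so the average movement change is 5.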

[0105] In an embodiment, the processor (130) generates a second image containing object information when a preset event occurs. For example, the processor (130) may generate a second image containing object information within the current screen when a preset event occurs. Here, the preset event may be a user command requesting information about objects within the currently displayed image.

[0106] At this time, the processor (130) generates a second image containing at least one object information. For example, if the number of object information obtained in the current frame is equal to or greater than the number of object information stored, the processor (130) may generate the second image using only the object information obtained in the current frame.

[0107] Meanwhile, in implementation, instead of using a count, the processor (130) may check whether the object information obtained in the previous frame is also confirmed in the object recognition of the next frame. If no object information was confirmed in the previous frame, or if all objects confirmed in the previous frame are also confirmed in the current frame, it can generate a second image including the object information confirmed in the current frame.

[0108] Meanwhile, the processor (130) can acquire missing object information when the number of object information acquired in the current frame is less than the number of stored object information. For example, if the processor (130) stores all object information of the previous frame, it can acquire (or identify), from among the previously stored object information, the missing object information that is not included in the object information acquired from the current frame image. In the same or a different embodiment, if the processor (130) stores only the object information of objects that are likely to collide among the objects of the previous frame, it need only check whether the stored object information is included in the object information acquired from the current frame image.

[0109] If there is missing object information, the processor (130) can generate a second image using the identified missing object information and the object information identified in the current frame.
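
The selection of object information for the second image can be sketched as follows, under the assumption that object information is keyed by an object identifier. The dictionary layout and the name `select_object_info` are illustrative and not part of the disclosure.

```python
from typing import Dict, List, Tuple


def select_object_info(
    stored: Dict[str, str], current: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Choose the object information to place in the second image.

    stored:  object information kept from the previous frame
    current: object information obtained from the current frame
    Returns the information to display and the identifiers treated
    as missing.
    """
    # Count rule: nothing is missing when the current frame has at
    # least as many entries as were stored.
    if len(current) >= len(stored):
        return dict(current), []

    # Otherwise, stored entries absent from the current frame are missing.
    missing = [obj_id for obj_id in stored if obj_id not in current]
    shown = dict(current)
    for obj_id in missing:
        shown[obj_id] = stored[obj_id]
    return shown, missing
```

With stored information for A and B and current information for A only, the function returns the information for both A and B and reports B as missing.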

[0110] At this time, in the process of determining the missing object information, the processor (130) can distinguish whether the object is missing because it overlaps with another object, because image recognition failed, or because it has moved outside the screen.

[0111] If the processor (130) determines that the object is missing due to overlap with another object or image recognition failure, it may be added to the second image as a missing object as described above; however, if it determines that the object has moved out of a preset area (e.g., screen), it may not be added to the second image and the stored object information may be deleted.
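
The handling policy for a missing object can be sketched as below. How the cause is determined is outside the scope of this sketch and is passed in as an input; the enum `MissingCause` and the function `handle_missing_object` are hypothetical names introduced for illustration.

```python
from enum import Enum, auto
from typing import Dict


class MissingCause(Enum):
    OVERLAP = auto()              # hidden behind another object
    RECOGNITION_FAILURE = auto()  # present but not recognized
    LEFT_SCREEN = auto()          # moved out of the preset area


def handle_missing_object(
    obj_id: str,
    cause: MissingCause,
    stored: Dict[str, str],
    shown: Dict[str, str],
) -> None:
    """Apply the keep-or-delete policy for one missing object.

    stored: object information kept from earlier frames (may be mutated)
    shown:  object information to place in the second image (may be mutated)
    """
    if cause is MissingCause.LEFT_SCREEN:
        # The object is gone: do not display it and drop its stored info.
        stored.pop(obj_id, None)
        return
    # Overlap or recognition failure: keep presenting the stored info.
    if obj_id in stored:
        shown[obj_id] = stored[obj_id]
```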

[0112] Additionally, the processor (130) can generate content corresponding to an object selected by the user. For example, if the user selects performer A, the processor (130) can generate content about the person by using time information corresponding to the person, and by extracting and merging videos within that time information within the content. For example, if the user requests past scenes of the current batter while watching a baseball game, the processor (130) can check all segments where the batter appeared and generate a video by combining those segments.
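
As one possible illustration of combining the segments in which a selected person appears, the sketch below merges overlapping appearance intervals into a clip list. The representation of time information as (start, end) pairs in seconds, the `gap` tolerance, and the name `merge_segments` are assumptions; the disclosure does not specify how segments are merged.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # assumed: (start_seconds, end_seconds)


def merge_segments(segments: List[Segment], gap: float = 0.0) -> List[Segment]:
    """Merge appearance segments that overlap or lie within `gap` seconds."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1] + gap:
            # Extend the last clip instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

The resulting list could then be handed to whatever component extracts and concatenates the corresponding video ranges.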

[0113] As described above, the electronic device according to the present disclosure can provide information about an object included in an image to a user, and can continue to provide information about the object even if a specific object is temporarily invisible due to overlapping with another object or if there is a temporary failure to recognize the specific object.

[0114] Meanwhile, although only a simple configuration constituting the electronic device (100) has been illustrated and described above, various additional configurations may be provided during implementation. This will be explained below with reference to FIG. 3.

[0115] FIG. 3 is a block diagram illustrating the configuration of an electronic device according to one embodiment of the present disclosure.

[0116] Referring to FIG. 3, the electronic device (100') may include a communication unit (110), memory (120), processor (130), input / output interface (140), microphone (150), display (160), and speaker (170).

[0117] The configuration of the communication unit (110), memory (120), and processor (130) was previously described in FIG. 2, and only the operation different from FIG. 2 will be described below.

[0118] The input / output interface (140) may be any one of the following interfaces: HDMI (High Definition Multimedia Interface), MHL (Mobile High-Definition Link), USB (Universal Serial Bus), DP (Display Port), Thunderbolt, VGA (Video Graphics Array) port, RGB port, D-SUB (D-subminiature), and DVI (Digital Visual Interface).

[0119] The input / output interface (140) can input and output at least one of audio and video signals. Depending on the implementation example, the input / output interface (140) may include separate ports for inputting and outputting only audio signals and for inputting and outputting only video signals, or it may be implemented as a single port for inputting and outputting both audio and video signals.

[0120] And the input / output interface (140) can provide a video signal corresponding to an image (or screen) generated by the electronic device (100') or an audio signal together with the video signal to an external device (e.g., a display device, an STB, etc.).

[0121] The microphone (150) can receive the user's voice when active. For example, the microphone (150) may be formed integrally on the upper side, front side, or lateral side of the electronic device (100'). The microphone (150) may include various configurations such as a microphone for collecting analog user voice, an amplifier circuit for amplifying the collected user voice, an A / D conversion circuit for sampling the amplified user voice and converting it into a digital signal, and a filter circuit for removing noise components from the converted digital signal.

[0122] When a user's voice is input through such a microphone (150), the processor (130) can check the content of the user's voice and perform an action corresponding to the content of the voice. For example, the content of the voice may be a request for information about a person output in the content or a request for information about a person currently appearing on the screen.

[0123] Meanwhile, although it has been described above that user voice is input through the microphone (150), the microphone may be provided in a remote control for controlling the electronic device (100'), and user voice input through the microphone provided in the remote control may be input to the electronic device (100') and processed through the communication unit (110) described above.

[0124] The electronic device (100') can operate not only based on the configuration or remote control provided in the electronic device (100'), but also according to control commands from a user terminal device. For example, if the electronic device is a TV or a set-top box, recently, manufacturers provide applications for controlling the TV or set-top box. Such applications can provide a function that allows the user terminal device to be used as a remote control for the electronic device.

[0125] Accordingly, when a user executes an application to control a TV or set-top box using a user terminal device and inputs a voice command through the user terminal device, the electronic device (100') can perform a voice recognition operation using the voice signal input through the user terminal device and perform an operation corresponding to the voice recognition result.

[0126] The display (160) can be implemented as various types of displays such as an LCD (Liquid Crystal Display), an OLED (Organic Light Emitting Diodes) display, and a PDP (Plasma Display Panel). The display (160) may also include a driving circuit, a backlight unit, etc., which can be implemented in forms such as an a-si TFT, an LTPS (low temperature poly silicon) TFT, and an OTFT (organic TFT). Meanwhile, the display (160) can be implemented as a touchscreen combined with a touch sensor, a flexible display, a 3D display, etc.

[0127] The display (160) can display various images. For example, the display (160) can display a first image or a second image generated by the processor (130).

[0128] The speaker (170) can output sound. Specifically, the speaker (170) may be a component that outputs various audio data processed at the input / output interface, as well as various notification sounds or voice messages. Additionally, the speaker (170) may output result information (e.g., person information) corresponding to the voice recognition operation described later.

[0129] Meanwhile, in FIG. 3, the electronic device (100') is illustrated and described as including a display (160). However, in implementation, if the electronic device (100') is a device such as a set-top box that does not include a display, the display may be omitted. Also, depending on the implementation form, the speaker and microphone described above may be omitted. Additionally, although not illustrated in FIG. 3, other components (e.g., a camera or a human body detection sensor) may be further included.

[0130] FIG. 4 is a drawing for explaining a method for acquiring an object and acquiring object information according to one embodiment of the present disclosure.

[0131] Referring to FIG. 4, the electronic device can acquire an image and output an image corresponding to the acquired image to be displayed on a display (operation 405). For example, if the electronic device includes a display, the image can be displayed directly using the display.

[0132] Then, a preset object within the image can be detected using an object recognition model (operation 410). In this detection operation, if the object recognition model detects a preset object, it can output the coordinate values of the object. At this time, the object recognition model may output the center coordinates of each object, or may output coordinate values corresponding to an area (e.g., a rectangle) that includes the shape (e.g., a face) of the object.

[0133] Upon acquiring such coordinate values, the electronic device can use those coordinates to obtain an image of a specific object (i.e., an object image). Once the object image is acquired, the electronic device transmits the object image to an external server and obtains information regarding the object image from that external server. Here, the object information may be the object name; if the object is a physical object, it may include information such as location, nationality, and manufacturer; and if the object is a person, it may include information such as works in which they have appeared.

[0134] Meanwhile, in implementation, object information may be obtained using multiple servers rather than a single server. For example, the name of the object may be obtained from a first server, and additional information (e.g., works in which it appeared) may be obtained from a second server based on the obtained name.

[0135] Meanwhile, although the above description illustrates the transmission of only image information, in implementation, the electronic device may also transmit information related to the current image (e.g., the image title or information within the metadata) to the server.

[0136] The electronic device can then check whether there is a history of object analysis performed prior to the current screen (operation 415). That is, it can check whether the current image is the first scene (or frame) of the content.

[0137] If there is no previous screen, the electronic device (100) can store all object information detected in the current scene (operation 420). Conversely, if there is a previous scene, the electronic device (100) can compare the number of object information in the previous scene with the number of object information in the current scene (operation 425).

[0138] Here, the previous scene can be the frame immediately preceding the current scene, or it can be a frame from a predetermined number of previous frames. This relationship between the previous frame and the current frame can be utilized differently depending on the number of frames in the video, the complexity of the video, etc.

[0139] Meanwhile, although FIG. 4 illustrates and describes that all object information is stored when there is no previous scene, it is also possible to store all object information by default during implementation.

[0140] If the number of objects does not change or increases, the probability of collision between objects in the current scene can be checked (operation 430). Then, information about objects in the current scene whose probability of collision is greater than or equal to a preset probability can be stored (operation 435).

[0141] If the number of objects decreases, that is, if a collision occurs between objects or if a specific object moves off-screen, the electronic device checks whether a collision has occurred (operation 445), and if a collision has occurred, it may provide object information to the user by adding previously stored object information to the currently checked object information (operation 455).

[0142] If the object has moved off-screen rather than colliding, the operation of deleting the previously stored object information can be performed (operation 450).
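
The branching of operations 415 to 455 can be summarized in a sketch like the following. It assumes object information keyed by identifier, a precomputed collision probability per object, and an externally supplied flag indicating whether the decrease in object count was caused by a collision; `fig4_step` and all parameter names are hypothetical.

```python
from typing import Dict, Optional


def fig4_step(
    store: Dict[str, str],
    current: Dict[str, str],
    prev_count: Optional[int],
    risk: Dict[str, float],
    threshold: float,
    collided: bool,
) -> Dict[str, str]:
    """One pass through the flow of FIG. 4; returns the info to present.

    store:      previously stored object information (mutated in place)
    current:    object information detected in the current scene
    prev_count: number of objects in the previous scene, None if no history
    risk:       assumed per-object collision probability
    collided:   assumed result of the collision check (operation 445)
    """
    if prev_count is None:                      # operation 415: no history
        store.clear()
        store.update(current)                   # operation 420: store all
        return dict(current)

    if len(current) >= prev_count:              # operation 425: not decreased
        store.clear()                           # operations 430 and 435
        store.update({k: v for k, v in current.items()
                      if risk.get(k, 0.0) >= threshold})
        return dict(current)

    if collided:                                # operation 445: collision
        shown = dict(store)                     # operation 455: add stored
        shown.update(current)
        return shown

    store.clear()                               # operation 450: delete
    return dict(current)
```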

[0143] Meanwhile, in the illustrated example, an embodiment was described in which object overlap in the next scene is predicted based on the current scene (or frame), and object information regarding objects predicted to overlap is stored and utilized. This operation has the advantage of being operable even with limited storage space, as it stores and utilizes only a minimum amount of object information.

[0144] However, considering that this approach may be difficult to apply to cases other than overlap, it is also possible to store all acquired object information, check in the next frame whether objects overlap, whether an object has moved off-screen, or whether object recognition temporarily failed in order to identify missing objects, and then select and use only the information of the missing objects.

[0145] FIG. 5 is a drawing illustrating an example of the location of a detected object according to one embodiment of the present disclosure.

[0146] Referring to FIG. 5, the image (500) may include a plurality of objects (A, B, C).

[0147] The electronic device can input an image (500) into an object recognition model to check (or detect) whether a preset object is included in the image. In this case, the object recognition model can check for the existence of a specific object within the image and, if the specific object exists, output the location (or coordinate value) of the object. This specific object may be a person or a person's face.

[0148] For example, if there are three faces in the image (500), the object recognition model can output four coordinate values (a1, a2, a3, a4) for the first object (510), four coordinate values (b1, b2, b3, b4) for the second object (520), and four coordinate values (c1, c2, c3, c4) for the third object (530).

[0149] Such coordinates can have coordinate values corresponding to an area that can include the entire human face. In the illustrated example, four coordinates were used for one area, but more than four coordinates may be used in implementation.

[0150] When the object recognition model outputs such coordinates, the electronic device (100) can extract an image for each coordinate and transmit the extracted object image to an external server. It can also obtain information about each object image from the external server.

[0151] Meanwhile, although FIG. 5 illustrates and describes generating four coordinates for each object, shapes other than rectangles or five or more coordinates may be used during implementation. Additionally, the objects that the object recognition model can recognize may include not only people but also animals or objects.

[0152] Such object recognition models may utilize different models depending on the genre of the video. For example, a model capable of recognizing human faces as described above may be used for general movies or dramas, a model capable of recognizing characters may be used for comics or animations, and a model capable of recognizing animals may be used for documentaries, etc.

[0153] The following describes a specific operation for calculating the probability of collision between these three objects after they have been recognized.

[0154] FIG. 6 is a diagram illustrating a method for calculating the probability of collision of an object according to an embodiment of the present disclosure. Specifically, the object recognition result from a previous step or operation and the object recognition result from the current step or operation are superimposed and displayed.

[0155] Referring to FIG. 6, the image (600) may include a plurality of objects (A', B', C').

[0156] For example, if there are three faces in the image (600), the object recognition model can output four coordinate values (a1, a2, a3, a4) for the first object (610), four coordinate values (b1, b2, b3, b4) for the second object (620), and four coordinate values (c1', c2', c3', c4') for the third object (630).

[0157] Here, the prime mark (') is not shown on the coordinate values of the first object and the second object, but this is only to make clear that these objects are in the same positions as in FIG. 5; the first object (610) can equally be represented by four coordinate values (a1', a2', a3', a4'), and the second object (620) by four coordinate values (b1', b2', b3', b4').

[0158] In FIG. 5, multiple objects (A, B, C) were identified, and in FIG. 6, multiple objects (A', B', C') were identified. That is, since the number of objects is the same in both images, the electronic device can confirm that no overlap (or collision) between objects has occurred in the current state.

[0159] In the illustrated example, only a few objects were shown, but in implementation, dozens of objects may be recognized. In such cases, storing information about all objects can result in significant resource consumption.

[0160] In other words, in situations where memory space is insufficient, it may be advantageous to store only object information that can be utilized in the next frame, rather than all object information.

[0161] Accordingly, the present disclosure predicts the position of each object identified on the current screen in the next screen, predicts the possibility of collision between other objects based on the predicted position, and stores object information only when the possibility of collision is greater than a certain level.

[0162] The following describes a specific method for calculating the probability of collision.

[0163] The calculation of collision probability according to the present disclosure can be performed under two conditions. One is a case where the movement of each object can be tracked, and the other is a case where it cannot. That is, the case where object A in FIG. 5 can be recognized as A' in FIG. 6 is the case where tracking is possible, and the latter is the case where three objects are recognized in FIG. 5 and three objects are recognized in FIG. 6, but the relationship between the three objects in FIG. 5 and the three objects in FIG. 6 is unknown. Below, the operation in the case where tracking is possible will be explained first.

[0164] Referring to FIG. 6, it can be seen that among the three objects (A, B, C), only object C (630) moved in a certain direction. Therefore, it can be predicted that object C (630) will move in the same direction as its current direction of movement in the next frame, and that the other two objects will be in the same location.

[0165] In such cases, the electronic device can determine the possibility of collision between objects by considering the expected positions of the three objects in the next frame. For example, in FIG. 6, it can be determined that there is a possibility of future collision between C (not shown) and A (not shown).

[0166] If it is predicted that an overlap between objects A and C will occur in this way, the electronic device can store information about the first object and information about the third object acquired from the current image in memory.
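
For the trackable case, the prediction and storage decision can be sketched as follows. The sketch assumes axis-aligned boxes given as (left, top, right, bottom), a constant-velocity extrapolation from the previous frame to the next, and a simple rectangle intersection test as the collision criterion; none of these specifics are stated in the disclosure, and the function names are hypothetical.

```python
from typing import Dict, Set, Tuple

Box = Tuple[float, float, float, float]  # assumed: (left, top, right, bottom)


def predict_box(prev: Box, curr: Box) -> Box:
    """Extrapolate the next position by repeating the last displacement."""
    return tuple(c + (c - p) for p, c in zip(prev, curr))


def boxes_overlap(a: Box, b: Box) -> bool:
    """True if two axis-aligned rectangles intersect with non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def objects_to_store(prev_boxes: Dict[str, Box],
                     curr_boxes: Dict[str, Box]) -> Set[str]:
    """Identifiers of tracked objects predicted to overlap in the next frame."""
    predicted = {
        obj_id: predict_box(prev_boxes[obj_id], box)
        for obj_id, box in curr_boxes.items()
        if obj_id in prev_boxes
    }
    ids = sorted(predicted)
    at_risk: Set[str] = set()
    for i, first in enumerate(ids):
        for second in ids[i + 1:]:
            if boxes_overlap(predicted[first], predicted[second]):
                at_risk.update((first, second))
    return at_risk
```

In a layout where A and B are stationary and C moves toward A, only A and C are returned, which corresponds to storing the information of the first and third objects.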

[0167] Meanwhile, the following describes cases where each object cannot be tracked.

[0168] The object recognition result for FIG. 5 is output as User 1 (a1, a2, a3, a4), User 2 (b1, b2, b3, b4), and User 3 (c1, c2, c3, c4), and the object recognition result for FIG. 6 is similarly output as User 1 (a1, a2, a3, a4), User 2 (b1, b2, b3, b4), and User 3 (c1', c2', c3', c4').

[0169] Since three people were recognized in the previous step or operation and three people are recognized in the current step or operation as well, the electronic device simply determines that no separate overlap (or collision) has occurred. That is, in cases where the location is not tracked, it can be assumed that the existing A has moved to C', or that B has moved to C', and various combinations can be assumed.

[0170] Considering this situation, the expected coordinates where object A might exist on the next screen are as follows.

[0171] (1st predicted coordinates)

[0172] a1" x = a1' x + ( a1' x - a1 x ) , a1" y = a1' y + ( a1' y - a1 y )

[0173] a2" x = a2' x + ( a2' x - a2 x ) , a2" y = a2' y + ( a2' y - a2 y )

[0174] a3" x = a3' x + ( a3' x - a3x ), a3" y = a3' y + ( a3' y - a3 y )

[0175] a4" x = a4' x + ( a4' x - a4 x ), a4" y = a4' y + ( a4' y - a4 y )

[0176] (제2 예상 좌표)

[0177] a1" x = b1' x + ( b1' x - a1 x ) , a1" y = b1' y + ( b1' y - a1 y )

[0178] a2" x = b2' x + ( b2' x - a2 x ) , a2" y = b2' y + ( b2' y - a2 y )

[0179] a3" x = b3' x + ( b3' x - a3 x ), a3" y = b3' y + ( b3' y - a3 y )

[0180] a4" x = b4' x + ( b4' x - a4 x ) , a4" y = b4' y + ( b4' y - a4 y )

[0181] (3rd predicted coordinates)

[0182] a1" x = c1' x + (c1' x - a1 x ) , a1" y = c1' y + (c1' y - a1 y )

[0183] a2" x = c2' x + (c2' x - a2 x ) , a2" y = c2' y + (c2' y - a2 y )

[0184] a3" x = c3' x + (c3' x - a3 x ) , a3" y = c3' y + (c3' y - a3 y )

[0185] a4" x = c4' x + (c4' x - a4 x ) , a4" y = c4' y + (c4' y - a4 y )

[0186] Although only the predicted coordinates for object A have been described above, the same operation can be performed for objects B and C. Accordingly, the predicted positions of each object are as shown in Fig. 7.
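The enumeration of predicted coordinates for the untracked case can be sketched as below. This is an illustration under assumptions: the corner-list representation, the helper names, and the sample coordinates are introduced here and do not come from the disclosure. Every pairing of a previous-frame object with a current-frame object yields one hypothesis, computed with the same rule as the formulas above (predicted = current + (current - previous)).

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Corners = List[Point]  # four corner points of one object, e.g. a1..a4

def extrapolate(prev: Corners, cur: Corners) -> Corners:
    """Apply p'' = p' + (p' - p) to every corner, for x and y alike."""
    return [(cx + (cx - px), cy + (cy - py))
            for (px, py), (cx, cy) in zip(prev, cur)]

def all_hypotheses(previous: Dict[str, Corners],
                   current: Dict[str, Corners]) -> Dict[Tuple[str, str], Corners]:
    """One predicted position per (current, previous) pairing."""
    return {(c_name, p_name): extrapolate(p_box, c_box)
            for p_name, p_box in previous.items()
            for c_name, c_box in current.items()}

def square(x: float, y: float, size: float = 10.0) -> Corners:
    """Example helper: an axis-aligned square with top-left corner (x, y)."""
    return [(x, y), (x + size, y), (x, y + size), (x + size, y + size)]

# Illustrative coordinates only: A and B are stationary, C has moved.
previous = {"A": square(0, 0), "B": square(100, 0), "C": square(200, 0)}
current = {"A'": square(0, 0), "B'": square(100, 0), "C'": square(150, 0)}

hypotheses = all_hypotheses(previous, current)
```

With three objects in each frame, the dictionary holds nine hypotheses, keyed for example as `("C'", "A")` for the position predicted through A and C'.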

[0187] FIG. 7 is a drawing for explaining a method for calculating the collision probability of an object according to one embodiment of the present disclosure.

[0188] Referring to FIG. 7, the image (700) may include a plurality of objects (A', B', C'). Here, the image (700) is identical to the image of FIG. 6, and the image (700) displays the expected positions of the currently identified objects (A', B', C') in the next frame together based on the positions of the plurality of objects (A, B, C) in the previous image.

[0189] As previously explained, the operation in FIG. 7 assumes a case where the position of each object cannot be accurately tracked. Therefore, the predicted position of each object is not expressed as A", B", C", but rather as (A'-A), (A'-B), etc. Meanwhile, in FIG. 7, only the predicted positions of objects A and B are shown to facilitate explanation, and the predicted position of object C is not shown.

[0190] Specifically, when the coordinate information of three objects is used, a total of nine predicted coordinates are produced, but as explained, FIG. 7 shows only the predicted positions for two of the objects.

[0191] Among these predicted locations, if we exclude cases where they go outside the screen (e.g., 720, 730, 740, 750), only four predicted coordinates (610, 620, 630, 710) remain as illustrated.
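The exclusion of off-screen predictions can be sketched as a simple filter. The sketch assumes that a prediction is discarded as soon as any of its corners leaves the screen; the disclosure does not state the exact criterion, and the function names and sample values are hypothetical.

```python
from typing import Dict, List, Tuple

Corners = List[Tuple[float, float]]

def inside_screen(corners: Corners, width: float, height: float) -> bool:
    """True only if every corner lies within [0, width] x [0, height]."""
    return all(0 <= x <= width and 0 <= y <= height for x, y in corners)

def keep_on_screen(hypotheses: Dict[str, Corners],
                   width: float, height: float) -> Dict[str, Corners]:
    """Drop every predicted position that falls outside the screen."""
    return {name: box for name, box in hypotheses.items()
            if inside_screen(box, width, height)}

# Illustrative values only, on an assumed 1920 x 1080 screen.
candidates = {
    "A'-A": [(100, 100), (200, 100), (100, 200), (200, 200)],
    "C'-B": [(-50, 100), (50, 100), (-50, 200), (50, 200)],
}
remaining = keep_on_screen(candidates, width=1920, height=1080)
```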

[0192] Once the predicted coordinates are identified in this manner, the electronic device can calculate a probability value for each. For example, the electronic device can calculate the probability value based on the amount of movement between the current coordinates and the predicted coordinates. For instance, if the amount of movement is very small, such as with A and A', or if they are at the same coordinates, the probability that the object will remain at that location in the next frame will be very high.

[0193] In the illustrated example, if the average change in the coordinates a1, a2, a3, and a4 between A and A' is 0.25 and the average change in coordinates from A to C' is 200, the ratio of the two changes is 1 : 800. The probability that the object is at the coordinates predicted through A and A' is therefore 800 times the probability that it is at the coordinates predicted through A and C'. This can be expressed mathematically as follows.

[0194] [Mathematical Formula 1]

[0195] If distance(A'-A) : distance(C'-A) = i : l, then prediction(A'-A) : prediction(C'-A) = 1 / i : 1 / l

[0196] Additionally, assuming the average change in coordinates from A to B' is 160, the ratio of the change for A' to the change for B' is 1 : 640, and the probability of the object being located at predicted coordinate 1, which is the coordinate of the object predicted through A and A', is 640 times the probability of the object being located at predicted coordinate 2, which is the coordinate of the object predicted through A and B'.

[0197] This can be expressed mathematically as follows.

[0198] [Mathematical Formula 2]

[0199] If distance(A'-A) : distance(C'-A) : distance(B'-A) = i : l : m, then prediction(A'-A) : prediction(C'-A) : prediction(B'-A) = 1 / i : 1 / l : 1 / m = 800*640 : 640 : 800

[0200] At this time, if it is assumed that the location of A in the next frame is one of the three points predicted through A', B', and C', the following probabilities can be calculated.

[0201] Probability of being in the predicted position via A'-A: ((800*640) / ((800*640) + 800 + 640))*100 = 99.72%

[0202] Probability of being in the predicted position via C'-A: ((640) / ((800*640) + 800 + 640))*100 = 0.12%

[0203] Probability of being in the predicted position via B'-A: ((800) / ((800*640) + 800 + 640))*100 = 0.16%
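The inverse-proportional weighting of Mathematical Formula 2 can be sketched in a few lines. This is an illustration under assumptions: the function name, the dictionary keys, and the `eps` guard against a zero distance are introduced here and are not part of the disclosure.

```python
from typing import Dict

def inverse_distance_probabilities(distances: Dict[str, float],
                                   eps: float = 1e-6) -> Dict[str, float]:
    """Weight each hypothesis by 1 / distance and normalize to percentages.

    `eps` (an assumption) avoids division by zero when an object has not
    moved at all between the two frames.
    """
    weights = {name: 1.0 / max(d, eps) for name, d in distances.items()}
    total = sum(weights.values())
    return {name: 100.0 * w / total for name, w in weights.items()}

# Average coordinate changes taken from the example in the text.
probs = inverse_distance_probabilities(
    {"A'-A": 0.25, "C'-A": 200.0, "B'-A": 160.0}
)
```

For the example distances 0.25, 200, and 160, the weights stand in the ratio 800*640 : 640 : 800, and the three percentages sum to 100.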

[0204] If the object location probability is calculated based on C in the same way, it can be as follows.

[0205] For example, if A'-C = 280, C'-C = 80, and B'-C = 200,

[0206] [Mathematical Formula 3]

[0207] If distance(A'-C) : distance(C'-C) : distance(B'-C) = i : l : m, then prediction(A'-C) : prediction(C'-C) : prediction(B'-C) = 1 / i : 1 / l : 1 / m = 1 / 7 : 1 / 2 : 1 / 5 = 10 : 35 : 14

[0208] Probability of being in the predicted position via A'-C: ((10) / (10+35+14))*100 = 16.95%

[0209] Probability of being in the predicted position via B'-C: ((14) / (10+35+14))*100 = 23.73%

[0210] Probability of being in the predicted position via C'-C: ((35) / (10+35+14))*100 = 59.32%

[0211] Considering these predicted positions, the predicted position is the location where each object has moved by the same amount as the position difference of the previously recognized object.

[0212] At this time, referring to FIG. 7, it can be seen that the predicted positions of C'-C and A'-A may collide with each other. Since the probability of C'-C is 59.32% and the probability of A'-A is 99.72%, the probability of collision can be calculated as 59.32 * 99.72 / 100 = 59.15%.

[0213] For cases where the collision probability is equal to or greater than a preset probability value (e.g., 50%), the electronic device may regard the objects corresponding to that case as potential collision objects and store object information about those objects.
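The collision check described above can be sketched as follows. The sketch multiplies the two position probabilities, as the text does, which treats them as independent; the function name and the `THRESHOLD` constant are hypothetical, with 50% taken from the example value in the text.

```python
def collision_probability(p_first: float, p_second: float) -> float:
    """Joint probability (in percent) that both objects reach their
    predicted positions, treating the two events as independent."""
    return p_first * p_second / 100.0

THRESHOLD = 50.0  # preset probability value in percent (example from the text)

# Position probabilities of C'-C and A'-A from the worked example.
p_collision = collision_probability(59.32, 99.72)

# At or above the threshold, the pair is treated as potential collision
# objects and its object information would be stored.
is_collision_candidate = p_collision >= THRESHOLD
```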

[0214] Meanwhile, the above description illustrates the calculation of the possibility of collision by considering only the amount of movement or direction of movement of an object within the screen. However, such movement may include not only movement along the x and y coordinates within the screen, but also movement along the axis. For example, if a person is rotating in the same spot, the recognition area of the user's face within the screen may gradually decrease. Accordingly, the electronic device (100) may check for changes in the recognized face area, changes in the direction of the face, etc., identify objects that will temporarily not be recognized, and store information about those objects.
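One way to check for a gradually decreasing face area is sketched below. The disclosure does not specify how the change is measured, so everything here is an assumption: the area is computed with the shoelace formula from corner points listed in order around the region, and an object is flagged when each frame's area falls below `min_ratio` times the previous one.

```python
from typing import List, Sequence, Tuple

def polygon_area(corners: Sequence[Tuple[float, float]]) -> float:
    """Shoelace formula; corners must be listed in order around the region."""
    total = 0.0
    n = len(corners)
    for i in range(n):
        x1, y1 = corners[i]
        x2, y2 = corners[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def is_shrinking(areas: List[float], min_ratio: float = 0.8) -> bool:
    """True if every area is below min_ratio times the one before it."""
    return len(areas) >= 2 and all(
        later < earlier * min_ratio
        for earlier, later in zip(areas, areas[1:])
    )

# Illustrative face areas over three consecutive frames.
turning_away = is_shrinking([100.0, 70.0, 40.0])
steady = is_shrinking([100.0, 95.0, 90.0])
```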

[0215] In addition, while FIGS. 5 to 7 illustrate and explain the calculation of the expected position of an object moving within a fixed space, the above-described calculation method may also be modified and used in cases where the object is fixed but moves due to the camera's zoom in, zoom out, or movement.

[0216] In addition, FIGS. 5 to 7 describe a method of predicting the position in the next frame at the current stage and storing information about objects with a potential for collision, that is, a method of calculating and utilizing the probability of collision in the next frame. However, it is also possible to identify objects colliding at the current stage by utilizing information related to the two previous frames. In other words, objects colliding in the current frame can be identified and utilized without predicting the probability of collision in the next frame.

[0217] FIG. 8 is a drawing for illustrating an example of a second image according to one embodiment of the present disclosure.

[0218] Referring to FIG. 8, the user interface window (800) can display a second image. This second image may include a content area (810), a person information area (820), and a time information area (830).

[0219] The content area (810) is an area where a first video corresponding to the content currently being played is displayed. This content area may also be referred to as a video area, a screen area, etc.

[0220] The person information area (820) is an area that displays information about an object identified through an object recognition process. For example, as illustrated, the person information area (820) may display the name of a specific person within the screen and information about works in which that person appeared. In addition to the other works mentioned above, interview videos, etc., may also be displayed. This person information area may also be referred to as an additional area, an information provision area, etc.

[0221] If the user selects another displayed work or content, the electronic device may acquire and display a website or content corresponding to the work or content selected by the user.

[0222] Meanwhile, if a pre-configured object is a landmark, the address or country information where the landmark is located may be the object information.

[0223] The time information area (830) can display the time period in which the identified object appears within the content. The user can select a specific area of the time information to search for various areas in which the object appears.

[0224] Meanwhile, in the illustrated example, time information was displayed in a form that visually displays the area corresponding to the position where the object appears in the area (or timeline) corresponding to the content, but in implementation, it is also possible to display it simply in a list form, or to display highlights (or key scenes) by section in a thumbnail form.

[0225] In the present disclosure, it is possible to determine that a specific person is present on the screen even when that person is temporarily obscured, thereby accurately indicating the time interval during which the person appears.
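The effect on the time information can be sketched as below. This is an illustration under assumptions: the per-frame presence lists, the frame rate, and the function name are hypothetical. When stored object information is carried over a frame in which recognition failed, two separate appearance intervals merge into one.

```python
from typing import List, Tuple

def appearance_intervals(present: List[bool],
                         fps: float) -> List[Tuple[float, float]]:
    """Convert a per-frame presence list into (start_s, end_s) intervals."""
    intervals = []
    start = None
    for i, flag in enumerate(present):
        if flag and start is None:
            start = i                      # interval opens
        elif not flag and start is not None:
            intervals.append((start / fps, i / fps))
            start = None                   # interval closes
    if start is not None:
        intervals.append((start / fps, len(present) / fps))
    return intervals

# Frame 2 is a temporary occlusion: recognition alone reports a gap,
# while carrying over the stored object information fills it.
recognized_only = [True, True, False, True, True, False, False]
with_carry_over = [True, True, True, True, True, False, False]

gapped = appearance_intervals(recognized_only, fps=1.0)
continuous = appearance_intervals(with_carry_over, fps=1.0)
```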

[0226] FIG. 9 is a drawing for explaining an example of operation according to one embodiment of the present disclosure.

[0227] Referring to FIG. 9, the electronic device (100) can display a second image (930). This second image (930) may include a content area (910) and a person information area (920).

[0228] The content area (910) is an area where a video corresponding to the content currently being played is displayed. For example, in the illustrated example, the content area (910) may be a video featuring multiple people (10, 20).

[0229] The person information area (920) is an area that displays information about an object identified through an object recognition process. If person information were provided using only the face recognition results within the current frame, as in the past, only object information about the first object (10) would be displayed in the person information area for FIG. 9. This is because, as illustrated in FIG. 9, face recognition cannot be performed if a person briefly turns their face away or looks downward.

[0230] However, from the viewer's perspective, knowing that there are two people on the screen, they might find it strange to display information for only one.

[0231] In the present disclosure, as described above, object information of a previous step or previous operation is stored. For example, in a previous frame, both the first object (10) and the second object (20) are recognized and information for both objects is stored. Even if only the first object (10) is recognized in the current frame, the person information area (920) can also provide object information of the second object (20) by including it in the person information (920).

[0232] Accordingly, it can be seen that the person information area (920) in the present disclosure displays the object information of the first object (10) and the object information of the second object (20) together.

[0233] Meanwhile, if the first screen displayed is as in FIG. 9, that is, if the face of the second object (20) was not exposed in the previous frame, the person information area (920) can display only the information of the first object (10). In this case, general viewers do not know who the second object (20) is, and object information about the second object (20) is not stored in memory.

[0234] FIG. 10 is a flowchart illustrating a method for controlling an electronic device according to one embodiment of the present disclosure.

[0235] Referring to FIG. 10, at least one object image is obtained from the first image (operation 1010). For example, an object recognition model can be used to recognize objects of a preset type within the image and obtain coordinate information of the objects. Here, the objects of a preset type may be human faces, but are not limited thereto.

[0236] When such coordinate information is obtained, the electronic device can obtain an object image by extracting a region corresponding to the coordinates. Meanwhile, during implementation, the extracted image can be used as is, or image processing such as changing the resolution can be performed.
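Extracting the region that corresponds to the coordinates can be sketched as a simple crop. The sketch assumes the image is a row-major list of pixel rows and that the corner coordinates are integers with the maximum edge excluded; the function name and sample data are hypothetical.

```python
from typing import List, Sequence, Tuple

def crop_object(image: List[List[int]],
                corners: Sequence[Tuple[int, int]]) -> List[List[int]]:
    """Cut out the axis-aligned rectangle spanned by the corner points.

    The maximum x and y are treated as exclusive (an assumption).
    """
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return [row[min(xs):max(xs)] for row in image[min(ys):max(ys)]]

# A 4 x 5 test image whose pixel value encodes its position (row * 10 + col).
image = [[r * 10 + c for c in range(5)] for r in range(4)]
face = crop_object(image, [(1, 1), (3, 1), (1, 3), (3, 3)])
```

Any further image processing mentioned above, such as changing the resolution, would be applied to the cropped result.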

[0237] Then, object information corresponding to at least one object image is obtained (operation 1020). For example, the electronic device (100) can transmit the obtained object image to an external server and obtain information corresponding to the object image from the external server. For example, if the object described above is a human face, information such as the name of the person and works in which the person appeared can be obtained.

[0238] At this time, the electronic device can store the acquired object information. The electronic device may store the object information of all acquired objects, or it may store only the object information of objects that have a possibility of collision. That is, it may identify only objects that are likely not to be recognized in the next frame and store only the object information for them.

[0239] To this end, the electronic device (100) can obtain the possibility of collision between each object when multiple objects are identified in the first image. For example, for each of the multiple objects, multiple expected positions in the next frame are obtained based on the current object's position and each of the multiple object positions identified in the previous frame, and the possibility of collision between the multiple objects is determined based on the obtained multiple expected positions and the expected positions of other objects in the current frame.

[0240] Such collision probability can utilize probability information inversely proportional to the change in movement of each object. Furthermore, the change in movement of each object can be calculated as the average change in movement of each of the object's multiple coordinate values.
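The change in movement of an object can be sketched as the mean displacement of its coordinate values. The disclosure speaks only of an average change, so the use of Euclidean distance per corner is an assumption, as are the function name and sample coordinates.

```python
import math
from typing import List, Tuple

Corners = List[Tuple[float, float]]

def average_change(prev: Corners, cur: Corners) -> float:
    """Mean Euclidean displacement over corresponding corner points."""
    return sum(math.dist(p, c) for p, c in zip(prev, cur)) / len(prev)

# Illustrative values only: every corner moves by (3, 4).
prev_box = [(0, 0), (10, 0), (0, 10), (10, 10)]
cur_box = [(3, 4), (13, 4), (3, 14), (13, 14)]

change = average_change(prev_box, cur_box)

# Probability information is inversely proportional to this change, so the
# unnormalized weight of the hypothesis would be 1 / change.
weight = 1.0 / change
```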

[0241] A second image containing at least one object information is generated (operation 1030). For example, if the number of object information obtained in the current frame is equal to or greater than the number of object information stored, the second image can be generated using only the object information obtained in the current frame.

[0242] Meanwhile, in an implementation, the number need not be used; instead, it is possible to check whether each piece of object information identified and stored in the previous frame is identified again in the object recognition of the next frame. In addition, if there is no object information identified in the previous frame, or if all objects identified in the previous frame are identified in this frame as well, a second image containing the object information identified in this frame can be generated.

[0243] Meanwhile, if the number of object information acquired in the current frame is smaller than the number of stored object information, missing object information can be acquired. For example, if all object information from the previous frame is stored, missing object information that is not included in the object information acquired from the current frame image among at least one piece of previously stored object information can be acquired. In the same or different embodiments, if only object information with a potential for collision among the objects of the previous frame is stored, it is possible to identify only whether such object information is included in the object information acquired from the current frame image.

[0244] If there is missing object information, a second image can be generated using the acquired missing object information and the object information identified in the current frame.

[0245] At this time, during the process of determining missing object information, it is possible to distinguish whether the missing object is missing due to overlap with other objects, due to image recognition failure, or due to moving outside the screen.

[0246] And if it is due to overlap with other objects or a failure in image recognition, it can be added to the second image as a missing object, as previously explained. Conversely, if the object moves outside a preset area (e.g., the screen), it is not added to the second image, and the stored object information can also be deleted.
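The handling of missing object information in operation 1030 can be sketched as follows. This is an illustration under assumptions: the `MissingReason` enumeration, the function name, and the dictionary-based storage are hypothetical, and the reason for each missing object is supplied as an input because the sketch does not cover how that reason is determined. An object with no stated reason is treated here like a recognition failure.

```python
from enum import Enum
from typing import Dict

class MissingReason(Enum):
    OVERLAP = "overlap"
    RECOGNITION_FAILURE = "recognition_failure"
    LEFT_SCREEN = "left_screen"

def build_second_image_info(stored: Dict[str, dict],
                            current: Dict[str, dict],
                            reasons: Dict[str, MissingReason]) -> Dict[str, dict]:
    """Return the object information to show in the second image.

    Mutates `stored`: objects that left the screen are deleted, and the
    information acquired from the current frame is saved.
    """
    shown = dict(current)
    for obj_id in list(stored):
        if obj_id in current:
            continue  # recognized again, nothing is missing
        if reasons.get(obj_id) is MissingReason.LEFT_SCREEN:
            del stored[obj_id]             # outside the preset area
        else:
            shown[obj_id] = stored[obj_id]  # carry over the missing info
    stored.update(current)
    return shown

# Illustrative data only.
stored = {
    "person_10": {"name": "First"},
    "person_20": {"name": "Second"},
    "person_30": {"name": "Third"},
}
current = {"person_10": {"name": "First"}}
reasons = {
    "person_20": MissingReason.RECOGNITION_FAILURE,
    "person_30": MissingReason.LEFT_SCREEN,
}
shown = build_second_image_info(stored, current, reasons)
```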

[0247] As described above, the control method according to the present disclosure can provide information about an object included in an image to a user, and can continuously provide information about the object even in cases where a specific object is temporarily invisible due to overlapping with another object or in cases of temporary failure to recognize a specific object.

[0248] Meanwhile, methods according to at least some of the various embodiments of the present disclosure described above may be implemented in the form of an application that can be installed on an existing electronic device.

[0249] In addition, methods according to at least some of the various embodiments of the present disclosure described above may be implemented by software upgrades or hardware upgrades alone for existing electronic devices.

[0250] In addition, methods according to at least some of the various embodiments of the present disclosure described above may also be performed through an embedded server provided in the electronic device, or through at least one external server of the electronic device.

[0251] Meanwhile, according to one embodiment of the present disclosure, the various embodiments described above may be implemented as software containing instructions stored on a storage medium readable by a machine (e.g., a computer). The machine is a device capable of calling instructions stored on the storage medium and operating according to the called instructions, and may include an electronic device (e.g., electronic device (A)) according to the disclosed embodiments. When instructions are executed by a processor, the processor may perform a function corresponding to the instructions directly or by using other components under the control of the processor. Instructions may include code generated or executed by a compiler or an interpreter. The machine-readable storage medium may be provided in the form of a non-transitory storage medium. Here, "non-transitory storage medium" simply means that it is a tangible device and does not contain a signal (e.g., electromagnetic waves), and this term does not distinguish between cases where data is stored semi-permanently and cases where it is stored temporarily on the storage medium. For example, a "non-transitory storage medium" may include a buffer in which data is temporarily stored. According to one embodiment, the method according to the various embodiments disclosed herein may be provided by being included in a computer program product. The computer program product may be traded between a seller and a buyer as a product. The computer program product may be distributed in the form of a device-readable storage medium (e.g., compact disc read-only memory (CD-ROM)), or may be distributed online (e.g., downloaded or uploaded) through an application store (e.g., Play Store™) or directly between two user devices (e.g., smartphones). For online distribution, at least a portion of the computer program product (e.g., a downloadable app) may be temporarily stored or temporarily created on a device-readable storage medium, such as the memory of a manufacturer's server, an application store's server, or a relay server.

[0252] Various embodiments of the present disclosure may be implemented as software comprising instructions stored on a machine-readable storage medium (e.g., a computer). The machine may include an electronic device (e.g., an electronic device (100)) according to the disclosed embodiments, which is a device capable of calling instructions stored from the storage medium and operating according to the called instructions.

[0253] When the above-described instruction is executed by a processor, the processor may perform the function corresponding to the above-described instruction directly or by using other components under the control of the above-described processor. The instruction may include code generated or executed by a compiler or an interpreter.

[0254] Although preferred embodiments of the present disclosure have been illustrated and described above, the present disclosure is not limited to the specific embodiments described above. It is understood that various modifications can be made by those skilled in the art without departing from the essence of the present disclosure as claimed in the claims, and such modifications should not be understood separately from the technical spirit or perspective of the present disclosure.

Claims

1. An electronic device comprising: a communication unit; a memory storing at least one instruction; and at least one processor, wherein the at least one processor, by individually or collectively executing the at least one instruction, is configured to: acquire at least one object image from a first image; acquire object information associated with each of the at least one object image; generate a second image based on object information associated with a previous frame image and the object information associated with each of the at least one object image acquired from the first image; and output the second image through a display.

2. The electronic device of claim 1, wherein the at least one processor, by individually or collectively executing the at least one instruction, is configured to: acquire missing object information, which is information that is included in at least one piece of previously stored object information but is not included in the object information acquired from a current frame image; and store, in the memory, the missing object information and the object information acquired from the current frame image as object information of the current frame image.

3. The electronic device of claim 2, wherein the at least one processor, by individually or collectively executing the at least one instruction, is configured to store the missing object information as object information associated with the current frame image, based on an overlap between the object associated with the missing object information and another object.

4. The electronic device of claim 2, wherein the at least one processor, by individually or collectively executing the at least one instruction, is configured to store the missing object information as object information associated with the current frame image, based on a failure to identify the object associated with the missing object information.

5. The electronic device of claim 2, wherein the at least one processor, by individually or collectively executing the at least one instruction, is configured to delete the missing object information from the memory, based on the object associated with the missing object information moving outside a preset area.

6. The electronic device of claim 2, wherein the at least one processor, by individually or collectively executing the at least one instruction, is configured to acquire the missing object information when the number of pieces of object information acquired in the current frame is less than the number of pieces of stored object information.

7. The electronic device of claim 1, wherein the at least one processor, by individually or collectively executing the at least one instruction, is configured to: obtain, based on a plurality of objects included in the first image, information related to a possibility of collision between the plurality of objects; and store, in the memory, object information related to the collision based on the information related to the possibility of collision.

8. The electronic device of claim 7, wherein the at least one processor, by individually or collectively executing the at least one instruction, is configured to: for each of the plurality of objects, obtain a plurality of predicted positions in a next frame based on the current position of the object in a current frame and the position of each of one or more objects identified in a previous frame; and acquire the information related to the possibility of collision between the plurality of objects based on the plurality of predicted positions and predicted positions of other objects in the current frame image.

9. The electronic device of claim 8, wherein the position of each object has a plurality of coordinate values corresponding to outer positions of the object, and wherein the at least one processor, by individually or collectively executing the at least one instruction, is configured to acquire, for each predicted position of each object, probability information inversely proportional to the average change in movement of the coordinates.

10. The electronic device of claim 1, wherein the at least one processor, by individually or collectively executing the at least one instruction, is configured to: control the communication unit to transmit an object image corresponding to a face in the first image to a server; and obtain, from the server, object information including at least one of person information and work information corresponding to the face.

11. A method for controlling an electronic device, executed individually or collectively by one or more processors, the method comprising: acquiring at least one object image from a first image; acquiring object information associated with each of the at least one object image; generating a second image based on object information associated with a previous frame image and the object information associated with each of the at least one object image acquired from the first image; and outputting the second image through a display.

12. The method of claim 11, further comprising: acquiring missing object information, which is information that is included in at least one piece of previously stored object information but is not included in the object information acquired from a current frame image; and storing the missing object information and the object information acquired from the current frame image as object information of the current frame image.

13. The method of claim 12, wherein the storing of the missing object information and the object information acquired from the current frame image comprises storing the missing object information as object information associated with the current frame image, based on an overlap between the object associated with the missing object information and another object.

14. The method of claim 12, wherein the storing of the missing object information and the object information acquired from the current frame image comprises storing the missing object information as object information associated with the current frame image, based on a failure to identify the object associated with the missing object information.

15. A non-transitory computer-readable recording medium storing instructions that, when executed by one or more processors, cause the one or more processors to individually or collectively perform operations comprising: acquiring at least one object image from a first image; acquiring object information associated with each of the at least one object image; generating a second image based on object information associated with a previous frame image and the object information associated with each of the at least one object image acquired from the first image; and outputting the second image through a display.