Electronic device and control method therefor

The electronic device addresses real-time subtitle translation challenges by employing adaptive translation methods based on hardware and input speed, ensuring high-quality and timely subtitle translation.

WO2026127540A1 · PCT designated stage · Publication Date: 2026-06-18 · SAMSUNG ELECTRONICS CO LTD

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
SAMSUNG ELECTRONICS CO LTD
Filing Date
2025-12-08
Publication Date
2026-06-18

AI Technical Summary

Technical Problem

Conventional methods for real-time subtitle translation in electronic devices face challenges due to limitations in AI model size and hardware resources, leading to degraded translation quality and inadequate timing for outputting translations, especially when processing subtitles from broadcasters without external server assistance.

Method used

An electronic device with a processor and memory that employs multiple translation methods, including Incremental Decoding, word cluster, and End-of-Sentence translation, to adapt to hardware capabilities and input speed, ensuring accurate and timely subtitle translation.

Benefits of technology

The device achieves high-quality, real-time subtitle translation by dynamically selecting translation methods based on hardware performance and input characteristics, maintaining translation quality and timing accuracy.

Abstract

The present disclosure relates to an artificial intelligence (AI) system utilizing a machine learning algorithm, and an application thereof. The electronic device comprises: a memory for storing instructions; and at least one processor. When collectively or individually executed by at least one processor, the instructions instruct the electronic device to: acquire subtitle data corresponding to content; acquire information about an input language corresponding to the subtitle data and a target language related to translation of the subtitle data; and, on the basis of the performance of the electronic device, the input speed of the subtitle data and the relationship between the input language and the target language, translate the subtitle data into the target language by using one from among a plurality of translation methods.

Description

Electronic device and control method thereof

[0001] The present disclosure relates to an electronic device and a method for controlling the same, and more specifically, to an electronic device and a method for controlling the same capable of translating and providing subtitle data into a target language.

[0002] An artificial intelligence system is a computer system that implements human-level intelligence, in which machines learn and make judgments on their own, and whose recognition rate improves with use.

[0003] Artificial intelligence technology consists of machine learning (deep learning) technology, which utilizes algorithms to self-classify and learn the characteristics of input data, and component technologies that mimic the functions of the human brain, such as cognition and judgment, by utilizing machine learning algorithms.

[0004] The component technologies may include, for example, at least one of linguistic understanding technology that recognizes human language and characters, visual understanding technology that perceives objects as human vision does, reasoning and prediction technology that evaluates information to logically infer and predict, knowledge representation technology that processes human experience information into knowledge data, and motion control technology that controls the autonomous driving of vehicles and the movement of robots. In particular, visual understanding is a technology that perceives and processes objects as human vision does, and includes object recognition, object tracking, image search, person recognition, scene understanding, spatial understanding, image enhancement, etc.

[0005] Meanwhile, although conventional methods process subtitle data that is input in real time on electronic devices such as TVs, they have not overcome the technical limitations imposed by the size of AI models and by real-time translation. Consequently, they have either partially sacrificed real-time performance or relied on external servers to store and process data. In particular, unlike translating complete subtitles that are provided in full, translating subtitles transmitted by broadcasters in real time without the assistance of external servers presents two technical challenges.

[0006] First, the size of usable AI models (e.g., translation models) is limited due to constraints on hardware resources. This means that the same translation quality as that of AI services (translation models) provided with sufficient storage space and computing power cannot be expected, and implies that various technologies and methods are required to prevent a degradation in translation quality.

[0007] Second, unlike general translation tasks that rely on viewing the entire text, real-time translation cannot use text that has yet to be spoken for contextual understanding. Furthermore, unlike existing standard translation models, the device must automatically determine when to execute the translation. Translating too early degrades translation quality, while delaying the translation as long as possible undermines the purpose of real-time translation. As such, there are technical challenges in appropriately determining the translation timing and outputting the results, and advanced algorithms and processing capabilities are required to provide accurate translations at the right time.

[0008] Meanwhile, the information described above may be provided as related art for the purpose of aiding understanding of the present disclosure. No claim or determination is made as to whether any of the foregoing may be applied as prior art related to the present disclosure.

[0009] According to one embodiment of the present disclosure, an electronic device comprises: a memory for storing instructions; and at least one processor; wherein, when the instructions are executed collectively or individually by the at least one processor, the electronic device acquires subtitle data corresponding to content, acquires information regarding an input language corresponding to the subtitle data and a target language related to the translation of the subtitle data, and translates the subtitle data into the target language using one of a plurality of translation methods based on the performance of the electronic device, the input speed of the subtitle data, and the relationship between the input language and the target language.

[0010] When the above instructions are executed collectively or individually by the at least one processor, the electronic device may translate the subtitle data into the target language using a first translation method included in the plurality of translation methods if a processor for artificial intelligence is included among the at least one processor.

[0011] When the above instructions are executed collectively or individually by the at least one processor, the electronic device may identify whether the input speed of the subtitle data is below a threshold value if no processor for artificial intelligence is included among the at least one processor; if the input speed of the subtitle data is below the threshold value, translate the subtitle data into the target language using a first translation method included in the plurality of translation methods; and if the input speed of the subtitle data is above the threshold value, translate the subtitle data into the target language using a second translation method or a third translation method included in the plurality of translation methods.

[0012] When the above instructions are executed collectively or individually by the processor, the electronic device may identify the input speed of the subtitle data based on the speech rate of the person included in the content.
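
The disclosure does not say how the input speed is computed from the speech rate. As a minimal sketch only, the following assumes that the speech rate is approximated by the number of subtitle words arriving within a sliding time window. The class name `InputSpeedMeter`, the window length, and the words-per-second unit are all hypothetical and do not come from the disclosure.

```python
from collections import deque
from typing import Deque, Tuple


class InputSpeedMeter:
    """Estimates subtitle input speed as words per second over a sliding window.

    Assumption: word arrival rate is used as a stand-in for the speaker's
    speech rate; the disclosure does not specify the measurement.
    """

    def __init__(self, window_seconds: float = 5.0) -> None:
        self.window_seconds = window_seconds
        # Each entry is (arrival timestamp in seconds, number of words).
        self._events: Deque[Tuple[float, int]] = deque()

    def add_chunk(self, timestamp: float, text: str) -> None:
        """Records a newly received subtitle chunk."""
        self._events.append((timestamp, len(text.split())))
        # Discard chunks that are older than the window.
        while self._events and timestamp - self._events[0][0] > self.window_seconds:
            self._events.popleft()

    def words_per_second(self) -> float:
        """Returns the average word rate over the window."""
        total_words = sum(count for _, count in self._events)
        return total_words / self.window_seconds


def is_below_threshold(meter: InputSpeedMeter, threshold_wps: float) -> bool:
    """True when the measured input speed is below the (hypothetical) threshold."""
    return meter.words_per_second() < threshold_wps
```

A fixed window keeps the estimate responsive to changes in pace; a longer window would smooth out short bursts at the cost of reacting more slowly.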

[0013] When the above instructions are executed collectively or individually by the at least one processor, the electronic device may translate the subtitle data into the target language using the second translation method if the word order of the input language and the target language is the same, and translate the subtitle data into the target language using the third translation method if the word order of the input language and the target language is different.

[0014] When the above instructions are executed collectively or individually by the processor, the electronic device may identify whether the word order of the input language and the target language is the same based on whether the part of speech and meaning of the matching words between the input language and the target language are the same.
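
One possible reading of this check is a position-by-position comparison of matching words. The sketch below assumes that each word has already been tagged with a part of speech and a language-independent meaning label; how those tags are produced is not described here, and the names `AlignedWord` and `same_word_order` are hypothetical.

```python
from typing import List, NamedTuple


class AlignedWord(NamedTuple):
    part_of_speech: str  # e.g. "PRON", "VERB", "NOUN" (hypothetical tag set)
    meaning: str         # language-independent meaning label (hypothetical)


def same_word_order(source: List[AlignedWord], target: List[AlignedWord]) -> bool:
    """Returns True when words at matching positions agree in both
    part of speech and meaning, which this sketch treats as 'same word order'."""
    if len(source) != len(target):
        return False
    return all(
        s.part_of_speech == t.part_of_speech and s.meaning == t.meaning
        for s, t in zip(source, target)
    )


# Hand-tagged example sentences: "I eat apples" in three languages.
english = [AlignedWord("PRON", "I"), AlignedWord("VERB", "eat"), AlignedWord("NOUN", "apple")]
spanish = [AlignedWord("PRON", "I"), AlignedWord("VERB", "eat"), AlignedWord("NOUN", "apple")]
korean = [AlignedWord("PRON", "I"), AlignedWord("NOUN", "apple"), AlignedWord("VERB", "eat")]

print(same_word_order(english, spanish))  # subject-verb-object in both
print(same_word_order(english, korean))   # Korean places the verb last
```

A strict position-by-position comparison is the simplest criterion; a practical system would likely need to tolerate words that have no counterpart in the other language.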

[0015] The above first translation method corresponds to the Incremental Decoding translation method, the above second translation method corresponds to the word cluster translation method, and the above third translation method corresponds to the End-of-Sentence translation method.
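
The selection logic described in the preceding paragraphs can be sketched as a small decision function. This is an illustrative sketch, not the claimed implementation: the names `TranslationMethod`, `DeviceState`, and `select_method` are hypothetical, and because the disclosure only addresses speeds "below" and "above" the threshold, treating a speed exactly equal to the threshold as "above" is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class TranslationMethod(Enum):
    INCREMENTAL_DECODING = "first"
    WORD_CLUSTER = "second"
    END_OF_SENTENCE = "third"


@dataclass
class DeviceState:
    has_ai_processor: bool   # a processor for artificial intelligence is present
    input_speed: float       # measured subtitle input speed (unit is an assumption)
    speed_threshold: float   # threshold value for the input speed
    same_word_order: bool    # input and target languages share word order


def select_method(state: DeviceState) -> TranslationMethod:
    """Chooses a translation method from device capability, input speed,
    and the relationship between the input and target languages."""
    if state.has_ai_processor:
        return TranslationMethod.INCREMENTAL_DECODING
    if state.input_speed < state.speed_threshold:
        return TranslationMethod.INCREMENTAL_DECODING
    # Assumption: a speed equal to the threshold is handled like "above".
    if state.same_word_order:
        return TranslationMethod.WORD_CLUSTER
    return TranslationMethod.END_OF_SENTENCE
```

For example, `select_method(DeviceState(False, 5.0, 3.0, True))` returns `TranslationMethod.WORD_CLUSTER`, because there is no AI processor, the input is fast, and the word order matches.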

[0016] When the above instructions are executed collectively or individually by the at least one processor, the electronic device may convert uppercase letters to lowercase letters in subtitle data that has been acquired in excess of a preset number of characters, replace special characters included in the subtitle data with spaces through a preprocessing function, and identify the input language of the subtitle data through a language detection model based on the subtitle data on which the preprocessing function has been performed.
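
The preprocessing described above can be read as: wait until enough characters have accumulated, lowercase the text, replace special characters with spaces, and then run language detection. The sketch below follows that reading. The preset `MIN_CHARS`, the definition of "special character" as anything that is neither a word character nor whitespace, the whitespace collapsing, and the `detector` callable standing in for the language detection model are all assumptions.

```python
import re
from typing import Callable, Optional

MIN_CHARS = 20  # hypothetical preset number of characters

# Assumption: a "special character" is anything that is not a letter,
# digit, underscore, or whitespace.
_SPECIAL = re.compile(r"[^\w\s]")


def preprocess(text: str) -> str:
    """Lowercases the text and replaces special characters with spaces."""
    lowered = text.lower()
    spaced = _SPECIAL.sub(" ", lowered)
    # Collapsing repeated whitespace is an extra tidy-up step, not from the source.
    return " ".join(spaced.split())


def detect_input_language(
    buffer: str,
    detector: Callable[[str], str],
    min_chars: int = MIN_CHARS,
) -> Optional[str]:
    """Runs the detector only once enough subtitle text has accumulated.

    `detector` is a placeholder for the language detection model and returns
    a language code; None means more subtitle data is needed first.
    """
    if len(buffer) < min_chars:
        return None
    return detector(preprocess(buffer))
```

Broadcast captions are often transmitted in all capitals with markers such as `>>`, so normalizing case and punctuation before detection gives the detector more uniform input.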

[0017] When the above instructions are executed collectively or individually by the at least one processor, the electronic device may provide subtitle data translated into the target language while the content is being output.

[0018] When the above instructions are executed collectively or individually by the at least one processor, the electronic device may translate the subtitle data into the target language using a first translation method included in the plurality of translation methods if the available computing resources and memory capacity are above a threshold value.
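
As a rough illustration of this resource check, the sketch below compares a snapshot of available resources against thresholds. The disclosure refers only to "a threshold value"; the use of separate thresholds for compute and memory, the units, the default values, and the names `ResourceSnapshot` and `can_use_first_method` are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceSnapshot:
    free_compute_ratio: float  # fraction of compute currently idle, 0.0 to 1.0 (assumed unit)
    free_memory_mb: float      # available memory in megabytes (assumed unit)


def can_use_first_method(
    snapshot: ResourceSnapshot,
    min_compute_ratio: float = 0.5,   # hypothetical threshold
    min_memory_mb: float = 512.0,     # hypothetical threshold
) -> bool:
    """True when both available compute and available memory exceed their thresholds."""
    return (
        snapshot.free_compute_ratio > min_compute_ratio
        and snapshot.free_memory_mb > min_memory_mb
    )
```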

[0019] Meanwhile, a control method for an electronic device according to one embodiment of the present disclosure comprises: a step of acquiring subtitle data corresponding to content; a step of acquiring information regarding an input language corresponding to the subtitle data and a target language related to the translation of the subtitle data; and a step of translating the subtitle data into the target language using one of a plurality of translation methods based on the performance of the electronic device, the input speed of the subtitle data, and the relationship between the input language and the target language.

[0020] The above translation step can translate the subtitle data into the target language using a first translation method included in the plurality of translation methods if an artificial intelligence processor is included among the at least one processor.

[0021] The above translation step may include: a step of identifying whether the input speed of the subtitle data is below a threshold when the processor for artificial intelligence is not included among the at least one processor; and a step of translating the subtitle data into the target language using a first translation method included in the plurality of translation methods when the input speed of the subtitle data is below the threshold, and translating the subtitle data into the target language using a second translation method or a third translation method included in the plurality of translation methods when the input speed of the subtitle data is above the threshold.

[0022] The above control method may include the step of identifying the input speed of the subtitle data based on the speech rate of the person included in the content.

[0023] The above translation step may translate the subtitle data into the target language using the second translation method if the word order of the input language and the target language is the same, and translate the subtitle data into the target language using the third translation method if the word order of the input language and the target language is different.

[0024] The above control method can identify whether the word order of the input language and the target language is the same based on whether the part of speech and meaning of the matching words between the input language and the target language are the same.

[0025] The above first translation method corresponds to the Incremental Decoding translation method, the above second translation method corresponds to the word cluster translation method, and the above third translation method corresponds to the End-of-Sentence translation method.

[0026] The step of obtaining information about the target language may include: a step of converting uppercase letters to lowercase letters in subtitle data obtained in a quantity of a preset number of characters or more; a step of replacing special characters included in the subtitle data with spaces through a preprocessing function; and a step of identifying the input language of the subtitle data through a language detection model based on the subtitle data on which the preprocessing function has been performed.

[0027] The above control method may include the step of providing subtitle data translated into the target language while the content is being output.

[0028] In a non-transitory computer-readable medium storing instructions for executing a method of controlling an electronic device according to one embodiment of the present disclosure, the method of controlling the electronic device comprises: a step of acquiring subtitle data corresponding to content; a step of acquiring information regarding an input language corresponding to the subtitle data and a target language related to the translation of the subtitle data; and a step of translating the subtitle data into the target language using one of a plurality of translation methods based on the performance of the electronic device, the input speed of the subtitle data, and the relationship between the input language and the target language.

[0029] FIG. 1 is a block diagram showing the configuration of an electronic device according to one embodiment of the present disclosure.

[0030] FIG. 2 is a drawing including a plurality of modules for translating subtitle data according to one embodiment of the present disclosure.

[0031] FIG. 3 is a flowchart illustrating a method for translating subtitle data of an electronic device according to one embodiment of the present disclosure.

[0032] FIG. 4 is a diagram illustrating a method for determining whether the word order of an input language and a target language is the same, according to one embodiment of the present disclosure.

[0033] FIG. 5 is a flowchart illustrating a method for controlling an electronic device according to one embodiment of the present disclosure.

[0034] The various embodiments of the present disclosure and the terms used therein are not intended to limit the technical features described in the present disclosure to specific embodiments, and should be understood to include various modifications, equivalents, or substitutions of said embodiments.

[0035] In relation to the description of the drawings, similar reference numerals may be used for similar or related components.

[0036] The singular form of the noun corresponding to an item may include one or plural items, unless the relevant context clearly indicates otherwise.

[0037] In the present disclosure, each of the phrases such as “A or B,” “at least one of A and B,” “at least one of A or B,” “A, B or C,” “at least one of A, B and C,” and “at least one of A, B, or C” may include any one of the items listed together in the corresponding phrase, or all possible combinations thereof. For example, “A or B,” “at least one of A and B,” or “at least one of A or B” may refer to cases including (1) at least one A, (2) at least one B, or (3) both at least one A and at least one B.

[0038] Terms such as "1st" and "2nd," or "first" and "second," may be used simply to distinguish a component from another component and do not limit the components in other aspects (e.g., importance or order).

[0039] Where any (e.g., 1st) component is referred to as "coupled" or "connected" to another (e.g., 2nd) component, with or without the terms "functionally" or "communicatively," it means that the component may be connected to the other component directly (e.g., via a wire), wirelessly, or through a third component.

[0040] Terms such as "include" or "have" are intended to specify the existence of the features, numbers, steps, actions, components, parts, or combinations thereof described in this document, and do not preclude the existence or addition of one or more other features, numbers, steps, actions, components, parts, or combinations thereof.

[0041] When it is said that a component is "connected," "combined," "supported," or "in contact" with another component, this includes not only cases where the components are directly connected, combined, supported, or in contact, but also cases where they are indirectly connected, combined, supported, or in contact through a third component.

[0042] When it is said that a component is located "on" another component, this includes not only cases where one component is in contact with the other, but also cases where another component exists between the two components.

[0043] The term "and / or" includes a combination of multiple related described components or any of the multiple related described components.

[0044] In some situations, the expression “device configured to do something” may mean that the device is “capable of doing something” in conjunction with other devices or components. For example, the phrase “processor configured (or set) to perform A, B, and C” may mean a dedicated processor for performing said operations (e.g., an embedded processor), or a general-purpose processor (e.g., a CPU or application processor) capable of performing said operations by executing one or more software programs stored in a memory device.

[0045] In the embodiments, a "module" or "part" performs at least one function or operation and may be implemented in hardware or software, or a combination of hardware and software. Additionally, a plurality of "modules" or a plurality of "parts" may be integrated into at least one module and implemented by at least one processor (not shown), except for a "module" or "part" that needs to be implemented in specific hardware.

[0046] The various elements and areas in the drawings are depicted schematically. Accordingly, the technical concept of the present invention is not limited by the relative sizes or spacing depicted in the attached drawings.

[0047] An embodiment of the present disclosure will be described in more detail below with reference to the drawings.

[0049] FIG. 1 is a block diagram showing the configuration of an electronic device according to one embodiment of the present disclosure. An electronic device (100) according to various embodiments of the present disclosure may include a user input unit (110), a camera (120), a microphone (130), a sensor (140), a display (150), a memory (160), a communication interface (170), an input / output interface (180), and a processor (190), as shown in FIG. 1. However, this is merely one embodiment, and it is understood that some components may be removed or added depending on the type of electronic device (100). For example, if the electronic device (100) is implemented as a set-top box, the electronic device (100) may not include a display (150).

[0050] The user input unit (110) is configured to receive user input for controlling the electronic device (100) and may include a button, a lever, a switch, a touch interface, etc. Here, the touch interface may be implemented so as to receive input through the user's touch on the screen of the display (150) of the electronic device (100). Alternatively, the user input unit (110) may include various input devices such as a remote control signal receiver, a keyboard, a mouse, etc.

[0051] In particular, the user input unit (110) can receive user commands for translating subtitle data.

[0052] The camera (120) can capture still images and video. A camera (120) according to various embodiments of the present disclosure may include one or more lenses, an image sensor, an image signal processor, and a flash. One or more lenses may include a telephoto lens, a wide-angle lens, and a super-wide-angle lens disposed on the surface of the electronic device (100), and may also include a three-dimensional depth lens. The camera (120) may be disposed on the surface (e.g., rear or front) of the electronic device (100), but is not limited to such configuration, and various embodiments according to the present disclosure may be implemented through a connection with a camera (120) that exists separately outside the electronic device (100).

[0053] A microphone (130) may refer to a device that detects sound and converts it into an electrical signal. For example, the microphone (130) can detect voice in real time, and by converting the detected voice into an electrical signal, the electronic device (100) can perform an action corresponding to the electrical signal. The microphone (130) may include a TTS module or an STT module. The microphone (130) may be included not only as a component of the electronic device (100) but also as a component of an input device.

[0054] Meanwhile, the microphone (130) may be provided in the electronic device (100), but this is merely one embodiment, and the user's voice may be received through the microphone of an external device (e.g., a remote control) that interacts with the electronic device (100). At this time, an analog voice signal is received through the microphone of the external device, and the external device may digitize the analog voice signal and transmit it to the electronic device (100). Here, the digital voice signal may be transmitted through a communication interface such as Bluetooth or Wi-Fi.

[0055] Additionally, a remote control application can be installed on a user terminal such as a smartphone, and the electronic device (100) can be controlled using the user's voice obtained through the user terminal. Here, the smartphone is merely one embodiment, and the user's voice may be received from an AI speaker or another device on which the application is installed. A user terminal on which the remote control application is installed can receive the user's voice, and the user terminal, such as a remote control or a smartphone, can exchange data with and control the electronic device (100) using Wi-Fi, Bluetooth, infrared, or the like. Accordingly, the electronic device (100) may include a plurality of types of communication interfaces. Additionally, the communication interface communicating with the server and the communication interface communicating with the external device may be different from each other, but this is merely one embodiment, and they may be implemented with the same communication interface.

[0056] The sensor (140) can detect the state of the electronic device (100) (e.g., movement) or the state of the external environment (e.g., user state) and generate an electrical signal or data value corresponding to the detected state. The sensor (140) may include, for example, a gesture sensor and an accelerometer.

[0057] Additionally, the sensor (140) may include a Time of Flight (ToF) sensor, such as an infrared sensor or an ultrasonic sensor, to obtain information about the distance to an object (e.g., the user's hand (10)).

[0058] The display (150) may include various types of display panels such as an LCD (Liquid Crystal Display) panel, an OLED (Organic Light-Emitting Diode) panel, an AM-OLED (Active-Matrix Organic Light-Emitting Diode) panel, an LCoS (Liquid Crystal on Silicon) panel, a QLED (Quantum dot Light-Emitting Diode) panel, a DLP (Digital Light Processing) panel, a PDP (Plasma Display Panel) panel, an inorganic LED panel, and a micro LED panel, but is not limited thereto. Meanwhile, the display (150) may form a touchscreen together with a touch panel and may be made of a flexible panel.

[0059] In one embodiment of the present disclosure, the display (150) may provide various UIs, such as a UI for guiding a user's hand gesture or a UI for guiding a mode of controlling an electronic device (100) using a hand gesture.

[0060] The memory (160) can store an operating system (OS) for controlling the overall operation of the components of the electronic device (100) and instructions or data related to the components of the electronic device (100). In particular, the memory (160) may include a plurality of modules for translating subtitle data into a target language. When these modules are executed, the electronic device (100) can load the data that the modules need to perform their operations from non-volatile memory into volatile memory. Here, loading means the operation of bringing data stored in non-volatile memory into volatile memory and storing it so that the processor (190) can access it.

[0061] Meanwhile, the memory (160) can be implemented as non-volatile memory (e.g., a hard disk, an SSD (solid-state drive), or flash memory), volatile memory (which may include memory within the processor (190)), etc.

[0062] Additionally, the memory (160) may store a language detection model for identifying the input language of subtitle data, and may store a plurality of translation programs or subtitle translation models for translating subtitle data of the input language into a target language. Here, the language detection model and the subtitle translation model may be trained neural network models, but are not limited thereto. Meanwhile, storing the plurality of types of programs or neural network models for translating subtitle data of the input language into a target language in the memory (160) of the electronic device (100) is merely one embodiment, and it will be understood that at least some of these programs or neural network models may be stored on an external server.

[0063] The communication interface (170) includes at least one circuit and can communicate with various types of external devices or servers. The communication interface (170) may include at least one of a BLE (Bluetooth Low Energy) module, a Wi-Fi communication module, a cellular communication module, a 3G (3rd generation) mobile communication module, an Ultra Wideband (UWB) communication module, a 4G (4th generation) mobile communication module, a 4th generation LTE (Long Term Evolution) communication module, and a 5G (5th generation) mobile communication module.

[0064] In particular, the communication interface (170) can receive content and subtitle data included in the content from an external server or external device.

[0065] The input / output interface (180) is configured to input or output at least one of audio and video signals. For example, the input / output interface (180) may be HDMI (High Definition Multimedia Interface), but this is merely an example of an embodiment, and it may be any one of MHL (Mobile High-Definition Link), USB (Universal Serial Bus), DP (Display Port), Thunderbolt, VGA (Video Graphics Array) port, RGB port, D-SUB (D-subminiature), or DVI (Digital Visual Interface). Depending on the implementation example, the input / output interface (180) may include separate ports for inputting and outputting only audio signals and for inputting and outputting only video signals, or it may be implemented as a single port for inputting and outputting both audio and video signals. In particular, the electronic device (100) can acquire at least one input video through the input / output interface (180).

[0066] In particular, the input / output interface (180) can receive content and subtitle data included in the content from an external device.

[0067] The processor (190) can control the electronic device (100) according to at least one instruction stored in memory (160).

[0068] In particular, the processor (190) may include one or more processors. Specifically, one or more processors may include one or more of a CPU (Central Processing Unit), GPU (Graphics Processing Unit), APU (Accelerated Processing Unit), MIC (Many Integrated Core), DSP (Digital Signal Processor), NPU (Neural Processing Unit), hardware accelerator, or machine learning accelerator. One or more processors may control one or any combination of other components of an electronic device and may perform operations or data processing related to communication. One or more processors may execute one or more programs or instructions stored in memory. For example, one or more processors may perform a method according to one embodiment of the present disclosure by executing one or more instructions stored in memory.

[0069] When a method according to one embodiment of the present disclosure includes a plurality of operations, the plurality of operations may be performed by a single processor or by a plurality of processors. That is, when a first operation, a second operation, and a third operation are performed by a method according to one embodiment, the first operation, the second operation, and the third operation may all be performed by a first processor, or the first operation and the second operation may be performed by a first processor (e.g., a general-purpose processor) and the third operation may be performed by a second processor (e.g., a processor for artificial intelligence).

[0070] One or more processors may be implemented as a single-core processor comprising one core, or as one or more multicore processors comprising multiple cores (e.g., homogeneous multicore or heterogeneous multicore). When one or more processors are implemented as multicore processors, each of the multiple cores included in the multicore processor may include internal processor memory such as cache memory or on-chip memory, and a common cache shared by multiple cores may be included in the multicore processor. Additionally, each of the multiple cores included in the multicore processor (or some of the multiple cores) may independently read and execute program instructions for implementing a method according to one embodiment of the present disclosure, or all (or some) of the multiple cores may be linked together to read and execute program instructions for implementing a method according to one embodiment of the present disclosure.

[0071] When a method according to one embodiment of the present disclosure includes a plurality of operations, the plurality of operations may be performed by one of the plurality of cores included in a multi-core processor, or may be performed by a plurality of cores. For example, when a first operation, a second operation, and a third operation are performed by a method according to one embodiment, the first operation, the second operation, and the third operation may all be performed by a first core included in a multi-core processor, or the first operation and the second operation may be performed by a first core included in a multi-core processor and the third operation may be performed by a second core included in a multi-core processor.

[0072] In embodiments of the present disclosure, the processor (190) may mean a system-on-chip (SoC) in which one or more processors and other electronic components are integrated, a single-core processor, a multi-core processor, or a core included in a single-core processor or a multi-core processor, wherein the core may be implemented as a CPU, GPU, APU, MIC, DSP, NPU, hardware accelerator or machine learning accelerator, etc., but the embodiments of the present disclosure are not limited thereto.

[0073] In particular, the processor (190) obtains subtitle data corresponding to the content by executing at least one instruction, obtains information about the input language corresponding to the subtitle data and the target language related to the translation of the subtitle data, and translates the subtitle data into the target language using one of a plurality of translation methods based on the performance of the electronic device (100), the input speed of the subtitle data, and the relationship between the input language and the target language.

[0074] In one or more embodiments, the processor (190) can translate subtitle data into a target language using a first translation method included in a plurality of translation methods, if at least one of the processors includes a processor for artificial intelligence.

[0075] In one or more embodiments, the processor (190) may identify whether the input speed of subtitle data is below a threshold value if at least one processor for artificial intelligence is not included, and if the input speed of subtitle data is below the threshold value, translate the subtitle data into a target language using a first translation method included in a plurality of translation methods, and if the input speed of subtitle data is above the threshold value, translate the subtitle data into a target language using a second translation method or a third translation method included in a plurality of translation methods.

[0076] In one or more embodiments, the processor (190) may identify the input speed of subtitle data based on the speech speed of a person included in the content.

[0077] In one or more embodiments, the processor (190) may translate subtitle data into the target language using a second translation method if the word order of the input language and the target language is the same, and translate subtitle data into the target language using a third translation method if the word order of the input language and the target language is different.

[0078] In one or more embodiments, the processor (190) may identify whether the word order of the input language and the target language is the same based on whether the part of speech and meaning of the matching words between the input language and the target language are the same.

[0079] Here, the first translation method corresponds to the Incremental Decoding translation method, the second translation method corresponds to the word cluster translation method, and the third translation method corresponds to the End-of-Sentence translation method.

[0080] In one or more embodiments, the processor (190) may convert uppercase letters included in the subtitle data, obtained in a preset number of characters or more, into lowercase letters, replace special characters included in the subtitle data with spaces through a preprocessing function, and identify the input language of the subtitle data through a language detection model based on the subtitle data on which the preprocessing function has been performed.

[0081] In one or more embodiments, the processor (190) may provide subtitle data translated into a target language while the content is being output.

[0082] In one or more embodiments, the processor (190) may translate the subtitle data into the target language using a first translation method included in a plurality of translation methods when it is identified that the currently available computing resources and memory capacity are above a threshold.

[0083] FIG. 2 is a diagram including a plurality of modules for translating subtitle data according to one embodiment of the present disclosure. As shown in FIG. 2, the electronic device may include a subtitle providing module (210), a service module (220), a resource (230), a UI module (240), and Other Apps (250). Meanwhile, the configuration shown in FIG. 2 may be implemented in software, but this is merely one embodiment, and it may also be implemented by a combination of software and hardware.

[0084] The subtitle providing module (210) can acquire broadcast signals from various sources and acquire subtitle data from the broadcast signals. Here, the subtitle data may include status information indicating the current state of the subtitle service, information about the subtitle, and a subtitle stream that transmits the subtitle data in real-time or in file format.

[0085] In one or more embodiments, the subtitle providing module (210) can acquire subtitle data included in the broadcast signal. Alternatively, the subtitle providing module (210) can acquire subtitle data through speech recognition of the audio signal included in the broadcast signal. Alternatively, the subtitle providing module (210) can acquire subtitle data through Optical Character Recognition (OCR) of the video signal included in the broadcast signal.

[0086] The service module (220) can provide a translation service for subtitle data. Specifically, the service module (220) can provide a translation service that translates subtitle data into a target language by identifying one of a plurality of translation methods based on the performance of the electronic device (200), the input speed of the subtitle data, and the relationship between the input language and the target language.

[0087] In particular, the service module (220) may include a subtitle processing handler (Caption Process Handler) (221), an engine usage handler (Engine Usage Handler) (222), and an engine update handler (Engine Update Handler) (223).

[0088] The subtitle processing handler (221) can manage subtitle generation and processing. The subtitle processing handler (221) can generate or edit subtitle data and process subtitle data according to the output format. In particular, the subtitle processing handler (221) can preprocess subtitles. Specifically, the preprocessing can include applying predefined rules to make the subtitle format consistent and to reduce errors that may occur during the translation process. This preprocessing improves the quality of subtitles and helps the translation process proceed smoothly. The subtitle processing handler (221) can then translate the preprocessed subtitle data using the translation model that corresponds to the identified translation method, among the plurality of translation programs or translation models stored in the resource (230).

[0089] The engine usage handler (222) can perform the function of tracking and managing the usage status of the engine in a system or application. In particular, the engine usage handler (222) can identify one of a plurality of translation methods based on the performance of the electronic device (200), the input speed of subtitle data, and the relationship between the input language and the target language. The method for determining the translation method will be explained in more detail later with reference to FIG. 3.

[0090] The engine update handler (223) may be responsible for handling the update and maintenance tasks of the engine. That is, the engine update handler (223) may handle updates to keep the engine up to date and to improve performance and security. In one or more embodiments, the engine update handler (223) may update the translation method to a different translation method based on the performance of the electronic device (200), the input speed of subtitle data, and the relationship between the input language and the target language.

[0091] The resource (230) can store various translation models. The resource (230) can store translation models for various translation methods and translation models for various languages. For example, the resource (230) can store multiple translation models corresponding to first to third translation methods. Here, the first translation method may be an Incremental Decoding translation method, the second translation method may be a word cluster translation method, and the third translation method may be an End-of-Sentence translation method. Additionally, the resource (230) can store various translation models such as a Korean-English translation model, a Korean-Japanese translation model, an English-French translation model, etc.

[0092] The UI module (240) can provide subtitle data translated by the service module (220) to the UI. For example, the UI module (240) can create a UI element containing the translated subtitle data and display the created UI element on a part of the screen (e.g., the bottom part of the screen).

[0093] Other Apps (250) can provide various application services, and in particular, can send a translation request to the service module (220) and receive translation results from the service module (220).

[0094] FIG. 3 is a flowchart illustrating a method for translating subtitle data of an electronic device according to one embodiment of the present disclosure.

[0095] In the following embodiments, each operation may be performed sequentially, but is not necessarily performed sequentially. For example, the order of each operation may be changed, and at least two operations may be performed in parallel.

[0096] According to one or more embodiments, S310 to S390 may be understood to be performed in a processor (e.g., processor (190) of FIG. 1) of an electronic device (e.g., electronic device (100) of FIG. 1).

[0097] In one or more embodiments, the electronic device (100) can acquire subtitle data (S310). Specifically, the electronic device (100) can acquire subtitle data included in a broadcast signal input in real time. Alternatively, the electronic device (100) can acquire subtitle data by performing speech recognition on an audio signal included in a broadcast signal input in real time. Alternatively, the electronic device (100) can acquire subtitle data by performing OCR on a video signal included in a broadcast signal input in real time.

[0098] Additionally, the electronic device (100) can detect the input language of the subtitle data and identify the target language. Here, the input language refers to the language type of the subtitle data, and the target language may refer to the language type to which the subtitle data is to be translated. For example, the input language may be Korean and the target language may be English. Meanwhile, the input language may be referred to as the first language, source language, etc., and the target language may be referred to as the second language, output language, etc.

[0099] In one or more embodiments, the electronic device (100) can obtain information about the input language of the subtitle data using a language detection model. Specifically, to increase the accuracy of the language detection model, the electronic device (100) can wait until it has obtained a preset number of characters or more (e.g., 20 characters) of subtitle data, and then perform a preprocessing operation that converts uppercase letters included in the obtained subtitle data into lowercase letters and replaces special characters included in the subtitle data with spaces. That is, the electronic device (100) can perform preprocessing to minimize errors caused by unnecessary uppercase letters or symbols. Then, the electronic device (100) can identify the input language of the subtitle data by inputting the subtitle data on which the preprocessing operation has been performed into the language detection model.
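
The preprocessing described above can be illustrated with a minimal Python sketch. The function name preprocess_for_detection and the regular expression are assumptions made for illustration only; the disclosure does not specify which characters count as special characters, and the language detection model itself is not shown.

```python
import re
from typing import Optional

# Example preset from the text (20 characters); the actual value is an assumption.
MIN_CHARS = 20


def preprocess_for_detection(buffered: str, min_chars: int = MIN_CHARS) -> Optional[str]:
    """Normalize buffered subtitle text before language detection.

    Returns None while fewer than `min_chars` characters have been buffered,
    so the caller keeps accumulating subtitle data.
    """
    if len(buffered) < min_chars:
        return None
    lowered = buffered.lower()  # uppercase -> lowercase
    # Assumption: "special characters" are anything that is not a letter,
    # digit, or whitespace. \w is Unicode-aware, so Hangul, kana, etc. survive.
    return re.sub(r"[^\w\s]|_", " ", lowered)
```

For example, preprocess_for_detection("Hi!") returns None because too few characters are available, whereas a longer input is lowercased and has its punctuation replaced with spaces before being passed to a detector.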

[0100] In one or more embodiments, the electronic device (100) can obtain information about a target language based on user input. For example, the electronic device (100) can set a target language based on user input received through a UI and obtain information about the set target language.

[0101] The electronic device (100) can identify whether the performance of the electronic device (100) satisfies preset conditions (S320). Here, the performance of the electronic device (100) may include the performance of hardware included in the electronic device (100), memory capacity, computing resources, etc.

[0102] In one or more embodiments, the electronic device (100) can identify whether it includes a processor for artificial intelligence. For example, the electronic device (100) can identify whether at least one processor included in the electronic device (100) includes a GPU or NPU, etc., which is a processor for artificial intelligence. That is, if the electronic device (100) is identified as including an artificial intelligence processor, it can identify that it satisfies a preset condition.

[0103] In one or more embodiments, the electronic device (100) can identify whether the currently available computing resources and memory capacity are above a threshold. And, if the electronic device (100) identifies that the currently available computing resources and memory capacity are above a threshold, it can identify that a preset condition is satisfied.

[0104] When it is identified that the performance of the electronic device (100) satisfies a preset condition (S320-Y), the electronic device (100) can identify a translation method for subtitle data as a first translation method (S350). Here, the first translation method may be an Incremental Decoding translation method. The Incremental Decoding translation method sequentially generates the translation result one token (word) at a time: it generates one word, then predicts and appends the next word based on the words generated so far. When the translation is completed, the method terminates translation by generating a termination token, <eos> (End of Sentence). The Incremental Decoding translation method enables real-time processing and has the advantage of reflecting the context of the sentence.
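
The token-by-token generation described above can be sketched as a simple greedy loop. This is only an illustration under stated assumptions: next_token stands in for a trained neural network model, toy_next_token is a hypothetical lookup used so the example runs, and max_len is an added safety bound that the disclosure does not mention.

```python
from typing import Callable, List

EOS = "<eos>"  # termination token named in the text


def incremental_decode(
    next_token: Callable[[List[str], List[str]], str],
    source: List[str],
    max_len: int = 50,
) -> List[str]:
    """Generate a translation one token at a time until <eos> is produced."""
    output: List[str] = []
    for _ in range(max_len):
        # Each step is conditioned on the source and on all tokens so far.
        token = next_token(source, output)
        if token == EOS:
            break
        output.append(token)
    return output


def toy_next_token(source: List[str], prefix: List[str]) -> str:
    """Hypothetical stand-in for a model: replays a fixed translation."""
    table = ["I", "go", "to", "school"]
    return table[len(prefix)] if len(prefix) < len(table) else EOS
```

With the toy model, incremental_decode(toy_next_token, ["나는", "학교에", "간다"]) yields the tokens "I", "go", "to", "school", and each partial output could be displayed as soon as it is produced.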

[0105] If it is identified that the performance of the electronic device (100) does not satisfy preset conditions (S320-N), the electronic device (100) can identify whether the input speed of subtitle data is less than a preset value (S330). That is, the electronic device (100) can determine whether it can process translation within the input speed of subtitle data corresponding to the average speech speed of a person. If the subtitle data is input at a speed lower than the average speech speed of a person, the electronic device (100) can use a translation method that enables more accurate processing.

[0106] In one or more embodiments, the electronic device (100) can identify the input speed of subtitle data based on the speech rate of a person included in the content. That is, when subtitle data is obtained through speech recognition of the voice of a person included in the audio data, the electronic device (100) can identify the input speed of subtitle data by detecting the speech rate of a person included in the content.

[0107] In one or more embodiments, the electronic device (100) can identify the input speed of subtitle data included in a real-time broadcast signal.
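
One way to estimate the input speed of subtitle data is a sliding-window character rate, sketched below. The class name InputSpeedMeter, the characters-per-second unit, and the 5-second window are all assumptions for illustration; the disclosure does not define the unit or the measurement window.

```python
from collections import deque
from typing import Deque, Tuple


class InputSpeedMeter:
    """Estimates subtitle input speed over a sliding time window (hypothetical)."""

    def __init__(self, window_sec: float = 5.0) -> None:
        self.window_sec = window_sec
        self.events: Deque[Tuple[float, int]] = deque()  # (timestamp, char count)

    def add(self, timestamp: float, text: str) -> None:
        """Record a subtitle fragment and drop fragments older than the window."""
        self.events.append((timestamp, len(text)))
        while timestamp - self.events[0][0] > self.window_sec:
            self.events.popleft()

    def chars_per_sec(self) -> float:
        """Average characters per second over the window."""
        total = sum(count for _, count in self.events)
        return total / self.window_sec
```

The resulting value could then be compared with the preset value in S330; how that preset value is chosen is not specified in the text.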

[0108] If it is identified that the input speed of the subtitle data is less than a preset value (S330-Y), the electronic device (100) can identify the translation method for the subtitle data as a first translation method (S350). Here, the first translation method may be an Incremental Decoding translation method.

[0109] If it is identified that the input speed of subtitle data is greater than or equal to a preset value (S330-N), the electronic device (100) can identify whether the word order of the input language and the target language is the same (S340). In one or more embodiments, the electronic device (100) can identify whether the word order is the same by identifying whether the part of speech and meaning of the matching words of the input language and the target language are the same. For example, if the input language is Korean and the target language is English, the electronic device (100) can obtain two sentences: "나는 학교에 간다" in the input language and "I go to school" in the target language. Then, the electronic device (100) can extract word clusters from each sentence and compare them. For example, as shown in FIG. 4, the electronic device (100) can compare the part of speech and meaning of the first Korean word cluster "나는" (410-1) and the first English word cluster "I" (420-1). At this time, the part of speech and meaning of the first Korean word cluster "나는" (410-1) and the first English word cluster "I" (420-1) may be identical. Additionally, the electronic device (100) can compare the part of speech and meaning of the second Korean word cluster "학교에" (410-2) and the second English word cluster "go to" (420-2). At this time, the part of speech and meaning of the second Korean word cluster "학교에" (410-2) and the second English word cluster "go to" (420-2) may be different from each other. Additionally, the electronic device (100) can compare the part of speech and meaning of the third Korean word cluster "간다" (410-3) and the third English word cluster "school" (420-3). At this time, the part of speech and meaning of the third Korean word cluster "간다" (410-3) and the third English word cluster "school" (420-3) may differ from each other. That is, the electronic device (100) can identify that the word order is different because the parts of speech and meanings of the second word clusters, and of the third word clusters, differ from each other. However, if the parts of speech and meanings of the word clusters of the input language and the target language are the same, the electronic device (100) can identify that the word order of the input language and the target language is the same.

[0110] However, this is merely one example, and the electronic device (100) may store information in memory (160) regarding whether the word order of the input language and the target language is the same.
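
The position-by-position comparison of word clusters in paragraph [0109] can be sketched as follows. The Cluster tuple layout and the hand-written part-of-speech and meaning labels are hypothetical; an actual device would obtain them from a tagger or from stored language-pair information, and treating a mismatch in cluster count as a different word order is an added assumption.

```python
from typing import List, Tuple

# (surface form, part of speech, meaning label); labels are illustrative only.
Cluster = Tuple[str, str, str]


def same_word_order(src: List[Cluster], tgt: List[Cluster]) -> bool:
    """True if every aligned pair of clusters shares part of speech and meaning."""
    if len(src) != len(tgt):
        return False  # assumption: unequal cluster counts count as different order
    return all(s[1] == t[1] and s[2] == t[2] for s, t in zip(src, tgt))


korean = [
    ("나는", "subject", "I"),
    ("학교에", "adverbial", "to school"),
    ("간다", "verb", "go"),
]
english = [
    ("I", "subject", "I"),
    ("go to", "verb", "go"),
    ("school", "adverbial", "to school"),
]
```

Here same_word_order(korean, english) is False because the second and third clusters differ, which corresponds to branch S340-N.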

[0111] When it is identified that the word order of the input language and the target language is the same (S340-Y), the electronic device (100) can identify the translation method for the subtitle data as a second translation method (S360). Here, the second translation method may be a word cluster translation method. The word cluster translation method may be a method of dividing a sentence into multiple word clusters, translating the word clusters independently, and then combining them to complete the entire sentence. The word cluster translation method may enable faster translation than the first translation method, the Incremental Decoding translation method, but its accuracy may be lower.
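
A minimal sketch of translating clusters independently and then joining them is shown below. The helper translate_by_clusters and the toy English-to-French lookup table are hypothetical; in the disclosure each cluster would be translated by a neural network model, and how a sentence is segmented into clusters is not specified.

```python
from typing import Callable, List


def translate_by_clusters(
    clusters: List[str], translate_cluster: Callable[[str], str]
) -> str:
    """Translate each cluster on its own, then join the results in order.

    Because clusters are translated independently, this only produces a
    well-ordered sentence when source and target share the same word order.
    """
    return " ".join(translate_cluster(cluster) for cluster in clusters)


# Toy lookup standing in for a per-cluster translation model (assumption).
TOY_EN_FR = {"I": "Je", "go": "vais", "to school": "à l'école"}


def toy_translate(cluster: str) -> str:
    return TOY_EN_FR[cluster]
```

Calling translate_by_clusters(["I", "go", "to school"], toy_translate) joins the three independently translated clusters in their original order, which is why this method is selected only when the word order matches.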

[0112] If it is identified that the word order of the input language and the target language are different (S340-N), the electronic device (100) can identify the translation method for the subtitle data as a third translation method (S370). Here, the third translation method may be an End-of-Sentence translation method. An End-of-Sentence translation method may be a method that receives the entire content of a sentence as input and generates the entire sentence at once. An End-of-Sentence translation method may enable faster translation than an Incremental Decoding translation method and a word cluster translation method, but may have lower accuracy.
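
The selection flow of S320 to S370 can be summarized in a short sketch. The names Method and select_method and the parameter types are assumptions for illustration; how the performance condition and the speed threshold are evaluated is left to the embodiments described above.

```python
from enum import Enum


class Method(Enum):
    INCREMENTAL_DECODING = 1  # first translation method
    WORD_CLUSTER = 2          # second translation method
    END_OF_SENTENCE = 3       # third translation method


def select_method(
    meets_performance: bool,
    input_speed: float,
    speed_threshold: float,
    same_word_order: bool,
) -> Method:
    """Choose a translation method following the flow of FIG. 3."""
    if meets_performance:                # S320-Y
        return Method.INCREMENTAL_DECODING   # S350
    if input_speed < speed_threshold:    # S330-Y
        return Method.INCREMENTAL_DECODING   # S350
    if same_word_order:                  # S340-Y
        return Method.WORD_CLUSTER           # S360
    return Method.END_OF_SENTENCE            # S370
```

For instance, a device without an artificial intelligence processor that receives fast Korean subtitles to be translated into English would fall through to Method.END_OF_SENTENCE.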

[0113] The electronic device (100) can translate subtitle data using the identified translation method (S380). Specifically, the electronic device (100) can perform preprocessing on the subtitle data. Here, the preprocessing may include applying predefined rules to make the format of the subtitles consistent and to reduce errors that may occur during the translation process.

[0114] Additionally, if identified as a first translation method, the electronic device (100) can translate subtitle data using a neural network model corresponding to the first translation method. The neural network model corresponding to the first translation method is a neural network model for performing sequential prediction, and may be, for example, an autoregressive model, but is not limited thereto.

[0115] Additionally, if identified as a second translation method, the electronic device (100) can translate subtitle data using a neural network model corresponding to the second translation method. The neural network model corresponding to the second translation method may be a Sequence-to-Sequence (Seq2Seq) structure and a Transformer model extended therefrom, but is not limited thereto.

[0116] Additionally, if identified as a third translation method, the electronic device (100) can translate subtitle data using a neural network model corresponding to the third translation method. The neural network model corresponding to the third translation method may also be a Sequence-to-Sequence (Seq2Seq) structure and a Transformer model extended therefrom, but is not limited thereto.

[0117] The electronic device (100) can perform post-processing on the translated subtitles. The post-processing is also rule-based and may include a cleanup operation that performs a final review of the quality of the translated subtitles before they are provided to the user.

[0118] The electronic device (100) can output translated subtitle data (S390). Specifically, the electronic device (100) can obtain UI elements using the translated subtitle data and output them on a preset area (e.g., the bottom area of the screen). By doing so, the electronic device (100) can provide subtitles on the screen in real time, and the user can smoothly understand the content through the translated subtitles. By outputting subtitles in real time, accurate and fast translated subtitles can be provided to the user, thereby improving the viewing experience.

[0119]

[0120] FIG. 5 is a flowchart illustrating a method for controlling an electronic device according to one embodiment of the present disclosure.

[0121] First, the electronic device (100) acquires subtitle data of the content (S510). Specifically, the electronic device (100) can acquire subtitle data included in a broadcast signal input in real time. Alternatively, the electronic device (100) can acquire subtitle data by performing speech recognition on an audio signal included in a broadcast signal input in real time. Alternatively, the electronic device (100) can acquire subtitle data by performing OCR on a video signal included in a broadcast signal input in real time.

[0122] The electronic device (100) identifies information regarding the input language of the subtitle data and the target language to which the subtitle data is to be translated (S520). In one or more embodiments, the electronic device (100) may obtain a preset number of characters or more of subtitle data and perform a preprocessing operation that converts uppercase letters included in the obtained subtitle data into lowercase letters and replaces special characters included in the subtitle data with spaces. Then, the electronic device (100) may input the subtitle data on which the preprocessing operation has been performed into a language detection model to identify the input language of the subtitle data. Additionally, the electronic device (100) may obtain information regarding the target language according to user input.

[0123] The electronic device (100) identifies one of a plurality of translation methods based on the performance of the electronic device (100), the input speed of subtitle data, and the relationship between the input language and the target language, and translates the subtitle data into the target language (S530). In one or more embodiments, if the electronic device (100) identifies that at least one processor included in the electronic device (100) includes a processor for artificial intelligence, the translation method may be identified as a first translation method. Additionally, if it is identified that the at least one processor does not include a processor for artificial intelligence, the electronic device (100) may identify whether the input speed of the subtitle data is below a threshold. If the input speed of the subtitle data is below the threshold, the electronic device (100) may identify the translation method as the first translation method, and if the input speed of the subtitle data is greater than or equal to the threshold, the electronic device (100) may identify the translation method as either a second translation method or a third translation method. Here, the electronic device (100) may identify the input speed of the subtitle data based on the speech speed of a person included in the content. If it is identified that the word order of the input language and the target language is the same, the electronic device (100) may identify the translation method as the second translation method, and if it is identified that the word order of the input language and the target language is different, the electronic device (100) may identify the translation method as the third translation method. The electronic device (100) can identify whether the word order of the input language and the target language is the same by identifying whether the part of speech and meaning of the matching words of the input language and the target language are the same.

[0124] Meanwhile, according to one embodiment of the present disclosure, the first translation method may be an Incremental Decoding translation method, the second translation method may be a word cluster translation method, and the third translation method may be an End-of-Sentence translation method.

[0125] In one or more embodiments, the electronic device (100) may provide subtitle data translated into a target language while the content is displayed.

[0126]

[0127] Meanwhile, the method according to various embodiments of the present disclosure may be provided by being included in a computer program product. The computer program product may be traded between a seller and a buyer as a product. The computer program product may be distributed in the form of a device-readable storage medium (e.g., compact disc read-only memory (CD-ROM)), or distributed online (e.g., download or upload) through an application store or directly between two user devices (e.g., smartphones). In the case of online distribution, at least a portion of the computer program product (e.g., downloadable app) may be temporarily stored or temporarily created on a device-readable storage medium, such as the memory of a manufacturer's server, an application store's server, or a relay server.

[0128] A method according to various embodiments of the present disclosure may be implemented as software comprising instructions stored on a storage medium readable by a machine (e.g., a computer). The machine is a device capable of calling instructions stored in the storage medium and operating according to the called instructions, and may include an electronic device according to the disclosed embodiments.

[0129] Meanwhile, a device-readable storage medium may be provided in the form of a non-transitory storage medium. Here, 'non-transitory storage medium' simply means that it is a tangible device and does not contain a signal (e.g., electromagnetic waves), and this term does not distinguish between cases where data is stored semi-permanently and cases where it is stored temporarily. For example, a 'non-transitory storage medium' may include a buffer in which data is stored temporarily.

[0130] When the above instruction is executed by a processor, the processor may perform the function corresponding to the instruction directly or by using other components under the control of the processor. The instruction may include code generated or executed by a compiler or an interpreter.

[0131] Although preferred embodiments of the present disclosure have been illustrated and described above, the present disclosure is not limited to the specific embodiments described above. It is understood that various modifications can be made by those skilled in the art without departing from the essence of the present disclosure as claimed in the claims, and such modifications should not be understood separately from the technical spirit or perspective of the present disclosure.

Claims

1. An electronic device comprising: a memory storing instructions; and at least one processor, wherein the instructions, when executed collectively or individually by the at least one processor, cause the electronic device to: acquire subtitle data corresponding to content; obtain information about an input language corresponding to the subtitle data and a target language related to translation of the subtitle data; and translate the subtitle data into the target language using one of a plurality of translation methods based on the performance of the electronic device, an input speed of the subtitle data, and a relationship between the input language and the target language.

2. In Paragraph 1, When the above instructions are executed collectively or individually by the processor, the electronic device, An electronic device that translates subtitle data into the target language using a first translation method included in the plurality of translation methods, wherein at least one of the above processors includes a processor for artificial intelligence.

3. In Paragraph 2, When the above instructions are executed collectively or individually by the processor, the electronic device, If an artificial intelligence processor is not included among the above at least one processor, it identifies whether the input speed of the subtitle data is below a threshold, and If the input speed of the above subtitle data is less than a threshold, the subtitle data is translated into the target language using a first translation method included in the plurality of translation methods, and An electronic device that translates the subtitle data into the target language using a second translation method or a third translation method included in the plurality of translation methods when the input speed of the subtitle data is greater than or equal to a threshold.

4. In Paragraph 3, When the above instructions are executed collectively or individually by the processor, the electronic device, An electronic device that identifies the input speed of the subtitle data based on the speech speed of a person included in the content.

5. In Paragraph 3, When the above instructions are executed collectively or individually by the processor, the electronic device, If the word order of the input language and the target language are the same, the subtitle data is translated into the target language using the second translation method, and An electronic device that translates the subtitle data into the target language using the third translation method when the word order of the input language and the target language is different.

6. In Paragraph 5, When the above instructions are executed collectively or individually by the processor, the electronic device, An electronic device that identifies whether the word order of the input language and the target language is the same based on whether the part of speech and meaning of the matching words between the input language and the target language are the same.

7. In Paragraph 5, The above first translation method corresponds to the Incremental Decoding translation method, and The above second translation method corresponds to a word block translation method, and An electronic device characterized by the above-mentioned third translation method corresponding to an End-of-Sentence translation method.

8. In Paragraph 1, When the above instructions are executed collectively or individually by the processor, the electronic device, Lowercase letters are obtained based on uppercase letters included in the above subtitle data obtained at a preset number of characters or more, and Based on the special characters included in the above subtitle data, spaces are obtained through a preprocessing function, and An electronic device that identifies the input language of the subtitle data through a language detection model based on the subtitle data on which the above-mentioned preprocessing function has been performed.

9. In Paragraph 1, the electronic device wherein the instructions, when executed collectively or individually by the at least one processor, cause the electronic device to provide the subtitle data translated into the target language while the content is being output.

10. In Paragraph 1, the electronic device wherein the instructions, when executed collectively or individually by the at least one processor, cause the electronic device to translate the subtitle data into the target language using a first translation method included in the plurality of translation methods when the available computing resources and memory capacity are above a threshold.

11. A method for controlling an electronic device, the control method comprising: a step of acquiring subtitle data corresponding to content; a step of obtaining information about an input language corresponding to the subtitle data and a target language related to translation of the subtitle data; and a step of translating the subtitle data into the target language using one of a plurality of translation methods based on the performance of the electronic device, the input speed of the subtitle data, and the relationship between the input language and the target language.

12. In Paragraph 11, the control method wherein the translating step comprises translating the subtitle data into the target language using a first translation method included in the plurality of translation methods when at least one processor of the electronic device includes a processor for artificial intelligence.

13. In Paragraph 12, the control method wherein the translating step comprises: a step of identifying, when the at least one processor does not include a processor for artificial intelligence, whether the input speed of the subtitle data is less than a threshold; a step of translating the subtitle data into the target language using the first translation method included in the plurality of translation methods when the input speed of the subtitle data is less than the threshold; and a step of translating the subtitle data into the target language using a second translation method or a third translation method included in the plurality of translation methods when the input speed of the subtitle data is greater than or equal to the threshold.

14. In Paragraph 13, the control method further comprising a step of identifying the input speed of the subtitle data based on the speech speed of a person included in the content.

15. In Paragraph 13, the control method wherein the translating step comprises: translating the subtitle data into the target language using the second translation method when the word order of the input language and the word order of the target language are the same; and translating the subtitle data into the target language using the third translation method when the word order of the input language and the word order of the target language are different.
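
For illustration only, the selection logic recited in Paragraphs 12, 13 and 15 could be sketched as a small decision function. This is a minimal sketch under stated assumptions, not the claimed implementation: the names TranslationMethod, SubtitleContext and select_translation_method, the words-per-second unit, and the default threshold of 3.0 are all hypothetical. The naming of the first, second and third methods follows the correspondence recited in Paragraph 7.

```python
from dataclasses import dataclass
from enum import Enum


class TranslationMethod(Enum):
    # Correspondence taken from Paragraph 7; the enum itself is hypothetical.
    INCREMENTAL_DECODING = "first"
    WORD_BLOCK = "second"
    END_OF_SENTENCE = "third"


@dataclass
class SubtitleContext:
    """Hypothetical container for the inputs the claims refer to."""
    has_ai_processor: bool   # whether a processor for artificial intelligence is present
    input_speed: float       # assumed unit: words per second
    same_word_order: bool    # whether input and target languages share word order


def select_translation_method(ctx: SubtitleContext,
                              speed_threshold: float = 3.0) -> TranslationMethod:
    """Pick a translation method; the threshold value is an assumption."""
    # Paragraph 12: an AI processor is present, so use the first method.
    if ctx.has_ai_processor:
        return TranslationMethod.INCREMENTAL_DECODING
    # Paragraph 13: no AI processor, input speed less than the threshold.
    if ctx.input_speed < speed_threshold:
        return TranslationMethod.INCREMENTAL_DECODING
    # Paragraph 15: input speed at or above the threshold, so the choice
    # between the second and third methods depends on word order.
    if ctx.same_word_order:
        return TranslationMethod.WORD_BLOCK
    return TranslationMethod.END_OF_SENTENCE


if __name__ == "__main__":
    fast_different_order = SubtitleContext(
        has_ai_processor=False, input_speed=4.5, same_word_order=False
    )
    print(select_translation_method(fast_different_order).name)
```

In this sketch the check for the processor for artificial intelligence comes first, so the input speed and the word order are consulted only when that processor is absent, which mirrors the order of the dependent paragraphs.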