Display device interworking with external device
The display device synchronizes subtitles with video playback using AI and dynamically controls translation quality, addressing synchronization and recognition issues, thereby improving the user experience.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- LG ELECTRONICS INC
- Filing Date
- 2024-12-27
- Publication Date
- 2026-07-02
AI Technical Summary
Existing display devices struggle with synchronizing translated subtitles with video playback and users find it difficult to intuitively recognize multilingual subtitle support, especially during channel switching.
A display device interacts with an external device to generate and synchronize subtitles in real-time using artificial intelligence, dynamically controlling translation quality based on the external device's state and processing capabilities, and provides a user interface for intuitive recognition of multilingual subtitle support.
The solution ensures synchronized subtitle display with video playback and allows users to easily recognize multilingual subtitle support, enhancing the viewing experience by adapting to the external device's processing capabilities and providing context-based translation.
Figure KR2024021336_02072026_PF_FP_ABST
Abstract
Description
Display device that interacts with external devices
[0001] The present disclosure relates to a display device that interacts with an external device. More specifically, the present disclosure relates to a display device that displays subtitles in interaction with an external device in a content provision system.
[0002] Recently, digital TV services using wired or wireless communication networks have become commonplace. Digital TV services can provide a variety of services that were not available through existing analog broadcasting services.
[0003] For example, IPTV (Internet Protocol Television) and SMART TV services, which are types of digital TV services, provide interactivity that allows users to actively select the types of programs and viewing time.
[0004] Meanwhile, among media content, video is provided through telecommunication or broadcasting networks, so it is available to all users regardless of region or language. However, because the language of a video varies by country or region, the videos a user can actually enjoy are inevitably limited by the user's language skills.
[0005] For example, domestic (Korean) users can watch videos produced in English-speaking countries through display devices. However, since the audio and subtitles of such videos are produced in English, domestic users are limited in their use of videos created by users from English-speaking countries.
[0006] Meanwhile, translated subtitles can be displayed via voice recognition while playing video content on a display device. However, since the display device generates the translated subtitles while playing the video content, there are difficulties in displaying the subtitles on the screen in sync with the playback of the video content.
[0007] Furthermore, even if a display device supports multilingual subtitles based on multilingual translation, there is an issue in that users find it difficult to intuitively recognize that such support exists. Therefore, it is necessary to provide a user interface (UI) that allows users to intuitively recognize the support of multilingual subtitles on the initial screen or when switching channels.
[0008] The present disclosure is intended to resolve the issue that even if a display device supports a multilingual subtitle function based on multilingual translation, it is difficult for users to intuitively recognize that a multilingual subtitle function is supported.
[0009] The present disclosure is intended to provide a user interface (UI) that allows intuitive recognition that a multilingual subtitle function is supported on the initial screen or during channel switching of a display device.
[0010] The present disclosure is intended to resolve the difficulty of displaying translated subtitles on the screen at the time the video content is played, as the display device generates translated subtitles while playing video content.
[0011] The present disclosure is intended to propose a method for synchronizing media being played on a webOS-based display device with subtitles generated in real time through an external device.
[0012] The present disclosure is intended to transmit media being played to an external device and to generate subtitles in real time using artificial intelligence (AI).
[0013] The present disclosure is intended to dynamically control translation quality according to the state and processing capabilities of an external device.
[0014] A display device that interacts with an external device according to the present disclosure to achieve the above or other purposes comprises: a communication module configured to be connected to a control server interacting with a content server based on an input from a remote control device that executes content services through a plurality of channels; a display configured to play content on a screen through said channels; and a processor that, when connected to a control server associated with said content service, interacts with said external device to display a subtitle icon associated with the provision of translated subtitles on said screen. The processor may detect a connection to said control server or a transition between said plurality of channels, and when a connection to said control server or a transition between said plurality of channels is detected for the first time, it may display a subtitle icon associated with the provision of translated subtitles on said screen. When a selection input for said subtitle icon is detected, the processor may display a dialog box on said screen associated with the activation of subtitle display, the language of a specific content, and a list of translatable languages, and when a specific language is selected from the list of translatable languages, it may display subtitles translated into said specific language by said external device on said screen based on audio data of said data stream of said specific content.
[0015] According to an embodiment, when a selection input for the subtitle icon is detected, the processor determines whether the subtitle display is in a disabled state, and if the subtitle display is in a disabled state, it may display a first phrase associated with the activation of the subtitle in a first area of the dialog box. If the subtitle display is in an enabled state, the processor may display a second phrase associated with the deactivation of the subtitle in the first area of the dialog box.
[0016] According to an embodiment, the processor can determine whether a first selection input is received, via a remote control device, on a first button in the first area corresponding to the first phrase. If the first selection input is received in the first area, the processor can cause a dialog box to be displayed on the screen of the external device, the dialog box having a button for consenting to the external device performing AI-based translation that considers contextual association.
[0017] According to an embodiment, the processor may display the language of the specific content in a second area of the dialog box. The processor may display a list of translatable languages in a third area of the dialog box. When the specific language in the list of translatable languages is selected through the remote control device, the processor may control the external device to translate the audio data into the specific language.
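The three dialog box areas described above can be modeled as a small data structure. The following is a minimal sketch only: the class name, the function name, and the two phrase strings are hypothetical placeholders, since the disclosure states only that a first phrase (activation) and a second phrase (deactivation) exist.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubtitleDialog:
    first_area: str        # phrase for activating or deactivating subtitles
    second_area: str       # language of the content being played
    third_area: List[str]  # list of translatable languages


def build_subtitle_dialog(subtitles_enabled: bool,
                          content_language: str,
                          translatable_languages: List[str]) -> SubtitleDialog:
    # Placeholder wording: the actual phrases are not given in the text.
    if subtitles_enabled:
        phrase = "Turn subtitles off"   # "second phrase" (deactivation)
    else:
        phrase = "Turn subtitles on"    # "first phrase" (activation)
    return SubtitleDialog(phrase, content_language, list(translatable_languages))
```

For example, `build_subtitle_dialog(False, "English", ["Korean", "Spanish"])` yields a dialog whose first area offers activation, whose second area shows "English", and whose third area lists the two target languages.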
[0018] According to an embodiment, the processor may display the subtitle icon in the center area of the screen adjacent to one side of a service provision button that enables the content service to be provided on the initial screen in the power-on state. When the subtitle icon is selected, the processor may display the dialog box in the side area of the screen on one side of the center area.
[0019] According to an embodiment, the processor can determine whether the current mode is a live playback mode through an IP-based live channel or a VOD playback mode. The processor can determine whether the external device is capable of AI-based translation. If the AI-based translation is not possible in the live playback mode, the processor can display CC associated with general captions on a button at the bottom of the initial screen of the live playback mode. If the AI-based translation is possible in the live playback mode, the processor can display AI CC associated with AI subtitles on a button at the bottom of the initial screen of the live playback mode.
[0020] According to an embodiment, when the button labeled CC is selected, the processor can display subtitles translated by the display device on the screen while playing the first content in the live playback mode. When the button labeled AI CC is selected, the processor can display subtitles translated by considering the contextual correlation in conjunction with the external device while playing the first content in the live playback mode. The processor can compare the translation processing time considering the contextual correlation with the threshold time for displaying the subtitles on the screen by synchronizing them with the voice of the audio data.
[0021] According to an embodiment, if the translation processing time exceeds a threshold time, the processor may control the external device to translate based on the audio data while excluding the contextual association. If the second translation processing time based on the audio data while excluding the contextual association exceeds the threshold time, the processor may transmit the parsed audio data to the external device prior to the audio data being decoded.
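The stepwise fallback in live playback mode can be sketched as a small decision function. This is an illustrative sketch, not the disclosed implementation: the enum, the function name, and the use of seconds as the unit are assumptions, and the two processing times are assumed to have already been measured or estimated.

```python
from enum import Enum


class LiveStrategy(Enum):
    CONTEXT_TRANSLATION = 1  # AI translation that considers contextual association
    PLAIN_TRANSLATION = 2    # translation from the audio data only
    SEND_PARSED_AUDIO = 3    # send parsed audio before it is decoded


def choose_live_strategy(context_time_s: float,
                         plain_time_s: float,
                         threshold_s: float) -> LiveStrategy:
    # Keep context-aware translation while it fits the synchronization threshold.
    if context_time_s <= threshold_s:
        return LiveStrategy.CONTEXT_TRANSLATION
    # Otherwise drop the contextual association to shorten processing.
    if plain_time_s <= threshold_s:
        return LiveStrategy.PLAIN_TRANSLATION
    # Still too slow: hand over the parsed audio earlier, before decoding.
    return LiveStrategy.SEND_PARSED_AUDIO
```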
[0022] According to an embodiment, if the AI-based translation is not possible in the VOD playback mode, the processor may display CC associated with a general caption on a button at the bottom of the initial screen of the VOD playback mode. If the AI-based translation is possible in the VOD playback mode, the processor may display AI CC associated with an AI subtitle on a button at the bottom of the initial screen of the VOD playback mode.
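The labeling rule for the caption button is the same in live and VOD playback modes, which a short sketch makes explicit. The enum and function names are hypothetical; only the labels "CC" and "AI CC" come from the text.

```python
from enum import Enum


class PlaybackMode(Enum):
    LIVE = "live"
    VOD = "vod"


def caption_button_label(mode: PlaybackMode, ai_translation_possible: bool) -> str:
    # The mode selects which initial screen carries the button; the label
    # itself depends only on whether AI-based translation is possible.
    if not isinstance(mode, PlaybackMode):
        raise TypeError("mode must be a PlaybackMode")
    return "AI CC" if ai_translation_possible else "CC"
```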
[0023] According to an embodiment, when the button labeled AI CC is selected, the processor can display translated subtitles on the screen while playing the second content in the VOD playback mode, in conjunction with the external device, taking into account the contextual correlation. When the button labeled CC is selected, the processor can display subtitles translated by the display device on the screen while playing the second content in the VOD playback mode. The processor can compare the translation processing time considering the contextual correlation with a second threshold time to display the subtitles on a screen playing at high speed by synchronizing them with the voice of the audio data.
[0024] According to an embodiment, if the translation processing time exceeds the second threshold time, the processor may control the external device to translate based on the audio data while excluding the contextual association. If the second translation processing time based on the audio data while excluding the contextual association exceeds the second threshold time, the processor may control the playback speed of the second content to a normal speed slower than the high speed. If the second translation processing time exceeds the threshold time at the normal speed, the processor may transmit the parsed audio data to the external device before the audio data is decoded.
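The VOD fallback adds a playback-speed step between dropping context and sending parsed audio early. The sketch below lists the resulting control actions in order. The action strings and parameter names are hypothetical, and it assumes, for simplicity, that the context-free processing time is the same at high speed and at normal speed.

```python
from typing import List


def vod_fallback_actions(context_time_s: float,
                         plain_time_s: float,
                         fast_threshold_s: float,
                         normal_threshold_s: float) -> List[str]:
    # fast_threshold_s: the "second threshold time" for high-speed playback.
    # normal_threshold_s: the threshold time that applies at normal speed.
    if context_time_s <= fast_threshold_s:
        return ["translate_with_context"]
    actions = ["translate_without_context"]
    if plain_time_s <= fast_threshold_s:
        return actions
    # Context-free translation is still too slow for high-speed playback.
    actions.append("set_normal_playback_speed")
    if plain_time_s > normal_threshold_s:
        # Too slow even at normal speed: send parsed audio before decoding.
        actions.append("send_parsed_audio_before_decoding")
    return actions
```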
[0025] According to an embodiment, the processor may display the subtitle icon at the bottom of one side of the second screen adjacent to icons associated with the playback of the second content while the second content based on the VOD is being played. When the subtitle icon is selected, the processor may display the dialog box in the area of the bottom of the first side. The dialog box displayed on the second screen may be positioned closer to the boundary of one side of the display than the dialog box displayed on the screen.
[0026] According to an embodiment, the processor may compare the translation processing time of the external device with a threshold time for synchronizing the subtitle with the voice and displaying it on the screen. If the translation processing time is less than or equal to the threshold time, the processor may control the external device to generate a subtitle of a first quality based on the previous audio data and the audio data. The subtitle of the first quality may be generated based on the context of the subtitle translated from the previous audio data and the subtitle translated from the audio data.
[0027] According to an embodiment, the processor may control the external device to generate subtitles of a second quality based on the audio data when the translation processing time exceeds a threshold time. The subtitles of the second quality may have lower translation accuracy than the subtitles of the first quality, be based on literal translation, or have a shorter length of text displayed on the screen.
[0028] According to an embodiment, the processor may receive text data corresponding to a subtitle of a second quality based on a literal translation from the external device. The processor may compare an estimated output time for outputting a subtitle of a first quality based on a paraphrase with a threshold time for displaying the subtitle on the screen by synchronizing it with the audio. If it is determined that the estimated output time is earlier than the threshold time, the processor may generate the subtitle of the first quality based on previous audio data and current audio data. If it is determined that the estimated output time is later than the threshold time, the processor may generate the subtitle of the second quality.
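The choice between first-quality and second-quality subtitles can be illustrated as follows. This is a sketch under stated assumptions: `TranslationRequest` and `build_translation_request` are hypothetical names, and falling back to second quality when no previous audio is available is an assumption not stated in the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TranslationRequest:
    quality: int                # 1 = context-based paraphrase, 2 = literal translation
    audio_segments: List[bytes]


def build_translation_request(estimated_output_s: float,
                              threshold_s: float,
                              current_audio: bytes,
                              previous_audio: Optional[bytes] = None) -> TranslationRequest:
    # First quality needs both the previous and the current audio data and
    # must be ready no later than the synchronization threshold.
    if estimated_output_s <= threshold_s and previous_audio is not None:
        return TranslationRequest(1, [previous_audio, current_audio])
    # Otherwise request a literal translation of the current audio only.
    return TranslationRequest(2, [current_audio])
```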
[0029] According to an embodiment, the processor may include: a media playback module configured to extract and transmit audio data from a data stream; a voice detection module that collects first audio data in which a voice is detected from the extracted audio data; a text receiving module configured to receive text data in which the voice or a second voice translated from the voice is converted into text; and a subtitle generation module configured to synchronize the first audio data and the text data based on one of the timestamp information of the first audio data, the size of the text data, and synchronization information included in the header of the first audio data. The subtitle generation module may display the synchronized text data as a subtitle on the specific frame of the screen.
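One way to picture the synchronization performed by the subtitle generation module is as building timed cues. In this sketch the cue starts at the timestamp of the first audio data and, when no end time is known, its display period is derived from the size of the text. The names `Cue`, `make_cue`, and `cue_for_frame`, and the constants 60 ms per character and a 1000 ms minimum, are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cue:
    start_ms: int
    end_ms: int
    text: str


def make_cue(text: str, start_ms: int, end_ms: Optional[int] = None,
             ms_per_char: int = 60, min_ms: int = 1000) -> Cue:
    # Without an explicit end time, size the display period from the text length.
    if end_ms is None:
        end_ms = start_ms + max(min_ms, len(text) * ms_per_char)
    return Cue(start_ms, end_ms, text)


def cue_for_frame(cues: List[Cue], frame_pts_ms: int) -> Optional[Cue]:
    # Return the cue to overlay on the frame with this presentation time, if any.
    for cue in cues:
        if cue.start_ms <= frame_pts_ms < cue.end_ms:
            return cue
    return None
```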
[0030] According to an embodiment, the media playback module may include: a source input module configured to receive a data stream including video data and audio data; a demultiplexer configured to classify video data, audio data, and control information from the data stream; a parser configured to parse the audio data; an audio decoder configured to decode and extract the parsed audio data; and an audio sink module configured to output the decoded audio data. The audio data extracted through the parser may be transmitted to the voice detection module.
[0031] According to an embodiment, the voice detection module may be configured to include: a source input module configured to receive audio data extracted from the parser; an audio decoder configured to decode the extracted audio data; a voice activity detector (VAD) configured to detect the voice in the decoded audio data; a payloader module that forms a data stream including a header for synchronization and a payload of data associated with the detected voice; and a transmission module configured to transmit the data stream including the header and the payload to the external device through the communication module.
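The payloader step, which forms a header for synchronization followed by a payload of detected voice data, might look like the sketch below. The header layout (sequence number, presentation timestamp in milliseconds, payload length) is purely an assumption for illustration; the disclosure does not specify the header fields.

```python
import struct

# Hypothetical header: uint32 sequence, uint64 timestamp (ms), uint32 payload length.
# "!" selects network byte order with no padding, so the header is 16 bytes.
HEADER = struct.Struct("!IQI")


def pack_voice_packet(seq: int, pts_ms: int, payload: bytes) -> bytes:
    return HEADER.pack(seq, pts_ms, len(payload)) + payload


def unpack_voice_packet(packet: bytes):
    seq, pts_ms, length = HEADER.unpack_from(packet)
    payload = packet[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return seq, pts_ms, payload
```

Carrying the timestamp in the header lets the receiving side return text tagged with the same timestamp, which is what the display device needs to align the subtitle with the audio.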
[0032] According to an embodiment, the processor may be configured to receive status information and capability information of the external device and control the timing of transmitting the audio data to the voice detection module. The processor may decode the audio data extracted through the parser via the voice detection module and transmit the audio data decoded to a first sound quality to the external device. If the text conversion accuracy of the external device is below a threshold ratio, the processor may transmit the audio data decoded to a second sound quality higher than the first sound quality via the audio decoder of the media playback module to the external device via the voice detection module. If the translation processing speed of the external device is below a threshold speed, the processor may decode the audio data extracted through the parser via the voice detection module and transmit the audio data decoded to the first sound quality to the external device.
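The adaptive choice of audio path can be sketched as a function of the external device's reported accuracy and speed. The return strings and parameter names are hypothetical. The text does not say which rule wins when both accuracy and speed fall below their thresholds; this sketch assumes the speed rule takes precedence.

```python
def choose_audio_path(text_accuracy: float,
                      translation_speed: float,
                      accuracy_threshold: float,
                      speed_threshold: float) -> str:
    # Slow external device: send first-quality audio taken right after the
    # parser, which reaches the external device earlier (assumed precedence).
    if translation_speed < speed_threshold:
        return "parser_path_first_quality"
    # Inaccurate text conversion: send higher, second-quality audio decoded
    # by the audio decoder of the media playback module.
    if text_accuracy < accuracy_threshold:
        return "decoder_path_second_quality"
    # Default: first-quality audio decoded in the voice detection module.
    return "parser_path_first_quality"
```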
[0033] According to an embodiment, the processor can determine whether the speech times of multiple speakers included in the audio data extracted through the audio decoder of the media playback module overlap. If the speech times overlap, the processor can transmit sequence information including a start time stamp and an end time stamp of each speaker's speech time, and the audio data, to the external device. The processor can display the translated subtitles of the text data synchronized with the speech times according to the sequence information. If the speech times do not overlap, the processor can sequentially transmit segments of the audio data corresponding to the speech times to the external device. The processor can display each of the translated subtitles of the segments of the text data corresponding to the segments of the audio data for a respective period set according to the sizes of the segments of the text data.
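Detecting overlapping speech times and choosing the transmission format can be sketched with a simple interval check. The type and function names and the dictionary layout are hypothetical; the sketch assumes each utterance already carries a start and end time stamp.

```python
from typing import List, NamedTuple


class Utterance(NamedTuple):
    speaker: str
    start_ms: int
    end_ms: int


def speech_times_overlap(utterances: List[Utterance]) -> bool:
    latest_end = None
    for u in sorted(utterances, key=lambda u: u.start_ms):
        # An utterance that starts before an earlier one has ended overlaps it.
        if latest_end is not None and u.start_ms < latest_end:
            return True
        latest_end = u.end_ms if latest_end is None else max(latest_end, u.end_ms)
    return False


def plan_transmission(utterances: List[Utterance]) -> dict:
    ordered = sorted(utterances, key=lambda u: u.start_ms)
    if speech_times_overlap(ordered):
        # Overlap: send the audio together with per-speaker sequence information.
        return {"mode": "with_sequence_info",
                "sequence": [(u.speaker, u.start_ms, u.end_ms) for u in ordered]}
    # No overlap: send the audio segment by segment, in speech order.
    return {"mode": "sequential_segments",
            "segments": [(u.start_ms, u.end_ms) for u in ordered]}
```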
[0034] According to the present specification, users can easily use the multilingual subtitle generation function. To facilitate use of such multilingual subtitle services, the subtitle icon and the dialog box associated with translation options can be visually arranged for ease of use.
[0035] According to the present specification, a subtitle icon is displayed on the screen during the initial screen of the display device or when switching channels, so that the user can intuitively recognize that a multilingual subtitle function based on multilingual translation is supported.
[0036] According to the present specification, a subtitle icon, which is a user interface (UI) that allows for intuitive recognition that a multilingual subtitle function is supported, can be provided in an optimal location on the initial screen of a display device, during channel switching, or during content playback. Accordingly, the subtitle icon and a dialog box for translation options can be optimally displayed so as not to interfere with the user's perception of the current screen of the display device.
[0037] According to the present specification, as an external device generates translated subtitles while a display device plays video content, the difficulty of displaying translated subtitles on the screen at the time the video content is played can be resolved.
[0038] According to the present specification, a method can be proposed to synchronize subtitles generated in real time through an external device by transmitting audio data to an external device in advance before decoding media on a webOS-based display device.
[0039] According to the present specification, media currently being played can be transmitted to an external device to generate subtitles in real time using artificial intelligence (AI), and the subtitles can be accurately aligned with the timing of media playback to enhance the user's viewing experience.
[0040] According to the present specification, the timing of transmitting audio data to be translated to an external device can be adaptively controlled depending on the state of the external device, such as text translation accuracy and translation processing speed.
[0041] According to the present specification, an automatic subtitle generation service can be provided by utilizing hardware resources within an external device or a display device.
[0042] According to the present specification, translation quality can be dynamically controlled through context-based paraphrasing of consecutive utterances or literal translation based on the utterances, depending on processing capabilities such as the state of an external device and translation processing time.
[0043] FIG. 1 is a block diagram illustrating the configuration of a display device according to one embodiment of the present disclosure.
[0044] FIG. 2 is a drawing for explaining a content server according to an embodiment of the present disclosure.
[0045] FIG. 3 is a drawing for explaining a content provision system according to an embodiment of the present disclosure.
[0046] FIG. 4 shows a content delivery system including a display device that displays translated subtitles in conjunction with an external device.
[0047] FIGS. 5A and 5B show the configuration of an initial screen of a display device according to an embodiment.
[0048] FIG. 6 shows the screen of an external device in response to a translation request on a display device.
[0049] FIGS. 7 and 8 show examples of buttons at the bottom of the initial screen according to the type of content and whether AI translation is supported when the content service is executed.
[0050] FIG. 9 is a flowchart of a control method for a display device that displays translated subtitles in conjunction with an external device in live playback mode.
[0051] FIG. 10 is a flowchart of a control method for a display device that displays translated subtitles in conjunction with an external device in VOD playback mode.
[0052] FIG. 11 shows the screen of a display device configured to display a subtitle icon during VOD playback according to an embodiment.
[0053] FIGS. 12a and 12b show flowcharts in which a display device according to the present disclosure controls an external device by comparing the translation processing time and threshold time of the external device.
[0054] FIG. 13 shows examples of subtitles of first quality and second quality displayed on a display device according to the state of an external device.
[0055] FIG. 14 shows the configuration of a display device that transmits parsed audio data or decoded audio data.
[0056] FIGS. 15a and 15b show the configuration of an external device receiving parsed audio data or decoded audio data.
[0057] FIG. 16 is a flowchart of a control method that controls the timing of adaptively transmitting audio data according to the state of an external device according to the present disclosure.
[0058] FIG. 17 shows a flowchart of a control method for transmitting audio data of a different format to an external device by determining whether the speech times of the speakers overlap.
[0059] It should be noted that technical terms used in this specification are used merely to describe specific embodiments and are not intended to limit the invention. Additionally, singular expressions used in this specification include plural expressions unless the context clearly indicates otherwise. The suffixes "module" and "part" for components used in the following description are assigned or used interchangeably solely for the ease of drafting the specification and do not inherently possess distinct meanings or roles.
[0060] In this specification, terms such as "composed of" or "comprising" should not be interpreted as necessarily including all of the various components or steps described in the specification, and should be interpreted as potentially excluding some of the components or steps, or including additional components or steps.
[0061] In addition, when describing the technology disclosed in this specification, if it is determined that a detailed description of related prior art could obscure the essence of the technology disclosed in this specification, such detailed description is omitted.
[0062] In addition, the attached drawings are intended only to facilitate understanding of the embodiments disclosed in this specification, and the technical concept disclosed in this specification is not limited by the attached drawings; it should be understood that they include all modifications, equivalents, and substitutions that fall within the concept and technical scope of the present invention. Furthermore, not only each of the embodiments described below, but also combinations of embodiments may fall within the concept and technical scope of the present invention as modifications, equivalents, and substitutions that fall within the concept and technical scope of the present invention.
[0063] Hereinafter, embodiments disclosed in this specification will be described in detail with reference to the attached drawings.
[0064] FIG. 1 is a block diagram illustrating the configuration of a display device according to one embodiment of the present disclosure.
[0065] Referring to FIG. 1, the display device (100) may include a broadcast receiver (130), an external device interface unit (135), a storage unit (140), a user input interface (150), a processor (170), a communication module (173), a voice acquisition unit (175), a display unit (180), an audio output unit (185), and a power supply unit (190). Since the external device interface unit (135) performs wired communication with a peripheral device, the external device interface unit (135) may be referred to as a wired communication module. Since the communication module (173) performs wireless communication through a wireless signal, it may be referred to as a wireless communication module.
[0066] The broadcast receiver (130) may include a tuner (131), a demodulator (132), and a network interface unit (133).
[0067] The tuner (131) can tune to a specific broadcast channel according to a channel tuning command. The tuner (131) can receive a broadcast signal for the tuned specific broadcast channel.
[0068] The demodulator (132) can separate the received broadcast signal into a video signal, an audio signal, and a data signal related to the broadcast program, and can restore the separated video signal, audio signal, and data signal into a form that can be output.
[0069] The network interface unit (133) may provide an interface for connecting the display device (100) to a wired / wireless network including the Internet network. The network interface unit (133) may transmit or receive data to or from other users or other electronic devices through the connected network or another network linked to the connected network.
[0070] The network interface unit (133) can access a specific web page through a connected network or another network linked to the connected network. That is, it can access a specific web page through a network and transmit data to or receive data from the corresponding server.
[0071] In addition, the network interface unit (133) can receive content or data provided by a content provider or network operator. That is, the network interface unit (133) can receive content such as movies, advertisements, games, VOD, broadcast signals, and related information provided by a content provider or network provider through a network.
[0072] Additionally, the network interface unit (133) can receive firmware update information and update files provided by the network operator, and can transmit data to the internet, content provider, or network operator.
[0073] The network interface unit (133) can select and receive a desired application among the applications that are open to the public through the network.
[0074] The external device interface unit (135) can receive an application or a list of applications within an adjacent external device and transmit it to a processor (170) or a storage unit (140).
[0075] The external device interface unit (135) can provide a connection path between the display device (100) and an external device. The external device interface unit (135) can receive one or more of video and audio output from an external device connected to the display device (100) wirelessly or via a wired connection and transmit them to the processor (170). The external device interface unit (135) may include a plurality of external input terminals. The plurality of external input terminals may include an RGB terminal, one or more HDMI (High Definition Multimedia Interface) terminals, and a component terminal.
[0076] The video signal of an external device input through the external device interface unit (135) can be output through the display unit (180). The audio signal of an external device input through the external device interface unit (135) can be output through the audio output unit (185).
[0077] The external device that can be connected to the external device interface unit (135) may be any one of a set-top box, Blu-ray player, DVD player, game console, soundbar, smartphone, PC, USB memory, or home theater, but this is merely an example.
[0078] In addition, some of the content data stored in the display device (100) can be transmitted to a user or electronic device selected from among the other users or other electronic devices previously registered in the display device (100).
[0079] The storage unit (140) can store programs for signal processing and control within the processor (170), and can store signal-processed video, audio, or data signals.
[0080] Additionally, the storage unit (140) may perform the function of temporarily storing video, audio, or data signals input from the external device interface unit (135) or the network interface unit (133), and may also store information regarding a predetermined image through a channel memory function.
[0081] The storage unit (140) can store an application or a list of applications input from an external device interface unit (135) or a network interface unit (133).
[0082] The display device (100) can play content files (video files, still image files, music files, document files, application files, etc.) stored in the storage unit (140) and provide them to the user.
[0083] The user input interface (150) can transmit a signal input by the user to the processor (170) or transmit a signal from the processor (170) to the user. For example, the user input interface (150) can receive and process control signals such as power on/off, channel selection, and screen setting from the remote control device (200), or can transmit control signals from the processor (170) to the remote control device (200), according to various communication methods such as Bluetooth, Ultra Wideband (UWB), ZigBee, Radio Frequency (RF) communication, or Infrared (IR) communication.
[0084] Additionally, the user input interface (150) can transmit control signals input from local keys (not shown), such as a power key, channel key, volume key, and setting key, to the processor (170).
[0085] The image signal processed by the processor (170) can be input to the display unit (180) and displayed as an image corresponding to the image signal. Additionally, the image signal processed by the processor (170) can be input to an external output device through the external device interface unit (135).
[0086] The audio signal processed by the processor (170) can be output as sound through the audio output unit (185). Additionally, the audio signal processed by the processor (170) can be input to an external output device through the external device interface unit (135).
[0087] In addition, the processor (170) can control the overall operation within the display device (100).
[0088] Additionally, the processor (170) can control the display device (100) by means of user commands or internal programs input through the user input interface (150). The processor (170) can connect to a network to enable the user to download desired applications or a list of applications into the display device (100). The processor (170) may be configured to execute at least one application program to control the display device (100). The processor (170) may be configured to play media of a data stream including video and audio through a media playback module (100). The media playback module (100) may be an application program of a media player that plays media. The processor (170) may detect voice included in the audio data through a voice detection module (20).
[0089] The processor (170) enables the processed video or audio signal, such as channel information selected by the user, to be output through the display unit (180) or audio output unit (185).
[0090] Additionally, the processor (170) enables a video signal or audio signal from an external device, such as a camera or camcorder, which is input through the external device interface unit (135), to be output through the display unit (180) or audio output unit (185) in accordance with an external device video playback command received through the user input interface (150).
[0091] Meanwhile, the processor (170) can control the display unit (180) to display an image. For example, the processor (170) can control the display unit (180) to display a broadcast image input through the tuner (131), an external input image input through the external device interface unit (135), an image input through the network interface unit, or an image stored in the storage unit (140). In this case, the image displayed on the display unit (180) may be a still image or a video, and may be a 2D image or a 3D image.
[0092] Additionally, the processor (170) can control the playback of content stored in the display device (100), received broadcast content, or external input content input from the outside, and the content may be in various forms such as broadcast video, external input video, audio file, still image, connected web screen, and document file.
[0093] The communication module (173) can communicate with an external device via wired or wireless communication. The communication module (173) can perform short-range communication with an external device. To this end, the communication module (173) can support short-range communication by using at least one of Bluetooth™, BLE (Bluetooth Low Energy), RFID (Radio Frequency Identification), Infrared Data Association (IrDA), UWB (Ultra Wideband), ZigBee, NFC (Near Field Communication), Wi-Fi (Wireless-Fidelity), Wi-Fi Direct, and Wireless USB (Wireless Universal Serial Bus) technologies. Such a communication module (173) can support wireless communication between the display device (100) and a wireless communication system, between the display device (100) and another display device (100), or between the display device (100) and a network where another display device (100) (or an external server) is located, via a wireless area network. The wireless area network may be a wireless personal area network.
[0094] Here, another display device (100) may be a wearable device (e.g., a smartwatch, smart glass, head-mounted display, or mobile terminal such as a smartphone) capable of exchanging (or interacting with) data with the display device (100) according to the present invention. A communication module (173) may detect (or recognize) a wearable device capable of communicating around the display device (100). Furthermore, if the detected wearable device is an authenticated device to communicate with the display device (100) according to the present invention, the processor (170) may transmit at least a portion of the data processed in the display device (100) to the wearable device through the communication module (173). Thus, a user of the wearable device may use the data processed in the display device (100) through the wearable device.
[0095] The voice acquisition unit (175) can acquire audio. The voice acquisition unit (175) may include at least one microphone (not shown) and can acquire audio around the display device (100) through the microphone (not shown).
[0096] The display unit (180) can generate a driving signal by converting the video signal, data signal, OSD signal processed by the processor (170) or the video signal, data signal, etc. received from the external device interface unit (135) into R, G, and B signals, respectively.
[0097] Meanwhile, since the display device (100) illustrated in FIG. 1 is merely an embodiment of the present invention, some of the illustrated components may be integrated, added, or omitted depending on the specifications of the actual implemented display device (100).
[0098] That is, as needed, two or more components may be combined into a single component, or a single component may be subdivided into two or more components. In addition, the functions performed in each block are intended to explain embodiments of the present invention, and the specific operations or devices do not limit the scope of the present invention.
[0099] FIG. 2 is a drawing for explaining a content server according to an embodiment of the present disclosure.
[0100] Referring to FIGS. 1 and 2, the content server (300) can provide a recommendation service that recommends content that a viewer using the display device (100) may prefer.
[0101] The content server (300) may include a communication interface (310), memory (320), and a processor (330).
[0102] The content server (300) can transmit and receive data to and from at least one display device (100) via wired or wireless communication through the communication interface (310).
[0103] The memory (320) may include a content information database (321). The content information database (321) may store information related to content played on each device. For example, the content information database (321) may store content playback information, content setting information, or application installation information in association with the identification information of each device.
[0104] When the processor (330) receives a content recommendation request from a display device (100) or an external device, it can recommend content optimized for each device based on data stored in the content information database (321).
[0105] FIG. 3 is a drawing for explaining a content provision system according to an embodiment of the present disclosure.
[0106] Referring to FIGS. 1 to 3, the content providing system (1000) may include at least one display device (100), at least one remote control device (200), a content server (300), and an external device (400).
[0107] The processor (170) of the display device (100) can play content.
[0108] Additionally, the processor (170) can generate content playback information regarding the played content. Additionally, the processor (170) can generate content setting information, which is information regarding the quality, volume, and preferred channel status set when playing the content.
[0109] Content playback information may include at least one of content identification information, content genre information, content playback start time information, content playback end time information, and content total playback time information for the played content.
[0110] Content setting information may include at least one of quality information set for the content when playing the content, volume information, and preferred channel information regarding whether the user has registered the channel providing the content as a preferred channel.
[0111] The processor (170) can transmit device identification information of the display device (100), generated content playback information, and generated content setting information to the content server (300) through the communication module (173). The device identification information may be unique identification information for distinguishing the display device (100) from other devices.
[0112] The content server (300) can store content playback information and content setting information received from the display device (100) in the content information database (321) in association with device identification information.
[0113] Meanwhile, the processor (170) can receive a content recommendation command through the user input interface unit (150) or the voice acquisition unit (175).
[0114] When the processor (170) receives a content recommendation command, it can transmit device identification information of the display device (100) and a content recommendation request to the content server (300) through the communication module (173).
[0115] The communication interface (310) of the content server (300) can receive device identification information and a content recommendation request from the display device (100).
[0116] The processor (330) of the content server (300) can obtain content playback information and content setting information associated with the display device (100) from the content information database (321) based on device identification information.
[0117] The processor (330) can generate content recommendation information and recommendation setting information for the display device (100) based on content playback information and content setting information. The content recommendation information may include recommended content identification information and recommended content genre information for at least one recommended content. Additionally, the recommendation setting information may include recommended image quality setting information and preferred channel information.
[0118] The processor (330) can transmit content recommendation information and recommendation setting information to the display device (100) through the communication interface (310).
[0119] The processor (170) can receive content recommendation information and recommendation setting information from the content server (300) through the communication module (173).
[0120] The processor (170) can display at least one recommended content based on the received content recommendation information. Additionally, when a playback command for the recommended content is input through the user input interface unit (150) or the voice acquisition unit (175), the processor (170) can set the quality of the recommended content to be played based on the received recommendation setting information and play it.
[0121] After the quality of the recommended content has been set and the content is played, if the user requests a change to a preferred channel, the channel can be changed to the preferred channel based on the preferred channel information.
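The store-and-recommend exchange described in the preceding paragraphs can be sketched as follows. This is a minimal illustration only, not the disclosed implementation: the class and field names are hypothetical, and the ranking rule (favor the genre with the most accumulated playback time and skip content already played) is an assumption, since the disclosure does not specify how recommendations are computed.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class PlaybackInfo:
    """Subset of the content playback information (hypothetical fields)."""
    content_id: str
    genre: str
    total_playback_seconds: int


@dataclass
class SettingInfo:
    """Subset of the content setting information (hypothetical fields)."""
    quality: str
    volume: int
    preferred_channel: Optional[str] = None


class ContentInfoDatabase:
    """Stand-in for the content information database (321)."""

    def __init__(self) -> None:
        self._playback: Dict[str, List[PlaybackInfo]] = defaultdict(list)
        self._settings: Dict[str, SettingInfo] = {}

    def store(self, device_id: str, playback: PlaybackInfo,
              setting: SettingInfo) -> None:
        # Records are kept in association with the device identification.
        self._playback[device_id].append(playback)
        self._settings[device_id] = setting

    def recommend(self, device_id: str, catalog: Dict[str, str]
                  ) -> Tuple[List[str], Optional[SettingInfo]]:
        """Return (recommended content ids, recommendation settings).

        `catalog` maps content id to genre. The ranking rule is assumed.
        """
        history = self._playback.get(device_id, [])
        if not history:
            return [], None
        seconds_by_genre: Counter = Counter()
        for item in history:
            seconds_by_genre[item.genre] += item.total_playback_seconds
        top_genre = seconds_by_genre.most_common(1)[0][0]
        watched = {item.content_id for item in history}
        recommended = [cid for cid, genre in catalog.items()
                       if genre == top_genre and cid not in watched]
        return recommended, self._settings[device_id]
```

In this sketch the most recently stored setting information doubles as the recommendation setting information (quality and preferred channel) returned to the device.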
[0122] Meanwhile, the display device (100) can mirror the content currently being played to an external device (400). The external device (400) may include another display device or a mobile device. In this case, the mirrored content can be viewed through the external device (400). Therefore, viewing information regarding the mirrored content needs to serve as basic data for recommending content to the external device (400).
[0123] Meanwhile, when the display device (100) performs a mirroring operation to an external device (400), it may receive a control command from the external device (400) to control the display device (100). The control command may include a content change command to change the content being played from the first content to the second content. When the display device (100) receives the content change command, it may play the changed content. In this case, the display device (100) needs to transmit content playback information regarding the changed content to the content server (300) as information for content recommendation by the external device (400).
[0124] Hereinafter, a display device that displays translated subtitles in conjunction with an external device in a content provision system according to the present disclosure will be described. In this regard, FIG. 4 shows a content provision system including a display device that displays translated subtitles in conjunction with an external device.
[0125] Referring to FIG. 4, the content providing system (1000) may include a display device (100), a content server (300), an external device (400), and a control server (500). The display device (100) may be configured to receive and play content from the content server (300) through the control server (500).
[0126] The content server (300) may be configured to include a first server (300a) and a second server (300b). The first server (300a) may be configured to include a first video content server (300-1) and a second video content server (300-2). The first video content server (300-1) may be configured to deliver first video content of an IP-based live broadcast to a display device (100) via a control server (500). The second video content server (300-2) may be configured to deliver second video content based on VOD (Video on demand) to a display device (100) via a control server (500). The second server (300b) may be configured to deliver digital television-based broadcast content to a display device (100).
[0127] The control server (500) may be a server operated by the manufacturer of the display device (100). The control server (500) may be configured to manage metadata of the first video content and the second video content. The control server (500) may be configured to control the recommendation and playback of the first and second video content in conjunction with the display device (100).
[0128] The display device (100) can extract audio data from a data stream and transmit it to an external device (400). The display device (100) can be configured to receive text data of translated subtitles corresponding to the audio data from the external device (400).
[0129] Meanwhile, FIGS. 5a and 5b illustrate the configuration of an initial screen of a display device according to an embodiment. FIG. 5a illustrates a screen in which a dialog box (182) is displayed to enable AI-based subtitles when a subtitle icon (181) is selected while the subtitle service is disabled. Referring to FIG. 5a, an indicator bar (181a) indicating a disabled state in which translated subtitles are not displayed on the screen may be displayed in the inner area of the subtitle icon (181). The indicator bar (181a) of the disabled state may be displayed in a first state (e.g., red).
[0130] When the subtitle icon (181) is selected, a first phrase "Enable AI Caption" may be displayed in the first area (182R1) of the dialog box (182). When a user selection input is received in the first area (182R1) displayed as "Enable AI Caption," the system may switch to an enabled state where translated subtitles are displayed on the screen. In the enabled state, AI-based translated subtitles may be displayed on the screen as content is played.
[0131] FIG. 5b shows a screen in which a dialog box (182) is displayed to disable AI-based subtitles when the subtitle icon (181) is selected while the subtitle service is active. Referring to FIG. 5b, an indicator bar (181b) indicating an active state in which translated subtitles are displayed on the screen may be displayed in the inner area of the subtitle icon (181). The indicator bar (181b) of the active state may be displayed as a second state (e.g., green, a specific pattern).
[0132] When the subtitle icon (181) is selected, a second phrase "Disable AI Caption" may be displayed in the first area (182R1) of the dialog box (182). When a user selection input is received in the first area (182R1) displayed as "Disable AI Caption," the screen may be switched to a disabled state where the translated subtitles are not displayed. In the disabled state, the content may be played without the translated subtitles.
[0133] Accordingly, users of the multilingual subtitle generation function can easily use the service through a subtitle icon (181) and a dialog box (182) associated with subtitle activation / deactivation and translation options. Thus, the subtitle icon and the dialog box associated with translation options can be visually arranged to make it easier for users to utilize this multilingual subtitle service.
[0134] Referring to FIGS. 5a and 5b, a subtitle icon (181) may be displayed upon initial access to the channels of the content service through a service provision button (184) labeled "LG Channel," or upon changing channels or receiving a specific input on the current screen. The text "NAIS CC" may be displayed on the subtitle icon (181). NAIS stands for nearby artificial intelligence solution. Thus, a solution may be provided to receive AI-based translated subtitles by linking with a peripheral device adjacent to the display device (100). Meanwhile, CC (Closed caption) may be configured so that, unlike subtitles, it is not normally visible and is exposed only after a separate setting is configured. The display device (100) that links with an external device according to the present disclosure may be configured to display text in the form of a translated CC or text of a subtitle translated according to a setting on the screen.
[0135] A service provision button (184) that enables content services to be provided on the initial screen may be placed adjacent to the subtitle icon (181). When the service provision button (184) is selected, a first video content of an IP-based live broadcast and a second video content of a VOD-based broadcast may be provided.
[0136] To indicate whether the text of the translated subtitle can be displayed on the screen, an indicator bar (181a) for a disabled state or an indicator bar (181b) for an enabled state may be placed in the inner area of the subtitle icon (181). When a selection input is received on the subtitle icon (181), either the first phrase "Enable AI Caption" or the second phrase "Disable AI Caption" may be selectively displayed in the first area (182R1) of the dialog box (182). The first phrase and the second phrase in the first area (182R1) of the dialog box (182) may be toggled according to the selection input.
[0137] Referring to FIGS. 1 to 5b, when using an application program on an external device (400), control for the translation function can be linked with the display device (100). Accordingly, control operations on the display device (100) are applied to the external device (400), and control operations on the external device (400) can also be applied to the display device (100).
[0138] Referring to FIGS. 1 to 5b, a display device (100) that displays translated subtitles in conjunction with an external device according to the present disclosure will be described. The display device (100) may be configured to include a processor (170), a communication module (173), and a display (180).
[0139] Based on an input from a remote control device (200) that executes content services through multiple channels, the communication module (173) may be configured to be connected to a control server (500) that interacts with a content server (300). The display (180) may be configured to play content on the screen through the channels. When the processor (170) is connected to the control server (500) associated with the content service, it may be configured to display a subtitle icon (181) on the screen that is associated with the provision of translated subtitles in conjunction with an external device (400).
[0140] The processor (170) can detect that the display device (100) is switched from a power-off state to a power-on state. The processor (170) can detect that it is connected to a control server (500) associated with a content service in the power-on state. Additionally, the processor (170) can detect a switch between multiple channels in the power-on state.
[0141] When a connection to the control server (500) or a switch between multiple channels is detected, the processor (170) may display a subtitle icon (181) associated with the provision of translated subtitles on the screen. In this regard, the processor (170) may detect the initial connection to the control server (500) or a switch between multiple channels while the power is on. The processor (170) may detect a selection input for the subtitle icon (181). When a selection input for the subtitle icon (181) is detected, the processor (170) may display a dialog box (182) on the screen.
[0142] The dialog box (182) may be associated with the activation of subtitle display, the language of specific content being played on a specific channel, and a list of translatable languages. When a specific language is selected from the list of translatable languages, the processor (170) may display subtitles translated into the specific language on the screen. Based on the audio data of the data stream of the specific content, the processor (170) may display subtitles translated into the specific language by an external device (400) on the screen.
[0143] When a selection input for the subtitle icon (181) is detected, the processor (170) can determine whether the subtitle display is disabled. If the subtitle display is disabled, the processor (170) can display a first phrase associated with the activation of the subtitle in the first area (182R1) of the dialog box (182). The first phrase associated with the activation of the subtitle may be "Enable AI Caption" as in FIG. 5a. If the subtitle display is enabled, the processor (170) can display a second phrase associated with the deactivation of the subtitle in the first area (182R1) of the dialog box (182). The second phrase associated with the deactivation of the subtitle may be "Disable AI Caption" as in FIG. 5b.
[0144] Meanwhile, the processor (170) may display the language of specific content in the second area (182R2) of the dialog box (182). For example, the original language may be displayed as English in the second area (182R2) of the dialog box (182). The processor (170) may display a list of translatable languages in the third area (182R3) of the dialog box (182). The list of translatable languages may include Spanish, German, French, and Korean, but is not limited thereto and may be changed depending on the application.
[0145] The processor (170) can detect that a specific language from the list of translatable languages is selected via the remote control device (200). When a specific language from the list of translatable languages is selected, the processor (170) can control the external device (400) to translate audio data into the specific language.
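The behavior of the dialog box (182) described above can be summarized in a short sketch. It is illustrative only: the class name, the dictionary keys, and the shape of the translation request are hypothetical, and the sketch toggles the subtitle state directly without modeling any confirmation by the external device (400).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

ENABLE_PHRASE = "Enable AI Caption"    # first phrase (FIG. 5a)
DISABLE_PHRASE = "Disable AI Caption"  # second phrase (FIG. 5b)


@dataclass
class SubtitleDialog:
    """Hypothetical model of the dialog box (182) and its three areas."""
    subtitles_enabled: bool = False
    original_language: str = "English"
    translatable_languages: List[str] = field(
        default_factory=lambda: ["Spanish", "German", "French", "Korean"])
    target_language: Optional[str] = None

    def areas(self) -> Dict[str, object]:
        """Return what each area of the dialog box would show."""
        phrase = DISABLE_PHRASE if self.subtitles_enabled else ENABLE_PHRASE
        return {
            "182R1": phrase,                             # enable/disable phrase
            "182R2": self.original_language,             # language of the content
            "182R3": list(self.translatable_languages),  # selectable targets
        }

    def select_first_area(self) -> None:
        """Toggle subtitle display when the first area is selected."""
        self.subtitles_enabled = not self.subtitles_enabled

    def select_language(self, language: str) -> Dict[str, str]:
        """Record the target language and build a request for the external device."""
        if language not in self.translatable_languages:
            raise ValueError(f"{language!r} is not in the translatable list")
        self.target_language = language
        # Hypothetical payload; the disclosure does not define a message format.
        return {"command": "translate", "target_language": language}
```

For example, a newly created `SubtitleDialog()` shows the first phrase in area 182R1, and calling `select_first_area()` switches it to the second phrase.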
[0146] Meanwhile, a display device that displays translated subtitles in conjunction with an external device according to the present disclosure can perform AI-based translation in conjunction with an external device (400). In this regard, FIG. 6 shows the screen of an external device in response to a translation request on the display device.
[0147] Referring to FIG. 6, the external device (400) may be a mobile terminal such as a smartphone or tablet PC, but is not limited thereto and can be changed according to the application. The external device (400) may be any electronic device capable of interacting with a display device, such as a PC or television, or other display device. A dialog box (402) displaying the phrase "You can perform AI translation with your mobile phone" may be placed on the screen of the display (480) of the external device (400). Since sharing of specific content is performed in response to a translation request from the display device, the phrase "Share" and a corresponding image may be displayed in the inner area of the dialog box (402).
[0148] Meanwhile, a specific image (403) may be placed in the inner area of the dialog box (402). The specific image (403) may be a thumbnail image of the current screen of the display device or content associated with the subtitle to be translated, but is not limited thereto and can be changed depending on the application. A button (401) associated with agreeing to the external device (400) performing AI-based translation may be placed in the inner area of the dialog box (402).
[0149] When the button (401) for agreeing to AI-based translation on the external device (400) is touched, this may be regarded as consent to perform AI-based translation. Accordingly, the external device (400) can convert the voice in the audio data received from the display device (100) into text and translate the text. The external device (400) can transmit text data corresponding to the translated subtitles to the display device (100).
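A minimal sketch of this consent-gated flow on the external device (400) is given below. The function name and parameters are hypothetical, and speech recognition and translation are passed in as callables because the disclosure does not name specific engines; the two stand-in functions exist only to make the example runnable.

```python
from typing import Callable, Iterable, List


def translate_on_external_device(
    consented: bool,
    audio_chunks: Iterable[bytes],
    speech_to_text: Callable[[bytes], str],
    translate: Callable[[str], str],
) -> List[str]:
    """Return translated subtitle text for each audio chunk that contains speech.

    Nothing is processed unless the user touched the consent button (401).
    """
    if not consented:
        return []
    subtitles: List[str] = []
    for chunk in audio_chunks:
        text = speech_to_text(chunk)
        if text:  # skip chunks in which no speech was recognized
            subtitles.append(translate(text))
    return subtitles


# Stand-ins for real engines, used only for illustration.
def fake_speech_to_text(chunk: bytes) -> str:
    return chunk.decode("utf-8")


def fake_translate(text: str) -> str:
    return "[ko] " + text
```

The returned list corresponds to the text data that the external device (400) would send back to the display device (100) for display.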
[0150] Referring to FIGS. 1 to 6, a display device (100) that displays translated subtitles in conjunction with an external device according to the present disclosure will be described. The processor (170) can determine whether a first selection input is received, through the remote control device (200), at a first button of the first area (182R1) corresponding to the first phrase.
[0151] When the first selection input is received in the first area (182R1), the processor (170) may cause a dialog box (402), which includes a button (401) for agreeing to AI-based translation by the external device (400), to be displayed on the screen of the external device (400). In this regard, the external device (400) may perform AI-based translation considering contextual associations.
[0152] Meanwhile, a display device that displays translated subtitles in conjunction with an external device according to the present disclosure may display a translation icon before executing a content service that provides content through multiple channels. Additionally, the display device may execute the content service and implement various forms of translation methods on the initial screen. In this regard, FIGS. 7 and 8 show examples of buttons at the bottom of the initial screen according to the type of content and whether AI translation is supported when the content service is executed.
[0153] FIG. 7 shows buttons (183a, 183b) corresponding to user interfaces capable of displaying general captions and AI subtitles in IP-based live playback mode. FIG. 8 shows buttons (183c, 183d) corresponding to user interfaces capable of displaying general captions and AI subtitles in VOD-based VOD playback mode.
[0154] Referring to FIG. 7(a), in live playback mode, the phrase "CC (Closed caption)" associated with a general caption may be displayed on the button (183a) at the bottom of the initial screen of the display (180). Unlike subtitles, CC (Closed caption) is not normally visible and can be configured to be displayed only after a separate setting is made.
[0155] Referring to FIG. 7(b), in live playback mode, the phrase “AI CC (Artificial Intelligence Closed caption)” associated with AI translation-based AI subtitles may be displayed on the button (183b) at the bottom of the initial screen of the display (180).
[0156] Referring to FIG. 8(a), in VOD playback mode, the phrase "CC (Closed caption)" associated with a general caption may be displayed on the button (183c) at the bottom of the initial screen of the display (180). Unlike subtitles, CC (Closed caption) is not normally visible and can be configured to be displayed only after a separate setting is made.
[0157] Referring to FIG. 8(b), in VOD playback mode, the phrase “AI CC (Artificial Intelligence Closed caption)” associated with AI translation-based AI subtitles may be displayed on the button (183d) at the bottom of the initial screen of the display (180).
[0158] Referring to FIGS. 1 to 8, a display device (100) that displays translated subtitles in conjunction with an external device according to the present disclosure will be described.
[0159] The processor (170) may display a subtitle icon (181) in the center area of the screen adjacent to one side of the service provision button (184) that enables content services to be provided on the initial screen when the power is on. When the subtitle icon (181) is selected, a dialog box (182) may be displayed in the side area of the screen. The side area of the screen may correspond to one side of the center area of the screen. Sub-screens capable of playing recommended content, such as recommended movies, may be displayed in the area below the center area where the subtitle icon (181) is displayed.
[0160] Meanwhile, a display device that displays translated subtitles in conjunction with an external device according to the present disclosure may determine the translation method differently depending on the type / playback mode of the content being played through the content service. Additionally, the display device may determine whether the external device is capable of AI-based translation and display a different translation icon on the screen accordingly. In this regard, FIG. 9 is a flowchart of a control method for a display device that displays translated subtitles in conjunction with an external device in live playback mode.
[0161] Referring to FIGS. 1 to 9, a control method performed by a processor (170) of a display device (100) that displays translated subtitles in conjunction with an external device according to the present disclosure will be described.
[0162] The processor (170) can determine (S110) whether it is a live playback mode via an IP-based live channel or a VOD-based VOD playback mode. The processor (170) can determine (S120) whether the external device (400) is capable of AI-based translation.
[0163] If AI-based translation is not possible in live playback mode, the processor (170) can display CC associated with general captions on the button (183a) at the bottom of the initial screen of live playback mode (S130). If AI-based translation is possible in live playback mode, the processor (170) can control the screen (S140) to display AI CC associated with AI subtitles on the button (183b) at the bottom of the initial screen of live playback mode.
[0164] When the button (183a) marked with CC is selected (S131), the processor (170) can display subtitles translated by the display device (100) on the screen (S132) while playing the first content in live playback mode. Thus, the processor (170) can display subtitles translated by itself on the screen without interacting with an external device (400). In this regard, the processor (170) may be configured to convert speech detected from audio data into text and to convert the text into text data of a specific language. When the button (183a) marked with CC is not selected, the processor (170) can play the first content (S133) without generating subtitles in live playback mode.
[0165] When the button (183b) marked with AI CC is selected (S141), the processor (170) can display translated subtitles on the screen (S142) by considering contextual associations in conjunction with an external device (400) while playing the first content in live playback mode. When the button marked with AI CC is not selected, the processor (170) can play the first content (S143) without generating AI-based subtitles in live playback mode.
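Steps S110 to S143 amount to a small decision table, sketched below. The enum, function names, and returned strings are hypothetical; only the button numerals, the labels "CC" and "AI CC", and the step numbers come from the description.

```python
from enum import Enum
from typing import Tuple


class PlaybackMode(Enum):
    LIVE = "live"  # IP-based live channel
    VOD = "vod"    # video on demand


# (playback mode, external device can do AI translation) -> (button, label)
_BUTTONS = {
    (PlaybackMode.LIVE, False): ("183a", "CC"),
    (PlaybackMode.LIVE, True): ("183b", "AI CC"),
    (PlaybackMode.VOD, False): ("183c", "CC"),
    (PlaybackMode.VOD, True): ("183d", "AI CC"),
}


def caption_button(mode: PlaybackMode, external_ai_capable: bool) -> Tuple[str, str]:
    """S110/S120 followed by S130 or S140: choose the button and its label."""
    return _BUTTONS[(mode, external_ai_capable)]


def subtitle_source(label: str, selected: bool) -> str:
    """S131 to S133 and S141 to S143: decide where subtitles come from."""
    if not selected:
        return "none"                        # play content without subtitles
    if label == "AI CC":
        return "external_device_contextual"  # S142: context-aware AI translation
    return "display_device"                  # S132: translated by the display device
```

For instance, `caption_button(PlaybackMode.LIVE, True)` yields the AI CC button (183b), and selecting it maps to context-aware translation performed with the external device (400).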
[0166] Meanwhile, the processor (170) can compare the translation processing time considering contextual correlation with the threshold time to display subtitles on the screen by synchronizing them with the voice of the audio data (S150). If the translation processing time exceeds the threshold time, the processor (170) can control the external device (400) to translate based on the audio data, excluding contextual correlation (S151). If the translation processing time is less than or equal to the threshold time, the processor (170) can display the translated subtitles on the screen considering contextual correlation (S142).
[0167] Meanwhile, the processor (170) can compare a second translation processing time, which is the time taken to translate based on the audio data while excluding contextual associations, with the threshold time (S160). If the second translation processing time exceeds the threshold time, the processor (170) can transmit the parsed audio data, prior to audio data decoding, to the external device (400) (S161). If the second translation processing time is less than or equal to the threshold time, the processor (170) can control the external device (400) to translate based on the decoded audio data, excluding contextual associations (S151).
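The two threshold comparisons (S150 and S160) form a fallback chain that can be sketched as below. The names are hypothetical, and the sketch assumes both processing times are already known; in practice the second time would be measured only after the first fallback has been taken.

```python
from enum import Enum


class TranslationPlan(Enum):
    CONTEXTUAL = "S142"                # context-aware translation is fast enough
    NO_CONTEXT_DECODED_AUDIO = "S151"  # drop context, translate decoded audio
    NO_CONTEXT_PARSED_AUDIO = "S161"   # also send audio before decoding


def choose_translation_plan(contextual_time: float,
                            no_context_time: float,
                            threshold: float) -> TranslationPlan:
    """Pick the richest translation path that still meets the threshold time.

    All three arguments are durations in the same unit (e.g. seconds).
    """
    if contextual_time <= threshold:   # S150
        return TranslationPlan.CONTEXTUAL
    if no_context_time <= threshold:   # S160
        return TranslationPlan.NO_CONTEXT_DECODED_AUDIO
    return TranslationPlan.NO_CONTEXT_PARSED_AUDIO
```

Each step gives up some translation quality or changes where decoding happens in exchange for lower latency, so that subtitles stay synchronized with the voice in the audio data.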
[0168] Meanwhile, the aforementioned operations performed by the processor (170) may also be performed in VOD playback mode. The operations performed in VOD playback mode may be partially identical or similar to the operations of FIG. 9. FIG. 10 is a flowchart of a control method for a display device that displays translated subtitles in conjunction with an external device in VOD playback mode.
[0169] Referring to FIGS. 1 to 10, a control method performed by a processor (170) of a display device (100) that displays translated subtitles in VOD playback mode in conjunction with an external device according to the present disclosure will be described.
[0170] The processor (170) can determine (S110) whether the current mode is a live playback mode via an IP-based live channel or a VOD playback mode for VOD-based content. The processor (170) can determine (S120) whether the external device (400) is capable of AI-based translation.
[0171] If AI-based translation is not possible in VOD playback mode, the processor (170) can display CC associated with general captions on the button (183c) at the bottom of the initial screen of VOD playback mode (S130). If AI-based translation is possible in VOD playback mode, the processor (170) can control the screen (S140) to display AI CC associated with AI subtitles on the button (183d) at the bottom of the initial screen of VOD playback mode.
[0172] When the button (183a) marked with CC is selected (S131), the processor (170) can display subtitles translated by the display device (100) on the screen (S132) while playing the first content in VOD playback mode. Thus, the processor (170) can display subtitles translated by itself on the screen without interacting with an external device (400). In this regard, the processor (170) may be configured to convert speech detected from audio data into text and to convert the text into text data of a specific language. When the button (183a) marked with CC is not selected, the processor (170) can play the first content (S133) without generating subtitles in VOD playback mode.
[0173] When the button (183b) marked with AI CC is selected (S141), the processor (170) can display translated subtitles on the screen (S142) by considering contextual associations in conjunction with an external device (400) while playing the first content in VOD playback mode. When the button marked with AI CC is not selected, the processor (170) can play the first content (S143) without generating AI-based subtitles in VOD playback mode.
[0174] Meanwhile, the processor (170) can compare the translation processing time considering the contextual correlation with the second threshold time (S150) to display the subtitles on a screen played at high speed by synchronizing them with the voice of the audio data. In this regard, a data stream containing video data and audio data can be played at high speeds such as 1.25x, 1.5x, and 2x.
[0175] If the translation processing time exceeds the second threshold time, the processor (170) can control the external device (400) to translate based on audio data, excluding contextual associations. If the translation processing time is less than or equal to the second threshold time, the processor (170) can display the translated subtitles on the screen (S142), taking contextual associations into account.
[0176] Meanwhile, the processor (170) can compare the second translation processing time based on audio data with the second threshold time (S155), excluding contextual associations. If the second translation processing time exceeds the second threshold time, the processor (170) can display the translated subtitles (S156) while controlling the playback speed of the second content to a speed slower than the high speed, for example a lower high speed or the normal speed.
[0177] In this regard, a data stream containing video data and audio data may be played at high speeds such as 1.25x, 1.5x, and 2x. For example, if the second translation processing time exceeds the second threshold time when played at 1.5x speed, the second content may be played at 1.25x speed and the translated subtitles may be displayed. If the second translation processing time exceeds the second threshold time when played at 1.25x speed, the second content may be played at 1x speed and the translated subtitles may be displayed.
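The speed reduction of steps S155 and S156 can be sketched as follows. This is a minimal illustration: the text gives only the steps from 1.5x to 1.25x and from 1.25x to 1x, so the step from 2x to 1.5x, the function names, and the millisecond units are assumptions.

```python
# Speeds named in the text, plus normal speed (1.0).
SPEED_STEPS = (2.0, 1.5, 1.25, 1.0)


def next_lower_speed(current: float) -> float:
    """Return the next slower supported speed, or 1.0 if already at normal speed."""
    slower = [s for s in SPEED_STEPS if s < current]
    return max(slower) if slower else 1.0


def adjust_speed(current: float, plain_ms: float, second_threshold_ms: float) -> float:
    """S155/S156: step playback down one level when translation is too slow."""
    if plain_ms > second_threshold_ms:
        return next_lower_speed(current)
    return current
```

Stepping down one level at a time keeps playback as close as possible to the speed the user selected while still giving the translation enough time.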
[0178] Meanwhile, the processor (170) can compare the second translation processing time based on audio data with the threshold time at normal speed, excluding contextual associations (S160). If the second translation processing time exceeds the threshold time, the processor (170) can transmit the parsed audio data, prior to audio data decoding, to an external device (400) (S161). If the second translation processing time is less than or equal to the threshold time, the processor (170) can control the external device (400) to translate based on the decoded audio data, excluding contextual associations (S162).
[0179] Meanwhile, a display device that interacts with an external device according to the present disclosure may be configured to display a subtitle icon during VOD playback. FIG. 11 shows a screen of a display device configured to display a subtitle icon during VOD playback according to an embodiment. Referring to FIG. 11, a subtitle icon (181) may be displayed at the bottom of one side of a second screen adjacent to one side of icons (184a, 184b, 184c) associated with the playback of content while VOD-based content is playing. For example, the icons (184a, 184b, 184c) associated with the playback of content correspond to icons associated with skipping back, play / pause, and skipping forward, respectively.
[0180] Referring to FIGS. 1 through 11, a display device (100) configured to display a subtitle icon during VOD playback is described. A processor (170) may display a subtitle icon (181) at the bottom of one side of a second screen adjacent to one side of icons (184a, 184b, 184c) associated with the playback of the second content while the second content is playing. The subtitle icon (181) may be displayed on one side of an icon (184c) associated with skipping forward. As the VOD-based second content is played, a progress bar indicating the playback position relative to the total length of the second content may be placed at the bottom of the screen. The progress bar may be placed above the icons (184a, 184b, 184c) associated with the playback of the second content.
[0181] When the subtitle icon (181) is selected, the processor (170) may display a dialog box (182b) adjacent to the subtitle icon (181). The dialog box (182b) may be displayed in an adjacent area of one side of the second screen at the bottom of one side of the second screen. The processor (170) may control the subtitle icon (181) to disappear when the dialog box (182b) is displayed according to user input. The processor (170) may control the subtitle icon (181) to be displayed when the dialog box (182b) disappears according to user input.
[0182] The dialog box (182b) displayed on the second screen may be positioned closer to one side boundary of the display (180) than the dialog box (182) displayed on the initial screen. Because the dialog box (182b) displayed during VOD playback is positioned closer to that boundary, the screen being played on the display (180) is not obscured, thereby improving the user viewing experience.
[0183] By placing the subtitle icon (181) and dialog box (182, 182b) in an optimal location, the subtitle icon, which is a user interface (UI) that allows for intuitive recognition that multilingual subtitle functionality is supported, can be provided in an optimal location. In this regard, the subtitle icon (181) and dialog box (182, 182b) can be placed in an optimal location on the initial screen of the display device, when switching channels, or when playing content. Thus, the subtitle icon and the dialog box of the translation option can be optimally displayed so as not to interfere with the user's recognition of the current screen of the display device.
[0184] Meanwhile, a display device that interacts with an external device according to the present disclosure can transmit audio data to the external device and receive text data corresponding to translated subtitles from the external device. Additionally, the display device can control the translation quality of the external device by considering resources and status according to the processing status of the external device. In this regard, FIGS. 12a and 12b illustrate flowcharts in which a display device according to the present disclosure controls an external device by comparing the translation processing time of the external device with a threshold time. Each of the processes of FIGS. 12a and 12b can be performed by a processor (170) of the display device (100).
[0185] Referring to FIG. 12a, after the process of comparing the translation processing time and the threshold time (S300a) is performed, text data corresponding to the translated subtitle can be received (S400) from an external device (400). Referring to FIG. 12b, after receiving text data of the subtitle of the second quality from the external device (400) (S400), the estimated output time for outputting the subtitle of the first quality and the threshold time can be compared (S400b).
[0186] Referring to FIGS. 1 through 4 and FIG. 12a, the processor (170) can transmit audio data to an external device (400) (S300). Meanwhile, the processor (170) can generate text based on the context of a previously translated first subtitle and a currently translated second subtitle. Accordingly, the processor (170) can control the external device (400) to perform AI-based translation that considers the correlation between contexts.
[0187] In this regard, the processor (170) can compare the translation processing time of the external device (400) with a threshold time (S300a) for displaying subtitles on the screen synchronized with the voice. If the translation processing time is less than or equal to the threshold time, the processor (170) can control the external device (400) to generate subtitles of a first quality based on previous audio data and audio data (S310a). Subtitles of a first quality can be generated based on the context of subtitles translated from previous audio data and subtitles translated from audio data. Accordingly, the processor (170) can generate text based on the context of previously translated first subtitles and currently translated second subtitles.
[0188] If the translation processing time exceeds a threshold time, the processor (170) may control (S320a) the external device (400) to generate subtitles of a second quality based on audio data. The subtitles of the second quality may have lower translation accuracy than the subtitles of the first quality, be based on literal translation, or have a shorter length of text displayed on the screen.
[0189] The processor (170) may receive (S400) text data corresponding to a translated subtitle from an external device (400). The processor (170) may receive (S400), from the external device (400), text data corresponding to a subtitle in which the second utterance is paraphrased based on the context of the first utterance and the second utterance. Alternatively, the processor (170) may receive (S400) text data corresponding to a literally translated subtitle based on the second utterance from the external device (400).
[0190] The processor (170) can synchronize the received text data with the voice of the audio data and display it on the screen (S500). The processor (170) can synchronize the paraphrased subtitles with the voice of the audio data and display them on the screen (S500). Alternatively, the processor (170) can synchronize the literally translated subtitles with the voice of the audio data and display them on the screen (S500).
[0191] Referring to FIGS. 1 through 4 and FIG. 12b, the processor (170) can transmit audio data to an external device (400) (S300). The processor (170) can receive text data corresponding to translated subtitles from the external device (400) (S400). The processor (170) can receive text data corresponding to subtitles of a second quality based on literal translation from the external device (400) (S400).
[0192] In this regard, the processor (170) may compare (S400b) an estimated output time for outputting a subtitle of first quality based on paraphrasing according to contextual association and a threshold time for displaying the subtitle on the screen by synchronizing it with the voice. If it is determined that the estimated output time is earlier than the threshold time, the processor (170) may generate a subtitle of first quality based on the previous audio data and the current audio data (S410b). The subtitle of first quality may be generated based on the context of the subtitle translated from the previous audio data and the subtitle translated from the current audio data. Accordingly, the processor (170) may generate text based on the context of the previously translated first subtitle and the currently translated second subtitle.
[0193] If it is determined that the estimated output time is later than the threshold time, the processor (170) can generate a subtitle of a second quality (S420b) based on the current audio data. The subtitle of the second quality may have lower translation accuracy than the subtitle of the first quality, be based on a literal translation, or have a shorter length of text displayed on the screen.
[0194] The processor (170) can display a first-quality subtitle that has been paraphrased on the screen (S510b) by synchronizing it with the voice of the audio data. Alternatively, the processor (170) can display a second-quality subtitle that has been literally translated on the screen (S520b) by synchronizing it with the voice of the audio data.
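The selection between first-quality and second-quality subtitles in FIG. 12b can be sketched as follows. This is an illustrative sketch only: the Subtitle type, the paraphrase callback, and the handling of the case where the two times are equal (treated here as second quality) are assumptions not stated in the disclosure.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Subtitle:
    text: str
    quality: int  # 1 = context-based paraphrase, 2 = literal translation


def choose_subtitle(
    literal_text: str,
    estimated_output_ms: float,
    threshold_ms: float,
    paraphrase: Callable[[str], str],
) -> Subtitle:
    """Return the subtitle to display for the current audio segment."""
    # Comparison step: can a first-quality subtitle be ready before the deadline?
    if estimated_output_ms < threshold_ms:
        # S410b: generate the context-based subtitle.
        return Subtitle(paraphrase(literal_text), quality=1)
    # S420b: keep the literal subtitle already received in S400.
    return Subtitle(literal_text, quality=2)
```

Because the literal subtitle is already available when the comparison is made, the display device always has a subtitle to show in time, and upgrades it only when the estimate indicates that the paraphrase can still be displayed in sync with the voice.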
[0195] Meanwhile, subtitles displayed on the display device may differ depending on the state of the external device. In this regard, FIG. 13 shows examples of subtitles of first quality and second quality displayed on the display device depending on the state of the external device.
[0196] Referring to FIG. 13, content regarding a presidential election debate can be played on the screen of the display (180) as VOD or live broadcast. News or live sports broadcasts including the presidential election debate can be played as live broadcasts. Real-time translation is required in such live broadcasts, and it is necessary to reduce the translation processing time depending on the capability / condition of the external device performing the real-time translation.
[0197] Referring to FIG. 11 and FIG. 13(a), as content regarding the presidential debate is played as VOD or live broadcast on the screen of the display (180), subtitles based on literal translations may be displayed for each audio segment. In this regard, the subtitle icon (181) is selected, and Korean may be selected among the translatable languages in the third area (182R3) of the dialog box (182). Subtitles (186a) based on a literal translation of a specific part of audio data from an external device may be displayed in the lower area of the display (180) as "I will invite you to his rally."
[0198] Referring to FIG. 11 and FIG. 13(b), as content regarding the presidential debate is played on the screen of the display (180) as VOD or live broadcast, subtitles based on contextual translation can be displayed. In this regard, the subtitle icon (181) is selected, and Korean can be selected among the translatable languages in the third area (182R3) of the dialog box (182). Subtitles (186b) based on contextual translation of a specific part of audio data from an external device can be displayed in the lower area of the display (180) as "I hope you all go to Donald Trump's rally."
[0199] Meanwhile, the first utterance in the original language before translation may be "I'm going to actually do something really unusual" and the second utterance following the first utterance may be "I'm going to invite you to attend his rallies". In this regard, the first translated subtitle corresponding to the first utterance may be generated based on contextual translation, such as "It is really unusual, but", taking into account the previously translated subtitle.
[0200] The translated second subtitle corresponding to the second utterance can be created based on the context of the first utterance and the second utterance. A literal translation of the second utterance could be "I will invite you to his rallies." The English expression corresponding to the literal translation of the second utterance could be "I'm going to invite you to attend his rallies." Therefore, the English expression corresponding to the literal translation of the second utterance is identical to the first speaker's second utterance. However, considering the context of the first and second utterances and the topic of the presidential debate, the literal translation of the second utterance may feel somewhat awkward to viewers of the Korean translation subtitles.
[0201] Meanwhile, the second utterance can be paraphrased as "I hope you all go to Donald Trump's rallies" based on the context of the first utterance and the second utterance. The English expression corresponding to the paraphrase of the second utterance could be "I'm going to invite you to attend Donald Trump's rallies (once)". Since the first speaker is Donald Trump, the second utterance can be paraphrased based on the metadata of the content, for example, the title "2024 US presidential election debate".
[0202] Regarding contextual association, the previous audio data and the current audio data may correspond to the first utterance and the second utterance, respectively. The second speaker's first utterance is expressed as "something really unusual." Considering contextual association, the second utterance is not a sincere invitation to go to his meeting, but rather an intention to go and see the situation for yourself. Accordingly, the processor (170) can control (S410) the external device (400) to generate subtitles of the first quality based on the previous audio data and the current audio data. Based on contextual association, the expression "once" may be added to the translated second subtitle of the second utterance.
[0203] On the other hand, if the translation processing time exceeds a threshold time due to resource limitations of the external device (400), the processor (170) controls the external device (400) to generate subtitles of a second quality lower than the first quality based on the current audio data. Accordingly, the external device (400) generates subtitles of a second quality with low translation accuracy or a literal translation form based on the current audio data of the second utterance without considering the previous first utterance and context.
[0204] Additionally, the external device (400) may also configure subtitles that are translated to have a shorter length of text displayed on the screen. In this regard, the second speaker's third utterance is "cause it's really interesting thing to watch." The second speaker's second and third utterances are a single sentence: "I'm going to invite you to attend his rallies because it's really interesting thing to watch."
[0205] Meanwhile, it may be predicted that the translation processing time will exceed the threshold time when the second utterance and the third utterance are translated together. Accordingly, the processor (170) may separately request, from the external device (400), the translated second subtitle of the second utterance and the translated third subtitle of the third utterance. As another example, the external device (400) may distinguish between the second utterance and the third utterance and transmit the translated second subtitle of the second utterance and the translated third subtitle of the third utterance to the processor (170).
[0206] Meanwhile, a display device that displays translated subtitles in conjunction with an external device according to the present disclosure may transmit the audio data itself to the external device or transmit audio data decoded from the audio data to the external device. In this regard, FIG. 14 illustrates the configuration of a display device that transmits parsed audio data or decoded audio data. FIG. 15a and FIG. 15b illustrate the configuration of an external device that receives parsed audio data or decoded audio data.
[0207] Referring to FIG. 14, a display device (100) may be configured to execute a plurality of software modules through a processor (170). The processor (170) may be configured to include a media playback module (10), a voice detection module (20a, 20), a text reception module (30), and a subtitle generation module (40). In this regard, the media playback module (10), the voice detection module (20a), the text reception module (30), and the subtitle generation module (40) may be composed of software modules and executed by the processor (170).
[0208] Referring to FIG. 15a, an external device (400) may receive a header containing synchronization information and first audio data (PCM1) from a display device (100). The display device (100) may receive a header and text data (TEXT) of translated subtitles from the external device (400). The display device (100) may receive a header and second audio data (PCM2) in which the voice of the first audio data (PCM1) is translated from the external device (400). The first audio data (PCM1) and the second audio data (PCM2) may be data in which the voice and the translated second voice are modulated by pulse code modulation (PCM).
[0209] Referring to FIG. 14 and FIG. 15a, the display device (100) can transmit parsed audio data to the voice detection module (20) through the switching module (171). The external device (400) can receive the voice detected from the parsed audio data and generate translated text data. Thus, while the display device (100) is decoding the audio data, the external device (400) can decode the voice detected from the parsed audio data and translate it into a specific language to generate text data.
[0210] Referring to FIG. 15b, an external device (400) can receive transmission sequence information (Tx_seq) and first audio data (PCM1) from a display device (100). The display device (100) can receive sequence information (seq) and second audio data (PCM2) in which the voice of the first audio data (PCM1) is translated from the external device (400). The display device (100) can receive sequence information (seq) and text data (TEXT) of the translated subtitle from the external device (400). The first audio data (PCM1) and the second audio data (PCM2) may be data in which the voice and the translated second voice are modulated by pulse code modulation (PCM).
[0211] The subtitle generation module (40) may be configured to synchronize the first audio data (PCM1) with the text data (TEXT) corresponding to the translated second voice based on time stamp information and transmission sequence information (Tx_seq). Meanwhile, a request to play the translated second voice may be made via the remote control device (200) or by default settings. When a request to play the translated second voice is made, the subtitle generation module (40) may synchronize the second audio data (PCM2) with the text data (TEXT) corresponding to the translated second voice based on time stamp information and sequence information (seq). The subtitle generation module (40) may be configured to output the text data (TEXT) synchronized with the first or second audio data to the screen.
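The sequence-based synchronization described above can be sketched as follows. This is a minimal sketch under stated assumptions: the AudioChunk and TextResult types, their field names, and the rule that the external device echoes back the transmission sequence number are hypothetical and are not specified in the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioChunk:
    tx_seq: int    # transmission sequence number sent with PCM1
    start_ms: int  # time stamp of the first sample
    end_ms: int    # time stamp just after the last sample


@dataclass(frozen=True)
class TextResult:
    seq: int       # sequence number returned with TEXT
    text: str


def schedule_subtitles(chunks, results):
    """Pair each TEXT result with the audio chunk carrying the same sequence number."""
    by_seq = {chunk.tx_seq: chunk for chunk in chunks}
    cues = []
    for result in results:
        chunk = by_seq.get(result.seq)
        if chunk is None:
            continue  # no matching audio; drop rather than guess a time
        cues.append((chunk.start_ms, chunk.end_ms, result.text))
    # Results may arrive out of order, so sort by start time.
    return sorted(cues)
```

Keying on the sequence number lets text that returns late or out of order still be placed on the time stamps of the audio it was translated from.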
[0212] Meanwhile, the display device (100) can be implemented so as not to separately record / manage time stamp information for each utterance if the utterance times of the speakers do not overlap or occur sequentially. Accordingly, the subtitle generation module (40) can set the duration for displaying subtitles by considering the size of the text data. The subtitle generation module (40) can set the duration for displaying subtitles in proportion to the size of the text data of the translated second voice.
[0213] The subtitle generation module (40) can synchronize the first audio data (PCM1) with the text data (TEXT) corresponding to the translated second voice based on the set period for displaying subtitles. When a request is made to play the translated second voice, the subtitle generation module (40) can synchronize the second audio data (PCM2) with the text data (TEXT) corresponding to the translated second voice based on the set period for displaying subtitles. The subtitle generation module (40) can be configured to output the text data (TEXT) synchronized with the first or second audio data to the screen.
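Setting the display period in proportion to the size of the text data can be illustrated as follows. The constants (80 ms per character, a 1000 ms floor), the use of character count as the "size", and the function names are assumptions chosen only for illustration.

```python
def display_duration_ms(text: str, ms_per_char: int = 80, min_ms: int = 1000) -> int:
    """Display time grows linearly with subtitle length, with an assumed floor."""
    return max(min_ms, len(text) * ms_per_char)


def schedule_sequential(texts, start_ms: int = 0):
    """Lay out non-overlapping subtitles back to back, without per-utterance time stamps."""
    cues = []
    current = start_ms
    for text in texts:
        duration = display_duration_ms(text)
        cues.append((current, current + duration, text))
        current += duration
    return cues
```

With these assumed constants, a 50-character subtitle is displayed for 4000 ms, while a very short subtitle is held for the 1000 ms floor so that it remains readable.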
[0214] Accordingly, the external device (400) can receive the first audio data (PCM1) from the display device (100) without transmission sequence information (Tx_seq). The display device (100) can receive the second audio data (PCM2), in which the voice of the first audio data (PCM1) is translated, from the external device (400) without sequence information (seq). The display device (100) can receive the text data (TEXT) of the translated subtitles from the external device (400) without sequence information (seq).
[0215] Referring to FIG. 14 and FIG. 15b, the display device (100) can transmit decoded audio data to the voice detection module (20) via the switching module (171). The external device (400) can receive the voice detected from the decoded audio data and generate translated text data. Thus, the external device (400) can generate text data by translating the voice detected from the decoded audio data into a specific language without separate decoding.
[0216] With reference to FIGS. 1 to 15b, a display device for displaying translated subtitles in conjunction with an external device according to the present disclosure is described. Each software module is formed in a connected structure in which the output of a processing stage leads to the input of the next stage. The software module is configured as an instruction pipeline structure in which various instructions are executed stepwise and divided into detailed cycles such as fetching, decoding, and computation, and executed by each pipeline stage. In addition, a plurality of software modules are configured as a software pipeline in which the output of each software module is automatically connected to the input of another software module.
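The connected structure in which the output of one stage becomes the input of the next can be sketched as follows. The stage functions are hypothetical stand-ins for the demultiplexer, parser, and decoder, and the toy data formats are assumptions; the sketch shows only the chaining, not actual media processing.

```python
from functools import reduce
from typing import Any, Callable, Iterable

Stage = Callable[[Any], Any]


def run_pipeline(stages: Iterable[Stage], data: Any) -> Any:
    """Feed data through the stages in order; each output is the next input."""
    return reduce(lambda value, stage: stage(value), stages, data)


# Hypothetical stand-ins for the demultiplexer, parser, and decoder stages.
def demux(stream: dict) -> bytes:
    return stream["audio"]          # separate audio from the data stream


def parse(audio: bytes) -> dict:
    return {"codec": "toy", "payload": audio}


def decode(parsed: dict) -> list:
    return list(parsed["payload"])  # one integer sample per byte in this toy format
```

For example, run_pipeline([demux, parse, decode], stream) applies the three stages in order, and a stage can be added or replaced without changing the others.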
[0217] Accordingly, the media playback module (10) can be referred to as a media playback pipeline, and the voice detection module (20a, 20) can be referred to as a voice detection pipeline. The text reception module (30) can be referred to as a text receiver pipeline.
[0218] The media playback module (10) may be configured to extract audio data from a data stream and transmit it to a voice detection module (20a). The voice detection module (20a, 20) may be configured to collect first audio data in which voice is detected from the extracted audio data.
[0219] The text reception module (30) may be configured to receive, from an external device (400), text data into which the voice or a second voice translated from the voice has been converted. The subtitle generation module (40) may be configured to synchronize text data based on one of the time stamp information of the first audio data, the size of the text data, or synchronization information included in the header of the first audio data. The subtitle generation module (40) may be configured to synchronize the first audio data and text data based on the time stamp information, the size of the audio / text data, and the synchronization information. The subtitle generation module (40) may be configured to display the synchronized text data as subtitles on a specific frame of the screen.
[0220] The media playback module (10) may be configured to transmit audio data to the voice detection module (20) through configurations corresponding to a plurality of sub-modules. The media playback module (10) may be configured to include a source input module (SRC) (11), a demultiplexer (12), a parser (13), an audio decoder (14), and an audio sink module (15).
[0221] The source input module (11) may be configured to receive a data stream containing video data and audio data. The source input module (11) may be configured to receive a data stream containing video data and audio data from a content server (300) through a broadcast receiver (130). The source input module (11) may be configured to output the data stream containing video data and audio data to a demultiplexer (12). The demultiplexer (12) may be configured to classify video data, audio data, and control information from the data stream.
[0222] The parser (13) may be configured to parse audio data. The parser (13) may parse audio data to extract playback time, audio codec information, and audio metadata. The audio decoder (14) may be configured to decode and extract the parsed audio data. The audio decoder (14) may be configured to decode and extract audio data based on audio codec information and audio metadata. The audio data extracted through the parser (13) may be transmitted to the voice detection module (20). Meanwhile, the audio data extracted through the audio decoder (14) may be transmitted to the voice detection module (20a).
[0223] The audio sink module (15) may be configured to output decoded audio data. The audio sink module (15) may be configured to output decoded audio data through an audio output unit (185), such as a speaker.
[0224] Meanwhile, the voice detection module (20) may be configured to include a source input module (21), a second audio decoder (22), a voice activity detector (VAD) (23), a payloader module (24), and a transmission module (25). Because the voice detection module (20) is equipped with its own second audio decoder (22), audio data can be transmitted to an external device (400) as soon as it is decoded, so that AI-based translation processing is performed in advance. Only the first audio data in which voice is detected can be selected and collected through the voice activity detector (23). A header containing synchronization information may be inserted before transmitting the first audio data in which voice is detected to an external device (400) that performs AI offloading. Since communication / processing occurs only when voice is detected, communication / processing efficiency can be improved.
[0225] The external device (400) does not require media processing for the audio data. By checking the inserted header information, it determines the size of the voice-containing audio data to be received. After receiving audio data of the size indicated in the header, it performs speech-to-text (STT) processing and translation. Therefore, the external device (400) performs only AI-based STT processing and translation. Meanwhile, when the text data is transmitted to the display device (100), such as a TV, the synchronization information of the received header is transmitted along with it. The display device (100) synchronizes the media playback and the subtitles in time by referring to the synchronization information in the header received with the text.
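A header that carries synchronization information and the payload size can be sketched as follows. The disclosure does not specify a header layout, so the 8-byte time stamp in milliseconds, the 4-byte payload size, the big-endian byte order, and the function names are all assumptions.

```python
import struct

# Assumed layout: 8-byte presentation time stamp (ms) + 4-byte payload size, big-endian.
HEADER = struct.Struct(">QI")


def pack_segment(pts_ms: int, pcm: bytes) -> bytes:
    """Prefix voice-containing audio with a header holding sync info and size."""
    return HEADER.pack(pts_ms, len(pcm)) + pcm


def unpack_segment(buffer: bytes):
    """Return (pts_ms, pcm, rest), or None if a whole segment has not arrived yet."""
    if len(buffer) < HEADER.size:
        return None
    pts_ms, size = HEADER.unpack_from(buffer)
    end = HEADER.size + size
    if len(buffer) < end:
        return None
    return pts_ms, buffer[HEADER.size:end], buffer[end:]
```

Because the size is read from the header, the receiver knows exactly how many bytes to wait for before starting STT processing, and it can return the same time stamp with the translated text.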
[0226] The source input module (21) may be configured to receive audio data extracted from the parser (13). The source input module (21) may receive an audio elementary stream, which is audio data extracted from the parser (13), and transmit it to a second audio decoder (22). The second audio decoder (22) may be configured to decode the audio data extracted from the parser (13).
[0227] A voice activity detector (23) may be configured to detect voice in decoded audio data. The voice activity detector (23) may be configured to distinguish between a voice region where speech occurs and a pause region in the decoded audio data, and to detect the voice spoken in the voice region. A payloader module (24) may configure a data stream including a header for synchronization and a payload of data associated with the detected voice. A transmission module (25) may be configured to transmit the data stream containing the header and payload to an external device (400).
[0228] As another example, the voice detection module (20) may be configured to include a source input module (21), a voice activity detector (VAD) (23), and a transmission module (25).
[0229] The source input module (21) may be configured to receive audio data extracted from the audio decoder (14). The source input module (21) may transmit the audio data extracted from the audio decoder (14) to the voice activity detector (23). The voice activity detector (23) may be configured to distinguish between a voice region where speech occurs and a pause region in the decoded audio data, and to detect the speech that is spoken in the voice region. The transmission module (25) may be configured to transmit the detected speech to the external device (400).
[0230] Meanwhile, the processor (170) of the display device (100) configured to execute the aforementioned modules may selectively execute a switch between the control methods of the aforementioned embodiments based on the operating state and capability of the external device (400).
[0231] Referring to FIGS. 1 through 15b, the processor (170) may be configured to include a media playback module (10), a voice detection module (20a, 20), and a switching module (171). If the translation processing speed of the external device (400) is below a threshold speed, the external device (400) needs to receive audio data that is small in size or short in length. In this case, the switching module (171) causes audio data decoded to a first sound quality through the voice detection module (20) to be transmitted to the external device (400). To this end, the switching module (171) can transmit parsed audio data to the voice detection module (20). The parsed audio data can be passed through the source input module (21) to the second audio decoder (22) and the voice activity detector (23) of the voice detection module (20), where decoding and voice detection are performed.
[0232] Meanwhile, if the translation processing speed of the external device (400) is greater than or equal to the threshold speed, the switching module (171) can transmit audio data decoded by the audio decoder (15) of the media playback module (10) to the external device (400) through the voice detection module (20a). The decoded audio data can be passed to the voice activity detector (23) through the source input module (21) of the voice detection module (20a) so that voice detection is performed.
[0233] The processor (170) may be configured to receive status information and capability information of the external device (400) and to control the timing of transmitting audio data to the voice detection module (20a, 20). In this regard, FIG. 16 is a flowchart of a control method that adaptively controls the timing of transmitting audio data according to the status of the external device according to the present disclosure.
[0234] Referring to FIGS. 1 through 16, the control method may be performed by a processor (170) of a display device. The processor (170) may determine (S90) whether the external device (400) can detect synchronization information of the header. It may be determined (S90) whether the external device (400) supports synchronization based on NAIS header information such as a time stamp, period, and header length.
[0235] In this regard, the capability information of the external device (400) refers to the ability to detect header information and detect and translate the voice payload from the audio data. If the external device (400) does not have the capability to detect the NAIS header, the processor (170) can transmit and receive audio data without a header through the voice detection module (20). Accordingly, the processor (170) can transmit and receive data streams based on the subtitle display period according to the size of the sequence information or text data.
[0236] The status information of the external device (400) may include information regarding the text conversion accuracy with which the external device (400) converts speech into text and regarding its translation processing speed. In this regard, the audio decoder (15) of the media playback module (10) performs decoding at a quality level suitable for output through the audio sink module (16). Meanwhile, the audio decoder (22) of the voice detection module (20), which may be referred to as the second audio decoder, performs decoding at a quality level sufficient to detect speech from the audio data. The text conversion accuracy based on the audio data transmitted through the second audio decoder (22) may be lower than the text conversion accuracy based on the audio data transmitted through the audio decoder (15).
[0237] If the external device (400) can detect the synchronization information of the header, the processor (170) can transmit the audio data extracted through the parser (13) to the voice detection module (20) (S110). The processor (170) can decode the audio data extracted through the parser (13) into a first sound quality through the second audio decoder (22) (S210). The processor (170) can transmit the audio data decoded into the first sound quality to the external device (400) (S310).
[0238] The processor (170) can determine (S350) whether the text conversion accuracy of the external device (400) is below a threshold ratio. For example, the threshold ratio can be set to 90%, 95%, 97%, 98%, 99%, etc., but is not limited thereto and can be changed depending on the application.
[0239] If the text conversion accuracy is below the threshold ratio, the processor (170) can decode (S220) the audio data into a second sound quality through the audio decoder (15) of the media playback module (10). The processor (170) can transmit (S320) the audio data decoded into the second sound quality to the external device (400) through the voice detection module (20a). The second sound quality can be set or implemented at a higher quality than the first sound quality.
[0240] If the text conversion accuracy is above a threshold ratio, the audio data extracted through the parser (13) can be transmitted to the voice detection module (20) (S110). The processor (170) can control (S210) the audio data extracted through the parser (13) to be decoded into a first sound quality through the voice detection module (20). The processor (170) can transmit the audio data decoded into the first sound quality to an external device (400) (S310).
[0241] Meanwhile, if the external device (400) cannot detect or does not support the synchronization information of the header, the processor (170) can decode (S220) the audio data into a second sound quality through the audio decoder (15) of the media playback module (10). The processor (170) can transmit (S320) the audio data decoded into the second sound quality to the external device (400) through the voice detection module (20a). The second sound quality can be set or implemented at a higher quality than the first sound quality.
[0242] When audio data decoded through the audio decoder (15) is transmitted to the external device (400) through the voice detection module (20a), the time required for the external device (400) to perform translation processing can be reduced compared to the method in which audio data is transmitted to the external device (400) through the voice detection module (20) before being passed to the audio decoder (15).
[0243] Accordingly, the processor (170) can determine (S360) whether the translation processing speed of the external device (400) is below a threshold speed. If the translation processing speed of the external device (400) is below a threshold speed, the processor (170) can transmit the audio data extracted through the parser (13) to the voice detection module (20) (S110). The processor (170) can control (S210) the audio data extracted through the parser (13) to be decoded into a first sound quality through the voice detection module (20). The processor (170) can transmit the audio data decoded into the first sound quality to the external device (400) (S310).
[0244] The translation processing speed of the external device (400) can be defined as the speed processed by the operation associated with the text translation module (424). The translation processing speed of the external device (400) can be defined as the speed processed by the operation of the payloader module (422) to the payloader module (425). If the translation processing speed of the external device (400) is greater than or equal to the threshold speed, the processor (170) can determine (S350) whether the text conversion accuracy of the external device (400) is less than or equal to the threshold ratio.
[0245] Meanwhile, if the text conversion accuracy of the external device (400) is below the threshold ratio and the translation processing speed is at or above the threshold speed, it is necessary to improve the text conversion accuracy. To that end, audio data decoded by the audio decoder (15) into a second sound quality higher than the first sound quality can be transmitted to the external device (400).
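The branching of FIG. 16 described in the preceding paragraphs can be summarized in a short sketch. This is one possible reading that collapses steps S90, S350, and S360 into a single stateless decision; the names AudioPath, ExternalDeviceState, and select_audio_path, as well as the default threshold values and units, are hypothetical and do not appear in the disclosure.

```python
from dataclasses import dataclass
from enum import Enum


class AudioPath(Enum):
    # First sound quality: parser (13) -> voice detection module (20).
    FIRST_QUALITY = "module_20"
    # Second sound quality: audio decoder (15) -> voice detection module (20a).
    SECOND_QUALITY = "module_20a"


@dataclass
class ExternalDeviceState:
    supports_sync_header: bool       # capability information (S90)
    text_conversion_accuracy: float  # status information, 0.0 to 1.0
    translation_speed: float         # status information, arbitrary units


def select_audio_path(
    state: ExternalDeviceState,
    accuracy_threshold: float = 0.95,  # assumed threshold ratio
    speed_threshold: float = 1.0,      # assumed threshold speed
) -> AudioPath:
    """Pick the transmission path for the next audio data (illustrative only)."""
    # S360: a slow external device should receive the smaller, earlier audio.
    if state.translation_speed < speed_threshold:
        return AudioPath.FIRST_QUALITY
    # S350: low STT accuracy calls for the higher-quality decoded audio.
    if state.text_conversion_accuracy < accuracy_threshold:
        return AudioPath.SECOND_QUALITY
    # S90: with header support and acceptable accuracy, keep the first quality;
    # without header support, fall back to the second-quality path.
    if state.supports_sync_header:
        return AudioPath.FIRST_QUALITY
    return AudioPath.SECOND_QUALITY
```

In this reading the speed check takes priority over the accuracy check, matching the statement that accuracy is evaluated (S350) only when the translation processing speed is at or above the threshold speed.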
[0246] Meanwhile, it is necessary to determine whether to transmit only PCM-based audio data to an external device (400) or to transmit PCM-based audio data combined with sequence information to an external device (400). If the speech times of multiple speakers overlap in the audio data, the speeches must be separated based on accurate time stamp information to generate translated subtitles. On the other hand, if the speech times do not overlap and the speech of a single speaker is detected in each speech time interval, it is possible to generate translated subtitles for each speech without sequence information containing time stamp information.
[0247] In this regard, FIG. 17 shows a flowchart of a control method for transmitting audio data of a different format to an external device by determining whether the speech times of a speaker overlap. With reference to FIGS. 1 to 17, a control method for transmitting audio data of a different format performed by a processor (170) is described.
[0248] The processor (170) can determine (S201) whether the speech times of multiple speakers included in the audio data extracted through the audio decoder (15) of the media playback module (10) overlap. If the speech times overlap, the processor (170) can transmit sequence information including the start time stamp and end time stamp of each speaker's speech time and audio data to an external device (400) (S301). The processor (170) can receive the sequence information, text data of the translated subtitles, and / or the second audio data from the external device (400) (S401). The processor (170) can display the translated subtitles of the text data synchronized with the speech times according to the sequence information.
[0249] If the speech times of each speaker do not overlap, the processor (170) can sequentially transmit (S301) segments of audio data corresponding to the speech times to an external device (400). The processor (170) can receive (S402) text data of translated subtitles and / or second audio data from the external device (400). The processor (170) can set the respective periods (S410) for the segments of audio data corresponding to speech times that are distinguished and do not overlap in the time domain. The processor (170) can display (S502) each of the translated subtitles of the segments of text data corresponding to the segments of audio data for a respective period set according to the sizes of the segments of text data.
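The two branches of FIG. 17 rest on an overlap check (S201) and, in the non-overlapping case, on a display period derived from the size of each text segment. A minimal sketch of both follows. The function names, the strict-overlap definition, and the per-character and minimum durations are assumptions, since the disclosure does not give concrete values or formulas.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start time stamp, end time stamp) in seconds


def speech_times_overlap(intervals: List[Interval]) -> bool:
    """S201 (illustrative): True if any two speakers' speech times overlap.

    After sorting by start time, any overlap must show up between neighbours.
    Intervals that merely touch (one ends where the next begins) do not count.
    """
    ordered = sorted(intervals)
    return any(nxt[0] < cur[1] for cur, nxt in zip(ordered, ordered[1:]))


def display_period(
    text: str,
    seconds_per_char: float = 0.06,  # assumed reading-speed factor
    minimum: float = 1.0,            # assumed minimum on-screen time
) -> float:
    """Illustrative period for a subtitle, set according to the text size."""
    return max(minimum, len(text) * seconds_per_char)
```

When speech_times_overlap returns True, the sequence information with start and end time stamps would accompany the audio data; otherwise each segment can be sent on its own and shown for the period computed from its translated text.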
[0250] The foregoing has described a display device for displaying translated subtitles in conjunction with an external device according to the present disclosure. The technical effects of the display device for displaying translated subtitles in conjunction with an external device according to the present disclosure may be summarized as follows, but are not limited thereto.
[0251] According to the present specification, users can easily use the multilingual subtitle generation function. To facilitate use of such a multilingual subtitle service, the dialog boxes associated with the subtitle icon and translation options can be visually arranged for ease of use.
[0252] According to the present specification, a subtitle icon is displayed on the screen during the initial screen of the display device or when switching channels, so that the user can intuitively recognize that a multilingual subtitle function based on multilingual translation is supported.
[0253] According to the present specification, a subtitle icon, which is a user interface (UI) that allows for intuitive recognition that a multilingual subtitle function is supported, can be provided in an optimal location on the initial screen of a display device, during channel switching, or during content playback. Accordingly, the subtitle icon and a dialog box for translation options can be optimally displayed so as not to interfere with the user's perception of the current screen of the display device.
[0254] According to the present specification, as an external device generates translated subtitles while a display device plays video content, the difficulty of displaying translated subtitles on the screen at the time the video content is played can be resolved.
[0255] According to the present specification, a method can be proposed to synchronize subtitles generated in real time through an external device by transmitting audio data to an external device in advance before decoding media on a webOS-based display device.
[0256] According to the present specification, media currently being played can be transmitted to an external device to generate subtitles in real time using artificial intelligence (AI), and the subtitles can be accurately aligned with the timing of media playback to enhance the user's viewing experience.
[0257] According to the present specification, the timing of transmitting audio data to be translated to an external device can be adaptively controlled depending on the state of the external device, such as its text conversion accuracy and translation processing speed.
[0258] According to the present specification, an automatic subtitle generation service can be provided by utilizing hardware resources within an external device or a display device.
[0259] According to the present specification, translation quality can be dynamically controlled through context-based paraphrasing of consecutive utterances or literal translation based on the utterances, depending on processing capabilities such as the state of an external device and translation processing time.
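The dynamic control of translation quality can be pictured as a choice between two request shapes. In the sketch below, the function name build_translation_request, the dictionary keys, and the use of text utterances in place of audio data are assumptions made for brevity; the disclosure only states that the context-based mode draws on previous and current audio data, and that the literal mode is used when the translation processing time exceeds the threshold time.

```python
from typing import Dict, List


def build_translation_request(
    history: List[str],
    current: str,
    processing_time_ms: float,
    threshold_ms: float,
) -> Dict[str, object]:
    """Choose context-based or literal translation (illustrative only)."""
    if processing_time_ms <= threshold_ms:
        # First quality: paraphrase using previous and current utterances.
        return {"mode": "context", "utterances": history + [current]}
    # Second quality: literal translation of the current utterance alone.
    return {"mode": "literal", "utterances": [current]}
```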
[0260] The foregoing disclosure may be implemented as computer-readable code on a medium on which a program is recorded. A computer-readable medium includes all types of recording devices in which data that can be read by a computer system is stored. Examples of computer-readable media include a Hard Disk Drive (HDD), a Solid State Disk (SSD), a Silicon Disk Drive (SDD), ROM, RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc., and also include implementations in the form of a carrier wave (e.g., transmission over the Internet). Additionally, the computer may include a control unit (180) of a terminal. Accordingly, the above detailed description should not be interpreted restrictively in all respects and should be considered exemplary. The scope of the invention should be determined by a reasonable interpretation of the appended claims, and all modifications within the equivalent scope of the invention are included within the scope of the invention.
Claims
1. A display device that interacts with an external device, the display device comprising: a communication module configured to connect to a control server linked with a content server, based on an input from a remote control device for executing a content service provided through a plurality of channels; a display configured to play content on a screen through the channels; and a processor configured to, when connected to the control server associated with the content service, display on the screen a subtitle icon associated with the provision of translated subtitles in conjunction with the external device, wherein the processor is configured to: detect a connection to the control server or a switch between the plurality of channels; when a first connection to the control server or a switch between the plurality of channels is detected, display the subtitle icon associated with the provision of translated subtitles on the screen; when a selection input for the subtitle icon is detected, display on the screen a dialog box associated with activation of subtitle display, a language of specific content, and a list of translatable languages; and when a specific language is selected from the list of translatable languages, display on the screen subtitles translated into the specific language by the external device based on audio data of a data stream of the specific content.
2. The display device of claim 1, wherein the processor is configured to: when a selection input for the subtitle icon is detected, determine whether the subtitle display is in a disabled state; if the subtitle display is disabled, display a first phrase associated with activation of subtitles in a first area of the dialog box; and if the subtitle display is enabled, display a second phrase associated with deactivation of subtitles in the first area of the dialog box.
3. The display device of claim 2, wherein the processor is configured to: determine whether a first selection input is received, through the remote control device, at a first button in the first area corresponding to the first phrase; and when the first selection input is received in the first area, display on the screen of the external device a dialog box having a button for agreeing that the external device perform AI-based translation considering contextual association.
4. The display device of claim 3, wherein the processor is configured to: display the language of the specific content in a second area of the dialog box; display the list of translatable languages in a third area of the dialog box; and when the specific language is selected from the list of translatable languages through the remote control device, control the external device to translate the audio data into the specific language.
5. The display device of claim 3, wherein the processor is configured to: display the subtitle icon in a central area of the screen, adjacent to one side of a service provision button that enables the content service to be provided on an initial screen in a power-on state; and when the subtitle icon is selected, display the dialog box in a side area of the screen located to one side of the central area.
6. The display device of claim 5, wherein the processor is configured to: determine whether a playback mode is a live playback mode through an IP-based live channel or a VOD playback mode based on VOD; determine whether the external device is capable of AI-based translation; if the AI-based translation is not possible in the live playback mode, display CC, associated with general captions, on a button at the bottom of the initial screen of the live playback mode; and if the AI-based translation is possible in the live playback mode, display AI CC, associated with AI subtitles, on the button at the bottom of the initial screen of the live playback mode.
7. The display device of claim 6, wherein the processor is configured to: when the button marked with CC is selected, display the translated subtitles on the screen by the display device while playing first content in the live playback mode; when the button marked with AI CC is selected, display the translated subtitles on the screen in conjunction with the external device, taking into account the correlation between contexts, while playing the first content in the live playback mode; and compare a translation processing time that takes the correlation between contexts into account with a threshold time, in order to display the subtitles on the screen in synchronization with the voice of the audio data.
8. The display device of claim 7, wherein the processor is configured to: if the translation processing time exceeds the threshold time, control the external device to exclude the correlation between contexts and translate based on the audio data; and when a second translation processing time, based on the audio data with the correlation between contexts excluded, exceeds the threshold time, transmit parsed audio data prior to decoding to the external device.
9. The display device of claim 6, wherein the processor is configured to: if the AI-based translation is not possible in the VOD playback mode, display CC, associated with general captions, on a button at the bottom of the initial screen of the VOD playback mode; and if the AI-based translation is possible in the VOD playback mode, display AI CC, associated with AI subtitles, on the button at the bottom of the initial screen of the VOD playback mode.
10. The display device of claim 9, wherein the processor is configured to: when the button marked with AI CC is selected, display the translated subtitles on the screen in conjunction with the external device, taking into account the correlation between contexts, while playing second content in the VOD playback mode; when the button marked with CC is selected, display the translated subtitles on the screen by the display device while playing the second content in the VOD playback mode; and compare a translation processing time that takes the correlation between contexts into account with a second threshold time, in order to display the subtitles on a screen played at high speed in synchronization with the voice of the audio data.
11. The display device of claim 10, wherein the processor is configured to: if the translation processing time exceeds the second threshold time, control the external device to exclude the correlation between contexts and translate based on the audio data; if a second translation processing time, based on the audio data with the correlation between contexts excluded, exceeds the second threshold time, control the playback speed of the second content to a normal speed slower than the high speed; and when the second translation processing time exceeds the threshold time at the normal speed, transmit parsed audio data to the external device before the audio data is decoded.
12. The display device of claim 9, wherein the processor is configured to: while the second content based on the VOD is playing, display the subtitle icon at the bottom of one side of a second screen, adjacent to one side of icons associated with playback of the second content; and when the subtitle icon is selected, display the dialog box in a side area at the bottom of the one side, wherein the dialog box displayed on the second screen is positioned closer to one side boundary of the display than the dialog box displayed on the screen.
13. The display device of claim 1, wherein the processor is configured to: compare a translation processing time of the external device with a threshold time, in order to display the subtitles on the screen in synchronization with the voice; and if the translation processing time is less than or equal to the threshold time, control the external device to generate subtitles of a first quality based on previous audio data and current audio data, wherein the subtitles of the first quality are generated based on the context of a subtitle translated from the previous audio data and a subtitle translated from the current audio data.
14. The display device of claim 13, wherein the processor is configured to, if the translation processing time exceeds the threshold time, control the external device to generate subtitles of a second quality based on the audio data, and wherein the subtitles of the second quality have lower translation accuracy than the subtitles of the first quality, are based on literal translation, or have a shorter length of text displayed on the screen.
15. The display device of claim 1, wherein the processor is configured to: receive, from the external device, text data corresponding to subtitles of a second quality based on literal translation; compare a predicted output time for outputting subtitles of a first quality based on paraphrasing with a threshold time for displaying the subtitles on the screen in synchronization with the audio; if the predicted output time is determined to be earlier than the threshold time, generate the subtitles of the first quality based on previous audio data and current audio data; and if the predicted output time is determined to be later than the threshold time, generate the subtitles of the second quality.
16. The display device of claim 1, wherein the processor comprises: a media playback module configured to extract and transmit audio data from a data stream; a voice detection module configured to collect, from the extracted audio data, first audio data in which voice is detected; a communication module configured to transmit a first signal containing the first audio data to the external device and to receive a second signal containing data necessary for the voice to be displayed as a subtitle in a specific frame of a screen; a text receiving module configured to receive text data in which the voice, or a second voice translated from the voice, is converted into text; and a subtitle generation module configured to synchronize the first audio data and the text data based on one of time stamp information of the first audio data, a size of the text data, and synchronization information included in a header of the first audio data, wherein the subtitle generation module displays the synchronized text data as the subtitle in the specific frame of the screen.
17. The display device of claim 16, wherein the media playback module comprises: a source input module configured to receive the data stream including video data and the audio data; a demultiplexer configured to classify video data, audio data, and control information in the data stream; a parser configured to parse the audio data; an audio decoder configured to decode and extract the parsed audio data; and an audio sink module configured to output the decoded audio data, wherein audio data extracted through the parser is transmitted to the voice detection module.
18. The display device of claim 17, wherein the voice detection module comprises: a source input module configured to receive the audio data extracted from the parser; an audio decoder configured to decode the extracted audio data; a voice activity detector (VAD) configured to detect the voice in the decoded audio data; a payloader module configured to configure a data stream including a header for synchronization and a payload of data associated with the detected voice; and a transmission module configured to transmit the data stream containing the header and the payload to the external device through the communication module.
19. The display device of claim 17, wherein the processor is configured to receive status information and capability information of the external device and to control the timing of transmitting the audio data to the voice detection module, and wherein the processor is configured to: decode audio data extracted through the parser through the voice detection module and transmit the audio data decoded into a first sound quality to the external device; if the text conversion accuracy of the external device is below a threshold ratio, transmit audio data decoded into a second sound quality, higher than the first sound quality, through the audio decoder of the media playback module to the external device through the voice detection module; and if the translation processing speed of the external device is less than a threshold speed, decode audio data extracted through the parser through the voice detection module and transmit the audio data decoded into the first sound quality to the external device.
20. The display device of claim 19, wherein the processor is configured to: determine whether the utterance times of multiple speakers included in the audio data extracted through the audio decoder of the media playback module overlap; when the utterance times overlap, transmit sequence information, including a start time stamp and an end time stamp of each speaker's utterance time, and the audio data to the external device, and display the translated subtitles of the text data in synchronization with the utterance times according to the sequence information; and if the utterance times do not overlap, sequentially transmit segments of the audio data corresponding to the utterance times to the external device, and display each of the translated subtitles of the segments of text data corresponding to the segments of the audio data for a respective period set according to the sizes of the segments of the text data.