Receiving apparatus
By analyzing the user's voice signal with a language model and retrieval engine built into the receiving device, the device can hold a dialogue with the user, store the chat history, and quickly determine the user's intent. This addresses the problem in existing technology that simply replacing remote control and touch panel commands with voice commands does not sufficiently improve convenience, and enables flexible content retrieval and playback.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- HISENSE VISUAL TECH CO LTD
- Filing Date
- 2025-07-30
- Publication Date
- 2026-07-02
AI Technical Summary
Existing receivers have not significantly improved convenience by replacing remote control and touch panel commands with voice commands.
By using the built-in language model, search engine, and voice-to-text conversion service of the receiving device, the system can analyze the user's voice signal, enable dialogue and interaction with the user, store chat history, learn the user's intent, quickly determine instructions, and perform content retrieval and playback.
It enables more flexible determination of user intent through dialogue, improves the convenience of the receiving device, accurately responds to the user's free speech, and quickly retrieves and plays the content the user wants.
Figure CN2025111616_02072026_PF_FP_ABST
Description
Receiving device
[0001] Cross-reference to related applications
[0002] This application claims priority to Japanese Patent Application No. 2024-229067, filed on December 25, 2024, entitled “Receiving Apparatus”, the entire contents of which are incorporated herein by reference.
Technical Field
[0003] This application relates to a receiving device.
Background Technology
[0004] Receiving devices for receiving television broadcasts, internet-published videos, etc., have the function of allowing users to select desired programs or videos using electronic program guides. Furthermore, technologies are known for allowing users to select their preferred genres such as drama, music, or sports, and to search for programs or videos. Users typically use remote controls, touch panels, or similar devices to input and give instructions in order to select their preferred genres and search for programs and videos.
[0005] With the development of sound processing technology in recent years, there has been a continuous trend towards enabling receiving devices to provide information to users and enabling users to give commands to receiving devices using sound. Through such technology, users can easily give commands to receiving devices.
[0006] However, in previous receiving devices, simply replacing commands made using remote controls or touch panels with voice commands did not necessarily improve convenience.
[0007] Existing technical documents
[0008] Patent documents
[0009] Patent Document 1: Japanese Patent Application Publication No. 2021-148974
[0010] Patent Document 2: Japanese Patent Application Publication No. 2023-120205.
Summary of the Invention
[0011] Thus, conventional receiving devices have been limited to simply replacing commands given via remote control or touch panel with voice commands. The receiving device described in this embodiment was developed to address this problem, and aims to let users give commands to the receiving device and select content through free, voice-based dialogue.
[0012] The receiving device includes: a sound signal processing unit that acquires sound signals including user speech; a language processing unit that parses the sound signals to interpret the content of the user speech and collects information through a dialogue in which it generates response content to the user based on the content of the user speech; a chat storage unit that stores the content of the user speech and the response content to the user; and a content processing unit that retrieves content based on the content of the user speech and the response content to the user stored in the chat storage unit.
Attached Figure Description
[0013] Figure 1 is a block diagram illustrating the hardware configuration of the receiving device according to an embodiment.
[0014] Figure 2 is a block diagram illustrating the functional configuration of the receiving device according to the embodiment.
[0015] Figure 3 is a block diagram showing the functional configuration of the voice retrieval processing unit in the receiving device according to the embodiment.
[0016] Figure 4 is a flowchart illustrating an example of the operation of the receiving device according to an embodiment.
[0017] Figure 5 is a flowchart illustrating an example of speech content parsing processing in the receiving device according to an embodiment.
[0018] Figure 6 is a flowchart illustrating an example of dialogue actions in the receiving device of an embodiment.
[0019] Figure 7 is a flowchart illustrating other examples of dialogue actions in the receiving device of an embodiment.
[0020] Figure 8 is a diagram showing an example of a display in the receiving device according to an embodiment.
[0021] Explanation of reference numerals: 20…receiving device, 21…transceiver unit, 22…voice retrieval processing unit, 23…data processing unit, 24…broadcast receiving unit, 25…operation receiving unit, 26…live broadcast processing unit, 27…video recording processing unit, 28…playback processing unit, 29…storage unit, 30…network, 31…language model, 32…search engine, 33…voice-to-text conversion service, 34…network dynamic image service, 41…display processing unit, 42…data control unit, 43…sound signal processing unit, 44…content processing unit, 45…language processing unit, 46…chat storage unit, 47…exploration processing unit, 201…antenna, 202a~202c…input terminals, 203…tuner, 204…demodulator, 205…demultiplexer, 206…A/D (analog-to-digital) converter, 207…selector, 208…signal processing unit, 209…speaker, 210…display panel, 210a…thumbnails, 210b…chat, 210c…search titles, 210d…program display, 210e…messages, 210f…messages, 211…operation unit, 212…light receiving unit, 213…IP communication unit, 214…CPU, 215…memory, 216…storage unit, 219…remote control.
Detailed Implementation
[0022] (Summary of the implementation method)
[0023] The receiving device of this embodiment has the function of displaying broadcast programs that are received or recorded in real time, published programs that are received or recorded in real time, and other content to the user. Content selection can be performed not only by directly specifying based on the user's speech (voice), but also by keyword-based retrieval and by determining the retrieval target through dialogue.
[0024] The parsing of a user's speech can be achieved using language models, search engines, and speech-to-text conversion services. These services can be utilized via a network. Alternatively, the receiving device itself may possess these capabilities.
[0025] The receiving device in this embodiment can determine the user's intended instruction (function) by repeating the user's speech and the corresponding response from the receiving device (hereinafter also referred to as "chat" or "dialogue"). The chat can be repeated multiple times to determine the user's intention (instruction). The chat is configured to be storable and can be used to determine the user's intended instruction. That is, the receiving device in this embodiment can learn the chat and use it to determine the user's intended instruction independently without relying on a language model or retrieval engine.
[0026] Chat includes: casual chat containing surrounding information necessary to determine the user's intent; and functional chat that directly demonstrates the receiving device's capabilities and the user's intent. These chats are stored, for example, and can be reused in the parsing of new chats. Casual chat often contains information about the user's preferences, which can be reused to quickly determine the user's intent.
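As an illustration of the two chat categories, the following minimal Python sketch models a chat storage unit. The names `ChatKind`, `ChatRecord`, and `ChatStore`, the field layout, and the sample utterances are assumptions made for this example; the application does not specify a storage format.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ChatKind(Enum):
    CASUAL = "casual"          # surrounding information, e.g. user preferences
    FUNCTIONAL = "functional"  # dialogue that settled on a device function


@dataclass
class ChatRecord:
    kind: ChatKind
    user_speech: str
    response: str
    function: Optional[str] = None  # hypothetical field, set only for functional chat


@dataclass
class ChatStore:
    records: List[ChatRecord] = field(default_factory=list)

    def add(self, record: ChatRecord) -> None:
        self.records.append(record)

    def by_kind(self, kind: ChatKind) -> List[ChatRecord]:
        """Return the stored records of one category, oldest first."""
        return [r for r in self.records if r.kind is kind]


store = ChatStore()
store.add(ChatRecord(ChatKind.CASUAL, "I like mountains", "Noted."))
store.add(ChatRecord(ChatKind.FUNCTIONAL, "Play a mountain video",
                     "Searching.", function="content_retrieval"))
```

Keeping the category on each record lets later parsing pull out only casual chat (for preferences) or only functional chat (for previously determined instructions).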
[0027] In this way, the receiving device in this embodiment stores the chat conversation between the user and the receiving device, thus enabling rapid determination of the user's intent. Furthermore, by utilizing parsing techniques employing language models, retrieval engines, and voice-to-text conversion services, the user's intent can be accurately determined even when the user is speaking freely.
[0028] After determining the user's intent, the receiving device in this embodiment categorizes the user's intent into keywords and retrieves content. If the content the user intended exists in the search results, it is played. The function of determining intent through user dialogue is not limited to content retrieval and playback. It can also be applied to content recording, etc.
[0029] (Structure of the implementation method)
[0030] Hereinafter, embodiments will be described in detail with reference to the accompanying drawings. Figure 1 is a block diagram showing the hardware configuration of the receiving device according to the embodiment. Figure 2 is a block diagram showing the functional configuration of the receiving device according to the embodiment. In the following description, common components are denoted by common reference numerals, and repeated descriptions are omitted.
[0031] The receiving device 20 shown in Figure 1 is, for example, a television device. The receiving device 20 includes input terminals 202a-202c, tuner 203, demodulator 204, demultiplexer 205, A/D (analog-to-digital) converter 206, selector 207, signal processing unit 208, speaker 209, display panel 210, operation unit 211, light receiving unit 212, IP communication unit 213, CPU 214, memory 215, and storage unit 216.
[0032] An antenna 201 is connected to the input terminal 202a. The antenna 201 receives the broadcast signal from the digital broadcast and supplies the received broadcast signal to the tuner 203 via the input terminal 202a.
[0033] Tuner 203 selects the desired channel broadcast signal based on the broadcast signal supplied from antenna 201 and supplies the selected broadcast signal to demodulator 204.
[0034] Demodulator 204 demodulates the broadcast signal supplied from tuner 203 and supplies the demodulated broadcast signal to demultiplexer 205.
[0035] The demultiplexer 205 separates the broadcast signal supplied by the demodulator 204 to generate video and audio signals, and supplies the generated video and audio signals to the selector 207.
[0036] Selector 207 selects one signal from a plurality of signals supplied by demultiplexer 205, A/D converter 206 and input terminal 202c, and supplies the selected signal to signal processing unit 208.
[0037] The signal processing unit 208 performs prescribed signal processing on the video signal supplied from the selector 207 and supplies the processed video signal to the display panel 210. In addition, the signal processing unit 208 performs prescribed signal processing on the audio signal supplied from the selector 207 and supplies the processed audio signal to the speaker 209.
[0038] The speaker 209 outputs sound or various sounds based on the sound signal supplied from the signal processing unit 208. In addition, the speaker 209 changes the volume of the output sound or various sounds based on the control performed by the CPU 214.
[0039] The display panel 210 displays static and dynamic images, other images, and text information based on video signals supplied from the signal processing unit 208 or control performed by the CPU 214.
[0040] Input terminal 202b receives analog signals such as video and audio signals input from external sources. Input terminal 202c receives digital signals such as video and audio signals input from external sources. For example, input terminal 202c can input digital signals from a recorder or similar device equipped with a drive mechanism for recording and playing video recording media such as Blu-ray Discs (registered trademark).
[0041] A/D converter 206 performs A/D conversion on the analog signal supplied from input terminal 202b and supplies the resulting digital signal to selector 207.
[0042] The operation unit 211 receives user operation input. Examples of the operation unit 211 include switches and touch panels.
[0043] The light-receiving unit 212 receives infrared light from the remote control 219. The infrared signals received by the light-receiving unit 212 include command signals (operation signals) received by the remote control 219 from the user, sound signals received from the user, etc. The light-receiving unit 212 may also have the function of transmitting display data, etc., from the receiving device 20 to the remote control 219.
[0044] The remote control 219 is a transceiver that transmits signals of commands (operations) received from the user to the receiving device 20 using infrared light. The remote control 219 includes a switch, a dial, a display device, etc., receives operations from the user and sends them to the receiving device 20, and displays display data received from the receiving device 20 on the display device. The remote control 219 includes a microphone that receives commands (operations) made by the user's voice, and can transmit them to the receiving device 20 using infrared light via the light-receiving unit 212. The remote control 219 may also be configured to include a speaker (not shown) and be able to output the response signal from the receiving device 20 to the user as sound. Alternatively, the microphone may be included in the receiving device 20 instead of in the remote control 219.
[0045] The IP communication unit 213 is a communication interface used for IP (Internet Protocol) communication via network 30. However, the receiving device 20 may also be able to connect to a network different from the Internet, such as a LAN.
[0046] The IP communication unit 213 can communicate with the language model 31, search engine 32, voice-to-text conversion service 33, and network dynamic image service 34 via the network 30. The language model 31 is an engine used to analyze the user's voice signal transmitted from the remote control 219. The language model 31 is a pre-trained AI engine that can respond to queries from the receiving device 20 by organizing Japanese text according to word classes and converting it into an interpretable text format. The search engine 32 is an internet search engine that can respond to search queries from the receiving device 20 by providing information from the internet. The voice-to-text conversion service 33 converts the content spoken by the user into a microphone, such as that of the remote control 219, into text data in a form that is easily processed by AI. The network dynamic image service 34 is a provider or group of sites that publishes moving images via the network 30. The language model 31, search engine 32, and voice-to-text conversion service 33 can also be provided within the receiving device 20.
[0047] CPU 214 is the arithmetic unit that controls the overall operation of the receiving device 20. CPU 214 implements the functional elements of the receiving device 20, which will be described later.
[0048] The memory 215 includes ROM that stores various computer programs executed by the CPU 214, and RAM that provides a working area for the CPU 214. For example, the ROM stores control programs and application programs used to implement various functions of the receiving device 20.
[0049] The storage unit 216 is a storage medium exemplified by an HDD (Hard Disk Drive) or an SSD (Solid State Drive). The storage unit 216 can, for example, record signals selected by the selector 207 as video recording data.
[0050] Figure 2 is a block diagram illustrating an example of the functional configuration of the receiving device 20 according to the embodiment. As shown in Figure 2, the receiving device 20 includes a transceiver unit 21, a voice retrieval processing unit 22, a data processing unit 23, a broadcast receiving unit 24, an operation receiving unit 25, a live broadcast processing unit 26, a video recording processing unit 27, a playback processing unit 28, and a storage unit 29.
[0051] The functional configuration of the receiving device 20 shown in Figure 2 is achieved, for example, by the CPU 214 executing a control program, or by the hardware of the receiving device 20 operating under the control of the CPU 214.
[0052] The broadcast receiving unit 24 receives broadcast signals of broadcast programs transmitted from a broadcasting station. In addition to video and audio information, the broadcast signal carries multiplexed service information (SI) that indicates the content of the broadcast programs and makes program selection easier. One example of service information is electronic program guide (EPG) information, which resembles the television listings in a newspaper.
[0053] The aforementioned video information, audio information, and their accompanying transmission control information are compressed in MPEG-2 format and constitute a multiplexed transport stream (TS).
[0054] The broadcast receiving unit 24 is capable of receiving the video information, audio information, and program arrangement information that are multiplexed in the broadcast signal.
[0055] The transceiver unit 21 receives publishing signals for published moving images, etc., from the network dynamic image service 34, etc., via the IP communication unit 213 and the network 30. The transceiver unit 21 can send a selection signal for selecting the publishing signal to be received to the network dynamic image service 34, which is the publishing source, via the IP communication unit 213 and the network 30.
[0056] The operation receiving unit 25 accepts various operations from users, such as live viewing, recording, recording reservation, and playback. The operation receiving unit 25 is also capable of accepting user voice signals transmitted from the remote control 219.
[0057] The live broadcast processing unit 26 processes live broadcasts, including broadcast programs and the display of animated images. Live broadcasting, for example, means receiving broadcast programs and animated images in real time at a given moment and displaying them on the display panel 210.
[0058] The video recording processing unit 27 processes broadcast programs and published moving images according to the recording operations and recording reservation operations from users. The video recording processing unit 27 stores the recorded programs and recorded moving images in the storage unit 29.
[0059] The playback processing unit 28 reads the recorded program and recorded motion images from the storage unit 29 and performs playback processing according to the playback operation from the user.
[0060] The storage unit 29 stores various parameters and control programs required for the operation of the receiving device 20. Additionally, the storage unit 29 can store recorded programs, recorded video images, and other content.
[0061] The data processing unit 23 is a computing unit that processes data that causes the receiving device 20 to operate as a whole.
[0062] The voice retrieval processing unit 22 is a computing unit that uses command signals and operation signals from the user's voice transmitted from the remote control 219 to retrieve and play recorded programs, recorded moving images, and other content. The voice retrieval processing unit 22 performs the following functions: retrieving and selecting content matching the user's preferences using a dialogue format with the user via voice (signals); parsing command (operation) signals from the user's voice using AI functions; and parsing voice signals and retrieving information via the network 30.
[0063] (Voice retrieval processing unit)
[0064] Next, the functional configuration of the voice retrieval processing unit 22 in the receiving device 20 of the embodiment will be described with reference to Figure 3. As shown in Figure 3, the voice retrieval processing unit 22 includes a display processing unit 41, a data control unit 42, a sound signal processing unit 43, a content processing unit 44, a language processing unit 45, a chat storage unit 46, and an exploration processing unit 47.
[0065] Display processing unit 41 is a calculation module that generates display information. This display information is used to display a menu screen on display panel 210, which represents content retrieval results based on user commands. The display information may include chat content between the user and the receiving device.
[0066] The sound signal processing unit 43 is a processing module that converts the user's voice-related sound signals sent from the remote control 219 into digital signals.
[0067] The language processing unit 45 is a processing module that parses the sound signal converted by the sound signal processing unit 43 to generate user command content (e.g., content search content). The language processing unit 45 can generate content search content using the language model 31, the search engine 32, or the voice-to-text conversion service 33. The language processing unit 45 has the following functions: parsing sound signals and interpreting the content of the user's speech; generating response content to the user based on the user's speech; continuing the dialogue until the information collected from the dialogue based on the user's speech and the response content reaches a level sufficient for content retrieval; and generating keywords for the search content.
[0068] The chat storage unit 46 is a storage medium that stores, as chat information, the user's speech and the responses exchanged with the user while the language processing unit 45 generates response content.
[0069] The exploration processing unit 47 is a computing module used to parse user commands and collect information during the process of generating search content by the language processing unit 45, and to access the language model 31, search engine 32, and voice-to-text conversion service 33 via the network. The exploration processing unit 47 accesses the language model 31, search engine 32, and voice-to-text conversion service 33 via the IP communication unit 213.
[0070] The data control unit 42 is a processing module that passes the following between functional modules: the digital signals converted by the sound signal processing unit 43, the information sent and received by the language processing unit 45 in the form of dialogue with the user, and the search content generated by the language processing unit 45.
[0071] The content processing unit 44 is a processing module that receives the search content generated by the language processing unit 45 from the data control unit 42 and retrieves the content related to the user's speech. For example, the content processing unit 44 treats content such as recorded programs and recorded moving images stored in the storage unit 216, and content currently being received or processed for live viewing by the live broadcast processing unit 26, as search objects, and generates search candidates that match the search content. Regarding the content selected by the user from the search candidates, the content processing unit 44 can issue a playback instruction to the playback processing unit 28.
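The candidate generation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `ContentItem` type, the `search_candidates` function, the "recorded"/"live" labels, and the rule that a title must contain every keyword are all hypothetical, since the application does not specify a matching rule.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ContentItem:
    title: str
    source: str  # "recorded" or "live" (hypothetical labels)


def search_candidates(keywords: Iterable[str],
                      catalog: Iterable[ContentItem]) -> List[ContentItem]:
    """Return items whose title contains every keyword (case-insensitive).

    The all-keywords substring rule is an assumption for illustration.
    """
    wanted = [k.lower() for k in keywords]
    return [item for item in catalog
            if all(k in item.title.lower() for k in wanted)]


# Recorded and live content are searched together as one catalog.
catalog = [
    ContentItem("Minamoto no Yoshitsune drama, episode 1", "recorded"),
    ContentItem("Mountain scenery special", "live"),
    ContentItem("Evening news", "live"),
]
candidates = search_candidates(["Minamoto no Yoshitsune", "drama"], catalog)
```

The user would then pick one of `candidates`, and a playback instruction would be issued for the selected item.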
[0072] (Example of the implementation method)
[0073] Next, referring to Figure 4, the operation of the receiving device according to the embodiment will be described. The operation receiving unit 25 waits for the approach of the remote control 219, i.e., the approach of the user, via the light receiving unit 212 (S100). The receiving device 20 detects the approach of the user via the remote control 219. The detection of the user's approach by the receiving device 20 is not limited to the remote control 219. The receiving device 20 may also be equipped with a detection mechanism such as a camera to detect the user's approach.
[0074] The operation receiving unit 25 detects the user's speech based on the audio signal transmitted from the remote control 219 via the light receiving unit 212 (S105). If no speech is detected (S105 no), the receiving device 20 maintains a user approach waiting state (S100).
[0075] If a voice signal is detected (S105 yes), the operation receiving unit 25 sends the received voice signal to the voice retrieval processing unit 22, and the sound signal processing unit 43 receives the user's voice signal (S110).
[0076] The voice retrieval processing unit 22 analyzes the user's voice signal to identify the user's command (operation) (S115).
[0077] The voice retrieval processing unit 22 generates a response content (response signal) corresponding to the received user's voice signal (S120). This response content may include repeating the question if the user's instruction is unclear, or adding a question if there is insufficient information for function execution (retrieval execution).
[0078] The voice retrieval processing unit 22 sends the generated response content to the operation receiving unit 25, which then displays the response content on the display panel 210 (S125). The response content can be associated with the user's speech content and displayed in a dialogue format. The voice retrieval processing unit 22 can also generate the response content as an audio signal. In this case, the operation receiving unit 25 sends the signal to the remote control 219 via the light receiving unit 212, and the remote control 219 can output the audio signal as sound.
[0079] The operation receiving unit 25 receives an audio signal from the remote control 219 transmitted via the light receiving unit 212 (S130). Here, the received audio signal is a response from the user to the response content sent by the operation receiving unit 25.
[0080] The voice retrieval processing unit 22 analyzes the user's voice signal to identify and interpret the user's instructions (operations) (S135).
[0081] The voice retrieval processing unit 22 determines whether a function corresponding to the identified user's instruction has been determined (S140). If the user's instruction can be determined through the exchange of information in the form of a dialogue between the user and the receiving device 20, for example, if the content of the dialogue is sufficient for content retrieval, the receiving device 20 can execute the function corresponding to that instruction (e.g., content retrieval). On the other hand, if the instruction cannot be determined because information is insufficient, the receiving device 20 needs to continue the dialogue to obtain additional information from the user. If the function corresponding to the instruction has not been determined (S140 no), the voice retrieval processing unit 22 stores the dialogue content as casual chat in the chat storage unit 46 (S145). The casual chat stored in the chat storage unit 46 can be used when determining the function corresponding to the user's instruction. Thereafter, the process of sending a response, receiving the user's reply, and parsing the speech content is repeated until a function corresponding to the user's instruction is determined (S120-S135).
[0082] If the function corresponding to the instruction content is determined (S140 yes), the voice retrieval processing unit 22 stores the content of the conversation and the content of the determined function as functional chat in the chat storage unit 46 (S150).
[0083] The voice retrieval processing unit 22 uses the content processing unit 44 to retrieve content corresponding to the determined instruction content of the user (S155). The display processing unit 41 displays the retrieval results on the display panel 210 (S160). Chat interaction can also be used when the user selects content from the retrieval results. The playback processing unit 28 plays the selected content (S165).
[0084] Thus, the receiving device according to the embodiment can respond to more flexible instructions because it determines the instruction content through dialogue with the user. Furthermore, the receiving device according to the embodiment stores the dialogue content with the user as chat, and this stored chat can be used when determining the instruction content. Therefore, even if the instruction from the user is not a formatted instruction, instructions that suit the user's preferences can be easily determined.
[0085] The operation shown in Figure 4 is only an example; the same effect can be achieved even if some of the steps are omitted or repeated.
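The dialogue loop of Figure 4 (S110 through S150) can be sketched in Python as follows. This is a simplified illustration, not the implementation of the embodiment: `run_dialogue`, `decide`, and `respond` are hypothetical names, the chat storage unit is reduced to a list of tuples, and the toy decision rule (both "drama" and "history" must have been heard) is an assumption chosen only to make the loop observable.

```python
from typing import Callable, List, Optional, Tuple

# (kind, user_speech, detail) tuples stand in for the chat storage unit.
Chat = Tuple[str, str, str]


def run_dialogue(utterances: List[str],
                 decide: Callable[[List[str]], Optional[str]],
                 respond: Callable[[List[str]], str]
                 ) -> Tuple[Optional[str], List[Chat]]:
    """Repeat response and reply (S120-S135) until a function is determined."""
    it = iter(utterances)
    first = next(it, None)                 # S110, S115: first user speech
    if first is None:
        return None, []
    heard: List[str] = [first]
    chat_log: List[Chat] = []
    while True:
        reply = respond(heard)             # S120, S125: generate and show response
        answer = next(it, None)            # S130: user's reply to the response
        if answer is None:                 # user stopped talking
            return None, chat_log
        heard.append(answer)               # S135: parse the reply
        function = decide(heard)           # S140: is the function determined?
        if function is not None:
            chat_log.append(("functional", answer, function))  # S150
            return function, chat_log
        chat_log.append(("casual", answer, reply))             # S145


def decide(heard: List[str]) -> Optional[str]:
    text = " ".join(heard).lower()
    return "content_retrieval" if "drama" in text and "history" in text else None


def respond(heard: List[str]) -> str:
    return "Which genre would you like?"


function, log = run_dialogue(
    ["I want to watch a drama", "Something fun", "A history one"],
    decide, respond)
```

In this run the second utterance is stored as casual chat because it does not settle the instruction, and the third is stored as functional chat together with the determined function.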
[0086] (An example of the action of parsing the speech content in the implementation method)
[0087] Next, referring to Figure 5, an example of the operation of the voice retrieval processing unit 22 will be described in detail. Figure 5 is a flowchart showing an example of the operation of the voice retrieval processing unit 22. In the following description, elements common to those shown in Figures 1 to 4 are denoted by common reference numerals, and repeated descriptions are omitted.
[0088] In the operational example shown in Figure 5, when the operation receiving unit 25 detects speech, it sends the received sound signal to the voice retrieval processing unit 22, and the sound signal processing unit 43 receives the user's sound signal (S116). In the following description, it is assumed that the sound signal processing unit 43 receives the speech "Minamoto no Yoritomo's younger brother's drama." The sound signal processing unit 43 then sends the received sound signal to the language processing unit 45.
[0089] The language processing unit 45 performs language processing on the received sound signal via the exploration processing unit 47. Here, the exploration processing unit 47 sends the received sound signal to the voice-to-text conversion service 33 and the language model 31 via the network 30 for language parsing (S117). The exploration processing unit 47 receives information from the language model 31, for example, information identifying the speech as "the drama of Minamoto no Yoritomo's younger brother".
[0090] The exploration processing unit 47 sends the phrase "the drama of Minamoto no Yoritomo's younger brother" identified by the language model 31 to the search engine 32 via the network 30 for content parsing. The exploration processing unit 47 then receives the information "the drama of Minamoto no Yoshitsune" from the search engine 32.
[0091] The language processing unit 45 analyzes the received results from the exploration processing unit 47 and interprets the voice signal from the user as "the drama of Minamoto no Yoshitsune" (S118). That is, the language processing unit 45 uses the language model 31 and the search engine 32 to interpret the instruction content of the user's voice signal.
[0092] The language processing unit 45 generates keywords based on the user's voice signal (S119). Here, for example, keywords such as "Minamoto no Yoshitsune" or "drama" are generated.
[0093] Based on the keywords generated by the language processing unit 45, the display processing unit 41 displays information on the display panel 210 for confirming the search content (S120). The display processing unit 41 generates a response such as "Is it the drama of Minamoto no Yoshitsune?" The display processing unit 41 can output an audio signal via the operation receiving unit 25 instead of displaying it on the display panel 210, or it can output both display and audio signals.
[0094] After the operation receiving unit 25 receives confirmation from the user, the content processing unit 44 retrieves content from the storage unit 29, electronic program guide, etc., based on the keywords generated by the language processing unit 45 (S155).
[0095] The display processing unit 41 displays the search results of the content processing unit 44 on the display panel 210 (S160). The playback processing unit 28 plays the content based on the user's instruction (selection) (S165).
[0096] In the example described above, the language processing unit 45 uses the language model, the retrieval engine, and the voice-to-text conversion service to parse the content of the user's instruction via the exploration processing unit 47; however, the embodiment is not limited to this. Parsing results previously obtained using the language model, the retrieval engine, and the voice-to-text conversion service may also be stored in the chat storage unit 46 as casual chat or functional chat, so that the language processing unit 45 interprets the user's instruction based on the past chat history.
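As a non-limiting illustration, reusing stored parsing results before going out to the network might look like the sketch below. The class `ChatStore`, the function `interpret`, and the whitespace- and case-insensitive matching rule are assumptions made for this example; the embodiment does not specify how a past chat is matched against a new utterance.

```python
class ChatStore:
    """Hypothetical in-memory stand-in for chat storage unit 46."""

    def __init__(self):
        self._entries = {}  # normalized utterance -> stored interpretation

    @staticmethod
    def _normalize(utterance):
        # Assumed matching rule: ignore case and extra whitespace.
        return " ".join(utterance.lower().split())

    def save(self, utterance, interpretation):
        self._entries[self._normalize(utterance)] = interpretation

    def lookup(self, utterance):
        return self._entries.get(self._normalize(utterance))


def interpret(utterance, store, parse_via_network):
    """Prefer a stored past result; otherwise parse over the network and store it."""
    cached = store.lookup(utterance)
    if cached is not None:
        return cached, "history"
    result = parse_via_network(utterance)
    store.save(utterance, result)
    return result, "network"
```

Here `parse_via_network` is a caller-supplied function representing the exploration processing unit 47; it is invoked only when the chat history has no matching entry.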
[0097] In this way, the receiving device of the embodiment uses a language model, a retrieval engine, and a voice-to-text conversion service to parse the user's instructions. Therefore, even when the instruction is given as conversational speech, the keywords for the search can be obtained accurately.
[0098] (An example of the flow of voice dialogue in the receiving device of the embodiment)
[0099] Next, referring to Figure 6, the operation by which the receiving device of the embodiment determines the user's instruction through dialogue with the user will be described. Figure 6 is a diagram showing a chat in the receiving device of the embodiment, in which the user speaks and the receiving device responds.
[0100] Suppose the user says via remote control 219, "I want to see a moving image of the mountain scenery" (S200).
[0101] The voice signal is sent to the language processing unit 45 via the voice signal processing unit 43. The language processing unit 45 performs language processing via the exploration processing unit 47, and interprets the voice signal as "I want to see a moving image of the mountain scenery" through language parsing based on the voice-to-text conversion service 33 and the language model 31 (S202).
[0102] When the language processing unit 45 determines that the instruction (function) cannot be determined from "I want to see a moving image of the mountain scenery" alone, the language processing unit 45 generates response content such as "What is it?", and the display processing unit 41 displays the response content generated by the language processing unit 45 on the display panel 210 (S204).
[0103] Suppose that, in reply to the response "What is it?", the user says via remote control 219, for example, "There are many animals" (S206).
[0104] The voice signal is sent to the language processing unit 45 via the voice signal processing unit 43. The language processing unit 45 performs language processing via the exploration processing unit 47, and interprets it as "there are many animals" through language parsing based on the voice-to-text conversion service 33 and the language model 31 (S208).
[0105] When the language processing unit 45 determines that the function still cannot be determined from "I want to see a moving image of the mountain scenery" and "There are many animals", the language processing unit 45 generates response content such as "What season is it?", and the display processing unit 41 displays the response content generated by the language processing unit 45 on the display panel 210 (S210).
[0106] Suppose that, in reply to the response "What season is it?", the user says via remote control 219, for example, "Autumn is good" (S212).
[0107] The voice signal is sent to the language processing unit 45 via the voice signal processing unit 43. The language processing unit 45 performs language processing via the exploration processing unit 47, and interprets the voice signal as "Autumn is good" through language parsing based on the voice-to-text conversion service 33 and the language model 31 (S214).
[0108] When the language processing unit 45 determines that a function can be determined from the phrases "I want to see a moving image of the mountain scenery", "There are many animals", and "Autumn is good", it summarizes the chat based on these phrases (S216). For example, the language processing unit 45 interprets the summarized chat as "I want to see a moving image of a mountain with many animals in autumn".
[0109] The language processing unit 45 generates, for example, the response content "OK", and the display processing unit 41 displays the response content generated by the language processing unit 45 on the display panel 210 (S218).
[0110] The language processing unit 45 stores the summarized chat "I want to see a moving image of a mountain with many animals in autumn" in the chat storage unit 46 (S220). Storing the summarized chat makes it easier to find a match when the user later gives a similar instruction.
[0111] The language processing unit 45 generates keywords based on the summarized chat "I want to see a moving image of a mountain with many animals in autumn". Examples include "autumn", "mountain", "animals", and "moving image". The content processing unit 44 retrieves content based on these keywords (S222).
[0112] Thus, the receiving device according to the embodiment determines the content of the user's instruction through dialogue with the user, and can therefore search for the content the user wants based on the user's free speech. Furthermore, the receiving device according to the embodiment stores the summarized content as functional chat as long as the dialogue with the user continues, which makes it easier to determine the user's instructions in the future.
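As a non-limiting illustration, the dialogue of Figure 6 (S200 to S220) might be sketched as a loop that keeps asking follow-up questions until enough information has been collected. The three-slot model, the names `REQUIRED_SLOTS`, `next_response`, and `run_dialogue`, and the summary formed by simply joining the phrases are assumptions made for this example; in the embodiment, whether a function can be determined and how the chat is summarized are decided by the language processing unit 45.

```python
# Hypothetical rule: a function can be determined once a subject, a detail,
# and a season have all been collected from the user.
REQUIRED_SLOTS = ["subject", "detail", "season"]
FOLLOW_UP = {"detail": "What is it?", "season": "What season is it?"}


def next_response(slots):
    """Return a follow-up question for the first missing slot, or "OK" if none is missing."""
    for name in REQUIRED_SLOTS:
        if name not in slots:
            return FOLLOW_UP.get(name, "What would you like to see?")
    return "OK"


def run_dialogue(utterances, chat_store):
    """Fill one slot per user utterance until the function can be determined."""
    slots, responses = {}, []
    for utterance in utterances:
        missing = [n for n in REQUIRED_SLOTS if n not in slots]
        slots[missing[0]] = utterance
        responses.append(next_response(slots))
        if responses[-1] == "OK":
            break
    determined = bool(responses) and responses[-1] == "OK"
    # Assumed summary: the collected phrases joined together.
    summary = " / ".join(slots[n] for n in REQUIRED_SLOTS if n in slots)
    if determined:
        chat_store.append(summary)  # store the summarized chat (S220)
    return responses, summary
```

With the three utterances of Figure 6 as input, this sketch produces the responses "What is it?", "What season is it?", and "OK" in that order, and stores the summary only once the function has been determined.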
[0113] Next, referring to Figure 7, another example of how the receiving device of the embodiment determines the user's instruction through dialogue with the user will be explained. Figure 7 is a diagram showing another example of a chat in the receiving device of the embodiment, in which the user initiates a conversation and the receiving device responds.
[0114] Suppose the user speaks via remote control 219, "Please suggest some recommended content" (S230).
[0115] The voice signal is sent to the language processing unit 45 via the voice signal processing unit 43. The language processing unit 45 performs language processing via the exploration processing unit 47, and interprets the voice signal as "Please suggest some recommended content" through language parsing based on the voice-to-text conversion service 33 and the language model 31 (S232).
[0116] The language processing unit 45 determines that the instruction (function) can be determined based on "Please suggest some recommended content". The language processing unit 45 then refers to past casual chats, functional chats, electronic program guides, playback history, etc., from the chat storage unit 46 to generate response content recommending content to the user. Here, it is assumed that the language processing unit 45 generates the response content: "The currently recommended content includes 'programs that are trending on SNS', 'content currently being watched', 'the latest content of frequently watched content', and 'content viewed at this time every day'. Which one would you like?" The display processing unit 41 displays the response content generated by the language processing unit 45 on the display panel 210 (S234).
[0117] Figure 8 shows an example of the display information displayed on the display panel 210 by the display processing unit 41. The example shown in Figure 8 includes thumbnails 210a, a chat 210b, search titles 210c, a program display 210d, messages 210e from the receiving device, and user messages 210f.
[0118] Suppose that, in reply to the response "The currently recommended content includes... Which one would you like?", the user says via remote control 219, for example, "I want to see the latest content" (S236).
[0119] The voice signal is sent to the language processing unit 45 via the voice signal processing unit 43. The language processing unit 45 performs language processing via the exploration processing unit 47, and interprets the voice signal as "I want to see the latest content" through language parsing based on the voice-to-text conversion service 33 and the language model 31 (S238).
[0120] When the language processing unit 45 determines that a function can be determined based on "I want to see the latest content", the language processing unit 45 summarizes the chat from "I want to see the latest content" and the associated option "the latest content of frequently watched content". For example, based on these chats, the language processing unit 45 interprets the instruction as "display the latest content of frequently watched content". The display processing unit 41 displays the chat summarized by the language processing unit 45 on the display panel 210 as the response content (S240).
[0121] The language processing unit 45 generates keywords based on the summarized chat. Examples include "frequently watched content" and "latest content". The content processing unit 44 retrieves content based on these keywords (S242).
[0122] Thus, the receiving device according to the embodiment determines the content of the user's instruction through dialogue with the user, and can therefore search for the content the user wants based on the user's free speech. Furthermore, the receiving device according to the embodiment can advance the chat based on past casual chats, functional chats, electronic program guides, playback history, etc. stored in the chat storage unit, and can therefore respond in a way that matches the user's preferences.
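As a non-limiting illustration, associating the user's short reply in S236 with one of the recommended options presented in S234 might be sketched as follows. The word-overlap rule, the `STOPWORDS` set, and the function names are assumptions made for this example; in the embodiment this association is made by the language processing unit 45 using the language model.

```python
import re

# Options as presented to the user in the response of S234.
OPTIONS = [
    "programs that are trending on SNS",
    "content currently being watched",
    "the latest content of frequently watched content",
    "content viewed at this time every day",
]
# Hypothetical filler words ignored when comparing a reply with an option.
STOPWORDS = {"i", "want", "to", "see", "the", "of", "at", "on", "that", "are", "content"}


def tokens(text):
    """Lowercase word set with common filler words removed."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def match_option(reply, options=OPTIONS):
    """Return the option sharing the most words with the reply, or None if none overlap."""
    reply_tokens = tokens(reply)
    best = max(options, key=lambda o: len(tokens(o) & reply_tokens))
    return best if tokens(best) & reply_tokens else None
```

In this sketch the reply "I want to see the latest content" shares only the word "latest" with the options, which is enough to select "the latest content of frequently watched content"; a reply with no overlapping word yields no match, in which case a device could ask a follow-up question.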
[0123] Several embodiments of the application have been described; however, these embodiments are presented as examples and are not intended to limit the scope of the application. These new embodiments can be implemented in a variety of other forms, and various omissions, substitutions, and modifications can be made without departing from the spirit of the application. These embodiments and their variations are included in the scope and spirit of the application, and are included in the technical solutions described in the claims and their equivalents.
Claims
1. A receiving device, comprising: an audio signal processing unit that acquires an audio signal including a user's speech; a language processing unit that parses the audio signal to interpret the content of the user's speech, and that collects information through a dialogue in which it generates response content for the user based on the content of the user's speech; a chat storage unit that stores the content of the user's speech and the response content for the user; and a content processing unit that retrieves content based on the content of the user's speech and the response content for the user stored in the chat storage unit.
2. The receiving device according to claim 1, wherein the chat storage unit stores the content of the user's speech and the response content for the user that are repeated multiple times.
3. The receiving device according to claim 1, wherein the language processing unit repeatedly collects information through the dialogue until the content processing unit is able to retrieve the content based on the content of the user's speech and the response content for the user.
4. The receiving device according to claim 1, further comprising an exploration processing unit that, in order for the language processing unit to interpret the content of the user's speech, obtains a parsing result via a network based on the content of the user's speech.
5. The receiving device according to any one of claims 1 to 4, wherein the content includes at least one of a recorded broadcast program and a recorded published moving image, the receiving device further comprises a content storage unit that stores the content, and the content processing unit retrieves the search object from the content storage unit based on the content of the user's speech and the response content for the user stored in the chat storage unit.