Audiovisual systems, audiovisual devices, and programs

The audiovisual system enhances realism by reflecting the sound of dialogue off the screen. This avoids the structural complexity and cost of TVs whose screens double as speakers, while maintaining image quality.

JP2026110246A · Pending · Publication Date: 2026-07-02 · D & M HOLDINGS INC

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
D & M HOLDINGS INC
Filing Date
2024-12-20
Publication Date
2026-07-02

AI Technical Summary

Technical Problem

Existing audio-visual systems that integrate a TV screen as a speaker face structural complexity and increased costs due to the need for vibration mechanisms, which also degrade image quality at high volumes.

Method used

An audiovisual system that separates the audio band signal component from the center channel and directs it towards the screen to be reflected, using directional speakers to enhance the sense of realism without screen vibration, thus reducing costs and maintaining image quality.

Benefits of technology

The system achieves a realistic sound experience as if the voice is directly coming from the screen, reducing costs and preventing image degradation by eliminating the need for screen vibration mechanisms.

✦ Generated by Eureka AI based on patent content.

Figure 2026110246000001_ABST (abstract drawing)

Abstract

[Problem] To achieve a sense of realism, as if the voice were coming directly from the person on screen, at low cost and without affecting video quality. [Solution] The audiovisual device 1 reproduces audiovisual data into a video signal and audio signals of multiple channels and outputs them. The audiovisual device 1 also separates the audio band signal component from the center channel audio signal. It then outputs the center channel audio signal from which this component has been separated, together with the separated audio band signal component. The audio speaker 4 receives the audio band signal component output from the audiovisual device 1 and emits its sound toward the screen of the monitor 2 connected to the audiovisual device 1. The sound emission direction of the audio speaker 4 is set so that the sound of the audio band signal component it emits is reflected by the screen of the monitor 2 and reaches the listening point P.

Description

Technical Field

[0001] The present invention relates to an audio-visual system.

Background Art

[0002] Conventionally, a multi-channel audio device that reproduces multi-channel audio data into audio signals of a plurality of channels is known. For example, the audio reproduction device described in Patent Document 1 reproduces 5.1-channel audio data into audio signals of a center channel, a front left channel, a front right channel, a surround left channel, a surround right channel, and a subwoofer channel, and outputs each audio signal from a corresponding output terminal.

[0003] In addition, Non-Patent Document 1 discloses a video-audio integrated TV that uses the TV screen itself as a speaker and emits sound from it. With this video-audio integrated TV, the voice of a person shown on the TV screen is emitted from the TV screen. The viewer therefore feels as if voices such as dialogue lines are coming from the image of the person on the screen, which enhances the sense of presence.

Prior Art Documents

Patent Documents

[0004]

Patent Document 1

Non-Patent Documents

[0005]

Non-Patent Document 1

[0006] In the video-audio integrated TV described in Non-Patent Document 1, a mechanism that vibrates the TV screen is needed for the screen itself to function as a speaker. This complicates the TV's structure and increases costs. Furthermore, when sound is output from the TV screen at high volume, the screen vibrates strongly. This in turn makes the displayed image vibrate and degrades image quality.

[0007] The present invention has been made in view of the above circumstances. Its purpose is to provide a technology that achieves a sense of realism, as if the voice were coming directly from the person shown on the screen, at low cost and without affecting image quality.

Means for Solving the Problem

[0008] To solve the above problems, the present invention comprises an audiovisual device and an audio speaker.

[0009] The audiovisual device reproduces audiovisual data into a video signal and a multi-channel audio signal including the center channel, and outputs the reproduced video signal and the multi-channel audio signal. Here, the audiovisual device separates the audio band signal component from the center channel audio signal and also outputs the separated audio band signal component.

[0010] The audio speaker receives the audio band signal components output from the audiovisual device and emits sound towards the screen of the monitor connected to the audiovisual device. Here, the direction of sound emission from the audio speaker is set so that the sound of the audio band signal components emitted from it is reflected off the monitor screen and reaches the listening point.

[0011] For example, the present invention is an audiovisual system comprising an audiovisual device, an audio speaker, and speakers for multiple channels including a center channel.

The audiovisual device comprises:
- a playback means for reproducing audiovisual data into a video signal and the audio signals of the multiple channels;
- a separation means for separating the audio band signal component from the center channel audio signal reproduced by the playback means;
- a video output means for outputting the video signal reproduced by the playback means;
- a channel audio output means for outputting the audio signals of the multiple channels reproduced by the playback means; and
- an audio output means for outputting, to the audio speaker, the audio band signal component that the separation means separated from the center channel audio signal.

The audio speaker emits the sound of the audio band signal component output from the audiovisual device. Its sound emission direction is set so that this sound is reflected off the screen of a monitor and reaches the listening point. The monitor displays an image according to the video signal output from the audiovisual device.

Effects of the Invention

[0012] In this invention, the sound of the audio band signal component separated from the center channel audio signal is emitted from the audio speaker. It is reflected off the screen of the monitor connected to the audiovisual device and reaches the listening point. The voice of the person displayed on the monitor screen thus arrives at the listening point by way of the screen. The user therefore feels as if the voice, such as dialogue, is coming directly from the person on the screen, which enhances the sense of realism. Furthermore, no mechanism is needed to vibrate the monitor screen to emit sound. This reduces costs and suppresses image degradation caused by screen vibration.

[0013] Therefore, according to the present invention, it is possible to realize a sense of presence as if the voice is directly emitted from the person shown on the screen at low cost without affecting the quality of the video.

Brief Description of Drawings

[0014]
[Figure 1] FIG. 1 is a schematic configuration diagram of an audio-visual system according to a first embodiment of the present invention.
[Figure 2] FIG. 2 is a diagram for explaining the sound emission direction of the audio speaker 4, and is a view of the monitor 2 and the audio speaker 4 seen from the side.
[Figure 3] FIG. 3 is a schematic functional configuration diagram of the audio-visual device 1.
[Figure 4] FIG. 4 is a schematic functional configuration diagram of the audio-visual device 1A.
[Figure 5] FIGS. 5(A) and 5(B) are a front view and a side view of the audio speaker 4A.
[Figure 6] FIG. 6 is a diagram for explaining the sound emission direction of the audio speaker 4A, and is a view of the monitor 2 and the audio speaker 4A seen from the side.
[Figure 7] FIG. 7 is a diagram for explaining the sound emission direction of the audio speaker 4A, and is a view of the monitor 2 and the audio speaker 4A seen from above.

Embodiments for Carrying Out the Invention

[0015] Hereinafter, embodiments of the present invention will be described with reference to the drawings.

[0016] [First Embodiment] First, a first embodiment of the present invention will be described.

[0017] FIG. 1 is a schematic configuration diagram of an audio-visual system according to the present embodiment.

[0018] As shown in the figure, the audiovisual system according to this embodiment comprises an audiovisual device 1, a monitor 2, a plurality of channel speakers 3-1 to 3-5 (hereinafter simply referred to as channel speaker 3), an audio speaker 4, and a wireless terminal 5.

[0019] Audiovisual device 1 downloads audiovisual data from a media server (not shown) or the like via a network. It plays this audiovisual data back as a video signal and as audio signals for five channels:
- center channel (hereinafter referred to as C channel);
- front left channel (hereinafter referred to as FL channel);
- front right channel (hereinafter referred to as FR channel);
- surround left channel (hereinafter referred to as SL channel); and
- surround right channel (hereinafter referred to as SR channel).

It then outputs the played-back video signal and the audio signals for the C, FL, FR, SL, and SR channels.

[0020] Furthermore, the audiovisual device 1 monitors whether the video signal it plays back contains a scene of a person speaking. When it detects such a scene, it separates the audio band signal component from the C channel audio signal. That C channel audio signal is the one played back in synchronization with the video signal of the video containing the speaking scene. The device then outputs both the C channel audio signal from which that component has been separated and the audio band signal component itself.

[0021] Monitor 2 is positioned directly in front of the user at listening point P, with its screen facing listening point P, and displays images on the screen according to the video signal output from audiovisual device 1.

[0022] The channel speakers 3 for the C, FL, FR, SL, and SR channels are positioned relative to the user at listening point P as follows: in front, to the front left, to the front right, to the rear left, and to the rear right, respectively. Each emits the sound of its channel's audio signal, output from the audiovisual device 1, toward listening point P.

[0023] The audio speaker 4 is a wireless speaker. For example, as shown in Figure 2, it is placed on a table 6 between the monitor 2 and the listening point P. It emits the sound (human voice) of the audio band signal component output from the audiovisual device 1 toward the screen of the monitor 2. The sound emission direction of the audio speaker 4 is set so that its sound beam 400 is reflected off the screen of the monitor 2 and reaches the listening point P. The audio speaker 4 is preferably a directional speaker, such as a parametric speaker, whose sound beam 400 has a narrow emission range.
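The aiming condition in [0023] can be made concrete with the mirror-image method from geometric acoustics. A beam reflected specularly by a flat screen reaches the listening point P exactly when it is aimed at P's mirror image behind the screen.

The Python sketch below rests on several assumptions:
- the screen is a flat, acoustically reflective plane at x = 0;
- positions are in metres; and
- reflection is purely specular.

The function name `aim_at_screen` and the example coordinates are hypothetical, not taken from the publication.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def aim_at_screen(speaker: Vec3, listener: Vec3) -> Tuple[Vec3, Vec3]:
    """Return (unit aim direction, reflection point on the screen).

    Hypothetical sketch: assumes the screen is the plane x = 0 and that the
    speaker and the listening point are both in front of it (x > 0).
    """
    sx, sy, sz = speaker
    px, py, pz = listener
    if sx <= 0 or px <= 0:
        raise ValueError("speaker and listener must be in front of the screen (x > 0)")
    # Mirror image of the listening point behind the screen plane.
    ix, iy, iz = -px, py, pz
    dx, dy, dz = ix - sx, iy - sy, iz - sz
    norm = math.sqrt(dx * dx + dy * dy + dz * dz)
    direction = (dx / norm, dy / norm, dz / norm)
    # Fraction of the speaker -> image segment at which it crosses x = 0.
    t = sx / (sx + px)
    reflection = (sx + t * dx, sy + t * dy, sz + t * dz)
    return direction, reflection


# Example: speaker on a table 1 m from the screen at 0.5 m height,
# listener's ears 3 m from the screen at 1.0 m height.
direction, point = aim_at_screen((1.0, 0.0, 0.5), (3.0, 0.0, 1.0))
print(point)  # (0.0, 0.0, 0.625)
```

In a real room, several factors would affect how closely the reflection follows this idealized ray: the table height, any screen tilt, and the finite beam width of a parametric speaker.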

[0024] Wireless terminal 5 is, for example, a mobile device such as a smartphone or a tablet PC (Personal Computer), and functions as a remote controller for audiovisual device 1.

[0025] Next, the details of the audiovisual device 1 that constitutes the audiovisual system according to this embodiment will be described.

[0026] Note that existing speakers can be used for channel speaker 3 and audio speaker 4, and existing mobile devices such as smartphones and tablet PCs can be used for wireless terminal 5. Therefore, a detailed explanation of channel speaker 3, audio speaker 4, and wireless terminal 5 will be omitted.

[0027] Figure 3 is a schematic diagram of the functional configuration of the audiovisual device 1.

[0028] As shown in the figure, the audiovisual device 1 includes a monitor interface unit 100, a channel speaker interface unit 101, a wireless interface unit 102, a content acquisition unit 103, a content storage unit 104, an audiovisual playback unit 105, a voice scene detection unit 106, a voice separation unit 107, and a main control unit 108.

[0029] The monitor interface unit 100 is an interface for connecting to monitor 2, and the channel speaker interface unit 101 is an interface for connecting to channel speakers 3-1 to 3-5, respectively.

[0030] The wireless interface unit 102 is an interface for wireless communication with the audio speaker 4, the wireless terminal 5, and a media server (not shown).

[0031] The content acquisition unit 103 accesses a media server (not shown) via the wireless interface unit 102 and downloads audiovisual data from this media server.

[0032] The content storage unit 104 stores audiovisual data downloaded from a media server (not shown).

[0033] The audiovisual playback unit 105 reproduces the audiovisual data stored in the content storage unit 104 as a video signal and audio signals for multiple channels (C channel, FL channel, FR channel, SL channel, and SR channel).

[0034] The voice scene detection unit 106 performs image recognition processing on the video signal played back by the audiovisual playback unit 105 and detects voice scenes, such as people talking, from the image shown by the video signal. The voice scene detection unit 106 also outputs the video signal played back by the audiovisual playback unit 105 to the monitor 2 via the monitor interface unit 100.

[0035] The voice separation unit 107 outputs the audio signals of multiple channels (C channel, FL channel, FR channel, SL channel, and SR channel) reproduced by the audiovisual playback unit 105 to the corresponding channel speaker 3 via the channel speaker interface unit 101.

[0036] Furthermore, the voice scene detection unit 106 may detect a scene of a person speaking in the video signal played back by the audiovisual playback unit 105. In that case, the voice separation unit 107 separates the audio band signal component from the C channel audio signal played back by the audiovisual playback unit 105, in synchronization with the video signal of the video containing the speaking scene. It then sends two outputs:
- the C channel audio signal, from which this component has been separated, goes to the C channel speaker 3-1 via the channel speaker interface unit 101; and
- the audio band signal component goes to the audio speaker 4 via the wireless interface unit 102.
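The separation step in [0036] can be illustrated with a complementary band split. This Python/NumPy sketch makes two assumptions. First, "audio band" is taken to mean the speech band, with 300 to 3400 Hz used as an illustrative range; the publication gives no cut-off frequencies. Second, an ideal FFT-domain mask stands in for whatever filter the device actually uses. The residual is computed as input minus voice band, so the two outputs always sum back to the original C channel. `split_voice_band` is a hypothetical name.

```python
import numpy as np


def split_voice_band(c_channel, fs, low_hz=300.0, high_hz=3400.0):
    """Split one C channel block into (voice_band, residual).

    Hypothetical sketch: an ideal band-pass mask in the FFT domain keeps the
    assumed speech band; everything else stays in the residual.
    """
    x = np.asarray(c_channel, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    mask = (freqs >= low_hz) & (freqs <= high_hz)
    voice_band = np.fft.irfft(spectrum * mask, n=x.size)
    # Complementary by construction: voice_band + residual == x.
    residual = x - voice_band
    return voice_band, residual


# Example: a 1 kHz "voice" tone mixed with a 50 Hz rumble, 1 s at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
c = np.sin(2 * np.pi * 1000 * t) + 0.5 * np.sin(2 * np.pi * 50 * t)
to_audio_speaker, to_c_speaker = split_voice_band(c, fs)
```

In this picture, `to_audio_speaker` corresponds to the audio band signal component sent to the audio speaker 4, and `to_c_speaker` to the signal sent to the C channel speaker 3-1. A streaming implementation would more likely use block-wise crossover filters than a whole-signal FFT.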

[0037] The main control unit 108 comprehensively controls units 100 to 107 of the audiovisual device 1 in accordance with user instructions received from the wireless terminal 5 via the wireless interface unit 102.

[0038] The functional configuration of the audiovisual device 1 shown in Figure 3 can be implemented in hardware using integrated logic ICs such as ASICs (Application Specific Integrated Circuits) and FPGAs (Field Programmable Gate Arrays), or in software using a computer such as a DSP (Digital Signal Processor). Alternatively, it can be implemented as a process in a general-purpose computer such as a PC, which is equipped with a CPU, memory, auxiliary storage devices such as flash memory and hard disk drives, and wireless communication devices such as wireless LAN adapters, by having the CPU load a predetermined program from the auxiliary storage device into memory and execute it.

[0039] In the audiovisual system configured as described above, the audiovisual device 1, in accordance with user instructions received from the wireless terminal 5, accesses a media server (not shown) and downloads audiovisual data from it. The audiovisual data is then reproduced as a video signal and multiple channels of audio signals, and the video signal is output to the monitor 2, while the multiple channels of audio signals are output to the corresponding channel speakers 3. In response, the monitor 2 displays the image according to the video signal output from the audiovisual device 1. In addition, channel speakers 3-1 to 3-5 each emit sound from the audio signals of their corresponding channels output from the audiovisual device 1 towards the listening point P.

[0040] Here, the audiovisual device 1 may detect a scene of a person speaking in the image shown by the played-back video signal. In that case, it separates the audio band signal component from the C channel audio signal played back in synchronization with the video signal of the image containing this speaking scene. It then outputs the C channel audio signal, from which this component has been separated, to the C channel speaker 3-1. It also outputs the audio band signal component to the audio speaker 4. In response, the audio speaker 4 emits the sound of the audio band signal component toward the screen of the monitor 2. The emitted sound beam 400 is reflected by the screen of the monitor 2 and reaches the listening point P.

[0041] The first embodiment of the present invention has been described above.

[0042] In this embodiment, the video signal played back from the audiovisual data may include a scene of a person speaking. The audiovisual device 1 then separates the audio band signal component from the C channel audio signal played back in synchronization with the video signal of the image containing the speaking scene. It outputs this audio band signal component to the audio speaker 4. The audio speaker 4 emits the sound of this component toward the screen of the monitor 2. The emitted sound beam 400 is reflected off the screen and reaches the listening point P.

As a result, the voice of the person displayed on the screen of the monitor 2 (dialogue, conversation, song, etc.) reaches the listening point P by way of the screen. The user therefore perceives the screen showing the speaking person as the source of the sound, and feels as if the voice, such as dialogue, is coming directly from that person. This enhances the sense of realism. Furthermore, no mechanism is needed to vibrate the screen of the monitor 2 to emit sound. This reduces costs and suppresses image degradation due to screen vibration.

[0043] Therefore, according to this embodiment, it is possible to achieve a sense of realism as if the voice is directly coming from the person shown on the screen, at low cost and without affecting the quality of the image.

[0044] In this embodiment, in the audiovisual device 1, the voice scene detection unit 106 performs image recognition processing on the video signal played back by the audiovisual playback unit 105. It detects scenes of a person speaking in the image shown by this video signal. When the voice scene detection unit 106 detects such a scene, the voice separation unit 107 separates the audio band signal component from the C channel audio signal. That C channel audio signal is the one played back by the audiovisual playback unit 105 in synchronization with the video signal of the image containing this scene.

[0045] However, the present invention is not limited thereto. The voice scene detection unit 106 may perform image recognition processing on the video signal played back by the audiovisual playback unit 105 to detect a person from the image shown by the video signal, and the voice separation unit 107 may, when a person is detected by the voice scene detection unit 106, separate the audio band signal component from the C channel audio signal played back by the audiovisual playback unit 105 in synchronization with the video signal representing the image including this person. Alternatively, the voice scene detection unit 106 may be omitted from the audiovisual device 1, and the voice separation unit 107 may always separate the audio band signal component from the C channel audio signal played back by the audiovisual playback unit 105. In these cases as well, the sound including the voice of the person displayed on the screen of the monitor 2 is reflected off the screen of the monitor 2 and reaches the listening point P, so the same effect as in this embodiment can be obtained.
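[0045] describes three possible triggers for separation:
- a detected speaking scene (the first embodiment);
- any detected person; or
- no detection at all, so separation always occurs.

A minimal Python sketch of that choice and of the resulting routing follows. `SeparationPolicy`, `should_separate`, `route_c_channel`, and the `split` callable are hypothetical names. The detector results are assumed to arrive as per-frame booleans.

```python
from enum import Enum
from typing import Callable, List, Optional, Sequence, Tuple


class SeparationPolicy(Enum):
    SPEAKING_SCENE = "speaking_scene"  # first embodiment: a speaking person is detected
    PERSON_PRESENT = "person_present"  # variant: any person is detected
    ALWAYS = "always"                  # variant: detection unit omitted


def should_separate(policy: SeparationPolicy, person_detected: bool,
                    speaking_detected: bool) -> bool:
    """Decide, for one frame, whether the voice band is split off."""
    if policy is SeparationPolicy.ALWAYS:
        return True
    if policy is SeparationPolicy.PERSON_PRESENT:
        return person_detected
    return speaking_detected


# split(block) -> (voice_band, residual); any band splitter can be plugged in.
Splitter = Callable[[Sequence[float]], Tuple[List[float], List[float]]]


def route_c_channel(c_block: Sequence[float], separate: bool,
                    split: Splitter) -> Tuple[List[float], Optional[List[float]]]:
    """Return (block for C channel speaker 3-1, block for audio speaker 4 or None)."""
    if not separate:
        # Whole C channel stays on speaker 3-1; the audio speaker stays silent.
        return list(c_block), None
    voice_band, residual = split(c_block)
    return residual, voice_band
```

Under `SPEAKING_SCENE`, a frame with a visible but silent person is not separated. Under `PERSON_PRESENT`, the same frame would be.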

[0046] Furthermore, in this embodiment, the voice scene detection unit 106 and the voice separation unit 107 may be omitted from the audiovisual device 1, and the audio speaker 4 may also be omitted. Instead, the C channel speaker 3-1 may be positioned so that it emits the sound beam of the C channel audio signal toward the screen of the monitor 2. The sound beam emitted from the C channel speaker 3-1 is then reflected by the screen of the monitor 2 and reaches the listening point P.

[0047] [Second Embodiment] Next, a second embodiment of the present invention will be described.

[0048] The difference between the audiovisual system according to this embodiment and the audiovisual system according to the first embodiment shown in Figure 1 is that the audiovisual device 1A and the audio speaker 4A are provided instead of the audiovisual device 1 and the audio speaker 4. The other configurations are the same as those of the audiovisual system according to the first embodiment shown in Figure 1.

[0049] Figure 4 is a schematic diagram of the functional configuration of audiovisual device 1A.

[0050] As shown in the figure, the difference between the audiovisual device 1A and the audiovisual device 1 according to the first embodiment shown in Figure 3 is that the voice scene detection unit 106a and the voice separation unit 107a are provided instead of the voice scene detection unit 106 and the voice separation unit 107. The other configurations are the same as those of the audiovisual device 1 according to the first embodiment shown in Figure 3.

[0051] The voice scene detection unit 106a performs image recognition processing on the video signal played back by the audiovisual playback unit 105. It divides the image shown by the video signal into multiple areas (for example, four areas; hereinafter referred to as divided video areas) and detects a person speaking in each divided video area. When a person speaking is detected in a divided video area, it passes the identification information of that area to the voice separation unit 107a. This identification information represents the position of the speaking person in the video. The voice scene detection unit 106a also outputs the video signal played back by the audiovisual playback unit 105 to the monitor 2 via the monitor interface unit 100.
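One way to produce the identification information of a divided video area, as in [0051], is to map the centre of the detected speaker's face to a cell of a grid. This sketch assumes:
- a 2 × 2 grid;
- image coordinates with the origin at the top left; and
- column-major numbering (0 = upper left, 1 = lower left, 2 = upper right, 3 = lower right), chosen to line up with speaker units 45-1 to 45-4 as described in [0060].

The face detector itself and the function name are hypothetical.

```python
def divided_area_id(cx: float, cy: float, frame_w: int, frame_h: int,
                    cols: int = 2, rows: int = 2) -> int:
    """Map a point (e.g. the centre of a speaking face) to a divided-area ID.

    Hypothetical sketch. Image coordinates: origin at top left, y grows
    downward. IDs are column-major, so with a 2 x 2 grid:
    0 = upper left, 1 = lower left, 2 = upper right, 3 = lower right.
    """
    if not (0 <= cx < frame_w and 0 <= cy < frame_h):
        raise ValueError("point lies outside the frame")
    col = int(cx * cols // frame_w)
    row = int(cy * rows // frame_h)
    return col * rows + row


# A face centred near the bottom-left corner of a 1920 x 1080 frame.
area = divided_area_id(100, 900, 1920, 1080)
```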

[0052] The voice separation unit 107a outputs the audio signals of multiple channels (C channel, FL channel, FR channel, SL channel, and SR channel) reproduced by the audiovisual playback unit 105 to the corresponding channel speaker 3 via the channel speaker interface unit 101.

[0053] Furthermore, the voice scene detection unit 106a may detect a scene of a person speaking in any of the divided video areas of the video signal played back by the audiovisual playback unit 105. In that case, the voice separation unit 107a separates the audio band signal component from the C channel audio signal played back by the audiovisual playback unit 105, in synchronization with the video signal of the video containing the speaking scene. It then sends two outputs:
- the C channel audio signal, from which this component has been separated, goes to the C channel speaker 3-1 via the channel speaker interface unit 101; and
- the audio band signal component, together with the identification information of the divided video area in which the speaking scene was detected, goes to the audio speaker 4A via the wireless interface unit 102.
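[0053] only states that the audio band signal component is sent to the audio speaker 4A together with the area's identification information; no wire format is given. The following Python sketch shows one hypothetical framing using the standard `struct` module:
- a 1-byte area ID;
- a 4-byte sample count; and
- little-endian 16-bit PCM samples.

The layout and the function names are illustrative assumptions only.

```python
import struct
from typing import List, Sequence, Tuple

# Hypothetical header: area ID (uint8) + sample count (uint32), little-endian.
HEADER = struct.Struct("<BI")


def pack_voice_packet(area_id: int, samples: Sequence[int]) -> bytes:
    """Frame one block of 16-bit voice-band samples with its divided-area ID."""
    body = struct.pack(f"<{len(samples)}h", *samples)
    return HEADER.pack(area_id, len(samples)) + body


def unpack_voice_packet(packet: bytes) -> Tuple[int, List[int]]:
    """Inverse of pack_voice_packet: return (area_id, samples)."""
    area_id, count = HEADER.unpack_from(packet, 0)
    samples = list(struct.unpack_from(f"<{count}h", packet, HEADER.size))
    return area_id, samples


packet = pack_voice_packet(2, [0, 1000, -1000, 32767])
area_id, samples = unpack_voice_packet(packet)
```

Floating-point output from a band splitter would have to be scaled and clipped to the int16 range before packing.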

[0054] The functional configuration of the audiovisual device 1A shown in Figure 4 can be implemented in the same ways as that of the audiovisual device 1 shown in Figure 3:
- in hardware, using integrated logic ICs such as ASICs and FPGAs;
- in software, using a computer such as a DSP; or
- as a process on a general-purpose computer such as a PC.

In the last case, the computer is equipped with a CPU, memory, auxiliary storage devices such as flash memory and hard disk drives, and wireless communication devices such as wireless LAN adapters. The CPU loads a predetermined program from the auxiliary storage device into memory and executes it.

[0055] Figures 5(A) and 5(B) are a front view and a side view of the audio speaker 4A.

[0056] As shown in the figure, the audio speaker 4A includes a speaker array 40, a control unit 41, and a cabinet 42.

[0057] The front surface 43 of the cabinet 42 is inclined, and a speaker surface 44 is provided on this front surface 43.

[0058] The speaker array 40 is composed of multiple speaker units 45-1 to 45-4 (hereinafter also simply referred to as speaker units 45) arranged on the speaker surface 44 of the cabinet 42. Figure 5(A) shows a speaker array 40 composed of four speaker units 45, but the number of speaker units 45 is not limited to four. It should equal the number of areas into which the voice scene detection unit 106a of the audiovisual device 1A divides the image.

[0059] The speaker units 45 are associated one-to-one with multiple display areas on the screen of the monitor 2 (the areas in which the corresponding divided video areas are displayed). The sound emission direction of each speaker unit 45 is set so that its emitted sound beam is reflected by the corresponding display area on the screen of the monitor 2 and reaches the listening point P.

[0060] For example, as shown in Figures 6 and 7, the audio speaker 4A is placed on a table 6 located between the monitor 2 and the listening point P. The speaker units 45 emit the sound output from the audiovisual device 1A toward the screen of the monitor 2. Here, the screen of the monitor 2 is divided into four sections that define four display areas: upper left area 20a, lower left area 20b, upper right area 20c, and lower right area 20d. The sound emission direction of each speaker unit is set so that the sound beam of the audio band signal component it emits is reflected off one display area of the screen of the monitor 2 and then reaches the listening point P:
- speaker unit 45-1: sound beam 401-1 is reflected off the upper left area 20a;
- speaker unit 45-2: sound beam 401-2 is reflected off the lower left area 20b;
- speaker unit 45-3: sound beam 401-3 is reflected off the upper right area 20c; and
- speaker unit 45-4: sound beam 401-4 is reflected off the lower right area 20d.

[0061] Furthermore, each speaker unit 45 is preferably a directional speaker with a narrow sound emission range, such as a parametric speaker, so that its sound beam is concentrated on the corresponding display area.

[0062] The control unit 41 includes a wireless interface unit 46, a speaker drive unit 47, and a speaker selection unit 48.

[0063] The wireless interface unit 46 is an interface for wireless communication with the audiovisual device 1A. It receives the audio band signal component transmitted wirelessly from the audiovisual device 1A, with the identification information of a divided video area attached.

[0064] The speaker drive unit 47 drives the speaker array 40 according to the audio band signal components received by the wireless interface unit 46 and emits sound from the speaker array 40.

[0065] The wireless interface unit 46 receives the audio band signal component with the identification information of a divided video area attached. That divided video area is displayed in one display area on the screen of the monitor 2, namely the display area containing the display position of the person speaking. From the speaker units 45 constituting the speaker array 40, the speaker selection unit 48 selects the one associated with that display area. It turns this speaker unit 45 on and the other speaker units 45 off. As a result, the image in some display area on the screen of the monitor 2 may include a person speaking. In that case, the sound of the audio band signal component emitted from the speaker array 40 (the voice of the person speaking) is reflected by that display area and delivered to the listening point P.
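The on/off behaviour of the speaker selection unit 48 in [0065] amounts to a one-hot gain vector over the speaker units 45. This sketch assumes that area IDs are 0-based, that ID i corresponds to speaker unit 45-(i+1), and that samples are plain floats. `select_speaker_gains` and `drive_array` are hypothetical names.

```python
from typing import List, Sequence


def select_speaker_gains(area_id: int, num_units: int = 4) -> List[float]:
    """One-hot gains: the unit tied to area_id is on (1.0), all others off (0.0)."""
    if not 0 <= area_id < num_units:
        raise ValueError(f"area_id must be in [0, {num_units})")
    return [1.0 if unit == area_id else 0.0 for unit in range(num_units)]


def drive_array(samples: Sequence[float], area_id: int,
                num_units: int = 4) -> List[List[float]]:
    """Return one output block per speaker unit; only the selected unit gets signal."""
    gains = select_speaker_gains(area_id, num_units)
    return [[gain * s for s in samples] for gain in gains]


# Voice detected in area 1 (lower left): only speaker unit 45-2 plays.
outputs = drive_array([0.5, -0.5], area_id=1)
```

A hard switch like this could click when the speaking person moves between areas. A short crossfade between gain vectors would be one possible refinement, though the publication does not discuss it.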

[0066] The control unit 41 of the audio speaker 4A shown in Figure 5(B) is implemented either in hardware using integrated logic ICs such as ASICs and FPGAs, or in software using a computer such as a DSP.

[0067] In the audiovisual system configured as described above, the audiovisual device 1A, in accordance with user instructions received from the wireless terminal 5, accesses a media server (not shown) and downloads audiovisual data from it. The audiovisual data is then reproduced as a video signal and multiple channels of audio signals, and the video signal is output to the monitor 2, while the multiple channels of audio signals are output to the corresponding channel speakers 3. In response, the monitor 2 displays the image according to the video signal output from the audiovisual device 1A. In addition, channel speakers 3-1 to 3-5 each emit sound from the audio signals of the corresponding channels output from the audiovisual device 1A toward the listening point P.

[0068] Here, the played-back video signal shows an image that is divided into multiple divided video areas. The audiovisual device 1A may detect a scene of a person speaking in any of these areas. In that case, it separates the audio band signal component from the C channel audio signal played back in synchronization with the video signal of the image containing this speaking scene. It then outputs the C channel audio signal, from which this component has been separated, to the C channel speaker 3-1. It also outputs the audio band signal component to the audio speaker 4A, with the identification information of the divided video area in which the speaking scene was detected attached.

[0069] In response, the audio speaker 4A selects one speaker unit 45 from the speaker array 40. The selected unit corresponds to the display area on the screen of the monitor 2 that displays the divided video area identified by the attached identification information, i.e., the display area containing the position of the person speaking. The audio speaker 4A then emits the sound of this audio band signal component (the voice of the speaking person) from the selected speaker unit 45. The emitted sound beam (one of the sound beams 401-1 to 401-4) is reflected by the display area corresponding to the selected speaker unit 45 and reaches the listening point P.

[0070] The second embodiment of the present invention has been described above.

[0071] In this embodiment, the video signal played back from the audiovisual data shows an image that is divided into multiple divided video areas. One of these areas may contain a scene of a person speaking. The audiovisual device 1A then separates the audio band signal component from the C channel audio signal played back in synchronization with the video signal of this image. It outputs this component, together with the identification information of the divided video area containing the speaking scene, to the audio speaker 4A. The audio speaker 4A selects, from the speaker array 40, the speaker unit 45 corresponding to the display area of the identified divided video area. It emits the sound of the audio band signal component (the voice of the person speaking) from this speaker unit 45. The emitted sound beam (one of the sound beams 401-1 to 401-4) is reflected by the corresponding display area on the screen of the monitor 2 (the display area containing the display position of the person speaking) and reaches the listening point P.

The voice (dialogue, conversation, song, etc.) of the person displayed on the screen of the monitor 2 therefore reaches the listening point P by way of the display area containing that person. The user perceives the display area showing the speaking person as the source of the sound, and feels as if the voice, such as dialogue, is coming directly from the person on the screen. This enhances the sense of realism. Furthermore, no mechanism is needed to vibrate the screen of the monitor 2 to emit sound. This reduces costs and suppresses image degradation due to screen vibration.

[0072] Therefore, according to this embodiment, it is possible to achieve a sense of realism as if the voice of a person appearing on the screen were being output from the display area of the screen in which that person is shown, at low cost and without affecting the quality of the image.

[0073] In this embodiment, in the audiovisual device 1A, the voice scene detection unit 106a performs image recognition processing on the video signal played back by the audiovisual playback unit 105, divides the image shown by the video signal into multiple divided video areas, and detects a scene of a person speaking (a voice scene) in each divided video area. When the voice scene detection unit 106a detects a voice scene in any of the divided video areas, the audio separation unit 107a separates the audio band signal component from the C channel audio signal played back by the audiovisual playback unit 105 in synchronization with the video signal representing the image containing this voice scene.
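
The Python sketch below gives a rough illustration of how the voice scene detection unit 106a could turn a detection into identification information for a divided video area. It divides the normalised image into a grid and returns the row-major index of the cell containing a speaking person. The grid layout (2 × 2 by default), the `Detection` record, and the function names are hypothetical. The person detector and the speaking/not-speaking decision are assumed to come from an image-recognition step that the sketch does not implement.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Detection:
    """Hypothetical image-recognition result for one person in a frame."""
    x: float        # horizontal centre of the person, 0.0 (left) .. 1.0 (right)
    y: float        # vertical centre of the person, 0.0 (top) .. 1.0 (bottom)
    speaking: bool  # whether the recogniser judged this person to be speaking


def area_id(x: float, y: float, rows: int, cols: int) -> int:
    """Row-major index of the divided video area containing the point (x, y)."""
    col = min(max(int(x * cols), 0), cols - 1)  # clamp so x == 1.0 stays in range
    row = min(max(int(y * rows), 0), rows - 1)
    return row * cols + col


def speaking_area(detections: List[Detection], rows: int = 2, cols: int = 2) -> Optional[int]:
    """Identification information for the first area with a speaking person, or None."""
    for d in detections:
        if d.speaking:
            return area_id(d.x, d.y, rows, cols)
    return None  # no voice scene: the C channel is played back without separation
```

Returning `None` corresponds to the case in which no voice scene is found and, under [0073], no audio band signal component is separated. The text does not say how several simultaneous speakers are handled, so taking the first one is only a placeholder choice.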

[0074] However, the present invention is not limited thereto. For example, the voice scene detection unit 106a may perform image recognition processing on the video signal played back by the audiovisual playback unit 105 to divide the image shown by the video signal into a plurality of divided video areas and detect a person in each divided video area. When the voice scene detection unit 106a detects a person in any of the divided video areas, the audio separation unit 107a may separate the audio band signal component from the C channel audio signal played back by the audiovisual playback unit 105 in synchronization with the video signal representing the image including this person. In this case as well, the same effect as in this embodiment is obtained, because the sound including the voice of the person shown on the screen of the monitor 2 is reflected from this screen and reaches the listening point P. If persons are detected in a plurality of divided video areas, the audio band signal component may, for example, be output together with identification information of the divided video area in which the person most likely to be speaking (for example, the person with the most pronounced lip movements) is detected. Furthermore, if none of the detected persons shows lip movement, the audio band signal component may be left unseparated from the C channel audio signal. This prevents the user from feeling that the direction of a voice belonging to someone other than the displayed persons (narration, etc.) is inconsistent with the image.
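
The selection rule of [0074] can be sketched as follows: prefer the area whose person shows the most pronounced lip movement, and skip separation when nobody moves their lips. The lip-movement score used here (the mean absolute frame-to-frame change of a normalised mouth-opening measurement) and the threshold of 0.1 are assumptions chosen for illustration. The patent does not define how lip movement is quantified, and the function names are hypothetical.

```python
from typing import Dict, List, Optional


def lip_movement_score(mouth_openings: List[float]) -> float:
    """Mean absolute frame-to-frame change of a normalised mouth-opening measure."""
    if len(mouth_openings) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(mouth_openings, mouth_openings[1:])]
    return sum(diffs) / len(diffs)


def select_area(mouth_tracks: Dict[int, List[float]], threshold: float = 0.1) -> Optional[int]:
    """Area ID of the person most likely to be speaking, or None to skip separation.

    mouth_tracks maps a divided-area ID to the mouth-opening values measured for
    the person detected in that area over recent frames.
    """
    scores = {area: lip_movement_score(track) for area, track in mouth_tracks.items()}
    if not scores:
        return None  # nobody detected
    best = max(scores, key=lambda area: scores[area])
    # Below the threshold nobody is judged to be moving their lips (e.g. narration).
    return best if scores[best] >= threshold else None
```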

[0075] It should be noted that the present invention is not limited to the embodiments described above, and numerous modifications are possible within the scope of its essence.

[0076] For example, each of the embodiments described above was explained using the case in which the audiovisual devices 1 and 1A play back audiovisual data downloaded from a media server or the like (not shown) as a video signal and audio signals for the C, FL, FR, SL, and SR channels. However, the present invention is not limited to this. The audiovisual devices 1 and 1A need only play back audiovisual data as a video signal and audio signals for multiple channels including the C channel, and emit the played-back audio signals of the multiple channels from the corresponding channel speakers 3.
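
Paragraph [0076] requires only that the reproduced channels include the C channel. Below is a hypothetical Python sketch of the resulting routing, following the channel audio output means and the audio output means of claim 1. Every reproduced channel goes to its own channel speaker. Only when a split has been performed is the C channel speaker fed the residual, with the separated band going to the audio speaker. The channel names other than "C" and the function `route` are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

Samples = List[float]


def route(channels: Dict[str, Samples],
          split: Optional[Tuple[Samples, Samples]] = None
          ) -> Tuple[Dict[str, Samples], Optional[Samples]]:
    """Return (per-channel speaker feeds, audio speaker feed or None).

    channels: reproduced channel name -> samples; must contain "C".
    split: (band, residual) of the C channel if separation was performed.
    """
    if "C" not in channels:
        raise ValueError("the reproduced channels must include the centre channel 'C'")
    feeds = {name: list(samples) for name, samples in channels.items()}  # copies
    if split is None:
        return feeds, None           # nothing separated: the audio speaker stays silent
    band, residual = split
    n = len(channels["C"])
    if len(band) != n or len(residual) != n:
        raise ValueError("band and residual must match the C channel length")
    feeds["C"] = list(residual)      # the C channel speaker gets what is left
    return feeds, list(band)         # the audio speaker gets the separated band
```

Because the channel set is passed in as a dictionary, the same routing covers the 5-channel layout of the embodiments and any other layout that includes the C channel.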

[0077] Furthermore, in each of the above embodiments, a wireless speaker may be used as the channel speaker 3 and connected wirelessly to the audiovisual devices 1 and 1A. Similarly, wired speakers may be used as the audio speakers 4 and 4A and connected to the audiovisual devices 1 and 1A by wire.

[Explanation of symbols]

[0078]
1, 1A: Audiovisual device
2: Monitor
3-1 to 3-5: Channel speakers
4, 4A: Audio speakers
5: Wireless terminal
6: Table
100: Monitor interface unit
101: Channel speaker interface unit
102: Wireless interface unit
103: Content acquisition unit
104: Content storage unit
105: Audiovisual playback unit
106, 106a: Voice scene detection unit
107, 107a: Audio separation unit
108: Main control unit

Claims

1. An audiovisual system comprising an audiovisual device, an audio speaker, and speakers for multiple channels including a center channel, wherein the audiovisual device comprises: a playback means for reproducing audiovisual data into a video signal and the audio signals of the multiple channels; a separation means for separating an audio band signal component from the center channel audio signal reproduced by the playback means; a video output means for outputting the video signal reproduced by the playback means; a channel audio output means for outputting the audio signals of the multiple channels reproduced by the playback means to the speakers of the corresponding channels, the channel audio output means outputting to the center channel speaker, when the separation means separates the audio band signal component from the center channel audio signal, the center channel audio signal from which the audio band signal component has been separated; and an audio output means for outputting the audio band signal component separated from the center channel audio signal by the separation means to the audio speaker; and wherein the audio speaker emits the audio band signal component when the audiovisual device outputs it, the sound emission direction of the audio speaker being set so that the sound of the audio band signal component is reflected off the screen of a monitor that displays images according to the video signal output from the audiovisual device and reaches a listening point.

2. The audiovisual system according to claim 1, wherein the audiovisual device further comprises a detection means for detecting a person in the image shown by the video signal reproduced by the playback means, and the separation means separates the audio band signal component from the center channel audio signal reproduced by the playback means when the detection means detects the person in the image shown by the video signal.

3. The audiovisual system according to claim 1, wherein the audiovisual device further comprises a detection means for detecting a scene of a person speaking in the image shown by the video signal reproduced by the playback means, and the separation means separates the audio band signal component from the center channel audio signal reproduced by the playback means when the detection means detects the scene of the person speaking in the image shown by the video signal.

4. The audiovisual system according to claim 3, wherein the detection means detects, from the image shown by the video signal reproduced by the playback means, the scene of the person speaking together with information representing the position of the person in the image; the separation means adds the information detected by the detection means to the audio band signal component separated from the center channel audio signal reproduced by the playback means; the audio speaker has speaker bodies associated with a plurality of display areas that are set on the monitor screen so as to correspond to positions in the image, the sound emission direction of each speaker body being set so that its sound is reflected by the display area associated with that speaker body and reaches the listening point; and the audio speaker emits the audio band signal component output from the audiovisual device from the speaker body associated with the display area that includes the position identified by the information added to the audio band signal component.

5. The audiovisual system according to any one of claims 1 to 4, wherein the audio speaker is a directional speaker.

6. An audiovisual device that reproduces audiovisual data into a video signal and audio signals of multiple channels including a center channel, the audiovisual device comprising: a playback means for reproducing the audiovisual data into the video signal and the audio signals of the multiple channels; a separation means for separating an audio band signal component from the center channel audio signal reproduced by the playback means; a video output means for outputting the video signal reproduced by the playback means; a channel audio output means for outputting the audio signals of the multiple channels reproduced by the playback means to the speakers of the corresponding channels, the channel audio output means outputting to the center channel speaker, when the separation means separates the audio band signal component from the center channel audio signal, the center channel audio signal from which the audio band signal component has been separated; and an audio output means for outputting the audio band signal component separated from the center channel audio signal by the separation means to an audio speaker.

7. A program for causing a computer to function as an audiovisual device comprising: a playback means for reproducing audiovisual data into a video signal and audio signals of multiple channels including a center channel; a separation means for separating an audio band signal component from the center channel audio signal reproduced by the playback means; a video output means for outputting the video signal reproduced by the playback means; a channel audio output means for outputting the audio signals of the multiple channels reproduced by the playback means; and an audio output means for outputting the audio band signal component separated from the center channel audio signal by the separation means.