Method and system for improving the intelligibility of a group of persons engaged in conversation
The system enhances speech intelligibility in group conversations by processing and timing direct and reflected speech signals, addressing noise and reverberation issues to improve communication clarity.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- ROCKET SCI AG
- Filing Date
- 2025-12-10
- Publication Date
- 2026-06-25
AI Technical Summary
In group conversations where participants can switch between speaking and listening, challenges such as background noise, reverberation, and difficulty in hearing distant speakers reduce speech intelligibility, especially for those with hearing impairments.
A system using directional microphones and loudspeakers, combined with a digital signal processor, processes and transmits acoustic signals in real time to enhance speech intelligibility by delivering direct and reflected speech signals within a defined time window, reducing noise and reverberation.
Improves syllable and speech intelligibility by amplifying direct and delayed speech signals, enhancing clarity and reducing noise, allowing better communication among group members, including those with hearing impairments.
Figure EP2025086439_25062026_PF_FP_ABST
Abstract
Description
[0001] METHOD AND SYSTEM FOR IMPROVING THE INTELLIGIBILITY OF A GROUP OF CONVERSING PEOPLE
[0002] Description
[0003] The invention relates to a method for improving the intelligibility of a group of people conversing in an environment, preferably in a room, where each person in the group can be a speaker at times and a listener at other times, with the people located at two or more stationary positions P1, P2, ... Pn.
[0004] The invention also relates to a system for use in the aforementioned method.
[0005] State of the art
[0006] In lectures in seminar rooms, community meetings, and similar events, there is usually one speaker and a large number of listeners. In this case, the speakers can use microphones, allowing loudspeakers to ensure good intelligibility for all listeners. Appropriate electronics can also eliminate background noise and generate further improvements in speech intelligibility.
[0007] In group conversations held indoors, for example in a restaurant hall, a garden restaurant, or around a conference table in an open-plan office, every speaker can also be a listener at times, and vice versa. Communication can be much more difficult in these situations. On the one hand, background noise, especially from other people present, is disruptive; on the other hand, reverberation can reduce the intelligibility of individual syllables.
[0008] To improve speech intelligibility, passive measures are used in enclosed spaces, such as sound-absorbing panels (e.g., as partitions), carpets, acoustic curtains, and the like, which are intended to largely prevent resonance and echo formation. Depending on the building's structure, these can be difficult to install.
[0009] Active measures can include playing suitable noise or quiet background music as a sound masking system to reduce the disruptive effect of sound by covering it up. Such systems are also used to increase focus, concentration, and discretion in the workplace.
[0010] An additional difficulty arises in social group conversations, for example at a regular table in a restaurant, when someone wants to listen to a speaker who is sitting somewhat further away than directly next to or opposite them, and especially when a direct neighbor is loudly holding a separate conversation with another person in the group, the course of which the listener does not wish to follow.
[0011] For listeners with a hearing impairment, such a situation is even more difficult.
[0012] Description of the invention
[0013] The object of the present invention is to provide a method of the kind described above that improves the intelligibility of the other participants in a conversation.
[0014] A further object is to improve the speech intelligibility of a selected speaker when several conversations are taking place simultaneously within the group. Furthermore, a system for carrying out the method is to be described.
[0015] The object is achieved by the features of the first claims of the respective categories. Improved variants are described in the dependent claims.
[0016] In the procedure described at the beginning, persons are located at the aforementioned stationary places P1, P2, ... Pn, thus providing a total of n places.
[0017] The participants can be in a room, an enclosed space such as a restaurant or office, or in a courtyard, a garden restaurant, or a means of transport like a train, to name just a few examples. The seats P are essentially stationary while the method is carried out, although different seats may be chosen for the next arrangement. Typically, the seats are arranged around a table, as is usually the case in meetings and group discussions, but not in a train compartment.
[0018] According to the invention, a system is preferably arranged centrally with respect to the seats P. This would, for example, be located in the middle of the table, preferably above or below the heads of the participants in the conversation, i.e., either standing on the table or hanging above it.
[0019] It could be integrated into a lamp, a table decoration, or the table itself, or even concealed within it. Generally, a suitable location is chosen that does not obstruct the participants' view of one another but has a direct line of sight to all participants, so that the direct sound from all speakers reaches the system without obstruction, maintaining the impression of a direct conversation.
[0020] The system used for the method is described below. It comprises two or more directional microphones M1, M2, ... Mn, hereinafter generally referred to as microphones Mi, which are directed towards positions P1, P2, ... Pn, generally referred to as P; two or more directional loudspeakers L1, L2, ... Ln, hereinafter referred to as loudspeakers Lj, which are also directed towards the positions P; and at least one digital signal processor (DSP), which can process and transmit acoustic signals in real time and is connected to all microphones Mi and loudspeakers Lj for signal transmission (i, j=1 ... n).
[0021] Instead of a single DSP, multiple DSPs can also be used, which are then connected to the loudspeakers and microphones accordingly. Since any person skilled in the art knows that amplifiers must be present after the microphones and in front of the loudspeakers, these are not mentioned further.
[0023] Ideally, one person is present at each seat P and remains relatively stationary there. However, several people can also be in the area of a single seat P if, for example, an extra chair is placed at the table. It is also possible for a person to move between two seats P.
[0024] The inventive method is characterized in that each microphone Mi continuously receives acoustic signals Ai and forwards them to the DSP; the DSP identifies in real time a speech signal Si1 from location Pi in each acoustic signal Ai of a microphone Mi and generates a processed speech signal aSi1 from it, by separating it at least from the rest of the acoustic signal Ai and preferably also subjecting it to noise suppression (denoising / active noise reduction); and the DSP sends the processed speech signal aSi1 to one or more other loudspeakers Lj.
[0025] The DSP further recognizes subsequently arriving speech signals Si2, Si3, ..., which have the same characteristics as Si1, as reflections of the speech signal Si1 and generates processed reflected signals aSi2, aSi3, ... from them in real time, which it sends to one or more other loudspeakers Lj within a specified time window (4) lasting at most 40 ms.
[0028] It has been found that syllable intelligibility or speech intelligibility is increased when the speech signal arrives several times in succession in slightly modified form. Naturally, the speech signal is reflected off various surfaces, walls, etc., and arrives at each listener or at each seat P with a time delay after the arrival of the direct speech signal. The system according to the invention additionally delivers the speech signal Si1 arriving at the system, as well as further reflections of it, as subsequently arriving speech signals Si2 and Si3, in a processed form to each listener, so that the human ear or brain can perceive more intelligible syllables from the totality of the captured speech signals.
[0029] The processing of the initial and reflected speech signals Si1, Si2, Si3, ... to aSi1, aSi2, aSi3, ... is initially achieved by separating them from the remaining acoustic signal Ai. Acoustic filters can also be used. These can improve speech signals by reducing interference and unwanted noise, thus enhancing clarity and intelligibility and optimizing signal quality.
[0031] The most important techniques and principles are noise reduction, echo and reverberation suppression, frequency adjustment (equalization), removal of unwanted sounds such as clicks and pops, dynamic compression, beamforming, active noise cancellation (ANC), and AI-based speech enhancement.
[0033] It should be noted that directional microphones Mi and directional loudspeakers Lj are used. The directional microphone Mi can be a microphone array consisting of several interconnected microphones, a microphone with a passive reflector or screen, or a single omnidirectional microphone that is made spatially selective by algorithms.
[0034] The directional microphones Mi thus primarily pick up speech signals from the seats Pi to which they are assigned, while speech signals from other seats Pj are picked up much less effectively. The same applies analogously to directional loudspeakers Lj: They primarily amplify the sound for the seats Pj to which they are directed. People in seats adjacent to Pj barely hear the speech signals emitted by that loudspeaker Lj, or at least hear them much less intensely.
[0035] In a preferred method, the processed speech signals aSi1 and the processed reflected speech signals aSi2, aSi3, ... are transmitted to the loudspeakers Lj with a partial time delay, so that, measured from the arrival of the speech signal Si1 (t=0), they are all transmitted within a time window that opens up to 10 ms and closes 50 ms, preferably 30 ms, after t=0.
[0036] Repetitions of the initial speech signal arriving too early at a listener do not improve sound quality, whereas those arriving 10-30 ms later have the greatest effect. For this reason, the DSP waits up to 10 ms after t=0 before transmitting the processed signals aSi1, aSi2, aSi3, ... to the loudspeakers Lj, and it refrains from transmitting processed reflected signals that could only be transmitted after 50 ms, preferably after 30 ms, to avoid an echo effect.
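The timing rule above can be illustrated with a minimal Python sketch. It is only one possible reading of the text, not the applicant's implementation: the names `ProcessedSignal`, `emission_time`, `WINDOW_START_MS`, and `WINDOW_END_MS` are hypothetical, and the example ready times are invented for illustration. Signals that are ready before the window opens are held back, and signals that would only be ready after the window closes are dropped.

```python
from dataclasses import dataclass
from typing import Optional

WINDOW_START_MS = 10.0  # window opens up to 10 ms after t=0 (value from the text)
WINDOW_END_MS = 30.0    # preferred closing time; the text also allows 50 ms


@dataclass
class ProcessedSignal:
    label: str       # e.g. "aSi1", "aSi2" (labels follow the text)
    ready_ms: float  # hypothetical: time after t=0 at which the DSP has it ready


def emission_time(sig: ProcessedSignal,
                  start_ms: float = WINDOW_START_MS,
                  end_ms: float = WINDOW_END_MS) -> Optional[float]:
    """Return the time at which to send the signal to Lj, or None to drop it."""
    if sig.ready_ms > end_ms:
        return None                     # too late: would be perceived as an echo
    return max(sig.ready_ms, start_ms)  # hold early signals until the window opens


# Invented example values, for illustration only
signals = [ProcessedSignal("aSi1", 2.0),
           ProcessedSignal("aSi2", 14.0),
           ProcessedSignal("aSi3", 36.0)]
schedule = {s.label: emission_time(s) for s in signals}
print(schedule)  # {'aSi1': 10.0, 'aSi2': 14.0, 'aSi3': None}
```

With the looser 50 ms bound (`end_ms=50.0`), the third signal in this example would be sent at 36 ms instead of being dropped.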
[0037] The optimal time window depends on the arrival of the first direct sound at the listener and extends from 10 to 30 ms, or from 10 to a maximum of 50 ms, thereafter. However, since this time is unknown, the time of arrival of the first speech signal Si1 at microphone Mi is taken as an approximation for t=0.
[0038] Preferably, the DSP has an initial routine that is executed at least at the beginning of use, and possibly also during the conversation, for example, periodically. This routine can record the exact positions of the people located at seats P. If a seat Pj is empty, the corresponding microphone Mj and loudspeaker Lj can be deactivated.
[0039] The system can also detect changes in the locations of people and consequently repeat the initialization routine.
[0040] Based on the initial routine or other input, the time t=0 can be adjusted if the distances of the positions P to the DSP are known.
[0041] Thus, the time it takes for the first direct sound from Pi to reach Pj can be estimated and compared to the detour the sound must travel from Pi via the DSP to reach Pj. A detour of approximately 34 cm corresponds to a time delay of 1 ms. Therefore, with a detour of approximately 1 m, which corresponds to roughly 3 ms, the optimal time window begins 7 ms after the arrival of the first speech signal Si1, i.e., 3 ms earlier. Given a signal processing time of approximately 2 ms by the DSP, the first signal in the example above must be delayed by approximately 5 ms to achieve the desired total delay of 10 ms.
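The arithmetic of this example can be reproduced with a short Python sketch. The function name `added_delay_ms` and its parameters are hypothetical; the constants (34 cm per millisecond, 2 ms processing time, 10 ms target delay) are taken from the paragraph above. The sketch assumes the three contributions simply add up.

```python
SOUND_CM_PER_MS = 34.0  # from the text: about 34 cm of path per millisecond


def added_delay_ms(detour_cm: float,
                   processing_ms: float = 2.0,
                   target_ms: float = 10.0) -> float:
    """Extra delay the DSP would insert so that the first processed signal
    reaches the listener about target_ms after the direct sound."""
    detour_ms = detour_cm / SOUND_CM_PER_MS  # time spent on the longer path
    # Never return a negative delay: the DSP cannot send a signal earlier
    return max(0.0, target_ms - detour_ms - processing_ms)


print(round(added_delay_ms(100.0), 1))  # detour of about 1 m -> 5.1
```

The result of about 5.1 ms matches the "approximately 5 ms" stated in the text, which rounds the 1 m detour to 3 ms.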
[0042] To further improve sound quality, the DSP can include feedback suppression, which prevents processed speech signals aSi1 , aSi2, ... , previously emitted by any loudspeaker Lj and captured by microphones Mi, from being reprocessed and sent to loudspeaker Lj.
[0043] The volume levels of the loudspeakers Lj can preferably be adjusted individually, in particular manually or via a self-calibration function. Furthermore, it is possible to send the processed speech signals aSi1 and their processed reflections aSi2, aSi3, ... not only to the other loudspeakers Lj (j≠i), but also to Li.
[0044] This results in the speaker hearing themselves better, making them feel less compelled to speak louder, which would increase the noise level in the room, as can be observed in many restaurants.
[0045] In a further improved method, at most one person is present at each position P. Additionally, either several cameras Ki are positioned, each directed at a position Pi, or a single 360° camera K is used, which can selectively locate and analyze the individual positions Pi. Using the cameras Ki or the camera K, it can be determined which speaker at position Pi a listener at position Pj is listening to. Consequently, only the processed speech signals aSi1, aSi2, ... from position Pi are transmitted to the loudspeaker at Pj. This also makes it possible to converse with a person who is sitting far away without being disturbed by the conversations of the other people in the group.
[0047] In a further improved version, it is even possible to select the language in which one wants to hear a conversation. For this purpose, the DSP has a translation routine, whereby a listener at seat Pj can select their preferred language, and then all processed speech signals aSi1, aSi2, ... are translated into this language before being sent to the loudspeaker Lj.
[0048] The language selection can be made, for example, using application software that can be run on a smartphone. Such software can also be used, for example, to set the desired volume and / or voice, preferably individually for loud or quiet speakers at different seats.
[0049] The voice of the translated output can be modeled after the voice of the speaker.
[0050] In another preferred method, the DSP continuously improves its signal processing, assignments, volume settings and / or calibrations using AI, for example through machine or deep learning. This allows positions Pi and Pj to be adjusted over time as people move around, for example when more people arrive.
[0051] Several people can occupy one position P, as long as no individual one-on-one conversations are held and no translations are carried out.
[0053] The system according to the invention for carrying out a described method comprises two or more directional microphones (microphones) M1, M2, ... Mn, which are directed in different directions towards places P1, P2, ... Pn where people may be located during use, two or more directional loudspeakers (loudspeakers) L1, L2, ... Ln, which are also directed towards places P1, P2, ... Pn, and at least one digital signal processor (DSP) which can process and transmit acoustic signals in real time and is connected for signal transmission to all microphones Mi and loudspeakers Lj, with i, j=1 ... n.
[0054] In particular, each directional microphone Mi can be a microphone array consisting of several compound microphones, a microphone with a passive reflector or screen, or a single omnidirectional microphone that is spatially selective due to algorithms.
[0055] Preferably, the system also includes one or more cameras K directed towards seats Pi to detect which speaker at seat Pi a listener at seat Pj is currently facing.
[0056] Furthermore, the system can include a control interface with which at least the volume of the loudspeakers Lj, and preferably also other settings such as the sensitivity of the microphones Mi and / or the positions Pi of preferred speakers, can be set.
[0057] The control interface can preferably be operated via mobile phones or other wearable devices. A real-time translation routine with speech output can additionally be implemented for translating processed speech signals (aSi1, aSi2, ...) into a preferred foreign language. Furthermore, the speaker's voice can be imitated.
[0058] The system preferably comprises a housing in or on which the DSP, the microphones Mi, the loudspeakers Lj, and optionally one or more cameras K are mounted. The housing also includes a device for setting it up or hanging it.
[0059] Brief description of the drawings
[0060] The invention is illustrated in the following drawings and explained in more detail with the aid of the reference numerals explained later. The drawings show:
[0061] Fig. 1 a schematic view of two people at a table with the system according to the invention suspended above the table;
[0062] Fig. 2a an acoustic time signal Ai as detected at microphone Mi;
[0063] Fig. 2b a processed speech signal aSi1 and further processed reflected speech signals aSi2, aSi3, ..., which are sent to a loudspeaker Lj;
[0064] Fig. 3 an example of a table, viewed from above, with eight seats and a system according to the invention arranged in the middle.
[0065] Ways to implement the invention
[0066] Fig. 1 shows a room 2 with system 1 and with two places Pi and Pj, in order to explain the procedure and the system 1 used for it.
[0067] The time signals in Fig. 2a and b serve for clarification, in particular to explain the time delay of the output signals aSi1, aSi2, aSi3, .... Fig. 3 shows the spatial relationships and distances of a typical group of, for example, eight people, as often found in restaurants or at meeting tables.
[0068] These figures are not intended to be restrictive; they are merely illustrative examples.
[0069] The inventive system 1 and the inventive method will subsequently be explained in more detail with the help of all figures.
[0070] System 1 for carrying out a method according to the invention comprises two or more directional microphones Mi, also called microphones Mi, designated M1, M2, ... Mn. They are directed in different directions towards locations P1, P2, ... Pn where people may be present during use. In Fig. 1, two such locations Pi and Pj are shown on a table with a reflective surface 3, where i, j = 1 ... n in each case.
[0071] In the illustrated example, the person at Pi is the speaker and the person at Pj is the listener.
[0072] Each directional microphone (Mi) is typically either a microphone array consisting of several interconnected microphones, a microphone with a passive reflector or screen, or a single omnidirectional microphone that is spatially selective due to algorithms. An amplifier is usually located after each microphone (Mi).
[0073] System 1 also includes two or more directional loudspeakers Lj, also called loudspeakers Lj, which are designated L1, L2, ... Ln. They are also aimed at positions P1, P2, ... Pn; usually, an amplifier is placed in front of each loudspeaker Lj.
[0074] At least one digital signal processor (DSP) of system 1, which can process and forward acoustic signals in real time, is connected to all microphones M and loudspeakers L for signal transmission. If system 1 includes multiple DSPs, these are connected to each other as well as to the microphones M and loudspeakers L.
[0076] In a preferred version, the system 1 can also include cameras K1, K2, ... Kn, which are directed at seats P1, P2, ... Pn, or a single camera K with a 360° detection angle, which can identify the individual seats P.
[0077] Figure 3 shows a table with a reflective surface 3, on which eight seats P1, ... P8 are provided. The system 1, comprising microphones Mi, loudspeakers Lj, and optional cameras K, is arranged centrally on or above the table, for example, in combination with table decorations or a lamp. The system is preferably positioned centrally with respect to seats P so that it can receive direct sound waves from the speakers and thus ensure that the path lengths of all speech signals from the speakers to the system 1 are as short as possible and not too dissimilar.
[0078] Instead of a real person, a virtual person can also participate in the conversation at location P, for example with a monitor equipped with a microphone and speakers.
[0079] Voice transmission can then optionally take place acoustically via the speakers and microphone on the monitor or via cable directly to the DSP.
[0080] The system 1 preferably comprises a housing 1 in or on which the DSP, the microphones Mi, the loudspeakers Lj and optionally one or more cameras K are attached.
[0081] The housing 1 includes a device for setting it up or hanging it.
[0082] In Fig. 1, the direct sound from Pi to Pj is shown with dashed arrows, as well as two further sound paths that, originating from the speaking person at Pi, are reflected off the reflective surfaces 3 of the table or the lower edge of the system 1 and reach the person at Pj. These sound paths occur naturally and are not affected by the method according to the invention.
[0084] System 1 can, in particular, include a control interface 5 with which at least the volume of the loudspeakers Lj, and preferably also further settings such as the sensitivity of the microphones Mi and / or the positions P of preferred speakers, can be set. The control interface 5 can preferably be operated via mobile phones or other wearables of the users, or one or more modules 5 similar to a remote control are provided. In addition, System 1 can include a real-time translation routine with speech output for translating processed speech signals aSi1, aSi2, ... into a preferred foreign language. The preferred language would preferably be adjustable via the control interface 5.
[0086] The method according to the invention is described below. It serves to improve the intelligibility of a group of people conversing in an environment, preferably a room 2, where each person can be both speaker and listener at times, and where the people are located at two or more places P1, P2, ... Pn. These places P remain stationary within a predetermined area during the group conversation. Chairs around a table can be moved, but seats at adjacent tables are not included. The method is carried out using a system 1 described above, which is preferably arranged centrally with respect to the places P. Each microphone Mi continuously receives acoustic signals Ai and forwards them to the DSP. Figure 2a shows such a continuous signal Ai as captured by the microphone Mi. It consists mostly of background noise H. As soon as the person at Pi speaks, an initial signal Si1 is detected by microphone Mi because this person is within the detection range of the directional microphone Mi.
[0087] Each syllable spoken by this person is also reflected as a sound wave at the reflective surfaces 3 and reaches the microphone Mi later; the DSP identifies these later arrivals as Si2, Si3, ...
[0088] The DSP identifies a speech signal Si1 from location Pi in each acoustic signal Ai of a microphone Mi and generates a processed speech signal aSi1 from it in real time, by at least separating it from the remaining acoustic signal Ai and preferably also subjecting it to active noise reduction. Figure 2b shows the processed speech signal aSi1. The DSP sends the processed speech signal aSi1 to one or more other loudspeakers Lj.
[0089] Since the passage of each signal through the DSP takes a certain amount of time, usually about 2-4 ms, the processed speech signal aSi1 is sent to the loudspeaker Lj later than the speech signal Si1 is detected.
[0091] All subsequently arriving speech signals Si2, Si3, ... which have the same characteristics as Si1 are recognized by the DSP as reflections of the speech signal Si1. From them, the DSP generates processed reflected signals aSi2, aSi3, ... in real time and sends them to one or more other loudspeakers Lj within a predefined time window 4, which lasts at most 40 ms.
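The text does not specify how the DSP decides that a later signal has "the same characteristics" as Si1. Purely as an illustration, the following Python sketch uses normalized correlation between two equal-length sample segments as a stand-in criterion. The technique, the function names `similarity` and `is_reflection`, the threshold of 0.8, and the sample values are all assumptions and are not taken from the application.

```python
import math
from typing import Sequence


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Normalized correlation of two equal-length segments, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("segments must have equal length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def is_reflection(direct: Sequence[float], later: Sequence[float],
                  threshold: float = 0.8) -> bool:
    """Treat a later segment as a reflection of Si1 if it is similar enough.
    The threshold is an arbitrary illustrative value."""
    return similarity(direct, later) >= threshold


si1 = [0.0, 0.8, -0.6, 0.3, -0.1]        # invented samples standing in for Si1
attenuated_copy = [0.5 * x for x in si1]  # weaker copy, like a reflection
unrelated = [0.4, -0.1, 0.2, 0.5, -0.3]   # a different sound
print(is_reflection(si1, attenuated_copy), is_reflection(si1, unrelated))
# True False
```

A real reflection is also shifted in time and spectrally colored by the reflecting surface, so a practical detector would have to search over time lags and tolerate filtering; this sketch leaves both out.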
[0093] Figure 2b shows the processed reflected speech signals aSi2 and aSi3, delayed from the processed speech signal aSi1. All of them are amplified and sent from loudspeaker Lj to position Pj.
[0094] Delayed processed reflected speech signals aSi that only become available after the end of time window 4 are discarded.
[0095] Figure 1 shows the speech signals aSi1, aSi2, and aSi3 emitted by the loudspeaker. These signals reach the person at Pj in addition to the direct sound and the speech signals naturally reflected from it; they are shown as dashed lines and are not further labeled.
[0096] The person at Pj now hears and understands the syllables from the person at Pi much more clearly. This is partly because more information about these syllables reaches the system 1, and partly because their volume has been amplified.
[0097] It has been shown that the processed speech signals aSi1 and the processed reflected speech signals aSi2, aSi3 contribute more to syllable intelligibility when they are partially delayed before being transmitted to the loudspeakers Lj, or not transmitted at all. Measured from the arrival of the speech signal Si1 at the DSP (t=0), they should preferably all be transmitted within a time window 4 that opens up to 10 ms and closes 50 ms, preferably 30 ms, after t=0. Speech signals Si2, Si3, ... arriving too late are therefore no longer sent to the loudspeakers.
[0098] Otherwise, a reverberation effect or undesirable sound coloration could occur. Therefore, time window 4 lasts a maximum of 40 ms, preferably 30 ms or 20 ms.
[0099] In Fig. 2b, as indicated by the dashed arrow, the processed speech signal aSi1 was transmitted with a time delay to avoid arriving at listener Pj too early.
[0100] The processed speech signal aSi3, however, would be transmitted more than 30 ms after t=0, which can create a reverberation effect. Therefore, in a preferred version, it is not transmitted when the time window, labeled 4' in Fig. 2b, is set to 20 ms.
[0101] The best results are achieved with a time window 4 of 20 ms, which begins 10 ms after the arrival of the direct sound at the listener and ends 30 ms after that arrival.
[0103] Preferably, the DSP includes feedback suppression, which prevents processed speech signals aS captured by the microphone Mi, which were previously emitted by any loudspeaker Lj, from being reprocessed and sent to one or more loudspeakers Lj.
[0104] The DSP preferably has an initialization routine that is performed at the start of use. The volume levels of the loudspeakers Lj can preferably be adjusted individually, in particular manually or by means of a self-calibration function.
[0105] In a preferred method, at most one person occupies each seat P. Using one or more cameras K directed at the seats P, it can be determined which speaker at position Pi a listener at position Pj is listening to. Consequently, only the processed speech signals aSi1, aSi2, ... originating from position Pi are routed from the DSP to the loudspeaker Lj at Pj. This method allows people sitting far apart to converse.
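The selective routing described here can be sketched as a simple lookup. The following Python example is a hypothetical illustration: the function `route`, the `attention` mapping (standing in for whatever the cameras K report), and the fallback for listeners without a detected target are assumptions and are not described in the application.

```python
from typing import Dict, List, Optional


def route(attention: Dict[int, Optional[int]],
          active_speakers: List[int]) -> Dict[int, List[int]]:
    """Map each listener seat j to the speaker seats i whose processed
    signals are sent to loudspeaker Lj.

    attention[j] is the seat the listener at Pj is facing, or None if no
    target was detected.
    """
    routes: Dict[int, List[int]] = {}
    for j, target in attention.items():
        if target is None:
            # Assumed fallback: no selection, so send all other active talkers
            routes[j] = [i for i in active_speakers if i != j]
        else:
            # Selective listening: only the attended seat, if someone speaks there
            routes[j] = [target] if target in active_speakers else []
    return routes


# Invented example: seats 1 and 5 face each other, seat 3 has no detected target
attention = {1: 5, 5: 1, 3: None}
print(route(attention, active_speakers=[1, 2, 5]))
# {1: [5], 5: [1], 3: [1, 2, 5]}
```

In this example, the listener at P1 receives only the signals from P5 and is not disturbed by the talker at P2.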
[0107] In Fig. 3, the person at P1 can therefore communicate very well with a person sitting at P4 or P5, which would otherwise be hardly possible in a crowded restaurant. Even conversations from seat P1 with people at P3 or P6 are often difficult if other lively conversations are taking place at the table.
[0108] At conferences, but also in restaurants, it often happens that only one person speaks at a time, while the others listen and contribute to the conversation in turn. The acoustics, and thus the quality of the conversation, are significantly improved in this case as well by the inventive method. One consequence of this is that people can speak more quietly, which further improves the atmosphere of the conversation.
[0109] Furthermore, System 1 can have a translation routine, allowing a listener at seat Pj to select their preferred language, after which all processed speech signals aSi1, aSi2, ... are translated into this language before being sent to the loudspeaker Lj. Since the loudspeakers Lj are directional, the neighbors of Pj are not disturbed.
[0110] It is also possible to send the processed speech signals aSi1, aSi2, ... to headphones, but due to the time delay, this is only practical for translations or if the listener uses a hearing aid. Combining the processed signals with natural direct sound is then prevented or at least severely limited.
[0111] The DSP can continuously improve its signal processing, mappings, volume settings, and / or calibrations through machine learning using AI. This allows, particularly when cameras K are used, the time delay to be adjusted based on the distances between people. Speaker outputs to P8 in Fig. 3 originating from P1 therefore require less time delay than those originating from P4, since the direct sound from P4 to P8 travels for approximately the same amount of time as a speech signal traveling via system 1. Conversely, the direct sound from P1 to P8 arrives at P8 significantly earlier than the signal traveling via system 1.
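The comparison between P1 and P4 can be checked numerically under a simplified geometry. The following Python sketch assumes, purely for illustration, eight seats equally spaced on a circle of radius 1 m with system 1 at the center and all sound paths in one plane; Fig. 3 may differ. The names `seat_xy`, `detour_ms`, and `added_delay_ms` are hypothetical, and the 10 ms target and 2 ms processing time are taken from the earlier example in the description.

```python
import math

SOUND_M_PER_MS = 0.34  # about 34 cm per millisecond, as stated in the description
RADIUS_M = 1.0         # assumed distance of every seat from system 1
N_SEATS = 8


def seat_xy(k: int) -> tuple:
    """Assumed position of seat Pk on a circle around system 1."""
    angle = 2 * math.pi * (k - 1) / N_SEATS
    return (RADIUS_M * math.cos(angle), RADIUS_M * math.sin(angle))


def detour_ms(i: int, j: int) -> float:
    """Extra travel time of the path Pi -> system 1 -> Pj over the direct path."""
    direct_m = math.dist(seat_xy(i), seat_xy(j))
    via_system_m = 2 * RADIUS_M
    return (via_system_m - direct_m) / SOUND_M_PER_MS


def added_delay_ms(i: int, j: int,
                   target_ms: float = 10.0,
                   processing_ms: float = 2.0) -> float:
    """Delay the DSP would add so the output lags the direct sound by target_ms."""
    return max(0.0, target_ms - processing_ms - detour_ms(i, j))


print(round(added_delay_ms(1, 8), 1), round(added_delay_ms(4, 8), 1))
```

Under these assumptions, P4 sits opposite P8, so the direct path and the path via system 1 have nearly the same length and almost the full remaining delay must be added, while the adjacent seat P1 already incurs a detour of several milliseconds and therefore needs less added delay, consistent with the paragraph above.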
[0112] List of reference signs
[0113] P Place, places
[0114] P1, P2, ... Place 1, Place 2, ...
[0115] Mi Directional microphone or directional microphones, also called microphone or microphones, i=1 ... n
[0117] M1, M2, ... Microphone 1, Microphone 2, ...
[0118] Lj Directional loudspeaker, also called loudspeaker, j=1 ... n
[0119] L1, L2, ... Loudspeaker 1, Loudspeaker 2, ...
[0120] K Camera or cameras
[0121] K1, K2, ... Camera 1, Camera 2, ...
[0122] DSP Digital signal processor(s)
[0123] Ai Acoustic signal captured at microphone Mi
[0124] Si Speech signal captured from the acoustic signal at microphone Mi
[0125] H Background noise
[0126] aS Processed speech signal, general; aSi1 first processed speech signal, captured from the acoustic signal at microphone Mi and separated from the background noise
[0127] Si2, Si3, ... Later arriving speech signals, reflections of Si1; aSi2, aSi3 processed reflected speech signals of Si1, captured from the acoustic signal at microphone Mi and separated from the background noise; i, j counters, i=1 ... n; j=1 ... n
[0128] 1 System, housing
[0129] 2 Room
[0130] 3 Reflective surface
[0131] 4 Time window, general; 4' time window with a duration of 20 ms
[0132] 5 Control interface
Claims
1. Method for improving the intelligibility of a group of people conversing in an environment, preferably a room (2), in which each person of the group can be a speaker at times and a listener at other times, wherein the persons are located at two or more stationary positions P1, P2, ... Pn, using a system (1) preferably arranged centrally with respect to the positions P, comprising two or more directional microphones (microphones) M1, M2, ... Mn directed towards the positions P1, P2, ... Pn, two or more directional loudspeakers (loudspeakers) L1, L2, ... Ln also directed towards the positions P1, P2, ... Pn, at least one digital signal processor (DSP) capable of processing and forwarding acoustic signals in real time and connected to all microphones Mi and loudspeakers Lj for signal transmission (i, j=1 ...n), characterized in that each microphone Mi continuously receives acoustic signals Ai and forwards them to the DSP; the DSP identifies a speech signal Si1 from location Pi in real time from each acoustic signal Ai of a microphone Mi and generates a processed speech signal aSi1 from it by separating it at least from the rest of the acoustic signal Ai and preferably also subjecting it to active noise reduction, and that the DSP sends the processed speech signal aSi1 to one or more other loudspeakers Lj, wherein the DSP recognizes subsequently arriving speech signals Si2, Si3, ... which have the same characteristics as Si1, as reflections of the speech signal Si1 and generates similarly processed reflected signals aSi2, aSi3 from them in real time, which it sends to one or more other loudspeakers Lj within a predetermined time window (4) which lasts at most 40ms.
2. Method according to claim 1, characterized in that each directional microphone Mi is a microphone array consisting of several array microphones, a microphone with a passive reflector or screen, or a single omnidirectional microphone which is spatially selective due to algorithms.
3. Method according to claim 1 or 2, characterized in that the processed speech signals aSi1 and the processed reflected speech signals aSi2, aSi3, ... are transmitted to the loudspeakers Lj partly with a time delay, so that they all lie in a time window (4) between up to 10 and 50 ms, preferably between up to 10 and 30 ms, after t=0, wherein t=0 is defined by the arrival of the speech signal Si1 at the DSP.
4. Method according to one of the preceding claims, characterized in that the DSP comprises feedback suppression which prevents processed speech signals aSi1, aSi2, ..., previously emitted by any loudspeaker Lj and captured by microphones Mi, from being reprocessed and sent to a loudspeaker Lj.
5. Method according to one of the preceding claims, characterized in that the DSP has an initial routine and performs this routine at least at the beginning of use.
6. Method according to one of the preceding claims, characterized in that the volume levels of the loudspeakers Lj and / or the time windows (4) of each microphone Mi to each loudspeaker Lj are individually adjusted, in particular manually or by a self-calibration function.
7. Method according to one of the preceding claims, characterized in that at most one person is located at each seat and that by means of one or more cameras K directed at the seats P it can be determined which speaker at seat Pi a listener at seat Pj is listening to, so that as a result only the processed speech signals aSi1, aSi2, ... are routed from the DSP to the loudspeaker at Pj which originate from seat Pi.
8. Method according to claim 7, characterized in that the DSP has a translation routine, wherein a listener at position Pj can select his preferred language and thereafter all processed speech signals aSi1, aSi2, ... are translated into this language before being directed to the loudspeaker Lj, wherein the speech output preferably also imitates the voice of the speaker.
9. Method according to one of the preceding claims, characterized in that the DSP continuously improves its signal processing, assignments, voice imitations, volume settings, individual time windows (4) and / or calibrations, preferably by machine and / or deep learning, using AI.
10. System (1) for carrying out a method according to any of the preceding claims, comprising two or more directional microphones (microphones) M1, M2, ... Mn, directed in different directions towards places P1, P2, ... Pn where people may be located during use, two or more directional loudspeakers (loudspeakers) L1, L2, ... Ln, also directed towards places P1, P2, ... Pn, and at least one digital signal processor (DSP) capable of processing and forwarding acoustic signals in real time and connected to all microphones Mi and loudspeakers Lj for signal transmission (i, j=1 ... n).
11. System according to claim 10, characterized in that each directional microphone Mi is a microphone array consisting of several array microphones, a microphone with a passive reflector or screen, or a single omnidirectional microphone Mi which is spatially selective due to algorithms.
12. System according to one of claims 10 or 11, characterized by one or more cameras Ki directed at seats Pi, for detecting which speaker at seat Pi a listener at seat Pj is currently facing.
13. System according to one of claims 10 to 12, characterized by a control interface (5) with which at least the volume of the loudspeakers L, preferably also further settings such as the sensitivities of the microphones M and / or seats Pi of preferred speakers can be set, wherein the control interface (5) can preferably be set via mobile phones or other wearables of the users.
14. System according to claim 13, characterized by a real-time translation routine with speech output, for translating processed speech signals aSi1, aSi2, ... into a preferred foreign language, wherein the speech output preferably also imitates the voice of the speaker.
15. System according to one of claims 10 to 14, comprising a housing (1) in or on which the DSP, the microphones Mi, the loudspeakers Lj and optionally the one or more cameras K are attached, wherein the housing (1) comprises a device for setting up or hanging.
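
The timing rule of claims 1 and 3 can be illustrated with a minimal sketch. It is not the applicant's implementation: the names `Emission`, `schedule_emissions`, `window_start_ms` and `latency_ms` are hypothetical, and the choice to skip reflections that would only be ready after the window has closed is an assumption of this sketch that the claims do not prescribe. All times are in milliseconds relative to t=0, the arrival of Si1 at the DSP.

```python
from dataclasses import dataclass

MAX_WINDOW_MS = 40.0  # upper bound on the time window (4) stated in claim 1


@dataclass(frozen=True)
class Emission:
    label: str      # "aSi1" for the direct signal, "aSi2" for the first reflection, ...
    emit_ms: float  # emission time relative to t=0 (arrival of Si1 at the DSP)


def schedule_emissions(arrivals_ms, window_start_ms=0.0,
                       window_ms=MAX_WINDOW_MS, latency_ms=0.0):
    """Place processed signals inside [window_start, window_start + window_ms].

    arrivals_ms: arrival times of Si1, Si2, ... at the DSP, relative to t=0.
    latency_ms:  assumed (hypothetical) processing latency of the DSP.
    """
    if not 0.0 < window_ms <= MAX_WINDOW_MS:
        raise ValueError("window must be positive and last at most 40 ms")
    window_end_ms = window_start_ms + window_ms
    emissions = []
    for k, arrival in enumerate(arrivals_ms, start=1):
        ready = arrival + latency_ms
        if ready > window_end_ms:
            # Assumption of this sketch: a signal that cannot be emitted
            # before the window closes is not emitted at all.
            continue
        # Signals that are ready early are delayed to the start of the window.
        emissions.append(Emission(f"aSi{k}", max(ready, window_start_ms)))
    return emissions


# Window from 10 ms to 50 ms after t=0, as in claim 3; 2 ms assumed latency.
plan = schedule_emissions([0.0, 12.0, 35.0, 70.0],
                          window_start_ms=10.0, window_ms=40.0, latency_ms=2.0)
for e in plan:
    print(e.label, e.emit_ms)
```

In this example the direct signal is held back until the window opens at 10 ms, the reflections arriving at 12 ms and 35 ms are forwarded as soon as they are processed, and the reflection arriving at 70 ms is left out because it falls outside the window.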