Teleconference systems, communication terminals, teleconference methods, and computer-readable media

By introducing features such as speech confirmation, voice output control, and speech count display into the teleconference system, the problem of speech conflicts in teleconferences has been resolved, resulting in a smoother communication experience.

CN116636194B · Active · Publication Date: 2026-06-30 · NEC PLATFORMS LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: NEC PLATFORMS LTD
Filing Date: 2021-11-18
Publication Date: 2026-06-30

AI Technical Summary

Technical Problem

In teleconferences, real-time communication is difficult due to communication delays and the inability to see other participants' faces directly, leading to frequent conflicts and disrupting the smooth running of the meeting.

Method used

By introducing a speech determination device, a voice output control device, a counting device, and a count display control device into the teleconference system, the speaking or agreeing status of each participant is determined, and the speaking output of another participant is suppressed when there is a speaking conflict. At the same time, the number of speaking conflicts is recorded and displayed.

Benefits of technology

It effectively reduces speaking conflicts, improves the smoothness of teleconferences, and helps participants identify who wants to speak by displaying the number and status of speaking conflicts, thereby reducing dissatisfaction and promoting the smooth progress of the meeting.

✦ Generated by Eureka AI based on patent content.


Abstract

A telephone conferencing system capable of conducting telephone conferences smoothly is provided. According to the invention, a speech determination unit (2) determines whether the speech of each participant in a telephone conference is a statement or an interruption. A speech output control unit (4) performs control such that the speeches of the multiple participants are output to the communication terminals of the multiple participants respectively. When another participant speaks while one of the multiple participants is speaking, the speech output control unit (4) performs control to suppress the output of the other participant's speech. A counting unit (6) counts the number of speech conflicts for each participant. A count display control unit (8) performs control such that a display related to the count is displayed on the communication terminals of the multiple participants.

Description

Technical Field

[0001] This invention relates to a telephone conferencing system, a communication terminal, a telephone conferencing method, and a computer-readable medium.

Background Technology

[0002] In recent years, it has become possible to hold conferences between two or more communication terminals located in geographically distant places via the internet. In such telephone conference systems (i.e., remote conferences, remote meetings), participants do not need to gather in a single conference room, and each participant can participate in the telephone conference from his or her seat or home.

[0003] Regarding this technology, Patent Document 1 discloses a communication control device that provides a realistic conference experience while taking into account the load on the communication line. Furthermore, Patent Document 2 discloses a conference system that is more expressive, easier to speak, and more interactive, without compromising the reproduction of existing speech even when new speech is given during the reproduction of existing speech.

[0004] Reference List

[0005] Patent documents

[0006] [Patent Document 1] Japanese Unexamined Patent Application Publication No. 2010-239393

[0007] [Patent Document 2] Japanese Unexamined Patent Application Publication No. 2001-230773

Summary of the Invention

[0008] Technical issues

[0009] In teleconferences, real-time communication is sometimes difficult due to latency and other factors. Furthermore, because other participants' faces are not always visible, it can be difficult to understand what they are doing. In such cases, speaking conflicts may occur, where one participant starts speaking while another is already speaking. When a speaking conflict occurs, the participant who spoke later may choose to stop speaking. However, in this situation, the dissatisfaction of the participant who spoke later (i.e., the one who caused the speaking conflict) may increase. Therefore, speaking conflicts can hinder the smooth progress of a teleconference.

[0010] This disclosure is made to address the above-mentioned problems, and the purpose of this disclosure is to provide a telephone conferencing system, communication terminal, telephone conferencing method, and program that can conduct telephone conferences smoothly.

[0011] Solution to the problem

[0012] The teleconference system according to this disclosure includes: a speech determination device for determining whether the voice of each of a plurality of participants in a teleconference is an instruction to speak or an agreement; a voice output control device for performing control such that the voice of each of the plurality of participants is output from the communication terminal of each of the plurality of participants, and performing control to suppress the output of the other participant's speech when another participant speaks while one of the plurality of participants is speaking; a counting device for counting the number of first speeches for each participant, the first speeches being the speeches whose output is suppressed; and a count display control device for performing control such that a count-related display is displayed at the communication terminals of the plurality of participants.

[0013] The communication terminal according to this disclosure includes: a speech determination device for determining whether the voice of a user of the communication terminal is an instruction to speak or an agreement in a telephone conference in which the user participates; a voice output control device for performing control such that the voice of each of a plurality of participants in the telephone conference is output by the communication terminal, and the voice of the user is output by a first communication terminal which is a communication terminal of each of the plurality of participants, and performing control when the user speaks during the speaking of one of the plurality of participants, so as to suppress the output of the user's speech at the first communication terminal; a counting device for counting the number of first speeches for the user, the first speeches being the output suppressed speeches; and a count display control device for performing control such that a count-related display is performed at the first communication terminal.

[0014] A teleconference method according to this disclosure includes: determining whether the voice of each of a plurality of participants in the teleconference is an instruction to speak or an agreement; performing control such that the voice of each of the plurality of participants is output by the communication terminal of each of the plurality of participants; performing control such that when one of the plurality of participants speaks while another participant speaks, the output of the other participant's speech is suppressed; counting the number of first speeches for each participant, the first speeches being the output suppressed speeches; and performing control such that a display related to the number of speeches is displayed at the communication terminals of the plurality of participants.

[0015] The program according to this disclosure causes a computer to perform the following functions: determine whether the voice of each of the multiple participants in a telephone conference is an instruction to speak or an agreement; execute control such that the voice of each of the multiple participants is output by the communication terminal of each of the multiple participants, and execute control to suppress the output of the other participant's speech when another participant speaks while one of the multiple participants is speaking; count the number of first speeches for each participant, the first speeches being the speeches whose output is suppressed; and execute control such that a display related to the number of speeches is displayed at the communication terminals of the multiple participants.

[0016] Beneficial effects of the present invention

[0017] According to this disclosure, a telephone conferencing system, communication terminal, telephone conferencing method, and program that enable successful telephone conferencing can be provided.

Attached Figure Description

[0018] Figure 1 is a diagram illustrating a teleconferencing system according to an example embodiment of the present disclosure;

[0019] Figure 2 is a flowchart illustrating a teleconference method performed by a teleconference system according to an exemplary embodiment of the present disclosure;

[0020] Figure 3 is a diagram illustrating a telephone conferencing system according to a first example embodiment;

[0021] Figure 4 is a configuration diagram of a communication terminal according to a first example embodiment;

[0022] Figure 5 is a configuration diagram of a teleconference apparatus according to a first example embodiment;

[0023] Figure 6 is a diagram illustrating participant information according to a first example embodiment;

[0024] Figure 7 is a flowchart illustrating a teleconference method performed by a teleconference system according to a first example embodiment;

[0025] Figure 8 is a diagram of a teleconference system according to a second example embodiment;

[0026] Figure 9 is a state diagram illustrating the sending and receiving of speaking status information in a teleconference system according to a second example embodiment;

[0027] Figure 10 is a block diagram illustrating the configuration of the speech state detection unit according to a second example embodiment;

[0028] Figure 11 is a diagram illustrating meeting information according to a second example embodiment;

[0029] Figure 12 is a configuration diagram of the conference control unit according to a second example embodiment;

[0030] Figure 13 is a flowchart illustrating a teleconference method performed by a teleconference system according to a second example embodiment;

[0031] Figure 14 is a diagram illustrating the conference images displayed in each communication terminal during a teleconference according to a second example embodiment; and

[0032] Figure 15 is a diagram showing conference images displayed in each communication terminal during a teleconference according to a second example embodiment.

Detailed Implementation

[0033] Example Implementation

[0034] (Summary of exemplary embodiments based on this disclosure)

[0035] Before giving a description of exemplary embodiments of the present disclosure, an outline of exemplary embodiments according to the present disclosure will be described. Figure 1 illustrates a telephone conferencing system 1 according to an exemplary embodiment of the present disclosure. The telephone conferencing system 1 implements a telephone conference (web conference). The telephone conference is conducted using the communication terminals of multiple participants. The telephone conferencing system 1 can be implemented, for example, by a computer. The telephone conferencing system 1 can be implemented by each communication terminal of a participant in the telephone conference or by a server managing the telephone conference, etc. The telephone conferencing system 1 can also be implemented by multiple devices such as a server and communication terminals.

[0036] The teleconferencing system 1 includes a speech determination unit 2, a voice output control unit 4, a counting unit 6, and a count display control unit 8. The speech determination unit 2 functions as a speech determination device. The voice output control unit 4 functions as a voice output control device. The counting unit 6 functions as a counting device. The count display control unit 8 functions as a count display control device.

[0037] Figure 2 is a flowchart illustrating a telephone conferencing method performed by a telephone conferencing system 1 according to an exemplary embodiment of the present disclosure. The speech determination unit 2 determines whether the speech of each of the multiple participants in the telephone conference indicates speaking or an echo (step S12). This determination method will be described in the following exemplary embodiments. Here, "speak" is speech (voice) corresponding to words (language) with meaningful content. On the other hand, "echo" (i.e., supportive response, meaningless talk, interruption, echo, echoing feedback, or echoing chat) is speech (voice) corresponding to words that are not meaningful in themselves. In this specification, "speak" and "echo" are considered to be opposite terms.

[0038] The voice output control unit 4 performs control such that the voice of each of the multiple participants is output by the communication terminal of each of the multiple participants. When one participant speaks while another participant speaks, the voice output control unit 4 performs control to suppress the output of the other participant's speech (step S14). That is, when a speech conflict occurs, the voice output control unit 4 suppresses the output of the other participant's speech (conflicting speech). In the following text, the later speech (the speech that caused the speech conflict) is sometimes referred to as the "conflicting speech".

[0039] Therefore, a conflicting statement is a statement whose output is suppressed. Suppression of the output of a conflicting statement includes, for example, not outputting the conflicting statement at the communication terminal of any participant, but is not limited to this.

[0040] In this example embodiment, the term "speech conflict" refers to another participant speaking while one participant is speaking, rather than multiple participants' speeches being output simultaneously on each communication terminal. Note that in this example embodiment, the output of later speeches among multiple participants can be suppressed. Therefore, in this example embodiment, the occurrence of a "speech conflict" can be identified by the participant making the conflicting speech, rather than by other participants. That is, since the participant making the conflicting speech has already spoken while another participant's speech is being output on his or her own communication terminal, he or she can identify the occurrence of the speech conflict. On the other hand, since the output of conflicting speeches is suppressed on each communication terminal, participants other than the participant making the conflicting speech may not be able to identify that a speech conflict has occurred.

[0041] The counting unit 6 counts, for each participant, the number of speeches whose output was suppressed (conflicting speeches; first speeches) (step S16). The count display control unit 8 performs control to display the count-related information on the communication terminals of multiple participants (step S18). Therefore, each participant can know which participant has had many speech conflicts, etc.

[0042] Here, it can be said that a participant with many conflicting statements (a participant whose statements have already clashed many times) is a participant who wants to speak. Therefore, by displaying the number of conflicting statements on the communication terminals of the participants in the teleconference, other participants can identify that a participant with a large count wants to speak. The other participants can then take actions such as encouraging that participant to speak or waiting for that participant to speak. Thus, participant frustration, such as being unable to speak even if he or she wants to, can be reduced. Therefore, the teleconference system 1 according to this example embodiment can conduct teleconferences smoothly.

[0043] (First Example Implementation)

[0044] In the following description, exemplary embodiments will be illustrated with reference to the accompanying drawings. For clarity, the description and drawings have been partially omitted and simplified as appropriate. Furthermore, throughout the drawings, the same components are denoted by the same reference numerals, and repeated descriptions will be omitted where necessary.

[0045] Figure 3 illustrates a teleconference system 20 according to a first example embodiment. The teleconference system 20 includes a plurality of communication terminals 30 and a teleconference device 100. A communication terminal 30 may be provided for each participant in the teleconference. The plurality of communication terminals 30 and the teleconference device 100 are interconnected via a network 22, enabling them to communicate with each other. The network 22 may be wired, wireless, or a combination of wired and wireless. The network 22 may be the Internet or a local area network (LAN).

[0046] Communication terminal 30 is, for example, a computer owned by a participant. Communication terminal 30 is, for example, a personal computer (PC) or a mobile terminal such as a smartphone or tablet. When a participant engages in a teleconference, communication terminal 30 sends voice data indicating the participant's speech (speaking or echoing) to teleconference device 100 via network 22. Communication terminal 30 receives voice data indicating another participant's speech (speaking or echoing) from teleconference device 100 via network 22. Communication terminal 30 outputs voice corresponding to the voice data, making it audible to the participant, who is the user of communication terminal 30.

[0047] Teleconference device 100 is, for example, a computer such as a server. Teleconference device 100 manages telephone conferences. Teleconference device 100 receives voice data from each participant's communication terminal 30 and sends it to multiple communication terminals 30. In this case, telephone conferencing device 100 does not need to send voice data to the communication terminals 30 that have already sent voice data (this also applies to other example embodiments). Note that in the first example embodiment, the term "voice" can also refer to "voice data indicating speech" as the object of processing in information processing.
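
The forwarding rule in this paragraph (voice data need not be returned to the terminal that sent it) can be sketched as follows. This is a minimal illustration only; the function name `recipients` and the use of participant letters as terminal identifiers are assumptions, not part of the disclosure.

```python
def recipients(sender_id, terminal_ids):
    """Return the terminals that should receive voice data sent by sender_id.

    Assumption: every terminal except the sender receives the data.
    """
    return [t for t in terminal_ids if t != sender_id]


# Hypothetical terminal identifiers for participants A through D.
terminals = ["A", "B", "C", "D"]
print(recipients("B", terminals))  # ['A', 'C', 'D']
```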

[0048] Figure 4 illustrates a configuration of a communication terminal 30 according to a first example embodiment. The communication terminal 30 includes a control unit 32, a storage unit 34, a communication unit 36, and an interface unit (IF) 38 as main hardware components. The control unit 32, storage unit 34, communication unit 36, and interface unit 38 are connected to each other via a data bus or the like.

[0049] The control unit 32 is, for example, a processor such as a central processing unit (CPU). The control unit 32 functions as an arithmetic device performing, for example, control processing and arithmetic processing. The storage unit 34 is, for example, a storage device such as a memory or a hard disk. The storage unit 34 is, for example, a read-only memory (ROM) or random access memory (RAM). The storage unit 34 functions for storing, for example, control programs and arithmetic programs executed by the control unit 32. Furthermore, the storage unit 34 functions for temporarily storing processing data, etc. The storage unit 34 may include a database.

[0050] Communication unit 36 performs the processing required to communicate with devices constituting the teleconferencing system 20 (e.g., teleconferencing device 100). Communication unit 36 may include, for example, a communication port, a router, and a firewall. Interface unit 38 is, for example, a user interface (UI). Interface unit 38 includes input devices such as a keyboard, touch panel, or mouse, and output devices such as a display or speaker. Interface unit 38 receives data input operations performed by a user (operator) and outputs information to the user. Interface unit 38 may include a sound collector (e.g., a microphone) and an imaging device (e.g., a camera) as input devices. Furthermore, at least a portion of interface unit 38 does not need to be physically integrated with communication terminal 30. At least a portion of interface unit 38 can be connected to communication terminal 30 via a wired or wireless connection.

[0051] The communication terminal 30 includes a voice acquisition unit 42, a voice transmission unit 44, a voice reception unit 46, a voice output unit 48, a display information receiving unit 52, and an image display unit 54 as components. The voice acquisition unit 42, the voice transmission unit 44, the voice reception unit 46, the voice output unit 48, the display information receiving unit 52, and the image display unit 54 can be implemented through the aforementioned hardware configuration or software.

[0052] The voice acquisition unit 42 acquires voice messages from users of the communication terminal 30 who are participants in the teleconference. The voice acquisition unit 42 may acquire voice messages via a sound collector, which serves as an interface unit 38. The voice transmission unit 44 transmits the acquired user voice messages (voice data) to the teleconference device 100 via the network 22. The voice transmission unit 44 may also transmit voice messages (voice data) via the communication unit 36.

[0053] The voice receiving unit 46 receives the voice (voice data) of each participant in a telephone conference from the telephone conferencing device 100 via the network 22. The voice receiving unit 46 can receive the voice (voice data) through the communication unit 36. The voice output unit 48 outputs the voices of the multiple participants so that a user of the communication terminal 30 can hear them. The voice output unit 48 can output the voice through a speaker that serves as an interface unit 38.

[0054] The display information receiving unit 52 receives display information from the teleconferencing device 100 via the network 22. Here, the display information is information indicating the information displayed by the interface unit 38 of the communication terminal 30. The display information will be described later. The display information receiving unit 52 can receive the display information via the communication unit 36. The image display unit 54 displays an image corresponding to the received display information. The image display unit 54 can display the image via a display that serves as the interface unit 38.

[0055] Figure 5 illustrates a configuration of a teleconferencing apparatus 100 according to a first exemplary embodiment. The teleconferencing apparatus 100 includes a control unit 102, a storage unit 104, a communication unit 106, and an interface unit 108 as main hardware components. The control unit 102, storage unit 104, communication unit 106, and interface unit 108 are interconnected via a data bus or the like.

[0056] The control unit 102 is, for example, a processor such as a CPU. The control unit 102 functions as an arithmetic device performing, for example, analytical processing, control processing, and arithmetic processing. The storage unit 104 is, for example, a storage device such as a memory or a hard disk. The storage unit 104 is, for example, ROM or RAM. The storage unit 104 functions for storing, for example, control programs and arithmetic programs executed by the control unit 102. Furthermore, the storage unit 104 functions for temporarily storing processed data, etc. The storage unit 104 may include a database.

[0057] Communication unit 106 performs the processing required to communicate with other devices, such as communication terminal 30, via network 22. Communication unit 106 may include, for example, communication ports, routers, and firewalls. Interface unit (IF) 108 is, for example, a user interface (UI). Interface unit 108 includes input devices such as a keyboard, touch panel, or mouse, and output devices such as a display or speaker. Interface unit 108 receives operations based on input data performed by an operator and outputs information to the operator.

[0058] The teleconferencing device 100 according to the first example embodiment includes a participant information storage unit 110, a voice receiving unit 112, a speech determination unit 120, a voice output control unit 130, a count unit 140, and a display control unit 150 as components. The voice output control unit 130 includes a speech conflict determination unit 132 and a speech output suppression unit 134. The display control unit 150 includes a count display control unit 152 and an icon display control unit 154. The teleconferencing device 100 does not need to be physically composed of a single device. In this case, each of the aforementioned components can be implemented by multiple physically separate devices.

[0059] The participant information storage unit 110 functions as a participant information storage device. The voice receiving unit 112 functions as a voice receiving device. The speech determination unit 120 corresponds to the speech determination unit 2 shown in Figure 1 and functions as a speech determination device. The voice output control unit 130 corresponds to the voice output control unit 4 shown in Figure 1 and functions as a voice output control device. The count unit 140 corresponds to the counting unit 6 shown in Figure 1 and functions as a counting device. The display control unit 150 functions as a display control device.

[0060] The speech conflict determination unit 132 functions as a speech conflict determination device. The speech output suppression unit 134 functions as a speech output suppression device. The count display control unit 152 corresponds to the count display control unit 8 shown in Figure 1 and functions as a count display control device. The icon display control unit 154 functions as an icon display control device.

[0061] Note that each of the foregoing components can be implemented, for example, by executing a program under the control of control unit 102. More specifically, each of the components can be implemented by control unit 102 executing a program stored in storage unit 104. Furthermore, each of the components can be implemented as needed by installing the necessary program stored in any non-volatile recording medium. Moreover, each of the components is not necessarily implemented by software executed by a program, and can instead be implemented, for example, by any combination of hardware, firmware, and software. Furthermore, each of the components can also be implemented using a user-programmable integrated circuit such as a field-programmable gate array (FPGA) or a microcomputer. In this case, the program composed of each of the foregoing components can be implemented using that integrated circuit. This also applies to other example embodiments described later.

[0062] The participant information storage unit 110 stores participant information, which is information about the participants in the teleconference.

[0063] Figure 6 is a diagram illustrating participant information according to a first example embodiment. Figure 6 shows participant information corresponding to the four participants A through D in a conference call. The participant information includes each participant's identifier, participation status, and number of conflicts.

[0064] Here, "participation status" indicates how each participant is currently engaged in the conference call. Participation status is determined by the speech determination unit 120 and the speech conflict determination unit 132, which will be described later. In the example of Figure 6, participant A speaks while participant B is speaking. That is, participant A causes a speaking conflict. Therefore, participant A's participation state is "speaking conflict," while participant B's participation state is "speaking." Furthermore, participant C echoes, and participant D is not speaking. Therefore, participant C's participation state is "echoing," while participant D's participation state is "no speech."

[0065] Furthermore, the "Number of Conflicts" indicates the number of times each participant has caused a speaking conflict, i.e., the number of conflicting statements made by each participant. The number of conflicts is counted by the count unit 140, which will be described later. In the example of Figure 6, participant A has one conflict. As mentioned above, since participant A has caused a speaking conflict, the conflict count is updated from 0 to 1. Furthermore, participant B has two conflicts, participant C has one conflict, and participant D has zero conflicts.
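
As an illustration of the participant information described for Figure 6, the record for each participant could be held in a small data structure such as the one below. This is only a sketch: the class name `ParticipantInfo`, its field names, and the helper `record_conflict` are hypothetical, and the values are those given in the text for participants A through D.

```python
from dataclasses import dataclass


@dataclass
class ParticipantInfo:
    identifier: str            # participant identifier, e.g. "A"
    participation_status: str  # "speaking", "speaking conflict", "echoing", "no speech"
    conflict_count: int = 0    # number of conflicting statements so far


# State described for Figure 6 (after participant A's count went from 0 to 1).
participants = {
    "A": ParticipantInfo("A", "speaking conflict", 1),
    "B": ParticipantInfo("B", "speaking", 2),
    "C": ParticipantInfo("C", "echoing", 1),
    "D": ParticipantInfo("D", "no speech", 0),
}


def record_conflict(info):
    """Mark a participant as having caused a speaking conflict and count it."""
    info.participation_status = "speaking conflict"
    info.conflict_count += 1
```

A count display could then read `conflict_count` for each entry and send it to every terminal as display information.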

[0066] The voice receiving unit 112 receives the voice (voice data) of the participants, who are users of each communication terminal 30, from each communication terminal 30 via the network 22. The voice receiving unit 112 also receives the voice (voice data) of the participants transmitted by the voice transmitting unit 44 of the communication terminal 30 via the communication unit 106. Therefore, the voices of participants A to D are received.

[0067] The speech determination unit 120 analyzes the speech received by the speech receiving unit 112 and performs speech recognition processing for each of the multiple participants. Then, the speech determination unit 120 determines whether each participant's speech indicates speaking or an agreement (echo). That is, the speech determination unit 120 determines which kind of utterance (speaking or agreeing) each participant has made.

[0068] Specifically, the speech determination unit 120 analyzes the words included in the speech by performing processes such as acoustic analysis and natural language processing. Then, the speech determination unit 120 determines whether the speech contains meaningful words (subject, predicate, object, etc.). In other words, the speech determination unit 120 determines whether the speech contains words other than meaningless words (interjections, etc.). If the speech contains meaningful words, the speech determination unit 120 determines that the speech is "speaking." On the other hand, when the speech contains only meaningless words (interjections, etc.), the speech determination unit 120 determines that the speech is an "echo." The speech determination unit 120 can also determine whether the received speech includes a human voice. If the speech does not include a human voice, the speech determination unit 120 can treat it as background noise and skip the above determination of whether the speech is speaking or an echo.
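
The decision rule above can be sketched in a few lines, assuming a transcript has already been produced by some speech recognizer. The word list `INTERJECTIONS`, the function name `classify_utterance`, and the regular-expression tokenizer are illustrative stand-ins for the acoustic analysis and natural language processing mentioned in the text, not the method the disclosure specifies.

```python
import re

# Hypothetical list of words treated as not meaningful in themselves.
INTERJECTIONS = {"uh", "um", "hmm", "mm", "yeah", "yes", "ok", "okay",
                 "right", "uh-huh"}


def classify_utterance(transcript):
    """Return "speak", "echo", or None when no words were recognized.

    Assumption: an empty transcript means the audio held no human voice
    (background noise), so no speak/echo determination is made.
    """
    words = re.findall(r"[a-z'-]+", transcript.lower())
    if not words:
        return None
    if all(w in INTERJECTIONS for w in words):
        return "echo"   # only interjection-like words
    return "speak"      # at least one other word


print(classify_utterance("Uh-huh, yeah."))
print(classify_utterance("I think we should move the deadline."))
```

A real system would need a language-appropriate notion of "meaningful words"; the fixed English word list here only makes the branching visible.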

[0069] The voice output control unit 130 performs control such that the voice of each of the multiple participants is output by the communication terminal 30 of each of the multiple participants. Specifically, the voice output control unit 130 transmits the received voice (voice data) to the communication terminal 30 of each of the multiple participants via the network 22 through the communication unit 106. Therefore, the voice is output by the voice output unit 48 of each communication terminal 30. Thus, participants A through D can hear the voices of the other participants. Furthermore, the voice output control unit 130 can perform mixing processing so that the voices of multiple participants are not interrupted when they are transmitted simultaneously. However, in this example embodiment, as will be described later, when a speaking conflict occurs, the output of the voice causing the conflict is suppressed. On the other hand, when the voice corresponds to an echo, the voice output control unit 130 transmits the voice to the communication terminal 30 of each of the multiple participants. Therefore, the echo of the participants is output by the voice output unit 48 of each communication terminal 30.

[0070] The speech conflict determination unit 132 determines for each of the multiple participants whether a speech conflict has occurred. Specifically, when the speech determination unit 120 determines that a participant is speaking, the speech conflict determination unit 132 determines whether another participant has started speaking during the time period from the start to the end of that participant's speech. When another participant starts speaking during the time period in which one participant is speaking, the speech conflict determination unit 132 determines that the other participant (i.e., the participant who spoke later) has caused a speech conflict. The speech of the other participant that caused the speech conflict is called a conflicting speech. In the example of Figure 6, since participant A started speaking during the time period when participant B was speaking, the speech conflict determination unit 132 determines that participant A has caused a speech conflict and that participant A's speech is a conflicting speech.
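As a rough sketch of this determination, the check below treats each speech as a time interval and reports a conflict when a new speech starts inside an earlier one. The `Speech` record, the function name `find_conflicted`, and the handling of simultaneous starts are assumptions made for illustration, not details given in the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Speech:
    participant: str
    start: float  # seconds from the start of the conference
    end: float    # seconds from the start of the conference


def find_conflicted(ongoing: List[Speech], new: Speech) -> Optional[Speech]:
    """Return the earlier speech that `new` collides with, or None.

    A collision means `new` starts between the start and end of a speech
    by a different participant. Treating an identical start time as a
    conflict for `new` is an assumption of this sketch.
    """
    for earlier in ongoing:
        same_person = earlier.participant == new.participant
        if not same_person and earlier.start <= new.start < earlier.end:
            return earlier
    return None


# Figure 6 style example: A starts while B is still speaking.
b_speech = Speech("B", start=0.0, end=10.0)
a_speech = Speech("A", start=4.0, end=6.0)
print(find_conflicted([b_speech], a_speech) is b_speech)
```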

[0071] The speech output suppression unit 134 performs control to suppress the output of conflicting speech. Specifically, the speech output suppression unit 134 performs control to prevent conflicting speech (voice data) from being sent to the communication terminals 30 of the multiple participants. As a result, since each communication terminal 30 does not receive the conflicting speech (voice data), the communication terminal 30 does not output it. Therefore, in the example of Figure 6, participant A's speech (the conflicting speech) does not interfere with hearing participant B's speech at any communication terminal 30. Alternatively, the speech output suppression unit 134 can perform control such that each communication terminal 30 outputs the conflicting speech at a low volume. For example, the speech output suppression unit 134 can process the voice data of the conflicting speech such that its volume is reduced to a level that does not interfere with hearing the earlier speech with which it conflicts (participant B's speech in the example of Figure 6). Then, the voice output control unit 130 can send the processed voice data to each communication terminal 30. In that case, in the example of Figure 6, each communication terminal 30 outputs participant A's speech at an extremely low volume so as not to interfere with hearing participant B's speech.
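The two suppression options above (not sending the conflicting speech at all, or sending it at a reduced volume) might be sketched as follows. The representation of voice data as a list of 16-bit PCM samples, the function name `suppress`, and the `gain` parameter are assumptions for illustration; the text does not specify how the volume is reduced.

```python
from typing import List, Optional


def suppress(samples: List[int], gain: Optional[float] = None) -> List[int]:
    """Suppress conflicting speech given as PCM samples.

    gain=None drops the audio entirely, so nothing is sent to the terminals.
    A gain between 0 and 1 scales the samples down so the conflicting
    speech is output at a low volume instead.
    """
    if gain is None:
        return []
    if not 0.0 <= gain <= 1.0:
        raise ValueError("gain must be between 0 and 1")
    return [int(sample * gain) for sample in samples]


conflicting = [1000, -2000, 400]
print(suppress(conflicting))        # dropped entirely
print(suppress(conflicting, 0.25))  # attenuated
```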

[0072] The count unit 140 counts the number of times conflicting statements occur for each of the multiple participants. In other words, the count unit 140 counts the number of conflicting statements for each of the multiple participants (communication terminals 30). In this way, the numbers of conflicts shown in Figure 6 are counted.

[0073] The display control unit 150 controls which image to display on each communication terminal 30 for each of the multiple participants. Specifically, the display control unit 150 generates display information indicating the image to be displayed on each communication terminal 30. Then, the display control unit 150 sends the generated display information to each communication terminal 30. Furthermore, the display control unit 150 can generate display information based on participant information stored in the participant information storage unit 110. Note that the display control unit 150 can send display information to the communication terminal 30 of the participant causing the speaking conflict, displaying a message indicating that another participant is speaking. Additionally, the display control unit 150 can generate display information including participant information and instructions, which instruct the display to be performed based on the participant information. In this case, the communication terminal 30 generates the image displayed by the interface unit 28 of the communication terminal 30 based on the display information.

[0074] The count display control unit 152 performs control such that the number of conflicting statements by each of the multiple participants is displayed on each communication terminal 30. Specifically, the count display control unit 152 generates display information indicating how many conflicting statements each participant has. Then, when the display control unit 150 sends the display information to the multiple communication terminals 30, the multiple communication terminals 30 display the number of conflicting statements for each participant. The example of Figure 6 shows that, on each communication terminal 30 of participants A through D, participant A has 1 conflict, participant B has 2 conflicts, participant C has 1 conflict, and participant D has zero conflicts. Therefore, each participant can know the number of conflicts for all participants. Thus, each participant can determine which participant wants to speak.

[0075] The count display control unit 152 can display conflict counts exceeding a predetermined threshold in a more prominent manner than conflict counts equal to or less than the threshold. That is, when a participant's conflict count exceeds the predetermined threshold, the count display control unit 152 can display that conflict count in a more prominent manner than the conflict counts of the other participants. The count display control unit 152 generates display information including instructions for displaying the conflict count in this manner. For example, the count display control unit 152 can display conflict counts equal to or less than the threshold in black and conflict counts exceeding the threshold in red. Therefore, each participant can more reliably identify which participant wants to speak.
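The threshold rule above amounts to a simple comparison per participant. The sketch below uses the conflict counts from the Figure 6 example; the function name `colors_by_threshold` and the threshold value of 1 are assumptions chosen for illustration.

```python
from typing import Dict


def colors_by_threshold(counts: Dict[str, int], threshold: int) -> Dict[str, str]:
    """Map each participant to "red" if the count exceeds the threshold,
    otherwise "black" (counts equal to the threshold stay black)."""
    return {
        name: ("red" if count > threshold else "black")
        for name, count in counts.items()
    }


figure6_counts = {"A": 1, "B": 2, "C": 1, "D": 0}
print(colors_by_threshold(figure6_counts, threshold=1))
```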

[0076] Furthermore, the count display control unit 152 can enable each communication terminal 30 to display the highest number of conflicts among multiple participants in a more prominent display format than other conflict count displays. The count display control unit 152 generates display information including instructions for displaying the conflict count in this format. For example, the count display control unit 152 can display the highest conflict count in red and the other conflict counts in black. Therefore, each participant can more reliably identify which participant has a higher number of conflicts than the others. This allows for a more reliable identification of which participant is relatively more eager to speak.

[0077] Furthermore, the count display control unit 152 can cause each communication terminal 30 to display, in a more prominent manner, a conflict count that is significantly larger than the other conflict counts. For example, the count display control unit 152 subtracts the conflict count of each of the other participants from the conflict count of a first participant among the multiple participants. Then, when all values obtained by the subtraction are greater than a predetermined threshold, the count display control unit 152 can cause the first participant's conflict count to be displayed in a more prominent manner than the conflict counts of the other participants. The count display control unit 152 generates display information including instructions for displaying the conflict counts in this manner. For example, the count display control unit 152 can cause the first participant's conflict count to be displayed in red and the conflict counts of the other participants to be displayed in black. Therefore, each participant can more reliably identify which participant's conflict count is significantly larger than those of the other participants, and thus which participant is relatively more eager to speak.
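The subtraction rule above can be sketched as below: a participant stands out only if the difference to every other participant's count exceeds the threshold. The function name `standout_participants` and the example numbers are assumptions for illustration.

```python
from typing import Dict, List


def standout_participants(counts: Dict[str, int], margin: int) -> List[str]:
    """Return participants whose count exceeds every other participant's
    count by more than `margin`."""
    standouts = []
    for name, count in counts.items():
        others = [c for other, c in counts.items() if other != name]
        # Every pairwise difference must be greater than the margin.
        if others and all(count - c > margin for c in others):
            standouts.append(name)
    return standouts


# Hypothetical counts: B is far ahead of everyone else.
print(standout_participants({"A": 1, "B": 5, "C": 1, "D": 0}, margin=2))
# With the Figure 6 counts, B leads A and C by only 1, so nobody stands out.
print(standout_participants({"A": 1, "B": 2, "C": 1, "D": 0}, margin=2))
```

At most one participant can satisfy this rule for a non-negative margin, which matches the text's description of a single "first participant" being highlighted.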

[0078] The icon display control unit 154 performs control to display facial icons corresponding to each of the multiple participants on the communication terminal 30 of each participant. The icon display control unit 154 generates display information including instructions for displaying the facial icons. In the example of Figure 6, four facial icons corresponding to participants A through D are displayed on the communication terminal 30.

[0079] Here, the icon display control unit 154 can generate display information such that each facial icon moves according to the participation status of the corresponding participant (i.e., action, operation, or activation). Specifically, the icon display control unit 154 can display facial icons such that the facial icons of participants who have made conflicting statements are not moved. On the other hand, the icon display control unit 154 can display facial icons such that the facial icons of participants who have made statements other than conflicting statements are moved. Furthermore, the icon display control unit 154 can display facial icons such that the facial icons of participants who have agreed are moved.

[0080] For example, the icon display control unit 154 can display facial icons such that the mouth of the facial icon of a participant who has not spoken (neither speech nor agreement) is closed. In the example of Figure 6, the mouth of participant D's facial icon is closed. Furthermore, the icon display control unit 154 can display facial icons such that the mouth of the facial icon of a participant who has spoken (excluding conflicting speech) is open. In the example of Figure 6, the mouth of participant B's facial icon is open. Alternatively, the icon display control unit 154 can display facial icons such that the mouth of the facial icon of a participant who has spoken other than in a conflicting speech opens and closes. Furthermore, the icon display control unit 154 can display facial icons such that the mouth of the facial icon of a participant who has echoed (…) is open. In the example of Figure 6, the mouth of participant C's facial icon is open. Alternatively, the icon display control unit 154 can display facial icons such that the mouth of the facial icon of a participant who has agreed opens and closes. On the other hand, the icon display control unit 154 can display facial icons such that the mouth of the facial icon of a participant who has made a conflicting speech remains closed. In the example of Figure 6, the mouth of participant A's facial icon is closed.
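The icon behavior in the Figure 6 example (D silent and closed, B speaking and open, C echoing and open, A conflicting and closed) can be summarized as a small lookup. The `Status` enumeration and the function name `mouth_state` are hypothetical names introduced for this sketch.

```python
from enum import Enum


class Status(Enum):
    SILENT = "silent"      # has not spoken
    SPEECH = "speech"      # speaking, not in conflict
    ECHO = "echo"          # brief agreement
    CONFLICT = "conflict"  # made a conflicting speech


def mouth_state(status: Status) -> str:
    """Return "open" for non-conflicting speech or an echo, else "closed"."""
    if status in (Status.SPEECH, Status.ECHO):
        return "open"
    # Silent participants and participants in conflict are not animated.
    return "closed"


figure6 = {"A": Status.CONFLICT, "B": Status.SPEECH,
           "C": Status.ECHO, "D": Status.SILENT}
print({name: mouth_state(status) for name, status in figure6.items()})
```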

[0081] Therefore, each participant can see the facial icons displayed on each communication terminal 30 and know which participant is speaking. Furthermore, even if the communication terminal 30 of a participant who has agreed is muted, each participant can still know that this participant has agreed. Moreover, since the facial icon of the participant who caused the speaking conflict does not move, each participant can avoid being annoyed by the speaking conflict.

[0082] Figure 7 is a flowchart illustrating a teleconference method performed by the teleconference system 20 according to the first example embodiment. The processing shown in Figure 7 is primarily performed by the teleconference device 100. The teleconference device 100 starts the teleconference (step S102). At this time, the display information generated by the display control unit 150 indicates, for all participants, that the mouth of the face icon is closed (the face icon is not moving) and that the number of conflicts is zero.

[0083] Next, the voice receiving unit 112 receives the voice of participant X (step S104). Here, when participants A to D participate in the teleconference shown in Figure 6, participant X (and participant Y, who will be described later) is one of participants A through D. Then, the speech determination unit 120 determines whether participant X's voice indicates speaking or echoing, as described above (step S106). When participant X's voice does not indicate speaking (i.e., indicates echoing) (No in step S108), the voice output control unit 130 performs control such that participant X's echo is output at each communication terminal 30 (step S112). Furthermore, the display control unit 150 (icon display control unit 154) performs control such that participant X's facial icon displayed by each communication terminal 30 moves (step S114).

[0084] On the other hand, when participant X's voice indicates speaking (Yes in S108), the speech conflict determination unit 132 determines whether participant Y, who is different from participant X, is already speaking (step S120). When participant Y is not speaking (No in S120), no speech conflict occurs because no one else is speaking when participant X speaks. Therefore, the voice output control unit 130 performs control so that each communication terminal 30 outputs participant X's speech (step S122). In addition, the display control unit 150 (icon display control unit 154) performs control so that participant X's face icon displayed by each communication terminal 30 moves (step S124). At this time, the display control unit 150 can perform control so that each communication terminal 30 displays a message indicating that participant X is speaking.

[0085] On the other hand, when participant Y is speaking (Yes in S120), a speech conflict occurs due to participant X's speech. Therefore, the voice output control unit 130 (speech output suppression unit 134) performs control to suppress the output of participant X's speech (step S132). In addition, the count unit 140 increments the number of conflicts for participant X by 1 (step S134). Therefore, the number of conflicts for participant X in the participant information stored in the participant information storage unit 110 is updated. The display control unit 150 (count display control unit 152) performs control to update the display of the number of conflicts for participant X (step S136). The display control unit 150 performs control to display "Another participant is speaking" on participant X's communication terminal 30 (step S138).
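The branches of Figure 7 (steps S104 to S138) can be traced with a small sketch like the one below. The `Teleconference` class, its `events` log, and the `on_speech_end` helper are assumptions introduced for illustration; real output and display control are replaced by log entries, and the voice is assumed to be already classified as "speech" or "echo".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Teleconference:
    conflict_counts: Dict[str, int]        # all zero at the start (S102)
    current_speaker: Optional[str] = None
    events: List[str] = field(default_factory=list)

    def on_voice(self, participant: str, kind: str) -> None:
        """Handle one received voice; kind is "speech" or "echo" (S104, S106)."""
        if kind != "speech":                                 # No in S108
            self.events.append(f"output echo of {participant}")       # S112
            self.events.append(f"move icon of {participant}")         # S114
        elif self.current_speaker in (None, participant):    # No in S120
            self.current_speaker = participant
            self.events.append(f"output speech of {participant}")     # S122
            self.events.append(f"move icon of {participant}")         # S124
        else:                                                # Yes in S120
            self.events.append(f"suppress speech of {participant}")   # S132
            self.conflict_counts[participant] += 1                    # S134
            self.events.append(f"update count of {participant}")      # S136
            self.events.append(
                f"notify {participant}: Another participant is speaking")  # S138

    def on_speech_end(self, participant: str) -> None:
        """Hypothetical helper: clear the speaker when their speech ends."""
        if self.current_speaker == participant:
            self.current_speaker = None


conf = Teleconference({"A": 0, "B": 0, "C": 0, "D": 0})
conf.on_voice("B", "speech")   # B speaks first
conf.on_voice("A", "speech")   # A collides with B
print(conf.conflict_counts)
```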

[0086] (Second Example Embodiment)

[0087] Next, a second example embodiment will be described with reference to the accompanying drawings. For clarity, parts of the following description and drawings have been omitted or simplified as appropriate. Furthermore, throughout the drawings, the same components are denoted by the same reference numerals, and repeated descriptions will be omitted as necessary. The second example embodiment differs from the first example embodiment in that the functionality of the teleconference device 100 according to the first example embodiment can be implemented in each communication terminal.

[0088] Figure 8 is a diagram illustrating a teleconferencing system 200 according to the second example embodiment. The teleconferencing system 200 includes a plurality of communication terminals 201A to 201D and a conference server 220. The communication terminals 201A to 201D are connected to a network such as the Internet. The communication terminals 201A to 201D and the conference server 220 are connected to each other via the network, enabling them to communicate with each other. Although four communication terminals 201 are shown in Figure 8, the number of communication terminals 201 can be any number of two or more.

[0089] Each of the multiple communication terminals 201A to 201D includes a conference execution system 202, a camera 203, a microphone 204, a display 205, and a speaker 206. The conference execution system 202 is used to conduct telephone conferences. The camera 203 can capture (i.e., take pictures) the image (face, etc.) of the user of the communication terminal 201. The microphone 204 can collect the voice of the user of the communication terminal 201. The display 205 can display images related to the telephone conference. The speaker 206 can output the voices of the participants in the telephone conference (i.e., the users of communication terminals 201A to 201D).

[0090] The conference execution system 202 includes a speaking status detection unit 207, a conference information receiving unit 208, a conference control unit 209, and a conference information sending unit 210 as components. Each communication terminal 201 may include the hardware configuration of the communication terminal 30 according to the first example embodiment described above. A description of each component of the communication terminal 201 will be given later.

[0091] The communication terminal 201 sends voice information indicating the voice of its user to the conference server 220. The communication terminal 201 also detects the speaking status of its user and sends speaking status information indicating that status to the conference server 220. Here, "speaking status" indicates whether each participant is speaking or echoing. The speaking status can also indicate that a participant is silent.

[0092] Upon receiving voice information and speaking status information from each communication terminal 201, the conference server 220 performs mixing processing on the voice information of each user (conference participant). The conference server 220 then sends the speaking status information and the mixed voice information to multiple communication terminals 201. By sending the mixed voice information, the voice can be stably output from the speaker 206 at each communication terminal 201.
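The text does not say how the mixing processing is carried out. One common approach, shown here purely as an assumption, is to sum the participants' PCM samples frame by frame and clip the result to the 16-bit range. The function name `mix` and the sample representation are hypothetical.

```python
from itertools import zip_longest
from typing import List

INT16_MIN, INT16_MAX = -32768, 32767


def mix(streams: List[List[int]]) -> List[int]:
    """Mix several 16-bit PCM streams by summing and clipping.

    Shorter streams are padded with silence (0) so that all streams
    contribute for as long as they have samples.
    """
    mixed = []
    for frame in zip_longest(*streams, fillvalue=0):
        total = sum(frame)
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed


print(mix([[100, 200], [50, -300, 7]]))
```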

[0093] Figure 9 is a state diagram illustrating the sending and receiving of speaking status information in the teleconference system 200 according to the second example embodiment. Communication terminal 201A (communication terminal A) sends the speaking status information of user A of communication terminal 201A to the conference server 220. Communication terminal 201B (communication terminal B) sends the speaking status information of user B of communication terminal 201B to the conference server 220. Communication terminal 201C (communication terminal C) sends the speaking status information of user C of communication terminal 201C to the conference server 220. Communication terminal 201D (communication terminal D) sends the speaking status information of user D of communication terminal 201D to the conference server 220.

[0094] Communication terminal 201A receives the speaking status information of all users (users A to D) from the conference server 220. Similarly, communication terminals 201B to 201D receive the speaking status information of all users (users A to D) from the conference server 220. Alternatively, each communication terminal 201 can receive from the conference server 220 the speaking status information of all users except its own user. For example, communication terminal 201A can receive the speaking status information of users B to D from the conference server 220.

[0095] Figure 10 is a block diagram illustrating the configuration of the speaking status detection unit 207 according to the second example embodiment. The speaking status detection unit 207 corresponds to the speech determination unit 2 shown in Figure 1 and the speech determination unit 120 shown in Figure 5. The speaking status detection unit 207 functions as a speech determination device. The speaking status detection unit 207 includes a voice input unit 222, a voice detection unit 223, a language recognition unit 224, and a speech presence / absence determination unit 225.

[0096] The voice input unit 222 receives the voice signal collected by the microphone 204 (the voice signal of the user of the communication terminal 201). The voice detection unit 223 detects voice information from the voice signal. The language recognition unit 224 performs speech recognition processing, acoustic analysis, natural language processing, etc., in order to identify meaningful language (subject, predicate, object, etc.) from the voice information.

[0097] The speech presence / absence determination unit 225 determines whether the voice information corresponds to a speech or an echo. When language (meaningful words) is recognized from the voice information, the speech presence / absence determination unit 225 determines that the voice information corresponds to a speech. When language is not recognized from the voice information, the speech presence / absence determination unit 225 determines that the voice information corresponds to an echo. When human speech is not recognized from the voice information, the speech presence / absence determination unit 225 can determine that the voice information corresponds to "silence" (a state of neither speaking nor echoing). The speaking status detection unit 207 generates speaking status information based on the determination result of the speech presence / absence determination unit 225. Alternatively, the speaking status information can be generated by the conference control unit 209.

[0098] The conference information receiving unit 208 and the conference information sending unit 210 are connected to the conference server 220 via a network. The conference information receiving unit 208 receives, from the conference server 220, the conference information of the users of communication terminals 201A to 201D. The conference information sending unit 210 sends the conference information of the user of its own communication terminal 201 to the conference server 220. For example, communication terminal 201A sends the conference information of user A to the conference server 220.

[0099] Figure 11 is a diagram illustrating conference information according to the second example embodiment. The conference information includes facial icon display information, speaking status information, voice information, and conflict count information. Furthermore, the conference information may include identification information for the corresponding user (communication terminal 201). The facial icon display information indicates how the corresponding user's facial icon should be displayed. The conflict count information indicates the number of conflicts for the corresponding user. Note that the conference information sent by the conference information sending unit 210 does not necessarily include all the information shown in Figure 11. Likewise, the conference information received by the conference information receiving unit 208 does not necessarily include all the information shown in Figure 11.
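One way to picture the conference information is as a record whose fields are all optional apart from the user identification. The `ConferenceInfo` class, its field names, and the `to_payload` method are assumptions made for this sketch; the text does not define a wire format.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ConferenceInfo:
    user_id: str                           # identifies the user / terminal 201
    icon_display: Optional[str] = None     # how to draw the facial icon
    speaking_status: Optional[str] = None  # e.g. "speech", "echo", "silence"
    voice: Optional[bytes] = None          # voice information, if any
    conflict_count: Optional[int] = None   # number of conflicts for the user

    def to_payload(self) -> dict:
        """Keep only the fields that are actually set, since a given message
        need not carry every item of conference information."""
        return {key: value for key, value in asdict(self).items()
                if value is not None}


silent = ConferenceInfo("A", icon_display="mouth_closed",
                        speaking_status="silence")
print(silent.to_payload())
```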

[0100] The conference control unit 209 generates conference information to be sent by the conference information sending unit 210. In other words, the conference control unit 209 determines which of the information shown in Figure 11 is sent as conference information. The conference control unit 209 uses the conference information received by the conference information receiving unit 208 to generate the conference information sent by the conference information sending unit 210. The conference control unit 209 causes the display 205 to display the conference image using the conference information received by the conference information receiving unit 208. The conference control unit 209 causes the speaker 206 to output audio using the conference information received by the conference information receiving unit 208.

[0101] Figure 12 is a diagram illustrating the configuration of the conference control unit 209 according to the second example embodiment. The conference control unit 209 includes a voice output control unit 211, a count unit 215, and a display control unit 216. The voice output control unit 211 includes a speech conflict determination unit 212 and a speech output suppression unit 214. The display control unit 216 includes a count display control unit 217 and an icon display control unit 218. The conference control unit 209 can be configured to perform, only for the user of the corresponding communication terminal 201, the processing that the teleconference device 100 according to the first example embodiment performs for each participant.

[0102] The voice output control unit 211 corresponds to the voice output control unit 4 shown in Figure 1 and the voice output control unit 130 shown in Figure 5. The voice output control unit 211 functions as a voice output control device. The speech conflict determination unit 212 corresponds to the speech conflict determination unit 132 shown in Figure 5. The speech conflict determination unit 212 functions as a speech conflict determination device. The speech output suppression unit 214 corresponds to the speech output suppression unit 134 shown in Figure 5. The speech output suppression unit 214 functions as a speech output suppression device. The count unit 215 corresponds to the counting unit 6 shown in Figure 1 and the count unit 140 shown in Figure 5. The count unit 215 functions as a counting device. The display control unit 216 corresponds to the display control unit 150 shown in Figure 5. The display control unit 216 functions as a display control device. The count display control unit 217 corresponds to the count display control unit 8 shown in Figure 1 and the count display control unit 152 shown in Figure 5. The count display control unit 217 functions as a count display control device. The icon display control unit 218 corresponds to the icon display control unit 154 shown in Figure 5. The icon display control unit 218 functions as an icon display control device.

[0103] The voice output control unit 211 performs control such that the voice of each participant in a teleconference is output by the corresponding communication terminal 201. The voice output control unit 211 also performs control such that the voice of a user corresponding to communication terminal 201 is output by the communication terminal 201 (first communication terminal) of each participant. For example, in communication terminal 201A, the voice output control unit 211 performs control such that the voice of user A is output by the communication terminal 201 of each participant. The voice output control unit 211 may include functions substantially similar to those of the voice output control unit 130.

[0104] The speech conflict determination unit 212 determines whether a speech conflict has occurred for the user of the corresponding communication terminal 201. For example, in communication terminal 201A, the speech conflict determination unit 212 determines whether a speech conflict has occurred due to user A's speech. The speech conflict determination unit 212 uses the conference information about the other users received by the conference information receiving unit 208 to determine whether user A has started speaking during the time period in which another user was speaking. The speech conflict determination unit 212 may include functions substantially similar to those of the speech conflict determination unit 132.

[0105] When a user of the corresponding communication terminal 201 causes a conflicting statement, the statement output suppression unit 214 performs control to suppress the output of the conflicting statement in the communication terminal 201 (first communication terminal) of each of the multiple participants. For example, in communication terminal 201A, the statement output suppression unit 214 performs control to suppress the output of the conflicting statement in the communication terminal 201 (first communication terminal) of each of the multiple participants when user A causes a conflicting statement. The statement output suppression unit 214 may include functions substantially similar to those of the statement output suppression unit 134.

[0106] The count unit 215 counts the number of times a user in the corresponding communication terminal 201 has had a speech conflict. For example, in communication terminal 201A, the count unit 215 counts the number of times user A has had a speech conflict. The count unit 215 may include functions substantially similar to those of the count unit 140.

[0107] For each user corresponding to communication terminal 201, display control unit 216 controls which image is displayed on the communication terminal 201 (first communication terminal) of each of the multiple participants. For example, in communication terminal 201A, display control unit 216 controls which image of user A is displayed on the communication terminal 201 (first communication terminal) of each of the multiple participants. Display control unit 216 may include functions substantially similar to those of display control unit 150.

[0108] The count display control unit 217 performs control such that each participant's communication terminal 201 (first communication terminal) displays the number of conflicting statements made by the user corresponding to the communication terminal 201. For example, in communication terminal 201A, the count display control unit 217 performs control such that each participant's communication terminal 201 (first communication terminal) displays the number of conflicting statements made by user A. The count display control unit 217 may include functions substantially similar to those of the count display control unit 152.

[0109] The icon display control unit 218 performs control to display a facial icon corresponding to the user of the corresponding communication terminal 201 on the communication terminal 201 (first communication terminal) of each of the multiple participants. For example, in communication terminal 201A, the icon display control unit 218 performs control to display a facial icon corresponding to user A on the communication terminal 201 (first communication terminal) of each of the multiple participants. The icon display control unit 218 may include functions substantially similar to those of the icon display control unit 154.

[0110] Figure 13 is a flowchart illustrating a teleconference method performed by the teleconference system 200 according to the second example embodiment. The teleconference method shown in Figure 13 is primarily executed by the conference execution system 202 of each communication terminal 201. The following description focuses on the processing of communication terminal 201A, but the same applies to the other communication terminals 201.

[0111] First, the conference execution system 202 is activated (step S201). At this time, the number of conflicts among all participants in the conference call is zero. Furthermore, the face icons of all participants in the conference call all have their mouths closed. Then, the speaking status detection unit 207 (voice input unit 222) inputs a voice signal from the microphone 204 of the communication terminal 201A (step S202). The voice detection unit 223 determines whether user A has spoken (step S203).

[0112] When it is determined that user A's voice is not present (No in S203), the conference control unit 209 generates conference information corresponding to this determination for user A and sends it to the conference server 220 (step S204). Then, the processing flow returns to S202. Specifically, the conference information generated and sent in S204 includes speaking status information indicating silence and facial icon display information indicating a closed mouth. The conference server 220 sends this conference information to communication terminals 201A to 201D. As a result, a facial icon of user A with a closed mouth is displayed on the display 205 of each communication terminal 201. Since the conference information does not include voice information, the speaker 206 of each communication terminal 201 does not output user A's voice. An example of the facial icon will be described later.

[0113] In the processing of S204, the speaking status detection unit 207 generates speaking status information indicating silence. The icon display control unit 218 of the display control unit 216 generates facial icon display information indicating that the mouth is not open. The voice output control unit 211 determines that voice information is not to be included in the conference information. The conference information may include conflict count information indicating that the conflict count is zero. In this case, the count display control unit 217 may generate conflict count information indicating that the conflict count has not increased.

[0114] On the other hand, when it is determined that user A's voice exists (Yes in S203), the language recognition unit 224 performs the aforementioned language recognition (step S205). Then, the speech presence / absence determination unit 225 determines whether language exists in the voice information (step S206), that is, whether language is recognized from the voice information. If language does not exist (No in S206), the speech presence / absence determination unit 225 determines that user A's voice information corresponds to an echo (an expression of agreement).

[0115] Then, the conference control unit 209 generates conference information for user A corresponding to this determination and sends it to the conference server 220 (step S207). The processing flow then returns to S202. Specifically, the conference information includes speaking status information indicating an echo (agreement), facial icon display information indicating an open mouth, and voice information. The conference server 220 sends this conference information to communication terminals 201A to 201D. As a result, the open-mouth facial icon of user A is displayed on the display 205 of each communication terminal 201. Furthermore, user A's voice (the echo) is output by the speaker 206 of each communication terminal 201.

[0116] In the processing of S207, the speaking status detection unit 207 generates speaking status information indicating an echo (agreement). Furthermore, the icon display control unit 218 of the display control unit 216 generates facial icon display information indicating an open mouth. The voice output control unit 211 determines that the voice information is to be included in the conference information. The conference information may include conflict count information indicating that the conflict count has not increased. In this case, the conflict count display control unit 217 can generate that conflict count information.

[0117] On the other hand, if language exists (Yes in S206), the speech presence / absence determination unit 225 determines that a speech exists in user A's voice information (step S208). In this case, the speech conflict determination unit 212 of the conference control unit 209 determines whether there is no speech from another user (step S209). Specifically, the speech conflict determination unit 212 uses the conference information (voice information and speaking status information) received from the other users to determine whether no other user had started speaking before user A's speech, that is, whether user A's speech has not caused a speech conflict.

[0118] When it is determined that there is no speech from another user (Yes in S209), the conference control unit 209 determines that user A's speech has not caused a speech conflict. Then, the conference control unit 209 generates conference information for user A corresponding to this determination and sends it to the conference server 220 (step S210). The processing flow then returns to S202. Specifically, the conference information includes speaking status information indicating a speech, facial icon display information indicating an open mouth, and voice information. The conference server 220 sends this conference information to communication terminals 201A to 201D. As a result, user A's open-mouth facial icon is displayed on the display 205 of each communication terminal 201. Furthermore, user A's voice (the speech) is output by the speaker 206 of each communication terminal 201. At this time, the conference information may include display information indicating that user A is speaking. In this case, a message indicating that user A is speaking is displayed on the display 205 of each communication terminal 201. Since each user can thus know who is speaking, meeting minutes can be created easily.

[0119] In the processing of S210, the speaking status detection unit 207 generates speaking status information indicating that someone is speaking. Furthermore, the icon display control unit 218 of the display control unit 216 generates facial icon display information indicating an open mouth. The voice output control unit 211 determines that this voice information is included in the conference information. The conference information may include conflict count information indicating that the conflict count has not yet increased. In this case, the conflict count display control unit 217 can generate conflict count information indicating that the conflict count has not yet increased.

[0120] On the other hand, when it is determined that there is a speech from another user (No in S209), the conference control unit 209 determines that user A's speech has caused a speech conflict. Then, the conference control unit 209 causes the display 205 of the communication terminal 201A to display a message such as "Another user is speaking" (step S211). Then, the conference control unit 209 generates conference information for user A corresponding to this determination and sends it to the conference server 220 (step S212). The processing flow then returns to S202. Specifically, the conference information includes speaking status information indicating a conflicted speech, facial icon display information indicating a closed mouth, and conflict count information indicating that the conflict count has been incremented by 1. The conference server 220 sends the conference information to the communication terminals 201A to 201D. As a result, the closed-mouth facial icon of user A is displayed on the display 205 of each communication terminal 201. Furthermore, the conflict count for user A, incremented by 1, is displayed on the display 205 of each communication terminal 201. Since voice information is not included in the conference information, the speaker 206 of each communication terminal 201 does not output user A's voice.

[0121] In the processing of S212, the speaking status detection unit 207 generates speaking status information indicating a conflicted speech. Furthermore, the icon display control unit 218 of the display control unit 216 generates facial icon display information indicating a closed mouth. Additionally, the speech output suppression unit 214 of the voice output control unit 211 determines that the voice information is not to be included in the conference information. The count display control unit 217 generates conflict count information indicating that the conflict count has increased by 1.
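The branching of steps S203 to S212 can be summarized in a short sketch. This is only an illustration under stated assumptions, not the implementation of the embodiment: the names `Status`, `ConferenceInfo`, and `build_conference_info` are hypothetical, and the results of voice detection, language recognition, and conflict determination are passed in as plain booleans. The mouth states follow paragraphs [0113] and [0121] (closed when silent and for a conflicted speech).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    SILENT = "silent"      # S204: no voice detected
    ECHO = "echo"          # S207: voice without recognized language
    SPEECH = "speech"      # S210: speech while no other user is speaking
    CONFLICT = "conflict"  # S212: speech while another user is speaking


@dataclass
class ConferenceInfo:
    """Hypothetical container for what one terminal sends to the server."""
    user: str
    status: Status
    mouth_open: bool
    voice: Optional[bytes]  # None means no voice information is sent
    conflict_count: int


def build_conference_info(user, voice, has_voice, has_language,
                          other_user_speaking, conflict_count):
    """Return the conference information for one pass through S203-S212."""
    if not has_voice:            # No in S203
        return ConferenceInfo(user, Status.SILENT, False, None, conflict_count)
    if not has_language:         # No in S206: treated as an echo
        return ConferenceInfo(user, Status.ECHO, True, voice, conflict_count)
    if not other_user_speaking:  # Yes in S209: no conflict
        return ConferenceInfo(user, Status.SPEECH, True, voice, conflict_count)
    # No in S209: conflicted speech, so the voice is withheld and the count rises by 1
    return ConferenceInfo(user, Status.CONFLICT, False, None, conflict_count + 1)
```

Because the conflicted branch returns no voice payload, the sketch also reflects the point that the suppressed speech never has to leave the terminal.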

[0122] Figures 14 and 15 are diagrams illustrating a conference image 230 displayed on each communication terminal 201 during a teleconference according to the second example embodiment. In the conference image 230, a face icon 231 and a conflict count 232 corresponding to each user are displayed near that user's username. Thus, face icon 231A and conflict count 232A are displayed near user A's username. Similarly, face icon 231B and conflict count 232B are displayed near user B's username, face icon 231C and conflict count 232C near user C's username, and face icon 231D and conflict count 232D near user D's username. In the example of Figure 14, conflict count 232A indicates zero, conflict count 232B indicates two, conflict count 232C indicates one, and conflict count 232D indicates zero. The conference image 230 may include display areas 230a to 230d, in which the face icon 231 and conflict count 232 of users A to D are respectively displayed.

[0123] In the conference image 230 shown in Figure 14, user B is speaking. Therefore, a message 234 indicating that user B is speaking is displayed near user B's face icon 231B, and the mouth of face icon 231B is open. Additionally, user C is echoing, so the mouth of face icon 231C is open. Users A and D remain silent, so the mouths of user A's face icon 231A and user D's face icon 231D are closed. Furthermore, since user B is speaking, each communication terminal 201 outputs user B's speech. Since user C is echoing, each communication terminal 201 outputs user C's echo.

[0124] Figure 15 shows a situation in which, starting from the conference image 230 shown in Figure 14, a speaking conflict occurs due to user A's speech. If user A speaks later than user B, user A's speech is determined to be a conflicted speech. In this case, a message 236 indicating that another user (user B) is speaking is displayed on user A's communication terminal 201A. Furthermore, user A's conflict count 232A is updated from zero to one. Because user A's speech is a conflicted speech, the mouth of user A's face icon 231A is closed. Note that message 236 is displayed only on user A's communication terminal 201A; apart from message 236, the conference images 230 displayed on the users' communication terminals 201 can be identical.
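As a rough illustration of the display state described for Figure 15, the following sketch renders one text row per user. The `Row` type, the `render` function, and the text layout are hypothetical; the counts and mouth states are taken from the description of Figures 14 and 15, with the assumption that user C is still echoing when the conflict occurs.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """Hypothetical per-user display state in the conference image."""
    user: str
    mouth_open: bool
    conflicts: int
    speaking: bool = False


def render(rows):
    """Render one line per user: icon, username, conflict count, speaking note."""
    lines = []
    for r in rows:
        icon = "(o)" if r.mouth_open else "(-)"  # open or closed mouth
        note = "  [speaking]" if r.speaking else ""
        lines.append(f"{icon} User {r.user}  conflicts: {r.conflicts}{note}")
    return "\n".join(lines)


# State after user A's conflicted speech (count 0 -> 1, mouth stays closed).
figure_15 = [
    Row("A", mouth_open=False, conflicts=1),
    Row("B", mouth_open=True, conflicts=2, speaking=True),
    Row("C", mouth_open=True, conflicts=1),   # assumed to be still echoing
    Row("D", mouth_open=False, conflicts=0),
]
print(render(figure_15))
```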

[0125] (Advantages of this example embodiment)

[0126] The advantages of this example embodiment will now be described.

[0127] In recent years, opportunities to hold teleconferences while participants are at home have increased, and so has the use of home internet environments for teleconferences. In this case, due to delays caused by the home internet environment, the speeches of multiple participants may overlap (speech conflict), or participants may hold back from speaking, and the teleconference may not proceed smoothly. Furthermore, when participants are at home, they often participate by voice only, for reasons such as privacy and preventing congestion of the internet line. In this case, there is the problem that participants cannot read one another's facial expressions during the conversation. Additionally, there is the problem that a participant's agreement cannot be conveyed to the speaker, because the participant has muted his or her microphone to keep out ambient sound while not speaking. Moreover, in technologies that display an indication of speaking for any participant from whom voice information is received, voice information is treated as speech even when it only expresses agreement, so it is difficult to know who is speaking when the conference has many participants.

[0128] The teleconference system according to this example embodiment is configured such that when one participant is speaking and another participant speaks later, the output of the later-speaking participant's speech is suppressed in each participant's communication terminal. Therefore, conflicting speech (the later-speaking participant's speech) is suppressed from being heard by each participant using their communication terminal, thus ensuring the smooth conduct of the teleconference.

[0129] Furthermore, the teleconference system according to this example embodiment is configured to count the number of conflicts for each participant who has caused a speaking conflict, and to display the number of conflicts at each communication terminal. Therefore, each participant can know which participant has had many speaking conflicts, etc. Thus, each participant can know which participant wants to speak. As a result, other participants can perform actions such as encouraging a participant to speak or waiting for a participant to speak. Therefore, the teleconference system according to this example embodiment can conduct teleconferences smoothly.

[0130] Furthermore, the teleconference system according to this example embodiment is configured such that each participant's communication terminal displays the number of conflicts for each of the multiple participants. Therefore, each participant can know the number of conflicts of every other participant.

[0131] Furthermore, the teleconference system according to this example embodiment is configured to display a message such as "Another user is speaking" on the communication terminal of a participant who speaks later. Therefore, a participant who has caused a speaking conflict can be aware that a speaking conflict has occurred.

[0132] Furthermore, the teleconference system according to this example embodiment is configured to cause each participant's communication terminal to output an echo even when one participant is speaking and another participant is echoing. Therefore, the participant who is speaking (the speaker) can be assured that the other participants are listening to the speech.
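The output rule described in paragraphs [0128] and [0132] can be condensed into a single predicate. This is a minimal sketch assuming each voice has already been classified as a speech or an echo and that the system tracks who started speaking first; `should_output`, the string labels, and `current_speaker` are hypothetical names.

```python
from typing import Optional


def should_output(kind: str, sender: str, current_speaker: Optional[str]) -> bool:
    """Decide whether a participant's voice is output at the terminals.

    kind: "echo" or "speech" (hypothetical labels).
    current_speaker: the participant who started speaking first, or None.
    """
    if kind == "echo":
        # Echoes are output even while another participant is speaking.
        return True
    # A speech is output only if nobody else already holds the floor.
    return current_speaker is None or current_speaker == sender
```

With user B holding the floor, `should_output("speech", "A", "B")` is false (the later speech is suppressed), while `should_output("echo", "C", "B")` is true.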

[0133] Furthermore, the teleconference system according to this example embodiment is configured such that when a participant echoes, each participant's communication terminal displays an open-mouth face icon corresponding to the participant who echoed. Therefore, even if the communication terminal of a participant who has echoed is muted, the speaker can still know that a participant is echoing, thus ensuring that other participants are listening.

[0134] Furthermore, the teleconference system according to the second example embodiment is configured such that, in the event of a speaking conflict, the voice information of the conflicted speech is not sent from the communication terminal to the conference server. Therefore, network load can be reduced.

[0135] (Modified example)

[0136] Note that the present invention is not limited to the foregoing exemplary embodiments, and appropriate modifications can be made within the scope of the present invention. For example, the foregoing exemplary embodiments can be combined with one another. For example, the functionality of the teleconferencing device 100 according to the first exemplary embodiment can be implemented by the communication terminal 201 according to the second exemplary embodiment. Furthermore, the functionality of the communication terminal 201 according to the second exemplary embodiment can be implemented by the teleconferencing device 100 according to the first exemplary embodiment.

[0137] Furthermore, the order of the processes (steps) in each of the above flowcharts can be changed as appropriate. Additionally, one or more of the processes (steps) can be omitted. For example, in Figure 7, the order of S170 and S180 can be reversed. Similarly, in Figure 13, the order of S211 and S212 can be reversed. Furthermore, in Figure 7, the processes S114, S124, and S138 may be omitted. Similarly, the process S211 may be omitted.

[0138] Furthermore, in each of the above example embodiments, the count display control unit performs control to display the number of conflicts for each participant (user) on the communication terminals of multiple participants, but it is not limited to this configuration. The count display control unit does not need to cause the communication terminals to display the number of conflicts itself. For example, the count display control unit can cause the communication terminals to display a level corresponding to the number of conflicts, such as level C when the number of conflicts is two or fewer, level B when it is three or four, and level A when it is five or more. In addition, when a participant's number of conflicts exceeds a threshold, the count display control unit can cause each communication terminal to display a warning. For example, the count display control unit can cause each communication terminal to display the face icon of the participant whose number of conflicts has increased in a highlighted manner that indicates he or she wants to speak (e.g., the face icon turns red).
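The level display and the threshold warning can be sketched as follows. The level boundaries are the ones given in the example above; the function names and the idea of returning the set of participants to highlight are illustrative assumptions, and the threshold value is left to the caller because the text does not specify one.

```python
def conflict_level(count: int) -> str:
    """Map a conflict count to the example levels: C (0-2), B (3-4), A (5+)."""
    if count >= 5:
        return "A"
    if count >= 3:
        return "B"
    return "C"


def participants_to_warn(counts: dict, threshold: int) -> set:
    """Return participants whose conflict count exceeds the threshold.

    `threshold` is a hypothetical, caller-chosen value; these participants
    could be shown with a warning or a highlighted (e.g. red) face icon.
    """
    return {user for user, count in counts.items() if count > threshold}


counts = {"A": 0, "B": 2, "C": 4, "D": 6}
levels = {user: conflict_level(c) for user, c in counts.items()}
```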

[0139] Furthermore, during a conference call, the conflict count can be increased each time a speaking conflict occurs, or it can be reset midway through the call. For example, the conflict count can be reset when a participant has made a predetermined number of non-conflicting remarks. It can also be reset when a participant operates their communication terminal.
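One possible way to realize these reset rules is sketched below. The class `ConflictCounter` and its method names are hypothetical. The text does not say whether the non-conflicting remarks must be consecutive; this sketch assumes they must be, so a new conflict restarts the tally.

```python
class ConflictCounter:
    """Per-participant conflict count with the two reset triggers described."""

    def __init__(self, reset_after: int):
        self.reset_after = reset_after  # predetermined number of clean remarks
        self.count = 0
        self._clean_remarks = 0

    def on_conflicted_speech(self) -> None:
        self.count += 1
        self._clean_remarks = 0  # assumption: the clean tally restarts

    def on_non_conflicting_speech(self) -> None:
        self._clean_remarks += 1
        if self._clean_remarks >= self.reset_after:
            self.manual_reset()

    def manual_reset(self) -> None:
        """Reset triggered by the participant operating the terminal."""
        self.count = 0
        self._clean_remarks = 0
```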

[0140] In the second example embodiment, each communication terminal 201 generates the facial icon display information corresponding to its own user, but the configuration is not limited to this. For example, each communication terminal 201 can use the speaking status information about user A sent from communication terminal 201A to generate user A's facial icon.
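In this variation the receiving terminal derives the icon from the speaking status alone. A minimal sketch, assuming the status arrives as one of four hypothetical string labels and that icons are tracked in a plain dictionary:

```python
def mouth_open(status: str) -> bool:
    """Open mouth for a speech or an echo; closed for silence or a conflicted speech."""
    return status in ("speech", "echo")


def update_icons(icons: dict, user: str, status: str) -> dict:
    """Return a copy of the icon table with one user's icon refreshed.

    `icons` maps usernames to "open" or "closed" (hypothetical representation).
    """
    updated = dict(icons)
    updated[user] = "open" if mouth_open(status) else "closed"
    return updated


icons = {"A": "closed", "B": "open"}
icons_after = update_icons(icons, "A", "conflict")  # A stays closed
```

Under this design the sending terminal no longer needs to transmit separate icon display information, since the status field already determines the icon.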

[0141] In the above example embodiment, each user's (participant's) facial icon is displayed on each of the multiple communication terminals during a teleconference, but the configuration is not limited to this. A facial image of each user captured by a camera 203 or similar device can be displayed on each of the multiple communication terminals. However, when a user's facial image is displayed, the mouths of both a user who is agreeing and a user who is causing a conflict move within the image. Therefore, other users may not be able to visually distinguish between agreeing and conflicting statements. In this example embodiment, on the other hand, facial icons are displayed on each communication terminal such that the mouth of the facial icon of the user causing the conflict is closed and the mouth of the facial icon of the user agreeing is open. Therefore, in this example embodiment, agreeing and conflicting statements can be visually distinguished. Furthermore, in the teleconference system according to this example embodiment, since each communication terminal does not transmit video information, the speaking status of users can be known while reducing network load.

[0142] In the examples above, any type of non-transitory computer-readable medium can be used to store the program and provide it to the computer. Non-transitory computer-readable media include various types of tangible storage media. Examples of non-transitory computer-readable media include magnetic storage media (e.g., floppy disks, magnetic tapes, and hard disk drives), magneto-optical storage media (e.g., magneto-optical disks), CD-ROMs (compact disc read-only memory), CD-Rs, CD-R/Ws, and semiconductor memories (e.g., mask ROMs, PROMs (programmable ROMs), EPROMs (erasable PROMs), flash ROMs, and RAMs (random access memory)). The program can also be provided to the computer by various types of transitory computer-readable media. Examples of transitory computer-readable media include electrical signals, optical signals, and electromagnetic waves. Transitory computer-readable media can provide the program to the computer via wired communication lines (e.g., electrical wires and optical fibers) or wireless communication lines.

[0143] All or part of the above example embodiments may be described as, but are not limited to, the following supplementary notes.

[0144] (Note 1)

[0145] A teleconferencing system, comprising:

[0146] A speech determination device for determining whether the voice of each of multiple participants in a teleconference indicates a speech or an echo;

[0147] A voice output control device is used to perform control such that the voice of each of a plurality of participants is output by the communication terminal of each of the plurality of participants, and to perform control when another participant speaks while one of the plurality of participants is speaking, so as to suppress the output of the other participant's speech;

[0148] A counting device for counting the number of times a participant makes a first speech, which is a suppressed output speech; and

[0149] A count display control device is used to perform control so that the count is displayed at the communication terminals of multiple participants.

[0150] (Note 2)

[0151] The teleconference system according to Note 1, wherein

[0152] The count display control device performs control so that the number of times each of the multiple participants has made the first speech is displayed on the communication terminal of each participant.

[0153] (Note 3)

[0154] The teleconference system according to Note 2, wherein

[0155] The count display control device causes the communication terminal to display a count greater than a predetermined threshold more prominently than a count equal to or less than the predetermined threshold.

[0156] (Note 4)

[0157] The teleconference system according to Note 2, wherein

[0158] The count display control device causes the communication terminal to display the largest count among the multiple participants more prominently than the other counts.

[0159] (Note 5)

[0160] The teleconference system according to any one of Notes 1 to 4, wherein

[0161] The voice output control device performs control such that when a participant echoes, the echo is output at the communication terminal of each of the plurality of participants.

[0162] (Note 6)

[0163] The teleconference system according to any one of Notes 1 to 5 further includes an icon display control device for performing control such that facial icons corresponding to the plurality of participants are displayed on the communication terminal of each of the plurality of participants, wherein...

[0164] The icon display control device displays facial icons such that the facial icon corresponding to another participant who has already made the first speech is not moved, and displays facial icons such that the facial icon corresponding to a participant who has made a speech other than the first speech is moved.

[0165] (Note 7)

[0166] The teleconference system according to Note 6, wherein

[0167] The icon display control device displays facial icons such that when a participant has echoed, the facial icon corresponding to that participant is moved.

[0168] (Note 8)

[0169] A communication terminal, comprising:

[0170] A speech determination device for determining whether the voice of the user of the communication terminal, in a teleconference in which the user participates, indicates a speech or an echo;

[0171] A voice output control device is configured to perform control such that the voice of each of the plurality of participants in the teleconference is output by the communication terminal, and the voice of the user is output by a first communication terminal which is the communication terminal of each of the plurality of participants, and to perform control when the user speaks during the speaking of one of the plurality of participants, so as to suppress the output of the user's speech at the first communication terminal;

[0172] A counting device for counting the number of times a user makes a first statement, which is a suppressed output statement; and

[0173] A count display control device is used to perform control to display the count at a first communication terminal.

[0174] (Note 9)

[0175] The communication terminal according to Note 8, wherein

[0176] The count display control device performs control so that the number of times the user of the communication terminal has made the first speech is displayed on the first communication terminal.

[0177] (Note 10)

[0178] The communication terminal according to Note 8 or 9, wherein

[0179] The voice output control device performs control so that when the user of the communication terminal makes an echo, the echo is output at the first communication terminal.

[0180] (Note 11)

[0181] The communication terminal according to any one of Notes 8 to 10 further includes an icon display control device for performing control to display a facial icon corresponding to the user of the communication terminal on the first communication terminal, wherein...

[0182] The icon display control device displays a face icon such that the face icon does not move when the user of the communication terminal has made the first speech, and the face icon is moved when the user of the communication terminal has made a speech other than the first speech.

[0183] (Note 12)

[0184] The communication terminal according to Note 11, wherein

[0185] The icon display control device displays the face icon so that the face icon is moved when the user of the communication terminal has echoed.

[0186] (Note 13)

[0187] A teleconference method, comprising:

[0188] Determining whether the voice of each of multiple participants in a teleconference indicates a speech or an echo;

[0189] Performing control such that the voice of each of the multiple participants is output by the communication terminal of each of the multiple participants;

[0190] Performing control, when another participant speaks while one of the multiple participants is speaking, so as to suppress the output of the other participant's speech;

[0191] Counting, for each participant, the number of times a first speech is made, the first speech being a speech whose output is suppressed; and

[0192] Performing control such that a display related to the count is displayed at the communication terminals of the multiple participants.

[0193] (Note 14)

[0194] The teleconference method according to Note 13 includes: performing control such that the number of times each of the multiple participants has made the first speech is displayed on the communication terminal of each of the multiple participants.

[0195] (Note 15)

[0196] The teleconference method according to Note 14 includes: displaying a count greater than a predetermined threshold more prominently than a count equal to or less than the predetermined threshold.

[0197] (Note 16)

[0198] The teleconference method according to Note 14 includes: displaying the largest count among the multiple participants more prominently than the other counts.

[0199] (Note 17)

[0200] The teleconference method according to any one of Notes 13 to 16 includes: performing control such that when a participant echoes, the echo is output at the communication terminal of each of the plurality of participants.

[0201] (Note 18)

[0202] The teleconference method according to any one of Notes 13 to 17 includes:

[0203] Performing control such that facial icons corresponding to each of the multiple participants are displayed on the communication terminal of each participant;

[0204] Displaying the facial icons such that the facial icon corresponding to another participant who has made the first speech is not moved; and

[0205] Displaying the facial icons such that the facial icon corresponding to a participant who has made a speech other than the first speech is moved.

[0206] (Note 19)

[0207] The teleconference method according to Note 18 includes: displaying facial icons such that when a participant has echoed, the facial icon corresponding to that participant is moved.

[0208] (Note 20)

[0209] A teleconference method performed by a communication terminal, comprising:

[0210] Determining whether the voice of the user of the communication terminal, in a teleconference in which the user participates, indicates a speech or an echo;

[0211] Performing control such that the voice of each of the multiple participants in the teleconference is output by the communication terminal, and the user's voice is output by a first communication terminal that is the communication terminal of each of the multiple participants;

[0212] Performing control, when the user speaks while one of the multiple participants is speaking, so as to suppress the output of the user's speech at the first communication terminal;

[0213] Counting the number of times the user makes a first speech, the first speech being a speech whose output is suppressed; and

[0214] Performing control such that a display related to the count is displayed at the first communication terminal.

[0215] (Note 21)

[0216] The teleconference method according to Note 20 includes: performing control such that the number of times the user of the communication terminal has made the first speech is displayed on the first communication terminal.

[0217] (Note 22)

[0218] The teleconference method according to Note 20 or 21 includes: performing control such that when the user of the communication terminal makes an echo, the echo is output at the first communication terminal.

[0219] (Note 23)

[0220] The teleconference method according to any one of Notes 20 to 22 includes:

[0221] Performing control such that a facial icon corresponding to the user of the communication terminal is displayed on the first communication terminal;

[0222] Displaying the face icon such that the face icon is not moved when the user of the communication terminal has made the first speech; and

[0223] Displaying the face icon such that the face icon is moved when the user of the communication terminal has made a speech other than the first speech.

[0224] (Note 24)

[0225] The teleconference method according to Note 23 includes: displaying the face icon such that the face icon is moved when the user of the communication terminal has echoed.

[0226] (Note 25)

[0227] A program that enables a computer to perform the following functions:

[0228] Determining whether the voice of each of multiple participants in a teleconference indicates a speech or an echo;

[0229] Performing control such that the voice of each of the plurality of participants is output by the communication terminal of each of the plurality of participants, and performing control, when another participant speaks while one of the plurality of participants is speaking, so as to suppress the output of the other participant's speech;

[0230] Counting, for each participant, the number of times a first speech is made, the first speech being a speech whose output is suppressed; and

[0231] Performing control such that a display related to the count is displayed at the communication terminals of the multiple participants.

[0232] (Note 26)

[0233] A program for performing a teleconference method executed by a communication terminal, the program enabling a computer to perform the following functions:

[0234] Determining whether the voice of the user of the communication terminal, in a teleconference in which the user participates, indicates a speech or an echo;

[0235] Performing control such that the voice of each of the multiple participants in the teleconference is output by the communication terminal, and the voice of the user is output by a first communication terminal that is the communication terminal of each of the multiple participants, and performing control, when the user speaks while one of the multiple participants is speaking, so as to suppress the output of the user's speech at the first communication terminal;

[0236] Counting the number of times the user makes a first speech, the first speech being a speech whose output is suppressed; and

[0237] Performing control such that a display related to the count is displayed at the first communication terminal.

[0238] The invention of this application has been described above with reference to exemplary embodiments, but the invention of this application is not limited to the above description. Various changes that can be understood by those skilled in the art can be made to the configuration and details of the invention within the scope of this invention.

[0239] This application is based on and claims priority to Japanese Patent Application No. 2020-205681, filed on December 11, 2020, the entire disclosure of which is incorporated herein by reference.

[0240] List of reference numerals

[0241] 1. Teleconference System

[0242] 2. Speech Determination Unit

[0243] 4 Voice Output Control Unit

[0244] 6 Counting Unit

[0245] 8 Count Display Control Unit

[0246] 20 Teleconference System

[0247] 22 Network

[0248] 30 Communication Terminal

[0249] 42 Voice Acquisition Unit

[0250] 44 Voice Transmission Unit

[0251] 46 Voice Receiving Unit

[0252] 48 Voice Output Unit

[0253] 52 Display Information Receiving Unit

[0254] 54 Image Display Unit

[0255] 100 Teleconference Device

[0256] 110 Participant Information Storage Unit

[0257] 112 Voice Receiving Unit

[0258] 120 Speaking Confirmation Unit

[0259] 130 Voice Output Control Unit

[0260] 132 Speech Conflict Determination Unit

[0261] 134 Speech Output Suppression Unit

[0262] 140 Counting Unit

[0263] 150 Display Control Unit

[0264] 152 Count Display Control Unit

[0265] 154 Icon Display Control Unit

[0266] 200 Telephone Conferencing System

[0267] 201 Communication Terminal

[0268] 202 Meeting Execution System

[0269] 207 Speaking Status Detection Unit

[0270] 208 Conference Information Receiving Unit

[0271] 209 Conference Control Unit

[0272] 210 Conference Information Transmission Unit

[0273] 211 Voice Output Control Unit

[0274] 212 Speech Conflict Determination Unit

[0275] 214 Speech Output Suppression Unit

[0276] 215 Counting Unit

[0277] 216 Display Control Unit

[0278] 217 Count Display Control Unit

[0279] 218 Icon Display Control Unit

[0280] 220 Teleconference Server

[0281] 222 Voice Input Unit

[0282] 223 Speech Detection Unit

[0283] 224 Language Recognition Unit

[0284] 225 Speech Presence/Absence Determination Unit

Claims

1. A teleconferencing system, comprising:
a speech determination device configured to determine whether the voice of each of a plurality of participants in a teleconference is a speech or an echo, wherein a speech corresponds to the utterance of words with meaningful content, and an echo corresponds to a supportive response and is the utterance of words that are not meaningful in themselves;
a voice output control device configured to perform control such that the voice of each of the plurality of participants is output by the communication terminal of each of the plurality of participants, and to perform control, when another participant speaks while one of the plurality of participants is speaking, so as to suppress the output of the speech of the other participant;
a counting device configured to count, for each participant, the number of first speeches, wherein a first speech is a speech whose output has been suppressed; and
a count display control device configured to perform control such that a display related to the count is displayed at the communication terminals of the plurality of participants,
wherein the voice output control device performs control such that, when a participant makes an echo, the echo is still output at the communication terminal of each of the plurality of participants even if another participant is speaking.

2. The teleconferencing system according to claim 1, wherein the count display control device causes the communication terminal to display a count greater than a predetermined threshold more prominently than a count equal to or less than the predetermined threshold.

3. The teleconferencing system according to claim 1, wherein the count display control device causes the communication terminal to display the largest count among the plurality of participants more prominently than the other counts.

4. The teleconferencing system according to claim 1, wherein the count display control device performs control such that the number of first speeches made by each of the plurality of participants is displayed on their respective communication terminals.

5. The teleconferencing system according to any one of claims 1 to 4, further comprising:
an icon display control device configured to perform control such that facial icons corresponding to the plurality of participants are displayed on the communication terminal of each of the plurality of participants,
wherein the icon display control device displays the facial icons such that the facial icon corresponding to the other participant who has made the first speech is not moved, and such that the facial icon corresponding to a participant who has made a speech other than the first speech is moved.

6. The teleconferencing system according to claim 5, wherein the icon display control device displays the facial icons such that, when a participant has made an echo, the facial icon corresponding to that participant is moved.

7. A communication terminal, comprising:
a speech determination device configured to determine whether the voice of a user of the communication terminal, in a teleconference in which the user participates, is a speech or an echo, wherein a speech corresponds to the utterance of words with meaningful content, and an echo corresponds to a supportive response and is the utterance of words that are not meaningful in themselves;
a voice output control device configured to perform control such that the voice of each of a plurality of participants in the teleconference is output by the communication terminal and the voice of the user is output by a first communication terminal, which is the communication terminal of each of the plurality of participants, and to perform control, when the user speaks while one of the plurality of participants is speaking, so as to suppress the output of the user's speech at the first communication terminal;
a counting device configured to count the number of first speeches made by the user of the communication terminal, wherein a first speech is a speech whose output has been suppressed; and
a count display control device configured to perform control such that a display related to the count is displayed at the first communication terminal,
wherein the voice output control device performs control such that, when the user of the communication terminal makes an echo, the echo is still output at the first communication terminal even if another participant is speaking.

8. A teleconference method, comprising:
determining whether the voice of each of a plurality of participants in a teleconference is a speech or an echo, wherein a speech corresponds to the utterance of words with meaningful content, and an echo corresponds to a supportive response and is the utterance of words that are not meaningful in themselves;
performing control such that the voice of each of the plurality of participants is output by the communication terminal of each of the plurality of participants;
performing control, when another participant speaks while one of the plurality of participants is speaking, so as to suppress the output of the other participant's speech;
counting, for each participant, the number of first speeches, a first speech being a speech whose output has been suppressed;
performing control such that a display related to the count is displayed at the communication terminals of the plurality of participants; and
performing control such that, when a participant makes an echo, the echo is still output at the communication terminal of each of the plurality of participants even if another participant is speaking.

9. A teleconference method executed by a communication terminal, comprising:
determining whether the voice of a user of the communication terminal, in a teleconference in which the user participates, is a speech or an echo, wherein a speech corresponds to the utterance of words with meaningful content, and an echo corresponds to a supportive response and is the utterance of words that are not meaningful in themselves;
performing control such that the voice of each of a plurality of participants in the teleconference is output by the communication terminal and the voice of the user is output by a first communication terminal, which is the communication terminal of each of the plurality of participants;
performing control, when the user speaks while one of the plurality of participants is speaking, so as to suppress the output of the user's speech at the first communication terminal;
counting, for the user of the communication terminal, the number of first speeches, a first speech being a speech whose output has been suppressed;
performing control such that a display related to the count is displayed at the first communication terminal; and
performing control such that, when the user of the communication terminal makes an echo, the echo is still output at the first communication terminal even if another participant is speaking.

10. A computer-readable medium storing a program for causing a computer to perform the teleconference method according to claim 8 or 9.