Communication support systems, programs, and evaluation methods
The communication support system addresses the lack of direct customer feedback in content evaluation by analyzing audio and video data to optimize content for sales promotion.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- RICOH CO LTD
- Filing Date
- 2024-12-20
- Publication Date
- 2026-07-02
AI Technical Summary
Conventional communication systems lack the ability for content creators to directly assess customer reactions to content during business negotiations, hindering the creation of content suitable for sales promotion.
A communication support system that analyzes audio and video data from business negotiations to evaluate content effectiveness, providing tools for displaying evaluation scores and facilitating review of negotiation details.
Enables content creators to grasp customer reactions, allowing for the creation of content optimized for sales promotion by analyzing customer interactions and providing actionable feedback.
Abstract
Description
Technical Field
[0001] The present invention relates to a communication support system, a program, and an evaluation method.
Background Art
[0002] Communication systems for transmitting and receiving video data, audio data, etc. between multiple terminal devices have become widespread. Communication systems are also used for business negotiations of products and services.
[0003] Techniques for improving the sales promotion effect of web pages on e-commerce sites are known (see, for example, Patent Document 1). Patent Document 1 discloses a technique for selecting recommended product content according to customer attributes and generating a web page.
Summary of the Invention
Problems to be Solved by the Invention
[0004] However, in the conventional technology, information regarding the evaluation of content has not been displayed. Since the person in charge of creating the content cannot directly participate in the communication, they have not been able to grasp how the content was evaluated in the communication. If the person in charge of creating the content could reflect in the content how customers reacted to it during the communication, it would be possible to create content suitable for sales promotion.
[0005] In view of the above problems, the present invention provides a technique for displaying information regarding the evaluation of content.
Means for Solving the Problems
[0006] In view of the above issues, the present invention provides a communication support system in which an information processing device and an information terminal can communicate via a network, wherein audio data recorded in the communication includes the user's utterances in response to the explanation of content, and video data recorded in the communication includes video of the user receiving the explanation of the content, the system comprising: an acquisition unit that acquires at least one of the audio data or the video data; an evaluation unit that analyzes at least one of the user's utterances included in the audio data or the user's facial expressions included in the video data acquired by the acquisition unit to determine an evaluation score for the content; and a display control unit that displays information regarding the evaluation of the content by the evaluation unit together with an image of the content.
Effects of the Invention
[0007] This invention can display information regarding the evaluation of content.
Brief Description of the Drawings
[0008]
[Figure 1] This figure shows an example of the system configuration of a communication support system according to one embodiment.
[Figure 2] This figure shows an example of a display screen according to one embodiment.
[Figure 3] This figure shows another example of a display screen according to one embodiment.
[Figure 4] This figure shows another example of a display screen according to one embodiment.
[Figure 5] This figure shows an example of a computer hardware configuration according to one embodiment.
[Figure 6] This figure shows an example of the hardware configuration of a terminal device according to one embodiment.
[Figure 7] This figure shows an example of the functional configuration of a communication support system according to one embodiment.
[Figure 8] This figure shows an example of the functional configuration of an evaluation unit according to one embodiment.
[Figure 9] This figure shows an example of audio and video data stored in an audio/video data storage unit according to one embodiment.
[Figure 10] This figure shows an example of business negotiation information stored in a business negotiation information storage unit according to one embodiment.
[Figure 11] This figure shows an example of linked information stored in a linked information storage unit according to one embodiment.
[Figure 12] This figure shows an example of negotiation information stored in a negotiation information storage unit according to one embodiment.
[Figure 13] This figure illustrates the analytical image data stored in the analytical image data storage unit according to one embodiment.
[Figure 14] This figure shows an example of a content table stored in a content storage unit according to one embodiment.
[Figure 15] This is an example sequence diagram illustrating the process of creating analytical image data to link negotiation materials and content according to one embodiment.
[Figure 16] This figure shows an example of the first and second business negotiation registration screens according to one embodiment.
[Figure 17] This is a sequence diagram illustrating an example of a process in which a sales representative, according to one embodiment, operates a sales representative terminal to register negotiation information and negotiation materials to a communication support server, and a linking unit links the negotiation materials with the content.
[Figure 18] This is a conceptual diagram illustrating a method for evaluating content based on audio and video data according to one embodiment.
[Figure 19] This is an example flowchart illustrating the process by which an evaluation unit according to one embodiment evaluates content.
[Figure 20] This figure schematically shows the correspondence between audio data and content display time according to one embodiment.
[Figure 21] This figure illustrates a method for calculating tone points based on the pitch of tone according to one embodiment.
[Figure 22] This figure illustrates a method of classifying emotions using an emotion classification model according to one embodiment.
[Figure 23] This is a sequence diagram showing an example of processing performed by a communication support system when a web conference for a negotiation according to one embodiment is started.
[Figure 24] This figure shows an example of a content evaluation result screen displayed on a marketing staff terminal according to one embodiment.
[Figure 25] This figure shows an example of a content-by-content evaluation screen displayed on a marketing staff terminal according to one embodiment.
Mode for Carrying Out the Invention
[0009] Hereinafter, as an example of an embodiment for carrying out the present invention, a communication support system and an evaluation method performed by the communication support system will be described with reference to the drawings.
[0010] <Regarding Terms> Communication is a process by which humans transmit, understand, and share each other's thoughts, emotions, and information. Either language or non-language can be used as a medium. Various media such as spoken language, written language, materials, gestures, expressions, and attitudes are used, but in this embodiment, at least materials are used. Communication performed by users at different locations connecting their terminal devices to a network is called remote communication. One aspect of communication is a meeting, and in remote communication, this is called a web conference. A meeting is also called a consultation, a negotiation, a gathering, or an assembly. Communication can be either online or offline. A meeting also includes one-to-many formats such as a webinar.
[0011] Merchandise refers to goods or services that are sold or provided to customers. Examples of goods include printing equipment, projectors, interactive whiteboards (IWBs), digital signage and other output devices, head-up displays (HUDs), industrial machinery, imaging devices, sound collection devices, medical equipment, networked home appliances, automobiles (connected cars), personal computers (PCs), mobile phones, smartphones, tablet devices, game consoles, personal digital assistants (PDAs), and digital cameras. In addition, any goods that are the subject of commercial transactions are considered merchandise. Services include, for example, services provided directly by store employees to customers, and various data provision and data processing services provided to users via the internet or corporate networks.
[0012] Sales materials refer to the specific media used in sales negotiations, and can be either paper documents or electronic files. In the case of paper documents, the image of the paper document must be included in the video data through imaging. Sales materials contain information about the product or service as content. These materials are not limited to sales negotiations; they can also be used in any type of meeting, such as lectures, advertisements, internal meetings, or presentations.
[0013] Content refers to the information itself incorporated into materials displayed during communication, such as flyers, manuals, price lists, photographs, 3D models, comparison tables, and case studies. While content is often related to the product or service, the type of content used in a business negotiation is optional. Content can be a single page, multiple pages, or even a video. Content can also be 3D models displayed using VR goggles, etc.
[0014] Video and audio recordings will be made during the communication. The audio data recorded during the communication will include the customer's utterances in response to the explanation of the content, and the video data recorded during the communication will include the video of the customer receiving the explanation of the content.
[0015] The content evaluation score is a numerical representation of the content's value, measured from the perspective of whether it leads customers to purchase the product. While the evaluation score could be determined by whether or not the customer actually purchases the product, the communication support server goes beyond this and anticipates the reactions of customers who will purchase the product. It then analyzes the customer's reactions from audio and video data to determine how similar or different the reactions are to those of customers who will purchase the product. Customer reactions include eye direction, facial expressions, number of questions asked, tone of voice, etc., and these are reflected in the evaluation score. The evaluation score may also be called evaluation value, value score, rating, effectiveness score, suitability, or purchase promotion score.
[0016] <System Configuration> Figure 1 shows an example of the system configuration of a communication support system 1 according to one embodiment. The communication support system 1 includes, as an example, a conference server 10 connected to a communication network 2 such as the Internet or a LAN (Local Area Network), a communication support server 20, a content server 40, a CRM server 50, multiple terminal devices 100a, 100b, 100c, 100d, 100e, etc., a sales representative terminal 3, and a marketing representative terminal 4. In the following description, "terminal device 100" will be used to refer to any terminal device among the multiple terminal devices 100a, 100b, 100c, 100d, 100e, etc. Also, the number of terminal devices 100 shown in Figure 1 is just an example, and any other number of two or more terminal devices 100 may be used.
[0017] Communication support system 1 is a system that supports various types of communication, such as business negotiations, meetings, medical consultations, lectures, or counseling, by sending and receiving at least audio between terminal device 100 and sales representative terminal 3. Here, the "support" provided by communication support system 1 includes support for reviewing communication, including business negotiation materials (that is, reviewing afterward the content of communication that took place). Here, as an example, the following explanation will be given assuming that communication support system 1 supports the review of a web conference in which a business negotiation is conducted by sending and receiving video, including audio, between the user of terminal device 100 and the user of sales representative terminal 3.
[0018] Terminal device 100 is a general-purpose information terminal used by customers participating in business negotiations, such as a PC (Personal Computer), tablet, or smartphone. Alternatively, terminal device 100 is an electronic device with web conferencing capabilities, such as a VR (Virtual Reality) goggle, video conferencing device, or IWB (Interactive White Board). An IWB is a whiteboard with electronic blackboard functionality that allows for mutual communication, and is also called an electronic blackboard. Alternatively, terminal device 100 is a meeting device that can capture images in a range of at least 360 degrees horizontally and acquire audio with a microphone. Here, as an example, the following explanation will be given assuming that terminal device 100 is a general-purpose information terminal.
[0019] Customers participating in a web conference can join by accessing the conference address provided by the conference server 10, for example, using a web conferencing application installed on the terminal device 100, or a web browser.
[0020] The conference server 10 is an information processing device having a computer configuration, or a system including multiple computers. The conference server 10 provides a web conferencing service that transmits and receives data including at least one of audio or video between the sales representative terminal 3 and one or more terminal devices 100.
[0021] The communication support server 20 is an information processing device having a computer configuration, or a system including multiple computers. The communication support server 20 provides the communication support services according to this embodiment. Specifically, the communication support server 20 analyzes the content of business negotiations and adds evaluations to the content. In addition, the communication support server 20 provides the following services.
• Recording of business negotiation details, audio data, and video data
• Video/audio analysis
• Presenting information for reviewing web conferences
• On-device management of business negotiation information and storage of that information in the CRM server 50
The content server 40 is an information processing device having a computer configuration, or a system including multiple computers. The content server 40 provides content management services. The content server 40 manages the evaluation of each piece of content along with the content itself.
[0022] The CRM server 50 is an information processing device having a computer configuration, or a system including multiple computers. The CRM server 50 manages records of business opportunities. For example, along with the business opportunity information described later, the CRM server 50 manages customer information, manages web conference participation, manages contacts, manages leads (management of the current status of prospective customers, etc.), etc.
[0023] Sales representative terminal 3 is an information terminal used by the sales representative, such as a PC, tablet, or smartphone. The sales representative uses the communication support service by using, for example, the web browser on sales representative terminal 3 or the application program for the communication support system 1 that is executed on it. The sales representative uses the communication support service to conduct business negotiations with customers.
[0024] Marketing personnel terminal 4 is an information terminal used by marketing personnel, such as a PC, tablet, or smartphone. Marketing personnel use communication support services by using, for example, the web browser on marketing personnel terminal 4 or the application program for communication support system 1 that they run. Marketing personnel use communication support services to develop content (products and services), plan projects, and provide sales support.
[0025] Note that the system configuration of communication support system 1 shown in Figure 1 is just one example. For example, in the communication support system 1, two or more of the conference server 10, the communication support server 20, the content server 40, and the CRM server 50 may be consolidated into a single server. Alternatively, the functions of any one of the conference server 10, the communication support server 20, the content server 40, or the CRM server 50 may be distributed across multiple servers.
[0026] Furthermore, the sales representative terminal 3 and the marketing representative terminal 4 do not need to be separate devices; they can be used interchangeably on a single terminal. In this case, the sales representative and the marketing representative can be the same person.
[0027] Furthermore, the communication support service according to this embodiment may be implemented, for example, by a web application or application program corresponding to the communication support system 1, which is executed by a sales representative terminal 3 or a marketing representative terminal 4.
[0028] (Overview of communication support services) When conducting business negotiations using the web conferencing service provided by the conference server 10, the sales representatives or marketing personnel who participated in the negotiations often want to review the details of the negotiations afterward. However, it was not easy for these sales representatives or marketing personnel to review the details of the negotiations afterward.
[0029] For example, if a business negotiation lasts one hour, even if the audio could be fast-forwarded at double speed, it would still take 30 minutes to play through the entire recording. Therefore, it would be difficult for a sales representative or marketing person to review past negotiation recordings and grasp all the details of the negotiation.
[0030] Therefore, the communication support system 1 according to this embodiment is equipped with a function that makes it easy to review the contents of a web conference in which a business negotiation was conducted. For example, the communication support system 1 provides the sales representative terminal 3 or the marketing representative terminal 4 with a display screen 300 as shown in Figure 2, or a display screen 400 as shown in Figure 3.
[0031] Figure 2 shows an example of a display screen 300 according to one embodiment. In the example in Figure 2, the display screen 300 provided by the communication support system 1 displays a meeting summary 310, conversation balance 320, and volume timeline 330.
[0032] The meeting summary 310 displays, as an example, the names of the participants (person in charge) who took part in the web conference (hereinafter referred to as the meeting), the names of other participants (customers) who took part in the meeting, the date and time the meeting started, and information such as the evaluation. Here, the evaluation is, for example, the result of the person in charge who took part in the meeting self-evaluating the content of the business negotiation using a star rating system.
[0033] Conversation Balance 320 is an example of an indicator that visualizes communication in a meeting. Conversation Balance 320 includes information such as the time a participant spends speaking, the time other participants spend speaking, and the silence periods when neither participant nor other participants speak.
[0034] Preferably, the communication support system 1 highlights indicators that exceed a predetermined threshold. In the example in Figure 2, the communication support system 1 highlights the value of "silence time" by underlining it. Also preferably, the communication support system 1 displays a graph 321 illustrating the ratio of the three indicators on the conversation balance 320.
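As an illustration only, the conversation balance described above could be computed along the following lines. This is a minimal sketch, not the method of the embodiment: the `Segment` record, the speaker labels "staff" and "customer", the assumption that speech segments do not overlap, and the per-indicator ratio thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str  # hypothetical labels: "staff" or "customer"
    start: float  # seconds from the start of the meeting
    end: float    # seconds from the start of the meeting


def conversation_balance(segments, meeting_length, thresholds):
    """Return (times, ratios, highlighted) for the three indicators.

    Assumes segments do not overlap, so silence is whatever remains
    after subtracting both speakers' speaking time.
    """
    times = {"staff": 0.0, "customer": 0.0}
    for seg in segments:
        times[seg.speaker] += seg.end - seg.start
    times["silence"] = max(meeting_length - times["staff"] - times["customer"], 0.0)

    # Ratios of the three indicators, e.g. for drawing a graph such as graph 321.
    ratios = {name: t / meeting_length for name, t in times.items()}

    # Indicators whose ratio exceeds its (hypothetical) threshold get highlighted.
    highlighted = [
        name for name, r in ratios.items()
        if r > thresholds.get(name, float("inf"))
    ]
    return times, ratios, highlighted


if __name__ == "__main__":
    segs = [Segment("staff", 0, 30), Segment("customer", 30, 50)]
    print(conversation_balance(segs, 100, {"silence": 0.4}))
```

In this sketch an indicator without an entry in `thresholds` is never highlighted, which keeps the highlighting rule optional per indicator.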
[0035] The volume timeline 330 is an example of an indicator for visualizing a meeting (communication). The volume timeline 330, for example, uses time on the horizontal axis to illustrate the speech volume 331 of a participant and the speech volume 332 of other participants in chronological order.
[0036] Preferably, the communication support system 1 calculates a moving average over a predetermined period (for example, a few seconds to about ten minutes) from the speech volume data and sets it as the speech volume 331 of the person in charge or the speech volume 332 of the customer.
[0037] Preferably, the communication support system 1 displays markers 333a and 333b at potentially problematic locations based on the time-series display of the speech volume 331 of the person in charge or the speech volume 332 of the customer. Potentially problematic locations include, for example, locations where the speech volume exceeds a predetermined threshold (speech volume is too loud or too quiet).
[0038] Preferably, the communication support system 1 can display a comment 334 indicating the reason why markers 333a and 333b were displayed when a cursor, such as a mouse pointer, is brought close to the markers 333a and 333b.
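The smoothing and marker placement described for the volume timeline 330 might be sketched as follows. The sampling of the speech volume into a list, the trailing (rather than centered) window, the lower and upper thresholds, and the wording of the comment attached to each marker are assumptions made for illustration; the embodiment does not specify them.

```python
def moving_average(values, window):
    """Trailing moving average over `window` samples.

    The first few outputs average over the samples available so far.
    """
    out = []
    total = 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]  # drop the sample leaving the window
        out.append(total / min(i + 1, window))
    return out


def find_markers(smoothed, low, high):
    """Return (sample_index, comment) pairs for samples outside [low, high].

    The comment plays the role of the reason shown when the cursor
    approaches a marker (hypothetical wording).
    """
    markers = []
    for i, v in enumerate(smoothed):
        if v > high:
            markers.append((i, "speech volume above upper threshold"))
        elif v < low:
            markers.append((i, "speech volume below lower threshold"))
    return markers


if __name__ == "__main__":
    smoothed = moving_average([1, 2, 3, 4], window=2)
    print(smoothed)
    print(find_markers(smoothed, low=1.2, high=3.0))
```

A real display would probably merge runs of consecutive flagged samples into a single marker; the sketch flags each sample separately to keep the logic visible.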
[0039] Furthermore, the communication support system 1 has a function to play the speech audio at a selected point from the speech volume 331 of the participants or the speech volume 332 of other participants, which are displayed in chronological order. For example, if an administrator selects the vicinity of the marker 333a for the speech volume 331 of the person in charge, the communication support system 1 moves the bar 335, which indicates the playback position of the speech audio, to the vicinity of the marker 333a. Also, if an administrator selects the "Play" button 336, the communication support system 1 starts playing the speech audio (the participant's speech audio or the speech audio of other participants) from the playback position indicated by the bar 335.
[0040] Thus, according to the communication support system 1 of this embodiment, the volume timeline 330 displayed on the display screen 300 allows administrators to identify potentially problematic sections in a business negotiation and selectively play the audio from those potentially problematic sections.
[0041] Figure 3 shows another example of a display screen according to one embodiment. In the example in Figure 3, the display screen 400 provided by the communication support system 1 displays the meeting summary 310, conversation balance 320, and conversation density 410. Note that the meeting summary 310 and conversation balance 320 are the same as those described in Figure 2, so their explanation is omitted here.
[0042] Conversation Density 410 is another example of an indicator used to visualize meetings (communication). Conversation Density 410, for example, plots the conversation density 411 of participants (staff) and the conversation density 412 of other participants (customers) in a time-series graph, with time on the horizontal axis. Here, Conversation Density 410 is represented, for example, by the number of utterances (or words) spoken within a given time.
[0043] Preferably, the communication support system 1 calculates a moving average over a predetermined period (for example, a few seconds to about ten minutes) from the calculated conversation density data to determine the participant's conversation density 411 or the conversation density 412 of other participants.
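For illustration, the conversation density (the number of utterances or words spoken within a given time) could be computed per fixed time bin as in the sketch below. The `(start_seconds, text)` utterance format, the 60-second bin width, and whitespace-based word counting are assumptions; whitespace splitting in particular would not suit languages such as Japanese, where a tokenizer would be needed.

```python
import math


def conversation_density(utterances, meeting_length, bin_seconds=60.0, count_words=False):
    """Count utterances (or words) per time bin.

    utterances: iterable of (start_seconds, text) for one speaker.
    Returns a list with one count per bin, in chronological order.
    """
    n_bins = max(1, math.ceil(meeting_length / bin_seconds))
    density = [0] * n_bins
    for start, text in utterances:
        # Clamp so an utterance starting exactly at the end lands in the last bin.
        idx = min(int(start // bin_seconds), n_bins - 1)
        density[idx] += len(text.split()) if count_words else 1
    return density


if __name__ == "__main__":
    customer = [(5, "hello there"), (20, "yes"), (70, "how much is it")]
    print(conversation_density(customer, 120))
    print(conversation_density(customer, 120, count_words=True))
```

The resulting per-bin series is the kind of data to which a moving average could then be applied before plotting.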
[0044] In the example shown in Figure 3, of the participant's conversation density 411 and the other participants' conversation density 412, the communication support system 1 enables the display of the conversation density 412 of other participants, which was selected by the administrator, and disables the display of the participant's conversation density 411.
[0045] Preferably, the communication support system 1 displays markers 413a and 413b in areas that appear problematic based on the conversation density 411 of the participants or the conversation density 412 of other participants displayed in chronological order. Areas that appear problematic include, for example, areas where the conversation density exceeds a predetermined threshold (where the conversation density is above or below a predetermined value).
[0046] Preferably, the communication support system 1 can display a comment 414 indicating the reason why markers 413a and 413b were displayed when a cursor, such as a mouse pointer, is brought close to the markers 413a and 413b.
[0047] Furthermore, the communication support system 1 has a function to play the spoken audio from a selected point in the conversation density 411 of a participant or the conversation density 412 of another participant, which are displayed in chronological order. For example, if an administrator selects the vicinity of marker 413a of a participant's conversation density 411, the communication support system 1 moves the bar 415, which indicates the playback position of the spoken audio, to the vicinity of marker 413a. Also, if an administrator selects the "Play" button 416, the communication support system 1 starts playing the spoken audio (the spoken audio of the person in charge or the spoken audio of the customer) from the playback position indicated by the bar 415.
[0048] Thus, the communication support system 1 according to this embodiment allows administrators to identify potentially problematic areas in a business negotiation based on the conversation density 410 displayed on the display screen 400, and to selectively play back the audio of those potentially problematic areas.
[0049] Note that the display screens 300 and 400 shown in Figures 2 and 3 are examples of display screens provided by the communication support system 1. For example, the communication support system 1 may provide a display screen that simultaneously displays the volume timeline 330 of display screen 300 and the conversation density 410 of display screen 400. In this case, the communication support system 1 may also display markers in areas that appear to be problematic based on the volume timeline 330 and the conversation density 410.
[0050] Thus, according to this embodiment, a communication support system 1 can be provided that facilitates the later confirmation of the content of communication.
[0051] Figure 4 shows another example of a display screen according to one embodiment. The display screen 700 in Figure 4 shows time-series conversation information 311, negotiation results 312, content evaluation summary 313, utterance timeline 314, and used content list 315.
[0052] The time-series conversation information 311 displays the volume 316 when the salesperson speaks, the volume 317 when the customer speaks, and the conversation ratio 318, corresponding to the elapsed time since the start of the business negotiation. The conversation ratio 318 is the ratio (moving average) of the customer's speaking time to the salesperson's speaking time in the most recent fixed period (e.g., 1 minute). Furthermore, the time-series conversation information 311 also displays time-series video data 319. The time-series video data 319 consists of still images periodically extracted from video data recorded during the business negotiation, or still images taken immediately after content changes. The time-series video data 319 may also allow for playback of the video data itself.
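The conversation ratio 318 could, for example, be evaluated over a trailing window as sketched below. The `(speaker, start, end)` segment tuples, the labels "sales" and "customer", and returning `None` when the salesperson did not speak in the window are assumptions for illustration only.

```python
def overlap(start, end, lo, hi):
    """Length of the intersection of [start, end] and [lo, hi], in seconds."""
    return max(0.0, min(end, hi) - max(start, lo))


def conversation_ratio(segments, now, window=60.0):
    """Customer speaking time divided by salesperson speaking time
    within the most recent `window` seconds before `now`.

    segments: iterable of (speaker, start, end), times in seconds from the
    start of the negotiation. Returns None if the salesperson was silent
    for the whole window (the ratio is undefined).
    """
    lo = max(0.0, now - window)
    customer = sum(overlap(s, e, lo, now) for who, s, e in segments if who == "customer")
    sales = sum(overlap(s, e, lo, now) for who, s, e in segments if who == "sales")
    if sales == 0:
        return None
    return customer / sales


if __name__ == "__main__":
    segs = [("sales", 0, 40), ("customer", 40, 60), ("sales", 60, 90)]
    print(conversation_ratio(segs, now=90))
```

Evaluating the function at successive values of `now` yields the time series that would be plotted against elapsed time.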
[0053] The negotiation result 312 displays information related to the outcome of the negotiation, such as negotiation details entered by the sales representative (described later) and materials obtained during the negotiation.
[0054] The content evaluation summary 313 displays the content used during the sales negotiation in order (ranking) of how positively it influenced the negotiation. A positive influence means that the content worked in the direction of the customer purchasing the product. The order of positive influence can be the order of the evaluation scores described later. Preferably, the communication support server 20 obtains the content evaluation summary 313 by inputting the content evaluation results described later, together with the information about the sales negotiation entered by the sales representative (sales negotiation information), into a large language model. The content evaluation summary 313 may also be the evaluation results for each piece of content themselves.
[0055] In the example in Figure 4, three pieces of content were used in the sales presentation materials: a business image, a comparison table with other companies' copiers, and a manual. Based on the evaluation results of this content, a large language model generated the following content evaluation summary 313.
1. Business image: The business image of Company B's copier A allowed Mr. Suzuki to concretely imagine the value and convenience of using the copier, which piqued his interest.
2. Comparison table with other companies' copiers: The comparison table allowed for a visual comparison of price, performance, etc., which helped Mr. Suzuki understand the advantages of Company B's copier.
3. Manual: While the manual was an important source of information, it did not have as direct an impact on Mr. Suzuki as the business image or the comparison table.
[0056] The utterance timeline 314 displays text data obtained by speech recognition of what the customer and the sales representative said. The text data is displayed in correspondence with the elapsed time since the start of the negotiation. Since the customer and the sales representative are registered before the negotiation, the user who made each statement is identified during the negotiation. For this reason, each statement is associated with either the customer or the sales representative.
[0057] The used content list 315 displays all content used during the business negotiation, including thumbnails. The content in Figure 4 is the business image 315a, the comparison table with other companies' copiers 315b, and the manual 315c. The used content list 315 also displays links (URLs) 324 to 326 to the content itself. In addition, the used content list 315 shows an evaluation score for each piece of content, indicated by the number of stars 321 to 323, as described later.
[0058] Since the time period during which each piece of content is displayed (elapsed time from the start of the business negotiation) and the time the text data was spoken (elapsed time from the start of the business negotiation) are recorded, the text data is associated with the content. For example, when a marketing person selects a piece of content, only the text data of the audio spoken during the time period when that content was displayed will be displayed, or highlighted.
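The association between text data and content by elapsed time could be realized, for instance, as in the following sketch. The record layouts `(content_id, start, end)` and `(time, speaker, text)`, the content identifiers, and the half-open interval convention are hypothetical and are not taken from the embodiment.

```python
def utterances_for_content(content_id, display_periods, utterances):
    """Return the utterances spoken while `content_id` was displayed.

    display_periods: list of (content_id, start, end), seconds from the
        start of the negotiation; one content may appear several times.
    utterances: list of (time, speaker, text), same time base.
    """
    periods = [(s, e) for cid, s, e in display_periods if cid == content_id]
    # Half-open interval [start, end): an utterance at a switch-over instant
    # is attributed to the content that is newly displayed.
    return [u for u in utterances if any(s <= u[0] < e for s, e in periods)]


if __name__ == "__main__":
    periods = [
        ("business_image", 0, 120),
        ("comparison_table", 120, 300),
        ("business_image", 300, 360),
    ]
    talk = [
        (30, "customer", "That looks useful."),
        (150, "customer", "How does the price compare?"),
        (310, "sales", "Going back to the first slide."),
    ]
    for u in utterances_for_content("business_image", periods, talk):
        print(u)
```

Selecting a piece of content on the screen would then amount to calling the function with that content's identifier and displaying or highlighting only the returned utterances.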
[0059] In addition, a summary of the meeting may be displayed on the display screen 700. The meeting summary may include, for example, the names of the participants (person in charge) who attended the web conference, the names of other participants (customers) who attended, the date and time the meeting started, and an evaluation. Here, the evaluation could be, for example, the result of the person in charge who attended the meeting self-evaluating the business negotiation content using a star rating system.
[0060] <Hardware Configuration> (Computer hardware configuration) The conference server 10, communication support server 20, content server 40, CRM server 50, terminal device 100, sales representative terminal 3, and marketing representative terminal 4 have, for example, the hardware configuration of a computer 500 as shown in Figure 5. Note that each server may be implemented by multiple computers 500.
[0061] Figure 5 shows an example of the hardware configuration of a computer according to one embodiment. The computer 500 includes, for example, a CPU (Central Processing Unit) 501, ROM (Read Only Memory) 502, RAM (Random Access Memory) 503, HD (Hard Disk) 504, HDD (Hard Disk Drive) controller 505, display 506, external device connection I / F (Interface) 507, network I / F 508, keyboard 509, pointing device 510, DVD-RW (Digital Versatile Disk Rewritable) drive 512, media I / F 514, and bus line 515, as shown in Figure 5.
[0062] Furthermore, if the computer 500 is the terminal device 100, the computer 500 further includes a microphone 521, a speaker 522, an audio input / output interface 523, a CMOS (Complementary Metal Oxide Semiconductor) sensor 524, and an image sensor interface 525, etc.
[0063] Of these, the CPU 501 controls the overall operation of the computer 500. The ROM 502 stores programs used to start the computer 500, such as the IPL (Initial Program Loader). The RAM 503 is used, for example, as the work area for the CPU 501. The HD 504 stores programs such as the OS (Operating System), applications, device drivers, and various data. The HDD controller 505 controls the reading or writing of various data to the HD 504, for example, according to the control of the CPU 501. Note that the HD 504 and the HDD controller 505 are examples of storage devices.
[0064] Display 506 displays various information, such as a cursor, menu, window, text, or image. Note that display 506 may be located outside the computer 500. External device connection I / F 507 is an interface for connecting various external devices to the computer 500. One or more network I / F 508 are interfaces for connecting the computer 500 to the communication network 2 and communicating with other devices.
[0065] The keyboard 509 is a type of input device equipped with multiple keys for inputting characters, numbers, and various instructions. The pointing device 510 is a type of input device for selecting and executing various instructions, selecting processing targets, moving the cursor, and so on. The keyboard 509 and the pointing device 510 may be located outside the computer 500.
[0066] The DVD-RW drive 512 controls the reading or writing of various data to the DVD-RW 511, which is an example of a removable recording medium. Note that the DVD-RW 511 is not limited to DVD-RW; other recording media may also be used. The media interface 514 controls the reading or writing (storage) of data to the media 513, such as flash memory. The bus line 515 includes an address bus, a data bus, and various control signals for electrically connecting the above components.
[0067] Microphone 521 is a built-in circuit that converts sound into electrical signals. Speaker 522 is a built-in circuit that converts electrical signals into physical vibrations to produce sound such as music and speech. Audio input / output interface 523 is a circuit that processes the input and output of audio signals between microphone 521 and speaker 522 according to the control of CPU 501.
[0068] The CMOS sensor 524 is a type of built-in imaging means that captures an image of a subject (e.g., a self-portrait) and obtains image data according to the control of the CPU 501. The computer 500 may have other imaging means, such as a CCD (Charge Coupled Device) sensor, instead of the CMOS sensor 524. The image sensor interface 525 is a circuit that controls the driving of the CMOS sensor 524.
[0069] (Example hardware configuration of the terminal device, sales representative terminal, and marketing representative terminal) Figure 6 shows an example of the hardware configuration of a terminal device 100 according to one embodiment. Here, an example of the hardware configuration when the terminal device 100, sales representative terminal 3, and marketing representative terminal 4 are information terminals such as smartphones or tablet terminals will be described. Note that the terminal device 100 may be an electronic device that has a computer configuration and web conferencing functions, such as an IWB (Interactive White Board) or a meeting device.
[0070] In the example shown in Figure 6, the terminal device 100, the sales representative terminal 3, and the marketing representative terminal 4 are equipped with a CPU 601, ROM 602, RAM 603, storage device 604, CMOS sensor 605, image sensor I / F 606, acceleration / direction sensor 607, media I / F 609, and GPS (Global Positioning System) receiver 610.
[0071] Of these, the CPU 601 controls the operation of the entire terminal device 100, the sales representative terminal 3, and the marketing representative terminal 4 by executing a predetermined program. The ROM 602 stores programs used to start the CPU 601, such as an IPL. The RAM 603 is used as the work area for the CPU 601. The storage device 604 is a large-capacity storage device that stores programs such as the OS and applications, and various data, and is implemented by, for example, an SSD (Solid State Drive) or flash ROM.
[0072] The CMOS sensor 605 is a type of built-in imaging means that captures an image of a subject (mainly a self-portrait) and obtains image data according to the control of the CPU 601. Note that the terminal device 100, the sales representative terminal 3, and the marketing representative terminal 4 may have imaging means such as a CCD sensor instead of the CMOS sensor 605. The image sensor interface 606 is a circuit that controls the driving of the CMOS sensor 605. The acceleration / direction sensor 607 comprises various sensors, such as an electronic magnetic compass that detects the Earth's magnetic field, a gyrocompass, and an acceleration sensor. The media interface 609 controls the reading or writing (storage) of data to or from the media (storage medium) 608, such as flash memory. The GPS receiver 610 receives GPS signals (positioning signals) from GPS satellites.
[0073] Furthermore, the terminal device 100, the sales representative terminal 3, and the marketing representative terminal 4 are equipped with a long-range communication circuit 611, an antenna 611a for the long-range communication circuit 611, a CMOS sensor 612, an image sensor interface 613, a microphone 614, a speaker 615, an audio input / output interface 616, a display 617, an external device connection interface 618, a short-range communication circuit 619, an antenna 619a for the short-range communication circuit 619, and a touch panel 620.
[0074] Of these, the long-range communication circuit 611 is a circuit that communicates with other devices, for example, via the communication network 2. The CMOS sensor 612 is a type of built-in imaging means that captures an image of a subject and obtains image data according to the control of the CPU 601. The image sensor interface 613 is a circuit that controls the driving of the CMOS sensor 612. The microphone 614 is a built-in circuit that converts sound into electrical signals. The speaker 615 is a built-in circuit that converts electrical signals into physical vibrations to produce sounds such as music and speech. The audio input / output interface 616 is a circuit that processes the input and output of audio signals between the microphone 614 and the speaker 615 according to the control of the CPU 601.
[0075] The display 617 is a type of display means, such as a liquid crystal or organic EL (electroluminescence), that displays images of the subject and various icons. The external device connection I / F 618 is an interface for connecting various external devices. The short-range communication circuit 619 includes a circuit for performing short-range wireless communication. The touch panel 620 is a type of input means that allows the user to operate the terminal device 100 by pressing the display 617.
[0076] Furthermore, the terminal device 100, the sales representative terminal 3, and the marketing representative terminal 4 are equipped with a bus line 621. The bus line 621 includes an address bus, a data bus, and the like for electrically connecting each component, such as the CPU 601 shown in Figure 6.
[0077] Note that the hardware configurations of terminal device 100, sales representative terminal 3, and marketing representative terminal 4 shown in Figure 6 are just examples. Terminal device 100, sales representative terminal 3, and marketing representative terminal 4 may have various other hardware configurations, as long as they include a computer, communication circuits, a display, a microphone, and a speaker.
[0078] <Functional Configuration> Next, the functional configuration of the communication support system 1 will be described based on Figure 7. Figure 7 is a diagram showing an example of the functional configuration of the communication support system 1 according to one embodiment. Here, the communication support system 1 is assumed to have the system configuration shown in Figure 1.
[0079] <<Functional Configuration of Sales Representative Terminal 3>> The sales representative terminal 3 includes a communication unit 201, a conference control unit 202, an audio transmission unit 203, a display control unit 204, and an operation reception unit 205, etc. Each of these functional units is a function or means realized by the CPU 501 (or CPU 601) executing instructions contained in one or more programs installed on the sales representative terminal 3. This program may be a web application executed by a web browser or a dedicated native application. At least some of the above functional configurations may be realized by hardware.
[0080] The communication unit 201, for example, uses the network I / F 508 (or the long-range communication circuit 611) to connect the sales representative terminal 3 to the communication network 2 and performs communication processing to communicate with the conference server 10, the communication support server 20, the content server 40, the CRM server 50, and the terminal device 100, etc.
[0081] The conference control unit 202 connects to the conference server 10, for example, using the communication unit 201, and transmits and receives conference video (or conference audio). The conference control unit 202 also performs input and output of conference audio using the audio input / output interface 523 (or audio input / output interface 616), and output of conference video using the display 506 (or display 617). The processing performed by the conference control unit 202 may be the same as that of a typical web conference.
[0082] The audio transmission unit 203 acquires the spoken audio of participants in a meeting (for example, sales representatives) and transmits it to the communication support server 20. For example, the audio transmission unit 203 acquires audio such as voice acquired by a microphone and voice output from a speaker from the audio input / output interface 523 (or audio input / output interface 616), and transmits the acquired audio (audio data) to the communication support server 20. In this case, the communication support server 20 is not limited to web conferencing, but can also support communication in regular meetings and other situations.
[0083] The display control unit 204 performs display control processing to display, for example, a UI (User Interface) screen of the sales representative terminal 3 on a display unit such as the display 506 (or display 617). The display control unit 204 may also identify the content ID of the content displayed on the screen at each unit of time and transmit it to the communication support server 20 via the communication unit 201. Alternatively, the display control unit 204 may transmit the content ID of the content displayed on the screen to the communication support server 20 via the communication unit 201 at the time the displayed content changes.
[0084] The operation reception unit 205 executes operation reception processing to receive operations from sales representatives using the sales representative terminal 3. For example, the operation reception unit 205 receives input operations on the screen displayed by the display control unit 204.
[0085] <<Functional Configuration of Marketing Representative Terminal and Terminal Device>> The functional configuration of terminal device 100 is the same as that of sales representative terminal 3. Specifically, terminal device 100 includes a communication unit 101, a conference control unit 102, an audio transmission unit 103, a display control unit 104, and an operation reception unit 105, etc.
[0086] Furthermore, the marketing representative terminal 4 includes a communication unit 211 (an example of a second communication unit), a display control unit 212, and an operation reception unit 213, etc. Since the marketing representative terminal 4 is not intended to participate in web conferences, it does not have conference-related functions, but it may nevertheless include a conference control unit and an audio transmission unit.
[0087] <<Communication Support Server>> The communication support server 20 includes a communication unit 21, a setting reception unit 22, an acquisition unit 23, a speech recognition unit 24, an analysis unit 25, a linking unit 26, an evaluation unit 27, and a save request unit 28. Each of these functional units is a function or means realized by the CPU 501 executing instructions contained in one or more programs installed on the communication support server 20. At least a part of the above functional configuration may be realized by hardware.
[0088] Furthermore, the communication support server 20 has a storage unit 30 formed by storage devices such as an HD 504 and an HDD controller 505. The storage unit 30 includes an audio / video data storage unit 31, a business negotiation information storage unit 32, a linking information storage unit 33, a business negotiation document information storage unit 34, and an analysis image data storage unit 35. Each of these storage units only needs to be located on a network accessible by the communication support server 20.
[0089] The communication unit 21 (an example of a first communication unit) uses the network I / F 508, etc., to connect the communication support server 20 to the communication network 2, and performs communication processing to communicate with the sales representative terminal 3, the terminal device 100, the marketing representative terminal 4, the content server 40, the conference server 10, the CRM server 50, etc.
[0090] The setting reception unit 22 executes a setting reception process to receive settings for the communication support system 1. For example, the setting reception unit 22 receives content that the marketing representative registers on a screen displayed by the marketing representative terminal 4.
[0091] The acquisition unit 23 performs an acquisition process to acquire audio data including the speeches of sales representatives and customers participating in an offline meeting, as well as audio data including the speeches of other participants. The acquisition unit 23 also performs an acquisition process to acquire video data including the faces of sales representatives and customers participating in an offline meeting. Furthermore, the acquisition unit 23 acquires audio data including the speeches of sales representatives and participants, as well as video data, transmitted from the sales representative terminal 3 or terminal device 100 to the conference server 10 or communication support server 20 during an online meeting.
[0092] The speech recognition unit 24 performs speech recognition processing on the voice data acquired by the acquisition unit 23. For example, the speech recognition unit 24 converts the voice spoken by a sales representative, a customer, or another participant into text data through speech recognition. The speech recognition unit 24 also determines the source of the speech based on the identification information of the sales representative terminal 3 or terminal device 100 that transmitted the voice, and associates the source of the speech with the text data. Since participants in the business negotiation are registered in advance, whether a participant is a sales representative or a customer is determined by the user ID identified at login. The speech recognition unit 24 also associates the text data with the elapsed time since the start of the web conference. Speech recognition may be performed during the business negotiation or after the business negotiation. Speech recognition may also be performed by an external speech recognition server.
[0093] The analysis unit 25 performs analysis processing to analyze the audio data acquired by the acquisition unit 23. For example, the analysis unit 25 determines the volume of the audio data for each unit of time (e.g., from one second to several tens of seconds) and associates the volume with the elapsed time since the start of the web conference. In addition, the analysis unit 25 calculates the ratio of the customer's speaking time to the sales representative's speaking time and its moving average at regular intervals, and associates this with the elapsed time since the start of the web conference.
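The two measurements performed by the analysis unit 25 can be illustrated with a short sketch. The function names, the use of root-mean-square amplitude as the volume measure, and the handling of intervals in which the sales representative did not speak are assumptions made for illustration; the source does not specify them.

```python
import math


def rms_per_window(samples, window):
    """Volume per unit of time, here taken as the RMS amplitude of each window."""
    volumes = []
    for i in range(0, len(samples), window):
        chunk = samples[i:i + window]
        volumes.append(math.sqrt(sum(x * x for x in chunk) / len(chunk)))
    return volumes


def speaking_ratio_series(customer_s, sales_s):
    """Ratio of customer speaking time to sales speaking time per interval.

    An interval in which the sales representative did not speak yields None.
    """
    return [c / s if s > 0 else None for c, s in zip(customer_s, sales_s)]


def moving_average(values, window):
    """Trailing moving average that skips None entries."""
    averaged = []
    for i in range(len(values)):
        chunk = [v for v in values[max(0, i - window + 1):i + 1] if v is not None]
        averaged.append(sum(chunk) / len(chunk) if chunk else None)
    return averaged
```

Each list index corresponds to one interval, so the index multiplied by the interval length gives the elapsed time since the start of the web conference with which the value is associated.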
[0094] The linking unit 26 performs a linking process that links the sales materials used by the sales representative in a web conference with the content of the content server 40. The sales representative may use the content of the content server 40 as sales materials as is, or they may edit the content to create sales materials. In this embodiment, it is preferable that the sales materials and content are linked in advance in order to evaluate the content included in the sales materials explained during the sales meeting.
[0095] The linking unit 26 retrieves the image data of the content from the content server 40 for this linking, associates it with the content ID, and saves it as "image data for analysis." The linking unit 26 compares the image data of the sales negotiation materials with the image data for analysis, and if there is any image data for analysis that is similar enough, it links the original content with the sales negotiation materials.
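The comparison step can be sketched as follows. The source does not state which similarity measure is used, so this sketch assumes a simple per-pixel comparison of equally sized grayscale images represented as nested lists; `matching_rate`, `link_page`, the tolerance, and the threshold are all hypothetical.

```python
def matching_rate(img_a, img_b, tolerance=10):
    """Fraction of pixels whose grayscale values differ by at most `tolerance`."""
    if len(img_a) != len(img_b) or any(len(ra) != len(rb) for ra, rb in zip(img_a, img_b)):
        raise ValueError("images must have the same dimensions")
    total = sum(len(row) for row in img_a)
    if total == 0:
        raise ValueError("images must not be empty")
    same = sum(
        1
        for row_a, row_b in zip(img_a, img_b)
        for a, b in zip(row_a, row_b)
        if abs(a - b) <= tolerance
    )
    return same / total


def link_page(page_img, analysis_images, threshold=0.8):
    """Return (content_id, rate) for the best match, or (None, rate) if below threshold.

    analysis_images maps a content ID to its image data for analysis.
    """
    best_id, best_rate = None, 0.0
    for content_id, img in analysis_images.items():
        rate = matching_rate(page_img, img)
        if rate > best_rate:
            best_id, best_rate = content_id, rate
    if best_rate >= threshold:
        return best_id, best_rate
    return None, best_rate
```

Returning the rate even when no link is made leaves room for recording a matching rate alongside an unlinked page.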
[0096] The evaluation unit 27 evaluates the content used in the business negotiation and transmits the evaluation results of the content to the content server 40 via the communication unit 21. The functions of the evaluation unit 27 are explained in Figure 8. The evaluation unit 27 evaluates the content so that the higher the score, the more likely the content is to have influenced the purchase of the product.
[0097] The save request unit 28 requests the content server 40, via the communication unit 21, to save the evaluation results created by the evaluation unit 27 and the content evaluation summary.
[0098] Furthermore, the various functional configurations described above for the communication support server 20 may also be present in the sales representative terminal 3 or terminal device 100.
[0099] Figure 8 shows the functions of the evaluation unit 27 according to one embodiment. The evaluation unit 27 includes a gaze direction detection unit 271, a facial expression detection unit 272, a display time measurement unit 273, a question count measurement unit 274, and a tone detection unit 275.
[0100] The gaze direction detection unit 271 performs gaze direction detection processing: it analyzes the customer's face image, which is acquired via the image sensor I / F 525 (or image sensor I / F 606) and included in the video data recorded during the business negotiation, to detect the gaze direction. The gaze direction may be, for example, binary information indicating whether the gaze is inside or outside the content, or a 3D vector corresponding to the visual axis in a 3D coordinate system with the center of the face as the origin. Existing methods are used for gaze direction detection. One known example is the head pose estimation method provided by OpenCV and dlib. This method detects the 2D coordinates (coordinates in the image) of several feature points in the frames that make up the video data. Feature points are, for example, the outer corners of the eyes, the tip of the nose, the corners of the mouth, and the tip of the chin. Feature points can be detected using object detection techniques such as YOLO (You Only Look Once). The gaze direction detection unit 271 holds a 3D model of a typical human head. The 3D model is composed of polygons and a collection of three-dimensional points, and the 3D coordinates of the feature points whose 2D coordinates are detected are known in the model. The 3D coordinates of any feature point can be identified by a matrix transformation that maps 2D coordinates to 3D coordinates. The gaze direction is, for example, a perpendicular vector drawn from the tip of the nose to the plane that passes through the two outer corners of the eyes and the tip of the chin. Here, the orientation of the face is used as the gaze direction, but the gaze direction detection unit 271 may further detect the position of the pupils and correct the gaze direction.
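The geometric step at the end of this paragraph, taking the perpendicular to the plane through the two outer eye corners and the chin, can be sketched with a cross product. The function name `face_direction` and the sign convention (the unit normal is oriented toward the side of the plane on which the nose tip lies) are assumptions made for illustration; the 3D feature point coordinates are taken as already known.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def face_direction(left_eye, right_eye, chin, nose_tip):
    """Unit normal of the plane through both outer eye corners and the chin.

    The normal is flipped if necessary so that it points toward the nose tip
    (an assumed convention for which way the face is oriented).
    """
    normal = _cross(_sub(right_eye, left_eye), _sub(chin, left_eye))
    length = math.sqrt(_dot(normal, normal))
    if length == 0:
        raise ValueError("the three plane points are collinear")
    normal = tuple(c / length for c in normal)
    if _dot(normal, _sub(nose_tip, left_eye)) < 0:
        normal = tuple(-c for c in normal)
    return normal
```

For a face looking straight along the positive z axis, with the eyes and chin in the plane z = 0 and the nose tip in front of it, the result is the unit vector along z.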
[0101] Furthermore, the gaze direction detection unit 271 can also measure the degree of eye opening (simply put, open and closed eye states). For example, the gaze direction detection unit 271 measures the degree of eye opening based on the difference between the position of the upper eyelid and the position of the lower eyelid. The gaze direction detection unit 271 detects the gaze direction and the degree of eye opening at intervals of time, for example, and records the gaze direction and the degree of eye opening in correspondence with the time since the start of the meeting.
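A minimal sketch of the eye-opening measurement follows. Dividing the eyelid gap by the eye width (so that the value does not depend on how large the face appears in the frame) and the threshold of 0.2 are assumptions, as are all function names; the source only states that the difference between the eyelid positions is used.

```python
def eye_opening(upper_lid_y, lower_lid_y, eye_width):
    """Eyelid gap normalized by eye width (image y coordinates grow downward)."""
    if eye_width <= 0:
        raise ValueError("eye_width must be positive")
    return max(0.0, lower_lid_y - upper_lid_y) / eye_width


def is_eye_open(opening, threshold=0.2):
    """Binary open/closed decision from the normalized opening."""
    return opening >= threshold


def record_eye_states(samples, threshold=0.2):
    """samples: (elapsed_s, upper_lid_y, lower_lid_y, eye_width) tuples.

    Returns (elapsed_s, opening, is_open) records keyed by time since the
    start of the meeting.
    """
    records = []
    for elapsed_s, upper_y, lower_y, width in samples:
        opening = eye_opening(upper_y, lower_y, width)
        records.append((elapsed_s, opening, is_eye_open(opening, threshold)))
    return records
```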
[0102] The facial expression detection unit 272 detects the customer's face image, which is acquired via the image sensor I / F 525 (or image sensor I / F 606) and included in the video data recorded during the business negotiation, and performs facial expression detection processing to detect facial expressions by analyzing the face image. The facial expression detection unit 272 detects the face portion using an object detection technique such as YOLO. The facial expression detection unit 272 extracts features from the face portion using a CNN (Convolutional Neural Network) or the like, inputs them into a classification model for classifying facial expressions, and classifies the facial expression (e.g., anger, disgust, fear, happiness, sadness, surprise, neutral).
[0103] The display time measurement unit 273 measures the display time for each piece of content during the business negotiation. The content is content registered in the content server 40.
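One way to obtain the display time per content item is to accumulate the gaps between successive content-change notifications. The sketch below assumes such notifications arrive as `(elapsed_s, content_id)` pairs sorted by time, with `None` meaning that no registered content is on screen; this event format and the function name are hypothetical.

```python
from collections import defaultdict


def display_time_per_content(change_events, end_s):
    """Total display time in seconds for each content ID.

    change_events: list of (elapsed_s, content_id or None), sorted by time.
    end_s: elapsed time at which the business negotiation ended.
    """
    totals = defaultdict(float)
    # Pair each event with the start time of the next one (or the meeting end).
    following = change_events[1:] + [(end_s, None)]
    for (start, content_id), (next_start, _) in zip(change_events, following):
        if content_id is not None:
            totals[content_id] += next_start - start
    return dict(totals)
```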
[0104] The question count measurement unit 274 performs a measurement process to measure the number of questions asked by the customer through utterances for each piece of content. The audio data, including the utterances, is converted into text data. The question count measurement unit 274 can determine whether or not a question is included in the text data by converting the text data into a distributed representation and inputting it into a classification model. A distributed representation is information represented in a vector space by converting the meaning of a single word or sentence into a fixed-length vector. BERT®, Doc2Vec, word2Vec, etc., may be used for the conversion to a distributed representation.
[0105] Alternatively, the question count measurement unit 274 may determine whether an utterance is a question by checking whether the text data contains strings such as "~desu ka," "~masu ka," or "~deshou ka." If a question is detected in the text data, the question count measurement unit 274 records the text data (question) in association with the time from the start of the meeting.
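The string-based check can be sketched as below. The kana spellings of the three sentence endings and the `(elapsed_s, text)` input format are assumptions; the marker list would need to be adapted to the language of the negotiation.

```python
# Assumed kana forms of "~desu ka", "~masu ka", and "~deshou ka".
QUESTION_MARKERS = ("ですか", "ますか", "でしょうか")


def is_question(text):
    """True if the recognized text contains one of the question markers."""
    return any(marker in text for marker in QUESTION_MARKERS)


def record_questions(utterances):
    """utterances: (elapsed_s, text) pairs from the customer.

    Returns the questions together with the time from the start of the meeting.
    """
    return [(elapsed_s, text) for elapsed_s, text in utterances if is_question(text)]
```

A plain substring check is cheap but can misfire on text that merely contains a marker mid-sentence, which is one reason a classification model may be preferred.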
[0106] Furthermore, the question count measurement unit 274 also counts questions related to content that is not currently displayed. Whether a question concerns content that is not currently displayed is determined by comparing the question with text data extracted from the content and with text data obtained by processing the content with OCR (Optical Character Recognition).
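The source does not say how the question is compared with the content text. One simple possibility, shown here purely as an assumption, is to count shared word tokens; `tokens`, `related_content`, and `min_overlap` are hypothetical names, and the tokenizer handles only ASCII words.

```python
import re


def tokens(text):
    """Lowercased alphanumeric word tokens (ASCII only, for illustration)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def related_content(question, content_texts, min_overlap=2):
    """Content IDs whose text shares at least `min_overlap` tokens with the question.

    content_texts maps a content ID to its embedded text plus OCR output.
    Results are ordered by descending overlap, then by content ID.
    """
    question_tokens = tokens(question)
    scored = [
        (len(question_tokens & tokens(text)), content_id)
        for content_id, text in content_texts.items()
    ]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [content_id for score, content_id in scored if score >= min_overlap]
```

Common function words inflate the overlap, so a practical version would likely filter stop words or use the distributed representations mentioned earlier instead.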
[0107] The tone detection unit 275 performs tone detection processing to detect the tone of the customer's voice data. For example, the tone detection unit 275 performs spectral analysis on the voice data and converts the relationship between frequency and intensity into a tone value. The higher the frequency and the greater the intensity, the higher the resulting tone value. If the tone rises, it can be estimated that the customer is interested in the content. Alternatively, the tone detection unit 275 may perform formant analysis and estimate that the tone is high if the change in the first formant over time exceeds a threshold.
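The spectral approach can be illustrated with a small sketch. The source does not define how frequency and intensity are combined, so the intensity-weighted frequency sum used here is an assumption chosen only because it grows with both quantities; the function names are hypothetical, and the naive discrete Fourier transform is suitable only for short frames.

```python
import cmath
import math


def magnitude_spectrum(samples, sample_rate):
    """(frequency_hz, magnitude) pairs from a naive DFT of one frame."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2):
        x_k = sum(s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(samples))
        spectrum.append((k * sample_rate / n, abs(x_k)))
    return spectrum


def tone_value(samples, sample_rate):
    """Assumed tone measure: sum of frequency times magnitude, per sample."""
    if not samples:
        return 0.0
    spectrum = magnitude_spectrum(samples, sample_rate)
    return sum(freq * mag for freq, mag in spectrum) / len(samples)


def tone_is_rising(earlier_frame, later_frame, sample_rate):
    """True if the later frame has a higher tone value than the earlier one."""
    return tone_value(later_frame, sample_rate) > tone_value(earlier_frame, sample_rate)
```

With this measure, a frame with a higher dominant frequency or a larger amplitude than the preceding frame counts as a rise in tone.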
[0108] Furthermore, the tone detection unit 275 analyzes the voice data of the customer during the business negotiation and performs a measurement process to determine what emotions the customer is expressing through their speech. In this embodiment, since the aim is to determine whether the customer is interested in the content, the emotions measured can be interested, uninterested, favorable, or disgusted. Existing methods are used for the specific measurement. For example, the tone detection unit 275 acquires acoustic features (features related to speech frequency, volume, voice quality, speech rate, etc.) from the voice data and inputs them into a model trained using deep learning or the like to determine the emotion.
[0109] (Audio / video data storage unit 31) Next, the data stored in each storage unit in Figure 7 will be described with reference to Figure 9 and subsequent figures.
[0110] Figure 9 shows audio and video data stored in an audio and video data storage unit 31 according to one embodiment. The audio and video data storage unit 31 stores audio and video data recorded during business negotiations.
• The opportunity ID is identification information that identifies a business opportunity. One opportunity corresponds to one web conference.
• The speech data URL is the URL where the speech data, which is a recording of the voice spoken by participants (sales representatives, customers, and other participants) in a web conference, is stored.
• The text data URL is the URL where the text data, which is the result of speech recognition converting the spoken data, is stored.
• The video data URL is the URL where the video data of the recorded web conference is stored. Preferably, the video data is video data captured by the customer's terminal device 100.
• The sales opportunity information URL is the URL where the sales opportunity information entered by the sales representative is stored.
[0111] (Business negotiation information storage unit 32) Figure 10 shows the business negotiation information stored in the business negotiation information storage unit 32 according to one embodiment. The business negotiation information storage unit 32 stores bibliographic information related to the business negotiation. As will be described later, the sales representative inputs the information related to the business negotiation.
• The opportunity ID is identification information that identifies a business opportunity. One opportunity corresponds to one web conference.
• The date and time are the date and time when the business meeting was held or is scheduled to be held.
• The products in question are those that are or have been the subject of negotiations.
• The sales representative is the sales representative who conducts or has conducted the business negotiation.
• The client is the name of the customer who is or was the other party in the business negotiation. Clients are associated with company names, departments, telephone numbers, etc.
• The decision-maker is the person who has the authority to approve the business negotiation.
• The status is the state of the business negotiation, and is one of the following: pre-negotiation, negotiation in progress, or completed.
• The deal details are optional details of a deal. These deal details may be entered by the sales representative, or they may be generated by a large-scale language model based on other entered fields.
Other information to be managed includes notes, observations, and action lists regarding business negotiations.
[0112] (Linking information storage unit 33) Figure 11 shows the linking information stored in the linking information storage unit 33 according to one embodiment.
• The ID is an identifier that identifies a single record in the linking information.
• The sales negotiation document ID is identification information that identifies the sales negotiation document. The linking unit 26 of the communication support server 20 assigns a unique sales negotiation document ID.
• The page number is the page number of the sales presentation document. The page number of the page containing the content that best matches the image data for analysis (provided a certain level of matching is achieved) is set.
• The content link is the URL where the content exists, if the content is specified by a URL. A row with a null content link indicates that no content similar to the sales presentation material was found.
• The content ID is the content ID on the content server 40 for the content linked to the sales negotiation materials.
• The matching rate is the percentage of the content included in the sales materials that matches the image data used for analysis.
• Difference images are images showing the difference between the content included in the sales materials and the image data used for analysis.
Marketing personnel first register the content on the content server 40. Sales personnel, however, may edit the content before incorporating it into sales materials, so the two may not match perfectly. By referring to the difference images, marketing personnel can learn what kind of content is desired and use that information to improve future content creation.
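The record layout described above can be sketched as a data class. The class name, field names, and types are hypothetical; only the meaning of each field, including the use of a null content link for pages with no similar content, comes from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkRecord:
    record_id: int                    # identifies one record of linking information
    document_id: str                  # sales negotiation document ID
    page_number: int                  # page of the sales negotiation document
    content_link: Optional[str]       # URL of the content; None if nothing similar was found
    content_id: Optional[str]         # content ID on the content server
    matching_rate: Optional[float]    # e.g. 0.92 for a 92 % match
    difference_image: Optional[str]   # location of the difference image, if any


def linked_pages(records):
    """Records for pages that were linked to registered content."""
    return [r for r in records if r.content_link is not None]
```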
[0113] (Business negotiation document information storage unit 34) Figure 12 shows the business negotiation document information stored in the business negotiation document information storage unit 34 according to one embodiment. Business negotiation document information is information related to sales negotiation materials. It is created when a sales representative uploads sales negotiation materials to the communication support server 20, as will be described later.
• The sales negotiation document ID is identification information that identifies the sales negotiation document. The sales negotiation document ID is assigned by the setting reception unit 22 of the communication support server 20.
• The opportunity ID is the identification information of the business opportunity in which the sales negotiation materials will be used. The sales representative registers the sales negotiation materials along with the business opportunity on the communication support server 20. In Figure 12, three sales negotiation documents with different sales negotiation document IDs share the same opportunity ID, which shows that three documents are used for one business opportunity.
• The file name is the file name of the sales negotiation document.
• The save location URL is the URL where the sales negotiation document is saved.
• The creation date is the date and time when the sales representative registered the sales negotiation materials on the communication support server 20.
• The update date is the date and time the sales representative last edited the sales negotiation materials.
[0114] (Image data storage unit 35 for analysis) Figure 13 is a diagram illustrating the analysis image data stored in the analysis image data storage unit 35. Figure 13 shows what kind of information about the analysis image data is stored in the analysis image data storage unit 35 depending on the content format.
• In the case of presentation files generated by applications such as presentation apps, the analysis image data converted from each page, the content ID, and the page number are saved for each page.
• In the case of image data (images, single-sided flyers, etc.), the analysis image data, which is a copy of the image data, and the content ID are saved.
• In the case of video, the analysis image data, the content ID, and the timestamp from the start of the meeting are saved every few seconds.
[0115] <<Content Server 40>> The description now returns to Figure 7. The content server 40 has a communication unit 41, a content management unit 42, and a content storage unit 43. Each of these functional units is a function or means realized by the CPU 501 executing instructions contained in one or more programs installed on the content server 40. Note that at least a part of each of the above functional units may be realized by hardware.
[0116] Furthermore, the content server 40 has a content storage unit 43 formed by storage devices such as an HD 504 and an HDD controller 505. The content storage unit 43 only needs to be located on a network accessible by the content server 40.
[0117] The communication unit 41 connects the content server 40 to the communication network 2 using the network interface 508, etc., and performs communication processing to communicate with the communication support server 20, the sales representative terminal 3, the marketing representative terminal 4, etc.
[0118] The content management unit 42 performs management processing to manage the content in the content storage unit 43. The content is embedded in the business negotiation materials used for remote communication. The content management unit 42 stores the evaluation results of the content received from the communication support server 20 in the content storage unit 43, associating them with the content.
[0119] (Content storage unit 43) Figure 14 shows a content table stored in the content storage unit 43 according to one embodiment. The content table registers bibliographic information and evaluation information of the content. Since one piece of content may be used in multiple business negotiations, the content table may contain multiple records with the same content ID and the same page number.
• The content ID is an identifier that uniquely identifies a piece of content.
• The content name is any name that clearly indicates the content, such as the file name, and is registered by the creator.
• The page number is the page number within the content if the content spans multiple pages.
• The opportunity ID is the ID of the business negotiation whose materials used this content. When the same content is used in different business negotiations, the content can be evaluated for each negotiation.
• The evaluation score is a numerical representation of the evaluation of the relevant page of the content.
• The evaluation content URL is the URL where the evaluation results produced for the content by the communication support server 20 are stored. The evaluation results themselves may instead be stored in the content storage unit 43. The location indicated by the evaluation content URL stores the evaluation content and a summary of the content evaluation. The evaluation content records the gaze direction, facial expressions, voice characteristics, questions, etc., in association with the times at which they were detected. For the period during which the content was displayed, the evaluation content also includes the number of times the gaze was directed downward, the number of times the eyes were closed, the display time, the number of questions asked, voice characteristics (the number of times the tone was high and the number of times it was low), the number of times each facial expression was observed, spoken text data, etc.
• The creator is the person who created or registered the content, for example a marketing professional.
[0120] In the example shown in Figure 7, the conference server 10 may be a general cloud service that provides web conferencing services, so its explanation is omitted here. Similarly, the CRM server 50 may be a general marketing service, so its explanation is omitted here.
[0121] <Content linking before business negotiations> As described above, in order to evaluate the content contained in the sales materials describing the product, it is preferable that the sales materials and the content be linked in advance. By linking the content and the sales materials, the linking unit 26 can perform the evaluation of the content contained in the video data within a practical time.
[0122] Based on Figure 15, we will explain the preprocessing required to link sales materials and content. Figure 15 is a sequence diagram illustrating the process of creating image data for analysis in order to link sales materials and content.
[0123] The communication support server 20 and the content server 40 periodically perform S1-S4 processing. This processing may be performed irregularly or triggered by content updates.
[0124] S1: The linking unit 26 of the communication support server 20 requests content update information from the content server 40.
[0125] S2: The communication unit 41 of the content server 40 receives the request, and the content management unit 42 searches the content storage unit 43 for content that has been updated between the previous request and the current request. For example, a flag is kept ON for content that has been updated but whose update has not yet been reported to the communication support server 20.
[0126] S3: The communication unit 41 of the content server 40 sends the updated content and content ID to the communication support server 20.
[0127] S4: The communication unit 21 of the communication support server 20 receives the content. The linking unit 26 of the communication support server 20 converts the content into image data for analysis, associates it with the content ID, and stores it in the image data for analysis storage unit 35.
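The polling in S1 to S4 can be pictured with the following minimal Python sketch. It is an illustration under stated assumptions, not the implementation of this embodiment: the class names `Content` and `ContentServer`, the `update_pending` flag, and the `to_image` conversion callback are all hypothetical, and a plain dictionary stands in for the analysis image data storage unit 35.

```python
from dataclasses import dataclass, field


@dataclass
class Content:
    content_id: str
    data: bytes
    update_pending: bool = False  # ON: updated but not yet reported (assumed flag)


@dataclass
class ContentServer:
    contents: dict = field(default_factory=dict)

    def register(self, content_id, data):
        # Registering or updating content turns the flag ON.
        self.contents[content_id] = Content(content_id, data, update_pending=True)

    def fetch_updates(self):
        # S2/S3: return content whose flag is ON, then turn the flag OFF.
        pending = [c for c in self.contents.values() if c.update_pending]
        for c in pending:
            c.update_pending = False
        return pending


def sync_analysis_images(server, analysis_store, to_image):
    # S1 and S4: request updates, convert each item, store it keyed by content ID.
    for content in server.fetch_updates():
        analysis_store[content.content_id] = to_image(content.data)
    return analysis_store
```

Because the flag is cleared once the update is sent, a second request made before any new registration returns nothing, which keeps the periodic processing cheap.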
[0128] <<Registering Business Negotiation Information>> Next, we will explain how negotiation materials and content are linked during negotiation registration. Before the negotiation begins, the sales representative registers the negotiation information and negotiation materials with the communication support server 20. The registration of negotiation information is done in advance to register the sales representative and the customer, and to distinguish and analyze their respective statements.
[0129] Figure 16(a) shows the first sales opportunity registration screen 800, and Figure 16(b) shows the second sales opportunity registration screen 810. The first sales opportunity registration screen 800 is the screen for entering sales opportunity information. Sales opportunity information can also be obtained from the conference server 10. When the sales representative presses the reflect button 813 to reflect the information from the conference server 10, the communication support server 20 retrieves the schedule information identified by the sales representative's account from the conference server 10. The sales representative selects the schedule information to be reflected in the sales opportunity information from the list of schedule information. The communication support server 20 can then fill in the input fields on the first sales opportunity registration screen 800 with the selected schedule information.
• The deal name field 801 contains the title of the deal, etc.
• The product name field 802 contains the name of the product.
• The details field 803 contains the details of the business negotiation.
• The date and time field 804 contains the date and time on which the business meeting is scheduled to take place.
• The participant candidate field 805 is used to enter participants other than the sales representative. These participants may include other salespeople who support the sales representative, engineers, etc.
• The sales representative addition field 806 contains the sales representative assigned to the negotiation.
• The client addition field 807 contains the name of the customer who is the subject of the business negotiation.
• The decision-maker field 808 contains the decision-maker who will finalize the deal.
• The analysis button 809 is used to have audio and video data recorded automatically when the web conference begins, so that the content can be evaluated after the meeting ends.
[0130] The second screen for registering a business deal, screen 810, is for uploading business deal documents. • The update section 811 is a field where sales representatives can drag and drop files to use as sales materials. The updated sales opportunity list section 812 displays a list of sales materials uploaded by sales representatives for sales opportunities registered on the sales opportunity registration screen 800. By uploading sales materials, sales opportunities can be associated with their corresponding documents.
[0131] <<Details of the process for linking sales materials and content>> Next, based on Figure 17, we will explain the process flow for linking sales materials and content when registering sales opportunity information and sales opportunity materials. Figure 17 is a sequence diagram relating to the process in which a sales representative operates the sales representative terminal 3 to register sales opportunity information and sales opportunity materials with the communication support server 20, and the linking unit 26 links the sales opportunity materials and content.
[0132] S11: The sales representative operates the sales representative terminal 3 to create sales materials. In this embodiment, it is preferable that the sales materials include content from the content server 40. The sales representative downloads the content from the content server 40 and performs the operation of incorporating it into the sales materials on the sales representative terminal 3. A template of the sales materials may be stored on the content server 40.
[0133] S12: The sales representative uses the sales representative terminal 3 to register the sales opportunity information and sales opportunity materials on the communication support server 20. This operation consists of inputting the information and uploading the sales opportunity materials as described for the first sales opportunity registration screen 800 and the second sales opportunity registration screen 810. The sales representative may also use the content itself on the content server 40 as the sales opportunity materials. In this case, the sales representative may enter the URL of the content on the content server 40 instead of uploading the sales opportunity materials. The operation reception unit of the sales representative terminal 3 receives these operations.
[0134] S13: The communication unit 201 of the sales representative terminal 3 sends a request to the communication support server 20 to register the sales negotiation information and sales negotiation materials (or URL of the content).
[0135] S14: The communication unit 21 of the communication support server 20 receives a registration request. When it receives a URL of content, the linking unit 26 requests the content server 40 to search for the content, specifying the URL of the content.
[0136] S15: In response to the content server 40's communication unit 41 receiving a request, the content management unit 42 searches the content storage unit 43 using the content's URL.
[0137] S16: The communication unit 41 of the content server 40 sends a content ID that matches the search to the communication support server 20. The communication unit 21 of the communication support server 20 receives the content ID.
[0138] S17: The setting reception unit 22 saves the negotiation information to the negotiation information storage unit 32 and the negotiation materials to the negotiation material information storage unit 34. The setting reception unit 22 sets a unique negotiation ID in the negotiation ID field of the negotiation information storage unit 32 in Figure 10, and sets the status field to "Before Negotiation". The other fields are taken from the negotiation information. The setting reception unit 22 also sets a unique negotiation material ID in the material ID field of the negotiation material information storage unit 34 in Figure 12, and sets the same negotiation ID as in the negotiation information storage unit 32 in the negotiation ID field. The setting reception unit 22 sets the URL where the negotiation material files are saved in the save location URL field, and also sets the creation date and update date.
[0139] S18: For sales materials that do not contain a URL for the content, the linking unit 26 compares the analysis image data with the sales material image data. If the analysis image data storage unit 35 contains analysis image data that is similar to the sales material image data to at least a certain extent, the linking unit 26 identifies the most similar analysis image data and retrieves its content ID and page from the analysis image data storage unit 35. The linking unit 26 stores the unique ID, sales material ID, retrieved content ID, page, link, matching rate, and difference image in association with one another in the linking information storage unit 33.
[0140] Furthermore, if the URL of the content is received from the sales representative's terminal 3, the linking unit 26 associates the content ID received from the content server 40 in step S16 with the sales negotiation materials, etc., and stores it in the linking information storage unit 33.
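The similarity search in S18 can be sketched as follows. The text does not specify how similarity is computed, so this Python sketch assumes, purely for illustration, that each page has already been rendered to a flat list of grayscale pixel values of equal length and that the matching rate is the fraction of pixels that agree. The function names and the `min_rate` threshold of 0.8 are hypothetical.

```python
def matching_rate(a, b, tol=0):
    # Fraction of pixel positions whose values differ by at most tol.
    same = sum(1 for x, y in zip(a, b) if abs(x - y) <= tol)
    return same / len(a)


def difference_image(a, b):
    # Per-pixel absolute difference; non-zero values mark edited regions.
    return [abs(x - y) for x, y in zip(a, b)]


def link_page(page_pixels, analysis_images, min_rate=0.8):
    # analysis_images maps (content_id, page) to pixel lists.
    best_key, best_rate = None, 0.0
    for key, pixels in analysis_images.items():
        rate = matching_rate(page_pixels, pixels)
        if rate > best_rate:
            best_key, best_rate = key, rate
    if best_key is None or best_rate < min_rate:
        return None  # no sufficiently similar content: nothing to link
    content_id, page = best_key
    return {
        "content_id": content_id,
        "page": page,
        "matching_rate": best_rate,
        "difference_image": difference_image(page_pixels, analysis_images[best_key]),
    }
```

A practical system would more likely use a perceptual hash or feature-based comparison that tolerates scaling and compression; the exact-pixel comparison here only shows where the matching rate and difference image in Figure 11 could come from.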
[0141] S19: The communication unit 21 of the communication support server 20 transmits a success or failure notification to the sales representative terminal 3. Failure occurs when no similar content is found. Here, we assume that the saving was successful.
[0142] S20: The communication unit 201 of the sales representative terminal 3 receives a message indicating success or failure in saving, and the display control unit 204 displays a message indicating that the registration of the business negotiation information and business negotiation materials is complete.
[0143] As described above, the sales materials and content are linked in the linked information storage unit 33, as shown in Figure 11. By pre-associating the content with the sales materials, the time required to analyze the video data of the meeting and evaluate the content can be reduced. If the linked information storage unit 33 were not present, the communication support server 20 would need to capture the video data and compare it with all the content in the content server 40.
[0144] <Content evaluation based on the analysis results of audio and video data> Next, referring to Figure 18 and other figures, the method for evaluating content based on the analysis results of audio and video data will be explained. Note that the following evaluation method is an example, and the method for determining evaluation points may be adjusted as appropriate. Figure 18 is a conceptual diagram illustrating the method for evaluating content based on audio and video data. As shown in Figure 18, the evaluation unit 27 performs a content evaluation 433 by comprehensively analyzing the audio data 431 and the video data 432. Note that at least one of the audio and video data may be analyzed. The content evaluation is performed so that higher values are given to content that is presumed to have influenced the customer to purchase the product. Actual purchase is not required. Whether or not it influenced the customer to purchase the product is determined by the customer's reaction.
[0145] As an example, the evaluation unit 27 evaluates the content through the following process. Furthermore, it is assumed that there were three types of content used in the business negotiation: a sample, a case study, and a video.
[0146] Figure 19 is a flowchart illustrating the process by which the evaluation unit 27 evaluates content. In Figure 19, the evaluation unit 27 calculates the evaluation score by calculating and weighting the gaze direction score, display time score, question count score, and tone score. For the sake of explanation, the maximum score for the gaze direction score, facial expression score, display time score, question count score, and tone score is set to 5 points. The order in which the gaze direction score, display time score, question count score, and tone score are calculated may be changed.
[0147] S101: The gaze direction detection unit 271 detects the customer's gaze direction from the video data and records the times at which the customer's gaze is directed downward or the customer's eyes are closed. The evaluation unit 27 lowers the evaluation of the content page being displayed at those times. Normally, customers direct their gaze toward the screen, so the evaluation unit 27 sets the initial value of the gaze direction score to the maximum (for example, 5 on a 5-level scale) and deducts points from the gaze direction score according to how often the customer's gaze is directed downward or the customer's eyes are closed. For example, if a downward gaze or closed eyes are detected at a predetermined frequency only while the content "video" is being displayed, the evaluation unit 27 sets the gaze direction score for the sample to 5, the gaze direction score for the case study to 5, and the gaze direction score for the video to 4.
[0148] Furthermore, the facial expression detection unit 272 analyzes the customer's face image in the video data to determine their facial expression. The evaluation unit 27 lowers the evaluation of the page number of the content currently being displayed if the expression is unfavorable, such as anger, disgust, fear, or sadness. Since customers are generally considered to be expressionless, the evaluation unit 27 sets the initial value of the facial expression score to the maximum score (for example, 5 in a 5-point scale) and deducts points from the facial expression score according to the frequency with which unfavorable expressions are detected. For example, if unfavorable customer expressions are detected at a predetermined frequency only while the content "Sample" is being displayed, the evaluation unit 27 sets the facial expression score of Sample to 4, the facial expression score of Case Introduction to 5, and the facial expression score of Video to 5. Conversely, if favorable expressions such as smiles are detected at a predetermined frequency, the evaluation unit 27 may increase the facial expression score of the page number of the content currently being displayed. However, the facial expression score shall not exceed the maximum score (for example, +5).
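The deduction-based scoring used for the gaze direction score and the facial expression score can be sketched as below. The text only says that points are deducted "according to the frequency", so the rule of one point per `events_per_point` detections, the floor `min_score`, and the function name are assumptions made for this illustration.

```python
MAX_SCORE = 5


def reaction_score(unfavorable, favorable=0, events_per_point=3, min_score=1):
    # Start from the maximum, deduct one point per block of unfavorable events,
    # and add one point per block of favorable events, never exceeding the maximum.
    score = MAX_SCORE - unfavorable // events_per_point + favorable // events_per_point
    return max(min_score, min(MAX_SCORE, score))


# Downward gaze or closed eyes counted per content (hypothetical counts).
gaze_events = {"sample": 0, "case study": 1, "video": 3}
gaze_scores = {name: reaction_score(n) for name, n in gaze_events.items()}
```

With these hypothetical counts, only the video reaches the assumed frequency, so `gaze_scores` reproduces the 5, 5, 4 pattern of the example in S101.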
[0149] S102: The display time measurement unit 273 detects content and page changes by analyzing video data and measures the display time per content and per page. The evaluation unit 27 gives a higher evaluation to content that has been displayed for a longer time.
[0150] Figure 20 is a schematic diagram showing the correspondence between audio data and content display time. As explained in S54 of Figure 23, the evaluation unit 27 identifies the content ID of the content included in the video data. For example, in Figure 20, three types of content were used in the business negotiation: sample, case study, and video. The display times of each content were T1 for the sample, T2 for the case study, and T3 for the video, with a relationship of T2 > T3 > T1. The evaluation unit 27 gives a higher evaluation to content with a longer display time. For example, the evaluation unit 27 compares T1 to T3 with four thresholds that classify the display time into five stages, and assigns them to one of the stages from 1 to 5. Each stage from 1 to 5 is assigned a score in advance (e.g., 1 to 5). The evaluation unit 27 assigns display time points to each content according to the assigned stage. In this way, the evaluation unit 27 can assign higher display time points to content with a longer display time.
[0151] However, since display time tends to be determined by the explanation time regardless of customer reaction, it may not be desirable to directly reflect the display time in the evaluation. Therefore, for example, one could calculate the average display time for each piece of content and assign a rating of 5 to content with a display time longer than average and a rating of 4 to content with a display time shorter than average. For example, if only T2 has a display time longer than average, the display time score for the sample would be 4, the display time score for the case study would be 5, and the display time score for the video would be 4.
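Both display time scoring variants can be sketched in a few lines of Python. The threshold values (in seconds) and the display times are hypothetical, and treating a display time exactly equal to the average as "not longer" is an assumption, since the text does not cover that case.

```python
from bisect import bisect_right


def stage_score(value, thresholds):
    # Four ascending thresholds split values into stages 1 to 5.
    return bisect_right(thresholds, value) + 1


def average_based_scores(display_times):
    # 5 for content displayed longer than the average, otherwise 4.
    mean = sum(display_times.values()) / len(display_times)
    return {name: 5 if t > mean else 4 for name, t in display_times.items()}


# Hypothetical display times with T2 > T3 > T1.
times = {"sample": 60, "case study": 300, "video": 120}
staged = {name: stage_score(t, [30, 90, 180, 360]) for name, t in times.items()}
averaged = average_based_scores(times)
```

The same `stage_score` helper applies unchanged to any other count that is classified into five stages by four thresholds.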
[0152] S103: The question count measurement unit 274 analyzes the text data converted from the audio data of the business negotiation and records the times at which questions were asked. The question count measurement unit 274 also counts the questions asked by the customer in association with the content and page number. The more questions are asked about a piece of content, the higher the evaluation the evaluation unit 27 gives to that content. Because the content contained in the video data is identified as explained in S102, the content that was displayed when the text data was spoken is also identified. For example, in Figure 20, the question "Is it possible to borrow the equipment?" is detected while the sample is being displayed. Also, the question "Regarding the previous example, what happens in the situation shown in the video?" is detected while the video is being displayed. The question count measurement unit 274 determines that an utterance is a question when it detects a question string such as "~is it" in the text data. Alternatively, the question count measurement unit 274 may convert the text data into a distributed representation (vector data) and input the vector data into a classification model to determine whether or not it is a question.
[0153] The question count measurement unit 274 measures the number of questions included in the text data for each piece of content. For example, suppose the number of questions while displaying a sample is N1, the number of questions while displaying a case study is N2, and the number of questions while displaying a video is N3. Assume N1 > N2 > N3. It is preferable to normalize N1 to N3 by the display time of the content. The evaluation unit 27, for example, compares N1 to N3 with four thresholds that classify the number of questions into five stages, and assigns them to one of the stages from 1 to 5. Each stage from 1 to 5 is assigned a score in advance (e.g., 1 to 5). The evaluation unit 27 assigns question count points to each piece of content according to the assigned stage. This allows the evaluation unit 27 to assign higher question count points to content with a larger number of questions.
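Counting questions per content and normalizing by display time can be sketched as follows. The segment boundaries and question timestamps are hypothetical values in seconds from the start of the meeting, and questions are assumed to have been detected already.

```python
def count_questions(segments, question_times):
    # segments: list of (content_name, start_sec, end_sec) display intervals.
    counts = {name: 0 for name, _, _ in segments}
    for t in question_times:
        for name, start, end in segments:
            if start <= t < end:
                counts[name] += 1
                break
    return counts


def per_minute(counts, segments):
    # Normalize each count by the total display time of that content.
    durations = {}
    for name, start, end in segments:
        durations[name] = durations.get(name, 0) + (end - start)
    return {name: counts[name] / (durations[name] / 60) for name in counts}


segments = [("sample", 0, 120), ("case study", 120, 420), ("video", 420, 540)]
counts = count_questions(segments, [30, 90, 200, 500])
rates = per_minute(counts, segments)
```

The normalized rates, rather than the raw counts, would then be compared with the four thresholds so that content shown for a long time is not favored merely for being on screen longer.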
[0154] S104: The question count measurement unit 274 also increases the question count score if a question is detected regarding content that is not being displayed. The question count measurement unit 274 extracts keywords from the content. Keywords are nouns, etc., included as text data in the content, or nouns, etc., extracted by OCR processing. The question count measurement unit 274 determines which content the question relates to and whether it is a question about the displayed content. If the question is about content that is not being displayed, the evaluation of the undisplayed content is increased. This is because being asked about undisplayed content suggests that the customer is interested in that content.
[0155] In the example in Figure 20, while a video (an example of the first content) is being displayed, the question "Regarding the previous example, what happens in the context of the video?" is detected. The question count measurement unit 274 extracts nouns from the text data converted from the audio data using morphological analysis, and compares each noun with the nouns extracted from the content and the nouns extracted by OCR processing of the content. If the text data extracted from content that is not currently being displayed contains the same noun as the text data converted from the audio data, the question count measurement unit 274 determines that the question was asked about content that is not being displayed. For example, the noun "example" is included in the text data extracted from the content "case study introduction" (an example of the second content), so it can be determined that the question was asked about the case study introduction while the video was being displayed.
[0156] When the evaluation unit 27 detects a question about content that is not currently displayed, it adds +1 to the question count score for that content. Regardless of how many times a question about undisplayed content is asked, it is sufficient to add +1 only once. Alternatively, +0.1 to +1 may be added each time such a question is asked. The question count score shall not exceed the maximum score (for example, 5).
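The attribution of a question to undisplayed content in S104 can be sketched as a keyword overlap check. Morphological analysis and OCR are assumed to have been performed already, so nouns are passed in as plain Python sets; the keyword sets, scores, and function names are hypothetical.

```python
def referenced_undisplayed(question_nouns, content_keywords, displayed):
    # Names of content not on screen that share at least one noun with the question.
    return [name for name, keywords in content_keywords.items()
            if name != displayed and question_nouns & keywords]


def add_bonus(scores, targets, bonus=1, max_score=5):
    # Add the bonus to each targeted content, capped at the maximum score.
    return {name: min(max_score, s + bonus) if name in targets else s
            for name, s in scores.items()}


keywords = {
    "sample": {"equipment", "trial"},
    "case study": {"example", "customer"},
    "video": {"demo", "operation"},
}
# Nouns from "Regarding the previous example, what happens in the situation shown in the video?"
nouns = {"example", "situation", "video"}
targets = referenced_undisplayed(nouns, keywords, displayed="video")
updated = add_bonus({"sample": 3, "case study": 2, "video": 2}, targets)
```

Exact noun matching is the simplest possible rule; synonyms or inflected forms would need normalization before the comparison.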
[0157] S105: The tone detection unit 275 detects the pitch of the voice from the speech data and determines that the tone is high if the pitch is high and low if the pitch is low. Alternatively, the tone detection unit 275 may perform formant analysis and estimate that the tone is high if the first formant becomes larger than a certain value over time, and conversely, determine that the tone is low if it becomes smaller. Furthermore, the tone detection unit 275 may acquire acoustic features (features related to speech frequency, volume, voice quality, speech rate, etc.) from the speech data, input them into a model trained using deep learning to determine emotion, and convert this emotion into tone points. Deep learning is a technique in which a neural network predicts an output from the input data and the weights of the network are then adjusted by backpropagation to reduce the error with respect to the training data. The above model is a classification model that takes acoustic features of speech data as input and outputs emotion. Other examples of classification models include decision trees, boosted decision trees, support vector machines, and logistic regression.
[0158] Furthermore, the tone detection unit 275 may deduct tone points if the customer's voice is interrupted for a certain period of time or longer.
[0159] Figure 21 illustrates a method for calculating tone points based on the pitch of the tone. As shown in Figure 21, the tone detection unit 275 may measure tone points based on the tone of the customer's voice. The tone detection unit 275, for example, performs spectral analysis on the voice data to convert it into a relationship between frequency and intensity. The tone detection unit 275 divides the frequency range in which the voice is distributed from the entire customer voice data into about five ranges. The tone detection unit 275 spectrally transforms the voice data from the start of the web conference at regular intervals and classifies it into one of the five ranges based on which frequency range component of the five divisions is the strongest. The tone detection unit 275 determines that the tone is high when the strongest component changes from a low division to a high division at least twice consecutively. For example, if the frequency ranges are A, B, C, D, and E in ascending order, the tone is determined to be high when the strongest component moves from A to C to D. The tone detection unit 275 can also detect the timing when the tone is determined to be low. The tone is determined to be low when the strongest component moves from C to B to A. Furthermore, the tone detection unit 275 determines that the conversation is stalled if it detects a low tone a certain number of times or more within a certain period of time. The tone detection unit 275 may also determine that the conversation is stalled if the voice data of the customer and the sales representative is interrupted for a certain period of time or longer.
[0160] Line 430 in Figure 21 indicates the timing at which a high tone was detected. The tone detection unit 275 records the time from the start of the meeting when a high tone was detected, as shown in Figure 21. The tone detection unit 275 measures the number of times a high tone was detected, K1 to K3, for each piece of content. It is preferable to normalize the number of times K1 to K3 by the display time of the content. The evaluation unit 27, for example, compares the number of times a high tone was detected with four thresholds that classify it into five stages, and assigns K1 to K3 to one of the stages 1 to 5. Stages 1 to 5 are assigned points in advance (e.g., 1 to 5). The evaluation unit 27 assigns tone points to each piece of content according to the assigned stage. This allows the evaluation unit 27 to assign higher tone points to content that has had a higher number of times a high tone was detected.
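The band-transition rule described for Figure 21 can be sketched as follows. The spectral analysis is assumed to have been done already, so the input is simply the strongest frequency range (A to E) for each interval. Reporting an event only at the moment the second consecutive rise or fall occurs is an assumption of this sketch.

```python
def detect_tone_events(bands):
    # bands: strongest frequency range per interval, e.g. "ACD" (A lowest, E highest).
    events = []
    run = 0  # positive: consecutive rises, negative: consecutive falls
    for i in range(1, len(bands)):
        if bands[i] > bands[i - 1]:
            run = run + 1 if run > 0 else 1
        elif bands[i] < bands[i - 1]:
            run = run - 1 if run < 0 else -1
        else:
            run = 0
        if run == 2:
            events.append((i, "high"))
        elif run == -2:
            events.append((i, "low"))
    return events


events = detect_tone_events("ACDCBA")
high_count = sum(1 for _, kind in events if kind == "high")
```

The interval index of each event can be converted to a time from the start of the meeting and matched against the content display intervals to obtain the per-content counts K1 to K3.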
[0161] Figure 22 illustrates how emotions are classified using an emotion classification model. Figure 22 shows the flow of the learning phase and the inference phase, respectively. While Figure 22 explains a two-class classification (interested / not interested), multi-class classification can be explained similarly.
[0162] (1) In both the learning and inference phases, the speech data is spectrally transformed, and features (speech frequency, volume, voice quality, speech rate, etc.) are extracted. Speech frequency can be expressed as MFCC (Mel-frequency cepstrum coefficients) or Mel spectrum. Volume is the intensity after spectral transformation. Voice quality is the frequency distribution after spectral transformation, and speech rate is the number of characters per unit time. These are input into the emotion classification model 420.
[0163] (2) The emotion classification model 420 outputs a value between 0 and 1 for the input features.
[0164] (3) If the emotion classification model 420 is built using deep learning, the emotion classification model 420 outputs values between 0 and 1 using a sigmoid function or similar during the learning phase. The learning device is given a training signal (0 or 1) indicating whether or not the audio data expresses interest, so it inputs the model output (between 0 and 1) and the training signal (0 or 1) into a cross-entropy function and calculates the loss. This loss is fed back into the weights of the emotion classification model 420 using backpropagation, so that the emotion classification model 420 learns to correctly classify whether or not the audio data expresses interest. In the inference phase, the emotion classification model 420 outputs values between 0 and 1, and by comparing these with a threshold, it is determined whether or not the audio data expresses interest.
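The learning and inference phases can be illustrated with a deliberately small stand-in. The sketch below uses a single sigmoid unit (that is, logistic regression) rather than a deep network, which is a simplification chosen for brevity; the feature values, learning rate, and function names are hypothetical. The loss is the binary cross-entropy between the sigmoid output and the 0 or 1 training signal, and the update follows its gradient.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(p, y):
    # Loss between model output p in (0, 1) and training signal y in {0, 1}.
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def train_step(weights, bias, features, label, lr=0.1):
    # One gradient step; for sigmoid plus cross-entropy, dLoss/dz = p - label.
    p = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
    grad = p - label
    new_weights = [w - lr * grad * x for w, x in zip(weights, features)]
    return new_weights, bias - lr * grad, binary_cross_entropy(p, label)


def is_interested(weights, bias, features, threshold=0.5):
    # Inference phase: compare the 0 to 1 output with a threshold.
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias) > threshold


weights, bias = [0.0, 0.0], 0.0
features, label = [1.0, 2.0], 1  # hypothetical acoustic features labeled "interested"
losses = []
for _ in range(50):
    weights, bias, loss = train_step(weights, bias, features, label)
    losses.append(loss)
```

In a multi-layer network the same loss would be propagated backward through every layer, but the forward pass, the loss, and the threshold comparison at inference keep this shape.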
[0165] The tone detection unit 275 records the time when interest or a similar emotion is detected. When interest or a similar emotion is detected, the tone detection unit 275 counts the detection for the content that was displayed at that time, yielding the numbers of interest detections M1 to M3 for the respective pieces of content. It is preferable to normalize M1 to M3 by the display time of the content. The evaluation unit 27, for example, compares the number of interest detections with four thresholds that classify it into five levels, and assigns M1 to M3 to one of the levels 1 to 5. Each level from 1 to 5 is assigned a score in advance (e.g., 1 to 5). The evaluation unit 27 assigns a tone score to each piece of content according to the assigned level. This allows the evaluation unit 27 to assign higher tone scores to content for which interest was detected more often.
[0166] S106: The evaluation unit 27 calculates the evaluation score for each piece of content by weighting the gaze direction score, facial expression score, display time score, number of questions asked score, and tone score. The evaluation unit 27 calculates the score as follows, for example. The weights are only examples, and a larger weight may be assigned to a score that the marketing person considers important.
Evaluation score = 0.3 × gaze direction score + 0.2 × display time score + 0.4 × number of questions asked score + 0.1 × tone score
Furthermore, the evaluation score does not have to use all of the gaze direction score, display time score, number of questions asked score, and tone score; it may be calculated from one or more of them. Also, if a customer purchases a product after a business negotiation, the evaluation scores of all content used in that negotiation may be increased. For example, a sales representative registers in the business negotiation information storage unit 32 that the customer has purchased a product. The evaluation unit 27 identifies the business negotiation ID and increases the evaluation score of the content associated with the business negotiation ID by, for example, +0.5 to +1. In this way, even if the content is initially rated low, the evaluation score of the content used in the negotiation can be corrected if a product is actually purchased. Alternatively, the evaluation unit 27 may set a product purchase flag in the content storage unit 43.
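The weighted calculation in S106 can be sketched as follows, using the example weights from the text. Renormalizing the weights when only some scores are available, applying the purchase correction additively, and capping the result at 5 are assumptions made for this illustration; the embodiment does not specify them.

```python
# Example weights from the text; the dictionary keys are hypothetical names.
WEIGHTS = {"gaze": 0.3, "display_time": 0.2, "questions": 0.4, "tone": 0.1}

def evaluation_score(scores: dict, weights: dict = WEIGHTS) -> float:
    """Weighted average over whichever component scores are present."""
    used = {k: w for k, w in weights.items() if k in scores}
    if not used:
        raise ValueError("at least one component score is required")
    total_weight = sum(used.values())
    # Assumption: weights are renormalized when a subset of scores is used.
    return sum(scores[k] * w for k, w in used.items()) / total_weight

def apply_purchase_bonus(score: float, bonus: float = 0.5, max_score: float = 5.0) -> float:
    """Assumed additive correction when the customer purchased a product."""
    return min(score + bonus, max_score)

# Approximately 4.0 with the example weights.
score = evaluation_score({"gaze": 4, "display_time": 3, "questions": 5, "tone": 2})
```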
[0167] Thus, in this embodiment, the content used in a business negotiation can be evaluated based on at least one of the audio or video data of the customer who participated in the negotiation. Therefore, the marketing person who registered the content can replace or improve the content based on the evaluation.
[0168] <Evaluation process for content used in business negotiations> Referring to Figure 23, the process for registering the content evaluation calculated as described above will be explained. Figure 23 is a sequence diagram illustrating the processes performed by the communication support system 1 when a web conference for a business negotiation is started. Although Figure 23 uses remote communication as an example, this embodiment can also be applied to face-to-face communication. In this case, the meeting device or similar device can capture the customer's face with a camera, acquire audio with a microphone, and transmit the audio and video data to the communication support server 20.
[0169] S41, S42: The sales representative terminal 3 and the terminal device 100 each connect to the conference server 10, and a web conference is started. The sales representative and the customer join the same web conference by logging in. The sales representative specifies the deal ID when logging in. The conference server 10 also detects the names and identification information of the sales representative and the customer upon login. When the conference starts, the communication unit 201 of the sales representative terminal 3 sends audio and video data to the conference server 10, and the communication unit 101 of the terminal device 100 also sends audio and video data to the conference server 10.
[0170] S43, S44: The conference server 10 transmits audio and video data from terminal device 100 to the sales representative terminal 3 of a sales representative participating in the same conference, and transmits audio and video data from the sales representative terminal 3 to the customer terminal device 100 of a customer participating in the same conference.
[0171] S45: When the meeting begins, the sales representative inputs instructions to record audio and video data into the sales representative terminal 3. If the analysis button 809 in Figure 16 was pressed, recording may be performed automatically.
[0172] S46: The operation reception unit 205 of the sales representative terminal 3 receives the operation, and the communication unit 201 sends a recording start request to the conference server 10.
[0173] S47: Upon receiving a recording start request, the conference server 10 begins recording audio and video data. The conference server 10 records the audio and video data associated with the business negotiation ID. The conference server 10 also records the audio data associated with the identification information of the sales representative and the identification information of the customer.
[0174] S48: When the sales representative finishes a business meeting, they input the command to end the meeting into sales representative terminal 3.
[0175] S49: The operation reception unit 205 of the sales representative terminal 3 receives the operation, and the communication unit 201 sends a meeting termination request along with the negotiation ID to the conference server 10.
[0176] S50: When conference server 10 receives notification of the end of a conference, it identifies the audio and video data associated with the deal ID.
[0177] S51: The conference server 10 transmits the identified audio and video data along with the business negotiation ID to the communication support server 20. The audio data is associated with the identification information of the sales representative or the customer. Alternatively, the communication support server 20 may receive the meeting termination request and request the audio and video data from the conference server 10, in which case the conference server 10 transmits the recorded audio and video data to the communication support server 20. The audio and video data may also be recorded by the communication support server 20 itself. In this case, the sales representative terminal 3 and the terminal device 100 each transmit audio and video data to the communication support server 20.
[0178] S52: The communication unit 21 of the communication support server 20 receives audio and video data along with the business negotiation ID. The acquisition unit 23 acquires the audio and video data. First, the speech recognition unit 24 of the communication support server 20 converts the audio data into text data.
[0179] S53: The acquisition unit 23 stores the audio / video data and text data in the audio / video data storage unit 31 shown in Figure 9. The negotiation information URL can be the URL where the negotiation information identified by the negotiation ID is stored. The audio data, text data, and video data are all associated with the time from the start of the meeting.
[0180] S54: Furthermore, the evaluation unit 27 of the communication support server 20 identifies the pages of the sales negotiation materials shown in the video data by comparing the image data captured from the video data with the image data of the sales negotiation materials. Specifically, since the sales negotiation ID is known, the sales negotiation material ID can be identified from the sales negotiation material information storage unit 34 in Figure 12. The evaluation unit 27 obtains the sales negotiation materials identified by the sales negotiation material ID and obtains their image data. The evaluation unit 27 compares the image data captured from the video data with this image data and identifies the page of the sales negotiation materials whose similarity is above a threshold. Next, the evaluation unit 27 obtains the content ID associated with the identified page of the sales negotiation materials from the linking information storage unit 33. This allows the evaluation unit 27 to identify the content shown in the video data. By detecting content changes in the video data, the evaluation unit 27 only needs to identify the content ID at the moment the content changes, which reduces the processing load. By expressing the timing of each content change as time from the start of the meeting, the audio data, text data, and video data can be associated with the content. In other words, it is possible to identify which content ID was being displayed between time A and time B from the start of the meeting.
[0181] By utilizing the linking information storage unit 33, the evaluation unit 27 only needs to compare the image data captured from the video data with the negotiation materials used in a single negotiation, which enables content identification in a short time.
[0182] The evaluation unit 27 may also compare the image data captured from the video data with the image data of the content on the content server 40 to identify the most similar content.
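The page identification in S54 can be sketched as a similarity search restricted to the pages of one set of negotiation materials. The embodiment does not name a similarity measure; cosine similarity over small feature vectors, the threshold of 0.9, and the structure of the linking table are assumptions for illustration only.

```python
import math

def cosine_similarity(a, b):
    """Similarity between two equal-length feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def identify_page(frame, pages, threshold=0.9):
    """Return the most similar page number at or above the threshold, else None.

    pages maps page number -> feature vector of that page's image data.
    """
    best_page, best_sim = None, threshold
    for page, vector in pages.items():
        sim = cosine_similarity(frame, vector)
        if sim >= best_sim:
            best_page, best_sim = page, sim
    return best_page

def content_id_for(material_id, page, links):
    """Look up the content ID for a page in a hypothetical linking table."""
    return links.get((material_id, page))

pages = {1: [1.0, 0.0, 0.0], 2: [0.0, 1.0, 0.0]}
links = {("M001", 1): "C010", ("M001", 2): "C011"}
page = identify_page([0.9, 0.1, 0.0], pages)
content_id = content_id_for("M001", page, links)
```

Because only the pages of the materials used in one negotiation are compared, the loop stays short, which corresponds to the reduction in identification time described above.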
[0183] S55: The evaluation unit 27 of the communication support server 20 associates the time since the start of the business negotiation with the content ID. Since the audio data, text data, and video data are also associated with the time since the start of the business negotiation, the evaluation unit 27 associates the audio data / text data / video data with the content while the same content is displayed (until the content changes). If the content has multiple pages, the audio data, text data, and video data are associated on a page-by-page basis.
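The time-based association in S55 can be illustrated as an interval lookup: each utterance or frame is assigned to the content whose display started most recently before its timestamp. The data layout and function name are hypothetical.

```python
from bisect import bisect_right

def assign_to_content(changes, events):
    """Group timestamped events by the content displayed at that time.

    changes: list of (start_seconds, content_id), sorted by start time.
    events:  list of (seconds_from_start, payload), e.g. recognized text.
    """
    starts = [start for start, _ in changes]
    grouped = {content_id: [] for _, content_id in changes}
    for seconds, payload in events:
        index = bisect_right(starts, seconds) - 1
        if index >= 0:  # events before the first content are ignored
            grouped[changes[index][1]].append(payload)
    return grouped

changes = [(0, "C1"), (120, "C2"), (300, "C3")]
events = [(30, "a"), (120, "b"), (299, "c"), (400, "d")]
grouped = assign_to_content(changes, events)
```

The same lookup works page by page if each entry in `changes` carries a (content ID, page number) pair instead of a content ID alone.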
[0184] S56: The evaluation unit 27 of the communication support server 20 evaluates each piece of content for each remote communication based on the flowchart in Figure 19. This provides page-level evaluation results (evaluation score, evaluation content) for each piece of content.
[0185] S57: Next, the evaluation unit 27 of the communication support server 20 inputs, for example, negotiation information, content, and evaluation results for each piece of content into a large-scale language model to generate a content evaluation summary. A content evaluation summary is generated for each piece of content. As a result, the evaluation unit 27 generates a content evaluation summary that can be used in the content evaluation summary 313 in Figure 4.
[0186] S58: The save request unit 28 of the communication support server 20 sends a request to the content server 40 via the communication unit 21 to save the content evaluation result and content evaluation summary, associating them with the deal ID, content ID, and page number. If the communication support server 20 has a content storage unit 43, the communication support server 20 only needs to save the evaluation result.
[0187] S59: The communication unit 41 of the content server 40 receives the content evaluation results and content evaluation summary, and the content management unit 42 saves them in the content storage unit 43. The content ID, page number, deal ID, and evaluation score are transmitted from the communication support server 20. The content name and creator are already saved in the content storage unit 43. The evaluation content URL can be the URL where the content evaluation summary is saved.
[0188] S60: Next, the communication support server 20 sends the sales opportunity information shown in Figure 10 to the CRM server 50. A copy of the sales opportunity information is stored in the CRM server 50, so the sales representative can also view the sales opportunity information from the CRM server 50. Furthermore, the evaluation results of the content used in the sales opportunity may also be sent to the CRM server 50.
[0189] S61: The CRM server 50 stores the sales opportunity information.
[0190] S62: The communication support server 20 identifies the creator of the content. Since the content creator is registered with the content server 40, the communication support server 20 requests the creator from the content server 40 by specifying the content ID.
[0191] S63: The communication unit 21 notifies the creator of the identified content (marketing staff) that the content evaluation is complete, along with the deal ID. The communication unit 211 of the marketing staff terminal 4 receives the notification that the content evaluation is complete. Upon receiving the notification that the content evaluation is complete, the communication unit 211 specifies the deal ID and receives information regarding the content evaluation from the communication support server 20. This allows the marketing staff to check the evaluation of the content they created, as shown in Figure 24. If multiple pieces of content included in one deal document are created by different creators, the notification in S63 is also sent to each creator. The notification may be sent via email, SNS, or push notification to the marketing staff terminal 4.
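When one deal document contains content from several creators, the notification in S63 can be prepared by grouping content IDs per creator, as in the following sketch. The message fields and function name are hypothetical, and the actual delivery (email, SNS, or push notification) is left out.

```python
from collections import defaultdict

def notifications_by_creator(deal_id, content_creators):
    """Build one notification per creator.

    content_creators maps content ID -> creator, as returned by the
    content server for each content ID (assumed structure).
    """
    grouped = defaultdict(list)
    for content_id, creator in content_creators.items():
        grouped[creator].append(content_id)
    return [
        {"to": creator, "deal_id": deal_id, "content_ids": sorted(ids)}
        for creator, ids in sorted(grouped.items())
    ]

messages = notifications_by_creator(
    "D1", {"C2": "sato", "C1": "sato", "C3": "tanaka"}
)
```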
[0192] S64: The communication support server 20 notifies the sales representative that registration of the audio / video data and content evaluation results is complete.
[0193] <Example of displaying content evaluation results and content evaluation summary> Next, with reference to Figure 24 and other figures, examples of how content evaluation results and content evaluation summaries are displayed will be explained. Figure 24 shows the content evaluation results screen 630 displayed on the marketing staff terminal 4. When the marketing staff receives notification that the evaluation is complete, they connect the marketing staff terminal 4 to the content server 40, specify the opportunity ID for which the evaluation has been completed, and display the evaluation results of the content used in that opportunity. In this way, the marketing staff can view content evaluation results on an opportunity-by-opportunity basis. Content-specific evaluation results will be explained in Figure 25.
[0194] In response to a request for evaluation results from a marketing representative's terminal specifying a deal ID, the content server 40 sends content information associated with the deal ID to the marketing representative's terminal 4. A native application is running on the marketing representative's terminal 4. The display control unit 212 places the content information on the content evaluation results screen 630. If the marketing representative's terminal 4 is running a web browser, the content server 40 sends a program to the marketing representative's terminal 4 that displays the content evaluation results screen 630 as a web application.
[0195] The content evaluation results screen 630 displays the content evaluation results, associated with the content's page number. Additionally, the content evaluation results screen 630 displays the comments made while viewing each piece of content, also associated with its page number.
[0196] • The name field 631 contains the content file name (content name) and page number. The content in this name field 631 was used in the opportunity specified by the marketing representative using the opportunity ID. The horizontal display order of the content can be in ascending or descending order of the content file name, or in the order in which it appears in the opportunity document.
[0197] • Image 632 displays a thumbnail of the content.
[0198] • The gaze column 633 displays the result of determining whether the user's gaze is directed towards the content while the content of the corresponding page number is being displayed. The evaluation includes the gaze direction for each unit of time, so for example, if the time the gaze was directed towards the screen is longer than the time it was not, "Gaze directed towards" will be displayed.
[0199] • The projection time column 634 displays the display time of the content corresponding to the page number included in the evaluation.
[0200] • The "Number of Questions" field 635 displays the number of questions asked for the content on the corresponding page number, as included in the evaluation. The "Number of Questions" field 635 may also display the number of utterances (number of text data) for the content on the corresponding page number, not just questions.
[0201] • The "Vocalization Characteristics" field 636 contains the characteristics of the customer's voice when they asked a question, as included in the evaluation. The judgment result regarding the pitch of the tone is displayed in the "Vocalization Characteristics" field 636. Since the evaluation includes the pitch for each point in time, for example, if the number of times the tone is high is greater than the number of times it is low, "High tone" will be displayed. In the opposite case, "Low tone" will be displayed. Also, for example, if the customer's speech is interrupted or if the tone stays low for a certain amount of time, "Stagnation" will be displayed.
[0202] • The expression column 637 displays the number of times the customer has made each expression in response to the content on the corresponding page number, as included in the evaluation.
[0203] • The related comment section 639 displays the content (text data) of comments related to the content on the corresponding page number.
[0204] • The "Interest Level" column 641 represents the evaluation score (calculated from the gaze direction score, display time score, number of questions asked score, and tone score while viewing the relevant page of the content).
[0205] Button 642 is used to switch between page-level and content-level evaluation. When switched to content-level evaluation, the content evaluation results screen 630 is displayed based on the evaluation results measured for a single piece of content.
[0206] In this way, marketing personnel can view a detailed evaluation of the content used in a particular sales negotiation in a single list. Since they can see things like eye gaze direction, viewing time, number of questions asked, vocal characteristics, facial expressions, and level of interest, it's easier to identify content that needs improvement.
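The labels shown in the gaze column 633 and the "Vocalization Characteristics" field 636 can be derived from per-unit-time samples, as sketched below. The label for the not-directed case, the run length that counts as "Stagnation", and the treatment of ties as "Low tone" are assumptions; the text gives only the majority rule and the label names "Gaze directed towards", "High tone", "Low tone", and "Stagnation".

```python
def gaze_label(toward_flags):
    """Majority decision over per-unit-time gaze samples (True = towards screen)."""
    toward = sum(1 for flag in toward_flags if flag)
    away = len(toward_flags) - toward
    # The second label is a hypothetical counterpart to the one in the text.
    return "Gaze directed towards" if toward > away else "Gaze not directed towards"

def longest_run(values, target):
    """Length of the longest consecutive run of target in values."""
    best = run = 0
    for value in values:
        run = run + 1 if value == target else 0
        best = max(best, run)
    return best

def tone_label(tones, stagnation_run=5):
    """Label from per-unit-time tone samples, each "high" or "low"."""
    if longest_run(tones, "low") >= stagnation_run:
        return "Stagnation"  # tone stayed low for an assumed minimum duration
    high = tones.count("high")
    low = tones.count("low")
    return "High tone" if high > low else "Low tone"
```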
[0207] Furthermore, the display control unit 212 may display the comments made by sales representatives regarding content that has an evaluation score of a predetermined value or higher as recommended comments. For example, if sales representative A is viewing the content evaluation results screen 630 related to his sales negotiation, the comments made by B, who has an evaluation score of a predetermined value or higher for the same content, will be displayed. Sales representative A can then use B's comments to improve his next sales negotiation.
[0208] Furthermore, the communication support server 20 retrieves answers to questions about the content from the sales representative's statements and saves them as example answers. The communication support server 20 then suggests to the marketing team that they add these example answers to the content. Alternatively, when content is displayed during a sales negotiation, the communication support server 20 displays the example answers on the sales representative's terminal. This enriches the content and makes it easier for sales representatives to answer questions.
[0209] Figure 25 shows an example of the content-specific evaluation screen 650, which displays evaluation results for each piece of content. When a marketing representative specifies any content or selects content on the content evaluation results screen 630, the marketing representative's terminal 4 can display the content-specific evaluation screen 650. Sales materials containing the same content may be explained in different remote communications. That is, one piece of content may be used in multiple sales negotiations, but the evaluation unit 27 evaluates the same sales materials (content) explained in different sales negotiations for each negotiation. The content-specific evaluation screen 650 can display the evaluation of the content obtained for each sales negotiation, on a content-by-content basis. In Figure 25, this content has been used in 20 sales negotiations.
[0210] On the left side of the content evaluation screen 650, thumbnails of each content page are displayed. The top thumbnail is the thumbnail 651 of the page specified by the marketing team. Thumbnail 651 is the content image. Below thumbnail 651, thumbnails 652 and 653 of other pages are displayed. When the marketing team clicks on thumbnails 652 or 653, that page is displayed as a larger thumbnail 651. By displaying the content image, the marketing team can understand which content is being evaluated.
[0211] Rating Information 654 is statistical information on content ratings. Rating Information 654 shows the distribution and ratio of multiple ratings given in 20 business negotiations. 1 to 5 stars correspond to rating scores "1 to 5". In the example in Figure 25, 60% of business negotiations were rated 5 stars, 20% were rated 4 stars, 10% were rated 3 stars, 10% were rated 2 stars, and 0% were rated 1 star.
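The distribution shown in Rating Information 654 can be computed from the per-negotiation evaluation scores. The sketch below reproduces the Figure 25 example, assuming each negotiation yields an integer score from 1 to 5; the function name is hypothetical.

```python
from collections import Counter

def rating_distribution(scores):
    """Percentage of negotiations at each star rating, from 5 stars down to 1."""
    counts = Counter(scores)
    total = len(scores)
    return {
        stars: (100.0 * counts.get(stars, 0) / total if total else 0.0)
        for stars in range(5, 0, -1)
    }

# 20 negotiations: 12 rated 5 stars, 4 rated 4, 2 rated 3, 2 rated 2.
scores = [5] * 12 + [4] * 4 + [3] * 2 + [2] * 2
distribution = rating_distribution(scores)
# distribution == {5: 60.0, 4: 20.0, 3: 10.0, 2: 10.0, 1: 0.0}
```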
[0212] The itemized evaluation 655 shows the degree of evaluation for each item (content clarity, degree of smiles, interest) on a 5-point scale. Clarity can be expressed as, for example, the tone score for the content. Degree of smiles can be expressed as, for example, the facial expression score for the content. Interest can be expressed as, for example, the number of questions asked while the content was displayed.
[0213] Comment section 656 contains content evaluation summaries generated for each sales opportunity for the content in question. While Figure 25 shows three content evaluation summaries, it is possible to display as many as there are sales opportunities. For example, if a marketing representative clicks on the star rating in rating information 654, only the content evaluation summaries for sales opportunities that received that star rating will be displayed.
[0214] In this way, marketing personnel can display ratings and other information for each piece of content. For example, they can refer to a summary of content evaluations for low-rated deals and improve the content. When a marketing person selects thumbnail 651 and switches pages, the rating information 654, item-specific evaluations 655, and comment section 656 will also display information corresponding to the switched page. Marketing personnel can check the evaluation for each page while switching between pages.
[0215] <Main effects> The communication support system 1 of this embodiment can evaluate the content used in a business negotiation based on at least one of the audio data or the video data of the customer who participated in the negotiation. Therefore, the marketing person who registered the content can replace or improve the content based on the evaluation.
[0216] <Other application examples> Although the best mode for carrying out the present invention has been described above using examples, the present invention is not limited in any way to these examples, and various modifications and substitutions can be made without departing from the spirit of the present invention.
[0217] For example, the content may be obtained from a website. The content server 40 stores the content obtained from the website in the content storage unit 43, associating it with its URL. Alternatively, the content may be generated by generative AI.
[0218] Furthermore, the content to be evaluated does not need to be pre-registered in the content server 40. In this case, the communication support server 20 evaluates the content used in the sales negotiation and sends it to the content server 40 along with the content. This situation occurs when the sales negotiation materials include content that is not registered in the content server 40. After this, the sales representative can incorporate the newly registered content into the sales negotiation materials.
[0219] Alternatively, the evaluation unit 27 may evaluate the content based on the audio and video data and then transmit the content and evaluation results to the content server 40. The content server 40 identifies the content by comparing the analysis image data with the content image data.
[0220] Alternatively, the sales representative terminal 3 may perform the content evaluation. In this case, the sales representative terminal 3 has an evaluation unit 27 and downloads the analysis image data storage unit 35. During the sales negotiation, the sales representative terminal 3 receives the customer's audio and video data, so it can evaluate the content during the negotiation. The sales representative terminal 3 inputs the negotiation information and evaluation results into a large-scale language model to generate a content evaluation summary. The sales representative terminal 3 sends the negotiation ID, content ID, evaluation result, and content evaluation summary to the content server 40.
[0221] The configuration examples in Figure 7 and other figures are divided according to their main functions to facilitate understanding of the processing performed by the communication support server 20, sales representative terminal 3, and marketing representative terminal 4. The present invention is not limited by the way the processing units are divided or the names of those units. The processing of the communication support server 20, sales representative terminal 3, and marketing representative terminal 4 can be further divided into more processing units depending on the processing content. Furthermore, each processing unit can be divided to include even more processing.
[0222] Furthermore, the apparatus described in the examples represents only one of several computing environments for carrying out the embodiments disclosed herein. In one embodiment, the conference server 10, the communication support server 20, etc., include multiple computing devices such as a server cluster. The multiple computing devices are configured to communicate with each other via any type of communication link, including a network or shared memory, and perform the processing disclosed herein.
[0223] Furthermore, the apparatus described in the examples represents only one of several computing environments for carrying out the embodiments disclosed herein. In one embodiment, the communication support server 20 includes multiple computing devices, such as a server cluster. The multiple computing devices are configured to communicate with each other via any type of communication link, including networks and shared memory, and perform the processing disclosed herein.
[0224] Each function of the embodiments described above can be realized by one or more processing circuits. As used herein, a "processing circuit" includes processors programmed to execute each function by software, such as processors implemented by electronic circuits, as well as devices such as ASICs (application-specific integrated circuits), DSPs (digital signal processors), FPGAs (field-programmable gate arrays), and conventional circuit modules designed to execute each function described above.
[0225] <Aspects>
[Aspect 1] A communication support system in which an information processing device that evaluates content displayed on a user's terminal device during online communication and an information terminal can communicate via a network, wherein audio data recorded in the communication includes the user's utterances in response to the explanation of the content, and video data recorded in the communication includes video of the user receiving the explanation of the content, the communication support system comprising: an acquisition unit that acquires at least one of the audio data or the video data; an evaluation unit that analyzes at least one of the user's speech contained in the audio data acquired by the acquisition unit or the user's facial expressions contained in the video data to determine an evaluation score for the content; and a display control unit that displays information regarding the evaluation of the content by the evaluation unit together with an image of the content.
[Aspect 2] The communication support system according to Aspect 1, wherein the same content is explained in different communications, the evaluation unit evaluates the same content explained in different communications for each communication, and the display control unit displays statistical information on the evaluation of the same content.
[Aspect 3] The communication support system according to Aspect 2, wherein the evaluation unit calculates an evaluation score for the content for each communication, and the display control unit displays the distribution of multiple evaluation scores for the same content as the statistical information.
[Aspect 4] The communication support system according to Aspect 3, wherein the evaluation unit calculates the evaluation score using at least one of the following, detected during one communication and one display of the content: the number of times the tone of the user's voice included in the audio data exceeds a certain level; the number of questions included in text data obtained by converting the user's voice data by speech recognition; the user's facial expression obtained by analyzing the video data; the number of times the user's gaze direction, as detected from the video data, was not directed towards the screen displaying the content; and the display time of the content.
[Aspect 5] The communication support system according to Aspect 4, wherein the evaluation unit inputs at least one of the evaluation score, the number of times the user's voice tone exceeded a certain level, the number of questions asked, the user's facial expression, the number of times the user's gaze direction was not directed towards the screen displaying the content, and the display time of the content into a large-scale language model to generate a content evaluation summary, and the display control unit displays multiple content evaluation summaries generated by the large-scale language model for the same content explained in different communications.
[Aspect 6] The communication support system according to Aspect 4 or 5, wherein the evaluation unit converts one or more of the number of times the user's voice tone exceeds a certain level, the number of questions asked, the user's facial expression, the number of times the user's gaze is not directed towards the screen displaying the content, and the display time of the content into a numerical value within a defined range, and the display control unit displays the numerical value along with the name of the evaluation item.
[Aspect 7] The communication support system according to any one of Aspects 4 to 6, wherein the display control unit displays text data obtained by speech recognition from the voice data spoken by the user while the content is being displayed, together with a thumbnail of the content.
[Aspect 8] The communication support system according to any one of Aspects 4 to 7, wherein the display control unit displays, together with a thumbnail of the content, at least one of the following detected during one communication and one display of the content: the judgment result regarding whether the tone of the user's voice is high or low; the number of questions; the user's facial expression; the judgment result regarding whether or not the user's gaze direction is directed toward the screen displaying the content; and the display time of the content.
[Aspect 9] The communication support system according to any one of Aspects 4 to 8, wherein the display control unit displays the evaluation score calculated for one communication and one piece of content, along with a thumbnail of the content.
[Explanation of Symbols]
[0226]
1 Communication support system
3 Sales representative terminal
4 Marketing personnel terminal
10 Conference server
20 Communication support server
40 Content server
50 CRM server
100, 100a, 100b, 100c, 100d, 100e Terminal device
[Prior art documents]
[Patent Documents]
[0227] [Patent Document 1] Japanese Patent Publication No. 2022-096440
Claims
1. A communication support system in which an information processing device that evaluates content displayed on a user's terminal device during online communication and an information terminal can communicate via a network, wherein audio data recorded in the communication includes the user's utterances in response to the explanation of the content, and video data recorded in the communication includes video of the user receiving the explanation of the content, the communication support system comprising: an acquisition unit that acquires at least one of the audio data or the video data; an evaluation unit that analyzes at least one of the user's speech contained in the audio data acquired by the acquisition unit or the user's facial expressions contained in the video data to determine an evaluation score for the content; and a display control unit that displays information regarding the evaluation of the content by the evaluation unit together with an image of the content.
2. The communication support system according to claim 1, wherein the same content is explained in different communications, the evaluation unit evaluates, for each communication, the same content explained in the different communications, and the display control unit displays statistical information on the evaluations of the same content.
3. The communication support system according to claim 2, wherein the evaluation unit calculates an evaluation score for the content for each communication, and the display control unit displays, as the statistical information, the distribution of the multiple evaluation scores for the same content.
4. The communication support system according to claim 3, wherein the evaluation unit calculates the evaluation score using at least one of the following, detected during one communication and the display of one piece of content: the number of times the tone of the user's voice included in the audio data exceeds a certain level; the number of questions included in text data obtained by converting the user's voice data by speech recognition; the user's facial expression obtained by analyzing the video data; the number of times the user's gaze direction, as detected from the video data, was not directed toward the screen displaying the content; and the display time of the content.
5. The communication support system according to claim 4, wherein the evaluation unit inputs at least one of the evaluation score, the number of times the tone of the user's voice exceeded a certain level, the number of questions, the user's facial expression, the number of times the user's gaze direction was not directed toward the screen displaying the content, and the display time of the content into a large language model to generate an evaluation summary of the content, and the display control unit displays multiple evaluation summaries generated by the large language model for the same content explained in different communications.
6. The communication support system according to claim 4 or 5, wherein the evaluation unit converts one or more of the number of times the tone of the user's voice exceeds a certain level, the number of questions, the user's facial expression, the number of times the user's gaze direction is not directed toward the screen displaying the content, and the display time of the content into a numerical value within a defined range, and the display control unit displays the numerical value together with the name of the evaluation item.
7. The communication support system according to claim 4, wherein the display control unit displays text data, obtained by speech recognition from voice data spoken by the user while the content is being displayed, together with a thumbnail of the content.
8. The communication support system according to claim 4, wherein the display control unit displays, together with a thumbnail of the content, at least one of the following, detected during one communication and the display of one piece of content: a determination result as to whether the tone of the user's voice is high or low; the number of questions; the user's facial expression; a determination result as to whether the user's gaze direction is directed toward the screen displaying the content; and the display time of the content.
9. The communication support system according to claim 4, wherein the display control unit displays the evaluation score calculated for one communication and one piece of content, together with a thumbnail of the content.
10. A program executed by an information terminal that can communicate via a network with an information processing device that evaluates content displayed on a user's terminal device during online communication, wherein audio data recorded in the communication includes the user's utterances in response to an explanation of the content, and video data recorded in the communication includes video of the user receiving the explanation of the content, the information processing device has: an acquisition unit that acquires at least one of the audio data or the video data; an evaluation unit that analyzes at least one of the user's utterances contained in the audio data acquired by the acquisition unit or the user's facial expressions contained in the video data to determine an evaluation score for the content; and a first communication unit that notifies the creator of the content that the evaluation has been completed, and the program causes the information terminal to function as: a second communication unit that receives information regarding the evaluation of the content from the information processing device in response to the completion of the evaluation; and a display control unit that displays the information regarding the evaluation of the content together with an image of the content.
11. An evaluation method performed by a communication support system in which an information processing device, which evaluates content displayed on a user's terminal device during online communication, and an information terminal can communicate via a network, wherein audio data recorded in the communication includes the user's utterances in response to an explanation of the content, and video data recorded in the communication includes video of the user receiving the explanation of the content, the evaluation method comprising: an acquisition unit performing a process to acquire at least one of the audio data or the video data; an evaluation unit performing a process to analyze at least one of the user's utterances contained in the audio data acquired by the acquisition unit or the user's facial expressions contained in the video data to determine an evaluation score for the content; and a display control unit performing a process to display information regarding the evaluation of the content by the evaluation unit together with an image of the content.
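The scoring steps recited in claims 3, 4, and 6 can be illustrated with a minimal Python sketch. It is not the implementation described in this application. The 0 to 10 range, the saturation limits, the unweighted mean, the bin width, and the use of a "positive expression ratio" to represent the user's facial expression are all assumptions made for illustration, and every name (Observation, clamp_scale, item_scores, evaluation_score, score_distribution) is hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Observation:
    """Measurements for one piece of content in one communication (claim 4)."""
    high_tone_count: int              # times the voice tone exceeded a threshold
    question_count: int               # questions found in the recognized text
    positive_expression_ratio: float  # assumed stand-in for "facial expression", 0..1
    gaze_away_count: int              # times the gaze left the content screen
    display_seconds: float            # how long the content was displayed


def clamp_scale(value: float, upper: float) -> float:
    """Map value onto 0..10, saturating at `upper` (an assumed limit)."""
    if upper <= 0:
        raise ValueError("upper must be positive")
    return 10.0 * min(max(value, 0.0), upper) / upper


def item_scores(obs: Observation) -> dict[str, float]:
    """Convert each measurement into a number within a defined range (claim 6)."""
    return {
        "voice tone": clamp_scale(obs.high_tone_count, 5),
        "questions": clamp_scale(obs.question_count, 5),
        "facial expression": clamp_scale(obs.positive_expression_ratio, 1.0),
        # Looking away lowers the score, so this item is inverted.
        "gaze": 10.0 - clamp_scale(obs.gaze_away_count, 5),
        "display time": clamp_scale(obs.display_seconds, 120.0),
    }


def evaluation_score(obs: Observation) -> float:
    """Unweighted mean of the item scores; the weighting is an assumption."""
    scores = item_scores(obs)
    return sum(scores.values()) / len(scores)


def score_distribution(scores: list[float], bin_width: float = 2.0) -> dict[str, int]:
    """Histogram of per-communication scores for the same content (claim 3)."""
    counts = Counter()
    for s in scores:
        # A score of exactly 10 is folded into the top bin.
        low = min(int(s // bin_width) * bin_width, 10.0 - bin_width)
        counts[f"{low:g}-{low + bin_width:g}"] += 1
    return dict(counts)
```

In this sketch, each key of item_scores plays the role of the "name of the evaluation item" displayed next to its numerical value, and score_distribution would be fed one evaluation_score per communication in which the same content was explained.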