system
The system addresses BPR challenges by converting voice input to text, analyzing it for business process elements, and generating flow diagrams, enhancing operational efficiency and digitalization in local governments and private companies.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- SOFTBANK GROUP CORP
- Filing Date
- 2024-12-10
- Publication Date
- 2026-06-22
AI Technical Summary
Local governments and private companies face challenges in implementing business process re-engineering (BPR) due to manual management, outdated systems, and the need for expertise, which hinders efficient digitalization and operational improvement.
A system that utilizes speech recognition to convert voice input into text, analyzes the text using natural language processing to identify business process elements, and automatically generates flow diagrams, allowing for efficient BPR by presenting challenges and solutions.
Enables efficient and in-house business process improvement by automating the analysis and generation of business process diagrams, facilitating quick digitization and streamlining operations.
Figure 2026101213000001_ABST
Abstract
Description
Technical Field
[0001] The technology of the present disclosure relates to a system.
Background Art
[0002] Patent Document 1 discloses a persona chatbot control method performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.
Prior Art Documents
Patent Documents
[0003]
Patent Document 1
Summary of the Invention
Problems to be Solved by the Invention
[0004] In local governments and private companies, business process re-engineering (BPR) often fails to progress. Although digitalizing operations and improving their efficiency are important, the prevalence of conventional manual management and the continued use of old systems hinder effective business improvement. In addition, appropriately analyzing business processes and formulating improvement measures requires expertise and experience, and takes a great deal of labor and time. There is a need for new means that solve these problems of the prior art and allow BPR to be carried out in-house easily and efficiently.
Means for Solving the Problems
[0005] This invention provides a speech recognition means that receives voice input, acquires voice data, and converts that voice data into text. Furthermore, it includes a generation means that automatically generates a business process flow diagram by using an analysis means that analyzes the converted text and extracts the business flow. In the analysis process, natural language processing technology is used to identify the elements of the business process flow. By presenting business challenges and solutions based on the generated business process flow diagram and automatically generating materials based on them, it becomes possible to easily implement BPR (Business Process Reengineering). As a result, it enables more efficient and in-house business process improvement in local governments and private companies.
[0006] "Voice input" refers to a method of acquiring voice data used when a user conveys information verbally.
[0007] "Voice data" (also called audio data or speech data below) refers to a digital recording of sound acquired through voice input.
[0008] "Speech recognition means" refers to a system or device that has the function of converting speech data into text format.
[0009] "Text" refers to character information converted from speech data by speech recognition technology.
[0010] "Analysis means" refers to a system or method that analyzes text and extracts elements of a business process flow.
[0011] A "business process flow" is a series of steps, branches, conditions, and other elements that make up a business process.
[0012] "Generation means" refers to a system or device that has the function of automatically creating a business process flow diagram based on the analysis results.
[0013] A "business process flow diagram" is a diagram that visually represents a business process flow and is a tool for showing the flow of a process.
[0014] "Issues" refer to problems or inefficiencies within a business process that require improvement.
[0015] A "solution" is a specific proposal indicating an improvement method or approach for a specified problem.
[0016] "Materials" are presentation documents or videos created based on a business process flow diagram, challenges, and solutions.
Brief Explanation of Drawings
[0017]
[Figure 1] It is a conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] It is a conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] It is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] It is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] It is a conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] It is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] It is a conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] It is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] It shows an emotion map to which a plurality of emotions are mapped.
[Figure 10] It shows an emotion map to which a plurality of emotions are mapped.
[Figure 11] It is a sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] It is a sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] It is a sequence diagram showing the processing flow of the data processing system in Example 2 when the emotion engine is combined.
[Figure 14] It is a sequence diagram showing the processing flow of the data processing system in Application Example 2 when the emotion engine is combined.
Embodiments for Carrying Out the Invention
[0018] Hereinafter, an example of an embodiment of the system according to the technology of the present disclosure will be described with reference to the accompanying drawings.
[0019] First, the terms used in the following description will be explained.
[0020] In the following embodiments, the numbered processor (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. Also, the processor may be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and the like.
[0021] In the following embodiments, the numbered RAM (Random Access Memory) is a memory in which information is temporarily stored and is used as a work memory by the processor.
[0022] In the following embodiments, the numbered storage is one or more non-volatile storage devices that store various programs and various parameters, etc. Examples of non-volatile storage devices include flash memory (SSD (Solid State Drive)), magnetic disks (e.g., hard disks), or magnetic tapes, etc.
[0023] In the following embodiments, the numbered communication interface (I/F) is an interface that includes a communication processor and an antenna, etc. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), or Bluetooth (registered trademark).
[0024] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means that it may be A alone, or B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and/or B" applies when expressing three or more things linked by "and/or."
[0025] [First Embodiment]
[0026] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.
[0027] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.
[0028] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0029] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.
[0030] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.
[0031] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.
[0032] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.
[0033] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.
[0034] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.
[0035] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0036] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0037] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0038] This invention provides a system for automatically generating business workflows using voice input, and describes its embodiments. The system primarily operates through the interaction of a terminal that acquires voice input, a server that processes the voice data, and the user.
[0039] First, the user verbally explains the work procedures and processes using a terminal. The terminal captures the user's voice through its microphone and records it as digital audio data. This digital audio data is then transmitted to a server via the internet.
[0040] The server uses a speech recognition engine to convert this audio data into text. The converted text is then analyzed using natural language processing techniques to identify the steps and conditions that make up the business process. This analysis extracts the elements of the business process, forming the basis for the business process diagram.
[0041] The generated business process flow diagram is presented to the user via a terminal from the server. The user reviews the diagram and provides additional information via voice by pointing out any omissions or errors as needed. The terminal sends this additional information back to the server, which then corrects and completes the business process flow diagram.
[0042] Furthermore, the server analyzes potential challenges and solutions from the business workflow and uses a generative AI model to provide concrete suggestions. This information is presented to the user via the terminal, and feedback is received.
[0043] Ultimately, the server automatically generates business process diagrams, challenges, and solutions as presentation slides and videos. This allows users to easily visually confirm the results of business process improvements and share them with stakeholders.
[0044] As part of this system, an automated analysis and generation process using artificial intelligence technology is included, enabling local governments and private companies to quickly digitize and streamline their operations. For example, if a user explains the new customer contract process by voice, the system can generate a flowchart of the workflow, including "customer information input," "contract creation and confirmation," and "approval process."
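The contract example above can be pictured as a small graph structure. The following is a minimal sketch, not the disclosed implementation: the `FlowDiagram` class, the node identifiers, and the choice of Mermaid flowchart text as an output format are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FlowDiagram:
    """A flowchart as ordered nodes (id -> label) plus directed edges."""
    nodes: dict[str, str] = field(default_factory=dict)
    edges: list[tuple[str, str]] = field(default_factory=list)

    def add_step(self, label: str) -> str:
        """Append a step and link it to the previously added step, if any."""
        node_id = f"n{len(self.nodes) + 1}"
        if self.nodes:
            previous_id = next(reversed(self.nodes))
            self.edges.append((previous_id, node_id))
        self.nodes[node_id] = label
        return node_id

    def to_mermaid(self) -> str:
        """Render as Mermaid flowchart text (an assumed output format)."""
        lines = ["flowchart TD"]
        lines += [f'    {nid}["{label}"]' for nid, label in self.nodes.items()]
        lines += [f"    {src} --> {dst}" for src, dst in self.edges]
        return "\n".join(lines)


def contract_flow() -> FlowDiagram:
    """Build the three-step contract workflow named in the text."""
    diagram = FlowDiagram()
    for label in ("customer information input",
                  "contract creation and confirmation",
                  "approval process"):
        diagram.add_step(label)
    return diagram
```

Calling `contract_flow().to_mermaid()` yields text that a Mermaid-compatible renderer could draw; any other diagram format could be substituted without changing the node-and-edge model.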
[0045] The following describes the processing flow.
[0046] Step 1:
[0047] The user uses a terminal to verbally explain the business process. The terminal captures the audio in real time via its microphone and saves it as digital audio data.
[0048] Step 2:
[0049] The terminal transmits the acquired audio data to the server via the network. The server activates a speech recognition engine and converts the audio data into text. Noise reduction and formatting of the audio data are also performed at this stage.
[0050] Step 3:
[0051] The server analyzes the converted text and uses natural language processing techniques to identify elements of the business flow. Specifically, it identifies business steps and branching conditions through the extraction of noun phrases and the analysis of conditional statements.
[0052] Step 4:
[0053] Based on the analysis results, the server creates a business process flow diagram using a generative AI model. It generates a visual representation of the business process in a flowchart format, consisting of nodes and edges.
[0054] Step 5:
[0055] The server sends the generated business process flow diagram to the user's terminal, which presents it to the user. The user reviews the diagram and verbally points out any necessary corrections or missing information.
[0056] Step 6:
[0057] The terminal collects additional audio from the user again and sends it to the server. The server updates the business flow diagram based on the additional information and generates an accurate process diagram.
[0058] Step 7:
[0059] The server automatically generates anticipated issues and solutions based on the final workflow. This information is presented to the user via the terminal, and further feedback is received.
[0060] Step 8:
[0061] The server automatically generates presentation slides and videos from the business process flow diagram, challenges, and solutions. The terminal delivers these materials to the user, who can then put the results of the business process improvement to use.
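Steps 1 to 8 above can be read as a single server-side loop over an initial recording and any follow-up recordings. The sketch below is one possible arrangement under stated assumptions: the function names are hypothetical, the speech recognizer, analyzer, and proposal generator are passed in as plain callables rather than real engines, and follow-up recordings are assumed only to append missing steps.

```python
from typing import Callable


def run_bpr_session(
    recordings: list[bytes],
    transcribe: Callable[[bytes], str],
    extract_steps: Callable[[str], list[str]],
    propose: Callable[[list[str]], list[str]],
) -> dict:
    """Run Steps 1-8 over an initial recording plus follow-up recordings."""
    steps: list[str] = []
    for audio in recordings:              # Steps 1-2, then Steps 5-6 for follow-ups
        text = transcribe(audio)          # Step 2: voice data -> text
        for step in extract_steps(text):  # Step 3: text -> flow elements
            if step not in steps:
                steps.append(step)        # Steps 4 and 6: build or update the flow
    edges = list(zip(steps, steps[1:]))   # Step 4: each node links to the next
    suggestions = propose(steps)          # Step 7: issues and solutions
    slides = [                            # Step 8: material outline
        {"title": "Business process flow", "body": steps},
        {"title": "Issues and solutions", "body": suggestions},
    ]
    return {"nodes": steps, "edges": edges, "slides": slides}


def decode_as_text(audio: bytes) -> str:
    """Stand-in recognizer: treats the bytes as UTF-8 text (illustration only)."""
    return audio.decode("utf-8")


def split_on_then(text: str) -> list[str]:
    """Stand-in analyzer: splits a sentence on ', then ' (illustration only)."""
    return [part.strip() for part in text.split(", then ") if part.strip()]
```

Passing the engines in as callables keeps the control flow independent of any particular speech recognition or generative AI service.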
[0062] (Example 1)
[0063] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0064] In today's business environment, there is a demand for increased efficiency and automation of business processes. However, traditional methods make it difficult to quickly and accurately grasp business procedures and easily modify automatically generated workflows. Furthermore, it is challenging to instantly provide proposals that effectively solve business problems. Additionally, there is a need to quickly create and share proposed solutions as visual materials. To address these issues, an innovative system utilizing voice input is necessary.
[0065] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0066] In this invention, the server includes means for receiving voice input and acquiring voice data, means for converting the voice data into text, means for analyzing the converted text and extracting business procedures, means for automatically generating business procedure diagrams and modifying the flowcharts based on additional information received from the user via voice, and means for presenting business problems and solutions and creating visual materials based thereon. This enables efficient automation of business processes based on voice input, immediate presentation of problem solutions, corrective actions, and rapid generation and sharing of visual materials.
[0067] "Voice input" is the process of using voice to transmit user instructions and information to a computer system.
[0068] "Audio data" refers to data that is recorded and stored in digital format from signals obtained from audio input.
[0069] "Speech recognition means" refers to a technology or device that analyzes speech data and extracts its content as text.
[0070] "Analysis means" refers to the process of analyzing received text data and extracting business procedures and conditions.
[0071] A "business procedure" is a set of steps and conditions necessary to complete a specific task.
[0072] A "business procedure diagram" is a visual representation of business procedures, illustrating the flow and relationships between them.
[0073] "Construction means" refers to the process of creating a business procedure diagram based on the analyzed information.
[0074] "Proposal means" refers to a method or technique for presenting problems and solutions to users based on a business procedure diagram.
[0075] "Generation means" refers to the technology or process of creating materials for reports or presentations based on the proposed content.
[0076] "Visual materials" are information presented in visual formats such as graphs, slides, and videos.
[0077] This invention provides a system for automatically generating business workflows using voice input and efficiently solving business challenges. The following describes specific embodiments of this system.
[0078] Users use a terminal to verbally explain the procedures and processes of the workflow. The terminal is equipped with a microphone that captures the user's voice as digital audio data and transmits it to a server via the internet. Specific hardware and software may utilize a standard smartphone, computer, or cloud-based data transmission capabilities.
[0079] The server converts the received audio data into text using a speech recognition engine. Speech recognition utilizes technologies provided by speech recognition APIs or cloud services. Next, the server analyzes the text data using natural language processing (NLP) techniques to extract the steps and conditions of the business procedure. This analysis employs libraries and software that perform semantic analysis of the text data.
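To make the extraction of steps and conditions concrete, the following is a deliberately simple, rule-based stand-in for the NLP analysis. It is an assumption for illustration only: the text does not specify the analysis rules, and the `ProcedureStep` type and the "If X, Y" pattern are hypothetical. A production system would rely on a real semantic analysis library.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProcedureStep:
    action: str
    condition: Optional[str] = None  # None means the step is unconditional


# Matches sentences of the form "If <condition>, <action>".
_CONDITIONAL = re.compile(r"^if\s+(?P<cond>.+?),\s*(?P<action>.+)$", re.IGNORECASE)


def extract_procedure(text: str) -> list[ProcedureStep]:
    """Split transcribed text into sentences and tag conditional steps."""
    steps = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        sentence = sentence.rstrip(".!?").strip()
        if not sentence:
            continue
        match = _CONDITIONAL.match(sentence)
        if match:
            steps.append(ProcedureStep(match["action"], match["cond"]))
        else:
            steps.append(ProcedureStep(sentence))
    return steps
```

Each returned step carries an optional condition, which is the information a diagram generator would need to draw a branch rather than a plain sequence.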
[0080] Based on the analysis results, the server automatically generates a business procedure diagram. The automatically generated diagram is sent to the terminal, where the user verifies its accuracy. If there are any omissions or errors in the diagram, the user can provide additional instructions via voice. The terminal resends this additional voice data to the server, which then corrects and completes the business procedure diagram.
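The correction round trip can be sketched as a function that applies one transcribed instruction to the current list of steps. The two command phrasings handled here ("add X after Y" and "remove X") are hypothetical examples chosen for illustration; the text does not define how corrections are phrased or parsed.

```python
import re

_ADD_AFTER = re.compile(r"^add\s+(?P<new>.+?)\s+after\s+(?P<anchor>.+)$", re.IGNORECASE)
_REMOVE = re.compile(r"^remove\s+(?P<target>.+)$", re.IGNORECASE)


def apply_correction(steps: list[str], correction: str) -> list[str]:
    """Return a new step list with one spoken correction applied."""
    correction = correction.strip().rstrip(".")
    updated = list(steps)  # leave the caller's list untouched
    if (m := _ADD_AFTER.match(correction)):
        if m["anchor"] not in updated:
            raise ValueError(f"unknown step: {m['anchor']!r}")
        updated.insert(updated.index(m["anchor"]) + 1, m["new"])
        return updated
    if (m := _REMOVE.match(correction)):
        if m["target"] not in updated:
            raise ValueError(f"unknown step: {m['target']!r}")
        updated.remove(m["target"])
        return updated
    raise ValueError(f"unrecognized correction: {correction!r}")
```

Returning a new list rather than mutating the input makes it straightforward to keep the previous version of the diagram for comparison or rollback.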
[0081] Furthermore, the server analyzes anticipated problems based on the generated business procedure diagrams and proposes specific solutions using a generative AI model. This proposal is made on the server using ChatGPT® or similar artificial intelligence models. The proposed solutions are sent to the terminal, where the user provides feedback.
[0082] Ultimately, the server automatically generates the finalized business process diagrams and solutions as visual materials, such as slides or videos. These visual materials are then used for sharing among stakeholders and for presentations.
[0083] For example, if a user describes the new customer contract process via voice, this system can generate a business process diagram that includes steps such as "entering customer information," "creating and reviewing the contract," and "approval process." Furthermore, by inputting a prompt such as, "How can this contract process be made more efficient?", the system can obtain suggestions for improvement.
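One way to combine the generated diagram with a user question is to assemble a single text prompt. The sketch below only builds the prompt string; the wording of the template and the function name are assumptions, and no particular model API is called or implied.

```python
def build_improvement_prompt(steps: list[str], question: str) -> str:
    """Assemble a text prompt from the process steps and the user's question."""
    if not steps:
        raise ValueError("at least one process step is required")
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return (
        "The following business process was described by a user:\n"
        f"{numbered}\n\n"
        f"Question: {question}\n"
        "List the main issues and one concrete solution for each."
    )
```

The resulting string would then be sent to whichever generative AI model the server uses.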
[0084] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0085] Step 1:
[0086] The user verbally explains the steps and processes of the workflow. The user's voice is captured as input and converted into digital audio data via the device's microphone. This audio data is then sent to the server.
[0087] Step 2:
[0088] The server passes the received digital audio data to the speech recognition engine, which converts the audio data into text. This process uses speech recognition technology that analyzes the digital audio and maps its waveform to a corresponding string of characters. Text data is generated as output.
[0089] Step 3:
[0090] The server inputs text data generated by speech recognition into a natural language processing (NLP) engine to analyze the steps and conditions of the business procedure. Here, the grammar and semantics of the document are analyzed to identify the elements that make up the business procedure. The output is structural information of the analyzed business procedure.
[0091] Step 4:
[0092] The server automatically generates a business procedure diagram based on the structural information of the analyzed business procedures. This generation process uses an algorithm that visualizes the business flow in an easy-to-understand flowchart format. The output is a flowchart showing the business procedures.
[0093] Step 5:
[0094] The user reviews the business procedure diagram displayed on the terminal. If there are any omissions or errors in the diagram, the user inputs additional instructions by voice, and the terminal sends them to the server as supplementary information.
[0095] Step 6:
[0096] The server performs speech recognition again on the additional audio data sent from the terminal and converts it to text. Then, it performs another NLP analysis to identify the necessary corrections to be reflected in the business procedure diagram. The output is the revised business procedure diagram.
[0097] Step 7:
[0098] The server analyzes potential business problems based on the finalized business procedure diagram and generates specific solutions using a generative AI model. This process involves sending a prompt to the AI model, which generates solutions relevant to the identified problems. The output is the proposed solutions.
[0099] Step 8:
[0100] The server automatically generates visual materials using the final business procedure diagram and proposed solutions. A material generation algorithm is applied to present the information clearly in slide and video formats. The output is visual material.
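The material generation in Step 8 can be illustrated as turning the final procedure and the proposed solutions into a slide outline. The text does not describe the material generation algorithm, so this is only a sketch: the `Slide` type, the one-slide-per-issue layout, and the plain-text rendering are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Slide:
    title: str
    bullets: list[str]


def build_slides(steps: list[str], proposals: dict[str, str]) -> list[Slide]:
    """One overview slide for the procedure, then one slide per issue."""
    overview = [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    slides = [Slide("Business procedure", overview)]
    for issue, solution in proposals.items():
        slides.append(Slide(f"Issue: {issue}", [f"Proposed solution: {solution}"]))
    return slides


def render_outline(slides: list[Slide]) -> str:
    """Render the slides as a plain-text outline for review."""
    blocks = []
    for number, slide in enumerate(slides, start=1):
        lines = [f"Slide {number}: {slide.title}"]
        lines += [f"  - {bullet}" for bullet in slide.bullets]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)
```

An outline of this kind could then be handed to a presentation or video tool to produce the final visual materials.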
[0101] (Application Example 1)
[0102] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0103] In modern production facilities, frequent changes in product variations and manufacturing processes can reduce productivity. In particular, changes to production lines involving complex procedures increase the burden on managers, increase the likelihood of errors, and significantly impair efficiency. To address these challenges, there is a need for a system that allows for easy setting and updating of production procedures using voice input.
[0104] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0105] In this invention, the server includes means for receiving voice information and acquiring data, voice recognition means for converting the data into text information, and analysis means for analyzing the converted text information and extracting a procedure flow. This enables administrators to quickly and accurately set and update production line procedures via voice input.
[0106] "Voice information" refers to data recorded digitally from the user's speech.
[0107] "Textual information" refers to data converted from audio information into text format by speech recognition technology.
[0108] A "procedure flow chart" is a model that represents a series of steps or processes that indicate the steps involved in a business operation.
[0109] "Analysis means" refers to a system component that has the function of processing textual information to identify and extract a procedural flow.
[0110] A "generation means" is a component of a system that has the function of automatically creating a visually understandable flowchart based on the analyzed procedure flow.
[0111] "Operational challenges" refer to factors or problems that hinder efficiency or quality in business or production processes.
[0112] "Countermeasures" refer to specific solutions or methods proposed to address identified operational challenges.
[0113] "Means for automatically generating materials" refers to a system component that has the function of generating presentation materials based on the generated procedure flowchart and countermeasures.
[0114] "Robot control means" refers to a system component that has the function of automatically updating the robot's movements and operating procedures based on a procedure flow diagram obtained through a generation means.
[0115] To implement this invention, a system is constructed that enables the efficient setup and updating of production lines within a factory. First, inputting voice information requires a voice input device with a microphone for recording the speaker's voice. The voice information is input using this device and converted into a digital format. Next, the voice information is transmitted to a server via the internet, and the server converts the voice into text information using the Google® Cloud Speech-to-Text API.
[0116] The server analyzes the converted text information and extracts the procedure flow. The Google Cloud Natural Language API is used for analysis, analyzing the text data to identify the business steps. The identified steps are automatically generated as a procedure flow diagram and visually presented through the user interface.
[0117] Users can make modifications to the flowchart as needed through voice input. Finally, the server updates the operation procedures of the factory robots using robot control means based on the generated procedure flowchart. This makes it easy for users to achieve efficient operation of the production line.
[0118] As a concrete example, consider a case where a user changes the production process for a product. The user gives a voice command saying, "For the new product, please add two repetitions to the processing step." Based on this prompt, the system automatically generates a new procedure flow and adjusts the factory robots accordingly.
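The voice command in this example can be sketched as a small parser that updates a repetition count per step. Several points are assumptions for illustration: the flow is modeled as a mapping from step name to the number of times it runs, "add two repetitions" is read as increasing the current count by two, and only the single phrasing shown is recognized.

```python
import re

_NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
_ADD_REPEATS = re.compile(
    r"add\s+(?P<count>\w+)\s+repetitions?\s+to\s+the\s+(?P<step>\w+)\s+step",
    re.IGNORECASE,
)


def apply_repeat_command(flow: dict[str, int], command: str) -> dict[str, int]:
    """Return a copy of the flow with the requested repetitions added."""
    match = _ADD_REPEATS.search(command)
    if match is None:
        raise ValueError(f"unrecognized command: {command!r}")
    word = match["count"].lower()
    count = _NUMBER_WORDS.get(word)
    if count is None:
        if not word.isdigit():
            raise ValueError(f"unrecognized count: {word!r}")
        count = int(word)
    step = match["step"].lower()
    if step not in flow:
        raise KeyError(step)
    updated = dict(flow)  # keep the previous flow unchanged
    updated[step] += count
    return updated
```

With a flow of `{"loading": 1, "processing": 1, "inspection": 1}`, the command quoted above would raise the count of the processing step while leaving the other steps as they are.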
[0119] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0120] Step 1:
[0121] The user speaks instructions using a voice input device. The input here is the user's voice. The microphone in the voice input device captures this voice information and converts it into a digital format. The output is digital voice data.
[0122] Step 2:
[0123] The terminal transmits digital audio data to the server via the internet. At this point, the input is digital audio data, and the output is the audio data received by the server. The server processes the data to prevent any loss.
[0124] Step 3:
[0125] The server uses the Google Cloud Speech-to-Text API to convert speech information into text. In this step, the input is digital speech data received by the server, and the output is text information (text data). The server performs acoustic modeling and phoneme recognition to accurately transcribe speech into text.
[0126] Step 4:
[0127] The server uses the Google Cloud Natural Language API to analyze textual information and extract procedure flows. The input is converted textual information, and the output is the analyzed procedure flow. Specifically, the server performs grammatical and semantic analysis to extract business procedures and conditions.
[0128] Step 5:
[0129] The server automatically generates a procedure flow diagram based on the procedure flow. The input is the analyzed procedure flow, and the output is a visually represented procedure flow diagram. The server uses a diagram generation algorithm to construct an intuitive flow diagram.
[0130] Step 6:
[0131] The user reviews the flowchart and suggests modifications using additional voice input. The flowchart is displayed on the device, and the user reviews and makes decisions based on it. The input is user feedback, and the output is the updated procedure flowchart. The device then sends the new voice data back to the server.
[0132] Step 7:
[0133] The server adjusts the factory robot appropriately using robot control means based on the final procedure flow diagram. The input is the completed procedure flow diagram, and the output is the adjusted robot's operation. The server updates the robot's operating parameters and modifies the control program.
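Before the robot control means updates operating parameters, it is useful to know what actually changed between the running procedure and the final procedure flow. The following sketch computes that difference; the representation of a flow as a mapping from step name to repetition count, and the function name, are assumptions and do not describe a real robot control interface.

```python
def plan_robot_update(current: dict[str, int], target: dict[str, int]) -> dict[str, list[str]]:
    """Compare two procedure flows and list the steps that need attention."""
    return {
        "add": sorted(step for step in target if step not in current),
        "remove": sorted(step for step in current if step not in target),
        "change": sorted(
            step for step in target
            if step in current and current[step] != target[step]
        ),
    }
```

Limiting the update to the steps listed in the result avoids rewriting parts of the control program that the user did not ask to change.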
[0134] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
[0135] This invention combines a system that automatically generates business workflows from voice input with an emotion engine that analyzes user emotions. This system primarily provides concrete improvement measures to enhance business efficiency through user interaction. The embodiments for carrying out this invention are described in detail below.
[0136] The user verbally explains the business process into the terminal. The terminal captures the user's voice and records it as digital audio data. This audio data is sent to a server, which uses a speech recognition engine to convert the data into text.
[0137] The converted text is analyzed using natural language processing techniques through an analysis tool to identify elements of the business flow. The server also uses an emotion engine to analyze the user's emotional state from the audio data. This emotional information is then used to generate business flow diagrams and suggest solutions.
[0138] Through the generation mechanism, the server automatically generates a business process flow diagram. The generated diagram is visually presented to the user via their terminal. Based on the analysis results of the emotion engine, the user interface and presentation method may be adjusted. For example, if the user's emotion is determined to be negative, the system presents the information in a more flexible manner.
[0139] The user can review the presented business process flow chart and provide additional information or modifications verbally. The terminal resends this information to the server, which updates the flow chart. The server also generates suggested issues and solutions based on the business process flow chart and presents them to the user, taking sentiment into consideration.
[0140] As an example of how emotion analysis can be useful in guiding behavior, if a user verbally expresses that they "feel stressed during project progress," the emotion engine will understand their stress level and suggest specific improvement measures such as "distributing tasks" or "reducing the frequency of progress checks." In this way, the present invention effectively utilizes the emotion engine to provide support tailored to the user's state, thereby promoting business improvement.
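The mapping from an estimated emotion to improvement measures can be sketched as follows. This is an illustration only: a keyword lookup on the transcribed text stands in for the emotion engine, which in the description operates on the audio data. The keyword lists, the "anxiety" entry, and the function names are assumptions; only the two suggestions for stress are taken from the example above.

```python
from typing import Dict, List

# Hypothetical keyword lists standing in for the emotion engine's output.
EMOTION_KEYWORDS: Dict[str, List[str]] = {
    "stress": ["stressed", "overwhelmed", "pressure"],
    "anxiety": ["anxious", "worried", "uneasy"],
}

# The "stress" suggestions are quoted from the description;
# the "anxiety" entry is illustrative only.
SUGGESTIONS: Dict[str, List[str]] = {
    "stress": ["distributing tasks", "reducing the frequency of progress checks"],
    "anxiety": ["clarifying the next step"],
}


def estimate_emotion(utterance: str) -> str:
    """Return the first emotion whose keyword occurs in the utterance, else 'neutral'."""
    text = utterance.lower()
    for emotion, keywords in EMOTION_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return emotion
    return "neutral"


def suggest(utterance: str) -> List[str]:
    """Look up improvement measures for the estimated emotion (empty if none apply)."""
    return SUGGESTIONS.get(estimate_emotion(utterance), [])


print(suggest("I feel stressed during project progress"))
```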
[0141] The following describes the processing flow.
[0142] Step 1:
[0143] The user verbally explains the business process to the terminal. The terminal uses its microphone to capture the user's voice in real time and record it as digital audio data.
[0144] Step 2:
[0145] The terminal transmits the acquired audio data to the server via the network. The server feeds the received audio data into a speech recognition engine and converts the data into text. This process also includes preprocessing to remove audio noise and improve recognition accuracy.
[0146] Step 3:
[0147] The server analyzes text data using natural language processing techniques to identify the elements that make up the business flow (steps, conditions, branches, etc.). At the same time, the server uses an emotion engine to analyze the user's emotional state from voice data and saves the results.
[0148] Step 4:
[0149] The server automatically generates a business process flow diagram using a generative AI model based on the analyzed text and sentiment information. The business process flow diagram is presented in a flowchart format, visually representing the process flow with nodes and edges.
[0150] Step 5:
[0151] The server sends the generated business process diagram to the terminal, which then presents it to the user. The user can review the diagram and verbally provide any missing information or corrections. The format and method of information presentation are dynamically adjusted according to the user's emotional state.
[0152] Step 6:
[0153] The terminal receives additional audio explanations from the user and sends them back to the server. The server understands this new information and updates the business process diagram.
[0154] Step 7:
[0155] The server analyzes the final business flow in detail and automatically generates anticipated challenges and corresponding solutions. It incorporates the results of the emotion engine into the suggestions, presenting to the user what problems might exist and how improvements can be made.
[0156] Step 8:
[0157] The server compiles business process diagrams, challenges, and solutions, and automatically generates slides and videos as presentation materials that take into account the results of sentiment analysis. The terminal distributes these materials to the user, supporting them in taking concrete actions to improve their work processes.
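The node-and-edge representation in Step 4 and the update in Step 6 can be sketched with a small data structure. The description does not define how the diagram is stored, so the class `FlowDiagram` and its method `add_step` are hypothetical. The sketch assumes that a correction from the user has already been interpreted as "insert a step after an existing step".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class FlowDiagram:
    """Business flow held as labeled nodes and directed edges (assumed representation)."""
    nodes: Dict[str, str] = field(default_factory=dict)          # node id -> label
    edges: List[Tuple[str, str]] = field(default_factory=list)   # (source id, target id)

    def add_step(self, node_id: str, label: str, after: Optional[str] = None) -> None:
        """Add a step; if `after` is given, splice it between `after` and its successors."""
        if after is not None and after not in self.nodes:
            raise KeyError(f"unknown node: {after}")
        self.nodes[node_id] = label
        if after is None:
            return
        # Redirect the outgoing edges of `after` so that they start at the new node.
        successors = [dst for src, dst in self.edges if src == after]
        self.edges = [(src, dst) for src, dst in self.edges if src != after]
        self.edges.append((after, node_id))
        self.edges.extend((node_id, dst) for dst in successors)


diagram = FlowDiagram()
diagram.add_step("s1", "Receive request")
diagram.add_step("s2", "Approve", after="s1")
# Correction supplied by the user: a check was missing between the two steps.
diagram.add_step("s1a", "Check documents", after="s1")
print(diagram.edges)
```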
[0158] (Example 2)
[0159] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0160] In today's business environment, there is a demand both for streamlined business processes and for flexible problem-solving that takes user emotions into account. While conventional systems could convert voice data to text and automatically generate business flows, they lacked a means of reflecting the emotional state of users. As a result, the proposed business improvement measures are not always optimal for the user.
[0161] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0162] In this invention, the server includes means for receiving voice information and acquiring voice data, voice recognition means for converting the voice data into encoded information, analysis means for analyzing the converted encoded information and extracting business procedures, and emotion analysis means for identifying the user's emotional state from the voice data. This makes it possible to appropriately identify elements of business procedures and present flexible solutions that take the user's emotions into consideration.
[0163] "Voice information" refers to spoken language data entered by the user, which is later converted into encoded information during processing.
[0164] "Encoded information" refers to data obtained by converting audio information into a digital format, and is used for further analysis and identification of business procedures.
[0165] "Speech recognition means" refers to a technology or device that converts speech information into encoded information, and specifically, has the function of converting speech input into text format.
[0166] "Analysis means" refers to a technology or device that analyzes encoded information to identify elements of business procedures, and utilizes natural language processing technology.
[0167] A "business procedure diagram" is a diagram that visually represents the flow and structure of business procedures, and is used to help understand and improve those procedures.
[0168] "Emotional analysis means" refers to a technology or device that identifies a user's emotional information from voice data, and uses that information to infer the user's psychological state and reflect it in business improvements.
[0169] A "self-learning model" is an algorithm or system that learns from data and improves its accuracy on its own, and is used for creating business process diagrams.
[0170] "Flexible problem-solving" refers to a method or system that takes into account the user's emotional state and business processes, and proposes the most appropriate improvement measures according to the situation.
[0171] This system receives voice information and effectively visualizes and improves the user's work procedures. Its implementation is described below.
[0172] First, the user verbally explains the business process into the terminal. The terminal uses its internal microphone to capture this audio. The audio data is stored digitally and transmitted to the server over the network.
[0173] The server converts speech data into encoded information using a speech recognition engine. A common cloud-based speech recognition service is used for this purpose. The converted encoded information is then analyzed by the server to extract elements of the business procedure. Natural language processing techniques are used for this analysis. For example, common natural language processing libraries and cloud services are utilized.
[0174] Furthermore, the server utilizes sentiment analysis to identify the user's emotions from the voice data. This sentiment information plays a crucial role in the process of generating business workflows. Based on this sentiment information, the server generates a business procedure diagram. A self-learning model is used for this, and the generated diagram is presented to the user via the terminal.
[0175] As a concrete example of its use, if a user says they "feel stressed during project progress," sentiment analysis identifies that emotion and influences the business process diagram. As a result, the server can suggest flexible solutions such as "distributing tasks" or "reducing the frequency of progress checks."
[0176] Examples of prompts include specific instructions such as, "Please provide suggestions to alleviate the anxieties one might feel when starting a new project."
[0177] This system allows users to streamline work procedures through voice input and obtain optimal solutions tailored to their emotional state.
[0178] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0179] Step 1:
[0180] The user speaks about the business process into the terminal. The terminal uses its built-in microphone to acquire voice information. This input is in the form of the user's voice and is raw data before conversion. The terminal converts this data into a digital voice format and saves it.
[0181] Step 2:
[0182] The terminal sends the stored audio data to the server over the network. The server receives this audio data. The server feeds the received audio data into a speech recognition engine and converts it into text data, which is encoded information. This conversion makes the audio signal into a format that is easily processed as digital text.
[0183] Step 3:
[0184] The server uses natural language processing technology to analyze the converted text data. This analysis extracts elements of business procedures from the input text. The analysis involves data processing to analyze keywords and sentence structure. The output identifies information related to the business procedures.
[0185] Step 4:
[0186] The server uses an emotion analysis engine to identify the user's emotional state from the audio data. The input is the unprocessed audio data, and emotion analysis is performed based on this to evaluate the emotional tone and psychological characteristics. The output is data indicating the user's emotional state.
[0187] Step 5:
[0188] The server automatically generates a business process diagram using a generation mechanism. Here, information on procedural elements and the user's emotional state are used as input, and a self-learning model constructs the business process diagram. The output is a visually represented business process diagram in a format that the user can review.
[0189] Step 6:
[0190] The server sends the generated business procedure diagram to the terminal. The terminal visually presents this diagram to the user. The diagram is displayed through the user interface, allowing the user to visualize and understand the flow and structure of the business procedure.
[0191] Step 7:
[0192] The user can verbally provide additional information or corrections based on the presented business process diagram. The terminal then sends this new verbal information back to the server, which updates the business process diagram. The input is the user's additional verbal information, and the output is the updated business process diagram.
[0193] Step 8:
[0194] The server identifies issues based on the latest business process diagrams and emotional states, and generates improvement measures as needed. In this step, a generative AI model is used to output improvement suggestions that are useful to the user. An example of a specific prompt might be, "Please provide new ideas to improve the current business flow."
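How the prompt in Step 8 might be assembled from the business process diagram and the emotional state can be sketched as follows. The instruction sentence is the example prompt quoted above. The layout of the remaining prompt text and the function name `build_improvement_prompt` are assumptions, and the call to the generative AI model itself is omitted.

```python
from typing import List

# Example prompt quoted in the description.
BASE_INSTRUCTION = "Please provide new ideas to improve the current business flow."


def build_improvement_prompt(steps: List[str], emotion: str) -> str:
    """Combine the instruction, the numbered steps, and the emotional state (assumed layout)."""
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return (
        f"{BASE_INSTRUCTION}\n\n"
        f"Current business flow:\n{numbered}\n\n"
        f"Estimated emotional state of the user: {emotion}"
    )


prompt = build_improvement_prompt(
    ["Receive request", "Check documents", "Approve"],
    emotion="stressed",
)
print(prompt)
```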
[0195] (Application Example 2)
[0196] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0197] This invention provides a system that dynamically optimizes work processes while considering user emotions when improving work efficiency through voice input. Conventional technologies often manage tasks uniformly without considering the user's emotional state, limiting the improvement of the user experience. This invention solves these problems and provides a means to achieve improved work efficiency and user experience.
[0198] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0199] In this invention, the server includes means for receiving voice input and acquiring voice information, voice recognition means for converting the voice information into text information, analysis means for analyzing the converted text information and extracting business processes, and emotion analysis means for analyzing emotional states. This makes it possible to automatically generate business flows and suggest improvement measures according to the user's emotions.
[0200] "Voice input" refers to the process where a machine receives audio as a digital signal.
[0201] "Voice information" refers to digital voice data acquired through voice input.
[0202] "Textual information" refers to data in which audio information is represented as text.
[0203] "Speech recognition means" refers to technologies and devices for converting speech information into text information.
[0204] "Analysis means" refers to technologies and devices that analyze textual information and extract business processes.
[0205] A "business process" refers to a series of procedures or steps necessary to perform a specific task.
[0206] "Generation means" refers to technologies or devices that automatically generate business process diagrams based on analysis results.
[0207] A "business process diagram" is a diagram that visually represents a business process.
[0208] A "problem" is an event or obstacle that needs to be resolved in order to carry out a task.
[0209] A "solution" is a method or means proposed to solve a problem.
[0210] "Information resources" refer to documents and data that are automatically generated based on business process diagrams and proposed solutions.
[0211] "Emotional state" refers to information that represents the user's mental and psychological condition.
[0212] "Emotional analysis means" refers to technologies and devices that analyze voice and text information to identify the user's emotional state.
[0213] "Support" refers to assistance provided to streamline or improve the user's work.
[0214] The system implementing this invention automatically generates work processes based on voice input and provides work support that takes into account the user's emotional state. This system is realized by a series of programs implemented in a consumer robot. When a user verbally explains a work process to the robot, the voice input is acquired as digital voice information through the microphone built into the robot.
[0215] Next, the audio information is converted into text information using the Google Speech-to-Text API. The resulting text information is then analyzed using natural language processing libraries such as Transformers to extract elements of the business process. During this analysis, sentiment analysis is used to identify the user's emotional state from the audio information. Based on the analysis results and sentiment information, a business process diagram is automatically generated using machine learning models and tools such as PyFlow.
[0216] The resulting work process diagram is presented to the user via the robot's display and voice. Furthermore, based on the results of the emotion analysis, suggestions for overall task rearrangement and improvements to reminders are provided. This aims to provide alternative measures to alleviate the emotional state of users who express concerns about being overwhelmed with work.
[0217] For example, if a user tells the robot, "I'm stressed because I have a lot of meetings this week," the robot will analyze this and suggest rescheduling the meetings or adjusting reminders. Another possible prompt is: "When the user says, 'Today is busy and difficult,' analyze their emotions and generate an appropriate workflow."
[0218] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0219] Step 1:
[0220] The user speaks to the robot and explains the business process. The terminal acquires the user's voice as digital audio information. The input is the user's voice, and the output is digital audio data. The audio signal is captured through the microphone, and the data is prepared to be sent to the next step.
[0221] Step 2:
[0222] The server uses the Google Speech-to-Text API to convert digital audio information into text. The input is digital audio data, and the output is text information in string format. The server sends the audio data to the cloud API and retrieves the returned text information.
[0223] Step 3:
[0224] The server utilizes the Transformers library to analyze the obtained text information and extract elements of the business process. The input is text information, and the output is a list of extracted business process elements. Natural language processing is applied to identify these elements.
[0225] Step 4:
[0226] The server uses emotion analysis tools to identify the user's emotional state from audio information. Input is textual information and associated audio features, while output is data related to the emotional state. The server analyzes the intonation and speed of the speech to quantify the emotion.
[0227] Step 5:
[0228] The server uses a machine learning model to generate a business process diagram based on extracted business process elements and sentiment information. The input is a list of business elements and sentiment state data, and the output is a business process diagram. The data is then fed into a tool such as PyFlow to create a visual business process diagram.
[0229] Step 6:
[0230] The terminal presents the generated business process diagram to the user via display and audio, and proposes improvement plans based on the results of sentiment analysis. Input consists of data for the business process diagram and improvement plans, while output is visual or audio feedback to the user. The diagram is displayed on the screen, and the suggestions are communicated via the voice speaker.
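The quantification in Step 4, which uses the intonation and speed of the speech, can be illustrated with a deliberately simple heuristic. The description does not disclose the actual emotion analysis tool, so everything below is an assumption: speech rate is approximated as words per second, intonation as the standard deviation of pitch samples, and the baseline values are arbitrary placeholders rather than validated thresholds.

```python
from statistics import pstdev
from typing import Sequence


def stress_score(
    word_count: int,
    duration_s: float,
    pitch_hz: Sequence[float],
    baseline_rate: float = 2.5,       # words per second; placeholder value
    baseline_pitch_sd: float = 20.0,  # Hz; placeholder value
) -> float:
    """Return a score in [0, 1]; higher means faster speech and more varied pitch than baseline."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    rate = word_count / duration_s
    pitch_sd = pstdev(pitch_hz) if len(pitch_hz) > 1 else 0.0
    # Only the excess over each baseline contributes to the score.
    rate_excess = max(0.0, rate / baseline_rate - 1.0)
    pitch_excess = max(0.0, pitch_sd / baseline_pitch_sd - 1.0)
    return min(1.0, 0.5 * rate_excess + 0.5 * pitch_excess)


# Fast speech with flat intonation.
print(stress_score(word_count=50, duration_s=10.0, pitch_hz=[120.0, 120.0]))
```

A score of this kind could serve as the "data related to the emotional state" passed to Step 5, although a production system would more plausibly rely on a trained model.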
[0231] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0232] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0233] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.
[0234] [Second Embodiment]
[0235] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.
[0236] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.
[0237] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0238] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.
[0239] The microphone 238 receives instructions from the user 20 by capturing the voice of the user 20. The microphone 238 converts the captured voice into audio data and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0240] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0241] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0242] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0243] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0244] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0245] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0246] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0247] This invention provides a system for automatically generating business workflows using voice input, and describes its embodiments. The system primarily operates through the interaction of a terminal that acquires voice input, a server that processes the voice data, and the user.
[0248] First, the user verbally explains the work procedures and processes using a terminal. The terminal captures the user's voice through its microphone and records it as digital audio data. This digital audio data is then transmitted to a server via the internet.
[0249] The server uses a speech recognition engine to convert this audio data into text. The converted text is then analyzed using natural language processing techniques to identify the steps and conditions that make up the business process. This analysis extracts the elements of the business process, forming the basis for the business process diagram.
[0250] The server presents the generated business process flow diagram to the user via the terminal. The user reviews the diagram and, as needed, points out omissions or errors and provides additional information by voice. The terminal sends this additional information to the server, which then corrects and completes the business process flow diagram.
[0251] Furthermore, the server analyzes potential challenges and solutions from the business workflow and uses a generative AI model to provide concrete suggestions. This information is presented to the user via the terminal, and feedback is received.
[0252] Ultimately, the server automatically generates business process diagrams, challenges, and solutions as presentation slides and videos. This allows users to easily visually confirm the results of business process improvements and share them with stakeholders.
[0253] As part of this system, an automated analysis and generation process using artificial intelligence technology is included, enabling local governments and private companies to quickly digitize and streamline their operations. For example, if a user explains the new customer contract process by voice, the system can generate a flowchart of the workflow, including "customer information input," "contract creation and confirmation," and "approval process."
[0254] The following describes the processing flow.
[0255] Step 1:
[0256] The user uses the terminal to verbally explain the business process. The terminal captures the audio in real time via its microphone and saves it as digital audio data.
[0257] Step 2:
[0258] The terminal transmits the acquired audio data to the server via the network. The server activates a speech recognition engine and converts the audio data into text. Noise reduction and formatting of the audio data are also performed at this stage.
[0259] Step 3:
[0260] The server analyzes the converted text and uses natural language processing techniques to identify elements of the business flow. Specifically, it identifies business steps and branching conditions through the extraction of noun phrases and the analysis of conditional statements.
[0261] Step 4:
[0262] Based on the analysis results, the server creates a business process flow diagram using a generative AI model. It generates a visual representation of the business process in a flowchart format, consisting of nodes and edges.
[0263] Step 5:
[0264] The server sends the generated business process flow diagram to the user's terminal, which presents it to the user. The user reviews the diagram and verbally points out any necessary corrections or missing information.
[0265] Step 6:
[0266] The terminal collects additional audio from the user again and sends it to the server. The server updates the business flow diagram based on the additional information and generates an accurate process diagram.
[0267] Step 7:
[0268] The server automatically generates anticipated issues and solutions based on the final workflow. This information is presented to the user via the terminal, and further feedback is received.
[0269] Step 8:
[0270] The server automatically generates presentation slides and videos from the business process diagrams, challenges, and solutions. The terminal delivers these materials to the user, enabling the user to leverage the results of business process improvements.
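The analysis of conditional statements mentioned in Step 3 can be sketched with a regular expression. This is a rule-based stand-in, not the natural language processing described above: it assumes English sentences of the form "If <condition>, <action>" and treats every other sentence as a plain step. Noun phrase extraction is not attempted, and the function name `extract_elements` is hypothetical.

```python
import re
from typing import Dict, List

# Matches sentences of the assumed form "If <condition>, <action>".
_CONDITIONAL = re.compile(
    r"^\s*if\s+(?P<cond>[^,]+),\s*(?P<action>.+?)\s*$", re.IGNORECASE
)


def extract_elements(transcript: str) -> List[Dict[str, str]]:
    """Split a transcript into sentences and classify each as a step or a branch."""
    elements: List[Dict[str, str]] = []
    for sentence in re.split(r"(?<=[.!?])\s+", transcript.strip()):
        sentence = sentence.rstrip(".!?").strip()
        if not sentence:
            continue
        match = _CONDITIONAL.match(sentence)
        if match:
            elements.append(
                {
                    "type": "branch",
                    "condition": match["cond"].strip(),
                    "action": match["action"],
                }
            )
        else:
            elements.append({"type": "step", "action": sentence})
    return elements


text = (
    "Enter the customer information. "
    "If the contract exceeds the limit, request manager approval. "
    "Create the contract."
)
for element in extract_elements(text):
    print(element)
```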
[0271] (Example 1)
[0272] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0273] In today's business environment, there is a demand for increased efficiency and automation of business processes. However, traditional methods make it difficult to quickly and accurately grasp business procedures and easily modify automatically generated workflows. Furthermore, it is challenging to instantly provide proposals that effectively solve business problems. Additionally, there is a need to quickly create and share proposed solutions as visual materials. To address these issues, an innovative system utilizing voice input is necessary.
[0274] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0275] In this invention, the server includes means for receiving voice input and acquiring voice data, means for converting the voice data into text, means for analyzing the converted text and extracting business procedures, means for automatically generating business procedure diagrams and modifying the flowcharts based on additional information received from the user via voice, and means for presenting business problems and solutions and creating visual materials based thereon. This enables efficient automation of business processes based on voice input, immediate presentation of problem solutions, corrective actions, and rapid generation and sharing of visual materials.
[0276] "Voice input" is the process of using voice to transmit user instructions and information to a computer system.
[0277] "Voice data" refers to data obtained by recording and storing signals obtained from voice input in digital format.
[0278] "Voice recognition means" refers to a technology or device that analyzes voice data and extracts its content as text.
[0279] "Analysis means" is a process that analyzes the received text data and extracts business procedures and conditions.
[0280] "Business procedure" refers to a set of steps and conditions necessary to complete a specific task.
[0281] "Business procedure diagram" is a visual representation of business procedures, illustrating the flow and relationships.
[0282] "Construction means" is a process of creating a business procedure diagram based on the analyzed information.
[0283] "Proposal means" is a method or technology that presents problems and solutions to the user based on the business procedure diagram.
[0284] "Generation means" is a technology or process of creating materials for reports and presentations based on the proposed content.
[0285] "Visual materials" refer to materials that express information in a visual form such as graphs, slides, videos, etc.
[0286] This invention provides a system for automatically generating a business flow by voice input and efficiently solving business problems. The embodiments thereof will be specifically described below.
[0287] The user verbally explains the procedures and processes of the workflow to a terminal. The terminal is equipped with a microphone that captures the user's voice as digital audio data and transmits it to a server via the internet. A standard smartphone or computer, together with cloud-based data transmission functions, may be used as the specific hardware and software.
[0288] The server converts the received audio data into text using a speech recognition engine. Speech recognition utilizes technologies provided by speech recognition APIs or cloud services. Next, the server analyzes the text data using natural language processing (NLP) techniques to extract the steps and conditions of the business procedure. This analysis employs libraries and software that perform semantic analysis of the text data.
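The analysis described above can be illustrated with a minimal sketch. It assumes the transcript is already available as text and replaces the NLP engine with a simple rule: sentences that begin with a cue word are treated as conditions, and all others as steps. The `Element` class, the `CONDITION_CUES` list, and the `extract_elements` function are hypothetical names introduced only for this illustration and do not appear in the disclosure.

```python
import re
from dataclasses import dataclass


@dataclass
class Element:
    kind: str  # "step" or "condition"
    text: str


# Hypothetical cue words; a production system would rely on an NLP library or service.
CONDITION_CUES = ("if ", "when ", "unless ")


def extract_elements(transcript: str) -> list[Element]:
    """Split a transcript into sentences and label each one as a step or a condition."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", transcript) if s.strip()]
    elements = []
    for sentence in sentences:
        kind = "condition" if sentence.lower().startswith(CONDITION_CUES) else "step"
        elements.append(Element(kind, sentence.rstrip(".!?")))
    return elements


transcript = (
    "Enter the customer information. Create and review the contract. "
    "If the amount exceeds the limit, request approval from the manager."
)
elements = extract_elements(transcript)
for element in elements:
    print(f"{element.kind}: {element.text}")
```

A rule of this kind only shows the shape of the output (an ordered list of steps and conditions); semantic analysis by a real NLP engine would be needed for free-form speech.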
[0289] Based on the analysis results, the server automatically generates a business procedure diagram. The automatically generated diagram is sent to the terminal, where the user verifies its accuracy. If there are any omissions or errors in the diagram, the user can provide additional instructions via voice. The terminal resends this additional voice data to the server, which then corrects and completes the business procedure diagram.
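As one possible illustration of diagram generation and correction, the sketch below renders an ordered list of steps as Mermaid flowchart text and applies a single correction by inserting a step. The choice of Mermaid as the output format, the sequential chaining of nodes, and the function names `to_mermaid` and `insert_step_after` are assumptions made for this example; the disclosure does not specify a diagram format.

```python
def to_mermaid(steps: list[tuple[str, str]]) -> str:
    """Render (kind, label) pairs as a top-down Mermaid flowchart chained in order."""
    lines = ["flowchart TD"]
    for i, (kind, label) in enumerate(steps):
        # Conditions are drawn as diamonds, ordinary steps as rectangles.
        shape = f"{{{label}}}" if kind == "condition" else f"[{label}]"
        lines.append(f"    n{i}{shape}")
    for i in range(len(steps) - 1):
        lines.append(f"    n{i} --> n{i + 1}")
    return "\n".join(lines)


def insert_step_after(steps, anchor_label, new_step):
    """Return a copy of steps with new_step placed right after the step labelled anchor_label."""
    for i, (_, label) in enumerate(steps):
        if label == anchor_label:
            return steps[: i + 1] + [new_step] + steps[i + 1 :]
    raise ValueError(f"no step labelled {anchor_label!r}")


steps = [
    ("step", "Enter customer information"),
    ("step", "Create and review the contract"),
    ("step", "Approval process"),
]
# Hypothetical correction received from the user by voice.
corrected = insert_step_after(
    steps, "Create and review the contract", ("condition", "Legal check needed")
)
diagram = to_mermaid(corrected)
print(diagram)
```

Keeping the step list as the source of truth and regenerating the diagram text after every correction avoids having to edit the rendered diagram directly.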
[0290] Furthermore, the server analyzes anticipated problems based on the generated business procedure diagram and proposes specific solutions using a generative AI model. This proposal is performed on the server using ChatGPT or a similar artificial intelligence model. The proposed solutions are sent to the terminal, where the user provides feedback.
[0291] Ultimately, the server automatically generates the finalized business process diagrams and solutions as visual materials, such as slides or videos. These visual materials are then used for sharing among stakeholders and for presentations.
[0292] For example, if a user describes the new customer contract process via voice, this system can generate a business process diagram that includes steps such as "entering customer information," "creating and reviewing the contract," and "approval process." Furthermore, by inputting a prompt such as, "How can this contract process be made more efficient?", the system can obtain suggestions for improvement.
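The prompt in this example could be assembled along the following lines. This is a minimal sketch that only builds the prompt text; the layout of the prompt and the function name `build_improvement_prompt` are assumptions, and the call to the generative AI model itself is deliberately left out.

```python
def build_improvement_prompt(process_name: str, steps: list[str], question: str) -> str:
    """Combine the extracted steps and the user's question into one prompt string."""
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return f"Business process: {process_name}\nSteps:\n{numbered}\n\nQuestion: {question}"


prompt = build_improvement_prompt(
    "New customer contract",
    ["Entering customer information", "Creating and reviewing the contract", "Approval process"],
    "How can this contract process be made more efficient?",
)
print(prompt)
```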
[0293] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0294] Step 1:
[0295] The user verbally explains the steps and processes of the workflow. The user's voice is captured as input and converted into digital audio data via the device's microphone. This audio data is then sent to the server.
[0296] Step 2:
[0297] The server passes the received digital audio data to the speech recognition engine, which converts the audio data into text. This process uses speech recognition technology that analyzes the digital audio and maps its waveform to a corresponding string of characters. Text data is generated as output.
[0298] Step 3:
[0299] The server inputs text data generated by speech recognition into a natural language processing (NLP) engine to analyze the steps and conditions of the business procedure. Here, the grammar and semantics of the text are analyzed to identify the elements that make up the business procedure. The output is structural information of the analyzed business procedure.
[0300] Step 4:
[0301] The server automatically generates a business procedure diagram based on the analyzed structural information of the business procedure. In this generation process, an algorithm for drawing the business flow in a visually understandable flowchart format is used. The output is a flowchart showing the business procedure.
[0302] Step 5:
[0303] The user checks the business procedure diagram presented on the terminal. If there are deficiencies or errors in the diagram, the user inputs additional instructions verbally, and the terminal transmits them to the server as supplementary information.
[0304] Step 6:
[0305] The server performs speech recognition again on the additional voice data transmitted from the terminal and converts it into text. Subsequently, it conducts analysis again using NLP to identify the points for modification to be reflected in the business procedure diagram. The output is a modified business procedure diagram.
[0306] Step 7:
[0307] The server analyzes potential business problems based on the finalized business procedure diagram and generates specific solutions using a generative AI model. In this process, a prompt is input into the AI model to generate solutions to the relevant problems. The output is the proposed solutions.
[0308] Step 8:
[0309] The server automatically generates visual materials using the final business procedure diagram and the proposed solutions. Here, a material generation algorithm for presenting information clearly in the form of slides or videos is applied. The output is visual materials.
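The server-side portion of the steps above (Steps 2, 3, 4, 7, and 8) can be sketched as a chain of interchangeable stages. This is a sketch under stated assumptions: every stage is passed in as a function, the stand-in stages used in the example are trivial placeholders rather than real speech recognition or generative AI, and the correction loop of Steps 5 and 6 is omitted. The name `run_pipeline` and its parameters are hypothetical.

```python
from typing import Callable


def run_pipeline(
    audio: bytes,
    recognize: Callable[[bytes], str],
    analyze: Callable[[str], list[str]],
    draw: Callable[[list[str]], str],
    propose: Callable[[str], str],
    render: Callable[[str, str], str],
) -> dict[str, str]:
    """Run the server-side stages in order and return the intermediate artifacts."""
    text = recognize(audio)                  # Step 2: speech recognition
    steps = analyze(text)                    # Step 3: NLP analysis
    diagram = draw(steps)                    # Step 4: diagram generation
    solutions = propose(diagram)             # Step 7: problem and solution proposal
    materials = render(diagram, solutions)   # Step 8: visual material generation
    return {"text": text, "diagram": diagram, "solutions": solutions, "materials": materials}


# Placeholder stages: the "audio" is simply UTF-8 text so the example runs offline.
result = run_pipeline(
    b"Enter customer information. Create the contract. Approve.",
    recognize=lambda audio: audio.decode("utf-8"),
    analyze=lambda text: [s.strip() for s in text.split(".") if s.strip()],
    draw=lambda steps: " -> ".join(steps),
    propose=lambda diagram: f"Review whether any of the {diagram.count('->') + 1} steps can be merged.",
    render=lambda diagram, solutions: f"Slide 1: {diagram}\nSlide 2: {solutions}",
)
print(result["materials"])
```

Passing each stage as a parameter keeps the flow independent of any particular speech recognition service or AI model, so individual stages can be replaced without changing the overall sequence.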
[0310] (Application Example 1)
[0311] Next, Application Example 1 will be described. In the following description, the data processing device 12 is referred to as the "server", and the smart glasses 214 are referred to as the "terminal".
[0312] In modern production facilities, frequent changes in product variations and manufacturing processes can reduce productivity. In particular, changes to production lines involving complex procedures increase the burden on managers, increase the likelihood of errors, and significantly impair efficiency. To address these challenges, there is a need for a system that allows for easy setting and updating of production procedures using voice input.
[0313] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0314] In this invention, the server includes means for receiving voice information and acquiring data, voice recognition means for converting the data into text information, and analysis means for analyzing the converted text information and extracting a procedure flow. This enables administrators to quickly and accurately set and update production line procedures via voice input.
[0315] "Voice information" refers to data recorded digitally from the user's speech.
[0316] "Text information" refers to data converted from voice information into text format by speech recognition technology.
[0317] A "procedure flow chart" is a model that represents the series of steps or processes involved in a business operation.
[0318] "Analysis means" refers to a system component that has the function of processing textual information to identify and extract a procedural flow.
[0319] A "generation means" is a component of a system that has the function of automatically creating a visually understandable flowchart based on the analyzed procedure flow.
[0320] "Operational challenges" refer to factors or problems that hinder efficiency or quality in business or production processes.
[0321] "Countermeasures" refer to specific solutions or methods proposed to address identified operational challenges.
[0322] "Means for automatically generating materials" refers to a system component that has the function of generating presentation materials based on the generated procedure flowchart and countermeasures.
[0323] "Robot control means" refers to a system component that has the function of automatically updating the robot's movements and operating procedures based on a procedure flow diagram obtained through a generation means.
[0324] To implement this invention, a system is constructed that enables the efficient setup and updating of production lines within a factory. First, inputting voice information requires a voice input device with a microphone for recording the speaker's voice. The voice information is input using this device and converted into a digital format. Next, the voice information is transmitted to a server via the internet, and the server converts the voice into text information using the Google Cloud Speech-to-Text API.
[0325] The server analyzes the converted text information and extracts the procedure flow. The Google Cloud Natural Language API is used for analysis, analyzing the text data to identify the business steps. The identified steps are automatically generated as a procedure flow diagram and visually presented through the user interface.
[0326] Users can make modifications to the flowchart as needed through voice input. Finally, the server updates the operation procedures of the factory robots using robot control means based on the generated procedure flowchart. This makes it easy for users to achieve efficient operation of the production line.
[0327] As a concrete example, consider a case where a user changes the production process for a product. The user gives a voice command saying, "For the new product, please add two repetitions to the processing step." Based on this prompt, the system automatically generates a new procedure flow and adjusts the factory robots accordingly.
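How such a command might update a procedure flow can be sketched as follows. The sketch assumes the command has already been transcribed, that the flow is held as a list of named steps with a repeat count, and that "add two repetitions" means increasing that count by two. The regular expression, the `NUMBER_WORDS` table, and the function name `apply_repetition_command` are hypothetical.

```python
import re

NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def apply_repetition_command(flow: list[dict], command: str) -> list[dict]:
    """Return a copy of flow with the repeat count of the named step increased."""
    match = re.search(r"add (\w+) repetitions? to the (\w+) step", command.lower())
    if match is None:
        raise ValueError("command not understood")
    count_word, step_name = match.groups()
    count = NUMBER_WORDS.get(count_word) or int(count_word)
    updated = [dict(step) for step in flow]  # copy so the original flow is preserved
    for step in updated:
        if step["name"] == step_name:
            step["repeat"] += count
            return updated
    raise ValueError(f"unknown step: {step_name}")


flow = [
    {"name": "loading", "repeat": 1},
    {"name": "processing", "repeat": 1},
    {"name": "inspection", "repeat": 1},
]
new_flow = apply_repetition_command(
    flow, "For the new product, please add two repetitions to the processing step."
)
print(new_flow)
```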
[0328] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0329] Step 1:
[0330] The user speaks instructions using a voice input device. The input here is the user's voice. The microphone in the voice input device captures this voice information and converts it into a digital format. The output is digital voice data.
[0331] Step 2:
[0332] The terminal transmits the digital audio data to the server via the internet. At this point, the input is the digital audio data, and the output is the audio data received by the server. The server handles the received data so that none of it is lost.
[0333] Step 3:
[0334] The server uses the Google Cloud Speech-to-Text API to convert speech information into text. In this step, the input is digital speech data received by the server, and the output is text information (text data). The server performs acoustic modeling and phoneme recognition to accurately transcribe speech into text.
[0335] Step 4:
[0336] The server uses the Google Cloud Natural Language API to analyze textual information and extract procedure flows. The input is converted textual information, and the output is the analyzed procedure flow. Specifically, the server performs grammatical and semantic analysis to extract business procedures and conditions.
[0337] Step 5:
[0338] The server automatically generates a procedure flow diagram based on the procedure flow. The input is the analyzed procedure flow, and the output is a visually represented procedure flow diagram. The server uses a diagram generation algorithm to construct an intuitive flow diagram.
[0339] Step 6:
[0340] The user reviews the flowchart and suggests modifications using additional voice input. The flowchart is displayed on the device, and the user reviews and makes decisions based on it. The input is user feedback, and the output is the updated procedure flowchart. The device then sends the new voice data back to the server.
[0341] Step 7:
[0342] The server adjusts the factory robot appropriately using robot control means based on the final procedure flow diagram. The input is the completed procedure flow diagram, and the output is the adjusted robot's operation. The server updates the robot's operating parameters and modifies the control program.
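The robot update in Step 7 can be pictured as translating the final procedure flow into an ordered list of robot instructions. The sketch below is illustrative only: the instruction format (`RUN <STEP>` and `END`) and the function name `build_robot_program` are invented for this example and do not correspond to any actual robot controller interface.

```python
def build_robot_program(flow: list[dict]) -> list[str]:
    """Expand each step of the flow into one instruction per repetition."""
    program = []
    for step in flow:
        for _ in range(step.get("repeat", 1)):
            program.append(f"RUN {step['name'].upper()}")
    program.append("END")
    return program


final_flow = [
    {"name": "loading", "repeat": 1},
    {"name": "processing", "repeat": 3},
    {"name": "inspection", "repeat": 1},
]
program = build_robot_program(final_flow)
print(program)
```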
[0343] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.
[0344] This invention combines a system that automatically generates business workflows from voice input with an emotion engine that analyzes user emotions. This system primarily provides concrete improvement measures to enhance business efficiency through user interaction. The embodiments for carrying out this invention are described in detail below.
[0345] The user verbally explains the business process into the terminal. The terminal captures the user's voice and records it as digital audio data. This audio data is sent to a server, which uses a speech recognition engine to convert the data into text.
[0346] The converted text is analyzed using natural language processing techniques through an analysis tool to identify elements of the business flow. The server also uses an emotion engine to analyze the user's emotional state from the audio data. This emotional information is then used to generate business flow diagrams and suggest solutions.
[0347] Through the generation mechanism, the server automatically generates a business process flow diagram. The generated diagram is visually presented to the user via their terminal. The user interface and presentation method may be adjusted based on the analysis results of the emotion engine. For example, if the user's emotion is determined to be negative, the system presents the information in a more flexible manner.
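One way to picture the adjustment of the presentation method is a lookup from the estimated emotion to a set of display settings. The settings shown here (number of items per screen, tone, level of detail) are assumptions chosen for illustration; the disclosure does not state which aspects of the presentation are adjusted. `PresentationStyle` and `choose_style` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PresentationStyle:
    max_items_per_screen: int
    tone: str
    show_details: bool


def choose_style(emotion: str) -> PresentationStyle:
    """Pick lighter, more supportive settings when the emotion is negative."""
    if emotion == "negative":
        return PresentationStyle(max_items_per_screen=3, tone="supportive", show_details=False)
    return PresentationStyle(max_items_per_screen=8, tone="neutral", show_details=True)


print(choose_style("negative"))
print(choose_style("positive"))
```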
[0348] The user can review the presented business process flow chart and provide additional information or modifications verbally. The terminal resends this information to the server, which updates the flow chart. The server also generates suggested issues and solutions based on the business process flow chart and presents them to the user, taking sentiment into consideration.
[0349] As an example of how emotion analysis can be useful in guiding behavior, if a user verbally expresses that they "feel stressed during project progress," the emotion engine will understand their stress level and suggest specific improvement measures such as "distributing tasks" or "reducing the frequency of progress checks." In this way, the present invention effectively utilizes the emotion engine to provide support tailored to the user's state, thereby promoting business improvement.
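The example above can be sketched as a two-stage lookup: estimate a stress level, then select improvement measures for that level. As a loudly stated assumption, the emotion engine is replaced here by keyword matching on the transcribed utterance, whereas the disclosure describes analysis of the audio data. `STRESS_CUES`, `MEASURES`, `estimate_stress`, and `suggest_measures` are hypothetical names.

```python
# Hypothetical cue words standing in for the emotion engine.
STRESS_CUES = ("stressed", "overwhelmed", "anxious", "exhausted")

MEASURES = {
    "high": ["Distribute tasks", "Reduce the frequency of progress checks"],
    "low": [],
}


def estimate_stress(utterance: str) -> str:
    """Return "high" when any cue word appears in the utterance, otherwise "low"."""
    lowered = utterance.lower()
    return "high" if any(cue in lowered for cue in STRESS_CUES) else "low"


def suggest_measures(utterance: str) -> list[str]:
    """Look up improvement measures for the estimated stress level."""
    return MEASURES[estimate_stress(utterance)]


print(suggest_measures("I feel stressed during project progress"))
```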
[0350] The following describes the processing flow.
[0351] Step 1:
[0352] The user verbally explains the business process to the terminal. The terminal uses its microphone to capture the user's voice in real time and record it as digital audio data.
[0353] Step 2:
[0354] The terminal transmits the acquired audio data to the server via the network. The server feeds the received audio data into a speech recognition engine and converts the data into text. This process also includes preprocessing to remove audio noise and improve recognition accuracy.
[0355] Step 3:
[0356] The server analyzes text data using natural language processing techniques to identify the elements that make up the business flow (steps, conditions, branches, etc.). At the same time, the server uses an emotion engine to analyze the user's emotional state from voice data and saves the results.
[0357] Step 4:
[0358] The server automatically generates a business process flow diagram using a generative AI model based on the analyzed text and sentiment information. The business process flow diagram is presented in a flowchart format, visually representing the process flow with nodes and edges.
[0359] Step 5:
[0360] The server sends the generated business process diagram to the terminal, which then presents it to the user. The user can review the diagram and verbally provide any missing information or corrections. The format and method of information presentation are dynamically adjusted according to the user's emotional state.
[0361] Step 6:
[0362] The terminal receives additional audio explanations from the user and sends them back to the server. The server understands this new information and updates the business process diagram.
[0363] Step 7:
[0364] The server analyzes the final workflow in detail and automatically generates expected challenges and solutions based on them. It incorporates the results of the emotion engine into the suggestions, presenting to the user what problems might exist and how improvements can be made.
[0365] Step 8:
[0366] The server compiles business process diagrams, challenges, and solutions, and automatically generates slides and videos as presentation materials that take into account the results of sentiment analysis. The terminal distributes these materials to the user, supporting them in taking concrete actions to improve their work processes.
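Step 3 above runs the text analysis and the emotion analysis at the same time. A minimal sketch of that arrangement, using Python's standard `concurrent.futures` module, is given below. Both analysis functions are placeholders: `analyze_text` merely splits sentences, and `analyze_emotion` always reports "neutral". These functions and `analyze_utterance` are hypothetical names introduced for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_text(text: str) -> list[str]:
    """Placeholder for NLP analysis: one business-flow element per sentence."""
    return [s.strip() for s in text.split(".") if s.strip()]


def analyze_emotion(audio: bytes) -> str:
    """Placeholder for the emotion engine: always reports a neutral state."""
    return "neutral"


def analyze_utterance(text: str, audio: bytes) -> dict:
    """Run both analyses concurrently and collect their results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        elements_future = pool.submit(analyze_text, text)
        emotion_future = pool.submit(analyze_emotion, audio)
        return {"elements": elements_future.result(), "emotion": emotion_future.result()}


analysis = analyze_utterance("Receive the order. Check the stock. Ship the goods.", b"\x00\x01")
print(analysis)
```

Because the emotion analysis works on the audio data while the text analysis works on the transcript, neither depends on the other's result, which is what makes running them side by side possible.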
[0367] (Example 2)
[0368] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0369] In today's business environment, there is a demand for both streamlined business processes and flexible problem-solving that takes user emotions into account. While conventional systems could convert voice data to text and automatically generate business flows, they lacked a nuanced approach that reflects the emotional state of users. As a result, the proposed business improvement measures are not always optimal for the user.
[0370] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0371] In this invention, the server includes means for receiving voice information and acquiring voice data, voice recognition means for converting the voice data into encoded information, analysis means for analyzing the converted encoded information and extracting business procedures, and emotion analysis means for identifying the user's emotional state from the voice data. This makes it possible to appropriately identify elements of business procedures and present flexible problem-solving solutions that take the user's emotions into consideration.
[0372] "Voice information" refers to spoken language data entered by the user, which is later converted into encoded information during processing.
[0373] "Encoded information" refers to data obtained by converting audio information into a digital format, and is used for further analysis and identification of business procedures.
[0374] "Voice recognition means" refers to a technology or device that converts voice information into encoded information; specifically, it has the function of converting voice input into text format.
[0375] "Analysis means" refers to a technology or device that analyzes encoded information to identify elements of business procedures, and utilizes natural language processing technology.
[0376] A "business procedure diagram" is a diagram that visually represents the flow and structure of business procedures, and is used to help understand and improve those procedures.
[0377] "Emotion analysis means" refers to a technology or device that identifies a user's emotional information from voice data, and uses that information to infer the user's psychological state and reflect it in business improvements.
[0378] A "self-learning model" is an algorithm or system that learns from data and improves its accuracy on its own, and is used for creating business process diagrams.
[0379] "Flexible problem-solving" refers to a method or system that takes into account the user's emotional state and business processes, and proposes the most appropriate improvement measures according to the situation.
[0380] This system receives voice information and effectively visualizes and improves the user's work procedures. Its implementation is described below.
[0381] First, the user verbally explains the business process into the terminal. The terminal uses its internal microphone to capture this audio. The audio data is stored digitally and transmitted to the server over the network.
[0382] The server converts speech data into encoded information using a speech recognition engine. A common cloud-based speech recognition service is used for this purpose. The converted encoded information is then analyzed by the server to extract elements of the business procedure. Natural language processing techniques are used for this analysis. For example, common natural language processing libraries and cloud services are utilized.
[0383] Furthermore, the server utilizes sentiment analysis to identify the user's emotions from the voice data. This sentiment information plays a crucial role in the process of generating business workflows. Based on this sentiment information, the server generates a business procedure diagram. A self-learning model is used for this, and the generated diagram is presented to the user via the terminal.
[0384] As a concrete example of its use, if a user says they "feel stressed during project progress," sentiment analysis identifies that emotion and influences the business process diagram. As a result, the server can suggest flexible solutions such as "distributing tasks" or "reducing the frequency of progress checks."
[0385] Examples of prompts include specific instructions such as, "Please provide suggestions to alleviate the anxieties one might feel when starting a new project."
[0386] This system allows users to streamline work procedures through voice input and obtain optimal solutions tailored to their emotional state.
[0387] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0388] Step 1:
[0389] The user speaks about the business process into the terminal. The terminal uses its built-in microphone to acquire voice information. This input is in the form of the user's voice and is raw data before conversion. The terminal converts this data into a digital voice format and saves it.
[0390] Step 2:
[0391] The terminal sends the stored audio data to the server over the network. The server receives this audio data. The server feeds the received audio data into a speech recognition engine and converts it into text data, which is encoded information. This conversion turns the audio signal into digital text that is easy to process.
[0392] Step 3:
[0393] The server uses natural language processing technology to analyze the converted text data. This analysis extracts elements of business procedures from the input text. The analysis involves data processing to analyze keywords and sentence structure. The output identifies information related to the business procedures.
[0394] Step 4:
[0395] The server uses an emotion analysis engine to identify the user's emotional state from the audio data. The input is the unprocessed audio data, and emotion analysis is performed based on this to evaluate the emotional tone and psychological characteristics. The output is data indicating the user's emotional state.
[0396] Step 5:
[0397] The server automatically generates a business process diagram using a generation mechanism. Here, information on procedural elements and the user's emotional state are used as input, and a self-learning model constructs the business process diagram. The output is a visually represented business process diagram in a format that the user can review.
[0398] Step 6:
[0399] The server sends the generated business procedure diagram to the terminal. The terminal visually presents this diagram to the user. The diagram is displayed through the user interface, allowing the user to visualize and understand the flow and structure of the business procedure.
[0400] Step 7:
[0401] The user can verbally provide additional information or corrections based on the presented business process diagram. The terminal then sends this new verbal information back to the server, which updates the business process diagram. The input is the user's additional verbal information, and the output is the updated business process diagram.
[0402] Step 8:
[0403] The server identifies issues based on the latest business process diagrams and emotional states, and generates improvement measures as needed. In this step, a generative AI model is used to output improvement suggestions that are useful to the user. An example of a specific prompt might be, "Please provide new ideas to improve the current business flow."
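Step 5 above takes the procedural elements and the emotional state as input and produces a diagram. One hedged way to represent that result is a node-and-edge structure annotated with the emotional state and serialized as JSON for transmission to the terminal. The field names (`nodes`, `edges`, `emotion`) and the function name `build_diagram` are assumptions for this example, and the self-learning model is not modeled; the elements are simply chained in order.

```python
import json


def build_diagram(elements: list[str], emotion: str) -> dict:
    """Chain the elements into nodes and edges and attach the emotional state."""
    nodes = [{"id": i, "label": label} for i, label in enumerate(elements)]
    edges = [{"from": i, "to": i + 1} for i in range(len(elements) - 1)]
    return {"nodes": nodes, "edges": edges, "emotion": emotion}


payload = json.dumps(
    build_diagram(["Plan the project", "Assign tasks", "Check progress"], "stressed")
)
print(payload)
```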
[0404] (Application Example 2)
[0405] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0406] This invention provides a system that dynamically optimizes work processes while considering user emotions when improving work efficiency through voice input. Conventional technologies often manage tasks uniformly without considering the user's emotional state, limiting the improvement of the user experience. This invention solves these problems and provides a means to achieve improved work efficiency and user experience.
[0407] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0408] In this invention, the server includes means for receiving voice input and acquiring voice information, voice recognition means for converting the voice information into text information, analysis means for analyzing the converted text information and extracting business processes, and emotion analysis means for analyzing emotional states. This makes it possible to automatically generate business flows and suggest improvement measures according to the user's emotions.
[0409] "Voice input" refers to the process where a machine receives audio as a digital signal.
[0410] "Voice information" refers to digital voice data acquired through voice input.
[0411] "Text information" refers to data in which voice information is represented as text.
[0412] "Voice recognition means" refers to technologies and devices for converting voice information into text information.
[0413] "Analysis means" refers to technologies and devices that analyze textual information and extract business processes.
[0414] A "business process" refers to a series of procedures or steps necessary to perform a specific task.
[0415] "Generation means" refers to technologies or devices that automatically generate business process diagrams based on analysis results.
[0416] A "business process diagram" is a diagram that visually represents a business process.
[0417] A "problem" is an event or obstacle that needs to be resolved in order to carry out a task.
[0418] A "solution" is a method or means proposed to solve a problem.
[0419] "Information resources" refer to documents and data that are automatically generated based on business process diagrams and proposed solutions.
[0420] "Emotional state" refers to information that represents the user's mental and psychological condition.
[0421] "Emotion analysis means" refers to technologies and devices that analyze voice and text information to identify the user's emotional state.
[0422] "Support" refers to assistance that streamlines or improves the user's work.
[0423] The system implementing this invention automatically generates work processes based on voice input and provides work support that takes into account the user's emotional state. This system is realized by a series of programs implemented in a consumer robot. When a user verbally explains a work process to the robot, the voice input is acquired as digital voice information through the microphone built into the robot.
[0424] Next, the audio information is converted into text information using the Google Speech-to-Text API. The resulting text information is then analyzed using natural language processing libraries such as Transformers to extract elements of the business process. During this analysis, sentiment analysis is used to identify the user's emotional state from the audio information. Based on the analysis results and sentiment information, a business process diagram is automatically generated using machine learning models and tools such as PyFlow.
[0425] The resulting work process diagram is presented to the user via the robot's display and voice. Furthermore, based on the results of the emotion analysis, suggestions for overall task rearrangement and improvements to reminders are provided. This aims to provide alternative measures to alleviate the emotional state of users who express concerns about being overwhelmed with work.
[0426] For example, if a user tells the robot, "I'm stressed because I have a lot of meetings this week," the robot will analyze this and suggest rescheduling the meetings or adjusting reminders. Another possible prompt is: "When the user says, 'Today is busy and difficult,' analyze their emotions and generate an appropriate workflow."
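The rescheduling suggestion in this example could follow a simple rule such as the one sketched below: when the user is judged to be stressed, propose moving a meeting from the busiest day to the lightest day. The rule, the threshold of two meetings, and the function name `suggest_reschedule` are assumptions made for illustration and are not taken from the disclosure.

```python
from typing import Optional


def suggest_reschedule(meetings_per_day: dict[str, int], stressed: bool) -> Optional[str]:
    """Suggest moving one meeting from the busiest day to the lightest day."""
    if not stressed:
        return None
    busiest = max(meetings_per_day, key=meetings_per_day.get)
    lightest = min(meetings_per_day, key=meetings_per_day.get)
    # Only suggest a move when the load is clearly uneven (hypothetical threshold).
    if meetings_per_day[busiest] - meetings_per_day[lightest] < 2:
        return None
    return f"Move one meeting from {busiest} to {lightest}."


week = {"Mon": 4, "Tue": 1, "Wed": 3}
print(suggest_reschedule(week, stressed=True))
```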
[0427] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0428] Step 1:
[0429] The user speaks to the robot and explains the business process. The terminal acquires the user's voice as digital audio information. The input is the user's voice, and the output is digital audio data. The audio signal is captured through the microphone, and the data is prepared to be sent to the next step.
[0430] Step 2:
[0431] The server uses the Google Speech-to-Text API to convert digital audio information into text. The input is digital audio data, and the output is text information in string format. The server sends the audio data to the cloud API and retrieves the returned text information.
[0432] Step 3:
[0433] The server utilizes the Transformers library to analyze the obtained text information and extract elements of the business process. The input is text information, and the output is a list of extracted business process elements. Natural language processing is performed to process the data in order to identify the business elements.
[0434] Step 4:
[0435] The server uses emotion analysis tools to identify the user's emotional state from audio information. Input is textual information and associated audio features, while output is data related to the emotional state. The server analyzes the intonation and speed of the speech to quantify the emotion.
[0436] Step 5:
[0437] The server uses a machine learning model to generate a business process diagram based on extracted business process elements and sentiment information. The input is a list of business elements and sentiment state data, and the output is a business process diagram. The data is then fed into a tool such as PyFlow to create a visual business process diagram.
[0438] Step 6:
[0439] The terminal presents the generated business process diagram to the user via display and audio, and proposes improvement plans based on the results of sentiment analysis. Input consists of data for the business process diagram and improvement plans, while output is visual or audio feedback to the user. The diagram is displayed on the screen, and the suggestions are communicated via the voice speaker.
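Step 4 above quantifies emotion from the intonation and speed of speech. A minimal numerical sketch is shown below, assuming that a pitch track (in Hz) and a word count with duration have already been extracted from the audio. The scoring formula, the normalization constants (50 Hz and 4 words per second), and the equal weights are hypothetical and serve only to show how two prosodic features might be combined into a single value.

```python
from statistics import pstdev


def speech_rate(word_count: int, duration_s: float) -> float:
    """Speaking speed in words per second."""
    return word_count / duration_s


def arousal_score(pitch_hz: list[float], word_count: int, duration_s: float) -> float:
    """Combine pitch variability and speaking rate into a score between 0 and 1."""
    pitch_term = min(pstdev(pitch_hz) / 50.0, 1.0)  # 50 Hz spread treated as maximal
    rate_term = min(speech_rate(word_count, duration_s) / 4.0, 1.0)  # 4 words/s treated as maximal
    return round(0.5 * pitch_term + 0.5 * rate_term, 3)


agitated = arousal_score([180.0, 220.0, 200.0, 240.0, 160.0], word_count=12, duration_s=4.0)
calm = arousal_score([200.0, 200.0, 200.0], word_count=4, duration_s=4.0)
print(agitated, calm)
```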
[0440] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0441] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0442] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.
[0443] [Third Embodiment]
[0444] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.
[0445] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.
[0446] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0447] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.
[0448] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0449] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0450] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0451] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0452] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0453] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0454] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0455] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".
[0456] This invention provides a system for automatically generating business workflows using voice input; its embodiments are described below. The system primarily operates through interaction among the user, a terminal that acquires voice input, and a server that processes the voice data.
[0457] First, the user verbally explains the work procedures and processes using a terminal. The terminal captures the user's voice through its microphone and records it as digital audio data. This digital audio data is then transmitted to a server via the internet.
[0458] The server uses a speech recognition engine to convert this audio data into text. The converted text is then analyzed using natural language processing techniques to identify the steps and conditions that make up the business process. This analysis extracts the elements of the business process, forming the basis for the business process diagram.
[0459] The server presents the generated business process flow diagram to the user via the terminal. The user reviews the diagram and, as needed, points out omissions or errors and provides additional information by voice. The terminal sends this additional information to the server, which then corrects and completes the business process flow diagram.
[0460] Furthermore, the server analyzes potential challenges and solutions from the business workflow and uses a generative AI model to provide concrete suggestions. This information is presented to the user via the terminal, and feedback is received.
[0461] Ultimately, the server automatically generates business process diagrams, challenges, and solutions as presentation slides and videos. This allows users to easily visually confirm the results of business process improvements and share them with stakeholders.
[0462] As part of this system, an automated analysis and generation process using artificial intelligence technology is included, enabling local governments and private companies to quickly digitize and streamline their operations. For example, if a user explains the new customer contract process by voice, the system can generate a flowchart of the workflow, including "customer information input," "contract creation and confirmation," and "approval process."
[0463] The following describes the processing flow.
[0464] Step 1:
[0465] The user uses the terminal to verbally explain the business process. The terminal captures the audio in real time via its microphone and saves it as digital audio data.
[0466] Step 2:
[0467] The terminal transmits the acquired audio data to the server via the network. The server activates a speech recognition engine and converts the audio data into text. Noise reduction and formatting of the audio data are also performed at this stage.
[0468] Step 3:
[0469] The server analyzes the converted text and uses natural language processing techniques to identify elements of the business flow. Specifically, it identifies business steps and branching conditions through the extraction of noun phrases and the analysis of conditional statements.
[0470] Step 4:
[0471] Based on the analysis results, the server creates a business process flow diagram using a generative AI model. It generates a visual representation of the business process in a flowchart format, consisting of nodes and edges.
[0472] Step 5:
[0473] The server sends the generated business process flow diagram to the user's terminal for presentation. The user reviews the diagram and verbally points out any necessary corrections or missing information.
[0474] Step 6:
[0475] The terminal collects additional audio from the user again and sends it to the server. The server updates the business flow diagram based on the additional information and generates an accurate process diagram.
[0476] Step 7:
[0477] The server automatically generates anticipated issues and solutions based on the final workflow. This information is presented to the user via the terminal, and further feedback is received.
[0478] Step 8:
[0479] The server automatically generates presentation slides and videos covering the business process diagrams, challenges, and solutions. The terminal distributes these materials to the user, enabling the user to make use of the results of the business process improvement.
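As an illustration of Steps 3 and 4 only, the following is a minimal sketch of turning transcribed text into flowchart nodes and edges. It is not the disclosed analysis: noun-phrase extraction is replaced here by simple sentence splitting, and the rule that a sentence opening with "if" or "when" is a branching condition is an assumption made for the example. The names `BusinessFlow`, `CONDITION_PATTERN`, and `extract_flow` are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class BusinessFlow:
    """Flowchart-style result: nodes are (id, kind, label), edges are (from, to)."""
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)


# Assumed rule: a sentence opening with "if" or "when" is a branching condition.
CONDITION_PATTERN = re.compile(r"^(if|when)\b", re.IGNORECASE)


def extract_flow(text: str) -> BusinessFlow:
    """Classify each sentence as a step or a condition and link them in order."""
    flow = BusinessFlow()
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    for node_id, sentence in enumerate(sentences):
        kind = "condition" if CONDITION_PATTERN.match(sentence) else "step"
        flow.nodes.append((node_id, kind, sentence))
        if node_id > 0:
            # Connect each sentence to the one spoken just before it.
            flow.edges.append((node_id - 1, node_id))
    return flow


flow = extract_flow(
    "Enter the customer information. "
    "If the contract amount is large, request approval. "
    "Create the contract."
)
for node in flow.nodes:
    print(node)
```

A production analysis would likely rely on a natural language processing library or a generative AI model rather than these fixed rules; the sketch only shows the shape of the input and output.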
[0480] (Example 1)
[0481] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0482] In today's business environment, there is a demand for increased efficiency and automation of business processes. However, traditional methods make it difficult to quickly and accurately grasp business procedures and easily modify automatically generated workflows. Furthermore, it is challenging to instantly provide proposals that effectively solve business problems. Additionally, there is a need to quickly create and share proposed solutions as visual materials. To address these issues, an innovative system utilizing voice input is necessary.
[0483] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0484] In this invention, the server includes means for receiving voice input and acquiring voice data, means for converting the voice data into text, means for analyzing the converted text and extracting business procedures, means for automatically generating business procedure diagrams and modifying the flowcharts based on additional information received from the user via voice, and means for presenting business problems and solutions and creating visual materials based thereon. This enables efficient automation of business processes based on voice input, immediate presentation of problem solutions, corrective actions, and rapid generation and sharing of visual materials.
[0485] "Voice input" is the process of using voice to transmit user instructions and information to a computer system.
[0486] "Audio data" refers to data that is recorded and stored in digital format from signals obtained from audio input.
[0487] "Speech recognition means" refers to a technology or device that analyzes speech data and extracts its content as text.
[0488] "Analysis means" refers to the process of analyzing received text data and extracting business procedures and conditions.
[0489] A "business procedure" is a set of steps and conditions necessary to complete a specific task.
[0490] A "business procedure diagram" is a visual representation of business procedures, illustrating the flow and relationships between them.
[0491] "Construction means" refers to the process of creating a business procedure diagram based on the analyzed information.
[0492] "Proposal means" refers to a method or technique for presenting problems and solutions to users based on a business procedure diagram.
[0493] "Generation means" refers to the technology or process of creating materials for reports or presentations based on the proposed content.
[0494] "Visual materials" are information presented in visual formats such as graphs, slides, and videos.
[0495] This invention provides a system for automatically generating business workflows using voice input and efficiently solving business challenges. The following describes specific embodiments of this system.
[0496] Users use a terminal to verbally explain the procedures and processes of the workflow. The terminal is equipped with a microphone that captures the user's voice as digital audio data and transmits it to a server via the internet. Specific hardware and software may utilize a standard smartphone, computer, or cloud-based data transmission capabilities.
[0497] The server converts the received audio data into text using a speech recognition engine. Speech recognition utilizes technologies provided by speech recognition APIs or cloud services. Next, the server analyzes the text data using natural language processing (NLP) techniques to extract the steps and conditions of the business procedure. This analysis employs libraries and software that perform semantic analysis of the text data.
[0498] Based on the analysis results, the server automatically generates a business procedure diagram. The automatically generated diagram is sent to the terminal, where the user verifies its accuracy. If there are any omissions or errors in the diagram, the user can provide additional instructions via voice. The terminal resends this additional voice data to the server, which then corrects and completes the business procedure diagram.
[0499] Furthermore, the server analyzes anticipated problems based on the generated business procedure diagram and proposes specific solutions using a generative AI model. This proposal is performed on the server using ChatGPT or a similar artificial intelligence model. The proposed solutions are sent to the terminal, where the user provides feedback.
[0500] Ultimately, the server automatically generates the finalized business process diagrams and solutions as visual materials, such as slides or videos. These visual materials are then used for sharing among stakeholders and for presentations.
[0501] For example, if a user describes the new customer contract process via voice, this system can generate a business process diagram that includes steps such as "entering customer information," "creating and reviewing the contract," and "approval process." Furthermore, by inputting a prompt such as, "How can this contract process be made more efficient?", the system can obtain suggestions for improvement.
[0502] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0503] Step 1:
[0504] The user verbally explains the steps and processes of the workflow. The user's voice is captured as input and converted into digital audio data via the device's microphone. This audio data is then sent to the server.
[0505] Step 2:
[0506] The server passes the received digital audio data to the speech recognition engine, which converts the audio data into text. This process uses speech recognition technology that analyzes the digital audio and maps its waveform to a corresponding string of characters. Text data is generated as output.
[0507] Step 3:
[0508] The server inputs text data generated by speech recognition into a natural language processing (NLP) engine to analyze the steps and conditions of the business procedure. Here, the grammar and semantics of the document are analyzed to identify the elements that make up the business procedure. The output is structural information of the analyzed business procedure.
[0509] Step 4:
[0510] The server automatically generates a business procedure diagram based on the structural information of the analyzed business procedures. This generation process uses an algorithm that visualizes the business flow in an easy-to-understand flowchart format. The output is a flowchart showing the business procedures.
[0511] Step 5:
[0512] The user reviews the business procedure diagram displayed on the terminal. If there are any omissions or errors in the diagram, the user inputs additional instructions by voice, and the terminal sends them to the server as supplementary information.
[0513] Step 6:
[0514] The server performs speech recognition again on the additional audio data sent from the terminal and converts it to text. Then, it performs another NLP analysis to identify the necessary corrections to be reflected in the business procedure diagram. The output is the revised business procedure diagram.
[0515] Step 7:
[0516] The server analyzes potential business problems based on the finalized business procedure diagram and generates specific solutions using a generative AI model. This process involves prompting the AI model and performing calculations to generate relevant problem-solving solutions. The output is the proposed solution.
[0517] Step 8:
[0518] The server automatically generates visual materials using the final business procedure diagram and proposed solutions. A material generation algorithm is applied to present the information clearly in slide and video formats. The output is visual material.
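The diagram generation of Step 4 can be sketched as follows, assuming the structural information is already available as a list of nodes and edges. The source does not name a rendering format; this example emits text in the Graphviz DOT language as one possible choice, drawing conditions as diamonds and steps as boxes. The function name `to_dot` and the tuple layout are hypothetical.

```python
def to_dot(nodes, edges):
    """Render (id, kind, label) nodes and (from, to) edges as Graphviz DOT text."""
    lines = ["digraph procedure {"]
    for node_id, kind, label in nodes:
        # Conditions are drawn as diamonds, ordinary steps as boxes.
        shape = "diamond" if kind == "condition" else "box"
        escaped = label.replace('"', '\\"')
        lines.append(f'  n{node_id} [shape={shape}, label="{escaped}"];')
    for source, target in edges:
        lines.append(f"  n{source} -> n{target};")
    lines.append("}")
    return "\n".join(lines)


nodes = [
    (0, "step", "Enter customer information"),
    (1, "condition", "If approval is required"),
    (2, "step", "Approve the contract"),
]
edges = [(0, 1), (1, 2)]
dot = to_dot(nodes, edges)
print(dot)
```

The resulting text can be passed to any DOT-compatible renderer to obtain an image of the flowchart.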
[0519] (Application Example 1)
[0520] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0521] In modern production facilities, frequent changes in product variations and manufacturing processes can reduce productivity. In particular, changes to production lines involving complex procedures increase the burden on managers, increase the likelihood of errors, and significantly impair efficiency. To address these challenges, there is a need for a system that allows for easy setting and updating of production procedures using voice input.
[0522] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0523] In this invention, the server includes means for receiving voice information and acquiring data, voice recognition means for converting the data into text information, and analysis means for analyzing the converted text information and extracting a procedure flow. This enables administrators to quickly and accurately set and update production line procedures via voice input.
[0524] "Voice information" refers to data recorded digitally from the user's speech.
[0525] "Textual information" refers to data converted from audio information into text format by speech recognition technology.
[0526] A "procedure flow chart" is a model that represents the series of steps or processes involved in a business operation.
[0527] "Analysis means" refers to a system component that has the function of processing textual information to identify and extract a procedural flow.
[0528] A "generation means" is a component of a system that has the function of automatically creating a visually understandable flowchart based on the analyzed procedure flow.
[0529] "Operational challenges" refer to factors or problems that hinder efficiency or quality in business or production processes.
[0530] "Countermeasures" refer to specific solutions or methods proposed to address identified operational challenges.
[0531] "Means for automatically generating materials" refers to a system component that has the function of generating presentation materials based on the generated procedure flowchart and countermeasures.
[0532] "Robot control means" refers to a system component that has the function of automatically updating the robot's movements and operating procedures based on a procedure flow diagram obtained through a generation means.
[0533] To implement this invention, a system is constructed that enables the efficient setup and updating of production lines within a factory. First, inputting voice information requires a voice input device with a microphone for recording the speaker's voice. The voice information is input using this device and converted into a digital format. Next, the voice information is transmitted to a server via the internet, and the server converts the voice into text information using the Google Cloud Speech-to-Text API.
[0534] The server analyzes the converted text information and extracts the procedure flow. The Google Cloud Natural Language API is used to analyze the text data and identify the business steps. A procedure flow diagram is automatically generated from the identified steps and visually presented through the user interface.
[0535] Users can make modifications to the flowchart as needed through voice input. Finally, the server updates the operation procedures of the factory robots using robot control means based on the generated procedure flowchart. This makes it easy for users to achieve efficient operation of the production line.
[0536] As a concrete example, consider a case where a user changes the production process for a product. The user gives a voice command saying, "For the new product, please add two repetitions to the processing step." Based on this prompt, the system automatically generates a new procedure flow and adjusts the factory robots accordingly.
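The voice command in this example can be illustrated with a minimal sketch that applies the already-transcribed command to a procedure flow. The sketch assumes the flow is held as a list of steps with a `repeat` count and that the command follows the fixed wording "add N repetitions to the X step"; the actual system is described as using natural language analysis, not a fixed pattern. `NUMBER_WORDS`, `COMMAND_PATTERN`, and `apply_repetition_command` are hypothetical names.

```python
import re

NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
# Assumed command wording; real utterances would vary far more.
COMMAND_PATTERN = re.compile(r"add (\w+) repetitions? to the (\w+) step")


def apply_repetition_command(flow, command):
    """Return a copy of the flow with the named step's repeat count increased."""
    match = COMMAND_PATTERN.search(command.lower())
    if match is None:
        return [dict(step) for step in flow]  # unrecognized command: no change
    count_word, step_name = match.groups()
    extra = NUMBER_WORDS.get(count_word)
    if extra is None:
        extra = int(count_word) if count_word.isdigit() else 0
    updated = []
    for step in flow:
        step = dict(step)  # copy so the original flow is left untouched
        if step["name"] == step_name:
            step["repeat"] += extra
        updated.append(step)
    return updated


flow = [
    {"name": "cutting", "repeat": 1},
    {"name": "processing", "repeat": 1},
    {"name": "inspection", "repeat": 1},
]
updated = apply_repetition_command(
    flow, "For the new product, please add two repetitions to the processing step."
)
print(updated)
```

Returning a new list rather than editing in place keeps the previous flow available, which is convenient when the user reviews the change before it is applied.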
[0537] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0538] Step 1:
[0539] The user speaks instructions using a voice input device. The input here is the user's voice. The microphone in the voice input device captures this voice information and converts it into a digital format. The output is digital voice data.
[0540] Step 2:
[0541] The terminal transmits digital audio data to the server via the internet. At this point, the input is digital audio data, and the output is the audio data received by the server. The server processes the data to prevent any loss.
[0542] Step 3:
[0543] The server uses the Google Cloud Speech-to-Text API to convert speech information into text. In this step, the input is digital speech data received by the server, and the output is text information (text data). The server performs acoustic modeling and phoneme recognition to accurately transcribe speech into text.
[0544] Step 4:
[0545] The server uses the Google Cloud Natural Language API to analyze textual information and extract procedure flows. The input is converted textual information, and the output is the analyzed procedure flow. Specifically, the server performs grammatical and semantic analysis to extract business procedures and conditions.
[0546] Step 5:
[0547] The server automatically generates a procedure flow diagram based on the procedure flow. The input is the analyzed procedure flow, and the output is a visually represented procedure flow diagram. The server uses a diagram generation algorithm to construct an intuitive flow diagram.
[0548] Step 6:
[0549] The user reviews the flowchart and suggests modifications using additional voice input. The flowchart is displayed on the device, and the user reviews and makes decisions based on it. The input is user feedback, and the output is the updated procedure flowchart. The device then sends the new voice data back to the server.
[0550] Step 7:
[0551] The server adjusts the factory robot appropriately using robot control means based on the final procedure flow diagram. The input is the completed procedure flow diagram, and the output is the adjusted robot's operation. The server updates the robot's operating parameters and modifies the control program.
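How a procedure flow might be turned into robot operations in Step 7 is not specified in the source. The following minimal sketch assumes that each step carries an optional `repeat` count and expands the flow into an ordered command list. The command format and the name `build_robot_program` are hypothetical and are not tied to any real robot controller.

```python
def build_robot_program(flow):
    """Expand each step into one command per repetition, in flow order."""
    program = []
    for step in flow:
        # A step without an explicit repeat count is executed once.
        for iteration in range(1, step.get("repeat", 1) + 1):
            program.append({"operation": step["name"], "iteration": iteration})
    return program


flow = [
    {"name": "cutting", "repeat": 1},
    {"name": "processing", "repeat": 3},
    {"name": "inspection"},
]
program = build_robot_program(flow)
for command in program:
    print(command)
```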
[0552] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.
[0553] This invention combines a system that automatically generates business workflows from voice input with an emotion engine that analyzes user emotions. This system primarily provides concrete improvement measures to enhance business efficiency through user interaction. The embodiments for carrying out this invention are described in detail below.
[0554] The user verbally explains the business process into the terminal. The terminal captures the user's voice and records it as digital audio data. This audio data is sent to a server, which uses a speech recognition engine to convert the data into text.
[0555] The converted text is analyzed using natural language processing techniques through an analysis tool to identify elements of the business flow. The server also uses an emotion engine to analyze the user's emotional state from the audio data. This emotional information is then used to generate business flow diagrams and suggest solutions.
[0556] Through the generation mechanism, the server automatically generates a business process flow diagram. The generated diagram is visually presented to the user via their terminal. Based on the analysis results of the emotion engine, the user interface and presentation method may be adjusted. For example, if the user's emotion is determined to be negative, the system is configured to present the information in a more flexible manner.
[0557] The user can review the presented business process flow chart and provide additional information or modifications verbally. The terminal resends this information to the server, which updates the flow chart. The server also generates suggested issues and solutions based on the business process flow chart and presents them to the user, taking sentiment into consideration.
[0558] As an example of how emotion analysis can be useful in guiding behavior, if a user verbally expresses that they "feel stressed during project progress," the emotion engine will understand their stress level and suggest specific improvement measures such as "distributing tasks" or "reducing the frequency of progress checks." In this way, the present invention effectively utilizes the emotion engine to provide support tailored to the user's state, thereby promoting business improvement.
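The stress example above can be sketched as follows. The emotion engine in this disclosure works on audio data with the emotion identification model 59; the sketch replaces it with a keyword check on the transcribed utterance purely for illustration. The keyword list and the names `estimate_emotion` and `suggest_improvements` are assumptions; the two suggestions are the ones named in the text.

```python
# Stand-in vocabulary for illustration; the real engine analyzes audio data.
STRESS_KEYWORDS = ("stress", "overwhelmed", "anxious")

SUGGESTIONS = {
    "stress": [
        "distributing tasks",
        "reducing the frequency of progress checks",
    ],
    "neutral": [],
}


def estimate_emotion(utterance):
    """Return 'stress' if any stress keyword appears, otherwise 'neutral'."""
    lowered = utterance.lower()
    if any(keyword in lowered for keyword in STRESS_KEYWORDS):
        return "stress"
    return "neutral"


def suggest_improvements(utterance):
    """Look up improvement measures for the estimated emotion."""
    return SUGGESTIONS[estimate_emotion(utterance)]


print(suggest_improvements("I feel stressed during project progress"))
```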
[0559] The following describes the processing flow.
[0560] Step 1:
[0561] The user verbally explains the business process to the terminal. The terminal uses its microphone to capture the user's voice in real time and record it as digital audio data.
[0562] Step 2:
[0563] The terminal transmits the acquired audio data to the server via the network. The server feeds the received audio data into a speech recognition engine and converts the data into text. This process also includes preprocessing to remove audio noise and improve recognition accuracy.
[0564] Step 3:
[0565] The server analyzes text data using natural language processing techniques to identify the elements that make up the business flow (steps, conditions, branches, etc.). At the same time, the server uses an emotion engine to analyze the user's emotional state from voice data and saves the results.
[0566] Step 4:
[0567] The server automatically generates a business process flow diagram using a generative AI model based on the analyzed text and sentiment information. The business process flow diagram is presented in a flowchart format, visually representing the process flow with nodes and edges.
[0568] Step 5:
[0569] The server sends the generated business process diagram to the terminal, which then presents it to the user. The user can review the diagram and verbally provide any missing information or corrections. The format and method of information presentation are dynamically adjusted according to the user's emotional state.
[0570] Step 6:
[0571] The terminal receives additional audio explanations from the user and sends them back to the server. The server understands this new information and updates the business process diagram.
[0572] Step 7:
[0573] The server analyzes the final workflow in detail and automatically generates expected challenges and solutions based on them. It incorporates the results of the emotion engine into the suggestions, presenting to the user what problems might exist and how improvements can be made.
[0574] Step 8:
[0575] The server compiles business process diagrams, challenges, and solutions, and automatically generates slides and videos as presentation materials that take into account the results of sentiment analysis. The terminal distributes these materials to the user, supporting them in taking concrete actions to improve their work processes.
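Step 3 of this flow runs text analysis and emotion analysis at the same time. One possible way to arrange this, shown here as a minimal sketch, is to submit both analyses to a thread pool and collect the two results. Both analysis functions are placeholders standing in for the natural language processing and the emotion engine; `analyze_text`, `analyze_emotion`, and `run_step3` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_text(text):
    """Stand-in for NLP analysis: treat each sentence as one flow element."""
    return [part.strip() for part in text.split(".") if part.strip()]


def analyze_emotion(audio):
    """Stand-in for the emotion engine: returns a placeholder result only."""
    return {"label": "unknown", "num_bytes": len(audio)}


def run_step3(text, audio):
    """Run both analyses concurrently and collect their results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        elements_future = pool.submit(analyze_text, text)
        emotion_future = pool.submit(analyze_emotion, audio)
        return {
            "elements": elements_future.result(),
            "emotion": emotion_future.result(),
        }


result = run_step3("Receive the order. Ship the goods.", b"\x00\x01\x02\x03")
print(result)
```

Threads suit this arrangement when each analysis mostly waits on an external service; for CPU-bound models a process pool or a separate worker may be a better fit.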
[0576] (Example 2)
[0577] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0578] In today's business environment, there is a demand for both streamlined business processes and flexible problem-solving solutions that take into account user emotions. While conventional systems could convert voice data to text and automatically generate business flows, they lacked the nuanced approach to reflect the emotional state of users. As a result, there is a problem in that proposed business improvement measures are not always optimal for the user.
[0579] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0580] In this invention, the server includes means for receiving voice information and acquiring voice data, voice recognition means for converting the voice data into encoded information, analysis means for analyzing the converted encoded information and extracting business procedures, and emotion analysis means for identifying the user's emotional state from the voice data. This makes it possible to appropriately identify elements of business procedures and present flexible problem-solving solutions that take the user's emotions into consideration.
[0581] "Voice information" refers to spoken language data entered by the user, which is later converted into encoded information during processing.
[0582] "Encoded information" refers to data obtained by converting audio information into a digital format, and is used for further analysis and identification of business procedures.
[0583] "Speech recognition means" refers to a technology or device that converts speech information into encoded information, and specifically, has the function of converting speech input into text format.
[0584] "Analysis means" refers to a technology or device that analyzes encoded information to identify elements of business procedures, and utilizes natural language processing technology.
[0585] A "business procedure diagram" is a diagram that visually represents the flow and structure of business procedures, and is used to help understand and improve those procedures.
[0586] "Emotional analysis means" refers to a technology or device that identifies a user's emotional information from voice data, and uses that information to infer the user's psychological state and reflect it in business improvements.
[0587] A "self-learning model" is an algorithm or system that learns from data and improves its accuracy on its own, and is used for creating business process diagrams.
[0588] "Flexible problem-solving" refers to a method or system that takes into account the user's emotional state and business processes, and proposes the most appropriate improvement measures according to the situation.
[0589] This system receives voice information and effectively visualizes and improves the user's work procedures. Its implementation is described below.
[0590] First, the user verbally explains the business process into the terminal. The terminal uses its internal microphone to capture this audio. The audio data is stored digitally and transmitted to the server over the network.
[0591] The server converts speech data into encoded information using a speech recognition engine. A common cloud-based speech recognition service is used for this purpose. The converted encoded information is then analyzed by the server to extract elements of the business procedure. Natural language processing techniques are used for this analysis. For example, common natural language processing libraries and cloud services are utilized.
[0592] Furthermore, the server utilizes sentiment analysis to identify the user's emotions from the voice data. This sentiment information plays a crucial role in the process of generating business workflows. Based on this sentiment information, the server generates a business procedure diagram. A self-learning model is used for this, and the generated diagram is presented to the user via the terminal.
[0593] As a concrete example of its use, if a user says they "feel stressed during project progress," sentiment analysis identifies that emotion and influences the business process diagram. As a result, the server can suggest flexible solutions such as "distributing tasks" or "reducing the frequency of progress checks."
[0594] Examples of prompts include specific instructions such as, "Please provide suggestions to alleviate the anxieties one might feel when starting a new project."
[0595] This system allows users to streamline work procedures through voice input and obtain optimal solutions tailored to their emotional state.
[0596] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0597] Step 1:
[0598] The user speaks about the business process into the terminal. The terminal uses its built-in microphone to acquire voice information. This input is in the form of the user's voice and is raw data before conversion. The terminal converts this data into a digital voice format and saves it.
[0599] Step 2:
[0600] The terminal sends the stored audio data to the server over the network. The server receives this audio data, feeds it into a speech recognition engine, and converts it into text data, which is encoded information. This conversion turns the audio signal into digital text that is easy to process.
[0601] Step 3:
[0602] The server uses natural language processing technology to analyze the converted text data. The analysis examines keywords and sentence structure to extract elements of business procedures from the input text. The output is the information identified as relating to the business procedures.
[0603] Step 4:
[0604] The server uses an emotion analysis engine to identify the user's emotional state from the audio data. The input is the unprocessed audio data, and emotion analysis is performed based on this to evaluate the emotional tone and psychological characteristics. The output is data indicating the user's emotional state.
[0605] Step 5:
[0606] The server automatically generates a business process diagram using a generation mechanism. Here, information on procedural elements and the user's emotional state are used as input, and a self-learning model constructs the business process diagram. The output is a visually represented business process diagram in a format that the user can review.
[0607] Step 6:
[0608] The server sends the generated business procedure diagram to the terminal. The terminal visually presents this diagram to the user. The diagram is displayed through the user interface, allowing the user to visualize and understand the flow and structure of the business procedure.
[0609] Step 7:
[0610] The user can verbally provide additional information or corrections based on the presented business process diagram. The terminal then sends this new verbal information back to the server, which updates the business process diagram. The input is the user's additional verbal information, and the output is the updated business process diagram.
[0611] Step 8:
[0612] The server identifies issues based on the latest business process diagrams and emotional states, and generates improvement measures as needed. In this step, a generative AI model is used to output improvement suggestions that are useful to the user. An example of a specific prompt might be, "Please provide new ideas to improve the current business flow."
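The prompt handed to the generative AI model in Step 8 could be assembled along the following lines. This is only a sketch: the function name, the prompt layout, and the sample steps are hypothetical, and only the closing instruction sentence is taken from the example prompt above. The call to the model itself is omitted.

```python
def build_improvement_prompt(steps, emotional_state):
    """Compose a text prompt from procedure steps and an emotion label."""
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return (
        "Current business flow:\n"
        f"{numbered}\n"
        f"Reported emotional state of the user: {emotional_state}\n"
        "Please provide new ideas to improve the current business flow."
    )


# Hypothetical inputs standing in for the outputs of Steps 3 to 5.
prompt = build_improvement_prompt(
    ["Plan project", "Assign tasks", "Check progress daily"],
    "stressed during project progress",
)
```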
[0613] (Application Example 2)
[0614] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0615] This invention provides a system that dynamically optimizes work processes while considering user emotions when improving work efficiency through voice input. Conventional technologies often manage tasks uniformly without considering the user's emotional state, limiting the improvement of the user experience. This invention solves these problems and provides a means to achieve improved work efficiency and user experience.
[0616] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0617] In this invention, the server includes means for receiving voice input and acquiring voice information, voice recognition means for converting the voice information into text information, analysis means for analyzing the converted text information and extracting business processes, and emotion analysis means for analyzing emotional states. This makes it possible to automatically generate business flows and suggest improvement measures according to the user's emotions.
[0618] "Voice input" refers to the process where a machine receives audio as a digital signal.
[0619] "Voice information" refers to digital voice data acquired through voice input.
[0620] "Textual information" refers to data in which audio information is represented as text.
[0621] "Speech recognition means" refers to technologies and devices for converting speech information into text information.
[0622] "Analysis means" refers to technologies and devices that analyze textual information and extract business processes.
[0623] A "business process" refers to a series of procedures or steps necessary to perform a specific task.
[0624] "Generation means" refers to technologies or devices that automatically generate business process diagrams based on analysis results.
[0625] A "business process diagram" is a diagram that visually represents a business process.
[0626] A "problem" is an event or obstacle that needs to be resolved in order to carry out a task.
[0627] A "solution" is a method or means proposed to solve a problem.
[0628] "Information resources" refer to documents and data that are automatically generated based on business process diagrams and proposed solutions.
[0629] "Emotional state" refers to information that represents the user's mental and psychological condition.
[0630] "Emotional analysis means" refers to technologies and devices that analyze voice and text information to identify the user's emotional state.
[0631] "Support" refers to assistance that streamlines or improves the user's work.
[0632] The system implementing this invention automatically generates work processes based on voice input and provides work support that takes into account the user's emotional state. This system is realized by a series of programs implemented in a consumer robot. When a user verbally explains a work process to the robot, the voice input is acquired as digital voice information through the microphone built into the robot.
[0633] Next, the audio information is converted into text information using the Google Speech-to-Text API. The resulting text information is then analyzed using natural language processing libraries such as Transformers to extract elements of the business process. During this analysis, sentiment analysis is used to identify the user's emotional state from the audio information. Based on the analysis results and sentiment information, a business process diagram is automatically generated using machine learning models and tools such as PyFlow.
[0634] The resulting work process diagram is presented to the user via the robot's display and voice. Furthermore, based on the results of the emotion analysis, suggestions for overall task rearrangement and improvements to reminders are provided. This aims to provide alternative measures to alleviate the emotional state of users who express concerns about being overwhelmed with work.
[0635] For example, if a user tells the robot, "I'm stressed because I have a lot of meetings this week," the robot will analyze this and suggest rescheduling the meetings or adjusting reminders. Another possible prompt is: "When the user says, 'Today is busy and difficult,' analyze their emotions and generate an appropriate workflow."
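The mapping from an utterance to support suggestions can be illustrated with a deliberately simple rule-based sketch. The keyword set and the suggestion table are hypothetical stand-ins; the disclosure relies on emotion analysis of voice and text, which this keyword match does not reproduce.

```python
import re

# Hypothetical keyword and suggestion tables; a real system would use a trained model.
STRESS_KEYWORDS = {"stressed", "overwhelmed", "busy", "difficult"}
SUGGESTIONS = {
    "stressed": ["Reschedule meetings", "Adjust reminders"],
    "neutral": [],
}


def detect_emotion(utterance):
    """Return 'stressed' if any stress keyword appears as a word, else 'neutral'."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    return "stressed" if words & STRESS_KEYWORDS else "neutral"


def suggest_support(utterance):
    """Map the detected emotion label to a list of support suggestions."""
    return SUGGESTIONS[detect_emotion(utterance)]
```

Matching on whole words rather than substrings avoids false hits such as "busy" inside "business".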
[0636] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0637] Step 1:
[0638] The user speaks to the robot and explains the business process. The terminal acquires the user's voice as digital audio information. The input is the user's voice, and the output is digital audio data. The audio signal is captured through the microphone, and the data is prepared to be sent to the next step.
[0639] Step 2:
[0640] The server uses the Google Speech-to-Text API to convert digital audio information into text. The input is digital audio data, and the output is text information in string format. The server sends the audio data to the cloud API and retrieves the returned text information.
[0641] Step 3:
[0642] The server utilizes the Transformers library to analyze the obtained text information and extract elements of the business process. The input is text information, and the output is a list of extracted business process elements. Natural language processing is performed to process the data in order to identify the business elements.
[0643] Step 4:
[0644] The server uses emotion analysis tools to identify the user's emotional state from audio information. Input is textual information and associated audio features, while output is data related to the emotional state. The server analyzes the intonation and speed of the speech to quantify the emotion.
[0645] Step 5:
[0646] The server uses a machine learning model to generate a business process diagram based on extracted business process elements and sentiment information. The input is a list of business elements and sentiment state data, and the output is a business process diagram. The data is then fed into a tool such as PyFlow to create a visual business process diagram.
[0647] Step 6:
[0648] The terminal presents the generated business process diagram to the user via display and audio, and proposes improvement plans based on the results of sentiment analysis. Input consists of data for the business process diagram and improvement plans, while output is visual or audio feedback to the user. The diagram is displayed on the screen, and the suggestions are communicated via the voice speaker.
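Step 4 above mentions quantifying emotion from the intonation and speed of speech. One way this might be sketched is shown below, assuming a word count, an utterance duration, and a pitch track in hertz are already available from upstream processing. The baseline constants and the scoring formula are assumptions chosen for illustration, not values from the disclosure.

```python
from statistics import pstdev

# Hypothetical reference values; real thresholds would need calibration per speaker.
BASELINE_WORDS_PER_SEC = 2.5
BASELINE_PITCH_STDEV_HZ = 20.0


def arousal_score(word_count, duration_sec, pitch_track_hz):
    """Return a 0..1 score that rises with speech rate and pitch variability."""
    if duration_sec <= 0:
        raise ValueError("duration_sec must be positive")
    rate_ratio = (word_count / duration_sec) / BASELINE_WORDS_PER_SEC
    pitch_ratio = pstdev(pitch_track_hz) / BASELINE_PITCH_STDEV_HZ
    # Average the two ratios, scale so the baseline maps to 0.5, and clamp to [0, 1].
    raw = (rate_ratio + pitch_ratio) / 2
    return max(0.0, min(1.0, raw / 2))


calm = arousal_score(10, 5.0, [120, 122, 118, 121])
agitated = arousal_score(30, 5.0, [110, 180, 130, 220])
```

A score of this kind could be passed to Step 5 alongside the extracted business process elements.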
[0649] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0650] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0651] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.
[0652] [Fourth Embodiment]
[0653] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.
[0654] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.
[0655] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0656] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.
[0657] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0658] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0659] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0660] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.
[0661] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0662] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0663] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0664] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0665] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0666] This invention provides a system for automatically generating business workflows using voice input, and describes its embodiments. The system primarily operates through the interaction of a terminal that acquires voice input, a server that processes the voice data, and the user.
[0667] First, the user verbally explains the work procedures and processes using a terminal. The terminal captures the user's voice through its microphone and records it as digital audio data. This digital audio data is then transmitted to a server via the internet.
[0668] The server uses a speech recognition engine to convert this audio data into text. The converted text is then analyzed using natural language processing techniques to identify the steps and conditions that make up the business process. This analysis extracts the elements of the business process, forming the basis for the business process diagram.
[0669] The server presents the generated business process flow diagram to the user via the terminal. The user reviews the diagram and, as needed, points out omissions or errors by voice. The terminal sends this additional information back to the server, which then corrects and completes the business process flow diagram.
[0670] Furthermore, the server analyzes potential challenges and solutions from the business workflow and uses a generative AI model to provide concrete suggestions. This information is presented to the user via the terminal, and feedback is received.
[0671] Ultimately, the server automatically generates business process diagrams, challenges, and solutions as presentation slides and videos. This allows users to easily visually confirm the results of business process improvements and share them with stakeholders.
[0672] As part of this system, an automated analysis and generation process using artificial intelligence technology is included, enabling local governments and private companies to quickly digitize and streamline their operations. For example, if a user explains the new customer contract process by voice, the system can generate a flowchart of the workflow, including "customer information input," "contract creation and confirmation," and "approval process."
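To make the flowchart in this example concrete, the sketch below renders the three named steps as a Graphviz DOT digraph of nodes and edges. DOT is a swapped-in output format chosen for illustration; the disclosure does not name a diagram format, and the function and node identifiers are hypothetical.

```python
def steps_to_dot(steps):
    """Render an ordered list of step labels as a Graphviz DOT digraph."""
    lines = ["digraph workflow {", "  node [shape=box];"]
    for index, label in enumerate(steps):
        escaped = label.replace('"', '\\"')  # keep quotes inside labels valid
        lines.append(f'  s{index} [label="{escaped}"];')
    # Connect each step to its successor to form a linear flow.
    for index in range(len(steps) - 1):
        lines.append(f"  s{index} -> s{index + 1};")
    lines.append("}")
    return "\n".join(lines)


dot_text = steps_to_dot([
    "customer information input",
    "contract creation and confirmation",
    "approval process",
])
```

The text in dot_text can be rendered to an image by any DOT-compatible tool.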
[0673] The following describes the processing flow.
[0674] Step 1:
[0675] The user uses the terminal to verbally explain the business process. The terminal captures the audio in real time via its microphone and saves it as digital audio data.
[0676] Step 2:
[0677] The terminal transmits the acquired audio data to the server via the network. The server activates a speech recognition engine and converts the audio data into text. Noise reduction and formatting of the audio data are also performed at this stage.
[0678] Step 3:
[0679] The server analyzes the converted text and uses natural language processing techniques to identify elements of the business flow. Specifically, it identifies business steps and branching conditions through the extraction of noun phrases and the analysis of conditional statements.
[0680] Step 4:
[0681] Based on the analysis results, the server creates a business process flow diagram using a generative AI model. It generates a visual representation of the business process in a flowchart format, consisting of nodes and edges.
[0682] Step 5:
[0683] The server sends the generated business process flow diagram to the user's terminal, which presents it to the user. The user reviews the diagram and verbally points out any necessary corrections or missing information.
[0684] Step 6:
[0685] The terminal collects additional audio from the user again and sends it to the server. The server updates the business flow diagram based on the additional information and generates an accurate process diagram.
[0686] Step 7:
[0687] The server automatically generates anticipated issues and solutions based on the final workflow. This information is presented to the user via the terminal, and further feedback is received.
[0688] Step 8:
[0689] The server automatically generates business process diagrams, challenges, and solutions as presentation slides and videos. The terminal delivers these materials to the user, who can then make use of the results of the business process improvement.
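The analysis in Step 3 above, which separates business steps from branching conditions, might be approximated as follows. This sketch substitutes a regular expression for the natural language processing described in the text, handles only sentences of the form "If <condition>, <action>", and uses hypothetical sample sentences.

```python
import re

# Matches sentences such as "If X, do Y" (a simplification of conditional analysis).
CONDITIONAL = re.compile(
    r"^if\s+(?P<condition>.+?),\s*(?P<action>.+)$", re.IGNORECASE
)


def extract_flow_elements(text):
    """Split text into sentences and classify each as a step or a branch."""
    elements = []
    for sentence in re.split(r"[.!?]\s*", text):
        sentence = sentence.strip()
        if not sentence:
            continue
        match = CONDITIONAL.match(sentence)
        if match:
            elements.append({
                "type": "branch",
                "condition": match.group("condition"),
                "action": match.group("action"),
            })
        else:
            elements.append({"type": "step", "label": sentence})
    return elements


elements = extract_flow_elements(
    "Enter the customer information. Create the contract. "
    "If the amount exceeds the limit, request manager approval."
)
```

Each "step" element would become a node and each "branch" element a decision node with labeled edges in the flowchart of Step 4.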
[0690] (Example 1)
[0691] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0692] In today's business environment, there is a demand for increased efficiency and automation of business processes. However, traditional methods make it difficult to quickly and accurately grasp business procedures and easily modify automatically generated workflows. Furthermore, it is challenging to instantly provide proposals that effectively solve business problems. Additionally, there is a need to quickly create and share proposed solutions as visual materials. To address these issues, an innovative system utilizing voice input is necessary.
[0693] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0694] In this invention, the server includes means for receiving voice input and acquiring voice data, means for converting the voice data into text, means for analyzing the converted text and extracting business procedures, means for automatically generating business procedure diagrams and modifying the flowcharts based on additional information received from the user via voice, and means for presenting business problems and solutions and creating visual materials based thereon. This enables efficient automation of business processes based on voice input, immediate presentation of problem solutions, corrective actions, and rapid generation and sharing of visual materials.
[0695] "Voice input" is the process of using voice to transmit user instructions and information to a computer system.
[0696] "Audio data" refers to data that is recorded and stored in digital format from signals obtained from audio input.
[0697] "Speech recognition means" refers to a technology or device that analyzes speech data and extracts its content as text.
[0698] "Analysis method" refers to the process of analyzing received text data and extracting business procedures and conditions.
[0699] A "business procedure" is a set of steps and conditions necessary to complete a specific task.
[0700] A "business procedure diagram" is a visual representation of business procedures, illustrating the flow and relationships between them.
[0701] "Construction method" refers to the process of creating a business procedure diagram based on the analyzed information.
[0702] "Proposal method" refers to a method or technique for presenting problems and solutions to users based on a business procedure diagram.
[0703] "Generative means" refers to the technology or process of creating materials for reports or presentations based on the proposed content.
[0704] "Visual materials" are information presented in visual formats such as graphs, slides, and videos.
[0705] This invention provides a system for automatically generating business workflows using voice input and efficiently solving business challenges. The following describes specific embodiments of this system.
[0706] Users use a terminal to verbally explain the procedures and processes of the workflow. The terminal is equipped with a microphone that captures the user's voice as digital audio data and transmits it to a server via the internet. As specific hardware and software, a standard smartphone or computer and cloud-based data transmission functions may be used.
[0707] The server converts the received audio data into text using a speech recognition engine. Speech recognition utilizes technologies provided by speech recognition APIs or cloud services. Next, the server analyzes the text data using natural language processing (NLP) techniques to extract the steps and conditions of the business procedure. This analysis employs libraries and software that perform semantic analysis of the text data.
[0708] Based on the analysis results, the server automatically generates a business procedure diagram. The automatically generated diagram is sent to the terminal, where the user verifies its accuracy. If there are any omissions or errors in the diagram, the user can provide additional instructions via voice. The terminal resends this additional voice data to the server, which then corrects and completes the business procedure diagram.
[0709] Furthermore, the server analyzes anticipated problems based on the generated business procedure diagrams and proposes specific solutions using a generative AI model. This proposal is performed on the server using ChatGPT or similar artificial intelligence models. The proposed solutions are sent to the terminal, where the user provides feedback.
[0710] Ultimately, the server automatically generates the finalized business process diagrams and solutions as visual materials, such as slides or videos. These visual materials are then used for sharing among stakeholders and for presentations.
[0711] For example, if a user describes the new customer contract process via voice, this system can generate a business process diagram that includes steps such as "entering customer information," "creating and reviewing the contract," and "approval process." Furthermore, by inputting a prompt such as, "How can this contract process be made more efficient?", the system can obtain suggestions for improvement.
[0712] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0713] Step 1:
[0714] The user verbally explains the steps and processes of the workflow. The user's voice is captured as input and converted into digital audio data via the device's microphone. This audio data is then sent to the server.
[0715] Step 2:
[0716] The server passes the received digital audio data to the speech recognition engine, which converts the audio data into text. This process uses speech recognition technology that analyzes the digital audio and maps its waveform to a corresponding string of characters. Text data is generated as output.
[0717] Step 3:
[0718] The server inputs text data generated by speech recognition into a natural language processing (NLP) engine to analyze the steps and conditions of the business procedure. Here, the grammar and semantics of the text are analyzed to identify the elements that make up the business procedure. The output is structural information of the analyzed business procedure.
[0719] Step 4:
[0720] The server automatically generates a business procedure diagram based on the structural information of the analyzed business procedures. This generation process uses an algorithm that visualizes the business flow in an easy-to-understand flowchart format. The output is a flowchart showing the business procedures.
[0721] Step 5:
[0722] The user reviews the work procedure diagram displayed on the terminal. If there are any omissions or errors in the diagram, the user gives additional instructions by voice, and the terminal sends them to the server as supplementary information.
[0723] Step 6:
[0724] The server performs speech recognition again on the additional audio data sent from the terminal and converts it to text. Then, it performs another NLP analysis to identify the necessary corrections to be reflected in the business procedure diagram. The output is the revised business procedure diagram.
[0725] Step 7:
[0726] The server analyzes potential business problems based on the finalized business procedure diagram and generates specific solutions using a generative AI model. This process involves prompting the AI model and performing calculations to generate relevant solutions. The output is the proposed solution.
[0727] Step 8:
[0728] The server automatically generates visual materials using the final business procedure diagram and proposed solutions. A material generation algorithm is applied to present the information clearly in slide and video formats. The output is visual material.
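The correction loop in Steps 5 and 6 above can be sketched as an operation on an ordered list of steps. The sketch assumes the spoken correction has already been transcribed and follows the pattern "Add <new step> after <existing step>"; that pattern, the function name, and the sample correction are hypothetical.

```python
import re

# Assumed instruction pattern: "Add <new step> after <existing step>".
ADD_AFTER = re.compile(r"^add (?P<new>.+) after (?P<anchor>.+)$", re.IGNORECASE)


def apply_correction(steps, instruction):
    """Return a new step list with the instructed step inserted."""
    match = ADD_AFTER.match(instruction.strip().rstrip("."))
    if not match:
        raise ValueError(f"unsupported instruction: {instruction!r}")
    anchor = match.group("anchor")
    if anchor not in steps:
        raise ValueError(f"unknown step: {anchor!r}")
    position = steps.index(anchor) + 1
    # Build a new list so the previous version of the diagram is preserved.
    return steps[:position] + [match.group("new")] + steps[position:]


original = [
    "entering customer information",
    "creating and reviewing the contract",
    "approval process",
]
revised = apply_correction(
    original, "Add legal check after creating and reviewing the contract."
)
```

Returning a new list rather than mutating the input keeps the earlier diagram available if the user rejects the revision.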
[0729] (Application Example 1)
[0730] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0731] In modern production facilities, frequent changes in product variations and manufacturing processes can reduce productivity. In particular, changes to production lines involving complex procedures increase the burden on managers, increase the likelihood of errors, and significantly impair efficiency. To address these challenges, there is a need for a system that allows for easy setting and updating of production procedures using voice input.
[0732] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0733] In this invention, the server includes means for receiving voice information and acquiring data, voice recognition means for converting the data into text information, and analysis means for analyzing the converted text information and extracting a procedure flow. This enables administrators to quickly and accurately set and update production line procedures via voice input.
[0734] "Voice information" refers to data recorded digitally from the user's speech.
[0735] "Textual information" refers to data converted from audio information into text format by speech recognition technology.
[0736] A "procedure flow chart" is a model that represents the series of steps or processes involved in a business operation.
[0737] "Analysis means" refers to a system component that has the function of processing textual information to identify and extract a procedural flow.
[0738] A "generation means" is a component of a system that has the function of automatically creating a visually understandable flowchart based on the analyzed procedure flow.
[0739] "Operational challenges" refer to factors or problems that hinder efficiency or quality in business or production processes.
[0740] "Countermeasures" refer to specific solutions or methods proposed to address identified operational challenges.
[0741] "Means for automatically generating materials" refers to a system component that has the function of generating presentation materials based on the generated procedure flowchart and countermeasures.
[0742] "Robot control means" refers to a system component that has the function of automatically updating the robot's movements and operating procedures based on a procedure flow diagram obtained through a generation means.
[0743] To implement this invention, a system is constructed that enables the efficient setup and updating of production lines within a factory. First, inputting voice information requires a voice input device with a microphone for recording the speaker's voice. The voice information is input using this device and converted into a digital format. Next, the voice information is transmitted to a server via the internet, and the server converts the voice into text information using the Google Cloud Speech-to-Text API.
[0744] The server analyzes the converted text information and extracts the procedure flow. The Google Cloud Natural Language API is used for analysis, analyzing the text data to identify the business steps. The identified steps are automatically generated as a procedure flow diagram and visually presented through the user interface.
[0745] Users can make modifications to the flowchart as needed through voice input. Finally, the server updates the operation procedures of the factory robots using robot control means based on the generated procedure flowchart. This makes it easy for users to achieve efficient operation of the production line.
[0746] As a concrete example, consider a case where a user changes the production process for a product. The user gives a voice command saying, "For the new product, please add two repetitions to the processing step." Based on this prompt, the system automatically generates a new procedure flow and adjusts the factory robots accordingly.
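The change requested in this example can be pictured as an edit to a structured representation of the procedure flow. The following is a minimal sketch, assuming the flow is held as an ordered list of steps with a repetition count. The `Step` class, the `add_repetitions` function, and the step names are hypothetical and are not taken from this disclosure.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Step:
    """One step of a procedure flow (hypothetical structure)."""
    name: str
    repetitions: int = 1


def add_repetitions(flow: List[Step], step_name: str, extra: int) -> List[Step]:
    """Return a new flow in which the named step runs `extra` more times."""
    if not any(step.name == step_name for step in flow):
        raise ValueError(f"unknown step: {step_name}")
    return [
        replace(step, repetitions=step.repetitions + extra)
        if step.name == step_name else step
        for step in flow
    ]


# Assumed flow for the existing product.
current_flow = [Step("cutting"), Step("processing"), Step("inspection")]
# "For the new product, please add two repetitions to the processing step."
new_flow = add_repetitions(current_flow, "processing", 2)
```

Returning a new list rather than editing the existing one keeps the previous flow available, which may be convenient if the user rejects the change during review.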
[0747] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0748] Step 1:
[0749] The user speaks instructions using a voice input device. The input here is the user's voice. The microphone in the voice input device captures this voice information and converts it into a digital format. The output is digital voice data.
[0750] Step 2:
[0751] The terminal transmits digital audio data to the server via the internet. At this point, the input is digital audio data, and the output is the audio data received by the server. The server processes the data to prevent any loss.
[0752] Step 3:
[0753] The server uses the Google Cloud Speech-to-Text API to convert speech information into text. In this step, the input is digital speech data received by the server, and the output is text information (text data). The server performs acoustic modeling and phoneme recognition to accurately transcribe speech into text.
[0754] Step 4:
[0755] The server uses the Google Cloud Natural Language API to analyze textual information and extract procedure flows. The input is converted textual information, and the output is the analyzed procedure flow. Specifically, the server performs grammatical and semantic analysis to extract business procedures and conditions.
[0756] Step 5:
[0757] The server automatically generates a procedure flow diagram based on the procedure flow. The input is the analyzed procedure flow, and the output is a visually represented procedure flow diagram. The server uses a diagram generation algorithm to construct an intuitive flow diagram.
[0758] Step 6:
[0759] The user reviews the flowchart and suggests modifications using additional voice input. The flowchart is displayed on the device, and the user reviews and makes decisions based on it. The input is user feedback, and the output is the updated procedure flowchart. The device then sends the new voice data back to the server.
[0760] Step 7:
[0761] The server adjusts the factory robot appropriately using robot control means based on the final procedure flow diagram. The input is the completed procedure flow diagram, and the output is the adjusted robot's operation. The server updates the robot's operating parameters and modifies the control program.
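The chain of inputs and outputs in Steps 3, 4, 5 and 7 can be sketched as a pipeline whose stages are supplied from outside. This is only an illustration under stated assumptions: the `Pipeline` class and its field names are hypothetical, the stand-in functions replace the cloud services and the robot control means so that the sketch runs offline, and the review loop of Step 6 is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Pipeline:
    """Chains Steps 3, 4, 5 and 7; every stage is injected (names hypothetical)."""
    transcribe: Callable[[bytes], str]          # Step 3: audio data -> text
    extract_flow: Callable[[str], List[str]]    # Step 4: text -> procedure flow
    render: Callable[[List[str]], str]          # Step 5: flow -> diagram
    update_robot: Callable[[List[str]], None]   # Step 7: flow -> robot program

    def run(self, audio: bytes) -> str:
        text = self.transcribe(audio)
        flow = self.extract_flow(text)
        diagram = self.render(flow)
        self.update_robot(flow)
        return diagram


# Stand-ins so the sketch runs without any cloud service or robot.
robot_program: List[str] = []
pipeline = Pipeline(
    transcribe=lambda audio: audio.decode("utf-8"),
    extract_flow=lambda text: [s.strip() for s in text.split(",") if s.strip()],
    render=lambda flow: " -> ".join(flow),
    update_robot=lambda flow: robot_program.extend(flow),
)
diagram = pipeline.run(b"cut, process, inspect")
```

Injecting the stages makes it possible to swap a stand-in for a real speech recognition or language analysis service without changing the order of the steps.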
[0762] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.
[0763] This invention combines a system that automatically generates business workflows from voice input with an emotion engine that analyzes user emotions. This system primarily provides concrete improvement measures to enhance business efficiency through user interaction. The embodiments for carrying out this invention are described in detail below.
[0764] The user verbally explains the business process into the terminal. The terminal captures the user's voice and records it as digital audio data. This audio data is sent to a server, which uses a speech recognition engine to convert the data into text.
[0765] The converted text is analyzed using natural language processing techniques through an analysis tool to identify elements of the business flow. The server also uses an emotion engine to analyze the user's emotional state from the audio data. This emotional information is then used to generate business flow diagrams and suggest solutions.
[0766] Through the generation mechanism, the server automatically generates a business process flow diagram. The generated diagram is visually presented to the user via their terminal. Based on the analysis results of the emotion engine, the user interface and presentation method may be adjusted. For example, if the user's emotion is determined to be negative, the system is configured to present the information in a more flexible manner.
[0767] The user can review the presented business process flow chart and provide additional information or modifications verbally. The terminal resends this information to the server, which updates the flow chart. The server also generates suggested issues and solutions based on the business process flow chart and presents them to the user, taking sentiment into consideration.
[0768] As an example of how emotion analysis can be useful in guiding behavior, if a user verbally expresses that they "feel stressed during project progress," the emotion engine will understand their stress level and suggest specific improvement measures such as "distributing tasks" or "reducing the frequency of progress checks." In this way, the present invention effectively utilizes the emotion engine to provide support tailored to the user's state, thereby promoting business improvement.
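One simple way to picture how an estimated emotion could select improvement measures is a lookup table keyed by emotion. The sketch below assumes that the emotion engine returns a label and a level between 0 and 1. The table contents beyond the two measures quoted above, the default measure, the threshold, and all names are hypothetical.

```python
from typing import Dict, List

# Hypothetical rule table: emotion label -> improvement measures.
SUGGESTIONS: Dict[str, List[str]] = {
    "stress": ["distribute tasks", "reduce the frequency of progress checks"],
}
# Hypothetical fallback when no emotion-specific rule applies.
DEFAULT_SUGGESTIONS: List[str] = ["review the flow for redundant steps"]


def suggest(emotion: str, level: float, threshold: float = 0.5) -> List[str]:
    """Return emotion-specific measures when the estimated level is high enough."""
    if level >= threshold and emotion in SUGGESTIONS:
        return SUGGESTIONS[emotion]
    return DEFAULT_SUGGESTIONS


# A user who says they feel stressed during project progress.
measures = suggest("stress", 0.8)
```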
[0769] The following describes the processing flow.
[0770] Step 1:
[0771] The user verbally explains the business process to the terminal. The terminal uses its microphone to capture the user's voice in real time and record it as digital audio data.
[0772] Step 2:
[0773] The terminal transmits the acquired audio data to the server via the network. The server feeds the received audio data into a speech recognition engine and converts the data into text. This process also includes preprocessing to remove audio noise and improve recognition accuracy.
[0774] Step 3:
[0775] The server analyzes text data using natural language processing techniques to identify the elements that make up the business flow (steps, conditions, branches, etc.). At the same time, the server uses an emotion engine to analyze the user's emotional state from voice data and saves the results.
[0776] Step 4:
[0777] The server automatically generates a business process flow diagram using a generative AI model based on the analyzed text and sentiment information. The business process flow diagram is presented in a flowchart format, visually representing the process flow with nodes and edges.
[0778] Step 5:
[0779] The server sends the generated business process diagram to the terminal, which then presents it to the user. The user can review the diagram and verbally provide any missing information or corrections. The format and method of information presentation are dynamically adjusted according to the user's emotional state.
[0780] Step 6:
[0781] The terminal receives additional audio explanations from the user and sends them back to the server. The server interprets this new information and updates the business process diagram.
[0782] Step 7:
[0783] The server analyzes the final workflow in detail and automatically generates expected challenges and solutions based on them. It incorporates the results of the emotion engine into the suggestions, presenting to the user what problems might exist and how improvements can be made.
[0784] Step 8:
[0785] The server compiles business process diagrams, challenges, and solutions, and automatically generates slides and videos as presentation materials that take into account the results of sentiment analysis. The terminal distributes these materials to the user, supporting them in taking concrete actions to improve their work processes.
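The flowchart format mentioned in Step 4, with steps, conditions, and branches represented as nodes and edges, can be illustrated by emitting Graphviz DOT text. This is a sketch under stated assumptions rather than the generation method of this disclosure: the `to_dot` function, the node kinds, and the sample process are hypothetical, and DOT is used only as one widely supported text format for diagrams.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (source node id, target node id, edge label)


def to_dot(nodes: Dict[str, Tuple[str, str]], edges: List[Edge]) -> str:
    """Render nodes {id: (label, kind)} and edges as Graphviz DOT text."""
    shapes = {"step": "box", "condition": "diamond"}
    lines = ["digraph flow {"]
    for node_id, (label, kind) in nodes.items():
        lines.append(f'  {node_id} [label="{label}", shape={shapes[kind]}];')
    for source, target, label in edges:
        attrs = f' [label="{label}"]' if label else ""
        lines.append(f"  {source} -> {target}{attrs};")
    lines.append("}")
    return "\n".join(lines)


# Assumed sample process with one condition and two branches.
nodes = {
    "receive": ("Receive application", "step"),
    "check": ("Documents complete?", "condition"),
    "approve": ("Approve", "step"),
    "send_back": ("Return to applicant", "step"),
}
edges = [
    ("receive", "check", ""),
    ("check", "approve", "yes"),
    ("check", "send_back", "no"),
]
dot = to_dot(nodes, edges)
```

The resulting text can be passed to any DOT-compatible renderer to obtain the visual diagram.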
[0786] (Example 2)
[0787] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0788] In today's business environment, there is demand both for streamlined business processes and for flexible problem-solving that takes user emotions into account. While conventional systems could convert voice data to text and automatically generate business flows, they lacked a means of reflecting the user's emotional state. As a result, the proposed business improvement measures are not always optimal for the user.
[0789] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0790] In this invention, the server includes means for receiving voice information and acquiring voice data, voice recognition means for converting the voice data into encoded information, analysis means for analyzing the converted encoded information and extracting business procedures, and emotion analysis means for identifying the user's emotional state from the voice data. This makes it possible to appropriately identify elements of business procedures and present flexible problem-solving solutions that take the user's emotions into consideration.
[0791] "Voice information" refers to spoken language data entered by the user, which is later converted into encoded information during processing.
[0792] "Encoded information" refers to data obtained by converting audio information into a digital format, and is used for further analysis and identification of business procedures.
[0793] "Speech recognition means" refers to a technology or device that converts speech information into encoded information, and specifically, has the function of converting speech input into text format.
[0794] "Analysis means" refers to a technology or device that analyzes encoded information to identify elements of business procedures, and utilizes natural language processing technology.
[0795] A "business procedure diagram" is a diagram that visually represents the flow and structure of business procedures, and is used to help understand and improve those procedures.
[0796] "Emotional analysis means" refers to a technology or device that identifies a user's emotional information from voice data, and uses that information to infer the user's psychological state and reflect it in business improvements.
[0797] A "self-learning model" is an algorithm or system that learns from data and improves its accuracy on its own, and is used for creating business process diagrams.
[0798] "Flexible problem-solving" refers to a method or system that takes into account the user's emotional state and business processes, and proposes the most appropriate improvement measures according to the situation.
[0799] This system receives voice information and effectively visualizes and improves the user's work procedures. Its implementation is described below.
[0800] First, the user verbally explains the business process into the terminal. The terminal uses its internal microphone to capture this audio. The audio data is stored digitally and transmitted to the server over the network.
[0801] The server converts speech data into encoded information using a speech recognition engine. A common cloud-based speech recognition service is used for this purpose. The converted encoded information is then analyzed by the server to extract elements of the business procedure. Natural language processing techniques are used for this analysis. For example, common natural language processing libraries and cloud services are utilized.
[0802] Furthermore, the server utilizes sentiment analysis to identify the user's emotions from the voice data. This sentiment information plays a crucial role in the process of generating business workflows. Based on this sentiment information, the server generates a business procedure diagram. A self-learning model is used for this, and the generated diagram is presented to the user via the terminal.
[0803] As a concrete example of its use, if a user says they "feel stressed during project progress," sentiment analysis identifies that emotion and influences the business process diagram. As a result, the server can suggest flexible solutions such as "distributing tasks" or "reducing the frequency of progress checks."
[0804] Examples of prompts include specific instructions such as, "Please provide suggestions to alleviate the anxieties one might feel when starting a new project."
[0805] This system allows users to streamline work procedures through voice input and obtain optimal solutions tailored to their emotional state.
[0806] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0807] Step 1:
[0808] The user speaks about the business process into the terminal. The terminal uses its built-in microphone to acquire voice information. This input is in the form of the user's voice and is raw data before conversion. The terminal converts this data into a digital voice format and saves it.
[0809] Step 2:
[0810] The terminal sends the stored audio data to the server over the network. The server receives this audio data, feeds it into a speech recognition engine, and converts it into text data, which is the encoded information. This conversion turns the audio signal into digital text that is easy to process.
[0811] Step 3:
[0812] The server uses natural language processing technology to analyze the converted text data. This analysis examines keywords and sentence structure to extract elements of business procedures from the input text. The output is the information identified as relating to the business procedures.
[0813] Step 4:
[0814] The server uses an emotion analysis engine to identify the user's emotional state from the audio data. The input is the unprocessed audio data, and emotion analysis is performed based on this to evaluate the emotional tone and psychological characteristics. The output is data indicating the user's emotional state.
[0815] Step 5:
[0816] The server automatically generates a business process diagram using a generation mechanism. Here, information on procedural elements and the user's emotional state are used as input, and a self-learning model constructs the business process diagram. The output is a visually represented business process diagram in a format that the user can review.
[0817] Step 6:
[0818] The server sends the generated business procedure diagram to the terminal. The terminal visually presents this diagram to the user. The diagram is displayed through the user interface, allowing the user to visualize and understand the flow and structure of the business procedure.
[0819] Step 7:
[0820] The user can verbally provide additional information or corrections based on the presented business process diagram. The terminal then sends this new verbal information back to the server, which updates the business process diagram. The input is the user's additional verbal information, and the output is the updated business process diagram.
[0821] Step 8:
[0822] The server identifies issues based on the latest business process diagrams and emotional states, and generates improvement measures as needed. In this step, a generative AI model is used to output improvement suggestions that are useful to the user. An example of a specific prompt might be, "Please provide new ideas to improve the current business flow."
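The input to the generative AI model in Step 8 can be pictured as a prompt that combines the latest business process, the estimated emotional state, and an instruction. The following is a minimal sketch, assuming the process is available as a list of step names and the emotion as a short label. The `build_prompt` function, the prompt layout, and the sample steps are hypothetical, and no particular model or API is implied.

```python
from typing import List


def build_prompt(steps: List[str], emotion: str, instruction: str) -> str:
    """Assemble flow, emotional state, and instruction into one prompt string."""
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return (
        f"Current business flow:\n{numbered}\n"
        f"Estimated emotional state of the user: {emotion}\n"
        f"{instruction}"
    )


prompt = build_prompt(
    ["Receive request", "Review request", "Report progress"],
    "stressed",
    "Please provide new ideas to improve the current business flow.",
)
```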
[0823] (Application Example 2)
[0824] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0825] This invention provides a system that dynamically optimizes work processes while considering user emotions when improving work efficiency through voice input. Conventional technologies often manage tasks uniformly without considering the user's emotional state, limiting the improvement of the user experience. This invention solves these problems and provides a means to achieve improved work efficiency and user experience.
[0826] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0827] In this invention, the server includes means for receiving voice input and acquiring voice information, voice recognition means for converting the voice information into text information, analysis means for analyzing the converted text information and extracting business processes, and emotion analysis means for analyzing emotional states. This makes it possible to automatically generate business flows and suggest improvement measures according to the user's emotions.
[0828] "Voice input" refers to the process where a machine receives audio as a digital signal.
[0829] "Voice information" refers to digital voice data acquired through voice input.
[0830] "Textual information" refers to data in which audio information is represented as text.
[0831] "Speech recognition means" refers to technologies and devices for converting speech information into text information.
[0832] "Analysis means" refers to technologies and devices that analyze textual information and extract business processes.
[0833] A "business process" refers to a series of procedures or steps necessary to perform a specific task.
[0834] "Generation means" refers to technologies or devices that automatically generate business process diagrams based on analysis results.
[0835] A "business process diagram" is a diagram that visually represents a business process.
[0836] A "problem" is an event or obstacle that needs to be resolved in order to carry out a task.
[0837] A "solution" is a method or means proposed to solve a problem.
[0838] "Information resources" refer to documents and data that are automatically generated based on business process diagrams and proposed solutions.
[0839] "Emotional state" refers to information that represents the user's mental and psychological condition.
[0840] "Emotional analysis means" refers to technologies and devices that analyze voice and text information to identify the user's emotional state.
[0841] "Support" refers to assistance that streamlines or improves the user's work.
[0842] The system implementing this invention automatically generates work processes based on voice input and provides work support that takes into account the user's emotional state. This system is realized by a series of programs implemented in a consumer robot. When a user verbally explains a work process to the robot, the voice input is acquired as digital voice information through the microphone built into the robot.
[0843] Next, the audio information is converted into text information using the Google Speech-to-Text API. The resulting text information is then analyzed using natural language processing libraries such as Transformers to extract elements of the business process. During this analysis, sentiment analysis is used to identify the user's emotional state from the audio information. Based on the analysis results and sentiment information, a business process diagram is automatically generated using machine learning models and tools such as PyFlow.
[0844] The resulting work process diagram is presented to the user via the robot's display and voice. Furthermore, based on the results of the emotion analysis, suggestions for overall task rearrangement and improvements to reminders are provided. This aims to provide alternative measures to alleviate the emotional state of users who express concerns about being overwhelmed with work.
[0845] For example, if a user tells the robot, "I'm stressed because I have a lot of meetings this week," the robot will analyze this and suggest rescheduling the meetings or adjusting reminders. Another possible prompt is: "When the user says, 'Today is busy and difficult,' analyze their emotions and generate an appropriate workflow."
[0846] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0847] Step 1:
[0848] The user speaks to the robot and explains the business process. The terminal acquires the user's voice as digital audio information. The input is the user's voice, and the output is digital audio data. The audio signal is captured through the microphone, and the data is prepared to be sent to the next step.
[0849] Step 2:
[0850] The server uses the Google Speech-to-Text API to convert digital audio information into text. The input is digital audio data, and the output is text information in string format. The server sends the audio data to the cloud API and retrieves the returned text information.
[0851] Step 3:
[0852] The server utilizes the Transformers library to analyze the obtained text information and extract elements of the business process. The input is text information, and the output is a list of extracted business process elements. Natural language processing is applied to the text to identify these elements.
[0853] Step 4:
[0854] The server uses emotion analysis tools to identify the user's emotional state from audio information. Input is textual information and associated audio features, while output is data related to the emotional state. The server analyzes the intonation and speed of the speech to quantify the emotion.
[0855] Step 5:
[0856] The server uses a machine learning model to generate a business process diagram based on extracted business process elements and sentiment information. The input is a list of business elements and sentiment state data, and the output is a business process diagram. The data is then fed into a tool such as PyFlow to create a visual business process diagram.
[0857] Step 6:
[0858] The terminal presents the generated business process diagram to the user via display and audio, and proposes improvement plans based on the results of sentiment analysis. Input consists of data for the business process diagram and improvement plans, while output is visual or audio feedback to the user. The diagram is displayed on the screen, and the suggestions are communicated via the voice speaker.
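Step 4 states that intonation and speaking speed are analyzed to quantify emotion, without specifying a formula. One possible illustration, assuming that the word count, the utterance duration, and a series of pitch estimates are already available, is to compare speaking rate and pitch variation against baselines. The formula, the baseline values, and the function names below are hypothetical and serve only to make the idea concrete.

```python
from statistics import pstdev
from typing import List


def speech_rate(word_count: int, duration_s: float) -> float:
    """Words per second; the duration must be positive."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return word_count / duration_s


def arousal_score(
    word_count: int,
    duration_s: float,
    pitch_hz: List[float],
    baseline_rate: float = 2.5,       # assumed typical words per second
    baseline_pitch_sd: float = 20.0,  # assumed typical pitch spread in Hz
) -> float:
    """Combine speed and intonation into a score clamped to 0..1 (hypothetical)."""
    rate_ratio = speech_rate(word_count, duration_s) / baseline_rate
    pitch_ratio = pstdev(pitch_hz) / baseline_pitch_sd
    raw = 0.5 * rate_ratio + 0.5 * pitch_ratio  # 1.0 means "at baseline"
    return max(0.0, min(1.0, raw / 2.0))


# Fast speech (5 words per second) with a baseline-level pitch spread.
score = arousal_score(word_count=30, duration_s=6.0,
                      pitch_hz=[180.0, 220.0, 180.0, 220.0])
```

A score of this kind captures only how agitated the speech sounds. Deciding whether the emotion is positive or negative would require additional information, such as the content of the text.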
[0859] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing unit 12. In the data processing unit 12, the specific processing unit 290 acquires the audio data.
[0860] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0861] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.
[0862] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping. Specifically, this mapping may be an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.
[0863] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.
[0864] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.
[0865] The inside of the Emotion Map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the Emotion Map 400 an emotion is located, the more visible (expressed in actions) it becomes.
[0866] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
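The layout of the emotion map described above (pleasure above and displeasure below, the "response" region on the left and the "situation" region on the right, inner states near the center and expressed actions toward the outside) can be illustrated with plane coordinates. The sketch below assumes a coordinate system with the origin at the center of the concentric circles. The `describe_position` function, the coordinate convention, and the labels used on the axes themselves are hypothetical.

```python
import math
from typing import Dict


def describe_position(x: float, y: float, outer_radius: float = 1.0) -> Dict[str, object]:
    """Describe a point on the emotion map; the origin is the center of the circles."""
    radius = math.hypot(x, y)
    return {
        # Right half: "situation"; left half: "response"; vertical axis: both.
        "side": "situation" if x > 0 else "response" if x < 0 else "both",
        # Upper side: pleasure; lower side: displeasure.
        "valence": "pleasure" if y > 0 else "displeasure" if y < 0 else "neutral",
        # 0.0 = innermost (inner state), 1.0 = outermost (expressed in action).
        "expression": min(radius / outer_radius, 1.0),
    }


# A point at the 3 o'clock position, halfway out from the center.
info = describe_position(0.5, 0.0)
```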
[0867] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."
[0868] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
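The disclosure does not state how the network is trained so that neighboring emotions receive similar values. One possible approach, shown here purely as an assumption, is to build training targets that decay with distance on the emotion map from the labeled emotion, and to take the emotion with the largest value as the determined emotion. The map coordinates, the Gaussian kernel, the `sigma` parameter, and the function names are all hypothetical.

```python
import math
from typing import Dict, Tuple

# Hypothetical map coordinates for a few emotions.
POSITIONS: Dict[str, Tuple[float, float]] = {
    "reassured": (0.6, 0.1),
    "calm": (0.6, 0.0),
    "anxious": (0.6, -0.5),
}


def soft_targets(
    label: str,
    positions: Dict[str, Tuple[float, float]],
    sigma: float = 0.2,
) -> Dict[str, float]:
    """Training targets that decay with map distance from the labeled emotion."""
    cx, cy = positions[label]
    return {
        name: math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
        for name, (x, y) in positions.items()
    }


def dominant_emotion(values: Dict[str, float]) -> str:
    """Pick the emotion with the largest emotion value."""
    return max(values, key=values.get)


targets = soft_targets("calm", POSITIONS)
```

With these assumed coordinates, "reassured" lies close to "calm" and receives a target near 1, while "anxious" lies further away and receives a much smaller target.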
[0869] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.
[0870] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.
[0871] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.
[0872] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.
[0873] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.
[0874] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.
[0875] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.
[0876] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.
[0877] More specifically, an electrical circuit that combines circuit elements such as semiconductor devices can be used as the hardware structure of these various processors. The specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps may be deleted, new steps added, or the processing order rearranged, as long as this does not deviate from the main purpose.
[0878] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of those aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that unnecessary parts may be deleted, new elements added, or elements replaced in the descriptions and illustrations presented above, as long as this does not deviate from the essence of the technical aspects of this disclosure. Furthermore, to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that require no special explanation to enable implementation of those aspects have been omitted from the descriptions and illustrations presented above.
[0879] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.
[0880] The following is further disclosed regarding the embodiments described above.
[0881] (Claim 1)
[0882] A means of receiving voice input and acquiring voice data,
[0883] A speech recognition means for converting the aforementioned voice data into text,
[0884] An analysis means for analyzing the converted text and extracting the business flow,
[0885] A generation means for automatically generating a business process flow diagram based on the aforementioned analysis results,
[0886] A means of presenting business challenges and solutions based on the aforementioned business process flow diagram,
[0887] A means for automatically generating documents based on the aforementioned business process flow diagram and solution,
[0888] A system comprising the above means.
[0889] (Claim 2)
[0890] The system according to claim 1, wherein the analysis means identifies elements of the business flow using natural language processing technology.
[0891] (Claim 3)
[0892] The system according to claim 1, wherein the generation means creates a business flow diagram using an artificial intelligence model.
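The analysis means and generation means recited above can be illustrated with a minimal sketch. It assumes the speech recognition step has already produced a transcript, that each sentence of the transcript describes one step, and that the diagram is emitted as Mermaid flowchart text. The names `Step`, `extract_steps`, and `to_mermaid` are hypothetical and do not appear in the disclosure; a simple rule stands in for the natural language processing technology and the artificial intelligence model.

```python
import re
from dataclasses import dataclass


@dataclass
class Step:
    """One business-flow element extracted from the transcript (hypothetical type)."""
    index: int
    label: str


def extract_steps(transcript: str) -> list[Step]:
    """Split a transcript into ordered steps.

    Assumption: each sentence describes one step. An actual analysis means
    would use a natural language processing model instead of this rule.
    """
    sentences = [s.strip() for s in re.split(r"[.!?]+", transcript) if s.strip()]
    # Drop leading sequence words such as "First," or "Then" to keep labels short.
    cleaned = [
        re.sub(r"^(first|then|next|finally)[,\s]+", "", s, flags=re.IGNORECASE)
        for s in sentences
    ]
    return [Step(i + 1, label) for i, label in enumerate(cleaned)]


def to_mermaid(steps: list[Step]) -> str:
    """Render the steps as a linear Mermaid flowchart definition."""
    lines = ["flowchart TD"]
    for step in steps:
        lines.append(f'    S{step.index}["{step.label}"]')
    # Connect each step to the one that follows it.
    for a, b in zip(steps, steps[1:]):
        lines.append(f"    S{a.index} --> S{b.index}")
    return "\n".join(lines)


transcript = (
    "First, receive the application. Then check the documents. "
    "Finally, issue the certificate."
)
steps = extract_steps(transcript)
diagram = to_mermaid(steps)
print(diagram)
```

The sketch produces only a linear flow; branches and decision points would require a richer analysis than sentence splitting.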
[0893] "Example 1"
[0894] (Claim 1)
[0895] A means of receiving voice input and acquiring voice data,
[0896] A speech recognition means for converting the aforementioned voice data into text,
[0897] An analysis means for analyzing the converted text and extracting business procedures,
[0898] A construction means for automatically generating a business procedure diagram based on the aforementioned analysis results,
[0899] A proposal means for presenting business problems and solutions based on the aforementioned business procedure diagram,
[0900] A generation means for automatically generating report materials based on the aforementioned business procedure diagram and solution,
[0901] A means for receiving additional information from the user as audio and modifying the aforementioned business procedure diagram,
[0902] A means of creating visual materials based on the generated business procedure diagrams and proposed content,
[0903] A system comprising the above means.
[0904] (Claim 2)
[0905] The system according to claim 1, wherein the analysis means identifies elements of a business procedure using natural language processing technology.
[0906] (Claim 3)
[0907] The system according to claim 1, wherein the proposal means derives business solutions using an artificial intelligence model.
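The means for modifying the business procedure diagram from additional spoken information can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the additional audio is assumed to be already transcribed, the diagram is assumed to be held as an ordered list of step labels, and the instruction is assumed to follow the hypothetical pattern "add <label> after step <N>". The function name `apply_modification` is likewise hypothetical.

```python
import re


def apply_modification(steps: list[str], instruction: str) -> list[str]:
    """Return a new step list with the requested step inserted.

    Assumption: the transcribed instruction follows the hypothetical pattern
    "add <label> after step <N>". Anything else leaves the steps unchanged.
    """
    match = re.fullmatch(
        r"add (.+) after step (\d+)", instruction.strip(), flags=re.IGNORECASE
    )
    if match is None:
        return list(steps)
    label, position = match.group(1), int(match.group(2))
    # Ignore positions that do not refer to an existing step.
    if not 1 <= position <= len(steps):
        return list(steps)
    return steps[:position] + [label] + steps[position:]


original = ["receive application", "check documents", "issue certificate"]
updated = apply_modification(original, "Add manager approval after step 2")
print(updated)
```

Returning a new list rather than editing in place keeps the earlier version of the diagram available, which may be useful when the user wants to compare or undo a change.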
[0908] "Application Example 1"
[0909] (Claim 1)
[0910] A means of receiving audio information and acquiring data,
[0911] A speech recognition means that converts the aforementioned data into text information,
[0912] An analysis means for analyzing the converted text information and extracting the procedure flow,
[0913] A generation means for automatically generating a procedure flowchart based on the aforementioned analysis results,
[0914] A means of presenting operational challenges and countermeasures based on the aforementioned procedure flowchart,
[0915] A means for automatically generating documents based on the aforementioned procedure flowchart and countermeasures,
[0916] A robot control means that updates the operation procedure through the generation means,
[0917] A system comprising the above means.
[0918] (Claim 2)
[0919] The system according to claim 1, wherein the analysis means uses natural language processing technology to identify elements of the procedure flow.
[0920] (Claim 3)
[0921] The system according to claim 1, wherein the generation means creates a procedure flowchart using an artificial intelligence model.
[0922] "Example 2 combined with an emotion engine"
[0923] (Claim 1)
[0924] A means of receiving audio information and acquiring audio data,
[0925] A speech recognition means that converts the aforementioned audio data into encoded information,
[0926] An analysis means for analyzing the converted encoded information and extracting business procedures,
[0927] A generation means for automatically generating a business procedure diagram based on the aforementioned analysis results,
[0928] An emotion analysis means for identifying the user's emotional state from the aforementioned audio data,
[0929] A means for flexibly presenting business challenges and improvement measures based on the aforementioned business procedure diagram and the user's emotional state,
[0930] A means for automatically generating information based on the aforementioned business procedure diagram and improvement measures,
[0931] A system comprising the above means.
[0932] (Claim 2)
[0933] The system according to claim 1, wherein the analysis means uses advanced language analysis technology to identify elements of business procedures.
[0934] (Claim 3)
[0935] The system according to claim 1, wherein the generation means creates a business procedure diagram using a self-learning model.
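The means for presenting challenges and improvement measures according to the user's emotional state can be sketched as below. It assumes the emotion analysis means has already reduced the audio data to one of three hypothetical labels ("stressed", "neutral", "positive"); the labels, the `Presentation` type, and the selection rules are assumptions made for illustration and are not specified in the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Presentation:
    """How the improvement measures are shown to the user (hypothetical type)."""
    tone: str
    measures: list[str]


def present_measures(measures: list[str], emotion: str) -> Presentation:
    """Choose how many improvement measures to show and in what tone.

    Assumption: `emotion` is one of the hypothetical labels "stressed",
    "neutral", or "positive". Unknown labels are treated as "neutral".
    """
    if emotion == "stressed":
        # Show only the first measure to avoid overloading the user.
        return Presentation(tone="reassuring", measures=measures[:1])
    if emotion == "positive":
        # Show the full list with all details.
        return Presentation(tone="detailed", measures=list(measures))
    return Presentation(tone="plain", measures=measures[:3])


all_measures = [
    "digitize the paper application form",
    "merge duplicate document checks",
    "automate certificate issuance",
    "introduce online status tracking",
]
result = present_measures(all_measures, "stressed")
print(result.tone, result.measures)
```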
[0936] "Application Example 2 combined with an emotion engine"
[0937] (Claim 1)
[0938] A means of receiving voice input and acquiring voice information,
[0939] A speech recognition means that converts the aforementioned voice information into text information,
[0940] An analysis means for analyzing the converted text information and extracting the business process,
[0941] A generation means for automatically generating a business process diagram based on the aforementioned analysis results,
[0942] A means of presenting business problems and solutions based on the aforementioned business process diagram,
[0943] A means for automatically generating information resources based on the aforementioned business process diagram and solution,
[0944] An emotion analysis means for analyzing emotional states,
[0945] A means for reflecting emotional information obtained by the aforementioned emotion analysis means into the work process and providing support,
[0946] A system comprising the above means.
[0947] (Claim 2)
[0948] The system according to claim 1, wherein the analysis means uses natural language processing technology to identify elements of the business process.
[0949] (Claim 3)
[0950] The system according to claim 1, wherein the generation means creates a business process diagram using a machine learning model.
[Explanation of Symbols]
[0951]
10, 210, 310, 410 Data Processing Systems
12 Data Processing Devices
14 Smart Devices
214 Smart Glasses
314 Headset-type terminal
414 Robots
Claims
1. A means of receiving audio information and acquiring data,
A speech recognition means that converts the aforementioned data into text information,
An analysis means for analyzing the converted text information and extracting the procedure flow,
A generation means for automatically generating a procedure flowchart based on the aforementioned analysis results,
A means of presenting operational challenges and countermeasures based on the aforementioned procedure flowchart,
A means for automatically generating documents based on the aforementioned procedure flowchart and countermeasures,
A robot control means that updates the operation procedure through the generation means,
A system comprising the above means.
2. The system according to claim 1, wherein the analysis means uses natural language processing technology to identify elements of the procedure flow.
3. The system according to claim 1, wherein the generation means uses an artificial intelligence model to create a procedure flowchart.