Order execution system, voice operation assistance system, voice operation assistance program, electronic device, and order execution program

The system effectively addresses the challenge of accurately interpreting voice inputs by employing multiple interpretation units to convert voice commands into actionable commands, thereby improving user convenience and accuracy.

WO2026126877A1 · PCT designated stage · Publication Date: 2026-06-18 · KYOCERA DOCUMENT SOLUTIONS INC

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
KYOCERA DOCUMENT SOLUTIONS INC
Filing Date
2025-12-02
Publication Date
2026-06-18

Figure JP2025041971_18062026_PF_FP_ABST

Abstract

An image forming system comprises an image forming apparatus that executes commands and a voice operation support system that supports voice operation of the image forming apparatus. The voice operation support system comprises a command interpretation unit, a basic interpretation unit, and an advanced interpretation unit, which interpret information input by voice in mutually different ways and convert it into commands. The image forming apparatus displays the commands generated by each of two or more of the command interpretation unit, the basic interpretation unit, and the advanced interpretation unit, and executes the command selected by the user from among those commands (S162).

Description

Command execution system, voice operation support system, voice operation support program, electronic device, and command execution program 【0001】 The present invention relates to a command execution system, a voice operation support system, a voice operation support program, an electronic device, and a command execution program that execute commands obtained by interpreting and converting information input by voice. 【0002】 A conventional command execution system is known that executes commands obtained by interpreting and converting information input by voice (see, for example, Patent Documents 1 and 2). 【0003】 Patent Document 1: Japanese Patent Application Laid-Open No. 2020-087347; Patent Document 2: Japanese Patent Application Laid-Open No. 2020-087381. 【0004】 However, the conventional command execution system has a problem: when the information input by voice cannot be interpreted appropriately, the voice operation intended by the user cannot be realized. 【0005】 Therefore, an object of the present invention is to provide a command execution system, a voice operation support system, a voice operation support program, an electronic device, and a command execution program that can improve the likelihood of realizing the voice operation intended by the user. 【0006】 A command execution system according to an aspect of the present invention includes an electronic device that executes commands and a voice operation support system that supports voice operation of the electronic device. The voice operation support system includes a plurality of interpretation units that interpret information input by voice in different ways and convert it into commands. The electronic device displays the commands generated by each of two or more of the plurality of interpretation units, and executes the generated commands according to an execution instruction from the user.
【0007】 With this configuration, the command execution system according to one aspect of the present invention displays the commands generated by two or more interpretation units, which interpret the voice-input information in different ways and convert it into commands, and then executes the generated commands according to the user's execution instruction. This improves the likelihood of realizing the voice operation intended by the user. 【0008】 In the command execution system according to one aspect of the present invention, the electronic device may display the command generated by an interpretation unit specified by the user among the plurality of interpretation units, and execute the generated command in accordance with the user's execution instruction. 【0009】 With this configuration, the command execution system according to one aspect of the present invention displays, in response to the user's instruction, the commands generated by at least one of the plurality of interpretation units that interpret voice-input information in different ways and convert it into commands, which improves convenience. In the command execution system according to one aspect of the present invention, the electronic device may display a message indicating that no command corresponds to the voice-input information if an interpretation unit is unable to convert that information into a command. 【0010】 A voice operation support system according to one aspect of the present invention supports voice operation of an electronic device that executes commands, and comprises a plurality of interpretation units that interpret information input by voice in different ways and convert it into commands.
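The core idea of paragraphs 【0006】 to 【0010】 (run several interpretation units on the same text, show each unit's candidate, and execute only the one the user picks) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the applicant's implementation: the `Command` type, the unit names, the two toy interpreters, and the message wording are all hypothetical, and the keyword matcher merely stands in for a real natural-language interpreter.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Command:
    """Hypothetical command: a device setting and the value to apply."""
    setting: str
    value: str


# An interpretation unit maps recognized text to a Command, or None if it cannot.
Interpreter = Callable[[str], Optional[Command]]

# Hypothetical wording for the "no corresponding command" message.
NO_COMMAND_MESSAGE = "No command corresponds to the voice input."


def strict_unit(text: str) -> Optional[Command]:
    """Toy command-type unit: accepts only 'set <setting> <value>'."""
    parts = text.lower().split()
    if len(parts) == 3 and parts[0] == "set":
        return Command(parts[1], parts[2])
    return None


def keyword_unit(text: str) -> Optional[Command]:
    """Toy stand-in for a natural-language unit (keyword matching only)."""
    t = text.lower()
    if "blue" in t and "stronger" in t:
        return Command("color_balance.blue", "+1")
    return None


def collect_candidates(text: str, units: Dict[str, Interpreter]) -> Dict[str, Optional[Command]]:
    """Run every selected unit on the same text; failures are kept as None."""
    return {name: unit(text) for name, unit in units.items()}


def render(candidates: Dict[str, Optional[Command]]) -> List[str]:
    """One display line per unit; a unit with no command shows the message."""
    lines = []
    for name, cmd in candidates.items():
        shown = f"{cmd.setting}={cmd.value}" if cmd is not None else NO_COMMAND_MESSAGE
        lines.append(f"{name}: {shown}")
    return lines


def execute_selected(candidates: Dict[str, Optional[Command]], chosen: str,
                     execute: Callable[[Command], None]) -> bool:
    """Execute only the user's selection; refuse if that unit produced nothing."""
    cmd = candidates.get(chosen)
    if cmd is None:
        return False
    execute(cmd)
    return True


# Usage with the example utterance from paragraph 0062.
text = "Adjust the color balance to make the blue a little stronger"
candidates = collect_candidates(text, {"command": strict_unit, "basic": keyword_unit})
executed: List[Command] = []
ok_basic = execute_selected(candidates, "basic", executed.append)
ok_command = execute_selected(candidates, "command", executed.append)
```

Keeping a failed unit as `None` instead of dropping it lets the display show the no-command message for that specific unit, which matches the optional behavior described in 【0009】.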
【0011】 With this configuration, the voice operation support system according to one aspect of the present invention generates commands for the electronic device by means of a plurality of interpretation units, each of which interprets the voice-input information in a different way and converts it into a command. This improves the likelihood of realizing the voice operation intended by the user. 【0012】 A voice operation support program according to one aspect of the present invention is a program for supporting voice operation of an electronic device that executes commands, and causes a computer to operate as a plurality of interpretation units that interpret information input by voice in different ways and convert it into commands. 【0013】 With this configuration, a computer executing the voice operation support program of the present invention generates commands for the electronic device by means of a plurality of interpretation units, each of which interprets the voice-input information in a different way and converts it into commands. This improves the likelihood of achieving the voice operation intended by the user. 【0014】 An electronic device according to one aspect of the present invention receives and displays the commands generated by each of a plurality of interpretation units that interpret information input by voice in different ways and convert it into commands, and executes the command selected by the user from among these commands. 【0015】 With this configuration, the electronic device of the present invention displays the commands generated by two or more interpretation units that interpret voice input in different ways and convert it into commands, and executes the command selected by the user from among them, thereby improving the likelihood of realizing the voice operation intended by the user.
【0016】 A command execution program according to one aspect of the present invention causes a computer to display the commands generated by each of a plurality of interpretation units that interpret information input by voice in different ways and convert it into commands, and to execute the command selected by the user from among these commands. 【0017】 With this configuration, an electronic device executing the command execution program of the present invention displays the commands generated by two or more interpretation units that interpret the voice-input information in different ways and convert it into commands, and executes the command selected by the user from among them, thereby improving the likelihood of realizing the voice operation intended by the user. 【0018】 The command execution system, voice operation support system, voice operation support program, electronic device, and command execution program of the present invention can improve the likelihood of realizing the voice operation intended by the user. 【0019】 Figure 1 is a block diagram of an example of an image forming system according to one embodiment of the present invention. Figure 2 is a block diagram of an example of the image forming apparatus shown in Figure 1 when it is configured as an MFP. Figure 3 is a block diagram of an example of the voice input device shown in Figure 1. Figure 4 is a block diagram of an example of the voice operation support system shown in Figure 1 when it is configured with one computer. Figure 5 is a diagram showing an example of the customer management information shown in Figure 4. Figure 6 is a diagram showing an example of the device management information shown in Figure 4. Figure 7 is a diagram showing an example of the interpretation history information shown in Figure 4. Figure 8 is a diagram showing an example of the fee structure information shown in Figure 4.
Figure 9 is a sequence diagram of the operation of the image forming system shown in Figure 1 when the image forming apparatus displays an interpretation result screen. Figure 10 is a diagram showing an example of the interpretation result screen displayed in the operation shown in Figure 9. Figure 11 is a sequence diagram of the operation of the image forming system shown in Figure 1 when the basic interpretation execution button is pressed on the interpretation result screen. Figure 12 is a diagram showing an example of the interpretation result screen after the basic interpretation execution button is pressed in the state shown in Figure 10. Figure 13 is a sequence diagram of the operation of the image forming system shown in Figure 1 when the high-spec interpretation execution button is pressed on the interpretation result screen. Figure 14 is a diagram showing an example of the interpretation result screen after the high-spec interpretation execution button is pressed in the state shown in Figure 10. Figure 15 is a diagram showing an example of the interpretation result screen after the high-spec interpretation execution button is pressed in the state shown in Figure 12. Figure 16 is a diagram showing an example of the interpretation result screen in a state different from those shown in Figures 10, 12, 14, and 15. Figure 17 is a flowchart of the operation of the image forming apparatus shown in Figure 2 when executing the command selected on the interpretation result screen. Figure 18 is a flowchart of the operation of the voice operation support system shown in Figure 4 when billing for text interpretation. 【0020】 Hereinafter, embodiments of the present invention will be described with reference to the drawings. 【0021】 First, the configuration of the image forming system as a command execution system according to one embodiment of the present invention will be described. 【0022】 Figure 1 is a block diagram of an example of an image forming system 10 according to this embodiment.
【0023】 As shown in Figure 1, the image forming system 10 includes an image forming apparatus 20, which is an electronic device for forming images. The image forming system 10 may also include at least one other image forming apparatus with a configuration similar to that of the image forming apparatus 20. The image forming apparatus may be, for example, a dedicated printer or an MFP (Multifunction Peripheral). 【0024】 The image forming system 10 includes a voice input device 30 into which the voice of a user of the image forming apparatus is input. The image forming system 10 may also include at least one other voice input device with a configuration similar to that of the voice input device 30. The voice input device may be configured as, for example, a smart speaker or a computer such as a smartphone. 【0025】 The image forming system 10 includes a voice operation support system 40 that assists the user in operating the image forming apparatus by voice. The voice operation support system 40 may be composed of one computer, such as a PC (Personal Computer), or of multiple computers. The voice operation support system 40 may be configured on the same LAN (Local Area Network) as the image forming apparatus, or in the cloud. For example, the voice operation support system 40 may be composed of a server device located in the cloud. 【0026】 The image forming apparatus in the image forming system 10 and the voice operation support system 40 are connected via a network 11, such as a LAN or the Internet, so that they can communicate with each other. Similarly, the voice input device in the image forming system 10 and the voice operation support system 40 are connected via the network 11 so that they can communicate with each other. 【0027】 Figure 2 is a block diagram of an example of the image forming apparatus 20 configured as an MFP.
【0028】 As shown in Figure 2, the image forming apparatus 20 is a computer comprising: an operation unit 21, which is an operation device such as buttons through which various operations are input; a display unit 22, which is a display device such as an LCD (Liquid Crystal Display) that displays various information; a printer 23, which is a printing device that prints images on a recording medium such as paper and is equipped with, for example, an electrophotographic image forming mechanism; a scanner 24, which is a reading device that reads images from an original document; a communication unit 25, which is a communication device that communicates with external devices via a network such as a LAN or the Internet, or directly by a wired or wireless connection without a network; a fax communication unit 26, which is a fax device that exchanges faxes with an external facsimile device (not shown) via a communication line such as a public telephone line; a storage unit 27, which is a non-volatile storage device such as a semiconductor memory or an HDD (Hard Disk Drive) that stores various information; and a control unit 28 that controls the entire image forming apparatus 20. 【0029】 The storage unit 27 stores a command execution program 27a for executing commands. The command execution program 27a may be installed in the image forming apparatus 20 at the manufacturing stage, may be additionally installed in the image forming apparatus 20 from an external storage medium such as a USB (Universal Serial Bus) memory, or may be additionally installed in the image forming apparatus 20 from a network. 【0030】 The control unit 28 shown in Figure 2 includes, for example, a processor consisting of a CPU (Central Processing Unit), a ROM (Read Only Memory) that stores programs and various data, and a RAM (Random Access Memory) used as a working area for the processor.
The processor operates as a control unit 280 by executing a program stored in the storage unit 27 or in the ROM of the control unit 28. The processor is an example of the computer included in the electronic device described in the claims. 【0031】 By executing the command execution program 27a, the processor in the control unit 28 operates as a command execution unit 28a that executes commands. 【0032】 Figure 3 is a block diagram of an example of the voice input device 30. 【0033】 As shown in Figure 3, the voice input device 30 includes: an operation unit 31, which is an operation device such as a keyboard or mouse through which various operations are input; a display unit 32, which is a display device such as an LCD that displays various information; a microphone 33 for inputting voice; a speaker 34 for outputting voice; a communication unit 35, which is a communication device that communicates with external devices via a network such as a LAN or the Internet, or directly by a wired or wireless connection without going through a network; a storage unit 36, which is a non-volatile storage device such as a semiconductor memory or an HDD that stores various information; and a control unit 37 that controls the entire voice input device 30. 【0034】 The storage unit 36 stores a voice processing program 36a for processing voice. The voice processing program 36a may be installed in the voice input device 30 at the manufacturing stage, may be additionally installed in the voice input device 30 from an external storage medium such as a USB memory, or may be additionally installed in the voice input device 30 from a network. 【0035】 The control unit 37 includes, for example, a CPU, a ROM that stores programs and various data, and a RAM used as a working area for the CPU of the control unit 37. The CPU of the control unit 37 executes programs stored in the storage unit 36 or in the ROM of the control unit 37.
【0036】 By executing the voice processing program 36a, the control unit 37 operates as a voice processing unit 37a that processes voice. The voice processing unit 37a receives the user's voice via the microphone 33 and outputs messages to the user via the speaker 34. For example, through interaction with the user, the voice processing unit 37a can accept voice commands for operating the image forming apparatus 20. 【0037】 Figure 4 is a block diagram of an example of the voice operation support system 40 configured with a single computer. 【0038】 As shown in Figure 4, the voice operation support system 40 includes: an operation unit 41, which is an operation device such as a keyboard or mouse through which various operations are input; a display unit 42, which is a display device such as an LCD that displays various information; a communication unit 43, which is a communication device that communicates with external devices via a network such as a LAN or the Internet, or directly via a wired or wireless connection without going through a network; a storage unit 44, which is a non-volatile storage device such as a semiconductor memory or an HDD that stores various information; and a control unit 45 that controls the entire voice operation support system 40. 【0039】 The storage unit 44 stores a voice operation support program 44a for assisting the user in operating the image forming apparatus by voice. The voice operation support program 44a may be installed in the voice operation support system 40 at the manufacturing stage, may be additionally installed in the voice operation support system 40 from an external storage medium such as a USB memory, or may be additionally installed in the voice operation support system 40 from a network. 【0040】 The storage unit 44 further stores customer management information 44b for managing the customers of the voice operation support system 40. 【0041】 Figure 5 shows an example of the customer management information 44b.
【0042】 The customer management information 44b shown in FIG. 5 includes, for each customer, a customer ID identifying the customer and the name of the customer. Some information is omitted from the customer management information 44b shown in FIG. 5. 【0043】 As shown in FIG. 4, the storage unit 44 further stores device management information 44c for managing the image forming apparatuses. 【0044】 FIG. 6 is a diagram showing an example of the device management information 44c. 【0045】 The device management information 44c shown in FIG. 6 includes, for each image forming apparatus, a device ID identifying the image forming apparatus and the customer ID of the customer to which the image forming apparatus belongs. Some information is omitted from the device management information 44c shown in FIG. 6. 【0046】 As shown in FIG. 4, the storage unit 44 further stores device correspondence information 44d indicating the correspondence between image forming apparatuses and voice input devices. For example, the device correspondence information 44d shows that the image forming apparatus 20 and the voice input device 30 are associated with each other. 【0047】 The storage unit 44 stores interpretation history information 44e for managing the history of text interpretations performed by the voice operation support system 40. 【0048】 FIG. 7 is a diagram showing an example of the interpretation history information 44e.
【0049】 The interpretation history information 44e shown in FIG. 7 includes, for each history entry, the date and time at which the voice operation support system 40 interpreted the text, the device ID of the image forming apparatus targeted by that interpretation, the type of interpretation performed, and the number of characters in the interpreted text. Some information is omitted from the interpretation history information 44e shown in FIG. 7. 【0050】 The voice operation support system 40 can perform two types of interpretation: command-type interpretation, which can interpret text expressed in a specific command format, and natural language-type interpretation, which can interpret text expressed in natural language. Advantages of command-type interpretation include, for example, a shorter processing time than natural language-type interpretation, a lower probability of misinterpretation when the text is expressed as appropriate commands, and lower development and operating costs. Disadvantages of command-type interpretation include, for example, less flexibility in the content of the text that can be interpreted than natural language-type interpretation, so that users must memorize the commands. Advantages of natural language-type interpretation include, for example, greater flexibility in the content of the text that can be interpreted than command-type interpretation. Disadvantages of natural language-type interpretation include, for example, a longer processing time than command-type interpretation, a higher probability of misinterpretation, and higher development and operating costs.
In the voice operation support system 40, natural language-type interpretation is available in a standard specification and a high specification, the latter being capable of more difficult interpretations than the standard specification. 【0051】 As shown in Figure 4, the storage unit 44 stores fee structure information 44f indicating the fee structure for text interpretation by the voice operation support system 40. 【0052】 Figure 8 shows an example of the fee structure information 44f. 【0053】 The fee structure information 44f shown in Figure 8 includes the per-interpretation fee for command-type interpretation, the per-interpretation fee for standard specification natural language-type interpretation, and the per-interpretation fee for high specification natural language-type interpretation. 【0054】 The control unit 45 shown in Figure 4 includes, for example, a processor equipped with a CPU, a ROM that stores programs and various data, and a RAM used as a working area for the CPU of the control unit 45. By executing programs stored in the storage unit 44 or in the ROM of the control unit 45, the processor of the control unit 45 operates as a control unit 450. The processor is an example of the computer described in the claims.
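Paragraphs 【0045】, 【0049】, and 【0053】 together give enough structure to sketch how per-interpretation fees could be totalled per customer: each history entry names a device, the device management information maps that device to a customer, and the fee structure prices each interpretation type. The section does not describe the billing flow itself (that is the subject of Figure 18), so the Python below is only an assumed illustration; the fee amounts, IDs, and the billing-period rule are invented. The character count is carried in each entry, as in Figure 7, but is not used because the fee structure described here is per interpretation.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable

# Interpretation types (labels are illustrative).
COMMAND, BASIC, ADVANCED = "command", "basic", "advanced"

# Hypothetical per-interpretation fees standing in for fee structure information 44f.
FEES: Dict[str, int] = {COMMAND: 1, BASIC: 5, ADVANCED: 20}

# Hypothetical device management information 44c: device ID -> customer ID.
DEVICE_TO_CUSTOMER: Dict[str, str] = {"D001": "C001", "D002": "C001", "D003": "C002"}


@dataclass(frozen=True)
class HistoryEntry:
    """One row of interpretation history information 44e."""
    when: datetime
    device_id: str
    kind: str
    char_count: int  # recorded, but unused under per-interpretation pricing


def bill(history: Iterable[HistoryEntry], start: datetime, end: datetime) -> Dict[str, int]:
    """Total the per-interpretation fees per customer for start <= when < end."""
    totals: Dict[str, int] = defaultdict(int)
    for entry in history:
        if start <= entry.when < end:
            customer = DEVICE_TO_CUSTOMER[entry.device_id]
            totals[customer] += FEES[entry.kind]
    return dict(totals)


history = [
    HistoryEntry(datetime(2026, 1, 5, 9, 0), "D001", COMMAND, 52),
    HistoryEntry(datetime(2026, 1, 6, 10, 0), "D002", ADVANCED, 52),
    HistoryEntry(datetime(2026, 1, 7, 14, 30), "D003", BASIC, 30),
    HistoryEntry(datetime(2026, 2, 1, 8, 0), "D001", BASIC, 40),  # outside January
]
january = bill(history, datetime(2026, 1, 1), datetime(2026, 2, 1))
```

Using a half-open period (`start <= when < end`) keeps an entry from being billed in two adjacent periods.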
【0055】 By executing the voice operation support program 44a, the processor in the control unit 45 operates as: a receiving unit 45a that receives information from the image forming apparatus and the voice input device; a transmission unit 45b that transmits information to the image forming apparatus; an interpretation control unit 45c that controls the execution of interpretation of the information received from the voice input device; and a command interpretation unit 45d, a basic interpretation unit 45e, and an advanced interpretation unit 45f, each of which interprets the information received from the voice input device. 【0056】 The command interpretation unit 45d performs command-type interpretation. That is, the command interpretation unit 45d can interpret information received from the voice input device when that information is expressed in a specific command format, and converts it into a command for the image forming apparatus. 【0057】 The basic interpretation unit 45e and the advanced interpretation unit 45f each perform natural language-type interpretation. That is, each of them can interpret information received from the voice input device when that information is expressed in natural language, and converts it into commands for the image forming apparatus. The basic interpretation unit 45e performs standard specification natural language-type interpretation. The advanced interpretation unit 45f performs high specification natural language-type interpretation.
That is, the advanced interpretation unit 45f can perform more difficult interpretations than the basic interpretation unit 45e. 【0058】 Next, the operation of the image forming system 10 will be described. 【0059】 In the following, the image forming apparatus 20 is described as a representative example of the image forming apparatuses, and the voice input device 30 as a representative example of the voice input devices. 【0060】 First, the operation of the image forming system 10 when the image forming apparatus 20 displays a screen showing the result of the voice operation support system 40's interpretation of the voice input into the voice input device 30 (hereinafter, the "interpretation result screen") will be described. 【0061】 Figure 9 is a sequence diagram of the operation of the image forming system 10 when the image forming apparatus 20 displays the interpretation result screen. 【0062】 The user inputs instructions for the image forming apparatus 20 by voice via the microphone 33 of the voice input device 30. For example, the user might say "Adjust the color balance to make the blue a little stronger" to the voice input device 30 as an instruction for the image forming apparatus 20. 【0063】 When an operation is input by voice, the voice processing unit 37a of the voice input device 30 converts the input voice into text (S101), as shown in Figure 9. 【0064】 When the processing in S101 is completed, the voice processing unit 37a transmits the text generated in S101 from the communication unit 35 to the voice operation support system 40 (S102). 【0065】 When the receiving unit 45a of the voice operation support system 40 receives, via the communication unit 43, the text transmitted by the voice input device 30 in S102, it passes the text to the interpretation control unit 45c.
When the interpretation control unit 45c receives this text from the receiving unit 45a, it passes the text to the command interpretation unit 45d. When the command interpretation unit 45d receives the text from the interpretation control unit 45c, it performs command-type interpretation on the text, thereby converting it into a command for the image forming apparatus 20 (S103). 【0066】 When the processing in S103 is completed, the command interpretation unit 45d passes the result of the command-type interpretation in S103 (hereinafter, the "command-type interpretation result") to the interpretation control unit 45c. That is, if the command interpretation unit 45d was able to generate a command in S103, it passes information including that command to the interpretation control unit 45c as the command-type interpretation result; if it was not able to generate a command in S103, it passes information indicating this to the interpretation control unit 45c as the command-type interpretation result. When the interpretation control unit 45c receives the command-type interpretation result from the command interpretation unit 45d, it passes the text received from the receiving unit 45a and the command-type interpretation result to the transmission unit 45b.
When the transmission unit 45b receives the text and the command-type interpretation result from the interpretation control unit 45c, it transmits them via the communication unit 43 to the image forming apparatus 20, which is associated with the voice input device 30 in the device correspondence information 44d (S104). 【0067】 When the command execution unit 28a of the image forming apparatus 20 receives, via the communication unit 25, the text and command-type interpretation result transmitted by the voice operation support system 40 in S104, it notifies the voice operation support system 40 that they were received successfully (S105). 【0068】 When the receiving unit 45a of the voice operation support system 40 receives, via the communication unit 43, the notification sent by the image forming apparatus 20 in S105, it passes the notification to the interpretation control unit 45c. On receiving this notification from the receiving unit 45a, the interpretation control unit 45c writes a history entry for the command-type interpretation performed for the image forming apparatus 20 to the interpretation history information 44e (S106). Here, the interpretation control unit 45c records the date and time of the processing in S103 as the date and time of the history entry. 【0069】 When the command execution unit 28a of the image forming apparatus 20 receives the text and command-type interpretation result transmitted by the voice operation support system 40 in S104, it displays on the display unit 22 the interpretation result screen, which includes the received text and command-type interpretation result (S107).
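The bookkeeping in steps S103 to S106 has one detail worth making concrete: the history entry is written only after the image forming apparatus acknowledges receipt (S105), yet it carries the time of the interpretation itself (S103). The Python sketch below models that ordering. Everything named in it is an assumption for illustration: the fixed "<setting> <value>" command format, the device IDs, the message dictionaries, and the simplification of tracking only one pending result per apparatus.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# Hypothetical device correspondence information 44d: voice input device -> apparatus.
CORRESPONDENCE: Dict[str, str] = {"V030": "D020"}

# Invented command vocabulary for the toy command-type interpreter.
KNOWN_SETTINGS = {"copies", "duplex", "color"}


def command_interpret(text: str) -> Optional[Dict[str, str]]:
    """Toy command-type interpretation (S103) over an assumed '<setting> <value>'
    format, e.g. 'copies 3'. Returns None when no command can be generated."""
    parts = text.strip().lower().split()
    if len(parts) == 2 and parts[0] in KNOWN_SETTINGS:
        return {"setting": parts[0], "value": parts[1]}
    return None


@dataclass
class InterpretationControl:
    history: List[dict] = field(default_factory=list)  # stands in for 44e
    outbox: List[dict] = field(default_factory=list)   # messages "sent" in S104
    # Apparatus ID -> (time of S103, character count); one pending result each.
    pending: Dict[str, Tuple[datetime, int]] = field(default_factory=dict)

    def on_text(self, voice_device_id: str, text: str, now: datetime) -> None:
        result = command_interpret(text)                # S103
        device_id = CORRESPONDENCE[voice_device_id]     # look up 44d
        self.pending[device_id] = (now, len(text))      # remember the S103 time
        self.outbox.append({                            # S104
            "to": device_id,
            "text": text,
            "generated": result is not None,
            "command": result,
        })

    def on_ack(self, device_id: str) -> None:
        """S105 -> S106: write history only after receipt is confirmed,
        stamped with the S103 time rather than the time of the ack."""
        when, chars = self.pending.pop(device_id)
        self.history.append({"when": when, "device_id": device_id,
                             "kind": "command", "chars": chars})


ctl = InterpretationControl()
ctl.on_text("V030", "copies 3", datetime(2026, 6, 18, 9, 0, 0))
history_before_ack = list(ctl.history)
ctl.on_ack("D020")
```

A fuller design would probably key pending results by a request ID rather than by apparatus, so that a second utterance arriving before the first acknowledgment cannot overwrite it.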
【0070】 Figure 10 shows an example of the interpretation result screen 50 displayed during the operation shown in Figure 9. 【0071】 The interpretation result screen 50 shown in Figure 10 includes: a speech recognition result display area 51 showing the text received from the voice operation support system 40; a command candidate display area 52a, a candidate display area showing the command-type interpretation result received from the voice operation support system 40; a basic candidate display area 53a, a candidate display area showing the result of standard specification natural language-type interpretation (hereinafter, the "standard specification interpretation result"); a basic interpretation execution button 53b for executing standard specification natural language-type interpretation; an advanced candidate display area 54a, a candidate display area showing the result of high specification natural language-type interpretation (hereinafter, the "high specification interpretation result"); a high specification interpretation execution button 54b for executing high specification natural language-type interpretation; a command execution button 55 for executing the command of an interpretation result shown in any of the command candidate display area 52a, the basic candidate display area 53a, and the advanced candidate display area 54a; and a cancel button 56 for canceling the execution of the voice operation. 【0072】 If the command execution unit 28a has not yet received the standard specification interpretation result from the voice operation support system 40, it displays "<Not executed>" in the basic candidate display area 53a and displays the basic interpretation execution button 53b on the interpretation result screen 50.
If the command execution unit 28a has not yet received the high specification interpretation result from the voice operation support system 40, it displays "<Not executed>" in the advanced candidate display area 54a and displays the high specification interpretation execution button 54b on the interpretation result screen 50. 【0073】 If the command execution unit 28a receives, as the command-type interpretation result, information indicating that no instruction could be generated, it displays "Not applicable" in the command candidate display area 52a. 【0074】 The command execution unit 28a grays out the command execution button 55 and makes it inoperable if "Not applicable" or "<Not executed>" is displayed in every one of the command candidate display area 52a, basic candidate display area 53a, and advanced candidate display area 54a. In other words, the command execution unit 28a makes the command execution button 55 operable only if something other than "Not applicable" or "<Not executed>" is displayed in at least one of the command candidate display area 52a, basic candidate display area 53a, and advanced candidate display area 54a. 【0075】 Next, we will explain the operation of the image forming system 10 when the basic interpretation execution button 53b is pressed on the interpretation result screen 50. 【0076】 Figure 11 is a sequence diagram of the operation of the image forming system 10 when the basic interpretation execution button 53b is pressed on the interpretation result screen 50. 【0077】 The user inputs a basic interpretation execution instruction to the operation unit 21 by pressing the basic interpretation execution button 53b on the interpretation result screen 50 via the operation unit 21.
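The display rules of 【0072】 to 【0074】 reduce to two small functions: what text a candidate display area shows, and whether the command execution button 55 is operable. Below is a minimal Python sketch; representing an interpretation result as an optional command string (None meaning "no instruction could be generated") and the function names are assumptions made for illustration.

```python
from typing import Iterable, Optional

NOT_EXECUTED = "<Not executed>"    # interpretation not yet requested
NOT_APPLICABLE = "Not applicable"  # interpretation ran but produced no instruction


def area_text(received: bool, command: Optional[str]) -> str:
    """Text shown in one candidate display area (52a, 53a or 54a)."""
    if not received:
        return NOT_EXECUTED
    if command is None:
        return NOT_APPLICABLE
    return command


def execute_button_enabled(area_texts: Iterable[str]) -> bool:
    """Button 55 is operable only if at least one area shows an actual instruction."""
    return any(t not in (NOT_EXECUTED, NOT_APPLICABLE) for t in area_texts)
```

Because the check uses `any`, the button stays grayed out exactly when every area shows one of the two placeholders.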
【0078】 When a basic interpretation execution instruction is input to the operation unit 21 of the image forming apparatus 20, the command execution unit 28a transmits, via the communication unit 25, the text shown in the speech recognition result display area 51 and an instruction to execute standard specification natural language type interpretation (hereinafter referred to as the "standard specification interpretation instruction") to the voice operation support system 40, as shown in Figure 11 (S121). 【0079】 When the receiving unit 45a of the voice operation support system 40 receives, via the communication unit 43, the text and standard specification interpretation instruction transmitted by the image forming apparatus 20 in S121, it passes them to the interpretation control unit 45c. When the interpretation control unit 45c receives the text and standard specification interpretation instruction from the receiving unit 45a, it passes the text to the basic interpretation unit 45e. When the basic interpretation unit 45e receives the text from the interpretation control unit 45c, it performs standard specification natural language type interpretation on the text, thereby converting the text into an instruction for the image forming apparatus 20 (S122). 【0080】 When the basic interpretation unit 45e completes the processing in S122, it passes the standard specification interpretation result to the interpretation control unit 45c. That is, if the basic interpretation unit 45e was able to generate an instruction in S122, it passes information including the instruction generated in S122 to the interpretation control unit 45c as the standard specification interpretation result.
If it was not able to generate an instruction in S122, it passes information indicating that no instruction could be generated to the interpretation control unit 45c as the standard specification interpretation result. When the interpretation control unit 45c receives the standard specification interpretation result from the basic interpretation unit 45e, it passes the result to the transmission unit 45b. When the transmission unit 45b receives the standard specification interpretation result from the interpretation control unit 45c, it transmits the result to the image forming apparatus 20 via the communication unit 43 (S123). 【0081】 When the command execution unit 28a of the image forming apparatus 20 receives the standard specification interpretation result transmitted by the voice operation support system 40 in S123, it notifies the voice operation support system 40 via the communication unit 25 that the standard specification interpretation result was successfully received (S124). 【0082】 When the receiving unit 45a of the voice operation support system 40 receives the notification from the image forming apparatus 20 in S124, it passes the notification to the interpretation control unit 45c. When the interpretation control unit 45c receives this notification from the receiving unit 45a, it writes a history entry for the standard specification natural language type interpretation performed for the image forming apparatus 20 to the interpretation history information 44e (S125). Here, the interpretation control unit 45c records the date and time at which the processing in S122 was performed as the date and time of the history entry written to the interpretation history information 44e.
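The exchange in S121 to S124 (and its high-spec counterpart S141 to S144) involves three kinds of message: an interpretation request, an interpretation result, and a receipt notification. The description does not specify a wire format, so the following Python sketch is purely hypothetical: it models the messages as dataclasses and serializes them as type-tagged JSON, showing that a failed conversion can travel as an explicit null rather than as a missing field.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class InterpretationRequest:  # device -> support system (S121 / S141)
    text: str                 # text shown in speech recognition result display area 51
    kind: str                 # hypothetical labels: "basic" or "advanced"


@dataclass
class InterpretationResult:   # support system -> device (S123 / S143)
    kind: str
    command: Optional[str]    # None: no instruction could be generated


@dataclass
class ReceiptAck:             # device -> support system (S124 / S144)
    kind: str


MESSAGE_TYPES = {cls.__name__: cls for cls in (InterpretationRequest, InterpretationResult, ReceiptAck)}


def encode(message) -> str:
    # Tag each payload with its message type so the receiver can dispatch on it.
    return json.dumps({"type": type(message).__name__, "body": asdict(message)})


def decode(raw: str):
    data = json.loads(raw)
    return MESSAGE_TYPES[data["type"]](**data["body"])
```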
【0083】 When the command execution unit 28a of the image forming apparatus 20 receives the standard specification interpretation result transmitted by the voice operation support system 40 in S123, it reflects the received standard specification interpretation result on the interpretation result screen 50 and displays the screen on the display unit 22 (S126). 【0084】 Figure 12 shows an example of the interpretation result screen 50 after the basic interpretation execution button 53b is pressed in the state shown in Figure 10. 【0085】 The interpretation result screen 50 shown in Figure 12 displays the standard specification interpretation result received from the voice operation support system 40 in the basic candidate display area 53a. If the standard specification interpretation result is information indicating that no command could be generated, the command execution unit 28a displays "Not applicable" in the basic candidate display area 53a. 【0086】 If the command execution unit 28a displays an instruction that is neither "Not applicable" nor "<Not executed>" in the basic candidate display area 53a, it places a radio button 53c to the right of the basic candidate display area 53a to accept selection of the instruction displayed in the basic candidate display area 53a. 【0087】 Next, we will explain the operation of the image forming system 10 when the high-spec interpretation execution button 54b is pressed on the interpretation result screen 50. 【0088】 Figure 13 is a sequence diagram of the operation of the image forming system 10 when the high-spec interpretation execution button 54b is pressed on the interpretation result screen 50. 【0089】 The user inputs a high-spec interpretation execution instruction to the operation unit 21 by pressing the high-spec interpretation execution button 54b on the interpretation result screen 50.
【0090】 When a high-spec interpretation execution instruction is input to the operation unit 21 of the image forming apparatus 20, the command execution unit 28a transmits, via the communication unit 25, the text shown in the speech recognition result display area 51 and an instruction to execute high-spec natural language type interpretation (hereinafter referred to as the "high-spec interpretation instruction") to the voice operation support system 40, as shown in Figure 13 (S141). 【0091】 When the receiving unit 45a of the voice operation support system 40 receives, via the communication unit 43, the text and high-spec interpretation instruction transmitted by the image forming apparatus 20 in S141, it passes them to the interpretation control unit 45c. When the interpretation control unit 45c receives the text and high-spec interpretation instruction from the receiving unit 45a, it passes the text to the advanced interpretation unit 45f. When the advanced interpretation unit 45f receives the text from the interpretation control unit 45c, it performs high-spec natural language type interpretation on the text, thereby converting the text into an instruction for the image forming apparatus 20 (S142). 【0092】 When the processing in S142 is completed, the advanced interpretation unit 45f passes the high-spec interpretation result to the interpretation control unit 45c. That is, if the advanced interpretation unit 45f was able to generate an instruction in S142, it passes information including the instruction generated in S142 to the interpretation control unit 45c as the high-spec interpretation result; if it was not able to generate an instruction in S142, it passes information indicating that no instruction could be generated to the interpretation control unit 45c as the high-spec interpretation result.
When the interpretation control unit 45c receives the high-spec interpretation result from the advanced interpretation unit 45f, it passes the high-spec interpretation result received from the advanced interpretation unit 45f to the transmission unit 45b. When the transmission unit 45b receives the high-spec interpretation result from the interpretation control unit 45c, it transmits the high-spec interpretation result received from the interpretation control unit 45c to the image forming apparatus 20 via the communication unit 43 (S143). 【0093】 When the command execution unit 28a of the image forming apparatus 20 receives the high-spec interpretation result transmitted by the voice operation support system 40 in S143, it notifies the voice operation support system 40 from the communication unit 25 that the high-spec interpretation result has been successfully received (S144). 【0094】 When the receiving unit 45a of the voice operation support system 40 receives the notification from the image forming apparatus 20 in S144 via the communication unit 43, it passes the notification from the image forming apparatus 20 in S144 to the interpretation control unit 45c. When the interpretation control unit 45c receives the notification from the image forming apparatus 20 in S144 from the receiving unit 45a, it writes a history of performing a high-spec natural language type interpretation on the image forming apparatus 20 to the interpretation history information 44e (S145). Here, the interpretation control unit 45c writes the date and time of the process in S142 as the date and time in the history to be written to the interpretation history information 44e. 
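In S121/S122 and S141/S142 the interpretation control unit 45c acts as a router: the device names which interpretation to run, and the text is handed to the matching unit. A minimal Python sketch follows; the request labels, the class name, and the use of None for a failed conversion are hypothetical. Because units are looked up in a dictionary, the same structure would accommodate two, or four or more, interpretation units, as contemplated in 【0125】.

```python
from typing import Callable, Dict, Optional

# None stands for "it was not possible to generate an instruction".
Result = Optional[str]
Interpreter = Callable[[str], Result]


class InterpretationControl:
    """Routes recognized text to the interpretation unit named in the request."""

    def __init__(self, units: Dict[str, Interpreter]) -> None:
        self.units = units

    def handle(self, text: str, request: str) -> Result:
        # e.g. "basic" -> basic interpretation unit 45e (S121 -> S122),
        #      "advanced" -> advanced interpretation unit 45f (S141 -> S142)
        unit = self.units.get(request)
        if unit is None:
            raise ValueError(f"unknown interpretation request: {request!r}")
        return unit(text)
```

The returned value is what the transmission unit 45b would send back to the device in S123 or S143.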
【0095】 When the command execution unit 28a of the image forming apparatus 20 receives, via the communication unit 25, the high-spec interpretation result transmitted by the voice operation support system 40 in S143, it reflects the received high-spec interpretation result on the interpretation result screen 50 and displays the screen on the display unit 22 (S146). 【0096】 Figure 14 shows an example of the interpretation result screen 50 after the high-spec interpretation execution button 54b is pressed in the state shown in Figure 10. 【0097】 The interpretation result screen 50 shown in Figure 14 displays the high-spec interpretation result received from the voice operation support system 40 in the advanced candidate display area 54a. If the high-spec interpretation result indicates that no command could be generated, the command execution unit 28a displays "Not applicable" in the advanced candidate display area 54a. 【0098】 If the command execution unit 28a displays an instruction that is neither "Not applicable" nor "<Not executed>" in the advanced candidate display area 54a, it places a radio button 54c to the right of the advanced candidate display area 54a to accept selection of the instruction displayed in the advanced candidate display area 54a. 【0099】 Figure 15 shows an example of the interpretation result screen 50 after the high-spec interpretation execution button 54b is pressed in the state shown in Figure 12. 【0100】 The interpretation result screen 50 shown in Figure 15 displays the high-spec interpretation result received from the voice operation support system 40 in the advanced candidate display area 54a. 【0101】 When multiple radio buttons are displayed, the command execution unit 28a of the image forming apparatus 20 displays the interpretation result screen 50 with only one of them selected.
For example, on the interpretation result screen 50 shown in Figure 15, only radio button 53c of the radio buttons 53c and 54c is selected. 【0102】 Figure 16 shows an example of the interpretation result screen 50 in a state different from those shown in Figures 10, 12, 14, and 15. 【0103】 The interpretation result screen 50 shown in Figure 16 is an example in which an instruction that is neither "Not applicable" nor "<Not executed>" is displayed in the command candidate display area 52a. When the command execution unit 28a displays such an instruction in the command candidate display area 52a, it places a radio button 52c to the right of the command candidate display area 52a to accept selection of the instruction shown in the command candidate display area 52a. On the interpretation result screen 50 shown in Figure 16, only radio button 53c of the radio buttons 52c, 53c, and 54c is selected. 【0104】 Next, we will explain the operation of the image forming apparatus 20 when executing the command selected on the interpretation result screen 50. 【0105】 Figure 17 is a flowchart showing the operation of the image forming apparatus 20 when executing the command selected on the interpretation result screen 50. 【0106】 When the interpretation result screen 50 is displayed on the display unit 22, the command execution unit 28a of the image forming apparatus 20 performs the operation shown in Figure 17. 【0107】 If the command execution button 55 is operable on the interpretation result screen 50, the user inputs a command execution instruction by operating the command execution button 55 on the interpretation result screen 50 via the operation unit 21. 【0108】 As shown in Figure 17, the command execution unit 28a of the image forming apparatus 20 determines whether a command execution instruction has been input to the operation unit 21 (S161).
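The radio-button rules of 【0086】, 【0098】, 【0101】 and 【0103】, together with the execution step S162 of Figure 17, can be modeled as follows: an area gets a radio button only when it shows an actual instruction, exactly one button is selected at a time, and executing runs the instruction of the selected area. This is a minimal Python sketch; the area keys, class name, `run` callback, and example command strings are hypothetical. The description does not state which button is preselected (Figures 15 and 16 both show radio button 53c selected), so the initial selection is left to the caller.

```python
from typing import Callable, Dict, List

PLACEHOLDERS = {"<Not executed>", "Not applicable"}
AREA_ORDER = ("command", "basic", "advanced")  # hypothetical keys for areas 52a, 53a, 54a


class CandidateSelection:
    """Radio buttons 52c/53c/54c with exactly one selected, plus execution (S162)."""

    def __init__(self, area_texts: Dict[str, str], initial: str) -> None:
        self.area_texts = area_texts
        # A radio button is placed only next to an area showing an actual instruction.
        self.options: List[str] = [
            a for a in AREA_ORDER
            if area_texts.get(a, "<Not executed>") not in PLACEHOLDERS
        ]
        if initial not in self.options:
            raise ValueError("initial selection must be an area with a radio button")
        self.selected = initial

    def select(self, area: str) -> None:
        if area not in self.options:
            raise ValueError(f"no radio button for {area!r}")
        self.selected = area  # storing a single value deselects the previous choice

    def execute(self, run: Callable[[str], None]) -> None:
        # S162: run the instruction shown in the area whose radio button is selected.
        run(self.area_texts[self.selected])
```

Holding the selection as one value, rather than a flag per button, makes "only one selected" hold by construction.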
【0109】 When the command execution unit 28a determines in S161 that a command execution instruction has been input, it executes the instruction displayed in the candidate display area corresponding to the radio button selected at the time the command execution instruction was input (S162). Specifically, if radio button 52c is selected at that time, the command execution unit 28a executes the instruction displayed in the command candidate display area 52a corresponding to radio button 52c. If radio button 53c is selected at that time, the command execution unit 28a executes the instruction shown in the basic candidate display area 53a corresponding to radio button 53c. If radio button 54c is selected at that time, the command execution unit 28a executes the instruction shown in the advanced candidate display area 54a corresponding to radio button 54c. 【0110】 When the processing in S162 is completed, the command execution unit 28a stops displaying the interpretation result screen 50 on the display unit 22 (S163). The operation shown in Figure 17 then ends. 【0111】 Next, we will describe the operation of the voice operation support system 40 when it charges for text interpretation. 【0112】 Figure 18 is a flowchart showing the operation of the voice operation support system 40 when charging for text interpretation. 【0113】 The interpretation control unit 45c of the voice operation support system 40 performs the operation shown in Figure 18 for a target customer at predetermined times, for example, every day.
【0114】 As shown in Figure 18, the interpretation control unit 45c identifies, in the interpretation history information 44e, the history entries whose date and time of text interpretation by the voice operation support system 40 falls within a specific period and which are associated with the device ID of an image forming apparatus of the target customer (S181). Here, the interpretation control unit 45c identifies the device IDs of the target customer's image forming apparatuses based on the customer management information 44b and the device management information 44c. 【0115】 When the processing in S181 is completed, the interpretation control unit 45c calculates the charges for the target customer for the specific period based on the history entries identified in S181 and the fee structure information 44f (S182). The operation shown in Figure 18 then ends. 【0116】 Furthermore, in response to a customer instruction given via a computer (not shown), the interpretation control unit 45c can, for example, display the charges calculated in S182 broken down by interpretation type, by image forming apparatus, or by combination of interpretation type and image forming apparatus. 【0117】 As described above, the image forming system 10 displays the commands generated by two or more of the command interpretation unit 45d, basic interpretation unit 45e, and advanced interpretation unit 45f, which interpret voice-input information in different ways and convert it into commands (S126 and S146), and executes the command selected by the user from among these commands (S162). This improves the possibility of realizing the voice operation intended by the user. 【0118】 The image forming system 10 displays, in response to user instructions, the commands generated by at least one of the basic interpretation unit 45e and advanced interpretation unit 45f, which interpret voice-input information in different ways and convert it into commands (S126 or S146).
This allows the user to have the voice-input information converted using the interpretation method they want, improving convenience. 【0119】 For example, if the command-type interpretation result shown in the command candidate display area 52a of the interpretation result screen 50 displayed on the display unit 22 in S107 represents the voice operation command intended by the user, there is no need to perform basic interpretation or high-spec interpretation, and therefore no need to operate the basic interpretation execution button 53b or the high-spec interpretation execution button 54b. Similarly, if the standard specification interpretation result shown in the basic candidate display area 53a of the interpretation result screen 50 displayed on the display unit 22 in S126 represents the voice operation command intended by the user, there is no need to perform high-spec interpretation. Likewise, if the high-spec interpretation result shown in the advanced candidate display area 54a of the interpretation result screen 50 displayed on the display unit 22 in S146 represents the voice operation command intended by the user, there is no need to perform basic interpretation. The user can therefore use the image forming system 10 in such a way that another interpretation is requested only when the intended voice operation command cannot be obtained through the interpretation already performed. 【0120】 In this embodiment, when a high-spec interpretation execution instruction is input while "<Not executed>" is displayed in the basic candidate display area 53a, the image forming system 10 executes high-spec natural language type interpretation without executing standard specification natural language type interpretation.
However, when a high-spec interpretation execution instruction is input while "<Not executed>" is displayed in the basic candidate display area 53a, the image forming system 10 may instead execute both standard specification and high-spec natural language type interpretation. 【0121】 In this embodiment, the image forming system 10 charges based on the number of times interpretation is performed. However, the image forming system 10 may also charge based on a quantity other than the number of interpretations. For example, the image forming system 10 may charge based on the number of characters in the text interpreted by the voice operation support system 40. 【0122】 In this embodiment, the image forming system 10 employs a post-payment billing system. However, the image forming system 10 may instead employ a prepayment billing system. 【0123】 In this embodiment, the image forming system 10 does not restrict the execution of interpretation. However, the image forming system 10 may restrict the execution of interpretation by setting an upper limit on the number of times interpretation is performed or on the number of characters in the text to be interpreted. 【0124】 In this embodiment, the image forming system 10 displays the commands generated by the command interpretation unit 45d on the interpretation result screen 50 by default. However, the image forming system 10 may instead display the commands generated by the command interpretation unit 45d in accordance with user instructions, as it does with the commands generated by the basic interpretation unit 45e and the advanced interpretation unit 45f. 【0125】 In this embodiment, the image forming system 10 includes three interpretation units: a command interpretation unit 45d, a basic interpretation unit 45e, and an advanced interpretation unit 45f.
However, the image forming system 10 may instead include only two interpretation units with different interpretation methods, or four or more. 【0126】 In this embodiment, the image forming system 10 converts the voice input to the voice input device 30 into text on the voice input device 30. However, the image forming system 10 may instead convert the voice input to the voice input device 30 into text on the voice operation support system 40. 【0127】 The image forming apparatus 20 may have at least some of the functions of the voice input device 30 in this embodiment. If the image forming apparatus 20 has all of the functions of the voice input device 30, the image forming system 10 does not need to include the voice input device 30. 【0128】 The image forming apparatus 20 may have at least some of the functions of the voice operation support system 40 in this embodiment. If the image forming apparatus 20 has all of the functions of the voice operation support system 40, the image forming system 10 does not need to include the voice operation support system 40. 【0129】 In this embodiment, the electronic device of the present invention is an image forming apparatus. However, the electronic device of the present invention may be an electronic device other than an image forming apparatus.
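The charging procedure of Figure 18 (S181 and S182), the breakdown by interpretation type or by image forming apparatus in 【0116】, and the character-count alternative in 【0121】 can be sketched together in Python. This is an illustration under stated assumptions, not the described implementation: the record layout of the interpretation history information 44e (including a stored text length), the shape of the fee structure information 44f (modeled as a unit price per interpretation type), the half-open billing period, and all names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class HistoryEntry:
    device_id: str            # device ID of an image forming apparatus
    kind: str                 # hypothetical labels: "command", "basic", "advanced"
    interpreted_at: datetime  # date and time the interpretation was performed
    text_length: int          # characters in the interpreted text (assumed to be recorded)


def charges(history: Iterable[HistoryEntry], device_ids: Set[str],
            start: datetime, end: datetime,
            unit_price: Dict[str, int], basis: str = "count") -> Counter:
    """Charges keyed by (device_id, kind) for entries with start <= time < end.

    basis="count": unit price per interpretation (as in this embodiment).
    basis="chars": unit price per interpreted character (alternative in 0121).
    """
    if basis not in ("count", "chars"):
        raise ValueError(f"unknown basis: {basis!r}")
    totals: Counter = Counter()
    for e in history:
        # S181: keep only the target customer's devices within the period.
        if e.device_id not in device_ids or not (start <= e.interpreted_at < end):
            continue
        # S182: price each entry according to the (assumed) fee structure.
        quantity = 1 if basis == "count" else e.text_length
        totals[(e.device_id, e.kind)] += unit_price[e.kind] * quantity
    return totals


def breakdown(totals: Counter, by: str) -> Counter:
    """Aggregate per image forming apparatus (by="device") or per interpretation type (by="kind")."""
    index = {"device": 0, "kind": 1}[by]
    out: Counter = Counter()
    for key, amount in totals.items():
        out[key[index]] += amount
    return out
```

Summing the values of the returned `Counter` gives the customer's total for the period. Keying by the `(device_id, kind)` pair keeps the combined breakdown available while still allowing either single-axis view.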

Claims

1. An instruction execution system comprising: an electronic device that executes instructions; and a voice operation support system that assists in voice operation of the electronic device, wherein the voice operation support system comprises a plurality of interpreting units that interpret information input by voice in different ways and convert it into instructions; and the electronic device displays the instructions generated by each of the plurality of interpreting units, and executes the generated instructions according to execution instructions from the user.

2. The instruction execution system according to claim 1, wherein the electronic device displays the instruction generated by the interpretation unit specified by the user among a plurality of interpretation units, and executes the generated instruction in accordance with the user's execution instruction.

3. The instruction execution system according to claim 1, wherein the electronic device displays a message indicating that there is no instruction corresponding to the information if the interpretation unit is unable to convert the information input by voice into an instruction.

4. A voice operation support system that assists in the voice operation of an electronic device that executes commands, comprising a plurality of interpreting units that interpret information input by voice in different ways and convert it into commands.

5. A voice operation support program for assisting the voice operation of an electronic device that executes commands, wherein the voice operation support program causes a computer to operate as a plurality of interpreting units that interpret information input by voice in different ways and convert it into commands.

6. An electronic device that receives and displays commands generated by each of a plurality of interpreting units that interpret information input by voice in different ways and convert it into commands, and executes a command selected by the user from among these commands.

7. An instruction execution program that causes a computer to display the instructions generated by each of a plurality of interpreting units that interpret information input by voice in different ways and convert it into instructions, and to execute the instruction selected by the user from among these instructions.