Voice recognition model switching device and computer-readable recording medium
The voice recognition model switching device addresses the challenges of response speed and resource usage by adaptively selecting models based on situational information, enhancing accuracy and speed in manufacturing environments.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- FANUC LTD
- Filing Date
- 2024-03-04
- Publication Date
- 2026-07-02
AI Technical Summary
Voice recognition models used at manufacturing sites involve a trade-off: large models respond slowly and require many resources, while small models have lower recognition rates. A flexible system that adapts to varying situations is therefore needed.
A voice recognition model switching device that acquires situational information from industrial machines, selects an appropriate model from a storage unit based on this information, and switches models to optimize recognition accuracy and speed.
By dynamically selecting models according to the industrial machine's status, the system is expected to keep voice recognition stable, improve recognition accuracy, and improve response speed.
Description
Voice Recognition Model Switching Device and Computer-Readable Recording Medium
[0001] The present disclosure relates to a voice recognition model switching device and a computer-readable recording medium.
[0002] At a manufacturing site where industrial machines such as machine tools and robots are installed, the operation of an industrial machine may be controlled based on voice commands from a user (see, for example, Patent Document 1). The voice recognition model used to recognize a voice command analyzes the input voice and outputs the corresponding character string. A voice recognition model suited to the usage environment is used.
[0003] Patent Document 1: Japanese Unexamined Patent Application Publication No. 2020-042420
[0004] When a large voice recognition model capable of recognizing a wide variety of voices is used, its response to voice input may be slow, or it may require a large amount of resources. Conversely, when a small voice recognition model is used, the recognition rate decreases. It is difficult to select in advance a single voice recognition model that can handle the various situations that may arise at a manufacturing site. On site, a voice recognition mechanism that can respond flexibly to the situation is desired.
[0005] The voice recognition model switching device according to the present disclosure solves the above problems by switching the voice recognition model according to situations such as the usage environment and the work content.
[0006] One aspect of the present disclosure is a voice recognition model switching device that includes three units. A data acquisition unit acquires information related to the situation of an industrial machine operated based on a voice command. A model selection unit selects, from a model storage unit that holds a plurality of voice recognition models, a voice recognition model corresponding to the information related to the situation acquired by the data acquisition unit. A model switching unit switches to using the voice recognition model selected by the model selection unit for recognizing the voice command.
[0007] [Correction based on Rule 91 15.04.2026] Figure 1 is a schematic hardware configuration diagram of a speech recognition model switching device according to a first embodiment of the present disclosure. Figure 2 is a block diagram showing the schematic functions of the speech recognition model switching device according to the first embodiment. Figure 3 is a table diagram showing an example of multiple speech recognition models stored in the model storage unit. Figure 4 is a table diagram showing another example of multiple speech recognition models stored in the model storage unit. Figure 5 is a block diagram showing the schematic functions of a speech recognition model switching device according to another embodiment.
[0008] Embodiments of this disclosure will be described below with reference to the drawings. In the following description, components having the same or similar functions are denoted by the same reference numerals, and duplicate descriptions of these components may be omitted.
[0009] In this application, "based on XX" means "based on at least XX," and includes cases where it is based on another element in addition to XX. Furthermore, "based on XX" is not limited to cases where XX is used directly, but also includes cases where it is based on something that has been calculated or processed. "XX" is any element (for example, any information).
[0010] [First Embodiment] Figure 1 is a schematic hardware configuration diagram showing the main parts of a speech recognition model switching device according to the first embodiment of the present disclosure. The speech recognition model switching device 1 according to this embodiment can be implemented, for example, on a control device that controls industrial machinery. It can also be implemented on a personal computer attached to the control device. Another option is a personal computer, cell computer, fog computer 6, cloud server 7, or other computer connected to the control device via a wired or wireless network. This embodiment shows an example in which the speech recognition model switching device 1 is implemented on a computer that is connected via a network to a control device controlling industrial machinery.
[0011] The CPU 11 in the speech recognition model switching device 1 according to this embodiment is a processor that controls the speech recognition model switching device 1 as a whole. The CPU 11 reads the system program stored in the ROM 12 via the bus 22 and controls the entire speech recognition model switching device 1 according to the system program. The RAM 13 temporarily stores calculation data, display data, and various data acquired from external sources.
[0012] The non-volatile memory 14 is composed of, for example, a memory backed up by a battery (not shown) or an SSD (Solid State Drive), and its stored state is maintained even when the power to the voice recognition model switching device 1 is turned off. The non-volatile memory 14 stores programs and data read from external devices 72 via the interface 15, programs and data input via the input device 71, and programs and data acquired from the control device 3 that controls the industrial machine 4 or other devices via the network 5. The programs and data stored in the non-volatile memory 14 may be expanded into the RAM 13 when executed / used. In addition, various system programs, such as known analysis programs, are pre-written in the ROM 12.
[0013] Interface 15 is an interface for connecting the CPU 11 of the speech recognition model switching device 1 to an external device 72 such as a USB device. System programs, configuration data, etc., can be read from the external device 72. Programs and configuration data created or edited within the speech recognition model switching device 1 can also be stored in an external storage means via the external device 72.
[0014] Interface 20 is an interface for connecting the CPU 11 of the speech recognition model switching device 1 to a wired or wireless network 5. The network 5 may communicate using technologies such as serial communication (e.g., RS-485), Ethernet® communication, optical communication, wireless LAN, Wi-Fi®, Bluetooth®, etc. At least one control device 3 for controlling an industrial machine 4, a fog computer 6, a cloud server 7, etc., are connected to the network 5, and they exchange data with the speech recognition model switching device 1.
[0015] The display device 70 receives data via the interface 17. It displays various data loaded into memory and data obtained as a result of executing programs and the like. The input device 71 consists of at least one input device such as a keyboard, pointing device, voice input device, or imaging device. It passes commands, data, and the like based on user operations to the CPU 11 via the interface 18.
[0016] The speech recognition model switching device 1 according to this embodiment switches the speech recognition model used for recognizing voice commands in the control device 3 that controls the industrial machine 4. The control device 3 receives voice commands from the user and performs voice recognition on the voice commands. The speech recognition model switching device 1 supplies the speech recognition model to be used for this voice recognition to the control device 3 via the network 5. The control device 3 uses the speech recognition model supplied from the speech recognition model switching device 1 for voice recognition of voice commands from the user.
[0017] Figure 2 is a schematic block diagram showing the functions of the speech recognition model switching device 1 according to the first embodiment of this disclosure. Each function of the speech recognition model switching device 1 according to this embodiment is realized by the CPU 11 of the speech recognition model switching device 1 shown in Figure 1 executing a system program and controlling the operation of each part of the speech recognition model switching device 1.
[0018] The speech recognition model switching device 1 of this embodiment includes a data acquisition unit 100, a model selection unit 110, and a model switching unit 120. Furthermore, a model storage unit 200 is provided in the RAM 13 or the non-volatile memory 14 of the speech recognition model switching device 1. The model storage unit 200 is an area that stores multiple speech recognition models in advance.
[0019] [Correction based on Rule 91 15.04.2026] The data acquisition unit 100 acquires information relating to the status of the industrial machine 4 from the control device 3 of the industrial machine 4, which is operated based on voice commands. Information relating to the status of the industrial machine 4 includes, for example, information relating to the operating environment of the industrial machine 4.

One example is the level of noise, such as machine operating sounds and other people's voices, in the area around the industrial machine 4. If the surrounding noise is high, it may be difficult to recognize the voice commands spoken by the user of the industrial machine 4.

Another example is the performance of a voice input device (not shown), such as a microphone, that acquires voice commands for controlling the industrial machine 4. Recognition of the user's voice commands may vary with the performance of this device.

A further example is the load status of the control device 3 that controls the industrial machine 4. The load and memory usage of the control device 3 performing voice recognition may affect the speed of the voice recognition process.

In short, information relating to the operating environment of the industrial machine 4 may be any information about the control device 3 that affects the recognition accuracy and recognition speed of voice commands for operating the industrial machine 4. It may equally be any such information about the environment inside and outside the industrial machine 4.
[0020] Information relating to the status of the industrial machine 4 may also be, for example, information relating to the content of work performed on the industrial machine 4. Examples include information indicating the type of work, such as setting work, preparation work, automatic operation, and inspection work. Because the frequently used and important words differ with the work content, it is desirable to prioritize recognition of those words. Another example is information indicating the operating state of the industrial machine 4. For example, the set of voice commands used for operation may differ depending on whether a tool is being changed, a workpiece is being changed, or machining is in progress with the tool in contact with the workpiece. In short, this information may be any information about the state of the control device 3 and the industrial machine 4 that changes two things: which voice commands are used to operate the industrial machine, and which voice commands should be recognized with high accuracy.
[0021] The data acquisition unit 100 acquires information related to the status of the industrial machine 4 from sources such as information set in the industrial machine 4 and the control device 3, information indicating the operating status of processes running on the control device 3, the type of screen displayed on the display device (not shown) of the control device 3, feedback information from drive units such as motors (not shown), information indicating the status of each signal input to the control device 3, and information acquired from sensors attached to the industrial machine 4. The acquired information related to the status of the industrial machine 4 is then output to the model selection unit 110.
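The following is a minimal sketch of how the status information gathered by the data acquisition unit 100 might be bundled before it is handed to the model selection unit 110. The class and function names, the field names, the units, and the keys of `raw` are all hypothetical. The `raw` dictionary merely stands in for the sources listed in [0021], and the disclosure does not define any data format.

```python
from dataclasses import dataclass
from enum import Enum


class WorkType(Enum):
    # Work types named in paragraph [0020]
    SETTING = "setting"
    PREPARATION = "preparation"
    AUTOMATIC_OPERATION = "automatic_operation"
    INSPECTION = "inspection"


@dataclass(frozen=True)
class MachineStatus:
    """Hypothetical snapshot of the status of industrial machine 4."""
    noise_db: float        # ambient noise around the machine (dB, assumed unit)
    mic_quality: float     # voice input device performance, 0.0 (poor) to 1.0 (good)
    cpu_load: float        # load of control device 3, 0.0 to 1.0
    memory_usage: float    # memory usage of control device 3, 0.0 to 1.0
    work_type: WorkType    # content of the work currently performed


def acquire_status(raw: dict) -> MachineStatus:
    """Convert raw values read from the control device into a MachineStatus.

    The keys of `raw` are illustrative placeholders for settings, process
    state, screen type, motor feedback, input signals and sensor readings.
    """
    return MachineStatus(
        noise_db=float(raw["noise_db"]),
        mic_quality=float(raw["mic_quality"]),
        cpu_load=float(raw["cpu_load"]),
        memory_usage=float(raw["memory_usage"]),
        work_type=WorkType(raw["work_type"]),  # raises ValueError for unknown work types
    )
```

Freezing the dataclass keeps one snapshot consistent while the selection logic reads it.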
[0022] The model selection unit 110 selects a speech recognition model from among the multiple speech recognition models stored in the model storage unit 200. The selected model corresponds to the status information acquired by the data acquisition unit 100. The model storage unit 200 stores these models in advance, each associated with information on its performance in the respective situations. Figure 3 is a table diagram showing an example of speech recognition models stored in the model storage unit.

In the example in Figure 3, free speech model A has high noise immunity for speech recognition, but it uses a lot of memory and imposes a high execution load. Free speech model C has low memory usage and a low execution load, but it has low noise immunity. When such models are stored, the model selection unit 110 selects free speech model A if the status information of the industrial machine 4 indicates a lot of ambient noise. It selects free speech model C if that information indicates little ambient noise. Likewise, it selects free speech model C if the status information indicates high memory usage, and free speech model A if it indicates low memory usage.

Each item of status information considered in model selection may be assigned a priority. For example, in the above example, memory usage and execution load may be given priority over noise immunity. This is because if there are not enough resources to run speech recognition with the model in the first place, speech recognition itself will not be possible. Such priorities can be set in advance at the design stage.
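The priority rule in [0022] can be sketched as follows, assuming resources are checked before noise. The thresholds `NOISE_HIGH_DB`, `MEMORY_HIGH` and `LOAD_HIGH` and the function name `select_by_priority` are hypothetical, because the disclosure gives no values. The noise rule and the memory rule can disagree, for example when both noise and memory usage are low. This sketch resolves that conflict by letting ample resources only *permit* model A and letting noise make the final choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str
    noise_immunity: str  # "high" or "low", as in the Figure 3 example
    memory_use: str
    exec_load: str


# The two models described for Figure 3 (other table rows are not reproduced).
MODEL_A = ModelSpec("free speech model A", "high", "high", "high")
MODEL_C = ModelSpec("free speech model C", "low", "low", "low")

# Hypothetical thresholds; the disclosure does not specify values.
NOISE_HIGH_DB = 70.0
MEMORY_HIGH = 0.8
LOAD_HIGH = 0.8


def select_by_priority(noise_db: float, memory_usage: float, cpu_load: float) -> ModelSpec:
    # Priority 1: resources. Without enough headroom, recognition cannot run
    # at all, so fall back to the lightweight model regardless of noise.
    if memory_usage >= MEMORY_HIGH or cpu_load >= LOAD_HIGH:
        return MODEL_C
    # Priority 2: noise. Resources are sufficient, so decide by noise immunity.
    if noise_db >= NOISE_HIGH_DB:
        return MODEL_A
    return MODEL_C
```

Checking the resource condition first means a noisy but overloaded controller still receives a model it can actually run.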
Alternatively, each item of information indicating the status of the industrial machine 4 may be quantified and weighted. The speech recognition model may then be selected using an evaluation value calculated from the weighted values.
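The sketch below is one possible weighted evaluation value. The disclosure does not give a formula, so the scoring function, the numeric model attributes, and the weights `W_NOISE` and `W_RESOURCE` are all assumptions. Each status item is normalised to 0..1. A model earns credit for noise immunity in proportion to the noise level and pays a penalty for its resource cost in proportion to how scarce resources are. The model with the highest value is selected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    noise_immunity: float  # 0.0 (low) to 1.0 (high)
    resource_cost: float   # combined memory use and execution load, 0.0 to 1.0


# Hypothetical quantification of the two Figure 3 models.
MODELS = [
    Model("free speech model A", noise_immunity=0.9, resource_cost=0.9),
    Model("free speech model C", noise_immunity=0.2, resource_cost=0.2),
]

# Hypothetical weights: resources count more than noise, mirroring the
# priority discussed in [0022].
W_NOISE = 1.0
W_RESOURCE = 2.0


def evaluation_value(model: Model, noise: float, memory_usage: float, cpu_load: float) -> float:
    """Weighted score of one model; all status inputs are normalised to 0..1."""
    pressure = max(memory_usage, cpu_load)          # how scarce resources are now
    benefit = W_NOISE * noise * model.noise_immunity
    cost = W_RESOURCE * pressure * model.resource_cost
    return benefit - cost


def select_by_score(noise: float, memory_usage: float, cpu_load: float) -> Model:
    """Return the stored model with the highest evaluation value."""
    return max(MODELS, key=lambda m: evaluation_value(m, noise, memory_usage, cpu_load))
```

For noise 0.9 and resource pressure 0.1, model A scores 0.81 - 0.18 = 0.63 and model C scores 0.18 - 0.04 = 0.14, so model A is selected. Changing the weights shifts the balance without changing the selection code.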
[0023] Figure 4 is a table diagram showing other examples of speech recognition models stored in the model storage unit. In the example in Figure 4, numerical model A is designed to recognize words related to numbers with high accuracy. Inspection model A is designed to recognize words frequently used in inspection work with high accuracy. The general model recognizes words commonly used at manufacturing sites with reasonable accuracy, without specializing in any particular task. When such models are stored, the model selection unit 110 selects numerical model A if the status information of the industrial machine 4 indicates that the current user is performing setting work, such as setting an offset value. It selects inspection model A if that information indicates that an inspection is in progress.
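Here is a minimal sketch of the work-content selection for the Figure 4 example. The model names come from the text. The work-type strings, the `MODEL_FOR_WORK` table, and the fallback to the general model for other work types are assumptions; the disclosure only states the setting-work and inspection cases.

```python
# Models from the Figure 4 example.
NUMERICAL_MODEL_A = "numerical model A"
INSPECTION_MODEL_A = "inspection model A"
GENERAL_MODEL = "general model"

# Hypothetical mapping from work type to a specialised model.
MODEL_FOR_WORK = {
    "setting": NUMERICAL_MODEL_A,      # e.g. entering an offset value: number words matter
    "inspection": INSPECTION_MODEL_A,  # inspection vocabulary matters
}


def select_by_work(work_type: str) -> str:
    """Return the specialised model for the work type, else the general model (assumed fallback)."""
    return MODEL_FOR_WORK.get(work_type, GENERAL_MODEL)
```

A table lookup keeps the mapping editable as data, so adding a model for a new work type does not require touching the selection function.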
[0024] The model selection unit 110 can also select a model by combining two kinds of status information of the industrial machine 4: information relating to its operating environment and information relating to the work performed on it. For example, for each type of work, multiple models suited to different operating environments are prepared and stored in the model storage unit 200. The data acquisition unit 100 acquires both kinds of information for the industrial machine 4. Based on that information, a voice recognition model suitable for the situation can then be selected from among the prepared models.
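The combined selection in [0024] can be sketched as a table keyed by work type and noise class. All model names in `COMBINED_MODELS`, the noise threshold, and the general-model fallback are hypothetical placeholders. The disclosure does not name the per-environment variants.

```python
from typing import Dict, Tuple

NOISE_HIGH_DB = 70.0  # hypothetical threshold separating quiet and noisy environments
GENERAL_MODEL = "general model"

# Hypothetical table: one model per (work type, operating environment) pair.
COMBINED_MODELS: Dict[Tuple[str, str], str] = {
    ("setting", "quiet"): "numerical model (lightweight)",
    ("setting", "noisy"): "numerical model (noise-robust)",
    ("inspection", "quiet"): "inspection model (lightweight)",
    ("inspection", "noisy"): "inspection model (noise-robust)",
}


def noise_class(noise_db: float) -> str:
    """Reduce the measured noise level to a coarse environment class."""
    return "noisy" if noise_db >= NOISE_HIGH_DB else "quiet"


def select_combined(work_type: str, noise_db: float) -> str:
    """Select a model from the work type and the operating environment together."""
    key = (work_type, noise_class(noise_db))
    return COMBINED_MODELS.get(key, GENERAL_MODEL)
```

More environment dimensions, such as controller load, could be added as further elements of the key. The table size then grows with the product of the classes.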
[0025] The model selection unit 110 outputs the model selected in this manner to the model switching unit 120.
[0026] The model switching unit 120 switches to use the voice recognition model selected by the model selection unit 110 for voice recognition of voice commands. The model switching unit 120 transmits the voice recognition model selected by the model selection unit 110 to the control device 3 that controls the industrial machine 4 via the network 5 and instructs it to use it for voice recognition of voice commands. The control device 3, upon receiving this instruction, then uses the voice recognition model sent from the model switching unit 120 to recognize subsequent voice commands received from the user.
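The interaction in [0026] is sketched below with in-process stand-ins for the model switching unit 120 and the control device 3. Several elements are assumptions for illustration: a "model" is a plain function from (already transcribed) input to text, the network transfer is a method call, and the check that skips re-sending a model already in use. The class and method names are hypothetical as well.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a speech recognition model: input -> recognized text.
Recognizer = Callable[[str], str]


class ControlDevice:
    """Stand-in for control device 3, which performs the actual recognition."""

    def __init__(self, initial_name: str, initial_model: Recognizer) -> None:
        self.model_name = initial_name
        self._model = initial_model

    def use_model(self, name: str, model: Recognizer) -> None:
        # Instruction from the model switching unit: recognize subsequent
        # voice commands with the received model.
        self.model_name = name
        self._model = model

    def recognize(self, audio: str) -> str:
        return self._model(audio)


class ModelSwitchingUnit:
    """Stand-in for model switching unit 120."""

    def __init__(self, storage: Dict[str, Recognizer], device: ControlDevice) -> None:
        self._storage = storage  # plays the role of model storage unit 200
        self._device = device

    def switch_to(self, name: str) -> None:
        if self._device.model_name == name:
            return  # already in use; avoid re-sending the model (assumed optimization)
        self._device.use_model(name, self._storage[name])
```

Usage: construct `ControlDevice("general model", storage["general model"])`, wrap it in `ModelSwitchingUnit(storage, device)`, and call `switch_to(...)` with the name returned by the model selection unit. Only commands recognized after the call use the new model.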
[0027] The voice recognition model switching device 1 according to this embodiment, having the above configuration, can select a voice recognition model according to the status of the industrial machine 4 and use the selected voice recognition model for recognizing voice commands. As a result, it is expected that the stability of voice recognition operation according to the status of the industrial machine 4 will be maintained, the accuracy of voice recognition will be improved, and the response speed will be improved.
[0028] [Second Embodiment] The following describes a speech recognition model switching device according to a second embodiment of the present disclosure. The speech recognition model switching device 1 according to this embodiment has the same hardware configuration as the speech recognition model switching device 1 according to the first embodiment.
[0029] Like the speech recognition model switching device 1 of the first embodiment, the speech recognition model switching device 1 of this embodiment includes a data acquisition unit 100, a model selection unit 110, and a model switching unit 120. Furthermore, a model storage unit 200 is provided in the RAM 13 or the non-volatile memory 14 of the speech recognition model switching device 1. The model storage unit 200 is an area that stores multiple speech recognition models in advance.
[0030] The function of the model switching unit 120 in the speech recognition model switching device 1 according to this embodiment is the same as that of the model switching unit 120 according to the first embodiment.
[0031] The data acquisition unit 100 acquires information relating to the status of the industrial machine 4 from the control device 3 of the industrial machine 4, which is operated based on voice commands. In this embodiment, the data acquisition unit 100 accepts the user's specification of a voice recognition model as information relating to the status of the industrial machine 4.

The specification may name the model directly by its name or identification information. For example, if the voice recognition models illustrated in Figure 4 are stored in the model storage unit 200, the voice command "Switch the voice recognition model to numerical model A" can be accepted as such information.

Alternatively, the specification may request a change of model indirectly. For example, if the voice recognition models illustrated in Figure 3 are stored in the model storage unit 200, the voice command "Switch to a voice recognition model with higher noise immunity" can be accepted as such information. At this time, information indicating which voice recognition model the control device 3 is currently using may also be accepted.

The data acquisition unit 100 outputs the received specification to the model selection unit 110 as information relating to the status of the industrial machine 4.
[0032] The model selection unit 110 selects a speech recognition model corresponding to the status information acquired by the data acquisition unit 100 from among a plurality of speech recognition models stored in the model storage unit 200. In this embodiment, when the model selection unit 110 receives a specification for a speech recognition model as information relating to the status of the industrial machine 4, it selects the specified speech recognition model from among a plurality of speech recognition models stored in the model storage unit 200. If the name or identification information of the speech recognition model is specified directly, the model selection unit 110 selects the specified speech recognition model. If the specification is indirect, relating to the performance of the speech recognition model, it selects a speech recognition model that satisfies the specified performance by comparing it with the currently used speech recognition model.
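The sketch below shows one way of resolving a direct or an indirect specification, using the two Figure 3 models for both cases. The numeric noise-immunity ranking is an assumption, and so is the simple substring matching on the recognized command text, which stands in for real intent parsing. For an indirect request, it picks the least demanding model that is still more noise-robust than the current one, which is another assumption. The function name is hypothetical.

```python
from typing import Dict, Optional

# Hypothetical noise-immunity ranking of the Figure 3 models (higher = more robust).
NOISE_IMMUNITY: Dict[str, int] = {
    "free speech model A": 2,
    "free speech model C": 1,
}


def select_from_command(command: str, current: str) -> Optional[str]:
    """Return the model named or implied by the user's command, or None.

    Direct specification: the command contains a stored model name.
    Indirect specification: the command asks for higher noise immunity,
    resolved relative to the model currently in use.
    """
    text = command.lower()
    # Direct: look for a stored model name in the recognized command.
    for name in NOISE_IMMUNITY:
        if name.lower() in text:
            return name
    # Indirect: among models more robust than the current one, take the least robust.
    if "higher noise immunity" in text:
        better = [n for n, v in NOISE_IMMUNITY.items() if v > NOISE_IMMUNITY[current]]
        return min(better, key=NOISE_IMMUNITY.get) if better else None
    return None
```

Returning `None` when no more robust model exists lets the caller keep the current model instead of switching arbitrarily.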
[0033] The model selection unit 110 outputs the selected model to the model switching unit 120. The model switching unit 120 then switches to using the selected speech recognition model for speech recognition of voice commands.
[0034] The voice recognition model switching device 1 according to this embodiment, which has the above configuration, can switch voice recognition models based on voice commands from the user. As a result, the user, who is aware of the status of the industrial machine 4, can switch to the appropriate voice recognition model.
[0035] [Other Embodiments] In the embodiments described above, the control device 3 is configured to perform voice recognition of voice commands. However, as illustrated in Figure 5, for example, the functions of the voice recognition model switching device 1 and the voice recognition unit 130 may be provided on a PC installed alongside the control device 3. In this case, the control device 3 is operated based on the voice commands recognized on the PC. The model switching unit 120 then switches the voice recognition model used in the voice recognition unit 130.
[0036] Furthermore, the above-described embodiment shows a configuration in which the model storage unit 200 is provided on the speech recognition model switching device 1. However, the model storage unit 200 may be provided on other devices such as a fog computer 6 or a cloud server 7. In this case, the speech recognition model switching device 1 accesses the model storage unit 200 via the network 5. With this configuration, it becomes possible to centrally manage speech recognition models in a manufacturing site where many speech recognition model switching devices 1, control devices 3, and industrial machines 4 are installed.
[0037] While embodiments of this disclosure have been described in detail above, this disclosure is not limited to the individual embodiments described above. These embodiments may be added to, replaced, modified, or partially deleted in any way that does not depart from the spirit of the invention, or from the idea and intent of this disclosure derived from the claims and their equivalents. For example, the order of operations and processes in the embodiments described above is shown only as an example and is not limiting. The same applies to any numerical values or mathematical formulas used in the description of the embodiments.
[0038] The following are notes relating to embodiments of the present disclosure. (Note 1) A speech recognition model switching device (1) according to one aspect of the present disclosure includes: a data acquisition unit (100) that acquires information relating to the status of an industrial machine (4) operated based on a voice command; a model selection unit (110) that selects a speech recognition model corresponding to the status information acquired by the data acquisition unit (100) from a model storage unit (200) that holds a plurality of speech recognition models; and a model switching unit (120) that switches to using the speech recognition model selected by the model selection unit (110) for recognizing the voice command.
[0039] (Note 2) The information relating to the status of the industrial machine (4) acquired by the speech recognition model switching device (1) according to another aspect of this disclosure is at least one of the information relating to the operating environment of the industrial machine (4) and the information relating to the work content of the work performed on the industrial machine (4). (Note 3) The information relating to the status of the industrial machine (4) acquired by the speech recognition model switching device (1) according to another aspect of this disclosure is the designation of the speech recognition model by the voice command of the user of the industrial machine (4).
[0040] (Note 4) A computer-readable recording medium according to one aspect of the present disclosure records a program that causes the computer to operate as a data acquisition unit (100) that acquires information relating to the status of an industrial machine (4) operated based on voice commands, a model selection unit (110) that selects a voice recognition model corresponding to the status information acquired by the data acquisition unit (100) from a model storage unit (200) that holds a plurality of voice recognition models, and a model switching unit (120) that switches the voice recognition model selected by the model selection unit (110) to be used for recognizing the voice commands.
[0041]
- 1: Voice recognition model switching device
- 3: Control device
- 4: Industrial machine
- 5: Network
- 6: Fog computer
- 7: Cloud server
- 11: CPU
- 12: ROM
- 13: RAM
- 14: Non-volatile memory
- 15, 17, 18, 20: Interface
- 22: Bus
- 70: Display device
- 71: Input device
- 72: External device
- 100: Data acquisition unit
- 110: Model selection unit
- 120: Model switching unit
- 130: Voice recognition unit
- 200: Model storage unit
Claims
1. A voice recognition model switching device comprising: a data acquisition unit that acquires information relating to the status of an industrial machine operated based on voice commands; a model selection unit that selects a voice recognition model corresponding to the status information acquired by the data acquisition unit from a model storage unit that holds a plurality of voice recognition models; and a model switching unit that switches the voice recognition model selected by the model selection unit to be used for recognizing the voice commands.
2. The voice recognition model switching device according to claim 1, wherein the information relating to the status of the industrial machine is at least one of the information relating to the operating environment of the industrial machine and the information relating to the work content of the work performed on the industrial machine.
3. The voice recognition model switching device according to claim 1, wherein the information relating to the status of the industrial machine is the designation of a voice recognition model by a voice command of the user of the industrial machine.
4. A computer-readable recording medium that records a program causing a computer to operate as: a data acquisition unit that acquires information relating to the status of industrial machinery operated based on voice commands; a model selection unit that selects a voice recognition model corresponding to the status information acquired by the data acquisition unit from a model storage unit that holds multiple voice recognition models; and a model switching unit that switches the voice recognition model selected by the model selection unit to be used for recognizing the voice commands.