A method and system for voice recognition control of machine tools

By synchronously collecting and recognizing the direct and reflected sounds of voice commands in an industrial workshop, calculating the number of repetitions and delay time within a time interval, and generating a time vector, the problem of misjudging the source of voice commands is solved, the accuracy of sound source localization and the reliability of operation signals are improved, and industrial safety risks are reduced.

CN122201274A · Pending · Publication Date: 2026-06-12 · NANJING UNIV OF AERONAUTICS & ASTRONAUTICS


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: NANJING UNIV OF AERONAUTICS & ASTRONAUTICS
Filing Date: 2026-02-06
Publication Date: 2026-06-12

Application Information

Patent Timeline

06 Feb 2026: Application
12 Jun 2026: Publication (CN122201274A)

IPC: G10L15/20; G10L15/22; G10L15/08; G10L15/26; G10L25/03; G10L25/27; G10L25/51

AI Tagging

Application Domain

Speech recognition


AI Technical Summary

⚠Technical Problem

In complex acoustic environments, voice commands undergo multipath reflection and superposition during propagation, leading to misjudgment of the source of the voice command, insufficient spatial resolution, increased industrial safety risks, and decreased production efficiency.

⚗Method used

By arranging several sound sensors at preset workstations, the direct and reflected sounds of voice commands are collected synchronously. The system performs text recognition and divides time intervals, calculates the number of repetitions to predict the delay time, and generates Class I and Class II time vectors. Based on these vectors, the system calculates the preliminary spatial features of the voice command sound source and performs comparison and verification to ensure the accuracy of the operation signal.

🎯Benefits of technology

It significantly improves the accuracy of sound source localization, reduces localization errors caused by reflected sound interference, ensures accurate judgment of sound source location, and improves the reliability of operating signals and the overall performance of the system.

✦ Generated by Eureka AI based on patent content.

Smart Images

Figure CN122201274A_ABST

Patent Text Reader

Abstract

The application provides a method and system for controlling a machine tool by voice recognition. The system comprises an instruction mapping module, a voice input and recognition module, and a voice processing module. The method synchronously collects the direct sound and reflected sound of voice instructions, performs text recognition on both, divides time intervals, and calculates the number of repetitions in each interval to predict the delay time. Based on this, the arrival times of the direct sound and the reflected sound received by each sound sensor are set, and a first-class time vector and a second-class time vector are generated. This process not only effectively distinguishes the direct sound from the reflected sound, but also provides accurate time data for subsequent sound source positioning and operation signal verification through the generated time vectors, significantly improving the accuracy of sound source positioning.


Description

Technical Field

[0001] This invention relates to the field of CNC machine tool control technology, and more particularly to a method and system for controlling machine tools using voice recognition.

Background Technology

[0002] In the current trend of intelligent development in manufacturing, existing voice recognition CNC machine tool systems have improved the convenience of operation to a certain extent. By pre-setting the associated templates corresponding to different voice information, a bridge is built between voice and machine tool control. After the operator issues a voice command, the system can quickly and accurately retrieve the corresponding control command based on the preset associated template and input it into the CNC machine tool for execution in a timely manner.

[0003] In complex acoustic environments such as industrial workshops, voice recognition technology is increasingly used to control machine tools and start and stop equipment. However, due to the presence of numerous hard reflective surfaces and operating equipment in workshops, voice commands undergo multiple reflections and superpositions during propagation. Sound signals often arrive at the sound acquisition device simultaneously along multiple paths, resulting in multipath interference in the collected voice commands. Existing voice recognition technologies largely rely on single-channel and simple beamforming methods, making it difficult to effectively distinguish between direct and reflected sound. This leads to increased positioning errors and insufficient spatial resolution of voice command sources. In situations where workstations are adjacent or multiple people are working simultaneously, this positioning error may cause the system to misjudge the location of the voice command, thus incorrectly triggering machine tool actions, leading to potential industrial safety risks and decreased production efficiency.

Summary of the Invention

[0004] The purpose of this invention is to provide a method and system for controlling machine tools by voice recognition. This invention addresses the shortcomings of existing technologies by solving the problem that sound in the workshop can enter the sound collection device through multiple paths, leading to misjudgment of the source of voice commands, which in turn results in insufficient spatial resolution of the voice system and thus causes industrial risks.

[0005] This invention is achieved through the following technical solution:

[0006] This invention relates to a method for controlling a machine tool using voice recognition, comprising the following steps:

[0007] Step S1: Collect machine tool language commands and corresponding operation signals, establish a mapping relationship between machine tool voice commands and operation signals, and generate a command library;

[0008] Step S2: Arrange several sound sensors at a preset workstation to synchronously collect the direct sound and reflected sound of the voice command, and perform character recognition on the direct sound and reflected sound. Based on the character recognition results, divide the time interval and calculate the number of repetitions of each character in different time intervals to predict the delay time. Set the arrival time of the direct sound received by each sound sensor and generate a first-class time vector; set the arrival time of the reflected sound received by each sound sensor and generate a second-class time vector.

[0009] Step S3: Calculate the preliminary spatial features of the voice command source based on the first-class time vector, compare the results with the spatial features of the preset workstation, and generate a comparison result; generate a preparatory operation signal based on the comparison result.

[0010] Based on the comparison results, the preparatory operation signal is verified. If the verification is successful, the preparatory operation signal is updated to the operation signal.

[0011] Based on the instruction library, the machine tool language instructions associated with the operation signals are extracted and output.
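As a rough illustration of the command library built in step S1 and used in the final lookup, the two mappings can be held in dictionaries. This is a minimal sketch, not the patent's implementation: the class `CommandLibrary`, its method names, the signal names, and the sample entries are all hypothetical, and treating a machine tool language instruction as an M-code string is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CommandLibrary:
    """Hypothetical command library: voice text -> operation signal -> instruction."""

    voice_to_signal: Dict[str, str] = field(default_factory=dict)
    signal_to_instruction: Dict[str, str] = field(default_factory=dict)

    def register(self, voice_text: str, signal: str, instruction: str) -> None:
        # Step S1: record both mappings for one voice command.
        self.voice_to_signal[voice_text] = signal
        self.signal_to_instruction[signal] = instruction

    def lookup_signal(self, voice_text: str) -> Optional[str]:
        # Returns None for text that is not a registered command.
        return self.voice_to_signal.get(voice_text)

    def lookup_instruction(self, signal: str) -> Optional[str]:
        # Final step: extract the instruction associated with a verified signal.
        return self.signal_to_instruction.get(signal)


library = CommandLibrary()
# Sample entries only (assumed for illustration, not taken from the patent).
library.register("start spindle", "SPINDLE_ON", "M03")
library.register("stop spindle", "SPINDLE_OFF", "M05")
```

Keeping the signal as an intermediate key mirrors the text, where a preparatory operation signal is verified before any machine tool instruction is output.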

[0012] Preferably, in step S2, the plurality of sound sensors are arranged in a rectangular pattern; wherein the method for determining the spacing between adjacent sound sensors is as follows:

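[0013] The frequency range of the target speech is obtained as (fmin, fmax), and the target wavelength range $(\lambda_{\min}, \lambda_{\max})$ is calculated according to the formula $\lambda = c / f$, so that $\lambda_{\min} = c / f_{\max}$ and $\lambda_{\max} = c / f_{\min}$;

[0014] In the formula, c is the speed of sound, which is taken as 343 m/s, and $\lambda$ is the wavelength;

[0015] The spacing is then determined from the wavelength range and the beamwidth (the spacing formula and the beamwidth symbol are missing from the source text).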
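The wavelength step can be checked numerically. The sketch below only applies the relation wavelength = c / f with c = 343 m/s as stated in the text; the function name `wavelength_range` is hypothetical, and the spacing formula itself is not reproduced because it does not survive in this text.

```python
SPEED_OF_SOUND_M_S = 343.0  # value stated in the text


def wavelength_range(f_min_hz: float, f_max_hz: float,
                     c: float = SPEED_OF_SOUND_M_S) -> tuple:
    """Return (lambda_min, lambda_max) in metres for a frequency band in Hz."""
    if not (0 < f_min_hz <= f_max_hz):
        raise ValueError("expected 0 < f_min_hz <= f_max_hz")
    # The highest frequency gives the shortest wavelength, and vice versa.
    return c / f_max_hz, c / f_min_hz


# Frequency band used in the embodiment's worked example: 1 kHz to 10 kHz.
lam_min, lam_max = wavelength_range(1_000.0, 10_000.0)
```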

[0016] Preferably, in step S2, the specific process of generating the first-class time vector and the second-class time vector is as follows:

[0017] (1) Select any sound sensor and identify the voice commands corresponding to the direct sound and reflected sound received by it to obtain a set of recurring target text;

[0018] (2) Record the timestamp t0 of the first byte of the target text in the storage unit, and retrieve the timestamp t1 of the first byte of the next set of repeated target text. Calculate the difference t0-ty between the timestamp t0 and the delay time ty, and set t0-ty as the arrival time of the direct sound received by the sound sensor.

[0019] (3) Calculate the difference t1-ty between the timestamp t1 and the delay time ty, and set t1-ty as the arrival time of the reflected sound received by the sound sensor;

[0020] (4) Perform the above steps on other sound sensors in sequence, calculate the arrival time of direct sound and reflected sound received by each sound sensor, and generate a first-class time vector and a second-class time vector.
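Steps (1) to (4) can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the names `SensorTimestamps` and `build_time_vectors` are hypothetical, timestamps are taken to be seconds on a clock shared by all sensors, and the delay time ty is assumed to be available per sensor.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class SensorTimestamps:
    t0: float  # timestamp of the first occurrence of the target text
    t1: float  # timestamp of the next repeated occurrence
    ty: float  # predicted delay time (assumed per sensor)


def build_time_vectors(
    readings: Sequence[SensorTimestamps],
) -> Tuple[List[float], List[float]]:
    """Return (first-class vector, second-class vector), one entry per sensor."""
    first_class = [r.t0 - r.ty for r in readings]   # direct-sound arrival times
    second_class = [r.t1 - r.ty for r in readings]  # reflected-sound arrival times
    return first_class, second_class


T1, T2 = build_time_vectors([
    SensorTimestamps(t0=1.0, t1=1.5, ty=0.25),
    SensorTimestamps(t0=2.0, t1=2.75, ty=0.5),
])
```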

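[0021] Preferably, the first-class time vector is T1, whose n components are the arrival times of the direct sound at the respective sound sensors, and the second-class time vector is T2, whose n components are the arrival times of the reflected sound at the respective sound sensors, where n is the total number of sound sensors in the preset workstation.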

[0022] Preferably, the calculation process for the delay time ty is as follows:

[0023] (1) Divide the target text into several consecutive equal-length intervals according to the time sequence, count the number of times each character in the target text appears in each interval and sum them up to obtain the number of repetitions in that interval, and then obtain the repetition distribution sequence.

[0024] (2) Calculate the mean and standard deviation of the number of repetitions in each time interval, select the intervals with the number of repetitions greater than the mean plus the standard deviation as candidate intervals, and sort the candidate intervals according to the time sequence;

[0025] (3) Select the first interval with the largest repetition frequency and the earliest position from the candidate intervals as the direct sound interval, and record the center time of the interval as the direct sound reference time;

[0026] (4) Within several intervals after the direct sound interval, select the interval with the second largest number of repetitions as the reflected sound interval. If there are multiple intervals with the same number of repetitions, select the earliest appearing interval as the reflected sound interval and record the center time of the interval as the reflected sound reference time.

[0027] (5) Calculate the difference between the direct sound reference time and the reflected sound reference time, and calculate the delay time ty.
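The five steps above can be sketched in a few lines. Several details are not fixed by the text and are assumptions here: recognized characters arrive as (timestamp, character) pairs, the standard deviation is the population form, "several intervals" is a configurable `lookahead`, the reflected interval is the highest-count interval inside that lookahead window (earliest on ties), and ty is returned as reflected minus direct. The function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Optional, Sequence, Tuple


def repetition_counts(events: Sequence[Tuple[float, str]], target_text: str,
                      start: float, interval_len: float,
                      n_intervals: int) -> List[int]:
    """Step (1): count target-text characters falling in each equal-length interval."""
    targets = set(target_text)
    counts = [0] * n_intervals
    for t, ch in events:
        idx = int((t - start) // interval_len)
        if ch in targets and 0 <= idx < n_intervals:
            counts[idx] += 1
    return counts


def estimate_delay(counts: Sequence[int], start: float, interval_len: float,
                   lookahead: int = 5) -> Optional[float]:
    """Steps (2)-(5): return ty, or None if no direct/reflected pair is found."""
    if not counts:
        return None
    # Step (2): candidates exceed mean + standard deviation (already in time order).
    threshold = mean(counts) + pstdev(counts)
    candidates = [i for i, c in enumerate(counts) if c > threshold]
    if not candidates:
        return None
    # Step (3): largest count wins; the earliest interval wins ties.
    direct = max(candidates, key=lambda i: (counts[i], -i))
    # Step (4): search the next `lookahead` intervals for the reflected sound.
    window = range(direct + 1, min(direct + 1 + lookahead, len(counts)))
    if len(window) == 0:
        return None
    reflected = max(window, key=lambda i: (counts[i], -i))
    if counts[reflected] == 0:
        return None

    def center(i: int) -> float:
        return start + (i + 0.5) * interval_len

    # Step (5): difference of the two interval-center reference times.
    return center(reflected) - center(direct)


example_counts = [0, 6, 1, 0, 4, 0, 0, 0]
ty = estimate_delay(example_counts, start=0.0, interval_len=0.01)
```

In the example, intervals 1 and 4 exceed the threshold, interval 1 is taken as the direct sound, and interval 4 is the strongest interval in the following window, so ty is three interval lengths.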

[0028] Preferably, in step S3, the preliminary spatial features of the voice command source calculated based on the first-class time vector are compared with the spatial features of the preset workstation to generate a comparison result. The specific process is as follows:

[0029] S301. Based on the rectangular arrangement of sound sensors within the preset workstation, set the plane coordinate axis. Arbitrarily select one sound sensor as the origin, and calculate the time difference between the other sound sensors when receiving a voice command and the reference point. Select the two sound sensor points associated with the maximum and minimum time differences, and obtain the coordinates (x1, y1) and (x2, y2) of these two sound sensor points respectively. Let the coordinates of the sound source point be (xs, ys), and construct a function formula to calculate the coordinates (xs, ys) of the sound source point:

[0030] ;

[0031] ;

[0032] Where t1 and t2 are the arrival times of the direct sound recorded by the sound sensor at the corresponding selected point;

[0033] S302. Let the horizontal range of the preset workstation in the coordinate axis be (xmin, xmax), and let the vertical range of the preset workstation in the coordinate axis be (ymin, ymax). When the sound source point (xs, ys) simultaneously satisfies: xmin≤xs≤xmax, ymin≤ys≤ymax, it is confirmed that the sound source point is located within the preset workstation, and a preparatory operation signal is generated.

[0034] When the sound source location (xs, ys) satisfies only one of xmin≤xs≤xmax and ymin≤ys≤ymax, the spatial characteristics of the sound source location are marked as questionable.

[0035] If the sound source location (xs, ys) satisfies neither xmin≤xs≤xmax nor ymin≤ys≤ymax, it is confirmed that the sound source location is outside the preset workstation.

[0036] S303. After integration, the comparison results are generated.
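The range check in S302 and S303 amounts to a three-way classification. The sketch below assumes the source coordinates (xs, ys) have already been computed, and it reads the "questionable" case as exactly one of the two range conditions holding, which is an interpretation rather than something the text states outright. The names `Comparison` and `compare_with_workstation` are hypothetical.

```python
from enum import Enum
from typing import Tuple


class Comparison(Enum):
    INSIDE = "inside"
    QUESTIONABLE = "questionable"
    OUTSIDE = "outside"


def compare_with_workstation(xs: float, ys: float,
                             x_range: Tuple[float, float],
                             y_range: Tuple[float, float]) -> Comparison:
    """Classify a source point against the workstation's rectangular bounds."""
    x_ok = x_range[0] <= xs <= x_range[1]
    y_ok = y_range[0] <= ys <= y_range[1]
    if x_ok and y_ok:
        return Comparison.INSIDE        # both ranges satisfied
    if x_ok or y_ok:
        return Comparison.QUESTIONABLE  # exactly one range satisfied (assumed reading)
    return Comparison.OUTSIDE           # neither range satisfied


result = compare_with_workstation(0.5, 2.4, x_range=(0.0, 1.0), y_range=(0.0, 2.0))
```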

[0037] Preferably, in step S3, generating a preparatory operation signal based on the comparison result includes:

[0038] If the comparison result shows that the sound source location is within the preset workstation, the preparatory operation signal will be updated to the operation signal.

[0039] If the comparison result indicates that the spatial characteristics of the sound source location are questionable, the difference between the first-class time vector and the second-class time vector is calculated to obtain the time difference vector. The median of the time difference vector is selected and compared with a preset threshold. If the median is less than or equal to the preset threshold, the sound source location is considered to be within the preset workstation, and the preparatory operation signal is updated to the operation signal. Otherwise, the sound source location is considered to be outside the preset workstation, and the corresponding voice command is discarded.

[0040] If the comparison result shows that the sound source location is outside the preset workstation, the corresponding voice command is discarded.
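The verification rule can be sketched as a single function. The assumptions are labeled in the code: the comparison result is passed as one of three plain strings, the time difference vector is taken element-wise as second-class minus first-class (the text does not fix the sign), and the threshold is in the same time unit as the vectors. The function name `verify_preparatory_signal` is hypothetical.

```python
from statistics import median
from typing import Sequence


def verify_preparatory_signal(comparison: str,
                              first_class: Sequence[float],
                              second_class: Sequence[float],
                              threshold: float) -> bool:
    """Return True if the preparatory signal becomes the operation signal."""
    if comparison == "inside":
        return True
    if comparison == "outside":
        return False  # the voice command is discarded
    if comparison != "questionable":
        raise ValueError(f"unknown comparison result: {comparison!r}")
    if not first_class or len(first_class) != len(second_class):
        raise ValueError("time vectors must be non-empty and of equal length")
    # Assumed sign convention: reflected arrival minus direct arrival, per sensor.
    diffs = [t2 - t1 for t1, t2 in zip(first_class, second_class)]
    return median(diffs) <= threshold


accepted = verify_preparatory_signal(
    "questionable", [0.0, 0.0, 0.0], [0.01, 0.02, 0.03], threshold=0.02
)
```

Using the median rather than the mean makes the check less sensitive to a single sensor with an outlying reflected-sound arrival time.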

[0041] The present invention also relates to a system for controlling a machine tool by voice recognition, comprising: an instruction mapping module, a voice input and recognition module, and a voice processing module;

[0042] wherein:

[0043] The instruction mapping module is used to collect machine tool language instructions and corresponding operation signals, establish the mapping relationship between machine tool voice instructions and operation signals, and generate an instruction library;

[0044] The voice input and recognition module is used to set a preset workstation and place several sound sensors at the preset workstation to synchronously collect the direct sound and reflected sound of the voice command, and perform text recognition on the direct sound and reflected sound. Based on the text recognition results, the module divides the time interval and calculates the number of repetitions of each word in different time intervals to predict the delay time. The module sets the arrival time of the direct sound received by each sound sensor and generates a first-class time vector. The module also sets the arrival time of the reflected sound received by each sound sensor and generates a second-class time vector.

[0045] The voice processing module is used to calculate the preliminary spatial features of the voice command sound source based on the first-class time vector, compare the result with the spatial features of the preset workstation, generate a preparatory operation signal based on the comparison result, verify the preparatory operation signal based on the comparison result, and update the preparatory operation signal to the operation signal if the verification is successful.

[0046] Based on the instruction library, the machine tool language instructions associated with the operation signals are extracted and output.

[0047] The present invention has the following advantages:

[0048] (1) This invention synchronously collects the direct sound and reflected sound of voice commands, performs text recognition on both, divides time intervals and calculates the number of repetitions in each interval, thereby predicting the delay time; based on this, the arrival time of the direct sound and reflected sound received by each sound sensor is set respectively, generating a first-class time vector and a second-class time vector; this process not only realizes the effective distinction between direct sound and reflected sound, but also provides accurate time data support for subsequent sound source localization and operation signal verification through the generation of time vectors, significantly improving the accuracy of sound source localization.

[0049] (2) The method involved in this invention can effectively reduce the positioning error caused by reflected sound interference and ensure the accurate judgment of the sound source location. When the spatial characteristics of the sound source location are questionable, the median of the time difference vector is compared with the preset threshold to further ensure that the sound source is indeed located within the preset workstation, thereby improving the reliability of the operation signal and the overall performance of the system.

Description of the Attached Figures

[0050] Figure 1 is a schematic diagram illustrating the principle and flow of the present invention;

[0051] Figure 2 is a schematic diagram of the structural framework of the present invention.

Detailed Implementation

[0052] The present invention will now be described in detail with reference to specific embodiments. It should be noted that the following embodiments are merely further illustrations of the present invention, but the scope of protection of the present invention is not limited to the following embodiments.

[0053] Example 1

[0054] This embodiment relates to a method for controlling a machine tool using voice recognition. As shown in Figure 1, it includes the following steps:

[0055] S1. Collect machine tool language commands and corresponding operation signals, establish a mapping relationship between machine tool voice commands and operation signals, and generate a command library;

[0056] S2. Arrange several sound sensors at preset workstations to synchronously collect the direct sound and reflected sound of voice commands, and perform text recognition on the direct sound and reflected sound. Based on the text recognition results, divide the time intervals and calculate the number of repetitions of each word in different time intervals to predict the delay time. Set the arrival time of the direct sound received by each sound sensor and generate a first-class time vector. Set the arrival time of the reflected sound received by each sound sensor and generate a second-class time vector.

[0057] The sound sensors in the preset workstation are arranged in a rectangular pattern. The steps for determining the spacing between two adjacent sound sensors are as follows:

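[0058] The frequency range of the target speech is obtained as (fmin, fmax), and the target wavelength range $(\lambda_{\min}, \lambda_{\max})$ is calculated according to the formula $\lambda = c / f$, where c is the speed of sound, typically 343 meters per second, and $\lambda$ is the wavelength;

[0059] To meet the beamwidth requirements, the spacing d must satisfy a constraint involving the beamwidth (the constraint formula and the beamwidth symbol are missing from the source text);

[0060] In summary, the spacing d is given by a formula that is likewise missing from the source text;

[0061] Actual calculation: Assuming the target speech frequency range is 1 kHz to 10 kHz, and the required beamwidth is 30° horizontally, then the wavelength range is:

[0062] $\lambda_{\min} = 343 / 10000 \approx 0.0343$ m, $\lambda_{\max} = 343 / 1000 = 0.343$ m;

[0063] (the resulting value of the spacing d is missing from the source text);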

[0064] Generating first-class and second-class time vectors includes the following steps:

[0065] S301. Select any sound sensor and identify the voice commands corresponding to the direct sound and reflected sound received by it to obtain a set of repeating target text; record the timestamp t0 of the first byte of the target text being written to the storage unit, and retrieve the timestamp t1 of the first byte of the next set of repeating target text; calculate the difference t0-ty between the timestamp t0 and the delay time ty, and set t0-ty as the arrival time of the direct sound received by the sound sensor; calculate the difference t1-ty between the timestamp t1 and the delay time ty, and set t1-ty as the arrival time of the reflected sound received by the sound sensor;

[0066] The above steps are performed sequentially on other sound sensors to calculate the arrival time of direct sound and reflected sound received by each sound sensor, and to generate a first-class time vector and a second-class time vector.

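[0067] The first-class time vector is T1, whose n components are the arrival times of the direct sound at the respective sound sensors, and the second-class time vector is T2, whose n components are the arrival times of the reflected sound at the respective sound sensors, where n is the total number of sound sensors in the preset workstation;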

[0068] S302. The delay time ty used in S301 is calculated as follows:

[0069] The target text is divided into several consecutive equal-length intervals according to time sequence. The number of times each character in the target text appears in each interval is counted and summed to obtain the number of repetitions in that interval, and then the distribution sequence of the number of repetitions is obtained.

[0070] Calculate the mean and standard deviation of the number of repetitions within each time interval, select the intervals with a repetition count greater than the mean plus the standard deviation as candidate intervals, and sort the candidate intervals according to their time sequence.

[0071] Select the first interval with the highest repetition frequency and earliest position from the candidate intervals as the direct sound interval, and record the center time of this interval as the direct sound reference time.

[0072] Within several intervals following the direct sound interval, the interval with the second largest number of repetitions is selected as the reflected sound interval. If there are multiple intervals with the same number of repetitions, the earliest appearing interval is selected as the reflected sound interval, and the center time of that interval is recorded as the reflected sound reference time.

[0073] The difference between the direct sound reference time and the reflected sound reference time is calculated to obtain the delay time ty;

[0074] S3. Calculate the preliminary spatial features of the voice command source based on the first-class time vector, compare the results with the spatial features of the preset workstation, and generate a preparatory operation signal based on the comparison results.

[0075] Based on the comparison results, the preparatory operation signal is verified. If the verification is successful, the preparatory operation signal is updated to the operation signal.

[0076] Based on the instruction library, extract the machine tool language instructions associated with the operation signals and output them;

[0077] The preliminary spatial features of the voice command source are calculated based on the first-class time vector. The results are then compared with the spatial features of the preset workstation to generate a comparison result. The specific process is as follows:

[0078] S401. Based on the rectangular arrangement of sound sensors within the preset workstation, set the planar coordinate axis. Arbitrarily select one sound sensor as the origin, and calculate the time difference between the remaining sound sensors when receiving a voice command and the reference point. Select the two sound sensor points associated with the maximum and minimum time differences, and obtain the coordinates (x1, y1) and (x2, y2) of these two sound sensor points respectively. Let the coordinates of the sound source point be (xs, ys). Based on the above conditions, construct a function formula to calculate the coordinates (xs, ys) of the sound source point:

[0079] ;

[0080] ;

[0081] Where t1 and t2 are the arrival times of the direct sound recorded by the sound sensor at the corresponding selected point;

[0082] S402. Let the horizontal range of the preset workstation in the coordinate axis be (xmin, xmax), and let the vertical range of the preset workstation in the coordinate axis be (ymin, ymax). When the sound source point (xs, ys) simultaneously satisfies: xmin≤xs≤xmax, ymin≤ys≤ymax, it is confirmed that the sound source point is located within the preset workstation, and a preparatory operation signal is generated.

[0083] When the sound source location (xs, ys) satisfies only one of xmin≤xs≤xmax and ymin≤ys≤ymax, the spatial characteristics of the sound source location are marked as questionable.

[0084] If the sound source location (xs, ys) satisfies neither xmin≤xs≤xmax nor ymin≤ys≤ymax, it is confirmed that the sound source location is outside the preset workstation.

[0085] After integration, comparison results are generated;
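The sensor-selection part of S401 can be sketched on its own. The sketch covers only the time differences relative to the reference sensor and the choice of the two extreme sensors; it stops before the source-coordinate formulas, which do not survive in this text. The assumptions are that sensor coordinates are expressed relative to the reference sensor at the origin, that the differences are signed rather than absolute, and that arrival times come from the first-class time vector. The function name `select_extreme_sensors` is hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def select_extreme_sensors(
    coords: Sequence[Point],
    arrival_times: Sequence[float],
    ref_index: int = 0,
) -> Tuple[Tuple[Point, float], Tuple[Point, float]]:
    """Return ((x1, y1), t1) and ((x2, y2), t2) for the sensors with the
    maximum and minimum time difference relative to the reference sensor."""
    if len(coords) != len(arrival_times) or len(coords) < 3:
        raise ValueError("need matching inputs for at least three sensors")
    t_ref = arrival_times[ref_index]
    # Signed time difference of every non-reference sensor (assumed convention).
    diffs = {i: t - t_ref for i, t in enumerate(arrival_times) if i != ref_index}
    i_max = max(diffs, key=diffs.get)
    i_min = min(diffs, key=diffs.get)
    return (coords[i_max], arrival_times[i_max]), (coords[i_min], arrival_times[i_min])


# Four sensors on a 0.1 m square; sensor 0 is the origin and reference point.
sensor_coords = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
direct_times = [0.0100, 0.0103, 0.0098, 0.0101]
(p1, t1), (p2, t2) = select_extreme_sensors(sensor_coords, direct_times)
```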

[0086] Based on the comparison results, the preparatory operation signal is verified, including the following steps:

[0087] If the comparison result shows that the sound source location is within the preset workstation, the preparatory operation signal will be updated to the operation signal.

[0088] If the comparison result indicates that the spatial characteristics of the sound source location are questionable, the difference between the first-class time vector and the second-class time vector is calculated to obtain the time difference vector. The median of the time difference vector is selected and compared with a preset threshold. If the median is less than or equal to the preset threshold, the sound source location is considered to be within the preset workstation, and the preparatory operation signal is updated to the operation signal. Otherwise, the sound source location is considered to be outside the preset workstation, and the corresponding voice command is discarded.

[0089] If the comparison result shows that the sound source location is outside the preset workstation, the corresponding voice command is discarded.

[0090] This invention synchronously collects the direct and reflected sounds of voice commands, performs text recognition on both, divides time intervals, and calculates the repetition count within each interval to predict the delay time. Based on this, the arrival times of the direct and reflected sounds received by each sound sensor are set, generating a first-class and a second-class time vector. This process not only effectively distinguishes between direct and reflected sounds but also provides accurate time data support for subsequent sound source localization and operation signal verification through the generation of time vectors, significantly improving the accuracy of sound source localization. In practical applications, this method can effectively reduce localization errors caused by reflected sound interference, ensuring accurate judgment of the sound source location. When the spatial characteristics of the sound source location are questionable, the median of the time difference vector is compared with a preset threshold to further ensure that the sound source is indeed located within the preset workstation, thereby improving the reliability of the operation signal and the overall performance of the system.

[0091] Example 2

[0092] This embodiment relates to a system for controlling a machine tool using voice recognition. As shown in Figure 2, it includes an instruction mapping module, a voice input and recognition module, and a voice processing module;

[0093] The instruction mapping module is used to collect machine tool language instructions and corresponding operation signals, establish the mapping relationship between machine tool voice instructions and operation signals, and generate an instruction library;

[0094] The voice input and recognition module is used to set a preset workstation and place several sound sensors at the preset workstation to synchronously collect the direct sound and reflected sound of the voice command, and perform text recognition on the direct sound and reflected sound. Based on the text recognition results, the module divides the time interval and calculates the number of repetitions of each word in different time intervals to predict the delay time. The module sets the arrival time of the direct sound received by each sound sensor and generates a first-class time vector. The module also sets the arrival time of the reflected sound received by each sound sensor and generates a second-class time vector.

[0095] The voice processing module is used to calculate the preliminary spatial features of the voice command sound source based on the first-class time vector, compare the results with the spatial features of the preset workstation, generate a preparatory operation signal based on the comparison results, and verify the preparatory operation signal based on the comparison results. If the verification is successful, the preparatory operation signal is updated to the operation signal.

[0096] Based on the instruction library, the machine tool language instructions associated with the operation signals are extracted and output.

[0097] The threshold is set to facilitate comparison. The size of the threshold depends on the amount of sample data and the number of bases set by those skilled in the art for each set of sample data; as long as it does not affect the ratio between the parameter and the quantized value, it is acceptable.

[0098] The above formulas are all dimensionless calculations. The formulas are derived from software simulations based on a large amount of collected data so as to approximate real-world conditions as closely as possible. The preset parameters in the formulas are set by those skilled in the art according to the actual situation.

[0099] In Embodiments 1 and 2 provided by this invention, the disclosed methods and systems can be implemented in other ways; for example, the division of modules is only a logical functional division, and there may be other division methods in actual implementation. For example, multiple modules or components may be combined or integrated into another system, or some features may be ignored or not executed; another point is that the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of modules may be electrical or other forms.

[0100] The specific embodiments of the present invention have been described above. It should be understood that the present invention is not limited to the specific embodiments described above, and those skilled in the art can make various modifications or variations within the scope of the claims, which do not affect the essence of the present invention.

Claims

1. A method for controlling a machine tool using voice recognition, characterized in that it includes the following steps:
Step S1: Collect machine tool language commands and corresponding operation signals, establish a mapping relationship between machine tool voice commands and operation signals, and generate a command library;
Step S2: Arrange several sound sensors at a preset workstation to synchronously collect the direct sound and reflected sound of the voice command, and perform character recognition on the direct sound and reflected sound; based on the character recognition results, divide the time interval and calculate the number of repetitions of each character in different time intervals to predict the delay time; set the arrival time of the direct sound received by each sound sensor and generate a first-class time vector; set the arrival time of the reflected sound received by each sound sensor and generate a second-class time vector;
Step S3: Calculate the preliminary spatial features of the voice command source based on the first-class time vector, compare the results with the spatial features of the preset workstation, and generate a comparison result; generate a preparatory operation signal based on the comparison result; based on the comparison result, verify the preparatory operation signal, and if the verification is successful, update the preparatory operation signal to the operation signal; based on the command library, extract and output the machine tool language commands associated with the operation signal.

2. The method for controlling a machine tool using voice recognition as described in claim 1, characterized in that, in step S2, the sound sensors are arranged in a rectangular pattern, and the spacing between adjacent sound sensors is determined as follows: obtain the frequency range of the target speech, (fmin, fmax), and calculate the target wavelength range (λmin, λmax) from λ = c / f, so that λmin = c / fmax and λmax = c / fmin, where c is the speed of sound, taken as 343 m/s, and λ is the wavelength; the spacing is then given by a formula in which the remaining parameter is the beamwidth [the spacing formula is missing from the source text].
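The wavelength step of claim 2 follows directly from λ = c / f. The sketch below computes the wavelength range for a given speech band. The second helper applies the common half-wavelength rule for avoiding spatial aliasing in a uniform array; that rule is an assumption added for illustration and is not the beamwidth-based spacing formula of the claim, which is not available in this text. Both function names are hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s, the value stated in claim 2


def wavelength_range(f_min_hz, f_max_hz, c=SPEED_OF_SOUND):
    """Return (lambda_min, lambda_max) in metres for a frequency band."""
    if not 0 < f_min_hz <= f_max_hz:
        raise ValueError("expected 0 < f_min_hz <= f_max_hz")
    # The highest frequency gives the shortest wavelength, and vice versa.
    return c / f_max_hz, c / f_min_hz


def half_wavelength_spacing(f_max_hz, c=SPEED_OF_SOUND):
    """Upper bound on sensor spacing from the half-wavelength rule.

    This rule is an assumption for illustration, not the claim's formula.
    """
    return c / f_max_hz / 2.0
```

For a band of 343 Hz to 3430 Hz, the wavelength range is roughly 0.1 m to 1.0 m, and the half-wavelength rule would suggest a spacing of at most about 0.05 m.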

3. The method for controlling a machine tool using voice recognition as described in claim 1, characterized in that, in step S2, the first-class time vector and the second-class time vector are generated as follows: (1) Select any sound sensor and recognize the voice commands corresponding to the direct sound and reflected sound it receives, to obtain a set of recurring target text; (2) Record in the storage unit the timestamp t0 of the first byte of the target text, and retrieve the timestamp t1 of the first byte of the next set of repeated target text; calculate the difference t0-ty between the timestamp t0 and the delay time ty, and set t0-ty as the arrival time of the direct sound received by the sound sensor; (3) Calculate the difference t1-ty between the timestamp t1 and the delay time ty, and set t1-ty as the arrival time of the reflected sound received by the sound sensor; (4) Repeat the above steps for each of the other sound sensors in turn to calculate the arrival times of the direct sound and reflected sound received by each sound sensor, and generate the first-class time vector and the second-class time vector.
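Steps (2) to (4) of claim 3 amount to subtracting the delay time ty from two timestamps per sensor and collecting the results. A minimal sketch, assuming each sensor supplies a tuple (t0, t1, ty) in a fixed sensor order; this input layout and the function names are illustrative assumptions, not part of the claim.

```python
def arrival_times(t0, t1, ty):
    """Direct and reflected arrival times for one sensor: (t0 - ty, t1 - ty)."""
    return t0 - ty, t1 - ty


def build_time_vectors(sensor_records):
    """Build the first-class (direct) and second-class (reflected) time vectors.

    `sensor_records` is a list of (t0, t1, ty) tuples, one per sound sensor,
    in a fixed sensor order. This input layout is an assumption for illustration.
    """
    first_class, second_class = [], []
    for t0, t1, ty in sensor_records:
        direct, reflected = arrival_times(t0, t1, ty)
        first_class.append(direct)
        second_class.append(reflected)
    return first_class, second_class
```

With two sensors reporting (10, 14, 2) and (11, 16, 3), the direct-sound vector is [8, 8] and the reflected-sound vector is [12, 13].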

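4. The method for controlling a machine tool using voice recognition as described in claim 3, characterized in that the first-class time vector T1 and the second-class time vector T2 each have n components, the i-th component being the arrival time of the direct sound (for T1) or of the reflected sound (for T2) received by the i-th sound sensor, where n is the total number of sound sensors in the preset workstation.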

5. The method for controlling a machine tool using voice recognition as described in claim 3, characterized in that the delay time ty is calculated as follows: (1) Divide the target text into several consecutive equal-length intervals in time order, count the number of times each character of the target text appears in each interval, and sum these counts to obtain the number of repetitions in that interval, thereby obtaining the repetition distribution sequence; (2) Calculate the mean and standard deviation of the number of repetitions over the time intervals, select the intervals whose number of repetitions is greater than the mean plus the standard deviation as candidate intervals, and sort the candidate intervals in time order; (3) From the candidate intervals, select the interval with the largest number of repetitions, taking the earliest one if several are tied, as the direct sound interval, and record the center time of that interval as the direct sound reference time; (4) Within several intervals after the direct sound interval, select the interval with the second largest number of repetitions as the reflected sound interval; if several intervals have the same number of repetitions, select the earliest one as the reflected sound interval, and record the center time of that interval as the reflected sound reference time; (5) Take the difference between the direct sound reference time and the reflected sound reference time as the delay time ty.
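The five steps of claim 5 can be sketched as a histogram search. The sketch below makes several assumptions that the claim leaves open: the input is a list of (timestamp, character) pairs, the standard deviation is the population standard deviation, "several intervals" is a configurable `lookahead`, and the "second largest" interval is read as the strongest non-empty interval after the direct one. The function name and parameters are hypothetical.

```python
from statistics import mean, pstdev


def estimate_delay(char_times, target_text, interval, lookahead=5):
    """Estimate the delay time ty from timestamped recognized characters.

    char_times: list of (timestamp_seconds, character) pairs.
    interval:   length of each equal time interval, in seconds.
    lookahead:  how many intervals after the direct one to search (assumed).
    Returns None when no candidate or no reflected interval is found.
    """
    target = set(target_text)
    hits = [t for t, ch in char_times if ch in target]
    if not hits:
        return None

    # Step (1): repetition count per equal-length interval.
    start = min(hits)
    n_bins = int((max(hits) - start) // interval) + 1
    counts = [0] * n_bins
    for t in hits:
        counts[int((t - start) // interval)] += 1

    # Step (2): candidates have a count above mean + standard deviation.
    threshold = mean(counts) + pstdev(counts)
    candidates = [i for i, c in enumerate(counts) if c > threshold]
    if not candidates:
        return None

    # Step (3): largest count wins; max() keeps the earliest index on ties.
    direct = max(candidates, key=lambda i: counts[i])

    # Step (4): strongest non-empty interval inside the look-ahead window.
    last = min(direct + 1 + lookahead, n_bins)
    window = [i for i in range(direct + 1, last) if counts[i] > 0]
    if not window:
        return None
    reflected = max(window, key=lambda i: counts[i])

    # Step (5): reflected centre time minus direct centre time.
    return (reflected - direct) * interval
```

Because both reference times are interval centres, their difference is a whole number of interval lengths, so the interval length bounds the resolution of ty.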

6. The method for controlling a machine tool using voice recognition as described in claim 1, characterized in that, in step S3, the preliminary spatial features of the voice command sound source are calculated based on the first-class time vector, and the result is compared with the spatial features of the preset workstation to generate a comparison result, as follows: S301. Based on the rectangular arrangement of the sound sensors within the preset workstation, set up planar coordinate axes; arbitrarily select one sound sensor as the origin (the reference point), and calculate, for each of the remaining sound sensors, the difference between its time of receiving the voice command and that of the reference point; select the two sound sensor points associated with the maximum and minimum time differences, and obtain their coordinates (x1, y1) and (x2, y2) respectively; let the coordinates of the sound source point be (xs, ys), and construct the equations for calculating (xs, ys) [the equations are missing from the source text], where t1 and t2 are the arrival times of the direct sound recorded by the sound sensors at the corresponding selected points; S302. Let the horizontal range of the preset workstation on the coordinate axes be (xmin, xmax) and its vertical range be (ymin, ymax); when the sound source point (xs, ys) satisfies both xmin≤xs≤xmax and ymin≤ys≤ymax, it is confirmed that the sound source point is located within the preset workstation, and a preparatory operation signal is generated; when the sound source point (xs, ys) satisfies only one of xmin≤xs≤xmax and ymin≤ys≤ymax, the spatial features of the sound source point are marked as questionable; when the sound source point (xs, ys) satisfies neither xmin≤xs≤xmax nor ymin≤ys≤ymax, it is confirmed that the sound source point is outside the preset workstation; S303. Combine the above to generate the comparison result.
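The range check in S302 can be sketched as a three-way classification, reading the questionable case as exactly one of the two range conditions holding and the outside case as neither holding. The function name and the string labels are hypothetical; the localization equations of S301 are not reproduced here because they are not available in this text.

```python
def classify_source(xs, ys, x_range, y_range):
    """Classify a source point against the workstation's rectangular bounds.

    Returns "inside" when both coordinates are in range, "questionable"
    when exactly one is, and "outside" when neither is.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    in_x = x_min <= xs <= x_max
    in_y = y_min <= ys <= y_max
    if in_x and in_y:
        return "inside"
    if in_x or in_y:
        return "questionable"
    return "outside"
```

For a workstation spanning (0, 2) horizontally and (0, 1) vertically, the point (1.0, 0.5) is inside, (1.0, 3.0) is questionable, and (5.0, 3.0) is outside.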

7. The method for controlling a machine tool using voice recognition as described in claim 1, characterized in that, in step S3, generating a preparatory operation signal based on the comparison result includes: if the comparison result shows that the sound source point is within the preset workstation, the preparatory operation signal is updated to the operation signal; if the comparison result indicates that the spatial features of the sound source point are questionable, the difference between the first-class time vector and the second-class time vector is calculated to obtain a time difference vector, and the median of the time difference vector is compared with a preset threshold: if the median is less than or equal to the preset threshold, the sound source point is considered to be within the preset workstation and the preparatory operation signal is updated to the operation signal; otherwise, the sound source point is considered to be outside the preset workstation and the corresponding voice command is discarded; if the comparison result shows that the sound source point is outside the preset workstation, the corresponding voice command is discarded.
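The decision logic of claim 7 can be sketched as follows. Two assumptions are made for illustration: the comparison result is passed as one of the hypothetical labels "inside", "questionable", or "outside", and the time difference vector is taken element-wise as reflected minus direct arrival time (the claim does not state the sign convention).

```python
from statistics import median


def verify_command(comparison, first_class, second_class, threshold):
    """Return True if the preparatory signal should become the operation signal.

    comparison:   "inside", "questionable", or "outside" (hypothetical labels).
    first_class:  direct-sound arrival times, one per sensor.
    second_class: reflected-sound arrival times, in the same sensor order.
    threshold:    preset limit on the median time difference.
    """
    if comparison == "inside":
        return True
    if comparison == "outside":
        return False
    # Questionable case: element-wise reflected-minus-direct differences.
    diffs = [r - d for d, r in zip(first_class, second_class)]
    return median(diffs) <= threshold
```

Using the median rather than the mean keeps a single sensor with an outlying echo from deciding the outcome on its own.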

8. A system for controlling a machine tool using voice recognition, characterized in that the system is used to implement the method for controlling a machine tool using voice recognition as described in any one of claims 1-5, the system comprising: an instruction mapping module, a voice input and recognition module, and a voice processing module; the instruction mapping module is used to collect machine tool language instructions and the corresponding operation signals, establish the mapping relationship between machine tool voice commands and operation signals, and generate an instruction library; the voice input and recognition module is used to set a preset workstation and arrange several sound sensors at the preset workstation to synchronously collect the direct sound and reflected sound of the voice command, perform character recognition on the direct sound and reflected sound, divide the time into intervals based on the character recognition results, and calculate the number of repetitions of each character in the different time intervals to predict the delay time; the module sets the arrival time of the direct sound received by each sound sensor and generates a first-class time vector, and sets the arrival time of the reflected sound received by each sound sensor and generates a second-class time vector; the voice processing module is used to calculate the preliminary spatial features of the voice command sound source based on the first-class time vector, compare the result with the spatial features of the preset workstation, generate a preparatory operation signal based on the comparison result, verify the preparatory operation signal based on the comparison result, update the preparatory operation signal to the operation signal if the verification succeeds, and, based on the instruction library, extract and output the machine tool language instruction associated with the operation signal.