In order to explain the technical solutions of the present invention more clearly, the technical solutions of various embodiments of the present invention will be described in detail below in conjunction with the accompanying drawings.
The present invention provides a self-service terminal based on microphone-array voice interaction. As shown in Figure 1, the self-service terminal includes: a voice interaction system 1 for collecting user voice and converting it into instructions;
a control module 2 for receiving the instructions and outputting the services required by the user according to the instructions;
wherein the voice interaction system 1 includes a microphone array composed of at least two microphones 11 for collecting user voice.
In this embodiment, the self-service terminal refers to a device used in industries such as communications, finance, government, transportation, healthcare, industry and commerce, and taxation, on which users complete the services they need by themselves, without assistance from counter staff, by following the text, graphics, or voice prompts on the interface. For example, the user speaks to the self-service terminal and states the service content or service keywords he or she needs; the voice interaction system 1 performs voice recognition and converts the voice into instructions, and the control module 2 converts the instructions into the corresponding operations. That is, after the customer speaks, the corresponding service process and operation details are displayed directly on the display interface 3, and the customer is prompted to perform the next operation or give the next voice instruction, for example, entering a password or other important numbers in the input box 4, inserting an identification card into the card slot 5, and collecting the receipt from the receipt exit 6, until all the services required by the user are completed.
The present invention completes the service required by the user mainly through voice interaction between the user and the terminal. The voice interaction proceeds as follows: following the prompts of the self-service terminal, the user speaks the required service toward the microphones 11. After the microphone array collects the user's voice, that is, the user's service request, the voice interaction system 1 recognizes and processes it, converts it into corresponding instructions, and sends the instructions to the control module 2, which outputs the corresponding operations according to the instructions.
In the present invention, at least two microphones 11 are arranged in a certain geometric structure (such as linear or circular) to form a microphone array for collecting user voice. Because a microphone array picks up less sound from the side directions, it suppresses the collection of ambient noise and thereby improves the rate at which the voice interaction system 1 recognizes and understands the user's voice in a noisy environment.
In addition, the voice interaction system 1 of the present invention can further suppress noise in the collected sound by means of microphone array signal processing techniques and effectively enhance the user's voice in a noisy environment, thereby eliminating the influence of noise on the user's original voice and determining information such as the location, intensity, and state of the signal source.
By combining the self-service terminal with a microphone array that collects the user's voice, the invention improves recognition of the user's voice, so that the voice interaction system 1 can understand and judge the user's voice more accurately. In other words, the self-service terminal understands the user's specific requirements and operates accordingly, which improves the efficiency of interaction between the self-service terminal and the user, makes operation simpler and more convenient, and makes the product design more user-friendly.
 In a preferred embodiment, the distance between every two microphones 11 is 30-50 mm.
The distance between two microphones 11 refers to the center-to-center distance between the two microphone mounting holes 7 in which the microphones are mounted. The present invention is mainly aimed at collecting and analyzing the sound source of a single user, so an array composed of two microphones 11 is preferred. Such an array can collect user voice and support far-field recording, de-reverberation, and noise reduction, while reducing production cost and implementation difficulty. The distance between the two microphones 11 is 30-50 mm, preferably 40 mm. This distance suits the body size of commonly used self-service terminals and at the same time covers the range required for collecting the user's voice.
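To give a feel for what a 30-50 mm spacing implies acoustically, the following minimal sketch computes two standard quantities for a two-microphone array: the largest possible arrival-time difference between the microphones, and the frequency above which half a wavelength becomes shorter than the spacing. The speed of sound (343 m/s) and the function names are assumptions introduced for illustration; they are not taken from the present description.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: air at roughly 20 degrees Celsius


def max_inter_mic_delay_s(spacing_m):
    """Largest arrival-time difference between two microphones.

    It occurs when sound travels along the line joining the two microphones.
    """
    return spacing_m / SPEED_OF_SOUND_M_S


def half_wavelength_limit_hz(spacing_m):
    """Frequency at which half a wavelength equals the microphone spacing."""
    return SPEED_OF_SOUND_M_S / (2.0 * spacing_m)


if __name__ == "__main__":
    for spacing_mm in (30, 40, 50):
        spacing_m = spacing_mm / 1000.0
        delay_us = max_inter_mic_delay_s(spacing_m) * 1e6
        limit_hz = half_wavelength_limit_hz(spacing_m)
        print(f"{spacing_mm} mm: max delay {delay_us:.1f} us, "
              f"half-wavelength limit {limit_hz:.0f} Hz")
```

Under these assumptions, a wider spacing gives a larger time difference to work with but a lower half-wavelength limit, which is one way to read the stated 30-50 mm range as a compromise.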
As shown in Figure 1, the voice interaction system 1 further includes a voice processing module 12, which processes the voice collected by the microphones 11 to improve voice recognition.
As shown in Figure 1, the voice processing module 12 includes a noise reduction module 121 for performing noise reduction on the voice.
In this embodiment, noise reduction is performed mainly for Gaussian white noise. Gaussian white noise is noise whose amplitude follows a Gaussian distribution and whose power spectral density is uniformly distributed. It includes thermal noise and shot noise in the user's environment, as well as commonly occurring sounds such as car horns and alarms. Such wide-band noise easily impairs the ability of the voice processing module 12 to distinguish the main sound source. The noise reduction module 121 is therefore provided specifically to suppress and filter out this environmental noise, so that a cleaner main sound is retained.
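The definition of Gaussian white noise given above can be illustrated with a short sketch. It draws samples from a Gaussian distribution and checks that neighbouring samples are essentially uncorrelated, which is the time-domain counterpart of a flat power spectral density. The function names, the sample count, and the fixed seed are assumptions made for illustration only; the sketch does not represent the algorithm of the noise reduction module 121.

```python
import random


def gaussian_white_noise(n_samples, sigma=1.0, seed=0):
    """Draw independent zero-mean Gaussian samples (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(n_samples)]


def normalized_autocorrelation(samples, lag):
    """Autocorrelation at the given lag, normalized so that lag 0 gives 1."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]
    energy = sum(c * c for c in centered)
    cross = sum(centered[i] * centered[i + lag] for i in range(n - lag))
    return cross / energy


if __name__ == "__main__":
    noise = gaussian_white_noise(5000)
    for lag in (0, 1, 10):
        print(lag, round(normalized_autocorrelation(noise, lag), 3))
```

For white noise the value at lag 0 is 1 by construction, while the values at non-zero lags stay close to zero.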
In a preferred embodiment, as shown in Figure 1, the voice processing module 12 further includes a filtering module 122 for performing Kalman filtering on the voice. Kalman filtering is an algorithm that uses the state equations of a linear system, together with the observed input and output data of the system, to produce an optimal estimate of the system state. That is, from the acoustic signal collected by the microphones 11, the filtering module 122 computes an optimal estimate of the signal closest to the user's actual original voice, thereby further filtering noise from the user's voice.
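The predict-and-update cycle of a Kalman filter can be sketched in its simplest scalar form. The description does not specify the state equations used by the filtering module 122, so the sketch below assumes a random-walk state model (the true value changes slowly between samples) observed through additive noise; the function name and both variance parameters are hypothetical.

```python
def kalman_filter_1d(observations, process_var=1e-4, measurement_var=0.05):
    """Scalar Kalman filter under an assumed random-walk state model.

    process_var: assumed variance of the state change between samples.
    measurement_var: assumed variance of the observation noise.
    """
    estimate = observations[0]  # initial state estimate
    error_var = 1.0             # initial uncertainty of the estimate
    filtered = []
    for z in observations:
        # Predict: the state is assumed unchanged, but uncertainty grows.
        error_var += process_var
        # Update: blend prediction and observation using the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= (1.0 - gain)
        filtered.append(estimate)
    return filtered


if __name__ == "__main__":
    # A constant level of 1.0 disturbed by alternating +0.5 / -0.5 errors.
    noisy = [1.0 + (0.5 if i % 2 == 0 else -0.5) for i in range(200)]
    smoothed = kalman_filter_1d(noisy)
    print(round(smoothed[-1], 3))
```

A practical speech filter would use a richer state model, for example an autoregressive model of the speech signal; the scalar form is shown only because it makes the gain computation easy to follow.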
In a preferred embodiment, as shown in Figure 1, the voice processing module 12 further includes a beamforming module 123 for beamforming the voice, so as to suppress sound arriving from the side directions of the microphones 11 and enhance sound arriving from the perpendicular direction, thereby improving the quality of the transmitted sound source signal.
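One common way to obtain this directional behaviour with two microphones is delay-and-sum beamforming, sketched below. The description does not state which beamforming method the beamforming module 123 uses, so delay-and-sum is an assumption, as are the function name and the integer-sample delay. Sound from the perpendicular direction reaches both microphones at the same time and adds in phase, while sound from the side reaches them with a time offset and partly cancels.

```python
import math


def delay_and_sum(mic_a, mic_b, steering_delay=0):
    """Average two channels after shifting mic_b by an integer sample delay.

    steering_delay = 0 steers the beam perpendicular to the line joining
    the two microphones (both channels are summed without any shift).
    """
    output = []
    for i in range(len(mic_a)):
        j = i + steering_delay
        b = mic_b[j] if 0 <= j < len(mic_b) else 0.0
        output.append(0.5 * (mic_a[i] + b))
    return output


if __name__ == "__main__":
    period = 4  # tone period in samples (illustrative value)
    tone = [math.sin(2 * math.pi * i / period) for i in range(64)]

    # Perpendicular source: both microphones receive the same waveform.
    front = delay_and_sum(tone, tone)

    # Side source: the second microphone hears the tone two samples later,
    # which is half a period for this tone, so the channels cancel.
    late = [math.sin(2 * math.pi * (i - 2) / period) for i in range(64)]
    side = delay_and_sum(tone, late)

    print(max(abs(x) for x in front), max(abs(x) for x in side))
```

The complete cancellation in this example holds only for the chosen tone and delay; for other frequencies the side signal is attenuated rather than removed.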
In a preferred embodiment, as shown in Figure 1, the voice processing module 12 further includes a speech enhancement module 124, which divides the noisy voice into frames so that each frame is short-term stationary, applies a window to each frame, and finally outputs a voice signal formed by synchronously superimposing multiple adjacent voice frames, thereby obtaining an enhanced voice signal.
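The framing, windowing, and superposition steps can be sketched as a standard overlap-add scheme. The frame length, the 50% overlap, and the Hann window are assumptions chosen for illustration, and the per-frame enhancement itself is left out (each frame passes through unchanged), since the description does not specify it. With these choices the overlapping windows sum to one, so the interior of the signal is reconstructed exactly when no enhancement is applied.

```python
import math


def hann_window(length):
    """Periodic Hann window; copies shifted by length/2 sum to exactly 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length)
            for n in range(length)]


def split_into_frames(signal, frame_len, hop):
    """Cut the signal into overlapping frames of equal length."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]


def overlap_add(frames, window, hop):
    """Window each frame and superimpose it at its original position."""
    if not frames:
        return []
    frame_len = len(window)
    output = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for k, frame in enumerate(frames):
        for n in range(frame_len):
            output[k * hop + n] += frame[n] * window[n]
    return output


if __name__ == "__main__":
    signal = [math.sin(0.3 * i) for i in range(64)]
    frame_len, hop = 16, 8  # assumed values: 50% overlap
    frames = split_into_frames(signal, frame_len, hop)
    # A real enhancement step would modify each frame here.
    rebuilt = overlap_add(frames, hann_window(frame_len), hop)
    print(len(frames), len(rebuilt))
```

The first and last half-frames are covered by only one window and are therefore attenuated; practical implementations usually pad the signal to avoid this edge effect.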
In a preferred embodiment, as shown in Figure 1, the voice interaction system 1 further includes a semantic understanding module 13 for receiving voice signals and converting them into corresponding instructions. The voice signal is preferably the voice processed by the voice processing module 12, which improves the semantic understanding module 13's understanding of the original voice.
In a preferred embodiment, as shown in Figure 1, the semantic understanding module 13 includes a speech-to-text module 131 for converting speech into text.
In a preferred embodiment, as shown in Figure 1, the semantic understanding module 13 further includes a matching degree calculation module 132 and an instruction query module 133. The matching degree calculation module 132 converts the text into a corresponding instruction number, and the instruction query module 133 converts the instruction number into an operation instruction and sends the operation instruction to the control module 2.
In this embodiment, the semantic understanding module 13 operates as follows: the speech-to-text module 131 receives the enhanced speech signal and converts the speech into text, and the matching degree calculation module 132 converts the text into a corresponding instruction number. Specifically, the text is compared by intersection with the keywords reserved for each instruction number in the instruction table, and a ratio score is calculated. If the ratio score exceeds the threshold preset by the system, the comparison is considered successful. After a successful comparison, the matching degree calculation module 132 sends the instruction number to the instruction query module 133. The instruction query module 133 looks up the actual operation instruction according to the instruction number and sends it to the control module 2, and the control module 2 outputs the corresponding service.
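The matching and lookup steps described above can be sketched as follows. The instruction table, the instruction numbers, the operation names, and the threshold value are all hypothetical examples, and the ratio score is assumed to be the share of an instruction's reserved keywords that appear in the recognized text; the description does not define the score more precisely.

```python
# Hypothetical instruction table: instruction number -> reserved keywords.
INSTRUCTION_KEYWORDS = {
    101: {"check", "balance"},
    102: {"transfer", "money"},
    103: {"print", "receipt"},
}

# Hypothetical lookup table: instruction number -> operation instruction.
OPERATION_INSTRUCTIONS = {
    101: "SHOW_BALANCE",
    102: "START_TRANSFER",
    103: "PRINT_RECEIPT",
}


def match_instruction_number(text, threshold=0.5):
    """Return the best-matching instruction number, or None if no score
    exceeds the threshold (role of the matching degree calculation module)."""
    words = set(text.lower().split())
    best_number, best_score = None, 0.0
    for number, keywords in INSTRUCTION_KEYWORDS.items():
        # Assumed ratio score: matched keywords / reserved keywords.
        score = len(words & keywords) / len(keywords)
        if score > best_score:
            best_number, best_score = number, score
    return best_number if best_score > threshold else None


def query_operation_instruction(number):
    """Look up the operation instruction (role of the instruction query module)."""
    return OPERATION_INSTRUCTIONS.get(number)


if __name__ == "__main__":
    number = match_instruction_number("please check my balance")
    print(number, query_operation_instruction(number))
```

Text that shares too few keywords with every entry yields no instruction number, in which case a terminal could, for example, prompt the user to repeat the request.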
 The present invention understands the user's voice through the semantic understanding module 13, and outputs the operation instructions of the service required by the user, which reduces the user's operation difficulty and improves the user experience.
The present invention also includes a voice output module, which conveys service information and operation requirements to users by voice, making the terminal easier to use for users who have difficulty reading.
 The present invention also includes a touch display screen and a touch interaction module, so that the user can realize touch operation through touch interaction with the self-service terminal.
The present invention is also provided with the following equipment: video monitoring equipment for ensuring the safety of the user during use; a 4G router for external data exchange; a printer for outputting required information; a manual IC card reader for inputting information and outputting device status; a three-in-one card reader for card output operations on demand; a contactless card reader for inputting information and outputting device status; and an encrypted keyboard for inputting information. The aim is to provide users with exactly the services they need, thereby improving customer experience and equipment utilization.
It should be noted that the technical solutions of the various embodiments of the present invention can be combined with each other, provided that the combination can be realized by a person of ordinary skill in the art. When a combination of technical solutions is contradictory or cannot be realized, that combination should be considered not to exist and does not fall within the scope of protection of the present invention.
The above are only some or preferred embodiments of the present invention, and neither the text nor the drawings limit the scope of protection of the present invention. Any equivalent structural transformation made using the contents of the description and drawings under the overall concept of the present invention, or any direct or indirect application in other related technical fields, falls within the scope of protection of the present invention.