Intelligent voice interaction system and method thereof

An intelligent voice interaction system and method, applicable to voice analysis, voice recognition, and voice input/output. It addresses the limited recognition vocabulary, low recognition rate, and reduced entertainment value of existing products, achieving fast data processing, friendly voice interaction, and a wide recognition vocabulary.

Status: Inactive · Publication date: 2008-08-20
Owner: BEIHANG UNIV
Cites: 0 · Cited by: 62

AI-Extracted Technical Summary

Problems solved by technology

However, due to hardware limitations, existing products cannot update the voice dialogue content solidified in hardware. Storage capacity is limited, and the recognition vocabulary is also limited, generally to 6-10 words. Recognition performance is poor, the recognition rate is low, and the interaction modes, usage occasions, reliability, and updateability are all heavily restricted, which has prevented this technology from being widely adopted.
[0003] For example, there is an existing voice interactive toy whose sound signal is an...

Abstract

The present invention provides an intelligent voice interaction system and interaction method. The system includes a processor, a storage device, a voice processing unit, a voice input device, a voice output device, and a communication processing unit. The processor, storage device, voice processing unit, and communication processing unit are arranged on a circuit board; the storage device, voice processing unit, and communication processing unit are connected to the processor by a bus, forming an embedded control board. The voice input device and the voice output device are each connected to the voice processing unit of the embedded control board. The communication processing unit provides a communication interface for connecting to a computer on which customization client software is installed. The system serves as a general-purpose intelligent interaction platform: users can define their own interaction scenes and content, so it suits a wide range of applications such as human-machine dialogue systems, intelligent toys, and service robots. It is highly practical and, once configured, operates without a computer.


Examples

  • Experimental programs (3)

Example Embodiment

[0056] Example one
[0057] As shown in Figure 1, this embodiment provides an intelligent voice system for smart toys. The system can serve as a voice platform for many applications, such as voice-enabled smart toys, robots, and human-machine dialogue systems. The specific structure, shown in Figure 1, includes:
[0058] The processor, the memory, the voice processing unit and the communication processing unit are all arranged on the circuit board, and the memory, the voice processing unit and the communication processing unit are connected with the processor through a bus to form an embedded control board;
[0059] The voice input device and the voice output device are respectively connected to the voice processing unit on the embedded control board;
[0060] The communication processing unit is provided with a communication interface, and the communication interface is connected with a computer on which the customized client software is installed.
[0061] Wherein, the memory includes: a dynamic memory and a FLASH memory, which are respectively connected to the processor through an address/data bus.
[0062] As shown in Figure 2, in the above system, the voice processing unit includes: a voice acquisition module, connected to the voice input device and the processor and configured to receive voice information input by the voice input device and transmit it to the processor;
[0063] The voice output module is connected to the processor and the voice output device, and is used to output the voice information processed by the processor to the voice output device.
[0064] The communication processing unit includes:
[0065] The USB interface processing module connects to the USB interface of the computer on which the customization client software is installed, and transmits the data obtained from the computer via the USB interface to the processor for processing;
[0066] The wireless processing module connects wirelessly to the computer on which the customization client software is installed, and transmits the data obtained from the computer over the wireless connection to the processor for processing. The wireless processing module may be a Bluetooth module or a Wi-Fi wireless network card module; its purpose is to exchange data with the computer wirelessly.
[0067] The system may also include a display processing module, connected to the processor via the bus, which processes the graphical interface information output by the processor (for example, the interfaces shown in various usage states when the intelligent voice system is connected to the network). The display processing module provides a display interface for connecting a display device. A system with a display processing module can also be fitted with a display device, connected to this display interface, to show the graphical interface signals output by the display processing module. In practice, the display device can be, for example, a liquid crystal display.

Example Embodiment

[0068] Example two
[0069] This embodiment provides an interaction method based on the intelligent voice system for smart toys of Embodiment One. The method includes:
[0070] After the system starts, under the control of the processor, the speech recognition module and the voice library module stored in FLASH memory are loaded into dynamic memory. In practice, these two modules take the form of software, such as an HMM-based speech recognition engine;
[0071] An external voice command is input through the voice input device, and its analog voice signal is converted into a digital voice signal by the voice processing unit (for example, by the voice acquisition chip within it);
[0072] The digital sound signal is sent to the processor, which calls the speech recognition module in dynamic memory and compares the signal's pronunciation characteristics against the pronunciation feature library of the speech recognition engine;
[0073] According to the comparison result, the processor outputs the corresponding response digital sound signal from the voice library module to the voice output module (for example, the audio codec chip) in the voice processing unit;
[0074] The voice output module plays the voice through the voice output device (a speaker), completing one round of human-machine voice interaction.
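The patent publishes no source code for this loop. The following C sketch only mirrors the round described in [0071]-[0074]; the function names are invented stubs standing in for the codec driver, the HMM engine, and the voice library.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define FRAME_SAMPLES 8000   /* one second at the 8 kHz / 16-bit setting */

/* Stubs standing in for the codec driver and the HMM engine. */
static void capture_pcm(int16_t *buf, size_t n) { memset(buf, 0, n * sizeof *buf); }
static int  recognize(const int16_t *buf, size_t n) { (void)buf; (void)n; return 0; }
static void play_pcm(const int16_t *buf, size_t n) { (void)buf; (void)n; }

/* Look up the prerecorded reply for a recognized command id. */
static const int16_t *lookup_reply(int cmd_id, size_t *out_n) {
    static const int16_t silence[FRAME_SAMPLES];
    (void)cmd_id;
    *out_n = FRAME_SAMPLES;
    return silence;
}

int main(void) {
    static int16_t in[FRAME_SAMPLES];
    capture_pcm(in, FRAME_SAMPLES);               /* [0071]: A/D via the codec   */
    int cmd = recognize(in, FRAME_SAMPLES);       /* [0072]: HMM engine in RAM   */
    size_t n = 0;
    const int16_t *reply = lookup_reply(cmd, &n); /* [0073]: voice library match */
    if (reply) play_pcm(reply, n);                /* [0074]: D/A to the speaker  */
    printf("one interaction round done\n");
    return 0;
}
```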
[0075] The above method also includes: customizing and updating the configuration file of the voice library module from a connected computer through the customization client software; or downloading the configuration file of the voice library module from a network server through a connected computer. Either path updates the voice library module and thereby the system's dialogue scene.
[0076] The specific process of updating a customized dialogue scene is shown in Figure 3 and includes:
[0077] Step 31: Open the customization client software on the computer;
[0078] Step 32: Set the initial tone of the dialogue;
[0079] Step 33: The user designs the dialogue scene;
[0080] Step 34: After setting is complete, generate a dialogue configuration file, typically an XML file (a hypothetical sample appears after this procedure);
[0081] Step 35: Connect the USB interface (or wireless interface) of the intelligent voice system to the computer;
[0082] Step 36: Download the user-customized dialogue scene (the configuration file plus the packaged voice files) from the computer to the intelligent voice system, completing the customized update of the dialogue scene;
[0083] Downloading and updating the dialogue scene from a network server through the connected computer follows essentially the same process, except that the configuration file and the corresponding voice files are customized by the service provider and stored on the network server; after downloading, they are applied directly.
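The patent does not disclose the schema of the generated file, only that an XML document stores the dialogue's initial state information and parameters. Purely as illustration, the file produced in Step 34 might look like the following; every element and attribute name here is invented.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical dialog configuration; not the patent's published schema. -->
<scene start-sound="hello.wav">
  <dialog id="1">
    <question>What is your name?</question>
    <answer audio="name_reply.wav"/>
    <jump on="recognized" to="2"/>  <!-- state jump, cf. Steps 43-46 below -->
  </dialog>
  <dialog id="2">
    <question>Tell me a story</question>
    <answer audio="story.wav"/>
  </dialog>
</scene>
```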
[0084] The specific customization process, shown in Figure 4, includes:
[0085] Step 41: Set the scene start sound;
[0086] Step 42: The user designs the first group of dialogues, e.g. designs the user's first question (text input) and sets the toy's voice response;
[0087] Step 43: Judge whether there is a state jump;
[0088] Step 44: If not, the user designs the second group of dialogues, the second question (text input), and sets the system's voice response; if so, go to Step 46, where the user designs the Nth group of dialogues, the Nth question (text input), and sets the system's voice response;
[0089] Step 45: Judge again whether there is a state jump. If not, the user designs each following group of dialogues in turn, entering the corresponding questions (as text input) and then setting the system's voice response to each input;
[0090] Step 46: If so, the user designs the Nth group of dialogues, the Nth question (text input), and sets the system's voice response;
[0091] Step 47: Scene customization ends.

Example Embodiment

[0092] Example three
[0093] This embodiment further describes the intelligent voice system of the present invention and its interaction method through a specific implementation, shown in Figures 6-13. The system is divided into hardware and software aspects, as follows:
[0094] (1) Hardware part:
[0095] The hardware is built around the high-performance ARM9 SoC processor S3C2410 (main frequency 200 MHz). With the S3C2410 embedded controller at the center, an external microphone sensor performs voice signal collection, while voice signal sampling, amplification, pre-filtering, and subsequent playback are handled by the WM8731 audio chip. The board carries 32M×16-bit extended SDRAM and 64M×16-bit NAND Flash. The system communicates with the client interface of the user application development layer through a USB interface, which is taken as the example in this system; wireless network card or Bluetooth interface modules can also be attached to the S3C2410 for data exchange, and an LCD display can be added to the S3C2410 processor module to produce 3D animation output and similar effects.
[0096] The circuit part of the hardware is divided into several parts:
[0097] ①Front-end processing circuit
[0098] The system uses an omnidirectional microphone for voice input, which can collect voice signals within a 120-degree frontal angle. Pre-amplification, anti-aliasing filtering, and A/D conversion use WOLFSON's WM8731, a low-power CODEC chip suited to voice applications, with two internal sets of ADCs (analog/digital converters) and DACs (digital/analog converters). The sampling frequency is set to 8 kHz by the external crystal oscillator frequency and register settings, with 16-bit A/D sampling; BYPASS mode is turned off and the chip is set to slave mode. The gain of the input amplifier is adjusted so that microphone pickup is best at a range of 50-60 cm, and, so that the speaker output is loud enough, the output gain is set to its maximum value.
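As a concrete illustration of these settings, here is a hedged C sketch of the codec setup (slave mode, 16-bit words, BYPASS off). The register map follows the public WM8731 datasheet, but the I2C plumbing is a stub for the board's driver, and the exact sample-rate word depends on the crystal fitted; none of this is the patent's code.

```c
#include <stdint.h>
#include <stdio.h>

#define WM8731_APATH   0x04  /* analogue path: mic select, BYPASS, DACSEL */
#define WM8731_IFACE   0x07  /* format, word length, master/slave bit     */
#define WM8731_SRATE   0x08  /* sampling control                          */
#define WM8731_ACTIVE  0x09  /* activate the digital interface            */

/* Stub standing in for the board's I2C driver: just log the transfer. */
static void i2c_write(uint8_t addr, uint8_t b0, uint8_t b1) {
    printf("i2c@0x%02X <- 0x%02X 0x%02X\n", addr, b0, b1);
}

/* WM8731 control words are a 7-bit register address plus 9 data bits. */
static void wm8731_write(uint8_t reg, uint16_t val) {
    i2c_write(0x1A, (uint8_t)((reg << 1) | ((val >> 8) & 1)), (uint8_t)val);
}

static void wm8731_init(void) {
    wm8731_write(WM8731_APATH, 0x014);  /* DACSEL=1, BYPASS=0, mic input  */
    wm8731_write(WM8731_IFACE, 0x002);  /* I2S, 16-bit, MS=0 (slave mode) */
    /* SR bits for 8 kHz depend on the crystal; see the datasheet table. */
    wm8731_write(WM8731_SRATE, 0x00C);
    wm8731_write(WM8731_ACTIVE, 0x001);
}

int main(void) { wm8731_init(); return 0; }
```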
[0099] ②System function circuit
[0100] The system's core processor is the SAMSUNG S3C2410, based on the ARM920T core with a main frequency of 203 MHz. Externally extended 64M×16-bit NAND Flash stores the embedded real-time operating system, the speech recognition engine, and the voice interaction scene content; 32M×16-bit extended SDRAM serves as the data buffer for the running voice interaction system, allowing the S3C2410 to process signals and judge states normally.
[0101] ③Man-machine interface circuit
[0102] The voice interaction system is connected to the computer through the USB interface; the USB connection circuit is shown in Figure 11. This makes it convenient for the user to design customized voice interaction scene content and quickly download it to the system over USB, updating the interactive content. In practice, the interface can also be wireless, for example a Bluetooth module or a WLAN wireless network module.
[0103] (2) Software part:
[0104] ①The software structure of the voice interactive system:
[0105] As shown in Figure 14, the software architecture of the voice interaction system has three layers: the embedded Linux real-time operating system, the speech recognition engine layer, and the user application development layer. The user application development layer includes the user client software, XML-configuration-file-based situational dialogue setting, and the USB download interface. The speech recognition engine, based on Hidden Markov Models (HMM), can recognize 200 speaker-independent command sentences.
[0106] Using the client software (the customization interface shown in Figure 15), the user generates a situational dialogue based on a speech recognition configuration file (an XML file). Extensible Markup Language (XML) structures data in a self-describing, platform-neutral way, so it can represent complex data while remaining readable. In this software structure, XML documents serve as the configuration files for voice interaction and store the initial state information and parameters of the interactive dialogue. When the voice interaction system starts, the information and parameters to be loaded for the dialogue content are read from the XML file; through loading and parsing of the configuration file, the resulting state information is passed to the finite state machine, and transitions are established dynamically.
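The configuration loading could look like the following hedged sketch using libxml2, a common choice on embedded Linux, though the patent does not name a parser. The element names match the hypothetical sample shown earlier, not a published schema.

```c
/* Build with: gcc scene.c $(xml2-config --cflags --libs) */
#include <libxml/parser.h>
#include <libxml/tree.h>
#include <stdio.h>

int load_scene(const char *path) {
    xmlDocPtr doc = xmlReadFile(path, NULL, 0);
    if (!doc) return -1;
    xmlNodePtr root = xmlDocGetRootElement(doc);        /* <scene> */
    for (xmlNodePtr n = root->children; n; n = n->next) {
        if (n->type == XML_ELEMENT_NODE &&
            xmlStrcmp(n->name, BAD_CAST "dialog") == 0) {
            xmlChar *id = xmlGetProp(n, BAD_CAST "id");
            /* here the state would be registered with the FSM */
            printf("registering dialogue state %s\n", id);
            xmlFree(id);
        }
    }
    xmlFreeDoc(doc);
    return 0;
}

int main(void) { return load_scene("scene.xml"); }
```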
[0107] The process by which users customize voice interaction scenarios through the client software is shown in Figure 3. The user first sets the start tone of the situational dialogue, then develops the design around the dialogue content of the service robot in a given scene (such as home entertainment or patrol monitoring). This can include the user asking questions and the robot answering, or, based on state judgments, the robot actively striking up a conversation and starting the interaction, which feels more cordial and natural. After the design is complete, clicking Generate in the client produces the XML voice configuration file and compresses and packs the audio files; the USB interface connecting the desktop and the voice interaction system completes recognition of the external device, and the user-customized voice interaction scene content is finally downloaded into the Flash of the voice interaction system, realizing the design and update of voice interaction content.
[0108] The scene design process is shown in Figure 4. After the user designs the initial sound of the scene, they enter the text of the first group of dialogues and specify the state corresponding to each input event; based on the speech recognition state and analysis of the state transition function δ, the system determines whether to jump and what the subsequent interaction is. The user then continues with the design of the second group of dialogues, which are recognized and judged according to input events to determine the state transition and the robot's response, and so on in sequence until the interactive content of the entire scene is complete.
[0109] ②Application of the finite state machine in voice interaction
[0110] Events in the voice interaction module are the different voice or key inputs. The finite state set Q here includes five states: the voice collection and A/D conversion state, the voice recognition state, the conversion logic query state, the voice answer output state, and the exit state at the end of the interaction. The finite event set Σ consists of the different sound or key inputs. The state transition function δ gives the rules by which data processing is completed and different output states are realized according to the different inputs.
[0111] As shown in Figure 16, transitions among the five states follow the specific rules of the state transition function δ, namely E1, E2, ..., E7. E1: after normal operation, jump from the voice collection and A/D conversion state to the voice recognition state. E2: after the input event is correctly recognized, jump to the conversion logic query state. E3: when the voice output condition is satisfied, jump to the voice answer output state. E4: when the output response of the interaction is complete, jump to the exit state. E5: when a judgment cannot be made in a given state, return to the previous state. E6: when an error occurs in a given state, jump directly to the exit state, ending the current dialogue so that a new interaction can begin.
[0112] Combining finite state machine theory with the voice interaction process, voice collection and A/D conversion, voice recognition, XML configuration file parsing, conversion rule matching, and voice response output are each treated as states, and the state machine performs judgment and transitions among them, realizing a natural and harmonious voice interaction process.
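The patent gives no source for its state machine. The following minimal C sketch, with invented names, mirrors the five states and the normal/fallback/error rules E1-E6 described above.

```c
#include <stdio.h>

typedef enum { ST_CAPTURE, ST_RECOGNIZE, ST_QUERY, ST_OUTPUT, ST_EXIT } state_t;
typedef enum { EV_OK, EV_BACK, EV_ERROR } event_t;

static state_t step(state_t s, event_t e) {
    if (e == EV_ERROR) return ST_EXIT;            /* E6: any error ends the round */
    if (e == EV_BACK)                             /* E5: fall back one state      */
        return s > ST_CAPTURE ? (state_t)(s - 1) : s;
    switch (s) {                                  /* EV_OK: normal progression    */
    case ST_CAPTURE:   return ST_RECOGNIZE;       /* E1 */
    case ST_RECOGNIZE: return ST_QUERY;           /* E2 */
    case ST_QUERY:     return ST_OUTPUT;          /* E3 */
    case ST_OUTPUT:    return ST_EXIT;            /* E4 */
    default:           return ST_EXIT;
    }
}

int main(void) {
    state_t s = ST_CAPTURE;
    while (s != ST_EXIT) {
        printf("state %d\n", s);
        s = step(s, EV_OK);   /* a real driver would feed recognition results */
    }
    return 0;
}
```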
[0113] As the dynamic behavior model of the voice interaction system, the FSM is based on event-driven state transitions, which suits the expression of dynamic control processes, gives the system the ability to express interactive behavior logically, and greatly enhances operability. The advantages of voice interaction based on finite state machines are twofold. On one hand, it standardizes the behavior and control flow of the interactive system, shortens the design and development cycle of a user's voice interaction in a given scenario, and makes interaction more natural and smooth. On the other hand, with the FSM model, the control functions required in the voice interaction process and their inheritance and transfer relationships are expressed more clearly in the structure of input events, corresponding rules, state jumps, and interactive output. Using this method, we designed a voice interaction system for children's "entertainment and fun" smart toys and verified the feasibility of the design method in actual product operation.
[0114] The system of this embodiment can also be connected to the computer through a variety of connection modes, and can realize a variety of extended functions:
①Automatic content updates via the wireless network
[0115] In addition to the USB interface, the system can use a wireless network module so that, in an environment with wireless network support, the voice interaction system automatically connects to the companion website. According to user requests (for example, with the system applied in a smart toy, when a button on a certain part of the toy is pressed), it downloads the dialogue content, songs, stories, math quizzes, or other dialogue themes offered on the website (such as birthday greeting dialogues, lover's confession dialogues, or greetings conveying parents' and relatives' affection), realizing automatic updates over the network.
[0116] The wireless module of each voice system has its own IP address. In an environment with wireless network support, the wireless module automatically searches for and links to a wireless router; the router connects to the external Internet, so the voice system establishes a connection to the Internet with an independent IP address. The voice system has the address of the download website (web server) built in; once connected to the external network, it logs in to the website automatically and, when the user presses the download button, downloads the corresponding network content, updating the system's content.
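The patent does not specify the download protocol. Assuming a plain HTTP fetch, a sketch with libcurl might look like this; the URL, file paths, and the use of libcurl itself are assumptions.

```c
/* Build with: gcc update.c -lcurl */
#include <curl/curl.h>
#include <stdio.h>

/* Fetch one content package from the preset web server into a local file. */
static int download_update(const char *url, const char *dest) {
    CURL *curl = curl_easy_init();
    if (!curl) return -1;
    FILE *f = fopen(dest, "wb");
    if (!f) { curl_easy_cleanup(curl); return -1; }
    curl_easy_setopt(curl, CURLOPT_URL, url);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, f);  /* default writer is fwrite */
    CURLcode rc = curl_easy_perform(curl);
    fclose(f);
    curl_easy_cleanup(curl);
    return rc == CURLE_OK ? 0 : -1;
}

int main(void) {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    int rc = download_update("http://example.com/scene_pack.zip",  /* placeholder */
                             "/flash/scene_pack.zip");             /* placeholder */
    printf("download %s\n", rc == 0 ? "ok" : "failed");
    curl_global_cleanup();
    return rc;
}
```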
[0117] ②Real-time conversations with relatives and friends via wireless network
[0118] After the voice system connects to the Internet automatically through the wireless module, it has an independent IP address, so it can establish a connection with a voice system anywhere else the Internet is reachable and realize network calling. For example, with the voice interaction system embodied as a smart toy held in a child's arms, parents at work can talk to their children in real time over the Internet, learn how they are doing, and communicate with the family; the child only needs to press the button on a certain part of the toy.
[0119] In the above process, when the voice system connects to the external Internet through the wireless network module, it obtains a corresponding IP address. External users can connect to the voice system through that IP address and send a call request; a prompt tone sounds on the system side, and by pressing the answer button the user establishes a call connection with the other Internet user, multiplexing the voice system's audio input and output devices for the call. Users anywhere can thus converse with the voice system as long as they can reach the Internet, expanding local voice interaction into voice interaction over the network.
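The call-setup protocol is not disclosed. Purely as an illustration of "connect by IP address, signal an incoming call, answer", here is a minimal TCP listener sketch; the port number and protocol are invented, and error handling is trimmed.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int srv = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(5060);        /* illustrative port choice */
    bind(srv, (struct sockaddr *)&addr, sizeof addr);
    listen(srv, 1);
    printf("waiting for a call request...\n");
    int peer = accept(srv, NULL, NULL);        /* external user connects */
    if (peer >= 0) {
        printf("incoming call: play prompt tone, wait for answer button\n");
        /* on answer, audio would be multiplexed over this connection */
        close(peer);
    }
    close(srv);
    return 0;
}
```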
[0120] ③Bluetooth function
[0121] The voice system can also be fitted with a Bluetooth module, enabling interconnection with Bluetooth-capable PCs, mobile phones, and other smart devices. This conveniently establishes a wireless connection with the client software on a PC to update recognized dialogue content, songs, stories, and the like, and can also support functions such as intelligent upgrades of the built-in software.
[0122] When the system's Bluetooth module is turned on, it automatically searches for surrounding Bluetooth devices. When one is found (such as a Bluetooth-capable notebook computer or high-end mobile phone), it attempts to connect; after the notebook permits the connection, the Bluetooth module establishes a Bluetooth wireless link with it, enabling communication between the client software running on the notebook and the voice system, including downloads of dialogue content and other files.
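As an illustration of the discovery step only, here is a hedged sketch using the Linux BlueZ HCI API; the patent does not say which Bluetooth stack is used, and pairing and file transfer are omitted.

```c
/* Build with: gcc scan.c -lbluetooth */
#include <bluetooth/bluetooth.h>
#include <bluetooth/hci.h>
#include <bluetooth/hci_lib.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    int dev_id = hci_get_route(NULL);            /* first local adapter */
    int sock   = hci_open_dev(dev_id);
    if (dev_id < 0 || sock < 0) return 1;

    inquiry_info *ii = calloc(255, sizeof(inquiry_info));
    /* scan for ~10 s, up to 255 responses */
    int n = hci_inquiry(dev_id, 8, 255, NULL, &ii, IREQ_CACHE_FLUSH);
    for (int i = 0; i < n; i++) {
        char addr[19];
        ba2str(&ii[i].bdaddr, addr);             /* candidate PC or phone */
        printf("found %s\n", addr);
    }
    free(ii);
    close(sock);
    return 0;
}
```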
[0123] The system described in this embodiment can also be configured, and can present 3D content and animation, without a computer connection. The details are as follows:
[0124] ①Setting up the interactive system without a computer connection:
[0125] The voice system can include a true-color TFT LCD with a touch screen overlay, which displays operating status and information of the interactive system. At the same time, the user can conveniently set the dialogue content and the playback order of songs and stories through the touch screen, so the interactive system can be configured and updated without connecting to a PC.
[0126] ②3D and animation presentation:
[0127] Through the system's LCD display, 3D content and animation can be played, enriching the content of the interactive system. The LCD can also display patterns for different emotions (such as joy, anger, sorrow, a crying face, a smiling face, dejection, etc.); combined with voice dialogue recognition, this makes the interaction process more natural and lifelike, as if two people were communicating and talking.
[0128] The hardware circuit structure of the LCD part is shown in Figure 12. The LCD driver is supported in the embedded Linux operating system and, like a desktop monitor, the panel is connected by cable. The system can also be fitted with a touch screen, whose control signals likewise connect to the S3C2410 central processor; calibration is performed on first use. When the user taps the touch screen with the stylus, the corresponding (x, y) coordinates are transmitted to the CPU, and the corresponding operation is performed according to the position information.
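A hedged sketch of reading the (x, y) coordinates through the Linux input-event interface, a common route for touch panels under embedded Linux; the device node and the dispatch step are assumptions.

```c
#include <fcntl.h>
#include <linux/input.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fd = open("/dev/input/event0", O_RDONLY);  /* touch panel node (assumed) */
    if (fd < 0) return 1;
    struct input_event ev;
    int x = -1, y = -1;
    while (read(fd, &ev, sizeof ev) == sizeof ev) {
        if (ev.type == EV_ABS && ev.code == ABS_X) x = ev.value;
        if (ev.type == EV_ABS && ev.code == ABS_Y) y = ev.value;
        if (ev.type == EV_SYN && x >= 0 && y >= 0)
            printf("tap at (%d, %d)\n", x, y);     /* dispatch to UI handler */
    }
    close(fd);
    return 0;
}
```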
[0129] The 3D and animation presentations are stored in Flash memory, called up by the S3C2410 central processor, and displayed on the LCD screen. They are also coordinated with the speech recognition state machine (FSM): the S3C2410 judges the state of the voice system so that, while voice information is being output, different animations and patterns are shown on the LCD.
[0130] In summary, the system of the embodiments of the present invention connects to the computer through a USB cable (or a wireless connection). The client software installed on the computer automatically identifies the system and establishes a connection. With the client software, users can easily customize their own voice interaction scenarios: set the questions to be recognized, use their own recordings as the system's answers, insert songs, stories, and other material, and design speech-recognition-based game segments such as story solitaire, math quizzes, and knowledge contests. After completing the steps specified by the client software, the content can be quickly and conveniently downloaded into the system memory through the USB interface cable, yielding a voice interaction device with new content and the user's own voice. Because users can customize it, giving full play to their imagination and creating different scenarios and content, the system is more flexible, intelligent, and participatory.