Embodiment 3
 In this embodiment, the intelligent voice system of the present invention and its interaction method are further described through a specific implementation process. As shown in Figures 6-13, the system is divided into hardware and software, including:
 (1) Hardware part:
 The hardware is built around the ARM9 high-performance SoC processor S3C2410, with a main frequency of 200 MHz. With the S3C2410 embedded controller at the center, voice signals are collected through an external microphone sensor; voice signal sampling, amplification, and pre-filtering, as well as subsequent voice playback, are handled by the WM8731 audio chip. The board is equipped with 32M×16bit extended SDRAM and 64M×16bit NAND Flash storage. The system communicates with the client interface of the user application development layer through a USB interface, which is used here as an example; modules such as a wireless network card or a Bluetooth interface can also be added and connected to the S3C2410 for data exchange, and an LED display can be added to the S3C2410 processor module to achieve effects such as 3D animation output.
 The hardware circuit is divided into the following parts:
 ①Front-end processing circuit
 The system uses a non-directional microphone for voice input, which can collect voice signals within a 120-degree frontal angle. Pre-amplification, anti-aliasing filtering, and A/D conversion are handled by WOLFSON's WM8731, a low-power CODEC chip suited to voice applications that integrates two sets of ADCs (analog/digital converters) and DACs (digital/analog converters). The sampling frequency is set to 8 kHz by the external crystal oscillator frequency and the registers, with 16-bit A/D sampling; BYPASS mode is turned off and the chip is set to Slave mode. The gain of the input amplifier is adjusted so that the microphone's voice pickup is best in the 50-60 cm range, and the output gain is set to its maximum value so that the speaker output is loud enough.
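The codec settings above can be summarized in a short sketch. This is illustrative Python, not driver code: the field names are assumptions rather than the actual WM8731 register map, and the data-rate calculation simply shows what the 8 kHz / 16-bit setting implies for buffering speech data in SDRAM.

```python
# Illustrative sketch (not vendor code) of the WM8731 settings described above.
# Field names are assumptions; real configuration is done via register writes.

WM8731_CONFIG = {
    "sample_rate_hz": 8000,   # set via external crystal + sampling-control register
    "sample_bits": 16,        # 16-bit A/D sampling
    "bypass": False,          # BYPASS mode turned off
    "mode": "slave",          # codec clocked by the S3C2410 (Slave mode)
    "mic_pickup_cm": (50, 60),  # input gain tuned for best pickup in this range
    "output_volume": "max",   # output gain at maximum for loud speaker output
}

def pcm_bytes_per_second(cfg, channels=1):
    """Raw PCM data rate implied by the sampling settings."""
    return cfg["sample_rate_hz"] * cfg["sample_bits"] // 8 * channels

# 8 kHz x 16-bit mono speech -> 16000 bytes/s to buffer in SDRAM
```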
 ②System function circuit
 The core of the system is the SAMSUNG S3C2410 processor based on the ARM920T core, with a main frequency of 203 MHz. The externally extended 64M×16bit NAND Flash memory stores the embedded real-time operating system, the speech recognition engine, and the scene content of the voice interaction; the 32M×16bit extended SDRAM serves as the data buffer for the running voice interaction system, so that the S3C2410 can process signals and judge states normally.
 ③Man-machine interface circuit
 The voice interaction system is connected to the computer through a USB interface; the USB interface connection circuit is shown in Figure 11. This makes it convenient for the user to design and customize the voice interaction scene content and to quickly download it to the system through the USB interface, so as to update the interactive content. In practice, the interface can also be wireless, such as a Bluetooth module or a WLAN wireless network module.
 (2) Software part:
 ①The software structure of the voice interactive system:
 As shown in Figure 14, the software architecture of the voice interaction system is divided into three layers: the embedded Linux real-time operating system, the speech recognition engine layer, and the user application development layer. The user application development layer includes the user client software, scenario dialogue settings based on XML configuration files, and the USB download interface. The speech recognition engine, based on Hidden Markov Models (HMM), can recognize 200 speaker-independent command sentences.
 Using the client software (the customization interface shown in Figure 15), the user generates a scenario dialogue based on a speech recognition configuration file (an XML file). Extensible Markup Language (XML) structures data in a self-descriptive, neutral way, which can represent complex data while remaining readable. In this software structure, XML documents serve as the configuration files for voice interaction and store the initial state information and parameters of the interactive dialogue. When the voice interaction system starts, the information and parameters needed for the dialogue content are read from the XML file; by loading and parsing the XML configuration file, the obtained state information is passed to the finite state machine, and the connections are established dynamically.
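The loading-and-parsing step can be sketched as follows. The element and attribute names in this sample XML are assumptions for illustration, not the actual schema used by the system; the point is how a scenario file is turned into the state table handed to the finite state machine.

```python
import xml.etree.ElementTree as ET

# Hypothetical scenario-dialogue configuration in the spirit of the XML
# files described above; the schema is an assumption, not the real one.
SCENE_XML = """
<scene name="home_entertainment" start_tone="hello.wav">
  <dialog id="1">
    <ask>what is your name</ask>
    <answer audio="name_reply.wav"/>
    <next state="2"/>
  </dialog>
  <dialog id="2">
    <ask>sing a song</ask>
    <answer audio="song1.mp3"/>
    <next state="exit"/>
  </dialog>
</scene>
"""

def load_scene(xml_text):
    """Parse the configuration and return (start tone, FSM state table)."""
    root = ET.fromstring(xml_text)
    states = {}
    for d in root.findall("dialog"):
        states[d.get("id")] = {
            "ask": d.findtext("ask"),              # recognized question text
            "answer": d.find("answer").get("audio"),  # audio file to play back
            "next": d.find("next").get("state"),   # state to jump to
        }
    return root.get("start_tone"), states
```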
 The process by which users customize voice interaction scenarios through the client software is shown in Figure 3. The user first sets the start tone of the scenario dialogue, and then designs around the dialogue content of the service robot in a given scene (such as home entertainment or patrol monitoring). This can include the user actively asking and the robot answering, or the robot, based on its judgment of the state, actively striking up a conversation and starting the interaction, which makes it more cordial and natural. After the design is completed, the user clicks Generate in the client to produce the XML voice configuration file, and the audio files are compressed and packed. The desktop computer recognizes the voice interaction system as an external device through the USB interface, and the user-customized voice interaction scene content is finally downloaded to the Flash of the voice interaction system, realizing the design and update of the voice interaction content.
 The scene design process is shown in Figure 4. After designing the initial sound of the scene, the user enters the text of the first set of dialogues and specifies the state corresponding to each input event; according to the result of the speech recognition state and the analysis of the state transition function δ, whether to jump and the subsequent interaction process are determined. The user then continues with the second set of dialogues, recognizing and judging according to the input events and determining the state transition and the robot's response, and so on in sequence until the interactive content of the entire scene is designed.
 ② Application of Finite State Machine in voice interaction
 Different events in the voice interaction module correspond to different voice or key inputs. The finite state set Q here includes five states: the voice collection and A/D conversion state, the speech recognition state, the transition logic query state, the voice answer output state, and the exit state at the end of the interaction. The finite event set Σ consists of the different voice or key inputs. The state transition function δ refers to the rules for completing data processing according to different inputs and producing the outputs of the different states.
 As shown in Figure 16, transitions among the five states are realized through the specific rules of the state transition function δ, namely E1, E2, ..., E7. E1 is the transition from the voice collection and A/D conversion state to the speech recognition state after normal operation; E2 jumps to the transition logic query state after the input event is correctly recognized; E3 jumps to the voice answer output state when the voice output condition is satisfied; E4 jumps to the exit state when the output response of the voice interaction is completed; E5 returns to the previous state when a judgment cannot be made in a given state; E6 jumps directly to the exit state when an error occurs in a given state, thus ending the current dialogue so that a new interaction can begin.
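A minimal sketch of the five states and transition rules just described, as runnable Python. The state and event names are paraphrased from the text rather than taken from any source code, and E7, which the text mentions but does not detail, is omitted.

```python
# Minimal sketch of the five-state machine and rules E1-E6 described above.
# State/event names are paraphrased from the text; E7 is not detailed there.

ORDER = ["CAPTURE", "RECOGNIZE", "QUERY", "OUTPUT"]  # EXIT ends the interaction

# delta: (state, event) -> next state, per rules E1-E4
DELTA = {
    ("CAPTURE",   "E1"): "RECOGNIZE",  # E1: A/D conversion completed normally
    ("RECOGNIZE", "E2"): "QUERY",      # E2: input event correctly recognized
    ("QUERY",     "E3"): "OUTPUT",     # E3: transition rule matched, answer found
    ("OUTPUT",    "E4"): "EXIT",       # E4: response finished, interaction ends
}

def step(state, event):
    """Apply one transition of delta; E5 falls back one state, E6 aborts."""
    if event == "E6":                      # E6: error anywhere -> straight to EXIT
        return "EXIT"
    if event == "E5" and state in ORDER:   # E5: cannot decide -> previous state
        return ORDER[max(ORDER.index(state) - 1, 0)]
    return DELTA.get((state, event), state)
```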
 Combining finite state machine theory with the voice interaction process, voice collection and A/D conversion, speech recognition, XML configuration file parsing, transition rule matching, and voice response output are treated as different states, and the state machine performs state judgment and transition, thereby realizing a natural and harmonious voice interaction process.
 As the dynamic behavior model of the voice interaction system, the FSM is based on event-driven state transitions, which suits the expression of dynamic control processes; it gives the system the logical ability to express interactive behavior and greatly enhances operability. The advantages of voice interaction based on finite state machines are twofold: on the one hand, it standardizes the behavior and control flow of the interactive system, shortens the design and development cycle of the user's voice interaction for a given scenario, and makes the interaction more natural and smooth; on the other hand, using the FSM model, the control functions required in the voice interaction process and their state inheritance and transition relationships are expressed more clearly in an interaction-system structure made up of input events, corresponding rules, state jumps, and interactive output. Using this method, we designed a voice interaction system for children's "entertainment and fun" smart toys and verified the feasibility of the above design method in actual product operation.
 The system of this embodiment can also be connected to the computer through a variety of connection modes, and can realize a variety of extended functions:
 ①Automatic content update via wireless network
 In addition to the USB interface, the above system can also use a wireless network module so that, in an environment that supports wireless networks, the voice interaction system automatically connects to the matching website. According to user requirements (for example, when the system is applied in a smart toy, by pressing a button on a certain part of the toy), it downloads the dialogue content, songs, stories, math challenges, or other dialogue themes provided on the website (such as birthday greeting dialogues, lover confession dialogues, or greetings expressing the longing of parents and relatives), thereby realizing automatic updates over the network.
 The wireless module of each voice system has its own IP address. In an environment that supports wireless networks, the wireless module automatically searches for and establishes a link with the wireless router; the wireless router is connected to the external Internet, so the voice system establishes a connection to the Internet with an independent IP address. The voice system has the address of the download website (web server) built in; once connected to the external network it automatically logs in to the website and, when the user presses the download button, downloads the corresponding network content, thus updating the content.
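The auto-update flow above can be sketched as follows. The server URL, the package-naming scheme, and the `fetch` function are hypothetical stand-ins for the built-in website address and the wireless module's HTTP client; `flash` stands in for the NAND Flash store.

```python
# Sketch of the automatic-update flow described above. The URL, package
# naming, and fetch function are illustrative assumptions, not real endpoints.

PRESET_SERVER = "http://example.com/voice-content"  # built-in download site

def update_content(theme, fetch, flash):
    """Download one dialogue theme and store it in the system's Flash.

    `fetch(url)` stands in for the wireless-module HTTP client and
    `flash` for the NAND Flash file store (a dict here).
    """
    url = f"{PRESET_SERVER}/{theme}.pkg"
    flash[theme] = fetch(url)   # e.g. songs, stories, birthday greetings
    return url

# Usage with a fake fetch, since this sketch has no real network:
store = {}
update_content("birthday_greeting", lambda url: b"packed-audio", store)
```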
 ②Real-time conversations with relatives and friends via wireless network
 After the voice system automatically connects to the Internet through the wireless module, it has an independent IP address, so it can establish a connection with a voice system anywhere else the Internet can be accessed and realize network calls. For example, when the voice interaction system takes the form of a smart toy held in a child's arms, parents at work can talk to their child in real time over the Internet to learn about the child's situation and communicate with the family; the child only needs to press a button on a certain part of the toy.
 In the above process, when the voice system connects to the external Internet through the wireless network module, the system has a corresponding IP address. External users can establish a connection with the voice system through this IP address and send a call request; a prompt tone sounds on the system side, and the user can establish a call with other users on the Internet by pressing the answer button, reusing the audio input and output devices of the voice system for the call. In this way, users anywhere with Internet access can converse with the voice system, expanding local voice interaction into voice interaction over the network.
 ③Bluetooth function
 The voice system can also be equipped with a Bluetooth module to interconnect with Bluetooth-enabled PCs, mobile phones, and other smart devices, so that a wireless connection with the client software on a PC can be conveniently established to update speech-recognition dialogue content, songs, stories, and so on. It can also perform functions such as intelligent upgrades of the built-in software.
 When the Bluetooth module in the system is turned on, it automatically searches for surrounding Bluetooth devices. When a Bluetooth device (such as a Bluetooth-enabled laptop or high-end mobile phone) is found and the laptop permits the connection, the Bluetooth module establishes a Bluetooth wireless link with the laptop, enabling communication between the client software running on the laptop and the voice system and the download of dialogue content and other files.
 The system described in this embodiment can also realize the setting of the interactive system and the demonstration of 3D and animation without a computer. The details are as follows:
 ①Setting of interactive system without computer connection:
 The voice system can include a true-color TFT LCD with a touch screen on top, which displays the operating status and information of the interactive system. At the same time, the user can conveniently set the dialogue content and the playback order of songs and stories through the touch screen, so the interactive system can be configured and updated without connecting to a PC.
 ②3D and animation presentation:
 Through the system's LCD display, 3D content and animation can be played, enriching the content of the interactive system. The LCD can also display patterns for different emotions (such as joy, sorrow, crying, a smiling face, or dejection) and, combined with voice dialogue recognition, make the interaction process more natural and lifelike, as if two people were communicating and talking.
 The hardware circuit structure of the LCD part is shown in Figure 12. The LCD driver is supported in the embedded Linux operating system and, like a desktop monitor, is connected by cable. The system can also be fitted with a touch screen, in which case its control signal is likewise connected to the central processing unit S3C2410. Calibration is performed on first use; when the user taps the touch screen with a stylus, the corresponding (x, y) coordinates are transmitted to the CPU, and the corresponding operation is performed according to the position information.
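The last step, mapping calibrated (x, y) coordinates to an operation, can be sketched as a simple hit test. The button layout and the 320×240 resolution are assumptions for illustration only.

```python
# Sketch of mapping a calibrated touch-screen tap to an on-screen control.
# The button regions and the 320x240 resolution are illustrative assumptions.

BUTTONS = {
    "play_song": (0, 0, 160, 120),     # (x0, y0, x1, y1) on a 320x240 LCD
    "next_story": (160, 0, 320, 120),
}

def hit_test(x, y):
    """Return the control whose region contains the tap coordinates, if any."""
    for name, (x0, y0, x1, y1) in BUTTONS.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return name
    return None  # tap outside every control: no operation
```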
 The 3D and animation presentations are stored in the Flash memory, called by the central processing unit S3C2410, and displayed on the LCD screen. They are combined with the speech recognition finite state machine (FSM): the S3C2410 makes judgments based on the state of the voice system, so that while voice information is being output, different animations and patterns are displayed on the LCD.
 In summary, the system in this embodiment of the present invention is connected to the computer through a USB cable (or a wireless connection), and the client software installed on the computer can automatically identify the system and establish a connection. With the client software, users can easily customize their own voice interaction scenarios, including setting the questions to be recognized, using their own recordings as the system's answers, inserting songs, stories, and other scenario content, and designing game segments based on speech recognition, such as story solitaire, math challenges, and quizzes. After completing the steps specified by the client software, the content can be quickly and conveniently downloaded to the system memory through the USB interface cable, turning the device into a voice interaction device with new content and the user's own voice. Users can customize it, give full play to their imagination, and create different scenarios and content, making the system more flexible, intelligent, and participatory.