[0048] It should be understood that the specific embodiments described herein are only used to explain the present invention, but not to limit the present invention.
[0049] Figure 1 is a schematic structural diagram of the hardware operating environment involved in the solutions of the embodiments of the present invention.
[0050] It should be noted that Figure 1 may be a structural diagram of the hardware operating environment of the voice remote control device. The voice remote control device in the embodiments of the present invention may be a terminal device such as a PC or a portable computer.
[0051] As shown in Figure 1, the voice remote control device may include: a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to implement connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as a magnetic disk memory. Optionally, the memory 1005 may also be a storage device independent of the foregoing processor 1001.
[0052] Those skilled in the art can understand that the structure of the voice remote control device shown in Figure 1 does not constitute a limitation on the voice remote control device, which may include more or fewer components than shown in the figure, a combination of some components, or a different arrangement of components.
[0053] As shown in Figure 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a voice remote control program. The operating system is a program that manages and controls the hardware and software resources of the voice remote control device, and supports the operation of the voice remote control program and other software or programs.
[0054] In the voice remote control device shown in Figure 1, the user interface 1003 is mainly used for data communication with various terminals; the network interface 1004 is mainly used for connecting to a background server and for data communication with the background server; and the processor 1001 can be used to call the voice remote control program stored in the memory 1005 and perform the following operations:
[0055] Receiving first audio data sent by a remote control terminal, where the first audio data is obtained by the remote control terminal by processing an acquired user voice remote control instruction;
[0056] Processing the first audio data according to preset rules to obtain second audio data;
[0057] Sending the second audio data to a cloud server;
[0058] Receiving a control command text issued by the cloud server, parsing the control command text to obtain a control command, and executing the control command, where the control command text is obtained by the cloud server by processing the second audio data.
[0059] Further, before the step of sending the second audio data to the cloud server, the processor 1001 may also be used to call a voice remote control program stored in the memory 1005 and execute the following steps:
[0060] Creating a Socket connection and sending a connection request to the cloud server;
[0061] Receiving a response from the cloud server to the connection request, and establishing a Socket connection with the cloud server.
[0062] Further, the processor 1001 may also be used to call the voice remote control program stored in the memory 1005 and execute the following steps:
[0063] Obtaining a preset audio optimization standard;
[0064] Optimizing the first audio data based on the obtained audio optimization standard, and using the optimized first audio data as the second audio data.
[0065] Further, before the step of receiving the first audio data sent by the remote control terminal, the processor 1001 may also be used to call the voice remote control program stored in the memory 1005 and execute the following steps:
[0066] Detecting whether a preset write instruction is received;
[0067] If yes, proceeding to the step of receiving the first audio data sent by the remote control terminal.
[0068] Further, before the step of sending the second audio data to the cloud server, the processor 1001 may also be used to call a voice remote control program stored in the memory 1005 and execute the following steps:
[0069] Detecting whether a preset read instruction is received;
[0070] If yes, proceeding to the step of sending the second audio data to the cloud server.
[0071] Further, the processor 1001 may also be used to call the voice remote control program stored in the memory 1005 and execute the following steps:
[0072] In response to the Bluetooth pairing request sent by the remote control terminal, establishing a Bluetooth connection with the remote control terminal;
[0073] Based on the Bluetooth connection, receiving the first audio data sent by the remote control terminal.
[0074] Based on the above structure, various embodiments of the voice remote control method of the present invention are proposed.
[0075] Referring to Figure 2, Figure 2 is a schematic flowchart of the first embodiment of the voice remote control method of the present invention.
[0076] An embodiment of the present invention provides an embodiment of the voice remote control method. It should be noted that although a logical sequence is shown in the flowchart, in some cases the steps shown or described may be executed in an order different from the one shown here.
[0077] The voice remote control method in the embodiment of the present invention is applied to a controlled device. The controlled device in the embodiment of the present invention may be a terminal device such as a smart TV or a set-top box of a digital TV, which is not specifically limited here.
[0078] The voice remote control method of this embodiment includes:
[0079] Step S100, receiving first audio data sent by a remote control terminal, wherein the first audio data is obtained by the remote control terminal by processing an acquired user voice remote control instruction;
[0080] At present, most consumer electronic devices are controlled by the user through a mechanical remote control. For example, when watching TV, the user needs to manually operate the remote control for channel search, volume adjustment, program switching, signal source switching, opening/closing applications, powering the device on/off, adjusting TV image/sound parameters, and so on. However, users often need to click the remote control multiple times to open multi-level menus and find the program to watch in the program list one by one, or find the button for the parameter to be adjusted. The search operation is very cumbersome, which brings great inconvenience to users, and the adjustment response cannot meet the requirements of real-time response.
[0081] In this embodiment, as an implementation manner, the remote control terminal has a built-in microphone input module. After the microphone input module obtains the user's voice remote control instruction, the acquired instruction is processed by the MCU (Microcontroller Unit) built into the remote control terminal. The voice remote control instruction is an analog signal. The processing operations may include identifying keywords, extracting the backbone of the voice instruction, sampling the extracted backbone, PDM (Pulse Density Modulation) modulation, MCU encoding, and so on. In this way, the analog voice remote control instruction is converted into a digital signal to form DMA (Direct Memory Access) data, that is, the first audio data, and the obtained first audio data is sent to the controlled device.
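The PDM modulation mentioned above can be illustrated with a minimal sketch. The source does not specify the modulator design, so a first-order delta-sigma modulator is assumed here, and the function name `pdm_encode` and the normalized input range [-1.0, 1.0] are hypothetical choices made only for illustration.

```python
from typing import List, Sequence


def pdm_encode(samples: Sequence[float]) -> List[int]:
    """Encode samples in [-1.0, 1.0] as a 1-bit PDM stream.

    Assumption: a first-order delta-sigma modulator; the source only
    names PDM and does not describe the modulator.
    """
    bits: List[int] = []
    error = 0.0  # accumulated difference between input and emitted bits
    for x in samples:
        error += x
        if error >= 0.0:
            bits.append(1)
            error -= 1.0  # a 1 bit represents +1.0
        else:
            bits.append(0)
            error += 1.0  # a 0 bit represents -1.0
    return bits


if __name__ == "__main__":
    stream = pdm_encode([0.5] * 400)
    # The density of ones tracks the input level: (0.5 + 1) / 2 = 0.75.
    print(sum(stream) / len(stream))
```

In such a scheme the density of 1 bits encodes the signal amplitude, so a decoder can recover the waveform by low-pass filtering the bit stream.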
[0082] In this embodiment, the controlled device receives the first audio data sent by the remote control terminal. As an implementation manner, the remote control terminal establishes a wireless connection, such as a Bluetooth connection, with the controlled device. Based on the wireless connection established by both parties, the remote control terminal transmits the first audio data to the controlled device.
[0083] Step S200: Process the first audio data according to a preset rule to obtain second audio data;
[0084] Specifically, after the controlled device receives the first audio data sent by the remote control terminal, it processes the first audio data. As an implementation manner, the first audio data may be noise-reduced through ALSA (Advanced Linux Sound Architecture) to generate a PCM file, and the generated PCM file, that is, the second audio data, is then uploaded to a cloud server; the second audio data is a processed recording file stream.
[0085] Step S300, sending the second audio data to a cloud server;
[0086] In this embodiment, as an implementation manner, the websocket mechanism is used to upload the second audio data. The controlled device creates a Socket connection and sends a file transfer request to the cloud server. After the cloud server receives the processed recording file stream, that is, the second audio data, from the controlled device, the text recognition engine server recognizes the second audio data, generates a text recognition stream, that is, the control command text, according to the recognition result, and sends the generated control command text to the controlled device.
[0087] It should be noted that the websocket mechanism adopted in this embodiment avoids having the controlled device send data to the cloud via HTTP requests. With HTTP, the client must wait synchronously for the server, which causes a large network overhead, and problems arise such as how to ensure that data is not sent repeatedly when the network is unstable and how to reconnect after the connection is dropped. In this embodiment, the controlled device and the cloud server establish a connection based on the websocket mechanism, which avoids the above-mentioned problems of HTTP transmission.
[0088] Step S400: receiving a control command text issued by the cloud server, parsing the control command text to obtain a control command, and executing the control command, wherein the control command text is obtained by the cloud server by processing the second audio data.
[0089] Further, after receiving the control command text from the cloud server, the controlled device parses the control command text to obtain the control command, and executes the control command. Taking a smart TV as the controlled device as an example, if the control command obtained by parsing is a command to adjust the volume, the controlled device, after receiving the command text, calls the TV system API to perform the volume adjustment operation. Other examples of control commands include Power On/Off, Mute, Change channel, and opening the YouTube application.
[0090] In this embodiment, since the control command text contains character strings and numerical values, the control command text in this embodiment is expressed in JSON (JavaScript Object Notation) format. It can be understood that, in other embodiments, the control command text may also take other forms of expression, and there is no specific restriction here.
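As an illustration of step S400, the following sketch parses a JSON control command text and dispatches it to a handler. The source does not define the JSON schema or the TV system API, so the field names `action` and `value`, the action names, and the `TvState` class are all hypothetical stand-ins.

```python
import json
from typing import Any, Callable, Dict


class TvState:
    """Hypothetical stand-in for the TV system API."""

    def __init__(self) -> None:
        self.volume = 10
        self.muted = False
        self.channel = 1


def execute_command_text(text: str, tv: TvState) -> str:
    """Parse a JSON control command text and apply it to the TV state.

    Assumed schema: {"action": <str>, "value": <number or bool>}.
    Returns the name of the executed action.
    """
    command: Dict[str, Any] = json.loads(text)
    handlers: Dict[str, Callable[[Any], None]] = {
        # Clamp the volume to an assumed 0..100 range.
        "set_volume": lambda v: setattr(tv, "volume", max(0, min(100, int(v)))),
        "mute": lambda v: setattr(tv, "muted", bool(v)),
        "change_channel": lambda v: setattr(tv, "channel", int(v)),
    }
    action = command.get("action")
    if action not in handlers:
        raise ValueError(f"unsupported action: {action!r}")
    handlers[action](command.get("value"))
    return action


if __name__ == "__main__":
    tv = TvState()
    execute_command_text('{"action": "set_volume", "value": 35}', tv)
    print(tv.volume)
```

A dispatch table of this kind keeps the parsing step separate from the device-specific calls, so new commands can be added without changing the parser.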
[0091] The present invention receives first audio data sent by a remote control terminal, the first audio data being obtained by the remote control terminal by processing an acquired user voice remote control instruction; processes the first audio data according to preset rules to obtain second audio data; sends the second audio data to the cloud server; and receives the control command text issued by the cloud server, parses the control command text to obtain the control command, and executes the control command, the control command text being obtained by the cloud server by processing the second audio data. In this way, the user's voice remote control instruction is sampled and encoded by the remote control terminal, or digitized by other data processing methods, to form DMA data, that is, the first audio data. The first audio data is then transmitted to the controlled device through the established wireless connection, and the controlled device processes the first audio data again, for example by generating a PCM file, that is, the second audio data, through ALSA noise reduction processing. The controlled device uploads the second audio data to the cloud server through the websocket mechanism, and the cloud server performs text recognition on the second audio data and sends the recognized control command text, which can be in JSON format, to the controlled device. Finally, the controlled device parses the received JSON text and executes the corresponding action. This effectively solves the problems of a traditional mechanical remote control, in which the user needs to click the remote control multiple times to open multi-level menus and find the program to watch in the program list one by one, or find the button for the parameter to be adjusted, so that the search operation is very cumbersome and the adjustment response cannot meet the requirements of real-time response. With the voice remote control method of the present invention, the controlled device performs operations directly based on the user's voice remote control instruction, which greatly improves the convenience of operation, especially for elderly users and users with limited mobility, and also meets the real-time response requirements of adjustment operations.
[0092] Further, a second embodiment of the voice remote control method of the present invention is provided.
[0093] Referring to Figure 3, Figure 3 is a schematic flowchart of the second embodiment of the voice remote control method of the present invention. Based on the first embodiment of the voice remote control method described above, in this embodiment, before step S300 of sending the second audio data to the cloud server, the method further includes:
[0094] Step S201, creating a Socket connection and sending a connection request to the cloud server;
[0095] Step S202: Receive a response from the cloud server to the connection request, and establish a Socket connection with the cloud server.
[0096] The disadvantage of data transmission based on the HTTP protocol is that the HTTP client needs to wait synchronously for the server, which imposes a large network overhead on the device. Data transmission by smart devices also faces many problems, for example, when the network is unstable, how to ensure that data is transmitted correctly, how to ensure that data is not sent repeatedly, and how to reconnect after the connection is dropped. HTTP cannot solve this kind of problem.
[0097] In this embodiment, the websocket mechanism is used to upload the recording file, that is, the second audio data. The controlled device creates a Socket connection and sends a file transmission request to the cloud server, and the cloud server receives the recording file and recognizes it as text. The specific process is as follows: the controlled device creates a Socket connection and sends a request to the cloud server; the cloud server establishes a server-side Socket that listens for the request, and the controlled device establishes a connection with the cloud server; the controlled device sends the recording file stream, that is, the second audio data, to the cloud server; after the cloud server receives the recording file stream, the text recognition engine server recognizes the recording file stream as text to form a text recognition stream, that is, the control command text; the cloud server sends the text recognition stream to the controlled device; and the controlled device receives the control command text issued by the cloud server, parses it to obtain the control command, executes the control command, and closes the Socket connection to release resources.
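The exchange described above (send the recording stream, wait for the control command text, close the connection) can be sketched as follows. This is only an illustration under stated assumptions: it uses a plain stream socket with a 4-byte length prefix rather than real WebSocket framing, a local `socket.socketpair()` in place of a network connection, and a placeholder `fake_cloud_server` that returns a fixed JSON reply instead of performing recognition. All function names and the reply fields are hypothetical.

```python
import json
import socket
import struct
import threading


def send_frame(sock: socket.socket, payload: bytes) -> None:
    """Send one message prefixed with its length (assumed framing)."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def recv_frame(sock: socket.socket) -> bytes:
    """Receive one length-prefixed message."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def upload_audio(sock: socket.socket, second_audio: bytes) -> str:
    """Controlled-device side: send audio, wait for the command text, close."""
    try:
        send_frame(sock, second_audio)
        return recv_frame(sock).decode("utf-8")
    finally:
        sock.close()  # release the connection once the reply has arrived


def fake_cloud_server(sock: socket.socket) -> None:
    """Placeholder for the recognition server: replies with a fixed command."""
    try:
        audio = recv_frame(sock)
        reply = {"action": "set_volume", "value": 35, "bytes": len(audio)}
        send_frame(sock, json.dumps(reply).encode("utf-8"))
    finally:
        sock.close()


def demo() -> str:
    """Run both sides over a local socket pair and return the command text."""
    device_sock, server_sock = socket.socketpair()
    worker = threading.Thread(target=fake_cloud_server, args=(server_sock,))
    worker.start()
    text = upload_audio(device_sock, b"\x00\x01" * 160)
    worker.join()
    return text


if __name__ == "__main__":
    print(demo())
```

Because the connection stays open between the upload and the reply, the device does not need a second request to fetch the recognition result.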
[0098] This embodiment adopts the websocket mechanism, which avoids the situation in which the controlled device sends data to the cloud via HTTP requests and the HTTP client must wait synchronously for the server, causing a large network overhead. It also avoids the problems the controlled device would otherwise face in data transmission, for example, when the network is unstable, how to ensure that data is transmitted correctly, how to ensure that data is not sent repeatedly, and how to reconnect after the connection is dropped. The controlled device and the cloud server of this embodiment establish a connection based on the websocket mechanism, which avoids the above-mentioned problems of HTTP transmission.
[0099] Further, a third embodiment of the voice remote control method of the present invention is provided.
[0100] Referring to Figure 4, Figure 4 is a schematic flowchart of the third embodiment of the voice remote control method of the present invention. Based on the second embodiment of the voice remote control method described above, in this embodiment, step S200 of processing the first audio data according to a preset rule to obtain second audio data includes:
[0101] Step S210, obtaining a preset audio optimization standard;
[0102] Step S220: Optimize the first audio data based on the acquired audio optimization standard, and use the optimized first audio data as the second audio data.
[0103] In this embodiment, after the controlled device receives the DMA data sent by the remote control terminal, that is, the first audio data, the main chip of the controlled device processes the first audio data. As an implementation manner, while the built-in microphone input module of the remote control terminal collects the user's voice remote control instruction, it also collects the environmental noise parameters of the current scene; that is, the first audio data includes the digitized user voice remote control instruction and the environmental noise parameters of the current scene. After receiving the first audio data, the controlled device, according to the environmental noise parameters included in the first audio data, calls a preset inverted noise signal that matches the environmental noise parameters and cancels the environmental noise of the current environment, thereby performing noise reduction on the first audio data. The noise-reduced first audio data is uploaded to the cloud server as the second audio data. It can be understood that in other embodiments the audio optimization standard may be implemented in other ways and is not limited to the implementation described in this embodiment.
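The inverted-noise cancellation described above can be sketched as follows. The source does not say how the environmental noise parameters are encoded or how the preset inverted signals are stored, so this sketch assumes a hypothetical table `PRESET_INVERTED_NOISE` keyed by a noise profile name, periodic noise, and 16-bit PCM samples.

```python
from typing import Dict, List, Sequence

# Hypothetical presets: each entry is one period of an inverted noise signal.
PRESET_INVERTED_NOISE: Dict[str, List[int]] = {
    "hum": [-5, 5],
    "hiss": [-2, -2],
}


def denoise(samples: Sequence[int], noise_profile: str) -> List[int]:
    """Cancel periodic noise by adding the matching inverted signal.

    Samples are treated as 16-bit PCM and clipped to that range.
    """
    inverted = PRESET_INVERTED_NOISE[noise_profile]
    cleaned: List[int] = []
    for i, sample in enumerate(samples):
        value = sample + inverted[i % len(inverted)]  # repeat the preset period
        cleaned.append(max(-32768, min(32767, value)))
    return cleaned


if __name__ == "__main__":
    voice = [100, -200, 300, -400]
    noisy = [v + n for v, n in zip(voice, [5, -5, 5, -5])]
    print(denoise(noisy, "hum"))
```

Adding the inverted signal cancels only noise that matches the preset in phase and amplitude, which is why the preset is selected according to the measured noise parameters.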
[0104] In this embodiment, the first audio data sent by the remote control terminal is received; a preset audio optimization standard is obtained; the first audio data is optimized based on the obtained audio optimization standard, and the optimized first audio data is used as the second audio data; a Socket connection is created and a connection request is sent to the cloud server; a response from the cloud server to the connection request is received, and a Socket connection with the cloud server is established; the second audio data is sent to the cloud server; and the control command text issued by the cloud server is received and parsed to obtain the control command, and the control command is executed. Thus, while improving the convenience of user operation and meeting the real-time response requirements of adjustment operations, the accuracy of voice control command recognition is improved and the effectiveness of voice control is ensured.
[0105] Further, a fourth embodiment of the voice remote control method of the present invention is provided.
[0106] Referring to Figure 5, Figure 5 is a schematic flowchart of the fourth embodiment of the voice remote control method of the present invention. Based on the third embodiment of the voice remote control method described above, in this embodiment, before step S100 of receiving the first audio data sent by the remote control terminal, the method further includes:
[0107] Step S101, detecting whether a preset write instruction is received;
[0108] If yes, proceed to step S100 to receive the first audio data sent by the remote control terminal.
[0109] Further, in this embodiment, before step S300 of sending the second audio data to the cloud server, the method further includes:
[0110] Step S301, detecting whether a preset read instruction is received;
[0111] If yes, proceed to step S300 to send the second audio data to the cloud server.
[0112] In this embodiment, the controlled device uses the ALSA (Advanced Linux Sound Architecture) audio driver, which supports Bluetooth audio devices. ALSA read and write operations are triggered by the write and read instructions that the user sets to be called. The controlled device of this embodiment receives the first audio data sent by the remote control terminal after detecting that the preset write instruction is received, and sends the second audio data to the cloud server after detecting that the preset read instruction is received.
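The gating by write and read instructions in steps S101 and S301 can be sketched as a small state holder. This is an illustration only: it does not use the ALSA API, and the class `GatedPipeline`, its method names, and the `process` and `send` callables are hypothetical.

```python
from typing import Callable, Optional


class GatedPipeline:
    """Accept audio only after a write instruction; upload only after a read instruction."""

    def __init__(self, process: Callable[[bytes], bytes]) -> None:
        self._process = process  # stands in for step S200
        self._write_seen = False
        self._read_seen = False
        self._second_audio: Optional[bytes] = None

    def on_write_instruction(self) -> None:
        self._write_seen = True

    def on_read_instruction(self) -> None:
        self._read_seen = True

    def receive(self, first_audio: bytes) -> bool:
        """Steps S101 and S100: ignore audio until the write instruction arrives."""
        if not self._write_seen:
            return False
        self._second_audio = self._process(first_audio)
        return True

    def upload(self, send: Callable[[bytes], None]) -> bool:
        """Steps S301 and S300: send only after the read instruction arrives."""
        if not self._read_seen or self._second_audio is None:
            return False
        send(self._second_audio)
        return True


if __name__ == "__main__":
    sent = []
    pipeline = GatedPipeline(process=lambda data: data.upper())
    pipeline.on_write_instruction()
    pipeline.receive(b"abc")
    pipeline.on_read_instruction()
    pipeline.upload(sent.append)
    print(sent)
```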
[0113] Further, a fifth embodiment of the voice remote control method of the present invention is provided.
[0114] Referring to Figure 6, Figure 6 is a schematic flowchart of the fifth embodiment of the voice remote control method of the present invention. Based on the first embodiment of the voice remote control method described above, in this embodiment, step S100 of receiving the first audio data sent by the remote control terminal includes:
[0115] Step S110, in response to the Bluetooth pairing request sent by the remote control terminal, establish a Bluetooth connection with the remote control terminal;
[0116] Step S120, based on the Bluetooth connection, receive first audio data sent by the remote control terminal.
[0117] In this embodiment, as an implementation manner, the remote control terminal has a built-in first Bluetooth module, and the controlled device has a built-in second Bluetooth module. The first Bluetooth module establishes a Bluetooth connection with the second Bluetooth module through searching, scanning, and pairing. The remote control terminal transmits the original audio data queue to the controlled device via Bluetooth, that is, the remote control terminal sends the first audio data to the controlled device.
[0118] It should be noted that, in other embodiments, the wireless connection between the remote control terminal and the controlled device is not limited to Bluetooth connection, but may also be other wireless connection methods, which is not specifically limited in this embodiment.
[0119] In addition, an embodiment of the present invention also provides a voice remote control system, which includes a remote control terminal, a controlled device, and a cloud server;
[0120] The remote control terminal is used to obtain a user voice remote control instruction based on preset conditions, and is also used to perform analog-to-digital conversion processing on the voice remote control instruction to obtain first audio data, and send the first audio data to the controlled device;
[0121] The controlled device is configured to, after receiving the first audio data sent by the remote control terminal, process the first audio data according to preset rules to obtain second audio data, and send the second audio data to the cloud server;
[0122] The cloud server is configured to, after receiving the second audio data, recognize the second audio data according to a preset recognition rule and generate a control command text, and send the control command text to the controlled device;
[0123] The controlled device is further configured to receive the control command text issued by the cloud server, parse the control command text to obtain a control command, and execute the control command.
[0124] Preferably, the remote control terminal includes:
[0125] A receiving unit, configured to receive the start recording instruction or the stop recording instruction input by the user, and send the received start recording instruction or stop recording instruction to the recording unit;
[0126] A recording unit, configured to detect the user's voice remote control instruction and record the detected voice remote control instruction after receiving the start recording instruction, and further configured to stop the recording action after receiving the stop recording instruction and save the recorded user voice remote control instruction;
[0127] A processing unit, configured to perform analog-to-digital conversion processing on the voice remote control instruction to obtain first audio data;
[0128] The sending unit is configured to send the first audio data to the controlled device.
[0129] In this embodiment, as an implementation manner, the remote control terminal has a physical voice key or a touch voice key that triggers the capture of the user's voice remote control instruction. When the user needs to record a voice remote control instruction, the user presses the voice key to start recording and releases the voice key to stop recording, so that only relevant data is collected. This avoids the unnecessary recognition load and transmission bandwidth load caused by the remote control terminal continuously monitoring ambient speech, and improves the accuracy of control by voice remote control instructions.
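The press-to-record behavior described above can be sketched as follows. The class `PushToTalkRecorder` and its method names are hypothetical, and microphone input is modeled as byte chunks passed to `feed`; no real audio device is accessed.

```python
from typing import List


class PushToTalkRecorder:
    """Collect microphone chunks only while the voice key is held down."""

    def __init__(self) -> None:
        self._recording = False
        self._chunks: List[bytes] = []

    def press(self) -> None:
        """Start recording instruction: begin a fresh recording."""
        self._chunks = []
        self._recording = True

    def feed(self, chunk: bytes) -> None:
        """Microphone data; discarded unless the key is currently held."""
        if self._recording:
            self._chunks.append(chunk)

    def release(self) -> bytes:
        """Stop recording instruction: stop and return the saved recording."""
        self._recording = False
        return b"".join(self._chunks)


if __name__ == "__main__":
    recorder = PushToTalkRecorder()
    recorder.feed(b"ambient")  # ignored: the key is not pressed
    recorder.press()
    recorder.feed(b"volume ")
    recorder.feed(b"up")
    print(recorder.release())
```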
[0130] When running, each component of the voice remote control system proposed in this embodiment implements the steps of the voice remote control method described above, which will not be repeated here.
[0131] In addition, an embodiment of the present invention also provides a controlled device, and the controlled device includes:
[0132] A receiving module, configured to receive first audio data sent by a remote control terminal, the first audio data being processed by the remote control terminal according to an acquired user voice remote control instruction;
[0133] A processing module, configured to process the first audio data according to preset rules to obtain second audio data;
[0134] The upload module is used to send the second audio data to the cloud server;
[0135] An execution module, configured to receive a control command text issued by the cloud server, parse the control command text to obtain a control command, and execute the control command, where the control command text is obtained by the cloud server by processing the second audio data.
[0136] When running, each module of the voice remote control device proposed in this embodiment implements the steps of the voice remote control method described above, which will not be repeated here.
[0137] In addition, the embodiment of the present invention also proposes a computer-readable storage medium with a voice remote control program stored on the storage medium, and when the voice remote control program is executed by a processor, the steps of the voice remote control method described above are implemented.
[0138] For the method implemented when the voice remote control program running on the processor is executed, refer to the various embodiments of the voice remote control method of the present invention, which will not be repeated here.
[0139] It should be noted that in this document, the terms "include", "comprise" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to the process, method, article, or device. Without further restrictions, an element defined by the phrase "including a..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element.
[0140] The sequence numbers of the foregoing embodiments of the present invention are only for description, and do not represent the superiority of the embodiments.
[0141] Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus the necessary general hardware platform. Of course, they can also be implemented by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the existing technology, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions to enable a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to execute the methods described in the embodiments of the present invention.
[0142] The above are only the preferred embodiments of the present invention and do not limit the scope of the present invention. Any equivalent structure or equivalent process transformation made using the content of the description and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of the present invention.