Method and device for executing voice instruction of intelligent device and storage medium

By segmenting user voice commands and comparing their similarity with sound waves, words not in the database are corrected, ensuring accurate execution of voice commands. This solves the problem of voice recognition deviation caused by different pronunciations and improves the effectiveness of smart home control.

CN116052660B · Active · Publication Date: 2026-06-16 · GREE ELECTRIC APPLIANCE INC OF ZHUHAI +1

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: GREE ELECTRIC APPLIANCE INC OF ZHUHAI
Filing Date: 2022-12-21
Publication Date: 2026-06-16

Application Information

IPC: G10L15/22; G10L15/08; G10L15/18; G10L25/51; H04L12/28

CPC: G10L15/22; G10L15/08; G10L15/1822; G10L25/51; H04L12/2803; G10L2015/223; Y02P90/02

AI Tagging

Application Domain

Speech recognition; Total factory control

AI Technical Summary

Technical Problem

Because everyone's pronunciation is different, there are deviations in speech recognition, resulting in low speech recognition efficiency and affecting the effectiveness of controlling smart homes through voice commands.

Method used

By segmenting user voice commands into words, comparing the segmentation results with words in a preset database, and using semantic corrections based on sound wave graph similarity exceeding a threshold, the accurate execution of voice commands is ensured.

Benefits of technology

Even with accent issues, it can accurately recognize users' voice commands, avoiding misoperation or failure to recognize them, thus improving the accuracy of voice command execution.

✦ Generated by Eureka AI based on patent content.

Abstract

The application relates to a method and device for executing a voice instruction of an intelligent device, and to a storage medium. The method comprises the following steps: acquiring a voice instruction by which a user controls an intelligent device; performing word segmentation on the sentence corresponding to the voice instruction, and comparing the word segmentation result with words in a preset database; comparing a sound wave graph of a first word in the word segmentation result that is not in the database with a sound wave graph of a second word in the database; and executing the voice instruction in combination with the semantics of the second word in a case where the similarity of the sound wave graphs exceeds a first preset threshold. The application solves the problem in the related art that voice recognition deviates because of differing pronunciations, which makes control of an intelligent home through voice instructions less effective.

Description

Technical Field

[0001] This application relates to the field of voice processing, and more particularly to a method, apparatus and storage medium for executing voice commands of a smart device.

Background Technology

[0002] With the development of technology, smart homes have become common in most people's lives, and voice operation of smart products has become a mainstream trend. However, this has also brought problems: each person pronounces words differently, which causes deviations in the system's voice recognition, and certain homophones cause recognition errors. The result is low voice recognition efficiency and poor performance when controlling smart homes through voice commands.

Summary of the Invention

[0003] This application provides a method, apparatus, and storage medium for executing voice commands on a smart device, in order to solve the problem in related technologies where voice recognition is flawed due to differences in pronunciation, resulting in poor control of smart homes via voice commands.

[0004] In a first aspect, this application provides a method for executing voice commands on a smart device, comprising: acquiring a voice command from a user to control the smart device; segmenting the sentence corresponding to the voice command into words, and comparing the segmentation result with words in a preset database; comparing the acoustic waveform of a first word in the segmentation result that is not in the database with the acoustic waveform of a second word in the database; and executing the voice command in combination with the semantics of the second word if the similarity of the acoustic waveforms exceeds a first preset threshold.

[0005] In a second aspect, this application provides a device for executing voice commands of a smart device, comprising: an acquisition module for acquiring a voice command from a user to control the smart device; a processing module for segmenting the sentence corresponding to the voice command into words and comparing the segmentation result with words in a preset database; a comparison module for comparing the sound wave map of a first word in the segmentation result that is not in the database with the sound wave map of a second word in the database; and an execution module for executing the voice command in combination with the semantics of the second word if the similarity of the sound wave maps exceeds a first preset threshold.

[0006] In a third aspect, an electronic device is provided, including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus;

[0007] The memory is used to store a computer program;

[0008] The processor, when executing the program stored in the memory, implements the steps of the method for executing voice commands of a smart device as described in any embodiment of the first aspect.

[0009] In a fourth aspect, a computer-readable storage medium is provided having a computer program stored thereon, which, when executed by a processor, implements the steps of the method for executing voice commands of a smart device as described in any embodiment of the first aspect.

[0010] The technical solutions provided in this application have the following advantages compared with the prior art:

[0011] In this embodiment, after a user's voice command to control a smart device is received, the voice command is segmented into words. If any segmented word is not in the preset database, it is compared with words in the database to find a word with a similar waveform. The word that is not in the preset database is then corrected to obtain the correct semantics of the voice command, and the corrected voice command is executed based on those semantics. In other words, even when the user's speech is recognized inaccurately because of an accent, this application can correct the result and obtain the correct voice command. This ensures more accurate execution of subsequent voice commands, avoids misoperation or failure to recognize the command, and solves the problem in related technologies where different pronunciations lead to speech recognition deviations, resulting in poor performance when controlling smart homes via voice commands.

Attached Figure Description

[0012] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with the invention and, together with the description, serve to explain the principles of the invention.

[0013] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0014] Figure 1 is a first schematic flowchart of a method for executing voice commands on a smart device, provided in an embodiment of this application;

[0015] Figure 2 is a second schematic flowchart of a method for executing voice commands on a smart device, provided in an embodiment of this application;

[0016] Figure 3 is a flowchart of a semantic correction method based on sound wave recognition, provided in an embodiment of this application;

[0017] Figure 4 is a first schematic structural diagram of a voice command execution device for a smart device, provided in an embodiment of this application;

[0018] Figure 5 is a second schematic structural diagram of a voice command execution device for a smart device, provided in an embodiment of this application;

[0019] Figure 6 is a schematic structural diagram of an electronic device provided in an embodiment of this application.

Detailed Implementation

[0020] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0021] Figure 1 is a flowchart of a method for executing voice commands on a smart device, as provided in an embodiment of this application. As shown in Figure 1, the steps of this method include:

[0022] Step 101: Obtain the user's voice commands to control the smart device;

[0023] It should be noted that the smart devices in the embodiments of this application can be smart home devices, such as air conditioners, televisions, washing machines, refrigerators, etc., or smart office equipment, such as printers, smart chairs, etc., or personal smart devices, such as mobile phones, smartwatches, etc.

[0024] Step 102: Segment the sentence corresponding to the voice command into words, and compare the segmentation results with words in the preset database;

[0025] In this embodiment, voice commands are voice information used to control smart devices, such as "check water purifier filter lifespan," "check air conditioner maintenance schedule," and "turn on printer." Taking the voice command "check water purifier filter lifespan" as an example, the result after word segmentation is "check," "water purifier," "filter," and "lifespan." If, due to the user's accent, "check water purifier filter lifespan" is misrecognized as "check water purifier travel lifespan," then the result after word segmentation is "check," "water purifier," "travel," and "lifespan." As this shows, word segmentation divides the sentence according to conventional word combinations in Chinese to obtain the smallest semantically recognizable units.
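
The database comparison in step 102 can be sketched as follows. This is a minimal illustration, not the implementation described in the application: the contents of `PRESET_WORDS` and the helper name `split_known_unknown` are assumptions, and the segmentation result is supplied directly rather than produced by a word segmenter.

```python
# Hypothetical preset database of words the device understands (assumed contents).
PRESET_WORDS = {"check", "query", "water purifier", "filter", "lifespan", "air conditioner"}


def split_known_unknown(tokens, database=PRESET_WORDS):
    """Split a segmentation result into words found in the database and words not found.

    The order of the segmentation result is preserved in both lists.
    """
    known = [t for t in tokens if t in database]
    unknown = [t for t in tokens if t not in database]
    return known, unknown


# Segmentation result of the misrecognized command used in the text.
tokens = ["check", "water purifier", "travel", "lifespan"]
known, unknown = split_known_unknown(tokens)
print(known)    # ['check', 'water purifier', 'lifespan']
print(unknown)  # ['travel']
```

Here "travel" is the first word that would be passed on to the waveform comparison of step 103.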

[0026] Step 103: Compare the sound wave map of the first word in the segmentation result that is not in the database with the sound wave map of the second word in the database;

[0027] It should be noted that the second word can be any word in the database, or it can be a word that is prone to recognition errors. Furthermore, if the user's intended voice command is "Query water purifier filter lifespan," but due to accent issues the recognized voice command is "Query water purifier travel lifespan," then after word segmentation "travel" is clearly not what the voice command intended, so it needs to be corrected to obtain the correct voice command. Voice control commands for smart devices are commonly of the form "Open…", "Close…", "Query…", so a database storing commonly used voice commands can be set up in advance. After a voice command is received from the user, it can be compared with the words in the database. For example, if "filter" is the only word in the database whose waveform is similar to that of "travel," and "filter" also appears in the preset voice commands, then "travel" can be corrected to "filter." In other words, the database in this embodiment can pre-store voice commands and their word segmentation results, so that the current user's voice command can be identified quickly.

[0028] Step 104: If the similarity of the sound wave map exceeds the first preset threshold, execute the voice command in combination with the semantics of the second word.

[0029] It should be noted that the first preset threshold in the embodiments of this application can be set according to actual needs. For example, it may require that the similarity of both the amplitude and the frequency of the compared waveforms be greater than 50%, 60%, 80%, etc.

[0030] As can be seen, in this embodiment, after the user's voice command to control the smart device is obtained, the voice command is segmented into words. If any segmented word is not in the preset database, it is compared with words in the database to find a word with a similar waveform. The word that is not in the preset database is then corrected to obtain the correct semantics of the voice command, and the corrected voice command is executed based on those semantics. In other words, even when the user's speech is recognized inaccurately because of an accent, this application can correct the result, so that subsequent voice commands are executed more accurately. This avoids misoperation or failure to recognize the command, and solves the problem in related technologies where different pronunciations lead to speech recognition deviations, resulting in poor performance when controlling smart homes via voice commands.

[0031] In an optional embodiment of this application, the method of comparing the word segmentation results with words in a preset database in step 102 may further include:

[0032] Step 11: Count the words in the word segmentation results that exist in the database;

[0033] Step 12: Compare the number of words in the segmentation results that exist in the database with the total number of words in the segmentation results to obtain the effectiveness of the voice commands;

[0034] In a specific example, taking "query water purifier travel lifespan" as an example, the analysis results are "query", "water purifier", "travel", and "lifespan". Among them, "query", "water purifier", and "lifespan" have corresponding words in the database, while "travel" does not exist in the database. Therefore, the number of words after word segmentation for "query water purifier travel lifespan" is 4, and the number of words that exist in the database is 3. Thus, the effectiveness rate of "query water purifier travel lifespan" is 3 / 4 = 0.75, which corresponds to an effectiveness rate of 75%.

[0035] Step 13: If the effectiveness rate exceeds the second preset threshold, determine the voice command as a valid command;

[0036] In this embodiment of the application, the second preset threshold can be set according to actual needs, such as 60%, 65%, 70%, etc.

[0037] Step 14: If the effectiveness rate does not exceed the second preset threshold, the voice command is determined to be an invalid command; wherein the sound wave map comparison is performed on the first word that is in a valid command but does not exist in the database.

[0038] The second preset threshold ensures that only a few words in the voice command were recognized incorrectly, so that the correct semantics of the voice command can be recovered in the subsequent correction. If the second preset threshold is set too low, a voice command with a high recognition error rate may pass the check even though it cannot be corrected; such a command should be treated as an invalid command, and the user needs to issue the voice command again.
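
Steps 11 to 14 amount to a simple ratio check, which can be sketched as below. The function names and the default threshold of 0.6 are illustrative assumptions (the text only lists 60%, 65% and 70% as possible settings).

```python
def effectiveness_rate(tokens, database):
    """Fraction of segmented words that exist in the database (steps 11 and 12)."""
    if not tokens:
        return 0.0  # an empty segmentation result is treated as fully unrecognized
    hits = sum(1 for t in tokens if t in database)
    return hits / len(tokens)


def is_valid_command(tokens, database, second_threshold=0.6):
    """Steps 13 and 14: the command is valid only if the rate exceeds the threshold."""
    return effectiveness_rate(tokens, database) > second_threshold


database = {"query", "water purifier", "filter", "lifespan"}
tokens = ["query", "water purifier", "travel", "lifespan"]

print(effectiveness_rate(tokens, database))  # 0.75
print(is_valid_command(tokens, database))    # True
```

With a 60% threshold the example command (3 of 4 words recognized) is valid, while a command with only 1 of 4 words recognized would be rejected and the user asked to repeat it.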

[0039] In an optional implementation of this application, comparing the acoustic waveform of the first word in the word segmentation result that is not in the database with the acoustic waveform of the second word in the database, in step 103 above, can further include:

[0040] Step 21: Obtain the waveform parameters of the acoustic waveform of the first word and the waveform parameters of the acoustic waveform of the second word, wherein the waveform parameters include at least one of the following: amplitude and frequency;

[0041] Step 22: Compare the waveform parameters of the acoustic waveform of the first word with the waveform parameters of the acoustic waveform of the second word;

[0042] Step 23: Obtain the ratio between the waveform parameters of the acoustic waveform of the first word and the waveform parameters of the acoustic waveform of the second word, wherein the ratio is used to characterize the similarity.

[0043] In this embodiment, the waveform is obtained based on the intonation of the words; that is, each word has a corresponding waveform, and each waveform has a corresponding amplitude and frequency. Since this application primarily addresses speech recognition errors caused by accents or pronunciation issues, and these errors often involve words with similar pronunciations (e.g., "travel" and "filter"), this application can determine the similarity between two words by comparing the amplitude and frequency in the waveform. If the similarity is high, the second word is considered the correct word.
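
Steps 21 to 23 can be illustrated with the sketch below. The application does not say how the amplitude and frequency are extracted or how the ratio is normalized, so several things here are assumptions: each word is summarized by a single amplitude and a single frequency value, the ratio is taken as smaller over larger so that it lies between 0 and 1, and the numeric values are invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WaveParams:
    """Waveform parameters of one word (hypothetical summary values)."""
    amplitude: float
    frequency: float


def ratio(a, b):
    """Ratio of the smaller value to the larger one; 1.0 means identical."""
    if a <= 0 or b <= 0:
        return 0.0
    return min(a, b) / max(a, b)


def similarity(first, second):
    """Step 23: per-parameter ratios used to characterize the similarity."""
    return {
        "amplitude": ratio(first.amplitude, second.amplitude),
        "frequency": ratio(first.frequency, second.frequency),
    }


def exceeds_threshold(first, second, first_threshold=0.8):
    """True when both the amplitude and the frequency ratios exceed the threshold."""
    return all(v > first_threshold for v in similarity(first, second).values())


travel = WaveParams(amplitude=0.62, frequency=210.0)    # first word (invented values)
filter_ = WaveParams(amplitude=0.58, frequency=200.0)   # second word (invented values)
lifespan = WaveParams(amplitude=0.30, frequency=300.0)  # unrelated word (invented values)

print(exceeds_threshold(travel, filter_))   # True
print(exceeds_threshold(travel, lifespan))  # False
```

Requiring both ratios to exceed the threshold follows the example given for the first preset threshold; a real system would more likely compare whole spectra or feature sequences rather than two scalars.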

[0044] It should be noted that if there are multiple words in the current database whose similarity meets the first preset threshold, then semantic recognition can be performed on these multiple words, and the word whose semantics best matches the current voice command can be determined as the word that needs to replace the first word.

[0045] In an optional embodiment of this application, executing the voice command in combination with the semantics of the second word, in step 104 above, may further include:

[0046] Step 31: Correct the first word in the word segmentation result to the second word;

[0047] Step 32: Based on the semantics of the speech command corresponding to the corrected word segmentation result, execute the speech command corresponding to the corrected word segmentation result.

[0048] For steps 31 and 32 above, taking "Query water purifier travel life" as an example, the segmentation result is "Query", "water purifier", "travel", and "life". "Query", "water purifier", and "life" have corresponding words in the database, while "travel" does not. The waveform of "travel" is compared with the waveforms of words in the database and, based on waveform similarity, "filter cartridge" is ultimately identified in the database. The voice command is therefore corrected using "filter cartridge", and the correct voice command "Query water purifier filter cartridge life" is executed. Of course, the comparison may also identify several words that meet the waveform similarity criterion, such as "filter cartridge", "aluminum cartridge", and "green core". In this case, it is necessary to further determine whether the semantics of each identified word match the current voice command. In this example, "filter cartridge" matches the current voice command, while the semantics of "aluminum cartridge" and "green core" do not. The voice command is therefore corrected using "filter cartridge", and the correct voice command "Query water purifier filter cartridge life" is executed.
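
The candidate selection described above can be sketched as follows. The application does not specify how the semantic match is performed; this sketch assumes, as a stand-in, that a candidate matches when substituting it yields one of the pre-stored commands. `PRESET_COMMANDS`, `correct_command` and the similarity scores are all hypothetical.

```python
# Hypothetical pre-stored commands, kept as tuples of segmented words.
PRESET_COMMANDS = {
    ("query", "water purifier", "filter cartridge", "life"),
    ("query", "air conditioner", "maintenance schedule"),
}


def correct_command(tokens, first_word, similarities,
                    first_threshold=0.8, commands=PRESET_COMMANDS):
    """Replace first_word with the best candidate whose substitution gives a known command.

    similarities maps each database word to its waveform similarity with first_word.
    Returns the corrected token tuple, or None if no candidate fits.
    """
    # Keep only candidates above the first preset threshold, most similar first.
    candidates = [
        word
        for word, score in sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
        if score > first_threshold
    ]
    for second_word in candidates:
        corrected = tuple(second_word if t == first_word else t for t in tokens)
        if corrected in commands:  # stand-in for the semantic check
            return corrected
    return None


tokens = ["query", "water purifier", "travel", "life"]
scores = {  # invented similarity scores
    "aluminum cartridge": 0.95,
    "filter cartridge": 0.93,
    "green core": 0.88,
    "life": 0.30,
}
print(correct_command(tokens, "travel", scores))
# ('query', 'water purifier', 'filter cartridge', 'life')
```

Even though "aluminum cartridge" has the highest invented score, it is rejected because the resulting command is not a known one, which mirrors the disambiguation described in the example.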

[0049] In this embodiment of the application, after the voice command is executed in combination with the semantics of the second word, as shown in Figure 2, the method steps in the embodiments of this application may further include:

[0050] Step 105: Back up the first word to the database.

[0051] As can be seen, in this embodiment, the first word can be added to the database. In a specific example, taking "check water purifier travel life" as an example, "travel" can be added to the database. The next time the command is "check water purifier travel life," the voice command "check water purifier travel life" can be executed directly without adjustment, as its correct meaning, "check water purifier filter life," is already known. In other words, even if the pronunciation is not standard, the correct meaning can be directly recognized in this application, improving the user experience of controlling smart devices with voice commands.
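
One way to picture step 105 is a database that remembers which correct word a backed-up word stands for. The application only says that the first word is backed up to the database; storing it as a mapping to the second word, and the class and method names below, are assumptions made for this sketch.

```python
class WordDatabase:
    """Preset words plus backed-up misrecognized words (hypothetical structure)."""

    def __init__(self, words):
        self.words = set(words)
        self.aliases = {}  # backed-up first word -> second word it was corrected to

    def resolve(self, word):
        """Return the word to use for semantics, or None if the word is unknown."""
        if word in self.words:
            return word
        return self.aliases.get(word)

    def back_up(self, first_word, second_word):
        """Step 105: remember first_word so it is recognized directly next time."""
        if second_word not in self.words:
            raise ValueError(f"{second_word!r} is not in the preset database")
        self.aliases[first_word] = second_word


db = WordDatabase({"check", "water purifier", "filter", "life"})
print(db.resolve("travel"))   # None: not yet known, waveform comparison is needed
db.back_up("travel", "filter")
print(db.resolve("travel"))   # filter: resolved directly on the next command
```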

[0052] The following describes a specific implementation of this application, which provides a semantic correction method based on sound wave recognition. This method generates a sound wave map for each unrecognized word and compares it with the data in an existing database; if the similarity reaches a set value, the word is corrected and the original word is added to a backup database. If the same word appears again, it will be recognized successfully. As shown in Figure 3, the steps of this semantic correction method based on sound wave recognition include:

[0053] Step 301: Collect the voice commands spoken by the user;

[0054] For example, if the user's intention is to check the lifespan of a water purifier filter, the voice message can be extracted into text.

[0055] Step 302: Extract the voice command as text: "query water purifier travel life";

[0056] Step 302 yields this text because the user's pronunciation or other factors may cause inaccurate speech extraction; that is, "query water purifier filter life" was identified as "query water purifier travel life".

[0057] Step 303: The sentence is segmented into "query", "water purifier", "travel", and "lifespan" through word segmentation. At this point, the system can recognize the semantics of "query", "water purifier", and "lifespan", but cannot recognize "travel". The current effectiveness rate is 0.75, that is, 75%.

[0058] In a specific example, a threshold can be set to control resource consumption. If the value exceeds this threshold, it is likely a valid command; if it is below the threshold, it indicates that the current voice command is invalid.

[0059] Step 304: Generate a sound wave map of "travel" and compare it with the sound wave maps in the existing database;

[0060] In a specific example, a threshold can be set to determine whether the similarity meets the standard. The threshold can be set according to the acceptable level of the specific scenario. The higher the value, the higher the accuracy of the system.

[0061] Step 305: If the set threshold is reached, the original word can be treated as the matched word. At this time, the system can successfully identify the intent and add the original word to a backup database.

[0062] Adding the original words to a backup database allows subsequent identical statements to be successfully analyzed for intent upon reaching step 303.

[0063] Speech recognition may extract homophones or near-homophones because of pronunciation problems, which lowers the recognition rate. Through steps 301 to 305 above, this application generates a sound wave map of the unrecognized part of the text and compares it with the data of the existing hot words to correct the unrecognized content, thereby improving semantic recognition efficiency and automatically expanding the database.
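
Steps 303 to 305 can be combined into one short sketch. Everything concrete here is assumed for illustration: the function name `process`, the representation of each sound wave map as an (amplitude, frequency) pair, the smaller-over-larger ratio, the two threshold defaults, and the invented numeric values. Speech capture and text extraction (steps 301 and 302) are outside the sketch, so the segmented text is passed in directly.

```python
def ratio(a, b):
    """Smaller-over-larger ratio in [0, 1]; 0.0 for non-positive inputs."""
    return min(a, b) / max(a, b) if a > 0 and b > 0 else 0.0


def process(tokens, words, waves, backup, rate_threshold=0.6, sim_threshold=0.8):
    """Return the corrected token list, or None if the command is invalid or uncorrectable.

    words  : set of words in the existing database
    waves  : word -> (amplitude, frequency), a stand-in for the sound wave map
    backup : dict of previously corrected words; updated in place (step 305)
    """
    if not tokens:
        return None
    # Words backed up earlier are recognized directly.
    tokens = [backup.get(t, t) for t in tokens]
    unknown = [t for t in tokens if t not in words]
    # Step 303: effectiveness rate check.
    if (len(tokens) - len(unknown)) / len(tokens) <= rate_threshold:
        return None
    corrected = list(tokens)
    for i, tok in enumerate(tokens):
        if tok in words:
            continue
        # Step 304: compare the unknown word's wave map with every database word.
        amp, freq = waves[tok]
        best, best_score = None, sim_threshold
        for cand in words:
            c_amp, c_freq = waves[cand]
            score = min(ratio(amp, c_amp), ratio(freq, c_freq))
            if score > best_score:
                best, best_score = cand, score
        if best is None:
            return None
        # Step 305: correct the word and back up the original.
        backup[tok] = best
        corrected[i] = best
    return corrected


words = {"query", "water purifier", "filter", "lifespan"}
waves = {  # invented values
    "query": (0.40, 150.0), "water purifier": (0.90, 120.0),
    "filter": (0.58, 200.0), "lifespan": (0.30, 300.0), "travel": (0.62, 210.0),
}
backup = {}
print(process(["query", "water purifier", "travel", "lifespan"], words, waves, backup))
# ['query', 'water purifier', 'filter', 'lifespan']
print(backup)  # {'travel': 'filter'}
```

Calling `process` a second time with the same tokens skips the waveform comparison, because "travel" is now resolved through `backup` before the effectiveness rate is computed.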

[0064] Corresponding to the method of Figure 1 above, this application also provides a device for executing voice commands on a smart device. As shown in Figure 4, the device includes:

[0065] The acquisition module 42 is used to acquire the user's voice commands to control the smart device;

[0066] Processing module 44 is used to segment the sentences corresponding to the voice commands into words and compare the segmentation results with words in a preset database.

[0067] The comparison module 46 is used to compare the sound wave map of the first word in the word segmentation result that is not in the database with the sound wave map of the second word in the database.

[0068] The execution module 48 is used to execute a voice command in combination with the semantics of the second word when the similarity of the sound wave map exceeds a first preset threshold.
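
The division into four modules can be pictured as a small composition of callables. This is only a structural sketch: the class name, the callable signatures, and the stub behaviors in the usage example are assumptions, with the reference numerals 42 to 48 noted in comments for orientation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class VoiceCommandDevice:
    acquire: Callable[[], str]                           # acquisition module 42
    process: Callable[[str], Tuple[List[str], List[str]]]  # processing module 44: (tokens, unknown words)
    compare: Callable[[str], Optional[str]]              # comparison module 46: second word or None
    execute: Callable[[List[str]], str]                  # execution module 48

    def run(self) -> Optional[str]:
        """Run one voice command through the four modules in order."""
        sentence = self.acquire()
        tokens, unknown = self.process(sentence)
        for first_word in unknown:
            second_word = self.compare(first_word)
            if second_word is None:
                return None  # no sufficiently similar word was found
            tokens = [second_word if t == first_word else t for t in tokens]
        return self.execute(tokens)


# Stub modules for illustration only; "|" stands in for real word segmentation.
known_words = {"query", "water purifier", "filter", "lifespan"}


def stub_process(sentence):
    tokens = sentence.split("|")
    return tokens, [t for t in tokens if t not in known_words]


device = VoiceCommandDevice(
    acquire=lambda: "query|water purifier|travel|lifespan",
    process=stub_process,
    compare={"travel": "filter"}.get,
    execute=lambda tokens: "executing: " + " ".join(tokens),
)
print(device.run())  # executing: query water purifier filter lifespan
```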

[0069] As can be seen, in this embodiment, after the user's voice command to control the smart device is obtained, the voice command is segmented into words. If any segmented word is not in the preset database, it is compared with words in the database to find a word with a similar waveform. The word that is not in the preset database is then corrected to obtain the correct semantics of the voice command, and the corrected voice command is executed based on those semantics. In other words, even when the user's speech is recognized inaccurately because of an accent, this application can correct the result, so that subsequent voice commands are executed more accurately. This avoids misoperation or failure to recognize the command, and solves the problem in related technologies where different pronunciations lead to speech recognition deviations, resulting in poor performance when controlling smart homes via voice commands.

[0070] In an optional embodiment of this application, the processing module 44 may further include: a statistics unit, used to count the words in the word segmentation result that exist in the database; a processing unit, used to divide the number of words in the word segmentation result that exist in the database by the total number of words in the word segmentation result to obtain the effectiveness rate of the voice command; a first determination unit, used to determine the voice command as a valid command if the effectiveness rate exceeds a second preset threshold; and a second determination unit, used to determine the voice command as an invalid command if the effectiveness rate does not exceed the second preset threshold; wherein the sound wave map comparison is performed on the first word that is in a valid command but does not exist in the database.

[0071] In a specific example, taking "Query water purifier travel lifespan" as an example, the segmentation result is "Query", "water purifier", "travel", and "lifespan". "Query", "water purifier", and "lifespan" have corresponding words in the database, while "travel" does not. Therefore, the number of words after segmentation for "Query water purifier travel lifespan" is 4, and the number of words existing in the database is 3. Thus, the effectiveness rate of "Query water purifier travel lifespan" is 3 / 4 = 0.75, that is, 75%. Furthermore, in this embodiment, the second preset threshold can be set according to actual needs, such as 60%, 65%, or 70%. The second preset threshold ensures that only a few words in the voice command were recognized incorrectly, so that the correct semantics of the voice command can be recovered in the subsequent correction. If the second preset threshold is set too low, a voice command with a high recognition error rate may pass the check even though it cannot be corrected; such a command should be treated as an invalid command, requiring the user to reissue the voice command.

[0072] In an optional embodiment of this application, the comparison module 46 in this application embodiment may further include: a first acquisition unit, configured to acquire waveform parameters of the acoustic waveform of the first word and waveform parameters of the acoustic waveform of the second word, wherein the waveform parameters include at least one of the following: amplitude and frequency; a comparison unit, configured to compare the waveform parameters of the acoustic waveform of the first word and the waveform parameters of the acoustic waveform of the second word; and a second acquisition unit, configured to acquire the ratio between the waveform parameters of the acoustic waveform of the first word and the waveform parameters of the acoustic waveform of the second word, wherein the ratio is used to characterize the similarity.

[0073] In this embodiment, the waveform is obtained based on the intonation of the words; that is, each word has a corresponding waveform, and each waveform has a corresponding amplitude and frequency. Since this application primarily addresses speech recognition errors caused by accents or pronunciation issues, and these errors often involve words with similar pronunciations (e.g., "travel" and "filter"), this application can determine the similarity between two words by comparing the amplitude and frequency in the waveform. If the similarity is high, the second word is considered the correct word.

[0074] It should be noted that if there are multiple words in the current database whose similarity meets the first preset threshold, then semantic recognition can be performed on these multiple words, and the word whose semantics best matches the current voice command can be determined as the word that needs to replace the first word.

[0075] In an optional embodiment of this application, the execution module 48 in this application embodiment may further include: a correction unit, used to correct the first word in the word segmentation result to a second word; and an execution unit, used to execute the voice command corresponding to the corrected word segmentation result based on the semantics of the voice command corresponding to the corrected word segmentation result.

[0076] Taking "Query water purifier travel lifespan" as an example, the segmentation result is "query," "water purifier," "travel," and "lifespan." While "query," "water purifier," and "lifespan" have corresponding words in the database, "travel" does not. The waveform of "travel" is compared with the waveforms of words in the database and, based on waveform similarity, "filter cartridge" is identified in the database. The voice command is therefore corrected using "filter cartridge," and the correct voice command "Query water purifier filter cartridge lifespan" is executed. Alternatively, several words meeting the waveform similarity criterion might be identified, such as "filter cartridge," "aluminum cartridge," and "green core." In this case, it is necessary to further determine whether the semantics of each identified word match the current voice command. In this example, "filter cartridge" matches the current voice command, while "aluminum cartridge" and "green core" do not. The voice command is therefore corrected using "filter cartridge," and the correct voice command "Query water purifier filter cartridge lifespan" is executed.

[0077] In an optional embodiment of this application, as shown in Figure 5, the apparatus further includes a backup module 52, used to back up the first word to the database after the voice command is executed in combination with the semantics of the second word.

[0078] As can be seen, in this embodiment, the first word can be added to the database. In a specific example, taking "check water purifier travel life" as an example, "travel" can be added to the database. The next time the command is "check water purifier travel life," the voice command "check water purifier travel life" can be executed directly without adjustment, as its correct meaning, "check water purifier filter life," is already known. In other words, even if the pronunciation is not standard, the correct meaning can be directly recognized in this application, improving the user experience of controlling smart devices with voice commands.

[0079] As shown in Figure 6, this application provides an electronic device, including a processor 111, a communication interface 112, a memory 113, and a communication bus 114, wherein the processor 111, the communication interface 112, and the memory 113 communicate with each other through the communication bus 114.

[0080] The memory 113 is used to store a computer program.

[0081] In one embodiment of this application, when the processor 111 executes the program stored in the memory 113, it implements the method for executing voice commands of a smart device provided in any of the aforementioned method embodiments. Its function is similar to that of those method embodiments and is not described again here.

[0082] This application also provides a computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements the steps of the smart device voice command execution method provided in any of the foregoing method embodiments.

[0083] It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0084] The above description is merely a specific embodiment of the present invention, enabling those skilled in the art to understand or implement the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the present invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features claimed herein.

Claims

1. A method for executing voice commands of a smart device, characterized in that the method comprises:
acquiring a user's voice command for controlling the smart device;
segmenting the sentence corresponding to the voice command into words, and comparing the word segmentation results with words in a preset database;
comparing the acoustic waveform of a first word, which is in the word segmentation results but not in the database, with the acoustic waveform of a second word in the database, wherein the acoustic waveform is obtained based on the pronunciation and intonation of the word, that is, each word has a corresponding acoustic waveform, and each acoustic waveform has a corresponding amplitude and frequency;
if the similarity of the acoustic waveforms exceeds a first preset threshold, executing the voice command in combination with the semantics of the second word, wherein, if multiple words in the current database have a similarity that meets the first preset threshold, semantic recognition is performed on these multiple words, and the word whose semantics best fits the current voice command is determined as the word to replace the first word;
wherein the step of comparing the acoustic waveform of the first word with the acoustic waveform of the second word in the database includes: obtaining waveform parameters of the acoustic waveform of the first word and waveform parameters of the acoustic waveform of the second word, wherein the waveform parameters include at least one of the following: amplitude and frequency; comparing the waveform parameters of the acoustic waveform of the first word with the waveform parameters of the acoustic waveform of the second word; and obtaining the ratio between the waveform parameters of the acoustic waveform of the first word and the waveform parameters of the acoustic waveform of the second word, wherein the ratio is used to characterize the similarity.

2. The method according to claim 1, characterized in that the step of comparing the word segmentation results with words in a preset database includes:
identifying the words in the word segmentation results that are present in the database;
dividing the number of words in the word segmentation results that are present in the database by the total number of words in the word segmentation results to obtain the effectiveness rate of the voice command;
if the effectiveness rate exceeds a second preset threshold, determining that the voice command is a valid command;
if the effectiveness rate does not exceed the second preset threshold, determining that the voice command is an invalid command;
wherein the acoustic waveform comparison is performed on the first word in the valid command that does not exist in the database.

3. The method according to claim 1, characterized in that, after executing the voice command in combination with the semantics of the second word, the method further includes: backing up the first word to the database.

4. The method according to claim 1, characterized in that executing the voice command in combination with the semantics of the second word includes:
correcting the first word in the word segmentation results to the second word;
executing the voice command corresponding to the corrected word segmentation results, based on the semantics of that voice command.

5. A voice command execution device for a smart device, characterized in that the device comprises:
an acquisition module, configured to acquire a user's voice command for controlling the smart device;
a processing module, configured to segment the sentence corresponding to the voice command into words and compare the word segmentation results with words in a preset database;
a comparison module, configured to compare the acoustic waveform of a first word, which is in the word segmentation results but not in the database, with the acoustic waveform of a second word in the database, wherein the acoustic waveform is obtained based on the pronunciation and intonation of the word, that is, each word has a corresponding acoustic waveform, and each acoustic waveform has a corresponding amplitude and frequency;
an execution module, configured to execute the voice command in combination with the semantics of the second word when the similarity of the acoustic waveforms exceeds a first preset threshold, wherein, if multiple words in the current database have a similarity that meets the first preset threshold, semantic recognition is performed on these multiple words, and the word whose semantics best fits the current voice command is determined as the word to replace the first word;
wherein the comparison module includes: a first acquisition unit, configured to acquire waveform parameters of the acoustic waveform of the first word and waveform parameters of the acoustic waveform of the second word, wherein the waveform parameters include at least one of the following: amplitude and frequency; a comparison unit, configured to compare the waveform parameters of the acoustic waveform of the first word with the waveform parameters of the acoustic waveform of the second word; and a second acquisition unit, configured to acquire the ratio between the waveform parameters of the acoustic waveform of the first word and the waveform parameters of the acoustic waveform of the second word, wherein the ratio is used to characterize the similarity.

6. The apparatus according to claim 5, characterized in that the processing module includes:
a statistical unit, configured to count the words in the word segmentation results that are present in the database;
a processing unit, configured to compare the number of words in the word segmentation results that are present in the database with the total number of words in the word segmentation results to obtain the effectiveness rate of the voice command;
a first determining unit, configured to determine that the voice command is a valid command when the effectiveness rate exceeds a second preset threshold;
a second determining unit, configured to determine that the voice command is an invalid command when the effectiveness rate does not exceed the second preset threshold;
wherein the acoustic waveform comparison is performed on the first word in the valid command that does not exist in the database.

7. An electronic device, characterized in that it comprises a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus; the memory is used to store a computer program; and the processor, when executing the program stored in the memory, implements the method described in any one of claims 1-4.

8. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the method described in any one of claims 1-4.