Voice recognition method based on mobile terminal and mobile terminal

A mobile terminal and voice recognition technology, applied in the field of communication, can solve problems such as potential safety hazards and damage to the user's social relationships, and achieve the effects of ensuring normal work, improving user experience, and improving sensing capabilities

Active Publication Date: 2010-09-22
YULONG COMPUTER TELECOMM SCI (SHENZHEN) CO LTD
5 Cites 74 Cited by

AI-Extracted Technical Summary

Problems solved by technology

[0003] 1. There are potential safety hazards
Because the use of earphones weakens the user's perception of sound in the external environment, a user wearing earphones while traveling may be involved in a traffic accident through being unable to hear a vehicle's horn or the...

Method used

[0032] In the specific implementation, the acquisition module 40 is a sound pickup device, which is used to collect the sound signal in the external environment, which can be a microphone, a sound sensor or other devices with a sound pickup function. Specifically, the ambient sound signal can refer to all sound signals that can be picked up in the external environment, including car horns, pedestrians talking or shouting, bicycle bells, music played outside, birds singing, and so on. ...

Abstract

The embodiment of the invention provides a voice recognition method based on a mobile terminal. The method comprises the following steps: collecting ambient sound signals; identifying the ambient sound signals according to preset identification information; when the ambient sound signals contain a specific sound signal matching the preset identification information, judging whether the loudness of the specific sound signal reaches a preset threshold; and if so, controlling output of the specific sound signal. Correspondingly, the embodiment of the invention also provides a mobile terminal. By using voice recognition technology, the mobile terminal of the invention overcomes the application defects of existing mobile terminal earphones, eliminates the hidden danger caused by the user's reduced perception of external ambient sound while using the earphone of a mobile terminal, and improves the user experience.

Application Domain

Technology Topic

Loudness; Environmental sounds

Examples

Example Embodiment

[0029] The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, but not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.
[0030] Figure 1 is a schematic structural diagram of an embodiment of the mobile terminal of the present invention. As shown in Figure 1, the mobile terminal includes a collection module 40, an identification module 50, a judgment module 60 and a control module 70.
[0031] The collection module 40 is used to collect ambient sound signals;
[0032] In a specific implementation, the collection module 40 is a sound pickup device used to collect sound signals in the external environment; it may be a microphone, a sound sensor or another device with a sound pickup function. The ambient sound signal may refer to any sound signal that can be picked up in the external environment, including car horns, pedestrians talking or shouting, bicycle bells, music played outside, bird calls, and the like. Generally, the collection module 40 can use at least one common omnidirectional microphone to pick up sound. In practical applications, however, an omnidirectional microphone can only collect sound signals and cannot localize the ambient sound. Preferably, therefore, the collection module 40 uses at least two unidirectional microphones; because a unidirectional microphone selectively collects the ambient sound signal from a particular direction, this arrangement can localize the ambient sound signal well.
[0033] The identification module 50 is configured to identify the environmental sound signal collected by the collection module 40 according to preset identification information;
[0034] In a specific implementation, the identification module 50 identifies the external ambient sound signal: it determines whether the ambient sound signal contains a specific sound signal that matches (is the same as or similar to) the preset identification information and, if so, identifies that specific sound signal. For example, the collection module 40 collects a mixed ambient sound signal that includes non-speech environmental sounds such as car horns. Assuming that the car horn sound is preset as non-speech identification information, the identification module 50 identifies, according to the preset identification information, the specific sound signal (the car horn sound) in the mixed ambient sound signal that is the same as or similar to the preset identification information. As another example, the collection module 40 collects a mixed ambient sound signal that contains someone calling the owner's name along with other sounds. Assuming that the owner's name is preset in text form as the identification information, the identification module 50 converts the calling sound contained in the mixed ambient sound signal into text and compares it with the preset text of the owner's name; if the two are the same or similar, the identification module 50 identifies the specific sound signal (the sound of the owner's name being called). As a further example, the collection module 40 collects a mixed ambient sound signal that includes the voice of a contact who has had voice calls with the owner. Assuming that the contact's voice is preset as the identification information, the identification module 50 identifies, according to the preset identification information, the specific sound signal (the contact's voice) in the mixed ambient sound signal that is the same as or similar to the preset identification information;
[0035] The judgment module 60 is configured to judge, when the identification module 50 identifies that the ambient sound signal contains a specific sound signal matching the preset identification information, whether the loudness of the specific sound signal reaches the preset threshold;
[0036] In a specific implementation, the threshold is a sound loudness threshold that can be set according to actual needs. For example, for the sounds of danger signals such as car horns, bicycle bells and alarms, the highest loudness threshold can be set, so that the user can avoid danger in time on hearing the sound signal. As another example, for the voice signal of a friend calling the owner's name, the lowest loudness threshold can be adopted, so that the voice is picked up sensitively and the greeting is not missed. Because the threshold can be set flexibly according to actual needs, the terminal avoids collecting ambient signals so sensitively that normal earphone operation is disturbed, while still picking up sound sensitively enough to notify the user of relevant external ambient sound signals.
[0037] Specifically, when the identification module 50 identifies from the ambient sound signal a specific sound signal that matches the preset identification information, such as a car horn, the judgment module 60 judges whether the loudness of the identified car horn sound reaches the preset threshold.
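By way of illustration only, the loudness judgment of the judgment module 60 might be sketched as follows. The source does not state how loudness is measured; the RMS level in dB relative to full scale, the function names `loudness_db` and `reaches_threshold`, and the sample values are all assumptions made for this sketch.

```python
import math


def loudness_db(samples):
    """Return the RMS level of the samples in dB relative to full scale.

    Samples are assumed to be floats in [-1.0, 1.0]. This measure is an
    assumption of the sketch, not something the source specifies.
    """
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def reaches_threshold(samples, threshold_db):
    """Judge whether the identified sound reaches the preset loudness threshold."""
    return loudness_db(samples) >= threshold_db


# Hypothetical identified car-horn segments: one loud, one faint.
loud_horn = [0.5, -0.5, 0.5, -0.5]
faint_horn = [0.01, -0.01, 0.01, -0.01]

print(reaches_threshold(loud_horn, -20.0))   # loud segment is above the threshold
print(reaches_threshold(faint_horn, -20.0))  # faint segment is below the threshold
```

Only the comparison against a preset threshold comes from the description; any other loudness measure could be substituted for `loudness_db` without changing the judgment step.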
[0038] The control module 70 is configured to control the output of the specific sound signal when the judgment module 60 judges that the loudness of the specific sound signal reaches the preset threshold.
[0039] In a specific implementation, when the loudness of the specific sound signal reaches the preset threshold, the control module 70 controls output of the specific sound signal to notify the user. The control module 70 can control this output in various ways, including but not limited to:
[0040] First: control the earphone of the mobile terminal to output the specific sound signal. Specifically, the content originally output by the earphone can be paused so that only the ambient sound signal is played; or that content can be mixed with the specific sound signal and the mixed sound signal output;
[0041] Second: control the mobile terminal to vibrate in order to signal the specific sound signal. This output mode is an auxiliary prompt: the mobile terminal vibrates to prompt the user to pay attention to the external environment;
[0042] Third: control the lights of the mobile terminal to flash in order to signal the specific sound signal. This output mode is also an auxiliary prompt: a light of the mobile terminal, such as an LED (light-emitting diode) or the keyboard light, flashes to remind the user to pay attention to the external environment;
[0043] Fourth: control the mobile terminal to display information about the specific sound signal. This output mode requires that the mobile terminal stores information related to the specific sound signal. For example, when a friend calls the owner's name, the mobile terminal must have stored the friend's information, including voice information, name, contact information and so on; the control module 70 can then control the mobile terminal to display the corresponding stored information.
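The four output modes listed above, and the two earphone options (pause the original content, or mix it with the specific sound signal), can be sketched roughly as below. The names `OutputMode`, `mix` and `earphone_output`, the sample-wise addition and the clipping to [-1.0, 1.0] are assumptions of this sketch; the source does not prescribe a mixing method.

```python
from enum import Enum


class OutputMode(Enum):
    """The four output modes named in the description."""
    EARPHONE = "earphone"    # play the specific sound signal in the earphone
    VIBRATION = "vibration"  # auxiliary prompt
    LIGHT = "light"          # auxiliary prompt
    DISPLAY = "display"      # show stored information about the sound


def mix(original, special):
    """Mix two sample sequences sample by sample and clip to [-1.0, 1.0]."""
    length = max(len(original), len(special))
    mixed = []
    for i in range(length):
        a = original[i] if i < len(original) else 0.0
        b = special[i] if i < len(special) else 0.0
        mixed.append(max(-1.0, min(1.0, a + b)))
    return mixed


def earphone_output(original, special, pause_original):
    """Return the samples sent to the earphone.

    pause_original=True  -> original content is paused, only the special sound plays.
    pause_original=False -> original content is mixed with the special sound.
    """
    if pause_original:
        return list(special)
    return mix(original, special)


music = [0.25, 0.25]
horn = [0.5]
print(earphone_output(music, horn, pause_original=True))
print(earphone_output(music, horn, pause_original=False))
```

Vibration, light and display are left as enum members only, since driving them depends on the handset platform.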
[0044] Referring again to Figure 1, the mobile terminal further includes an earphone 10, a trigger module 20 and a preset module 30.
[0045] The earphone 10 may be a wired earphone or a wireless earphone (e.g., a Bluetooth earphone).
[0046] Because the orientation of the mobile terminal is not stable (for example, the user can hold the mobile terminal and rotate it), a collection module 40 located in the mobile terminal cannot localize the ambient sound signal well even if it uses at least two unidirectional microphones. Preferably, therefore, the collection module 40 is packaged in the earphone 10 and includes two unidirectional microphones, one in each of the left and right earpieces, which allows the ambient sound signal to be localized well. For example, when the microphone of the left earpiece picks up the ambient sound signal, the control module 70 of the mobile terminal can control the specific sound signal to be output from the left earpiece, prompting the user to pay attention to the direction in which the left microphone points. Likewise, when the microphone of the right earpiece picks up the ambient sound signal, the control module 70 can control the specific sound signal to be output from the right earpiece, prompting the user to pay attention to the direction in which the right microphone points. Preferably, to pick up sound signals more comprehensively, the microphones in the left and right earpieces can be oriented in different directions; the processing is similar to the cases above and is not repeated here.
[0047] The trigger module 20 is configured to detect the insertion signal of the earphone 10 and to trigger the collection module 40 to collect the ambient sound signal.
[0048] In a specific implementation, when the earphone 10 is a wired earphone, the trigger module 20 can use hardware, such as a chip or a CPU, to detect the plugged or unplugged state of the wired earphone. When the earphone 10 is a wireless earphone (a Bluetooth earphone), the trigger module 20 can use software to detect the plugged-in (connected) state of the wireless earphone. When it is detected that the earphone 10 is plugged in, the trigger module 20 triggers the collection module 40 to collect the ambient sound signal.
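The left/right routing described in this paragraph could be sketched as follows. The function name `route_output` and the handling of the case where both microphones pick up the sound are assumptions; the source only describes the single-side cases.

```python
def route_output(left_picked_up, right_picked_up):
    """Return the earpieces that should play the specific sound signal.

    A sound picked up by the left microphone is output from the left
    earpiece, and likewise for the right. Playing on both sides when both
    microphones pick up the sound is an assumption of this sketch.
    """
    channels = []
    if left_picked_up:
        channels.append("left")
    if right_picked_up:
        channels.append("right")
    return channels


print(route_output(True, False))   # sound from the side of the left microphone
print(route_output(False, True))   # sound from the side of the right microphone
```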
[0049] The preset module 30 is configured to preset multiple identification information and loudness thresholds of each identification information, and the identification information includes non-speech identification information and/or speech identification information.
[0050] In a specific implementation, the preset module 30 may be an identification library that stores the various preset identification information used to identify sounds, together with the loudness threshold corresponding to each item of identification information. The non-speech identification information includes but is not limited to car horns, bicycle bells and alarms. The speech identification information includes but is not limited to any one or more of: voice information of contacts saved during voice calls on the mobile terminal; voice information of interlocutors saved during daily conversations; and the owner's name, nickname or alias, saved in the mobile terminal in text or voice form. Specifically, the preset module 30 may provide the identification library in the form of a table, such as the following table:
[0051] Table 1: Recognition library
[0052]
[0053] It can be understood that Table 1 above is only an example. Users can preset other identification information and corresponding thresholds according to actual needs; for example, a contact voice recognition library or another identification library can be preset. These cases can be analyzed similarly and are not repeated here.
[0054] As described above, when the identification module 50 identifies that the ambient sound signal contains a specific sound signal (a car horn sound) matching the preset identification information, the judgment module 60 judges, according to the car horn threshold preset by the preset module 30, whether the loudness of the specific sound signal reaches that threshold.
[0055] In the present invention, identification information and thresholds are preset in the mobile terminal, and the ambient sound signal is collected in real time. When a specific sound signal in the ambient sound signal matches the preset identification information and reaches the corresponding loudness threshold, output of the specific sound signal is controlled so as to notify the owner, which improves the user's ability to sense external ambient sound while using the earphone. In addition, presetting various identification information for danger signals, such as car horns, eliminates the potential safety hazard of using the earphone. Similarly, presetting forms of address such as the owner's name and nickname allows the owner to be notified when friends greet the owner, which improves the user experience. At the same time, the loudness threshold of each item of identification information can be preset according to the actual situation, which both ensures normal operation of the earphone and notifies the user of relevant external ambient sound signals, further improving the user experience.
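As a rough sketch of how the judgment module 60 might look up a per-entry threshold in the identification library of the preset module 30, consider the following. Table 1 is not reproduced in this text, so every entry, label and threshold value here is hypothetical; the relative ordering (danger signals highest, name-calling lowest) follows the earlier description of the thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryEntry:
    label: str           # what the preset identification information represents
    kind: str            # "non-speech" or "speech"
    threshold_db: float  # preset loudness threshold (hypothetical units and values)


# Hypothetical identification library; NOT the contents of Table 1.
RECOGNITION_LIBRARY = {
    "car_horn": LibraryEntry("car_horn", "non-speech", 70.0),
    "bicycle_bell": LibraryEntry("bicycle_bell", "non-speech", 70.0),
    "owner_name": LibraryEntry("owner_name", "speech", 40.0),
}


def judge(label, measured_db):
    """Return True if the identified sound reaches its own preset threshold."""
    entry = RECOGNITION_LIBRARY.get(label)
    if entry is None:
        return False  # nothing preset for this label, so nothing to output
    return measured_db >= entry.threshold_db


print(judge("car_horn", 75.0))    # horn louder than its threshold
print(judge("owner_name", 45.0))  # name call louder than its lower threshold
```

Keeping the threshold next to each entry mirrors the description's point that each item of identification information has its own loudness threshold.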
[0056] In order to explain the present invention more clearly, the identification module 50 of the mobile terminal of the present invention will be described in detail below.
[0057] Figure 2 is a schematic structural diagram of an embodiment of the identification module of the mobile terminal of the present invention. As shown in Figure 2, the identification module 50 includes a non-speech recognition unit 501 and a speech recognition unit 502.
[0058] The non-speech recognition unit 501 is configured to recognize the ambient sound signal collected by the collection module 40 according to preset non-speech recognition information;
[0059] In a specific implementation, the non-speech recognition unit 501 identifies the external ambient sound signal: it determines whether the ambient sound signal contains a specific sound signal that matches (is the same as or similar to) the preset non-speech recognition information and, if so, identifies that specific sound signal. For example, the collection module 40 collects a mixed ambient sound signal that includes car horns and other sounds. Assuming that the car horn sound is preset as (non-speech) identification information, the non-speech recognition unit 501 identifies, according to the preset non-speech recognition information, the specific sound signal (the car horn sound) in the mixed ambient sound signal that is the same as or similar to the preset non-speech recognition information.
[0060] The speech recognition unit 502 is configured to recognize the environmental sound signal collected by the collection module 40 according to preset speech recognition information;
[0061] In a specific implementation, the speech recognition unit 502 identifies the external ambient sound signal: it determines whether the ambient sound signal contains a specific sound signal that matches (is the same as or similar to) the preset speech recognition information and, if so, identifies that specific sound signal. For example, the collection module 40 collects a mixed ambient sound signal that contains someone calling the owner's name along with other sounds. Assuming that the owner's name is preset in text form as the identification information, the speech recognition unit 502 converts the calling sound contained in the mixed ambient sound signal into text and compares it with the preset text of the owner's name; if the two are the same or similar, the speech recognition unit 502 identifies the specific sound signal (the sound of the owner's name being called). As another example, the collection module 40 collects a mixed ambient sound signal that includes the voice of a contact who has had voice calls with the owner. Assuming that the contact's voice is preset as the identification information (the contact's voice information having been saved during a voice call on the mobile terminal), the speech recognition unit 502 identifies, according to the preset identification information, the specific sound signal (the contact's voice) in the mixed ambient sound signal that is the same as or similar to the preset identification information;
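The "same or similar" comparison between the transcribed calling sound and the preset owner-name text might be sketched as below. The speech-to-text step is assumed to have happened already; the use of Python's `difflib.SequenceMatcher`, the 0.8 similarity cut-off, the function name and the example names are all assumptions of this sketch, since the source does not define how similarity is measured.

```python
from difflib import SequenceMatcher


def matches_owner_name(transcribed, preset_names, min_ratio=0.8):
    """Return True if the transcribed text is the same as or similar to any
    preset name, nickname or alias.

    min_ratio is a hypothetical similarity cut-off; 1.0 means identical text.
    """
    text = transcribed.strip().lower()
    for name in preset_names:
        ratio = SequenceMatcher(None, text, name.strip().lower()).ratio()
        if ratio >= min_ratio:
            return True
    return False


# Hypothetical preset owner name and nickname, stored in text form.
presets = ["xiao ming", "mingming"]

print(matches_owner_name("Xiao Ming", presets))    # identical after normalising case
print(matches_owner_name("xiao min", presets))     # one letter missing, still similar
print(matches_owner_name("hello there", presets))  # unrelated speech
```

A lower cut-off tolerates more transcription errors at the cost of more false matches, which is the same trade-off the description makes when it accepts "similar" as well as "same".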
[0062] The present invention utilizes the speech recognition technology to eliminate the potential safety hazard caused by the weakening of the sense of the external environment sound when the user uses the earphone of the mobile terminal, thereby improving the user experience.
[0063] In order to explain the present invention more clearly, the voice recognition method based on the mobile terminal of the present invention will be described in detail below.
[0064] Figure 3 is a flow chart of the first embodiment of the mobile terminal-based voice recognition method of the present invention. As shown in Figure 3, the method includes:
[0065] S101, collecting ambient sound signals;
[0066] In a specific implementation, S101 uses a sound pickup device, which may be a microphone, a sound sensor or another device with a sound pickup function, to collect sound signals in the external environment. The ambient sound signal may refer to any sound signal that can be picked up in the external environment, including car horns, pedestrians talking or shouting, bicycle bells, music played outside, bird calls, and the like. Generally, S101 can use at least one common omnidirectional microphone for sound pickup. In practical applications, however, an omnidirectional microphone can only collect sound signals and cannot localize the ambient sound. Preferably, therefore, S101 uses at least two unidirectional microphones; because a unidirectional microphone selectively collects the ambient sound signal from a particular direction, this arrangement can localize the ambient sound signal well.
[0067] S102, according to preset identification information, identify the ambient sound signal;
[0068] In a specific implementation, step S102 identifies the external ambient sound signal: it determines whether the ambient sound signal contains a specific sound signal that matches (is the same as or similar to) the preset identification information and, if so, identifies that specific sound signal. For example, S101 collects a mixed ambient sound signal that includes sounds such as car horns. Assuming that the car horn sound is preset as the identification information, S102 identifies, according to the preset identification information, the specific sound signal (the car horn sound) in the mixed ambient sound signal that is the same as or similar to the preset identification information. As another example, S101 collects a mixed ambient sound signal that contains someone calling the owner's name along with other sounds. Assuming that the owner's name is preset in text form as the identification information, S102 converts the calling sound contained in the mixed ambient sound signal into text and compares it with the preset text of the owner's name; if the two are the same or similar, S102 identifies the specific sound signal (the sound of the owner's name being called). As a further example, S101 collects a mixed ambient sound signal that includes the voice of a contact who has had voice calls with the owner. Assuming that the contact's voice is preset as the identification information, S102 identifies, according to the preset identification information, the specific sound signal (the contact's voice) in the mixed ambient sound signal that is the same as or similar to the preset identification information;
[0069] S103, when the ambient sound signal includes a specific sound signal matching the preset identification information, determine whether the loudness of the specific sound signal reaches a preset threshold;
[0070] In a specific implementation, the threshold is a sound loudness threshold that can be set according to actual needs. For example, for the sounds of danger signals such as car horns, bicycle bells and alarms, the highest loudness threshold can be set, so that the user can avoid danger in time on hearing the sound signal. As another example, for the voice signal of a friend calling the owner's name, the lowest loudness threshold can be adopted, so that the voice is picked up sensitively and the greeting is not missed. Because the threshold can be set flexibly according to actual needs, the method avoids collecting ambient signals so sensitively that normal earphone operation is disturbed, while still picking up sound sensitively enough to notify the user of relevant external ambient sound signals.
[0071] Specifically, when S102 identifies from the ambient sound signal a specific sound signal that matches the preset identification information, such as a car horn, S103 determines whether the loudness of the identified car horn sound reaches the preset threshold.
[0072] S104, when the loudness of the specific sound signal reaches the preset threshold, control output of the specific sound signal.
[0073] In a specific implementation, when the loudness of the specific sound signal reaches the preset threshold, S104 controls output of the specific sound signal to notify the user. S104 can control this output in a variety of ways, including but not limited to:
[0074] First: control the earphone of the mobile terminal to output the specific sound signal. Specifically, the content originally output by the earphone can be paused so that only the ambient sound signal is played; or that content can be mixed with the specific sound signal and the mixed sound signal output;
[0075] Second: control the mobile terminal to vibrate in order to signal the specific sound signal. This output mode is an auxiliary prompt: the mobile terminal vibrates to prompt the user to pay attention to the external environment;
[0076] Third: control the lights of the mobile terminal to flash in order to signal the specific sound signal. This output mode is also an auxiliary prompt: a light of the mobile terminal, such as an LED (light-emitting diode) or the keyboard light, flashes to remind the user to pay attention to the external environment;
[0077] Fourth: control the mobile terminal to display information about the specific sound signal. This output mode requires that the mobile terminal stores information related to the specific sound signal. For example, when a friend calls the owner's name, the mobile terminal must have stored the friend's information, including voice information, name, contact information and so on; S104 can then control the mobile terminal to display the corresponding stored information.
[0078] In the present invention, identification information and thresholds are preset in the mobile terminal, and the ambient sound signal is collected in real time. When a specific sound signal in the ambient sound signal matches the preset identification information and reaches the corresponding loudness threshold, output of the specific sound signal is controlled so as to notify the owner, which improves the user's ability to sense external ambient sound while using the earphone. In addition, presetting various identification information for danger signals, such as car horns, eliminates the potential safety hazard of using the earphone. Similarly, presetting forms of address such as the owner's name and nickname allows the owner to be notified when friends greet the owner, which improves the user experience. At the same time, the loudness threshold of each item of identification information can be preset according to the actual situation, which both ensures normal operation of the earphone and notifies the user of relevant external ambient sound signals, further improving the user experience.
[0079] Figure 4 is a flow chart of the second embodiment of the mobile terminal-based voice recognition method of the present invention. As shown in Figure 4, the method includes:
[0080] S201, preset multiple identification information and loudness thresholds of each identification information, and the identification information includes non-speech identification information and/or speech identification information;
[0081] In a specific implementation, S201 may preset an identification library that stores the various preset identification information used to identify sounds, together with the loudness threshold corresponding to each item of identification information. The non-speech identification information includes but is not limited to car horns, bicycle bells and alarms. The speech identification information includes but is not limited to any one or more of: voice information of contacts saved during voice calls on the mobile terminal; voice information of interlocutors saved during daily conversations; and the owner's name, nickname or alias, saved in the mobile terminal in text or voice form. Specifically, S201 may provide the identification library in the form of a table, as shown in Table 1 above, which is not repeated here.
[0082] S202, detect whether the earphone is inserted, if the detection result is yes, execute S203; otherwise, end;
[0083] In a specific implementation, when the earphone is a wired earphone, S202 may use hardware, such as a chip or a CPU, to detect the plugged or unplugged state of the wired earphone. When the earphone is a wireless earphone (a Bluetooth earphone), S202 may use software to detect the plugged-in (connected) state of the wireless earphone. When it is detected that the earphone is plugged in, execution of S203 is triggered.
[0084] S203, collecting ambient sound signals;
[0085] S204, determine whether the ambient sound signal includes a specific sound signal matching the preset non-speech recognition information; if the determination result is yes, execute S206; otherwise, execute S205;
[0086] S205, determine whether the ambient sound signal includes a specific sound signal matching the preset speech recognition information; if the determination result is yes, execute S206; otherwise, end;
[0087] S206, judge whether the loudness of the specific sound signal reaches a preset threshold; if the judgment result is yes, execute S207; otherwise, end;
[0088] S207, control output of the specific sound signal.
[0089] In a specific implementation, the above S205 can also be located before S204, or can occur simultaneously with S204; S201 can also be located after S202, but it needs to be ensured that it occurs before S204 (S205) performs ambient sound signal recognition.
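The control flow of steps S201 to S207 can be sketched as a single function. This is a minimal sketch under stated assumptions: the recognisers and the loudness measure are passed in as callables because the source does not specify them, and the name `process_frame`, its parameters and the threshold values are hypothetical.

```python
from typing import Callable, Dict, Optional, Sequence

Frame = Sequence[float]


def process_frame(
    thresholds: Dict[str, float],                       # S201: preset labels and thresholds
    earphone_inserted: bool,                            # S202: result of earphone detection
    frame: Frame,                                       # S203: collected ambient sound
    match_non_speech: Callable[[Frame], Optional[str]],  # S204: returns a label or None
    match_speech: Callable[[Frame], Optional[str]],      # S205: returns a label or None
    measure_loudness: Callable[[Frame], float],
) -> Optional[str]:
    """Return the label of the sound to output (S207), or None if the flow ends."""
    if not earphone_inserted:           # S202: no earphone, end
        return None
    label = match_non_speech(frame)     # S204
    if label is None:
        label = match_speech(frame)     # S205
    if label is None:
        return None                     # neither recogniser matched, end
    threshold = thresholds.get(label)
    if threshold is None or measure_loudness(frame) < threshold:  # S206
        return None
    return label                        # S207: caller outputs this sound


# Usage with stub recognisers and hypothetical thresholds.
presets = {"car_horn": 70.0, "owner_name": 40.0}
result = process_frame(
    presets, True, [0.1, 0.2],
    match_non_speech=lambda f: "car_horn",
    match_speech=lambda f: None,
    measure_loudness=lambda f: 75.0,
)
print(result)
```

As the preceding paragraph notes, S205 could equally run before or alongside S204; swapping the two calls in the sketch does not change which sounds pass S206.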
[0090] The present invention utilizes the speech recognition technology to eliminate the potential safety hazard caused by the weakening of the sense of the external environment sound when the user uses the earphone of the mobile terminal, thereby improving the user experience.
[0091] As the descriptions of the above embodiments show, the present invention presets identification information and thresholds in the mobile terminal and collects the ambient sound signal in real time. When a specific sound signal in the ambient sound signal matches the preset identification information and reaches the corresponding loudness threshold, output of the specific sound signal is controlled so as to notify the owner, which improves the user's ability to sense external ambient sound while using the earphone. In addition, presetting various identification information for danger signals, such as car horns, eliminates the potential safety hazard of using the earphone. Similarly, presetting forms of address such as the owner's name and nickname allows the owner to be notified when friends greet the owner, which improves the user experience. At the same time, the loudness threshold of each item of identification information can be preset according to the actual situation, which both ensures normal operation of the earphone and notifies the user of relevant external ambient sound signals, further improving the user experience.
[0092] The above discloses only preferred embodiments of the present invention, which of course cannot limit the scope of the rights of the present invention. Those of ordinary skill in the art can understand that implementing all or part of the procedures of the above embodiments, and making equivalent changes according to the claims of the present invention, still falls within the scope covered by the invention.
