Voice recognition method and device

A speech recognition technology, applied in speech recognition, speech analysis, instruments, etc., that solves the problem of local recognition being unable to reject speech segments, thereby reducing the frequency of misoperation, preserving running speed, and improving the user experience.

Active Publication Date: 2013-10-02

BEIJING UNISOUND INFORMATION TECH

Cites: 5; Cited by: 17

AI Technical Summary

Problems solved by technology

[0003] To solve the technical problem that the local speech recognition technology of the mobile terminal described above cannot reject an input speech segment, the present invention provides a speech recognition method and device.

Embodiment 1

[0015] See Figure 1, which is a flowchart of a speech recognition method of the present invention. The method includes the following steps:

[0016] S101: Receive an input voice segment;

[0017] It should be noted that the technical solution of the present invention is mainly applied to a mobile terminal; that is, the mobile terminal receives an externally input sound clip, which may be spoken by a user or played by a machine.

[0018] S102: Calculate multiple acoustic scores for each frame of the speech segment according to subspace distribution clustering (SDC);

[0019] The SDC algorithm is a commonly used means of calculating acoustic scores in this technical field. In mainstream speech recognition systems, the state-tied triphone is usually used as the pronunciation unit, its temporal and statistical characteristics are modeled by an HMM, and the output probability of each state of the HMM is represented by a Gaussian mixtur...
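
As a rough illustration of the conventional state output probability mentioned in [0019], the sketch below scores one feature frame against a Gaussian mixture with diagonal covariances. It shows plain mixture scoring only, not the SDC approximation itself. The function names, the argument layout, and the diagonal-covariance assumption are illustrative and do not come from the patent text.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_score(x, weights, means, variances):
    """Log output probability of one HMM state modeled as a Gaussian mixture.

    Assumption: each component has a diagonal covariance (hypothetical layout).
    Uses the log-sum-exp trick to avoid underflow for small likelihoods.
    """
    component_logs = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(component_logs)
    return top + math.log(sum(math.exp(c - top) for c in component_logs))


# One frame scored against a two-component mixture for a single state.
frame = [0.2, -0.1]
score = gmm_log_score(
    frame,
    weights=[0.6, 0.4],
    means=[[0.0, 0.0], [1.0, 1.0]],
    variances=[[1.0, 1.0], [0.5, 0.5]],
)
print(score)
```

Evaluating this for every state in every frame is the cost that a clustering approximation such as SDC is meant to reduce.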

Embodiment 2

[0056] Corresponding to the speech recognition method described above, an embodiment of the present invention also provides a speech recognition device. See Figure 3, which is a structural diagram of a speech recognition device of the present invention. The device includes a speech receiving unit 301, a clustering calculation unit 302, a comparison and accumulation unit 303, a background acoustic total score calculation unit 304, a comparison judgment unit 305, a recognition unit 306, and a rejection unit 307:

[0057] The voice receiving unit 301 is configured to receive an input voice segment;

[0058] The clustering calculation unit 302 is configured to calculate a plurality of acoustic scores for each frame of the speech segment according to subspace distribution clustering (SDC);

[0059] Preferably, the clustering calculation unit 302 is further configured as follows:

[0060] The acoustic score is calculated using an approximate algorithm, and the specific calculation formula is:

[0...

Abstract

The embodiment of the invention discloses a voice recognition method and device. The voice recognition method comprises the following steps: receiving an input voice segment; calculating multiple acoustic scores for each frame of the voice segment according to subspace distribution clustering (SDC); comparing the obtained acoustic scores with the vocabulary entries in a vocabulary library in a mobile terminal and accumulating the compared acoustic scores of each frame, the highest accumulated total being taken as the optimal acoustic total score; taking the sum of the highest acoustic score in each frame of the voice segment as a background acoustic total score; checking whether the optimal acoustic total score and the background acoustic total score satisfy a preset threshold; and, if not, refusing to recognize the voice segment. The method thereby detects cases in which the sum of the highest acoustic scores of each frame of the voice segment differs substantially from the score obtained by comparing the input voice segment with the vocabulary in the vocabulary library in the mobile terminal, so that the frequency of false operations of the mobile terminal in response to voice input is greatly reduced and the user experience is improved.
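
The rejection logic in the abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the text visible here does not specify how the two totals are compared, so the code assumes the gap (background total minus optimal total) is tested against a threshold. It also assumes each vocabulary entry comes with a fixed frame-to-state alignment, whereas a real decoder would search for the best path. All names (`frame_scores`, `alignments`, `threshold`) are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def background_total(frame_scores: List[List[float]]) -> float:
    """Sum of the highest acoustic score in each frame."""
    return sum(max(frame) for frame in frame_scores)


def optimal_total(
    frame_scores: List[List[float]], alignments: Dict[str, List[int]]
) -> Tuple[str, float]:
    """Accumulate per-frame scores for each vocabulary entry; keep the best.

    alignments maps a word to the state index it uses in each frame
    (a simplifying assumption made for this sketch).
    """
    totals = {
        word: sum(frame_scores[t][state] for t, state in enumerate(states))
        for word, states in alignments.items()
    }
    best = max(totals, key=totals.get)
    return best, totals[best]


def recognize_or_reject(
    frame_scores: List[List[float]],
    alignments: Dict[str, List[int]],
    threshold: float,
) -> Optional[str]:
    """Return the best word, or None when the segment is rejected."""
    word, optimal = optimal_total(frame_scores, alignments)
    gap = background_total(frame_scores) - optimal  # always >= 0 here
    return word if gap <= threshold else None


# Three frames, three states; values are illustrative log scores.
scores = [[-1.0, -5.0, -9.0], [-6.0, -1.5, -8.0], [-7.0, -6.0, -2.0]]
print(recognize_or_reject(scores, {"call": [0, 1, 2], "stop": [1, 1, 0]}, 3.0))
print(recognize_or_reject(scores, {"stop": [1, 1, 0]}, 3.0))
```

In the first call the entry "call" follows the best state in every frame, so the gap is zero and the word is accepted. In the second call the only candidate scores far below the background total, so the segment is rejected instead of being forced onto the nearest vocabulary entry.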

Description

Technical field

[0001] The invention relates to the field of voice recognition, and in particular to a voice recognition method and device.

Background technique

[0002] At present, voice input and control on smartphones and other mobile terminals are becoming increasingly familiar to and accepted by users. As mobile terminal hardware is updated ever faster, high-speed CPUs and large-capacity memory have become the basic configuration of most mobile terminals, which makes it possible to apply embedded speech recognition technology with vocabularies of tens of thousands of entries on mobile terminals. For example, speech recognition for fixed vocabularies such as person names, place names, or app names belongs to this category. Generally speaking, for a speech recognition system with a vocabulary of tens of thousands of entries used on a mobile terminal, local recognition follows the principle of maximum likelihood, that is, the cor...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G10L15/10, G10L15/26

Inventor: 苏牧, 李鹏, 李轶杰, 梁家恩

Owner: BEIJING UNISOUND INFORMATION TECH
