
Apparatus and method for recognizing speech based on a deep-neural-network (DNN) sound model

A speech-recognition and deep-neural-network technology, applied in speech analysis, speech recognition, instruments, etc. It addresses the inability to change the output nodes of a DNN structure, the inefficiency of determining states for large-scale training speech data, and the inapplicability of related-art techniques to speech recognition performed for multiple native speakers having different acoustic-statistical characteristics.

Active Publication Date: 2018-05-01
ELECTRONICS & TELECOMM RES INST

AI Technical Summary

Benefits of technology

The present invention provides a method for learning a sound model for speech recognition using multi-set training speech data with different characteristics. The method involves generating sound-model state sets corresponding to the training speech data, setting a multi-set state cluster to learn a deep-neural-network (DNN) structural parameter, and using the learned DNN structural parameter to recognize a user's speech through a DNN-hidden-Markov-model (HMM) structure. The invention also provides a speech recognition apparatus and method that perform speech recognition by setting a sound-model state set corresponding to characteristic information of a user's speech. The technical effects include improved accuracy and efficiency in speech recognition using multi-set training speech data with different characteristics.
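The steps summarized above can be sketched as follows. This is a minimal illustration under assumed names and toy dimensions (the per-set state labels, feature size, and layer sizes are not from the patent): each set of training speech data contributes its own sound-model state set, their union forms the multi-set state cluster, and the DNN maps acoustic-feature input nodes to one output node per cluster state.

```python
import numpy as np

# Hypothetical per-set sound-model state sets (names are assumptions):
# e.g. English speech from Chinese-, Korean-, and English-native speakers.
state_sets = {
    "chinese_accented": [f"ch_{i}" for i in range(3)],
    "korean_accented":  [f"ko_{i}" for i in range(3)],
    "english_native":   [f"en_{i}" for i in range(3)],
}

# Multi-set state cluster: the per-set state sets combined, each state
# keeping its set identity so the output nodes remain distinguishable.
multi_set_cluster = [s for states in state_sets.values() for s in states]
num_outputs = len(multi_set_cluster)   # 9 output nodes in this toy case
num_inputs = 40                        # e.g. 40-dim filter-bank features

# Layer shapes only -- no actual training is performed here.
rng = np.random.default_rng(0)
hidden = 64
W1 = rng.standard_normal((num_inputs, hidden))
W2 = rng.standard_normal((hidden, num_outputs))

def forward(x):
    h = np.maximum(x @ W1, 0.0)        # ReLU hidden layer
    logits = h @ W2
    e = np.exp(logits - logits.max())
    return e / e.sum()                 # softmax over all cluster states

probs = forward(rng.standard_normal(num_inputs))
print(num_outputs, round(float(probs.sum()), 6))
```

The point of the cluster is that a single DNN is trained once over all sets, rather than one DNN per state set.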

Problems solved by technology

However, in a DNN learning technique according to the related art, state-level alignment information is determined beforehand, so the output nodes of the DNN structure cannot be changed.
However, it is inefficient to determine the states of large-scale training speech data having different acoustic-statistical characteristics (e.g., a sound model for recognizing English spoken by multiple native speakers of, for example, Chinese, Korean, and English) using a single decision tree.
However, a DNN-HMM-based learning technique according to the related art learns a structure whose characteristics and parameters best discriminate predetermined states, and is therefore not applicable to speech recognition performed for multiple native speakers having different acoustic-statistical characteristics.
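The fixed-output-node limitation can be made concrete with a toy sketch (an assumption for exposition, not the patent's code): once the state-level alignment fixes the number of output nodes, the trained output-layer matrix cannot absorb states from a new speaker set.

```python
import numpy as np

# In related-art DNN-HMM training, the state-level alignment fixes the
# number of DNN output nodes before training starts.
num_features, num_states = 40, 6              # 6 predetermined tied states
W_out = np.zeros((num_features, num_states))  # trained output-layer weights

# A new native-speaker set contributing 4 more states would need a
# (40, 10) output layer; the existing (40, 6) matrix cannot be extended
# in place, so the whole structure must be re-learned.
extra_states = 4
needed_shape = (num_features, num_states + extra_states)
print(W_out.shape, needed_shape)
```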




Embodiment Construction

[0023]Exemplary embodiments of the present invention will be described in detail below with reference to the accompanying drawings so that those skilled in the art can easily accomplish them. This invention may, however, be embodied in many different forms and is not to be construed as being limited to the embodiments set forth herein. In the following description, well-known functions or constructions are not described in detail if it is determined that they would obscure the invention due to unnecessary detail.

[0024]It should be understood that the terms ‘comprise’ and/or ‘comprising,’ when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof unless stated otherwise.

[0025]FIG. 1 is a block diagram of a speech recognition apparatus 100 according to an exemplary e...



Abstract

A speech recognition apparatus based on a deep-neural-network (DNN) sound model includes a memory and a processor. As the processor executes a program stored in the memory, the processor generates sound-model state sets corresponding to a plurality of pieces of set training speech data included in multi-set training speech data, generates a multi-set state cluster from the sound-model state sets, and sets the multi-set training speech data as input nodes and the multi-set state cluster as output nodes so as to learn a DNN structural parameter.

Description

CROSS-REFERENCE TO RELATED APPLICATION

[0001]This application claims priority to and the benefit of Korean Patent Application No. 10-2006-0005755, filed on Jan. 18, 2016, the disclosure of which is incorporated herein by reference in its entirety.

BACKGROUND

[0002]1. Field of the Invention

[0003]The present invention relates to an apparatus and method for recognizing speech, and more particularly, to a speech recognition apparatus and method based on a deep-neural-network (DNN) sound model.

[0004]2. Discussion of Related Art

[0005]A context-dependent deep-neural-network (DNN)-hidden-Markov-model (HMM) technique using a combination of a DNN and an HMM has been actively applied to sound models for speech recognition, replacing the existing CD-Gaussian-mixture-model-HMM (CD-GMM-HMM) (hereinafter referred to as ‘GMM-HMM’) technique.

[0006]A DNN-HMM technique according to the related art is performed as will be described below.

[0007]First, a state of an HMM corresponding to an output node or a...
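For context on the hybrid DNN-HMM technique discussed above: in the standard formulation (general background, not patent-specific), the DNN estimates state posteriors P(state | frame), and dividing by the state priors P(state) yields likelihoods scaled by a constant P(frame), which is all the HMM decoder needs to compare paths. A minimal sketch with illustrative numbers:

```python
import numpy as np

# DNN softmax output for one acoustic frame over three HMM states,
# and state priors estimated from the forced alignment (toy values).
posteriors = np.array([0.70, 0.20, 0.10])
priors     = np.array([0.50, 0.30, 0.20])

# Scaled likelihoods: proportional to P(frame | state) by Bayes' rule.
scaled_likelihoods = posteriors / priors
log_scores = np.log(scaled_likelihoods)   # used as HMM emission scores

best_state = int(np.argmax(log_scores))
print(best_state)   # state 0 wins: 0.7/0.5 = 1.4 beats 0.2/0.3 and 0.1/0.2
```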


Application Information

Patent Type & Authority: Patent (United States)
IPC(8): G10L15/16; G10L15/07; G10L15/02; G10L15/06
CPC: G10L15/16; G10L15/063; G10L15/07; G10L2015/0636; G10L2015/022
Inventors: KANG, BYUNG OK; PARK, JEON GUE; SONG, HWA JEON; LEE, YUN KEUN; CHUNG, EUI SOK
Owner ELECTRONICS & TELECOMM RES INST