N-gram grammar model constructing method for voice identification and voice identification system

An n-gram grammar model technique for speech recognition, applicable to speech recognition, speech analysis, instruments, and related fields. It addresses the problem that manually annotated data is limited in volume and time-consuming and labor-intensive to acquire, and it offers effective control, alleviation of data sparsity, and optimization of the language model component.

Status: Inactive; Publication Date: 2016-01-20

INST OF ACOUSTICS CHINESE ACAD OF SCI +1



Problems solved by technology

Manual annotations are time-consuming and laborious to acquire, so the amount of annotated data is very limited. How to make full use of a manually annotated corpus has therefore become a research goal.

Embodiment Construction

[0028] The solution of the present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0029] The process flow of the n-gram grammar model construction method provided by the present invention, which expands the manual annotations using word vectors, is shown in Figure 1 and specifically includes:

[0030] 1. Word vector training: The word vectors corresponding to the words in the dictionary are obtained by training a neural network language model. The training adopts the classic NNLM form, whose structure is shown in Figure 2.

[0031] The model consists of an input layer, a mapping layer, a hidden layer, and an output layer. Each word in the dictionary is represented by a vector whose dimension is the size of the dictionary, with 1 in the position of the word and 0 in the remaining dimensions. For the n-gram model, the input to the input layer is a long vector composed of n-1 word vectors connect...
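The input representation described in [0031] can be illustrated with a minimal Python sketch. It only covers the part the text states: dictionary-sized one-hot vectors, joined for the n-1 history words. The toy dictionary and the function names `one_hot` and `input_vector` are hypothetical and chosen for illustration; the mapping, hidden, and output layers are not shown.

```python
def one_hot(word, vocab):
    """Return a dictionary-sized vector: 1 at the word's position, 0 elsewhere."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec


def input_vector(history, vocab):
    """Join the one-hot vectors of the n-1 history words into one long vector."""
    joined = []
    for word in history:
        joined.extend(one_hot(word, vocab))
    return joined


# Hypothetical toy dictionary (assumption, not from the patent).
vocab = ["<s>", "i", "like", "tea", "coffee"]

# For a trigram model (n = 3) the history holds n-1 = 2 words.
history = ["i", "like"]
x = input_vector(history, vocab)

# The joined vector has (n-1) * |dictionary| entries.
print(len(x))
```

With a realistic dictionary these vectors are very large and almost entirely zero, which is why the mapping layer that follows the input layer is what yields the compact word vectors used later for classification.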


Abstract

The invention provides an n-gram grammar model construction method for speech recognition and a speech recognition system. The method comprises: step (101), training a neural network language model to obtain word vectors, then classifying the word vectors and applying multi-layer screening to obtain word classes; step (102), expanding the manual annotations by direct word-frequency statistics: when a word is substituted by another word of the same class, the 1-gram to n-gram units that change relative to the original sentence are counted directly, yielding an n-gram grammar model of the expanded part; step (103), generating a preliminary n-gram grammar model from the manual annotations and interpolating it with the n-gram grammar model of the expanded part to obtain the final n-gram grammar model. Step (101) further includes: step (101-1), inputting the manual annotations and the training text; step (101-2), training a neural network language model to obtain the word vectors of the words in the dictionary; step (101-3), classifying the word vectors with the k-means method; and step (101-4), applying multi-layer screening to the classification result to obtain the final word classes.
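Steps (102) and (103) of the abstract can be sketched roughly as follows. This is an illustrative reading, not the patent's implementation: the word classes are assumed to be already available from step (101), the names `ngrams_covering`, `expansion_counts`, and `interpolate` are hypothetical, and linear interpolation with a weight `lam` is an assumption, since the abstract only says "model interpolation".

```python
from collections import Counter


def ngrams_covering(tokens, pos, max_n):
    """Yield every 1..max_n-gram of `tokens` that includes index `pos`.

    These are the only units that change when the word at `pos` is replaced.
    """
    for n in range(1, max_n + 1):
        first = max(0, pos - n + 1)
        last = min(pos, len(tokens) - n)
        for start in range(first, last + 1):
            yield tuple(tokens[start:start + n])


def expansion_counts(sentences, word_classes, max_n=3):
    """Count the changed n-grams produced by same-class word substitution.

    `word_classes` maps a word to the members of its class (assumed input).
    """
    counts = Counter()
    for tokens in sentences:
        for pos, word in enumerate(tokens):
            for other in word_classes.get(word, ()):
                if other == word:
                    continue
                replaced = tokens[:pos] + [other] + tokens[pos + 1:]
                counts.update(ngrams_covering(replaced, pos, max_n))
    return counts


def interpolate(p_base, p_expanded, lam):
    """Linearly interpolate two model probabilities (assumed scheme)."""
    return lam * p_base + (1.0 - lam) * p_expanded


# Hypothetical toy data (assumption, not from the patent).
sentences = [["i", "like", "tea"]]
word_classes = {"tea": ["tea", "coffee"]}

counts = expansion_counts(sentences, word_classes)
for gram, c in sorted(counts.items(), key=lambda kv: len(kv[0])):
    print(gram, c)
```

Counting only the n-grams that cover the substituted position avoids re-counting units already present in the original sentence, which is one way to read "direct statistics of 1-to-n-gram grammar combination units changing relative to an original sentence". In practice the expanded counts would be turned into a smoothed n-gram model before interpolation; that step is omitted here.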

Description

Technical field

[0001] The present invention relates to a method for improving a language model in speech recognition by using word vector classification to expand a manually annotated training corpus, and specifically provides a method for constructing an n-gram grammar model for speech recognition and a speech recognition system.

Background technique

[0002] The language modeling technology currently in use is mainly the n-gram language model. This model has been widely used in the field of speech recognition because it is simple to train, low in complexity, and convenient to use. However, the core idea of the n-gram model is to build the model from word frequency statistics. In areas where resources are scarce, such as conversational telephone speech (CTS) recognition systems, the limited size of the corpus means that a large number of n-gram combinations do not appear in the training corpus, and we can only rely on the smoot...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G10L15/16, G10L15/06

Inventors: 张晴晴, 陈梦喆, 潘接林, 颜永红

Owner: INST OF ACOUSTICS CHINESE ACAD OF SCI
