
Speech recognition model training method and speech recognition method and device

A speech recognition model training technology, applied in speech recognition, speech analysis, instruments, etc. It addresses the large differences between languages, the long time needed to collect training data, and the high cost, and it reduces labeling cost and speeds up the construction of acoustic and language model components.

Active Publication Date: 2020-02-21
AISPEECH CO LTD
Cites: 7 | Cited by: 12

AI Technical Summary

Problems solved by technology

[0002] In daily communication, people speaking one language often unconsciously mix in words or expressions from one or more other languages; for example, a local dialect such as Sichuan dialect may be mixed in when communicating in Mandarin. This phenomenon poses difficulties and challenges for speech recognition systems.
[0003] At present, the way single-language recognition systems are built is not suitable for recognizing a mixture of multiple languages. Dialects such as Sichuan dialect generally have far fewer data acquisition channels than Mandarin, and this imbalance in training data lowers the accuracy of the recognition system. Moreover, different languages differ greatly in acoustics, so it is difficult to model multiple languages with the modeling units of any single language.
[0004] To make up for the relatively small amount of training data for dialects such as Sichuan dialect, training data is usually obtained by collecting it manually or crawling audio and video from the Internet and then labeling it by hand. Data obtained this way takes a relatively long time to accumulate, model building is slow, and the cost is relatively high.
[0005] To handle the acoustic differences between languages, the prior art adopts two methods. 1. In the acoustic model, the Mandarin subset and the Sichuan dialect subset are fused: Sichuan dialect speech and text data are added on top of the Chinese (Mandarin) training data, and the pronunciations of the corresponding text in each language are added to the dictionary. 2. During recognition, the same audio is sent simultaneously to the recognition engines of both languages, and the final result is selected according to a confidence strategy. Although this achieves simultaneous multi-language recognition, it requires deploying two sets of recognition resources on the server at the same time, and the engineering cost of running them concurrently is relatively high.
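The second prior-art approach in [0005] (decode the same audio with two engines, keep the higher-confidence result) can be sketched roughly as follows. This is a minimal illustration under our own assumptions, not the confidence strategy of any particular engine: the `Hypothesis` type, the engine labels, and the premise that confidences from the two engines lie on a comparable [0, 1] scale are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    engine: str        # hypothetical engine label, e.g. "mandarin" or "sichuan"
    text: str          # recognized text returned by that engine
    confidence: float  # assumed to lie in [0, 1] and be comparable across engines


def select_by_confidence(hyps: List[Hypothesis]) -> Hypothesis:
    """Return the hypothesis with the highest confidence.

    Python's max() keeps the first maximal element, so on a tie the
    engine listed first wins.
    """
    if not hyps:
        raise ValueError("at least one hypothesis is required")
    return max(hyps, key=lambda h: h.confidence)


# The same audio is assumed to have been decoded by both engines already.
results = [
    Hypothesis("mandarin", "text from the Mandarin engine", 0.62),
    Hypothesis("sichuan", "text from the Sichuan dialect engine", 0.81),
]
best = select_by_confidence(results)
print(best.engine)
```

The selection itself is cheap; the cost [0005] objects to lies upstream, since both engines must be deployed and run on every request before `select_by_confidence` has anything to compare.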

Examples

Embodiment Construction

[0052] Exemplary embodiments of the present invention are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present invention to facilitate understanding, and they should be regarded as exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

[0053] Figure 1 is a schematic diagram of the main flow of the speech recognition model training method according to an embodiment of the present invention. As shown in Figure 1, the method for recognizing speech that mixes multiple languages in this embodiment mainly includes:

[0054] Step S101: Obtain the original speech data of each language in...

Abstract

The invention discloses a speech recognition model training method, a speech recognition method, and a device, and relates to the field of computer technology. One specific embodiment of the method includes: obtaining original speech data for each of a plurality of languages, and screening each language's original speech data according to its confidence to obtain a training data set; extracting acoustic features of the speech data in the training data set; and obtaining a speech recognition model based on those acoustic features and an adversarial training algorithm, the model being used to recognize mixed-language speech. Because confidence serves as the criterion for screening the training data set, the method makes it easier to balance the amount of training data across languages, increases model robustness, reduces labeling cost, and speeds up the construction of acoustic and language model components.
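The confidence-based screening step in the abstract might look something like the sketch below. The abstract does not say how confidence is computed, what threshold is used, or how balance between languages is enforced, so the function name `screen_by_confidence`, the 0.9 default threshold, and the rule of capping every language at the size of the smallest one are hypothetical choices made for illustration. Feature extraction and the adversarial training step are not sketched.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# (utterance_id, language, confidence); confidence is assumed (hypothetically)
# to come from decoding the raw audio with an existing recognizer.
Utterance = Tuple[str, str, float]


def screen_by_confidence(
    utts: Iterable[Utterance],
    threshold: float = 0.9,
    per_language_cap: Optional[int] = None,
) -> Dict[str, List[str]]:
    """Filter utterances by confidence, then cap each language's count.

    1. Drop utterances whose confidence is below `threshold`.
    2. Rank each language's survivors by confidence, highest first.
    3. Keep at most `per_language_cap` per language; if the cap is None,
       use the size of the smallest surviving language so that every
       language contributes the same number of utterances.
    """
    by_lang: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for utt_id, lang, conf in utts:
        if conf >= threshold:
            by_lang[lang].append((utt_id, conf))
    if not by_lang:
        return {}
    cap = per_language_cap
    if cap is None:
        cap = min(len(items) for items in by_lang.values())
    selected: Dict[str, List[str]] = {}
    for lang, items in by_lang.items():
        items.sort(key=lambda item: item[1], reverse=True)
        selected[lang] = [utt_id for utt_id, _ in items[:cap]]
    return selected


data = [
    ("m1", "mandarin", 0.95), ("m2", "mandarin", 0.97), ("m3", "mandarin", 0.50),
    ("s1", "sichuan", 0.92), ("s2", "sichuan", 0.60),
]
print(screen_by_confidence(data, threshold=0.9))
```

On the toy data, two Mandarin and one Sichuan utterance pass the threshold, so the default cap is 1 and each language keeps only its highest-confidence utterance (`m2` and `s1`). Truncating to the smallest language is the crudest possible balancing rule; it throws away usable Mandarin data, and a real system might instead reweight or resample.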

Description

Technical field

[0001] The invention relates to the technical field of speech recognition, in particular to a speech recognition model training method, a speech recognition method, and a device.


Application Information

IPC (8): G10L15/06, G10L15/04, G10L15/02, G10L15/26, G10L25/12, G10L25/24, G10L25/30, G10L25/45
CPC: G10L15/063, G10L15/02, G10L15/04, G10L15/26, G10L25/24, G10L25/12, G10L25/30, G10L25/45, G10L2015/0636
Inventors: 朱森 (Zhu Sen), 钱彦旻 (Qian Yanmin)
Owner AISPEECH CO LTD