Speech recognition model training method and speech recognition method and device

A speech recognition model training technology, applied in speech recognition, speech analysis, instruments, etc. It addresses the large differences between languages, the long time needed to obtain training data, and the high cost, with the effect of reducing labeling cost and speeding up the construction of model components.

Active Publication Date: 2020-02-21

AISPEECH CO LTD

7 Cites, 12 Cited by


AI Technical Summary

Problems solved by technology

[0002] In daily communication, people speaking one language often unconsciously mix in words or expressions from one or more other languages; for example, local dialects such as Sichuan dialect are mixed in when communicating in Mandarin. This phenomenon poses difficulties and challenges for speech recognition systems.

[0003] At present, the way single-language recognition systems are built is not suited to recognizing multiple mixed languages. Dialects such as Sichuan dialect generally have far fewer data acquisition channels than Mandarin, and the resulting imbalance in training data lowers the accuracy of the recognition system. Moreover, languages differ greatly in their acoustics, so it is difficult to use the modeling units of one language to model several languages.

[0004] To compensate for the relatively small amount of training data for dialects such as Sichuan dialect, training data are usually obtained by collecting data manually or by crawling audio and video data from the Internet, and then labeling them manually. Obtaining training data this way takes a long time, model construction is slow, and the cost is relatively high.

[0005] To deal with the acoustic differences between languages, the prior art adopts two methods. 1. In the acoustic model, the Mandarin subset and the Sichuan dialect subset are fused: Sichuan dialect speech and text data are added to the training data on top of the Chinese (Mandarin) data, and the pronunciations of the corresponding text in the multiple languages are added to the dictionary. 2. During recognition, the same audio is sent simultaneously to the recognition engines of the two languages, and the final recognition result is selected according to a confidence strategy. Although this achieves simultaneous multi-language recognition, it requires deploying two sets of recognition resources on the server at the same time, and the engineering cost of concurrency is relatively high.
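Prior-art method 2 can be illustrated with a minimal sketch. The patent does not define the confidence strategy, so this assumes, hypothetically, that each engine returns one hypothesis with a confidence score on a comparable scale and that the highest score wins. `EngineResult`, `select_by_confidence`, and the placeholder transcripts are illustrative names, not part of the patent.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class EngineResult:
    """Hypothetical output of one language-specific recognition engine."""
    language: str
    text: str
    confidence: float  # assumed comparable across engines, in [0, 1]


def select_by_confidence(results: Sequence[EngineResult]) -> EngineResult:
    """Pick the hypothesis whose engine reports the highest confidence.

    This is the simplest possible confidence strategy; a real system might
    calibrate each engine's scores before comparing them.
    """
    if not results:
        raise ValueError("need at least one engine result")
    return max(results, key=lambda r: r.confidence)


if __name__ == "__main__":
    # The same audio has been decoded by both engines in parallel.
    candidates = [
        EngineResult("mandarin", "hyp-mandarin", 0.62),
        EngineResult("sichuanese", "hyp-sichuanese", 0.81),
    ]
    best = select_by_confidence(candidates)
    print(best.language, best.text)
```

Both engines must decode every utterance before the comparison can be made, which is the duplicated deployment and concurrency cost the paragraph above points out.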


Embodiment Construction

[0052] Exemplary embodiments of the present invention are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present invention to facilitate understanding, and they should be regarded as exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

[0053] Figure 1 is a schematic diagram of the main flow of the speech recognition model training method according to an embodiment of the present invention. As shown in Figure 1, the method for recognizing speech in which multiple languages are mixed, according to this embodiment, mainly includes:

[0054] Step S101: Obtain the original speech data of each language in...
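According to the abstract, the original speech data of each language are screened by confidence to form a training data set whose size is easier to balance across languages. The description is truncated at this step, so the following is only a minimal sketch under stated assumptions: each utterance is assumed to carry an automatic transcript and a confidence score from some existing recognizer, each language gets its own hypothetical threshold, and balancing is done by capping every language at the size of the smallest screened subset. `Utterance`, `screen_by_confidence`, and `balance` are illustrative names, not the patent's.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Utterance:
    """Hypothetical record: audio id, language, automatic transcript, confidence."""
    utt_id: str
    language: str
    transcript: str
    confidence: float


def screen_by_confidence(
    utterances: Iterable[Utterance],
    thresholds: Dict[str, float],
    default_threshold: float = 0.9,
) -> Dict[str, List[Utterance]]:
    """Keep utterances whose confidence reaches their language's (assumed) threshold."""
    kept: Dict[str, List[Utterance]] = defaultdict(list)
    for utt in utterances:
        if utt.confidence >= thresholds.get(utt.language, default_threshold):
            kept[utt.language].append(utt)
    return dict(kept)


def balance(per_language: Dict[str, List[Utterance]]) -> List[Utterance]:
    """Cap every language at the smallest subset size, preferring high confidence."""
    if not per_language:
        return []
    cap = min(len(utts) for utts in per_language.values())
    training_set: List[Utterance] = []
    for utts in per_language.values():
        ranked = sorted(utts, key=lambda u: u.confidence, reverse=True)
        training_set.extend(ranked[:cap])
    return training_set


if __name__ == "__main__":
    raw = [
        Utterance("m1", "mandarin", "t1", 0.97),
        Utterance("m2", "mandarin", "t2", 0.95),
        Utterance("m3", "mandarin", "t3", 0.80),
        Utterance("s1", "sichuanese", "t4", 0.88),
        Utterance("s2", "sichuanese", "t5", 0.70),
    ]
    # A lower threshold for the low-resource dialect lets more of its data through.
    screened = screen_by_confidence(raw, {"mandarin": 0.9, "sichuanese": 0.85})
    print([u.utt_id for u in balance(screened)])
```

Capping discards some high-confidence Mandarin data; oversampling the dialect subset would be an alternative. The patent excerpt does not say which, if either, is used.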

Abstract

The invention discloses a speech recognition model training method, a speech recognition method, and a device, and relates to the technical field of computers. One specific embodiment of the method comprises: obtaining original speech data of each of a plurality of languages, and screening the original speech data of each language according to its confidence to obtain a training data set; extracting acoustic features of the speech data in the training data set; and obtaining a speech recognition model based on the acoustic features of the training data set and an adversarial training algorithm, the speech recognition model being used to recognize speech in mixed languages. Because confidence is used as the criterion for screening the training data set, the amount of training data for the various languages is easier to balance, the robustness of the model increases, the labeling cost is reduced, and the construction of the acoustic and language model components is sped up.
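The abstract does not say which adversarial training algorithm is used. One widely used form for multi-language and multi-dialect acoustic modeling is language-adversarial training with a gradient reversal layer: a language discriminator learns to identify the language from shared encoder features, while the reversed gradient pushes the encoder toward language-invariant features. The NumPy sketch below assumes that form, with a single linear encoder, an acoustic-unit classifier, and a language discriminator; the layer sizes, `lam`, `lr`, and the random data are illustrative assumptions, not the patent's configuration.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(probs: np.ndarray, labels: np.ndarray) -> float:
    """Mean negative log-likelihood of the correct class."""
    n = labels.shape[0]
    return float(-np.log(probs[np.arange(n), labels] + 1e-12).mean())


def grad_reverse(grad: np.ndarray, lam: float) -> np.ndarray:
    """Backward pass of a gradient reversal layer (its forward pass is the identity)."""
    return -lam * grad


def adversarial_step(params, x, y_unit, y_lang, lam=0.1, lr=0.1):
    """One SGD step; updates the params dict and returns the pre-update losses."""
    W, A, D = params["W"], params["A"], params["D"]
    n = x.shape[0]
    h = x @ W                # shared encoder (a single linear layer for brevity)
    p_unit = softmax(h @ A)  # acoustic-unit classifier
    p_lang = softmax(h @ D)  # language discriminator, fed h through the GRL

    # Gradient of mean softmax cross-entropy w.r.t. the logits: (p - one_hot) / n
    g_unit = p_unit.copy()
    g_unit[np.arange(n), y_unit] -= 1.0
    g_unit /= n
    g_lang = p_lang.copy()
    g_lang[np.arange(n), y_lang] -= 1.0
    g_lang /= n

    grad_A = h.T @ g_unit  # classifier minimizes the unit loss
    grad_D = h.T @ g_lang  # discriminator minimizes the language loss
    # The encoder gets the unit gradient unchanged and the language gradient
    # reversed, i.e. it minimizes unit_loss - lam * lang_loss.
    grad_h = g_unit @ A.T + grad_reverse(g_lang @ D.T, lam)
    grad_W = x.T @ grad_h

    params["A"] = A - lr * grad_A
    params["D"] = D - lr * grad_D
    params["W"] = W - lr * grad_W
    return cross_entropy(p_unit, y_unit), cross_entropy(p_lang, y_lang)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat_dim, hidden, n_units, n_langs = 40, 32, 10, 2
    x = rng.normal(size=(128, feat_dim))         # stand-in for acoustic feature frames
    y_unit = rng.integers(0, n_units, size=128)  # random labels, illustration only
    y_lang = rng.integers(0, n_langs, size=128)  # e.g. 0 = Mandarin, 1 = Sichuan dialect
    params = {
        "W": rng.normal(scale=0.1, size=(feat_dim, hidden)),
        "A": rng.normal(scale=0.1, size=(hidden, n_units)),
        "D": rng.normal(scale=0.1, size=(hidden, n_langs)),
    }
    for _ in range(200):
        unit_loss, lang_loss = adversarial_step(params, x, y_unit, y_lang)
    print(f"unit loss {unit_loss:.3f}, language loss {lang_loss:.3f}")
```

With `lam = 0` the encoder is trained on the unit-classification loss alone; increasing `lam` trades some unit accuracy for encoder features that carry less language identity.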

Description

Technical field

[0001] The invention relates to the technical field of speech recognition, in particular to a speech recognition model training method, a speech recognition method, and a device.

Application Information

IPC(8): G10L15/06, G10L15/04, G10L15/02, G10L15/26, G10L25/12, G10L25/24, G10L25/30, G10L25/45

CPC: G10L15/063, G10L15/02, G10L15/04, G10L15/26, G10L25/24, G10L25/12, G10L25/30, G10L25/45, G10L2015/0636

Inventor: 朱森 (Zhu Sen), 钱彦旻 (Qian Yanmin)

Owner: AISPEECH CO LTD
