Method and device for obtaining language training data

Training-data and language-recognition technology, applied in speech analysis, speech recognition, instruments, etc., that addresses the problem of low-quality training data.

Active Publication Date: 2020-12-29
NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT +1
7 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0003] The main purpose of the present invention is to provide a method and device for obtaining language training data, so as to solve the problem in the related art that the training data used to train language recognition models is of low quality.



Examples


Embodiment Construction

[0020] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative efforts fall within the protection scope of the present invention.

[0021] The invention proposes a method for obtaining language training data, which can be used to filter and purify language training data. The language training data of the invention is used to train a language recognition model, and the language recognition model can identify the language of audio data. The method first uses the trained language recognition model to identify the language of the training data, ...



Abstract

The invention provides a method and a device for acquiring language training data, addressing the problem in the related art that language training data is of relatively low quality. The method comprises the following steps: training language recognition models for recognizing various languages; recognizing second audio data in a data set with each language recognition model to obtain the score corresponding to each model; determining the recognized language of the second audio data; calculating the score information entropy of each item of second audio data in the training set; and taking, as the training data set, the second audio data in the data set whose score information entropy meets a first preset condition and whose actual language is consistent with the recognized language, then returning to the step of training each language recognition model with the training data until the number of audio data items in the training data set meets a second preset condition, wherein the second audio data in the training data set is used for training the language recognition models. The method improves the quality of the language training data.
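
The selection step in the abstract can be illustrated with a minimal sketch. The abstract does not give the entropy formula or the preset conditions, so the following are assumptions made only for illustration: the per-language scores are turned into a probability distribution with a softmax, the score information entropy is the Shannon entropy of that distribution, and the first preset condition is "entropy at or below a threshold". The names `score_entropy`, `select_training_set`, and `max_entropy` are hypothetical and do not come from the patent.

```python
import math
from typing import Dict, List, Sequence


def score_entropy(scores: Sequence[float]) -> float:
    """Shannon entropy (in nats) of softmax-normalised per-language scores.

    Assumption: the patent text shown here does not specify the
    normalisation; a softmax is used purely for illustration.
    """
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_training_set(
    scores_by_utt: Dict[str, Sequence[float]],
    labels: Dict[str, str],
    languages: List[str],
    max_entropy: float,
) -> List[str]:
    """Keep utterances whose labelled language equals the recognised
    language and whose score entropy is at most max_entropy.

    Assumption: "entropy <= max_entropy" stands in for the first
    preset condition, which the abstract leaves unspecified.
    """
    selected = []
    for utt_id, scores in scores_by_utt.items():
        # The recognised language is the one whose model scored highest.
        best = max(range(len(languages)), key=lambda i: scores[i])
        recognised = languages[best]
        if recognised == labels[utt_id] and score_entropy(scores) <= max_entropy:
            selected.append(utt_id)
    return selected


# Hypothetical scores from three per-language models for three utterances.
languages = ["en", "zh", "ru"]
scores_by_utt = {
    "a": [5.0, 0.0, 0.0],  # confident, and matches its label
    "b": [1.0, 1.1, 0.9],  # matches its label, but scores are nearly flat
    "c": [0.0, 6.0, 0.0],  # confident, but disagrees with its label
}
labels = {"a": "en", "b": "zh", "c": "en"}
kept = select_training_set(scores_by_utt, labels, languages, max_entropy=0.5)
```

In this sketch only utterance "a" survives: "b" is rejected because its scores are too ambiguous, and "c" because the recognised language contradicts its label. In the iterative scheme the abstract describes, the kept set would be used to retrain the models and the selection repeated until the size of the training set meets the second preset condition.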

Description

Technical field

[0001] The invention relates to the technical field of speech signal processing, in particular to a method and device for obtaining language training data.

Background

[0002] The quality of a language recognition model depends on the quality of the training data used to train it. In general, however, the training data always contains some labeling errors. Such data weakens the descriptive ability of the trained language recognition model and ultimately degrades its recognition performance, so filtering out this data is very important. At present, the selection of language training data relies mainly on manual inspection: the labeled training data is spot-checked at random, and if the labeling error rate is found to be high, the training data is re-labeled. This method is time-consuming and laborious, and when the amount of data is large, some errors will ine...


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G10L15/00, G10L15/06, G10L15/32
Inventor: 袁庆升, 汪立东, 包秀国, 张鸿, 时磊, 张卫强, 邵云飞
Owner: NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT