Data screening method and device

A screening method and voice data technology, applied in the field of data processing, that can solve problems such as the effect of acoustic models and language models not being guaranteed.

Active Publication Date: 2019-04-09
IFLYTEK CO LTD

AI Technical Summary

Problems solved by technology

[0003] In practice, a large amount of sample data is needed to train the acoustic model and the language model. However, in the data labeling stage for existing acoustic and language models, a number of samples are selected at random for labeling and then used for the subsequent model training. It is not known whether these randomly selected samples are the ones the model actually needs to learn, so the effect of the acoustic model and the language model cannot be guaranteed.


Examples

First embodiment

[0065] This embodiment introduces a voice data screening method. Beforehand, a training data set composed of a large number of voice recordings can be built. With this method, the voice data that the acoustic model actually needs to learn can be screened out of the training data set and used to train the acoustic model. In this way, using the limited data resources (i.e., low resources) that are selected, the acoustic model can learn acoustic features as comprehensively as possible, which improves both the training speed and the predictive performance of the acoustic model.

[0066] See Figure 1, which is a schematic flowchart of the voice data screening method provided in this embodiment. The method includes the following steps:

[0067] S101: Train an acoustic model using voice data of a first duration.

[0068] In this embodiment, in order to improve the data quality of the training data of the acoustic model, before this step S101,...
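Step S101 trains on voice data "of the first duration", which suggests a fixed time budget for the initial training subset. The excerpt does not say how that subset is drawn, so the following is only a minimal sketch under an assumption: utterances are sampled at random from the pool until the budget is used up. The names `Utterance` and `take_first_duration` are hypothetical and do not come from the patent.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Utterance:
    """Hypothetical record for one recording in the training pool."""
    utt_id: str
    seconds: float


def take_first_duration(
    pool: Sequence[Utterance], budget_seconds: float, seed: int = 0
) -> List[Utterance]:
    """Pick utterances whose total length does not exceed the budget.

    ASSUMPTION: random sampling is used here only for illustration;
    the source does not specify how the first-duration subset is chosen.
    """
    order = list(pool)
    random.Random(seed).shuffle(order)  # seeded so the draw is reproducible
    chosen: List[Utterance] = []
    total = 0.0
    for utt in order:
        if total + utt.seconds > budget_seconds:
            continue  # this utterance would exceed the budget, try the next
        chosen.append(utt)
        total += utt.seconds
    return chosen
```

The selected subset would then be passed to whatever acoustic model training routine is in use; that routine is outside the scope of this sketch.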

Second embodiment

[0108] This embodiment introduces a text data screening method. Beforehand, a training data set composed of a large amount of text can be built. With this method, the text data of a specific domain (such as the medical field) that the text domain classification model actually needs to learn can be screened out of the training data set and used to train the text domain classification model. Using the limited data resources (i.e., low resources) that are screened out, the model can then learn the text features of the specific domain as comprehensively as possible, which improves both the training speed of the text domain classification model and its classification effect for this specific domain. Furthermore, the text domain classification model can be used to more accurately select the text data of the specific domain from the tra...
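To illustrate the last point, using a trained text domain classification model to pick domain text out of a larger pool, here is a minimal sketch. The patent excerpt does not describe the model itself, so a small two-class naive Bayes classifier with add-one smoothing stands in for it. `TinyDomainClassifier`, `select_domain_text`, and the `threshold` parameter are all hypothetical names chosen for this example.

```python
import math
from collections import Counter
from typing import Dict, List


class TinyDomainClassifier:
    """Two-class multinomial naive Bayes (ASSUMPTION: illustrative stand-in
    for the text domain classification model, not the patent's model)."""

    def fit(self, texts: List[str], labels: List[bool]) -> "TinyDomainClassifier":
        # labels: True = in the target domain, False = out of domain.
        # Both classes must appear at least once in the training data.
        self.word_counts: Dict[bool, Counter] = {True: Counter(), False: Counter()}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        return self

    def _log_score(self, words: List[str], label: bool) -> float:
        total = sum(self.word_counts[label].values())
        prior = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        # Add-one smoothing; words never seen in training are ignored.
        return prior + sum(
            math.log((self.word_counts[label][w] + 1) / (total + len(self.vocab)))
            for w in words
            if w in self.vocab
        )

    def prob_in_domain(self, text: str) -> float:
        words = text.lower().split()
        pos = self._log_score(words, True)
        neg = self._log_score(words, False)
        return 1.0 / (1.0 + math.exp(neg - pos))


def select_domain_text(
    model: TinyDomainClassifier, pool: List[str], threshold: float = 0.5
) -> List[str]:
    """Keep pool texts the model scores as in-domain at or above the threshold."""
    return [t for t in pool if model.prob_in_domain(t) >= threshold]
```

In use, the classifier would be fitted on the small screened set of labeled text and then applied to the remaining pool; raising `threshold` trades recall for precision in what gets selected.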

Third embodiment

[0146] Note that this embodiment introduces a data screening method and a model building method, which can be implemented using the screening methods introduced in the first and second embodiments above.

[0147] See Figure 4, which is a schematic flowchart of the data screening method provided in this embodiment. The method includes:

[0148] Step S401: Based on the learning requirements for data features, use a preset screening strategy to screen the data set to be screened and obtain screened data, where the screened data carries data features that have not yet been learned.
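One way to picture Step S401 is as a filter that keeps only samples contributing features the model has not yet covered. The sketch below is an assumption about how such a strategy could look, not the screening strategy claimed in the patent: it walks the candidate set once and keeps a sample whenever it adds at least one new feature. The function name `screen_unlearned` and the `extract_features` callback are hypothetical.

```python
from typing import Callable, Hashable, Iterable, List, Set, Tuple, TypeVar

Sample = TypeVar("Sample")


def screen_unlearned(
    candidates: Iterable[Sample],
    extract_features: Callable[[Sample], Set[Hashable]],
    learned: Set[Hashable],
) -> Tuple[List[Sample], Set[Hashable]]:
    """Keep samples that carry at least one feature not yet learned.

    ASSUMPTION: "features" are modeled as hashable tokens (for example
    words or phoneme n-grams) produced by the caller's `extract_features`.
    Returns the screened samples and the updated set of covered features.
    """
    covered = set(learned)  # copy, so the caller's set is not mutated
    selected: List[Sample] = []
    for sample in candidates:
        new_features = extract_features(sample) - covered
        if new_features:
            selected.append(sample)
            covered |= new_features
    return selected, covered


def word_features(text: str) -> Set[Hashable]:
    """Toy feature extractor: the set of lowercase words in a sentence."""
    return set(text.lower().split())
```

For example, with `learned={"the", "a"}` and the pool `["the patient has a fever", "the patient has a cough", "the patient has a fever"]`, the first two sentences are kept and the repeated third sentence is dropped because it adds nothing new. A single greedy pass like this depends on candidate order; a real strategy might rank candidates first.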

[0149] In practical applications, especially in the field of deep learning, achieving a given functional goal requires collecting a large amount of sample data related to that goal to form a data set, with the expectation that the data features in the data set can be learned comprehensively. However, in order to achieve comprehen...

Abstract

The invention discloses a data screening method and device. The method comprises screening a data set to be screened using a preset screening strategy, based on the learning demand for data properties, to obtain screened data that carry unlearned data properties. Because the screening strategy is determined in advance from the property learning demand, and the screened data carry properties that have not yet been learned, property learning can be carried out on the limited screened data resources, which makes property learning possible under low-resource conditions.

Description

Technical field

[0001] The present application relates to the technical field of data processing, in particular to a data screening method and device.

Background

[0002] With the continuous development of speech recognition technology, speech recognition has found practical application in many settings, such as speech input methods, conference transcription, and film and television subtitle generation. Good speech recognition technology plays a decisive role in improving results in these fields, and it has therefore received more and more research attention from scholars. In recent years, with the rapid development of deep learning, the acoustic model and language model in current speech recognition systems are basically models based on various neural networks. Neural network models often require a large number of training samples as support, and how to improve the performance of speech recognition system under low resource condition...

Application Information

Patent Type & Authority: Applications (China)
IPC (8): G10L15/06, G10L15/14, G10L15/183
CPC: G10L15/063, G10L15/14, G10L15/144, G10L15/183
Inventor: 方昕, 刘海波, 方磊
Owner: IFLYTEK CO LTD