Audio data labeling method, device and system

An audio data labeling method, device and system in the field of audio technology, addressing the problems of unstable quality of labeled audio data, low acoustic-model accuracy, and the time and labor cost of manual labeling; the method saves manpower and material resources, optimizes the recognition path, and improves labeling accuracy.

Pending Publication Date: 2020-06-26
SUNING CLOUD COMPUTING CO LTD


Problems solved by technology

Training an acoustic model requires a large number of labeled audio data samples. These samples are acquired mainly by manual labeling, which is time-consuming, labor-intensive and expensive.
If labeled audio data produced by speech recognition is used directly as training samples for the acoustic model, the unstable quality of that data results in low accuracy of the trained model.



Examples


Embodiment 1

[0062] As shown in figure 1, a method for labeling audio data includes the following steps:

[0063] S1. Collect audio material and obtain the duration of the audio material.

[0064] The audio material in step S1 is the collected original audio, which may contain invalid data such as noise and long silences; the duration obtained in this step is the total audio duration including those noise and silent segments.

[0065] S2. Organize the audio material, compare the duration of the audio material with the preset duration condition, and delete the audio material that does not meet the duration condition.

[0066] The main purpose of step S2 is to delete overly short audio material and split overly long audio material: the text expressed in a short clip may be incomplete, while a long clip contains too much text content, which increases training difficulty.
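The delete/split decision in step S2 can be sketched as follows; the boundary values (2 s minimum, 30 s maximum) and the representation of audio material as (id, duration) pairs are illustrative assumptions, not values given in the patent.

```python
def organize_audio(materials, min_dur=2.0, max_dur=30.0):
    """Delete clips no longer than min_dur; split clips at least max_dur long.

    `materials` is a list of (clip_id, duration_seconds) pairs; splitting is
    represented here simply by dividing a long clip into max_dur-sized parts.
    The boundary values are assumptions for illustration only.
    """
    kept = []
    for clip_id, dur in materials:
        if dur <= min_dur:
            continue                       # too short: text may be incomplete
        if dur >= max_dur:
            # too long: split into roughly max_dur-sized segments
            n_parts = int(dur // max_dur) + (1 if dur % max_dur else 0)
            for i in range(n_parts):
                part_dur = min(max_dur, dur - i * max_dur)
                kept.append((f"{clip_id}_part{i}", part_dur))
        else:
            kept.append((clip_id, dur))
    return kept
```

In a real system the split would of course cut the waveform itself, preferably at a silent point found by the endpoint detection of step S3.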

[0067] S3. Perform voice endpoint detection on the audio data, ...
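The patent text does not specify which endpoint-detection algorithm step S3 uses; a minimal short-time-energy sketch of voice activity detection, with an arbitrary frame length and threshold chosen purely for illustration, might look like:

```python
def detect_endpoints(samples, frame_len=160, energy_threshold=0.01):
    """Return (start, end) sample indices bracketing the frames whose mean
    energy exceeds the threshold, or None if no speech-like frame is found.
    Frame length and threshold are illustrative assumptions."""
    voiced = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        if energy > energy_threshold:
            voiced.append(start)
    if not voiced:
        return None
    return voiced[0], voiced[-1] + frame_len
```

Production systems typically use more robust detectors (e.g. model-based VAD), but the interface is the same: the boundaries of the speech region within the raw material.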

Embodiment 2

[0088] In order to implement the audio labeling method disclosed in Embodiment 1, this embodiment provides, on the basis of Embodiment 1, an audio labeling device. As shown in figure 2, the audio labeling device includes:

[0089] The audio duration acquisition module is used to acquire the duration of the collected audio material.

[0090] The audio sorting module is used to compare the duration of the audio material with the preset duration condition, and delete the audio material that does not meet the duration condition.

[0091] It should be noted that the duration condition in the audio sorting module may include either or both of an upper boundary value and a lower boundary value: when the duration of the audio material is less than or equal to the lower boundary value, the audio material is deleted; when the duration of the audio material is greater than or equal to the upper boundary value, the audio material is split.

[0092] The endpoint detection module...

Embodiment 3

[0106] The embodiment of the present application provides a computer system based on the audio data labeling method of Embodiment 1, including:

[0107] one or more processors; and

[0108] A memory associated with the one or more processors, the memory being used to store program instructions which, when read and executed by the one or more processors, perform the above audio data labeling method.

[0109] Figure 3 exemplarily shows the architecture of the computer system, which may specifically include a processor 310, a video display adapter 311, a disk drive 312, an input/output interface 313, a network interface 314, and a memory 320. The processor 310, video display adapter 311, disk drive 312, input/output interface 313, network interface 314 and memory 320 can be connected by a communication bus 330.

[0110] The processor 310 may be implemented by a general-purpose CPU (Central Processing Unit, central processing...



Abstract

The invention discloses an audio data labeling method, device and system. The method comprises the following steps: performing speech recognition on the to-be-labeled audio data with a speech recognition engine to obtain a reference labeled text; searching, in the word graph network obtained by decoding the to-be-labeled audio data, for the optimal recognition path with the shortest edit distance to the reference labeled text; calculating the confidence coefficient of each word on the optimal recognition path, comparing each word's confidence coefficient with a preset first confidence condition, and outputting the target words on the optimal recognition path that meet the first confidence condition; and aligning the target words according to the time parameter of each word in the word graph network to form the labeled text of the to-be-labeled audio data. The words in the word graph network of the to-be-labeled audio data are distinguished according to their confidence coefficients: words with high confidence are extracted to form the annotation text, while words with low confidence are marked for review. Audio data annotation is thus completed automatically, improving both annotation efficiency and annotation accuracy.
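The core selection logic of the abstract can be sketched as follows, simplified so that the word-graph network is reduced to a list of candidate (word, confidence) sequences; the confidence values and threshold are made-up examples, not figures from the patent.

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance between two word sequences."""
    dp = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, wb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (wa != wb))  # substitution
    return dp[-1]

def best_path(candidates, reference):
    """Pick the candidate (word, confidence) sequence whose word sequence
    has the smallest edit distance to the reference labeled text."""
    return min(candidates,
               key=lambda path: edit_distance([w for w, _ in path], reference))

def confident_words(path, threshold=0.8):
    """Keep words on the chosen path whose confidence meets the threshold
    (the threshold plays the role of the 'first confidence condition')."""
    return [w for w, conf in path if conf >= threshold]
```

A real word graph is a lattice, not an enumerable list of paths, so the patent's shortest-edit-distance search would be run as a dynamic program over the lattice rather than by exhaustive comparison as above.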

Description

technical field
[0001] The invention relates to the technical field of speech recognition, in particular to an audio data labeling method, device and system.
Background technique
[0002] Speech recognition is a technology that takes speech as its research object and allows machines to automatically recognize and understand human spoken language through speech signal processing and pattern recognition. The technical problem to be solved is to let the computer convert speech into text and obtain the corresponding word or character sequence; it is essentially a problem of channel decoding and pattern recognition.
[0003] Generally speaking, a speech recognition system is mainly composed of four modules: front-end processing, acoustic model, language model and decoder. Front-end processing mainly includes three aspects: endpoint detection, noise reduction, and feature extraction. The acoustic model, language model and decoder belong to back-end processing. The acoustic...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC (8): G10L15/06; G10L15/05; G10L15/18; G10L15/22; G10L15/26
CPC: G10L15/063; G10L15/18; G10L15/05; G10L15/22; G10L15/26; G10L2015/0633
Inventor: 孙泽明, 齐欣, 王宁, 张旭华, 朱林林
Owner SUNING CLOUD COMPUTING CO LTD