Audio data processing method and device

An audio data processing technology, applied in speech analysis, character and pattern recognition, instruments, and related fields. It addresses the high cost and error-proneness of manual phoneme mapping and the difficulty of collecting training samples, and achieves low cost and improved accuracy.

Pending; Publication Date: 2021-11-12
KE COM (BEIJING) TECHNOLOGY CO LTD

AI Technical Summary

Problems solved by technology

[0004] In the process of realizing the present disclosure, the inventors found that manual mapping is relatively costly and error-prone, and that training a mapping model requires samples that are difficult to collect, which results in low mapping accuracy.

Examples

Embodiment 1

[0031] Referring to Figure 1, which is a schematic diagram of the audio data processing flow in Embodiment 1 of the present disclosure, the specific steps are:

[0032] Step 101: acquire the audio data to be processed.

[0033] Step 102: extract filter bank features of the audio data.

[0034] Filter bank (Fbank) analysis is one method of extracting speech feature parameters. Because its mel-scale filters approximate the way human hearing resolves frequency, it is among the most widely used and effective speech feature extraction algorithms.

[0035] The Fbank features of the audio signal can be extracted with the filter bank algorithm. Fbank feature extraction is equivalent to Mel-Frequency Cepstral Coefficient (MFCC) extraction without the final discrete cosine transform (a lossy step); compared with MFCC features, Fbank features therefore retain more of the original speech data.
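The relationship between Fbank and MFCC features described in [0035] can be illustrated with a minimal NumPy sketch. It is not taken from the disclosure: the 25 ms frame, 10 ms hop, 512-point FFT, 40 filters, 13 cepstral coefficients, and all function names are assumptions chosen for illustration, and pre-emphasis is omitted for brevity.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):      # rising slope
            fb[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling slope, peak of 1 at center
            fb[m - 1, k] = (right - k) / (right - center)
    return fb

def fbank(signal, sample_rate=16000, frame_len=400, hop=160, n_fft=512, n_filters=40):
    """Log mel filter bank energies, one row per frame (assumed parameters)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    return np.log(np.maximum(energies, 1e-10))  # floor avoids log(0)

def mfcc_from_fbank(fbank_feats, n_coeffs=13):
    """Unnormalized DCT-II over the filter axis, keeping the first n_coeffs."""
    n = fbank_feats.shape[1]
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n)[None, :] + 1) / (2 * n))
    return fbank_feats @ basis.T

# Usage: one second of noise standing in for real audio at 16 kHz.
signal = np.random.default_rng(0).standard_normal(16000)
fbank_feats = fbank(signal)
mfcc_feats = mfcc_from_fbank(fbank_feats)
```

In this sketch, stopping at `fbank` keeps all 40 log energies per frame, whereas `mfcc_from_fbank` keeps only 13 transform coefficients; that truncation is where information is discarded.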

[0036] The embodiment of the present disclosure does not lim...

Embodiment 2

[0082] Referring to Figure 5, which is a schematic diagram of the audio data processing flow in Embodiment 2 of the present disclosure, the specific steps are:

[0083] Step 501: acquire the audio data to be processed.

[0084] Step 502: extract filter bank features of the audio data.

[0085] Fbank analysis is one method of extracting speech feature parameters. Because its mel-scale filters approximate the way human hearing resolves frequency, it is among the most widely used and effective speech feature extraction algorithms.

[0086] The Fbank features of the audio signal can be extracted with the filter bank algorithm. Fbank feature extraction is equivalent to MFCC extraction with the final discrete cosine transform (a lossy step) removed; compared with MFCC features, Fbank features retain more of the original speech data.

Abstract

Embodiments of the invention provide an audio data processing method and apparatus. The method comprises: obtaining audio data to be processed; extracting filter bank features of the audio data; performing an alignment operation on the extracted filter bank features, and querying a preset pronunciation dictionary to obtain the phoneme sequence corresponding to the alignment result; determining the main-language phonemes in the phoneme set; and mapping the phonemes in the phoneme set other than the main-language phonemes to the main-language phonemes. The method can improve the accuracy of phoneme mapping while keeping cost low.
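The final mapping step of the abstract can be sketched in Python as follows. This excerpt does not state how the mapping is computed, so the criterion used here is purely an assumption for illustration: each phoneme is represented by the mean feature vector of the frames aligned to it, and each non-main phoneme is mapped to the main-language phoneme with the nearest mean (Euclidean distance). The function names, the phoneme labels, and the toy two-dimensional features are all hypothetical.

```python
import numpy as np

def phoneme_centroids(features, frame_labels):
    """Average the feature vectors of all frames aligned to each phoneme."""
    features = np.asarray(features, dtype=float)
    centroids = {}
    for phoneme in set(frame_labels):
        mask = np.array([label == phoneme for label in frame_labels])
        centroids[phoneme] = features[mask].mean(axis=0)
    return centroids

def map_to_main_language(centroids, main_phonemes):
    """Map every non-main phoneme to the nearest main-language phoneme.

    Nearest-centroid matching is an assumed criterion, not the
    method of the disclosure.
    """
    main = sorted(main_phonemes)
    main_matrix = np.stack([centroids[p] for p in main])
    mapping = {}
    for phoneme, vec in centroids.items():
        if phoneme in main_phonemes:
            continue
        distances = np.linalg.norm(main_matrix - vec, axis=1)
        mapping[phoneme] = main[int(np.argmin(distances))]
    return mapping

# Toy data: five frames of 2-D features with their aligned phoneme labels.
features = [[0, 0], [2, 2], [10, 10], [1, 2], [9, 9]]
frame_labels = ["a", "a", "i", "AE", "IY"]
centroids = phoneme_centroids(features, frame_labels)
mapping = map_to_main_language(centroids, main_phonemes={"a", "i"})
```

In practice the features would be Fbank vectors and the frame labels would come from the alignment and pronunciation dictionary lookup; both are replaced by toy values here.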

Description

Technical field

[0001] Embodiments of the present disclosure relate to an audio data processing method and device.

Background

[0002] At present, in the field of phonetics, each language has its own complete pronunciation system, corresponding to a set of phonemes. In practical applications, however, different languages are often mixed, for example Chinese mixed with English and Japanese; even within a single language, the standard variety is often mixed with dialects.

[0003] In practical applications, different languages therefore need to be mapped to the same language, for example through manual mapping, or by collecting a large amount of speech data as training samples to train a mapping model.

[0004] In the process of realizing the present disclosure, the inventors found that the cost of manual mapping is relatively high, and errors are prone to occur; it is difficult to collect samples through sample training mode...

Application Information

IPC(8): G10L25/48, G10L25/30, G06K9/62
CPC: G10L25/48, G10L25/30, G06F18/23
Inventor 解传栋, 李先刚, 邹伟, 王健, 常超, 沈明
Owner KE COM (BEIJING) TECHNOLOGY CO LTD