Audio data processing method and device

An audio data processing technology, applied in speech analysis, character and pattern recognition, instruments, and related fields. It addresses the high cost and error-proneness of manual phoneme mapping and the difficulty of collecting training samples, and it improves mapping accuracy at low cost.

Pending Publication Date: 2021-11-12

KE COM (BEIJING) TECHNOLOGY CO LTD


Problems solved by technology

[0004] In the process of realizing the present disclosure, the inventors found that manual mapping is costly and error-prone, and that training a mapping model requires samples that are difficult to collect, which results in low model mapping accuracy.



Examples


Embodiment 1

[0031] See Figure 1, which is a schematic diagram of the audio data processing flow in Embodiment 1 of the present disclosure. The specific steps are:

[0032] Step 101, acquire audio data to be processed.

[0033] Step 102, extract filter bank features of the audio data.

[0034] Filter bank (Fbank) analysis is one method for extracting speech feature parameters. Because its extraction method is modeled on the principles of human hearing, it is among the most common and effective speech feature extraction algorithms.

[0035] The Fbank features of the audio signal can be extracted with the Filter Bank algorithm. Fbank feature extraction is equivalent to Mel-Frequency Cepstral Coefficient (MFCC) extraction without the final step, the discrete cosine transform (a lossy transform). Compared with MFCC features, Fbank features therefore retain more of the original speech data.
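As an illustration of Step 102, the following is a minimal sketch of log mel filter bank extraction for a single frame, written with the Python standard library only. It is not the disclosure's implementation: the frame length (256 samples), sample rate (16 kHz), number of filters (8), Hamming window, and all function names are assumptions chosen for the example, and the naive DFT is used only to keep the block self-contained.

```python
import cmath
import math


def hz_to_mel(hz):
    """Convert a frequency in Hz to the mel scale."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Convert a mel value back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum of one Hamming-windowed frame via a naive DFT."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        spectrum.append(abs(s) ** 2 / n)
    return spectrum


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    num_bins = n_fft // 2 + 1
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = [low + (high - low) * i / (num_filters + 1)
                  for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mel_points]
    filters = []
    for m in range(1, num_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * num_bins
        for k in range(left, center):      # rising edge of the triangle
            row[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            row[k] = (right - k) / (right - center)
        filters.append(row)
    return filters


def fbank_features(frame, sample_rate, num_filters=8):
    """Log mel filter bank energies for one frame (no DCT is applied)."""
    spec = power_spectrum(frame)
    filters = mel_filterbank(num_filters, len(frame), sample_rate)
    energies = [sum(w * p for w, p in zip(row, spec)) for row in filters]
    return [math.log(max(e, 1e-10)) for e in energies]  # floor avoids log(0)


# Hypothetical input: one 256-sample frame of a 1 kHz tone sampled at 16 kHz.
SAMPLE_RATE = 16000
frame = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(256)]
features = fbank_features(frame, SAMPLE_RATE)
peak_filter = max(range(len(features)), key=lambda i: features[i])
```

Here `features` holds one log energy per mel filter, and `peak_filter` is the index of the filter that responds most strongly to the tone. A practical system would process overlapping frames and typically use more filters.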

[0036] The embodiment of the present disclosure does not lim...

Embodiment 2

[0082] See Figure 5, which is a schematic diagram of the audio data processing flow in Embodiment 2 of the present disclosure. The specific steps are:

[0083] Step 501, acquire audio data to be processed.

[0084] Step 502, extract filter bank features of the audio data.

[0085] Fbank is one method for extracting speech feature parameters. Because its extraction method is modeled on the principles of human hearing, it is among the most common and effective speech feature extraction algorithms.

[0086] The Fbank features of the audio signal can be extracted with the Filter Bank algorithm. Fbank feature extraction is equivalent to MFCC extraction with the final step, the discrete cosine transform (a lossy transform), removed. Compared with MFCC features, Fbank features retain more of the original speech data.
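To make the relationship between Fbank and MFCC concrete, the sketch below applies a type-II discrete cosine transform to a vector of log filter bank energies and keeps only the first few coefficients, which is the step that Fbank extraction omits. The input values and the number of retained coefficients are made up for illustration, and the unnormalized DCT-II shown here is one common convention, not necessarily the one used in the disclosure.

```python
import math


def dct_ii(values, num_coeffs):
    """Unnormalized DCT-II, keeping only the first num_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(num_coeffs)]


# Hypothetical log filter bank energies for one frame (8 mel filters).
log_fbank = [2.0, 3.5, 5.0, 4.0, 2.5, 1.0, 0.5, 0.2]

# MFCC-style coefficients: DCT of the Fbank vector, truncated to 4 values.
mfcc = dct_ii(log_fbank, 4)

# A flat Fbank vector carries energy only in coefficient 0.
flat = dct_ii([1.0] * 8, 3)
```

Truncating eight filter bank values to four coefficients discards information, which is why the transform is described as lossy and why Fbank features retain more of the original signal.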


Abstract

Embodiments of the invention provide an audio data processing method and apparatus. The method comprises: obtaining audio data to be processed; extracting filter bank features of the audio data; performing an alignment operation on the extracted filter bank features, and querying a preset pronunciation dictionary to obtain the phoneme sequence corresponding to the alignment result; determining the main language phonemes in the phoneme set; and mapping the phonemes in the phoneme set other than the main language phonemes to the main language phonemes. The method improves the accuracy of phoneme mapping while keeping cost low.
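The last two steps of the abstract can be pictured with the following hedged sketch. The text available here does not say how the main language is determined or how the mapping is computed, so both choices below are assumptions made purely for illustration: the main language is taken to be the one whose phonemes occur most often in the sequence, and each remaining phoneme is mapped to the main language phoneme with the nearest feature centroid. The language-prefixed phoneme labels, the two-dimensional centroids, and the function names are all hypothetical.

```python
import math
from collections import Counter


def main_language(phoneme_sequence):
    """Assumption: the main language is the most frequent language prefix."""
    counts = Counter(p.split(":", 1)[0] for p in phoneme_sequence)
    return counts.most_common(1)[0][0]


def map_to_main(phoneme_sequence, centroids):
    """Map each non-main phoneme to the nearest main language phoneme.

    Assumption: nearness is Euclidean distance between feature centroids.
    """
    lang = main_language(phoneme_sequence)
    main = {p: v for p, v in centroids.items() if p.startswith(lang + ":")}
    mapping = {}
    for p in set(phoneme_sequence):
        if p in main:
            continue  # main language phonemes are left unchanged
        mapping[p] = min(main, key=lambda q: math.dist(centroids[p], main[q]))
    return mapping


# Hypothetical per-phoneme average feature vectors (toy 2-D values).
centroids = {
    "zh:a": [1.0, 0.0],
    "zh:i": [0.0, 1.0],
    "en:ae": [0.9, 0.2],
    "en:iy": [0.1, 0.8],
}
# Hypothetical phoneme sequence from alignment and dictionary lookup.
sequence = ["zh:a", "zh:i", "en:ae", "zh:a", "en:iy"]

mapping = map_to_main(sequence, centroids)
```

With these toy values, `mapping` sends each `en:` phoneme to the `zh:` phoneme whose centroid lies closest to it.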

Description

Technical field

[0001] Embodiments of the present disclosure relate to an audio data processing method and device.

Background

[0002] At present, in the field of phonetics, each language has its own complete pronunciation system, corresponding to a set of phonemes. In practical applications, however, different languages are often mixed, such as Chinese mixed with English and Japanese; even within a single language, the standard variety is often mixed with dialects.

[0003] In such applications, different languages need to be mapped to the same language, for example through manual mapping, or by collecting a large amount of speech data as training samples to train a mapping model.

[0004] In the process of realizing the present disclosure, the inventors found that the cost of manual mapping is relatively high, and errors are prone to occur; it is difficult to collect samples through sample training mode...


Application Information


IPC(8): G10L25/48, G10L25/30, G06K9/62

CPC: G10L25/48, G10L25/30, G06F18/23

Inventors: 解传栋, 李先刚, 邹伟, 王健, 常超, 沈明

Owner: KE COM (BEIJING) TECHNOLOGY CO LTD
