Audio duplicate checking method and device
An audio duplicate checking technology for audio data, applied in the field of audio plagiarism checking. It addresses the difficulty of achieving efficient, general-purpose plagiarism checking of audio content, and has the effect of increasing speed.
Examples
Embodiment 1
[0054] Referring to Figure 1, which shows a schematic flow chart of an audio plagiarism checking method provided in an embodiment of the present application, the audio plagiarism checking method includes:
[0055] S101: Obtain target audio data.
[0056] S102: Divide the target audio data into multiple audio segments by using the voice detection module.
[0057] The voice detection module can mark the voiced portions of the target audio data and segment the audio data according to those marks.
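The marking-then-segmenting idea in S102 can be illustrated with a minimal sketch. The source does not say how the voice detection module decides what is voice, so the RMS energy threshold below is an assumed stand-in, and the function names, `frame_len` and `threshold` are hypothetical.

```python
import numpy as np

def mark_voice_frames(samples, frame_len=400, threshold=0.01):
    """Mark each frame as voiced (True) when its RMS energy exceeds a threshold.

    The energy rule is an assumption; the source does not specify the detector.
    """
    n_frames = len(samples) // frame_len
    frames = np.asarray(samples[: n_frames * frame_len], dtype=float)
    frames = frames.reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return rms > threshold

def split_by_marks(samples, marks, frame_len=400):
    """Cut the audio into segments, one per run of consecutive voiced frames."""
    segments = []
    start = None
    for i, voiced in enumerate(marks):
        if voiced and start is None:
            start = i                      # a voiced run begins
        elif not voiced and start is not None:
            segments.append(samples[start * frame_len : i * frame_len])
            start = None                   # the voiced run has ended
    if start is not None:                  # audio ended while still voiced
        segments.append(samples[start * frame_len : len(marks) * frame_len])
    return segments
```

Calling `split_by_marks(samples, mark_voice_frames(samples))` returns one array per voiced run, and unvoiced frames are dropped.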
[0058] S103: Extract bottleneck features from the audio segments by using a bottleneck feature extractor.
[0059] Specifically, S103 may perform frame-level bottleneck feature extraction on each audio segment through a deep neural network, using as the bottleneck feature of the segment either the bottleneck layer of the deep neural network or the combination of the output layer and the first two layers.
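As a rough illustration of frame-level extraction, the sketch below forwards frames through a small feed-forward network and keeps the activations of the narrow layer. It covers only the first option in [0059] (the bottleneck layer itself). The layer sizes, the tanh activation, the random weights and the name `extract_bottleneck` are all assumptions; a real extractor would use trained parameters.

```python
import numpy as np

def extract_bottleneck(frames, weights, biases, bottleneck_index):
    """Forward frames through the network and return the bottleneck activations.

    frames: (n_frames, input_dim) array of frame-level acoustic features.
    weights, biases: per-layer parameters (hypothetical placeholders here).
    bottleneck_index: index of the narrow layer whose output is kept.
    """
    h = np.asarray(frames, dtype=float)
    for i, (w, b) in enumerate(zip(weights, biases)):
        h = np.tanh(h @ w + b)
        if i == bottleneck_index:
            return h                      # layers after the bottleneck are not needed
    raise ValueError("bottleneck_index is beyond the last layer")

rng = np.random.default_rng(0)
dims = [40, 256, 32, 256, 100]            # assumed sizes; the 32-unit layer is the bottleneck
weights = [rng.normal(scale=0.1, size=(a, b)) for a, b in zip(dims[:-1], dims[1:])]
biases = [np.zeros(b) for b in dims[1:]]
frames = rng.normal(size=(120, 40))       # 120 frames of placeholder input features
features = extract_bottleneck(frames, weights, biases, bottleneck_index=1)
```

Here `features` holds one 32-dimensional vector per input frame, so a segment of 120 frames becomes a 120 x 32 matrix.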
[0060] It can be understood that the bottleneck features of different speech...
Embodiment 2
[0075] Referring to Figure 2, which shows a schematic flow chart of another audio plagiarism checking method provided by an embodiment of the present application, the audio plagiarism checking method includes:
[0076] S201: Select a plurality of languages whose pronunciation phonemes differ substantially, and build a pronunciation dictionary from them.
[0077] Optionally, the multiple languages can be Mandarin Chinese, English, etc.
[0078] S202: Using the pronunciation dictionary, train a forced phoneme alignment model on labeled audio data, so as to obtain audio data marked with pronunciation states.
[0079] The pronunciation state may include, but is not limited to, monophone, diphone and triphone.
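To make the three unit types concrete, the sketch below derives monophone, diphone and triphone labels from a phoneme sequence. The source does not define these units or their notation. The pairing of adjacent phonemes for diphones, the `left-center+right` triphone notation, the `sil` boundary symbol and the function name `context_units` are assumptions following one common convention.

```python
def context_units(phonemes, boundary="sil"):
    """Derive context-independent and context-dependent labels from phonemes.

    Returns (monophones, diphones, triphones). The label formats are an
    assumed convention, not taken from the source.
    """
    padded = [boundary] + list(phonemes) + [boundary]
    monophones = list(phonemes)                                   # each phoneme alone
    diphones = [f"{a}-{b}" for a, b in zip(padded[:-1], padded[1:])]  # adjacent pairs
    triphones = [
        f"{left}-{center}+{right}"                                # phoneme with both neighbours
        for left, center, right in zip(padded[:-2], padded[1:-1], padded[2:])
    ]
    return monophones, diphones, triphones

mono, di, tri = context_units(["n", "i", "h", "ao"])
```

For the four phonemes above this yields four monophones, five diphones (including the two boundary pairs) and four triphones, for example `n-i+h` for the phoneme `i` between `n` and `h`.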
[0080] S203: Train a bottleneck feature extractor using the audio data marked with the pronunciation state.
[0081] Specifically, the model to be trained may be, but is not limited to, a deep learning structure such as a DNN, TDNN, LSTM, or CNN.
[0082] S204: Obtain...
Embodiment 3
[0107] Referring to Figure 3, which shows a schematic structural diagram of an audio plagiarism checking device provided in an embodiment of the present application.
[0108] The embodiment of the present application provides an audio plagiarism checking device 30, which includes:
[0109] An acquisition module 301, configured to acquire target audio data;
[0110] A segmentation module 302, configured to segment the target audio data into a plurality of audio segments by using the voice detection module;
[0111] An extraction module 303, configured to extract bottleneck features from the audio segments by using the bottleneck feature extractor;
[0112] A dimensionality reduction module 304, configured to perform dimensionality reduction on the bottleneck features to obtain the feature sequence of each audio segment;
[0113] A calculation module 305, configured to calculate the similarity of the target audio data and the audi...
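The module chain of device 30 might be wired together as in the sketch below. Every stage is a hypothetical callable supplied by the caller, and the class and method names are illustrative only. The source text is cut off at the second operand of the similarity calculation, so `reference_sequences` is a placeholder for whatever the target is compared against, not a statement of what the application specifies.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class AudioCheckDevice:
    """Chains modules 301-305 in the listed order; each stage is a hypothetical callable."""
    acquire: Callable[[str], Sequence[float]]                    # acquisition module 301
    segment: Callable[[Sequence[float]], List[Sequence[float]]]  # segmentation module 302
    extract: Callable[[Sequence[float]], list]                   # extraction module 303
    reduce: Callable[[list], list]                               # dimensionality reduction module 304
    similarity: Callable[[List[list], List[list]], float]        # calculation module 305

    def feature_sequences(self, source):
        """Run modules 301-304 and return one feature sequence per audio segment."""
        audio = self.acquire(source)
        return [self.reduce(self.extract(seg)) for seg in self.segment(audio)]

    def check(self, target_source, reference_sequences):
        """Score the target against placeholder reference feature sequences."""
        return self.similarity(self.feature_sequences(target_source), reference_sequences)
```

Keeping each module as a separate callable mirrors the one-module-per-function layout of the device and lets any stage be swapped without touching the others.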