
Single-track human voice and background music separation method

A method for separating single-track human voice and background music, applied in speech analysis, instruments, etc.; it addresses the problems of low recognition rate and low accuracy.

Inactive Publication Date: 2021-01-22
成都悦鉴科技有限公司

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to solve the problems of low recognition rate and low accuracy when separating single-track human voice and background music in the prior art. It provides a separation method that significantly improves the recognition rate and accuracy of the separation process and effectively restores the human voice in single-track audio files.



Examples


Embodiment

[0043] As shown in Figure 1 and Figure 2, the single-track human voice and background music separation method comprises the following steps:

[0044] Step 1, converting the time-domain analog voice signal to be separated into a time-domain digital voice signal;

[0045] Step 2, performing a short-time Fourier transform on the time-domain digital voice signal from step 1 and taking its amplitude information to obtain the spectrogram;

[0046] Step 3, establishing a recurrent neural network framework;

[0047] Step 4, inputting the spectrogram obtained in step 2 into the recurrent neural network framework of step 3 to obtain a human voice time-frequency mask M_vocal of the same size as the spectrogram;

[0048] Step 5, calculating the time-frequency mask of the background music from the human voice time-frequency mask M_vocal obtained in step 4 by the difference method;

[0049] Step 6, performing a dot product of the two time-frequency masks obtained in step 5 with the spectrogram obtained in step 2 to obtain the separated human voice spectrogram and background music spectrogram;
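The steps above can be sketched as a masking pipeline. This is a minimal illustration, not the patented implementation: the recurrent network of steps 3-4 is replaced by a hypothetical placeholder `estimate_vocal_mask` (the patent publishes no architecture details or weights here), and the sample rate and STFT window length are assumptions. Reusing the mixture phase for resynthesis in step 7 is a common assumption in magnitude-masking methods, not something stated in the excerpt.

```python
# Sketch of the separation pipeline: STFT -> vocal mask -> difference mask
# -> element-wise masking -> inverse STFT. Assumptions are noted inline.
import numpy as np
from scipy.signal import stft, istft

FS = 16000          # sample rate in Hz (assumption; not stated in the patent)
NPERSEG = 1024      # STFT window length (assumption)

def estimate_vocal_mask(magnitude: np.ndarray) -> np.ndarray:
    """Placeholder for the recurrent-network mask M_vocal (steps 3-4).

    A real implementation would feed the magnitude spectrogram to a trained
    RNN that outputs a mask of the same shape with values in [0, 1]. Here a
    constant 0.5 mask stands in so the pipeline runs end to end.
    """
    return np.full_like(magnitude, 0.5)

def separate(mixture: np.ndarray):
    # Step 2: short-time Fourier transform; keep the phase for resynthesis.
    _, _, Z = stft(mixture, fs=FS, nperseg=NPERSEG)
    magnitude, phase = np.abs(Z), np.angle(Z)

    # Step 4: vocal time-frequency mask, same shape as the spectrogram.
    m_vocal = estimate_vocal_mask(magnitude)

    # Step 5: background-music mask by the difference method.
    m_music = 1.0 - m_vocal

    # Step 6: element-wise product of each mask with the magnitude spectrogram.
    vocal_mag = m_vocal * magnitude
    music_mag = m_music * magnitude

    # Step 7: inverse STFT, reusing the mixture phase (common assumption).
    _, vocal = istft(vocal_mag * np.exp(1j * phase), fs=FS, nperseg=NPERSEG)
    _, music = istft(music_mag * np.exp(1j * phase), fs=FS, nperseg=NPERSEG)
    return vocal, music

# Usage: because the two masks sum to 1, vocal + music reconstructs the mixture.
mix = np.random.default_rng(0).standard_normal(FS)  # 1 s of noise as a stand-in
vocal, music = separate(mix)
```

Because the difference method makes the two masks complementary (M_music = 1 - M_vocal), the two separated signals always sum back to the original mixture, which is a useful sanity check for any mask estimator plugged into step 4.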



Abstract

The invention discloses a single-track human voice and background music separation method. The method comprises the following steps: 1, converting a to-be-separated time domain analog voice signal into a time domain digital voice signal; 2, performing short-time Fourier transform on the time-domain digital voice signal in the step 1, and taking amplitude information of the time-domain digital voice signal to obtain a spectrogram; 3, establishing a recurrent neural network framework; 4, inputting the spectrogram obtained in the step 2 into the recurrent neural network framework in the step 3 to obtain a human voice time-frequency mask with the same size as the spectrogram; 5, calculating and obtaining a time-frequency mask of the background music through a difference method; 6, performing point multiplication on the two time-frequency masks obtained in the step 5 and the spectrogram obtained in the step 2 to obtain a separated human voice spectrogram and a background music spectrogram; and 7, performing short-time inverse Fourier transform to respectively obtain a time domain digital voice signal of the human voice and a time domain digital voice signal of the background music. By introducing the recurrent neural network and the time-frequency mask, the recognition rate and accuracy of the separation process are significantly improved.

Description

technical field

[0001] The invention relates to the technical field of audio processing, and in particular to a method for separating single-track human voice and background music.

Background technique

[0002] In real life, sound signals are usually mixtures of sounds from multiple sources; for example, a song is a mixed signal of vocals and background music. The human ear can effectively pick out the information it is interested in from complex speech signals, even when those signals are closely matched in frequency and time; separating a singing voice from its accompaniment, for instance, comes naturally to the human auditory system. However, reproducing this ability of the human ear on a computer is very difficult.

[0003] Separating single-track vocals from background music faces many challenges, the biggest being the non-stationary nature of both the voice and background music signals, and the fact that the signal provides only one channel. If the signal and in...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G10L21/0272; G10L21/0308; G10L25/18; G10L25/30
CPC: G10L21/0272; G10L21/0308; G10L25/18; G10L25/30
Inventor 旷昊恒
Owner 成都悦鉴科技有限公司