Real-time detection method of initial position of human sound in song

A real-time starting-position detection technology, applied in speech analysis, speech recognition, instruments, etc. It addresses problems such as the large number of samples required, long training time, and blurred vocal features, and achieves fast processing, good fault tolerance in judgment, and a simple algorithm.

Active Publication Date: 2019-03-29

UNIV OF ELECTRONIC SCI & TECH OF CHINA

Cites: 11. Cited by: 7.

Problems solved by technology

According to the preceding analysis, the instruments in a song interfere with the human voice, so many common vocal features become blurred or even invalid. Combining multiple features helps little and does not offset the added computational cost. As for classifiers, the differences in their performance are not pronounced. In addition, the ANN method, although relatively effective, still has drawbacks such as long training time and a large number of required samples.

In short, without an effective feature representation for the instrument-voice mixture, the accuracy of human voice detection is currently below 90%, so the estimated starting point of the human voice is not accurate enough to meet practical requirements.

Embodiment Construction

[0050] The present invention will be further described in detail below in conjunction with the drawings and embodiments.

[0051] This embodiment provides a method for detecting the starting position of the human voice in a song; its flow chart is shown in Figure 1. The method comprises two stages, training and recognition. The simulation experiment uses 120 songs in total: the first 100 serve as training audio and the last 20 as detection audio. Each training audio is preprocessed as follows: 1) cut the audio and keep only the front part, where the retained interval is 10 seconds after the start of the audio to the beginning of the human voice; 2) mark the time at which the human voice begins.
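The data preparation described in [0051] can be sketched roughly as follows. This is an illustration only, not the patent's procedure: the file names, the function label_frames, the frame length, the hop size, and the rule "a frame counts as vocal once its center reaches the marked start time" are all assumptions introduced here.

```python
import numpy as np

# Hypothetical file names; the text only states 120 songs, split 100 / 20.
songs = [f"song_{i:03d}.wav" for i in range(120)]
train_songs, detect_songs = songs[:100], songs[100:]


def label_frames(n_samples, sr, onset_sec, frame_len=2048, hop=256):
    """Turn one marked vocal start time into per-frame labels.

    Returns an int array with 0 for instrument-only frames and 1 for frames
    at or after the marked start of the human voice. frame_len, hop, and the
    center-based labeling rule are assumptions, not values from the patent.
    """
    if n_samples < frame_len:
        raise ValueError("audio shorter than one frame")
    n_frames = 1 + (n_samples - frame_len) // hop
    # Time (in seconds) of the center of each frame.
    centers = (np.arange(n_frames) * hop + frame_len / 2) / sr
    return (centers >= onset_sec).astype(int)


# Example: 1 second of audio at 16 kHz with the voice marked at 0.5 s.
labels = label_frames(n_samples=16000, sr=16000, onset_sec=0.5)
```

Labels of this kind would let a classifier be trained on frames from both sides of the marked starting point.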

[0052] The specific steps of the method for detecting the starting position of the human voice in the song in this embodiment are as follows:

[0053] ·Training phase:

[0054] S1. Read the training audio frame: Set the initial value of...

Abstract

The invention belongs to the technical field of digital audio processing and relates to the problem of human voice detection, in particular to a method for estimating the starting position of the human voice in a song. The audio is framed using a long window with high overlap, and dynamic features are extracted along two dimensions, inter-frequency (frequency domain) and inter-frame (time domain), which effectively captures the audio characteristics of the initial phase of vocalization. By learning from the starting-point fragments of songs, the method classifies the audio as either instrument sound or human voice (including mixed instrument and voice), so the starting position of the human voice can be estimated accurately, with high fault tolerance in the voice/instrument decision. The algorithm is simple and fast, and can be widely used in radio station broadcasting, automatic digital media management, and similar applications.
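As a rough illustration of the framing and two-dimensional dynamic features mentioned in the abstract, the sketch below frames a signal with a long, heavily overlapping window and takes first-order differences of the log-magnitude spectrum along the frequency axis and along the frame axis. The window length, hop size, Hann window, and the use of simple differences are assumptions made for this example; the abstract does not specify the exact feature definition.

```python
import numpy as np


def frame_signal(x, frame_len=2048, hop=256):
    """Split a 1-D signal into overlapping frames.

    frame_len=2048 with hop=256 gives 87.5% overlap; both values are assumed.
    """
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def dynamic_features(x, frame_len=2048, hop=256):
    """Return the log-magnitude spectrum and its two first-order differences."""
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    log_mag = np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-10)
    d_freq = np.diff(log_mag, axis=1)  # inter-frequency: change between adjacent bins
    d_time = np.diff(log_mag, axis=0)  # inter-frame: change between adjacent frames
    return log_mag, d_freq, d_time


# Example on 1 second of synthetic noise at an assumed 16 kHz sample rate.
x = np.random.default_rng(0).standard_normal(16000)
log_mag, d_freq, d_time = dynamic_features(x)
```

With these assumed settings the example yields 55 frames of 1025 frequency bins each, so the inter-frequency differences have one fewer column and the inter-frame differences one fewer row.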

Description

Technical field

[0001] The invention belongs to the technical field of digital audio processing and relates to the problem of human voice detection, specifically to a method for estimating the starting position of the human voice in a song. The method can be applied to real-time marking of the human voice position in broadcast audio.

Background technique

[0002] A song usually consists of pure accompaniment and singing. The pure accompaniment part is produced solely by the accompanying instruments and contains no vocals, while the singing part is the superposition of the vocals and the accompaniment. In current digital media data management, it is often necessary to mark the starting position (starting point) of the human voice in a song. The vocal starting point has many uses. For example, in a live radio program, the starting position of the vocal can help the host control the speaking time, set the cross-fade of adjacent so...

Application Information

IPC(8): G10L15/05, G10L15/06, G10L25/24, G10L25/45, G10L25/69, G10L25/81, G10L25/87

CPC: G10L15/05, G10L15/063, G10L25/24, G10L25/45, G10L25/69, G10L25/81, G10L25/87

Inventor: 甘涛, 甘云强, 何艳敏

Owner: UNIV OF ELECTRONIC SCI & TECH OF CHINA
