
Method and terminal for positioning voice region in song video

A technology for region positioning in video, applied in the fields of instruments, character and pattern recognition, electrical components, etc. It solves the problems of accompaniment interference and the inability to accurately locate the sung region, achieving high accuracy and good results.

Active Publication Date: 2018-06-29
FUJIAN STAR NET EVIDEO INFORMATION SYST CO LTD

AI Technical Summary

Problems solved by technology

In order to identify the sung region in a song video, the prior art uses audio vocal recognition, i.e., judging whether a segment belongs to the singing region by detecting the human voice. However, audio vocal recognition is easily interfered with by the song's accompaniment and cannot accurately locate the region of the song video in which the vocals are sung.


Image

  • Figures 1, 2 and 3: drawings for the method and terminal for positioning the voice region in a song video

Examples


Embodiment 1

[0127] Referring to figures 1 and 2, a method for locating the vocal region in a song video comprises the following steps:

[0128] S1. Obtain a video frame image corresponding to the song video, and determine the subtitle area of the video frame image;

[0129] Specifically, use the Roberts operator to extract the edges of the video frame image, then thin and binarize the extracted edge image;

[0130] Count the total number of edge pixels in each row and in each column of the thinned, binarized edge image;

[0131] Judge whether there is a first pixel block in which the total number of pixels in each row is greater than a first preset value and whose height is greater than a first preset height;

[0132] Judge whether there is a second pixel block in which the total number of pixels in each column is greater than a second preset value and whose width ...
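The subtitle-area detection in Embodiment 1 (Roberts-operator edges, binarization, then row/column projection blocks) can be sketched as follows. This is a minimal illustration under assumptions, not the patent's implementation: the threshold values and the helper names (`roberts_edges`, `find_blocks`) are invented for the example, and the thinning step is omitted.

```python
import numpy as np

def roberts_edges(gray: np.ndarray) -> np.ndarray:
    """Roberts cross operator: gradient magnitude from two 2x2 diagonal differences."""
    g = gray.astype(np.float64)
    gx = g[:-1, :-1] - g[1:, 1:]   # main-diagonal difference
    gy = g[:-1, 1:] - g[1:, :-1]   # anti-diagonal difference
    return np.abs(gx) + np.abs(gy)

def binarize(edges: np.ndarray, thresh: float = 40.0) -> np.ndarray:
    """Keep only strong edges as 1, everything else as 0."""
    return (edges > thresh).astype(np.uint8)

def projection_profiles(binary: np.ndarray):
    """Total edge pixels in each row and in each column (steps [0130])."""
    return binary.sum(axis=1), binary.sum(axis=0)

def find_blocks(profile, min_count, min_extent):
    """Runs of consecutive rows/columns whose pixel count exceeds min_count
    and whose extent (height or width) exceeds min_extent (steps [0131]/[0132])."""
    blocks, start = [], None
    for i, v in enumerate(profile):
        if v > min_count and start is None:
            start = i
        elif v <= min_count and start is not None:
            if i - start > min_extent:
                blocks.append((start, i))
            start = None
    if start is not None and len(profile) - start > min_extent:
        blocks.append((start, len(profile)))
    return blocks
```

A subtitle band then shows up as a first pixel block in the row profile whose rows all exceed the row threshold, intersected with second pixel blocks in the column profile.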

Embodiment 2

[0154] This embodiment differs from Embodiment 1 as follows:

[0155] The step S2 is:

[0156] S2. Perform the following steps S21 and S22 in parallel or successively:

[0157] S21. Identify the position where the subtitle advances in the subtitle area;

[0158] S22. Segment the boundaries of all characters in the subtitle area and record the positions of the left and right boundaries of each character; the left and right boundary positions constitute the character area of each character;

[0159] Use OCR technology to recognize the character corresponding to each character area;

[0160] The step S3 is:

[0161] Determine the start time and end time of each character according to the position to which the subtitle has advanced and the character area of each character;

[0162] The step S4 is:

[0163] Locate the vocal area of each character in the song video according to the start time and end time of that character;

[0164] This embodiment realizes the detection of the...
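The per-character steps of Embodiment 2 (S22 and the timing step) amount to a column-projection split of the subtitle area plus a lookup of when the colour-wipe front passes each character's boundaries. A hedged sketch under assumptions: the helper names and the `(timestamp, x)` representation of the advancing wipe position are invented for this example, and the OCR step is left out.

```python
def segment_characters(column_profile, gap_thresh=0):
    """Step S22: character areas are runs of columns with edge pixels above
    gap_thresh, separated by blank gaps; returns (left, right) column pairs."""
    regions, start = [], None
    for x, v in enumerate(column_profile):
        if v > gap_thresh and start is None:
            start = x
        elif v <= gap_thresh and start is not None:
            regions.append((start, x))
            start = None
    if start is not None:
        regions.append((start, len(column_profile)))
    return regions

def character_times(wipe_positions, char_regions):
    """Timing step: wipe_positions is a list of (timestamp, x) samples of the
    subtitle colour-change front; a character's start/end time is taken as the
    first sample at which the front has passed its left/right boundary."""
    times = []
    for left, right in char_regions:
        start = next(t for t, x in wipe_positions if x >= left)
        end = next(t for t, x in wipe_positions if x >= right)
        times.append((start, end))
    return times
```

Each `(start, end)` pair then locates one character's vocal area in the song video, as in step [0163].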

Embodiment 3

[0166] Referring to figure 3, a terminal 1 for positioning vocal regions in a song video comprises a memory 2, a processor 3, and a computer program stored in the memory 2 and executable on the processor 3; when the processor 3 executes the computer program, the steps of Embodiment 1 are carried out.



Abstract

The invention provides a method and terminal for positioning the voice region in a song video. The method comprises the following steps: acquiring a video frame image corresponding to the song video and determining the caption region of the video frame image; identifying the caption-advance position in the caption region; determining the start time and end time of each lyric line in the song video according to the distance between the caption-advance position and the border of the caption region; and positioning the voice region in the song video according to those start and end times. Because the voice region is located through the caption-advance position rather than the audio, the method is not interfered with by the accompaniment. The accuracy is high and the identification is automatic, making automatic singing teaching in a karaoke system possible, with good results.

Description

Technical field

[0001] The present invention relates to the technical field of audio-visual control, and in particular to a method and terminal for locating vocal regions in song videos.

Background technique

[0002] In order to guide users who are not good at singing to learn to sing, a karaoke system needs a set of automatic singing-teaching methods. When performing automatic singing teaching, the first problem is how to automatically identify the vocal singing region in the song video, so as to play the original vocal or the accompaniment and let users sing along. To identify the sung region, the prior art uses audio vocal recognition, i.e., judging whether a segment belongs to the singing region by detecting the human voice; however, audio vocal recognition is easily interfered with by the accompaniment and cannot accurately locate the region of the song video in which the vocals are sung.

Contents of the in...


Application Information

Patent Type & Authority: Application (China)
IPC (8): H04N21/44; H04N21/488; H04N21/431; G06K9/32; G06K9/34
CPC: H04N21/431; H04N21/44008; H04N21/4884; G06V20/635; G06V30/153
Inventors: 王子亮, 蔡智力, 陈彪, 邹应双, 徐继芸, 林哲明
Owner FUJIAN STAR NET EVIDEO INFORMATION SYST CO LTD