Method and device for recognizing human voices in audio

A technology for recognizing human voices in audio, applied in the fields of multimedia information and audio signal analysis, that addresses the problems of low precision and low accuracy.

Active Publication Date: 2014-01-01
深圳太乐文化科技有限公司

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to provide a method and device for recognizing human voice in audio. The method extracts effective audio features, namely the short-term zero-crossing rate, the P-order LPC prediction coefficients, and the skewness and kurtosis of the LPC prediction residual amplitude spectrum, forms feature vectors from them, and uses machine learning to identify human voices in audio. This addresses the low precision and low accuracy of existing human voice recognition research and achieves high-precision, high-confidence recognition of human voices in audio.

Examples

Detailed Description of the Embodiments

[0022] The present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, but not to limit the present invention. In addition, it should be noted that, for the convenience of description, only some structures related to the present invention are shown in the drawings but not all structures.

[0023] Figure 1 shows a first embodiment of the invention.

[0024] Figure 1 shows the method for recognizing human voice in audio according to the first embodiment of the present invention. The implementation process 100 is described in detail as follows:

[0025] In step 101, audio data is divided into frames.

[0026] Step 101 (as shown in Figure 2) specifically includes:

[0027] Step 1011, detect whether the audio is two-channel or multi-channel.

[0028] In this embodiment, the input audio can...
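The framing step can be illustrated with a minimal sketch. The source text is cut off before it says what happens to two-channel or multi-channel input, so the averaging downmix below is an assumption, as are the helper names `downmix_to_mono` and `split_into_frames`, the frame length, the hop size, and the choice to drop a trailing partial frame.

```python
from typing import List, Sequence


def downmix_to_mono(channels: Sequence[Sequence[float]]) -> List[float]:
    """Average equally long channels sample by sample.

    Assumption: the patent text available here does not state how
    multi-channel audio is handled; averaging is one common choice.
    """
    if not channels:
        return []
    n = len(channels[0])
    if any(len(ch) != n for ch in channels):
        raise ValueError("all channels must have the same length")
    return [sum(ch[i] for ch in channels) / len(channels) for i in range(n)]


def split_into_frames(samples: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Cut a mono signal into fixed-length frames.

    Frames start every `hop` samples; a trailing partial frame is dropped
    (an assumption, not taken from the source).
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    last_start = len(samples) - frame_len
    return [list(samples[s:s + frame_len]) for s in range(0, last_start + 1, hop)]


if __name__ == "__main__":
    stereo = [[1.0, 2.0, 3.0, 4.0, 5.0], [3.0, 2.0, 1.0, 0.0, -1.0]]
    mono = downmix_to_mono(stereo)
    print(split_into_frames(mono, frame_len=2, hop=2))
```

With `hop` smaller than `frame_len` the frames overlap, which is typical for short-time analysis; whether the patent uses overlapping frames is not stated in the visible text.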

Abstract

The invention discloses a method and device for recognizing human voices in audio. The method comprises the following steps: the audio data is divided into frames; each frame is analyzed using LPC of order P, and audio features are extracted, comprising the short-time zero-crossing rate, the P-order LPC prediction coefficients, and the skewness and kurtosis of the LPC prediction residual magnitude spectrum; a (P+3)-dimensional feature vector is formed from these audio features; an SVM algorithm is trained on the feature vectors to obtain a corresponding support vector machine; and the support vector machine is used to recognize whether each frame of the audio data contains human voice. The method and device achieve high-precision, high-confidence recognition of human voices in audio and provide a basic service for song content analysis, which in turn supports functions such as lyric synchronization, song classification, and song recommendation.
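The per-frame feature extraction described in the abstract can be sketched in plain Python. This is an illustration under stated assumptions, not the patented implementation: the LPC coefficients are computed with the autocorrelation method and the Levinson-Durbin recursion, skewness and kurtosis are the standardized third and fourth moments (non-excess kurtosis), and the ordering of the P+3 values in the vector is arbitrary. All function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def lpc_coefficients(frame: Sequence[float], order: int) -> List[float]:
    """Return a[1..order] such that x[n] is predicted by sum(a[k] * x[n-k]).

    Assumption: autocorrelation method solved by Levinson-Durbin.
    """
    n = len(frame)
    r = [sum(frame[i] * frame[i + lag] for i in range(n - lag)) for lag in range(order + 1)]
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: keep remaining coefficients at 0
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def lpc_residual(frame: Sequence[float], coeffs: Sequence[float]) -> List[float]:
    """Prediction error e[n] = x[n] - sum(a[k] * x[n-k]), with x[n] = 0 for n < 0."""
    return [
        frame[n] - sum(c * frame[n - k] for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        for n in range(len(frame))
    ]


def magnitude_spectrum(signal: Sequence[float]) -> List[float]:
    """Magnitudes of DFT bins 0..N/2, computed with a naive O(N^2) DFT."""
    n = len(signal)
    return [
        abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def skewness_and_kurtosis(values: Sequence[float]) -> Tuple[float, float]:
    """Standardized 3rd and 4th central moments (assumed definitions)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0.0:
        return 0.0, 0.0
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def frame_features(frame: Sequence[float], order: int) -> List[float]:
    """Build the P+3 vector: [ZCR, a1..aP, skewness, kurtosis] (ordering assumed)."""
    coeffs = lpc_coefficients(frame, order)
    spectrum = magnitude_spectrum(lpc_residual(frame, coeffs))
    skew, kurt = skewness_and_kurtosis(spectrum)
    return [zero_crossing_rate(frame)] + coeffs + [skew, kurt]


if __name__ == "__main__":
    frame = [math.sin(0.3 * n) + 0.1 * math.cos(1.7 * n) for n in range(64)]
    print(len(frame_features(frame, order=8)))
```

Vectors built this way, labeled per frame as voiced or unvoiced, would then be passed to an SVM trainer. That step is omitted here because the visible text does not specify the kernel or training parameters.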

Description

Technical field

[0001] The invention relates to the field of multimedia information, in particular to the field of audio signal analysis, and more specifically to a method and device for recognizing human voice in audio.

Background technique

[0002] With the continuous development of multimedia technology, audio and video information plays an increasingly important role in people's work, life, and entertainment. For example, major music websites on the Internet classify or recommend songs so that users can find songs quickly and be recommended good songs.

[0003] At present, the song classification and recommendation of major music websites are mostly based on text analysis and collaborative filtering of user behavior, and in-depth audio content analysis technology is not applied. The audio content analysis technology classifies the audio according to the extracted audio features, so that the user can retrieve the desired audio more accura...

Application Information

IPC(8): G10L17/00, G10L17/02, G10L17/04
Inventor 田彪
Owner 深圳太乐文化科技有限公司