News video retrieval method based on speech classifying identification

A speech classification and recognition technique, applied in speech recognition, speech analysis, television, and related fields. It addresses query methods that do not match users' habits, the inability to find speakers for speech training, and the difficulty users have in obtaining query examples.

Active Publication Date: 2009-07-01

NEW FOUNDER HLDG DEV LLC +2

AI Technical Summary

Problems solved by technology

However, retrieval based on low-level features such as color and texture brings the following two problems: (1) When people retrieve videos, they do so based on high-level semantic concepts such as football matches, the Iraq War, or bird flu. These differ greatly from the low-level features that computers use to describe videos, and the two cannot be reconciled. (2) Existing video retrieval methods cannot perform text-to-video retrieval well, and their query method does not match people's usual habits, which makes them very inconvenient to use.

In existing video retrieval methods, the user generally submits a query shot or query clip to the system, and the system returns results similar to that query example. This raises a question: how does the user obtain the query example in the first place? Moreover, most users are accustomed to entering query text and having the system return video material related to it. For example, a user who enters the query text "Iraq War" expects the system to return video material related to the Iraq War. This resembles current search engines such as Google and Baidu, except that the input is text while the retrieval result is video data.

This method, which relies on speech training by the speakers, is difficult to apply to speech clips involving many people, because it is hard to get every speaker to train the speech recognition system. Even for clips with only a few speakers, the speakers are often unavailable for speech training. In news videos, for example, it is impossible to find every speaker. In addition, even after speech training, non-standard speech remains difficult to recognize, and the recognition rate is very low.

However, if a speech recognition system is applied directly to a news video without speech training, the recognition results are worse and the recognition rate is lower, because news programs usually contain the following kinds of sound: (1) program previews with background music; (2) advertisements; (3) weather forecasts; (4) non-standard speech, such as the dialect of an interviewee; (5) standard speech.

Among these sounds, the recognition rate for non-standard speech is very low, and the rate for (1)-(3) is lower still; they are essentially unrecognizable.

Therefore, if a speech recognition system is applied indiscriminately to the entire news video, it attempts to recognize every kind of sound the video contains. The output then mixes correct results (mainly from the standard speech in (5)) with wrong results (mainly from the other sounds in (1)-(4)), and the computer cannot tell which are which. When videos are retrieved on this basis, a search for the text "Iraq War", for example, returns many wrong results.
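The effect of indiscriminate recognition on text search can be illustrated with a minimal sketch. It assumes each audio segment already carries a sound-class label and a recognizer transcript. The `Segment` type, the class names, and the sample transcripts are hypothetical and are not taken from the patent.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    video_id: str
    start_s: float
    end_s: float
    sound_class: str  # hypothetical labels: "standard", "music_preview", "ad", ...
    transcript: str   # recognizer output for this segment (illustrative)


def build_index(segments, allowed_classes=("standard",)):
    """Map each lowercase word to the segments whose transcript contains it."""
    index = defaultdict(list)
    for seg in segments:
        if seg.sound_class not in allowed_classes:
            continue  # skip sounds the recognizer handles poorly
        for word in set(seg.transcript.lower().split()):
            index[word].append(seg)
    return index


def search(index, query):
    """Return the segments that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    hits = list(index.get(words[0], []))
    for word in words[1:]:
        later = set(index.get(word, []))
        hits = [seg for seg in hits if seg in later]
    return hits


segments = [
    Segment("news_01", 0.0, 12.0, "music_preview", "iraq war tonight"),
    Segment("news_01", 12.0, 45.0, "standard", "fighting continued in the Iraq war today"),
    Segment("news_02", 5.0, 20.0, "ad", "war on prices"),
    Segment("news_02", 20.0, 60.0, "standard", "bird flu cases were reported"),
]

index = build_index(segments)
results = search(index, "Iraq War")
print([(s.video_id, s.start_s) for s in results])
```

Indexing only the segments labeled as standard speech keeps unreliable transcripts out of the search results. Passing a wider `allowed_classes` tuple shows how the other sound classes add spurious hits.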

Embodiment Construction

[0019] The present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0020] As shown in Figure 1, a news video retrieval method based on speech classification and recognition includes the following steps:

[0021] (1) Use a sound classifier to segment out the standard-speech segments of the news video. In this embodiment, standard Mandarin is used as the example of standard speech;

[0022] Audio classification uses a classification model based on support vector machines (SVMs) and consists of two parts: classifier model training and classification prediction. The audio feature is a 13-dimensional feature vector composed of log energy and Mel-frequency cepstral coefficients (MFCCs).
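A 13-dimensional frame feature of this kind might be computed as in the sketch below. The text does not state how the 13 dimensions are split, so the split into one log-energy value plus 12 MFCCs is an assumption. The frame length, sample rate, Hamming window, and number of mel filters are also illustrative choices. The sketch uses a naive DFT so that it needs only the standard library.

```python
import cmath
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive DFT power spectrum (bins 0..N/2) of a Hamming-windowed frame."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale (fractional bin edges)."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(mel_max * i / (n_filters + 1)) * n_fft / sample_rate
             for i in range(n_filters + 2)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_bins):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def frame_features(frame, sample_rate, n_filters=20, n_mfcc=12):
    """Return [log energy, c1..c12]: a 13-dimensional vector for one frame."""
    eps = 1e-10  # avoids log(0) on silent frames or empty filters
    log_energy = math.log(sum(x * x for x in frame) + eps)
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    log_mel = [math.log(sum(w * p for w, p in zip(row, spectrum)) + eps)
               for row in bank]
    # DCT-II of the log mel energies; c0 is dropped in favor of log energy
    mfcc = [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_mel))
            for c in range(1, n_mfcc + 1)]
    return [log_energy] + mfcc


sample_rate = 8000  # assumed sample rate
frame = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(256)]
features = frame_features(frame, sample_rate)
print(len(features))
```

In practice an FFT routine from a numerical library would replace the naive DFT; it is used here only to keep the example self-contained.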

[0023] In this embodiment, the process of classifier model training is: first select training samples, then extract the audio features formed by the logarithmic energy and Mel cepstral co...
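The two parts named above, classifier training and classification prediction, can be sketched as follows. The text here does not specify the SVM kernel or solver, so this sketch assumes a linear SVM trained by subgradient descent on the regularized hinge loss. The two-dimensional toy samples stand in for the 13-dimensional audio features, and the function names and hyperparameters are hypothetical.

```python
def train_linear_svm(samples, labels, lr=0.1, lam=0.01, epochs=200):
    """Train weights w and bias b on labels +1 (standard speech) / -1 (other)."""
    dim = len(samples[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # L2 shrinkage from the regularization term
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                # the sample violates the margin: step along the hinge subgradient
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b


def predict(model, x):
    """Return +1 (standard speech) or -1 (other sound) for one feature vector."""
    w, b = model
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Toy, linearly separable training samples (illustrative values only)
samples = [[2.0, 2.0], [-2.0, -2.0], [3.0, 2.0], [-3.0, -2.0], [2.0, 3.0], [-2.0, -3.0]]
labels = [1, -1, 1, -1, 1, -1]

model = train_linear_svm(samples, labels)
print([predict(model, x) for x in samples])
print(predict(model, [2.5, 2.5]), predict(model, [-2.5, -2.5]))
```

A production system would more likely use an established SVM library and possibly a non-linear kernel; the linear form is chosen here only because it fits in a few lines.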


Abstract

This invention relates to a news video retrieval method based on speech classification and recognition. The method automatically segments out all standard-speech segments in a news video and then recognizes the standard speech with a speech recognition system. Because the standard speech expresses the main content of the video, retrieval from text to video is easy to realize.
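The segmentation step can be illustrated by merging per-frame classifier labels into time segments. This is a sketch under stated assumptions: the frame duration, the minimum segment length, and the label names are hypothetical, and the patent's own segmentation rule may differ.

```python
from itertools import groupby


def frames_to_segments(frame_labels, frame_s=0.5, min_len_s=1.0, target="standard"):
    """Merge consecutive frames labeled `target` into (start_s, end_s) segments."""
    segments = []
    pos = 0  # index of the first frame in the current run
    for label, run in groupby(frame_labels):
        n = len(list(run))
        start_s, end_s = pos * frame_s, (pos + n) * frame_s
        # keep only runs of the target class that are long enough
        if label == target and end_s - start_s >= min_len_s:
            segments.append((start_s, end_s))
        pos += n
    return segments


frame_labels = ["music", "music", "standard", "standard", "standard",
                "ad", "standard", "dialect", "standard", "standard"]
segments = frames_to_segments(frame_labels)
print(segments)  # [(1.0, 2.5), (4.0, 5.0)]
```

Only the returned segments would then be passed to the speech recognition system. The isolated single frame at 3.0 s is dropped because it is shorter than the assumed minimum length.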

Description

Technical field

[0001] The invention belongs to the technical field of computer speech recognition and video retrieval, and in particular relates to a news video retrieval method based on speech classification and recognition.

Background art

[0002] At present, speech recognition technology is widely applied not only in the audio field but also in the video field, because video also contains audio information. If the speech content of a video can be recognized through speech recognition technology, it can provide strong support for video retrieval and enable retrieval from speech text to video content. Existing video retrieval technologies generally extract low-level features such as color and texture from videos and then perform retrieval based on these features. However, this approach brings the following two problems: (1) When people retrieve videos, they do so based on high-level semantic features such as football ...


Application Information


Patent Type & Authority: Patents (China)

IPC(8): H04N5/93, G10L15/00, G10L15/08, G10L15/06, G11B27/10, G10L21/06

Inventor: 彭宇新, 房翠华, 陈晓鸥, 吴於茜

Owner: NEW FOUNDER HLDG DEV LLC
