News video retrieval method based on speech classification and recognition

Applies speech classification and recognition technology to speech recognition, speech analysis, television, and related fields. It addresses the problems that existing query methods do not match how people usually search, that speakers cannot be found for speech training, and that users have no easy way to obtain a query example.

Active Publication Date: 2006-08-30
NEW FOUNDER HLDG DEV LLC +2
0 Cites, 16 Cited by

AI Technical Summary

Problems solved by technology

Retrieval based on low-level video features brings two problems: (1) People retrieve videos by high-level semantic concepts, such as football matches, the Iraq War, or bird flu. These differ greatly from the low-level features a computer uses to describe video, such as color and texture, and the two cannot be reconciled. (2) Existing video retrieval methods do not support text-to-video retrieval well; the query method does not match how people usually search, which makes it inconvenient to use.
Existing video retrieval generally works as follows: the user submits a query shot or query segment to the system, and the system returns results similar to that query example. This raises the question of how the user obtains the query example in the first place. In addition, most users are accustomed to entering query text and having the system return video material related to it. For example, a user who enters the query text "Iraq War" expects video material related to the Iraq War. This resembles search engines such as Google and Baidu in that the input is text, but differs from them in that the retrieval result is video data.
Speech recognition that depends on speaker training is difficult to apply to speech clips that include many people, because it is hard to find every speaker to train the system. Even for clips with only a few speakers, the speakers often cannot be found for training; in news videos, for example, it is impossible to find every speaker. Moreover, even after training, non-standard speech remains hard to recognize, and the recognition rate is very low.
However, if a speech recognition system is applied directly to a news video without speech training, the recognition results are worse and the recognition rate is lower, because news programs usually contain the following kinds of sound: (1) news program previews with background music; (2) advertisements; (3) weather forecasts; (4) non-standard speech, such as the dialect of an interviewee; (5) standard speech.
Of these, the recognition rate for non-standard speech (4) is very low, and the rate for (1)-(3) is lower still; these sounds basically cannot be recognized.
Therefore, if a speech recognition system is applied indiscriminately to an entire news video, it attempts to recognize every kind of sound in the video. The output then mixes correct results (mainly from the standard speech in (5)) with wrong results (mainly from the sounds in (1)-(4)), and the computer cannot tell which is which. When videos are retrieved on this basis, a search for the text "Iraq War", for example, returns many wrong results.

Examples

Embodiment Construction

[0019] The present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0020] As shown in Figure 1, a news video retrieval method based on speech classification and recognition includes the following steps:

[0021] (1) Use a sound classifier to segment out the speech segments of standard speech in the news video. In this embodiment, standard Mandarin is used as the example of standard speech.

[0022] Audio classification uses a classification model based on support vector machines and consists of two parts: classifier model training and classification prediction. The audio feature is a 13-dimensional feature vector composed of log energy and Mel-frequency cepstral coefficients (MFCC).
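The 13-dimensional feature can be illustrated with a minimal sketch. The text does not say how the 13 dimensions are split, so this sketch assumes one log-energy value plus 12 MFCCs per frame. The sample rate, frame length, number of mel filters, and all function names are illustrative assumptions, not values from the patent.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of a Hamming-windowed frame via a naive DFT."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose centres are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mels = [top * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mels]
    bank = []
    for j in range(1, num_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):      # rising edge
            row[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank


def frame_features(frame, sample_rate=8000, num_filters=20, num_ceps=12):
    """Return [log energy, c1..c12] for one audio frame (assumed layout)."""
    eps = 1e-10  # avoids log(0) on silent frames or empty filters
    log_energy = math.log(sum(x * x for x in frame) + eps)
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    log_mel = [math.log(sum(w * p for w, p in zip(row, spec)) + eps)
               for row in bank]
    # DCT-II of the log mel energies; c0 is skipped because log energy replaces it.
    ceps = [sum(v * math.cos(math.pi * c * (j + 0.5) / num_filters)
                for j, v in enumerate(log_mel))
            for c in range(1, num_ceps + 1)]
    return [log_energy] + ceps


# Usage: one 128-sample frame of a 440 Hz tone sampled at 8 kHz.
frame = [math.sin(2 * math.pi * 440 * i / 8000) for i in range(128)]
features = frame_features(frame)
print(len(features))  # 13
```

The naive DFT keeps the sketch dependency-free; a real system would use an FFT and overlapping frames.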

[0023] In this embodiment, the process of classifier model training is: first select training samples, then extract the audio features formed by the logarithmic energy and Mel cepstral co...
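The two parts of the classifier, model training and classification prediction, can be sketched as follows. This is only an illustration under stated assumptions: the source does not specify the kernel or the training procedure, so the sketch uses a linear SVM trained by stochastic sub-gradient descent on the regularised hinge loss, and it uses toy 2-dimensional vectors in place of the 13-dimensional audio features. The function names, hyperparameters, and data are hypothetical.

```python
import random


def train_linear_svm(samples, labels, epochs=200, lr=0.01, lam=0.01, seed=0):
    """Train a linear SVM (w, b) on the regularised hinge loss.

    labels: +1 for standard speech, -1 for any other audio class (assumed coding).
    """
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    b = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = samples[i], labels[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # Regularisation shrinks w on every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            # The hinge loss contributes only when the margin is violated.
            if margin < 1.0:
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 (standard speech) or -1 (other audio) for one feature vector."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Toy training set: two well-separated clusters standing in for audio features.
train_x = [[2.0, 2.5], [2.5, 2.0], [3.0, 3.0], [2.0, 3.0],
           [-2.0, -2.5], [-2.5, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
train_y = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm(train_x, train_y)
print(predict(w, b, [2.2, 2.4]), predict(w, b, [-2.4, -2.2]))
```

In the setting described above, each vector would be the feature of an audio frame or clip, and consecutive frames predicted as standard speech would be merged into speech segments.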

Abstract

This invention relates to a news video retrieval method based on speech classification and recognition. The method automatically segments out all speech segments of standard speech in a news video and then recognizes the standard speech with a speech recognition system. Because the standard speech expresses the main content of the video, retrieval from text to video is easy to realize.
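The flow described in the abstract can be sketched as a small text-to-video index. Everything concrete here is a hypothetical stand-in: the segment records, the class labels, the `transcribe` function (which returns canned text instead of calling a speech recognizer), and the whitespace tokenization, which suits English examples but not Mandarin.

```python
from collections import defaultdict

# Hypothetical output of the sound classifier for two news videos.
SEGMENTS = [
    {"video": "news_001", "start": 12.0, "end": 45.5, "label": "standard_speech"},
    {"video": "news_001", "start": 45.5, "end": 75.0, "label": "advertisement"},
    {"video": "news_002", "start": 0.0, "end": 20.0, "label": "preview_with_music"},
    {"video": "news_002", "start": 20.0, "end": 61.0, "label": "standard_speech"},
]

# Canned transcripts standing in for a real speech recognition system.
FAKE_TRANSCRIPTS = {
    ("news_001", 12.0): "fighting continued in the Iraq war today",
    ("news_001", 45.5): "buy one get one free",
    ("news_002", 20.0): "health officials report new bird flu cases",
}


def transcribe(segment):
    """Placeholder recognizer: look up the canned transcript for a segment."""
    return FAKE_TRANSCRIPTS.get((segment["video"], segment["start"]), "")


def build_index(segments, recognizer):
    """Index only standard-speech segments, so other audio adds no wrong words."""
    index = defaultdict(set)
    for seg in segments:
        if seg["label"] != "standard_speech":
            continue
        for word in recognizer(seg).lower().split():
            index[word].add(seg["video"])
    return index


def search(index, query):
    """Return the videos whose standard-speech transcripts contain every query word."""
    words = query.lower().split()
    if not words:
        return []
    hits = [index.get(word, set()) for word in words]
    return sorted(set.intersection(*hits))


index = build_index(SEGMENTS, transcribe)
print(search(index, "Iraq war"))  # ['news_001']
print(search(index, "free"))      # [] because the advertisement was never indexed
```

Skipping the non-standard segments before recognition is the point of the design: words from advertisements or music previews never enter the index, so they cannot produce false matches.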

Description

Technical field

[0001] The invention belongs to the technical field of computer speech recognition and video retrieval, and in particular relates to a news video retrieval method based on speech classification and recognition.

Background art

[0002] At present, speech recognition technology has a wide range of applications, not only in the field of audio but also in the field of video, because video also contains audio information. If the speech content in a video can be recognized through speech recognition technology, it can provide powerful support for video retrieval and realize retrieval from speech text to video content. Existing video retrieval technologies generally extract low-level features such as color and texture from videos, and then perform video retrieval based on these features. However, this method brings the following two problems: (1) When people retrieve videos, they retrieve them based on human high-level semantic features such as football ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): H04N5/93, G10L15/00, G10L15/08, G10L15/06, G11B27/10, G10L21/06
Inventor: 彭宇新, 房翠华, 陈晓鸥, 吴於茜
Owner: NEW FOUNDER HLDG DEV LLC