Method and device for recognizing speaker in video in real time

A technology relating to speakers and video, applied in the field of real-time recognition of speakers in video. It addresses the problem that existing approaches cannot be applied in scenarios with high real-time requirements, and achieves real-time processing.

Active Publication Date: 2022-07-29
ZHEJIANG LAB
7 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0005] The purpose of the embodiments of the present application is to provide a method and device for real-time identification of speakers in a video, so as to solve the technical problem in the related art that existing methods cannot be applied in scenarios with high real-time requirements.




Embodiment Construction

[0057] Exemplary embodiments will be described in detail herein, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the illustrative examples below are not intended to represent all implementations consistent with this application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as recited in the appended claims.

[0058] The terminology used in this application is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly dictates otherwise. It will also be understood that the term "and/or" as ...


Abstract

The invention discloses a method and a device for recognizing a speaker in a video in real time. The method comprises the following steps: acquiring an image sequence and an audio sequence that start at the same moment and are continuous; detecting and tracking faces in the latest image frame of the image sequence, and updating an existing face sequence information base; inputting the face sequence information from the face sequence information base, together with the audio sequence, into a trained speaker detection network to detect the speaking state, and updating a speaking state database; and obtaining the current state of all people from the speaking state database, so as to identify the likely speakers in the video.
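The four steps in the abstract can be pictured as a per-frame loop. The following is a minimal, hypothetical Python sketch, not the patented implementation: the face detector/tracker and the trained speaker detection network are passed in as plain callables (toy stand-ins `toy_tracker` and `toy_net`), and the class name `RealTimeSpeakerRecognizer`, the fixed-length audio window, the `window_frames` history length and the 0.5 `threshold` are all assumptions. The only alignment it relies on is the one the abstract states: because both sequences start at the same moment, frame i ends at (i + 1)/fps seconds of audio.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # hypothetical face box: (x, y, w, h)


@dataclass
class RealTimeSpeakerRecognizer:
    # Stand-ins (assumptions): a face detector/tracker and a trained speaker detection network.
    detect_and_track: Callable[[object], Dict[int, Box]]         # frame -> {face_id: box}
    speaker_net: Callable[[List[Box], Sequence[float]], float]  # (face track, audio) -> P(speaking)
    fps: float = 25.0
    sample_rate: int = 16000
    window_frames: int = 25   # hypothetical: frames of history kept per face
    threshold: float = 0.5    # hypothetical decision threshold
    face_db: Dict[int, Deque[Box]] = field(default_factory=dict)  # face sequence information base
    state_db: Dict[int, bool] = field(default_factory=dict)       # speaking state database

    def audio_window(self, audio: Sequence[float], frame_idx: int) -> Sequence[float]:
        # Both sequences start at the same moment, so frame i ends at (i + 1) / fps seconds.
        end = round((frame_idx + 1) * self.sample_rate / self.fps)
        length = round(self.window_frames * self.sample_rate / self.fps)
        return audio[max(0, end - length):end]

    def step(self, frame: object, frame_idx: int, audio: Sequence[float]) -> List[int]:
        # Step 1 (acquisition) is assumed done by the caller, which passes the latest
        # frame and the audio captured so far.
        # Step 2: detect and track faces in the latest frame; update the face sequence base.
        tracks = self.detect_and_track(frame)
        for face_id, box in tracks.items():
            self.face_db.setdefault(face_id, deque(maxlen=self.window_frames)).append(box)
        # Step 3: score each visible face against the aligned audio; update speaking states.
        window = self.audio_window(audio, frame_idx)
        for face_id in tracks:
            prob = self.speaker_net(list(self.face_db[face_id]), window)
            self.state_db[face_id] = prob >= self.threshold
        for face_id in self.state_db:
            if face_id not in tracks:
                self.state_db[face_id] = False  # faces not visible now are marked silent
        # Step 4: read the current state of all people and return the likely speakers.
        return [fid for fid, speaking in self.state_db.items() if speaking]


# Toy usage: a "frame" is just the list of visible face ids.
def toy_tracker(frame):
    return {fid: (10 * fid, 0, 8, 8) for fid in frame}


def toy_net(track, audio):
    # Pretend only the face at x=10 (face 1) talks; score = mean |amplitude| of the window.
    if not audio or track[-1][0] != 10:
        return 0.0
    return sum(abs(a) for a in audio) / len(audio)


audio = [0.0] * 20 + [0.9] * 40  # 10 samples per frame: 2 silent frames, then speech
frames = [[1], [1, 2], [1, 2], [1, 2], [2]]
rec = RealTimeSpeakerRecognizer(toy_tracker, toy_net, fps=10, sample_rate=100, window_frames=3)
for i, frame in enumerate(frames):
    print(i, rec.step(frame, i, audio))
```

With these toy inputs the loop prints an empty speaker list for frames 0 to 2, `[1]` at frame 3 (once the 3-frame audio window is mostly speech), and `[]` at frame 4, when face 1 is no longer visible. Keeping each face's history in a bounded `deque` means every step only touches the most recent frames, which is one plausible way to keep per-frame cost constant; the visible text does not say how the patent itself achieves this.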

Description

Technical field

[0001] The invention belongs to the field of computer-vision speaker detection, and in particular relates to a method and device for real-time recognition of a speaker in a video.

Background art

[0002] Speaker classification refers to automatically distinguishing the different speakers that appear in a piece of audio and dividing the audio into segments according to speaker. However, in some multi-speaker scenarios it is difficult to classify speakers automatically and accurately. Recognition methods based on the combined information of image sequences and audio sequences have therefore been introduced, and such hybrid audio-visual recognition can greatly improve recognition accuracy.

[0003] In the process of realizing the present invention, the inventor found that the prior art has at least the following problems:

[0004] At the same time, due to the introduction of mixed information, the processing time is significantly in...
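The definition of speaker classification in [0002] (commonly called speaker diarization) ends with dividing the audio by speaker. A small sketch of that last step, assuming each fixed-length audio frame already carries a speaker label (or `None` for no speech): consecutive frames with the same label are merged into `(speaker, start_s, end_s)` segments. The function name `labels_to_segments` and the 0.5 s hop are illustrative assumptions; producing reliable labels in multi-speaker scenes is the hard part the background describes.

```python
from itertools import groupby
from typing import List, Optional, Tuple


def labels_to_segments(labels: List[Optional[str]], hop_s: float) -> List[Tuple[str, float, float]]:
    """Merge per-frame speaker labels into (speaker, start_s, end_s) segments.

    Frames labelled None (no speech) advance time but produce no segment.
    """
    segments = []
    t = 0  # index of the first frame in the current run
    for speaker, run in groupby(labels):
        n = len(list(run))  # number of consecutive frames with this label
        if speaker is not None:
            segments.append((speaker, t * hop_s, (t + n) * hop_s))
        t += n
    return segments


print(labels_to_segments(["A", "A", None, "B", "B", "B", "A"], hop_s=0.5))
# → [('A', 0.0, 1.0), ('B', 1.5, 3.0), ('A', 3.0, 3.5)]
```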


Application Information

IPC (8): G06N3/04; G06V20/40; G06V40/16; G06V10/774; G06V10/82
CPC: G06N3/04; G06F18/214
Inventors: 黄敏 (Huang Min), 林哲远 (Lin Zheyuan), 朱世强 (Zhu Shiqiang), 宋伟 (Song Wei), 王文 (Wang Wen), 金天磊 (Jin Tianlei)
Owner: ZHEJIANG LAB