Method and device for recognizing speaker in video in real time

A technique for real-time recognition of speakers in video. It addresses the problem that related methods cannot be applied in scenarios with high real-time requirements, and it achieves real-time processing.

Active Publication Date: 2022-07-29

ZHEJIANG LAB

7 Cites, 0 Cited by

Problems solved by technology

[0005] The purpose of the embodiments of the present application is to provide a method and device for real-time identification of speakers in a video, so as to solve the technical problem in the related art that existing methods cannot be applied in scenarios with high real-time requirements.

Embodiment Construction

[0057] Exemplary embodiments will be described in detail herein, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the illustrative examples below are not intended to represent all implementations consistent with this application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as recited in the appended claims.

[0058] The terminology used in this application is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly dictates otherwise. It will also be understood that the term "and/or" as ...

Abstract

The invention discloses a method and a device for recognizing a speaker in a video in real time. The method comprises the following steps: acquiring an image sequence and an audio sequence which start at the same moment and are continuous; detecting and tracking a face according to the latest frame of image in the image sequence, and updating an existing face sequence information base; inputting the face sequence information in the face sequence information base and the audio sequence into a trained speaker detection network, detecting a speaking state, and updating a speaking state database; and according to the speaking state database, obtaining the current state of all people so as to identify possible speakers in the video.
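The steps in the abstract can be read as a loop that runs once per new frame. The following is a minimal Python sketch under stated assumptions, not the patented implementation: `SpeakerPipeline`, `toy_tracker`, `toy_speaker_net`, and the window length of 5 are all hypothetical names and values. The toy functions merely stand in for a real face detector/tracker and for the trained speaker detection network.

```python
from collections import defaultdict, deque

# All names below are hypothetical; they do not come from the patent.
WINDOW = 5  # assumed number of recent face observations kept per person


class SpeakerPipeline:
    """Per-frame loop: track faces, score speaking state, report speakers."""

    def __init__(self, detect_and_track, speaker_net, window=WINDOW):
        self.detect_and_track = detect_and_track  # frame -> [(face_id, box)]
        self.speaker_net = speaker_net            # (boxes, audio) -> bool
        # Face sequence information base: face_id -> recent face boxes.
        self.face_db = defaultdict(lambda: deque(maxlen=window))
        # Speaking state database: face_id -> latest speaking decision.
        self.state_db = {}

    def step(self, frames, audio):
        # Only the latest frame is processed; earlier frames are already
        # reflected in face_db.
        for face_id, box in self.detect_and_track(frames[-1]):
            self.face_db[face_id].append(box)
        for face_id, boxes in self.face_db.items():
            self.state_db[face_id] = self.speaker_net(list(boxes), audio)
        return sorted(f for f, speaking in self.state_db.items() if speaking)


def toy_tracker(frame):
    # Stand-in for a face detector/tracker: a "frame" here is already a
    # dict of face_id -> (x, y, width, height).
    return list(frame.items())


def toy_speaker_net(boxes, audio):
    # Stand-in for the trained network: "speaking" means the audio is
    # loud and the face box height changed within the window.
    if not audio:
        return False
    loud = sum(abs(a) for a in audio) / len(audio) > 0.1
    moving = len({height for (_, _, _, height) in boxes}) > 1
    return loud and moving


pipeline = SpeakerPipeline(toy_tracker, toy_speaker_net)
frames, audio = [], []
for i in range(4):
    # Face 1 changes height every frame; face 2 stays still.
    frames.append({1: (0, 0, 10, 10 + 2 * (i % 2)), 2: (20, 0, 10, 10)})
    audio.append(0.5)
    speakers = pipeline.step(frames, audio)
print(speakers)  # prints [1]
```

Keeping a bounded `deque` per face is one plausible way to limit per-frame work to the latest image, which matches the real-time goal stated above. The sketch never evicts faces that leave the scene; a real system would need some expiry rule.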

Description

Technical field

[0001] The invention belongs to the field of computer-vision speaker detection, and in particular relates to a method and device for real-time recognition of a speaker in a video.

Background

[0002] Speaker classification refers to automatically distinguishing the different speakers appearing in a piece of audio and dividing the audio into segments according to speaker. However, in some multi-speaker scenarios it is difficult to classify speakers accurately by automatic means. Therefore, a recognition method based on the mixed information of image sequences and audio sequences is introduced. The hybrid information recognition method can greatly improve the recognition accuracy.

[0003] In the process of realizing the present invention, the inventor found that there are at least the following problems in the prior art:

[0004] At the same time, due to the introduction of mixed information, the processing time is significantly in...

Application Information

IPC(8): G06N3/04, G06V20/40, G06V40/16, G06V10/774, G06V10/82

CPC: G06N3/04, G06F18/214

Inventor: 黄敏, 林哲远, 朱世强, 宋伟, 王文, 金天磊

Owner: ZHEJIANG LAB
