
A person recognition method and system based on a multi-frame audio-video fusion network

A person-recognition and fusion-network technology, applied in the field of person recognition methods and systems based on multi-frame audio-video fusion networks. It addresses the problem that low-quality frames reduce the discriminability of visual features, which in turn degrades the fused features, and achieves an excellent recognition effect by avoiding that influence.

Active Publication Date: 2021-09-24
INST OF COMPUTING TECH CHINESE ACAD OF SCI
Cites: 8 · Cited by: 0
  • Summary
  • Abstract
  • Description
  • Claims
  • Application Information

AI Technical Summary

Problems solved by technology

At present, person-recognition algorithms based on audio-video fusion can make full use of both face and voiceprint features to determine a person's identity, but existing fusion algorithms fail to address the decline in the discriminability of visual features under low-quality conditions.
[0004] While researching person recognition in the field of network audio-video surveillance, the inventors found the following defects in the existing technology. First, single-modality algorithms struggle with the practical complexity of network audio-video surveillance: face recognition degrades severely on low-quality images, and the accuracy of voiceprint recognition is likewise limited. Second, network audio-video surveillance often contains a large number of hard-to-recognize images.
Directly extracting face features from these hard-to-recognize images reduces the discriminability of the features, which in turn degrades the subsequent fused features.




Detailed Description of Embodiments

[0039] In recent years, video has accounted for the vast majority of network traffic, and its share continues to grow. Massive volumes of video are inevitably mixed with illegal videos, which spread quickly, reach a wide audience, and are extremely harmful. Intelligent analysis of video content to prevent illegal videos from flooding the Internet has therefore become an urgent problem. Illegal video is a complex concept: identifying it accurately requires not only analyzing low-level visual features but also understanding high-level semantic associations, which is a very challenging task. Since people are the main subject of video content, accurate identification of specific people can effectively assist the intelligent analysis of illegal videos. As shown in Figure 1, the multi-frame audio-video fusion algorithm is mainly divided into three stages: the fusion of multi-frame visual features, the fusion of multi-frame voiceprint ...
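The first of the three stages above, fusing the face features of K frames so that low-quality frames contribute less, can be sketched as a quality-weighted average. This is a minimal numpy illustration, not the patented implementation; the softmax-over-quality-scores weighting and all names here are assumptions for exposition.

```python
import numpy as np

def fuse_visual_features(frame_feats, quality_scores):
    """Weighted fusion of per-frame face features: frames with low
    quality scores get small softmax weights, so hard-to-recognize
    frames contribute less to the fused multi-frame visual feature."""
    w = np.exp(quality_scores - np.max(quality_scores))
    w /= w.sum()              # softmax weights over the K frames
    return w @ frame_feats    # (K,) @ (K, D) -> fused feature (D,)

# Toy example: K=5 frames of D=8-dimensional face features.
K, D = 5, 8
feats = np.random.default_rng(0).normal(size=(K, D))
scores = np.array([2.0, 0.1, -1.0, 1.5, 0.0])  # e.g. predicted face quality
fused = fuse_visual_features(feats, scores)
print(fused.shape)  # (8,)
```

With equal quality scores this reduces to a plain average; a frame with a much higher score dominates the fused feature.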



Abstract

The present invention proposes a person recognition method and system based on a multi-frame audio-video fusion network, comprising: a visual feature fusion step, which decodes the video to be recognized, obtains K consecutive frames of the video (K a positive integer), extracts the face features of each of the K frames, and weights and fuses all the face features to obtain a multi-frame visual feature; a voiceprint feature fusion step, which extracts the voiceprint features of each of the K consecutive frames and fuses them with a temporal recurrent neural network to obtain a multi-frame voiceprint feature; and an audio-video feature fusion step, which fuses the multi-frame visual feature and the multi-frame voiceprint feature with a fully connected layer, constrains the fusion process with a classification loss to obtain a multi-frame audio-video fusion feature, and performs person recognition according to that fusion feature.
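The three steps of the abstract can be sketched end-to-end with plain numpy. This is a toy illustration under assumed dimensions: the weighted visual fusion, a simple tanh RNN standing in for the temporal recurrent network, and a single fully connected layer producing class logits; all weights are random and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
K, Dv, Da, H, C = 5, 8, 6, 6, 10  # frames, visual dim, audio dim, RNN hidden, identities

# --- Step 1: weighted fusion of K visual (face) features ---
vis = rng.normal(size=(K, Dv))
w = np.exp(rng.normal(size=K)); w /= w.sum()   # per-frame fusion weights
multi_vis = w @ vis                            # multi-frame visual feature (Dv,)

# --- Step 2: recurrent fusion of K voiceprint features ---
aud = rng.normal(size=(K, Da))
Wx = rng.normal(size=(Da, H)) * 0.1
Wh = rng.normal(size=(H, H)) * 0.1
h = np.zeros(H)
for t in range(K):                 # simple tanh RNN over the K frames
    h = np.tanh(aud[t] @ Wx + h @ Wh)
multi_aud = h                      # last hidden state = multi-frame voiceprint feature

# --- Step 3: fully connected fusion of the two modalities ---
fused_in = np.concatenate([multi_vis, multi_aud])   # (Dv + H,)
W_fc = rng.normal(size=(Dv + H, C)) * 0.1
logits = fused_in @ W_fc           # class scores; a classification loss (e.g.
print(logits.shape)                # softmax cross-entropy) would constrain training
```

In training, the classification loss on `logits` back-propagates through all three stages so the fusion weights and recurrent network are learned jointly.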

Description

Technical Field

[0001] The present invention relates to the field of person recognition, and in particular to a person recognition method and system based on a multi-frame audio-video fusion network.

Background Technique

[0002] Person recognition in video mainly uses the intrinsic or extrinsic attributes of a person to determine their identity. At present, the common approach is to use biometric characteristics, such as the face or voiceprint, and the corresponding algorithms include face recognition and voiceprint recognition. Mainstream face recognition algorithms use convolutional neural networks to learn, from large-scale face datasets, a mapping from raw face images to identity-invariant features. Researchers often carefully design loss functions, such as pairwise (contrastive) loss, triplet loss, and center loss, to constrain the mapping from images to features. ...
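Of the loss functions named above, the triplet loss is easy to state concretely: it requires an anchor feature to be closer to a same-identity (positive) feature than to a different-identity (negative) one by at least a margin. A minimal numpy sketch, with toy features chosen for illustration:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss: pull same-identity features together and push
    different-identity features apart by at least `margin`."""
    d_pos = np.sum((anchor - positive) ** 2)  # squared distance to same identity
    d_neg = np.sum((anchor - negative) ** 2)  # squared distance to other identity
    return max(d_pos - d_neg + margin, 0.0)

# Toy 4-D features: anchor is close to positive, far from negative,
# so the margin is already satisfied and the loss is zero.
a = np.array([1.0, 0.0, 0.0, 0.0])
p = np.array([0.9, 0.1, 0.0, 0.0])
n = np.array([0.0, 1.0, 0.0, 0.0])
print(triplet_loss(a, p, n))  # → 0.0
```

A larger margin (or a harder negative) makes the loss positive, producing a gradient that reshapes the feature mapping.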

Claims


Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06K9/62; G06K9/00; G10L17/02
CPC: G10L17/02; G06V40/168; G06F18/253
Inventor: 高科, 王永杰
Owner: INST OF COMPUTING TECH CHINESE ACAD OF SCI