Personnel identity recognition method based on audio and video information fusion

A technology that uses both audio and video information, applied in the field of personnel identification based on the fusion of audio and video information. It addresses problems such as face recognition failure and ensures accuracy and stability.

Inactive Publication Date: 2021-06-18

FUDAN UNIV

Cites 8 documents; cited by 3 documents.


AI Technical Summary

Problems solved by technology

Existing solutions (face recognition and voiceprint recognition) do not fully exploit the combined audio and video information. Moreover, face recognition is highly sensitive to interference such as pose, blur, and occlusion, and people do not always appear in the video, so face recognition alone often fails.

Method used

The audio and video data are read and preprocessed; audio features are extracted from the audio, and face, head, and body features of the people in the video are extracted from the video. All four feature types are fed into a joint model built from several trained and weighted MLP neural networks, which outputs the identity category.


Embodiment Construction

[0015] To make the technical means, creative features, objectives, and effects of the present invention easy to understand, the following embodiment explains in detail the personnel identity recognition method based on audio and video information fusion, with reference to the accompanying drawings.

[0016]

[0017] In this embodiment, the iQIYI-VID2019 and YouTube video datasets are used, and the data are divided into a training set and a test set.

[0018] iQIYI-VID2019 is a celebrity identity dataset containing 600,000 video clips of 5,000 celebrities. The clips were extracted from a large collection of iQIYI online videos, and the people in all videos were manually annotated.

[0019] The YouTube video dataset is a video dataset containing millions of person categories. In this embodiment, 1 million video clips of 5,000 celebrities are selected, and the video annotation adopts the person annotation inform...
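Paragraph [0017] states only that the data are divided into a training set and a test set. Because the embodiment works with a fixed set of 5,000 celebrity identities, one plausible way to do this is a stratified split that keeps every identity in both sets. The sketch below assumes that scheme, an illustrative 20% test ratio, and a hypothetical function name `stratified_split`; none of these details come from the patent.

```python
import random
from collections import defaultdict


def stratified_split(clips, test_ratio=0.2, seed=0):
    """Split (clip_id, identity) pairs so each identity appears in both sets.

    Hypothetical helper: the patent does not describe its split procedure.
    clips: iterable of (clip_id, identity_label) tuples.
    Returns (train, test) as lists of the same tuples.
    """
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be in (0, 1)")

    # Group clip IDs by identity label.
    by_identity = defaultdict(list)
    for clip_id, label in clips:
        by_identity[label].append(clip_id)

    rng = random.Random(seed)  # private RNG: same seed gives the same split
    train, test = [], []
    for label in sorted(by_identity):  # sorted so iteration order is deterministic
        ids = by_identity[label]
        rng.shuffle(ids)
        n_test = int(round(len(ids) * test_ratio))
        if len(ids) >= 2:
            # Keep at least one clip of this identity on each side.
            n_test = min(max(n_test, 1), len(ids) - 1)
        else:
            n_test = 0  # a single clip can only be used for training
        test.extend((i, label) for i in ids[:n_test])
        train.extend((i, label) for i in ids[n_test:])
    return train, test
```

For example, ten clips split evenly between two identities yield eight training clips and two test clips, one per identity. Stratifying by identity matters for closed-set recognition: an identity that never appears in training cannot be predicted at test time.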


Abstract

The invention provides a personnel identity recognition method based on audio and video information fusion, comprising the following steps: S1, reading the video information and audio information of audio and video data and preprocessing both, to obtain preprocessed video information and preprocessed audio information; S2, processing the preprocessed audio information to extract audio features; S3, processing the preprocessed video information to extract the face features, head features, and body features of the people in it; S4, establishing a plurality of MLP neural network models, training them, and setting their weights to obtain an MLP neural network joint model; and S5, inputting the audio features, face features, head features, and body features into the MLP neural network joint model to obtain the judged category.
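Steps S4 and S5 can be illustrated with a minimal sketch, assuming late fusion: one MLP per feature type (audio, face, head, body) outputs class probabilities, and the joint model takes a weighted sum of those probabilities and returns its argmax as the judged category. The abstract does not specify the fusion rule, the network sizes, or how the weights are set, so the `MLP` class, the `joint_predict` function, and the weight values are hypothetical. Training is omitted; the parameters are assumed to be already trained. Skipping modalities that are absent (for example, no face detected) is also an assumption, motivated by the observation that people do not always appear in the video.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def relu(v: Sequence[float]) -> Vector:
    return [max(0.0, x) for x in v]


def softmax(v: Sequence[float]) -> Vector:
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in v]
    s = sum(exps)
    return [e / s for e in exps]


class MLP:
    """Hypothetical one-hidden-layer MLP: x -> ReLU(W1 x + b1) -> softmax(W2 h + b2)."""

    def __init__(self, w1, b1, w2, b2):
        self.w1, self.b1, self.w2, self.b2 = w1, b1, w2, b2

    @staticmethod
    def _affine(w, b, x):
        # Matrix-vector product plus bias, one output per row of w.
        return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]

    def predict_proba(self, x: Sequence[float]) -> Vector:
        h = relu(self._affine(self.w1, self.b1, x))
        return softmax(self._affine(self.w2, self.b2, h))


def joint_predict(models: Dict[str, MLP], weights: Dict[str, float],
                  features: Dict[str, Vector]) -> int:
    """Weighted sum of per-modality class probabilities; returns the argmax class.

    Modalities missing from `features` are skipped and the remaining
    weights are renormalised so they still sum to 1 (an assumption).
    """
    present = [m for m in models if m in features]
    if not present:
        raise ValueError("no modality features available")
    total = sum(weights[m] for m in present)
    fused = None
    for m in present:
        p = models[m].predict_proba(features[m])
        w = weights[m] / total
        if fused is None:
            fused = [w * pi for pi in p]
        else:
            fused = [f + w * pi for f, pi in zip(fused, p)]
    return max(range(len(fused)), key=fused.__getitem__)
```

With a face model weighted more heavily than an audio model, a confident face prediction overrides a conflicting audio prediction. When the face features are absent, the audio model alone decides, which is the situation where pure face recognition would fail.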

Description

Technical Field

[0001] The invention relates to the technical fields of computer vision, audition, and artificial intelligence, and in particular to a method for identifying personnel based on the fusion of audio and video information.

Background Art

[0002] Recognizing people in audiovisual material is a challenging topic in computer vision and machine learning. There are currently two solutions in this field: face recognition and voiceprint recognition. Face recognition judges whether a face image under test and the known face images in a database belong to the same person; voiceprint recognition judges whether audio under test and the known audio in a database belong to the same person.

[0003] Unlike still images, audiovisual material contains both visual and audio information. However, the above two solutions do not really use all the audio and video information, and becau...
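Both baseline solutions in [0002] reduce to comparing a test sample with known samples in a database. A common way to do this, assumed here rather than taken from the patent, is to represent each face image or audio clip as a feature vector and compare vectors by cosine similarity against a threshold. The function names `verify` and `identify` and the threshold of 0.5 are hypothetical; the same code applies to face embeddings and voiceprint embeddings alike.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def verify(probe: Sequence[float], reference: Sequence[float],
           threshold: float = 0.5) -> bool:
    """1:1 check: do the test sample and a known sample belong to the same person?"""
    return cosine_similarity(probe, reference) >= threshold


def identify(probe: Sequence[float], gallery: Dict[str, Sequence[float]],
             threshold: float = 0.5) -> Optional[str]:
    """1:N search: best-matching known identity, or None if no match clears the threshold."""
    best_id, best_score = None, -1.0
    for identity, ref in gallery.items():
        score = cosine_similarity(probe, ref)
        if score > best_score:
            best_id, best_score = identity, score
    return best_id if best_score >= threshold else None
```

Each baseline uses only one modality, so if the face is occluded or missing, or the person is silent, its single comparison has nothing reliable to work with.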


Application Information


Patent Type & Authority: Applications (China)

IPC (8): G06K9/00; G10L17/00; G10L17/18; G06N3/04; G06N3/08

CPC: G10L17/18; G10L17/00; G06N3/084; G06V40/168; G06V40/10; G06V20/40; G06V40/70; G06N3/045

Inventors: Pan Zhihao (潘志灏), Cheng Ying (程颖), Feng Rui (冯瑞)

Owner: FUDAN UNIV
