Video character recognition method based on multi-modal feature fusion deep network

A feature fusion and deep network technology, applied in the field of video person recognition, that addresses the low accuracy of video person recognition and the insufficient coverage of person-identifying information by single-modality features.

Pending Publication Date: 2020-08-07

NANJING UNIV +1

3 cites, 15 cited by

Problems solved by technology

The modalities in video data are correlated and information transfers between them, so traditional pattern recognition methods that rely on a single source of information achieve low accuracy in video person recognition: the features of any single modality are not enough to cover all the elements of person identification.

Embodiment Construction

[0023] To explain the purpose, features, and advantages of the present invention in detail, the invention is described further below with reference to the accompanying drawings and specific implementation examples.

[0024] Video person recognition faces the following difficulties:

[0025] 1) The amount of video data is huge: there are many original videos, and their duration and resolution vary widely. The public video dataset covers 10,034 celebrities in complex scenes, with 200 hours and 200,000 videos drawn from film and television dramas and short videos. This volume of data places heavy demands on the computing power of the environment where the model runs and limits how complex the model can be.

[0026] 2) How to represent the characters in a video clip: the target in single-information pattern recognition is relatively easy to represent, whereas a single video in multi-modal data may contain multiple characters....

Abstract

The invention discloses a video character recognition method based on a multi-modal feature fusion deep network. The method is a deep learning multi-modal fusion algorithm designed specifically for target recognition on multi-modal character feature data from video. The network consists of several single-modal multi-layer perceptron recognition modules and a multi-modal feature fusion module. The method comprises the following steps: preprocessing the multi-modal data generated from a video; training several deep networks on the preprocessed data of the different modalities; and then applying weighted fusion to the features produced by these sub-networks, combining the feature weighted fusion module with the models of the different modalities to achieve a better recognition result. Using this multi-modal feature weighted fusion strategy on the multi-modal features generated from preprocessed video, a video person recognizer was built on the public video person dataset iQIYI-VID-2019. No multi-model ensembling is needed, and a single model reaches a mean average precision of 89.52%.
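The structure described in the abstract (one multi-layer perceptron per modality, followed by weighted fusion of the resulting features) can be illustrated with a minimal forward-pass sketch. It is not the patented implementation: the modality names (`face`, `audio`, `body`), all dimensions, the softmax normalization of the fusion weights, and the random untrained parameters are assumptions made only for illustration, and training is omitted.

```python
import math
import random

random.seed(0)

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def make_layer(in_dim, out_dim):
    """Random weights and zero biases for one dense layer (untrained placeholder)."""
    weights = [[random.gauss(0.0, 0.1) for _ in range(out_dim)] for _ in range(in_dim)]
    return weights, [0.0] * out_dim

def dense(x, layer, activate=True):
    """Affine transform, followed by ReLU unless activate is False."""
    weights, bias = layer
    out = [sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j]
           for j in range(len(bias))]
    return [max(v, 0.0) for v in out] if activate else out

def mlp_features(x, layers):
    """Single-modal multi-layer perceptron: maps raw modality input to a feature vector."""
    for layer in layers:
        x = dense(x, layer)
    return x

def weighted_fusion(features, fusion_logits):
    """Weighted sum of per-modality feature vectors.

    Normalizing the weights with softmax is an assumption of this sketch.
    """
    weights = softmax(fusion_logits)
    dim = len(features[0])
    fused = [sum(w * f[k] for w, f in zip(weights, features)) for k in range(dim)]
    return fused, weights

# Hypothetical modalities and sizes, chosen only to keep the example small.
INPUT_DIMS = {"face": 16, "audio": 8, "body": 12}
FEAT_DIM, NUM_IDS = 6, 5

branches = {name: [make_layer(d, 10), make_layer(10, FEAT_DIM)]
            for name, d in INPUT_DIMS.items()}
fusion_logits = [0.0] * len(INPUT_DIMS)   # would be learned in practice; zeros give equal weights
classifier = make_layer(FEAT_DIM, NUM_IDS)

# One fake preprocessed sample per modality.
sample = {name: [random.gauss(0.0, 1.0) for _ in range(d)]
          for name, d in INPUT_DIMS.items()}

features = [mlp_features(sample[name], branches[name]) for name in INPUT_DIMS]
fused, weights = weighted_fusion(features, fusion_logits)
probs = softmax(dense(fused, classifier, activate=False))
predicted_id = max(range(NUM_IDS), key=lambda i: probs[i])
```

Because every branch projects to the same feature size, the fusion step reduces to a weighted sum, so a single model produces the final prediction and no ensemble of separate models is involved.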

Description

Technical field

[0001] The invention relates to a video character recognition method based on a multi-modal feature fusion deep network, and belongs to the field of computer applications.

Background technique

[0002] With the rapid development of the Internet, major video websites hold massive amounts of video data and serve hundreds of millions of online video users. Compared with traditional images, video carries richer content information. Video person recognition technology has a wide range of application scenarios, such as intelligent recommendation for advertising and user customization; intelligent creation for background music, emoticon package generation, and short video synthesis; and intelligent moderation of videos that contain violence, gore, pornography, and other violations.

[0003] Person recognition has become a popular direction in computer applications. With the development of deep learning technology, face recognition, speech recognition, gesture recognition, ga...

Application Information

IPC(8): G06K9/00, G06K9/62, G06N3/04, G06N3/08

CPC: G06N3/084, G06V40/168, G06V40/10, G06V20/40, G06V20/46, G06N3/045, G06F18/241, G06F18/253, Y02D10/00

Inventor: 陈建蓉, 史颖欢, 高阳

Owner: NANJING UNIV
