
Video character recognition method based on multi-modal feature fusion deep network

A feature-fusion and deep-network technology, applied in the field of video person recognition, that addresses the low accuracy of video person recognition and the insufficient coverage of person-identification cues by single-modal features.

Publication status: Pending
Publication date: 2020-08-07
Applicant: NANJING UNIV +1

AI Technical Summary

Problems solved by technology

However, because of the correlation and complementarity among the various modalities of information in video data, traditional pattern-recognition methods that rely on a single source of information achieve low accuracy in video person recognition: the features of any individual modality do not cover all the cues needed for person identification.



Detailed Description of the Embodiments

[0023] In order to explain in detail the purpose, features, and advantages of the present invention, the invention is further described below in conjunction with the accompanying drawings and specific embodiments.

[0024] Video person recognition presents the following difficulties:

[0025] 1) The amount of video data is huge: the number of raw videos is large, and their durations and resolutions vary widely. The public video dataset covers 10,034 celebrities in complex scenes, with about 200 hours of footage across roughly 200,000 film, television-drama, and short-video clips. This scale places heavy demands on the computing power of the environment in which the model runs, which in turn limits how complex the model can be.

[0026] 2) How to represent the persons in a video clip: the target of single-modality pattern recognition is comparatively easy to represent, whereas a single video in multi-modal data may contain multiple persons.... (One possible clip-level representation is sketched below.)
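The excerpt above breaks off before describing a solution, so the following is only a minimal sketch of one common way to represent a person in a clip: collapse the per-frame embeddings of a detected face track into a single quality-weighted clip vector. The function name, the quality scores, and the weighted-averaging scheme are illustrative assumptions, not the patent's method.

```python
import numpy as np

def aggregate_clip_embedding(frame_embeddings, qualities):
    """Collapse per-frame face embeddings of one person track into a
    single clip-level vector via quality-weighted averaging (assumed
    scheme; the patent excerpt does not specify the representation).

    frame_embeddings: (T, D) array, one embedding per frame with a detection.
    qualities: (T,) array of detection-confidence/quality scores.
    """
    frame_embeddings = np.asarray(frame_embeddings, dtype=np.float32)
    qualities = np.asarray(qualities, dtype=np.float32)
    weights = qualities / (qualities.sum() + 1e-8)        # normalize weights
    clip_vec = (weights[:, None] * frame_embeddings).sum(axis=0)
    return clip_vec / (np.linalg.norm(clip_vec) + 1e-8)   # L2-normalize

# Hypothetical usage: 30 frames of 512-d face embeddings for one track.
emb = aggregate_clip_embedding(np.random.randn(30, 512), np.random.rand(30))
```

With one such vector per person track and per modality, a clip containing several people reduces to a set of fixed-length feature vectors that per-modality recognition networks can consume.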



Abstract

The invention discloses a video person recognition method based on a multi-modal feature fusion deep network. The method is a multi-modal fusion deep-learning algorithm designed specifically for the target-recognition problem posed by multi-modal person video feature data. The network structure of the algorithm consists of several single-modal multi-layer perceptron recognition modules and a multi-modal feature fusion module. The method comprises the following steps: preprocessing the multi-modal data generated from a video; training multiple deep networks with the preprocessed data of the different modalities; on that basis, applying weighted fusion to the features produced by the several sub-networks; and combining the feature-weighted-fusion module with the models of the different modalities to achieve a better recognition result. Using a multi-modal feature-set weighted-fusion strategy, the method builds a video person target recognizer on a public video person dataset (iQIYI-VID-2019) from the multi-modal features generated from the preprocessed videos; no multi-model ensembling is required, and the mean average precision of a single model reaches 89.52%.
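To make the described architecture concrete, here is a minimal PyTorch sketch of a network with one multi-layer perceptron recognition module per modality and a fusion module that combines the branch features by learned, softmax-normalized weights before a shared classifier. The modality names and dimensions, the layer sizes, and the scalar-weight fusion scheme are illustrative assumptions; the abstract fixes only the overall structure, not these details.

```python
import torch
import torch.nn as nn

class ModalityMLP(nn.Module):
    """Single-modal multi-layer perceptron recognition module."""
    def __init__(self, in_dim, feat_dim=512, num_classes=10034):
        # num_classes defaults to the 10,034 identities of iQIYI-VID-2019
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 1024), nn.ReLU(),
            nn.Linear(1024, feat_dim), nn.ReLU(),
        )
        self.head = nn.Linear(feat_dim, num_classes)  # for per-modality pretraining

    def forward(self, x):
        feat = self.encoder(x)          # penultimate feature, later fused
        return feat, self.head(feat)

class WeightedFusionNet(nn.Module):
    """Multi-modal feature fusion: weighted sum of sub-network features."""
    def __init__(self, in_dims, feat_dim=512, num_classes=10034):
        super().__init__()
        self.branches = nn.ModuleList(
            [ModalityMLP(d, feat_dim, num_classes) for d in in_dims]
        )
        # one learnable scalar weight per modality, softmax-normalized
        self.fusion_logits = nn.Parameter(torch.zeros(len(in_dims)))
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, inputs):          # inputs: list of (B, in_dims[i]) tensors
        feats = torch.stack(
            [branch(x)[0] for branch, x in zip(self.branches, inputs)], dim=1
        )                                              # (B, M, feat_dim)
        w = torch.softmax(self.fusion_logits, dim=0)   # (M,) fusion weights
        fused = (w[None, :, None] * feats).sum(dim=1)  # (B, feat_dim)
        return self.classifier(fused)

# Hypothetical modalities: face (512-d), audio (256-d), body (384-d) features.
model = WeightedFusionNet(in_dims=[512, 256, 384])
logits = model([torch.randn(4, 512), torch.randn(4, 256), torch.randn(4, 384)])
```

Keeping a classification head on each branch lets every modality network be trained on its own preprocessed data first, matching the abstract's step of training several deep networks before weighted fusion; afterwards only the fusion weights and the shared classifier need joint fine-tuning.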

Description

Technical field

[0001] The invention relates to a video person recognition method based on a multi-modal feature fusion deep network, and belongs to the field of computer applications.

Background technique

[0002] With the rapid development of today's Internet, major video websites host massive amounts of video data and hundreds of millions of online video users. Compared with traditional images, video carries much richer content. Video person recognition technology has a wide range of application scenarios, such as intelligent recommendation for advertising and user customization; intelligent creation for background music, emoticon-pack generation, and short-video synthesis; and intelligent moderation of videos containing violence, gore, pornography, and other violations.

[0003] Person recognition has become a popular direction in computer applications. With the development of deep-learning technology, face recognition, speech recognition, gesture recognition, ga...


Application Information

IPC(8): G06K9/00; G06K9/62; G06N3/04; G06N3/08
CPC: G06N3/084; G06V40/168; G06V40/10; G06V20/40; G06V20/46; G06N3/045; G06F18/241; G06F18/253; Y02D10/00
Inventor: 陈建蓉 (Chen Jianrong), 史颖欢 (Shi Yinghuan), 高阳 (Gao Yang)
Owner: NANJING UNIV