
Video Description Method Based on Deep Transfer Learning

A transfer-learning and video-description technology, applied in the field of video description based on deep transfer learning, that addresses problems such as inaccurate description semantics and achieves improved generalization ability and description accuracy.

Active Publication Date: 2022-03-18
SHANXI UNIV
Cites: 6 | Cited by: 0

AI Technical Summary

Problems solved by technology

[0004] The description semantics of existing video description methods are not accurate enough. To improve description accuracy, a video description model based on deep transfer learning is designed.



Examples


Embodiment Construction

[0049] Specific embodiments of the present invention will be described in detail below.

[0050] A video description method based on deep transfer learning, comprising the following steps:

[0051] 1) Through the convolutional neural network video representation model, the video is represented in vector form; the specific model structure is shown in Figure 1.

[0052] In step 1), the convolutional neural network model is used to complete the task of video representation. For a set of sampled frames in the video, each frame is input into the convolutional neural network model, and the output of the second fully connected layer is extracted. Mean pooling is then performed on all sampled frames, representing a video as an n-dimensional vector.
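As a concrete illustration of this step, here is a minimal PyTorch sketch. The patent does not name the backbone, so VGG16 is assumed here purely for illustration; its second fully connected layer (fc7) yields a 4096-dimensional per-frame feature, making n = 4096.

```python
import torch
import torchvision.models as models

# Assumed backbone: the patent does not specify the CNN, so VGG16 is used
# here for illustration; the "second fully connected layer" is taken to be
# fc7 of its classifier head.
vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT)
vgg.classifier = torch.nn.Sequential(*list(vgg.classifier.children())[:4])  # up to fc7
vgg.eval()

@torch.no_grad()
def video_to_vector(frames: torch.Tensor) -> torch.Tensor:
    """frames: (num_sampled_frames, 3, 224, 224), normalized RGB frames.
    Returns one n-dimensional video vector (n = 4096 for VGG16 fc7)."""
    per_frame = vgg(frames)       # (num_frames, 4096): second FC layer output
    return per_frame.mean(dim=0)  # mean pooling over all sampled frames
```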

[0053] 2) Build an image semantic feature detection model using multi-instance learning to extract image-domain semantic features; the image semantic feature detection model is shown in Figure 2, and a sketch of the idea follows below.

[0054] Specific steps are...
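Since the specific steps are truncated above, the following is only a minimal sketch of the multi-instance learning idea, not the patent's exact construction: each image is treated as a bag of region features, and a semantic word is deemed present if at least one region triggers it (a noisy-OR combination). The class name and dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class MILWordDetector(nn.Module):
    """Hypothetical multi-instance-learning word detector: an image is a bag
    of region features; a word fires if at least one region supports it."""
    def __init__(self, feat_dim: int, vocab_size: int):
        super().__init__()
        self.word_logits = nn.Linear(feat_dim, vocab_size)

    def forward(self, regions: torch.Tensor) -> torch.Tensor:
        # regions: (num_regions, feat_dim) -> per-region word probabilities
        p_region = torch.sigmoid(self.word_logits(regions))
        # Noisy-OR over the bag: P(word | image) = 1 - prod_r (1 - p_r)
        return 1.0 - torch.prod(1.0 - p_region, dim=0)  # (vocab_size,)
```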



Abstract

The invention belongs to the technical field of video processing, in particular to a video description method based on deep transfer learning. The method comprises the following steps: 1) represent the video in vector form through a convolutional neural network video representation model; 2) build an image semantic feature detection model using multi-instance learning to extract image-domain semantic features; 3) transfer the image semantic feature detection model to the frame-stream domain to obtain a frame-stream semantic feature detection model, extract frame-stream semantic features, and realize deep fusion of image-domain and frame-stream-domain semantic features; 4) construct a deep transfer learning video description framework to generate a natural-language description of the video. The invention deeply fuses semantic features from different domains at the input end to improve the accuracy of the generated video descriptions.
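Steps 3) and 4) are only summarized here, so the following PyTorch sketch shows one plausible way the fused semantics could drive description generation: the video vector and the two domains' semantic features are fused and used to initialize an LSTM decoder. All module choices and dimensions are assumptions, not the patent's exact design.

```python
import torch
import torch.nn as nn

class DeepTransferCaptioner(nn.Module):
    """Illustrative sketch of steps 3)-4): deeply fuse the video vector with
    image-domain and frame-stream-domain semantic features, then decode a
    natural-language description with an LSTM (details are assumptions)."""
    def __init__(self, vid_dim, sem_dim, vocab_size, hidden=512, embed=256):
        super().__init__()
        # Deep fusion of the three input-end feature streams
        self.fuse = nn.Linear(vid_dim + 2 * sem_dim, hidden)
        self.embed = nn.Embedding(vocab_size, embed)
        self.lstm = nn.LSTM(embed, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, video_vec, img_sem, stream_sem, captions):
        # video_vec: (B, vid_dim); img_sem, stream_sem: (B, sem_dim);
        # captions: (B, seq_len) token ids, teacher-forced during training
        fused = torch.tanh(self.fuse(
            torch.cat([video_vec, img_sem, stream_sem], dim=-1)))
        h0 = fused.unsqueeze(0)            # (1, B, hidden) initial LSTM state
        c0 = torch.zeros_like(h0)
        states, _ = self.lstm(self.embed(captions), (h0, c0))
        return self.out(states)            # (B, seq_len, vocab_size) logits
```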

Description

Technical Field

[0001] The invention belongs to the technical field of video processing, in particular to a video description method based on deep transfer learning.

Background Technique

[0002] Video description is the use of natural language to describe video. It is a focus and a difficulty in the fields of computer vision and natural language processing, and has broad application prospects in artificial intelligence.

[0003] Video description differs substantially from image description: it requires understanding not only the objects in each frame but also their motion across multiple frames. Existing video description methods mainly fall into the following four categories: 1) assign words detected in the visual content to each sentence segment, and then use predefined language templates to generate video descriptions; this type of method relies heavily on sentence templates, and the syntactic structure of the generated sente...


Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06T7/00
CPC: G06T7/0002; G06T2207/20164
Inventors: 张丽红, 曹刘彬
Owner: SHANXI UNIV