Video content description method based on semantic information guidance

A semantic-information-guided technique for describing video content, applicable to special data processing applications, instruments, electrical digital data processing, and related fields. It addresses the problems of cumbersome research methods and disordered timing, improving description accuracy while preserving temporal and spatial correlation.
CN107038221A · Active · Publication Date: 2017-08-11 · HANGZHOU DIANZI UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
HANGZHOU DIANZI UNIV
Publication Date
2017-08-11


Abstract

The invention discloses a video content description method based on semantic information guidance. The method comprises the following steps: (1) the video format is preprocessed; (2) semantic information for guidance is established; (3) the weight of each semantic feature vector [A,XMS<(i)>] is calculated; (4) the semantic feature vectors [A,XMS<(i)>] are decoded; and (5) the video description model is tested. By using a Faster R-CNN model, the method can quickly detect the key semantic information in each image frame and add it to the original features extracted by a CNN, so that the feature vector input into the LSTM network at each time node carries semantic information. As a result, the space-time relevancy of the video content is preserved during decoding, and the accuracy of the language description is improved.
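Steps (2) and (3) can be illustrated with a minimal sketch. It rests on assumptions that go beyond what the abstract states: the semantic vector A is taken to be a fixed-length list of detection scores, each frame feature is a short list standing in for a CNN output, and the weights are computed by a dot product against a hypothetical `query` vector followed by a softmax. The abstract does not specify the scoring function, so this is one common choice rather than the patented formula. All function names are illustrative.

```python
import math

def concat_semantic(semantic, frame_feature):
    """Join a semantic vector with a per-frame feature, i.e. [A, X_i]."""
    return list(semantic) + list(frame_feature)

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(vectors, query):
    """Assumed scoring: dot product with a query, normalized by softmax."""
    scores = [sum(v * q for v, q in zip(vec, query)) for vec in vectors]
    return softmax(scores)

def weighted_context(vectors, weights):
    """Weighted sum of the vectors, one value per dimension."""
    dim = len(vectors[0])
    return [sum(w * vec[d] for vec, w in zip(vectors, weights))
            for d in range(dim)]

# Toy inputs (hypothetical values, not taken from the patent).
semantic = [1.0, 0.0]                       # stands in for A
frames = [[0.2, 0.1, 0.0],                  # stands in for CNN features
          [0.9, 0.4, 0.3],
          [0.1, 0.8, 0.5]]
query = [0.5, 0.5, 1.0, 0.0, 0.0]           # stands in for a decoder state

guided = [concat_semantic(semantic, f) for f in frames]
weights = attention_weights(guided, query)
context = weighted_context(guided, weights)
```

In a full pipeline the `context` vector would be what the LSTM decoder consumes at one time step; here it only shows that every weighted feature still carries the semantic components.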

Description

Technical field

[0001] The invention belongs to the technical field of computer vision and natural language processing, and relates to a video content description method guided by semantic information.

Background technique

[0002] 1. Video content description

[0003] Previous research work on video content description is mainly divided into two directions:

[0004] 1. A method based on feature recognition and language template filling. Specifically, the method is divided into two steps. First, the video is converted into a set of consecutive image frames sampled at a certain time interval; second, a series of feature classifiers pre-trained on a large-scale image training set are used to classify and label the static and dynamic features in the video. These features can be subdivided into entities, entity attributes, interactive relationships between entities, scenes, etc.; finally, according to the characteristics of ...
