Video abstract generation method fusing local target features and global features

A technology combining local target features and global features, applied in neural learning methods, computer components, biological neural network models, etc. It addresses problems such as representation features that lack visual expressiveness, neglect of local target features, and neglect of the interactions between targets, and achieves richer detail, improved performance, and greater expressiveness.
CN113139468A; Active; Publication Date: 2021-07-20; XI AN JIAOTONG UNIV

Patent Information

Authority / Receiving Office
CN · China
Current Assignee / Owner
XI AN JIAOTONG UNIV
Publication Date
2021-07-20


Abstract

A video abstract generation method fusing local target features and global features comprises the following steps: extracting the local target features of a video, which comprise the visual features, motion trajectory features and category label features of each target; constructing a local target feature fusion network using an attention mechanism, and inputting the local target features to obtain fused local target features; extracting global features of the video using the encoder of an encoding-decoding framework, introducing the fused local target features into that framework, and fusing the global feature information with the local target feature information to obtain a more expressive representation vector; and decoding the corresponding abstract sentence from the representation vector. By introducing local target features of the video into a video abstract generation model built on the encoding-decoding framework, the method enriches the visual expressiveness of the representation features, which in turn improves the final text generation, so that a semantically relevant text description is generated for the input video.
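
The attention-based fusion described in the abstract can be illustrated with a small sketch. This is not the patented network: it assumes each detected target is represented by three short vectors (visual, trajectory, label embedding) that are simply concatenated, uses scaled dot-product attention with the global video feature as the query, and combines the fused local feature with the global feature by concatenation. The function names, the dictionary layout, and the toy dimensions are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fuse_local_targets(targets, query):
    """Attention-pool per-target vectors into one fused local feature.

    targets: list of dicts with 'visual', 'trajectory' and 'label' vectors
             (a hypothetical layout chosen for this sketch).
    query:   global video feature; assumed to have the same length as a
             concatenated target vector.
    """
    # Assumption: the three per-target features are joined by concatenation.
    vectors = [t["visual"] + t["trajectory"] + t["label"] for t in targets]
    scale = math.sqrt(len(query))
    weights = softmax([dot(v, query) / scale for v in vectors])
    # Weighted sum of target vectors, one output value per dimension.
    fused = [
        sum(w * v[i] for w, v in zip(weights, vectors))
        for i in range(len(query))
    ]
    return fused, weights


def fuse_with_global(fused_local, global_feature):
    """Assumption: global and local features are combined by concatenation."""
    return global_feature + fused_local


# Toy input: two targets, each with 2-dimensional features of each kind.
targets = [
    {"visual": [1.0, 0.0], "trajectory": [0.5, 0.5], "label": [1.0, 0.0]},
    {"visual": [0.0, 1.0], "trajectory": [0.1, 0.9], "label": [0.0, 1.0]},
]
global_feature = [1.0, 0.0, 0.5, 0.5, 1.0, 0.0]

fused_local, weights = fuse_local_targets(targets, global_feature)
representation = fuse_with_global(fused_local, global_feature)
```

In this toy input the first target aligns more closely with the global feature, so it receives the larger attention weight. In a full model the resulting representation vector would be passed to the decoder to generate the sentence; that step is omitted here.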

Description

Technical field

[0001] The invention belongs to the technical fields of artificial intelligence, computer vision and natural language processing. It relates to video understanding and video summary generation, and in particular to a video summary generation method that fuses local target features and global features.

Background technique

[0002] As artificial intelligence technology continues to develop and mature in computer vision and natural language processing, video summarization, a task at the intersection of these two fields, has gradually become a research hotspot in artificial intelligence. Given a video, the video summary generation task is to have a computer generate text (currently mainly in English) that describes the content of the video, thereby demonstrating an understanding of that content. Video summarization is an important branch of video understanding. ...
