Short video classification method based on multi-modal feature complete representation
A classification method and short video technology, applied to video data clustering/classification, retrieval, and indexing, to achieve the effect of improved accuracy
Inactive Publication Date: 2021-07-23
TIANJIN UNIV
AI Technical Summary
Problems solved by technology
[0006] The present invention provides a short video classification method based on the complete representation of multimodal features,
Method used
Examples
Embodiment 1
[0027] An embodiment of the present invention provides a short video classification method based on the complete representation of multimodal features, which makes full use of the content information and the label information of the short video. Referring to figure 1, the method includes the following steps:
[0028] 101: For the content information, experience shows that the semantic feature representation of the visual modality is especially important in short video multi-label classification tasks. Therefore, a representation learning scheme based on visual modality features is proposed: taking the visual modality features as the core, four subspaces are constructed from the modality-missing perspective, information complementarity between modalities is learned, and latent representations of two types of visual modality features are obtained. Considering the consistency of visual modality feature information, and in order to obtain a more compact representation of the visual modality features, the latent representations of the two typ...
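A minimal sketch of how the subspace latent representations described in step 101 might be fused by an auto-encoding/decoding network into one common representation. The class name, all layer sizes, and the use of mean-squared error as the reconstruction loss are illustrative assumptions, not taken from the patent:

```python
import torch
import torch.nn as nn

class SubspaceFusionAE(nn.Module):
    """Hypothetical sketch: concatenate the latent representations of the
    four modality-missing subspaces, encode them into a common latent
    representation, and decode for a reconstruction loss term."""
    def __init__(self, d_latent=64, d_common=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(4 * d_latent, 128), nn.ReLU(),
                                     nn.Linear(128, d_common))
        self.decoder = nn.Sequential(nn.Linear(d_common, 128), nn.ReLU(),
                                     nn.Linear(128, 4 * d_latent))

    def forward(self, subspace_latents):
        x = torch.cat(subspace_latents, dim=-1)   # join the four subspace latents
        z = self.encoder(x)                       # common latent representation
        recon = self.decoder(z)                   # reconstruction for the loss
        return z, nn.functional.mse_loss(recon, x)
```

The reconstruction loss would then be one term of the overall objective alongside the multi-label classification loss, as the abstract describes.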
Embodiment 2
[0039] The scheme in Embodiment 1 is further described below in conjunction with calculation formulas and examples; see the following description for details:
[0040] 201: The model takes a complete short video as input and extracts three modality features: visual, audio, and track, respectively;
[0041] For the visual modality, key frames are extracted, the classic image feature extraction network ResNet (residual network) is applied to all video key frames, and an averaging (AvePooling) operation is then performed to obtain the overall feature z_v from the original visual features X_v:

[0042] z_v = AvePooling(ResNet(X_v; β_v))

[0043] where ResNet(·) is the residual network, AvePooling(·) is the averaging operation, X_v denotes the original visual features of the short video, β_v denotes the network parameters to be learned, and the visual modality feature z_v has dimension d_v.
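As a small illustration of the AvePooling step described above, assuming the per-key-frame ResNet features have already been computed as a matrix of shape (t, d_v), the overall visual feature z_v is simply their temporal mean. The function name and dimensions here are hypothetical:

```python
import numpy as np

def visual_feature(frame_features: np.ndarray) -> np.ndarray:
    """AvePooling over key frames: frame_features has shape (t, d_v),
    one (assumed precomputed) ResNet feature vector per key frame."""
    return frame_features.mean(axis=0)  # z_v, shape (d_v,)

# e.g. 8 key frames with d_v = 2048 (a typical ResNet feature size)
frames = np.random.rand(8, 2048)
z_v = visual_feature(frames)
```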
[0044] For the audio modality, the sound spectrogram is drawn, and "CNN + LSTM" (convolutional neural network + long short-term memory network) is used to extract the sound features of t...
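One plausible reading of the "CNN + LSTM" audio branch can be sketched as follows: a small convolutional network encodes the spectrogram frames and an LSTM summarizes the resulting sequence. The class name, the 1-D convolution over mel bins, and all layer sizes are assumptions for illustration only:

```python
import torch
import torch.nn as nn

class AudioCNNLSTM(nn.Module):
    """Hypothetical sketch of a 'CNN + LSTM' audio feature extractor:
    convolutions over the spectrogram, then an LSTM whose final hidden
    state serves as the overall audio feature."""
    def __init__(self, n_mels=64, d_a=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(n_mels, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=3, padding=1), nn.ReLU())
        self.lstm = nn.LSTM(input_size=64, hidden_size=d_a, batch_first=True)

    def forward(self, spec):            # spec: (batch, n_mels, time)
        h = self.cnn(spec)              # (batch, 64, time)
        h = h.transpose(1, 2)           # (batch, time, 64) for the LSTM
        _, (h_n, _) = self.lstm(h)      # final hidden state
        return h_n[-1]                  # (batch, d_a) audio feature
```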
Abstract
The invention discloses a short video classification method based on multi-modal feature complete representation. For the content information of a short video, the method constructs four subspaces from the perspective of modality missing, mainly based on the visual modality features, and obtains a latent feature representation for each; the latent representations of the four subspaces are then fused by an auto-encoding/decoding network so that a more robust and effective common latent representation is learned. For the label information, inverse covariance estimation and a graph attention network are used to explore the correlation between labels and to update the label representations, yielding the label vector representation corresponding to the short video. A multi-head cross-modal fusion scheme based on multi-head attention is applied to the common latent representation and the label vector representation to obtain the label prediction scores of the short video. The overall loss function of the model consists of the traditional multi-label classification loss and the reconstruction loss of the auto-encoding/decoding network; it measures the difference between the network output and the actual value and guides the network toward the optimal solution of the model.
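The multi-head cross-modal fusion step in the abstract might look roughly like the following, using standard multi-head attention with the label vectors as queries over the common latent representation and a scoring head for per-label predictions. All dimensions, names, the head count, and the sigmoid scoring head are illustrative assumptions:

```python
import torch
import torch.nn as nn

d_model, n_heads, n_labels = 64, 4, 10

# Label vectors attend to the video's common latent representation;
# a linear + sigmoid head turns each fused vector into a label score.
attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=n_heads,
                             batch_first=True)
score_head = nn.Linear(d_model, 1)

labels = torch.randn(1, n_labels, d_model)  # label vector representations
video = torch.randn(1, 1, d_model)          # common latent representation

fused, _ = attn(query=labels, key=video, value=video)   # (1, n_labels, d_model)
scores = torch.sigmoid(score_head(fused)).squeeze(-1)   # (1, n_labels) in [0, 1]
```

In training, these scores would feed the multi-label classification loss that, together with the reconstruction loss, forms the overall objective.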
Description
technical field

[0001] The invention relates to the field of short video classification, and in particular to a short video classification method based on the complete representation of multimodal features.

Background technique

[0002] In recent years, with the popularity of smart terminals and social networks, more and more information is presented as multimedia content. High-definition cameras, large-capacity storage, and high-speed network connections have created extremely convenient shooting and sharing conditions for users, generating massive amounts of multimedia data.

[0003] As a new type of user-generated content, short videos have been warmly welcomed on social networks thanks to unique advantages such as a low barrier to creation, fragmented content, and strong social attributes. Especially since 2011, with the popularization of mobile Internet terminals, faster networks, and reduced traffic charges, short videos have quickly...
Claims