Course field multi-modal document classification method based on cross-modal attention convolutional neural network

A convolutional neural network and multimodal technology, applied to multimodal document classification in the course field based on a cross-modal attention convolutional neural network. It addresses problems such as difficulties with image and text semantic feature vectors, reduced accuracy of multimodal document feature expression, and the resulting hindrance to the performance of multimodal document classification tasks.
CN111985369A · Active · Publication Date: 2020-11-24 · NORTHWESTERN POLYTECHNICAL UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
NORTHWESTERN POLYTECHNICAL UNIV
Publication Date
2020-11-24


Abstract

The invention relates to a course field multi-modal document classification method based on a cross-modal attention convolutional neural network. Multi-modal document data in a course field is preprocessed. An attention mechanism and a dense convolutional network are combined into a convolutional neural network based on cross-modal attention, so that image features with sparsity can be constructed more effectively. A text-feature-oriented bidirectional long short-term memory network based on an attention mechanism is provided, so that text features locally associated with image semantics can be efficiently constructed. Cross-modal grouping fusion based on an attention mechanism is designed, so that the local association relationship between the images and the texts in the document can be learned more accurately and the accuracy of cross-modal feature fusion is improved. On the same course-field data set, the method performs better than an existing multi-modal document classification model, and the accuracy of multi-modal document data classification is improved.
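
The cross-modal attention idea in the abstract can be illustrated with a minimal sketch. It is not the patented network: it uses generic scaled dot-product attention in place of the dense convolutional and bidirectional LSTM components, and the function names (`cross_modal_attention`, `softmax`, `dot`) and the toy feature vectors are assumptions made only for illustration. A text feature vector acts as the query and weights a set of image region features, so that regions related to the text contribute more to the image representation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cross_modal_attention(text_query, image_regions):
    """Weight image region features by similarity to a text query.

    Hypothetical helper: generic scaled dot-product attention,
    not the attention layer defined in the patent.
    """
    dim = len(text_query)
    # Similarity of each image region to the text query, scaled by sqrt(dim).
    scores = [dot(text_query, region) / math.sqrt(dim) for region in image_regions]
    weights = softmax(scores)
    # Attention-weighted sum of the region features.
    attended = [
        sum(w * region[i] for w, region in zip(weights, image_regions))
        for i in range(dim)
    ]
    return weights, attended


# Toy example (made-up features): one text vector, three image regions.
text_vec = [1.0, 0.0]
regions = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights, image_vec = cross_modal_attention(text_vec, regions)

# Simple fusion by concatenation; the patent's grouping fusion is more elaborate.
fused = text_vec + image_vec
```

In this toy setting the first region, which points in the same direction as the text vector, receives the largest weight. A classifier would then operate on a fused vector such as `fused`.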

Description

Technical field

[0001] The invention belongs to the field of computer applications, multimodal data classification, educational data classification, image processing, and text processing, and in particular relates to a multimodal document classification method in the course field based on a cross-modal attention convolutional neural network.

Background technique

[0002] With the development of science and technology, the data to be processed by computers in various fields has changed from a single image to multi-modal data such as images, text, and audio with richer forms and contents. Classification of multimodal documents has applications in video classification, visual question answering, entity matching for social networks, etc. The accuracy of multimodal document classification depends on whether the computer can accurately understand the semantics and content of the images and text contained in the document. However, the images in multi-modal documents mixed with tex...

Claims
