Course field multi-modal document classification method based on cross-modal attention convolutional neural network

The invention relates to convolutional neural networks and multimodal technology, applied in the field of multimodal document classification in the course field based on a cross-modal attention convolutional neural network. It addresses problems such as the difficulty of accurately constructing semantic feature vectors for images and text, the reduced accuracy of multimodal document feature expression, and the resulting degradation of multimodal document classification performance.

Active Publication Date: 2020-11-24
NORTHWESTERN POLYTECHNICAL UNIV

AI Technical Summary

Problems solved by technology

However, the images in course-field multimodal documents that mix text and graphics are generally composed of lines and characters and exhibit high sparsity in visual features such as color and texture, and the semantic relationships between the text and images in such documents are only locally associated. This makes it difficult for existing multimodal document classification models to accurately construct the semantic feature vectors of the images and text in a document, which reduces the accuracy of multimodal document feature expression and hinders their performance on multimodal document classification tasks.



Examples


Embodiment Construction

[0074] The present invention will now be further described in conjunction with embodiments and the accompanying drawings:

[0075] The course-field multimodal document classification method based on the cross-modal attention convolutional neural network proposed by the present invention is mainly divided into five modules: 1) preprocessing of multimodal document data; 2) image feature construction with a dense convolutional neural network based on the attention mechanism; 3) text feature construction with a bidirectional long short-term memory network based on the attention mechanism; 4) grouped cross-modal fusion based on the attention mechanism; and 5) multi-label classification of multimodal documents. The model diagram of the whole method is shown in Figure 1; the details are as follows:
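As a rough orientation only, the five modules can be wired together as in the following minimal PyTorch sketch. Every class name, layer choice, and dimension here is an illustrative assumption: the image branch stands in for the patent's attention-based dense CNN, the text branch for its attention-based BiLSTM, and the fusion step for its grouped cross-modal attention, none of which are reproduced exactly.

```python
# Minimal sketch of the five-module pipeline described in [0075], assuming PyTorch.
# All names, layers, and dimensions are illustrative stand-ins, not the patent's design.
import torch
import torch.nn as nn


class CrossModalDocumentClassifier(nn.Module):
    def __init__(self, img_dim=512, txt_dim=512, num_labels=20):
        super().__init__()
        # 2) image feature construction (stand-in for the attention-based dense CNN)
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, img_dim),
        )
        # 3) text feature construction (stand-in for the attention-based BiLSTM)
        self.embedding = nn.Embedding(30000, 300)
        self.text_encoder = nn.LSTM(300, txt_dim // 2, batch_first=True,
                                    bidirectional=True)
        # 4) cross-modal fusion (stand-in: multi-head attention from image to text)
        self.cross_attn = nn.MultiheadAttention(embed_dim=img_dim, num_heads=4,
                                                batch_first=True)
        # 5) multi-label classification head (one sigmoid score per label)
        self.classifier = nn.Linear(img_dim, num_labels)

    def forward(self, images, token_ids):
        img_feat = img = self.image_encoder(images)                 # (B, img_dim)
        txt_seq, _ = self.text_encoder(self.embedding(token_ids))   # (B, L, txt_dim)
        # the image feature attends over the text sequence
        fused, _ = self.cross_attn(img_feat.unsqueeze(1), txt_seq, txt_seq)
        return torch.sigmoid(self.classifier(fused.squeeze(1)))     # (B, num_labels)


# usage: a batch of 2 documents, each with one 224x224 image and 50 text tokens
model = CrossModalDocumentClassifier()
probs = model(torch.randn(2, 3, 224, 224), torch.randint(0, 30000, (2, 50)))
print(probs.shape)  # torch.Size([2, 20])
```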

[0076] 1. Preprocessing of multimodal document data

[0077] Represent the image-and-text multimodal document as D_d, denoting the d-th multimodal document data, where I_d is the multimodal document data list that ...
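Because the excerpt is truncated at this point, the following is only a hedged Python sketch of one plausible in-memory representation for the preprocessed documents D_d. The field names (doc_id, images, texts, labels) and the presence of a separate text list are assumptions; the excerpt itself only names the list I_d.

```python
# Hedged sketch of a container for one preprocessed multimodal document D_d.
# Field names are assumptions made for illustration, not the patent's notation.
from dataclasses import dataclass, field
from typing import List


@dataclass
class MultimodalDocument:
    doc_id: int                                        # index d of the document D_d
    images: List[str] = field(default_factory=list)    # I_d: image items of the document
    texts: List[str] = field(default_factory=list)     # assumed text segments of the document
    labels: List[str] = field(default_factory=list)    # course-field labels (multi-label)


# a toy corpus of preprocessed documents
corpus = [
    MultimodalDocument(doc_id=1,
                       images=["fig_1_1.png"],
                       texts=["Definition of a binary tree ..."],
                       labels=["data structures"]),
]
print(len(corpus), corpus[0].doc_id)
```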



Abstract

The invention relates to a course field multi-modal document classification method based on a cross-modal attention convolutional neural network. Multi-modal document data in the course field is preprocessed; an attention mechanism is combined with a dense convolutional network to provide a convolutional neural network based on cross-modal attention, so that image features with sparsity can be constructed more effectively; an attention-based bidirectional long short-term memory network oriented to text features is provided, so that text features locally associated with image semantics can be efficiently constructed; and cross-modal grouped fusion based on an attention mechanism is designed, so that the local association relationship between the images and the texts in a document can be learned more accurately and the accuracy of cross-modal feature fusion is improved. On a data set from the same course field, compared with existing multi-modal document classification models, the method achieves better performance and improves the accuracy of multi-modal document data classification.
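To make the grouped cross-modal fusion idea in the abstract concrete, here is a minimal sketch in PyTorch. "Grouped" is interpreted here as splitting each modality's feature vector into G groups and fusing them pairwise with attention weights; this grouping scheme and the layer choices are assumptions for illustration, not the patent's exact definition.

```python
# Hedged sketch of attention-weighted grouped cross-modal fusion (assumed scheme).
import torch
import torch.nn as nn
import torch.nn.functional as F


class GroupedCrossModalFusion(nn.Module):
    def __init__(self, dim=512, groups=4):
        super().__init__()
        assert dim % groups == 0
        self.groups, self.gdim = groups, dim // groups
        self.score = nn.Linear(2 * self.gdim, 1)   # attention score per (image, text) group pair

    def forward(self, img_feat, txt_feat):
        # split each modality's feature vector into G groups: (B, G, dim/G)
        img_g = img_feat.view(-1, self.groups, self.gdim)
        txt_g = txt_feat.view(-1, self.groups, self.gdim)
        # score every pair of image group and text group, then normalize over text groups
        pairs = torch.cat([img_g.unsqueeze(2).expand(-1, -1, self.groups, -1),
                           txt_g.unsqueeze(1).expand(-1, self.groups, -1, -1)], dim=-1)
        attn = F.softmax(self.score(pairs).squeeze(-1), dim=-1)   # (B, G, G)
        # each image group aggregates the text groups it attends to
        fused = img_g + torch.bmm(attn, txt_g)                    # (B, G, dim/G)
        return fused.flatten(1)                                   # (B, dim)


fusion = GroupedCrossModalFusion()
out = fusion(torch.randn(2, 512), torch.randn(2, 512))
print(out.shape)  # torch.Size([2, 512])
```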

Description

Technical field

[0001] The invention belongs to the fields of computer applications, multimodal data classification, educational data classification, image processing, and text processing, and in particular relates to a multimodal document classification method in the course field based on a cross-modal attention convolutional neural network.

Background technique

[0002] With the development of science and technology, the data to be processed by computers in various fields has changed from single images to multi-modal data such as images, text, and audio with richer forms and content. Classification of multimodal documents has applications in video classification, visual question answering, entity matching for social networks, etc. The accuracy of multimodal document classification depends on whether the computer can accurately understand the semantics and content of the images and text contained in a document. However, the images in multi-modal documents mixed with text ...


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06K9/00; G06K9/62; G06N3/04; G06N3/08
CPC: G06N3/049; G06N3/08; G06V30/413; G06N3/044; G06N3/045; G06F18/2415; G06F18/253
Inventors: 宋凌云, 俞梦真, 尚学群, 李建鳌, 彭杨柳, 李伟, 李战怀
Owner: NORTHWESTERN POLYTECHNICAL UNIV