Course field multi-modal document classification method based on cross-modal attention convolutional neural network
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Application (China)
- Current Assignee / Owner: NORTHWESTERN POLYTECHNICAL UNIV
- Publication Date: 2020-11-24
Abstract
Description
Technical field
[0001] The invention belongs to the fields of computer applications, multimodal data classification, educational data classification, image processing, and text processing, and in particular relates to a multimodal document classification method for the course field based on a cross-modal attention convolutional neural network.
Background art
[0002] With the development of science and technology, the data that computers must process in various fields has shifted from single images to multimodal data such as images, text, and audio, which are richer in form and content. Multimodal document classification has applications in video classification, visual question answering, entity matching for social networks, and similar tasks. Its accuracy depends on whether the computer can accurately understand the semantics and content of the images and text contained in the document. However, the images in multi-modal documents mixed with tex...