Course field multi-modal document classification method based on cross-modal attention convolutional neural network

Applies convolutional neural network and multimodal techniques to multimodal document classification in the course field, based on a cross-modal attention convolutional neural network. It addresses the difficulty of constructing semantic feature vectors for images and text, which reduces the accuracy of multimodal document feature expression and hinders performance on multimodal document classification tasks.

Active Publication Date: 2020-11-24

NORTHWESTERN POLYTECHNICAL UNIV

Cites: 9; Cited by: 36

Problems solved by technology

However, the images in course-field multimodal documents that mix text and graphics generally consist of lines and characters, so they are highly sparse in visual features such as color and texture. This sparsity, together with the semantic relationship between text and images in such documents, makes it difficult for existing multimodal document classification models to accurately construct semantic feature vectors for the images and text. The result is less accurate multimodal document feature representations and weaker performance on multimodal document classification tasks.

Embodiment Construction

[0074] The present invention is further described below with reference to the embodiments and the accompanying drawings:

[0075] The course-field multimodal document classification method based on the cross-modal attention convolutional neural network proposed by the present invention is divided into five main modules: 1) preprocessing of multimodal document data; 2) image feature construction with a dense convolutional neural network based on an attention mechanism; 3) text feature construction with a bidirectional long short-term memory network based on an attention mechanism; 4) grouped cross-modal fusion based on an attention mechanism; 5) multi-label classification of multimodal documents. The model diagram of the whole method is shown in Figure 1. The details are as follows:

[0076] 1. Preprocessing of multimodal document data

[0077] The multimodal documents with mixed images and text are represented symbolically, with one element denoting the d-th multimodal document. Here, I_d is the multimodal document data list that ...
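As an illustration of the fifth module listed in [0075] (multi-label classification), the following is a minimal sketch of one common formulation: an independent sigmoid per label, a decision threshold, and a binary cross-entropy loss. This extract does not give the patent's own classifier or loss, so the function names, the 0.5 threshold, and the loss choice are all assumptions made for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_labels(logits, threshold=0.5):
    """Return the indices of labels whose probability reaches the threshold.

    Each label is scored independently, so a document may receive
    zero, one, or several labels (assumed multi-label setup).
    """
    return [i for i, z in enumerate(logits) if sigmoid(z) >= threshold]


def binary_cross_entropy(logits, targets):
    """Mean binary cross-entropy over labels; targets are 0 or 1."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        total -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
    return total / len(logits)


# Hypothetical logits for three course labels.
print(predict_labels([2.0, -1.0, 0.0]))
print(binary_cross_entropy([2.0, -1.0, 0.0], [1, 0, 1]))
```

Scoring each label independently, instead of applying a softmax over all labels, is what allows a single document to belong to several classes at once.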

Abstract

The invention relates to a course-field multimodal document classification method based on a cross-modal attention convolutional neural network. Multimodal document data in the course field is first preprocessed. An attention mechanism is combined with a dense convolutional network to form a convolutional neural network based on cross-modal attention, which constructs sparse image features more effectively. A bidirectional long short-term memory network based on a text-feature-oriented attention mechanism is provided, so that text features locally associated with image semantics can be constructed efficiently. Cross-modal grouped fusion based on an attention mechanism is designed, so that the local association between the images and the text in a document can be learned more accurately, which improves the accuracy of cross-modal feature fusion. On the same course-field data set, the method performs better than existing multimodal document classification models and improves the accuracy of multimodal document classification.
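The attention-based cross-modal grouped fusion mentioned in the abstract can be pictured with a generic sketch. The code below is not the patent's formulation, which this extract does not give. It assumes scaled dot-product attention, a single text feature vector, a list of image-region feature vectors of the same dimension, and an even split of the feature dimension into groups. The names `attend` and `grouped_fusion` are hypothetical.

```python
import math


def softmax(scores):
    """Softmax with max-subtraction for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys):
    """Scaled dot-product attention of one query over a list of keys.

    Returns the attention-weighted sum of the keys and the weights.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    context = [
        sum(w * key[j] for w, key in zip(weights, keys))
        for j in range(len(keys[0]))
    ]
    return context, weights


def grouped_fusion(text_vec, image_regions, num_groups):
    """Fuse text and image features group by group (assumed scheme).

    The feature dimension is split into equal groups. Within each group
    the text slice attends over the matching slices of the image regions,
    and the text slice is concatenated with the attended image context.
    """
    dim = len(text_vec)
    if dim % num_groups != 0:
        raise ValueError("feature dimension must be divisible by num_groups")
    size = dim // num_groups
    fused = []
    for g in range(num_groups):
        lo, hi = g * size, (g + 1) * size
        context, _ = attend(text_vec[lo:hi], [r[lo:hi] for r in image_regions])
        fused.extend(text_vec[lo:hi])
        fused.extend(context)
    return fused


# Toy features: one 4-dimensional text vector and two image regions.
text = [1.0, 0.0, 0.0, 1.0]
regions = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
print(grouped_fusion(text, regions, num_groups=2))
```

Computing attention separately per group lets each slice of the text feature weight the image regions differently, which is one way to capture local rather than global text-image associations.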

Description

Technical field

[0001] The invention belongs to the fields of computer applications, multimodal data classification, educational data classification, image processing, and text processing, and in particular relates to a course-field multimodal document classification method based on a cross-modal attention convolutional neural network.

Background

[0002] With the development of science and technology, the data that computers must process in various fields has changed from single images to multimodal data, such as images, text, and audio, with richer forms and content. Multimodal document classification has applications in video classification, visual question answering, entity matching for social networks, and other areas. Its accuracy depends on whether the computer can accurately understand the semantics and content of the images and text contained in a document. However, the images in multi-modal documents mixed with tex...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06K9/00, G06K9/62, G06N3/04, G06N3/08

CPC: G06N3/049, G06N3/08, G06V30/413, G06N3/044, G06N3/045, G06F18/2415, G06F18/253

Inventor: 宋凌云, 俞梦真, 尚学群, 李建鳌, 彭杨柳, 李伟, 李战怀

Owner: NORTHWESTERN POLYTECHNICAL UNIV
