
Visual question-answering model training method and device

A visual question-answering model training technology, applied in the computer field, that addresses problems such as limited datasets, failure to consider the spatial semantic context information between image regions, and overly simple extraction and processing of image features and question features.

Active Publication Date: 2019-10-18
BEIJING KINGSOFT DIGITAL ENTERTAINMENT CO LTD +1
Cites: 18; Cited by: 11

AI Technical Summary

Problems solved by technology

[0004] However, existing visual question answering model training methods extract and process image features and question features too simply and do not consider the spatial semantic context information between image regions. In addition, current visual question answering (VQA) question datasets are limited, so models are generally in an overfitted state, which reduces the semantic context similarity between the obtained answer and the real answer.



Examples


Detailed Description of the Embodiments

[0092] In the following description, numerous specific details are set forth to provide a thorough understanding of the application. However, the present application can be implemented in many ways other than those described here, and those skilled in the art can make similar generalizations without departing from the spirit of the present application. Therefore, the present application is not limited to the specific implementations disclosed below.

[0093] Terms used in one or more embodiments of this specification are for the purpose of describing specific embodiments only and are not intended to limit one or more embodiments of this specification. As used in one or more embodiments of this specification and the appended claims, the singular forms "a", "an", and "the" are also intended to include the plural forms unless the context clearly dictates otherwise. It should also be understood that the term "and/or" used in one or more embodiments of the present sp...


Abstract

The invention provides a visual question-answering model training method and device, and relates to the technical field of computers. The visual question-answering model training method comprises the steps of: obtaining a training sample and a sample label; extracting sample image feature information and sample question feature information; performing feature cross processing on the sample image feature information and the sample question feature information to obtain a sample image feature vector carrying the sample question information and a sample question feature vector carrying the sample image information; inputting the sample image feature vector carrying the sample question information and the sample question feature vector carrying the sample image information into the visual question-answering model to obtain a predicted answer through the visual question-answering model; determining a loss value of a loss function based on the real answer and the predicted answer; and updating the visual question-answering model through the loss value of the loss function.
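The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented method: the abstract does not say how "feature cross processing" is computed, so the sketch assumes a simple cross-attention between image-region features and question-word features. The function names `feature_cross` and `train_step`, the mean pooling, the linear answer classifier `W`, and the cross-entropy loss are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def feature_cross(img, qst):
    """Assumed feature cross: cross-attention in both directions.

    img: (R, d) image-region features; qst: (T, d) question-word features.
    Returns image features carrying question information, shape (R, d),
    and question features carrying image information, shape (T, d).
    """
    scores = img @ qst.T / np.sqrt(img.shape[1])        # (R, T) affinities
    img_with_q = img + softmax(scores, axis=1) @ qst    # each region attends to words
    qst_with_i = qst + softmax(scores.T, axis=1) @ img  # each word attends to regions
    return img_with_q, qst_with_i

def train_step(W, img, qst, answer_idx, lr=0.05):
    """One update of a hypothetical linear answer classifier W of shape (A, 2d)."""
    img_x, qst_x = feature_cross(img, qst)
    fused = np.concatenate([img_x.mean(axis=0), qst_x.mean(axis=0)])  # (2d,)
    probs = softmax(W @ fused)                # predicted answer distribution
    loss = -np.log(probs[answer_idx])         # cross-entropy against the real answer
    grad_logits = probs.copy()
    grad_logits[answer_idx] -= 1.0            # d(loss)/d(logits)
    W_new = W - lr * np.outer(grad_logits, fused)  # gradient-descent update
    return W_new, loss

# Toy data standing in for extracted sample features (random, for illustration only).
rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))   # 4 image regions, feature size 8
qst = rng.normal(size=(5, 8))   # 5 question words, feature size 8
W = np.zeros((3, 16))           # 3 candidate answers
losses = []
for _ in range(20):
    W, loss = train_step(W, img, qst, answer_idx=1)
    losses.append(loss)
```

In this sketch only the classifier weights are updated; a fuller model would also backpropagate the loss into the feature extractors and the cross step.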

Description

Technical field

[0001] The present application relates to the field of computer technology, in particular to a visual question answering model training method and device, a computing device, and a computer-readable storage medium.

Background technique

[0002] Visual Question Answering (VQA) is a comprehensive task involving computer vision and natural language processing. A VQA system takes a picture and a free-form, open-ended natural language question about the picture as input, and generates a natural language answer as output.

[0003] At present, existing visual question answering model training methods generally first extract features of the image to be answered through a pre-trained deep convolutional neural network (CNN) model and convert the question into several word vectors; the image features and the question word vectors are then input together into a long short-term memory network (LSTM), and the LSTM network is used to generate the answer...
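The conventional pipeline described in [0003] can be illustrated with a small forward pass. This is a sketch under assumptions, not the prior-art implementation: the CNN is replaced by a random stand-in feature vector, the weights are random rather than trained, and instead of generating a free-form answer the final LSTM state is scored against a small fixed answer set. The names `lstm_step` and `baseline_vqa_forward` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,)."""
    H = h.shape[0]
    z = W @ x + U @ h + b
    i = sigmoid(z[0:H])            # input gate
    f = sigmoid(z[H:2 * H])        # forget gate
    o = sigmoid(z[2 * H:3 * H])    # output gate
    g = np.tanh(z[3 * H:4 * H])    # candidate cell state
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def baseline_vqa_forward(image_feature, word_vectors, params):
    """Feed the image feature, then each question word vector, through the LSTM."""
    W, U, b, W_out = params
    H = U.shape[1]
    h = np.zeros(H)
    c = np.zeros(H)
    for x in [image_feature, *word_vectors]:
        h, c = lstm_step(x, h, c, W, U, b)
    return W_out @ h               # one score per candidate answer

# Random stand-ins for CNN image features and question word vectors.
rng = np.random.default_rng(0)
D, H, A = 6, 5, 4                  # input size, hidden size, number of answers
params = (
    rng.normal(size=(4 * H, D)) * 0.1,
    rng.normal(size=(4 * H, H)) * 0.1,
    np.zeros(4 * H),
    rng.normal(size=(A, H)),
)
image_feature = rng.normal(size=D)
word_vectors = rng.normal(size=(3, D))
scores = baseline_vqa_forward(image_feature, word_vectors, params)
predicted_answer = int(np.argmax(scores))
```

Here the image enters the LSTM as a single vector alongside the question words, so there is no modelling of relations between image regions.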


Application Information

IPC(8): G06K9/62
CPC: G06F18/243, G06F18/253
Inventor: 李长亮, 詹华年, 丁洪利, 唐剑波
Owner: BEIJING KINGSOFT DIGITAL ENTERTAINMENT CO LTD