Incremental data enhancement method for visual question and answer model training and application

An incremental data enhancement technology, applied in the field of model training. It addresses problems caused by the diversity of semantic expressions, such as a larger amount of data, answer conflicts, greater difficulty of reasoning and poor recognition results, and it achieves data diversity and improved classification accuracy.

Active Publication Date: 2020-11-20
TONGJI UNIV

AI Technical Summary

Problems solved by technology

[0003] Among them, the diversity of semantic expressions increases the amount of data and the possibility of answer conflicts, thus increasing the difficulty of reasoning. Therefore, the diversity of semantic expres...



Examples


Embodiment 1

[0037] This embodiment provides an incremental data enhancement method for visual question answering model training. The method includes a data statistics step, a threshold determination step and a data expansion step. Specifically: obtain the original training data set, in which each training sample has the form (image, text, answer) and the text is a natural language sequence; obtain the sentence length distribution of the natural language sequences in the original training data set and the word frequency distribution of each word, and determine a minimum sentence length threshold and a maximum sentence length threshold based on the sentence length distribution; then expand the natural language sequences in the training samples according to the minimum sentence length threshold, the maximum sentence length threshold and the word frequency distribution to achieve data enhancement.

[0038] Data statistics include the sentence length distribution and the word frequency distribution of the text ...
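The three steps of Embodiment 1 can be sketched roughly as follows. This is a minimal illustration, not the patented procedure: the whitespace tokenizer, the cumulative-share rule in `length_thresholds` (with its `lower` and `upper` parameters) and the word-dropping rule in `expand` are all assumptions introduced here, since the excerpt does not say how the thresholds are derived or how sequences are expanded.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization with "?" stripped.
    return text.lower().replace("?", " ").split()


def collect_statistics(samples):
    """Data statistics step: sentence length and word frequency distributions.

    samples is a list of (image, text, answer) tuples.
    """
    length_counts = Counter()
    word_counts = Counter()
    for _image, text, _answer in samples:
        tokens = tokenize(text)
        length_counts[len(tokens)] += 1
        word_counts.update(tokens)
    return length_counts, word_counts


def length_thresholds(length_counts, lower=0.05, upper=0.95):
    """Threshold determination step (hypothetical rule).

    Returns the sentence lengths at which the cumulative share of
    samples first reaches `lower` and `upper`.
    """
    total = sum(length_counts.values())
    cumulative = 0
    min_len = max_len = None
    for length in sorted(length_counts):
        cumulative += length_counts[length]
        share = cumulative / total
        if min_len is None and share >= lower:
            min_len = length
        if max_len is None and share >= upper:
            max_len = length
    return min_len, max_len


def expand(samples, min_len, max_len, word_counts):
    """Data expansion step (hypothetical rule).

    For each question whose length lies in (min_len, max_len], add one
    variant with its most frequent word removed, keeping image and answer.
    """
    expanded = list(samples)
    for image, text, answer in samples:
        tokens = tokenize(text)
        if not (min_len < len(tokens) <= max_len):
            continue
        drop = max(range(len(tokens)), key=lambda i: word_counts[tokens[i]])
        variant = tokens[:drop] + tokens[drop + 1:]
        expanded.append((image, " ".join(variant), answer))
    return expanded


samples = [
    ("img1", "what color is the cat", "black"),
    ("img2", "is the dog running", "yes"),
    ("img3", "how many people are in the picture", "3"),
    ("img4", "what is this", "cake"),
]
length_counts, word_counts = collect_statistics(samples)
min_len, max_len = length_thresholds(length_counts, lower=0.25, upper=0.75)
augmented = expand(samples, min_len, max_len, word_counts)
```

Keeping the original samples in the output and only appending variants means the expanded set is a superset of the original, which matches the "incremental" framing; the actual expansion rule would need to come from the full patent text.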

Embodiment 2

[0044] This embodiment provides a training method for a visual question answering model, which adopts end-to-end training. The process of the training method is shown in Figure 1 and includes:

[0045] (1) Model initialization.

[0046] (2) Expand the original training data set with the incremental data enhancement method described in Embodiment 1 to obtain the expanded training data set, thereby achieving text data enhancement.

[0047] (3) Feature extraction is performed on the training samples in the expanded training data set to obtain text features and image features.

[0048] During model training, the text language sequence is truncated so that its length is less than the maximum length limit of the sequential neural network model. The sequence is then sent to the lookup table module, and the output is sent to the sequential neural network to extract the text features. In the test stage, the original text language sequence...
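The training-time text path described above (truncate, then look up) might look like the sketch below. The names `build_vocab`, `encode_for_training` and `model_limit`, the `<pad>` and `<unk>` entries, and the toy embedding table are assumptions made for illustration; the sequential network itself is omitted, and the test-stage handling is left out because the source text is cut off at that point.

```python
def build_vocab(token_lists, pad="<pad>", unk="<unk>"):
    """Build a word-to-index lookup table from tokenized training text."""
    vocab = {pad: 0, unk: 1}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def encode_for_training(tokens, vocab, model_limit, unk="<unk>"):
    """Truncate a token sequence and map it to vocabulary indices.

    The sequence is cut to model_limit - 1 tokens so that its length
    stays strictly below the sequential model's maximum length limit.
    """
    truncated = tokens[:model_limit - 1]
    return [vocab.get(token, vocab[unk]) for token in truncated]


def lookup(indices, table):
    """Lookup table module: fetch one embedding row per index."""
    return [table[i] for i in indices]


tokens = "what color is the cat on the sofa".split()
vocab = build_vocab([tokens])
indices = encode_for_training(tokens, vocab, model_limit=6)

# Toy 2-dimensional embedding table; a trained model would learn these rows.
table = [[float(i), float(-i)] for i in range(len(vocab))]
embedded = lookup(indices, table)  # would be fed to the sequential network
```

Words missing from the vocabulary fall back to the `<unk>` index, so the lookup never fails on unseen text.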


Abstract

The invention relates to an incremental data enhancement method for visual question answering model training. The method comprises the steps of: obtaining an original training data set, wherein each training sample in the data set has the form (image, text, answer) and the text is formed by a natural language sequence; obtaining the sentence length distribution of the natural language sequences in the original training data set and the word frequency distribution of each word, and determining a minimum sentence length threshold and a maximum sentence length threshold based on the sentence length distribution; and expanding the natural language sequences in the training samples according to the minimum sentence length threshold, the maximum sentence length threshold and the word frequency distribution to achieve data enhancement. Compared with the prior art, the method has the advantages of data diversity, good efficiency and simplicity.

Description

Technical field

[0001] The invention relates to a model training method, in particular to an incremental data enhancement method for visual question answering model training and its application.

Background

[0002] In recent years, with the popularization of mobile devices and growing user demand, the volume of visual data presented to people has grown explosively, and demand for visual question answering systems that can answer questions about it has continued to rise. A visual question answering system aims to help interpret visual information according to a person's description of what they need, which involves question understanding, object retrieval, localization and reasoning. Compared with other cross-modal tasks such as visual description, the development of visual question answering is still limited by the contradiction between an infinite search space and incomplete training data, the contradiction between statistical reasoning and...


Application Information

IPC(8): G06K9/62, G06F40/216, G06F16/332, G06N3/04
CPC: G06F40/216, G06F16/3329, G06N3/045, G06F18/214, G06F18/25, G06F18/253
Inventors: 王瀚漓 (Wang Hanli), 龙宇 (Long Yu)
Owner: TONGJI UNIV