
Multi-modal pre-training model training method, application method and device thereof

A multi-modal pre-training technology, applied in the fields of neural learning methods, biological neural network models, and special data processing applications. It addresses the problem of low accuracy, achieving the effects of improved accuracy and easier deployment.

Pending Publication Date: 2021-06-18
北京智源人工智能研究院 (Beijing Academy of Artificial Intelligence) +1
Cites: 0 · Cited by: 20

AI Technical Summary

Problems solved by technology

[0004] However, in practical applications, the image and the text in an image-text pair may not be strongly semantically correlated. The related technologies described above judge whether a text and an image correspond through semantic correspondences between individual words and the image, so their accuracy is very low.

Method used




Embodiment Construction

[0061] Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this application will be understood thoroughly and its scope fully conveyed to those skilled in the art.

[0062] It should be noted that, unless otherwise defined, technical or scientific terms used in this application shall have the meanings commonly understood by those skilled in the art to which this application belongs.

[0063] A training method, an application method, and a device for a multi-modal pre-training model according to embodiments of the present application will be described below with reference to the accompanying drawings. ...



Abstract

The invention provides a multi-modal pre-training model training method, together with an application method and a device. The training method comprises the following steps: constructing a multi-modal pre-training model with a dual-tower structure; obtaining a positive sample data set comprising positive image-text pairs and a negative sample data set comprising negative image-text pairs; and training the multi-modal pre-training model on the positive and negative sample data sets, wherein the model comprises a cross-modal contrastive learning module that performs image-text similarity contrastive learning on the positive and negative image-text pairs. Because the model adopts a dual-tower structure and a cross-modal contrastive learning algorithm, a large number of negative samples can be constructed for the image and text modalities, giving the model strong expressive ability and improving its processing accuracy on image-text pairs. The model computes the overall similarity between an image and a text and judges whether they correspond according to that similarity; built on the weak image-text correlation hypothesis, it better fits the reality that the image and the text in real-world image-text pairs are often only weakly semantically correlated.
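The dual-tower design described in the abstract encodes images and texts with separate towers into a shared space, then contrasts each positive pair against many negatives. The sketch below is a minimal illustration of this general scheme (a symmetric InfoNCE-style loss with in-batch negatives), not the patent's actual architecture; all dimensions, weights, and the temperature value are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W):
    """Toy single-layer 'tower': project raw features and L2-normalize rows."""
    z = x @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)

# Hypothetical sizes: 8 image-text pairs mapped into a 16-d joint space.
batch, d_img, d_txt, d_joint = 8, 32, 24, 16
images = rng.normal(size=(batch, d_img))   # stand-ins for raw image features
texts = rng.normal(size=(batch, d_txt))    # stand-ins for raw text features
W_img = rng.normal(size=(d_img, d_joint))  # image-tower weights
W_txt = rng.normal(size=(d_txt, d_joint))  # text-tower weights

img_emb = encode(images, W_img)
txt_emb = encode(texts, W_txt)

# Cross-modal similarity matrix: entry (i, j) compares image i with text j.
# Diagonal entries are the positive pairs; every off-diagonal entry acts
# as a negative sample, so one batch yields many negatives per positive.
tau = 0.07  # temperature, an assumed value
logits = img_emb @ txt_emb.T / tau

def info_nce(logits):
    """InfoNCE loss: each image should rank its own text above all negatives."""
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# Symmetric loss: image-to-text and text-to-image directions.
loss = (info_nce(logits) + info_nce(logits.T)) / 2
```

Training would minimize `loss` with respect to the tower weights; because the two towers never attend to each other, each modality can be encoded independently at inference time, which eases deployment.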

Description

Technical field

[0001] This application belongs to the field of computer application technology, and in particular relates to a training method, an application method, and a device for a multi-modal pre-training model.

Background technique

[0002] In recent years, pre-training models have become a research hotspot in the field of Natural Language Processing (NLP). Multi-modal pre-training models, which involve interactions among multiple modalities of information, suit a wider range of application scenarios; for example, multi-modal pre-training models for image-text pairs have gradually attracted widespread attention.

[0003] At present, related technologies provide some multi-modal pre-training models for processing image-text pairs. These models assume a strong semantic correlation between the text and the image in an input image-text pair, and judge whether they correspond by checking whether there is a semantic correspondence between the words in...
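In contrast to the word-level matching described in the background, the abstract's approach judges correspondence from one overall similarity score between a whole-image embedding and a whole-text embedding. A minimal sketch of that decision rule, with an assumed (hypothetical) threshold:

```python
import numpy as np

def global_similarity(img_emb, txt_emb):
    """Cosine similarity between a whole-image and a whole-text embedding."""
    img = img_emb / np.linalg.norm(img_emb)
    txt = txt_emb / np.linalg.norm(txt_emb)
    return float(img @ txt)

def is_matching_pair(img_emb, txt_emb, threshold=0.3):
    # threshold is an illustrative assumption; in practice it would be
    # tuned per task rather than fixed.
    return global_similarity(img_emb, txt_emb) >= threshold
```

Because the score aggregates the whole image and the whole text, a pair can still match under weak word-level correspondence, which is the weak-correlation setting the abstract targets.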

Claims


Application Information

Patent Timeline
No application timeline available.
IPC(8): G06K9/62; G06N3/04; G06N3/08; G06F16/583; G06F16/33
CPC: G06N3/08; G06F16/583; G06F16/334; G06V2201/07; G06N3/045; G06F18/22; G06F18/214; Y02D10/00
Inventor: 霍宇琦, 张曼黎, 刘光镇, 卢志武, 窦志成, 金琴, 赵鑫, 宋睿华, 文继荣
Owner: 北京智源人工智能研究院 (Beijing Academy of Artificial Intelligence)