Multi-modal pre-training model training method, application method and device thereof

A multi-modal pre-training technology, applied in the fields of neural learning methods, biological neural network models, special data processing applications, etc., that addresses the problem of low accuracy and achieves the effects of improving accuracy and facilitating deployment.

Pending Publication Date: 2021-06-18
北京智源人工智能研究院 +1

AI Technical Summary

Problems solved by technology

[0004] However, in practical applications, there may not be a strong semantic correlation between the image and the text in an image-text pair. The above-menti




Example Embodiment

[0061] Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application can be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present application can be understood more thoroughly and its scope can be fully conveyed to those skilled in the art.

[0062] It should be noted that, unless otherwise stated, the technical or scientific terms used in this application have the ordinary meaning understood by those skilled in the art to which this application belongs.

[0063] The training method, application method, and device of a multi-modal pre-training model proposed in accordance with the embodiments of the present application will be described below with reference to the accompanying drawings.

[0064] An embodiment of the present application provides a multi-modal pre-training model training method, which calculat...



Abstract

The invention provides a training method for a multi-modal pre-training model, together with an application method and a device thereof. The training method comprises the following steps: constructing a multi-modal pre-training model with a two-tower structure; obtaining a positive sample data set comprising positive image-text pairs and a negative sample data set comprising negative image-text pairs; and training the multi-modal pre-training model on the positive and negative sample data sets. The model comprises a cross-modal contrastive learning module that performs image-text similarity contrastive learning on the positive and negative image-text pairs. Because the model adopts a two-tower structure and a cross-modal contrastive learning algorithm, and a large number of negative samples are constructed for both the image and text modalities, the model has strong representational ability and processes image-text pairs more precisely. The model calculates the overall similarity between an image and a text and judges from that similarity whether they correspond. Being based on the assumption of weak image-text correlation, it better fits the weak semantic correlation between images and texts found in image-text pairs in practical applications.
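The abstract does not give the loss function or the decision rule, so the following is only a minimal sketch of one common way to realize cross-modal contrastive learning over two-tower embeddings. It assumes an InfoNCE-style loss over cosine similarities; the function names, the `temperature` value and the `threshold` value are all hypothetical, and the two encoders (towers) themselves are not shown: the vectors stand in for their outputs.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Overall similarity between one image embedding and one text embedding."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def info_nce_loss(query: Vector, positive: Vector,
                  negatives: Sequence[Vector],
                  temperature: float = 0.07) -> float:
    """InfoNCE-style loss (an assumption, not taken from the source):
    low when the query is closer to its positive than to every negative."""
    logits = [cosine_similarity(query, positive) / temperature]
    logits += [cosine_similarity(query, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over the positive and all negatives.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[0]


def cross_modal_loss(image_emb: Vector, text_emb: Vector,
                     negative_texts: Sequence[Vector],
                     negative_images: Sequence[Vector],
                     temperature: float = 0.07) -> float:
    """Sum of the image-to-text and text-to-image contrastive terms."""
    i2t = info_nce_loss(image_emb, text_emb, negative_texts, temperature)
    t2i = info_nce_loss(text_emb, image_emb, negative_images, temperature)
    return i2t + t2i


def is_matching_pair(image_emb: Vector, text_emb: Vector,
                     threshold: float = 0.5) -> bool:
    """Judge correspondence from overall similarity (hypothetical threshold)."""
    return cosine_similarity(image_emb, text_emb) >= threshold


# Toy embeddings standing in for the outputs of the image and text towers.
image = [1.0, 0.0]
matching_text = [0.9, 0.1]
unrelated_text = [0.0, 1.0]

good = cross_modal_loss(image, matching_text,
                        negative_texts=[unrelated_text],
                        negative_images=[[0.0, 1.0]])
bad = cross_modal_loss(image, unrelated_text,
                       negative_texts=[matching_text],
                       negative_images=[[0.0, 1.0]])
print(good < bad)
print(is_matching_pair(image, matching_text))
print(is_matching_pair(image, unrelated_text))
```

Because each tower produces its embedding independently in this sketch, embeddings for one modality can be computed ahead of time and compared against new inputs from the other modality with a single similarity computation.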

Description

Technical field

[0001] The application belongs to the field of computer application technology, and in particular relates to a training method, an application method and a device for a multi-modal pre-training model.

Background

[0002] In recent years, pre-training models have become a hot research topic in the field of Natural Language Processing (NLP). Multi-modal pre-training models, which involve interaction between information from multiple modalities, suit a wider range of application scenarios. For example, multi-modal pre-training models for image-text pairs have gradually attracted widespread attention.

[0003] At present, related technologies provide some multi-modal pre-training models for processing image-text pairs. These models assume that there is a strong semantic correlation between the text and the image in an input image-text pair. By judging whether there is a semantic correspondence between the words in...


Application Information

IPC(8): G06K9/62, G06N3/04, G06N3/08, G06F16/583, G06F16/33
CPC: G06N3/08, G06F16/583, G06F16/334, G06V2201/07, G06N3/045, G06F18/22, G06F18/214
Inventor: 霍宇琦, 张曼黎, 刘光镇, 卢志武, 窦志成, 金琴, 赵鑫, 宋睿华, 文继荣
Owner: 北京智源人工智能研究院