
Multi-modal pre-training model training method, application method and device thereof

A multi-modal pre-training technology, applied in the fields of neural learning methods, biological neural network models, and special data processing applications. It addresses the problem of low accuracy, achieving the effects of improved accuracy and easier deployment.

Pending Publication Date: 2021-06-18
北京智源人工智能研究院 (Beijing Academy of Artificial Intelligence) +1
Cites: 0 · Cited by: 20

AI Technical Summary

Problems solved by technology

[0004] However, in practical applications, the image and the text in an image-text pair may not be strongly semantically correlated. The related technologies described above judge whether a text and an image correspond through semantic correspondences between individual words and the image, so their accuracy is very low.

Method used




Embodiment Construction

[0061] Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this application will be understood thoroughly and its scope fully conveyed to those skilled in the art.

[0062] It should be noted that, unless otherwise defined, technical or scientific terms used in this application shall have the meanings commonly understood by those skilled in the art to which this application belongs.

[0063] A training method, an application method, and a device for a multi-modal pre-training model according to embodiments of the present application will be described below with reference to the accompanying drawings. ...



Abstract

The invention provides a multi-modal pre-training model training method, together with an application method and a device. The training method comprises the following steps: constructing a multi-modal pre-training model with a dual-tower structure; obtaining a positive sample data set comprising positive image-text pairs and a negative sample data set comprising negative image-text pairs; and training the multi-modal pre-training model on the positive and negative sample data sets, wherein the model comprises a cross-modal contrastive learning module that performs image-text similarity contrastive learning on the positive and negative image-text pairs. Because the model adopts a dual-tower structure and a cross-modal contrastive learning algorithm, a large number of negative samples can be constructed for the image and text modalities, giving the model strong expressive ability and improving its processing accuracy on image-text pairs. The model computes the overall similarity between an image and a text and judges whether they correspond according to that similarity; built on the weak image-text correlation hypothesis, it better fits the reality that the image and the text in real-world image-text pairs are often only weakly semantically correlated.
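The dual-tower design described in the abstract encodes images and texts with separate towers into a shared space, then contrasts each positive pair against many negatives. The sketch below is a minimal illustration of this general scheme (a symmetric InfoNCE-style loss with in-batch negatives), not the patent's actual architecture; all dimensions, weights, and the temperature value are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W):
    """Toy single-layer 'tower': project raw features and L2-normalize rows."""
    z = x @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)

# Hypothetical sizes: 8 image-text pairs mapped into a 16-d joint space.
batch, d_img, d_txt, d_joint = 8, 32, 24, 16
images = rng.normal(size=(batch, d_img))   # stand-ins for raw image features
texts = rng.normal(size=(batch, d_txt))    # stand-ins for raw text features
W_img = rng.normal(size=(d_img, d_joint))  # image-tower weights
W_txt = rng.normal(size=(d_txt, d_joint))  # text-tower weights

img_emb = encode(images, W_img)
txt_emb = encode(texts, W_txt)

# Cross-modal similarity matrix: entry (i, j) compares image i with text j.
# Diagonal entries are the positive pairs; every off-diagonal entry acts
# as a negative sample, so one batch yields many negatives per positive.
tau = 0.07  # temperature, an assumed value
logits = img_emb @ txt_emb.T / tau

def info_nce(logits):
    """InfoNCE loss: each image should rank its own text above all negatives."""
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# Symmetric loss: image-to-text and text-to-image directions.
loss = (info_nce(logits) + info_nce(logits.T)) / 2
```

Training would minimize `loss` with respect to the tower weights; because the two towers never attend to each other, each modality can be encoded independently at inference time, which eases deployment.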

Description

Technical field

[0001] This application belongs to the field of computer application technology, and in particular relates to a training method, an application method, and a device for a multi-modal pre-training model.

Background technique

[0002] In recent years, pre-training models have become a research hotspot in the field of Natural Language Processing (NLP). Multi-modal pre-training models, which involve interactions among multiple modalities of information, suit a wider range of application scenarios; for example, multi-modal pre-training models for image-text pairs have gradually attracted widespread attention.

[0003] At present, related technologies provide some multi-modal pre-training models for processing image-text pairs. These models assume a strong semantic correlation between the text and the image in an input image-text pair, and judge whether they correspond by checking whether there is a semantic correspondence between the words in...
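In contrast to the word-level matching described in the background, the abstract's approach judges correspondence from one overall similarity score between a whole-image embedding and a whole-text embedding. A minimal sketch of that decision rule, with an assumed (hypothetical) threshold:

```python
import numpy as np

def global_similarity(img_emb, txt_emb):
    """Cosine similarity between a whole-image and a whole-text embedding."""
    img = img_emb / np.linalg.norm(img_emb)
    txt = txt_emb / np.linalg.norm(txt_emb)
    return float(img @ txt)

def is_matching_pair(img_emb, txt_emb, threshold=0.3):
    # threshold is an illustrative assumption; in practice it would be
    # tuned per task rather than fixed.
    return global_similarity(img_emb, txt_emb) >= threshold
```

Because the score aggregates the whole image and the whole text, a pair can still match under weak word-level correspondence, which is the weak-correlation setting the abstract targets.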

Claims


Application Information

Patent Timeline
No application timeline available.
IPC(8): G06K9/62; G06N3/04; G06N3/08; G06F16/583; G06F16/33
CPC: G06N3/08; G06F16/583; G06F16/334; G06V2201/07; G06N3/045; G06F18/22; G06F18/214; Y02D10/00
Inventor: 霍宇琦, 张曼黎, 刘光镇, 卢志武, 窦志成, 金琴, 赵鑫, 宋睿华, 文继荣
Owner: 北京智源人工智能研究院 (Beijing Academy of Artificial Intelligence)