Multi-modal event representation learning method based on event ontology

An event representation learning technology in the field of multi-modal data representation learning, addressing problems such as long semantic distances between related events, a lack of semantic information for events, and the insufficient expressive ability of single-modal event representations, with the effect of enhancing semantics and overcoming these limitations.

Pending Publication Date: 2022-03-18
SHANGHAI UNIV


Problems solved by technology

[0004] Aiming at the deficiencies of the prior art, the present invention provides a multi-modal event representation learning method based on event ontology. The method provides an embedded representation for lower-layer applications such as event reasoning and decision-making; it can solve the problems of insufficient single-modal event representation capacity and the lack of semantic information between events, while ensuring that the semantic distance between similar or related events is closer and the semantic distance between unrelated or dissimilar events is farther.


Examples


Embodiment 1

[0104] As shown in Figure 1, the multi-modal event representation learning method based on event ontology of the present invention may include the following steps:

[0105] Step S101: construct the event ontology of the relevant field and form an event ontology model;

[0106] Step S102: preprocess the initial multi-modal event information data and, according to the language expressions defined in the event ontology, group different events into different event classes;

[0107] Step S103: use the ontology model and a multi-modal pre-training model such as ViLBERT to obtain embedded representations of the multi-modal event data;

[0108] Step S104: use a graph convolutional neural network model such as GCN or GAT to learn the graph structure formed by the events and the event-event relationships in the ontology (a sketch of this propagation step follows below).
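As a rough illustration of the graph learning in step S104 (not the patent's own implementation), the following sketch applies one plain adjacency-matrix GCN layer to event-node embeddings; the class name, dimensions, and toy graph are assumptions for illustration.

```python
import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    """One GCN layer: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, h, adj):
        # Add self-loops, then symmetrically normalize the adjacency matrix.
        a_hat = adj + torch.eye(adj.size(0))
        deg_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)
        a_norm = deg_inv_sqrt.unsqueeze(1) * a_hat * deg_inv_sqrt.unsqueeze(0)
        # Propagate neighbor information and apply a learned transformation.
        return torch.relu(self.linear(a_norm @ h))

# Toy usage: 4 event nodes with 8-dim embeddings; edges stand in for
# ontology relations between events.
h = torch.randn(4, 8)                      # event embeddings (e.g. from step S103)
adj = torch.tensor([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=torch.float)
layer = SimpleGCNLayer(8, 8)
print(layer(h, adj).shape)                 # torch.Size([4, 8])
```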

[0109] In step S101, first collect texts of the relevant domain and build a corpus; by learning from the corpus, extract the expressed events and event elements from the domain text content...
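A minimal sketch of how the events and event elements extracted in step S101 might be organized into an ontology model; the dataclass fields and example values are assumptions, not the patent's own schema.

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    """An event instance extracted from domain text."""
    text: str
    elements: dict = field(default_factory=dict)      # e.g. {"agent": ..., "time": ...}

@dataclass
class EventClass:
    """An event class in the ontology, grouping similar events."""
    name: str
    expressions: list = field(default_factory=list)   # language expressions defining the class
    instances: list = field(default_factory=list)     # Event objects assigned in step S102

# Toy usage with a hypothetical domain event class.
quake = EventClass(name="Earthquake",
                   expressions=["earthquake", "seismic event", "tremor"])
quake.instances.append(Event(text="A 6.0 quake struck the coast",
                             elements={"magnitude": "6.0", "location": "coast"}))
```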

Embodiment 2

[0118] On the basis of Embodiment 1, the preprocessing of the initial multi-modal event information data in step S102, which groups different events into different event classes according to the language expressions defined in the event ontology, specifically includes the following steps:

[0119] 1. Use a target detection algorithm such as Faster-RCNN to extract the object regions in the picture information, together with the feature representations of those regions, as the image input of ViLBERT;

[0120] 2. Extract the core keywords of an event from the text information and perform cosine similarity calculation and classification against the language expressions defined in the ontology (a code sketch follows the formula), specifically satisfying the following formula:

[0121] $\mathrm{Sim}(e_c, \alpha) = \dfrac{e_c \cdot \alpha}{\lVert e_c \rVert \, \lVert \alpha \rVert}$

[0122] where $e_c$ denotes the n-dimensional embedded representation of a certain event class, $\alpha$ denotes the embedded representation obtained by averaging the embeddings of the event text keywords, and $\mathrm{Sim}(e_c, \alpha)$ indicates the similarity between them.
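A minimal sketch of this keyword-based classification step, assuming plain numpy vectors; the function names and the random toy embeddings are illustrative.

```python
import numpy as np

def cosine_sim(e_c: np.ndarray, alpha: np.ndarray) -> float:
    """Sim(e_c, alpha) = (e_c . alpha) / (||e_c|| * ||alpha||)."""
    return float(e_c @ alpha / (np.linalg.norm(e_c) * np.linalg.norm(alpha)))

def classify_event(keyword_embs: np.ndarray, class_embs: dict) -> str:
    """Average the event's keyword embeddings, then assign the most similar class."""
    alpha = keyword_embs.mean(axis=0)          # averaged keyword embedding
    return max(class_embs, key=lambda name: cosine_sim(class_embs[name], alpha))

# Toy usage with random 16-dim embeddings for two hypothetical event classes.
rng = np.random.default_rng(0)
classes = {"Earthquake": rng.normal(size=16), "Flood": rng.normal(size=16)}
keywords = rng.normal(size=(3, 16))            # embeddings of 3 extracted keywords
print(classify_event(keywords, classes))
```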

Embodiment 3

[0125] On the basis of Embodiment 2, obtaining the embedded representations of the multi-modal event data with the ontology model and a multi-modal pre-training model such as ViLBERT in step S103 may include the following steps:

[0126] 1. Obtain the trained ViLBERT model, and use the regional image feature representations and text information extracted in Embodiment 2 as the input of the model;

[0127] 2. Concatenate and fuse the embedded representation generated by the ViLBERT model from the input data with the embedded representation of the event class to which the event belongs, obtaining a new embedded representation (a sketch follows below).
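A minimal sketch of the concatenation-and-fusion step, assuming the ViLBERT output and the class embedding are plain tensors; the embedding sizes and the linear fusion layer are assumptions for illustration.

```python
import torch
import torch.nn as nn

vilbert_emb = torch.randn(1, 768)     # event embedding from the multi-modal model
class_emb = torch.randn(1, 128)       # embedding of the event class from the ontology

# Splice (concatenate) the two representations, then fuse to a fixed size.
fused_in = torch.cat([vilbert_emb, class_emb], dim=-1)   # shape: (1, 896)
fuse = nn.Linear(896, 512)                               # assumed fusion layer
new_emb = torch.relu(fuse(fused_in))
print(new_emb.shape)                                     # torch.Size([1, 512])
```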



Abstract

The invention relates to the technical field of representation learning of multi-modal data, in particular to a multi-modal event representation learning method based on event ontology. According to the technical scheme, the method comprises the following steps: constructing an event ontology of the relevant fields and forming an event ontology model; preprocessing the initial multi-modal event information data, and grouping different events into different event classes according to the language expressions defined in the event ontology; obtaining embedded representations of the multi-modal event data by means of the ontology model and a multi-modal pre-training model; and adopting a graph convolutional neural network model to learn the graph structure composed of the events and the event-event relationships in the ontology. The method can solve the problems of insufficient single-modal event expression ability and a lack of semantic information between events, while ensuring that the semantic distance between similar or related events is closer and the semantic distance between irrelevant or dissimilar events is farther.

Description

Technical field

[0001] The invention relates to the technical field of multi-modal data representation learning, in particular to a multi-modal event representation learning method based on event ontology.

Background technique

[0002] With the rapid development and popularization of the Internet and artificial intelligence, information on the Internet is becoming diversified and is no longer of a single modality. How to make full use of multi-modal information to achieve more intelligent analysis is a problem that current artificial intelligence urgently needs to solve. In the traditional field of natural language processing, a series of methods such as representation learning are used to convert text data into machine-recognizable, semantically embedded representations. At present, this technology mostly stays at the level of words or sentences and covers only the plain-text modality, so it has limitations.

[0003] The existing event representation l...


Application Information

Patent Type & Authority: Application (China)
IPC (8): G06F40/279; G06F40/216; G06F40/30; G06N3/04; G06N3/08
CPC: G06F40/279; G06F40/30; G06F40/216; G06N3/08; G06N3/045
Inventor: 刘炜, 吴锜, 李卫民, 谢少荣, 彭艳
Owner: SHANGHAI UNIV