The invention discloses a multi-modal knowledge graph construction method and relates to knowledge engineering technology in the field of big data. The method is realized through the following technical scheme. First, multi-modal semantic features are extracted based on a multi-modal data feature representation model: feature extraction models based on pre-trained models are constructed for text, images, audio, video and the like, and semantic feature extraction is completed separately for each single modality. Second, the different types of data are projected into the same vector space by means of unsupervised graph embedding, attribute graph embedding, heterogeneous graph embedding and other methods, so as to realize a cross-modal knowledge representation. On this basis, the two knowledge graphs to be fused and aligned are each converted into vector representations; then, based on the obtained multi-modal knowledge representation, the mapping relation between entity pairs across the knowledge graphs is learned from prior alignment data, multi-modal knowledge fusion and disambiguation are completed, the results are decoded and mapped to the corresponding nodes in the knowledge graphs, and a new fused knowledge graph, together with its entities and their attributes, is generated.
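
As an illustration of the alignment step described above, the following is a minimal sketch, not the claimed implementation. It assumes that every entity of the two knowledge graphs already has an embedding vector, and that the mapping between the graphs can be approximated by a linear transformation fitted to the prior alignment pairs by gradient descent; the abstract itself does not specify the form of the mapping. The entity names, the toy two-dimensional embeddings, and the functions `learn_mapping` and `align` are hypothetical.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def learn_mapping(src, dst, seeds, steps=200, lr=0.1):
    """Fit W minimizing sum ||W @ src[a] - dst[b]||^2 over seed pairs (a, b).

    Assumption: a linear map is sufficient; this is an illustrative choice.
    """
    d_in = len(next(iter(src.values())))
    d_out = len(next(iter(dst.values())))
    W = [[0.0] * d_in for _ in range(d_out)]
    for _ in range(steps):
        grad = [[0.0] * d_in for _ in range(d_out)]
        for a, b in seeds:
            x, y = src[a], dst[b]
            err = [p - t for p, t in zip(matvec(W, x), y)]
            for i in range(d_out):
                for j in range(d_in):
                    grad[i][j] += 2.0 * err[i] * x[j]
        for i in range(d_out):
            for j in range(d_in):
                W[i][j] -= lr * grad[i][j]
    return W


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def align(src, dst, W):
    """Map each source entity into the target space and pick the most similar target."""
    return {
        a: max(dst, key=lambda b: cosine(matvec(W, x), dst[b]))
        for a, x in src.items()
    }


# Toy embeddings (hypothetical values, not produced by any real model).
kg_a = {"a1": [1.0, 0.0], "a2": [0.0, 1.0], "a3": [1.0, 1.0], "a4": [2.0, 1.0]}
kg_b = {"b1": [0.0, 1.0], "b2": [-1.0, 0.0], "b3": [-1.0, 1.0], "b4": [-1.0, 2.0]}

# Prior alignment data: entity pairs already known to refer to the same thing.
seeds = [("a1", "b1"), ("a2", "b2")]

W = learn_mapping(kg_a, kg_b, seeds)
matches = align(kg_a, kg_b, W)
print(matches)
```

In this toy setting the embeddings of the second graph are a 90-degree rotation of those of the first, so the two seed pairs are enough to recover the mapping, and the unseen entities `a3` and `a4` are matched to `b3` and `b4`. Real multi-modal embeddings would generally need more seed pairs and possibly a non-linear mapping.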