The invention discloses a method for synthesizing a music video (MV) from music based on
self-supervised learning, and the method comprises the following steps: 1, separating an audio
stream and a video stream from an existing material
library; 2, extracting people, actions, expressions and scene information from the video by using a
deep learning technology based on video understanding; 3, automatically classifying the voiceprint information according to the
rhythm of the music; 4, separating voices, musical instruments, accompaniments and
lyrics from the music; 5,
synchronizing audio- and video-related feature information by a
timestamp in the video file; 6, learning the corresponding video information from the music features to form a mapping relationship between music and video; 7, inputting any piece of music and synthesizing a corresponding MV. According to the invention, suitable video clips can be automatically matched and selected from massive existing video data, and the music is mapped to generate a corresponding short MV, so that a more intuitive visual
impact and more vivid auditory experience are provided for a user.
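
As an illustration of the final matching step only, the following is a minimal sketch and not the method claimed by the invention. It assumes that the learned mapping has already placed each music segment and each library clip in a shared feature space as plain numeric vectors, and it swaps in cosine-similarity nearest-neighbour lookup for the learned selection. The names `cosine`, `pick_clips`, `music_segments` and `clip_library` are hypothetical and do not appear in the source.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pick_clips(
    music_segments: Sequence[Sequence[float]],
    clip_library: Dict[str, Sequence[float]],
) -> List[str]:
    """For each music-segment vector, return the id of the most similar clip.

    Assumption: both kinds of vector live in the same feature space, which in
    the described method would be the result of the learned music-video mapping.
    """
    if not clip_library:
        raise ValueError("clip_library must contain at least one clip")
    return [
        max(clip_library, key=lambda clip_id: cosine(segment, clip_library[clip_id]))
        for segment in music_segments
    ]


# Hypothetical two-clip library and two music segments, in playback order.
clip_library = {"dance": [1.0, 0.0], "sunset": [0.0, 1.0]}
music_segments = [[0.9, 0.1], [0.2, 0.8]]
selected = pick_clips(music_segments, clip_library)
```

Concatenating the selected clips in segment order and laying the input music over them would give a rough MV; a practical system would also need to trim each clip to its segment's duration and avoid repeating clips.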