The embodiments of the invention provide a Tibetan-language multi-modal emotion calculation method, a corresponding system, and a server. The method comprises: first, obtaining Tibetan language data to be classified and collecting video signals, voice signals and text information from the Tibetan language data; then, extracting high-level video features, high-level voice features and text features respectively from these signals, and performing learning on the extracted features based on a deep learning model to obtain high-level fusion features; and finally, classifying the high-level fusion features based on an SVM and storing the classified high-level fusion features in a classification emotion corpus. The method thereby fills a gap in sentiment analysis for the Tibetan language and provides a basic corpus for Tibetan multi-modal sentiment analysis. The three-modality sentiment recognition method for Tibetan language data supports the development of Tibetan multi-modal sentiment analysis, promotes natural language processing and intelligent sentiment recognition for the Tibetan language, and improves artificial intelligence information processing for the Tibetan language. In addition, fusing the three modalities can effectively increase the sentiment recognition rate for Tibetan language data.
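
The fuse-then-classify stage described above can be illustrated with a minimal sketch. It is not the claimed implementation and rests on several assumptions: the feature vectors are made-up toy numbers rather than extracted features, plain concatenation stands in for the deep learning fusion model, the emotion labels are reduced to two classes (+1 and -1), and the SVM is a linear one trained by subgradient descent on the hinge loss. The names `fuse`, `train_linear_svm` and `predict` are hypothetical.

```python
import random


def fuse(video, voice, text):
    """Concatenate per-modality feature vectors.

    Assumption: a stand-in for the learned deep fusion model in the abstract.
    """
    return list(video) + list(voice) + list(text)


def train_linear_svm(X, y, epochs=200, lam=0.01, lr=0.1, seed=0):
    """Train a binary linear SVM (labels +1 / -1) with hinge-loss subgradient steps."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            # The L2 regulariser shrinks the weights on every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            # Samples inside the margin also contribute a hinge-loss subgradient.
            if y[i] * score < 1.0:
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 according to the side of the separating hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy samples (hypothetical values): (video features, voice features, text features, label).
samples = [
    ((0.9, 0.8), (0.7,), (1.0, 0.9), 1),
    ((0.8, 0.9), (0.6,), (0.9, 1.0), 1),
    ((-0.8, -0.9), (-0.6,), (-1.0, -0.7), -1),
    ((-0.9, -0.7), (-0.8,), (-0.8, -0.9), -1),
]

X = [fuse(video, voice, text) for video, voice, text, _ in samples]
y = [label for _, _, _, label in samples]
w, b = train_linear_svm(X, y)

# Classify a new fused sample; in the described method the classified
# features would then be stored in the classification emotion corpus.
new_sample = fuse((0.7, 0.9), (0.8,), (0.9, 0.8))
label = predict(w, b, new_sample)
```

A real system would replace `fuse` with the trained fusion network and would likely use a multi-class SVM over the full set of emotion categories; the sketch only shows how fused features from the three modalities feed a single classifier.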