The embodiments of the invention provide a multi-modal fusion method and device for psychological stress detection. The method first obtains attention-enhanced feature matrices for six cross-modal directions: physiological data->text, physiological data->pictures, text->physiological data, text->pictures, pictures->physiological data, and pictures->text. Based on these matrices and a feedforward fully-connected neural network, a fusion feature matrix of the text, the pictures and the physiological data is obtained. Then, based on importance weights of the text, the pictures and the physiological data together with this fusion feature matrix, a fusion representation matrix of the three modalities is obtained. Finally, based on the fusion representation matrix of the three modalities and the feedforward fully-connected network, a stress classification vector reflecting the psychological stress problem is obtained. By fusing text and picture data with physiological data, the method not only compensates for the deficiencies caused by the subjectivity of text and picture data, but also addresses certain inherent problems of physiological data.
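The pipeline described above can be sketched as a forward pass in NumPy. This is a minimal, non-authoritative illustration under several assumptions not stated in the abstract. A direction "A->B" is taken to mean that modality B is enhanced with information from A, so queries come from B and keys/values from A, as in common cross-modal attention designs. The two enhanced matrices targeting each modality are concatenated and passed through a two-layer feedforward network, then mean-pooled over the sequence axis. The importance weights are assumed to be a softmax over one learned score per modality. All weights are random and untrained, so the sketch demonstrates only the data flow and shapes. The names `D`, `cross_attention`, `ffn`, `init_ffn` and `stress_forward` are hypothetical, and the per-modality encoders that would produce the input feature matrices are not shown.

```python
import numpy as np

D = 16  # hypothetical shared feature dimension after per-modality encoding
MODALITIES = ("text", "picture", "physio")

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(src, tgt, wq, wk, wv):
    """Assumed direction src->tgt: queries from tgt, keys/values from src.
    Returns a (len_tgt, D) attention-enhanced feature matrix."""
    q, k, v = tgt @ wq, src @ wk, src @ wv
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

def ffn(x, params):
    """Two-layer feedforward fully-connected network with a ReLU hidden layer."""
    w1, b1, w2, b2 = params
    return np.maximum(0.0, x @ w1 + b1) @ w2 + b2

def init_ffn(rng, d_in, d_hidden, d_out):
    return (rng.normal(scale=0.1, size=(d_in, d_hidden)), np.zeros(d_hidden),
            rng.normal(scale=0.1, size=(d_hidden, d_out)), np.zeros(d_out))

def stress_forward(feats, n_classes=2, seed=0):
    """feats maps each modality name to a (seq_len, D) feature matrix.
    Returns (class probabilities, per-modality importance weights)."""
    rng = np.random.default_rng(seed)  # untrained random weights: shapes only
    # One (Wq, Wk, Wv) triple for each of the six ordered directions src->tgt
    attn = {(s, t): tuple(rng.normal(scale=0.1, size=(D, D)) for _ in range(3))
            for s in MODALITIES for t in MODALITIES if s != t}
    pooled = []
    for t in MODALITIES:
        # The two attention-enhanced matrices whose target is modality t
        views = [cross_attention(feats[s], feats[t], *attn[(s, t)])
                 for s in MODALITIES if s != t]
        fused_t = ffn(np.concatenate(views, axis=-1), init_ffn(rng, 2 * D, D, D))
        pooled.append(fused_t.mean(axis=0))  # mean-pool over the sequence axis
    P = np.stack(pooled)  # (3, D): one fused feature vector per modality
    # Importance weights: softmax over a learned score per modality (assumed form)
    w_score = rng.normal(scale=0.1, size=D)
    alpha = softmax(P @ w_score)  # (3,), non-negative and summing to 1
    R = alpha[:, None] * P  # (3, D) weighted fusion representation matrix
    logits = ffn(R.reshape(-1), init_ffn(rng, 3 * D, D, n_classes))
    return softmax(logits), alpha

# Usage with random stand-in features of different sequence lengths per modality
data_rng = np.random.default_rng(1)
feats = {"text": data_rng.normal(size=(12, D)),
         "picture": data_rng.normal(size=(49, D)),
         "physio": data_rng.normal(size=(30, D))}
probs, alpha = stress_forward(feats)
print("stress class probabilities:", probs)
print("modality importance weights:", dict(zip(MODALITIES, alpha)))
```

Pooling each fused matrix to a single vector is one simple way to let modalities with different sequence lengths be weighted and combined; a trained system could instead align sequences or use learned pooling.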