The invention belongs to the technical field of voice forensics. It discloses a mobile device source identification method and system based on multimodal fusion of deep features. The method comprises the following steps: first, MFCC and GSV (Gaussian supervector) features are extracted from the test data, and each feature set is segmented into multiple paths; CNNs are then trained on the paths and their outputs are fused to obtain fused deep features; the fused deep features are then classified by a trained deep residual network; finally, the decisions for the short samples of each path are combined by a voting method to reach a joint decision. When the GMM-UBM model is trained, the data are screened according to the phoneme and tone characteristics of the speech, and a small amount of representative data is selected; this preserves the model's ability to generalize while reducing the amount of computation and improving modeling efficiency. In addition, a deep neural network is trained in a supervised manner to extract deep features, which removes redundant and interfering information from the feature data, simplifies the features, improves their representational power, reduces their dimensionality, and lowers the computational cost.
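The segmentation into short samples and the final voting step can be illustrated with a small, self-contained sketch. It assumes that each short sample has already been assigned a device label by the trained residual network. The function names `split_into_segments` and `majority_vote`, the non-overlapping segmentation, the dropping of a trailing remainder, and the first-occurrence tie-breaking rule are all hypothetical choices made for illustration; the abstract does not specify them.

```python
from collections import Counter
from typing import Hashable, List, Sequence, TypeVar

T = TypeVar("T")
L = TypeVar("L", bound=Hashable)


def split_into_segments(frames: Sequence[T], seg_len: int) -> List[List[T]]:
    """Split a frame sequence into non-overlapping short samples.

    Hypothetical helper: a trailing remainder shorter than seg_len is
    dropped (an assumption, not stated in the source).
    """
    if seg_len <= 0:
        raise ValueError("seg_len must be positive")
    n_full = len(frames) // seg_len
    return [list(frames[i * seg_len:(i + 1) * seg_len]) for i in range(n_full)]


def majority_vote(segment_labels: Sequence[L]) -> L:
    """Combine per-segment device decisions into one recording-level decision.

    Hypothetical helper: the label predicted for the most segments wins;
    on a tie, the tied label that appears first in the sequence wins
    (an assumed tie-breaking rule).
    """
    if not segment_labels:
        raise ValueError("need at least one segment decision")
    counts = Counter(segment_labels)
    best = max(counts.values())
    for label in segment_labels:
        if counts[label] == best:
            return label
    raise AssertionError("unreachable")


if __name__ == "__main__":
    frames = list(range(10))
    print(split_into_segments(frames, 3))  # three full segments, frame 9 dropped
    # Per-segment predictions, e.g. produced by a classifier on each short sample
    print(majority_vote(["device_A", "device_B", "device_A"]))
```

Voting over many short samples lets occasional misclassified segments be outvoted by the majority, which is the motivation for a joint decision rather than relying on a single prediction for the whole recording.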