The invention discloses a multi-dimensional labelling and model optimization method for audio and video. The method comprises the following steps: first, sample management and sorting, in which sample data input to the system is de-duplicated and numbered, and a sample labelling task library is established; at the audio data preprocessing stage, audio is extracted from the video data in the task library and the audio data is preprocessed; at the audio content analysis and feature extraction stage, once audio preprocessing is complete, deep analysis is carried out according to the standardized labelling system configured in the back end, and label data is output; S304, at the video content analysis and feature extraction stage, image analysis is performed on the video content, deep analysis is carried out according to the standardized labelling system configured in the back end, and label data is output; S305, feature fusion and label generation, in which the recognized features and label information are fused and a label result for the sample is output; and finally, manual re-checking and model optimization, in which the label result data generated by the system can be manually re-checked and confirmed.
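
By way of illustration only, the sample management step and the feature fusion step described above might be sketched as follows. The abstract does not specify how duplicates are detected or how labels are fused, so this sketch assumes exact-duplicate detection by SHA-256 content hash and fusion by keeping the higher confidence score per label. The names `LabellingTask`, `build_task_library`, and `fuse_labels`, the `T00001` numbering format, and the example labels are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class LabellingTask:
    task_id: str   # sequential number assigned after de-duplication
    digest: str    # SHA-256 of the sample bytes (assumed de-duplication key)
    labels: dict = field(default_factory=dict)  # fused label result, filled in later


def build_task_library(samples):
    """De-duplicate raw samples by content hash and number the survivors."""
    library = []
    seen = set()
    for payload in samples:
        digest = hashlib.sha256(payload).hexdigest()
        if digest in seen:
            # Exact duplicate of an earlier sample: skip it.
            continue
        seen.add(digest)
        task_id = f"T{len(library) + 1:05d}"  # hypothetical numbering scheme
        library.append(LabellingTask(task_id=task_id, digest=digest))
    return library


def fuse_labels(audio_labels, video_labels):
    """Merge labels from both modalities, keeping the higher confidence.

    This max-confidence rule is an assumption, not the method of the invention.
    """
    fused = dict(audio_labels)
    for name, score in video_labels.items():
        fused[name] = max(score, fused.get(name, 0.0))
    return fused


# Three input samples, one of which is an exact duplicate.
library = build_task_library([b"clip-a", b"clip-b", b"clip-a"])
library[0].labels = fuse_labels(
    {"speech": 0.9, "music": 0.2},   # hypothetical audio analysis output
    {"music": 0.6, "outdoor": 0.8},  # hypothetical video analysis output
)
```

In this sketch the label result stored on each task is what a human reviewer would re-check and confirm; corrected results could then be collected as training data for model optimization.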