In overlapped sound event detection, globally extracted features may fail to detect and localize sound events accurately in the overlapping segments. To address this, a sound event detection and localization model based on multi-scale spatial channel squeeze excitation (MscSE) is proposed. A multi-scale spatial channel squeeze-excitation convolutional network extracts features, and a Gated Recurrent Unit (GRU) then captures the context-dependent short-term and long-term sequence features of the sound events. The proposed model is compared experimentally with a baseline model and a residual network model on the public DCASE2020 Task 3 dataset. In the best results, the detection error rate (ER) is 0.59, the F1 score is 50.7%, and the localization error (DE) and DE_F1 scores are 15.8% and 70.3%, respectively; the F1 score is 2% to 5% higher than that of the other models, and the ER is also lower. Compared with a single-scale model, the multi-scale squeeze-excitation model therefore improves both sound event detection and localization performance.
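To make the reported detection metrics concrete, the following is a minimal sketch of how an error rate (ER) and F1 score are commonly computed for polyphonic sound event detection. It assumes the widely used segment-based definitions, where ER = (S + D + I) / N over substitutions, deletions, insertions and reference events, and F1 = 2TP / (2TP + FP + FN). The text does not state the exact evaluation procedure, so the function name `segment_er_f1`, the binary activity format, and the toy data are illustrative assumptions, and any spatial criterion that the challenge evaluation may apply is ignored here.

```python
from typing import List, Tuple


def segment_er_f1(
    reference: List[List[int]], predicted: List[List[int]]
) -> Tuple[float, float]:
    """Segment-based ER and F1 (hypothetical helper, not the authors' code).

    Each argument holds one binary activity vector per time segment,
    with one entry per sound event class (1 = active, 0 = inactive).
    """
    tp = fp = fn = 0
    subs = dels = ins = 0
    n_ref = 0
    for ref_seg, pred_seg in zip(reference, predicted):
        seg_tp = sum(1 for r, p in zip(ref_seg, pred_seg) if r and p)
        seg_fp = sum(1 for r, p in zip(ref_seg, pred_seg) if p and not r)
        seg_fn = sum(1 for r, p in zip(ref_seg, pred_seg) if r and not p)
        tp += seg_tp
        fp += seg_fp
        fn += seg_fn
        # Within a segment, a miss paired with a false alarm is a substitution;
        # leftover misses are deletions, leftover false alarms are insertions.
        subs += min(seg_fp, seg_fn)
        dels += max(0, seg_fn - seg_fp)
        ins += max(0, seg_fp - seg_fn)
        n_ref += sum(ref_seg)
    er = (subs + dels + ins) / n_ref if n_ref else 0.0
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return er, f1


# Toy data (assumed): 3 segments, 3 classes, with overlapping events.
reference = [[1, 1, 0], [1, 0, 0], [0, 1, 1]]
predicted = [[1, 0, 1], [1, 0, 0], [0, 1, 0]]
er, f1 = segment_er_f1(reference, predicted)
print(f"ER = {er:.2f}, F1 = {f1:.3f}")
```

Under these definitions a lower ER and a higher F1 both indicate better detection, which is the direction in which the reported results are compared.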