The invention provides a scene understanding method based on multi-task learning. The method includes: multi-task learning having unstable homoscedasticity, multi-task likelihood function, and scene understanding model. The method includes the following steps: first executing weighted linear sum on the loss of each individual task, learning the optimal task weight, inducting a multi-task loss function, defining a probability model, defining the probability as the gaussian function of the average values that are output by the model, eventually modeling pixel-level learning regression and classified output, comprising semantic division, incidence division and depth regression. According to the invention, the scene understanding model can learn multi-task weight, is more advantageous than models that independently train each task in that the method herein reduces computing amount, increases learning efficiency and prediction precision and can be real-timely operated.