The invention discloses a deep-learning-based method for recognizing objects in an image. The method comprises the following steps: an image is input; candidate regions are extracted by a convolutional neural network; the output candidate regions are filtered and optimized, and each candidate region is normalized; the normalized candidate regions are input to a convolutional neural network for feature extraction; and a trained classification and regression network is used to classify and locate the targets in the image. Finally, bounding-box regression is applied to the selected target regions to correct their positions. Because a convolutional neural network is used to extract the regions of the image that may contain objects, the number of candidate regions is reduced. The filtering of the network's output candidate regions is also optimized, which improves the computational speed of the algorithm. In addition, candidate regions of multiple aspect ratios and sizes are used for target detection, which better matches real scenes and improves the robustness of the algorithm.
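
The geometric parts of the method (candidate regions of several sizes and width-to-height ratios, filtering of the candidate regions, and the final regression step that corrects a region's position) can be illustrated with a minimal pure-Python sketch. The abstract does not specify how filtering or regression is carried out, so this sketch assumes greedy non-maximum suppression for the filtering and the offset parameterization commonly used in R-CNN-style detectors for the correction. All function names, sizes, ratios, and thresholds are hypothetical and are not taken from the invention.

```python
import math

def make_anchors(cx, cy, sizes=(64, 128), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) boxes centred on (cx, cy), one per size/ratio pair.
    The sizes and ratios are illustrative values, not those of the invention."""
    boxes = []
    for s in sizes:
        for r in ratios:  # r = width / height; the area stays at s * s
            w = s * math.sqrt(r)
            h = s / math.sqrt(r)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def filter_candidates(boxes, scores, iou_thresh=0.7, top_k=300):
    """Assumed filtering step: greedy non-maximum suppression.
    Visits boxes by descending score and keeps a box only if it does not
    overlap an already kept box by more than iou_thresh. Returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
            if len(keep) == top_k:
                break
    return keep

def refine_box(box, deltas):
    """Assumed regression step: shift the centre by (dx * w, dy * h) and
    scale the width and height by exp(dw) and exp(dh)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + w / 2, y1 + h / 2
    dx, dy, dw, dh = deltas
    ncx, ncy = cx + dx * w, cy + dy * h
    nw, nh = w * math.exp(dw), h * math.exp(dh)
    return (ncx - nw / 2, ncy - nh / 2, ncx + nw / 2, ncy + nh / 2)
```

In a full detector the scores and regression offsets would come from the trained networks; here they are plain inputs, so the sketch covers only the box arithmetic. For example, `filter_candidates(boxes, scores, iou_thresh=0.5)` returns the indices of the boxes that survive suppression, and `refine_box(box, (0, 0, 0, 0))` leaves a box unchanged.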