The invention discloses a method and system for extracting buildings from remote sensing images based on a U-Net network, and an electronic device. A multi-scale module is added to the decoding layers of the U-Net network, and dilated (atrous) convolution is introduced. Dilated convolution enlarges the receptive field without reducing resolution, which improves the network's ability to mine semantic information while preserving detail information; the multi-scale module in turn strengthens the network's ability to obtain multi-scale features. In the invention, the convolution layers use padding, so the size of the feature map is unchanged after each convolution. In the original U-Net, which uses unpadded convolutions, each convolution shrinks the feature map by 2 pixels in each spatial dimension. With padded convolutions, the feature map is halved only at the downsampling step of each encoding layer, so after the four encoding layers it is one sixteenth of the input image size in each spatial dimension. The image resolution is then recovered through deconvolution (transposed convolution), which progressively enlarges the feature map again. This design effectively shortens the training time.
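The size bookkeeping described above can be checked with the standard output-size formulas for convolution, pooling, and transposed convolution. The following is a minimal sketch in plain Python. It assumes two 3×3 convolutions per encoding layer, 2×2 max pooling with stride 2, 2×2 stride-2 transposed convolutions in the decoder, and dilation rates 1, 2, and 4 for the dilated branches. These layer settings, and the helper names `conv_out`, `same_padding`, `deconv_out`, `receptive_field`, `trace_encoder`, and `trace_decoder`, are hypothetical illustrations and are not taken from the invention.

```python
"""Feature-map size bookkeeping for a padded U-Net encoder/decoder (sketch)."""


def conv_out(size, kernel=3, stride=1, padding=0, dilation=1):
    """Output length of a convolution or pooling window along one spatial axis."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def same_padding(kernel=3, dilation=1):
    """Padding that keeps the size unchanged for stride 1 and an odd kernel."""
    return dilation * (kernel - 1) // 2


def deconv_out(size, kernel=2, stride=2, padding=0, output_padding=0, dilation=1):
    """Output length of a transposed convolution (deconvolution) along one axis."""
    return (size - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1


def receptive_field(dilations, kernel=3):
    """Receptive field of stacked stride-1 convolutions with the given dilation rates."""
    rf = 1
    for d in dilations:
        rf += d * (kernel - 1)  # each layer widens the field by its effective kernel - 1
    return rf


def trace_encoder(size, levels=4, padded=True):
    """Spatial size after each assumed encoding layer: two 3x3 convs, then 2x2 max pooling."""
    sizes = [size]
    for _ in range(levels):
        for _ in range(2):
            pad = same_padding(3) if padded else 0  # padded: size kept; unpadded: -2 px
            size = conv_out(size, kernel=3, padding=pad)
        size = conv_out(size, kernel=2, stride=2)  # max pooling halves the size
        sizes.append(size)
    return sizes


def trace_decoder(size, levels=4):
    """Spatial size after each assumed 2x2, stride-2 transposed convolution."""
    sizes = [size]
    for _ in range(levels):
        size = deconv_out(size)  # doubles the size
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    print(trace_encoder(256))                # padded: [256, 128, 64, 32, 16]
    print(trace_encoder(572, padded=False))  # unpadded: [572, 284, 140, 68, 32]
    print(trace_decoder(16))                 # [16, 32, 64, 128, 256]
    # Dilated 3x3 branches with same padding keep the size but see wider context.
    for rate in (1, 2, 4):
        print(rate, conv_out(64, padding=same_padding(3, rate), dilation=rate))  # size stays 64
    print(receptive_field([1, 1, 1]), receptive_field([1, 2, 4]))  # 7 15
```

Under these assumptions, the padded encoder reduces a 256-pixel side to 16 (one sixteenth), whereas the unpadded variant also loses 4 pixels per level to its two convolutions. The four transposed convolutions restore the 256-pixel side. Three stacked 3×3 convolutions with dilation rates 1, 2, and 4 see a 15-pixel window instead of 7 while the feature-map size stays the same.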