The invention relates to the field of artificial intelligence security, and in particular to a black-box attack defense system based on regularization of the intermediate layers of a neural network, comprising a first source model, a second source model and a third source model. The black-box attack defense method comprises: S1, inputting pictures into the first source model for a white-box attack and outputting a first adversarial sample sequence; S2, inputting the first adversarial sample sequence into the second source model and outputting a second adversarial sample sequence; S3, inputting the second adversarial sample sequence into the third source model for a black-box attack and outputting a third recognition sample sequence; and S4, inputting the third recognition sample sequence into the third source model for adversarial training and updating the third source model. The adversarial samples generated by this algorithm transfer well to the target model, and adversarial training on them can effectively defend the target model against attack.
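
The data flow of steps S1 to S4 can be illustrated with a minimal sketch. It is not the claimed implementation: the three models are assumed to be toy two-weight logistic classifiers, the white-box attack in S1 is assumed to be a single sign-gradient (FGSM-style) step, and the refinement in S2 is a plain second sign-gradient step standing in for the intermediate-layer regularization, which the abstract does not specify. All names (`fgsm_step`, `train_step`, `w1`, `w2`, `w3`, `eps`) and all data values are hypothetical.

```python
import math

def predict(w, x):
    """Probability of class 1 under a toy logistic model with weights w."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    return (v > 0) - (v < 0)

def fgsm_step(w, x, y, eps):
    """One sign-gradient step that increases the log-loss of model w on (x, y).

    For logistic regression, d(loss)/dx_i = (p - y) * w_i.
    """
    p = predict(w, x)
    return [xi + eps * sign((p - y) * wi) for xi, wi in zip(x, w)]

def mean_loss(w, samples, labels):
    """Average log-loss of model w over the given samples."""
    total = 0.0
    for x, y in zip(samples, labels):
        p = min(max(predict(w, x), 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(samples)

def train_step(w, samples, labels, lr):
    """One gradient-descent step on the average log-loss."""
    grad = [0.0] * len(w)
    for x, y in zip(samples, labels):
        p = predict(w, x)
        for i, xi in enumerate(x):
            grad[i] += (p - y) * xi / len(samples)
    return [wi - lr * gi for wi, gi in zip(w, grad)]

# Hypothetical toy "pictures" (2 features each) with their true labels.
images = [[1.0, 0.5], [0.8, 1.2], [-1.0, -0.4], [-0.6, -1.1]]
labels = [1, 1, 0, 0]

# Hypothetical weights for the first, second and third source models.
w1, w2, w3 = [1.0, 0.8], [0.9, 1.1], [1.2, 0.7]
eps = 0.6  # assumed perturbation budget per step

# S1: white-box attack on the first source model (its gradients are used).
adv1 = [fgsm_step(w1, x, y, eps) for x, y in zip(images, labels)]

# S2: refine the samples on the second source model (assumed stand-in step).
adv2 = [fgsm_step(w2, x, y, eps) for x, y in zip(adv1, labels)]

# S3: black-box attack on the third model; only its outputs are queried.
fooled = [(predict(w3, x) >= 0.5) != bool(y) for x, y in zip(adv2, labels)]
adv3 = adv2  # samples recorded together with the third model's responses

# S4: adversarial training of the third model on those samples.
loss_before = mean_loss(w3, adv3, labels)
w3_new = w3
for _ in range(50):
    w3_new = train_step(w3_new, adv3, labels, lr=0.5)
loss_after = mean_loss(w3_new, adv3, labels)

print("fooled before training:", fooled)
print("adversarial loss before / after:", loss_before, loss_after)
```

In this sketch the samples are crafted without ever reading the gradients of the third model, which is what makes S3 a black-box (transfer) attack; S4 then reuses the same samples, with their true labels, to update that model.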