Traffic sign recognition system and method based on an encoder-decoder anti-perturbation neural network
An encoder-decoder anti-perturbation neural network generates semantically enhanced images, which are used to train a traffic sign recognition system. This addresses the drop in traffic sign recognition accuracy in harsh environments and yields better anti-interference ability and recognition accuracy.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SOUTH CENTRAL UNIVERSITY FOR NATIONALITIES
- Filing Date
- 2023-05-05
- Publication Date
- 2026-06-19
AI Technical Summary
In autonomous driving, the recognition accuracy of traffic signs decreases in adverse environments such as dirt and fog, leading to safety issues. Existing technologies are unable to effectively improve their anti-interference capabilities.
An encoder-decoder anti-disturbance neural network system is adopted. By constructing an encoder-decoder anti-disturbance traffic sign recognition neural network, semantically enhanced images are generated for training using a misjudged image filtering and feature operation module, an image synthesis and recognition module, and a decoder, thereby improving the network's anti-interference ability.
It improves the recognition capability of traffic sign recognition systems under conditions of wear, dirt, and fog, enhances anti-interference capabilities, and improves recognition accuracy.
[Figure CN116682089B_ABST: abstract drawing]
Description
Technical Field
[0001] This invention belongs to the field of information security and specifically relates to a traffic sign recognition system and method based on an encoder-decoder anti-perturbation neural network.
Background Technology
[0002] In recent years, autonomous driving has matured and been adopted by more and more companies, greatly facilitating people's lives. However, autonomous driving depends on traffic sign recognition. Traffic signs are usually placed at the roadside, where they can become dirty, and fog or rain can reduce their clarity. Under these conditions convolutional neural networks easily misrecognize traffic signs, so recognition accuracy drops significantly in harsh environments, which poses a serious safety hazard for autonomous driving. To solve or mitigate the problems caused by dirt, fog, rain, and similar conditions, this invention provides a traffic sign recognition neural network system and method based on encoder-decoder anti-perturbation. Images in the dataset that are prone to misjudgment are identified, and easily trained semantic images are embedded into them, generating a large number of semantically enhanced images. The neural network is then trained on this dataset, which greatly improves its anti-interference ability. The advantage of this invention is that, by training with these semantically enhanced images, the network can recognize traffic signs that are worn, dirty, or obscured by fog, thus improving the anti-interference ability of traffic sign recognition.
Summary of the Invention
[0003] To address the aforementioned technical problems, this invention proposes a traffic sign recognition system and method based on an encoding and decoding anti-disturbance neural network.
[0004] The technical solution of this invention is a traffic sign recognition system with an encoding and decoding anti-perturbation neural network, comprising:
[0005] User traffic sign collection system, host computer;
[0006] The user traffic sign collection system is connected to the host computer.
[0007] The user traffic sign collection system is used to collect traffic sign images and mark the correct traffic sign labels based on the traffic sign images.
[0008] The technical solution of this invention is a traffic sign recognition method using an encoding and decoding anti-perturbation neural network, characterized by the following steps:
[0009] Step 1: The user traffic sign collection system acquires each traffic sign image taken by different users, along with the real label for each traffic sign image;
[0010] Step 2: Construct a traffic sign recognition neural network based on encoder-decoder anti-perturbation. Input each traffic sign image and its real label into the network to compute the predicted label of each traffic sign image and the loss between the predicted label and the real label. Construct the loss function model of the encoder-decoder anti-perturbation neural network, then optimize and train the network with Adam to obtain the optimized encoder-decoder anti-perturbation neural network.
[0011] Step 3: The user traffic sign collection system collects traffic sign images in real time and transmits them to the host computer. The host computer then uses the optimized encoder-decoder anti-disturbance traffic sign recognition neural network to predict the real-time traffic sign images and obtain the predicted labels.
[0012] Preferably, each traffic sign image in step 1 is defined as:
[0013] $TS_i = \{ts_i(x,y) \mid x\in[1,U],\ y\in[1,V]\}$
[0014] $i\in[1,N]$
[0015] where $TS_i$ denotes the i-th traffic sign image, $ts_i(x,y)$ denotes the pixel in the x-th row and y-th column of the i-th traffic sign image, U is the number of rows of the i-th traffic sign image, V is the number of columns of the i-th traffic sign image, and N is the number of traffic sign images.
[0016] The true label for each traffic sign image is defined as follows:
[0017] $\{TSL_i\},\ TSL_i\in[0,Z]$
[0018] where $TSL_i$ is the true label of the i-th traffic sign image and Z is the total number of traffic sign categories.
[0019] Preferably, the traffic sign recognition neural network based on encoder-decoder anti-disturbance described in step 2 includes:
[0020] A fully connected Softmax neural network for traffic sign recognition $P_c$, a misjudged image filtering and feature calculation module, an encoder, an image synthesis and recognition module, a decoder, and a traffic sign recognition convolutional neural network, connected in cascade.
[0021] The i-th traffic sign image $TS_i$ and the true label $TSL_i$ of the i-th traffic sign image are input to the fully connected Softmax neural network for traffic sign recognition $P_c$. $P_c$ outputs the predicted-label probability vector $PTSL_i$ of the i-th traffic sign image, and its last fully connected layer outputs the m-dimensional fully connected feature vector $FTS_i$ of the i-th traffic sign image, $i\in[1,N]$,
[0022] $FTS_i$ is defined as follows:
[0023] $FTS_i = (FTS_i^1, FTS_i^2, \dots, FTS_i^m)$
[0024] where N is the number of traffic sign images;
[0025] and $FTS_i^j$ denotes the j-th value of the m-dimensional fully connected feature vector $FTS_i$ of the i-th traffic sign image,
[0026] $PTSL_i$ is defined as follows:
[0027] $PTSL_i = (PTSL_i^0, PTSL_i^1, \dots, PTSL_i^Z)$
[0028] where $PTSL_i^j$ denotes the probability that the i-th traffic sign image belongs to the j-th class, $j\in[0,Z]$;
[0029] Each traffic sign image, the true label of each traffic sign image, the predicted-label probability vector $PTSL_i$ of the i-th traffic sign image, and the m-dimensional fully connected feature vector $FTS_i$ of the i-th traffic sign image output by the last fully connected layer of the fully connected Softmax neural network for traffic sign recognition are input to the misjudged image filtering and feature calculation module;
[0030] The misjudged image filtering and feature calculation module outputs the i-th misjudged traffic sign image $WTS_i$, the true label $WTSL_i$ of the i-th misjudged traffic sign image, the predicted-label probability vector WPTSL of the misjudged traffic sign images, the predicted-label probability vector TPTSL of the correct traffic sign images, the feature vector $AF_{pos}$ of the correct traffic sign images, and the feature vector $AF_{neg}$ of the misjudged traffic sign images;
[0031] Each misjudged traffic sign image $WTS_i$ is defined as follows:
[0032] $WTS_i = \{wts_i(x,y) \mid x\in[1,U],\ y\in[1,V]\}$
[0033] $i\in[1,N]$
[0034] where $WTS_i$ denotes the i-th misjudged traffic sign image, $wts_i(x,y)$ denotes the pixel in the x-th row and y-th column of the i-th misjudged traffic sign image, U is the number of rows of the i-th misjudged traffic sign image, V is the number of columns of the i-th misjudged traffic sign image, and N is the number of misjudged traffic sign images.
[0035] The input to the encoder is the i-th misjudged traffic sign image $WTS_i$, and the output is the semantic image $MTS_i$ of the i-th misjudged traffic sign:
[0036] $MTS_i = \{mts_i(x,y) \mid x\in[1,U],\ y\in[1,V]\}$
[0037] $i\in[1,N]$
[0038] where $MTS_i$ denotes the semantic image of the i-th misjudged traffic sign, $mts_i(x,y)$ denotes the pixel in the x-th row and y-th column of the i-th misjudged traffic sign semantic image, U is the number of rows of the i-th misjudged traffic sign semantic image, V is the number of columns of the i-th misjudged traffic sign semantic image, and N is the number of misjudged traffic sign semantic images.
[0039] The inputs to the image synthesis and recognition module are the i-th misjudged traffic sign semantic image $MTS_i$, the i-th misjudged traffic sign image $WTS_i$, and the true label $WTSL_i$ of the i-th misjudged traffic sign image; the output is the semantic synthesized image $TSR_i$ of the i-th misjudged traffic sign;
[0040] Each misjudged traffic sign semantic synthesized image $TSR_i$ is defined as follows:
[0041] $TSR_i = \{tsr_i(x,y) \mid x\in[1,U],\ y\in[1,V]\}$
[0042] where $TSR_i$ denotes the semantic synthesized image of the i-th misjudged traffic sign, $tsr_i(x,y)$ denotes the pixel in the x-th row and y-th column of the i-th misjudged traffic sign semantic synthesized image, U is the number of rows of that image, and V is the number of columns of that image;
[0043] The decoder takes as input the semantic synthesized image of each misjudged traffic sign and outputs the semantically enhanced image $TSRR_i$ of each misjudged traffic sign image, together with the true label $TSRRL_i$ of that semantically enhanced image;
[0044] The semantically enhanced image $TSRR_i$ of each misjudged traffic sign image is defined as follows:
[0045] $TSRR_i = \{tsrr_i(x,y) \mid x\in[1,U],\ y\in[1,V]\}$
[0046] where $TSRR_i$ denotes the semantically enhanced image of the i-th misjudged traffic sign image, $tsrr_i(x,y)$ denotes the pixel in the x-th row and y-th column of that semantically enhanced image, U is the number of rows of that image, and V is the number of columns of that image.
[0047] Preferably, the inputs to the traffic sign recognition convolutional neural network are the semantically enhanced image $TSRR_i$ of each misjudged traffic sign image, the true label $TSRRL_i$ of that semantically enhanced image, each traffic sign image $TS_i$, and the true label $TSL_i$ of each traffic sign image; the output is the predicted label $CNPTSL_i$ of each traffic sign image.
[0048] $CNPTSL_i$, the predicted label of the i-th traffic sign image, is defined as follows:
[0049] $\{CNPTSL_i\},\ CNPTSL_i\in[0,Z]$
[0050] where $CNPTSL_i$ is the predicted label of the i-th traffic sign image and Z is the total number of traffic sign categories.
[0051] Preferably, the inputs to the misjudged image filtering and feature calculation module are the i-th traffic sign image $TS_i$, the true label $TSL_i$ of the i-th traffic sign image, the predicted-label probability vector $PTSL_i$ of the i-th traffic sign image, and the m-dimensional fully connected feature vector $FTS_i$ of the traffic sign image output by the last fully connected layer of the fully connected Softmax neural network for traffic sign recognition;
[0052] The outputs of the misjudged image filtering and feature calculation module are the i-th misjudged traffic sign image $WTS_i$, the true label $WTSL_i$ of the i-th misjudged traffic sign image, the i-th correct traffic sign image $TTS_i$, the predicted-label probability vector $WPTSL_i$ of the i-th misjudged traffic sign image, the predicted-label probability vector $TPTSL_i$ of the i-th correct traffic sign image, the m-dimensional fully connected feature vector $WFTS_i$ of the i-th misjudged traffic sign image, the m-dimensional fully connected feature vector $TFTS_i$ of the i-th correct traffic sign image, the predicted-label probability vector WPTSL of the misjudged traffic sign images, the predicted-label probability vector TPTSL of the correct traffic sign images, the feature vector $AF_{pos}$ of the correct traffic sign images, and the feature vector $AF_{neg}$ of the misjudged traffic sign images;
[0053] The i-th misjudged traffic sign image $WTS_i$ is defined as follows:
[0054] If $\max(PTSL_i) \neq TSL_i$, the i-th traffic sign image is determined to be a misjudged image $WTS_i$;
[0055] where $\max(PTSL_i)$ here denotes the label corresponding to the maximum value of $PTSL_i$;
[0056] The i-th correct traffic sign image $TTS_i$ is defined as follows:
[0057] If $\max(PTSL_i) = TSL_i$, the i-th traffic sign image is determined to be a correct image $TTS_i$;
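The filtering rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `split_by_prediction`, the list-based data layout, and the toy probability values are hypothetical; only the comparison of the highest-probability label with the true label comes from the text.

```python
def split_by_prediction(images, true_labels, prob_vectors):
    """Partition samples into misjudged and correct sets.

    A sample is 'correct' when the label with the highest predicted
    probability equals its true label, and 'misjudged' otherwise.
    """
    misjudged, correct = [], []
    for image, label, probs in zip(images, true_labels, prob_vectors):
        # Index of the largest probability, i.e. the predicted label.
        predicted = max(range(len(probs)), key=probs.__getitem__)
        record = (image, label, probs)
        if predicted == label:
            correct.append(record)
        else:
            misjudged.append(record)
    return misjudged, correct


# Toy data: three placeholder "images" and three classes (hypothetical values).
images = ["img0", "img1", "img2"]
true_labels = [0, 2, 1]
prob_vectors = [
    [0.7, 0.2, 0.1],  # predicted 0, true 0 -> correct
    [0.5, 0.3, 0.2],  # predicted 0, true 2 -> misjudged
    [0.1, 0.8, 0.1],  # predicted 1, true 1 -> correct
]
misjudged, correct = split_by_prediction(images, true_labels, prob_vectors)
```

Keeping the probability vector alongside each image means the per-sample vectors for the misjudged and correct sets fall out of the same pass.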
[0058] The predicted-label probability vector of the i-th misjudged traffic sign image $WTS_i$ is denoted $WPTSL_i$;
[0059] $WPTSL_i$ is defined as follows:
[0060] $WPTSL_i = (WPTSL_i^0, WPTSL_i^1, \dots, WPTSL_i^Z)$
[0061] where $WPTSL_i^j$ denotes the probability that the i-th misjudged traffic sign image belongs to the j-th class, $i\in[1,w]$, w is the number of misjudged traffic sign images, and $j\in[0,Z]$.
[0062] The predicted-label probability vector of the i-th correct traffic sign image $TTS_i$ is denoted $TPTSL_i$;
[0063] $TPTSL_i = (TPTSL_i^0, TPTSL_i^1, \dots, TPTSL_i^Z)$
[0064] where $TPTSL_i^j$ denotes the probability that the i-th correct traffic sign image belongs to the j-th class, $i\in[1,t]$, t is the number of correct traffic sign images, $j\in[0,Z]$, and Z is the total number of traffic sign categories;
[0065] The predicted-label probability vector WPTSL of the misjudged traffic sign images is defined as follows:
[0066]
[0067] where w is the total number of misjudged traffic sign images and $\|\cdot\|_2$ denotes the L2 norm.
[0068] The predicted-label probability vector TPTSL of the correct traffic sign images is defined as follows:
[0069]
[0070] where t is the total number of correct traffic sign images;
[0071] The m-dimensional fully connected feature vector $WFTS_i$ of the i-th misjudged traffic sign image is defined as follows:
[0072] $WFTS_i = (WFTS_i^1, WFTS_i^2, \dots, WFTS_i^m)$
[0073] where $WFTS_i^j$ denotes the j-th value of the m-dimensional fully connected feature vector of the i-th misjudged traffic sign image, $j\in[1,m]$.
[0074] The m-dimensional fully connected feature vector $TFTS_i$ of the i-th correct traffic sign image is defined as follows:
[0075] $TFTS_i = (TFTS_i^1, TFTS_i^2, \dots, TFTS_i^m)$
[0076] where $TFTS_i^j$ denotes the j-th value of the m-dimensional fully connected feature vector $TFTS_i$ of the i-th correct traffic sign image, $j\in[1,m]$;
[0077] The feature vector $AF_{pos}$ of the correct traffic sign images and the feature vector $AF_{neg}$ of the misjudged traffic sign images are calculated as follows:
[0078]
[0079]
[0080] where w is the number of misjudged traffic sign images and t is the number of correct traffic sign images.
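As a rough illustration of how class-level feature vectors of this kind might be computed, the sketch below takes the element-wise arithmetic mean over the correct and over the misjudged feature vectors. The arithmetic mean is an assumption made for the example (any normalisation the method applies is not shown), and `mean_vector` and the toy two-dimensional features are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise arithmetic mean of equal-length feature vectors."""
    if not vectors:
        raise ValueError("need at least one feature vector")
    count = len(vectors)
    # zip(*vectors) iterates over the vectors column by column.
    return [sum(column) / count for column in zip(*vectors)]


# Toy 2-dimensional features standing in for the m-dimensional vectors.
tfts = [[1.0, 2.0], [3.0, 4.0]]  # features of t = 2 correct images
wfts = [[0.0, 2.0]]              # features of w = 1 misjudged image

af_pos = mean_vector(tfts)  # assumed form of AF_pos
af_neg = mean_vector(wfts)  # assumed form of AF_neg
```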
[0081] Preferably, the image synthesis and recognition module consists of an image synthesis module and the traffic sign convolutional neural network to be trained. Its inputs are the i-th misjudged traffic sign semantic image $MTS_i$, the i-th misjudged traffic sign image $WTS_i$, and the true label $WTSL_i$ of the i-th misjudged traffic sign image; its output is the semantic synthesized image $TSR_i$ of the i-th misjudged traffic sign.
[0082] The semantic synthesized image $TSR_i$ of the i-th misjudged traffic sign is calculated as follows:
[0083] $TSR_i = MTS_i + WTS_i,\ i\in[1,w]$.
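The synthesis step is a pixel-wise addition of the semantic image and the misjudged image. A minimal sketch follows, assuming images are stored as nested lists of floats in the range [0, 1]; the function name `synthesize` and the clamping to that range are assumptions added for the example and are not stated in the text.

```python
def synthesize(semantic, misjudged, lo=0.0, hi=1.0):
    """Pixel-wise sum of a semantic image and a misjudged image.

    Both images are U x V nested lists. The result is clamped to
    [lo, hi]; the clamp is an assumption, not part of the formula.
    """
    if len(semantic) != len(misjudged) or any(
        len(a) != len(b) for a, b in zip(semantic, misjudged)
    ):
        raise ValueError("images must have the same U x V shape")
    return [
        [min(hi, max(lo, m + w)) for m, w in zip(m_row, w_row)]
        for m_row, w_row in zip(semantic, misjudged)
    ]


# Toy 2 x 2 example (hypothetical pixel values).
mts = [[0.10, -0.05], [0.00, 0.30]]  # semantic image from the encoder
wts = [[0.50, 0.02], [1.00, 0.80]]   # misjudged traffic sign image
tsr = synthesize(mts, wts)
```

Without the clamp, sums such as 0.30 + 0.80 would leave the valid pixel range, which is why the sketch bounds the result.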
[0084] Preferably, the loss function model of the encoder-decoder anti-perturbation neural network described in step 2 is defined as follows:
[0085] $L_{all} = L_{fullconn} + L_{code} + \alpha d_{posi} + \beta d_{neg} + \eta L_{wr} + \gamma L_{corret} + L_{cnn} + L_{tsrr}$
[0086] where $L_{all}$ is the total loss function; α is the weight of the correct-image feature loss $d_{posi}$, used to control the similarity between the features of each traffic sign image and those of the correct traffic sign images; β is the weight of the misjudged-image feature loss $d_{neg}$, used to control the similarity between the features of each traffic sign image and those of the misjudged traffic sign images; η is the weight of $L_{wr}$, the feature-space similarity between each traffic sign image and the misjudged traffic sign images; and γ is the weight of $L_{corret}$, the feature-space similarity between each traffic sign image and the correct traffic sign images.
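The total loss is a weighted sum of eight terms. The sketch below only shows how the terms combine once each has been computed; the dictionary keys, the function name `total_loss`, and all numeric values are hypothetical and chosen so the arithmetic is easy to check.

```python
def total_loss(terms, alpha, beta, eta, gamma):
    """Combine the individual loss terms into L_all.

    `terms` maps a short name to the already computed scalar value of
    each loss term; alpha, beta, eta, and gamma scale the four
    similarity-related terms.
    """
    return (
        terms["fullconn"]
        + terms["code"]
        + alpha * terms["d_posi"]
        + beta * terms["d_neg"]
        + eta * terms["wr"]
        + gamma * terms["corret"]
        + terms["cnn"]
        + terms["tsrr"]
    )


# Hypothetical values for one training step.
terms = {
    "fullconn": 1.0, "code": 0.5, "d_posi": 2.0, "d_neg": 4.0,
    "wr": 1.0, "corret": 3.0, "cnn": 0.25, "tsrr": 0.25,
}
l_all = total_loss(terms, alpha=0.5, beta=0.25, eta=1.0, gamma=0.5)
```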
[0087]
[0088] where N is the total number of traffic sign images input to the fully connected Softmax neural network for traffic sign recognition; $L_{fullconn}$ is the loss function of that network, whose input is each traffic sign image and its true label; $PTSL_i$ is the predicted-label probability vector of the i-th traffic sign image output by the fully connected Softmax neural network $P_c$; and $\max(PTSL_i)$ is the maximum value in that predicted-label probability vector, $i\in[1,N]$;
[0089]
[0090] where MSE is the mean squared error loss function, $TSR_i$ is the semantic synthesized image of the i-th misjudged traffic sign, $WTS_i$ is the i-th misjudged traffic sign image, and $L_{code}$ is the encoder-decoder loss function, which keeps the semantically enhanced image of each misjudged traffic sign image as close as possible to the original image so that the semantic image of the misjudged traffic sign looks more natural; $i\in[1,w]$, where w is the number of misjudged traffic sign images;
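For reference, the mean squared error between a synthesized image and the original misjudged image can be sketched as follows. This shows only the per-image MSE; how the values are aggregated over the w misjudged images is not shown here, and the function name `mse` and the toy pixel values are hypothetical.

```python
def mse(image_a, image_b):
    """Mean squared error over all pixels of two equally sized 2-D images."""
    squared = [
        (a - b) ** 2
        for row_a, row_b in zip(image_a, image_b)
        for a, b in zip(row_a, row_b)
    ]
    if not squared:
        raise ValueError("images must be non-empty")
    return sum(squared) / len(squared)


# Toy 2 x 2 images (hypothetical pixel values).
tsr = [[0.0, 1.0], [1.0, 0.0]]  # semantic synthesized image
wts = [[0.0, 0.0], [1.0, 1.0]]  # original misjudged image
loss = mse(tsr, wts)
```

A value of zero means the synthesized image is identical to the original, so minimising this term keeps the embedded semantic image visually unobtrusive.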
[0091]
[0092] where $FTS_i$ is the m-dimensional fully connected feature vector of the i-th traffic sign image output by the last fully connected layer of the fully connected Softmax neural network for traffic sign recognition, $AF_{pos}$ is the feature vector of the correct traffic sign images, and $d_{posi}$ is used to make each traffic sign image have the features of correct images as far as possible; $i\in[1,N]$, where N is the number of traffic sign images;
[0093]
[0094] where $FTS_i$ is the m-dimensional fully connected feature vector of the i-th traffic sign image output by the last fully connected layer of the fully connected Softmax neural network for traffic sign recognition, $AF_{neg}$ is the feature vector of the misjudged traffic sign images, and $d_{neg}$ is used to control each traffic sign image to have as many misjudged-image features as possible; $i\in[1,N]$, where N is the number of traffic sign images;
[0095]
[0096] where $PTSL_i$ is the predicted-label probability vector of the i-th traffic sign image output by the fully connected Softmax neural network $P_c$, WPTSL is the predicted-label probability vector of the misjudged traffic sign images, and $L_{wr}$ is used to make each traffic sign image as similar as possible in feature space to the misjudged traffic sign images; $i\in[1,N]$, where N is the number of traffic sign images;
[0097]
[0098] where TPTSL is the predicted-label probability vector of the correct traffic sign images, $PTSL_i$ is the predicted-label probability vector of the i-th traffic sign image output by the fully connected Softmax neural network $P_c$, and $L_{corret}$ is used to make each traffic sign image as similar as possible in feature space to the correct traffic sign images; $i\in[1,N]$, where N is the number of traffic sign images.
[0099]
[0100] where $CNPTSL_i$ is the predicted label of the i-th traffic sign image, and $L_{cnn}$ is the loss function used to make the traffic sign recognition convolutional neural network fit the correct label of each traffic sign image.
[0101] The advantage of this invention is that, by training on these semantically enhanced images, the network can identify traffic signs that are worn, dirty, or obscured by fog, thus improving the anti-interference capability of traffic sign recognition.
Description of the Drawings
[0102] Figure 1: Flowchart of a specific embodiment of the present invention.
[0103] Figure 2: Basic framework diagram of the encoder-decoder network in a specific embodiment of the present invention.
Detailed Implementation
[0104] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0105] In specific implementation, the method proposed in the technical solution of this invention can be executed automatically by those skilled in the art using computer software technology. System devices that implement the method, such as computer-readable storage media storing the corresponding computer program and computer equipment running that program, also fall within the protection scope of this invention.
[0106] The technical solution of the system in this embodiment of the invention is: a traffic sign recognition neural network system and method based on encoder-decoder anti-disturbance, comprising:
[0107] User traffic sign collection system, host computer;
[0108] The user traffic sign collection system is connected to the host computer.
[0109] The user traffic sign collection system is used to collect traffic sign images and mark the correct traffic sign labels based on the traffic sign images.
[0110] The selected user traffic sign collection system is YOLOv5;
[0111] The host computer is selected as a server;
[0112] With reference to Figures 1 and 2, the technical solution of the method in the embodiment of the present invention, a traffic sign recognition method based on an encoder-decoder anti-perturbation neural network, is described in detail below:
[0113] Step 1: The user traffic sign collection system acquires each traffic sign image taken by different users, along with the real label for each traffic sign image;
[0114] Each traffic sign image mentioned in step 1 is defined as follows:
[0115] $TS_i = \{ts_i(x,y) \mid x\in[1,U],\ y\in[1,V]\}$
[0116] $i\in[1,N]$
[0117] where $TS_i$ denotes the i-th traffic sign image, $ts_i(x,y)$ denotes the pixel in the x-th row and y-th column of the i-th traffic sign image, U = 64 is the number of rows in the i-th traffic sign image, V = 64 is the number of columns in the i-th traffic sign image, and N = 39209 is the number of traffic sign images.
[0118] The true label for each traffic sign image is defined as follows:
[0119] $\{TSL_i\},\ TSL_i\in[0,Z]$
[0120] where $TSL_i$ is the true label of the i-th traffic sign image and Z = 43 is the total number of traffic sign categories.
[0121] Step 2: Construct a traffic sign recognition neural network based on encoder-decoder anti-perturbation. Input each traffic sign image and its real label into the network to compute the predicted label of each traffic sign image and the loss between the predicted label and the real label. Construct the loss function model of the encoder-decoder anti-perturbation neural network, then optimize and train it with Adam to obtain the optimized encoder-decoder anti-perturbation neural network. The optimized encoder-decoder anti-perturbation traffic sign recognition network outputs the predicted label for each traffic sign image.
[0122] The encoder-decoder anti-perturbation traffic sign recognition neural network described in step 2 includes:
[0123] A fully connected Softmax neural network for traffic sign recognition $P_c$, a misjudged image filtering and feature calculation module, an encoder, an image synthesis and recognition module, a decoder, and a traffic sign recognition convolutional neural network, connected in cascade.
[0124] The i-th traffic sign image $TS_i$ and the true label $TSL_i$ of the i-th traffic sign image are input to the fully connected Softmax neural network for traffic sign recognition $P_c$. $P_c$ outputs the predicted-label probability vector $PTSL_i$ of the i-th traffic sign image, and its last fully connected layer outputs the m = 1024-dimensional fully connected feature vector $FTS_i$ of the i-th traffic sign image, $i\in[1,N]$,
[0125] $FTS_i$ is defined as follows:
[0126] $FTS_i = (FTS_i^1, FTS_i^2, \dots, FTS_i^m)$
[0127] where N is the number of traffic sign images;
[0128] and $FTS_i^j$ denotes the j-th value of the m-dimensional fully connected feature vector $FTS_i$ of the i-th traffic sign image,
[0129] $PTSL_i$ is defined as follows:
[0130] $PTSL_i = (PTSL_i^0, PTSL_i^1, \dots, PTSL_i^Z)$
[0131] where $PTSL_i^j$ denotes the probability that the i-th traffic sign image belongs to the j-th class, $j\in[0,Z]$;
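To make the relationship between the feature vector and the probability vector concrete, the sketch below maps a feature vector to class probabilities with one linear layer followed by softmax. The single linear projection, the function names, and the tiny dimensions (2 features, 3 classes instead of 1024 features and the full set of sign categories) are assumptions for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, weights, biases):
    """Map a feature vector to class probabilities (linear layer + softmax)."""
    logits = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


fts = [1.0, 0.0]                                 # stand-in for the feature vector
weights = [[2.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # hypothetical 3-class projection
biases = [0.0, 0.0, 0.0]

ptsl = classify(fts, weights, biases)            # probability vector
predicted = max(range(len(ptsl)), key=ptsl.__getitem__)
```

The entries of `ptsl` sum to one, and the index of the largest entry is the label that the filtering module later compares with the true label.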
[0132] Each traffic sign image, the true label of each traffic sign image, the predicted-label probability vector $PTSL_i$ of the i-th traffic sign image, and the m-dimensional fully connected feature vector $FTS_i$ of the i-th traffic sign image output by the last fully connected layer of the fully connected Softmax neural network for traffic sign recognition are input to the misjudged image filtering and feature calculation module;
[0133] The misjudged image filtering and feature calculation module outputs the i-th misjudged traffic sign image $WTS_i$, the true label $WTSL_i$ of the i-th misjudged traffic sign image, the predicted-label probability vector WPTSL of the misjudged traffic sign images, the predicted-label probability vector TPTSL of the correct traffic sign images, the feature vector $AF_{pos}$ of the correct traffic sign images, and the feature vector $AF_{neg}$ of the misjudged traffic sign images;
[0134] Each misjudged traffic sign image $WTS_i$ is defined as follows:
[0135] $WTS_i = \{wts_i(x,y) \mid x\in[1,U],\ y\in[1,V]\}$
[0136] $i\in[1,N]$
[0137] where $WTS_i$ denotes the i-th misjudged traffic sign image, $wts_i(x,y)$ denotes the pixel in the x-th row and y-th column of the i-th misjudged traffic sign image, U is the number of rows of the i-th misjudged traffic sign image, V is the number of columns of the i-th misjudged traffic sign image, and N is the number of misjudged traffic sign images.
[0138] The input to the encoder is the i-th misjudged traffic sign image $WTS_i$, and the output is the semantic image $MTS_i$ of the i-th misjudged traffic sign:
[0139] $MTS_i = \{mts_i(x,y) \mid x\in[1,U],\ y\in[1,V]\}$
[0140] $i\in[1,N]$
[0141] where $MTS_i$ denotes the semantic image of the i-th misjudged traffic sign, $mts_i(x,y)$ denotes the pixel in the x-th row and y-th column of the i-th misjudged traffic sign semantic image, U is the number of rows of the i-th misjudged traffic sign semantic image, V is the number of columns of the i-th misjudged traffic sign semantic image, and N is the number of misjudged traffic sign semantic images.
[0142] The inputs to the image synthesis and recognition module are the i-th misjudged traffic sign semantic image $MTS_i$, the i-th misjudged traffic sign image $WTS_i$, and the true label $WTSL_i$ of the i-th misjudged traffic sign image; the output is the semantic synthesized image $TSR_i$ of the i-th misjudged traffic sign;
[0143] Each misjudged traffic sign semantic synthesized image $TSR_i$ is defined as follows:
[0144] $TSR_i = \{tsr_i(x,y) \mid x\in[1,U],\ y\in[1,V]\}$
[0145] where $TSR_i$ denotes the semantic synthesized image of the i-th misjudged traffic sign, $tsr_i(x,y)$ denotes the pixel in the x-th row and y-th column of the i-th misjudged traffic sign semantic synthesized image, U is the number of rows of that image, and V is the number of columns of that image;
[0146] The decoder takes as input the semantic synthesized image of each misjudged traffic sign and outputs the semantically enhanced image $TSRR_i$ of each misjudged traffic sign image, together with the true label $TSRRL_i$ of that semantically enhanced image;
[0147] The semantically enhanced image $TSRR_i$ of each misjudged traffic sign image is defined as follows:
[0148] $TSRR_i = \{tsrr_i(x,y) \mid x\in[1,U],\ y\in[1,V]\}$
[0149] where $TSRR_i$ denotes the semantically enhanced image of the i-th misjudged traffic sign image, $tsrr_i(x,y)$ denotes the pixel in the x-th row and y-th column of that semantically enhanced image, U is the number of rows of that image, and V is the number of columns of that image.
[0150] The inputs to the traffic sign recognition convolutional neural network are the semantically enhanced image $TSRR_i$ of each misjudged traffic sign image, the true label $TSRRL_i$ of that semantically enhanced image, each traffic sign image $TS_i$, and the true label $TSL_i$ of each traffic sign image; the output is the predicted label $CNPTSL_i$ of each traffic sign image.
[0151] $CNPTSL_i$, the predicted label of the i-th traffic sign image, is defined as follows:
[0152] $\{CNPTSL_i\},\ CNPTSL_i\in[0,Z]$
[0153] where $CNPTSL_i$ is the predicted label of the i-th traffic sign image and Z is the total number of traffic sign categories.
[0154] The inputs to the misjudged image filtering and feature calculation module are the i-th traffic sign image $TS_i$, the true label $TSL_i$ of the i-th traffic sign image, the predicted-label probability vector $PTSL_i$ of the i-th traffic sign image, and the m-dimensional fully connected feature vector $FTS_i$ of the traffic sign image output by the last fully connected layer of the fully connected Softmax neural network for traffic sign recognition;
[0155] The outputs of the misjudged image filtering and feature calculation module are the i-th misjudged traffic sign image $WTS_i$, the true label $WTSL_i$ of the i-th misjudged traffic sign image, the i-th correct traffic sign image $TTS_i$, the predicted-label probability vector $WPTSL_i$ of the i-th misjudged traffic sign image, the predicted-label probability vector $TPTSL_i$ of the i-th correct traffic sign image, the m-dimensional fully connected feature vector $WFTS_i$ of the i-th misjudged traffic sign image, the m-dimensional fully connected feature vector $TFTS_i$ of the i-th correct traffic sign image, the predicted-label probability vector WPTSL of the misjudged traffic sign images, the predicted-label probability vector TPTSL of the correct traffic sign images, the feature vector $AF_{pos}$ of the correct traffic sign images, and the feature vector $AF_{neg}$ of the misjudged traffic sign images;
[0156] The i-th misjudged traffic sign image $WTS_i$ is defined as follows:
[0157] If the predicted label probability vector satisfies $\max(PTSL_i) \neq TSL_i$, the traffic sign image is determined to be the misjudged traffic sign image $WTS_i$;
[0158] where $\max(PTSL_i)$ is the label corresponding to the maximum value in $PTSL_i$;
[0159] The i-th correct traffic sign image $TTS_i$ is defined as follows:
[0160] If the predicted label probability vector satisfies $\max(PTSL_i) = TSL_i$, the traffic sign image is determined to be the i-th correct traffic sign image $TTS_i$;
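The filtering rule above (an image is misjudged when the label with the highest predicted probability differs from its true label, and correct otherwise) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `predicted_label` and `split_misjudged` and the sample data are hypothetical, and ties in the probability vector are assumed to resolve to the lowest label index.

```python
from typing import List, Sequence, Tuple


def predicted_label(ptsl: Sequence[float]) -> int:
    """Label index holding the maximum probability in PTSL_i (first index on ties)."""
    return max(range(len(ptsl)), key=lambda j: ptsl[j])


def split_misjudged(
    true_labels: Sequence[int], probs: Sequence[Sequence[float]]
) -> Tuple[List[int], List[int]]:
    """Partition image indices into (misjudged, correct) by comparing
    the arg-max label of PTSL_i with the true label TSL_i."""
    misjudged: List[int] = []
    correct: List[int] = []
    for i, (tsl, ptsl) in enumerate(zip(true_labels, probs)):
        if predicted_label(ptsl) == tsl:
            correct.append(i)
        else:
            misjudged.append(i)
    return misjudged, correct


# Hypothetical data: three images, three classes.
probs = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6]]
true_labels = [1, 2, 2]
wrong, right = split_misjudged(true_labels, probs)
print(wrong, right)  # [1] [0, 2]
```

The returned index lists can then be used to select the images, labels, probability vectors, and feature vectors that form the misjudged set and the correct set.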
[0161] The probability vector of the predicted label of the i-th misjudged traffic sign image $WTS_i$ is denoted $WPTSL_i$;
[0162] $WPTSL_i$ is defined as follows:
[0163]
[0164] where each component of $WPTSL_i$ is the probability that the i-th misjudged traffic sign image belongs to the j-th class, $i \in [1, w]$, w represents the number of misjudged traffic sign images, and $j \in [0, Z]$.
[0165] The probability vector of the predicted label of the i-th correct traffic sign image $TTS_i$ is denoted $TPTSL_i$ and is defined as follows:
[0166]
[0167] where each component of $TPTSL_i$ is the probability that the i-th correct traffic sign image belongs to the j-th class, $i \in [1, t]$, t represents the number of correct traffic sign images, $j \in [0, Z]$, and Z represents the total number of traffic sign image categories;
[0168] The probability vector $WPTSL$ of the predicted labels of misjudged traffic sign images is defined as follows:
[0169]
[0170] where w represents the total number of misjudged traffic sign images, and $\|\cdot\|_2$ denotes the L2 norm (2-norm).
[0171] The probability vector $TPTSL$ of the predicted labels of correct traffic sign images is defined as follows:
[0172]
[0173] Where t is the total number of correct traffic sign images;
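The text above states only that the aggregate vectors WPTSL and TPTSL are built from the w misjudged (respectively t correct) per-image probability vectors and involve the L2 norm. One possible reading, shown here purely as an assumption, is the average of the L2-normalized per-image vectors. The helper names `l2_normalize` and `mean_normalized` are hypothetical.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def mean_normalized(vectors: Sequence[Sequence[float]]) -> List[float]:
    """ASSUMED aggregation: average of the L2-normalized input vectors.
    Applied to all WPTSL_i it would give WPTSL; applied to all TPTSL_i, TPTSL."""
    if not vectors:
        raise ValueError("at least one probability vector is required")
    normed = [l2_normalize(v) for v in vectors]
    dim = len(normed[0])
    return [sum(v[j] for v in normed) / len(normed) for j in range(dim)]


# Hypothetical per-image vectors for two misjudged images.
wptsl = mean_normalized([[3.0, 4.0], [0.0, 2.0]])
```

If the intended definition instead normalizes the summed vector once, only `mean_normalized` would need to change; the per-image inputs stay the same.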
[0174] The m-dimensional fully connected feature vector $WFTS_i$ of the i-th misjudged traffic sign image is defined as follows:
[0175]
[0176] where each component of $WFTS_i$ is the j-th value of the m-dimensional fully connected feature vector of the i-th misjudged traffic sign image, $j \in [1, m]$; the m-dimensional fully connected feature vector $TFTS_i$ of the i-th correct traffic sign image is defined as follows:
[0177]
[0178] where each component of $TFTS_i$ is the j-th value of the m-dimensional fully connected feature vector of the i-th correct traffic sign image, $j \in [1, m]$;
[0179] The correct traffic sign image feature vector $AF_{pos}$ and the misjudged traffic sign image feature vector $AF_{neg}$ are calculated as follows:
[0180]
[0181]
[0182] where w represents the number of misjudged traffic sign images, and t represents the number of correct traffic sign images.
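Because the calculation depends only on the per-image feature vectors and the counts w and t, a natural reading is that $AF_{pos}$ is the element-wise mean of the t feature vectors $TFTS_i$ and $AF_{neg}$ the element-wise mean of the w feature vectors $WFTS_i$. The sketch below assumes exactly that; the averaging itself and the name `mean_vector` are assumptions, not taken from the source.

```python
from typing import List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized m-dimensional feature vectors."""
    if not vectors:
        raise ValueError("at least one feature vector is required")
    m = len(vectors[0])
    if any(len(v) != m for v in vectors):
        raise ValueError("all feature vectors must have the same dimension m")
    return [sum(v[j] for v in vectors) / len(vectors) for j in range(m)]


# Hypothetical 2-dimensional features: t = 2 correct images, w = 1 misjudged image.
tfts = [[1.0, 2.0], [3.0, 4.0]]
wfts = [[0.0, 2.0]]
af_pos = mean_vector(tfts)  # ASSUMED AF_pos: mean over correct images
af_neg = mean_vector(wfts)  # ASSUMED AF_neg: mean over misjudged images
```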
[0183] The image synthesis and computation module consists of an image synthesis module and a traffic sign convolutional neural network to be trained. Its input is the i-th misjudged traffic sign semantic image $MTS_i$, the i-th misjudged traffic sign image $WTS_i$, and the correct label $WTSL_i$ of the i-th misjudged traffic sign image. Its output is the i-th misjudged traffic sign semantic composite image $TSR_i$.
[0184] The i-th misjudged traffic sign semantic composite image $TSR_i$ is calculated as follows:
[0185] $TSR_i = MTS_i + WTS_i,\ i \in [1, w]$.
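The synthesis step adds the semantic image produced by the encoder to the misjudged image it was derived from. A minimal sketch of that pixel-wise sum, treating each image as a U x V list of rows, is shown below. The function name `synthesize_image` is hypothetical, and no clipping or rescaling of pixel values is applied because the source does not mention any.

```python
from typing import List, Sequence

Image = Sequence[Sequence[float]]


def synthesize_image(mts: Image, wts: Image) -> List[List[float]]:
    """Pixel-wise sum TSR_i = MTS_i + WTS_i for two images of identical size."""
    if len(mts) != len(wts) or any(len(a) != len(b) for a, b in zip(mts, wts)):
        raise ValueError("MTS_i and WTS_i must have the same U x V shape")
    return [
        [m_px + w_px for m_px, w_px in zip(m_row, w_row)]
        for m_row, w_row in zip(mts, wts)
    ]


# Hypothetical 2 x 2 single-channel images.
tsr = synthesize_image([[1, 2], [3, 4]], [[10, 20], [30, 40]])
print(tsr)  # [[11, 22], [33, 44]]
```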
[0186] The loss function model of the anti-disturbance neural network described in step 2 is specifically defined as follows:
[0187] $L_{all} = L_{fullconn} + L_{code} + \alpha d_{posi} + \beta d_{neg} + \eta L_{wr} + \gamma L_{corret} + L_{cnn} + L_{tsrr}$
[0188] where $L_{all}$ is the total loss function; $\alpha$ is the coefficient of the feature loss $d_{posi}$ for correct traffic sign images, used to control the similarity between the features of each traffic sign image and those of the correct traffic sign images; $\beta$ is the coefficient of the feature loss $d_{neg}$, used to control the similarity between the features of each traffic sign image and those of the misjudged traffic sign images; $\eta$ is the coefficient of $L_{wr}$, the feature space similarity between each traffic sign image and the misjudged traffic sign images; and $\gamma$ is the coefficient of $L_{corret}$, the feature space similarity between each traffic sign image and the correct traffic sign images.
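The total loss is a sum of eight terms, four of which are scaled by the coefficients α, β, η, and γ. The sketch below only combines already computed term values; how each term is obtained is outside its scope. The `LossTerms` container, the `total_loss` function, and the numeric values are hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass
class LossTerms:
    """Scalar values of the individual loss terms, named after the source symbols."""
    fullconn: float  # L_fullconn
    code: float      # L_code
    d_posi: float    # d_posi
    d_neg: float     # d_neg
    wr: float        # L_wr
    corret: float    # L_corret (spelling as in the source)
    cnn: float       # L_cnn
    tsrr: float      # L_tsrr


def total_loss(t: LossTerms, alpha: float, beta: float, eta: float, gamma: float) -> float:
    """L_all = L_fullconn + L_code + a*d_posi + b*d_neg + e*L_wr + g*L_corret + L_cnn + L_tsrr."""
    return (
        t.fullconn + t.code
        + alpha * t.d_posi + beta * t.d_neg
        + eta * t.wr + gamma * t.corret
        + t.cnn + t.tsrr
    )


# Hypothetical term values and coefficients.
terms = LossTerms(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)
print(total_loss(terms, alpha=0.5, beta=0.5, eta=2.0, gamma=1.0))  # 8.0
```

In a training loop the same weighted sum would be formed from differentiable tensors and minimized with Adam, as the method describes; the coefficient values are tuning choices that the source does not specify.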
[0189]
[0190] where N is the total number of traffic sign images input to the traffic sign recognition fully connected Softmax neural network; $L_{fullconn}$ is the loss function of the traffic sign recognition fully connected Softmax neural network, whose input is each traffic sign image and its true label; $PTSL_i$ is the predicted label probability vector of the i-th traffic sign image output by the traffic sign recognition fully connected Softmax neural network $P_c$; and $\max(PTSL_i)$ represents the maximum value in that predicted label probability vector, $i \in [1, N]$;
[0191]
[0192] where MSE is the mean squared error loss function, $TSR_i$ is the i-th misjudged traffic sign semantic composite image, and $WTS_i$ is the i-th misjudged traffic sign image. $L_{code}$ is the encoder-decoder loss function, which aims to make the semantically enhanced image of each misjudged traffic sign image as close as possible to the original image, so that the semantic image of the misjudged traffic sign looks more natural; $i \in [1, w]$, where w represents the number of misjudged traffic sign images;
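The encoder-decoder loss compares each semantic composite image with the misjudged image it was built from, using mean squared error. The sketch below assumes the per-image MSE values are averaged over the w misjudged images; that averaging, and the names `mse` and `code_loss`, are assumptions made for illustration.

```python
from typing import Sequence

Image = Sequence[Sequence[float]]


def mse(a: Image, b: Image) -> float:
    """Mean squared error over all pixels of two equally sized images."""
    flat_a = [px for row in a for px in row]
    flat_b = [px for row in b for px in row]
    if not flat_a or len(flat_a) != len(flat_b):
        raise ValueError("images must be non-empty and of the same size")
    return sum((x - y) ** 2 for x, y in zip(flat_a, flat_b)) / len(flat_a)


def code_loss(tsr_images: Sequence[Image], wts_images: Sequence[Image]) -> float:
    """ASSUMED L_code: mean over the w misjudged images of MSE(TSR_i, WTS_i)."""
    pairs = list(zip(tsr_images, wts_images))
    if not pairs:
        raise ValueError("at least one image pair is required")
    return sum(mse(tsr, wts) for tsr, wts in pairs) / len(pairs)


# Hypothetical 1 x 2 images, w = 2.
loss = code_loss([[[1.0, 2.0]], [[0.0, 0.0]]], [[[1.0, 4.0]], [[0.0, 0.0]]])
print(loss)  # 1.0
```

A small value means the composite images stay visually close to the original misjudged images, which is the stated goal of this term.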
[0193]
[0194] where $FTS_i$ is the m-dimensional fully connected feature vector of the i-th traffic sign image output by the last fully connected layer of the traffic sign recognition fully connected Softmax neural network, and $AF_{pos}$ is the correct traffic sign image feature vector. $d_{posi}$ is used to make each traffic sign image carry the features of correct images as far as possible; $i \in [1, N]$, where N represents the number of traffic sign images;
[0195]
[0196] where $FTS_i$ is the m-dimensional fully connected feature vector of the i-th traffic sign image output by the last fully connected layer of the traffic sign recognition fully connected Softmax neural network, and $AF_{neg}$ is the misjudged traffic sign image feature vector. $d_{neg}$ is used to control the extent to which each traffic sign image carries the features of misjudged images; $i \in [1, N]$, where N represents the number of traffic sign images;
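Both feature terms relate the per-image feature vector $FTS_i$ to a reference vector ($AF_{pos}$ or $AF_{neg}$). The exact distance measure is not recoverable from the text above, so the sketch below assumes the mean Euclidean distance over the N images; the function name `mean_feature_distance` and the sample values are hypothetical.

```python
import math
from typing import Sequence


def mean_feature_distance(
    features: Sequence[Sequence[float]], reference: Sequence[float]
) -> float:
    """ASSUMED form of d_posi / d_neg: average Euclidean distance between
    each feature vector FTS_i and a reference vector (AF_pos or AF_neg)."""
    if not features:
        raise ValueError("at least one feature vector is required")
    return sum(math.dist(f, reference) for f in features) / len(features)


# Hypothetical 2-dimensional features for N = 2 images.
fts = [[0.0, 0.0], [6.0, 8.0]]
d_posi = mean_feature_distance(fts, [3.0, 4.0])  # distance to an assumed AF_pos
d_neg = mean_feature_distance(fts, [0.0, 0.0])   # distance to an assumed AF_neg
```

Under this assumption a small `d_posi` means the features lie near those of correctly recognized images, and the coefficients α and β in the total loss decide how strongly each distance influences training.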
[0197]
[0198] where $PTSL_i$ is the predicted label probability vector of the i-th traffic sign image output by the traffic sign recognition fully connected Softmax neural network $P_c$, and $WPTSL$ is the probability vector of the predicted labels of misjudged traffic sign images. $L_{wr}$ is used to make each traffic sign image as similar as possible in feature space to the misjudged traffic sign images; $i \in [1, N]$, where N represents the number of traffic sign images;
[0199]
[0200] where $TPTSL$ is the probability vector of the predicted labels of correct traffic sign images, and $PTSL_i$ is the predicted label probability vector of the i-th traffic sign image output by the traffic sign recognition fully connected Softmax neural network $P_c$. $L_{corret}$ is used to make each traffic sign image as similar as possible in feature space to the correct traffic sign images; $i \in [1, N]$, where N represents the number of traffic sign images.
[0201]
[0202] where $CNPTSL_i$ is the predicted label of the i-th traffic sign image, and $L_{cnn}$ is the loss function used to make the traffic sign recognition convolutional neural network fit the correct label of each traffic sign image.
[0203] Step 3: Obtain each traffic sign image taken by different users through the user traffic sign collection system, upload it to the host computer, and obtain the predicted label of the real-time traffic sign image through the optimized encoder-decoder anti-disturbance traffic sign recognition neural network.
[0204] It should be understood that any parts not described in detail in this specification belong to the prior art.
[0205] Although this document uses terms such as "user traffic sign collection system" and "host computer" extensively, the possibility of using other terms is not excluded. These terms are used merely for the convenience of describing the essence of the invention, and interpreting them as any additional limitation would contradict the spirit of the invention.
[0206] It should be understood that the above description of the preferred embodiments is quite detailed, but it should not be considered as a limitation on the scope of protection of this invention. Those skilled in the art, under the guidance of this invention, can make substitutions or modifications without departing from the scope of protection of the claims of this invention, and all such substitutions or modifications fall within the scope of protection of this invention. The scope of protection of this invention should be determined by the appended claims.
Claims
1. A traffic sign recognition method using an encoder-decoder anti-disturbance neural network, characterized in that a traffic sign recognition system using an encoder-decoder anti-disturbance neural network comprises: a user traffic sign collection system and a host computer;
the user traffic sign collection system is connected to the host computer;
the user traffic sign collection system acquires each traffic sign image taken by different users, together with the true label of each traffic sign image; an encoder-decoder anti-disturbance traffic sign recognition neural network is constructed and trained on each traffic sign image with Adam optimization to obtain an optimized encoder-decoder anti-disturbance traffic sign recognition neural network; the user traffic sign collection system collects traffic sign images in real time and transmits them to the host computer; the host computer uses the optimized encoder-decoder anti-disturbance traffic sign recognition neural network to predict on the real-time traffic sign images and obtain the predicted labels;
the method includes the following steps:
Step 1: The user traffic sign collection system acquires each traffic sign image taken by different users, together with the true label of each traffic sign image;
Step 2: Construct the encoder-decoder anti-disturbance traffic sign recognition neural network; pass each traffic sign image and its true label through the network to compute the predicted label; output the loss between the predicted label and the true label of each traffic sign image; construct the loss function model of the encoder-decoder anti-disturbance neural network; and train the network with Adam optimization to obtain the optimized encoder-decoder anti-disturbance neural network;
Step 3: The user traffic sign collection system collects traffic sign images in real time and transmits them to the host computer; the host computer uses the optimized encoder-decoder anti-disturbance traffic sign recognition neural network to predict on the real-time traffic sign images and obtain the predicted labels;
each traffic sign image in Step 1 is defined as follows: $TS_i$ denotes the i-th traffic sign image, made up of the pixels at the x-th row and y-th column of the i-th traffic sign image, U denotes the number of rows and V the number of columns of the i-th traffic sign image, and N denotes the number of traffic sign images;
the true label of each traffic sign image is defined as follows: $TSL_i$ is the true label of the i-th traffic sign image, and Z represents the total number of traffic sign image classes;
the encoder-decoder anti-disturbance traffic sign recognition neural network in Step 2 comprises a traffic sign recognition fully connected Softmax neural network, a misjudged image filtering and feature calculation module, an encoder, an image synthesis and recognition module, a decoder, and a traffic sign recognition convolutional neural network, cascaded in sequence;
the i-th traffic sign image $TS_i$ and the true label $TSL_i$ of the i-th traffic sign image are input to the traffic sign recognition fully connected Softmax neural network; the traffic sign recognition fully connected Softmax neural network outputs the predicted label probability vector $PTSL_i$ of the i-th traffic sign image, and its last fully connected layer outputs the m-dimensional fully connected feature vector $FTS_i$ of the i-th traffic sign image, which are defined as follows: N is the number of traffic sign images; each component of $FTS_i$ is the j-th value of the m-dimensional fully connected feature vector of the i-th traffic sign image; each component of $PTSL_i$ is the probability that the i-th traffic sign image belongs to the j-th class;
each traffic sign image, the true label of each traffic sign image, the probability vector $PTSL_i$ of the predicted label of the i-th traffic sign image, and the m-dimensional fully connected feature vector $FTS_i$ of the i-th traffic sign image output by the last fully connected layer of the traffic sign recognition fully connected Softmax neural network are input to the misjudged image filtering and feature calculation module;
the misjudged image filtering and feature calculation module outputs the i-th misjudged traffic sign image $WTS_i$, the true label $WTSL_i$ of the i-th misjudged traffic sign image, the probability vector $WPTSL$ of the predicted labels of misjudged traffic sign images, the probability vector $TPTSL$ of the predicted labels of correct traffic sign images, the correct traffic sign image feature vector $AF_{pos}$, and the misjudged traffic sign image feature vector $AF_{neg}$;
each misjudged traffic sign image is defined as follows: $WTS_i$ denotes the i-th misjudged traffic sign image, made up of the pixels at the x-th row and y-th column of the i-th misjudged traffic sign image, U denotes the number of rows and V the number of columns of the i-th misjudged traffic sign image, and w denotes the number of misjudged traffic sign images;
the input of the encoder is the i-th misjudged traffic sign image $WTS_i$, and its output is the i-th misjudged traffic sign semantic image $MTS_i$, defined as follows: $MTS_i$ denotes the i-th misjudged traffic sign semantic image, made up of the pixels at the x-th row and y-th column of the i-th misjudged traffic sign semantic image, U denotes the number of rows and V the number of columns of the i-th misjudged traffic sign semantic image, and w denotes the number of misjudged traffic sign semantic images;
the input of the image synthesis and recognition module is the i-th misjudged traffic sign semantic image $MTS_i$, the i-th misjudged traffic sign image $WTS_i$, and the correct label $WTSL_i$ of the misjudged traffic sign image, and its output is the i-th misjudged traffic sign semantic composite image $TSR_i$;
each misjudged traffic sign semantic composite image is defined as follows: $TSR_i$ denotes the i-th misjudged traffic sign semantic composite image, made up of the pixels at the x-th row and y-th column of the i-th misjudged traffic sign semantic composite image, and U denotes the number of rows and V the number of columns of the i-th misjudged traffic sign semantic composite image;
the decoder takes as input the semantic composite image of each misjudged traffic sign and outputs the semantically enhanced image $TSRR_i$ of each misjudged traffic sign image and the correct label $TSRRL_i$ of the semantically enhanced image of each misjudged traffic sign image;
the semantically enhanced image of each misjudged traffic sign image is defined as follows: $TSRR_i$ denotes the semantically enhanced image of the i-th misjudged traffic sign image, made up of the pixels at the x-th row and y-th column of that semantically enhanced image, and U denotes the number of rows and V the number of columns of the semantically enhanced image of the i-th misjudged traffic sign image;
the input of the traffic sign recognition convolutional neural network is the semantically enhanced image $TSRR_i$ of each misjudged traffic sign image, the correct label $TSRRL_i$ of the semantically enhanced image of each misjudged traffic sign image, each traffic sign image $TS_i$, and the true label $TSL_i$ of each traffic sign image, and its output is the predicted label $CNPTSL_i$ of each traffic sign image;
the predicted label of the i-th traffic sign image is defined as follows: $\{CNPTSL_i\},\ CNPTSL_i \in [0, Z]$, where $CNPTSL_i$ is the predicted label of the i-th traffic sign image, and Z represents the total number of traffic sign image classes.
2. The traffic sign recognition method using an encoder-decoder anti-disturbance neural network according to claim 1, characterized in that:
the input of the misjudged image filtering and feature calculation module is the i-th traffic sign image $TS_i$, the true label $TSL_i$ of the i-th traffic sign image, the probability vector $PTSL_i$ of the predicted label of the i-th traffic sign image, and the m-dimensional fully connected feature vector $FTS_i$ of the i-th traffic sign image output by the last fully connected layer of the traffic sign recognition fully connected Softmax neural network;
the output of the misjudged image filtering and feature calculation module is the i-th misjudged traffic sign image $WTS_i$, the true label $WTSL_i$ of the i-th misjudged traffic sign image, the i-th correct traffic sign image $TTS_i$, the probability vector $WPTSL_i$ of the predicted label of the i-th misjudged traffic sign image, the probability vector $TPTSL_i$ of the predicted label of the i-th correct traffic sign image, the m-dimensional fully connected feature vector $WFTS_i$ of the i-th misjudged traffic sign image, the m-dimensional fully connected feature vector $TFTS_i$ of the i-th correct traffic sign image, the probability vector $WPTSL$ of the predicted labels of misjudged traffic sign images, the probability vector $TPTSL$ of the predicted labels of correct traffic sign images, the correct traffic sign image feature vector $AF_{pos}$, and the misjudged traffic sign image feature vector $AF_{neg}$;
the i-th misjudged traffic sign image $WTS_i$ is defined as follows: if the predicted label probability vector satisfies $\max(PTSL_i) \neq TSL_i$, the traffic sign image is determined to be the misjudged traffic sign image $WTS_i$, where $\max(PTSL_i)$ is the label corresponding to the maximum value in $PTSL_i$;
the i-th correct traffic sign image $TTS_i$ is defined as follows: if the predicted label probability vector satisfies $\max(PTSL_i) = TSL_i$, the traffic sign image is determined to be the i-th correct traffic sign image $TTS_i$;
the probability vector of the predicted label of the i-th misjudged traffic sign image $WTS_i$ is denoted $WPTSL_i$, each component of which is the probability that the i-th misjudged traffic sign image belongs to the j-th class, where w represents the number of misjudged traffic sign images;
the probability vector of the predicted label of the i-th correct traffic sign image $TTS_i$ is denoted $TPTSL_i$, each component of which is the probability that the i-th correct traffic sign image belongs to the j-th class, where t represents the number of correct traffic sign images and Z represents the total number of traffic sign image classes;
the probability vector $WPTSL$ of the predicted labels of misjudged traffic sign images is computed over the misjudged traffic sign images, where w is the total number of misjudged traffic sign images and $\|\cdot\|_2$ denotes the L2 norm;
the probability vector $TPTSL$ of the predicted labels of correct traffic sign images is computed over the correct traffic sign images, where t is the total number of correct traffic sign images;
the m-dimensional fully connected feature vector $WFTS_i$ of the i-th misjudged traffic sign image is defined such that each component is the j-th value of the m-dimensional fully connected feature vector of the i-th misjudged traffic sign image, $j \in [1, m]$;
the m-dimensional fully connected feature vector $TFTS_i$ of the i-th correct traffic sign image is defined such that each component is the j-th value of the m-dimensional fully connected feature vector of the i-th correct traffic sign image, $j \in [1, m]$;
the correct traffic sign image feature vector $AF_{pos}$ and the misjudged traffic sign image feature vector $AF_{neg}$ are calculated from these feature vectors, where w represents the number of misjudged traffic sign images and t represents the number of correct traffic sign images.
3. The traffic sign recognition method using an encoder-decoder anti-disturbance neural network according to claim 2, characterized in that:
the image synthesis calculation module consists of an image synthesis module and a traffic sign convolutional neural network to be trained; its input is the i-th misjudged traffic sign semantic image $MTS_i$, the i-th misjudged traffic sign image $WTS_i$, and the correct label $WTSL_i$ of the i-th misjudged traffic sign image; its output is the i-th misjudged traffic sign semantic composite image $TSR_i$;
the i-th misjudged traffic sign semantic composite image $TSR_i$ is calculated as follows: $TSR_i = MTS_i + WTS_i,\ i \in [1, w]$.
4. The traffic sign recognition method using an encoder-decoder anti-disturbance neural network according to claim 3, characterized in that:
the loss function model of the traffic sign recognition neural network described in step 2 is specifically defined as follows:
$L_{all} = L_{fullconn} + L_{code} + \alpha d_{posi} + \beta d_{neg} + \eta L_{wr} + \gamma L_{corret} + L_{cnn} + L_{tsrr}$
where $L_{all}$ is the total loss function; $\alpha$ is the coefficient of the feature loss for correct traffic sign images, used to control the similarity between the features of each traffic sign image and those of the correct traffic sign images; $\beta$ is the coefficient of the feature loss for each traffic sign image, used to control the similarity between the features of each traffic sign image and those of the misjudged traffic sign images; $\eta$ controls the feature space similarity between each traffic sign image and the misjudged traffic sign images; $\gamma$ controls the feature space similarity between each traffic sign image and the correct traffic sign images;
for $L_{fullconn}$: N represents the total number of traffic sign images input to the traffic sign recognition fully connected Softmax neural network; $L_{fullconn}$ is the loss function of the traffic sign recognition fully connected Softmax neural network, whose input is each traffic sign image and its true label; $PTSL_i$ is the predicted label probability vector of the i-th traffic sign image output by the traffic sign recognition fully connected Softmax neural network $P_c$; $\max(PTSL_i)$ is the maximum value in the predicted label probability vector of the i-th traffic sign image, $i \in [1, N]$;
for $L_{code}$: MSE is the mean squared error loss function; $TSR_i$ is the i-th misjudged traffic sign semantic composite image; $WTS_i$ is the i-th misjudged traffic sign image; $L_{code}$ is the encoder-decoder loss function, which aims to make the semantically enhanced image of each misjudged traffic sign image as close as possible to the original image, so that the semantic image of the misjudged traffic sign looks more natural; $i \in [1, w]$, where w represents the number of misjudged traffic sign images;
for $d_{posi}$: $FTS_i$ is the m-dimensional fully connected feature vector of the i-th traffic sign image output by the last fully connected layer of the traffic sign recognition fully connected Softmax neural network; $AF_{pos}$ is the correct traffic sign image feature vector; $d_{posi}$ is used to make each traffic sign image carry the features of correct images as far as possible; N represents the number of traffic sign images;
for $d_{neg}$: $FTS_i$ is the m-dimensional fully connected feature vector of the i-th traffic sign image output by the last fully connected layer of the traffic sign recognition fully connected Softmax neural network; $AF_{neg}$ is the misjudged traffic sign image feature vector; $d_{neg}$ is used to control the extent to which each traffic sign image carries the features of misjudged images; N represents the number of traffic sign images;
for $L_{wr}$: $PTSL_i$ is the predicted label probability vector of the i-th traffic sign image output by the traffic sign recognition fully connected Softmax neural network $P_c$; $WPTSL$ is the probability vector of the predicted labels of misjudged traffic sign images; $L_{wr}$ is used to make each traffic sign image as similar as possible in feature space to the misjudged traffic sign images; N represents the number of traffic sign images;
for $L_{corret}$: $TPTSL$ is the probability vector of the predicted labels of correct traffic sign images; $PTSL_i$ is the predicted label probability vector of the i-th traffic sign image output by the traffic sign recognition fully connected Softmax neural network $P_c$; $L_{corret}$ is used to make each traffic sign image as similar as possible in feature space to the correct traffic sign images; N represents the number of traffic sign images;
for $L_{cnn}$: $CNPTSL_i$ is the predicted label of the i-th traffic sign image; $L_{cnn}$ is the loss function used to make the traffic sign recognition convolutional neural network fit the correct label of each traffic sign image.