[0019] The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of this application without creative effort fall within the scope of protection of this application.
[0020] To illustrate the principle and motivation of the scheme provided by the present application, consider the example of Figures 1(a)-1(c): Figure 1(a) is a license plate of country A, Figure 1(b) is a license plate of country B, and Figure 1(c) is a license plate of country C. The license plate of country A is extremely similar in appearance to that of country B and cannot be accurately classified by ordinary classification methods, but the two differ somewhat in the arrangement of English letters and digits. The license plate of country C differs more from country A in appearance than the license plate of country B does, yet its arrangement of letters and digits is the same as country A's. To distinguish the license plates of these three countries accurately, the license plate appearance and the internal character format of the plate must be combined. However, for the country classification task of overseas license plates, with dozens or even hundreds of categories and complicated plate formats in some countries, it is difficult to enumerate the format rules manually. A qualified overseas license plate classification network must therefore learn not only the overall appearance of the license plate but also its character content and arrangement. In practical applications, owing to limited data and the difficulty of the task, a general classification network struggles to capture these key features: a general convolutional neural network (CNN) classifier usually learns only the overall appearance of the license plate well and learns almost nothing of the plate content.
[0021] In view of the above problems, this application proposes a license plate country classification method based on the fusion of CNN and Transformer features. The method uses a two-branch network: a Transformer branch and a CNN network branch. The Transformer branch extracts the internal relationships among license plate characters, and the CNN network branch extracts the key appearance features of the license plate. The features from these two aspects are fused into a more robust feature, which is then sent to the classification network to obtain a more accurate classification result. The adopted scheme is described in detail below.
[0022] Referring to Figure 2, Figure 2 is a schematic flowchart of an embodiment of the license plate classification method provided by the present application. The method includes:
[0023] Step 11: Perform license plate recognition processing on the first license plate image to obtain a license plate recognition result.
[0024] The first license plate image can be obtained by an image acquisition device or by searching an image database. The image acquisition device can be a camera or a device with a camera. The image acquisition device can be set on the vehicle or independent of the vehicle.
[0025] After the first license plate image is obtained, an existing license plate recognition algorithm may be used to perform license plate recognition on it to generate a license plate recognition result. For example, taking Figure 1(a) as an example, the license plate of country A is recognized, and the obtained license plate recognition result is {A, B, D, 5, 0, 7}.
[0026] Further, the acquired scene image (which includes the license plate) may be used directly as the first license plate image, or the image of the region where the license plate is located in the scene image may be used as the first license plate image. Using only the license plate region as the first license plate image makes its subsequent processing simpler and more effective.
[0027] Step 12: Encoding the license plate recognition result to obtain the first license plate feature.
[0028] After the license plate recognition result corresponding to the first license plate image is obtained, the license plate recognition result can be encoded by using an encoding method to obtain the corresponding encoded feature (ie, the first license plate feature).
[0029] Step 13: Perform feature extraction processing on the first license plate image to obtain second license plate features.
[0030] For the acquired first license plate image, a feature extraction method (such as: CNN) can be used to directly perform feature extraction processing on the first license plate image to generate corresponding features (ie, second license plate features).
[0031] Step 14: Use the classification network to process the first license plate feature and the second license plate feature to obtain the first license plate classification result.
[0032] After the first license plate feature and the second license plate feature are obtained, they may be input into the pre-trained classification network, which fuses and classifies the two features to obtain the first license plate classification result. For example, taking Figure 1(a) as an example, after the image shown in Figure 1(a) is processed through steps 11-14, a first license plate classification result indicating that the country of the license plate is A can be obtained.
[0033] The solution provided in this embodiment mainly relates to the field of deep learning technology, in particular deep learning, license plate classification, and natural language processing technologies. License plate recognition is performed directly on the obtained first license plate image to obtain the character content of the plate, and the character content is encoded to obtain a corresponding text feature. The text feature is then fused with the appearance feature obtained by feature extraction from the first license plate image, and the fused feature is classified to obtain the license plate classification result. The character-format rules are thus integrated into the classification network, rather than handled by an isolated classify-then-correct procedure, which helps to correctly distinguish license plates with complicated style rules. By reducing the difficulty of learning the plate content, better classification results are achieved and the accuracy of license plate classification is improved.
[0034] Referring to Figure 3, Figure 3 is a schematic flowchart of another embodiment of the license plate classification method provided by the present application. The method includes:
[0035] Step 21: Acquire the image to be processed, and crop the image to be processed to generate the first license plate image.
[0036] First, the image to be processed is obtained through the image acquisition device, and the position of the license plate in the image to be processed is obtained through a license plate detection model; then the image at the license plate position is cropped from the image to be processed to obtain the first license plate image, which includes the license plate.
[0037] Step 22: Use the license plate recognition network to recognize the characters in the first license plate image to obtain a license plate recognition result.
[0038] As shown in Figure 4, the first license plate image is input to the license plate recognition network, and the license plate recognition network recognizes the characters (including digits, English letters, or other special characters) in the first license plate image to generate a license plate recognition result.
[0039] Step 23: Use a preset encoding method to encode the license plate recognition result to obtain a character vector sequence of the license plate.
[0040] The characters in overseas license plates are usually composed of digits, English letters, local characters, or special symbols. To convert the characters in the license plate into effective input for the Transformer model, this embodiment uses a one-hot encoding module to perform one-hot encoding; that is, the preset encoding method is one-hot encoding, which converts the character string in the license plate recognition result into a unique corresponding vector. The encoding rules are as follows:
[0041] Assume the number of character types in overseas license plates is M, the maximum character length that may appear on a license plate is N, and the length of the characters in the license plate to be classified is S. A vector of length M is assigned to each character in the license plate; because the length of the vector equals the number of character types, each position in the vector is associated with exactly one character. For a given character, the value at the corresponding position in the vector is set to 1 and the values at the remaining positions are set to 0, so a unique vector is generated to represent that character. For a license plate with S characters, S vectors of length M are generated, which are then padded with (N-S) all-zero vectors of length M, finally yielding N vectors of length M. That is, the license plate character vector sequence is an N×M array, which is the encoding sequence corresponding to the license plate number. The encoded license plate character vector sequence is similar to a text sequence and can be sent directly to the Transformer model, as shown in Figure 4.
[0042] For example, assume the maximum character length N of the license plate is 4 and the number of character types M is 3 (taking the letters "A", "B", and "C" as an example). Given a license plate number "BAA", three vectors of length 3 are generated to represent the characters "B", "A", and "A" in turn: the vector corresponding to "B" is [0, 1, 0] and the vector corresponding to "A" is [1, 0, 0]. Since the character length of this license plate is 1 less than the maximum character length, one all-zero vector [0, 0, 0] is appended, and the encoded license plate character vector sequence corresponding to "BAA" is: {[0,1,0]; [1,0,0]; [1,0,0]; [0,0,0]}.
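The encoding in this example can be sketched in a few lines of Python. The alphabet, maximum length, and function name below are illustrative assumptions for exposition, not part of the claimed method:

```python
# Hypothetical sketch of the one-hot encoding described above.
ALPHABET = ["A", "B", "C"]  # M = 3 character types
MAX_LEN = 4                 # N = maximum character length

def encode_plate(plate: str) -> list[list[int]]:
    """Encode a plate string into an N x M one-hot vector sequence.

    Each character maps to a length-M vector with a single 1; plates
    shorter than MAX_LEN are padded with all-zero vectors.
    """
    index = {ch: i for i, ch in enumerate(ALPHABET)}
    seq = []
    for ch in plate:
        vec = [0] * len(ALPHABET)
        vec[index[ch]] = 1  # the position associated with this character
        seq.append(vec)
    # Pad with (N - S) all-zero vectors to reach the fixed length N.
    while len(seq) < MAX_LEN:
        seq.append([0] * len(ALPHABET))
    return seq

# encode_plate("BAA") reproduces the sequence given in the example:
# [[0, 1, 0], [1, 0, 0], [1, 0, 0], [0, 0, 0]]
```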
[0043] Step 24: Use the Transformer model to process the character vector sequence of the license plate to obtain the first license plate feature.
[0044] The Transformer model includes an encoding module and a decoding module. As shown in Figure 5, the encoding module is first used to encode the license plate character vector sequence to obtain the encoded license plate character vector; the decoding module is then used to decode the encoded license plate character vector to obtain the first license plate feature.
[0045] Further, the encoding module is composed of several encoders connected in series, and the decoding module is composed of a corresponding number of decoders; that is, the number of encoders equals the number of decoders. Figure 5 takes two encoders and two decoders as an example. All encoders have the same structure; each encoder includes a first self-attention layer and a first feed-forward network. The input vector first passes through the first self-attention layer, the output of the first self-attention layer is passed to the first feed-forward network, and the result then enters the next encoder. The output of the last encoder is sent to each decoder; in the decoder, it passes in sequence through a second self-attention layer, a second encoder-decoder attention layer, and a second feed-forward network. The final output vector is the feature vector containing the character rules in the license plate.
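Purely as an illustration, the self-attention-plus-feed-forward structure of one encoder described above can be sketched numerically. This is a minimal single-head version with random placeholder weights; layer normalization, multiple heads, and the decoder are omitted, and none of the shapes or names are prescribed by the application:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over a sequence x (N x D)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return scores @ v

def encoder_layer(x, params):
    """Self-attention followed by a position-wise feed-forward network."""
    a = self_attention(x, params["wq"], params["wk"], params["wv"])
    h = np.maximum(0, (x + a) @ params["w1"])  # residual input + ReLU feed-forward
    return h @ params["w2"]

rng = np.random.default_rng(0)
N, D, H = 4, 3, 8  # sequence length, model dimension, hidden dimension (assumed)
params = {
    "wq": rng.standard_normal((D, D)),
    "wk": rng.standard_normal((D, D)),
    "wv": rng.standard_normal((D, D)),
    "w1": rng.standard_normal((D, H)),
    "w2": rng.standard_normal((H, D)),
}
x = np.eye(N, D)            # e.g. a one-hot character vector sequence
out = encoder_layer(x, params)  # same shape as the input: N x D
```

Because each output position attends over all input positions, a stack of such layers can model the arrangement rules among all characters of the plate, which is the role assigned to the Transformer branch here.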
[0046] Step 25: Perform feature extraction processing on the first license plate image to obtain second license plate features.
[0047] As shown in Figure 4, a CNN network may be used to process the first license plate image, extract its features, and generate the second license plate feature. Specifically, in the branch where the CNN network is located (i.e., the CNN network branch), any advanced backbone network may be used to extract the appearance features of the first license plate image. At the end of the CNN network branch, a feature map of the license plate is obtained; in order to be fused with the feature vector output by the Transformer model, both features must pass through shaping networks that adjust their dimensions, as described in steps 26 and 27.
[0048] Step 26: Use the first shaping network to perform dimensionality reduction processing on the first license plate feature, and obtain the first license plate feature after dimensionality reduction.
[0049] As shown in Figure 4, the classification network includes a first shaping network connected to the Transformer model. The first shaping network receives the first license plate feature output by the Transformer model, performs dimensionality reduction on it, and inputs the processed feature into the feature fusion layer.
[0050] Step 27: Using the second shaping network to perform dimensionality reduction processing on the second license plate features, to obtain the dimensionally reduced second license plate features.
[0051] As shown in Figure 4, the classification network also includes a second shaping network connected to the CNN network. The second shaping network receives the second license plate feature output by the CNN network and performs dimensionality reduction on it, so that the dimension of the dimension-reduced first license plate feature equals the dimension of the dimension-reduced second license plate feature; the dimension-reduced second license plate feature is then input into the feature fusion layer. That is, the two features input to the feature fusion layer have equal dimensions.
[0052] It can be understood that only one of the first shaping network and the second shaping network may be provided. When the dimension of the first license plate feature is greater than that of the second license plate feature, only the first shaping network is provided, and it reduces the dimension of the first license plate feature to equal that of the second license plate feature; when the dimension of the first license plate feature is less than that of the second license plate feature, only the second shaping network is provided, and it reduces the dimension of the second license plate feature to equal that of the first license plate feature.
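The single-shaping-network variant described above amounts to projecting whichever feature has the larger dimension down to match the other. A minimal sketch, with a random placeholder projection in place of a learned shaping network and with assumed dimensions, might look like:

```python
import numpy as np

def shape_to_match(feat_a, feat_b, rng=None):
    """Project the larger-dimensional feature down to the smaller dimension.

    Only one projection ("shaping network") is applied, mirroring the
    single-shaping-network configuration; the projection matrix here is a
    random stand-in for a learned linear layer.
    """
    rng = rng or np.random.default_rng(0)
    da, db = feat_a.shape[-1], feat_b.shape[-1]
    if da > db:
        feat_a = feat_a @ rng.standard_normal((da, db))  # shaping network on A
    elif db > da:
        feat_b = feat_b @ rng.standard_normal((db, da))  # shaping network on B
    return feat_a, feat_b

# Assumed widths: a 768-dim Transformer feature and a 512-dim CNN feature.
a, b = shape_to_match(np.ones(768), np.ones(512))
# Both features now have dimension 512 and can enter the fusion layer.
```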
[0053] Step 28: Use the feature fusion layer to fuse the first license plate feature and the second license plate feature to obtain the fused license plate feature.
[0054] As shown in Figure 4, the classification network also includes a feature fusion layer. The feature fusion layer receives the features output by the first shaping network and the second shaping network and fuses the two to generate a new feature (i.e., the fused license plate feature). For example, if the dimension of the feature output by the first shaping network is 1×512 and the dimension of the feature output by the second shaping network is 1×512, the dimension of the fused license plate feature is 1×512.
[0055] Step 29: Use the classification layer to classify the fused license plate features to obtain the first license plate classification result.
[0056] As shown in Figure 4, the classification layer receives the fused license plate feature output by the feature fusion layer, classifies it, and generates the corresponding classification result, thereby determining the country to which the current license plate belongs. Specifically, the classification layer may be composed of a fully connected layer.
[0057] In a specific embodiment, as shown in Figure 6, the first license plate image is the image shown in Figure 1(a), the first shaping network is a first fully connected layer, the second shaping network is an average pooling layer, and the classification layer is a second fully connected layer.
[0058] For the feature map output by the CNN network, average pooling is used to reduce the feature map to a size of 1×512, and the feature vector output by the Transformer model is transformed to a fixed size of 1×512 through the first fully connected layer. The feature fusion layer then fuses the two by addition to obtain a new feature vector, which is sent to the second fully connected layer, finally outputting the country of the license plate.
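The data flow of this specific embodiment can be sketched numerically as follows. The feature-map size, Transformer output width, number of countries, and all weights below are illustrative placeholders chosen for the sketch, not values given by the application:

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_COUNTRIES = 10  # assumed number of country classes

cnn_feature_map = rng.standard_normal((512, 7, 7))  # C x H x W from the backbone (assumed)
transformer_out = rng.standard_normal(1024)         # assumed Transformer output width

# Second shaping network: average pooling reduces the feature map to 1x512.
appearance = cnn_feature_map.mean(axis=(1, 2))      # -> (512,)

# First shaping network: a fully connected layer maps the Transformer output to 1x512.
fc1 = rng.standard_normal((1024, 512))
text = transformer_out @ fc1                        # -> (512,)

# Feature fusion layer: element-wise addition of the two 512-dim features.
fused = appearance + text

# Classification layer: a second fully connected layer outputs per-country scores.
fc2 = rng.standard_normal((512, NUM_COUNTRIES))
logits = fused @ fc2
country = int(np.argmax(logits))                    # index of the predicted country
```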
[0059] The classification network provided in this embodiment is a multi-input overseas license plate country classification network that fuses CNN and Transformer features. It takes the license plate recognition result as an additional input, uses the Transformer to extract the internal rules of the license plate characters, converts the license plate characters into features useful for country classification, and improves the accuracy of license plate country classification.
[0060] Understandably, before using the classification network, the classification network needs to be trained to ensure the accuracy of the classification. Specifically, the classification network can be trained through the following steps:
[0061] a. Obtain classification training data
[0062] a1) Obtain classification training images.
[0063] A camera may be used to capture a second license plate image as the classification training image; the second license plate image is an image that includes a license plate.
[0064] a2) Perform license plate recognition processing on the classification training image to obtain a second license plate recognition result.
[0065] After the classification training image is acquired, the license plate recognition network can be used to recognize the classification training image to generate a corresponding license plate recognition result (denoted as the second license plate recognition result).
[0066] a3) Encoding the second license plate recognition result to obtain a third license plate feature.
[0067] The one-hot encoding method is used to encode the second license plate recognition result to obtain a license plate character vector sequence, and the Transformer model is then used to process this sequence, thereby extracting the text feature of the second license plate image, denoted as the third license plate feature.
[0068] a4) Perform feature extraction processing on the classification training image to obtain the fourth license plate feature.
[0069] The trained CNN network is used to extract the features in the classification training image to generate the fourth license plate feature. Understandably, if the classification training image also includes non-license-plate content, the license plate position is first obtained through the license plate detection algorithm, and the image corresponding to the license plate position is then cropped and input to the license plate recognition network and the CNN network, which reduces computational complexity and improves recognition accuracy and feature-extraction efficiency.
[0070] Repeat the above steps a1-a4 to obtain enough classification training data for classifying countries. The classification training data includes multiple sets of training features, and each set of training features includes a third license plate feature and a corresponding fourth license plate feature.
[0071] b. Use the classification training data to train the classification network
[0072] b1) Select a set of training features from the classification training data as the current training features.
[0073] A set of training features can be selected from the classification training data according to the set order or randomly as the training features currently input to the classification network to train the entire classification network. The training features include the third license plate feature and the corresponding fourth license plate feature.
[0074] b2) Using the classification network to process the third license plate feature and the fourth license plate feature in the current training features to obtain the second license plate classification result.
[0075] After the third license plate feature and the fourth license plate feature are obtained, the classification network may be used to process the two input features to obtain the corresponding classification result. Specifically, as shown in Figure 4, the classification network includes a first shaping network, a second shaping network, a feature fusion layer, and a classification layer; their specific structures and functions have been described in the above embodiments and are not repeated here.
[0076] b3) Judge whether the classification accuracy of the classification network exceeds a preset threshold.
[0077] In order to determine when to terminate the training, the classification accuracy of the current classification network can be counted, and the relationship between the classification accuracy and the preset threshold can be judged.
[0078] b4) If the classification accuracy of the classification network does not exceed the preset threshold, adjust the parameters of the classification network based on the second license plate classification result.
[0079] If it is judged that the classification accuracy of the current classification network does not exceed the preset threshold, the current classification accuracy does not meet the requirements. In this case, the parameters of the classification network may be adjusted based on the second license plate classification result, and the process returns to the step of selecting a set of training features from the classification training data as the current training features, i.e., step b1 is executed, until the classification accuracy of the classification network exceeds the preset threshold.
[0080] b5) If the classification accuracy of the classification network exceeds the preset threshold, stop the training and output the classification network.
[0081] If it is detected that the classification accuracy of the current classification network has exceeded the preset threshold, it indicates that the classification accuracy of the current classification network is already high and meets the preset requirements. At this time, the training can be stopped and the trained classification network model can be output.
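Steps b1-b5 describe a threshold-controlled training loop, which can be sketched schematically as follows. The function names, threshold value, and toy accuracy model are assumptions for illustration only; a real implementation would update network parameters from the loss on the second license plate classification result:

```python
def train_until_accurate(step_fn, eval_fn, threshold=0.9, max_iters=1000):
    """Repeat parameter updates until evaluated accuracy exceeds the threshold."""
    for i in range(max_iters):
        acc = eval_fn()              # b3: measure current classification accuracy
        if acc > threshold:          # b5: stop training and output the network
            return i, acc
        step_fn()                    # b4: adjust parameters, then loop back to b1
    return max_iters, eval_fn()

# Toy stand-in where each update improves "accuracy" by a fixed amount.
state = {"acc": 0.5}
iters, final_acc = train_until_accurate(
    step_fn=lambda: state.__setitem__("acc", state["acc"] + 0.05),
    eval_fn=lambda: state["acc"],
)
```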
[0082] This embodiment provides a new deep-learning-based license plate country classification method. Considering the license plate style characteristics of each country, it performs one-hot encoding on the license plate character content, encoding each character into a unique feature vector; the encoded vectors are input to the Transformer branch to directly extract text features, while the original license plate image is sent to the CNN network to extract appearance features. The outputs of the two branches are mapped to the same dimension and fused, and then classified by a classification layer composed of a fully connected layer to obtain the country classification result. The Transformer branch can directly extract the text features of the license plate characters and better discover the relationship between the license plate characters and the country to which they belong. This method provides the classification network with effective prior knowledge of the license plate content, reduces the difficulty for the CNN network of learning the intrinsic format characteristics of the license plate, and thus improves classification accuracy.
[0083] Referring to Figure 7, Figure 7 is a schematic structural diagram of an embodiment of the license plate classification device provided by the present application. The license plate classification device 70 includes a memory 71 and a processor 72 connected to each other. The memory 71 is used to store a computer program which, when executed by the processor 72, implements the license plate classification method in the above-mentioned embodiments.
[0084] This embodiment provides an overseas license plate country classification scheme based on the fusion of CNN and Transformer features. It combines the features of the CNN and the Transformer and uses one-hot encoding for the license plate characters, so that each character is represented by a unique vector. The Transformer model is then used to directly extract text features; with the powerful modeling capability of the Transformer, the internal relationships of the license plate characters can be provided to the network directly, effectively, and in detail, thereby assisting country classification. That is, on the basis of the appearance features extracted by the original CNN, the text features of the license plate format are integrated, which helps the classification network learn the internal format of the license plate and can be effectively applied to the country classification task for overseas license plates.
[0085] Referring to Figure 8, Figure 8 is a schematic structural diagram of an embodiment of a computer-readable storage medium provided by the present application. The computer-readable storage medium 80 is used to store a computer program 81 which, when executed by a processor, implements the license plate classification method in the above-mentioned embodiments.
[0086] The computer-readable storage medium 80 may be a server, a USB flash drive, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
[0087] In the several implementation manners provided in this application, it should be understood that the disclosed methods and devices may be implemented in other ways. For example, the device implementations described above are only illustrative: the division into modules or units is only a logical functional division, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
[0088] A unit described as a separate component may or may not be physically separated, and a component shown as a unit may or may not be a physical unit, that is, it may be located in one place, or may also be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
[0089] In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit. The above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.
[0090] The above is only an embodiment of the present application and does not limit the patent scope of the application. Any equivalent structure or equivalent process transformation made using the specification and drawings of this application, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of this application.