Vehicle model and license plate integrated recognition method and system based on multi-dimensional image analysis
By constructing the vehicle recognition model from a shared backbone network and multi-task detection heads, applying knowledge distillation and quantization, and combining edge-side hard example detection with cloud-based synthetic training image generation, the accuracy and adaptability problems of integrated vehicle recognition models are addressed, achieving efficient vehicle model and license plate recognition.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- 广州市埃特斯通讯设备有限公司
- Filing Date
- 2026-05-25
- Publication Date
- 2026-06-19
AI Technical Summary
Existing integrated vehicle recognition models have low recognition accuracy during compression, and the models of edge devices lack generalization ability, making it difficult to adapt to updates in vehicle type and license plate style. The small number of hard examples in incremental update schemes leads to overfitting during training.
An initial model is constructed using a shared backbone network and a multi-task detection head. A lightweight model is obtained through knowledge distillation and quantization. Difficult cases are identified in real time on the edge and uploaded to the cloud to generate synthetic training images for incremental learning. The backbone network parameters are frozen and only the detection head module is fine-tuned.
It achieves efficient recognition of vehicle type and license plate on edge devices, while improving the recognition accuracy and adaptability of the model through cloud-based collaborative incremental learning, avoiding excessive consumption of computing resources and overfitting during training.
Figure CN122244812A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of vehicle recognition technology, specifically to a method and system for integrated vehicle model and license plate recognition based on multi-dimensional image analysis.
Background Technology
[0002] With the development of intelligent transportation systems, vehicle recognition has been widely used in scenarios such as traffic monitoring, parking lot management, and highway toll collection. Vehicle model recognition and license plate recognition are two core links in vehicle recognition tasks, which can provide key information support for vehicle identity verification, traffic violation investigation and handling, and automated parking lot management.
[0003] Early vehicle recognition solutions mostly adopted a step-by-step approach: two independent neural networks first performed vehicle detection and license plate detection, and the detected regions were then fed into a vehicle model recognition network and a license plate recognition network, respectively, to complete the corresponding tasks. This required multiple inference calls to different network models, which consumed substantial computing resources and gave low inference efficiency, making it difficult to meet the low-latency deployment requirements of edge devices. It also led to error accumulation across the stages: detection deviations in the earlier stages directly caused errors in the subsequent recognition results, making the overall recognition accuracy hard to guarantee.
[0004] To address the aforementioned issues, recent research has proposed an integrated recognition scheme. This scheme extracts common features from vehicle images through a shared backbone network and then connects them to vehicle model recognition and license plate recognition detection heads respectively. This allows for the simultaneous output of vehicle model recognition and license plate recognition results with a single forward inference, significantly reducing the number of model parameters and inference time.
[0005] However, existing integrated models have two shortcomings. First, most of them apply lightweight compression directly to the entire network without differentiating the compression intensity according to the feature extraction needs of different modules, which can easily reduce the accuracy of key recognition tasks. Second, in real-world application scenarios, vehicle types and license plate styles are constantly updated, so models deployed on the edge are prone to insufficient generalization. Most existing incremental update solutions directly collect difficult examples and send them back to the cloud for training. Because the number of difficult examples is small, training is prone to overfitting, and the overall recognition performance of the model decreases after incremental updates, which cannot meet the needs of real-world applications.
Summary of the Invention
[0006] This application provides a vehicle model and license plate integrated recognition method and system based on multi-dimensional image analysis, which solves the technical problem that the compression method of the existing vehicle integrated recognition model is unreasonable, resulting in low recognition accuracy.
[0007] The technical solution to the above-mentioned technical problems in this application is as follows:
[0008] In a first aspect, this application provides a method for integrated vehicle model and license plate recognition based on multi-dimensional image analysis, the method comprising:
[0009] The vehicle image to be recognized is input into a pre-trained vehicle model and license plate integrated recognition model deployed on the edge, which outputs the vehicle model recognition result and the license plate recognition result together with the confidence score of each result.
[0010] When the confidence score of the vehicle model recognition result is lower than the preset vehicle model confidence threshold, or the confidence score of the license plate recognition result is lower than the preset license plate confidence threshold, the vehicle image to be recognized is determined as an incremental learning difficult example sample.
[0011] The incremental learning difficult example samples are uploaded to the cloud, and a pre-trained directional synthetic data generation model is used to generate a synthetic training image that is consistent with the vehicle appearance features and license plate style features of the incremental learning difficult example samples.
[0012] The synthetic training images are filtered to form an incremental training dataset. The vehicle model and license plate integrated recognition model is incrementally trained using the incremental training dataset to obtain an updated vehicle model and license plate integrated recognition model. The updated model is then sent to the edge for subsequent vehicle model and license plate recognition.
[0013] In a second aspect, this application provides a vehicle model and license plate integrated recognition system based on multi-dimensional image analysis, including:
[0014] The vehicle information recognition module is used to input the image of the vehicle to be recognized into a pre-trained vehicle model and license plate recognition model deployed on the edge, and output the vehicle model recognition result and the license plate recognition result, and simultaneously output the confidence score of the vehicle model recognition result and the confidence score of the license plate recognition result.
[0015] The confidence score comparison module is used to determine the vehicle image to be recognized as an incremental learning difficult example sample when the confidence score of the vehicle model recognition result is lower than the preset vehicle model confidence threshold, or the confidence score of the license plate recognition result is lower than the preset license plate confidence threshold.
[0016] The difficult example sample uploading module is used to upload the incremental learning difficult example samples to the cloud, and generate a synthetic training image that is consistent with the vehicle appearance features and license plate style features of the incremental learning difficult example samples through a pre-trained directional synthetic data generation model.
[0017] The recognition model update module is used to filter the synthetic training images to form an incremental training dataset, use the incremental training dataset to incrementally train the vehicle model and license plate integrated recognition model to obtain an updated vehicle model and license plate integrated recognition model, and send the updated model to the edge for subsequent vehicle model and license plate integrated recognition.
[0018] This application provides one or more technical solutions, which have at least the following technical effects or advantages:
[0019] This application provides a method and system for integrated vehicle model and license plate recognition based on multi-dimensional image analysis. First, it achieves integrated recognition by adopting a shared backbone network structure for vehicle model and license plate recognition, reducing the computational overhead caused by repeated feature extraction. Second, difficult examples are identified in real time on the edge and transmitted back to the cloud, where a directional synthetic data generation model generates synthetic training samples with matching features. This solves the problem of overfitting when training directly on a small number of new difficult examples. Finally, during incremental training, the original shared backbone network parameters are frozen, only the parameters of the subsequent modules are fine-tuned, and a feature distillation loss constraint is introduced. This allows the model to learn the feature knowledge of new samples while preventing incremental training from destroying the original model's capabilities, ensuring the overall recognition accuracy of the updated model and better adapting to the actual deployment needs of edge devices.
[0020] Through the above technical solutions, this application addresses the problems of decreased accuracy caused by unreasonable lightweight compression of existing integrated models and insufficient generalization of models with newly added vehicle types and license plates. It achieves end-side integrated vehicle type and license plate recognition that balances inference efficiency and recognition accuracy. At the same time, through a cloud-based collaborative directed synthetic incremental learning scheme, it completes continuous model updates at low cost. This avoids overfitting caused by a small number of difficult examples, avoids the large amount of computing resources consumed by full retraining, and eliminates the need for end-side devices to bear high-load training computations. It effectively improves the stability and adaptability of vehicle type and license plate recognition in real-world scenarios and can meet the requirements of intelligent transportation systems for vehicle recognition tasks in different application scenarios.
Attached Figure Description
[0021] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0022] Figure 1 is a flowchart of the vehicle model and license plate integrated recognition method based on multi-dimensional image analysis provided in an embodiment of this application;
[0023] Figure 2 is a schematic structural diagram of the vehicle model and license plate integrated recognition system based on multi-dimensional image analysis provided in an embodiment of this application.
[0024] The components represented by each number in the attached diagram are explained below:
[0025] Vehicle information recognition module 11, confidence score comparison module 12, difficult sample upload module 13, and recognition model update module 14.
Detailed Implementation
[0026] This application provides a method and system for integrated vehicle model and license plate recognition based on multi-dimensional image analysis, which addresses the technical problem of low recognition accuracy caused by unreasonable compression methods in existing integrated vehicle recognition models.
[0027] Embodiment 1: As shown in Figure 1, an embodiment of this application provides a method for integrated vehicle model and license plate recognition based on multi-dimensional image analysis, including:
[0028] S10: Input the image of the vehicle to be identified into the pre-trained vehicle model and license plate recognition model deployed on the edge, output the vehicle model recognition result and the license plate recognition result, and simultaneously output the confidence score of the vehicle model recognition result and the confidence score of the license plate recognition result;
[0029] In this embodiment, the image of the vehicle to be identified is input into a pre-trained vehicle model and license plate recognition model. The pre-trained vehicle model and license plate recognition model is constructed based on a shared backbone network and a multi-task detection head, and undergoes knowledge distillation and quantization processing to finally obtain a lightweight pre-trained model that can be deployed on the edge. It outputs vehicle model recognition results and license plate recognition results, as well as confidence scores for vehicle model recognition results and license plate recognition results, balancing inference speed and overall recognition accuracy.
[0030] Specifically, step S10 in the method includes:
[0031] An initial vehicle model and license plate recognition model is constructed using a shared backbone network and a multi-task detection head. The multi-task detection head includes a vehicle model recognition detection head and a license plate recognition detection head. The output layers of the vehicle model recognition detection head and the license plate recognition detection head use the Softmax function, and the maximum probability value output by the Softmax function is used as the confidence score of the corresponding recognition result.
[0032] Construct a joint training sample set for vehicle models and license plates, wherein each training sample in the joint training sample set for vehicle models and license plates contains a vehicle image and a corresponding vehicle model classification label and license plate information label;
[0033] The initial vehicle model and license plate integrated recognition model is trained in a supervised manner using the vehicle model and license plate joint training sample set until the preset convergence condition is met, thus obtaining the trained initial vehicle model and license plate integrated recognition model.
[0034] The initial vehicle model and license plate recognition model that has been trained is subjected to knowledge distillation and quantization to obtain a pre-trained vehicle model and license plate recognition model.
[0035] The pre-trained vehicle model and license plate recognition model is deployed to the edge. The vehicle image to be recognized is input into the vehicle model and license plate recognition model, and the vehicle model recognition result and license plate recognition result are output. The confidence scores of the vehicle model recognition result and the license plate recognition result are output simultaneously.
[0036] In this embodiment, firstly, an initial vehicle model and license plate recognition model is constructed using a shared backbone network and a multi-task detection head. The shared backbone network can extract common basic features for both vehicle model recognition and license plate recognition simultaneously. Feature extraction for both types of tasks can be completed with only one forward propagation. The multi-task detection head includes a vehicle model recognition detection head and a license plate recognition detection head. The output layer uses the Softmax function, and the maximum probability value output by the Softmax function is used as the confidence score of the corresponding recognition result.
[0037] The Softmax function maps the raw output scores to probabilities in the range 0 to 1, making it easy to judge the model's confidence in the recognition result directly from the probability value. The higher the confidence, the more certain the model is about the output result; conversely, the lower the confidence, the less certain the model is about the features of the current sample, and the higher the probability that the recognition result is wrong.
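The confidence score described above can be illustrated with a minimal sketch. The logits and class names below are made-up values for illustration, and `predict_with_confidence` is a hypothetical helper, not a function named in the application.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_with_confidence(logits, labels):
    """Return the most probable label and its probability (the confidence score)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Hypothetical logits from the vehicle model recognition head for three made-up classes.
label, confidence = predict_with_confidence([2.0, 0.5, 0.1], ["sedan", "suv", "truck"])
print(label, round(confidence, 3))
```

The maximum Softmax probability is used here exactly as the text describes; for a license plate head that outputs a character sequence, how per-character probabilities are aggregated into one score is not specified in the source and is left out of this sketch.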
[0038] Specifically, the vehicle model recognition detection head consists of two convolutional layers, a batch normalization layer, a ReLU activation layer, and a fully connected output layer stacked sequentially. The input is a general feature map output from a shared backbone network. After convolutional downsampling to compress the feature dimension, the fully connected layer outputs the predicted probability corresponding to different vehicle models. The kernel size of the two convolutional layers is 3×3, with a stride of 2 and padding of 1. This gradually reduces the feature map size and compresses the number of model parameters without losing key semantic information, thus adapting to the computing power requirements of edge devices.
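To see how the two 3×3 convolutions with stride 2 and padding 1 shrink the feature map, the standard output-size formula can be applied twice. This is a small sketch; the 40×40 input size is an assumed value, since the source does not state the size of the backbone's feature map.

```python
def conv_output_size(size, kernel=3, stride=2, padding=1):
    """Spatial size after one convolution: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

def head_output_size(size, num_layers=2):
    """Apply the same convolution geometry num_layers times in sequence."""
    for _ in range(num_layers):
        size = conv_output_size(size)
    return size

# 40 is an assumed feature-map side length, not a value from the source.
print(conv_output_size(40), head_output_size(40))
```

Each layer roughly halves the spatial side length, so two layers reduce the number of spatial positions feeding the fully connected layer by about a factor of 16.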
[0039] The specific structure of the license plate recognition detection head consists of three stacked residual convolutional blocks, a spatial pyramid pooling layer, a feature fusion convolutional layer, and two parallel output branches. The first output branch outputs the predicted probability of the license plate character sequence, and the second output branch outputs the predicted coordinates of the license plate key points. The input is a general feature map output by a shared backbone network. Fine-grained features of the license plate region are extracted through the residual convolutional blocks, and then the feature information of different receptive fields is fused through spatial pyramid pooling to improve the feature adaptation capability of license plates of different sizes. Finally, the two output branches complete character recognition and position refinement respectively to ensure the overall accuracy of license plate recognition. In addition, all convolutional layers use depthwise separable convolution to replace standard convolution, which further compresses the number of parameters and computation while maintaining the feature extraction capability, adapting to the low inference latency requirements of edge devices.
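The parameter saving from replacing a standard convolution with a depthwise separable one can be checked with simple arithmetic. The sketch below ignores bias terms, and the channel counts (128 in, 128 out) are assumed for illustration only.

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k=3):
    """Weights in a depthwise k x k convolution followed by a 1x1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 convolution that mixes channels
    return depthwise + pointwise

# Assumed channel counts; the source does not give the layer widths.
std = standard_conv_params(128, 128)
sep = depthwise_separable_params(128, 128)
print(std, sep, round(sep / std, 3))
```

The ratio equals 1/c_out + 1/k², so for 3×3 kernels the separable form needs a little more than one ninth of the weights when the output channel count is large.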
[0040] Secondly, a joint training sample set of vehicle type and license plate information, including vehicle type classification labels and license plate information labels, is constructed to conduct joint supervised training on the initial model. For example, a multi-task joint loss function can be used to calculate the total loss, which is a weighted sum of the cross-entropy loss of vehicle type classification and the CTC loss of license plate recognition. The model parameters are gradually updated through backpropagation until the recognition accuracy of the model on the validation set no longer improves.
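A minimal sketch of the joint objective and the stopping rule described above follows. The task weights and the `patience` parameter are assumptions, and the CTC loss is taken as a precomputed number (in practice it would come from a deep-learning framework) rather than implemented here.

```python
import math

def cross_entropy(probs, target_index):
    """Cross-entropy for one sample given predicted class probabilities."""
    return -math.log(probs[target_index])

def joint_loss(type_probs, type_target, ctc_loss, w_type=1.0, w_plate=1.0):
    """Weighted sum of the vehicle-type cross-entropy and the plate CTC loss.

    ctc_loss is assumed to be computed elsewhere; w_type and w_plate are hypothetical weights.
    """
    return w_type * cross_entropy(type_probs, type_target) + w_plate * ctc_loss

def should_stop(val_accuracies, patience=3):
    """True when none of the last `patience` epochs beat the best earlier validation accuracy."""
    if len(val_accuracies) <= patience:
        return False
    best_before = max(val_accuracies[:-patience])
    return max(val_accuracies[-patience:]) <= best_before

loss = joint_loss([0.7, 0.2, 0.1], 0, ctc_loss=1.5, w_type=1.0, w_plate=0.5)
print(round(loss, 4), should_stop([0.80, 0.85, 0.85, 0.84, 0.85]))
```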
[0041] Furthermore, the model undergoes knowledge distillation and quantization to further compress the number of model parameters and computational load without significantly sacrificing recognition accuracy. This yields a pre-trained vehicle model and license plate integrated recognition model adapted for edge deployment. The compression of model parameters and computational load is specifically achieved through structured differentiated compression: low-intensity compression is applied to the shallow modules responsible for extracting general features at the bottom layer in the shared backbone network, while high-intensity compression is applied to the deep modules responsible for extracting high-level semantic features and the two detection head modules. This maximizes the compression of the overall model size while ensuring the integrity of the extraction of general features at the bottom layer. This satisfies the need for low computational resource consumption of edge devices and avoids damage to key feature extraction capabilities caused by uniform intensity compression across the entire network.
[0042] Knowledge distillation refers to transferring the knowledge learned by the high-precision source domain model to the lightweight target domain model, allowing the small model to learn the output distribution characteristics of the large model. This reduces the model size while preserving the recognition accuracy of the large model as much as possible, avoiding a significant drop in accuracy caused by simply compressing the model. Quantization, on the other hand, converts the high-precision floating-point parameters in the model into low-precision integer parameters, further reducing the model's storage footprint and computational overhead, and adapting to the computing power limitations of edge devices.
[0043] Specifically, the initial vehicle model and license plate recognition model, after training, undergoes knowledge distillation and quantization to obtain a pre-trained vehicle model and license plate recognition model, including:
[0044] The initial vehicle model and license plate recognition model that has been trained is used as the high-precision source domain model, and the pre-built lightweight network is used as the lightweight target domain model.
[0045] Set a first distillation weight corresponding to the vehicle model recognition detection head and a second distillation weight corresponding to the license plate recognition detection head, wherein the first distillation weight is greater than the second distillation weight;
[0046] The vehicle images in the joint training sample set of vehicle models and license plates are input into the high-precision source domain model to obtain the source domain vehicle model soft label output by the vehicle model recognition detection head and the source domain license plate soft label output by the license plate recognition detection head.
[0047] Inputting the same vehicle image into the lightweight target domain model outputs a soft label for the vehicle model and a soft label for the license plate in the target domain.
[0048] Calculate the first KL divergence between the source domain vehicle model soft label and the target domain vehicle model soft label, and multiply it by the first distillation weight to obtain the vehicle model distillation loss;
[0049] Calculate the second KL divergence between the source domain license plate soft label and the target domain license plate soft label, and multiply it by the second distillation weight to obtain the license plate distillation loss;
[0050] The vehicle model distillation loss, the license plate distillation loss, and the task loss of the lightweight target domain model are weighted and summed. The parameters of the lightweight target domain model are updated based on the weighted summation result until the preset distillation convergence condition is met, and the distilled lightweight target domain model is obtained.
[0051] The lightweight target domain model after distillation is quantized to obtain a pre-trained vehicle model and license plate recognition model.
[0052] In this embodiment, firstly, the trained initial vehicle model and license plate integrated recognition model is used as the high-precision source domain model, and a pre-built lightweight network is used as the lightweight target domain model. For example, lightweight networks that conform to industry standards, such as MobileNetV3 or YOLOv8n, are used as the backbone. Differentiated distillation weights are set for the two recognition tasks. The first distillation weight corresponding to vehicle model recognition is greater than the second distillation weight corresponding to license plate recognition. This prioritizes the knowledge transfer effect of the vehicle model recognition task and aligns with the priority of model accuracy requirements for vehicle model classification in practical applications. For example, in most smart checkpoint scenarios, the accuracy of vehicle model recognition directly affects the accuracy of vehicle classification statistics, and its priority is usually higher than that of license plate recognition. The differentiated weight setting allows the distillation process to prioritize the retention of high-precision knowledge of vehicle model recognition.
[0053] Next, the same vehicle image is input into two models to obtain soft labels for the two tasks. The KL divergence between the soft labels is then calculated to obtain the corresponding distillation loss. The target domain model parameters are updated by weighting the distillation loss and the task loss together. After knowledge distillation, the model is quantized. KL divergence measures the difference between two probability distributions. In this embodiment, it is used to quantify the difference in probability distribution between the outputs of the lightweight target domain model and the high-precision source domain model. The smaller the KL divergence, the closer the output distribution of the target domain model is to the source domain model, and the better the knowledge transfer effect.
[0054] For example, the KL divergence is computed as $D_{KL}(P \,\|\, Q) = \sum_{i} P(i) \log \frac{P(i)}{Q(i)}$, where $P(i)$ is the probability that the source domain model outputs for category $i$ and $Q(i)$ is the probability that the target domain model outputs for category $i$. The KL divergence is computed in this way for each of the two tasks. The smaller the calculated KL divergence, the closer the output probability distributions of the two models are, and the better the knowledge transfer effect.
[0055] Specifically, the vehicle model distillation loss, license plate distillation loss, and task loss of the lightweight target domain model are weighted and summed. The weight coefficients are set according to the task priority to ensure that the distillation process is tilted towards the more important vehicle model recognition task. Finally, a pre-trained lightweight model that balances model size and recognition accuracy is obtained, which is suitable for edge deployment requirements.
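The distillation objective can be sketched as follows, under several assumptions: the weight values 0.6 and 0.4 are hypothetical (the source only requires the first weight to exceed the second), a single extra weight `w_task` is applied to the task loss, and each soft label is simplified to one discrete distribution, whereas a real license plate head would output a distribution per character position.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same categories."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def distillation_objective(src_type, tgt_type, src_plate, tgt_plate, task_loss,
                           w_type=0.6, w_plate=0.4, w_task=1.0):
    """Weighted sum of both distillation losses and the student's task loss.

    w_type, w_plate and w_task are hypothetical values; only w_type > w_plate is from the text.
    """
    if w_type <= w_plate:
        raise ValueError("the vehicle model weight must exceed the license plate weight")
    type_loss = w_type * kl_divergence(src_type, tgt_type)
    plate_loss = w_plate * kl_divergence(src_plate, tgt_plate)
    return type_loss + plate_loss + w_task * task_loss

teacher = [0.9, 0.1]   # made-up soft label from the source domain model
student = [0.6, 0.4]   # made-up soft label from the target domain model
print(distillation_objective(teacher, student, teacher, student, task_loss=0.5))
```

When the student reproduces the teacher's soft labels exactly, both KL terms vanish and only the task loss remains, which matches the statement that a smaller KL divergence indicates better knowledge transfer.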
[0056] Furthermore, the distilled lightweight target domain model is quantized to obtain a pre-trained vehicle model and license plate recognition integrated model, including:
[0057] The weight values of the shared backbone network, the vehicle model recognition detection head, and the license plate recognition detection head in the lightweight target domain model after distillation are extracted, and the distribution range of the weight values in the shared backbone network, the vehicle model recognition detection head, and the license plate recognition detection head are statistically analyzed respectively.
[0058] The first optimal quantization bit width is determined for the shared backbone network based on the distribution range of the weight values in the shared backbone network; the second optimal quantization bit width is determined for the vehicle model recognition detection head based on the distribution range of the weight values in the vehicle model recognition detection head; and the third optimal quantization bit width is determined for the license plate recognition detection head based on the distribution range of the weight values in the license plate recognition detection head.
[0059] All weight values in the shared backbone network are quantized according to the first optimal quantization bit width, all weight values in the vehicle model recognition detection head are quantized according to the second optimal quantization bit width, and all weight values in the license plate recognition detection head are quantized according to the third optimal quantization bit width.
[0060] The quantized shared backbone network, the quantized vehicle model recognition head, and the quantized license plate recognition head are combined into a pre-trained integrated vehicle model and license plate recognition model.
[0061] In this embodiment, the weight distributions of the shared backbone network, the vehicle model recognition detection head, and the license plate recognition detection head differ: the shared backbone network is responsible for extracting general visual features, and its weight distribution is usually more concentrated, while the detection head modules are responsible for outputting task-specific results, and their weight distributions are often more discrete. Therefore, the weight values of the three modules are first extracted from the distilled lightweight target domain model, and the distribution range of each module is analyzed separately to determine the span and extreme values of its weight distribution.
[0062] For example, the weight values of the shared backbone network are distributed in [-0.8, 0.9], with a distribution range of 1.7; the weight values of the vehicle model recognition detection head are distributed in [-0.3, 0.4], with a distribution range of 0.7; and the weight values of the license plate recognition detection head are distributed in [-0.5, 0.6], with a distribution range of 1.1.
[0063] Furthermore, the optimal quantization bit width of each module is determined from its statistically obtained weight distribution range, that is, the numerical span between the minimum and maximum of all weight values in the shared backbone network, the vehicle model recognition detection head, and the license plate recognition detection head, respectively. A higher quantization bit width is used for the shared backbone network, which has a wider weight distribution range and higher sensitivity to feature extraction, to retain richer feature accuracy. A lower quantization bit width is used for the detection head modules, which have a more concentrated weight distribution and lower sensitivity, to obtain a greater compression ratio.
[0064] For example, since the shared backbone network has the widest distribution range, it is allocated a higher optimal quantization bit width to preserve accuracy; the vehicle model recognition detection head has the narrowest distribution range, so it is allocated a lower optimal quantization bit width to further compress the model. While ensuring that the overall model compression rate meets the requirements of edge deployment, the accuracy loss caused by quantization processing is minimized. This reduces the model's storage footprint and inference computation while ensuring the model's core recognition capabilities.
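The allocation idea in this example can be sketched with the weight ranges quoted earlier in the description. The source does not give the rule that maps a range to a bit width, so the ranking scheme and the candidate bit widths (8, 6, 4) below are purely illustrative assumptions.

```python
def distribution_range(w_min, w_max):
    """Span between the smallest and largest weight of a module."""
    return w_max - w_min

def allocate_bit_widths(ranges, candidate_bits=(8, 6, 4)):
    """Give wider-ranged modules higher bit widths; candidate_bits are hypothetical."""
    ordered = sorted(ranges, key=ranges.get, reverse=True)
    return dict(zip(ordered, sorted(candidate_bits, reverse=True)))

# Minimum and maximum weight values taken from the example in the description.
weight_bounds = {
    "shared_backbone": (-0.8, 0.9),
    "vehicle_model_head": (-0.3, 0.4),
    "license_plate_head": (-0.5, 0.6),
}
ranges = {name: distribution_range(lo, hi) for name, (lo, hi) in weight_bounds.items()}
bit_widths = allocate_bit_widths(ranges)
print(bit_widths)
```

With these inputs the backbone, which has the widest range, receives the highest bit width and the vehicle model head, which has the narrowest, receives the lowest, in line with the example.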
[0065] Furthermore, based on the actual range of the weight distribution of each module, the optimal quantization bit width is determined for each module: for modules with a narrower weight distribution range and lower numerical precision redundancy, a lower quantization bit width can be selected to compress the model size; for modules with a wider weight distribution range and higher sensitivity to numerical precision, a higher quantization bit width is retained to maintain recognition accuracy. According to the determined optimal quantization bit width for each module, quantization operations are performed on all weights of the corresponding module to convert floating-point weights into low-bit integer weights.
[0066] For example, when performing quantization, a uniform quantization mapping is used to convert floating-point numbers to integers. For a given module, let the minimum value of its weight distribution range be $w_{\min}$ and the maximum value be $w_{\max}$. If the current quantization bit width is set to $b$, the range of the quantized integer values is $[0,\ 2^{b}-1]$ and the quantization step size is $\Delta = (w_{\max} - w_{\min}) / (2^{b} - 1)$. For an arbitrary floating-point weight $w$, the corresponding quantized integer weight is obtained as $w_q = \mathrm{round}\big((w - w_{\min}) / \Delta\big)$, which completes the quantization conversion of that weight. After all the weights of all modules have been converted according to their respective bit widths, the quantization process of the entire model is completed.
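A uniform quantization mapping of this kind can be sketched as below. The sketch assumes an asymmetric min/max scheme with an unsigned integer range of 0 to 2^b − 1 and round-to-nearest; these choices, the function names, and the example weights are assumptions made for illustration rather than details confirmed by this application.

```python
def quantize_uniform(weights, bits):
    """Map floating-point weights to integers in [0, 2**bits - 1].

    Assumed scheme: asymmetric uniform quantization over the module's own
    minimum and maximum weight, with round-to-nearest.
    """
    w_min, w_max = min(weights), max(weights)
    levels = 2 ** bits - 1
    # Guard against a constant-valued module, where the span is zero.
    step = (w_max - w_min) / levels if w_max > w_min else 1.0
    quantized = [round((w - w_min) / step) for w in weights]
    return quantized, w_min, step


def dequantize_uniform(quantized, w_min, step):
    """Recover approximate floating-point weights from integer weights."""
    return [w_min + q * step for q in quantized]


weights = [-1.0, -0.5, 0.0, 0.5, 1.0]
q, w_min, step = quantize_uniform(weights, bits=8)
restored = dequantize_uniform(q, w_min, step)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, max_error)
```

Under this scheme the reconstruction error of each weight is bounded by half a quantization step, which is why a module with a wider weight range needs more bits to reach the same precision.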
[0067] Finally, the quantized shared backbone network, the quantized vehicle model recognition detection head, and the quantized license plate recognition detection head are combined to obtain the final pre-trained integrated vehicle model and license plate recognition model that can be deployed on the edge. This not only meets the requirements of the edge device for model size and inference speed, but also retains the recognition accuracy of the original model to the greatest extent.
[0068] S20: When the confidence score of the vehicle model recognition result is lower than the preset vehicle model confidence threshold, or the confidence score of the license plate recognition result is lower than the preset license plate confidence threshold, the vehicle image to be recognized is determined as an incremental learning difficult example sample.
[0069] In this embodiment, the vehicle model recognition result and the license plate recognition result are each checked against a preset confidence threshold. When the confidence of either recognition result is lower than its corresponding preset threshold, this indicates that the pre-trained vehicle model and license plate integrated recognition model has not sufficiently learned the features of the current sample, and the recognition result carries a high risk of error. The image of the vehicle to be recognized is therefore marked as an incremental learning difficult example sample to be used in a later model update.
[0070] Specifically, step S20 in the method includes:
[0071] Obtain the set of vehicle model recognition confidence scores and the set of license plate recognition confidence scores output by the pre-trained vehicle model and license plate recognition integrated model when inferring from multiple historical vehicle images during historical runtime.
[0072] Arrange the confidence scores in the vehicle model recognition confidence score set in descending order of numerical value, and take the confidence score corresponding to the preset quantile as the preset vehicle model confidence threshold;
[0073] Arrange the confidence scores in the license plate recognition confidence score set in descending order of numerical value, and take the confidence score corresponding to the preset quantile as the preset license plate confidence threshold;
[0074] Obtain the confidence score of the vehicle model recognition result and the confidence score of the license plate recognition result synchronously output by the pre-trained vehicle model and license plate recognition model during real-time inference;
[0075] When the confidence score of the vehicle model recognition result is lower than the preset vehicle model confidence threshold, or the confidence score of the license plate recognition result is lower than the preset license plate confidence threshold, the vehicle image to be recognized is determined as an incremental learning difficult example sample.
[0076] In this embodiment, the confidence threshold is first determined automatically from historical operational statistics rather than being set manually. This allows the threshold to adapt to the output confidence distribution of the current model in the actual deployment scenario and avoids a mismatch between a manually set threshold and the actual output distribution. For example, if the model's output confidence is generally high in the actual scenario, a manually set low threshold will miss many difficult examples; if the output confidence is generally low, a manually set high threshold will flag too many unnecessary ones. Determining the threshold from historical data quantiles keeps the proportion of screened difficult examples stable within a preset range, so that it can be matched to the manual review workload.
[0077] Furthermore, after sorting the historical confidence scores in descending order, the value corresponding to the preset quantile is taken as the threshold. For example, if the preset quantile is set to 90%, samples in the bottom 10% of the confidence ranking are automatically selected as difficult cases, which keeps the proportion of difficult cases at about 10% of the total recognized samples and matches the manual review workload in most scenarios. The quantile can also be adjusted flexibly according to the available manual review resources. Under this convention, when more resources are available, a lower quantile (for example 80%, which selects the bottom 20%) can be set to select more difficult cases and improve the model update effect; when resources are limited, a higher quantile (for example 95%) can be set to select only the few most uncertain samples.
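The threshold determination and hard example judgment of step S20 can be sketched as follows. This is a minimal illustration: the exact position convention for the quantile in the descending ranking, the function names, and the historical scores are assumptions, chosen so that a 90% quantile leaves roughly the bottom 10% of scores below the threshold.

```python
import math


def quantile_threshold(history, quantile=0.9):
    """Threshold at the preset quantile position of the descending ranking."""
    ranked = sorted(history, reverse=True)
    # Position convention is an assumption; the epsilon absorbs float error.
    index = math.ceil(quantile * len(ranked) - 1e-9) - 1
    index = max(0, min(len(ranked) - 1, index))
    return ranked[index]


def is_hard_example(model_conf, plate_conf, model_thr, plate_thr):
    """Hard example if either confidence falls below its own threshold."""
    return model_conf < model_thr or plate_conf < plate_thr


# Hypothetical historical confidence scores collected at runtime.
model_history = [0.91, 0.99, 0.70, 0.95, 0.55, 0.97, 0.85, 0.93, 0.80, 0.88]
plate_history = [0.98, 0.60, 0.92, 0.96, 0.75, 0.90, 0.94, 0.82, 0.99, 0.87]
model_thr = quantile_threshold(model_history)
plate_thr = quantile_threshold(plate_history)

print(model_thr, plate_thr)
print(is_hard_example(0.95, 0.60, model_thr, plate_thr))
```

Because the two thresholds are computed independently, an image is flagged as soon as one of the two tasks is uncertain, even if the other is confident.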
[0078] S30: Upload the incremental learning difficult example samples to the cloud, and generate a synthetic training image that is consistent with the vehicle appearance features and license plate style features of the incremental learning difficult example samples through a pre-trained directional synthetic data generation model;
[0079] In this embodiment, after the incremental learning difficult examples are selected, they are uploaded to the cloud. Data synthesis and model updates are not performed on the device side, which avoids consuming computing resources on the device and interfering with the real-time recognition task. Upon receiving the incremental learning difficult examples, the cloud calls a pre-trained directional synthetic data generation model, extracts the corresponding vehicle appearance features and license plate style features from the difficult examples, and generates synthetic training images with consistent features. This addresses the problem that real difficult examples are too few to support incremental model updates, and removes the need to manually collect and label more similar samples, reducing the labeling cost of iterative model updates. The directional synthetic data generation model includes a generator and a discriminator and can match features of the difficult examples such as vehicle type, appearance color, occlusion, license plate font size, and license plate background color. The generated synthetic images are highly realistic and can be used directly for subsequent incremental update training.
[0080] Specifically, step S30 in the method includes:
[0081] The incremental learning difficult example samples are uploaded to the cloud, and the vehicle appearance features and license plate style features in the incremental learning difficult example samples are extracted.
[0082] The vehicle appearance features are used as the first generation condition, and the license plate style features are used as the second generation condition. Multiple synthetic training images that simultaneously satisfy the first generation condition and the second generation condition are generated by a pre-trained directional synthetic data generation model.
[0083] The synthetic training images are input into the discriminator of the pre-trained directed synthetic data generation model, and the discriminator outputs a realism score for each synthetic training image.
[0084] In this embodiment of the application, firstly, the incremental learning difficult example samples collected above are uploaded to the cloud. The cloud extracts the feature information of these samples, namely the vehicle appearance features and license plate style features. These features are used as conditional inputs to the generator to guide it to generate training images that match the target features, thereby avoiding the generation of invalid samples with irrelevant features.
[0085] Secondly, when performing targeted synthesis, it is required that the generated image matches the vehicle appearance features of the difficult example sample, including vehicle brand, model, color, shooting angle, degree of occlusion and lighting conditions, etc. At the same time, it is required to match the license plate style features, including license plate type, font size, background color, tilt angle and degree of dirt, etc. This ensures that the generated synthetic training image and the original difficult example sample belong to the same type of difficult-to-recognize scene, which can effectively help the model learn the features of this type of scene and solve the problem of insufficient recognition accuracy of the original scene.
[0086] Furthermore, after generating multiple synthetic training images, a discriminator scores the realism of each generated image, selecting synthetic images with realism scores higher than a preset realism threshold and removing low-quality generated images to ensure the quality of synthetic data used for subsequent incremental training and avoid poor data affecting the effect of incremental model updates. The qualified synthetic training images obtained after screening will be combined with the original incremental learning hard examples to form the incremental training dataset for subsequent model updates.
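The realism-based filtering and merging described above can be sketched as follows. The record layout, the threshold value of 0.5, and the identifiers are hypothetical; in a real system the realism score would come from the discriminator of the generation model.

```python
def build_incremental_dataset(hard_examples, synthetic_images, realism_threshold=0.5):
    """Keep synthetic images scoring above the threshold, then add the hard examples."""
    kept = [img for img in synthetic_images if img["realism"] > realism_threshold]
    return list(hard_examples) + kept


# Hypothetical records; "realism" stands in for the discriminator output.
hard_examples = [{"id": "hard_001"}]
synthetic_images = [
    {"id": "syn_001", "realism": 0.92},
    {"id": "syn_002", "realism": 0.31},
    {"id": "syn_003", "realism": 0.77},
]
dataset = build_incremental_dataset(hard_examples, synthetic_images)
print([item["id"] for item in dataset])
```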
[0087] The training process for the targeted synthetic data generation model includes:
[0088] Collect multiple real vehicle images, label each real vehicle image with vehicle appearance labels and license plate style labels, and construct a vehicle appearance and license plate style matching dataset;
[0089] An initial directional synthetic data generation model is constructed, which includes a generator and a discriminator. The generator takes vehicle appearance labels and license plate style labels as input conditions and outputs synthetic training images. The discriminator is used to distinguish the synthetic training images from the real vehicle images.
[0090] The vehicle appearance and license plate style matching dataset is used to perform adversarial training on the initial directed synthetic data generation model. During the adversarial training process, an adversarial loss function is constructed. The adversarial loss function is the sum of the expected value of the log probability output by the discriminator for the real vehicle image and the expected value of the log probability output by the discriminator for the synthetic training image.
[0091] The parameters of the generator and the discriminator are updated based on the value of the adversarial loss function until a preset convergence condition is met, thus obtaining a trained directional synthetic data generation model.
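For reference, one common way to write an adversarial objective of this kind is the standard generative adversarial form below. It is an assumption that the second term is the log probability that a synthetic image is judged synthetic, that is, $\log(1 - D(G(c)))$. The symbols $x$ (a real vehicle image), $c$ (the vehicle appearance and license plate style condition), $G$ (generator) and $D$ (discriminator) are introduced here for illustration only.

```latex
\mathcal{L}_{\mathrm{adv}}
  = \mathbb{E}_{x}\big[\log D(x)\big]
  + \mathbb{E}_{c}\big[\log\big(1 - D(G(c))\big)\big]
```

In this form the discriminator is updated to increase the objective, while the generator is updated to decrease it.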
[0092] In this embodiment, firstly, real vehicle images of different scenes, vehicle models, and license plate types are collected. Each image is manually labeled with vehicle appearance tags and license plate style tags to construct a paired training dataset, ensuring that the model can learn the mapping relationship between various features and image generation effects.
[0093] Secondly, labeled real data is input into the initial model, allowing the generator to learn to generate synthetic images with corresponding features based on the specified labels, and the discriminator to learn to distinguish between real and generated images. The two continuously optimize their parameters through adversarial training, with the generator gradually improving the realism of the generated images and the discriminator improving its ability to distinguish the generated images. When the adversarial loss function converges to a preset range, training stops, resulting in a directional synthetic data generation model that can stably generate synthetic training images that meet the requirements.
[0094] For example, the targeted synthetic data generation model is obtained by training the initial targeted synthetic data generation model as follows. The generator includes convolutional layers, upsampling layers, and conditional embedding layers; its input is a 256-dimensional feature vector and its output is a 3×H×W three-channel RGB image, where H is the image height and W is the image width. The discriminator includes downsampling convolutional layers and feature fusion layers; its input is a 3×H×W image and its output is a 1-dimensional probability that the image is real. During training, the initial learning rate is set to 1×10^-4, the batch size is set to 32, the parameters are updated using the Adam optimizer, and the discriminator is trained once after each training step of the generator. When the decrease of the adversarial loss function is less than 1×10^-5 in each of 10 consecutive training rounds, the convergence condition is determined to be met, and training is stopped to obtain the final targeted synthetic data generation model.
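The convergence condition in this example can be sketched as a simple check over the recorded loss values. The function name and the reading of "10 consecutive training rounds" as each of the last 10 round-to-round decreases being below 1×10^-5 are assumptions.

```python
def has_converged(loss_history, tol=1e-5, patience=10):
    """True if the loss fell by less than `tol` in each of the last `patience` rounds.

    Note that a round in which the loss rises also counts as a decrease
    smaller than `tol` under this reading.
    """
    if len(loss_history) < patience + 1:
        return False
    recent = loss_history[-(patience + 1):]
    return all((prev - cur) < tol for prev, cur in zip(recent, recent[1:]))


# Hypothetical loss curves, one value per training round.
still_improving = [1.0 - 0.05 * i for i in range(12)]
plateaued = [1.0, 0.8, 0.7] + [0.65] * 11

print(has_converged(still_improving), has_converged(plateaued))
```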
[0095] S40: The synthesized training images are filtered to form an incremental training dataset. The vehicle model and license plate recognition model is incrementally trained using the incremental training dataset to obtain an updated vehicle model and license plate recognition model. The updated vehicle model and license plate recognition model is then sent to the terminal for subsequent vehicle model and license plate recognition.
[0096] In this embodiment, synthetic training images that meet the preset screening conditions are selected to form an incremental training dataset, which is used to incrementally update the original pre-trained vehicle model and license plate recognition model. Only part of the parameters of the original model are fine-tuned; the entire model does not need to be retrained, which shortens the model update cycle.
[0097] After incremental training is completed, the updated vehicle model and license plate recognition model will be distributed to the edge devices that are deployed for recognition tasks, replacing the original model and completing the online iterative update of the edge model.
[0098] The synthetic training images are filtered to form an incremental training dataset, which includes:
[0099] Each synthetic training image is sequentially input into the pre-trained vehicle model and license plate recognition model to obtain the model's vehicle model recognition confidence score and license plate recognition confidence score for that image. The lower of the two confidence scores is taken and subtracted from 1 to obtain the uncertainty score of the synthetic training image.
[0100] Obtain the realism score output for the synthetic training image by the discriminator of the pre-trained directional synthetic data generation model;
[0101] The uncertainty score and the realism score are summed to obtain the screening contribution score of the synthetic training image;
[0102] Synthetic training images whose screening contribution scores meet the preset screening criteria are retained to form an incremental training dataset.
[0103] In this embodiment, each synthetic training image is first input into the pre-trained vehicle model and license plate recognition model in sequence to obtain the vehicle model recognition confidence and license plate recognition confidence of the synthetic image. The smaller value of the two is selected, and the smaller value is subtracted from 1 to obtain the uncertainty score. The higher the uncertainty score, the lower the original model's confidence in recognizing the synthetic image, and the higher the value of the corresponding sample for incremental updates.
[0104] Secondly, the realism score output by the discriminator is taken into account. The higher the realism score, the closer the generated image is to a real sample and the higher its training value. The uncertainty score and the realism score are then added together to obtain the screening contribution score, which favors realistic samples with high uncertainty. The higher the screening contribution score, the more the synthetic sample helps the incremental update of the model. Images whose screening contribution scores meet the preset screening criteria, together with the original incremental learning difficult example samples, form the incremental training dataset. This ensures the overall quality of the incremental training dataset and prioritizes the difficult examples that are most helpful to the model update, thereby improving the efficiency and effectiveness of incremental training.
[0105] The preset screening criteria are determined automatically based on quantiles. For example, all synthetic training images are sorted by screening contribution score from high to low, the score corresponding to the preset quantile is taken as the minimum threshold, and only synthetic training images whose scores are not lower than the threshold are retained. For example, if the 85th percentile is taken, the top 15% of synthetic training images by score are retained. This maximizes the contribution of each retained sample to the model update while controlling the size of the incremental training dataset.
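The screening contribution score and the quantile-based retention can be sketched as follows. The record layout, the function names, and the way the quantile is converted into a number of retained images are assumptions; the small example uses a 50% quantile so that the result is easy to follow, whereas the text above uses the 85th percentile.

```python
import math


def uncertainty_score(model_conf, plate_conf):
    """1 minus the lower of the two recognition confidence scores."""
    return 1.0 - min(model_conf, plate_conf)


def contribution_score(model_conf, plate_conf, realism):
    """Screening contribution score = uncertainty score + realism score."""
    return uncertainty_score(model_conf, plate_conf) + realism


def select_by_quantile(images, quantile=0.85):
    """Keep images whose score is not lower than the quantile threshold."""
    scored = sorted(
        ((contribution_score(i["model_conf"], i["plate_conf"], i["realism"]), i)
         for i in images),
        key=lambda pair: pair[0],
        reverse=True,
    )
    # Assumed convention: the top (1 - quantile) share of images is retained.
    keep = max(1, math.ceil((1.0 - quantile) * len(scored) - 1e-9))
    threshold = scored[keep - 1][0]
    return [img for score, img in scored if score >= threshold]


# Hypothetical synthetic images with model confidences and realism scores.
images = [
    {"id": "a", "model_conf": 0.90, "plate_conf": 0.80, "realism": 0.9},
    {"id": "b", "model_conf": 0.40, "plate_conf": 0.70, "realism": 0.8},
    {"id": "c", "model_conf": 0.95, "plate_conf": 0.95, "realism": 0.3},
    {"id": "d", "model_conf": 0.50, "plate_conf": 0.60, "realism": 0.7},
]
selected = select_by_quantile(images, quantile=0.5)
print([img["id"] for img in selected])
```

Image "c" is dropped because the model already recognizes it confidently and it looks unrealistic, while "b" and "d" combine low confidence with acceptable realism.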
[0106] Further, the vehicle model and license plate recognition model is incrementally trained using the incremental training dataset to obtain an updated vehicle model and license plate recognition model, and the updated vehicle model and license plate recognition model is then distributed to the terminal, including:
[0107] The parameters of the shared backbone network in the pre-trained vehicle model and license plate integrated recognition model are frozen, and this shared backbone network is used as the reference shared backbone network.
[0108] The pre-trained vehicle model and license plate recognition model is fine-tuned on the incremental training dataset. During the fine-tuning process, an incremental training loss function is constructed, wherein the incremental training loss function is obtained by adding a classification loss term, a regression loss term, and a feature distillation loss term.
[0109] Based on the value of the incremental training loss function, update the parameters in the pre-trained vehicle model and license plate integrated recognition model other than the reference shared backbone network until the preset incremental convergence condition is met, and obtain the updated vehicle model and license plate integrated recognition model.
[0110] The updated vehicle model and license plate recognition model will be distributed to the device side to replace the pre-trained vehicle model and license plate recognition model deployed on the device side.
[0111] The process of constructing the incremental training loss function includes:
[0112] Obtain the vehicle type classification labels and license plate labels of the incremental learning difficult example samples used to generate the synthetic training images;
[0113] The synthetic training images in the incremental training dataset are input into the pre-trained vehicle model and license plate integrated recognition model. The vehicle model classification prediction result and license plate regression prediction result output by the pre-trained vehicle model and license plate integrated recognition model are obtained. The cross-entropy loss between the vehicle model classification prediction result and the vehicle model classification label corresponding to the synthetic training image is used as the value of the classification loss term. The regression loss between the license plate regression prediction result and the license plate label corresponding to the synthetic training image is used as the value of the regression loss term.
[0114] The synthetic training images in the incremental training dataset are input into the reference shared backbone network to obtain the reference feature map output by the reference shared backbone network. The same synthetic training image is input into the shared backbone network that is being fine-tuned in the pre-trained vehicle model and license plate integrated recognition model to obtain the current feature map output by the shared backbone network.
[0115] Calculate the mean square error between the reference feature map and the current feature map, and use the mean square error as the value of the feature distillation loss term;
[0116] The values of the classification loss term, the regression loss term, and the feature distillation loss term are added together to obtain the value of the incremental training loss function.
[0117] In this embodiment, firstly, the shared backbone network in the pre-trained vehicle model and license plate integrated recognition model is used as a reference shared backbone network. All parameters of the network are frozen and no longer updated or adjusted. This avoids destroying the general features already learned by the backbone network during incremental learning, and avoids catastrophic forgetting, which would lead to a decrease in the recognition accuracy of the original correctly identifiable samples.
[0118] Secondly, the parameters of the vehicle model recognition branch and the license plate recognition branch that follow the backbone network are fine-tuned. During fine-tuning, an incremental training loss function containing three types of loss terms is constructed. The first is the classification loss term, which calculates the cross-entropy loss between the vehicle model classification prediction result and the true label and constrains the accuracy of vehicle model classification. The second is the regression loss term, which calculates the regression loss between the predicted license plate position and character recognition results and the annotated labels and constrains the accuracy of license plate detection and recognition. The third is the feature distillation loss term, which uses the reference feature map output by the frozen original shared backbone network as the distillation target, calculates the mean square error between the current backbone output feature map and the reference feature map, and constrains the feature output distribution of the backbone network after fine-tuning to remain consistent with the original model, further avoiding the decline in recognition performance caused by feature distribution shift.
[0119] Specifically, the process of constructing the incremental training loss function is as follows:
[0120] First, the ground truth annotation information corresponding to all samples in the current incremental training dataset is read, including the vehicle type classification label, license plate location coordinates, and license plate character sequence annotation for each synthetic training image, corresponding to the supervision information for the vehicle type classification task and the license plate recognition and detection task, respectively. The incremental training data is then input into the model batch by batch, sequentially obtaining the predicted probability distribution for vehicle type classification, the predicted bounding box for license plate location, and the character recognition results. The cross-entropy classification loss term for vehicle type classification and the smooth L1 regression loss term for license plate prediction are then calculated, respectively.
[0121] For example, after each batch of training is completed, the average value of the classification loss term and the average value of the regression loss term over all samples in the current batch are calculated. These two values are used as the classification loss term and regression loss term for the current batch in the calculation of the incremental training loss function. The samples are then input into the frozen reference shared backbone network and the current shared backbone network, respectively, to obtain two feature maps with the same dimensions. The mean square error of the values at corresponding positions in the two feature maps is calculated to obtain the feature distillation loss term for the current batch. Finally, the three losses are added together according to their weights. In this embodiment, the weights of the three losses are all set to 1, so the total incremental training loss for the current batch is: incremental training loss function = classification loss term + regression loss term + feature distillation loss term. Based on this loss, the learnable parameters of the model are updated through backpropagation, gradually optimizing the model's recognition performance until the incremental training meets the preset convergence condition, thus completing the model update.
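The batch-level loss computation described above can be sketched in plain Python as follows. The sample layout, the flattened feature vectors, and the smooth L1 transition point of 1.0 are assumptions made for illustration; an actual implementation would operate on tensors inside a deep learning framework.

```python
import math


def cross_entropy(probs, label):
    """Cross-entropy of a predicted class distribution against a class index."""
    return -math.log(max(probs[label], 1e-12))


def smooth_l1(pred, target, beta=1.0):
    """Mean smooth L1 loss over box coordinates (beta is an assumed default)."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)


def mse(reference, current):
    """Mean square error between two flattened feature maps of equal size."""
    return sum((r - c) ** 2 for r, c in zip(reference, current)) / len(reference)


def incremental_loss(batch, w_cls=1.0, w_reg=1.0, w_distill=1.0):
    """Weighted sum of batch-averaged classification, regression and distillation terms."""
    n = len(batch)
    cls = sum(cross_entropy(s["probs"], s["label"]) for s in batch) / n
    reg = sum(smooth_l1(s["box_pred"], s["box_true"]) for s in batch) / n
    distill = sum(mse(s["ref_feat"], s["cur_feat"]) for s in batch) / n
    return w_cls * cls + w_reg * reg + w_distill * distill


# One hypothetical sample: uniform 2-class prediction, one box edge off by 2.
batch = [{
    "probs": [0.5, 0.5], "label": 0,
    "box_pred": [0.0, 0.0, 2.0, 2.0], "box_true": [0.0, 0.0, 2.0, 4.0],
    "ref_feat": [1.0, 2.0], "cur_feat": [1.0, 2.0],
}]
loss = incremental_loss(batch)
print(loss)
```

With all three weights set to 1, as in this embodiment, the total is simply the sum of the three batch-level terms.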
[0122] Furthermore, during the training process, parameters are updated by backpropagation based on the incremental training loss function. Only the parameters of the recognition branches outside the backbone network are adjusted until the preset incremental convergence condition is met, that is, the incremental training loss function decreases below the preset threshold for multiple consecutive rounds. Training is then stopped to obtain the updated vehicle model and license plate recognition model. Finally, the updated model is distributed to each end device to replace the original model, and it can then be used for subsequent vehicle model and license plate recognition tasks.
[0123] Compared with existing technologies, this application constructs a targeted synthetic data generation model to generate synthetic training data with matching features for the difficult examples in incremental learning. This supplements the insufficient number of original difficult examples and mitigates data distribution bias. At the same time, a screening strategy that combines uncertainty and realism retains the high-quality samples that are most helpful for incremental updates, controls the dataset size, and improves the efficiency of incremental training. During incremental training, freezing the original shared backbone network parameters and introducing a feature distillation loss to constrain the feature distribution effectively avoids catastrophic forgetting. While improving the recognition accuracy of new categories and new scenes, the accuracy of the original recognition task is maintained, and online, efficient iterative updating of the end-side vehicle model and license plate integrated recognition model is finally realized.
[0124] In summary, the embodiments of this application have at least the following technical effects:
[0125] This application provides a method for integrated vehicle model and license plate recognition based on multi-dimensional image analysis. First, it achieves integrated recognition by adopting a shared backbone network structure for vehicle model and license plate recognition, reducing the computational overhead caused by repetitive feature extraction. Second, it identifies difficult examples in real-time on the device side and transmits them back to the cloud. A targeted synthetic data generation model is used to generate synthetic training samples with matching features, solving the problem of overfitting when training directly with a small number of new difficult examples. Finally, during incremental training, the original shared backbone network parameters are frozen, and only the parameters of subsequent modules are fine-tuned, with a feature distillation loss constraint introduced. This allows the model to learn the feature knowledge of new samples while avoiding the destruction of the original model's capabilities by incremental training, ensuring the overall recognition accuracy of the updated model and better adapting to the actual deployment needs of edge devices.
[0126] Through the above technical solutions, this application addresses the problems of decreased accuracy caused by unreasonable lightweight compression of existing integrated models and insufficient generalization of models with newly added vehicle types and license plates. It achieves end-side integrated vehicle type and license plate recognition that balances inference efficiency and recognition accuracy. At the same time, through a cloud-based collaborative directed synthetic incremental learning scheme, it completes continuous model updates at low cost. This avoids overfitting caused by a small number of difficult examples, avoids the large amount of computing resources consumed by full retraining, and eliminates the need for end-side devices to bear high-load training computations. It effectively improves the stability and adaptability of vehicle type and license plate recognition in real-world scenarios and can meet the requirements of intelligent transportation systems for vehicle recognition tasks in different application scenarios.
[0127] Embodiment 2: As shown in Figure 2, based on the same inventive concept as the vehicle model and license plate integrated recognition method based on multi-dimensional image analysis provided in Embodiment 1, this application also provides a vehicle model and license plate integrated recognition system based on multi-dimensional image analysis, including:
[0128] The vehicle information recognition module 11 is used to input the image of the vehicle to be recognized into a pre-trained vehicle model and license plate recognition model deployed on the edge, output the vehicle model recognition result and the license plate recognition result, and simultaneously output the confidence score of the vehicle model recognition result and the confidence score of the license plate recognition result.
[0129] The confidence score comparison module 12 is used to determine the vehicle image to be recognized as an incremental learning difficult example sample when the confidence score of the vehicle model recognition result is lower than the preset vehicle model confidence threshold, or the confidence score of the license plate recognition result is lower than the preset license plate confidence threshold.
[0130] The difficult example sample uploading module 13 is used to upload the incremental learning difficult example samples to the cloud and generate a synthetic training image that is consistent with the vehicle appearance features and license plate style features of the incremental learning difficult example samples through a pre-trained directional synthetic data generation model.
[0131] The recognition model update module 14 is used to filter the synthetic training images to form an incremental training dataset, use the incremental training dataset to incrementally train the vehicle model and license plate integrated recognition model to obtain an updated vehicle model and license plate integrated recognition model, and send the updated vehicle model and license plate integrated recognition model to the terminal for subsequent vehicle model and license plate integrated recognition.
[0132] In one embodiment, the vehicle information recognition module 11 is specifically used for:
[0133] An initial vehicle model and license plate recognition model is constructed using a shared backbone network and a multi-task detection head. The multi-task detection head includes a vehicle model recognition detection head and a license plate recognition detection head. The output layers of the vehicle model recognition detection head and the license plate recognition detection head use the Softmax function, and the maximum probability value output by the Softmax function is used as the confidence score of the corresponding recognition result.
[0134] Construct a joint training sample set for vehicle models and license plates, wherein each training sample in the joint training sample set for vehicle models and license plates contains a vehicle image and a corresponding vehicle model classification label and license plate information label;
[0135] The initial vehicle model and license plate integrated recognition model is trained in a supervised manner using the vehicle model and license plate joint training sample set until the preset convergence condition is met, thus obtaining the trained initial vehicle model and license plate integrated recognition model.
[0136] The initial vehicle model and license plate recognition model that has been trained is subjected to knowledge distillation and quantization to obtain a pre-trained vehicle model and license plate recognition model.
[0137] The pre-trained vehicle model and license plate recognition model is deployed to the edge. The vehicle image to be recognized is input into the vehicle model and license plate recognition model, and the vehicle model recognition result and license plate recognition result are output. The confidence scores of the vehicle model recognition result and the license plate recognition result are output simultaneously.
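As an illustration of how the maximum Softmax probability can serve as the confidence score of each detection head, the following is a minimal Python sketch. The function names and the example logits are hypothetical, and each head is treated as a single categorical output for simplicity; the application does not specify the output structure of the license plate recognition detection head.

```python
import math


def softmax(logits):
    """Numerically stable Softmax over a list of raw head outputs."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_confidence(logits):
    """Return (class index, confidence); confidence is the maximum Softmax probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


# Hypothetical logits produced by the two heads for one vehicle image.
vehicle_idx, vehicle_conf = predict_with_confidence([2.0, 0.5, -1.0])
plate_idx, plate_conf = predict_with_confidence([0.1, 0.2, 0.15])
```

In this sketch the nearly flat license plate logits yield a much lower confidence score than the peaked vehicle model logits, which is the kind of signal the subsequent threshold comparison relies on.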
[0138] Further, performing knowledge distillation and quantization on the trained initial vehicle model and license plate recognition model to obtain the pre-trained vehicle model and license plate recognition model includes:
[0139] The initial vehicle model and license plate recognition model that has been trained is used as the high-precision source domain model, and the pre-built lightweight network is used as the lightweight target domain model.
[0140] Set a first distillation weight corresponding to the vehicle model recognition detection head and a second distillation weight corresponding to the license plate recognition detection head, wherein the first distillation weight is greater than the second distillation weight;
[0141] The vehicle images in the joint training sample set of vehicle models and license plates are input into the high-precision source domain model to obtain the source domain vehicle model soft label output by the vehicle model recognition detection head and the source domain license plate soft label output by the license plate recognition detection head.
[0142] The same vehicle image is input into the lightweight target domain model to obtain the target domain vehicle model soft label and the target domain license plate soft label output by the lightweight target domain model.
[0143] Calculate the first KL divergence between the source domain vehicle model soft label and the target domain vehicle model soft label, and multiply it by the first distillation weight to obtain the vehicle model distillation loss;
[0144] Calculate the second KL divergence between the source domain license plate soft label and the target domain license plate soft label, and multiply it by the second distillation weight to obtain the license plate distillation loss;
[0145] The vehicle model distillation loss, the license plate distillation loss, and the task loss of the lightweight target domain model are weighted and summed. The parameters of the lightweight target domain model are updated based on the weighted summation result until the preset distillation convergence condition is met, and the distilled lightweight target domain model is obtained.
[0146] The lightweight target domain model after distillation is quantized to obtain a pre-trained vehicle model and license plate recognition model.
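The distillation objective described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the applicant's implementation: the direction KL(source || target), the concrete weights 0.7 and 0.3, the `w_task` coefficient, and all function names are hypothetical. The only constraint taken from the text is that the first distillation weight is greater than the second.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def distillation_total_loss(src_vehicle, tgt_vehicle, src_plate, tgt_plate,
                            task_loss, w_vehicle=0.7, w_plate=0.3, w_task=1.0):
    """Weighted sum of the two distillation losses and the target model's task loss."""
    if not w_vehicle > w_plate:
        raise ValueError("first distillation weight must exceed the second")
    vehicle_loss = w_vehicle * kl_divergence(src_vehicle, tgt_vehicle)
    plate_loss = w_plate * kl_divergence(src_plate, tgt_plate)
    return vehicle_loss + plate_loss + w_task * task_loss


# Hypothetical soft labels from the source (teacher) and target (student) models.
source_soft = [0.7, 0.2, 0.1]
target_soft = [0.5, 0.3, 0.2]
loss = distillation_total_loss(source_soft, target_soft, source_soft, target_soft, task_loss=0.5)
```

When the target domain soft labels match the source domain soft labels exactly, both KL terms vanish and only the task loss remains.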
[0147] Further, quantizing the distilled lightweight target domain model to obtain the pre-trained vehicle model and license plate recognition model includes:
[0148] The weight values of the shared backbone network, the vehicle model recognition detection head, and the license plate recognition detection head in the distilled lightweight target domain model are extracted, and the distribution range of the weight values is determined separately for the shared backbone network, the vehicle model recognition detection head, and the license plate recognition detection head.
[0149] The first optimal quantization bit width is determined for the shared backbone network based on the distribution range of the weight values in the shared backbone network; the second optimal quantization bit width is determined for the vehicle model recognition detection head based on the distribution range of the weight values in the vehicle model recognition detection head; and the third optimal quantization bit width is determined for the license plate recognition detection head based on the distribution range of the weight values in the license plate recognition detection head.
[0150] All weight values in the shared backbone network are quantized according to the first optimal quantization bit width, all weight values in the vehicle model recognition detection head are quantized according to the second optimal quantization bit width, and all weight values in the license plate recognition detection head are quantized according to the third optimal quantization bit width.
[0151] The quantized shared backbone network, the quantized vehicle model recognition head, and the quantized license plate recognition head are combined into a pre-trained integrated vehicle model and license plate recognition model.
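The per-module quantization steps can be illustrated with the following Python sketch. The application does not state how the optimal bit width is derived from the weight distribution range, so the selection rule here (the smallest candidate bit width whose quantization step does not exceed `max_step`) is purely an assumption, as are the candidate bit widths, the uniform affine quantizer, and the example weights.

```python
def choose_bit_width(weights, max_step=0.02, candidates=(4, 8, 16)):
    """Hypothetical rule: smallest bit width whose step size is at most max_step."""
    span = max(weights) - min(weights)
    for bits in candidates:
        if span / (2 ** bits - 1) <= max_step:
            return bits
    return candidates[-1]


def quantize(weights, bits):
    """Uniform affine quantization; returns the dequantized weight values."""
    lo, hi = min(weights), max(weights)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    return [lo + round((w - lo) / scale) * scale for w in weights]


# Hypothetical weight values for the three parts of the distilled model.
modules = {
    "shared_backbone": [-1.2, -0.3, 0.0, 0.8, 1.5],
    "vehicle_model_head": [-0.05, 0.01, 0.04],
    "license_plate_head": [-0.2, 0.1, 0.25],
}
bit_widths = {name: choose_bit_width(w) for name, w in modules.items()}
quantized = {name: quantize(w, bit_widths[name]) for name, w in modules.items()}
```

Under this assumed rule, a module with a narrow weight range can be stored at a lower bit width than one with a wide range while keeping a comparable quantization step.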
[0152] In one embodiment, the confidence score comparison module 12 is specifically used for:
[0153] Obtain the set of vehicle model recognition confidence scores and the set of license plate recognition confidence scores output by the pre-trained vehicle model and license plate integrated recognition model when performing inference on multiple historical vehicle images during historical operation.
[0154] Arrange the confidence scores in the vehicle model recognition confidence score set in descending order of numerical value, and take the confidence score corresponding to the preset quantile as the preset vehicle model confidence threshold;
[0155] Arrange the confidence scores in the license plate recognition confidence score set in descending order of numerical value, and take the confidence score corresponding to the preset quantile as the preset license plate confidence threshold;
[0156] Obtain the confidence score of the vehicle model recognition result and the confidence score of the license plate recognition result synchronously output by the pre-trained vehicle model and license plate recognition model during real-time inference;
[0157] When the confidence score of the vehicle model recognition result is lower than the preset vehicle model confidence threshold, or the confidence score of the license plate recognition result is lower than the preset license plate confidence threshold, the vehicle image to be recognized is determined as an incremental learning difficult example sample.
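The threshold derivation and the difficult example judgment can be sketched as follows. This is a minimal Python sketch; the indexing convention used to locate the preset quantile in the descending list, the quantile value 0.8, and the historical scores are all assumptions made for illustration.

```python
def quantile_threshold(scores, quantile):
    """Sort scores in descending order and return the score at the quantile position."""
    if not scores:
        raise ValueError("scores must not be empty")
    if not 0.0 <= quantile <= 1.0:
        raise ValueError("quantile must lie in [0, 1]")
    ordered = sorted(scores, reverse=True)
    index = min(int(quantile * len(ordered)), len(ordered) - 1)
    return ordered[index]


def is_hard_example(vehicle_conf, plate_conf, vehicle_thr, plate_thr):
    """An image is a difficult example if either confidence falls below its threshold."""
    return vehicle_conf < vehicle_thr or plate_conf < plate_thr


# Hypothetical historical confidence scores collected on the edge.
vehicle_history = [0.99, 0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30]
plate_history = [0.98, 0.97, 0.93, 0.90, 0.88, 0.75, 0.66, 0.50, 0.45, 0.20]
vehicle_thr = quantile_threshold(vehicle_history, 0.8)
plate_thr = quantile_threshold(plate_history, 0.8)
```

Because the thresholds come from the model's own historical score distribution, the share of images flagged as difficult examples is governed by the chosen quantile rather than by a fixed absolute score.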
[0158] In one embodiment, the difficult example sample uploading module 13 is specifically used for:
[0159] The incremental learning difficult example samples are uploaded to the cloud, and the vehicle appearance features and license plate style features in the incremental learning difficult example samples are extracted.
[0160] The vehicle appearance features are used as the first generation condition, and the license plate style features are used as the second generation condition. Multiple synthetic training images that simultaneously satisfy the first generation condition and the second generation condition are generated by a pre-trained directional synthetic data generation model.
[0161] The synthetic training images are input into the discriminator of the pre-trained directional synthetic data generation model, and the discriminator outputs a realism score for each synthetic training image.
[0162] The training process for the directional synthetic data generation model includes:
[0163] Collect multiple real vehicle images, label each real vehicle image with vehicle appearance labels and license plate style labels, and construct a vehicle appearance and license plate style matching dataset;
[0164] An initial directional synthetic data generation model is constructed, which includes a generator and a discriminator. The generator takes vehicle appearance labels and license plate style labels as input conditions and outputs synthetic training images. The discriminator is used to distinguish the synthetic training images from the real vehicle images.
[0165] The vehicle appearance and license plate style matching dataset is used to perform adversarial training on the initial directional synthetic data generation model. During the adversarial training process, an adversarial loss function is constructed as the sum of the expected value of the log probability output by the discriminator for the real vehicle images and the expected value of the log probability output by the discriminator for the synthetic training images.
[0166] The parameters of the generator and the discriminator are updated based on the value of the adversarial loss function until a preset convergence condition is met, thus obtaining a trained directional synthetic data generation model.
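For reference, a sum of two log-probability expectations of this kind is commonly written in the standard conditional GAN form below. Whether the application intends exactly this form is an assumption. The symbols are introduced here for illustration only: x denotes a real vehicle image, c the pair of vehicle appearance label and license plate style label, z a random noise input to the generator G, and D the discriminator.

```latex
\min_{G}\,\max_{D}\; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{real}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z},\, c}\bigl[\log\bigl(1 - D(G(z, c))\bigr)\bigr]
```

In this form the second term uses the log probability that the discriminator assigns to a synthetic image being synthetic, so the discriminator is updated to increase the sum while the generator is updated to decrease it.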
[0167] Further, in one embodiment, filtering the synthetic training images to form the incremental training dataset includes:
[0168] Each synthetic training image is sequentially input into the pre-trained vehicle model and license plate recognition model to obtain the vehicle model recognition confidence score and the license plate recognition confidence score of the model for that synthetic training image. The lower of the two confidence scores is taken and subtracted from 1 to obtain the uncertainty score of the synthetic training image.
[0169] Obtain the realism score of the synthetic training image output by the discriminator of the pre-trained directional synthetic data generation model;
[0170] The uncertainty score and the realism score are summed to obtain the screening contribution score of the synthetic training image;
[0171] Synthetic training images whose contribution scores meet the preset screening criteria are retained to form an incremental training dataset.
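The screening computation can be sketched in Python as follows, reading the uncertainty score as one minus the lower of the two confidence scores. The retention criterion (contribution score of at least `min_score`), the candidate records, and the function names are hypothetical; the application only refers to a preset screening criterion.

```python
def uncertainty_score(vehicle_conf, plate_conf):
    """One minus the lower of the two confidence scores."""
    return 1.0 - min(vehicle_conf, plate_conf)


def contribution_score(vehicle_conf, plate_conf, realism):
    """Screening contribution score: uncertainty score plus discriminator realism score."""
    return uncertainty_score(vehicle_conf, plate_conf) + realism


def screen(candidates, min_score=1.0):
    """Keep candidates whose contribution score reaches the (assumed) minimum."""
    return [
        c for c in candidates
        if contribution_score(c["vehicle_conf"], c["plate_conf"], c["realism"]) >= min_score
    ]


# Hypothetical synthetic images with model confidences and realism scores.
candidates = [
    {"id": "a", "vehicle_conf": 0.95, "plate_conf": 0.90, "realism": 0.80},  # easy, realistic
    {"id": "b", "vehicle_conf": 0.40, "plate_conf": 0.85, "realism": 0.70},  # hard, realistic
    {"id": "c", "vehicle_conf": 0.30, "plate_conf": 0.20, "realism": 0.10},  # hard, unrealistic
]
kept = screen(candidates)
```

With these example values only image "b" is retained: it is both uncertain for the current model and judged realistic by the discriminator, whereas "a" adds little new information and "c" is unlikely to resemble a real vehicle image.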
[0172] Further, incrementally training the vehicle model and license plate recognition model using the incremental training dataset to obtain the updated vehicle model and license plate recognition model, and sending the updated vehicle model and license plate recognition model to the edge, includes:
[0173] The parameters of the shared backbone network in the pre-trained vehicle model and license plate integrated recognition model are frozen, and the frozen shared backbone network is used as the reference shared backbone network.
[0174] The pre-trained vehicle model and license plate recognition model is fine-tuned on the incremental training dataset. During the fine-tuning process, an incremental training loss function is constructed, wherein the incremental training loss function is obtained by adding a classification loss term, a regression loss term, and a feature distillation loss term.
[0175] Based on the value of the incremental training loss function, update the parameters in the pre-trained vehicle model and license plate integrated recognition model other than the reference shared backbone network until the preset incremental convergence condition is met, and obtain the updated vehicle model and license plate integrated recognition model.
[0176] The updated vehicle model and license plate recognition model is distributed to the edge to replace the pre-trained vehicle model and license plate recognition model deployed on the edge.
[0177] The process of constructing the incremental training loss function includes:
[0178] Obtain the vehicle model classification labels and license plate labels of the incremental learning difficult example samples used to generate the synthetic training images;
[0179] The synthetic training images in the incremental training dataset are input into the pre-trained vehicle model and license plate integrated recognition model. The vehicle model classification prediction result and license plate regression prediction result output by the pre-trained vehicle model and license plate integrated recognition model are obtained. The cross-entropy loss between the vehicle model classification prediction result and the vehicle model classification label corresponding to the synthetic training image is used as the value of the classification loss term. The regression loss between the license plate regression prediction result and the license plate label corresponding to the synthetic training image is used as the value of the regression loss term.
[0180] The synthetic training images in the incremental training dataset are input into the reference shared backbone network to obtain the reference feature map output by the reference shared backbone network. The same synthetic training image is input into the shared backbone network that is being fine-tuned in the pre-trained vehicle model and license plate integrated recognition model to obtain the current feature map output by the shared backbone network.
[0181] Calculate the mean square error between the reference feature map and the current feature map, and use the mean square error as the value of the feature distillation loss term;
[0182] The values of the classification loss term, the regression loss term, and the feature distillation loss term are added together to obtain the value of the incremental training loss function.
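The construction of the incremental training loss can be illustrated with a short Python sketch for a single synthetic training image. The application does not name the regression loss, so mean absolute error is used here as an assumption; feature maps are represented as flat lists, and all function names and example values are hypothetical.

```python
import math


def cross_entropy(probs, label_index, eps=1e-12):
    """Cross-entropy between predicted class probabilities and a class label."""
    return -math.log(probs[label_index] + eps)


def l1_loss(pred, target):
    """Mean absolute error, standing in for the unspecified regression loss."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def mean_squared_error(reference, current):
    """Mean square error between the reference and current feature maps."""
    if len(reference) != len(current):
        raise ValueError("feature maps must have the same size")
    return sum((r - c) ** 2 for r, c in zip(reference, current)) / len(reference)


def incremental_loss(vehicle_probs, vehicle_label, plate_pred, plate_label,
                     ref_features, cur_features):
    """Unweighted sum of classification, regression, and feature distillation terms."""
    return (cross_entropy(vehicle_probs, vehicle_label)
            + l1_loss(plate_pred, plate_label)
            + mean_squared_error(ref_features, cur_features))


# Hypothetical predictions, labels, and flattened feature maps for one image.
loss = incremental_loss([0.5, 0.5], 0, [0.0, 1.0], [0.5, 1.0], [1.0, 2.0], [1.0, 4.0])
```

Note that if every backbone parameter is frozen, the current and reference feature maps coincide and the feature distillation term evaluates to zero; the term only contributes where the features of the network being fine-tuned can drift from the reference.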
Claims
1. A method for integrated vehicle model and license plate recognition based on multi-dimensional image analysis, characterized in that the method includes: The vehicle image to be recognized is input into a pre-trained vehicle model and license plate recognition model deployed on the edge, and the vehicle model recognition result and the license plate recognition result are output, together with the confidence score of the vehicle model recognition result and the confidence score of the license plate recognition result. When the confidence score of the vehicle model recognition result is lower than the preset vehicle model confidence threshold, or the confidence score of the license plate recognition result is lower than the preset license plate confidence threshold, the vehicle image to be recognized is determined as an incremental learning difficult example sample. The incremental learning difficult example samples are uploaded to the cloud, and a pre-trained directional synthetic data generation model is used to generate synthetic training images that are consistent with the vehicle appearance features and license plate style features of the incremental learning difficult example samples. The synthetic training images are filtered to form an incremental training dataset. The vehicle model and license plate recognition model is incrementally trained using the incremental training dataset to obtain an updated vehicle model and license plate recognition model. The updated vehicle model and license plate recognition model is then sent to the edge for subsequent vehicle model and license plate recognition.
2. The vehicle model and license plate integrated recognition method based on multi-dimensional image analysis according to claim 1, characterized in that, The image of the vehicle to be identified is input into a pre-trained vehicle model and license plate recognition model deployed on the edge. The model outputs vehicle model recognition results and license plate recognition results, and simultaneously outputs the confidence scores for the vehicle model recognition results and the license plate recognition results, including: An initial vehicle model and license plate recognition model is constructed using a shared backbone network and a multi-task detection head. The multi-task detection head includes a vehicle model recognition detection head and a license plate recognition detection head. The output layers of the vehicle model recognition detection head and the license plate recognition detection head use the Softmax function, and the maximum probability value output by the Softmax function is used as the confidence score of the corresponding recognition result. Construct a joint training sample set for vehicle models and license plates, wherein each training sample in the joint training sample set for vehicle models and license plates contains a vehicle image and a corresponding vehicle model classification label and license plate information label; The initial vehicle model and license plate integrated recognition model is trained in a supervised manner using the vehicle model and license plate joint training sample set until the preset convergence condition is met, thus obtaining the trained initial vehicle model and license plate integrated recognition model. The initial vehicle model and license plate recognition model that has been trained is subjected to knowledge distillation and quantization to obtain a pre-trained vehicle model and license plate recognition model. 
The pre-trained vehicle model and license plate recognition model is deployed to the edge. The vehicle image to be recognized is input into the vehicle model and license plate recognition model, and the vehicle model recognition result and license plate recognition result are output. The confidence scores of the vehicle model recognition result and the license plate recognition result are output simultaneously.
3. The vehicle model and license plate integrated recognition method based on multi-dimensional image analysis according to claim 2, characterized in that, The initial vehicle model and license plate recognition model, after training, undergoes knowledge distillation and quantization to obtain a pre-trained vehicle model and license plate recognition model, including: The initial vehicle model and license plate recognition model that has been trained is used as the high-precision source domain model, and the pre-built lightweight network is used as the lightweight target domain model. Set a first distillation weight corresponding to the vehicle model recognition detection head and a second distillation weight corresponding to the license plate recognition detection head, wherein the first distillation weight is greater than the second distillation weight; The vehicle images in the joint training sample set of vehicle models and license plates are input into the high-precision source domain model to obtain the source domain vehicle model soft label output by the vehicle model recognition detection head and the source domain license plate soft label output by the license plate recognition detection head. Inputting the same vehicle image into the lightweight target domain model outputs a soft label for the vehicle model and a soft label for the license plate in the target domain. 
Calculate the first KL divergence between the source domain vehicle model soft label and the target domain vehicle model soft label, and multiply it by the first distillation weight to obtain the vehicle model distillation loss; Calculate the second KL divergence between the source domain license plate soft label and the target domain license plate soft label, and multiply it by the second distillation weight to obtain the license plate distillation loss; The vehicle model distillation loss, the license plate distillation loss, and the task loss of the lightweight target domain model are weighted and summed. The parameters of the lightweight target domain model are updated based on the weighted summation result until the preset distillation convergence condition is met, and the distilled lightweight target domain model is obtained. The lightweight target domain model after distillation is quantized to obtain a pre-trained vehicle model and license plate recognition model.
4. The vehicle model and license plate integrated recognition method based on multi-dimensional image analysis according to claim 3, characterized in that, The lightweight target domain model after distillation is quantized to obtain a pre-trained vehicle model and license plate recognition integrated model, including: The weight values of the shared backbone network, the vehicle model recognition detection head, and the license plate recognition detection head in the lightweight target domain model after distillation are extracted, and the distribution range of the weight values in the shared backbone network, the vehicle model recognition detection head, and the license plate recognition detection head are statistically analyzed respectively. The first optimal quantization bit width is determined for the shared backbone network based on the distribution range of the weight values in the shared backbone network; the second optimal quantization bit width is determined for the vehicle model recognition detection head based on the distribution range of the weight values in the vehicle model recognition detection head; and the third optimal quantization bit width is determined for the license plate recognition detection head based on the distribution range of the weight values in the license plate recognition detection head. All weight values in the shared backbone network are quantized according to the first optimal quantization bit width, all weight values in the vehicle model recognition detection head are quantized according to the second optimal quantization bit width, and all weight values in the license plate recognition detection head are quantized according to the third optimal quantization bit width. The quantized shared backbone network, the quantized vehicle model recognition head, and the quantized license plate recognition head are combined into a pre-trained integrated vehicle model and license plate recognition model.
5. The vehicle model and license plate integrated recognition method based on multi-dimensional image analysis according to claim 1, characterized in that determining the vehicle image to be recognized as an incremental learning difficult example sample when the confidence score of the vehicle model recognition result is lower than the preset vehicle model confidence threshold, or the confidence score of the license plate recognition result is lower than the preset license plate confidence threshold, includes: Obtain the set of vehicle model recognition confidence scores and the set of license plate recognition confidence scores output by the pre-trained vehicle model and license plate integrated recognition model when performing inference on multiple historical vehicle images during historical operation. Arrange the confidence scores in the vehicle model recognition confidence score set in descending order of numerical value, and take the confidence score corresponding to the preset quantile as the preset vehicle model confidence threshold; Arrange the confidence scores in the license plate recognition confidence score set in descending order of numerical value, and take the confidence score corresponding to the preset quantile as the preset license plate confidence threshold; Obtain the confidence score of the vehicle model recognition result and the confidence score of the license plate recognition result synchronously output by the pre-trained vehicle model and license plate integrated recognition model during real-time inference; When the confidence score of the vehicle model recognition result is lower than the preset vehicle model confidence threshold, or the confidence score of the license plate recognition result is lower than the preset license plate confidence threshold, the vehicle image to be recognized is determined as an incremental learning difficult example sample.
6. The vehicle model and license plate integrated recognition method based on multi-dimensional image analysis according to claim 1, characterized in that uploading the incremental learning difficult example samples to the cloud and using a pre-trained directional synthetic data generation model to generate synthetic training images that are consistent with the vehicle appearance features and license plate style features of the incremental learning difficult example samples includes: The incremental learning difficult example samples are uploaded to the cloud, and the vehicle appearance features and license plate style features in the incremental learning difficult example samples are extracted. The vehicle appearance features are used as the first generation condition, and the license plate style features are used as the second generation condition. Multiple synthetic training images that simultaneously satisfy the first generation condition and the second generation condition are generated by the pre-trained directional synthetic data generation model. The synthetic training images are input into the discriminator of the pre-trained directional synthetic data generation model, and the discriminator outputs a realism score for each synthetic training image.
7. The vehicle model and license plate integrated recognition method based on multi-dimensional image analysis according to claim 6, characterized in that the training process for the directional synthetic data generation model includes: Collect multiple real vehicle images, label each real vehicle image with vehicle appearance labels and license plate style labels, and construct a vehicle appearance and license plate style matching dataset; An initial directional synthetic data generation model is constructed, which includes a generator and a discriminator. The generator takes vehicle appearance labels and license plate style labels as input conditions and outputs synthetic training images. The discriminator is used to distinguish the synthetic training images from the real vehicle images. The vehicle appearance and license plate style matching dataset is used to perform adversarial training on the initial directional synthetic data generation model. During the adversarial training process, an adversarial loss function is constructed as the sum of the expected value of the log probability output by the discriminator for the real vehicle images and the expected value of the log probability output by the discriminator for the synthetic training images. The parameters of the generator and the discriminator are updated based on the value of the adversarial loss function until a preset convergence condition is met, thus obtaining a trained directional synthetic data generation model.
8. The vehicle model and license plate integrated recognition method based on multi-dimensional image analysis according to claim 1, characterized in that filtering the synthetic training images to form the incremental training dataset includes: Each synthetic training image is sequentially input into the pre-trained vehicle model and license plate recognition model to obtain the vehicle model recognition confidence score and the license plate recognition confidence score of the model for that synthetic training image. The lower of the two confidence scores is taken and subtracted from 1 to obtain the uncertainty score of the synthetic training image. Obtain the realism score of the synthetic training image output by the discriminator of the pre-trained directional synthetic data generation model; The uncertainty score and the realism score are summed to obtain the screening contribution score of the synthetic training image; Synthetic training images whose contribution scores meet the preset screening criteria are retained to form an incremental training dataset.
9. The vehicle model and license plate integrated recognition method based on multi-dimensional image analysis according to claim 8, characterized in that incrementally training the vehicle model and license plate recognition model using the incremental training dataset to obtain the updated vehicle model and license plate recognition model, and sending the updated vehicle model and license plate recognition model to the edge, includes: The parameters of the shared backbone network in the pre-trained vehicle model and license plate integrated recognition model are frozen, and the frozen shared backbone network is used as the reference shared backbone network. The pre-trained vehicle model and license plate recognition model is fine-tuned on the incremental training dataset. During the fine-tuning process, an incremental training loss function is constructed, wherein the incremental training loss function is obtained by adding a classification loss term, a regression loss term, and a feature distillation loss term. Based on the value of the incremental training loss function, update the parameters in the pre-trained vehicle model and license plate integrated recognition model other than the reference shared backbone network until the preset incremental convergence condition is met, and obtain the updated vehicle model and license plate integrated recognition model. The updated vehicle model and license plate recognition model is distributed to the edge to replace the pre-trained vehicle model and license plate recognition model deployed on the edge. The process of constructing the incremental training loss function includes: Obtain the vehicle model classification labels and license plate labels of the incremental learning difficult example samples used to generate the synthetic training images; The synthetic training images in the incremental training dataset are input into the pre-trained vehicle model and license plate integrated recognition model. The vehicle model classification prediction result and license plate regression prediction result output by the pre-trained vehicle model and license plate integrated recognition model are obtained. The cross-entropy loss between the vehicle model classification prediction result and the vehicle model classification label corresponding to the synthetic training image is used as the value of the classification loss term. The regression loss between the license plate regression prediction result and the license plate label corresponding to the synthetic training image is used as the value of the regression loss term. The synthetic training images in the incremental training dataset are input into the reference shared backbone network to obtain the reference feature map output by the reference shared backbone network. The same synthetic training image is input into the shared backbone network that is being fine-tuned in the pre-trained vehicle model and license plate integrated recognition model to obtain the current feature map output by the shared backbone network. Calculate the mean square error between the reference feature map and the current feature map, and use the mean square error as the value of the feature distillation loss term; The values of the classification loss term, the regression loss term, and the feature distillation loss term are added together to obtain the value of the incremental training loss function.
10. A vehicle model and license plate integrated recognition system based on multi-dimensional image analysis, characterized in that the system is used to perform the vehicle model and license plate integrated recognition method based on multi-dimensional image analysis according to any one of claims 1-9, and includes: The vehicle information recognition module is used to input the image of the vehicle to be recognized into a pre-trained vehicle model and license plate recognition model deployed on the edge, output the vehicle model recognition result and the license plate recognition result, and simultaneously output the confidence score of the vehicle model recognition result and the confidence score of the license plate recognition result. The confidence score comparison module is used to determine the vehicle image to be recognized as an incremental learning difficult example sample when the confidence score of the vehicle model recognition result is lower than the preset vehicle model confidence threshold, or the confidence score of the license plate recognition result is lower than the preset license plate confidence threshold. The difficult example sample uploading module is used to upload the incremental learning difficult example samples to the cloud, and generate synthetic training images that are consistent with the vehicle appearance features and license plate style features of the incremental learning difficult example samples through a pre-trained directional synthetic data generation model.
The recognition model update module is used to filter the synthetic training images to form an incremental training dataset, use the incremental training dataset to incrementally train the vehicle model and license plate integrated recognition model to obtain an updated vehicle model and license plate integrated recognition model, and send the updated vehicle model and license plate integrated recognition model to the edge for subsequent vehicle model and license plate integrated recognition.