Driving assistance method and device, vehicle-mounted equipment and storage medium

A driving assistance and model-training technology applied in the field of intelligent driving. It addresses the problems of high labor and time cost, poor performance on infrared data, and reduced application and promotion value, and achieves the effects of eliminating the difference between visible light and infrared data and eliminating overfitting.

Pending Publication Date: 2022-04-29
INFIRAY TECH CO LTD

AI-Extracted Technical Summary

Problems solved by technology

[0003] Currently known neural network models usually perform well only in visible light scenes and perform poorly on infrared datasets. Retraining the neural network model ...

Method used

Image enhancement is performed on the original visible light image using a preset image enhancement mode, such as an image enhancement mode based on chromaticity space change, and the resulting enhanced visible light images are used to expand the training set. The image recognition model obtained by training the initial image model on this training set is thereby transferred so that, when it is used for target detection on infrared images, it achieves a recognition effect as good as on visible light images.
First, expanding the image data set with enhanced visible light images obtained by image enhancement eliminates the small-sample overfitting problem and, at the same time, eliminates the difference between the visible light image data set and the infrared image data set to the greatest extent, so that an image recognition model that performs well in visible light scenes also performs well after being transferred to the infrared image data set;
In the above embodiment, a Teacher-Student model built on YOLOP is used as the initial image model, and the trained image recognition model is obtained by training this Teacher-Student model on the training set. This improves training efficiency: with no obvious loss of recognition accuracy, both the training efficiency and the recognition efficiency of the trained model are greatly improved, the performance obtained when a neural network model that performs well in visible light scenes is transferred, on the basis of small-sample data, to the recognition of infrared scene images is further improved, and the robustness of the image recognition model is enhanced.
In the foregoing embodiment, the training of the initial image model is divided into two rounds of iterative training. In the first round, no model parameters are frozen; in the second round, on the basis of the intermediate image model obtained in the first round, an ablation training strategy is applied that freezes the model parameters of the backbone network layer and of the two semantic segmentation detection heads used for drivable area and lane line detection. This helps the model training process find the local optimum more quickly, improving training accuracy and training speed. It should be noted that this ablation training strategy can be applied to any type of initial image model in the different embodiments of the present application, see Figure 5: the initial neural network model in the scheme that trains the initial neural network model directly (5a); the pre-trained neural network model in the scheme that trains on the basis of the pre-trained neural network model obtained from an open source image data set (5b); and the teacher-student architecture model in the scheme that trains a teacher model and a student model (5c).
In the foregoing embodiment, the original visible light image of the driving scene is acquired, image enhancement is performed on the original visible light image to obtain an enhanced visible light image, a sample set is formed from the enhanced visible light image and the original visible light image, the targets contained in the images of the sample set are labeled to obtain a training set containing target labels, the initial image model is trained on this training set to obtain a trained image recognition model, the infrared image of the driving scene collected in real time by the infrared photographing device is acquired, target detection is performed on the driving scene infrared image by the trained image recognition model, and the target detection result of the driving scene infrared image is output. In this way, image enhancement of the original visible light image amplifies the sample images: on the one hand, it eliminates the overfitting problem that arises when a neural network model is trained on small samples; on the other hand, it eliminates the difference between visible light image data and infrared image data, so that a neural network model that performs well in visible light scenes also performs well after being transferred to the recognition of infrared scene images.
Optionally, constructing the initial neural network model and using it as the teacher model includes: constructing an initial neural network model, training the initial neural network model on an open source image data set to obtain a pre-trained neural network model, and using the pre-trained neural network model as the teacher model. Here, using the pre-trained neural network model obtained from the open source image data set as the teacher model exploits the feature extraction ability the model already has before training on the training set; a small initial learning rate then helps the pre-trained neural network model find the local optimum more easily and converge faster, improving training efficiency.
S12, construct the initial image model. The initial image model may be one of the following: an initial neural network model; a pre-trained neural network model obtained by training the initial neural network model on an open source image data set; or a teacher-student architecture model built with the initial neural network model as the teacher model and the initial neural network model with compressed convolution kernels as the student model. In this embodiment, the initial neural network model is a YOLOP model, whose network structure includes a backbone network (Backbone), a neck layer (Neck), and a detection head (Detect head). The backbone network extracts feature information of the targets in the image and feeds the feature maps to the neck structure; the neck structure fuses feature maps from different backbone layers, combining shallow and deep networks to strengthen feature fusion; the detection head generates bounding boxes and predicts target categories, and includes a drivable area segmentation head, a lane line segmentation head, and a target detection head that detects the type and position of target objects such as pedestrians and vehicles. The drivable area segmentation head and the lane line segmentation head each output a mask (Mask) matrix of a size similar to the driving scene image; the target detection head outputs a detection matrix of preset size.

Abstract

The embodiment of the invention provides an infrared-image-based driving assistance method and device, vehicle-mounted equipment and a storage medium. The driving assistance method comprises the steps of: obtaining an original visible light image of a driving scene, carrying out image enhancement on the original visible light image to obtain an enhanced visible light image, and forming a sample set from the enhanced visible light image and the original visible light image; labeling the targets contained in each image in the sample set to obtain a training set containing target labels; training an initial image model based on the training set to obtain a trained image recognition model; and acquiring a driving scene infrared image collected by an infrared shooting device in real time, performing target detection on the driving scene infrared image through the image recognition model, and outputting a target detection result of the driving scene infrared image.


Examples

  • Experimental program(1)

Example Embodiment

[0031] The technical solutions of the present application will be further elaborated below with reference to the accompanying drawings and specific embodiments of the description.
[0032] In order to make the purpose, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. All other embodiments obtained by those of ordinary skill in the art without creative work fall within the scope of protection of the present application.
[0033] In the following description, the expression "some embodiments" describes a subset of all possible embodiments. It should be noted that "some embodiments" may refer to the same subset or to different subsets of all possible embodiments, and these can be combined with each other where there is no conflict.
[0034] In the following description, the terms "first", "second" and "third" are only used to distinguish similar objects and do not denote a specific order. Where permitted, the specific order or sequence may be interchanged so that the embodiments of the application described herein can be practiced in sequences other than those illustrated or described herein.
[0035] Referring to Figure 1, a schematic diagram of an optional application scenario of the infrared-image-based driving assistance method provided by the embodiment of the application, the vehicle-mounted device includes a main control system 13 and an infrared photographing device 11 communicatively connected to the main control system 13. The main control system 13 mainly includes a memory, a processor, and a display and input device connected to the processor, and is loaded with a computer program for implementing the infrared-image-based driving assistance method provided by the embodiments of the present application. Optionally, the vehicle-mounted device further includes a display alarm module 12 communicatively connected to the main control system 13. The vehicle-mounted equipment is installed on the vehicle. When the vehicle is driving on the lane, the infrared photographing device 11 collects infrared radiation around the vehicle in real time to form an infrared image of the driving scene and sends it to the main control system 13. The main control system 13 performs target detection on the driving scene infrared image through the image recognition model, and forms the target detection result by identifying the drivable area, lane lines, and the types and positions of target objects in the infrared image. From this result, suspected collision targets that may collide with the vehicle can be identified; if it is determined that the collision risk between a suspected collision target and the vehicle exceeds a certain value, a control command can be sent to the display alarm module 12 to control it to issue collision warning information, so that the driver of the vehicle can anticipate the risk in advance and take timely measures to eliminate the collision risk or reduce the degree of collision.
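To make the warning flow above concrete, the following is a minimal Python sketch of the threshold-based decision step; the distance-based risk score and the 0.7 threshold are illustrative assumptions, not the actual logic of the main control system 13.

from dataclasses import dataclass

@dataclass
class Detection:
    category: str      # e.g. "person", "car"
    distance_m: float  # estimated longitudinal distance to the ego vehicle

RISK_THRESHOLD = 0.7   # assumed warning threshold

def collision_risk(det: Detection, ego_speed_mps: float) -> float:
    """Toy risk score: time-to-collision mapped into [0, 1]."""
    ttc = det.distance_m / max(ego_speed_mps, 0.1)  # seconds until the target is reached
    return max(0.0, min(1.0, 1.0 - ttc / 5.0))      # risk approaches 1 as TTC approaches 0 s

def maybe_warn(detections, ego_speed_mps):
    warnings = []
    for det in detections:
        if collision_risk(det, ego_speed_mps) > RISK_THRESHOLD:
            warnings.append(f"collision warning: {det.category} at {det.distance_m:.1f} m")
    return warnings

print(maybe_warn([Detection("person", 4.0), Detection("car", 60.0)], ego_speed_mps=10.0))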
[0036] Referring to Figure 2, the infrared-image-based driving assistance method provided by an embodiment of the present application is applied to the vehicle-mounted device shown in Figure 1, and includes the following steps:
[0037] S101: Acquire an original visible light image of a driving scene, perform image enhancement on the original visible light image to obtain an enhanced visible light image, and form a sample set according to the enhanced visible light image and the original visible light image.
[0038] The original visible light image of the driving scene may be visible light picture data captured in real time by a visible light camera and/or visible light video data collected while the vehicle is driving during the daytime. Because visible light images of driving scenes are collected during daytime driving, there are many sources of and ways to obtain the original visible light image. The enhanced visible light image is obtained by performing image enhancement on the original visible light image, and a sample set is formed from the enhanced visible light image and the original visible light image.
[0039] Performing image enhancement on the original visible light image to obtain an enhanced visible light image may be done through a preset image enhancement mode. An image enhancement mode is an image processing method that emphasizes the overall and local characteristics of an image by enhancing its useful information: it makes an originally unclear image clear or emphasizes certain features of interest, enlarges the differences between the features of different objects in the image, suppresses features of no interest, improves image quality, enriches information, strengthens image interpretation and recognition, and meets the needs of special analysis. The enhanced visible light image is obtained by applying the preset image enhancement mode to the original visible light image, and a sample set is formed from the enhanced visible light image and the original visible light image. The data set is thereby expanded, so that after the initial image model is trained on the expanded sample image data set and transferred, it maintains a good effect when detecting targets in infrared images.
[0040] S103: Label the targets included in each image in the sample set to obtain a training set including target labels.
[0041] The targets contained in an image may include the various objects that the image recognition model is expected to detect and analyze, such as lanes, drivable areas, people, vehicles, and other objects within the driving scene. Optionally, the targets may refer only to target objects that may pose a collision risk to the vehicle during driving, such as people, vehicles, and roadblocks on the road within the driving scene. The targets contained in the images of the sample set can be labeled manually or automatically by a preset labeling tool; the training set containing target labels is then obtained from the images and their labels. In this way, the training set is obtained from original visible light images of various driving scenes collected by the image acquisition device on the vehicle during daytime driving, the enhanced visible light images derived from them, and the labels of the various targets in both.
[0042] In an optional specific example, an image acquisition device is installed at the front grille of the vehicle to collect visible light image data of highways, national highways, urban and suburban scenes in different regions, seasons and weather, obtaining tens of thousands of hours of video files. Pictures are extracted from the video files in proportion and strictly cleaned; data without targets and highly similar repeated scenes are eliminated to obtain the original visible light images of the driving scene. A preset image enhancement mode is used to perform image enhancement on the original visible light images to obtain corresponding enhanced visible light images, and the targets in the original and enhanced visible light images are labeled manually. In this embodiment, the target labels include the segmentation line labeling of the drivable area in the driving scene image, the segmentation line labeling of the lane lines in the driving scene image, and the labeling of the type and location of target objects that may pose a collision risk to the vehicle. The target objects include pedestrians and vehicles. Pedestrians are divided into three types, person, cyclist and rider, corresponding respectively to standard pedestrians, bicycle riders and electric motorcycle riders; vehicles are divided into four types, car, bus, truck and vehicle, corresponding respectively to cars, buses, trucks and other types of vehicles.
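The class taxonomy above can be written down as a simple label schema; the category names follow the text, while the dictionary layout and file names below are illustrative assumptions.

TARGET_CLASSES = {
    "pedestrian": ["person", "cyclist", "rider"],     # standard pedestrian, bicycle rider, e-motorcycle rider
    "vehicle":    ["car", "bus", "truck", "vehicle"],  # cars, buses, trucks, other vehicle types
}

# One possible per-image annotation record (hypothetical format):
example_annotation = {
    "image": "scene_000123.jpg",
    "drivable_area_mask": "scene_000123_da.png",  # segmentation mask of the drivable area
    "lane_line_mask": "scene_000123_ll.png",      # segmentation mask of the lane lines
    "objects": [
        {"class": "person", "bbox": [412, 230, 458, 330]},  # [x1, y1, x2, y2] in pixels
        {"class": "car",    "bbox": [120, 260, 380, 420]},
    ],
}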
[0043] S105: Train the initial image model based on the training set to obtain a trained image recognition model.
[0044] The initial image model may use various known convolutional neural network models, deep convolutional neural network models, and the like. In an optional example, referring to Figure 3, the initial image model adopts the YOLOP (You Only Look Once for Panoptic driving perception) convolutional neural network model. The model architecture of YOLOP includes a backbone network (Backbone), a neck layer (Neck), and a detection head (Detect head). The backbone network extracts feature information of the targets in the image and feeds the feature maps to the neck structure; the neck structure fuses feature maps from different backbone layers, combining shallow and deep networks to strengthen feature fusion; the detection head generates bounding boxes and predicts target categories. The detection head serves as the output layer of the image recognition model and is set to include multiple heads corresponding to the detection outputs for different targets. For example, where the target annotations include the segmentation line labeling of the drivable area in the driving scene image, the segmentation line labeling of the lane lines, and the labeling of the types and positions of target objects that may pose a collision risk to the vehicle, the detection head correspondingly includes a drivable area segmentation head that outputs a Mask matrix of a size similar to the input image, a lane line segmentation head that outputs a Mask matrix of a size similar to the input image, and a target detection head that outputs a matrix of preset size.
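As a rough illustration of the backbone-neck-three-heads layout described above, the following PyTorch sketch builds a heavily simplified model; the layer sizes, module names and upsampling choices are assumptions made for illustration and do not reproduce the actual YOLOP implementation.

import torch
import torch.nn as nn

class TinyBackbone(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(3, ch, 3, stride=2, padding=1), nn.BatchNorm2d(ch), nn.LeakyReLU(0.1),
            nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1), nn.BatchNorm2d(ch * 2), nn.LeakyReLU(0.1),
        )
    def forward(self, x):
        return self.layers(x)  # feature map at 1/4 resolution

class DrivingPerceptionNet(nn.Module):
    def __init__(self, num_classes=7, num_anchors=3):
        super().__init__()
        self.backbone = TinyBackbone()
        self.neck = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.LeakyReLU(0.1))
        # Detection head: per anchor, 5 box/objectness values plus class scores.
        self.det_head = nn.Conv2d(64, num_anchors * (5 + num_classes), 1)
        # Two segmentation heads: drivable area and lane lines (1-channel masks).
        self.da_seg_head = nn.Sequential(nn.Conv2d(64, 1, 1), nn.Upsample(scale_factor=4, mode="bilinear"))
        self.ll_seg_head = nn.Sequential(nn.Conv2d(64, 1, 1), nn.Upsample(scale_factor=4, mode="bilinear"))

    def forward(self, x):
        feat = self.neck(self.backbone(x))
        return self.det_head(feat), self.da_seg_head(feat), self.ll_seg_head(feat)

det, da_mask, ll_mask = DrivingPerceptionNet()(torch.randn(1, 3, 384, 640))
print(det.shape, da_mask.shape, ll_mask.shape)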
[0045] S107: Acquire an infrared image of the driving scene collected in real time by an infrared photographing device, perform target detection on the infrared image of the driving scene by using the image recognition model, and output a target detection result of the infrared image of the driving scene.
[0046] The infrared image of the driving scene refers to an infrared image taken of the environment in which the vehicle is currently driving. For example, when the vehicle is driving in a lane, the driving scene infrared image is the infrared image obtained by photographing a certain area around the vehicle. By acquiring infrared images of the driving scene while the vehicle is moving, the road, buildings, other vehicles, and the people or objects close to the vehicle ahead can be captured. The targets in the infrared images collected in real time are then detected and analyzed. According to the target detection results, a possible collision can be predicted during driving and an early warning given before it occurs, driving operations can be assisted and optimized through prompts, or intelligent driving control of the vehicle can be realized. The target detection results include the drivable area segmentation result, the lane line segmentation result, and the recognition result of whether the driving scene infrared image contains target object categories such as people, vehicles or other objects and their positions. The target object category refers to the category of the different target objects contained in the driving scene infrared image that may pose a collision risk to the vehicle; the number and names of the categories can be preset, for example people and vehicles, or, more finely, pedestrians, cyclists and riders of electric vehicles for people, and cars, buses, trucks and other types of vehicles for vehicles.
[0047] The vehicle-mounted device performs target detection on the driving scene infrared image through the image recognition model. Outputting the target detection result may mean outputting the segmentation result of the drivable area in the driving scene infrared image, the segmentation result of the lane lines, and the object detection results for the contained target object categories and their locations. The vehicle-mounted device performs target detection on the driving scene infrared image, determines the imaging area of the drivable area of the vehicle in the image, the imaging area of the lane lines, the categories of target objects such as people and vehicles, and their positions in the driving scene infrared image, and outputs a target detection result that includes the imaging range of the drivable area, the positions of the lane lines, and the categories and positions of the contained target objects in the corresponding driving scene infrared image.
[0048] Optionally, the target detection result further includes the size of the target object. The vehicle-mounted device performs target detection on the driving scene infrared image and, when it determines that the image contains target objects such as people and vehicles, determines the category of each target object, its position in the driving scene infrared image, and its size in the image, and forms and outputs the target detection result from the imaging range of the drivable area, the positions of the lane lines, and the categories, positions and sizes of the target objects in the corresponding driving scene infrared image.
[0049] In the above embodiment, the original visible light image of the driving scene is acquired, image enhancement is performed on it to obtain an enhanced visible light image, a sample set is formed from the enhanced visible light image and the original visible light image, the targets contained in the images of the sample set are labeled to obtain a training set containing target labels, the initial image model is trained on this training set to obtain a trained image recognition model, the driving scene infrared image collected in real time by the infrared camera is acquired, target detection is performed on it by the image recognition model, and the target detection result is output. In this way, by performing image enhancement on the original visible light image to amplify the sample images, on the one hand the overfitting problem caused by training a neural network model on small samples can be eliminated; on the other hand, the difference between visible light image data and infrared image data can be eliminated, so that a neural network model that performs well in visible light scenes also performs well after being transferred to the recognition of infrared scene images.
[0050] Optionally, the initial image model is trained based on the training set to obtain a trained image recognition model, including:
[0051] constructing an initial neural network model, training the initial neural network model based on an open source image data set to obtain a pre-training neural network model, and using the pre-training neural network model as an initial image model;
[0052] The initial image model is trained based on the training set to obtain a trained image recognition model.
[0053] The initial neural network model is trained on the open source image data set to obtain a pre-trained neural network model, the pre-trained neural network model is used as the initial image model, and the initial image model is then trained on the training set to obtain the trained image recognition model. In an optional example, the pre-trained neural network model is obtained by training a YOLOP convolutional neural network model on the ImageNet open source image data set.
[0054] In the above embodiment, the pre-trained neural network model is obtained by training the initial neural network model on the open source image data set, and the pre-trained neural network model is then trained on the training set. Because the pre-trained neural network model already has a certain feature extraction capability, a small initial learning rate can be used for training; a small initial learning rate helps the pre-trained neural network model find the local optimum more easily and converge faster, improving training efficiency and enhancing the robustness of the image recognition model.
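A minimal sketch of this fine-tuning step, reusing the DrivingPerceptionNet sketch above; the checkpoint file name and the learning-rate value are illustrative assumptions.

import torch

model = DrivingPerceptionNet()                                  # model sketched earlier
state = torch.load("pretrained_yolop.pt", map_location="cpu")   # hypothetical checkpoint file
model.load_state_dict(state, strict=False)                      # reuse whatever layers match

# A small initial learning rate keeps the pre-trained feature extractor near its
# already useful local optimum and speeds up convergence on the new training set.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9, weight_decay=5e-4)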
[0055] Optionally, the initial image model is trained based on the training set to obtain a trained image recognition model, including:
[0056] constructing an initial neural network model, and using the initial neural network model as a teacher model;
[0057] Compressing the number of convolution kernels in the teacher model to obtain a student model;
[0058] An initial image model is constructed from the teacher model and the student model and is trained based on the training set. During training, the first model parameters of the teacher model are kept unchanged, and the second model parameters of the student model are iterated with a preset gradient descent algorithm according to the gradient of the consistency loss function between the prediction results of the teacher model and the student model;
[0059] Until the preset iterative conditions are met, the trained image recognition model is obtained.
[0060] The initial neural network model is used as the teacher model, and the initial neural network model with compressed convolution kernels is used as the student model to construct the initial image model of the teacher-student (Teacher-Student) architecture, see Figure 4. The teacher model can use the online data set as its sample set, and the student model uses the training set (offline data set). During training, the model parameters of the teacher model, called the first model parameters, are kept unchanged. Based on the gradient of the consistency loss (Consistency Loss) function between the prediction results of the teacher model and the student model, the model parameters of the student model, called the second model parameters to distinguish them, are iterated with a preset gradient descent algorithm until the preset iteration conditions are met and the trained image recognition model is obtained. The preset gradient descent algorithm may be a known gradient descent algorithm such as SGD (stochastic gradient descent) or Adam (adaptive moment estimation).
[0061] Optionally, constructing the initial neural network model and using it as the teacher model includes: constructing an initial neural network model, training it on an open source image data set to obtain a pre-trained neural network model, and using the pre-trained neural network model as the teacher model. Here, using the pre-trained neural network model obtained from the open source image data set as the teacher model exploits the feature extraction ability the model already has before training on the training set; a small initial learning rate then helps the pre-trained neural network model find the local optimum more easily and converge faster, improving training efficiency.
[0062] In an optional specific example, the YOLOP convolutional neural network model is used as the initial neural network model, and the original YOLOP model is used as the teacher model. The number of convolution kernels in each convolution module using the leaky_relu activation function is compressed to 1/2 of the number in the original model, and the compressed model is used as the student model to build the initial image model of the teacher-student (Teacher-Student) architecture, which can be structured with reference to the Π model.
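A minimal sketch of the kernel compression: the student is built with half the output channels (convolution kernels) per layer via a width multiplier. The width-multiplier mechanism and the block layout below are illustrative assumptions, not the patent's exact construction.

import torch.nn as nn

def scaled_conv_block(in_ch, out_ch, width=1.0):
    """Conv block whose number of kernels is scaled by `width`."""
    out_ch = max(1, int(out_ch * width))
    block = nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.1),
    )
    return block, out_ch

teacher_block, teacher_ch = scaled_conv_block(3, 64, width=1.0)  # teacher keeps 64 kernels
student_block, student_ch = scaled_conv_block(3, 64, width=0.5)  # student keeps 32 kernels
print(teacher_ch, student_ch)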
[0063] In the above embodiment, a Teacher-Student model built on YOLOP is used as the initial image model, and the trained image recognition model is obtained by training this Teacher-Student model on the training set. This improves training efficiency: with no obvious loss of recognition accuracy, both the training efficiency and the recognition efficiency of the trained model are greatly improved, the performance obtained when a neural network model that performs well in visible light scenes is transferred, on the basis of small-sample data, to the recognition of infrared scene images is further improved, and the robustness of the image recognition model is enhanced.
[0064] Optionally, until a preset iterative condition is met, a trained image recognition model is obtained, including:
[0065] A Gaussian ramp-up function is used to set the weighting coefficient of the consistency loss function; the initial value of the weighting coefficient is set to 0, and the trained image recognition model is obtained when t approaches 1; or,
[0066] Until the number of iterations reaches the preset number, the trained image recognition model is obtained; or,
[0067] Until the consistency loss function converges, a trained image recognition model is obtained.
[0068] The weighting coefficient of the consistency loss function (Consistency Loss) between the teacher model and the student model adopts a Gaussian ramp-up function. At the beginning of training, the initial value of the weighting coefficient w(t) is set to 0, which eliminates the influence of the student model while its feature extraction is still poor; w(t) then grows as t approaches 1. In an optional example, the iterative training of YOLOP's Teacher-Student model over the training set can be described with the following notation: x_i = training sample input; L = the set of labeled training samples; y_i = label of the i-th training sample; w(t) = weighting coefficient of the unsupervised ramp-up function; T_μ(x) = teacher model with trainable first model parameters μ; S_θ(x) = student model with trainable second model parameters θ; g(x) = excitation (input augmentation) function; K(x) = loss function of YOLOP; C = number of target object classes detected by the target detection head; z_i = prediction result of the student model; z̃_i = prediction result of the teacher model; i belongs to minibatch B, meaning that a minibatch of training samples is fed to the Teacher-Student model in each iteration cycle. One iteration of training can be expressed as follows:
[0069] for t in [1, num_epochs] do
[0070]   for each minibatch B do
[0071]     z_{i∈B} ← S_θ(g(x_{i∈B}))
[0072]     z̃_{i∈B} ← T_μ(g(x_{i∈B}))
[0073]     loss ← (1/|B|) Σ_{i∈(B∩L)} K(z_i, y_i) + w(t) · (1/(C·|B|)) Σ_{i∈B} ||z_i − z̃_i||²
[0074] During the iterative training process, the first model parameters μ remain unchanged, and the second model parameters θ are updated with a gradient descent strategy such as SGD or Adam.
[0075] As the number of training epochs increases, t keeps approaching 1, so t approaching 1 can be set as the termination condition of the model training iterations. Optionally, reaching a preset number of iterations can be set as the termination condition to improve training efficiency, or convergence of the consistency loss function can be set as the termination condition to achieve a better balance between training accuracy and training efficiency.
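The training loop above can be sketched in PyTorch as follows. The Gaussian ramp-up w(t) = exp(-5·(1 − t)²) is the commonly used form (the exact expression is omitted in the text above), and the tiny linear models and cross-entropy loss are stand-ins for YOLOP and its multi-task loss K; all of these are illustrative assumptions.

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def ramp_up_weight(t):                       # t in [0, 1]
    return math.exp(-5.0 * (1.0 - t) ** 2)   # w(0) is near 0, w(1) = 1

def augment(x):                              # stand-in for the excitation g(x)
    return x + 0.05 * torch.randn_like(x)

teacher = nn.Linear(16, 7)                   # frozen teacher T_mu
student = nn.Linear(16, 7)                   # trainable student S_theta
for p in teacher.parameters():
    p.requires_grad_(False)                  # first model parameters mu stay fixed

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
x, y = torch.randn(8, 16), torch.randint(0, 7, (8,))

num_epochs = 50
for epoch in range(num_epochs):
    w_t = ramp_up_weight(epoch / (num_epochs - 1))
    z = student(augment(x))                  # student prediction z_i
    with torch.no_grad():
        z_tilde = teacher(augment(x))        # teacher prediction z~_i
    loss = F.cross_entropy(z, y) + w_t * F.mse_loss(z, z_tilde)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                         # update theta with Adam (or SGD)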
[0076] Optionally, the target included in each image in the sample set is marked to obtain a training set containing target annotations, including:
[0077] Labeling the categories and positions of the target objects included in each image in the sample set, segmenting and labeling the drivable area, and segmenting and labeling the lane lines to obtain a training set containing the target labels.
[0078] There are three types of target annotations on the images of the sample set: first, the segmentation annotation of the drivable area; second, the segmentation annotation of the lane lines; third, the annotation of the category and location of the target objects. The training set consists of sample images carrying these three types of target annotations, and the initial image model is trained on this training set. In this way, the trained image recognition model can detect and analyze the drivable area, lane lines, and the categories and positions of target objects in the infrared images collected in real time in the vehicle driving scene, and output the corresponding detection results.
[0079] Optionally, the initial image model is trained based on the training set to obtain a trained image recognition model, including:
[0080] Unfreezing the model parameters of the initial image model, and training the initial image model based on the training set until the first iteration termination condition is met, to obtain an intermediate image model;
[0081] Freezing the model parameters of the backbone network layer in the intermediate image model, the first segmentation detection head for detecting the drivable area, and the second segmentation detection head for detecting the lane line;
[0082] The intermediate image model is trained based on the training set until the second iteration termination condition is satisfied, and a trained image recognition model is obtained.
[0083] Here, the training of the initial image model is divided into two rounds of iterative training. In the first round, the model parameters of the initial image model are unfrozen, that is, no model parameters are frozen, and the initial image model is trained until the first iteration termination condition is met, yielding an intermediate image model. In the second round, the model parameters of the backbone network layer, the first segmentation detection head for detecting the drivable area, and the second segmentation detection head for detecting lane lines in the intermediate image model obtained from the first round are frozen, and the intermediate image model is then trained on the training set until the second iteration termination condition is met, finally yielding the trained image recognition model. The first and second iteration termination conditions may refer to different numbers of training iterations; in an optional example, the first iteration termination condition is training for 150 epochs and the second is training for 50 epochs.
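A minimal sketch of the second-round freezing step, reusing the DrivingPerceptionNet sketch above; the module names backbone, da_seg_head and ll_seg_head are assumptions about how the model is organized, not the actual YOLOP attribute names.

import torch

model = DrivingPerceptionNet()                       # stands in for the intermediate model from the first round
FROZEN_PREFIXES = ("backbone", "da_seg_head", "ll_seg_head")

for name, param in model.named_parameters():
    if name.startswith(FROZEN_PREFIXES):
        param.requires_grad_(False)                  # frozen for the second training round

# Only the unfrozen parameters (here, the detection head) go to the optimizer.
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4, momentum=0.9
)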
[0084] In the above embodiment, the training of the initial image model is divided into two rounds of iterative training: no model parameters are frozen in the first round, and in the second round, on the basis of the intermediate image model obtained in the first round, an ablation training strategy is applied that freezes the model parameters of the backbone network layer and of the two semantic segmentation detection heads used for drivable area and lane line detection. This helps the model training process find the local optimum more quickly, improving training accuracy and training speed. It should be noted that this ablation training strategy can be applied to any type of initial image model in the different embodiments of the present application, see Figure 5: the initial neural network model in the scheme that trains the initial neural network model directly (5a); the pre-trained neural network model in the scheme that trains on the basis of the pre-trained neural network model obtained from an open source image data set (5b); and the teacher-student architecture model in the scheme that builds a teacher model and a student model for training (5c).
[0085] Optionally, performing image enhancement on the original visible light image to obtain an enhanced visible light image, comprising:
[0086] An enhanced visible light image is obtained by performing image enhancement on the original visible light image using an image enhancement mode based on chromaticity space change.
[0087] A preset image enhancement mode is applied to the original visible light image, for example an image enhancement mode based on chromaticity space change, so that after the resulting enhanced visible light images are used to expand the training set, the image recognition model obtained by training the initial image model on this training set maintains a recognition effect on infrared target detection as good as on visible light images.
[0088] Optionally, performing image enhancement on the original visible light image to obtain an enhanced visible light image, comprising:
[0089] Image enhancement is performed on the original visible light image using at least one of the following image enhancement modes to obtain an enhanced visible light image:
[0090] a color jitter enhancement mode that randomly changes at least one of the brightness, saturation and contrast of the original visible light image;
[0091] an inversion enhancement mode that inverts each pixel value of the original visible light image;
[0092] a randomization enhancement mode in which pixels whose values in the original visible light image exceed a set threshold are randomly inverted;
[0093] a color level separation enhancement mode in which a preset number of bits is randomly reduced for each color channel in the original visible light image;
[0094] A Gaussian blur enhancement mode that performs Gaussian convolution on the original visible light image.
[0095] The image enhancement mode for enhancing the original visible light image may specifically be one or more of: the color jitter enhancement mode (ColorJitter), the inversion enhancement mode (Invert), the randomization enhancement mode (RandomSolarize), the color level separation enhancement mode (Posterize), and the Gaussian blur enhancement mode (GaussianBlur). The color jitter enhancement mode obtains the corresponding enhanced visible light image by randomly changing the brightness, saturation and contrast of the original visible light image. The inversion enhancement mode obtains the corresponding enhanced visible light image by inverting the value of every pixel of the original visible light image. The randomization enhancement mode obtains the corresponding enhanced visible light image by randomly inverting the pixels whose values in the original visible light image exceed a set threshold. The color level separation enhancement mode obtains the corresponding enhanced visible light image by randomly reducing a preset number of bits for each color channel in the original visible light image. The Gaussian blur enhancement mode obtains the corresponding enhanced visible light image by performing Gaussian convolution on the original visible light image. By performing transfer learning on the initial image model with the mixed data set formed by the enhanced visible light images and the original visible light images, using a smaller learning rate to train the model on the enhanced data set, target detection on infrared images maintains a recognition effect as good as on visible light images.
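The five enhancement modes have close equivalents in torchvision; the sketch below applies each one offline to produce the original image plus five enhanced copies. The parameter values (jitter strength, solarize threshold, posterize bits, blur kernel) are illustrative assumptions.

from torchvision import transforms

enhancement_modes = [
    transforms.ColorJitter(brightness=0.4, saturation=0.4, contrast=0.4),  # color jitter
    transforms.RandomInvert(p=1.0),                                        # inversion
    transforms.RandomSolarize(threshold=128, p=1.0),                       # randomization (solarize)
    transforms.RandomPosterize(bits=4, p=1.0),                             # color level separation
    transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),              # Gaussian blur
]

def augment_offline(pil_image):
    """Return the original plus one enhanced copy per mode (6x expansion)."""
    return [pil_image] + [mode(pil_image) for mode in enhancement_modes]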
[0096] In the above embodiment, the color jitter, inversion, randomization, color level separation and Gaussian blur enhancement modes are used to perform offline enhancement on the original visible light images. The enhanced visible light images expand the original visible light data set used for training the model, and the expanded image data set is six times the size of the original visible light data set. This eliminates the small-sample overfitting problem that arises when the data sets differ greatly and, at the same time, eliminates the difference with the infrared data set, so that an image recognition model that performs well in visible light scenes also performs well after being transferred to the infrared image data set.
[0097] Taking as an example an initial image model that is a YOLOP pre-trained model obtained by training the initial YOLOP model on an open source image data set, the image recognition model is obtained after training this initial image model on the training set. Figure 6 is a comparison of the effect of target detection on driving scene infrared images collected in real time, using the YOLOP pre-trained model and the image recognition model trained in this application respectively: target objects such as pedestrians and bicycles can be accurately recognized, recognition of lane lines and drivable areas is added, and the recognition effect is also improved. Figure 7 is another such comparison: through transfer learning, pedestrians, traffic lights and other objects are included in the scope of target detection, so that pedestrians and traffic lights that were previously not perceived can be identified, improving the application value of the YOLOP model in the field of vehicle-assisted driving. Figure 8 is a further comparison: through transfer learning, riders, traffic lights and other objects are included in the scope of target detection, so that previously unperceived riders and traffic lights can be identified, improving the application value of the YOLOP model in the field of vehicle-mounted assisted driving.
[0098] For a more complete understanding of the infrared-image-based driving assistance method provided by the embodiments of the present application, please refer to Figure 9; the implementation process of the driving assistance method is described with an optional specific example. The vehicle-mounted device includes an infrared photographing device and a main control system, and the main control system includes a memory and a processor.
[0099] S11, obtain the original visible light image of the driving scene, perform offline augmentation on the original visible light image using a preset image enhancement mode to obtain enhanced visible light images, and expand the training sample set with the enhanced visible light images; the image enhancement modes include ColorJitter, Invert, RandomSolarize, Posterize, GaussianBlur, and so on.
[0100] S12, construct the initial image model. The initial image model may be one of the following: an initial neural network model; a pre-trained neural network model obtained by training the initial neural network model on an open source image data set; or a teacher-student architecture model built with the initial neural network model as the teacher model and the initial neural network model with compressed convolution kernels as the student model. In this embodiment, the initial neural network model is a YOLOP model, whose network structure includes a backbone network (Backbone), a neck layer (Neck), and a detection head (Detect head). The backbone network extracts feature information of the targets in the image and feeds the feature maps to the neck structure; the neck structure fuses feature maps from different backbone layers, combining shallow and deep networks to strengthen feature fusion; the detection head generates bounding boxes and predicts target categories, and includes a drivable area segmentation head that detects the drivable area in the driving scene image, a lane line segmentation head that detects lane lines, and a target detection head that detects the type and position of target objects such as pedestrians and vehicles. The drivable area segmentation head and the lane line segmentation head each output a mask matrix of a size similar to the driving scene image; the target detection head outputs a matrix of size 1 x K x (5 + nc), and its network structure adopts a path aggregation network (PAN) structure composed of multiple feature pyramid networks (FPN), which extracts feature information of target objects at different scales. For example, two FPNs form a PAN structure that extracts the feature map information corresponding to large, medium and small scale target objects respectively. The K dimension of the matrix output by the target detection head contains the feature map information of all the large, medium and small scales and is positively related to the size of the input image to be recognized; nc (number of classes) refers to the number of classification task categories of the target detection head, that is, the number of categories of target objects that the target detection head is used to detect.
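To illustrate how the 1 x K x (5 + nc) detection output can be read, the sketch below assumes the usual YOLO-style row layout of 4 box values, one objectness score and nc class scores; the row layout, the confidence threshold and the value of K are assumptions for illustration.

import torch

nc = 7                            # e.g. person, cyclist, rider, car, bus, truck, vehicle
K = 25200                         # depends on input size and anchor layout (assumed value)
pred = torch.rand(1, K, 5 + nc)   # stand-in for a real detection-head output

boxes      = pred[0, :, 0:4]      # (x, y, w, h)
objectness = pred[0, :, 4]        # confidence that a target is present
class_prob = pred[0, :, 5:]       # per-class scores

keep = objectness > 0.5           # simple confidence filter (NMS omitted)
classes = class_prob[keep].argmax(dim=1)
print(boxes[keep].shape, classes.shape)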
[0101] S13, train the initial image model with the training sample set, that is, with the mixed data set expanded by image enhancement, and obtain the trained image recognition model through transfer learning of the initial image model. The transfer learning strategy of training the initial image model with the enhanced visible light images obtained after image enhancement transfers an initial image model that performs well in target detection on visible light scene images to the application of target detection and recognition on infrared scene images.
[0102] S14, collecting the infrared image of the driving scene in real time through the infrared photographing device and sending it to the main control system;
[0103] S15, perform target detection on the driving scene infrared image using the image recognition model, and output the target detection result of the driving scene infrared image; the target detection result includes the detection and recognition of the drivable area, the lane lines, and the types and positions of target objects such as pedestrians and vehicles.
[0104] In the above embodiment, the driving assistance method has at least the following characteristics:
[0105] First, expanding the image data set with enhanced visible light images obtained by image enhancement eliminates the small-sample overfitting problem and, at the same time, eliminates the difference between the visible light image data set and the infrared image data set to the greatest extent, so that an image recognition model that performs well in visible light scenes also performs well after being transferred to the infrared image data set;
[0106] Second, transfer learning training expands the range of target detection. Pedestrians, riders, traffic lights and other objects can be included in the scope of target detection as target objects, and drivable area segmentation and lane line recognition can also be included. The trained image recognition model can therefore identify pedestrians, riders and traffic lights that were previously not perceived, and identify the drivable areas and lane lines in the image, improving the application value of the initial image model, such as the YOLOP model, in the field of vehicle-assisted driving;
[0107] In an example, taking the YOLOP model as the initial neural network model, the target detection performance of the image recognition models obtained after training on driving scene infrared images is compared in the following table:
[0108]
[0109] Here, Precision is the precision of the model; Recall is the recall rate of the model; F1 is the harmonic mean of Precision and Recall, that is, F1 = 2 / (1/Precision + 1/Recall) = 2·Precision·Recall / (Precision + Recall); mAP (mean Average Precision) is the mean of the AP values of all categories and is the evaluation index of target detection, where AP (Average Precision) is the average precision, usually expressed as the area under the Precision-Recall curve; mIOU (mean Intersection over Union) is the mean intersection over union, that is, the average of the IOU of each category; Accuracy is the pixel accuracy, that is, the ratio of correctly labeled pixels to the total number of pixels. In the table above, the expanded hybrid data set refers to the training set obtained by enhancing the original visible light images with the preset image enhancement modes and expanding the original sample set with the enhanced visible light images. Number 1 refers to an image recognition model obtained by training a neural network model with random initial model parameter values on the expanded hybrid data set (17178 training images); number 2 refers to an image recognition model obtained by training a neural network model with random initial model parameter values on an open source data set of the specified size (BDD100K: A Diverse Driving Video Database with Scalable Annotation Tooling, a public driving data set); number 3 refers to an image recognition model obtained by training a pre-trained neural network model on the expanded hybrid data set (17178 training images); number 4 refers to an image recognition model obtained by training a pre-trained neural network model on an open source data set of the specified size (BDD100K).
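For quick reference, the two formulas used above can be checked with a few lines of Python; the numeric values are purely illustrative.

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def iou(area_intersection, area_union):
    """Intersection over union for a single category or region."""
    return area_intersection / area_union

print(round(f1_score(0.9, 0.8), 3))  # 0.847
print(round(iou(60.0, 100.0), 2))    # 0.6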
[0110] In an example, the target detection result obtained by performing target detection on the driving scene infrared image with the image recognition model may be used in the following ways: a possible collision risk is predicted according to the relative position of a target object and the vehicle, and the display alarm module issues an early warning of the collision risk to assist driving operations and improve driving safety; intelligent driving control instructions are generated according to the detection results of the drivable area, lane lines and target objects to realize intelligent driving assistance; or emergency situations during driving are predicted according to the detection results of the drivable area, lane lines and target objects, and control instructions for eliminating the emergency are generated when necessary to assist driving operations and improve driving safety.
[0111] Third, for the initial image model, training the initial neural network model on an open source image data set to obtain a pre-trained neural network model, or constructing a teacher-student architecture model for training, can effectively improve training efficiency while ensuring the recognition accuracy of the image recognition model.
[0112] Referring to Figure 10, on the other hand, an embodiment of the present application provides a driving assistance device, which includes: a sample acquisition module 21 for acquiring an original visible light image of a driving scene, performing image enhancement on the original visible light image to obtain an enhanced visible light image, and forming a sample set from the enhanced visible light image and the original visible light image; a labeling module 22 for labeling the targets contained in each image in the sample set to obtain a training set containing target labels; a training module 23 for training the initial image model based on the training set to obtain a trained image recognition model; and a target recognition module 24 for acquiring the driving scene infrared image collected in real time by the infrared photographing device, performing target detection on the driving scene infrared image through the image recognition model, and outputting the target detection result of the driving scene infrared image.
[0113] Optionally, the training module 23 is specifically configured to construct an initial neural network model, train the initial neural network model on an open source image data set to obtain a pre-trained neural network model, use the pre-trained neural network model as the initial image model, and train the initial image model based on the training set to obtain a trained image recognition model.
[0114] Optionally, the training module 23 is further configured to construct an initial neural network model and use it as a teacher model; compress the number of convolution kernels in the teacher model to obtain a student model; construct an initial image model from the teacher model and the student model; train the initial image model based on the training set, keeping the first model parameters of the teacher model unchanged during training and iterating the second model parameters of the student model with a preset gradient descent algorithm according to the gradient of the consistency loss function between the prediction results of the teacher model and the student model; and obtain the trained image recognition model when the preset iteration conditions are met.
[0115] Optionally, the training module 23 is further configured to use a Gaussian ramp-up function to set the weighting coefficient of the consistency loss function, with the initial value of the weighting coefficient set to 0, and obtain the trained image recognition model when t approaches 1; or obtain the trained image recognition model when the number of iterations reaches a preset number; or obtain the trained image recognition model when the consistency loss function converges.
[0116] Optionally, the labeling module 22 is specifically configured to label the category and position of the target objects contained in each image in the sample set, segment and label the drivable area, and segment and label the lane lines, so as to obtain the training set containing the target labels.
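Purely as an illustration of what a single labeled sample might carry after this labeling step, the structure below combines object boxes with the two segmentation labels; all field names and file paths are hypothetical.

```python
# Hypothetical per-image annotation produced by the labeling module:
annotation = {
    "objects": [                          # category and position of each target object
        {"category": "vehicle",    "bbox": [412, 230, 640, 388]},
        {"category": "pedestrian", "bbox": [120, 250, 160, 360]},
    ],
    "drivable_area_mask": "masks/000123_drivable.png",  # drivable-area segmentation label
    "lane_line_mask":     "masks/000123_lanes.png",     # lane-line segmentation label
}
```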
[0117] Optionally, the training module 23 is further configured to unfreeze the model parameters of the initial image model and train the initial image model based on the training set until a first iteration termination condition is satisfied, obtaining an intermediate image model; then freeze the model parameters of the backbone network layer in the intermediate image model and train the model parameters of the first segmentation detection head for detecting the drivable area and the second segmentation detection head for detecting the lane lines; and train the intermediate image model based on the training set until a second iteration termination condition is satisfied, obtaining the trained image recognition model.
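A sketch of this two-round strategy, assuming a PyTorch model that exposes a shared backbone and two segmentation heads under the attribute names used below (the names and the toy layers are assumptions): round one trains all parameters, round two freezes the backbone and updates only the two heads.

```python
import torch.nn as nn

class MultiTaskModel(nn.Module):
    """Hypothetical multi-task detector with a shared backbone and two heads."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        self.drivable_head = nn.Conv2d(32, 1, 1)   # drivable-area segmentation head
        self.lane_head = nn.Conv2d(32, 1, 1)       # lane-line segmentation head

    def forward(self, x):
        feats = self.backbone(x)
        return self.drivable_head(feats), self.lane_head(feats)

model = MultiTaskModel()

# Round 1: all parameters unfrozen; train until the first termination condition
# is met to obtain the intermediate image model.
for p in model.parameters():
    p.requires_grad = True

# Round 2: freeze the backbone and keep training only the two segmentation
# heads until the second termination condition is met.
for p in model.backbone.parameters():
    p.requires_grad = False
trainable = [p for p in model.parameters() if p.requires_grad]
# optimizer = torch.optim.SGD(trainable, lr=1e-4)  # only head parameters update
```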
[0118] Optionally, the sample acquisition module 21 is specifically configured to perform image enhancement on the original visible light image using an image enhancement mode based on chromaticity space change to obtain an enhanced visible light image.
[0119] Optionally, the sample acquisition module 21 is further configured to perform image enhancement on the original visible light image using at least one of the following image enhancement modes to obtain the enhanced visible light image: a color dither enhancement mode that randomly changes at least one of the brightness, saturation and contrast of the original visible light image; an inversion enhancement mode in which every pixel value of the original visible light image is inverted; a randomized inversion enhancement mode in which pixels whose values exceed a set threshold are randomly inverted; a color scale separation enhancement mode in which a preset number of bits is randomly reduced for each color channel of the original visible light image; and a Gaussian blur enhancement mode in which Gaussian convolution is performed on the original visible light image.
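One plausible mapping of the enhancement modes listed above onto off-the-shelf transforms, assuming torchvision is available: color dither → ColorJitter, inversion → RandomInvert, thresholded random inversion → RandomSolarize, color-scale separation → RandomPosterize, Gaussian convolution → GaussianBlur. The parameter values are illustrative, not taken from the original disclosure.

```python
from torchvision import transforms

# Each transform corresponds to one enhancement mode listed above;
# the probabilities and parameters are illustrative placeholders.
enhance = transforms.Compose([
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),  # color dither
    transforms.RandomInvert(p=0.3),                        # invert every pixel value
    transforms.RandomSolarize(threshold=192, p=0.3),       # invert pixels above a threshold
    transforms.RandomPosterize(bits=4, p=0.3),             # reduce bits per color channel
    transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),  # Gaussian convolution
])

# Usage (assumption): enhanced = enhance(original_pil_image); the enhanced copy
# is then added to the sample set alongside the original image.
```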
[0120] It should be noted that the driving assistance device provided in the above embodiment is only illustrated with the above division of program modules when realizing driving assistance during the driving of the vehicle; in practical applications, the processing may be distributed to different program modules as needed, that is, the internal structure of the device may be divided into different program modules to complete all or part of the method steps described above. In addition, the driving assistance device provided in the above embodiments and the embodiments of the driving assistance method belong to the same concept; the specific implementation process is detailed in the method embodiments and is not repeated here.
[0121] Referring to FIG. 11, in another aspect, an embodiment of the present application further provides an in-vehicle device, comprising a processor 21, an infrared photographing device 11 connected to the processor 21, a memory 22, and a computer program that is stored in the memory 22 and executable by the processor 21; when the computer program is executed by the processor 21, the driving assistance method described in any embodiment of the present application is implemented. The infrared photographing device 11 may be an infrared camera that collects image data of the driving scene in real time while the vehicle is driving. There may be one or more processors 21. The infrared camera is communicatively connected to the processor 21; the vehicle's 12V power supply system supplies power to the processor 21, and the processor 21 supplies power to the infrared camera. The infrared camera is installed at the front grille of the vehicle to detect the infrared radiation of the road ahead and form the infrared image of the driving scene. The processor 21 is installed in the cab of the vehicle, receives the infrared image data collected by the infrared camera, and executes the driving assistance method described in the embodiments of the present application.
[0122] Optionally, the in-vehicle device further includes a display alarm module connected to the processor 21. The display alarm module can present a simulated video carrying the target detection results to the user, and give a sound, light and/or picture warning of a possible collision danger.
[0123] Embodiments of the present application further provide a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, each process of the above embodiments of the driving assistance method can be implemented with the same technical effects; to avoid repetition, they are not repeated here. The computer-readable storage medium is, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
[0124] It should be noted that, herein, the terms "comprising", "including" or any other variation thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not expressly listed or inherent to such a process, method, article or device. Without further limitation, an element qualified by the phrase "comprising a ..." does not preclude the presence of additional identical elements in the process, method, article or device that includes the element.
[0125] From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, but in many cases the former is the preferred implementation. Based on this understanding, the technical solutions of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product; the computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or a CD-ROM) and includes several instructions that cause a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the various embodiments of the present invention.
[0126] The above are only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any changes or substitutions that can be readily conceived by a person skilled in the art within the technical scope disclosed in the present application shall be covered within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.