Image processing method and device, electronic equipment, vehicle, storage medium and product
By processing the raw image data from the vehicle camera using an end-to-end full-color night vision model, the problem of poor imaging quality in low-light conditions at night for vehicles has been solved, achieving high-quality color imaging and improved cost-effectiveness.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- BYD CO LTD
- Filing Date
- 2024-12-17
- Publication Date
- 2026-06-19
AI Technical Summary
Existing vehicles have limited visual perception capabilities in low-light conditions at night, resulting in reduced driving safety. Current night vision technologies have poor imaging quality and high costs, making it difficult to achieve high-quality color imaging.
An end-to-end full-color night vision model is adopted, which processes the raw image data of the vehicle camera through an image encoding layer, a feature extraction layer, and a feature fusion decoding layer to output high-quality color images. Nighttime color imaging is achieved using ordinary vehicle cameras and low-computing-power vehicle computing platforms.
Generating high-quality color images in low-light conditions at night improves the imaging quality of vehicles driving at night, reduces the overall vehicle cost, and achieves highly adaptable full-color night vision.
Description
Technical Field
[0001] This application relates to the field of image processing technology, and more particularly to an image processing method, apparatus, electronic device, vehicle, storage medium, and product.
Background Technology
[0002] Currently, vehicles generally perceive their surroundings through vision and issue driving commands accordingly. However, in low-light conditions such as at night, a vehicle's visual perception is greatly limited, which can lead to loss of vehicle control and seriously threatens nighttime driving safety.
[0003] Infrared thermal imaging technology can be used to improve a vehicle's perception capabilities in low-light environments at night, and it is widely used in night vision scenarios.
[0004] However, thermal imaging technology has poor image quality; for example, it cannot perform color imaging in low-light environments such as at night.
Summary of the Invention
[0005] This application provides an image processing method, apparatus, electronic device, vehicle, storage medium, and program product to improve the imaging quality of vehicles in low-light environments, thereby at least partially solving the aforementioned technical problems.
[0006] To achieve the above objectives, according to a first aspect of this application, an image processing method is provided, comprising:
[0007] In response to the input of raw image data acquired in the low-light environment of the vehicle, an environmental color image is output based on the image features of the raw image data through an image processing model.
[0008] Optionally, the image processing model includes: an image encoding layer, a feature extraction layer, and a feature fusion decoding layer.
[0009] Optionally, the step of outputting an environmental color image based on the image features of the original image data using an image processing model includes:
[0010] The original image data is encoded through the image coding layer to obtain a multidimensional feature matrix;
[0011] Image features are obtained by extracting features from the multidimensional feature matrix through the feature extraction layer.
[0012] The image features are fused through the feature fusion decoding layer to obtain the environmental color image.
[0013] Optionally, encoding the original image data through the image coding layer to obtain a multidimensional feature matrix includes:
[0014] The image coding layer extracts pixel data from the original image data to obtain an initial multidimensional feature matrix; the pixel data is the data captured under the color filters of the vehicle camera that collects the original image data.
[0015] Based on the bit depth of the analog-to-digital conversion of the camera, the initial multidimensional feature matrix is normalized to obtain the multidimensional feature matrix.
[0016] Optionally, the feature extraction layer includes a detail feature extraction layer, a color feature extraction layer, and a contour feature extraction layer;
[0017] The image features include at least one of the following: detail image features, color image features, and contour image features.
[0018] Optionally, the step of extracting features from the multidimensional feature matrix through the feature extraction layer to obtain image features includes:
[0019] High-dimensional features are obtained by extracting features from the multidimensional feature matrix through at least one convolutional module in the detail feature extraction layer.
[0020] The high-dimensional features are extracted using at least one residual module in the detail feature extraction layer to obtain detail image features.
[0021] Optionally, the step of extracting features from the multidimensional feature matrix through the feature extraction layer to obtain image features includes:
[0022] The multidimensional feature matrix is downsampled by the downsampling module in the color feature extraction layer to adjust the size of the multidimensional feature matrix;
[0023] The color image features are obtained by extracting features from the downsampled multidimensional feature matrix through the convolution and residual modules in the color feature extraction layer.
[0024] Optionally, the step of extracting features from the downsampled multidimensional feature matrix using the convolution module and residual module in the color feature extraction layer to obtain color image features includes:
[0025] The convolutional and residual modules in the color feature extraction layer are used to extract features from the downsampled multidimensional feature matrix to obtain the initial color image features.
[0026] The initial color image features are extracted using the channel attention module in the color feature extraction layer to obtain the color image features.
[0027] Optionally, the step of extracting features from the multidimensional feature matrix through the feature extraction layer to obtain image features includes:
[0028] The target multidimensional feature matrix is downsampled by the downsampling module in the contour feature extraction layer, wherein the target multidimensional feature matrix is obtained by downsampling the multidimensional feature matrix by the downsampling module in the color feature extraction layer.
[0029] By using the residual module, channel attention module, and spatial attention module in the contour feature extraction layer, low-frequency information in the downsampled target multidimensional feature matrix is extracted to obtain contour image features.
[0030] Optionally, fusing the image features through the feature fusion decoding layer to obtain an environmental color image includes:
[0031] The feature fusion decoding layer upsamples the contour image features output by the feature extraction layer, and then fuses the color image features output by the feature extraction layer with the upsampled contour image features to obtain the first fused feature.
[0032] The first fused feature is extracted by the residual module and channel attention module of the feature fusion decoding layer, and the detailed image features output by the feature extraction layer are fused with the first fused feature to obtain an environmental color image.
[0033] Optionally, fusing the detailed image features output by the feature extraction layer with the first fusion feature after feature extraction to obtain an environmental color image includes:
[0034] The detailed image features output by the feature extraction layer are fused with the first fusion feature after feature extraction to obtain the second fusion feature;
[0035] The second fused feature is processed by the feature fusion decoding layer to obtain an environmental color image.
[0036] Optionally, the step of performing feature processing on the second fused feature through the feature fusion decoding layer to obtain an environmental color image includes:
[0037] The second fused feature is extracted using the residual module in the feature fusion decoding layer;
[0038] The spatial transformation module in the feature fusion decoding layer performs spatial transformation on the second fused feature extracted from the feature;
[0039] The second fused feature after spatial transformation is binary-classified using the activation function of the feature fusion decoding layer to obtain an environmental color image.
[0040] Optionally, the method further includes:
[0041] Based on the loss function, optimize at least one of the detail feature extraction layer, color feature extraction layer, and contour feature extraction layer of the image processing model.
[0042] Optionally, the loss function includes at least one of the root mean square error function, structural similarity error function, and context loss function.
[0043] Optionally, optimizing at least one of the detail feature extraction layer, color feature extraction layer, and contour feature extraction layer of the image processing model according to the loss function includes:
[0044] The contour feature extraction layer is optimized based on the root mean square error function and the structural similarity error function.
[0045] The color feature extraction layer and the contour feature extraction layer are optimized based on the structural similarity error function and the context loss function; and / or,
[0046] The detail feature extraction layer, the color feature extraction layer, and the contour feature extraction layer are optimized based on the root mean square error function, the structural similarity error function, and the context loss function.
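As a rough illustration of how two of the named loss terms could be combined, the sketch below computes a root mean square error and a structural similarity error in NumPy. It makes several assumptions not stated in the application: the SSIM is computed once over the whole image rather than over local windows, the constants 0.01 and 0.03 are the conventional SSIM defaults, the weights `w_rmse` and `w_ssim` are hypothetical, and the context loss is omitted because its definition is not given here.

```python
import numpy as np

def rmse(pred: np.ndarray, target: np.ndarray) -> float:
    """Root mean square error between two images of equal shape."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))

def global_ssim(pred: np.ndarray, target: np.ndarray, data_range: float = 1.0) -> float:
    """Single-window SSIM over the whole image (a simplification of windowed SSIM)."""
    c1 = (0.01 * data_range) ** 2  # conventional stabilizing constants
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = pred.mean(), target.mean()
    var_x, var_y = pred.var(), target.var()
    cov = ((pred - mu_x) * (target - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(num / den)

def combined_loss(pred: np.ndarray, target: np.ndarray,
                  w_rmse: float = 1.0, w_ssim: float = 1.0) -> float:
    """Weighted sum of RMSE and the structural similarity error (1 - SSIM)."""
    return w_rmse * rmse(pred, target) + w_ssim * (1.0 - global_ssim(pred, target))
```

Under this sketch, staged optimization such as that in [0044] to [0046] would correspond to choosing which terms are active and which layers receive gradient updates in each training stage.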
[0047] Optionally, after outputting an environmental color image based on the image features of the original image data using an image processing model, the process includes:
[0048] Determine whether the average brightness of the ambient color image is less than a preset brightness threshold;
[0049] If the average brightness of the ambient color image is less than the preset brightness threshold, then the target ambient color image is obtained based on the average brightness of the ambient color image and the target step size.
[0050] Optionally, after determining whether the average brightness of the ambient color image is less than a preset brightness threshold, the method further includes:
[0051] If the average brightness of the ambient color image is greater than or equal to the preset brightness threshold, then the ambient color image is set as the target ambient color image.
[0052] Optionally, obtaining the target environment color image based on the average brightness of the environment color image and the target step size includes:
[0053] Based on the target step size and the average brightness of the ambient color image, adjust the exposure time of the vehicle's camera and / or adjust the gain of the camera;
[0054] The target environment color image is obtained until the average brightness of the image reaches the preset brightness threshold.
[0055] Optionally, the target step size is determined based on the average brightness of the ambient color image and the target brightness.
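The brightness check and adjustment loop described in [0048] to [0055] can be sketched as follows. This is only an illustration under stated assumptions: `capture` is a hypothetical callable that stands in for "capture raw data at a given exposure and run the image processing model", the proportional step rule `(target - avg) / target` is one possible way to derive a step size from the average and target brightness, and only exposure time (not gain) is adjusted.

```python
import numpy as np
from typing import Callable, Tuple

def mean_brightness(rgb: np.ndarray) -> float:
    """Average brightness of an image with values in [0, 1]."""
    return float(rgb.mean())

def adjust_until_bright(capture: Callable[[float], np.ndarray],
                        exposure_ms: float,
                        threshold: float = 0.4,
                        target: float = 0.5,
                        max_exposure_ms: float = 33.0,
                        max_iters: int = 20) -> Tuple[np.ndarray, float]:
    """Raise exposure step by step until the output image reaches the threshold."""
    image = capture(exposure_ms)
    for _ in range(max_iters):
        avg = mean_brightness(image)
        # Stop when bright enough, or when exposure cannot be raised further.
        if avg >= threshold or exposure_ms >= max_exposure_ms:
            break
        step = (target - avg) / target  # hypothetical proportional step size
        exposure_ms = min(exposure_ms * (1.0 + step), max_exposure_ms)
        image = capture(exposure_ms)
    return image, exposure_ms
```

If the first image already meets the threshold, the function returns it unchanged, which matches the branch in [0051].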
[0056] Optionally, the step of outputting an environmental color image based on the image features of the original image data using an image processing model includes:
[0057] An environmental color image is output based on the image features of the original image data through the image processing model in the vehicle's onboard computing platform.
[0058] Optionally, the low-light environment includes a nighttime environment, where the ambient light intensity of the low-light environment is less than a preset light intensity threshold.
[0059] According to a second aspect of this application, an image processing apparatus is provided, comprising:
[0060] The processing module is used to respond to the input of raw image data collected in the low-light environment of the vehicle, and output an environmental color image based on the image features of the raw image data through an image processing model.
[0061] According to a third aspect, this application also provides an electronic device, which includes a processor and a memory, wherein the memory stores a computer program that, when executed by the processor, causes the processor to perform the steps of the above-described method.
[0062] According to a fourth aspect, this application also provides a vehicle that includes the aforementioned electronic device.
[0063] According to a fifth aspect, this application also provides a computer-readable storage medium including a computer program, which, when run on an electronic device, causes the electronic device to perform the steps of the above-described method.
[0064] According to a sixth aspect, this application also provides a computer program product, including a computer program stored in a computer-readable storage medium; when a processor of an electronic device reads the computer program from the computer-readable storage medium, the processor executes the computer program, causing the electronic device to perform the steps of the above method.
[0065] In summary, through the above technical solutions, this application embodiment can directly utilize the on-board image processing model to process the image features of the original image data collected by the vehicle at night, and output an ambient color image in a low-light environment. Compared with the prior art of using infrared for nighttime imaging, this application can use the image processing model to generate color images in a low-light environment. Even in a nighttime driving environment, it can normally generate high-quality ambient color images, thus improving the imaging quality of nighttime color images.
[0066] Other features and advantages of this application will be described in detail in the following detailed description section.
Brief Description of the Drawings
[0067] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0068] To gain a more complete understanding of this application and its beneficial effects, the following description will be provided in conjunction with the accompanying drawings, wherein the same reference numerals in the following description denote the same parts.
[0069] Figure 1-1 is a first schematic diagram of the background technology provided in this application;
[0070] Figure 1-2 is a second schematic diagram of the background technology provided in this application;
[0071] Figure 1-3 is a third schematic diagram of the background technology provided in this application;
[0072] Figure 2 is a schematic diagram of an image processing system provided in an exemplary embodiment of this application;
[0073] Figure 3 is a first schematic diagram of the image processing flow provided in an exemplary embodiment of this application;
[0074] Figure 4 is a schematic diagram of the image processing model structure provided in an exemplary embodiment of this application;
[0075] Figure 5 is a schematic diagram of the detail feature extraction layer structure provided in an exemplary embodiment of this application;
[0076] Figure 6-1 is a first schematic diagram of the color feature extraction layer structure provided in an exemplary embodiment of this application;
[0077] Figure 6-2 is a second schematic diagram of the color feature extraction layer structure provided in an exemplary embodiment of this application;
[0078] Figure 7-1 is a first schematic diagram of the contour feature extraction layer structure provided in an exemplary embodiment of this application;
[0079] Figure 7-2 is a second schematic diagram of the contour feature extraction layer structure provided in an exemplary embodiment of this application;
[0080] Figure 8 is a schematic diagram of the feature fusion decoding layer structure provided in an exemplary embodiment of this application;
[0081] Figure 9 is a first schematic diagram comparing image processing output results provided in an exemplary embodiment of this application;
[0082] Figure 10 is a second schematic diagram comparing image processing output results provided in an exemplary embodiment of this application;
[0083] Figure 11 is a second schematic diagram of the image processing flow provided in an exemplary embodiment of this application;
[0084] Figure 12 is a schematic diagram of an image processing apparatus provided in an exemplary embodiment of this application;
[0085] Figure 13 is a schematic diagram of the architecture of an electronic device provided in an exemplary embodiment of this application.
Detailed Implementation
[0086] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of this application without creative effort are within the protection scope of this application.
[0087] As noted in the background above, both human driving and autonomous driving rely almost entirely on vision to perceive the external environment and issue correct driving commands. In low-light environments such as at night, the visual perception ability of both people and vehicles is greatly limited, which can lead to loss of vehicle control and seriously threaten the safety of nighttime driving.
[0088] To address this, various night vision methods have been proposed to improve vehicle perception at night. Among them, infrared thermal imaging technology, which utilizes the different temperatures (infrared thermal radiation intensities) of different objects for imaging, is widely used in night vision scenarios to enhance vehicle perception in low-light environments. However, thermal imaging technology, which relies on temperature difference imaging, suffers from several drawbacks. First, it loses the target's color information, making color imaging impossible. Second, in high-temperature environments, the imaging contrast is low, hindering clear target imaging. Therefore, various low-light full-color night vision imaging methods have been proposed to achieve high-quality color imaging at night, improving the perception capabilities of vehicles and drivers in low-light environments.
[0089] For example, as shown in Figure 1-1, one approach uses a lens whose aperture design reaches 0.6, significantly increasing the amount of light entering the lens and improving the intensity of signal light collected by the camera in low-light environments at night, thus enhancing nighttime image quality. However, this specially designed lens requires multiple lens elements, resulting in a larger size, which makes it difficult to install and fix on vehicles. Moreover, the lens's high cost increases the vehicle's overall cost, hindering the promotion and application of the technology.
[0090] As another example, as shown in Figure 1-2, different convolutional network modules are used to convolve a low-light image that has already been processed by a traditional ISP (image signal processor), and different loss functions are used to iteratively optimize the different convolutional network modules. All convolutional modules work in series and finally output a color image with normal brightness. However, enhancing an ISP-processed image cannot avoid introducing errors from the ISP process, and the series operation of modules also leads to the accumulation and amplification of errors between modules, ultimately resulting in a poor low-light enhancement effect.
[0091] As a further example, as shown in Figure 1-3, sensor raw data is used as input, and a low-light raw image is transformed into a normal-brightness RGB image through a cascaded application of traditional and deep learning methods. Pixel fusion, black level correction, and image brightness adjustment are traditional image processing algorithms, while the denoising network and adaptive brightness adjustment network are deep learning models. However, the cascaded use of traditional and deep learning methods still suffers from error accumulation, which lowers the upper limit of the method's performance. Furthermore, the combined use of multiple algorithms places high demands on the computing power of the platform, which is not conducive to real-time deployment and inference on the vehicle side.
[0092] To at least address the aforementioned issues, this application proposes an image processing method, system, electronic device, computer-readable storage medium, and computer program product that can enhance the imaging brightness and contrast of vehicle-mounted cameras in low-light environments such as at night, without the need for specially designed hardware, thereby achieving high-quality color imaging in vehicles at night.
[0093] As shown in Figure 2, the image processing system in this application consists of a visible light color vehicle camera 10, an on-board computing platform 20, and an image processing model 30. The main function of the visible light color vehicle camera 10 is to convert the light signals in the environment into electrical signals, amplify the analog signals output by the photosensitive chip, perform high-bit-depth analog-to-digital conversion, and finally output the raw data, without traditional ISP processing, to the on-board computing platform 20. After being deserialized by the platform, the data is sent to the image processing model 30, which runs on a module such as an NPU or GPU, for calculation. Finally, a color image with normal brightness is output for display on the monitor.
[0094] It is worth noting that the image processing model 30 in this application can be an end-to-end full-color night vision model. It can be understood that through the end-to-end model structure design, it is possible to train using low-light raw data (Raw data) and normal brightness color images (RGB images) to complete the end-to-end mapping from low-light Raw to normal brightness RGB images. This enables a single model to solve both ISP and low-light enhancement tasks. It can be deployed and run on high-computing-power vehicle computing platforms (such as Orin) or on low-computing-power vehicle platforms (such as Black Sesame A1000L), achieving highly adaptable full-color night vision for vehicles.
[0095] In one embodiment, as shown in Figure 3, the image processing method of this application may include the following steps:
[0096] S10, in response to the input of the original image data collected in the low light environment of the vehicle, the environmental color image is output based on the image features of the original image data through the image processing model.
[0097] In this embodiment, the vehicle can control the onboard camera to collect raw image data of its surrounding environment.
[0098] Specifically, the raw image data can be the corresponding RAW data. RAW data is the raw light signal data directly captured by the camera's image sensor (such as CMOS or CCD). This data is stored in the form of a file before any processing or compression. RAW data contains complete information captured from each pixel on the image sensor, including color, brightness, etc.
[0099] The vehicle can respond to the raw image data collected in low-light environments and output an environmental color image based on the image features of the raw image data through an image processing model.
[0100] In one specific embodiment, the low-light environment may include a nighttime environment. In this embodiment, the light intensity of the low-light environment is less than a preset light intensity threshold. The preset light intensity threshold can characterize the critical light intensity for executing the image processing method in this application. That is, if the ambient light intensity of the current environment where the vehicle is located is greater than the threshold, it means that the current ambient light will not affect the imaging effect, and the image processing method in this embodiment can be omitted.
[0101] It should be noted that in this embodiment, in low-light environments such as nighttime, ordinary vehicle-mounted cameras can output the raw image data they collect to the vehicle-mounted computing platform. The image processing model deployed in the vehicle-mounted computing platform can process the raw image data collected by the camera and output a high-quality color image. That is, this embodiment does not require high-performance sensors such as infrared cameras or high-sensitivity cameras, but only requires common vehicle-mounted cameras to complete true-color imaging in nighttime environments.
[0102] Therefore, in this embodiment, the image features of the original image data collected by the vehicle at night can be processed directly using the in-vehicle image processing model to output the ambient color image of the nighttime environment. Compared with the use of infrared for nighttime imaging in the prior art, this application can use the image processing model to generate color images in low light environments. Even in the nighttime driving environment, high-quality ambient color images can be generated normally, thus improving the imaging quality of nighttime color images.
[0103] In one embodiment, the image processing model of this application includes: an image encoding layer, a feature extraction layer, and a feature fusion decoding layer.
[0104] In this embodiment, the image processing model can specifically be an end-to-end full-color night vision model, which includes an image encoding layer, a feature extraction layer, and a feature fusion decoding layer.
[0105] Specifically, for example, as shown in Figure 4, the feature extraction layer in this embodiment includes a detail feature extraction layer, a color feature extraction layer, and a contour feature extraction layer. That is, the image processing model in this embodiment can be a three-layer structure model, including an image encoding layer, a detail feature extraction layer, a color feature extraction layer, a contour feature extraction layer, and a feature fusion decoding layer, ultimately outputting a high-quality normal brightness color image.
[0106] As can be seen, this embodiment references the principles of human visual perception, employs a multi-layer feature extraction framework to extract features from image data at different levels and dimensions, and outputs high-quality color images through feature fusion and encoding methods. Through the channel separation encoding module, it can directly process the raw image data of the sensor chip (i.e., the vehicle-mounted camera) without the need for traditional ISP processing, outputting RGB color images. Furthermore, the end-to-end full-color night vision model structure in this embodiment is specially optimized, allowing it to be deployed and run on both high-computing-power vehicle computing platforms and low-computing-power vehicle platforms, achieving highly adaptable full-color night vision. This embodiment optimizes the full-color night vision effect of the deep learning model by combining traditional image processing methods with deep learning model methods.
[0107] In one embodiment, S10 above, "outputting an environmental color image based on the image features of the original image data using an image processing model," may include:
[0108] S101, the original image data is encoded through the image coding layer to obtain a multidimensional feature matrix;
[0109] S102, the feature extraction layer extracts features from the multidimensional feature matrix to obtain image features;
[0110] S103, the image features are fused through the feature fusion decoding layer to obtain an environmental color image.
[0111] In this embodiment, the image encoding layer in the image processing model can encode the original image data to obtain a multi-dimensional feature matrix. Then, the feature extraction layer in the image processing model can extract features from the multi-dimensional feature matrix to obtain image features. Finally, the feature fusion decoding layer in the image processing model fuses the image features to obtain an environmental color image.
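To make the multi-scale data flow easier to follow, the sketch below traces only the tensor sizes through the three branches and the two fusion steps. It is not the model itself: every learned module (convolution, residual, attention) is replaced by an identity, and the 2x average pooling, nearest-neighbor upsampling, and fusion by addition are assumptions chosen for illustration, since the application does not fix these operations here.

```python
import numpy as np

def downsample2(x: np.ndarray) -> np.ndarray:
    """2x2 average pooling on an (H, W, C) array; H and W must be even."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def upsample2(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbor 2x upsampling on an (H, W, C) array."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def forward_shapes(features: np.ndarray) -> np.ndarray:
    """Trace the branch and fusion sizes; learned modules are stand-in identities."""
    detail = features                    # detail branch keeps the encoded size
    color = downsample2(features)        # color branch works on a downsampled matrix
    contour = downsample2(color)         # contour branch downsamples the color input again
    first_fused = color + upsample2(contour)        # contour upsampled, fused with color
    second_fused = detail + upsample2(first_fused)  # then fused with detail features
    return second_fused
```

The point of the sketch is the ordering: contour features are the coarsest, color features sit in between, and detail features are fused last at the encoded resolution.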
[0112] In a specific embodiment, S101 above, "encoding the original image data through the image coding layer to obtain a multidimensional feature matrix," may include:
[0113] S1011, pixel data is extracted from the original image data through the image coding layer to obtain an initial multidimensional feature matrix; the pixel data is the data captured under the color filters of the vehicle camera that collects the original image data.
[0114] S1012, based on the bit depth of the analog-to-digital conversion of the camera, the initial multidimensional feature matrix is normalized to obtain a multidimensional feature matrix.
[0115] In this embodiment, the image coding layer can extract the pixel values under the same color filter in the input raw image (i.e., the raw image data in this embodiment) according to the arrangement rules of the camera color filters, normalize them, and re-encode them into a multi-channel data matrix.
[0116] Specifically, for example, for a camera with color filters arranged in RGGB order, the pixels under the red filter, the green filter pixels next to the red filter, the green filter pixels next to the blue filter, and the pixels under the blue filter in the original Raw image (H, W, 1) are extracted to form an initial multidimensional feature matrix (H / 2, W / 2, 4) with its size halved and 4 channels.
[0117] Furthermore, the multidimensional feature matrix can be normalized based on the bit depth of the analog-to-digital converter inside the camera. For example, for a bit depth of 12 bits, 4096 (2^12) is used to normalize the data. Finally, the image coding layer can encode the original raw image with an input size of (H, W, 1) into a multidimensional feature matrix with a size of (H / 2, W / 2, 4) and a value range of [0, 1], which is used for subsequent multi-layer feature extraction and fusion.
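The RGGB example above can be sketched in NumPy as follows. The function name `encode_raw_rggb` and the channel order (R, G next to R, G next to B, B) are illustrative choices; the divisor 2^bit_depth follows the 12-bit example in the text.

```python
import numpy as np

def encode_raw_rggb(raw: np.ndarray, bit_depth: int = 12) -> np.ndarray:
    """Pack an RGGB Bayer mosaic (H, W) or (H, W, 1) into (H/2, W/2, 4), scaled by 2**bit_depth."""
    if raw.ndim == 3 and raw.shape[-1] == 1:
        raw = raw[..., 0]
    h, w = raw.shape
    if h % 2 or w % 2:
        raise ValueError("height and width must be even for a 2x2 Bayer pattern")
    r = raw[0::2, 0::2]    # pixels under the red filter
    g_r = raw[0::2, 1::2]  # green pixels next to red (same row as red)
    g_b = raw[1::2, 0::2]  # green pixels next to blue (same row as blue)
    b = raw[1::2, 1::2]    # pixels under the blue filter
    packed = np.stack([r, g_r, g_b, b], axis=-1).astype(np.float32)
    return packed / float(2 ** bit_depth)
```

For a 12-bit sensor, the largest raw value is 4095, so dividing by 4096 keeps every entry inside the stated [0, 1] range. A camera with a different filter arrangement would need different slice offsets.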
[0118] In one embodiment, the feature extraction layer includes an image detail feature extraction layer, a color feature extraction layer, and a contour feature extraction layer;
[0119] The image features include at least one of image detail features, image color features, and image contour features.
[0120] In this embodiment, as shown in Figure 4, the feature extraction layer in the image processing model includes a detail feature extraction layer, a color feature extraction layer, and a contour feature extraction layer. That is, the image processing model in this embodiment can be a three-layer structure model, including an image encoding layer, a detail feature extraction layer, a color feature extraction layer, a contour feature extraction layer, and a feature fusion decoding layer, and finally outputs a high-quality normal brightness color image.
[0121] In one embodiment, S102 above, "extracting features from the multidimensional feature matrix through the feature extraction layer to obtain image features", may include:
[0122] S1021, High-dimensional features are obtained by extracting features from the multidimensional feature matrix through at least one convolutional module in the detail feature extraction layer;
[0123] S1022, through at least one residual module in the detail feature extraction layer, feature extraction is performed on the high-dimensional features to obtain image detail features.
[0124] In this embodiment, the detail feature extraction layer performs convolution on the encoded multidimensional data containing all information to extract high-frequency detail information from the original Raw data.
[0125] Specifically, for example, the overall structure of this layer is shown in Figure 5. The multidimensional feature matrix output by the image coding layer is input into the detail feature extraction layer, where first convolution modules 3021 of different sizes and numbers initially extract it into high-dimensional features. A grouped residual module 3022 then further extracts these initial high-dimensional features into higher-dimensional features (i.e., the image detail features in this embodiment) for subsequent feature fusion.
[0126] In this embodiment, the grouped residual module can perform grouped convolution and residual connections on the input multidimensional features according to the number of channels, avoiding the loss of high-frequency detail signals caused by convolution operations and extracting as much detail information as possible from the raw data. In addition, the number of convolution modules in the color feature extraction layer can be three or more, and the grouped residual module can also be divided into other numbers of groups, such as 2 or 4. For example, there may be 2 convolution modules and 3 feature groups. There is no specific limitation on this.
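The idea of grouped convolution with a residual connection can be illustrated as follows. This is only a sketch: the kernel size, activation, and layer count of module 3022 are not given in the text, so a 1x1 convolution per channel group and a ReLU are assumed here, and `grouped_residual` is a hypothetical name.

```python
import numpy as np

def grouped_residual(x, weights):
    """Apply a 1x1 convolution to each channel group, then add the input back.

    x:       feature map of shape (H, W, C)
    weights: list of G matrices, each (C/G, C/G), one per channel group
    """
    x = np.asarray(x, dtype=np.float32)
    groups = len(weights)
    channels = x.shape[-1]
    if channels % groups:
        raise ValueError("channel count must be divisible by the group count")
    size = channels // groups
    out = np.empty_like(x)
    for g, w_g in enumerate(weights):
        sl = slice(g * size, (g + 1) * size)
        # Each group only sees its own channels (assumed 1x1 conv + ReLU).
        out[..., sl] = np.maximum(x[..., sl] @ w_g, 0.0)
    # Residual connection: the input passes through unchanged and is added,
    # so high-frequency detail is not lost even if the convolution suppresses it.
    return x + out
```

With all-zero weights the block returns its input unchanged, which shows why the residual path preserves detail signals.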
[0127] In one embodiment, S102 above, "extracting features from the multidimensional feature matrix through the feature extraction layer to obtain image features", may include:
[0128] S1023, the multidimensional feature matrix is downsampled by the downsampling module in the color feature extraction layer to adjust the size of the multidimensional feature matrix;
[0129] S1024, through the convolution module and residual module in the color feature extraction layer, feature extraction is performed on the downsampled multidimensional feature matrix to obtain image color features.
[0130] In this embodiment, the color feature extraction layer extracts color information with coarser detail by downsampling the multidimensional feature matrix output by the image coding layer and then extracting features.
[0131] Specifically, for example, the overall structure of the color feature extraction layer is shown in Figure 6-1. The first downsampling module 3031 downsamples the input multidimensional feature matrix by setting the convolution stride to 2, adjusting the size of the input features to (H / 4, W / 4, c). The downsampled multidimensional features are then passed through a second convolution module 3032 and a grouped residual module 3022 to fully extract the color information in the image.
[0132] In a specific embodiment, the above-mentioned "extracting features from the initial color image features through the channel attention module in the color feature extraction layer to obtain color image features" may include:
[0133] The convolutional and residual modules in the color feature extraction layer are used to extract features from the downsampled multidimensional feature matrix to obtain the initial color image features.
[0134] The initial color image features are extracted using the channel attention module in the color feature extraction layer to obtain the color image features.
[0135] It should be noted that, since color information is strongly correlated with channels, this embodiment can use a channel attention module 3033, whose structure is shown in Figure 6-2, to better extract color features. The module extracts color-related signals from the multidimensional features more accurately to obtain the image color features, and the features it outputs are finally fused with the output features of the next level.
[0136] Specifically, for example, as shown in Figure 6-1, the second convolution module 3032 and the grouped residual module 3022 in the color feature extraction layer extract features from the downsampled multidimensional feature matrix to obtain the initial color image features. Then, the channel attention module 3033 in the color feature extraction layer can perform deep feature extraction on the initial color image features to obtain the color image features.
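The internal structure of channel attention module 3033 is given only in Figure 6-2. As an illustration of the general technique, the sketch below uses one common form of channel attention (a squeeze-and-excitation style gate); this choice, the function name, and the two weight matrices `w1` and `w2` are assumptions made here, not details from the application.

```python
import numpy as np

def channel_attention(x, w1, w2):
    """Reweight channels of x (H, W, C) with a squeeze-and-excitation style gate.

    w1: (C, C_hidden) and w2: (C_hidden, C) are hypothetical learned weights.
    """
    x = np.asarray(x, dtype=np.float32)
    squeeze = x.mean(axis=(0, 1))                 # global average per channel, (C,)
    hidden = np.maximum(squeeze @ w1, 0.0)        # bottleneck with ReLU
    scale = 1.0 / (1.0 + np.exp(-(hidden @ w2)))  # per-channel gate in (0, 1)
    return x * scale                              # broadcast over H and W
```

Because the gate is computed per channel, channels that carry color-related signal can be emphasized while others are attenuated, which matches the motivation stated above.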
[0137] In one embodiment, S102 above, "extracting features from the multidimensional feature matrix through the feature extraction layer to obtain image features", may include:
[0138] S1026, the target multidimensional feature matrix is downsampled by the downsampling module in the contour feature extraction layer, wherein the target multidimensional feature matrix is obtained by downsampling the multidimensional feature matrix by the downsampling module in the color feature extraction layer;
[0139] S1027, through the residual module, channel attention module and spatial attention module in the contour feature extraction layer, low-frequency information in the downsampled target multidimensional feature matrix is extracted to obtain image contour features.
[0140] In this embodiment, the contour feature extraction layer further downsamples the multidimensional features after downsampling by the first downsampling module 3031 in the color feature extraction layer, and then performs multidimensional feature extraction to extract lower frequency information in the original data, such as contour, overall contrast and brightness.
[0141] Specifically, for example, the structure of the contour feature extraction layer is shown in Figure 7-1. The input target multidimensional feature matrix likewise first passes through a second downsampling module 3041, which halves the feature size to (H / 8, W / 8, c). Then, in order to fully extract information in the channels and space, the target multidimensional feature matrix passes through the grouped residual module 3022, the channel attention module 3033, and the spatial attention module 3042 in sequence to fully extract various low-frequency information in the data.
[0142] The spatial attention module 3042, whose structure is shown in Figure 7-2, is introduced to extract useful spatial information and reduce interference from noise. When different modules are connected, the input and output of each module are fused before being passed to the next module, in order to avoid information loss during convolution.
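The feature sizes quoted for the three levels can be checked with a small helper. It only restates the sizes given in the text; the function name `feature_shapes` is hypothetical, and the channel count of the detail level is left out because the text does not state it.

```python
def feature_shapes(h, w, c):
    """Return the feature-map sizes quoted in the text for an (H, W, 1) Raw input."""
    if h % 8 or w % 8:
        raise ValueError("H and W must be divisible by 8 for three halvings")
    return {
        "encoded": (h // 2, w // 2, 4),   # output of the image encoding layer
        "color": (h // 4, w // 4, c),     # after first downsampling module 3031
        "contour": (h // 8, w // 8, c),   # after second downsampling module 3041
    }
```

For example, a 1080 x 1920 Raw frame with c = 32 gives an encoded matrix of (540, 960, 4), color features of (270, 480, 32), and contour features of (135, 240, 32).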
[0143] In one embodiment, S103 above, "fusing the image features through the feature fusion decoding layer to obtain an environmental color image," may include:
[0144] S1031, the image contour features output by the feature extraction layer are upsampled by the upsampling module in the feature fusion decoding layer, and the image color features output by the feature extraction layer are fused with the upsampled image contour features to obtain the first fused feature;
[0145] S1032, through the residual module and channel attention module of the feature fusion decoding layer, the first fused feature is extracted, and the image detail features output by the feature extraction layer are fused with the first fused feature after feature extraction to obtain an environmental color image.
[0146] In this embodiment, as shown in Figure 4, in the feature extraction layer of the image processing model, the image detail feature extraction layer is located in the first layer, the contour feature extraction layer is located in the bottom layer, and the color feature extraction layer is located in the middle. Based on this, since the image detail feature extraction layer in the first layer has the largest size and the contour feature extraction layer in the bottom layer has the smallest size, fusion can be performed from bottom to top. That is, the image contour features output by the contour feature extraction layer are first fused with the image color features output by the color feature extraction layer, and then the fused image features are fused with the image detail features output by the image detail feature extraction layer to obtain the environmental color image.
[0147] In a specific embodiment, S1032 above, "fusing the image detail features output by the feature extraction layer with the first fusion feature after feature extraction to obtain an environmental color image," may include:
[0148] Step a: Fuse the image detail features with the first fusion feature after feature extraction to obtain the second fusion feature;
[0149] Step b: The second fused feature is processed through the feature fusion decoding layer to obtain an environmental color image.
[0150] In this embodiment, the feature fusion decoding layer upsamples the output features of different layers and then performs additive fusion to fuse different types of feature information, finally outputting a three-channel color image.
[0151] The structure of the feature fusion decoding layer is shown in Figure 8. First, the image contour features output by the contour feature extraction layer are upsampled by an upsampling module 3051, which adjusts their size so that they can be fused with the image color features output by the color feature extraction layer. After upsampling, the image contour features can be added to and fused with the image color features output by the color feature extraction layer to obtain the fused features, which are the first fused feature in this embodiment.
[0152] Furthermore, the feature fusion decoding layer can fuse the detailed image features output by the feature extraction layer with the first fusion feature after feature extraction to obtain the second fusion feature. Then, the feature fusion decoding layer performs feature processing on the second fusion feature to obtain the environmental color image.
[0153] In a specific embodiment, step b above, "performing feature processing on the second fused feature through the feature fusion decoding layer to obtain an environmental color image," may include:
[0154] Step b01: Extract features from the second fused feature using the residual module in the feature fusion decoding layer;
[0155] Step b02: The spatial transformation module in the feature fusion decoding layer performs spatial transformation on the second fused feature extracted from the feature;
[0156] Step b03: The second fused feature after spatial transformation is binary classified using the activation function of the feature fusion decoding layer to obtain the environmental color image.
[0157] In this embodiment, the fused features (i.e., the first fused features in this embodiment) are sequentially passed through the grouped residual module 3022 and the channel attention module 3033 for further feature extraction. The first fused features after feature extraction are further upsampled by the upsampling module 3051 and added to the image detail features output by the detail feature extraction layer. The fused features (i.e., the second fused features in this embodiment) are then further extracted and spatially transformed using the residual module 3022 and the spatial transformation module 3052. After passing through an activation function 3053, a three-channel color RGB image (i.e., the ambient color image in this embodiment) is output.
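The bottom-to-top additive fusion can be sketched as below. Several simplifications are assumed and are not taken from the application: nearest-neighbour upsampling stands in for upsampling module 3051, the residual and attention modules between fusions are omitted, a hypothetical 1x1 projection `to_rgb` stands in for the residual and spatial transformation modules, a sigmoid is used as the activation function 3053, and the output is kept at the resolution of the detail features.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of an (H, W, C) feature map (assumed method)."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def fuse_and_decode(detail, color, contour, to_rgb):
    """Additively fuse three feature levels and map the result to 3 channels.

    detail:  (H/2, W/2, C)   color: (H/4, W/4, C)   contour: (H/8, W/8, C)
    to_rgb:  (C, 3) hypothetical projection replacing modules 3022 and 3052
    """
    first = color + upsample2x(contour)      # S1031: first fused feature
    second = detail + upsample2x(first)      # step a: second fused feature
    logits = second @ to_rgb                 # step b: stand-in feature processing
    return 1.0 / (1.0 + np.exp(-logits))     # activation keeps values in (0, 1)
```

The additions only work because each upsampling doubles the spatial size, so the smaller feature map matches the next level before it is fused.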
[0158] It is understood that in the embodiments of this application, the image encoding layer, feature extraction layer and feature fusion decoding layer can reuse the same modules. For example, the contour feature extraction layer in the feature extraction layer can reuse the channel attention module 3033 in the color feature extraction layer and the grouping residual module 3022 in the detail feature extraction layer, which can effectively reduce the model size and improve the image processing efficiency.
[0159] As can be seen, compared with existing technologies that require infrared cameras or high-sensitivity cameras and high-light-gathering lenses for night vision imaging, the embodiments of this application only rely on ordinary vehicle-mounted visible light cameras and low-computing-power vehicle-mounted computing platforms to achieve vehicle-mounted full-color night vision. By using an end-to-end night vision model to process and enhance the raw night image data collected by ordinary vehicle-mounted cameras, and outputting color images with normal brightness, this application can achieve a night-time true-color imaging frame rate of ≥20fps on the vehicle side. This invention can replace the existing vehicle night vision method that uses near-infrared plus infrared illumination, reduce the overall vehicle cost, improve the night-time imaging effect, and achieve cost reduction and efficiency improvement for the entire vehicle.
[0160] Furthermore, compared to existing technologies that achieve night vision by cascading separate deep learning models with different functions, this application employs an end-to-end full-color night vision model that responds directly to the input of raw nighttime data from the camera and outputs a color image with normal brightness. The end-to-end model structure allows the various functions to be processed in parallel within the model, and gradients can be backpropagated through the entire model. Multiple tasks, including denoising, demosaicing, color restoration, and brightness correction, can be optimized simultaneously, ultimately achieving global optimization and superior full-color night vision performance. At the same time, the end-to-end model structure avoids numerous intermediate process evaluations, reducing the complexity of model development.
[0161] In one embodiment, the image processing method of this application may further include:
[0162] S20, optimize at least one of the detail feature extraction layer, color feature extraction layer, and contour feature extraction layer of the image processing model according to the loss function.
[0163] The loss function combination includes the root mean square error function, the structural similarity error function, and the context loss function.
[0164] It should be noted that in this embodiment, the image processing model can use multiple image quality assessment loss functions to comprehensively guide the optimization of the model, including root mean square error (MSE), structural similarity error (SSIM), and contextual loss.
[0165] The root mean square error is the error between the model output and the ground truth image for each pixel. Its calculation formula is shown in Equation (1), where y_pred is the pixel data of the ambient color image output by the model, and y_gt is the pixel data of the ground truth image.
[0166] The structural similarity error takes into account the differences in brightness, contrast, and local structure between the environmental color image output by the model and the ground truth image, and can reflect the relationship between local pixels, as shown in Equation (2), where μ_x and μ_y represent the mean values of x and y, respectively, σ_x and σ_y represent the standard deviations of x and y, respectively, σ_xy represents the covariance of x and y, and c1 and c2 are constants that avoid errors caused by a denominator of 0. It can be understood that SSIM is a number between 0 and 1. The larger the value, the closer the model output image is to the ground truth image and the better the image quality.
[0167] Context loss calculates the distribution difference between the features of the model output image and the features of the ground truth image at the feature level, thereby evaluating the quality and effect of the model output, as shown in Equation (3), where CX_ij is the similarity between feature x_i and feature y_j, usually measured by the cosine distance between the two features. When the feature distributions of two images are similar, their quality is also relatively close. Since it measures the similarity of feature distributions, strict pixel alignment is not required, thus reducing the impact of alignment errors in the dataset on model performance.
[0168]
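Equations (1) and (2) are not reproduced in this text, so the sketch below uses the commonly published definitions of mean squared error and of SSIM as a stand-in. Whether Equation (1) applies a square root, the use of a single global window for SSIM, and the constants `c1` and `c2` (conventional values for data in [0, 1]) are all assumptions made here.

```python
import numpy as np

def mse(y_pred, y_gt):
    """Mean of squared per-pixel differences between output and ground truth."""
    diff = np.asarray(y_pred, dtype=np.float64) - np.asarray(y_gt, dtype=np.float64)
    return float(np.mean(diff ** 2))

def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM computed over the whole image as one window (simplified sketch)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()                 # sigma_x^2 and sigma_y^2
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()       # sigma_xy
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(num / den)
```

When used as a training loss, the structural similarity term is typically taken as 1 - SSIM so that a smaller value means a closer match; practical implementations also compute SSIM over local sliding windows rather than one global window.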
[0169] Since different layers in the model need to perform different functions, this embodiment can maximize the effect of each layer by using a combination of different loss functions.
[0170] In a specific embodiment, S20 above, "optimizing at least one of the detail feature extraction layer, color feature extraction layer, and contour feature extraction layer of the image processing model according to the loss function," may include:
[0171] S201, the contour feature extraction layer is optimized based on the root mean square error function and the structural similarity error function;
[0172] S202, optimize the color feature extraction layer and the contour feature extraction layer according to the structural similarity error function and the context loss function; and / or,
[0173] S203, optimize the detail feature extraction layer, the color feature extraction layer, and the contour feature extraction layer based on the root mean square error function, the structural similarity error function, and the context loss function.
[0174] In this embodiment, since different layers in the model need to perform different functions, a combination of different loss functions can be used to maximize the effect of each layer. The loss function in this embodiment may include the root mean square error function (MSE), the structural similarity error function (SSIM), and the contextual loss function.
[0175] Specifically, for example, the weighted sum of MSE and SSIM can be used to jointly optimize the contour feature extraction layer; the weighted sum of SSIM and contextual loss can be used to optimize the color feature extraction layer and the contour feature extraction layer; and the weighted sum of MSE, SSIM and contextual loss can be used to optimize the detail feature extraction layer, the color feature extraction layer and the contour feature extraction layer.
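The three loss combinations can be written down as a small lookup plus a weighted sum. This is a sketch only: the weights are not given in the text, so `weights` is a hypothetical input, and the `ssim` entry is assumed to already be an error term (for example 1 - SSIM).

```python
# Which layers each step optimizes and which loss terms it combines (S201-S203).
STAGES = {
    "S201": {"layers": ("contour",), "terms": ("mse", "ssim")},
    "S202": {"layers": ("color", "contour"), "terms": ("ssim", "contextual")},
    "S203": {"layers": ("detail", "color", "contour"),
             "terms": ("mse", "ssim", "contextual")},
}

def weighted_loss(stage, term_values, weights):
    """Weighted sum of the loss terms used by the given step.

    term_values: dict of already computed loss values keyed by term name
    weights:     dict of hypothetical per-term weights
    """
    terms = STAGES[stage]["terms"]
    return sum(weights[t] * term_values[t] for t in terms)
```

Terms that a step does not list are simply ignored, so the same dictionaries of values and weights can be reused across all three steps.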
[0176] The actual inference performance of the trained image processing model in this embodiment is shown in Figure 9, and the image quality evaluation indicators are shown in Figure 10. In Figure 9, panel a represents the Raw image input to the model (since the Raw image cannot be displayed directly, it is processed using a traditional ISP for display), panel b represents the environmental color image output by the image processing model in this embodiment, and panel c represents the ground truth image. The results of subjective evaluation and quantitative evaluation show that the end-to-end full-color night vision model in this embodiment has good ISP and image enhancement effects.
[0177] As can be seen, in this embodiment, the image processing model, through an end-to-end structural design, can be trained using low-light Raw data and normal-brightness color images (RGB images), completing the end-to-end mapping from low-light Raw to normal-brightness RGB images. This allows a single model to solve both ISP and low-light enhancement tasks. Furthermore, this embodiment combines a multi-dimensional loss function design that incorporates both human and machine perception effects, enabling the model to be optimized towards a mutually optimal direction during iterative training, thus meeting the different image requirements of both humans and machines. This end-to-end model structure allows the various functions to be processed in parallel within the model, and gradients can be backpropagated through the entire model. Multiple tasks, including denoising, demosaicing, color restoration, and brightness correction, can be optimized simultaneously, ultimately achieving global optimization and better full-color night vision performance. At the same time, the end-to-end model structure avoids numerous intermediate process evaluations, reducing the complexity of model development.
[0178] In one embodiment, the image processing method of this application may further include:
[0179] S30, determine whether the average brightness of the ambient color image is less than a preset brightness threshold;
[0180] S40, if the average brightness of the ambient color image is less than a preset brightness threshold, then the target ambient color image is obtained based on the average brightness of the ambient color image and the target step size.
[0181] S50, if the average brightness of the ambient color image is greater than or equal to a preset brightness threshold, then the ambient color image is set as the target ambient color image.
[0182] It should be noted that, in this embodiment, as shown in Figure 4, the main function of the visible light color vehicle camera 10 is to convert the light signal in the environment into an electrical signal, amplify the analog signal output by the photosensitive chip, perform high-bit-depth analog-to-digital conversion, and finally output the original Raw data, without traditional ISP processing, for computation by the end-to-end full-color night vision model. In this embodiment, in order to avoid interference from traditional ISP processing with the Raw data, the control algorithm of the vehicle camera 10 retains only two functions: automatic exposure 101 and automatic gain 102.
[0183] Based on this, as shown in Figure 11, in this embodiment, the vehicle can determine whether the average brightness g_mean of the ambient color image output by the image processing model is less than the preset brightness threshold T_light.
[0184] If the average brightness g_mean of the ambient color image is determined to be less than the preset brightness threshold T_light, it means that the brightness of the ambient color image is still relatively dark. Then, the target ambient color image can be obtained based on the average brightness of the ambient color image and the target step size.
[0185] If the average brightness g_mean of the ambient color image is determined to be greater than or equal to the preset brightness threshold T_light, the ambient color image can be directly set as the target ambient color image.
[0186] In a specific embodiment, S40 above, "obtaining the target environment color image based on the average brightness of the environment color image and the target step size" may include:
[0187] S401, adjust the exposure time of the vehicle's camera and / or adjust the gain of the camera based on the target step size and the average brightness of the ambient color image;
[0188] S402, until the average brightness of the image reaches the preset brightness threshold, to obtain the target environment color image.
[0189] In this embodiment, the vehicle can adjust the exposure time and / or gain of the vehicle's camera based on the average brightness of the ambient color image and the target step size, so that the average brightness of the image reaches a preset brightness threshold, thereby obtaining a normally exposed target ambient color image.
[0190] Specifically, for example, the camera's exposure time t is adjusted by statistically analyzing the average brightness g_mean or brightness distribution of the raw image (i.e., the ambient color image in this embodiment). If the average brightness g_mean in the raw image is less than the threshold T_light, the exposure time is increased by a target step size s until the average brightness of the image is greater than T_light or the exposure time reaches the maximum allowable exposure time t_max. The target step size s can be a fixed exposure time, such as 2ms. Then, following a hill-climbing method, the camera's exposure time is gradually adjusted until the brightness of the raw image reaches a preset brightness threshold or the maximum exposure time is reached.
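The fixed-step adjustment described above can be sketched as a simple loop. The callback `measure_brightness`, which is assumed to capture a frame at the given exposure time and return its average brightness g_mean, is hypothetical, as are the function and parameter names.

```python
def adjust_exposure(measure_brightness, t_ms, t_light, t_max_ms, step_ms=2.0):
    """Raise the exposure time in fixed steps until the frame is bright enough.

    measure_brightness: hypothetical callback, exposure time (ms) -> g_mean
    t_ms:               starting exposure time in milliseconds
    t_light:            preset brightness threshold T_light
    t_max_ms:           maximum allowable exposure time t_max
    step_ms:            target step size s (2 ms in the example from the text)
    """
    # Keep stepping while the image is too dark and exposure can still grow.
    while measure_brightness(t_ms) < t_light and t_ms < t_max_ms:
        t_ms = min(t_ms + step_ms, t_max_ms)
    return t_ms
```

With a toy sensor whose brightness is 10 times the exposure time, a threshold of 55, and a start of 1 ms, the loop stops at 7 ms; with an unreachable threshold it stops at t_max.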
[0191] In one embodiment, the target step size is determined based on the average brightness of the ambient color image and the target brightness.
[0192] In this embodiment, the target step size s can also be dynamically adjusted according to the difference between the average brightness of the Raw image and the target brightness, as shown in equation (4). Furthermore, the exposure time t can be designed with proportional, integral, and derivative factors according to the PID control algorithm to adjust the average brightness of the Raw image to T_light as soon as possible, or to reach the maximum exposure time.
[0193]
[0194] To avoid image quality degradation caused by motion blur and thermal noise due to excessively long exposure times, the minimum exposure time t_min and the maximum exposure time t_max in this embodiment can be 1ms and 10ms, respectively. If the application scenario is a low-speed scenario, such as a parking scenario, t_max can be appropriately extended, but it cannot exceed 30ms.
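Equation (4) is not reproduced in this text, so the sketch below does not attempt to restate it. It only illustrates the general PID idea mentioned above, with the result clamped to the exposure limits of this paragraph; the class name, the gains `kp`, `ki`, `kd`, and the use of the brightness error T_light - g_mean as the controlled quantity are assumptions.

```python
class ExposurePID:
    """Textbook discrete PID controller for exposure time (illustrative only)."""

    def __init__(self, kp, ki, kd, t_min_ms=1.0, t_max_ms=10.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.t_min_ms, self.t_max_ms = t_min_ms, t_max_ms
        self.integral = 0.0
        self.prev_error = None

    def update(self, t_ms, g_mean, t_light):
        """Return the next exposure time given the current frame brightness."""
        error = t_light - g_mean                  # positive when the frame is too dark
        self.integral += error
        derivative = 0.0 if self.prev_error is None else error - self.prev_error
        self.prev_error = error
        step = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Clamp to [t_min, t_max] to limit motion blur and thermal noise.
        return min(max(t_ms + step, self.t_min_ms), self.t_max_ms)
```

For a low-speed scenario such as parking, `t_max_ms` could be raised, subject to the 30 ms ceiling stated above.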
[0195] In addition, in this embodiment, the camera gain G can be adjusted by statistically analyzing the average brightness g_mean or brightness distribution of the raw image to bring the brightness of the raw image to a suitable value. If the average brightness g_mean of the raw image is less than the threshold T_light, the camera gain can be increased by a target step size s_gain until the average brightness of the raw image reaches T_light or the gain reaches the maximum allowable value G_max. The target step size s_gain can be a fixed value, such as 1dB. Then, the gain is gradually adjusted according to the hill-climbing method until the brightness of the raw image reaches a suitable condition or the maximum gain is reached.
[0196] Alternatively, the target step size s_gain can be dynamically adjusted based on the difference between the average brightness of the raw image and the target brightness, as shown in Equation (5). Here, the gain G can be designed with proportional, integral, and derivative factors according to the PID control algorithm, in order to adjust the average brightness of the raw image to T_light as quickly as possible, or to reach the maximum gain G_max. To prevent excessive gain from severely contaminating the image with noise and degrading image quality, the maximum gain value G_max in this embodiment must ensure that the signal-to-noise ratio (SNR) of the raw image is greater than 40dB.
[0197]
[0198] It is worth noting that when the above two camera control algorithms are applied in this embodiment, the automatic gain and automatic exposure adjustments can be performed alternately. For example, based on the brightness of the current frame, the gain is increased or decreased by one step; in the next frame, the brightness of the image is calculated and, according to the above calculation method, the exposure time is lengthened or shortened by one step, until the brightness of the raw image reaches the required value or both the gain and the exposure time reach their maximum values. Furthermore, to avoid motion blur caused by excessively long exposure times, automatic gain can be adjusted more frequently than automatic exposure, for example by adjusting the automatic gain twice and then adjusting the automatic exposure once.
[0199] In addition, to increase the speed of image brightness adjustment, automatic gain and automatic exposure can be adjusted within the same frame according to different weights. For example, based on the brightness of the current frame, gain and exposure time can be adjusted simultaneously, with the step size of each adjusted according to a fixed weight, thereby simultaneously adjusting exposure time and gain to quickly adjust the brightness in the raw image. If the focus is on gain adjustment, the weight ratio of gain is set to 0.6, and the step size weight of exposure time is set to 0.4. If the focus is on exposure time adjustment, the weight ratio of gain is set to 0.4, and the step size weight of exposure time is set to 0.6.
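The two scheduling options, alternating adjustment and weighted adjustment within one frame, can be sketched as follows. How the weights are applied to the step sizes is an interpretation made here (each base step is scaled by its weight), and both function names are hypothetical.

```python
def alternating_schedule(n_frames, gain_per_exposure=2):
    """List which control is adjusted on each frame: gain N times, then exposure once."""
    period = gain_per_exposure + 1
    return ["exposure" if i % period == gain_per_exposure else "gain"
            for i in range(n_frames)]

def weighted_steps(exposure_step_ms, gain_step_db, gain_weight=0.6):
    """Scale both base steps so gain and exposure can be adjusted in the same frame.

    gain_weight is 0.6 when gain adjustment is favoured and 0.4 when exposure
    adjustment is favoured; the exposure weight is the remainder.
    """
    exposure_weight = 1.0 - gain_weight
    return exposure_step_ms * exposure_weight, gain_step_db * gain_weight
```

With the default ratio, six frames are scheduled as gain, gain, exposure, gain, gain, exposure, and a 2 ms / 1 dB base step pair becomes 0.8 ms and 0.6 dB when gain is favoured.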
[0200] In this way, after the vehicle-mounted camera 10 collects the raw data, the data is transmitted to the vehicle-mounted computing platform 20. After being deserialized by the vehicle-mounted computing platform 20, the data is sent to the end-to-end full-color night vision model, which runs on a module such as an NPU or GPU, for calculation. Finally, a color image with normal brightness is output for display on the monitor.
[0201] In one embodiment, S10 above, "outputting an environmental color image based on the image features of the original image data using an image processing model," may include:
[0202] An environmental color image is output based on the image features of the original image data through the image processing model in the vehicle's onboard computing platform.
[0203] In this embodiment, as shown in Figure 4, the vehicle-mounted computing platform 20 may be equipped with an image processing model 30, which is used to respond to the input of image features of the original image data and output an environmental color image. Refer to the above embodiment for further details.
[0204] As can be seen, in this embodiment, by prioritizing gain control and conservatively controlling exposure time, the night vision function of the deep learning model can be maximized, effectively suppressing motion blur, insufficient dynamic range, and overexposure problems in night imaging in traditional ISP methods, and improving the quality of the system's night imaging. Compared with existing technologies that use images processed by ISP for end-to-end low-light enhancement, this embodiment does not require ISP processing, resulting in higher efficiency and speed. The end-to-end full-color night vision model integrates the tasks of ISP into the model for processing, retaining only the camera control methods that affect signal strength, avoiding the errors introduced by traditional ISP processing, reducing data processing time, and suppressing motion blur and insufficient dynamic range problems, thus achieving high-quality full-color night vision imaging.
[0205] Accordingly, embodiments of this application also provide an image processing apparatus. As shown in Figure 12, the apparatus may include:
[0206] The processing module 1001 is used to respond to the input of the original image data collected in the low light environment of the vehicle, and output an environmental color image based on the image features of the original image data through an image processing model.
[0207] Optionally, the image processing model includes: an image encoding layer, a feature extraction layer, and a feature fusion decoding layer.
[0208] Optionally, the processing module 1001 is also used for:
[0209] The original image data is encoded through the image coding layer to obtain a multidimensional feature matrix;
[0210] Image features are obtained by extracting features from the multidimensional feature matrix through the feature extraction layer.
[0211] The image features are fused through the feature fusion decoding layer to obtain the environmental color image.
[0212] Optionally, the processing module 1001 is also used for:
[0213] The image coding layer extracts pixel data from the original image data to obtain an initial multidimensional feature matrix; the pixel data is the data of the camera filter that collects the original image data in the vehicle.
[0214] Based on the bit depth of the analog-to-digital conversion of the camera, the initial multidimensional feature matrix is normalized to obtain the multidimensional feature matrix.
[0215] Optionally, the feature extraction layer includes a detail feature extraction layer, a color feature extraction layer, and a contour feature extraction layer;
[0216] The image features include at least one of the following: detail image features, color image features, and contour image features.
[0217] Optionally, the processing module 1001 is also used for:
[0218] The step of extracting features from the multidimensional feature matrix through the feature extraction layer to obtain image features includes:
[0219] At least one convolutional module in the detail feature extraction layer extracts features from the multidimensional feature matrix to obtain high-dimensional features.
[0220] At least one residual module in the detail feature extraction layer then extracts features from the high-dimensional features to obtain the detail image features.
[0221] Optionally, the processing module 1001 is also used for:
[0222] The step of extracting features from the multidimensional feature matrix through the feature extraction layer to obtain image features includes:
[0223] The multidimensional feature matrix is downsampled by the downsampling module in the color feature extraction layer to adjust the size of the multidimensional feature matrix;
[0224] The color image features are obtained by extracting features from the downsampled multidimensional feature matrix through the convolution and residual modules in the color feature extraction layer.
[0225] Optionally, the processing module 1001 is also used for:
[0226] The convolutional and residual modules in the color feature extraction layer are used to extract features from the downsampled multidimensional feature matrix to obtain the initial color image features.
[0227] The channel attention module in the color feature extraction layer then extracts features from the initial color image features to obtain the color image features.
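As an illustration of what a channel attention module typically does, the following sketch rescales each channel by a gate computed from per-channel global averages. This is a squeeze-and-excitation style formulation chosen for illustration; this application does not specify the internal structure of its channel attention module, and `weights` and `biases` are hypothetical learned parameters.

```python
import math


def channel_attention(features, weights, biases):
    """Rescale each channel by a gate derived from per-channel global averages.

    features: list of C planes, each a list of rows.
    weights:  C x C matrix (hypothetical learned parameters).
    biases:   length-C list (hypothetical learned parameters).
    """
    # Squeeze: one descriptor per channel (global average pooling).
    pooled = [
        sum(sum(row) for row in plane) / (len(plane) * len(plane[0]))
        for plane in features
    ]
    # Excite: a single linear layer followed by a sigmoid gate in (0, 1).
    gates = []
    for w_row, b in zip(weights, biases):
        z = sum(w * p for w, p in zip(w_row, pooled)) + b
        gates.append(1.0 / (1.0 + math.exp(-z)))
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in plane]
        for plane, gate in zip(features, gates)
    ]
    return scaled, gates


initial_color = [[[1.0, 1.0]], [[0.2, 0.4]]]  # two channels, 1 x 2 each
scaled, gates = channel_attention(
    initial_color,
    weights=[[2.0, 0.0], [0.0, -2.0]],
    biases=[0.0, 0.0],
)
```

With these example parameters the first channel receives a larger gate than the second, so it is emphasized in the output.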
[0228] Optionally, the processing module 1001 is also used for:
[0229] The step of extracting features from the multidimensional feature matrix through the feature extraction layer to obtain image features includes:
[0230] The target multidimensional feature matrix is downsampled by the downsampling module in the contour feature extraction layer, wherein the target multidimensional feature matrix is obtained by downsampling the multidimensional feature matrix by the downsampling module in the color feature extraction layer.
[0231] By using the residual module, channel attention module, and spatial attention module in the contour feature extraction layer, low-frequency information in the downsampled target multidimensional feature matrix is extracted to obtain contour image features.
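The relationship between the two downsampling steps can be sketched as follows. The sketch assumes that each downsampling module halves the height and width by 2 x 2 average pooling; the actual downsampling factor and operator are not stated in this application.

```python
def downsample_2x(plane):
    """Halve height and width by averaging each 2x2 block (assumed pooling type)."""
    return [
        [
            (plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1]) / 4.0
            for x in range(0, len(plane[0]) - 1, 2)
        ]
        for y in range(0, len(plane) - 1, 2)
    ]


# One 4 x 4 channel of the multidimensional feature matrix.
full = [[float(x + 4 * y) for x in range(4)] for y in range(4)]

# Color branch: downsampled once (this is the "target" matrix for the contour branch).
color_input = downsample_2x(full)
# Contour branch: the color-branch result is downsampled a second time.
contour_input = downsample_2x(color_input)
```

Under these assumptions the detail branch works at full resolution, the color branch at half resolution, and the contour branch at quarter resolution, which matches the idea that contours carry mostly low-frequency information.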
[0232] Optionally, the processing module 1001 is also used for:
[0233] The step of fusing the image features through the feature fusion decoding layer to obtain an environmental color image includes:
[0234] The feature fusion decoding layer upsamples the contour image features output by the feature extraction layer, and then fuses the color image features output by the feature extraction layer with the upsampled contour image features to obtain the first fused feature.
[0235] The residual module and channel attention module of the feature fusion decoding layer perform feature extraction on the first fused feature, and the detail image features output by the feature extraction layer are fused with the first fused feature after feature extraction to obtain an environmental color image.
[0236] Optionally, the processing module 1001 is also used for:
[0237] The step of fusing the detail image features output by the feature extraction layer with the first fused feature after feature extraction to obtain an environmental color image includes:
[0238] The detail image features output by the feature extraction layer are fused with the first fused feature after feature extraction to obtain the second fused feature;
[0239] The second fused feature is processed by the feature fusion decoding layer to obtain an environmental color image.
[0240] Optionally, the step of performing feature processing on the second fused feature through the feature fusion decoding layer to obtain an environmental color image includes:
[0241] Features are extracted from the second fused feature using the residual module in the feature fusion decoding layer;
[0242] The spatial transformation module in the feature fusion decoding layer performs a spatial transformation on the second fused feature after feature extraction;
[0243] The spatially transformed second fused feature is binary-classified using the activation function of the feature fusion decoding layer to obtain an environmental color image.
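The data flow of the feature fusion decoding layer can be sketched on single-channel planes. Several simplifications are assumed here and are not taken from this application: fusion is element-wise addition, upsampling is nearest-neighbour by a factor of 2, the first fused feature is upsampled before it meets the full-resolution detail features, the learned residual, attention, and spatial transformation modules are omitted, and the final activation is a sigmoid.

```python
import math


def upsample_2x(plane):
    """Nearest-neighbour upsampling: repeat every value in a 2x2 block."""
    out = []
    for row in plane:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_planes(a, b):
    """Fuse two equally sized planes by element-wise addition (assumed fusion)."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        raise ValueError("planes must have the same shape")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sigmoid_plane(plane):
    """Squash every value into (0, 1), standing in for the final activation."""
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in plane]


def fuse_and_decode(detail, color, contour):
    # Contour features are upsampled and fused with the color features.
    first_fused = add_planes(color, upsample_2x(contour))
    # (Residual and channel attention refinement would run here; omitted.)
    second_fused = add_planes(detail, upsample_2x(first_fused))
    # (Residual and spatial transformation modules would run here; omitted.)
    return sigmoid_plane(second_fused)


detail = [[0.0] * 4 for _ in range(4)]   # full resolution
color = [[1.0, -1.0], [0.0, 2.0]]        # half resolution
contour = [[0.5]]                        # quarter resolution
image = fuse_and_decode(detail, color, contour)
```

The result is a full-resolution plane with values in (0, 1); a real decoder would produce three such planes for the color channels.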
[0244] Optionally, the image processing apparatus in this application further includes:
[0245] An optimization module is used to optimize at least one of the detail feature extraction layer, color feature extraction layer, and contour feature extraction layer of the image processing model according to a loss function.
[0246] Optionally, the loss function includes at least one of the root mean square error function, structural similarity error function, and context loss function.
[0247] Optionally, the optimization module is also used for:
[0248] The contour feature extraction layer is optimized based on the root mean square error function and the structural similarity error function.
[0249] The color feature extraction layer and the contour feature extraction layer are optimized based on the structural similarity error function and the context loss function; and / or,
[0250] The detail feature extraction layer, the color feature extraction layer, and the contour feature extraction layer are optimized based on the root mean square error function, the structural similarity error function, and the context loss function.
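The loss combinations listed above can be sketched as follows. The sketch makes several assumptions: images are flattened to lists of values in [0, 1], structural similarity is computed over the whole image as a single window rather than over sliding windows, the terms are summed with equal weights, and the context loss is left to a caller-supplied function because it depends on a feature extractor that this application does not describe. All function names are hypothetical.

```python
import math


def rmse(pred, target):
    """Root mean square error between two equally long value lists."""
    n = len(pred)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / n)


def global_ssim(pred, target, data_range=1.0):
    """Structural similarity over the whole image as one window (a simplification)."""
    n = len(pred)
    mu_p = sum(pred) / n
    mu_t = sum(target) / n
    var_p = sum((p - mu_p) ** 2 for p in pred) / n
    var_t = sum((t - mu_t) ** 2 for t in target) / n
    cov = sum((p - mu_p) * (t - mu_t) for p, t in zip(pred, target)) / n
    c1 = (0.01 * data_range) ** 2  # conventional stabilizing constants
    c2 = (0.03 * data_range) ** 2
    return ((2 * mu_p * mu_t + c1) * (2 * cov + c2)) / (
        (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2)
    )


def combined_loss(pred, target, names, context_fn=None):
    """Sum the selected loss terms with equal weights (assumed weighting)."""
    total = 0.0
    for name in names:
        if name == "rmse":
            total += rmse(pred, target)
        elif name == "ssim":
            total += 1.0 - global_ssim(pred, target)
        elif name == "context":
            if context_fn is None:
                raise ValueError("context loss needs a caller-supplied function")
            total += context_fn(pred, target)
        else:
            raise ValueError(f"unknown loss term: {name}")
    return total


# The three layer/loss combinations named in the description.
CONFIGS = [
    {"layers": ("contour",), "losses": ("rmse", "ssim")},
    {"layers": ("color", "contour"), "losses": ("ssim", "context")},
    {"layers": ("detail", "color", "contour"), "losses": ("rmse", "ssim", "context")},
]

prediction = [0.1, 0.5, 0.9, 0.3]
reference = [0.9, 0.5, 0.1, 0.7]
contour_loss = combined_loss(prediction, reference, CONFIGS[0]["losses"])
```

Which layers' parameters are updated for each configuration would be handled by the training framework; only the loss arithmetic is shown here.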
[0251] Optionally, the image processing apparatus in this application further includes:
[0252] A brightness determination module is used to determine whether the average brightness of the ambient color image is less than a preset brightness threshold.
[0253] The first image acquisition module is used to acquire a target ambient color image based on the average brightness of the ambient color image and the target step size if the average brightness of the ambient color image is less than the preset brightness threshold.
[0254] Optionally, the image processing apparatus in this application further includes:
[0255] The second image acquisition module is used to set the ambient color image as the target ambient color image if the average brightness of the ambient color image is greater than or equal to the preset brightness threshold.
[0256] Optionally, the first image acquisition module is also used for:
[0257] Based on the target step size and the average brightness of the ambient color image, adjust the exposure time of the vehicle's camera and / or adjust the gain of the camera;
[0258] The adjustment continues until the average brightness of the ambient color image reaches the preset brightness threshold, at which point the target ambient color image is obtained.
[0259] Optionally, the target step size is determined based on the average brightness of the ambient color image and the target brightness.
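The brightness feedback described above can be sketched as a simple control loop. The sketch assumes a relative step size proportional to the gap between the average brightness and the target brightness, raises gain first and touches exposure time only once gain is saturated (in line with the gain-first strategy of this embodiment), and uses a toy linear camera model. The limits, the `rate` factor, and all function names are hypothetical.

```python
def next_settings(gain, exposure_ms, brightness, target,
                  max_gain=16.0, max_exposure_ms=10.0, rate=0.5):
    """One adjustment step: gain first, exposure only when gain is saturated."""
    if brightness >= target:
        return gain, exposure_ms
    # Target step size: relative gap between current and target brightness.
    step = rate * (target - brightness) / target
    if gain < max_gain:
        gain = min(max_gain, gain * (1.0 + step))
    else:
        exposure_ms = min(max_exposure_ms, exposure_ms * (1.0 + step))
    return gain, exposure_ms


def simulate_capture(gain, exposure_ms, scene_light=0.01):
    """Toy camera: brightness grows linearly with gain and exposure, capped at 1."""
    return min(1.0, scene_light * gain * exposure_ms)


def auto_adjust(threshold=0.4, target=0.5, max_steps=50):
    """Adjust camera settings until average brightness reaches the threshold."""
    gain, exposure_ms = 1.0, 5.0
    brightness = simulate_capture(gain, exposure_ms)
    for _ in range(max_steps):
        if brightness >= threshold:
            break  # the current image serves as the target ambient color image
        gain, exposure_ms = next_settings(gain, exposure_ms, brightness, target)
        brightness = simulate_capture(gain, exposure_ms)
    return gain, exposure_ms, brightness


final_gain, final_exposure_ms, final_brightness = auto_adjust()
```

In this toy scenario the threshold is reached by raising gain alone, so the exposure time stays at its initial value and no additional motion blur is introduced.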
[0260] Optionally, the processing module 1001 is also used for:
[0261] An environmental color image is output based on the image features of the original image data through the image processing model in the vehicle's onboard computing platform.
[0262] For details on the implementation of each of the above operations, please refer to the previous examples, which will not be repeated here.
[0263] Accordingly, embodiments of this application also provide an electronic device, as shown in Figure 13. Figure 13 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application. The electronic device 1100 includes a processor 1101 with one or more processing cores, a memory 1102 with one or more computer-readable storage media, and a computer program stored on the memory 1102 and executable on the processor. The processor 1101 and the memory 1102 are electrically connected. Those skilled in the art will understand that the vehicle structure shown in the figure does not constitute a limitation on the vehicle and may include more or fewer components than shown, or combine certain components, or have different component arrangements.
[0264] The processor 1101 is the control center of the electronic device 1100. It connects various parts of the electronic device 1100 via various interfaces and lines. By running or loading software programs and / or units stored in the memory 1102, and by calling data stored in the memory 1102, it executes various functions and processes data of the electronic device 1100, thereby providing overall monitoring of the electronic device 1100. The processor 1101 can be a central processing unit (CPU), graphics processing unit (GPU), network processor (NP), etc., and can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this application.
[0265] In this embodiment, the processor 1101 in the electronic device 1100 loads the instructions corresponding to the processes of one or more applications into the memory 1102 according to the following steps, and the processor 1101 runs the applications stored in the memory 1102 to realize various functions, such as:
[0266] In response to the input of raw image data acquired in the low-light environment of the vehicle, an environmental color image is output based on the image features of the raw image data through an image processing model.
[0267] For details on the implementation of each of the above operations, please refer to the previous examples, which will not be repeated here.
[0268] Optionally, as shown in Figure 13, the electronic device 1100 also includes: a touch display screen 1103, a radio frequency circuit 1104, an audio circuit 1105, an input unit 1106, and a power supply 1107. The processor 1101 is electrically connected to the touch display screen 1103, the radio frequency circuit 1104, the audio circuit 1105, the input unit 1106, and the power supply 1107. Those skilled in the art will understand that the vehicle structure shown in Figure 13 does not constitute a limitation on the vehicle and may include more or fewer components than shown, or combine certain components, or have different component arrangements.
[0269] The touch display screen 1103 can be used to display a graphical user interface (GUI) and receive operation commands generated by the user interacting with the GUI. The touch display screen 1103 may include a display panel and a touch panel. The display panel can be used to display information input by the user or information provided to the user, as well as various graphical user interfaces of the vehicle. These graphical user interfaces can be composed of graphics, text, icons, video, and any combination thereof. Optionally, the display panel can be configured using a liquid crystal display (LCD), organic light-emitting diode (OLED), or other similar technologies. The touch panel can be used to collect touch operations performed by the user on or near it (such as operations performed by the user using a finger, stylus, or any suitable object or accessory on or near the touch panel), generate corresponding operation commands, and execute the corresponding program according to the operation commands. Optionally, the touch panel may include two parts: a touch display system and a touch controller. The touch display system detects the user's touch location and the signal generated by the touch operation, transmitting the signal to the touch controller. The touch controller receives touch information from the touch display system, converts it into touch point coordinates, and sends it to the processor 1101. It can also receive and execute commands from the processor 1101. The touch panel can cover the display panel. When the touch panel detects a touch operation on or near it, it transmits the information to the processor 1101 to determine the type of touch event. Subsequently, the processor 1101 provides corresponding visual output on the display panel based on the type of touch event. In this embodiment, the touch panel and the display panel can be integrated into the touch display screen 1103 to achieve input and output functions. 
However, in some embodiments, the touch panel and the touch display screen 1103 can be implemented as two independent components to achieve input and output functions. That is, the touch display screen 1103 can also be used as part of the input unit 1106 to achieve input functions.
[0270] The radio frequency circuit 1104 can be used to transmit and receive radio frequency signals to establish wireless communication with network devices or other vehicles, and to transmit and receive signals with network devices or other vehicles.
[0271] Audio circuit 1105 can be used to provide an audio interface between the user and the vehicle via a speaker and a microphone. Audio circuit 1105 can convert received audio data into electrical signals and transmit them to the speaker, where the speaker converts them into sound signals for output. Conversely, the microphone converts collected sound signals into electrical signals, which are then received by audio circuit 1105, converted back into audio data, and processed by processor 1101 before being transmitted via radio frequency circuit 1104 to, for example, another vehicle, or output to memory 1102 for further processing. Audio circuit 1105 may also include an earphone jack to provide communication between external headphones and the vehicle.
[0272] The input unit 1106 can be used to receive input numbers, characters, or user characteristic information (such as fingerprints, iris, facial information, etc.), and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.
[0273] Power supply 1107 is used to supply power to various components of electronic device 1100. Optionally, power supply 1107 can be logically connected to processor 1101 through a power management device, thereby enabling functions such as charging, discharging, and power consumption management through the power management device. Power supply 1107 may also include one or more DC or AC power supplies, recharging devices, power fault detection circuits, power converters or inverters, power status indicators, and other arbitrary components.
[0274] Although not shown in Figure 13, the electronic device 1100 may also include a camera, sensor, wireless fidelity module, Bluetooth module, etc., which will not be described in detail here.
[0275] In the above embodiments, the descriptions of each embodiment have different focuses. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions in other embodiments.
[0276] Those skilled in the art will understand that all or part of the steps in the various methods of the above embodiments can be performed by instructions, or by instructions controlling related hardware. These instructions can be stored in a computer-readable storage medium and loaded and executed by a processor.
[0277] Therefore, embodiments of this application provide a computer-readable storage medium storing a plurality of computer programs. These computer programs can be loaded by a processor to execute any of the image processing methods provided in this application. The computer program can execute the steps of the following image processing method:
[0278] In response to the input of raw image data acquired in the low-light environment of the vehicle, an environmental color image is output based on the image features of the raw image data through an image processing model.
[0279] For details on the implementation of each of the above operations, please refer to the previous examples, which will not be repeated here.
[0280] The computer-readable storage medium may include: read-only memory (ROM), random access memory (RAM), disk or optical disk, etc.
[0281] Since the computer program stored in the computer-readable storage medium can execute any of the image processing methods provided in the embodiments of this application, the beneficial effects that any of the image processing methods provided in the embodiments of this application can achieve can be realized, as detailed in the preceding embodiments, and will not be repeated here.
[0282] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
[0283] This application is described with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It will be understood that each block of the flowchart illustrations and / or block diagrams, as well as combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable image processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable image processing apparatus, create means for implementing the functions specified in one or more blocks of the flowchart illustrations and / or one or more blocks of the block diagrams.
[0284] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable image processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flowcharts and / or one or more block diagrams.
[0285] These computer program instructions may also be loaded onto a computer or other programmable image processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions, which execute on the computer or other programmable apparatus, provide steps for implementing the functions specified in one or more flowcharts and / or one or more block diagrams.
[0286] In a typical configuration, a computing device includes one or more processors (CPU), input / output interfaces, network interfaces, and memory.
[0287] Memory may include non-persistent memory in computer-readable media, such as random access memory (RAM) and / or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of computer-readable media.
[0288] Computer-readable media include both permanent and non-permanent, removable and non-removable media, which can store information using any method or technology. Information can be computer-readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media does not include transitory media, such as modulated data signals and carrier waves.
[0289] In the description of this application, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Therefore, a feature defined as "first" or "second" may explicitly or implicitly include one or more features. In the description of this application, "multiple" means two or more, unless otherwise explicitly specified.
[0290] In the above embodiments, the descriptions of each embodiment have different focuses. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions in other embodiments.
[0291] The embodiments, implementation methods, and related technical features of this application can be combined and substituted for each other without conflict.
[0292] The above are merely preferred embodiments of this application and are not intended to limit this application in any way. Any simple modifications, equivalent changes, and alterations made to the above embodiments based on the technical essence of this application without departing from the scope of the technical solution of this application shall still fall within the scope of the technical solution of this application.
Claims
1. An image processing method, characterized in that, The method is applied to a vehicle, and the method includes: In response to the input of raw image data acquired in the low-light environment of the vehicle, an environmental color image is output based on the image features of the raw image data through an image processing model.
2. The image processing method according to claim 1, characterized in that, The image processing model includes: an image encoding layer, a feature extraction layer, and a feature fusion decoding layer.
3. The image processing method according to claim 2, characterized in that, The step of outputting an environmental color image based on the image features of the original image data through an image processing model includes: The original image data is encoded through the image coding layer to obtain a multidimensional feature matrix; Image features are obtained by extracting features from the multidimensional feature matrix through the feature extraction layer. The image features are fused through the feature fusion decoding layer to obtain the environmental color image.
4. The image processing method according to claim 3, characterized in that, The process of encoding the original image data through the image coding layer to obtain a multidimensional feature matrix includes: The image coding layer extracts pixel data from the original image data to obtain an initial multidimensional feature matrix; the pixel data is the data of the camera filter that collects the original image data in the vehicle. Based on the bit depth of the analog-to-digital conversion of the camera, the initial multidimensional feature matrix is normalized to obtain the multidimensional feature matrix.
5. The image processing method according to claim 3, characterized in that, The feature extraction layer includes a detail feature extraction layer, a color feature extraction layer, and a contour feature extraction layer; The image features include at least one of the following: detail image features, color image features, and contour image features.
6. The image processing method according to claim 5, characterized in that, The step of extracting features from the multidimensional feature matrix through the feature extraction layer to obtain image features includes: High-dimensional features are obtained by extracting features from the multidimensional feature matrix through at least one convolutional module in the detail feature extraction layer. The high-dimensional features are extracted using at least one residual module in the detail feature extraction layer to obtain detail image features.
7. The image processing method according to claim 5, characterized in that, The step of extracting features from the multidimensional feature matrix through the feature extraction layer to obtain image features includes: The multidimensional feature matrix is downsampled by the downsampling module in the color feature extraction layer to adjust the size of the multidimensional feature matrix; The color image features are obtained by extracting features from the downsampled multidimensional feature matrix through the convolution and residual modules in the color feature extraction layer.
8. The image processing method according to claim 7, characterized in that, The step of extracting color image features from the downsampled multidimensional feature matrix using the convolution and residual modules in the color feature extraction layer includes: The convolutional and residual modules in the color feature extraction layer are used to extract features from the downsampled multidimensional feature matrix to obtain the initial color image features. The initial color image features are extracted using the channel attention module in the color feature extraction layer to obtain the color image features.
9. The image processing method according to claim 5, characterized in that, The step of extracting features from the multidimensional feature matrix through the feature extraction layer to obtain image features includes: The target multidimensional feature matrix is downsampled by the downsampling module in the contour feature extraction layer, wherein the target multidimensional feature matrix is obtained by downsampling the multidimensional feature matrix by the downsampling module in the color feature extraction layer. By using the residual module, channel attention module, and spatial attention module in the contour feature extraction layer, low-frequency information in the downsampled target multidimensional feature matrix is extracted to obtain contour image features.
10. The image processing method according to any one of claims 3 to 9, characterized in that, The step of fusing the image features through the feature fusion decoding layer to obtain an environmental color image includes: The feature fusion decoding layer upsamples the contour image features output by the feature extraction layer, and then fuses the color image features output by the feature extraction layer with the upsampled contour image features to obtain the first fused feature. The first fused feature is extracted by the residual module and channel attention module of the feature fusion decoding layer, and the detailed image features output by the feature extraction layer are fused with the first fused feature to obtain an environmental color image.
11. The image processing method according to claim 10, characterized in that, The step of fusing the detailed image features output by the feature extraction layer with the first fusion feature after feature extraction to obtain an environmental color image includes: The detailed image features output by the feature extraction layer are fused with the first fusion feature after feature extraction to obtain the second fusion feature; The second fused feature is processed by the feature fusion decoding layer to obtain an environmental color image.
12. The image processing method according to claim 11, characterized in that, The step of performing feature processing on the second fused feature through the feature fusion decoding layer to obtain an environmental color image includes: The second fused feature is extracted using the residual module in the feature fusion decoding layer; The spatial transformation module in the feature fusion decoding layer performs spatial transformation on the second fused feature extracted from the feature; The second fused feature after spatial transformation is binary-classified using the activation function of the feature fusion decoding layer to obtain an environmental color image.
13. The image processing method according to claim 5, characterized in that, The method further includes: Based on the loss function, optimize at least one of the detail feature extraction layer, color feature extraction layer, and contour feature extraction layer of the image processing model.
14. The image processing method according to claim 13, characterized in that, The loss function includes at least one of the root mean square error function, structural similarity error function, and context loss function.
15. The image processing method according to claim 14, characterized in that, The step of optimizing at least one of the detail feature extraction layer, color feature extraction layer, and contour feature extraction layer of the image processing model according to the loss function includes: The contour feature extraction layer is optimized based on the root mean square error function and the structural similarity error function. The color feature extraction layer and the contour feature extraction layer are optimized based on the structural similarity error function and the context loss function; and / or, The detail feature extraction layer, the color feature extraction layer, and the contour feature extraction layer are optimized based on the root mean square error function, the structural similarity error function, and the context loss function.
16. The image processing method according to claim 1, characterized in that, After outputting an environmental color image based on the image features of the original image data using an image processing model, the process includes: Determine whether the average brightness of the ambient color image is less than a preset brightness threshold; If the average brightness of the ambient color image is less than the preset brightness threshold, then the target ambient color image is obtained based on the average brightness of the ambient color image and the target step size.
17. The image processing method according to claim 14, characterized in that, After determining whether the average brightness of the ambient color image is less than a preset brightness threshold, the method further includes: If the average brightness of the ambient color image is greater than or equal to the preset brightness threshold, then the ambient color image is set as the target ambient color image.
18. The image processing method according to claim 14, characterized in that, The step of obtaining the target environment color image based on the average brightness of the environment color image and the target step size includes: Based on the target step size and the average brightness of the ambient color image, adjust the exposure time of the vehicle's camera and / or adjust the gain of the camera; The target environment color image is obtained until the average brightness of the image reaches the preset brightness threshold.
19. The image processing method according to any one of claims 16 to 18, characterized in that, The target step size is determined based on the average brightness of the ambient color image and the target brightness.
20. The image processing method according to claim 1, characterized in that, The step of outputting an environmental color image based on the image features of the original image data through an image processing model includes: An environmental color image is output based on the image features of the original image data through the image processing model in the vehicle's onboard computing platform.
21. The image processing method according to claim 1, characterized in that, The low-light environment includes a nighttime environment, where the ambient light intensity is less than a preset light intensity threshold.
22. An image processing apparatus, characterized in that, The image processing apparatus includes: The processing module is used to respond to the input of raw image data collected in the low-light environment of the vehicle, and output an environmental color image based on the image features of the raw image data through an image processing model.
23. An electronic device, characterized in that, It includes a processor and a memory, wherein the memory stores a computer program that, when executed by the processor, causes the processor to perform the method of any one of claims 1 to 21.
24. A vehicle, characterized in that, The vehicle is equipped with the electronic device of claim 23.
25. A computer-readable storage medium, characterized in that, It includes a computer program that, when run on an electronic device, causes the electronic device to perform any of the methods described in claims 1 to 21.
26. A computer program product, characterized in that, The computer program product includes a computer program stored in a computer-readable storage medium; when a processor of an electronic device reads the computer program from the computer-readable storage medium, the processor executes the computer program, causing the electronic device to perform the method of any one of claims 1 to 21.