Image processing method and apparatus
By constructing a mapping relationship between RAW images and RGB images and using an AI neural network model to process RAW data, the problem of information loss in RAW data compression in existing technologies is solved, achieving efficient image quality restoration and enhancement.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- YINWANG INTELLIGENT TECHNOLOGIES CO LTD
- Filing Date
- 2024-12-17
- Publication Date
- 2026-06-25
AI Technical Summary
Existing image enhancement techniques cannot be directly applied to RAW data, resulting in severe information loss during the compression process and affecting subsequent processing.
By constructing a mapping relationship and using an AI neural network model to process the mapping between RAW images and RGB images, the quality of RAW images decoded from different encoded bitstreams is enhanced, and the loss of compressed information is reduced.
It improves the closed-loop efficiency and quality recovery effect of RAW data, meeting the needs of different encoding scenarios and terminal devices.
Description
An image processing method and apparatus
Technical Field
[0001] This application relates to the field of image processing technology, and in particular to an image processing method and apparatus.
Background Technology
[0002] Currently, the demand for high-quality, high-resolution images / videos is gradually increasing. Depending on different needs or transmission capabilities, different compression ratios are typically used to compress high-resolution images / videos. The higher the compression ratio, the greater the quality loss of the decompressed image / video, leading to poorer performance in backend processing tasks. Therefore, it is necessary to enhance the decompressed images (including video frames) using image enhancement techniques to restore or improve the quality of the corresponding images.
[0003] Current image enhancement techniques targeting standard coding technologies such as H.266 / H.265 / H.264 / Joint Photographic Experts Group (JPEG) all process images in RGB or YUV format. While developing coding technologies for raw image formats (also known as RAW images) can reuse these standard coding technologies, they cannot directly utilize image enhancement techniques designed for these standards.
[0004] Therefore, there is an urgent need for an image enhancement technique for RAW data to reduce the loss of compressed information in RAW data.
Summary of the Invention
[0005] This application provides an image processing method and apparatus for performing image enhancement processing on RAW data, thereby reducing the loss of compressed information in RAW data and improving the closed-loop efficiency and benefits of RAW data.
[0006] In a first aspect, this application provides an image processing method, which may include: decoding a first original RAW image from a first encoded bitstream, the first RAW image comprising any one of the following: a color filter array (CFA) format image, an RGGB format image, or a YCoCgDg format image, where R represents the red component, G represents the green component, B represents the blue component, Y represents the luminance component, Co and Cg represent the color components, and Dg represents the difference component; performing image signal processing (ISP) on the first RAW image to obtain a first RGB image; encoding the first RAW image to obtain a second encoded bitstream; and decoding a second RAW image from the second encoded bitstream; wherein the first RGB image and the second RAW image satisfy a first mapping relationship. The first encoded bitstream is obtained by lossless compression (understood here to include no compression or compression at a low compression ratio), and the first RAW image decoded from it is treated as lossless RAW data. The second encoded bitstream is obtained by lossy compression, and the second RAW image decoded from it is lossy RAW data. The method can be executed by a cloud device, which may be a cloud server or a cloud platform, or by a terminal device; this application does not specifically limit this.
[0007] Using the above method, a lossless first RAW image can be collected, and a high-quality first RGB image and lossy RAW data can both be obtained from it. A mapping relationship can then be obtained (e.g., constructed) based on the first RGB image and the lossy RAW data. This mapping relationship can be provided to terminal devices or used by the device itself to enhance the quality of decoded RAW data on the terminal device side or the cloud device side, thereby reducing the loss of compressed information in RAW data and improving the closed-loop efficiency and benefits of RAW data.
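The data flow of the first aspect can be sketched as follows. This is a minimal illustration, not the implementation of this application: `toy_isp` and `toy_lossy_roundtrip` are hypothetical stand-ins (a 2x2 tile collapse and uniform quantization) for a real ISP and for a real encoder and decoder, and the sample values are invented.

```python
from typing import List, Tuple

Raw = List[List[int]]                  # RGGB mosaic, one sample per pixel
Rgb = List[List[Tuple[int, int, int]]]


def toy_isp(raw: Raw) -> Rgb:
    """Hypothetical stand-in for ISP: collapse each 2x2 RGGB tile into one RGB pixel."""
    out = []
    for y in range(0, len(raw), 2):
        row = []
        for x in range(0, len(raw[0]), 2):
            r = raw[y][x]
            g = (raw[y][x + 1] + raw[y + 1][x]) // 2  # average the two greens
            b = raw[y + 1][x + 1]
            row.append((r, g, b))
        out.append(row)
    return out


def toy_lossy_roundtrip(raw: Raw, step: int) -> Raw:
    """Hypothetical stand-in for lossy encode + decode: uniform quantization."""
    return [[(v // step) * step for v in row] for row in raw]


def build_training_pair(first_raw: Raw, step: int) -> Tuple[Raw, Rgb]:
    """Return (second RAW image, first RGB image) derived from one lossless RAW image."""
    first_rgb = toy_isp(first_raw)                      # high-quality reference
    second_raw = toy_lossy_roundtrip(first_raw, step)   # lossy RAW data
    return second_raw, first_rgb


# Invented 2x4 RGGB mosaic standing in for the decoded first RAW image.
first_raw = [[100, 60, 104, 62],
             [58, 30, 61, 33]]
second_raw, first_rgb = build_training_pair(first_raw, step=8)
```

Each such pair (lossy RAW input, high-quality RGB target) is one sample from which a mapping relationship could be constructed.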
[0008] In conjunction with the first aspect, in one possible implementation, the first mapping relationship can be used to obtain a second RGB image corresponding to a third RAW image, wherein the third RAW image is decoded from a third encoded bitstream, and the third compression ratio of the third encoded bitstream and the second compression ratio of the second encoded bitstream are greater than the first compression ratio of the first encoded bitstream. The third compression ratio of the third encoded bitstream and the second compression ratio of the second encoded bitstream can be the same.
[0009] In conjunction with the first aspect, in one possible implementation, the method may further include: training a first artificial intelligence (AI) neural network model based on the first RGB image and the second RAW image, wherein the first AI neural network model is used to characterize a first mapping relationship between the RAW image decoded from the encoded bitstream and the corresponding RGB image. It should be understood that the AI neural network model is merely an example of how the mapping relationship can be implemented and does not constitute any limitation.
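Because the AI neural network model is only one way of realizing the mapping relationship, the idea of fitting a mapping from paired data can be illustrated with something much simpler. The sketch below fits a single gain and offset by least squares; `fit_affine` and the sample values are hypothetical and are not taken from this application.

```python
from typing import List, Tuple


def fit_affine(lossy: List[float], target: List[float]) -> Tuple[float, float]:
    """Least-squares fit of target ~= gain * lossy + offset (closed form)."""
    n = len(lossy)
    mean_x = sum(lossy) / n
    mean_y = sum(target) / n
    sxx = sum((x - mean_x) ** 2 for x in lossy)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lossy, target))
    gain = sxy / sxx
    offset = mean_y - gain * mean_x
    return gain, offset


# Invented samples: values from a lossy second RAW image and the
# co-located values from the first RGB image (one channel only).
second_raw_samples = [0.0, 8.0, 16.0, 24.0]
first_rgb_samples = [3.0, 11.0, 19.0, 27.0]
gain, offset = fit_affine(second_raw_samples, first_rgb_samples)
```

A trained network plays the same role as `gain` and `offset` here, only with far more parameters and a spatial receptive field.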
[0010] In conjunction with the first aspect, in one possible implementation, the method further includes: encoding the first RAW image to obtain a fourth encoded bitstream; decoding the fourth encoded bitstream to obtain a fourth RAW image; wherein the first RGB image and the fourth RAW image satisfy a second mapping relationship. The fourth encoded bitstream is lossy compressed, and the fourth RAW image decoded from the fourth encoded bitstream is lossy RAW data.
[0011] Using the above method, lossy RAW data with varying degrees of loss can be obtained by performing different encoding and decoding processes on the first RAW image, thereby constructing more mapping relationships to meet the needs of different encoding scenarios or different terminal devices. It should be understood that one or more mapping relationships described in the embodiments of this application can be expressed in any way, and this application does not limit the form and content of specific mapping relationships.
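As a rough sketch of keeping several mapping relationships side by side, one per degree of loss, the example below builds a lookup-table mapping for each quantization step. The quantizer, the lookup-table form of the mapping, and the one-dimensional sample list (standing in for the reference image) are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def toy_lossy_roundtrip(samples: List[int], step: int) -> List[int]:
    """Hypothetical stand-in for lossy encode + decode: uniform quantization."""
    return [(v // step) * step for v in samples]


def fit_lookup(lossy: List[int], reference: List[int]) -> Dict[int, float]:
    """Map each lossy value to the mean of the reference values it came from."""
    buckets = defaultdict(list)
    for lossy_value, ref_value in zip(lossy, reference):
        buckets[lossy_value].append(ref_value)
    return {k: sum(v) / len(v) for k, v in buckets.items()}


# Invented reference samples derived from one lossless RAW image.
reference = [3, 5, 9, 12, 14, 21, 27, 30]

# One mapping relationship per degree of loss (here: per quantization step).
mappings = {
    step: fit_lookup(toy_lossy_roundtrip(reference, step), reference)
    for step in (4, 8)
}
```

A device can then be handed only the entry of `mappings` that matches its own encoding configuration.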
[0012] In conjunction with the first aspect, in one possible implementation, the second mapping relationship is used to obtain the third RGB image corresponding to the fifth RAW image, wherein the fifth RAW image is decoded from the fifth encoded bitstream, and the fifth compression ratio of the fifth encoded bitstream and the fourth compression ratio of the fourth encoded bitstream are greater than the first compression ratio of the first encoded bitstream.
[0013] In conjunction with the first aspect, in one possible implementation, the method may further include: training a second AI neural network model based on the first RGB image and the fourth RAW image, wherein the second AI neural network model is used to characterize a second mapping relationship between the RAW image decoded from the encoded bitstream and the corresponding RGB image. It should be understood that the AI neural network model is merely an example of how the mapping relationship can be implemented and does not constitute any limitation.
[0014] In conjunction with the first aspect, in one possible implementation, the method further includes: obtaining the first encoded bitstream from a local storage medium; or, receiving the first encoded bitstream from a terminal device.
[0015] In conjunction with the first aspect, in one possible implementation, the encoder of the terminal device is used to perform encoding operations, and the encoder supports images with a bit depth greater than or equal to 8 bits.
[0016] In conjunction with the first aspect, in one possible implementation, encoding the first RAW image to obtain a second encoded bitstream includes:
[0017] The first RAW image is encoded using any of the following encoding standards to obtain the second encoded bitstream: the H.264 encoding standard, the H.265 encoding standard, the H.266 encoding standard, or the Joint Photographic Experts Group (JPEG) low-latency coding standard.
[0018] In a second aspect, this application provides an image processing method applied to an image processing device, which can be deployed on a cloud device or a terminal device; the embodiments of this application do not specifically limit this. The method may include: acquiring a third encoded bitstream; decoding a third RAW image from the third encoded bitstream, the third RAW image including any one of the following: an image in CFA format, an RGGB format image, or a YCoCgDg format image, where R represents the red component, G represents the green component, B represents the blue component, Y represents the luminance component, Co and Cg represent the color components, and Dg represents the difference component; and acquiring a second RGB image corresponding to the third RAW image based on a first mapping relationship, wherein the first mapping relationship is a mapping relationship satisfied between the first RGB image corresponding to the first original RAW image and the second RAW image, the first RAW image being decoded from a first encoded bitstream, and the second RAW image being decoded from a second encoded bitstream, the second encoded bitstream being encoded from the first RAW image.
[0019] In conjunction with the second aspect, in one possible implementation, the third compression ratio of the third encoded bitstream and the second compression ratio of the second encoded bitstream are greater than the first compression ratio of the first encoded bitstream.
[0020] In conjunction with the second aspect, in one possible implementation, the method further includes: acquiring a fifth encoded bitstream; decoding a fifth RAW image from the fifth encoded bitstream, the fifth RAW image including any one of the following: an image in CFA format, an RGGB format image, or a YCoCgDg format image, where R represents the red component, G represents the green component, B represents the blue component, Y represents the luminance component, Co and Cg represent the color components, and Dg represents the difference component; acquiring a third RGB image corresponding to the fifth RAW image based on a second mapping relationship, wherein the second mapping relationship is a mapping relationship satisfied between the first RGB image corresponding to the first RAW image and the fourth RAW image, the fourth RAW image being decoded from a fourth encoded bitstream, the fourth encoded bitstream being encoded from the first RAW image.
[0021] In conjunction with the second aspect, in one possible implementation, the fifth compression ratio of the fifth encoded bitstream and the fourth compression ratio of the fourth encoded bitstream are greater than the first compression ratio of the first encoded bitstream. The fifth compression ratio of the fifth encoded bitstream and the fourth compression ratio of the fourth encoded bitstream can be the same.
[0022] In conjunction with the second aspect, in one possible implementation, the image processing device is located in a cloud device, which is further configured to perform backend processing based on at least one of the second RGB image and the third RGB image; or, the image processing device is located in a terminal device, which is further configured to output at least one of the second RGB image and the third RGB image.
[0023] In conjunction with the second aspect, in one possible implementation, when the image processing device is located in a cloud device, the acquisition of the third encoded bitstream includes: receiving the third encoded bitstream from a terminal device.
[0024] In conjunction with the second aspect, in one possible implementation, the encoder of the terminal device is used to perform encoding operations, and the encoder supports images with a bit depth greater than or equal to 8 bits.
[0025] In conjunction with the second aspect, in one possible implementation, decoding the third RAW image from the third encoded bitstream includes: decoding the third RAW image from the third encoded bitstream using any of the following coding standards: the H.264 encoding standard, the H.265 encoding standard, the H.266 encoding standard, or the Joint Photographic Experts Group (JPEG) low-latency coding standard.
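On the inference side of the second aspect, a device that holds several mapping relationships needs some rule for picking one for an incoming bitstream. The sketch below assumes, purely for illustration, that mappings are indexed by the compression ratio they were built for and that the nearest one is chosen; `select_mapping`, the ratios, and the placeholder mappings are hypothetical.

```python
from typing import Callable, Dict, List

Mapping = Callable[[List[int]], List[int]]


def select_mapping(stream_ratio: float, mappings: Dict[float, Mapping]) -> Mapping:
    """Pick the mapping built for the compression ratio closest to the stream's."""
    best_ratio = min(mappings, key=lambda ratio: abs(ratio - stream_ratio))
    return mappings[best_ratio]


# Placeholder mappings; in practice each would be a trained model.
mappings: Dict[float, Mapping] = {
    10.0: lambda raw: [v + 1 for v in raw],   # built from lightly compressed data
    40.0: lambda raw: [v + 4 for v in raw],   # built from heavily compressed data
}

third_raw = [96, 56, 104]                     # decoded from the third bitstream
enhanced = select_mapping(38.0, mappings)(third_raw)
```

With a stream compressed at a ratio of 38, the mapping built for a ratio of 40 is selected.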
[0026] In a third aspect, this application provides an image processing apparatus, comprising: a first decoding unit, configured to decode a first original RAW image from a first encoded bitstream, the first RAW image comprising any one of the following: an image in color filter array (CFA) format, an image in RGGB format, or an image in YCoCgDg format, wherein R represents the red component, G represents the green component, B represents the blue component, Y represents the luminance component, Co and Cg represent the color components, and Dg represents the difference component; an image signal processing unit, configured to perform image signal processing (ISP) on the first RAW image to obtain a first RGB image; an encoding unit, configured to encode the first RAW image to obtain a second encoded bitstream; and a second decoding unit, configured to decode a second RAW image from the second encoded bitstream, wherein the first RGB image and the second RAW image satisfy a first mapping relationship. For example, the image processing apparatus may further include a mapping unit, configured to construct the first mapping relationship based on the first RGB image and the second RAW image.
[0027] In conjunction with the third aspect, in one possible implementation, the first mapping relationship is used to obtain the second RGB image corresponding to the third RAW image, wherein the third RAW image is decoded from the third encoded bitstream, and the third compression ratio of the third encoded bitstream and the second compression ratio of the second encoded bitstream are greater than the first compression ratio of the first encoded bitstream.
[0028] In conjunction with the third aspect, in one possible implementation, the mapping unit can, for example, be used to train a first artificial intelligence (AI) neural network model based on the first RGB image and the second RAW image. This first AI neural network model is used to characterize a first mapping relationship between the RAW image decoded from the encoded bitstream and the corresponding RGB image. It should be understood that the AI neural network model is merely an example of how the mapping relationship can be implemented and does not constitute any limitation.
[0029] In conjunction with the third aspect, in one possible implementation, the encoding unit is further configured to: encode the first RAW image to obtain a fourth encoded bitstream; the second decoding unit is further configured to: decode the fourth encoded bitstream to obtain a fourth RAW image; wherein, the first RGB image and the fourth RAW image satisfy a second mapping relationship. For example, the mapping unit constructs the second mapping relationship based on the first RGB image and the fourth RAW image. The fourth encoded bitstream is lossy compressed, and the fourth RAW image decoded from the fourth encoded bitstream is lossy RAW data.
[0030] In conjunction with the third aspect, in one possible implementation, the second mapping relationship is used to obtain the third RGB image corresponding to the fifth RAW image, wherein the fifth RAW image is decoded from the fifth encoded bitstream, and the fifth compression ratio of the fifth encoded bitstream and the fourth compression ratio of the fourth encoded bitstream are greater than the first compression ratio of the first encoded bitstream.
[0031] In conjunction with the third aspect, in one possible implementation, the mapping unit may be used, for example, to train a second AI neural network model based on the first RGB image and the fourth RAW image, the second AI neural network model being used to characterize a second mapping relationship between the RAW image decoded from the encoded bitstream and the corresponding RGB image.
[0032] In conjunction with the third aspect, in one possible implementation, the apparatus further includes an acquisition unit for acquiring the first encoded bitstream from a local storage medium; or, receiving the first encoded bitstream from a terminal device.
[0033] In conjunction with the third aspect, in one possible implementation, the encoder of the terminal device is used to perform encoding operations, and the encoder supports images with a bit depth greater than or equal to 8 bits.
[0034] In conjunction with the third aspect, in one possible implementation, the encoding unit is used to: encode the first RAW image using any of the following encoding standards to obtain the second encoded bitstream: the H.264 encoding standard, the H.265 encoding standard, the H.266 encoding standard, or the JPEG low-latency coding standard.
[0035] In a fourth aspect, this application provides an image processing apparatus, comprising: an acquisition unit for acquiring a third encoded bitstream; a decoding unit for decoding a third RAW image from the third encoded bitstream, the third RAW image comprising any one of the following: an image in CFA format, an RGGB format image, or a YCoCgDg format image, wherein R represents the red component, G represents the green component, B represents the blue component, Y represents the luminance component, Co and Cg represent the color components, and Dg represents the difference component; and a mapping unit for acquiring a second RGB image corresponding to the third RAW image based on a first mapping relationship, wherein the first mapping relationship is a mapping relationship satisfied between the first RGB image corresponding to the first original RAW image and the second RAW image, the first RAW image being decoded from a first encoded bitstream, and the second RAW image being decoded from a second encoded bitstream, the second encoded bitstream being encoded from the first RAW image.
[0036] In conjunction with the fourth aspect, in one possible implementation, the third compression ratio of the third encoded bitstream and the second compression ratio of the second encoded bitstream are greater than the first compression ratio of the first encoded bitstream.
[0037] In conjunction with the fourth aspect, in one possible implementation, the acquisition unit is further configured to: acquire a fifth encoded bitstream; the decoding unit is further configured to decode a fifth RAW image from the fifth encoded bitstream, the fifth RAW image including any one of the following: an image in CFA format, an RGGB format image, or a YCoCgDg format image; the mapping unit is further configured to acquire a third RGB image corresponding to the fifth RAW image based on a second mapping relationship, wherein the second mapping relationship is a mapping relationship satisfied between the first RGB image corresponding to the first RAW image and the fourth RAW image, the fourth RAW image being decoded from the fourth encoded bitstream, the fourth encoded bitstream being encoded from the first RAW image.
[0038] In conjunction with the fourth aspect, in one possible implementation, the fifth compression ratio of the fifth encoded bitstream and the fourth compression ratio of the fourth encoded bitstream are greater than the first compression ratio of the first encoded bitstream.
[0039] In conjunction with the fourth aspect, in one possible implementation, the image processing device is located in a cloud device, which is further configured to perform back-end processing based on at least one of the second RGB image and the third RGB image; or, the image processing device is located in a terminal device, which is further configured to output at least one of the second RGB image and the third RGB image.
[0040] In conjunction with the fourth aspect, in one possible implementation, when the image processing device is located in a cloud device, the acquisition unit is configured to: receive the third encoded bitstream from the terminal device.
[0041] In conjunction with the fourth aspect, in one possible implementation, the encoder of the terminal device is used to perform encoding operations, and the encoder supports images with a bit depth greater than or equal to 8 bits.
[0042] In conjunction with the fourth aspect, in one possible implementation, decoding the third RAW image from the third encoded bitstream includes: decoding the third RAW image from the third encoded bitstream using any of the following coding standards: the H.264 encoding standard, the H.265 encoding standard, the H.266 encoding standard, or the JPEG low-latency coding standard.
[0043] In a fifth aspect, this application provides an electronic device including at least one processor coupled to a memory; the at least one processor is configured to execute a computer program or instructions stored in the memory to cause the electronic device to perform the method as described in the first aspect and any possible implementation thereof, or to perform the method as described in the second aspect and any possible implementation thereof.
[0044] In a sixth aspect, this application provides a computer-readable storage medium storing program code that, when run on a computer, causes the computer to perform the method as described in the first aspect and any possible implementation thereof, or to perform the method as described in the second aspect and any possible implementation thereof.
[0045] In a seventh aspect, this application provides a computer program product that, when run on a computer, causes the computer to perform the method as described in the first aspect and any possible implementation thereof, or to perform the method as described in the second aspect and any possible implementation thereof.
[0046] In an eighth aspect, embodiments of this application provide a communication system including a cloud device, the cloud device being used to implement the method as described in the first aspect and any possible implementation of the first aspect, or to implement the method as described in the second aspect and any possible implementation of the second aspect.
[0047] In conjunction with the eighth aspect, in one possible implementation, the communication system may further include a terminal device for implementing the method as described in the first aspect and any possible implementation of the first aspect, or for implementing the method as described in the second aspect and any possible implementation of the second aspect.
[0048] Based on the implementations provided in the above aspects, the embodiments of this application can be further combined to provide more implementations.
[0049] For the technical effects achievable by any possible implementation of the second to eighth aspects, refer to the technical effects of the corresponding implementations of the first aspect; repeated details are not described again.
Attached Figure Description
[0050] Figure 1 is a schematic diagram of the structure of an edge-cloud system provided in an embodiment of this application;
[0051] Figure 2 is a schematic diagram of the hardware structure of the vehicle provided in the embodiment of this application;
[0052] Figure 3 is a schematic flowchart of an image processing method provided in an embodiment of this application;
[0053] Figure 4 is a schematic flowchart of an image processing method provided in an embodiment of this application;
[0054] Figure 5 is a schematic diagram of the architecture of the training system provided in an embodiment of this application;
[0055] Figures 6a, 6b, and 7 are schematic flowcharts illustrating different examples of constructing mapping relationships provided in the embodiments of this application;
[0056] Figure 8 is a schematic diagram of the modular structure of the terminal device provided in the embodiment of this application;
[0057] Figures 9 and 10 are schematic flowcharts of the image processing method provided in the embodiments of this application;
[0058] Figure 11 is a schematic diagram of the modular structure of the cloud device provided in an embodiment of this application;
[0059] Figure 12 is a schematic flowchart of the image processing method provided in an embodiment of this application;
[0060] Figure 13 is a schematic diagram of the modular structure of the cloud device provided in an embodiment of this application;
[0061] Figure 14 is a schematic flowchart of the image processing method provided in an embodiment of this application;
[0062] Figure 15 is a schematic diagram of the structure of an image processing device according to an embodiment of this application;
[0063] Figure 16 is a schematic diagram of another image processing device according to an embodiment of this application;
[0064] Figure 17 is a schematic diagram of another image processing device according to an embodiment of this application.
Detailed Implementation
[0065] Before introducing the technical solutions provided in this application, some of the terms used in this application will be explained in order to facilitate understanding by those skilled in the art.
[0066] (1) Color filter array (CFA): a mosaic of color filters placed over the pixels of an image sensor, whose main function is to capture color information.
[0067] In this embodiment, a CFA format image contains the raw data produced when the image sensor converts the captured light signal into a digital signal. It is the original image inside the camera, such as a raw image in Bayer filter format or a raw image from another CFA configuration. A Bayer raw image can also be called a Bayer image or a raw image. "Raw" means "unprocessed": the data comes directly from the camera's charge-coupled device (CCD) or complementary metal-oxide-semiconductor (CMOS) image sensor. A Bayer raw image can therefore also be thought of as "raw image encoded data" or, more figuratively, as a "digital negative."
[0068] Furthermore, in this embodiment, the Bayer raw image includes three color components. Each pixel in the Bayer raw image has only one color component, and the value of this color component can be equivalent to the pixel value of that pixel. In one example, the three color components are red (R), blue (B), and green (G). In another example, the three color components are R, B, and yellow (Y'). The specific color components included in the Bayer raw image are related to the CFA configuration in the camera, and this embodiment does not specifically limit this.
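To make the "one color component per pixel" layout concrete, the sketch below splits a Bayer mosaic into its four sample planes. It assumes the RGGB tile order (R and G on even rows, G and B on odd rows); the function name `split_rggb` and the sample values are illustrative only.

```python
from typing import List, Tuple

Plane = List[List[int]]


def split_rggb(mosaic: Plane) -> Tuple[Plane, Plane, Plane, Plane]:
    """Split an RGGB Bayer mosaic into R, G1, G2, and B planes.

    Assumed 2x2 tile layout:  R  G1
                              G2 B
    """
    r = [row[0::2] for row in mosaic[0::2]]
    g1 = [row[1::2] for row in mosaic[0::2]]
    g2 = [row[0::2] for row in mosaic[1::2]]
    b = [row[1::2] for row in mosaic[1::2]]
    return r, g1, g2, b


# Invented 4x4 mosaic: each pixel stores exactly one color sample.
mosaic = [
    [10, 20, 11, 21],
    [30, 40, 31, 41],
    [12, 22, 13, 23],
    [32, 42, 33, 43],
]
r, g1, g2, b = split_rggb(mosaic)
```

Each plane has half the width and half the height of the mosaic, which is one reason a four-plane packing is a convenient input for standard encoders.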
[0069] (2) RGB color space and YUV color space:
[0070] Typically, images are composed of the smallest unit, pixels, and each pixel's information is composed of RGB information of different brightness levels. That is, the original image can include information from three components: red (R), green (G), and blue (B). If RGB signals are used directly for image signal transmission, it is incompatible with black and white televisions and consumes a lot of bandwidth. Therefore, traditional image signal processing (ISP) devices convert the acquired RAW image into an RGB image through a series of linear and non-linear steps, and then convert the RGB image from the RGB color space to the YUV color space for transmission.
[0071] In the YUV color space, image information consists of one luminance value and two chrominance values. Luminance is represented by Y; chrominance, which encodes hue and saturation, is represented by U and V. When transmitting image signals, analog component video or YUV signals must be digitally sampled, which means sampling both the luminance and the chrominance information. Commonly used YUV sampling formats include YUV444, YUV422, YUV420, and YUV411.
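The effect of the sampling formats on data volume can be worked through numerically. The sketch below counts samples per frame for each format and converts one RGB pixel to YUV using the commonly cited BT.601-derived coefficients; this application does not specify a conversion matrix, so the coefficients and the function names are assumptions.

```python
from typing import Tuple

# How many luma samples share one sample of each chroma plane.
CHROMA_DIVISOR = {"YUV444": 1, "YUV422": 2, "YUV420": 4, "YUV411": 4}


def samples_per_frame(width: int, height: int, scheme: str) -> int:
    """Total Y + U + V samples for one frame under the given sampling format."""
    luma = width * height
    chroma_plane = luma // CHROMA_DIVISOR[scheme]
    return luma + 2 * chroma_plane


def rgb_to_yuv(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """Convert one RGB pixel to YUV (assumed BT.601-style coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v


counts = {s: samples_per_frame(1920, 1080, s) for s in CHROMA_DIVISOR}
y, u, v = rgb_to_yuv(255, 0, 0)  # pure red
```

For a 1920x1080 frame, YUV420 and YUV411 both carry half as many samples as YUV444; they differ only in how the chroma samples are laid out.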
[0072] (3) Image coding and image decoding: Image coding, also known as image compression, refers to the technique of representing an image or the information contained in an image with a smaller number of bits while meeting certain quality requirements (such as signal-to-noise ratio requirements or subjective evaluation scores). Image decoding is the reverse process of image coding and can also be called image decompression.
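The two quantities that this definition trades off, bit count and quality, can be computed as below. Peak signal-to-noise ratio (PSNR) is used as one common signal-to-noise measure; the helper names and sample values are illustrative and the peak value of 255 assumes 8-bit samples.

```python
import math
from typing import Sequence


def compression_ratio(original_bytes: int, compressed_bytes: int) -> float:
    """Size before compression divided by size after compression."""
    return original_bytes / compressed_bytes


def psnr(original: Sequence[int], decoded: Sequence[int], peak: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized sample lists."""
    mse = sum((o - d) ** 2 for o, d in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical images: lossless round trip
    return 10 * math.log10(peak ** 2 / mse)


# Invented 8-bit samples before and after a lossy round trip.
original = [100, 60, 104, 62]
decoded = [96, 56, 104, 56]
quality_db = psnr(original, decoded)
ratio = compression_ratio(original_bytes=4_000_000, compressed_bytes=200_000)
```

A lossless round trip yields an infinite PSNR, while higher compression ratios generally push the PSNR down.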
[0073] (4) Video Encoding and Video Decoding: Video encoding, also known as video compression, processes a continuous sequence of images, i.e., video frames. Video encoding not only needs to compress each frame of the image, but also needs to process redundant information between frames through inter-frame coding techniques to achieve a higher compression ratio. Video decoding needs to decode each frame of the image to restore its original pixel data and process the temporal relationship between frames to achieve smooth video playback.
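Inter-frame redundancy can be illustrated with plain frame differencing. This is a heavily simplified stand-in for real inter-frame coding (no motion estimation, transform, or entropy coding), and the frames are invented one-dimensional rows of samples.

```python
from typing import List


def residual(previous: List[int], current: List[int]) -> List[int]:
    """Difference between the current frame and the previous frame."""
    return [c - p for p, c in zip(previous, current)]


def reconstruct(previous: List[int], res: List[int]) -> List[int]:
    """Rebuild the current frame from the previous frame plus the residual."""
    return [p + r for p, r in zip(previous, res)]


previous_frame = [10, 10, 10, 50, 50, 50]
current_frame = [10, 10, 12, 50, 50, 50]   # only one sample changed

res = residual(previous_frame, current_frame)
changed = sum(1 for r in res if r != 0)
rebuilt = reconstruct(previous_frame, res)
```

Because most residual samples are zero, the residual is much cheaper to encode than the full frame, which is the source of the higher compression ratio.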
[0074] (5) RAW Encoding and Compression Technology: Traditional standard encoding and compression technologies, such as H.266 / H.265 / H.264 / Joint Photographic Experts Group (JPEG) video / image encoding standards, only provide solutions for certain color formats (such as RGB or YUV formats). With the development of RAW sensing technology, RAW encoding and compression technology has emerged. This involves processing RAW data (non-traditional ISP processing), reusing traditional standard encoding and compression technologies for encoding, and then storing or transmitting the data. This allows subsequent backend processing using the decoded RAW data, thereby maximizing the informational advantages of RAW data.
[0075] (6) Image enhancement: This is a part of image processing, a technique to improve the readability of photographic images, also known as quality enhancement. The purposes of image enhancement include: ① Increasing image resolution, making already visible details clearer, and revealing details that are not yet easily seen, highlighting details and making full use of useful information. ② Enhancing image contrast, making the image easier to interpret. Image enhancement can be performed using optical methods or computers. In practical applications, depending on the nature of the work and the task at hand, optical enhancement can be combined with computer-aided enhancement.
[0076] Image enhancement techniques for compressed videos can include single-frame image enhancement and multi-frame image enhancement. Single-frame image enhancement focuses on processing each frame of the video sequence individually, while multi-frame image enhancement also needs to utilize the temporal information between the frames of the video to improve image quality.
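One very simple way to use temporal information is to combine co-located samples across neighboring frames. The sketch below takes a per-pixel temporal median, assuming the frames are already aligned (no motion); it is an illustrative technique chosen for brevity, not the enhancement method of this application.

```python
from statistics import median
from typing import List


def temporal_median(frames: List[List[int]]) -> List[int]:
    """Per-pixel median across aligned frames; suppresses single-frame outliers."""
    return [median(samples) for samples in zip(*frames)]


# Three invented frames of the same static scene; the middle frame
# contains one corrupted sample (90) at the last position.
frames = [
    [10, 20, 30],
    [12, 20, 90],
    [11, 22, 31],
]
enhanced = temporal_median(frames)
```

A single-frame method sees only one row of `frames` at a time and therefore cannot tell that the value 90 is an outlier.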
[0077] (7) Deep learning image signal processing (ISP) model: This is a model that uses deep learning algorithms to improve image quality. This model can improve image sharpness, color accuracy, dynamic range and overall visual quality.
[0078] In fields such as advanced driver assistance systems (ADAS), automated driving systems (ADS), and intelligent driving systems, the increasing resolution, frame rate, and bit depth of cameras lead to a growing demand for bandwidth in video image output (e.g., RAW images captured by cameras). To alleviate the pressure on video image transmission, image encoding methods are typically used to reduce bandwidth requirements. Simultaneously, to meet the paramount safety requirements of autonomous driving, image encoding needs to satisfy compression requirements such as low latency, low complexity, and high compression performance. However, a higher compression ratio results in greater quality loss of the decompressed image / video, leading to poorer performance in backend processing. Therefore, it is necessary to perform image enhancement on the decompressed images (including video frames) to restore or improve the quality of the corresponding images.
[0079] Current image enhancement techniques target standard lossy coding technologies such as H.266 / H.265 / H.264 / JPEG and process images in RGB or YUV format. Although encoding and compression techniques developed for raw image formats (also known as RAW images) can reuse these standard coding technologies, they cannot directly use the image enhancement techniques designed for those standards. Therefore, there is an urgent need for an image enhancement technique specifically for RAW data to reduce compression information loss and improve the closed-loop efficiency and benefits of RAW data.
[0080] To address the aforementioned problems, this application provides an image processing method and apparatus for enhancing RAW data, thereby reducing compression information loss and improving the closed-loop efficiency and benefits of RAW data. The method and apparatus are based on the same technical concept. Since the principles underlying the problems solved by the method and apparatus are similar, their implementations can be mutually referenced, and repeated details will not be elaborated upon. Furthermore, in the various embodiments of this application, unless otherwise specified or logically conflicting, the terminology and / or descriptions between the embodiments are consistent and can be mutually referenced. Technical features in different embodiments can be combined to form new embodiments based on their inherent logical relationships.
[0081] It should be noted that in the embodiments of this application, "at least one" refers to one or more, and "more than one" refers to two or more. "And / or" describes the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A alone, A and B simultaneously, or B alone, where A and B can be singular or plural. The character " / " generally indicates that the preceding and following related objects are in an "or" relationship. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one of a, b, or c can represent: a, b, c, a and b, a and c, b and c, or a and b and c, where a, b, and c can be single or multiple.
[0082] Furthermore, unless otherwise specified, the ordinal numbers such as "first" and "second" mentioned in the embodiments of this application are used to distinguish multiple objects and are not used to limit the priority or importance of multiple objects. For example, "first RAW image" and "second RAW image" are only used to distinguish different RAW images, and do not indicate that the two RAW images have different priorities or importance.
[0083] This application embodiment can be applied to scenarios where terminal devices capture images or videos and upload them to cloud devices, i.e., edge-cloud scenarios. Referring to Figure 1, it is a schematic diagram of the structure of an edge-cloud system provided by this application embodiment. In edge-cloud, "edge" refers to the terminal device, and "cloud" refers to the cloud device. The cloud device can also be a cloud server or a cloud platform. The cloud device can have massive computing power and massive storage capabilities. The edge-cloud system may include: a terminal device 110 and a cloud device 120, and the terminal device 110 can connect to the cloud device 120 via a wireless network.
[0084] In one embodiment, the cloud device 120 may be a computer server or a server cluster consisting of multiple servers. This application does not limit the implementation architecture of the cloud device 120. The terminal device 110 may be a device with camera and network functions. The terminal device 110 may also have computing processing functions. For example, the terminal device 110 may be a smartphone, tablet, wearable device, in-vehicle device, augmented reality (AR) / virtual reality (VR) device, laptop, ultra-mobile personal computer (UMPC), netbook, personal digital assistant (PDA), or other mobile terminals. Alternatively, it may be a digital camera, SLR / mirrorless camera, action camera, gimbal camera, drone, or other professional shooting equipment. This application does not limit the specific type of terminal device. In optional embodiments, the number of terminal devices 110 included in the edge-cloud system may be one or more. The types of multiple terminal devices 110 may be the same or different.
[0085] In one example, taking terminal device 110 as a vehicle, the hardware structure of the vehicle is described. See Figure 2, which shows a schematic diagram of the hardware structure of a vehicle.
[0086] The vehicle may include a processor 210, an external memory interface 220, an internal memory 221, an automotive bus interface 230, a communication module 240, a sensing system 250, and a display screen 260. The automotive bus interface 230 may include, but is not limited to, a controller area network (CAN) bus interface, a FlexRay bus interface, and a LIN bus interface. Through these various bus interfaces, interconnection between various components within the vehicle and between the vehicle and peripheral devices can be achieved. The vehicle can also communicate with servers, user mobile phones, or other vehicles via the communication module 240 and a network. The sensing system 250 may include at least one sensor. The processor 210 can use the perception information provided by at least one sensor to identify the vehicle's environment or scene, thereby assisting the vehicle in achieving autonomous driving or intelligent assisted driving functions. For example, the sensing system 250 may include, but is not limited to, at least one of the following: image sensors (e.g., cameras), light sensors, distance sensors, LiDAR (light detection and ranging), millimeter-wave radar (RADAR), etc. The number of different types of sensors may be one or more, and the deployment location may be inside or outside the vehicle. For example, in some embodiments, the vehicle may include N cameras, where N is a positive integer greater than 1. For example, external cameras may include front-facing cameras, rear-facing cameras, or 360-degree surround-view cameras. Or, for example, internal cameras may include cockpit cameras.
[0087] It is understood that the structure illustrated in the embodiments of this application does not constitute a specific limitation on the terminal device 110. In other embodiments of this application, the terminal device may include more or fewer components than illustrated, or combine some components, or split some components, or have different component arrangements. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
[0088] In specific implementation, taking a vehicle as an example, the processor 210 can be deployed on relevant vehicle-mounted equipment, such as in the vehicle's mobile data center (MDC) or cockpit domain controller (CDC), or vehicle control unit (VCU), or vehicle domain controller (VDC), or advanced driver assistance system (ADAS) domain controller, or on the control unit of other components of the vehicle. This application embodiment does not limit the product form and deployment method of the processor.
[0089] Processor 210 may include one or more processing units, such as: application processor (AP), modem processor, graphics processing unit (GPU), image signal processor (ISP), controller, codec, digital signal processor (DSP), baseband processor, or neural network processing unit (NPU), etc. Different processing units may be independent devices or integrated into one or more processors; this application embodiment does not specifically limit this.
[0090] For example, the controller can generate operation control signals based on the instruction opcode and timing signals to complete the control of fetching and executing instructions.
[0091] The processor 210 may also include a memory for storing instructions and data. In some embodiments, the memory in the processor 210 is a cache memory. This memory can store instructions or data that the processor 210 has just used or that are used repeatedly. If the processor 210 needs to use the instruction or data again, it can directly retrieve it from this memory, avoiding repeated accesses, reducing the processor 210's waiting time, and thus improving system efficiency.
[0092] In one example, the terminal device can achieve image / video capture and image / video processing through at least one of the following: camera, ISP, DSP, codec, GPU, display 260, and application processor.
[0093] For example, a camera can be used to capture still images or videos. An object is projected through a lens onto a photosensitive element, which can be a charge-coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the light signal into an electrical signal, which is then transmitted to the ISP in the processor 210 for conversion into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into image signals in standard formats such as RGB and YUV. In some embodiments, the processor 210 can trigger the camera to start according to a program or instruction in the internal memory 221, so that the camera captures at least one image, and can perform corresponding processing on the at least one image according to the program or instruction, such as image post-processing (e.g., skin smoothing, super-resolution processing to improve clarity). After processing, the processed image can be displayed on the display screen 260.
[0094] In some embodiments, the ISP in processor 210 can be used to process data fed back from the image sensor. For example, when taking a picture, the shutter is opened, and light is transmitted through the lens to the camera's photosensitive element. The light signal is converted into an electrical signal, and the camera's photosensitive element then performs analog-to-digital (A / D) conversion on the electrical signal, outputting the corresponding digital signal. The digital signal is passed to the ISP for processing to convert it into an image visible to the naked eye. The digital signal output by the camera sensor to the ISP can be understood as the raw image captured by the camera, i.e., a RAW image. The ISP can perform ISP processing on the RAW image to ultimately generate a YUV format image.
[0095] For example, ISP processing may include: black frame correction, bad pixel correction (DPC), RAW domain noise reduction, black level correction (BLC), lens shading correction (LSC), auto white balance (AWB) gain, green balance correction, demosaic color interpolation, color correction matrix (CCM), dynamic range compression (DRC), gamma, 3D lookup table (LUT), YUV domain noise reduction, sharpening, and detail enhancement. The ISP can also optimize parameters such as exposure and color temperature of the shooting scene.
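As a rough illustration of how a few of these stages chain together, the following sketch applies black level correction, white balance gains, a simplified demosaic, and gamma to a small Bayer mosaic. Everything specific in it is an assumption made for the example: the RGGB layout, the 10-bit white level of 1023, the black level of 64, the gain values, the gamma of 2.2, and the half-resolution demosaic that collapses each 2x2 cell into one RGB pixel. A production ISP uses full-resolution interpolation and many more stages.

```python
from typing import List, Tuple

Mosaic = List[List[float]]                      # Bayer mosaic, RGGB layout assumed
Rgb = List[List[Tuple[float, float, float]]]    # rows of (R, G, B) pixels


def black_level_correction(raw: List[List[int]], black_level: int) -> Mosaic:
    """Subtract the (assumed) sensor black level and clamp at zero."""
    return [[float(max(v - black_level, 0)) for v in row] for row in raw]


def apply_awb_gains(raw: Mosaic, r_gain: float, b_gain: float) -> Mosaic:
    """Scale R sites (even row, even column) and B sites (odd row, odd column)."""
    out = []
    for y, row in enumerate(raw):
        new_row = []
        for x, v in enumerate(row):
            if y % 2 == 0 and x % 2 == 0:
                v = v * r_gain
            elif y % 2 == 1 and x % 2 == 1:
                v = v * b_gain
            new_row.append(v)
        out.append(new_row)
    return out


def demosaic_half(raw: Mosaic) -> Rgb:
    """Simplified demosaic: each 2x2 RGGB cell becomes one RGB pixel."""
    out = []
    for y in range(0, len(raw) - 1, 2):
        row = []
        for x in range(0, len(raw[0]) - 1, 2):
            r = raw[y][x]
            g = (raw[y][x + 1] + raw[y + 1][x]) / 2.0  # average the two greens
            b = raw[y + 1][x + 1]
            row.append((r, g, b))
        out.append(row)
    return out


def gamma_encode(rgb: Rgb, scale: float, gamma: float) -> Rgb:
    """Normalise to [0, 1], clip, and apply a power-law gamma."""
    return [
        [tuple(min(c / scale, 1.0) ** (1.0 / gamma) for c in px) for px in row]
        for row in rgb
    ]


def simple_isp(raw, black_level=64, white_level=1023,
               r_gain=2.0, b_gain=1.5, gamma=2.2) -> Rgb:
    """Chain the stages in the order BLC -> AWB gain -> demosaic -> gamma."""
    corrected = black_level_correction(raw, black_level)
    balanced = apply_awb_gains(corrected, r_gain, b_gain)
    linear_rgb = demosaic_half(balanced)
    return gamma_encode(linear_rgb, float(white_level - black_level), gamma)
```

The stage order follows the order in which the stages are listed above; the remaining stages (for example LSC, CCM, and DRC) are omitted to keep the sketch short.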
[0096] In some embodiments, some functions of the ISP can be located in the image sensor, while other ISP processing can be retained in the ISP device of the processor. For example, the black frame correction, bad pixel correction, RAW domain noise reduction, lens shading correction, black level correction, auto white balance gain, green balance correction, DRC, and other ISP processing described above can be located in the camera or other sensors with video recording capabilities. Optionally, the ISP's function of optimizing parameters such as exposure and color temperature of the shooting scene can also be integrated into the camera end. The demosaic color interpolation, color correction, gamma correction, 3D lookup table, YUV domain noise reduction, sharpening, and detail enhancement processing described above can be located in the processor.
[0097] Digital signal processors (DSPs) are used to process digital signals, including digital image signals and other digital signals. For example, when a terminal device selects a frequency, a DSP can perform a Fourier transform on the frequency energy.
[0098] Codecs can be used to compress or decompress digital video. In this embodiment, the terminal device can support one or more video codecs. Thus, the terminal device can play or record videos in various encoding formats, such as Moving Picture Experts Group (MPEG) 1, MPEG 2, MPEG 3, MPEG 4, etc.
[0099] An NPU (Neural Processing Unit) is a computational processor for neural networks (NNs). By borrowing the structure of biological neural networks, such as the transmission patterns between neurons in the human brain, it can rapidly process input information and continuously learn on its own. NPUs enable intelligent cognitive applications in terminal devices, such as image recognition, facial recognition, speech recognition, and text understanding.
[0100] The external memory interface 220 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the terminal device. The external memory card communicates with the processor 210 through the external memory interface 220 to perform data storage functions. For example, music, video, and other files can be saved on the external memory card.
[0101] Internal memory 221 can be used to store executable program code, including instructions. Internal memory 221 may include a program storage area and a data storage area. The program storage area may store the operating system, at least one application program required for a function (such as a camera application), etc. The data storage area may store data created during the use of the terminal device (such as images or videos captured by a camera), etc. Furthermore, internal memory 221 may include high-speed random access memory, and may also include non-volatile memory, such as at least one disk storage device, flash memory device, universal flash storage (UFS), etc. Processor 210 executes various functional applications and data processing of the terminal device by running instructions stored in internal memory 221 and / or instructions stored in memory located within the processor.
[0102] It is understood that the structure shown in Figure 2 does not constitute a specific limitation on the terminal device. In some embodiments, the terminal device may also include more or fewer components than those shown in Figure 2, or combine some components, or split some components, or have different component arrangements, etc. Alternatively, some of the components shown in Figure 2 may be implemented in hardware, software, or a combination of software and hardware.
[0103] Additionally, when the terminal device is other mobile terminals such as tablets, wearable devices, AR / VR devices, laptops, UMPCs, netbooks, and PDAs, or professional shooting equipment such as digital cameras, SLR / mirrorless cameras, action cameras, gimbal cameras, and drones, the specific structure of these other terminal devices can also be referenced in Figure 2. For example, other terminal devices may have components added or removed based on the structure given in Figure 2, which will not be described in detail here.
[0104] It should also be understood that one or more camera applications may run on the terminal device to enable the taking of photos. For example, the camera application may include a system-level application called "Camera." Alternatively, the camera application may include other applications installed on the terminal device that can be used for taking photos. Furthermore, the camera application may be other types of applications that also have photo-taking functionality. This application does not specifically limit these aspects.
[0105] In one embodiment of this application, the cloud device 120 in Figure 1 can construct one or more mapping relationships and provide the constructed one or more mapping relationships to the terminal device 110 for use, so as to provide a solution for quality enhancement of the decoded RAW data on the terminal device 110 side, thereby reducing the loss of compressed information in the RAW data. Alternatively, the terminal device 110 can also send the encoded bitstream obtained after encoding the RAW data to the cloud device 120. The cloud device 120 can receive the encoded bitstream from the terminal device 110 and decode it. The cloud device 120 can then directly use the one or more mapping relationships to enhance the quality of the decoded RAW data, so as to reduce the loss of compressed information in the RAW data and improve the accuracy of back-end processing.
[0106] In the cloud device 120, encoding and / or decoding are required in different processing flows during the mapping relationship construction and application phases. For ease of distinction, the encoding process used for data acquisition during the construction phase can be referred to as the first encoding process in the following text. The cloud device can use the first decoding process corresponding to the first encoding process to decode the acquired first encoded bitstream (including the first encoded bitstream received in real time or the first encoded bitstream obtained from the local storage medium) and obtain the first RAW image from the first encoded bitstream.
[0107] When constructing a mapping relationship (e.g., denoted as a first mapping relationship), the cloud device can also perform a second encoding process on the first RAW image decoded from the first encoded bitstream to obtain a second encoded bitstream, and then use the corresponding second decoding process to decode the second encoded bitstream, and use the decoded second RAW image to construct the first mapping relationship. In the application phase, the terminal device can use the second encoding process to encode its own obtained image / video stream to obtain a third encoded bitstream, and store the third encoded bitstream, or send the third encoded bitstream to the cloud device. The cloud device can use the second decoding process to decode the third encoded bitstream to obtain a third RAW image, and based on the first mapping relationship, obtain the RGB image corresponding to the third RAW image for subsequent processing. The terminal device itself can also obtain the third encoded bitstream from local storage media, use the second decoding process to decode the third encoded bitstream to obtain a third RAW image, and based on the first mapping relationship, obtain the RGB image corresponding to the third RAW image for outputting the RGB image.
[0108] Similarly, when constructing another mapping relationship (e.g., represented as a second mapping relationship), the cloud device can, for example, perform a third encoding process on the first RAW image decoded from the first encoded bitstream to obtain a fourth encoded bitstream, and then use the third decoding process corresponding to the third encoding process to decode the fourth encoded bitstream, and use the decoded fourth RAW image to construct the second mapping relationship. In the application phase, the terminal device can use the third encoding process to encode its own obtained image / video stream to obtain a fifth encoded bitstream, and store the fifth encoded bitstream, or send the fifth encoded bitstream to the cloud device. The cloud device can use the third decoding process to decode the fifth encoded bitstream to obtain a fifth RAW image, and based on the second mapping relationship, obtain the RGB image corresponding to the fifth RAW image for subsequent processing. The terminal device itself can also obtain the fifth encoded bitstream from local storage media, use the third decoding process to decode the fifth encoded bitstream to obtain a fifth RAW image, and based on the second mapping relationship, obtain the RGB image corresponding to the fifth RAW image for outputting the RGB image.
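One way to picture the relationship between encoding processes and mapping relationships is a lookup keyed by the encoding process, so that a RAW image decoded from a given bitstream is always enhanced with the mapping constructed for that same process. The sketch below is only an illustration under that assumption. The class `MappingRegistry`, the string keys, and the representation of a mapping as a Python callable are all hypothetical; the application does not limit the form of a mapping relationship.

```python
from typing import Callable, Dict, List, Tuple

RawImage = List[List[int]]
RgbImage = List[List[Tuple[int, int, int]]]
Mapping = Callable[[RawImage], RgbImage]   # hypothetical form of a mapping relationship


class MappingRegistry:
    """Holds one mapping relationship per encoding process."""

    def __init__(self) -> None:
        self._by_process: Dict[str, Mapping] = {}

    def register(self, encoding_process: str, mapping: Mapping) -> None:
        """Construction phase: store the mapping built for this encoding process."""
        self._by_process[encoding_process] = mapping

    def enhance(self, encoding_process: str, decoded_raw: RawImage) -> RgbImage:
        """Application phase: apply the mapping that matches the bitstream's encoding."""
        if encoding_process not in self._by_process:
            raise KeyError(f"no mapping constructed for {encoding_process!r}")
        return self._by_process[encoding_process](decoded_raw)


# Usage with trivial stand-in mappings (a real mapping could be a trained model).
registry = MappingRegistry()
registry.register("second", lambda raw: [[(v, v, v) for v in row] for row in raw])
registry.register("third", lambda raw: [[(v + 1, v + 1, v + 1) for v in row] for row in raw])
rgb_for_third_raw = registry.enhance("second", [[1, 2]])   # uses the first mapping
rgb_for_fifth_raw = registry.enhance("third", [[1, 2]])    # uses the second mapping
```

Keying the lookup by encoding process mirrors the pairing described above: the first mapping relationship is used with the second encoding and decoding process, and the second mapping relationship with the third.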
[0109] It should be understood that the above description of the mapping relationship constructed by the cloud device 120 is merely an example and does not constitute any limitation. In other embodiments, one or more mapping relationships may be constructed by the terminal device and provided for use by itself or the cloud device. This application embodiment does not specifically limit this. The implementation details of the terminal device constructing one or more mapping relationships are similar to those of the cloud device, so the two descriptions can be cross-referenced and the details are not repeated here.
[0110] To facilitate understanding, the following section takes the construction of one or more mapping relationships on a cloud device as an example and provides a detailed introduction with reference to the accompanying drawings.
[0111] During the mapping relationship construction phase, as shown in Figure 3, the cloud device can perform the following steps:
[0112] S310: Decode the first RAW image from the first encoded bitstream.
[0113] In this embodiment of the application, the first encoded bitstream may be, for example, an encoded bitstream of multiple still images, or an encoded bitstream of a video stream. Accordingly, the number of the first RAW images may be multiple.
[0114] When implementing S310, the cloud device can receive a first encoded bitstream from a terminal device, which can be the terminal device shown in Figure 1 or Figure 2, or a specific acquisition device. Alternatively, when implementing S310, the cloud device can obtain the first encoded bitstream from a local storage medium, where the stored first encoded bitstream was previously obtained from a specific acquisition device. This application embodiment does not specifically limit the method of obtaining the first encoded bitstream.
[0115] The specific acquisition device refers to a device capable of acquiring real images or video streams, which can be used to provide a first encoded bitstream to the cloud device 120. The encoder of the acquisition device can be a hardware encoder used to perform encoding operations. The encoder supports an image bit depth greater than or equal to 8 bits. The encoder can adopt any of the following encoding standards: H.264, H.265, H.266, or JPEG low-latency encoding.
[0116] For example, as shown in Figure 4, the acquisition device may include, but is not limited to, the following types of devices:
[0117] The vehicle is equipped with cameras, sensors, and other devices to collect environmental information around the vehicle and obtain images or video streams. The camera device can be a monocular camera, a binocular camera, or similar. The camera's field of view can be the vehicle's external or internal environment. The vehicle's sensors can include radar such as lidar, millimeter-wave radar, and ultrasonic radar for acquiring environmental information, and can also include an inertial navigation system (e.g., a global navigation satellite system (GNSS), an inertial measurement unit (IMU), etc.) for acquiring the vehicle's pose. This application does not specifically limit the implementation of the vehicle's camera and sensor devices in this embodiment.
[0118] A roadside unit (RSU) is a communication-enabled device installed on one or both sides of a road. Typically, an RSU establishes a connection with the onboard unit (OBU) of a vehicle as it passes, enabling vehicle identification. In this embodiment, the RSU may also encapsulate functional modules such as cameras, radar, or laser emitters, allowing it to monitor the road in real time and obtain images or video streams.
[0119] A third-party server refers to a server set up in a third-party organization that is independent of the cloud device 120 and the terminal device 110 and has sharing functions. In the field of vehicle networking, third-party organizations may include, for example, managers of the transportation system, such as the Ministry of Transport, or urban transportation management offices or bureaus at all levels. Alternatively, they may include managers of other systems, such as map providers or entertainment media streaming providers. This application does not specifically limit the implementation method of the acquisition device in its embodiments.
[0120] The aforementioned acquisition device can perform a first encoding process on the acquired image or video stream to obtain a first encoded bitstream, and then send the first encoded bitstream to the cloud device.
[0121] In this embodiment, the acquisition device can also adopt the structure shown in Figure 2. Specifically, the acquisition device may include an image acquisition module and a preprocessing module. The image acquisition module can be, for example, the image sensor shown in Figure 2, and the preprocessing module can be, for example, a module integrated into the processor 210 shown in Figure 2. Before implementing S310, the image acquisition module in the acquisition device can acquire RAW data corresponding to the current scene in real time, such as an image in CFA format. The preprocessing module can directly perform a first encoding process on the CFA format image to obtain a first encoded bitstream. Alternatively, for example, the preprocessing module can also perform a first encoding process on other formats of images obtained after preprocessing the CFA format image to obtain a first encoded bitstream. For example, after preprocessing the CFA format image, an RGGB format image, a GRGB format image, or a YCoCgDg format image can be obtained, where R represents the red component, G represents the green component, B represents the blue component, Y represents the luminance component, Co and Cg represent the color components, and Dg represents the difference component. Accordingly, the first RAW image decoded by the cloud device from the first encoded bitstream may include any of the following: an image in CFA format, an image in RGGB format, an image in GRGB format, or an image in YCoCgDg format.
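One plausible reading of the preprocessing from a CFA format image to an RGGB format image is a rearrangement of the mosaic into four half-resolution component planes. The sketch below shows that rearrangement and its inverse. It assumes an RGGB Bayer layout (R at even row and even column) and even image dimensions; the function names are hypothetical and the application does not prescribe this exact packing.

```python
from typing import List, Tuple

Plane = List[List[int]]


def split_bayer_rggb(cfa: Plane) -> Tuple[Plane, Plane, Plane, Plane]:
    """Split an RGGB mosaic into R, G1, G2 and B planes of half width and height."""
    if len(cfa) % 2 or len(cfa[0]) % 2:
        raise ValueError("mosaic dimensions must be even")
    r = [row[0::2] for row in cfa[0::2]]    # even rows, even columns
    g1 = [row[1::2] for row in cfa[0::2]]   # even rows, odd columns
    g2 = [row[0::2] for row in cfa[1::2]]   # odd rows, even columns
    b = [row[1::2] for row in cfa[1::2]]    # odd rows, odd columns
    return r, g1, g2, b


def merge_bayer_rggb(r: Plane, g1: Plane, g2: Plane, b: Plane) -> Plane:
    """Inverse of split_bayer_rggb: interleave the four planes back into a mosaic."""
    cfa = []
    for r_row, g1_row, g2_row, b_row in zip(r, g1, g2, b):
        top, bottom = [], []
        for rv, g1v, g2v, bv in zip(r_row, g1_row, g2_row, b_row):
            top += [rv, g1v]
            bottom += [g2v, bv]
        cfa += [top, bottom]
    return cfa
```

Because the split is a pure rearrangement, merging the planes reproduces the original mosaic exactly, so this preprocessing step by itself loses no information.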
[0122] Taking the preprocessing of a CFA format image to obtain a YCoCgDg format image as an example, Y represents the luminance component, Co and Cg represent the chromaticity components, and Dg represents the difference component. The YCoCgDg format image can be obtained by directly performing a color gamut transformation on a Bayer raw format image, without needing to convert to RGB format via demosaicing or color difference. Alternatively, the preprocessing module can perform a non-linear transformation on the Bayer raw format image to obtain a transformed RAW image, then perform a color gamut transformation on the transformed RAW image to obtain four component images in the YCoCgDg format, and finally stitch the four component images together according to a set stitching method to obtain the final YCoCgDg format image.
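The text does not give the formula of the color gamut transformation or the stitching layout, so the following sketch fills both in with assumptions. For the transformation it uses a reversible integer lifting scheme in the style of YCoCg-R, extended with a difference term Dg = G1 - G2; for the stitching it tiles the four component planes as quadrants of one frame. Both choices, and all function names, are hypothetical stand-ins. The optional non-linear transformation is omitted.

```python
from typing import List, Tuple

Plane = List[List[int]]


def forward_cell(r: int, g1: int, g2: int, b: int) -> Tuple[int, int, int, int]:
    """Assumed transform of one Bayer cell (R, G1, G2, B) to (Y, Co, Cg, Dg)."""
    dg = g1 - g2              # difference between the two green samples
    g = g2 + (dg >> 1)        # floor average of G1 and G2
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg, dg


def inverse_cell(y: int, co: int, cg: int, dg: int) -> Tuple[int, int, int, int]:
    """Exact integer inverse of forward_cell."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = co + b
    g2 = g - (dg >> 1)
    g1 = dg + g2
    return r, g1, g2, b


def to_ycocgdg_planes(r: Plane, g1: Plane, g2: Plane, b: Plane) -> List[Plane]:
    """Apply forward_cell at every position; returns the planes [Y, Co, Cg, Dg]."""
    height, width = len(r), len(r[0])
    planes = [[[0] * width for _ in range(height)] for _ in range(4)]
    for i in range(height):
        for j in range(width):
            values = forward_cell(r[i][j], g1[i][j], g2[i][j], b[i][j])
            for plane, value in zip(planes, values):
                plane[i][j] = value
    return planes


def stitch_quadrants(y: Plane, co: Plane, cg: Plane, dg: Plane) -> Plane:
    """Assumed stitching: Y top-left, Co top-right, Cg bottom-left, Dg bottom-right."""
    top = [y_row + co_row for y_row, co_row in zip(y, co)]
    bottom = [cg_row + dg_row for cg_row, dg_row in zip(cg, dg)]
    return top + bottom
```

Note that Co, Cg, and Dg can be negative in this sketch. A real encoder input would need an offset or similar adjustment to keep samples in the encoder's valid range, which is not shown here.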
[0123] By performing a nonlinear transformation on the CFA format RAW image, the data distribution of the RAW image can be made more suitable for subsequent encoding processing, thereby improving image encoding performance and reducing image encoding latency. By performing a color gamut transformation on the RAW image, correlations between different channels and redundant information between color signals can be removed, improving image compression performance and reducing computational complexity (or image processing complexity). For example, computational complexity can be reduced by more than 60%, thus achieving improved image compression performance. The RAW image obtained by stitching four components according to a set stitching method is suitable for the YUV format supported by the encoder and can directly reuse existing encoding standards. For example, the first encoding process can use any of the following encoding standards: H.264, H.265, H.266, or JPEG low-latency encoding standard.
[0124] It should be understood that this is merely an illustrative example of processing methods for RAW images of different formats and does not constitute any limitation. In specific implementations, the image acquisition module of the acquisition device may also possess some ISP functions, referred to as internal ISP processing. The CFA format image provided by the image acquisition module can be a RAW image processed by the internal ISP. For example, internal ISP processing may include at least one of the following: black frame correction, bad pixel correction, RAW domain noise reduction, lens shading correction, black level correction, auto white balance gain, green balance correction, DRC, etc. Optionally, the acquisition device may also include an ISP module independent of the image acquisition module and the preprocessing module, referred to as an external ISP processing module. As needed, this external ISP processing module can be used to perform other ISP processing on the CFA format image provided by the image acquisition module to obtain RGB images or YUV images, etc. For example, the processing performed by the external ISP processing module may include at least one of the following: demosaic color interpolation, color correction, gamma correction, 3D lookup table, YUV domain noise reduction, sharpening, detail enhancement, etc.
[0125] In this embodiment of the application, to ensure the accuracy of one or more mapping relationships constructed by the cloud device, one implementation method is to use lossless compression during data acquisition and the first encoding process on the acquisition device side. Lossless compression here includes using a first compression ratio to perform a first encoding process on the target RAW data to obtain a first encoded bitstream. The target RAW data can be, for example, an image in CFA format, RGGB format, GRGB format, or YCoCgDg format as described above. The first compression ratio is less than or equal to a first threshold, which can be, for example, 10 or other values. That is, using a smaller compression ratio for encoding processing results in minimal image quality loss due to compression and decompression, which can be considered lossless compression. It should be understood that when the first compression ratio is 0, i.e., no compression, it can be considered a special type of compression processing.
[0126] S320: Perform image signal processing (ISP) on the first RAW image to obtain the first RGB image.
[0127] For example, the first RAW image is subjected to at least one of the following processes: demosaic color interpolation, color correction, gamma correction, 3D lookup table, YUV domain noise reduction, sharpening, detail enhancement, etc., in order to convert the RAW image into an RGB image.
[0128] Since the first encoded bitstream is losslessly compressed and transmitted, the first RAW image decoded from it essentially retains the information of the original RAW image. If the original RAW image is a high-resolution or high-quality image, the first RGB image obtained after decoding and ISP processing will still be a high-quality RGB image. Based on this first RGB image, one or more mapping relationships can be constructed. In different encoding scenarios, the corresponding mapping relationships can be used to obtain (e.g., infer) high-quality RGB images corresponding to lossy RAW data, thereby ensuring the sharpness, color accuracy, dynamic range, and overall visual quality of the obtained RGB images, while also ensuring the accuracy of image-based back-end processing.
[0129] In an optional implementation, the first RGB image can also be converted to a first YUV image in S320 if necessary. Thus, one or more mapping relationships in this application embodiment can also be used to characterize the mapping relationship between the RAW image decoded from the encoded bitstream and the corresponding YUV image. The constructed mapping relationship can also be used to obtain a high-quality YUV image corresponding to the RAW data, so as to ensure the sharpness, color accuracy, dynamic range, and overall visual quality of the obtained YUV image, while ensuring the accuracy of image-based back-end processing. This application embodiment does not impose any limitations on the image format involved in the mapping relationship.
[0130] S330: Encode the first RAW image to obtain the second encoded bitstream.
[0131] For example, the second encoding process described above is performed on the first RAW image to obtain a second encoded bitstream. Exemplarily, the second encoding process in this embodiment of the application may be to encode the first RAW image using any of the following encoding standards to obtain a second encoded bitstream: H.264 encoding standard, H.265 encoding standard, H.266 encoding standard, or JPEG low-latency encoding standard.
[0132] S340: Decode the second RAW image from the second encoded bitstream. For example, use the second decoding process corresponding to the second encoding process to decode the second RAW image from the second encoded bitstream.
[0133] In this embodiment of the application, to ensure the accuracy of one or more mapping relationships constructed by the cloud device, one implementation method is to employ lossy compression during the second encoding process on the cloud device side. This lossy compression includes using a second compression ratio to perform the second encoding and corresponding second decoding processes on the first RAW image, and this second compression ratio is greater than the first compression ratio described above. For example, the second compression ratio can be a large value, such as 40, 60, 80, 100, 120, 240, etc. Or, for example, the second compression ratio can be greater than a second threshold, such as 80 or other values. In other words, using a larger compression ratio for encoding results in a significant and non-negligible loss of image quality during compression and decompression, and is therefore considered lossy compression.
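The constraints on the two compression ratios can be written as a small check. The threshold values 10 and 80 are the example values given above; the function names and the optional use of the second threshold are assumptions of this sketch.

```python
# Example thresholds from the text; both are illustrative, not fixed values.
FIRST_THRESHOLD = 10
SECOND_THRESHOLD = 80

def is_near_lossless(ratio, first_threshold=FIRST_THRESHOLD):
    """First encoding: the ratio must not exceed the first threshold."""
    return ratio <= first_threshold

def is_valid_ratio_pair(first_ratio, second_ratio,
                        first_threshold=FIRST_THRESHOLD,
                        second_threshold=None):
    """Check that the second (lossy) ratio is larger than the first ratio.

    If second_threshold is given, the second ratio must also exceed it.
    """
    if not is_near_lossless(first_ratio, first_threshold):
        return False
    if second_ratio <= first_ratio:
        return False
    if second_threshold is not None and second_ratio <= second_threshold:
        return False
    return True
```

For instance, `is_valid_ratio_pair(10, 40)` holds, while `is_valid_ratio_pair(10, 40, second_threshold=SECOND_THRESHOLD)` does not, because 40 does not exceed the second threshold.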
[0134] In this embodiment of the application, the first RGB image and the second RAW image can satisfy a first mapping relationship, which can be obtained, for example, based on step S350 as shown in Figure 5:
[0135] S350: Construct a first mapping relationship based on the first RGB image and the second RAW image. It should be noted that this first mapping relationship can be expressed in any way, and this application does not limit the form and content of the specific mapping relationship.
[0136] In the embodiments of this application, one or more mapping relationships can be implemented as artificial intelligence (AI) neural network models, such as AI-ISP models. This allows the powerful nonlinear fitting capabilities of neural network models to learn the mapping relationship between lossy RAW data and corresponding high-quality RGB images. This enables the trained AI-ISP model to infer high-quality RGB images from lossy RAW data, thereby obtaining high-quality RGB images and reducing the loss of compressed information in RAW data.
[0137] When implementing S350, a cloud device can, for example, train a first AI neural network model based on a first RGB image and a second RAW image. This first AI neural network model can be used to represent a first mapping relationship between the RAW image decoded from the encoded bitstream and the corresponding RGB image. The encoded bitstream to which the first AI neural network model is applicable (e.g., represented as a third encoded bitstream) uses a second compression ratio. In other words, the first AI neural network model uses the same compression ratio for the encoded bitstream it targets during both the training and application phases.
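The data flow of S320 to S350 can be sketched with toy stand-ins. In the sketch below, uniform quantization replaces the lossy encode and decode round trip, an affine function replaces the ISP, and a closed-form least-squares affine fit replaces neural network training. None of these stand-ins is the method of this application; they only show which signal is the model input (the second RAW image) and which is the target (the first RGB image).

```python
def quantize_roundtrip(samples, step):
    """Toy stand-in for lossy encode + decode: uniform quantization."""
    return [round(s / step) * step for s in samples]

def toy_isp(samples, gain=2.0, offset=1.0):
    """Toy stand-in for ISP processing of the first RAW image."""
    return [gain * s + offset for s in samples]

def build_training_pair(first_raw, step):
    """Return (second_raw, first_rgb): model input and training target."""
    first_rgb = toy_isp(first_raw)                    # S320
    second_raw = quantize_roundtrip(first_raw, step)  # S330 and S340
    return second_raw, first_rgb                      # consumed by S350

def fit_affine(inputs, targets):
    """Least-squares fit of target = a * input + b (stand-in for training)."""
    n = len(inputs)
    mean_x = sum(inputs) / n
    mean_y = sum(targets) / n
    var_x = sum((x - mean_x) ** 2 for x in inputs)
    if var_x == 0:
        raise ValueError("inputs must not all be equal")
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, targets))
    a = cov / var_x
    return a, mean_y - a * mean_x

first_raw = list(range(16))
second_raw, first_rgb = build_training_pair(first_raw, step=4)
a, b = fit_affine(second_raw, first_rgb)  # the fitted "first mapping"
```

Because the fit is always made against the target derived from the near-lossless first RAW image, the fitted mapping absorbs part of the degradation introduced by the lossy round trip.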
[0138] The cloud device can provide the constructed first mapping relationship to the terminal device to offer a solution for enhancing the quality of the decoded RAW data on the terminal device 110 side, thereby reducing the loss of compressed information in the RAW data. For example, after the terminal device decodes the encoded bitstream obtained from the local storage medium, it uses the decoded RAW data as input data to the first AI neural network model and obtains the model output data, which includes a high-quality RGB image of the lossy RAW data. Or, for example, after the terminal device decodes the encoded bitstream received from another terminal device, it uses the decoded RAW data as input data to the first AI neural network model and obtains the model output data, which includes a high-quality RGB image of the lossy RAW data.
[0139] Alternatively, the cloud device itself can use this first mapping relationship to enhance the quality of the decoded RAW data, reducing the loss of compressed information and improving the accuracy of backend processing. For example, the cloud device can decode the encoded bitstream received from the terminal device, use the decoded RAW data as input data to a first AI neural network model, and obtain model output data, which includes a high-quality RGB image of the lossy RAW data. Or, for example, the cloud device can decode the encoded bitstream obtained from local storage media, use the decoded RAW data as input data to a first AI neural network model, and obtain model output data, which includes a high-quality RGB image of the lossy RAW data.
[0140] If the cloud device needs to build more mapping relationships, it can repeat the above S330-S350 based on the first RAW image by changing one or more encoding / decoding parameters.
[0141] Taking the change of the second compression ratio to the fourth compression ratio and the construction of the second mapping relationship based on the first RAW image as an example, the above S330-S350 executed by the cloud device can be replaced by steps S330a-S350a as shown in Figure 6a:
[0142] S330a: Encode the first RAW image to obtain a fourth encoded bitstream. For example, perform fourth encoding processing on the first RAW image to obtain a fourth encoded bitstream. This fourth encoding process is similar to the second encoding process described above and can employ any of the following encoding standards: H.264 encoding standard, H.265 encoding standard, H.266 encoding standard, or JPEG low-latency encoding standard.
[0143] In practice, the fourth encoding process can use the same encoding standard as the second encoding process, for example, both can use the H.266 encoding standard. Alternatively, the fourth encoding process can use a different encoding standard than the second encoding process, for example, the fourth encoding process can use the H.266 encoding standard while the second encoding process uses the H.265 encoding standard.
[0144] S340a: Decode the fourth RAW image from the fourth encoded bitstream. For example, use the fourth decoding process corresponding to the fourth encoding process to decode the fourth RAW image from the fourth encoded bitstream.
[0145] In this embodiment, the fourth compression ratio is different from the second compression ratio corresponding to the second encoding process. The fourth encoding process can also be lossy compression. For example, the fourth compression ratio can be a large value, such as 60, 80, 100, 120, 240, etc. Or, for example, the fourth compression ratio can be greater than a fourth threshold, such as 100, or other values.
[0146] In this embodiment of the application, a second mapping relationship can be satisfied between the first RGB image and the fourth RAW image. This second mapping relationship can be obtained, for example, based on step S350a as shown in Figure 6a:
[0147] S350a: Construct a second mapping relationship based on the first RGB image and the fourth RAW image. Similarly, this second mapping relationship can be expressed in any way, and this application does not limit the form and content of the specific mapping relationship.
[0148] Taking the case where this second mapping relationship is also implemented as an AI-ISP model as an example, when implementing S350a, the cloud device can, for example, train a second AI neural network model based on the first RGB image and the fourth RAW image. This second AI neural network model can be used to represent the second mapping relationship between the RAW image decoded from the encoded bitstream and the corresponding RGB image. The encoded bitstream to which the second AI neural network model is applicable (e.g., represented as the fifth encoded bitstream) uses the fourth compression ratio. That is, the encoded bitstream targeted by the second AI neural network model uses the same compression ratio in both the training and application phases.
[0149] Cloud devices can provide the constructed second mapping relationship to terminal devices to offer a solution for enhancing the quality of decoded RAW data on the terminal device 110 side, thereby reducing the loss of compressed information in the RAW data. For example, after the terminal device decodes the encoded bitstream obtained from the local storage medium, it uses the decoded RAW data as input data to a second AI neural network model and obtains the model's output data, which includes a high-quality RGB image of the lossy RAW data. Alternatively, for example, after the terminal device decodes the encoded bitstream received from another terminal device, it uses the decoded RAW data as input data to a second AI neural network model and obtains the model's output data, which includes a high-quality RGB image of the lossy RAW data.
[0150] Alternatively, the cloud device itself can use this second mapping relationship to enhance the quality of the decoded RAW data, reducing the loss of compressed information and improving the accuracy of backend processing. For example, the cloud device can decode the encoded bitstream received from the terminal device, use the decoded RAW data as input data to a second AI neural network model, and obtain the model's output data, which includes a high-quality RGB image of the lossy RAW data. Or, for example, the cloud device can decode the encoded bitstream obtained from local storage media, use the decoded RAW data as input data to a second AI neural network model, and obtain the model's output data, which includes a high-quality RGB image of the lossy RAW data.
[0151] Therefore, by using the above method, cloud devices or terminal devices can enhance the quality of RAW data decoded from encoded bitstreams based on different compression ratios by using the first and second mapping relationships respectively. This ensures the clarity, color accuracy, dynamic range, and overall visual quality of the obtained RGB images in different encoding scenarios, while also guaranteeing the accuracy of image-based backend processing. Furthermore, this method allows cloud devices to customize personalized mapping relationships (AI-ISP model) for terminal devices supporting different compression ratios, meeting the diverse encoding / decoding and image enhancement needs of different terminal devices, thus broadening the application scenarios.
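Selecting a mapping relationship by compression ratio can be sketched as a simple lookup. `MappingRegistry`, its methods, and the placeholder models below are hypothetical names introduced for illustration; in practice each entry would be a trained AI-ISP model.

```python
class MappingRegistry:
    """Holds one mapping (model) per compression ratio it was built for."""

    def __init__(self):
        self._models = {}

    def register(self, compression_ratio, model):
        """Store the model built for bitstreams with this compression ratio."""
        self._models[compression_ratio] = model

    def select(self, compression_ratio):
        """Return the model matching the ratio of the bitstream to enhance."""
        try:
            return self._models[compression_ratio]
        except KeyError:
            raise LookupError(
                f"no mapping built for compression ratio {compression_ratio}"
            ) from None

registry = MappingRegistry()
# Placeholder callables standing in for the first and second models.
registry.register(80, lambda raw: [v * 1.0 for v in raw])
registry.register(120, lambda raw: [v * 1.1 for v in raw])
```

An exact-match lookup reflects the statement that each model targets the same compression ratio in training and application; a deployment could instead fall back to the nearest supported ratio, which is a design choice outside this sketch.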
[0152] It should be understood that the above description of the construction process of the first and second mapping relationships is merely an example and does not constitute any limitation. In an alternative implementation, a mapping relationship can also be used to characterize the mapping relationship between RAW images decoded from encoded bitstreams based on different compression ratios and their corresponding RGB images.
[0153] Taking the construction of a mapping relationship (e.g., represented as a third mapping relationship) based on two compression ratios (e.g., the second and fourth compression ratios mentioned above) as an example, the cloud device can execute the steps shown in Figure 6b after S320:
[0154] S330b: Encode the first RAW image to obtain a second encoded bitstream. For example, using a second compression ratio, perform a second encoding process on the first RAW image to obtain a second encoded bitstream.
[0155] S340b: Decode the second RAW image from the second encoded bitstream. For example, use the second decoding process corresponding to the second encoding process to decode the second RAW image from the second encoded bitstream.
[0156] S350b: Encode the first RAW image to obtain a fourth encoded bitstream. For example, using a fourth compression ratio, perform fourth encoding processing on the first RAW image to obtain a fourth encoded bitstream.
[0157] S360b: Decode the fourth RAW image from the fourth encoded bitstream. For example, use the fourth decoding process corresponding to the fourth encoding process to decode the fourth RAW image from the fourth encoded bitstream.
[0158] In this embodiment of the application, the first RGB image, the second RAW image, and the fourth RAW image can satisfy a third mapping relationship, which can be obtained, for example, based on step S370 as shown in Figure 6b:
[0159] S370: Construct a third mapping relationship based on the first RGB image, the second RAW image, and the fourth RAW image. Similarly, this third mapping relationship can be expressed in any way, and this application does not limit the form and content of the specific mapping relationship.
[0160] Taking the case where this third mapping relationship is also implemented as an AI-ISP model as an example, when implementing S370, the cloud device can train a third AI neural network model based on the first RGB image, the second RAW image, and the fourth RAW image. This third AI neural network model can be used to represent the third mapping relationship between the RAW image decoded from the encoded bitstream based on different compression ratios and the corresponding RGB image. The encoded bitstream applicable to the third AI neural network model can adopt the second compression ratio and / or the fourth compression ratio. In other words, the compression ratio adopted by the encoded bitstream that the third AI neural network model targets during the application phase is one of the compression ratios used by the encoded bitstreams that the third AI neural network model targets during the training phase.
[0161] It should be understood that S330b-S340b and S350b-S360b in Figure 6b can be executed sequentially, in parallel, or S350b-S360b can be executed first and then S330b-S340b. This embodiment does not specifically limit the execution order. Specific implementation details of S330b-S340b can be found above in conjunction with the description of S330-S340, and specific implementation details of S350b-S360b can be found above in conjunction with the description of S330a-S340a, and will not be repeated here. The dashed boxes in Figures 6a and 6b indicate that the corresponding steps are optional and do not constitute any limitation on the form or content of the corresponding mapping relationship.
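The parallel execution of the two encode and decode branches, followed by pooling their results for S370, can be sketched as below. Uniform quantization again stands in for the lossy codec round trip, and `build_pooled_pairs` is a hypothetical helper; the two quantization steps play the role of the second and fourth compression ratios.

```python
from concurrent.futures import ThreadPoolExecutor

def quantize_roundtrip(samples, step):
    """Toy stand-in for lossy encode + decode at one compression ratio."""
    return [round(s / step) * step for s in samples]

def build_pooled_pairs(first_raw, first_rgb, steps):
    """Run one round trip per step in parallel and pool the training pairs.

    steps must be non-empty. Every decoded RAW image is paired with the
    same target, the first RGB image.
    """
    with ThreadPoolExecutor(max_workers=len(steps)) as pool:
        # map() preserves the order of steps in its results.
        decoded = list(
            pool.map(lambda step: quantize_roundtrip(first_raw, step), steps)
        )
    return [(raw, first_rgb) for raw in decoded]

first_raw = [0, 1, 2, 3]
first_rgb = [1.0, 3.0, 5.0, 7.0]
pairs = build_pooled_pairs(first_raw, first_rgb, steps=[2, 4])
```

Running the branches sequentially or in the opposite order yields the same pooled set, which matches the statement that the execution order is not limited.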
[0162] Similarly, cloud devices can provide the constructed third mapping relationship to terminal devices to offer a solution for enhancing the quality of decoded RAW data on the terminal device 110 side, thereby reducing the loss of compressed information in the RAW data. For example, after the terminal device decodes the encoded bitstream obtained from the local storage medium, it uses the decoded RAW data as input data to a third AI neural network model and obtains the model's output data, which includes a high-quality RGB image of the lossy RAW data. Or, for example, after the terminal device decodes the encoded bitstream received from another terminal device, it uses the decoded RAW data as input data to a third AI neural network model and obtains the model's output data, which includes a high-quality RGB image of the lossy RAW data.
[0163] Alternatively, the cloud device itself can use this third mapping relationship to enhance the quality of the decoded RAW data, reducing compression loss and improving the accuracy of backend processing. For example, the cloud device can decode the encoded bitstream received from the terminal device, use the decoded RAW data as input to a third AI neural network model, and obtain the model's output data, which includes a high-quality RGB image of the lossy RAW data. Or, for example, the cloud device can decode the encoded bitstream obtained from local storage media, use the decoded RAW data as input to a third AI neural network model, and obtain the model's output data, which includes a high-quality RGB image of the lossy RAW data.
[0164] It should be understood that, when necessary, the first RGB image used in the process of constructing the second and third mapping relationships can be replaced with the first YUV image. Thus, the mapping relationships in this embodiment can also be used to characterize the mapping relationship between the RAW image decoded from the encoded bitstream and the corresponding YUV image. The constructed mapping relationships can also be used to obtain high-quality YUV images corresponding to the RAW data, so as to ensure the clarity, color accuracy, dynamic range, and overall visual quality of the obtained YUV images, while also ensuring the accuracy of image-based back-end processing. This embodiment does not impose any limitations on the image formats involved in the mapping relationships.
[0165] The following describes an image processing method based on one or more mapping relationships according to embodiments of this application, with reference to the accompanying drawings. This method can be implemented by an image processing device, which can be deployed in a cloud device or a terminal device.
[0166] Taking the application of the first mapping relationship as an example, as shown in Figure 7, the method may include the following steps:
[0167] S710: Obtain the third encoded bitstream.
[0168] In this embodiment, the third encoded bitstream can be obtained using a third encoding process, which can be the same as the second encoding process described above. For example, the third encoding process can also use any of the following encoding standards: H.264 encoding standard, H.265 encoding standard, H.266 encoding standard, or JPEG low-latency encoding standard. Alternatively, for example, the third compression ratio used in the third encoding process is the same as the second compression ratio described above, and the third compression ratio and the second compression ratio are greater than the first compression ratio.
[0169] If the image processing device is deployed on a terminal device, when implementing S710, the image processing device can obtain the third encoded bitstream from the local storage medium of the terminal device. The third encoded bitstream stored in the local storage medium can be acquired and encoded by the terminal device's own image acquisition device, or it can be obtained from other terminal devices. If the image processing device is deployed on a cloud device, when implementing S710, the image processing device can receive the third encoded bitstream from the terminal device. Alternatively, for example, the image processing device can obtain the third encoded bitstream from the local storage medium of the cloud device, and the third encoded bitstream stored in the local storage medium can be received and stored from the terminal device. The encoder of the terminal device can be a hardware encoder used to perform encoding operations. The bit depth of the image supported by the encoder can be, for example, greater than or equal to 8 bits. The encoder can adopt any of the following encoding standards: H.264 encoding standard, H.265 encoding standard, H.266 encoding standard, and JPEG low-latency encoding standard.
[0170] S720: Decode the third RAW image from the third encoded bitstream.
[0171] For example, a third RAW image is obtained by decoding from the third encoded bitstream using a third decoding process corresponding to the third encoding process. For instance, the third RAW image may include any of the following: an image in CFA format, an image in RGGB format, an image in GRGB format, or an image in YCoCgDg format, where R represents the red component, G represents the green component, B represents the blue component, Y represents the luminance component, Co and Cg represent the color components, and Dg represents the difference component.
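One common way to represent a Bayer CFA mosaic as four color planes is sketched below. Whether the "RGGB format" in this application refers to exactly this planar layout is an assumption, as are the R-at-top-left cell arrangement and the `split_rggb` name.

```python
def split_rggb(mosaic):
    """Split an RGGB Bayer mosaic (list of rows) into R, G1, G2, B planes."""
    r = [row[0::2] for row in mosaic[0::2]]   # even rows, even columns
    g1 = [row[1::2] for row in mosaic[0::2]]  # even rows, odd columns
    g2 = [row[0::2] for row in mosaic[1::2]]  # odd rows, even columns
    b = [row[1::2] for row in mosaic[1::2]]   # odd rows, odd columns
    return r, g1, g2, b

mosaic = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
]
r, g1, g2, b = split_rggb(mosaic)
```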
[0172] S730: Based on the first mapping relationship, obtain the second RGB image corresponding to the third RAW image.
[0173] As mentioned above, the first mapping relationship can be the mapping relationship between the first RGB image corresponding to the first RAW image and the second RAW image. The first RAW image can be obtained by decoding from the first encoded bitstream, and the second RAW image can be obtained by decoding from the second encoded bitstream. The second encoded bitstream can be obtained by encoding the first RAW image.
[0174] Taking the case where the first mapping relationship is a first AI neural network model as an example, in S730, the image processing device can use the third RAW image as input data for the first AI neural network model, and the output data of the first AI neural network model includes the second RGB image. If the image processing device is deployed on a terminal device, after S730, the image processing device can provide the second RGB image to the display device of the terminal device (e.g., an in-vehicle display screen) so that the display device can output the second RGB image. Alternatively, if the terminal device also includes a functional module for performing back-end processing, the image processing device can also provide the second RGB image to that module so that back-end processing can be performed based on the second RGB image. If the image processing device is deployed on a cloud device, after S730, the image processing device can provide the second RGB image to the module of the cloud device for performing back-end processing so that back-end processing can be performed based on the second RGB image.
[0175] In an optional implementation, if the first mapping relationship is used to characterize the mapping relationship between the RAW image decoded from the encoded bitstream and the corresponding YUV image, the second YUV image corresponding to the third RAW image can be obtained during S730. The terminal device or cloud device where the image processing device is located can perform backend processing based on the second YUV image, which will not be elaborated further here.
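Steps S710 to S730 amount to a decode step followed by applying the mapping. The sketch below wires the two together; `ImageProcessingDevice`, `toy_decode`, and `toy_mapping` are hypothetical placeholders for a real decoder and a trained model, which are not specified here.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ImageProcessingDevice:
    """Chains the decoder (S720) and the mapping relationship (S730)."""
    decode: Callable[[bytes], List[int]]
    mapping: Callable[[List[int]], List[float]]

    def process(self, bitstream: bytes) -> List[float]:
        third_raw = self.decode(bitstream)  # S720: decode the RAW image
        return self.mapping(third_raw)      # S730: apply the mapping

def toy_decode(bitstream: bytes) -> List[int]:
    """Placeholder decoder: treats every byte as one RAW sample."""
    return list(bitstream)

def toy_mapping(raw: List[int]) -> List[float]:
    """Placeholder model: normalizes 8-bit samples to [0, 1]."""
    return [v / 255.0 for v in raw]

device = ImageProcessingDevice(decode=toy_decode, mapping=toy_mapping)
second_rgb = device.process(bytes([0, 255]))  # S710 supplies the bitstream
```

Keeping the decoder and the mapping as injected callables lets the same device logic run on a terminal device or a cloud device, with only the source of the bitstream differing.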
[0176] If the first mapping relationship is replaced with the second mapping relationship, then steps S710-S730 shown in Figure 7 can be replaced with the following steps S710a-S730a:
[0177] S710a: Obtain the fifth encoded bitstream.
[0178] The fifth encoded bitstream can be obtained using a fifth encoding process, which can be the same as the fourth encoding process described above. For example, the fifth encoding process can also use any of the following encoding standards: H.264 encoding standard, H.265 encoding standard, H.266 encoding standard, or JPEG low-latency encoding standard. Alternatively, for example, the fifth compression ratio used in the fifth encoding process is the same as the fourth compression ratio described above, and the fifth compression ratio and the fourth compression ratio are greater than the first compression ratio.
[0179] Similarly, if the image processing device is deployed on a terminal device, when implementing S710a, the image processing device can obtain the fifth encoded bitstream from the local storage medium of the terminal device. The fifth encoded bitstream stored in the local storage medium can be acquired and encoded by the terminal device's own image acquisition device, or it can be obtained from other terminal devices. If the image processing device is deployed on a cloud device, when implementing S710a, the image processing device can receive the fifth encoded bitstream from the terminal device. Alternatively, for example, the image processing device can obtain the fifth encoded bitstream from the local storage medium of the cloud device, which can be received and stored from the terminal device. The encoder of the terminal device can be a hardware encoder used to perform encoding operations. The bit depth of the image supported by the encoder can be, for example, greater than or equal to 8 bits. The encoder can adopt any of the following encoding standards: H.264 encoding standard, H.265 encoding standard, H.266 encoding standard, or JPEG low-latency encoding standard.
[0180] S720a: Decode the fifth RAW image from the fifth encoded bitstream.
[0181] For example, the fifth decoding process corresponding to the fifth encoding process is used to decode the fifth encoded bitstream to obtain the fifth RAW image. For instance, the fifth RAW image may include any of the following: an image in CFA format, an image in RGGB format, an image in GRGB format, or an image in YCoCgDg format.
[0182] S730a: Based on the second mapping relationship, obtain the third RGB image corresponding to the fifth RAW image.
[0183] As mentioned above, the second mapping relationship can be the mapping relationship between the first RGB image corresponding to the first RAW image and the fourth RAW image. The first RAW image can be obtained by decoding from the first encoded bitstream, and the fourth RAW image can be obtained by decoding from the fourth encoded bitstream, which is obtained by encoding the first RAW image.
[0184] Taking the case where the second mapping relationship is a second AI neural network model as an example, when implementing S730a, the image processing device can use the fifth RAW image as input data for the second AI neural network model, and the output data of the second AI neural network model includes the third RGB image. If the image processing device is deployed on a terminal device, after S730a, the image processing device can provide the third RGB image to the display device of the terminal device (e.g., an in-vehicle display screen) so that the display device can output the third RGB image. Alternatively, if the terminal device also includes a functional module for performing back-end processing, the image processing device can also provide the third RGB image to that module so that back-end processing can be performed based on the third RGB image. If the image processing device is deployed on a cloud device, after S730a, the image processing device can provide the third RGB image to the module of the cloud device for performing back-end processing so that back-end processing can be performed based on the third RGB image.
[0185] In an optional implementation, if the second mapping relationship is used to characterize the mapping relationship between the RAW image decoded from the encoded bitstream and the corresponding YUV image, the third YUV image corresponding to the fifth RAW image can be obtained when implementing S730a. The terminal device or cloud device where the image processing device is located can perform backend processing based on the third YUV image, which will not be elaborated here.
[0186] The application of the third mapping relationship is similar to that of the first and second mapping relationships. For detailed implementation, please refer to the relevant introduction above, which will not be repeated here.
[0187] Therefore, by employing the methods described above, based on the encoding and compression techniques for RAW data, and through the establishment of one or more mapping relationships, image restoration and enhancement of the decoded RAW data can be achieved. This effectively enhances the quality of the decompressed image, thus minimizing the loss of compressed information in the RAW data. When performing backend processing based on the enhanced RAW data, the effectiveness of the backend processing tasks can be effectively guaranteed, thereby improving the closed-loop efficiency and benefits of RAW data processing.
[0188] To facilitate understanding, the implementation details of the above image processing method will be illustrated below in conjunction with the modular structure of terminal devices or cloud devices.
[0189] As shown in Figure 8, in one example, the terminal device (e.g., a vehicle) may include a camera, a deserializer, a preprocessing module, an encoding module 1, a communication module 1, a storage module, a decoding module, an image enhancement processing module, and a back-end processing / display module. The camera may be based on the image sensor described in Figure 2, and the preprocessing module may be based on the preprocessing module described in Figure 2.
[0190] In one example, encoding module 1 can collaborate with other modules to implement the RAW domain encoding process involved in the image processing method of this application embodiment, so as to provide a first encoded bitstream to the cloud device. As shown in Figure 9, the method may include the following steps:
[0191] S901: After the camera's photo or video recording function is triggered on the vehicle side, the camera can capture multiple images or video streams and send them to the deserializer. Optionally, the camera can integrate some ISP functions (such as the internal ISP processing described above) to perform simple processing on the original RAW images.
[0192] S902: The deserializer can receive multiple image or video streams from a camera. The camera may send a serial data stream to the deserializer, for example, as a media stream. The deserializer can convert this serial data stream (e.g., a RAW image in CFA format) into a parallel data stream for subsequent processing by encoding module 1, preprocessing module, ISP module, or other modules.
[0193] S903: The preprocessing module can obtain a RAW image through a deserializer and preprocess the RAW image to obtain an image in RGGB or YCoCgDg format. Optionally, a GRGB format image can also be obtained after preprocessing. The preprocessing module can send the RGGB or YCoCgDg format image to the encoding module 1. Detailed implementation details can be found in the preceding descriptions in conjunction with Figures 2-7, and will not be repeated here.
[0194] S904: Encoding module 1 can perform a first encoding process on any one of the following image formats: CFA, RGGB, GRGB, or YCoCgDg, to obtain a first encoded bitstream. This first encoded bitstream is a lossless encoded bitstream. For specific implementation details, please refer to the relevant introduction in conjunction with Figures 1-7 above, which will not be repeated here.
[0195] S905: Encoding module 1 can send the first encoded bitstream to the cloud device via communication module 1. Optionally, encoding module 1 can store the first encoded bitstream via storage module.
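The property that matters for the first encoded bitstream is that decoding returns the RAW samples unchanged. The sketch below demonstrates that round trip with zlib, a general-purpose lossless compressor from the Python standard library, used here purely as a stand-in: this application does not name the lossless codec, and the 16-bit little-endian sample packing and the function names are assumptions.

```python
import struct
import zlib

def encode_lossless(samples):
    """Pack unsigned 16-bit RAW samples and compress them losslessly."""
    payload = struct.pack(f"<{len(samples)}H", *samples)
    return zlib.compress(payload)

def decode_lossless(bitstream):
    """Invert encode_lossless and return the original sample list."""
    payload = zlib.decompress(bitstream)
    return list(struct.unpack(f"<{len(payload) // 2}H", payload))

raw_samples = [0, 1023, 4095, 65535]
first_bitstream = encode_lossless(raw_samples)    # S904 (stand-in)
first_raw = decode_lossless(first_bitstream)      # cloud-side decoding
```

Because `first_raw` equals `raw_samples` exactly, any RGB image later derived from it on the cloud side is not affected by compression artifacts.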
[0196] Accordingly, the cloud device can receive the first encoded bitstream from the terminal device and construct one or more mapping relationships based on the first RAW image decoded from the first encoded bitstream. For the construction process, refer to the description below in conjunction with Figures 11-12; it is not elaborated here.
[0197] In another example, if the terminal device itself has an image enhancement processing module, this module can work with other modules to implement the RAW image enhancement processing flow involved in the image processing method of this application embodiment. As shown in Figure 10, the method may include the following steps:
[0198] S1001: The image enhancement processing module can obtain the first mapping relationship from the cloud device.
[0199] S1002: Encoding module 1 can perform third encoding processing on any one of the following image formats: CFA, RGGB, GRGB, or YCoCgDg, to obtain a third encoded bitstream. Here, R represents the red component, G represents the green component, B represents the blue component, Y represents the luminance component, Co and Cg represent the color components, and Dg represents the difference component. Encoding module 1 can store the third encoded bitstream through a storage module. Optionally, encoding module 1 can send the third encoded bitstream to a cloud device through communication module 2. This third encoded bitstream is a lossy encoded bitstream; specific encoding processing details can be found in the relevant descriptions above in conjunction with Figures 1-7, and will not be repeated here.
[0200] S1003: The decoding module can obtain the third encoded bitstream from the storage module and decode it to obtain the third RAW image. The third RAW image includes any of the following: an image in CFA format, an image in RGGB format, an image in GRGB format, or an image in YCoCgDg format. The decoding module can send the third RAW image to the image enhancement processing module.
[0201] S1004: The image enhancement processing module obtains the second RGB image corresponding to the third RAW image based on the first mapping relationship. The image enhancement processing module can send the second RGB image to the back-end processing module / display module of the terminal device. The back-end processing module can perform back-end processing based on the second RGB image. The display module can output the second RGB image.
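Steps S1001-S1004 can be sketched as follows. This is an illustration only: the lossy "third encoding" is modeled by discarding low-order bits, the storage module by a dictionary, and the first mapping relationship by a placeholder function; none of these names or choices come from the application itself.

```python
import struct
from typing import Callable, Dict, List, Tuple

RawImage = List[int]                   # flat RGGB samples, 4 per cell (assumed layout)
RgbImage = List[Tuple[int, int, int]]

def third_encode(raw: RawImage, shift: int = 2) -> bytes:
    """Toy lossy encoder: drops the `shift` least significant bits of each sample."""
    return struct.pack(f"<B{len(raw)}H", shift, *(v >> shift for v in raw))

def decode(bitstream: bytes) -> RawImage:
    """Toy decoder: restores the sample range; the dropped bits stay lost."""
    shift = bitstream[0]
    count = (len(bitstream) - 1) // 2
    return [q << shift for q in struct.unpack_from(f"<{count}H", bitstream, 1)]

def toy_first_mapping(raw: RawImage) -> RgbImage:
    """Placeholder for the first mapping relationship obtained in S1001."""
    rgb = []
    for i in range(0, len(raw) - 3, 4):
        r, g1, g2, b = raw[i:i + 4]
        rgb.append((r, (g1 + g2) // 2, b))
    return rgb

def terminal_flow(raw: RawImage,
                  mapping: Callable[[RawImage], RgbImage],
                  storage: Dict[str, bytes]) -> RgbImage:
    storage["third"] = third_encode(raw)   # S1002: lossy encode and store
    third_raw = decode(storage["third"])   # S1003: decode the third RAW image
    return mapping(third_raw)              # S1004: second RGB image

storage: Dict[str, bytes] = {}
second_rgb = terminal_flow([100, 201, 203, 50, 1023, 512, 512, 0],
                           toy_first_mapping, storage)
```

In a real deployment the placeholder mapping would be replaced by whatever the cloud device provides, for example a trained model.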
[0202] It should be understood that the above description in conjunction with Figures 8 and 9 only indicates that the terminal device in this embodiment supports the acquisition and lossless transmission of RAW data to provide lossless RAW data to the cloud device, so that the cloud device can construct one or more mapping relationships based on the lossless RAW data. The above description in conjunction with Figure 10 only indicates that the terminal device in this embodiment can also utilize one or more mapping relationships provided by the cloud device to restore or enhance the quality of the decoded RAW data, thereby reducing the loss of compressed information in the RAW data, and does not constitute any limitation on the functionality of the terminal device or the cloud device.
[0203] In optional implementations, the terminal device can also support the acquisition and transmission of RGB or YUV data. For example, the terminal device may also include an ISP module, a communication automation (SA) module, a distortion correction module involved in vision preprocessing (VPC), a stitching module, an image preprocessing module, etc. The ISP module processes the RAW image provided by the deserializer to obtain a YUV format image. After SA processing and VPC distortion correction processing, the YUV format image is sent to the encoding module 2. The encoding module 2 can encode the YUV format image and then send it to the cloud device through the communication module 2, thus realizing the encoding and transmission of the YUV format image.
[0204] It should be understood that the modules in Figure 8 are merely examples and not any limitation. In specific implementations, encoding module 1 and encoding module 2 in Figure 8 can be the same encoding module, and communication module 1 and communication module 2 can also be the same communication module. This application embodiment does not specifically limit the implementation method of each module.
[0205] As shown in Figure 11, the cloud device may include a communication module, a decoding module 1, an ISP module, a lossy encoding / decoding module, and a training module. The training module can collaborate with other modules to implement the process of constructing mapping relationships in the image processing method of this application embodiment. As shown in Figure 12, the method may include the following steps:
[0206] S1201: Decoding module 1 receives the first encoded bitstream from the acquisition device via the communication module; that is, the communication module receives the first encoded bitstream and sends it to decoding module 1. Alternatively, decoding module 1 obtains the first encoded bitstream from the storage module. The acquisition device may include the terminal device described above. The first encoded bitstream is a lossless encoded bitstream; for specific implementation details, refer to the relevant descriptions above in conjunction with Figures 1-9, which are not repeated here.
[0207] S1202: Decoding module 1 decodes the first RAW image from the first encoded bitstream. Decoding module 1 provides the first RAW image to the ISP module and to the lossy encoding / decoding module.
[0208] S1203: The ISP module performs ISP processing on the first RAW image to obtain the first RGB image.
[0209] S1204: The lossy encoding / decoding module acquires one or more lossy RAW data based on the first RAW image.
[0210] For example, the lossy encoding / decoding module performs a second encoding process on the first RAW image to obtain a second encoded bitstream, and decodes the second encoded bitstream to obtain a lossy second RAW image. Or, for example, the lossy encoding / decoding module performs a fourth encoding process on the first RAW image to obtain a fourth encoded bitstream, and decodes the fourth encoded bitstream to obtain a lossy fourth RAW image.
[0211] S1205: The training module constructs one or more mapping relationships based on the first RGB image and one or more lossy RAW data.
[0212] For example, the training module constructs a first mapping relationship based on the first RGB image and the second RAW image. Or, for example, the training module constructs a second mapping relationship based on the first RGB image and the fourth RAW image. Or, for example, the training module constructs a third mapping relationship based on the first RGB image, the second RAW image, and the fourth RAW image. Specific implementation details can be found in the above description in conjunction with Figures 3, 5, 6a, 6b, and 7, and will not be repeated here.
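The construction flow S1203-S1205 can be sketched under strong simplifying assumptions. Here the ISP is a toy white-balance step, the lossy encode/decode round trip is uniform quantization, and the "mapping relationship" is a per-channel affine fit obtained by closed-form least squares; the application describes an AI neural network model for this role, so the affine fit is only a small stand-in, and all names and constants are hypothetical.

```python
from statistics import mean

GAINS = (1.5, 1.0, 2.0)   # hypothetical white-balance gains for the toy ISP

def features(cells):
    """Collapse each RGGB cell to (R, mean G, B)."""
    return [(r, (g1 + g2) / 2, b) for r, g1, g2, b in cells]

def toy_isp(cells):
    """S1203 stand-in: produces the first RGB image from the first RAW image."""
    return [tuple(g * v for g, v in zip(GAINS, f)) for f in features(cells)]

def lossy_roundtrip(cells, step):
    """S1204 stand-in: lossy encode then decode, modeled as quantization."""
    return [tuple((v // step) * step for v in cell) for cell in cells]

def fit_affine(xs, ys):
    """Closed-form least squares for y ~ a * x + b."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    return a, my - a * mx

def build_mapping(first_rgb, lossy_raw):
    """S1205 stand-in: fit lossy RAW features to the lossless-derived RGB."""
    feats = features(lossy_raw)
    params = [fit_affine([f[c] for f in feats], [t[c] for t in first_rgb])
              for c in range(3)]
    def mapping(cells):
        return [tuple(a * f[c] + off for c, (a, off) in enumerate(params))
                for f in features(cells)]
    return mapping

def construct_mappings(first_raw, second_step=4, fourth_step=16):
    first_rgb = toy_isp(first_raw)
    second_raw = lossy_roundtrip(first_raw, second_step)   # second RAW image
    fourth_raw = lossy_roundtrip(first_raw, fourth_step)   # fourth RAW image
    return {"first": build_mapping(first_rgb, second_raw),
            "second": build_mapping(first_rgb, fourth_raw)}

first_raw = [(i * 37 % 1024, i * 53 % 1024, (i * 53 + 3) % 1024, i * 71 % 1024)
             for i in range(1, 65)]
mappings = construct_mappings(first_raw)
```

The key design point carried over from the text is that the training target is always the RGB image derived from the lossless RAW data, while the input is the RAW data degraded at a given compression level, so each compression level gets its own mapping.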
[0213] The training module can provide one or more mapping relationships to the terminal device. Alternatively, the training module can provide one or more mapping relationships to the image enhancement processing module of the cloud device. Accordingly, the image enhancement processing module of the terminal device or the cloud device can perform image enhancement on the RAW data based on one or more mapping relationships to reduce the loss of compressed information in the RAW data.
[0214] As shown in Figure 13, the cloud device may further include a decoding module 2, an image enhancement processing module, and a backend processing module. The image enhancement processing module can work with other modules to implement the process of applying the mapping relationships in the image processing method of this application embodiment. As shown in Figure 14, the method may include the following steps:
[0215] S1401: Decoding module 2 receives the third and / or fifth encoded bitstreams from the terminal device via the communication module. Alternatively, decoding module 2 obtains the third and / or fifth encoded bitstreams from the storage module.
[0216] S1402: Decoding module 2 decodes the third RAW image from the third encoded bitstream. And / or, decoding module 2 decodes the fifth RAW image from the fifth encoded bitstream. The third or fifth RAW image can be any of the following: an image in CFA format, an image in RGGB format, an image in GRGB format, or an image in YCoCgDg format. Decoding module 2 sends the third RAW image and / or the fifth RAW image to the image enhancement processing module.
[0217] S1403: The image enhancement processing module can process the third RAW image and / or the fifth RAW image based on one or more mapping relationships. For example, the image enhancement processing module can obtain the second RGB image corresponding to the third RAW image based on the first mapping relationship. Or, for example, the image enhancement processing module can obtain the third RGB image corresponding to the fifth RAW image based on the second mapping relationship. The image enhancement processing module can send the second RGB image and / or the third RGB image to the backend processing module.
[0218] S1404: The back-end processing module can perform back-end processing based on at least one of the second RGB image and the third RGB image.
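The application stage S1401-S1404 pairs each decoded RAW image with the mapping relationship built for it (third RAW image with the first mapping, fifth RAW image with the second mapping). A minimal sketch of that dispatch is shown below. How the cloud device identifies which bitstream it received is not specified in this section, so the `kind` tag, the one-byte-per-sample decoder, and the callables are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class EncodedBitstream:
    kind: str        # "third" or "fifth" (hypothetical tag carried with the stream)
    payload: bytes

# Third bitstream uses the first mapping; fifth bitstream uses the second mapping.
MAPPING_FOR_KIND = {"third": "first", "fifth": "second"}

def decode_raw(bitstream: EncodedBitstream) -> List[int]:
    """S1402 stand-in: toy decoder that reads one sample per byte."""
    return list(bitstream.payload)

def enhance_and_process(bitstreams: List[EncodedBitstream],
                        mappings: Dict[str, Callable[[List[int]], List[int]]],
                        backend: Callable[[List[int]], int]) -> List[int]:
    results = []
    for bitstream in bitstreams:                              # S1401: received streams
        raw = decode_raw(bitstream)                           # S1402: decode RAW image
        mapping = mappings[MAPPING_FOR_KIND[bitstream.kind]]  # S1403: pick the mapping
        rgb = mapping(raw)
        results.append(backend(rgb))                          # S1404: back-end processing
    return results

toy_mappings = {"first": lambda raw: [v + 1 for v in raw],
                "second": lambda raw: [v * 2 for v in raw]}
outputs = enhance_and_process(
    [EncodedBitstream("third", bytes([1, 2, 3])),
     EncodedBitstream("fifth", bytes([4, 5]))],
    toy_mappings,
    backend=sum,
)
```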
[0219] It should be understood that Figures 11 and 13 merely distinguish the mapping relationship construction stage and application stage implemented by the cloud device using decoding module 1 and decoding module 2, and do not constitute any limitation. Decoding module 1 and decoding module 2 can be the same decoder. In some possible implementations, some of the above-mentioned functional modules can also be integrated into the same module. For example, decoding module 2 can be integrated with the image enhancement processing module into the same module. This application embodiment does not specifically limit this.
[0220] Based on the same concept, this application also provides an image processing apparatus suitable for the system architecture shown in Figure 1. Exemplarily, the image processing apparatus can be a vehicle as shown in Figure 1, or it can be a functional element (such as a plug-in, component, or chip) installed in the vehicle that is capable of implementing the image processing method. In one example, the image processing apparatus can be any device with image processing capabilities in the vehicle (such as a computing device or control unit), such as a camera, MDC, or CDC in the vehicle. In another example, the image processing apparatus can be another device located outside the vehicle (such as a server or cloud), or it can be a functional element with image processing capabilities installed in such a device that is capable of implementing the image processing method. Optionally, the image processing apparatus can be used to implement the image processing method provided in the above embodiments, or a module (such as a chip) of the image processing apparatus can be used to implement the image processing method provided in the above embodiments, thus also achieving the beneficial effects of the above embodiments.
[0221] In one example, as shown in Figure 15, when the image processing apparatus 1500 is used to implement the image processing methods shown in Figures 3, 5, 6a, 6b, and 12, it may include: a first decoding unit 1501, used to decode a first original RAW image from a first encoded bitstream, wherein the first RAW image includes any one of the following: an image in CFA format, an image in RGGB format, or an image in YCoCgDg format, wherein R represents the red component, G represents the green component, B represents the blue component, Y represents the luminance component, Co and Cg represent the color components, and Dg represents the difference component; an image signal processing unit 1502, used to perform image signal processing (ISP) on the first RAW image to obtain a first RGB image; an encoding unit 1503, used to encode the first RAW image to obtain a second encoded bitstream; and a second decoding unit 1504, used to decode a second RAW image from the second encoded bitstream; wherein the first RGB image and the second RAW image satisfy a first mapping relationship. Optionally, the apparatus may include a mapping unit, which can, for example, construct the first mapping relationship based on the first RGB image and the second RAW image. For specific implementation details, refer to the method steps described in the above method embodiments in conjunction with Figures 3, 5, 6a, 6b, and 12, which are not repeated here.
[0222] In another example, as shown in Figure 16, when the image processing apparatus 1600 is used to implement the image processing method shown in Figure 7, 10, or 14, it may include: an acquisition unit 1601 for acquiring a third encoded bitstream; a decoding unit 1602 for decoding a third RAW image from the third encoded bitstream, wherein the third RAW image includes any one of the following: an image in CFA format, an image in RGGB format, or an image in YCoCgDg format, where R represents the red component, G represents the green component, B represents the blue component, Y represents the luminance component, Co and Cg represent the color components, and Dg represents the difference component; and a mapping unit 1603 for acquiring a second RGB image corresponding to the third RAW image based on a first mapping relationship, wherein the first mapping relationship is a mapping relationship satisfied between the first RGB image corresponding to the first original RAW image and the second RAW image, the first RAW image is decoded from the first encoded bitstream, and the second RAW image is decoded from the second encoded bitstream, wherein the second encoded bitstream is obtained by encoding the first RAW image. For specific implementation details, refer to the method steps described in the above method embodiments in conjunction with Figure 7, 10, or 14, which are not repeated here.
[0223] It should be noted that the module division in this embodiment is illustrative and represents only one logical functional division. In actual implementation, other division methods may exist. Furthermore, the functional units in each embodiment of this application can be integrated into one processing unit, can each exist as a separate physical entity, or two or more of them can be integrated into one unit. For example, the preprocessing module and the encoding module can be integrated into one module, or they can be the same module. The integrated unit described above can be implemented in hardware or as a software functional unit.
[0224] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, or a server, etc.) or processor to execute all or part of the steps of the methods of the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a portable hard drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
[0225] In a simplified embodiment, those skilled in the art will realize that the image processing apparatus in the above embodiments can adopt the form shown in Figure 17. This image processing apparatus can be used to implement the technical solutions involving the image processing apparatus in the above method embodiments, and therefore can also achieve the corresponding beneficial effects.
[0226] As shown in Figure 17, the image processing apparatus 1700 includes a transceiver 1710 and a processor 1720. Optionally, the image processing apparatus 1700 further includes a memory 1730. The transceiver 1710, processor 1720, and memory 1730 are interconnected. When the image processing apparatus 1700 is used to implement the image processing method provided in the above embodiments, the transceiver 1710 can be used to implement the data transmission and reception functions of the aforementioned terminal device or cloud device; the processor 1720 can be used to implement data processing functions for RAW domain images, or it can also be used to implement data processing functions for RGB domain images or YUV domain images.
[0227] Optionally, the transceiver 1710, processor 1720, and memory 1730 are interconnected via bus 1740. Bus 1740 can be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, etc. Buses can be categorized as address buses, data buses, control buses, etc. For ease of illustration, only one thick line is used in Figure 17, but this does not indicate that there is only one bus or one type of bus.
[0228] Transceiver 1710 is used to receive and send data. For example, taking transceiver 1710 deployed in a terminal device as an example, transceiver 1710 can be used to achieve communication with an image acquisition module or a cloud device. In one example, the transceiver can be a transceiver device with integrated data transmission and reception functions. In another example, the transceiver can also consist of a transmitter and a receiver, where the transmitter is used to send data and the receiver is used to receive data.
[0229] Optionally, transceiver 1710 may include a transmitter and / or a receiver. The transmitter is used to send signals, messages, information, or data, etc. The receiver is used to receive signals, messages, information, or data, etc. For example, the transmitter sends signals, messages, information, or data, etc., under the control of processor 1720. The receiver receives signals, messages, information, or data, etc., under the control of processor 1720.
[0230] The functions of processor 1720 can be referred to in the above method embodiments, and will not be repeated here. Processor 1720 can be a central processing unit (CPU), a network processor (NP), or a combination of CPU and NP, etc. Processor 1720 may further include hardware chips. The aforementioned hardware chips can be application-specific integrated circuits (ASICs), programmable logic devices (PLDs), or combinations thereof. The aforementioned PLDs can be complex programmable logic devices (CPLDs), field-programmable gate arrays (FPGAs), generic array logic (GALs), or any combination thereof. When implementing the above functions, processor 1720 can be implemented in hardware, or it can be implemented by hardware executing corresponding software.
[0231] The memory 1730 may include volatile memory, such as random access memory (RAM), and may also include non-volatile memory, such as at least one disk storage device.
[0232] The memory 1730 stores executable program code, and the processor 1720 executes the executable program code to implement the functions of the aforementioned image processing system, thereby implementing the image processing method provided in this application embodiment. That is, the memory 1730 stores computer program instructions for executing the image processing method.
[0233] Alternatively, the memory 1730 stores executable code, which the processor 1720 executes to implement the functions of the aforementioned image processing apparatus (such as a preprocessing module, an ISP module, or an image processing enhancement module), thereby implementing the image processing method provided in the embodiments of this application. That is, the memory 1730 stores computer program instructions for the image processing apparatus to execute the image processing method provided in the embodiments of this application.
[0234] Based on the same concept, embodiments of this application also provide a possible image processing system, which may include one or more of an MDC, CDC, industrial control computer, or server (or cloud). Optionally, the image processing system may also include an image enhancement module for enhancing the quality of the decoded RAW data, such as obtaining a corresponding RGB image. The image processing system may also include a display module for displaying the RGB image. For example, an image processing system including an MDC and a CDC may be used. In one example, a preprocessing module is deployed on the MDC, and an image enhancement module is deployed on the CDC. For example, the number of MDCs or CDCs included in the image processing system may be one or more, and this application embodiment does not limit this. The image processing system may be deployed on a vehicle.
[0235] Based on the same concept, this application also provides a computer program product, which includes a computer program or instructions that, when run on a computer, cause the computer to perform the image processing method provided in the above embodiments.
[0236] Based on the same concept, embodiments of this application also provide a computer-readable storage medium storing a computer program or instructions, which, when executed by a computer, causes the computer to perform the image processing method provided in the above embodiments.
[0237] The storage medium can be any available medium that a computer can access. For example, but not limited to, a computer-readable medium can include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
[0238] Based on the same concept, this application also provides a chip coupled to a memory, which is used to read a computer program stored in the memory to implement the image processing method provided in the above embodiments.
[0239] Based on the same concept, embodiments of this application also provide a chip system including a processor for supporting a computer device in implementing the functions involved in the image processing system described in the above embodiments. In one possible design, the chip system further includes a memory for storing necessary programs and data of the computer device. The chip system may be composed of chips or may include chips and other discrete components.
[0240] The methods provided in this application can be implemented entirely or partially through software, hardware, firmware, or any combination thereof. When implemented in software, they can be implemented entirely or partially in the form of a computer program product. This computer program product includes one or more computer instructions. When these computer instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of this application are generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions can be transmitted from one website, computer, server, or data center to another via wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more available media. The available media can be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., high-density digital video discs (DVDs)), or semiconductor media (e.g., solid-state drives (SSDs)).
[0241] The steps of the methods described in the embodiments of this application can be directly embedded in hardware, a software unit executed by a processor, or a combination of both. The software unit can be stored in RAM, ROM, EEPROM, registers, hard disk, removable disk, CD-ROM, or any other form of storage medium in the art. Exemplarily, the storage medium can be connected to the processor so that the processor can read information from the storage medium and write information to the storage medium. Optionally, the storage medium can also be integrated into the processor. The processor and the storage medium can be housed in an ASIC.
[0242] This application is described with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to this application. It should be understood that each block of the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in one or more blocks of the flowchart illustrations and / or one or more blocks of the block diagrams.
[0243] These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions, which execute on the computer or other programmable apparatus, provide steps for implementing the functions specified in one or more flowcharts and / or one or more block diagrams.
[0244] Obviously, those skilled in the art can make various modifications and variations to this application without departing from the spirit and scope of this application. Therefore, if such modifications and variations fall within the scope of the claims of this application and their equivalents, this application also intends to include such modifications and variations.
Claims
1. An image processing method, characterized in that the method includes: A first original RAW image is obtained by decoding from a first encoded bitstream. The first RAW image includes any one of the following: an image in CFA format, an image in RGGB format, or an image in YCoCgDg format, where R represents the red component, G represents the green component, B represents the blue component, Y represents the luminance component, Co and Cg represent the color components, and Dg represents the difference component. The first RAW image is subjected to image signal processing (ISP) to obtain the first RGB image; The first RAW image is encoded to obtain a second encoded bitstream; The second RAW image is obtained by decoding the second encoded bitstream; The first RGB image and the second RAW image satisfy a first mapping relationship.
2. The method according to claim 1, characterized in that, The first mapping relationship is used to obtain the second RGB image corresponding to the third RAW image, wherein the third RAW image is decoded from the third encoded bitstream, and the third compression ratio of the third encoded bitstream and the second compression ratio of the second encoded bitstream are greater than the first compression ratio of the first encoded bitstream.
3. The method according to claim 1 or 2, characterized in that, The method further includes: A first artificial intelligence (AI) neural network model is trained based on the first RGB image and the second RAW image. The first AI neural network model is used to represent the first mapping relationship between the RAW image decoded from the encoded bitstream and the corresponding RGB image.
4. The method according to any one of claims 1-3, characterized in that, The method further includes: The first RAW image is encoded to obtain a fourth encoded bitstream; The fourth RAW image is obtained by decoding the fourth encoded bitstream; The first RGB image and the fourth RAW image satisfy a second mapping relationship.
5. The method according to claim 4, characterized in that, The second mapping relationship is used to obtain the third RGB image corresponding to the fifth RAW image, wherein the fifth RAW image is decoded from the fifth encoded bitstream, and the fifth compression ratio of the fifth encoded bitstream and the fourth compression ratio of the fourth encoded bitstream are greater than the first compression ratio of the first encoded bitstream.
6. The method according to claim 4 or 5, characterized in that, The method further includes: A second AI neural network model is trained based on the first RGB image and the fourth RAW image. The second AI neural network model is used to represent a second mapping relationship between the RAW image decoded from the encoded bitstream and the corresponding RGB image.
7. The method according to any one of claims 1-6, characterized in that, The method further includes: Obtain the first encoded bitstream from the local storage medium; or, Receive the first encoded bitstream from the terminal device.
8. The method according to claim 7, characterized in that, The encoder of the terminal device is used to perform encoding operations, and the encoder supports an image bit depth greater than or equal to 8 bits.
9. The method according to any one of claims 1-8, characterized in that, The process of encoding the first RAW image to obtain the second encoded bitstream includes: The first RAW image is encoded using any of the following encoding standards to obtain the second encoded bitstream: H.264 encoding standard, H.265 encoding standard, H.266 encoding standard, Joint Photographic Experts Group (JPEG) low-latency encoding standard.
10. An image processing method, characterized in that, Applied to an image processing apparatus, the method includes: Obtain the third encoded bitstream; A third RAW image is obtained by decoding from the third encoded bitstream. The third RAW image includes any one of the following: an image in CFA format, an image in RGGB format, or an image in YCoCgDg format, where R represents the red component, G represents the green component, B represents the blue component, Y represents the luminance component, Co and Cg represent the color components, and Dg represents the difference component. Based on the first mapping relationship, the second RGB image corresponding to the third RAW image is obtained, wherein the first mapping relationship is the mapping relationship between the first RGB image corresponding to the first original RAW image and the second RAW image, the first RAW image is obtained by decoding from the first encoded bitstream, and the second RAW image is obtained by decoding from the second encoded bitstream, the second encoded bitstream is obtained by encoding the first RAW image.
11. The method according to claim 10, characterized in that, The third compression ratio of the third encoded bitstream and the second compression ratio of the second encoded bitstream are greater than the first compression ratio of the first encoded bitstream.
12. The method according to claim 10 or 11, characterized in that, The method further includes: Obtain the fifth encoded bitstream; A fifth RAW image is obtained by decoding the fifth encoded bitstream, and the fifth RAW image includes any one of the following: an image in CFA format, an image in RGGB format, or an image in YCoCgDg format; Based on the second mapping relationship, the third RGB image corresponding to the fifth RAW image is obtained. The second mapping relationship is a mapping relationship between the first RGB image corresponding to the first RAW image and the fourth RAW image. The fourth RAW image is obtained by decoding from the fourth encoded bitstream, which is obtained by encoding the first RAW image.
13. The method according to claim 12, characterized in that, The fifth compression ratio of the fifth encoded bitstream and the fourth compression ratio of the fourth encoded bitstream are greater than the first compression ratio of the first encoded bitstream.
14. The method according to any one of claims 10-13, characterized in that, The image processing device is located in a cloud device, which is further configured to perform backend processing based on at least one of the second RGB image and the third RGB image; or, The image processing device is located in the terminal device, which is also used to output at least one of the second RGB image and the third RGB image.
15. The method according to claim 14, characterized in that, When the image processing device is located in a cloud device, the acquisition of the third encoded bitstream includes: Receive the third encoded bitstream from the terminal device.
16. The method according to claim 15, characterized in that, The encoder of the terminal device is used to perform encoding operations, and the encoder supports an image bit depth greater than or equal to 8 bits.
17. The method according to any one of claims 10-16, characterized in that decoding the third RAW image from the third encoded bitstream includes: decoding the third RAW image from the third encoded bitstream using any one of the following coding standards: the H.264 coding standard, the H.265 coding standard, the H.266 coding standard, or the JPEG low-latency coding standard.
18. An image processing apparatus, characterized in that it includes: a first decoding unit, configured to decode a first RAW image from a first encoded bitstream, wherein the first RAW image includes any one of the following: an image in CFA format, an image in RGGB format, or an image in YCoCgDg format, where R represents the red component, G represents the green component, B represents the blue component, Y represents the luminance component, Co and Cg represent the color components, and Dg represents the difference component; an image signal processing unit, configured to perform image signal processing (ISP) on the first RAW image to obtain a first RGB image; an encoding unit, configured to encode the first RAW image to obtain a second encoded bitstream; and a second decoding unit, configured to decode the second encoded bitstream to obtain a second RAW image; wherein the first RGB image and the second RAW image satisfy a first mapping relationship.
19. An image processing apparatus, characterized in that it includes: an acquisition unit, configured to acquire a third encoded bitstream; and a decoding unit, configured to decode a third RAW image from the third encoded bitstream, wherein the third RAW image includes any one of the following: an image in CFA format, an image in RGGB format, or an image in YCoCgDg format, where R represents the red component, G represents the green component, B represents the blue component, Y represents the luminance component, Co and Cg represent the color components, and Dg represents the difference component; wherein the acquisition unit is further configured to acquire, based on a first mapping relationship, a second RGB image corresponding to the third RAW image, the first mapping relationship is a mapping relationship between a first RGB image corresponding to a first RAW image and a second RAW image, the first RAW image is obtained by decoding a first encoded bitstream, the second RAW image is obtained by decoding a second encoded bitstream, and the second encoded bitstream is obtained by encoding the first RAW image.
20. An electronic device, characterized in that it includes at least one processor, the at least one processor being coupled to a memory; the at least one processor is configured to execute a computer program or instructions stored in the memory to cause the electronic device to perform the method according to any one of claims 1-9, or to perform the method according to any one of claims 10-17.
21. A communication system, characterized in that it includes a cloud device configured to implement the method according to any one of claims 1-9, or to implement the method according to any one of claims 10-17.
22. A computer-readable storage medium, characterized in that the computer-readable storage medium stores program code that, when executed on a computer, causes the computer to perform the method according to any one of claims 1-9, or the method according to any one of claims 10-17.
23. A computer program product, characterized in that, when the computer program product is run on a computer, it causes the computer to perform the method according to any one of claims 1-9, or the method according to any one of claims 10-17.
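
For illustration, the data flow recited in claims 18 and 19 can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the claimed implementation. Every function name here (pack_rggb, toy_isp, lossy_roundtrip, fit_mapping, apply_mapping) is hypothetical. Uniform quantization stands in for a real encode and decode cycle such as H.265. A half-resolution demosaic stands in for the ISP. A least-squares linear map stands in for the AI neural network model described in the summary.

```python
import numpy as np


def pack_rggb(raw):
    """Split an H x W Bayer RGGB mosaic into an (H/2, W/2, 4) array: R, G1, G2, B."""
    return np.stack(
        [raw[0::2, 0::2], raw[0::2, 1::2], raw[1::2, 0::2], raw[1::2, 1::2]],
        axis=-1,
    )


def toy_isp(raw):
    """Hypothetical ISP stand-in: half-resolution demosaic that averages the two greens."""
    p = pack_rggb(raw).astype(np.float64)
    return np.stack([p[..., 0], 0.5 * (p[..., 1] + p[..., 2]), p[..., 3]], axis=-1)


def lossy_roundtrip(raw, step):
    """Hypothetical codec stand-in: encode + decode modeled as uniform quantization."""
    return np.round(raw / step) * step


def _design_matrix(raw):
    """Packed RAW samples as rows, with a constant column for the bias term."""
    x = pack_rggb(raw).reshape(-1, 4)
    return np.hstack([x, np.ones((x.shape[0], 1))])


def fit_mapping(second_raw, first_rgb):
    """Fit the 'first mapping relationship' as a linear map (assumption: not a neural network)."""
    weights, *_ = np.linalg.lstsq(
        _design_matrix(second_raw), first_rgb.reshape(-1, 3), rcond=None
    )
    return weights  # shape (5, 3): four RAW planes plus bias, mapped to R, G, B


def apply_mapping(weights, raw):
    """Map a decoded RAW image to an RGB image using the fitted weights."""
    h, w = raw.shape[0] // 2, raw.shape[1] // 2
    return (_design_matrix(raw) @ weights).reshape(h, w, 3)


rng = np.random.default_rng(0)

# Claim 18: first RAW image (treated here as already decoded from the first bitstream),
# its ISP output, and the second RAW image obtained by re-encoding and decoding it.
first_raw = rng.integers(0, 1024, size=(64, 64)).astype(np.float64)  # 10-bit samples
first_rgb = toy_isp(first_raw)
second_raw = lossy_roundtrip(first_raw, step=16)
weights = fit_mapping(second_raw, first_rgb)

# Claim 19: a third RAW image decoded from a third bitstream is mapped to a second RGB image.
third_raw = lossy_roundtrip(
    rng.integers(0, 1024, size=(64, 64)).astype(np.float64), step=16
)
second_rgb = apply_mapping(weights, third_raw)
```

In this sketch the second and third RAW images share the same quantization step, which loosely mirrors claim 11, where the second and third compression ratios both exceed the first. The linear map is chosen only because it is short and verifiable.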