Synthetic image signal processor

A synthetic image signal processor (ISP) uses a machine learning model to determine customized settings for each pixel. This addresses the time-consuming and expensive tuning of traditional ISPs and enables more efficient and flexible image processing with improved image quality.

CN117561538B · Active · Publication Date: 2026-06-26 · QUALCOMM INC

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
QUALCOMM INC
Filing Date
2022-06-08
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Tuning the parameters of a traditional image signal processor (ISP) is time-consuming and expensive, and static tuning settings cannot adapt to different scenes, resulting in poor image quality.

Method used

A synthetic image signal processor (ISP) uses one or more trained machine learning models to identify customized settings for each pixel based on image data and metadata. The settings adjust ISP parameters such as gain, offset, gamma, and Gaussian filtering, so that the parameter settings vary spatially across the image.

Benefits of technology

It improves image quality, reduces adjustment time and cost, enhances the customizability and flexibility of the ISP, reduces computing resources and silicon area requirements, and improves processing efficiency.

✦ Generated by Eureka AI based on patent content.


Abstract

Systems and techniques for image processing are described. An imaging system can include an image sensor that captures image data. An image signal processor (ISP) of the imaging system can demosaic the image data. The imaging system can input the image data, in some cases along with metadata associated with the image data, into one or more trained machine learning models. The one or more trained machine learning models can output settings for a set of parameters of the ISP based on the image data and / or the metadata. The imaging system can generate an output image by processing the image data using the ISP with the parameters of the ISP set according to the settings. Each of the pixels of the image data can be processed using respective settings for adjusting corresponding parameters. The parameters of the ISP can include gain, offset, gamma, and Gaussian filtering.

Description

Technical Field

[0001] This disclosure relates generally to image processing, and more specifically to systems and techniques for processing image data using a synthetic image signal processor (ISP) that uses one or more trained machine learning models to identify custom settings for different pixels in the image data.

Background

[0002] A camera is a device that uses an image sensor to receive light and capture image frames (such as still images or video frames). A camera may include a processor (such as an image signal processor (ISP)) that can receive and process one or more image frames. For example, raw image frames captured by the camera sensor can be processed by the ISP to generate a final image. A camera can be configured with various image capture and image processing settings to alter the appearance of an image. Some camera settings, such as ISO, exposure time, aperture size, f / stop, shutter speed, focus, and gain, are determined and applied before or during photo capture. Other camera settings can configure post-processing of the photo, such as changes to contrast, brightness, saturation, sharpness, levels, curves, or colors.

[0003] Traditional image signal processors (ISPs) have separate, discrete blocks that solve individual partitions of an image-based problem space. For example, a typical ISP has discrete function blocks, each applying specific operations to raw camera sensor data to create the final output image. Such function blocks can include blocks for demosaicing, noise reduction (denoising), color processing, tone mapping, and many other image processing functions. Each of these function blocks contains numerous pre-adjusted parameters, resulting in an ISP with a large number (e.g., over 10,000) of pre-adjusted parameters that must be readjusted according to each client's adjustment preferences. Manually adjusting such parameters is very time-consuming and expensive, and therefore is typically performed only once. Once adjusted, a traditional ISP typically processes the image using a limited set of adjustment settings. For example, there might be one set of adjustment settings for processing low-light images and a second set for processing bright images. For any single image, static adjustment settings are used to process the entire image.

Summary

[0004] In some examples, systems and techniques are described for processing image data using a synthetic image signal processor (ISP) that employs one or more trained machine learning models to identify customized settings for different pixels of the image data for each scene. An imaging system may include an image sensor that captures image data and an ISP that processes the image data. For example, the ISP of the imaging system may demosaic the image data and/or perform one or more other functions on the image data based on one or more settings with which the ISP was pre-configured or adjusted. The imaging system may use one or more machine learning models (e.g., one or more trained neural networks or other types of machine learning models) to determine or adjust settings for ISP functions other than those the ISP is pre-configured or adjusted to perform (e.g., demosaicing and/or other functions). For example, the imaging system may input the image data into the one or more machine learning models, in some cases along with metadata associated with the image data. The one or more trained machine learning models may output settings for a set of parameters of the ISP based on the image data and/or the metadata. In some cases, these settings can be used to adjust or fine-tune a previously adjusted or initialized value of a parameter in the set of ISP parameters (e.g., after the ISP has already been adjusted by an original equipment manufacturer).

[0005] An imaging system can generate an output image by processing image data using the ISP, where the parameters of the ISP are set or adjusted according to the settings. The settings provided by the machine learning model vary spatially, so that each pixel is modified based on the parameters as set or adjusted for that pixel. For example, each pixel in the image data can be processed using corresponding settings for adjusting corresponding parameters (e.g., a gain setting specific to that pixel). Using spatially varying settings provides advantages over using statically adjusted settings for processing the entire image (e.g., better output image quality). In some examples, the parameters of the ISP may include parameters for gain, offset, gamma, and Gaussian filtering. In one example, the ISP may use a multiplier to apply gain settings for one or more gain parameters (e.g., which may correspond to different color channels) to one or more pixels. In another example, the ISP may use an adder to apply offset settings for one or more offset parameters (e.g., which may correspond to different color channels) to one or more pixels. In yet another example, the ISP may use a tone adjuster to apply gamma settings for gamma parameters to one or more pixels. In another example, the ISP can use a Gaussian filter to apply Gaussian filter settings for one or more Gaussian filter parameters (e.g., which can define the shape of the Gaussian curve) to one or more pixels.
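The per-pixel use of the multiplier, adder, and tone adjuster described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `PixelSettings`, `process_pixel`, and `process_image` are hypothetical, pixel values are assumed to be normalized to [0, 1], the order gain, then offset, then gamma (the "grayscale coefficient") is assumed, and the tone adjuster is assumed to be a plain power curve.

```python
from dataclasses import dataclass
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B), assumed normalized to [0.0, 1.0]


@dataclass
class PixelSettings:
    """Hypothetical per-pixel settings, as a trained model might emit them."""
    gain: Tuple[float, float, float]    # one multiplier setting per color channel
    offset: Tuple[float, float, float]  # one adder setting per color channel
    gamma: float                        # single value shared by R, G, and B


def clamp01(value: float) -> float:
    """Keep a value inside the normalized range [0.0, 1.0]."""
    return min(1.0, max(0.0, value))


def process_pixel(pixel: Pixel, settings: PixelSettings) -> Pixel:
    """Apply gain, offset, and gamma to one pixel (assumed order)."""
    out = []
    for value, gain, offset in zip(pixel, settings.gain, settings.offset):
        adjusted = clamp01(value * gain + offset)  # multiplier, then adder
        out.append(adjusted ** settings.gamma)     # tone adjustment (power curve)
    return tuple(out)


def process_image(image: List[List[Pixel]],
                  settings: List[List[PixelSettings]]) -> List[List[Pixel]]:
    """Process every pixel with its own settings (spatially varying)."""
    return [[process_pixel(p, s) for p, s in zip(pixel_row, settings_row)]
            for pixel_row, settings_row in zip(image, settings)]
```

Because `settings` has the same shape as `image`, two neighboring pixels with identical input values can produce different outputs, which is the difference from a statically adjusted ISP that applies one setting to the whole frame.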

[0006] In one example, an apparatus for image processing is provided. The apparatus includes a memory and one or more processors (e.g., implemented in a circuit) coupled to the memory. The one or more processors are configured to: acquire image data associated with an image frame; determine multiple settings for adjusting one or more parameters of an image signal processor (ISP) based on the output of one or more trained machine learning models using the image data as input, wherein the multiple settings vary spatially among the image data; and generate an output image by processing a plurality of pixels of the image data at least in part using the ISP, wherein each of the plurality of pixels is processed using a corresponding setting of the multiple settings for adjusting a corresponding parameter among the one or more parameters.

[0007] In another example, an image processing method is provided. The method includes: obtaining image data associated with an image frame; determining multiple settings for adjusting one or more parameters of an image signal processor (ISP) based on the output of one or more trained machine learning models using the image data as input, wherein the multiple settings vary spatially among the image data; and generating an output image by processing multiple pixels of the image data at least in part using the ISP, wherein each of the multiple pixels is processed using a corresponding setting of the multiple settings for adjusting a corresponding parameter among the one or more parameters.

[0008] In another example, a non-transitory computer-readable medium is provided having instructions stored thereon, which, when executed by one or more processors, cause the one or more processors to: acquire image data associated with an image frame; determine, based on the output of one or more trained machine learning models using the image data as input, a plurality of settings for adjusting one or more parameters of an image signal processor (ISP), wherein the plurality of settings vary spatially among the image data; and generate an output image by processing a plurality of pixels of the image data at least in part using the ISP, wherein each of the plurality of pixels is processed using a corresponding setting of the plurality of settings for adjusting a corresponding parameter among the one or more parameters.

[0009] In another example, an apparatus for image processing is provided. The apparatus includes: means for acquiring image data associated with an image frame; means for determining, based on the output of one or more trained machine learning models using the image data as input, a plurality of settings for adjusting one or more parameters of an image signal processor (ISP), wherein the plurality of settings vary spatially among the image data; and means for generating an output image by processing a plurality of pixels of the image data at least in part using the ISP, wherein each of the plurality of pixels is processed using a corresponding setting of the plurality of settings for adjusting a corresponding parameter among the one or more parameters.

[0010] In some aspects, the image data includes demosaiced image data.

[0011] In some aspects, the image data is raw image data with multiple color components corresponding to a color filter array of the image sensor.

[0012] In some aspects, inputting the image data into the one or more trained machine learning models includes inputting raw image data into the one or more trained machine learning models.

[0013] In some aspects, in order to generate the output image, the methods, apparatuses, and computer-readable media described above further include: demosaicing the image data before processing the multiple pixels of the image data using the ISP, with each of the one or more parameters adjusted based on the multiple settings.

[0014] In some aspects, the methods, apparatuses, and computer-readable media described above also include: demosaicing the image data before inputting the image data into the one or more trained machine learning models.

[0015] In some aspects, in order to obtain image data, the methods, apparatus and computer-readable medium described above also include receiving image data from an image sensor that captures image data.

[0016] In some aspects, the methods, apparatus, and computer-readable media described above also include: obtaining metadata corresponding to image data. In such aspects, the output of one or more trained machine learning models is based on inputting the metadata and image data into one or more trained machine learning models.

[0017] In some aspects, one or more parameters of the ISP include multiple gain parameters, and multiple settings include multiple gain settings corresponding to the multiple gain parameters. In such aspects, each of the multiple gain parameters corresponds to one color channel among multiple color channels. In order to process multiple pixels of image data using the ISP, the ISP is configured to perform one or more multiplier operations on at least one pixel based on the multiple gain settings.

[0018] In some aspects, one or more parameters of the ISP include multiple offset parameters, and multiple settings include multiple offset settings corresponding to the multiple offset parameters. In such aspects, each of the multiple offset parameters corresponds to one color channel among multiple color channels, wherein in order to process multiple pixels of image data using the ISP, the ISP is configured to perform one or more addition operations on at least one pixel based on the multiple offset settings.

[0019] In some aspects, one or more parameters of the ISP include one or more gamma parameters, and multiple settings include one or more gamma settings corresponding to the one or more gamma parameters. In such aspects, in order to process multiple pixels of image data using the ISP, the ISP is configured to adjust the tone of at least one pixel based on the one or more gamma settings.

[0020] In some aspects, one or more parameters of the ISP include one or more Gaussian filter parameters, and multiple settings include one or more Gaussian filter settings corresponding to the one or more Gaussian filter parameters. In such aspects, in order to process multiple pixels of image data using the ISP, the ISP is configured to apply a Gaussian filter to at least one pixel based on a Gaussian curve. The shape of the Gaussian curve is based on one or more Gaussian filter settings.
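As a hedged sketch of how a Gaussian filter setting can shape the curve applied to a single pixel, the example below builds a normalized 5×5 kernel from one standard deviation and filters one pixel with it. The single `sigma` setting (which must be positive), the isotropic kernel, the edge-replication border handling, and the function names are all assumptions made for illustration; the text only states that the settings define the shape of the Gaussian curve.

```python
import math
from typing import List


def gaussian_kernel(sigma: float, size: int = 5) -> List[List[float]]:
    """Build a size x size Gaussian kernel whose weights sum to 1."""
    half = size // 2
    weights = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
                for x in range(-half, half + 1)]
               for y in range(-half, half + 1)]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


def filter_pixel(channel: List[List[float]], row: int, col: int,
                 sigma: float) -> float:
    """Filter one pixel of a single-channel image using its own sigma."""
    kernel = gaussian_kernel(sigma)
    half = len(kernel) // 2
    height, width = len(channel), len(channel[0])
    acc = 0.0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            # Clamp coordinates at the border (edge replication, assumed).
            y = min(height - 1, max(0, row + dy))
            x = min(width - 1, max(0, col + dx))
            acc += kernel[dy + half][dx + half] * channel[y][x]
    return acc
```

A small `sigma` concentrates the weight on the center sample and leaves the pixel nearly unchanged, while a large `sigma` spreads the weight across the neighborhood and smooths more strongly, so a per-pixel `sigma` lets smoothing strength differ between regions of the same image.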

[0021] In some aspects, one or more parameters of the ISP include one or more demosaic parameters, and multiple settings include one or more demosaic settings corresponding to the one or more demosaic parameters. In such aspects, in order to process multiple pixels of image data using the ISP, the ISP is configured to demosaic at least one pixel of the image data based on one or more demosaic settings.

[0022] In some aspects, one or more parameters of the ISP are associated with at least one of noise reduction, sharpening, tone mapping, and color saturation.

[0023] In some aspects, in order to process multiple pixels of image data using an ISP that has each of one or more parameters adjusted based on multiple settings, the methods, apparatuses, and computer-readable media described above further include: processing a first pixel of the multiple pixels of the image data based on a first setting of a first parameter of one or more parameters; and processing a second pixel of the multiple pixels of the image data based on a second setting of the first parameter, wherein the multiple settings include at least the first setting and the second setting.

[0024] In some aspects, in order to process multiple pixels of image data using an ISP that has each of one or more parameters adjusted based on multiple settings, the methods, apparatuses, and computer-readable media described above further include: processing a first pixel of the multiple pixels of the image data based on a first setting of a first parameter of one or more parameters; and processing the first pixel of the multiple pixels based on a second setting corresponding to a second parameter of one or more parameters, wherein the multiple settings include at least the first setting and the second setting.

[0025] In some aspects, the methods, apparatuses, and computer-readable media described above also include displaying the output image on one or more displays (e.g., of the apparatus or of another device).

[0026] In some aspects, the one or more trained machine learning models comprise one or more trained neural networks.

[0027] In some aspects, each of the devices or apparatuses described above is, may be part of, or may include: an extended reality (XR) device (e.g., a virtual reality (VR) device, an augmented reality (AR) device, or a mixed reality (MR) device), a smart device or assistant, a vehicle, a mobile device (e.g., a mobile phone or so-called "smartphone" or other mobile device), a wearable device, a personal computer, a laptop computer, a tablet computer, a server computer, or another device. In some aspects, the device or apparatus includes an image sensor (e.g., a camera) or multiple image sensors (e.g., multiple cameras) for capturing one or more images. In some aspects, the device or apparatus includes one or more displays for displaying one or more images, notifications, and/or other displayable data. In some aspects, the device or apparatus includes one or more speakers, one or more light-emitting devices, and/or one or more microphones. In some aspects, the device or apparatus described above may include one or more sensors. In some cases, the one or more sensors may be used to determine the location of the device, the state of the device (e.g., tracking state, operating state, temperature, humidity level, and/or other states), and/or for other purposes.

[0028] This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used in isolation to define the scope of the claimed subject matter. This summary should be understood with reference to the appropriate portions of the entire specification, any or all drawings, and each claim.

[0029] The foregoing, as well as other features and embodiments, will become more apparent from the following description, claims, and accompanying drawings.

Brief Description of the Drawings

[0030] The illustrative embodiments of this application are described in detail below with reference to the following figures, wherein:

[0031] Figure 1 is a block diagram illustrating an exemplary architecture of an image capture and processing system, according to some examples;

[0032] Figure 2A is a block diagram illustrating an exemplary architecture of an imaging system with an image signal processor (ISP) that is set or adjusted based on settings generated using a trained machine learning system, according to some examples;

[0033] Figure 2B is a block diagram illustrating an example imaging system with demosaiced images, spatially varying settings, and output images, according to some examples;

[0034] Figure 3A is a conceptual diagram illustrating an example of an input image that includes multiple pixels labeled P0 to P63, according to some examples;

[0035] Figure 3B is a conceptual diagram illustrating spatially varying settings that map to each pixel of the input image of Figure 3A, according to some examples;

[0036] Figure 4 is a block diagram illustrating an example of a neural network that can be used by a trained machine learning system to generate settings usable by an image signal processor (ISP), according to some examples;

[0037] Figure 5 is a block diagram illustrating an example of a neural network architecture for a trained neural network in a trained machine learning system, according to some examples;

[0038] Figure 6 is a flowchart illustrating operations for processing image data, according to some examples; and

[0039] Figure 7 is a diagram illustrating an example of a computing system used to implement some of the aspects described herein.

Detailed Description

[0040] Certain aspects and embodiments of this disclosure are provided below. Some of these aspects and embodiments may be applied independently, and some may be combined, as will be apparent to those skilled in the art. Specific details are set forth in the following description for purposes of explanation to provide a thorough understanding of embodiments of this application. However, it will be apparent that various embodiments may be practiced without these specific details. The accompanying drawings and descriptions are not intended to be limiting.

[0041] The following description provides only exemplary embodiments and is not intended to limit the scope, applicability, or configuration of this disclosure. Rather, the following description of exemplary embodiments is intended to provide those skilled in the art with enabling descriptions for implementing the exemplary embodiments. It should be understood that various changes may be made to the function and arrangement of elements without departing from the spirit and scope of this application as set forth in the appended claims.

[0042] A camera is a device that uses an image sensor to receive light and capture image frames (such as still images or video frames). The terms "image," "image frame," and "frame" are used interchangeably herein. A camera can be configured with various image capture and image processing settings. Different settings produce images with different appearances. Camera settings such as ISO, exposure time, aperture size, f / stop, shutter speed, focus, and gain are determined and applied before or during the capture of one or more image frames. For example, settings or parameters can be applied to the image sensor used to capture one or more image frames. Other camera settings can configure post-processing of one or more image frames, such as changes to contrast, brightness, saturation, sharpness, level, curves, or color. For example, settings or parameters can be applied to a processor (e.g., an image signal processor or ISP) used to process one or more image frames captured by the image sensor.

[0043] Image capture devices capture images by receiving light from a scene using an image sensor with an array of photodiodes. An image signal processor (ISP) then processes the raw image data captured by the photodiodes of the image sensor into an image that the user can store and view. The way the scene is depicted in the image depends in part on capture settings that control how much light the image sensor receives, such as exposure time and aperture settings. The way the scene is depicted also depends on how the ISP is adjusted to process the photodiode data captured by the image sensor into an image.

[0044] ISP processing of raw image data can include demosaicing, which combines data from different color components (e.g., red, green, blue) of a color filter array (CFA) into individual pixels that each have multiple color channels (e.g., red, green, blue), thus reproducing colors from the entire color spectrum. ISPs can also perform other types of processing based on their parameters, including, for example, noise reduction, sharpening, tone mapping, and color saturation. ISPs can generate different images from the same raw image data depending on how certain ISP parameters are adjusted.
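To make the demosaicing step concrete, the following is a minimal sketch that fills in the two missing color channels of each pixel by averaging the nearest samples of that color in a 3×3 neighborhood. It assumes an RGGB Bayer color filter array and a raw frame of at least 2×2 samples; the pattern, the interpolation method, and the function names are illustrative assumptions rather than the demosaicing an actual ISP performs.

```python
from typing import List, Tuple


def bayer_color(row: int, col: int) -> str:
    """Color sampled at (row, col) in an assumed RGGB color filter array."""
    if row % 2 == 0:
        return "R" if col % 2 == 0 else "G"
    return "G" if col % 2 == 0 else "B"


def demosaic(raw: List[List[float]]) -> List[List[Tuple[float, float, float]]]:
    """Turn a single-channel Bayer frame (at least 2x2) into RGB pixels."""
    height, width = len(raw), len(raw[0])
    out = []
    for r in range(height):
        out_row = []
        for c in range(width):
            sums = {"R": 0.0, "G": 0.0, "B": 0.0}
            counts = {"R": 0, "G": 0, "B": 0}
            # Collect every in-bounds sample in the 3x3 neighborhood by color.
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < height and 0 <= cc < width:
                        color = bayer_color(rr, cc)
                        sums[color] += raw[rr][cc]
                        counts[color] += 1
            own = bayer_color(r, c)
            # Keep the measured sample; interpolate the two missing channels.
            out_row.append(tuple(
                raw[r][c] if k == own else sums[k] / counts[k] for k in "RGB"))
        out.append(out_row)
    return out
```

Each output pixel carries three channels even though the sensor measured only one of them at that location, which is the sense in which demosaicing combines CFA data into multi-channel pixels.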

[0045] Traditionally, the ISP of an image capture device is adjusted only once, during the manufacturing process. This adjustment affects how every image is processed within the image capture device, and it typically affects each pixel of every image in the same way. Users generally expect their image capture devices to capture high-quality images regardless of the scene being shot. To avoid situations where an image capture device cannot properly capture certain types of scenes, the ISP is usually adjusted to work well for as many scene types as possible. Precisely because of this, however, a traditional ISP adjustment is generally not optimal for any particular type of scene.

[0046] An image signal processor (ISP) can be designed to use one or more trained machine learning (ML) models (e.g., trained neural networks (NNs) and/or other trained ML models). For example, a fully ML-based ISP can feed raw image data into one or more neural networks (or other trained ML models) that output images that can be stored and viewed by the user. ML-based ISPs may be more customizable than pre-tuned ISPs. For example, ML-based ISPs can process different images in different ways, such as based on the different scenes depicted in different images. However, because fast image processing requires a large number of components, a fully ML-based ISP may require a larger silicon area than a pre-tuned ISP. A fully ML-based ISP also requires heavy computational resources because it needs to process many pixels. A fully ML-based ISP can therefore consume a large share of limited battery life and computational resources. Using a fully ML-based ISP in devices with limited battery life and computational resources (such as mobile devices) may thus further reduce the already limited battery life of these devices, slow down their processing, and so on. Similarly, a fully ML-based ISP may occupy a large amount of space in devices, such as mobile devices, in which the space available for internal electronics is limited. In some cases, a fully ML-based ISP may be inefficient for certain ISP tasks. For example, a convolutional neural network (CNN) may be less efficient than a pre-tuned ISP component at performing tone adjustment.

[0047] This document describes systems, apparatuses, processes (also referred to as methods), and computer-readable media (collectively, "systems and techniques") for providing a synthetic (or hybrid) ISP that combines aspects of a pre-tuned ISP and aspects of an ML-based ISP. Synthetic ISPs can achieve a good balance between customization and efficiency. For example, a synthetic ISP includes a trained ML system with one or more trained ML models (e.g., one or more trained neural networks). In some cases, the trained ML system may receive, as input, image data from an image, image patches of the image (e.g., blocks or arrays comprising pixels from the image), metadata from the image capture process, or any combination thereof. Metadata may include, for example, automatic white balance (AWB), analog gain, digital gain, ISO, radial distance, other metadata, or any combination thereof.
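One way to picture how an image patch and capture metadata could be combined into a single model input is sketched below. Everything here is a labeled assumption: the text lists the metadata types but not their encoding, so the `CaptureMetadata` container, the representation of AWB as three channel gains, the definition of radial distance as normalized distance from the image center, and the flat feature-vector layout are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CaptureMetadata:
    """Hypothetical container for the metadata types named in the text."""
    awb_gains: Tuple[float, float, float]  # assumed: white balance gains (R, G, B)
    analog_gain: float
    digital_gain: float
    iso: float


def radial_distance(row: int, col: int, height: int, width: int) -> float:
    """Distance from the image center, scaled so a corner equals 1.0.

    Assumed definition; requires an image larger than 1x1.
    """
    center_y, center_x = (height - 1) / 2.0, (width - 1) / 2.0
    return math.hypot(row - center_y, col - center_x) / math.hypot(center_y, center_x)


def build_model_input(patch: List[List[Tuple[float, float, float]]],
                      meta: CaptureMetadata,
                      row: int, col: int,
                      height: int, width: int) -> List[float]:
    """Flatten an RGB patch and append the metadata as extra features."""
    features = [v for patch_row in patch for pixel in patch_row for v in pixel]
    features.extend(meta.awb_gains)
    features.extend([meta.analog_gain, meta.digital_gain, meta.iso])
    # Position of the patch within the full frame, given by (row, col).
    features.append(radial_distance(row, col, height, width))
    return features
```

Appending position and capture metadata gives a model information that the pixel values alone do not carry, for instance how far a patch lies from the optical center or how much sensor gain was applied at capture time.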

[0048] A trained ML system can identify customized ISP settings for different parameters of the ISP for different images. For example, customized settings determined by a trained ML system for one or more images can be output to one or more non-ML ISP components, such as multipliers, adders, tone adjusters, and/or Gaussian filters. The non-ML ISP components can perform one or more processing operations on the one or more images based on the customized settings, and can provide processing effects such as noise reduction, sharpening, tone mapping, color saturation, or combinations thereof. In some cases, a trained ML system can identify customized settings for different parts of a single image, such that the settings vary spatially across the image. For example, a trained ML system can generate a first customized setting of an ISP parameter for a first pixel of an image, and also generate a second customized setting of the same ISP parameter for a second pixel of the same image.

[0049] In some examples, the operation and/or functionality of the ISP can be simplified. For example, the operations of the ISP may include a multiplier, an adder, a tone adjuster, and a Gaussian filter. In some examples, the trained ML system outputs different settings for the various parameters of each operation. In some examples, the trained ML system may output three gain settings corresponding to the three gain parameters of the multiplier, i.e., one setting each for the red channel gain parameter, the green channel gain parameter, and the blue channel gain parameter. In some examples, the trained ML system outputs three offset settings for the three offset parameters of the adder, i.e., one setting each for the red channel offset parameter, the green channel offset parameter, and the blue channel offset parameter. In some examples, the trained ML system outputs a single gamma setting for the gamma parameter of the tone adjuster, which can be shared among the red, green, and blue channels. In some examples, the trained ML system outputs four settings for the four parameters of the Gaussian filter, which can specify a Gaussian curve for a Gaussian filter (e.g., a 5×5 Gaussian filter) that can be shared among the red, green, and blue channels.
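Counting the settings listed above gives 3 gain + 3 offset + 1 gamma (the "grayscale coefficient") + 4 Gaussian = 11 values per pixel. The sketch below shows one hypothetical way to unpack a model output with that many values per pixel into named fields. The ordering of the values, the flat row-major layout, and the names `IspSettings`, `unpack_settings`, and `unpack_settings_map` are assumptions; the four Gaussian values are carried through uninterpreted because the text does not say what each one means.

```python
from typing import List, NamedTuple, Sequence, Tuple


class IspSettings(NamedTuple):
    """Hypothetical named view of one pixel's settings."""
    gain: Tuple[float, float, float]              # R, G, B multiplier settings
    offset: Tuple[float, float, float]            # R, G, B adder settings
    gamma: float                                  # shared by R, G, and B
    gaussian: Tuple[float, float, float, float]   # four curve-shaping values


SETTINGS_PER_PIXEL = 3 + 3 + 1 + 4  # gain + offset + gamma + Gaussian


def unpack_settings(vector: Sequence[float]) -> IspSettings:
    """Split one pixel's values into named groups (assumed ordering)."""
    if len(vector) != SETTINGS_PER_PIXEL:
        raise ValueError(f"expected {SETTINGS_PER_PIXEL} values, got {len(vector)}")
    return IspSettings(gain=tuple(vector[0:3]),
                       offset=tuple(vector[3:6]),
                       gamma=vector[6],
                       gaussian=tuple(vector[7:11]))


def unpack_settings_map(flat: Sequence[float], height: int,
                        width: int) -> List[List[IspSettings]]:
    """Reshape a flat, row-major model output into a per-pixel settings map."""
    n = SETTINGS_PER_PIXEL
    if len(flat) != height * width * n:
        raise ValueError("model output does not match the image size")
    return [[unpack_settings(flat[(r * width + c) * n:(r * width + c + 1) * n])
             for c in range(width)]
            for r in range(height)]
```

Under these assumptions, an 8×8 image such as the one with pixels P0 to P63 would need 64 × 11 = 704 values, far fewer than the output of a network that must synthesize the full image itself.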

[0050] Using such synthetic ISPs offers various technical improvements over pre-tuned ISPs and fully ML-based ISPs. For example, the synthetic ISP described herein is a technical improvement over pre-tuned ISPs limited to uniform settings because synthetic ISPs can provide different image-optimized settings for different images. In another example, synthetic ISPs are a technical improvement over pre-tuned ISPs limited to uniform settings because synthetic ISPs can provide different region-optimized settings for different regions of each image. In such examples, synthetic ISPs are more customizable and flexible than pre-tuned ISPs. Synthetic ISPs are also a technical improvement over fully ML-based ISPs because they occupy less silicon area, can be more efficient in terms of battery usage, and can be more efficient in terms of computational resource usage. Synthetic ISPs offer a further technical improvement over fully ML-based ISPs because they can efficiently perform pixel processing tasks, such as tone adjustment, that are inefficient in fully ML-based ISPs.

[0051] Various aspects of this application will be described with reference to the accompanying drawings. Figure 1 is a block diagram illustrating the architecture of an image capture and processing system 100. The image capture and processing system 100 includes various components used for capturing and processing images of a scene (e.g., an image of scene 110). The image capture and processing system 100 can capture individual images (or photographs) and/or can capture video including multiple images (or video frames) in a specific sequence. A lens 115 of the system 100 faces scene 110 and receives light from scene 110. The lens 115 bends the light toward an image sensor 130. The light received by the lens 115 passes through an aperture controlled by one or more control mechanisms 120 and is received by the image sensor 130.

[0052] One or more control mechanisms 120 may control exposure, focus, and / or zoom based on information from image sensor 130 and / or information from image processor 150. One or more control mechanisms 120 may include multiple mechanisms and components; for example, control mechanism 120 may include one or more exposure control mechanisms 125A, one or more focus control mechanisms 125B, and / or one or more zoom control mechanisms 125C. One or more control mechanisms 120 may also include additional control mechanisms besides those shown, such as control mechanisms for analog gain, flash, HDR, depth of field, and / or other image capture attributes.

[0053] The focus control mechanism 125B of the control mechanism 120 can obtain focus settings. In some examples, the focus control mechanism 125B stores the focus settings in a memory register. Based on the focus settings, the focus control mechanism 125B can adjust the position of the lens 115 relative to the position of the image sensor 130. For example, based on the focus settings, the focus control mechanism 125B can adjust the focus by actuating a motor or servo to move the lens 115 closer to or further away from the image sensor 130. In some cases, additional lenses, such as one or more microlenses above each photodiode of the image sensor 130, may be included in the system 100, each microlens bending light received from the lens 115 toward the corresponding photodiode before it reaches the photodiode. The focus settings can be determined by contrast detection autofocus (CDAF), phase detection autofocus (PDAF), or some combination thereof. The focus settings can be determined using the control mechanism 120, the image sensor 130, and / or the image processor 150. The focus settings may be referred to as image capture settings and / or image processing settings.

[0054] The exposure control mechanism 125A of the control mechanism 120 can obtain the exposure settings. In some cases, the exposure control mechanism 125A stores the exposure settings in a memory register. Based on these exposure settings, the exposure control mechanism 125A can control the size of the aperture (e.g., aperture size or f/stop), the duration for which the aperture is open (e.g., exposure time or shutter speed), the sensitivity of the image sensor 130 (e.g., ISO speed or film speed), the analog gain applied by the image sensor 130, or any combination thereof. The exposure settings may be referred to as image capture settings and / or image processing settings.

[0055] The zoom control mechanism 125C of the control mechanism 120 can obtain zoom settings. In some examples, the zoom control mechanism 125C stores the zoom settings in a memory register. Based on the zoom settings, the zoom control mechanism 125C can control the focal length of a lens element assembly (lens assembly) including lens 115 and one or more additional lenses. For example, the zoom control mechanism 125C can control the focal length of the lens assembly by actuating one or more motors or servos to move one or more lenses relative to each other. The zoom settings may be referred to as image capture settings and / or image processing settings. In some examples, the lens assembly may include a parfocal zoom lens or a varifocal zoom lens. In some examples, the lens assembly may include a focusing lens (which in some cases may be lens 115) that first receives light from scene 110, wherein the light then passes through an afocal zoom system between the focusing lens (e.g., lens 115) and image sensor 130 before reaching image sensor 130. In some cases, an afocal zoom system may include two positive (e.g., converging, convex) lenses with equal or similar focal lengths (e.g., within a threshold difference), with a negative (e.g., diverging, concave) lens between them. In some cases, the zoom control mechanism 125C moves one or more of the lenses in the afocal zoom system, such as the negative lens and one or both of the positive lenses.

[0056] Image sensor 130 includes one or more arrays of photodiodes or other photosensitive elements. Each photodiode measures an amount of light that ultimately corresponds to a specific pixel in the image generated by image sensor 130. In some cases, different photodiodes may be covered by different color filters, so that each photodiode measures light matching the color of the filter covering it. For example, Bayer color filters include red, blue, and green color filters, where each pixel of the image is generated based on red light data from at least one photodiode covered by the red color filter, blue light data from at least one photodiode covered by the blue color filter, and green light data from at least one photodiode covered by the green color filter. Other types of color filters may use yellow, magenta, and / or cyan (also known as "emerald") color filters in place of, or in addition to, red, blue, and / or green color filters. Some image sensors may lack color filters altogether, and may instead use different photodiodes (in some cases stacked vertically) throughout the pixel array. Different photodiodes throughout the pixel array may have different spectral sensitivity profiles, thereby responding to light of different wavelengths. Monochrome image sensors may also lack color filters and therefore lack color depth.

[0057] In some cases, image sensor 130 may alternatively or additionally include an opaque and / or reflective mask that blocks light from reaching certain photodiodes or portions of certain photodiodes at certain times and / or from certain angles, which can be used for phase detection autofocus (PDAF). Image sensor 130 may also include an analog gain amplifier for amplifying the analog signal output from the photodiodes and / or an analog-to-digital converter (ADC) for converting the analog signal output from the photodiodes (and / or the analog signal amplified by the analog gain amplifier) into a digital signal. In some cases, certain components or functions discussed with respect to one or more control mechanisms 120 may alternatively or additionally be included in image sensor 130. Image sensor 130 may be a charge-coupled device (CCD) sensor, an electron-multiplying CCD (EMCCD) sensor, an active pixel sensor (APS), a complementary metal-oxide-semiconductor (CMOS) sensor, an N-type metal-oxide-semiconductor (NMOS) sensor, a hybrid CCD / CMOS sensor (e.g., sCMOS), or some combination thereof.

[0058] The image processor 150 may include one or more processors, such as one or more image signal processors (ISPs) (including ISP 154), one or more host processors (including host processor 152), and / or one or more of any other type of processor 710 discussed with respect to computing device 700. Host processor 152 may be a digital signal processor (DSP) and / or another type of processor. In some implementations, the image processor 150 is a single integrated circuit or chip (e.g., referred to as a system-on-a-chip or SoC) including host processor 152 and ISP 154. In some cases, the chip may also include one or more input / output ports (e.g., input / output (I / O) port 156), a central processing unit (CPU), a graphics processing unit (GPU), a broadband modem (e.g., 3G, 4G or LTE, 5G, etc.), memory, and connectivity components (e.g., Bluetooth™). The I / O port 156 may include any suitable input / output port or interface according to one or more protocols or specifications, such as an Inter-Integrated Circuit 2 (I2C) interface, an Inter-Integrated Circuit 3 (I3C) interface, a Serial Peripheral Interface (SPI) interface, a serial General Purpose Input / Output (GPIO) interface, a Mobile Industry Processor Interface (MIPI) (such as a MIPI CSI-2 physical (PHY) layer port or interface), an Advanced High-performance Bus (AHB) bus, any combination thereof, and / or other input / output ports. In an illustrative example, the host processor 152 may communicate with the image sensor 130 using the I2C port, and the ISP 154 may communicate with the image sensor 130 using the MIPI port.

[0059] Image processor 150 can perform multiple tasks, such as demosaicing, color space conversion, image frame downsampling, pixel interpolation, automatic exposure (AE) control, automatic gain control (AGC), CDAF, PDAF, automatic white balance, merging image frames to form an HDR image, image recognition, object recognition, feature recognition, receiving input, managing output, managing memory, or a combination thereof. Image processor 150 can store image frames and / or processed images in random access memory (RAM) 140 / 5020, read-only memory (ROM) 145 / 5025, cache, memory unit, another storage device, or a combination thereof.

[0060] Various input / output (I / O) devices 160 may be connected to the image processor 150. I / O devices 160 may include a display screen, keyboard, keypad, touchscreen, touchpad, touch-sensitive surface, printer, any other output device 735, any other input device 745, or some combination thereof. In some cases, text may be input to the image processing device 105B via the physical keyboard or keypad of the I / O device 160, or via a virtual keyboard or keypad on the touchscreen of the I / O device 160. I / O 160 may include one or more ports, jacks, or other connectors that enable wired connections between the system 100 and one or more peripheral devices, through which the system 100 may receive data from and / or transmit data to one or more peripheral devices. I / O 160 may include one or more wireless transceivers that enable wireless connections between the system 100 and one or more peripheral devices, through which the system 100 may receive data from and / or transmit data to one or more peripheral devices. Peripheral devices may include any type of I / O device 160 discussed earlier, and they can be considered I / O devices 160 in themselves once they are coupled to ports, jacks, wireless transceivers or other wired and / or wireless connectors.

[0061] In some cases, the image capture and processing system 100 may be a single device. In other cases, the image capture and processing system 100 may be two or more independent devices, including an image capture device 105A (e.g., a camera) and an image processing device 105B (e.g., a computing device coupled to the camera). In some implementations, the image capture device 105A and the image processing device 105B may be coupled together, for example, via one or more wires, cables, or other electrical connectors, and / or wirelessly coupled together via one or more wireless transceivers. In some implementations, the image capture device 105A and the image processing device 105B may be disconnected from each other.

[0062] As shown in Figure 1, a vertical dashed line divides the image capture and processing system 100 of Figure 1 into two parts, namely an image capture device 105A and an image processing device 105B. The image capture device 105A includes a lens 115, a control mechanism 120, and an image sensor 130. The image processing device 105B includes an image processor 150 (including an ISP 154 and a host processor 152), RAM 140, ROM 145, and I / O 160. In some cases, certain components shown in the image processing device 105B (such as the ISP 154 and / or the host processor 152) may be included in the image capture device 105A.

[0063] Image capture and processing system 100 may include electronic devices such as mobile or landline handsets (e.g., smartphones, cellular phones, etc.), desktop computers, laptops or notebook computers, tablet computers, set-top boxes, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, Internet Protocol (IP) cameras, or any other suitable electronic devices. In some examples, image capture and processing system 100 may include one or more wireless transceivers for wireless communication, such as cellular network communication, 802.11 Wi-Fi communication, wireless local area network (WLAN) communication, or some combination thereof. In some implementations, image capture device 105A and image processing device 105B may be different devices. For example, image capture device 105A may include a camera device, and image processing device 105B may include a computing device, such as a mobile handset, desktop computer, or other computing device.

[0064] Although the image capture and processing system 100 is shown to include certain components, those skilled in the art will appreciate that the image capture and processing system 100 may include more or fewer components than those shown in Figure 1. The components of the image capture and processing system 100 may include software, hardware, or one or more combinations of software and hardware. For example, in some implementations, the components of the image capture and processing system 100 may include and / or may be implemented using electronic circuitry or other electronic hardware (which may include one or more programmable electronic circuits (e.g., microprocessors, GPUs, DSPs, CPUs, and / or other suitable electronic circuits)), and / or may include and / or may be implemented using computer software, firmware, or any combination thereof to perform the various operations described herein. The software and / or firmware may include one or more instructions stored on a computer-readable storage medium and executable by one or more processors of an electronic device implementing the image capture and processing system 100.

[0065] This document describes systems, apparatuses, processes, and computer-readable media for an imaging system 200 having a dedicated synthetic variant of the ISP 154, which includes an ISP 220 and a trained machine learning system 210. The trained machine learning system 210 generates custom settings 275 based on image data and provides these settings 275 to various operators of the ISP 220 (e.g., multiplier 230, adder 240, tone adjuster 250, and Gaussian filter 260). The settings 275 set values for certain parameters of the operators of the ISP 220, and these parameters control how the operators of the ISP 220 are applied to the image data.

[0066] Figure 2A is a block diagram illustrating an exemplary architecture of an imaging system 200A with an image signal processor (ISP) 220, which is set or adjusted according to settings 275 generated using a trained machine learning system 210, according to some examples. The imaging system 200A includes an image capture device 202 that captures raw image data 205. The image capture device 202 may be an example of an image capture device 105A. The image capture device 202 may include, for example, an image sensor 130 and / or one or more control mechanisms 120. A representation of the raw image data 205 is depicted, comprising pixels each having a single color component (e.g., red, green, or blue), where each color component corresponds to a color component of a color filter array (CFA) of the image capture device 202. The image capture device 202 also obtains metadata 270 associated with the raw image data 205 and / or associated with the capture of the raw image data 205. Metadata 270 can identify, for example, the time and / or date when the image capture device 202 captures raw image data 205, as well as various image capture settings of the image sensor 130 and / or control mechanism 120, such as aperture size, shutter speed, exposure time, ISO, zoom, focus, analog gain, digital gain, automatic white balance (AWB) gain, radial distance from the image center (which can identify radial lens distortion), or combinations thereof.

[0067] Imaging system 200A includes an ISP 220 with a demosaic engine 225. The demosaic engine 225 can receive raw image data 205 as input and can demosaic the raw image data 205 to generate demosaiced image data. To demosaic the raw image data 205, the demosaic engine 225 can merge color information from multiple adjacent pixels representing different color components from the raw image data 205 into a single pixel. Each pixel of the demosaiced image data can include different color channels corresponding to the different color components of the raw image data 205.
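
As a rough illustration of the merging step described above, the following Python sketch demosaics a tiny mosaic by averaging, for each output pixel, the same-color samples found in its 3×3 neighborhood. The RGGB Bayer layout, the function names, the sample values, and the neighborhood-averaging rule are all assumptions made for this sketch; the demosaic engine 225 is not limited to this method.

```python
# Hypothetical sketch: neighborhood-averaging demosaic for an RGGB Bayer mosaic.
# The RGGB layout and the 3x3 averaging rule are assumptions, not taken from the text.

def bayer_color(row, col):
    """Return which color ('R', 'G', or 'B') an RGGB mosaic samples at (row, col)."""
    if row % 2 == 0:
        return "R" if col % 2 == 0 else "G"
    return "G" if col % 2 == 0 else "B"


def demosaic(raw):
    """Turn a 2D list of single-color samples into a 2D list of (R, G, B) tuples."""
    height, width = len(raw), len(raw[0])
    out = []
    for row in range(height):
        out_row = []
        for col in range(width):
            sums = {"R": 0.0, "G": 0.0, "B": 0.0}
            counts = {"R": 0, "G": 0, "B": 0}
            # Gather every sample in the 3x3 window that lies inside the image.
            for r in range(max(0, row - 1), min(height, row + 2)):
                for c in range(max(0, col - 1), min(width, col + 2)):
                    color = bayer_color(r, c)
                    sums[color] += raw[r][c]
                    counts[color] += 1
            # Each output pixel carries one averaged value per color channel.
            out_row.append(tuple(sums[k] / counts[k] for k in "RGB"))
        out.append(out_row)
    return out


# A flat test mosaic: every red sample is 10, every green 20, every blue 30.
raw = [
    [10, 20, 10, 20],
    [20, 30, 20, 30],
    [10, 20, 10, 20],
    [20, 30, 20, 30],
]
rgb = demosaic(raw)
```

For this flat mosaic every output pixel carries the same three channel values, which makes the merge of adjacent single-color samples into one multi-channel pixel easy to verify by hand.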

[0068] The ISP 220 of the imaging system 200A also includes four operators (also referred to herein as arithmetic units) that modify the demosaiced image data to ultimately generate the output image 280A. In the example of Figure 2A, the output image 280A depicts a building with a lighthouse. The four arithmetic units include a multiplier 230, an adder 240, a tone adjuster 250, and a Gaussian filter 260. The multiplier 230 can modify the gain in the image data. In some examples, the multiplier 230 may include three gain parameters, each of which can be individually set to a corresponding setting to affect how the multiplier 230 modifies the gain in the image data. In some examples, the three gain parameters for the multiplier 230 may each correspond to one of the color components of the CFA of the image capture device 202, for example, including a red channel gain parameter controlling the gain adjustment for the red channel of the image data, a green channel gain parameter controlling the gain adjustment for the green channel of the image data, and a blue channel gain parameter controlling the gain adjustment for the blue channel of the image data.

[0069] Adder 240 can modify one or more offsets in the image data. In some examples, adder 240 may include three offset parameters, each of which can be individually set to influence how adder 240 modifies the offsets in the image data. In some examples, the three offset parameters for adder 240 may each correspond to one of the color components of the CFA of image capture device 202, such as a red channel offset parameter controlling the offset adjustment for the red channel of the image data, a green channel offset parameter controlling the offset adjustment for the green channel of the image data, and a blue channel offset parameter controlling the offset adjustment for the blue channel of the image data. In some examples, the offset may be referred to as a notch or bias.

[0070] The tone adjuster 250 can modify the tone in image data. In some examples, the tone adjuster 250 may include a single gamma parameter (referred to herein as the grayscale coefficient parameter) that controls tone adjustment across all color channels (e.g., across the red, green, and blue channels).
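
The three pointwise arithmetic units described above can be sketched for a single RGB pixel as follows. The sketch assumes channel values normalized to [0, 1], assumes the multiplier, adder, and tone adjuster are applied in the order in which they are listed, assumes a power-law tone curve, and clamps values before the power law; all function and argument names are hypothetical.

```python
def apply_pointwise(pixel, gains, offsets, gamma):
    """Apply per-channel gain, per-channel offset, then a shared gamma curve."""
    out = []
    for value, gain, offset in zip(pixel, gains, offsets):
        value = value * gain                 # multiplier: one gain per color channel
        value = value + offset               # adder: one offset per color channel
        value = min(max(value, 0.0), 1.0)    # clamp so the power law stays well defined
        out.append(value ** gamma)           # tone adjuster: one gamma for all channels
    return tuple(out)


pixel = (0.25, 0.5, 0.75)  # (R, G, B), normalized
result = apply_pointwise(
    pixel,
    gains=(2.0, 1.0, 1.0),
    offsets=(0.0, 0.0, 0.25),
    gamma=0.5,
)
```

Here the red channel is doubled by its gain, the blue channel is raised by its offset, and the same gamma value then reshapes all three channels.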

[0071] A Gaussian filter 260 can apply a Gaussian blur (also known as Gaussian smoothing) to a region of image data. In some examples, the Gaussian filter 260 may include four Gaussian filter parameters, which can define the shape of the Gaussian curve and which control how the Gaussian filter 260 applies the Gaussian blur to the region of image data across all color channels (e.g., across the red, green, and blue channels). In some examples, the Gaussian filter 260 may apply a 5×5 Gaussian filter. In other examples, Gaussian filters of other sizes may be used. The four parameters of the 2D Gaussian filter may include two eigenvalues of the Gaussian covariance matrix and an angle x that parameterizes the directions of the eigenvectors [cos x, sin x] and [sin x, -cos x]. The fourth parameter is a sharpening factor, denoted y. The Gaussian filter provides smoothing, and sharpening can therefore be achieved by computing the following:

[0072] I' = I + y(I - smooth)

[0073] Formula (1)

[0074] where y is the sharpening factor, I is the original image, I' is the sharpened image, and "smooth" is a Gaussian-smoothed version of I. The difference I - smooth represents high-frequency information. Therefore, using formula (1), sharpening can be achieved by multiplying I - smooth by the sharpening factor y and then adding the result back to the original image I.
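
One possible reading of the four Gaussian filter parameters together with formula (1) is sketched below. The code builds a 5×5 kernel from two eigenvalues and the angle x, smooths a single-channel image, and then applies formula (1). The Gaussian density used for the kernel weights, the kernel normalization, the edge-replication border handling, and all names and sample values are assumptions made for illustration.

```python
import math


def gaussian_kernel(eig1, eig2, angle, size=5):
    """Build a normalized size x size kernel from covariance eigenvalues and an angle."""
    half = size // 2
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    kernel = []
    for dy in range(-half, half + 1):
        row = []
        for dx in range(-half, half + 1):
            # Project the offset onto the eigenvectors [cos x, sin x] and [sin x, -cos x].
            u = dx * cos_a + dy * sin_a
            v = dx * sin_a - dy * cos_a
            row.append(math.exp(-0.5 * (u * u / eig1 + v * v / eig2)))
        kernel.append(row)
    total = sum(sum(row) for row in kernel)
    return [[w / total for w in row] for row in kernel]


def smooth(image, kernel):
    """Filter a 2D list with the kernel, replicating edge pixels at the border."""
    height, width = len(image), len(image[0])
    half = len(kernel) // 2
    out = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            acc = 0.0
            for ky, kernel_row in enumerate(kernel):
                for kx, weight in enumerate(kernel_row):
                    rr = min(max(r + ky - half, 0), height - 1)
                    cc = min(max(c + kx - half, 0), width - 1)
                    acc += weight * image[rr][cc]
            out[r][c] = acc
    return out


def sharpen(image, kernel, y):
    """Formula (1): add y times the high-frequency residual back to the image."""
    blurred = smooth(image, kernel)
    return [[p + y * (p - b) for p, b in zip(img_row, blur_row)]
            for img_row, blur_row in zip(image, blurred)]


kernel = gaussian_kernel(eig1=2.0, eig2=0.5, angle=0.0)
image = [[0.0] * 5 for _ in range(5)]
image[2][2] = 1.0  # a single bright pixel on a dark background
sharpened = sharpen(image, kernel, y=1.0)
```

With the angle set to zero and the first eigenvalue larger than the second, the kernel is wider along the horizontal axis than along the vertical axis. With y set to zero, the function returns the input unchanged.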

[0075] Although the example of Figure 2A includes multiplier 230, adder 240, tone adjuster 250, and Gaussian filter 260, in other examples fewer or additional arithmetic units may be included in the ISP 220 without departing from the scope of the systems and techniques described herein.

[0076] Imaging system 200A can input image data into trained machine learning system 210. The image data input into trained machine learning system 210 can be raw image data 205 or demosaiced image data generated by demosaic engine 225 using raw image data 205. In some examples, imaging system 200A can also input at least some of the metadata 270 into trained machine learning system 210. Trained machine learning system 210 can include one or more trained convolutional neural networks (CNNs), one or more other trained neural networks (NNs), one or more trained support vector machines (SVMs), one or more trained random forests, one or more trained decision trees, one or more trained gradient boosting algorithms, one or more trained regression algorithms, or combinations thereof. In some examples, trained machine learning system 210 includes one or more trained machine learning models 215. In some examples, the trained machine learning model 215 includes one or more trained neural networks (NNs). Examples of the one or more trained NNs include the neural network 400 of Figure 4 and the neural network 520 of Figure 5.

[0077] The trained machine learning system 210 can output settings 275 for each parameter of the arithmetic units of the ISP 220 based on the input image data (and, in some cases, the metadata 270). Each of the settings 275 generated by the trained machine learning system 210 can correspond to a parameter of one of the arithmetic units of the ISP 220. For example, the settings 275 generated by the trained machine learning system 210 may include settings for each of the red channel gain parameter, green channel gain parameter, and blue channel gain parameter of the multiplier 230. Specifically, settings 275 may include gain setting 235A for the red channel gain parameter, gain setting 235B for the green channel gain parameter, and gain setting 235C for the blue channel gain parameter. The settings 275 generated by the trained machine learning system 210 may include settings for each of the red channel offset parameter, green channel offset parameter, and blue channel offset parameter of the adder 240. Specifically, settings 275 may include offset setting 245A for the red channel offset parameter, offset setting 245B for the green channel offset parameter, and offset setting 245C for the blue channel offset parameter. Settings 275 generated by the trained machine learning system 210 may include grayscale coefficient setting 255 for the grayscale coefficient parameter. Settings 275 generated by the trained machine learning system 210 may include settings for each of the four Gaussian filter parameters (which may define the shape of the Gaussian curve). Specifically, settings 275 may include a first Gaussian filter setting 265A (e.g., a first eigenvalue of the Gaussian covariance matrix), a second Gaussian filter setting 265B (e.g., a second eigenvalue of the Gaussian covariance matrix), a third Gaussian filter setting 265C (e.g., an angle x for parameterizing the directions of the eigenvectors), and a fourth Gaussian filter setting 265D (e.g., a sharpening factor y).

[0078] Imaging system 200A feeds settings 275 to the corresponding arithmetic units of ISP 220 to control the adjustment of image data to generate output image 280A. For example, multiplier 230 adjusts the gain of image data according to the red channel gain parameter set to red channel gain setting 235A, the green channel gain parameter set to green channel gain setting 235B, and the blue channel gain parameter set to blue channel gain setting 235C. Adder 240 adjusts the offset of image data according to the red channel offset parameter set to red channel offset setting 245A, the green channel offset parameter set to green channel offset setting 245B, and the blue channel offset parameter set to blue channel offset setting 245C. Tone adjuster 250 adjusts the tone of image data according to the grayscale coefficient parameter set to grayscale coefficient setting 255. Gaussian filter 260 applies a 5×5 Gaussian blur to the image data according to the Gaussian filter parameters set to the first Gaussian filter setting 265A, the second Gaussian filter setting 265B, the third Gaussian filter setting 265C, and the fourth Gaussian filter setting 265D.
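
The eleven values enumerated above can be grouped by the arithmetic unit that consumes them. The following sketch shows one hypothetical container for a set of settings 275; the class name, field names, and sample values are illustrative assumptions, and the reference numerals from the text appear only in comments.

```python
from dataclasses import dataclass, fields
from typing import Tuple


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for one set of settings 275."""
    gains: Tuple[float, float, float]             # 235A, 235B, 235C -> multiplier 230
    offsets: Tuple[float, float, float]           # 245A, 245B, 245C -> adder 240
    gamma: float                                  # 255 -> tone adjuster 250
    gaussian: Tuple[float, float, float, float]   # 265A to 265D -> Gaussian filter 260

    def count(self) -> int:
        """Number of scalar values carried by this set of settings."""
        return len(self.gains) + len(self.offsets) + 1 + len(self.gaussian)


settings = Settings(
    gains=(1.2, 1.0, 1.1),
    offsets=(0.0, 0.01, 0.0),
    gamma=0.45,
    gaussian=(2.0, 0.5, 0.0, 1.0),  # eigenvalue 1, eigenvalue 2, angle x, sharpening factor y
)

# One entry per arithmetic unit, in the order the fields are declared.
per_unit = {f.name: getattr(settings, f.name) for f in fields(settings)}
```

Grouping the values this way keeps each arithmetic unit's parameters together, which is convenient when a separate model produces each group.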

[0079] In some examples, the demosaic engine 225 may also include one or more demosaic parameters that control the demosaicing of the raw image data 205, and settings 275 generated by the trained machine learning system 210 (based on inputting the raw image data 205 and / or metadata 270 into the trained machine learning system 210) may include one or more settings for the one or more demosaic parameters of the demosaic engine 225.

[0080] In some examples, the trained machine learning system 210 may include a separate trained machine learning model 215 for generating each of the settings 275. In some examples, one of the trained machine learning models 215 may generate multiple related settings of settings 275. For example, one of the trained machine learning models 215 may generate at least a subset of the three gain settings 235A to 235C, one of the trained machine learning models 215 may generate at least a subset of the three offset settings 245A to 245C, one of the trained machine learning models 215 may generate at least a subset of the four Gaussian filter settings 265A to 265D, and / or one of the trained machine learning models 215 may generate at least a subset of the demosaic settings (not shown). In some examples, one of the trained machine learning models 215 may generate multiple unrelated settings of settings 275. For example, one of the trained machine learning models 215 can generate one or more of gain settings 235A to 235C, one or more of offset settings 245A to 245C, grayscale coefficient setting 255, one or more of Gaussian filter settings 265A to 265D, one or more of the demosaic settings (not shown), or a combination thereof.

[0081] In some examples, settings 275 can vary spatially across the raw image data 205, across the demosaiced image data, and / or across the output image 280A. In some examples, imaging system 200A can generate different settings 275 for different regions of image data (e.g., raw image data 205 or demosaiced image data). In some examples, imaging system 200A can generate different settings 275 for each pixel of image data (e.g., raw image data 205 or demosaiced image data).
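
Spatially varying settings can be pictured as a map holding one value per pixel. The sketch below applies a hypothetical per-pixel gain map to a single-channel image, in contrast to a single global gain; the function name, the shape check, and the sample values are assumptions made for illustration.

```python
def apply_gain_map(image, gain_map):
    """Multiply each pixel by the gain stored at the same position in gain_map."""
    if len(image) != len(gain_map) or any(
        len(img_row) != len(map_row) for img_row, map_row in zip(image, gain_map)
    ):
        raise ValueError("image and gain_map must have the same shape")
    return [[p * g for p, g in zip(img_row, map_row)]
            for img_row, map_row in zip(image, gain_map)]


image = [[0.2, 0.2], [0.2, 0.2]]      # uniform single-channel input
gain_map = [[1.0, 2.0], [0.5, 4.0]]   # a different gain setting for every pixel
out = apply_gain_map(image, gain_map)
```

Although the input is uniform, the output differs at every position because each pixel receives its own gain setting.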

[0082] In some examples, the ISP 220 may lack some of the arithmetic units shown as part of the ISP 220 in Figure 2A. For example, the ISP 220 may not have a multiplier 230, an adder 240, a tone adjuster 250, and / or a Gaussian filter 260. In some examples, the ISP 220 may lack some of the parameters shown as part of the ISP 220 in Figure 2A. In some examples, certain parameters shown as part of the ISP 220 in Figure 2A may be combined, for example so that they apply to multiple color channels instead of just one (e.g., gain parameters or offset parameters). In some examples, certain parameters shown as part of the ISP 220 in Figure 2A as applying to multiple color channels may be split into separate parameters for each color channel (e.g., grayscale coefficient parameters and / or Gaussian filter parameters).

[0083] In some examples, the arithmetic units shown as part of the ISP 220 in Figure 2A, and / or other arithmetic units, may be rearranged into an order different from that shown in Figure 2A and Figure 2B. For example, adder 240 can precede multiplier 230, tone adjuster 250, and / or Gaussian filter 260. Tone adjuster 250 can precede multiplier 230, adder 240, and / or Gaussian filter 260. Gaussian filter 260 can precede multiplier 230, adder 240, and / or tone adjuster 250.

[0084] Figure 2B is a block diagram illustrating an example imaging system 200B with a demosaiced image 285, spatially varied settings 295, and an output image 280B, according to some examples. The imaging system 200B of Figure 2B is similar to the imaging system 200A. In Figure 2B, the output image 280B and the demosaiced image 285 both depict a building with large, segmented windows. The output image 280B appears sharper and clearer than the demosaiced image 285 in certain areas (e.g., including slightly darker areas of sky and ground). Furthermore, the colors in the output image 280B are more natural and can be quite different from those in the demosaiced image 285, which represents the raw pixels captured by the sensor (a difference that is not visible in a grayscale figure). As used herein, the term "demosaiced image data" may refer to the demosaiced image 285 or a portion thereof. The demosaiced image 285 may be output by the demosaic engine 225 to the trained machine learning system 210. The demosaiced image 285 may be output by the demosaic engine 225 to the other arithmetic units of the ISP 220 (e.g., to multiplier 230, adder 240, tone adjuster 250, and / or Gaussian filter 260). The spatially varied settings 295, an example of settings 275 that vary spatially, are shown in Figure 2B and include four adjustment maps. Each of the four adjustment maps of the spatially varied settings 295 includes a different setting for each pixel of the image data (raw image data 205, demosaiced image 285, and / or output image 280B). The values of the different settings are shown as different shades of gray in the spatially varied settings 295. In some examples, a darker shade of gray in the spatially varied settings 295 may correspond to a higher value of a single setting, while a lighter shade of gray may correspond to a lower value of that setting. In other examples, a darker shade of gray in the spatially varied settings 295 may correspond to a lower value of a single setting, while a lighter shade of gray may correspond to a higher value of that setting.

[0085] Figure 3A is a conceptual diagram illustrating an example of an input image 302 comprising multiple pixels labeled P0 to P63, according to some examples. Consistent with the 64 labels P0 to P63, the input image 302 is 8 pixels wide and 8 pixels high. The pixels are numbered sequentially from P0 to P63, from left to right within each row, starting at the top row and proceeding toward the bottom row.

[0086] Figure 3B is a conceptual diagram illustrating an example of an adjustment map 303 of spatially varying settings corresponding to the pixels of the input image 302 of Figure 3A, according to some examples. The spatially varying settings include multiple values labeled V0 to V63. The spatially varying settings are shown as an adjustment map 303 with a width of 8 cells and a height of 8 cells. The cells are numbered sequentially from V0 to V63, from left to right within each row, starting at the top row and proceeding toward the bottom row.

[0087] Each value in each cell of adjustment map 303 corresponds to a pixel in input image 302. For example, value V0 in adjustment map 303 corresponds to pixel P0 in input image 302. The values in adjustment map 303 are used to adjust or modify their corresponding pixels in input image 302 based on one of the parameters of the arithmetic units of ISP 220. For example, each value in adjustment map 303 can indicate the gain setting 235A for the red channel gain parameter, the gain setting 235B for the green channel gain parameter, the gain setting 235C for the blue channel gain parameter, the offset setting 245A for the red channel offset parameter, the offset setting 245B for the green channel offset parameter, the offset setting 245C for the blue channel offset parameter, the grayscale coefficient setting 255 for the grayscale coefficient parameter, the first Gaussian filter setting 265A for the first Gaussian filter parameter, the second Gaussian filter setting 265B for the second Gaussian filter parameter, the third Gaussian filter setting 265C for the third Gaussian filter parameter, the fourth Gaussian filter setting 265D for the fourth Gaussian filter parameter, one or more demosaic parameters, or combinations thereof. In some examples, each value in adjustment map 303 indicates the intensity or amount of an image processing function to be applied to the corresponding pixel. For example, a first value of V0 in adjustment map 303 (e.g., value 0, value 1, or another value) can indicate that the image processing function of the adjustment map will be applied to the corresponding pixel P0 in the input image 302 with zero intensity (i.e., it will not be applied at all). In another example, a second value of V15 in adjustment map 303 (e.g., value 0, value 1, or another value) can indicate that the image processing function will be applied to the corresponding pixel P15 in the input image 302 with maximum intensity (the maximum amount of the image processing function). Values in different types of adjustment maps can indicate different levels of applicability of the corresponding image processing functions.
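
The row-major labeling and the intensity interpretation described above can be sketched as follows. The sketch assumes a grid 8 cells wide (so that labels run from 0 to 63), assumes that a value of 0 means "not applied" and a value of 1 means "applied at maximum intensity", and assumes linear blending between the unprocessed and processed pixel. These conventions, the squaring function used as a stand-in image processing function, and all names are illustrative only.

```python
WIDTH = 8  # assumed grid width, so that labels run from 0 to 63


def label_to_position(label, width=WIDTH):
    """Convert a row-major label k (as in Pk or Vk) to a (row, column) pair."""
    return divmod(label, width)


def blend(pixel, processed, strength):
    """Mix the unprocessed and processed pixel according to a value in [0, 1]."""
    return (1.0 - strength) * pixel + strength * processed


def apply_with_map(pixels, adjustment_values, function):
    """Apply function to each pixel with the strength given by the matching value."""
    return [blend(p, function(p), v) for p, v in zip(pixels, adjustment_values)]


pixels = [0.5] * 64              # P0..P63, flattened in row-major order
adjustment_values = [0.0] * 64   # V0..V63, all "not applied" by default
adjustment_values[15] = 1.0      # V15: apply at maximum intensity to P15
result = apply_with_map(pixels, adjustment_values, lambda p: p * p)
```

In this sketch, pixel P0 passes through unchanged because V0 is zero, while pixel P15, the last pixel of the second row, is fully replaced by the processed value.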

[0088] The arithmetic units of ISP 220 can apply settings to the input image 302 based on the adjustment map 303. The adjustment map 303 can be generated by a trained machine learning system 210 based on image data (raw image data 205 or demosaiced image 285) and / or metadata 270.

[0089] Figure 4 is a block diagram illustrating an example of a neural network 400 that can be used by a trained machine learning system to generate settings for use by an image signal processor (ISP), according to some examples. Neural network 400 can include any type of deep network, such as convolutional neural networks (CNNs), autoencoders, deep belief networks (DBNs), recurrent neural networks (RNNs), generative adversarial networks (GANs), and / or other types of neural networks. Neural network 400 can be an example of one or more trained machine learning models 215 (e.g., one or more trained neural networks or other machine learning models) of a trained machine learning system 210.

[0090] The input layer 410 of the neural network 400 includes input data. The input data of the input layer 410 may include data representing pixels of an input image frame. In one illustrative example, the input data of the input layer 410 may include data representing pixels of image data (e.g., raw image data 205 and / or de-mosaiced image 285 and / or input image 302) and / or metadata corresponding to the image data (e.g., metadata 270). In one illustrative example, the input data of the input layer 410 may include raw image data 205 and / or metadata 270. In another illustrative example, the input data of the input layer 410 may include de-mosaiced image 285 and / or metadata 270. The image may include image data from an image sensor, including raw pixel data (including a single color per pixel based on, for example, a Bayer color filter) or processed pixel values (e.g., RGB pixels of an RGB image). The neural network 400 includes multiple hidden layers 412a, 412b through 412n. Hidden layers 412a, 412b through 412n comprise “n” hidden layers, where “n” is an integer greater than or equal to one. The number of hidden layers can be as many as required for a given application. The neural network 400 also includes an output layer 414, which provides the output produced by the processing performed by the hidden layers 412a, 412b through 412n. In some examples, the output layer 414 may provide one or more settings, such as any of settings 275. In one illustrative example, the output layer 414 may provide one or more adjustment diagrams 303, as in the four exemplary adjustment diagrams of the spatially varying settings 295.

[0091] Neural network 400 is a multi-layer neural network of interconnected filters. Each filter can be trained to learn features representing the input data. In some cases, neural network 400 may include a feedforward network, in which there is no feedback connection that feeds the network's output back to itself. In some cases, network 400 may include a recurrent neural network, which may have loops that allow information to be carried across nodes as input is read.

[0092] In some cases, information can be exchanged between layers via node-to-node interconnects. In some cases, the network may include a convolutional neural network, which may not link every node in one layer to every node in the next layer. In a network in which information is exchanged between layers, nodes in input layer 410 can activate a set of nodes in the first hidden layer 412a. For example, as shown, each input node of input layer 410 can be connected to each node in the first hidden layer 412a. Nodes in the hidden layer can transform information by applying an activation function (e.g., a filter) to the information of each input node. The information derived from this transformation can then be passed to nodes in the next hidden layer 412b and can activate those nodes, which then perform their own specified functions. Exemplary functions include convolution, scaling, data transformation, and / or any other suitable functions. The output of hidden layer 412b can then activate nodes in the next hidden layer, and so on. Finally, the output of hidden layer 412n can activate one or more nodes in output layer 414, which provide the output. In some cases, although a node in neural network 400 (e.g., node 416) is shown as having multiple output lines, the node has a single output, and all lines shown leaving the node represent the same output value.
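The layer-to-layer flow described above can be sketched as a small fully connected forward pass. This is only an illustration under stated assumptions: the weights, biases, layer sizes, and the choice of ReLU as the activation function are hypothetical and are not taken from the description.

```python
# Hypothetical sketch of a forward pass: each layer combines the values of
# the previous layer using weights and a bias, then applies an activation.

def relu(x):
    """Rectified linear unit, used here as an illustrative activation."""
    return max(0.0, x)


def identity(x):
    return x


def dense(inputs, weights, biases, activation):
    """weights[j][i] connects input node i to output node j."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Pass values through each (weights, biases, activation) layer in turn."""
    values = inputs
    for weights, biases, activation in layers:
        values = dense(values, weights, biases, activation)
    return values


layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),   # one hidden layer
    ([[2.0, 1.0]], [0.5], identity),                 # output layer
]
output = forward([1.0, 2.0], layers)
```

The first hidden node receives a negative sum and is zeroed by the activation, so only the second hidden node contributes to the output value.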

[0093] In some cases, each node or the interconnection between nodes can have weights, which are a set of parameters derived from the training of the neural network 400. For example, the interconnection between nodes can represent a piece of information about what the interconnected nodes have learned. The interconnections can have adjustable numerical weights that can be set (e.g., based on the training dataset), allowing the neural network 400 to adapt to the input and learn as more and more data is processed.

[0094] The neural network 400 is pre-trained to process features from the data in the input layer 410 using different hidden layers 412a, 412b to 412n in order to provide output through the output layer 414.

[0095] Figure 5 is a block diagram illustrating an example of a neural network architecture 500 of a trained neural network 520 of a trained machine learning system 210, according to some examples. The trained neural network 520 may be an example of the one or more trained machine learning models 215 of the trained machine learning system 210. The neural network architecture 500 receives an input image 505 and metadata 510 as its input. The input image 505 may include raw image data, such as raw image data 205. The raw image data may correspond to the entire image or to image patches representing regions of the entire image. The input image 505 may include de-mosaiced image data, such as a de-mosaiced image 285 or image patches representing regions of the de-mosaiced image 285. An example of metadata 510 may include metadata 270.

[0096] The trained neural network 520 outputs settings 515. The trained neural network 520 can output settings 515 in the form of one or more adjustment diagrams, such as the adjustment diagram 303 of Figure 3B or the four exemplary adjustment diagrams of the spatially varying settings 295 of Figure 2B. The trained neural network 520 can also output settings 515 as individual values that can be arranged in a list, matrix, and / or grid.

[0097] Key 530 identifies different NN operations performed by the trained NN 520 to generate settings 515 based on the input image 505 and / or metadata 510. For example, a convolution with a 3×3 filter and a stride of 1 is indicated by a thick white arrow with a black outline pointing to the right. A convolution with a 2×2 filter and a stride of 2 is indicated by a thick black arrow pointing downwards. Upsampling (e.g., bilinear upsampling) is indicated by a thick black arrow pointing upwards.
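To illustrate how the operations named in the key change the size of the data, the following sketch implements a single-channel convolution with a configurable stride and a 2x upsampling step. Several details are assumptions: no padding is used, the kernel values are arbitrary, and nearest-neighbor upsampling is substituted for the bilinear upsampling mentioned above purely to keep the example short.

```python
# Hypothetical sketch of the operation types: convolution with a stride,
# and upsampling. Single channel, no padding (an assumption).

def conv2d(image, kernel, stride):
    """Slide a square kernel over the image, stepping by `stride`."""
    k = len(kernel)
    height, width = len(image), len(image[0])
    out = []
    for top in range(0, height - k + 1, stride):
        row = []
        for left in range(0, width - k + 1, stride):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    acc += image[top + i][left + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


def upsample2x_nearest(image):
    """Double width and height by repeating values (not bilinear)."""
    out = []
    for row in image:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


image = [[1.0] * 6 for _ in range(6)]
ones_3x3 = [[1.0] * 3 for _ in range(3)]
mean_2x2 = [[0.25, 0.25], [0.25, 0.25]]

features = conv2d(image, ones_3x3, stride=1)     # 6x6 -> 4x4
reduced = conv2d(features, mean_2x2, stride=2)   # 4x4 -> 2x2
restored = upsample2x_nearest(reduced)           # 2x2 -> 4x4
```

With these assumptions, the stride-1 convolution trims the border, the stride-2 convolution halves the resolution, and the upsampling step doubles it again.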

[0098] Figure 6 is a flowchart illustrating the operations of a process 600 for processing image data. The operations of process 600 can be performed by an imaging system. In some examples, the imaging system performing the operations can be imaging system 200A. In some examples, the imaging system performing the operations can be imaging system 200B. In some examples, the imaging system performing the operations of process 600 can include, for example, one or more means for performing the operations, which can include image capture and processing system 100, image capture device 105A, image processing device 105B, image processor 150, ISP 154, host processor 152, imaging system 200A, imaging system 200B, ISP 220, trained machine learning system 210, image capture device 202, demosaic engine 225, multiplier 230, adder 240, tone adjuster 250, Gaussian filter 260, neural network 400, neural network architecture 500, trained neural network 520, computing system 700, or combinations thereof.

[0099] At operation 605, process 600 includes obtaining image data associated with an image frame. In some cases, the image data includes de-mosaiced image data. For example, to generate the output image, process 600 may include de-mosaicing the image data before processing multiple pixels of the image data using an ISP whose one or more parameters have been adjusted based on multiple settings. In some examples, process 600 may include de-mosaicing the image data before inputting the image data into one or more trained machine learning models. In some cases, the image data is raw image data having multiple color components corresponding to a color filter array (e.g., a Bayer color filter) of an image sensor. For example, inputting the image data into one or more trained machine learning models may include inputting the raw image data into the one or more trained machine learning models. An example of raw image data is the raw image data 205 shown in Figure 2A and Figure 2B and discussed above. In some aspects, to obtain the image data, process 600 may include receiving the image data from an image sensor (e.g., the image sensor 130 of Figure 1) that captures the image data (e.g., raw image data).

[0100] In some examples, process 600 may include obtaining metadata corresponding to the image data. In such examples, the output of one or more trained machine learning models is based on inputting the metadata and image data into one or more trained machine learning models. In some cases, the metadata may include automatic white balance (AWB), analog gain, digital gain, ISO, radial distance, other metadata, any combination thereof, and / or other information.
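Of the metadata types listed above, radial distance is one that can be computed directly from pixel coordinates. The sketch below builds a per-pixel radial distance map, assuming (hypothetically) that distance is measured from the image center and normalized so that the corners have distance 1.0; the description does not define the reference point or the normalization.

```python
import math

# Hypothetical sketch: per-pixel radial distance from the image center,
# normalized so the farthest corner has distance 1.0 (an assumption).

def radial_distance_map(height, width):
    center_y = (height - 1) / 2.0
    center_x = (width - 1) / 2.0
    max_radius = math.hypot(center_y, center_x)
    rows = []
    for y in range(height):
        row = []
        for x in range(width):
            radius = math.hypot(y - center_y, x - center_x)
            row.append(radius / max_radius if max_radius > 0 else 0.0)
        rows.append(row)
    return rows


distances = radial_distance_map(3, 3)
```

A map like this could be supplied alongside the image data so that a model can treat the image center and the image edges differently.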

[0101] At operation 610, process 600 includes determining multiple settings for adjusting one or more parameters of an image signal processor (ISP) based on the output of one or more trained machine learning models that use image data as input. In some aspects, the one or more trained machine learning models include one or more trained neural networks. The multiple settings vary spatially among the image data. In some examples, one or more parameters of the ISP are associated with at least one of noise reduction, sharpening, tone mapping, color saturation, any combination thereof, and / or other parameters.

[0102] At operation 615, process 600 includes generating an output image at least in part by processing a plurality of pixels of the image data using the ISP. Each pixel of the plurality of pixels is processed using a corresponding setting of the plurality of settings for adjusting a corresponding parameter of the one or more parameters. As described above, the plurality of settings vary spatially across the image data, in which case different settings from the plurality of settings may be applied to the same pixel of the image data (e.g., of a single frame or image) and / or to different pixels of the image data (e.g., of the same single frame or image). For example, in order to process the plurality of pixels of the image data using the ISP having each of the one or more parameters adjusted based on the plurality of settings, process 600 may include processing a first pixel of the plurality of pixels using a first setting (from the plurality of settings that vary spatially across the image data) for a first parameter of the one or more parameters. Process 600 may also include processing a second pixel of the plurality of pixels using a second setting (from the plurality of settings) for the first parameter of the one or more parameters. In another example, process 600 may include processing a first pixel of the plurality of pixels using a first setting (from the plurality of settings) for a first parameter of the one or more parameters. Process 600 may also include processing the first pixel of the plurality of pixels using a second setting (from the plurality of settings) for a second parameter of the one or more parameters.
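The two cases described above (different pixels receiving different settings for the same parameter, and one pixel receiving settings for several parameters) can be sketched as follows. The parameter names, the operations attached to them, and the function `process_pixels` are hypothetical and chosen only for illustration.

```python
# Hypothetical sketch: every parameter has its own per-pixel settings map,
# and each pixel is processed once per parameter using its own setting.

def process_pixels(image, settings, operations):
    """Apply each (name, operation) in order, using settings[name][y][x]."""
    out = [row[:] for row in image]   # leave the input image untouched
    for name, operation in operations:
        setting_map = settings[name]
        for y, row in enumerate(out):
            for x, pixel in enumerate(row):
                row[x] = operation(pixel, setting_map[y][x])
    return out


image = [[0.5, 0.5]]                       # two pixels with equal values
settings = {
    "gain": [[1.0, 2.0]],                  # same parameter, different pixels
    "offset": [[0.1, 0.0]],                # second parameter, same pixels
}
operations = [
    ("gain", lambda pixel, s: pixel * s),
    ("offset", lambda pixel, s: pixel + s),
]
output = process_pixels(image, settings, operations)
```

Although both input pixels start with the same value, they end up different because each one is processed with its own settings.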

[0103] In some examples, one or more parameters of the ISP include multiple gain parameters, and multiple settings include multiple gain settings corresponding to the multiple gain parameters. In such examples, each of the multiple gain parameters may correspond to one color channel among multiple color channels. Process 600 can process multiple pixels of image data using the ISP, at least in part, by performing one or more multiplier operations on at least one pixel based on the multiple gain settings. Multiplier operations can be performed on at least one pixel using an ISP with multiple gain settings.
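A minimal sketch of per-channel gain follows, assuming RGB pixels stored as tuples and one gain map per color channel. The data layout, the channel keys, and the function name `apply_channel_gains` are illustrative assumptions.

```python
# Hypothetical sketch: one gain map per color channel, with each pixel's
# channel values multiplied by the matching per-pixel gain setting.

def apply_channel_gains(image, gain_maps):
    """image[y][x] is an (r, g, b) tuple; gain_maps has keys 'r', 'g', 'b'."""
    out = []
    for y, row in enumerate(image):
        out_row = []
        for x, (r, g, b) in enumerate(row):
            out_row.append((
                r * gain_maps["r"][y][x],
                g * gain_maps["g"][y][x],
                b * gain_maps["b"][y][x],
            ))
        out.append(out_row)
    return out


image = [[(0.2, 0.4, 0.1), (0.2, 0.4, 0.1)]]
gain_maps = {
    "r": [[2.0, 1.0]],
    "g": [[1.0, 1.0]],
    "b": [[1.5, 3.0]],
}
gained = apply_channel_gains(image, gain_maps)
```

An offset could be applied in the same way by replacing each multiplication with an addition of the matching offset setting.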

[0104] In some examples, one or more parameters of the ISP include multiple offset parameters, and multiple settings include multiple offset settings corresponding to the multiple offset parameters. In such examples, each of the multiple offset parameters may correspond to one color channel among multiple color channels. Process 600 can process multiple pixels of image data using the ISP, at least in part, by performing one or more addition operations on at least one pixel based on the multiple offset settings. One or more addition operations can be performed on at least one pixel using an ISP with multiple offset settings.

[0105] In some examples, one or more parameters of the ISP include one or more gamma (grayscale coefficient) parameters, and multiple settings include one or more gamma settings corresponding to the one or more gamma parameters. Process 600 can process multiple pixels of image data using the ISP at least in part by adjusting the tone of at least one pixel based on the one or more gamma settings. An ISP with the one or more gamma settings can be used to adjust the tone of at least one pixel.
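A per-pixel grayscale coefficient (gamma) adjustment can be sketched as a power-law curve. The formula `out = in ** gamma` on values normalized to [0, 1] is an assumed convention for illustration; the description does not state the exact curve used by the tone adjuster.

```python
# Hypothetical sketch: power-law adjustment with a per-pixel gamma setting.
# Pixel values are assumed to be normalized to the range [0, 1].

def apply_gamma_map(image, gamma_map):
    """Raise each clamped pixel value to its own per-pixel exponent."""
    out = []
    for img_row, gamma_row in zip(image, gamma_map):
        out.append([
            min(1.0, max(0.0, pixel)) ** gamma
            for pixel, gamma in zip(img_row, gamma_row)
        ])
    return out


image = [[0.25, 0.25, 0.25]]
gamma_map = [[0.5, 1.0, 2.0]]   # brighten, leave unchanged, darken
toned = apply_gamma_map(image, gamma_map)
```

Under this convention, an exponent below one brightens a pixel, an exponent of one leaves it unchanged, and an exponent above one darkens it.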

[0106] In some examples, one or more parameters of the ISP include one or more Gaussian filter parameters, and multiple settings include one or more Gaussian filter settings corresponding to the one or more Gaussian filter parameters. Process 600 can process multiple pixels of image data using the ISP, at least in part, by applying a Gaussian filter to at least one pixel based on a Gaussian curve. The shape of the Gaussian curve is based on one or more Gaussian filter settings.
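One way to picture a Gaussian filter whose curve shape depends on a per-pixel setting is to let the setting act as the standard deviation of the kernel. In the sketch below, the use of sigma as the setting, the fixed kernel radius, and the clamped border handling are all assumptions made for illustration.

```python
import math

# Hypothetical sketch: a Gaussian filter whose standard deviation (sigma)
# is read from a per-pixel settings map. Sigma values must be positive.

def gaussian_kernel(sigma, radius):
    """Build a normalized (2*radius+1) square Gaussian kernel."""
    weights = [
        [math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
         for dx in range(-radius, radius + 1)]
        for dy in range(-radius, radius + 1)
    ]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


def filter_with_sigma_map(image, sigma_map, radius=1):
    """Filter each pixel with a kernel shaped by its own sigma setting."""
    height, width = len(image), len(image[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            kernel = gaussian_kernel(sigma_map[y][x], radius)
            acc = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), height - 1)   # clamp at borders
                    xx = min(max(x + dx, 0), width - 1)
                    acc += image[yy][xx] * kernel[dy + radius][dx + radius]
            row.append(acc)
        out.append(row)
    return out


impulse = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
sharp = filter_with_sigma_map(impulse, [[0.5] * 3 for _ in range(3)])
soft = filter_with_sigma_map(impulse, [[2.0] * 3 for _ in range(3)])
```

A small sigma keeps most of the weight on the center pixel, while a large sigma spreads it over the neighborhood, so the same filter can smooth some regions strongly and others barely at all.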

[0107] In some examples, one or more parameters of the ISP include one or more demosaic parameters, and multiple settings include one or more demosaic settings corresponding to the one or more demosaic parameters. Process 600 can process multiple pixels of image data using the ISP, at least in part, by demosaicing at least one pixel of the image data based on one or more demosaic settings.

[0108] In some examples, process 600 may include displaying an output image on one or more displays (e.g., the apparatus or device performing process 600, an external device located outside the apparatus or device performing process 600, etc.). In some examples, process 600 may include storing the output image in a storage device (e.g., the apparatus or device performing process 600, a remote storage device located remotely relative to the apparatus or device performing process 600 and accessible via a wired or wireless network, etc.).

[0109] In some examples, the processes described herein (e.g., process 600 and / or other processes described herein) may be performed by a computing device or apparatus. In some examples, process 600 may be performed by imaging system 200A and / or imaging system 200B. In some examples, process 600 may be performed by a computing device having the computing system 700 shown in Figure 7. For example, a computing device having the computing system 700 shown in Figure 7 may include at least some components of the imaging system 200A and / or the imaging system 200B, and / or may implement the operations of process 600 of Figure 6.

[0110] Computing devices may include any suitable device, such as mobile devices (e.g., mobile phones), desktop computing devices, tablet computing devices, wearable devices (e.g., VR headsets, AR headsets, AR glasses, network-connected watches or smartwatches, or other wearable devices), server computers, autonomous vehicles or computing devices of autonomous vehicles, robotic devices, televisions, and / or any other computing device with the resource capability to perform the processes described herein (including process 600). In some cases, a computing device or apparatus may include various components, such as one or more input devices, one or more output devices, one or more processors, one or more microprocessors, one or more microcomputers, one or more cameras, one or more sensors, and / or other components configured to perform the steps of the processes described herein. In some examples, a computing device may include a display, a network interface configured to transmit and / or receive data, any combination thereof, and / or other components. The network interface may be configured to transmit and / or receive Internet Protocol (IP) based data or other types of data.

[0111] Components of a computing device can be implemented in circuitry. For example, a component may include electronic circuitry or other electronic hardware, and / or may be implemented using electronic circuitry or other electronic hardware, which may include one or more programmable electronic circuits (e.g., a microprocessor, graphics processing unit (GPU), digital signal processor (DSP), central processing unit (CPU), and / or other suitable electronic circuitry), and / or a component may include computer software, firmware, or a combination thereof for performing the various operations described herein, and / or may be implemented using computer software, firmware, or a combination thereof for performing the various operations described herein.

[0112] Process 600 is shown as a logic flowchart, the operations of which represent a sequence of operations that can be implemented by hardware, computer instructions, or a combination thereof. In the context of computer instructions, each operation represents a computer-executable instruction stored on one or more computer-readable storage media that, when executed by one or more processors, performs the described operation. Generally, computer-executable instructions include routines, programs, objects, components, data structures, etc., that perform a specific function or implement a specific data type. The order in which the operations are described is not intended to be construed as limiting, and any number of described operations can be combined in any order and / or in parallel to implement the process.

[0113] Furthermore, process 600 and / or other processes described herein may be executed under the control of one or more computer systems configured with executable instructions, and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) that executes jointly on one or more processors, implemented in hardware, or a combination thereof. As described above, the code may be stored on a computer-readable or machine-readable storage medium, for example, in the form of a computer program comprising multiple instructions executable by one or more processors. The computer-readable or machine-readable storage medium may be non-transitory.

[0114] Figure 7 is a diagram illustrating an example of a system used to implement certain aspects of this technology. Specifically, Figure 7 shows an example of a computing system 700, which can be any computing device constituting, for example, an internal computing system, a remote computing system, a camera, or any component thereof, wherein the components of the system communicate with each other using connection 705. Connection 705 can be a physical connection using a bus, or a direct connection to processor 710, such as in a chipset architecture. Connection 705 can also be a virtual connection, a networked connection, or a logical connection.

[0115] In some embodiments, computing system 700 is a distributed system, wherein the functions described herein may be distributed across a data center, multiple data centers, a peer-to-peer network, etc. In some embodiments, one or more of the described system components represent many such components, each performing some or all of the functions for which the component is described. In some embodiments, the components may be physical devices or virtual devices.

[0116] An exemplary system 700 includes at least one processing unit (CPU or processor) 710 and a connection 705 that couples various system components, including system memory 715 (such as read-only memory (ROM) 720 and random access memory (RAM) 725), to the processor 710. The computing system 700 may include a cache 712 of high-speed memory that is directly connected to, adjacent to, or integrated into the processor 710.

[0117] Processor 710 may include any general-purpose processor and a hardware service or software service, such as services 732, 734, and 736 stored in storage device 730, configured to control processor 710, as well as a special-purpose processor in which software instructions are incorporated into the actual processor design. Processor 710 can essentially be a self-contained computing system, containing multiple cores or processors, buses, memory controllers, caches, etc. Multi-core processors can be symmetric or asymmetric.

[0118] To enable user interaction, the computing system 700 includes an input device 745 that can represent any number of input mechanisms, such as a microphone for voice, a touch-sensitive screen for gesture or graphical input, a keyboard, a mouse, motion input, voice input, etc. The computing system 700 may also include an output device 735, which can be one or more of many output mechanisms. In some cases, a multi-mode system allows the user to provide multiple types of input / output to communicate with the computing system 700. The computing system 700 may include a communication interface 740, which typically controls and manages user input and system output. The communication interface can perform or facilitate the reception and / or transmission of wired or wireless communications using wired and / or wireless transceivers, including those utilizing audio jacks / plugs, microphone jacks / plugs, Universal Serial Bus (USB) ports / plugs, Ethernet ports / plugs, fiber optic ports / plugs, dedicated wired ports / plugs, Bluetooth Low Energy (BLE) wireless signal transmission, radio frequency identification (RFID) wireless signal transmission, near field communication (NFC) wireless signal transmission, dedicated short range communication (DSRC) wireless signal transmission, 802.11 Wi-Fi wireless signal transmission, wireless local area network (WLAN) signal transmission, visible light communication (VLC), Worldwide Interoperability for Microwave Access (WiMAX), infrared (IR) wireless signal transmission, public switched telephone network (PSTN) signal transmission, integrated services digital network (ISDN) signal transmission, 3G / 4G / 5G / LTE cellular data network wireless signal transmission, ad hoc network signal transmission, radio wave signal transmission, microwave signal transmission, infrared signal transmission, visible light signal transmission, ultraviolet light signal transmission, wireless signal transmission along the electromagnetic spectrum, or some combination thereof. The communication interface 740 may also include one or more Global Navigation Satellite System (GNSS) receivers or transceivers for determining the location of the computing system 700 based on one or more signals received from one or more satellites associated with one or more GNSS systems. GNSS systems include, but are not limited to, the U.S. Global Positioning System (GPS), Russia's Global Navigation Satellite System (GLONASS), China's BeiDou Navigation Satellite System (BDS), and Europe's Galileo GNSS. There is no restriction to operating on any particular hardware arrangement, and therefore the basic features described here can easily be substituted with improved hardware or firmware arrangements as they are developed.

[0119] Storage device 730 may be a non-volatile and / or non-transitory and / or computer-readable storage device, and may be a hard disk or other type of computer-readable medium capable of storing data accessible by a computer, such as magnetic tape cassettes, flash memory cards, solid-state storage devices, digital versatile discs, floppy disks, flexible disks, hard disks, magnetic tape, magnetic stripes / strips, any other magnetic storage media, flash memory, memristor memory, any other solid-state storage, compact disc read-only memory (CD-ROM), rewritable compact disc (CD), digital video disc (DVD), Blu-ray disc (BDD), holographic disc, another optical medium, secure digital (SD) card, micro secure digital (microSD) card, smart card chips, EMV chips, Subscriber Identity Module (SIM) cards, mini / micro / nano / pico SIM cards, another integrated circuit (IC) chip / card, random access memory (RAM), static RAM (SRAM), dynamic RAM (DRAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), flash EPROM, cache memory (L1 / L2 / L3 / L4 / L5 / L#), resistive random access memory (RRAM / ReRAM), phase-change memory (PCM), spin-transfer torque RAM (STT-RAM), another memory chip or cassette, and / or combinations thereof.

[0120] Storage device 730 may include software services, servers, etc., which enable the system to perform functions when the code defining such software is executed by processor 710. In some embodiments, hardware services that perform specific functions may include software components stored in computer-readable media connected to necessary hardware components (such as processor 710, connection 705, output device 735, etc.) to perform functions.

[0121] As used herein, the term "computer-readable medium" includes, but is not limited to, portable or non-portable storage devices, optical storage devices, and various other media capable of storing, containing, or carrying instructions and / or data. Computer-readable media may include non-transitory media in which data may be stored and which do not include carrier waves and / or transient electronic signals propagating wirelessly or over a wired connection. Examples of non-transitory media may include, but are not limited to, magnetic disks or magnetic tapes, optical storage media such as CDs or DVDs, flash memory, memory, or memory devices. Computer-readable media may store code and / or machine-executable instructions thereon, which may represent procedures, functions, subprograms, programs, routines, subroutines, modules, software packages, classes, or any combination of instructions, data structures, or program statements. Code segments may be coupled to other code segments or hardware circuitry by passing and / or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, etc., may be passed, forwarded, or transmitted using any suitable means, including memory sharing, messaging, token passing, network transmission, etc.

[0122] In some implementations, computer-readable storage devices, media, and memories may include wired or wireless signals containing bit streams, etc. However, when referred to, non-transitory computer-readable storage media explicitly exclude media such as energy, carrier signals, electromagnetic waves, and the signals themselves.

[0123] Specific details are provided in the foregoing description to provide a thorough understanding of the embodiments and examples provided herein. However, those skilled in the art will understand that these embodiments can be practiced without these specific details. For clarity, in some cases, the technology may be presented as comprising individual functional blocks, including functional blocks comprising devices, device components, steps or routines in a method embodied in software, or combinations of hardware and software. Additional components may be used in addition to those shown in the figures and / or described herein. For example, circuits, systems, networks, processes and other components may be shown as components in block diagram form to avoid obscuring these embodiments in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures and techniques may be shown without unnecessary detail in order to avoid obscuring the embodiments.

[0124] Individual embodiments may be described above as processes or methods illustrated as flowcharts, flow diagrams, data flow diagrams, structure diagrams, or block diagrams. Although a flowchart can describe operations as a sequential process, many of the operations can be performed in parallel or simultaneously. Furthermore, the order of operations can be rearranged. A process terminates when its operations are completed, but a process may have additional steps not included in the accompanying drawings. A process can correspond to a method, function, procedure, subroutine, subprogram, etc. When a process corresponds to a function, its termination corresponds to the function returning to the calling function or the main function.

[0125] The processes and methods described in the examples above can be implemented using computer-executable instructions that are stored on, or otherwise available from, a computer-readable medium. These instructions may include, for example, instructions and data that cause or otherwise configure a general-purpose computer, special-purpose computer, or processing device to perform a function or group of functions. Parts of the computer resources used may be accessible via a network. The computer-executable instructions may be, for example, binaries, intermediate-format instructions such as assembly language, firmware, source code, etc. Examples of computer-readable media that may be used to store instructions, information used, and / or information created during the methods according to the described examples include magnetic or optical disks, flash memory, USB devices with non-volatile memory, networked storage devices, etc.

[0126] Devices implementing the processes and methods according to this disclosure may include hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof, and may take any of a variety of form factors. When implemented as software, firmware, middleware, or microcode, program code or code segments (e.g., computer program products) for performing necessary tasks may be stored in a computer-readable or machine-readable medium. A processor may perform the necessary tasks. Typical examples of form factors include laptops, smartphones, mobile phones, tablet devices, or other small form factor personal computers, personal digital assistants, rack-mounted devices, standalone devices, etc. The functionality described herein may also be embodied in peripheral devices or add-in cards. As a further example, such functionality may also be implemented on a circuit board among different chips, or among different processes executing in a single device.

[0127] Instructions, media for transmitting such instructions, computing resources for executing them, and other structures for supporting such computing resources are example means of providing the functionality described in this disclosure.

[0128] In the foregoing description, various aspects of this application have been described with reference to specific embodiments thereof; however, those skilled in the art will recognize that this application is not limited thereto. Therefore, although illustrative embodiments of this application have been described in detail herein, it should be understood that the inventive concept can be implemented and employed in a variety of other ways, and the appended claims are intended to be construed to include such variations, except as limited by the prior art. Various features and aspects of the above-described application can be used individually or in combination. Furthermore, without departing from the broader spirit and scope of this specification, the embodiments can be used in any number of environments and applications beyond those described herein. Therefore, the specification and drawings should be considered illustrative rather than restrictive. For illustrative purposes, the methods are described in a particular order. It should be understood that in alternative embodiments, the methods may be performed in a different order than described.

[0129] Those skilled in the art will understand that, without departing from the scope of this specification, the less than ("<") and greater than (">") symbols or terms used herein may be replaced by the less than or equal to ("≤") and greater than or equal to ("≥") symbols, respectively.

[0130] When a component is described as being “configured” to perform certain operations, such configuration can be achieved, for example, by designing electronic circuits or other hardware to perform the operations, by programming programmable electronic circuits (e.g., microprocessors or other suitable electronic circuits) to perform the operations, or any combination thereof.

[0131] The phrase “coupled to” means any component is physically connected directly or indirectly to another component, and / or any component communicates directly or indirectly with another component (e.g., connected to another component via a wired or wireless connection and / or other suitable communication interface).

[0132] Claim language or other language reciting "at least one of" a set and / or "one or more of" a set indicates that one member of the set or multiple members of the set (in any combination) satisfy the claim. For example, claim language reciting "at least one of A and B" means A, B, or A and B. In another example, claim language reciting "at least one of A, B, and C" means A, B, C, or A and B, or A and C, or B and C, or A and B and C. The language "at least one of" a set and / or "one or more of" a set does not limit the set to the items listed in the set. For example, claim language reciting "at least one of A and B" can mean A, B, or A and B, and can additionally include items not listed in the set of A and B.

[0133] The various illustrative logic blocks, modules, circuits, and algorithmic steps described in conjunction with the embodiments disclosed herein can be implemented as electronic hardware, computer software, firmware, or a combination thereof. To clearly illustrate this interchangeability between hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described in general terms of their functionality. Whether this functionality is implemented as hardware or software depends on the specific application and the design constraints imposed on the overall system. Those skilled in the art may implement the described functionality in different ways for each specific application, but such implementation decisions should not be construed as departing from the scope of this application.

[0134] The techniques described herein can also be implemented in electronic hardware, computer software, firmware, or any combination thereof. Such techniques can be implemented in any of a variety of devices, such as general-purpose computers, wireless communication device handsets, or integrated circuit devices having multiple uses, including application in wireless communication device handsets and other devices. Any features described as modules or components can be implemented together in an integrated logic device or separately as discrete but interoperable logic devices. If implemented in software, the techniques can be realized at least in part by a computer-readable data storage medium comprising program code including instructions that, when executed, perform one or more of the methods described above. The computer-readable data storage medium can form part of a computer program product, which may include packaging materials. The computer-readable medium can include memory or data storage media, such as random access memory (RAM) (e.g., synchronous dynamic random access memory (SDRAM)), read-only memory (ROM), non-volatile random access memory (NVRAM), electrically erasable programmable read-only memory (EEPROM), flash memory, magnetic or optical data storage media, and the like. Additionally or alternatively, the techniques can be realized at least in part by a computer-readable communication medium, such as propagated signals or waves, that carries or communicates program code in the form of instructions or data structures and that can be accessed, read, and / or executed by a computer.

[0135] The program code can be executed by a processor, which may include one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Such a processor can be configured to perform any of the techniques described in this disclosure. A general-purpose processor may be a microprocessor; in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, such as a combination of a DSP and a microprocessor, multiple microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. Accordingly, the term "processor," as used herein, may refer to any of the foregoing structures, any combination of the foregoing structures, or any other structure or apparatus suitable for implementing the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated software or hardware modules configured for encoding and decoding, or incorporated into a combined video encoder-decoder (codec).

[0136] The illustrative aspects of this disclosure include:

[0137] Aspect 1: An apparatus for processing image data, the apparatus comprising: a memory; and one or more processors coupled to the memory, the one or more processors being configured to: acquire image data associated with an image frame; determine, based on the output of one or more trained machine learning models using the image data as input, a plurality of settings for adjusting one or more parameters of an image signal processor (ISP), wherein the plurality of settings vary spatially among the image data; and generate an output image by processing a plurality of pixels of the image data at least in part using the ISP, wherein each of the plurality of pixels is processed using a corresponding setting of the plurality of settings for adjusting a corresponding parameter among the one or more parameters.

[0138] Aspect 2: The apparatus according to aspect 1, wherein the plurality of settings includes one or more adjustable settings.

[0139] Aspect 3: The apparatus according to any one of aspects 1 or 2, wherein the image data includes demosaiced image data.

[0140] Aspect 4: The apparatus according to any one of Aspects 1 to 3, wherein the image data is raw image data having a plurality of color components corresponding to a color filter array of an image sensor.

[0141] Aspect 5: The apparatus according to aspect 4, wherein inputting the image data into the one or more trained machine learning models includes inputting the raw image data into the one or more trained machine learning models.

[0142] Aspect 6: The apparatus according to any one of aspects 1 to 5, wherein, in order to generate the output image, the one or more processors are configured to demosaic the image data before using the ISP having each of the one or more parameters adjusted based on the plurality of settings to process the plurality of pixels of the image data.

[0143] Aspect 7: The apparatus according to any one of Aspects 1 to 6, wherein the one or more processors are configured to: demosaic the image data before inputting the image data into the one or more trained machine learning models.

[0144] Aspect 8: The apparatus according to any one of aspects 1 to 7, wherein, in order to obtain the image data, the one or more processors are configured to receive the image data from an image sensor that captures the image data.

[0145] Aspect 9: An apparatus according to any one of aspects 1 to 8, wherein the one or more processors are configured to: obtain metadata corresponding to the image data, wherein the output of the one or more trained machine learning models is based on inputting the metadata and the image data into the one or more trained machine learning models.

[0146] Aspect 10: An apparatus according to any one of aspects 1 to 9, wherein the one or more parameters of the ISP include a plurality of gain parameters, and the plurality of settings include a plurality of gain settings corresponding to the plurality of gain parameters, each of the plurality of gain parameters corresponding to one of a plurality of color channels, wherein in order to process the plurality of pixels of the image data using the ISP, the ISP is configured to perform one or more multiplier operations on at least one pixel based on the plurality of gain settings.

[0147] Aspect 11: An apparatus according to any one of aspects 1 to 10, wherein the one or more parameters of the ISP include a plurality of offset parameters, and the plurality of settings include a plurality of offset settings corresponding to the plurality of offset parameters, each of the plurality of offset parameters corresponding to one of a plurality of color channels, wherein in order to process the plurality of pixels of the image data using the ISP, the ISP is configured to perform one or more addition operations on at least one pixel based on the plurality of offset settings.

[0148] Aspect 12: An apparatus according to any one of aspects 1 to 11, wherein the one or more parameters of the ISP include one or more gamma parameters, and the plurality of settings include one or more gamma settings corresponding to the one or more gamma parameters, wherein in order to process the plurality of pixels of the image data using the ISP, the ISP is configured to adjust the tone of at least one pixel based on the one or more gamma settings.

[0149] Aspect 13: An apparatus according to any one of aspects 1 to 12, wherein the one or more parameters of the ISP include one or more Gaussian filter parameters, and the plurality of settings include one or more Gaussian filter settings corresponding to the one or more Gaussian filter parameters, wherein in order to process the plurality of pixels of the image data using the ISP, the ISP is configured to apply a Gaussian filter to at least one pixel based on a Gaussian curve, wherein the shape of the Gaussian curve is based on one or more Gaussian settings.

[0150] Aspect 14: An apparatus according to any one of Aspects 1 to 13, wherein the one or more parameters of the ISP include one or more demosaic parameters, and the plurality of settings include one or more demosaic settings corresponding to the one or more demosaic parameters, wherein in order to process the plurality of pixels of the image data using the ISP, the ISP is configured to demosaic at least one pixel of the image data based on the one or more demosaic settings.

[0151] Aspect 15: The apparatus according to any one of Aspects 1 to 14, wherein the one or more parameters of the ISP are associated with at least one of noise reduction, sharpening, tone mapping and color saturation.

[0152] Aspect 16: An apparatus according to any one of aspects 1 to 15, wherein, in order to process the plurality of pixels of the image data using the ISP having each of the one or more parameters adjusted based on the plurality of settings, the one or more processors are configured to: process a first pixel of the plurality of pixels of the image data based on a first setting of a first parameter of the one or more parameters; and process a second pixel of the plurality of pixels of the image data based on a second setting of the first parameter, wherein the plurality of settings includes at least the first setting and the second setting.

[0153] Aspect 17: An apparatus according to any one of aspects 1 to 16, wherein, in order to process the plurality of pixels of the image data using the ISP having each of the one or more parameters adjusted based on the plurality of settings, the one or more processors are configured to: process a first pixel of the plurality of pixels of the image data based on a first setting of a first parameter of the one or more parameters; and process the first pixel of the plurality of pixels based on a second setting corresponding to a second parameter of the one or more parameters, wherein the plurality of settings includes at least the first setting and the second setting.

[0154] Aspect 18: The apparatus according to any one of aspects 1 to 17, further comprising: one or more displays, wherein the one or more processors are configured to display the output image on the one or more displays.

[0155] Aspect 19: The apparatus according to any one of aspects 1 to 18, wherein the one or more trained machine learning models comprise one or more trained neural networks.

[0156] Aspect 20: A method of processing image data, comprising: acquiring image data associated with an image frame; determining, based on the output of one or more trained machine learning models using the image data as input, a plurality of settings for adjusting one or more parameters of an image signal processor (ISP), wherein the plurality of settings vary spatially among the image data; and generating an output image by processing a plurality of pixels of the image data at least in part using the ISP, wherein each of the plurality of pixels is processed using a corresponding setting of the plurality of settings for adjusting a corresponding parameter among the one or more parameters.

[0157] Aspect 21: The method according to aspect 20, wherein the plurality of settings includes one or more adjusted settings.

[0158] Aspect 22: The method according to any one of Aspects 20 or 21, wherein the image data includes de-mosaiced image data.

[0159] Aspect 23: The method according to any one of aspects 20 to 22, wherein the image data is raw image data having a plurality of color components corresponding to a color filter array of an image sensor.

[0160] Aspect 24: The method according to aspect 23, wherein inputting the image data into the one or more trained machine learning models includes inputting the raw image data into the one or more trained machine learning models.

[0161] Aspect 25: The method according to any one of aspects 20 to 24, wherein generating the output image comprises: demosaicing the image data before processing the plurality of pixels of the image data using the ISP having each of the one or more parameters adjusted based on the plurality of settings.

[0162] Aspect 26: The method according to any one of aspects 20 to 25, further comprising: demosaicing the image data before inputting the image data into the one or more trained machine learning models.

[0163] Aspect 27: The method according to any one of aspects 20 to 26, wherein obtaining the image data includes receiving the image data from an image sensor that captures the image data.

[0164] Aspect 28: The method according to any one of aspects 20 to 27, further comprising: obtaining metadata corresponding to the image data, wherein the output of the one or more trained machine learning models is based on inputting the metadata and the image data into the one or more trained machine learning models.

[0165] Aspect 29: The method according to any one of Aspects 20 to 28, wherein the one or more parameters of the ISP include a plurality of gain parameters, and the plurality of settings include a plurality of gain settings corresponding to the plurality of gain parameters, each of the plurality of gain parameters corresponding to one of a plurality of color channels, and wherein processing the plurality of pixels of the image data using the ISP includes performing one or more multiplier operations on at least one pixel using the ISP based on the plurality of gain settings.

[0166] Aspect 30: The method according to any one of Aspects 20 to 29, wherein the one or more parameters of the ISP include a plurality of offset parameters, and the plurality of settings include a plurality of offset settings corresponding to the plurality of offset parameters, each of the plurality of offset parameters corresponding to one of a plurality of color channels, and wherein processing the plurality of pixels of the image data using the ISP includes performing one or more addition operations on at least one pixel using the ISP based on the plurality of offset settings.

[0167] Aspect 31: The method according to any one of Aspects 20 to 30, wherein the one or more parameters of the ISP include one or more gamma parameters, and the plurality of settings include one or more gamma settings corresponding to the one or more gamma parameters, and wherein processing the plurality of pixels of the image data using the ISP includes adjusting the tone of at least one pixel using the ISP based on the one or more gamma settings.

[0168] Aspect 32: The method according to any one of Aspects 20 to 31, wherein the one or more parameters of the ISP include one or more Gaussian filter parameters, and the plurality of settings include one or more Gaussian filter settings corresponding to the one or more Gaussian filter parameters, and wherein processing the plurality of pixels of the image data using the ISP includes applying a Gaussian filter to at least one pixel using the ISP based on a Gaussian curve, wherein the shape of the Gaussian curve is based on one or more Gaussian settings.

[0169] Aspect 33: The method according to any one of Aspects 20 to 32, wherein the one or more parameters of the ISP include one or more demosaic parameters, and the plurality of settings include one or more demosaic settings corresponding to the one or more demosaic parameters, and wherein processing the plurality of pixels of the image data using the ISP includes demosaicing at least one pixel of the image data using the ISP based on the one or more demosaic settings.

[0170] Aspect 34: The method according to any one of Aspects 20 to 33, wherein the one or more parameters of the ISP are associated with at least one of noise reduction, sharpening, tone mapping and color saturation.

[0171] Aspect 35: The method according to any one of aspects 20 to 34, wherein processing the plurality of pixels of the image data using the ISP having each of the one or more parameters adjusted based on the plurality of settings comprises: processing a first pixel of the plurality of pixels of the image data based on a first setting of a first parameter of the one or more parameters; and processing a second pixel of the plurality of pixels of the image data based on a second setting of the first parameter, wherein the plurality of settings includes at least the first setting and the second setting.

[0172] Aspect 36: The method according to any one of aspects 20 to 35, wherein processing the plurality of pixels of the image data using the ISP having each of the one or more parameters adjusted based on the plurality of settings comprises: processing a first pixel of the plurality of pixels of the image data based on a first setting of a first parameter of the one or more parameters; and processing the first pixel of the plurality of pixels based on a second setting corresponding to a second parameter of the one or more parameters, wherein the plurality of settings includes at least the first setting and the second setting.

[0173] Aspect 37: The method according to any one of aspects 20 to 36, further comprising: displaying the output image on one or more displays.

[0174] Aspect 38: The method according to any one of aspects 20 to 37, wherein the one or more trained machine learning models comprise one or more trained neural networks.

[0175] Aspect 39: A computer-readable storage medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform operations according to any one of aspects 1 to 38.

[0176] Aspect 40: An apparatus comprising means for performing operations according to any one of aspects 1 to 38.
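
As a non-authoritative illustration of the spatially varying settings in Aspects 10 to 12 and 16 to 17, the following minimal Python sketch applies a per-channel gain (multiplier operation), a per-channel offset (addition operation), and a gamma setting to each pixel, with every pixel carrying its own settings. The function names, the dictionary layout of the settings, the order gain, offset, gamma, the [0, 1] value range, and the clamping step are all assumptions made for the example and are not taken from this disclosure. The hand-written `settings` stand in for the output of the one or more trained machine learning models.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B), assumed normalized to [0.0, 1.0]


def apply_pixel_settings(pixel: Pixel, gain: Pixel, offset: Pixel, gamma: float) -> Pixel:
    """Apply per-channel gain, then per-channel offset, then gamma to one pixel."""
    out = []
    for value, g, o in zip(pixel, gain, offset):
        v = value * g + o           # multiplier operation, then addition operation
        v = min(max(v, 0.0), 1.0)   # clamp (assumption) so the power below is well defined
        out.append(v ** gamma)      # gamma reshapes the tone curve
    return (out[0], out[1], out[2])


def process_image(image: List[List[Pixel]], settings: List[List[Dict]]) -> List[List[Pixel]]:
    """image[y][x] is a pixel; settings[y][x] holds that pixel's own settings."""
    return [
        [apply_pixel_settings(px, s["gain"], s["offset"], s["gamma"])
         for px, s in zip(row, settings_row)]
        for row, settings_row in zip(image, settings)
    ]


# Hypothetical 1x2 image: two identical pixels that receive different settings.
image = [[(0.25, 0.25, 0.25), (0.25, 0.25, 0.25)]]
settings = [[
    {"gain": (1.0, 1.0, 1.0), "offset": (0.0, 0.0, 0.0), "gamma": 1.0},   # identity
    {"gain": (2.0, 1.0, 1.0), "offset": (0.0, 0.25, 0.0), "gamma": 0.5},  # brighten
]]
result = process_image(image, settings)
```

In this sketch the two input pixels are identical, yet the outputs differ because the first and second pixels use different settings of the same parameters (as in Aspect 16), and each pixel is processed with settings of several parameters (as in Aspect 17).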

Claims

1. An apparatus for processing image data, the apparatus comprising: at least one memory; and at least one processor coupled to the at least one memory, the at least one processor being configured to: obtain image data; determine, based on an output of one or more trained machine learning models that use the image data and metadata associated with the image data as input, a plurality of settings for adjusting one or more parameters of an image signal processor (ISP), wherein the plurality of settings vary spatially between different pixels of an image frame associated with the image data; and generate an output image frame at least in part by processing a plurality of pixels of the image data using the ISP, wherein each of the plurality of pixels is processed using a corresponding setting of the plurality of settings for adjusting a corresponding parameter among the one or more parameters.

2. The apparatus of claim 1, wherein the plurality of settings includes one or more adjustable settings.

3. The apparatus of claim 1, wherein the image data includes de-mosaiced image data.

4. The apparatus of claim 1, wherein the image data is raw image data having a plurality of color components corresponding to a color filter array of an image sensor.

5. The apparatus of claim 4, wherein inputting the image data into the one or more trained machine learning models comprises inputting the raw image data into the one or more trained machine learning models.

6. The apparatus of claim 1, wherein, in order to generate the output image, the at least one processor is configured to: demosaic the image data before processing the plurality of pixels of the image data using the ISP having each of the one or more parameters adjusted based on the plurality of settings.

7. The apparatus of claim 1, wherein the at least one processor is configured to: demosaic the image data before inputting the image data into the one or more trained machine learning models.

8. The apparatus of claim 1, wherein, in order to obtain the image data, the at least one processor is configured to receive the image data from an image sensor that captures the image data.

9. The apparatus of claim 1, wherein the at least one processor is configured to: obtain image capture settings used in capturing the image data, wherein the metadata includes the image capture settings, and wherein the output of the one or more trained machine learning models is based on inputting the image capture settings and the image data into the one or more trained machine learning models.

10. The apparatus of claim 1, wherein the one or more parameters of the ISP include a plurality of gain parameters, and the plurality of settings include a plurality of gain settings corresponding to the plurality of gain parameters, each of the plurality of gain parameters corresponding to one of a plurality of color channels, wherein in order to process the plurality of pixels of the image data using the ISP, the ISP is configured to perform one or more multiplier operations on at least one pixel based on the plurality of gain settings.

11. The apparatus of claim 1, wherein the one or more parameters of the ISP include a plurality of offset parameters, and the plurality of settings include a plurality of offset settings corresponding to the plurality of offset parameters, each of the plurality of offset parameters corresponding to one of a plurality of color channels, wherein in order to process the plurality of pixels of the image data using the ISP, the ISP is configured to perform one or more addition operations on at least one pixel based on the plurality of offset settings.

12. The apparatus of claim 1, wherein the one or more parameters of the ISP include one or more gamma parameters, and the plurality of settings include one or more gamma settings corresponding to the one or more gamma parameters, wherein in order to process the plurality of pixels of the image data using the ISP, the ISP is configured to adjust the tone of at least one pixel based on the one or more gamma settings.

13. The apparatus of claim 1, wherein the one or more parameters of the ISP include one or more Gaussian filter parameters, and the plurality of settings include one or more Gaussian filter settings corresponding to the one or more Gaussian filter parameters, wherein in order to process the plurality of pixels of the image data using the ISP, the ISP is configured to apply a Gaussian filter to at least one pixel based on a Gaussian curve, wherein the shape of the Gaussian curve is based on the one or more Gaussian filter settings.

14. The apparatus of claim 1, wherein the one or more parameters of the ISP include one or more demosaic parameters, and the plurality of settings include one or more demosaic settings corresponding to the one or more demosaic parameters, wherein in order to process the plurality of pixels of the image data using the ISP, the ISP is configured to demosaic at least one pixel of the image data based on the one or more demosaic settings.

15. The apparatus of claim 1, wherein the one or more parameters of the ISP are associated with at least one of noise reduction, sharpening, tone mapping, and color saturation.

16. The apparatus of claim 1, wherein, in order to process the plurality of pixels of the image data using the ISP having each of the one or more parameters adjusted based on the plurality of settings, the at least one processor is configured to: process a first pixel of the plurality of pixels of the image data based on a first setting of a first parameter of the one or more parameters; and process a second pixel of the plurality of pixels of the image data based on a second setting of the first parameter, wherein the plurality of settings includes at least the first setting and the second setting.

17. The apparatus of claim 1, wherein, in order to process the plurality of pixels of the image data using the ISP having each of the one or more parameters adjusted based on the plurality of settings, the at least one processor is configured to: process a first pixel of the plurality of pixels of the image data based on a first setting of a first parameter of the one or more parameters; and process the first pixel of the plurality of pixels based on a second setting corresponding to a second parameter of the one or more parameters, wherein the plurality of settings includes at least the first setting and the second setting.

18. The apparatus of claim 1, further comprising: one or more displays, wherein the at least one processor is configured to display the output image on the one or more displays.

19. The apparatus of claim 1, wherein the one or more trained machine learning models comprise one or more trained neural networks.

20. A method for processing image data, comprising: obtaining image data; determining, based on an output of one or more trained machine learning models that use the image data and metadata associated with the image data as input, a plurality of settings for adjusting one or more parameters of an image signal processor (ISP), wherein the plurality of settings vary spatially between different pixels of an image frame associated with the image data; and generating an output image frame at least in part by processing a plurality of pixels of the image data using the ISP, wherein each of the plurality of pixels is processed using a corresponding setting of the plurality of settings for adjusting a corresponding parameter among the one or more parameters.

21. The method of claim 20, wherein the plurality of settings includes one or more adjusted settings.

22. The method of claim 21, wherein the image data is raw image data having a plurality of color components corresponding to a color filter array of an image sensor.

23. The method of claim 21, wherein generating the output image comprises: demosaicing the image data before processing the plurality of pixels of the image data using the ISP having each of the one or more parameters adjusted based on the plurality of settings.

24. The method of claim 21, further comprising: obtaining image capture settings used in capturing the image data, wherein the metadata includes the image capture settings, and wherein the output of the one or more trained machine learning models is based on inputting the image capture settings and the image data into the one or more trained machine learning models.

25. The method of claim 21, wherein the one or more parameters of the ISP include a plurality of gain parameters, and the plurality of settings include a plurality of gain settings corresponding to the plurality of gain parameters, each of the plurality of gain parameters corresponding to one of a plurality of color channels, and wherein processing the plurality of pixels of the image data using the ISP includes performing one or more multiplier operations on at least one pixel using the ISP based on the plurality of gain settings.

26. The method of claim 20, wherein the one or more parameters of the ISP include a plurality of offset parameters, and the plurality of settings include a plurality of offset settings corresponding to the plurality of offset parameters, each of the plurality of offset parameters corresponding to one of a plurality of color channels, and wherein processing the plurality of pixels of the image data using the ISP includes performing one or more addition operations on at least one pixel using the ISP based on the plurality of offset settings.

27. The method of claim 20, wherein the one or more parameters of the ISP include one or more gamma parameters, and the plurality of settings include one or more gamma settings corresponding to the one or more gamma parameters, and wherein processing the plurality of pixels of the image data using the ISP includes adjusting the tone of at least one pixel using the ISP based on the one or more gamma settings.

28. The method of claim 20, wherein the one or more parameters of the ISP include one or more Gaussian filter parameters, and the plurality of settings include one or more Gaussian filter settings corresponding to the one or more Gaussian filter parameters, and wherein processing the plurality of pixels of the image data using the ISP includes applying a Gaussian filter to at least one pixel based on a Gaussian curve using the ISP, wherein the shape of the Gaussian curve is based on the one or more Gaussian filter settings.

29. The method of claim 20, wherein processing the plurality of pixels of the image data using the ISP having each of the one or more parameters adjusted based on the plurality of settings comprises: processing a first pixel of the plurality of pixels of the image data based on a first setting of a first parameter of the one or more parameters; and processing a second pixel of the plurality of pixels of the image data based on a second setting of the first parameter, wherein the plurality of settings includes at least the first setting and the second setting.

30. The method of claim 20, wherein processing the plurality of pixels of the image data using the ISP having each of the one or more parameters adjusted based on the plurality of settings comprises: processing a first pixel of the plurality of pixels of the image data based on a first setting of a first parameter of the one or more parameters; and processing the first pixel of the plurality of pixels based on a second setting corresponding to a second parameter of the one or more parameters, wherein the plurality of settings includes at least the first setting and the second setting.

31. A computer-readable medium storing program code for processing image data at a device, the program code being executable by one or more processors to cause the device to: obtain image data; determine, based on an output of one or more trained machine learning models that use the image data and metadata associated with the image data as input, a plurality of settings for adjusting one or more parameters of an image signal processor (ISP), wherein the plurality of settings vary spatially between different pixels of an image frame associated with the image data; and generate an output image frame at least in part by processing a plurality of pixels of the image data using the ISP, wherein each of the plurality of pixels is processed using a corresponding setting of the plurality of settings for adjusting a corresponding parameter among the one or more parameters.

32. An apparatus for processing image data, comprising: means for obtaining image data; means for determining, based on an output of one or more trained machine learning models that use the image data and metadata associated with the image data as input, a plurality of settings for adjusting one or more parameters of an image signal processor (ISP), wherein the plurality of settings vary spatially between different pixels of an image frame associated with the image data; and means for generating an output image frame at least in part by processing a plurality of pixels of the image data using the ISP, wherein each of the plurality of pixels is processed using a corresponding setting of the plurality of settings for adjusting a corresponding parameter among the one or more parameters.
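
For illustration only, the Gaussian filtering recited in claims 13 and 28 can be sketched in Python as a filter whose Gaussian curve changes shape from pixel to pixel. This is a minimal sketch under stated assumptions, not the claimed implementation: the single-channel image, the 3x3 window (`radius=1`), the use of one standard deviation `sigma` per pixel as the Gaussian filter setting, the edge replication, and all names (`gaussian_kernel`, `filter_image`, `sigma_map`) are hypothetical. The hand-written `sigma_map` stands in for settings that the one or more trained machine learning models would output.

```python
import math
from typing import Dict, List, Tuple


def gaussian_kernel(sigma: float, radius: int = 1) -> Dict[Tuple[int, int], float]:
    """Normalized window weights sampled from a Gaussian curve of width sigma (> 0)."""
    weights = {}
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            weights[(dy, dx)] = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
    total = sum(weights.values())
    return {offset: w / total for offset, w in weights.items()}


def filter_image(image: List[List[float]], sigma_map: List[List[float]],
                 radius: int = 1) -> List[List[float]]:
    """Filter each pixel with a kernel built from that pixel's own sigma setting."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            kernel = gaussian_kernel(sigma_map[y][x], radius)
            acc = 0.0
            for (dy, dx), weight in kernel.items():
                yy = min(max(y + dy, 0), height - 1)  # replicate edge pixels
                xx = min(max(x + dx, 0), width - 1)
                acc += weight * image[yy][xx]
            out[y][x] = acc
    return out


# Hypothetical 3x3 image with a single bright pixel in the centre.
image = [[0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0]]
# A narrow curve at the centre (almost no smoothing), a wide curve elsewhere.
sigma_map = [[1.0, 1.0, 1.0],
             [1.0, 0.2, 1.0],
             [1.0, 1.0, 1.0]]
filtered = filter_image(image, sigma_map)
```

With these inputs the centre pixel is left nearly unchanged because its narrow curve puts almost all weight on the pixel itself, while the surrounding pixels, filtered with a wider curve, pick up part of the centre value.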