Image processing method, image processing device, and image processing system
By inputting a captured image together with information on the state of the optical system into a machine learning model, the excessive learning load and stored data volume of existing technologies are avoided, and blur in captured images is sharpened or reshaped with high precision.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CANON KK
- Filing Date
- 2020-06-02
- Publication Date
- 2026-06-23
AI Technical Summary
Existing technologies using Wiener filters and convolutional neural networks (CNNs) for image sharpening suffer from insufficient accuracy or excessive learning load and data storage, making it difficult to ensure high-precision sharpening of blurred images while suppressing the learning load and data storage.
By acquiring information about the captured image and the state of the optical system as input data, machine learning models such as CNNs are used for sharpening or shaping, uniformly learning blurred data of various shapes, and achieving high-precision processing while suppressing the learning load and the amount of data stored.
It achieves high-precision sharpening or reshaping of blurred images caused by optical systems while suppressing learning load and storage data volume, thus improving the accuracy and efficiency of image processing.
Figure CN122265094A_ABST
Abstract
Description
[0001] This application is a divisional application of the invention patent application with application number 202010487417.0, application date June 2, 2020, and invention title "Image Processing Method, Image Processing Apparatus and Image Processing System".
Technical Field
[0002] The present invention relates to an image processing method for sharpening or reshaping blur caused by an optical system used to capture a captured image.
Background Technology
[0003] Japanese Patent Publication No. (“JP”) 2011-123589 discloses a method for correcting blur caused by aberrations in a captured image and obtaining a sharpened image using Wiener filter-based processing. JP 2017-199235 discloses a method for correcting blur caused by defocus in a captured image using a convolutional neural network (CNN).
[0004] However, the method disclosed in JP 2011-123589 uses Wiener filter-based processing (linear processing), and therefore cannot sharpen blurred images with high precision. For example, it cannot recover subject information at spatial frequencies where the blur reduces the spectrum to zero or to the same level as the noise. Furthermore, since different Wiener filters need to be used for different aberrations, the amount of stored data (the data capacity of the multiple Wiener filters) used to sharpen the captured image increases in optical systems that generate various aberrations.
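The limitation of linear processing can be illustrated with a minimal NumPy sketch. It assumes the textbook Wiener response conj(H) / (|H|^2 + NSR) with a constant noise-to-signal ratio; the function name `wiener_filter`, the one-dimensional transfer function, and the value of `nsr` are hypothetical and are not taken from JP 2011-123589.

```python
import numpy as np

def wiener_filter(otf, nsr):
    """Frequency response of a Wiener filter for a blur with transfer function `otf`.

    `nsr` is an assumed constant noise-to-signal power ratio.
    """
    return np.conj(otf) / (np.abs(otf) ** 2 + nsr)

# Hypothetical 1-D transfer function of a blur; it is exactly zero at index 2.
otf = np.array([1.0, 0.5, 0.0, 0.5])
gain = wiener_filter(otf, nsr=0.01)

# Overall response of "blur followed by Wiener filter" at each frequency.
restored = gain * otf
```

At the frequency where the transfer function is zero, the filter gain is also zero, so the combined response stays at zero: a linear filter has nothing to amplify there.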
[0005] On the other hand, in JP 2017-199235, since the CNN method uses non-linear processing, the spatial spectrum of the subject, which has been reduced to near zero, can be estimated. However, the CNN method may reduce sharpening accuracy or increase the learning load and stored data volume when sharpening images captured by optical systems that generate various aberrations. A CNN cannot properly sharpen images with blur that it has not learned. In an optical system, the generated blur varies with the states of zoom, F-number, subject distance (focus distance), etc., and the following two methods can be considered to sharpen all of these blurs.
[0006] The first method uses training data that includes all possible blurs in the optical system to train the CNN. However, in the first method, because the CNN learns to average all blurs included in the training data, the sharpening accuracy is reduced for each blur with different shapes. The second method divides the possible blurs in the optical system into multiple similar groups and trains the CNN individually based on the training data of each group. However, in the second method, when the optical system is, for example, a high-magnification zoom lens and generates various aberrations, the number of groups increases significantly, thus increasing the learning load and the amount of stored data (the data capacity of the learned CNN weights). Therefore, it is difficult to ensure the accuracy of sharpening blurry images while suppressing the learning load and the amount of stored data.
Summary of the Invention
[0007] This invention provides an image processing method, an image processing apparatus, an image processing system, and a method for manufacturing learned weights, all of which can sharpen captured images or reshape blur in captured images with high accuracy while suppressing the learning load and data storage volume of machine learning models.
[0008] An image processing method according to one aspect of the present invention includes: a first step of acquiring input data including a captured image and optical system information relating to the state of an optical system used to capture the captured image; and a second step of inputting the input data into a machine learning model and generating an estimated image obtained by sharpening the captured image or by shaping blur included in the captured image.
[0009] A non-transitory computer-readable storage medium storing a computer program that causes a computer to perform the image processing method described above also constitutes another aspect of the invention.
[0010] An image processing apparatus as an aspect of the present invention includes: an acquisition unit configured to acquire input data including a captured image and optical system information relating to the state of an optical system used to capture the captured image; and a generation unit configured to input the input data into a machine learning model and generate an estimated image obtained by sharpening the captured image or by shaping blur included in the captured image.
[0011] An image processing system as an aspect of the present invention has a first device and a second device. The first device includes: a sending unit configured to send a request to the second device relating to the execution of processing of a captured image, and the second device includes: a receiving unit configured to receive the request; an acquiring unit configured to acquire input data including the captured image and optical system information relating to the state of an optical system used to capture the captured image; and a generating unit configured to input the input data into a machine learning model and generate an estimated image obtained by sharpening the captured image or by shaping blur included in the captured image.
[0012] An image processing method as an aspect of the present invention includes: a first step of acquiring input data including a training image and optical system information relating to the state of an optical system corresponding to the training image; a second step of inputting the input data into a machine learning model and generating an output image obtained by sharpening the training image or by shaping blur included in the training image; and a third step of updating the weights of the machine learning model based on the output image and a ground truth image.
[0013] A non-transitory computer-readable storage medium storing a computer program that causes a computer to perform the image processing method described above also constitutes another aspect of the invention.
[0014] A method for generating learned weights as an aspect of the present invention includes: a first step of acquiring input data including a training image and optical system information relating to the state of an optical system corresponding to the training image; a second step of inputting the input data into a machine learning model and generating an output image obtained by sharpening the training image or by shaping blur included in the training image; and a third step of updating the weights of the machine learning model based on the output image and a ground truth image.
[0015] An image processing apparatus as an aspect of the present invention includes: an acquisition unit configured to acquire input data including a training image and optical system information relating to the state of an optical system corresponding to the training image; a generation unit configured to input the input data into a machine learning model and generate an output image obtained by sharpening the training image or by shaping blur included in the training image; and an update unit configured to update the weights of the machine learning model based on the output image and a ground truth image.
[0016] Further features of the invention will become clear from the following description of exemplary embodiments with reference to the accompanying drawings.
Attached Figure Description
[0017] Figure 1 is a configuration diagram illustrating the machine learning model according to the first embodiment.
[0018] Figure 2 is a block diagram illustrating an image processing system according to the first embodiment.
[0019] Figure 3 is an external view of the image processing system according to the first embodiment.
[0020] Figure 4 is a flowchart related to weight learning according to the first and second embodiments.
[0021] Figure 5 is a flowchart related to the generation of the estimated image according to the first embodiment.
[0022] Figure 6 is a block diagram illustrating an image processing system according to the second embodiment.
[0023] Figure 7 is an external view of the image processing system according to the second embodiment.
[0024] Figure 8 is a configuration diagram illustrating the machine learning model according to the second embodiment.
[0025] Figure 9 is a diagram illustrating the relationship between the image sensor and the image circle of the optical system according to the second embodiment.
[0026] Figure 10 is a diagram illustrating an example of a positional diagram according to the second embodiment.
[0027] Figure 11 is a flowchart related to the generation of the estimated image according to the second embodiment.
[0028] Figure 12 is a diagram illustrating the effect of sharpening a captured image according to the second embodiment.
[0029] Figure 13 is an example diagram showing a manufacturing variation according to the second embodiment.
[0030] Figure 14 is a block diagram illustrating an image processing system according to the third embodiment.
[0031] Figure 15 is an external view of the image processing system according to the third embodiment.
[0032] Figure 16 is a flowchart related to weight learning according to the third embodiment.
[0033] Figure 17 is a configuration diagram illustrating the machine learning model according to the third embodiment.
[0034] Figure 18 is a flowchart related to the generation of the estimated image according to the third embodiment.
Detailed Implementation
[0035] A detailed description of embodiments of the invention will now be given with reference to the accompanying drawings. Corresponding elements in the various drawings will be indicated by the same reference numerals, and repeated descriptions will be omitted.
[0036] Before giving a detailed description of each embodiment, the gist of the invention will be described. The invention uses a machine learning model to sharpen a captured image or to reshape blur in a captured image. The blur is caused by the optical system used to capture the image. As used herein, the term "optical system" refers to anything that has an optical effect on image capture. That is, "optical system" includes not only the imaging optical system but also, for example, the optical low-pass filter and the microlens array of the image sensor. Therefore, blur caused by the optical system includes blur caused by aberrations, diffraction, and defocus, by the action of the optical low-pass filter, by pixel aperture degradation of the image sensor, etc.
[0037] Machine learning models include, for example, neural networks, genetic programming, and Bayesian networks. Neural networks include CNNs (Convolutional Neural Networks), GANs (Generative Adversarial Networks), and RNNs (Recurrent Neural Networks).
[0038] Sharpening refers to processing that restores the frequency components of a subject that have been reduced or lost due to blur. Shaping refers to transforming the shape of a blur without restoring the frequency components. For example, shaping includes transforming a double-line blur into a Gaussian or disk (flat circular distribution) blur, transforming a vignetted defocus blur into a circular defocus blur, etc.
[0039] The input data to the machine learning model includes the captured image and information related to the state of the optical system at the time of capturing the image (hereinafter referred to as optical system information). The state of the optical system refers to the state of the devices that can change the optical actions related to image capture. The state of the optical system includes, for example, the state of the optical system's zoom, F-number, subject distance, etc. The optical system information may include information about the presence (or type) of an optical low-pass filter and the presence (or type) of accessories attached to the optical system (e.g., converter lenses).
[0040] Optical system information is fed into the machine learning model both during learning and during estimation after learning, enabling the model to determine which state of the optical system caused the blur in the captured image. Thus, even when the images to be learned include blur of various shapes, the machine learning model can learn weights that sharpen the image (or shape the blur) differently for each state of the optical system, rather than weights that apply the same averaged sharpening (or shaping) to every blur.
[0041] Therefore, the present invention can perform highly accurate sharpening (or shaping) for each blur. Furthermore, it can collectively learn training data including blurs of various shapes while suppressing the decrease in accuracy when sharpening (or shaping) the captured image. Thus, the present invention can sharpen a captured image blurred by the optical system, or shape that blur, with high accuracy while suppressing the learning load and the amount of stored data. The effects of the present invention will be described quantitatively in the second embodiment. In the following description, the learning phase refers to the step of learning the weights of the machine learning model, and the estimation phase refers to the step of sharpening the captured image or shaping its blur using the machine learning model with the learned weights.
[0042] First Embodiment
[0043] First, a description of an image processing system according to a first embodiment of the present invention will be given. This embodiment sharpens a captured image, but the invention is similarly applicable to shaping blur. This embodiment describes sharpening an image with blur caused by aberrations and diffraction, but the invention can also be applied to images with blur caused by defocus.
[0044] Figure 2 is a block diagram illustrating the image processing system 100 in this embodiment. Figure 3 is an external view of the image processing system 100. The image processing system 100 includes a learning device (image processing device) 101, an image capture device 102, and a network 103. The learning device 101 and the image capture device 102 are connected via the network 103, either wired or wirelessly. The learning device 101 includes a memory 111, an acquirer (acquisition unit) 112, a calculator (generation unit) 113, and an updater (update unit) 114, and is configured to learn weights in a machine learning model for sharpening a captured image. The image capture device 102 is configured to acquire a captured image by capturing the subject space and to sharpen the blurred captured image using information about the weights read before or after image capture. A detailed description of the weight learning in the learning device 101 and the sharpening of the captured image in the image capture device 102 will be given later.
[0045] The image capture device 102 includes an optical system (imaging optical system) 121 and an image sensor 122. The optical system 121 collects light that has entered the image capture device 102 from the subject space. The image sensor 122 receives and photoelectrically converts the optical image (subject image) formed via the optical system 121 and generates the captured image. The image sensor 122 is, for example, a CCD (charge-coupled device) sensor or a CMOS (complementary metal-oxide-semiconductor) sensor.
[0046] Image processor (image processing device) 123 includes an acquirer (acquisition unit) 123a and a sharpener (generation unit) 123b, and is configured to generate an estimated image (sharpened image) from a captured image by sharpening the captured image. The estimated image is generated using information about weights learned by learning device 101. The weight information has been pre-read from learning device 101 via wired or wireless network 103 and stored in memory 124. The stored weight information can be the weight values themselves or their encoded form. Recording medium 125 is configured to store the estimated image. Alternatively, the captured image can be stored on recording medium 125, and image processor 123 can read the captured image to generate the estimated image. Display 126 is configured to display the estimated image stored on recording medium 125 according to user instructions. System controller 127 is configured to control the above series of operations.
[0047] Next, with reference to Figure 4, a description will be given of the weight learning (learning phase, method of creating the learned model) performed by the learning device 101 in this embodiment. Figure 4 is a flowchart related to weight learning. Each step in Figure 4 is primarily performed by the acquirer 112, the calculator 113, or the updater 114 in the learning device 101. In this embodiment, a CNN is used as the machine learning model, but the invention can be similarly applied to other models.
[0048] First, in step S101, the acquirer 112 acquires one or more sets of ground truth images and training input data from the memory 111. The training input data is the input data used in the learning phase of the CNN. The training input data includes training images and the corresponding optical system information (hereinafter referred to as training optical system information). A training image and its ground truth image are a pair of images that include the same subject and differ only in the presence or absence of blur. The ground truth image does not contain blur, while the training image does. The blur is generated by combining aberrations and diffraction caused by the optical system 121 with pixel aperture degradation of the image sensor 122. One training image includes the blur generated by combining the aberrations and diffraction caused by the optical system 121 in specific states of zoom, F-number, and subject distance with the pixel aperture degradation. The training optical system information is information indicating at least one of the specific states of zoom, F-number, and subject distance. In other words, the optical system information specifies the blur included in the training image. In this embodiment, the optical system information includes all of the states of zoom, F-number, and subject distance. The training images are not limited to captured images, but can also be images generated by CG, etc.
[0049] The following are examples of methods for generating the ground truth image and training input data stored in the memory 111. The first generation method performs an image capture simulation using an original image as the subject. The original image is an image captured in the real world, a CG (computer graphics) image, etc. The original image can include edges, textures, gradations, flat areas, etc., with varying intensities and directions, which allows various subjects to be sharpened accurately. One or more original images can be used. The ground truth image is obtained by performing the image capture simulation on the original image without applying blur. The training image is obtained by performing the image capture simulation on the original image with the blur to be learned applied.
[0050] This embodiment uses the blur generated by aberrations and diffraction, as well as pixel aperture degradation, produced under the state (Z, F, D) of the optical system 121. Here, Z, F, and D represent the states of zoom, F-number, and subject distance, respectively. When the image sensor 122 acquires multiple color components, the blur for each color component is applied to the original image. Blur can be applied by convolving the original image with a point spread function (PSF) or by multiplying the frequency characteristics of the original image by an OTF (Optical Transfer Function). When the blur under state (Z, F, D) is applied to a training image, the training optical system information is the information used to specify (Z, F, D). The ground truth image and training image can be undeveloped RAW images or developed images. Blurs of multiple different (Z, F, D) are applied to one or more original images to generate multiple sets of ground truth images and training images.
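The two ways of applying blur mentioned above (PSF convolution and OTF multiplication) can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the image is single-channel, the boundaries are treated as circular so that both routes give the same result, and the function names, the 3x3 PSF, and the random test image are hypothetical.

```python
import numpy as np

def blur_with_psf(image, psf):
    """Circularly convolve a 2-D image with a PSF by direct summation."""
    out = np.zeros_like(image, dtype=float)
    ph, pw = psf.shape
    for dy in range(ph):
        for dx in range(pw):
            # Shift the image so that the PSF center corresponds to zero shift.
            shift = (dy - ph // 2, dx - pw // 2)
            out += psf[dy, dx] * np.roll(image, shift, axis=(0, 1))
    return out

def blur_with_otf(image, psf):
    """Apply the same blur by multiplying the image spectrum by the OTF."""
    h, w = image.shape
    ph, pw = psf.shape
    padded = np.zeros((h, w))
    padded[:ph, :pw] = psf
    # Move the PSF center to the array origin before taking the FFT.
    padded = np.roll(padded, (-(ph // 2), -(pw // 2)), axis=(0, 1))
    otf = np.fft.fft2(padded)
    return np.real(np.fft.ifft2(np.fft.fft2(image) * otf))

rng = np.random.default_rng(0)
image = rng.random((8, 8))                      # stand-in for an original image
psf = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=float)
psf /= psf.sum()                                # normalize so brightness is preserved

blurred_spatial = blur_with_psf(image, psf)
blurred_frequency = blur_with_otf(image, psf)
```

In a real simulation one such PSF (or OTF) would be prepared per state (Z, F, D) and per color component; the sketch uses a single fixed kernel only to show that the two routes are equivalent.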
[0051] This embodiment collectively learns to correct all blurs generated in the optical system 121. Therefore, multiple sets of ground truth images and training images are generated for all possible (Z, F, D) of the optical system 121. Since multiple blurs may occur in the same (Z, F, D) depending on the image height and azimuth, sets of ground truth images and training images are generated for each different image height and azimuth.
[0052] The original image may have signal values higher than the brightness saturation value of the image sensor 122. This is because when the image capture device 102 captures an actual subject under specific exposure conditions, the brightness of the subject may not fall within the brightness saturation value. The ground truth image is generated by clipping the signal at the brightness saturation value of the image sensor 122. The training image is generated by applying the blur and then clipping the signal at the brightness saturation value.
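The order of operations described here (clip only for the ground truth image; blur first and then clip for the training image) can be sketched as follows. The function name `make_training_pair`, the two-pixel averaging blur, and the saturation value of 1.0 are hypothetical choices made only for this illustration.

```python
import numpy as np

def make_training_pair(original, blur_fn, saturation):
    """Return (ground_truth, training) from an original with unclipped highlights."""
    ground_truth = np.minimum(original, saturation)        # clip only
    training = np.minimum(blur_fn(original), saturation)   # blur first, then clip
    return ground_truth, training

def two_pixel_blur(signal):
    """Hypothetical blur: average each sample with its left neighbor (circular)."""
    return 0.5 * (signal + np.roll(signal, 1))

# One very bright point (4x the saturation value) on a dark background.
original = np.array([0.0, 4.0, 0.0, 0.0])
ground_truth, training = make_training_pair(original, two_pixel_blur, saturation=1.0)
```

In this example the blurred highlight still saturates two pixels, whereas clipping before blurring would have produced two half-bright pixels. This is why the original image keeps signal values above the saturation value until after the blur is applied.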
[0053] When generating the ground truth image and the training image, the original image can be reduced in size (scaled down). When a real image is used as the original image, it already contains blur caused by aberrations or diffraction. Scaling down the original image suppresses the effect of that blur and yields a ground truth image with high resolution. In this case, the training image is also scaled down to match the scale of the ground truth image. Either scaling down or blurring can be performed first to generate the training image. When blurring is performed first, the blur needs to be sampled more finely to account for the subsequent scaling down: for a PSF, the sampling points in space are made finer, while for an OTF, the maximum frequency is made larger. When the original image contains enough high-frequency components, highly accurate sharpening is possible without scaling down the original image, so scaling down is unnecessary.
[0054] In the generation of training images, the blur applied does not include distortion. If distortion is large, the position of the subject changes, and the subject in the image may differ between the ground truth image and the training image. Therefore, in this embodiment, the CNN is trained not to correct distortion. In the estimation stage, distortion is corrected using bilinear interpolation, bicubic interpolation, etc., after the captured image has been sharpened. Similarly, in the generation of training images, the blur applied does not include chromatic aberration. In the estimation stage, chromatic aberration is corrected by shifting each color component before the captured image is sharpened.
[0055] A second method for generating the ground truth image and training input data uses real images captured by the optical system 121 and the image sensor 122. The training image is obtained by capturing an image with the optical system 121 in the state (Z, F, D). The training optical system information is the information specifying (Z, F, D). The ground truth image can be obtained, for example, by capturing the same subject as the training image using an optical system with higher performance than the optical system 121. From the training images and ground truth images generated by either of the two methods described above, partial regions with a predetermined number of pixels can be extracted and used for learning.
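Extracting a region with a predetermined number of pixels from a training image and its paired image can be sketched as below. The sketch assumes that both images are aligned arrays of the same size and that the same window is taken from each; the function name, the 6x6 arrays, and the window position are hypothetical.

```python
import numpy as np

def extract_patch_pair(training, ground_truth, top, left, size):
    """Cut the same size x size window out of both images of a pair."""
    window = (slice(top, top + size), slice(left, left + size))
    return training[window], ground_truth[window]

# Hypothetical 6x6 image pair; the ground truth differs by a known offset
# so that the alignment of the two patches is easy to check.
training = np.arange(36).reshape(6, 6)
ground_truth = training + 100

train_patch, truth_patch = extract_patch_pair(training, ground_truth, top=1, left=2, size=3)
```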
[0056] Subsequently, in step S102 of Figure 4, the calculator 113 inputs the training input data into the CNN and generates the output image. With reference to Figure 1, a description of the generation of the output image in this embodiment will be given. Figure 1 is a configuration diagram illustrating the machine learning model in this embodiment.
[0057] The training input data includes a training image 201 and optical system information (z, f, d) 202. The training image 201 can be grayscale or have multiple channel components. The same applies to the ground truth image. (z, f, d) is the normalized (Z, F, D). Normalization is performed based on the possible range of the optical system 121 for each of zoom, F-number, and subject distance. For example, Z represents the focal length, F represents the aperture value (F-number), and D represents the reciprocal of the absolute value of the distance from the image capture device 102 to the subject in focus. $Z_{\min}$ and $Z_{\max}$ are the minimum and maximum focal lengths of the optical system 121, $F_{\min}$ and $F_{\max}$ are the minimum and maximum aperture values, and $D_{\min}$ and $D_{\max}$ are the minimum and maximum values of the reciprocal of the absolute value of the focusing distance. When the maximum focusing distance is infinity, $D_{\min} = 1/|\infty| = 0$. The normalized (z, f, d) is obtained through the following expression (1).
[0058] $$x = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \qquad (1)$$
[0059] In expression (1), x and X are dummy variables indicating any one of (z, f, d) and (Z, F, D), respectively. When $X_{\min} = X_{\max}$, x is set to a constant. Alternatively, since x then has no degrees of freedom, it can be excluded from the optical system information. Generally, the closer the subject is, the greater the change in the performance of the optical system 121. Therefore, D is defined as the reciprocal of the distance.
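The normalization can be sketched in Python as follows, assuming that expression (1) is a min-max normalization of the form x = (X - X_min) / (X_max - X_min), which is what the surrounding definitions suggest. The function name, the choice of 0.0 as the constant for the degenerate case, and the lens ranges in the example are hypothetical.

```python
def normalize_state(value, lower, upper, constant=0.0):
    """Min-max normalize one optical state (assumed form of expression (1)).

    When lower == upper the state has no degree of freedom, so a fixed
    constant is returned instead of dividing by zero.
    """
    if upper == lower:
        return constant
    return (value - lower) / (upper - lower)

# Hypothetical lens: 24-105 mm zoom, F2.8-F22, closest focus 0.5 m, farthest infinity.
Z, F, distance_m = 105.0, 2.8, 2.0
D = 1.0 / abs(distance_m)          # reciprocal of the focusing distance
D_min, D_max = 0.0, 1.0 / 0.5      # infinity -> 0, closest focus -> 2

z = normalize_state(Z, 24.0, 105.0)
f = normalize_state(F, 2.8, 22.0)
d = normalize_state(D, D_min, D_max)
```

For this hypothetical lens the example yields (z, f, d) = (1.0, 0.0, 0.25): the lens is at its longest focal length, wide open, and focused at 2 m.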
[0060] In this embodiment, the CNN 211 has a first sub-network 221 and a second sub-network 223. The first sub-network 221 has one or more convolutional layers or fully connected layers. The second sub-network 223 has one or more convolutional layers. In the first learning iteration, the weights of the CNN 211 (each element of the filters and the values of the biases) are generated using random numbers. The first sub-network 221 receives the optical system information (z, f, d) 202 as input and generates a state map 203, which is (z, f, d) converted into a feature map. The state map 203 is a map indicating the state of the optical system and has the same number of elements (number of pixels) as one channel component of the training image 201. A concatenation layer 222 concatenates the training image 201 and the state map 203 in a predetermined order along the channel direction. Other data can be concatenated between the training image 201 and the state map 203. The second sub-network 223 takes the concatenated training image 201 and state map 203 as input and generates an output image 204. When multiple sets of training input data are acquired in step S101 of Figure 4, an output image 204 is generated for each set. Alternatively, the training image 201 can be converted into a feature map by a third sub-network, and that feature map and the state map 203 can be concatenated by the concatenation layer 222.
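The data flow of this paragraph (state map generation, joining along the channel direction, and image generation) can be sketched in NumPy. The sketch is deliberately minimal and makes several assumptions that are not in the original: the first sub-network is a single fully connected layer, the second sub-network is a single 1x1 convolution, there are no activation functions, and the patch size, channel count, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C = 4, 4, 3  # hypothetical patch height, width, and channel count

# Stand-in for the first sub-network: one fully connected layer, 3 -> H*W.
# Weights start from random numbers, as in the first learning iteration.
fc_weight = rng.normal(size=(H * W, 3))
fc_bias = rng.normal(size=H * W)

def state_map(zfd):
    """Turn (z, f, d) into a one-channel map with as many pixels as the image."""
    return (fc_weight @ zfd + fc_bias).reshape(1, H, W)

# Stand-in for the second sub-network: one 1x1 convolution, (C + 1) -> C channels.
conv_weight = rng.normal(size=(C, C + 1))
conv_bias = rng.normal(size=C)

def forward(image, zfd):
    """Join image and state map along the channel axis, then apply the convolution."""
    joined = np.concatenate([image, state_map(zfd)], axis=0)   # shape (C + 1, H, W)
    return np.einsum("oc,chw->ohw", conv_weight, joined) + conv_bias[:, None, None]

image = rng.random((C, H, W))                                   # stand-in training image
output = forward(image, np.array([0.3, 0.1, 0.25]))
```

Because the state map is an extra input channel, the same weights produce different outputs for the same image when (z, f, d) changes, which is what lets one set of weights handle many optical states.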
[0061] Subsequently, in step S103 of Figure 4, the updater 114 updates the weights of the CNN based on the error between the output image and the ground truth image. This embodiment uses the Euclidean norm of the difference between the signal values of the output image and the ground truth image as the loss function. However, the loss function is not limited to this. When multiple sets of training input data and ground truth images are acquired in step S101, the value of the loss function is calculated for each set. The calculated value of the loss function is used to update the weights through backpropagation, etc.
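The loss described here can be sketched as follows, taking "Euclidean norm of the difference" literally as the 2-norm over all pixels. The function name and the small example images are hypothetical, and how per-set losses are combined for the weight update is left open because the text does not specify it.

```python
import numpy as np

def euclidean_loss(output, ground_truth):
    """Euclidean (L2) norm of the pixel-wise difference between two images."""
    return float(np.linalg.norm(output - ground_truth))

# Two hypothetical (output, ground truth) sets; the loss is computed per set.
sets = [
    (np.array([[3.0, 0.0], [0.0, 0.0]]), np.array([[0.0, 4.0], [0.0, 0.0]])),
    (np.array([[1.0, 1.0], [1.0, 1.0]]), np.array([[1.0, 1.0], [1.0, 1.0]])),
]
losses = [euclidean_loss(out, truth) for out, truth in sets]
```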
[0062] Subsequently, in step S104, the updater 114 determines whether weight learning has been completed. Completion can be determined based on whether the number of learning (weight update) iterations has reached a predetermined number, or whether the amount of weight change during the update is less than a predetermined value. If it is determined in step S104 that weight learning has not yet been completed, the process returns to step S101, and the acquirer 112 acquires one or more new sets of training input data and ground truth images. On the other hand, when it is determined that weight learning has been completed, the updater 114 terminates the learning and stores the weight information in the memory 111.
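The loop structure of steps S101 to S104 can be sketched with a toy model. Everything model-specific here is an assumption made for brevity: the "CNN" is replaced by a single scalar weight, the squared L2 error is used so that the gradient has a simple closed form, and learning stops when either the iteration limit is reached or the weight change falls below a threshold. The names, learning rate, and thresholds are hypothetical.

```python
import numpy as np

def train(training_image, ground_truth, lr=0.05, max_iters=1000, tol=1e-8):
    """Toy learning loop: output = w * training_image, one scalar weight w."""
    w = 0.0
    for iteration in range(1, max_iters + 1):
        residual = w * training_image - ground_truth        # output minus ground truth
        grad = 2.0 * np.sum(residual * training_image)      # d/dw of the squared L2 error
        step = lr * grad
        w -= step                                           # weight update
        if abs(step) < tol:                                 # weight change small enough
            break
    return w, iteration

training_image = np.array([1.0, 2.0])
ground_truth = np.array([2.0, 4.0])     # exactly twice the training image
weight, iterations = train(training_image, ground_truth)
```

In this example the loop ends well before the iteration limit because the weight settles near 2, the value that maps the training image onto the ground truth.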
[0063] Next, with reference to Figure 5, a description will be given of the sharpening (estimation stage) of the captured image performed by the image processor 123 in this embodiment. Figure 5 is a flowchart related to the sharpening of the captured image (generation of the estimated image) in this embodiment. Each step in Figure 5 is primarily performed by either the acquirer 123a or the sharpener 123b in the image processor 123.
[0064] First, in step S201, the acquirer 123a acquires input data and weight information. The input data includes the captured image and the optical system information of the optical system 121 at the time the captured image was captured. The captured image to be acquired can be a portion of the entire captured image. The optical system information is (z, f, d), indicating the zoom, F-number, and subject distance of the optical system 121. The weight information is read from the memory 124.
[0065] Subsequently, in step S202, the sharpener 123b inputs the input data into the CNN and generates an estimated image. The estimated image is obtained by sharpening the captured image, which is blurred due to aberrations and diffraction of the optical system 121 and pixel aperture degradation of the image sensor 122. As in learning, the CNN shown in Figure 1 is used to generate the estimated image, and the learned weights are used in the CNN. This embodiment collectively learns weights for sharpening the captured image for all possible (z, f, d) of the optical system 121. Therefore, the CNN sharpens captured images with the blur of any (z, f, d) using the same weights.
[0066] This embodiment can provide a shaping unit configured to shape blur included in a captured image, instead of the sharpener 123b configured to sharpen the captured image. The same applies in the second embodiment described later. In this embodiment, the image processing apparatus (image processor 123) includes an acquisition unit (acquirer 123a) and a generation unit (sharpener 123b or shaping unit). The acquisition unit is configured to acquire input data including the captured image and the optical system information at the time of capturing the captured image. The generation unit is configured to input the input data into a machine learning model and generate an estimated image obtained by sharpening the captured image or by shaping blur included in the captured image. In this embodiment, the image processing apparatus (learning device 101) includes an acquisition unit (acquirer 112), a generation unit (calculator 113), and an update unit (updater 114). The acquisition unit is configured to acquire input data including training images and training optical system information. The generation unit is configured to input the input data into a machine learning model and generate an output image obtained by sharpening the training image or by shaping the blur included in the training image. The update unit is configured to update the weights of the machine learning model based on the output image and the ground truth image.
[0067] This embodiment can provide an image processing apparatus and an image processing system, both of which can sharpen, with high precision, a captured image that includes blur caused by the optical system, while suppressing the learning load of the machine learning model and the amount of stored data.
[0068] Second Embodiment
[0069] Next, an image processing system according to a second embodiment of the present invention will be described. This embodiment performs sharpening processing on the captured image, but the present invention can be similarly applied to shaping processing for blur.
[0070] Figure 6 is a block diagram illustrating the image processing system 300 in this embodiment. Figure 7 is an external view showing the image processing system 300. The image processing system 300 includes a learning device (image processing device) 301, a lens device 302, an image capture device 303, an image estimation device (image processing device) 304, a display device 305, a recording medium 306, an output device 307, and a network 308. The learning device 301 includes a memory 301a, an acquirer (acquisition unit) 301b, a calculator (generation unit) 301c, and an updater (update unit) 301d, and is configured to learn weights for a machine learning model used to sharpen captured images. A detailed description of weight learning and the sharpening process of captured images using the weights will be given later.
[0071] The lens device 302 and the image capture device 303 are detachably attached to each other, and different types of lens devices 302 and image capture devices 303 can be connected. Depending on its type, the lens device 302 can take different states of focal length, F-number, and subject distance. Because lens configurations differ between types, the lens device 302 also generates differently shaped blur caused by aberrations or diffraction. The image capture device 303 includes an image sensor 303a. Different types of image capture devices differ in the presence and type of optical low-pass filter (separation method, cutoff frequency, etc.), pixel pitch (including pixel aperture), color filter array, etc.
[0072] The image estimation device 304 includes a memory 304a, an acquirer (acquisition unit) 304b, and a sharpener (generation unit) 304c. The image estimation device 304 is configured to generate an estimated image by sharpening a captured image (or at least a portion thereof) that includes blur. The captured image is captured by the image capture device 303, and the blur is caused by the optical system. The image estimation device 304 can handle multiple combinations of the lens device 302 and the image capture device 303. To sharpen the captured image, a machine learning model is used with the weights learned by the learning device 301. The learning device 301 and the image estimation device 304 are connected via the network 308, and the image estimation device 304 reads information about the learned weights from the learning device 301 during or before sharpening the captured image. The estimated image is output to at least one of the display device 305, the recording medium 306, or the output device 307. The display device 305 is, for example, a liquid crystal display or a projector. Users can perform editing tasks while viewing the image being processed via the display device 305. The recording medium 306 is, for example, a semiconductor memory, a hard disk, or a server on a network. The output device 307 is, for example, a printer.
[0073] Next, the weight learning (learning phase) performed by the learning device 301 will be described with reference to Figure 4. This embodiment uses a CNN as the machine learning model, but the invention can be similarly applied to other models. Descriptions similar to those in the first embodiment will be omitted.
[0074] First, in step S101, the acquirer 301b acquires one or more sets of standard answer images and training input data from the memory 301a. The memory 301a stores training images corresponding to multiple combinations of the lens device 302 and the image capture device 303. In this embodiment, weights for sharpening the captured image are uniformly learned for each type of lens device 302. Therefore, this embodiment first determines the type of lens device 302 for which the weights are to be learned, and then acquires training images from the set of training images corresponding to that type. Each set of training images corresponding to a certain type of lens device 302 is a set of images whose blur corresponds to different conditions such as zoom, F-number, subject distance, image height and orientation, optical low-pass filter, pixel pitch, color filter array, etc.
[0075] This embodiment performs learning using the CNN configuration shown in Figure 8. Figure 8 is a configuration diagram illustrating the machine learning model in this embodiment. Training input data 404 includes training image 401, state map 402, and position map 403. This step generates state map 402 and position map 403. State map 402 and position map 403 respectively indicate the (Z, F, D) and (X, Y) corresponding to the blur applied to the acquired training image. (X, Y) are the horizontal and vertical coordinates on the image plane shown in Figure 9, and correspond to the image height and orientation (azimuth) in polar coordinates.
[0076] In this embodiment, the coordinates (X, Y) take the optical axis of the lens device 302 as the origin. Figure 9 illustrates the relationship between the image circle 501 of the lens device (optical system) 302, the first effective pixel region 502 and the second effective pixel region 503 of the image sensor 303a, and the coordinates (X, Y). Since the size of the image sensor 303a differs depending on the type of image capture device 303, the image capture device 303 has either the first effective pixel region 502 or the second effective pixel region 503, depending on its type. Among all types of image capture devices 303 that can be connected to the lens device 302, the image capture device 303 with the largest image sensor 303a has the first effective pixel region 502.
[0077] Position map 403 is generated based on (x, y) obtained by normalizing the coordinates (X, Y). Normalization is performed by dividing (X, Y) by the length (radius) 511 of the image circle 501 of the lens device 302. Alternatively, normalization can be performed by dividing X by the horizontal length 512 of the first effective pixel region 502 from the origin and Y by the vertical length 513 of the first effective pixel region 502 from the origin. If (X, Y) were instead normalized so that the edge of the captured image is always 1, then for images captured by image sensors 303a of different sizes, the same (x, y) value would indicate different positions (X, Y), complicating the correspondence between (x, y) values and blur. As a result, the accuracy of sharpening the captured image would be reduced. Position map 403 is a two-channel map with the values of x and y as its channel components. Polar coordinates can also be used for position map 403, and the origin position is not limited to that shown in Figure 9.
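The normalization by the image circle radius can be sketched as follows. This is a minimal illustration in plain Python, not the implementation of this embodiment: the function name `make_position_map`, the toy grid sizes, the 6 mm sample pitch, the 21.6 mm radius, and the assumption that the optical axis passes through the sensor centre are all hypothetical.

```python
def make_position_map(width, height, sample_pitch_mm, image_circle_radius_mm):
    """Return a two-channel position map [x_channel, y_channel].

    Each channel is a height x width grid. (X, Y) is measured in millimetres
    from the optical axis (assumed here to pass through the sensor centre)
    and divided by the image circle radius of the lens.
    """
    cx = (width - 1) / 2.0
    cy = (height - 1) / 2.0
    x_channel = [[(col - cx) * sample_pitch_mm / image_circle_radius_mm
                  for col in range(width)] for _ in range(height)]
    y_channel = [[(row - cy) * sample_pitch_mm / image_circle_radius_mm
                  for _ in range(width)] for row in range(height)]
    return [x_channel, y_channel]


# Two hypothetical sensors of different size behind the same lens.
# The grids are deliberately coarse so the values are easy to inspect.
large = make_position_map(7, 5, sample_pitch_mm=6.0, image_circle_radius_mm=21.6)
small = make_position_map(5, 3, sample_pitch_mm=6.0, image_circle_radius_mm=21.6)
```

With this normalization, a point 12 mm to the right of the optical axis receives the same x value on both sensors, whereas normalizing each image so that its own edge equals 1 would assign that point different values.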
[0078] State map 402 is a three-channel map with the normalized values (z, f, d) as its channel components. The training image 401, state map 402, and position map 403 each have the same number of elements (pixels) per channel. The configuration of position map 403 and state map 402 is not limited to this embodiment. For example, as in the position map shown in Figure 10, the first effective pixel region 502 can be divided into multiple sub-regions and a value can be assigned to each sub-region, so that the position map is represented by a single channel. The number of sub-regions and the method of assigning values are not limited to those shown in Figure 10. Similarly, (Z, F, D) can be divided into multiple regions in the three-dimensional space spanned by its axes, with a value assigned to each region, so that the state map is represented by a single channel. The connection layer 411 in Figure 8 concatenates the training image 401, the state map 402, and the position map 403 in a predetermined order in the channel direction, and generates training input data 404.
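The construction of the three-channel state map and the channel-direction concatenation can be sketched as below. This is only a sketch: the helper names, the three-channel training image, the example values of (z, f, d), and the all-zero position map placeholder are assumptions made for illustration.

```python
def constant_channel(value, width, height):
    """One channel of height x width elements, all holding the same value."""
    return [[value] * width for _ in range(height)]


def make_state_map(z, f, d, width, height):
    """Three-channel state map: one channel per normalized value z, f, d."""
    return [constant_channel(v, width, height) for v in (z, f, d)]


def concatenate_channels(*maps):
    """Concatenate lists of channels in the given (predetermined) order."""
    channels = []
    for m in maps:
        channels.extend(m)
    return channels


width, height = 4, 3
# Assumed three-channel training image filled with a dummy value.
training_image = [constant_channel(0.5, width, height) for _ in range(3)]
state_map = make_state_map(z=0.2, f=0.7, d=1.0, width=width, height=height)
# Placeholder two-channel position map; real values would be the normalized (x, y).
position_map = [constant_channel(0.0, width, height) for _ in range(2)]

training_input = concatenate_channels(training_image, state_map, position_map)
```

Every channel has the same number of elements, so the result is a single 8-channel array that a first convolution layer can consume directly.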
[0079] Subsequently, in step S102 of Figure 4, the calculator 301c inputs the training input data 404 into the CNN 412 and generates an output image 405. Then, in step S103, the updater 301d updates the CNN weights based on the error between the output image and the standard answer image. Then, in step S104, the updater 301d determines whether learning is complete. Information about the learned weights is stored in the memory 301a.
[0080] Next, the sharpening of the captured image (estimation phase) performed by the image estimation device 304 will be described with reference to Figure 11. Figure 11 is a flowchart of the sharpening of the captured image (the generation of the estimated image). Each step in Figure 11 is primarily performed by either the acquirer 304b or the sharpener 304c of the image estimation device 304.
[0081] First, in step S301, the acquirer 304b acquires the captured image (or at least a portion thereof). Then, in step S302, the acquirer 304b acquires weight information corresponding to the captured image. In the second embodiment, weight information for each type of lens device 302 has been pre-read from memory 301a and stored in memory 304a. The acquirer 304b acquires from memory 304a the weight information corresponding to the type of lens device 302 used to capture the captured image. The type of lens device 302 used for capture is specified from, for example, metadata in the captured image file.
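Selecting the weight information by lens type in step S302 can be sketched as a simple lookup. The metadata key `lens_type`, the function name, and the shape of the weight store are hypothetical; the text only states that the lens type is specified from, for example, metadata in the captured image file.

```python
def select_weights(metadata, weights_by_lens_type):
    """Return the learned weights stored for the lens type named in the metadata.

    `weights_by_lens_type` stands in for the contents of memory 304a: a mapping
    from a lens type identifier to the weight information learned for that type.
    """
    lens_type = metadata.get("lens_type")
    if lens_type not in weights_by_lens_type:
        raise KeyError(f"no learned weights stored for lens type {lens_type!r}")
    return weights_by_lens_type[lens_type]
```

Raising an error for an unknown lens type is one possible design choice; the text does not say how such a case is handled.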
[0082] Subsequently, in step S303, the acquirer 304b generates a state map and a position map based on the captured image, and generates input data. The state map is generated based on the number of pixels in the captured image and information about the state (Z, F, D) of the lens device 302 at the time of capture. The number of elements (pixels) per channel in the state map is the same as in the captured image. (Z, F, D) are specified from, for example, metadata of the captured image. The position map is generated based on the number of pixels in the captured image and position information about each pixel in the captured image. The number of elements (pixels) per channel in the position map is the same as in the captured image. A normalized position map is generated by specifying, from metadata of the captured image, the size of the effective pixel area of the image sensor 303a used for capture, and by normalizing the position map using, for example, the length of the image circle of the lens device 302, which is likewise specified from metadata. As shown in Figure 8, the input data is generated by concatenating the captured image, state map, and position map in a predetermined order in the channel direction. In this embodiment, steps S302 and S303 may be performed in either order. The state map and position map may also be generated when the captured image is captured and stored together with the captured image.
[0083] Then, in step S304, the sharpener 304c inputs the input data into the CNN and generates an estimated image, as shown in Figure 8. Figure 12 is a graph describing the effect of sharpening a captured image. It shows the effect at 90% of the image height for a specific zoom lens at a given (Z, F, D) setting. In Figure 12, the horizontal axis indicates spatial frequency, and the vertical axis indicates the measured value of SFR (Spatial Frequency Response). SFR corresponds to MTF (Modulation Transfer Function) in a certain cross-section. The Nyquist frequency of the image sensor used for capture is 76 [lp/mm]. Solid line 601 represents the captured image, and dashed line 602, single-point chain line 603, and two-point chain line 604 represent captured images sharpened by a CNN (sharpened images). Each sharpened image represented by dashed line 602, single-point chain line 603, and two-point chain line 604 was acquired by a CNN that has learned the sharpening of the captured image using learning data in which all the blurs generated by the aberrations and diffraction of the zoom lens are mixed.
[0084] Dashed line 602 represents a sharpened image acquired by using only the captured image (or training image) as input data in the estimation (or learning) phase. Single-point chain line 603 represents a sharpened image acquired by using the captured image (or training image) and a position map as input data. Two-point chain line 604 represents a sharpened image acquired by using the captured image (or training image), a position map, and a state map as input data, which is the same input configuration as in this embodiment. The CNNs used to acquire the sharpened images represented by dashed line 602, single-point chain line 603, and two-point chain line 604 differ only in the number of channels of the first-layer filters (because the number of channels in the input data differs), and are identical in filter size, number of filters, number of layers, etc. Therefore, the learning load and the amount of stored data (the data capacity of the CNN's weight information) are basically the same for all three. Figure 12 shows that the configuration of this embodiment, indicated by two-point chain line 604, has a high sharpening effect.
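Why the stored data volume is basically the same for the three input configurations can be checked with a simple weight count. The architecture below (20 convolution layers, 64 filters of size 3 x 3, a three-channel image) is a hypothetical example chosen only for the arithmetic; the text does not disclose the actual layer counts or filter sizes.

```python
def cnn_weight_count(in_channels, num_layers=20, num_filters=64,
                     kernel=3, out_channels=3):
    """Count weights and biases of a plain stack of convolution layers.

    Only the first layer depends on the number of input channels; all other
    layers are identical across the compared configurations.
    """
    total = 0
    channels = in_channels
    for layer in range(num_layers):
        filters = out_channels if layer == num_layers - 1 else num_filters
        total += kernel * kernel * channels * filters + filters
        channels = filters
    return total


image_only = cnn_weight_count(in_channels=3)                   # line 602
with_position = cnn_weight_count(in_channels=3 + 2)            # line 603
with_position_and_state = cnn_weight_count(in_channels=3 + 2 + 3)  # line 604

relative_increase = (with_position_and_state - image_only) / image_only
```

Under these assumed sizes, the five extra input channels add weights only to the first layer, which is well under one percent of the total.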
[0085] This embodiment can provide an image processing apparatus and an image processing system, both of which can sharpen, with high precision, a captured image that includes blur caused by the optical system used, while suppressing the learning load of the machine learning model and the amount of stored data.
[0086] Next, a description of the conditions that can enhance the effect of this embodiment will be given.
[0087] The input data may also include information indicating the presence and type of an optical low-pass filter in the image capture device 303 used to capture the captured image. This can improve the sharpening effect on the captured image. Depending on the type, optical low-pass filters have different separation methods (vertical two-point separation, horizontal two-point separation, four-point separation, etc.) and cutoff frequencies. A map with values specifying the presence and type can be generated based on the number of pixels in the captured image and can be included in the input data.
[0088] The input data may also include information about manufacturing variations of the lens device 302 used to capture the captured image. This allows precise sharpening of images whose blur includes the effect of manufacturing variations. During the learning phase, training images can be generated by applying blur that includes the blur caused by manufacturing variations to the original image, and the machine learning model can learn from training input data that includes information indicating the manufacturing variations. This information could, for example, be a numerical value representing the degree of the actual performance, including the manufacturing variation, relative to the design performance. For example, the value is set to 0 when the actual performance equals the design performance, becomes negative when the actual performance is inferior to the design performance, and becomes positive when the actual performance is superior to the design performance. In the estimation phase, as shown in Figure 13, the input data may include a map (manufacturing variation map) that holds, for multiple partial regions in the captured image (or for each pixel), numerical values representing the degree of actual performance relative to design performance. Figure 13 is a diagram illustrating an example of a manufacturing variation map. The manufacturing variation map is generated based on the number of pixels in the captured image. The manufacturing variation map can be obtained, for example, by measuring the actual performance, including manufacturing errors, of the lens device 302 during its manufacture. Manufacturing variations can also be categorized into several types, such as overall image performance degradation (spherical aberration degradation) and performance variations due to orientation (single-sided defocus), and can be represented by numerical values indicating the categories.
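A manufacturing variation map built from per-region values can be sketched as follows. The function name, the 2 x 2 region grid, and the example values are hypothetical; only the sign convention (0 for design performance, negative for inferior, positive for superior) comes from the text.

```python
def make_variation_map(region_values, width, height):
    """Expand per-region values into a single-channel map of height x width.

    `region_values` is a grid (rows x cols) of numbers measured for partial
    regions of the image: 0 means the actual performance equals the design
    performance, negative means inferior, positive means superior.
    """
    rows = len(region_values)
    cols = len(region_values[0])
    return [[region_values[r * rows // height][c * cols // width]
             for c in range(width)]
            for r in range(height)]


# Hypothetical measurement: the right half of the image performs worse than
# the design, as might occur with an orientation-dependent variation.
measured = [[0.0, -0.3],
            [0.1, -0.5]]
variation_map = make_variation_map(measured, width=4, height=4)
```

The resulting map has one element per pixel of the captured image and can be concatenated with the other maps in the channel direction.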
[0089] The input data may also include information about the distribution of distances in the subject space at the time of capture. This allows precise sharpening of images whose blur includes performance variations due to defocus. Due to axial chromatic aberration and field curvature, a defocused subject plane may have better optical performance than the focal plane. If this is not considered and a machine learning model that has learned only the blur on the focal plane is used to sharpen the captured image, the perceived resolution may become excessive, potentially resulting in an unnatural image. Therefore, in the learning phase, learning can be performed using training images in which defocus blur is applied to the original image. In this case, the training input data also includes numerical values representing the defocus amount (corresponding to the distance in the subject space). For example, the focal plane is set to 0, the direction away from the image capture device is set as negative, and the direction toward it is set as positive. In the estimation phase, a defocus map of the captured image (indicating information about the distribution of distances in the subject space) can be obtained by capturing parallax images, by DFD (Depth from Defocus), or the like, and can be included in the input data. The defocus map can be generated based on the number of pixels in the captured image.
[0090] The input data may also include information about the pixel pitch or color filter array of the image sensor 303a used to capture the captured image. Thus, regardless of the type of image sensor 303a, images with blur can be accurately sharpened. The intensity of pixel aperture degradation and the size of the blur per pixel vary depending on the pixel pitch. The blur shape varies depending on the color components that make up the color filter array. The color components are, for example, RGB (red, green, blue) or the complementary CMY (cyan, magenta, yellow). When the training image and the captured image are undeveloped Bayer images, the blur shape may differ even among pixels at the same location, depending on the arrangement order of the color filter array. During the learning phase, the training input data may include information specifying the pixel pitch and color filter array corresponding to the training image. For example, the training input data may include a map with normalized values of the pixel pitch as elements. The normalization can use the largest pixel pitch among those of the various types of image capture devices 303 as the divisor. The training input data may also include a map (color map) with values representing the color components of the color filter array as elements. During the estimation phase, sharpening accuracy can be improved by including the color map in the input data. The color map can be generated based on the number of pixels in the captured image.
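The pixel pitch normalization and the color map can be sketched as below. The RGGB arrangement, the numeric codes assigned to R, G, and B, the pitch values, and the function names are assumptions for illustration; the text only specifies that the largest pixel pitch is used as the divisor and that the color map holds values representing the color components.

```python
def normalized_pitch_map(pixel_pitch_um, supported_pitches_um, width, height):
    """Single-channel map holding the pixel pitch divided by the largest pitch."""
    value = pixel_pitch_um / max(supported_pitches_um)
    return [[value] * width for _ in range(height)]


BAYER_RGGB = [["R", "G"],
              ["G", "B"]]                       # assumed arrangement order
COLOR_CODES = {"R": 0.0, "G": 0.5, "B": 1.0}    # assumed numeric encoding


def make_color_map(pattern, width, height, codes=COLOR_CODES):
    """Single-channel map giving the color component code of every pixel."""
    return [[codes[pattern[r % len(pattern)][c % len(pattern[0])]]
             for c in range(width)]
            for r in range(height)]


pitch_map = normalized_pitch_map(4.0, [4.0, 6.0, 8.0], width=4, height=2)
color_map = make_color_map(BAYER_RGGB, width=4, height=2)
```

For an undeveloped Bayer image, such a color map lets the model distinguish pixels that share a location in the repeating pattern but sit behind different color filters.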
[0091] The input data may also include information indicating the presence and type of accessories of the lens device 302 (accessory information). Accessories are, for example, wide-angle converters, telephoto converters, close-up lenses, and wavelength cutoff filters. Because the shape of the blur varies depending on the type of accessory, including accessory information in the input data allows an image affected by an accessory to be sharpened. During the learning phase, the blur applied to the training image may include the influence of accessories, and the training input data may include information specifying the accessories, such as a map (accessory map) with numerical values representing the presence and type of accessories as elements. During the estimation phase, the input data may include the accessory information (map). The accessory map may be generated based on the number of pixels in the captured image.
[0092] Third Embodiment
[0093] Next, a description of an image processing system according to a third embodiment of the present invention will be given. This embodiment performs shaping processing for blur, but the present invention can be similarly applied to sharpening processing of captured images.
[0094] Figure 14 is a block diagram showing the image processing system 700. Figure 15 is an external view of the image processing system 700. The image processing system 700 includes a learning device 701, a lens device (optical system) 702, an image capture device 703, a control device (first device) 704, an image estimation device (second device) 705, and networks 706 and 707. The learning device 701 and the image estimation device 705 are, for example, servers. The control device 704 is a user-operated device, such as a personal computer or mobile terminal. The learning device 701 includes a memory 701a, an acquirer (acquisition unit) 701b, a calculator (generation unit) 701c, and an updater (updating unit) 701d, and is configured to learn weights for a machine learning model used to shape the blur of a captured image captured using the lens device 702 and the image capture device 703. A detailed description of the learning process will be given later. In this embodiment, the blur to be shaped is caused by defocus, but the invention can be similarly applied to aberrations and diffraction.
[0095] Image capture device 703 includes image sensor 703a. Image sensor 703a is configured to perform photoelectric conversion on an optical image formed by lens device 702 and acquire a captured image. Lens device 702 and image capture device 703 are detachably attached and can be combined with various types of lens devices 702 and image capture devices 703. Control device 704 includes communicator 704a, memory 704b, and display 704c, and is configured to control the processing to be performed on the captured image acquired from the wired or wirelessly connected image capture device 703 according to user operation. Alternatively, control device 704 may have pre-stored the captured image captured by image capture device 703 in memory 704b and can read the captured image.
[0096] The image estimation device 705 includes a communicator 705a, a memory 705b, an acquirer (acquisition unit) 705c, and a shaping unit (generation unit) 705d. The image estimation device 705 is configured to perform shaping processing on a captured image in response to a request from the control device 704 connected via the network 707. The image estimation device 705 is configured to acquire learned weight information from the learning device 701 connected via the network 706 during or before blur shaping, and to use this information to shape the blur in the captured image. The estimated image after blur shaping is sent back to the control device 704, stored in the memory 704b, and displayed on the display 704c.
[0097] Next, the weight learning (learning phase, a method of producing a learned model) performed by the learning device 701 in this embodiment will be described with reference to Figure 16. Figure 16 is a flowchart of the weight learning. Each step in Figure 16 is primarily performed by the acquirer 701b, calculator 701c, or updater 701d of the learning device 701. This embodiment uses a GAN as the machine learning model, but the present invention can be similarly applied to other models. The GAN includes a generator configured to generate an output image with shaped defocus blur, and a discriminator configured to distinguish between a standard answer image and the output image generated by the generator. In the learning, a first learning process using only the generator is performed in the same manner as described in the first embodiment, and then, once the weights of the generator have converged to a certain extent, a second learning process using both the generator and the discriminator is performed. In the following, descriptions of the same parts as those described in the first embodiment will be omitted.
[0098] First, in step S401, the acquirer 701b acquires one or more sets of standard answer images and training input data from the memory 701a. In this embodiment, the standard answer image and the training image are a pair of images with different defocus blur shapes. The training image is an image to which the defocus blur to be shaped has been applied. Examples of blur to be shaped include double-line blur, chipping due to vignetting, ring-shaped blur due to the pupil obstruction of a catadioptric lens, and ring patterns caused by uneven shaving of the mold of an aspherical lens. The standard answer image is an image to which the defocus blur after shaping has been applied. The shape of the defocus blur after shaping can be determined according to the user's preference, such as a Gaussian or a disk (a circular distribution with flat intensity). In the third embodiment, the memory 701a stores multiple training images and standard answer images generated by applying blur to the original image. Multiple training images and standard answer images are generated by applying blur with various defocus amounts, thereby ensuring accuracy when shaping the blur for various defocus amounts. Because the image on the focal plane may remain unchanged by blur shaping, training images with a defocus amount of zero and the corresponding standard answer images are also generated.
[0099] This embodiment uniformly learns weights for blur shaping for various types of lens devices 702. Therefore, the optical system information includes information specifying the type of lens device 702. The memory 701a provides training images with blur corresponding to the type of lens device 702 to be uniformly learned. The optical system information also includes information specifying the zoom, F-number, and subject distance of the lens device 702 corresponding to the blur applied to the training images.
[0100] Figure 17 is a configuration diagram illustrating the machine learning model (GAN). A lens state map 802 is generated based on the number of pixels in the training image 801. The lens state map 802 includes, as channel components, numerical values (L, z, f, d) specifying the type of lens device 702 and the states of zoom, F-number, and subject distance, respectively. A connection layer 811 concatenates the training image 801 and the lens state map 802 in a predetermined order along the channel direction and generates training input data 803.
[0101] Subsequently, in step S402 of Figure 16, the calculator 701c inputs the training input data 803 into the generator 812 and generates an output image 804. The generator 812 is, for example, a CNN. Subsequently, in step S403, the updater 701d updates the weights of the generator 812 based on the error between the output image 804 and the standard answer image 805. The Euclidean norm of the difference at each pixel is used in the loss function.
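The Euclidean norm used as the loss in step S403 can be sketched as follows. Whether the norm is additionally averaged over pixels or over a batch is not stated in the text; this sketch assumes a single unnormalized norm over all channels and pixels, and the function name is hypothetical.

```python
import math


def euclidean_loss(output_image, answer_image):
    """Euclidean (L2) norm of the per-pixel difference between two images.

    Both images are lists of channels, each channel a list of rows.
    """
    squared = 0.0
    for out_channel, ans_channel in zip(output_image, answer_image):
        for out_row, ans_row in zip(out_channel, ans_channel):
            for out_value, ans_value in zip(out_row, ans_row):
                squared += (out_value - ans_value) ** 2
    return math.sqrt(squared)
```

The loss is zero only when the output image matches the standard answer image at every pixel, which is what drives the first learning of the generator.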
[0102] Then, in step S404, the updater 701d determines whether the first learning has been completed. If the first learning has not been completed, the process returns to step S401. On the other hand, if the first learning has been completed, the process continues to step S405, and the updater 701d performs the second learning.
[0103] In step S405, the acquirer 701b acquires one or more sets of standard answer images 805 and training input data 803 from the memory 701a, as described in step S401. Then, in step S406, the calculator 701c inputs the training input data 803 into the generator 812 and generates an output image 804, as described in step S402. Then, in step S407, the updater 701d updates the weights of the discriminator 813 based on the output image 804 and the standard answer image 805. The discriminator 813 determines whether an input image is a fake image generated by the generator 812 or a real image, that is, the standard answer image 805. The output image 804 or the standard answer image 805 is input into the discriminator 813, and a discrimination label (fake or real) is generated. The updater 701d updates the weights of the discriminator 813 based on the error between the discrimination label and the correct label (the output image 804 is fake and the standard answer image 805 is real). This embodiment uses sigmoid cross-entropy as the loss function, but other loss functions can be used.
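The sigmoid (S-shaped) cross-entropy used for the discriminator update in step S407 can be sketched as below. The sketch assumes that the discriminator emits a raw score (logit) per image and that the label is 1 for real and 0 for fake; these conventions and the function names are assumptions, not details given in the text.

```python
import math


def sigmoid_cross_entropy(logit, label):
    """Binary cross-entropy between sigmoid(logit) and a 0/1 label.

    Written in a numerically stable form that avoids overflow for large
    positive or negative logits.
    """
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def discriminator_loss(logit_for_answer, logit_for_output):
    """Loss when the answer image should be judged real and the output fake."""
    real_term = sigmoid_cross_entropy(logit_for_answer, 1.0)
    fake_term = sigmoid_cross_entropy(logit_for_output, 0.0)
    return real_term + fake_term
```

A discriminator that scores the standard answer image high and the generated output low obtains a small loss, while the reverse scoring is penalized heavily.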
[0104] Subsequently, in step S408, the updater 701d updates the weights of the generator 812 based on the output image 804 and the standard answer image 805. The loss function is a weighted sum of the Euclidean norm from step S403 and the following two terms. The first term, called content loss, is obtained by converting the output image 804 and the standard answer image 805 into feature maps and taking the Euclidean norm of the element-wise difference. Adding the difference between the feature maps to the loss function brings the more abstract properties of the output image 804 closer to those of the standard answer image 805. The second term, called adversarial loss, is the sigmoid cross-entropy of the discrimination label obtained by inputting the output image 804 into the discriminator 813. Training the generator 812 so that the discriminator 813 recognizes the output image 804 as real yields an output image 804 that looks more like the standard answer image 805.
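The generator loss in step S408 can be sketched as a weighted sum, reading it as the Euclidean norm of step S403 plus the content loss and the adversarial loss. The weight values, the flattened-list inputs, the feature vectors (which would come from some feature extractor not specified in the text), and the function names are all hypothetical.

```python
import math


def l2_norm(a, b):
    """Euclidean norm of the element-wise difference of two flat lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sigmoid_cross_entropy(logit, label):
    """Numerically stable binary cross-entropy on a raw score (logit)."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def generator_loss(output_pixels, answer_pixels,
                   output_features, answer_features,
                   discriminator_logit,
                   content_weight=0.1, adversarial_weight=0.01):
    """Weighted sum of pixel, content, and adversarial terms (assumed weights)."""
    pixel_term = l2_norm(output_pixels, answer_pixels)
    content_term = l2_norm(output_features, answer_features)
    # The generator is rewarded when the discriminator labels its output "real".
    adversarial_term = sigmoid_cross_entropy(discriminator_logit, 1.0)
    return (pixel_term
            + content_weight * content_term
            + adversarial_weight * adversarial_term)
```

With identical pixel and feature terms, the loss decreases as the discriminator's score for the output image rises, which is the pressure that makes the output look more like a standard answer image.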
[0105] Subsequently, in step S409, the updater 701d determines whether the second learning has been completed. As described in step S404, if the second learning has not been completed, the process returns to step S405. On the other hand, when the second learning has been completed, the updater 701d stores the learned weight information of the generator 812 in the memory 701a.
[0106] Next, the shaping of the blur (estimation phase) performed by the control device 704 and the image estimation device 705 will be described with reference to Figure 18. Figure 18 is a flowchart of the blur shaping (the generation of the estimated image). Each step in Figure 18 is mainly performed by the units of the control device 704 or the image estimation device 705.
[0107] First, in step S501, the communicator 704a of the control device 704 transmits the captured image and a request (processing request) for performing blur shaping processing to the image estimation device 705. Then, in step S601, the communicator 705a of the image estimation device 705 receives and acquires the captured image and the processing request transmitted from the control device 704. Next, in step S602, the acquirer 705c of the image estimation device 705 acquires information about the learned weights corresponding to the captured image from the memory 705b. The weight information has been read from the memory 701a and stored in the memory 705b in advance.
[0108] Subsequently, in step S603, the acquirer 705c acquires the optical system information of the captured image and generates input data. From the metadata of the captured image, the acquirer 705c obtains information specifying the type of the lens device (optical system) 702 and its zoom, F-number, and subject distance states at the time of capture, and generates a lens state map as shown in Figure 17. The input data is generated by concatenating the captured image and the lens state map in a predetermined order along the channel direction. In this embodiment, the captured image, or a feature map based on the captured image, may be concatenated with the state map along the channel direction before or during input to the machine learning model.
[0109] Subsequently, in step S604, the shaping unit 705d inputs the input data to the generator and generates an estimated image in which the blur has been shaped. The generator uses the learned weight information. Then, in step S605, the communicator 705a transmits the estimated image to the control device 704. Finally, in step S502, the communicator 704a of the control device 704 acquires the estimated image transmitted from the image estimation device 705.
[0110] This embodiment may provide a sharpener configured to sharpen the captured image in place of the shaping unit 705d, which shapes blur included in the captured image. In this embodiment, the image processing system 700 includes a first device (control device 704) and a second device (image estimation device 705) capable of communicating with each other. The first device includes a transmission unit (communicator 704a) configured to send, to the second device, a request for performing processing on the captured image. The second device includes a receiving unit (communicator 705a), an acquisition unit (acquirer 705c), and a generation unit (shaping unit 705d or sharpener). The receiving unit is configured to receive the request. The acquisition unit is configured to acquire input data including the captured image and optical system information at the time of capturing the captured image. The generation unit is configured to input the input data into a machine learning model based on the request and to generate an estimated image by sharpening the captured image or by shaping blur included in the captured image.
[0111] This embodiment can provide an image processing apparatus and an image processing system, each of which can shape blur in the captured image caused by the optical system with high precision while suppressing the learning load of the machine learning model and the amount of stored data.
[0112] This embodiment has described an example in which the communicator 704a of the control device 704 sends the captured image together with a processing request for the captured image to the image estimation device 705 in step S501. However, the control device 704 need not transmit the captured image. For example, the control device 704 may send only a processing request for the captured image to the image estimation device 705, and the image estimation device 705 may be configured to retrieve the captured image corresponding to the request from another image storage server or the like in response to the request.
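The two request variants described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the disclosed implementation: the `ProcessingRequest` type, the `image_id` field, and the `fetch_image` and `estimate` callables are all hypothetical, and network transport between the two devices is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Optional

import numpy as np


@dataclass
class ProcessingRequest:
    """Request sent from the first device to the second device (hypothetical)."""
    image_id: str
    # None when the first device sends only the request and the image
    # is held elsewhere, e.g. on a separate image storage server.
    image: Optional[np.ndarray] = None


def handle_request(
    request: ProcessingRequest,
    fetch_image: Callable[[str], np.ndarray],
    estimate: Callable[[np.ndarray], np.ndarray],
) -> np.ndarray:
    """Second-device side: obtain the captured image, then run estimation."""
    if request.image is not None:
        image = request.image                    # image arrived with the request
    else:
        image = fetch_image(request.image_id)    # image retrieved from storage
    return estimate(image)
```

Passing the retrieval and estimation steps in as callables keeps the sketch independent of any particular storage server or machine learning framework.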
[0113] Other embodiments
[0114] One or more embodiments of the present invention can also be implemented by a computer of a system or device that reads and executes computer-executable instructions (e.g., one or more programs) recorded on a storage medium (which may also be referred to more fully as a "non-transitory computer-readable storage medium") to perform the functions of one or more of the above embodiments and/or that includes one or more circuits (e.g., application-specific integrated circuits (ASICs)) for performing the functions of one or more of the above embodiments, and by a method performed by the computer of the system or device, for example by reading and executing the computer-executable instructions from the storage medium to perform the functions of one or more of the above embodiments and/or controlling the one or more circuits to perform the functions of one or more of the above embodiments. The computer may include one or more processors (e.g., a central processing unit (CPU) or a micro processing unit (MPU)) and may include a network of separate computers or separate processors to read and execute the computer-executable instructions. The computer-executable instructions may be provided to the computer, for example, from a network or the storage medium. The storage medium may include, for example, one or more of a hard disk, random access memory (RAM), read-only memory (ROM), storage of distributed computing systems, an optical disc (such as a CD, DVD, or Blu-ray Disc (BD)™), a flash memory device, a memory card, and the like.
[0115] Other embodiments
[0116] The embodiments of the present invention can also be implemented by supplying software (a program) that performs the functions of the above embodiments to a system or device via a network or various storage media, and having a computer, central processing unit (CPU), or micro processing unit (MPU) of the system or device read out and execute the program.
[0117] According to embodiments, the present invention can provide an image processing method, an image processing apparatus, an image processing system, and a method for manufacturing learned weights, all of which can sharpen captured images or reshape blur in captured images with high precision while suppressing the learning load and storage data volume of machine learning models.
[0118] Although the invention has been described with reference to exemplary embodiments, it should be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the appended claims should be given the broadest interpretation to cover all such modifications and equivalent structures and functions.
Claims
1. An image processing method, comprising: a first step of acquiring input data including a captured image and optical system information relating to a state of an optical system used to capture the captured image; and a second step of inputting the input data into a machine learning model and generating an estimated image obtained by sharpening the captured image or by shaping blur included in the captured image.
2. The image processing method according to claim 1, characterized in that the state of the optical system includes at least one of zoom, F-number, and subject distance.
3. The image processing method according to claim 1, characterized in that the second step uses the machine learning model with the same weights for a first captured image captured in a first state of the optical system and a second captured image captured in a second state of the optical system different from the first state.
4. The image processing method according to claim 1, characterized in that the optical system information includes a value indicating at least one of the zoom, F-number, and subject distance of the optical system, and the value is normalized based on the range that the optical system can take for the at least one of zoom, F-number, and subject distance.
5. The image processing method according to claim 1, characterized in that the input data includes a state map indicating the state of the optical system, and the state map is generated based on the number of pixels of the captured image and the optical system information.
6. The image processing method according to claim 5, characterized in that the elements in the same channel of the state map have the same value.
7. The image processing method according to claim 1, characterized in that the input data further includes positional information on each pixel of the captured image.
8. The image processing method according to claim 7, characterized in that the positional information has a value normalized by a length based on the image circle of the optical system.
9. The image processing method according to claim 1, characterized in that the optical system information includes information about the type of the optical system.
10. The image processing method according to claim 1, characterized in that the optical system information includes information about the presence or absence of an optical low-pass filter or the type of the optical low-pass filter.
11. The image processing method according to claim 1, characterized in that the optical system information includes information about the presence or absence of an accessory of the optical system or the type of the accessory.
12. The image processing method according to claim 1, characterized in that the input data further includes information about the distribution of distances in the subject space at the time the captured image was captured.
13. The image processing method according to any one of claims 1 to 12, characterized in that the input data includes information about the pixel pitch or color filter array of the image sensor used to capture the captured image.
14. A non-transitory computer-readable storage medium storing a computer program that causes a computer to perform the image processing method according to any one of claims 1 to 13.
15. An image processing apparatus, comprising: an acquisition unit configured to acquire input data including a captured image and optical system information relating to a state of an optical system used to capture the captured image; and a generation unit configured to input the input data into a machine learning model and generate an estimated image obtained by sharpening the captured image or by shaping blur included in the captured image.
16. An image processing system comprising a first device and a second device, characterized in that the first device includes a sending unit configured to send, to the second device, a request relating to execution of processing on a captured image, and the second device includes: a receiving unit configured to receive the request; an acquisition unit configured to acquire input data including the captured image and optical system information relating to a state of an optical system used to capture the captured image; and a generation unit configured to input the input data into a machine learning model and generate an estimated image obtained by sharpening the captured image or by shaping blur included in the captured image.
17. An image processing method, comprising: a first step of acquiring input data including a training image and optical system information relating to a state of an optical system corresponding to the training image; a second step of inputting the input data into a machine learning model and generating an output image obtained by sharpening the training image or by shaping blur included in the training image; and a third step of updating the weights of the machine learning model based on the output image and a ground truth image.
18. A non-transitory computer-readable storage medium storing a computer program that causes a computer to perform the image processing method according to claim 17.
19. A method for generating learned weights, comprising: a first step of acquiring input data including a training image and optical system information relating to a state of an optical system corresponding to the training image; a second step of inputting the input data into a machine learning model and generating an output image obtained by sharpening the training image or by shaping blur included in the training image; and a third step of updating the weights of the machine learning model based on the output image and a ground truth image.
20. An image processing apparatus, comprising: an acquisition unit configured to acquire input data including a training image and optical system information relating to a state of an optical system corresponding to the training image; a generation unit configured to input the input data into a machine learning model and generate an output image obtained by sharpening the training image or by shaping blur included in the training image; and an update unit configured to update the weights of the machine learning model based on the output image and a ground truth image.