Image processing device, image processing method, and image processing program

The image processing apparatus separates direct light from scattered light using spatial frequency analysis and mask processing to enhance image clarity in scattering environments, addressing the challenge of reduced visibility in fog, smoke, or dust.

WO2026126670A1 · PCT designated stage · Publication Date: 2026-06-18 · NAT UNIV CORP KYUSHU INST OF TECH (JP)

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: NAT UNIV CORP KYUSHU INST OF TECH (JP)
Filing Date: 2025-10-27
Publication Date: 2026-06-18

AI Technical Summary

Technical Problem

Existing image capturing technologies struggle to effectively visualize objects in environments with light-scattering media like fog, smoke, or dust, leading to reduced visibility and increased accident risks for autonomous vehicles and drones.

Method used

An image processing apparatus and method that utilizes spatial frequency analysis and mask processing to separate direct light from scattered light, employing Fourier transforms and inverse transforms to generate clear images by attenuating the effects of scattering media.

Benefits of technology

The solution enhances image clarity by reducing the influence of scattering media, allowing for clearer visualization of objects and potentially improving safety in environments with fog, smoke, or dust.

✦ Generated by Eureka AI based on patent content.

Figure JP2025037681_18062026_PF_FP_ABST

Abstract

The present invention performs image processing on an image obtained by imaging a subject in an environment in which a scattering medium such as fog, smoke, or dust is present, thereby generating an image in which the influence of the scattering medium is reduced. Provided is an image processing device 100 for performing image processing on input image data obtained by imaging a subject in an environment where a scattering medium that scatters light is present, the image processing device comprising: a conversion unit 130 for extracting a spatial frequency component of the input image data; a mask generation unit 140 for generating a mask image for attenuating the spatial frequency component of scattered light generated by the scattering medium; a mask processing unit 150 which uses the mask image to perform mask processing on the spatial frequency component of the input image data output by the conversion unit 130; and an inverse conversion unit 160 which inversely converts the spatial frequency component of the input image data output by the mask processing unit 150, and generates image data.

Description

Image processing device, image processing method, and image processing program 【0001】 This disclosure relates to an image processing apparatus, an image processing method, and an image processing program capable of outputting image data with reduced influence from light-scattering media, when image data is captured by a camera in an environment where light-scattering media such as fog, smoke, or dust are present. 【0002】 When light-scattering media such as fog, smoke, or dust are present, visibility is reduced, increasing the accident rate. This is especially true when vehicles are driving autonomously or when drones are flying, as these vehicles and drones use cameras and LiDAR sensors to perceive their surroundings. However, when scattering media such as fog or smoke are present, it becomes difficult to perceive the surrounding area, leading to accidents. 【0003】 Non-Patent Document 1 describes a technique for removing the effects of fog and smoke in environments where fog or smoke is present. Non-Patent Document 1 discloses a method for generating an image with reduced effects from a scattering medium by removing photons scattered by the scattering medium when a camera captures an image by detecting photons reflected from an object. It is assumed that the scattering medium follows a Gaussian distribution, and the scattering medium, such as fog, is estimated using the Maximum Likelihood Estimation method. By removing the image caused by the estimated scattering medium from the captured image, an image with the effects of fog and other media removed is generated. Because the scattering medium such as fog has been removed, the overall brightness of the generated image becomes darker. To restore the brightness information of the image, a Photon Counting Detection Model utilizing the Poisson distribution is used. 
The probability of photon existence is estimated from the elemental image, and photons are statistically generated using the Poisson distribution to reconstruct the image and restore the brightness. 【0004】Myungjin Cho and Bahram Javidi, Peplography - a passive 3D photon counting imaging through scattering media, Optics Letters, Vol. 41, Issue 22, p.5401-5404 (2016) 【0005】 This disclosure provides an image processing apparatus, an image processing method, and an image processing program that visualize an object by reducing the influence of light-scattering media such as fog, smoke, and dust in environments where such media are present. 【0006】 The image processing apparatus of the present disclosure is an image processing apparatus that performs image processing on input image data captured in an environment in which a light-scattering medium is present, and comprises: a conversion unit that extracts the spatial frequency components of the input image data; a mask generation unit that generates a mask image that attenuates the spatial frequency components of the scattered light from the scattering medium; a mask processing unit that performs mask processing on the spatial frequency components of the input image data output by the conversion unit using the mask image; and an inverse conversion unit that inversely converts the spatial frequency components of the input image data output by the mask processing unit and generates image data. 
【0007】 The image processing method disclosed herein is an image processing method for performing image processing on input image data captured in an environment in which a light-scattering medium is present, and comprises: a conversion step for extracting the spatial frequency components of the input image data; a mask generation step for generating a mask image that attenuates the spatial frequency components of the scattered light from the scattering medium; a mask processing step for performing mask processing on the spatial frequency components of the input image data output by the conversion step using the mask image; and an inverse transformation step for inversely transforming the spatial frequency components of the input image data output by the mask processing step and generating image data. 【0008】The image processing apparatus disclosed herein is an image processing apparatus that performs image processing on moving image data captured of a subject in an environment where a light-scattering medium is present, and comprises: a histogram transformation unit that performs a transformation on each of the multiple frames of image data constituting the moving image data to broaden the distribution of histograms which are biased to a narrow area to a wider area; a transformation unit that extracts the spatial frequency components of the image data output by the histogram transformation unit; an aggregation unit that superimposes the spatial frequency components of the multiple frames of image data output by the transformation unit; and an inverse transformation unit that inversely transforms the spatial frequency components of the image data output by the aggregation unit to generate image data. 
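The moving-image configuration of paragraph 【0008】 can be illustrated with a minimal sketch. It assumes that the histogram transformation is a linear min-max stretch to the range 0 to 255 and that "superimposing" means a plain average of the per-frame spectra; neither choice is stated in the text above, and the function names `stretch_histogram` and `aggregate_frames` are hypothetical.

```python
import numpy as np

def stretch_histogram(frame, out_max=255.0):
    """Spread a narrow intensity histogram over the range 0..out_max.

    ASSUMPTION: a linear min-max stretch stands in for the histogram
    transformation, whose exact form is not given in this passage.
    """
    f = np.asarray(frame, dtype=np.float64)
    lo, hi = f.min(), f.max()
    if hi == lo:
        # A perfectly flat frame carries no contrast to stretch.
        return np.zeros_like(f)
    return (f - lo) / (hi - lo) * out_max

def aggregate_frames(frames):
    """Stretch each frame, transform it, superimpose the spectra, and invert."""
    spectra = [np.fft.fft2(stretch_histogram(f)) for f in frames]
    # ASSUMPTION: superposition is taken to be a plain average of the spectra.
    combined = np.mean(spectra, axis=0)
    # The spectra come from real images, so the imaginary part is numerical residue.
    return np.real(np.fft.ifft2(combined))

# Example with four low-contrast synthetic frames (values 100..120).
rng = np.random.default_rng(0)
frames = [rng.integers(100, 121, size=(32, 32)) for _ in range(4)]
restored = aggregate_frames(frames)
```

Because the Fourier transform is linear, a plain average of the spectra gives the same result as averaging the stretched frames in the spatial domain. A weighted or otherwise non-linear superposition would behave differently.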
【0009】 The image processing method disclosed herein is an image processing method for performing image processing on moving image data captured in an environment where a light-scattering medium is present, and comprises: a histogram transformation step that performs a transformation on each of the multiple frames of image data constituting the moving image data to broaden the distribution of histograms which are biased to a narrow area to a wider area; a transformation step that extracts the spatial frequency components of the image data output by the histogram transformation step; an aggregation step that superimposes the spatial frequency components of the multiple frames of image data output by the transformation step; and an inverse transformation step that inversely transforms the spatial frequency components of the image data output by the aggregation step to generate image data. 【0010】 According to this disclosure, in environments where scattering media such as fog, smoke, and dust are present, it is possible to generate images with reduced influence from scattering media by performing image processing on images of a subject. 【0011】A block diagram showing the configuration of the image processing apparatus in Embodiment 1. A flowchart showing the operation of the image processing apparatus in Embodiment 1. A flowchart showing the operation of the mask preprocessing unit in Embodiment 1. A figure showing the frequency components of the two-dimensional Rice distribution and the two-dimensional Rayleigh distribution in Embodiment 1. A figure showing the region where scattered light has a large effect in the frequency components of the two-dimensional Rice distribution and the two-dimensional Rayleigh distribution in Embodiment 1. A figure showing the mask image in Embodiment 1. A figure showing the change in the mask coefficient due to the change in the horizontal frequency of the mask image in Embodiment 1. 
A figure showing the change in the mask coefficient due to the change in the vertical frequency of the mask image in Embodiment 1. A figure showing image data before and after image processing in Embodiment 1. A figure showing another example of image data before and after image processing in Embodiment 1. A block diagram showing the configuration of the image processing apparatus in Embodiment 2. A block diagram showing another configuration of the image processing apparatus in Embodiment 2. A block diagram showing yet another configuration of the image processing device in Embodiment 2. A block diagram showing the configuration of the image processing device in Embodiment 3. A flowchart showing the operation of the image processing device in Embodiment 3. A figure showing the image change due to histogram transformation and Fourier transform in Embodiment 3. A figure showing the image change due to superposition, inverse Fourier transform, and inverse histogram transformation in Embodiment 3. A figure explaining the histogram transformation in Embodiment 3. A block diagram showing the configuration of the image processing device in Embodiment 4. A flowchart showing the operation of the image processing device in Embodiment 4. A figure showing the image change due to histogram transformation and Fourier transform, and a mask image, in Embodiment 4. A figure showing the image change due to mask processing, superposition, inverse Fourier transform, and inverse histogram transformation in Embodiment 4. A block diagram showing the configuration of the image processing device in Embodiment 5. A flowchart showing the operation of the image processing device in Embodiment 5. 【0012】 The image processing apparatus, image processing method, and image processing program in the embodiments will be described below. However, the present invention is not limited to the following embodiments.
【0013】(Embodiment 1) The inventor applied a concept from communication theory to reduce the influence of scattering media such as fog, smoke, or dust on images taken in an environment where such media are present. In communication theory, the amplitude of the signal received from a transmitting end follows a Rayleigh distribution when there is no direct wave (line-of-sight wave) from the transmitting end to the receiving end, and follows a Rice distribution when a direct wave is present. Treating the light reflected from an object as a wave, the object corresponds to the transmitting end, the light reflected from the object corresponds to the electromagnetic wave emitted by the transmitting end, and the camera that receives the light corresponds to the receiving end. If the electromagnetic waves scattered by buildings and other external factors are replaced with light scattered by a scattering medium such as fog, this concept can be applied to images taken in an environment where such a medium is present. The light received by a camera consists of light scattered by a scattering medium such as fog and direct light that reaches the camera without being scattered by the scattering medium. By removing the scattered light from the light received by the camera and extracting only the direct light, it is possible to generate image data that visualizes the subject with the influence of the scattering medium removed. The details of an embodiment based on the above concept will be described below. 【0014】 The image processing apparatus, image processing method, and image processing program in this embodiment are based on Adaptive Removal via Mask through Scatter (ARMS). 【0015】[1-1. Configuration] Figure 1 is a block diagram showing the configuration of the image processing device (ARMS) in this embodiment.
The image processing device 100 includes an input unit 110, a mask preprocessing unit 120, a conversion unit 130, a mask generation unit 140, a mask processing unit 150, an inverse conversion unit 160, and an output unit 170. An imaging device 10 such as a camera photographs a subject in an environment where a light-scattering medium such as fog, smoke, or dust is present. The image processing device 100 performs image processing on the image data obscured by fog or the like that is input from the imaging device 10, and outputs image data in which the effects of the fog or the like are reduced. In Figure 1, the imaging device 10 and the image processing device 100 are shown as separate devices, but the image processing device 100 may instead be implemented as an internal module of an imaging device such as a camera. 【0016】 Image data of a subject covered in fog or the like, captured by the imaging device 10, is input to the image processing device 100 via the input unit 110. The input unit 110 outputs the input image data to the mask preprocessing unit 120 and the conversion unit 130. The mask preprocessing unit 120 extracts parameters that characterize the scattered light from the scattering medium in the image data. Based on the extracted parameters, the mask preprocessing unit 120 generates a two-dimensional Rayleigh distribution representing the scattered light from the scattering medium and a two-dimensional Rice distribution representing both the scattered light from the scattering medium and the direct light that arrives without scattering. 【0017】 The conversion unit 130 performs a Fourier transform on the image data input from the input unit 110 and on the two-dimensional Rice distribution and two-dimensional Rayleigh distribution generated by the mask preprocessing unit 120 to extract their frequency components. The conversion unit 130 uses the Fast Fourier Transform as an implementation method for the Fourier transform.
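The per-channel Fourier transform performed by the conversion unit 130 can be sketched with NumPy's FFT routines. The function names `channel_spectrum` and `image_spectra` are hypothetical, and the use of `fftshift` to place the zero-frequency term at the centre of the array is an assumption made here for convenience.

```python
import numpy as np

def channel_spectrum(channel):
    """Return the centred two-dimensional spectrum of one colour channel."""
    data = np.asarray(channel, dtype=np.float64)
    # fft2 puts the zero-frequency term at index [0, 0];
    # fftshift moves it to the centre of the array (ASSUMED layout).
    return np.fft.fftshift(np.fft.fft2(data))

def image_spectra(rgb):
    """Transform each channel of an H x W x C image independently."""
    return [channel_spectrum(rgb[..., c]) for c in range(rgb.shape[-1])]

# Example on a synthetic 64 x 48 RGB image.
rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(64, 48, 3))
spectra = image_spectra(rgb)
```

After the shift, the value at the centre index (`rows // 2`, `cols // 2`) is the sum of all pixel values of the channel, and the distance from the centre grows with spatial frequency.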
【0018】 The mask generation unit 140 uses the spatial frequency domain information of the two-dimensional Rice distribution and of the two-dimensional Rayleigh distribution output by the conversion unit 130 to generate a mask image for removing scattered light from the scattering medium. 【0019】The mask processing unit 150 uses the mask image generated by the mask generation unit 140 to perform mask processing on the spatial frequency components of the image data output by the conversion unit 130, thereby generating spatial frequency components of the image data from which the frequency components of scattered light from the scattering medium have been removed. 【0020】 The inverse conversion unit 160 performs an inverse Fourier transform on the spatial frequency components of the image data output by the mask processing unit 150, from which the effects of scattered light have been removed, to generate image data. The inverse conversion unit 160 uses the inverse fast Fourier transform as an implementation method for the inverse Fourier transform. 【0021】 The output unit 170 outputs the image data generated by the inverse conversion unit 160, for example by displaying it on a display device such as a screen or by recording it to a storage medium such as a hard disk or a memory card. 【0022】 The image processing device 100 consists of memory and a processor. Each block shown inside the image processing device 100 in Figure 1 is implemented by software in which the memory and processor work together, and may be implemented as a program that runs on a computer. Not every block needs to be implemented in software; some may be implemented in hardware. 【0023】 [1-2. Operation] The operation of the image processing device 100 will be explained using the flowchart shown in Figure 2.
The input unit 110 receives image data from the imaging device 10, captured in an environment where a light-scattering medium such as fog is present (step S201). The input unit 110 outputs the input image data to the mask preprocessing unit 120 and the conversion unit 130. 【0024】The mask preprocessing unit 120 extracts parameters representing the effect of scattered light from the scattering medium for each of the R, G, and B channels that constitute the input image data, and generates a two-dimensional Rayleigh distribution and a two-dimensional Rice distribution (step S202). Light has both particle and wave properties, and from a wave perspective, the light from the subject consists of scattered light from the scattering medium and direct light that reaches the imaging device 10 without being scattered by the scattering medium. Scattered light can be assumed to undergo Rayleigh scattering as it passes through the scattering medium and can be predicted by the Rayleigh distribution. The Rice distribution, on the other hand, assumes that a direct wave is present in addition to the Rayleigh-scattered waves. By regarding the direct wave as the direct light from the subject, the light from the subject, which consists of scattered light from the scattering medium and direct light, can be predicted by the Rice distribution. By removing the scattered light due to Rayleigh scattering from the Rice distribution, only the direct light from the subject can be left. 【0025】 The details of the process in step S202 performed by the mask preprocessing unit 120 will be explained using the flowchart shown in Figure 3. The mask preprocessing unit 120 calculates the mean value μ_image of the channel data of the input image (step S301). If the image data is M x N pixels and the data for each pixel of the channel is x(i,j) (i=1,2,...,M) (j=1,2,...,N), then the mean value μ_image is calculated using the following formula.
$$\mu_{image} = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N} x(i,j)$$
【0026】 The mask preprocessing unit 120 calculates the standard deviation σ_image of the channel data of the image (step S302). The standard deviation σ_image is calculated using the following formula.
$$\sigma_{image} = \sqrt{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\bigl(x(i,j)-\mu_{image}\bigr)^{2}}$$
【0027】 Using the standard deviation σ_image and the mean μ_image of the channel data of the image, the mask preprocessing unit 120 calculates the parameters used for the two-dimensional Rayleigh distribution and the two-dimensional Rice distribution using the following formula (step S303), where σ denotes the parameter representing the amount of scattering and ν denotes the parameter representing the intensity of the direct wave.
$$\sigma = \frac{255}{\sigma_{image}}, \qquad \nu = \frac{255}{\mu_{image}}$$
【0028】 It can be assumed that the smaller the standard deviation σ_image of the channel data of the image, the more scattering occurs due to the scattering medium. Therefore, the parameter value σ representing the amount of scattering used in the Rayleigh distribution is obtained by taking the reciprocal of σ_image with 255, which is the peak value of channel data represented by 8 bits, as the numerator. 【0029】 The average value of the channel data of the image becomes higher as the scattering by the scattering medium increases and lower as the scattering decreases, because scattered light is added to the direct light when a scattering medium is present. Therefore, the parameter value ν representing the intensity of the direct wave in the Rice distribution is obtained by taking the reciprocal of μ_image with 255, which is the peak value of channel data represented by 8 bits, as the numerator. 【0030】 The mask preprocessing unit 120 generates a two-dimensional Rayleigh distribution using the calculated parameter value (step S304). The generated two-dimensional Rayleigh distribution is represented by the following formula. Here, f_Rayleigh() is the probability density function, and r represents the amplitude.
$$f_{Rayleigh}(r) = \frac{r}{\sigma^{2}}\exp\!\left(-\frac{r^{2}}{2\sigma^{2}}\right)$$
【0031】 The mask preprocessing unit 120 generates a two-dimensional Rice distribution using the calculated parameter values (step S305). The generated two-dimensional Rice distribution is represented by the following formula.
$$f_{Rician}(r) = \frac{r}{\sigma^{2}}\exp\!\left(-\frac{r^{2}+\nu^{2}}{2\sigma^{2}}\right) I_{0}\!\left(\frac{r\nu}{\sigma^{2}}\right)$$
Here, f_Rician() is the probability density function, r is the amplitude, and I_0() represents the zero-order modified Bessel function of the first kind. 【0032】 The mask preprocessing unit 120 outputs the two-dimensional Rayleigh distribution and the two-dimensional Rice distribution generated for each of the three channels to the conversion unit 130. 【0033】Returning to the flowchart shown in FIG. 2, the overall process will be described. The conversion unit 130 extracts the frequency components of the data for each of the R, G, and B channels of the image input from the input unit 110 (step S203). The conversion unit 130 extracts the frequency components of the channel data of the image by Fourier transform. The information in the spatial domain that constitutes the channel data of the image is composed of different frequency components. By Fourier transform, the conversion unit 130 converts the information in the spatial domain into information in the spatial frequency domain, which indicates the frequency components of which the data is composed. The conversion unit 130 extracts, as information in the spatial frequency domain, information having two frequency components, one in the horizontal direction and one in the vertical direction. In the spatial frequency domain, low-frequency components are arranged near the center, and components of increasingly high frequency are arranged farther from the center. 【0034】 The conversion unit 130 performs a Fourier transform on the two-dimensional Rayleigh distribution and the two-dimensional Rice distribution output by the mask preprocessing unit 120 for each of the three channels, and extracts the frequency components of each distribution (step S204). FIG. 4(a) shows the information in the spatial frequency domain of the two-dimensional Rice distribution of one channel output by the conversion unit 130, and FIG. 4(b) shows the information in the spatial frequency domain of the two-dimensional Rayleigh distribution of the same channel. In FIGS.
4(a) and 4(b), the darker parts have more frequency components, and the lighter parts have fewer frequency components. Comparing FIGS. 4(a) and 4(b), in FIG. 4(a), there are regions with more frequency components in a cross shape centered around near frequency zero in both the horizontal and vertical directions, while in FIG. 4(b), the frequency components are the most at the center point where the frequencies in both the horizontal and vertical directions are zero, and the frequency components decrease as it moves away from the center point. 【0035】 The conversion unit 130 outputs the information in the spatial frequency domain of the data for each channel of the image to the mask processing unit 150, and outputs the information in the spatial frequency domain of the two-dimensional Rice distribution for each channel and the information in the spatial frequency domain of the two-dimensional Rayleigh distribution for each channel to the mask generation unit 140. 【0036】The mask generation unit 140 generates a mask image to remove scattered light from the scattering medium from the spatial frequency domain information of the two-dimensional Rice distribution for each channel and the spatial frequency domain information of the two-dimensional Rayleigh distribution for each channel (step S205). Figure 5 shows the region with a large number of frequency components common to both the two-dimensional Rayleigh distribution for one channel and the two-dimensional Rice distribution for the same channel shown in Figure 4, with a border. Figure 5(a) shows the spatial frequency domain information of the two-dimensional Rice distribution, and Figure 5(b) shows the spatial frequency domain information of the two-dimensional Rayleigh distribution. In Figures 5(a) and 5(b), the region with a large number of spatial frequency components common to both, indicated by the border, can be assumed to be scattered light from the scattering medium that is common to both distributions. 
The mask generation unit 140 can remove the spatial frequency components due to scattered light common to both by subtracting the spatial frequency domain information of the two-dimensional Rayleigh distribution from the spatial frequency domain information of the two-dimensional Rice distribution. Figure 5(a) also contains a region that is absent from Figure 5(b): a cross-shaped region centered on each axis. This region contains information about direct light from the subject, which is present in the two-dimensional Rice distribution but not in the two-dimensional Rayleigh distribution. By preserving this region, the subject can be visualized more clearly. 【0037】The mask images of the channels generated by the mask generation unit 140 are shown in FIG. 6. The mask generation unit 140 generates three mask images, one for each channel. Each mask image has the same size as the spatial frequency domain information of the channel data of the image, and holds, for each pair of vertical and horizontal frequencies, a coefficient (hereinafter referred to as a mask coefficient) that attenuates the corresponding spatial frequency component of the channel data of the image. The mask coefficient takes a value from 0 to 1. When mask processing using the mask image is executed, each spatial frequency component of the image becomes the original spatial frequency component multiplied by the mask coefficient for the same vertical and horizontal frequencies. For example, when the mask coefficient is 1, the spatial frequency component of the image remains unchanged, and when the mask coefficient is 0, the spatial frequency component of the image becomes 0.
Thus, in a region where the mask coefficient is low, the spatial frequency components of the image are reduced by the mask processing, and the information in the spatial region corresponding to the spatial frequency components (the part corresponding to scattered light) is suppressed. On the other hand, in a region where the mask coefficient is high, the spatial frequency components of the image are maintained, and the information in the spatial region corresponding to the spatial frequency domain (the part corresponding to the subject) is emphasized. In FIG. 6, the closer the mask image is to white, the larger the value of the mask coefficient, and the frequency components are passed, and the closer it is to black, the more the frequency components are attenuated. 【0038】 As shown in FIG. 6, the mask image includes a passing region that transmits a signal and an attenuation region that attenuates the signal. The passing region forms a cross shape extending in the horizontal and vertical directions from the center of the frequency space, and the attenuation region is located outside the passing region. Also, in a region where either one of the vertical and horizontal frequencies constituting the spatial frequency components is a high frequency, the mask image has a region that retains the spatial frequency components of the corresponding region of the input image data. Further, in a region where either one of the vertical and horizontal frequencies constituting the spatial frequency components is near zero and the other is a high frequency, the mask image retains the spatial frequency components of the corresponding region of the input image data. Furthermore, in a region where either one of the vertical and horizontal frequencies constituting the spatial frequency components is near zero and the other is in all regions from low frequency to high frequency, the mask image retains the spatial frequency components of the corresponding region of the input image data. 
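Mask generation from the two distributions (steps S301 to S305 and S205) can be sketched as follows. This is only an illustration under several assumptions that the text does not state: the amplitude r is taken to be the pixel distance from the image centre, the mask is the magnitude of the difference between the two spectra, and that magnitude is scaled linearly to the range 0 to 1. The scale parameters follow the reading "255 divided by the standard deviation" and "255 divided by the mean" of paragraphs 【0028】 and 【0029】. All function names are hypothetical. With a radially symmetric grid the sketch is not expected to reproduce the cross-shaped passing region of FIG. 6.

```python
import numpy as np

def scatter_parameters(channel):
    """Return (sigma, nu) for one 8-bit channel, as read from steps S301 to S303."""
    x = np.asarray(channel, dtype=np.float64)
    mu_image, sigma_image = x.mean(), x.std()
    if mu_image == 0 or sigma_image == 0:
        raise ValueError("channel must have non-zero mean and standard deviation")
    return 255.0 / sigma_image, 255.0 / mu_image

def radial_grid(shape):
    """ASSUMPTION: the amplitude r is the pixel distance from the image centre."""
    rows, cols = shape
    y = np.arange(rows) - rows // 2
    x = np.arange(cols) - cols // 2
    yy, xx = np.meshgrid(y, x, indexing="ij")
    return np.hypot(yy, xx)

def rayleigh_pdf(r, sigma):
    return r / sigma**2 * np.exp(-(r**2) / (2 * sigma**2))

def rice_pdf(r, sigma, nu):
    # np.i0 is the modified Bessel function of the first kind, order 0.
    # It can overflow for very large arguments (large images, small sigma).
    return (r / sigma**2 * np.exp(-(r**2 + nu**2) / (2 * sigma**2))
            * np.i0(r * nu / sigma**2))

def make_mask(channel):
    """Build a 0..1 mask from the Rice-minus-Rayleigh spectrum difference."""
    sigma, nu = scatter_parameters(channel)
    r = radial_grid(np.shape(channel))
    spec_rice = np.fft.fftshift(np.fft.fft2(rice_pdf(r, sigma, nu)))
    spec_rayleigh = np.fft.fftshift(np.fft.fft2(rayleigh_pdf(r, sigma)))
    # ASSUMPTION: magnitude of the difference, scaled linearly to [0, 1].
    diff = np.abs(spec_rice - spec_rayleigh)
    span = diff.max() - diff.min()
    if span == 0:
        return np.ones_like(diff)
    return (diff - diff.min()) / span

# Example on a synthetic 64 x 48 channel.
channel = np.random.default_rng(1).integers(0, 256, size=(64, 48))
mask = make_mask(channel)
```

For large images or small values of sigma, an exponentially scaled Bessel routine such as `scipy.special.i0e` is a more robust choice than `np.i0`.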
【0039】The shape of the mask image will be explained in more detail using Figures 7 and 8. Figure 7(a) is a mask image, where the x-direction represents the horizontal frequency components and the y-direction represents the vertical frequency components. The mask image has a mask coefficient value corresponding to its position in the z-direction, which is the height direction. The closer to white, the larger the mask coefficient value, and the closer to black, the smaller the mask coefficient value. Figures 7(b), 7(c), and 7(d) show the mask coefficient values at positions (1), (2), and (3) in Figure 7(a), respectively, taken parallel to the x-direction. Looking at Figures 7(b) to 7(d), at each position the mask coefficient is the same where the absolute value of the horizontal frequency is the same, so the shape is symmetrical in the positive and negative directions around the position where the horizontal frequency is zero. Furthermore, the rate of change of the mask coefficient in response to changes in the horizontal frequency (the rate at which the value of the mask coefficient changes when the horizontal frequency changes by 1 while the vertical frequency remains the same) decreases as the absolute value of the frequency increases: it is large in the low-frequency region and small in the high-frequency region. More specifically, in all of Figures 7(b) to 7(d), when the horizontal frequency changes, the mask coefficient changes significantly near zero, while the change in the mask coefficient is small in the high-frequency region.
【0040】 In comparison with Figure 7(b), in Figures 7(c) and 7(d), which have vertical frequency components, the horizontal frequency region where the lowest mask coefficient value is obtained in the vertical frequency region is the highest frequency region, but there is also a frequency region near zero where the mask coefficient value is almost the same as that of the horizontal region. 【0041】 Comparing Figure 7(b), where the vertical frequency component is zero, to Figure 7(c), which has a vertical frequency component, and to Figure 7(d), which has a higher vertical frequency component, we find that the frequencies at which the highest mask coefficient is taken in the vertical frequency are two frequency regions where the horizontal frequency is near zero in Figure 7(c), and two frequency regions where the horizontal frequency is even further from zero than in Figure 7(c), in Figure 7(d). 【0042】In comparison with Figure 7(b), Figures 7(c) and 7(d), which have vertical frequency components, appear to have a certain range for the mask coefficient value in the horizontal frequency domain. This is because, when taking the difference between the spatial frequency domain information of the original two-dimensional Rice distribution and the two-dimensional Rayleigh distribution, the high-frequency domain is susceptible to noise, causing the mask coefficient value to fluctuate finely. When the horizontal and vertical frequencies are specified, the value of the mask coefficient is uniquely determined. In both the horizontal and vertical directions, the higher the frequency domain, the finer the fluctuations in the mask coefficient due to noise, and the rate of change of the mask coefficient repeatedly reverses between positive and negative as the frequency changes. 【0043】 Next, we will use Figure 8 to explain the change in the mask coefficient with respect to changes in the vertical frequency. 
Figure 8(b) shows the mask coefficient values at position (1) in Figure 8(a), taken parallel to the y-direction, Figure 8(c) shows the values at position (2), and Figure 8(d) shows the values at position (3). In each of Figures 8(b) to 8(d), the mask coefficient is the same wherever the absolute value of the vertical frequency is the same, so the shape is symmetrical in the positive and negative directions about the position where the vertical frequency is zero. Furthermore, the rate of change of the mask coefficient with respect to the vertical frequency (the amount by which the mask coefficient changes when the vertical frequency changes by 1 while the horizontal frequency stays the same) decreases as the absolute value of the frequency increases: it is large in the low-frequency region and small in the high-frequency region. More specifically, in all of Figures 8(b) to 8(d), the mask coefficient changes significantly with the vertical frequency near zero, while it changes little in the high-frequency region.
【0044】 Compared with Figure 8(b), in Figures 8(c) and 8(d), which have a nonzero horizontal frequency component, the lowest mask coefficient value at that horizontal frequency is obtained in the highest vertical-frequency region, but there is also a region near zero vertical frequency where the mask coefficient value is almost the same as in that region.
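The profiles in Figures 7(b) to 7(d) and 8(b) to 8(d) are slices of the mask image at a fixed vertical or horizontal frequency. The following is a minimal sketch in Python with NumPy of how such slices could be taken from a mask array and checked for the positive/negative symmetry described above. The function names and the centered array layout with odd dimensions (zero frequency in the middle row and column) are assumptions made for illustration; they do not come from the specification.

```python
import numpy as np

def horizontal_profile(mask: np.ndarray, vertical_freq: int) -> np.ndarray:
    """Mask coefficients along the horizontal-frequency axis at one vertical frequency.

    Assumes a centered layout: the middle row holds vertical frequency zero.
    """
    center_row = mask.shape[0] // 2
    return mask[center_row + vertical_freq, :]

def vertical_profile(mask: np.ndarray, horizontal_freq: int) -> np.ndarray:
    """Mask coefficients along the vertical-frequency axis at one horizontal frequency.

    Assumes a centered layout: the middle column holds horizontal frequency zero.
    """
    center_col = mask.shape[1] // 2
    return mask[:, center_col + horizontal_freq]

def is_even(profile: np.ndarray, atol: float = 1e-9) -> bool:
    """True if the profile has equal values at +f and -f (odd length assumed)."""
    return bool(np.allclose(profile, profile[::-1], atol=atol))
```

With an odd-sized centered mask, reversing a profile swaps each frequency with its negative, so `is_even` returning `True` corresponds to the symmetry about zero frequency seen in the figures.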
【0045】 Comparing Figure 8(b), where the horizontal frequency component is zero, with Figure 8(c), which has a horizontal frequency component, and Figure 8(d), which has a higher horizontal frequency component, the highest mask coefficient at each horizontal frequency is taken where the vertical frequency is zero in Figure 8(b), in two regions where the vertical frequency is near zero in Figure 8(c), and in two regions further from zero than in Figure 8(c) in Figure 8(d).
【0046】 Compared with Figure 8(b), the mask coefficient values in Figures 8(c) and 8(d), which have horizontal frequency components, appear to spread over a certain range in the vertical frequency domain. This is because, when taking the difference between the spatial frequency domain information of the original two-dimensional Rice distribution and the two-dimensional Rayleigh distribution, the high-frequency domain is susceptible to noise, causing the mask coefficient value to fluctuate finely. Once the horizontal and vertical frequencies are specified, however, the value of the mask coefficient is uniquely determined. In both the horizontal and vertical directions, the higher the frequency, the finer the noise-induced fluctuations in the mask coefficient, and the rate of change of the mask coefficient repeatedly reverses between positive and negative as the frequency changes.
【0047】 The mask processing unit 150 performs mask processing on the spatial frequency components of the image's channel data using the mask image generated for the same channel (step S206). For each pair of vertical and horizontal frequencies, the mask processing unit 150 multiplies the value of the spatial frequency component of the image's channel data by the corresponding mask coefficient of the mask image, and stores the product as the spatial frequency component.
This generates a spatial frequency component from the spatial frequency component of the image's channel data, from which the frequency component of scattered light from the scattering medium has been removed. The mask processing unit 150 performs masking on each channel, generates three spatial frequency components for each channel, and outputs them to the inverse transform unit 160. 【0048】The inverse transform unit 160 converts the spatial frequency domain information for each channel into spatial domain information using an inverse Fourier transform, and generates channel data for the image (step S207). The inverse transform unit 160 synthesizes the generated channel data for the image to generate image data (step S208). The image processing device 100 performs a Fourier transform of the image channel data, generates a mask image and performs mask processing, and performs an inverse Fourier transform for each R, G, and B channel to reconstruct the image channel data, thereby generating image data with excellent color reproduction of the original subject. The inverse transform unit 160 outputs the generated image data to the output unit 170. The output unit 170 stores the generated image data in a storage medium or outputs it to a display device. 【0049】 Figure 9 shows image data before and after image processing by the image processing device 100. Figure 9(a) is image data taken in an environment where a scattering medium such as smoke is present and input to the image processing device 100, and Figure 9(b) shows image data output by the image processing device 100. As shown in Figure 9(b), the image processing device 100 can reduce the influence of scattering mediums such as smoke and generate image data in which the subject is visualized more clearly. 【0050】 Figure 10 shows other examples of image data before and after image processing by the image processing device 100. 
Figures 10(a) and 10(c) are image data captured in an environment where a scattering medium such as smoke is present and input to the image processing device 100, while Figures 10(b) and 10(d) show image data output by the image processing device 100, respectively. As shown in Figures 10(b) and 10(d), the image processing device 100 can reduce the influence of scattering mediums such as smoke and generate image data in which the subject is visualized more clearly. 【0051】[1-3. Effects] The image processing apparatus in this embodiment is an image processing apparatus 100 that performs image processing on input image data captured in an environment where a scattering medium that scatters light is present, and comprises: a conversion unit 130 that extracts the spatial frequency components of the input image data; a mask generation unit 140 that generates a mask image that attenuates the spatial frequency components of the scattered light from the scattering medium; a mask processing unit 150 that performs mask processing on the spatial frequency components of the input image data output by the conversion unit 130 using the mask image; and an inverse conversion unit 160 that inversely converts the spatial frequency components of the input image data output by the mask processing unit 150 and generates image data. The image processing apparatus 100 can reduce the influence of the scattering medium and generate an image in which the subject is visualized more clearly by performing mask processing with a mask image that attenuates the spatial frequency components of the scattered light from the scattering medium such as fog, smoke, and dust. Furthermore, because the image processing apparatus 100 reduces the influence of the scattering medium by mask processing, it is possible to increase the processing speed. 【0052】 The mask generation unit 140 may generate a mask image using parameters extracted from the input image data. 
The image processing device 100 can generate a mask image corresponding to the input image data, thereby reducing the influence of the scattering medium. 【0053】 The mask generation unit 140 may generate a mask image by the difference between the spatial frequency components of a two-dimensional Rice distribution and a two-dimensional Rayleigh distribution using parameters extracted from the input image data. The mask generation unit 140 can generate a mask image that extracts direct light by removing the frequency component of the two-dimensional Rayleigh distribution of scattered light from the spatial frequency components of the two-dimensional Rice distribution, which includes both direct light and scattered light. The image processing device 100 can reduce the effect of scattered light and generate an image in which the subject is visualized more clearly by performing mask processing with the generated mask image. 【0054】The mask generation unit 140 may generate a mask image using parameters extracted from the input image data, based on the difference in spatial frequency components of the two-dimensional probability distribution of amplitude in environments where both direct and scattered waves exist, and in environments where only scattered waves exist. The mask generation unit 140 can generate a mask image that extracts direct light by removing the frequency component of the two-dimensional probability distribution of amplitude in an environment where only scattered light exists from the spatial frequency component of the two-dimensional probability distribution of amplitude in an environment containing both direct and scattered light. The image processing device 100 can reduce the effect of scattered light and generate an image in which the subject is visualized more clearly by performing mask processing with the generated mask image. 
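The per-channel flow of the conversion step, mask processing step, inverse conversion step, and channel synthesis (steps S203 and S206 to S208) can be sketched as follows in Python with NumPy. This is a sketch under stated assumptions, not the implementation in the specification: the function name, the array shapes, and the centered mask layout are hypothetical, and the mask is taken as an input because the parameter extraction and the Rice/Rayleigh computation are not reproduced here.

```python
import numpy as np

def apply_frequency_mask(image_rgb: np.ndarray, masks: np.ndarray) -> np.ndarray:
    """Mask each color channel in the spatial frequency domain.

    image_rgb: (H, W, 3) array of R, G, B channel data.
    masks:     (3, H, W) array, one real-valued mask per channel, laid out
               with zero frequency at the center (assumed layout).
    """
    out = np.empty(image_rgb.shape, dtype=np.float64)
    for c in range(3):
        # Conversion: 2D Fourier transform, shifted so zero frequency is centered.
        spectrum = np.fft.fftshift(np.fft.fft2(image_rgb[..., c]))
        # Mask processing: multiply each frequency component by its mask coefficient.
        masked = spectrum * masks[c]
        # Inverse conversion: undo the shift and return to the spatial domain.
        # The real part is kept; any imaginary part is treated as residue.
        out[..., c] = np.fft.ifft2(np.fft.ifftshift(masked)).real
    # Synthesis: the three processed channels together form the output image.
    return out
```

A mask of all ones leaves the image unchanged, and a coefficient of zero removes the corresponding frequency component entirely; values in between attenuate it.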
【0055】 The conversion unit 130 may extract the spatial frequency components of the input image data using the Fourier transform, and the inverse conversion unit 160 may generate the image data using the inverse Fourier transform. The image processing device 100 converts spatial domain information into spatial frequency domain information using the Fourier transform, thereby enabling the separation of the effects of scattering media, which are difficult to separate in the spatial domain, by converting them to the spatial frequency domain. Furthermore, the image processing device 100 can restore the original spatial domain information by performing an inverse Fourier transform on the spatial frequency domain information. 【0056】The image processing method (ARMS) in this embodiment is an image processing method that performs image processing on input image data captured in an environment where a scattering medium that scatters light is present, and comprises: a conversion step (step S203) for extracting the spatial frequency components of the input image data; a mask generation step (step S205) for generating a mask image that attenuates the spatial frequency components of the scattered light from the scattering medium; a mask processing step (step S206) for performing mask processing on the spatial frequency components of the input image data output by the conversion step (step S203) using the mask image; and an inverse transformation step (step S207) for inversely transforming the spatial frequency components of the input image data output by the mask processing step (step S206) to generate image data. 
According to the image processing method in this embodiment, by performing mask processing with a mask image that attenuates the spatial frequency components of the scattered light from a scattering medium such as fog, smoke, or dust, the influence of the scattering medium can be reduced, an image in which the subject is visualized more clearly can be obtained, and the processing speed can be increased.
【0057】 The image processing program (ARMS) in this embodiment is an image processing program that causes a computer to execute an image processing method for input image data captured in an environment where a scattering medium that scatters light is present, and it includes: a conversion step (step S203) for extracting the spatial frequency components of the input image data; a mask generation step (step S205) for generating a mask image that attenuates the spatial frequency components of the scattered light from the scattering medium; a mask processing step (step S206) for performing mask processing on the spatial frequency components of the input image data output by the conversion step (step S203) using the mask image; and an inverse conversion step (step S207) for inversely transforming the spatial frequency components of the input image data output by the mask processing step (step S206) to generate image data. According to the image processing program in this embodiment, by performing mask processing with a mask image that attenuates the spatial frequency components of the scattered light from a scattering medium such as fog, smoke, or dust, the influence of the scattering medium can be reduced, an image in which the subject is visualized more clearly can be obtained, and the processing speed can be increased.
【0058】 (Embodiment 2) Embodiment 2 describes an image processing device that adds pre-processing of input image data and post-processing of image data after inverse transformation to the image processing device 100 described in Embodiment 1.
【0059】 Figure 11 is a block diagram showing the configuration of the image processing apparatus in this embodiment. The image processing apparatus 101 includes an input unit 110, a mask pre-processing unit 120, a conversion unit 130, a mask generation unit 140, a mask processing unit 150, an inverse conversion unit 160, a post-processing unit 165, and an output unit 170. The input unit 110, the mask pre-processing unit 120, the conversion unit 130, the mask generation unit 140, the mask processing unit 150, the inverse conversion unit 160, and the output unit 170 perform the same processing as the units described in Embodiment 1. 【0060】 The post-processing unit 165 performs image processing on the image data output by the inverse transform unit 160, which has reduced the effects of scattered light, such as edge enhancement to sharpen contours, color correction to adjust the color tone, and processing to cut out only a predetermined area. The post-processing unit 165 may also use a trained model learned by machine learning to perform processing on the image data output by the inverse transform unit 160 to determine a predetermined shape, color, or whether or not it is a person. 【0061】 In this way, by performing various processing in the post-processing unit 165, it is possible to provide the optimal response to image data captured in various environments. In addition to reducing the effects of scattering media, the image processing device 101 can further enhance the clarity of image data and identify and extract specific objects, thereby improving convenience. 【0062】 Figure 12 is a block diagram showing another configuration of the image processing apparatus in this embodiment. The image processing apparatus 102 includes an input unit 110, a pre-processing unit 115, a mask pre-processing unit 120, a conversion unit 130, a mask generation unit 140, a mask processing unit 150, an inverse conversion unit 160, and an output unit 170. 
The input unit 110, the mask pre-processing unit 120, the conversion unit 130, the mask generation unit 140, the mask processing unit 150, the inverse conversion unit 160, and the output unit 170 perform the same processing as the units described in Embodiment 1.
【0063】 The pre-processing unit 115 applies a filter to the image data from the input unit 110 to remove frequencies other than a predetermined frequency, and outputs image data of a specific frequency. The pre-processing unit 115 may also extract channel data of only the G component from the channel data of the R (red), G (green), and B (blue) color components that constitute the image data. In the first embodiment, the image processing device 100 processed the channel data of all color components R, G, and B, but in the second embodiment, the image processing device 102 processes only the channel data of the G component. This is because the portion corresponding to the G component is the largest in the visible light region. By processing only the G component, the image processing device 102 can reduce the processing load while still reducing the influence of the scattering medium and making the subject visible, although the hue of the image data changes.
【0064】 Figure 13 is a block diagram showing yet another configuration of the image processing device in this embodiment. The image processing device 103 includes both the pre-processing unit 115 and the post-processing unit 165 described above. By including both units, the image processing device 103 can reduce the processing load by means of the pre-processing unit 115, and can further enhance the clarity of image data and extract specific objects by means of the post-processing unit 165, thereby significantly improving convenience.
【0065】 In Figures 11 to 13, the imaging device 10 and the image processing devices 101 to 103 are shown as separate devices.
However, the image processing devices 101 to 103 may be implemented inside an imaging device such as a camera, and implemented as an internal module of the imaging device. 【0066】 The image processing devices 101 to 103 consist of memory and a processor, and each internal block shown in the image processing devices 101 to 103 is implemented by software in which the memory and processor work together. It may also be implemented as a program that runs on a computer. It is not necessary to implement all of each block in software; some may be implemented in hardware. 【0067】(Embodiment 3) Embodiment 3 describes an image processing apparatus for motion image data in which the subject, or a scattering medium such as fog, smoke, or dust, moves over time. 【0068】 [3-1. Configuration] Figure 14 is a block diagram showing the configuration of the image processing apparatus in this embodiment. The image processing apparatus 400 includes an input unit 410, a histogram conversion unit 420, a conversion unit 430, an aggregation unit 440, an inverse conversion unit 450, an inverse histogram conversion unit 460, and an output unit 470. 【0069】 In an environment where a light-scattering medium such as fog, smoke, or dust is present, a subject is photographed using an imaging device 20, such as a camera. The image data output by the imaging device 20 is moving image data. The image processing device 400 performs image processing on the moving image data, which is covered by fog, etc., input from the imaging device 20, and outputs moving image data with the effects of the fog, etc. reduced. In Figure 14, the imaging device 20 and the image processing device 400 are shown as separate devices, but the image processing device 400 may be implemented inside the imaging device such as a camera, and implemented as an internal module of the imaging device. 
【0070】 The moving image data of a subject covered in fog or the like, captured by the imaging device 20, is input to the image processing device 400 via the input unit 410. The input unit 410 outputs the image data of the multiple frames constituting the input moving image data to the histogram conversion unit 420. For each of the input frames of image data, the histogram conversion unit 420 converts a histogram distribution that is biased towards a narrow range into a distribution that is spread evenly over a wider range.
【0071】 The conversion unit 430 performs a Fourier transform on the histogram-converted image data of the multiple frames output by the histogram conversion unit 420, and extracts frequency components. The conversion unit 430 uses the fast Fourier transform as an implementation method for the Fourier transform.
【0072】 The aggregation unit 440 superimposes the frequency components of the multiple frames of image data output by the conversion unit 430. Specifically, the aggregation unit 440 adds up the values for each spatial frequency component across the frames and divides the result by the number of frames to obtain the average value. The aggregation unit 440 calculates the average value for all spatial frequency components.
【0073】 The inverse conversion unit 450 performs an inverse Fourier transform on the spatial frequency components of the image data output by the aggregation unit 440 to generate image data. The inverse conversion unit 450 uses the inverse fast Fourier transform as an implementation method for the inverse Fourier transform.
【0074】 The inverse histogram conversion unit 460 performs the reverse of the processing performed by the histogram conversion unit 420, restoring the image data to a shape closer to the histogram distribution of the image data input to the image processing device 400.
The histogram conversion unit 420 converted the histogram distribution, which was biased to a narrow area, into a distribution that was spread over a wide area and distributed evenly, but the inverse histogram conversion unit 460 processes the histogram distribution, which has been spread over a wide area, back to the original histogram distribution that was biased to a narrow area. 【0075】 The output unit 470 outputs the image data output by the inverse histogram conversion unit 460. The output format includes display on a display device such as a screen, or recording to a storage medium such as a hard disk or a memory card. The output unit 470 displays the image data on a display device or records it to a memory card or the like. 【0076】 The image processing device 400 consists of memory and a processor, and each block shown inside the image processing device 400 in Figure 14 is implemented by software in which the memory and processor work together. It may also be implemented as a program that runs on a computer. It is not necessary to implement all of each block in software; some may be implemented in hardware. 【0077】[3-2. Operation] Figure 15 is a flowchart showing the operation of the image processing device 400. Figures 16 and 17 show the changes in the processed image as a result of the image processing performed by the image processing device 400. The operation of the image processing device 400 will be explained using the flowchart shown in Figure 15, with reference to Figures 16 and 17. 【0078】 The input unit 410 receives motion image data from the imaging device 20, which captures a subject in an environment where a light-scattering medium such as fog is present (step S501). The input unit 410 outputs the G component of the R (red), G (green), and B (blue) color components (channel data) that constitute the image data for the first to N frames that make up the input motion image data. The value of N can be between 5 and 7. 
In this embodiment, the case where N=5 will be described. The G components of the five consecutive frames of image data output by the input unit 410 are shown in Figures 16(a) to (e). In this embodiment, the imaging device 20 photographs a tree as the subject at night in a foggy environment, capturing the fog as it is blown by the wind.
【0079】 For each of the multiple input frames of image data, the histogram conversion unit 420 converts a histogram distribution that is biased towards a narrow range into a distribution that is spread evenly over a wider range (step S502). In Figure 16, Figures 16(a) to (e) show the image data before conversion, and Figures 16(f) to (j) show the image data after histogram conversion. The arrows in Figure 16 indicate the correspondence between the image data before and after processing.
【0080】 Figure 18 will be used to explain the details of the histogram conversion process. Figure 18 shows the cumulative histograms of the image data before and after histogram conversion by the histogram conversion unit 420. In Figure 18, the horizontal axis represents the pixel values that make up the image data, and the vertical axis represents the cumulative probability (frequency) of occurrence, over the entire image, of pixels whose values are at or below the value on the horizontal axis. Here, the cumulative value is normalized so that it reaches 100% at a pixel value of 255. In Figure 18, the image data before conversion has a histogram distribution that is biased towards a narrow range: pixels with values from 0 to 25 occur with high probability, and pixels with values from 0 to 50 occupy almost the entire image. Therefore, as shown in Figure 16(a), the entire image is dark, and it is difficult to distinguish the color differences between pixels.
As shown in Figure 18, the histogram conversion unit 420 generates image data with a converted cumulative histogram by performing histogram matching that brings the cumulative histogram of the image data before conversion toward the cumulative histogram y = x shown by the dotted line. This conversion amplifies slight differences in pixel values before the conversion and makes them more pronounced, resulting in a generally brighter image and clearer color differences between pixels. Figure 16(f) shows the image obtained when the histogram conversion unit 420 converts the image shown in Figure 16(a). As shown in Figure 16(f), the contrast is increased in the image after histogram conversion, and the trees and fog are clearer.
【0081】 The conversion unit 430 uses a Fourier transform to extract frequency components from the histogram-converted image data of the multiple frames output by the histogram conversion unit 420 (step S503). The spatial domain information that constitutes the image data is composed of different frequency components. Using a Fourier transform, the conversion unit 430 converts the spatial domain information into spatial frequency domain information, which shows what frequency components the image is composed of. The conversion unit 430 extracts information that has two frequency components, horizontal and vertical, as the spatial frequency domain information. In the spatial frequency domain, low-frequency component information is located near the center, and higher-frequency component information is located farther from the center. In Figure 16, Figures 16(f) to (j) show the image data before the Fourier transform, and Figures 16(k) to (o) show the spatial frequency domain information extracted by the Fourier transform. The arrows in Figure 16 show the correspondence between the image data before and after processing.
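Steps S502 and S503 can be illustrated with a short Python sketch using NumPy. Matching a cumulative histogram to the line y = x is commonly realized as histogram equalization, in which the normalized cumulative histogram serves as a lookup table. That standard technique is used here as a stand-in and may differ in detail from the procedure in the specification. The function names and the 8-bit single-channel input are assumptions.

```python
import numpy as np

def equalize_histogram(channel: np.ndarray) -> np.ndarray:
    """Spread an 8-bit channel's histogram so its cumulative histogram approaches y = x.

    channel: 2D uint8 array (for example, the G component of one frame).
    """
    hist = np.bincount(channel.ravel(), minlength=256)  # pixel count per value 0..255
    cdf = np.cumsum(hist) / channel.size                # cumulative probability, ends at 1.0
    lut = np.round(cdf * 255).astype(np.uint8)          # lookup table: old value -> new value
    return lut[channel]

def centered_spectrum(channel: np.ndarray) -> np.ndarray:
    """2D FFT with low frequencies near the center and high frequencies toward the edges."""
    return np.fft.fftshift(np.fft.fft2(channel.astype(np.float64)))
```

For a dark frame whose pixel values are concentrated between 0 and 50, `equalize_histogram` stretches those values over the range up to 255, which is the contrast increase seen between Figures 16(a) and 16(f). The `fftshift` call produces the centered layout described for the spatial frequency domain.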
【0082】 The aggregation unit 440 superimposes the frequency components of the multiple frames of image data output by the conversion unit 430 (step S504). The aggregation unit 440 adds up the values for each spatial frequency component of the spatial frequency domain information shown in Figures 16(k) to (o), divides the result by the number of frames (5), and obtains the average value for each spatial frequency component. The spatial frequency domain information calculated by the aggregation unit 440 is shown in Figure 17(p).
【0083】 When the subject is moving, it is difficult in the spatial domain to separate the subject from the fog and reduce only the effect of the fog. By converting the image data to the spatial frequency domain in this way, the subject and the fog can be separated, and the effect of the fog can be reduced regardless of the subject's movement by superimposing the frequency domain information across multiple frames and thereby reducing the frequency components that characterize the fog.
【0084】 The inverse conversion unit 450 converts the spatial frequency domain information into spatial domain information using an inverse Fourier transform and generates image data (step S505). The inverse conversion unit 450 converts the spatial frequency domain information shown in Figure 17(p) and generates the image data shown in Figure 17(q).
【0085】 The inverse histogram conversion unit 460 performs processing in the reverse direction of the processing performed by the histogram conversion unit 420 (step S506). The cumulative histogram of the image input to the inverse histogram conversion unit 460 has a distribution shape similar to the cumulative histogram of the converted image in Figure 18. The image actually captured by the imaging device 20 has a cumulative histogram similar in shape to that of the image before conversion in Figure 18.
Therefore, by performing the inverse histogram conversion processing in the reverse direction of the conversion performed by the histogram conversion unit 420 using the inverse histogram conversion unit 460, the color information of the image captured by the imaging device 20 can be reproduced. The inverse histogram conversion unit 460 converts the image data shown in Figure 17(q) and generates the image data shown in Figure 17(r). 【0086】 The output unit 470 outputs the image data output by the inverse histogram conversion unit 460 (step S507). The output format is display on a display device such as a screen, or recording to a storage medium such as a hard disk or a memory card. The output unit 470 displays the image data on a display device or records it to a memory card or the like. 【0087】 In this embodiment, the image processing device 400 is described in which an inverse histogram conversion unit 460 is included and the inverse histogram conversion process is performed. However, the image processing device 400 may also be configured to output image data generated by the inverse conversion unit 450 without performing an inverse histogram conversion. As a result, although the color information changes compared to the input image, it is possible to obtain an image with increased contrast that makes it easier to visualize the subject, while reducing the influence of scattering media such as fog. 【0088】Through the above processing, the processing of image data for frames 1 to 5 that constitute the moving image data is completed, and image data with reduced influence from scattering media such as fog is generated. The image processing device 400 continues to perform the same processing (processing from step S502 to step S507) on image data for frames 2 to 6 (Yes in step S508). 
Thereafter, the image processing device 400 sequentially processes all frames that constitute the moving image data and outputs moving image data whose frames are the generated image data. The image processing device 400 terminates processing when all frames have been processed (No in step S508). In this embodiment, the case in which the G (green) component of the color components of the image data is targeted has been described. This is because the portion corresponding to the G component is the largest in the visible light region, and by processing only the G component, it is possible to visualize the subject while reducing the processing load. However, when only the G component is targeted, the color tone changes from that of the input moving image. The image processing device 400 may perform the same processing (processing from step S502 to step S508) on the R (red) and B (blue) components in addition to the G component, and the output unit 470 may generate and output image data by combining the G, R, and B components of the same frame. This makes it possible to output moving image data with the same color tones as the input moving image.
【0089】 [3-3. Effects] The image processing apparatus in this embodiment is an image processing apparatus 400 that performs image processing on moving image data captured in an environment where a scattering medium that scatters light is present, and comprises: a histogram conversion unit 420 that performs a conversion on each of the multiple frames of image data constituting the moving image data to broaden a histogram distribution that is biased towards a narrow range so that it covers a wide range; a conversion unit 430 that extracts the spatial frequency components of the image data output by the histogram conversion unit 420; an aggregation unit 440 that superimposes the spatial frequency components of the multiple frames of image data output by the conversion unit 430; and an inverse conversion unit 450 that inversely converts the spatial frequency components of the image data output by the aggregation unit 440 to generate image data.
【0090】 The image processing device 400 can generate a moving image with reduced influence from the scattering medium, such as fog, by performing a histogram conversion so that the scattering medium can be clearly identified in the image, converting the image data to the spatial frequency domain, and superimposing the frequency domain information across multiple frames.
【0091】 The image processing device 400 may also include an inverse histogram conversion unit 460 that returns the histogram of the image data output by the inverse conversion unit 450 to the original distribution concentrated in a narrow range. This makes it possible to generate moving images that reproduce the color information of the input image while reducing the influence of scattering media such as fog.
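The aggregation over N frames and the frame-by-frame advance of the window (frames 1 to 5, then frames 2 to 6, and so on) can be sketched in Python with NumPy as below. This is a sketch under stated assumptions: the function names are hypothetical, the input frames are assumed to be single-channel and already histogram-converted, and the inverse histogram conversion is omitted, which corresponds to the variant that outputs the result of the inverse conversion unit 450 directly.

```python
import numpy as np

def aggregate_window(frames: np.ndarray) -> np.ndarray:
    """Average the spectra of N frames and return to the spatial domain.

    frames: (N, H, W) array of histogram-converted single-channel frames.
    """
    # Conversion: one 2D spectrum per frame (FFT over the last two axes).
    spectra = np.fft.fft2(frames, axes=(-2, -1))
    # Aggregation: add each spatial frequency component across frames,
    # then divide by the number of frames.
    mean_spectrum = spectra.sum(axis=0) / len(frames)
    # Inverse conversion: back to image data; the real part is kept.
    return np.fft.ifft2(mean_spectrum).real

def process_video(frames: np.ndarray, window: int = 5):
    """Yield one output frame per window position, advancing one frame at a time."""
    for start in range(len(frames) - window + 1):
        yield aggregate_window(frames[start:start + window])
```

With `window=5`, a 7-frame input yields three output frames, one each for frames 1 to 5, 2 to 6, and 3 to 7.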
【0092】 The image processing method in this embodiment is an image processing method that performs image processing on moving image data captured in an environment where a scattering medium that scatters light is present, and comprises: a histogram transformation step (step S502) that performs a transformation on each of the multiple frames of image data constituting the moving image data to broaden a histogram distribution that is concentrated in a narrow range into a wider range; a transformation step (step S503) that extracts the spatial frequency components of the image data output by the histogram transformation step; an aggregation step (step S504) that superimposes the spatial frequency components of the multiple frames of image data output by the transformation step; and an inverse transformation step (step S505) that inversely transforms the spatial frequency components of the image data output by the aggregation step to generate image data. 【0093】 The image processing method in this embodiment can generate a moving image with reduced influence from scattering media such as fog by first performing a histogram transformation to clearly identify scattering media such as fog in the image, then converting the image data to the spatial frequency domain, and superimposing the frequency domain information across multiple frames. 【0094】 (Embodiment 4) Embodiment 4 describes an image processing apparatus for moving image data in which the subject and the scattering medium, such as fog, smoke, or dust, show little to no movement over time. 【0095】 Embodiment 4 describes a form in which the masking process described in Embodiment 1 is applied to moving image data. In Embodiment 4, the method for creating the mask image used for masking is the same as the method for creating the mask image described in Embodiment 1. 【0096】 [4-1. Configuration] Figure 19 is a diagram showing the configuration of the image processing apparatus in this embodiment. 
The image processing apparatus 600 includes an input unit 610, a histogram conversion unit 620, a mask preprocessing unit 625, a conversion unit 630, a mask generation unit 632, a mask processing unit 635, an aggregation unit 640, an inverse conversion unit 650, an inverse histogram conversion unit 660, and an output unit 670. 【0097】 The imaging device 20 captures a subject in an environment where a light-scattering medium such as fog, smoke, or dust is present, and outputs the captured video data. The image processing device 600 performs image processing on the video data covered with fog, etc., input from the imaging device 20, and outputs video data with the effects of the fog, etc. reduced. In Figure 19, the imaging device 20 and the image processing device 600 are shown as separate devices, but the image processing device 600 may be implemented inside an imaging device such as a camera, and implemented as an internal module of the imaging device. 【0098】 The input unit 610, histogram conversion unit 620, conversion unit 630, aggregation unit 640, inverse conversion unit 650, inverse histogram conversion unit 660, and output unit 670 each have the same functions and perform the same processing as the input unit 410, histogram conversion unit 420, conversion unit 430, aggregation unit 440, inverse conversion unit 450, inverse histogram conversion unit 460, and output unit 470 described in Embodiment 3. 【0099】 The mask preprocessing unit 625 performs the same processing as the mask preprocessing unit 120 described in Embodiment 1. The mask preprocessing unit 625 extracts parameters that characterize the scattered light from the scattering medium in the image data. 
Based on the extracted parameters, the mask preprocessing unit 625 generates a two-dimensional Rayleigh distribution representing the scattered light from the scattering medium and a two-dimensional Rice distribution representing both the scattered light from the scattering medium and the direct light that arrives without scattering. 【0100】The mask generation unit 632 performs the same processing as the mask generation unit 140 described in Embodiment 1. The mask generation unit 632 uses the spatial frequency domain information of the two-dimensional Rice distribution and the spatial frequency domain information of the two-dimensional Rayleigh distribution output by the conversion unit 630 to generate a mask image for removing scattered light from the scattering medium. 【0101】 The mask processing unit 635 performs the same processing as the mask processing unit 150 described in Embodiment 1. The mask processing unit 635 uses the mask image generated by the mask generation unit 632 to perform mask processing on the spatial frequency components of the image data output by the conversion unit 630, and generates spatial frequency components from which the frequency components of scattered light due to the scattering medium have been removed. 【0102】 In the image processing device 600 for motion image data, the pre-processing unit 115 in the image processing devices 101 to 103 described in Embodiment 2 corresponds to the histogram conversion unit 620, and the post-processing unit 165 corresponds to the inverse histogram conversion unit 660. 【0103】 The image processing device 600 consists of memory and a processor, and each block shown inside the image processing device 600 in Figure 19 is implemented by software in which the memory and processor work together. It may also be implemented as a program that runs on a computer. It is not necessary to implement all of each block in software; some may be implemented in hardware. 【0104】 [4-2. 
Operation] Figure 20 is a flowchart showing the operation of the image processing device 600. Figures 21 and 22 show the changes in the processed image as a result of the image processing performed by the image processing device 600. The operation of the image processing device 600 will be explained using the flowchart shown in Figure 20, with reference to Figures 21 and 22. 【0105】 The input unit 610 receives moving image data from the imaging device 20, which captures a subject in an environment where a light-scattering medium such as fog is present (step S701). The input unit 610 outputs the G component of the R (red), G (green), and B (blue) color components (channel data) that constitute the image data for the first to N-th frames of the input moving image data. The value of N can be between 5 and 7. In this embodiment, the case where N=5 will be described. The G components of the five consecutive frames of image data output by the input unit 610 are shown in Figures 21(a) to (e). In this embodiment, the imaging device 20 takes a forest as the subject and captures a night scene in which fog has formed, there is no wind, and the fog is almost motionless. 【0106】 The histogram conversion unit 620 converts the histogram distribution of each of the multiple input frames of image data, which is biased towards a narrow area, into a distribution that is spread evenly over a wider area (step S702). In Figure 21, Figures 21(a) to (e) show the image data before conversion, and Figures 21(f) to (j) show the image data after histogram conversion. In Figure 21, the arrows indicate the correspondence between the image data before and after processing. The processing performed by the histogram conversion unit 620 is the same as the processing described using Figure 18 in Embodiment 3. Histogram conversion makes the color differences between pixels clearer and improves the contrast of the image. 
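The histogram conversion in step S702 and its later reversal can be illustrated with a minimal sketch. The exact conversion is the one described using Figure 18 in Embodiment 3, which is not reproduced here, so the sketch assumes a simple linear stretch of the occupied intensity range; the function names `stretch_histogram` and `unstretch_histogram` and the 0 to 255 output range are hypothetical.

```python
import numpy as np

def stretch_histogram(channel, out_min=0.0, out_max=255.0):
    """Linearly map the occupied intensity range of one channel onto [out_min, out_max].

    Assumption: a linear stretch stands in for the conversion of Figure 18.
    Returns the stretched image and the original (lo, hi) range, which is
    needed later to undo the mapping.
    """
    channel = np.asarray(channel, dtype=np.float64)
    lo, hi = float(channel.min()), float(channel.max())
    if hi == lo:
        # Flat image: there is no range to stretch.
        return np.full_like(channel, out_min), (lo, hi)
    stretched = (channel - lo) / (hi - lo) * (out_max - out_min) + out_min
    return stretched, (lo, hi)

def unstretch_histogram(stretched, src_range, out_min=0.0, out_max=255.0):
    """Map a stretched image back to its original narrow intensity range."""
    lo, hi = src_range
    stretched = np.asarray(stretched, dtype=np.float64)
    return (stretched - out_min) / (out_max - out_min) * (hi - lo) + lo
```

Keeping the original range alongside the stretched frame is what allows a later inverse step to restore the narrow distribution of the input.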
【0107】 The conversion unit 630 extracts frequency components from the histogram-converted image data of multiple frames output by the histogram conversion unit 620 using a Fourier transform (step S703). In Figure 21, Figures 21(f) to (j) show the image data before the Fourier transform, and Figures 21(k) to (o) show the spatial frequency domain information extracted by the Fourier transform. In Figure 21, the arrows indicate the correspondence between the image data before and after processing. 【0108】 The image processing device 600 performs mask preprocessing and mask generation processing using the mask preprocessing unit 625, the conversion unit 630, and the mask generation unit 632 in order to generate a mask image to be used by the mask processing unit 635. 【0109】 The mask preprocessing unit 625 extracts parameters representing the effect of scattered light from the scattering medium from the image data of the first frame (Figure 21(f)) out of the five frames of image data output by the histogram conversion unit 620, and generates a two-dimensional Rayleigh distribution and a two-dimensional Rice distribution. The processing performed by the mask preprocessing unit 625 is the same as the processing described using Figure 3 in Embodiment 1. 【0110】 The mask preprocessing unit 625 calculates the mean μ_image and the standard deviation σ_image of the image data of the first frame using the formulas shown in (Equation 1) and (Equation 2) in Embodiment 1. Using the calculated standard deviation σ_image and mean μ_image, the mask preprocessing unit 625 calculates the parameters for the two-dimensional Rayleigh distribution and the two-dimensional Rice distribution using the formulas shown in (Equation 3) and (Equation 4) in Embodiment 1. 
The mask preprocessing unit 625 generates the two-dimensional Rayleigh distribution shown in (Equation 5) and the two-dimensional Rice distribution shown in (Equation 6) in Embodiment 1 using the calculated parameter values. The mask preprocessing unit 625 outputs the two-dimensional Rayleigh distribution and the two-dimensional Rice distribution of the generated image data to the conversion unit 630. 【0111】 The process for generating the mask is the same as the process described in Embodiment 1. The transformation unit 630 performs a Fourier transform on the two-dimensional Rayleigh distribution and the two-dimensional Rice distribution output by the mask preprocessing unit 625 to extract the frequency components of each distribution. The spatial frequency domain information of the two-dimensional Rice distribution output by the transformation unit 630 is the same as that shown in Figure 4(a) in Embodiment 1, and the spatial frequency domain information of the two-dimensional Rayleigh distribution is the same as that shown in Figure 4(b). The transformation unit 630 outputs the spatial frequency domain information of the two-dimensional Rice distribution and the spatial frequency domain information of the two-dimensional Rayleigh distribution to the mask generation unit 632. 【0112】The mask generation unit 632 generates a mask image to remove scattered light from the scattering medium from the spatial frequency domain information of the two-dimensional Rice distribution and the spatial frequency domain information of the two-dimensional Rayleigh distribution. The mask generation unit 632 can remove the spatial frequency component due to scattered light common to both by subtracting the spatial frequency domain information of the two-dimensional Rayleigh distribution from the spatial frequency domain information of the two-dimensional Rice distribution. 
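The subtraction of the Rayleigh spectrum from the Rice spectrum can be sketched as follows. This is only an illustration under stated assumptions: the two distributions are taken as already-generated 2-D arrays (Equations 3 to 6 of Embodiment 1 are not reproduced here), the magnitudes of the spectra are subtracted, negative differences are clipped to zero, and the result is divided by its peak so that the coefficients fall between 0 and 1. The function name `make_mask` and these normalization choices are hypothetical and are not taken from the specification.

```python
import numpy as np

def make_mask(rice_2d, rayleigh_2d):
    """Build a frequency-domain mask from two spatial-domain distributions.

    Assumptions (not from the specification): magnitude spectra are used,
    negative differences are clipped, and the mask is scaled by its peak.
    The mask follows the unshifted layout of np.fft.fft2, with the
    zero-frequency term at index [0, 0].
    """
    rice_spec = np.abs(np.fft.fft2(np.asarray(rice_2d, dtype=np.float64)))
    rayleigh_spec = np.abs(np.fft.fft2(np.asarray(rayleigh_2d, dtype=np.float64)))
    # Remove the scattered-light component common to both spectra.
    diff = np.clip(rice_spec - rayleigh_spec, 0.0, None)
    peak = diff.max()
    if peak == 0:
        # Identical spectra leave nothing to pass through.
        return np.zeros_like(diff)
    return diff / peak
```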
The mask image generated by the mask generation unit 632 is the same as the image shown in Figure 6 in Embodiment 1. The mask image has the same amount of information as the spatial frequency domain information of the image data and has a coefficient (hereinafter referred to as the mask coefficient) that attenuates the corresponding spatial frequency component of the image data for each vertical and horizontal frequency. The mask coefficient takes a value from 0 to 1. When mask processing is performed using the mask image, the spatial frequency components of the image become the original spatial frequency components multiplied by the mask coefficient for each vertical and horizontal frequency. For example, when the mask coefficient is 1, the spatial frequency components of the image remain at their original values, and when the mask coefficient is 0, the spatial frequency components of the image become 0. Thus, in regions with a low mask coefficient, the spatial frequency components of the image are reduced by the masking process, and the information in the spatial region corresponding to those spatial frequency components (the part corresponding to scattered light) is suppressed. On the other hand, in regions with a high mask coefficient, the spatial frequency components of the image are maintained, and the information in the spatial region corresponding to those spatial frequency components (the part corresponding to the subject) is emphasized. 【0113】 The more detailed shape of the mask image is the same as the shape described using Figures 7 and 8 in Embodiment 1. 【0114】The mask processing unit 635 performs masking on each of the spatial frequency components of the five frames of image data using the generated mask image (step S704). 
The mask processing unit 635 multiplies the value of the spatial frequency component of the image data by the mask coefficient of the mask image for each vertical and horizontal frequency, and stores that value as the spatial frequency component. This generates a spatial frequency component from which the frequency component of scattered light from the scattering medium has been removed from the spatial frequency component of the image data. Figure 21(p) shows the generated mask image, and Figures 22(q) to (u) show the spatial frequency components of the image data generated after performing masking on the spatial frequency components of the image data shown in Figures 21(k) to (o). 【0115】 In this embodiment, a method for generating a mask image from the image data of the first frame among multiple frames has been described. The image data frame from which the mask image is generated is not limited to the first frame; other frames may also be used. By using the same mask image for multiple frames of image data, the processing load required for mask image generation can be reduced. The mask processing unit 635 outputs the results of the mask processing to the aggregation unit 640. 【0116】 The aggregation unit 640 superimposes the frequency components of the multiple frames of image data output by the mask processing unit 635 (step S705). The aggregation unit 640 sums the values for each spatial frequency component of the spatial frequency domain information shown in Figures 22(q) to (u), divides the result by the number of frames (5), and obtains the average value for each spatial frequency component. The spatial frequency domain information calculated by the aggregation unit 640 is shown in Figure 22(v). 【0117】 The inverse transform unit 650 converts spatial frequency domain information into spatial domain information using an inverse Fourier transform and generates image data (step S706). 
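Steps S703 to S706 (Fourier transform, masking, averaging across frames, and inverse Fourier transform) can be sketched as below. This is a minimal illustration, assuming the frames are equally sized 2-D arrays of a single color component and that the mask uses the same unshifted frequency layout as `np.fft.fft2`; the function name `fuse_frames` is hypothetical.

```python
import numpy as np

def fuse_frames(frames, mask):
    """Mask each frame's spectrum, average the spectra, and transform back.

    frames: sequence of equally sized 2-D arrays (one color component each).
    mask:   2-D array of coefficients in [0, 1], same shape as a frame,
            laid out like the output of np.fft.fft2 (assumption).
    """
    spectra = [np.fft.fft2(np.asarray(f, dtype=np.float64)) for f in frames]  # step S703
    masked = [s * mask for s in spectra]                                      # step S704
    mean_spectrum = np.mean(masked, axis=0)                                   # step S705
    # The input is real, so any imaginary part left after the inverse
    # transform is numerical noise as long as the mask is symmetric.
    return np.real(np.fft.ifft2(mean_spectrum))                               # step S706
```

Because the Fourier transform is linear, an all-ones mask makes this equivalent to averaging the frames in the spatial domain; the mask is what distinguishes the result from a plain temporal average.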
The inverse transform unit 650 converts the spatial frequency domain information shown in Figure 22(v) and generates image data shown in Figure 22(w). 【0118】The inverse histogram conversion unit 660 performs the reverse processing of the processing performed by the histogram conversion unit 620 (step S707). The inverse histogram conversion unit 660 converts the image data shown in Figure 22(w) and generates the image data shown in Figure 22(x). 【0119】 The output unit 670 outputs the image data output by the inverse histogram conversion unit 660 (step S708). The output format is display on a display device such as a screen, or recording to a storage medium such as a hard disk or a memory card. The output unit 670 displays the image data on a display device or records it to a memory card or the like. 【0120】 In this embodiment, the image processing device 600 is described in which an inverse histogram conversion unit 660 is included and performs inverse histogram conversion processing. However, the image processing device 600 may also be configured to output image data generated by the inverse conversion unit 650 without performing inverse histogram conversion. As a result, although the color information changes compared to the input image, it is possible to obtain an image with increased contrast that makes it easier to visualize the subject, while reducing the influence of scattering media such as fog. 【0121】 Through the above processing, the processing of image data for frames 1 to 5 that constitute the moving image data is completed, and image data with reduced influence from scattering media such as fog is generated. The image processing device 600 then performs the same processing (processing from step S702 to step S708) on image data for frames 2 to 6 (Yes in step S709). Thereafter, the image processing device 600 sequentially processes all frames that constitute the moving image data and outputs moving image data composed of the generated image data as frames. 
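The frame grouping described above (frames 1 to 5, then frames 2 to 6, and so on) amounts to a sliding window that advances by one frame. A minimal sketch follows; the function name `sliding_windows` is hypothetical, and the behavior for sequences shorter than N frames (no window is produced) is an assumption, since the specification does not address that case.

```python
def sliding_windows(frames, n=5):
    """Yield every run of n consecutive frames, advancing one frame at a time."""
    for start in range(len(frames) - n + 1):
        yield frames[start:start + n]
```

With seven input frames and N=5, this yields three windows, so each output frame is derived from its own group of five consecutive input frames.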
The image processing device 600 terminates processing when processing all frames is complete (No in step S709). By converting the image data to the spatial frequency domain and performing mask processing to reduce the influence of scattering media such as fog, and processing to superimpose multiple frames, a clearer image can be obtained with further reduced influence from scattering media such as fog in the input moving image data. 【0122】In this embodiment, the description focuses on the green (G) component of the color components of the image data. This is because the portion corresponding to the G component is the largest in the visible light region. By processing only the G component, the subject can be visualized while reducing the processing load. However, if only the G component is processed, the color tone will change from the input video. The image processing device 600 may also perform the same processing (processing from step S702 to step S709) on the red (R) and blue (B) components in addition to the G component, and when outputting the image data in the output unit 670, it may perform a process to add up the G, R, and B components of the same frame. This makes it possible to output image data with the same color tone as the input video. 【0123】 [4-3. 
Effects] The image processing apparatus in this embodiment is an image processing apparatus 600 that performs image processing on moving image data captured in an environment where a scattering medium that scatters light is present, and comprises: a histogram conversion unit 620 that performs a conversion on each of the multiple frames of image data constituting the moving image data to broaden the distribution of histograms which are biased to a narrow area to a wide area; a conversion unit 630 that extracts the spatial frequency components of the image data output by the histogram conversion unit 620; a mask generation unit 632 that generates a mask image that attenuates the spatial frequency components of the scattered light from the scattering medium; a mask processing unit 635 that performs mask processing on the spatial frequency components of the multiple frames of image data output by the conversion unit 630 using the mask image; an aggregation unit 640 that superimposes the spatial frequency components of the multiple frames of image data output by the mask processing unit 635; and an inverse conversion unit 650 that inversely converts the spatial frequency components of the image data output by the aggregation unit 640 and generates image data. 【0124】 The image processing device 600 performs masking using a mask image that attenuates the spatial frequency components of scattered light from scattering media such as fog, smoke, and dust. By superimposing the frequency domain information after masking across multiple frames, the influence of scattering media such as fog can be further reduced, resulting in a clearer image. 【0125】The image processing device 600 may also include an inverse histogram conversion unit 660 that returns the histogram of the image data output by the inverse conversion unit 650 to a distribution concentrated in the original narrow region. 
This allows the image processing device 600 to reproduce the color information of the input image while reducing the influence of scattering media such as fog. 【0126】 The mask generation unit 632 may generate a mask image by the difference between the spatial frequency components of a two-dimensional Rice distribution and a two-dimensional Rayleigh distribution using parameters extracted from one of the multiple frames of image data output by the histogram conversion unit 620. The mask generation unit 632 can generate a mask image that extracts direct light by removing the frequency component of the two-dimensional Rayleigh distribution of scattered light from the spatial frequency components of the two-dimensional Rice distribution, which includes both direct light and scattered light. The image processing device 600 can reduce the effect of scattered light and generate an image in which the subject is visualized more clearly by performing mask processing with the generated mask image. In addition, the image processing device 600 can reduce the processing load by using the same mask image for multiple frames. 【0127】 (Embodiment 5) Embodiment 5 describes an image processing device in which a function for detecting motion in video data is added to the image processing device 600 described in Embodiment 4. When there is motion in the input video data, the processing described in Embodiment 3 is performed, and when there is no motion in the input video data, the processing described in Embodiment 4 is performed. 【0128】 [5-1. Configuration] Figure 23 is a block diagram showing the configuration of the image processing apparatus in this embodiment. 
The image processing apparatus 800 includes an input unit 810, a motion detection unit 815, a histogram conversion unit 820, a mask preprocessing unit 825, a conversion unit 830, a mask generation unit 832, a mask processing unit 835, an aggregation unit 840, an inverse conversion unit 850, an inverse histogram conversion unit 860, and an output unit 870. 【0129】The imaging device 20 captures a subject in an environment where a light-scattering medium such as fog, smoke, or dust is present, and outputs the captured video data. The image processing device 800 performs image processing on the video data covered with fog, etc., input from the imaging device 20, and outputs video data with the effects of the fog, etc. reduced. In Figure 23, the imaging device 20 and the image processing device 800 are shown as separate devices, but the image processing device 800 may be implemented inside an imaging device such as a camera, and implemented as an internal module of the imaging device. 【0130】 The input unit 810, histogram conversion unit 820, mask preprocessing unit 825, conversion unit 830, mask generation unit 832, mask processing unit 835, aggregation unit 840, inverse conversion unit 850, inverse histogram conversion unit 860, and output unit 870 each have the same functions and perform the same processing as the input unit 610, histogram conversion unit 620, mask preprocessing unit 625, conversion unit 630, mask generation unit 632, mask processing unit 635, aggregation unit 640, inverse conversion unit 650, inverse histogram conversion unit 660, and output unit 670 described in Embodiment 4. 【0131】 The motion detection unit 815 determines whether the input video is a video with motion. The motion detection unit 815 determines that there is motion if the percentage of pixels that have changed in the video between frames exceeds a first threshold, and determines that there is no motion if it is below the first threshold. 
Furthermore, regarding pixel changes, the unit determines that a pixel has changed if the pixel value has changed by more than a second threshold, and determines that the pixel has not changed if it is below the second threshold. The degree of change between frames required to determine that there is motion can be adjusted as appropriate by changing the settings of the first and second threshold values. The motion detection method is not limited to this method and any other method may be used. 【0132】The image processing device 800 consists of memory and a processor. In Figure 23, each block shown inside the image processing device 800 is implemented by software in which the memory and processor work together. It may also be implemented as a program that runs on a computer. It is not necessary to implement all of each block in software; some may be implemented in hardware. 【0133】 [5-2. Operation] Figure 24 is a flowchart showing the operation of the image processing device 800. The operation of the image processing device 800 will be explained, focusing on the differences from Embodiment 4. The motion image data captured by the imaging device 20 is input to the motion detection unit 815 via the input unit 810. The motion detection unit 815 analyzes the motion image data and determines whether there is motion (step S902). 【0134】 The motion detection unit 815 outputs the result of determining whether or not there is motion to the histogram conversion unit 820. If the histogram conversion unit 820 receives a result indicating that there is no motion, it outputs the histogram-converted image of the first frame to the mask preprocessing unit 825. If it receives a result indicating that there is motion, it does not output an image to the mask preprocessing unit 825. 【0135】 If an image is input to the mask preprocessing unit 825 (No in step S905), the mask preprocessing unit 825 generates a two-dimensional Rayleigh distribution and a two-dimensional Rice distribution. 
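The two-threshold test performed by the motion detection unit 815 can be sketched as follows. This is an illustration only: the function name `has_motion` and its parameter names are hypothetical, the comparison is made between two frames of a single color component, and a value exactly equal to a threshold is treated as "no change" or "no motion", which the specification leaves open.

```python
import numpy as np

def has_motion(prev_frame, next_frame, pixel_threshold, ratio_threshold):
    """Return True if the fraction of changed pixels exceeds ratio_threshold.

    pixel_threshold: the "second threshold" (minimum per-pixel change).
    ratio_threshold: the "first threshold" (fraction of changed pixels, 0..1).
    """
    prev_frame = np.asarray(prev_frame, dtype=np.float64)
    next_frame = np.asarray(next_frame, dtype=np.float64)
    # A pixel counts as changed when its value moved by more than the second threshold.
    changed = np.abs(next_frame - prev_frame) > pixel_threshold
    # Motion is reported when the share of changed pixels exceeds the first threshold.
    return bool(changed.mean() > ratio_threshold)
```

Raising either threshold makes the detector less sensitive, which corresponds to the adjustment of the first and second threshold values described above.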
The mask generation unit 832 generates a mask image from the spatial frequency component information of the two-dimensional Rayleigh distribution and the two-dimensional Rice distribution generated by the conversion unit 830. The mask processing unit 835 performs mask processing using the generated mask image (step S906). On the other hand, if no image is input to the mask preprocessing unit 825 (Yes in step S905), no mask image is generated, and the mask processing unit 835 outputs the input image as is without performing mask processing. 【0136】 As explained in the processing flow described above, if the motion detection unit 815 determines that there is motion in the input video data, the image processing device 800 executes the process described in Embodiment 3, and if it determines that there is no motion, it executes the process described in Embodiment 4. 【0137】 In this way, the image processing device 800 can select a processing method suitable for the input video data by determining whether or not there is motion in the video data and changing the processing method, thereby generating video data with reduced influence from scattering media such as fog. 【0138】 (Other Embodiments) As described above, embodiments have been explained as examples of the present invention. However, the present invention is not limited thereto and can be applied to embodiments in which modifications, replacements, additions, or omissions have been made. Therefore, other embodiments are described below. 【0139】 Embodiments 1 to 5 described an image processing apparatus that reduces the effects of light-scattering media such as fog, smoke, or dust on images taken in an environment where such media are present. However, the scattering medium is not limited to fog, smoke, or dust. For example, the apparatus can reduce the effect of a thin curtain on image data of a subject located behind it, making the subject more clearly visible. 
It can also reduce the effects of reflection and color changes in water on image data taken underwater. 【0140】 Embodiments 1 to 5 describe the form in which the input image data is composed of R, G, and B color components. The image data may be, for example, a YUV image signal consisting of Y (luminance signal), U (first color-difference signal), and V (second color-difference signal), or a YCbCr image signal, or a YPbPr image signal. In this case, the processing described in Embodiments 1 to 5 may be performed using Y (luminance signal), or processing may be performed on each image signal, and the original color image may be synthesized at the time of output. In addition to color images, the image may also be a grayscale image having luminance values of any bit depth, such as 1 bit or 8 bits. 【0141】 In Embodiments 3 to 5, the processing of image data of multiple (N) frames constituting moving image data was described in a configuration where N is 5 to 7. The number of frames is not limited to this range; it may be 2 to 4, or 8 or more. Increasing the number of frames can further reduce the influence of scattering media such as fog. However, increasing the number of frames increases the processing load, so the value of N is set considering the processing performance of the image processing device and the required real-time performance. 【0142】 Embodiments 4 and 5 describe a method of masking image data of multiple (N) frames constituting moving image data, in which the same mask image is used for N frames. Alternatively, a mask image may be generated for each of the N frames and masking may be performed, or the same mask image may be used for all frames constituting the moving image data. Increasing the number of mask images generated can further reduce the influence of scattering media such as fog. 
However, increasing the number of mask images generated increases the processing load, so the number of mask images to be generated is set considering the fog conditions and the motion conditions in the moving image. 【0143】
10, 20 Imaging device
100, 101, 102, 103, 400, 600, 800 Image processing device
110, 410, 610, 810 Input unit
115 Pre-processing unit
420, 620, 820 Histogram conversion unit
130, 430, 630, 830 Conversion unit
440, 640, 840 Aggregation unit
160, 450, 650, 850 Inverse conversion unit
165 Post-processing unit
460, 660, 860 Inverse histogram conversion unit
170, 470, 670, 870 Output unit
120, 625, 825 Mask pre-processing unit
140, 632, 832 Mask generation unit
150, 635, 835 Mask processing unit
815 Motion detection unit

Claims

1. An image processing apparatus for performing image processing on input image data captured in an environment where a light-scattering medium is present, comprising: a conversion unit for extracting spatial frequency components of the input image data; a mask generation unit for generating a mask image that attenuates the spatial frequency components of the light scattered by the light-scattering medium; a mask processing unit for performing mask processing on the spatial frequency components of the input image data output by the conversion unit using the mask image; and an inverse conversion unit for inversely converting the spatial frequency components of the input image data output by the mask processing unit and generating image data.

2. The image processing apparatus according to claim 1, wherein the mask image has a mask coefficient which is a coefficient that attenuates the spatial frequency component for each of the vertical and horizontal frequencies that constitute the spatial frequency component, and when either the vertical or horizontal frequency takes a constant value, the mask coefficient takes the same value at the same frequency in the other direction, and the rate of change of the mask coefficient with respect to the change in the frequency in the other direction decreases as the absolute value of the frequency increases.

3. The image processing apparatus according to claim 2, wherein when the frequency in either the vertical or horizontal direction of the mask image takes a constant high value, the frequency region in the other direction that takes the lowest value of the mask coefficient is the frequency region near zero, in addition to the highest frequency region.

4. The image processing apparatus according to claim 2, wherein when the frequency in either the vertical or horizontal direction is zero, the frequency at which the highest value of the mask coefficient is obtained in the other direction is zero, and when the frequency in either the vertical or horizontal direction is a value other than zero, the frequency range at which the highest value of the mask coefficient is obtained in the other direction is one of two frequency ranges near zero.

5. The image processing apparatus according to claim 1, wherein the mask image has a region in which the spatial frequency component of the input image data remains in a region where either the vertical or horizontal frequency constituting the spatial frequency component is high frequency.

6. The image processing apparatus according to claim 1, wherein the mask image retains the spatial frequency components of the input image data in a region where one of the frequencies in the vertical and horizontal directions constituting the spatial frequency components is near zero and the other is high frequency.

7. The image processing apparatus according to claim 1, wherein the mask image retains the spatial frequency components of the input image data in all regions from low to high frequencies, where the frequency of either the vertical or horizontal direction constituting the spatial frequency components is near zero and the other direction is in the range from low to high frequencies.

8. The image processing apparatus according to claim 1, wherein the mask generation unit generates the mask image using parameters extracted from the input image data.

9. The image processing apparatus according to claim 8, wherein the mask generation unit generates the mask image by the difference of spatial frequency components of a two-dimensional Rice distribution and a two-dimensional Rayleigh distribution using the parameters.

10. The image processing apparatus according to claim 8, wherein the mask generation unit generates the mask image using the parameters, based on the difference in spatial frequency components of the two-dimensional probability distribution of amplitude in an environment where both direct waves and scattered waves exist and in an environment where only scattered waves exist.

11. The image processing apparatus according to claim 1, wherein the conversion unit extracts the spatial frequency components of the input image data using a Fourier transform, and the inverse conversion unit generates the image data using an inverse Fourier transform.

12. The image processing apparatus according to claim 1, wherein the mask image generated by the mask generation unit transmits or attenuates the spatial frequency components of the light scattered by the light-scattering medium and includes a transmission region that transmits the signal and an attenuation region that attenuates the signal, the transmission region forms a cross shape extending horizontally and vertically from the center of the frequency space, and the attenuation region is located outside the transmission region.

13. The image processing apparatus according to claim 1, further comprising: a pre-processing unit that performs image processing on the input image data and outputs the processed image data to the conversion unit; and a post-processing unit that either performs image processing, including edge enhancement, color correction, and cropping, on the image data output by the inverse conversion unit, or uses a trained model trained by machine learning to determine whether the image data shows a predetermined shape, a predetermined color, or a person.

14. An image processing method for performing image processing on input image data captured in an environment where a light-scattering medium is present, comprising: a conversion step of extracting spatial frequency components of the input image data; a mask generation step of generating a mask image that attenuates the spatial frequency components of the light scattered by the light-scattering medium; a mask processing step of performing mask processing on the spatial frequency components of the input image data output by the conversion step using the mask image; and an inverse conversion step of inversely converting the spatial frequency components of the input image data output by the mask processing step to generate image data.

15. An image processing program that causes a computer to perform an image processing method on input image data captured in an environment where a light-scattering medium is present, the program comprising: a conversion step of extracting spatial frequency components of the input image data; a mask generation step of generating a mask image that attenuates the spatial frequency components of the light scattered by the light-scattering medium; a mask processing step of performing mask processing on the spatial frequency components of the input image data output by the conversion step using the mask image; and an inverse conversion step of inversely converting the spatial frequency components of the input image data output by the mask processing step to generate image data.

16. An image processing apparatus for performing image processing on moving image data captured in an environment where a light-scattering medium is present, comprising: a histogram conversion unit that performs, on each of the multiple frames constituting the moving image data, a conversion that broadens a histogram distribution biased toward a narrow area into a wider area; a conversion unit that extracts the spatial frequency components of the image data output by the histogram conversion unit; an aggregation unit that superimposes the spatial frequency components of the multiple frames output by the conversion unit; and an inverse conversion unit that inversely converts the spatial frequency components of the image data output by the aggregation unit to generate image data.

17. The image processing apparatus according to claim 16, further comprising an inverse histogram conversion unit that returns the histogram of the image data output by the inverse conversion unit to the original distribution concentrated in a narrow region.

18. The image processing apparatus according to claim 16 or 17, further comprising: a mask generation unit that generates a mask image that attenuates the spatial frequency components of the light scattered by the light-scattering medium; and a mask processing unit that performs mask processing on the spatial frequency components of the multiple frames of image data output by the conversion unit using the mask image, wherein the aggregation unit superimposes the spatial frequency components of the multiple frames of image data output by the mask processing unit.

19. The image processing apparatus according to claim 18, wherein the mask generation unit generates the mask image by the difference of spatial frequency components of a two-dimensional Rice distribution and a two-dimensional Rayleigh distribution using parameters extracted from one of the multiple frames of image data output by the histogram conversion unit.

20. The image processing apparatus according to claim 18, further comprising a motion detection unit for detecting motion in the moving image data, wherein the aggregation unit, when the motion detection unit detects motion, superimposes the spatial frequency components of the multiple frames of image data output by the conversion unit, and when the motion detection unit does not detect motion, superimposes the spatial frequency components of the multiple frames of image data output by the mask processing unit.

21. An image processing method for performing image processing on moving image data captured in an environment where a light-scattering medium is present, comprising: a histogram conversion step that performs, on each of the multiple frames constituting the moving image data, a conversion that broadens a histogram distribution biased toward a narrow area into a wider area; a conversion step that extracts the spatial frequency components of the image data output by the histogram conversion step; an aggregation step that superimposes the spatial frequency components of the multiple frames output by the conversion step; and an inverse conversion step that inversely converts the spatial frequency components of the image data output by the aggregation step to generate image data.

22. An image processing program that causes a computer to perform an image processing method on moving image data captured in an environment where a light-scattering medium is present, the program comprising: a histogram conversion step that performs, on each of the multiple frames of image data constituting the moving image data, a conversion that broadens a histogram distribution biased toward a narrow area into a wider area; a conversion step that extracts the spatial frequency components of the image data output by the histogram conversion step; an aggregation step that superimposes the spatial frequency components of the multiple frames of image data output by the conversion step; and an inverse conversion step that inversely converts the spatial frequency components of the image data output by the aggregation step to generate image data.
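
As a reading aid only, and not part of the claims, the following is a minimal Python/NumPy sketch of the processing flow recited in claims 1, 11, and 12 (Fourier transform, cross-shaped mask, inverse Fourier transform) and of the frame aggregation recited in claim 16. The function names `cross_mask`, `descatter`, `stretch`, and `aggregate_frames`, the parameters `half_width` and `attenuation`, the min-max histogram stretch, and the use of a mean to superimpose spectra are all assumptions made for illustration. The claims do not specify them, and the Rice/Rayleigh-based mask of claim 9 is not reproduced here.

```python
import numpy as np


def cross_mask(shape, half_width=1, attenuation=0.1):
    """Hypothetical cross-shaped mask in centred frequency space.

    Coefficients are 1.0 (transmit) inside a cross through the zero-frequency
    centre and `attenuation` elsewhere. Both parameters are illustrative.
    """
    rows, cols = shape
    cy, cx = rows // 2, cols // 2          # zero frequency after fftshift
    mask = np.full(shape, attenuation, dtype=float)
    # Band where the vertical frequency is near zero (horizontal arm).
    mask[max(0, cy - half_width):cy + half_width + 1, :] = 1.0
    # Band where the horizontal frequency is near zero (vertical arm).
    mask[:, max(0, cx - half_width):cx + half_width + 1] = 1.0
    return mask


def descatter(image, mask):
    """Conversion, mask processing, and inverse conversion for one image."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))      # conversion
    masked = spectrum * mask                            # mask processing
    restored = np.fft.ifft2(np.fft.ifftshift(masked))   # inverse conversion
    return np.real(restored)


def stretch(frame):
    """Assumed histogram conversion: linear min-max stretch to [0, 1]."""
    frame = np.asarray(frame, dtype=float)
    lo, hi = frame.min(), frame.max()
    if hi == lo:
        return np.zeros_like(frame)
    return (frame - lo) / (hi - lo)


def aggregate_frames(frames):
    """Stretch each frame, average the spectra, and inverse transform.

    Averaging is one possible reading of "superimposes"; it is an assumption.
    """
    spectra = [np.fft.fft2(stretch(f)) for f in frames]
    mean_spectrum = np.mean(spectra, axis=0)
    return np.real(np.fft.ifft2(mean_spectrum))
```

In this sketch, a pattern that varies along only one axis falls inside the cross and passes unchanged, while a diagonal pattern falls outside it and is scaled by `attenuation`. A mask derived from image parameters, as in claims 8 to 10, would replace `cross_mask` without changing `descatter`.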