Image correspondence analyzer and its analysis method

The correspondence analyzer uses even and odd harmonic convolution kernels to enhance correspondence analysis, reducing noise and complexity for precise 3D data generation in stereo images.

JP7875869B2 · Active · Publication Date: 2026-06-18

Patent Information

Authority / Receiving Office: JP
Patent Type: Patents
Filing Date: 2022-01-31
Publication Date: 2026-06-18

IPC: G06T7/80; G06T7/32; G06T7/593; G06T7/60; G01C3/06

CPC: G06T7/593; G06T2207/10012; G06T7/70; G06T5/20; G06T7/11; G06V10/56; G06V10/764

Abstract

[Problem] To provide an apparatus and method that can be used to perform correspondence analysis on image data in a particularly low-noise and efficient manner. [Solution] The method includes the steps of selecting image patches from the individual images, generating from each image patch a number of one-dimensional signals, convolving these signals with even and odd convolution kernels within a spatial window, nonlinearly processing the differences of the convolution results, accumulating the processed differences to form a correspondence function, and evaluating that function.

Description

[Technical Field]

[0001] This invention relates generally to the analysis of image data. More specifically, it relates to a device that can be used to identify and locate corresponding pixels in a plurality of images. In particular, this also forms the basis of stereophotogrammetry, which determines the position of points in space from the positions of matching pixels.

[Background Art]

[0002] The first attempt at stereoscopic imaging took place in 1838, when Sir Charles Wheatstone used mirrors to present not a single picture but two slightly different images. The spatial impression of the captured scene was created by viewing the left image with the left eye and the right image with the right eye separately. During World War I, wide-area image blocks from aerial reconnaissance were evaluated in three dimensions for the first time.

[0003] [Equation not reproduced in this text.]

[0004] The base length and the focal length are known from the prior calibration of the stereo camera. One way to obtain a map of the depth coordinates (3D data) of the captured subject space is, for example, to find a number of uniformly distributed corresponding points in the input images and calculate the disparity between these corresponding points. The spatial resolution of the 3D data is determined by the grid pitch of the corresponding points. Manual evaluation is time-consuming and does not meet the accuracy requirements.
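The equation of paragraph [0003] is not reproduced in this text. As a rough illustration of how depth follows from disparity once base and focal length are calibrated, the sketch below uses the standard relation for a rectified pinhole stereo pair, Z = f · B / δ. This relation, the function name depth_from_disparity, and the numeric values are assumptions for illustration, not taken from the patent.

```python
import numpy as np

def depth_from_disparity(disparity_px, base_m, focal_px):
    """Depth Z = focal_px * base_m / disparity_px (assumed rectified pinhole stereo model)."""
    d = np.asarray(disparity_px, dtype=float)
    z = np.full(d.shape, np.inf)      # zero disparity corresponds to a point at infinity
    valid = d > 0
    z[valid] = focal_px * base_m / d[valid]
    return z

# Hypothetical calibration: 12 cm base, focal length of 800 px.
depths = depth_from_disparity([64.0, 32.0, 0.0], base_m=0.12, focal_px=800.0)
print(depths)
```

Because depth is inversely proportional to disparity, a fixed disparity error produces a depth error that grows with distance, which is one reason low-noise disparity measurement matters.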

[0005] The goal of machine spatial vision is automated correspondence analysis, which identifies point correspondences automatically and uniquely with minimal measurement error, i.e., determines the parallax accurately. 3D data can then be computed from the parallax. Current applications require high resolution and accuracy of the computed 3D data, as well as efficient, low-power computation in real time. The methods and devices currently used for correspondence analysis either cannot meet these requirements or meet them only partially. A problem with many methods is that, for example, processing large image patches in order to identify corresponding points reliably consumes a lot of memory and computation. This hinders implementation in high-speed, specialized hardware and slows down the creation of 3D data.

[0006] Many technological applications are based on experience gained through the study of human vision. Human spatial vision is based on two separate, uncalibrated lenses with parameters that are variable at runtime. Humans are able to subtly change the focal length of both eyes, enabling spatial vision under various conditions such as backlighting, fog, and precipitation. However, how human spatial vision works remains unclear. Biological and medical studies at least suggest that human stereoscopic vision is based on spatial frequency processing of the light signals received by the eye at multiple spatial frequency scales. See, for example, Mayhew, J. E. and Frisby, J. P., 1976, "Rivalrous texture stereograms", Nature, 264(5581):53-56, and Marr, D. and Poggio, T., 1979, "A computational theory of human stereo vision", Proceedings of the Royal Society of London B: Biological Sciences, 204(1156):301-328.

[0007] All of this literature describes the independent calculation of phase information for multiple spatial frequency ranges within a single window. In terms of precise signal processing, the drawback of this approach is that the fundamental trade-off between high spatial resolution and high spatial frequency resolution is not optimally resolved. The disparity signal synthesized from the phase signals of the individual spatial frequency ranges contains a significant amount of noise. While this noise is reduced by upstream low-pass filtering of the input image, that filtering also removes signal information.

[0008] Another study (Marcelja, S., 1980, "Mathematical description of the responses of simple cortical cells", J. Opt. Soc. Am., 70(11):1297-1300) describes the window characteristics relevant to correspondence analysis by modeling the sensitivity characteristics of neurons in the visual cortex in the form of Gabor functions.

[0009] Aside from stereophotogrammetry, there are other methods for extracting depth information from multiple images. U.S. Patent Application Publication 2013/0266210 discloses a method for determining depth information of a scene, which includes capturing at least two images of the scene with different camera parameters and selecting an image field in each image. In a first approach, multiple different orthogonal filters are used to compute, for each image, multiple complex responses with amplitude and phase, and a weight is assigned to the complex response of the image corresponding to each orthogonal filter. The weight is determined by the phase relationship of the complex responses, and the depth measurement of the scene is determined from the combination of the weighted complex responses. In one embodiment, a confidence score is assigned to the depth estimates of the various image fields as an estimate of their reliability. For example, the number of pixels in an image patch to which a weight of 1 is assigned by spectral masking can be used as the confidence score.

[0010] Filtering is widely used in image evaluation methods in general: the image or image patch is convolved with a convolution kernel, and the resulting data are processed further. In the subject detection method described in U.S. Patent Application Publication 2015/0146915 A1, the image data are first convolved with a convolution kernel, and the convolved image is then processed with a threshold filter. To speed up further processing, the threshold filter masks pixels that are presumed not to contain information relevant to subject detection.

[0011] In computer vision, automated correspondence analysis is typically performed on two or more digital images captured by, for example, a left and a right digital camera (hereinafter referred to as a stereo camera). Under ideal conditions, ignoring imaging, digitization, and quantization errors (and assuming that both cameras capture the same subject and that the same portion of the subject is visible from both cameras), the two images of the stereo pair are assumed to be identical except for a horizontal offset. When the relative orientation of the two cameras, i.e., their relative position (e.g., base B), is known from prior calibration, epipolar geometry reduces correspondence analysis to a one-dimensional search along the epipolar lines in the digital images. In general, however, the epipolar lines of unrectified images converge and run obliquely across image space. To avoid this, the stereo image pair must be rectified so that it is free of y-parallax. The actual stereo camera then behaves like one in the normal stereo configuration, in which all epipolar lines run parallel to each other. For efficiency reasons, the search should not have to be carried out at subpixel level perpendicular to the scanning direction, so high rectification quality, with a tolerance of less than 0.5 px, is required.

[0012] In the literature, correspondence analysis is classified into three groups: area-based methods, feature-based methods, and phase-based methods.

[0013] Area-based methods represent the largest group of methods to date. A window of size m × n containing intensity values from the left digital image of a stereo camera is compared with a window of the same size in the right digital image using a cost function (e.g., sum of absolute differences (SAD), sum of squared differences (SSD), or mutual information (MI)). Correspondence analysis is then performed by evaluating the differences between the regions. Conventional algorithms in this field include cross-correlation (e.g., Marsha J. Hannah, "Computer Matching of Areas in Stereo Images", PhD Thesis, Stanford University, 1974, and Nishihara, H. K., 1984, "PRISM: A Practical Real-Time Imaging Stereo Matcher", Massachusetts Institute of Technology) and semi-global matching (Hirschmueller, H., 2005, "Accurate and efficient stereo processing by semi-global matching and mutual information", Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition). Cross-correlation has the drawback that, although the disparity information to be detected lies along the epipolar line, all points within the spatial window are weighted and analyzed equally regardless of the direction of the epipolar line. This means that an optimal signal-to-noise ratio (S/N) cannot be achieved.
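The area-based comparison described above can be sketched as a window cost evaluated along one image row. This is a minimal illustration of SAD/SSD block matching in general, not of any specific cited algorithm; the function name block_match_row, the window size, and the synthetic test images are assumptions.

```python
import numpy as np

def block_match_row(left, right, row, col, half=3, max_disp=16, cost="ssd"):
    """Compare a (2*half+1)^2 window in the left image with shifted windows in the right image."""
    ref = left[row - half:row + half + 1, col - half:col + half + 1].astype(float)
    costs = []
    for d in range(max_disp + 1):
        c = col - d                      # candidate column along the (horizontal) epipolar line
        if c - half < 0:
            break
        cand = right[row - half:row + half + 1, c - half:c + half + 1].astype(float)
        diff = ref - cand
        costs.append(np.abs(diff).sum() if cost == "sad" else (diff ** 2).sum())
    costs = np.array(costs)
    return int(np.argmin(costs)), costs

# Synthetic rectified pair: the right image is the left image shifted by 5 px.
rng = np.random.default_rng(0)
left = rng.integers(0, 256, size=(21, 64))
right = np.roll(left, -5, axis=1)
best_disp, costs = block_match_row(left, right, row=10, col=40)
print(best_disp)
```

Note that every pixel in the window contributes with equal weight, which is the property the paragraph identifies as limiting the achievable signal-to-noise ratio.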

[0014] Feature-based methods are not currently used to generate high-density 3D data, because the characteristic points they require are unevenly distributed and occur only sporadically (for example, only at the corners and edges of subjects captured by the stereo camera). These methods combine one or more properties (e.g., gradient, orientation) of an m × n window in the digital image into a descriptor and then compare this feature with other feature points, usually across the entire image. Such neighborhood features are generally computationally intensive, but they are largely invariant to intensity, scaling, and rotation and are therefore almost globally unique. Because of this global uniqueness and the long computation time, feature-based methods are used primarily for image alignment and orientation, for example to establish the relative orientation (homography) of stereo image pairs.

[0015] Phase-based methods are less well known, but human vision is thought to be based on such methods. These methods use the phase information of the signals from the left and right images to calculate the disparity as accurately as possible from the phase difference. Studies using random dot stereograms have shown that human vision cannot be based on intensity comparison (Julesz, B., 1960, "Binocular depth perception of computer-generated patterns", Bell System Technical Journal). Further research led to a theory of correspondence analysis based on human psychophysics (Marr, D. and Poggio, T., 1979, "A computational theory of human stereo vision", Proceedings of the Royal Society of London B: Biological Sciences, 204(1156):301-328). This method is based on the zero crossings of the LoG ("Laplacian of Gaussian") at various local resolutions and attempts to reduce outliers with a coarse-to-fine strategy. The experiments of Mayhew and Frisby (Mayhew, J. E. and Frisby, J. P., 1981, "Psychophysical and computational study for a theory of human stereopsis", Artificial Intelligence, 17(1):349-385) show that zero crossings alone cannot explain human visual perception. The authors hypothesize that the signal peaks after convolution with the filter are also necessary for stereopsis. Weng (Weng, J. J., 1993, "Image matching using the windowed Fourier phase", International Journal of Computer Vision, 11(3):211-236, hereafter referred to as "Weng (1993)") states that the results of zero-crossing methods are very unstable because of the limited number of channels and therefore recommends the windowed Fourier phase (WFP) as a "matching primitive". The WFP is a combination of multiple corrected windowed Fourier transforms (WFTs), in which the phases determined by the individual WFTs are averaged. However, since individual spatial frequencies and phases cannot be captured in a spectrally pure manner, the signal-to-noise ratio is not optimal.
Another method based on LoG zero crossings (T. Mouats and N. Aouf, "Multimodal stereo correspondence based on phase congruency and edge histogram descriptor," International Conference on Information Fusion, 2013) also performs low-pass filtering before disparity analysis and therefore, as will be discussed in more detail later, does not achieve an optimal signal-to-noise ratio.

[0016] Overview of phase-based correspondence analysis methods: The image signals from the left and right (color) cameras can be represented as a Y signal (Y_Image), also called the gray value or luminance signal, and the color signals U and V. Image resolution and contrast are important criteria for correspondence analysis and its measurement accuracy. For this reason, the Y signal (Y_Image), which has a higher resolution than U and V, is mainly used. In this way, the two high-resolution Y_Image channels are compared line by line. The considerations regarding Y_Image also apply to the U channel and the V channel.

[0017] Both cameras image the same subject. Assuming that the cameras perform an ideal mapping of the subject space into image space, the corresponding partial images of both cameras are identical (YR_Image - YL_Image = 0). Under actual conditions, however, tolerances and differences occur:
· Different perspectives (projection distortion), occlusions (vignetting), and different reflection behavior (Lambertian reflection) due to the different viewing angles of the cameras with respect to the subject.
· Camera noise (e.g., noise of the sensors of the digital cameras), as well as PRNU (photo response non-uniformity) and DSNU (dark signal non-uniformity).
· Digitization error and quantization error.
· Differences in OTF (optical transfer function) due to different lenses, and loss of contrast caused by rectification in the corners of the image (especially with the barrel distortion of wide-angle lenses).

[0018] When a signal of frequency ω is decomposed into a Fourier series, a real part and an imaginary part are obtained. The real part, the cosine signal ("even"), represents the even part of the Fourier series, and the imaginary part, the sine signal ("odd"), represents the odd part. In the prior art, the phase shift or parallax δ of a one-dimensional signal pair YL_Signal and YR_Signal is calculated as shown in Equation 2 (Jepson, A. D. and Jenkin, M. R. M., 1989, "The fast computation of disparity from phase differences", IEEE Computer Society Conference on Computer).

[Equation 2 not reproduced in this text.]
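Since Equation 2 itself is not available here, the following is only a generic single-frequency sketch of the phase-difference idea: the local phase of each signal is taken from its even (cosine) and odd (sine) windowed responses, and the disparity is estimated as the phase difference divided by ω. The Gaussian window, the function names, and the test signals are assumptions and are not claimed to match the cited prior-art formula.

```python
import numpy as np

def local_phase(signal, omega, sigma=6.0, half=16):
    """Phase at the signal center from even (cos) and odd (sin) windowed responses."""
    x = np.arange(-half, half + 1)
    window = np.exp(-x ** 2 / (2.0 * sigma ** 2))   # assumed Gaussian window
    even = window * np.cos(omega * x)
    odd = window * np.sin(omega * x)
    center = len(signal) // 2
    seg = signal[center - half:center + half + 1]
    return np.arctan2(np.dot(seg, odd), np.dot(seg, even))

def phase_disparity(left_sig, right_sig, omega):
    """Disparity estimate = wrapped phase difference / omega (valid for |omega * shift| < pi)."""
    dphi = local_phase(right_sig, omega) - local_phase(left_sig, omega)
    dphi = np.angle(np.exp(1j * dphi))              # wrap into (-pi, pi]
    return dphi / omega

x = np.arange(65)
omega = 2.0 * np.pi / 8.0
true_shift = 2.0
left_sig = np.cos(omega * (x - 3.0))
right_sig = np.cos(omega * (x - 3.0 - true_shift))
estimate = phase_disparity(left_sig, right_sig, omega)
print(estimate)
```

The estimate is only unambiguous while the shift stays below half a period of the chosen frequency, which is one motivation for combining several spatial frequencies.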

[0019] There is a need to reduce processing complexity and to significantly improve signal quality, especially the signal-to-noise ratio (SNR). This leads to the following objectives:
• In conventional techniques, phase signals are calculated individually for each spatial frequency using the windowed Fourier transform (WFT). To avoid the resulting errors and obtain a uniform signal, an optimal correspondence function is defined that combines the disparity information within a sufficiently small measurement window in the spatial domain and within a sufficiently small measurement window in the spatial frequency domain. The solution of this optimal correspondence function SSD(δ) for δ is called the group disparity function, SSD'(δ)/SSD″(δ).
• The system separately obtains an optimal correspondence function that contains the disparity information in the direction of the camera's base vector B, and a separately calculated confidence function that carries additional information independent of the disparity in that direction. Using this confidence function, the system selects the correct disparity even when there are multiple candidates, without affecting the group disparity function or increasing the noise of the disparity measurement.
• To minimize the number of convolution operations and compute the group disparity function with low noise, model calculations are performed to determine the optimal profile of the convolution kernels. This gives the group disparity function an adaptive behavior: the effective transfer function in the spatial frequency range is controlled by the content of the current image within the window, such that the effective noise bandwidth depends on the maximum amplitudes in the Fourier series of the image signal. This achieves a generally optimal filter behavior as described in Wiener, N., 1949, "Extrapolation, Interpolation, and Smoothing of Stationary Time Series: With Engineering Applications", The MIT Press (hereinafter referred to as "Wiener (1949)").
• Implement correspondence analysis using high-resolution camera data and fair disparity information without pre-processing with a low-pass filter. Noise is reduced instead by low-pass filtering the 3D data, or the series of disparity measurement results underlying these 3D data, after correspondence analysis.
• Control the optimal transfer function of the group disparity function by adjusting it to the power spectrum of the image.
• Minimize noise caused by disturbances of the epipolar geometry (y-parallax) by adjusting the coplanarity condition of the optical axes and by monitoring and correcting the relative shift of the stereo image pair during operation (optokinetic nystagmus).

[Summary of the Invention]

[Problem to be Solved by the Invention]

[0020] The object of the present invention is to provide an apparatus and method that can be used to perform correspondence analysis of image data in a particularly low-noise and efficient manner, while mitigating the aforementioned problems. This object is achieved by the subject matter of the independent claims. Further advantageous embodiments are described in the respective dependent claims.

[Means for Solving the Problem]

[0021] To achieve the aforementioned object, a correspondence analyzer is provided for determining the disparity between corresponding pixels in two individual digital images, also referred to in the art as frames. The correspondence analyzer determines the disparity δ, which is the shift between corresponding pixels in the two individual digital images, and comprises a computing unit configured to select image patches from each of the two individual digital images. An image patch from one of the two individual digital images is selected as the reference image patch, and a set of search image patches is selected from the other of the two individual digital images. Since the reference image patch and the search image patches preferably lie approximately on the epipolar line, the disparity of a search image patch is its distance along the epipolar line from the reference image patch. The set of search image patches and their disparities represents the range of disparities within which the correspondence analyzer searches for a correspondence (i.e., a match).

[0022] In contrast to other techniques, the information from the image patches that is relevant to determining the disparity is preferably combined into a unified correspondence function. This function evaluates information from a preferably rectangular spatial window, i.e., from the image patches, together with information from a preferably rectangular spatial frequency window containing multiple spatial frequencies. The advantage is that it avoids the procedure used in other techniques of first extracting individual spatial frequencies, which introduces noise, measuring the disparity for each of these spatial frequencies, and then interpolating these measurements, which introduces noise again. The relationship between the size of the spatial window, the size of the spatial frequency window, and the optical transfer function of the camera that provides the individual images is described in more detail below.

[0023] The correspondence function SSD(δ_p) is obtained from the data of the image patches, which are processed into signals that are then convolved with specially defined convolution kernels. Both are described in more detail below. In each case, the reference image patch is combined with a search image patch having disparity δ_p, which determines the value SSD(δ_p) of the correspondence function at point δ_p. To this end, the computing unit is further configured to:
- generate multiple signals YL_signal,v from the reference image patch and multiple signals YR_signal,v from each search image patch;
- convolve, in the spatial window, the multiple signals YL_signal,v of the reference image patch with substantially even and substantially odd convolution kernels stored in memory, that is, with even convolution kernels comprising a weighted sum of multiple even harmonics of different spatial frequencies and odd convolution kernels comprising a weighted sum of multiple odd harmonics of different spatial frequencies;
- convolve, in the spatial window, the multiple signals YR_signal,v of each search image patch with the convolution kernels stored in memory;
- calculate, for each signal pair YL_signal,v and YR_signal,v, the difference between the respective convolution results.
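The kernel construction and difference step listed above can be sketched as follows. This is a minimal illustration assuming cosines as the even harmonics and sines as the odd harmonics, eight coefficients per kernel, and arbitrarily chosen frequencies and weights; the actual kernel coefficients of the invention are not given in this section, and the names harmonic_kernels and convolution_differences are hypothetical.

```python
import numpy as np

def harmonic_kernels(length=8, freqs=(1, 2), weights=(1.0, 0.5)):
    """Even kernel = weighted sum of cosines, odd kernel = weighted sum of sines (assumed form)."""
    x = np.arange(length) - (length - 1) / 2.0      # positions symmetric about the kernel center
    base = 2.0 * np.pi / length
    even = sum(w * np.cos(k * base * x) for k, w in zip(freqs, weights))
    odd = sum(w * np.sin(k * base * x) for k, w in zip(freqs, weights))
    return even, odd

def convolution_differences(yl, yr, kernels):
    """Difference of the convolution results (evaluated at one position) for a signal pair."""
    return np.array([np.dot(yl, k) - np.dot(yr, k) for k in kernels])

even, odd = harmonic_kernels()
rng = np.random.default_rng(0)
yl = rng.normal(size=8)                              # stand-in for one signal YL_signal,v
diffs_same = convolution_differences(yl, yl, (even, odd))
diffs_other = convolution_differences(yl, rng.normal(size=8), (even, odd))
print(diffs_same, diffs_other)
```

Each kernel already covers several spatial frequencies, so one even and one odd convolution replace what would otherwise be a separate pair of convolutions per frequency.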

[0024] The convolution kernels are selected such that the correspondence function is formed in a way that an extremum at a point δ_p indicates a correspondence at that point. Alternatively, the first derivative of the correspondence function can be determined directly, in which case its zero crossing indicates the correspondence. Accordingly, the computing unit is further configured to:
- obtain the function value of the correspondence function SSD(δ_p) at point δ_p, or of its first derivative SSD'(δ_p), where δ_p denotes the distance of the search image patch from the reference image patch, either by processing the differences of the convolution results nonlinearly for each search image patch and accumulating them to give the function value of SSD(δ_p), or by calculating the first derivative SSD'(δ_p) at point δ_p from the differences of the convolution results;
- determine the extrema of the correspondence function SSD(δ_p) or the zero crossings of its first derivative SSD'(δ_p);
- output one of the extremum locations δ_p, or one of the zero-crossing locations δ_p, as the disparity δ.
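Putting these steps together, a correspondence function over a range of candidate disparities might be accumulated as in the sketch below. It assumes that each row of a patch serves as one signal v, that squaring is the nonlinear step, and that the minimum is the extremum of interest; the kernels, sizes, and synthetic data are illustrative assumptions only.

```python
import numpy as np

def correspondence_function(left, right, col, kernels, max_disp):
    """SSD(d) = sum over signals v and kernels of (conv(YL_v) - conv(YR_v at shift d))^2."""
    n = len(kernels[0])
    K = np.stack(kernels)                            # shape (num_kernels, n)
    feats_l = left[:, col:col + n] @ K.T             # reference patch: one row per signal v
    ssd = np.empty(max_disp + 1)
    for d in range(max_disp + 1):
        feats_r = right[:, col - d:col - d + n] @ K.T   # search patch at candidate disparity d
        ssd[d] = np.sum((feats_l - feats_r) ** 2)       # nonlinear processing and accumulation
    return ssd

# Assumed kernels: weighted sums of two even and two odd harmonics, 8 coefficients each.
x = np.arange(8) - 3.5
even = np.cos(2 * np.pi * x / 8) + 0.5 * np.cos(4 * np.pi * x / 8)
odd = np.sin(2 * np.pi * x / 8) + 0.5 * np.sin(4 * np.pi * x / 8)

rng = np.random.default_rng(1)
left = rng.normal(size=(5, 64))                      # five rows, i.e. five signals v
right = np.roll(left, -4, axis=1)                    # synthetic shift of 4 px
ssd = correspondence_function(left, right, col=30, kernels=(even, odd), max_disp=10)
best = int(np.argmin(ssd))
print(best)
```

Only two features per signal are compared for each candidate, rather than every pixel of the window, which is where the reduction in memory and computation would come from.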

[0025] The disparity is also preferably determined at a resolution finer than the finite set of search image patches, that is, using information from the search image patches adjacent to δ_p; the resulting subpixel-precision disparity value is determined and output as δ. A preferred option for this purpose is to calculate the group disparity SSD'(δ_p)/SSD″(δ_p) in the vicinity of δ_p in order to determine the subpixel-precision portion of the disparity value.
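One way to picture the subpixel step is as a Newton-style correction using the ratio SSD'/SSD″ near the integer minimum. The sketch below approximates both derivatives by finite differences of sampled SSD values, which is an assumption made for illustration; the source indicates that the derivatives are obtained from the convolution results, and the sign convention used here is likewise assumed.

```python
import numpy as np

def subpixel_disparity(ssd, p):
    """Refine integer minimum p by the step -SSD'(p) / SSD''(p) (finite-difference estimate)."""
    if p <= 0 or p >= len(ssd) - 1:
        return float(p)                              # no neighbors on both sides
    d1 = (ssd[p + 1] - ssd[p - 1]) / 2.0             # central first difference
    d2 = ssd[p + 1] - 2.0 * ssd[p] + ssd[p - 1]      # second difference
    if d2 <= 0:
        return float(p)                              # not a proper minimum, keep integer value
    return p - d1 / d2

candidates = np.arange(0, 9)
ssd = (candidates - 4.3) ** 2                        # synthetic correspondence function
p = int(np.argmin(ssd))
refined = subpixel_disparity(ssd, p)
print(refined)
```

For a locally quadratic correspondence function this correction is exact, as the synthetic example shows; for real data it is only as good as the quadratic approximation near the minimum.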

[0026] The output may be in the form of entries in a disparity map, for example, the determined disparity being assigned to the position of the corresponding reference image patch. The output typically refers to providing values for further processing or display. Further processing may include, for example, determining the distance to the subject. Further processing may also include various filtering operations on the data, which will be discussed further below.

[0027] Correspondence analysis of individual digital images or frames is typically an operation affected by noise and tolerances, for example by the effects of discretization and quantization when the images are represented as a finite number of pixels with a finite resolution (e.g., 8 bits per pixel and color channel). The same is true for convolution in a spatial window using discrete convolution kernels. In this case there is the further problem of how to select the coefficients of these convolution kernels so that the convolution results are low in noise and useful for correspondence analysis.

[0028] For these reasons in particular, the present invention discloses a method for selecting convolution kernels within the framework of a continuous signal model with continuous functions, and a method for obtaining a correspondence function that can be transferred directly to discrete processing with discrete convolution kernels while still allowing the disparity to be determined with low noise. The correspondence function and convolution kernels are selected such that the noise present, i.e., most of the information that is irrelevant to the disparity, is ignored, and the disparity signal present, i.e., the information from the image patch that is relevant to determining the disparity, is used for correspondence analysis. This is important because the noise may otherwise cause the disparity to be determined inaccurately. Furthermore, a method is disclosed for selecting the convolution kernels for a particular profile of the input image or image patch so that, together with the correspondence function, an optimal filter is created.

[0029] Conversely, the present invention discloses multiple sets of discrete convolution kernels based on a signal model. For each such set there exist further, similar discrete convolution kernels that add only little noise, or merely a different type of noise of similar magnitude, and these are therefore also disclosed. Put simply, such sets of convolution kernels are unlikely to be found by chance or by a search that is not guided by the model, because the number of possible convolution kernels is very large: in the exemplary embodiment with four convolution kernels described below, a total of 32 coefficients must be determined, and with 8-bit resolution each coefficient can take 256 values, which corresponds to 256^32 combinations.
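To give a sense of the size of this search space, the count for 32 coefficients with 256 possible values each works out as follows (plain arithmetic, with the numerical value rounded):

```latex
256^{32} = \left(2^{8}\right)^{32} = 2^{256} \approx 1.16 \times 10^{77}
```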

[0030] A key component of the present invention is the use of both a convolution kernel consisting of a weighted sum of multiple even harmonic functions at different spatial frequencies and a convolution kernel consisting of a weighted sum of multiple odd harmonic functions at different spatial frequencies. As a result, the number of convolution operations required can be less than or equal to the number of spatial frequencies considered within the spatial frequency window, which reduces the computational complexity compared to other techniques while simultaneously improving the signal-to-noise ratio. A discrete convolution kernel comprises the sum of these multiple functions in particular when its coefficients are exactly the discretized values of the respective sum at each position of the convolution kernel. Where the discrete coefficients of the convolution kernel deviate from the ideal sum of even or odd functions, a high correlation between the discrete values and the underlying functions is particularly preferred. In a particularly preferred embodiment, the coefficients of the filter kernel correspond to the function values of the weighted sum of even harmonic or odd harmonic functions, or have a correlation coefficient with these function values whose absolute value is at least 0.8, preferably at least 0.9. In further embodiments, the coefficients have a high coefficient of determination R^2 with respect to the function values. The coefficient of determination is preferably at least 80%, particularly at least 90%, and most preferably at least 95%.
When the correlation coefficient and/or the coefficient of determination reach the aforementioned values, the coefficients of the even convolution kernel and the coefficients of the odd convolution kernel still represent, with sufficient accuracy, the weighted sum of multiple even harmonics at different spatial frequencies or the weighted sum of multiple odd harmonics at different spatial frequencies, respectively.
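The fidelity criterion described here could be checked numerically along the following lines. The sketch compares an ideal even kernel with a version quantized to signed 8-bit integers, using the Pearson correlation coefficient and taking R^2 as its square; that definition of R^2, the example kernel, and the quantization scheme are assumptions for illustration.

```python
import numpy as np

def kernel_fidelity(ideal, quantized):
    """Return (correlation coefficient r, coefficient of determination taken as r**2)."""
    r = np.corrcoef(ideal, quantized)[0, 1]
    return r, r ** 2

# Assumed ideal even kernel: weighted sum of two cosine harmonics, 8 coefficients.
x = np.arange(8) - 3.5
ideal = np.cos(2 * np.pi * x / 8) + 0.5 * np.cos(4 * np.pi * x / 8)

# Hypothetical quantization to signed 8-bit integer coefficients.
quantized = np.round(ideal / np.max(np.abs(ideal)) * 127.0)

r, r2 = kernel_fidelity(ideal, quantized)
meets_preferred = abs(r) >= 0.9 and r2 >= 0.95
print(r, r2, meets_preferred)
```

A check like this would flag discrete kernels that have drifted too far from the underlying sums of harmonics, for example after aggressive rounding to very few bits.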

[0031] It is advantageous, though not essential, for the positions measured in the individual images to lie at the center of the respective image patches or convolution kernels. The convolution kernels can also be discretized such that the functions are even or odd with respect to a position adjacent to the center of the image patch or convolution kernel. Furthermore, the sum does not need to represent an even or odd function in a strict sense. The entries of a convolution kernel may reflect a function profile that is slightly asymmetrical, or that is even or odd with respect to a position adjacent to the center of the reference image patch and the search image patch. For example, extending a convolution kernel at its ends with additional coefficients that are small compared to its other coefficients adds only a small amount of noise. Furthermore, a convolution kernel may be combined with convolutions from previous processing steps and still constitute a convolution operation within the meaning of this invention. The aforementioned modifications therefore still involve sums of multiple even harmonics or sums of multiple odd harmonics.

[0032] It is further preferred to form the correspondence function SSD(δ_p) by nonlinear processing in which the differences of the convolution results (features) are squared. Since both squaring and the calculation of its derivative are particularly simple operations, they are easy to implement in suitably adapted hardware. Besides squaring, it is also possible to use characteristic curves with higher even powers of the difference (fourth power or higher) and nonlinear processing that limits differences exceeding a threshold.
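The nonlinear characteristics mentioned in this paragraph can be sketched as a small helper. The function name nonlinear_process, its parameters, and the choice to clip before applying the power are assumptions for illustration.

```python
import numpy as np

def nonlinear_process(diff, power=2, threshold=None):
    """Apply an even-power characteristic, optionally limiting differences beyond a threshold."""
    if power % 2 != 0:
        raise ValueError("power must be even")
    d = np.asarray(diff, dtype=float)
    if threshold is not None:
        d = np.clip(d, -threshold, threshold)        # limit large differences (outliers)
    return d ** power

diffs = np.array([1.0, -2.0, 3.0])
squared = nonlinear_process(diffs)                   # [1, 4, 9]
fourth = nonlinear_process(diffs, power=4)           # [1, 16, 81]
limited = nonlinear_process(diffs, threshold=2.0)    # [1, 4, 4]
print(squared, fourth, limited)
```

Limiting the difference before squaring reduces the weight of isolated large mismatches, such as those caused by occlusions, in the accumulated correspondence function.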

[0033] Further embodiments of the present invention are enabled by selecting the convolution kernels to comprise a weighted sum of even harmonics and a weighted sum of odd harmonics, respectively, and by nonlinear processing of the difference of the convolution results, particularly squaring it. This significantly reduces the influence of the subject phase in the signal model on the disparity measurement results. For example, if the texture on the subject being analyzed is moved in space without moving the subject itself, the subject phase in the signal model may change. Simply put, this means that when there is a signal usable for disparity measurement within a selected range of spatial frequencies, the unified correspondence function yields low-noise measurement results that are largely independent of the texture or pattern of the subject. For this purpose, the convolution kernels are configured such that, in the signal model, for each signal v within the range of spatial frequencies, the convolution operations with the k_max even functions and with the l_max odd functions each transfer a weighted set of spatial-frequency signal components with amplitudes A_m as a sum, so that two partial sums are obtained in the correspondence function SSD(δ) for each signal v and each spatial frequency of index m. The terms of the first partial sum, which result from the convolution operations with even functions, are characterized by the squared amplitude A_m^2, and the terms of the second partial sum, which result from the convolution operations with odd functions, are likewise characterized by the squared amplitude A_m^2. The terms are selected such that the first and second partial sums can be combined, exactly or approximately, according to the Pythagorean trigonometric identity, so that the sum of both partial sums, SSD_inv(δ), is independent of the subject phase Δ_m.
Specifically, the convolution kernels are selected such that, in the signal model, for each signal v in the range of spatial frequencies, the convolution operations with the k_max even functions and with the l_max odd functions each transfer a weighted group of spatial-frequency signal components with amplitudes A_m as a sum, so that two terms are obtained in the correspondence function SSD(δ) for each signal v and each spatial frequency with index m. The first term is the product of the squared amplitude A_m^2, a first constant, and the square of a sine function. The second term is the product of the squared amplitude A_m^2, a second constant, and the square of a cosine function. The values of the first and second constants are equal, or equal within a tolerance of ±20%.
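The combination of the two terms can be written out as a short worked identity. The symbols c_1 and c_2 (the first and second constants) and θ_m (the common argument of the sine and cosine, which carries the dependence on the subject phase Δ_m) are notation assumed here for illustration only.

```latex
\begin{aligned}
c_1 A_m^2 \sin^2\theta_m + c_2 A_m^2 \cos^2\theta_m
  &= c\,A_m^2\left(\sin^2\theta_m + \cos^2\theta_m\right) = c\,A_m^2
  && \text{if } c_1 = c_2 = c,\\
c_1 A_m^2 \sin^2\theta_m + c_2 A_m^2 \cos^2\theta_m
  &= c_2 A_m^2 + (c_1 - c_2)\,A_m^2 \sin^2\theta_m
  && \text{in general.}
\end{aligned}
```

With equal constants the sum no longer depends on θ_m. If the constants differ slightly, the phase-dependent remainder is bounded by |c_1 − c_2| A_m^2, which is why a tolerance on the equality of the two constants still leaves the largest part of the value independent of the subject phase.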

[0034] Simply put, this means that the largest component of the correspondence function's value is independent of the subject phase, and therefore, once a signal is provided, it can be used to determine the parallax with low noise.

[0035] The deviation of the parallax from the actual value due to various noise processes can be characterized by its standard deviation σδ. In systems known from the prior art, a standard deviation of 0.25 pixels or more is typically achieved, and in well-tuned systems, the standard deviation is between 0.25 and 0.5. In contrast, the correspondence analyzer of this disclosure can achieve a lower standard deviation. Generally, when determining parallax, a convolution kernel can be selected to achieve a standard deviation of less than 0.2 pixels of parallax measurements, and even 0.1 pixels can be achieved, especially in the case of a planar subject shift having intensity modulation along the direction of the epipolar line containing spatial frequencies within a range of spatial frequencies, or having a corresponding texture, where the subject shift occurs along the epipolar line at a constant distance Z from the camera. In this case, the standard deviation is hardly affected by systematic errors, particularly those occurring in the manner known from the prior art. Such tests can be used to determine the aforementioned subject phase interference. The test can be performed using captured camera images, or it can be performed using composite or computed images, such as rendered images.

[0036] The signals YL_signal,v and YR_signal,v are calculated from the brightness of the pixels in each image patch. These signals can be obtained, in particular, by convolving the image intensities with an appropriate convolution function, which may include, for example, averaging. Particularly preferred harmonic functions are the cosine function as the even function and the sine function as the odd function. Since the signals are convolved approximately along the epipolar line, the convolution that generates them is preferably performed approximately perpendicular to the epipolar line. The order of the convolutions perpendicular to and along the epipolar line is arbitrary, and the two convolutions can, in particular, also be performed jointly with an appropriate combined convolution kernel. The selection of the convolution kernels that determine the signals, in combination with a special correspondence function, aims to preserve information useful for disparity calculation while reducing the effects of noise. For this purpose, in further embodiments, the computing unit is configured
- to generate the v_max signals YL_signal,v from the reference image patch by convolving the data of the reference image patch perpendicular or approximately perpendicular to the epipolar line, and to generate the v_max signals YR_signal,v from each of the search image patches by convolving the data of each search image patch perpendicular or approximately perpendicular to the epipolar line, wherein the convolution operations that generate the signals YL_signal,v and YR_signal,v, the convolution operations with the k_max even functions, and the convolution operations with the l_max odd functions are selected such that, in the signal model, the latter convolution operations transfer the sum of multiple weighted signal components of multiple spatial frequencies, indicated by different values of the index m, and such that
- for each signal, a first partial sum consisting of terms independent of the subject phase Δ_m and a second partial sum consisting of terms dependent on the subject phase Δ_m are obtained in the correspondence function SSD(δ), wherein
- when the first partial sums of the v_max signals YL_signal,v and YR_signal,v are added together, a constructive sum is obtained in which the individual terms do not cancel each other out, and
- when the second partial sums of the v_max signals YL_signal,v and YR_signal,v are added together, a statistical sum is obtained in which the noise components at least partially cancel each other out statistically.
The addition of the first and second partial sums is performed when calculating the value of the correspondence function. As used in this disclosure, the term "statistical addition" means that the result is obtained from the sum of random, i.e., statistically distributed, noise components of the image signal. This statistical addition has the advantageous property that the errors caused by the noise can at least partially cancel each other out.
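The statement that the order of the convolutions perpendicular to and along the epipolar line is arbitrary, and that both can be merged into one kernel, can be checked numerically. The sketch below assumes NumPy; the two small kernels `k_y` and `k_x` are hypothetical stand-ins and are not the kernels of the disclosure.

```python
import numpy as np

def conv_along_x(img, k):
    """1D 'valid' convolution of every row (along the epipolar line)."""
    return np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, img)

def conv_along_y(img, k):
    """1D 'valid' convolution of every column (perpendicular to the epipolar line)."""
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, img)

def conv2d_valid(img, k2):
    """Direct 2D 'valid' convolution with a combined kernel."""
    kh, kw = k2.shape
    flipped = k2[::-1, ::-1]
    out = np.empty((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * flipped)
    return out

rng = np.random.default_rng(0)
patch = rng.normal(size=(12, 20))          # synthetic image patch
k_y = np.array([1.0, 2.0, 1.0]) / 4.0      # hypothetical averaging kernel (y-direction)
k_x = np.array([-1.0, 0.0, 1.0])           # hypothetical odd kernel (x-direction)

y_then_x = conv_along_x(conv_along_y(patch, k_y), k_x)
x_then_y = conv_along_y(conv_along_x(patch, k_x), k_y)
combined = conv2d_valid(patch, np.outer(k_y, k_x))
```

All three results agree up to floating-point rounding, because the combined kernel is the outer product of the two one-dimensional kernels.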

[0037] The components of the present invention described so far are designed to determine parallax with particular accuracy, especially with sub-pixel precision. They do not, however, aim to determine the likelihood that an actual correspondence exists within a given parallax range, i.e., the reliability of the correspondence. In the correspondence function, information that is not useful for determining the parallax value is ignored as far as possible, but the same information may be relevant to determining the reliability. A simple example is a search image patch in which the intensity of all pixels is 30% greater than that of the corresponding pixels in the reference image patch. This constant luminance difference provides no useful information for accurate parallax determination and only generates noise, for example by masking very low-contrast textures that are useful for accurate parallax determination. It is therefore suppressed by the convolution kernels (preferably mean-free) that convolve the signals in the correspondence function. Suppose that, in the same example, there is a second search image patch with a constant luminance difference of only 5%, a small deviation caused by differences in camera control. The correspondence function then yields a highly accurate but potentially ambiguous result, with two or more search image patches as candidates for the correspondence. When the confidence is determined separately, the second search image patch, where the difference is only 5%, shows the higher probability of correspondence.

[0038] Therefore, the correspondence function is preferably supplemented by an independent confidence function. In contrast to other methods that do not distinguish between these two objectives and, for example, use only one function to determine both disparity and confidence, the methods disclosed herein have the advantage that low noise allows for both accurate disparity determination and good confidence determination, rather than a mere trade-off between the two. Accordingly, according to a further embodiment, a correspondence analyzer is provided comprising a computing unit (3) that includes a specific convolution of the image signals described herein, independently of the correspondence determination described herein, and in particular independently of the preceding claims. In this correspondence analyzer, the computing unit (3) is configured
- to select image patches from the two digital individual images (25, 26), wherein at least one image patch from one of the two digital individual images is selected as the reference image patch and the search image patches are selected from the other digital individual image, and
- to calculate multiple candidate disparity values from the image patches.
The computing unit (3) is further configured to select information from the reference image patch and the search image patches and, based on this information, to determine confidence vectors for the possible disparity values that are suitable for estimating whether each result indicates an actual correspondence between the reference image patch and the respective search image patch. This is particularly useful when the confidence vectors provide information that has not yet been provided by the correspondence function, or has not been provided with the same quality. Therefore, the computing unit selects the value of at least one element of the confidence vector using a function that, for at least some classes of reference image patches and search image patches, can classify candidates as valid or invalid with a higher probability than when using the correspondence function alone. The aforementioned constant difference in luminance is one example.
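To make the luminance example concrete, the following sketch computes one possible element of a confidence vector from the relative difference of the mean patch intensities. The function `luminance_confidence`, its formula, and the synthetic patches are assumptions made for illustration. The disclosure does not specify how the confidence elements are computed at this point.

```python
import numpy as np

def luminance_confidence(ref_patch, search_patch):
    """Hypothetical confidence element in (0, 1]; 1 means equal mean luminance."""
    m_ref = float(np.mean(ref_patch))
    m_search = float(np.mean(search_patch))
    rel_diff = abs(m_search - m_ref) / max(abs(m_ref), 1e-12)
    return 1.0 / (1.0 + rel_diff)

rng = np.random.default_rng(1)
ref = 100.0 + rng.normal(scale=2.0, size=(8, 8))   # synthetic reference patch
candidate_a = 1.30 * ref                           # 30% brighter everywhere
candidate_b = 1.05 * ref                           # 5% brighter everywhere

conf_a = luminance_confidence(ref, candidate_a)
conf_b = luminance_confidence(ref, candidate_b)
```

A mean-free correspondence function would rate both candidates as equally good matches, whereas this separate element ranks the candidate with the 5% deviation higher.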

[0039] Despite determining disparity with low noise, residual noise remains, which can be related to both the correspondence function and the confidence value. Residual noise can be further reduced by processing the calculated disparity values or confidence vectors for multiple reference image patches with a low-pass filter. Compared to prior art, and especially compared to other methods that perform low-pass filtering before the signal is used to determine disparity, processing the entire signal bandwidth and performing low-pass filtering downstream of the correspondence analysis achieves more effective noise reduction with comparable contrast and resolution to the disparity measurements in individual images. Furthermore, the low-pass filter can ensure that less reliable measurement results are included in smaller quantities. Therefore, in one embodiment, a configuration is considered in which the computing unit filters at least one of the calculated disparity values, confidence values, or confidence-weighted disparity values using a low-pass filter.
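One way to let less reliable measurements contribute less to the low-pass filtered output is a confidence-weighted average. The sketch below uses a one-dimensional box filter for brevity. The filter type, its size, and the function name `weighted_lowpass` are assumptions, not details taken from the disclosure.

```python
import numpy as np

def weighted_lowpass(disparity, confidence, size=5):
    """Box low-pass of disparity values, each weighted by its confidence."""
    kernel = np.ones(size)
    num = np.convolve(disparity * confidence, kernel, mode="same")
    den = np.convolve(confidence, kernel, mode="same")
    return num / np.maximum(den, 1e-12)

disparity = np.full(11, 10.0)
disparity[5] = 14.0              # one outlier measurement ...
confidence = np.ones(11)
confidence[5] = 0.1              # ... that was rated as unreliable

plain = weighted_lowpass(disparity, np.ones(11))
weighted = weighted_lowpass(disparity, confidence)
```

At the outlier position the unweighted filter yields 10.8, whereas the confidence-weighted filter stays close to 10, so the unreliable value has far less influence on the result.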

[0040] The search image patch is selected so as to be located at least approximately along or on the epipolar line. Thus, the signal of the search image patch forms a one-dimensional function approximately along the epipolar line. Furthermore, the parallax is given by the length of the curve between corresponding pixels along the epipolar line. The expressions "approximately along the epipolar line" or "approximately perpendicular to the epipolar line" are used to indicate that the actual epipolar line does not need to extend strictly along the image direction of the rectified image, for example, due to adjustment inaccuracies or optical distortions. Therefore, within a given range of inaccuracies, the term "approximately along the epipolar line" should be synonymous with "along the epipolar line," and the term "approximately perpendicular to the epipolar line" should be synonymous with "perpendicular to the epipolar line."

[0041] Generally, it is useful to select the order of search image patches so that the epipolar line passes through the search image patch, or so that the search image patch contains the epipolar line. As long as the epipolar line passes through the search image patch, the search image patch is generally located on the epipolar line.

[0042] The expected parallax range is a predetermined maximum range in the x-direction, or along the epipolar line, where the search image patch corresponding to the reference image patch can be located. For example, the expected parallax range could be ±50px in the x-direction around the pixels of the digital image where the parallax is determined.

[0043] The present invention also relates, in particular, to a method for determining disparity performed using a correspondence analyzer described herein. Preferably, a method is provided for determining the disparity of corresponding pixels in two digital individual images rectified to a stereo normal state, wherein a computing device is used to determine the disparity δ by performing the following steps:
- selecting image patches from the two digital individual images, wherein an image patch of one of the two digital individual images is selected as the reference image patch and a series of search image patches is selected from the other of the two digital individual images, and
- generating v_max signals YL_signal,v from the reference image patch and v_max signals YR_signal,v from each of the search image patches, and
- convolving, in a spatial window, the signals YL_signal,v of the reference image patch with an even convolution kernel comprising a weighted sum of a plurality of even harmonic functions of different spatial frequencies and with an odd convolution kernel comprising a weighted sum of a plurality of odd harmonic functions of different spatial frequencies, the kernels being stored in memory, and
- convolving, in a spatial window, the signals YR_signal,v of each of the search image patches with the convolution kernels stored in memory, and
- calculating, for each signal pair YL_signal,v and YR_signal,v, the difference between the respective convolution results, and
- nonlinearly processing the differences of the convolution results for each search image patch and accumulating them to obtain the function value of the correspondence function SSD(δ_p) at the point δ_p, where δ_p is the distance of the search image patch from the reference image patch, or calculating the first derivative SSD'(δ_p) with respect to δ at the point δ_p from the differences of the convolution results, thus obtaining the function value of the correspondence function SSD(δ_p) or of its first derivative SSD'(δ_p) at the points δ_p, and
- determining the extrema of the correspondence function SSD(δ_p) or the zero crossings of its first derivative SSD'(δ_p), and
- outputting one of the extrema δ_p or one of the zero crossings δ_p as the parallax δ, or calculating and outputting a subpixel-precision value of the parallax in the vicinity of the point δ_p.
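The sequence of steps in paragraph [0043] can be sketched for a single signal pair (v_max = 1) on a synthetic one-dimensional signal. Everything specific in the sketch is an assumption made for illustration: the harmonic orders and weights, the 8 px window with T = 16 px, the random test signal, and the function names. The window features are computed as dot products, which equals a convolution with the mirrored kernel evaluated at one position.

```python
import numpy as np

def harmonic_kernels(T=16.0, width=8, orders=(1, 2, 3), weights=(1.0, 0.5, 0.25)):
    """Even and odd kernels as weighted sums of cosine and sine harmonics."""
    x = np.arange(width) - (width - 1) / 2.0      # positions centred on the window
    omega = 2.0 * np.pi / T
    even = sum(w * np.cos(m * omega * x) for m, w in zip(orders, weights))
    odd = sum(w * np.sin(m * omega * x) for m, w in zip(orders, weights))
    return even, odd

def features(signal, centre, kernels):
    """Kernel responses of the window around `centre`."""
    width = len(kernels[0])
    start = centre - width // 2
    window = signal[start:start + width]
    return np.array([np.dot(k, window) for k in kernels])

def ssd_curve(yl, yr, x, deltas, kernels):
    """SSD(delta_p): summed squared differences of the kernel responses."""
    ref = features(yl, x, kernels)
    return np.array([np.sum((ref - features(yr, x + d, kernels)) ** 2)
                     for d in deltas])

rng = np.random.default_rng(0)
yl = rng.normal(size=64)                 # synthetic reference row signal
true_shift = 3
yr = np.roll(yl, true_shift)             # yr[x + 3] == yl[x]

kernels = harmonic_kernels()
deltas = np.arange(-5, 6)                # candidate disparities delta_p
ssd = ssd_curve(yl, yr, 32, deltas, kernels)
best = int(deltas[np.argmin(ssd)])       # pixel-accurate parallax estimate
```

In this noise-free toy case the minimum of SSD(δ_p) lies exactly at the simulated shift. A real implementation would process several signal pairs per patch and refine the result to subpixel precision.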

[0044] The present invention also relates to a stereo camera comprising two cameras, each having a camera sensor and a lens, the optical centers of which are spaced apart from each other by the base width. The stereo camera is equipped with a correspondence analyzer as described above or is configured to perform the method described above. However, a configuration including two cameras is not essential. In principle, 3D data can also be obtained from digital images taken sequentially at different locations.

[0045] The primary application of a correspondence analyzer is the determination of parallax in stereo images. Therefore, the present invention also relates to a stereo camera comprising a correspondence analyzer and an imaging device that captures pairs of digital images of overlapping imaging regions from two spaced-apart viewing directions. The computing unit of the correspondence analyzer calculates the distance coordinates of pixels from the parallax of corresponding pixels. The distance between the viewing directions (optical centers) is the base B. The distance coordinate Z can then be calculated by the computing unit as Z = B·f / δ (where δ is in units of [mm]) according to the aforementioned formula 1.
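A short numerical example of Z = B·f / δ follows. Because δ enters the formula in millimetres, a disparity measured in pixels is first multiplied by the pixel pitch. The function name and all numeric values (base, focal length, pixel pitch, disparity) are hypothetical and chosen only to illustrate the arithmetic.

```python
def distance_from_disparity(disparity_px, base_mm, focal_mm, pixel_pitch_mm):
    """Distance coordinate Z = B * f / delta, with delta converted to millimetres."""
    disparity_mm = disparity_px * pixel_pitch_mm
    if disparity_mm <= 0.0:
        raise ValueError("disparity must be positive for a finite distance")
    return base_mm * focal_mm / disparity_mm

# Hypothetical camera: base 100 mm, focal length 8 mm, 4 micrometre pixels.
z_mm = distance_from_disparity(disparity_px=20.0, base_mm=100.0,
                               focal_mm=8.0, pixel_pitch_mm=0.004)
```

With these assumed values, a disparity of 20 px corresponds to 0.08 mm on the sensor and therefore to a distance of about 10 m.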

[0046] The present invention, its background, and advantages will be described in more detail below with reference to the accompanying drawings.

[Brief description of the drawings]

[0047]
[Figure 1] Figure 1 shows a camera lens equipped with an adjustment device for adjusting the position of the optical axis.
[Figure 2] Figure 2 shows a distorted grid and a rectified grid obtained by imaging with a camera.
[Figure 3] Figure 3 shows the epipolar geometry for the general case and for the stereo normal state.
[Figure 4] Figure 4 shows a graph in which the image signals YL_signal,v and YR_signal,v are shifted relative to each other.
[Figure 5] Figure 5 shows the function values of an example convolution kernel that performs convolution of image data in the y-direction perpendicular to the epipolar line.
[Figure 6] Figure 6 shows 3D images before (image (a)) and after (image (b)) low-pass filtering.
[Figure 7] Figure 7 shows a graph of the spatial frequency profile.
[Figure 8] Figure 8 shows the quasi-linear relationship (characteristic) between the actual shift δ_sim, for random amplitude A, phase Δ, and disparity δ_sim of the image input signal shown in graph (a), and the subpixel interpolation δ calculated from the average subpixel interpolation of all signals within the defined domain ±0.5px shown in graph (b).
[Figure 9] Figure 9 shows the camera image and the corresponding 3D data determined by the correspondence analyzer.
[Figure 10] Figure 10 shows the function values of a set of two even and two odd convolution kernels in a signal model that performs convolution of an image signal in the x-direction.
[Figure 11] Figure 11 shows the function values of the odd convolution kernel in Figure 10 and the even convolution kernel in the corresponding signal model.
[Figure 12] Figure 12 shows a stereo camera equipped with a correspondence analyzer.
[Figure 13] Figure 13 shows an example of the correspondence function SSD(δ) profile within a defined disparity range.
[Figure 14] Figure 14 is a schematic diagram of the calculation of the data stream for the features of the camera image.
[Figure 15] Figure 15 is a schematic diagram of the hardware configuration for processing the data stream.
[Figure 16] Figure 16 shows a stereo camera that uses sinusoidal luminance modulation to photograph a subject.
[Figure 17] Figure 17 shows the weighting of each pixel value, with graph (a) showing the weighting of pixel values using a box filter and graph (b) showing the weighting using a Gaussian filter.

[Modes for carrying out the invention]

[0048] Rectification: The purpose of rectification is to construct an epipolar geometry based on a stereo normal state model. A nonlinear geometric transformation corrects the lens distortion, projection distortion, and relative orientation of the two images (left and right images) independently of distance, so that the same points of the subject are imaged on the same line in the left and right camera images with sub-pixel precision. This reduces the correspondence analysis to a one-dimensional problem.

[0049] To achieve the most accurate rectification possible, the following three substeps can be performed:

[0050] Correction of the camera's internal orientation: This involves correcting for the nonlinear geometric distortion of the lens, the focal length f, and the unevenness of the camera sensor.

[0051] Adjustment of coplanarity conditions: The tilt of the optical axes of a stereo system is the main source of error outside of the calibration distance. This error is minimized by imposing a coplanarity condition on both axes. In practice, this condition can be achieved by an eccentric sleeve, for example in the form of a microlens, that holds the camera lens. The relative position of the optical axes can be determined, for example, by measuring test images at two or more distances, and then the position of one of the optical axes can be adjusted by rotating the eccentric so that both axes are coplanar.

[0052] Figure 1 shows an exemplary embodiment of a lens mount 10 having a lens 8. The lens mount 10 comprises two eccentric elements 11, 12 that are rotatable relative to each other. The lens 8 is screwed into the eccentric element 11. By rotating the eccentric elements 11, 12 relative to each other, the position of the optical axis of the lens 8 can be changed without changing the distance between the lens and the image sensor, i.e., while maintaining the position of the image plane. After adjustment, the eccentric elements 11, 12 can be clamped and fixed relative to each other by a screw 13. In one embodiment, one of the lenses is held in an adjustable eccentric comprising two eccentric elements 11, 12, and the coplanarity of the optical axes of the lenses can be adjusted by rotating the lens in the eccentric in front of a test image. This embodiment of the stereo camera can be used, in particular, independently of the correspondence analyzer according to this disclosure and the special processing of image data described herein. A stereo camera with an eccentric for adjusting the coplanarity of the axes is also useful in combination with other image processing techniques, as will be apparent to those skilled in the art. Therefore, more generally, and not limited to the correspondence analyzers described herein, a stereo camera 2 is provided having two cameras 21, 22, each including a camera sensor 5 and a lens 8, 9, wherein the optical centers of the cameras are spaced apart from each other by the base width B, and at least one adjustable eccentric is provided, which allows the orientation and position of the optical axis of one of the lenses 8, 9 to be changed so that coplanarity errors of the optical axes of the lenses can be corrected. The eccentric may be configured as described above, but variations thereof are also possible. 
For example, the lenses may be fixedly mounted to each other, and the eccentric may be used to adjust one of the cameras relative to the corresponding lens.

[0053] Correction of the camera's external orientation: Once the internal camera orientation correction is complete, the remaining external orientation correction to be achieved is the affine transformation involving rotation and translation.

[0054] Rectification is based on the principle of a virtual camera (VIRCAM). The camera stores rectification data in a table that holds, for each target coordinate (i,j) within the epipolar grid, the position of the actual (x,y) coordinates in image I. Since the coordinates (x,y) are generally non-integer, interpolation in a 2x2px region around the position is advantageous for minimizing noise. VIRCAM scans the virtual grid. For each virtual grid point, image I is interpolated in a 2x2px region to obtain the value on the target grid (i,j). This geometric correction is nonlinear.
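The table-driven resampling described in paragraph [0054] can be sketched as a lookup of fractional (x,y) positions followed by interpolation in the surrounding 2x2 px region. Bilinear weighting is assumed here as the interpolation rule. The function name `remap_bilinear` and the small synthetic table are hypothetical and are not calibration data of a real camera.

```python
import numpy as np

def remap_bilinear(image, map_x, map_y):
    """Sample `image` at fractional (x, y) positions using its 2x2 neighbourhood."""
    x0 = np.clip(np.floor(map_x).astype(int), 0, image.shape[1] - 2)
    y0 = np.clip(np.floor(map_y).astype(int), 0, image.shape[0] - 2)
    fx = map_x - x0
    fy = map_y - y0
    top = (1 - fx) * image[y0, x0] + fx * image[y0, x0 + 1]
    bottom = (1 - fx) * image[y0 + 1, x0] + fx * image[y0 + 1, x0 + 1]
    return (1 - fy) * top + fy * bottom

# Synthetic image with I(x, y) = 5*y + x, so exact values are known everywhere.
image = np.arange(20, dtype=float).reshape(4, 5)

# Hypothetical rectification table: target point (i, j) -> source position (x, y).
rows, cols = np.meshgrid(np.arange(3), np.arange(4), indexing="ij")
map_x = cols + 0.5
map_y = rows + 0.25

rectified = remap_bilinear(image, map_x, map_y)
```

Because the synthetic image is linear in x and y, the bilinear result reproduces 5·y + x exactly at every table entry, which makes the sketch easy to verify.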

[0055] Image (a) in Figure 2 illustrates an example of distortion of a regular grid in a camera image. Due to lens distortion, a regular grid in the subject space is distorted, for example, into a barrel shape as shown. This distortion and projection distortion are corrected by rectification in VIRCAM. This involves a virtual transformation of the image coordinates (x,y) to the VIRCAM coordinate system (i,j). This rectification allows the VIRCAM stereo image pair to behave as if it were a normal stereo image. In image (b), sections of the target grid are shown superimposed on the actual (x,y) coordinates, which are shown as points.

[0056] Figure 3 shows the epipolar geometry of a stereo image pair including images 104, 105, epipoles 98, 99, and the epipolar plane 102. Figure (a) shows a typical stereo state. Figure (b) shows a stereo normal state. The epipolar geometry describes the linear relationship between the camera orientation and the point correspondence between pixel 103 in image 104 and pixel 106 in the other image 105. The corresponding pixels 103, 106 are located on the epipolar line 107. When a point correspondence is found, the associated 3D point 101 arises from the parameters of the stereo camera (focal length and base) and the pixel correspondence, i.e., the pixels 103, 106 corresponding to the 3D point.

[0057] Mathematical derivation: From each of the rectified images (YL_image or YR_image) of the stereo camera in the stereo normal state, v_max row signals YL_signal,v or YR_signal,v (v = 1...v_max) are selected. These row signals can be taken directly from the rectified images (e.g., as the intensity values of the individual rows of YL_image and YR_image) or obtained after a prior convolution in the y-direction, perpendicular to the row direction of the rectified image, using k_y even and l_y odd convolution kernels. It is also possible to obtain the row signals by performing the y-direction convolution after the x-direction convolution. In other words, the order of the convolution operations is interchangeable. In particular, the computing unit can be configured to generate the v_max signal pairs YL_signal,v and YR_signal,v by convolving the image patches in the y-direction with a set of v_max = k_y + l_y convolution kernels defined in a spatial window of -T/4…+T/4. The y-direction is the image direction approximately perpendicular to the epipolar lines. To calculate disparity optimally, it is advantageous to limit the bandwidth to the spectrum of the actual signals. Recommended sizes for the spatial window and for T follow in the same way as for the convolution window in the x-direction, which is discussed further below. Any convolution in the y-direction can be separated from the convolutions in the x-direction. Performing the y-direction convolution first is not mandatory, but it is advantageous.

[0058] Example convolution kernels f_y,v for v_max = 5 and T = 16px are shown in Table 1 (the columns represent the positions within the convolution kernel). Figure 5 shows the function values of the convolution kernels in the y-direction from Table 1. For a precisely rectified stereo image, there are many similar convolution kernels that have the same effect, and v_max can also take values other than 5. In practical applications, rectification has tolerances, and the resulting noise is discussed further below. Furthermore, as discussed later, noise can be further reduced by using different forms of convolution kernels.
[Table 1]

[0059] In further embodiments, only a portion of the convolution kernels described above may be used. For example, one of the five convolution kernels listed in the table may be omitted, so that a set of four convolution kernels is selected. In one embodiment, the convolution kernels f_y,2, f_y,3, f_y,4, and f_y,5 are used, that is, the convolution kernel f_y,1 is omitted. This embodiment still yields good results, with slightly increased noise and reduced computational complexity.

[0060] Therefore, for each row y (along the epipolar line), discrete one-dimensional functions YL_signal,v(x) and YR_signal,v(x) are obtained for the left and right cameras, respectively. In general, these convolution kernels can consist of function values that represent a weighted sum of multiple even harmonics, called an "even convolution kernel," or a weighted sum of multiple odd harmonics, called an "odd convolution kernel." Each harmonic function samples a different spatial frequency.

[0061] Next, for a specific row y, sub-signals within a window are extracted, namely at position x from YL_signal,v and at position (x+δ) from YR_signal,v. Here, the left camera is the reference camera. The right camera can equally be selected as the reference camera (i.e., position x in YR_signal,v and position (x+δ) in YL_signal,v). The correspondence function SSD(δ) is obtained by calculating the similarity of the two windows as a function of the shift δ within the disparity range around position x. Finally, the extrema of the correspondence function SSD(δ) are found and, as needed, filtered using further criteria. By solving the correspondence function SSD(δ) for δ, the disparity δ determined in this procedure can be assigned to the position (x,y) in the image plane of the reference camera. The 3D data is then calculated by back-projecting the disparity δ into the subject coordinate system. To illustrate this, Figure 4 shows exemplary signals YL and YR at positions shifted relative to each other pixel by pixel. In the middle graph, the relative shift corresponds to disparity δ, in the upper graph the shift is δ-1, and in the lower graph the shift is δ+1. The agreement between signals YL and YR is greatest in the middle graph, which is why the disparity δ is close to the actual disparity of the locally imaged subject. However, because the shifts are in whole pixels, the match with the actual parallax is not perfect.

[0062] To generate high-quality 3D data, low-noise interpolation of the disparity δ is required between the grid positions of the discrete signal functions YL_signal,v(x) and YR_signal,v(x). This process is called subpixel interpolation and is performed by the computing unit of the correspondence analyzer, as will be described in more detail later. Two prerequisites favor successful subpixel interpolation: integrating very small, noisy signal components distributed across the spatial frequency spectrum as completely and accurately as possible, and generating a known functional profile of the correspondence function SSD(δ) near the extremum that is largely independent of the specific signal form of the window signal.
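Why a known profile of SSD(δ) near the extremum matters can be seen from a generic three-point parabola fit, a common subpixel technique that is used here purely as an illustration. It is not the interpolation of the disclosure, which is described later. The function name and the synthetic quadratic profile are assumptions.

```python
def parabolic_subpixel(ssd_minus, ssd_centre, ssd_plus):
    """Offset of the parabola vertex relative to the centre sample, in pixels."""
    denom = ssd_minus - 2.0 * ssd_centre + ssd_plus
    if denom <= 0.0:
        return 0.0                      # no well-defined minimum; keep the integer value
    return 0.5 * (ssd_minus - ssd_plus) / denom

# Synthetic, exactly quadratic SSD profile with its minimum at delta = 2.3 px.
true_delta = 2.3
samples = {d: (d - true_delta) ** 2 for d in (1, 2, 3)}

estimate = 2 + parabolic_subpixel(samples[1], samples[2], samples[3])
```

For an exactly quadratic profile the fit recovers the minimum without error. If the profile near the extremum varied with the texture of the window signal, the same fit would introduce a texture-dependent bias.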

[0063] By analogy with the Küpfmüller uncertainty relation, formulated in 1924 for the time domain in communications engineering (and itself analogous to Heisenberg's), there is a conflict between high spatial resolution and high spatial frequency resolution. Therefore, in a small window, for example 8px wide, as is desirable for high spatial resolution, it is impossible to convolve the signals YL_signal,v and YR_signal,v with a sufficiently small bandwidth in the spatial frequency domain, i.e., at a single spatial frequency. After convolution, the spatial frequency signals used for further interpolation are superimposed by components of other spatial frequencies. Therefore, the result of convolving a real signal cannot be regarded, without error, as the result of convolving a purely harmonic signal. Thus, the phase determination using only one spatial frequency, as in the prior art, is affected by noise.
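The superposition by components of other spatial frequencies can be reproduced numerically. The sketch below compares the response of a single-frequency cosine kernel to a signal of a different harmonic, once over a full period T = 16 px and once over an 8 px window. The function name `response` and the choice of harmonics 1 and 2 are assumptions made for illustration.

```python
import numpy as np

T = 16.0
omega = 2.0 * np.pi / T

def response(width, kernel_order, signal_order):
    """Response of a cosine kernel to a cosine signal within a centred window."""
    x = np.arange(width) - (width - 1) / 2.0
    kernel = np.cos(kernel_order * omega * x)
    signal = np.cos(signal_order * omega * x)
    return float(np.dot(kernel, signal))

leak_full_period = response(16, 1, 2)     # other harmonic, window = full period
leak_short_window = response(8, 1, 2)     # other harmonic, window = 8 px
wanted_short_window = response(8, 1, 1)   # matching harmonic, window = 8 px
```

Over the full period the kernel rejects the neighbouring harmonic, but in the 8 px window the unwanted response is roughly 40% of the response to the matching harmonic. This is the measurement error that a single-frequency evaluation in a small window cannot avoid.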

[0064] The objective of the present invention is to convolve the signals YL_signal,v and YR_signal,v with multiple convolution kernels that are optimized to act across the entire window, such that the theoretically unavoidable errors (particularly those resulting from the specific choice of signal forms for small convolution kernels) largely cancel each other out, and then to combine the convolution results into a correspondence function SSD(δ). In contrast to conventional techniques, it is not necessary to reduce the fundamental measurement error of the windowed Fourier transform (WFT) by pre-processing the image signal with a low-pass filter. The residual error remaining after this compensation is removed by a low-pass filter (hereinafter referred to as the output low-pass filter) that is applied only to the computed 3D data or to the set of disparity measurement results underlying the 3D data. Specifically, the goal is to detect, as a whole, the common disparity signal contained in the correspondence function SSD(δ), which consists of signal components at multiple spatial frequencies. The result of solving the correspondence function SSD(δ) for δ is referred to below as the group disparity.

[0065] Before the discussion is extended to the real situation, the explanation is first simplified by assuming an idealized stereo camera and a continuous signal model. In this simplification, the idealized stereo camera provides (instead of YL_signal,v and YR_signal,v) two ideal row signals YL_ideal and YR_ideal, which can be modeled as Fourier series with m_max elements over the interval T, as shown in Equation 4.

[Equation 4]

[0066] In the idealized stereo camera, the transfer functions of both cameras are the same, and there are no camera-specific signal errors (e.g., reflections). Therefore, the amplitudes A_m and the phases Δ_m can be assumed to be the same for both cameras. Thus, YL_ideal and YR_ideal differ only by the shift due to the disparity δ. The index m determines the respective spatial frequency in the ideal signal. Here, ω is defined as 2π/T.

[0067] In the next step, the even convolution kernels f_even,k and the odd convolution kernels f_odd,l used for processing YL_ideal and YR_ideal are defined. These convolution kernels can be modeled as Fourier series in phase form, as shown in Equation 5. The coefficient vectors c_k,n and s_l,n in Equation 5 determine the weighting of the respective harmonic functions at spatial frequency n of the convolution kernel. n_max is equal to m_max in Equation 4. k_max and l_max are the numbers of even and odd convolution kernels, respectively.

[Equation 5]

[0068] The idealized signals YL_ideal and YR_ideal and the convolution kernels f_even,k and f_odd,l are continuous functions. Digitization is considered separately. The spatial window is preferably half the size of the interval T, in particular from -T/4 to +T/4. As a result, some convolution kernels contain incomplete periods, or fragments. Including fragments has the advantage of allowing more spatial frequencies to be packed into a smaller convolution kernel. In one embodiment, a window smaller than the interval T is generally selected, and window sizes other than -T/4 to +T/4 can also be used.

[0069] The illustrated exemplary embodiment uses a window size T/2 = 8 px and an interval T = 16 px. Preferably, four spatial frequencies can be accommodated in such a window (i.e., m_max = 4 in Equation 4). The window size, and with it the number of spatial frequencies, varies depending on the desired application, but four is usually sufficient. The influence of individual spatial frequencies on the correspondence function can be strengthened or weakened by appropriately selecting the profile and the convolution kernels described below. The optimal window size follows from the trade-off between 3D resolution and signal-to-noise ratio. This trade-off varies depending on the image content and the desired application. A practical upper limit for the spatial frequencies corresponds to a period of 4 pixels in the image. Higher spatial frequencies cause undesirable nonlinear behavior of the phase characteristic (Figure 8). In modern CMOS camera sensors with pixel pitches from 2 μm to 4 μm, such signal components are weak because the lens OTF and the filters used in color cameras to convert the Bayer format to YUV have a low-pass effect, which limits the resolution to about 100 line pairs per mm.
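
As a minimal sketch of how such kernels might be sampled, the following Python code builds one even and one odd kernel on the 8 px window with T = 16 px. It assumes NumPy, a pure cosine sum for even kernels and a pure sine sum for odd kernels, and pixel centers placed symmetrically in the window. Equation 5 is not reproduced in this text, so the exact kernel form, the helper names, and the example weights are assumptions.

```python
import numpy as np

T = 16                    # analysis interval in px
WINDOW = 8                # window width in px (T / 2)
OMEGA = 2 * np.pi / T

# Assumed sampling grid: 8 pixel centers placed symmetrically in [-T/4, +T/4].
X = np.arange(WINDOW) - (WINDOW - 1) / 2

def even_kernel(c):
    """Sum of c[n-1] * cos(n*omega*x) for n = 1..len(c)."""
    n = np.arange(1, len(c) + 1)
    return np.cos(np.outer(X, n) * OMEGA) @ np.asarray(c, dtype=float)

def odd_kernel(s):
    """Sum of s[n-1] * sin(n*omega*x) for n = 1..len(s)."""
    n = np.arange(1, len(s) + 1)
    return np.sin(np.outer(X, n) * OMEGA) @ np.asarray(s, dtype=float)

# Placeholder weights, not the coefficients of the patent's Equations 22/23.
f_even = even_kernel([1.0, 0.5, 0.0, 0.0])
f_odd = odd_kernel([0.0, 0.0, 0.5, 1.0])
print(np.round(f_even, 3))
print(np.round(f_odd, 3))
```

With T = 16 px the first harmonic covers only half a period inside the window, which is the "fragment" situation described in [0068].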

[0070] Here, Fourier analysis over the interval T is used to determine the optimal convolution kernels for the group disparity. To simplify the mathematical relationships, the convolution kernels are initially assumed to have high spectral purity (i.e., c_k,n = s_l,n = 1 when n = k or n = l, respectively, and 0 otherwise).

[0071] This allows the convolution integral to be calculated individually and analytically for each combination of a signal component of YL_ideal or YR_ideal with a component of the even and odd convolution kernels. This yields n_max × m_max components of the convolution results C_YL and C_YR (shown in Equation 6 for the even and odd convolution kernels).

[Equation 6]

[0072] The differences of the convolution results, (ΔRL_even)_n,m = (C_YR,even)_n,m - (C_YL,even)_n,m and (ΔRL_odd)_n,m = (C_YR,odd)_n,m - (C_YL,odd)_n,m, are then calculated from these components for each n and each m.

[0073] By converting the differences of trigonometric functions into products, the convolution results can be summarized in matrix form.

[0074] In the exemplary embodiment, m_max = 4 and n_max = 4. Equation 7 shows the coefficient matrices AEV and AOD. Equation 8 shows the matrix notation for the difference ΔRL_even of the even signals, based on the coefficient matrix AEV and the signal vector S_even. Equation 9 shows the matrix notation for the difference ΔRL_odd of the odd signals, based on the coefficient matrix AOD and the signal vector S_odd. If a different range of spatial frequencies is selected than in the exemplary embodiment, the coefficient matrices AEV and AOD change accordingly. The coefficient matrices AEV and AOD are normalized and simplified so as to be independent of T. As an additional condition, K_even^2 = K_odd^2 holds for the constants K_even and K_odd in Equations 8 and 9. Since K_even^2 and K_odd^2 cancel out in Equation 11, no further consideration is necessary.

[Equations 7 to 9]

[0075] To return from the convolution kernels of high spectral purity to more general convolution kernels, the signal differences ΔRL_even and ΔRL_odd are each multiplied, as scalar products, by the coefficient vectors c_k and s_l. The sum of the components of the vectors ΔRL_even and ΔRL_odd weighted by c_k and s_l represents the difference of the features.

[0076] Therefore, the difference of the features for the general even or odd convolution kernels given in Equation 5 is, in each case, the difference between the convolution results of the signals YR_ideal and YL_ideal, expressed in terms of the general amplitudes A_m of Equation 4 and the weights c_k,n and s_l,n of these convolution kernels.

[0077] Here, the correspondence function SSD(δ) is defined as a sum of nonlinearly processed differences, in particular squared differences, of the features or convolution results; preferably, the feature differences of all convolution kernels are squared. Next, the structure of SSD(δ) is analyzed. For this purpose, it is convenient to first consider only one signal pair YL_ideal and YR_ideal together with k_max even convolution kernels and l_max odd convolution kernels, shown as SSD_one(δ) in Equation 10.

[Equation 10]
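
The definition in [0077] can be sketched in Python as a sum of squared feature differences, where each feature is the response of one row to one convolution kernel. The sketch assumes NumPy, uses one pure cosine and one pure sine kernel as stand-ins for the optimized kernels, and introduces the hypothetical helper `feature_ssd`; it is not the patent's Equation 10.

```python
import numpy as np

def feature_ssd(row_left, row_right, kernels, x, deltas):
    """SSD(delta) = sum over kernels of (feature_R[x+delta] - feature_L[x])**2."""
    feats_l = [np.correlate(row_left, k, mode="same") for k in kernels]
    feats_r = [np.correlate(row_right, k, mode="same") for k in kernels]
    return np.array([sum((fr[x + d] - fl[x]) ** 2
                         for fl, fr in zip(feats_l, feats_r))
                     for d in deltas])

# Example kernels: one even and one odd harmonic on an 8 px window (T = 16 px).
pos = np.arange(8) - 3.5
omega = 2 * np.pi / 16
kernels = [np.cos(omega * pos), np.sin(omega * pos)]

rng = np.random.default_rng(1)
left = rng.standard_normal(128)
right = np.roll(left, 2)              # true disparity: 2 px
deltas = np.arange(-4, 5)
ssd = feature_ssd(left, right, kernels, x=64, deltas=deltas)
best = int(deltas[np.argmin(ssd)])
print(best)                           # prints 2
```

Computing the features once per row and then differencing them per shift avoids re-filtering the window for every candidate δ.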

[0078] Inserting the elements of ΔRL_even and ΔRL_odd in product form according to Equations 8 and 9 and expanding the sum of squares allows the terms to be separated into a partial sum SSD_inv containing the squared amplitudes (e.g., A_1^2) and a partial sum SSD_var consisting of the mixed terms. SSD_inv is independent of the sign of the amplitudes A_m and can be further optimized by appropriately selecting the form of the convolution kernels, that is, the weights c_k,n and s_l,n. By the Pythagorean trigonometric identity, the terms containing corresponding cosine and sine components sum such that the dependence on Δ_m disappears completely.

[0079] In this case, SSD_inv is independent of the phase Δ_m and is invariant to a horizontal shift of the subject to be measured (parallel to the base of the stereo camera). SSD_inv is a function of the group disparity, and under specific conditions it allows the group disparity, i.e., the required signal S, to be calculated.

[0080] In a further embodiment, the convolution kernels are selected such that the convolution operations with the k_max even convolution kernels and the l_max odd convolution kernels each transmit the sum of the weighted signal components of a group of spatial frequencies, characterized by the amplitudes A_m and subject phases Δ_m for the various values of the index m. As a result, for each signal v and each spatial frequency of index m, the calculation of the correspondence function SSD(δ) yields two partial sums: a first term characterized by the squared amplitude A_m^2 from the results of the convolution operations with even functions, and a second term characterized by the squared amplitude A_m^2 from the results of the convolution operations with odd functions. The first and second partial sums are chosen such that, by the Pythagorean trigonometric identity, they combine so that the sum of both partial sums, SSD_inv(δ), is independent of the subject phase Δ_m.

[0081] The condition for this property of SSD_inv is that, in SSD_inv, the coefficients in front of the sin^2 part and the cos^2 part are equal for the same spatial frequency. Generalizing this to any desired number of convolution kernels and spatial frequencies, this condition for an optimal ideal disparity signal can be expressed as a system of nonlinear equations for each m, as shown in Equation 11. With Equation 11, the partial sum SSD_inv is obtained as a sum over the k_max + l_max convolution kernels and represents the complete signal obtained from the signal pair YL_ideal and YR_ideal. g_m is a weighting vector, which is explained in more detail below.

[Equation 11]

[0082] To determine a disparity with sufficiently low noise, the coefficients of the matrices AEV and AOD do not need to match the values given in Equation 7 exactly; each can vary within a range obtained by multiplying it by a factor between 0.8 and 1.2. Similarly, approximate solutions of the system of equations in Equation 11 are sufficient (for example, in Equation 11, the sum for the odd convolution kernels can differ from the sum for the even convolution kernels by a factor between 0.8 and 1.2).

[0083] By using the convolution kernels optimized according to the rule in Equation 11, the definition of the correspondence function SSD(δ) shown in Equation 12 and the definition of SSD_inv(δ) shown in Equation 13 are obtained.

[Equations 12 and 13]

[0084] In a more preferred embodiment, the convolution kernels are selected such that, using the signal model, the correspondence function can be expressed as the sum of a phase-independent function SSD_inv(δ) and a function SSD_var(δ,Δ) that depends on the subject phase Δ, as given by Equation 12. Initially, only SSD_inv is considered here. SSD_var represents a noise source whose impact can be minimized, as explained further below.

[0085] The ratio of the first derivative SSD'_inv(δ) (Equation 14) to the second derivative SSD''_inv(δ) (Equation 15), each taken with respect to δ, forms, under the assumption of Equation 16, a group disparity function (Equation 17) that contains the obtained positional information in a compact form.

[Equations 14 to 17]

[0086] For small δ in the subpixel range, a simple Taylor expansion of the group disparity function shown in Equation 17 yields a linear function of δ. This holds only in the vicinity of the zero crossing of the first derivative SSD'_inv(δ) (i.e., in the vicinity of the local minimum of SSD_inv(δ)), where sin(m·ω·δ) can be linearly interpolated with sufficient quality. The group disparity value δ_sub with subpixel precision, which is required for further calculation, is obtained, as shown in Equation 32 below, as the sum of the integer disparity at the zero-crossing position of the first derivative SSD'_inv(δ) and the fractional subpixel value of the group disparity function.
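
The near-linear behavior for small δ can be illustrated numerically. Since Equations 13 to 17 are not reproduced in this text, the Python sketch below assumes a model of the form SSD_inv(δ) = Σ_m w_m (1 - cos(m·ω·δ)), with w_m standing for g_m·A_m^2 up to a constant; this form, the weights, and the helper name `group_disparity` are assumptions chosen to be consistent with the m^2·ω^2·A_m^2 terms mentioned in [0091].

```python
import numpy as np

T = 16
OMEGA = 2 * np.pi / T
M = np.arange(1, 5)                    # spatial frequency indices 1..4

def group_disparity(delta, weights):
    """Ratio SSD'_inv / SSD''_inv for the assumed model
    SSD_inv(delta) = sum_m w_m * (1 - cos(m*omega*delta))."""
    first = np.sum(weights * M * OMEGA * np.sin(M * OMEGA * delta))
    second = np.sum(weights * (M * OMEGA) ** 2 * np.cos(M * OMEGA * delta))
    return first / second

w = np.array([1.0, 0.8, 0.5, 0.3])     # illustrative values only
for d in (0.05, 0.1, 0.25, 0.5):
    print(d, group_disparity(d, w))
```

Under this assumed model the ratio tracks δ closely for small shifts and overestimates it toward 0.5 px, most strongly for the highest spatial frequency.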

[0087] A typical characteristic curve can be obtained for the group disparity function of a real high-resolution stereo camera (Figure 8). Specifically, the graphs of Figure 8 plot the disparity determined using Equation 17 as a function of the actual disparity. In the ideal case, the value of the group disparity determined by Equation 17 is the same as the actual disparity (linear relationship). Graph (b) of Figure 8 shows that, within the domain [-0.5 px, 0.5 px], small deviations from the ideal straight line occur at larger subpixel positions, i.e., at disparity positions between two pixels. As shown in graph (a) of Figure 8, where curves are plotted for different random values of A_m and Δ_m, the deviation also depends on the image content. Graph (b) in Figure 8 plots the average of the curves shown in graph (a) in Figure 8. These linearity errors in the characteristic curves result in multiplicative noise.

[0088] When the aforementioned model is extended from one signal pair YL_ideal and YR_ideal to v_max signal pairs YL_ideal,v and YR_ideal,v (where v = 1...v_max), Equations 14 and 15 expand into Equations 18 and 19, respectively.

[Equations 18 and 19]

[0089] Because the extension amounts simply to a sum over all signals, Equation 17 remains valid after the extension to multiple signal pairs. This extension does not affect Equation 11.

[0090] Having explained the signals used in the group disparity function, noise is now considered. The goal is to minimize the noise N relative to the signal S. The noise consists mainly of sensor noise, noise caused by the influence of SSD_var, noise caused by differences between the analyzed ideal camera model and the actual stereo camera, and linearity errors in the characteristic curves of the group disparity function.

[0091] High-frequency white sensor noise includes quantum noise (also known as root noise), thermal noise, and several additional noise sources such as DSNU and PRNU. Sensor noise and the noise generated by SSD_var are, to a good approximation, uncorrelated, so they can be considered separately. Equations 15, 16, and 17 express the group disparity signal as the result of an integration weighted by g_m in the spatial frequency domain. Since each signal component of the group disparity signal at spatial frequency mω is expressed by m^2 ω^2 A_m^2, the terms (or amplitudes) with the largest magnitude dominate the transfer function. With these terms, the group disparity function can be understood as an adaptive filter (adapting to the current signal form) in the sense of Wiener (1949). The same terms are obtained when the signal pair YL_ideal and YR_ideal is processed with ideal (long) adaptive filters over narrow-bandwidth spatial frequency ranges and the results, including the measured amplitudes, are combined in a weighted manner to obtain a position signal. This corresponds to the signal processing of an optimal filter. Therefore, the group disparity provides the optimal signal-to-noise ratio with respect to sensor noise, and, as explained further below, it can be adapted to the spectrum of the signals YL_signal,v and YR_signal,v through a specific weighting g_m.

[0092] A low-pass filter, referred to as the output low-pass filter, is applied to the 3D data or to the set of disparity measurement results underlying the 3D data. In other words, it filters out high spatial frequencies in the spatial variation of the disparity. It thereby contributes to further noise optimization by reducing a specific portion of the noise that is present after the calculation of the group disparity. More generally, and not limited to the example shown, in a further embodiment the computing unit is designed to filter the calculated disparity values with a low-pass filter.

[0093] In one embodiment, the output low-pass filter is configured to reduce noise components at spatial frequencies above 2ω, preferably above 3ω, where the group disparity signal components are also small. Filtering after the calculation of the group disparity does not affect the high-frequency input signals with amplitudes A_3 and A_4 that form the group disparity signal. Therefore, without being limited to a specific exemplary embodiment, the correspondence analyzer according to one embodiment is configured to examine the input information without limiting the (signal) bandwidth used to calculate the disparity values. This contributes to improving the signal-to-noise ratio. On the other hand, the size of the analysis window, (8 × 8) px^2 in the exemplary embodiment, reduces the disparity transfer function starting at the period T/2, i.e., at 2ω. Therefore, the cutoff frequency of the two-dimensional output low-pass filter is set in the region of 2ω.
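
A minimal sketch of an output low-pass filter on a disparity map follows. The source does not specify the filter type, so the Python code assumes NumPy and a separable 8 × 8 moving average, whose first spectral null lies at a period of 8 px, i.e., at 2ω for T = 16 px. The function name `output_lowpass` and the synthetic test data are illustrative assumptions.

```python
import numpy as np

def output_lowpass(disparity, width=8):
    """Separable moving-average filter over a 2D disparity map."""
    k = np.ones(width) / width

    def smooth(v):
        return np.convolve(v, k, mode="same")

    rows = np.apply_along_axis(smooth, 1, disparity)
    return np.apply_along_axis(smooth, 0, rows)

rng = np.random.default_rng(2)
noisy = 10.0 + 0.1 * rng.standard_normal((100, 100))   # flat surface plus noise
filtered = output_lowpass(noisy)
inner = (slice(8, -8), slice(8, -8))                    # skip zero-padded border
print(noisy[inner].std(), filtered[inner].std())
```

Because the filter acts on the disparity values and not on the image signals, the high-frequency image content that carries the disparity information is left untouched, as the paragraph notes.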

[0094] Figure 6 shows 3D data of an essentially flat, white, textured wallpaper in an image area of 100 × 100 px^2, at a distance of 1850 mm and with a subject-side (x,y) resolution of 1 mm^2. Image (a) in Figure 6 shows the 3D data before output low-pass filtering, and image (b) shows the 3D data after output low-pass filtering. For better visibility, the distance resolution was increased to 0.2 mm.

[0095] Next, SSD_var is optimized without affecting the sensor noise optimization. SSD_var(δ,Δ) depends on the phases and on the signs of the amplitudes, i.e., on the lateral shift of the object being measured, and represents a pseudo-random interference variable that can be understood as additional low-frequency noise in the spatial frequency band from ω to 4ω (in the exemplary embodiment). The first step in minimizing the SSD_var noise component is achieved statistically by using multiple (v_max) signal pairs YL_signal,v and YR_signal,v, as a result of which the signal SSD_inv and the signal error SSD_var are averaged. To obtain the optimal solution, the correlation between the signal pairs must be almost eliminated, which is achieved by a suitable convolution in the y direction; under these conditions the noise is reduced by a factor of 1/(v_max)^(1/2).
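
The 1/(v_max)^(1/2) reduction for uncorrelated errors can be checked with a short Monte Carlo sketch in Python. It assumes NumPy and models the error of each signal pair as independent, unit-variance Gaussian noise; this is an idealization of the decorrelated case described above, and the function name is hypothetical.

```python
import numpy as np

def averaged_noise_std(v_max, trials=20000, seed=0):
    """Standard deviation of the mean of v_max independent unit-variance errors."""
    rng = np.random.default_rng(seed)
    errors = rng.standard_normal((trials, v_max))   # one error per signal pair
    return errors.mean(axis=1).std()

for v in (1, 4, 8, 16):
    print(v, averaged_noise_std(v), 1 / np.sqrt(v))
```

If the signal pairs remain correlated, the reduction is smaller than this ideal value, which is why the decorrelating convolution in the y direction matters.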

[0096] In the second step, the consideration of Equation 17, previously limited to SSD_inv, is extended to the sum of SSD_inv and SSD_var. In this way, the noise is represented by SSD'_var, which is expanded into a Taylor series in the same manner as SSD'_inv. Since, in the exemplary embodiment, the noise represented by SSD_var is reduced from a spatial frequency of 3ω upward, further investigation is needed only in the range from ω to 2ω. In the exemplary embodiment, extensive trigonometric transformations yield the partial sum SSD'_var,1 of SSD'_var at the lowest spatial frequency, given in Equation 20; the partial sum for 2ω can be calculated similarly.

[Equation 20]

[0097] The amplitudes and phases in Equation 20 depend on the image statistics and are almost uncorrelated, so the SSD_var noise component is minimized when the constants const1, const2, and const3 in Equation 20 are minimized. This is the case when the conditions shown in Equation 21 are met.

[Equation 21]

[0098] However, since these equations cannot be solved in general, it is sufficient to minimize the sum of the squared differences in Equation 21. In Equation 20, it can be assumed that the amplitude term A_1,v A_2,v is larger than A_2,v A_3,v, and that A_2,v A_3,v is larger than A_3,v A_4,v, so it is advantageous to approach the first condition in Equation 21 first, then the second, and then the third. When the coefficients c_1,3, c_1,4, c_2,1, c_2,2, s_1,3, s_1,4, s_2,1, and s_2,2 are zero (see also Equation 23 for the exemplary embodiment), a good approximation is already obtained; the system of equations in Equation 11 is then solved to optimize SSD_inv.

[0099] In particular, when there are only a few signal pairs, i.e., when v_max is small, the noise behavior is improved by optimizing the coefficients of the convolution kernels. To this end, the system of equations in Equation 11 is solved as a function of the coefficients c_1,3, c_1,4, c_2,1, c_2,2, s_1,3, s_1,4, s_2,1, and s_2,2, and the constants const1, const2, and const3 are calculated for each solution. The solution that minimizes const1, const2, and const3 is then selected. As explained further below, this can also be determined statistically using test images.

[0100] For all of these methods, Equation 11 is always satisfied, and the noise is further optimized using only the degrees of freedom remaining in Equation 11. In this way, optimization of the signal-to-noise ratio with respect to sensor noise is always achieved.

[0101] Another source of noise is that real stereo cameras do not always behave like the ideal system discussed so far. There are tolerances in the offsets and gains of the left and right cameras, as well as artifacts caused by reflections, and it is not guaranteed that the amplitudes in the two cameras are identical for the same point on the subject in each image patch. In addition, tolerances can also arise in rectification.

[0102] For example, offset tolerances of the cameras, which may arise from temperature fluctuations, are completely compensated by this method. Note that the so-called camera offset is set to a slightly positive value so that, for example, negative values of the sensor noise are not clipped at zero, which would distort the signal. Offsets can be transmitted by fragmented even convolution kernels, which can lead to disparity measurement errors. Therefore, it is advantageous to make the even convolution kernels mean-free so that spatial frequency zero is not transmitted into the disparity measurement.
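
The effect of a fragmentary even kernel on a constant camera offset, and its removal by subtracting the kernel mean, can be sketched in Python as follows. The code assumes NumPy, the 8 px window with T = 16 px, and a single cosine harmonic as the even kernel; the offset value of 12 is an arbitrary example.

```python
import numpy as np

pos = np.arange(8) - 3.5                 # assumed 8 px window grid
omega = 2 * np.pi / 16                   # T = 16 px
even = np.cos(omega * pos)               # fragmentary even kernel (half a period)
even_zero_mean = even - even.mean()      # remove the spatial-frequency-zero part

offset = np.full(8, 12.0)                # constant camera offset inside the window
print(even @ offset, even_zero_mean @ offset)
```

The raw kernel responds strongly to the constant offset, while the mean-free kernel gives a response of zero up to rounding. Odd kernels sampled on a symmetric grid sum to zero by construction, so they need no such correction.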

[0103] If the gain tolerance of the cameras is small, it is automatically corrected by the division in Equation 17 and generates no noise. Here it must be considered that only the equal portions of the amplitudes A_m contribute to signal generation. For example, if A_m of the left camera (AL_m) is greater than the corresponding A_m of the right camera (AR_m), the group disparity signal is obtained from AR_m^2, and the difference AL_m - AR_m generates noise. In particular, when the OTF or the gradients of the distortion correction differ, larger contrast differences between the cameras in the image corners are not corrected. In this case, the additional amplitude component of the higher-contrast camera is not included in the group disparity signal but is instead added to the interference signal N.

[0104] Finally, the signal-to-noise ratio can be further improved by optimizing the weighting coefficients g. The weighting coefficients can be calculated by simulating the signal-to-noise ratio. For a set of multiple random weighting vectors g, the respective coefficients of the convolution kernels are calculated using Equation 11 and, if necessary, Equation 21, and a further random number generator is used to generate sample vectors, each containing amplitudes A, phases Δ, and a target disparity δ_target. Here, the ratio A_m / A_1 is limited to the corresponding values of the spatial frequency transfer function, which consists of the lens OTF within the depth-of-field range and the resolution loss in the sensor electronics. Then, SSD(δ) is calculated as in Equation 10, and the disparity δ for one or more minima of SSD(δ) is determined by Equation 17. Based on a target/actual comparison of δ with δ_target, the mean measurement error over the random sample can be calculated for a particular weighting vector. Then, the weighting vector with the smallest mean error is selected from the set of weighting vectors. In this way, the optimal weighting vector g for a typical transfer function is obtained.

[0105] Furthermore, g can be determined by test measurements as shown in Figure 6. The local distance noise σ_z of the determined 3D data can be determined from the standard deviation of the distances from the determined points in the 3D data to the nominal position in space of the imaged subject (for example, to a plane approximating the textured wallpaper in Figure 6). For a specific shooting situation, the minimum of the distance noise σ_z can be determined as a function of the weighting vector g and the convolution kernel coefficients derived from it. Then, from an arbitrarily selected set of weighting vectors, the weighting vector with the lowest distance noise σ_z can be selected. The weighting vector g is determined only up to a constant factor, which cancels out in the division in Equation 17, leaving m_max - 1 relevant components of g.
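
One way to estimate the distance noise σ_z from measured 3D points of a flat target is sketched below in Python. It assumes NumPy, fits a least-squares plane z = a·x + b·y + c, and takes the standard deviation of the z residuals, which approximates the point-to-plane distance when the plane is nearly perpendicular to the viewing direction. The function name `distance_noise` and the synthetic data are assumptions for illustration.

```python
import numpy as np

def distance_noise(points):
    """Std of z residuals about a least-squares plane z = a*x + b*y + c."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    design = np.column_stack([x, y, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(design, z, rcond=None)
    return float(np.std(z - design @ coef, ddof=3))

# Synthetic flat target at about 1850 mm with 0.2 mm distance noise.
rng = np.random.default_rng(3)
xy = rng.uniform(0, 100, size=(10000, 2))
z = 1850.0 + 0.01 * xy[:, 0] - 0.02 * xy[:, 1] + 0.2 * rng.standard_normal(10000)
sigma_z = distance_noise(np.column_stack([xy, z]))
print(sigma_z)
```

Repeating this measurement for each candidate weighting vector g and keeping the one with the smallest σ_z corresponds to the selection procedure described in the paragraph.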

[0106] This is, for example, one way to define an optimal profile vector or weighting vector g for a selected subject, such as a textured wallpaper. The spectrum of the textured wallpaper can be used as a good approximation of a typical scene with natural subjects within the depth of field.

[0107] It is convenient to store various profiles with weighting coefficients on a stereo camera and adjust them as needed to suit the shooting conditions. To illustrate this, Figure 7 shows two examples of weighting coefficients g for two different shooting conditions and different spatial frequencies ω. This allows for parameter adjustments to optimize conditions, for example, for high-contrast images or images taken in fog.

[0108] Therefore, in one embodiment, to describe the optimal sensitivity of the correspondence function SSD(δ) in the spatial frequency domain, at least one profile vector of weighting coefficients g is provided to the computing unit, and this profile vector determines the weighting coefficients c_k,n and s_l,n of the Fourier series of the convolution kernels according to Equation 11. In one embodiment, a classification or profile vector can be selected, preferably by examining the optical transfer function based on the power spectrum of the data of an individual image or image patch. Based on this classification or profile vector, a plurality of correspondence functions and their convolution kernels are selected, acquired, or computed by the computing unit.

[0109] As shown in the example in Figure 7, multiple weighting vectors or profile vectors can be provided, which are selected by the computing unit depending on the image content or shooting conditions. More generally, multiple profile vectors g can be stored in the correspondence analyzer 1 for the same or for differently parameterized correspondence functions, and/or the correspondence analyzer 1 may be configured to compute one or more profile vectors with weights g at runtime. The correspondence analyzer 1 may further be configured to determine the local or global power spectrum of the image data and to use weights g that depend on the local or global power spectrum in the image, that is, to use them for the convolution of the image signals and the computation of the correspondence function. In particular, the correspondence analyzer may store multiple differently parameterized correspondence functions and their convolution kernels, preferably together with the respectively corresponding profile vectors g_m, or these may be determined at runtime. Furthermore, the correspondence analyzer is configured to select a subset of these multiple correspondence functions and their convolution kernels based on the current classification of individual images or image patches, or based on a classification of individual images or image patches that is advantageous for further processing. Preferably, the parameters of at least one correspondence function and its convolution kernels are selected such that, in the corresponding profile vector g_m, the weighting coefficient for the highest spatial frequency is smaller than at least one of the other weighting coefficients of this profile vector.

[0110] In the exemplary embodiment, for the spatial frequency with a 4-pixel period, the characteristic curve is stretched at absolute values of δ of 0.5 px, i.e., at a phase of π/4, so the weighting coefficient for the highest spatial frequency is subject to a trade-off. Therefore, when g is determined experimentally by measuring the signal-to-noise ratio, the weighting for the highest spatial frequency comes out reduced. However, since smaller values of δ are still measured accurately, the weighting does not become zero.

[0111] Similar to the x-direction, the convolution kernels in the y-direction can be obtained using a Fourier series analogous to Equation 4 and the rule for obtaining the optimal convolution kernels (Equation 11), and a second profile vector gy_m can be defined for this purpose. The sum of the squared convolutions in the y-direction then likewise forms an invariant partial sum containing the gy-weighted squared amplitudes of the Fourier series according to Equation 4, and a partial sum dependent on the subject phase in the y-direction is also obtained. This improves the signal-to-noise ratio, for example, in the case of rectification errors in real stereo cameras, such as those that may arise from temperature gradients, from mechanical loads, or at image corners. Furthermore, through a predetermined weighting of the spatial frequencies, the convolution kernels optimized in the y-direction in this way reduce errors that may occur when processing periodic structures. Since disparity measurements are not performed in the y-direction, the weighting for the highest spatial frequencies is not reduced.

[0112] The discussion so far concerns signal models with continuous functions. An exemplary embodiment of the implementation in real discrete systems is described next. First, the analysis interval T and the window size of the convolution kernels are specified. Two cases need to be distinguished here: one in which the stereo information is generated by texture or break edges that are transmitted by the OTF, spread across the window, and captured by high-frequency processing, and another in which the stereo information is generated by the angular dependence of diffuse reflection on an essentially homogeneous object, or by any low-frequency texture the subject may have, and is captured by low-frequency processing.

[0113] In the former case, the contrast is determined by the lens characteristics in the high spatial frequency range. In the latter case, it is determined by the lighting scenario as well as by the radius of curvature and tilt angle of the subject in the low spatial frequency range. These cases are explained with reference to the camera image shown in image (a) and the corresponding 3D data shown in image (b) in Figure 9. Image (a) is the left image of the stereo image pair from which the 3D data for image (b) were calculated. In image (b), the 3D data are displayed in grayscale (brighter pixels indicate greater distance from the camera, darker gray pixels indicate smaller distance, and black pixels indicate no distance information). The example of a ceramic mug with a uniform glossy surface, at a shooting distance of 1850 mm and an (x,y) resolution of 1 mm^2, demonstrates that areas with high-frequency stereo information can be detected with high subpixel interpolation quality. Glossy areas with low contrast can also be captured, but the quality degrades in the low-frequency range. By optimizing the system primarily for low-contrast, high-frequency textured surfaces, it becomes possible, for example, to capture a white textured wallpaper background without gaps and with high measurement accuracy.

[0114] In the former case, the signal spectrum is fully captured, while in the latter case, when the image is out of focus (blurred) within the depth of field, signal components with spatial frequencies at the edges are captured at the lower limit of 2π / T. The analysis interval T is optimally sized when the signal from an optimally focused texture does not significantly exceed the upper limit given by a period of 3-4 px. For a typical color camera with a Bayer filter, a range of approximately 16-70 LP/mm is usable. With a sensor pixel pitch of 3.75 μm, this requires T = 16 px and 4 spatial frequencies. In the next step, the window width is determined as a trade-off between 3D resolution and noise. A window width of 8 px is selected here, but other integer window widths are also possible. As the window width increases, the 3D resolution decreases and the signal-to-noise ratio increases. When the ratio of the analysis interval to the window width is not 2, the matrices A_EV and A_OD need to be adjusted.
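The relation between the analysis interval, the pixel pitch, and the usable range in LP/mm can be checked with a short calculation. The sketch below assumes that the n-th spatial frequency has a period of T/n pixels and that one line pair corresponds to one full period; the function name `harmonic_table` is hypothetical.

```python
# Values taken from the paragraph above.
PIXEL_PITCH_MM = 3.75e-3   # 3.75 um pixel pitch, expressed in mm
T_PX = 16                  # analysis interval T in pixels
N_FREQ = 4                 # number of spatial frequencies (w ... 4w)


def harmonic_table(pitch_mm, t_px, n_freq):
    """Return (harmonic index, period in px, line pairs per mm) per harmonic.

    Assumption: harmonic n has a period of t_px / n pixels and one line
    pair equals one period on the sensor.
    """
    rows = []
    for n in range(1, n_freq + 1):
        period_px = t_px / n
        lp_per_mm = 1.0 / (period_px * pitch_mm)
        rows.append((n, period_px, lp_per_mm))
    return rows


table = harmonic_table(PIXEL_PITCH_MM, T_PX, N_FREQ)
for n, period_px, lp_per_mm in table:
    print(f"{n}w: period {period_px:.2f} px, {lp_per_mm:.1f} LP/mm")
```

Under these assumptions the fundamental (16 px period) lies at about 16.7 LP/mm and the fourth harmonic (4 px period) at about 66.7 LP/mm, which is consistent with the stated 16-70 LP/mm range and the 3-4 px upper limit.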

[0115] In the next stage, the number of convolution kernels k and l can be selected. The best accuracy within an acceptable computational complexity can be achieved using two even convolution kernels and two odd convolution kernels. Alternatively, one even convolution kernel and two odd convolution kernels are also possible, although the accuracy is reduced and the computational complexity decreases. Using only one even convolution kernel and one odd convolution kernel results in a significant increase in noise. In this exemplary embodiment, k=2 and l=2. A larger number of convolution kernels is also possible.

[0116] Next, the convolution kernels are calculated. After offsetting a typical OTF profile and setting the weighting vector g = [0.917; 1.22; 2.25; 1.3], which makes a trade-off at the highest spatial frequency, the system of equations that determines the optimal form of the convolution kernels is constructed from the kernel coefficients c_k,n and s_l,n (Equation 22). Since the system of equations is underdetermined, unwanted high-frequency components are initially set to zero (Equation 23).

[Equation 22] [Equation 23]

[0117] Sixteen solutions are obtained for each nonlinear system of equations. From these, the real solutions are selected first, and then solutions that differ only in sign are eliminated. If there are no real solutions, the weighting vector can be adjusted. Two different solutions each are obtained for the coefficient vectors c and s (Equation 24). From these, the solutions with the smallest variance are selected (Equation 24, lines 1 and 3), because they transmit the least thermal noise, including DSNU and PRNU.

[Equation 24]
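The selection procedure described above (keep real solutions, drop sign duplicates, take the smallest variance) can be sketched as follows. This is only an illustration under assumptions: the solver that produces the sixteen solutions is not shown, "differ only in sign" is interpreted as a global sign flip of the whole vector, "variance" is taken as the population variance of the vector entries, and the candidate values are toy numbers, not those of Equation 24. The names `canonical` and `select_solution` are hypothetical.

```python
from statistics import pvariance


def canonical(sol, tol=1e-9):
    """Flip the global sign so that the first non-zero entry is positive."""
    for value in sol:
        if abs(value) > tol:
            return tuple(sol) if value > 0 else tuple(-v for v in sol)
    return tuple(sol)


def select_solution(solutions, tol=1e-9, digits=9):
    """Pick the real, sign-unique solution with the smallest variance.

    Returns None when no real solution exists (the text suggests adjusting
    the weighting vector in that case).
    """
    unique = {}
    for sol in solutions:
        values = [complex(v) for v in sol]
        if any(abs(v.imag) > tol for v in values):
            continue                      # discard complex solutions
        cand = canonical([v.real for v in values], tol)
        key = tuple(round(v, digits) for v in cand)
        unique.setdefault(key, cand)      # drop global sign duplicates
    if not unique:
        return None
    return min(unique.values(), key=pvariance)


# Toy candidates (hypothetical values for illustration only).
candidates = [[1.0, 2.0], [-1.0, -2.0], [1 + 1j, 2.0], [1.5, 1.6]]
best = select_solution(candidates)
print(best)
```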

[0118] Even with this first approximation, without further optimization of the noise component SSD'_var(δ), the signal-to-noise ratio is already significantly improved. In an exemplary embodiment related to implementation, there are not enough coefficients available to cancel SSD_var(δ) completely, so statistical optimization can be considered. The system of equations provides a weak output low-pass filter that reliably suppresses thermal noise and the noise of the correspondence function at higher spatial frequencies, as described above. The goal is therefore to reduce the amplitudes at the low spatial frequencies ω and 2ω, which are not handled by the filter. For each solution of Equation 24, there are three further solutions with different sign combinations. From these, the solution whose sign combination minimizes the disturbance SSD'_var(δ) in the lower spatial frequency range is selected. Furthermore, the coefficients set to zero in Equation 23 can be replaced with small non-zero constants. This changes the proportion of SSD_var(δ) without affecting SSD_inv(δ). Equation 22 can then be solved numerically, the solutions tested for the lower spatial frequencies, and the best solution selected.

[0119] In the example above, the following possible functions f_even,k and f_odd,l are obtained for the convolution kernels in the x-direction (Equation 25). The function values are shown in Figure 10, and the discrete convolution kernels are shown in Table 2. In a preferred embodiment, it is important that the resulting convolution functions are mean-free, so the offsets off_even,1 and off_even,2 are selected to satisfy Equation 26. This helps to avoid noise caused by the gain and offset tolerances of real cameras.

[Equation 25] [Equation 26]

[0120] As can be seen from Equation 25, each of the four convolution kernels contains a weighted sum of multiple harmonic functions of different spatial frequencies. The even convolution kernels f_even,1 and f_even,2 each contain a weighted sum of cosine functions, with weighting coefficients 3.4954 and 0.7818 (f_even,1) and 4.9652 and 1.8416 (f_even,2). The odd functions f_odd,1 and f_odd,2 are weighted sums of odd sine functions. In this example, their weighting coefficients are 4.0476 and -0.2559, and 6.0228 and -0.0332, respectively. Thus, in one embodiment, the arithmetic unit is configured to convolve the signal pairs YL_signal,v and YR_signal,v, for v from 1 to v_max, with the two even second convolution kernels and two odd second convolution kernels given by Equations 25 and 26. More generally, the signal pairs YL_signal,v and YR_signal,v, for v from 1 to v_max, are convolved with two even second convolution kernels and two odd second convolution kernels that include the functions listed in Equation 25. The coefficients of the sine and cosine functions (3.4954, 0.7818, ...) may also lie slightly above or below the specified values, for example by 10%. Accordingly, at least one of the coefficients 3.4954, 0.7818, 4.9652, 1.8416, 4.0476, 0.2559, 6.0228, 0.0332 can be up to 10% larger or smaller. The convolution kernels are preferably also selected so that they are approximately or completely mean-free.
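How a mean-free even kernel and an odd kernel can be sampled from weighted harmonic sums is sketched below. The weighting coefficients are those quoted in the text for f_even,1 and f_odd,1, but Equation 25 is not reproduced here, so the harmonic indices (1 and 2) are placeholders chosen purely for illustration; the resulting numbers should not be read as the kernels of Table 2. The analysis interval T = 16 px and the 8 px window follow the exemplary embodiment.

```python
import math

T = 16                                   # analysis interval in px (from the text)
XS = [i + 0.5 for i in range(-4, 4)]     # sample positions -3.5 ... 3.5 (8 px window)
OMEGA = 2 * math.pi / T                  # fundamental spatial frequency


def even_kernel(weights, harmonics):
    """Weighted sum of cosines, with the mean removed by a constant offset."""
    raw = [sum(a * math.cos(n * OMEGA * x) for a, n in zip(weights, harmonics))
           for x in XS]
    offset = sum(raw) / len(raw)         # plays the role of off_even
    return [value - offset for value in raw]


def odd_kernel(weights, harmonics):
    """Weighted sum of sines; mean-free by symmetry on a centered window."""
    return [sum(a * math.sin(n * OMEGA * x) for a, n in zip(weights, harmonics))
            for x in XS]


# Coefficients from the text; harmonic indices [1, 2] are ASSUMED.
f_even_1 = even_kernel([3.4954, 0.7818], [1, 2])
f_odd_1 = odd_kernel([4.0476, -0.2559], [1, 2])

print([round(v, 4) for v in f_even_1])
print([round(v, 4) for v in f_odd_1])
```

Because the sample positions are symmetric about the window center, the sine-based kernel sums to zero without any offset, while the cosine-based kernel needs the offset to become mean-free.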

[0121] While not mandatory, it is advantageous to place the coordinate origins of the even and odd functions included in the convolution kernel near the centroid of each image patch. Here, the centroid refers to the geometric center of each image patch.

[0122] A small deviation in the coefficients of a filter kernel is understood as a small deviation from the discretized values of a perfectly even or odd function. This deviation can be, for example, up to 15%, and preferably up to 10%, of the values of the even or odd function. For clarity, possible deviations of the discretized coefficients from those of an ideal even or odd function are illustrated below. If the discretized coefficients of an ideal odd function give the filter kernel -2; -1; 1; 2, then a filter kernel with a negligible increase in noise may be -2; -1; 1.1; 2, where the positive coefficient adjacent to the center of the kernel is increased by 10%. Furthermore, the symmetry of an ideally even or odd filter kernel is only slightly disturbed when coefficients with low weights are added. For example, a kernel with such a small difference may be -2; -1; 1; 2; 0.1. Here, the filter kernel includes an additional coefficient 0.1 that breaks the ideal symmetry about the kernel center, which lies between the coefficients -1 and 1, but its weight is so low that the convolution result is hardly altered.

[0123] In one modified example, the coefficients preceding the sine and cosine functions do not need to strictly match the coefficients in equations 24 and 25. They are allowed to deviate within a range of 0.8 to 1.2 times, preferably 0.9 to 1.1 times, and good noise suppression can still be obtained in this case.

[0124] Instead of two even convolution kernels, it is possible to use one even convolution kernel with slightly higher noise. The function of such an even convolution kernel is shown in Figure 11 and in Table 3 as a discrete convolution kernel. In the improved version of this embodiment implemented in the example in Figure 11 and Table 3, this convolution kernel includes weighted frequencies for all spatial frequencies from ω to 4ω. That is, this convolution kernel represents a weighted sum of harmonic functions for these spatial frequencies from ω to 4ω. This reduces computational complexity by 25%. In contrast, the k=1 and l=1 solutions are not useful because they involve considerable discretization errors, i.e., discretization always involves high noise. Using only one even kernel or only one odd kernel is also not useful because noise cancellation cannot be achieved. Calculations using only two or three spatial frequencies are similarly possible, but usually result in lower measurement accuracy. [Table 3]

[0125] In an exemplary embodiment, v_max = 5 signal pairs are calculated by convolving YL_image and YR_image in the y-direction with convolution kernels f_y,v covering the spatial frequency zero and the spatial frequencies from ω to 4ω. Optimal noise reduction is achieved when the five signal pairs are optimally decorrelated and have similar amplitudes. In that case the SSD_inv(δ) signal increases while, owing to the random phases Δ, the proportion of SSD_var(δ) decreases, so the signal-to-noise ratio increases. Uncorrelated signals are obtained by convolution with orthogonal functions, e.g., a WFT. The amplitudes of the signal pairs are adjusted by normalization with the OTF, which increases the influence of signal pairs with higher spatial frequencies. As with convolution in the x-direction, it is advantageous to use the same convolution kernels already optimized for low noise (i.e., for k=2 and l=2, for example, the convolution kernels of Equations 25 and 26 are used for f_y,2 to f_y,5, and f_y,1 is set to 1). In this case, a particularly low-noise signal is generated, which can be used for calculating the confidence signal (see below). Furthermore, as already explained, the convolution kernels are determined with the same approach as for the x-direction, but it is advantageous not to reduce the weighting for the highest spatial frequencies.

[0126] The following describes the implementation of a method for determining disparity using a correspondence analyzer. For this purpose, Figure 12 schematically shows the configuration of a stereo camera 2 equipped with a correspondence analyzer 1. The stereo camera 2 comprises an imaging device 22 consisting of two cameras 20, 21 containing multiple camera sensors 5 and two lenses 8, 9 for imaging a subject 4. The optical centers of lenses 8, 9 are spaced apart from each other by the base width B. To determine the disparity δ, digital images 25, 26 are transferred to the correspondence analyzer 1 and analyzed by its computing unit 3. Then, using Equation 1, the subject distance Z can be determined from the disparity obtained by the correspondence analyzer 1 and the focal length f. To this end, the profile vectors (or the convolution kernels corresponding to these profile vectors) stored in the memory 6 of the correspondence analyzer are convolved with the rectified image signals. The convolution results of selected image patches from the two digital images 25 and 26 at various relative offsets are subtracted from each other by the computing unit 3 and subjected to nonlinear processing, preferably squaring. The sum of these nonlinearly processed differences gives the value of the correspondence function SSD(δ) for the selected relative offset δ.

[0127] The image data from the two cameras 20 and 21 is preferably rectified to sub-pixel accuracy, as described above with reference to Figure 2. When a high signal-to-noise ratio is required, it is advantageous to adjust the coplanarity of the cameras' optical axes. For this purpose, a planar test image is first used to determine the intersection points of the optical axes of the two cameras in subject space at two or more distances, and the direction of each optical axis in space is determined by connecting these intersection points. If the alignment is correct, the optical axes are coplanar and lie in the epipolar plane. The straight lines connecting the intersection points of all measured distances are then also coplanar. One of the two cameras is equipped with an eccentricity adjustment means (Figure 1). If the connecting lines are tilted relative to each other, there is a coplanarity error, which is corrected by rotating the lens: the eccentricity slightly changes the direction of the optical axis relative to the mechanical axis. The rotation is continued until the optical axes are coplanar. Furthermore, coplanarity errors may arise during the service life of the stereo camera, for example due to temperature fluctuations or mechanical shock loads. To a certain extent, this error can be corrected for a given distance Z using the method (described further below) for calculating the parallax δ_y in the y-direction, i.e., roughly perpendicular to the epipolar line. Finally, the mean disparity error δ_y, measured with sub-pixel precision, is included in the rectification of one of the two cameras, which corrects the deviation corresponding to the parallax error δ_y. This method works within a limited parallax range but is useful in many applications where the required accuracy depends on the position of the subject, such as positioning tasks in robotics.
In one embodiment, the stereo camera is configured to correct coplanarity alignment errors. To this end, while the correspondence analyzer is running, the parallax δ_y of the corresponding image patches in the direction roughly perpendicular to the epipolar line is additionally evaluated. The mean deviation of this parallax from zero, i.e., the deviation from the ideal epipolar geometry, is then corrected by shifting one of the images in the opposite direction roughly perpendicular to the epipolar line, in particular by correcting the rectification parameters. This technique is advantageous in that it improves the signal-to-noise ratio when the subject distance Z is large. When the subject distance is small, the signal-to-noise ratio is usually sufficient.

[0128] The method described above is used to determine an appropriate convolution kernel. In particular, the weights g can be calculated by equations 11 and 21. The convolution kernel is stored in the memory of the correspondence analyzer 1. In one embodiment, the correspondence analyzer is first configured to evaluate image statistics according to the application, for example, by contrast evaluation or power spectrum evaluation. The correspondence analyzer 1 then selects a profile corresponding to the image statistics, for example, a profile for good contrast under normal conditions during autonomous driving, or a profile for reduced contrast in fog. The selected profile defines at least one set of convolution kernels. More generally, the correspondence analyzer 1 can store multiple profile vectors g for correspondence functions and convolution functions that are identical or differently parameterized, and / or the correspondence analyzer 1 may be configured to compute one or more profile vectors g at runtime, and the correspondence analyzer 1 may further determine the local or comprehensive power spectrum of the image data and use the profile vector g that is favorable based on the local or comprehensive power spectrum in the image. It is also possible to perform calculations using multiple sets of profile vectors with different parameterizations and compare the results. Thus, correspondence analysis can be performed using two or more correspondence functions and convolution kernels with different parameterizations, and it is preferable for the computing unit to combine two or more results based on the determined confidence vector, or to select a partial result from these results. 
In particular, and regardless of the individual profile vectors, a set of convolution kernels should ensure that, when the parallax is determined for a subject with a sinusoidally modulated intensity distribution, this parallax is largely independent of the lateral shift of the subject in the image plane of the individual images. This applies especially to modulation with spatial frequencies within the range of sampled spatial frequencies, such as the range determined by the size of the search image patch.

[0129] For illustrative purposes, Figure 16 shows a subject 4 captured by cameras 20 and 21 of stereo camera 2, in the form of a flat subject whose surface is sinusoidally luminance-modulated. Since the modulation extends along the direction of the relative image shift in the individual digital images 25 and 26, it also extends in the direction of the determined parallax δ. In Figure 16, the modulation is symbolized by a simple stripe pattern. The illustrated modulation is therefore rectangular rather than sinusoidal, but has the same orientation as the sinusoidal modulation. The parallax depends on the distance from stereo camera 2 to subject 4. If subject 4 is shifted by V in the direction of the sinusoidal modulation, i.e., in the direction of parallax, while the distance from stereo camera 2 remains constant, there is essentially no effect on the parallax unless the pattern introduces ambiguity. Invariance with respect to the shift V can be examined using computed digital images, so that the effect on idealized image data can be verified without added noise.

[0130] This section describes a test that can be used to demonstrate that the variation of the parallax calculated by the correspondence analyzer described here is small when the subject carries an intensity modulation. As already discussed, this variation, expressed as a standard deviation (STD), is typically less than 0.2 px, preferably less than or equal to 0.1 px, whereas prior art systems usually show a variation of 0.2 to 0.5 px. More generally, and not limited to the examples described herein, the convolution kernels are preferably selected such that, when the parallax of a planar subject is determined while the subject is shifted along an epipolar line at a certain distance Z from the camera, a local standard deviation of the parallax measurement of less than 0.2 px, preferably less than 0.1 px, is achieved over the shift of the planar subject, when the subject contains an intensity modulation along the direction of the epipolar line, in particular with spatial frequencies within the sampled range of spatial frequencies, or a corresponding texture.

[0131] Here, two measurements are performed using a planar physical measurement subject. This subject carries a texture containing spatial frequencies that, in the image plane, fall within the spatial window (spatial frequencies ω = 2π / 9 to 2π / 5 in an 8x8 environment). The texture is oriented perpendicular to the epipolar plane, e.g., cos ωx, and the subject is focused precisely enough that approximately 80% of the amplitude appears in the image.

[0132] Multiple measurements (e.g., 100) are taken of the stationary subject at a first measurement point. The sensor generates noise. From the measurement results, the standard deviation σ_δ and the mean value δ_mean,1 can be calculated. Repeated measurements can be taken at different points on the same measurement subject.

[0133] The subject is repeatedly moved in small increments parallel to the imaging plane and along the epipolar line, so that the distance to stereo camera 2 in the measurement field does not change. Subsequently, for example, 100 measurements are taken at the second position of the measurement subject and at each subsequent position, and σ_δ and the mean δ_mean,n are calculated for n = 2...10. This is repeated at further points, and the standard deviation σ of the values δ_mean,n for n = 2...10 is calculated. A typical characteristic of the correspondence analyzer described here, or of a stereo camera equipped with it, is that this standard deviation σ is less than 0.2 px, or less than 0.1 px, under good conditions.
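The evaluation of this test can be sketched as follows: per position, the mean and the standard deviation of the repeated disparity measurements are computed, and the spread of the per-position means is then compared with the 0.2 px or 0.1 px bound. The function name `shift_invariance_stats` and the sample values are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not specify the estimator.

```python
from statistics import mean, stdev


def shift_invariance_stats(measurements_per_position):
    """Evaluate repeated disparity measurements taken at several positions.

    measurements_per_position: list of lists of disparities in px, one
    inner list per subject position along the epipolar line.
    Returns (list of (delta_mean_n, sigma_delta_n), sigma of the means).
    """
    per_position = [(mean(m), stdev(m)) for m in measurements_per_position]
    sigma_of_means = stdev([mu for mu, _ in per_position])
    return per_position, sigma_of_means


# Toy data: two positions with two measurements each (illustrative only;
# the text suggests about 100 measurements per position).
data = [[10.0, 10.2], [10.1, 10.3]]
per_position, sigma = shift_invariance_stats(data)
print(per_position, sigma, sigma < 0.1)
```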

[0134] As mentioned above, the correspondence analyzer performs convolution by discrete multiplication and addition. An exemplary embodiment describes convolution in an 8x8 px² environment with u_max = 4 convolution kernels in the x-direction (Table 4) and v_max = 5 convolution kernels in the y-direction (Table 5). Here, u_max is the sum of k_max and l_max, each of which has the value 2 in the exemplary embodiment. The convolution kernels in Table 4 correspond to the convolution kernels in Table 2. The convolution kernels in Table 5 consist of the u_max convolution kernels of Table 4 and the convolution kernel f_y,1 with zero spatial frequency. [Table 4] [Table 5]

[0135] In digital camera images, the pixel at position (x, y) reflects the value of the pixel neighborhood around (x+0.5, y+0.5), so the convolution kernel index is adjusted from -3.5...3.5 to -4...3. When the convolution kernel has an even number of elements, as in the exemplary embodiment, the effective measurement points are shifted, so in the calculation of 3D data using Equation 1, x' and y' are shifted by 0.5 px relative to the measurement positions. A similar correction must be taken into account when assigning color or grayscale values from the image to the 3D data.

[0136] Using the convolution kernels, the computing unit calculates, as shown in Equation 27, a complete set of u_max × v_max features (FL_u,v and FR_u,v, respectively) for each image coordinate x and y in the left and right rectified camera images (YL_image and YR_image, respectively).

[Equation 27]
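A minimal sketch of such a feature computation is given below. Since Equation 27 is not reproduced here, the sketch assumes a separable form: the window is first reduced along y with each y-kernel, and the resulting signal is then multiplied and summed with each x-kernel, using window offsets -4...3 as in the exemplary embodiment. No kernel flip is applied (strictly a correlation), and the function name `feature_vector` and the toy kernels are hypothetical.

```python
def feature_vector(image, x, y, kernels_x, kernels_y):
    """Return the u_max * v_max features at pixel (x, y).

    image: list of rows of gray values; kernels_x / kernels_y: lists of
    equally long 1D kernels. Assumed form (not Equation 27 itself):
    F_u,v = sum_i sum_j f_x,u[i] * f_y,v[j] * image[y - half + j][x - half + i]
    """
    size = len(kernels_x[0])      # window width, 8 in the exemplary embodiment
    half = size // 2              # window covers offsets -half ... half - 1
    features = []
    for f_y in kernels_y:
        # Signal for this y-kernel: one value per window column.
        column_signal = [
            sum(f_y[j] * image[y - half + j][x - half + i] for j in range(size))
            for i in range(size)
        ]
        for f_x in kernels_x:
            features.append(sum(f_x[i] * column_signal[i] for i in range(size)))
    return features


# Toy kernels and image (illustrative only, not the kernels of Tables 4 and 5).
box = [1.0] * 8
step = [-1.0] * 4 + [1.0] * 4
kernels_x = [box, step, box, step]            # u_max = 4
kernels_y = [box, step, box, step, box]       # v_max = 5
image = [[float((3 * r + 5 * c) % 11) for c in range(12)] for r in range(12)]

features = feature_vector(image, 6, 6, kernels_x, kernels_y)
print(len(features))
```

With u_max = 4 and v_max = 5 this yields 20 features per pixel, matching the feature streams FL0 to FL19 and FR0 to FR19 mentioned later in the text.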

[0137] This set of features for each image coordinate is hereafter referred to as the feature vector. Within the spatial frequency range, the feature vector contains the signals necessary for sub-pixel precision disparity measurement. Because information is lost through the subsequent derivative SSD'(δ) in the direction of the epipolar line, several false measurements (candidates) may be generated in addition to the correct measurement. The processing is therefore carried out in two steps: "noise-optimized calculation of the disparity" and "selection of the correct measurement from the noise-optimized candidates".

[0138] In one embodiment, the confidence vectors KL_v and KR_v are calculated additionally or simultaneously, as shown in Equation 28, and are chosen so as to achieve noise reduction.

[Equation 28]

[0139] These confidence vectors do not contain disparity information, but they are used to estimate the quality of disparity measurements. For example, the convolution kernel f_konf can be based on a Gaussian function in order to include adjacent signals in the confidence vector. Instead of, or in addition to, calculating the confidence vectors from the v_max signals as shown in Equation 28, it is possible to use further information from the reference image patch and the search image patches, such as the normalized cross-correlation coefficient between the luminance data of the reference image patch and each search image patch.
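The normalized cross-correlation coefficient mentioned above is a standard measure and can be sketched as follows for two flattened luminance patches. The function name `ncc` and the convention of returning 0.0 for a patch without contrast are assumptions made for this illustration.

```python
import math


def ncc(patch_a, patch_b):
    """Normalized cross-correlation coefficient of two equally long sequences.

    Returns a value in [-1, 1]; 0.0 is returned when either patch has no
    contrast (an assumed convention, since the coefficient is undefined there).
    """
    mean_a = sum(patch_a) / len(patch_a)
    mean_b = sum(patch_b) / len(patch_b)
    num = sum((a - mean_a) * (b - mean_b) for a, b in zip(patch_a, patch_b))
    energy_a = sum((a - mean_a) ** 2 for a in patch_a)
    energy_b = sum((b - mean_b) ** 2 for b in patch_b)
    den = math.sqrt(energy_a * energy_b)
    return num / den if den > 0 else 0.0


# Gain and offset changes between the cameras do not alter the coefficient.
reference = [10.0, 20.0, 30.0, 25.0]
search = [2.0 * v + 5.0 for v in reference]
print(ncc(reference, search))
```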

[0140] The selection of disparity candidates based on confidence vectors can be used independently of the method used to determine the correspondence function. Essentially, multiple disparity candidates are determined from a reference image patch and a search image patch, and then their validity is evaluated using confidence vectors. Thus, regardless of the specific method by which disparity is calculated, a correspondence analyzer 1 is provided, comprising a computing unit 3, to determine the disparity of corresponding pixels in two separate digital images 25, 26. The computing unit 3 is configured to select image patches from the two separate images 25, 26, select at least one of those image patches as the reference image patch, select a search image patch from the other separate image, and calculate multiple candidate disparity values from those image patches. Furthermore, the computing unit 3 is configured to select information from the reference image patch and the search image patch that is not conveyed in the correspondence function or its first derivative, and to use this information to select confidence vectors for the results of the correspondence function, or to select possible disparity values suitable for estimating whether the respective results show an actual correspondence between the respective search image patch and the reference image patch. Next, the selection of candidate disparity values can be carried out based on confidence values. Therefore, in the improved example, the arithmetic unit 3 is intended to be configured to generate a list of candidate disparity values for a particular reference image patch, preferably select a confidence vector for each candidate, and then, based on that confidence vector and / or other selection criteria, make a selection to consider all or some of these candidates valid, or to consider any of the candidates invalid for a particular reference patch. 
It is also possible to further use or extend confidence vectors determined by other means.

[0141] In an improved version of this embodiment, the arithmetic unit 3 is configured to determine the value of at least one element of the confidence vector using a function that, for at least some classes of reference image patches and search image patches, can classify candidates as valid or invalid with a higher probability than the correspondence function alone. When the correspondence function is used alone, candidates can be assessed in particular by comparing the minima of the correspondence function and selecting the most distinct minimum. The correspondence function is preferably designed to suppress information that is unnecessary for the disparity calculation in order to avoid potential noise sources. A confidence function makes it possible, for example, to reconsider such suppressed information when selecting candidates without interfering with the disparity calculation. Specifically, the arithmetic unit may determine the value of an element of the confidence vector using one or more of the following features:
- The relationship or difference between the value of the correspondence function SSD(δ_p) of a candidate at point δ_p and a threshold derived from the extreme values of the correspondence function over all candidates of a reference image patch, or features derived from it.
- Gray value relationships, preferably the difference in gray values between a portion of the reference image patch and a portion of each search image patch, or features derived from these gray value differences.
- Color relationships, preferably the color difference between a portion of the reference image patch and a portion of each search image patch, or features derived from these color differences.
- The relationship between the signal intensity of the reference image patch and the signal intensity of each search image patch.
- A normalized cross-correlation coefficient, in each case between a portion of the data of the reference image patch roughly perpendicular to the epipolar line and a portion of the data of each search image patch, preferably with noise suppression by slight low-pass filtering along the epipolar line.

[0142] Furthermore, the relationship can be nonlinear. Therefore, each variable, such as color or grayscale value, can be processed nonlinearly. For example, instead of a linear difference in grayscale values, the difference in squared grayscale values can also be calculated. Moreover, the input data can be pre-processed nonlinearly, and / or nonlinear processing can be performed when determining the values of the confidence vector.

[0143] The arithmetic unit 3 may also be advantageously configured to make a list of candidates, preferably only the valid candidates and preferably together with their respective confidence vectors, available to the user of the correspondence analyzer or arithmetic unit. This can be achieved, for example, through a suitable interface such as a data output or a screen. In this way, the various confidence criteria can be adjusted, in particular to suit the required quality of the 3D coordinate determination. In one embodiment, the confidence values can advantageously be filtered with an output low-pass filter in the same way as the SSD values. In particular, according to one embodiment, the same output low-pass filter that is used for the values of the correspondence function SSD(δ_p) can be used. This allows both low-pass filtering processes to run on the same hardware configuration. Furthermore, the output low-pass filter for the correspondence function values may use their respective confidence values as weights in the filtering procedure. The disparity values can also be weighted with confidence values before low-pass filtering, so that confidence-weighted disparity values are filtered by the low-pass filter. Thus, a computing device is provided that uses a low-pass filter to filter the calculated disparity values and / or confidence values.
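One possible form of confidence-weighted low-pass filtering is sketched below for a single row of disparity values. The text only states that disparity values may be weighted with confidence values before low-pass filtering; dividing by the equally filtered confidence (a normalized convolution) is an assumption added here so that the result stays in disparity units. The function name `weighted_lowpass` and the 3-tap kernel are hypothetical.

```python
def weighted_lowpass(disparity, confidence, kernel=(0.25, 0.5, 0.25)):
    """Low-pass filter disparities using confidence values as weights.

    Each output is sum(w * c * d) / sum(w * c) over the kernel support
    (normalization is an assumption). Positions where the weighted
    confidence is zero yield None.
    """
    half = len(kernel) // 2
    filtered = []
    for i in range(len(disparity)):
        num = 0.0
        den = 0.0
        for k, weight in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(disparity):       # ignore samples outside the row
                num += weight * confidence[j] * disparity[j]
                den += weight * confidence[j]
        filtered.append(num / den if den > 0 else None)
    return filtered


# The outlier at index 2 has zero confidence and does not affect the result.
row_disparity = [10.0, 10.0, 50.0, 10.0, 10.0]
row_confidence = [1.0, 1.0, 0.0, 1.0, 1.0]
smoothed = weighted_lowpass(row_disparity, row_confidence)
print(smoothed)
```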

[0144] The feature vector and confidence vector are calculated for the discrete image positions in integer pixel coordinates. Furthermore, as shown in Equation 29 of the exemplary embodiment, the arithmetic unit 3 calculates the correspondence function SSD(x,y,δ_p) for integer disparity values δ_p by summing the squared differences of the features.

[Equation 29]
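The summation of squared feature differences over integer disparities can be sketched as follows for one image row. Equation 29 is not reproduced here, so the sketch assumes the form SSD(x, δ_p) = Σ (FL(x) - FR(x - δ_p))², summed over all features of the feature vector; in particular, the sign convention x - δ_p is an assumption. The function name `ssd_curve` and the toy feature rows are hypothetical.

```python
def ssd_curve(fl_row, fr_row, x, d_start, d_count):
    """Return SSD values for integer disparities d_start ... d_start + d_count - 1.

    fl_row / fr_row: lists of feature vectors, one per x position of a row.
    Assumed form: sum over features of (FL[x] - FR[x - d]) ** 2.
    """
    fl = fl_row[x]
    curve = []
    for d in range(d_start, d_start + d_count):
        fr = fr_row[x - d]
        curve.append(sum((a - b) ** 2 for a, b in zip(fl, fr)))
    return curve


# Toy feature rows: the left row equals the right row shifted by 3 px.
fr_row = [[float(i * i % 7), float(i)] for i in range(20)]
fl_row = [fr_row[max(i - 3, 0)] for i in range(20)]

curve = ssd_curve(fl_row, fr_row, x=10, d_start=0, d_count=6)
best_integer_disparity = curve.index(min(curve))
print(curve, best_integer_disparity)
```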

[0145] This calculation of the correspondence function SSD(x,y,δ_p) is performed by the computing unit, in particular, for all possible integer values of the parallax δ_p in the expected parallax range, and the extreme values of the correspondence function SSD(x,y,δ_p) are determined. Figure 13 shows a typical exemplary profile of the discrete function SSD(x,y,δ_p). The first derivative SSD'(x,y,δ_p) and the second derivative SSD″(x,y,δ_p) are defined as shown in Equation 30. In one embodiment, δ_p is identified as a local minimum when the conditions of Equation 31 are met.

[Equation 30] [Equation 31]

[0146] Furthermore, the correspondence analyzer 1 or its computing unit determines the differences SSD'(x,y,δ_p) and the local minima indicated by their sign changes. In a preferred embodiment, the arithmetic unit can calculate, as shown in Equation 32, the sub-pixel precision value δ_sub of the group disparity based on the extreme values at the parallax δ_p, in particular the minimum of the correspondence function SSD(x,y,δ_p).

[Equation 32]
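The minimum search by sign change and the sub-pixel refinement can be illustrated with a standard three-point parabolic fit. Equations 30 to 32 are not reproduced here, so the difference definitions below (forward differences for SSD' and their difference for SSD″) and the function names `local_minima` and `subpixel_minimum` are assumptions; the parabola formula itself is the usual textbook one.

```python
def local_minima(ssd):
    """Indices p where the first difference changes sign from - to +."""
    diffs = [ssd[i + 1] - ssd[i] for i in range(len(ssd) - 1)]   # assumed SSD'
    return [p for p in range(1, len(ssd) - 1) if diffs[p - 1] < 0 <= diffs[p]]


def subpixel_minimum(ssd, p):
    """Fit a parabola through ssd[p-1], ssd[p], ssd[p+1].

    Returns (sub-pixel position, interpolated SSD value, curvature).
    Only the differences of SSD are needed, not the absolute values.
    """
    d_minus = ssd[p] - ssd[p - 1]         # negative at a minimum
    d_plus = ssd[p + 1] - ssd[p]          # non-negative at a minimum
    curvature = d_plus - d_minus          # assumed discrete SSD'', > 0 here
    offset = -(d_plus + d_minus) / (2.0 * curvature)
    value = ssd[p] - (d_plus + d_minus) ** 2 / (8.0 * curvature)
    return p + offset, value, curvature


# Toy SSD curve sampled from a parabola with its minimum at 3.3 px.
ssd_values = [(d - 3.3) ** 2 + 1.0 for d in range(8)]
minima = local_minima(ssd_values)
delta_sub, ssd_at_min, curvature = subpixel_minimum(ssd_values, minima[0])
print(minima, delta_sub, ssd_at_min)
```

For an exactly parabolic SSD profile the fit recovers the true minimum; the curvature returned alongside can serve as one of the candidate attributes discussed further below.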

[0147] The parabolic interpolation used in Equation 32 is made possible by the optimization of the group disparity function described above. Similarly to Equation 32, it is advantageous to also calculate the value of the correspondence function at the sub-pixel precision point δ_sub.

[Equation 33]

[0148] δ_sub can be determined from the values of SSD'(x,y,δ_p), which can be calculated directly from the features, as shown in Equation 33. When floating-point numbers are used, this calculation may be advantageous because it can be performed with a smaller word length or lower precision than the calculation by Equation 29. Therefore, in this embodiment, the arithmetic unit 3 is configured to calculate the sub-pixel precision value δ_sub of the group disparity near the extremum using the relationship of Equation 33, where δ_p is the pixel-precision position of the extremum of the correspondence function and SSD'(x,y,δ_p) is the derivative of the correspondence function SSD(x,y,δ_p).

[0149] In one embodiment, the correspondence analyzer stores a list of the actual disparity candidates δ_sub determined by the computing unit for the minima at the positions δ_p. For the position δ_K at which a minimum occurs, these candidates are preferably supplemented with attributes such as SSD″(x,y,δ_K), the value of the confidence function KSSD(x,y,δ_K) shown in Equation 34, the average brightness difference or color difference between the neighborhoods in the left and right camera images, and the signal intensity of the representable disparity signal. KSSD(x,y,δ_K) uses only the signals v determined by convolution with the x-direction convolution kernels of Table 4. Here, f_konf is a convolution kernel that is only minimally affected by deviations in the x-direction, such as a Gaussian filter.

number

[0150] More generally, in one embodiment, confidence levels may be assigned to disparity candidates and compared with one another. One or more candidates with high confidence levels are considered valid and processed further. Conversely, at least one disparity candidate with a lower confidence level than one or more other candidates is sorted out and not processed further. For example, the arithmetic unit 3 determines the confidence level of a candidate from evaluation criteria such as SSD(δ) relative to the power spectrum at the respective reference point, the second derivative of SSD(δ), the averaged gray or color value in the candidate's neighborhood relative to that in the neighborhood of the reference point, and optionally further measures. These confidence levels may then be compared with those of other candidates that represent conflicting measurement results, and only candidates whose confidence levels are significantly higher in these comparisons are considered valid. Thus, the calculated confidence levels are compared with each other, and at least one disparity candidate is determined to be valid on the basis of that comparison. This determination can be implemented by processing this disparity value further or by sorting out one or more other candidates for the disparity value.
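The comparison of confidence levels among conflicting candidates can be sketched as follows. This is a minimal illustration only: the scalar confidence (higher is better, non-negative), the `margin` that defines "significantly higher", and the function name are assumptions not taken from the description.

```python
def resolve_conflict(candidates, margin=1.5):
    """Return the candidates considered valid among conflicting measurements.

    candidates: list of (disparity, confidence) tuples, confidence >= 0.
    A candidate is accepted only if its confidence exceeds that of the
    runner-up by the assumed factor `margin`.
    """
    if not candidates:
        return []
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    if len(ranked) == 1:
        return ranked
    best, runner_up = ranked[0], ranked[1]
    if best[1] >= margin * runner_up[1]:
        return [best]
    return []  # no candidate is significantly more confident than the rest


clear_winner = resolve_conflict([(12.3, 0.9), (40.1, 0.3)])
ambiguous = resolve_conflict([(12.3, 0.50), (40.1, 0.45)])
```

Rejecting all candidates in the ambiguous case is one possible policy; keeping several candidates for later filter stages would be an equally valid choice.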

[0151] In one embodiment, the computing unit 3 of the correspondence analyzer comprises at least one FPGA and/or at least one GPU, and may optionally comprise several such units. Instead of a reconfigurable FPGA, it is also possible to use a one-time-configurable computing unit (eASIC) or a non-configurable computing unit (ASIC).

[0152] Figures 14 and 15 illustrate the principle of an exemplary implementation of the correspondence analyzer 1 on an FPGA as part of the arithmetic unit 3. In the rectified images YL_image and YR_image, one window each is shifted synchronously in the row direction along the same row y0. This generates two synchronized data streams FL and FR, as shown in Figure 14. At each position x, these data streams consist of u_max × v_max features (Equation 27), shown as FL0 to FL19 and FR0 to FR19, respectively. δ_start is equal to the lower limit of the expected disparity range. The handling of cases in which YR_image does not cover the entire disparity range for the pixel at position x0 of YL_image is of minor importance and is not considered further.

[0153] In block ΔFR of Figure 15, two adders 30 and a delay unit τ (denoted 32) derive from the data stream FR the terms FR_{u,v}(x0+δ_p, y0) + FR_{u,v}(x0+δ_{p-1}, y0) and FR_{u,v}(x0+δ_p, y0) - FR_{u,v}(x0+δ_{p-1}, y0). The blocks of the correspondence analyzer 1 or its arithmetic unit 3 are described next. For example, a vector with 20 features is copied from address x0 of the data stream FL into the dual-port RAM 34 (BUF) at the start and is then read repeatedly. The data stream FR first delivers the features from address x0+δ_start. The DSP 36 (e.g., XILINX DSP48E1) calculates, for each integer δ_p within the expected disparity range, the function value SSD'(x0,y0,δ_p) analogously to Equation 33. For the adjacent address x0+1 and the further coordinates in the row of YL_image, the dual-port RAM 35 and an additional DSP 37 are used, which operate in the same way as the first DSP 36. A DSP that has processed the entire disparity range can be reused.
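The sum and difference terms formed by the two adders and the delay unit can be modeled in software as follows. This is a behavioral sketch under the assumption that each stream element is a feature vector for one position; the generator name and the list representation are illustrative and say nothing about the actual FPGA implementation.

```python
def sum_and_difference(stream):
    """Yield (sum, difference) of each feature vector and its predecessor.

    The variable `prev` plays the role of the one-sample delay unit:
    for element k it holds element k - 1 of the stream.
    """
    prev = None
    for cur in stream:
        if prev is not None:
            sums = [c + p for c, p in zip(cur, prev)]
            diffs = [c - p for c, p in zip(cur, prev)]
            yield sums, diffs
        prev = cur


# Three positions with two features each (the description uses 20 features).
fr_stream = [[1, 2], [3, 5], [6, 9]]
terms = list(sum_and_difference(fr_stream))
```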

[0154] Next, a first filter processor evaluates the function values SSD'(x0,y0,δ_p). If the logical AND of Equation 31 (with x=x0 and y=y0) is true, SSD(x0,y0,δ_p) has a local minimum at position δ_p. For each such minimum, the sub-pixel precision group disparity value δ_sub is determined. These minima are candidates for the group disparity value.
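The detection of local minima from sign changes of the derivative can be sketched as follows. Equation 31 is not reproduced in this text; the condition used here (previous derivative value negative AND current value non-negative) is an assumed form of that logical AND, and the function name is hypothetical.

```python
def local_minima_from_derivative(d_ssd):
    """Return indices p at which the derivative changes sign from - to +.

    d_ssd[p] approximates SSD'(delta_p) over the expected disparity range.
    """
    minima = []
    for p in range(1, len(d_ssd)):
        if d_ssd[p - 1] < 0 and d_ssd[p] >= 0:
            minima.append(p)
    return minima


candidates = local_minima_from_derivative([-3, -1, 2, 1, -2, -0.5, 4])
```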

[0155] Thus, in one embodiment, a configuration is considered in which the arithmetic unit generates a list of candidate disparity values. Subsequently, the arithmetic unit configured accordingly can select a valid disparity value based on at least one selection criterion.

[0156] In a refined embodiment, a second possible filter processor uses for this purpose the signal intensity of the disparity signal, i.e., the second derivative SSD″(x0,y0,δ_p) of the correspondence function. Since the expected signal intensity can be determined separately for YL_image and YR_image as ACFL(x0,y0) and ACFR(x0,y0,δ_p) (Equation 35), a suitable approximation of the expected signal intensity is known even before the correspondence function is calculated. The signal intensity is integrated over all v_max signal pairs. Next, the relationships among ACFL, ACFR and SSD″(x0,y0,δ_p) (Equation 36) are tested against the thresholds thr_L1, thr_L2, thr_R1, thr_R2, thr_A1 and thr_A2.

number

[0157] Simply put, these tests can be understood as tests of the integrated signal intensity of group disparity, or as tests of the integrated signal intensity in both camera images. Therefore, in this embodiment, the computing unit is configured to calculate the relationship between the signal intensity of the disparity signal and the image patch and compare them to a threshold as a selection criterion.

[0158] Taking the actual tolerances of the cameras into account, setting all thresholds in these tests to a value of 2, for example, retains the majority of the correct candidates at positions δ_p while removing most of the false candidates. A third possible filter processor determines a correspondence function SSD_norm(x0,y0,δ_p) that is normalized to the signal intensity and compares it with a threshold (Equation 37).

number

[0159] The threshold can be understood as a noise limit. For example, assuming 20 features and a mean deviation of 10% per feature, the threshold would be 0.2. Candidates at positions δ_p where the threshold is exceeded are removed. Instead of FL_{u,v}(x,y) in Equation 37, FR_{u,v}(x,y,δ_p) can be used analogously. Similarly, a test using a normalized KSSD(x0,y0,δ_p) can also be used for filtering. The selection criterion used here is therefore a comparison with a threshold of a correspondence function that is normalized to the local signal intensity at the image position corresponding to the candidate disparity. More generally, this embodiment is based on the computing unit being configured to calculate a correspondence function normalized to the signal intensity of at least one of the individual images at the respective image position, or to normalize the correspondence function by the signal intensity, and to compare the normalized value of the correspondence function for the candidate disparity with a threshold. If the threshold is exceeded, the candidate is sorted out.
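The numerical example above is consistent with a threshold formed as the number of features times the squared relative deviation (20 × 0.1² = 0.2). The following sketch uses that reading, which is an assumption since Equation 37 is not reproduced here; the function names and the candidate representation are likewise illustrative.

```python
def noise_threshold(n_features, mean_deviation):
    """Threshold as a sum of squared relative deviations over all features."""
    return n_features * mean_deviation ** 2


def filter_by_normalized_ssd(candidates, threshold):
    """Keep candidates whose normalized SSD does not exceed the threshold.

    candidates: list of (delta_p, ssd_norm) tuples.
    """
    return [c for c in candidates if c[1] <= threshold]


thr = noise_threshold(20, 0.10)
kept = filter_by_normalized_ssd([(12, 0.05), (40, 0.35)], thr)
```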

[0160] A fourth possible filter processor uses the confidence function KSSD(x0,y0,δ_p) of Equation 34. With the appropriate selection of f_Konf mentioned above, its dependence on δ_p, i.e., on changes in the x direction, is negligible. By using, for the convolution in the y direction, the convolution kernels that are noise-optimized for the group disparity in the x direction, KSSD(x0,y0,δ_p) measures the disparity in the y direction in a noise-optimized manner. Since YL_image and YR_image are rectified, the disparity in the y direction should be zero in an ideal system if the disparity in the x direction has been determined correctly. Applied to real stereo cameras and to the exemplary embodiment, this means that KSSD(x0,y0,δ_K) of the correct candidate at position δ_K must be the smallest compared with KSSD(x0,y0,δ_A) of the other candidates at positions δ_A. This can be used to filter the candidates and select the correct one. This filter processor is thus based on an embodiment in which the arithmetic unit is configured to generate a list of candidate disparity values and to select disparity values as valid based on at least one selection criterion, which includes calculating a confidence function value for the candidates and selecting the candidate with the lowest value of the confidence function as valid. The selection criterion is therefore a confidence function value that depends on the disparity in the y direction, i.e., the disparity perpendicular to the direction of the epipolar line.

[0161] Another possible selection criterion is color difference or features derived from color difference. More generally, to increase the certainty of determining the actual parallax, a determination can be made by accumulating multiple selection criteria.

[0162] Processing the confidence function KSSD(x0,y0,δ_p) separately from the correspondence function SSD(x0,y0,δ_p) serves to optimize the noise of the group disparity. The confidence function, which is calculated perpendicular to the camera baseline vector, contributes no signal of its own to the measurement of the group disparity and, if processed jointly and isotropically as in cross-correlation, would only contribute additional noise.

[0163] A fifth possible filter processor evaluates further of the aforementioned attributes of the candidates at positions δ_K and compares them with thresholds. For example, the maximum expected luminance or chrominance difference between the image patches in the two camera images can be used as a filter in this way.

[0164] A sixth possible filter processor determines the minimum of the correspondence function over its entire range for all search image patches of a reference image patch, i.e., the minimum of SSD(δ_K) over all candidates at positions δ_K. A threshold is derived from this minimum, and candidates whose SSD(δ_K) exceeds that threshold are sorted out. In the example shown in Figure 13, the threshold is indicated by a dashed line.
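A threshold derived from the global minimum can be sketched as follows. The description does not state how the threshold is derived; the affine rule `factor * minimum + offset` and its default values are assumptions chosen only to make the example concrete.

```python
def filter_relative_to_global_minimum(candidates, factor=2.0, offset=0.0):
    """Sort out candidates whose SSD exceeds a threshold tied to the minimum.

    candidates: list of (delta_k, ssd_value) tuples for one reference patch.
    """
    if not candidates:
        return []
    ssd_min = min(value for _, value in candidates)
    threshold = factor * ssd_min + offset  # assumed derivation rule
    return [(d, v) for d, v in candidates if v <= threshold]


survivors = filter_relative_to_global_minimum([(5, 0.10), (9, 0.15), (20, 0.50)])
```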

[0165] The aforementioned filter processors can be chained in any order and executed in parallel. Once they have reduced the number of candidates sufficiently, the disparity values, preferably the sub-pixel precision values δ_sub, can be stored in memory and combined for the entire row. A filter processor that does not depend on the calculation of the correspondence function can be applied before that calculation, so that search image patches can be excluded before the value of the correspondence function or of its first derivative is determined.

[0166] The values used by the aforementioned filter processors, such as SSD_norm(x0,y0,δ_p) and KSSD(x0,y0,δ_p), can be combined by weighting to obtain a confidence value or confidence vector K for each candidate. When multiple candidates have conflicting measurements for the same or different coordinates in the image, such a confidence vector K can be used to find the candidate that is presumed to be correct and to eliminate candidates with lower confidence. For example, if K is formed from SSD_norm(x0,y0,δ_p) and KSSD(x0,y0,δ_p), the candidate with the smallest magnitude of K is likely to be the best candidate, so that other competing candidates can be eliminated.
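Selecting the candidate with the smallest magnitude of K can be illustrated as follows. This is a sketch under assumptions: K is taken to be the two-component vector of weighted SSD_norm and KSSD values, its magnitude is the Euclidean norm, and the weights and dictionary keys are hypothetical.

```python
import math


def confidence_magnitude(ssd_norm, kssd, w_ssd=1.0, w_kssd=1.0):
    """Euclidean norm of the weighted confidence vector K (smaller is better)."""
    return math.hypot(w_ssd * ssd_norm, w_kssd * kssd)


def best_candidate(candidates):
    """Return the candidate whose confidence vector has the smallest magnitude."""
    return min(
        candidates,
        key=lambda c: confidence_magnitude(c["ssd_norm"], c["kssd"]),
    )


competing = [
    {"delta": 12.3, "ssd_norm": 0.05, "kssd": 0.02},
    {"delta": 40.1, "ssd_norm": 0.04, "kssd": 0.30},
]
winner = best_candidate(competing)
```

In this example the second candidate has the slightly smaller SSD_norm but a large KSSD, i.e., a large residual disparity in the y direction, so the first candidate is preferred.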

[0167] The arithmetic unit 3 can determine the sub-pixel precision value δ_sub of the group disparity near the extremum of the correspondence function, i.e., near the disparity δ_p of the search image patch at which the first derivative has its zero crossing, using one of the following relationships.

number

[0168] As already mentioned above, instead of or in addition to the correspondence function SSD(δ_p), its derivative SSD'(δ_p) can also be calculated, and the disparity δ can be determined from this derivative. In a further aspect of the present invention, a correspondence analyzer is therefore provided that is configured to calculate the first derivative SSD'(δ_p) of the correspondence function according to the following relationship.

number

[0169] Here, we describe a process using a system consisting of two correspondence analyzers: one that performs high-frequency processing to accurately detect surface details based on texture, and another that performs low-frequency processing to approximate the surface based on an evaluation of diffuse reflectance when there is no texture.

[0170] Low-frequency processing: In the first parallel process, a further development of the correspondence analyzer 1, the arithmetic unit 3 processes image pairs whose resolution has been reduced after prior low-pass filtering. In an exemplary embodiment, the resolution is reduced to one-quarter in each direction, so the number of pixels is reduced to one-sixteenth. This processing is optimized to capture in particular the low spatial frequencies (LF) that characterize diffuse reflection. For this purpose, at least one set of convolution kernels for convolution in the x and y directions, obtained using one or more weight vectors g_LF, is stored. As previously mentioned, the convolution is applied to both images to generate the feature vectors or data streams FL and FR of the low-frequency processing. The data streams are processed by a correspondence analyzer as shown in Figure 15. Valid candidates for the disparity δ at coordinates x, y are determined using the aforementioned filter processors and, optionally, additional neighborhood filters, so that an LF disparity map with reduced resolution and reduced measurement accuracy, for example 1/4 px, is obtained. The LF disparity map is then used to predict the disparity range for the subsequent high-resolution analysis.

[0171] High-frequency processing: In the second parallel process, a further development of the correspondence analyzer, the high-resolution image pairs are processed directly by a second, identically configured part of the computing unit 3. The second process is preferably delayed in time relative to the first so that the results of the first process, in the form of the LF disparity map, can be used to predict the disparity range. For this purpose, the computing unit may be configured to use disparity values determined or estimated by the correspondence analysis with the first correspondence function in order to predict the results of, or to control, the correspondence analysis with the second correspondence function. Here, through appropriately selected parameters or convolution functions, the second correspondence function passes signal components of higher frequencies from the image patch than the first correspondence function.

[0172] With typical camera tolerances, the high-frequency processing is performed using a predicted disparity range of ±4 px around the disparity values in the LF disparity map. Where the LF disparity map contains no valid candidate for a coordinate, or only candidates with low confidence, the high-frequency processing can analyze the maximum expected disparity range for that coordinate. For the second process, at least one set of convolution kernels for convolution in the x and y directions is stored, obtained using one or more weight vectors g_HF that are optimized to capture textures, taking the OTF of the cameras into account. The convolution is performed on both images in this manner, yielding the data streams FL and FR of the high-frequency second process. Subsequent processing is the same as in the first process.
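The prediction of the high-frequency search range from the LF disparity map can be sketched as follows. The ±4 px margin is taken from the paragraph above; the assumptions are that the LF disparity has already been scaled to full-resolution pixels, that confidence is available as a boolean, and that the predicted window is clipped to the maximum expected range. The function name is hypothetical.

```python
def hf_search_range(lf_disparity, lf_confident, full_range, margin=4.0):
    """Return (low, high) disparity limits for the high-frequency analysis.

    lf_disparity: LF estimate in full-resolution pixels, or None if missing.
    lf_confident: True if the LF estimate is considered reliable.
    full_range:   (low, high) of the maximum expected disparity range.
    """
    low_full, high_full = full_range
    if lf_disparity is None or not lf_confident:
        return (low_full, high_full)  # fall back to the full expected range
    low = max(low_full, lf_disparity - margin)
    high = min(high_full, lf_disparity + margin)
    return (low, high)


predicted = hf_search_range(37.25, True, (0.0, 128.0))
fallback = hf_search_range(None, False, (0.0, 128.0))
clipped = hf_search_range(2.0, True, (0.0, 128.0))
```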

[0173] Finally, the results of the first and second processes are combined to create a disparity map, taking into account the confidence obtained in each case. A suitable measure of confidence is the confidence vector K mentioned above; in particular, it is advantageous to also include the integrated signal intensity (e.g., ACFR(x0,y0,δ_p), Equation 35), since measurement results for coordinates with low signal intensity are also less reliable. If the measurement result for a coordinate is highly reliable in both the first, low-frequency process and the second, high-frequency process, the result of the second process, which is likely to have the higher measurement accuracy, is used. If only the first process provides a highly reliable result for a coordinate, that result is used. If the first process provides only a result of low reliability for a coordinate, the second process can analyze the entire expected disparity range as described above, and if its result is highly reliable, that result can be used. As mentioned above, conflicting measurement results can be filtered out based on reliability.
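The combination rules stated above reduce to a simple decision per coordinate, sketched below. The scalar confidence, the `min_confidence` threshold that separates "high" from "low" reliability, and the tuple representation are assumptions for illustration.

```python
def fuse(lf, hf, min_confidence=0.5):
    """Combine LF and HF results for one coordinate.

    lf, hf: (disparity, confidence) tuples, or None if no candidate exists.
    Returns the selected disparity, or None if neither result is reliable.
    """
    lf_ok = lf is not None and lf[1] >= min_confidence
    hf_ok = hf is not None and hf[1] >= min_confidence
    if hf_ok:
        # Covers both cases in which HF is reliable: LF reliable as well,
        # or LF unreliable and HF found a result over the full range.
        return hf[0]
    if lf_ok:
        return lf[0]  # only the low-frequency result is reliable
    return None


both = fuse((37.25, 0.9), (37.4, 0.8))
only_lf = fuse((37.25, 0.9), (52.0, 0.1))
only_hf = fuse((10.0, 0.2), (37.4, 0.8))
neither = fuse((10.0, 0.2), None)
```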

[0174] In the final step, output low-pass filtering is performed. For this purpose, it is advantageous first to convert the combined disparity map, consisting of the δ_sub values, into Cartesian coordinates according to Equation 1 and then to interpolate it with a Gaussian filter. In this way, a grid that is equidistant in the x,y plane is obtained. Figure 6(a) shows the result before filtering, and Figure 6(b) shows the result after filtering. This process is known as resampling.

[0175] In the exemplary embodiment described above, for the sake of simplification, it was assumed that the information from the image patches used for disparity determination was equally weighted, regardless of its location within each image patch. However, heterogeneous weighting using the weighting function W(x) is also possible and can be integrated into the signal model as shown in Equation 40, as an extension of Equation 6.

number

[0176] The weighting function can take any form or value; for example, it is possible to use a function similar to that of a Gaussian filter, as shown in Equation 41. This weights the signals at the center of the image patch more heavily than the signals at the edges of the image patch, meaning that the former has a relatively greater influence on disparity determination than the latter. For example, if the weighting is uniform, W(x) has a constant value of 1.

number

[0177] With an appropriate selection of the weighting function, the convolution kernels can be determined according to the procedure already described, using numerical integration where necessary. For example, when Equation 41 is used, the matrices AEV and AOD vary depending on the selection of the parameter ρ, but the subsequent steps are similar. In this regard, it should be noted that the convolution kernels still contain a weighted sum of multiple even and odd harmonic functions but, when a weighting function is used, are determined such that they additionally contain the selected weighting function. Without being limited to specific exemplary embodiments such as the particular weighting of Equation 41, one embodiment provides that at least one, preferably all, of the convolution kernels additionally contain a weighting function, in particular one suitable for incorporating information from different parts of an image patch to different degrees into the correspondence analysis, especially into disparity determination.

[0178] Weighting can also be performed when determining signals from image patch data. Figure 17 shows the weighting obtained as a result of the image patch information. Graph (a) shows uniform weighting trimmed to an 8x8 image patch for easier understanding, while graph (b) shows the weighting by Equation 41 when the full width ρ at half maximum is 3.5px, both in signal determination and further processing of the signal.
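A Gaussian weighting of an 8x8 image patch with a full width at half maximum of 3.5 px can be sketched as follows. Equation 41 is not reproduced in this text, so the Gaussian form, the separable product in x and y, and the centering on the geometric center of the patch are assumptions; the function names are hypothetical.

```python
import math


def gaussian_weight(x, center, fwhm):
    """Gaussian weight with peak value 1 and the given full width at half maximum."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def patch_weights(size=8, fwhm=3.5):
    """Separable 2D weights for a size x size patch, centered on the patch."""
    center = (size - 1) / 2.0  # geometric center between the middle pixels
    row = [gaussian_weight(i, center, fwhm) for i in range(size)]
    return [[wy * wx for wx in row] for wy in row]


weights = patch_weights()
half_max = gaussian_weight(3.5 + 1.75, 3.5, 3.5)  # half the FWHM from center
```

With these parameters the corner pixels receive a weight several orders of magnitude below that of the central pixels, which concentrates the measurement on the middle of the patch.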

[0179] Gaussian weighting functions are practically important for enhancing 3D contrast, or simply put, for focusing measurements on a portion of an image patch (e.g., the central position). For example, as a result of weighting with the weighting function from graph (b) in Figure 17, more or less powerful information is available to determine parallax, but in this example, the information used is closer to the desired measurement location. This is usable when the signal-to-noise ratio is good, for example, when the subject is well-lit, has texture, and the camera image is well-focused. This allows for more accurate parallax measurements even near uneven subject surfaces or the edges of the subject. Therefore, the weighting function can be appropriately selected based on knowledge of subject characteristics or imaging characteristics, for example, by appropriately selecting the full width at half maximum or the parameter ρ. The smaller ρ, the more the measurement is concentrated on a sub-region. On the other hand, a uniform weighting function or a large value for the parameter ρ is advantageous in image patches with a poor signal-to-noise ratio, such as in fog.

[0180] The Gaussian weighting described above represents one embodiment in which pixels near the centroid of a weighted image patch may have a higher weighting than pixels at the edges of the image patch. More generally, in yet another embodiment, at least one of the filter kernels has a weighting function that weights the portion of the image patch near the centroid more strongly than the portion further away from the centroid. Here, the centroid may in particular be the geometric center of the image patch. Also, as described above, the weighting can be modified or selected based on image characteristics. For this purpose, in one embodiment, the computing unit is generally configured to select a weighting function in response to image characteristics such as the signal-to-noise ratio or a jump in depth information near or within an image patch that has been determined, or made plausible, by previous measurements. For example, a jump in depth information may be plausible and assumed for the image patch if such a jump has already been determined for a minimum number of adjacent image patches or pixels from the course of the disparity. For example, the weighting may be modified when at least two adjacent pixels exhibit such a jump in depth information.

[0181] If the weighting function is selected such that its centroid in the image patch differs from the centroid of the image patch, it is advantageous, when determining the correspondence function SSD(δ_p), to determine the distance δ_p between the reference image patch and the search image patch based on the centroids of the weighting function in these image patches. When calculating the centroid of the weighting function, the function values of the weighting function enter the calculation in the same way as mass or local density enters the calculation of a center of mass. In other words, the centroid of the weighting function corresponds to the center of mass of the weighted image patch.

[0182] In the case of weighting using a Gaussian distribution, the range around ρ=3 is particularly important for an 8x8 px image patch. Therefore, in further embodiments, although not limited to the illustrated example, at least one of the convolution kernels generally has a weighting function whose full width at half maximum is less than 2/3 of the width of the image patch, preferably less than half the width of the image patch. Here, the width in question is the width in the direction in which the weighting function varies. In the example in Figure 17, this can be both the x and y directions.

[0183] As already explained, it is advantageous to apply a low-pass filter to the 3D data or to the disparities determined to be valid from the data. In an alternative or additional embodiment of the present invention, it has proven advantageous to calculate an averaged correspondence function before determining the disparity δ, i.e., to apply an arbitrarily weighted average or low-pass filter to the function values of the correspondence function SSD(δ_p) calculated for each reference image patch and the correspondence functions of neighboring reference image patches at the same point δ_p. Therefore, in one embodiment of the correspondence analyzer, the arithmetic unit 3 is generally configured to average the correspondence function by calculating the arithmetic or weighted mean of the values of the correspondence function SSD(δ_p) of a reference image patch and the values of the correspondence functions SSD(δ_p) of multiple other, in particular adjacent, reference image patches. Furthermore, this averaged correspondence function is then processed further according to the present invention, in particular to calculate and output the sub-pixel precision value of the disparity at the point δ_p.

[0184] Equation 42 shows, as an exemplary embodiment, the correspondence function SSD_Avg, which is averaged with uniform weighting over a 3x3 neighborhood of reference image patches. The function SSD_Avg is then used instead of the function SSD.

number
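Uniform averaging of the correspondence function over a 3x3 neighborhood of reference image patches can be sketched as follows. Equation 42 is not reproduced in this text; the data layout `ssd[y][x][p]` and the handling of image borders (averaging over the neighbors that exist) are assumptions, and the function name is hypothetical.

```python
def average_ssd_3x3(ssd, x, y):
    """Average SSD curves of the patch at (x, y) and its existing 3x3 neighbors.

    ssd[y][x] is a list of correspondence-function values, one per disparity
    delta_p; all curves must have the same length.
    """
    n_disp = len(ssd[y][x])
    acc = [0.0] * n_disp
    count = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy < len(ssd) and 0 <= xx < len(ssd[0]):
                for p in range(n_disp):
                    acc[p] += ssd[yy][xx][p]
                count += 1
    return [a / count for a in acc]


# 3x3 grid of patches, each with two disparity samples: [x + y, 1.0].
grid = [[[float(x + y), 1.0] for x in range(3)] for y in range(3)]
center_avg = average_ssd_3x3(grid, 1, 1)
corner_avg = average_ssd_3x3(grid, 0, 0)
```

The averaging is applied to the function values at the same disparity index, so the subsequent extremum search and sub-pixel interpolation operate on the averaged curve.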

[0185] Such combinations of correspondence functions for multiple reference image patches may slightly reduce the achievable 3D contrast on curved or non-planar surfaces, but correspondence functions also contain at least partially uncorrelated disturbances such as quantum noise or pixel artifacts, which are favorably attenuated by this averaging and low-pass filtering in the linear part of signal processing. What distinguishes this filtering from the low-pass filtering before calculation of SSDs such as Gabor's method is the application of filtering after applying the convolution kernel for group disparity and the calculation of the correspondence function. This filtering is different from output low-pass filtering, in particular, as it is also performed before sub-pixel interpolation, which determines the precise location of the disparity.

[0186] Furthermore, disturbances are present in the variable part SSD_var of the correspondence function. These are still partially correlated at this point in the signal processing and can be reduced particularly effectively by averaging multiple correspondence functions. This makes the low-pass filtering more effective. This characteristic is unique to this filtering: because sub-pixel interpolation is usually nonlinear, it no longer exists after the disparity calculation, and it does not exist in this form before the calculation of the correspondence function. As an improvement, the low-pass filter is optimally configured so that spatial frequency components above 4ω are significantly reduced, with only a slight reduction at the spatial frequency 4ω.

[0187] Deviating from the disclosed advantageous embodiments typically results in increased noise or reduced quality of the disparity measurement. Examples include deviations in the coefficients of the convolution kernels, convolution of the signals of a reference image patch and of the signals of multiple search image patches with different convolution kernels, use of a weighting function whose centroid does not correspond to the desired measurement point in the image patch, or use of convolution kernels containing even or odd functions whose coordinate origin is not at the centroid of the weighting function in the image patch or, in the case of uniform weighting, not at the geometric centroid of the image patch. Such deviations typically falsify the disparity measurement. However, in combination with averaging or low-pass filtering of the correspondence function, deviations of this or a similar kind can be actively exploited under certain circumstances. For example, different convolution kernels, different centroids of the weighting function, or different coordinate origins are used for different reference image patches. More generally, the coordinate origins about which the functions of the convolution kernels are even and odd do not need to be located at the center of each image patch, but can be off-center, as in the embodiments described above. Choosing these deviations is advantageous when the individual measurement errors of the resulting disparities sum to zero, or average to zero, when the correspondence functions are summed with the chosen weighting. The noise in SSD_var depends in particular on the respective disparity and can be partially decorrelated by appropriate selection. With the configuration and signal model disclosed herein, SSD_var is typically constructed to be substantially similar to an odd function near the extrema of the correspondence function. Therefore, averaging the correspondence function is particularly well-suited for reducing noise through the statistical integration of errors.

[0188] As mentioned earlier, if the tolerance of the camera's gain is small, noise generally does not occur, but large contrast differences between cameras with different OTFs are not canceled out. In actual stereo cameras, there is generally a tolerance in the camera's transfer function, so the amplitude of the convolution result of the signal of the reference image patch does not necessarily match the amplitude of the convolution result of the signal of the corresponding search image patch. Since the value of the correspondence function SSD at this point is not zero, additional noise may be introduced into the determined disparity. The amplitude vector of the convolution result of the signal of the image patch can be estimated from the signal intensity of the image patch. Therefore, normalizing these convolution results using signal intensity, i.e., dividing the convolution result by signal intensity, is advantageous because it reduces the difference between amplitudes.

[0189] Therefore, in one embodiment of a correspondence analyzer, it is generally considered that the computing unit normalizes at least one convolution result of the signals of one image patch, preferably all image patches, preferably all convolution results, using the signal intensity of each image patch, in particular a value correlated with the signal intensity of the signal of this image patch used in correspondence analysis.

[0190] In exemplary embodiments using digital images, signal intensity can be estimated using the second derivative obtained by comparing the image with itself using a correspondence function. Thus, using equations 30 and 29, the signal intensity can be determined as the square root of ACFL or ACFR from equation 35.
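The effect of normalizing convolution results by signal intensity can be illustrated as follows. Equation 35 is not reproduced in this text, so the root of the sum of squared features is used here as an assumed stand-in for the square root of ACFL or ACFR; the function names are hypothetical.

```python
import math


def signal_intensity(features):
    """Assumed intensity estimate: root of the sum of squared features."""
    return math.sqrt(sum(f * f for f in features))


def normalize(features, eps=1e-12):
    """Divide the features by their signal intensity (unchanged if near zero)."""
    s = signal_intensity(features)
    return [f / s for f in features] if s > eps else list(features)


def ssd(a, b):
    """Sum of squared differences between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


fl = [0.2, -0.5, 0.1, 0.7]
fr = [1.3 * f for f in fl]  # same patch seen with a 30% higher gain
raw = ssd(fl, fr)
normalized = ssd(normalize(fl), normalize(fr))
```

A pure amplitude difference between the two cameras leaves a non-zero SSD at the correct disparity, whereas the normalized features agree up to rounding error.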

[0191] In a further embodiment of the present invention, the computing unit is configured to normalize at least one, preferably all, of the features calculated from the image data of the left and right cameras by the respective signal intensity at corresponding points in the images of the cameras, and in particular to perform further calculations using the thus normalized features. The further calculations include, in particular, determining one or more minimum values of the correspondence function. This increases the similarity of the signals, improves the signal-to-noise ratio, and brings the relative minimum of the SSD closer to the target value of zero. The use of an approximate solution instead of the square root is also possible. Furthermore, when the features are normalized as described above and there are no other disturbances, the SSD″ converges to 1. This property can also be used in subsequent confidence analysis.

[Explanation of symbols]

[0192]
1 Correspondence analyzer
2 Stereo camera
3 Computing device
4 Subject
5 Camera sensor
6 Memory
8, 9 Lenses
10 Lens mount
11, 12 Eccentric elements
13 Screw
20, 21 Cameras
22 Imaging device
25, 26 Digital images
30 Adder
32 Delay unit
34, 35 Dual-port RAM
36, 37 DSP
98, 99 Epipoles
101 3D point
102 Epipolar plane
103, 106 Pixels
104, 105 Images
107 Epipolar line

Claims

1. A correspondence analyzer (1) for determining the disparity δ, which is the shift between corresponding pixels in two individual digital images (25, 26), comprising a computing unit (3), wherein the computing unit (3) is configured to perform the following:
- Selecting image patches from the two individual digital images (25, 26), wherein an image patch from one of the two individual digital images is selected as the reference image patch and a series of search image patches is selected from the other of the two individual digital images, and
- Generating a plurality of signals YL_signal,v from the reference image patch and a plurality of signals YR_signal,v from the search image patches, and
- Convolving, within a spatial window, the plurality of signals YL_signal,v of the reference image patch with an even convolution kernel comprising a weighted sum of a plurality of even harmonic functions of different spatial frequencies and with an odd convolution kernel comprising a weighted sum of a plurality of odd harmonic functions of different spatial frequencies, both kernels being stored in a memory (6), and
- Convolving, within the spatial window, the plurality of signals YR_signal,v of each of the search image patches with the convolution kernels stored in the memory (6), and
- Calculating, for each signal pair YL_signal,v and YR_signal,v, the difference between the respective convolution results, and
- Processing the differences of the convolution results non-linearly for each search image patch and accumulating them to obtain the function value of the correspondence function SSD(δ_p) at point δ_p, which indicates the distance of the search image patch from the reference image patch, or calculating, from the differences of the convolution results, the first derivative SSD'(δ_p) of the correspondence function SSD(δ_p) at point δ_p, thereby obtaining the function value of the correspondence function SSD(δ_p) or of its first derivative SSD'(δ_p) at point δ_p, and
- Determining the extreme values of the correspondence function SSD(δ_p) or the zero crossings of the first derivative SSD'(δ_p) of the correspondence function SSD(δ_p), and
- Outputting the point δ_p of one of the extreme values or of one of the zero crossings as the disparity δ, or calculating and outputting a subpixel-precision value of the disparity at said point δ_p.
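The processing chain of claim 1 can be illustrated with a minimal one-dimensional sketch. Everything concrete in it is an assumption made for illustration: the function names, the kernel frequencies and weights, the window size, the synthetic texture, and the use of plain squaring as the non-linear step. It is not the kernel design of the patent, only an instance of the general scheme (even and odd harmonic kernels, differences of kernel responses, accumulation into SSD(δ_p), search for the minimum).

```python
import math

def harmonic_kernels(half_width, freqs, weights):
    """Return (even, odd) kernels: weighted sums of cosines and of sines."""
    xs = range(-half_width, half_width + 1)
    even = [sum(w * math.cos(2 * math.pi * f * x) for f, w in zip(freqs, weights))
            for x in xs]
    odd = [sum(w * math.sin(2 * math.pi * f * x) for f, w in zip(freqs, weights))
           for x in xs]
    mean = sum(even) / len(even)
    return [e - mean for e in even], odd  # even kernel made mean-free

def window_response(signal, centre, kernel):
    """Kernel response in the window centred on `centre` (correlation form;
    the sign flip of a true convolution cancels in the squared difference)."""
    half = len(kernel) // 2
    return sum(k * signal[centre - half + i] for i, k in enumerate(kernel))

def ssd_curve(left_rows, right_rows, centre, kernels, disparities):
    """SSD(d), accumulated over all signals v (rows) and all kernels."""
    curve = []
    for d in disparities:
        total = 0.0
        for yl, yr in zip(left_rows, right_rows):
            for kernel in kernels:
                diff = (window_response(yl, centre, kernel)
                        - window_response(yr, centre - d, kernel))
                total += diff * diff  # squaring is the non-linear step here
        curve.append(total)
    return curve

def best_disparity(curve, disparities):
    """Disparity candidate at the global minimum of the SSD curve."""
    p = min(range(len(curve)), key=lambda i: curve[i])
    return disparities[p]

# Synthetic test data (assumed): the right image is the left image shifted by 3 px.
def texture(x):
    return math.sin(0.3 * x) + 0.5 * math.cos(0.11 * x + 1.0)

true_shift = 3
left_rows = [[texture(x + 17 * v) for x in range(64)] for v in range(2)]
right_rows = [[texture(x + true_shift + 17 * v) for x in range(64)] for v in range(2)]
kernels = harmonic_kernels(6, freqs=[1 / 13, 2 / 13], weights=[1.0, 0.5])
disparities = list(range(8))
curve = ssd_curve(left_rows, right_rows, 32, kernels, disparities)
print(best_disparity(curve, disparities))
```

The sketch evaluates every candidate by brute force; the claim leaves open how the search image patches are ordered and whether the minimum or the zero crossing of SSD'(δ_p) is used.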

2. The correspondence analyzer (1) according to claim 1, wherein at least one of the following applies:
- In the signal model of each signal v within the spatial frequency range, the k_max convolution operations with even functions and the l_max convolution operations with odd functions each transmit a sum of a group of weighted signal components of the spatial frequencies having amplitudes A_m, and the convolution kernels are selected such that two partial sums are obtained in the correspondence function SSD(δ) for each signal v and each spatial frequency of index m, the two partial sums comprising a first term characterized by the squared amplitude A_m² from the results of the convolution operations with the even functions and a second term characterized by the squared amplitude A_m² from the results of the convolution operations with the odd functions, the first and second terms being combinable according to the Pythagorean trigonometric identity such that the sum of the two partial sums, SSD_inv(δ), does not depend on the subject phase Δ_m.
- In determining the disparity, the convolution kernels are selected such that a local standard deviation of the disparity measurement of less than 0.2 pixels is achieved, in particular for a planar subject having an intensity modulation along the direction of the epipolar line that includes spatial frequencies within the spatial frequency range, or a planar subject having a corresponding texture, even when the shift of the subject along the epipolar line at a constant distance Z from the camera is 0.1 pixels.

3. The correspondence analyzer (1) according to claim 1 or 2, comprising at least one of the following features:
- In the signal model of each signal v within the spatial frequency range, the k_max convolution operations with even functions and the l_max convolution operations with odd functions each transmit a sum of a group of weighted signal components of the spatial frequencies having amplitudes A_m, and the convolution kernels are selected such that two terms are obtained in the correspondence function SSD(δ) for each signal v and each spatial frequency of index m, the first term being the product of the squared amplitude A_m², a first constant, and the square of a sine function, and the second term being the product of the squared amplitude A_m², a second constant, and the square of a cosine function, wherein the values of the first and second constants are equal or equal within a tolerance of ±20%.
- At least one, preferably all, of the convolution kernels comprise a weighting function suitable for incorporating information from different parts of the image patch to different degrees in the correspondence analysis, particularly in the determination of the disparity.
- At least one filter kernel comprises a weighting function that weights portions of the image patch such that portions close to the centroid of the image patch are weighted more strongly than portions further from the centroid.
- The computing unit (3) is configured to select a weighting function based on image characteristics, in particular on the signal-to-noise ratio, or on jumps in depth information in the vicinity of or within the image patch, wherein the jumps are determined from previous measurements or from values deemed reasonable.
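Why sine-squared and cosine-squared terms with equal constants remove the dependence on the subject phase can be seen from an idealized single-frequency illustration. The symbols ω_m (angular spatial frequency), ε (residual shift between the search patch and the true disparity), and C_1, C_2 (the two constants) are assumptions introduced here, and the model assumes that the even kernel responds with the cosine and the odd kernel with the sine of the local phase. It is a sketch, not the derivation given in the patent.

```latex
\begin{align*}
D_{\mathrm{even}} &= \sqrt{C_1}\,A_m\bigl[\cos\Delta_m - \cos(\Delta_m - \omega_m\varepsilon)\bigr]
  = -2\sqrt{C_1}\,A_m \sin\!\bigl(\Delta_m - \tfrac{\omega_m\varepsilon}{2}\bigr)\sin\!\bigl(\tfrac{\omega_m\varepsilon}{2}\bigr),\\
D_{\mathrm{odd}} &= \sqrt{C_2}\,A_m\bigl[\sin\Delta_m - \sin(\Delta_m - \omega_m\varepsilon)\bigr]
  = 2\sqrt{C_2}\,A_m \cos\!\bigl(\Delta_m - \tfrac{\omega_m\varepsilon}{2}\bigr)\sin\!\bigl(\tfrac{\omega_m\varepsilon}{2}\bigr),\\
D_{\mathrm{even}}^2 + D_{\mathrm{odd}}^2 &= 4A_m^2 \sin^2\!\bigl(\tfrac{\omega_m\varepsilon}{2}\bigr)
  \Bigl[C_1 \sin^2\!\bigl(\Delta_m - \tfrac{\omega_m\varepsilon}{2}\bigr) + C_2 \cos^2\!\bigl(\Delta_m - \tfrac{\omega_m\varepsilon}{2}\bigr)\Bigr],\\
C_1 = C_2 = C \;&\Rightarrow\; D_{\mathrm{even}}^2 + D_{\mathrm{odd}}^2 = 4\,C\,A_m^2 \sin^2\!\bigl(\tfrac{\omega_m\varepsilon}{2}\bigr).
\end{align*}
```

With equal constants the bracket collapses to C by the Pythagorean identity, so the contribution depends only on the amplitude and the residual shift ε and vanishes at ε = 0. If the constants differ, a phase-dependent remainder proportional to their difference is left over, which is consistent with the claim bounding the mismatch.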

4. The correspondence analyzer (1) according to any one of claims 1 to 3, wherein the computing unit (3) is configured to generate the v_max signals YL_signal,v from the reference image patch by performing convolution operations on the data of the reference image patch approximately perpendicular to the epipolar line, and to generate the v_max signals YR_signal,v from each of the search image patches by performing convolution operations on the data of each of the search image patches approximately perpendicular to the epipolar line, wherein the convolution operations that generate the signals YL_signal,v and YR_signal,v, the k_max convolution operations with the even functions, and the l_max convolution operations with the odd functions are selected such that:
- In the signal model, the latter convolution operations each transmit a sum of a plurality of weighted signal components of a plurality of spatial frequencies, which are represented by different values of the index m,
- For each signal, a first partial sum consisting of terms independent of the subject phase Δ_m and a second partial sum consisting of terms that depend on the subject phase Δ_m are obtained in the correspondence function SSD(δ),
- When the first partial sums of the v_max individual signals YL_signal,v and YR_signal,v are accumulated, a constructive sum is obtained in which the individual terms do not cancel each other out, and
- When the second partial sums of the v_max individual signals YL_signal,v and YR_signal,v are accumulated, a statistical sum is obtained in which these noise components statistically cancel each other out at least partially.

5. The correspondence analyzer (1) according to any one of claims 1 to 4, wherein:
- The signal form of the even convolution kernels is approximated by a Fourier series having the Fourier coefficients c_k,n, and the signal form of the odd convolution kernels is approximated by a Fourier series having the Fourier coefficients s_l,n, where n is the index of each of the spatial frequencies of the respective Fourier series,
- For a profile vector g_m indicating each of the transmitted spatial frequencies m and their corresponding weights, the Fourier coefficients c_k,n and s_l,n are the solution to the following nonlinear equation, where k_max represents the number of even convolution kernels and l_max represents the number of odd convolution kernels: [equation not reproduced]
- When there are four values for each of the indices m and n, the coefficients AEV_n,m and AOD_n,m are determined by the following matrices or deviate from each of the values of these matrices by a factor of 0.8 to 1.2: [matrices not reproduced]

6. The correspondence analyzer (1) according to any one of claims 1 to 5, wherein the first derivative SSD'(δ_p) of the correspondence function SSD(δ_p) is determined using the following relationship: [equation not reproduced] Here, δ_(p-1) is the disparity preceding δ_p in the order of the search image patches, FL_u,v is the result of convolving the signal YL_signal,v with the convolution kernel of index u from a set of u_max convolution kernels, and FR_u,v(δ) is the result of convolving the signal YR_signal,v of the search image patch having disparity δ with the convolution kernel of index u.

7. The correspondence analyzer (1) according to any one of claims 1 to 6, wherein the computing unit (3) is configured to determine, using one of the following relationships, the subpixel-precision value δ_sub of the group disparity δ near the extreme value, or near the zero crossing of the first derivative SSD'(δ_p) of the correspondence function SSD(δ_p), at the location of the search image patch having the disparity δ_p: [equations not reproduced] Here, δ_(p-1) is the disparity preceding δ_p and δ_(p+1) is the disparity following δ_p in the order of the search image patches, and δ_sub is output as the disparity δ.
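The relationships of claim 7 are not reproduced in this text. As a stand-in, the following sketch shows two widely used refinements that fit the same inputs: a three-point parabola fit around the minimum of SSD, and linear interpolation of the zero crossing of SSD' between two neighboring candidates. Both are generic textbook techniques substituted here for illustration, and the function names are hypothetical; they should not be read as the formulas of the patent.

```python
def parabolic_subpixel(d_prev, d_curr, d_next, s_prev, s_curr, s_next):
    """Vertex of the parabola through three equally spaced SSD samples."""
    denom = s_prev - 2.0 * s_curr + s_next
    if denom <= 0.0:
        # Not a strict local minimum: fall back to the integer candidate.
        return float(d_curr)
    step = 0.5 * (d_next - d_prev)  # spacing between neighboring candidates
    return d_curr + step * 0.5 * (s_prev - s_next) / denom

def zero_crossing_subpixel(d_prev, d_curr, g_prev, g_curr):
    """Zero of SSD' by linear interpolation between two candidates,
    where g_prev and g_curr are the derivative values at d_prev and d_curr."""
    if g_prev == g_curr:
        return float(d_curr)
    return d_prev + (d_curr - d_prev) * g_prev / (g_prev - g_curr)

# Samples of SSD(d) = (d - 3.3)^2 and of its derivative 2 * (d - 3.3).
print(parabolic_subpixel(2, 3, 4, 1.69, 0.09, 0.49))
print(zero_crossing_subpixel(3, 4, -0.6, 1.4))
```

For an exactly quadratic correspondence function both variants recover the true minimum; for other shapes they are approximations whose bias depends on the curve.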

8. The correspondence analyzer (1) according to any one of claims 1 to 7, comprising a computing unit (3) configured to select an image patch from each of the two individual digital images (25, 26), wherein at least one image patch from one of the two individual digital images is selected as the reference image patch, the search image patches are selected from the other individual digital image, and a plurality of candidate disparity values is calculated from the image patches, wherein the computing unit (3) is further configured to select information from the reference image patch and the search image patches and, based on this information, to select a confidence vector for the possible disparity values that is suitable for estimating whether each result indicates an actual correspondence between the reference image patch and the respective search image patch.

9. The correspondence analyzer (1) according to any one of claims 1 to 8, wherein the computing unit (3) is configured to generate a list of candidate disparity values for a specific reference image patch, preferably to select a confidence vector for each candidate, and, based on the confidence vector and/or other selection criteria, to select all or some of the candidates as valid or to consider none of the candidates valid for the specific reference image patch.

10. The correspondence analyzer (1) according to claim 9, wherein the computing unit (3) is configured to select a value for at least one element of the confidence vector, for reference image patches and search image patches classified into at least several categories, using a function that classifies candidates as valid or invalid with a higher probability than when the correspondence function SSD(δ_p) is used alone.

11. The correspondence analyzer (1) according to claim 9 or 10, wherein the computing unit (3) is configured to select the values of multiple elements of the confidence vector using one or more of the following as features:
- The relationship or difference between the correspondence function SSD(δ_p) of the candidate at point δ_p and a threshold derived from the extreme values of the correspondence function SSD(δ_p) of all candidates for the reference image patch,
- The gray value relationship between a portion of the reference image patch and the respective portion of the search image patch, or features derived from the difference in gray values,
- The color relationship between a portion of the reference image patch and the respective portion of the search image patch, or features derived from the color difference,
- The relationship between the signal intensity of the respective search image patch and the signal intensity of the reference image patch,
- The normalized cross-correlation coefficient between a portion of the data from the reference image patch and a portion of the data from the respective search image patch, in each case approximately perpendicular to the epipolar line, wherein noise is preferably avoided by a minor low-pass filtering along the epipolar line.

12. The correspondence analyzer (1) according to claim 10 or 11, wherein the computing unit (3) is configured to make the list of candidates, preferably only the valid candidates, and more preferably together with their respective confidence vectors, available to the user of the correspondence analyzer.

13. The correspondence analyzer (1) according to claim 5, wherein a plurality of correspondence functions SSD(δ_p) with different parameterizations, together with their even and odd convolution kernels and preferably the profile vector g_m corresponding to each of them, are either stored in the correspondence analyzer (1) or determined at runtime, and wherein the correspondence analyzer (1) is further configured to select some of the plurality of correspondence functions SSD(δ_p) and their convolution kernels based on available classifications of the individual digital images or the image patches, or based on classifications of the individual digital images or the image patches that are advantageous for further processing.

14. The correspondence analyzer (1) according to claim 13, wherein, for the parameters of at least one correspondence function SSD(δ_p) and its convolution kernels, the weighting coefficient of the respective profile vector g_m for the highest spatial frequency is selected to be smaller than at least one of the other weighting coefficients of the profile vector g_m.

15. The correspondence analyzer (1) according to any one of claims 1 to 14, wherein the classification or the profile vectors on the basis of which a plurality of correspondence functions SSD(δ_p) and their convolution kernels are selected is determined from the power spectrum of the data of the individual digital images or the image patches, preferably taking the optical transfer function into account.

16. The correspondence analyzer (1) according to any one of claims 1 to 15, wherein the correspondence analysis is performed using two or more correspondence functions SSD(δ_p) and convolution kernels with different parameterizations, and the computing unit (3) combines two or more of the results obtained or selects a partial result from these results, preferably based on the determined confidence vector.

17. The correspondence analyzer (1) according to claim 16, wherein the computing unit (3) is configured to use a disparity value determined or estimated by a correspondence analysis using a first correspondence function SSD(δ_p) from among the two or more correspondence functions SSD(δ_p) in order to predict the result of, or to control, the correspondence analysis using a second correspondence function SSD(δ_p), wherein the second correspondence function SSD(δ_p) transmits signal components of higher frequency from the image patch than the first correspondence function SSD(δ_p) by means of appropriately selected parameters or convolution functions.

18. The correspondence analyzer (1) according to any one of claims 1 to 16, wherein the computing unit (3) is configured to filter at least one of the following using a low-pass filter:
- The calculated disparity values,
- The confidence values,
- The disparity values weighted by the confidence values.
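One simple way to low-pass filter disparity values weighted by confidence is a confidence-weighted moving average. The sketch below assumes scalar confidence values in place of the confidence vector, a one-dimensional sequence of measurements, and a box window; these choices and the function name are illustrative assumptions, not details taken from the claim.

```python
def confidence_weighted_smooth(disparities, confidences, radius=1):
    """Moving average in which each disparity counts in proportion to its
    confidence. Returns None where no neighbor carries any confidence."""
    smoothed = []
    n = len(disparities)
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        weight = sum(confidences[lo:hi])
        if weight == 0:
            smoothed.append(None)
            continue
        weighted = sum(c * d for c, d in zip(confidences[lo:hi], disparities[lo:hi]))
        smoothed.append(weighted / weight)
    return smoothed

# The outlier at index 2 has zero confidence and is replaced by its neighbors.
print(confidence_weighted_smooth([10.0, 10.0, 50.0, 10.0, 10.0], [1, 1, 0, 1, 1]))
```

Weighting by confidence lets reliable neighbors fill in an unreliable measurement instead of being pulled towards it, which an unweighted low-pass filter would not achieve.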

19. The correspondence analyzer (1) according to any one of claims 1 to 18, wherein k_max is equal to 2 and the even convolution kernels comprise the functions f_even,1 and f_even,2 given by the following equations: [equations not reproduced] and l_max is equal to 2 and the odd convolution kernels comprise the functions f_odd,1 and f_odd,2 given by the following equations: [equations not reproduced] wherein off_even,1 and off_even,2 are selected such that the even convolution kernels are approximately mean-free, and at least one of the coefficients 3.4954, 0.7818, 4.9652, 1.8416, 4.0476, 0.2559, 6.0228, or 0.0332 may be larger or smaller by up to 10%.

20. The correspondence analyzer (1) according to any one of claims 1 to 19, wherein the computing unit (3) is configured to perform averaging for the reference image patch, in particular to calculate the arithmetic mean or weighted mean of the values of the correspondence function SSD(δ_p) of the reference image patch and the values of the correspondence function SSD(δ_p) of a plurality of other reference image patches, in particular adjacent reference image patches, and is further configured to process the averaged correspondence function SSD(δ_p) in the same manner as described in claims 1 to 19, in particular to calculate and output the subpixel-precision value of the disparity at point δ_p.
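The averaging of claim 20 can be sketched as an element-wise weighted mean of the SSD curves of neighboring reference image patches, evaluated on the same list of disparity candidates. The function name and the example weights are assumptions for illustration.

```python
def average_ssd(curves, weights=None):
    """Element-wise arithmetic or weighted mean of several SSD curves that
    were sampled at the same disparity candidates."""
    if weights is None:
        weights = [1.0] * len(curves)  # plain arithmetic mean
    total = sum(weights)
    return [sum(w * curve[p] for w, curve in zip(weights, curves)) / total
            for p in range(len(curves[0]))]

centre_patch = [4.0, 1.0, 3.0]     # SSD values of the reference patch
neighbor_patch = [2.0, 1.0, 5.0]   # SSD values of an adjacent patch
print(average_ssd([centre_patch, neighbor_patch]))
print(average_ssd([centre_patch, neighbor_patch], weights=[3.0, 1.0]))
```

The averaged curve can then be passed to the same minimum search and subpixel refinement as a single curve.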

21. The correspondence analyzer (1) according to any one of claims 1 to 20, wherein the computing unit (3) is configured to normalize at least one, preferably all, of the convolution results of at least one, preferably all, of the signals of at least one, preferably all, of the image patches with a value corrected by the signal intensity of the respective image patch, in particular the signal intensity of the signal of the image patch used in the correspondence analysis.
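The effect of normalizing by signal intensity can be illustrated as follows. The sketch assumes that the signal intensity of a patch is taken as the root of the summed squared convolution results; the text does not define the intensity measure at this point, so this choice, the function names, and the example values are assumptions.

```python
import math

def signal_intensity(features):
    """Assumed intensity measure: root of the summed squared features."""
    return math.sqrt(sum(f * f for f in features))

def normalize_features(features, eps=1e-12):
    """Divide the features by the signal intensity of their patch."""
    intensity = signal_intensity(features)
    if intensity < eps:
        return [0.0] * len(features)  # textureless patch: nothing to normalize
    return [f / intensity for f in features]

left = [3.0, 4.0]    # convolution results of the reference patch
right = [6.0, 8.0]   # same texture seen with twice the gain
raw_ssd = sum((a - b) ** 2 for a, b in zip(left, right))
norm_ssd = sum((a - b) ** 2
               for a, b in zip(normalize_features(left), normalize_features(right)))
print(raw_ssd, norm_ssd)
```

Without normalization the gain difference alone produces a nonzero SSD at the correct disparity; after normalization the two feature sets coincide, so the minimum moves towards zero.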

22. A stereo camera (2) comprising two cameras (21, 22), each of the two cameras (21, 22) having a camera sensor (5) and lenses (8, 9), wherein the optical centers of the lenses (8, 9) in the camera sensor (5) are spaced apart from each other by the width of the base B, the stereo camera (2) comprising the correspondence analyzer (1) according to any one of claims 1 to 21.
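For a rectified stereo pair, the disparity delivered by the correspondence analyzer relates to the distance Z through the standard pinhole relation Z = f · B / δ. The sketch below applies this relation; the focal length in pixels and the function name are assumptions not stated in the claim, and lens distortion and rectification errors are ignored.

```python
def depth_from_disparity(disparity_px, focal_length_px, base_width):
    """Distance Z of a point from a rectified stereo pair: Z = f * B / disparity.
    Z is returned in the unit of base_width."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite distance")
    return focal_length_px * base_width / disparity_px

# Assumed example values: f = 1000 px, B = 0.1 m, disparity = 20 px.
print(depth_from_disparity(20.0, 1000.0, 0.1))
```

Because Z is inversely proportional to the disparity, a fixed disparity error causes a distance error that grows with the square of the distance, which is why subpixel precision matters for distant subjects.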

23. The stereo camera (2) according to claim 22, wherein one of the lenses (8, 9) is held in an adjustable eccentric, and by rotating the lens (8, 9) within the eccentric in front of the test image, the coplanarity error can be corrected and the coplanarity of the optical axes of the lens (8, 9) can be adjusted.

24. The stereo camera (2) according to claim 22 or 23, wherein the stereo camera is configured to additionally evaluate the disparity of corresponding image patches in a direction substantially perpendicular to the epipolar line in order to correct coplanarity alignment errors during execution, and corrects the mean deviation of the disparity from zero, i.e., the deviation from the ideal epipolar geometry, by shifting one of the images substantially perpendicular to the epipolar line in the reverse direction by correcting the rectification parameter.

25. The stereo camera (2) according to any one of claims 22 to 24, wherein the computing unit (3) of the correspondence analyzer (1) is configured to normalize at least one, preferably all, of the features calculated from the image data of the left camera and the right camera by the respective signal intensity of the cameras.

26. A method for determining the disparity of corresponding pixels in two individual digital images (25, 26), preferably rectified to the stereo normal case, using a correspondence analyzer (1) according to any one of claims 1 to 19, wherein, in order to determine the disparity δ, which is the shift between corresponding pixels in the two individual digital images (25, 26), the computing unit (3) performs the following:
- A step of selecting image patches from the two individual digital images (25, 26), wherein an image patch from one of the two individual digital images is selected as the reference image patch and a series of search image patches is selected from the other of the two individual digital images, and
- A step of generating v_max signals YL_signal,v from the reference image patch and v_max signals YR_signal,v from the search image patches, and
- A step of convolving, within a spatial window, the signals YL_signal,v of the reference image patch with an even convolution kernel comprising a weighted sum of a plurality of even harmonic functions of different spatial frequencies and with an odd convolution kernel comprising a weighted sum of a plurality of odd harmonic functions of different spatial frequencies, both kernels being stored in a memory (6), and
- A step of convolving, within the spatial window, the signals YR_signal,v of each of the search image patches with the convolution kernels stored in the memory (6), and
- A step of calculating, for each signal pair YL_signal,v and YR_signal,v, the difference between the respective convolution results, and
- A step of processing the differences of the convolution results non-linearly for each search image patch and accumulating them to obtain the function value of the correspondence function SSD(δ_p) at point δ_p, which indicates the distance of the search image patch from the reference image patch, or calculating, from the differences of the convolution results, the first derivative SSD'(δ_p) of the correspondence function SSD(δ_p) at point δ_p, thereby obtaining the function value of the correspondence function SSD(δ_p) or of its first derivative SSD'(δ_p) at point δ_p, and
- A step of determining the extreme values of the correspondence function SSD(δ_p) or the zero crossings of the first derivative SSD'(δ_p) of the correspondence function SSD(δ_p), and
- A step of outputting the point δ_p of one of the extreme values or of one of the zero crossings as the disparity δ, or a step of calculating and outputting the subpixel-precision value of the disparity at the point δ_p.