Stereo camera device

The stereo camera device adjusts pixel values and exposure settings to maintain low-light performance and image contrast across cameras with different focal lengths, ensuring accurate stereo matching and image processing.

JP2026108925A · Pending · Publication Date: 2026-07-01 · ASTEMO LTD

Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: ASTEMO LTD
Filing Date: 2024-12-19
Publication Date: 2026-07-01

AI Technical Summary

Technical Problem

Existing stereo camera systems with cameras of different focal lengths and aperture values face challenges in maintaining low-light performance and image contrast due to unequal incident light amounts, leading to issues in stereo matching.

Method used

The stereo camera device employs gradation and tone conversion units to adjust pixel values and exposure settings, ensuring consistent brightness sensitivity across cameras, thereby maintaining low-light performance and image contrast.

Benefits of technology

This approach allows for accurate stereo matching by preserving image contrast and low-light capabilities, enhancing the system's ability to capture and process images effectively.

✦ Generated by Eureka AI based on patent content.

Figure 2026108925000001_ABST

Abstract

[Problem] To retain the low-light performance advantage of the camera that receives more incident light while also preserving image contrast when the gradation of the two images is aligned for stereo matching. [Solution] A stereo camera device comprising a first camera 110, a second camera 120, an exposure amount calculation unit 204 that calculates the exposure amount of an image, and a tone control unit 207 that determines, based on the calculated exposure amount, the tone conversion curve used by the second camera 120. The tone control unit 207 selects the tone conversion curve used to convert the pixel information of an image captured by the second camera 120 from a plurality of tone conversion curves. These include a second tone conversion curve having the same characteristics as a first tone conversion curve used by the first camera 110, and a third tone conversion curve that has a steeper slope than the second tone conversion curve in the region where the amount of light is less than the amount of light at which the pixel value reaches its maximum on the first tone conversion curve.

Description

Technical Field

[0001] The present invention relates to an image processing apparatus, and more particularly to a stereo camera apparatus that performs image processing using images captured by a plurality of cameras.

Background Art

[0002] In recent years, stereo camera apparatuses have become widespread as apparatuses for measuring three-dimensional objects from images. Such an apparatus is equipped with two cameras and can measure the distance to an object and the size of the object using the parallax between the images captured by each camera.

[0003] In a stereo camera apparatus, correct measurement may not be possible if, for example, the shooting conditions of the two cameras differ. As a technique related to this problem, Patent Document 1 discloses making the luminance distribution ranges of the images uniform when performing stereo matching using images captured by two cameras with different angles of view. Further, Patent Document 2 discloses a technique for aligning the pixel values of images when the sensitivities of the images captured by two cameras are different.

Prior Art Documents

Patent Documents

[0004]

Patent Document 1

Patent Document 2

Summary of the Invention

Problems to be Solved by the Invention

[0005] Patent Document 1 describes a method for aligning the brightness distribution range by calculating the ratio of the maximum brightness value of one image to the maximum brightness value of the other image, and multiplying all brightness values of the former by this ratio. However, if the image before correction lacks gradation and has lost contrast, it is difficult to restore the contrast necessary for stereo matching.

[0006] Furthermore, generally speaking, when the aperture diameters of the camera lenses are the same, a camera with a shorter focal length and wider angle of view receives more incident light at the image sensor pixels than a camera with a longer focal length and narrower angle of view, which gives it an advantage in low-light imaging. If the lenses of the two cameras in a stereo camera device have different focal lengths and therefore different incident light amounts, setting the brightness sensitivity of both cameras so that the same brightness yields the same pixel value may cause one of two problems: in the camera with the longer focal length, gradation is lost in low light and blacks are crushed; or, in the camera with the shorter focal length, signals with low pixel values are buried in noise, so that its low-light advantage cannot be exploited. In other words, it becomes difficult both to maintain gradation in low light for the camera with the longer focal length and to exploit the low-light advantage of the camera with the shorter focal length.

[0007] In view of the problems in the prior art described above, the object of the present invention is to achieve both the maintenance of the low-light performance advantage of the camera with a higher amount of incident light and the maintenance of image contrast when matching the gradation between images in stereo matching, in a stereo camera device using two cameras with different amounts of incident light to the image sensor.

Means for Solving the Problem

[0008] To solve the above-mentioned problems, the stereo camera device of the present invention, in one preferred embodiment, comprises: a first camera including a first lens, a first image sensor that converts light incident through the first lens into electrical pixel information corresponding to the amount of that light and outputs it, and a first tone conversion unit configured to convert the pixel information output from the first image sensor using preset first tone conversion information and to acquire a first image having the pixel values obtained by the conversion; a second camera including a second lens having a smaller aperture value than the first lens, a second image sensor that converts light incident through the second lens into electrical pixel information corresponding to the amount of that light and outputs it, and a second tone conversion unit configured to convert the pixel information output from the second image sensor using tone conversion information selected from a plurality of pieces of tone conversion information, including preset second tone conversion information having characteristics similar to the first tone conversion information, and to acquire a second image having the pixel values obtained by the conversion; an exposure amount calculation unit configured to acquire the first image and the second image and to obtain the exposure amount of a predetermined imaging area of each; a tone control unit configured to select, based on the exposure amount calculated by the exposure amount calculation unit, the tone conversion information to be used by the second tone conversion unit from the plurality of pieces of tone conversion information; and a stereo recognition unit configured to perform image recognition based on the stereo field of view captured in both the first image and the second image.
Preferably, the plurality of pieces of tone conversion information includes third tone conversion information for performing a conversion such that, in the region where the amount of light is less than the amount of light corresponding to the maximum pixel value that an image acquired using the first tone conversion information can take, the change in pixel value per change in the amount of light is greater than or equal to that in an image acquired using the second tone conversion information.

Effects of the Invention

[0009] According to the present invention, in a stereo camera device using two cameras with different amounts of incident light to the image sensor, it is possible to maintain the advantage of the low-light performance of the camera with a higher amount of incident light, while suppressing the occurrence of insufficient image contrast when matching the gradation between images in stereo matching.

Brief Description of the Drawings

[0010]
[Figure 1] This is a schematic block diagram showing the configuration of one embodiment of a stereo camera device.
[Figure 2] This is a schematic diagram illustrating the conversion method of grayscale conversion curves used in conventional technology.
[Figure 3] This is a schematic diagram illustrating the third grayscale conversion curve set for the second camera 120.
[Figure 4] This is a schematic diagram illustrating the fourth grayscale conversion curve set for the second camera 120.
[Figure 5] This is a schematic diagram illustrating the fifth tone conversion curve set for the second camera 120.
[Figure 6] This flowchart outlines the processing flow performed by the image recognition processing unit 200.
[Figure 7] This is a table showing examples of gradation conversion curve settings when using a camera where the bit depth or frame rate of each pixel cannot be changed.
[Figure 8] This is a table showing examples of gradation conversion curve settings when using a camera that allows changing the bit depth and frame rate for each pixel.
[Figure 9] This is a schematic block diagram showing the system configuration for calibrating sensitivity correction values.
[Figure 10] This flowchart outlines the process performed during the calibration of sensitivity correction values.
[Figure 11] This is a schematic diagram illustrating an example of the relationship between a brightness chart and the pixel values of an image obtained by imaging the brightness chart.

Modes for Carrying Out the Invention

[0011] Hereinafter, representative embodiments of the present invention will be described with reference to the drawings. Note that the embodiments and drawings described below are examples for explaining the present invention, and for the sake of clarity of explanation, appropriate omissions or simplifications are made. Also, note that in order to facilitate understanding of the invention, the positions, sizes, shapes, ranges, etc. of the respective components shown in the drawings may not necessarily represent them accurately.

[0012] FIG. 1 is a schematic block diagram showing the configuration in an embodiment of a stereo camera device to which the present invention is applied.

[0013] The stereo camera device 10 includes an image recognition processing unit 200, a first camera 110, and a second camera 120.

[0014] The image recognition processing unit 200 is connected to each of the first camera 110 and the second camera 120 so as to be able to exchange information with them, controls the first camera 110 and the second camera 120, and acquires images captured by the first camera 110 and the second camera 120. The image recognition processing unit 200 is also connected to an external device (not shown), exchanges information with the connected external device, and outputs the captured images together with the results of recognition processing performed on the images captured by the first camera 110 and the second camera 120.

[0015] The first camera 110 includes a lens 111, an image sensor 112, a tone conversion unit 113, and an interface unit (IF unit) 114. The second camera 120 also has the same configuration as the first camera 110 and includes a lens 121, an image sensor 122, a tone conversion unit 123, and an IF unit 124.

[0016] Lenses 111 and 121 form images of the incident light on image sensors 112 and 122, respectively. Lens 121 has a shorter focal length and a smaller F-number than lens 111. The F-number is an index indicating the brightness of a lens, obtained by dividing the focal length by the effective lens aperture. The smaller the value, the greater the amount of light obtained on the light-receiving surface of the image sensor. If the lens apertures are the same, a lens with a longer focal length has a larger F-number, and the amount of light obtained by the image sensor is smaller.
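
The relationship described above can be illustrated with a minimal sketch. It takes the F-number as focal length divided by effective aperture diameter and assumes that image-plane illuminance scales with the inverse square of the F-number (a standard optical approximation, not something stated in this specification). The focal lengths and aperture diameter are hypothetical values chosen only for illustration.

```python
def f_number(focal_length_mm: float, aperture_diameter_mm: float) -> float:
    """F-number N = focal length / effective aperture diameter."""
    return focal_length_mm / aperture_diameter_mm


def relative_illuminance(n: float) -> float:
    """Image-plane illuminance relative to an f/1 lens (assumed proportional to 1/N^2)."""
    return 1.0 / (n * n)


# Hypothetical lenses sharing the same 3 mm effective aperture.
n_long = f_number(12.0, 3.0)   # longer focal length, in the role of lens 111
n_short = f_number(6.0, 3.0)   # shorter focal length, in the role of lens 121
light_ratio = relative_illuminance(n_short) / relative_illuminance(n_long)
print(n_long, n_short, light_ratio)
```

Under these assumptions the shorter lens has half the F-number and collects four times the light per unit sensor area.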

[0017] Image sensors 112 and 122 have a plurality of pixels arranged in an array on their surfaces. Each pixel of image sensors 112 and 122 outputs a voltage value correlated with the amount of light incident on the pixel, whereby a pixel signal with an intensity corresponding to the luminance of the imaging target can be obtained. Image sensors 112 and 122 have an amplifier for amplifying the pixel signal acquired by each pixel. The gain of the amplifier can be changed, and the intensity of the pixel signal changes according to the gain value. The pixel signal amplified by the amplifier is converted into a digital value by an analog-to-digital converter and is acquired as, for example, 24-bit pixel information.

[0018] The gradation conversion units 113 and 123 correct the pixel information of each pixel so that the sensitivity of each pixel in the image acquired by the image sensors 112 and 122 matches the reference sensitivity. They then convert the pixel information, according to a gradation conversion curve set in advance as gradation conversion information, into image information whose pixels are represented by digital values with fewer bits than the pixel information acquired by the image sensors 112 and 122, such as 12 bits or 16 bits. In this specification, the pixel information after conversion by the gradation conversion units 113 and 123 is referred to as the pixel value, to distinguish it from the pixel information output from the image sensors 112 and 122. The image information can be used to display an image on a display device such as that of a computer, and the pixel value of the image information corresponds to the brightness of each pixel in the displayed image. For ease of understanding, image information is simply referred to as the image in this specification. In the following, the number of bits in the pixel value is described as the bit depth of the pixel brightness in the image represented by the acquired image information.

[0019] The reference sensitivity is the sensitivity of pixels at positions around the optical axis of the image sensors 112 and 122. The grayscale conversion units 113 and 123 correct the sensitivity of each pixel so that it matches the reference sensitivity, taking into account the sensitivity differences between pixels in the image sensors 112 and 122 and the effect of peripheral light falloff of the lenses 111 and 121. The sensitivity correction value used to correct the sensitivity of each pixel is set in advance by calibration.

[0020] The gradation conversion curve is represented graphically, with the horizontal axis representing the intensity of the pixel signal and the vertical axis representing the output pixel value. For example, using a hybrid log-gamma as the gradation conversion curve allows for enriching the gradation in dark areas and reducing the gradation in bright areas, thereby maintaining contrast while converting to a low-bit pixel value. In this embodiment, the conversion curve is stored in the gradation conversion units 113 and 123 in the form of a lookup table that holds the output low-bit pixel value corresponding to the input 24-bit pixel information.

[0021] By representing the grayscale conversion curve with, for example, a piecewise linear function, the amount of data transferred during setup and the storage capacity required for the lookup table can be reduced.
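
As a rough illustration of paragraphs [0020] and [0021], the sketch below approximates a hybrid log-gamma curve with a small set of knots and converts 24-bit pixel information to 12-bit pixel values by linear interpolation between them. The curve constants are those of the HLG OETF in ITU-R BT.2100; the number of knots, their spacing, the 12-bit output depth, and all function names are assumptions made for this example and are not taken from the specification.

```python
import math
from bisect import bisect_right

# HLG OETF constants from ITU-R BT.2100.
HLG_A = 0.17883277
HLG_B = 1.0 - 4.0 * HLG_A
HLG_C = 0.5 - HLG_A * math.log(4.0 * HLG_A)


def hlg_oetf(e: float) -> float:
    """Map normalized linear light e in [0, 1] to a normalized signal in [0, 1]."""
    if e <= 1.0 / 12.0:
        return math.sqrt(3.0 * e)
    return HLG_A * math.log(12.0 * e - HLG_B) + HLG_C


def build_knots(num_knots: int = 33, out_bits: int = 12):
    """Sample the curve at a few knots, spaced more densely in dark regions."""
    xs = [(i / (num_knots - 1)) ** 2 for i in range(num_knots)]  # spacing is an assumption
    ys = [hlg_oetf(x) * (2 ** out_bits - 1) for x in xs]
    return xs, ys


def tone_convert(raw: int, xs, ys, in_bits: int = 24) -> int:
    """Convert one raw pixel to a low-bit pixel value by linear interpolation."""
    e = raw / (2 ** in_bits - 1)
    i = min(max(bisect_right(xs, e) - 1, 0), len(xs) - 2)
    t = (e - xs[i]) / (xs[i + 1] - xs[i])
    return round(ys[i] + t * (ys[i + 1] - ys[i]))


xs, ys = build_knots()
for raw in (0, 2 ** 16, 2 ** 20, 2 ** 24 - 1):
    print(raw, tone_convert(raw, xs, ys))
```

Storing 33 knots instead of one entry per 24-bit input code is what reduces the data to transfer and store; the price is a small interpolation error between knots.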

[0022] In this embodiment, the tone conversion unit 113 is pre-set with, for example, a first tone conversion curve that follows hybrid log-gamma. In addition, the tone conversion unit 123 is set with a second tone conversion curve that follows hybrid log-gamma and has similar characteristics to the first tone conversion curve, as well as one or more tone conversion curves with different characteristics that are used to adjust the brightness sensitivity according to the imaging environment so that the same brightness becomes the same pixel value as the first camera 110. These tone conversion curves are switched and used according to instructions from the image recognition processing unit 200.

[0023] The IF units 114 and 124 are external interfaces and, in this embodiment, are connected to the image recognition processing unit 200. The first camera 110 and the second camera 120 output the images obtained by converting pixel information in the grayscale conversion units 113 and 123 to the image recognition processing unit 200 via the IF units 114 and 124. The IF units 114 and 124 also receive, from the image recognition processing unit 200, setting information for the grayscale conversion curve used in the grayscale conversion units 113 and 123, an imaging trigger that indicates when imaging should be initiated, and/or setting information for the number of bits per pixel of the output image and for the frame rate, which determines the imaging period.

[0024] The image recognition processing unit 200 includes camera interface units (camera IF units) 201 and 202, a sensitivity correction value storage unit 203, an exposure amount calculation unit 204, a camera image control unit 205, an imaging trigger control unit 206, a grayscale control unit 207, an image conversion unit 208, a stereo recognition unit 209, a monocular recognition unit 210, a gaze area determination unit 211, and a communication interface unit (communication IF unit) 212.

[0025] The camera IF units 201 and 202 are connected to the first camera 110 and the second camera 120, respectively. The image recognition processing unit 200 acquires images captured by the first camera 110 and the second camera 120 via the camera IF units 201 and 202, and also transmits various setting information and imaging triggers to the first camera 110 and the second camera 120.

[0026] The sensitivity correction value storage unit 203 stores sensitivity correction values used in the sensitivity correction of the image sensors 112 and 122 performed in the tone conversion units 113 and 123 of the first camera 110 and the second camera 120. The sensitivity correction values are acquired, for example, through calibration work performed on a manufacturer's production line or at a maintenance factory, and stored in the sensitivity correction value storage unit 203 via the communication IF unit 212. The sensitivity correction values stored in the sensitivity correction value storage unit 203 are set in the tone conversion units 113 and 123 of the first camera 110 and the second camera 120, respectively, via the camera IF units 201 and 202.

[0027] The exposure amount calculation unit 204 calculates the exposure amounts for the first camera 110 and the second camera 120 based on the pixel values of the images acquired via the camera IF units 201 and 202, determines from the brightness of the subject whether each image is underexposed or overexposed, and outputs the result. For example, it examines the average and variance of the pixel values at the coordinates corresponding to the road surface to determine whether the average pixel value of the road surface is appropriate and whether the contrast of the road surface is sufficient.
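
A minimal sketch of this kind of check follows. It classifies a set of road-surface pixel values using their mean and their standard deviation (the square root of the variance mentioned above). The function name, the 12-bit maximum, and every threshold are illustrative assumptions; the specification does not give concrete values.

```python
from statistics import fmean, pstdev


def judge_exposure(road_pixels, max_value=4095,
                   low_mean=0.10, high_mean=0.80, min_contrast=0.02):
    """Classify exposure from the mean and spread of road-surface pixel values.

    All thresholds are illustrative assumptions, given as fractions of max_value.
    """
    mean = fmean(road_pixels) / max_value
    contrast = pstdev(road_pixels) / max_value
    if mean < low_mean:
        return "underexposed"
    if mean > high_mean:
        return "overexposed"
    if contrast < min_contrast:
        return "low_contrast"
    return "ok"


print(judge_exposure([100, 120, 110, 90]))        # dark road surface
print(judge_exposure([1000, 2000, 1500, 2500]))   # mid-range mean with visible texture
```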

[0028] The camera image control unit 205 determines the bit depth and frame rate for each pixel of the image acquired from the first camera 110 and the second camera 120, respectively. The determined bit depth is sent to the first camera 110 and the second camera 120 via the camera IF units 201 and 202, and is set as the bit depth for each pixel of the image captured by each camera. The frame rate is used by the imaging trigger control unit 206 to generate a trigger signal that determines the timing of image capture.

[0029] The imaging trigger control unit 206 generates a trigger signal that triggers imaging at a period according to the frame rate determined by the camera image control unit 205, and transmits it to the first camera 110 and the second camera 120, respectively, via the camera IF units 201 and 202.

[0030] The timing at which the trigger signal is generated and transmitted is determined so that, when the stereo recognition unit 209 uses images captured by both the first camera 110 and the second camera 120, the lines containing predetermined pixels of the image sensors are scanned at the same timing in each camera. For example, if the first camera 110 captures at 20 fps (frames per second) and the second camera 120 at 10 fps, every other frame of the first camera 110 is captured at the same timing as a frame of the second camera 120. For frames where the imaging timing of the two cameras coincides, the trigger signal is generated and transmitted so that the lines containing the predetermined pixels are scanned at the same timing. On the other hand, when imaging is performed only by the first camera 110, the trigger signal is generated and transmitted at a timing determined simply by the frame rate.

[0031] A line containing a predetermined pixel is, for example, a line containing pixels on the image sensor corresponding to the center or vanishing point of each camera's lens, and it is desirable that such lines be scanned at the same time. By generating and transmitting a trigger signal at this timing, the image regions used for stereo matching are captured at approximately the same time, which suppresses the inclusion of image movement due to changes in camera orientation or position into parallax and reduces the occurrence of distance errors.
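
The timing described in paragraphs [0030] and [0031] can be sketched as follows, using the 20 fps and 10 fps example from the text. The line-alignment helper assumes a rolling-shutter readout in which line n is scanned at the trigger time plus n line periods; that readout model, the line indices, and the line period are assumptions made for this example.

```python
from fractions import Fraction


def trigger_times(fps: int, duration_s: int):
    """Trigger instants in seconds for a camera running at a fixed frame rate."""
    period = Fraction(1, fps)
    return [k * period for k in range(fps * duration_s)]


def line_aligned_offset(line_a: int, line_time_a: Fraction,
                        line_b: int, line_time_b: Fraction) -> Fraction:
    """Delay to add to camera B's trigger so both target lines are scanned together.

    Assumes line n is read out at trigger + n * line_time (rolling shutter).
    A negative result means camera A should be delayed instead.
    """
    return line_a * line_time_a - line_b * line_time_b


first = trigger_times(20, 1)    # first camera 110 at 20 fps
second = trigger_times(10, 1)   # second camera 120 at 10 fps
shared = sorted(set(first) & set(second))
print([first.index(t) for t in shared[:3]])  # frames of the first camera that coincide

# Hypothetical target lines (e.g. the rows containing the vanishing point).
print(line_aligned_offset(540, Fraction(1, 50000), 400, Fraction(1, 50000)))
```

Exact fractions are used so that coinciding trigger instants compare equal without floating-point error.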

[0032] The tone control unit 207 determines the tone conversion curve used by the tone conversion unit 123 according to the exposure determination result of the exposure amount calculation unit 204, and instructs the second camera 120 to switch the tone conversion curve to be used via the camera IF unit 202. For example, since the road surface does not produce much contrast, by selecting a tone conversion curve that can allocate more tones to the road surface, the effective parallax of the road surface used in the matching process increases, making it easier to detect the shape of the road surface and irregularly shaped objects on the road surface.

[0033] The image conversion unit 208 converts the gradation of images captured by the first camera 110 and the second camera 120 so that it is suitable for the processing performed by the stereo recognition unit 209 and the monocular recognition unit 210. For example, if two images captured by the first camera 110 and the second camera 120 are used by the stereo recognition unit 209, the gradation of both images is converted so that parts of the same object with the same brightness have the same pixel values. In this embodiment, the gradation conversion performed by the image conversion unit 208 uses a lookup table that corresponds to the tone conversion curve used by the tone conversion units 113 and 123.

[0034] In the image conversion unit 208, for example, the image captured by the first camera 110 is not converted, but the pixel value of the image captured by the second camera 120 is multiplied by the ratio of the brightness of the first camera 110 to that of the second camera 120. The brightness ratio is the product of the ratio of the pixel value on the gradation conversion curve set for the second camera 120 to the pixel value on the conversion curve set for the first camera 110 for the same amount of light, the ratio of the reference sensitivity and gain of the image sensor 112 to that of the image sensor 122, and the ratio of the F-number of lens 111 to that of lens 121.

[0035] The stereo recognition unit 209 performs a rectification (parallelization) process on the two images captured by the first camera 110 and the second camera 120 and tone-converted by the image conversion unit 208, so that the epipolar lines of the two images become parallel to each other and the same object appears shifted horizontally by an amount that depends on its depth. The stereo recognition unit 209 then performs a matching process on the two rectified images to measure how far the same object is shifted between them and obtain the distance to the object. For example, ZSAD (Zero-mean Sum of Absolute Differences) can be used for the matching process.
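
To make the matching step concrete, here is a small sketch of ZSAD block matching along one row of an already rectified image pair. It is a generic textbook formulation, not the implementation used by the stereo recognition unit 209; the function names and the tiny synthetic images are assumptions for illustration. Because each patch has its mean subtracted before comparison, a uniform brightness offset between the two cameras does not affect the cost.

```python
def zsad(patch_a, patch_b):
    """Zero-mean sum of absolute differences between two equal-size patches."""
    flat_a = [v for row in patch_a for v in row]
    flat_b = [v for row in patch_b for v in row]
    mean_a = sum(flat_a) / len(flat_a)
    mean_b = sum(flat_b) / len(flat_b)
    return sum(abs((a - mean_a) - (b - mean_b)) for a, b in zip(flat_a, flat_b))


def crop(image, top, left, height, width):
    """Extract a rectangular patch from a list-of-rows image."""
    return [row[left:left + width] for row in image[top:top + height]]


def best_disparity(left_img, right_img, top, left, size, max_disp):
    """Search along the same rows of rectified images for the lowest ZSAD cost."""
    ref = crop(left_img, top, left, size, size)
    costs = {}
    for d in range(max_disp + 1):
        if left - d < 0:
            break
        costs[d] = zsad(ref, crop(right_img, top, left - d, size, size))
    return min(costs, key=costs.get)


pattern = [5, 9, 2, 7, 30, 80, 40, 3, 8, 1, 6, 4]
left_img = [[v + 3 * r for v in pattern] for r in range(3)]
# Right image: same scene shifted 2 pixels to the left and 10 levels brighter.
right_img = [[(row[x + 2] if x + 2 < len(row) else 0) + 10 for x in range(len(row))]
             for row in left_img]
print(best_disparity(left_img, right_img, top=0, left=4, size=3, max_disp=4))  # prints 2
```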

[0036] The stereo recognition unit 209 further identifies the object whose distance has been acquired, assigns a number to the identified object to identify it as the same object, tracks the object, and acquires the object's speed of movement based on the acquired distance to the object, thereby recognizing what kind of object it is. For example, when the stereo camera device 10 is mounted on a vehicle such as an automobile and applied to a driver assistance system or preventive safety system that assists the driver's driving operations, the stereo recognition unit 209 detects and identifies vehicles and pedestrians ahead, and recognizes, for example, whether the identified vehicle is subject to automatic braking, whether it is subject to following the vehicle, or, if the identified object is a person, whether it is a pedestrian that poses a risk of suddenly running into the road.

[0037] The monocular recognition unit 210 detects objects captured by the first camera 110 and the second camera 120 and identifies the detected objects. For objects in areas not captured by the first camera 110 but only captured by the second camera 120, the monocular recognition unit 210 calculates the distance by monocular distance measurement. Identified objects are assigned a number to identify them as the same object, similar to the stereo recognition unit 209. Based on this number, the monocular recognition unit 210 tracks the identified object, acquires the object's movement speed, etc., similar to the stereo recognition unit 209, and recognizes what kind of object it is. In particular, the monocular recognition unit 210 can detect pedestrians and vehicles in wide-angle areas that cannot be captured by the first camera, and is used to recognize pedestrians and vehicles in these wide-angle areas.

[0038] When an object is recognized or tracked by the stereo recognition unit 209 or the monocular recognition unit 210, the gaze area determination unit 211 determines and sets a gaze area within the image based on the field of view in which the object appears and on its pixel values.

[0039] When the stereo camera device 10 is applied to, for example, a driver assistance system or a preventive safety system, and the stereo recognition unit 209 or the monocular recognition unit 210 recognizes a vehicle that is the target of automatic braking or following, a gaze area is set near the center of the image, where the vehicle ahead appears. The gaze area is also determined based on the position and pixel values of the object in the image. Furthermore, if, for example, a pedestrian who may suddenly run out is recognized in the wide-angle area outside the illumination range of the headlights, the wide-angle area can be set as the gaze area.

[0040] The gaze area can be set according to the field of view and brightness in which the recognized object is captured, even in applications other than in-vehicle use. Depending on the field of view, the gaze area may be set to a wide area, such as the central or wide-angle portion, or to a narrow area in which a specific object is captured. When a wide area is set as the gaze area, the exposure is made appropriate over a wide range, resulting in appropriate gradation of pixels within that range, which is advantageous for detecting new objects. On the other hand, when a narrow area containing a recognized object is set as the gaze area, the exposure is tuned to that object, allowing detailed imaging and recognition of the object. The method of setting the area can be selected depending on the camera's application.

[0041] The communication IF unit 212 is an interface for communicating with devices outside the stereo camera system. The communication IF unit 212 is connected to an external device, for example, via a network (not shown), receives information sent from the external device, and transmits the results of recognition processing by the stereo recognition unit 209 and the monocular recognition unit 210, as well as captured images, to the external device.

[0042] The image recognition processing unit 200 is actually configured as a processing unit equipped with a device such as a microprocessing unit (MPU), which includes a calculation unit (not shown), memory used by the calculation unit, and an input/output unit (I/O unit) through which the calculation unit inputs and outputs data. The calculation unit may be a so-called CPU (Central Processing Unit) or GPU (Graphics Processing Unit), and the functions of each part described above are realized by the calculation unit executing a program stored in memory. The sensitivity correction values stored by the sensitivity correction value storage unit 203 and the data used by the calculation unit for processing can be stored, for example, in the memory of the MPU, or in a storage device accessible from the MPU, such as flash memory or a hard disk drive. Furthermore, some or all of the functions of each part described above may be realized in hardware using FPGAs (Field Programmable Gate Arrays), ASICs (Application Specific Integrated Circuits), or CPLDs (Complex Programmable Logic Devices).

[0043] Figure 2 is a schematic diagram illustrating the conversion method of the grayscale conversion curve used in conventional technology.

[0044] Here, we consider a case where the left and right cameras constituting a stereo camera have different F-numbers and where, for example, the tone conversion curve of the camera with the smaller F-number is transformed so that the brightness sensitivity of both cameras becomes the same, that is, so that the same brightness yields the same pixel value. In Figure 2, curve 300 shows the tone conversion curve of the camera with the smaller F-number before transformation. Light quantity a is the light quantity that results in the maximum possible pixel value (hereinafter referred to as the maximum pixel value) before the tone conversion curve is transformed. Light quantity b is the light quantity that results in the maximum pixel value for the camera with the larger F-number.

[0045] In conventional technology, when adjusting the amount of light that results in the maximum pixel value in the camera with a smaller F-number to the amount of light b that results in the maximum pixel value in the camera with a larger F-number, the tone conversion curve is converted to a tone conversion curve 301 in which the entire curve is compressed to the left by using the position of the amount of light a that results in the maximum pixel value in the tone conversion curve 300 before conversion as a clipping point.

[0046] This conversion gives the gradation conversion curve 301 a steeper slope in low-light regions, allowing more tones to be assigned to changes in light intensity. However, especially when the gradation conversion curve is based on hybrid log-gamma, the slope of the gradation conversion curve 301 becomes smaller than that of the gradation conversion curve 300 near the connection point between the gamma and log segments, which is enclosed by the dashed line in Figure 2. As a result, in some light intensity regions, the gradation conversion curve 301 may lose tones and reduce contrast compared to the original gradation conversion curve 300. This could lead to insufficient contrast during stereo matching processing, potentially preventing the calculation of parallax.
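
The effect described in paragraphs [0045] and [0046] can be checked numerically with a short sketch. It models curve 300 as the normalized HLG OETF from ITU-R BT.2100 and curve 301 as the same curve compressed horizontally so that its maximum moves from light quantity a to light quantity b. The values a = 1.0 and b = 0.5, the normalization, and the function names are assumptions chosen for illustration.

```python
import math

# HLG OETF constants from ITU-R BT.2100.
HLG_A = 0.17883277
HLG_B = 1.0 - 4.0 * HLG_A
HLG_C = 0.5 - HLG_A * math.log(4.0 * HLG_A)


def curve_300(x: float) -> float:
    """Original curve: HLG OETF reaching the maximum (1.0) at light quantity a = 1.0."""
    x = min(max(x, 0.0), 1.0)
    if x <= 1.0 / 12.0:
        return math.sqrt(3.0 * x)
    return HLG_A * math.log(12.0 * x - HLG_B) + HLG_C


def curve_301(x: float, a: float = 1.0, b: float = 0.5) -> float:
    """Conventional conversion: compress curve 300 to the left so the maximum moves to b."""
    return curve_300(x * a / b)


def slope(curve, x: float, h: float = 1e-6) -> float:
    """Central finite-difference slope of a curve at x."""
    return (curve(x + h) - curve(x - h)) / (2.0 * h)


for x in (0.02, 0.07, 0.10, 0.30):
    print(f"x={x:.2f}  slope300={slope(curve_300, x):.3f}  slope301={slope(curve_301, x):.3f}")
```

With these assumed values, the compressed curve is steeper at very low light but flatter than the original around and beyond the gamma/log junction, which is the loss of contrast the paragraph describes.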

[0047] Figure 3 is a schematic diagram illustrating the third tone conversion curve set for the second camera 120.

[0048] Curve 400 shows the tone conversion curve generated for the second camera 120 according to hybrid log-gamma. In this embodiment, tone conversion curve 400 is a reference tone conversion curve set for the second camera 120 as the second tone conversion curve.

[0049] Curve 401 represents the third tone conversion curve set for the second camera 120. As in the conventional technology, the third tone conversion curve 401 is converted so that the maximum pixel value A is obtained at light intensity b, which corresponds to the maximum pixel value on the first tone conversion curve set for the first camera 110. In this case, however, instead of compressing the second tone conversion curve 400 to the left, the origin is fixed and the conversion is performed so that pixel value B, which corresponds to light intensity b on the second tone conversion curve 400, becomes the maximum pixel value A. In other words, the third tone conversion curve 401 is generated by using the point (light intensity b, pixel value B) on the second tone conversion curve 400 as the clipping point and stretching the curve vertically by a uniform ratio in the region where the light intensity is less than light intensity b, so that pixel value B becomes A.

[0050] The third tone conversion curve 401 generated in this way has a steeper slope overall than the second tone conversion curve 400 in dark areas where the light intensity is less than light intensity b. This means that the change in pixel value in response to a change in light intensity is larger, which suppresses a decrease in contrast. As a result, the light intensity at which the pixel value reaches the maximum pixel value A can be matched to the light intensity at which the first camera 110 reaches the maximum pixel value, while avoiding insufficient contrast in stereo matching processing. Note that in areas where the light intensity exceeds light intensity b, the third tone conversion curve saturates, with the pixel value fixed at the maximum pixel value A.
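The construction of the third tone conversion curve can be sketched as follows. This is an illustration under assumptions, not the patent's implementation: the second tone conversion curve is replaced by a hypothetical 8-bit square-root gamma, light is normalized to [0, 1], and the names `make_third_curve` and `second_curve` are invented for the example.

```python
def make_third_curve(second_curve, light_b, max_pixel):
    """Build curve 401 from curve 400 by a uniform vertical stretch below b."""
    pixel_b = second_curve(light_b)  # pixel value B at light intensity b
    gain = max_pixel / pixel_b       # uniform stretch ratio A / B

    def third_curve(light):
        if light >= light_b:
            return max_pixel         # saturated at the maximum pixel value A
        return second_curve(light) * gain

    return third_curve


def second_curve(light):
    """Hypothetical stand-in for curve 400: 8-bit square-root gamma."""
    return 255.0 * min(max(light, 0.0), 1.0) ** 0.5


# Assumed example values: b = 0.25 and A = 255.
third = make_third_curve(second_curve, light_b=0.25, max_pixel=255.0)
print(second_curve(0.04), third(0.04), third(0.25), third(0.9))
```

Because every pixel value below light intensity b is multiplied by the same ratio A/B, the slope at each point in that region is also multiplied by A/B, so no part of the dark region becomes flatter than on the second tone conversion curve.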

[0051] Figure 4 is a schematic diagram illustrating the fourth tone conversion curve set for the second camera 120.

[0052] The fourth tone conversion curve 402 focuses on pixel value C on the second tone conversion curve 400, which corresponds to light intensity c at the upper end of the range in which tone gradation is to be secured in dark places. Using light intensity b, which corresponds to the maximum pixel value in the first camera 110, as the clipping point, the section of the curve between light intensity c and light intensity b is translated upward. As a result, the pixel value at light intensity b shifts to the maximum pixel value A, and the pixel value at light intensity c shifts to pixel value D. In the region below light intensity c, the entire curve is stretched upward by a uniform ratio equal to the ratio of pixel value D to pixel value C.

[0053] The fourth tone conversion curve 402 generated in this way has a steeper slope than the second tone conversion curve 400 in dark areas where the light intensity is less than light intensity c, resulting in richer tonality. On the other hand, in bright areas where the light intensity is greater than light intensity c, the slope of the fourth tone conversion curve 402 becomes the same as the slope of the second tone conversion curve 400, maintaining tonality and preventing a lack of contrast. Therefore, the fourth tone conversion curve 402 can also avoid the issue of insufficient contrast during stereo matching processing.

[0054] The fourth tone conversion curve 402 has a steeper slope than the third tone conversion curve 401 in dark areas with low light intensity, resulting in richer tonal gradation in dark areas. Therefore, it can be used when imaging is required in particularly dark places and the contrast is insufficient even with the second tone conversion curve. Note that, similar to the third tone conversion curve, in areas where the light intensity exceeds light intensity b, the pixel value of the fourth tone conversion curve is fixed to the maximum pixel value A, resulting in a saturated state.
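The piecewise construction of the fourth tone conversion curve can be sketched in the same way. The sketch assumes a hypothetical 8-bit square-root gamma in place of the second tone conversion curve and example values c = 0.04 and b = 0.25; the names `make_fourth_curve` and `second_curve` are illustrative only.

```python
def make_fourth_curve(second_curve, light_c, light_b, max_pixel):
    """Build curve 402: stretch below c, shift between c and b, clip above b."""
    pixel_b = second_curve(light_b)  # pixel value B
    pixel_c = second_curve(light_c)  # pixel value C
    offset = max_pixel - pixel_b     # upward translation A - B
    pixel_d = pixel_c + offset       # pixel value D at light intensity c
    gain = pixel_d / pixel_c         # stretch ratio D / C below c

    def fourth_curve(light):
        if light >= light_b:
            return max_pixel                     # saturated at A
        if light >= light_c:
            return second_curve(light) + offset  # same slope as curve 400
        return second_curve(light) * gain        # steeper than curve 400

    return fourth_curve


def second_curve(light):
    """Hypothetical stand-in for curve 400: 8-bit square-root gamma."""
    return 255.0 * min(max(light, 0.0), 1.0) ** 0.5


fourth = make_fourth_curve(second_curve, light_c=0.04, light_b=0.25, max_pixel=255.0)
print(fourth(0.01), fourth(0.04), fourth(0.16), fourth(0.25))
```

Since D/C = 1 + (A - B)/C and A/B = 1 + (A - B)/B with C smaller than B, the stretch ratio below light intensity c is larger than the ratio A/B of the third tone conversion curve, which is consistent with the steeper dark-region slope described above.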

[0055] Figure 5 is a schematic diagram illustrating the fifth tone conversion curve set for the second camera 120.

[0056] In the fifth tone conversion curve 403, the number of bits in the pixel value is increased compared to the number of bits in the pixel value used in the second tone conversion curve 400, so that the pixel value at the light intensity a that results in the maximum pixel value A in the second tone conversion curve 400 becomes the pixel value E. By increasing the number of bits used in the pixel value, the entire tone conversion curve can be stretched upward, and the overall slope can be increased compared to the second tone conversion curve 400, making it possible to increase the contrast across the entire range. As a result, it is possible to suppress the lack of contrast compared to the first camera 110, especially in dark areas with low light levels.

[0057] The fifth tone conversion curve 403 can be used when the number of bits per pixel of the second camera 120 can be made larger than the number of bits per pixel of the first camera 110. However, increasing the number of bits per pixel increases the amount of data transferred between the second camera 120 and the image recognition processing unit 200, which may result in insufficient bandwidth in the communication channel between the IF unit 124 and the camera IF unit 202. In such cases, the frame rate of the second camera 120 can be reduced to suppress the overall increase in the amount of data transferred and compensate for the bandwidth shortage.

[0058] If the frame rate of the second camera 120 is reduced, it is preferable to set the frame rate so that, once every few images captured by the first camera 110, the second camera 120 captures an image in sync with the imaging timing of the first camera 110. This enables stereo matching processing using images captured by both cameras at the same timing. The frame rate can be set in the camera image control unit 205 in accordance with the switching of the tone conversion curve performed in the tone control unit 207. Images captured by the first camera 110 while the second camera 120 is not capturing are used for object detection, identification, and tracking by the monocular recognition unit 210. This enables early detection, identification, and tracking of objects at a high frame rate and keeps calculations such as speed estimation accurate.
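The synchronized capture pattern can be illustrated with a short sketch. It assumes that the frame rate of the first camera is an integer multiple of that of the second camera and that both cameras capture together on frame 0; the function name `frame_schedule` and the labels are hypothetical.

```python
def frame_schedule(first_fps, second_fps, n_frames):
    """Label each first-camera frame by how it can be processed.

    "stereo": the second camera captures in sync, so stereo matching is possible.
    "monocular": only the first camera captures; monocular recognition only.
    """
    if second_fps <= 0 or first_fps % second_fps != 0:
        raise ValueError("first_fps must be a positive integer multiple of second_fps")
    step = first_fps // second_fps
    return ["stereo" if index % step == 0 else "monocular" for index in range(n_frames)]


# Illustrative rates: first camera at 20 fps, second camera reduced to 10 fps.
print(frame_schedule(20, 10, 6))
```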

[0059] The second through fifth tone conversion curves may all be used, or they may be used selectively depending on the camera being used.

[0060] Figure 6 is a flowchart illustrating the general flow of processing performed by the image recognition processing unit 200.

[0061] First, the image recognition processing unit 200 determines the tone conversion curve to be used by the second camera 120 in the tone control unit 207 and instructs the second camera 120 to set the tone conversion curve to be used via the camera IF units 201 and 202. The second camera 120 then sets the tone conversion unit 123 to use the determined tone conversion curve.

[0062] In this embodiment, if the second camera 120 does not allow changes to the bit depth of each pixel or the imaging frame rate, the third and fourth tone conversion curves described above are used in addition to the reference second tone conversion curve. Furthermore, if the second camera 120 allows changes to the bit depth of each pixel or the imaging frame rate, the fifth tone conversion curve described above is used in addition to the second tone conversion curve, and the bit depth of the pixels in the output image is changed according to the tone conversion curve used.

[0063] Figure 7 is a table showing examples of tone conversion curve settings when using a camera that does not allow changing the bit depth of each pixel or the frame rate.

[0064] Each row in the table corresponds to the exposure state of the gaze area in the stereo field of view, which can be imaged by both the first camera 110 and the second camera 120, and each column corresponds to the exposure state of the wide-angle area, which is a monocular field of view imaged only by the second camera 120.

[0065] If the exposure in the wide-angle area is appropriate (for example, the average pixel value in the wide-angle area is greater than half of the maximum pixel value Pmax), the second tone conversion curve, which is similar to the tone conversion curve of the first camera 110, is determined to be used, regardless of whether a gaze area exists in the stereo field of view and regardless of its exposure.

[0066] If the exposure in the wide-angle area is insufficient (for example, the average pixel value in the wide-angle area is less than half of Pmax but greater than one-quarter of Pmax), and either there is no gaze area in the stereo field of view or there is one and its exposure is excessive (for example, the average pixel value in the gaze area is greater than two-thirds of Pmax), the third tone conversion curve is determined to be used. However, even if the exposure in the wide-angle area is insufficient, if there is a gaze area in the stereo field of view and its exposure is appropriate or insufficient (for example, the average pixel value in the gaze area is less than two-thirds of Pmax), the second tone conversion curve is determined to be used.

[0067] If the exposure in the wide-angle area is significantly insufficient (for example, the average pixel value in the wide-angle area is less than one-quarter of Pmax), and either there is no gaze area in the stereo field of view or there is one and its exposure is excessive, the fourth tone conversion curve is determined to be used. However, even if the exposure in the wide-angle area is significantly insufficient, if there is a gaze area in the stereo field of view and its exposure is appropriate or insufficient, the second tone conversion curve is determined to be used.
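The selection rules described for Figure 7 can be written as a small decision function. This is a sketch using the example thresholds given in the text (one-half, one-quarter, and two-thirds of Pmax); the treatment of values exactly equal to a threshold is an assumption, and the names `Curve` and `select_curve_fixed_depth` are illustrative.

```python
from enum import Enum


class Curve(Enum):
    SECOND = 2
    THIRD = 3
    FOURTH = 4


def select_curve_fixed_depth(wide_avg, gaze_avg=None, p_max=255):
    """Pick a curve from the average pixel values of the two areas.

    wide_avg: average pixel value of the wide-angle (monocular) area.
    gaze_avg: average pixel value of the gaze area, or None if no gaze
              area is set in the stereo field of view.
    """
    if wide_avg > p_max / 2:
        return Curve.SECOND  # wide-angle exposure is appropriate
    # A gaze area whose exposure is appropriate or insufficient keeps the
    # reference curve so that the stereo images stay matched.
    if gaze_avg is not None and gaze_avg <= 2 * p_max / 3:
        return Curve.SECOND
    if wide_avg > p_max / 4:
        return Curve.THIRD   # wide-angle exposure is insufficient
    return Curve.FOURTH      # wide-angle exposure is significantly insufficient


print(select_curve_fixed_depth(100), select_curve_fixed_depth(30, gaze_avg=250))
```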

[0068] Figure 8 is a table showing examples of tone conversion curve settings when using a camera that allows changing the bit depth of each pixel and the frame rate.

[0069] As in Figure 7, each row in the table corresponds to the exposure state of the gaze area in the stereo field of view, which can be imaged by both the first camera 110 and the second camera 120, and each column corresponds to the exposure state of the wide-angle area, which is imaged only by the second camera 120.

[0070] If the exposure in the wide-angle area is appropriate, the second tone conversion curve, which is similar to the tone conversion curve of the first camera 110, is determined to be used, regardless of whether a gaze area exists in the stereo field of view and regardless of its exposure.

[0071] If the exposure in the wide-angle area is insufficient (or significantly insufficient), and either there is no gaze area in the stereo field of view or there is one and it is overexposed, the fifth tone conversion curve is determined to be used. However, even if the exposure in the wide-angle area is insufficient, if there is a gaze area in the stereo field of view and its exposure is appropriate or insufficient, the second tone conversion curve is determined to be used.
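The Figure 8 rules reduce to a two-way choice. The sketch below assumes the same example thresholds as those given for Figure 7 (one-half and two-thirds of Pmax), which the text does not restate for Figure 8; the function name and return strings are illustrative.

```python
def select_curve_variable_depth(wide_avg, gaze_avg=None, p_max=255):
    """Return "second" or "fifth" for a camera whose bit depth can be changed.

    gaze_avg is None when no gaze area is set in the stereo field of view.
    """
    if wide_avg > p_max / 2:
        return "second"  # wide-angle exposure is appropriate
    if gaze_avg is not None and gaze_avg <= 2 * p_max / 3:
        return "second"  # gaze area is appropriately exposed or underexposed
    return "fifth"       # no gaze area, or the gaze area is overexposed


print(select_curve_variable_depth(100), select_curve_variable_depth(100, gaze_avg=120))
```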

[0072] Regardless of whether the camera can change the bit depth or frame rate of each pixel, at the start of image capture, a second tone conversion curve, which is a reference tone conversion curve, is set as the tone conversion curve used by the tone conversion unit 123 (step S100).

[0073] Returning to Figure 6, the image recognition processing unit 200, using the camera image control unit 205, determines the number of bits per pixel and the frame rate of the images captured by each camera based on the tone conversion curve used by the second camera 120. For example, if the tone conversion curve used by the second camera 120 is one of the second to fourth tone conversion curves, the number of bits per pixel and the frame rate of the images captured by the first camera 110 and the second camera 120 are set to the same values, for example, 8 bits and 20 fps. If the tone conversion curve used by the second camera 120 is the fifth tone conversion curve, the number of bits per pixel and the frame rate of the images from the second camera 120 are set to, for example, 16 bits and 10 fps. As a result, the second camera 120 acquires an image with twice the number of bits per pixel of the image acquired by the first camera 110, once for every two images captured by the first camera 110.
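A quick calculation shows why halving the frame rate offsets the doubled bit depth. The sketch assumes uncompressed transfer and a hypothetical sensor resolution of 1920 x 1080; only the bit depths and frame rates come from the example above.

```python
def data_rate_bits_per_second(width, height, bits_per_pixel, fps):
    """Raw (uncompressed) image data rate of one camera."""
    return width * height * bits_per_pixel * fps


WIDTH, HEIGHT = 1920, 1080  # assumed resolution, not given in the text

rate_8bit = data_rate_bits_per_second(WIDTH, HEIGHT, bits_per_pixel=8, fps=20)
rate_16bit = data_rate_bits_per_second(WIDTH, HEIGHT, bits_per_pixel=16, fps=10)
rate_16bit_full = data_rate_bits_per_second(WIDTH, HEIGHT, bits_per_pixel=16, fps=20)

print(rate_8bit, rate_16bit, rate_16bit_full)
```

With these settings the 16-bit, 10 fps stream needs the same raw bandwidth as the 8-bit, 20 fps stream, whereas keeping 20 fps at 16 bits would double it.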

[0074] Furthermore, if there are objects that are being identified and tracked by the stereo recognition unit 209 and the monocular recognition unit 210, the image recognition processing unit 200 may, for example, determine whether sufficient contrast is obtained for these objects, and if sufficient contrast is obtained, it may be configured to increase the frame rate (step S110).

[0075] The imaging trigger control unit 206 generates trigger signals to be transmitted to the first camera 110 and the second camera 120 according to the determined frame rate. The image recognition processing unit 200 transmits the generated trigger signals to the first camera 110 and the second camera 120 via the camera IF units 201 and 202 to perform imaging. The captured images are acquired from the first camera 110 and the second camera 120 via the camera IF units 201 and 202 (step S120).

[0076] When the image recognition processing unit 200 acquires images captured by the first camera 110 and the second camera 120 via the camera IF units 201 and 202, if these images are to be used in the stereo matching processing unit, the image conversion unit 208 converts the gradation of the images captured by the first camera 110 and the images captured by the second camera 120 so that they have a gradation suitable for processing performed in the stereo recognition unit 209 and the monocular recognition unit 210 (step S130).

[0077] The images whose gradation has been converted in the image conversion unit 208 are sent to the stereo recognition unit 209 and the monocular recognition unit 210, where object detection, identification, and tracking are performed (step S140). Then, based on the object identification and tracking results from the stereo recognition unit 209 and the monocular recognition unit 210, the gaze area determination unit 211 sets the gaze area (step S150).

[0078] The image recognition processing unit 200 further determines, in the exposure amount calculation unit 204, whether the exposure is excessive or insufficient. This is determined, for example, from the pixel values of a certain range of the image: if the average value is less than or equal to half of the maximum pixel value, the exposure is insufficient, and if it is less than or equal to one-quarter of the maximum pixel value, the exposure is significantly insufficient. The range of the image used to determine the exposure can be, for example, a range pre-set as the area in which the road surface is visible, or a range in which pedestrians are visible in the wide-angle area. If a gaze area is set in the gaze area determination unit 211, the range including the area set as the gaze area may be used to determine the exposure.

[0079] If there are multiple image ranges used to determine exposure, and the dynamic range of the image sensors 112 and 122 is narrow, making it impossible to achieve an appropriate exposure across all ranges, the image range for which exposure is appropriate may be switched for each imaging frame, allowing for multiple images to be captured with different exposure levels. In this case, if the image is to be used in stereo matching processing, the exposure levels of the first camera 110 and the second camera 120 are changed in conjunction.

[0080] Since the pixel values change when the tone conversion curve is switched, it is preferable that the exposure is determined by the pixel values of the image to which a specific tone conversion curve, for example, a reference first conversion curve, is applied, in order to prevent hunting from occurring when the tone conversion curve is switched (step S160).

[0081] If the exposure is deemed appropriate, the image recognition processing unit 200 returns to step S120, and the next frame is captured. If the exposure is not appropriate, the process returns to step S100, and after settings such as the grayscale conversion curve and frame rate are changed, the next image is captured (step S170). The image recognition processing unit 200 is configured to continuously perform the above-described processing until an event occurs that requires the processing to be stopped, such as the power supply to the stereo camera device 10 being cut off.

[0082] Figure 9 is a schematic block diagram showing the system configuration for calibrating the sensitivity correction value of the stereo camera device 10.

[0083] The stereo camera device 10 is calibrated, for example, at a manufacturer's production line or maintenance factory, to check various camera parameters, including the sensitivity correction value mentioned above. For calibration of the sensitivity correction value, a brightness chart 20 with multiple areas of different brightness levels and a uniform brightness light source 30 capable of covering the entire field of view captured by the first camera 110 and the second camera 120 with uniform brightness are used. In addition, a calibration device 40 is connected to the communication IF unit 212 of the image recognition processing unit 200.

[0084] The calibration device 40 is connected to a communication IF unit 212 and has an interface that allows information to be exchanged with the communication IF unit 212. It has a calculation function that realizes the function of a sensitivity correction value calculation unit 50 that calculates a sensitivity correction value by program processing. The calibration device 40 can be realized by a general-purpose calculation device such as a so-called personal computer, server, or tablet device. The calibration device 40 may also be configured as a dedicated device.

[0085] For calibration of the sensitivity correction value, first, the field of view is illuminated by the light source 30 so that the entire field of view is captured with uniform brightness by the first camera 110 and the second camera 120. The field of view illuminated by the light source 30 is captured by the first camera 110 and the second camera 120, the difference in sensitivity of each pixel of the image sensors 112 and 122 is measured, and a correction value is calculated based on the intensity of the pixel signal of the pixel at the optical axis position.

[0086] Figure 10 is a flowchart illustrating the process performed during the calibration of sensitivity correction values.

[0087] In the calibration of the sensitivity correction value, first, the field of view illuminated by the light source 30 is captured by the first camera 110 and the second camera 120, respectively. The capture may be performed simultaneously by the first camera 110 and the second camera 120, or separately. The images captured by the first camera 110 and the second camera 120 are sent to the calibration device 40 via the communication IF unit 212 (step S200).

[0088] In the calibration device 40, the sensitivity correction value calculation unit 50 uses the pixel values near the optical axis of the captured image as a reference. Starting from the sensitivity correction values already set in the sensitivity correction value storage unit 203, it reduces the sensitivity correction value for pixels that are brighter than the reference pixel value and increases it for pixels that are darker. Furthermore, if the reference pixel values differ between the first camera 110 and the second camera 120, their ratio is calculated, and the sensitivity correction values are modified using this ratio so that the sensitivities of the first camera 110 and the second camera 120 are the same. The results are acquired as new sensitivity correction values for each camera (step S210).
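One possible form of the update in step S210 is sketched below. The text only states the direction of the adjustment, so the multiplicative update rule, the choice to rescale the second camera rather than the first, and the function names are all assumptions; images are represented as nested lists of non-zero pixel values captured under the uniform light source.

```python
def updated_gains(image, gains, axis_pos):
    """Return new per-pixel correction values from one flat-field image.

    Pixels brighter than the reference get a smaller value, darker pixels a
    larger one (assumed rule: new = old * reference / pixel).
    """
    row, col = axis_pos
    reference = image[row][col]  # pixel value near the optical axis
    return [
        [gain * reference / pixel for gain, pixel in zip(gain_row, pixel_row)]
        for gain_row, pixel_row in zip(gains, image)
    ]


def match_second_to_first(gains_second, reference_first, reference_second):
    """Scale the second camera's values so both reference pixels agree."""
    ratio = reference_first / reference_second
    return [[gain * ratio for gain in row] for row in gains_second]


# Tiny 2 x 2 example with the optical axis assumed at pixel (0, 0).
image = [[100, 120], [80, 100]]
gains = [[1.0, 1.0], [1.0, 1.0]]
print(updated_gains(image, gains, (0, 0)))
```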

[0089] The sensitivity correction value acquired in step S210 is sent from the calibration device 40 to the image recognition processing unit 200 and stored in the sensitivity correction value storage unit 203. Furthermore, this sensitivity correction value is sent to the first camera 110 and the second camera 120 and set in the grayscale conversion units 113 and 123 (step S220).

[0090] After new sensitivity correction values are set for the first camera 110 and the second camera 120, imaging is performed again by the first camera 110 and the second camera 120. The calibration device 40 acquires the captured images from the image recognition processing unit 200 (step S230), and the sensitivity correction value calculation unit 50 checks whether the variation in pixel values across the entire image falls within a predetermined range (step S240).

[0091] If the check in step S240 shows that the variation in pixel values of the images captured by both cameras is within the predetermined range, the calibration process is completed. If the variation in pixel values is not within the predetermined range, the process returns to step S210, and the sensitivity correction values are calculated again. However, if the newly acquired sensitivity correction values merely fluctuate up and down between iterations, the process is completed without returning to step S210 (step S250).

[0092] Figure 11 is a schematic diagram showing an example of the relationship between a luminance chart and the pixel values of an image obtained by imaging the luminance chart.

[0093] The luminance chart 20 is a chart with varying brightness levels that allows for imaging at different luminance levels. By imaging the luminance chart 20 with the first camera 110 and the second camera 120, the relationship between the amount of light and the pixel value corresponding to each brightness level of the luminance chart 20 can be obtained from the resulting images, as shown in graph 500.

[0094] Graph 500 shows the characteristics in the region where the amount of light is less than the minimum amount of light that results in the maximum pixel value in the image captured by the first camera 110, which has a long focal length and a large F-number. The characteristics indicated by the black circles 501 show the relationship between the amount of light and the pixel value in the image captured by the first camera 110, and the characteristics indicated by the white circles show the relationship between the amount of light and the pixel value in the image captured by the second camera 120.

[0095] From graph 500, it can be seen that, especially in the dark areas with low light levels to the left of the middle of the horizontal axis of the graph, the image captured by the first camera 110, which has a longer focal length and a larger F-number, shows a greater change in pixel values in response to light levels and therefore has richer tonal gradation compared to the second camera 120, which has a shorter focal length and a smaller F-number.

[0096] By checking the characteristics of the captured images using such a brightness chart, it is possible to verify whether the expected images are obtained by the first camera 110 and the second camera 120. The image recognition processing unit 200 may be configured to verify, for example when the calibration device 40 is connected and image capture is instructed, whether the images captured by the second camera 120 using the third and fourth tone conversion curves are as expected. Specifically, imaging is performed using the third and fourth tone conversion curves, and, for example, the tone control unit 207 compares the amount of change in pixel value relative to the amount of change in brightness, in the area of the image captured by the second camera 120 that is darker than a predetermined brightness, with that of the image captured by the first camera 110. The tone control unit 207 can then output the verification result to the calibration device 40 as pass if the amount of change in pixel value relative to the amount of change in brightness is larger in the image captured by the second camera 120, and as fail otherwise. Because images are captured using a brightness chart during calibration, the difference in pixel values between adjacent patterns in areas darker than the predetermined brightness on the brightness chart can be obtained for each image and compared. For example, if the difference in pixel values between patterns in the image captured by the second camera 120 is greater than that in the image captured by the first camera 110, the verification is considered passed.
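The pass/fail comparison on the brightness chart can be sketched as follows. The sketch assumes that each chart pattern has been reduced to one representative pixel value per camera and that every dark step of the second camera must exceed the corresponding step of the first camera; the text does not specify whether the comparison is per step or aggregated. The function names and sample values are illustrative.

```python
def dark_steps(luminance, pixel_values, dark_limit):
    """Pixel-value differences between adjacent chart patterns darker than dark_limit."""
    patterns = sorted(zip(luminance, pixel_values))
    dark = [value for lum, value in patterns if lum < dark_limit]
    return [upper - lower for lower, upper in zip(dark, dark[1:])]


def verify_second_camera(luminance, first_values, second_values, dark_limit):
    """Return True (pass) if the second camera shows larger steps in the dark patterns."""
    first = dark_steps(luminance, first_values, dark_limit)
    second = dark_steps(luminance, second_values, dark_limit)
    if not first:
        raise ValueError("need at least two patterns darker than dark_limit")
    return all(s > f for f, s in zip(first, second))


# Made-up sample readings for five chart patterns.
luminance = [1, 2, 4, 8, 16]
first_values = [10, 20, 30, 40, 50]
second_values = [15, 35, 60, 80, 90]
print(verify_second_camera(luminance, first_values, second_values, dark_limit=8))
```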

[0097] In the embodiments described above, either the third and fourth tone conversion curves or the fifth tone conversion curve was used, depending on the specifications of the camera. However, for example, in a camera where the number of bits per pixel and the frame rate can be changed, the third and fourth tone conversion curves may be used in addition to the fifth tone conversion curve. Furthermore, even in a camera where the frame rate cannot be changed, if sufficient bandwidth can be secured for the transfer of high-bit-depth images, the fifth tone conversion curve can be used and the number of bits per pixel can be changed in accordance with the tone conversion curve.

[0098] According to the embodiments described above, in a stereo camera device using cameras with different focal lengths, it is possible to match the tonal gradation of the pixel values of the left and right images in stereo matching while maintaining the advantage of the shorter focal length camera's performance in low light conditions.

[0099] Although the present invention has been described above using representative embodiments as examples, the present invention is not limited thereto and can be implemented in various ways without departing from the spirit of the invention as described in the claims. Furthermore, the embodiments described above are explained in detail for the purpose of clearly illustrating the present invention and are not necessarily limited to those having all the configurations described.

[Explanation of Symbols]

[0100] 10...Stereo camera device, 110...First camera, 120...Second camera, 111, 121...Lens, 112, 122...Image sensor, 113, 123...Gradation conversion unit, 114, 124...IF unit, 200...Image recognition processing unit, 201, 202...Camera IF unit, 203...Sensitivity correction value storage unit, 204...Exposure amount calculation unit, 205...Camera image control unit, 206...Imaging trigger control unit, 207...Gradation control unit, 208...Image conversion unit, 209...Stereo recognition unit, 210...Monocular recognition unit, 211...Gaze area determination unit, 212...Communication IF unit

Claims

1. A stereo camera device comprising: a first camera comprising a first lens, a first image sensor that converts light incident through the first lens into electrical pixel information corresponding to the amount of light and outputs it, and a first tone conversion unit configured to convert the pixel information output from the first image sensor using preset first tone conversion information and acquire a first image having the pixel values obtained by the conversion; a second camera comprising a second lens having a smaller aperture F-number than the first lens, a second image sensor that converts light incident through the second lens into electrical pixel information corresponding to the amount of light and outputs it, and a second tone conversion unit configured to convert the pixel information output from the second image sensor using tone conversion information selected from a plurality of preset tone conversion information, including second tone conversion information having similar characteristics to the first tone conversion information, and to acquire a second image having the pixel values obtained by the conversion; an exposure amount calculation unit configured to acquire the first image and the second image, and to acquire the exposure amount of a predetermined imaging area in each of the first image and the second image; a tone control unit configured to select tone conversion information to be used by the second tone conversion unit from among the plurality of tone conversion information based on the exposure amount calculated by the exposure amount calculation unit; and a stereo recognition unit configured to perform image recognition based on stereo field-of-view images captured in both the first and second images, wherein the plurality of tone conversion information includes third tone conversion information for performing the conversion such that, in a region where the amount of light is less than the amount of light corresponding to the maximum possible pixel value of the image acquired based on the first tone conversion information, the amount of change in the pixel value in relation to the change in the amount of light in the image acquired using the third tone conversion information is greater than or equal to the amount of change in the pixel value in relation to the change in the amount of light in the image acquired using the second tone conversion information.

2. The stereo camera device according to claim 1, wherein the first lens and the second lens have different focal lengths.

3. The stereo camera device according to claim 1, further comprising an image conversion unit that converts the pixel values of the first image and the second image so that they have the same pixel value for the same brightness, wherein the stereo recognition unit is configured to perform the image recognition using the images converted by the image conversion unit.

4. The stereo camera device according to claim 1, wherein the third tone conversion information is configured to perform the conversion such that the pixel values of the second image saturate to a certain pixel value in a region where the amount of light is greater than the predetermined amount of light.

5. The stereo camera device according to claim 1, wherein the third tone conversion information is tone conversion information such that the pixel value converted based on the third tone conversion information is equal to the value obtained by multiplying the pixel value converted based on the second tone conversion information by a certain multiplier.

6. The stereo camera device according to claim 5, wherein the maximum value of the pixel value converted based on the third tone conversion information is greater than the maximum value of the pixel value converted based on the second tone conversion information.

7. The stereo camera device according to claim 5, wherein the second tone conversion unit is configured to allow changing the number of bits in the converted pixel value, the stereo camera device further comprising a camera image control unit configured to set the number of bits of the converted pixel value to a number of bits greater than the number of bits of the pixel value converted using the second tone conversion information when the tone control unit selects the third tone conversion information to be used for the conversion.

8. The stereo camera device according to claim 7, wherein the camera image control unit is configured to change the period of image acquisition by the second camera when the bit depth setting is changed.

9. The stereo camera device according to claim 1, wherein the third tone conversion information is tone conversion information such that, in a part of the region where the amount of light is less than the amount of light that is converted to the maximum pixel value when the first tone conversion information is used, the pixel value converted based on the third tone conversion information is equal to the pixel value converted based on the second tone conversion information multiplied by a certain multiplier.

10. The stereo camera device according to claim 1, wherein the tone control unit is configured to select tone conversion information other than the second tone conversion information from among the plurality of tone conversion information when it is determined that the exposure amount of the second image calculated by the exposure amount calculation unit is insufficient.

11. The stereo camera device according to claim 1, further comprising: a monocular recognition unit configured to perform image recognition based on a monocular field image that is included in either the first image or the second image but not in the other image; and a gaze area determination unit configured to set a gaze area that is deemed important in image recognition by the stereo recognition unit or the monocular recognition unit, based on the recognition results of the stereo recognition unit and the monocular recognition unit, wherein the exposure amount calculation unit is configured to calculate the exposure amount using the area set as the gaze area as the predetermined imaging area.

12. The stereo camera device according to claim 11, wherein the tone control unit is configured to select tone conversion information based on the exposure amount of the area set as the gaze area in the stereo field of view and the exposure amount of the monocular field of view.

13. The stereo camera device according to claim 12, wherein the tone control unit is configured to select tone conversion information other than the second tone conversion information from among the plurality of tone conversion information when it is determined that the exposure of the monocular field of view is insufficient.
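
The selection logic of claims 10 to 13 can be sketched under stated assumptions. The claims do not define how the exposure amount is computed or what counts as insufficient. The mean pixel value, the threshold of 40, the rectangular regions, the assumption that the monocular field lies in the second image, and all names below are hypothetical.

```python
def mean_exposure(image, region):
    """Mean pixel value inside region = (top, left, bottom, right), end-exclusive.

    Assumption: exposure amount is approximated by the mean pixel value.
    """
    top, left, bottom, right = region
    pixels = [image[r][c] for r in range(top, bottom) for c in range(left, right)]
    return sum(pixels) / len(pixels)


def select_tone_conversion(second_image, gaze_region, monocular_region, threshold=40.0):
    """Pick 'second' when both regions are adequately exposed, otherwise 'third'.

    'third' stands in for any tone conversion information other than the second.
    """
    gaze = mean_exposure(second_image, gaze_region)
    mono = mean_exposure(second_image, monocular_region)
    if gaze < threshold or mono < threshold:
        return "third"
    return "second"


if __name__ == "__main__":
    # Left half dark (monocular field), right half bright (gaze area in the stereo field).
    image = [[10, 10, 200, 200],
             [10, 10, 200, 200]]
    print(select_tone_conversion(image, (0, 2, 2, 4), (0, 0, 2, 2)))
```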

14. The stereo camera device according to claim 1, wherein the tone control unit is configured to compare the amount of change in pixel value with respect to the amount of change in brightness in a region darker than a predetermined brightness between an image captured by the second camera using the third tone conversion curve and an image captured by the first camera, and to verify the third tone conversion curve.
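
One way to picture the verification in claim 14 is to compare the slope of pixel value against brightness in the dark region for both cameras. The claim does not state the pass criterion. The least-squares slope estimate, the requirement that the slopes agree within a relative tolerance, the sample format, and all names below are assumptions for illustration.

```python
def dark_slope(samples, dark_limit):
    """Least-squares slope of pixel value versus brightness.

    samples is a list of (brightness, pixel_value) pairs; only pairs with
    brightness below dark_limit are used.
    """
    pts = [(b, p) for b, p in samples if b < dark_limit]
    if len(pts) < 2:
        raise ValueError("need at least two dark samples")
    n = len(pts)
    mean_b = sum(b for b, _ in pts) / n
    mean_p = sum(p for _, p in pts) / n
    var_b = sum((b - mean_b) ** 2 for b, _ in pts)
    if var_b == 0:
        raise ValueError("dark samples must span more than one brightness")
    cov = sum((b - mean_b) * (p - mean_p) for b, p in pts)
    return cov / var_b


def verify_third_curve(first_samples, second_samples, dark_limit=100, tolerance=0.1):
    """Assumed criterion: dark-region slopes agree within a relative tolerance."""
    s1 = dark_slope(first_samples, dark_limit)
    s2 = dark_slope(second_samples, dark_limit)
    return abs(s2 - s1) <= tolerance * abs(s1)


if __name__ == "__main__":
    first = [(10, 20), (20, 40), (30, 60)]
    print(verify_third_curve(first, [(10, 20), (20, 40), (30, 60)]))
    print(verify_third_curve(first, [(10, 5), (20, 10), (30, 15)]))
```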