Low-cost real-time monocular stereo imaging method and system based on aperture modulation
By arranging a half-aperture occlusion mask on the lens aperture stop and performing differential processing, the problems of high cost and parameter drift in binocular stereo vision systems are solved, achieving low-cost, real-time monocular stereo imaging effects suitable for various scenarios.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ZHEJIANG UNIV
- Filing Date
- 2026-01-13
- Publication Date
- 2026-06-12
AI Technical Summary
Existing binocular stereo vision systems are costly and bulky, and their camera parameters are prone to drift in temperature-changing environments, resulting in relative pose errors and failures in weak texture scenes.
A half-aperture occlusion mask is placed on the aperture stop or conjugate pupil surface of the lens, and the occlusion and non-occlusion states are switched by a driving device. Combined with the image processing module, differential processing is performed to directly obtain the view images of the two half apertures, thereby realizing scene depth estimation.
It achieves low-cost, real-time monocular stereo imaging, is suitable for various embedded devices with limited computing power, and can still work effectively in weak texture scenes without the need for complex calibration and inversion reconstruction processes.
Figure CN121509634B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of stereo vision technology, specifically to a low-cost real-time monocular stereo imaging method and system based on aperture modulation.
Background Technology
[0002] Light field imaging is a relatively advanced technique that simultaneously records both the angular and spatial coordinates of light rays. Existing compressed light field imaging techniques overcome the inherent trade-off between angular and spatial resolution in light field imaging by modulating various image planes or pupil planes. In principle, however, 3D imaging only requires recording light field information from two angles.
[0003] The most mature 3D imaging solution currently available is the binocular stereo vision algorithm, which uses dual lenses and dual cameras to mimic the working principle of the human eye and obtain images from two different perspectives. However, on the one hand, dual lenses and dual cameras increase the system cost and size to some extent. On the other hand, dual lenses and dual sensors are essentially two imaging systems, and there is an inherent mismatch between the two images obtained, so precise calibration is necessary beforehand. Furthermore, camera intrinsic and extrinsic parameters are prone to drift over long-term use or in environments with temperature changes, which limits the application scenarios of binocular cameras. In addition, there are also issues such as relative pose error between the two cameras and near-failure in weak texture scenes.
[0004] This invention enables any imaging system to achieve 3D imaging functionality with simple and low-cost modifications, by adding a half-aperture occlusion mask and a driving device to the normal imaging optical path. Compared to other aperture coding schemes, the image processing algorithm of this invention is very simple and requires no inversion reconstruction, making it suitable for embedded devices, mobile terminals, and other systems with limited computing power, and providing real-time imaging capabilities. In scenarios where stereo matching algorithms fail, such as weak-texture scenes, the two views can instead be treated as a sparse light field with an angular resolution of 2*1, so the method remains applicable.
Summary of the Invention
[0005] The purpose of this invention is to address the shortcomings of existing technologies by proposing a low-cost real-time monocular stereo imaging method and system based on aperture modulation, thereby realizing a low-cost single-lens stereo imaging scheme that uses aperture coding and differential reconstruction of complementary views.
[0006] The objective of this invention is achieved through the following technical solution: a low-cost real-time monocular stereo imaging method based on aperture modulation, the method comprising the following steps:
[0007] (1) Arrange a half-aperture mask on the aperture stop surface or its conjugate pupil surface of the lens;
[0008] (2) Obtain an image with a half-aperture view under masked conditions;
[0009] (3) Obtain images with full aperture view in an unobstructed state;
[0010] (4) Perform differential processing on the images in step (2) and step (3) to obtain the independent view of the occluded half aperture, and combine the half aperture view to estimate the scene depth.
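The differential processing in step (4) rests on the fact that incoherent light passing through the two halves of the pupil adds in intensity at the sensor. As a brief sketch, assuming a static scene between the two exposures, identical exposure settings, and a linear sensor response (the symbols $I_{\mathrm{full}}$, $I_{A}$, and $I_{B}$ are notation introduced here and do not appear in the patent text):

```latex
\begin{align}
I_{\mathrm{full}}(x, y) &= I_{A}(x, y) + I_{B}(x, y), \\
I_{B}(x, y) &= I_{\mathrm{full}}(x, y) - I_{A}(x, y).
\end{align}
```

Here $I_{A}$ is the image captured with the mask in place (the view through the unoccluded half aperture), $I_{\mathrm{full}}$ is the image captured without the mask, and $I_{B}$ is the recovered view through the occluded half aperture.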
[0011] Furthermore, the half-aperture occlusion mask is an amplitude occlusion structure with symmetrical or asymmetrical segmentation.
[0012] Furthermore, the half-aperture occlusion mask can include two modes: left half occlusion and right half occlusion. Based on the two occlusion states of left half occlusion and right half occlusion, image pairs from the two half apertures can be directly obtained to achieve scene depth estimation.
[0013] Furthermore, the half-aperture occlusion mask ensures that the views from the half-aperture and full-aperture perspectives differ only in the angular dimension without altering the object-side field of view.
[0014] Furthermore, depth estimation includes light field depth estimation methods and optical flow estimation algorithms.
[0015] On the other hand, the present invention also provides a low-cost real-time monocular stereo imaging system based on aperture modulation, the system comprising:
[0016] Lens;
[0017] A half-aperture occlusion mask arranged on the aperture stop surface of the lens or its conjugate pupil surface;
[0018] A driving device is used to drive the half-aperture occlusion mask to switch between occlusion and non-occlusion states in order to obtain images with a half-aperture view or a full-aperture view.
[0019] The image processing module is used to connect to the image sensor, acquire images of the half-aperture view under occlusion and the full-aperture view under unocclusion, perform differential processing to obtain the independent view of the occluded half-aperture, and combine the half-aperture view to estimate the scene depth.
[0020] Furthermore, the driving device is one or a combination of a rotary electromagnet, a voice coil actuator, a stepper / servo mechanism, a piezoelectric actuator, and a micro motor, and forms a rigid or flexible connection with the mask.
[0021] Furthermore, the image processing module can perform image difference processing and depth estimation algorithms, including light field depth estimation and optical flow estimation algorithms.
[0022] Furthermore, the image sensor is a global shutter sensor and is synchronized with the driving device via a trigger signal.
[0023] Furthermore, the system can be expanded to a dual half-mask occlusion + dual drive device control, which can switch between left half occlusion and right half occlusion states to directly obtain image pairs from the two half apertures. Furthermore, the mask can be any low-cost light-blocking material.
[0024] Furthermore, the physical position of the mask does not need to be precisely fixed in each occlusion state; the goal is simply to block half of the aperture, which provides ample redundancy for practical engineering applications.
[0025] Furthermore, the differential step in the image processing module can be implemented on various programmable processors such as an FPGA, a DSP, or a Raspberry Pi. In applications with strict constraints on power consumption, size, and cost, the differential step can instead be implemented with a simple customized digital circuit.
[0026] Furthermore, the two images can be directly regarded as the result of binocular imaging, and the disparity estimation map can be directly obtained by using the classical algorithm or neural network algorithm of optical flow estimation.
[0027] Furthermore, the two images can be directly regarded as a sparse light field with an angular resolution of 2*1, and the parallax cues can be derived directly from the per-pixel slope of the epipolar plane image (EPI).
[0028] Furthermore, after obtaining the disparity map estimate or disparity cues, the depth map can be obtained through a simple reciprocal and linear transformation after prior calibration.
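The reciprocal-plus-linear conversion in [0028] can be sketched as follows. This is a minimal illustration that assumes inverse depth is an affine function of disparity, 1/z = a*d + b, which the text does not state explicitly. The function names, the coefficients a and b, and the calibration values are all hypothetical.

```python
import numpy as np


def fit_inverse_depth_model(disparity, depth):
    """Fit 1/depth = a * disparity + b by least squares and return (a, b).

    ASSUMPTION: the affine inverse-depth model is illustrative only; the
    patent only mentions "a simple reciprocal and linear transformation".
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    inv_depth = 1.0 / np.asarray(depth, dtype=np.float64)
    a, b = np.polyfit(disparity, inv_depth, deg=1)
    return a, b


def disparity_to_depth(disparity, a, b, eps=1e-9):
    """Apply the linear map to disparity, then take the reciprocal."""
    inv_depth = a * np.asarray(disparity, dtype=np.float64) + b
    # Avoid dividing by zero where the fitted inverse depth vanishes.
    inv_depth = np.where(np.abs(inv_depth) < eps, eps, inv_depth)
    return 1.0 / inv_depth


# Synthetic calibration targets generated from made-up coefficients.
true_a, true_b = 0.02, 0.1
calib_disp = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
calib_depth = 1.0 / (true_a * calib_disp + true_b)

a, b = fit_inverse_depth_model(calib_disp, calib_depth)
depth_map = disparity_to_depth(np.array([[0.5, 1.5]]), a, b)
```

In practice the calibration pairs would come from targets placed at known distances; the fit itself is a two-parameter line, which keeps the per-pixel conversion to one multiply, one add, and one division.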
[0029] The beneficial effects of this invention are:
[0030] 1. Compared with mature conventional binocular vision, this invention does not require precise adjustment of parameters between two cameras, strict control of camera pose, or two sensors and two lenses, and can achieve equivalent or even better stereoscopic vision effects.
[0031] 2. Compared with light field imaging, this invention does not require recording multiple angular resolution views, and can achieve stereoscopic vision effects by obtaining only two sub-aperture images.
[0032] 3. Compared with other aperture coding schemes, this invention does not require deconvolution or other inversion reconstruction, nor does it require expensive coding devices, and can achieve low-cost, real-time stereoscopic vision effects.
[0033] 4. This invention has extremely low computing power requirements and can be deployed on various embedded devices, mobile terminals and other systems with limited computing power.
[0034] 5. The stereoscopic visual effect of this invention is also applicable in low-texture scenes.
[0035] 6. This invention requires minimal modification to various existing optical systems and can be deployed directly.
Attached Figure Description
[0036] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0037] Figure 1 is a schematic diagram of the monocular stereo vision optical path provided in an embodiment of the present invention;
[0038] Figure 2 shows the monocular stereo vision image processing procedure provided in an embodiment of the present invention.
Detailed Implementation
[0039] To enable those skilled in the art to better understand the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.
[0040] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.
[0041] This invention provides a low-cost real-time monocular stereo imaging method based on aperture modulation, the method comprising the following steps:
[0042] (1) A half-aperture occlusion mask is arranged on the aperture stop plane of the lens or its conjugate pupil plane. The half-aperture occlusion mask is an amplitude occlusion structure with symmetrical or asymmetrical segmentation. It can include two modes, left half occlusion and right half occlusion; from these two occlusion states, image pairs from the two half apertures can be obtained directly to realize scene depth estimation. The mask must ensure that the half-aperture view and the full-aperture view differ only in the angular dimension, without changing the object-side field of view.
[0043] (2) Obtain an image with a half-aperture view under masked conditions;
[0044] (3) Obtain images with full aperture view in an unobstructed state;
[0045] (4) Perform differential processing on the images in step (2) and step (3) to obtain the independent view of the occluded half aperture, and combine the half aperture view to perform scene depth estimation, including the light field depth estimation method and the optical flow estimation algorithm.
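The differential processing in step (4) can be sketched as a per-pixel subtraction. This is a minimal illustration, assuming raw (linear) intensity frames of a static scene taken with identical exposure and gain; the function name `recover_occluded_view` and the synthetic data are hypothetical and not part of the patent.

```python
import numpy as np


def recover_occluded_view(full_img: np.ndarray, half_img: np.ndarray) -> np.ndarray:
    """Return the view through the occluded half aperture as full minus half.

    ASSUMPTION: both frames hold linear intensities of the same static
    scene, captured with the same exposure and gain.
    """
    if full_img.shape != half_img.shape:
        raise ValueError("frames must have the same shape")
    # Convert to float before subtracting: unsigned integer arrays would
    # wrap around wherever noise makes the half-aperture pixel brighter.
    diff = full_img.astype(np.float32) - half_img.astype(np.float32)
    # Negative values can only come from noise, so clamp them to zero.
    return np.clip(diff, 0.0, None)


# Synthetic check: build a full-aperture frame from two known half views.
rng = np.random.default_rng(0)
view_a = rng.integers(0, 100, size=(4, 6), dtype=np.uint8)  # unoccluded half
view_b = rng.integers(0, 100, size=(4, 6), dtype=np.uint8)  # occluded half
full = view_a + view_b  # each sum is below 256, so uint8 does not overflow

recovered_b = recover_occluded_view(full, view_a)
```

The pair (`view_a`, `recovered_b`) then plays the role of the two half-aperture views passed to depth estimation. Because the operation is a single subtraction per pixel, it maps naturally onto the low-power hardware options the description mentions.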
[0046] On the other hand, the present invention also provides a low-cost real-time monocular stereo imaging system based on aperture modulation, the system including a lens, a half-aperture occlusion mask, a driving device and an image processing module; wherein, the half-aperture occlusion mask is disposed on the aperture stop surface of the lens or its conjugate pupil surface.
[0047] The driving device is one or a combination of a rotary electromagnet, a voice coil actuator, a stepper / servo mechanism, a piezoelectric actuator, and a micro motor, and forms a rigid or flexible connection with the mask; it is used to drive the half-aperture mask to switch between two states of occlusion and non-occlusion to obtain images of half-aperture or full-aperture perspective.
[0048] The image processing module connects to an image sensor, which is a global shutter sensor synchronized with the driving device via a trigger signal. The module acquires the half-aperture view image in the occluded state and the full-aperture view image in the unoccluded state, and then performs image differential processing and depth estimation. Differential processing yields the independent view of the occluded half aperture, and scene depth is estimated by combining it with the half-aperture view, including light field depth estimation and optical flow estimation algorithms. The differential step can be implemented on various programmable processors such as an FPGA, a DSP, or a Raspberry Pi. In applications with strict constraints on power consumption, size, and cost, the differential step can instead be implemented with a simple customized digital circuit.
[0049] The system of this invention can also be expanded to a dual-half-mask occlusion + dual-drive device control, which can switch between left half occlusion and right half occlusion states to directly obtain image pairs from both halves of the aperture. Furthermore, the mask can be any low-cost light-blocking material. The physical position of the mask in each occlusion state does not need to be precisely fixed; the goal is only to block half of the aperture, which provides sufficient redundancy for practical engineering applications.
[0050] This invention can directly treat two images as the result of binocular imaging, and use classical or neural network algorithms for optical flow estimation to directly obtain a disparity estimation map.
[0051] This invention can directly treat two images as a sparse light field with an angular resolution of 2*1, and directly derive disparity cues from the per-pixel slope of the EPI. After obtaining the disparity map estimate or disparity cues, a depth map can be obtained through pre-calibration and a simple reciprocal and linear transformation.
Example
[0052] As shown in Figure 1, the optical path only requires adding a half-aperture occlusion mask and a driving device to the normal imaging optical path.
[0053] The mask is switched between the blocked and unblocked states by a driving device, such as a rotary electromagnet or a voice coil actuator. It is worth noting that the present invention does not require extremely precise positioning during drive control, so control precision can be traded for faster mask opening and closing, which can generally reach the millisecond level.
Example
[0054] In the above embodiment, the driving device is synchronized with the camera shutter to capture two images: one with the mask covering half of the aperture and the other with the mask open. Taking the difference between these images yields the two perspective images, as shown in Figure 2, which illustrates the difference processing in the image processing module.
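The synchronized two-frame capture described above can be sketched as a short control sequence. This is only an illustration: the `MaskDriver` and `TriggeredCamera` interfaces, their method names, and the `settle_s` delay are hypothetical placeholders for whatever actuator and camera interfaces a real system exposes.

```python
import time
from typing import Any, Protocol, Tuple


class MaskDriver(Protocol):
    """Hypothetical actuator interface (e.g. a voice coil or rotary solenoid)."""

    def set_occluded(self, occluded: bool) -> None: ...


class TriggeredCamera(Protocol):
    """Hypothetical global-shutter camera that exposes one frame per trigger."""

    def trigger_and_read(self) -> Any: ...


def capture_pair(
    driver: MaskDriver, camera: TriggeredCamera, settle_s: float = 0.002
) -> Tuple[Any, Any]:
    """Capture (half_aperture_frame, full_aperture_frame), in that order."""
    driver.set_occluded(True)
    time.sleep(settle_s)  # ASSUMPTION: millisecond-level mask settling time
    half = camera.trigger_and_read()

    driver.set_occluded(False)
    time.sleep(settle_s)
    full = camera.trigger_and_read()
    return half, full
```

Keeping the two exposures back to back limits scene motion between them, which matters because the subsequent difference step assumes both frames see the same scene.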
[0055] Alternatively, if increased cost and complexity are permissible, a two-mask approach can be selected. By driving two masks separately to block the two half apertures in turn, the two view images can be output directly without differential processing.
[0056] Alternatively, two images from different perspectives can be obtained by using a single mask to block the two half-areas of the aperture in turn, but this increases the control precision required of the driving device to some extent.
[0057] Optionally, after obtaining images from two different viewpoints, a neural network model such as RAFT, or another algorithm, can be used to estimate the optical flow between the two images, which serves as the disparity estimate. With pre-calibrated camera intrinsic and extrinsic parameters, this can be converted into a depth estimate.
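To make the two-view disparity step concrete, the sketch below uses a simple sum-of-absolute-differences block matcher as a stand-in. It is not RAFT and not an optical flow algorithm; it only illustrates how a horizontal disparity map can be read off two half-aperture views. The function names, window radius, and search range are hypothetical choices.

```python
import numpy as np


def box_sum(img, radius):
    """Sum over a (2r+1) x (2r+1) window, with zero padding at the borders."""
    kernel = np.ones(2 * radius + 1)
    rows = np.apply_along_axis(np.convolve, 1, img, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="same")


def block_match_disparity(left, right, max_disp=8, radius=3):
    """Integer disparity d per pixel such that right[y, x + d] ~ left[y, x]."""
    left = left.astype(np.float32)
    right = right.astype(np.float32)
    shifts = np.arange(-max_disp, max_disp + 1)
    costs = []
    for s in shifts:
        # np.roll wraps around at the image border, so the outermost
        # columns of the result are unreliable on real images.
        shifted = np.roll(right, -s, axis=1)
        costs.append(box_sum(np.abs(left - shifted), radius))
    best = np.argmin(np.stack(costs, axis=0), axis=0)
    return shifts[best]


# Synthetic pair: the second view is the first shifted by 3 pixels.
rng = np.random.default_rng(1)
view_left = rng.random((32, 64))
view_right = np.roll(view_left, 3, axis=1)

disparity = block_match_disparity(view_left, view_right)
```

A matcher of this kind needs texture to work, which is why the description offers the EPI-slope route for weak-texture scenes.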
[0058] Optionally, for low-texture scenes or devices with insufficient computing power, the two images can be treated as a sparse light field, and the per-pixel slope of the EPI can be calculated directly and then converted into depth through prior calibration.
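One way to compute a per-pixel EPI slope from only two views is a windowed gradient ratio. The sketch below assumes the two views sit at angular positions u = 0 and u = 1, that intensity is constant along each EPI line, and that disparities are small (about a pixel or less). The function names, window radius, and `eps` regularizer are hypothetical, and the patent does not specify this particular estimator.

```python
import numpy as np


def box_mean(img, radius):
    """Mean over a (2r+1) x (2r+1) window, with zero padding at the borders."""
    size = 2 * radius + 1
    kernel = np.ones(size) / size
    rows = np.apply_along_axis(np.convolve, 1, img, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="same")


def epi_slope_disparity(view0, view1, radius=3, eps=1e-6):
    """Per-pixel EPI slope dx/du for a 2*1 light field.

    Constant intensity along an EPI line gives L_x * d + L_u = 0, so
    d = -L_u / L_x; the window average stabilizes the ratio.
    """
    v0 = view0.astype(np.float64)
    v1 = view1.astype(np.float64)
    d_u = v1 - v0  # finite difference along the angular axis
    # Spatial gradient averaged over both views (horizontal direction).
    d_x = 0.5 * (np.gradient(v0, axis=1) + np.gradient(v1, axis=1))
    num = box_mean(d_x * d_u, radius)
    den = box_mean(d_x * d_x, radius)
    return -num / (den + eps)


# Synthetic pair: a smooth pattern shifted by half a pixel between views.
x = np.arange(64, dtype=np.float64)
k = 2.0 * np.pi / 32.0
view_u0 = np.tile(np.sin(k * x), (48, 1))
view_u1 = np.tile(np.sin(k * (x - 0.5)), (48, 1))

slope = epi_slope_disparity(view_u0, view_u1)
```

The estimator uses only differences, products, and box filters, so it suits devices with little computing power. Where the horizontal gradient vanishes, `den` approaches zero and the slope estimate tends toward zero rather than a meaningful value.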
[0059] The specific embodiments described above do not constitute a limitation on the scope of protection of this invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this invention should be included within the scope of protection of this invention.
Claims
1. A low-cost real-time monocular stereo imaging method based on aperture modulation, characterized in that the method includes the following steps: (1) Arrange a half-aperture occlusion mask on the aperture stop surface of the lens or its conjugate pupil surface; (2) Obtain an image with a half-aperture view in the occluded state; (3) Obtain an image with a full-aperture view in the unoccluded state; (4) Perform differential processing on the images from step (2) and step (3) to obtain the independent view of the occluded half aperture. The two half-aperture view images form a result similar to binocular imaging. Either optical flow estimation is used to directly obtain the disparity estimation map, or the two images are regarded as a sparse light field with an angular resolution of 2*1 and the disparity cues are obtained directly from the per-pixel slope of the EPI. The scene depth is then estimated using pre-calibrated camera intrinsic and extrinsic parameters.
2. The method according to claim 1, characterized in that the half-aperture occlusion mask is an amplitude occlusion structure with symmetrical or asymmetrical segmentation.
3. The method according to claim 1, characterized in that, The half-aperture occlusion mask can include two modes: left half occlusion and right half occlusion. Based on the two occlusion states, image pairs from the two half-apertures can be directly obtained to achieve scene depth estimation.
4. The method according to claim 1, characterized in that, The half-aperture occlusion mask ensures that the views from the half-aperture and full-aperture perspectives differ only in the angular dimension without altering the object-side field of view.
5. The method according to claim 1, characterized in that depth estimation includes light field depth estimation and optical flow estimation algorithms.
6. A low-cost real-time monocular stereo imaging system based on aperture modulation, implementing the method of any one of claims 1-5, characterized in that the system includes: a lens; a half-aperture occlusion mask arranged on the aperture stop surface of the lens or its conjugate pupil surface; a driving device used to drive the half-aperture occlusion mask to switch between occlusion and non-occlusion states in order to obtain images with a half-aperture view or a full-aperture view; and an image processing module connected to the image sensor, which acquires images of the half-aperture view in the occluded state and the full-aperture view in the unoccluded state, performs differential processing to obtain the independent view of the occluded half aperture, and combines it with the half-aperture view to estimate the scene depth.
7. The system according to claim 6, characterized in that, The driving device is one or a combination of a rotary electromagnet, a voice coil actuator, a stepper / servo mechanism, a piezoelectric actuator, and a micro motor, and forms a rigid or flexible connection with the mask.
8. The system according to claim 6, characterized in that, The image processing module can perform image difference processing and depth estimation algorithms, including light field depth estimation and optical flow estimation algorithms.
9. The system according to claim 6, characterized in that, The image sensor is a global shutter sensor and is synchronized with the driving device via a trigger signal.
10. The system according to claim 6, characterized in that, The system can be expanded to dual half-mask occlusion + dual drive device control, and can switch between left half occlusion and right half occlusion states to directly obtain image pairs from the two half apertures.