A low-complexity stereoscopic image super-resolution reconstruction method and device
By constructing a low-complexity stereo image super-resolution reconstruction model, and utilizing multi-level structural pruning and cross-viewpoint progressive distillation techniques, the problem of high computational complexity in existing methods is solved, and efficient stereo image reconstruction is achieved on resource-limited devices.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- TIANJIN UNIV OF COMMERCE
- Filing Date
- 2026-02-26
- Publication Date
- 2026-06-19
AI Technical Summary
Existing stereo image super-resolution reconstruction methods mainly focus on image quality, ignoring the high computational complexity of deep learning models. They are difficult to deploy effectively on resource-limited edge devices, and existing methods fail to effectively reduce network structure redundancy and cannot efficiently explore the correlation between viewpoints.
A teacher network is constructed by stacking multiple soft-gated feature extraction units, a lightweight student network is obtained by pruning, and a stereo bridging progressive distillation mechanism is introduced to guide the student network learning, thereby constructing a low-complexity stereo image super-resolution reconstruction model.
The method maintains high-quality, high-resolution stereo image reconstruction while significantly reducing network computational complexity, which improves the deployment efficiency of the model on resource-constrained devices and enables efficient stereo image super-resolution reconstruction.
Figure CN122243741A_ABST
Description
Technical Field
[0001] This invention relates to the fields of deep learning and image super-resolution reconstruction, and in particular to a low-complexity stereo image super-resolution reconstruction method and apparatus based on multi-level structural pruning and cross-viewpoint progressive distillation.
Background Technology
[0002] Vision is the primary means by which humans perceive the world and acquire information. For a long time, humanity has strived to create a harmonious visual perception environment that is rich in visual details, realistically natural, and immersive. The information perceived by human eyes in nature is three-dimensional, possessing strong expressive power and rich content. Binocular stereo vision technology, with its superior three-dimensional spatial perception capabilities, has been widely applied in fields such as humanoid robots, autonomous driving, and drones, becoming a development direction and trend of next-generation information technology, and possessing significant academic research value and practical application value.
[0003] Binocular stereo vision technology reproduces the depth information of a scene, enabling edge devices to perceive the distance, depth, and spatial distribution of objects within the scene. Stereo images, as a crucial data carrier for binocular stereo vision, are typically acquired by binocular cameras and include left and right viewpoint images with horizontal parallax. High-quality, high-resolution stereo images possess clear texture details, enhancing not only the viewer's visual experience but also facilitating deeper and more efficient utilization of visual information within the image by the machine, thereby improving the performance of subsequent computer vision tasks. However, in practical applications, limitations imposed by hardware, imaging conditions, and transmission bandwidth often result in stereo images with resolutions insufficient for application requirements. To obtain high-quality, high-resolution stereo images, researchers have attempted to improve hardware, such as reducing detector pixel size and enhancing optical lens technology to improve imaging system performance. Nevertheless, due to current manufacturing limitations, significant breakthroughs in hardware improvements are unlikely in the short term, and high-precision hardware typically comes with higher costs. Therefore, there is an urgent need to continuously explore methods to improve stereo image resolution at the algorithmic level.
[0004] Stereo image super-resolution reconstruction aims to exploit the correlation between the left and right viewpoints to predict the detail missing from low-resolution left and right viewpoint images and to reconstruct the corresponding high-resolution images. Chu et al. significantly improved the quality of reconstructed images by constructing a stereo cross-attention module to mine inter-viewpoint information. Qiu et al. further improved perceptual quality by introducing a perceptual loss function based on image-patch perceptual similarity. Wan et al. proposed a multi-stage stereo image super-resolution reconstruction network that uses an edge-guided stereo attention mechanism to explore details whose structure is consistent across the left and right viewpoints, progressively improving reconstruction quality from coarse to fine. Lin et al. proposed a Transformer-based network for stereo image super-resolution reconstruction comprising three parts (cross-attention feature extraction, inter-viewpoint information aggregation, and image reconstruction), which effectively improves performance. Liu et al. proposed a coarse-to-fine cross-viewpoint interaction network that mimics human visual mechanisms and builds a cascaded coarse-to-fine disparity attention structure to obtain more reliable stereo correspondences, thereby improving the reconstruction quality of high-resolution stereo images. However, edge devices such as robots and drones are constrained by size, heat dissipation, and power consumption, so their hardware resources are relatively limited, which places higher demands on the computational efficiency of stereo image super-resolution reconstruction. Existing stereo image super-resolution reconstruction methods focus primarily on the quality of the reconstructed image and neglect the high computational complexity of deep learning models. How to reduce network redundancy, efficiently exploit inter-view correlations, and construct low-complexity stereo image super-resolution reconstruction models therefore warrants further research.
Summary of the Invention
[0005] This invention provides a low-complexity stereo image super-resolution reconstruction method and apparatus. Addressing the issue that existing technologies primarily focus on the reconstruction quality of stereo images while neglecting the high computational complexity of deep learning models and the difficulty in deploying them on edge devices, this invention explores the redundancy of a dual-branch network architecture for stereo images, efficiently mining inter-view correlations, constructing a low-complexity stereo image super-resolution reconstruction model, and outputting high-quality, high-resolution stereo images. See the description below for details:
[0006] A first aspect is a method for super-resolution reconstruction of low-complexity stereo images, the method comprising:
[0007] A teacher network is constructed by stacking multiple layers of soft-gated feature extraction units;
[0008] Pruning the teacher network to obtain a lightweight student network;
[0009] Constructing a stereo bridging progressive distillation mechanism and using the teacher network to guide the learning of the student network;
[0010] Training the student network under the constraint of the stereo bridging progressive distillation mechanism, and reconstructing high-resolution stereo images based on the trained student network.
[0011] The method of constructing a teacher network by stacking multiple layers of soft-gated feature extraction units is as follows:
[0012] For the nth layer soft-gated feature extraction unit, calculate the first importance of the feature extraction block within that unit; use the first importance as the soft-gated weight, and fuse the input and output features of the feature extraction block within the viewpoint;
[0013] Based on the fused features, the second importance of the mutual attention mechanism within the unit is calculated. The second importance is used as the soft gating weight. The input and output features of the mutual attention mechanism are fused. The left and right viewpoint features output by the last soft gating feature extraction unit are reconstructed into a high-resolution left and right view through the sub-pixel convolutional layer.
[0014] The first importance is:
[0015]
[0016] where the symbols denote, in order: the importance of the feature extraction block in the nth layer; the intra-viewpoint importance estimation; the sigmoid activation function; the averaging operation; the left-viewpoint input feature of the nth-layer soft-gated feature extraction unit; and the right-viewpoint input feature of the nth-layer soft-gated feature extraction unit.
[0017] The second importance is:
[0018]
[0019] where the symbols denote, in order: the importance of the mutual attention mechanism in the nth layer; the inter-viewpoint importance estimation; and the cosine similarity calculation.
[0020] The process of pruning the teacher network to obtain a lightweight student network is as follows:
[0021] The number of channels in each feature extraction block is pruned based on the first importance, and whether the mutual attention mechanism is retained is decided based on the second importance, resulting in a lightweight student network.
[0022] The pruning process is as follows:
[0023]
[0024]
[0025]
[0026] where the symbols denote, in order: the number of channels in the nth-layer feature extraction block of the teacher network; the number of channels in the nth-layer feature extraction block of the student network; multiplication; rounding up; the flag indicating whether the mutual attention mechanism at layer n is retained; the teacher network and the student network, respectively; and the pruning operation. N is the total number of feature extraction blocks (N layers).
[0027] The stereo bridging progressive distillation mechanism is as follows:
[0028] The features of the teacher and student networks are fused using separable convolutions, as follows:
[0029]
[0030]
[0031] where the symbols denote, in order: the left- and right-viewpoint features of the Nth layer of the teacher network; the left- and right-viewpoint features of the Nth layer of the student network; the fused left- and right-viewpoint features; the concatenation operation; and the separable convolution;
[0032] A mutual attention mechanism is then applied so that the fused left- and right-viewpoint features interact, as follows:
[0033]
[0034] where the two symbols denote the interaction features of the left and right viewpoints, respectively.
[0035] The method further includes: constructing a stereo bridging progressive distillation loss that jointly constrains the stereo bridging progressive distillation branch and the student network, wherein the stereo bridging progressive distillation loss is defined as follows:
[0036]
[0037]
[0038]
[0039] where the symbols denote, in order: the stereo bridging progressive distillation loss; the weighting coefficient; the sub-pixel convolutional layer; the distance metric function; the high-resolution ground-truth images of the left and right viewpoints, respectively; the reconstruction loss; and the distillation loss.
[0040] In a second aspect, a low-complexity stereo image super-resolution reconstruction apparatus, the apparatus comprising: a processor and a memory, the memory storing program instructions, the processor calling the program instructions stored in the memory to cause the apparatus to perform the method described in any of the first aspects.
[0041] In a third aspect, a computer-readable storage medium storing a computer program, the computer program including program instructions that, when executed by a processor, cause the processor to perform the method described in any one of the first aspects.
[0042] The beneficial effects of the technical solution provided by this invention are:
[0043] 1. This invention utilizes the feature representation capabilities of deep learning to effectively reduce the computational complexity of the network while reconstructing high-quality, high-resolution stereo images by removing redundancy in the network structure.
[0044] 2. This invention designs a multi-level structural pruning strategy. By constructing soft-gated feature extraction units, the importance of different levels of the network is calculated. Based on the calculated importance, the number of channels and the mutual attention mechanism of the network feature extraction units are pruned, thereby obtaining a lightweight student network while maintaining reconstruction quality.
[0045] 3. This invention designs a stereo bridging progressive distillation mechanism. By introducing a bridging distillation branch, the difference in expressive ability between the teacher network and the student network is reduced, promoting more effective knowledge transfer, thereby improving the reconstruction quality of the student network for high-resolution stereo images.
[0046] 4. Through experimental verification on multiple datasets, the proposed method maintains excellent stereo image super-resolution reconstruction quality while having low computational complexity.
Description of the Drawings
[0047] Figure 1 is a flowchart of a low-complexity stereo image super-resolution reconstruction method based on multi-level structural pruning and cross-viewpoint progressive distillation.
Detailed Implementation
[0048] To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention will be described in further detail below.
[0049] Example 1
[0050] To overcome the shortcomings of existing technologies, this invention presents a low-complexity stereo image super-resolution reconstruction method based on multi-level structure pruning and cross-viewpoint progressive distillation.
[0051] I. Feature Mapping
[0052] For the input low-resolution left and right views, this embodiment of the invention uses two branches, each employing a convolutional layer, to map the left and right views into left- and right-viewpoint features.
[0053] II. Constructing a Soft-Gated Feature Extraction Unit
[0054] For left and right viewpoint features, soft-gated feature extraction units are constructed to further extract spatial features of the left and right viewpoints. This embodiment of the invention constructs a teacher network by stacking multiple layers of soft-gated feature extraction units. For the nth layer of soft-gated feature extraction units, the importance of the feature extraction blocks within that layer is first calculated. The importance of a feature extraction block can be defined as follows:
[0055] (1)
[0056] where the symbols denote, in order: the importance of the feature extraction block in the nth layer; the intra-viewpoint importance estimation; the sigmoid activation function; the averaging operation; the left-viewpoint input feature of the nth-layer soft-gated feature extraction unit; and the right-viewpoint input feature of the nth-layer soft-gated feature extraction unit.
[0057] Then, this importance is used as the soft-gating weight to fuse the input and output features of the intra-viewpoint feature extraction block. The weighted fusion process can be defined as follows:
[0058] (2)
[0059] (3)
[0060] where the symbols denote, in order: the feature extraction block; pixel-wise multiplication; and the fused features of the left and right viewpoints, respectively.
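The gating described by formulas (1) to (3) can be illustrated with a minimal sketch. Because the formulas themselves are not reproduced in this text, their exact form is assumed here: the importance is taken to be the sigmoid of a scalar linear estimate of the averaged left and right input features, and the fusion is taken to be a convex blend of the block output and the block input. The names `estimate_importance`, `soft_gate`, and `toy_block` are hypothetical and serve only as illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mean(values):
    return sum(values) / len(values)


def estimate_importance(left, right, weight=1.0, bias=0.0):
    """Assumed form of formula (1): sigmoid of a linear estimate
    applied to the average of the left and right input features."""
    averaged = mean([mean(left), mean(right)])
    return sigmoid(weight * averaged + bias)


def soft_gate(features, block, alpha):
    """Assumed form of formulas (2)-(3): alpha * block(x) + (1 - alpha) * x."""
    output = block(features)
    return [alpha * o + (1.0 - alpha) * f for o, f in zip(output, features)]


def toy_block(xs):
    # Stand-in for a learned feature extraction block (hypothetical).
    return [2.0 * x for x in xs]


left = [0.2, 0.4, 0.6]
right = [0.1, 0.3, 0.5]
alpha = estimate_importance(left, right)       # one weight shared by both views
fused_left = soft_gate(left, toy_block, alpha)
fused_right = soft_gate(right, toy_block, alpha)
```

Under this assumed form, an importance close to 0 makes the unit behave like an identity mapping, which is what would make a low-importance block a natural candidate for pruning.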
[0061] Based on the fused left- and right-viewpoint features, the importance of the mutual attention mechanism within this layer's unit is calculated using the following formula:
[0062] (4)
[0063] where the symbols denote, in order: the importance of the mutual attention mechanism in the nth layer; the inter-viewpoint importance estimation; and the cosine similarity calculation.
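As an illustration of formula (4), the following sketch computes the cosine similarity between flattened left and right fused features and passes it through a scalar estimator. The estimator form (a sigmoid of a linear function, with hypothetical parameters `weight` and `bias`) is an assumption; the text only states that an inter-viewpoint importance estimation is applied to a cosine similarity.

```python
import math


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def attention_importance(fused_left, fused_right, weight=1.0, bias=0.0):
    """Assumed estimator: sigmoid of a linear function of the similarity.
    In a trained network, weight and bias would be learned."""
    similarity = cosine_similarity(fused_left, fused_right)
    return 1.0 / (1.0 + math.exp(-(weight * similarity + bias)))


beta = attention_importance([0.3, 0.5, 0.9], [0.2, 0.6, 0.8])
```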
[0064] Similarly, this importance is used as the soft-gating weight to fuse the input and output features of the mutual attention mechanism. The weighted fusion process can be defined as follows:
[0065] (5)
[0066] (6)
[0067] (7)
[0068] where the symbols denote, in order: the mutual attention mechanism; the left- and right-viewpoint features output by the mutual attention mechanism, respectively; and the left- and right-viewpoint features output by the nth-layer soft-gated feature extraction unit, respectively.
[0069] Finally, subpixel convolutional layers are used to reconstruct the left and right viewpoint features output by the last soft-gated feature extraction unit into high-resolution left and right views.
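A sub-pixel convolutional layer is commonly implemented as a convolution that produces r² times as many channels, followed by a rearrangement of those channels into an r-times larger spatial grid. The sketch below shows only the rearrangement step in plain Python. The channel ordering follows the widely used pixel-shuffle convention, which is assumed rather than taken from this document, and `pixel_shuffle` is an illustrative name.

```python
def pixel_shuffle(x, r):
    """Rearrange a [C*r*r][H][W] nested list into [C][H*r][W*r]."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = c_in // (r * r)
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):          # vertical offset inside each r x r cell
            for j in range(r):      # horizontal offset inside each r x r cell
                src = x[c * r * r + i * r + j]
                for row in range(h):
                    for col in range(w):
                        out[c][row * r + i][col * r + j] = src[row][col]
    return out


# Four 1x1 channels become one 2x2 channel at scale factor 2.
features = [[[float(k)]] for k in range(4)]
high_res = pixel_shuffle(features, 2)
# high_res == [[[0.0, 1.0], [2.0, 3.0]]]
```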
[0070] III. Multi-Level Structural Pruning
[0071] After training the teacher network, it is pruned based on the importance calculated for each soft-gated feature extraction unit to obtain a lightweight student network. Specifically, the number of channels in each feature extraction block is pruned according to the importance of that block, and the importance of the mutual attention mechanism determines whether the mutual attention mechanism of that layer is retained.
[0072] The above pruning process is defined as follows:
[0073] (8)
[0074] (9)
[0075] (10)
[0076] where the symbols denote, in order: the number of channels in the nth-layer feature extraction block of the teacher network; the number of channels in the nth-layer feature extraction block of the student network; multiplication; rounding up to the nearest integer; the flag indicating whether the mutual attention mechanism at layer n is retained; the teacher network and the student network, respectively; and the pruning operation. N is the total number of feature extraction blocks (N layers).
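The pruning rule of formulas (8) to (10) can be sketched as follows, assuming that the student channel count is the teacher channel count multiplied by the block importance and rounded up, and that the mutual attention mechanism is kept when its importance reaches a threshold. The threshold value of 0.5 and the importance values are hypothetical; the teacher channel count of 84 is the value given for the teacher network in this embodiment.

```python
import math


def prune_channels(teacher_channels, block_importance):
    """Assumed form of formula (8): ceil(importance * teacher channels)."""
    return [math.ceil(a * c) for a, c in zip(block_importance, teacher_channels)]


def keep_attention(attention_importance, threshold=0.5):
    """Assumed form of formula (9): keep the layer's mutual attention
    when its importance reaches the (hypothetical) threshold."""
    return [b >= threshold for b in attention_importance]


teacher_channels = [84, 84, 84, 84]          # four layers shown for brevity
block_importance = [0.9, 0.5, 0.25, 1.0]     # illustrative values
attention_importance = [0.7, 0.3, 0.5, 0.1]  # illustrative values

student_channels = prune_channels(teacher_channels, block_importance)
attention_flags = keep_attention(attention_importance)
# student_channels == [76, 42, 21, 84]
# attention_flags == [True, False, True, False]
```

Rounding up rather than down guarantees that a block with small but non-zero importance keeps at least one channel.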
[0077] IV. Constructing a Stereo Bridging Progressive Distillation Mechanism
[0078] The teacher and student networks differ in expressive capability. Using the teacher network to guide the learning of the student network can help improve the student network's performance. However, directly constructing feature-level constraints for knowledge transfer has certain limitations. Therefore, a stereo bridging progressive distillation mechanism is constructed. This mechanism introduces a stereo bridging progressive distillation branch to narrow the gap in expressive capability between the teacher and student networks, thereby promoting more effective knowledge transfer.
[0079] Based on features extracted from the teacher and student networks, the stereo bridging progressive distillation branch enables the student network to participate in the knowledge transfer process through feature fusion. First, separable convolutions are used to fuse the features of the teacher and student networks, defined as follows:
[0080] (11)
[0081] (12)
[0082] where the symbols denote, in order: the left- and right-viewpoint features of the Nth layer of the teacher network; the left- and right-viewpoint features of the Nth layer of the student network; the fused left- and right-viewpoint features; the concatenation operation; and the separable convolution.
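To illustrate why a separable convolution keeps the bridging branch inexpensive, the sketch below compares the weight counts of a standard convolution and a depthwise separable convolution applied to the concatenated teacher and student features. It assumes that "separable convolution" means a depthwise k×k convolution followed by a 1×1 pointwise convolution, ignores bias terms, and uses a hypothetical student width of 42 channels alongside the teacher width of 84.

```python
def standard_conv_weights(c_in, c_out, k):
    """Weights of a dense k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_weights(c_in, c_out, k):
    """Depthwise k x k per input channel, then 1 x 1 pointwise (bias ignored)."""
    return k * k * c_in + c_in * c_out


teacher_width = 84                              # from this embodiment
student_width = 42                              # hypothetical pruned width
concat_width = teacher_width + student_width    # channel-wise concatenation

dense = standard_conv_weights(concat_width, student_width, 3)
separable = separable_conv_weights(concat_width, student_width, 3)
# dense == 47628, separable == 6426
```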
[0083] Then, a mutual attention mechanism is applied so that the fused left- and right-viewpoint features exchange viewpoint information, defined as follows:
[0084] (13)
[0085] where the two symbols denote the interaction features of the left and right viewpoints, respectively.
[0086] Finally, to facilitate effective knowledge transfer, a stereo bridging progressive distillation loss is constructed that jointly constrains the stereo bridging progressive distillation branch and the student network. The stereo bridging progressive distillation loss is defined as follows:
[0087] (14)
[0088] (15)
[0089] (16)
[0090] where the symbols denote, in order: the stereo bridging progressive distillation loss; the weighting coefficient; the sub-pixel convolutional layer; the distance metric function; the high-resolution ground-truth images of the left and right viewpoints, respectively; the reconstruction loss; and the distillation loss. That is, the above-mentioned stereo bridging progressive distillation mechanism consists of formulas (11) to (16).
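A minimal sketch of how such a joint loss might be combined is given below. It assumes that the total loss is the reconstruction loss of the student network plus the weighting coefficient times the distillation loss of the bridging branch, that both terms compare upsampled outputs with the ground truth, and that the distance metric is the mean absolute error. These are assumptions, since the formulas are not reproduced in this text; the coefficient of 0.1 is the value stated for this embodiment, and all function names are illustrative.

```python
def l1_distance(prediction, target):
    """Mean absolute error, used here as an assumed distance metric."""
    return sum(abs(p - t) for p, t in zip(prediction, target)) / len(prediction)


def bridging_distillation_loss(student_out, bridge_out, ground_truth, weight=0.1):
    """Assumed combination: reconstruction loss + weight * distillation loss.
    Each argument maps 'left' / 'right' to a flattened high-resolution image."""
    views = ("left", "right")
    reconstruction = sum(l1_distance(student_out[v], ground_truth[v]) for v in views)
    distillation = sum(l1_distance(bridge_out[v], ground_truth[v]) for v in views)
    return reconstruction + weight * distillation


ground_truth = {"left": [0.0, 1.0], "right": [1.0, 0.0]}
student_out = {"left": [0.0, 1.0], "right": [1.0, 0.0]}   # perfect student
bridge_out = {"left": [1.0, 2.0], "right": [2.0, 1.0]}    # off by 1 everywhere
total = bridging_distillation_loss(student_out, bridge_out, ground_truth)
# total is 0.1 * (1.0 + 1.0) = 0.2
```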
[0091] V. Training the student network under the constraint of the stereo bridging progressive distillation mechanism. The student network trained here is the lightweight stereo image super-resolution reconstruction network obtained from the teacher network through multi-level structural pruning. It comprises a feature mapping layer, a feature extraction module, and a sub-pixel convolutional layer.
[0092] During this training process, the number of channels in the feature extraction unit of the teacher network was set to 84, and the feature extraction module contained 32 layers of feature extraction units. In the stereo bridging progressive distillation mechanism, the weight coefficient was set to 0.1.
[0093] VI. Reconstructing High-Resolution Stereo Images Based on Trained Student Networks
[0094] After training the student network, a low-complexity stereo image super-resolution reconstruction model can be obtained. Inputting the low-resolution stereo image into this model reconstructs a high-resolution stereo image.
[0095] Example 2
[0096] A low-complexity stereo image super-resolution reconstruction apparatus includes a processor and a memory. The memory stores program instructions, and the processor invokes the program instructions stored in the memory to cause the apparatus to execute the following method steps of Example 1:
[0097] A teacher network is constructed by stacking multiple layers of soft-gated feature extraction units;
[0098] Pruning the teacher network to obtain a lightweight student network;
[0099] Constructing a stereo bridging progressive distillation mechanism and using the teacher network to guide the learning of the student network;
[0100] Training the student network under the constraint of the stereo bridging progressive distillation mechanism, and reconstructing high-resolution stereo images based on the trained student network.
[0101] The teacher network is constructed by stacking multiple layers of soft-gated feature extraction units as follows:
[0102] For the nth layer soft-gated feature extraction unit, calculate the first importance of the feature extraction block within that unit; use the first importance as the soft-gated weight, and fuse the input and output features of the feature extraction block within the viewpoint;
[0103] Based on the fused features, the second importance of the mutual attention mechanism within the unit is calculated. The second importance is used as the soft gating weight. The input and output features of the mutual attention mechanism are fused. The left and right viewpoint features output by the last soft gating feature extraction unit are reconstructed into a high-resolution left and right view through the sub-pixel convolutional layer.
[0104] The first importance is:
[0105]
[0106] where the symbols denote, in order: the importance of the feature extraction block in the nth layer; the intra-viewpoint importance estimation; the sigmoid activation function; the averaging operation; the left-viewpoint input feature of the nth-layer soft-gated feature extraction unit; and the right-viewpoint input feature of the nth-layer soft-gated feature extraction unit.
[0107] The second importance is:
[0108]
[0109] where the symbols denote, in order: the importance of the mutual attention mechanism in the nth layer; the inter-viewpoint importance estimation; and the cosine similarity calculation.
[0110] The process of pruning the teacher network to obtain a lightweight student network is as follows:
[0111] The number of channels in each feature extraction block is pruned based on the first importance, and whether the mutual attention mechanism is retained is decided based on the second importance, resulting in a lightweight student network.
[0112] The pruning process is as follows:
[0113]
[0114]
[0115]
[0116] where the symbols denote, in order: the number of channels in the nth-layer feature extraction block of the teacher network; the number of channels in the nth-layer feature extraction block of the student network; multiplication; rounding up; the flag indicating whether the mutual attention mechanism at layer n is retained; the teacher network and the student network, respectively; and the pruning operation. N is the total number of feature extraction blocks (N layers).
[0117] The stereo bridging progressive distillation mechanism is as follows:
[0118] The features of the teacher and student networks are fused using separable convolutions, as follows:
[0119]
[0120]
[0121] where the symbols denote, in order: the left- and right-viewpoint features of the Nth layer of the teacher network; the left- and right-viewpoint features of the Nth layer of the student network; the fused left- and right-viewpoint features; the concatenation operation; and the separable convolution;
[0122] A mutual attention mechanism is then applied so that the fused left- and right-viewpoint features interact, as follows:
[0123]
[0124] where the two symbols denote the interaction features of the left and right viewpoints, respectively.
[0125] The method further includes: constructing a stereo bridging progressive distillation loss that jointly constrains the stereo bridging progressive distillation branch and the student network, wherein the stereo bridging progressive distillation loss is defined as follows:
[0126]
[0127]
[0128]
[0129] where the symbols denote, in order: the stereo bridging progressive distillation loss; the weighting coefficient; the sub-pixel convolutional layer; the distance metric function; the high-resolution ground-truth images of the left and right viewpoints, respectively; the reconstruction loss; and the distillation loss.
[0130] It should be noted that the device descriptions in the above embodiments correspond to the method descriptions in the embodiments, and the embodiments of the present invention will not be repeated here.
[0131] The execution entities of the aforementioned processor and memory can be devices with computing functions such as computers, microcontrollers, and single-chip microcomputers. In specific implementations, the embodiments of the present invention do not limit the execution entities and can select them according to the needs of actual applications.
[0132] Data signals are transmitted between the memory and the processor via a bus, which will not be elaborated upon in this embodiment of the invention.
[0133] Based on the same inventive concept, embodiments of the present invention also provide a computer-readable storage medium, the storage medium including a stored program, which, when the program is running, controls the device where the storage medium is located to execute the method steps in the above embodiments.
[0134] The computer-readable storage medium includes, but is not limited to, flash memory, hard disk, solid-state drive, etc.
[0135] It should be noted that the description of the readable storage medium in the above embodiments corresponds to the description of the method in the embodiments, and the embodiments of the present invention will not be repeated here.
[0136] In the above embodiments, implementation can be achieved, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented in software, it can be implemented, in whole or in part, as a computer program product. A computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the flow or function according to the embodiments of the present invention is generated.
[0137] A computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. Computer instructions can be stored in or transmitted through a computer-readable storage medium. A computer-readable storage medium can be any available medium accessible to a computer or a data storage device such as a server or data center that integrates one or more available media. The available medium can be magnetic or semiconductor, etc.
[0138] Unless otherwise specified, the model numbers of the various devices in this embodiment of the invention are not limited, and any device that can perform the above functions is acceptable.
[0139] Those skilled in the art will understand that the accompanying drawings are merely schematic diagrams of a preferred embodiment, and the sequence numbers of the above embodiments of the present invention are for descriptive purposes only and do not represent the superiority or inferiority of the embodiments.
[0140] The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.
Claims
1. A low-complexity stereo image super-resolution reconstruction method, characterized in that the method includes: constructing a teacher network by stacking multiple layers of soft-gated feature extraction units; pruning the teacher network to obtain a lightweight student network; constructing a stereo bridging progressive distillation mechanism and using the teacher network to guide the learning of the student network; and training the student network under the constraint of the stereo bridging progressive distillation mechanism, and reconstructing high-resolution stereo images based on the trained student network.
2. The low-complexity stereo image super-resolution reconstruction method according to claim 1, characterized in that constructing the teacher network by stacking multiple layers of soft-gated feature extraction units comprises: for the nth-layer soft-gated feature extraction unit, calculating the first importance of the feature extraction block within that unit; using the first importance as the soft-gating weight to fuse the input and output features of the intra-viewpoint feature extraction block; calculating, based on the fused features, the second importance of the mutual attention mechanism within the unit; using the second importance as the soft-gating weight to fuse the input and output features of the mutual attention mechanism; and reconstructing the left- and right-viewpoint features output by the last soft-gated feature extraction unit into high-resolution left and right views through the sub-pixel convolutional layer.
3. The method for low-complexity stereo image super-resolution reconstruction according to claim 2, characterized in that the first importance is: [formula not reproduced]; where the terms denote, in order: the importance of the n-th layer feature extraction block; the intra-view importance estimation; the sigmoid activation function; the averaging operation; the left-view input features of the n-th layer soft-gated feature extraction unit; and the right-view input features of the n-th layer soft-gated feature extraction unit.
4. The method for low-complexity stereo image super-resolution reconstruction according to claim 2, characterized in that the second importance is: [formula not reproduced]; where the terms denote, in order: the importance of the n-th layer mutual attention mechanism; the inter-view importance estimation; and the cosine similarity calculation.
5. The method for low-complexity stereo image super-resolution reconstruction according to claim 1, characterized in that pruning the teacher network to obtain a lightweight student network comprises: pruning the number of channels of each feature extraction block according to the first importance, and determining whether each mutual attention mechanism is retained according to the second importance, thereby obtaining the lightweight student network.
6. The method for low-complexity stereo image super-resolution reconstruction according to claim 5, characterized in that the pruning is: [formula not reproduced]; [formula not reproduced]; [formula not reproduced]; where the terms denote, in order: the number of channels of the n-th layer feature extraction block in the teacher network; the number of channels of the n-th layer feature extraction block in the student network; multiplication; rounding up; the flag indicating whether the n-th layer mutual attention mechanism is retained; the teacher network and the student network, respectively; and the pruning operation; and N denotes the total number of layers of feature extraction blocks.
7. The method for low-complexity stereo image super-resolution reconstruction according to claim 1, characterized in that the stereo bridging progressive distillation mechanism is as follows: the features of the teacher network and the student network are fused using separable convolutions: [formula not reproduced]; [formula not reproduced]; where the terms denote, in order: the left-view and right-view features of the N-th layer of the teacher network; the left-view and right-view features of the N-th layer of the student network; the fused left-view and right-view features, respectively; the concatenation operation; and the separable convolution; the fused left-view and right-view features then interact through the mutual attention mechanism: [formula not reproduced]; where the terms denote the interaction features of the left and right views, respectively.
8. The method for low-complexity stereo image super-resolution reconstruction according to claim 1, characterized in that the method further comprises: constructing a stereo bridging progressive distillation loss that jointly constrains the stereo bridging progressive distillation branch and the student network, the loss being defined as: [formula not reproduced]; [formula not reproduced]; [formula not reproduced]; where the terms denote, in order: the stereo bridging progressive distillation loss; the weighting coefficient; the sub-pixel convolutional layer; the distance metric function; the high-resolution ground truths of the left and right views, respectively; the reconstruction loss; and the distillation loss.
9. A low-complexity stereo image super-resolution reconstruction device, characterized in that the device comprises a processor and a memory, the memory storing program instructions, and the processor invoking the program instructions stored in the memory to cause the device to perform the method according to any one of claims 1-8.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, the computer program comprising program instructions that, when executed by a processor, cause the processor to perform the method according to any one of claims 1-8.
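
For illustration only, the soft gating of claims 2-4 and the pruning rule of claims 5-6 can be sketched in plain Python. The formulas themselves are not reproduced in this text, so the sketch rests on assumptions that are not taken from the claims: the first importance is taken as the sigmoid of a mean estimator score, the soft-gated fusion is taken as a convex blend of a block's input and output, and the mutual attention mechanism is kept when its importance reaches a hypothetical threshold `keep_threshold`. All function names are hypothetical. Only the rounded-up product of importance and teacher channel count follows directly from the term list of claim 6.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gate_from_scores(scores):
    """Assumed form of the first importance: sigmoid of the mean estimator score."""
    return sigmoid(sum(scores) / len(scores))


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def soft_gate(weight, x_in, x_out):
    """Assumed fusion: convex blend of a block's output and its input."""
    return [weight * o + (1.0 - weight) * i for i, o in zip(x_in, x_out)]


def prune_plan(first_imp, second_imp, teacher_channels, keep_threshold=0.5):
    """Per layer: student channels = ceil(importance * teacher channels),
    plus a flag for keeping the mutual attention (threshold is hypothetical)."""
    plan = []
    for alpha, beta, channels in zip(first_imp, second_imp, teacher_channels):
        plan.append((math.ceil(alpha * channels), beta >= keep_threshold))
    return plan


if __name__ == "__main__":
    left, right = [1.0, 2.0, 3.0], [2.0, 4.0, 6.5]
    alpha = gate_from_scores(left + right)   # stand-in for the first importance
    beta = cosine_similarity(left, right)    # stand-in for the second importance
    print(soft_gate(alpha, left, right))
    print(prune_plan([alpha, 0.25], [beta, 0.2], [64, 64]))
```

In this sketch a layer whose first importance is 0.25 keeps 16 of 64 teacher channels, and a layer whose second importance falls below the assumed threshold drops its mutual attention mechanism. An actual implementation would operate on feature tensors and learned estimators rather than on flat lists.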