A neural network-based image stitching method and device
The method first preprocesses the video frames under preset constraints, then uses a trained seam optimization neural network to find the optimal stitching seams, crops and scales the images, and finally stitches them with a trained stitching neural network. This addresses the computational speed and applicability problems of multi-camera image stitching and enables real-time image stitching for high-quality monitoring of wide-angle scenes.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Applications (China)
- Current Assignee / Owner: GUANGZHOU ANYKA MICROELECTRONICS CO LTD
- Filing Date: 2024-12-20
- Publication Date: 2026-06-23
Figure CN122265022A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of image processing technology, and in particular to a neural network-based image stitching method and apparatus.
Background Technology
[0002] Multi-camera systems are primarily used in locations requiring high-quality monitoring of wide-angle scenes. Examples include security in banks, shopping malls, and outdoor settings; monitoring of highways and railways; accident detection and handling; security monitoring in mines, petrochemical plants, and power equipment; and other scenarios requiring real-time monitoring or recording of wide-angle scenes.
[0003] Currently, when multiple cameras are used to expand the field of view, the images are usually aligned by matching feature points (for example, with the SIFT algorithm). However, this approach is computationally slow and difficult to run in real time. Moreover, feature point alignment does not work in every scenario, so it cannot be widely adopted for security monitoring in settings such as mines, petrochemical plants, and power equipment.
Summary of the Invention
[0004] This application provides a neural network-based image stitching method and apparatus that can stitch video image frames in real time and is applicable to a wide range of application scenarios.
[0005] In a first aspect, embodiments of this application provide an image stitching method based on a neural network, including:
[0006] Acquire the first video from the first camera and the second video from the second camera, and process each first image frame in the first video and each second image frame in the second video to satisfy preset constraints.
[0007] Acquire the first image frame and the second image frame captured at the same moment, and input them into the trained stitching optimization neural network to obtain the first optimal stitching seam of the first image frame and the second optimal stitching seam of the second image frame.
[0008] The first image frame is cropped based on the first optimal stitching seam.
[0009] The first target image frame is obtained by scaling each pixel row in the cropped first image frame.
[0010] The second image frame is cropped based on the second optimal stitching seam.
[0011] The pixel rows in the cropped second image frame are stretched to obtain the second target image frame.
[0012] The first target image frame and the second target image frame are input into the trained stitching neural network to obtain the stitched image.
[0013] Furthermore, the preset constraints include: the seam positions of first image frames at adjacent time points differ by no more than a first preset pixel range.
[0014] Furthermore, the preset constraints also include: within each first image frame, the seam positions of adjacent pixel rows differ by no more than a second preset pixel range.
[0015] Furthermore, the preset constraints also include: the maximum disparity in each first image frame is within a third preset pixel range.
[0016] Furthermore, the preset constraints also include: the first optimal seam and the second optimal seam are left-right symmetric.
[0017] Furthermore, the above-mentioned scaling processing of each pixel row in the cropped first image frame to obtain the first target image frame includes:
[0018] Calculate the average length of the pixel rows in the cropped first image frame;
[0019] Stretch each pixel row to that average length.
[0020] Furthermore, the method also includes:
[0021] Before scaling each pixel row, interpolation is performed on a preset number of pixels at the cropping edge.
[0022] Furthermore, the method also includes: performing frequency domain transformation on the first image frame and the second image frame before inputting the first image frame and the second image frame into the stitching optimization neural network;
[0023] After obtaining the stitched image, an inverse frequency domain transform is performed on the stitched image.
[0024] Furthermore, the frequency domain transformation is a one-dimensional frequency domain transformation along the direction of the line connecting the optical centers of the first camera and the second camera.
[0025] Furthermore, the method also includes: normalizing the brightness and color of the first image frame and the second image frame before inputting the first image frame and the second image frame into the stitching optimization neural network;
[0026] After the stitched image is obtained, its brightness and color are restored by reversing the normalization.
[0027] Furthermore, the stitching optimization neural network is a 3- to 5-layer convolutional neural network or a Transformer neural network.
[0028] Furthermore, the stitching neural network includes a first feature extraction chain and a second feature extraction chain of identical structure, a concat function, and an output convolutional layer; the first feature extraction chain extracts the first feature map of the first target image frame;
[0029] The second feature extraction chain extracts the second feature map of the second target image frame;
[0030] The concat function connects the first feature map and the second feature map to obtain a concatenated feature map;
[0031] The output convolutional layer reconstructs the concatenated feature map into a stitched image.
[0032] Furthermore, the first feature extraction chain includes a two-dimensional convolutional layer and a channel attention layer;
[0033] The two-dimensional convolutional layer extracts a shallow feature map of the first target image frame;
[0034] The channel attention layer performs max pooling and average pooling on the shallow feature map to obtain the first feature map.
[0035] In a second aspect, embodiments of this application provide a neural network-based image stitching device, comprising:
[0036] The acquisition module is used to acquire the first video from the first camera and the second video from the second camera, and to process each first image frame in the first video and each second image frame in the second video to make them meet preset constraints.
[0037] The seam optimization module is used to acquire the first image frame and the second image frame captured at the same moment and input them into the trained seam optimization neural network to obtain the first optimal seam of the first image frame and the second optimal seam of the second image frame.
[0038] The first cropping module is used to crop the first image frame based on the first optimal stitching seam.
[0039] The first scaling module is used to scale each pixel row in the cropped first image frame to obtain the first target image frame.
[0040] The second cropping module crops the second image frame based on the second optimal stitching seam.
[0041] The second scaling module is used to scale each pixel row in the cropped second image frame to obtain the second target image frame.
[0042] The stitching module is used to input the first target image frame and the second target image frame into the trained stitching neural network to obtain the stitched image.
[0043] Furthermore, the first scaling module includes:
[0044] The calculation unit is used to calculate the average length of the pixel rows in the cropped first image frame.
[0045] The scaling unit is used to scale each pixel row to the average length.
[0046] Furthermore, the device also includes an interpolation module for interpolating a preset number of pixels at the cropping edge before scaling each pixel row.
[0047] Furthermore, the device also includes:
[0048] The frequency domain conversion module is used to perform frequency domain conversion on the first image frame and the second image frame before inputting them into the stitching optimization neural network.
[0049] The frequency domain inverse transform module is used to perform frequency domain inverse transform on the stitched image after it has been obtained.
[0050] Furthermore, the device also includes:
[0051] The normalization module is used to normalize the brightness and color of the first image frame and the second image frame before inputting them into the stitching optimization neural network.
[0052] The restoration module is used to restore the brightness and color of the stitched image, after it has been obtained, by reversing the normalization.
[0053] In a third aspect, embodiments of this application provide a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it performs the steps of the neural network-based image stitching method described in any of the above embodiments.
[0054] In a fourth aspect, embodiments of this application provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the neural network-based image stitching method described in any of the above embodiments.
[0055] In summary, compared with the prior art, the beneficial effects of the technical solution provided in this application include at least the following:
[0056] This application provides a neural network-based image stitching method. First, preprocessing the frames under preset constraints reduces the computational load of the neural network and thus the processing time. Then, a trained stitching seam optimization neural network obtains the optimal stitching seam between the two image frames. After cropping and scaling, the images are stitched together by a trained stitching neural network. By relying on the processing power of neural networks, the method stitches video image frames in real time. Furthermore, neural network processing places far fewer requirements on the image frames than the SIFT algorithm does, making the method suitable for monitoring in a wide range of scenarios.
Attached Figure Description
[0057] Figure 1 is a flowchart of a neural network-based image stitching method provided by an exemplary embodiment of this application.
[0058] Figure 2 is a schematic diagram of image seam optimization and stretch stitching provided by an exemplary embodiment of this application.
[0059] Figure 3 is a schematic diagram illustrating the preset constraints provided by an exemplary embodiment of this application.
[0060] Figure 4 is a structural diagram of a stitching neural network provided by an exemplary embodiment of this application.
[0061] Figure 5 is a schematic diagram of the camera setup during training provided by an exemplary embodiment of this application.
[0062] Figure 6 is a structural diagram of a neural network-based image stitching device provided by an exemplary embodiment of this application.
Detailed Implementation
[0063] The technical solutions in the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments.
[0064] Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0065] Referring to Figure 1, this application provides a neural network-based image stitching method, comprising:
[0066] Step S1: Obtain the first video from the first camera and the second video from the second camera, and process each first image frame in the first video and each second image frame in the second video to satisfy preset constraints.
[0067] The first and second cameras should be oriented in parallel, the line connecting the centers of the two cameras should be parallel to the long or short side of the captured video image, and the scenes captured by the two cameras should overlap sufficiently.
[0068] Step S2: Obtain the first image frame and the second image frame captured at the same moment, and input them into the trained stitching optimization neural network to obtain the first optimal stitching seam of the first image frame and the second optimal stitching seam of the second image frame.
[0069] Step S3: Crop the first image frame based on the first optimal stitching seam.
[0070] Step S4: Scale each pixel row in the cropped first image frame to obtain the first target image frame.
[0071] Specifically, the average length of the pixel rows in the cropped first image frame is calculated, and each pixel row is stretched to that average length; the second image frame is stretched in the same way.
[0072] This application scales the entire pixel row, rather than only a few pixels at the cropping edge, so the first target image frame shows no obvious distortion at the stitching point and the final stitched image looks more natural.
[0073] Furthermore, before scaling, a preset number of pixels at the cropping edge are interpolated to smooth the transition at the stitching point, eliminate stitching marks, and improve the stitching result.
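The row-wise cropping and scaling of Steps S3 to S6 can be sketched in NumPy as follows: every row is cut at its own seam column, then each row is resampled to the average cropped length with linear interpolation so the result is rectangular again. This is a minimal sketch under assumptions: the function name, the `keep` argument, and the choice of linear interpolation are illustrative, and the separate edge-pixel interpolation described in [0073] is not modeled.

```python
import numpy as np


def crop_and_scale_rows(img, seam, keep="left"):
    """Crop each row at its seam column, then stretch or shrink every row to the
    average cropped length so the output is rectangular (hypothetical helper).

    keep="left" keeps columns [0, seam[r]) (first camera);
    keep="right" keeps columns [seam[r], W) (second camera).
    Assumes every cropped row keeps at least one pixel.
    """
    img = np.asarray(img, dtype=float)
    h, w = img.shape[:2]
    seam = np.asarray(seam, dtype=int)
    lengths = seam if keep == "left" else w - seam
    target = int(round(lengths.mean()))          # average row length after cropping
    out = np.empty((h, target) + img.shape[2:])
    x_new = np.linspace(0.0, 1.0, target)
    for r in range(h):
        row = img[r, :seam[r]] if keep == "left" else img[r, seam[r]:]
        x_old = np.linspace(0.0, 1.0, row.shape[0])
        # Rescale the whole row, not only the pixels next to the cut.
        if row.ndim == 1:
            out[r] = np.interp(x_new, x_old, row)
        else:
            for ch in range(row.shape[1]):
                out[r, :, ch] = np.interp(x_new, x_old, row[:, ch])
    return out


img = np.arange(12, dtype=float).reshape(3, 4)
print(crop_and_scale_rows(img, [2, 3, 4]).shape)  # (3, 3)
```

Because the whole row is resampled, the stretch is spread over every pixel in the row, which matches the reasoning in [0072] about avoiding visible distortion at the stitching point.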
[0074] Step S5: Crop the second image frame based on the second optimal stitching seam.
[0075] Step S6: Scale each pixel row in the cropped second image frame to obtain the second target image frame.
[0076] Referring to Figure 2, the Nth row of the Tth frame is divided into a part from camera 1 and a part from camera 2. The images are scanned within the preset constraints, and the optimal seam satisfying those constraints is selected. Once the optimal seam is found, the two camera images are stretched and resized row by row, and the processed images are stitched together into a stitched image.
[0077] Step S7: Input the first target image frame and the second target image frame into the trained stitching neural network to obtain a stitched image. The stitched image corresponds to the view captured by a wide-angle camera positioned at or behind the midpoint of the line connecting the two cameras.
[0078] The above embodiment provides a neural network-based image stitching method. First, preprocessing the frames under preset constraints reduces the computational load of the neural network and thus the processing time. Then, the optimal stitching seam between the two image frames is obtained through a trained stitching seam optimization neural network. After cropping and scaling, the images are stitched together by a trained stitching neural network. By relying on the processing power of neural networks, the method stitches video image frames in real time. Furthermore, neural network processing places far fewer requirements on the image frames than the SIFT algorithm does, making the method suitable for monitoring in a wide range of scenarios.
[0079] Real-time video stream processing on a chip requires overcoming the chip's bottlenecks in computing power and data transmission, and video processing must also preserve continuity between consecutive frames. Overcoming these bottlenecks requires reducing the computational load and the number of image rows involved in each calculation; preserving continuity requires gradual transitions between frames. In addition, depending on the camera's usage scenario, the range of pixel positions in which the stitching point may lie can be limited.
[0080] Therefore, referring to Figure 3, in some embodiments the preset constraints may include:
[0081] 1. The seam positions of first image frames at adjacent time points differ by no more than a first preset pixel range; likewise, the seam positions of second image frames at adjacent time points differ by no more than the first preset pixel range. The first preset pixel range can be 1-5 pixels.
[0082] Because the images are displayed as video, the seam should change only slightly from frame to frame. Ideally, the seam in each frame should move by no more than 1-5 pixels relative to the previous frame. The exact range can be determined from the actual usage scenario and experience.
[0083] 2. Within each first image frame, the seam positions of adjacent pixel rows differ by no more than a second preset pixel range; likewise, within each second image frame, the seam positions of adjacent pixel rows differ by no more than the second preset pixel range. The second preset pixel range can be 0-1 pixels.
[0084] Specifically, besides the constraint between consecutive video frames, the stitching position within a frame, that is, between vertically adjacent rows, should also be constrained. The stitching positions of consecutive rows usually differ little: on a single continuous object, the stitching positions of adjacent rows can be assumed to differ by 0-1 pixels. Object edges in the image are exempt from this constraint.
[0085] 3. The maximum disparity in each first image frame is within a third preset pixel range; likewise, the maximum disparity in each second image frame is within the third preset pixel range. The third preset pixel range can be 10-100 pixels.
[0086] Cameras typically have minimum and maximum working distances. By the stereo disparity formula d = f·B/Z (focal length f in pixels, baseline B between the optical centers, object distance Z), the distance between the object and the cameras determines the disparity: the minimum monitoring distance sets the maximum disparity, the maximum monitoring distance sets the minimum disparity, and an object at infinity has zero disparity. This application constrains the disparity between the two cameras to within a few dozen pixels.
[0087] 4. The first and second optimal seams are left-right symmetric.
[0088] Because the two cameras are mounted in parallel, symmetrically about the center line, their images share the same perspective relationship, with nearer objects appearing larger and farther objects smaller. An object at the stitching point is at the same position for both cameras, so the disparity it shows is the same on both sides of the center. The two optimal seams should therefore be left-right symmetric.
[0089] The above embodiments use preset constraints, namely inter-frame constraints, intra-frame inter-row constraints, symmetry constraints, and disparity constraints, which greatly reduce the computational load of the neural network and thereby meet the requirements for real-time stitching of camera video data.
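The third constraint can be estimated from the working distances with the standard disparity formula for a parallel camera pair. A minimal sketch follows; the focal length, baseline, and distances are hypothetical values chosen for illustration, not parameters from this application.

```python
def disparity_px(focal_px: float, baseline_m: float, depth_m: float) -> float:
    """Disparity in pixels for a parallel camera pair: d = f * B / Z.

    As depth_m grows toward infinity the disparity tends to zero.
    """
    return focal_px * baseline_m / depth_m


def disparity_range(focal_px: float, baseline_m: float,
                    min_depth_m: float, max_depth_m: float) -> tuple[float, float]:
    """(smallest, largest) disparity for objects between min_depth_m and max_depth_m.

    The nearest allowed object gives the largest disparity, the farthest the smallest.
    """
    return (disparity_px(focal_px, baseline_m, max_depth_m),
            disparity_px(focal_px, baseline_m, min_depth_m))


# Hypothetical rig: 1000 px focal length, 5 cm baseline, monitored range 1 m to 5 m.
print(disparity_range(1000.0, 0.05, 1.0, 5.0))  # (10.0, 50.0)
```

With these assumed values the disparity stays between 10 and 50 pixels, which falls inside the 10-100 pixel example range given for the third preset pixel range.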
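This application leaves seam selection to a trained neural network. To illustrate how the inter-frame and intra-frame constraints shrink the search space, the following minimal sketch swaps in a classical dynamic-programming seam search instead; it is not the patented network. The per-pixel cost matrix (for example, the absolute difference between the two cameras' overlapping strips) and the function name `constrained_seam` are hypothetical, and the disparity and symmetry constraints are not modeled.

```python
import numpy as np


def constrained_seam(cost, prev_seam=None, max_row_step=1, max_frame_step=5):
    """Minimum-cost seam (one column index per row) through an H x W cost map.

    Intra-frame constraint: seam columns of adjacent rows differ by <= max_row_step.
    Inter-frame constraint: each row stays within max_frame_step columns of the
    previous frame's seam, if one is given.
    """
    cost = np.asarray(cost, dtype=float)
    h, w = cost.shape
    cols = np.arange(w)
    if prev_seam is None:
        allowed = np.ones((h, w), dtype=bool)
    else:
        allowed = np.abs(cols[None, :] - np.asarray(prev_seam)[:, None]) <= max_frame_step

    # acc[r, c]: cheapest accumulated cost of a valid seam ending at (r, c).
    acc = np.where(allowed, cost, np.inf)
    back = np.zeros((h, w), dtype=int)
    for r in range(1, h):
        for c in cols[allowed[r]]:
            lo, hi = max(0, c - max_row_step), min(w, c + max_row_step + 1)
            k = lo + int(np.argmin(acc[r - 1, lo:hi]))
            back[r, c] = k
            acc[r, c] += acc[r - 1, k]

    # Backtrack from the cheapest end point in the last row.
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(acc[-1]))
    for r in range(h - 1, 0, -1):
        seam[r - 1] = back[r, seam[r]]
    return seam


rng = np.random.default_rng(0)
cost = rng.random((6, 20))
cost[:, 8] = 0.0                   # a cheap vertical path at column 8
print(constrained_seam(cost))      # [8 8 8 8 8 8]
```

Passing the previous frame's seam as `prev_seam` restricts each row to a narrow band of columns, which is the kind of reduction in search effort that [0089] attributes to the preset constraints.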
[0090] In some embodiments, the method further includes performing frequency domain transformation on the first image frame and the second image frame before inputting them into the stitching optimization neural network. Specifically, frequency domain transformation methods such as Discrete Cosine Transform (DCT) and Fast Fourier Transform (FFT) can be used.
[0091] After obtaining the stitched image, it is necessary to perform an inverse frequency domain transform on the stitched image to obtain the stitched image in the spatial domain.
[0092] The above embodiments perform frequency domain transformation on the image before inputting it into the neural network, which makes it easier for the neural network to understand the frequency domain information of the image, and the resulting stitching effect is better than using spatial domain information as input.
[0093] In some embodiments, the frequency domain transformation can be a one-dimensional transformation along the line connecting the optical centers of the first camera and the second camera. In this application, the parallax between the two cameras, i.e., the pixel offset, exists only along that line, so image stitching and matching need only consider one-dimensional frequency domain information in that direction. This greatly reduces the computational load of the transformation: a single one-dimensional frequency domain transformation is sufficient.
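A minimal sketch of a one-dimensional transform applied only along the baseline, assuming a horizontal baseline so that the transform runs along image rows of a single-channel image. It builds an orthonormal DCT-II as a matrix in NumPy, so the inverse is simply the transpose; the function names are hypothetical, and a production implementation would more likely call an FFT-based DCT routine.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis (n x n); its transpose is the inverse (DCT-III)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m


def dct_rows(img):
    """1-D DCT of every row of a 2-D image, i.e. along a horizontal baseline."""
    c = dct_matrix(img.shape[1])
    return img @ c.T


def idct_rows(coeffs):
    """Inverse of dct_rows."""
    c = dct_matrix(coeffs.shape[1])
    return coeffs @ c


rng = np.random.default_rng(0)
x = rng.random((4, 8))
print(np.allclose(idct_rows(dct_rows(x)), x))  # True
```

Each row is transformed independently, matching the observation that disparity only shifts content along the baseline; for a vertical baseline the same functions would be applied to the transposed image.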
[0094] The principle of using one-dimensional image information also applies when no frequency domain transformation is performed, i.e., when spatial domain information is used directly as input and output data. In practical applications, whether spatial or frequency domain information is used for matching, only a one-dimensional matching relationship needs to be considered, thereby further reducing computational load and increasing stitching processing speed.
[0095] In some embodiments, the method further includes: normalizing the brightness and color of the first image frame and the second image frame before inputting the first image frame and the second image frame into the stitching optimization neural network.
[0096] After the stitched image is obtained, its brightness and color are restored by reversing the normalization.
[0097] Specifically, besides using one-dimensional processing to reduce computation, brightness and color correction can be performed on images acquired by cameras. In actual image acquisition, images from different cameras exhibit variations in brightness and color. Even with the same camera, differences in brightness and color can occur under algorithms such as Automatic Exposure (AE) and Automatic White Balance (AWB). In fine-grained processing like image stitching, these differences in brightness and color directly affect the final result. Therefore, it is necessary to correct the brightness and color of images during preprocessing. Normalization can typically be used to ensure consistent brightness and color for the same object across multiple images. After image stitching, the brightness and color of the stitched image are then restored.
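This application does not specify the normalization formula. A minimal sketch, assuming simple per-channel mean and standard-deviation normalization: each input frame is normalized with its own statistics, and the stitched output would be restored with statistics averaged over the two inputs. The averaging rule and all function names are hypothetical.

```python
import numpy as np


def normalize(img, eps=1e-6):
    """Per-channel zero-mean, unit-variance normalization of an H x W x C image.

    Returns the normalized image and the (mean, std) needed to undo it.
    """
    img = np.asarray(img, dtype=float)
    mean = img.mean(axis=(0, 1), keepdims=True)
    std = img.std(axis=(0, 1), keepdims=True) + eps
    return (img - mean) / std, (mean, std)


def restore(img, stats):
    """Undo normalize() with the given (mean, std)."""
    mean, std = stats
    return img * std + mean


def merged_stats(stats_a, stats_b):
    """Hypothetical rule for the stitched output: average the two inputs' statistics."""
    return ((stats_a[0] + stats_b[0]) / 2, (stats_a[1] + stats_b[1]) / 2)


rng = np.random.default_rng(0)
left = rng.uniform(0, 200, (4, 6, 3))
right = left * 1.2 + 10            # same scene, brighter exposure
n_left, s_left = normalize(left)
n_right, s_right = normalize(right)
print(np.allclose(n_left, n_right, atol=1e-6))  # True: exposure difference removed
```

A linear brightness or gain difference between the cameras disappears after normalization, which is the interference that [0098] aims to remove; non-linear AE/AWB effects would need a richer model.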
[0098] The above embodiments preprocess image brightness and color to reduce the interference that differences in camera exposure and white balance cause in stitching, which improves the stitching result, further relaxes the requirements on the image frames, and thus improves applicability.
[0099] In some embodiments, the stitching optimization neural network is a 3- to 5-layer convolutional neural network or a Transformer neural network.
[0100] Specifically, when selecting a suitable neural network, options include CNNs (convolutional neural networks) and transformers. Among these, CNNs such as ResNet and UNet can be considered. The advantages of CNNs are their full utilization of image domain information, relative maturity, and moderate computational cost. Transformers are superior to CNNs in both information extraction and adaptive capabilities; however, currently, transformers have high computational costs and relatively low processing speed on portable devices.
[0101] Using CNN-type neural networks is a good choice for currently available neural network chips. In the future, when more powerful and versatile chips are available, using image-based transformers will be a better choice.
[0102] Referring to Figure 4, the stitching neural network includes a first feature extraction chain and a second feature extraction chain of identical structure, a concat function, and an output convolutional layer. The first feature extraction chain extracts the first feature map of the first target image frame.
[0103] The second feature extraction chain extracts the second feature map of the second target image frame.
[0104] Specifically, the first feature extraction chain includes a two-dimensional convolutional layer and a channel attention layer. The two-dimensional convolutional layer extracts a shallow feature map of the first target image frame; the channel attention layer performs max pooling and average pooling on the shallow feature map to obtain the first feature map.
[0105] The second feature extraction chain extracts features in the same way as the first feature extraction chain.
[0106] The concat function connects the first feature map and the second feature map to obtain a concatenated feature map.
[0107] The output convolutional layer reconstructs the concatenated feature map into a stitched image.
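A minimal NumPy sketch of the forward pass described above, with random weights standing in for trained ones. Several details are assumptions not given in this application: 3x3 "same" convolutions with ReLU; a CBAM-style channel attention in which the max-pooled and average-pooled channel descriptors pass through a shared two-layer MLP and a sigmoid that reweights the channels; separate (unshared) weights for the two chains; and concatenation of the two feature maps side by side along the width, so the output is as wide as both inputs together. All function names and layer sizes are hypothetical.

```python
import numpy as np


def conv2d(x, w, b):
    """'Same'-padded 2-D convolution (cross-correlation, as in DL frameworks).
    x: (C_in, H, W), w: (C_out, C_in, k, k), b: (C_out,)."""
    c_out, _, k, _ = w.shape
    p = k // 2
    h, wd = x.shape[1:]
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((c_out, h, wd))
    for dy in range(k):
        for dx in range(k):
            out += np.einsum("oi,ihw->ohw", w[:, :, dy, dx], xp[:, dy:dy + h, dx:dx + wd])
    return out + b[:, None, None]


def channel_attention(f, w1, w2):
    """CBAM-style channel attention: max and average pooling, shared MLP, sigmoid gate."""
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)
    gate = 1.0 / (1.0 + np.exp(-(mlp(f.max(axis=(1, 2))) + mlp(f.mean(axis=(1, 2))))))
    return f * gate[:, None, None]


def feature_chain(x, p):
    """One feature extraction chain: conv layer for shallow features, then attention."""
    shallow = np.maximum(conv2d(x, p["w"], p["b"]), 0.0)
    return channel_attention(shallow, p["w1"], p["w2"])


def stitch_net(x1, x2, p1, p2, w_out, b_out):
    """Two chains, concat side by side (assumed width axis), output conv layer."""
    feats = np.concatenate([feature_chain(x1, p1), feature_chain(x2, p2)], axis=2)
    return conv2d(feats, w_out, b_out)


def init_chain(rng, c_in, c_feat, reduction=2):
    """Random placeholder weights for one chain (hypothetical sizes)."""
    return {
        "w": rng.normal(0.0, 0.1, (c_feat, c_in, 3, 3)),
        "b": np.zeros(c_feat),
        "w1": rng.normal(0.0, 0.1, (c_feat // reduction, c_feat)),
        "w2": rng.normal(0.0, 0.1, (c_feat, c_feat // reduction)),
    }


rng = np.random.default_rng(0)
x1, x2 = rng.random((3, 8, 10)), rng.random((3, 8, 12))   # two target frames (C, H, W)
p1, p2 = init_chain(rng, 3, 8), init_chain(rng, 3, 8)
w_out, b_out = rng.normal(0.0, 0.1, (3, 8, 3, 3)), np.zeros(3)
print(stitch_net(x1, x2, p1, p2, w_out, b_out).shape)   # (3, 8, 22)
```

If the concat were instead taken along the channel axis, both target frames would need the same width and the output would keep that width; which variant the application intends is not stated.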
[0108] During neural network training, training input data and target data should be collected simultaneously whenever possible. The input data consists of images captured by the multiple cameras; the network's output has the same format as the target data, namely images captured by a wide-angle lens. To improve the stitching result and stitch frequency domain image information, the training input data undergoes frequency domain transformation preprocessing, and the target data is the frequency domain transform of the images captured by the wide-angle lens. The placement of the multiple cameras and the wide-angle camera is shown in Figure 5.
[0109] Training input data is collected with multiple cameras whose placement should match the relative camera positions in the final product. Typically, the placement must satisfy the following constraints: the scenes captured by the two cameras must overlap sufficiently while still providing a large enough field of view, ideally with 30 to 100 columns of pixels overlapping between the two cameras. To keep the stitched image complete, the two cameras should be oriented in parallel, and the line connecting their centers should be parallel to either the long or short side of the captured image.
[0110] Training input data can be selected directly from real-world scenes and should be as representative as possible, with objects and lighting consistent with the actual scene and a rich variety of content. If input and target data cannot be collected simultaneously, moving people or objects should be avoided in the collected images as far as possible. When capturing training data, every object appearing in the target data should also appear in the input data; the network should not be forced to learn objects that are absent from its input.
[0111] In addition, the input data for training can also be processed according to preset constraints, which can make the training speed of the neural network faster, but the fault tolerance will be reduced. Therefore, the choice can be made according to the actual needs in specific applications.
[0112] The target data for the training process is collected using a wide-angle camera. The wide-angle camera should be able to clearly capture the details at the line connecting the two cameras, so that the algorithm can learn how to generate images well.
[0113] The wide-angle camera should be positioned at the midpoint of the line connecting the two cameras or slightly behind it, so that it captures the seam between the two cameras' views as clearly as possible. The seam area, or core area, is the region captured by both cameras, specifically the right part of the left camera's view or the left part of the right camera's view. This area is limited not only by the field of view but also by the actual shooting distance, i.e., it is the overlapping area within the actual shooting distance of the scene. Ideally, the wide-angle camera acquiring target data would cover the entire field of view of all the cameras. In real products, however, it is difficult for a single camera to cover the fields of view of multiple cameras while keeping the image sharp, because a large field of view usually sacrifices edge sharpness. For example, a fisheye lens with a nearly 180-degree field of view can cover both cameras' fields of view, but its edge sharpness is poor. Therefore, this algorithm focuses on learning the core area and image generation.
[0114] When designing the neural network, spatial domain information, i.e., the original image, can be used directly as the network's input and output. Alternatively, because frequency domain information can be compared during actual stitching, frequency domain information can be used both as the network input and when comparing the output with the target data: the input image is transformed to the frequency domain before entering the network, and the network output is compared with the frequency domain representation of the target image to compute the training loss. Here too, the frequency domain transformation can be one-dimensional along the line connecting the optical centers, and the brightness and color of the input images can be normalized. The training process and the deployed application process correspond to each other.
[0115] The neural network can be trained and tested entirely on a computer, a server, or even in the cloud. During training and testing, the computational and storage precision, data compression, storage, and transmission methods of the final deployed chip should be replicated wherever possible, so that the network on the server side matches the network actually running on the chip as closely as possible and accuracy loss is minimized.
[0116] During neural network training, the output data format is exactly the same as the target data format, meaning the resolution is identical to the target image. The quality of the output image is directly compared to a wide-angle photograph. The comparison method can employ common techniques for image reconstruction neural networks: pixel-by-pixel comparison, using PSNR, SSIM, or similar evaluation methods to calculate the output's quality and determine if training is complete. If frequency domain transformation preprocessing was used, the output can be inversely transformed and then compared to the target image.
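A minimal sketch of the PSNR check mentioned above, assuming 8-bit images (peak value 255) by default; SSIM is omitted. As described in [0116], if the network works in the frequency domain, its output would be inverse-transformed before this comparison.

```python
import numpy as np


def psnr(output, target, max_val=255.0):
    """Peak signal-to-noise ratio in dB between an output image and its target."""
    output = np.asarray(output, dtype=float)
    target = np.asarray(target, dtype=float)
    mse = np.mean((output - target) ** 2)
    if mse == 0.0:
        return float("inf")          # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


target = np.full((4, 4), 100.0)
print(psnr(target + 5.0, target))    # MSE = 25, about 34.15 dB
```

A training run would track this value on held-out wide-angle frames and stop once it no longer improves; the stopping threshold is a project-specific choice.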
[0117] When adjusting network structures and deploying neural networks on a processing chip, the structure of the neural network can be fine-tuned on the chip. For example, a single operator on a server can be split into multiple operators or multiple operators can be merged into a single operator. These adjustments usually do not have a significant impact on the calculation results, but they can slightly change the accuracy of the calculation results.
[0118] Referring to Figure 6, another embodiment of this application provides a neural network-based image stitching device, comprising:
[0119] The acquisition module 101 is used to acquire the first video from the first camera and the second video from the second camera, and to process each first image frame in the first video and each second image frame in the second video to make them meet preset constraints.
[0120] The stitching optimization module 102 is used to acquire the first image frame and the second image frame captured at the same moment, and input them into the trained stitching optimization neural network to obtain the first optimal stitching seam of the first image frame and the second optimal stitching seam of the second image frame.
[0121] The first cropping module 103 is used to crop the first image frame based on the first optimal stitching seam.
[0122] The first scaling module 104 is used to scale each pixel row in the cropped first image frame to obtain the first target image frame.
[0123] The second cropping module 105 crops the second image frame based on the second optimal stitching seam.
[0124] The second scaling module 106 is used to scale each pixel row in the cropped second image frame to obtain the second target image frame.
[0125] The stitching module 107 is used to input the first target image frame and the second target image frame into the trained stitching neural network to obtain the stitched image.
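Claims 2 and 3 below define two of the preset constraints in terms of seam positions. The sketch below is a hypothetical check for those two constraints. It assumes a seam is represented as one cut column index per pixel row, a representation this application does not specify. The parameters `temporal_range` and `row_range` stand in for the first and second preset pixel ranges.

```python
from typing import Optional, Sequence

def seam_satisfies_constraints(
    seam: Sequence[int],
    prev_seam: Optional[Sequence[int]],
    temporal_range: int,
    row_range: int,
) -> bool:
    """Check a seam (one cut column per pixel row) against two preset constraints.

    temporal_range: max allowed shift of the seam in any row between adjacent frames
                    (stand-in for the first preset pixel range).
    row_range:      max allowed shift of the seam between adjacent pixel rows of one
                    frame (stand-in for the second preset pixel range).
    """
    if prev_seam is not None:
        if len(prev_seam) != len(seam):
            raise ValueError("seams must cover the same number of rows")
        if any(abs(cur - prev) > temporal_range for cur, prev in zip(seam, prev_seam)):
            return False
    return all(abs(seam[r + 1] - seam[r]) <= row_range for r in range(len(seam) - 1))

print(seam_satisfies_constraints([10, 11, 12], [10, 10, 10], temporal_range=2, row_range=1))  # True
print(seam_satisfies_constraints([10, 13, 12], None, temporal_range=2, row_range=2))  # False
```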
[0126] Furthermore, the first scaling module includes:
[0127] The calculation unit is used to calculate the average length of the pixel rows in the cropped first image frame.
[0128] The scaling unit is used to scale each pixel row to the average length.
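The NumPy sketch below illustrates the cropping and average-length scaling steps. It makes two assumptions. First, the first camera is on the left, so each row keeps the pixels to the left of its seam column. Second, rows are resampled by linear interpolation, since this application does not name an interpolation method. The function names are hypothetical, and every cropped row is assumed to keep at least one pixel.

```python
import numpy as np

def crop_left_of_seam(frame: np.ndarray, seam) -> list:
    """Keep pixels [0, seam[r]) of each row r; the cropped rows have different lengths."""
    if len(seam) != frame.shape[0]:
        raise ValueError("need one seam column per pixel row")
    return [frame[r, : seam[r]] for r in range(frame.shape[0])]

def resample_row(row: np.ndarray, new_len: int) -> np.ndarray:
    """Linearly resample a row (shape L or L x C, L >= 1) to new_len pixels, keeping both end pixels."""
    old_len = row.shape[0]
    src = np.linspace(0.0, old_len - 1, new_len)  # sample positions along the old row
    idx = np.arange(old_len)
    if row.ndim == 1:
        return np.interp(src, idx, row)
    return np.stack([np.interp(src, idx, row[:, c]) for c in range(row.shape[1])], axis=1)

def scale_rows_to_average(rows: list) -> np.ndarray:
    """Stretch or shrink every row to the average row length, giving a rectangular frame."""
    avg_len = int(round(float(np.mean([r.shape[0] for r in rows]))))
    return np.stack([resample_row(r, avg_len) for r in rows])

frame = np.arange(24, dtype=float).reshape(4, 6)
target = scale_rows_to_average(crop_left_of_seam(frame, [3, 4, 5, 4]))
print(target.shape)  # (4, 4)
```

For the second frame, the mirrored variant would keep `frame[r, seam[r]:]` in each row before applying the same scaling.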
[0129] Furthermore, the device also includes an interpolation module for interpolating a preset number of pixels at the cropping edge before scaling each pixel row.
[0130] Furthermore, the device also includes:
[0131] The frequency-domain transform module is used to apply a frequency-domain transform to the first image frame and the second image frame before they are input into the stitching optimization neural network.
[0132] The inverse frequency-domain transform module is used to apply an inverse frequency-domain transform to the stitched image once it has been obtained.
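Claim 9 below specifies a one-dimensional transform along the line connecting the optical centers of the two cameras. The sketch below assumes horizontally displaced cameras, so that direction is the image width. It also assumes a real FFT as the transform; the application does not name one, and a DCT would be an equally plausible choice. Splitting the complex spectrum into real and imaginary channels is likewise an assumption about how a network would consume it.

```python
import numpy as np

def to_frequency_domain(frame: np.ndarray) -> np.ndarray:
    """1-D real FFT of every pixel row (axis 1 = image width); other axes are untouched."""
    return np.fft.rfft(frame, axis=1)

def spectrum_to_channels(spectrum: np.ndarray) -> np.ndarray:
    """Real-valued network input: stack real and imaginary parts on a new last axis."""
    return np.stack([spectrum.real, spectrum.imag], axis=-1)

def channels_to_spectrum(channels: np.ndarray) -> np.ndarray:
    """Recombine the two real channels into a complex spectrum."""
    return channels[..., 0] + 1j * channels[..., 1]

def from_frequency_domain(spectrum: np.ndarray, width: int) -> np.ndarray:
    """Inverse 1-D real FFT along the width; passing `width` restores odd widths exactly."""
    return np.fft.irfft(spectrum, n=width, axis=1)

frame = np.random.default_rng(0).uniform(0.0, 1.0, (4, 7))
channels = spectrum_to_channels(to_frequency_domain(frame))
print(channels.shape)  # (4, 4, 2)
restored = from_frequency_domain(channels_to_spectrum(channels), frame.shape[1])
```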
[0133] Furthermore, the device also includes:
[0134] The normalization module is used to normalize the brightness and color of the first and second image frames before inputting them into the stitching optimization neural network.
[0135] The restoration module is used to reverse the normalization once the stitched image has been obtained, restoring its brightness and color.
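The application does not say which statistics the normalization uses. As one hypothetical choice, the sketch below computes per-channel mean and standard deviation jointly over both input frames, normalizes both frames with these statistics, and applies the inverse mapping to the stitched image. The function names are illustrative only.

```python
import numpy as np

def normalize_pair(first: np.ndarray, second: np.ndarray, eps: float = 1e-6):
    """Normalize two H x W x C frames with per-channel statistics shared by both frames."""
    c = first.shape[-1]
    pixels = np.concatenate([first.reshape(-1, c), second.reshape(-1, c)]).astype(np.float64)
    mean, std = pixels.mean(axis=0), pixels.std(axis=0)

    def norm(frame: np.ndarray) -> np.ndarray:
        return (frame.astype(np.float64) - mean) / (std + eps)

    return norm(first), norm(second), (mean, std)

def restore(image: np.ndarray, stats, eps: float = 1e-6) -> np.ndarray:
    """Undo normalize_pair on an H x W x C image, e.g. the stitched output."""
    mean, std = stats
    return image * (std + eps) + mean
```

Because both frames share one set of statistics, a single inverse mapping applies to the whole stitched image.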
[0136] For the specific limitations of the neural network-based image stitching device provided in this embodiment, refer to the embodiment of the neural network-based image stitching method described above; they are not repeated here. Each module of the device can be implemented wholly or partly in software, hardware, or a combination of both. The modules can be embedded in, or independent of, the processor of a computer device in hardware form, or stored in the memory of a computer device in software form, so that the processor can invoke them and execute the corresponding operations.
[0137] This application provides a computer device that may include a processor, memory, network interface, and database connected via a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium. The network interface communicates with external terminals via a network connection. When the computer program is executed by the processor, it causes the processor to perform the steps of a neural network-based image stitching method as described in any of the above embodiments.
[0138] The working process, working details, and technical effects of the computer device provided in this embodiment can be found in the embodiment of an image stitching method based on a neural network described above, and will not be repeated here.
[0139] This application provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the steps of the neural network-based image stitching method described in any of the above embodiments. The computer-readable storage medium is a data storage carrier, which may include, but is not limited to, floppy disks, optical disks, hard disks, flash memory, USB flash drives, and/or Memory Sticks. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. For the working process, details, and technical effects of the computer-readable storage medium provided in this embodiment, refer to the embodiments of the neural network-based image stitching method described above; they are not repeated here.
[0140] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. Any reference to memory, storage, databases, or other media in the embodiments provided in this application can include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
[0141] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.
[0142] The embodiments described above merely illustrate several implementations of this application. Although they are described in relatively specific detail, they should not be construed as limiting the scope of the patent. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and all of these fall within the protection scope of this application. Therefore, the protection scope of this patent application shall be determined by the appended claims.
Claims
1. An image stitching method based on neural networks, characterized in that the method comprises: acquiring a first video from a first camera and a second video from a second camera, and processing each first image frame in the first video and each second image frame in the second video so that they satisfy preset constraints; acquiring the first image frame and the second image frame captured at the same moment and inputting them into a trained stitching optimization neural network to obtain a first optimal stitching seam of the first image frame and a second optimal stitching seam of the second image frame; cropping the first image frame based on the first optimal stitching seam; scaling each pixel row in the cropped first image frame to obtain a first target image frame; cropping the second image frame based on the second optimal stitching seam; scaling each pixel row in the cropped second image frame to obtain a second target image frame; and inputting the first target image frame and the second target image frame into a trained stitching neural network to obtain a stitched image.
2. The image stitching method based on neural networks according to claim 1, characterized in that the preset constraints include: the seam between first image frames at adjacent moments lies within a first preset pixel range.
3. The image stitching method based on neural networks according to claim 2, characterized in that the preset constraints further include: within each first image frame, the seam between adjacent pixel rows lies within a second preset pixel range.
4. The image stitching method based on neural networks according to claim 3, characterized in that the preset constraints further include: the maximum aberration between the first image frames lies within a third preset pixel range.
5. The image stitching method based on neural networks according to claim 4, characterized in that the preset constraints further include: the first optimal seam and the second optimal seam are left-right symmetric.
6. The image stitching method based on neural networks according to claim 1, characterized in that scaling each pixel row in the cropped first image frame to obtain the first target image frame comprises: calculating the average length of the pixel rows in the cropped first image frame; and stretching each pixel row to the average length.
7. The image stitching method based on neural networks according to claim 1, characterized in that the method further comprises: before scaling each pixel row, interpolating a preset number of pixels at the cropping edge.
8. The image stitching method based on neural networks according to claim 1, characterized in that the method further comprises: before inputting the first image frame and the second image frame into the stitching optimization neural network, applying a frequency-domain transform to the first image frame and the second image frame; and after obtaining the stitched image, applying an inverse frequency-domain transform to the stitched image.
9. The image stitching method based on neural networks according to claim 8, characterized in that the frequency-domain transform is a one-dimensional transform along the direction of the line connecting the optical centers of the first camera and the second camera.
10. The image stitching method based on neural networks according to claim 1, characterized in that the method further comprises: before inputting the first image frame and the second image frame into the stitching optimization neural network, normalizing the brightness and color of the first image frame and the second image frame; and after obtaining the stitched image, reversing the normalization to restore the brightness and color of the stitched image.
11. The image stitching method based on neural networks according to claim 1, characterized in that the stitching optimization neural network is a convolutional neural network with 3 to 5 layers, or a Transformer neural network.
12. The image stitching method based on neural networks according to claim 1, characterized in that the stitching neural network comprises a first feature extraction chain and a second feature extraction chain of identical structure, a concat function, and an output convolutional layer; the first feature extraction chain extracts a first feature map from the first target image frame; the second feature extraction chain extracts a second feature map from the second target image frame; the concat function concatenates the first feature map and the second feature map to obtain a concatenated feature map; and the output convolutional layer reconstructs the concatenated feature map into the stitched image.
13. The image stitching method based on neural networks according to claim 12, characterized in that the first feature extraction chain comprises a two-dimensional convolutional layer and a channel attention layer; the two-dimensional convolutional layer extracts shallow features of the first target image frame; and the channel attention layer applies max pooling and average pooling to the shallow features to obtain the first feature map.
14. An image stitching device based on a neural network, characterized in that the device comprises: an acquisition module, configured to acquire a first video from a first camera and a second video from a second camera, and to process each first image frame in the first video and each second image frame in the second video so that they satisfy preset constraints; a stitching optimization module, configured to acquire the first image frame and the second image frame captured at the same moment and to input them into a trained stitching optimization neural network to obtain a first optimal stitching seam of the first image frame and a second optimal stitching seam of the second image frame; a first cropping module, configured to crop the first image frame based on the first optimal stitching seam; a first scaling module, configured to scale each pixel row in the cropped first image frame to obtain a first target image frame; a second cropping module, configured to crop the second image frame based on the second optimal stitching seam; a second scaling module, configured to scale each pixel row in the cropped second image frame to obtain a second target image frame; and a stitching module, configured to input the first target image frame and the second target image frame into a trained stitching neural network to obtain a stitched image.
15. The image stitching device based on a neural network according to claim 14, characterized in that the first scaling module comprises: a calculation unit, configured to calculate the average length of the pixel rows in the cropped first image frame; and a stretching unit, configured to stretch each pixel row to the average length.
16. The image stitching device based on a neural network according to claim 14, characterized in that the device further comprises an interpolation module, configured to interpolate a preset number of pixels at the cropping edge before each pixel row is scaled.
17. The image stitching device based on a neural network according to claim 14, characterized in that the device further comprises: a frequency-domain transform module, configured to apply a frequency-domain transform to the first image frame and the second image frame before they are input into the stitching optimization neural network; and an inverse frequency-domain transform module, configured to apply an inverse frequency-domain transform to the stitched image after the stitched image is obtained.
18. The image stitching device based on a neural network according to claim 14, characterized in that the device further comprises: a normalization module, configured to normalize the brightness and color of the first image frame and the second image frame before they are input into the stitching optimization neural network; and a restoration module, configured to reverse the normalization after the stitched image is obtained, restoring the brightness and color of the stitched image.
19. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the steps of the neural network-based image stitching method according to any one of claims 1 to 13.
20. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps of the neural network-based image stitching method according to any one of claims 1 to 13.
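Claims 12 and 13 outline the structure of the stitching neural network. The NumPy forward pass below is a hypothetical sketch of that structure with random, untrained weights. The following are assumptions not stated in the application:

- The two feature extraction chains share weights.
- The channel attention follows the CBAM pattern: a shared two-layer MLP is applied to the max-pooled and average-pooled channel descriptors, the two results are summed, and a sigmoid is applied.
- The concat function joins the feature maps along the image width, so the output is as wide as both target frames together. Concatenation along channels is the other common reading.

```python
import numpy as np

def conv2d_same(x: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """x: C_in x H x W, w: C_out x C_in x k x k (odd k), b: C_out. Stride 1, zero padding."""
    c_out, _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            patch = xp[:, i:i + h, j:j + wd]  # shifted view, C_in x H x W
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return out + b[:, None, None]

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(f: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Reweight the channels of f (C x H x W) using max- and average-pooled descriptors."""
    def mlp(v: np.ndarray) -> np.ndarray:
        return w2 @ np.maximum(w1 @ v, 0.0)  # shared two-layer MLP with ReLU

    weights = sigmoid(mlp(f.mean(axis=(1, 2))) + mlp(f.max(axis=(1, 2))))
    return f * weights[:, None, None]

def feature_chain(x: np.ndarray, p: dict) -> np.ndarray:
    shallow = conv2d_same(x, p["conv_w"], p["conv_b"])  # 2-D convolutional layer
    return channel_attention(shallow, p["att_w1"], p["att_w2"])

def stitch(first: np.ndarray, second: np.ndarray, params: dict) -> np.ndarray:
    f1 = feature_chain(first, params["chain"])
    f2 = feature_chain(second, params["chain"])
    joined = np.concatenate([f1, f2], axis=2)  # assumed: concat along the width
    return conv2d_same(joined, params["out_w"], params["out_b"])  # output convolutional layer

def random_params(c_in=3, c_feat=8, reduction=2, k=3, seed=0) -> dict:
    rng = np.random.default_rng(seed)
    return {
        "chain": {
            "conv_w": rng.normal(0.0, 0.1, (c_feat, c_in, k, k)),
            "conv_b": np.zeros(c_feat),
            "att_w1": rng.normal(0.0, 0.1, (c_feat // reduction, c_feat)),
            "att_w2": rng.normal(0.0, 0.1, (c_feat, c_feat // reduction)),
        },
        "out_w": rng.normal(0.0, 0.1, (c_in, c_feat, k, k)),
        "out_b": np.zeros(c_in),
    }

out = stitch(np.ones((3, 6, 5)), np.ones((3, 6, 4)), random_params())
print(out.shape)  # (3, 6, 9)
```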