Multi-view three-dimensional image reconstruction method, device, equipment, medium and program product

By separating and filtering the target and background regions of multi-view images, feature points are extracted for point cloud reconstruction, solving the problem of limited model quality in existing technologies and achieving high-quality, low-cost 3D model generation.

CN116523947B · Active · Publication Date: 2026-06-12 · INDUSTRIAL AND COMMERCIAL BANK OF CHINA


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: INDUSTRIAL AND COMMERCIAL BANK OF CHINA
Filing Date: 2023-03-13
Publication Date: 2026-06-12

Application Information

Patent Timeline
13 Mar 2023: Application
12 Jun 2026: Publication (CN116523947B)

IPC: G06T7/194; G06T7/136; G06T17/20

CPC: G06T7/194; G06T7/136; G06T17/20; G06T2207/10028; G06T2207/20021

Application Domain: Image enhancement; Image analysis


AI Technical Summary

Technical Problem

In existing multi-view 3D reconstruction technologies, the integrity and accuracy of the reconstructed 3D point cloud model are limited by the quality of multi-view image data. Furthermore, images taken in ordinary environments have complex textures, which affects the model quality and reduces its applicability and practicality.

Method used

By separating the target area and background area of multi-view images, feature points are extracted, and feature points are matched to reconstruct sparse and dense point clouds. A dual filtering logic is then used to remove the influence of the background area, ultimately generating a high-quality 3D model.

Benefits of technology

It enables the generation of high-quality 3D models in ordinary environments, reduces usage costs, decreases computational overhead, and eliminates visual noise points to ensure model clarity.

✦ Generated by Eureka AI based on patent content.


Abstract

The disclosure provides a multi-view three-dimensional image reconstruction method, which can be applied to the field of artificial intelligence technology. The multi-view three-dimensional image reconstruction method comprises: separating a target region and a background region in a plurality of multi-view pictures collected; preprocessing the target region in the separated multi-view picture to obtain feature points; matching the feature points of the plurality of multi-view pictures to obtain matched feature points; performing sparse point cloud reconstruction and dense point cloud reconstruction based on the matched feature points to obtain a dense point cloud; filtering the dense point cloud according to a first preset filtering logic to obtain a first dense point cloud; filtering the first dense point cloud according to a second preset filtering logic to obtain a second dense point cloud; and generating a three-dimensional model based on the second dense point cloud. The disclosure also provides a multi-view three-dimensional image reconstruction device, equipment, storage medium and program product.


Description

Technical Field

[0001] This disclosure relates to the field of artificial intelligence, and specifically to a method, apparatus, device, medium, and program product for multi-view 3D image reconstruction.

Background Technology

[0002] With the continuous development of computer theory, technology, and applications, the computational power of computers for image processing has been greatly enhanced. This forms the foundation for a more comprehensive understanding of image content. Currently, 3D reconstruction has a solid theoretical basis, and reconstructing point cloud models from multi-view images has become one of the mainstream algorithms for 3D reconstruction.

[0003] Despite significant advancements in 3D reconstruction technology, the following challenges remain regardless of the method employed: the integrity and accuracy of the reconstructed 3D point cloud model are limited by the quality of the multi-view image data. Currently, commonly used multi-view datasets are mostly open-source image sets from the internet. These datasets require acquisition under specific environmental conditions, making reconstruction using this method costly and limiting its widespread application. Furthermore, multi-view images captured in ordinary environments often exhibit complex textures, affecting the quality of the reconstructed model and significantly reducing the applicability and practicality of multi-view 3D reconstruction algorithms.

Summary of the Invention

[0004] In view of the above problems, this disclosure provides a method, apparatus, device, medium and program product for multi-view three-dimensional image reconstruction to improve the quality of the reconstructed model.

[0005] According to a first aspect of this disclosure, a multi-view 3D image reconstruction method is provided, comprising: separating a target region and a background region in multiple acquired multi-view images; preprocessing the target region in the separated multi-view images to obtain feature points; matching the feature points of the multiple multi-view images to obtain matched feature points; performing sparse point cloud reconstruction and dense point cloud reconstruction based on the matched feature points to obtain a dense point cloud; filtering the dense point cloud according to a first preset filtering logic to obtain a first dense point cloud, wherein the first preset filtering logic is a filtering logic with visual constraints; filtering the first dense point cloud according to a second preset filtering logic to obtain a second dense point cloud, wherein the second preset filtering logic is a weakly regularized filtering logic based on removing the influence of the background region; and generating a 3D model based on the second dense point cloud.

[0006] According to embodiments of this disclosure, separating the target region and background region in multiple acquired multi-view images includes: extracting a target mask from the multi-view images based on a preset neural network model; and segmenting the target region and background region of the multi-view images based on the target mask.

[0007] According to an embodiment of this disclosure, the step of segmenting the target region and background region of the multi-view image based on the target mask includes: performing a closing operation logic on the target mask to obtain a final mask, wherein the closing operation is a dilation followed by an erosion operation; selecting the final mask region as the target region; and selecting the area outside the final mask region as the background region.
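The closing operation described above (dilation followed by erosion) can be sketched as follows. This is a minimal illustration, not the patented implementation: the square structuring element, the `radius` parameter, and the zero padding at the image border are all assumptions, and the mask is taken to be a list of rows of 0/1 values.

```python
def _window(mask, r, c, radius):
    """Values inside the square window centred on (r, c), clipped to the mask."""
    h, w = len(mask), len(mask[0])
    return [mask[rr][cc]
            for rr in range(max(0, r - radius), min(h, r + radius + 1))
            for cc in range(max(0, c - radius), min(w, c + radius + 1))]


def dilate(mask, radius=1):
    """A pixel becomes 1 if any pixel in its window is 1."""
    return [[int(any(_window(mask, r, c, radius))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def erode(mask, radius=1):
    """A pixel stays 1 only if every pixel in its window is 1."""
    return [[int(all(_window(mask, r, c, radius))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def close_mask(mask, radius=1):
    """Closing = dilation followed by erosion.

    The mask is zero-padded by `radius` first (an assumption of this sketch)
    so that objects touching the border are handled consistently.
    """
    h, w = len(mask), len(mask[0])
    pad_row = [0] * (w + 2 * radius)
    padded = ([pad_row[:] for _ in range(radius)]
              + [[0] * radius + list(row) + [0] * radius for row in mask]
              + [pad_row[:] for _ in range(radius)])
    closed = erode(dilate(padded, radius), radius)
    # Crop the padding away again.
    return [row[radius:radius + w] for row in closed[radius:radius + h]]


# A ring-shaped mask with a one-pixel hole in the middle.
ring = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
final_mask = close_mask(ring)
```

Closing fills small holes inside the target mask without growing its outer boundary, which is why it suits cleaning up a segmentation mask before the target and background regions are selected.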

[0008] According to an embodiment of this disclosure, after separating the target region and the background region in the multiple multi-view images, the method further includes: clustering the pixels in the target region based on RGB values to obtain multiple color clusters; and selecting the RGB values of outlier clusters among the multiple color clusters as the RGB values of the background region.
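One possible reading of this step is sketched below. The source does not name a clustering algorithm or say how the outlier cluster is identified, so the plain k-means loop, the farthest-point initialisation, and the rule "the outlier cluster is the smallest one" are all assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two RGB triples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_rgb(pixels, k, iters=10):
    """Plain k-means on RGB triples with deterministic farthest-point seeding."""
    centroids = [pixels[0]]
    while len(centroids) < k:
        centroids.append(
            max(pixels, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in pixels:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Each centroid becomes the mean colour of its cluster.
        centroids = [
            tuple(sum(ch) / len(cl) for ch in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters


def background_rgb(pixels, k=2):
    """Return the mean colour of the outlier cluster.

    Assumption: the outlier cluster is the non-empty cluster with the
    fewest pixels. The source does not define the criterion.
    """
    centroids, clusters = kmeans_rgb(pixels, k)
    idx = min((i for i in range(k) if clusters[i]), key=lambda i: len(clusters[i]))
    return centroids[idx]


# Mostly red target pixels with a few green leftovers from the background.
target_pixels = [(200, 30, 30)] * 9 + [(10, 240, 10)] * 3
bg_colour = background_rgb(target_pixels)
```

The returned colour can then serve as the reference background RGB value when background points are removed from the dense point cloud later in the method.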

[0009] According to an embodiment of this disclosure, before separating the target area and background area in the multiple multi-view images, the method further includes: determining whether the acquired multi-view images are of the same size; and if the multi-view images are of different sizes, cropping the multi-view images to a uniform size.
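A minimal sketch of the size check and cropping is shown below. The source does not say where the crop window is placed or which size is chosen, so cropping around the image centre to the smallest height and width in the set is an assumption. Images are represented as lists of pixel rows.

```python
def crop_to_uniform(images):
    """Return the images unchanged if they share one size, else centre-crop them."""
    sizes = {(len(img), len(img[0])) for img in images}
    if len(sizes) == 1:
        return images  # already uniform, nothing to do
    target_h = min(h for h, _ in sizes)
    target_w = min(w for _, w in sizes)
    cropped = []
    for img in images:
        top = (len(img) - target_h) // 2
        left = (len(img[0]) - target_w) // 2
        cropped.append([row[left:left + target_w]
                        for row in img[top:top + target_h]])
    return cropped


big = [[r * 10 + c for c in range(6)] for r in range(4)]   # 4 x 6 image
small = [[0] * 4 for _ in range(2)]                        # 2 x 4 image
uniform = crop_to_uniform([big, small])
```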

[0010] According to an embodiment of this disclosure, the preprocessing of the target region in the segmented multi-view image to obtain feature points includes: calculating the values of pixel points within a preset pixel region based on the difference of Gaussians to obtain multiple pixel point values; and selecting the maximum and minimum value points among the multiple pixel point values as the feature points.
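The difference-of-Gaussians step can be illustrated with the single-scale sketch below. It is a simplification made for illustration: the blur widths (`sigma` and `ratio`), the 3 x 3 comparison neighbourhood, and the replicated border are assumptions, and a full scale-space search would also compare each pixel against adjacent scales.

```python
import math


def gaussian_kernel(sigma):
    """Normalised 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(img, sigma):
    """Separable Gaussian blur with edge pixels replicated at the border."""
    k = gaussian_kernel(sigma)
    rad = len(k) // 2
    h, w = len(img), len(img[0])

    def clamp(v, n):
        return min(max(v, 0), n - 1)

    rows = [[sum(k[j + rad] * img[r][clamp(c + j, w)] for j in range(-rad, rad + 1))
             for c in range(w)] for r in range(h)]
    return [[sum(k[i + rad] * rows[clamp(r + i, h)][c] for i in range(-rad, rad + 1))
             for c in range(w)] for r in range(h)]


def dog_keypoints(img, sigma=1.0, ratio=1.6):
    """Return (row, col) positions that are strict local extrema of the DoG image."""
    narrow, wide = blur(img, sigma), blur(img, sigma * ratio)
    h, w = len(img), len(img[0])
    dog = [[wide[r][c] - narrow[r][c] for c in range(w)] for r in range(h)]
    keypoints = []
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            neigh = [dog[r + dr][c + dc]
                     for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                     if (dr, dc) != (0, 0)]
            # Keep the pixel if it is the maximum or the minimum of its region.
            if dog[r][c] > max(neigh) or dog[r][c] < min(neigh):
                keypoints.append((r, c))
    return keypoints


# A single bright dot on a dark background.
image = [[0.0] * 7 for _ in range(7)]
image[3][3] = 1.0
points = dog_keypoints(image)
```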

[0011] According to an embodiment of this disclosure, the step of matching feature points of the multiple multi-view images to obtain matching feature points includes: calculating the gradient and direction in the neighborhood of the feature point; calculating the distance between feature points in different multi-view images based on the gradient and the direction; and determining the feature point as the matching feature point if the distance between the feature points is less than a preset distance threshold.
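The matching rule can be sketched as below, assuming each feature point is summarised by a gradient-orientation histogram of its neighbourhood and that the distance between feature points is the Euclidean distance between those histograms. The descriptor layout (`bins`), the nearest-neighbour search, and the name `max_dist` for the preset distance threshold are illustrative assumptions, not details given in the source.

```python
import math


def orientation_histogram(patch, bins=8):
    """Magnitude-weighted histogram of gradient directions inside a patch."""
    h, w = len(patch), len(patch[0])
    hist = [0.0] * bins
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            dx = patch[r][c + 1] - patch[r][c - 1]      # horizontal gradient
            dy = patch[r + 1][c] - patch[r - 1][c]      # vertical gradient
            magnitude = math.hypot(dx, dy)
            angle = math.atan2(dy, dx) % (2 * math.pi)  # direction in [0, 2*pi)
            hist[int(angle / (2 * math.pi) * bins) % bins] += magnitude
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]


def match_features(desc_a, desc_b, max_dist):
    """Pair each descriptor in desc_a with its nearest one in desc_b.

    A pair is kept only if the distance is below the preset threshold.
    """
    if not desc_b:
        return []
    matches = []
    for i, da in enumerate(desc_a):
        j, dist = min(((j, math.dist(da, db)) for j, db in enumerate(desc_b)),
                      key=lambda item: item[1])
        if dist < max_dist:
            matches.append((i, j))
    return matches


# Two toy neighbourhoods: a horizontal ramp and a vertical ramp.
ramp_x = [[c for c in range(5)] for _ in range(5)]
ramp_y = [[r] * 5 for r in range(5)]
view_1 = [orientation_histogram(ramp_x), orientation_histogram(ramp_y)]
view_2 = [orientation_histogram(ramp_y), orientation_histogram(ramp_x)]
pairs = match_features(view_1, view_2, max_dist=0.5)
```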

[0012] According to embodiments of this disclosure, the step of performing sparse point cloud reconstruction and dense point cloud reconstruction based on the matched feature points to obtain the dense point cloud includes: calculating the relative positional relationship of the cameras based on the pairwise matched feature points; generating a projection relationship based on the relative positional relationship of the cameras; generating a sparse point cloud based on the projection relationship; and expanding the sparse point cloud by estimating a depth map to generate a dense point cloud.
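The source does not specify how the depth map is estimated or how it is turned into points. As a hedged illustration of the last step only, the sketch below back-projects an already estimated depth map into 3D points in the camera frame, assuming a standard pinhole camera model. The intrinsic parameters `fx`, `fy`, `cx`, `cy` are hypothetical names introduced here.

```python
def backproject_depth(depth, fx, fy, cx, cy):
    """Convert a depth map into camera-frame 3D points under a pinhole model.

    depth[v][u] is the depth at pixel column u, row v. Non-positive values
    are treated as "no estimate" and skipped (an assumption of this sketch).
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


depth_map = [
    [0.0, 2.0],
    [4.0, 0.0],
]
dense_points = backproject_depth(depth_map, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
```

Every pixel with a valid depth contributes one point, which is how a depth map yields far more points than the sparse set obtained from matched feature points alone.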

[0013] According to embodiments of this disclosure, the projection relationship includes at least a viewable image and a set of viewable images, and the first preset filtering logic includes: filtering out the corresponding dense point cloud when the center of the dense point cloud is not on the surface of the target object; and/or filtering out the corresponding dense point cloud when the number of images in the set of viewable images of the dense point cloud does not reach a preset atlas threshold.
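The two conditions of the first filtering logic can be sketched as follows. The record layout is hypothetical: each dense point cloud is represented as a dictionary whose `on_surface` flag and `visible_views` set are assumed to have been computed earlier from the projection relationship, and `min_views` stands in for the preset atlas threshold.

```python
def visibility_filter(points, min_views, check_surface=True):
    """Keep only the dense point clouds that satisfy the visual constraints."""
    kept = []
    for p in points:
        # Condition 1: the center must lie on the surface of the target object.
        if check_surface and not p["on_surface"]:
            continue
        # Condition 2: enough images must see the point.
        if len(p["visible_views"]) < min_views:
            continue
        kept.append(p)
    return kept


dense = [
    {"id": 0, "on_surface": True, "visible_views": {0, 1, 2}},
    {"id": 1, "on_surface": False, "visible_views": {0, 1, 2}},
    {"id": 2, "on_surface": True, "visible_views": {4}},
]
first_dense = visibility_filter(dense, min_views=2)
```

Because the source joins the two conditions with "and/or", the `check_surface` switch lets the visibility-count test be applied on its own.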

[0014] According to an embodiment of this disclosure, the second preset filtering logic includes: clustering the first dense point cloud based on RGB values to obtain multiple point cloud clusters; identifying, among the multiple point cloud clusters, the point clouds whose RGB values differ from the RGB values of the background region by less than a preset threshold, and taking them as background point clouds; and filtering out the background point clouds from the first dense point cloud to obtain the second dense point cloud.
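A hedged sketch of the second filtering logic follows. The source does not name the clustering method, so grouping points by coarsely quantised colour (the `bucket` parameter) is used here as a stand-in, and the Euclidean distance in RGB space is an assumed way of comparing a cluster with the background colour.

```python
import math
from collections import defaultdict


def remove_background_points(points, background_rgb, threshold, bucket=32):
    """Drop colour clusters that lie close to the background colour.

    points is a list of (xyz, rgb) pairs with integer RGB channels.
    """
    clusters = defaultdict(list)
    for xyz, rgb in points:
        # Stand-in clustering: points sharing a quantised colour form a cluster.
        clusters[tuple(ch // bucket for ch in rgb)].append((xyz, rgb))
    kept = []
    for members in clusters.values():
        mean = [sum(rgb[i] for _, rgb in members) / len(members) for i in range(3)]
        if math.dist(mean, background_rgb) < threshold:
            continue  # treated as background point clouds and filtered out
        kept.extend(members)
    return kept


first_dense = [
    ((0, 0, 0), (200, 30, 30)),
    ((1, 0, 0), (12, 238, 11)),
    ((2, 0, 0), (205, 28, 33)),
    ((3, 0, 0), (10, 240, 10)),
]
second_dense = remove_background_points(first_dense, (10, 240, 10), threshold=20)
```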

[0015] A second aspect of this disclosure provides a multi-view 3D image reconstruction apparatus, comprising: a background segmentation module for segmenting a target region and a background region in multiple acquired multi-view images; a preprocessing module for preprocessing the target region in the segmented multi-view images to obtain feature points; a feature point matching module for matching feature points in the multiple multi-view images to obtain matched feature points; a point cloud reconstruction module for performing sparse point cloud reconstruction and dense point cloud reconstruction based on the matched feature points to obtain a dense point cloud; a first filtering module for filtering the dense point cloud according to a first preset filtering logic to obtain a first dense point cloud, wherein the first preset filtering logic is a visually constrained filtering logic; a second filtering module for filtering the first dense point cloud according to a second preset filtering logic to obtain a second dense point cloud, wherein the second preset filtering logic is a weakly regularized filtering logic based on removing the influence of the background region; and a 3D model generation module for generating a 3D model based on the second dense point cloud.

[0016] According to an embodiment of this disclosure, the background segmentation module is used to extract a target mask of the multi-view image based on a preset neural network model; and to segment the target region and background region of the multi-view image based on the target mask.

[0017] According to an embodiment of this disclosure, the background segmentation module is configured to perform a closing operation logic on the target mask to obtain a final mask, wherein the closing operation is dilation followed by erosion; select the final mask region as the target region; and select the area outside the final mask region as the background region.

[0018] According to an embodiment of this disclosure, the device further includes an RGB setting module, configured to cluster pixels in the target area based on RGB values to obtain multiple color clusters; and to select the RGB values of outlier clusters among the multiple color clusters as the RGB values of the background area.

[0019] According to an embodiment of this disclosure, the apparatus further includes: an image cropping module, configured to determine whether the acquired multi-view images are of the same size; and to crop the multi-view images to a uniform size if the multi-view images are of different sizes.

[0020] According to an embodiment of this disclosure, the preprocessing module is used to calculate the values of pixels within a preset pixel region based on the difference of Gaussians, thereby obtaining multiple pixel values; and to select the maximum and minimum values among the multiple pixel values as the feature points.

[0021] According to an embodiment of this disclosure, the feature point matching module is configured to calculate the gradient and direction in the neighborhood of the feature point; calculate the distance between feature points in different multi-view images based on the gradient and the direction; and determine the feature point as the matching feature point if the distance between the feature points is less than a preset distance threshold.

[0022] According to an embodiment of this disclosure, the point cloud reconstruction module is configured to calculate the relative positional relationship of the cameras based on the pairwise matching feature points; generate a projection relationship based on the relative positional relationship of the cameras; generate a sparse point cloud based on the projection relationship; and expand the sparse point cloud by estimating a depth map to generate a dense point cloud.

[0023] According to embodiments of this disclosure, the projection relationship includes at least a viewable image and a set of viewable images. The first filtering module is used to filter out the corresponding dense point cloud when the center of the dense point cloud is not on the surface of the target object; and/or to filter out the corresponding dense point cloud when the number of images in the set of viewable images of the dense point cloud does not reach a preset atlas threshold.

[0024] According to an embodiment of this disclosure, the second filtering module is configured to cluster the first dense point cloud based on RGB values to obtain multiple point cloud clusters; identify, among the multiple point cloud clusters, the point clouds whose RGB values differ from the RGB values of the background region by less than a preset threshold, and take them as background point clouds; and filter out the background point clouds from the first dense point cloud to obtain the second dense point cloud.

[0025] A third aspect of this disclosure provides an electronic device comprising: one or more processors; and a memory for storing one or more programs, wherein when the one or more programs are executed by the one or more processors, the one or more processors perform the above-described multi-view three-dimensional image reconstruction method.

[0026] A fourth aspect of this disclosure also provides a computer-readable storage medium having executable instructions stored thereon, which, when executed by a processor, cause the processor to perform the above-described multi-view three-dimensional image reconstruction method.

[0027] A fifth aspect of this disclosure also provides a computer program product, including a computer program that, when executed by a processor, implements the above-described multi-view three-dimensional image reconstruction method.

[0028] In the embodiments disclosed herein, the following beneficial effects can be achieved: 1. High-quality physical models can be generated using only the acquired multi-view images, resulting in low usage costs and facilitating promotion; 2. Separating the image from the background ensures that only the image in the target area is calculated during subsequent point cloud computing, reducing computational overhead and freeing up computing power; 3. By performing dual filtering on dense point clouds, visual noise points in the final product can be eliminated, ensuring that the final 3D model is clear.

Brief Description of the Drawings

[0029] The foregoing contents, as well as other objects, features, and advantages of this disclosure, will become clearer from the following description of embodiments with reference to the accompanying drawings, in which:

[0030] Figure 1 schematically shows an application scenario of the multi-view three-dimensional image reconstruction method according to embodiments of the present disclosure;

[0031] Figure 2 schematically shows a flowchart of a multi-view three-dimensional image reconstruction method according to an embodiment of the present disclosure;

[0032] Figure 3 schematically shows a flowchart of a background segmentation method according to an embodiment of the present disclosure;

[0033] Figure 4 schematically shows a flowchart of a target region determination method according to an embodiment of the present disclosure;

[0034] Figure 5 schematically shows a flowchart of a method for setting RGB values of a background area according to an embodiment of the present disclosure;

[0035] Figure 6 schematically shows a flowchart of a preprocessing method according to an embodiment of the present disclosure;

[0036] Figure 7 schematically shows a flowchart of a feature point matching method according to an embodiment of the present disclosure;

[0037] Figure 8 schematically shows a flowchart of a point cloud reconstruction method according to an embodiment of the present disclosure;

[0038] Figure 9 schematically shows a flowchart of a first filtering method according to an embodiment of the present disclosure;

[0039] Figure 10 schematically shows a flowchart of a second filtering method according to an embodiment of the present disclosure;

[0040] Figure 11 schematically shows the entire process of multi-view 3D image reconstruction according to embodiments of the present disclosure;

[0041] Figure 12 schematically shows the structure of a multi-view three-dimensional image reconstruction apparatus according to embodiments of the present disclosure; and

[0042] Figure 13 schematically shows a block diagram of an electronic device suitable for implementing a multi-view three-dimensional image reconstruction method according to an embodiment of the present disclosure.

Detailed Description

[0043] The embodiments of the present disclosure will now be described with reference to the accompanying drawings. However, it should be understood that these descriptions are exemplary only and are not intended to limit the scope of the disclosure. In the following detailed description, numerous specific details are set forth to provide a thorough understanding of the embodiments of the present disclosure for ease of explanation. However, it will be apparent that one or more embodiments may be practiced without these specific details. Furthermore, descriptions of well-known structures and techniques are omitted in the following description to avoid unnecessarily obscuring the concepts of the present disclosure.

[0044] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit this disclosure. The terms “comprising,” “including,” etc., as used herein indicate the presence of the stated features, steps, operations, and / or components, but do not exclude the presence or addition of one or more other features, steps, operations, or components.

[0045] All terms used herein (including technical and scientific terms) have the meanings commonly understood by those skilled in the art, unless otherwise defined. It should be noted that the terms used herein are to be interpreted in a manner consistent with the context of this specification, and not in an idealized or overly rigid way.

[0046] When using expressions such as "at least one of A, B, and C", they should generally be interpreted in accordance with the meaning that is commonly understood by a person skilled in the art (e.g., "a system having at least one of A, B, and C" should include, but is not limited to, a system having A alone, a system having B alone, a system having C alone, a system having A and B, a system having A and C, a system having B and C, and / or a system having A, B, and C, etc.).

[0047] Before providing a detailed description of the solutions in the embodiments of this disclosure, the key technical terms involved in the embodiments of this disclosure will be explained in detail:

[0048] Point cloud: A massive collection of points representing the surface characteristics of a target, resembling a nebula.

[0049] To address the technical problems existing in the prior art, embodiments of this disclosure provide a multi-view 3D image reconstruction method, comprising: separating a target region and a background region in multiple acquired multi-view images; preprocessing the target region in the separated multi-view images to obtain feature points; matching the feature points of the multiple multi-view images to obtain matched feature points; performing sparse point cloud reconstruction and dense point cloud reconstruction based on the matched feature points to obtain a dense point cloud; filtering the dense point cloud according to a first preset filtering logic to obtain a first dense point cloud, wherein the first preset filtering logic is a filtering logic with visual constraints; filtering the first dense point cloud according to a second preset filtering logic to obtain a second dense point cloud, wherein the second preset filtering logic is a weakly regularized filtering logic based on removing the influence of the background region; and generating a 3D model based on the second dense point cloud.

[0050] In the embodiments disclosed herein, the following beneficial effects can be achieved: 1. High-quality physical models can be generated using only the acquired multi-view images, resulting in low usage costs and facilitating promotion; 2. Separating the image from the background ensures that only the image in the target area is calculated during subsequent point cloud computing, reducing computational overhead and freeing up computing power; 3. By performing dual filtering on dense point clouds, visual noise points in the final product can be eliminated, ensuring that the final 3D model is clear.

[0051] Figure 1 schematically shows an application scenario of the multi-view three-dimensional image reconstruction method according to an embodiment of the present disclosure.

[0052] As shown in Figure 1, application scenario 100 according to this embodiment may include terminal devices 101, 102, and 103, a network 104, and a server 105. Network 104 serves as the medium that provides a communication link between terminal devices 101, 102, and 103 and server 105. Network 104 may include various connection types, such as wired or wireless communication links or fiber optic cables.

[0053] Users can use terminal devices 101, 102, and 103 to interact with server 105 via network 104 to receive or send messages, etc. Various communication client applications can be installed on terminal devices 101, 102, and 103, such as shopping applications, web browser applications, search applications, instant messaging tools, email clients, social media platform software, etc. (for example only).

[0054] Terminal devices 101, 102, and 103 can be various electronic devices with displays and web browsing capabilities, including but not limited to smartphones, tablets, laptops, and desktop computers.

[0055] Server 105 can be a server that provides various services, such as a backend management server that supports websites browsed by users using terminal devices 101, 102, and 103 (for example only). The backend management server can analyze and process data such as received user requests, and feed back the processing results (such as web pages, information, or data obtained or generated according to user requests) to the terminal devices.

[0056] It should be noted that the multi-view 3D image reconstruction method provided in this disclosure embodiment can generally be executed by terminal devices 101, 102, 103 or server 105. Correspondingly, the multi-view 3D image reconstruction apparatus provided in this disclosure embodiment can generally be located in terminal devices 101, 102, 103 or server 105. Regarding server 105, the multi-view 3D image reconstruction method provided in this disclosure embodiment can also be executed by a server or server cluster that is different from server 105 but capable of communicating with server 105. Correspondingly, the multi-view 3D image reconstruction apparatus provided in this disclosure embodiment can also be located in a server or server cluster that is different from server 105 but capable of communicating with server 105.

[0057] It should be understood that the numbers of terminal devices, networks, and servers shown in Figure 1 are merely illustrative. Depending on implementation needs, any number of terminal devices, networks, and servers can be included.

[0058] Based on the scenario described in Figure 1, the multi-view three-dimensional image reconstruction method of the disclosed embodiments is described in detail below with reference to Figures 2 to 11.

[0059] Figure 2 schematically shows a flowchart of a multi-view three-dimensional image reconstruction method according to an embodiment of the present disclosure.

[0060] As shown in Figure 2, the multi-view three-dimensional image reconstruction method of this embodiment includes operations S210 to S270, and the method can be executed by terminal devices 101, 102, 103 or server 105.

[0061] In operation S210, the target area and background area are separated in multiple multi-view images.

[0062] The essence of multi-view 3D reconstruction is the inverse process of generating digital images. In today's world where computers are ubiquitous, 3D structural reconstruction techniques for target objects or scenes are increasingly used, mostly in situations requiring the preservation of target objects or scenes or the identification of scenes. Major application areas include: visual navigation in artificial intelligence, reconstruction of biological limbs and organs, and 3D description of important cultural relics and buildings.

[0063] The aforementioned multi-view images are obtained by capturing the target object in advance from multiple perspectives. In general, this multi-view 3D reconstruction method consists of a background segmentation stage and a point cloud reconstruction stage. In the background segmentation stage, a pre-trained machine learning model can be used to extract the target region or the background region from each multi-view image, thereby separating the target and background regions within the image.

[0064] According to an embodiment of this disclosure, before separating the target area and background area in the multiple multi-view images, the method further includes: determining whether the acquired multi-view images are of the same size; and if the multi-view images are of different sizes, cropping the multi-view images to a uniform size.

[0065] Of course, if the input multi-view images are of different sizes, they will all be cropped to the same size.

[0066] In operation S220, the target region in the segmented multi-view image is preprocessed to obtain feature points.

[0067] The preprocessing extracts feature points from the pixels in the target region. The unit examined may be a single pixel or a preset group of pixels, i.e., pixels within a certain region are used as feature points.

[0068] In operation S230, feature points of the multiple multi-view images are matched to obtain the matched feature points.

[0069] In operation S240, sparse point cloud reconstruction and dense point cloud reconstruction are performed based on the matched feature points to obtain a dense point cloud.

[0070] Specifically, operations S220 to S240 compute feature points for the input multi-view images, match feature points across the multi-view images based on these feature points, compute the relative relationship between the matched feature points of each pair of images, and generate a sparse 3D point cloud through the projection relationship. The sparse 3D point cloud is then expanded: starting from the point cloud of each feature point, a dense point cloud is generated using the non-feature point clouds produced during the image matching process.

[0071] In operation S250, the dense point cloud is filtered according to the first preset filtering logic to obtain the first dense point cloud. The first preset filtering logic is a filtering logic with visual constraints.

[0072] Specifically, N dense point clouds are filtered to obtain M first dense point clouds, where N is greater than or equal to M. The visual constraint determines whether a dense point cloud meets the conditions for forming the final imaging model; a dense point cloud that does not meet the visual constraint is discarded.

[0073] In operation S260, the first dense point cloud is filtered according to the second preset filtering logic to obtain the second dense point cloud. The second preset filtering logic is a weakly regularized filtering logic based on removing the influence of the background region.

[0074] Specifically, the M first dense point clouds are filtered to obtain L second dense point clouds, where M is greater than or equal to L. The regularized filtering logic for removing the influence of the background region is needed because, after the background region and the target region are segmented (through a machine learning model or other methods), certain background elements may still remain in the target region (i.e., elements whose RGB values differ substantially from the RGB values of the pixels in the target region). These background elements cause deviations in the final imaging quality and increase the number of noise points.

[0075] In operation S270, a three-dimensional model is generated based on the second dense point cloud.

[0076] Specifically, taking the second dense point cloud as the input point set, a complete 3D model of the target is obtained by Poisson reconstruction.

[0077] In the embodiments disclosed herein, the following beneficial effects can be achieved: 1. High-quality physical models can be generated using only the acquired multi-view images, resulting in low usage costs and facilitating promotion; 2. Separating the image from the background ensures that only the image in the target area is calculated during subsequent point cloud computing, reducing computational overhead and freeing up computing power; 3. By performing dual filtering on dense point clouds, visual noise points in the final product can be eliminated, ensuring that the final 3D model is clear.

[0078] Figure 3 schematically shows a flowchart of a background segmentation method according to an embodiment of the present disclosure.

[0079] As shown in Figure 3, the background segmentation method of this embodiment includes operations S310 to S320, which can at least partially perform the above-mentioned operation S210.

[0080] In operation S310, the target mask of the multi-view image is extracted based on a preset neural network model.

[0081] Specifically, a pre-trained U2Net neural network model can be used to segment each multi-view image, obtaining a segmentation mask. The U2Net neural network model consists of a six-level encoder and a five-level decoder, extracting multi-layer features of the image. The loss function of the U2Net neural network model is shown below:

[0082] $L = \sum_{m=1}^{M} \omega_{side}^{(m)} \ell_{side}^{(m)} + \omega_{fuse}\,\ell_{fuse}$ Equation (1)

[0083] Here, $\ell_{side}^{(m)}$ ($m = 1, \dots, M$) are the losses of the saliency maps output by the 5 decoders, and $\ell_{fuse}$ is the loss of the saliency map output by the final fusion layer; $\omega_{side}^{(m)}$ and $\omega_{fuse}$ are the weights of the respective loss terms. Each loss term is calculated using standard binary cross-entropy, and the formula for each loss term is shown below:

[0084] $\ell = -\sum_{(r,c)}^{(H,W)} \left[ P_G(r,c) \log P_S(r,c) + \left(1 - P_G(r,c)\right) \log\left(1 - P_S(r,c)\right) \right]$ Equation (2)

[0085] Where (r, c) are pixel coordinates, and (H, W) are the height and width of the image. $P_G(r,c)$ and $P_S(r,c)$ represent the ground-truth pixel value and the predicted saliency probability, respectively. The entire loss is iteratively minimized during training. The fused saliency map output by the trained model is used as the background segmentation result mask.
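The loss described above (per-pixel binary cross-entropy summed over the image, combined as a weighted sum of side-output terms and one fused-output term) can be sketched as follows. This is a minimal illustration rather than the training code of the embodiment: the function names, the clamping constant `eps`, and the default unit weights are assumptions, and maps are represented as plain nested lists.

```python
import math


def bce_loss(gt, pred, eps=1e-7):
    """Binary cross-entropy summed over all pixels (r, c) of an H x W map."""
    total = 0.0
    for gt_row, pred_row in zip(gt, pred):
        for g, p in zip(gt_row, pred_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp so that log() stays finite
            total -= g * math.log(p) + (1.0 - g) * math.log(1.0 - p)
    return total


def total_loss(gt, side_preds, fuse_pred, w_side=None, w_fuse=1.0):
    """Weighted sum of the side-output losses plus the fused-output loss."""
    if w_side is None:
        w_side = [1.0] * len(side_preds)  # assumption: unit weights
    side = sum(w * bce_loss(gt, s) for w, s in zip(w_side, side_preds))
    return side + w_fuse * bce_loss(gt, fuse_pred)
```

For example, `total_loss(gt, [s1, s2, s3, s4, s5], fused)` would combine five decoder outputs with the fused map, matching the count of loss terms given in the description.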

[0086] In operation S320, the target region and background region of the multi-view image are segmented based on the target mask.

[0087] Specifically, the target region and background region of each multi-view image can be segmented using that image's background segmentation result mask obtained above.

[0088] In the embodiments of this disclosure, a preset neural network model can be used to efficiently extract an initial target mask, and the target region and background region can be preliminarily separated by this mask. The obtained mask can, of course, be further processed to make the segmentation result more accurate.

[0089] Figure 4 schematically shows a flowchart of a target region determination method according to an embodiment of the present disclosure.

[0090] As shown in Figure 4, the target region determination method of this embodiment includes operations S410 to S430, which can at least partially perform the above-mentioned operation S320.

[0091] In operation S410, a closing operation is performed on the target mask to obtain the final mask. The closing operation is dilation followed by erosion.

[0092] In operation S420, the final mask area is selected as the target area.

[0093] In operation S430, the area outside the final mask area is selected as the background area.

[0094] Specifically, morphological operations are performed on the background segmentation result mask obtained in the above operation S310. This can be done with a closing operation, that is, dilation followed by erosion. For example, the mask is first dilated with a circular kernel with a diameter of r pixels and then eroded with a circular kernel of the same diameter. The value of r is calculated from the input image size. The formula for calculating the diameter is as follows:

[0095]

[0096] Where h is the image height, w is the image width, and F is the floor function.
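The closing operation (dilation followed by erosion with a circular kernel) can be sketched in plain Python as below. The kernel diameter is passed in as a parameter because the diameter formula is not reproduced here; the function names and the border handling (pixels outside the image are ignored) are assumptions of this sketch.

```python
def disk_offsets(diameter):
    """Offsets (dr, dc) covered by a circular kernel of the given diameter."""
    radius = diameter / 2.0
    reach = int(radius)
    return [(dr, dc)
            for dr in range(-reach, reach + 1)
            for dc in range(-reach, reach + 1)
            if dr * dr + dc * dc <= radius * radius]


def _morph(mask, offsets, require_all):
    """Erode (require_all=True) or dilate (require_all=False) a 0/1 mask."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            # Assumption: neighbors that fall outside the image are skipped.
            hits = [mask[r + dr][c + dc]
                    for dr, dc in offsets
                    if 0 <= r + dr < h and 0 <= c + dc < w]
            out[r][c] = int(all(hits) if require_all else any(hits))
    return out


def close_mask(mask, diameter):
    """Morphological closing: dilation followed by erosion, same kernel."""
    offsets = disk_offsets(diameter)
    dilated = _morph(mask, offsets, require_all=False)
    return _morph(dilated, offsets, require_all=True)
```

Applied to a mask with a one-pixel gap between two foreground runs, the closing fills the gap while leaving the outer extent of the foreground unchanged, which is the contour-completing effect the embodiment relies on.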

[0097] In the embodiments of this disclosure, performing a closing operation on the target mask to obtain the final mask amounts to correcting the mask a second time and also ensures that the outline of the mask is complete. The above-mentioned segmentation operation is then performed with this corrected mask.

[0098] Figure 5 schematically shows a flowchart of a method for setting RGB values of a background area according to an embodiment of the present disclosure.

[0099] As shown in Figure 5, the method for setting the RGB value of the background area in this embodiment includes operations S510 to S520, which are performed after the above-mentioned operation S430.

[0100] In operation S510, the pixels in the target area are clustered based on RGB values to obtain multiple color clusters.

[0101] In operation S520, the RGB value of the outlier cluster among the multiple color clusters is selected as the RGB value of the background area.

[0102] Specifically, K-means clustering (or another clustering model) is performed on the RGB values of all pixels within the target region to extract the primary colors of the target. The number of clusters is set to, for example, k = 8. Then, the SLSQP (Sequential Least Squares Programming) optimization algorithm is used to calculate the RGB value (r, g, b) in the RGB space that is farthest from the k cluster centers (or from a predetermined number of cluster centers, taken from farthest to closest), and this value is used as the background color. The objective function for optimization is:

[0103]

[0104] 0≤r≤255

[0105] 0≤g≤255

[0106] 0≤b≤255 Equation (4)

[0107] Where $(r_i, g_i, b_i)$ represents the RGB value of the i-th cluster center, and $\rho_i$ represents the proportion of pixels belonging to the i-th cluster.

[0108] Furthermore, the RGB values of the background area are set to the optimized value (r, g, b).

[0109] In the embodiments of this disclosure, setting the RGB value of the background area to a single outlier value ensures that the target area stands out in the subsequent images.
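The choice of a background color far from the target's cluster centers can be illustrated with the sketch below. The exact objective function is not legible in this text, so the sketch assumes it is the ρ-weighted sum of Euclidean distances from (r, g, b) to the cluster centers. Under that assumption the objective is convex, so its maximum over the box 0 to 255 is reached at one of the eight corners of the RGB cube, and simple enumeration stands in for the SLSQP solver named above. The function name and inputs are hypothetical.

```python
import itertools
import math


def farthest_color(centers, weights):
    """Return the RGB cube corner with the largest weighted distance sum.

    centers: cluster-center colors (r_i, g_i, b_i)
    weights: cluster proportions rho_i
    Assumption: the objective is sum_i rho_i * ||(r, g, b) - center_i||.
    """
    def score(rgb):
        return sum(w * math.dist(rgb, c) for w, c in zip(weights, centers))

    corners = itertools.product((0, 255), repeat=3)
    return max(corners, key=score)
```

For a bright target, for instance centers `(250, 250, 250)` and `(200, 210, 220)`, this picks black as the background color. A different objective would need a general constrained optimizer instead of corner enumeration.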

[0110] Figure 6 schematically shows a flowchart of a preprocessing method according to an embodiment of the present disclosure.

[0111] As shown in Figure 6, the preprocessing method of this embodiment includes operations S610 to S620, which can at least partially perform the above-mentioned operation S220.

[0112] In operation S610, the values of pixels within a preset pixel region are calculated based on the difference of Gaussians, resulting in multiple pixel values.

[0113] Specifically, the image on which the difference of Gaussians is calculated can be an image segmented by the above operations S310 to S320, or an image given a single background color by the above operations S510 to S520. It can, of course, also be the original image without segmentation processing.

[0114] In operation S620, the maximum and minimum value points among the plurality of pixel values are selected as the feature points.

[0115] Specifically, the image (taking the original image as an example) is converted to grayscale, and feature points are detected in each image of the multi-view data using the difference of Gaussians (DoG). The formula for calculating the pixel value using the difference of Gaussians is shown below:

[0116]

[0117] Where σ1 and σ2 are Gaussian smoothing parameters, and x and y are the pixel coordinates of the image. In the multi-layer difference-of-Gaussians maps, each pixel is compared with its 26 neighboring pixels in the same and adjacent scales. If a point is the maximum or minimum among all of its neighboring pixels, it is marked as a feature point.
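The extremum test over the 26 neighbors can be sketched as follows, assuming the difference-of-Gaussians layers have already been computed and stacked as `dog[scale][row][col]` (a hypothetical layout). Requiring a strict maximum or minimum is also an assumption of this sketch.

```python
def is_extremum(dog, s, y, x):
    """True if dog[s][y][x] is strictly above or below all 26 neighbors."""
    value = dog[s][y][x]
    neighbors = [dog[s + ds][y + dy][x + dx]
                 for ds in (-1, 0, 1)
                 for dy in (-1, 0, 1)
                 for dx in (-1, 0, 1)
                 if (ds, dy, dx) != (0, 0, 0)]
    return value > max(neighbors) or value < min(neighbors)


def find_feature_points(dog):
    """Scan interior positions of a stack of DoG layers for extrema."""
    points = []
    for s in range(1, len(dog) - 1):
        for y in range(1, len(dog[s]) - 1):
            for x in range(1, len(dog[s][y]) - 1):
                if is_extremum(dog, s, y, x):
                    points.append((s, y, x))
    return points
```

Only interior positions are scanned, because a pixel on the border of the image or in the first or last layer does not have a full set of 26 neighbors.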

[0118] Figure 7 schematically shows a flowchart of a feature point matching method according to an embodiment of the present disclosure.

[0119] As shown in Figure 7, the feature point matching method of this embodiment includes operations S710 to S730, which can at least partially perform the above-mentioned operation S230.

[0120] In operation S710, the gradient and direction in the neighborhood surrounding the feature point are calculated.

[0121] In operation S720, the distance between feature points between different multi-view images is calculated based on the gradient and the direction.

[0122] In operation S730, if the distance between two feature points is less than a preset distance threshold, they are determined to be matching feature points.

[0123] Specifically, the SIFT algorithm is used to generate a feature descriptor for each feature point, which describes the pixel gradients and orientations within a 4×4 neighborhood centered on that feature point. The feature descriptors are then used to match feature points in different images. During the matching process, the Euclidean distance between the feature descriptors is used to determine similarity, and two feature points with a Euclidean distance less than a threshold are selected as matching points.
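The threshold-based matching step can be sketched as below. Descriptors are taken as given (for example, produced by a SIFT implementation). Pairing each descriptor with its nearest neighbor before applying the threshold is an assumption of this sketch, as are the function and parameter names.

```python
import math


def match_features(desc_a, desc_b, max_dist):
    """Match descriptors of image A to image B by Euclidean distance.

    Each descriptor in desc_a is paired with its nearest descriptor in
    desc_b; the pair is kept only if the distance is below max_dist.
    Returns a list of (index_in_a, index_in_b) tuples.
    """
    matches = []
    if not desc_b:
        return matches
    for i, da in enumerate(desc_a):
        dists = [math.dist(da, db) for db in desc_b]
        j = min(range(len(dists)), key=dists.__getitem__)
        if dists[j] < max_dist:
            matches.append((i, j))
    return matches
```

A lower `max_dist` yields fewer but more reliable matches, which matters because the camera poses in the later reconstruction step are estimated from these pairs.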

[0124] Figure 8 schematically shows a flowchart of a point cloud reconstruction method according to an embodiment of the present disclosure.

[0125] As shown in Figure 8, the point cloud reconstruction method in this embodiment includes operations S810 to S840, which can at least partially perform the above-mentioned operation S240.

[0126] In operation S810, the relative positional relationship of the cameras is calculated based on the pairwise matching feature points.

[0127] In operation S820, a projection relationship is generated based on the relative positional relationship of the cameras.

[0128] In operation S830, a sparse point cloud is generated based on the projection relationship.

[0129] In operation S840, the sparse point cloud is expanded by estimating the depth map to generate a dense point cloud.

[0130] Specifically, for a feature point f found in the target region of image $I_i$, feature points f′ of the same type as f are searched for in the other images within two pixels of the epipolar line corresponding to f. For each pair (f, f′), a three-dimensional point in space is generated using the pinhole imaging model based on the camera parameters. These points are sorted in increasing order of their distance from the camera optical center $O(I_i)$, and point clouds are initialized at them in that order, giving the initial point cloud center c(p), the vector n(p) pointing from the center c(p) to the camera optical center, the reference image R(p), and the visible image set V(p), which are then used according to the formulas:

[0131]

[0132] $V^*(p) = \{\, I_i \mid I_i \in V(p),\ h(p, I_i, R(p)) \le \alpha \,\}$ Equation (7)

[0133] to update the visible image set V(p) and minimize the sum of grayscale differences between images, as shown below:

[0134]

[0135] Where $h(p, I_i, R(p))$ is the gray-level difference function between image $I_i$ and the reference image R(p), with parameters τ = π / 3 and α = 0.6. When the number of images in V(p) satisfies |V(p)| ≥ 10, the point cloud p is successfully reconstructed. Each point cloud is a rectangular patch with a center coordinate and a normal vector. Each image $I_i$ is divided into grid cells of 5×5 pixels. For a reconstructed point cloud p, a new point cloud p′ is reconstructed in a neighboring image cell C(p) of its corresponding feature point. The center c(p′) is initialized as the intersection of the line passing through cell $C_i(x, y)$ with the plane containing p. R(p′) and V(p′) are both initialized from the corresponding parameters of p. c(p′) and n(p′) are optimized by minimizing the gray-level difference function g*(p′). After optimization, images that pass the depth test are added to V(p′), and V*(p′) is updated accordingly. If |V*(p′)| ≥ γ, that is, at least γ images have high gray-level consistency, the point cloud generation is considered successful, and the candidate point cloud p′ is saved to the corresponding image cells of its visible images, completing the expansion of the point cloud set.
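The visible-set update and the acceptance test can be sketched as follows. The gray-level difference values h(p, I, R(p)) are assumed to be precomputed and supplied as a dictionary keyed by image identifier; the function names are hypothetical, and the default α = 0.6 is the value given above.

```python
def update_visible_set(visible, diffs, alpha=0.6):
    """Keep the images whose gray-level difference to R(p) is at most alpha.

    visible: image identifiers currently in V(p)
    diffs:   mapping image identifier -> h(p, I, R(p)), assumed precomputed
    """
    return [img for img in visible if diffs[img] <= alpha]


def patch_accepted(kept, gamma):
    """A candidate is accepted when at least gamma images remain consistent."""
    return len(kept) >= gamma
```

With `diffs = {"I1": 0.2, "I2": 0.9, "I3": 0.6}`, the update keeps `I1` and `I3`, and the candidate is accepted for γ = 2 but rejected for γ = 3.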

[0136] Researchers found that, although the dense point cloud obtained at this point can be used directly to generate the final target model, the resulting imaging effect is not ideal. Therefore, unqualified dense point clouds still need to be filtered out.

[0137] Figure 9 schematically shows a flowchart of a first filtering method according to an embodiment of the present disclosure.

[0138] As shown in Figure 9, the first filtering method in this embodiment includes operations S910 to S920, which can at least partially perform the above-mentioned operation S250.

[0139] In operation S910, if the center of a dense point cloud is not on the surface of the target object, the corresponding dense point cloud is filtered out.

[0140] Specifically, if the center of a point cloud is not located on the actual surface of the target object and the scene in three-dimensional space, it is filtered out.

[0141] In operation S920, if the number of images in the visualized image set of a dense point cloud does not reach the preset atlas threshold, the corresponding dense point cloud is filtered out.

[0142] Specifically, for each point cloud p, the pixel depth is calculated using the pinhole imaging model, and the visible image set V(p) of the point cloud is calculated based on the depth map. If the number of images in V(p) does not reach the threshold γ, the point cloud is filtered out.

[0143] Figure 10 schematically shows a flowchart of a second filtering method according to an embodiment of the present disclosure.

[0144] As shown in Figure 10, the second filtering method in this embodiment includes operations S1010 to S1030, which can at least partially perform the above-mentioned operation S260.

[0145] It should be noted that some point clouds in the first dense point cloud obtained at this stage may still contain background elements. Therefore, it is still necessary to identify and filter these point clouds containing background elements in order to minimize the impact of these point clouds on the final imaging model.

[0146] In operation S1010, the first dense point cloud is clustered based on RGB values to obtain multiple point cloud clusters.

[0147] In operation S1020, the point clouds in the plurality of point cloud clusters whose RGB values differ from the RGB value of the background region by less than a preset threshold are determined and taken as background point clouds.

[0148] A background point cloud is a point cloud among the first dense point clouds whose RGB values contain a certain amount of the background region's color.

[0149] In operation S1030, the background point cloud in the first dense point cloud is filtered to obtain the second dense point cloud.

[0150] Specifically, the point clouds are clustered using K-means based on their RGB values, with the number of clusters set to k + 1. The distances between the k + 1 cluster centers and the background color (r, g, b) in the RGB space are calculated, and the closest cluster is selected as the background cluster. The distance calculation formula is as follows:

[0151] $D(r, g, b) = (r_j - r)^2 + (g_j - g)^2 + (b_j - b)^2$, where $j = 1, 2, \dots, k+1$ Equation (9)

[0152] Where $(r_j, g_j, b_j)$ represents the RGB value of the j-th cluster center.
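Selecting and removing the background cluster with the squared RGB distance of Equation (9) can be sketched as below. The clustering itself is assumed to have been run already, so the per-point labels and the cluster centers are inputs; all names are hypothetical.

```python
def remove_background_cluster(points, labels, centers, background_rgb):
    """Drop the points of the cluster whose center is nearest the background.

    points:         point identifiers (or point records), one per label
    labels:         cluster index of each point, assumed from a prior K-means
    centers:        cluster-center colors (r_j, g_j, b_j)
    background_rgb: the background color (r, g, b)
    Returns (kept_points, background_label).
    """
    def sq_dist(center):
        # Squared RGB distance, as in Equation (9).
        return sum((cj - bj) ** 2 for cj, bj in zip(center, background_rgb))

    background_label = min(range(len(centers)),
                           key=lambda j: sq_dist(centers[j]))
    kept = [p for p, lab in zip(points, labels) if lab != background_label]
    return kept, background_label
```

This sketch always removes the nearest cluster. An implementation following operation S1020 more closely would also compare the distance against the preset threshold before treating the cluster as background.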

[0153] In the embodiments of this disclosure, dense point clouds containing a certain background color are identified as point clouds that need to be filtered, thereby further ensuring that the images subsequently generated from the dense point clouds have less noise.

[0154] According to embodiments of this disclosure, the second filtering method further includes: (1) for each point cloud, searching for point clouds in the visible image cell where the point cloud is located and in the neighboring cells; if the proportion of the found point clouds in the entire set is lower than a preset threshold (e.g., 0.25), the point cloud is filtered out; and (2) reprojecting the point cloud onto the image plane using a pinhole imaging model; if the projected position falls in the background area of the image, the point cloud is filtered out.
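The reprojection check in item (2) can be sketched as follows for a point already expressed in the camera coordinate frame. The intrinsic parameters `fx`, `fy`, `cx`, `cy` and the mask layout (1 for target, 0 for background) are assumptions of this sketch, as is the choice to treat points that project outside the image or lie behind the camera as background.

```python
def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame 3D point to pixel coordinates."""
    x, y, z = point
    return fx * x / z + cx, fy * y / z + cy


def in_background(point, mask, fx, fy, cx, cy):
    """True if the point does not project onto a target pixel of the mask."""
    if point[2] <= 0:
        return True  # assumption: points behind the camera are rejected
    u, v = project(point, fx, fy, cx, cy)
    col, row = int(round(u)), int(round(v))
    if not (0 <= row < len(mask) and 0 <= col < len(mask[0])):
        return True  # assumption: outside the image counts as background
    return mask[row][col] == 0
```

A point would then be filtered out when `in_background(...)` returns true for the image under consideration. Points given in world coordinates must first be transformed with the camera's rotation and translation, which this sketch leaves out.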

[0155] Figure 11 schematically shows the entire process of multi-view 3D image reconstruction according to an embodiment of the present disclosure.

[0156] As shown in Figure 11, the entire process of multi-view 3D image reconstruction in this embodiment includes operations S1110 to S1190.

[0157] In operation S1110, multi-view images are captured.

[0158] In operation S1120, image segmentation is performed.

[0159] In operation S1130, the background color is calculated.

[0160] In operation S1140, the image is converted to grayscale.

[0161] In operation S1150, feature point matching is performed.

[0162] In operation S1160, a sparse point cloud is generated.

[0163] In operation S1170, the sparse point cloud is expanded to generate a dense point cloud.

[0164] In operation S1180, non-target point clouds are filtered out from dense point clouds.

[0165] In operation S1190, a three-dimensional model is generated.

[0166] Based on the above-described multi-view 3D image reconstruction method, this disclosure also provides a multi-view 3D image reconstruction apparatus. The apparatus is described in detail below with reference to Figure 12.

[0167] Figure 12 schematically shows a block diagram of a multi-view three-dimensional image reconstruction apparatus according to an embodiment of the present disclosure.

[0168] As shown in Figure 12, the multi-view 3D image reconstruction device 1200 of this embodiment includes a background segmentation module 1210, a preprocessing module 1220, a feature point matching module 1230, a point cloud reconstruction module 1240, a first filtering module 1250, a second filtering module 1260, and a 3D model generation module 1270.

[0169] Background segmentation module 1210 is used to separate the target region and background region in multiple acquired multi-view images. In one embodiment, background segmentation module 1210 can be used to perform the operation S210 described above, which will not be repeated here.

[0170] The preprocessing module 1220 is used to preprocess the target region in the segmented multi-view image to obtain feature points. In one embodiment, the preprocessing module 1220 can be used to perform the operation S220 described above, which will not be repeated here.

[0171] The feature point matching module 1230 is used to match feature points from the multiple multi-view images to obtain matched feature points. In one embodiment, the feature point matching module 1230 can be used to perform the operation S230 described above, which will not be repeated here.

[0172] The point cloud reconstruction module 1240 is used to perform sparse point cloud reconstruction and dense point cloud reconstruction based on the matched feature points to obtain a dense point cloud. In one embodiment, the point cloud reconstruction module 1240 can be used to perform the operation S240 described above, which will not be repeated here.

[0173] The first filtering module 1250 is used to filter the dense point cloud according to a first preset filtering logic to obtain a first dense point cloud. The first preset filtering logic is a filtering logic for visualization constraints. In one embodiment, the first filtering module 1250 can be used to perform the operation S250 described above, which will not be repeated here.

[0174] The second filtering module 1260 is used to filter the first dense point cloud according to a second preset filtering logic to obtain a second dense point cloud. The second preset filtering logic is a weakly regularized filtering logic based on removing the influence of the background region. In one embodiment, the second filtering module 1260 can be used to perform the operation S260 described above, which will not be repeated here.

[0175] The 3D model generation module 1270 is used to generate a 3D model based on the second dense point cloud. In one embodiment, the 3D model generation module 1270 can be used to perform the operation S270 described above, which will not be repeated here.

[0176] In the embodiments disclosed herein, the following beneficial effects can be achieved: 1. High-quality physical models can be generated using only the acquired multi-view images, resulting in low usage costs and facilitating promotion; 2. Separating the target from the background ensures that only the image content in the target region is processed during subsequent point cloud computation, reducing computational overhead and freeing up computing power; 3. By performing dual filtering on the dense point clouds, visual noise points in the final product can be eliminated, ensuring that the final 3D model is clear.

[0177] According to an embodiment of this disclosure, the background segmentation module is used to extract a target mask of the multi-view image based on a preset neural network model; and to segment the target region and background region of the multi-view image based on the target mask.

[0178] According to an embodiment of this disclosure, the background segmentation module is configured to perform a closing operation logic on the target mask to obtain a final mask, wherein the closing operation is dilation followed by erosion; select the final mask region as the target region; and select the area outside the final mask region as the background region.

[0179] According to an embodiment of this disclosure, the device further includes an RGB setting module, configured to cluster pixels in the target area based on RGB values to obtain multiple color clusters; and to select the RGB values of outlier clusters among the multiple color clusters as the RGB values of the background area.

[0180] According to an embodiment of this disclosure, the apparatus further includes: an image cropping module, configured to determine whether the acquired multi-view images are of the same size; and to crop the multi-view images to a uniform size if the multi-view images are of different sizes.
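The size check and cropping performed by the image cropping module can be sketched as below. The disclosure does not state how the common size is chosen or where the crop window is placed, so the sketch assumes the smallest height and width across the images and a centered crop; images are represented as nested lists of pixels.

```python
def crop_to_uniform(images):
    """Center-crop every image to the smallest height and width in the set.

    Assumptions: the target size is the per-axis minimum over all images,
    and the crop window is centered. Images of that size pass unchanged.
    """
    h = min(len(img) for img in images)
    w = min(len(img[0]) for img in images)
    cropped = []
    for img in images:
        top = (len(img) - h) // 2
        left = (len(img[0]) - w) // 2
        cropped.append([row[left:left + w] for row in img[top:top + h]])
    return cropped
```

When all input images already share one size, the function returns copies of them unchanged, which matches the condition that cropping is applied only when sizes differ.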

[0181] According to an embodiment of this disclosure, the preprocessing module is used to calculate the values of pixels within a preset pixel region based on the difference of Gaussians, thereby obtaining multiple pixel values; and to select the maximum and minimum values among the multiple pixel values as the feature points.

[0182] According to an embodiment of this disclosure, the feature point matching module is configured to calculate the gradient and direction in the neighborhood of the feature point; calculate the distance between feature points in different multi-view images based on the gradient and the direction; and determine the feature point as the matching feature point if the distance between the feature points is less than a preset distance threshold.

[0183] According to an embodiment of this disclosure, the point cloud reconstruction module is configured to calculate the relative positional relationship of the cameras based on the pairwise matching feature points; generate a projection relationship based on the relative positional relationship of the cameras; generate a sparse point cloud based on the projection relationship; and expand the sparse point cloud by estimating a depth map to generate a dense point cloud.

[0184] According to embodiments of this disclosure, the projection relationship includes at least a view image and a set of view images. The first filtering module is used to filter the corresponding dense point cloud when the center of the dense point cloud is not on the surface of the target object; and / or to filter the corresponding dense point cloud when the number of the set of visualized images of the dense point cloud does not reach a preset atlas threshold.

[0185] According to an embodiment of this disclosure, the second filtering module is configured to cluster the first dense point cloud based on RGB values to obtain multiple point cloud clusters; determine, among the multiple point cloud clusters, the point clouds whose RGB values differ from the RGB value of the background region by less than a preset threshold, and use them as background point clouds; and filter out the background point clouds from the first dense point cloud to obtain the second dense point cloud.

[0186] According to embodiments of this disclosure, any plurality of the background segmentation module 1210, preprocessing module 1220, feature point matching module 1230, point cloud reconstruction module 1240, first filtering module 1250, second filtering module 1260, and 3D model generation module 1270 can be combined into one module, or any one of these modules can be split into multiple modules. Alternatively, at least some of the functions of one or more of these modules can be combined with at least some of the functions of other modules and implemented in one module. According to embodiments of this disclosure, at least one of the background segmentation module 1210, preprocessing module 1220, feature point matching module 1230, point cloud reconstruction module 1240, first filtering module 1250, second filtering module 1260, and 3D model generation module 1270 can be at least partially implemented as hardware circuits, such as field-programmable gate arrays (FPGAs), programmable logic arrays (PLAs), systems-on-a-chip, systems-on-a-substrate, systems-on-package, application-specific integrated circuits (ASICs), or any other reasonable means of integrating or packaging circuits, or implemented in software, hardware, or firmware, or in any suitable combination of any of these three implementation methods. Alternatively, at least one of the background segmentation module 1210, preprocessing module 1220, feature point matching module 1230, point cloud reconstruction module 1240, first filtering module 1250, second filtering module 1260, and 3D model generation module 1270 can be at least partially implemented as computer program modules, which can perform corresponding functions when run.

[0187] Figure 13 schematically shows a block diagram of an electronic device suitable for implementing a multi-view three-dimensional image reconstruction method according to an embodiment of the present disclosure.

[0188] As shown in Figure 13, an electronic device 1300 according to an embodiment of the present disclosure includes a processor 1301, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1302 or a program loaded from a storage portion 1308 into a random access memory (RAM) 1303. The processor 1301 may include, for example, a general-purpose microprocessor (e.g., a CPU), an instruction set processor and / or an associated chipset and / or a special-purpose microprocessor (e.g., an application-specific integrated circuit (ASIC)), etc. The processor 1301 may also include onboard memory for caching purposes. The processor 1301 may include a single processing unit or multiple processing units for performing different actions of the method flow according to an embodiment of the present disclosure.

[0189] RAM 1303 stores various programs and data required for the operation of electronic device 1300. Processor 1301, ROM 1302, and RAM 1303 are interconnected via bus 1304. Processor 1301 performs various operations of the method flow according to embodiments of the present disclosure by executing programs in ROM 1302 and / or RAM 1303. It should be noted that the programs may also be stored in one or more memories other than ROM 1302 and RAM 1303. Processor 1301 may also perform various operations of the method flow according to embodiments of the present disclosure by executing programs stored in said one or more memories.

[0190] According to embodiments of this disclosure, the electronic device 1300 may further include an input / output (I / O) interface 1305, which is also connected to a bus 1304. The electronic device 1300 may also include one or more of the following components connected to the I / O interface 1305: an input section 1306 including a keyboard, mouse, etc.; an output section 1307 including a cathode ray tube (CRT), liquid crystal display (LCD), etc., and a speaker, etc.; a storage section 1308 including a hard disk, etc.; and a communication section 1309 including a network interface card such as a LAN card, modem, etc. The communication section 1309 performs communication processing via a network such as the Internet. A drive 1310 is also connected to the I / O interface 1305 as needed. A removable medium 1311, such as a disk, optical disk, magneto-optical disk, semiconductor memory, etc., is installed on the drive 1310 as needed so that computer programs read from it can be installed into the storage section 1308 as needed.

[0191] This disclosure also provides a computer-readable storage medium, which may be included in the device / apparatus / system described in the above embodiments; or it may exist independently and not assembled into the device / apparatus / system. The computer-readable storage medium carries one or more programs that, when executed, implement the method according to the embodiments of this disclosure.

[0192] According to embodiments of this disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, such as including, but not limited to: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof. In this disclosure, the computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. For example, according to embodiments of this disclosure, the computer-readable storage medium may include ROM 1302 and / or RAM 1303 and / or one or more memories other than ROM 1302 and RAM 1303 described above.

[0193] Embodiments of this disclosure also include a computer program product comprising a computer program containing program code for performing the methods shown in the flowchart. When the computer program product is run on a computer system, the program code is used to cause the computer system to implement the methods provided in the embodiments of this disclosure.

[0194] When the computer program is executed by the processor 1301, it performs the functions defined in the system / apparatus of this disclosure embodiments. According to embodiments of this disclosure, the systems, apparatuses, modules, units, etc., described above can be implemented by computer program modules.

[0195] In one embodiment, the computer program may rely on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed in a reliable manner over a network medium, and downloaded and installed via the communication section 1309, and / or installed from the removable medium 1311. The program code contained in the computer program can be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination thereof.

[0196] In such an embodiment, the computer program can be downloaded and installed from a network via the communication section 1309, and / or installed from the removable medium 1311. When the computer program is executed by the processor 1301, it performs the functions defined in the system of this disclosure embodiment. According to embodiments of this disclosure, the systems, devices, apparatuses, modules, units, etc., described above can be implemented by computer program modules.

[0197] According to embodiments of this disclosure, program code for executing the computer programs provided in embodiments of this disclosure can be written in any combination of one or more programming languages. Specifically, these computer programs can be implemented using high-level procedural and / or object-oriented programming languages, and / or assembly / machine languages. Programming languages include, but are not limited to, languages such as Java, C++, Python, "C", or similar programming languages. The program code can execute entirely on the user's computing device, partially on the user's device, partially on a remote computing device, or entirely on a remote computing device or server. In cases involving remote computing devices, the remote computing device can be connected to the user's computing device via any type of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computing device (e.g., via the Internet using an Internet service provider).

[0198] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in a block diagram or flowchart, and combinations of blocks in a block diagram or flowchart, may be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.

[0199] Those skilled in the art will understand that the features described in the various embodiments and/or claims of this disclosure can be combined and/or integrated in various ways, even if such combinations or integrations are not explicitly described in this disclosure. In particular, the features described in the various embodiments and/or claims of this disclosure can be combined and/or integrated in various ways without departing from the spirit and teachings of this disclosure. All such combinations and/or integrations fall within the scope of this disclosure.

[0200] The embodiments of this disclosure have been described above. However, these embodiments are for illustrative purposes only and are not intended to limit the scope of this disclosure. Although various embodiments have been described above, this does not mean that the measures in the various embodiments cannot be used advantageously in combination. The scope of this disclosure is defined by the appended claims and their equivalents. Various substitutions and modifications can be made by those skilled in the art without departing from the scope of this disclosure, and all such substitutions and modifications should fall within the scope of this disclosure.

Claims

1. A multi-view three-dimensional image reconstruction method, comprising: separating a target region and a background region in a plurality of acquired multi-view images; preprocessing the target region in the segmented multi-view images to obtain feature points; matching the feature points of the plurality of multi-view images to obtain matched feature points; performing sparse point cloud reconstruction and dense point cloud reconstruction, respectively, based on the matched feature points to obtain a dense point cloud; filtering the dense point cloud according to a first preset filtering logic to obtain a first dense point cloud, wherein the first preset filtering logic is a filtering logic of visualization constraints; filtering the first dense point cloud according to a second preset filtering logic to obtain a second dense point cloud, wherein the second preset filtering logic is a weakly regularized filtering logic based on removing the influence of the background region, and includes filtering out, by a clustering method, point clouds in the first dense point cloud that contain the color of the background region; and generating a 3D model based on the second dense point cloud.

2. The method according to claim 1, wherein separating the target region and the background region in the plurality of acquired multi-view images comprises: extracting a target mask from the multi-view images based on a preset neural network model; and segmenting the target region and the background region of the multi-view images based on the target mask.

3. The method according to claim 2, wherein segmenting the target region and the background region of the multi-view images based on the target mask comprises: performing a closing operation on the target mask to obtain a final mask, wherein the closing operation is dilation followed by erosion; selecting the region of the final mask as the target region; and selecting the region outside the final mask as the background region.
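As an illustration of the closing operation recited in claim 3, the following is a minimal sketch in plain Python. The 3x3 structuring element, the rule that out-of-bounds neighbours are ignored, and the function names (`dilate`, `erode`, `close_mask`) are assumptions made for this example; the claims do not specify them.

```python
def _neighbourhood(mask, y, x):
    """Values of the in-bounds 3x3 neighbourhood around (y, x), centre included."""
    h, w = len(mask), len(mask[0])
    return [mask[ny][nx]
            for ny in range(max(0, y - 1), min(h, y + 2))
            for nx in range(max(0, x - 1), min(w, x + 2))]

def dilate(mask):
    """A pixel becomes 1 if any pixel in its 3x3 neighbourhood is 1."""
    return [[int(any(_neighbourhood(mask, y, x))) for x in range(len(mask[0]))]
            for y in range(len(mask))]

def erode(mask):
    """A pixel stays 1 only if every in-bounds pixel in its 3x3 neighbourhood is 1."""
    return [[int(all(_neighbourhood(mask, y, x))) for x in range(len(mask[0]))]
            for y in range(len(mask))]

def close_mask(mask):
    """Closing: dilation first, then erosion. Fills small holes in the mask."""
    return erode(dilate(mask))

# Hypothetical 7x7 target mask: a 3x3 block with a one-pixel hole in the middle.
mask = [[0] * 7 for _ in range(7)]
for y in range(2, 5):
    for x in range(2, 5):
        mask[y][x] = 1
mask[3][3] = 0

final_mask = close_mask(mask)
# Pixels equal to 1 form the target region; pixels equal to 0 form the background.
target_pixels = [(y, x) for y, row in enumerate(final_mask) for x, v in enumerate(row) if v]
```

In this example the hole at (3, 3) is filled while the outline of the block is unchanged, which is the effect a closing operation is normally used for on segmentation masks.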

4. The method according to claim 2 or 3, wherein, after separating the target region and the background region in the plurality of acquired multi-view images, the method further comprises: clustering the pixels in the target region based on RGB values to obtain a plurality of color clusters; and selecting the RGB values of the outlier clusters among the plurality of color clusters as the RGB values of the background region.

5. The method according to claim 1, wherein, before separating the target region and the background region in the plurality of acquired multi-view images, the method further comprises: determining whether the acquired multi-view images are of the same size; and if the multi-view images are of inconsistent sizes, cropping the multi-view images to a uniform size.
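The size check and cropping step of claim 5 can be sketched as follows. The claim only requires a uniform size; centre-cropping every image to the smallest common height and width is an assumption of this example, as are the function names.

```python
def same_size(images):
    """True if every image (a list of pixel rows) has the same height and width."""
    return len({(len(img), len(img[0])) for img in images}) == 1

def center_crop(img, height, width):
    """Keep the central height x width window of an image."""
    top = (len(img) - height) // 2
    left = (len(img[0]) - width) // 2
    return [row[left:left + width] for row in img[top:top + height]]

def unify_sizes(images):
    """Return the images unchanged if sizes agree, else crop all to the smallest size."""
    if same_size(images):
        return images
    height = min(len(img) for img in images)
    width = min(len(img[0]) for img in images)
    return [center_crop(img, height, width) for img in images]

# Hypothetical inputs: a 4x6 image and a 2x4 image; pixel value encodes row*10 + column.
big = [[r * 10 + c for c in range(6)] for r in range(4)]
small = [[r * 10 + c for c in range(4)] for r in range(2)]
cropped_big, cropped_small = unify_sizes([big, small])
```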

6. The method according to any one of claims 1-3, wherein preprocessing the target region in the segmented multi-view images to obtain feature points comprises: calculating the values of pixels within a preset pixel region based on the difference of Gaussians to obtain a plurality of pixel values; and selecting the maximum and minimum values among the plurality of pixel values as the feature points.
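The difference-of-Gaussians step of claim 6 might look like the sketch below. It works at a single scale and treats the 3x3 neighbourhood as the "preset pixel region"; both choices, the sigma values 1.0 and 1.6, and all function names are assumptions for illustration rather than details taken from the claims.

```python
import math

def gaussian_kernel(sigma):
    """Normalised 1D Gaussian kernel with radius of about 3 sigma."""
    radius = max(1, int(3 * sigma + 0.5))
    values = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(values)
    return [v / total for v in values]

def blur(img, sigma):
    """Separable Gaussian blur; indices past the border are clamped to the edge."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    h, w = len(img), len(img[0])

    def clamp(i, n):
        return min(max(i, 0), n - 1)

    rows = [[sum(k[j + r] * img[y][clamp(x + j, w)] for j in range(-r, r + 1))
             for x in range(w)] for y in range(h)]
    return [[sum(k[j + r] * rows[clamp(y + j, h)][x] for j in range(-r, r + 1))
             for x in range(w)] for y in range(h)]

def difference_of_gaussians(img, sigma1=1.0, sigma2=1.6):
    """Pixel-wise difference between a fine blur and a coarser blur."""
    fine, coarse = blur(img, sigma1), blur(img, sigma2)
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(fine, coarse)]

def local_extrema(dog):
    """(y, x) positions strictly greater or strictly smaller than all 8 neighbours."""
    points = []
    for y in range(1, len(dog) - 1):
        for x in range(1, len(dog[0]) - 1):
            neighbours = [dog[y + dy][x + dx]
                          for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
            if dog[y][x] > max(neighbours) or dog[y][x] < min(neighbours):
                points.append((y, x))
    return points

# Hypothetical 11x11 image containing a single bright pixel.
image = [[0.0] * 11 for _ in range(11)]
image[5][5] = 1.0
dog = difference_of_gaussians(image)
feature_points = local_extrema(dog)
```

For this input the bright pixel at (5, 5) is a local maximum of the difference image and is therefore reported as a feature point.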

7. The method according to any one of claims 1-3, wherein matching the feature points of the plurality of multi-view images to obtain matched feature points comprises: calculating the gradient and direction within the neighborhood of each feature point; calculating the distance between feature points in different multi-view images based on the gradient and the direction; and if the distance between the feature points is less than a preset distance threshold, determining the feature points to be matched feature points.
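One way to read claim 7 is sketched below: gradient magnitude and direction in the neighbourhood of a feature point are accumulated into an orientation histogram, and two feature points are matched when the Euclidean distance between their histograms is below a threshold. The 8-bin histogram, the 5x5 neighbourhood, the threshold 0.3, and the function names are assumptions of this example and are not stated in the claims.

```python
import math

def descriptor(img, y, x, radius=2, bins=8):
    """Orientation histogram of gradients around (y, x), weighted by magnitude.

    The point must lie at least radius + 1 pixels away from the image border.
    """
    hist = [0.0] * bins
    for py in range(y - radius, y + radius + 1):
        for px in range(x - radius, x + radius + 1):
            dx = img[py][px + 1] - img[py][px - 1]   # horizontal central difference
            dy = img[py + 1][px] - img[py - 1][px]   # vertical central difference
            magnitude = math.hypot(dx, dy)
            angle = math.atan2(dy, dx) % (2 * math.pi)
            hist[int(angle / (2 * math.pi) * bins) % bins] += magnitude
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]

def match(descs_a, descs_b, max_dist=0.3):
    """Pair each descriptor in A with its nearest descriptor in B if close enough."""
    matches = []
    if not descs_b:
        return matches
    for i, da in enumerate(descs_a):
        j, dist = min(((j, math.dist(da, db)) for j, db in enumerate(descs_b)),
                      key=lambda pair: pair[1])
        if dist < max_dist:
            matches.append((i, j))
    return matches

# Hypothetical 9x9 test images: brightness grows left-to-right in one, top-to-bottom in the other.
horizontal_ramp = [[x for x in range(9)] for _ in range(9)]
vertical_ramp = [[y for _ in range(9)] for y in range(9)]
d_h = descriptor(horizontal_ramp, 4, 4)
d_v = descriptor(vertical_ramp, 4, 4)
pairs = match([d_h], [d_v, d_h])
```

Here the horizontal-ramp descriptor is paired only with its identical copy, since the vertical-ramp descriptor falls in a different orientation bin and exceeds the distance threshold.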

8. The method according to claim 4, wherein performing sparse point cloud reconstruction and dense point cloud reconstruction based on the matched feature points to obtain the dense point cloud comprises: calculating the relative positional relationship of the cameras based on the pairwise matched feature points; generating a projection relationship based on the relative positional relationship of the cameras; generating a sparse point cloud based on the projection relationship; and expanding the sparse point cloud by estimating a depth map to generate the dense point cloud.

9. The method according to claim 8, wherein the projection relationship includes at least a view image and a set of view images, and the first preset filtering logic includes: if the center of the dense point cloud is not on the surface of the target object, filtering out the corresponding dense point cloud; and/or if the number of visualized images of the dense point cloud does not reach a preset image set threshold, filtering out the corresponding dense point cloud.

10. The method according to claim 4, wherein the second preset filtering logic includes: clustering the first dense point cloud based on RGB values to obtain a plurality of point cloud clusters; determining, as background point clouds, the point clouds in the plurality of point cloud clusters whose RGB values differ from the RGB values of the background region by less than a preset threshold; and filtering out the background point clouds from the first dense point cloud to obtain the second dense point cloud.
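The colour-based filtering of claim 10 can be sketched as below. The claim does not name a clustering algorithm; plain k-means with deterministic starting centres, comparison of each cluster's mean colour against the background colour, and the threshold of 30 are assumptions for this example, as are the function names.

```python
import math

def kmeans_rgb(colors, k, iters=20):
    """Plain k-means on RGB triples; requires len(colors) >= k."""
    step = max(1, len(colors) // k)
    centres = [tuple(colors[i * step]) for i in range(k)]   # deterministic start

    def assign():
        return [min(range(k), key=lambda c: math.dist(p, centres[c])) for p in colors]

    for _ in range(iters):
        labels = assign()
        for c in range(k):
            members = [p for p, label in zip(colors, labels) if label == c]
            if members:
                centres[c] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return assign(), centres

def filter_background(points, background_rgb, k=2, max_color_dist=30.0):
    """Drop every point whose colour cluster lies close to the background colour.

    points: list of ((x, y, z), (r, g, b)) tuples.
    """
    labels, centres = kmeans_rgb([rgb for _, rgb in points], k)
    background_clusters = {c for c in range(k)
                           if math.dist(centres[c], background_rgb) < max_color_dist}
    return [p for p, label in zip(points, labels) if label not in background_clusters]

# Hypothetical first dense point cloud: three reddish object points, three greenish background points.
first_dense = [
    ((0, 0, 0), (250, 5, 5)), ((1, 0, 0), (245, 10, 0)), ((0, 1, 0), (255, 0, 10)),
    ((5, 5, 5), (0, 250, 5)), ((5, 6, 5), (5, 255, 0)), ((6, 5, 5), (10, 245, 0)),
]
second_dense = filter_background(first_dense, background_rgb=(0, 255, 0))
```

In this example only the three reddish points remain in `second_dense`. The background colour passed in would, under claim 4, come from the outlier colour clusters found in the segmented images.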

11. A multi-view three-dimensional image reconstruction device, comprising: a background segmentation module configured to separate a target region and a background region in a plurality of acquired multi-view images; a preprocessing module configured to preprocess the target region in the segmented multi-view images to obtain feature points; a feature point matching module configured to match the feature points of the plurality of multi-view images to obtain matched feature points; a point cloud reconstruction module configured to perform sparse point cloud reconstruction and dense point cloud reconstruction based on the matched feature points to obtain a dense point cloud; a first filtering module configured to filter the dense point cloud according to a first preset filtering logic to obtain a first dense point cloud, wherein the first preset filtering logic is a filtering logic of visualization constraints; a second filtering module configured to filter the first dense point cloud according to a second preset filtering logic to obtain a second dense point cloud, wherein the second preset filtering logic is a weakly regularized filtering logic based on removing the influence of the background region, and includes filtering out, by a clustering method, point clouds in the first dense point cloud that contain the color of the background region; and a 3D model generation module configured to generate a 3D model based on the second dense point cloud.

12. An electronic device, comprising: one or more processors; and a storage device for storing one or more programs, wherein, when the one or more programs are executed by the one or more processors, the one or more processors perform the method according to any one of claims 1 to 10.

13. A computer-readable storage medium having executable instructions stored thereon, which, when executed by a processor, cause the processor to perform the method according to any one of claims 1 to 10.

14. A computer program product comprising a computer program that, when executed by a processor, implements the method according to any one of claims 1 to 10.