Data processing device, method, and program

The method addresses the challenge of unwanted objects in 3DGS by retraining the model with a small number of images to remove these objects and ensure geometric and color consistency, achieving a natural-looking 3D reconstruction.

WO2026133504A1 · PCT designated stage · Publication Date: 2026-06-25 · NT T INC

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: NT T INC
Filing Date: 2024-12-19
Publication Date: 2026-06-25


Abstract

The data processing device according to one embodiment comprises: a processing unit that derives, on the basis of a first parameter that indicates a condition at the time of capturing an image and that has been used for training a first model that reproduces a three-dimensional space from input images of a plurality of viewpoints, a second parameter indicating a condition at the time of capturing an image captured at the same place as a place where an unnecessary object is present in the first model and under a condition that the unnecessary object is not present; a processing unit that, on the basis of a rendering image obtained from the first model on the basis of the second parameter and information on the region in which the unnecessary object in the first model is present, generates a binary mask image indicating the region in which the unnecessary object in the first model is present; and a processing unit that uses the binary mask image to create a second model from which a Gaussian distribution corresponding to the region of the unnecessary object in the first model is removed, creates a fourth model by combining, with the second model, a third model in which a random initial point in a region corresponding to the Gaussian distribution removed from the first model is disposed, and outputs, on the basis of a rendering image obtained from the fourth model on the basis of the second parameter, the binary mask image, and an image captured under the condition, a model that reproduces a three-dimensional space in which the fourth model has been trained and in which the unnecessary object is not present.

Description

Data processing device, method, and program

[0001] Embodiments of the present invention relate to data processing devices, methods, and programs.

[0002] In recent years, there has been a growing expectation for services that allow users to experience actual outdoor environments in a virtual space while remaining indoors, by reconstructing a 3D space from video footage shot outdoors. To realize such services, technology is needed to reconstruct a 3D space from 2D video footage from multiple viewpoints.

[0003] One such 3D space reconstruction technique is the 3DGS (3D Gaussian Splatting) method (see, for example, Non-Patent Document 1). 3DGS represents the 3D space using a large number of Gaussian distributions. By rendering these Gaussian distributions with camera parameters that specify the conditions at the time of shooting, such as camera tilt and position, it generates a 2D image from a new viewpoint, thereby achieving 3D space reconstruction.

[0004] To create a 3DGS model that realizes such 3DGS and reconstructs a 3D space by inputting 2D images taken from multiple viewpoints, it is necessary to learn a Gaussian distribution in 3D space based on the 2D images taken from multiple viewpoints and camera parameters. As an example of application to a real service, when reconstructing a certain outdoor location as a 3D space, a challenge arises when vehicles or people are present in the video footage of the outdoors, and these appear as unwanted objects in the 3D space reproduced by the 3DGS model.

[0005] To address this challenge, a method has been proposed that utilizes Inpainting technology to interpolate unwanted object regions to create a natural appearance and generate an image from a new viewpoint where the unwanted objects do not exist (see, for example, Non-Patent Document 2).

[0006] Non-Patent Document 1: Kerbl B, Kopanas G, Leimkuhler T, Drettakis G, “3D Gaussian Splatting for Real-Time Radiance Field Rendering,” ACM Transactions on Graphics (TOG), 42, 4 (2023), 1-14
Non-Patent Document 2: Zhiheng Liu, et al., “InFusion: Inpainting 3D Gaussians via Learning Depth Completion from Diffusion Prior,” arXiv:2404.11613

[0007] The above inpainting technology utilizes a deep learning model or a generative model to interpolate unwanted object regions into a natural-looking appearance learned from a large amount of image data. Therefore, in a scene where the background of the unwanted object region is complex, it may not be possible to correctly generate the background from the surrounding elements of the unwanted object region, and the interpolation may result in an appearance different from the actual background.

[0008] This invention has been made in view of the above circumstances, and an object thereof is to provide a data processing apparatus, method, and program capable of appropriately reconstructing a three-dimensional space.

[0009] A data processing device according to one aspect of the present invention includes: a first processing unit that derives, based on a first parameter indicating the conditions at the time of shooting the images used to train a first model that takes images from multiple viewpoints as input and reproduces a three-dimensional space using Gaussian distributions, a second parameter indicating the conditions at the time of shooting an image taken at the same location as where an unwanted object exists in the first model, under conditions where the unwanted object does not exist; a second processing unit that generates a binary mask image indicating the region where the unwanted object exists in the first model, based on a rendering image obtained from the first model based on the second parameter and on information on the region where the unwanted object exists in the first model; and a third processing unit that creates a second model by removing, using the binary mask image, the Gaussian distributions corresponding to the region of the unwanted object in the first model, creates a fourth model by combining the second model with a third model in which random initial points are placed in the region corresponding to the Gaussian distributions removed from the first model, and outputs a fifth model that reproduces a three-dimensional space without the unwanted object, the fifth model being obtained by training the fourth model based on a rendering image obtained from the fourth model based on the second parameter, the binary mask image, and the image taken at the same location under conditions where the unwanted object does not exist.

A data processing device according to one aspect of the present invention includes: a first processing unit that derives, based on a first parameter indicating the conditions at the time of shooting the images used to train a first model that takes images from multiple viewpoints as input and reproduces a three-dimensional space using Gaussian distributions, a second parameter indicating the conditions at the time of shooting an image taken at the same location as where an unwanted object exists in the first model, under conditions where the unwanted object does not exist; a second processing unit that generates a binary mask image indicating the region where the unwanted object exists in the first model, based on a rendering image obtained from the first model based on the second parameter and on information on the region where the unwanted object exists in the first model; a third processing unit that, using a rendering image obtained from the first model based on the first parameter and a depth estimation image of the image taken at the same location under conditions where the unwanted object does not exist, trains a third model that reproduces a three-dimensional space using Gaussian distributions based on that image and the second parameter; and a fourth processing unit that creates a second model by removing, using the binary mask image, the Gaussian distributions corresponding to the region of the unwanted object in the first model, creates a fourth model by combining the third model with the second model, and outputs a fifth model that reproduces a three-dimensional space without the unwanted object, the fifth model being obtained by training the fourth model based on a rendering image obtained from the fourth model based on the second parameter, the binary mask image, and the image taken at the same location under conditions where the unwanted object does not exist.

[0010] A data processing method according to an aspect of the present invention is a method performed by a data processing apparatus. The method includes: deriving, by a first processing unit of the data processing apparatus, a second parameter indicating a shooting condition of an image taken under a condition where an unnecessary object does not exist at a location where the unnecessary object exists in the first model, based on a first parameter indicating a shooting condition at the time of shooting an image used for learning a first model that inputs images of a plurality of viewpoints and reproduces a three-dimensional space by a Gaussian distribution; generating, by a second processing unit of the data processing apparatus, a binary mask image indicating a region where an unnecessary object exists on the first model, based on the rendering image obtained from the first model based on the second parameter and information on a region where the unnecessary object exists on the first model; creating, by a third processing unit of the data processing apparatus, a second model in which a Gaussian distribution corresponding to a region of the unnecessary object on the first model is removed, using the binary mask image, and creating a fourth model by combining the second model with a third model in which a random initial point within a region corresponding to the removed Gaussian distribution is arranged from the first model, and outputting, based on the rendering image obtained from the fourth model, the binary mask image, and the image taken under the condition where the unnecessary object does not exist at the same location, a fifth model that has learned the fourth model and reproduces a three-dimensional space where the unnecessary object does not exist.

[0011] According to the present invention, a three-dimensional space can be appropriately reconstructed.

[0012] Figure 1 is a diagram showing an example of the application of a data processing device according to the first embodiment of the present invention.
Figure 2 is a flowchart showing an example of the procedure for processing by the data processing device according to the first embodiment of the present invention.
Figure 3 is a diagram showing an example of the output of a rendered image.
Figure 4 is a diagram showing an example of the derivation of camera parameters for a small number of images.
Figure 5 is a diagram showing an example of the creation of a binary mask image.
Figure 6 is a diagram showing an example of the creation of a combined 3DGS model.
Figure 7 is a diagram showing an example of the output of a retrained 3DGS model.
Figure 8 is a diagram showing an example of the application of a data processing device according to the second embodiment of the present invention.
Figure 9 is a flowchart showing an example of the procedure for processing by the data processing device according to the second embodiment of the present invention.
Figure 10 is a diagram showing an example of the output of a rendered image and a depth image.
Figure 11 is a diagram showing an example of the output of an optimized depth estimation image.
Figure 12 is a diagram showing an example of the error in the rendered image and the error in the depth image.
Figure 13 is a diagram showing an example of the creation of a combined 3DGS model.
Figure 14 is a diagram showing an example of the output of a retrained 3DGS model.
Figure 15 is a block diagram showing an example of the hardware configuration of a data processing device according to one embodiment of the present invention.

[0013] Embodiments relating to this invention will be described below. In this embodiment, a 3DGS model that has already been trained to reproduce a three-dimensional space is retrained using a small number of images taken under conditions different from those of the images originally input to it. This yields an optimal 3DGS model in which the Gaussian distributions of unwanted object regions have been removed from the many Gaussian distributions of the already trained 3DGS model, i.e., a 3DGS model that reproduces a three-dimensional space without unwanted object regions.

[0014] The small number of images used for this retraining are taken under different conditions, as mentioned above, so their lighting and color conditions differ from those of the original 3DGS model. The 3DGS model is therefore retrained so that these differences are matched to the lighting and color conditions of the original 3DGS model.

[0015] Furthermore, in this embodiment, when a 3DGS model is trained from the small number of images, a depth estimation model is used so that geometric consistency can be achieved between that 3DGS model and the already trained 3DGS model.

[0016] In this embodiment, since the 3DGS model is retrained using images taken under different conditions, it is possible to construct a 3DGS model that reproduces a three-dimensional space that looks the same as the actual background, even in scenes where the background of unwanted object regions is complex.

[0017] (First Embodiment) Figure 1 is a diagram showing an example of the application of a data processing device according to the first embodiment of the present invention. Figure 2 is a flowchart showing an example of the procedure of processing by the data processing device according to the first embodiment of the present invention. As shown in Figure 1, the data processing device 100 according to the first embodiment of the present invention includes a model input unit 10, a database 15, a model storage unit 20, a small number of image input unit 30, an unnecessary object region image creation unit 40, and a model retraining unit 50.

[0018] Figure 3 shows an example of the output of a rendered image. The model input unit 10 acquires an already trained 3DGS model and the camera parameters Φ_train shown in (1) below, which were used to train this 3DGS model, from database 15 or an external source (S10). This already trained 3DGS model is used as the trained 3DGS model in subsequent processing.

[0019]

[0020] Here, instead of acquiring an already trained 3DGS model, a dataset consisting of the 2D images F_train shown in (2) below and the camera parameters Φ_train may be obtained from database 15 or an external source, and a 3DGS model trained in the model input unit 10 on this dataset may be used as the trained 3DGS model in subsequent processing.
F_train = (f_{t,1}, f_{t,2}, …, f_{t,N}) ... (2)

[0021] The model input unit 10 passes the trained 3DGS model shown in (3) below to the model storage unit 20, and passes the camera parameters Φ_train used for training to the small number of image input unit 30.

[0022]

[0023] The model storage unit 20 acquires and stores the trained 3DGS model from the model input unit 10. When the camera parameters shown in (4) below are input from another unit, the model storage unit 20 outputs the rendering image I, shown in (5) below, of the trained 3DGS model for these camera parameters to the model retraining unit 50 (S20).

[0024]

[0025] In addition, the model storage unit 20 outputs the parameters P that can be obtained from the trained 3DGS model, for example the coordinates and color information of each Gaussian, to the model retraining unit 50.

[0026] Figure 4 is a diagram showing an example of deriving the camera parameters of a small number of images. The small number of image input unit 30 acquires the camera parameters Φ_train used for training from the model input unit 10. In addition, the small number of image input unit 30 acquires from the outside, as input, the small number n (≥ 2) of images F shown in (6) below, which are taken at the same location as where the unwanted objects to be removed exist in the trained 3DGS model, under conditions where the unwanted objects do not exist (S30).
F = {f_1, f_2, …, f_n} ... (6)

[0027] Here, the small number of images refers to images taken at a time different from when the images used for the trained 3DGS model were taken. Images that can be obtained from a commonly known street view dataset or the like may also be used.

[0028] Apart from the presence or absence of unwanted objects, the small number of images show the same location as where the unwanted objects exist, as described above. Therefore, based on the camera parameters Φ_train used for training, obtained from the model input unit 10, the small number of image input unit 30 derives the camera parameters Φ_F of the small number of images shown in (7) below (S40).

[0029]

[0030] These camera parameters are derived using existing 3D reconstruction tools such as COLMAP or Metashape (registered trademark), which employ SfM (Structure from Motion) and MVS (Multi-view Stereo) technologies. If COLMAP or Metashape requires images corresponding to the camera parameters Φ_train, the small number of image input unit 30 either obtains the rendering images I_train of the 3DGS model for the camera parameters Φ_train from the model storage unit 20, or obtains the 2D images F_train from the model input unit 10, and uses the result of this acquisition.

[0031] The small number of image input unit 30 passes the derived camera parameters Φ_F of the small number of images to the unwanted object region image creation unit 40, and passes the small number of images F acquired from an external source and the derived camera parameters Φ_F of the small number of images to the model retraining unit 50.

[0032] Figure 5 shows an example of creating a binary mask image. The unwanted object region image creation unit 40 receives the camera parameters Φ_F of the small number of images from the small number of image input unit 30. The unwanted object region image creation unit 40 passes the obtained camera parameters Φ_F to the model storage unit 20, and obtains from the model storage unit 20 the rendering images I_F of the trained 3DGS model for each camera parameter, shown in (8) below. These rendering images I_F can be expressed as in (9) below.
I_F = (I_{F,1}, I_{F,2}, …, I_{F,n}) ... (8)

[0033]

[0034] Next, based on input from a user who has checked the rendering images I_F of the trained 3DGS model for the camera parameters Φ_F of the small number of images, the unwanted object region image creation unit 40 creates the binary mask images M_F shown in (10) below, which indicate the unwanted object regions in the trained 3DGS model for the camera parameters Φ_F of the small number of images (S50).
M_F = (M_{F,1}, M_{F,2}, …, M_{F,n}) ... (10)

[0035] Methods for creating these binary mask images M_F include: a method in which, using Segment Anything or the like, the user views the rendering images I_F displayed on a GUI (Graphical User Interface) and specifies unwanted objects by clicking, manipulating a bounding box, or the like, to create the mask images M_F as shown in (11) below; a method in which Grounded Segment Anything or CLIPSeg is used and text indicating the unwanted objects is input to create the mask images M_F; and a method in which the user manually indicates the unwanted object regions with a paint tool or the like to create the mask images M_F.

[0036]

[0037] Here, (12) below are user input parameters used to indicate unwanted object regions, such as coordinates, bounding boxes, or text inputs.

[0038]

[0039] The unwanted object region image creation unit 40 passes the mask images M_F created as described above, which indicate the unwanted object regions in the trained 3DGS model for the camera parameters Φ_F of the small number of images, to the model retraining unit 50.

[0040] Figure 6 shows an example of creating a combined 3DGS model. The model retraining unit 50 obtains the small number of images F and the camera parameters Φ_F of the small number of images from the small number of image input unit 30, and obtains the mask images M_F from the unwanted object region image creation unit 40. The model retraining unit 50 first obtains the 3D coordinates of each Gaussian, shown in (13) below, from the parameters P of the trained 3DGS model passed from the model storage unit 20. Using the camera parameters Φ_F of the small number of images, it then derives from these 3D coordinates the 2D coordinates of each Gaussian projected onto the rendering plane, shown in (15) below, by the matrix calculation shown in (14) below.

[0041]
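The projection described in [0040] can be illustrated with a standard pinhole camera model. The following is a minimal sketch, not the calculation of (14) itself: it assumes that the camera parameters Φ_F consist of a 3 x 3 intrinsic matrix K and a world-to-camera rotation R and translation t. The function name `project_points` and all variable names are hypothetical.

```python
import numpy as np

def project_points(xyz, K, R, t):
    """Project N x 3 world points to pixel coordinates with a pinhole camera.

    Assumed convention (not from the source): x_cam = R @ x_world + t,
    and K is the 3 x 3 intrinsic matrix. Returns (uv, depth), where uv is
    N x 2 and depth is the z coordinate in the camera frame.
    """
    xyz = np.asarray(xyz, dtype=float)
    cam = xyz @ R.T + t            # world -> camera coordinates
    depth = cam[:, 2]
    pix = cam @ K.T                # apply intrinsics (homogeneous pixels)
    uv = pix[:, :2] / pix[:, 2:3]  # perspective divide
    return uv, depth

# Two Gaussian centers seen by a camera at the origin looking along +z.
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)
centers = np.array([[0.0, 0.0, 2.0], [1.0, -0.5, 4.0]])
uv, depth = project_points(centers, K, R, t)
```

Returning the camera-frame depth alongside the 2D coordinates makes it possible to ignore Gaussians that lie behind the camera in later steps.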

[0042] The model retraining unit 50 uses the derived 2D coordinates and the mask images M_F passed from the unwanted object region image creation unit 40 to determine which Gaussians in the trained 3DGS model constitute unwanted objects.

[0043] The model retraining unit 50 removes the Gaussians whose 2D coordinates fall within the region where the value of the mask image shown in (16) below is 1, and creates the 3DGS model shown in (17) below, that is, a 3DGS model in which the Gaussian distributions corresponding to unwanted object regions have been removed from the numerous Gaussian distributions in the original 3DGS model (S60).

[0044]
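The selection and removal in [0042] and [0043] might be sketched as follows. This is an illustration under stated assumptions: the projected 2D coordinates are rounded to the nearest pixel, and a Gaussian is removed if it is flagged in any of the n views (the source does not say how the views are combined). The function names and the dictionary layout of the parameters are hypothetical.

```python
import numpy as np

def gaussians_in_mask(uv, depth, mask):
    """Return a boolean array marking Gaussians whose projected center
    lands on a mask pixel with value 1.

    uv: N x 2 pixel coordinates (x, y); depth: N camera-frame depths;
    mask: H x W binary array. Points behind the camera or outside the
    image are never flagged.
    """
    h, w = mask.shape
    x = np.round(uv[:, 0]).astype(int)
    y = np.round(uv[:, 1]).astype(int)
    visible = (depth > 0) & (x >= 0) & (x < w) & (y >= 0) & (y < h)
    flagged = np.zeros(len(uv), dtype=bool)
    flagged[visible] = mask[y[visible], x[visible]] == 1
    return flagged

def remove_flagged(params, flagged_per_view):
    """Drop Gaussians flagged in any view. Using "any" is an assumption;
    the source does not state how the n views are combined."""
    remove = np.any(np.stack(flagged_per_view), axis=0)
    kept = {name: value[~remove] for name, value in params.items()}
    return kept, remove

mask = np.zeros((4, 4), dtype=int)
mask[1:3, 1:3] = 1                      # unwanted object region
uv = np.array([[1.2, 1.9], [0.0, 0.0], [2.0, 2.0], [9.0, 9.0]])
depth = np.array([1.0, 1.0, -1.0, 1.0])
flagged = gaussians_in_mask(uv, depth, mask)
params = {"xyz": np.arange(12.0).reshape(4, 3), "rgb": np.ones((4, 3))}
kept, removed = remove_flagged(params, [flagged])
```

Only the first Gaussian is removed here: the second projects outside the masked region, the third is behind the camera, and the fourth falls outside the image.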

[0045] Subsequently, the model retraining unit 50 fixes the parameters of the created 3DGS model so that their values are not updated during training. The model retraining unit 50 then creates the new 3DGS model shown in (19) below by combining a 3DGS model shown in (18) below, in which random initial points set by the user are placed within the region corresponding to the removed Gaussian distributions, with the 3DGS model shown in (17) above, from which the Gaussian distributions corresponding to the unwanted object region have been removed.

[0046]
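The combination step in [0045] can be sketched as below. This is a minimal illustration with several assumptions: the random initial points are drawn uniformly inside the axis-aligned bounding box of the removed Gaussian centers, each Gaussian is reduced to a position and a color, and the fixed parameters are represented by a boolean `trainable` flag. None of these choices come from (18) or (19).

```python
import numpy as np

def combine_with_random_points(kept, removed_xyz, num_new, seed=0):
    """Build the combined model: kept Gaussians plus random initial points.

    New points are drawn uniformly inside the axis-aligned bounding box of
    the removed Gaussian centers (an assumed sampling rule). The returned
    `trainable` flag is False for kept Gaussians, so an optimizer can skip
    them, and True for the new points.
    """
    rng = np.random.default_rng(seed)
    lo, hi = removed_xyz.min(axis=0), removed_xyz.max(axis=0)
    new_xyz = rng.uniform(lo, hi, size=(num_new, 3))
    new_rgb = rng.uniform(0.0, 1.0, size=(num_new, 3))
    combined = {
        "xyz": np.concatenate([kept["xyz"], new_xyz]),
        "rgb": np.concatenate([kept["rgb"], new_rgb]),
    }
    trainable = np.concatenate([
        np.zeros(len(kept["xyz"]), dtype=bool),
        np.ones(num_new, dtype=bool),
    ])
    return combined, trainable

kept = {"xyz": np.zeros((5, 3)), "rgb": np.full((5, 3), 0.5)}
removed_xyz = np.array([[1.0, 1.0, 1.0], [2.0, 3.0, 4.0]])
combined, trainable = combine_with_random_points(kept, removed_xyz, num_new=10)
```

Keeping the fixed and trainable Gaussians in one array with a mask, rather than in two separate models, lets a renderer treat the combined model as a single scene.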

[0047] Figure 7 shows an example of the output of a retrained 3DGS model. The model retraining unit 50 then trains the new 3DGS model described above using the created 3DGS model, the small number of images F, and the camera parameters Φ_F of the small number of images. During this training, for the rendering images I_ref, shown in (20) below, of the new 3DGS model shown in (19) above for the camera parameters Φ_F of the small number of images, the model retraining unit 50 trains the new 3DGS model using the rendering image error used for training an ordinary 3DGS model, shown in (21) below, together with an error for evaluating whether the unwanted object region and the other regions have the same color, shown in (22) below.
I_ref = (I_{r,1}, I_{r,2}, …, I_{r,N}) ... (20)

[0048]

[0049] The U(•) in (22) above is a feature extractor that extracts the color characteristics of the input image, and (23) below represents the extracted feature quantity.

[0050]

[0051] Furthermore, the loss function shown in (24) below, relating to the aforementioned error, may be a general L1 norm or L2 norm error function, or an error function using feature similarity. In addition, the feature extractor may be a deep learning model used in image harmonization, a histogram measuring the distribution of colors, or the mean or variance of the vectorized colors of the input image.

[0052]
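The two error terms described in [0047] to [0051] can be sketched as follows. Because the exact forms of (21), (22), and (24) are not reproduced here, this is only one possible instance: it assumes an L1 loss, takes the feature extractor U(•) to be the per-channel mean and variance, and compares features of the rendered image inside and outside the mask. The `weight` parameter and the function names are hypothetical.

```python
import numpy as np

def color_features(image, region):
    """U(.): per-channel mean and variance of the pixels selected by `region`.
    This is one of the feature extractors the text lists; the exact form
    used in (22) is an assumption."""
    pixels = image[region]                      # K x 3
    return np.concatenate([pixels.mean(axis=0), pixels.var(axis=0)])

def retraining_loss(rendered, target, mask, weight=1.0):
    """L1 rendering error plus a color-consistency term.

    rendered, target: H x W x 3 float images; mask: H x W binary array
    (1 = unwanted object region). `weight` is a hypothetical balancing
    factor that does not appear in the source.
    """
    render_err = np.abs(rendered - target).mean()
    inside = mask.astype(bool)
    color_err = np.abs(
        color_features(rendered, inside) - color_features(rendered, ~inside)
    ).mean()
    return render_err + weight * color_err

mask = np.zeros((4, 4), dtype=int)
mask[:2, :] = 1
target = np.full((4, 4, 3), 0.5)
same = retraining_loss(target.copy(), target, mask)
shifted = target.copy()
shifted[:2] += 0.2          # make the masked region brighter
worse = retraining_loss(shifted, target, mask)
```

In this toy case the loss is zero when the rendering matches the target and the two regions share the same color statistics, and it grows when the masked region is brightened.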

[0053] Finally, the model retraining unit 50 outputs an optimal 3DGS model that reproduces a 3D space free of unwanted objects, as shown in (25) below, as a result of training the new 3DGS model described above (S70).

[0054]

[0055] (Second Embodiment) As a second embodiment, an example is shown in which a 3DGS model is created from a small number of images, and this created 3DGS model is combined with a base 3DGS model and retrained. Figure 9 is a flowchart showing an example of the processing procedure by the data processing device according to the second embodiment of the present invention. Figure 10 is a diagram showing an example of the output of a rendering image and a depth image. The data processing device 100a according to the second embodiment includes, in addition to the model input unit 10, database 15, model storage unit 20, small number of image input unit 30, unnecessary object region image creation unit 40, and model retraining unit 50 described in the first embodiment, a depth parameter derivation unit 60 and a small number of image model learning unit 70.

[0056] First, similar to S10 described in the first embodiment, the model input unit 10 acquires the trained 3DGS model and the camera parameters used to train this 3DGS model (S110). The model input unit 10 passes the trained 3DGS model to the model storage unit 20 and passes the camera parameters Φ_train used for training to the small number of image input unit 30.

[0057] Figure 10 shows an example of the output of a rendering image and a depth image. The model storage unit 20 acquires and stores the learned 3DGS model described above from the model input unit 10. When the camera parameters shown in (4) above are input from another unit, the model storage unit 20 outputs a rendering image I of the learned 3DGS model described above for these input camera parameters, as well as a depth image D, as shown in (26) below (S120).

[0058]

[0059] Next, similar to S30 and S40 described in the first embodiment, the small number of image input unit 30 acquires the small number of images F (S130) and derives the camera parameters Φ_F of the small number of images (S140). Next, similar to S50 described in the first embodiment, the unwanted object region image creation unit 40 creates the binary mask images M_F (S150).

[0060] Figure 11 shows an example of the output of an optimized depth estimation image. Next, the depth parameter derivation unit 60 acquires the camera parameters Φ_train used for training from the model input unit 10 and passes the acquired camera parameters Φ_train to the model storage unit 20. From the model storage unit 20, it obtains the rendering images I_train of the trained 3DGS model for each camera parameter and the depth images D_train corresponding to the rendering images I_train.

[0061] The depth parameter derivation unit 60 uses a depth estimation model such as MiDaS or Depth Anything to compute the depth estimation images E_train of the rendering images I_train. Because the scale and offset of the depth information differ between the depth images D_train and the depth estimation images E_train, the depth parameter derivation unit 60 derives the optimized depth parameters, namely the scale variable s* and the offset variable t*, by solving the optimization problem shown in (27) below (S160).

[0062]
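Deriving a scale and offset that align E_train with D_train, as described in [0061], can be illustrated with an ordinary least-squares fit. The sketch below assumes a squared-error objective of the form s·E + t ≈ D over all pixels; whether (27) uses exactly this objective is not confirmed by the text. The function name `fit_scale_offset` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_scale_offset(estimated, reference):
    """Least-squares fit of s, t such that s * estimated + t ~= reference.

    estimated: depth estimation images E_train; reference: depth images
    D_train rendered from the trained model. Both are arrays of the same
    shape. A squared-error objective is assumed here.
    """
    e = np.asarray(estimated, dtype=float).ravel()
    d = np.asarray(reference, dtype=float).ravel()
    A = np.stack([e, np.ones_like(e)], axis=1)   # columns: [E, 1]
    (s, t), *_ = np.linalg.lstsq(A, d, rcond=None)
    return s, t

rng = np.random.default_rng(0)
E_train = rng.uniform(0.1, 1.0, size=(3, 8, 8))   # synthetic depth estimates
D_train = 4.0 * E_train + 1.5                     # synthetic rendered depth
s_opt, t_opt = fit_scale_offset(E_train, D_train)
```

With noise-free synthetic data the fit recovers the scale and offset used to generate D_train; on real depth maps the result would only be the best linear alignment in the least-squares sense.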

[0063] The depth parameter derivation unit 60 passes the optimized depth parameters, namely the scale variable s* and the offset variable t*, to the small number of image model learning unit 70. The small number of image model learning unit 70 obtains the small number of images F and the camera parameters Φ_F of the small number of images from the small number of image input unit 30, and obtains the optimized depth parameters s* and t* from the depth parameter derivation unit 60.

[0064] The small number of image model learning unit 70 uses a depth estimation model to compute the depth estimation images E_F from the acquired small number of images F, and, from the scale variable s* and the offset variable t*, obtains the optimized depth estimation images shown in (28) below.

[0065]

[0066] Figure 12 shows an example of errors in the rendering image and depth image. The small number of image model learning unit 70 then trains a 3DGS model for the small number of images, that is, a 3DGS model that reproduces the three-dimensional space using Gaussian distributions based on the small number of images F and the camera parameters Φ_F of the small number of images. Here, the small number of image model learning unit 70 trains this 3DGS model using, in addition to the rendering image error used for training an ordinary 3DGS model shown in (29) below, the depth image error shown in (30) below, so that the scale of the depth information matches that of the already trained 3DGS model (S170).

[0067]
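The combined objective described in [0066] might look like the following. This is a minimal sketch that assumes L1 errors, assumes the optimized depth estimation image has the form s*·E_F + t*, and adds a hypothetical `depth_weight` factor; the exact forms of (28), (29), and (30) are not reproduced here.

```python
import numpy as np

def few_image_loss(rendered, target, rendered_depth, estimated_depth,
                   s, t, depth_weight=1.0):
    """L1 rendering error plus L1 depth error against the aligned estimate.

    estimated_depth is the raw output of a depth estimation model for one
    of the few images; s and t are the scale and offset derived beforehand.
    `depth_weight` is a hypothetical balancing factor.
    """
    aligned = s * estimated_depth + t
    render_err = np.abs(rendered - target).mean()
    depth_err = np.abs(rendered_depth - aligned).mean()
    return render_err + depth_weight * depth_err

target = np.full((4, 4, 3), 0.5)
est = np.linspace(0.1, 1.0, 16).reshape(4, 4)
s, t = 4.0, 1.5
good = few_image_loss(target, target, s * est + t, est, s, t)
bad = few_image_loss(target, target, est, est, s, t)   # depth at the wrong scale
```

The depth term penalizes a model whose rendered depth is at a different scale from the aligned estimate, which is what ties the few-image model to the geometry of the already trained one.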

[0068] As the loss function described above, an L1 norm or L2 norm error function may be used. The small number of image model learning unit 70 outputs the trained 3DGS model for the small number of images.

[0069] Figure 13 shows an example of the creation of a combined 3DGS model. Similar to S60 described in the first embodiment, the model retraining unit 50 creates the 3DGS model shown in (17) above, in which the Gaussian distributions corresponding to the unwanted object regions in the original 3DGS model have been removed (S180). The model retraining unit 50 then acquires the newly trained 3DGS model for the small number of images from the small number of image model learning unit 70, and combines it with the 3DGS model from which the Gaussian distributions corresponding to the unwanted object regions have been removed, to create the new 3DGS model shown in (31) below.

[0070]

[0071] Figure 14 shows an example of the output of a retrained 3DGS model. The model retraining unit 50 then trains this newly created 3DGS model and, similar to S70 described in the first embodiment, outputs an optimal 3DGS model that reproduces a three-dimensional space free of unwanted objects (S190).

[0072] In this embodiment, when the model retraining unit 50 uses a 3DGS model that has been trained on a small number of images, the rendering image error shown in (29) above does not need to be used for retraining the 3DGS model because it is already combined with the 3DGS model that has been trained on a small number of images. Instead, it uses the error shown in (22) above, which evaluates whether the unwanted object region and the other regions have the same color, to output a retrained 3DGS model in which no unwanted objects exist.

[0073] Figure 15 is a block diagram showing an example of the hardware configuration of a data processing device according to one embodiment of the present invention. In the example shown in Figure 15, the data processing device 100 according to the above embodiment is configured by, for example, a server computer or a personal computer, and has a hardware processor 111A such as a CPU (Central Processing Unit). A program memory 111B, a data memory 112, an input / output interface 113, and a communication interface 114 are connected to this hardware processor 111A via a bus 115. The data processing device 100 will be described below, but the same applies to the data processing device 100a shown in Figure 8.

[0074] The communication interface 114 includes, for example, one or more wireless communication interface units, enabling the transmission and reception of information with the communication network. As the wireless interface, for example, an interface employing a low-power wireless data communication standard such as a wireless LAN (Local Area Network) is used.

[0075] The input / output interface 113 is connected to an input device 500 and an output device 600, which are attached to the data processing device 100 and used by users or the like.

[0076] The input / output interface 113 can capture operation data entered by a user or the like through an input device 500 such as a keyboard, touch panel, or touchpad, and can also send output data to an output device 600, including a display device using liquid crystal or organic EL (electroluminescence), for display. The input device 500 and output device 600 may be devices built into the data processing device 100, or they may be input and output devices of other information terminals that can communicate with the data processing device 100 via a network.

[0077] The program memory 111B is a non-transitory tangible storage medium that combines, for example, a non-volatile memory that can be written to and read from at any time, such as an HDD (Hard Disk Drive) or SSD (Solid State Drive), with another non-volatile memory such as ROM (Read Only Memory), and stores the programs necessary for executing the various control processes according to one embodiment.

[0078] The data memory 112 is a tangible storage medium that, for example, uses a combination of the above-mentioned non-volatile memory and volatile memory such as RAM (Random Access Memory), and can be used to store various data or information acquired and created during the process of various operations.

[0079] The data processing device 100 according to one embodiment of the present invention may be configured as a data processing device having, as software-based processing function units, the parts of the data processing device 100 shown in Figure 1, for example, the model input unit 10, the database 15, the model storage unit 20, the small number of image input unit 30, the unnecessary object region image creation unit 40, and the model retraining unit 50.

[0080] The storage devices and database 15 used as work memory by each part of the data processing device 100 may be configured using the data memory 112 shown in Figure 15. However, the storage areas configured in these storage devices are not essential to the data processing device 100, and may be areas provided in external storage media such as USB (Universal Serial Bus) memory, or in storage devices such as database servers located in the cloud.

[0081] The processing functions in the model input unit 10, model storage unit 20, small number of image input unit 30, unnecessary object region image creation unit 40, and model retraining unit 50 can all be implemented by having the hardware processor 111A read and execute a program stored in the program memory 111B. Some or all of these processing functions may be implemented in various other forms, including application-specific integrated circuits (ASICs) or field-programmable gate arrays (FPGAs).

[0082] Furthermore, the methods described in each embodiment can be stored as programs (software means) that can be executed by a computer on recording media such as magnetic disks (floppy disks, hard disks, etc.), optical disks (CD-ROMs, DVDs, MOs, etc.), and semiconductor memories (ROMs, RAMs, flash memories, etc.), and can also be transmitted and distributed via communication media. The programs stored on the media also include configuration programs that configure the computer to run software means (including not only the execution program but also tables or data structures). The computer implementing this device reads the program recorded on the recording media and, if necessary, constructs the software means using the configuration program, and executes the above-described processes by controlling the operation of this software means. Note that the recording media referred to in this specification are not limited to those for distribution, but also include storage media such as magnetic disks or semiconductor memories provided inside the computer or in devices connected via a network.

[0083] It should be noted that the present invention is not limited to the embodiments described above, and can be modified in various ways during implementation without departing from its essence. Furthermore, each embodiment may be combined as appropriate, and in that case, the combined effects can be obtained. Moreover, the above embodiments include various inventions, and various inventions can be extracted by selecting combinations from the multiple constituent elements disclosed. For example, if the problem can be solved and effects obtained even if some constituent elements are deleted from all the constituent elements shown in the embodiment, then the configuration with these deleted constituent elements can be extracted as an invention.

[0084] 100, 100a: Data processing device; 10: Model input unit; 15: Database; 20: Model storage unit; 30: Small number of image input unit; 40: Unnecessary object region image creation unit; 50: Model retraining unit; 60: Depth parameter derivation unit; 70: Small number of image model learning unit

Claims

1. A data processing device comprising: a first processing unit that derives, based on a first parameter that indicates the conditions at the time of image capture and that was used to train a first model that reproduces a three-dimensional space using a Gaussian distribution from input images of multiple viewpoints, a second parameter indicating the conditions at the time of capture of an image taken at the same location as where an unwanted object exists in the first model, but under conditions where the unwanted object is not present; a second processing unit that generates a binary mask image indicating the region in the first model where the unwanted object is present, based on a rendering image obtained from the first model based on the second parameter and on information on the region in the first model where the unwanted object is present; and a third processing unit that uses the binary mask image to create a second model in which the Gaussian distribution corresponding to the region of the unwanted object in the first model is removed, creates a fourth model by combining the second model with a third model in which random initial points are placed in the region corresponding to the Gaussian distribution removed from the first model, and outputs a fifth model that reproduces a three-dimensional space without the unwanted object, obtained by training the fourth model based on a rendering image obtained from the fourth model based on the second parameter, the binary mask image, and an image taken at the same location under conditions where the unwanted object is not present.

2. A data processing device comprising: a first processing unit that derives, based on a first parameter that indicates the conditions at the time of image capture and that was used to train a first model that reproduces a three-dimensional space using a Gaussian distribution from input images of multiple viewpoints, a second parameter indicating the conditions at the time of capture of an image taken at the same location as where an unwanted object exists in the first model, but under conditions where the unwanted object is not present; a second processing unit that generates a binary mask image indicating the region where the unwanted object exists in the first model, based on a rendering image obtained from the first model based on the second parameter and on information on the region where the unwanted object exists in the first model; a third processing unit that obtains a depth-estimated image of the image taken at the same location under conditions where the unwanted object is not present, and trains a third model that reproduces a three-dimensional space using a Gaussian distribution, based on the image taken at the same location under conditions where the unwanted object is not present and the second parameter, using the rendering image obtained from the first model and the depth-estimated image based on the first parameter; and a fourth processing unit that uses the binary mask image to create a second model in which the Gaussian distribution corresponding to the region of the unwanted object in the first model is removed, creates a fourth model by combining the third model with the second model, and outputs a fifth model that reproduces a three-dimensional space without the unwanted object, obtained by training the fourth model based on a rendering image obtained from the fourth model based on the second parameter, the binary mask image, and the image taken at the same location under conditions where the unwanted object is not present.

3. A data processing method performed by a data processing device, comprising: a first processing unit of the data processing device deriving, based on a first parameter that indicates the conditions at the time of image capture and that was used to train a first model that reproduces a three-dimensional space using a Gaussian distribution from input images of multiple viewpoints, a second parameter indicating the conditions at the time of capture of an image taken at the same location as where an unwanted object exists in the first model, but under conditions where the unwanted object is not present; a second processing unit of the data processing device generating a binary mask image indicating the region where the unwanted object exists in the first model, based on the second parameter and information on the region where the unwanted object exists in the first model; and a third processing unit of the data processing device using the binary mask image to create a second model in which the Gaussian distribution corresponding to the region of the unwanted object in the first model is removed, creating a fourth model by combining the second model with a third model in which random initial points are placed in the region corresponding to the Gaussian distribution removed from the first model, and outputting a fifth model that reproduces a three-dimensional space without the unwanted object, obtained by training the fourth model based on a rendering image obtained from the fourth model based on the second parameter, the binary mask image, and an image taken at the same location under conditions where the unwanted object is not present.

4. A data processing program that causes a processor to function as a component of the data processing device described in claim 1 or claim 2.