A system, method, and program for creating datasets for training image generation models.

The system addresses the labor-intensive requirement of high-precision 3D scene generation by aligning images and point cloud data in equirectangular formats, facilitating the creation of complete 3D virtual spaces with reduced effort and increased efficiency.

JP2026100352A · Pending · Publication Date: 2026-06-19 · 株式会社MATRIX

Patent Information

Authority / Receiving Office: JP
Patent Type: Applications
Current Assignee / Owner: 株式会社MATRIX
Filing Date: 2024-12-09
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Generating high-precision 3D scenes using techniques like Gaussian splatting requires a large number of photographs, which is labor-intensive.

Method used

A system for creating datasets that includes multiple pairs of images and point cloud images, aligning viewpoints, and generating equirectangular formats to facilitate the training of image generation models, enabling the generation of high-precision 3D scenes with a reduced number of photographs.

Benefits of technology

Enables the construction of high-precision 3D scenes by generating a large number of images from point cloud data, allowing for more complete 3D virtual spaces with improved efficiency and reduced effort.

Generated by Eureka AI based on patent content.


Abstract

[Problem] To provide a system for creating datasets for training image generation models. [Solution] A system for creating a dataset for training an image generation model comprises: a first receiving means for receiving a plurality of images acquired at a plurality of locations, wherein the plurality of locations include a first location and a second location, and the plurality of images include a first image acquired at the first location and a second image acquired at the second location; a second receiving means for receiving point cloud data of a predetermined three-dimensional space, wherein the predetermined three-dimensional space includes the first location and the second location; and a generating means for generating a plurality of point cloud images from the point cloud data, wherein the generating means generates a first point cloud image having the same viewpoint and field of view as the first image and a second point cloud image having the same viewpoint and field of view as the second image.

Description

【Technical Field】
【0001】 The present invention relates to a system, method, and program for creating a dataset for training an image generation model. The present invention also relates to an image generation model trained on a dataset created by such a system or the like, and to a method or the like for creating, using that image generation model, a dataset for training a 3D scene generation model.
【Background Art】
【0002】 Techniques for generating a 3D scene from a plurality of photographs are known. A 3D scene generated from real-world photographs can be used, for example, to construct a virtual space representing the real world, and a user can virtually experience the real world by wearing a head-mounted display or the like and immersing themselves in the virtual space.
【0003】 As a technique for generating a 3D model from a plurality of photographs, Gaussian splatting has been developed (Non-Patent Document 1).
【Prior Art Documents】
【Non-Patent Documents】
【0004】 【Non-Patent Document 1】 Bernhard Kerbl et al., “3D Gaussian Splatting for Real-Time Radiance Field Rendering”, ACM Trans. Graph., Vol. XX, No. X, Article . Publication date: August XXXX.
【Summary of the Invention】
【Problems to be Solved by the Invention】
【0005】 Generating high-precision 3D scenes using techniques such as Gaussian splatting requires a large number of photographs. However, taking a large number of photographs requires a tremendous amount of effort.
【0006】 One objective of the present invention is to provide a system and the like for creating datasets for training image generation models, in order to facilitate the generation of high-precision 3D scenes.
【Means for Solving the Problem】
【0007】 The present invention provides a system for creating a dataset that includes a plurality of pairs of images and point cloud images.
The present invention provides, for example, the following items:
(Item 1) A system for creating a dataset for training an image generation model, the system comprising:
a first receiving means for receiving a plurality of images acquired at a plurality of locations, wherein the plurality of locations include a first location and a second location, and the plurality of images include a first image acquired at the first location and a second image acquired at the second location;
a second receiving means for receiving point cloud data of a predetermined three-dimensional space, wherein the predetermined three-dimensional space includes the first location and the second location; and
a generation means for generating a plurality of point cloud images from the point cloud data, wherein the generation means generates a first point cloud image having the same viewpoint and field of view as the first image, and a second point cloud image having the same viewpoint and field of view as the second image,
wherein the dataset includes a plurality of pairs of the plurality of images and the plurality of point cloud images, and the plurality of pairs include a pair of the first image and the first point cloud image, and a pair of the second image and the second point cloud image.
(Item 2) The system according to the above item, wherein each of the plurality of images is an equirectangular image, and each of the plurality of point cloud images is an equirectangular point cloud image.
(Item 3) The system according to any one of the above items, wherein the generation means generates a point cloud image in equirectangular format by:
changing the relative relationship between each point in the point cloud data and a reference point so that a given viewpoint is aligned with the reference point;
calculating, for each point in the changed point cloud data, the distance r from the reference point;
calculating the azimuth angle θ and the elevation angle φ based on the distance r; and
mapping each point to its respective (θ,φ).
(Item 4) The system according to any one of the above items, wherein changing the relative relationship between each point in the point cloud data and the reference point includes:
calculating a rotation matrix for the point cloud data; and
translating each point in the point cloud data and/or rotating it using the rotation matrix.
(Item 5) An image generation model trained using a dataset created by the system according to any one of the above items.
(Item 6) A method for creating a dataset for training a 3D scene generation model, the method comprising:
generating, by the generation means of the system according to any one of the above items, a third point cloud image for a third location in the predetermined three-dimensional space; and
generating, by the image generation model according to item 5 upon receiving the third point cloud image as input, a third image that appears to have been acquired at the third location,
wherein the dataset includes the first image, the second image, and the third image.
(Item 7) The method according to any one of the above items, further comprising:
generating at least one first partial image from the first image;
generating at least one second partial image from the second image; and
generating at least one third partial image from the third image,
wherein the dataset includes the at least one first partial image instead of the first image, the at least one second partial image instead of the second image, and the at least one third partial image instead of the third image.
(Item 8) A 3D scene generation model trained on a dataset created by the method according to item 6 or item 7.
(Item 9) A method for creating a dataset for training an image generation model, the method comprising:
receiving a plurality of images acquired at a plurality of locations, wherein the plurality of locations include a first location and a second location, and the plurality of images include a first image acquired at the first location and a second image acquired at the second location;
receiving point cloud data of a predetermined three-dimensional space, wherein the predetermined three-dimensional space includes the first location and the second location; and
generating a plurality of point cloud images from the point cloud data, including generating a first point cloud image having the same field of view as the first image and a second point cloud image having the same field of view as the second image,
wherein the dataset includes a plurality of pairs of the plurality of images and the plurality of point cloud images, the plurality of pairs including a pair of the first image and the first point cloud image, and a pair of the second image and the second point cloud image.
(Item 9A) The method according to item 9, having the features described in any one of the above items.
(Item 10) A program for creating a dataset for training an image generation model, wherein the program is executed on a computer equipped with a processor and causes the processor to execute a process including:
receiving a plurality of images acquired at a plurality of locations, wherein the plurality of locations include a first location and a second location, and the plurality of images include a first image acquired at the first location and a second image acquired at the second location;
receiving point cloud data of a predetermined three-dimensional space, wherein the predetermined three-dimensional space includes the first location and the second location; and
generating a plurality of point cloud images from the point cloud data, including generating a first point cloud image having the same field of view as the first image and a second point cloud image having the same field of view as the second image,
wherein the dataset includes a plurality of pairs of the plurality of images and the plurality of point cloud images, the plurality of pairs including a pair of the first image and the first point cloud image, and a pair of the second image and the second point cloud image.
(Item 10A) The program according to item 10, having the features described in any one of the above items.
(Item 10B) A computer-readable storage medium storing the program according to item 10 or item 10A.
【Advantages of the Invention】
【0008】 According to the present invention, it is possible to provide a system and the like for creating a dataset for training an image generation model. This enables the construction of an image generation model capable of outputting a large number of photographs, and the large number of photographs output from the image generation model in turn facilitates the generation of a high-precision three-dimensional scene.
【Brief Description of the Drawings】
【0009】
[Figure 1A] A diagram showing an example of a flow for generating a three-dimensional scene using the system of the present invention and presenting it to a user.
[Figure 1B] A diagram showing an example of a flow for generating a three-dimensional scene using the system of the present invention and presenting it to a user.
[Figure 2A] A diagram showing an example of the configuration of the dataset generation system 100.
[Figure 2B] A diagram showing an example of the configuration of the system 1000.
[Figure 2C] A diagram showing another example of the configuration of the system 1000.
[Figure 3] A diagram showing an example of the configuration when the system 1000 is implemented as a server device.
[Figure 4] A diagram showing an example of the configuration of the system 1000 implemented as a server device.
[Figure 5A] A diagram illustrating an example of aligning the reference point O of the point cloud data with the desired viewpoint P.
[Figure 5B] A diagram illustrating an example of aligning the reference point O of the point cloud data with the desired viewpoint P.
[Figure 6] A diagram showing a detailed example of generating an equirectangular point cloud image at viewpoint P from the point cloud data.
[Figure 7] A flowchart showing an example of processing 700 in the system 1000 of the present invention.
[Figure 8] A flowchart showing an example of processing 800 in the system 1000 of the present invention.
【Modes for Carrying Out the Invention】
【0010】 Embodiments of the present invention will be described below with reference to the drawings.
【0011】 1. Flow for generating a 3D scene
Figures 1A and 1B illustrate an example of a flow for generating a 3D scene using the system of the present invention and presenting it to a user. The system of the present invention comprises a dataset generation system 100, an image generation model 200, and a 3D scene generation model 300. Figure 1A shows an example of a flow for the training phase of the image generation model 200, and Figure 1B shows an example of a flow for the utilization phase of the image generation model 200 that has been trained according to the flow in Figure 1A. A 3D scene is generated from the output of the image generation model 200 and presented to user U. The flows in Figures 1A and 1B can be performed, for example, by a service provider that provides users with an experience in a 3D virtual space.
【0012】 In the training phase flow shown in Figure 1A, input data for the dataset generation system 100 is first prepared.
The input data includes multiple images I and point cloud data C of a three-dimensional space. Here, the multiple images I are images acquired at points in the three-dimensional space represented by the point cloud data C (for example, photographs taken with a camera). For example, the multiple images I include a first image and a second image, where the first image is an image acquired at a first point in the three-dimensional space represented by the point cloud data C, and the second image is an image acquired at a second point in the three-dimensional space represented by the point cloud data C. The multiple images I may be, for example, images taken at intervals of 5 to 10 meters in the three-dimensional space.
【0013】 The multiple images I may be, for example, images taken from a viewpoint in only a specific direction (i.e., images with a field of view in only a specific direction, so-called planar photographs), or images taken from a viewpoint in all directions (i.e., images with a field of view in all directions). Images taken from a viewpoint in all directions are also called equirectangular images. The multiple images I are preferably equirectangular images. An equirectangular image captures the field of view in all directions at once, yielding in a single shot the same amount of data as many planar photographs, which improves work efficiency.
【0014】 The point cloud data C contains the 3D coordinate values and color values of each of a plurality of feature points in 3D space. The 3D coordinate values are typically represented by (x,y,z) in a Cartesian coordinate system, but may also be represented by (r,θ,z) in a cylindrical coordinate system, for example, or by (r,θ,φ) in a spherical coordinate system. The color values are typically represented by RGB values, but may also be represented by CMY values, for example.
The point cloud data C is typically data acquired by a laser scanner, but may be acquired by any other known method.
【0015】 In step S1, multiple images I and point cloud data C are input to the dataset generation system 100.
【0016】 When multiple images I and point cloud data C are input, system 100 generates multiple point cloud images corresponding to the multiple images I. A point cloud image is a visual representation of point cloud data. More specifically, a point cloud image can be formed by coloring, on the image plane, the pixel determined by the coordinate values of each point in the point cloud data with the color corresponding to the color value of that point.
【0017】 The generated point cloud images have the same field of view as the corresponding images I. For example, if the images I have a field of view only in a specific direction from the viewpoint, the generated point cloud images will also have a field of view only in that specific direction from the viewpoint. Similarly, if the images I are in equirectangular format, the generated point cloud images will also be in equirectangular format.
【0018】 Specifically, system 100 generates, for example, a first point cloud image corresponding to a first image among the multiple images I, and a second point cloud image corresponding to a second image among the multiple images I. The first point cloud image has the same viewpoint as the first image, and the second point cloud image has the same viewpoint as the second image. In other words, the first point cloud image corresponds to a point cloud representation of the first image, and the second point cloud image corresponds to a point cloud representation of the second image.
【0019】 In step S2, the system 100 outputs a training dataset T consisting of multiple images I and multiple point cloud images corresponding to the multiple images I.
In the training dataset T, each of the multiple images I is paired with a corresponding point cloud image. For example, the first image is paired with the first point cloud image, and the second image is paired with the second point cloud image. 【0020】 In step S3, the training dataset T is input to the image generation model 200. The image generation model 200 is trained using the training dataset T. Specifically, the image generation model 200 learns the relationship between images and corresponding point cloud images using pairs of multiple images I and their corresponding point cloud images. As a result, the image generation model 200 can output a corresponding image when given a point cloud image as input. Therefore, as long as a point cloud image is available, the image generation model 200 can generate images that were not included in the multiple images I, for example. This makes it possible to generate a large number of images. 【0021】 In the usage phase of the image generation model 200 shown in Figure 1B, first, multiple point cloud image CIs are prepared. Multiple point cloud image CIs can be generated from point cloud data C in three-dimensional space. Some of the multiple point cloud image CIs may be point cloud images generated by the dataset generation system 100 in the flow of Figure 1A (e.g., the first point cloud image and the second point cloud image), but the multiple point cloud image CIs should consist of more images than the point cloud images generated in the flow of Figure 1A. Some or all of the multiple point cloud image CIs may be point cloud images generated by the dataset generation system 100 separately from the flow of Figure 1A. The dataset generation system 100 generates point cloud images with viewpoints at each of the multiple points in the three-dimensional space represented by the point cloud data C. 
For example, the dataset generation system 100 generates point cloud images with viewpoints at a third point (or a fourth point, a fifth point, a sixth point, ...) between the first point corresponding to the first image and the second point corresponding to the second image. This allows subsequent processing to generate an image that interpolates between the first and second images. As another example, the dataset generation system 100 generates a point cloud image with a viewpoint at a third point (or a fourth, fifth, sixth, ...) that is separate from the first point corresponding to the first image and the second point corresponding to the second image. This allows subsequent processing to generate an image that extrapolates from the first and second images.
【0022】 In a preferred example, multiple point cloud image CIs are generated by the dataset generation system 100, but the present invention is not limited thereto, and multiple point cloud image CIs can also be generated by other methods or other systems.
【0023】 In step S4, multiple point cloud image CIs are input to the image generation model 200. As described above, the image generation model 200 has been trained to output corresponding images when given point cloud images as input, and therefore can generate multiple images corresponding to the multiple input point cloud image CIs.
【0024】 For example, if multiple point cloud image CIs include a point cloud image with a viewpoint at a third point (or a fourth, fifth, sixth, ...) between a first point corresponding to the first image and a second point corresponding to the second image, the image generation model 200 will generate an image that appears as if it were taken at the third point (or a fourth, fifth, sixth, ...) between the first point corresponding to the first image and the second point corresponding to the second image. This enables the generation of an image that interpolates between the first and second images.
For example, even if a service provider that provides a user with an experience in a 3D virtual space does not have an image between a first image taken at a first point and a second image taken at a second point, the image generation model 200 can interpolate the first image taken at the first point and the second image taken at the second point, enabling the construction of a more complete 3D virtual space. 【0025】 For example, if multiple point cloud image CIs include a point cloud image with a viewpoint at a third point (or a fourth, fifth, sixth, ...) that is separate from the first point corresponding to the first image and the second point corresponding to the second image, the image generation model 200 will generate an image that appears as if it were taken at the third point (or a fourth, fifth, sixth, ...) that is separate from the first point corresponding to the first image and the second point corresponding to the second image. This enables the generation of an image that extrapolates the first and second images. For example, even if a service provider that provides a user with an experience in a 3D virtual space does not have an image that is separate from the first image taken at the first point and the second image taken at the second point, the image generation model 200 can extrapolate the first image taken at the first point and the second image taken at the second point, enabling the construction of a more complete 3D virtual space. 【0026】 For example, when the image generation model 200 receives a first point cloud image as input, it may output the first image used during the training phase as is, or it may generate a new image corresponding to the first point cloud image and output that image. 
Similarly, when the image generation model 200 receives a second point cloud image as input, it may output the second image used during the training phase as is, or it may generate a new image corresponding to the second point cloud image and output that image. 【0027】 In step S5, the multiple images generated by the image generation model 200 are output as multiple training images TI. 【0028】 It should be noted that the number of training images TI is far greater than the number of images used in the flow of Figure 1A. The number of training images TI can be, for example, at least 5, 10, 20, 50, 100, 1000, 10000 times the number of images used in the flow of Figure 1A. Preferably, the number of training images TI can be such that the same object in the space in which the 3D virtual space is to be constructed can be associated among the training images TI. This allows for the construction of a more complete 3D virtual space compared to constructing a 3D virtual space using the number of images used in the flow of Figure 1A. 【0029】 In step S6, multiple training images TI are input to the 3D scene generation model 300. The 3D scene generation model 300 can generate a 3D scene based on the input training images TI using methods known in the art. For example, the 3D scene generation model 300 can generate a 3D scene using Gaussian splatting. Gaussian splatting is a method for reconstructing a 3D space from 2D images from multiple viewpoints. Generally, it involves estimating 3D point cloud data from 2D images, converting the estimated point cloud data into a Gaussian model, and optimizing the Gaussian parameters to generate a 3D space. For example, the 3D scene generation model 300 can generate a 3D scene using NeRF (Neural Radiance Fields). 
NeRF is a method for reconstructing 3D space from 2D images from multiple viewpoints, and generally uses a neural network to generate 3D space (Ben Mildenhall et al., “NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis”, https://doi.org/10.48550/arXiv.2003.08934).
【0030】 For example, if multiple training images TI are in equirectangular format, the multiple training images TI may be trimmed into images suitable for processing by the 3D scene generation model 300 before being input to the 3D scene generation model 300. For example, if the 3D scene generation model 300 can generate a 3D scene from images that have a field of view in only a specific direction, images with a field of view in only a specific direction may be extracted or generated from the multiple training images TI and input to the 3D scene generation model 300. Since an equirectangular image contains information in all directions, it can be understood to contain, as parts of itself, images with a field of view in only a specific direction. Therefore, if multiple training images TI are in equirectangular format, it is possible to extract or generate even more images from the multiple training images TI, making it possible to input more images to the 3D scene generation model 300. Of course, if the 3D scene generation model 300 supports equirectangular images, multiple training images TI that are in equirectangular format can be used as is.
【0031】 In step S7, the user U is provided with data of a 3D scene (i.e., a 3D virtual space) generated by the 3D scene generation model 300. This allows user U to experience the 3D virtual space, for example, through a head-mounted display.
【0032】 Thus, the system of the present invention makes it possible to generate a more complete 3D scene even when there are not enough images to generate a 3D scene. This could lead to the widespread adoption of user experiences in 3D virtual spaces.
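As a non-limiting illustration of the extraction described in paragraph 【0030】, the following Python sketch samples one directional (planar) view out of an equirectangular image. The specification does not state how the extraction is performed; the pinhole-camera reprojection, the nearest-neighbor sampling, the restriction to horizontal viewing directions, and the names `extract_view`, `yaw_deg`, and `fov_deg` are all assumptions made for this example. The column and row conventions (azimuth across the width, angle from the +Z axis down the height) are assumed to follow the θ and φ formulas of step S604.

```python
import numpy as np

def extract_view(equi, yaw_deg, fov_deg=90.0, out_w=512, out_h=512):
    """Sample a pinhole view that looks horizontally toward azimuth yaw_deg.

    equi is an equirectangular image array of shape (H, W) or (H, W, C).
    All parameter names here are hypothetical, not taken from the patent.
    """
    height, width = equi.shape[:2]
    # Focal length in pixels for the requested horizontal field of view.
    focal = 0.5 * out_w / np.tan(np.radians(fov_deg) / 2.0)

    # Pixel offsets from the centre of the output image.
    u = np.arange(out_w) - (out_w - 1) / 2.0
    v = np.arange(out_h) - (out_h - 1) / 2.0
    u, v = np.meshgrid(u, v)

    yaw = np.radians(yaw_deg)
    forward = np.array([np.cos(yaw), np.sin(yaw), 0.0])
    right = np.array([-np.sin(yaw), np.cos(yaw), 0.0])  # increasing azimuth
    up = np.array([0.0, 0.0, 1.0])

    # One viewing ray per output pixel (image rows grow downward, so -v).
    rays = focal * forward + u[..., None] * right - v[..., None] * up
    x, y, z = rays[..., 0], rays[..., 1], rays[..., 2]
    r = np.sqrt(x * x + y * y + z * z)

    # Same normalized angles as step S604, then scaled to source pixels.
    theta = (np.arctan2(y, x) + np.pi) / (2.0 * np.pi)
    phi = np.arccos(z / r) / np.pi
    cols = np.minimum((theta * width).astype(int), width - 1)
    rows = np.minimum((phi * height).astype(int), height - 1)
    return equi[rows, cols]
```

Calling `extract_view` repeatedly with different `yaw_deg` values would yield several planar images from a single equirectangular training image, which is one way to obtain the larger number of inputs mentioned above.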
【0033】 The system of the present invention can also be implemented as system 1000, for example, as described later.
【0034】 2. Configuration of the system of the present invention
As described above, the system 1000 of the present invention comprises a dataset generation system 100, an image generation model 200, and a three-dimensional scene generation model 300. It should be noted that the present invention can encompass not only the system 1000, but also the dataset generation system 100, the image generation model 200, and the three-dimensional scene generation model 300, each individually or in combination.
【0035】 Figure 2A shows an example of the configuration of the dataset generation system 100.
【0036】 The dataset generation system 100 comprises a first receiving means 110, a second receiving means 120, and a generation means 130.
【0037】 The first receiving means 110 is configured to receive a plurality of images. The plurality of images may be, for example, the plurality of images I shown in Figure 1A, and are a plurality of images acquired at a plurality of locations. The plurality of locations include a first location and a second location, and the plurality of images include a first image acquired at the first location and a second image acquired at the second location.
【0038】 The received plurality of images are passed to the generation means 130.
【0039】 The second receiving means 120 is configured to receive point cloud data. The point cloud data may be, for example, the point cloud data C shown in Figure 1A, and is point cloud data of a predetermined three-dimensional space including the first location and the second location.
【0040】 The received point cloud data is passed to the generation means 130.
【0041】 The generation means 130 is configured to generate a plurality of point cloud images based on the plurality of images received by the first receiving means 110 and the point cloud data received by the second receiving means 120.
The generation means 130 generates, for example, a first point cloud image having the same viewpoint and field of view as a first image included in the plurality of images, and a second point cloud image having the same viewpoint and field of view as a second image included in the plurality of images. For example, if the first image has a field of view only in a specific direction from a first point, the first point cloud image will also have a field of view only in a specific direction from the first point, and if the second image has a field of view only in a specific direction from a second point, the second point cloud image will also have a field of view only in a specific direction from the second point. For example, if the first image has a field of view in all directions from the first point (i.e., it is an equirectangular image), then the first point cloud image will also have a field of view in all directions from the first point (i.e., it will be an equirectangular image). Similarly, if the second image has a field of view in all directions from the second point (i.e., it is an equirectangular image), then the second point cloud image will also have a field of view in all directions from the second point (i.e., it will be an equirectangular image). 【0042】 In one example, the generation means 130 can generate a point cloud image by changing the relative relationship between each point in the point cloud data and the reference point so that the reference point of the point cloud data is aligned with a desired viewpoint (for example, a point corresponding to a first location when generating a first point cloud image; for example, a point corresponding to a second location when generating a second point cloud image), and by mapping each point within the desired field of view in the transformed point cloud data. 
【0043】 For example, as shown in Figure 5A, the relative relationship between each point in the point cloud data and the reference point O can be changed by moving the point cloud data so that the reference point O of the point cloud data aligns with the desired viewpoint P. In Figure 5A, for simplicity, the three-dimensional space represented by the point cloud data is represented as a rectangular prism. 【0044】 As shown in Figure 5A(a), if the coordinates of the reference point O in 3D space are (0,0,0), then the coordinates of each point shown in the point cloud data become the relative coordinates of each point with respect to the reference point O. Therefore, by mapping the coordinates of each point shown in the point cloud data onto a desired plane (for example, a 2D plane or an equirectangular plane), a point cloud image can be generated. 【0045】 In Figure 5A(b), the 3D space is moved (specifically, translated and / or rotated) to generate a point cloud image as seen from viewpoint P. As a result, the coordinates of each point shown in the point cloud data when viewpoint P is (0,0,0) become the relative coordinates of each point with respect to viewpoint P. Therefore, by mapping the coordinates of each point shown in the transformed point cloud data onto a desired plane (for example, a 2D plane or an equirectangular plane), a point cloud image as seen from viewpoint P can be generated. 【0046】 A more detailed explanation of this process is shown in Figure 6. Figure 6 illustrates the generation of an equirectangular point cloud image at viewpoint P from the point cloud data. 【0047】 First, a CSV file and point cloud data are prepared, and in step S601, this data is read. The CSV file contains the position coordinates and orientation of points in 3D space. 
The CSV file also contains the position and orientation of viewpoint P. The position of viewpoint P is represented as (pano_pos_x, pano_pos_y, pano_pos_z), and the orientation at viewpoint P may be represented in quaternion format as (pano_ori_w, pano_ori_x, pano_ori_y, pano_ori_z). The point cloud data contains the position coordinates (X, Y, Z) and color values (e.g., RGB values) of each point (e.g., each feature point in the 3D space).
【0048】 The loaded data can be used to generate each of the multiple point cloud images. That is, after step S601 is performed once, steps S602 to S605, enclosed by dashed lines, can be performed for each of the multiple images.
【0049】 In step S602, a rotation matrix is calculated. The rotation matrix is used to perform a rotational transformation that aligns the orientation of the 3D space represented by the point cloud data with the orientation at viewpoint P. Let the orientation at viewpoint P be p = {pano_ori_w, pano_ori_x, pano_ori_y, pano_ori_z}, and let
e0=p[0]
e1=p[1]
e2=p[2]
e3=p[3]
Then the rotation matrix can be expressed as follows.
([[e0*e0+e1*e1-e2*e2-e3*e3, 2*e1*e2-2*e0*e3, 2*e0*e2+2*e1*e3],
[2*e0*e3+2*e1*e2, e0*e0-e1*e1+e2*e2-e3*e3, 2*e2*e3-2*e0*e1],
[2*e1*e3-2*e0*e2, 2*e0*e1+2*e2*e3, e0*e0-e1*e1-e2*e2+e3*e3]])
【0050】 In step S603, the three-dimensional space represented by the point cloud data is translated and/or rotated.
【0051】 Translation is performed by subtracting the position coordinates of viewpoint P from the position coordinates of each point in the point cloud data. Rotation is performed by multiplying the position coordinates of each point in the point cloud data by the rotation matrix. Therefore, the coordinates [X_new, Y_new, Z_new] of each point after translation and rotation are represented as follows.
[X_new, Y_new, Z_new] = Rotation matrix × ([X, Y, Z] - [pano_pos_x, pano_pos_y, pano_pos_z])
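Steps S602 and S603 can be illustrated with a minimal sketch in Python. The function names `quaternion_to_rotation_matrix` and `transform_point` are hypothetical and do not appear in the specification, and the normalisation of the quaternion is an added safeguard that the text does not mention. The matrix entries follow the expression given for step S602. Whether this matrix or its transpose is appropriate depends on the orientation convention of the CSV file, which the text does not state.

```python
import math


def quaternion_to_rotation_matrix(w, x, y, z):
    """Step S602: build a 3x3 rotation matrix from a quaternion (w, x, y, z)."""
    # Assumption: normalise so a slightly non-unit quaternion still gives a rotation.
    n = math.sqrt(w * w + x * x + y * y + z * z)
    if n == 0.0:
        raise ValueError("a zero quaternion does not describe an orientation")
    e0, e1, e2, e3 = w / n, x / n, y / n, z / n
    return [
        [e0*e0 + e1*e1 - e2*e2 - e3*e3, 2*e1*e2 - 2*e0*e3, 2*e0*e2 + 2*e1*e3],
        [2*e0*e3 + 2*e1*e2, e0*e0 - e1*e1 + e2*e2 - e3*e3, 2*e2*e3 - 2*e0*e1],
        [2*e1*e3 - 2*e0*e2, 2*e0*e1 + 2*e2*e3, e0*e0 - e1*e1 - e2*e2 + e3*e3],
    ]


def transform_point(point, viewpoint_pos, rotation):
    """Step S603: subtract the viewpoint position, then apply the rotation matrix."""
    d = [point[i] - viewpoint_pos[i] for i in range(3)]
    return [sum(rotation[r][c] * d[c] for c in range(3)) for r in range(3)]


# Usage with an identity orientation: only the translation has an effect.
rotation = quaternion_to_rotation_matrix(1.0, 0.0, 0.0, 0.0)
moved = transform_point([3.0, 2.0, 1.0], [1.0, 1.0, 1.0], rotation)
```

Because the data read in step S601 is reused, a matrix computed once per viewpoint can be applied to every point of the point cloud data.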
【0052】 In step S604, the distance r from the reference point (0,0,0) is calculated for each transformed point, and the azimuth angle θ and elevation angle φ are calculated using trigonometric functions. r, θ, and φ can be calculated using the following formulas, respectively.
r=sqrt(X_new*X_new+Y_new*Y_new+Z_new*Z_new)
θ=(arctan2(Y_new,X_new)+π) / (2*π)
φ=(arccos(Z_new / r) / π)
【0053】 The obtained θ and φ can be normalized and scaled to match the pixel count of the final equirectangular image. For example, the equirectangular image may have 4096 × 2048 pixels, in which case θ and φ are normalized and scaled to match this pixel count.
【0054】 In step S605, an equirectangular point cloud image is generated by mapping (θ,φ) onto a plane. Specifically, with θ on the horizontal axis and φ on the vertical axis, an equirectangular image is generated by plotting, at the (θ,φ) coordinate of each point in the point cloud data, a point in the color given by the color value of that point.
【0055】 Alternatively, as shown in Figure 5B, the relative relationship between each point in the point cloud data and the reference point can be changed by moving the reference point O so that it aligns with the desired viewpoint P. In Figure 5B, for simplicity, the three-dimensional space represented by the point cloud data is shown as a rectangular prism.
【0056】 As shown in Figure 5B(a), if the coordinates of the reference point O in three-dimensional space are (0,0,0), then the coordinates of each point shown in the point cloud data are the relative coordinates of that point with respect to the reference point O. Therefore, by mapping the coordinates of each point shown in the point cloud data onto a desired plane (for example, a two-dimensional plane or an equirectangular plane), a point cloud image can be generated.
【0057】 In Figure 5B(b), the reference point O is moved (specifically, translated and/or rotated) in order to generate a point cloud image as seen from viewpoint P.
By mapping the relative coordinates of each point in the point cloud data, where the reference point O is (Px, Py, Pz), onto a desired plane (e.g., a two-dimensional plane or an equirectangular plane), a point cloud image as seen from viewpoint P can be generated. 【0058】 In the case of Figure 5B(b), the same process as shown in Figure 6 can be used to generate an equirectangular point cloud image at viewpoint P from the point cloud data. 【0059】 For example, when generating a point cloud image from point cloud data that has a field of view in only a specific direction, rather than in equirectangular format, the process shown in Figure 6 can also be applied. However, in steps S604 and S605 of Figure 6, each point after transformation is mapped so that the space is cropped to have a field of view in only that specific direction. 【0060】 Figure 2B shows an example of the configuration of system 1000. 【0061】 System 1000 comprises a dataset generation system 100 and an image generation model 200. 【0062】 The dataset generation system 100 has a configuration similar to that described above, with reference to Figure 2A. The dataset generated by the dataset generation system 100 is provided to the image generation model 200. The dataset includes multiple images and multiple corresponding point cloud images. That is, in the dataset, each of the multiple images is paired with its corresponding point cloud image. This allows the image generation model 200 to easily undergo training. 【0063】 The image generation model 200 is configured to undergo training using a dataset. The image generation model 200 can be trained by taking one point cloud image from the dataset as input and the corresponding image as output data, so that when a point cloud image is input, it can output a corresponding image. 【0064】 The image generation model 200 can be any model structure as long as it can generate an image, and may be, for example, a VAE, GAN, flow-based model, or diffusion-based model. 
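The pairing of images and point cloud images described in paragraphs 0062 and 0063 can be illustrated with a minimal sketch in Python. The names `TrainingPair` and `build_pairs`, the use of a location identifier as the key, and the file names are all hypothetical. The specification only states that each image is paired with its corresponding point cloud image, with the point cloud image as input and the image as output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingPair:
    location_id: str
    point_cloud_image: str  # model input (a path or any other image handle)
    image: str              # training target acquired at the same location


def build_pairs(images, point_cloud_images):
    """Pair each image with the point cloud image generated for the same location."""
    pairs = []
    for location_id, image in sorted(images.items()):
        pc_image = point_cloud_images.get(location_id)
        if pc_image is None:
            continue  # assumption: locations without a point cloud image are skipped
        pairs.append(TrainingPair(location_id, pc_image, image))
    return pairs


# Usage with placeholder file names for a first and a second location.
pairs = build_pairs(
    {"loc1": "img_loc1.png", "loc2": "img_loc2.png"},
    {"loc1": "pc_loc1.png", "loc2": "pc_loc2.png"},
)
```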
【0065】 The trained image generation model 200 can be used for any purpose. For example, if point cloud data exists but no corresponding image exists, the image generation model 200 can generate an image using the point cloud image generated from the point cloud data by the generation means 130 of the dataset generation system 100. Alternatively, the image generation model 200 can generate an image using a separately generated point cloud image. 【0066】 For example, as shown in Figure 2C, the image generation model 200 can be used to generate an input image for the 3D scene generation model 300. 【0067】 Figure 2C shows another example of the configuration of system 1000. The example shown in Figure 2C is similar to the configuration shown in Figure 2B, except that it further includes a 3D scene generation model 300. 【0068】 System 1000 comprises a dataset generation system 100, an image generation model 200, and a 3D scene generation model 300. 【0069】 The dataset generation system 100 has a configuration similar to that described above, with reference to Figure 2A. In particular, the generation means 130 can generate a point cloud image from the point cloud data received by the second receiving means 120, and the generated point cloud image is provided to the image generation model 200. 【0070】 As described above with reference to Figure 2B, the image generation model 200 has already undergone training using the dataset. Therefore, the image generation model 200 is configured to generate an image corresponding to the input point cloud image. A corresponding image is an image that has the same viewpoint and field of view as the input point cloud image. For example, if the input point cloud image has a field of view in only a specific direction, the output image will also have a field of view in only that specific direction. For example, if the input point cloud image is in equirectangular format, the output image will also be in equirectangular format. 
【0071】 For example, the generation means 130 of the dataset generation system 100 can generate a large number of point cloud images from point cloud data, and by inputting these large number of point cloud images into the image generation model 200, it is possible to generate a large number of images corresponding to the large number of point cloud images. 【0072】 The 3D scene generation model 300 is configured to generate a 3D scene based on input images. The 3D scene generation model 300 can generate a 3D scene using, for example, Gaussian splatting. Gaussian splatting is preferred because it enables the generation of a 3D scene from a large number of images with relatively short computation time. 【0073】 The generated 3D scene is provided, for example, to the user's terminal device (e.g., a head-mounted display), enabling the user to experience the 3D virtual space through the head-mounted display. 【0074】 As described above, even when there are not enough images of the space in which a 3D virtual space is to be generated, the system 1000 of the present invention makes it possible to generate a large number of images, and thus facilitates the provision of various 3D virtual spaces to users. For example, it facilitates the use of Gaussian splatting, which enables the provision of 3D virtual spaces in a shorter time. This can lead to improvements in the field of virtual reality. 【0075】 The system 1000 described above can be implemented, for example, as a server device. 【0076】 Figure 3 shows an example configuration when system 1000 is implemented as a server device. System 1000 (server device) is connected to the database unit 400. System 1000 is connected to at least one terminal device 500 and at least one head-mounted display 600 via network N. Here, the type of network N is not specified. For example, network N may be the internet or a LAN. 【0077】 For example, the database unit 400 connected to system 1000 can store multiple images generated by system 1000. 
【0078】 Terminal device 500 may be a terminal device of a user of system 1000. The user uses system 1000, for example, to generate a 3D scene, to obtain an image generation model for producing the images used to generate a 3D scene, or to obtain a training dataset for creating an image generation model. Terminal device 500 may be, but is not limited to, a smartphone, tablet computer, smartwatch, or mobile phone. Three terminal devices 500 are depicted in Figure 3, but the number of terminal devices 500 is not limited to this and may be one or more.
【0079】 The head-mounted display 600 may be a device for presenting a three-dimensional scene to a user. The head-mounted display 600 may be any other device, as long as it is capable of presenting a three-dimensional scene to a user.
【0080】 Figure 4 shows an example of the configuration of system 1000 implemented as a server device.
【0081】 System 1000 comprises an interface unit 1100, a memory unit 1300, and a processor unit 1200. System 1000 is connected to a database unit 400.
【0082】 As described above, the database unit 400 connected to the system 1000 can store multiple images generated by the system 1000. The database unit 400 can also store multiple point cloud images corresponding to the multiple images.
【0083】 The interface unit 1100 controls communication via the network N. The interface unit 1100 also controls communication with the database unit 400. Via the interface unit 1100, the processor unit 1200 of system 1000 can receive information from outside system 1000 and transmit information to outside system 1000. For example, the processor unit 1200 can receive information (e.g., multiple images and point cloud data) from terminal device 500 and transmit information to terminal device 500 via the interface unit 1100.
The processor unit 1200 of system 1000 can receive information (e.g., a request for 3D scene data) from head-mounted display 600 and transmit information (e.g., 3D scene data) to head-mounted display 600 via the interface unit 1100. The interface unit 1100 can control communication in any manner. For example, at least a portion of the first receiving means 110 and the second receiving means 120 of system 1000 may be implemented by the interface unit 1100. 【0084】 The memory unit 1300 stores programs necessary for executing the processes of the system 1000, as well as data necessary for executing those programs. For example, it stores part or all of a program that causes the processor unit 1200 to perform processing to create a dataset for training an image generation model (for example, a program that implements the processing shown in Figure 7, described later). For example, it stores part or all of a program that causes the processor unit 1200 to perform processing to create a dataset for training a 3D scene generation model (for example, a program that implements the processing shown in Figure 8, described later). Here, it is not relevant how the programs are stored in the memory unit 1300. For example, the programs may be pre-installed in the memory unit 1300. Alternatively, the programs may be installed in the memory unit 1300 by being downloaded via a network. The programs may be stored on a computer-readable tangible storage medium. The programs may be implemented as computer program products. The memory unit 1300 can be implemented by any storage means. 【0085】 The processor unit 1200 controls the operation of the entire system 1000. The processor unit 1200 reads a program stored in the memory unit 1300 and executes that program. This makes it possible to make the system 1000 function as a device that executes a desired step. The processor unit 1200 may be implemented by a single processor or by multiple processors. 
At least a portion of the first receiving means 110 and the second receiving means 120 of the system 1000 may be implemented by the processor unit 1200. Furthermore, the generation means 130 may be implemented by the processor unit 1200. In addition, the image generation model 200 and the 3D scene generation model 300 may be implemented by the processor unit 1200. 【0086】 In the example shown in Figure 4, each component of system 1000 is located within system 1000, but the present invention is not limited to this. Any of the components of system 1000 may be located outside of system 1000. For example, if the memory unit 1300 and the processor unit 1200 are each composed of separate hardware components, each hardware component may be connected via any network. The type of network is not limited. Each hardware component may be connected, for example, via a LAN, wirelessly, or via a wired connection. System 1000 is not limited to a specific hardware configuration. For example, configuring the processor unit 1200 with analog circuits instead of digital circuits is also within the scope of the present invention. The configuration of system 1000 is not limited to those described above, as long as its function can be realized. 【0087】 In the examples shown in Figures 3 and 4, the database unit 400 is located outside the system 1000, but the present invention is not limited to this. The database unit 400 can also be located inside the system 1000. In this case, the database unit 400 may be implemented by the same storage means as the storage means that implements the memory unit 1300, or it may be implemented by a different storage means than the storage means that implements the memory unit 1300. In any case, the database unit 400 is configured as a storage unit for the system 1000. The configuration of the database unit 400 is not limited to a specific hardware configuration. 
For example, the database unit 400 may be composed of a single hardware component or of multiple hardware components. For example, the database unit 400 may be configured as an external hard disk drive for the system 1000, or as cloud storage connected via a network.
【0088】 3. Processing in the system of the present invention
Figure 7 shows an example of processing 700 in the system 1000 of the present invention. Processing 700 is a process for creating a dataset for training an image generation model. Processing 700 is executed, for example, in the processor unit 1200 that implements the system 1000.
【0089】 In step S701, the processor unit 1200 receives multiple images. These multiple images may be, for example, the multiple images I shown in Figure 1A, and are multiple images acquired at multiple locations. The multiple locations include a first location and a second location, and the multiple images include a first image acquired at the first location and a second image acquired at the second location.
【0090】 In step S702, the processor unit 1200 receives point cloud data. The point cloud data may be, for example, the point cloud data C shown in Figure 1A, and is point cloud data of a predetermined three-dimensional space that includes the first location and the second location.
【0091】 In step S703, the processor unit 1200 generates multiple point cloud images. From the point cloud data received in step S702, the processor unit 1200 generates a first point cloud image having the same viewpoint and field of view as the first image included in the multiple images received in step S701, and a second point cloud image having the same viewpoint and field of view as the second image included in the multiple images received in step S701.
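When the point cloud images generated in step S703 are equirectangular, the mapping of steps S604 and S605 of Figure 6 can be illustrated with a minimal sketch in Python. The function names `to_equirect_pixel` and `render_equirect` are hypothetical. The scaling of the normalized θ and φ to pixel indices, the skipping of a point located exactly at the viewpoint, and the rule that the nearest point wins when several points fall on the same pixel are all assumptions that the specification does not state. Note that arccos(Z_new / r) is measured from the positive Z axis, so φ = 0 corresponds to the top row of the image.

```python
import math


def to_equirect_pixel(x, y, z, width, height):
    """Step S604: map a transformed point to (column, row, distance r)."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        return None  # assumption: a point at the viewpoint has no direction
    theta = (math.atan2(y, x) + math.pi) / (2 * math.pi)   # normalized to [0, 1]
    phi = math.acos(max(-1.0, min(1.0, z / r))) / math.pi  # normalized to [0, 1]
    # Assumption: scale to pixel indices and clamp the upper edge into the image.
    col = min(int(theta * width), width - 1)
    row = min(int(phi * height), height - 1)
    return col, row, r


def render_equirect(points, colors, width, height, background=(0, 0, 0)):
    """Step S605: plot each point's color at its (theta, phi) pixel."""
    image = [[background] * width for _ in range(height)]
    depth = [[math.inf] * width for _ in range(height)]
    for (x, y, z), color in zip(points, colors):
        hit = to_equirect_pixel(x, y, z, width, height)
        if hit is None:
            continue
        col, row, r = hit
        if r < depth[row][col]:  # assumption: the nearest point hides farther ones
            depth[row][col] = r
            image[row][col] = color
    return image


# Usage: two points in the same direction; the nearer (green) one is kept.
image = render_equirect(
    [(2.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(255, 0, 0), (0, 255, 0)],
    width=8,
    height=4,
)
```

For an image of 4096 × 2048 pixels, `width=4096` and `height=2048` would be passed instead of the small values used here.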
【0092】 In one example, when generating an equirectangular point cloud image, the processor unit 1200 converts the point cloud data so that the reference point of the point cloud data matches the desired viewpoint, and then generates the point cloud image by mapping each point within the desired field of view in the converted point cloud data. Specifically, the processor unit 1200 can generate a point cloud image in equirectangular format by:
changing the relative relationship between each point in the point cloud data and the reference point so that the given viewpoint is aligned with the reference point;
calculating, for each point in the changed point cloud data, the distance r from the reference point;
calculating the azimuth angle θ and the elevation angle φ based on the distance r; and
mapping each point to its respective (θ,φ).
Here, changing the relative relationship between each point in the point cloud data and the reference point includes translating and/or rotating each point in the point cloud data. For this purpose, the processor unit 1200 can:
calculate a rotation matrix for the point cloud data; and
translate each point in the point cloud data and/or rotate it using the rotation matrix.
【0093】 The point cloud images generated in this way are paired with their corresponding images. Therefore, the dataset output by processing 700 contains multiple pairs of the multiple images and the multiple point cloud images, and in particular, the multiple pairs include a pair of the first image and the first point cloud image, and a pair of the second image and the second point cloud image.
【0094】 Processing 700 may include a step for creating an image generation model 200, in which case, in step S704, the processor unit 1200 creates the image generation model using the dataset created in steps S701 to S703. The dataset includes pairs of multiple images and multiple point cloud images.
This allows the image generation model to easily learn pairs of images and point cloud images.
【0095】 The trained image generation model 200 can be used for any purpose. For example, as shown in Figure 8, it can be used to create a dataset for training a 3D scene generation model.
【0096】 Figure 8 shows an example of processing 800 in the system 1000 of the present invention. Processing 800 is a process for creating a dataset for training a 3D scene generation model. Processing 800 is executed, for example, in the processor unit 1200 that implements the system 1000.
【0097】 In step S801, the processor unit 1200 generates a point cloud image. The processor unit 1200 can generate the point cloud image from point cloud data, for example. For example, by specifying a given viewpoint in the three-dimensional space, the processor unit 1200 can generate a point cloud image at the given viewpoint.
【0098】 In step S802, the processor unit 1200 generates an image corresponding to the point cloud image. The processor unit 1200 can generate the image corresponding to the point cloud image using the trained image generation model 200. When the point cloud image is input to the trained image generation model 200, the corresponding image is output.
【0099】 By repeating steps S801 and S802 multiple times, multiple images corresponding to multiple point cloud images can be obtained. For example, even if there are not enough images of the 3D space and only point cloud data exists, processing 800 can generate a large number of images of the 3D space.
【0100】 Processing 800 may include a step for creating a 3D scene generation model 300, in which case, in step S803, the processor unit 1200 creates the 3D scene generation model 300 using the image generated in step S802.
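The repetition of steps S801 and S802 can be illustrated with a minimal sketch in Python. The function `generate_images` and its two callable parameters are hypothetical stand-ins: `render_point_cloud_image` represents the generation of a point cloud image at a given viewpoint, and `image_model` represents the trained image generation model 200. The stubs in the usage example only return labelled tuples and do not produce real images.

```python
def generate_images(viewpoints, render_point_cloud_image, image_model):
    """Repeat steps S801 and S802 once per viewpoint and collect the output images."""
    images = []
    for viewpoint in viewpoints:
        pc_image = render_point_cloud_image(viewpoint)  # step S801
        images.append(image_model(pc_image))            # step S802
    return images


# Usage with placeholder stubs in place of a real renderer and a trained model.
def stub_renderer(viewpoint):
    return ("point_cloud_image", viewpoint)


def stub_model(pc_image):
    return ("image", pc_image[1])


generated = generate_images([(0, 0, 0), (1, 0, 0)], stub_renderer, stub_model)
```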
【0101】 For example, the 3D scene generation model 300 may be created using the image generated in step S802 as is, or at least one partial image may be generated from the image generated in step S802, and the 3D scene generation model 300 may be created using at least one partial image. In particular, if the image is in equirectangular format, a partial image having a field of view in a specific direction may be generated from the equirectangular image, and the 3D scene generation model 300 may be created using the partial image. This can be done, for example, if the 3D scene generation model 300 does not support the equirectangular format. Furthermore, it is possible to generate a large number of partial images from a single equirectangular image, which is useful in that it can further increase the number of images for 3D scene generation. 【0102】 Referring to Figures 7 and 8, the above examples illustrate that each step is performed in a specific order. However, the order shown is merely an example, and the order in which each step is performed is not limited to this. Each step can be performed in any logically possible order. In addition, other steps may be performed in addition to, or instead of, the steps shown. 【0103】 Furthermore, in the examples described above with reference to Figures 7 and 8, the processing of each step shown in Figures 7 and 8 can be realized by a program stored in the processor unit 1200 and the memory unit 1300 that implement the system 1000. However, the present invention is not limited thereto. At least one of the processing steps shown in Figures 7 and 8 may be realized by a hardware configuration such as a control circuit. 【0104】 The present invention is not limited to the embodiments described above. It is understood that the scope of the present invention should be interpreted solely by the claims. 
Those skilled in the art will understand that, based on the description of the specific preferred embodiments of the present invention and common technical knowledge, an equivalent scope can be practiced.
[Industrial applicability]
【0105】 This invention is useful as it provides a system for creating datasets for training image generation models.
[Explanation of Symbols]
【0106】
100 Dataset generation system
200 Image generation model
300 3D scene generation model
1000 System

Claims

[Claim 1] A system for creating a dataset for training an image generation model, the system comprising:
a first receiving means for receiving a plurality of images acquired at a plurality of locations, wherein the plurality of locations include a first location and a second location, and the plurality of images include a first image acquired at the first location and a second image acquired at the second location;
a second receiving means for receiving point cloud data of a predetermined three-dimensional space, wherein the predetermined three-dimensional space includes the first location and the second location; and
a generation means for generating a plurality of point cloud images from the point cloud data, wherein the generation means generates a first point cloud image having the same viewpoint and field of view as the first image, and a second point cloud image having the same viewpoint and field of view as the second image,
wherein the dataset includes a plurality of pairs of the plurality of images and the plurality of point cloud images, and the plurality of pairs include a pair of the first image and the first point cloud image, and a pair of the second image and the second point cloud image.
[Claim 2] The system according to claim 1, wherein each of the plurality of images is an equirectangular image, and each of the plurality of point cloud images is an equirectangular point cloud image.
[Claim 3] The system according to claim 2, wherein the generation means generates the equirectangular point cloud image by:
changing the relative relationship between each point in the point cloud data and a reference point so that a given viewpoint is aligned with the reference point;
calculating, for each point in the changed point cloud data, a distance r from the reference point;
calculating an azimuth angle θ and an elevation angle φ based on the distance r; and
mapping each point to its respective (θ, φ).
[Claim 4] The system according to claim 3, wherein changing the relative relationship between each point in the point cloud data and the reference point includes:
calculating a rotation matrix for the point cloud data; and
translating each point in the point cloud data and/or rotating it using the rotation matrix.
[Claim 5] An image generation model trained using a dataset created by the system according to claim 1.
[Claim 6] A method for creating a dataset for training a 3D scene generation model, the method comprising:
generating, by the generation means of the system according to claim 1, a third point cloud image at a third location in the predetermined three-dimensional space; and
generating, by the image generation model according to claim 5 upon receiving the third point cloud image as input, a third image that appears to have been acquired at the third location,
wherein the dataset includes the first image, the second image, and the third image.
[Claim 7] The method according to claim 6, further comprising:
generating at least one first partial image from the first image;
generating at least one second partial image from the second image; and
generating at least one third partial image from the third image,
wherein the dataset includes the at least one first partial image instead of the first image, the at least one second partial image instead of the second image, and the at least one third partial image instead of the third image.
[Claim 8] A 3D scene generation model trained on a dataset created by the method according to claim 6 or claim 7.
[Claim 9] A method for creating a dataset for training an image generation model, the method comprising:
receiving a plurality of images acquired at a plurality of locations, wherein the plurality of locations include a first location and a second location, and the plurality of images include a first image acquired at the first location and a second image acquired at the second location;
receiving point cloud data of a predetermined three-dimensional space, wherein the predetermined three-dimensional space includes the first location and the second location; and
generating a plurality of point cloud images from the point cloud data, including generating a first point cloud image having the same field of view as the first image and a second point cloud image having the same field of view as the second image,
wherein the dataset includes a plurality of pairs of the plurality of images and the plurality of point cloud images, and the plurality of pairs include a pair of the first image and the first point cloud image, and a pair of the second image and the second point cloud image.
[Claim 10] A program for creating a dataset for training an image generation model, wherein the program is executed on a computer equipped with a processor, and the program causes the processor to perform processing including:
receiving a plurality of images acquired at a plurality of locations, wherein the plurality of locations include a first location and a second location, and the plurality of images include a first image acquired at the first location and a second image acquired at the second location;
receiving point cloud data of a predetermined three-dimensional space, wherein the predetermined three-dimensional space includes the first location and the second location; and
generating a plurality of point cloud images from the point cloud data, including generating a first point cloud image having the same field of view as the first image and a second point cloud image having the same field of view as the second image,
wherein the dataset includes a plurality of pairs of the plurality of images and the plurality of point cloud images, and the plurality of pairs include a pair of the first image and the first point cloud image, and a pair of the second image and the second point cloud image.