Encoding method, decoding method, encoding device, and decoding device
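The encoding method places encoded data of a three-dimensional generative model and metadata indicating the viewpoint information used for training in a single bitstream, so that the decoding side can generate images at viewpoints consistent with the training data. It is presented against the background of the need to compress large three-dimensional data such as point clouds.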
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- PANASONIC INTELLECTUAL PROPERTY CORP OF AMERICA
- Filing Date
- 2025-12-10
- Publication Date
- 2026-06-18
AI Technical Summary
The large amount of data in three-dimensional point clouds necessitates efficient compression methods to facilitate their accumulation and transmission, while existing techniques do not adequately address the accuracy of image generation using three-dimensional generative models on the decoding side.
The encoding method generates a bitstream that includes encoded data of a three-dimensional generative model and metadata indicating viewpoint information, so that both are transmitted and decoded together.
The decoding side can use the viewpoint information that matches the training data, which improves the accuracy of image generation with the three-dimensional generative model.
Description
Encoding method, decoding method, encoding device, and decoding device
【0001】 The present disclosure relates to an encoding method, a decoding method, an encoding device, and a decoding device.
【0002】 Devices and services that use three-dimensional data are expected to become widespread in a wide range of fields, such as computer vision for the autonomous operation of automobiles or robots, map information, monitoring, infrastructure inspection, and video distribution. Three-dimensional data is acquired by various methods, such as distance sensors (for example, lidar), stereo cameras, or combinations of multiple monocular cameras.
【0003】 One method of representing three-dimensional data is the point cloud, which represents the shape of a three-dimensional structure by a group of points in three-dimensional space. A point cloud stores the positions and colors of the points. Point clouds are expected to become a mainstream representation of three-dimensional data, but the amount of data in a point cloud is extremely large. Therefore, when three-dimensional data is accumulated or transmitted, compressing the amount of data by encoding is essential, as it is for two-dimensional moving images (for example, MPEG-4 AVC or HEVC, standardized by MPEG).
【0004】 Point cloud compression is partially supported by a public library that performs point-cloud-related processing (Point Cloud Library).
【0005】 A technique for searching for and displaying facilities located around a vehicle using three-dimensional map data is also known (for example, see Patent Document 1).
【0006】 International Publication No. 2014/020663
【0007】 ISO/IEC 15938-17:2022 (Information technology - Multimedia content description interface - Part 17: Compression of neural networks for multimedia content description and analysis (https://www.iso.org/standard/78480.html))
【0008】 The present disclosure aims to provide an encoding method and the like that can improve the accuracy of image generation using a three-dimensional generative model on the decoding side.
【0009】 An encoding method according to one aspect of the present disclosure acquires encoded data of a three-dimensional generative model generated by learning about a three-dimensional space and one or more pieces of viewpoint information corresponding to one or more images used for the learning to generate the three-dimensional generative model, generates metadata indicating the one or more pieces of viewpoint information, and generates a bitstream including the encoded data of the three-dimensional generative model and the metadata.
【0010】 A decoding method according to one aspect of the present disclosure acquires a bitstream including encoded data of a three-dimensional generative model generated by learning about a three-dimensional space and metadata indicating one or more pieces of viewpoint information corresponding to one or more images used for the learning to generate the three-dimensional generative model, and decodes the encoded data of the three-dimensional generative model and the metadata from the bitstream to acquire the three-dimensional generative model and the one or more pieces of viewpoint information.
【0011】 These general or specific aspects may be implemented as a device, a system, an integrated circuit, a computer program, or a recording medium such as a computer-readable CD-ROM, or as any combination of devices, systems, methods, integrated circuits, computer programs, and recording media.
【0012】 The encoding method and the like according to the present disclosure can improve the accuracy of image generation using a three-dimensional generative model on the decoding side.
【0013】 Figure 1 is a diagram showing an example configuration of a three-dimensional data encoding and decoding system according to Embodiment 1.
Figure 2 is a diagram showing the configuration of point cloud data in Embodiment 1.
Figure 3 is a diagram showing an example configuration of a data file describing information about point cloud data in Embodiment 1.
Figure 4 is a diagram showing the configuration of three-dimensional mesh data in Embodiment 1.
Figure 5 is a diagram showing an example configuration of a data file describing information about three-dimensional mesh data in Embodiment 1.
Figure 6 is a diagram for explaining a three-dimensional model in Embodiment 1.
Figure 7 is a block diagram showing an example of an apparatus for generating three-dimensional data using the Gaussian Splatting method in Embodiment 1.
Figure 8 is a table showing the components of Gaussian data described in PLY format in Embodiment 1.
Figure 9 is a block diagram for explaining an example of rendering processing in Embodiment 1.
Figure 10 is a block diagram showing an example of processing using spherical harmonics of color in Embodiment 1.
Figure 11 is a block diagram for explaining another example of rendering processing in Embodiment 1.
Figure 12 is a diagram for explaining the number of SH coefficients for each level of spherical harmonics in Embodiment 1.
Figure 13 is a block diagram illustrating an example of the process of encoding and multiplexing Gaussian data in Embodiment 1.
Figure 14 is a block diagram illustrating an example of the configuration of a system that decodes Gaussian data and presents it in an application in Embodiment 1.
Figure 15 is a diagram illustrating the types of three-dimensional data in Embodiment 1.
Figure 16 is a diagram illustrating the encoding process of three-dimensional data in Embodiment 1.
Figure 17 is a diagram illustrating the decoding process of three-dimensional data in Embodiment 1.
Figure 18 is a diagram schematically showing tiles and slices of three-dimensional data in two dimensions in Embodiment 1.
Figure 19 is a block diagram illustrating an example of the functional configuration of a server and a terminal in Embodiment 1.
Figure 20 is a block diagram illustrating another example of the data generation unit of a server in Embodiment 1.
Figure 21 is a diagram illustrating the relationship between three-dimensional space and encoded data in Embodiment 1.
Figure 22 is a diagram showing an example of the syntax of an encoding scheme unit in Embodiment 1.
Figure 23 is a diagram showing an example of the syntax of an encoded point cloud in Embodiment 1.
Figure 24 is a diagram showing an example of the syntax of an encoded mesh in Embodiment 1.
Figure 25 is a diagram showing an example of the syntax of an encoded three-dimensional model in Embodiment 1.
Figure 26 is a diagram showing an example of the syntax of three-dimensional data information in Embodiment 1.
Figure 27 is a diagram illustrating the data structure of an encoded point cloud in Embodiment 1.
Figure 28 is a diagram illustrating the data structure of an encoded mesh in Embodiment 1.
Figure 29 is a diagram illustrating the data structure of an encoded three-dimensional model in Embodiment 1.
Figure 30 is a diagram showing an example of multiple three-dimensional spaces in two dimensions in Embodiment 1.
Figure 31 is a diagram showing an example of a bounding box in Embodiment 1.
Figure 32 is a diagram showing an example of the syntax of three-dimensional spatial information in Embodiment 1.
Figure 33 is a flowchart showing an example of partial decoding in Embodiment 1.
Figure 34 is a diagram showing an example of a three-dimensional spatial region targeted for partial decoding in Embodiment 1.
Figure 35 is a diagram showing an example of the data structure of a partially decoded encoded point cloud in Embodiment 1.
Figure 36 is a diagram showing an example of the data structure of a partially decoded encoded mesh in Embodiment 1.
Figure 37 is a diagram showing an example of the data structure of a partially decoded encoded three-dimensional model in Embodiment 1.
Figure 38 is a diagram showing an example of the configuration of a decoding device in Embodiment 1.
Figure 39 is a flowchart showing an example of a decoding method by the decoding device in Embodiment 1.
Figure 40 is a flowchart showing another example of a decoding method by the decoding device in Embodiment 1.
Figure 41 is a diagram showing an example of the configuration of an encoding device in Embodiment 1.
Figure 42 is a flowchart showing an example of an encoding method by the encoding device in Embodiment 1.
Figure 43 is a block diagram showing an example of a device that generates format data from data in Embodiment 1.
Figure 44 is a block diagram showing an example of a device that restores the original data from the format data in Embodiment 1.
Figure 45 is a conceptual diagram illustrating an example of how the encoded data bitstream is stored in the system format in Embodiment 1.
Figure 46 is a diagram illustrating an example of the box structure of ISOBMFF in Embodiment 1.
Figure 47 is a diagram illustrating the training process of the three-dimensional generative model in Embodiment 2.
Figure 48 is a diagram illustrating the process of generating a still image of a subject viewed from an arbitrary viewpoint using the three-dimensional generative model in Embodiment 2.
Figure 49 is a diagram illustrating a method for generating moving images using the three-dimensional data generation model of Example 1 in Embodiment 2.
Figure 50 is a diagram illustrating a first example of the configuration of the encoding device of Example 1 in Embodiment 2.
Figure 51 is a diagram illustrating a first example of the configuration of the decoding device of Example 1 in Embodiment 2.
Figure 52 is a diagram illustrating a second example of the configuration of the encoding device of Example 1 in Embodiment 2.
Figure 53 is a diagram illustrating a second example of the configuration of the decoding device of Example 1 in Embodiment 2.
Figure 54 is a diagram illustrating a method for generating moving images using the extended three-dimensional data generation model of Example 2 in Embodiment 2.
Figure 55 is a diagram illustrating a first example of the configuration of the encoding device of Example 2 in Embodiment 2.
Figure 56 is a diagram showing a first example of the configuration of the decoding device of Example 2 in Embodiment 2.
Figure 57 is a diagram showing a second example of the configuration of the encoding device of Example 2 in Embodiment 2.
Figure 58 is a diagram showing a second example of the configuration of the decoding device of Example 2 in Embodiment 2.
Figure 59 is a diagram illustrating a video generation method using an extended three-dimensional data generation model according to a modified example of Embodiment 2.
Figure 60 is a diagram illustrating a video generation method using a three-dimensional data generation model according to a modified example of Embodiment 2.
Figure 61 is a diagram showing an example of the configuration of the encoding device in Embodiment 2.
Figure 62 is a flowchart showing an example of an encoding method by the encoding device in Embodiment 2.
Figure 63 is a diagram showing an example of the configuration of the decoding device in Embodiment 2.
Figure 64 is a flowchart showing an example of a decoding method by the decoding device in Embodiment 2.
Figure 65 is a diagram showing an example of the configuration of the encoding device.
Figure 66 is a diagram showing an example of the configuration of the decoding device.
Figure 67 is a diagram illustrating the training process of the three-dimensional data generation model in Embodiment 3.
Figure 68 is a diagram illustrating the process of generating a still image of a subject viewed from an arbitrary viewpoint using the three-dimensional data generation model in Embodiment 3.
Figure 69 is a diagram illustrating the video generation method using the extended three-dimensional data generation model of Example 1 in Embodiment 3.
Figure 70 is a diagram showing a first example of the configuration of the encoding device of Example 1 in Embodiment 3.
Figure 71 is a diagram showing a first example of the configuration of the decoding device of Example 1 in Embodiment 3.
Figure 72 is a diagram showing a second example of the configuration of the encoding device of Example 1 in Embodiment 3.
Figure 73 is a diagram showing a second example of the configuration of the decoding device of Example 1 in Embodiment 3.
Figure 74 is a block diagram showing an example of the configuration of an encoding device that encodes multiple networks in Example 2 of Embodiment 3.
Figure 75 is a diagram showing an example of encoded data of a trained first network in Embodiment 3.
Figure 76 is a diagram showing an example of encoded data of a trained second network in Embodiment 3.
Figure 77 is a block diagram showing an example of the configuration of a decoding device that decodes multiple networks in Embodiment 3.
Figure 78 is a diagram showing an example of the configuration of the encoding device in Embodiment 3.
Figure 79 is a flowchart showing a first example of the encoding method by the encoding device in Embodiment 3.
Figure 80 is a flowchart showing a second example of the encoding method by the encoding device in Embodiment 3.
Figure 81 is a diagram showing an example of the configuration of the decoding device in Embodiment 3.
Figure 82 is a flowchart showing a first example of the decoding method by the decoding device in Embodiment 3.
Figure 83 is a flowchart showing a second example of the decoding method by the decoding device in Embodiment 3.
Figure 84 is a block diagram showing an example of the transmission and reception system in Embodiment 4.
Figure 85 is a flowchart showing an example of the process of encoding a three-dimensional model including viewpoint information used for learning in Embodiment 4.
Figure 86 is a flowchart showing an example of the process of utilizing multiple pieces of viewpoint information used for learning in Embodiment 4.
Figure 87 is a block diagram illustrating an example of a configuration in Embodiment 4 where viewpoint information used for learning is used for rendering for display.
Figure 88 is a flowchart illustrating an example of a processing procedure in Embodiment 4 where viewpoint information used for learning is used for rendering for display.
Figure 89 is a block diagram illustrating an example of a device in Embodiment 4 that switches between viewpoint information used for learning and the user viewpoint for rendering.
Figure 90 is a flowchart illustrating an example of a process for switching the rendering viewpoint in the device shown in Figure 89 in Embodiment 4.
Figure 91 is a diagram illustrating an example of a user interface for a user to operate the rendering viewpoint switching process shown in Figures 89 and 90 in Embodiment 4.
Figure 92 is a block diagram illustrating an example of a system configuration used for retraining a three-dimensional generative model in Embodiment 4.
Figure 93 is a diagram illustrating an example of the syntax of viewpoint information metadata in Embodiment 4.
Figure 94 is a diagram illustrating an example of the syntax of pose information in Embodiment 4.
Figure 95 is a diagram illustrating an example of the syntax of camera parameter information in Embodiment 4.
Figure 96 is a diagram illustrating an example of the storage configuration of viewport_type in Embodiment 4.
Figure 97 is a diagram illustrating an example of a method for generating and transmitting viewpoint information metadata for a three-dimensional model frame by frame in Embodiment 4.
Figure 98 is a diagram illustrating an example of the configuration of metadata in Embodiment 4 that shows the contribution used in training, in addition to the viewpoint information of the viewpoint images used for training.
Figure 99 is a diagram illustrating an example of the syntax of metadata related to the training images of a three-dimensional model in Embodiment 4.
Figure 100 is a diagram illustrating an example of the configuration of training image metadata for a three-dimensional model that transmits image information in addition to the viewpoint information of the images used for training, in Embodiment 4.
Figure 101 is a flowchart illustrating an example of the processing flow for displaying a thumbnail or the initial rendering screen in Embodiment 4.
Figure 102 is a flowchart illustrating an example of the processing flow for evaluating the quality of three-dimensional data in Embodiment 4.
Figure 103 is a block diagram illustrating a configuration in Embodiment 4 for transmitting two-dimensional video images and viewpoint information using a video-image encoding scheme.
Figure 104 shows an example of storing encoded data and viewpoint information of a three-dimensional model in a system format in Embodiment 4.
Figure 105 shows an example of the configuration of a receiving device in Embodiment 4.
Figure 106 is a flowchart showing an example of a receiving method by the receiving device in Embodiment 4.
Figure 107 shows an example of the configuration of a transmitting device in Embodiment 4.
Figure 108 is a flowchart showing an example of a transmission method by the transmitting device in Embodiment 4.
【0014】 An encoding method according to a first aspect of the present disclosure acquires encoded data of a three-dimensional generative model generated by learning about a three-dimensional space and one or more pieces of viewpoint information corresponding to one or more images used for the learning to generate the three-dimensional generative model, generates metadata indicating the one or more pieces of viewpoint information, and generates a bitstream including the encoded data of the three-dimensional generative model and the metadata.
【0015】 According to this, the encoded data of the three-dimensional generative model and the one or more pieces of viewpoint information corresponding to the one or more images used for training can be included in the same bitstream, associated with each other, and transmitted. Therefore, when the one or more pieces of viewpoint information are input to the three-dimensional generative model decoded from that bitstream to generate an image of the three-dimensional space, it becomes easier to obtain a high-precision image corresponding to a viewpoint used for training, which improves the accuracy of image generation using the three-dimensional generative model.
【0016】 An encoding method according to a second aspect of the present disclosure is the encoding method according to the first aspect, wherein the one or more pieces of viewpoint information include a plurality of pieces of viewpoint information respectively corresponding to a plurality of images used for the learning to generate the three-dimensional generative model.
【0017】 According to this, the plurality of pieces of viewpoint information respectively corresponding to the plurality of images used for training can be stored as metadata, and each piece of viewpoint information can be individually referenced on the decoding side and used for image generation using the three-dimensional generative model. As a result, images that appropriately reflect the diverse viewpoints used for training can be generated.
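The first and second aspects can be illustrated with a minimal Python sketch. The byte layout (length-prefixed JSON metadata followed by a length-prefixed model payload), the function name `build_bitstream`, and the field names `position` and `rotation` are all assumptions made for illustration; the disclosure defines its own syntax in the figures, which is not reproduced here.

```python
import json
import struct


def build_bitstream(model_payload: bytes, viewpoints: list) -> bytes:
    """Pack encoded model data and viewpoint metadata into one byte string.

    Hypothetical layout (not the syntax of the disclosure):
    [4-byte metadata length][metadata JSON][4-byte model length][model payload]
    """
    metadata = json.dumps({"viewpoints": viewpoints}).encode("utf-8")
    return (
        struct.pack(">I", len(metadata)) + metadata
        + struct.pack(">I", len(model_payload)) + model_payload
    )


# Stand-in for already-encoded model data (for example, compressed weights).
model_payload = b"\x01\x02\x03\x04"

# One entry per training image; the values are illustrative only.
viewpoints = [
    {"position": [0.0, 0.0, 2.0], "rotation": [0.0, 0.0, 0.0]},
    {"position": [2.0, 0.0, 0.0], "rotation": [0.0, 90.0, 0.0]},
]

bitstream = build_bitstream(model_payload, viewpoints)
```

Because both parts travel in one byte string, a receiver that obtains the bitstream always obtains the training viewpoints together with the model, which is the association described in 【0015】.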
【0018】 An encoding method according to a third aspect of the present disclosure is the encoding method according to the first or second aspect, wherein the three-dimensional generative model is generated by learning using one or more images obtained from one or more viewpoints indicated by the one or more pieces of viewpoint information.
【0019】 According to this, since the three-dimensional generative model is generated by learning using one or more images obtained from the one or more viewpoints indicated by the one or more pieces of viewpoint information, when the one or more pieces of viewpoint information are input to the three-dimensional generative model to generate an image of the three-dimensional space, it becomes easier to obtain an image that is consistent with the viewpoints used for learning, and the accuracy of image generation using the three-dimensional generative model can be improved.
【0020】 An encoding method according to a fourth aspect of the present disclosure is the encoding method according to any one of the first to third aspects, wherein in the generation of the bitstream, a first bitstream including the encoded data of the three-dimensional generative model and the metadata is generated, and a second bitstream different from the first bitstream and including the one or more images is generated.
【0021】 According to this, since the encoded data of the three-dimensional generative model and the metadata indicating the one or more pieces of viewpoint information are included in the same first bitstream, the decoding side can use the one or more pieces of viewpoint information associated with the three-dimensional generative model in an integrated manner simply by acquiring the first bitstream.
【0022】 An encoding method according to a fifth aspect of the present disclosure is the encoding method according to any one of the first to third aspects, wherein in the generation of the bitstream, a first bitstream containing the encoded data of the three-dimensional generative model and a second bitstream different from the first bitstream and containing the metadata and the one or more images are generated.
【0023】 According to this, the first bitstream containing the encoded data of the three-dimensional generative model is separated from the second bitstream, which contains the metadata indicating the one or more pieces of viewpoint information and the one or more images. Therefore, the decoding side can use the images used for training together with the one or more pieces of viewpoint information corresponding to those images simply by acquiring the second bitstream.
【0024】 An encoding method according to a sixth aspect of the present disclosure is the encoding method according to any one of the first to fifth aspects, wherein each of the one or more pieces of viewpoint information includes a type, and the type takes a first value indicating that the piece of viewpoint information corresponding to the type is viewpoint information recommended by the user, or a second value indicating that the piece of viewpoint information corresponding to the type corresponds to an image used for training to generate the three-dimensional generative model.
【0025】 According to this, each piece of viewpoint information can be assigned a classification, such as whether it is a recommended viewpoint or a viewpoint used for training. This allows the decoding side to easily select a representative viewpoint to present to the user, and viewpoint information based on viewpoints used for training can be prioritized as highly reliable information.
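The type field of the sixth aspect can be sketched as follows. The numeric values 0 and 1, the names `ViewpointType`, `Viewpoint`, and `pick_initial_viewpoint`, and the selection policy (prefer a recommended viewpoint, otherwise fall back to a training viewpoint) are assumptions for illustration; the disclosure only states that the type distinguishes the two cases.

```python
from dataclasses import dataclass
from enum import IntEnum


class ViewpointType(IntEnum):
    RECOMMENDED = 0  # hypothetical "first value": recommended viewpoint
    TRAINING = 1     # hypothetical "second value": viewpoint of a training image


@dataclass
class Viewpoint:
    position: tuple
    vtype: ViewpointType


def pick_initial_viewpoint(viewpoints):
    """Return a representative viewpoint for the first rendering.

    Assumed policy: prefer a recommended viewpoint, otherwise use a
    viewpoint that was used for training; return None if there is neither.
    """
    for wanted in (ViewpointType.RECOMMENDED, ViewpointType.TRAINING):
        for vp in viewpoints:
            if vp.vtype == wanted:
                return vp
    return None


views = [
    Viewpoint((0.0, 0.0, 1.0), ViewpointType.TRAINING),
    Viewpoint((1.0, 0.0, 0.0), ViewpointType.RECOMMENDED),
]
initial = pick_initial_viewpoint(views)
```

A player could use the same classification the other way round, for example restricting quality evaluation to `TRAINING` viewpoints, where the model output is expected to be most accurate.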
【0026】 An encoding method according to a seventh aspect of the present disclosure is the encoding method according to any one of the first to sixth aspects, wherein the metadata includes reliability information indicating whether each of the plurality of images was used for training to generate the three-dimensional generative model.
【0027】 According to this, the reliability information can explicitly indicate whether or not each of the plurality of images was used for training to generate the three-dimensional generative model, and the decoding side can distinguish between images that contributed to training and those that did not.
【0028】 An encoding method according to an eighth aspect of the present disclosure is the encoding method according to any one of the first to seventh aspects, wherein the metadata includes reliability information indicating the reliability of each of the plurality of images in the learning for the generation of the three-dimensional generative model.
【0029】 According to this, reliability information indicating the degree of contribution to learning can be assigned to each of the plurality of images, and the decoding side can prioritize images with high contributions when evaluating or selecting images associated with the three-dimensional generative model, thereby efficiently improving the quality of the three-dimensional generative model.
【0030】 An encoding method according to a ninth aspect of the present disclosure is the encoding method according to the seventh or eighth aspect, wherein the metadata includes identification information indicating whether or not the reliability information exists, and the identification information indicates that the reliability information exists when the type included in one piece of viewpoint information among the one or more pieces of viewpoint information indicates that the one piece of viewpoint information corresponds to an image used for training to generate the three-dimensional generative model.
【0031】 According to this, whether or not the reliability information exists can be explicitly indicated by the identification information, and for a piece of viewpoint information whose type indicates that it corresponds to an image used for training, the identification information can indicate that the reliability information exists.
【0032】 An encoding method according to a tenth aspect of the present disclosure is the encoding method according to any one of the first to ninth aspects, further comprising obtaining encoded data of an additional three-dimensional generative model corresponding to the timing following that of the three-dimensional generative model, and update information indicating whether one or more additional pieces of viewpoint information corresponding to the additional three-dimensional generative model have been updated from the one or more pieces of viewpoint information, wherein the metadata includes the update information, and the bitstream includes additional metadata indicating the one or more additional pieces of viewpoint information when the update information indicates an update.
【0033】 According to this, for each three-dimensional generative model updated in the time direction, update information indicating whether the one or more additional pieces of viewpoint information corresponding to that model have changed from the one or more pieces of viewpoint information can be included in the metadata. Therefore, additional metadata indicating the one or more additional pieces of viewpoint information needs to be included in the bitstream only when the viewpoint information has been updated, and the amount of code related to viewpoint information can be suppressed even when handling temporally consecutive three-dimensional generative models.
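The update information of the tenth aspect can be sketched in Python as a per-frame flag. The key names `viewpoints_updated` and `viewpoints`, the function name `build_frame_metadata`, and the choice to treat the first frame as an update are assumptions for illustration, not the syntax of the disclosure.

```python
def build_frame_metadata(frames):
    """Build one metadata dict per model frame, in time order.

    `frames` is a list of (model_payload, viewpoints) tuples. Viewpoints are
    written only when they differ from those of the previous frame; otherwise
    only the hypothetical `viewpoints_updated` flag is written.
    """
    metadata_per_frame = []
    previous = None
    for _payload, viewpoints in frames:
        updated = previous is None or viewpoints != previous
        entry = {"viewpoints_updated": updated}
        if updated:
            entry["viewpoints"] = viewpoints
        metadata_per_frame.append(entry)
        previous = viewpoints
    return metadata_per_frame


views_a = [{"position": [0.0, 0.0, 1.0]}]
views_b = [{"position": [1.0, 0.0, 0.0]}]

# Three consecutive models; the viewpoints change only at the third one.
frame_metadata = build_frame_metadata(
    [(b"model0", views_a), (b"model1", views_a), (b"model2", views_b)]
)
```

In this example the second frame carries only the flag, so the viewpoint list is transmitted twice instead of three times, which is the reduction in the amount of code that 【0033】 describes.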
【0034】 An encoding method according to an eleventh aspect of the present disclosure is the encoding method according to any one of the first to tenth aspects, wherein the metadata includes sequence information indicating whether the one or more pieces of viewpoint information are fixed or variable within a sequence unit; when the sequence information indicates that the one or more pieces of viewpoint information are fixed within the sequence unit, the metadata does not include one or more additional pieces of viewpoint information in the sequence unit; and when the sequence information indicates that the one or more pieces of viewpoint information change within the sequence unit, the metadata includes one or more additional pieces of viewpoint information in the sequence unit.
【0035】 According to this, for a sequence in which the sequence information indicates that the viewpoint information is fixed, the one or more additional pieces of viewpoint information are not included in the metadata, which reduces the amount of code required for viewpoint information.
【0036】 A decoding method according to a twelfth aspect of the present disclosure acquires a bitstream including encoded data of a three-dimensional generative model generated by learning about a three-dimensional space and metadata indicating one or more pieces of viewpoint information corresponding to one or more images used for the learning to generate the three-dimensional generative model, and decodes the encoded data of the three-dimensional generative model and the metadata from the bitstream to acquire the three-dimensional generative model and the one or more pieces of viewpoint information.
【0037】 According to this, the three-dimensional generative model and the viewpoint information can be obtained together from a bitstream containing the encoded data of the three-dimensional generative model and the metadata indicating the one or more pieces of viewpoint information corresponding to the one or more images used for training.
Therefore, when the one or more pieces of viewpoint information are input to the three-dimensional generative model to generate an image of the three-dimensional space, it becomes easier to obtain a highly accurate image corresponding to a viewpoint used for training, which improves the accuracy of image generation using the three-dimensional generative model.
【0038】 A decoding method according to a thirteenth aspect of the present disclosure is the decoding method according to the twelfth aspect, wherein the one or more pieces of viewpoint information include a plurality of pieces of viewpoint information respectively corresponding to a plurality of images used for the learning to generate the three-dimensional generative model.
【0039】 According to this, the plurality of pieces of viewpoint information respectively corresponding to the plurality of images used for training can be obtained from the metadata, and each piece of viewpoint information can be individually referenced on the decoding side and used for image generation using the three-dimensional generative model. As a result, images that appropriately reflect the diverse viewpoints used for training can be generated.
【0040】 A decoding method according to a fourteenth aspect of the present disclosure is the decoding method according to the twelfth or thirteenth aspect, wherein the three-dimensional generative model is generated by learning using one or more images obtained from one or more viewpoints indicated by the one or more pieces of viewpoint information.
【0041】 According to this, since the decoded three-dimensional generative model is generated by learning using one or more images obtained from the one or more viewpoints indicated by the one or more pieces of viewpoint information, when the one or more pieces of viewpoint information are input to the three-dimensional generative model to generate an image of the three-dimensional space, it becomes easier to obtain an image consistent with the viewpoints used for learning, which improves the accuracy of image generation using the three-dimensional generative model.
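The decoding side of the twelfth and thirteenth aspects can be sketched in the same spirit. The byte layout parsed here (length-prefixed JSON metadata followed by a length-prefixed model payload) and the names `parse_bitstream`, `viewpoints`, and `position` are assumptions chosen only to make the example self-contained; they are not the syntax of the disclosure.

```python
import json
import struct


def parse_bitstream(bitstream: bytes):
    """Split a bitstream into the model payload and the viewpoint list.

    Assumes the hypothetical layout
    [4-byte metadata length][metadata JSON][4-byte model length][model payload].
    """
    offset = 0
    (meta_len,) = struct.unpack_from(">I", bitstream, offset)
    offset += 4
    metadata = json.loads(bitstream[offset:offset + meta_len].decode("utf-8"))
    offset += meta_len
    (model_len,) = struct.unpack_from(">I", bitstream, offset)
    offset += 4
    model_payload = bitstream[offset:offset + model_len]
    return model_payload, metadata["viewpoints"]


# Build a tiny example bitstream in the same layout, then parse it back.
meta_bytes = json.dumps({"viewpoints": [{"position": [0.0, 0.0, 2.0]}]}).encode("utf-8")
example = (
    struct.pack(">I", len(meta_bytes)) + meta_bytes
    + struct.pack(">I", 3) + b"\x0a\x0b\x0c"
)
model_payload, viewpoints = parse_bitstream(example)
```

After parsing, each entry of `viewpoints` can be referenced individually, for example to render the decoded model from exactly the viewpoints used for training.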
【0042】 A decoding method according to a fifteenth aspect of the present disclosure is the decoding method according to any one of the twelfth to fourteenth aspects, wherein in acquiring the bitstream, a first bitstream containing the encoded data of the three-dimensional generative model and the metadata is acquired, and a second bitstream different from the first bitstream and containing the one or more images is acquired.
【0043】 According to this, the encoded data of the three-dimensional generative model and the metadata indicating the one or more pieces of viewpoint information are obtained as the same first bitstream. Therefore, the decoding side can use the one or more pieces of viewpoint information associated with the three-dimensional generative model in an integrated manner simply by obtaining the first bitstream.
【0044】 A decoding method according to a sixteenth aspect of the present disclosure is the decoding method according to any one of the twelfth to fourteenth aspects, wherein in acquiring the bitstream, a first bitstream containing the encoded data of the three-dimensional generative model and a second bitstream different from the first bitstream and containing the metadata and the one or more images are acquired.
【0045】 According to this, separately from the first bitstream including the encoded data of the three-dimensional generative model, the metadata indicating the one or more pieces of viewpoint information and the one or more images are obtained as the same second bitstream. Therefore, simply by obtaining the second bitstream, the decoding side can use the images used for learning together with the one or more pieces of viewpoint information corresponding to those images.
【0046】 The decoding method according to the seventeenth aspect of the present disclosure is a decoding method according to any one of the twelfth aspect to the sixteenth aspect, wherein each of the one or more viewpoints includes a type, and the type includes a first value indicating that the viewpoint information corresponding to the type is a viewpoint information recommended by a user, or a second value indicating that the viewpoint information corresponding to the type corresponds to an image used for learning for generation of the three-dimensional generation model. 【0047】 According to this, on the decoding side, it is possible to identify whether each viewpoint information is a recommended viewpoint or a viewpoint used for learning, and it is possible to easily select a representative viewpoint to be presented to the user. Furthermore, the viewpoint information based on the viewpoints used for learning can be preferentially made available as highly reliable information. 【0048】 The decoding method according to the eighteenth aspect of the present disclosure is a decoding method according to any one of the twelfth aspect to the seventeenth aspect, wherein the metadata includes reliability information indicating whether each of the plurality of images was used for learning for generation of the three-dimensional generation model. 【0049】 According to this, on the decoding side, it is possible to obtain, as reliability information, whether each of the plurality of images was used for learning for generation of the three-dimensional generation model, and it is possible to distinguish between images that contributed to learning and images that did not contribute. 
【0050】 The decoding method according to the nineteenth aspect of the present disclosure is a decoding method according to any one of the twelfth aspect to the eighteenth aspect, wherein the metadata includes reliability information that indicates, for each of the plurality of images, the reliability of that image in the learning for generation of the three-dimensional generative model. 【0051】 According to this method, the decoding side can obtain reliability information regarding the degree of contribution of each of the multiple images to the learning process. By prioritizing images with high contributions when evaluating or selecting images associated with the three-dimensional generative model, the quality of the three-dimensional generative model can be efficiently improved. 【0052】 A decoding method according to a 20th aspect of the present disclosure is a decoding method according to an 18th or 19th aspect, wherein the metadata includes identification information indicating whether or not the reliability information exists, and if the type included in one piece of viewpoint information among the one or more viewpoint information indicates that the one piece of viewpoint information corresponds to an image used for learning to generate the three-dimensional generative model, then the identification information indicates that the reliability information exists. 【0053】 According to this, identification information indicating whether or not reliability information exists can be managed in association with a type indicating that the viewpoint information corresponds to an image used for training. For viewpoint information that has been classified as corresponding to an image used for training, the existence of reliability information can be explicitly indicated by the identification information.
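The viewpoint type, the reliability information, and the identification information described in the seventeenth to twentieth aspects can be pictured with a small data structure. The following is only a minimal sketch under stated assumptions: the names `ViewpointType`, `Viewpoint`, `Metadata`, and `reliability_present`, the numeric type codes, and the use of a floating-point reliability score are all hypothetical and are not syntax defined by this disclosure.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional, Tuple


class ViewpointType(IntEnum):
    # Hypothetical codes; the disclosure only speaks of a "first value" and a "second value".
    RECOMMENDED = 0   # viewpoint recommended to the user
    TRAINING = 1      # viewpoint of an image used for learning


@dataclass
class Viewpoint:
    position: Tuple[float, float, float]    # viewpoint position (assumed representation)
    direction: Tuple[float, float, float]   # line-of-sight vector (assumed representation)
    vtype: ViewpointType
    reliability: Optional[float] = None     # assumed score; only meaningful for TRAINING


@dataclass
class Metadata:
    viewpoints: List[Viewpoint]

    @property
    def reliability_present(self) -> bool:
        # Identification information: signalled as present when at least one
        # viewpoint is typed as corresponding to an image used for learning.
        return any(v.vtype is ViewpointType.TRAINING for v in self.viewpoints)


def training_viewpoints_by_reliability(meta: Metadata) -> List[Viewpoint]:
    """Return the training viewpoints, highest reliability first."""
    train = [v for v in meta.viewpoints if v.vtype is ViewpointType.TRAINING]
    return sorted(train, key=lambda v: v.reliability or 0.0, reverse=True)
```

A decoder built along these lines could show the `RECOMMENDED` viewpoints as representative views and fall back to the highest-ranked `TRAINING` viewpoints when a reliable rendering is needed.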
【0054】 A decoding method according to a 21st aspect of the present disclosure is a decoding method according to any one aspect of the 12th to 20th aspects, wherein the bitstream further includes encoded data of an additional three-dimensional generative model corresponding to the next timing of the three-dimensional generative model, and update information indicating whether one or more additional viewpoint information corresponding to the additional three-dimensional generative model has been updated from the one or more viewpoint information, the metadata includes the update information, and the bitstream includes additional metadata indicating the one or more additional viewpoint information if the update information indicates an update. 【0055】According to this, a bitstream containing metadata indicating whether or not one or more additional viewpoint information corresponding to each three-dimensional generative model updated in the time direction has changed from one or more viewpoint information can be obtained. Therefore, the bitstream can be configured to include additional metadata indicating one or more additional viewpoint information only when viewpoint information is updated, and the amount of coding related to viewpoint information can be suppressed even when dealing with consecutive three-dimensional generative models. 
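The effect of the update information can be illustrated with a toy serializer that writes the additional viewpoint metadata only when the viewpoints have changed. This is a sketch under assumptions, not the bitstream syntax of this disclosure: the byte layout (length-prefixed model payload, one-byte flag, JSON-encoded viewpoints) and the function names `write_frame` and `read_frame` are hypothetical.

```python
import json
import struct


def write_frame(model_bytes, viewpoints, prev_viewpoints):
    """Serialize one time step; viewpoints are written only if they changed."""
    updated = viewpoints != prev_viewpoints
    out = bytearray()
    out += struct.pack("<I", len(model_bytes)) + model_bytes   # encoded model payload
    out += struct.pack("<B", 1 if updated else 0)              # update information
    if updated:
        meta = json.dumps(viewpoints).encode("utf-8")          # additional metadata
        out += struct.pack("<I", len(meta)) + meta
    return bytes(out)


def read_frame(buf, prev_viewpoints):
    """Parse one time step; reuse the previous viewpoints when not updated."""
    (n,) = struct.unpack_from("<I", buf, 0)
    model = buf[4:4 + n]
    pos = 4 + n
    (flag,) = struct.unpack_from("<B", buf, pos)
    pos += 1
    if flag:
        (m,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        viewpoints = json.loads(buf[pos:pos + m].decode("utf-8"))
    else:
        viewpoints = prev_viewpoints
    return model, viewpoints
```

When the viewpoints are unchanged, the frame carries a single flag byte instead of the full viewpoint list, which is the coding-amount saving described above.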
【0056】 A decoding method according to a 22nd aspect of the present disclosure is a decoding method according to any one aspect of the 12th to 21st aspects, wherein the metadata includes sequence information indicating whether the one or more viewpoint pieces of information are fixed or changeable in a sequence unit, and if the sequence information indicates that the one or more viewpoint pieces of information are fixed in a sequence unit, the metadata does not include one or more additional viewpoint pieces of information in the sequence unit, and if the sequence information indicates that the one or more viewpoint pieces of information change in a sequence unit, the metadata includes one or more additional viewpoint pieces of information in the sequence unit. 【0057】 According to this, based on sequence information indicating whether viewpoint information is fixed or changes on a sequence-by-sequence basis, sequences with fixed viewpoint information can be configured not to include one or more additional viewpoint information in the metadata, thereby reducing the amount of coding required for viewpoint information. 【0058】 An encoding device according to a 23rd aspect of the present disclosure comprises a circuit and a memory connected to the circuit, wherein the circuit, in operation, acquires encoded data of a three-dimensional generative model generated by learning about three-dimensional space and one or more viewpoint information corresponding to one or more images used for learning to generate the three-dimensional generative model, generates metadata indicating the one or more viewpoint information, and generates a bitstream including the encoded data of the three-dimensional generative model and the metadata. 【0059】According to this method, the encoded data of the three-dimensional generative model and one or more viewpoint information corresponding to one or more images used for training can be included in the same bitstream, associated, and transmitted. 
Therefore, when one or more viewpoint information is input to the three-dimensional generative model decoded using that bitstream to generate an image in three-dimensional space, it becomes easier to obtain a high-precision image corresponding to the viewpoint used for training, thereby improving the accuracy of image generation using the three-dimensional generative model. 【0060】 A decoding device according to a 24th aspect of the present disclosure comprises a circuit and a memory connected to the circuit, wherein the circuit, in operation, acquires a bitstream including encoded data of a three-dimensional generative model generated by learning about three-dimensional space and metadata indicating one or more viewpoint pieces of information corresponding to one or more images used for learning to generate the three-dimensional generative model, decodes the encoded data of the three-dimensional generative model and the metadata from the bitstream to acquire the three-dimensional generative model and the one or more viewpoint pieces of information. 【0061】 According to this method, the three-dimensional generative model and the viewpoint information can be obtained together from a bitstream containing encoded data of the three-dimensional generative model and metadata indicating one or more viewpoint information corresponding to one or more images used for training. Therefore, when inputting one or more viewpoint information into the three-dimensional generative model to generate an image in three-dimensional space, it becomes easier to obtain a highly accurate image corresponding to the viewpoint used for training, thereby improving the accuracy of image generation using the three-dimensional generative model. 
【0062】 These comprehensive or specific embodiments may be implemented as a system, integrated circuit, computer program, or recording medium such as a computer-readable CD-ROM, or as any combination of system, method, integrated circuit, computer program, and recording medium. 【0063】The embodiments will be described in detail below with reference to the drawings. Note that the embodiments described below are all specific examples of this disclosure. The numerical values, shapes, materials, components, arrangement and connection configurations of components, steps, and the order of steps shown in the following embodiments are examples only and are not intended to limit this disclosure. Furthermore, among the components in the following embodiments, those not described in the independent claim representing the highest-level concept will be described as optional components. 【0064】 (Embodiment 1) The configuration of the three-dimensional data encoding and decoding system according to this embodiment will be described. Figure 1 is a diagram showing an example of the configuration of the three-dimensional data encoding and decoding system according to this embodiment. As shown in Figure 1, the three-dimensional data encoding and decoding system includes a three-dimensional data encoding system 1001, a three-dimensional data decoding system 1002, a sensor terminal 1003, and an external connection unit 1004. 【0065】 The three-dimensional data encoding system 1001 generates encoded data or multiplexed data by encoding three-dimensional data. The three-dimensional data encoding system 1001 may be a three-dimensional data encoding device implemented by a single device, or it may be a system implemented by multiple devices. Furthermore, the three-dimensional data encoding device may include some of the multiple processing units included in the three-dimensional data encoding system 1001. 
【0066】 The three-dimensional data encoding system 1001 includes a three-dimensional data generation system 1011, a presentation unit 1012, an encoding unit 1013, a multiplexing unit 1014, an input / output unit 1015, and a control unit 1016. The three-dimensional data generation system 1011 also includes a sensor information acquisition unit 1017 and a three-dimensional data generation unit 1018. 【0067】 The sensor information acquisition unit 1017 acquires a sensor signal from the sensor terminal 1003 and outputs the sensor signal to the three-dimensional data generation unit 1018. The three-dimensional data generation unit 1018 generates three-dimensional data from the sensor signal and outputs the three-dimensional data to the encoding unit 1013. 【0068】 The presentation unit 1012 presents sensor signals or three-dimensional data to the user. For example, the presentation unit 1012 displays information or images based on sensor signals or three-dimensional data. 【0069】 The encoding unit 1013 encodes (compresses) the three-dimensional data and outputs the resulting encoded data, control information obtained during the encoding process, and other additional information to the multiplexing unit 1014. The additional information includes, for example, sensor signals. 【0070】 The multiplexing unit 1014 generates multiplexed data by multiplexing the encoded data input from the encoding unit 1013, control information, and additional information. The format of the multiplexed data is, for example, a file format for storage or a packet format for transmission. 【0071】 The input / output unit 1015 (for example, the communication unit or interface) outputs the multiplexed data to the outside. Alternatively, the multiplexed data is stored in a storage unit such as internal memory. The control unit 1016 (or the application execution unit) controls each processing unit. In other words, the control unit 1016 performs control such as encoding and multiplexing.
The control unit 1016 may also perform demultiplexing, decoding, or presentation control. 【0072】 The sensor signal may also be input to the encoding unit 1013 or the multiplexing unit 1014. Furthermore, the input / output unit 1015 may output the three-dimensional data or encoded data directly to the outside. 【0073】 The transmission signal (multiplexed data) output from the three-dimensional data encoding system 1001 is input to the three-dimensional data decoding system 1002 via the external connection unit 1004. 【0074】 The three-dimensional data decoding system 1002 generates three-dimensional data by decoding encoded data or multiplexed data. The three-dimensional data decoding system 1002 may be a three-dimensional data decoding device implemented by a single device, or it may be a system implemented by multiple devices. Furthermore, the three-dimensional data decoding device may include some of the multiple processing units included in the three-dimensional data decoding system 1002. 【0075】The three-dimensional data decoding system 1002 includes a sensor information acquisition unit 1021, an input / output unit 1022, a demultiplexing unit 1023, a decoding unit 1024, a presentation unit 1025, a user interface 1026, and a control unit 1027. 【0076】 The sensor information acquisition unit 1021 acquires sensor signals from the sensor terminal 1003. 【0077】 The input / output unit 1022 acquires the transmission signal, decodes the multiplexed data (file format or packet) from the transmission signal, and outputs the multiplexed data to the demultiplexing unit 1023. 【0078】 The demultiplexing unit 1023 acquires encoded data, control information, and additional information from the multiplexed data, and outputs the encoded data, control information, and additional information to the decoding unit 1024. 【0079】 The decoding unit 1024 reconstructs the point cloud data by decoding the encoded data. 【0080】 The presentation unit 1025 presents point cloud data to the user. 
For example, the presentation unit 1025 displays information or images based on the point cloud data. The user interface 1026 acquires instructions based on user operations. The control unit 1027 (or application execution unit) controls each processing unit. In other words, the control unit 1027 performs control such as demultiplexing, decoding, and presentation. 【0081】 The input / output unit 1022 may acquire point cloud data or encoded data directly from an external source. The presentation unit 1025 may acquire additional information such as sensor signals and present information based on that additional information. The presentation unit 1025 may also make presentations based on user instructions acquired through the user interface 1026. 【0082】 The sensor terminal 1003 generates a sensor signal, which is information obtained from the sensor. The sensor terminal 1003 is a terminal equipped with a sensor or camera, and may be, for example, a mobile object such as an automobile, an aerial object such as an airplane, a mobile terminal, or a camera. 【0083】The sensor signals obtainable by the sensor terminal 1003 include, for example, (1) signals indicating the distance between the sensor terminal 1003 and the object, or the reflectivity of the object, obtained from a LiDAR, millimeter-wave radar, or infrared sensor, and (2) signals indicating the distance between the camera and the object, or the reflectivity of the object, obtained from multiple monocular camera images or stereo camera images. The sensor signals may also include the sensor's attitude, orientation, gyroscope (angular velocity), position (GPS information or altitude), speed, or acceleration. Furthermore, the sensor signals may include temperature, atmospheric pressure, humidity, or magnetism. 【0084】 The external connection unit 1004 is implemented by an integrated circuit (LSI or IC), an external storage unit, communication with a cloud server via the internet, or broadcasting, etc. 
【0085】 Next, we will explain point cloud data. Figure 2 shows the structure of point cloud data. Figure 3 shows an example of the structure of a data file containing information about point cloud data. 【0086】 Point cloud data contains data for multiple points. Each point's data includes location information (three-dimensional coordinates) and attribute information related to that location. A collection of these points is called a point cloud. For example, a point cloud represents the three-dimensional shape of an object. 【0087】 Position information, such as three-dimensional coordinates, is sometimes called geometry. Furthermore, the data for each point may include attribute information of multiple attribute types. Attribute types include, for example, color or reflectance. 【0088】 One piece of location information may be associated with one piece of attribute information, or multiple pieces of attribute information of different attribute types may be associated with one piece of location information. Furthermore, multiple pieces of attribute information of the same attribute type may be associated with one piece of location information. 【0089】 The example data file structure shown in Figure 3 represents a case where location information and attribute information correspond one-to-one, and it shows the location information and attribute information of the N points that make up the point cloud data. 【0090】 Location information includes, for example, information for the three axes: x, y, and z. Attribute information includes, for example, RGB color information. A typical data file is a PLY file. 【0091】 Next, we will explain three-dimensional mesh data. Figure 4 shows the structure of three-dimensional mesh data. Figure 5 shows an example of the structure of a data file containing information about three-dimensional mesh data. 【0092】 Three-dimensional mesh data is a data format used in computer graphics (CG), representing the three-dimensional shape of an object through a collection of multiple pieces of surface information.
Each piece of surface information represents a polygon, such as a triangle or quadrilateral. Three-dimensional mesh data is also referred to as a polygon or polygon mesh. 【0093】 The constituent elements are a three-dimensional point cloud, vertices (multiple three-dimensional points in the three-dimensional point cloud), edges (each connecting two vertices among the multiple three-dimensional points), and faces (each enclosed by multiple edges). A three-dimensional point cloud is a set of points that contain positional information in three-dimensional space and attribute information corresponding to that positional information. Note that a three-dimensional point may simply be referred to as a point. 【0094】 A vertex may have attribute information such as color information, reflectivity, and normal vectors for a three-dimensional point. The relationships between vertices constituting an edge or face may be represented by information called connectivity. A vertex may also be expressed as a position. The front and back of a face may be represented by the direction of the normal vector for a three-dimensional point. Furthermore, a vertex may have attribute information for the face. 【0095】 One example of a mesh data file format is an object (OBJ) file. In a mesh data file like the one shown in Figure 5, the position information G(1) to G(N) and the attribute information A(1) to A(N) of the N vertices that make up the mesh are shown as vertex information. In a mesh data file, the vertex information does not necessarily have to include attribute information. 【0096】 Furthermore, attribute information does not need to have a one-to-one correspondence with vertices. The mesh data file shown in Figure 5 is an example in which the three-dimensional mesh data has M pieces of attribute information A2. 【0097】 Face information is represented by a combination of vertex indices. n[1,3,4] indicates a triangular face composed of three vertices: n=1, n=3, and n=4.
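The vertex, edge, and face relationships can be sketched as follows. This is a minimal illustration, assuming 1-based vertex indices as in the n[1,3,4] example; the `Mesh` class and its method names are hypothetical and do not correspond to any particular mesh file format or library.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Mesh:
    positions: List[Tuple[float, float, float]]   # G(1)..G(N), stored 0-based in the list
    faces: List[Tuple[int, ...]]                   # each face lists 1-based vertex indices

    def face_vertices(self, face_index: int) -> List[Tuple[float, float, float]]:
        """Look up the positions of the vertices that make up one face."""
        return [self.positions[n - 1] for n in self.faces[face_index]]

    def edges(self) -> Set[Tuple[int, int]]:
        """Derive the undirected edges (connectivity) implied by the faces."""
        result = set()
        for face in self.faces:
            for i, a in enumerate(face):
                b = face[(i + 1) % len(face)]      # wrap around to close the polygon
                result.add((min(a, b), max(a, b)))
        return result


# A triangular face n[1,3,4] over four vertices.
mesh = Mesh(
    positions=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)],
    faces=[(1, 3, 4)],
)
```

Because a face is just a tuple of indices, the same structure holds quadrilaterals or larger polygons without modification.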
【0098】 Furthermore, m[2, 4, 6] indicates that the attribute information for m=2, m=4, and m=6 from attribute information A2 correspond to the three vertices, respectively. Note that although an example of a face being formed with three vertices is shown here, a face can have any number of vertices greater than or equal to three, and is not limited to three. For example, if the face is a quadrilateral, the number of vertices is four, and if the face is a polygon, the number of vertices is equal to the number of vertices that make up the polygon. 【0099】 Furthermore, attribute information A2 may be represented in a separate file from the mesh data file, and may include pointer information to that file. For example, attribute information may be stored in a two-dimensional attribute map file, and the attribute map file name and the two-dimensional coordinates in the attribute map may be represented in attribute information A2 of the mesh data file. Thus, attribute information A2 may be included in the mesh data file or represented in a separate file from the mesh data file, and either method makes it possible to specify attribute information for three-dimensional points. 【0100】 Next, we will explain the three-dimensional model. Figure 6 is a diagram illustrating the three-dimensional model. 【0101】 A three-dimensional model is a model generated based on two-dimensional or three-dimensional data. 【0102】 The three-dimensional model learning unit 1031 generates a three-dimensional model, which is a network model in which the three-dimensional shape and attribute information corresponding to the three-dimensional shape have been learned using a Neural Network or the like, by learning two-dimensional data (two-dimensional images) or three-dimensional data (point clouds or meshes). 【0103】 The three-dimensional model learning unit 1031 may generate a three-dimensional model by learning using NeRF (Neural Radiance Fields) based on two-dimensional images. 
Alternatively, the three-dimensional model learning unit 1031 may generate a three-dimensional model after converting two-dimensional images into three-dimensional data by performing photogrammetry using the two-dimensional images. The three-dimensional model may also be generated using three-dimensional data acquired by a sensor (distance sensor). 【0104】 Three-dimensional model data consists of elements that make up a three-dimensional model and includes information that describes the structure of a network model, such as features. Three-dimensional model data includes, for example, information about the components of a neural network. This information about components includes, for example, multiple layers such as input layers, hidden layers, and output layers, nodes in each layer, weight coefficients for the nodes, and transformation functions for the nodes. 【0105】 The three-dimensional model encoding unit 1032 may encode the three-dimensional model data and transmit the encoded three-dimensional model data. 【0106】 The three-dimensional model decoding unit 1033 receives the transmitted, encoded three-dimensional model data and decodes the three-dimensional model based on the encoded three-dimensional model data. 【0107】 The rendering reconstruction unit 1034 reconstructs (generates) two-dimensional data (two-dimensional image) or three-dimensional data (point cloud or mesh) based on the decoded three-dimensional model. For example, when using a three-dimensional model modeled with NeRF, the rendering reconstruction unit 1034 acquires viewpoint position or line-of-sight vector information, generates rendered two-dimensional data (two-dimensional image) based on the three-dimensional model and the viewpoint position or line-of-sight vector, and outputs the two-dimensional data. 
The generated two-dimensional data represents a two-dimensional image of a three-dimensional object as seen from the viewpoint position, or a three-dimensional image of a three-dimensional object as seen from the line of sight indicated by the line-of-sight vector. The three-dimensional object is the three-dimensional object of the subject that was the basis for the two-dimensional data or three-dimensional data input to the three-dimensional model learning unit 1031. 【0108】 Figure 7 is a block diagram showing an example of a device that generates three-dimensional data using the Gaussian Splatting method. 【0109】 In this embodiment, the three-dimensional generative model is learned, for example, using two-dimensional images and three-dimensional data such as point clouds or meshes, and is obtained as a result of learning the three-dimensional shape and attribute information associated with the three-dimensional shape. The three-dimensional generative model may be implemented as Gaussian Splatting data, i.e., Gaussian data. The device may perform the same processing even if it considers the Gaussian data as a three-dimensional model. Alternatively, the device may perform the same processing on point clouds or meshes instead of three-dimensional models. 【0110】 As shown in Figure 7, the device comprises a sensor information acquisition unit 1035 and a three-dimensional data generation unit 1036. The sensor information acquisition unit 1035 and the three-dimensional data generation unit 1036 correspond to the sensor information acquisition unit 1017 and the three-dimensional data generation unit 1018 in Figure 1, respectively. The sensor information acquisition unit 1035 acquires three-dimensional point cloud data obtained from the sensor and outputs the three-dimensional point cloud data to the three-dimensional data generation unit 1036.
【0111】 The three-dimensional data generation unit 1036 prepares multiple grids in three-dimensional space according to the density of the acquired three-dimensional point cloud data, and projects each point onto a nearby grid. At this time, the three-dimensional data generation unit 1036 associates attribute information such as color, reflectance, and normal associated with each point with the corresponding grid. Based on the information projected onto the grid, the three-dimensional data generation unit 1036 derives parameters such as the mean and variance of the Gaussian function, and generates and outputs Gaussian Splatting data (Gaussian data) composed of multiple three-dimensional Gaussians. 【0112】 Furthermore, the sensor information acquisition unit 1035 may acquire multiple viewpoint information and multiple two-dimensional images corresponding to multiple viewpoints, and the three-dimensional data generation unit 1036 may generate Gaussian data by performing learning based on this multiple viewpoint information and multiple two-dimensional images. 【0113】In this way, the device can flexibly generate Gaussian Splatting data, which is a three-dimensional generative model representing three-dimensional space, in accordance with observational data (three-dimensional point cloud data or multiple two-dimensional images) obtained from sensors and multiple viewpoint information. 【0114】 Figure 8 is a table showing the components of Gaussian data described in PLY format. 【0115】 As shown in this diagram, attribute information corresponding to each three-dimensional Gaussian that makes up Gaussian Splatting data, i.e., Gaussian data, is represented as a component of the PLY format. Gaussian data is a collection of multiple three-dimensional Gaussians (ellipses), and each three-dimensional Gaussian consists of data such as three-dimensional point coordinates, a 3x3 covariance matrix, color, and transmittance. 
In the PLY format, Gaussian data is represented in such a way that each of these data items is associated with a field. 【0116】 The Gaussian data includes, for each three-dimensional Gaussian, data consisting of three floating-point numbers representing the three-dimensional coordinates of the three-dimensional point as Position. The Gaussian data also includes parameters representing the three-dimensional covariance matrix as Scale and Orientation(rot), where Scale represents the spread of the Gaussian and Orientation(rot) represents the orientation of the Gaussian. Furthermore, the Gaussian data includes data consisting of 48 floating-point numbers representing the coefficients of color information expressed in spherical harmonics as SH Coefficient, and data consisting of one floating-point number representing the transmittance of the Gaussian as Transparent(Opacity, Alpha). 【0117】 Thus, Gaussian data is constructed as a collection of multiple three-dimensional Gaussians, each defined using Position, Orientation (rot), Scale, SH Coefficient, and Transparent (Opacity, Alpha). Therefore, based on the reference three-dimensional coordinates and the information of orientation, extent, color, and transmittance associated with those coordinates, objects in three-dimensional space can be represented with high accuracy using the Gaussian Splatting method. 【0118】 Figure 9 is a block diagram illustrating an example of the rendering process. 【0119】 The rendering unit 1037 acquires Gaussian data and generates a three-dimensional model based on the Gaussian data. Here, the Gaussian data is data representing a three-dimensional generative model and may include information on the position, orientation, scale, spherical harmonics of color, and transparency of multiple three-dimensional Gaussians. Based on this information contained in the Gaussian data, the rendering unit 1037 reconstructs volume elements in three-dimensional space and generates a three-dimensional model.
The generated three-dimensional model may be rendered in an application such as a viewer and presented to the user. By rendering the Gaussian data, the rendering unit 1037 may not only generate a three-dimensional model but also generate a two-dimensional image as needed, thus obtaining various display forms from the same Gaussian data. 【0120】 Figure 10 is a block diagram showing an example of processing using spherical harmonics for color. 【0121】 The spherical harmonic color unit 1038 acquires an arbitrary viewpoint representing a direction in three-dimensional space and outputs the color as seen from that viewpoint. The color may include its three elements, R, G, and B. The spherical harmonic color unit 1038 calculates the color observed from an arbitrary viewpoint by using the spherical harmonic function of color on viewpoint information representing that viewpoint. 【0122】Furthermore, spherical harmonics may be used to represent not only color, but also other attribute information such as reflectance and infrared information, in which case the coefficients of the spherical harmonics become elements representing the attribute information. The device can select the level of spherical harmonics according to the attribute information to be represented or the required resolution. 【0123】 Figure 11 is a block diagram illustrating another example of the rendering process. In this example, the rendering unit 1037a generates a three-dimensional model or a two-dimensional image from Gaussian data based on an arbitrary viewpoint. 【0124】 The rendering unit 1037a acquires Gaussian data and viewpoint information indicating an arbitrary viewpoint. 
Based on the position, orientation, scale, spherical harmonic function of color, and transparency information of each three-dimensional Gaussian contained in the Gaussian data, and the input viewpoint information, the rendering unit 1037a generates a three-dimensional model as seen from that viewpoint or a two-dimensional image projected from that viewpoint. The generated three-dimensional model or two-dimensional image may be presented to the user in an application such as a viewer. As a result, the rendering unit 1037a can generate a three-dimensional model or two-dimensional image as seen from an arbitrary viewpoint based on the Gaussian data and the arbitrary viewpoint, so the user can obtain a display of a different viewpoint from the same Gaussian data by changing the viewpoint information. 【0125】 Figure 12 is a diagram illustrating the number of SH coefficients for each level of spherical harmonics. 【0126】 The number of coefficients in the spherical harmonics is determined by the resolution level, with one SH coefficient added at level 0, three at level 1, five at level 2, and seven at level 3. As a result, when using up to level 2, the total number of coefficients from level 0 to level 2 is nine, and when using up to level 3, the total number of coefficients from level 0 to level 3 is sixteen. In the Gaussian data of the above embodiment, an example is shown where up to level 3 is used as the spherical harmonics of color, and since there are 16 SH coefficients for each of the three elements of color, R, G, and B, the total number of coefficients is 48. If an even higher resolution is required, level 4 or higher may be set, and the level may be changed as appropriate depending on the attribute information to be represented and the required resolution. 【0127】 Figure 13 is a block diagram illustrating an example of a process for encoding and multiplexing Gaussian data.
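The spherical harmonic coefficient counts given for Figure 12 follow a simple rule: level l adds 2l + 1 coefficients, so using levels 0 through L gives (L + 1)^2 coefficients per color element. The short sketch below reproduces that arithmetic; the function names are hypothetical and chosen only for illustration.

```python
def sh_coeffs_at_level(level: int) -> int:
    """Number of spherical harmonic coefficients added at one level (2l + 1)."""
    return 2 * level + 1


def sh_total_coeffs(max_level: int) -> int:
    """Total coefficients for levels 0..max_level; equals (max_level + 1) ** 2."""
    return sum(sh_coeffs_at_level(level) for level in range(max_level + 1))


def sh_total_for_color(max_level: int, channels: int = 3) -> int:
    """Total coefficients when each color element (R, G, B) has its own set."""
    return channels * sh_total_coeffs(max_level)


# Levels 0..3 add 1, 3, 5 and 7 coefficients; 16 per channel, 48 for R, G and B.
per_level = [sh_coeffs_at_level(level) for level in range(4)]
```

Raising the maximum level to 4 would give 25 coefficients per color element, which shows how quickly the per-Gaussian data size grows with resolution.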
【0128】 As shown in Figure 13, the encoding unit 1045 acquires Gaussian data, encodes the Gaussian data according to a predetermined encoding scheme, and generates encoded data (bitstream). 【0129】 The multiplexing unit 1046 acquires encoded data from the encoding unit 1045 and performs multiplexing according to a predetermined multiplexing scheme. The multiplexing unit 1046 generates multiplexed data as a result of the multiplexing process and outputs the multiplexed data. The multiplexed data may be transmitted to other devices via a communication channel, or it may be stored in a storage device and used for decoding in a later stage. Thus, in the configuration shown in Figure 13, the Gaussian data is converted into a format that can be efficiently transmitted or stored as multiplexed data by passing through the encoding unit 1045 and the multiplexing unit 1046. 【0130】 Figure 14 is a block diagram showing an example of a system configuration for decoding Gaussian data and presenting it in an application. This system decodes Gaussian data from multiplexed data and presents it in the application as a three-dimensional model or a two-dimensional image. 【0131】 The demultiplexing unit 1053 acquires the multiplexed data and performs demultiplexing according to a demultiplexing scheme corresponding to the multiplexing scheme used in the multiplexing process shown in Figure 13. The demultiplexing unit 1053 extracts encoded data from the multiplexed data through the demultiplexing process and outputs it to the decoding unit 1054. 【0132】 The decoding unit 1054 acquires encoded data from the demultiplexing unit 1053, decodes the encoded data according to a decoding scheme corresponding to the predetermined encoding scheme, and reconstructs and outputs Gaussian data. As a result, the Gaussian data encoded and multiplexed in the system shown in Figure 13 is restored to its original format in the system shown in Figure 14. 
【0133】 The application unit 1055 acquires Gaussian data from the decoding unit 1054, performs rendering processing, and presents the result to the user via the presentation unit 1058. The application unit 1055 includes a rendering unit 1056, an input interface 1057, and a presentation unit 1058. 【0134】 The input interface 1057 accepts user input and acquires viewpoint information corresponding to an arbitrary viewpoint or a predetermined viewpoint. The input interface 1057 may, if necessary, acquire information specifying the area to be displayed in addition to the viewpoint information. The input interface 1057 outputs the acquired viewpoint information and area specification information to the rendering unit 1056. 【0135】 The rendering unit 1056 generates a three-dimensional model or a two-dimensional image viewed from an arbitrary viewpoint based on Gaussian data and viewpoint information supplied from the input interface 1057. For example, the rendering unit 1056 may first generate coordinate information for points constituting the three-dimensional model based on the Gaussian data, and then, when viewpoint information is input, generate a two-dimensional image viewed from the viewpoint indicated by that viewpoint information. In this case, when the user specifies an area to be displayed along with the viewpoint information, the rendering unit 1056 can generate only the two-dimensional image corresponding to the specified area, thereby reducing the amount of processing compared to generating a two-dimensional image for all areas. 【0136】 The presentation unit 1058 displays a three-dimensional model or a two-dimensional image input from the rendering unit 1056 and presents it to the user. The presentation unit 1058 may present the three-dimensional model as a three-dimensional display, or it may present the two-dimensional image generated by rendering as a two-dimensional display. 
The display mode may be set to either a three-dimensional display or a two-dimensional display depending on conditions such as the processing capacity of the device performing the rendering process and network bandwidth. For example, in a configuration with sufficient processing capacity, two-dimensional images viewed from any viewpoint can be continuously generated and presented in response to user operation, while in a configuration with limited processing capacity, the three-dimensional model may be presented as is without generating two-dimensional images. 【0137】 In the system shown in Figure 14, the multiplexed data is reconstructed as Gaussian data via the demultiplexing unit 1053 and the decoding unit 1054, and then presented to the user as a three-dimensional model or two-dimensional image viewed from any viewpoint by the rendering unit 1056 and the presentation unit 1058 in the application unit 1055. This makes it possible to use three-dimensional representations using Gaussian data flexibly and efficiently even after transmission and decoding over a communication channel. 【0138】 Next, we will explain the types of three-dimensional data. Figure 15 is a diagram illustrating the types of three-dimensional data. As shown in Figure 15, three-dimensional data includes static objects and dynamic objects. 【0139】 A static object is three-dimensional data for any given time (a specific moment). A dynamic object is three-dimensional data that changes over time. Hereafter, point cloud data for a given time will be called a PCC frame, or frame. Similarly, mesh data for any given time will be called a mesh frame, or frame. 【0140】 The object may be three-dimensional data with a somewhat limited area, like regular video data, or it may be three-dimensional data with no area limitations, like map information. 
【0141】Furthermore, there may be points of varying densities, and both sparse point cloud data (sparse mesh data) and dense point cloud data (dense mesh data) may exist. 【0142】 The details of each processing unit are described below. Sensor information is acquired by various methods, such as distance sensors like LIDAR or rangefinders, stereo cameras, or combinations of multiple monocular cameras. The three-dimensional data generation unit 1018 generates point cloud data based on the sensor information obtained by the sensor information acquisition unit 1017. The three-dimensional data generation unit 1018 generates position information (geometry information) as point cloud data and adds attribute information to the position information. 【0143】 The three-dimensional data generation unit 1018 may process point cloud data when generating position information or adding attribute information. For example, the three-dimensional data generation unit 1018 may reduce the amount of data by deleting point clouds with overlapping positions. The three-dimensional data generation unit 1018 may also transform the position information (such as shifting, rotating, or normalizing it), or process the point cloud data to generate mesh data. Furthermore, the three-dimensional data generation unit 1018 may render attribute information. 【0144】 In Figure 1, the three-dimensional data generation system 1011 is included in the three-dimensional data encoding system 1001, but it may also be provided independently outside of the three-dimensional data encoding system 1001. 【0145】 The encoding unit 1013 generates encoded data by encoding three-dimensional data based on a predetermined encoding method. The encoding methods include G-PCC (a method using positional information), V-PCC (a method using a video codec), Draco (a mesh encoding method), and V-DMC (a mesh encoding method). 
The encoding method is not limited to these methods; for example, it may be a method for encoding a dynamic mesh, or another method combining these methods. 【0146】 The decoding unit 1024 decodes the three-dimensional data by decoding the encoded data based on a predetermined encoding method. 【0147】The multiplexing unit 1014 generates multiplexed data by multiplexing the encoded data using an existing multiplexing method. The generated multiplexed data is transmitted or stored. In addition to the encoded data of the three-dimensional data, the multiplexing unit 1014 multiplexes other media such as video, audio, subtitles, applications, files, or reference time information. Furthermore, the multiplexing unit 1014 may also multiplex attribute information related to sensor information or point cloud data. 【0148】 Multiplexing methods or file formats include ISOBMFF, ISOBMFF-based transmission methods such as MPEG-DASH, MMT, MPEG-2 TS Systems, and RTP. 【0149】 The demultiplexing unit 1023 extracts encoded data of the three-dimensional data, other media, and time information from the multiplexed data. 【0150】 The input / output unit 1015 transmits the multiplexed data using a method appropriate to the transmission medium or storage medium, such as broadcasting or communication. The input / output unit 1015 may communicate with other devices via the Internet, or with storage units such as cloud servers. 【0151】 Communication protocols such as HTTP, FTP, TCP, or UDP can be used. A pull-type communication method or a push-type communication method may be used. 【0152】 Either wired or wireless transmission may be used. Wired transmission methods include Ethernet®, USB, RS-232C, HDMI®, or coaxial cable. Wireless transmission methods include wireless LAN, Wi-Fi®, Bluetooth®, or millimeter wave. 【0153】 Furthermore, broadcasting formats such as DVB-T2, DVB-S2, DVB-C2, ATSC3.0, or ISDB-S3 may be used. 
【0154】 Next, we will explain the process of dividing three-dimensional data into one or more pieces of divided three-dimensional data. Figure 16 is a diagram illustrating the encoding process of three-dimensional data. Figure 17 is a diagram illustrating the decoding process of three-dimensional data. 【0155】 As shown in Figure 16, the data division unit 1041 divides the three-dimensional data into one or more three-dimensional spaces and generates one or more pieces of divided three-dimensional data. The encoding unit 1042 may encode the one or more divided three-dimensional data to generate encoded data. The data division unit 1041 and the encoding unit 1042 may be included in a single encoding device as components of that device, or they may be included in separate devices. 【0156】 Each of the one or more three-dimensional spaces may be referred to as a tile or space. A three-dimensional space is, for example, a bounding box. The three-dimensional data contained in each of the divided three-dimensional spaces may be referred to as a slice. A slice is divided three-dimensional data and includes either a point cloud, a mesh, or a three-dimensional model having geometry or attribute information. Each of the multiple slices is encoded by the encoding unit 1042 for each component and output as encoded data. The encoded data includes the multiple encoded slices. 【0157】 As shown in Figure 17, in the decoding process, the decoding unit 1051 decodes one or more divided three-dimensional data (one or more slices) based on the encoded data. The data merging unit 1052 merges one or more divided three-dimensional data to restore (generate) three-dimensional data. The decoding unit 1051 and the data merging unit 1052 may be included in a single decoding device as components of the same device, or they may be included in separate devices. 
The one or more divided three-dimensional data decoded by the decoding unit 1051 do not have to be merged. The decoding unit 1051 may decode a portion of the divided three-dimensional data from one or more divided three-dimensional data based on a portion of the encoded data and output the decoded portion of divided three-dimensional data. In this case, the decoding device does not have to have a data merging unit 1052. 【0158】 Figure 18 is a schematic two-dimensional representation of tiles and slices of three-dimensional data. 【0159】 When encoding multiple slices, the encoding device may use dependencies between the slices or may not use dependencies. When encoding without dependencies, the encoding device can encode each slice independently, and processing time can be reduced by encoding multiple slices in parallel. Similarly, when multiple slices are encoded without dependencies, the decoding device can decode each slice independently, and processing time can be reduced by decoding multiple slices in parallel. Furthermore, the decoding device can reduce the amount of processing by performing partial decoding, which decodes only some of the multiple slices. 【0160】 When encoding using dependencies, the encoding device signals identifiers indicating dependencies and encodes the dependent data in order. When multiple slices are encoded using dependencies, the decoding device decodes them in order, starting with the dependent data, based on the identifiers. 【0161】 In the division of three-dimensional data, any number of divisions and any division method may be used. The division of three-dimensional data may be based on the shape of an object, with multiple three-dimensional points being assigned to each object. Alternatively, the division may be based on the number of three-dimensional points contained in a slice; that is, an upper limit may be set for the number of three-dimensional points in a single slice. 
Furthermore, three-dimensional data may be divided based on whether or not it is included in three-dimensional space (tile information) using map information or location information. Multiple tile shapes may overlap. 【0162】 By dividing three-dimensional data into multiple segmented three-dimensional data in this way, adaptive encoding according to the content or object, and parallel processing in decoding become possible. 【0163】 Next, we will explain how to select the three-dimensional data to present or transmit from among multiple three-dimensional data sets. 【0164】 The server stores multiple three-dimensional data sets for the same space. For example, the server stores point cloud data and mesh data for the same space. The server is an example of an encoding device. The terminal switches the three-dimensional data it acquires from the server based on its intended use and displays the switched three-dimensional data. The terminal may be, for example, a terminal for analyzing three-dimensional data. In this case, the terminal may switch the three-dimensional data it displays based on its intended use, such as analysis or display, and user operation. The terminal is an example of a decoding device. 【0165】 When switching between 3D data, the system may switch between presenting a point cloud or a mesh as 3D data. Furthermore, when switching between 3D data, the system may switch between transmitting a point cloud or a mesh as 3D data. For example, a terminal may send the user's selection result to a server, receive (download) 3D data based on that selection result from the server, and then present the received 3D data. The 3D data (point cloud or mesh) may or may not be encoded on the server. If the 3D data is encoded, the terminal may receive the encoded 3D data from the server, decode the 3D data based on the received encoded 3D data, and then present the decoded 3D data. 【0166】 Next, the configuration of the server 1070 and the terminal 1090 will be described. 
Figure 19 is a block diagram showing an example of the functional configuration of the server and the terminal. 【0167】 The server 1070 comprises a data generation unit 1071, a synchronization unit 1075, a point cloud encoding unit 1076, a mesh encoding unit 1077, a model encoding unit 1078, a multiplexing unit 1079, and a data extraction unit 1080. 【0168】 The data generation unit 1071 generates three-dimensional data based on at least one of two-dimensional data and three-dimensional data. The generated three-dimensional data includes at least two of point cloud data, mesh data, and three-dimensional model data. The data generation unit 1071 has a point cloud generation unit 1072, a mesh generation unit 1073, and a model generation unit 1074. The data generation unit 1071 only needs to have at least two of the point cloud generation unit 1072, the mesh generation unit 1073, and the model generation unit 1074. The point cloud generation unit 1072 generates point cloud data based on at least one of two-dimensional data and three-dimensional data. The mesh generation unit 1073 generates mesh data based on at least one of two-dimensional data and three-dimensional data. The model generation unit 1074 generates three-dimensional model data by machine learning based on at least one of two-dimensional data and three-dimensional data. 【0169】 The two-dimensional data input to the data generation unit 1071 may be a two-dimensional image acquired by a camera. The three-dimensional data input to the data generation unit 1071 may be, for example, point cloud data acquired by a sensor such as LiDAR in a space such as a construction site, factory, or office. The data generation unit 1071 may generate color information corresponding to each point in the point cloud data of the three-dimensional data as attribute information, using the two-dimensional image of the two-dimensional data. 
The three-dimensional data generated by the data generation unit 1071 may be divided into arbitrary spaces. The point cloud data, mesh data, and three-dimensional model data may each be divided into arbitrary spaces. 【0170】 The synchronization unit 1075 synchronizes the spatial position or time (playback time, decoding time, acquisition time, etc.) of the point cloud data, mesh data, and three-dimensional model data generated by the data generation unit 1071. Instead of synchronizing the point cloud data, mesh data, and three-dimensional model data themselves, the synchronization unit 1075 may generate synchronization information to be used for synchronization. The synchronization unit 1075 only needs to synchronize at least two types of three-dimensional data from the point cloud data, mesh data, and three-dimensional model data generated by the data generation unit 1071, or generate synchronization information (synchronization signal) for them, and does not need to perform a synchronization process for all three types of three-dimensional data. 【0171】 The point cloud encoding unit 1076 encodes the point cloud data after synchronization processing has been performed by the synchronization unit 1075. The point cloud encoding unit 1076 does not necessarily have to encode the point cloud data. The point cloud data may be pre-encoded or encoded in response to a request from the terminal 1090. 【0172】 The mesh encoding unit 1077 encodes the mesh data after synchronization processing has been performed by the synchronization unit 1075. 【0173】 The model encoding unit 1078 encodes the three-dimensional model data after synchronization processing has been performed by the synchronization unit 1075. 
【0174】 The multiplexing unit 1079 multiplexes encoded point cloud data (encoded point cloud), encoded mesh data (encoded mesh), encoded three-dimensional model data, and synchronization information using a predetermined format or predetermined multiplexing method. Note that multiplexing by the multiplexing unit 1079 is not required. In this case, the server 1070 does not need to be equipped with the multiplexing unit 1079. 【0175】 The data extraction unit 1080 extracts some of the three-dimensional data from the multiplexed three-dimensional data in response to a request from the terminal 1090, and transmits the extracted portion of the three-dimensional data to the terminal 1090. Note that data extraction by the data extraction unit 1080 is not required. In this case, the server 1070 does not need to be equipped with the data extraction unit 1080. If data extraction by the data extraction unit 1080 is not performed, the server 1070 may transmit the three-dimensional data multiplexed by the multiplexing unit 1079 to the terminal 1090. Furthermore, if multiplexing by the multiplexing unit 1079 is not performed, the server 1070 may transmit encoded point cloud data, encoded mesh data, encoded three-dimensional model data, and synchronization information to the terminal 1090, or it may transmit a bitstream containing encoded point cloud data, encoded mesh data, encoded three-dimensional model data, and synchronization information to the terminal 1090. 【0176】 The terminal 1090 comprises a control unit 1091, a decoding unit 1092, and a display unit 1093. 【0177】 The control unit 1091 sends a request to the server 1070 for some of the three-dimensional data to be presented. The control unit 1091 may also accept user input to identify some of the three-dimensional data. 【0178】 The decoding unit 1092 decodes some of the three-dimensional data based on the bitstream (encoded data) obtained from the server 1070. 
【0179】 The display unit 1093 renders and displays a portion of the decoded three-dimensional data. 【0180】 The data generation unit 1071 in Figure 19 may be implemented by the data generation unit 1110 shown in Figure 20. Figure 20 is a block diagram showing another example of the server's data generation unit. 【0181】 The data generation unit 1110 comprises a point cloud generation unit 1111, a mesh generation unit 1112, and a model generation unit 1113. 【0182】The point cloud generation unit 1111 has the same function as the point cloud generation unit 1072. The point cloud generation unit 1111 acquires point cloud data obtained from the point cloud sensor 1101 and a two-dimensional image obtained from the camera 1102, and generates point cloud data based on the point cloud data and the two-dimensional image. The point cloud data generated by the point cloud generation unit 1111 includes position information for each point and attribute information corresponding to each point indicated by the position information, and includes attribute information (such as color information) extracted from the two-dimensional image. 【0183】 The mesh generation unit 1112 generates mesh data based on the point cloud data generated by the point cloud generation unit 1111. 【0184】 The model generation unit 1113 has the same functions as the model generation unit 1074. The model generation unit 1113 acquires point cloud data obtained from the point cloud sensor 1101 and two-dimensional images obtained from the camera 1102, and generates three-dimensional model data by performing machine learning based on the point cloud data and two-dimensional images. 【0185】 Point cloud data, mesh data, and 3D model data may be generated independently, as explained in Figure 19. Mesh data may be generated from point cloud data, as explained in Figure 20. Note that point cloud data may also be generated from mesh data. 
【0186】 The mesh may be generated from the point cloud, or the point cloud may be generated from the mesh. 【0187】 The point cloud data, mesh data, and three-dimensional model data may be generated by the server 1070, or by the sensor or the terminal 1090 equipped with the sensor. The sensor is, for example, a point cloud sensor 1101 and a camera 1102. 【0188】 Next, we will explain the relationship between three-dimensional space and encoded data. Figure 21 is a diagram illustrating the relationship between three-dimensional space and encoded data. 【0189】 As mentioned above, three-dimensional data includes, for example, point cloud data, mesh data, and three-dimensional models. 【0190】 As shown in Figure 21, when three-dimensional data is divided into three pieces of divided three-dimensional data corresponding to three three-dimensional spaces (tiles or spaces), the encoding device encodes each of the three pieces and adds a header to each to create a data unit. The header signals the identifier of the space to which the encoded data of the data unit belongs (Space_ID) and the identifier of the data unit (DataUnit_ID). 【0191】 A data unit is further given a header containing information such as the data unit identifier or the data unit length, and when these are unitized, an encoding scheme unit is generated. 【0192】 Next, we will explain the syntax of the encoding scheme unit. Figure 22 shows an example of the syntax of the encoding scheme unit. Figure 23 shows an example of the syntax of the encoded point cloud. Figure 24 shows an example of the syntax of the encoded mesh. Figure 25 shows an example of the syntax of the encoded three-dimensional model. 【0193】 `unit_type` indicates the type of data unit stored in the encoding scheme unit. 【0194】 `length` indicates the length of the data unit. 【0195】 `data()` represents the body of the data unit. 
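As an illustration of the encoding scheme unit described in paragraphs 【0193】 to 【0195】, the following minimal sketch serializes and parses units made of `unit_type`, `length`, and `data()`. The field widths (a 1-byte `unit_type` and a 4-byte big-endian `length`) and the function names are assumptions chosen for this example; the description does not specify them.

```python
import struct

# Assumed layout (not specified in the description):
# 1-byte unit_type followed by a 4-byte big-endian length.
HEADER = struct.Struct(">BI")


def pack_unit(unit_type: int, data: bytes) -> bytes:
    """Serialize one encoding scheme unit: unit_type, length, then data()."""
    return HEADER.pack(unit_type, len(data)) + data


def parse_units(buf: bytes):
    """Split a byte string into a list of (unit_type, data) tuples."""
    units = []
    pos = 0
    while pos < len(buf):
        if len(buf) - pos < HEADER.size:
            raise ValueError("truncated unit header")
        unit_type, length = HEADER.unpack_from(buf, pos)
        pos += HEADER.size
        if pos + length > len(buf):
            raise ValueError("truncated unit body")
        units.append((unit_type, buf[pos:pos + length]))
        pos += length
    return units


# Placeholder payloads standing in for geometry (0), attribute (1), metadata (2).
stream = pack_unit(0, b"geom") + pack_unit(1, b"attr") + pack_unit(2, b"meta")
print(parse_units(stream))
```

Because each unit carries its own length, a reader can skip a unit whose `unit_type` it does not need without interpreting the body.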
【0196】 In Figure 23, if unit_type is 0, it indicates that the data unit is the position information (geometry) of the encoded point cloud. If unit_type is 1, it indicates that the data unit is the attribute information of the encoded point cloud. If unit_type is 2, it indicates that the data unit is the metadata of the encoded point cloud. 【0197】 In Figure 24, if unit_type is 0, it indicates that the data unit is the position information (geometry) of the encoded mesh. If unit_type is 1, it indicates that the data unit is the attribute information of the encoded mesh. If unit_type is 2, it indicates that the data unit is the metadata of the encoded mesh. 【0198】 In Figure 25, if unit_type is 0, it indicates that the data unit is element 1 of the encoded 3D model. If unit_type is 1, it indicates that the data unit is element 2 of the encoded 3D model. If unit_type is 2, it indicates that the data unit is metadata of the encoded 3D model. 【0199】 Note that the syntax shown in Figures 23 to 25 is just an example and is not limited to the above configuration. Some of the syntax components may be used, types (categories) not described above may be used, and the order of the syntax components may be changed. For example, in the syntax for an encoding scheme unit, a common encoding scheme unit configuration may be used for multiple encoding schemes as shown in Figure 22, and the unit_type, length, and data() shown in Figures 23 to 25 may be indicated in that common configuration. 【0200】 Furthermore, a header may be added to the encoding scheme unit to indicate the type of encoding scheme unit. Examples of encoding scheme unit types include point_cloud_codec_unit for point cloud data, mesh_codec_unit for mesh data, and model_codec_unit for three-dimensional model data. This makes it possible to handle multiple encoding schemes in an integrated manner. 【0201】 Figure 26 shows an example of the syntax for three-dimensional data information. 
【0202】 In the syntax, when multiple encoding schemes are stored in a single format, the number of 3D data formats (number_of_3Dformat) and the type of 3D data (format_type) contained in that format are indicated, and data for each format may be stored. This makes it possible to handle multiple encoding schemes or multiple types of 3D data in an integrated manner, and also to identify each encoding scheme or type of 3D data. 【0203】 3Ddata_info indicates the format structure information for storing multiple three-dimensional data. 【0204】 `number_of_3Dformat` indicates the number of 3D formats used. 【0205】 The `format_type` parameter indicates the format of the stored 3D data. For example, the `format_type` number and the corresponding format may be defined as follows: If `format_type` is 0, it indicates that the stored 3D data is in the format of point cloud data. If `format_type` is 1, it indicates that the stored 3D data is in the format of mesh data. If `format_type` is 2, it indicates that the stored 3D data is in the format of G-PCC data (g-pcc). If `format_type` is 3, it indicates that the stored 3D data is in the format of V-DMC data (v-dmc). If `format_type` is 4, it indicates that the stored 3D data is in the format of 3D model data (3Dmodel). 【0206】 Next, we will explain the data structure of encoded data for multiple three-dimensional data sets, categorized by the type of three-dimensional data. Figure 27 is a diagram illustrating the data structure of encoded point clouds. Figure 28 is a diagram illustrating the data structure of encoded meshes. Figure 29 is a diagram illustrating the data structure of encoded three-dimensional models. 【0207】 The encoding device divides each of the multiple types of three-dimensional data into multiple three-dimensional data for each of the multiple spatial regions, encodes each of the divided three-dimensional data (i.e., multiple divided three-dimensional data), and generates encoded data. 
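The example `format_type` assignment in paragraph 【0205】 can be written as a lookup table. The sketch below is illustrative only: the enum member names and the helper `describe_3ddata_info` are hypothetical, and only the numeric values 0 to 4 come from the description.

```python
from enum import IntEnum


class FormatType(IntEnum):
    """Example format_type values from paragraph 0205 (member names are hypothetical)."""
    POINT_CLOUD = 0
    MESH = 1
    G_PCC = 2
    V_DMC = 3
    MODEL_3D = 4


def describe_3ddata_info(format_types):
    """Build a 3Ddata_info-like summary from a list of format_type values.

    Raises ValueError if a value is not one of the example format_type values.
    """
    formats = [FormatType(value) for value in format_types]
    return {
        "number_of_3Dformat": len(formats),
        "format_type": [fmt.name for fmt in formats],
    }


# A container holding a point cloud, a mesh, and a 3D model of the same space.
print(describe_3ddata_info([0, 1, 4]))
```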
【0208】 Each encoded data entry is assigned a header containing at least one of the data_unit_id and space_id. 【0209】 Here, `data_unit_id` is an identifier that identifies a data unit within the encoded data, and is unique within the encoded data. `space_id` indicates the identification information of a spatial region. If `data_unit_id` or `space_id` is common across multiple 3D data sets, the same value will be shown across those sets. 【0210】 In the examples in Figures 27 to 29, the data unit with data_unit_id=0 in the encoded point cloud, the data unit with data_unit_id=3 in the encoded mesh, and the data unit with data_unit_id=0 in the encoded three-dimensional model are all assigned space_id=1. This means that they are three-dimensional data contained in a common three-dimensional space indicated by space_id #1. 【0211】 The data and headers may be contained in a bitstream structure such as a data unit or an encoding scheme unit, or they may be stored in a predetermined file format, for example in the boxes of ISOBMFF. 【0212】 Next, we will explain three-dimensional spatial information. Figure 30 is a two-dimensional diagram showing an example of multiple three-dimensional spaces. Figure 31 is a diagram showing an example of a bounding box. Figure 32 is a diagram showing an example of the syntax of three-dimensional spatial information. 【0213】 In the syntax for three-dimensional spatial information, 3Dspace_info represents information indicating a divided three-dimensional space. 3Dspace_info can be used for partial decoding. 【0214】 `number_of_space` indicates the number of partitions in the three-dimensional space. 【0215】 `space_id` indicates the identifier of the partitioned three-dimensional space. 【0216】 The three-dimensional spatial information includes bounding box information as information for defining the bounding box shown in Figure 31. 【0217】 Bounding box information includes bounding_box_xyz and bounding_box_whd. 
【0218】 `bounding_box_xyz` indicates the coordinates of the reference point of the bounding box. In the example in Figure 31, it is represented, for example, by the coordinate values of x, y, and z (x0, y0, z0). 【0219】 `bounding_box_whd` indicates the size of the bounding box. In the example in Figure 31, it is represented, for example, by width w, height h, and depth d (w0, h0, d0). 【0220】 Furthermore, the three-dimensional spatial information may include an identifier for each data unit of the encoded data. However, the three-dimensional spatial information does not necessarily have to include such an identifier. In other words, the identifier does not need to be signaled. 【0221】 The `pointcloud_id` indicates the identifier of the data unit of the encoded point cloud in the space corresponding to the `space_id`. 【0222】 The mesh_id indicates the identifier of the data unit of the encoded mesh in the space corresponding to the space_id. 【0223】 The `model_id` indicates the identifier of the data unit of the encoded 3D model for the space corresponding to `space_id`. 【0224】 Furthermore, if a data unit does not have a space_id but does have a data_unit_id, the identifier of the data unit for each encoded data may be stored in the information indicating each space of the three-dimensional spatial information. This allows the three-dimensional spatial information to be associated with the divided three-dimensional encoded data. 【0225】 Furthermore, if a space_id is indicated in the data unit, the space_id may be used to associate the three-dimensional spatial information with the data unit identifier for each encoded data. In this case, the data unit identifier for each encoded data does not need to be stored. 
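The two ways of associating three-dimensional spatial information with data units described in paragraphs 【0224】 and 【0225】 can be sketched as follows. This is a minimal illustration under assumptions: the class names, the dictionary keys `point_cloud`, `mesh`, and `model`, and the function `units_for_space` are hypothetical, and the sample identifiers follow the example values given for Figures 27 to 29.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class DataUnit:
    data_unit_id: int
    space_id: Optional[int] = None  # may be absent from the data unit header


@dataclass(frozen=True)
class SpaceEntry:
    """One entry of 3Dspace_info (Python representation is an assumption)."""
    space_id: int
    bounding_box_xyz: Tuple[float, float, float]
    bounding_box_whd: Tuple[float, float, float]
    pointcloud_id: Optional[int] = None
    mesh_id: Optional[int] = None
    model_id: Optional[int] = None


def units_for_space(space: SpaceEntry,
                    encoded: Dict[str, List[DataUnit]]) -> Dict[str, Optional[int]]:
    """Return the data_unit_id belonging to `space` for each type of encoded data."""
    stored = {"point_cloud": space.pointcloud_id,
              "mesh": space.mesh_id,
              "model": space.model_id}
    result: Dict[str, Optional[int]] = {}
    for kind, units in encoded.items():
        result[kind] = None
        for unit in units:
            if unit.space_id is not None:
                # The data unit carries space_id, so match on it directly.
                match = unit.space_id == space.space_id
            else:
                # Otherwise fall back to the identifier stored in 3Dspace_info.
                match = stored.get(kind) is not None and unit.data_unit_id == stored[kind]
            if match:
                result[kind] = unit.data_unit_id
                break
    return result


box = ((0.0, 0.0, 0.0), (10.0, 10.0, 10.0))

# Case 1: data units carry space_id; no per-type identifiers are stored.
with_space_id = {"point_cloud": [DataUnit(0, 1), DataUnit(1, 2)],
                 "mesh": [DataUnit(3, 1)],
                 "model": [DataUnit(0, 1)]}
print(units_for_space(SpaceEntry(1, *box), with_space_id))

# Case 2: data units carry only data_unit_id; identifiers are stored per space.
without_space_id = {"point_cloud": [DataUnit(0), DataUnit(1)],
                    "mesh": [DataUnit(2), DataUnit(3)],
                    "model": [DataUnit(0)]}
print(units_for_space(SpaceEntry(1, *box, pointcloud_id=0, mesh_id=3, model_id=0),
                      without_space_id))
```

Both cases resolve to the same data units, which is why the per-type identifiers can be omitted from the spatial information whenever the data unit headers carry `space_id`.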
【0226】 The three-dimensional spatial information of the point cloud data and that of the mesh data may be made common by using the same division method, the same origin for each divided space, and the same bounding box size for both the mesh data and the point cloud data. Alternatively, one and the same three-dimensional spatial information may be used for both the point cloud data and the mesh data. In this way, three-dimensional spatial information may be made common among multiple different types of three-dimensional data, or the same three-dimensional spatial information may be used. Making the three-dimensional spatial information common makes it easier to switch between different types of three-dimensional data (for example, to switch presentation or transmission). Furthermore, in a format that handles multiple types of three-dimensional data in an integrated manner, three-dimensional spatial information need not be provided separately for each type; a single set can be shared by all of them, which reduces the data amount of the three-dimensional spatial information. 【0227】 Likewise, the three-dimensional spatial information of the three-dimensional model may be made common with that of other types of three-dimensional data such as point cloud data and mesh data, or the same three-dimensional spatial information may be shared with those other types. 【0228】 Next, the relationship between the data structure of three-dimensional data and partial decoding is described. Figure 33 is a flowchart showing an example of partial decoding. Figure 34 shows an example of a three-dimensional spatial region targeted for partial decoding. Figure 35 shows an example of the data structure of an encoded point cloud to be partially decoded. Figure 36 shows an example of the data structure of an encoded mesh to be partially decoded.
Figure 37 shows an example of the data structure of an encoded three-dimensional model to be partially decoded. 【0229】 In partial decoding, the decoding device first determines the three-dimensional spatial region to be partially decoded (S1001). 【0230】 Next, the decoding device uses the three-dimensional spatial information (`3Dspace_info`) to identify, from the bounding box information of the multiple three-dimensional spaces, the spaces that overlap the target three-dimensional spatial region, and obtains the `space_id` of each identified space (S1002). 【0231】 Next, the decoding device obtains the data units having the obtained `space_id` from the encoded data and decodes them (S1003). In this way, the decoding device performs partial decoding, that is, it decodes only a portion of the three-dimensional data rather than all of it. 【0232】 For example, as shown in Figure 34, if the three-dimensional spatial region to be partially decoded is the region indicated by the thick line, the `space_id` of the three-dimensional space to be obtained is determined from the three-dimensional spatial information to be 2. 【0233】 Then, as shown in Figures 35 to 37, the data units associated with `space_id=2` are obtained from the encoded data of the multiple types of three-dimensional data and decoded. 【0234】 The decoding device may also obtain a data unit identifier instead of a `space_id` from the three-dimensional spatial information and then obtain the data unit having that identifier to perform partial decoding. 【0235】 In the above embodiment, point cloud data, mesh data, and three-dimensional model data were given as examples of three-dimensional data representing a three-dimensional object, but the three-dimensional data is not limited to these.
For example, a three-dimensional object may be represented by multiple sets, each containing line-of-sight information indicating a line of sight and a two-dimensional image of the three-dimensional object as seen along that line of sight. In other words, data containing such multiple sets may be treated as a type of three-dimensional data. Furthermore, the three-dimensional data may be in other formats, such as Gaussian splatting data. 【0236】 Figure 38 is a diagram showing an example of the configuration of a decoding device. Figure 39 is a flowchart showing an example of a decoding method performed by the decoding device. 【0237】 The decoding device 1130 includes a circuit 1131 and a memory 1132 connected to the circuit 1131. 【0238】 Circuit 1131 performs the following operations. 【0239】 Circuit 1131 acquires encoded data including encoding scheme information (format) indicating one encoding scheme that includes first data representing a three-dimensional object and second data representing the three-dimensional object, and identification information indicating the three-dimensional space containing the three-dimensional object (S1021). Next, circuit 1131 decodes the first data and the second data corresponding to the three-dimensional space based on the encoded data (S1022). Next, circuit 1131 renders the first data to generate first presentation data for presentation (S1023). Next, circuit 1131 renders the second data to generate second presentation data for presentation (S1024). Next, circuit 1131 switches from the generated second presentation data to the first presentation data and presents it (S1025). The first presentation data and the second presentation data are, for example, two-dimensional data or three-dimensional data generated by the rendering reconstruction unit 1034.
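The order of steps S1021 to S1025 can be sketched as follows. This is only an illustration under stated assumptions: the function `decode_and_present`, the dictionary keys, and the toy decoder, renderer, and presenter are all hypothetical stand-ins, not the interface of circuit 1131.

```python
from typing import Any, Callable, Dict, List


def decode_and_present(
    encoded: Dict[str, Any],
    decode: Callable[[bytes, int], Any],
    render: Callable[[Any], Any],
    present: Callable[[Any], None],
) -> None:
    """Follow steps S1021 to S1025 for one three-dimensional space."""
    # S1021: the acquired encoded data carries the scheme information, the
    # space identifier, and both payloads (the dictionary keys are hypothetical).
    space_id = encoded["space_id"]
    # S1022: decode the first and second data for that same space.
    first = decode(encoded["first"], space_id)
    second = decode(encoded["second"], space_id)
    # S1023 and S1024: render each representation into presentation data.
    first_presentation = render(first)
    second_presentation = render(second)
    # S1025: present the second presentation data, then switch to the first.
    present(second_presentation)
    present(first_presentation)


# Toy stand-ins so the flow can be run end to end.
shown: List[str] = []
decode_and_present(
    {"format": "integrated", "space_id": 1, "first": b"point cloud", "second": b"mesh"},
    decode=lambda payload, space_id: f"{payload.decode()} in space {space_id}",
    render=lambda data: f"view of {data}",
    present=shown.append,
)
print(shown)  # ['view of mesh in space 1', 'view of point cloud in space 1']
```

Both payloads are decoded against the same space identifier, which is the property that lets the switch happen without a spatial offset between the two presentations.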
【0240】 According to this method, first and second presentation data are generated from first and second data corresponding to the three-dimensional space, and the presentation is switched from the second presentation data to the first presentation data. The presentation can therefore be carried out without spatial misalignment when switching between the two data representing a three-dimensional object, so the first and second presentation data can be presented appropriately. 【0241】 For example, the first data is point cloud data representing the three-dimensional object. 【0242】 Therefore, by switching from the second presentation data to the first presentation data, which is based on point cloud data, the presentation can be switched without spatial misalignment between the two data representing the three-dimensional object. 【0243】 For example, the second data is mesh data representing the three-dimensional object. 【0244】 Therefore, by switching from the second presentation data, which is based on mesh data, to the first presentation data, the presentation can be switched without spatial misalignment between the two data representing the three-dimensional object. 【0245】 For example, the second data is three-dimensional model data representing the three-dimensional object. The three-dimensional model data represents a machine learning model obtained by machine learning on multiple sets of a line of sight and a two-dimensional image. 【0246】 Therefore, by switching from the second presentation data, which is based on the three-dimensional model data, to the first presentation data, the presentation can be switched without spatial misalignment between the two data representing the three-dimensional object. 【0247】 For example, the second data is a two-dimensional image of the three-dimensional object as viewed from a predetermined line of sight.
【0248】 Therefore, by switching from the second presentation data, which is based on a two-dimensional image, to the first presentation data, the presentation can be switched without spatial misalignment between the two data representing the three-dimensional object. 【0249】 For example, the circuit further acquires, from the user, a request to switch the presentation data. In the presentation, the circuit switches from the second presentation data to the first presentation data in response to the switching request. 【0250】 Therefore, the switch can be performed at a time specified by the user. 【0251】 For example, the circuit further accepts an operation from the user to change the presentation mode. In the presentation, the circuit changes the presentation mode in response to the operation and switches from the second presentation data to the first presentation data in accordance with the change. 【0252】 Therefore, the switch can be performed at a timing that corresponds to the user's operation. 【0253】 For example, in the acquisition, the circuit acquires the encoded data from an encoding device via a communication network. In the presentation, the circuit switches from the second presentation data to the first presentation data according to the bandwidth of the communication network and presents it. 【0254】 Therefore, the switch can be performed according to the bandwidth of the communication network. For example, when the bandwidth of the communication network changes from below a predetermined bandwidth to above it, the presentation can be switched from the second presentation data to the first presentation data. 【0255】 For example, in the presentation, the circuit switches from the second presentation data to the first presentation data according to the available capability of the circuit. 【0256】 Therefore, the switch can be performed according to the available capability of the circuit.
For example, when the available capability of the circuit changes from below a predetermined capability to above it, the presentation can be switched from the second presentation data to the first presentation data. 【0257】 For example, the encoded data includes synchronization information for synchronizing the coordinate system of the first data and the coordinate system of the second data. In the presentation, the circuit presents the first presentation data and the second presentation data based on the synchronization information. 【0258】 Therefore, the coordinate systems of the first and second presentation data can be aligned before the switch from the second presentation data to the first presentation data is performed. This allows a smoother transition between the two data representing a three-dimensional object, with less spatial misalignment. 【0259】 For example, the circuit further determines whether or not to synchronize the coordinate system of the first data and the coordinate system of the second data. If the circuit determines that the two coordinate systems should be synchronized, the circuit presents, in the presentation, the first presentation data and the second presentation data based on the synchronization information. 【0260】 Therefore, synchronization can be performed when necessary and skipped when unnecessary, which may reduce the processing load. 【0261】 For example, the first data and the second data have a common structure. 【0262】 Therefore, the data amount of the encoded data can be reduced, and consequently the required communication capacity can be reduced. 【0263】 For example, the encoded data includes spatial information for identifying the three-dimensional space containing the three-dimensional object. The circuit further acquires a target region indicating a part of the three-dimensional space.
Based on the spatial information, the circuit identifies first overlapping data, which is the part of the first data that overlaps the target region. In the decoding, the circuit decodes the identified first overlapping data. 【0264】 Therefore, for example, by acquiring only the first overlapping data, the amount of data to be acquired can be reduced, and thus the required communication capacity can be reduced. Also, for example, only the first overlapping data needs to be decoded, so the processing load can be reduced. 【0265】 Furthermore, circuit 1131 may operate according to the decoding method shown in the flowchart of Figure 40. Figure 40 is a flowchart of another example of a decoding method performed by the decoding device. 【0266】 Circuit 1131 decodes encoding scheme information indicating a second encoding scheme that is different from a first encoding scheme of first data representing a three-dimensional object (S1031). Circuit 1131 decodes second data of the second encoding scheme indicated by the encoding scheme information (S1032). The second data is used to generate second presentation data for presentation. 【0267】 According to this, because the second data of the second encoding scheme indicated by the decoded encoding scheme information is decoded, second data for generating appropriate second presentation data can be obtained. 【0268】 Figure 41 is a diagram showing an example of the configuration of an encoding device. Figure 42 is a flowchart showing an example of an encoding method performed by the encoding device. 【0269】 The encoding device 1140 includes a circuit 1141 and a memory 1142 connected to the circuit 1141. 【0270】 Circuit 1141 performs the following operations. 【0271】 Circuit 1141 generates encoding scheme information indicating a second encoding scheme that is different from a first encoding scheme of first data representing a three-dimensional object (S1041).
Circuit 1141 generates second data of the second encoding scheme indicated by the encoding scheme information (S1042). Circuit 1141 generates a bitstream containing the encoding scheme information and the second data (S1043). The second data is used to generate second presentation data for presentation. 【0272】 According to this, because the bitstream contains the encoding scheme information and the second data, a decoding device that has acquired the bitstream can obtain second data for generating appropriate second presentation data. 【0273】 Figure 43 is a block diagram showing an example of a device that generates format data from data. This device comprises an encoding unit 1143 and a formatting unit 1144. 【0274】 The encoding unit 1143 encodes data consisting of moving images, audio, three-dimensional data (point cloud, mesh, 3D Gaussian splatting, NeRF), neural network models, metadata, SEI, and the like according to a predetermined encoding scheme and generates encoded data. The encoding unit 1143 outputs the generated encoded data to the formatting unit 1144 as an encoded data bitstream. In addition to encoded video and audio data, the encoded data may also include metadata such as parameter sets, control information, and SEI. The encoded data may be stored in encoding units, each of which is treated as a processing unit within the encoded data bitstream. 【0275】 The formatting unit 1144 formats the input encoded data according to a predetermined system format and outputs it as format data.
The formatting unit 1144 multiplexes the encoded data according to, for example, a file format for storage compliant with ISOBMFF or a packet format for transmission compliant with RTP, and can store multiple types of media and related metadata, such as video, audio, three-dimensional data (point cloud, mesh, 3D Gaussian splatting, NeRF), subtitles, application data, file information, reference time information, sensor information, and camera information, in the same format data as needed. The format data may be transmitted externally via input/output means equipped with a communication interface or user interface, or it may be stored in internal memory or storage means. 【0276】 Figure 44 is a block diagram showing an example of a device that restores the original data from format data. 【0277】 The inverse formatting unit 1145 analyzes the input format data according to a predetermined system format, extracts the encoded data stored in the format data, and outputs it as encoded data. The inverse formatting unit 1145 may also extract encoded data according to formats such as ISOBMFF, MPEG-DASH, MMT, AVI, MPEG-2 TS, RTP, glTF, and USD. 【0278】 The decoding unit 1146 decodes the encoded data input from the inverse formatting unit 1145 according to a predetermined encoding scheme and outputs data consisting of video, audio, three-dimensional data (point cloud, mesh, 3D Gaussian splatting, NeRF), neural network models, metadata, and the like. The output data may be used by applications. 【0279】 Figure 45 is a conceptual diagram illustrating an example of how an encoded data bitstream is stored in a system format. 【0280】 As shown in this figure, the encoded data bitstream includes not only encoded video and audio data, but also supplementary information such as parameter sets and SEI. The encoded data, parameter sets, and SEI are grouped together as encoding units and treated as NAL units or data units.
Examples of these encoding units include the Video NAL unit in video encoding, the TLV unit in G-PCC, and the V3C unit for storing video-based volumetric media, but other types of units and data units may also be used. These encoding units are stored in a system format compliant with ISOBMFF or a transmission format compliant with RTP. Furthermore, the metadata (parameter sets, control information, SEI) of the encoded data described in this embodiment, a part of the metadata, the syntax structure, and the header of the encoded data may be defined as a box structure. These boxes may be contained in the Movie box "moov" or in the Media Data box "mdat". If the metadata is per frame, a new metadata track may be defined and the metadata may be stored in sample entries within the metadata track. Furthermore, the encoded video and three-dimensional data described in this embodiment may also be defined as a box structure and may be stored in an mdat box as a sample or subsample. The header portion may be stored in a moov box and the payload portion in an mdat box. In addition, the SEI described in this embodiment may also be defined as a box structure and may be contained within a moov box or an mdat box. The encoded data may be stored unit by unit, parts of units may be extracted and stored, or the syntax structure may be modified as needed. Note that the above is an example of storing encoded data in ISOBMFF, and the same approach can be applied when storing encoded data in other formats. 【0281】 Figure 46 shows an example of the box structure of ISOBMFF. 【0282】 ISOBMFF is the ISO base media file format specified in ISO/IEC 14496-12, a media-independent file format standard for multiplexing and storing various media such as video, audio, and text. In ISOBMFF, a file is composed of multiple boxes, each box consisting of a type, a length, and data.
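Because every box is a length, a type, and data, the top-level boxes of a file can be walked with very little code. The following is a minimal sketch, assuming the basic ISO/IEC 14496-12 header layout (a 32-bit big-endian size followed by a four-character type, with size 1 signaling a 64-bit size and size 0 meaning "to the end"). The helper names `iter_boxes` and `make_box` are hypothetical, and the sample payloads are placeholders, not valid box contents.

```python
import struct
from typing import Iterator, Tuple


def iter_boxes(buf: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (type, payload) for each top-level box in buf."""
    offset = 0
    while offset + 8 <= len(buf):
        size, box_type = struct.unpack_from(">I4s", buf, offset)
        header = 8
        if size == 1:
            # A 64-bit size follows the type field.
            (size,) = struct.unpack_from(">Q", buf, offset + 8)
            header = 16
        elif size == 0:
            # The box extends to the end of the buffer.
            size = len(buf) - offset
        if size < header or offset + size > len(buf):
            raise ValueError("malformed box")
        yield box_type.decode("ascii"), buf[offset + header : offset + size]
        offset += size


def make_box(box_type: str, payload: bytes) -> bytes:
    """Build a box with a 32-bit size header (toy helper for the demo)."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload


# Placeholder payloads: real ftyp, moov, and mdat contents are more structured.
sample = make_box("ftyp", b"isom") + make_box("moov", b"") + make_box("mdat", b"\x00\x01")
boxes = [(box_type, len(payload)) for box_type, payload in iter_boxes(sample)]
print(boxes)  # [('ftyp', 4), ('moov', 0), ('mdat', 2)]
```

Container boxes such as moov hold further boxes in their payload, so the same function can be applied recursively to a payload to reach nested boxes.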
This embodiment shows an example that uses a File Type box "ftyp", a Movie box "moov" that stores metadata such as control information, and a Media Data box "mdat" that stores media data such as encoded data. The ftyp box uses four-character codes (4CC) to indicate the file brand and compatibility, while the moov box contains track boxes that carry media-specific information, including an hdlr box indicating the media type, an stsd box indicating decoding parameters, a tref box indicating reference relationships, and an stco box indicating the data storage location. The mdat box stores samples and subsamples of encoded data corresponding to each media track in the moov box. A sample is a unit of encoded data corresponding to an access unit or a frame at a single time instant, and a subsample is segmented data corresponding to a part of an access unit or frame. Note that the method of storing each medium in ISOBMFF is specified separately; for example, the storage method for AVC video and HEVC video is specified in ISO/IEC 14496-15, and the storage method for three-dimensional data is specified in ISO/IEC 23090-10 and ISO/IEC 23090-18. 【0283】 (Embodiment 2) This embodiment describes a method for generating a still image of a subject (three-dimensional object) viewed from an arbitrary viewpoint in a static space, using a three-dimensional data generation model, which is a trained model obtained through learning. 【0284】 Figure 47 is a diagram illustrating the training process of the three-dimensional generative model in Embodiment 2. Figure 48 is a diagram illustrating the process of generating a still image of a subject viewed from an arbitrary viewpoint using the three-dimensional generative model in Embodiment 2. 【0285】 An information processing device can generate still images from arbitrary viewpoints in a static space by acquiring a three-dimensional data generation model through learning.
For example, there are three-dimensional data generation models generated using methods such as NeRF (Neural Radiance Fields). 【0286】 During training, the information processing device acquires training data that includes, for example, an image of viewpoint A (ground truth) captured from an arbitrary viewpoint A and viewpoint information (such as camera orientation) of viewpoint A at the time the image was captured. The viewpoint information may include viewpoint A and the direction of the line of sight from viewpoint A. The information processing device inputs the viewpoint information from the training data into the three-dimensional data generation model 1401 and, using for example an evaluation function 1402, optimizes the network parameters of the three-dimensional data generation model so that the difference between the generated image of viewpoint A output from the three-dimensional data generation model 1401 and the input (ground truth) image of viewpoint A is minimized. The information processing device can obtain a more accurate three-dimensional data generation model by performing this training process with multiple training data corresponding to multiple different viewpoints. The training process is performed for the training data of each of the multiple viewpoints. In other words, the same process as the training process for viewpoint A is performed for each viewpoint. 【0287】 During generation, when the information processing device inputs viewpoint information for viewpoint B into the trained three-dimensional data generation model 1403, the model outputs a generated image of viewpoint B, and when it inputs viewpoint information for viewpoint Z, which is different from viewpoint B, the model outputs a generated image of viewpoint Z. The viewpoint information for viewpoint B may include viewpoint B and the direction of the line of sight from viewpoint B.
The viewpoint information for viewpoint Z may include viewpoint Z and the direction of the line of sight from viewpoint Z. 【0288】 In this way, by acquiring the three-dimensional data generation model 1403 through training, it is possible to generate still images viewed from an arbitrary viewpoint in a static space. However, moving images cannot be generated in this form. 【0289】 Figure 48 shows an example of a three-dimensional data generation model that generates an image from a given viewpoint when viewpoint information is input. However, the model is not limited to this example, and the data output from the three-dimensional data generation model may be in any format. For example, the three-dimensional data generation model may be a network model that outputs the three-dimensional data of the target space obtained through learning in the form of point cloud data or mesh data. This allows the user to view the target space three-dimensionally using the point cloud data or mesh data, and to measure the dimensions of objects in the target space from the point cloud data or mesh data output as three-dimensional data. 【0290】 [Example 1] Figure 49 is a diagram illustrating a method for generating moving images using the three-dimensional data generation model of Example 1 in Embodiment 2. This example describes the configuration and method of a device for encoding or decoding the three-dimensional data generation models NNt0 to NNt5 generated for times t0 to t5, but the present disclosure is not limited to this and may be applied to a device and method for encoding or decoding the three-dimensional data generation model at each time point in any given period. 【0291】 This example describes a method for generating moving images of a target object (subject) viewed from an arbitrary viewpoint using three-dimensional data generation models.
In this method, for example, as shown in Figure 49, a three-dimensional data generation model is acquired for each time point, still images of the target object viewed from an arbitrary viewpoint are generated at each time point, and a moving image is generated by arranging the generated still images in chronological order. More specifically, when generating a moving image from time t0 to t5, multiple three-dimensional data generation models NNt0 to NNt5 corresponding to times t0 to t5 are generated through learning, and the viewpoint information (such as camera orientation) of viewpoint A, from which the moving image is to be generated, is input to each of the generated models NNt0 to NNt5. As a result, the three-dimensional data generation models NNt0 to NNt5 output generated images of viewpoint A for times t0 to t5, and by connecting them chronologically, a moving image of the target object viewed from viewpoint A from time t0 to t5 can be generated. 【0292】 However, in this case, multiple three-dimensional data generation models corresponding to multiple time points must be maintained, which requires either a huge amount of storage capacity to store the data of the multiple models or a huge amount of network bandwidth to transmit that data over a network. Therefore, the data size may be reduced by encoding the multiple three-dimensional data generation models corresponding to the multiple time points, for example using NNC (Neural Network Coding) of the MPEG (Moving Picture Experts Group) standard. This disclosure describes a method for compressing this data even more efficiently. 【0293】 NNC is described in Non-Patent Document 1. 【0294】 Figure 50 shows a first example of the configuration of the encoding device of Example 1 in Embodiment 2.
【0295】 The encoding device 1420 comprises a three-dimensional data generation model acquisition unit 1421, a buffer unit 1422, and a network model encoding unit 1423. 【0296】 The three-dimensional data generation model acquisition unit 1421 acquires training data for times t0 to t5 and uses it to generate, through training, the three-dimensional data generation models NNt0 to NNt5 for times t0 to t5. The training data includes multiple viewpoint images obtained by photographing the target object from one or more viewpoint positions and in one or more line-of-sight directions, and one or more pieces of viewpoint information indicating the viewpoint positions and line-of-sight directions corresponding to the multiple viewpoint images. The viewpoint information may be the position and orientation of the camera at the time each of the multiple viewpoint images was taken. The training data is not limited to this and may further include information obtained from other sensors. For example, the training data may include point cloud data and depth images for each time point acquired using a LiDAR or ToF sensor. This makes it possible to improve the accuracy of the three-dimensional data generation model obtained through training. 【0297】 The buffer unit 1422 stores the three-dimensional data generation model at time t generated by the three-dimensional data generation model acquisition unit 1421. The buffer unit 1422 is implemented by a storage device such as memory. The three-dimensional data generation model at time t stored in the buffer unit 1422 may be used, for example, as an initial model when the three-dimensional data generation model acquisition unit 1421 acquires (generates) a three-dimensional data generation model for a time after time t through learning.
This makes it possible to improve the accuracy of the three-dimensional data generation models after time t while shortening the learning time. 【0298】 The buffer unit 1422 may store multiple three-dimensional data generation models corresponding to multiple time points. For example, a single initial model may be generated from the multiple three-dimensional data generation models stored in the buffer unit 1422 through processing such as averaging. The three-dimensional data generation model acquisition unit 1421 can acquire highly accurate three-dimensional data generation models by training the models from time t onwards starting from this initial model. If the three-dimensional data generation model acquisition unit 1421 does not refer to past three-dimensional data generation models during learning, the encoding device 1420 does not need to include the buffer unit 1422. This saves the memory that would otherwise be used as the buffer unit 1422. 【0299】 The network model encoding unit 1423 encodes the three-dimensional data generation models NNt0 to NNt5 acquired by the three-dimensional data generation model acquisition unit 1421 and outputs a bitstream. 【0300】 As the network model encoding scheme, the data size may be reduced by encoding with, for example, NNC of the MPEG standard. In other words, the network model encoding unit 1423 encodes the three-dimensional data generation models NNt0 to NNt5 using NNC and adds the encoding results to the bitstream. To put it another way, the network model encoding unit 1423 generates encoded data as the encoding result and generates a bitstream containing the encoded data. 【0301】 Specifically, the network model encoding unit 1423 first encodes the three-dimensional data generation model NNt0 at time t0 using NNC and adds the encoded result to the bitstream.
Then, the network model encoding unit 1423 encodes the three-dimensional data generation model NNt1 at time t1 using NNC and adds the encoded result to the bitstream. In this way, the network model encoding unit 1423 may reduce the code amount by sequentially encoding the three-dimensional data generation model at each time point using NNC and adding the respective encoded results to the bitstream. 【0302】 In this case, the network model encoding unit 1423 may add time information to the bitstream as metadata to indicate which time each encoded three-dimensional data generation model corresponds to. This allows the decoding device to decode and refer to the metadata contained in the bitstream, to know which time each decoded three-dimensional data generation model corresponds to, and to appropriately generate moving images of the target object from any viewpoint. 【0303】 Furthermore, the metadata may include not only time information but also information related to the acquisition (generation) of the training data or information necessary for the decoding device to generate video. 【0304】 For example, the network model encoding unit 1423 may add, as metadata, information about the frame rate of the camera used when the training data was acquired (generated). This allows the decoding device to decode the frame rate from the bitstream and set the frame rate of the generated video appropriately. 【0305】 Furthermore, the network model encoding unit 1423 may add a frame number corresponding to each time point to the bitstream as metadata instead of time information, and associate each frame number with time information using another parameter. For example, the network model encoding unit 1423 may add the time information of the first frame and the frame rate as metadata, and the decoding device may calculate the time information of each frame from that metadata, which reduces the code amount required for each frame.
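When only the time of the first frame and the frame rate are signaled, the decoding side can derive the time of frame n as t_n = t_0 + n / f, where t_0 is the first-frame time and f is the frame rate. A minimal sketch of that calculation follows; the function name `frame_times` and the 30 frames-per-second value are hypothetical, and exact fractions are used only to avoid floating-point rounding in the example.

```python
from fractions import Fraction
from typing import List


def frame_times(first_frame_time: Fraction, frame_rate: Fraction, frame_count: int) -> List[Fraction]:
    """Derive the time of each frame from the first-frame time and the frame rate."""
    if frame_rate <= 0:
        raise ValueError("frame_rate must be positive")
    return [first_frame_time + Fraction(n) / frame_rate for n in range(frame_count)]


# Six frames (t0 to t5) at an assumed 30 frames per second, starting at time 0.
times = frame_times(Fraction(0), Fraction(30), 6)
print([str(t) for t in times])  # ['0', '1/30', '1/15', '1/10', '2/15', '1/6']
```

With this scheme the bitstream carries two values for the whole sequence plus a frame number per model, rather than a full timestamp per model.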
【0306】 Furthermore, the network model coding unit 1423 may add viewpoint information from the viewpoint images used for training to the bitstream. This allows the decoding device to generate high-quality video by, for example, prioritizing the selection of viewpoints close to the viewpoint positions corresponding to the images used for training. This is because the closer the viewpoint position or time is to that used during training, the higher the potential of the three-dimensional data generation model to generate higher-quality viewpoint images. 【0307】 Figure 51 shows a first example of the configuration of the decoding device of Example 1 in Embodiment 2. 【0308】 The decoding device 1425 comprises a network model decoding unit 1426 and a rendering unit 1427. 【0309】 The network model decoding unit 1426 acquires a bitstream and, based on the acquired bitstream, decodes the three-dimensional data generation models NNt0 to NNt5 for times t0 to t5, as well as metadata such as time information. 【0310】The rendering unit 1427 uses the three-dimensional data generation models NNt0 to NNt5 decoded by the network model decoding unit 1426, as well as metadata such as time information, to generate a moving image of viewpoint A based on viewpoint information specified by the user or system. Specifically, the rendering unit 1427 inputs the viewpoint information of viewpoint A into the three-dimensional data generation model NNt0 at time t0 to generate the image IMGt0 of viewpoint A at time t0, and then inputs the viewpoint information of viewpoint A into the three-dimensional data generation model NNt1 at time t1 to generate the image IMGt1 of viewpoint A at time t1. The rendering unit 1427 applies these image generation processes at each time point to the respective times t2 to t5 to generate the images IMGt2 to IMGt5 of viewpoint A at times t2 to t5. 
The rendering unit 1427 then generates a video of the target object as viewed from viewpoint A at times t0 to t5, using images IMGt0 to IMGt5 and metadata such as time information. The video may include, for example, images IMGt0 to IMGt5 and presentation time information for calculating the presentation time of images IMGt0 to IMGt5 based on times t0 to t5. 【0311】 The viewpoint information may be changed according to the time. For example, viewpoint information for viewpoint A may be input to the three-dimensional data generation models NNt0 to NNt3 at times t0 to t3, and viewpoint information for viewpoint B may be input to the three-dimensional data generation models NNt4 to NNt5 at times t4 to t5. As a result, the rendering unit 1427 generates multiple images of the target object as seen from viewpoint A at times t0 to t3, and multiple images as seen from viewpoint B at times t4 to t5. In other words, the rendering unit 1427 can generate a moving image of the target object, in which the viewpoint switches from viewpoint A to viewpoint B at time t4. 【0312】 Furthermore, the rendering unit 1427 does not necessarily need to generate moving images; it may also generate still images of a specified viewpoint at a specified time. This allows the user to switch between generating moving images and still images depending on their needs. 【0313】Furthermore, the rendering unit 1427 is not limited to generating moving or still images from the three-dimensional data generation model. For example, the rendering unit 1427 may generate point cloud data or mesh data from the three-dimensional data generation model and output the generated point cloud data or mesh data as dynamic point cloud data or dynamic mesh data. This allows the user to view dynamic three-dimensional data of a moving target object on an HMD (Head Mount Display) or the like, and to measure the amount of movement of the target object using the dynamic three-dimensional data. 
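The per-time rendering with a viewpoint that changes over time, as described in 【0310】 and 【0311】, can be sketched as follows. This is a simplified illustration only: each model is represented by a hypothetical callable that takes viewpoint information and returns an image, and the names `render_video` and `viewpoint_at` are assumptions.

```python
def render_video(models, times, viewpoint_at):
    """Generate one image per time point.

    models: dict mapping a time index to a callable(viewpoint) -> image,
            standing in for the decoded models NNt0 to NNt5.
    times: iterable of time indices to render, in presentation order.
    viewpoint_at: callable(time) -> viewpoint information for that time.
    """
    return [(t, models[t](viewpoint_at(t))) for t in times]


# Placeholder models that only label their output (not real rendering).
models = {t: (lambda vp, t=t: f"IMG_t{t}_{vp}") for t in range(6)}

# Viewpoint A for times t0 to t3, viewpoint B for times t4 to t5.
video = render_video(models, range(6), lambda t: "A" if t < 4 else "B")
```

The returned list pairs each image with its time index, which corresponds to the presentation time information mentioned in 【0310】.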
【0314】 Figure 52 shows a second example of the configuration of the encoding device of Example 1 in Embodiment 2. 【0315】 The encoding device 1430 includes a three-dimensional data generation model acquisition unit 1431, a buffer unit 1432, a difference calculation unit 1433, and a network model encoding unit 1434. 【0316】 The three-dimensional data generation model acquisition unit 1431 is the same as the three-dimensional data generation model acquisition unit 1421 of the encoding device 1420. 【0317】 The buffer unit 1432 is similar to the buffer unit 1422 of the encoding device 1420, but differs from the buffer unit 1422 in that it inputs a three-dimensional data generation model stored in memory or the like as a reference three-dimensional data generation model to the difference calculation unit 1433. 【0318】 The difference calculation unit 1433 calculates difference information showing the difference between the three-dimensional data generation models NNt0 to NNt5 generated by the three-dimensional data generation model acquisition unit 1431 at times t0 to t5 and the three-dimensional data generation model generated by the three-dimensional data generation model acquisition unit 1431 before each time (hereinafter referred to as the reference three-dimensional data generation model). Here, the difference information may include differences in weight parameters at the nodes of each network model. For example, the difference calculation unit 1433 acquires the three-dimensional data generation model NNt5 at time t5 from the three-dimensional data generation model acquisition unit 1431 and acquires the three-dimensional data generation model NNt4 at time t4 from the buffer unit 1432 as the reference three-dimensional data generation model. 
【0319】 The difference calculation unit 1433 may use the three-dimensional data generation model NNt5 and the three-dimensional data generation model NNt4 to calculate, for example, the difference (amount of change) between the weight parameters of the nodes of the network model in the three-dimensional data generation model NNt5 and the weight parameters of the nodes of the network model in the three-dimensional data generation model NNt4, and input this difference information to the network model encoding unit 1434. As a result, the difference information is encoded by the network model encoding unit 1434. In other words, the encoding device 1430 may reduce the amount of data by performing predictive encoding, which predicts information related to the network model in the three-dimensional data generation model NNt5 from the three-dimensional data generation model NNt4 and encodes the difference from the predicted value. With such predictive encoding, for example, when the target object does not move much and there is little change in the three-dimensional data generation model over time, the value of the difference to be encoded becomes small, so encoding efficiency can be improved. For example, where RNNtn denotes the reference three-dimensional data generation model for time tn, the encoding device 1430 may set RNNt0 = 0 and RNNtn = NNt(n-1) (where n is an integer from 1 to 5), and reduce the bit size by predictive encoding using the three-dimensional data generation model of the previous time as the reference three-dimensional data generation model. 【0320】 In the second example, the encoding device 1430 predictively encodes information related to the network model in the three-dimensional data generation model NNt5 from information related to the network model in the three-dimensional data generation model NNt4, but the configuration is not necessarily limited to this. 
The encoding device 1430 may, for example, select a reference three-dimensional data generation model to be used for prediction from one or more three-dimensional data generation models stored in the buffer unit 1432, and predictively encode using the selected three-dimensional data generation model. In this case, the encoding device 1430 may add information indicating the selected three-dimensional data generation model (reference three-dimensional data generation model information) to the bitstream in order to notify the decoding device of which three-dimensional data generation model was selected. This allows the encoding device 1430 to select the optimal reference three-dimensional data generation model from the viewpoint of encoding efficiency, thereby improving encoding efficiency. Furthermore, the decoding device can appropriately decode the bitstream with improved encoding efficiency by decoding the reference three-dimensional data generation model information. 【0321】 Furthermore, when the encoding device 1430 performs predictive encoding by referring to two or more three-dimensional data generation models stored in the buffer unit 1432, it may add information indicating the two or more reference three-dimensional data generation models to the bitstream. This allows the encoding device 1430 to improve the encoding efficiency of predictive encoding by using two or more reference three-dimensional data generation models. In addition, the decoding device can appropriately decode the bitstream with improved encoding efficiency. 
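The difference calculation and the selection of a reference model from the buffer described above can be sketched as follows. This is a minimal sketch under stated assumptions: a model is reduced to a flat list of weight parameters, the reference identifiers are hypothetical, and the sum of absolute differences is used only as a stand-in for encoding efficiency (the disclosure does not specify a selection criterion).

```python
def residual(model, reference):
    """Element-wise difference between target weights and reference weights."""
    return [w - r for w, r in zip(model, reference)]


def select_reference(model, buffer):
    """Pick the buffered model whose residual has the smallest L1 cost.

    buffer maps a hypothetical reference id to a weight list. Returns
    (ref_id, residual); ref_id is None when a predicted value of 0 is
    cheapest or when the buffer is empty.
    """
    best_id = None
    best_res = list(model)  # predicted value 0: residual equals the weights
    best_cost = sum(abs(w) for w in model)
    for ref_id, ref in buffer.items():
        res = residual(model, ref)
        cost = sum(abs(d) for d in res)  # assumed proxy for code amount
        if cost < best_cost:
            best_id, best_res, best_cost = ref_id, res, cost
    return best_id, best_res


# Two buffered models; the second is closer to the target weights.
buffer = {0: [1.0, 2.0, 3.0], 1: [1.0, 2.0, 4.0]}
ref_id, diff = select_reference([1.0, 2.0, 4.5], buffer)
```

In this sketch `ref_id` plays the role of the reference three-dimensional data generation model information that is added to the bitstream, and `diff` is the difference information passed to the network model encoding unit.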
【0322】 Furthermore, in cases where no reference three-dimensional data generation model is stored in the buffer unit 1432, for example, when encoding the first three-dimensional data generation model (first frame) in data order, the encoding device 1430 may encode the three-dimensional data generation model to be processed without calculating the difference from a predicted value (hereinafter referred to as intra-prediction), or it may encode after calculating the difference with the predicted value set to 0. Also, when setting a certain time t as a random access point, the encoding device 1430 may encode the three-dimensional data generation model corresponding to time t using intra-prediction, or it may encode after calculating the difference with the predicted value set to 0. As a result, the decoding device can start decoding from the first three-dimensional data generation model (first frame) in data order, or from the random access point, thereby improving functionality during playback. 【0323】 Furthermore, a set of multiple three-dimensional data generation models (multiple frames) (hereinafter referred to as a GOF (Group of Frames)) may be defined, and the first frame of the GOF may be encoded by intra-prediction. This allows the decoding device to randomly access the first frame of the GOF, and decoding the first frame of each GOF can support functions such as fast-forward playback. 【0324】 Furthermore, the encoding device 1430 may add permission information to the bitstream indicating whether or not predictive referencing between GOFs is allowed. For example, if the bitstream contains permission information indicating that predictive referencing between GOFs is prohibited, the decoding device can determine that it can decode multiple GOFs in parallel. Conversely, allowing predictive referencing between GOFs can improve encoding efficiency. 
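Random access with GOFs can be sketched as follows. The sketch assumes a fixed GOF size, that every frame other than the first frame of a GOF refers only to the immediately preceding frame, and that predictive referencing between GOFs is prohibited. These are illustrative assumptions, and the function names are hypothetical.

```python
def is_intra(frame, gof_size):
    """The first frame of each GOF is encoded by intra-prediction."""
    return frame % gof_size == 0


def frames_to_decode(target, gof_size):
    """Frames that must be decoded, in order, to reconstruct `target`.

    Decoding starts at the intra-predicted first frame of the GOF that
    contains the target frame, so earlier GOFs are never touched.
    """
    start = target - (target % gof_size)
    return list(range(start, target + 1))


# With GOFs of 3 frames, frame 4 needs only frames 3 and 4.
needed = frames_to_decode(target=4, gof_size=3)
```

Because no frame in this sketch depends on another GOF, each GOF could also be handed to a separate worker and decoded in parallel.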
【0325】 The network model coding unit 1434 is similar to the network model coding unit 1423 of the coding device 1420, but differs in that it encodes the difference information d0 to d5 of the three-dimensional data generation models NNt0 to NNt5 input from the difference calculation unit 1433 and outputs a bitstream. 【0326】Although the encoding device 1430 is described separately as a difference calculation unit 1433 and a network model encoding unit 1434, it is not necessarily limited to this configuration. For example, the difference calculation unit 1433 may be included within the network model encoding unit 1434. In other words, the network model encoding unit 1434 may perform the processing of the difference calculation unit 1433. 【0327】 Furthermore, the encoding device 1430 may add predictive encoding information to the bitstream indicating whether the three-dimensional data generation model was encoded using intra-prediction or predictive encoding using a reference three-dimensional data generation model (hereinafter referred to as inter-prediction). This allows the decoding device to appropriately determine whether to use intra-prediction or inter-prediction to decode the three-dimensional data generation model by decoding the predictive encoding information. 【0328】 Figure 53 shows a second example of the configuration of the decoding device of Example 1 in Embodiment 2. 【0329】 The decoding device 1435 comprises a network model decoding unit 1436, an addition unit 1437, a buffer unit 1438, and a rendering unit 1439. 【0330】 The network model decoding unit 1436 acquires a bitstream and, based on the acquired bitstream, decodes the difference information d0 to d5 of the three-dimensional data generation model NNt0 to NNt5 for time t0 to t5, as well as metadata such as time information. 
【0331】 The addition unit 1437 adds, for each of times t0 to t5, the difference information d0 to d5 decoded by the network model decoding unit 1436 to the corresponding reference three-dimensional data generation model RNNt0 to RNNt5 obtained from the buffer unit 1438, and thereby calculates the three-dimensional data generation models NNt0 to NNt5. In this way, by setting RNNt0 = 0 and RNNtn = NNt(n-1) (where n is an integer from 1 to 5), the decoding device 1435 may predictively decode each three-dimensional data generation model using the three-dimensional data generation model of the previous time as the reference three-dimensional data generation model. 【0332】 In the second example, the addition unit 1437 and the network model decoding unit 1436 of the decoding device 1435 are described separately. However, the configuration is not necessarily limited to this. For example, the addition unit 1437 may be included within the network model decoding unit 1436. In other words, the network model decoding unit 1436 may perform the processing of the addition unit 1437. 【0333】 Furthermore, in cases where the buffer unit 1438 does not store a reference three-dimensional data generation model, for example, when decoding the first three-dimensional data generation model (first frame) in data order, the decoding device 1435 may decode without prediction, that is, without the addition unit 1437 adding the difference information and the reference three-dimensional data generation model (hereinafter referred to as intra-prediction), or it may decode by adding the difference information to the predicted value set to 0. Also, when a certain time t is set as a random access point, the decoding device 1435 may decode the three-dimensional data generation model corresponding to time t using intra-prediction, or it may decode by adding the difference information to the predicted value set to 0. 
In addition, if the bitstream contains predictive coding information indicating that the three-dimensional data generation model to be decoded has been encoded using intra-prediction, the three-dimensional data generation model may be decoded using intra-prediction, or it may be decoded by adding the difference information to the predicted value set to 0. This allows the decoding device 1435 to start decoding the three-dimensional data generation model from the first three-dimensional data generation model (first frame) in data order, a random access point, or a three-dimensional data generation model encoded by intra-prediction, thereby improving functionality during playback. 【0334】In the second example, the decoding device 1435 predictively decodes information related to the network model in the three-dimensional data generation model NNt5 from information related to the network model in the three-dimensional data generation model NNt4, but this is not necessarily limited to this. The decoding device 1435 may, for example, select a reference three-dimensional data generation model to be used for prediction from one or more three-dimensional data generation models stored in the buffer unit 1438, and predictively decode using the selected three-dimensional data generation model. In this case, the decoding device 1435 may decode information indicating the selected three-dimensional data generation model (reference three-dimensional data generation model information) from the bitstream. As a result, the decoding device 1435 can appropriately decode a bitstream with improved coding efficiency by decoding the reference three-dimensional data generation model information from the bitstream generated by the encoding device 1430 in which the optimal reference three-dimensional data generation model is selected from the viewpoint of coding efficiency. 
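The processing of the addition unit 1437, with RNNt0 = 0 and RNNtn = NNt(n-1), can be sketched as follows. This is a simplified illustration: a model is reduced to a flat list of weight parameters, and the optional `intra_flags` list is a hypothetical stand-in for the predictive coding information decoded from the bitstream.

```python
def decode_models(diffs, intra_flags=None):
    """Reconstruct models from decoded difference information.

    diffs: list of weight-difference lists d0, d1, ... in data order.
    intra_flags: optional list of booleans; True means the model at that
                 index is decoded with the predicted value set to 0.
    """
    models = []
    for n, diff in enumerate(diffs):
        intra = n == 0 or (intra_flags is not None and intra_flags[n])
        # Reference is 0 for intra frames, else the previously decoded model.
        reference = [0.0] * len(diff) if intra else models[-1]
        models.append([r + d for r, d in zip(reference, diff)])
    return models


# Hypothetical difference information for three time points.
decoded = decode_models([[1.0, 2.0], [0.5, -1.0], [0.0, 0.25]])
```

The list `models` acts as the buffer unit here: each reconstructed model is kept so that it can serve as the reference for the next time point.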
【0335】 Furthermore, when the decoding device 1435 performs predictive decoding by referring to two or more three-dimensional data generation models stored in the buffer unit 1438, it may decode information indicating the two or more reference three-dimensional data generation models from the bitstream. This allows the decoding device 1435 to appropriately decode a bitstream in which the coding efficiency of predictive coding has been improved using two or more reference three-dimensional data generation models. 【0336】 The rendering unit 1439 is the same as the rendering unit 1427 of the decoding device 1425. The rendering unit 1439 does not necessarily need to generate moving images; it may also generate still images of a specified viewpoint at a specified time. 【0337】[Example 2] Figure 54 is a diagram illustrating a method for generating moving images using the extended three-dimensional data generation model of Example 2 in Embodiment 2. In this embodiment, an example of the configuration and method of a device for encoding or decoding the extended three-dimensional data generation models NNt0-2 and NNt3-5, which are generated corresponding to the periods t0-t2 and t3-t5, respectively, from time t0-t5, is described. However, the invention is not limited to this, and the device and method may be applied to encoding or decoding the extended three-dimensional data generation model for any period. 【0338】 This embodiment demonstrates a method for generating moving images of a target object (subject) viewed from an arbitrary viewpoint using a three-dimensional data generation model. In this method, for example, as shown in Figure 54, a three-dimensional data generation model (hereinafter referred to as the extended three-dimensional data generation model) capable of generating images from any viewpoint within a certain time range (period) can be obtained. 
This allows for the generation of still images of the target object viewed from an arbitrary viewpoint at any time within each period, and moving images can be generated by arranging the generated still images in chronological order. The extended three-dimensional data generation model is a three-dimensional data generation model generated using a method such as NeRF, similar to the three-dimensional data generation model in Embodiment 1. 【0339】 More specifically, when generating a video from time t0 to t5, an extended three-dimensional data generation model NNt0-2 capable of representing the period from time t0 to t2 and an extended three-dimensional data generation model NNt3-5 capable of representing the period from time t3 to t5 are generated through training. The viewpoint information (camera pose, etc.) of viewpoint A from which the video is to be generated is then input to the generated extended three-dimensional data generation models NNt0-2 and NNt3-5. As a result, the extended three-dimensional data generation models NNt0-2 and NNt3-5 output generated images of viewpoint A from time t0 to t5, and by temporally stitching these together, a video from time t0 to t5 as seen from viewpoint A of the target object can be generated. 【0340】However, in this case, it is necessary to maintain an extended three-dimensional data generation model corresponding to each period (each time zone), which requires a huge amount of storage capacity to store the data of the extended three-dimensional data generation models in storage, or a huge amount of network bandwidth to transmit the data of multiple three-dimensional data generation models over a network. Therefore, the data size may be reduced by data encoding the extended three-dimensional data generation model corresponding to each period, for example, using NNC (Neural Network Coding) of the MPEG (Moving Picture Experts Group) standard. This disclosure describes a method for further efficiently compressing this data. 
【0341】 Furthermore, with the above configuration, the information processing device may generate any viewpoint image at any time within the period of time t0-t5. For example, when acquiring the extended three-dimensional data generation model NNt0-2, the information processing device generates the extended three-dimensional data generation model NNt0-2 using machine learning based on multiple viewpoint images taken at times t0, t1, and t2, and the camera poses corresponding to the multiple viewpoints, as training data. When generating a video of viewpoint A, the information processing device may generate not only viewpoint image A at times t0, t1, and t2, but also, for example, images of arbitrary viewpoints at times t0.5 and t1.5 between times t0, t1, and t2. Time t0.5 is between times t0 and t1, and time t1.5 is between times t1 and t2. 【0342】 As a result, the information processing device can generate images from an arbitrary viewpoint corresponding not only to the time corresponding to the image used during training, but also to a time shifted from the time corresponding to the image used during training, thereby enabling the generation of high-frame-rate video from viewpoint A. 【0343】 Furthermore, the information processing device may learn the extended three-dimensional data generation model NNt0-2 not only from the learning data at times t0, t1, and t2, but also, for example, from time t3. This allows for the high-precision generation of viewpoint images from any viewpoint after time t2, for example, an image of an arbitrary viewpoint at time t2.5. 【0344】 Furthermore, the information processing device may learn not only the learning data corresponding to times t3, t4, and t5, but also, for example, the learning data corresponding to times t2 and t6, as training data for the extended three-dimensional data generation model NNt3-5. 
This allows the information processing device to generate images of arbitrary viewpoints before time t3, or images of arbitrary viewpoints after time t5, with high accuracy. In addition, at a switching point between extended three-dimensional data generation models, for example, when generating a viewpoint image at time t2.5, which lies between times t2 and t3 where the extended three-dimensional data generation model NNt0-2 and the extended three-dimensional data generation model NNt3-5 switch, the information processing device may generate a viewpoint image at time t2.5 with each of the extended three-dimensional data generation model NNt0-2 and the extended three-dimensional data generation model NNt3-5, and use the average image of the two generated viewpoint images as the viewpoint image at time t2.5. This enables the generation of a viewpoint image at time t2.5 with high accuracy. 【0345】 In this way, the information processing device can generate an image of the target object as seen from a specified viewpoint at a specified time, by inputting viewpoint information and a time within the period to which the extended three-dimensional data generation model corresponds into the extended three-dimensional data generation model. 【0346】 Figure 55 is a diagram showing a first example of the configuration of the encoding device of Example 2 in Embodiment 2. 【0347】 The encoding device 1450 comprises an extended three-dimensional data generation model acquisition unit 1451, a buffer unit 1452, and a network model encoding unit 1453. 【0348】 The extended three-dimensional data generation model acquisition unit 1451 acquires training data for each of the periods t0-t2 and t3-t5 within times t0 to t5, and uses the acquired training data for each period to generate the extended three-dimensional data generation model NNt0-2 for period t0-t2 and the extended three-dimensional data generation model NNt3-5 for period t3-t5 through training. 
The training data includes multiple viewpoint images obtained by photographing the target object from one or more viewpoint positions in one or more line of sight directions for each time t0-t5, and one or more viewpoint information indicating one or more viewpoint positions and one or more line of sight directions corresponding to the multiple viewpoint images. The one or more viewpoint information may be the position and orientation of the camera at the time each of the multiple viewpoint images was taken. The training data is not limited to this and may further include information obtained from other sensors. For example, the training data may include point cloud data and depth images for each time period acquired using a LiDAR or TOF sensor. This makes it possible to improve the accuracy of the extended three-dimensional data generation model obtained through training. 【0349】 The buffer unit 1452 stores the extended three-dimensional data generation model for the period tm-n, from time tm (where m is an integer) to time tn (where n is an integer greater than m), which is generated by the extended three-dimensional data generation model acquisition unit 1451. The buffer unit 1452 is implemented by a storage device such as memory. The extended three-dimensional data generation model for the period tm-n stored in the buffer unit 1452 may be used, for example, as an initial model when the extended three-dimensional data generation model acquisition unit 1451 acquires (generates) an extended three-dimensional data generation model for a period after tm-n through learning. This makes it possible to improve the accuracy of the extended three-dimensional data generation model for a period after tm-n while shortening the learning time. 【0350】The buffer unit 1452 may store multiple extended three-dimensional data generation models corresponding to multiple periods. 
For example, based on the multiple extended three-dimensional data generation models stored in the buffer unit 1452, a single initial model may be generated by processing such as averaging. The extended three-dimensional data generation model acquisition unit 1451 can acquire a highly accurate extended three-dimensional data generation model by learning the extended three-dimensional data generation model for periods after period tm-n using this initial model. If the extended three-dimensional data generation model acquisition unit 1451 does not refer to the extended three-dimensional data generation model for past periods during learning, the encoding device 1450 does not need to have the buffer unit 1452. This reduces the amount of memory used as the buffer unit 1452. 【0351】 The network model encoding unit 1453 encodes the extended three-dimensional data generation models NNt0-2 and NNt3-5 acquired by the extended three-dimensional data generation model acquisition unit 1451 and outputs a bitstream. 【0352】 Furthermore, as a network model coding scheme, the data size may be reduced by data coding using, for example, the NNC of the MPEG standard. In other words, the network model coding unit 1453 encodes the extended three-dimensional data generation models NNt0-2 and NNt3-5 using NNC and adds the encoding results to the bitstream. To put it another way, the network model coding unit 1453 generates encoded data as an encoding result and generates a bitstream containing the encoded data. 【0353】Specifically, the network model coding unit 1453 first encodes the extended three-dimensional data generation model NNt0-2 for period t0 to t2 using NNC and adds the encoded result to the bitstream. Next, the network model coding unit 1453 encodes the extended three-dimensional data generation model NNt3-5 for period t3 to t5 using NNC and adds the encoded result to the bitstream. 
In this way, the network model coding unit 1453 may reduce the amount of code by sequentially encoding the extended three-dimensional data generation model for each period using NNC and adding the respective encoded results to the bitstream. 【0354】 In this case, the network model encoding unit 1453 may add time information as metadata to the bitstream to indicate which period the encoded extended three-dimensional data generation model corresponds to. This allows the decoding device to decode and refer to the metadata contained in the bitstream to know which period the decoded extended three-dimensional data generation model corresponds to, and to appropriately generate moving images of the target object from any viewpoint. 【0355】 Furthermore, the network model encoding unit 1453 may generate time information indicating which period of viewpoint images the extended three-dimensional data generation model can generate, and may add the generated time information to the bitstream as metadata. This allows the decoding device to decode this metadata, thereby knowing the period during which the extended three-dimensional data generation model can generate viewpoint images, and thus appropriately generating moving images. 【0356】 Furthermore, metadata may include not only time information, but also information related to the acquisition (generation) of training data, or information necessary for the decoding device to generate video. 【0357】 For example, the network model coding unit 1453 may add information about the camera's frame rate when acquiring (generating) the training data as metadata. This allows the decoding device to decode the frame rate of the generated video from the bitstream and set the frame rate appropriately. 【0358】 Furthermore, the network model coding unit 1453 may add frame numbers corresponding to each period to the bitstream as metadata instead of time information, and associate each frame number with time information using another parameter. 
For example, the network model encoding unit 1453 may add the time information of the first frame and the frame rate as metadata, and the decoding device may calculate the time information of each frame from that metadata. This reduces the amount of code required for each frame. 【0359】 Furthermore, the network model encoding unit 1453 may add, to the bitstream, viewpoint information of the viewpoint images used for training or time information indicating the times when the viewpoint images were taken. This allows the decoding device to generate high-quality video by preferentially selecting viewpoints close to the viewpoint positions corresponding to the images used for training, or times close to the times corresponding to the images used for training. This is because the closer the viewpoint position or time is to that used during training, the more likely the extended three-dimensional data generation model is to generate higher-quality viewpoint images. 【0360】 Figure 56 shows a first example of the configuration of the decoding device of Example 2 in Embodiment 2. 【0361】 The decoding device 1455 comprises a network model decoding unit 1456 and a rendering unit 1457. 【0362】 The network model decoding unit 1456 acquires a bitstream and, based on the acquired bitstream, decodes an extended three-dimensional data generation model NNt0-2 for period t0 to t2, an extended three-dimensional data generation model NNt3-5 for period t3 to t5, and metadata such as time information corresponding to these extended three-dimensional data generation models NNt0-2 and NNt3-5. 【0363】 The rendering unit 1457 uses the extended three-dimensional data generation models NNt0-2 and NNt3-5 decoded by the network model decoding unit 1456, along with metadata such as time information, to generate a moving image of viewpoint A based on viewpoint information specified by the user or system. 
Specifically, the rendering unit 1457 inputs the viewpoint information of viewpoint A and the time within the period t0-t2 into the extended three-dimensional data generation model NNt0-2 for the period t0-t2, and generates an image IMGt0 of viewpoint A at time t0, an image IMGt1 of viewpoint A at time t1, and an image IMGt2 of viewpoint A at time t2. The rendering unit 1457 applies this image generation process for the period t0-t2 to the extended three-dimensional data generation model NNt3-5 for the period t3-t5, thereby generating images IMGt3-IMGt5 of viewpoint A at times t3-t5. The rendering unit 1457 then generates a video of the target object as viewed from viewpoint A at times t0 to t5, using images IMGt0 to IMGt5 and metadata such as time information. The video may include, for example, images IMGt0 to IMGt5 and presentation time information for calculating the presentation time of images IMGt0 to IMGt5 based on times t0 to t5. 【0364】 The viewpoint information may be changed according to the time. For example, viewpoint information for viewpoint A may be input to the extended three-dimensional data generation model NNt0-2 for the period t0 to t2, and viewpoint information for viewpoint B may be input to the extended three-dimensional data generation model NNt3-5 for the period t3 to t5. As a result, the rendering unit 1457 generates multiple images of the target object as seen from viewpoint A at time t0 to t2, and multiple images as seen from viewpoint B at time t3 to t5. In other words, the rendering unit 1457 can generate a moving image of the target object, in which the viewpoint switches from viewpoint A to viewpoint B at time t3. 【0365】 Furthermore, the rendering unit 1457 does not necessarily need to generate moving images; it may also generate still images of a specified viewpoint at a specified time. This allows the user to switch between generating moving images and still images depending on their needs. 
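Rendering with extended models, combined with the averaging at a switching point described in 【0344】, can be sketched as follows. This is a minimal sketch under stated assumptions: each extended model is a hypothetical callable taking viewpoint information and a time, an image is reduced to a flat list of pixel values, and the function name `render_at` is not from the disclosure.

```python
def render_at(time, viewpoint, period_models):
    """Render one image at `time` from period-based extended models.

    period_models: list of ((start, end), model) sorted by start time,
                   where model(viewpoint, time) returns a list of pixels.
    """
    # Case 1: the time lies inside the period of one extended model.
    for (start, end), model in period_models:
        if start <= time <= end:
            return model(viewpoint, time)
    # Case 2: the time lies between two periods; average both outputs.
    pairs = zip(period_models, period_models[1:])
    for ((_, end_a), left), ((start_b, _), right) in pairs:
        if end_a < time < start_b:
            a, b = left(viewpoint, time), right(viewpoint, time)
            return [(x + y) / 2 for x, y in zip(a, b)]
    raise ValueError("time is outside every period")


# Placeholder models standing in for NNt0-2 and NNt3-5 (constant output).
period_models = [
    ((0, 2), lambda vp, t: [1.0, 1.0]),
    ((3, 5), lambda vp, t: [3.0, 5.0]),
]
blended = render_at(2.5, "A", period_models)
```

The period boundaries used for the lookup correspond to the time information that the decoding device obtains from the metadata in the bitstream.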
【0366】 Furthermore, the rendering unit 1457 is not limited to generating moving or still images from the extended three-dimensional data generation model. For example, the rendering unit 1457 may generate point cloud data or mesh data for a period that the extended three-dimensional data generation model can represent, and output the generated point cloud data or mesh data as dynamic point cloud data or dynamic mesh data. This allows the user to view dynamic three-dimensional data of a moving target object on an HMD (Head Mounted Display) or the like, and to measure the amount of movement of the target object using the dynamic three-dimensional data. 【0367】 Figure 57 shows a second example of the configuration of the encoding device of Embodiment 2. 【0368】 The encoding device 1460 comprises an extended three-dimensional data generation model acquisition unit 1461, a buffer unit 1462, a difference calculation unit 1463, and a network model encoding unit 1464. 【0369】 The extended three-dimensional data generation model acquisition unit 1461 is the same as the extended three-dimensional data generation model acquisition unit 1451 of the encoding device 1450. 【0370】 The buffer unit 1462 is similar to the buffer unit 1452 of the encoding device 1450, but differs in that it supplies an extended three-dimensional data generation model stored in memory or the like to the difference calculation unit 1463 as a reference extended three-dimensional data generation model.
【0371】The difference calculation unit 1463 calculates difference information showing the difference between the extended three-dimensional data generation model NNt0-2 for period t0 to t2 and the extended three-dimensional data generation model NNt3-5 for period t3 to t5, both generated by the extended three-dimensional data generation model acquisition unit 1461, and the extended three-dimensional data generation model (hereinafter referred to as the reference extended three-dimensional data generation model) generated by the extended three-dimensional data generation model acquisition unit 1461 before each respective period. Here, the difference information may include differences in weight parameters at the nodes of each network model. For example, the difference calculation unit 1463 acquires the extended three-dimensional data generation model NNt3-5 for period t3 to t5 from the extended three-dimensional data generation model acquisition unit 1461, and acquires the extended three-dimensional data generation model NNt0-2 for period t0 to t2 from the buffer unit 1462 as the reference extended three-dimensional data generation model. 【0372】 The difference calculation unit 1463 may use the extended three-dimensional data generation models NNt3-5 and NNt0-2 to calculate, for example, the difference (amount of change) between the weight parameters of the nodes in the network model in the extended three-dimensional data generation model NNt3-5 and the weight parameters of the nodes in the network model in the extended three-dimensional data generation model NNt0-2, and input this difference information to the network model coding unit 1464. As a result, the difference information is coded by the network model coding unit 1464. 
In other words, the encoding device 1460 may reduce the amount of data by performing predictive coding, which predicts information related to the network model in the extended three-dimensional data generation model NNt3-5 from the extended three-dimensional data generation model NNt0-2 and codes the difference from the predicted value. With such predictive coding, for example, when the target object moves little and the extended three-dimensional data generation model changes little over time, the difference to be coded becomes small, and coding efficiency can be improved. For example, the encoding device 1460 may set the reference extended three-dimensional data generation models to RNNt0-2 = 0 and RNNt3-5 = NNt0-2, and reduce the bit size by predictive coding that uses the extended three-dimensional data generation model from the previous period as the reference extended three-dimensional data generation model. 【0373】 In the second example, the encoding device 1460 predictively encodes information related to the network model in the extended three-dimensional data generation model NNt3-5 from information related to the network model in the extended three-dimensional data generation model NNt0-2, but the encoding is not necessarily limited to this. The encoding device 1460 may, for example, select a reference extended three-dimensional data generation model to be used for prediction from one or more extended three-dimensional data generation models stored in the buffer unit 1462, and predictively encode using the selected extended three-dimensional data generation model. In this case, the encoding device 1460 may add information indicating the selected extended three-dimensional data generation model (reference extended three-dimensional data generation model information) to the bitstream in order to communicate the selected extended three-dimensional data generation model to the decoding device.
This allows the encoding device 1460 to select the optimal reference extended three-dimensional data generation model from the viewpoint of encoding efficiency, thereby improving encoding efficiency. Furthermore, the decoding device can appropriately decode the bitstream with improved encoding efficiency by decoding the reference extended three-dimensional data generation model information. 【0374】 Furthermore, when the encoding device 1460 performs predictive coding by referring to two or more extended three-dimensional data generation models stored in the buffer unit 1462, it may add information indicating the two or more reference extended three-dimensional data generation models to the bitstream. This allows the encoding device 1460 to improve the coding efficiency of predictive coding using two or more reference extended three-dimensional data generation models. In addition, the decoding device can appropriately decode the bitstream with improved coding efficiency. 【0375】Furthermore, in cases where the buffer unit 1462 does not store a reference extended three-dimensional data generation model, for example, when encoding the first extended three-dimensional data generation model (first frame) in data order, the encoding device 1460 may encode the extended three-dimensional data generation model to be processed without calculating the difference from the predicted value (hereinafter referred to as intra-prediction), or it may encode after calculating the difference from the predicted value set to 0. Also, when setting a certain period tm-n as a random access point, the encoding device 1460 may encode the extended three-dimensional data generation model corresponding to the period tm-n using intra-prediction, or it may encode after calculating the difference from the predicted value set to 0. 
As a result, the decoding device can start decoding the extended three-dimensional data generation model from the first extended three-dimensional data generation model (first frame) in data order, or from a random access point, thereby improving functionality during playback. 【0376】 Furthermore, a set of multiple extended three-dimensional data generation models (multiple frames) (hereinafter referred to as GOF (Group of Frames)) may be defined, and the first frame of the GOF may be encoded by intra-prediction. This allows the decoding device to randomly access the first frame of the GOF, and decoding the first frame of the GOF can support functions such as fast-forward playback. 【0377】 Furthermore, the encoding device 1460 may add permission information to the bitstream indicating whether or not to allow predictive referencing between GOFs. For example, if the bitstream contains permission information indicating that predictive referencing between GOFs is prohibited, the decoding device can determine that it can decode multiple GOFs in parallel. Conversely, allowing predictive referencing between GOFs can improve encoding efficiency. 【0378】 The network model coding unit 1464 is similar to the network model coding unit 1453 of the coding device 1450, but differs in that it encodes the difference information d0-2 and d3-5 of the extended three-dimensional data generation models NNt0-2 and NNt3-5 input from the difference calculation unit 1463 and outputs a bitstream. 【0379】 Although the difference calculation unit 1463 and the network model encoding unit 1464 of the encoding device 1460 are described as separate units, the configuration is not necessarily limited to this. For example, the difference calculation unit 1463 may be included within the network model encoding unit 1464. In other words, the network model encoding unit 1464 may perform the processing of the difference calculation unit 1463.
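The difference calculation and GOF structure described above can be illustrated with a small sketch. It assumes, purely for illustration, that each extended three-dimensional data generation model is reduced to a flat list of integer (already quantized) weight parameters, that every model has the same number of weights, and that the reference is always the immediately preceding model. The function name `encode_sequence`, the `gof_size` parameter, and the mode labels are hypothetical. Intra-prediction is modelled here as a difference from a reference of 0, which is one of the two options the text allows.

```python
from typing import List, Tuple

Weights = List[int]  # hypothetical: quantized weight parameters of one model


def encode_sequence(models: List[Weights],
                    gof_size: int) -> List[Tuple[str, Weights]]:
    """Return (mode, difference) per model.

    The first model of each GOF uses a zero reference (RNN = 0), so it can
    serve as a random access point; the others reference the previous model.
    """
    coded = []
    for i, weights in enumerate(models):
        if i % gof_size == 0:
            reference = [0] * len(weights)   # no reference model: RNN = 0
            mode = "intra"
        else:
            reference = models[i - 1]        # previous period as reference
            mode = "inter"
        coded.append((mode, [w - r for w, r in zip(weights, reference)]))
    return coded


models = [[10, 20, 30], [11, 20, 29], [50, 60, 70], [50, 61, 70]]
coded = encode_sequence(models, gof_size=2)
```

When consecutive models are similar, the inter-coded differences are small values clustered near zero, which is what makes them cheaper to entropy-code than the raw weights. Because no difference in this sketch crosses a GOF boundary, each GOF could be decoded independently of the others.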
【0380】 Furthermore, the encoding device 1460 may add predictive encoding information to the bitstream indicating whether the extended three-dimensional data generation model was encoded using intra-prediction or predictive encoding using a reference extended three-dimensional data generation model (hereinafter referred to as inter-prediction). This allows the decoding device to appropriately determine whether to use intra-prediction or inter-prediction to decode the extended three-dimensional data generation model by decoding the predictive encoding information. 【0381】 Figure 58 shows a second example of the configuration of the decoding device of Embodiment 2. 【0382】 The decoding device 1465 comprises a network model decoding unit 1466, an addition unit 1467, a buffer unit 1468, and a rendering unit 1469. 【0383】 The network model decoding unit 1466 acquires a bitstream and, based on the acquired bitstream, decodes the difference information d0-2 and d3-5 of the extended three-dimensional data generation models NNt0-2 and NNt3-5 for the periods t0 to t2 and t3 to t5, as well as metadata such as time information. 【0384】 The addition unit 1467 adds the difference information d0-2 and d3-5 of the extended three-dimensional data generation models NNt0-2 and NNt3-5 corresponding to the periods t0-t2 and t3-t5, decoded by the network model decoding unit 1466, to the reference extended three-dimensional data generation models RNNt0-2 and RNNt3-5 for the corresponding periods obtained from the buffer unit 1468, thereby calculating the extended three-dimensional data generation models NNt0-2 and NNt3-5. In this way, the decoding device 1465 may perform predictive decoding using the extended three-dimensional data generation model of the previous period as the reference extended three-dimensional data generation model, setting RNNt0-2 = 0 and RNNt3-5 = NNt0-2.
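The role of the addition unit 1467 and the buffer unit 1468 can be sketched as below. This is a minimal illustration, not the disclosed implementation: models are assumed to be flat lists of integer weight parameters, the predictive encoding information is assumed to be a simple `"intra"`/`"inter"` label, and `decode_sequence` is a hypothetical name.

```python
from typing import List, Tuple

Weights = List[int]  # hypothetical: quantized weight parameters of one model


def decode_sequence(coded: List[Tuple[str, Weights]]) -> List[Weights]:
    """Rebuild each model as difference + reference.

    'intra' entries use a zero reference (RNN = 0); 'inter' entries use the
    previously decoded model, which plays the role of the buffer unit.
    """
    decoded: List[Weights] = []
    for mode, diff in coded:
        if mode == "intra":
            reference = [0] * len(diff)
        elif decoded:
            reference = decoded[-1]
        else:
            raise ValueError("inter-coded model has no reference to decode from")
        decoded.append([d + r for d, r in zip(diff, reference)])
    return decoded


# d0-2 is coded against RNNt0-2 = 0, d3-5 against RNNt3-5 = NNt0-2.
coded = [("intra", [10, 20, 30]), ("inter", [1, 0, -1])]
decoded = decode_sequence(coded)
```

The explicit error for an inter-coded model without a reference reflects why decoding has to begin at an intra-coded model or random access point.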
【0385】 In the second example, the addition unit 1467 and the network model decoding unit 1466 of the decoding device 1465 are described as separate units. However, the configuration is not necessarily limited to this. For example, the addition unit 1467 may be included within the network model decoding unit 1466. In other words, the network model decoding unit 1466 may perform the processing of the addition unit 1467. 【0386】 Furthermore, in cases where the buffer unit 1468 does not store a reference extended three-dimensional data generation model, for example, when decoding the first extended three-dimensional data generation model (first frame) in data order, the decoding device 1465 may decode without prediction, that is, without the addition unit 1467 adding the difference information and the reference extended three-dimensional data generation model (hereinafter referred to as intra-prediction), or it may decode by adding the difference information to the predicted value set to 0. Also, when a certain period tm-n is set as a random access point, the decoding device 1465 may decode the extended three-dimensional data generation model corresponding to the period tm-n using intra-prediction, or it may decode by adding the difference information to the predicted value set to 0. In addition, if the bitstream contains predictive coding information indicating that the extended three-dimensional data generation model to be decoded has been encoded using intra-prediction, the extended three-dimensional data generation model may be decoded using intra-prediction, or it may be decoded by adding the difference information to the predicted value set to 0.
This allows the decoding device 1465 to start decoding the extended three-dimensional data generation model from the first extended three-dimensional data generation model (first frame) in data order, a random access point, or an extended three-dimensional data generation model encoded by intra-prediction, thereby improving functionality during playback. 【0387】In the second example, the decoding device 1465 predictively decodes information related to the network model in the extended three-dimensional data generation model NNt3-5 from information related to the network model in the extended three-dimensional data generation model NNt0-2, but this is not necessarily limited to this. The decoding device 1465 may, for example, select a reference extended three-dimensional data generation model to be used for prediction from one or more extended three-dimensional data generation models stored in the buffer unit 1468, and predictively decode using the selected extended three-dimensional data generation model. In this case, the decoding device 1465 may decode information indicating the selected extended three-dimensional data generation model (reference extended three-dimensional data generation model information) from the bitstream. As a result, the decoding device 1465 can appropriately decode a bitstream with improved coding efficiency by decoding the reference extended three-dimensional data generation model information from the bitstream generated by the encoding device 1460, in which the optimal reference extended three-dimensional data generation model is selected from the viewpoint of coding efficiency. 【0388】 Furthermore, when the decoding device 1465 performs predictive decoding by referring to two or more extended three-dimensional data generation models stored in the buffer unit 1468, it may decode information indicating the two or more reference extended three-dimensional data generation models from the bitstream. 
This allows the decoding device 1465 to appropriately decode a bitstream in which the coding efficiency of predictive coding has been improved using two or more reference extended three-dimensional data generation models. 【0389】 The rendering unit 1469 is the same as the rendering unit 1427 of the decoding device 1425. The rendering unit 1469 does not necessarily need to generate moving images; it may also generate still images of a specified viewpoint at a specified time. 【0390】 [Modification] The encoding device 1460 may also include information regarding the number of images that the extended three-dimensional data generation model can generate (i.e., the upper limit of the number of images) in the metadata added to the bitstream of the extended three-dimensional data generation model for the period tm-n. This allows the decoding device 1465 to know the number of images that the decoded extended three-dimensional data generation model can generate, and for example, to appropriately set the frame rate of the video to be generated, or to calculate the number of frames of delay until the video is displayed. 【0391】 Furthermore, the encoding device 1460 may add information to the bitstream indicating the time unit (i.e., the smallest time unit) down to which the extended three-dimensional data generation model can generate viewpoint images. For example, the encoding device 1460 may add to the bitstream information indicating whether viewpoint images can be generated down to a time unit of 1 msec, or down to a time unit of 1 μsec. This allows the decoding device 1465 to know the time unit down to which viewpoint images can be generated and to generate high-frame-rate video or three-dimensional data accordingly. 【0392】 Furthermore, the encoding device 1460 may add information about the training of the extended three-dimensional data generation model to the metadata attached to the bitstream of the extended three-dimensional data generation model for the period tm-n.
For example, the encoding device 1460 can attach time information or viewpoint information of the images used for training as metadata to the bitstream, and the decoding device 1465 can decode that metadata to find out the time or viewpoint information at which the extended three-dimensional data generation model can generate viewpoint images in high quality, thereby enabling the creation of high-quality video. 【0393】 Furthermore, the duration of the viewpoint images that can be generated by the extended three-dimensional data generation model may be dynamically switched, as shown in Figure 59. Specifically, the duration may be switched according to the subject. Figure 59 is a diagram illustrating a method for generating moving images using the extended three-dimensional data generation model according to a modified example of Embodiment 2. 【0394】For example, in scenes with many stationary objects (scenes where the number of stationary objects among multiple subjects is greater than or equal to a first number, or scenes where the volume (area) occupied by stationary objects among multiple subjects is greater than or equal to a first amount), the encoding device 1460 can generate an extended three-dimensional data generation model that can generate viewpoint images with high image quality over a longer period by widening the time frame of the training data used during training (i.e., lengthening the time frame). For example, in scenes with many dynamic objects (scenes where the number of dynamic objects among multiple subjects is greater than or equal to a first number, or scenes where the volume (area) occupied by dynamic objects among multiple subjects is greater than or equal to a first amount), the encoding device 1460 can generate an extended three-dimensional data generation model that can generate viewpoint images with high image quality over a shorter period, even for dynamic objects, by narrowing the time frame of the training data used during training. 
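One way to picture the dynamic switching of the period is a simple rule that maps object counts to a training-window length. The sketch below is an assumption-laden illustration: the threshold `first_number`, the frame counts, the fallback value, and the choice to check dynamic objects first when both conditions hold are all hypothetical and are not specified in the text.

```python
def choose_period_length(num_static: int, num_dynamic: int,
                         first_number: int = 5,
                         short_frames: int = 3,
                         long_frames: int = 12,
                         default_frames: int = 6) -> int:
    """Pick how many frames one extended model should cover.

    All thresholds and frame counts are hypothetical placeholders.
    """
    if num_dynamic >= first_number:
        return short_frames    # many moving objects: narrow the training window
    if num_static >= first_number:
        return long_frames     # mostly stationary scene: widen the training window
    return default_frames      # neither condition met: fall back to a default


period_for_static_scene = choose_period_length(num_static=8, num_dynamic=0)
period_for_dynamic_scene = choose_period_length(num_static=2, num_dynamic=9)
```

A volume-based or area-based criterion, which the text also mentions, could replace the object counts without changing the structure of the rule.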
【0395】 The encoding device 1460 may generate an extended three-dimensional data generation model NNtm-n for a period tm-n using training data for a certain period tm-n (for example, a GOF (Group of Frames) representing a collection of frames within the period tm-n from the training images). In this case, the encoding device 1460 buffers the training image frames for the period tm-n to generate the extended three-dimensional data generation model and transmits it in a compressed manner, resulting in a transmission delay equal to the size of the GOF. The encoding device 1460 may also add information about this transmission delay, such as the number of GOF frames or the number of delayed frames, to the bitstream. This allows the decoding device 1465 to obtain the delay information by decoding the bitstream and to appropriately reproduce the video or three-dimensional data while taking the delay into consideration. 【0396】In the above embodiment, an example was shown in which a still image of an arbitrary viewpoint at a certain time or period is generated by a three-dimensional data generation model or an extended three-dimensional data generation model, but the invention is not necessarily limited to this. For example, as shown in Figure 60, the three-dimensional data generation model or the extended three-dimensional data generation model may generate (output) three-dimensional data such as point cloud data or mesh data at a certain time within a certain period. This allows the user to measure the dimensions of a target object or to view higher-resolution three-dimensional data. Figure 60 is a diagram illustrating a method for generating moving images using a three-dimensional data generation model according to a modification of Embodiment 2. 
【0397】 Furthermore, the encoding devices 1420 and 1460 may include information in the bitstream metadata indicating recommended output formats among output formats such as images, point cloud data, and mesh data, depending on the use case. This allows the user to select a recommended output format according to the use case. 【0398】 Furthermore, the encoding devices 1420 and 1460 may add one or more pieces of viewpoint information to the metadata attached to the bitstream of the three-dimensional data generation model or the extended three-dimensional data generation model. For example, the encoding devices 1420 and 1460 may include in the metadata recommended viewpoint information for viewing the target object, or the user's viewpoint information when acquiring training data. As a result, the decoding devices 1425 and 1465 can generate video or three-dimensional data using viewpoint information selected, according to the user's intentions, from the one or more pieces of viewpoint information attached to the bitstream. 【0399】 Furthermore, a default viewpoint may be predetermined from the one or more pieces of viewpoint information, and the decoding devices 1425 and 1465 may generate video or three-dimensional data using the predetermined default viewpoint unless otherwise specified by the user. This allows the decoding devices 1425 and 1465 to automatically generate video or three-dimensional data without user specification. 【0400】 One example of a use case utilizing this embodiment is as follows: 【0401】 First, the encoding devices 1420 and 1460 acquire data of a dynamic object to be transmitted to a remote location using a camera or sensor, and use that data as training data to generate a three-dimensional data generation model or an extended three-dimensional data generation model of that dynamic object.
【0402】 Next, the encoding devices 1420 and 1460 encode the three-dimensional data generation model or the extended three-dimensional data generation model using the encoding method described in this embodiment, and transmit the bitstream containing the encoding result to a remote location. 【0403】 The decoding devices 1425 and 1465 then decode the bitstream received at the remote location, use the decoded three-dimensional data generation model or extended three-dimensional data generation model of the dynamic object to generate a video or three-dimensional data from an arbitrary viewpoint, and utilize the generated video or three-dimensional data for viewing or measurement purposes. In this way, this embodiment may be applied to a wide range of use cases where information about a certain space is shared with a remote location. 【0404】 Furthermore, if a given space contains one or more objects to be transmitted to a remote location, the three-dimensional data generation modeling, encoding and transmission, decoding, and rendering processes shown in this embodiment may be applied separately to each individual object. For example, the dynamic foreground objects and the static background objects in a given space may be modeled separately using three-dimensional data generation models and then encoded and transmitted. This allows the most suitable three-dimensional data generation modeling or encoding method to be applied to each individual object, thereby improving encoding efficiency. 【0405】 Furthermore, the process is not necessarily limited to this; the three-dimensional data generation modeling, encoding and transmission, decoding, and rendering processes described in this embodiment may be applied collectively to one or more objects treated as a single object. This allows for the transmission of one or more objects to a remote location while reducing the amount of processing required.
【0406】Figure 61 is a diagram showing an example of the configuration of the encoding device in Embodiment 2. Figure 62 is a flowchart showing an example of the encoding method by the encoding device in Embodiment 2. 【0407】 The encoding device 1470 comprises a circuit 1471 and a memory 1472. The encoding device 1470 is a device that implements the encoding devices 1420 and 1460. 【0408】 Circuit 1471 performs the following operations. 【0409】 Circuit 1471 acquires a first three-dimensional data generation model (e.g., three-dimensional data generation model NNt0) corresponding to a first time (e.g., time t0) and a second three-dimensional data generation model (e.g., three-dimensional data generation model NNt1) corresponding to a second time (e.g., time t1) (S1401). Circuit 1471 generates a bitstream by encoding the acquired first three-dimensional data generation model and the second three-dimensional data generation model (S1402). When viewpoint information including the viewpoint and line of sight is input to each of the first three-dimensional data generation model and the second three-dimensional data generation model, they output a two-dimensional image of the subject as seen from the viewpoint and line of sight. 【0410】 According to this, a bitstream can be generated that includes a first three-dimensional data generation model that obtains a two-dimensional image corresponding to a first time point in accordance with arbitrary viewpoint information, and a second three-dimensional data generation model that obtains a two-dimensional image corresponding to a second time point. Therefore, a compressed bitstream of data that obtains moving images from any viewpoint can be generated. Thus, the storage capacity required for storing data that obtains moving images from any viewpoint, or the network bandwidth required for transmitting such data, can be reduced. 
【0411】 For example, the first three-dimensional data generation model and the second three-dimensional data generation model are learning models using neural networks. 【0412】 For example, the bitstream includes first time information indicating the first time and second time information indicating the second time. 【0413】 For example, the bitstream includes a first frame number corresponding to the first time and a second frame number corresponding to the second time. 【0414】 For example, the bitstream includes frame rate information relating to the frame rates of a plurality of training images used to generate the first three-dimensional data generation model and the second three-dimensional data generation model. The plurality of training images are two-dimensional images obtained by taking images at multiple different timings. 【0415】 For example, the bitstream includes viewpoint information, including viewpoints and line-of-sight directions, of a plurality of training images used to generate the first three-dimensional data generation model and the second three-dimensional data generation model. 【0416】 For example, the plurality of training images are two-dimensional images obtained by photographing the subject from different viewpoints and line-of-sight directions. The viewpoint information includes the different viewpoints and line-of-sight directions. 【0417】 For example, in encoding the second three-dimensional data generation model, circuit 1471 calculates difference information indicating the difference between the first three-dimensional data generation model and the second three-dimensional data generation model. The bitstream includes the difference information. 【0418】 For example, the difference includes the difference in weight parameters associated with nodes included in the first three-dimensional data generation model and the second three-dimensional data generation model.
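As an illustration of how the time, frame-number, and frame-rate fields listed above could be combined on the decoding side, the sketch below derives the time of each frame from the time of the first frame and a frame rate, so that per-frame time stamps need not be coded individually. The function name `frame_time` and the use of exact fractions of a second are assumptions made for this example.

```python
from fractions import Fraction


def frame_time(first_time: Fraction, frame_rate: int,
               frame_number: int) -> Fraction:
    """Time of a frame in seconds, given the first frame's time and the rate.

    frame_number counts from 0 at the first frame (an assumption of this sketch).
    """
    if frame_rate <= 0:
        raise ValueError("frame_rate must be positive")
    return first_time + Fraction(frame_number, frame_rate)


# Times of the first three frames of a 30 frames-per-second sequence.
times = [frame_time(Fraction(0), 30, n) for n in range(3)]
```

Exact fractions are used instead of floating point so that repeated frame intervals such as 1/30 s do not accumulate rounding error.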
【0419】 For example, the bitstream includes reference information indicating that the difference information was calculated by referring to the first three-dimensional data generation model. 【0420】 For example, the first time corresponds to a random access point. The first three-dimensional data generation model is encoded with intra-prediction, or with inter-prediction where the predicted value is 0. 【0421】 For example, the first three-dimensional data generation model and the second three-dimensional data generation model are included in one of a plurality of sets. The first three-dimensional data generation model is the first in data order among the plurality of three-dimensional data generation models included in the one set. 【0422】 For example, the bitstream includes permission information indicating whether or not each of the encodings of the plurality of three-dimensional data generation models is permitted to reference three-dimensional data generation models included in other sets. 【0423】 For example, the first three-dimensional data generation model (e.g., extended three-dimensional data generation model NNt0-2) corresponds to a first period (e.g., period t0-t2) including the first time (e.g., time t0). The second three-dimensional data generation model (e.g., extended three-dimensional data generation model NNt3-5) corresponds to a second period (e.g., period t3-t5) including the second time (e.g., time t3). 【0424】 For example, the multiple first training images used to generate the first three-dimensional data generation model are two-dimensional images obtained by taking images at multiple different timings during the first period. 【0425】 For example, when the first three-dimensional data generation model receives a time included in the first period as input, it outputs a two-dimensional image of the subject at the input time.
【0426】 For example, the bitstream includes numerical information indicating the upper limit of the number of images that the first three-dimensional data generation model can generate. 【0427】 For example, the bitstream includes first information relating to the plurality of first training images. The first information includes a plurality of viewpoints and a plurality of viewing directions, and a plurality of different timings, corresponding to the plurality of first training images. 【0428】 For example, the first period or the second period is dynamically determined according to the subject. 【0429】For example, circuit 1471 stores the generated first three-dimensional data generation model in memory 1472. Circuit 1471 generates the second three-dimensional data generation model based on the first three-dimensional data generation model stored in memory 1472. 【0430】 For example, circuit 1471 stores the generated first three-dimensional data generation model and the second three-dimensional data generation model in memory 1472. Circuit 1471 generates an initial model based on the first three-dimensional data generation model and the second three-dimensional data generation model stored in memory 1472. Circuit 1471 generates a third three-dimensional data generation model (e.g., three-dimensional data generation model NNt2) corresponding to a third time (e.g., time t2) based on the initial model. 【0431】 Figure 63 is a diagram showing an example of the configuration of the decoding device in Embodiment 2. Figure 64 is a flowchart showing an example of the decoding method by the decoding device in Embodiment 2. 【0432】 The decoding device 1480 comprises a circuit 1481 and a memory 1482. The decoding device 1480 is a device that implements the decoding devices 1425 and 1465. 【0433】 Circuit 1481 performs the following operations. 【0434】 Circuit 1481 acquires a bitstream (S1411). 
Circuit 1481 decodes from the bitstream a first three-dimensional data generation model (e.g., three-dimensional data generation model NNt0) corresponding to a first time (e.g., time t0) and a second three-dimensional data generation model (e.g., three-dimensional data generation model NNt1) corresponding to a second time (e.g., time t1) (S1412). When viewpoint information including the viewpoint and line of sight is input to the first three-dimensional data generation model and the second three-dimensional data generation model, each outputs a two-dimensional image of the subject as seen from the viewpoint and line of sight. 【0435】According to this, based on a compressed bitstream of data from which video footage from an arbitrary viewpoint is obtained, it is possible to decode a first three-dimensional data generation model that obtains a two-dimensional image corresponding to a first time point according to arbitrary viewpoint information, and a second three-dimensional data generation model that obtains a two-dimensional image corresponding to a second time point. Therefore, it is possible to appropriately decode a bitstream that reduces the storage capacity required for storing data from which video footage from an arbitrary viewpoint is obtained, or the network bandwidth required for transmitting such data. 【0436】 For example, the first three-dimensional data generation model and the second three-dimensional data generation model are learning models using neural networks. 【0437】 For example, the bitstream includes first time information indicating the first time and second time information indicating the second time. 【0438】 For example, the bitstream includes a first frame number corresponding to the first time and a second frame number corresponding to the second time. 
【0439】 For example, the bitstream includes frame rate information relating to the frame rates of a plurality of training images used to generate the first three-dimensional data generation model and the second three-dimensional data generation model. The plurality of training images are two-dimensional images obtained by taking images at multiple different timings. 【0440】 For example, the bitstream includes viewpoint information, including viewpoints and line-of-sight directions, of a plurality of training images used to generate the first three-dimensional data generation model and the second three-dimensional data generation model. 【0441】 For example, the multiple learning images are two-dimensional images obtained by photographing the subject from different viewpoints and viewing directions. The viewpoint information includes the different viewpoints and viewing directions. 【0442】 For example, the bitstream includes difference information indicating the difference between the first three-dimensional data generation model and the second three-dimensional data generation model. 【0443】For example, the difference includes a difference in weight parameters associated with nodes included in the first three-dimensional data generation model and the second three-dimensional data generation model. 【0444】 For example, the bitstream includes reference information indicating that the difference information is calculated by referring to the first three-dimensional data generation model. 【0445】 For example, the first time corresponds to a random access point. The first three-dimensional data generation model is encoded by intra prediction or inter prediction with a predicted value of 0. 【0446】 For example, the first three-dimensional data generation model and the second three-dimensional data generation model are included in one of a plurality of sets. 
The first three-dimensional data generation model is the first in data order among the plurality of three-dimensional data generation models included in the one set. 【0447】 For example, the bitstream includes permission information indicating whether to permit reference to a three-dimensional data generation model included in another set in the encoding of each of the plurality of three-dimensional data generation models. 【0448】 For example, the first three-dimensional data generation model (for example, the extended three-dimensional data generation model NNt0-2) corresponds to a first period (for example, period t0 to t2) including the first time (for example, time t0). The second three-dimensional data generation model (for example, the extended three-dimensional data generation model NNt3-5) corresponds to a second period (for example, period t3 to t5) including the second time (for example, time t3). 【0449】 For example, the plurality of first learning images used for generating the first three-dimensional data generation model are two-dimensional images obtained by photographing at a plurality of different timings during the first period. 【0450】 For example, when a time included in the first period is input to the first three-dimensional data generation model, the first three-dimensional data generation model outputs a two-dimensional image of the subject at the input time. 【0451】For example, the bitstream includes numerical information indicating the maximum number of images that can be generated by the first three-dimensional data generation model. 【0452】 For example, the bitstream includes first information regarding the plurality of first learning images. The first information includes a plurality of viewpoints and a plurality of viewing directions corresponding to the plurality of first learning images, and a plurality of different timings. 【0453】 For example, the first period or the second period is dynamically determined according to the subject. 
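The difference information of 【0442】 to 【0444】 can be illustrated with a minimal sketch. It assumes, purely for illustration, that both three-dimensional data generation models share one network structure and that each model is held as a mapping from a layer name to a flat list of weight parameters. The names `Weights`, `encode_difference`, and `decode_difference` are hypothetical and do not appear in the disclosure; entropy coding of the resulting differences is omitted.

```python
from typing import Dict, List

# Hypothetical representation: layer name -> flat list of weight parameters.
Weights = Dict[str, List[float]]


def encode_difference(reference: Weights, target: Weights) -> Weights:
    """Return per-weight differences (target minus reference).

    Assumes both models have identical layer names and layer sizes.
    """
    return {
        name: [t - r for r, t in zip(reference[name], target[name])]
        for name in target
    }


def decode_difference(reference: Weights, diff: Weights) -> Weights:
    """Reconstruct the target model from the reference model and the differences."""
    return {
        name: [r + d for r, d in zip(reference[name], diff[name])]
        for name in diff
    }


# Toy weights standing in for models NNt0 (reference) and NNt1 (target).
nn_t0: Weights = {"layer0": [0.5, -1.0, 2.0]}
nn_t1: Weights = {"layer0": [0.5, -0.75, 2.25]}

diff_t1 = encode_difference(nn_t0, nn_t1)
restored_t1 = decode_difference(nn_t0, diff_t1)
```

When the scene changes little between the first time and the second time, most differences are near zero, which is what makes the difference information cheaper to encode than the second model itself.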
【0454】 For example, circuit 1471 stores the generated first three-dimensional data generation model in memory 1472. Circuit 1471 generates the second three-dimensional data generation model based on the first three-dimensional data generation model stored in memory 1472. 【0455】 For example, circuit 1471 stores the generated first three-dimensional data generation model and the second three-dimensional data generation model in memory 1472. Circuit 1471 generates an initial model based on the first three-dimensional data generation model and the second three-dimensional data generation model stored in memory 1472. Circuit 1471 generates a third three-dimensional data generation model (for example, three-dimensional data generation model NNt2) corresponding to the third time (for example, time t2) based on the initial model. 【0456】 (Other) In an example of an embodiment, a method for generating a moving image as viewed from a predetermined viewpoint is disclosed. The generation of the moving image is realized, for example, by a device including a memory and a circuit connected to the memory. An example of this device stores a three-dimensional data generation model (Neural Network) generated by learning in the memory, the circuit acquires the three-dimensional data generation model stored in the memory, and generates a moving image based on the three-dimensional data generation model. Note that the three-dimensional data generation model or the extended three-dimensional data generation model may not be stored in the memory. For example, the encoding devices 1420 and 1460 may acquire designation information for designating a URL on the network, and acquire the three-dimensional data generation model based on the designation information. 【0457】 Figure 65 shows an example of the configuration of an encoding device. 【0458】 The encoding device 1490 includes a processor 1491 and a memory 1492. 
【0459】 The processor 1491 is a circuit that performs information processing and is a circuit that can access the memory 1492. For example, the processor 1491 is a dedicated or general-purpose electronic circuit that encodes a three-dimensional data generation model. The processor 1491 may be a processor such as a CPU. Alternatively, the processor 1491 may be a collection of multiple electronic circuits. Furthermore, for example, the processor 1491 may play the role of multiple components of the encoding device described above, excluding the component for storing information. 【0460】 Memory 1492 is a dedicated or general-purpose memory in which information for the processor 1491 to encode a three-dimensional data generation model is stored. Memory 1492 may be an electronic circuit and may be connected to the processor 1491. Memory 1492 may also be included in the processor 1491. Memory 1492 may also be a collection of multiple electronic circuits. Memory 1492 may also be a magnetic disk or an optical disk, or may be described as storage or a recording medium. Memory 1492 may also be a non-volatile memory or a volatile memory. 【0461】 For example, memory 1492 may store the three-dimensional data generation model to be encoded, or it may store a stream corresponding to the encoded three-dimensional data generation model. Furthermore, memory 1492 may store a program for the processor 1491 to encode the three-dimensional data generation model. 【0462】 Furthermore, in the encoding device 1490, not all of the above-mentioned components of the encoding device are to be implemented, nor are all of the above-mentioned processes to be performed. Some of the components may be included in other devices, and some of the above-mentioned processes may be performed by other devices. 【0463】 Figure 66 shows an example of the configuration of a decoding device. 【0464】 The decoding device 1495 includes a processor 1496 and a memory 1497. 
【0465】 The processor 1496 is a circuit that performs information processing and is a circuit that can access the memory 1497. For example, the processor 1496 is a dedicated or general-purpose electronic circuit for decoding the stream. The processor 1496 may be a processor such as a CPU. Alternatively, the processor 1496 may be a collection of multiple electronic circuits. Furthermore, for example, the processor 1496 may play the role of multiple components of the decoding device described above, excluding the component for storing information. 【0466】 Memory 1497 is a dedicated or general-purpose memory in which information for the processor 1496 to decode the stream is stored. Memory 1497 may be an electronic circuit and may be connected to the processor 1496. Alternatively, memory 1497 may be included in the processor 1496. Furthermore, memory 1497 may be a collection of multiple electronic circuits. Also, memory 1497 may be a magnetic disk or an optical disk, or may be described as storage or a recording medium. Additionally, memory 1497 may be non-volatile memory or volatile memory. 【0467】 For example, memory 1497 may store a three-dimensional data generation model or a stream. Memory 1497 may also store a program for the processor 1496 to decode the stream. 【0468】 Furthermore, in the decoding device 1495, not all of the aforementioned components of the decoding device are to be implemented, nor are all of the above-mentioned processes to be performed. Some of the components may be included in other devices, and some of the above-mentioned processes may be performed by other devices. 【0469】(Embodiment 3) Another method for generating a still image of a subject (three-dimensional object) viewed from an arbitrary viewpoint in a stationary space, using a three-dimensional data generation model which is a learned model obtained based on learning, will be described. 
【0470】 In the above embodiment, an example was shown of a method for generating a moving image by creating an extended three-dimensional data generation model that can generate images from any viewpoint within a certain time range (period), thereby generating still images viewed from any viewpoint at any time, and arranging them in chronological order, as well as a method for encoding or decoding the extended three-dimensional data generation model. In this embodiment, when acquiring the extended three-dimensional data generation model by training, the latent code Zt at time t is also trained to generate a more accurate extended three-dimensional data generation model, and an example of encoding the extended three-dimensional data generation model and the latent code is shown. 【0471】 Figure 67 is a diagram illustrating the training process of the three-dimensional data generation model in Embodiment 3. Figure 68 is a diagram illustrating the process of generating a still image of a subject viewed from an arbitrary viewpoint using the three-dimensional data generation model in Embodiment 3. 【0472】 Similar to the above embodiment, the information processing device can generate a still image viewed from an arbitrary viewpoint in a static space by acquiring a three-dimensional data generation model through learning. For example, there are three-dimensional data generation models generated using methods such as NeRF (Neural Radiance Fields). 【0473】During training, the information processing device acquires training data that includes, for example, a viewpoint A image (ground truth value) acquired from an arbitrary viewpoint A at an arbitrary time t, viewpoint information of viewpoint A at the time the image was acquired (such as camera orientation), and a latent code Zt corresponding to time t. The viewpoint information may include viewpoint A and the direction of line of sight from viewpoint A. 
The information processing device inputs the viewpoint information from the above training data into the extended three-dimensional data generation model 1501 and, using, for example, an evaluation function 1502, optimizes the network parameters and latent codes included in the extended three-dimensional data generation model 1501 so that the difference between the generated image of viewpoint A output from the extended three-dimensional data generation model 1501 and the viewpoint A image (the ground-truth image of viewpoint A at time t) is minimized. In other words, the information processing device trains the three-dimensional data generation model at time t using the latent code Zt set for an arbitrary time t. 【0474】 The information processing device can obtain a more accurate three-dimensional data generation model by performing this learning process using training data corresponding to one or more times and one or more viewpoints. The learning process is performed for the training data corresponding to each of the multiple viewpoints. In other words, the same process as the learning process for viewpoint A is performed for each viewpoint. 【0475】 The latent code may be, for example, a single (one-dimensional) value, or an n-dimensional vector (where n is an integer greater than or equal to 2). For example, if the latent code has 256 dimensions, the latent code Zt at time t can be represented as vector information consisting of a total of 256 values from value0t to value255t, such as Zt = (value0t, value1t, ..., value255t). By representing the latent code as a multidimensional vector in this way, the characteristics of each time can be better expressed. 【0476】 Furthermore, the initial values of the latent codes used for learning may be set to 0 or random values. If the latent codes are multidimensional vectors, all components may be set to 0, or all components may be set to random values. 
This reduces the processing required to calculate the initial values. 【0477】 Furthermore, the initial values of the latent codes used for learning are not limited to those described above; they may also be set to the values of the already generated extended 3D data generation model and its associated latent codes. In other words, the initial values of the latent codes may be set to the values of the latent codes corresponding to a different time than the time to which the latent code corresponds, or they may be set to the values of the learned latent codes. Specifically, when learning the extended 3D data generation model and latent codes for times t5 to t9, if the extended 3D data generation model and latent codes for times t0 to t4 have already been learned, the model parameters of the learned extended 3D data generation model for times t0 to t4 may be used as the initial values for learning the extended 3D data generation model for times t5 to t9, or the learned latent codes for times t0 to t4 may be used as the initial values for learning the latent codes for times t5 to t9. This allows for the generation of extended 3D data generation models and latent codes with high accuracy. 【0478】 Furthermore, if there is little movement or little change in objects between times t0-t4 and t5-t9, the values of the already generated extended 3D data generation model and its associated latent code may be set as initial values. This allows for the high-precision generation of the extended 3D data generation model and latent code in scenes with little movement. 【0479】 Next, during generation, if, for example, viewpoint information for viewpoint B and a latent code Zt0 corresponding to time t0 are input to the trained extended three-dimensional data generation model 1503, the generated image of viewpoint B at time t0 is output. 
If viewpoint information for viewpoint Z, which is different from viewpoint B, and a latent code Zt4 corresponding to time t4 are input, the generated image of viewpoint Z at time t4 is output. The viewpoint information for viewpoint B may include viewpoint B and the direction of line of sight from viewpoint B. The viewpoint information for viewpoint Z may include viewpoint Z and the direction of line of sight from viewpoint Z. 【0480】 In this way, the information processing device can generate still images from an arbitrary viewpoint within a certain time range by learning the extended three-dimensional data generation model 1503 and latent codes. The information processing device can then generate multiple still images corresponding to multiple time points and generate a moving image by arranging them in chronological order. 【0481】 Figure 68 shows an example of an extended three-dimensional data generation model that generates an image of a viewpoint at a given time when viewpoint information and a latent code at that time are input. However, the model is not limited to this example, and the data format output from the extended three-dimensional data generation model can be in any form. For example, the extended three-dimensional data generation model could be a network model (network) that outputs a three-dimensional model of the target space at a given time obtained through learning, in the form of point cloud data or mesh data. This allows the user to view the target space at a given time in three dimensions using point cloud data or mesh data, or to measure the dimensions of objects in the target space at a given time, which are output as three-dimensional data, using the point cloud data or mesh data. In this embodiment, the viewpoint image, generated image, and still image may represent two-dimensional images. 
【0482】 [Example 1] Figure 69 is a diagram illustrating a method for generating moving images using the extended three-dimensional data generation model of Example 1 in Embodiment 3. This example describes the configuration and method of a device that encodes or decodes the extended three-dimensional data generation models NNt0-2 and NNt3-5, which are generated for the periods t0 to t2 and t3 to t5 within times t0 to t5, together with the latent codes Zt0 to Zt2 and Zt3 to Zt5. However, the present disclosure is not limited to this, and the device and method may be applied to encoding or decoding an extended three-dimensional data generation model for any period. 【0483】 This example shows a method for generating a moving image of a target object (subject) viewed from an arbitrary viewpoint using a three-dimensional data generation model. In this method, for example, as shown in Figure 69, a three-dimensional data generation model that can generate images of arbitrary viewpoints within a certain time range (period) (hereinafter referred to as an extended three-dimensional data generation model) is obtained. With this model, a still image of the target object viewed from an arbitrary viewpoint at an arbitrary time in each period can be generated, and a moving image can be generated by arranging the generated still images in chronological order. The extended three-dimensional data generation model is, for example, a three-dimensional data generation model generated by a method such as NeRF. 【0484】 More specifically, when generating a moving image from time t0 to t5, an extended three-dimensional data generation model NNt0-2 and latent codes Zt0 to Zt2 that can represent the period from time t0 to t2, and an extended three-dimensional data generation model NNt3-5 and latent codes Zt3 to Zt5 that can represent the period from time t3 to t5, are generated by learning. 
The viewpoint information (such as the camera pose) of viewpoint A, for which the moving image is to be generated, and the latent codes Zt0 to Zt5 corresponding to times t0 to t5 are input to the generated extended three-dimensional data generation models NNt0-2 and NNt3-5. The extended three-dimensional data generation models NNt0-2 and NNt3-5 thereby output the generated images of viewpoint A from time t0 to t5, and a moving image of the target object viewed from viewpoint A from time t0 to t5 can be generated by connecting these images in temporal order. 【0485】 This example is explained using the encoding and decoding of the extended three-dimensional data generation model NNt0-2 and latent codes Zt0 to Zt2, and the extended three-dimensional data generation model NNt3-5 and latent codes Zt3 to Zt5, generated for the periods t0 to t2 and t3 to t5 within times t0 to t5. However, the present disclosure is not necessarily limited to this and may be applied to the encoding and decoding of an extended three-dimensional data generation model for an arbitrary period. 【0486】 However, in this case, it is necessary to hold an extended three-dimensional data generation model and latent codes for each period (each time span). This requires a huge amount of storage capacity to store the data of the extended three-dimensional data generation models, or a huge amount of network bandwidth to transmit the data of multiple three-dimensional data generation models over a network. Therefore, the data size may be reduced by encoding the extended three-dimensional data generation model corresponding to each period using, for example, NNC (Neural Network Coding) of the MPEG (Moving Picture Experts Group) standard. Alternatively, the data size may be reduced by arithmetic encoding of the latent code. 
This disclosure describes a method for further efficiently compressing this data. 【0487】 Furthermore, with the above configuration, the information processing device may generate any viewpoint image at any time within the period of time t0-t5. For example, when acquiring the extended three-dimensional data generation model NNt0-2 and latent codes Zt0-Zt2, the information processing device generates the extended three-dimensional data generation model NNt0-2 and latent codes Zt0-Zt2 using machine learning based on multiple viewpoint images taken at times t0, t1, and t2, as well as camera poses corresponding to the multiple viewpoints and latent codes Zt0-Zt2, as training data. When generating a video of viewpoint A, the information processing device may generate not only viewpoint image A at times t0, t1, and t2, but also, for example, images of arbitrary viewpoints at times t0.5 and t1.5 between times t0, t1, and t2. Time t0.5 is between times t0 and t1, and time t1.5 is between times t1 and t2. 【0488】 As a result, the information processing device can generate images from an arbitrary viewpoint corresponding not only to the time corresponding to the image used during training, but also to a time shifted from the time corresponding to the image used during training, thereby enabling the generation of high-frame-rate video from viewpoint A. 【0489】In this case, the information processing device may calculate the latent codes Zt0.5, Zt1.5, and Zt2.5 corresponding to times t0.5, t1.5, and t2.5 using the learned latent codes. For example, the information processing device may calculate the latent code Zt0.5 using the latent codes Zt0 and Zt1. Specifically, the information processing device may calculate Zt0.5 based on the average value of Zt0 and Zt1. 
In this way, by calculating the latent code corresponding to a certain time from the already calculated latent codes, the information processing device does not need to store the latent codes for all times, and the data size can be reduced. Furthermore, the information processing device may use not only the training data for times t0, t1, and t2, but also, for example, the training data for time t3, to train the extended three-dimensional data generation model NNt0-2 and the latent codes Zt0 to Zt2. This enables the information processing device to generate viewpoint images after time t2, for example, the viewpoint image at time t2.5, with high accuracy. 【0490】 Furthermore, the information processing device may use not only the training data corresponding to times t3, t4, and t5, but also, for example, the training data corresponding to times t2 and t6, to train the extended three-dimensional data generation model NNt3-5 and latent codes Zt3 to Zt5. This allows the information processing device to generate images from any viewpoint before time t3, or images from any viewpoint after time t5, with high accuracy. 【0491】 Furthermore, at a switching point between extended three-dimensional data generation models, for example, when generating a viewpoint image at time t2.5, which lies between times t2 and t3 where the extended three-dimensional data generation model NNt0-2 switches to the extended three-dimensional data generation model NNt3-5, the information processing device may generate a viewpoint image at time t2.5 with each of the extended three-dimensional data generation models NNt0-2 and NNt3-5, and then use the average of the two generated viewpoint images as the viewpoint image at time t2.5. This makes it possible to generate a viewpoint image at time t2.5 with high accuracy. 
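The averaging described in 【0489】 and 【0491】 can be sketched as follows. This is only an illustration under stated assumptions: latent codes are plain lists of floats, images are lists of pixel rows, and the function names `interpolate_latent` and `average_images` are hypothetical. The weight parameter `alpha` is an assumed generalization; the disclosure itself only mentions the average, which corresponds to `alpha = 0.5`.

```python
from typing import List, Sequence


def interpolate_latent(z_a: Sequence[float], z_b: Sequence[float],
                       alpha: float = 0.5) -> List[float]:
    """Blend two learned latent codes component by component.

    alpha = 0.5 gives the plain average, e.g. Zt0.5 from Zt0 and Zt1.
    """
    if len(z_a) != len(z_b):
        raise ValueError("latent codes must have the same number of dimensions")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(z_a, z_b)]


def average_images(img_a: Sequence[Sequence[float]],
                   img_b: Sequence[Sequence[float]]) -> List[List[float]]:
    """Average two same-sized images pixel by pixel.

    Corresponds to averaging the outputs of NNt0-2 and NNt3-5 at a switching point.
    """
    return [
        [(pa + pb) / 2 for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(img_a, img_b)
    ]


# Toy two-dimensional latent codes standing in for Zt0 and Zt1.
z_t0 = [0.0, 2.0]
z_t1 = [1.0, 4.0]
z_t0_5 = interpolate_latent(z_t0, z_t1)

# Toy 1x2 images standing in for the two model outputs at time t2.5.
blended = average_images([[0, 100]], [[50, 200]])
```

Because the intermediate latent code is derived from its neighbors, only the learned latent codes need to be stored or transmitted.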
【0492】Furthermore, the information processing device may calculate the latent code Zt2.5 using the latent codes Zt2 and Zt3. Specifically, the information processing device may calculate the latent code Zt2.5 based on the average value of the latent codes Zt2 and Zt3. In this way, by calculating the latent code corresponding to a given time from already calculated latent codes, the information processing device does not need to retain the latent codes for all time points, thereby reducing the data size. 【0493】 In this way, the information processing device can generate an image of the target object as seen from a specified viewpoint at the time corresponding to the specified latent code, by specifying a latent code corresponding to a time within the period covered by the extended three-dimensional data generation model, along with viewpoint information, to the extended three-dimensional data generation model. 【0494】 Figure 70 shows a first example of the configuration of the encoding device of Example 1 in Embodiment 3. 【0495】 The encoding device 1510 comprises an extended three-dimensional data generation model acquisition unit 1511, a buffer unit 1512, a network model encoding unit 1513, and a latent code encoding unit 1514. 【0496】The extended three-dimensional data generation model acquisition unit 1511 acquires learning data for each period t0-t2 and t3-t5 from time t0-t5, and uses the acquired learning data for each period to generate the extended three-dimensional data generation model NNt0-2 and latent codes Zt0-Zt2 for period t0-t2, and the extended three-dimensional data generation model NNt3-5 and latent codes Zt3-Zt5 for period t3-t5 through learning. 
The learning data includes multiple viewpoint images obtained by photographing the target object from one or more viewpoint positions in one or more line-of-sight directions at each of times t0 to t5, one or more pieces of viewpoint information indicating the viewpoint positions and line-of-sight directions corresponding to the multiple viewpoint images, and latent codes Zt0 to Zt5 corresponding to times t0 to t5. The viewpoint information may be the position and orientation of the camera at the time each of the multiple viewpoint images was taken. The learning data is not limited to this and may further include information obtained from other sensors. For example, the learning data may include point cloud data and depth images acquired at each time using a LiDAR or ToF (time-of-flight) sensor. This can improve the accuracy of the extended three-dimensional data generation model obtained through learning. 【0497】 The buffer unit 1512 stores the extended three-dimensional data generation model and latent codes Ztm to Ztn for the period tm-n, that is, from time tm (where m is an integer) to time tn (where n is an integer greater than m), which are generated by the extended three-dimensional data generation model acquisition unit 1511. The buffer unit 1512 is implemented by a storage device such as memory. The extended three-dimensional data generation model and latent codes Ztm to Ztn for the period tm-n stored in the buffer unit 1512 may be used, for example, as an initial model or initial latent codes when the extended three-dimensional data generation model acquisition unit 1511 acquires (generates), through learning, an extended three-dimensional data generation model for a period after the period tm-n. This makes it possible to improve the accuracy of the extended three-dimensional data generation model for a period after the period tm-n while shortening the learning time. 
【0498】The buffer unit 1512 may store multiple extended three-dimensional data generation models and multiple latent codes corresponding to multiple periods. For example, based on the multiple extended three-dimensional data generation models and multiple latent codes stored in the buffer unit 1512, one initial model and one initial latent code may be generated by processing such as averaging. The extended three-dimensional data generation model acquisition unit 1511 can acquire a highly accurate extended three-dimensional data generation model by learning the extended three-dimensional data generation model for periods after period tm-n using this initial model and initial latent code. 【0499】 Furthermore, if the extended three-dimensional data generation model acquisition unit 1511 does not refer to past extended three-dimensional data generation models and latent codes during training, the encoding device 1510 does not need to have a buffer unit 1512. This reduces the amount of memory used as the buffer unit 1512. 【0500】 The network model encoding unit 1513 encodes the extended three-dimensional data generation models NNt0-2 and NNt3-5 acquired by the extended three-dimensional data generation model acquisition unit 1511 using the method described in the above embodiment, and outputs a bitstream. 【0501】 Furthermore, the data size may be reduced by encoding the input network model using, for example, the NNC of the MPEG standard as the network model encoding method. In other words, the network model encoding unit 1513 encodes the extended three-dimensional data generation models NNt0-2 and NNt3-5 using NNC and adds the encoding result to the bitstream. To put it another way, the network model encoding unit 1513 generates encoded data as an encoding result and generates a bitstream containing the encoded data. 
【0502】 Furthermore, the data size may be reduced by encoding the input network model using, for example, the NNC (Neural Network Coding) standard of the MPEG specification as the network model encoding method. 【0503】 Furthermore, the network model encoding scheme may support the encoding of other network models, not just the extended three-dimensional data generation model. This eliminates the need to prepare a separate network model encoding unit 1513 for each network model to be encoded, thereby reducing memory size and circuit complexity. Note that the network model is sometimes simply referred to as a network. 【0504】 The latent code encoding unit 1514 encodes the latent codes Zt0 to Zt5 output from the extended three-dimensional data generation model acquisition unit 1511 and adds them to the bitstream. The latent code encoding unit 1514 may, for example, if the latent code is Zt = (value0t, value1t, ..., value255t), binarize each of the values from value0t to value255t, assign a context to each bit after binarization, and perform arithmetic encoding. This improves encoding efficiency when the values of value0t to value255t are close together. 【0505】 Furthermore, if there is no correlation between the values of value0t to value255t, the latent code encoding unit 1514 may perform arithmetic encoding using bypass mode without assigning a context to each bit after binarization. This reduces the number of contexts. 【0506】 Furthermore, the latent code encoding unit 1514 may assign a context to a portion of each bit after binarization and perform arithmetic encoding on the other portion using bypass mode. This can reduce the number of contexts while increasing encoding efficiency. 【0507】The latent code encoding method is not limited to this, and any encoding method may be used. For example, the latent code encoding unit 1514 may map the latent code to pixels of a two-dimensional image and encode it using image encoding. 
As a more specific example, if the image codec supports YUV420, the latent code encoding unit 1514 may assign the values value0t to value255t of the latent code Zt = (value0t, value1t, ..., value255t) to 256 pixels of the Y component, set the remaining Y, U, and V pixel values to a fixed value, such as 0 or half of the maximum possible value, and then encode the resulting YUV420 image with the image codec. This improves encoding efficiency because the predictive encoding and arithmetic encoding provided by the image codec are applied to the latent code. 【0508】 In this case, the latent code encoding unit 1514 may add the number of dimensions of the latent code to the bitstream. This allows the decoding device to identify which part of the Y component holds the latent code values and to decode the latent code correctly. Similarly, when using an image codec compatible with YUV444 or YUV400, the latent code encoding unit 1514 may map the latent code to a part of the Y component, U component, or V component and encode it as an image. This allows the latent code to be encoded as an image regardless of the image codec used. 【0509】 Furthermore, the latent code encoding unit 1514 may use lossless encoding when encoding the latent code. This allows the decoding device to recover a latent code with the same values as in the encoding device. 【0510】 Furthermore, the latent code encoding unit 1514 may add information to the bitstream indicating whether the latent code was encoded using arithmetic encoding or using an image codec. This allows the decoding device to switch between arithmetic decoding and image decoding of the latent code depending on the value added to the bitstream, thereby enabling proper decoding of the bitstream. 【0511】 Furthermore, the latent code encoding unit 1514 may apply predictive coding to the encoding of the latent code.
For example, if the latent code is a multidimensional vector such as Zt = (value0t, value1t, ..., value255t), predictive coding may be applied between the components of the vector. More specifically, instead of directly coding the value of value1t, the latent code coding unit 1514 may code the difference value obtained by subtracting value0t from value1t. Similarly, instead of directly coding the value of value2t, the latent code coding unit 1514 may code the difference value obtained by subtracting value1t from value2t. In other words, when coding the value of a certain component A, the latent code coding unit 1514 may calculate a predicted value P of component A using the values of one or more components coded or decoded before component A, and then code the value obtained by subtracting the predicted value P from the value of component A. This improves coding efficiency when there is a high correlation between components in the latent code. 【0512】 The predicted value P may be calculated from the average, minimum, or maximum value of one or more components encoded or decoded before component A. This can improve the accuracy of the predicted value P. In this way, the latent codes for each time period may be sequentially encoded by the latent code encoding unit 1514, and the amount of code may be reduced by adding each encoding result to the encoded bitstream of the network model. 【0513】 In this case, the encoding device 1510 may add time information as metadata to the bitstream to indicate which time period the encoded extended three-dimensional data generation model or latent code corresponds to. This allows the decoding device to decode and refer to the metadata to determine which time period the decoded extended three-dimensional data generation model or latent code corresponds to, and to appropriately generate moving images of the target object from any viewpoint. 
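The component-wise predictive coding of paragraphs 【0511】 and 【0512】 can be illustrated with a minimal sketch. It assumes the latent code components have already been quantized to integers, and it omits the arithmetic encoding of the residuals. The function names (predict, encode_residuals, decode_residuals) and the mode strings are hypothetical and do not appear in this disclosure.

```python
from typing import List

def predict(previous: List[int], mode: str = "previous") -> int:
    """Predicted value P from the components already coded (0 if there are none)."""
    if not previous:
        return 0
    if mode == "previous":   # e.g. value1t is predicted from value0t
        return previous[-1]
    if mode == "average":    # integer average of the earlier components
        return sum(previous) // len(previous)
    if mode == "min":
        return min(previous)
    if mode == "max":
        return max(previous)
    raise ValueError(f"unknown mode: {mode}")

def encode_residuals(latent: List[int], mode: str = "previous") -> List[int]:
    """Replace each component A by (A - P); the residuals would then be entropy coded."""
    return [value - predict(latent[:i], mode) for i, value in enumerate(latent)]

def decode_residuals(residuals: List[int], mode: str = "previous") -> List[int]:
    """Rebuild each component as (residual + P), computing P exactly as the encoder did."""
    decoded: List[int] = []
    for r in residuals:
        decoded.append(r + predict(decoded, mode))
    return decoded

latent = [100, 102, 101, 105]
residuals = encode_residuals(latent)           # [100, 2, -1, 4]
assert decode_residuals(residuals) == latent   # lossless round trip
```

When neighbouring components are highly correlated, the residuals are small in magnitude, which is what makes the subsequent entropy coding more efficient.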
【0514】 Figure 71 shows a first example of the configuration of the decoding device of Example 1 in Embodiment 3. 【0515】The decoding device 1515 comprises a network model decoding unit 1516, a latent code decoding unit 1517, and a rendering unit 1518. 【0516】 The network model decoding unit 1516 acquires a bitstream and, based on the acquired bitstream, decodes an extended three-dimensional data generation model NNt0-2 for period t0 to t2, an extended three-dimensional data generation model NNt3-5 for period t3 to t5, and metadata such as time information corresponding to these extended three-dimensional data generation models NNt0-2 and NNt3-5. 【0517】 The latent code decoding unit 1517 decodes information related to the latent code from the input bitstream and decodes the latent codes Zt0 to Zt5 for periods t0 to t2 and t3 to t5. The latent code decoding method may be switched to match the encoding method of the encoding device 1510. For example, if the encoding device 1510 uses arithmetic encoding for the latent code, the decoding device 1515 may decode the latent code using arithmetic decoding. If the encoding device 1510 uses image encoding for the latent code, the decoding device 1515 may decode the latent code using image decoding. 【0518】 Furthermore, the latent code decoding unit 1517 may switch the latent code decoding method using the information added to the bitstream. This allows the decoding device 1515 to properly decode the bitstream. 【0519】 Furthermore, if predictive coding is applied to the encoding of the latent code in the encoding device 1510, the decoding device 1515 may also apply predictive decoding. For example, if the latent code is a multidimensional vector such as Zt = (value0t, value1t, ..., value255t), the decoding device 1515 may apply predictive decoding between the components of the vector. 
More specifically, for example, if the encoding device 1510 arithmetic-encodes the difference value obtained by subtracting value0t from value1t, the decoding device 1515 may arithmetic-decode the difference value and reconstruct value1t by adding the decoded value0t to it. 【0520】 Similarly, if the encoding device 1510 arithmetic-encodes the difference value obtained by subtracting value1t from value2t, the decoding device 1515 may arithmetic-decode the difference value and reconstruct value2t by adding the decoded value1t to it. In other words, when encoding the value of a certain component A, the encoding device 1510 may calculate a predicted value P of component A using the values of one or more components encoded or decoded before component A, and arithmetic-encode the difference value obtained by subtracting the predicted value P from the value of component A. In that case, the decoding device 1515 may arithmetic-decode the difference value, calculate the predicted value P in the same way as the encoding device 1510, and reconstruct the value of component A by adding the decoded difference value and the predicted value P. This allows the decoding device 1515 to appropriately decode a bitstream whose encoding efficiency has been improved by exploiting a high correlation between the components of the latent code. The predicted value P may be calculated from the average, minimum, or maximum value of one or more components decoded before component A. This can improve the accuracy of the predicted value P. 【0521】 The rendering unit 1518 uses the extended three-dimensional data generation models NNt0-2 and NNt3-5 decoded by the network model decoding unit 1516 and metadata such as time information to generate a video of viewpoint A at time t, based on the latent code Zt at time t specified by the user or system and the viewpoint information of viewpoint A.
Specifically, the rendering unit 1518 inputs the viewpoint information of viewpoint A and the latent codes Zt0, Zt1, and Zt2 corresponding to times within the period t0 to t2 into the extended three-dimensional data generation model NNt0-2 for the period t0 to t2, and generates an image IMGt0 of viewpoint A at time t0, an image IMGt1 of viewpoint A at time t1, and an image IMGt2 of viewpoint A at time t2. The rendering unit 1518 applies the image generation process for the period t0 to t2 to the extended three-dimensional data generation model NNt3-5 for the period t3 to t5, thereby generating images IMGt3 to IMGt5 from viewpoint A at times t3 to t5. The rendering unit 1518 then uses images IMGt0 to IMGt5 and metadata such as time information to generate a video of the target object as seen from viewpoint A at times t0 to t5. The video may include, for example, images IMGt0 to IMGt5 and presentation time information for calculating the presentation time of images IMGt0 to IMGt5 based on times t0 to t5. 【0522】 The viewpoint information may be changed according to the time. For example, the extended three-dimensional data generation model NNt0-2 for period t0-t2 may be input with viewpoint information for viewpoint A and latent codes Zt0-Zt2, and the extended three-dimensional data generation model NNt3-5 for period t3-t5 may be input with viewpoint information for viewpoint B and latent codes Zt3-Zt5. As a result, the rendering unit 1518 generates multiple images of the target object as seen from viewpoint A at time t0-t2, and multiple images as seen from viewpoint B at time t3-t5. In other words, the rendering unit 1518 can generate a moving image of the target object, in which the viewpoint switches from viewpoint A to viewpoint B at time t3. 【0523】Furthermore, the rendering unit 1518 does not necessarily need to generate moving images; it may also generate still images of a specified viewpoint at a specified time. 
This allows the user to switch between generating moving images and still images depending on their needs. 【0524】 Furthermore, the rendering unit 1518 is not limited to generating moving images or still images from the extended three-dimensional data generation model. For example, the rendering unit 1518 may generate point cloud data or mesh data for a period that the extended three-dimensional data generation model can represent, and output the generated point cloud data or mesh data as dynamic point cloud data or dynamic mesh data. This allows the user to view dynamic three-dimensional data of a moving target object on an HMD (Head Mount Display) or the like, and to measure the amount of movement of the target object using the dynamic three-dimensional data. 【0525】 Figure 72 shows a second example of the configuration of the encoding device of Example 1 in Embodiment 3. 【0526】 The encoding device 1520 includes an extended three-dimensional data generation model acquisition unit 1521, a buffer unit 1522, a difference calculation unit 1523, a network model encoding unit 1524, and a latent code encoding unit 1525. 【0527】 The extended three-dimensional data generation model acquisition unit 1521 is the same as the extended three-dimensional data generation model acquisition unit 1511 of the encoding device 1510. 【0528】 The buffer unit 1522 is similar to the buffer unit 1512 of the encoding device 1510, but differs from the buffer unit 1512 in that it inputs an extended three-dimensional data generation model stored in memory or the like as a reference extended three-dimensional data generation model to the difference calculation unit 1523, and inputs a latent code stored in memory or the like as a reference latent code to the difference calculation unit 1523. 
【0529】The difference calculation unit 1523 calculates difference information showing the difference between the extended three-dimensional data generation model NNt0-2 for period t0 to t2 and the extended three-dimensional data generation model NNt3-5 for period t3 to t5, both generated by the extended three-dimensional data generation model acquisition unit 1521, and the extended three-dimensional data generation model (hereinafter referred to as the reference extended three-dimensional data generation model) generated by the extended three-dimensional data generation model acquisition unit 1521 before each respective period. Here, the difference information may include differences in weight parameters at the nodes of each network model. For example, the difference calculation unit 1523 acquires the extended three-dimensional data generation model NNt3-5 for period t3 to t5 from the extended three-dimensional data generation model acquisition unit 1521, and acquires the extended three-dimensional data generation model NNt0-2 for period t0 to t2 from the buffer unit 1522 as the reference extended three-dimensional data generation model. 【0530】 Furthermore, the difference calculation unit 1523 may use the extended three-dimensional data generation model NNt3-5 and the extended three-dimensional data generation model NNt0-2 to calculate, for example, the difference (amount of change) between the weight parameters of the nodes of the network model in the extended three-dimensional data generation model NNt3-5 and the weight parameters of the nodes of the network model in the extended three-dimensional data generation model NNt0-2, and input this difference information to the network model coding unit 1524. As a result, the difference information is coded by the network model coding unit 1524. 
In other words, the encoding device 1520 may reduce the amount of data by performing predictive coding, which predicts information related to the network model in the extended three-dimensional data generation model NNt3-5 from the extended three-dimensional data generation model NNt0-2 and encodes the difference from the predicted value. With such predictive coding, when, for example, the target object hardly moves and the extended three-dimensional data generation model changes little over time, the difference values to be encoded become small, which improves coding efficiency. For example, the encoding device 1520 may set the reference extended three-dimensional data generation models to RNNt0-2 = 0 and RNNt3-5 = NNt0-2, and reduce the bit size by predictive coding that uses the extended three-dimensional data generation model of the previous time period as the reference extended three-dimensional data generation model. 【0531】 In addition, the difference calculation unit 1523 calculates difference information (hereinafter referred to as the differential latent code) between the latent codes Zt0 to Zt5 for periods t0 to t2 and t3 to t5 generated by the extended three-dimensional data generation model acquisition unit 1521 and the latent codes generated before each period (referred to as the reference latent codes). 【0532】 Here, the differential latent code is, for example, the vector difference (amount of change) Zt1 - Zt0, calculated using the latent code Zt1 at time t1 output from the extended three-dimensional data generation model acquisition unit 1521 and the latent code Zt0 at time t0 output from the buffer unit 1522 as the reference latent code. The difference calculation unit 1523 may output the differential latent code to the latent code encoding unit 1525. As a result, the differential latent code is encoded by the latent code encoding unit 1525.
In this way, the encoding device 1520 may reduce the amount of data by predicting the latent code Ztn at time n from the latent code Ztm at time m to obtain a predicted value, and encoding the differential latent code (difference vector), which is the difference between the latent code Ztn at time n and the predicted value. With such predictive encoding, when, for example, the target object does not move much, the difference values to be encoded become small, so encoding efficiency can be improved. 【0533】 In the example above, the encoding device 1520 predictively encodes the latent code Ztn at time n from the latent code Ztm at time m, but it is not necessarily limited to this. The encoding device 1520 may, for example, select a reference latent code to be used for prediction from one or more reference latent codes stored in the buffer unit 1522, and perform predictive encoding using the selected reference latent code. In this case, the encoding device 1520 may add information indicating the selected reference latent code (reference latent code information) to the bitstream in order to communicate the selected reference latent code to the decoding device. This allows the encoding device 1520 to select the optimal reference latent code from the viewpoint of encoding efficiency, thereby improving encoding efficiency. Furthermore, the decoding device can appropriately decode the bitstream with improved encoding efficiency by decoding the reference latent code information. 【0534】 Furthermore, when the encoding device 1520 performs predictive coding by referring to two or more latent codes stored in the buffer unit 1522, it may add information indicating the two or more reference latent codes to the bitstream. This allows the encoding device 1520 to improve the coding efficiency of predictive coding by using two or more reference latent codes. In addition, the decoding device can appropriately decode the bitstream with improved coding efficiency.
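The selection of a reference latent code described in paragraph 【0533】 could be sketched as follows. This is only an illustration under stated assumptions: latent codes are integer vectors, the sum of absolute residual values is used as a stand-in for the amount of code (the disclosure does not specify a selection criterion), and the index returned by the encoder plays the role of the reference latent code information. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

def residual(target: Sequence[int], reference: Sequence[int]) -> List[int]:
    """Differential latent code: target minus reference, component by component."""
    return [t - r for t, r in zip(target, reference)]

def cost(res: Sequence[int]) -> int:
    """Assumed proxy for the amount of code: sum of absolute residual values."""
    return sum(abs(x) for x in res)

def encode_with_best_reference(
    target: Sequence[int], buffer: Sequence[Sequence[int]]
) -> Tuple[int, List[int]]:
    """Return (reference index, residual); index -1 means no reference (prediction 0)."""
    if not buffer:
        return -1, list(target)
    best = min(range(len(buffer)), key=lambda i: cost(residual(target, buffer[i])))
    return best, residual(target, buffer[best])

def decode_with_reference(
    ref_index: int, res: Sequence[int], buffer: Sequence[Sequence[int]]
) -> List[int]:
    """Add the signalled reference latent code back to the decoded residual."""
    if ref_index < 0:
        return list(res)
    return [d + r for d, r in zip(res, buffer[ref_index])]

buffer = [[10, 20, 30], [11, 19, 33]]   # previously coded latent codes
target = [11, 20, 33]
ref_index, res = encode_with_best_reference(target, buffer)
assert decode_with_reference(ref_index, res, buffer) == target
```

The decoder needs only the signalled index and the residual, provided its buffer holds the same latent codes as the encoder's buffer.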
【0535】 Furthermore, in cases where no reference latent code is stored in the buffer unit 1522, for example, when encoding the first latent code (first frame) in data order, the encoding device 1520 may encode the latent code to be processed without calculating the difference from a predicted value (hereinafter referred to as intra-prediction), or it may encode the difference calculated with the predicted value set to 0. Also, when setting a certain period tm-n as a random access point, the encoding device 1520 may encode the latent code corresponding to the period tm-n using intra-prediction, or it may encode the difference calculated with the predicted value set to 0. This allows the decoding device to start decoding from the first latent code (first frame) in data order or from the random access point, thereby improving functionality during playback. 【0536】 Furthermore, a set of multiple latent codes (multiple frames) (hereinafter referred to as a GOF (Group of Frames)) may be defined, and the first frame of the GOF may be encoded by intra-prediction. This allows the decoding device to randomly access the first frame of each GOF, and decoding the first frame of each GOF can support functions such as fast-forward playback. 【0537】 Furthermore, the encoding device 1520 may add permission information to the bitstream indicating whether or not predictive referencing between GOFs is allowed. For example, if the bitstream contains permission information indicating that predictive referencing between GOFs is prohibited, the decoding device can determine that it can decode multiple GOFs in parallel. Conversely, allowing predictive referencing between GOFs can improve encoding efficiency.
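As a rough illustration of paragraphs 【0535】 and 【0536】, the sketch below encodes the first latent code of each GOF by intra-prediction and every other latent code as a difference from the preceding one, with no predictive referencing between GOFs. The fixed GOF size, the tuple representation of a frame, and the function names are assumptions made for this example only.

```python
from typing import List, Sequence, Tuple

Frame = Tuple[str, List[int]]  # ("intra" | "inter", payload)

def encode_gofs(latents: Sequence[Sequence[int]], gof_size: int) -> List[Frame]:
    """First frame of each GOF is intra; the others are differences from the previous frame."""
    frames: List[Frame] = []
    for i, z in enumerate(latents):
        if i % gof_size == 0:
            frames.append(("intra", list(z)))
        else:
            prev = latents[i - 1]  # always inside the same GOF
            frames.append(("inter", [a - b for a, b in zip(z, prev)]))
    return frames

def decode_from(frames: Sequence[Frame], start: int) -> List[List[int]]:
    """Decode from frame index `start`, which must be an intra-coded frame."""
    if frames[start][0] != "intra":
        raise ValueError("decoding must start at an intra-coded frame")
    decoded: List[List[int]] = []
    for kind, payload in frames[start:]:
        if kind == "intra":
            decoded.append(list(payload))
        else:
            decoded.append([d + p for d, p in zip(payload, decoded[-1])])
    return decoded

latents = [[1, 1], [2, 1], [2, 3], [5, 5]]
frames = encode_gofs(latents, gof_size=2)
assert decode_from(frames, 0) == latents             # decode from the beginning
assert decode_from(frames, 2) == [[2, 3], [5, 5]]    # random access at the second GOF
```

Because no GOF refers to another in this sketch, each GOF could also be decoded independently, which corresponds to the case in paragraph 【0537】 where predictive referencing between GOFs is prohibited.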
【0538】 The network model encoding unit 1524 is similar to the network model encoding unit 1513 of the encoding device 1510, but differs in that it encodes the difference information d0-2 and d3-5 of the extended three-dimensional data generation models NNt0-2 and NNt3-5 input from the difference calculation unit 1523 and outputs a bitstream. 【0539】 Furthermore, the latent code encoding unit 1525 has the same function as the latent code encoding unit 1514, but differs in that it encodes the difference latent code input from the difference calculation unit 1523 and outputs a bitstream. 【0540】 Although the encoding device 1520 is described separately as consisting of a difference calculation unit 1523, a network model encoding unit 1524, and a latent code encoding unit 1525, it is not necessarily limited to this configuration. For example, the difference calculation unit 1523 may be included within the network model encoding unit 1524 or the latent code encoding unit 1525. In other words, the network model encoding unit 1524 or the latent code encoding unit 1525 may perform the processing of the difference calculation unit 1523. 【0541】 Furthermore, although the difference calculation unit 1523 is said to calculate the difference value of the extended three-dimensional data generation model and the difference value of the latent code, it is not necessarily limited to this. For example, the encoding device may be provided with a network model difference calculation unit that calculates the difference value of the network model and a latent code difference calculation unit that calculates the difference value of the latent code. This allows for faster processing by parallelizing the difference calculation. 
【0542】 Furthermore, the encoding device 1520 may add predictive encoding information to the bitstream indicating whether the latent code was encoded using intra-prediction or using predictive encoding with a reference latent code (hereinafter referred to as inter-prediction). By decoding the predictive encoding information, the decoding device can appropriately determine whether to use intra-prediction or inter-prediction to decode the latent code. 【0543】 Figure 73 shows a second example of the configuration of the decoding device of Example 1 in Embodiment 3. 【0544】 The decoding device 1530 comprises a network model decoding unit 1531, a latent code decoding unit 1532, an addition unit 1533, a buffer unit 1534, and a rendering unit 1535. 【0545】 The network model decoding unit 1531 acquires a bitstream and, based on the acquired bitstream, decodes information related to the network model: the difference information d0-2 of the extended three-dimensional data generation model NNt0-2 for the period t0 to t2, the difference information d3-5 of the extended three-dimensional data generation model NNt3-5 for the period t3 to t5, and metadata such as time information. 【0546】 The latent code decoding unit 1532 decodes information related to the latent code from the input bitstream and decodes the differential latent codes Ztd0 to Ztd5 corresponding to the period from time t0 to t2 and the period from time t3 to t5. The decoding method for the differential latent code may be switched to match the encoding method of the encoding device 1520. For example, if the encoding device 1520 uses arithmetic encoding for the differential latent code, the latent code decoding unit 1532 may decode the differential latent code using arithmetic decoding. If the encoding device 1520 uses image encoding for the differential latent code, the latent code decoding unit 1532 may decode the differential latent code using image decoding.
The latent code decoding unit 1532 may also switch the decoding method for the differential latent code using information added to the bitstream. This allows the bitstream to be decoded appropriately. 【0547】 The addition unit 1533 adds, for each corresponding period, the difference information d0-2 and d3-5 of the extended three-dimensional data generation models NNt0-2 and NNt3-5 for the periods t0 to t2 and t3 to t5, decoded by the network model decoding unit 1531, to the reference extended three-dimensional data generation models RNNt0-2 and RNNt3-5 obtained from the buffer unit 1534, thereby calculating the extended three-dimensional data generation models NNt0-2 and NNt3-5. In this way, the decoding device 1530 may perform predictive decoding by setting RNNt0-2 = 0 and RNNt3-5 = NNt0-2, that is, by using the extended three-dimensional data generation model of the previous time period as the reference extended three-dimensional data generation model. 【0548】 In addition, the addition unit 1533 adds, for each corresponding time, the differential latent codes Ztd0 to Ztd5 for the times t0 to t5 decoded by the latent code decoding unit 1532 to the reference latent codes RZt0 to RZt5 input from the buffer unit 1534, thereby calculating the latent codes Zt0 to Zt5. Performing these additions in parallel can speed up the processing. 【0549】 In the second example, the decoding device 1530 is described as having the addition unit 1533, the network model decoding unit 1531, and the latent code decoding unit 1532 as separate units. However, it is not limited to this configuration; for example, the addition unit 1533 may be included within the network model decoding unit 1531 or the latent code decoding unit 1532. In other words, the network model decoding unit 1531 may perform the processing of the addition unit 1533, or the latent code decoding unit 1532 may perform the processing of the addition unit 1533.
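The model reconstruction performed by the addition unit 1533 in paragraph 【0547】 (NN = d + RNN, with RNNt0-2 = 0 and RNNt3-5 = NNt0-2) can be sketched as below. Representing a network model as a dictionary that maps a layer name to a flat list of weight parameters is an assumption made for illustration, as are the function names; the disclosure does not prescribe a data layout.

```python
from typing import Dict, List, Optional

Weights = Dict[str, List[float]]  # assumed layout: layer name -> flat weight list

def add_difference(diff: Weights, reference: Optional[Weights]) -> Weights:
    """Compute NN = d + RNN; a missing reference is treated as RNN = 0."""
    if reference is None:
        return {name: list(values) for name, values in diff.items()}
    return {
        name: [d + r for d, r in zip(values, reference[name])]
        for name, values in diff.items()
    }

def decode_models(diffs: List[Weights]) -> List[Weights]:
    """Rebuild each period's model, using the previous period's model as the reference."""
    models: List[Weights] = []
    reference: Optional[Weights] = None      # nothing buffered yet: RNNt0-2 = 0
    for diff in diffs:
        model = add_difference(diff, reference)
        models.append(model)
        reference = model                    # buffered as the next reference (RNNt3-5 = NNt0-2)
    return models

d0_2 = {"layer0": [0.5, -1.0]}    # difference information for period t0 to t2
d3_5 = {"layer0": [0.25, 0.5]}    # difference information for period t3 to t5
nn0_2, nn3_5 = decode_models([d0_2, d3_5])
assert nn0_2 == {"layer0": [0.5, -1.0]}
assert nn3_5 == {"layer0": [0.75, -0.5]}
```

The same addition applies to the latent codes in paragraph 【0548】, with the differential latent codes Ztd0 to Ztd5 taking the place of the difference information and the reference latent codes RZt0 to RZt5 taking the place of the reference models.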
【0550】 Furthermore, in cases where the buffer unit 1534 does not store a reference latent code, for example, when decoding the first latent code (first frame) in data order, the decoding device 1530 may decode the latent code without prediction, that is, without the addition unit 1533 adding a reference latent code to the difference information (hereinafter referred to as intra-prediction), or it may decode by adding the difference information to a predicted value set to 0. Also, when a certain period tm-n is set as a random access point, the decoding device 1530 may decode the latent code corresponding to the period tm-n using intra-prediction, or it may decode by adding the difference information to a predicted value set to 0. In addition, if the bitstream contains predictive coding information indicating that the latent code to be decoded has been coded using intra-prediction, the decoding device 1530 may decode that latent code using intra-prediction, or it may decode by adding the difference information to a predicted value set to 0. As a result, the decoding device 1530 can start decoding from the first latent code (first frame) in data order, from a random access point, or from a latent code coded using intra-prediction, thereby improving functionality during playback. 【0551】 In the second example, the decoding device 1530 predictively decodes the latent code corresponding to time n from the latent code at time m, but it is not necessarily limited to this. The decoding device 1530 may, for example, select a reference latent code to be used for prediction from one or more latent codes stored in the buffer unit 1534, and perform predictive decoding using the selected latent code. In this case, the decoding device 1530 may decode, from the bitstream, information indicating the selected latent code (reference latent code information).
As a result, by decoding the reference latent code information from the bitstream generated by the encoding device 1520, in which the optimal reference latent code has been selected from the viewpoint of coding efficiency, the decoding device 1530 can appropriately decode a bitstream with improved coding efficiency. 【0552】 Furthermore, when the decoding device 1530 performs predictive decoding by referring to two or more latent codes stored in the buffer unit 1534, it may decode information indicating the two or more reference latent codes from the bitstream. This allows the decoding device 1530 to appropriately decode a bitstream in which the coding efficiency of predictive coding has been improved by using two or more reference latent codes. 【0553】 The rendering unit 1535 is the same as the rendering unit 1518 of the decoding device 1515. The rendering unit 1535 does not necessarily have to generate moving images; it may also generate still images of a specified viewpoint at a specified time. 【0554】 [Modification of Example 1] When the encoding device transmits the encoded bitstream of the extended three-dimensional data generation model or the latent code over a network, it may transmit the bitstream of the extended three-dimensional data generation model and the bitstream of the latent code separately. For example, if the receiving device (for example, the decoding device) generates rendering images from time t0 to t9, the encoding device may first transmit the extended three-dimensional data generation model NNt0-9 and the latent code Zt0 for time t0 in time for the rendering image at time t0 to be generated. The encoding device may then transmit the latent codes Zt1 to Zt9 sequentially, each in time for the receiving device to generate the rendering images at times t1 to t9. This reduces the load on the network's transmission capacity, and the receiving device can generate each rendering image by the desired time.
The encoding device may also transmit the extended three-dimensional data generation model and the latent code related to that time at each random access point, and transmit the latent code at locations other than random access points. This allows for the overall amount of coding to be suppressed while appropriately inserting random access points. 【0555】 Furthermore, when the information processing device learns the extended three-dimensional data generation model and latent codes, it may learn with a shorter time sampling interval (time interval), and the encoding device may transmit a bitstream containing latent codes with a longer sampling interval than that used during learning. For example, during learning, the sampling interval for latent codes may be set to 0.1-second increments, while the sampling interval for latent codes included in the encoded bitstream may be set to 1-second increments. In this case, the receiving device may use the decoded 1-second latent codes to calculate the latent codes every 0.1 seconds using interpolation or filtering. This makes it possible to generate a highly accurate extended three-dimensional data generation model and latent codes through learning while suppressing the amount of code in the generated bitstream. 【0556】Furthermore, the information processing device may assign one or more common latent codes to two or more extended three-dimensional data generation models and train them. For example, the information processing device may simultaneously train the extended three-dimensional data generation models for space A and space B using one common latent code, and generate an extended three-dimensional data generation model for space A, an extended three-dimensional data generation model for space B, and one latent code. 
In this case, as an example of the rendering operation, to simultaneously generate rendering images A and B of space A and space B as seen from viewpoint V at time t, the receiving device may obtain image A from the extended three-dimensional data generation model for space A, the viewpoint information of viewpoint V, and the latent code for time t, and obtain image B from the extended three-dimensional data generation model for space B, the viewpoint information of viewpoint V, and the latent code for time t. In this way, by assigning one or more common latent codes to two or more extended three-dimensional data generation models, it is possible to generate rendering images of multiple spaces while reducing the amount of data. 【0557】 Furthermore, the information processing device may learn model parameters so that the extended three-dimensional data generation model is not limited to information related to the shape of the three-dimensional space, but can generate multiple pieces of attribute information, such as color or reflectance, from a single model. This can reduce the amount of data of the extended three-dimensional data generation model. Alternatively, the extended three-dimensional data generation model may hold a separate three-dimensional data generation model for each attribute. Specifically, model parameters may be learned and held separately for each attribute. This makes it possible to obtain an extended three-dimensional data generation model that is optimal for each piece of attribute information. 【0558】 Furthermore, the metadata transmitted to the receiving device may include information indicating which of one or more pieces of viewpoint information is most likely to improve the quality of the rendered image. This allows the receiving device to generate high-quality video by selecting viewpoint information with high rendering quality at each time point.
The metadata may also include information such as which of one or more latent codes is most likely to improve the quality of the rendered image. This allows the receiving device to generate high-quality video by selecting the time point associated with the latent code with high rendering quality. The receiving device may also generate high-quality video by generating a rendered image using the latent codes encoded and transmitted as a bitstream. The metadata may also include information indicating which attributes, such as RGB, reflectivity, and transparency, the extended 3D data generation model can output. This allows the receiving device to switch the extended 3D data generation model used depending on the application. 【0559】 [Supplement] A latent code is assigned to each image (or object or scene) at a given time, and it represents the characteristics of that image. A latent code can represent various characteristics such as appearance, shape, and lighting. As mentioned above, a latent code is a value with n dimensions (where n is an integer greater than or equal to 2), and it may be represented as a multidimensional vector. Multidimensional vectors include low-dimensional vectors (2-3 dimensions). A latent code is a concept in machine learning or deep learning, and it is information that allows a model to learn the characteristics or patterns of input data and achieve a compact representation. Latent codes are used to compress and represent the movement and changes in appearance of a scene. By using automatically learned latent information as a code instead of time, the movement of geometry or textures can be recorded more expressively. Using this latent code, temporal changes can be represented smoothly, which can be useful in generating visual effects. Note that a latent code may also be a one-dimensional value representing time. 【0560】The times t0, t1, t2, t3, t4, and t5 are consecutive times with the same interval between them. 
For example, if the interval between each time point is 1 second, and t0 is 0 seconds, then t1 will be 1 second later, t2 will be 2 seconds later, and so on. This is just one example, and the interval between time points can be any value. The interval can be defined as an integer or a decimal. In other words, time point tn can also take a decimal value. 【0561】 In the embodiments of this application, encoding and decoding devices that perform intra-coding or intra-decoding and those that perform inter-coding or inter-decoding are shown as separate examples. However, an encoding or decoding device may be configured to perform both intra-coding or intra-decoding and inter-coding or inter-decoding, and to switch between them. This switching may be applied to the encoding or decoding of an extended three-dimensional data generation model, or to the encoding or decoding of latent codes. This makes it possible to realize combinations that can reduce the amount of code. 【0562】 The encoding device in this embodiment may include a circuit and a memory connected to the circuit, wherein the circuit may use the memory to (1) encode three-dimensional data generation model information capable of generating an image corresponding to a time within a first range at an arbitrary viewpoint into a bitstream, and (2) encode a latent code into a bitstream. The data size can be reduced by arithmetically encoding the latent code. Furthermore, a decoding device that receives the bitstream can generate a moving image from the three-dimensional data generation model. 
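The switching between intra-coding and inter-coding of latent codes mentioned in 【0561】 could look like the following minimal Python sketch. It assumes, which the source does not state, that the encoder picks per time point whichever mode gives the smaller sum of absolute values, used here as a crude proxy for code amount. The names `cost`, `choose_mode`, and `reconstruct`, and the idea of signalling the mode alongside the payload, are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]


def cost(values: Vector) -> float:
    """Crude proxy for code amount: sum of absolute values (an assumption)."""
    return sum(abs(v) for v in values)


def choose_mode(current: Vector, previous: Vector) -> Tuple[str, Vector]:
    """Return ("inter", residual vs. previous) or ("intra", raw values)."""
    residual = [c - p for c, p in zip(current, previous)]
    if cost(residual) < cost(current):
        return "inter", residual
    return "intra", list(current)


def reconstruct(mode: str, payload: Vector, previous: Vector) -> Vector:
    """Decoder side: invert choose_mode using the signalled mode."""
    if mode == "inter":
        return [p + r for p, r in zip(previous, payload)]
    return list(payload)


# Hypothetical latent codes for two consecutive time points.
z_t0 = [4.0, -2.0, 7.0]
z_t1 = [5.0, -2.0, 6.0]
mode, payload = choose_mode(z_t1, z_t0)
decoded = reconstruct(mode, payload, z_t0)
```

With these sample values the residual is much smaller than the raw code, so the inter mode is selected; a latent code unrelated to its predecessor would fall back to intra.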
【0563】 The decoding device in this embodiment may include a circuit and a memory connected to the circuit, wherein the circuit uses the memory to obtain (1) three-dimensional data generation model information capable of generating an image corresponding to a first range of time points at an arbitrary viewpoint, and (2) a latent code from a bitstream, and uses the three-dimensional data generation model and the latent code to generate an image within the first range. This makes it possible to generate moving images from the three-dimensional data generation model. 【0564】The three-dimensional data generation model may be (1) a first network corresponding to a time within a first range, or (2) difference information between the first network and a second network corresponding to a time in a second range prior to the first range. 【0565】 The extended three-dimensional data generation model may be generated by inputting first viewpoint information, the latent code at the first time step, and the image at the first viewpoint at the first time step into a neural network. 【0566】 The extended three-dimensional data generation model may include a first model and a second model, wherein the first model (NNt0-2) is capable of generating images corresponding to times within the range of a first time t0 to a second time t2, and the second model (NNt3-5) is capable of generating images corresponding to times within the range of a third time t3 to a fourth time t5. 【0567】 [Example 2] A three-dimensional model generated using basic NeRF may be composed of multiple networks. Here, a network refers to a learned model obtained by learning using a neural network. Multiple networks may include, for example, a network learned using sparse sampling points and a network learned using dense sampling points. Thus, multiple networks are networks that differ in the number of input sampling points or the density of said sampling points. 
Sampling points may be, for example, three-dimensional points indicating three-dimensional positions. 【0568】 Alternatively, the multiple networks may include, for example, a network for outputting geometric information such as object density, object probability, and geometric coordinates, and a network for outputting information associated with the geometry (attribute information) such as color information, reflectivity, normal vector, color coordinates, timestamp, and object ID, based on the geometric information. The multiple networks may include two or more networks. The multiple networks may be three or more networks with different sampling points, or they may have two or more networks for outputting geometric information, or two or more networks for outputting attribute information. 【0569】 Multiple networks may be encoded using multiple network coding units. The multiple network coding units may encode the multiple networks using existing network coding means, such as NNC (Neural Network Coding) of the MPEG standard. 【0570】 The information processing device may also add latent codes to a three-dimensional data generation model composed of two or more networks as described above and train it, thereby obtaining an extended three-dimensional data generation model that can generate still images from any viewpoint at any time and arrange them in chronological order, and may apply the encoding or decoding method described in this embodiment to that model. This makes it possible to reduce the code amount of the extended three-dimensional data generation model composed of two or more networks and of the latent codes. 
【0571】 Next, using NeRF, one of the three-dimensional modeling methods, as an example, we will explain, with reference to Figure 74, a method for generating an extended three-dimensional data generation model and latent codes from multiple two-dimensional images, and for encoding the generated extended three-dimensional data generation model (network model) and latent codes. Note that the method described here is just one example and can be applied not only to the NeRF method described here, but also to other NeRF methods or three-dimensional modeling methods. 【0572】 Figure 74 is a block diagram showing an example of the configuration of an encoding device for encoding multiple networks in Example 2 of Embodiment 3. 【0573】 The encoding device 1540 comprises an extended three-dimensional data generation model learning unit 1541, a three-dimensional data generation model encoding unit 1545, and a latent code encoding unit 1549. The encoding device 1540 may further include a bitstream data constructor 1552. 【0574】 First, the specific configuration of the extended three-dimensional data generation model learning unit 1541 will be described. Specifically, the extended three-dimensional data generation model learning unit 1541 includes a first network learning unit 1542, a sampling point determination unit 1543, and a second network learning unit 1544. 【0575】 The first network learning unit 1542 learns a three-dimensional data generation model for each viewpoint and time period using the input two-dimensional images, viewpoint information (camera pose) for each of the input two-dimensional images, and the input latent code. In other words, the first network learning unit 1542 generates a three-dimensional data generation model (first network) and a learned latent code (first latent code) by learning the two-dimensional images associated with each viewpoint information based on the input two-dimensional images and the viewpoint information for each two-dimensional image. 
The viewpoint information includes the viewpoint at the time the two-dimensional image was taken and the line-of-sight vector (line of sight direction) from that viewpoint. The first network learning unit 1542 may also receive sampling points (first sampling points) for learning. The first sampling points may be, for example, a set of points with large spacing between them. The coordinates of each point included in the first sampling points may be predetermined coordinates or coordinates calculated by a predetermined method. The first network learning unit 1542 outputs a three-dimensional data generation model (first network) and a latent code (first latent code) obtained through learning. The three-dimensional data generation model (first network) generated by the first network learning unit 1542 is a network that outputs density information for sampling points at a time (or time period) represented by the latent code. The first network learning unit 1542 may also output density information for sampling points at a certain time (or time period) obtained during learning. 【0576】 Here, density information refers to information indicating the density of objects at a sampling point at a given time (or time period). For example, if an object is a person or a desk, the density information is set to a high value (i.e., higher than a predetermined value), if it is a light-transmitting object like glass, it is set to a low value (i.e., lower than a predetermined value), and if there is no object, it is set to a value close to 0. Therefore, density information can also be said to be information indicating whether or not an object is present, or information indicating the probability of an object's existence. Furthermore, density information can also be called geometric information. 
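The interpretation of density information in 【0576】 can be illustrated with a small Python sketch. The thresholds `HIGH` and `EMPTY_EPS`, the labels, and the function `classify_density` are hypothetical; the source only says that densities are compared against a predetermined value and that empty space has a value close to 0.

```python
HIGH = 0.5        # assumed stand-in for the "predetermined value"
EMPTY_EPS = 0.01  # assumed cutoff for "close to 0"


def classify_density(density: float) -> str:
    """Map a density value at a sampling point to a coarse label."""
    if density < 0:
        raise ValueError("density must be non-negative")
    if density <= EMPTY_EPS:
        return "no object"
    if density > HIGH:
        return "opaque object"           # e.g. a person or a desk
    return "light-transmitting object"   # e.g. glass
```

For instance, `classify_density(0.9)` falls in the opaque range under these assumed thresholds, while `classify_density(0.2)` is treated as light-transmitting.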
【0577】 The sampling point determination unit 1543 determines the density of objects at a certain time (or time period) at the coordinates indicated by the coarsely sampled first sampling point, based on the density information for a certain time (or time period) for the first sampling point output from the first network learning unit 1542, and determines a second sampling point to be used for learning in the second network learning unit 1544. In determining the second sampling point, the sampling point determination unit 1543 may, for example, determine that an object exists if the density of the sampling point is greater than a predetermined density, and decide to sample more finely the region where the object has been determined to exist (the space in which the object is determined to exist and its surrounding space). Alternatively, in determining the second sampling point, the sampling point determination unit 1543 may, for example, determine that there is no object in the space where the sampling point is located if the density of the sampling point is less than a predetermined density, and decide to sample more coarsely in that space, or decide not to sample at all. For example, a PDF sampler may be used in the sampling point determination unit 1543. 【0578】 The sampling points output by the sampling point determination unit 1543 have different meanings depending on the density determination method. For example, a sampling point output when an object is determined to exist can be considered geometric information indicating the coordinates of the object. Furthermore, depending on the density of the sampling points, it is possible to distinguish highly transparent objects such as glass, hard materials, and the like, and to process the sampling points in objects (spaces) determined to be highly transparent objects, hard materials, or the like separately from the sampling points to be extracted. 
In this way, the sampling points to be extracted can be determined according to the density of the sampling points. Therefore, by extracting sampling points of objects or materials with a specific density that satisfy specific conditions, the extracted sampling points can be determined as geometric information. 【0579】The sampling point determination unit 1543 may use a predetermined method or parameters to determine the sampling points, or it may use a method or parameters selected from a plurality of methods or parameters. In this case, information indicating the predetermined method or parameters may be encoded as learning metadata and stored in the bitstream. As a result, information indicating the predetermined method or parameters may be notified to the decoding device as learning metadata included in the bitstream. 【0580】 The configuration of the second network learning unit 1544 is the same as that of the first network learning unit 1542. The second network learning unit 1544 learns a three-dimensional model for each viewpoint at a certain time (or time period) using the input multiple two-dimensional images, the viewpoint information (camera pose) for each of the input multiple two-dimensional images, the second sampling point (detailed sampling point) output from the sampling point determination unit 1543, and the initial latent code. In other words, the second network learning unit 1544 generates a three-dimensional model (second network) by learning the two-dimensional images associated with each viewpoint information based on the multiple two-dimensional images, the viewpoint information for each two-dimensional image, the second sampling point, and the initial latent code. The second network learning unit 1544 outputs the three-dimensional data generation model (second network) and the latent code (second latent code) obtained through learning. 
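The coarse-to-fine selection of second sampling points described in 【0577】 might be sketched as follows. This is a deterministic, one-dimensional simplification along a single ray: the threshold, the subdivision count, the assumption of evenly spaced coarse samples, and the function name `refine_samples` are all illustrative, and the PDF sampler mentioned in the text is not shown.

```python
from typing import List


def refine_samples(coarse_t: List[float], density: List[float],
                   threshold: float, subdivisions: int = 4) -> List[float]:
    """Return fine sample positions around coarse samples denser than threshold.

    coarse_t holds evenly spaced positions along a ray. Coarse samples whose
    density is at or below the threshold are treated as empty space and
    contribute no fine samples.
    """
    if len(coarse_t) != len(density):
        raise ValueError("coarse_t and density must have the same length")
    if len(coarse_t) < 2:
        return []
    step = coarse_t[1] - coarse_t[0]
    fine = set()
    for t, d in zip(coarse_t, density):
        if d > threshold:
            # Cover [t - step/2, t + step/2] with subdivisions + 1 points.
            start = t - step / 2
            for k in range(subdivisions + 1):
                fine.add(round(start + k * step / subdivisions, 9))
    return sorted(fine)


coarse = [0.0, 1.0, 2.0, 3.0]   # first (coarse) sampling points
sigma = [0.0, 0.1, 5.0, 0.0]    # hypothetical densities from the first network
second_points = refine_samples(coarse, sigma, threshold=1.0)
```

Only the coarse sample at 2.0 exceeds the threshold here, so the second sampling points cluster around it and the rest of the ray is skipped. Because the decoder must reproduce the same points, the threshold and subdivision count play the role of the parameters that 【0579】 says may be carried as learning metadata.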
【0581】 Here, the second network generated by the second network learning unit 1544 is a network capable of outputting color information and density information. The second network learning unit 1544 outputs density information and color information for sampling points obtained during learning, and this density information and color information may be used for other processing. 【0582】 Next, we will describe the specific configuration of the three-dimensional data generation model encoding unit 1545. 【0583】 The three-dimensional data generation model coding unit 1545 includes a first network coding unit 1546, a second network coding unit 1547, and a metadata coding unit 1548. 【0584】 The first network coding unit 1546 encodes the trained first network generated by the first network learning unit 1542. The first network coding unit 1546 outputs encoded data of the encoded first network. 【0585】 The second network coding unit 1547 encodes the trained second network generated by the second network learning unit 1544. The second network coding unit 1547 outputs encoded data of the encoded second network. 【0586】 The metadata encoding unit 1548 encodes the metadata generated by the sampling point determination unit 1543. The metadata encoding unit 1548 outputs encoded data in which the metadata has been encoded. 【0587】 In this way, the three-dimensional data generation model encoding unit 1545 generates encoded data in which the first network, the second network, and metadata are encoded, and the generated encoded data is output. 【0588】 The three-dimensional data generation model coding unit 1545 may perform coding using existing network coding means, such as NNC (Neural Network Coding) of the MPEG standard. The trained first and second networks include multiple layers, such as an input layer, an intermediate layer, and an output layer, nodes in each layer, weight coefficients for each node, and a transformation function for each node. 
Each of the trained first and second networks may have a density network for outputting density information, a color network for outputting color information, a reflectance network for outputting reflectance information, etc. Alternatively, each of the trained first and second networks may have an attribute network for outputting attribute information such as color information or reflectance information. 【0589】 Figure 75 shows an example of encoded data from the trained first network in Embodiment 3. 【0590】If the trained first network is a network for generating sampling points, it may include at least a density network for outputting density information. The trained first network may further include a color network for outputting color information, or an attribute network (reflectance network) for outputting other attribute information (e.g., reflectance information). 【0591】 Figure 76 shows an example of encoded data from the trained second network in Embodiment 3. 【0592】 The trained second network may be a network for outputting color information or other attribute information for a sampling point. The trained second network may include a density network for outputting density information and a color network for outputting color information. Furthermore, the trained second network may include an attribute network (reflectance network) for outputting other attribute information (e.g., reflectance information). 【0593】 Furthermore, if the output of attribute information is not required, the trained first or second network does not need to include a color network for outputting color information, or an attribute network (reflectance network) for outputting other attribute information (e.g., reflectance information). 
【0594】 Furthermore, the three-dimensional data generation model encoding unit 1545 may generate predicted values for the second network from the first network and encode the value obtained by subtracting those predicted values from the second network (prediction residual). This reduces the coding amount of the second network. For example, the first network may be used as the predicted value. In this case, the value obtained by subtracting the value of the first network from the second network is encoded as the prediction residual of the second network. This reduces the coding amount when there is a high correlation between the first network and the second network. 【0595】Furthermore, the three-dimensional data generation model encoding unit 1545 may generate predicted values for the first network from the second network and encode the value obtained by subtracting those predicted values from the first network (prediction residual). This reduces the coding amount of the first network. For example, the second network may be used as the predicted value. In this case, the value obtained by subtracting the value of the second network from the first network is encoded as the prediction residual of the first network. This reduces the coding amount when there is a high correlation between the first network and the second network. 【0596】 The latent code encoding unit 1549 includes a first latent code encoding unit 1550 and a second latent code encoding unit 1551. The first latent code encoding unit 1550 encodes the first latent code input from the extended three-dimensional data generation model learning unit 1541 and outputs the encoded first latent code to the bitstream data constructor 1552. Similarly, the second latent code encoding unit 1551 encodes the second latent code input from the extended three-dimensional data generation model learning unit 1541 and outputs the encoded second latent code to the bitstream data constructor 1552. 
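The prediction-residual coding of the second network in 【0594】 can be sketched in Python as below. It assumes, purely for illustration, that both networks have identical layer names and shapes and that the weights are already quantized to integers. `encode_residual` and `decode_residual` are hypothetical names, and the entropy-coding step (for example NNC) is omitted.

```python
from typing import Dict, List

Weights = Dict[str, List[int]]


def encode_residual(target: Weights, predictor: Weights) -> Weights:
    """Subtract the predictor network from the target network, layer by layer."""
    return {name: [t - p for t, p in zip(target[name], predictor[name])]
            for name in target}


def decode_residual(residual: Weights, predictor: Weights) -> Weights:
    """Add the predictor back to the residual to restore the target network."""
    return {name: [r + p for r, p in zip(residual[name], predictor[name])]
            for name in residual}


# Hypothetical quantized weights; the first network serves as the predicted value.
first_network = {"layer0": [12, -7, 3], "layer1": [40, 41]}
second_network = {"layer0": [13, -7, 2], "layer1": [40, 43]}
residual = encode_residual(second_network, first_network)
restored = decode_residual(residual, first_network)
```

When the two networks are highly correlated, the residual values are small and mostly zero, which is what makes them cheaper to entropy-code than the raw weights. Swapping the two arguments gives the reverse direction described in 【0595】.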
【0597】 Furthermore, the encoding method described using Figures 70 and 72 may be applied as the encoding scheme for the latent code. This can reduce the amount of code in the latent code. 【0598】 Furthermore, the encoded results of latent codes may be added to the encoded data of the trained network. For example, the encoded result of the first latent code may be added to the encoded data of the trained first network, and the encoded result of the second latent code may be added to the encoded data of the trained second network. This makes it easier to link trained networks with their corresponding latent codes and to manage them. 【0599】 Furthermore, the first latent code and the second latent code may be encoded sequentially using a common latent code encoding unit. This eliminates the need for multiple latent code encoding units, thereby reducing memory size and circuit complexity. 【0600】Furthermore, the first learned latent code may be set as the initial latent code for the second latent code. This can shorten the training time for the second latent code. 【0601】 The encoding device 1540 may learn and generate a common latent code by simultaneously learning the first network, the second network, and one latent code. This eliminates the need to encode multiple latent codes, thereby reducing the amount of code. 【0602】 Furthermore, the initial latent codes for the first and second latent codes may each have different initial values. This allows for efficient learning of each latent code by setting appropriate initial latent codes for each. 【0603】 The latent code encoding unit 1549 may also generate a predicted value for the second latent code from the first latent code and encode the value obtained by subtracting that predicted value from the second latent code (prediction residual). This reduces the code size of the second latent code. For example, the first latent code may be used as the predicted value. 
In this case, the value obtained by subtracting the value of the first latent code from the second latent code is encoded as the prediction residual of the second latent code. This reduces the code size when there is a high correlation between the first latent code and the second latent code. 【0604】 The latent code encoding unit 1549 may also generate a predicted value for the first latent code from the second latent code and encode the value obtained by subtracting that predicted value from the first latent code (prediction residual). This reduces the code size of the first latent code. For example, the second latent code may be used as the predicted value. In this case, the value obtained by subtracting the value of the second latent code from the first latent code is encoded as the prediction residual of the first latent code. This reduces the code size when there is a high correlation between the first latent code and the second latent code. 【0605】 Next, a decoding device 1560 for decoding multiple networks will be described. Figure 77 is a block diagram showing an example of the configuration of a decoding device for decoding multiple networks in Embodiment 3. 【0606】 The decoding device 1560 comprises a bitstream data splitting unit 1561, a three-dimensional data generation model decoding unit 1562, a reconstruction unit 1566, and a latent code decoding unit 1571. 【0607】 The bitstream data splitting unit 1561 splits the input bitstream into encoded data for the first network, the second network, and metadata. 【0608】 Next, the specific configuration of the three-dimensional data generation model decoding unit 1562 will be described. The three-dimensional data generation model decoding unit 1562 includes a first network decoding unit 1563, a second network decoding unit 1564, and a metadata decoding unit 1565. 【0609】 The first network decoding unit 1563 decodes the trained first network based on the encoded data of the first network. 
The first network decoding unit 1563 outputs the decoded, trained first network. 【0610】 The second network decoding unit 1564 decodes the trained second network based on the encoded data of the second network. The second network decoding unit 1564 outputs the decoded, trained second network. 【0611】 The metadata decoding unit 1565 decodes the metadata based on the encoded data of the metadata. The metadata decoding unit 1565 outputs the decoded metadata. 【0612】 Next, the specific configuration of the latent code decoding unit 1571 will be described. The latent code decoding unit 1571 includes a first latent code decoding unit 1572 and a second latent code decoding unit 1573. 【0613】 The first latent code decoding unit 1572 decodes and outputs the first latent code. 【0614】 The second latent code decoding unit 1573 decodes and outputs the second latent code. 【0615】 Furthermore, the decoding method described using Figures 71 and 73 may be applied as the latent code decoding method. This allows proper decoding of a bitstream in which the code amount of the latent code has been reduced. 【0616】 Next, the specific configuration of the reconstruction unit 1566 will be described. The reconstruction unit 1566 includes a density estimation unit 1567, a sampling point determination unit 1568, an attribute information estimation unit 1569, and a rendering unit 1570. 【0617】 The density estimation unit 1567 uses the trained first network, the first sampling point, and the first latent code related to time t to estimate density information for the first sampling point at time t, and outputs the estimated density information. 【0618】 The sampling point determination unit 1568 determines the second sampling point based on the density information. The sampling point determination unit 1568 determines the second sampling point in the same way as the encoding device 1520 by using the parameters included in the metadata. 
The sampling point determination unit 1568 outputs the determined second sampling point. 【0619】 The attribute information estimation unit 1569 uses the trained second network, the second sampling point, and the second latent code related to time t to estimate density information and color information corresponding to the second sampling point at time t. The attribute information estimation unit 1569 outputs the estimated density information and color information. If the attribute information estimation unit 1569 is input with a sampling point for an arbitrary viewpoint and a second latent code related to time t, it may estimate density information and color information corresponding to the input sampling point for an arbitrary viewpoint at time t, and output the estimated density information and color information. 【0620】 The rendering unit 1570 performs rendering processing based on the density information and color information for each of the second sampling points, generates a two-dimensional image for each viewpoint information, and outputs the generated two-dimensional image for each viewpoint information. 【0621】The reconstruction unit 1566 may output the second sampling point and the attribute information (density information and color information) corresponding to the second sampling point, which has been estimated by the attribute information estimation unit 1569, as is. 【0622】 As shown in this embodiment, when a three-dimensional data generation model consists of two or more networks, the latent code may be stored for each network. This allows each network to generate an appropriate latent code by using its respective latent code during training, and the accuracy of the results output by each network can be improved by using that latent code during generation. 【0623】 [Encoding Device] Figure 78 is a diagram showing an example of the configuration of the encoding device in Embodiment 3. 
Figure 79 is a flowchart showing a first example of the encoding method by the encoding device in Embodiment 3. 【0624】 The encoding device 1580 comprises a circuit 1581 and a memory 1582 connected to the circuit 1581. The encoding device 1580 is a device that implements the encoding devices 1510, 1520, and 1540. 【0625】 Circuit 1581 performs the following operations. 【0626】 Circuit 1581 acquires multiple three-dimensional data generation models (S1501). Circuit 1581 encodes the multiple three-dimensional data generation models to generate a bitstream (S1502). The multiple three-dimensional data generation models include a first three-dimensional data generation model corresponding to the first time step and a second three-dimensional data generation model corresponding to the second time step. 【0627】 Here, each of the multiple three-dimensional data generation models may, upon receiving viewpoint information including viewpoint and line of sight direction, and a latent code as input, output a two-dimensional image of the subject as viewed from the viewpoint and line of sight direction at the time for which the latent code is set. The bitstream contains the latent code. The latent code is set for the time corresponding to each of the multiple three-dimensional data generation models. 【0628】 According to this method, because the multiple three-dimensional data generation models are encoded into a bitstream that also carries the latent codes, the data size of the generated bitstream can be reduced. 【0629】 For example, a latent code is a value set for a two-dimensional image at a specific time, and it indicates the characteristics of the two-dimensional image at that time. 【0630】 Latent codes are used, for example, to compress and represent changes in scene movement and appearance. By using automatically learned latent information as code instead of time, the movement of geometry or textures can be recorded more expressively. 
This latent code allows for smoother representation of temporal changes, which can be used to generate visual effects. 【0631】 For example, a latent code is a multidimensional vector with n-dimensional values (where n is an integer greater than or equal to 2). 【0632】 In this way, by representing latent codes with multidimensional vectors, the characteristics of each time point can be expressed in more detail. 【0633】 For example, in acquiring multiple three-dimensional data generation models (S1501), circuit 1581 learns using latent codes set for the time corresponding to each of the multiple three-dimensional data generation models. 【0634】 According to this approach, automatically learned latent information is used as the code instead of time, allowing for more expressive recording of, for example, the movement of geometry or textures. Furthermore, because images with similar subject states can be learned jointly even when they belong to different times, the amount of data in the generated three-dimensional data generation model can be reduced. 【0635】 For example, the initial values of the latent codes used for learning are set to 0 or random values. This reduces the processing needed to calculate the initial values. 【0636】 For example, the initial value of a latent code used for training is set to the value of a latent code corresponding to a different time point than the time point to which the latent code corresponds. This reduces the processing needed to calculate the initial value. 【0637】 For example, the initial values of the latent codes used for training are set to the values of the trained latent codes. This allows for the high-precision generation of extended three-dimensional data generation models or latent codes in scenes with little movement. 【0638】 For example, circuit 1581 further predictively encodes the latent code. The latent code contained in the bitstream is the predictively encoded latent code. This improves the encoding efficiency of the latent code. 
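The options for initializing latent codes before training listed in 【0635】 to 【0637】 can be summarized in a short Python sketch. The function `initial_latent_code`, its `strategy` strings, and the choice of a Gaussian for the random case are hypothetical; the source does not prescribe an interface or a distribution.

```python
import random
from typing import List, Optional


def initial_latent_code(dim: int, strategy: str,
                        reference: Optional[List[float]] = None,
                        seed: Optional[int] = None) -> List[float]:
    """Build an initial latent code of length dim.

    "zero"   : all zeros.
    "random" : independent Gaussian draws (distribution is an assumption).
    "copy"   : reuse a reference code, e.g. the latent code of another time
               point or an already trained latent code.
    """
    if strategy == "zero":
        return [0.0] * dim
    if strategy == "random":
        rng = random.Random(seed)
        return [rng.gauss(0.0, 1.0) for _ in range(dim)]
    if strategy == "copy":
        if reference is None or len(reference) != dim:
            raise ValueError("copy strategy needs a reference of length dim")
        return list(reference)
    raise ValueError(f"unknown strategy: {strategy}")
```

The "zero" and "random" strategies need no prior computation, while "copy" starts from a code that is likely already close to the target when the scene changes little between time points.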
【0639】 For example, in predictive coding of a latent code, circuit 1581 encodes the residual between the latent code and the predicted value. The predictively encoded latent code is the encoded residual. Because only the residual is encoded, the code amount of the latent code can be reduced. 【0640】 For example, circuit 1581 switches between intra-predictive coding and inter-predictive coding when performing predictive coding of latent codes. This allows the coding method to be changed according to the purpose. 【0641】 For example, each of the multiple three-dimensional data generation models is composed of multiple networks, including a first network and a second network different from the first network. Circuit 1581 (i) in acquiring the multiple three-dimensional data generation models (S1501), inputs a first initial value of the latent code into the first network to obtain the first latent code, and inputs a second initial value of the latent code into the second network to obtain the second latent code, and (ii) in generating the bitstream (S1502), generates a bitstream including the first latent code and the second latent code. 【0642】 This allows for the output of latent codes for each network, enabling the setting of latent codes tailored to each network. Therefore, it may be possible to further reduce the data size of the bitstream generated by encoding multiple three-dimensional data generation models. 【0643】 For example, the first initial value and the second initial value are different values from each other. 【0644】 This allows for efficient learning of each latent code by setting appropriate initial latent codes for each. 【0645】 For example, circuit 1581 further encodes the first residual between the second latent code and the first predicted value based on the first latent code. The latent code included in the bitstream is the encoded first residual. 【0646】 This reduces the code amount of the second latent code. 
For example, the first latent code may be used as the first predicted value. In this case, the value obtained by subtracting the first latent code from the second latent code is encoded as the prediction residual of the second latent code. This reduces the code amount when the first and second latent codes are highly correlated. 【0647】 For example, circuit 1581 further encodes a second residual between the first latent code and a second predicted value based on the second latent code. The latent code included in the bitstream is the encoded second residual. 【0648】 This reduces the code amount of the first latent code. For example, the second latent code may be used as the second predicted value. In this case, the value obtained by subtracting the second latent code from the first latent code is encoded as the prediction residual of the first latent code. This reduces the code amount when the first and second latent codes are highly correlated. 【0649】 Figure 80 is a flowchart showing a second example of the encoding method performed by the encoding device in Embodiment 3. 【0650】 Circuit 1581 may perform the following operations. 【0651】 Circuit 1581 acquires a three-dimensional data generation model that includes a first network and a second network different from the first network (S1511). Circuit 1581 encodes the three-dimensional data generation model to generate a bitstream (S1512). In encoding the three-dimensional data generation model, circuit 1581 encodes a first residual between the second network and a first predicted value based on the first network. 【0652】 According to this, because the first residual obtained using the first predicted value based on the first network is encoded, the code amount of the second network of the three-dimensional data generation model can be reduced. 
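As a non-limiting illustration of the residual encoding between the first and second latent codes described in 【0645】 and 【0646】, the following Python sketch uses the first latent code itself as the first predicted value and encodes only the difference. It assumes that the latent codes have already been quantized to integers; the function names are hypothetical, and the entropy coding that would follow is omitted.

```python
from typing import List


def encode_residual(target: List[int], predicted: List[int]) -> List[int]:
    """Return the prediction residual (target minus predicted value)."""
    if len(target) != len(predicted):
        raise ValueError("target and predicted value must have equal length")
    return [t - p for t, p in zip(target, predicted)]


def decode_residual(residual: List[int], predicted: List[int]) -> List[int]:
    """Restore the target by adding the predicted value to the residual."""
    if len(residual) != len(predicted):
        raise ValueError("residual and predicted value must have equal length")
    return [r + p for r, p in zip(residual, predicted)]


# Quantized latent codes of the first and second networks (made-up values).
first_latent = [12, -7, 30, 5]
second_latent = [13, -7, 28, 6]

# Encoder side: the first latent code serves as the first predicted value.
first_residual = encode_residual(second_latent, first_latent)

# Decoder side: the second latent code is restored from the residual.
restored_second = decode_residual(first_residual, first_latent)
```

When the two latent codes are highly correlated, the residual values are small in magnitude, which is what allows a subsequent entropy coder to spend fewer bits on them. Swapping the roles of the two codes gives the second residual of 【0647】.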
【0653】 For example, in encoding the three-dimensional data generation model, circuit 1581 further encodes a second residual between the first network and a second predicted value based on the second network. 【0654】 According to this, because the second residual obtained using the second predicted value based on the second network is encoded, the code amount of the first network of the three-dimensional data generation model can be reduced. 【0655】 [Decoding Device] Figure 81 is a diagram showing an example of the configuration of the decoding device in Embodiment 3. Figure 82 is a flowchart showing a first example of the decoding method performed by the decoding device in Embodiment 3. 【0656】 The decoding device 1590 comprises a circuit 1591 and a memory 1592 connected to the circuit 1591. The decoding device 1590 is a device that implements the decoding devices 1515, 1530, and 1560. 【0657】 Circuit 1591 performs the following operations. 【0658】 Circuit 1591 acquires a bitstream (S1521). Circuit 1591 decodes multiple three-dimensional data generation models and latent codes from the bitstream (S1522). The multiple three-dimensional data generation models include a first three-dimensional data generation model corresponding to the first time step and a second three-dimensional data generation model corresponding to the second time step. 【0659】 Here, each of the multiple three-dimensional data generation models may, upon receiving a latent code and viewpoint information including a viewpoint and a line-of-sight direction, output a two-dimensional image of the subject as viewed from that viewpoint and line-of-sight direction at the time for which the latent code is set. The latent code is set for the time corresponding to each of the multiple three-dimensional data generation m...
Claims
1. An encoding method comprising: obtaining encoded data of a three-dimensional generative model generated by learning about a three-dimensional space, and one or more pieces of viewpoint information corresponding to one or more images used for learning to generate the three-dimensional generative model; generating metadata indicating the one or more pieces of viewpoint information; and generating a bitstream including the encoded data of the three-dimensional generative model and the metadata.
2. The encoding method according to claim 1, wherein the one or more pieces of viewpoint information include a plurality of pieces of viewpoint information respectively corresponding to a plurality of images used for learning to generate the three-dimensional generative model.
3. The encoding method according to claim 1, wherein the three-dimensional generative model is generated by learning using one or more images obtained from one or more viewpoints indicated by the one or more pieces of viewpoint information.
4. The encoding method according to any one of claims 1 to 3, wherein generating the bitstream includes generating a first bitstream containing the encoded data of the three-dimensional generative model and the metadata, and a second bitstream that is different from the first bitstream and contains the one or more images.
5. The encoding method according to any one of claims 1 to 3, wherein generating the bitstream includes generating a first bitstream containing the encoded data of the three-dimensional generative model, and a second bitstream that is different from the first bitstream and contains the metadata and the one or more images.
6. The encoding method according to any one of claims 1 to 3, wherein each of the one or more pieces of viewpoint information includes a type, and the type takes either a first value indicating that the piece of viewpoint information to which the type corresponds is viewpoint information recommended by the user, or a second value indicating that the piece of viewpoint information to which the type corresponds is viewpoint information corresponding to an image used for training to generate the three-dimensional generative model.
7. The encoding method according to any one of claims 1 to 3, wherein the metadata includes confidence information indicating whether each of the multiple images was used for training to generate the three-dimensional generative model.
8. The encoding method according to any one of claims 1 to 3, wherein the metadata includes confidence information indicating, for each of the plurality of images, its reliability in learning for generating the three-dimensional generative model.
9. The encoding method according to claim 7, wherein the metadata includes identification information indicating whether or not the confidence information exists, and when the type included in one of the one or more pieces of viewpoint information indicates that the piece of viewpoint information corresponds to an image used for training to generate the three-dimensional generative model, the identification information indicates that the confidence information exists.
10. The encoding method according to any one of claims 1 to 3, further comprising obtaining encoded data of an additional three-dimensional generative model corresponding to the timing following that of the three-dimensional generative model, and update information indicating whether or not additional one or more pieces of viewpoint information corresponding to the additional three-dimensional generative model have been updated from the one or more pieces of viewpoint information, wherein the metadata includes the update information, and the bitstream includes additional metadata indicating the additional one or more pieces of viewpoint information if the update information indicates an update.
11. The encoding method according to any one of claims 1 to 3, wherein the metadata includes sequence information indicating whether the one or more pieces of viewpoint information are fixed or change within a sequence unit, and if the sequence information indicates that the one or more pieces of viewpoint information are fixed within the sequence unit, the metadata does not include additional one or more pieces of viewpoint information in the sequence unit, and if the sequence information indicates that the one or more pieces of viewpoint information change within the sequence unit, the metadata includes additional one or more pieces of viewpoint information in the sequence unit.
12. A decoding method comprising: obtaining a bitstream that includes encoded data of a three-dimensional generative model generated by learning about a three-dimensional space and metadata indicating one or more pieces of viewpoint information corresponding to one or more images used for learning to generate the three-dimensional generative model; and decoding the encoded data of the three-dimensional generative model and the metadata from the bitstream to obtain the three-dimensional generative model and the one or more pieces of viewpoint information.
13. The decoding method according to claim 12, wherein the one or more pieces of viewpoint information include a plurality of pieces of viewpoint information respectively corresponding to a plurality of images used for learning to generate the three-dimensional generative model.
14. The decoding method according to claim 12, wherein the three-dimensional generative model is generated by learning using one or more images obtained from one or more viewpoints indicated by the one or more pieces of viewpoint information.
15. The decoding method according to any one of claims 12 to 14, wherein obtaining the bitstream includes obtaining a first bitstream containing the encoded data of the three-dimensional generative model and the metadata, and a second bitstream that is different from the first bitstream and contains the one or more images.
16. The decoding method according to any one of claims 12 to 14, wherein obtaining the bitstream includes obtaining a first bitstream containing the encoded data of the three-dimensional generative model, and a second bitstream that is different from the first bitstream and contains the metadata and the one or more images.
17. The decoding method according to any one of claims 12 to 14, wherein each of the one or more pieces of viewpoint information includes a type, and the type takes either a first value indicating that the piece of viewpoint information to which the type corresponds is viewpoint information recommended by the user, or a second value indicating that the piece of viewpoint information to which the type corresponds is viewpoint information corresponding to an image used for training to generate the three-dimensional generative model.
18. The decoding method according to any one of claims 12 to 14, wherein the metadata includes confidence information indicating whether each of the multiple images was used for training to generate the three-dimensional generative model.
19. The decoding method according to any one of claims 12 to 14, wherein the metadata includes confidence information indicating, for each of the plurality of images, its reliability in learning for generating the three-dimensional generative model.
20. The decoding method according to claim 18, wherein the metadata includes identification information indicating whether or not the confidence information exists, and when the type included in one of the one or more pieces of viewpoint information indicates that the piece of viewpoint information corresponds to an image used for learning to generate the three-dimensional generative model, the identification information indicates that the confidence information exists.
21. The decoding method according to any one of claims 12 to 14, wherein the bitstream further includes encoded data of an additional three-dimensional generative model corresponding to the timing following that of the three-dimensional generative model, and update information indicating whether additional one or more pieces of viewpoint information corresponding to the additional three-dimensional generative model have been updated from the one or more pieces of viewpoint information, the metadata includes the update information, and the bitstream includes additional metadata indicating the additional one or more pieces of viewpoint information if the update information indicates an update.
22. The decoding method according to any one of claims 12 to 14, wherein the metadata includes sequence information indicating whether the one or more pieces of viewpoint information are fixed or change within a sequence unit, and if the sequence information indicates that the one or more pieces of viewpoint information are fixed within the sequence unit, the metadata does not include additional one or more pieces of viewpoint information in the sequence unit, and if the sequence information indicates that the one or more pieces of viewpoint information change within the sequence unit, the metadata includes additional one or more pieces of viewpoint information in the sequence unit.
23. An encoding device comprising a circuit and a memory connected to the circuit, wherein the circuit, in operation, acquires encoded data of a three-dimensional generative model generated by learning about a three-dimensional space and one or more pieces of viewpoint information corresponding to one or more images used for learning to generate the three-dimensional generative model, generates metadata indicating the one or more pieces of viewpoint information, and generates a bitstream including the encoded data of the three-dimensional generative model and the metadata.
24. A decoding device comprising a circuit and a memory connected to the circuit, wherein the circuit, in operation, acquires a bitstream including encoded data of a three-dimensional generative model generated by learning about a three-dimensional space and metadata indicating one or more pieces of viewpoint information corresponding to one or more images used for learning to generate the three-dimensional generative model, and decodes the encoded data of the three-dimensional generative model and the metadata from the bitstream to acquire the three-dimensional generative model and the one or more pieces of viewpoint information.
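For reference only, and not as part of the claims, the following Python sketch illustrates one hypothetical way in which a bitstream containing both the encoded data of a three-dimensional generative model and metadata indicating viewpoint information could be generated and then decoded. The byte layout (a 4-byte length field, JSON metadata, then the model data), the field names, and the numeric type values are all assumptions made for illustration; the actual bitstream syntax is not defined by this sketch.

```python
import json
import struct
from dataclasses import asdict, dataclass
from typing import List, Tuple

# Hypothetical values for the "type" of a piece of viewpoint information.
TYPE_RECOMMENDED = 0  # viewpoint recommended by the user
TYPE_TRAINING = 1     # viewpoint of an image used for training


@dataclass
class ViewpointInfo:
    position: Tuple[float, float, float]   # viewpoint
    direction: Tuple[float, float, float]  # line-of-sight direction
    type: int


def build_bitstream(model_data: bytes, viewpoints: List[ViewpointInfo]) -> bytes:
    """Pack metadata and already encoded model data into one bitstream."""
    metadata = json.dumps({"viewpoints": [asdict(v) for v in viewpoints]})
    payload = metadata.encode("utf-8")
    # Assumed layout: 4-byte big-endian metadata length, metadata, model data.
    return struct.pack(">I", len(payload)) + payload + model_data


def parse_bitstream(bitstream: bytes) -> Tuple[bytes, List[ViewpointInfo]]:
    """Recover the encoded model data and the viewpoint information."""
    (meta_len,) = struct.unpack(">I", bitstream[:4])
    metadata = json.loads(bitstream[4:4 + meta_len].decode("utf-8"))
    viewpoints = [
        ViewpointInfo(tuple(v["position"]), tuple(v["direction"]), v["type"])
        for v in metadata["viewpoints"]
    ]
    return bitstream[4 + meta_len:], viewpoints


views = [ViewpointInfo((0.0, 1.5, -2.0), (0.0, 0.0, 1.0), TYPE_TRAINING)]
stream = build_bitstream(b"\x01\x02\x03", views)
model_data, decoded_views = parse_bitstream(stream)
```

Placing the metadata ahead of the model data lets a decoder read the viewpoint information without first processing the model data, although other layouts, including carrying the metadata in a separate bitstream, are equally possible.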