Synthetic data generation system and application for synthesizing high-resolution 3D shapes from low-resolution representations
By using a deep 3D conditional generation model and deformable tetrahedral grid, combined with implicit and explicit representations, surface reconstruction is optimized, solving the problems of artifacts and wasted computational resources in high-quality 3D shape generation in existing technologies, and achieving efficient high-resolution 3D shape generation.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- NVIDIA CORP
- Filing Date
- 2022-04-11
- Publication Date
- 2026-06-23
AI Technical Summary
Existing 3D shape synthesis methods suffer from artifacts and high computational costs when generating high-quality geometric details. Especially in interactive, near real-time, or real-time applications, existing methods cannot effectively combine implicit and explicit representations, leading to unstable topological structures or wasted computational resources.
A deep 3D conditional generative model is adopted, and surface reconstruction is optimized by combining implicit and explicit representations through a deformable tetrahedral grid and the marching tetrahedra algorithm. Differentiable surface subdivision and loss function optimization are used to generate high-resolution 3D shapes.
It achieves high-quality 3D shape generation, can handle arbitrary topological structures, reduces the waste of computing and storage resources, and improves generation efficiency and accuracy.
Figure CN115699094B_ABST
Description
Background Technology
[0001] Fields such as simulation, architecture, gaming, and film rely on high-quality 3D content with rich geometric details and topological structures. However, creating high-quality 3D shapes suitable for such applications requires significant development time, computation, and memory—typically per individual shape. In contrast, creating coarse 3D shapes—for example, using voxels, blocks, sparse point clouds, etc.—consumes less time, computation, and memory and has been widely adopted by all types of users, including those who may not have 3D modeling expertise.
[0002] A robust 3D representation is a key component of a learning-based 3D content creation framework. For example, a good 3D representation for high-quality reconstruction and synthesis should be able to capture local geometric details and represent objects with arbitrary topologies, while also being memory and computationally efficient for fast inference in interactive, near-real-time, and/or real-time applications. To achieve this, previous methods have employed neural implicit representations, which use neural networks to represent the signed distance field (SDF) and/or occupancy field (OF) of shapes. However, most existing implicit methods are trained by regressing to the SDF or occupancy values and cannot utilize explicit supervision on the underlying surface (which would allow for useful constraints beneficial to training), resulting in artifacts when synthesizing fine details. To mitigate this problem, some existing methods use isosurfacing techniques (such as the marching cubes (MC) algorithm) to extract surface meshes from the implicit representation, a computationally expensive approach that depends heavily on the grid resolution used in MC. Running isosurfacing at a finite resolution introduces quantization errors into the geometry and topology of the surface. Therefore, existing implicit methods either use implicit representations that result in lower-quality shape synthesis, or use a combination of implicit representations and explicit isosurfacing techniques that is computationally expensive and depends on the grid resolution, making these methods unsuitable for high-quality shape synthesis in interactive, near-real-time, or real-time applications.
[0003] Some previous methods included voxel-based approaches, which represent 3D shapes as voxels, storing coarse occupancy (inner / outer) values on a regular grid. For high-resolution shape synthesis, generative adversarial networks have been used to transfer geometric details from high-resolution voxel shapes to low-resolution shapes by using discriminators defined on 3D patches of the voxel grid. However, as resolution increases, computational and memory costs increase cubically, prohibiting the reconstruction of fine geometric details and smooth curves.
[0004] Other previous methods used surface-based approaches to directly predict triangular meshes. Typically, surface-based methods assume a predefined topology for the shape and may lose accuracy for objects with complex topological variations. Furthermore, similar to voxel-based methods, computational costs increase cubically with grid resolution. Additionally, meshes generated by previous methods may contain errors such as non-manifold vertices and edges resulting from self-intersections of mesh faces.
Summary of the Invention
[0005] Embodiments of this disclosure relate to high-resolution shape synthesis for deep learning systems and applications. The disclosed systems and methods utilize a deep 3D conditional generative model to generate high-resolution 3D shapes from lower-resolution 3D guides (e.g., coarse voxels, sparse point clouds, scans, etc.). A differentiable shape representation can be generated that combines implicit and explicit 3D representations and, in contrast to previous methods that optimize predicted SDF or occupancy values, directly optimizes the reconstructed surface of the 3D shape to produce higher-quality shapes with finer geometric details. For example, compared to methods that generate explicit representations (such as meshes), the systems and methods of this disclosure produce shapes with arbitrary topologies. Specifically, and without limitation, the underlying 2-manifold, parameterized by an implicit function encoded with a deformable tetrahedral grid, can be predicted, and the underlying 2-manifold can be converted into an explicit mesh using the marching tetrahedra (MT) algorithm. The MT algorithm can be differentiable and can offer better performance than previous MC methods. The system can learn to adapt the grid resolution, maintaining efficiency through deformation and selective subdivision of tetrahedrons, for example, by focusing computation only on relevant regions of space. Compared to octree-based shape synthesis, the networks of this disclosure jointly learn grid deformation and subdivision to better represent surfaces without relying on explicit supervision from pre-computed hierarchies. The deep 3D conditional generative model can be end-to-end differentiable, allowing the network to jointly optimize the geometry and topology of the surface, as well as the hierarchy of subdivisions, using a loss function explicitly defined on the surface mesh. Furthermore, previous methods claimed that a singularity in the MC formulation prevents topology changes during training, a claim that this system and method disprove. For example, the 3D representation of this system and method scales to high resolution without requiring additional modifications to the backward pass. Moreover, the deep 3D conditional generative model is able to represent arbitrary topologies and directly optimizes the surface reconstruction to alleviate these problems.
Brief Description of the Drawings
[0006] The present systems and methods for high-resolution shape synthesis for deep learning systems and applications are described in detail below with reference to the accompanying drawings, wherein:
[0007] Figure 1 is a data flow diagram illustrating a process for synthesizing and reconstructing three-dimensional (3D) shapes according to some embodiments of the present disclosure;
[0008] Figure 2A shows an example of volume subdivision of a tetrahedron according to some embodiments of the present disclosure;
[0009] Figure 2B shows visual examples of surface estimation with and without volumetric subdivision according to some embodiments of the present disclosure;
[0010] Figure 3 shows examples of identifying vertex locations of isosurfaces according to some embodiments of the present disclosure;
[0011] Figures 4A-4B show graphs illustrating computational and memory resource requirements with and without selective volume subdivision according to some embodiments of the present disclosure;
[0012] Figure 5 is a flowchart illustrating a method for high-resolution shape synthesis according to some embodiments of the present disclosure;
[0013] Figure 6 is a block diagram of a computing device suitable for implementing some embodiments of the present disclosure; and
[0014] Figure 7 is a block diagram of an example data center suitable for implementing some embodiments of the present disclosure.
Detailed Description
[0015] Systems and methods for high-resolution shape synthesis for deep learning systems and applications are disclosed. The systems and methods described herein can be used for a variety of purposes, by way of example and without limitation, for machine control, machine motion, machine driving, synthetic data generation, model training, perception, augmented reality, virtual reality, mixed reality, robotics, security and surveillance, autonomous or semi-autonomous machine applications, deep learning, environmental simulation, data center processing, conversational artificial intelligence, light transport simulation (e.g., ray tracing, path tracing, etc.), collaborative content creation of 3D assets, cloud computing, and/or any other suitable applications.
[0016] The disclosed embodiments can be included in a variety of different systems, such as automotive systems (e.g., control systems for autonomous or semi-autonomous machines, perception systems for autonomous or semi-autonomous machines), systems implemented using robots, aviation systems, medical systems, boating systems, intelligent area monitoring systems, systems performing deep learning operations, systems performing simulation operations, systems implemented using edge devices, systems incorporating one or more virtual machines (VMs), systems for performing synthetic data generation operations, systems implemented at least partially in a data center, systems for performing conversational artificial intelligence operations, systems for performing light transport simulations, systems for performing collaborative content creation of 3D assets, systems implemented at least partially using cloud computing resources, and/or other types of systems. Although this disclosure is described primarily with respect to the creation, synthesis, or reconstruction of 3D shapes or content, this is not intended to be limiting. The systems and methods disclosed herein can be used for the creation, synthesis, or reconstruction of two-dimensional (2D) shapes or content without departing from the scope of this disclosure.
[0017] Referring to Figure 1, Figure 1 is a data flow diagram illustrating a process 100 for 3D shape synthesis and reconstruction according to some embodiments of this disclosure. It should be understood that this and other arrangements described herein are presented merely as examples. Other arrangements and elements (e.g., machines, interfaces, functions, sequences, groupings of functions, etc.) may be used to supplement or replace those shown, and some elements may be omitted entirely. Furthermore, many of the elements described herein are functional entities that may be implemented as discrete or distributed components, or together with other components, in any suitable combination and location. The various functions performed by the entities described herein may be performed by hardware, firmware, and/or software. For example, various functions may be performed by a processor executing instructions stored in memory. In some embodiments, one or more of the components, features, and/or functions may be similar to those of example computing device 600 of Figure 6 and/or example data center 700 of Figure 7.
[0018] Process 100 can be used to synthesize or reconstruct high-quality 3D shapes and objects. To generate a 3D shape, input data representing one or more inputs 102 can be received and/or generated. Input 102 may include point clouds (e.g., sparse point clouds in embodiments), voxelized shapes (e.g., coarse voxelized shapes), scans (e.g., 3D scans), and/or another type of input 102 (e.g., of lower quality). This input can be processed using one or more machine learning models, such as, but not limited to, the deep 3D conditional generative model for high-resolution shape synthesis represented by (A)-(E) in Figure 1. For example, input 102 can be processed using this model to: (A) predict the signed distance field (SDF) at an initial grid resolution; (B) selectively subdivide tetrahedrons of the grid and interpolate the updated SDF of the subdivided grid; (C) refine the boundary SDF and deform and trim the graph; (D) perform marching tetrahedra on the interpolated SDF to generate a triangular mesh; and (E) convert the triangular mesh into a parametric surface using differentiable surface subdivision. For example, operations (A)-(C) can be performed to generate an implicit function 104, operation (D) can be performed to generate an explicit surface 106, and surface subdivision can be performed to generate one or more outputs 108 (e.g., high-quality 3D shapes or objects).
[0019] The model of process 100 can use a hybrid 3D representation designed for high-resolution reconstruction and synthesis. The 3D representation can be an SDF encoded with a deformable tetrahedral grid. The grid can fully tetrahedralize a unit cube, where each cell in the volume is a tetrahedron with four vertices and four faces. The advantage of this representation is that the grid vertices can be deformed to represent the geometry of the shape more efficiently. Furthermore, in embodiments, a signed distance value can be defined on the vertices of the grid to implicitly represent the underlying surface, rather than encoding occupancy on each tetrahedron as in previous methods. Using signed distance values instead of occupancy provides greater flexibility in representing the underlying surface. The deformable tetrahedral grid can be used as an approximation of an implicit function. The deformable tetrahedral grid can be denoted $(V_T, T)$, where $V_T$ are the vertices of the tetrahedral grid $T$. Each tetrahedron $T_k \in T$ can be represented by four vertices, where $k \in \{1, \dots, K\}$ and $K$ is the total number of tetrahedrons. The SDF can be represented by interpolating the SDF values defined on the grid vertices. For example, the SDF value at a vertex $v_i \in V_T$ can be denoted $s(v_i)$, and the SDF value of a point located inside a tetrahedron can be interpolated from the SDF values of the four vertices enclosing that point.
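As an illustration of interpolating the SDF inside a tetrahedron from its four vertex values, the following is a minimal sketch that assumes barycentric weights are used for the interpolation (the text above does not name the interpolation scheme). The function names `barycentric` and `interpolate_sdf` are hypothetical.

```python
def det3(a, b, c):
    """Determinant of the 3x3 matrix whose rows are a, b and c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


def sub(p, q):
    """Component-wise difference p - q of two 3D points."""
    return tuple(pi - qi for pi, qi in zip(p, q))


def barycentric(p, tet):
    """Barycentric weights of point p for the four vertices of tet.

    Assumption: the tetrahedron is non-degenerate (non-zero volume).
    """
    a, b, c, d = tet
    vol = det3(sub(b, a), sub(c, a), sub(d, a))  # six times the signed volume
    # Cramer's rule for p - a = wb*(b - a) + wc*(c - a) + wd*(d - a)
    wb = det3(sub(p, a), sub(c, a), sub(d, a)) / vol
    wc = det3(sub(b, a), sub(p, a), sub(d, a)) / vol
    wd = det3(sub(b, a), sub(c, a), sub(p, a)) / vol
    return (1.0 - wb - wc - wd, wb, wc, wd)


def interpolate_sdf(p, tet, sdf4):
    """SDF at p as the barycentric blend of the four vertex SDF values."""
    return sum(w * s for w, s in zip(barycentric(p, tet), sdf4))
```

At a grid vertex the blend returns that vertex's own SDF value, and at the centroid it returns the mean of the four values.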
[0020] To further enhance flexibility while maintaining control over memory and computation, the tetrahedrons surrounding the predicted surface can be subdivided, for example, using selective subdivision. In this way, the shape can be represented in a coarse-to-fine manner to improve efficiency. Surface tetrahedrons, $T_{surf}$, can be identified by checking whether a tetrahedron has vertices with different SDF signs (e.g., one positive and one negative), which indicates that the tetrahedron intersects the surface encoded by the SDF. These surface tetrahedrons, $T_{surf}$, can be subdivided and, in embodiments, the nearest neighbors of the surface tetrahedrons can also be subdivided. Resolution can be increased by adding a midpoint on each edge. As shown in Figure 2A, each surface tetrahedron, $T_{surf}$ 202, can be divided into eight tetrahedrons by adding midpoints 204 (e.g., 204A, 204B, 204C, 204D, and 204F) between the original vertices 206 (e.g., 206A (or $v_a$), 206B (or $v_b$), 206C (or $v_c$), and 206D (or $v_d$)). The SDF value of each new vertex can then be calculated as, for example, the average of the SDF values at the two ends of its edge (e.g., if the SDF values of the original vertices are -2 and +4, the SDF value of the midpoint, or new vertex, can be +1).
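The sign test and the midpoint subdivision described above can be sketched as follows. The connectivity used to form the eight child tetrahedrons (four corner cells plus four cells around the inner diagonal between the midpoints of edges ab and cd) is an assumption, since the text does not specify it; all function names are hypothetical.

```python
from itertools import combinations


def is_surface_tet(sdf4):
    """True when the four vertex SDF values do not all share the same sign."""
    signs = [s > 0 for s in sdf4]
    return any(signs) and not all(signs)


def tet_volume(verts):
    """Unsigned volume of a tetrahedron given as four (x, y, z) tuples."""
    a, b, c, d = verts
    u, v, w = ([q[i] - a[i] for i in range(3)] for q in (b, c, d))
    det = (u[0] * (v[1] * w[2] - v[2] * w[1])
           - u[1] * (v[0] * w[2] - v[2] * w[0])
           + u[2] * (v[0] * w[1] - v[1] * w[0]))
    return abs(det) / 6.0


def subdivide_tet(verts, sdf4):
    """Split one tetrahedron into eight using the six edge midpoints.

    Each midpoint receives the average SDF of the two edge endpoints.
    Returns a list of eight (child_vertices, child_sdf_values) pairs.
    """
    pts = {(i,): (verts[i], sdf4[i]) for i in range(4)}
    for i, j in combinations(range(4), 2):
        mid = tuple((p + q) / 2.0 for p, q in zip(verts[i], verts[j]))
        pts[(i, j)] = (mid, (sdf4[i] + sdf4[j]) / 2.0)
    a, b, c, d = (0,), (1,), (2,), (3,)
    ab, ac, ad, bc, bd, cd = (0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)
    children = [
        (a, ab, ac, ad), (b, ab, bc, bd), (c, ac, bc, cd), (d, ad, bd, cd),
        # inner octahedron, split along the diagonal ab-cd (assumed choice)
        (ab, cd, ac, bc), (ab, cd, bc, bd), (ab, cd, bd, ad), (ab, cd, ad, ac),
    ]
    return [([pts[k][0] for k in child], [pts[k][1] for k in child])
            for child in children]
```

With this split, the eight children exactly fill the parent, so their volumes sum to the parent's volume.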
[0021] Figure 2B shows the results of volumetric subdivision along the surface tetrahedrons compared to not using volumetric subdivision. For example, visualization 230 includes a portion 236 of the estimated surface along with the ground truth surface 238, where the portion 236 of the estimated surface does not capture the contour of the ground truth surface 238. Visualization 232 includes the portion 236 of the estimated surface after volumetric subdivision and before locally updating the vertex positions and SDF, while visualization 234 includes an updated portion 240 of the estimated surface after volumetric subdivision and after updating the vertex positions and SDF. The updated portion 240 of the estimated surface is closer to the contour of the ground truth surface 238, resulting in a more accurate implicit representation of the object.
[0022] Implicit representations based on signed distance (e.g., after subdivision) can be converted into triangular meshes using a marching tetrahedra layer, and these meshes can be converted into parametric surfaces using a differentiable surface subdivision module. For example, the encoded SDF can be converted into an explicit triangular mesh using the marching tetrahedra (MT) algorithm. Given the SDF values of the vertices of a tetrahedron, $\{s(v_a), s(v_b), s(v_c), s(v_d)\}$, the MT algorithm can determine the surface topology inside the tetrahedron based on the signs of $s(v)$, as shown in Figure 3. In such an example, the total number of configurations can be $2^4$, or 16, which fall into three distinct cases after considering rotational symmetry. Once the surface topology inside the tetrahedron is identified, the vertex positions of the isosurface can be computed at the zero crossings of the linear interpolation along the tetrahedron's edges, as shown in Figure 3. In one or more embodiments, the equation is evaluated only when $\mathrm{sign}(s(v_a)) \neq \mathrm{sign}(s(v_b))$; therefore, the singularity in the formula (e.g., when $s(v_a) = s(v_b)$) can be avoided, and the gradient from a loss defined on the extracted isosurface can be backpropagated to both the vertex positions and the SDF values, for example, via the chain rule.
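The zero crossing along an edge and the three topological cases can be illustrated with a short sketch. The closed form below is the standard solution of the linear interpolation $s(t) = (1 - t)\,s(v_a) + t\,s(v_b) = 0$; the function names are hypothetical, and treating a value of exactly zero as non-positive is an assumption.

```python
from itertools import product

# The six edges of a tetrahedron, as pairs of local vertex indices.
EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]


def zero_crossing(va, vb, sa, sb):
    """Point on edge (va, vb) where the linearly interpolated SDF is zero.

    Evaluated only for edges whose endpoint SDF signs differ, so the
    denominator sb - sa can never be zero.
    """
    if (sa > 0) == (sb > 0):
        raise ValueError("edge does not cross the isosurface")
    return tuple((a * sb - b * sa) / (sb - sa) for a, b in zip(va, vb))


def crossing_edges(sdf4):
    """Edges of one tetrahedron whose endpoints have different SDF signs."""
    return [(i, j) for i, j in EDGES if (sdf4[i] > 0) != (sdf4[j] > 0)]


def triangle_count(sdf4):
    """Triangles emitted for one tetrahedron.

    0 crossing edges: no surface; 3: one triangle; 4: a quad (two triangles).
    """
    return {0: 0, 3: 1, 4: 2}[len(crossing_edges(sdf4))]


def case_histogram():
    """Group all 2**4 sign configurations by the number of triangles."""
    hist = {}
    for signs in product((-1.0, 1.0), repeat=4):
        n = triangle_count(signs)
        hist[n] = hist.get(n, 0) + 1
    return hist
```

Because `zero_crossing` is a ratio of differentiable terms with a non-zero denominator, an automatic differentiation framework could propagate gradients through it to both the vertex positions and the SDF values.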
[0023] Differentiable surface subdivision can be performed on the triangular mesh to improve the representational power and visual quality of the shape. A Loop subdivision method can be implemented using learnable parameters instead of a fixed set of parameters. Specifically, the learnable parameters can include the position of each mesh vertex $v'_i$, and $\alpha_i$, which controls the generated surface by weighting the smoothness of neighboring vertices. In contrast to previous methods, and to save computational resources, the per-vertex parameters may be predicted only at the beginning and carried over to subsequent subdivision iterations. The result can be an explicit surface 106, which can be used to generate output 108 (for example, a shape or object represented using a parametric surface).
[0024] In a non-limiting embodiment, the deep neural network (DNN) used to generate the output 108 may include a deep 3D conditional generative model. For example, the DNN may use the hybrid 3D representation described herein to learn to output a high-resolution 3D mesh, M, from an input x, which may include point clouds, coarse voxelized shapes, scans, and/or the like. For example, the DNN may include one or more modules, each tasked with computing intermediate or final outputs to generate the 3D mesh, M, during the processing of the input x.
[0025] In some embodiments, as shown in Figure 1, the model can include one or more machine learning models that perform initial SDF prediction 110. The model can include an input encoder that extracts a 3D feature volume $F_{vol}(x)$ from a point cloud. When input 102 is not a point cloud but, for example, a coarse voxelized shape, points can be sampled from the surface of the voxelized shape to generate a point cloud. A feature vector $F_{vol}(v, x)$ can then be generated for each grid vertex $v$ via trilinear interpolation. The initial prediction of the SDF value of each vertex in the initial deformable tetrahedral grid can be generated using, for example, a fully connected network $s(v) = \mathrm{MLP}(F_{vol}(v, x), v)$. The fully connected network can additionally output a feature vector $f(v)$, which can be used for surface refinement during the volume subdivision stage.
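Trilinear interpolation of a feature volume at a grid vertex can be sketched as below. For brevity, the sketch assumes a scalar volume stored as nested lists and queried in continuous index coordinates; in the model described above each cell would hold a feature vector, and the name `trilinear` is hypothetical.

```python
import math


def trilinear(volume, x, y, z):
    """Sample volume[i][j][k] at continuous index coordinates (x, y, z).

    Assumptions: every dimension has at least two samples, and the query
    lies within [0, n - 1] along each axis.
    """
    dims = (len(volume), len(volume[0]), len(volume[0][0]))
    base, frac = [], []
    for coord, n in zip((x, y, z), dims):
        i0 = min(max(math.floor(coord), 0), n - 2)  # lower corner of the cell
        base.append(i0)
        frac.append(coord - i0)
    value = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                # weight of this corner: product of per-axis linear weights
                w = ((frac[0] if di else 1.0 - frac[0])
                     * (frac[1] if dj else 1.0 - frac[1])
                     * (frac[2] if dk else 1.0 - frac[2]))
                value += w * volume[base[0] + di][base[1] + dj][base[2] + dk]
    return value
```

The eight corner weights always sum to one, so a volume that varies linearly along each axis is reproduced exactly.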
[0026] After obtaining the initial SDF, surface refinement 112 can be performed to iteratively refine the surface and subdivide the tetrahedral grid. For example, the surface tetrahedrons $T_{surf}$ can be identified based on the current $s(v)$ values, and a graph $G = (V_{surf}, E_{surf})$ can be generated, where $V_{surf}$ and $E_{surf}$ correspond to the vertices and edges in $T_{surf}$. A position offset $\Delta v_i$ and an SDF residual value $\Delta s(v_i)$ can be predicted for each vertex $i$ in $V_{surf}$ using, for example, a graph convolutional network, as shown in equations (1) and (2) below:
[0027]
[0028]
[0029] where $N_{surf}$ is the total number of vertices in $V_{surf}$, and $f(v_i)$ is the updated per-vertex feature. The vertex position and SDF value of each vertex $v_i$ can be updated to $v'_i = v_i + \Delta v_i$ and $s(v'_i) = s(v_i) + \Delta s(v_i)$. This refinement operation can flip the sign of the SDF value to refine the local topology, and can move vertices to improve the local geometry.
[0030] After surface refinement, volumetric subdivision can be performed, followed by additional surface refinement operations. For example, $T_{surf}$ can be re-identified, and $T_{surf}$ and its neighbors can be subdivided. Unsubdivided tetrahedrons can be dropped or excluded from the full tetrahedral grid in both operations. In embodiments, this saves memory and computation because the size of $T_{surf}$ is proportional to the surface area of the object and scales quadratically rather than cubically as the grid resolution increases. For example, as shown in Figures 4A and 4B, graph 400 illustrates the volume subdivision and surface refinement computation without excluding unsubdivided tetrahedrons, and graph 402 illustrates the volume subdivision and surface refinement computation when unsubdivided tetrahedrons are excluded.
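The quadratic-versus-cubic scaling can be checked numerically with a small sketch. As a simplification, it counts cubic cells of a regular grid rather than tetrahedrons, and it uses an analytic sphere SDF as a stand-in shape; both choices, and the name `count_cells`, are assumptions made for illustration only.

```python
def count_cells(n, radius=0.35):
    """Return (total cells, surface-crossing cells) for an n^3 grid.

    The grid covers the unit cube and the shape is a sphere centered at
    (0.5, 0.5, 0.5). A cell counts as a surface cell when its eight corner
    SDF values do not all share the same sign.
    """
    h = 1.0 / n

    def sdf(i, j, k):
        dx, dy, dz = i * h - 0.5, j * h - 0.5, k * h - 0.5
        return (dx * dx + dy * dy + dz * dz) ** 0.5 - radius

    inside = [[[sdf(i, j, k) < 0 for k in range(n + 1)]
               for j in range(n + 1)] for i in range(n + 1)]
    surface = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                corners = [inside[i + a][j + b][k + c]
                           for a in (0, 1) for b in (0, 1) for c in (0, 1)]
                if any(corners) and not all(corners):
                    surface += 1
    return n ** 3, surface


if __name__ == "__main__":
    for n in (16, 32):
        total, surface = count_cells(n)
        print(n, total, surface)
```

Doubling the resolution multiplies the total cell count by eight, while the number of surface-crossing cells grows by a factor of roughly four, which is why keeping only the cells near the surface saves memory and computation.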
[0031] Furthermore, because the SDF values and positions are inherited from the levels before subdivision, the loss computed on the final surface can be backpropagated to all vertices across all levels. Therefore, the model can automatically learn which tetrahedrons to subdivide without requiring additional loss terms in intermediate steps to supervise the learning of an octree hierarchy, as was the case with previous methods.
[0032] After extracting the surface mesh using the marching tetrahedra algorithm (e.g., operation (D) of Figure 1), learnable surface subdivision can be applied at (E). Because the output is a triangular mesh, learnable surface subdivision can transform the output into a parametric surface with infinite resolution, allowing for end-to-end trainability of the model. In practice, a new graph can be generated on the extracted mesh, and a graph convolutional network can be used to predict the updated position of each vertex $v'_i$, and $\alpha_i$, for Loop subdivision. This operation can remove quantization errors, and adjusting $\alpha_i$, which is fixed in the classic method, can mitigate the approximation errors of classic Loop subdivision.
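To illustrate how a per-vertex weight could replace the fixed weight of classic Loop subdivision, the following sketch implements the classic rule for repositioning an existing vertex and lets a caller override its total neighbor weight. How $\alpha_i$ actually enters the subdivision rule of the model is not specified in the text, so the parameterization here (with $\alpha$ as the total weight given to the one-ring neighbors) is an assumption, as are the function names.

```python
import math


def classic_loop_alpha(n):
    """Total one-ring weight n * beta of classic Loop subdivision.

    beta = (5/8 - (3/8 + cos(2*pi/n) / 4)**2) / n for a vertex of valence n.
    """
    beta = (5.0 / 8.0 - (3.0 / 8.0 + 0.25 * math.cos(2.0 * math.pi / n)) ** 2) / n
    return n * beta


def smooth_vertex(v, neighbours, alpha=None):
    """Blend vertex v with the average of its one-ring neighbours.

    alpha=None uses the fixed classic weight; passing a value stands in for
    a learned per-vertex parameter (an assumption made for this sketch).
    """
    n = len(neighbours)
    if alpha is None:
        alpha = classic_loop_alpha(n)
    mean = [sum(p[d] for p in neighbours) / n for d in range(3)]
    return tuple((1.0 - alpha) * v[d] + alpha * mean[d] for d in range(3))
```

For a regular vertex with six neighbors, the classic weight is 3/8 in total, or 1/16 per neighbor; setting `alpha` to zero leaves the vertex where it is, and larger values pull it further toward its neighbors.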
[0033] In some embodiments, given the differentiable surface representation from the model, a 3D discriminator can be applied to the final surface predicted using a 3D generator (e.g., after implicit function 104, the marching tetrahedra algorithm, and/or surface subdivision to generate an explicit surface 106). The 3D discriminator can be applied to local patches sampled from high-curvature regions and from the predicted mesh, and a loss (e.g., the adversarial loss described herein) can drive the prediction to reconstruct high-fidelity geometric details. For example, the 3D discriminator may include a 3D convolutional neural network (CNN) and can operate on the SDF computed from the predicted mesh to capture local details. A high-curvature vertex $v$ can be randomly selected from the target mesh, and the ground truth SDF, $S_{real}$, can be computed in a voxelized region around $v$. Similarly, the SDF of the predicted surface mesh M can be computed at the same location to obtain $S_{pred}$. $S_{pred}$ can be an analytical function of mesh M, so the gradient of $S_{pred}$ can be backpropagated to the vertex positions of M. $S_{real}$ and $S_{pred}$ can be fed into discriminator 114 together with the feature vector $F_{vol}(v, x)$ at position $v$. The discriminator 114 can then predict a probability indicating whether the input comes from a real shape or a generated shape.
[0034] The model disclosed herein (e.g., a deep 3D conditional generative model) can be end-to-end trainable. In one or more embodiments, one or more modules can be supervised to minimize the error defined on the final predicted mesh M. One or more loss functions can be used, each including one or more different loss terms. For example, in a non-limiting embodiment, a loss function including three different terms can be used: a surface alignment loss to encourage alignment with the ground truth surface; an adversarial loss to improve the realism of the generated shape; and regularization terms to regularize the behavior of the SDF and the vertex deformation.
[0035] The surface alignment loss can include sampling a set of points $P_{gt}$ from the surface of the ground truth mesh $M_{gt}$, sampling a set of points $P_{pred}$ from $M_{pred}$, and minimizing the L2 chamfer distance and the normal consistency loss between $P_{gt}$ and $P_{pred}$. For example, the surface alignment loss can be calculated using the following formula (3):
[0036]
[0037] where $\hat{p}$ is the point corresponding to $p$ when calculating the chamfer distance, and $\vec{n}_p$ and $\vec{n}_{\hat{p}}$ denote the normal directions at points $p$ and $\hat{p}$.
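Since formula (3) is not reproduced here, the following is only a sketch of one common way to compute a chamfer distance and a normal consistency term between two sampled point sets. The exact form, including whether distances are squared or averaged, is an assumption, and the function names are hypothetical.

```python
def nearest(p, points):
    """Index of the point in `points` closest to p (brute-force search)."""
    return min(range(len(points)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(p, points[i])))


def dist(p, q):
    """Euclidean (L2) distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def chamfer_l2(pred, gt):
    """Symmetric chamfer distance: each point is matched to its nearest
    neighbour in the other set, and the L2 distances are summed."""
    forward = sum(dist(p, gt[nearest(p, gt)]) for p in pred)
    backward = sum(dist(q, pred[nearest(q, pred)]) for q in gt)
    return forward + backward


def normal_consistency(pred, pred_normals, gt, gt_normals):
    """Sum over predicted points of 1 - |n_p . n_q|, where q is the nearest
    ground truth point to p. Normals are assumed to be unit length."""
    total = 0.0
    for p, n_p in zip(pred, pred_normals):
        n_q = gt_normals[nearest(p, gt)]
        total += 1.0 - abs(sum(a * b for a, b in zip(n_p, n_q)))
    return total
```

Both terms are zero when the predicted samples and their normals coincide with the ground truth samples, and the absolute value in the normal term makes it insensitive to flipped normal orientation.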
[0038] The adversarial loss can be calculated using the following formula (4):
[0039]
[0040] Regarding regularization, the loss functions in equations (3) and (4) operate on the extracted surface, so only vertices close to the isosurface in the tetrahedral grid receive gradients, while other vertices do not. The surface losses may also provide no information about which side is inside and/or outside, since flipping the SDF sign of all vertices in a tetrahedron would result in the same surface being extracted by the marching tetrahedra algorithm. This can lead to disconnected components during training, so an SDF loss can be added to regularize the SDF values. In some embodiments, the regularization terms can be combined with the surface alignment and adversarial losses into an overall loss according to the following formula (5):
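[0041] $L = \lambda_{cd} L_{cd} + \lambda_{normal} L_{normal} + \lambda_{G} L_{G} + \lambda_{SDF} L_{SDF} + \lambda_{def} L_{def}$ (5), where $\lambda_{cd}$, $\lambda_{normal}$, $\lambda_{G}$, $\lambda_{SDF}$, and $\lambda_{def}$ are hyperparameters.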
[0042] Referring now to Figure 5, each block of method 500 described herein includes a computational process that can be executed using any combination of hardware, firmware, and/or software. For example, various functions can be executed by a processor that executes instructions stored in memory. Method 500 can also be embodied as computer-usable instructions stored on a computer storage medium. Method 500 can be provided by a standalone application, a service or managed service (standalone or in conjunction with another managed service), or a plug-in to another product, to name a few. Furthermore, as an example, method 500 is described with respect to process 100 of Figure 1. However, method 500 may be performed additionally or alternatively by any process or system, or any combination of processes and systems, including but not limited to those described herein.
[0043] Figure 5 is a flowchart illustrating a method 500 for high-resolution shape synthesis according to some embodiments of the present disclosure. In block B502, method 500 includes calculating a signed distance field (SDF) at an initial grid resolution of a tetrahedral grid, based at least in part on an input representation of an object. For example, using input 102, the SDF can be calculated at the initial grid resolution of the tetrahedral grid.
[0044] In block B504, method 500 includes subdividing and deforming the tetrahedral grid to generate an updated tetrahedral grid at an updated resolution. For example, the tetrahedral grid can be selectively subdivided and deformed.
[0045] In block B506, method 500 includes calculating an updated SDF using the updated tetrahedral grid. For example, based on the subdivision and deformation, the updated SDF values of the vertices of the updated tetrahedral grid can be calculated.
[0046] In some embodiments, the operations of block B504 and / or block B506 may be performed multiple times—for example, until the target resolution is reached.
[0047] In block B508, method 500 includes performing a marching tetrahedra algorithm on the updated tetrahedral grid to generate a triangular mesh. For example, the marching tetrahedra algorithm can be performed on the deformable grid (e.g., after subdivision, deformation, and updating of the SDF) to extract an isosurface (e.g., a triangular mesh).
[0048] In block B510, method 500 includes subdividing the triangular mesh to generate a final surface representation of the object. For example, surface subdivision can be applied to the isosurface to generate a parametric (e.g., explicit) surface as output 108.
[0049] Example computing device
[0050] Figure 6 is a block diagram of an example computing device 600 suitable for implementing some embodiments of the present disclosure. The computing device 600 may include an interconnect system 602 directly or indirectly coupled to the following devices: memory 604, one or more central processing units (CPUs) 606, one or more graphics processing units (GPUs) 608, a communication interface 610, input/output (I/O) ports 612, input/output components 614, a power supply 616, one or more presentation components 618 (e.g., displays), and one or more logic units 620. In at least one embodiment, the computing device 600 may include one or more virtual machines (VMs), and/or any component thereof may include virtual components (e.g., virtual hardware components). As non-limiting examples, one or more GPUs 608 may include one or more vGPUs, one or more CPUs 606 may include one or more vCPUs, and/or one or more logic units 620 may include one or more virtual logic units. Therefore, computing device 600 may include discrete components (e.g., a complete GPU dedicated to computing device 600), virtual components (e.g., a portion of a GPU dedicated to computing device 600), or a combination thereof.
[0051] Although the various blocks of Figure 6 are shown as connected via interconnect system 602 with lines, this is not intended to be limiting and is merely for clarity. For example, in some embodiments, a presentation component 618, such as a display device, may be considered an I/O component 614 (e.g., if the display is a touchscreen). As another example, CPU 606 and/or GPU 608 may include memory (e.g., memory 604 may represent a storage device other than the memory of GPU 608, CPU 606, and/or other components). In other words, the computing device of Figure 6 is merely illustrative. No distinction is made between categories such as "workstation," "server," "laptop," "desktop," "tablet," "client device," "mobile device," "handheld device," "game console," "electronic control unit (ECU)," "virtual reality system," and/or other device or system types, as all of these are contemplated within the scope of the computing device of Figure 6.
[0052] Interconnect system 602 may represent one or more links or buses, such as address buses, data buses, control buses, or combinations thereof. Interconnect system 602 may include one or more link or bus types, such as Industry Standard Architecture (ISA) bus, Extended Industry Standard Architecture (EISA) bus, Video Electronics Standards Association (VESA) bus, Peripheral Component Interconnect (PCI) bus, Peripheral Component Interconnect Fast (PCIe) bus, and / or another type of bus or link. In some embodiments, there is a direct connection between components. As an example, CPU 606 may be directly connected to memory 604. Furthermore, CPU 606 may be directly connected to GPU 608. In cases where there is a direct or point-to-point connection between components, interconnect system 602 may include a PCIe link to perform the connection. In these examples, a PCI bus is not required in computing device 600.
[0053] The memory 604 may include any medium of a wide variety of computer-readable media. A computer-readable medium can be any available medium that can be accessed by the computing device 600. Computer-readable media may include volatile and non-volatile media, as well as removable and non-removable media. For example and without limitation, computer-readable media may include computer storage media and communication media.
[0054] Computer storage media may include volatile and non-volatile media and / or removable and non-removable media, implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, and / or other data types. For example, memory 604 may store computer-readable instructions (e.g., representing programs and / or program elements, such as an operating system). Computer storage media may include, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical disc storage devices, magnetic tape cassettes, magnetic tape, disk storage devices or other magnetic storage devices, or any other medium that can be used to store desired information and can be accessed by computing device 600. As used herein, computer storage media does not include signals per se.
[0055] Communication media may embody computer-readable instructions, data structures, program modules, and / or other data types in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery medium. The term "modulated data signal" can refer to a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. For example and without limitation, communication media may include wired media such as wired networks or direct wired connections, and wireless media such as acoustic, RF, infrared, and other wireless media. Any combination of the above should also be included within the scope of computer-readable media.
[0056] CPU 606 may be configured to execute at least some of the computer-readable instructions to control one or more components of computing device 600 to perform one or more of the methods and / or processes described herein. Each CPU 606 may include one or more cores (e.g., one, two, four, eight, twenty-eight, seventy-two, etc.) capable of processing a large number of software threads simultaneously. CPU 606 may include any type of processor and may include different types of processors depending on the type of computing device 600 implemented (e.g., processors with fewer cores for mobile devices and processors with more cores for servers). For example, depending on the type of computing device 600, the processor may be an Advanced RISC Machines (ARM) processor implemented using Reduced Instruction Set Computing (RISC) or an x86 processor implemented using Complex Instruction Set Computing (CISC). Computing device 600 may include one or more CPUs 606 in addition to one or more microprocessors or supplementary coprocessors, such as math coprocessors.
[0057] In addition to or as an alternative to CPU 606, GPU 608 may be configured to execute at least some of the computer-readable instructions to control one or more components of computing device 600 to perform one or more of the methods and / or processes described herein. One or more GPUs 608 may be integrated GPUs (e.g., integrated with one or more CPUs 606) and / or one or more GPUs 608 may be discrete GPUs. In embodiments, one or more GPUs 608 may be coprocessors of one or more CPUs 606. Computing device 600 may use GPU 608 to render graphics (e.g., 3D graphics) or perform general-purpose computing. For example, GPU 608 may be used for general-purpose computing on GPUs (GPGPU). GPU 608 may include hundreds or thousands of cores capable of processing hundreds or thousands of software threads simultaneously. GPU 608 may generate pixel data for output images in response to rendering commands (e.g., rendering commands received from CPU 606 via a host interface). GPU 608 may include graphics memory, such as display memory, for storing pixel data or any other suitable data (e.g., GPGPU data). Display memory may be included as part of memory 604. GPU 608 may include two or more GPUs operating in parallel (e.g., via links). The links may connect the GPUs directly (e.g., using NVLink) or via a switch (e.g., using NVSwitch). When combined, each GPU 608 may generate a different portion of the pixel data or GPGPU data, or data for a different output (e.g., a first GPU for a first image and a second GPU for a second image). Each GPU may include its own memory or may share memory with other GPUs.
[0058] In addition to or as an alternative to CPU 606 and / or GPU 608, logic unit 620 may be configured to execute at least some of the computer-readable instructions to control one or more components of computing device 600 to perform one or more of the methods and / or processes described herein. In embodiments, CPU 606, GPU 608, and / or logic unit 620 may discretely or jointly perform any combination of the methods, processes, and / or portions thereof. One or more logic units 620 may be part of and / or integrated into one or more CPUs 606 and / or one or more GPUs 608, and / or one or more logic units 620 may be discrete components or otherwise external to CPU 606 and / or GPU 608. In embodiments, one or more logic units 620 may be coprocessors of one or more CPUs 606 and / or one or more GPUs 608.
[0059] Examples of logic unit 620 include one or more processing cores and / or components thereof, such as a data processing unit (DPU), a tensor core (TC), a tensor processing unit (TPU), a pixel visual core (PVC), a vision processing unit (VPU), a graphics processing cluster (GPC), a texture processing cluster (TPC), a streaming multiprocessor (SM), a tree traversal unit (TTU), an artificial intelligence accelerator (AIA), a deep learning accelerator (DLA), an arithmetic logic unit (ALU), an application-specific integrated circuit (ASIC), a floating-point unit (FPU), input / output (I / O) elements, Peripheral Component Interconnect (PCI) or Peripheral Component Interconnect Express (PCIe) elements, etc.
[0060] The communication interface 610 may include one or more receivers, transmitters, and / or transceivers that enable the computing device 600 to communicate with other computing devices via an electronic communication network, including wired and / or wireless communications. The communication interface 610 may include components and functionality that enable communication over any of several different networks, such as wireless networks (e.g., Wi-Fi, Z-Wave, Bluetooth, Bluetooth LE, ZigBee, etc.), wired networks (e.g., communication over Ethernet or InfiniBand), low-power wide area networks (e.g., LoRaWAN, SigFox, etc.), and / or the Internet. In one or more embodiments, the logic unit 620 and / or the communication interface 610 may include one or more data processing units (DPUs) to transmit data received over a network and / or through the interconnect system 602 directly to one or more GPUs 608 (e.g., to a memory thereof).
[0061] I / O port 612 enables computing device 600 to be logically coupled to other devices, including I / O component 614, presentation component 618, and / or other components, some of which may be built into (e.g., integrated into) computing device 600. Illustrative I / O components 614 include microphones, mice, keyboards, joysticks, game pads, game controllers, satellite dishes, scanners, printers, wireless devices, and so on. I / O component 614 can provide a natural user interface (NUI) that processes air gestures, voice, or other physiological inputs generated by a user. In some instances, the inputs may be transmitted to an appropriate network element for further processing. The NUI can implement any combination of speech recognition, stylus recognition, facial recognition, biometric recognition, gesture recognition both on screen and adjacent to the screen, air gestures, head and eye tracking, and touch recognition associated with the display of computing device 600 (described in more detail below). Computing device 600 may include depth cameras, such as stereo camera systems, infrared camera systems, RGB camera systems, touchscreen technology, and combinations thereof, for gesture detection and recognition. In addition, computing device 600 may include an accelerometer or gyroscope (e.g., as part of an inertial measurement unit (IMU)) that enables motion detection. In some examples, the output of the accelerometer or gyroscope may be used by computing device 600 to render immersive augmented reality or virtual reality.
[0062] Power supply 616 may include a hard-wired power supply, a battery power supply, or a combination thereof. Power supply 616 may supply power to computing device 600 so that components of computing device 600 can operate.
[0063] The presentation component 618 may include a display (such as a monitor, touch screen, television screen, head-up display (HUD), other display types, or combinations thereof), speakers, and / or other presentation components. The presentation component 618 may receive data from other components (such as GPU 608, CPU 606, DPU, etc.) and output that data (e.g., as an image, video, sound, etc.).
[0064] Example Data Center
[0065] Figure 7 illustrates an example data center 700 that can be used in at least one embodiment of this disclosure. The data center 700 may include a data center infrastructure layer 710, a framework layer 720, a software layer 730, and an application layer 740.
[0066] As shown in Figure 7, the data center infrastructure layer 710 may include a resource coordinator 712, grouped computing resources 714, and node computing resources (“nodes CR”) 716(1)-716(N), where “N” represents any whole, positive integer. In at least one embodiment, nodes CR 716(1)-716(N) may include, but are not limited to, any number of central processing units (“CPUs”) or other processors (including DPUs, accelerators, field-programmable gate arrays (FPGAs), graphics processors or graphics processing units (GPUs), etc.), memory devices (e.g., dynamic read-only memory), storage devices (e.g., solid-state drives or disk drives), network input / output (“NW I / O”) devices, network switches, virtual machines (“VMs”), power modules, cooling modules, etc. In some embodiments, one or more of nodes CR 716(1)-716(N) may correspond to a server having one or more of the aforementioned computing resources. In addition, in some embodiments, nodes CR 716(1)-716(N) may include one or more virtual components, such as vGPUs, vCPUs, etc., and / or one or more of nodes CR 716(1)-716(N) may correspond to a virtual machine (VM).
[0067] In at least one embodiment, the grouped computing resources 714 may include individual groups (not shown) of nodes CR 716 housed in one or more racks, or a plurality of racks (also not shown) housed in data centers in various geographic locations. Individual groups of nodes CR 716 within the grouped computing resources 714 may include grouped computing, networking, memory, or storage resources that can be configured or allocated to support one or more workloads. In at least one embodiment, several nodes CR 716, including CPUs, GPUs, DPUs, and / or other processors, may be grouped within one or more racks to provide computing resources to support one or more workloads. One or more racks may also include any number of power modules, cooling modules, and / or network switches in any combination.
[0068] Resource coordinator 712 may configure or otherwise control one or more nodes CR 716(1)-716(N) and / or grouped computing resources 714. In at least one embodiment, resource coordinator 712 may include a software design infrastructure (“SDI”) management entity for data center 700. Resource coordinator 712 may include hardware, software, or some combination thereof.
[0069] In at least one embodiment, as shown in Figure 7, framework layer 720 may include a job scheduler 732, a configuration manager 734, a resource manager 736, and a distributed file system 738. Framework layer 720 may include a framework to support software 732 of software layer 730 and / or one or more applications 742 of application layer 740. Software 732 or application 742 may respectively include web-based service software or applications, such as those provided by Amazon Web Services, Google Cloud, and Microsoft Azure. Framework layer 720 may be, but is not limited to, a type of free and open-source software web application framework, such as Apache Spark™ (hereinafter "Spark"), which can utilize distributed file system 738 for large-scale data processing (e.g., "big data"). In at least one embodiment, the job scheduler 732 may include a Spark driver to facilitate the scheduling of workloads supported by various layers of data center 700. In at least one embodiment, the configuration manager 734 may be able to configure different layers, such as software layer 730 and framework layer 720, including Spark and the distributed file system 738, to support large-scale data processing. The resource manager 736 may be able to manage clustered or grouped computing resources mapped to or allocated for supporting distributed file system 738 and job scheduler 732. In at least one embodiment, clustered or grouped computing resources may include grouped computing resources 714 at data center infrastructure layer 710. The resource manager 736 may coordinate with resource coordinator 712 to manage these mapped or allocated computing resources.
[0070] In at least one embodiment, the software 732 included in the software layer 730 may include software used by at least a portion of the nodes CR 716(1)-716(N), the grouped computing resources 714, and / or the distributed file system 738 of the framework layer 720. One or more types of software may include, but are not limited to, Internet web page search software, email virus scanning software, database software, and streaming video content software.
[0071] In at least one embodiment, the application layer 740 may include one or more applications 742 that can be used by at least a portion of the nodes CR 716(1)-716(N), the grouped computing resources 714, and / or the distributed file system 738 of the framework layer 720. The one or more types of applications may include, but are not limited to, any number of genomics applications, cognitive computing applications, and machine learning applications, including training or inference software, machine learning framework software (e.g., PyTorch, TensorFlow, Caffe, etc.), and / or other machine learning applications used in conjunction with one or more embodiments.
[0072] In at least one embodiment, any of the configuration manager 734, resource manager 736, and resource coordinator 712 can perform any number and type of self-modifying actions based on any amount and type of data acquired in any technically feasible manner. Self-modifying actions can relieve operators of data center 700 from making potentially poor configuration decisions and can help avoid underutilized and / or poorly performing portions of the data center.
[0073] Data center 700 may include tools, services, software, or other resources for training one or more machine learning models or using one or more machine learning models to predict or infer information according to one or more embodiments described herein. For example, a machine learning model can be trained by calculating weight parameters based on a neural network architecture using the software and computing resources described above with respect to data center 700. In at least one embodiment, by using weight parameters calculated through one or more training techniques, information can be inferred or predicted using trained machine learning models corresponding to one or more neural networks, such as, but not limited to, those described herein, using the resources described above with respect to data center 700.
[0074] In at least one embodiment, the data center 700 may use CPUs, application-specific integrated circuits (ASICs), GPUs, FPGAs, and / or other hardware (or corresponding virtual computing resources) to perform training and / or inference. Furthermore, one or more of the aforementioned software and / or hardware resources may be configured as a service that allows users to train models or perform inference on information, such as image recognition, speech recognition, or other artificial intelligence services.
[0075] Example Network Environment
[0076] A network environment suitable for implementing embodiments of this disclosure may include one or more client devices, servers, network-attached storage (NAS), other backend devices, and / or other device types. Client devices, servers, and / or other device types (e.g., each device) may be implemented on one or more instances of the computing device 600 of Figure 6; for example, each device may include similar components, features, and / or functions of computing device 600. Furthermore, where backend devices (e.g., servers, NAS, etc.) are implemented, the backend devices may be included as part of data center 700, an example of which is described in more detail herein with respect to Figure 7.
[0077] Components of a network environment can communicate with each other via one or more networks, which can be wired, wireless, or both. A network can include multiple networks, or a network of networks. For example, a network can include one or more wide area networks (WANs), one or more local area networks (LANs), one or more public networks (such as the Internet and / or the Public Switched Telephone Network (PSTN)), and / or one or more private networks. Where the network includes a wireless telecommunications network, components such as base stations, communication towers, or even access points (and other components) can provide wireless connectivity.
[0078] A compatible network environment may include one or more peer-to-peer network environments (in which case the server may not be included in the network environment) and one or more client-server network environments (in which case one or more servers may be included in the network environment). In a peer-to-peer network environment, the server functionality described herein can be implemented on any number of client devices.
[0079] In at least one embodiment, the network environment may include one or more cloud-based network environments, distributed computing environments, combinations thereof, etc. A cloud-based network environment may include a framework layer, a job scheduler, a resource manager, and a distributed file system implemented on one or more servers, which may include one or more core network servers and / or edge servers. The framework layer may include a framework for supporting software at the software layer and / or one or more applications at the application layer. The software or applications may respectively include network-based service software or applications. In embodiments, one or more client devices may use the network-based service software or applications (e.g., by accessing the service software and / or applications via one or more application programming interfaces (APIs)). The framework layer may be, but is not limited to, a type of free and open-source software network application framework, such as one that can use a distributed file system for large-scale data processing (e.g., "big data").
[0080] A cloud-based network environment can provide cloud computing and / or cloud storage that carries out any combination of the computing and / or data storage functions (or one or more portions thereof) described herein. Any of these various functions can be distributed across multiple locations from central or core servers (e.g., of one or more data centers that may be distributed across a state, a region, a country, the globe, etc.). If a connection to a user (e.g., a client device) is relatively close to an edge server, the core server can assign at least a portion of the functionality to the edge server. A cloud-based network environment can be private (e.g., limited to a single organization), public (e.g., available to many organizations), and / or a combination thereof (e.g., a hybrid cloud environment).
[0081] Client devices may include at least some of the components, features, and functions of the example computing device 600 described herein with respect to Figure 6. By way of example and not limitation, the client device may be a personal computer (PC), laptop computer, mobile device, smartphone, tablet computer, smartwatch, wearable computer, personal digital assistant (PDA), MP3 player, virtual reality headset, global positioning system (GPS) or device, video player, camera, surveillance equipment or system, vehicle, ship, aircraft, virtual machine, drone, robot, handheld communication device, hospital equipment, gaming equipment or system, entertainment system, in-vehicle computer system, embedded system controller, remote control, electrical appliance, consumer electronics device, workstation, edge device, any combination of these described devices, or any other suitable device.
[0082] This disclosure can be described in the general context of machine-usable instructions or computer code, including computer-executable instructions such as program modules, which are executed by a computer or other machine such as a personal digital assistant or other handheld device. Typically, a program module, including routines, programs, objects, components, data structures, etc., refers to code that performs a specific task or implements a specific abstract data type. This disclosure can be practiced in a wide variety of system configurations, including handheld devices, consumer electronics, general-purpose computers, more specialized computing devices, etc. This disclosure can also be practiced in distributed computing environments where tasks are performed by remote processing devices linked via a communication network.
[0083] As used herein, the phrase "and / or" relating to two or more elements should be interpreted as referring to only one element or a combination of elements. For example, "element A, element B, and / or element C" could include only element A, only element B, only element C, element A and element B, element A and element C, element B and element C, or elements A, B, and C. Furthermore, "at least one of element A or element B" could include at least one of element A, at least one of element B, or at least one of element A and at least one of element B. Further, "at least one of element A and element B" could include at least one of element A, at least one of element B, or at least one of element A and at least one of element B.
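As a non-limiting illustration of the preceding paragraph, the combinations covered by "element A, element B, and / or element C" can be enumerated mechanically. The following is a minimal Python sketch; the function name `and_or` is hypothetical and is used here for illustration only.

```python
from itertools import combinations


def and_or(elements):
    """Return every non-empty combination covered by 'X, Y, and / or Z'."""
    # Hypothetical helper: combinations of size 1, then size 2, and so on.
    return [
        combo
        for size in range(1, len(elements) + 1)
        for combo in combinations(elements, size)
    ]


cases = and_or(["A", "B", "C"])
for combo in cases:
    print(" and ".join(combo))
```

For three elements the sketch yields seven combinations, in the same order in which the paragraph lists them.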
[0084] The subject matter of this disclosure is described with specificity herein to satisfy legal requirements. However, the description itself is not intended to limit the scope of this disclosure. Rather, the inventors have contemplated that the claimed subject matter may also be embodied in other ways, to include different steps or combinations of steps similar to those described herein, in conjunction with other present or future technologies. Moreover, although the terms “step” and / or “block” may be used herein to denote different elements of the methods employed, these terms should not be construed as implying any particular order among or between the various steps disclosed herein, unless the order of the steps is explicitly described.
Claims
1. A processor, comprising: one or more circuits to: compute a signed distance field (SDF) at an initial grid resolution of a grid, based at least in part on an input representation of an object; subdivide and deform the grid to generate an updated grid at an updated resolution; compute an updated SDF using the SDF and the updated grid; generate a triangular mesh using the updated SDF; and subdivide the triangular mesh to generate a parametric surface representation of the object.
2. The processor of claim 1, wherein the subdivision of the triangular mesh is performed using learned surface subdivision.
3. The processor of claim 1, wherein the input representation of the object includes at least one of a voxel representation, a point cloud, or a 3D scan.
4. The processor of claim 1, wherein the updated SDF is interpolated from the SDF using one or more updated vertex positions of the updated grid.
5. The processor of claim 1, wherein the computation of the SDF is performed at least in part by: using a convolutional neural network to compute one or more first feature vectors; and using a neural network and at least in part based on the one or more first feature vectors to compute one or more SDF values and one or more second feature vectors for one or more vertices of the grid.
6. The processor of claim 1, wherein the subdivision and deformation of the grid are performed at least in part by: identifying one or more surface volumes of the grid corresponding to the surface of the object; generating a graph corresponding to one or more vertices and one or more edges of the one or more surface volumes; and using a graph convolutional network and at least in part based on the graph to compute one or more positional offsets and one or more residual SDF values of the one or more vertices.
7. The processor of claim 1, wherein the subdivision of the grid comprises selective subdivision, wherein the selective subdivision comprises subdividing at least one of: one or more first surface volumes of the grid intersecting with the surface of the object; or one or more second surface volumes adjacent to the one or more first surface volumes.
8. The processor of claim 1, wherein the one or more circuits are to generate the parametric surface representation using a generative adversarial network (GAN).
9. The processor of claim 1, wherein the processor is comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing simulation operations; a system for performing light transport simulation; a system for performing collaborative content creation of 3D assets; a system for performing deep learning operations; a system implemented using an edge device; a system implemented using a robot; a system for performing conversational artificial intelligence operations; a system for generating synthetic data; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources.
10. A system, comprising: one or more processing units, including processing circuitry, to: compute a signed distance field (SDF) corresponding to an initial grid, based at least in part on an input representation of an object; subdivide and deform the initial grid to generate an updated grid; compute an updated SDF using the SDF and the updated grid; generate an explicit surface representation using the updated SDF; and subdivide the explicit surface representation to generate a parameterized surface representation of the object.
11. The system of claim 10, wherein the subdivision of the explicit surface representation is performed using learned surface subdivision.
12. The system of claim 10, wherein the input representation of the object comprises at least one of a voxel representation, a point cloud, or a 3D scan.
13. The system of claim 10, wherein the updated SDF is interpolated from the SDF using one or more updated vertex positions of the updated grid.
14. The system of claim 10, wherein the computation of the SDF is performed at least in part by: using a convolutional neural network to compute one or more first feature vectors; and using a neural network and at least in part based on the one or more first feature vectors to compute one or more SDF values and one or more second feature vectors for one or more vertices of the initial grid.
15. The system of claim 10, wherein the subdivision and deformation of the initial grid are performed at least in part by: identifying one or more surface volumes of the initial grid corresponding to the surface of the object; generating a graph corresponding to one or more vertices and one or more edges of the one or more surface volumes; and using a graph convolutional network and at least in part based on the graph to compute one or more positional offsets and one or more residual SDF values of the one or more vertices.
16. The system of claim 10, wherein the subdivision of the grid includes selective subdivision, wherein the selective subdivision includes subdividing at least one of: one or more first surface volumes of the grid intersecting with the surface of the object; or one or more second surface volumes adjacent to the one or more first surface volumes.
17. The system of claim 10, wherein one or more circuits use a generative adversarial network (GAN) to generate the parameterized surface representation.
18. The system of claim 10, wherein the system is comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing simulation operations; a system for performing light transport simulation; a system for performing collaborative content creation of 3D assets; a system for performing deep learning operations; a system implemented using edge devices; a system implemented using robots; a system for performing conversational artificial intelligence operations; a system for generating synthetic data; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources.
19. A processor, comprising: processing circuitry to: compute, based on an input representation of a shape, a signed distance field (SDF) corresponding to an initial grid; subdivide the initial grid to generate an updated grid; compute an updated SDF using the SDF and the updated grid; generate a mesh using the updated SDF; and subdivide the mesh to generate a parametric surface representation of the shape.
20. The processor of claim 19, wherein the processor is comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing simulation operations; a system for performing light transport simulation; a system for performing collaborative content creation of 3D assets; a system for performing deep learning operations; a system implemented using an edge device; a system implemented using a robot; a system for performing conversational artificial intelligence operations; a system for generating synthetic data; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources.
21. The processor of claim 19, wherein the computation of the SDF is performed at least in part by: using a convolutional neural network to compute one or more first feature vectors; and using a neural network and based at least on the one or more first feature vectors to compute one or more SDF values and one or more second feature vectors for one or more vertices of the initial grid.
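By way of non-limiting illustration only, the selective subdivision and SDF interpolation recited in the claims above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the claimed implementation: the grid is assumed to be tetrahedral, "adjacent" is assumed to mean sharing at least one vertex, the interpolated SDF at a new edge midpoint is assumed to be the mean of the two endpoint values, and the surface crossing is assumed to be located by linear interpolation along an edge, as in marching-tetrahedra-style extraction. All function and variable names are hypothetical.

```python
import numpy as np

# The six edges of a tetrahedron, as pairs of local vertex indices.
TET_EDGES = np.array([[0, 1], [0, 2], [0, 3], [1, 2], [1, 3], [2, 3]])


def surface_tet_mask(tets, sdf):
    """True for tetrahedra whose vertices carry SDF values of both signs."""
    inside = sdf[tets] < 0.0  # shape (num_tets, 4)
    return inside.any(axis=1) & ~inside.all(axis=1)


def select_for_subdivision(tets, sdf):
    """Surface tetrahedra plus tetrahedra sharing a vertex with one (assumed adjacency)."""
    surface = surface_tet_mask(tets, sdf)
    touched = np.zeros(sdf.shape[0], dtype=bool)
    touched[tets[surface].ravel()] = True
    return surface, surface | touched[tets].any(axis=1)


def midpoint_vertices(verts, sdf, tets):
    """One new vertex per unique edge; its SDF is the mean of the endpoint SDFs."""
    edges = np.sort(tets[:, TET_EDGES].reshape(-1, 2), axis=1)
    edges = np.unique(edges, axis=0)
    return edges, verts[edges].mean(axis=1), sdf[edges].mean(axis=1)


def zero_crossing(p_a, p_b, s_a, s_b):
    """Point on edge (a, b) where the linearly interpolated SDF equals zero."""
    return (p_a * s_b - p_b * s_a) / (s_b - s_a)


# Toy grid: tetrahedron 0 straddles the surface, 1 touches it, 2 is far away.
verts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1],
                  [1, 1, 1], [2, 2, 2], [3, 2, 2], [2, 3, 2]], dtype=float)
sdf = np.array([-0.5, 0.5, 0.5, 0.5, 1.0, 2.0, 2.0, 2.0])
tets = np.array([[0, 1, 2, 3], [1, 2, 3, 4], [4, 5, 6, 7]])

surface, selected = select_for_subdivision(tets, sdf)
edges, new_verts, new_sdf = midpoint_vertices(verts, sdf, tets[selected])
crossing = zero_crossing(verts[0], verts[1], sdf[0], sdf[1])
```

In this toy grid only the first two tetrahedra are selected, so the third contributes no new vertices, which reflects the aim of spending computation only near the surface.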