Three-dimensional content generation method and system based on key point constraint

By introducing constraint point encoding and linear solution in 3D shape generation, the problem that 3D models cannot strictly pass through constraint points in existing technologies is solved, achieving low-error geometric control and efficient computation.

CN122199831A · Pending · Publication Date: 2026-06-12 · PEKING UNIV SHENZHEN GRADUATE SCHOOL

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: PEKING UNIV SHENZHEN GRADUATE SCHOOL
Filing Date: 2026-05-14
Publication Date: 2026-06-12

AI Technical Summary

Technical Problem

Existing 3D shape generation technologies cannot mathematically guarantee that the surface of the generated 3D model passes through the geometric constraint points specified by the user, resulting in constraint residuals in the generation results, which makes it difficult to meet the low-error geometric requirements of industrial applications.

Method used

The method acquires the set of constraint points and target constraint values of the 3D object to be generated, encodes the object as low-dimensional structure voxel latent variables, and performs conditional denoising in the latent variable space with a diffusion model. A basic implicit field is then decoded from the denoised latent variables. A system of linear equations is constructed by combining a constraint basis function network and a local kernel function. The combination coefficients are calculated using a differentiable linear solver, and linear fusion is performed to generate the final implicit field. Finally, isosurface extraction is performed.

Benefits of technology

It combines probabilistic 3D semantic generation with deterministic algebraic equality constraints, ensuring that the generated surface strictly passes through the user-specified constraint points, reducing residuals and improving computational efficiency.

✦ Generated by Eureka AI based on patent content.

[Figure: CN122199831A_ABST]

Abstract

The application discloses a three-dimensional content generation method and system based on key point constraints, and relates to the technical field of image data processing. The method comprises the following steps: acquiring a constraint point set and corresponding target constraint values of a three-dimensional object to be generated; encoding the three-dimensional object to be generated into a low-dimensional structure voxel latent variable, and performing conditional denoising generation through a pre-trained diffusion model to obtain a target structure voxel latent variable; inputting the target structure voxel latent variable and a query point position in the latent variable space into a basic decoder to obtain a basic implicit field; generating local basis function values corresponding to each constraint point through a constraint basis function network, and performing spatial sparsification on the local basis function values using a local kernel function; constructing, based on the local basis function values and the target constraint values, a system of linear equations in the basis function combination coefficients, calculating the combination coefficients through a differentiable linear solver, and performing linear fusion to obtain a final implicit field; and extracting an isosurface from the final implicit field to generate a three-dimensional model.

Description

Technical Field

[0001] This application relates to the field of image data processing technology, and in particular to a method and system for generating three-dimensional content based on key point constraints.

Background Technology

[0002] Existing 3D shape generation techniques mostly employ an architecture combining diffusion models and neural implicit fields. By introducing multimodal conditional control such as text, images, sketches, or sparse anchor points, they can generate semantically reasonable and diverse 3D shapes. To guide the model to meet specific geometric positional requirements, existing techniques typically add constraint error terms to the training loss function or introduce soft penalty terms during sampling, using user-specified constraint points or constraint curves as reference signals to approximate the desired geometric relationship.

[0003] However, the output of a conventional neural network is a nonlinear function of its parameters and lacks an algebraic structure through which equality constraints can be solved directly. Existing technologies that rely on soft constraints can therefore only make the generated results "approach" the constraint conditions at the probabilistic level. They cannot mathematically guarantee that the generated 3D surface strictly passes through the geometric constraint points or constraint curves specified by the user. This leaves constraint residuals in the generated 3D model, making it difficult to meet the low-error geometric requirements for precise control of key points in industrial applications.

[0004] The above content is only used to help understand the technical solution of this application and does not represent an admission that the above content is prior art.

Summary of the Invention

[0005] The main purpose of this application is to provide a method and system for generating 3D content based on key point constraints, which aims to solve the technical problem that existing soft constraint approximation cannot meet the low-error requirement of generating surfaces that strictly pass through constraint points in industrial manufacturing.

[0006] To achieve the above objective, this application proposes a method for generating 3D content based on key point constraints, the method comprising:
obtaining a set of constraint points and corresponding target constraint values of the 3D object to be generated;
encoding the 3D object to be generated as a low-dimensional structure voxel latent variable, and performing conditional denoising on the low-dimensional structure voxel latent variable in the latent variable space through a pre-trained diffusion model to obtain a target structure voxel latent variable;
inputting the target structure voxel latent variable and a query point position in the latent variable space into a basic decoder to obtain the basic implicit field corresponding to the query point;
generating, based on the position information of the constraint point set and the query point position, local basis function values corresponding to each constraint point through a constraint basis function network, and spatially sparsifying the local basis function values using a local kernel function;
constructing, based on the sparsified local basis function values and the target constraint values, a system of linear equations in the combination coefficients of the basis functions, and calculating the combination coefficients with a differentiable linear solver;
linearly fusing the basic implicit field, the local basis function values, and the solved combination coefficients to obtain the final implicit field; and
extracting an isosurface from the final implicit field to obtain the three-dimensional model and associated three-dimensional content.

[0007] In one embodiment, the step of encoding the three-dimensional object to be generated as a low-dimensional structure voxel latent variable includes:
normalizing the input 3D shape and mapping the normalized 3D shape to a unified coordinate system;
sampling spatial query points in the unified coordinate system and obtaining the ground-truth signed distance value corresponding to each spatial query point; and
inputting the normalized 3D shape into the encoder of a surface autoencoder, the encoder extracting 3D geometric features and outputting low-resolution structure voxel latent variables.
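
The preprocessing in this embodiment can be sketched as follows. This is a minimal illustration under stated assumptions: the input shape is a list of points, the unified coordinate system is taken to be the cube [-1, 1]^3, and the ground-truth signed distance comes from an analytic sphere standing in for a real mesh-to-distance routine. The names `normalize_points`, `sample_queries`, and `sphere_sdf` are hypothetical and do not appear in the application.

```python
import math
import random

def normalize_points(points):
    """Center a point set and scale it so it fits inside [-1, 1]^3."""
    dims = range(3)
    lo = [min(p[d] for p in points) for d in dims]
    hi = [max(p[d] for p in points) for d in dims]
    center = [(lo[d] + hi[d]) / 2.0 for d in dims]
    # Use the largest half-extent so the aspect ratio is preserved.
    half_extent = max((hi[d] - lo[d]) / 2.0 for d in dims) or 1.0
    return [tuple((p[d] - center[d]) / half_extent for d in dims) for p in points]

def sample_queries(n, seed=0):
    """Uniformly sample n query points in the unified cube [-1, 1]^3."""
    rng = random.Random(seed)
    return [tuple(rng.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(n)]

def sphere_sdf(q, radius=0.5):
    """Signed distance to a sphere at the origin: negative inside, positive outside."""
    return math.sqrt(sum(c * c for c in q)) - radius

raw_shape = [(2.0, 0.0, 0.0), (6.0, 1.0, 0.0), (4.0, 4.0, 2.0)]
normalized = normalize_points(raw_shape)
queries = sample_queries(1000)
sdf_values = [sphere_sdf(q) for q in queries]
```

The pairs of `queries` and `sdf_values` play the role of supervision samples for the surface autoencoder; the encoder itself is not sketched here.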

[0008] In one embodiment, the step of conditionally denoising the low-dimensional structure voxel latent variables in the latent variable space using a pre-trained diffusion model to obtain the target structure voxel latent variables includes:
converting the set of constraint points into a local Gaussian heatmap in three-dimensional space;
inputting the local Gaussian heatmap into a constraint encoder to extract constraint condition features; and
injecting the constraint condition features into an intermediate layer of the denoising network of the diffusion model through an attention mechanism or a control network branch to obtain the target structure voxel latent variables.
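
The first of these sub-steps, turning constraint points into a Gaussian heatmap, might look like the sketch below. The grid extent ([-1, 1]^3), the resolution, the width `sigma`, and the choice to aggregate overlapping Gaussians with a maximum rather than a sum are all assumptions made for illustration; the application does not specify them.

```python
import math

def gaussian_heatmap(constraint_points, resolution=16, sigma=0.15):
    """Rasterize constraint points into a resolution^3 heatmap over [-1, 1]^3.

    Each voxel stores the maximum Gaussian response over all constraint points,
    so the value is 1.0 at a voxel center that coincides with a constraint point.
    """
    step = 2.0 / (resolution - 1)
    coords = [-1.0 + i * step for i in range(resolution)]
    heatmap = [[[0.0] * resolution for _ in range(resolution)]
               for _ in range(resolution)]
    for i, x in enumerate(coords):
        for j, y in enumerate(coords):
            for k, z in enumerate(coords):
                best = 0.0
                for (px, py, pz) in constraint_points:
                    d2 = (x - px) ** 2 + (y - py) ** 2 + (z - pz) ** 2
                    best = max(best, math.exp(-d2 / (2.0 * sigma ** 2)))
                heatmap[i][j][k] = best
    return heatmap

# Two hypothetical constraint points at opposite corners of the cube.
points = [(-1.0, -1.0, -1.0), (1.0, 1.0, 1.0)]
heat = gaussian_heatmap(points, resolution=9)
```

The resulting volume is what a constraint encoder would consume; the encoder and the injection into the denoising network are outside the scope of this sketch.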

[0009] In one embodiment, the step of inputting the target structure voxel latent variables and the query point position in the latent variable space into the basic decoder to obtain the basic implicit field corresponding to the query point includes:
encoding the query point position to obtain query point position features;
extracting the spatial geometric features corresponding to the query point from the target structure voxel latent variables, and fusing the spatial geometric features with the query point position features; and
inputting the fused features into the network layers of the basic decoder for forward propagation, outputting the basic implicit field corresponding to the query point, and extracting the intermediate layer features of the basic decoder.

[0010] In one embodiment, the step of generating local basis function values corresponding to each constraint point through a constraint basis function network based on the position information of the constraint point set and the query point position includes:
encoding the position information of the constraint point set to generate constraint tokens;
fusing the intermediate layer features of the basic decoder with the query point position to obtain query point features; and
having the query point features interact with the constraint tokens through a cross-attention mechanism, and outputting the local basis function values through a multilayer perceptron network.
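
To make the cross-attention interaction concrete, here is a minimal single-head sketch for one query point. It assumes scaled dot-product attention in which the constraint tokens serve as both keys and values, with no learned projections; the real network would have learned weights and a multilayer perceptron head that maps the attended feature to one basis value per constraint point, which is omitted here. All names and numbers are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(query_feature, constraint_tokens):
    """Single-head scaled dot-product attention of one query over K tokens.

    Keys and values are both taken to be the tokens themselves (no learned
    projections), which is a simplification for illustration.
    """
    dim = len(query_feature)
    scores = [sum(q * t for q, t in zip(query_feature, tok)) / math.sqrt(dim)
              for tok in constraint_tokens]
    weights = softmax(scores)
    attended = [sum(w * tok[d] for w, tok in zip(weights, constraint_tokens))
                for d in range(dim)]
    return weights, attended

tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]  # two constraint tokens
query = [4.0, 0.0, 0.0, 0.0]                            # one query point feature
weights, attended = cross_attention(query, tokens)
```

Here the query aligns with the first token, so most of the attention weight goes to it.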

[0011] In one embodiment, the step of spatially sparsifying the local basis function values using a local kernel function includes:
calculating, with each constraint point as the center, the spatial distance between the query point and the constraint point;
when the spatial distance is greater than a preset radius of action, setting the local basis function value of the corresponding constraint point to zero; and
when the spatial distance is less than or equal to the radius of action, applying attenuation weighting to the local basis function value using a Gaussian kernel function or a radial basis kernel function.
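
The truncation rule above can be written as a short sketch. It assumes the Gaussian variant of the kernel; the radius, the width `sigma`, and the raw basis values are hypothetical numbers chosen only to show the two branches.

```python
import math

def local_kernel(distance, radius, sigma):
    """Truncated Gaussian: zero beyond `radius`, Gaussian decay inside it."""
    if distance > radius:
        return 0.0
    return math.exp(-distance ** 2 / (2.0 * sigma ** 2))

def sparsify(basis_values, query, constraint_points, radius=0.3, sigma=0.1):
    """Weight each raw basis value by the kernel centered on its constraint point."""
    out = []
    for value, p in zip(basis_values, constraint_points):
        d = math.dist(query, p)  # Euclidean distance (Python 3.8+)
        out.append(value * local_kernel(d, radius, sigma))
    return out

constraints = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
raw = [0.8, 0.8]          # hypothetical network outputs for one query point
query = (0.1, 0.0, 0.0)
sparse = sparsify(raw, query, constraints)
```

The query lies within the radius of the first constraint point and outside that of the second, so only the first basis value survives. With the radius set to three times `sigma`, as here, the kernel has already decayed to about exp(-4.5) ≈ 0.011 at the cutoff, so the jump introduced by the hard truncation is small.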

[0012] In one embodiment, the step of calculating the combination coefficients using a differentiable linear solver includes:
substituting each constraint point into the implicit field expression containing the local basis function values to construct the matrix and right-hand-side vector of a system of linear equations with the combination coefficients as unknowns;
adding a Tikhonov regularization term to the matrix of the system of linear equations; and
solving the regularized system of linear equations by the damped least squares method to obtain the combination coefficients.
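
A small numerical sketch of the regularized solve follows. It assumes the usual damped least squares form, (AᵀA + λI)α = Aᵀb, where `A[i][j]` is the value of basis function j at constraint point i and `b[i]` is the target constraint value minus the basic implicit field at constraint point i. The matrix entries, the value of `lam`, and the plain Gaussian elimination routine are illustrative choices; this version is not differentiable by itself, and an automatic differentiation framework would be needed to backpropagate through the solve.

```python
def solve_linear(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def damped_least_squares(A, b, lam=1e-8):
    """Solve (A^T A + lam * I) alpha = A^T b for the combination coefficients."""
    rows, n = len(A), len(A[0])
    AtA = [[sum(A[k][i] * A[k][j] for k in range(rows)) + (lam if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    Atb = [sum(A[k][i] * b[k] for k in range(rows)) for i in range(n)]
    return solve_linear(AtA, Atb)

# Hypothetical 2-constraint system: weak coupling between the two basis functions.
A = [[1.0, 0.2],
     [0.2, 1.0]]
b = [0.05, -0.03]
alpha = damped_least_squares(A, b)
# Residual of the original (unregularized) constraint equations.
residual = [sum(A[i][j] * alpha[j] for j in range(2)) - b[i] for i in range(2)]
```

With a small `lam`, the residual of the original equations is close to zero but not exactly zero, which is the trade-off the regularization term makes for numerical stability.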

[0013] In one embodiment, the step of linearly fusing the basic implicit field, the local basis function values, and the solved combination coefficients to obtain the final implicit field includes:
computing a weighted sum of the local basis function values using the combination coefficients to output a constraint offset field; and
adding the basic implicit field and the constraint offset field point by point to output the final implicit field, which satisfies the target constraint value at each constraint point.
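
Writing the fusion out shows why the final field meets the constraints. The notation below is assumed for illustration and is not taken from the application: f_0 is the basic implicit field, \tilde{\phi}_j the sparsified local basis function of constraint point j, \alpha_j the combination coefficients, p_i the constraint points, c_i the target constraint values, K the number of constraint points, and \lambda the Tikhonov weight.

```latex
\begin{aligned}
f(x) &= f_0(x) + \sum_{j=1}^{K} \alpha_j \,\tilde{\phi}_j(x), \\
f(p_i) = c_i \quad (i = 1,\dots,K)
  &\iff \sum_{j=1}^{K} \tilde{\phi}_j(p_i)\,\alpha_j = c_i - f_0(p_i)
  \iff A\,\alpha = b, \\
A_{ij} &= \tilde{\phi}_j(p_i), \qquad b_i = c_i - f_0(p_i), \\
\alpha_\lambda &= \left(A^{\top}A + \lambda I\right)^{-1} A^{\top} b .
\end{aligned}
```

If A is invertible and \lambda = 0, then \alpha = A^{-1} b and every constraint holds exactly. With \lambda > 0 the residual A\alpha_\lambda - b is generally nonzero but tends to zero as \lambda \to 0, so the regularization weight governs how closely the surface passes through the constraint points.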

[0014] In one embodiment, the step of extracting isosurfaces from the final implicit field to obtain the 3D model and associated 3D content includes:
obtaining the zero-level-set target value of the final implicit field, where the zero-level-set target value is zero;
inputting dense query points sampled in three-dimensional space into the fused final implicit field to obtain the final signed distance function value corresponding to each dense query point, forming a three-dimensional scalar field;
inputting the three-dimensional scalar field and the zero-level-set target value into the marching cubes algorithm, and determining by interpolation the positions of the isosurface vertices at which the three-dimensional scalar field equals zero, together with their topological connectivity; and
generating, based on the isosurface vertex positions and the topological connectivity, a 3D model consisting of vertex coordinates and triangular face indices, along with associated 3D content.
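
The interpolation step can be illustrated in isolation. Marching cubes places each surface vertex on a grid edge whose two endpoints lie on opposite sides of the level set; the sketch below shows only that edge interpolation and leaves out the lookup table that determines triangle connectivity. The function name `zero_crossing` and the sample values are hypothetical.

```python
def zero_crossing(p_a, p_b, f_a, f_b, iso=0.0):
    """Linearly interpolate the point on edge (p_a, p_b) where the field equals iso.

    Returns None if the field does not change sign relative to iso on this edge.
    """
    if (f_a - iso) * (f_b - iso) > 0.0:
        return None
    if f_a == f_b:
        # Both endpoints sit exactly on the level set; either one is valid.
        return p_a
    t = (iso - f_a) / (f_b - f_a)
    return tuple(a + t * (b - a) for a, b in zip(p_a, p_b))

# Signed distances at the two ends of a grid edge: inside (-0.1) and outside (+0.3).
vertex = zero_crossing((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), -0.1, 0.3)
# Both endpoints outside the surface: no vertex on this edge.
no_hit = zero_crossing((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), 0.2, 0.3)
```

Because the vertex position is interpolated from sampled field values, the extracted mesh passes through a constraint point only up to the grid resolution, even when the implicit field itself is exact there.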

[0015] Furthermore, to achieve the above objective, this application also proposes a 3D content generation system based on key point constraints, the system comprising:
a data preprocessing module, used to obtain the set of constraint points and the corresponding target constraint values of the 3D object to be generated;
a diffusion generation module, used to encode the 3D object to be generated into low-dimensional structure voxel latent variables, and to perform conditional denoising on the low-dimensional structure voxel latent variables in the latent variable space through a pre-trained diffusion model to obtain the target structure voxel latent variables;
a basic decoding module, used to input the target structure voxel latent variables and the query point position in the latent variable space into the basic decoder to obtain the basic implicit field corresponding to the query point;
a constraint decoding module, used to generate local basis function values corresponding to each constraint point through a constraint basis function network based on the position information of the constraint point set and the query point position, and to spatially sparsify the local basis function values using a local kernel function;
a linear calculation module, used to construct a system of linear equations in the combination coefficients of the basis functions based on the sparsified local basis function values and the target constraint values, and to calculate the combination coefficients with a differentiable linear solver;
a linear fusion module, used to linearly fuse the basic implicit field, the local basis function values, and the solved combination coefficients to obtain the final implicit field; and
a model generation module, used to extract isosurfaces based on the final implicit field to obtain a three-dimensional model and associated three-dimensional content.

[0016] This application proposes a method and system for generating 3D content based on key point constraints. The method includes: acquiring a set of constraint points of a 3D object to be generated and the corresponding target constraint values; encoding the 3D object to be generated as low-dimensional structure voxel latent variables, and performing conditional denoising on the low-dimensional structure voxel latent variables in the latent variable space using a pre-trained diffusion model to obtain target structure voxel latent variables; inputting the target structure voxel latent variables and the query point positions in the latent variable space into a basic decoder to obtain the basic implicit field corresponding to the query point; generating, based on the position information of the constraint point set and the query point position, local basis function values corresponding to each constraint point through a constraint basis function network, and spatially sparsifying the local basis function values using a local kernel function; constructing, based on the sparsified local basis function values and the target constraint values, a system of linear equations in the combination coefficients of the basis functions, and calculating the combination coefficients using a differentiable linear solver; linearly fusing the basic implicit field, the local basis function values, and the solved combination coefficients to obtain the final implicit field; and extracting an isosurface from the final implicit field to obtain the 3D model and associated 3D content. This application combines probabilistic 3D semantic generation with deterministic algebraic equality constraints. It can ensure that the generated surface strictly passes through the user-specified constraint points while maintaining the overall semantic rationality and diversity of the generated shape, achieving precise local geometric control with low residuals. At the same time, by transforming nonlinear constraints into the solution of a small-scale linear system, the computational efficiency of hard-constrained 3D model generation is significantly improved.

Description of the Drawings

[0017] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application.

[0018] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0019] Figure 1 is a flowchart of an embodiment of the 3D content generation method based on key point constraints provided in this application;
Figures 2a and 2b are schematic structural diagrams of the 3D content generation method based on key point constraints provided in this application;
Figure 3 is a schematic diagram of the hard-constrained surface basis function layer of the 3D content generation method based on key point constraints provided in this application;
Figure 4 is a flowchart of Embodiment 2 of the 3D content generation method based on key point constraints provided in this application;
Figure 5 is a flowchart of Embodiment 3 of the 3D content generation method based on key point constraints provided in this application;
Figure 6 is a schematic diagram of the module structure of the 3D content generation system based on key point constraints according to an embodiment of this application.

[0020] The purpose, features, and advantages of this application will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed Implementation

[0021] It should be understood that the specific embodiments described herein are merely illustrative of the technical solutions of this application and are not intended to limit this application.

[0022] To better understand the technical solution of this application, a detailed description will be provided below in conjunction with the accompanying drawings and specific implementation methods.

[0023] Existing soft-constraint approximation techniques cannot meet the low-error requirement in industrial manufacturing that the generated surface pass strictly through the constraint points.

[0024] This application provides a solution that: obtains a set of constraint points and corresponding target constraint values for a 3D object to be generated; encodes the 3D object to be generated as low-dimensional structure voxel latent variables, and conditionally denoises the low-dimensional structure voxel latent variables in the latent variable space using a pre-trained diffusion model to obtain target structure voxel latent variables; inputs the target structure voxel latent variables and the query point positions in the latent variable space into a basic decoder to obtain the basic implicit field corresponding to the query point; generates local basis function values corresponding to each constraint point through a constraint basis function network based on the position information of the constraint point set and the query point positions, and spatially sparsifies the local basis function values using a local kernel function; constructs a system of linear equations in the combination coefficients of the basis functions based on the sparsified local basis function values and the target constraint values, and calculates the combination coefficients using a differentiable linear solver; linearly fuses the basic implicit field, the local basis function values, and the solved combination coefficients to obtain the final implicit field; and extracts isosurfaces based on the final implicit field to obtain a 3D model and associated 3D content. This application combines probabilistic 3D semantic generation with deterministic algebraic equality constraints. It can ensure that the generated surface strictly passes through the user-specified constraint points while maintaining the overall semantic rationality and diversity of the generated shape, thus achieving precise local geometric control with low residuals. At the same time, by transforming nonlinear constraints into the solution of a small-scale linear system, the computational efficiency of hard-constrained 3D model generation is significantly improved.

[0025] Based on this, embodiments of this application provide a method for generating 3D content based on key point constraints. Refer to Figure 1, which is a flowchart of the first embodiment of the 3D content generation method based on key point constraints of this application.

[0026] In this embodiment, the 3D content generation method based on key point constraints includes steps S10 to S70:
Step S10: Obtain the set of constraint points and the corresponding target constraint values of the 3D object to be generated.

[0027] It should be noted that, in this embodiment, the three-dimensional object to be generated refers to a virtual three-dimensional model entity that has not yet been fully constructed but whose basic category and some geometric features have been determined; the constraint point set refers to the set of discrete spatial coordinate points that the user has specified in advance and that the generated three-dimensional surface must strictly pass through; the target constraint value refers to the mathematical function value that the generated neural implicit field must accurately reach at each coordinate point in the above constraint point set, for example, the signed distance value of the surface point should be zero.

[0028] This embodiment aims to establish the boundary conditions and reference data for the entire hard-constraint solution mechanism. By obtaining the geometric constraints input by the user, the subsequent generation process can perform its mathematical solution around these constraints, which prevents the generated shape from deviating from the user's precise design intention and decouples macroscopic semantic generation from microscopic geometric accuracy.

[0029] In one possible implementation, the set of constraint points and the corresponding target constraint values of the 3D object to be generated can be obtained by receiving the 3D spatial coordinate points and corresponding attribute parameters that the user sets by clicking in a graphical user interface. In another possible implementation, imported two-dimensional engineering drawings or three-dimensional CAD assembly files can be parsed to automatically extract the center points of mounting holes and the intersection points of contours as the set of constraint points, with the surface constraint conditions automatically converted into the corresponding target constraint values.

[0030] Step S20: Encode the three-dimensional object to be generated as a low-dimensional structure voxel latent variable, and perform conditional denoising on the low-dimensional structure voxel latent variable in the latent variable space using a pre-trained diffusion model to obtain the target structure voxel latent variable.

[0031] It should be noted that, in this embodiment, low-dimensional structure voxel latent variables refer to the low-resolution 3D grid feature tensor that retains the 3D geometric and topological structure features after high-resolution 3D spatial data has been compressed and reduced in dimensionality; the latent variable space refers to the low-dimensional feature manifold in which the low-dimensional structure voxel latent variables lie; the pre-trained diffusion model refers to a generative neural network model that has learned the prior distribution of shapes from a large amount of 3D shape data in advance and can recover plausible shape features through stepwise denoising; conditional denoising generation refers to the process in which externally injected conditional signals guide and control the noise removal of the diffusion model so that it generates latent variable features meeting specific requirements; and target structure voxel latent variables refer to the final low-dimensional feature representation, obtained after conditional denoising generation, that satisfies both the overall semantic prior of the 3D object and the external conditional layout requirements.

[0032] This embodiment aims to reduce computational overhead by leveraging the powerful semantic prior learning capability of the diffusion model to generate a coarse-grained feature base with a reasonable three-dimensional shape structure on a macroscopic level. This process transfers the nonlinear three-dimensional generation task to a low-dimensional latent variable space, greatly improving generation efficiency and providing a good initial morphological foundation for subsequent precise geometric fine-tuning.

[0033] In one possible implementation, encoding the 3D object to be generated as a low-dimensional structure voxel latent variable can be achieved through a convolutional autoencoder network based on a tri-plane feature representation; in another possible implementation, a feature extraction network based on hash grid encoding can also be used to map the 3D shape into a low-dimensional structure voxel latent variable.

[0034] Furthermore, it should be noted that performing conditional denoising generation in the latent variable space avoids the cubic growth in storage and computation caused by generating directly in high-resolution 3D space. For example, in one specific implementation in which the object to be generated is a gear, the system first uses a surface autoencoder to encode the 3D object to be generated into low-resolution, low-dimensional structure voxel latent variables. Then, the constraint point set is converted into a 3D Gaussian heatmap that serves as the conditional signal input to a pre-trained diffusion model. The diffusion model starts from random noise in the latent variable space and gradually performs conditional denoising generation under the guidance of the Gaussian heatmap, finally outputting target structure voxel latent variables whose overall layout conforms to the gear shape and which reserve accurate geometric features in the key point regions.

[0035] Step S30: Input the latent variables of the target structure voxel and the query point position in the latent variable space into the basic decoder to obtain the basic implicit field corresponding to the query point.

[0036] It should be noted that, in this embodiment, the query point location refers to the discrete three-dimensional spatial coordinate point randomly sampled or densely sampled according to specific rules in three-dimensional space, used to detect and reconstruct the three-dimensional shape surface; the basic decoder refers to the neural network module that constitutes the other half of the surface autoencoder, which is responsible for restoring and mapping low-dimensional features back to the geometric field representation in high-dimensional continuous space; the basic implicit field refers to the initial neural implicit field output by the basic decoder based on latent variables and position information before the introduction of hard constraint correction, which represents the continuous distance or occupancy probability from the query point to the estimated surface.

[0037] This embodiment aims to decode low-dimensional macroscopic features into a continuous geometric field in a high-dimensional space, establishing the basic surface morphology of a three-dimensional object without precise constraint intervention.

[0038] In one possible implementation, the latent variables of the target structure voxel and the query point positions in the latent variable space can be input into the basic decoder using a multilayer perceptron network combined with positional encoding techniques to extract features and output the basic implicit field. In another possible implementation, a sparse convolutional network can be used to directly interpolate and decode the neighborhood features of the query point on a 3D grid to obtain the basic implicit field.

[0039] Additionally, it should be noted that while outputting the basic implicit field, the basic decoder also extracts and outputs the intermediate layer features within its network. These intermediate layer features contain rich local geometric structure information and will be directly used for calculating local basis function values. For example, in one specific implementation, the system uniformly samples 10,000 query point positions in the latent variable space. After sinusoidally encoding these query point positions, they are input together with the latent variables of the target structure voxels into the basic decoder composed of a multilayer perceptron. The basic decoder outputs a continuous basic implicit field value for each query point through multilayer nonlinear transformations. This value initially depicts the approximate outline and volume distribution of the gear, but at this point, the surface has not yet strictly passed through the previously set center point of the mounting hole.
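
The sinusoidal encoding of query point positions mentioned in this example can be sketched as follows. The application does not state which frequencies are used; the sketch assumes the common choice of a doubling frequency ladder (π, 2π, 4π, ...) applied to each coordinate, and the function name and number of frequencies are hypothetical.

```python
import math

def sinusoidal_encoding(point, num_frequencies=4):
    """Encode each coordinate with sin/cos at frequencies 2^0 * pi ... 2^(L-1) * pi."""
    features = []
    for coord in point:
        for level in range(num_frequencies):
            freq = (2.0 ** level) * math.pi
            features.append(math.sin(freq * coord))
            features.append(math.cos(freq * coord))
    return features

# One query point; the output has 3 coordinates * 4 frequencies * 2 functions = 24 values.
encoded = sinusoidal_encoding((0.0, 0.5, -0.25))
```

In the pipeline described above, a vector like `encoded` would be combined with the features read from the target structure voxel latent variables before being passed to the multilayer perceptron of the basic decoder.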

[0040] Step S40: Based on the position information of the constraint point set and the query point position, generate local basis function values corresponding to each constraint point through the constraint basis function network, and spatially sparsify the local basis function values using a local kernel function.

[0041] It should be noted that, in this embodiment, the constraint basis function network refers to a specially designed neural network structure that calculates, from the relative spatial relationship between constraint points and query points, a set of basis function values for the subsequent linear correction; the local basis function value refers to the value output by the constraint basis function network that represents how strongly each constraint point influences the surrounding local geometry; the local kernel function refers to a mathematical function with local support in space, whose value decays rapidly to zero as the distance increases; and spatial sparsification processing refers to the step that uses the local kernel function to force the local basis function values to zero in regions far from the constraint points, thereby eliminating global mutual interference between different constraint points.

[0042] This embodiment aims to construct a mathematically decoupled local geometric correction tool. By introducing local basis functions and performing spatial sparsification, the system can ensure that precise adjustment of a specific constraint point will not cause deformation in other irrelevant regions of the model, which greatly improves the numerical stability of the linear solution system.

[0043] In one possible implementation, to generate the local basis function values corresponding to each constraint point through the constraint basis function network, the query point features and constraint token features interact through a cross-attention mechanism, and a small multilayer perceptron network then produces the output. In another possible implementation, the relative spatial coordinates and the distance norm between the query point and the constraint point are calculated directly, concatenated, and input into a one-dimensional convolutional network to output the local basis function values.

[0044] Furthermore, it should be noted that spatial sparsity processing is a necessary mathematical technique to avoid the problem of local geometry being affected and modified globally when directly optimizing network parameters. For example, in a specific implementation, the system extracts the intermediate layer features of the basic decoder and fuses them with the query point position to form query point features. At the same time, the position information of the constraint point set is encoded as constraint tokens. Through a cross-attention mechanism, the two interact and output local basis function values. Subsequently, the system calculates the spatial distance from the query point to the constraint point with each constraint point as the center. When the spatial distance is greater than the preset radius of action, the local basis function value is directly set to zero. When the spatial distance is within the radius of action, it is multiplied by a Gaussian kernel function for attenuation weighting, thereby completing the spatial sparsity processing and ensuring that a certain mounting hole constraint only affects the local geometry around its own hole.
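The truncation and Gaussian weighting described in this example can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `sparsify_basis`, the parameters `radius` and `sigma`, and the sample values are assumptions, and the raw basis values `phi` stand in for the output of the constraint basis function network.

```python
import numpy as np

def sparsify_basis(phi, queries, constraints, radius, sigma):
    """Apply a truncated Gaussian kernel to raw basis values.

    phi:         (Q, K) raw basis values, one column per constraint point
    queries:     (Q, 3) query point coordinates
    constraints: (K, 3) constraint point coordinates
    radius, sigma: hypothetical radius of action and kernel width
    """
    # Pairwise Euclidean distances between query and constraint points, shape (Q, K).
    dist = np.linalg.norm(queries[:, None, :] - constraints[None, :, :], axis=-1)
    # Gaussian attenuation inside the radius of action.
    weight = np.exp(-dist**2 / (2.0 * sigma**2))
    # Hard cut-off: zero contribution beyond the radius of action.
    weight[dist > radius] = 0.0
    return phi * weight

# Hypothetical data: one constraint point and three query points.
queries = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [2.0, 0.0, 0.0]])
constraints = np.array([[0.0, 0.0, 0.0]])
phi = np.ones((3, 1))
out = sparsify_basis(phi, queries, constraints, radius=0.5, sigma=0.2)
```

The query point located at the constraint keeps its full basis value, the nearby point is attenuated, and the distant point receives exactly zero, so the constraint cannot affect geometry outside its radius of action.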

[0045] Step S50: Based on the sparsified local basis function values and the target constraint values, construct a system of linear equations in the basis function combination coefficients, and calculate the combination coefficients using a differentiable linear solver.

[0046] It should be noted that, in this embodiment, the basis function combination coefficients refer to a set of unknown scalar parameters. The weighted sum of the local basis function values with these parameters exactly compensates for the error of the basic implicit field at the constraint points. The linear equation system refers to a set of linear equations in the unknown combination coefficients. This equation system expresses the linear mapping between the geometric error at the constraint points and the basis function correction. The differentiable linear solver refers to a numerical computation module that solves the equations in the forward pass within a computational graph framework while also supporting backpropagation of error gradients.

[0047] This embodiment aims to cleverly transform the complex problem of forced fitting of nonlinear geometric surfaces into a deterministic and easily solvable linear algebraic problem. By constructing and solving this system of linear equations, the system can strictly guarantee in mathematical principle that the final generated shape satisfies the target constraint value at the constraint point, thus eliminating the constraint residual caused by the probability approximation of traditional neural networks.

[0048] In one possible implementation, the differentiable linear solver calculates the combination coefficients by directly solving the above linear equation system using the LU decomposition algorithm; in another possible implementation, for the sake of numerical stability, a damped least squares method with an added Tikhonov regularization term can be used instead.

[0049] Furthermore, it should be noted that since the combination coefficients depend only on the set of constraint points and the target constraint values, and are independent of the position of any query point in space, under the same set of constraints the combination coefficients need to be calculated only once for the entire inference phase and can be reused globally, greatly improving computational efficiency. For example, in a specific implementation, the system substitutes the three previously set mounting hole constraint points into the expression containing the local basis function values to construct a 3×3 coefficient matrix and a right-hand side vector. To prevent matrix singularity, the system adds a small Tikhonov regularization term to the coefficient matrix, and then calls a differentiable linear solver to calculate the three corresponding basis function combination coefficients by damped least squares.
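A minimal sketch of the damped least squares step for three constraint points is given below. The matrix entries, the base field values, and the damping strength are hypothetical, and plain NumPy is used for brevity; it is not differentiable, so in a training pipeline the solver of an automatic differentiation framework would be assumed in its place.

```python
import numpy as np

def solve_coefficients(A, b, damping=1e-8):
    """Damped least squares: minimise ||A a - b||^2 + damping * ||a||^2."""
    n = A.shape[1]
    # Normal equations with a Tikhonov term on the diagonal.
    return np.linalg.solve(A.T @ A + damping * np.eye(n), A.T @ b)

# Hypothetical sparsified basis values at three constraint points
# (rows: constraint points, columns: basis functions).
A = np.array([[1.0, 0.1, 0.0],
              [0.1, 1.0, 0.2],
              [0.0, 0.2, 1.0]])

# Hypothetical basic implicit field values at the three constraint points.
base_at_constraints = np.array([0.05, -0.02, 0.03])
target = np.zeros(3)                 # constraint points lie on the surface

# Right-hand side: the error the correction term has to compensate.
b = target - base_at_constraints
alpha = solve_coefficients(A, b)     # solved once, reusable for every query point
residual = A @ alpha - b
```

With a nonzero damping term the constraints are met up to an error on the order of the damping strength, which is why the regularization is kept small.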

[0050] Step S60: Linearly fuse the basic implicit field, the local basis function values, and the solved combination coefficients to obtain the final implicit field.

[0051] It should be noted that in this embodiment, linear fusion refers to the process of merging and calculating multiple variables directly by algebraic weighted summation without introducing additional nonlinear activation functions; the final implicit field refers to the three-dimensional spatial continuous geometric field representation that mathematically satisfies both shape rationality and absolute constraint accuracy after basic semantic decoding and deterministic algebraic hard constraint correction.

[0052] This embodiment aims to physically integrate the macroscopic semantic basis provided by the diffusion model with the microscopic precise correction provided by the linear solver. This process marks a perfect bridge between probabilistic generative networks and deterministic algebraic structures, producing a high-quality implicit field that has a natural and smooth three-dimensional appearance and low-error geometric properties at specified coordinates.

[0053] In one possible implementation, the linear fusion of the basic implicit field, the local basis function values, and the solved combination coefficients can be achieved by adding the products of the local basis function values and the combination coefficients directly to the basic implicit field as a correction term. In another possible implementation, nonnegativity constraints or range constraints can be applied to the local basis function values before fusion to ensure that the fusion process does not introduce abrupt geometric changes.

[0054] Furthermore, it should be noted that since the linear fusion process uses pre-solved combination coefficients, the system can reuse these coefficients directly to evaluate any newly added dense query point in space, without repeatedly triggering the time-consuming linear solution process. For example, in a specific implementation, for any query point in three-dimensional space, the system takes the basic implicit field value at the query point as the base and obtains the three local basis function values that the three mounting hole constraint points contribute at that query point. These three local basis function values are multiplied by the three previously obtained basis function combination coefficients, and the three products are added directly to the basic implicit field value to obtain the final implicit field value of the query point. The system verifies that the final implicit field value at the center points of the three mounting holes equals the target constraint value of zero.
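The solve-once, reuse-everywhere behaviour can be illustrated with analytic stand-ins. In the sketch below, `base_field` (a unit sphere signed distance) and `basis` (plain Gaussian bumps) are assumptions that replace the basic decoder and the constraint basis function network, and the three constraint coordinates are hypothetical.

```python
import numpy as np

# Hypothetical constraint points that the surface must pass through.
constraints = np.array([[1.2, 0.0, 0.0], [0.0, 1.1, 0.0], [0.0, 0.0, 0.9]])
targets = np.zeros(3)
SIGMA = 0.3  # assumed kernel width

def base_field(x):
    # Stand-in for the basic decoder: signed distance to the unit sphere.
    return np.linalg.norm(x, axis=-1) - 1.0

def basis(x):
    # Stand-in for the sparsified basis functions: one Gaussian bump per constraint, shape (Q, K).
    d = np.linalg.norm(x[:, None, :] - constraints[None, :, :], axis=-1)
    return np.exp(-d**2 / (2.0 * SIGMA**2))

# Solve for the combination coefficients once.
A = basis(constraints)
b = targets - base_field(constraints)
alpha = np.linalg.solve(A, b)

def final_field(x):
    # Linear fusion: base value plus coefficient-weighted basis values.
    return base_field(x) + basis(x) @ alpha

at_constraints = final_field(constraints)   # expected to match the targets
far_point = np.array([[-1.0, 0.0, 0.0]])    # on the sphere, far from every constraint
```

Evaluating `final_field` at new query points needs only a matrix-vector product with the stored `alpha`. At `far_point` the correction is negligible, so the base shape is left essentially unchanged there.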

[0055] Step S70: Based on the final implicit field, isosurface extraction is performed to obtain the three-dimensional model and associated three-dimensional content.

[0056] It should be noted that, in this embodiment, isosurface extraction refers to the process of finding and fitting the geometry of a continuous surface composed of all spatial points with a certain function value (usually zero) in a three-dimensional continuous scalar field; the three-dimensional model refers to a discrete three-dimensional surface data structure that can be directly rendered by computer graphics hardware and processed by industrial manufacturing software, which is composed of the coordinates of vertices in space and the topological relationship of the triangular facets connecting these vertices.

[0057] This embodiment aims to transform abstract continuous mathematical fields into visualized and applicable concrete engineering models. By extracting the zero-level set isosurface from the final implicit field that strictly satisfies hard constraints, the system can output a three-dimensional mesh with smooth edges and absolutely accurate key points.

[0058] In one possible implementation, isosurface extraction based on the final implicit field can be performed using the classic marching cubes algorithm, which interpolates over a voxel grid; in another possible implementation, a marching tetrahedra algorithm based on linear interpolation or a differentiable isosurface extraction algorithm can also be used to generate the 3D model and the 3D content associated with the 3D model.

[0059] Additionally, it should be noted that the generated 3D model typically undergoes post-processing steps such as mesh smoothing or topology repair before output to further improve its industrial usability. For example, in one specific implementation, the system densely samples millions of query points in a 3D space containing the gear shape, inputs these query points into the computational framework to obtain a 3D scalar field of final implicit field values, and then calls the marching cubes algorithm with the isosurface level set to zero. The algorithm locates the vertex positions of the zero-valued isosurface in the scalar field through trilinear interpolation and constructs the topological connections, finally outputting a 3D gear model file containing precise vertex coordinates and triangular facet indices.
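The vertex placement step can be sketched for a single grid edge, where trilinear interpolation reduces to linear interpolation between the two edge endpoints. The sketch covers only this per-edge step, not the full marching cubes case table, and the function name and sample values are illustrative assumptions.

```python
import numpy as np

def edge_zero_crossing(p0, p1, f0, f1, iso=0.0):
    """Linearly interpolate the point on edge p0-p1 where the field equals iso.

    Returns None when both endpoints lie strictly on the same side of the isosurface.
    """
    if (f0 - iso) * (f1 - iso) > 0:
        return None                      # no sign change: the surface does not cross this edge
    if f0 == f1:
        return 0.5 * (p0 + p1)           # both endpoints sit exactly on the isosurface
    t = (iso - f0) / (f1 - f0)           # fractional position of the crossing along the edge
    return p0 + t * (p1 - p0)

# Hypothetical edge of a voxel with field values of opposite sign at its ends.
p0 = np.array([0.0, 0.0, 0.0])
p1 = np.array([1.0, 0.0, 0.0])
vertex = edge_zero_crossing(p0, p1, f0=-0.25, f1=0.75)
no_vertex = edge_zero_crossing(p0, p1, f0=0.1, f1=0.5)
```

Because the vertex is placed where the interpolated field equals zero, a final implicit field that is zero at a constraint point yields mesh vertices near that point, with the accuracy limited by the grid resolution.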

[0060] This application linearly fuses the probabilistic basic implicit field decoded from the output of a diffusion model with deterministic hard constraint correction terms calculated from local neural basis functions and a differentiable linear solver, and then obtains the 3D model and its associated 3D content through isosurface extraction. This scheme combines probabilistic 3D semantic generation with deterministic algebraic equality constraints. Its core effects are as follows: firstly, it overcomes the limitation of traditional soft constraints, which can only "approximate," by mathematically ensuring that the generated surface strictly passes through the specified constraint points, keeping the constraint residual within a low tolerance; secondly, through the spatial sparsification mechanism of local kernel functions, it ensures that precise editing of local key points does not affect or destroy the global geometry in distant regions; furthermore, it transforms complex nonlinear constraints into a small-scale linear equation system, and the combination coefficients need to be pre-calculated only once and can be reused globally during inference. While balancing shape semantic rationality and generation diversity, it significantly improves the generation efficiency and industrial applicability of hard-constrained 3D models.

[0061] Furthermore, before the step of encoding the three-dimensional object to be generated into low-dimensional structure voxel latent variables, the method further includes a pre-training procedure, in which: (1) the three-dimensional training data is normalized, and spatial query points and their corresponding true signed distance values are sampled; (2) the surface autoencoder is trained: the normalized three-dimensional training data is input into the encoder to obtain structure voxel latent variables, the decoder outputs the basic implicit field at the spatial query points, and the signed distance error and gradient consistency error are used as the training loss for optimization; (3) the latent space diffusion model is trained: the structure voxel latent variables output by the surface autoencoder are used as training data and noise is gradually added to them, the spatial query points are processed by the decoder, the training constraint points are converted into a three-dimensional Gaussian heatmap from which the constraint encoder extracts conditional features that are injected into the diffusion network, and the noise prediction loss or latent variable reconstruction loss is used as the training objective for optimization; (4) the hard constraint basis function layer is trained: the parameters of the surface autoencoder are frozen, a linear equation system in the basis function combination coefficients is constructed from the training constraint points, the combination coefficients are calculated by a differentiable linear solver, and the signed distance loss of the final implicit field obtained with these coefficients is used for parameter optimization. In addition, at least two of the trained surface autoencoder, latent space diffusion model, constraint encoder, and hard constraint basis function layer can be jointly fine-tuned.

[0062] In one possible implementation, refer to Figure 2a and Figure 2b, which show the difference between a conventional surface autoencoder and the hard-constrained surface autoencoder of this invention. In Figure 2a, a typical surface autoencoder includes an encoder E and a decoder. It obtains latent variables from the shape input and, combined with the positional encoding of the query point, outputs a neural implicit field. In Figure 2b, the hard-constrained surface autoencoder adds a constraint layer after the decoder and takes an additional constraint set as input. This means that the final neural field not only comes from the latent variables and the positional encoding, but is also controlled by the solution of the hard constraint equations.

[0063] In one implementation, the input 3D shape can be a mesh, point cloud, or voxel. The system first normalizes the input shape to a unified coordinate system and samples spatial query points together with their true signed distance values. The encoder encodes the 3D shape as structure latent variables, and the decoder outputs the basic implicit field from the latent variables and the query point positions. When user constraints exist, the constraint layer solves for the combination coefficients based on the constraint points, the target function values, and the basis function outputs, thereby obtaining the final implicit field that satisfies the hard constraints.

[0064] In one possible implementation, refer to Figure 3. The constraint basis function layer receives two types of input: one is the positional encoding of the set of constraint points; the other is the positional encoding of the query point together with the basic decoder features. The constraint point positional encodings pass through the constraint encoder to form constraint tokens. For each query point, the basic decoder provides intermediate layer features, and cross-attention or an equivalent feature fusion structure lets the query point features interact with the constraint tokens. Finally, a small multilayer perceptron outputs the basis function values.

[0065] In one specific implementation, each basis function $\phi_j$ can consist of two MLP layers or one-dimensional convolutional layers. Its input can include features from the penultimate layer of the basic decoder, the relative position between the query point and the corresponding constraint point, the distance norm, and the constraint token features. To enhance local controllability, the basis function output is multiplied by a Gaussian kernel $\kappa_j$ centered at the constraint point. Therefore, the final implicit field is: $f(x) = f_0(x) + \sum_{j} \alpha_j\,\kappa_j(x)\,\phi_j(x)$, where $f_0$ is the basic implicit field and $\alpha_j$ are the combination coefficients. When the constraint set is $\{c_i\}_{i=1}^{N}$ and the target constraint values are $\{v_i\}_{i=1}^{N}$, substituting each constraint point into the above expression yields the following system of linear equations: $f_0(c_i) + \sum_{j} \alpha_j\,\kappa_j(c_i)\,\phi_j(c_i) = v_i$, $i = 1, \dots, N$. Let $A_{ij} = \kappa_j(c_i)\,\phi_j(c_i)$ and $b_i = v_i - f_0(c_i)$; the system can then be written as $A\alpha = b$. The system can be solved using LU decomposition, QR decomposition, a least squares solver, or a regularized differentiable linear solver. When the constraint point is a point on the target surface, letting $v_i = 0$ gives $f(c_i) = 0$, which ensures that the surface strictly passes through the constraint point.

[0066] To avoid matrix singularity or ill-conditioning, this scheme may employ one or more of the following measures: deduplicating and normalizing the constraint points; selecting a number of basis functions that matches or exceeds the number of constraint points; adding small-amplitude Tikhonov regularization to the coefficient matrix of the linear system; using Gaussian kernels to limit the range of action of the basis functions; applying nonnegativity constraints or range constraints to the basis function outputs; and clipping the training gradients. None of these measures changes the core mechanism of this invention, which strictly satisfies hard constraints through a linear combination of basis functions.

[0067] Furthermore, the step of encoding the three-dimensional object to be generated into low-dimensional structure voxel latent variables includes A201~A203: Step A201: Normalize the input 3D shape and map the normalized 3D shape to a unified coordinate system; Step A202: Sample spatial query points in the unified coordinate system and obtain the true signed distance values corresponding to the spatial query points; Step A203: Input the normalized three-dimensional shape into the encoder of the surface autoencoder, extract the three-dimensional geometric features through the encoder, and output the low-resolution structure voxel latent variables.

[0068] It should be noted that in this embodiment, the 3D shape refers to the 3D object input to the system for processing, and its data format includes, but is not limited to, mesh, point cloud, or voxel. Normalization refers to performing translation, rotation, and scaling operations on the input 3D shape to map it to a unified coordinate system, with the aim of eliminating scale, position, and pose differences between 3D data from different sources. The unified coordinate system refers to a standard spatial reference system set for all 3D shapes. After the coordinate system has been unified, the system samples spatial query points in this space. Spatial query points refer to discrete coordinate points collected in 3D space according to specific rules. At the same time, the system obtains the true signed distance values corresponding to these spatial query points. The true signed distance value refers to the shortest distance between a point in space and the true surface of the 3D shape, carrying a positive or negative sign to distinguish whether the point is located inside or outside the shape. These values constitute the supervision signal for neural implicit field learning. The encoder in a surface autoencoder refers to a neural network module specifically designed to compress high-dimensional 3D data into a low-dimensional feature space. It extracts 3D geometric features and ultimately outputs low-resolution structure voxel latent variables. The low-resolution structure voxel latent variables refer to a low-dimensional feature matrix with a 3D grid topology. These latent variables preserve the geometric structure of the original 3D shape while significantly reducing the storage overhead, which grows cubically with resolution, and the computational cost of processing directly in high-resolution 3D space, laying an efficient data foundation for latent space diffusion generation.

[0069] In one possible implementation, the normalization process can specifically involve translating the centroid of the 3D shape to the origin of the unified coordinate system and scaling the shape so that its farthest boundary point lies on a sphere of preset radius centered at the origin. In another possible implementation, the sampling of spatial query points can employ a hybrid strategy that samples densely near the surface of the 3D shape according to a Gaussian distribution while sampling sparsely at random in spatial regions far from the surface, and that also calculates the surface normal information of the sampling points to assist subsequent gradient consistency training. In yet another possible implementation, the encoder of the surface autoencoder can adopt a network structure composed of multiple 3D convolutional layers and downsampling layers, or a sparse convolutional network structure capable of efficiently processing irregular data, to extract 3D geometric features and output low-resolution structure voxel latent variables.
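The centroid-and-radius normalization from the first implementation can be sketched as follows. As simplifying assumptions, the shape is given as a point array and the centroid is taken as the mean of those points (an area-weighted centroid may be preferable for meshes); the function name and the sample coordinates are hypothetical.

```python
import numpy as np

def normalize_shape(points, radius=1.0):
    """Translate the centroid to the origin and scale the farthest point onto a sphere.

    points: (N, 3) array of vertex or point-cloud coordinates
    radius: preset radius of the bounding sphere in the unified coordinate system
    """
    centroid = points.mean(axis=0)
    centered = points - centroid
    max_dist = np.linalg.norm(centered, axis=1).max()
    if max_dist == 0.0:
        raise ValueError("degenerate shape: all points coincide")
    scale = radius / max_dist
    # Return the transform as well so that constraint points can be mapped the same way.
    return centered * scale, centroid, scale

# Hypothetical input shape given as four points.
points = np.array([[2.0, 0.0, 0.0], [4.0, 0.0, 0.0],
                   [3.0, 1.0, 0.0], [3.0, -1.0, 0.0]])
normalized, centroid, scale = normalize_shape(points, radius=0.5)
```

Returning `centroid` and `scale` allows user-specified constraint points to be mapped into the same unified coordinate system before they are encoded.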

[0070] Furthermore, the step of conditionally denoising the low-dimensional structure voxel latent variables in the latent variable space using a pre-trained diffusion model to obtain the target structure voxel latent variables includes A301~A303: Step A301: Convert the set of constraint points into a local Gaussian heatmap in three-dimensional space; Step A302: Input the local Gaussian heatmap into the constraint encoder to extract the constraint condition features; Step A303: The constraint features are injected into the intermediate layer of the denoising network of the diffusion model through an attention mechanism or a control network branch to obtain the latent variables of the target structure voxels.

[0071] It should be noted that, in this embodiment, when the system performs conditional denoising generation in the latent variable space, it first converts the set of constraint points into a local Gaussian heatmap in three-dimensional space. The set of constraint points refers to the discrete spatial coordinate points, specified by the user, that the generated three-dimensional surface must strictly pass through. The local Gaussian heatmap refers to a three-dimensional probability density distribution map with spatially smooth decay generated around these constraint points. Its purpose is to replace traditional sparse binary point markers, thereby preserving dense and continuous positional information during three-dimensional convolution propagation. Subsequently, the system inputs this local Gaussian heatmap into a constraint encoder to extract constraint features. The constraint encoder is a neural network module specifically designed to analyze the spatial heatmap distribution and extract geometric layout priors. Constraint features are high-dimensional feature vectors that characterize the spatial topology of key points and the constraint intent. Finally, the system injects the constraint features into the intermediate layer of the denoising network of the diffusion model through an attention mechanism or control network branch to obtain the latent variables of the target structure voxels. The intermediate layer of the denoising network refers to the core network layer within the diffusion model that progressively predicts and removes noise. The attention mechanism or control network branch (such as a 3D ControlNet) refers to the network structure used to fuse external condition features with generation process features across space. The latent variables of the target structure voxels refer to the final low-dimensional feature matrix that conforms to the overall semantic prior of the 3D shape and is guided by the keypoint layout. This embodiment introduces continuous heatmap constraints, guiding the generation of latent variables at a macroscopic level without compromising the original generation capabilities of the diffusion model. This enables the diffusion model to generate a coarse-grained feature base that meets the overall layout requirements, providing a well-adapted initial geometric shape for the subsequent exact solution of the hard constraints.

[0072] In one possible implementation, converting the set of constraint points into a local Gaussian heatmap can be achieved by spatially convolving each constraint point location with a Gaussian kernel function of fixed variance, generating an independent Gaussian thermal distribution channel for each constraint point. In another possible implementation, injecting constraint features into the intermediate layer of the denoising network can specifically employ zero-convolution operations, directly superimposing the feature map output by the constraint encoder onto the corresponding feature resolution layer of the denoising network element-wise. In yet another possible implementation, a cross-attention mechanism can be used, where the feature sequence extracted by the current layer of the denoising network is used as a query vector, and the constraint feature sequence is used as a key-value pair for deep interaction and fusion, thereby achieving precise guidance for the generated shape space layout without retraining the entire massive diffusion model.
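A minimal sketch of the per-constraint heatmap channels follows. Evaluating a Gaussian centered at each constraint point on the voxel grid is equivalent to convolving a point marker with a fixed-variance Gaussian kernel. The grid extent `[-1, 1]^3`, the resolution, and `sigma` are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_heatmaps(constraints, resolution=16, sigma=0.1):
    """Return a (K, R, R, R) array with one Gaussian heatmap channel per constraint point.

    The voxel grid is assumed to span [-1, 1]^3 in the unified coordinate system.
    """
    axis = np.linspace(-1.0, 1.0, resolution)
    # Voxel center coordinates, shape (R, R, R, 3).
    grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
    # Squared distance from every voxel to every constraint point, shape (K, R, R, R).
    diff = grid[None] - constraints[:, None, None, None, :]
    sq_dist = np.sum(diff**2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma**2))

# Hypothetical constraint points in the unified coordinate system.
constraints = np.array([[0.5, 0.0, 0.0], [-0.5, 0.5, 0.0]])
heat = gaussian_heatmaps(constraints, resolution=17, sigma=0.2)
peak = np.unravel_index(np.argmax(heat[0]), heat[0].shape)
```

Each channel peaks at the voxel nearest its constraint point and decays smoothly around it, which is the dense positional signal the constraint encoder consumes.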

[0073] In one feasible implementation, when generating a 3D model for a certain type of precision connector, the system obtains the coordinates of the center points of four key assembly holes specified by the user as a set of constraint points. The system then generates four local Gaussian heatmaps with smooth decay characteristics, centered on these four center points, in a unified 3D coordinate system corresponding to the low-dimensional structural voxel latent variables. These heatmaps visually represent the dense distribution of positions around the constraint points in the form of a low-resolution voxel matrix. The system inputs this Gaussian heatmap voxel matrix into a constraint encoder composed of a multi-layer 3D convolutional network to extract constraint features containing the spatial relative positional relationships of the four holes. Subsequently, the system uses a pre-built 3D ControlNet as a control network branch to precisely inject these constraint features into the intermediate layer of the denoising network of a pre-trained diffusion model through a zero-convolution operation. After receiving this strong spatial layout guidance, the denoising network gradually performs conditional denoising generation in the latent variable space, starting from random noise, and finally outputs a target structural voxel latent variable that not only conforms to the connector category characteristics in its overall shape but also perfectly reserves the macroscopic layout trend of the four assembly holes in their corresponding spatial positions.

[0074] Furthermore, referring to Figure 4, the step of inputting the latent variables of the target structure voxels and the query point position in the latent variable space into the basic decoder to obtain the basic implicit field corresponding to the query point includes A401~A403: Step A401: Perform positional encoding on the query point location to obtain the query point location feature; Step A402: Extract the spatial geometric features corresponding to the query point from the latent variables of the target structure voxels, and fuse the spatial geometric features with the query point location features; Step A403: Input the fused features into the network layer of the basic decoder for forward propagation, output the basic implicit field corresponding to the query point, and extract the intermediate layer features of the basic decoder.

[0075] It should be noted that in this embodiment, the basic decoder refers to the neural network module that constitutes part of the hard-constrained surface autoencoder, responsible for restoring low-dimensional features back to a high-dimensional continuous space. The system first performs positional encoding on the query point location to obtain high-frequency spatial information, resulting in query point location features. Then, it extracts the spatial geometric features corresponding to the query point from the latent variables of the target structure voxels, and fuses these spatial geometric features with the query point location features. Finally, the fused features are input into the network layer of the basic decoder for forward propagation, outputting the basic implicit field corresponding to the query point, and simultaneously extracting the intermediate layer features of the basic decoder. This embodiment endows the network with the ability to perceive high-frequency spatial details through positional encoding and provides a global shape prior using latent variables. The deep fusion of these two features enables the decoder to construct a smooth and globally reasonable geometric field in a high-dimensional continuous space. This basic implicit field, as the basis for hard-constrained algebraic correction, determines the smoothness and semantic reasonableness of the final shape. The extracted intermediate layer features contain rich local three-dimensional geometric structure information and are an indispensable key input source for the constraint basis function network to generate local basis function values.

[0076] In one possible implementation, the location encoding of the query point can be performed using sinusoidal location encoding technology, which maps low-dimensional three-dimensional spatial coordinates to a high-dimensional space, thereby effectively breaking the low-frequency bias phenomenon that traditional neural networks are prone to when processing spatial coordinates. In another possible implementation, the spatial geometric features extracted from the latent variables of the target structure voxels can be obtained using a trilinear interpolation algorithm, which obtains continuous feature vectors at the query point on a low-resolution voxel grid and directly concatenates and fuses these vectors with the query point location features in the channel dimension. In yet another possible implementation, the network layer of the basic decoder can specifically adopt a multi-layer residual fully connected network with skip connections, and the extracted intermediate layer features can be specifically extracted from the output of the penultimate hidden layer of this fully connected network to ensure that the features contain both global context information and retain sufficient local geometric representation capabilities.
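The first two implementations can be sketched together: sinusoidal encoding of the query coordinates, trilinear sampling of the latent voxel grid, and channel-wise concatenation. The number of frequency bands, the grid extent `[-1, 1]^3`, and the toy latent grid are assumptions made for illustration only.

```python
import numpy as np

def sinusoidal_encoding(x, num_freqs=4):
    """Map (Q, 3) coordinates to (Q, 3 * 2 * num_freqs) sine and cosine features."""
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi
    angles = x[:, :, None] * freqs[None, None, :]                    # (Q, 3, F)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)  # (Q, 3, 2F)
    return enc.reshape(x.shape[0], -1)

def trilinear_sample(grid, x):
    """Sample an (R, R, R, C) feature grid spanning [-1, 1]^3 at (Q, 3) points."""
    r = grid.shape[0]
    u = (np.clip(x, -1.0, 1.0) + 1.0) * 0.5 * (r - 1)   # continuous voxel index
    i0 = np.clip(np.floor(u).astype(int), 0, r - 2)     # lower corner of the enclosing cell
    t = u - i0                                          # fractional offset in [0, 1]
    out = np.zeros((x.shape[0], grid.shape[-1]))
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                wx = t[:, 0] if dx else 1.0 - t[:, 0]
                wy = t[:, 1] if dy else 1.0 - t[:, 1]
                wz = t[:, 2] if dz else 1.0 - t[:, 2]
                corner = grid[i0[:, 0] + dx, i0[:, 1] + dy, i0[:, 2] + dz]
                out += (wx * wy * wz)[:, None] * corner
    return out

# Toy latent grid with one channel equal to the x coordinate of each voxel.
axis = np.linspace(-1.0, 1.0, 5)
grid = np.broadcast_to(axis[:, None, None, None], (5, 5, 5, 1)).copy()
query = np.array([[0.3, -0.7, 0.1]])

# Fuse the sampled latent feature with the positional encoding along the channel dimension.
fused = np.concatenate([trilinear_sample(grid, query), sinusoidal_encoding(query)], axis=-1)
```

Since trilinear interpolation reproduces linear functions exactly, sampling the toy grid at the query point returns its x coordinate, which gives a simple check of the indexing.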

[0077] Further, referring to Figure 5, the step of generating local basis function values corresponding to each constraint point through a constraint basis function network based on the location information of the constraint point set and the location of the query point includes A501~A503: Step A501: Encode the position information of the constraint point set to generate constraint tokens; Step A502: Fuse the intermediate layer features of the basic decoder with the query point location to obtain the query point features; Step A503: Make the query point features interact with the constraint tokens through a cross-attention mechanism, and output the local basis function values through a multilayer perceptron network.

[0078] It should be noted that, in this embodiment, the location information of the constraint point set refers to the sequence of discrete spatial coordinate points, specified by the user, that the 3D surface is required to strictly pass through; the constraint token refers to the high-dimensional feature vector sequence obtained by feature mapping of this location information, used to uniformly represent the spatial attributes and constraint intent of each constraint point in the feature space; the intermediate layer features of the basic decoder refer to the hidden layer tensors generated by the preceding decoder network while outputting the basic implicit field, which contain rich 3D geometric structure and global shape context information; the query point position refers to the discrete spatial coordinates sampled in 3D space for probing and reconstructing the shape surface; the query point feature refers to the combined feature vector formed by fusing the intermediate layer features of the basic decoder with the query point position, which simultaneously represents the global shape prior and the specific probing position; the cross-attention mechanism refers to a neural network structure in which one feature sequence serves as queries and another feature sequence serves as keys and values for deep information interaction. This embodiment first abstracts the discrete constraint point locations into constraint tokens within a continuous feature domain. Simultaneously, it uses the deep geometric features extracted by the basic decoder to enhance the feature representation of the query point. Subsequently, a cross-attention mechanism enables adaptive matching and interaction between the query point features and the constraint tokens from a global perspective, allowing each query point to perceive the distribution of surrounding constraint points. Finally, a multilayer perceptron network maps this spatial interaction into specific numerical values. These local basis function values characterize the differing geometric influence of each constraint point at different locations in space.

[0079] In one possible implementation, the location information of the constraint point set is encoded to generate constraint tokens. This can be achieved using an embedding network with sinusoidal positional encoding transformation, which maps the low-dimensional spatial coordinate sequence into a high-dimensional token feature matrix. In another possible implementation, the intermediate layer features of the base decoder are fused with the query point location. This can be achieved by directly concatenating the features along the feature channel dimension to maximize the preservation of the original geometric information extracted by the preceding network. In yet another possible implementation, the multilayer perceptron network that outputs local basis function values ​​can be replaced with a one-dimensional convolutional network. The sliding operation of the convolutional kernel can efficiently capture local pattern changes in the interaction features between the query point and the constraint token.
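As a rough illustration of steps A501~A503, the following NumPy sketch encodes constraint points into tokens with a sinusoidal positional encoding, fuses decoder features with the encoded query position by concatenation, and applies single-head cross-attention followed by a small perceptron. The function names (`sinusoidal_encode`, `constraint_basis_values`), the `params` dictionary, the layer sizes, the random weights, and the way one value is produced per query/constraint pair are all assumptions made for illustration; the source does not specify them.

```python
import numpy as np

def sinusoidal_encode(xyz, num_freqs=4):
    """Map (N, 3) coordinates to (N, 6 * num_freqs) sinusoidal features."""
    freqs = 2.0 ** np.arange(num_freqs)                  # (F,)
    angles = xyz[:, :, None] * freqs[None, None, :]      # (N, 3, F)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return feats.reshape(xyz.shape[0], -1)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def constraint_basis_values(query_xyz, decoder_feats, constraint_xyz, params):
    """Return a (Q, K) array: one raw basis value per query/constraint pair."""
    # A501: constraint positions -> constraint tokens, shape (K, D)
    tokens = sinusoidal_encode(constraint_xyz) @ params["W_tok"]
    # A502: concatenate decoder features with the encoded query position
    fused = np.concatenate([decoder_feats, sinusoidal_encode(query_xyz)], axis=1)
    q = fused @ params["W_q"]                            # (Q, D)
    # A503: cross-attention, query features attend to constraint tokens
    k = tokens @ params["W_k"]
    v = tokens @ params["W_v"]
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)   # (Q, K)
    context = attn @ v                                   # (Q, D)
    # Assumed pairing: combine each query context with each token, then an MLP
    pair = context[:, None, :] + tokens[None, :, :]      # (Q, K, D)
    hidden = np.maximum(pair @ params["W1"], 0.0)        # ReLU layer
    return (hidden @ params["W2"])[..., 0]               # (Q, K)

rng = np.random.default_rng(0)
D, C, H, E = 16, 8, 32, 24   # token dim, decoder feature dim, hidden dim, encoding dim
params = {
    "W_tok": 0.1 * rng.normal(size=(E, D)),
    "W_q": 0.1 * rng.normal(size=(C + E, D)),
    "W_k": 0.1 * rng.normal(size=(D, D)),
    "W_v": 0.1 * rng.normal(size=(D, D)),
    "W1": 0.1 * rng.normal(size=(D, H)),
    "W2": 0.1 * rng.normal(size=(H, 1)),
}
constraint_xyz = rng.uniform(-1, 1, size=(5, 3))    # K = 5 constraint points
query_xyz = rng.uniform(-1, 1, size=(10, 3))        # Q = 10 query points
decoder_feats = rng.normal(size=(10, C))            # stand-in for decoder features
phi = constraint_basis_values(query_xyz, decoder_feats, constraint_xyz, params)
print(phi.shape)
```

The output has one column per constraint point, which is the shape the later sparsification and linear-solve steps operate on.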

[0080] Furthermore, the step of spatially sparsifying the local basis function values ​​using a local kernel function includes A601~A603: Step A601: Calculate the spatial distance between the query point and the constraint point, with each constraint point as the center. Step A602: When the spatial distance is greater than the preset radius of action, the local basis function value of the corresponding constraint point is set to zero; Step A603: When the spatial distance is less than or equal to the radius of action, the local basis function value is attenuated and weighted using a Gaussian kernel function or a radial basis kernel function.

[0081] It should be noted that, in this embodiment, the local basis function value refers to the numerical value output by the constraint basis function network, representing the intensity of the influence of each constraint point on the surrounding local spatial geometry. This embodiment first calculates the spatial distance between the query point and the constraint point, centered on each constraint point; the spatial distance refers to the Euclidean distance between the coordinates of the query point and the constraint point in three-dimensional space; then, when the spatial distance is greater than a preset radius of action, the local basis function value of the corresponding constraint point is set to zero; the preset radius of action refers to a distance threshold artificially set to define the local spatial boundary where the constraint point generates an effective geometric influence; when the spatial distance is less than or equal to the radius of action, the local basis function value is attenuated and weighted using a Gaussian kernel function or a radial basis kernel function; the Gaussian kernel function is a radially symmetric function with a bell-shaped curve that can generate a smooth, continuously decaying weight within the radius of action based on distance; the radial basis kernel function is a real-valued function that depends only on the distance between two points in space. This embodiment introduces a spatial truncation mechanism and a distance decay mechanism to ensure that each basis function mainly affects the local region near its corresponding constraint point. 
The purpose is to decouple the global relationship between different constraint points from the mathematical structure level, which can significantly reduce the global correlation between different constraint basis functions, greatly improve the solvability and numerical stability of the constructed linear equation system, and avoid the geometric deformation of other distant irrelevant regions of the model when a local constraint point is precisely adjusted, thus realizing true local fine-grained and controllable editing.
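Steps A601~A603 can be sketched as follows, assuming a Gaussian kernel with a hard cutoff at the radius of action. The function name `sparsify_basis` and the parameters `radius` and `sigma` are illustrative choices, not values taken from the source.

```python
import numpy as np

def sparsify_basis(phi, query_xyz, constraint_xyz, radius, sigma):
    """Apply a truncated Gaussian kernel to raw basis values phi of shape (Q, K)."""
    # A601: Euclidean distance between every query point and every constraint point
    diff = query_xyz[:, None, :] - constraint_xyz[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)                  # (Q, K)
    # A603: Gaussian attenuation inside the radius of action
    weight = np.exp(-dist ** 2 / (2.0 * sigma ** 2))
    # A602: zero out contributions beyond the radius of action
    weight = np.where(dist > radius, 0.0, weight)
    return phi * weight

constraint_xyz = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
query_xyz = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.6, 0.0, 0.0]])
phi = np.ones((3, 2))                                     # placeholder raw basis values
sparse_phi = sparsify_basis(phi, query_xyz, constraint_xyz, radius=0.3, sigma=0.1)
print(sparse_phi)
```

In this example the third query point lies outside the radius of both constraint points, so its row is entirely zero, which is the locality property the paragraph above relies on.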

[0082] Furthermore, it should be noted that there are several different implementation methods for the specific parameter selection and kernel function application of spatial sparsity processing. In one possible implementation, the preset radius of action can be dynamically calculated proportionally based on the shortest neighbor distance between each constraint point in the constraint point set, so as to flexibly prevent excessive overlap of the local action regions of different constraint points, which could lead to ill-conditioned linear systems. In another possible implementation, when using a Gaussian kernel function for attenuation weighting, the variance parameter in the Gaussian kernel function can be used as a learnable parameter and automatically optimized and updated by the system through backpropagation during the network training iteration, so as to accurately adapt to geometries with different complexities and curvature variations. In yet another possible implementation, in addition to using a Gaussian kernel function and a conventional radial basis function, the system can also use a compactly supported radial basis function or a B-spline kernel function to achieve spatial sparsity processing, thereby obtaining more stringent truncation characteristics and different smoothness properties at the boundary of the radius of action.
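Two of the variants above can be illustrated briefly: a per-point radius of action proportional to the nearest-neighbor distance, and a compactly supported radial basis function. The sketch below uses the Wendland C2 function as one well-known compactly supported kernel; the source does not name a specific one, and the `scale` factor and function names are assumptions.

```python
import numpy as np

def adaptive_radii(constraint_xyz, scale=0.5):
    """Radius of action per constraint point, proportional to its nearest-neighbor
    distance. Requires at least two constraint points."""
    diff = constraint_xyz[:, None, :] - constraint_xyz[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)        # ignore the distance of a point to itself
    return scale * dist.min(axis=1)

def wendland_c2(dist, radius):
    """Wendland C2 kernel: equals 1 at dist = 0 and is exactly 0 for dist >= radius."""
    r = np.clip(dist / radius, 0.0, 1.0)
    return (1.0 - r) ** 4 * (4.0 * r + 1.0)

points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
radii = adaptive_radii(points)
weights = wendland_c2(np.array([0.0, 0.5, 1.0, 2.0]), radius=1.0)
print(radii, weights)
```

Unlike a truncated Gaussian, this kernel reaches zero smoothly at the boundary of the radius of action, so no separate cutoff step is needed.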

[0083] Furthermore, the step of calculating the combination coefficients using a differentiable linear solver includes steps A701~A703: Step A701: substitute each constraint point into the implicit field expression containing the local basis function values to construct a linear equation system matrix and a right-hand-side vector, with the combination coefficients as unknowns; Step A702: add a Tikhonov regularization term to the matrix of the linear equation system; Step A703: solve the regularized linear equation system using the damped least squares method to obtain the combination coefficients.
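One way to write out the linear system of steps A701~A702 is given below. The symbols are assumptions introduced here for illustration and do not appear in the source: $f_0$ is the basic implicit field, $\tilde{\phi}_j$ is the sparsified local basis function of the $j$-th constraint point, $p_i$ and $t_i$ are the $i$-th constraint point and its target constraint value, $c_j$ are the combination coefficients, $K$ is the number of constraint points, and $\lambda$ is a small positive regularization coefficient.

```latex
f(x) = f_0(x) + \sum_{j=1}^{K} c_j \, \tilde{\phi}_j(x), \qquad
A_{ij} = \tilde{\phi}_j(p_i), \qquad
b_i = t_i - f_0(p_i), \qquad
(A + \lambda I)\, c = b .
```

Requiring $f(p_i) = t_i$ for every constraint point gives $A c = b$; the term $\lambda I$ is the diagonal regularization described in step A702.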

[0084] It should be noted that, in this embodiment, the combination coefficients refer to the unknown scalar parameters used to weight and sum the local basis function values ​​to compensate for the underlying implicit field error after the nonlinear surface constraints are transformed into linear corrections; the implicit field expression refers to the mathematical formula that expresses the final implicit field as the sum of the linear combination of the basic decoder output and several local neural basis functions; the linear equation system matrix refers to the coefficient matrix composed of the local basis function values ​​corresponding to each constraint point position arranged in a specific order; the right-hand side vector refers to the constant vector composed of the difference between the user-specified target constraint value and the basic implicit field value output by the basic decoder at the constraint point; the Tikhonov regularization term refers to the mathematical means of improving the matrix condition number by adding a small regularization coefficient on the diagonal of the linear equation system matrix; the damped least squares method refers to a numerical calculation algorithm that introduces a damping factor during the solution process to avoid matrix singularity or ill-conditioned problems that could lead to solution collapse. This embodiment cleverly transforms the complex nonlinear three-dimensional geometric surface forced fitting problem into a deterministic and easily solvable linear algebra problem. The aim is to accurately calculate the combination coefficients that can offset the underlying implicit field bias by constructing and solving this system of linear equations. 
This is intended to strictly guarantee, in terms of mathematical principles, that the final generated shape satisfies the target constraint value at the constraint point, eliminating the constraint residuals caused by the probability approximation of traditional neural networks. At the same time, through the combination of regularization and specific solution algorithms, matrix singularities or ill-conditioned phenomena in numerical calculations are effectively avoided, ensuring the numerical stability of the entire network training during end-to-end gradient backpropagation.
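A minimal NumPy sketch of steps A702~A703 follows, assuming the regularization is added to the diagonal and the damped least squares step is implemented through damped normal equations. The function name and the values of `lam` and `damping` are illustrative assumptions. NumPy itself provides no automatic differentiation; in an autodiff framework the same solve could be made differentiable. With a nonzero regularization coefficient the constraint equations hold only up to a small residual, so the choice of `lam` is a trade-off between conditioning and exactness.

```python
import numpy as np

def solve_combination_coefficients(A, b, lam=1e-8, damping=1e-10):
    """Solve the regularized constraint system for the combination coefficients."""
    k = A.shape[0]
    # A702: Tikhonov term, a small multiple of the identity added to the diagonal
    A_reg = A + lam * np.eye(k)
    # A703: damped least squares via (A_reg^T A_reg + damping * I) c = A_reg^T b
    lhs = A_reg.T @ A_reg + damping * np.eye(k)
    return np.linalg.solve(lhs, A_reg.T @ b)

# Hypothetical 3-constraint system: A[i, j] is basis j evaluated at constraint point i,
# b[i] is the target value minus the basic implicit field at constraint point i.
A = np.array([[1.0, 0.2, 0.0],
              [0.2, 1.0, 0.1],
              [0.0, 0.1, 1.0]])
b = np.array([0.05, -0.02, 0.01])
c = solve_combination_coefficients(A, b)
residual = np.abs(A @ c - b).max()
print(c, residual)
```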

[0085] Furthermore, it should be noted that the construction and solution of the aforementioned linear equation system can be implemented in several different ways. In one possible implementation, adding a Tikhonov regularization term to the linear equation system matrix can specifically involve multiplying the identity matrix by a very small positive scalar and then superimposing it onto the main diagonal elements of the original linear equation system matrix. In another possible implementation, in addition to using damped least squares, the system can also use direct solution algorithms such as LU decomposition or QR decomposition to quickly solve the linear equation system after adding the regularization term. In yet another possible implementation, if faced with a large number of constraint points or extremely complex constraint distribution leading to a large matrix size, the system can use the conjugate gradient method or SVD singular value decomposition algorithm to robustly solve the high-dimensional linear equation system.
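The alternative direct solvers mentioned above can be compared on a small regularized system. The sketch below uses only standard NumPy routines (`np.linalg.solve`, which relies on an LU factorization, `np.linalg.qr`, and `np.linalg.svd`); the matrix, right-hand side, and regularization coefficient are made-up example values.

```python
import numpy as np

A = np.array([[1.0, 0.3, 0.0],
              [0.3, 1.0, 0.2],
              [0.0, 0.2, 1.0]])
b = np.array([0.04, -0.01, 0.02])
lam = 1e-8
A_reg = A + lam * np.eye(3)            # identity times a small positive scalar

# LU-based direct solve
c_lu = np.linalg.solve(A_reg, b)

# QR decomposition: A_reg = Q R, then solve R c = Q^T b
Q, R = np.linalg.qr(A_reg)
c_qr = np.linalg.solve(R, Q.T @ b)

# SVD: A_reg = U diag(s) V^T, then c = V diag(1/s) U^T b
U, s, Vt = np.linalg.svd(A_reg)
c_svd = Vt.T @ ((U.T @ b) / s)

print(c_lu, c_qr, c_svd)
```

For a well-conditioned system such as this one, all three routes return the same coefficients up to rounding error.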

[0086] Furthermore, the step of linearly fusing the basic implicit field, the local basis function values, and the solved combination coefficients to obtain the final implicit field includes: weighting and summing the local basis function values using the combination coefficients to output the constraint offset field; and adding the basic implicit field and the constraint offset field point by point to output the final implicit field that satisfies the target constraint value at each constraint point.

[0087] It should be noted that in this embodiment, the constraint offset field refers to a spatial scalar field specifically used to compensate for deviations in the underlying implicit field, which is composed of a weighted sum of local basis function values ​​and combination coefficients; the final implicit field refers to a three-dimensional geometric field that, after superposition correction, possesses both global shape rationality and absolute accuracy at the constraint points. This embodiment uses pure algebraic superposition rather than nonlinear neural network layers to achieve geometric correction. The purpose is to decouple the probabilistically generated smooth basis from the precise compensation obtained through deterministic algebraic solution at the mathematical level and then reassemble them, thereby reducing the fitting residuals of traditional generative models at key points without destroying the original smooth geometric topology of the underlying implicit field.
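The fusion step can be checked end to end on a toy problem. In the sketch below the basic decoder is replaced by the signed distance function of a unit sphere and the constraint basis function network by plain Gaussian bumps centered on the constraint points; both substitutions, as well as the function names and `sigma`, are assumptions made purely to keep the example self-contained.

```python
import numpy as np

def base_field(x):
    """Stand-in for the basic implicit field: signed distance to a unit sphere."""
    return np.linalg.norm(x, axis=-1) - 1.0

def basis(x, centers, sigma=0.3):
    """Stand-in local basis functions: one Gaussian per constraint point, shape (N, K)."""
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def final_field(x, centers, coeffs):
    offset = basis(x, centers) @ coeffs     # constraint offset field
    return base_field(x) + offset           # point-by-point addition

# Constraint points slightly off the sphere that the surface must pass through
centers = np.array([[1.1, 0.0, 0.0], [0.0, 0.9, 0.0], [0.0, 0.0, 1.05]])
targets = np.zeros(3)                       # target constraint value: on the surface

A = basis(centers, centers)                 # A[i, j] = basis j at constraint point i
b = targets - base_field(centers)           # deviation of the basic field
coeffs = np.linalg.solve(A, b)

at_constraints = final_field(centers, centers, coeffs)
far_point = np.array([[-1.0, 0.0, 0.0]])
far_change = final_field(far_point, centers, coeffs) - base_field(far_point)
print(at_constraints, far_change)
```

The final field matches the targets at the constraint points, while a point on the opposite side of the sphere is left almost unchanged.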

[0088] Furthermore, it should be noted that the numerical calculations and implementation details of linear fusion can be realized in several different ways. In one possible implementation, when the combination coefficients are used to perform the weighted summation of local basis function values, the system can convert them into constant tensors in advance, since the combination coefficients are global constants that depend only on the constraint points. During the inference phase, these constant tensors can be combined in parallel with the local basis function values of batched query points through matrix multiplication or tensor broadcasting, thereby greatly improving the efficiency of generating the constraint offset field during full-space sampling. In another possible implementation, the system can also apply numerical clamping to the summation result, limiting abnormal extreme values that exceed the normal range of the signed distance field to a preset safety threshold, in order to prevent numerical overflow or topological artifacts in regions of abrupt geometric change.
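Both variants can be sketched in a few lines: the weighted sum over a batch of query points is a single matrix-vector product, and the clamping is a call to `np.clip`. The function name `fuse_batch` and the threshold value are illustrative assumptions.

```python
import numpy as np

def fuse_batch(base_vals, phi, coeffs, clip=None):
    """Fuse a batch of basic field values (Q,) with basis values (Q, K)."""
    offset = phi @ coeffs                   # weighted sum for all query points at once
    fused = base_vals + offset
    if clip is not None:
        fused = np.clip(fused, -clip, clip) # keep values within a preset safety threshold
    return fused

base_vals = np.array([0.5, -0.2, 3.0])
phi = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [0.0, 0.0]])
coeffs = np.array([-0.5, 0.2])              # solved once, reused for every batch
fused = fuse_batch(base_vals, phi, coeffs, clip=1.0)
print(fused)
```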

[0089] Furthermore, the step of extracting isosurfaces from the final implicit field to obtain the three-dimensional model and associated three-dimensional content includes steps A801~A804: Step A801: obtain the zero-level-set target value of the final implicit field, where the zero-level-set target value is zero; Step A802: input the dense query points sampled in three-dimensional space into the superimposed final implicit field computation framework to obtain the final signed distance function value corresponding to each dense query point, forming a three-dimensional scalar field; Step A803: input the three-dimensional scalar field and the zero-level-set target value into the marching cubes algorithm, and determine by interpolation the positions of the isosurface vertices where the three-dimensional scalar field equals zero, together with their topological connectivity; Step A804: based on the isosurface vertex positions and the topological connectivity, generate a 3D model consisting of vertex coordinates and triangle face indices, as well as the 3D content associated with the 3D model.

[0090] It should be noted that, in this embodiment, the zero-level-set target value refers to the function value used to define and extract the surface of a three-dimensional object in a three-dimensional scalar field, which is exactly zero for a signed distance field; the final signed distance function value refers to the shortest distance between a point in space and the finally determined three-dimensional surface, with a sign that distinguishes whether the point is inside or outside the object; the three-dimensional scalar field refers to the continuous spatial data field formed by assigning a final signed distance function value to each densely sampled point in three-dimensional space; the marching cubes algorithm refers to a classic computer graphics isosurface extraction algorithm that reconstructs a three-dimensional surface by traversing the cells of a voxel grid and interpolating locally along their edges between corner values; the isosurface vertex position refers to the coordinates in three-dimensional space of a point at which the three-dimensional scalar field equals the zero-level-set target value; the topological connectivity refers to how the vertices of the isosurface are connected to one another through edges and faces to form a continuous closed surface; the three-dimensional model refers to the discrete surface data structure composed of the set of vertex coordinates and the set of triangle face indices connecting these vertices. The associated 3D content refers to the geometric, physical, or semantic data attached to the surface of the 3D mesh, such as surface normals, texture coordinates, material parameters, or semantic segmentation labels.

[0091] This embodiment first establishes the mathematical reference for surface extraction, namely the zero-level-set target value (zero) of the final implicit field. Then, high-density spatial sampling is performed in three-dimensional space, and the dense query points are input into the superimposed final implicit field computation framework to calculate the final signed distance function value for each point. This turns the abstract neural network output into a concrete three-dimensional scalar field. Next, the system uses the marching cubes algorithm to analyze this three-dimensional scalar field, locating by interpolation the spatial positions where the function value equals zero as isosurface vertices, and establishing the topological connections between these vertices. Finally, a three-dimensional model in a standard mesh representation is generated from this geometric data. This embodiment completes the transformation from a continuous mathematical field to a discrete engineering model, aiming to output a mesh with a smooth surface and low error at the specified constraint points.
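The interpolation used to place isosurface vertices can be illustrated on its own. The sketch below samples a signed distance field on a regular grid and finds zero crossings along grid edges parallel to the x-axis by linear interpolation. It covers only the vertex-placement part of marching cubes, not the case table that builds triangle connectivity, and the sphere of radius 0.6 is an assumed stand-in for the final implicit field.

```python
import numpy as np

def sample_field(n=33, radius=0.6):
    """Sample a sphere signed distance field on an n x n x n grid over [-1, 1]^3."""
    axis = np.linspace(-1.0, 1.0, n)
    X, Y, Z = np.meshgrid(axis, axis, axis, indexing="ij")
    return X, Y, Z, np.sqrt(X ** 2 + Y ** 2 + Z ** 2) - radius

def x_edge_crossings(X, Y, Z, field, level=0.0):
    """Isosurface vertices on x-aligned grid edges, placed by linear interpolation."""
    v0, v1 = field[:-1] - level, field[1:] - level     # values at both edge endpoints
    mask = (v0 * v1) < 0.0                             # edges where the sign changes
    t = v0[mask] / (v0[mask] - v1[mask])               # fraction along the edge
    x = X[:-1][mask] + t * (X[1:][mask] - X[:-1][mask])
    return np.stack([x, Y[:-1][mask], Z[:-1][mask]], axis=1)

X, Y, Z, field = sample_field()
vertices = x_edge_crossings(X, Y, Z, field)
radial_error = np.abs(np.linalg.norm(vertices, axis=1) - 0.6).max()
print(len(vertices), radial_error)
```

For a complete mesh with faces, an existing implementation such as `skimage.measure.marching_cubes` from scikit-image could be applied to the same sampled field with `level=0.0`.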

[0092] Furthermore, it should be noted that the specific algorithms and post-processing for the aforementioned isosurface extraction and mesh construction can be implemented in various ways. In one possible implementation, when the 3D scalar field is input into the marching cubes algorithm for interpolation, a trilinear interpolation method can be used to calculate the intersection points of the isosurface and the voxel edges, thereby improving the smoothness of the extracted isosurface vertex positions. In another possible implementation, the system can also use the marching tetrahedra algorithm to extract isosurfaces from the 3D scalar field, avoiding the topological ambiguity that may arise when the marching cubes algorithm handles shapes with complex topology. In yet another possible implementation, after generating the initial 3D model based on the isosurface vertex positions and topological connectivity, the system can further apply a mesh smoothing filter or a mesh simplification algorithm as post-processing to improve the quality of the mesh's triangular faces and the rendering performance while maintaining geometric accuracy at the key points.

[0093] It should be noted that the above examples are only for understanding this application and do not constitute a limitation on the three-dimensional content generation method based on key point constraints in this application. Any simple transformations based on this technical concept are within the protection scope of this application.

[0094] This application also provides a 3D content generation system based on key point constraints. Referring to Figure 6, the system includes: a data preprocessing module 10, used to obtain the set of constraint points and the corresponding target constraint values of the three-dimensional object to be generated; a diffusion generation module 20, used to encode the three-dimensional object to be generated into low-dimensional structure voxel latent variables, and to perform conditional denoising on the low-dimensional structure voxel latent variables in the latent variable space through a pre-trained diffusion model to obtain the target structure voxel latent variables; a basic decoding module 30, used to input the target structure voxel latent variables and the query point position in the latent variable space into the basic decoder to obtain the basic implicit field corresponding to the query point; a constraint decoding module 40, used to generate local basis function values corresponding to each constraint point through a constraint basis function network based on the location information of the constraint point set and the location of the query point, and to perform spatial sparsification on the local basis function values using a local kernel function; a linear calculation module 50, used to construct a system of linear equations in the combination coefficients of the basis functions based on the sparsified local basis function values and the target constraint values, and to calculate the combination coefficients with a differentiable linear solver; a linear fusion module 60, used to linearly fuse the basic implicit field, the local basis function values, and the solved combination coefficients to obtain the final implicit field; and a model generation module 70, used to extract isosurfaces based on the final implicit field to obtain a three-dimensional model and associated three-dimensional content.

[0095] The 3D content generation system based on keypoint constraints provided in this application, employing the 3D content generation method based on keypoint constraints in the above embodiments, can solve the technical problem that existing soft constraint approximation cannot meet the low-error requirement of generating surfaces that strictly pass through constraint points in industrial manufacturing. Compared with the prior art, the beneficial effects of the 3D content generation system based on keypoint constraints provided in this application are the same as those of the 3D content generation method based on keypoint constraints provided in the above embodiments, and other technical features of the 3D content generation system based on keypoint constraints are the same as those disclosed in the methods of the above embodiments, and will not be repeated here.

[0096] It should be understood that the various parts disclosed in this application can be implemented using hardware, software, firmware, or a combination thereof. In the description of the above embodiments, specific features, structures, materials, or characteristics can be combined in any suitable manner in one or more embodiments or examples.

[0097] The above description is only a part of the embodiments of this application and does not limit the patent scope of this application. All equivalent structural transformations made under the technical concept of this application and using the contents of the specification and drawings of this application, or direct / indirect applications in other related technical fields, are included in the patent protection scope of this application.

Claims

1. A method for generating 3D content based on key point constraints, characterized in that the method includes: obtaining the set of constraint points and the corresponding target constraint values of the 3D object to be generated; encoding the three-dimensional object to be generated as a low-dimensional structure voxel latent variable, and conditionally denoising the low-dimensional structure voxel latent variable in the latent variable space with a pre-trained diffusion model to obtain the target structure voxel latent variable; inputting the target structure voxel latent variable and the query point position in the latent variable space into the basic decoder to obtain the basic implicit field corresponding to the query point; based on the location information of the constraint point set and the location of the query point, generating local basis function values corresponding to each constraint point through the constraint basis function network, and spatially sparsifying the local basis function values using a local kernel function; based on the sparsified local basis function values and the target constraint values, constructing a system of linear equations in the combination coefficients of the basis functions, and calculating the combination coefficients with a differentiable linear solver; linearly fusing the basic implicit field, the local basis function values, and the solved combination coefficients to obtain the final implicit field; and extracting isosurfaces based on the final implicit field to obtain the three-dimensional model and associated three-dimensional content.

2. The 3D content generation method based on key point constraints as described in claim 1, characterized in that the step of encoding the three-dimensional object to be generated as a low-dimensional structure voxel latent variable includes: normalizing the input 3D shape and mapping the normalized 3D shape to a unified coordinate system; sampling spatial query points in the unified coordinate system and obtaining the ground-truth signed distance values corresponding to the spatial query points; and inputting the normalized 3D shape into the encoder of the surface autoencoder, the encoder extracting the 3D geometric features and outputting low-resolution structure voxel latent variables.

3. The 3D content generation method based on key point constraints as described in claim 2, characterized in that, The step of conditionally denoising the low-dimensional structure voxel latent variables in the latent variable space using a pre-trained diffusion model to obtain the target structure voxel latent variables includes: The set of constraint points is converted into a local Gaussian heatmap in three-dimensional space; The local Gaussian heatmap is input into the constraint encoder to extract the constraint condition features; The constraint features are injected into the intermediate layer of the denoising network of the diffusion model through an attention mechanism or a control network branch to obtain the latent variables of the target structure voxels.

4. The 3D content generation method based on key point constraints as described in claim 3, characterized in that the step of inputting the target structure voxel latent variables and the query point position in the latent variable space into the basic decoder to obtain the basic implicit field corresponding to the query point includes: encoding the query point location to obtain the query point location feature; extracting the spatial geometric features corresponding to the query point from the target structure voxel latent variables, and fusing the spatial geometric features with the query point location feature; and inputting the fused features into the network layers of the basic decoder for forward propagation, outputting the basic implicit field corresponding to the query point, and extracting the intermediate layer features of the basic decoder.

5. The 3D content generation method based on key point constraints as described in claim 4, characterized in that, The step of generating local basis function values ​​corresponding to each constraint point through a constraint basis function network based on the location information of the constraint point set and the location of the query point includes: The location information of the constraint point set is encoded to generate a constraint token; The intermediate layer features of the basic decoder are fused with the query point location to obtain the query point features; The query point features are interacted with the constraint token through a cross-attention mechanism, and the local basis function values ​​are output through a multilayer perceptron network.

6. The 3D content generation method based on key point constraints as described in claim 5, characterized in that, The step of spatially sparsifying the local basis function values ​​using a local kernel function includes: Calculate the spatial distance between the query point and the constraint point, with each constraint point as the center; When the spatial distance is greater than the preset radius of action, the local basis function value of the corresponding constraint point is set to zero; When the spatial distance is less than or equal to the radius of action, the local basis function values ​​are attenuated and weighted using a Gaussian kernel function or a radial basis kernel function.

7. The 3D content generation method based on key point constraints as described in claim 6, characterized in that the step of calculating the combination coefficients using a differentiable linear solver includes: substituting each constraint point into the implicit field expression containing the local basis function values to construct a linear equation system matrix and a right-hand-side vector, with the combination coefficients as unknowns; adding a Tikhonov regularization term to the matrix of the linear equation system; and solving the regularized linear equation system using the damped least squares method to obtain the combination coefficients.

8. The method for generating 3D content based on key point constraints as described in claim 7, characterized in that the step of linearly fusing the basic implicit field, the local basis function values, and the solved combination coefficients to obtain the final implicit field includes: weighting and summing the local basis function values using the combination coefficients to output the constraint offset field; and adding the basic implicit field and the constraint offset field point by point to output the final implicit field that satisfies the target constraint value at each constraint point.

9. The method for generating 3D content based on key point constraints as described in claim 8, characterized in that the step of extracting isosurfaces based on the final implicit field to obtain the 3D model and associated 3D content includes: obtaining the zero-level-set target value of the final implicit field, where the zero-level-set target value is zero; inputting the dense query points sampled in three-dimensional space into the superimposed final implicit field computation framework to obtain the final signed distance function value corresponding to each dense query point, forming a three-dimensional scalar field; inputting the three-dimensional scalar field and the zero-level-set target value into the marching cubes algorithm, and determining by interpolation the positions of the isosurface vertices where the three-dimensional scalar field equals zero, together with their topological connectivity; and, based on the isosurface vertex positions and the topological connectivity, generating a 3D model consisting of vertex coordinates and triangle face indices, along with associated 3D content.

10. A 3D content generation system based on key point constraints, characterized in that, The key-point-constrained 3D content generation system includes: The data preprocessing module is used to obtain the set of constraint points and the corresponding target constraint values ​​of the 3D object to be generated; The diffusion generation module is used to encode the three-dimensional object to be generated into low-dimensional structure voxel latent variables, and to perform conditional denoising on the low-dimensional structure voxel latent variables in the latent variable space through a pre-trained diffusion model to obtain the target structure voxel latent variables. The basic decoding module is used to input the latent variables of the target structure voxel and the query point position in the latent variable space into the basic decoder to obtain the basic implicit field corresponding to the query point; The constraint decoding module is used to generate local basis function values ​​corresponding to each constraint point through a constraint basis function network based on the location information of the constraint point set and the location of the query point, and to perform spatial sparsification processing on the local basis function values ​​using a local kernel function. The linear calculation module is used to construct a system of linear equations about the combination coefficients of the basis functions based on the local basis function values ​​after sparsification and the target constraint values, and to calculate the combination coefficients by a differentiable linear solver. The linear fusion module is used to linearly fuse the basic implicit field, the local basis function values, and the solved combination coefficients to obtain the final implicit field. The model generation module is used to extract isosurfaces based on the final implicit field to obtain a three-dimensional model and associated three-dimensional content.