Methods, systems, apparatuses, and media for generating 3D models of industrial scenes

By using a domain-specific semantic segmentation model library and a planar point cloud processing method, the complexity of point cloud data is addressed, enabling efficient and accurate 3D model reconstruction of industrial scenes that adapts to the needs of different industrial scenarios.

CN122228528A · Pending · Publication Date: 2026-06-16 · SIMENS INDASTRI SOFTVEAR INK

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SIMENS INDASTRI SOFTVEAR INK
Filing Date
2023-08-30
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

Existing technologies struggle to reconstruct high-quality 3D models of industrial scenes from point cloud data quickly and accurately. The data volume is large, the data contains non-uniform points and noise points, industrial scenes have complex topology, and conditions differ between industries, all of which increase processing complexity.

Method used

Employing a domain-specific semantic segmentation model library and a planar point cloud processing method, 3D models are generated through semantic segmentation and meshing processes, including planar segmentation, semantic classification, and meshing of point cloud data. Machine learning algorithms such as random forest classifiers and deep learning algorithms are utilized, combined with Euclidean distance clustering and triangulation techniques.

Benefits of technology

It improves modeling efficiency and model quality, reduces the computational complexity of processing large-scale point cloud data, achieves faster processing speed and higher accuracy, and adapts to the flexible needs of different industrial scenarios.

Generated by Eureka AI based on patent content.

Abstract

Embodiments of the present disclosure disclose a method, system, device, and medium for generating a 3D model of an industrial scene. The method comprises: obtaining point cloud data for an industrial scene; determining a domain of the industrial scene; obtaining a trained semantic segmentation model associated with the domain from a model library, wherein the model library comprises a plurality of trained semantic segmentation models associated with respective domains; performing semantic segmentation on the point cloud data based on the trained semantic segmentation model associated with the domain to obtain a plurality of segmented objects; meshing the plurality of segmented objects; and generating a 3D model of the industrial scene based on the meshed plurality of segmented objects. A flexible domain-specific model library is provided to meet the needs of different industrial scenes, thereby improving modeling efficiency.

Description

Technical Field

[0001] This disclosure relates to the field of industrial technology, and more particularly to methods, systems, apparatuses, and media for generating 3D models of industrial scenes.

Background Technology

[0002] Given the current demand for digitization and for generating intelligent digital twins, it is crucial to obtain three-dimensional (3D) models that are not yet available in the current information technology (IT) field. The sweeping wave of the industrial metaverse makes this issue even more prominent.

[0003] However, raw point cloud data, especially point cloud data acquired by scanning an entire factory scene, is typically enormous in size and contains many non-uniform points, noisy points, and missing points, making it difficult to accurately and quickly segment the objects needed for 3D reconstruction. Furthermore, the processing flow becomes more complex when meshing is required. In addition, structures in industrial settings have very complex topologies, and the situation varies across different industries. Mechanical components and structures are often interconnected, introducing additional complexity. Obtaining high-quality 3D models from point cloud data requires significant manual post-processing.

Summary of the Invention

[0004] Embodiments of this disclosure provide a method, system, apparatus, and medium for generating 3D models of industrial scenes.

[0005] In a first aspect, a method for generating 3D models of industrial scenes is provided. The method includes: obtaining point cloud data for an industrial scene; determining the domain of the industrial scene; obtaining a trained semantic segmentation model associated with the domain from a model library, wherein the model library includes multiple trained semantic segmentation models associated with respective domains; performing semantic segmentation on the point cloud data based on the trained semantic segmentation model associated with the domain to obtain multiple segmented objects; meshing the multiple segmented objects; and generating a 3D model of the industrial scene based on the meshed segmented objects.

[0006] In a second aspect, a system for generating 3D models of industrial scenes is provided. The system includes: an interface module configured to acquire point cloud data for an industrial scene; a model library configured to store multiple trained semantic segmentation models associated with respective domains; a segmentation module configured to determine the domain of the industrial scene, obtain a trained semantic segmentation model associated with the domain from the model library, and perform semantic segmentation on the point cloud data based on the trained semantic segmentation model associated with the domain to obtain multiple segmented objects; and a generation module configured to mesh the multiple segmented objects and generate a 3D model of the industrial scene based on the meshed segmented objects.

[0007] In a third aspect, an electronic device is provided. The electronic device includes a processor and a memory, wherein an application program executable by the processor is stored in the memory to cause the processor to execute a method for generating a 3D model of an industrial scene as described in any of the foregoing.

[0008] In a fourth aspect, a computer-readable medium is provided that includes computer-readable instructions stored thereon, wherein the computer-readable instructions, when executed by a processor, implement a method for generating a 3D model of an industrial scene as described in any of the foregoing.

[0009] In a fifth aspect, a computer program product is provided. The computer program product comprises a computer program which, when executed by a processor, performs a method for generating a 3D model of an industrial scene as described in any of the foregoing.

[0010] Based on the above technical solution, a flexible domain-specific model library is provided to meet the needs of different industrial scenarios, thereby improving modeling efficiency. Furthermore, a novel planar point cloud processing method connects the segmentation and meshing processes within a single system, making point cloud processing more efficient and flexible. Large-scale inputs are processed faster because processing a set of planes involves significantly less work than processing a large number of individual points.

Description of the Drawings

[0011] To make the technical solutions of the examples in this disclosure clearer, accompanying drawings will be briefly introduced below to describe the examples. Obviously, the drawings described below are only some examples of this disclosure. Those skilled in the art can obtain other drawings from these drawings without inventive effort.

[0012] Figure 1 is a flowchart of a method for generating a 3D model of an industrial scene according to an embodiment of the present disclosure.

[0013] Figure 2 is a schematic diagram of an exemplary workflow for generating a 3D model of an industrial scene according to embodiments of the present disclosure.

[0014] Figure 3 is a flowchart of performing plane segmentation according to an embodiment of the present disclosure.

[0015] Figure 4 is a flowchart illustrating the process of performing planar meshing and grouping to generate a 3D model according to embodiments of the present disclosure.

[0016] Figure 5 is a flowchart of training a semantic segmentation model according to an embodiment of the present disclosure.

[0017] Figure 6 is a structural diagram of a system for generating 3D models of industrial scenes according to embodiments of the present disclosure.

[0018] Figure 7A is a schematic diagram of a segmented plane having a calculated normal vector according to an embodiment of the present disclosure.

[0019] Figure 7B is a schematic diagram of the final output of planar segmentation according to an embodiment of the present disclosure.

[0020] Figure 7C is a schematic diagram of the triangulation result of a segmented plane according to an embodiment of the present disclosure.

[0021] Figure 7D is a schematic diagram of a 3D model according to an embodiment of the present disclosure.

[0022] Figure 7E is a schematic diagram of semantic classification results generated by a trained random forest model according to an embodiment of the present disclosure.

[0023] Figure 8 is a structural diagram of an electronic device according to an embodiment of the present disclosure.

[0024] List of reference numerals in the attached diagram:

Detailed Implementation

[0025] To make the purpose, technical solutions and advantages of this disclosure clearer, the following examples are provided to further explain this disclosure in detail.

[0026] For the sake of brevity and intuitiveness, the solutions of this disclosure are described below through several representative embodiments. Many details in the embodiments are only used to help understand the solutions of this disclosure. However, it is obvious that the technical solutions of this disclosure can be implemented without being limited to these details. In order to avoid unnecessarily obscuring the solutions of this disclosure, some embodiments are not described in detail, but only a framework is given. In the following text, "comprising" means "comprising but not limited to", and "according to..." means "at least according to..., but not limited to...". Unless the number of elements is specifically indicated below, it means that the elements can be one or more, or can be understood as at least one.

[0027] In the industrial sector, machinery and structures are often interconnected, introducing additional complexity. Currently, obtaining high-quality 3D models from point cloud data requires significant manual post-processing. The requirements for 3D object reconstruction also depend on how the model will be used afterwards, for example for high-quality visualization, accuracy and CAD modeling, navigation, or lightweight visualization on the web. These varied uses call for flexible segmentation and meshing.

[0028] Embodiments of this disclosure present a novel system and method for industrial 3D reconstruction of point clouds, comprising point cloud semantic segmentation and meshing processes, particularly suitable for large-scale point clouds. The point cloud semantic segmentation process segments point clouds corresponding to objects of interest using semantic tags, and the meshing process provides a final output of a 3D model with meshes and textures, as well as the desired 3D model format.

[0029] Embodiments of this disclosure also provide a domain-specific model library to meet the needs of different industrial scenarios. Models in the library are trained to learn context-oriented features of objects with different shapes, thereby achieving more accurate segmentation and higher-quality meshing. Furthermore, a flexible API is provided to facilitate the integration of embodiments of this disclosure with external 3D software tools, thereby assisting in 3D modeling.

[0030] Figure 1 is a flowchart of a method for generating a 3D model of an industrial scene according to embodiments of the present disclosure. As shown in Figure 1, the method includes: Step 101: Obtain point cloud data for an industrial scene.

[0031] A point cloud is a dataset of points in a coordinate system. The point cloud may contain 3D coordinates, color, classification values, intensity values, time, and other information. Here, "industrial scene" refers to a specific application scenario within an industrial setting, which may include corporations, factories, warehouses, stadiums, workshops, assembly lines, and workstations.

[0032] Point cloud data for an industrial scene can be acquired in various ways. Specific methods may include: (1) Laser scanner / LiDAR: A 3D laser scanner or LiDAR uses the principle of laser ranging to record the 3D coordinates, reflectivity, and texture information of a large number of dense points on the surfaces of the industrial scene.

[0033] (2) Depth camera: Uses a near-infrared laser to project light onto objects with structural features and collects depth information through an infrared camera.

[0034] (3) Stereo camera: Two cameras capture two images of the industrial scene from different positions. The 3D coordinates of each point are computed from the positional deviation between corresponding points in the two images, using the principle of triangulation.

[0035] (4) Optical camera multi-view reconstruction: Given a set of multiple images and their corresponding feature points, the positions of the 3D points and the camera poses are estimated.

[0036] The above exemplary description provides typical examples of industrial scenarios and typical examples of obtaining point cloud data from industrial scenarios. Those skilled in the art will recognize that this specification is exemplary and is not intended to limit the scope of this disclosure.
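The stereo camera case in item (3) can be illustrated with the standard rectified-stereo relation Z = f * B / d. The following is a minimal sketch that assumes a rectified pinhole camera pair; the function name `stereo_point`, its parameters, and the numeric values are illustrative assumptions and are not taken from the disclosure.

```python
def stereo_point(u, v, disparity, focal_px, baseline_m, cx, cy):
    """Back-project pixel (u, v) of the left image into 3D camera coordinates.

    Assumption of this sketch: a rectified pinhole stereo pair, so that
    depth Z = focal_px * baseline_m / disparity.
    """
    if disparity <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    z = focal_px * baseline_m / disparity   # depth along the optical axis
    x = (u - cx) * z / focal_px             # horizontal offset from the principal point
    y = (v - cy) * z / focal_px             # vertical offset from the principal point
    return (x, y, z)


# Illustrative values only: 1000 px focal length, 10 cm baseline, 20 px disparity.
point = stereo_point(u=700.0, v=400.0, disparity=20.0,
                     focal_px=1000.0, baseline_m=0.1, cx=640.0, cy=360.0)
```

Repeating this for every matched pixel pair yields one 3D point per correspondence, which is how a stereo rig can produce a point cloud.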

[0037] Step 102: Determine the domain of the industrial scenario.

[0038] A domain refers to the scope, region, professional activity, or department corresponding to an industrial scenario. Preferably, a domain can have a multi-level structure. Industry can be divided into multiple domains according to different classification methods. For example, by sector, industry can be divided into the following domains: metallurgical industry, power industry, coal and coking industry, petroleum industry, chemical industry, machinery industry, building materials industry, forestry industry, food industry, textile, sewing, and leather manufacturing industry, and other domains. Based on the nature of the products, industry can be divided into light industry and heavy industry. Based on the relative intensity of labor, capital, and technology in various industries, industry can be further divided into labor-intensive industry, capital-intensive industry, and technology-intensive industry. Alternatively, according to an industrial classification catalog, industry can be divided into main categories, intermediate categories, and subcategories. Each main category, intermediate category, subcategory, or combination thereof can form a domain.

[0039] For example, domains can include industry sectors (electronics, robotics, warehousing and office, logistics, etc.) as well as sub-levels under those sectors (e.g., assembly stations, conveyors, etc. in the logistics industry).

[0040] In one embodiment, step 102 includes: receiving user input including descriptive information about a domain; and determining the domain based on the descriptive information.

[0041] For example, if the text description information received from user input is "warehousing and office", then the domain of the industrial scenario can be determined to be warehousing and office.

[0042] In one embodiment, step 102 includes: inputting point cloud data into a trained domain determination model; and obtaining the domain from the output of the trained domain determination model. In this embodiment, the domain determination model is pre-trained. The training method for the domain determination model includes: inputting first point cloud training data with domain labels into a first neural network model; receiving a first classification result of the domain to which the first point cloud training data belongs from the first neural network model; determining a first loss function value based on the difference between the domain label and the first classification result; configuring the model parameters of the first neural network model so that the first loss function value is lower than a first preset threshold; and determining the configured first neural network model as a trained domain determination model.
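The stopping rule described above (adjust the parameters until the loss falls below a preset threshold) can be sketched as follows. This is a simplified stand-in, not the disclosed model: it replaces the first neural network with a one-layer logistic classifier over two hypothetical per-scan features, and the feature values, learning rate, and threshold are assumptions chosen for illustration.

```python
import math


def train_until_threshold(samples, labels, loss_threshold=0.1,
                          learning_rate=0.5, max_epochs=5000):
    """Fit a one-layer logistic classifier by gradient descent and stop as
    soon as the mean cross-entropy loss drops below loss_threshold."""
    n = len(samples)
    weights = [0.0] * len(samples[0])
    bias = 0.0
    loss = float("inf")
    for _ in range(max_epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        loss = 0.0
        for x, y in zip(samples, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            p = 1.0 / (1.0 + math.exp(-z))
            p = min(max(p, 1e-12), 1.0 - 1e-12)   # keep log() finite
            loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            for i, xi in enumerate(x):
                grad_w[i] += (p - y) * xi
            grad_b += p - y
        loss /= n
        if loss < loss_threshold:                 # preset threshold reached
            break
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias, loss


def predict(x, weights, bias):
    """Return the domain label (0 or 1) for one feature vector."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


# Hypothetical features per scan, e.g. (normalized height, planar fraction).
# Label 0 and label 1 stand for two different domains.
samples = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.25),
           (0.90, 0.80), (0.80, 0.90), (0.85, 0.75)]
labels = [0, 0, 0, 1, 1, 1]
weights, bias, final_loss = train_until_threshold(samples, labels)
```

The same loop structure applies to the semantic segmentation model described later; only the model and the loss change.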

[0043] Therefore, artificial intelligence can also be used to generate domain-specific models, thereby improving work efficiency.

[0044] Step 103: Obtain a trained semantic segmentation model associated with the domain from the model library, wherein the model library includes multiple trained semantic segmentation models associated with the corresponding domain.

[0045] After determining the domain of the industrial scenario in step 102, trained semantic segmentation models associated with the domain can be retrieved from a model library, for example, by using the domain as a search term. For instance, assuming the domain determined in step 102 is an assembly station in the logistics industry, trained models suitable for performing semantic segmentation on assembly stations in the logistics industry can be retrieved from the model library.

[0046] The model library contains multiple trained semantic segmentation models associated with corresponding domains. Each trained semantic segmentation model is predefined based on prior knowledge, such as industrial plant layout and on-site asset categories. When multiple domains exist (electronics, robotics, warehousing and office, processing industry, etc.), specific levels can be further subdivided within each domain to provide more accurate semantic segmentation models for specific situations. There can be multiple subdivision levels. For example, a factory-level semantic segmentation model can be trained first. Then, factory-level semantic segmentation models can be trained for the production area of the factory (e.g., assembly stations, machines, conveyors) and for the storage area of the factory (e.g., storage piles, forklifts, material containers). Following this hierarchical structure, more domains can be defined, and the corresponding trained models can be saved to the model library.
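One way to picture such a hierarchical library is a registry keyed by a domain path. The sketch below is illustrative only: the class `ModelLibrary`, the tuple-based keys, and the fallback to a parent level when no more specific model is registered are all assumptions of this example, and plain strings stand in for trained models.

```python
class ModelLibrary:
    """Stores trained models under hierarchical domain keys such as
    ("factory", "production_area")."""

    def __init__(self):
        self._models = {}

    def register(self, domain_path, model):
        """Save a trained model for the given domain path."""
        self._models[tuple(domain_path)] = model

    def lookup(self, domain_path):
        """Return the model for the most specific registered level.

        Assumption of this sketch: if nothing is registered for the full
        path, fall back to the parent level, then to its parent, and so on.
        """
        path = tuple(domain_path)
        while path:
            if path in self._models:
                return self._models[path]
            path = path[:-1]
        raise KeyError(f"no model registered for domain {tuple(domain_path)!r}")


library = ModelLibrary()
library.register(("factory",), "factory-level model")
library.register(("factory", "production_area"), "production-area model")
library.register(("factory", "storage_area"), "storage-area model")

specific = library.lookup(("factory", "production_area"))
fallback = library.lookup(("factory", "office_area"))   # not registered
```

Adding a new domain then amounts to one more `register` call once its model has been trained.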

[0047] Step 104: Perform semantic segmentation on the point cloud data based on a domain-associated trained semantic segmentation model to obtain multiple segmentation objects.

[0048] Here, semantic segmentation of the point cloud data can be performed directly along the voxel dimension to form segmented objects. Preferably, planar segmentation is first performed on the point cloud data, and semantic classification is then performed on the resulting planes to form segmented objects. Performing planar segmentation on the point cloud data first, followed by semantic classification, significantly reduces computational complexity.

[0049] In one embodiment, performing semantic segmentation on point cloud data to obtain multiple segmentation objects includes: performing planar segmentation on the point cloud data to obtain multiple planes; classifying the multiple planes based on a trained semantic segmentation model; and determining multiple segmentation objects, each segmentation object including planes of the same classification.
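The last part of this embodiment, forming segmented objects from planes of the same classification, can be sketched as a simple grouping step. In this illustrative example the planes are represented only by integer ids and a lookup table stands in for the trained semantic segmentation model; both are assumptions.

```python
from collections import defaultdict


def group_planes_by_class(planes, classify):
    """Collect planes into segmented objects, one group per predicted class."""
    groups = defaultdict(list)
    for plane in planes:
        groups[classify(plane)].append(plane)
    return dict(groups)


# Stub classifier: a fixed table in place of a trained model (assumption).
stub_labels = {1: "machine", 2: "conveyor", 3: "machine"}
objects = group_planes_by_class([1, 2, 3], classify=lambda plane_id: stub_labels[plane_id])
```

A spatial clustering step would normally follow so that two separate machines do not end up in the same object; that step is outside this sketch.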

[0050] Step 105: Mesh the multiple segmented objects.

[0051] In one embodiment, meshing multiple segmented objects includes: assessing the importance of a segmented object based on the category to which it is classified; determining the resolution for the segmented object based on its importance; and meshing the segmented objects based on the resolution.

[0052] Preferably, determining the resolution follows at least one of the following criteria: (1) The more important the importance assessment indicates a segmented object to be, the higher the resolution used for that segmented object.

[0053] (2) The less important the importance assessment indicates a segmented object to be, the lower the resolution used for that segmented object.

[0054] For example, suppose the categories of segmented objects include: machines, workpieces, and workbenches. Based on prior knowledge or user input, machines and workpieces are determined to be of high importance, while workbenches are of low importance. Therefore, high-resolution meshing is performed for segmented objects classified as machines and workpieces, while low-resolution meshing is performed for segmented objects classified as workbenches, which reduces the complexity of meshing.
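The machine, workpiece, and workbench example can be expressed as two lookup tables. This is a minimal sketch: the disclosure does not state how resolution is parameterized, so the use of a maximum triangle edge length in meters, the numeric values, and the default importance are assumptions.

```python
# Importance per category, taken from the example in the text.
CATEGORY_IMPORTANCE = {"machine": "high", "workpiece": "high", "workbench": "low"}

# Hypothetical parameterization: maximum triangle edge length in meters.
# A smaller edge length means a finer, higher-resolution mesh.
MAX_EDGE_BY_IMPORTANCE = {"high": 0.01, "low": 0.10}


def mesh_resolution(category, default_importance="low"):
    """Return the meshing resolution for an object category.

    Categories without an entry use default_importance (an assumption)."""
    importance = CATEGORY_IMPORTANCE.get(category, default_importance)
    return MAX_EDGE_BY_IMPORTANCE[importance]


machine_edge = mesh_resolution("machine")
workbench_edge = mesh_resolution("workbench")
```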

[0055] Step 106: Generate a 3D model of the industrial scene based on the meshed segmented objects.

[0056] In one embodiment, the method includes a process of training a semantic segmentation model. The process includes: inputting second point cloud training data with semantic labels into a second neural network model, wherein the second point cloud training data includes multi-dimensional contextual features and is associated with a domain; receiving semantic segmentation results of the second point cloud training data from the second neural network model; determining a second loss function value based on the difference between the semantic labels and the semantic segmentation results; configuring model parameters of the second neural network model such that the second loss function value is lower than a second preset threshold; determining the configured second neural network model as a domain-associated trained semantic segmentation model; and storing the domain-associated trained semantic segmentation model in a model library.

[0057] The following demonstrates the training process, taking the production area of a logistics warehouse as the domain. First, point cloud training data acquired from the production area is input into a neural network model. This point cloud training data includes multi-dimensional contextual features and semantic labels for the corresponding objects (e.g., assembly stations, machines, conveyors, etc.) within the production area. Then, semantic segmentation results, including object classification, are received from the neural network model. Next, a loss function value is determined based on the difference between the semantic labels and the semantic segmentation results, and the model parameters of the neural network model are configured so that the loss function value meets predetermined requirements, such as being below a preset threshold. Only then is the configured neural network model determined to be the trained semantic segmentation model associated with the domain of the production area in the logistics warehouse.

[0058] Therefore, when a new domain emerges, a new model corresponding to that domain can be trained and extended to the model library. Providing a flexible domain-specific model library to meet the needs of different scenarios improves modeling efficiency.

[0059] Figure 2 is a schematic diagram of an exemplary workflow for generating a 3D model of an industrial scene according to embodiments of the present disclosure. As shown in Figure 2, the workflow illustrates the entire process of point cloud segmentation and meshing. The workflow includes: Step 201: Load point cloud data for the industrial scene.

[0060] Step 202: Determine the domain of the industrial scenario based on the domain configuration description. For example, the domain could be an assembly station in the logistics industry.

[0061] Step 203: Determine the segmentation method based on the configuration information. For example, the configured segmentation method can be a planar segmentation method. In addition, the planar segmentation method can include: (1) a planar segmentation algorithm based on region growing; (2) a deep learning algorithm based on the PointNet++ backbone network; (3) a scene semantic segmentation method based on unsupervised contrastive learning, etc.

[0062] Step 204: Obtain the importance level corresponding to the category to which the corresponding object belongs.

[0063] Step 205: Perform planar segmentation on the point cloud data using the configured segmentation method.

[0064] Step 206: Obtain a semantic segmentation model associated with the domain from the model library, and perform semantic classification on multiple planes after segmentation to obtain multiple classified planes.

[0065] Step 207: Perform meshing on multiple classified planes based on the corresponding resolution determined by the importance level.

[0066] Step 208: Perform a grouping process on these classified planes to form a 3D model.

[0067] In steps 201 and 202-204: The input required for point cloud processing is loaded. The domain configuration determines which semantic segmentation model will be used in the model library, the mesh resolution configuration determines the mesh resolution for a specific object category, and the configured segmentation method determines the method used for segmentation.
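The way the three configuration inputs steer the workflow can be sketched as a small orchestration function. Everything named here (`WorkflowConfig`, `run_workflow`, and the stub callables) is hypothetical; the sketch only shows which configuration item selects which component and does not implement any of the processing steps.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowConfig:
    domain: tuple                                        # selects the model (step 202)
    segmentation_method: str = "region_growing"          # selects the method (step 203)
    mesh_resolution: dict = field(default_factory=dict)  # per category (step 204)


def run_workflow(points, config, segmenters, model_library, mesher, grouper):
    """Wire the configured components together in the order of steps 205-208."""
    segment = segmenters[config.segmentation_method]
    planes = segment(points)                                   # step 205
    classifier = model_library[config.domain]
    labeled = [(plane, classifier(plane)) for plane in planes]  # step 206
    meshes = [mesher(plane, label, config.mesh_resolution.get(label))
              for plane, label in labeled]                     # step 207
    return grouper(meshes)                                     # step 208


# Stub components (assumptions) so that the wiring can be exercised.
config = WorkflowConfig(domain=("logistics", "assembly_station"),
                        mesh_resolution={"machine": 0.01, "workbench": 0.10})
result = run_workflow(
    points=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    config=config,
    segmenters={"region_growing": lambda pts: [pts]},   # returns a single plane
    model_library={("logistics", "assembly_station"): lambda plane: "machine"},
    mesher=lambda plane, label, res: {"label": label, "resolution": res,
                                      "n_points": len(plane)},
    grouper=lambda meshes: {"objects": meshes},
)
```

Passing the components in as arguments keeps each step replaceable, which matches the configurable nature of the workflow described above.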

[0068] In step 205, the plane segmentation model processes the input point cloud into basic elements of planar shape. In step 206, a domain-specific semantic segmentation model classifies the planes into different object categories. To train the semantic segmentation model, a machine learning-based classifier can be applied that takes the various planes as input. The core idea is to use the contextual features of the planes (including geometric features) to classify objects; the machine learning algorithm is not limited to a particular one. If the contextual features of points and planes can be learned by a deep neural network, a deep learning-based model can also be applied here. After plane classification, an optional Euclidean distance clustering algorithm can be applied to post-process the classified planes together with nearby smaller planes, so that the segmented planes are classified more accurately as a complete object with the same semantic label. Then, in steps 207 and 208, the voxel-based planes can be triangulated and meshed with texture for the subsequent 3D model output.

[0069] Figure 3 is a flowchart illustrating the execution of planar segmentation according to embodiments of the present disclosure. As shown in Figure 3, the flowchart includes: In step 301: An octree data structure is created for the original point cloud to organize all unordered points into a standard voxel-based octree structure.

[0070] In step 302: All input points are indexed by voxel, and the eigenvalues and eigenvectors of each voxel are calculated from the points in that voxel.
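Step 302 can be sketched with NumPy as follows. Two simplifications are assumptions of this example: a flat voxel grid replaces the octree of step 301, and the eigenvector of the smallest covariance eigenvalue is taken as the voxel normal, which is a common convention but is not stated explicitly in the text.

```python
from collections import defaultdict

import numpy as np


def voxel_normals(points, voxel_size):
    """Index points by voxel and eigen-decompose each voxel's covariance.

    Returns {voxel_key: (eigenvalues, normal)}, where eigenvalues are in
    ascending order and normal is the eigenvector of the smallest one.
    """
    points = np.asarray(points, dtype=float)
    keys = np.floor(points / voxel_size).astype(int).tolist()
    buckets = defaultdict(list)
    for key, point in zip(keys, points):
        buckets[tuple(key)].append(point)

    result = {}
    for key, members in buckets.items():
        if len(members) < 3:          # too few points to define a plane
            continue
        covariance = np.cov(np.asarray(members).T)      # 3 x 3 matrix
        eigenvalues, eigenvectors = np.linalg.eigh(covariance)
        result[key] = (eigenvalues, eigenvectors[:, 0])
    return result


# A flat 5 x 5 patch at height z = 0.5 that falls entirely into one voxel.
grid = (0.1, 0.3, 0.5, 0.7, 0.9)
patch = [(x, y, 0.5) for x in grid for y in grid]
normals = voxel_normals(patch, voxel_size=1.0)
eigenvalues, normal = normals[(0, 0, 0)]
```

For a perfectly flat patch the smallest eigenvalue is numerically zero and the normal points along the z-axis, up to sign.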

[0071] In step 303: After the point cloud is voxelized, a region growing algorithm can be used to merge the voxels into voxel-based planes. The growth rule calculates the dot product of the normals of every two neighboring voxels, which gives the cosine of the angle between them, and grows the region only when this cosine is higher than a predefined threshold (default value 0.9), so that the result of the growth is a cluster of points in a planar shape.
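The growth rule can be sketched as a breadth-first search over voxel keys. The following choices are assumptions of this example rather than details from the text: 26-connected neighborhoods, comparison of each candidate against the voxel it was reached from, and use of the absolute dot product so that the sign ambiguity of eigenvector normals does not split a plane. The sketch also checks only normal similarity, not whether neighboring voxels lie at the same plane offset.

```python
from collections import deque
from itertools import product

# All 26 neighbors of a voxel in a regular grid.
NEIGHBOR_OFFSETS = [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]


def grow_planes(voxel_normals, cos_threshold=0.9):
    """Cluster neighboring voxels whose unit normals are nearly parallel.

    voxel_normals maps an integer voxel key (i, j, k) to a unit normal.
    """
    visited = set()
    planes = []
    for seed in voxel_normals:
        if seed in visited:
            continue
        visited.add(seed)
        region, queue = [seed], deque([seed])
        while queue:
            current = queue.popleft()
            for dx, dy, dz in NEIGHBOR_OFFSETS:
                neighbor = (current[0] + dx, current[1] + dy, current[2] + dz)
                if neighbor in visited or neighbor not in voxel_normals:
                    continue
                cosine = sum(a * b for a, b in
                             zip(voxel_normals[current], voxel_normals[neighbor]))
                if abs(cosine) > cos_threshold:
                    visited.add(neighbor)
                    region.append(neighbor)
                    queue.append(neighbor)
        planes.append(region)
    return planes


# Three floor voxels (normal along z) next to two wall voxels (normal along x).
example = {
    (0, 0, 0): (0.0, 0.0, 1.0), (1, 0, 0): (0.0, 0.0, 1.0), (2, 0, 0): (0.0, 0.0, 1.0),
    (3, 0, 0): (1.0, 0.0, 0.0), (3, 0, 1): (1.0, 0.0, 0.0),
}
planes = grow_planes(example)
```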

[0072] In step 304: All the obtained planes can be refined by merging neighboring planes into a larger plane when the normal values of the two planes are consistent, and planes with very few points can also be removed. Finally, the output of the plane segmentation model is a set of clean and organized voxel-based planes.

[0073] Figure 7A is a schematic diagram of a segmented plane having a calculated normal vector according to an embodiment of the present disclosure. Figure 7B is a schematic diagram of the final output of planar segmentation according to an embodiment of the present disclosure.

[0074] Figure 4 is a flowchart illustrating the process of performing planar meshing and grouping to generate a 3D model according to embodiments of this disclosure. As shown in Figure 4, the flowchart includes: In step 401: The Delaunay triangulation algorithm is applied to the voxel-based planes. It projects the points in space onto the plane, performs triangulation on the resulting 2D points, and then maps them back into 3D space. Since the input points already lie in a 2D planar structure, this process achieves more efficient (fewer point computations per plane) and higher-quality (avoiding noise in 3D space) triangulation. Optionally, the minimum number of triangles and the triangle size in a plane can be adjusted according to the mesh resolution configuration.
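The projection into the plane and back can be sketched in plain Python. The sketch covers only the two coordinate changes; the 2D triangulation itself is left to an existing routine (for example `scipy.spatial.Delaunay`, which returns triangles as index triples that remain valid for the 3D points). The helper names and the way the in-plane basis is chosen are assumptions of this example.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a):
    length = math.sqrt(_dot(a, a))
    return tuple(x / length for x in a)


def plane_basis(normal):
    """Return two orthonormal vectors (u, v) that span the plane."""
    n = _unit(normal)
    helper = (1.0, 0.0, 0.0) if abs(n[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = _unit(_cross(n, helper))
    return u, _cross(n, u)


def to_plane_2d(points, origin, normal):
    """Project 3D points onto the plane and express them in (u, v) coordinates."""
    u, v = plane_basis(normal)
    return [(_dot(_sub(p, origin), u), _dot(_sub(p, origin), v)) for p in points]


def to_space_3d(coords, origin, normal):
    """Map 2D plane coordinates back to 3D points lying exactly on the plane."""
    u, v = plane_basis(normal)
    return [tuple(origin[i] + a * u[i] + b * v[i] for i in range(3))
            for a, b in coords]


# Plane z = 2; the third point is 5 cm off the plane (simulated noise).
origin, normal = (0.0, 0.0, 2.0), (0.0, 0.0, 1.0)
points = [(1.0, 2.0, 2.0), (3.0, -1.0, 2.0), (0.0, 0.0, 2.05)]
coords = to_plane_2d(points, origin, normal)   # a 2D triangulation would run on these
restored = to_space_3d(coords, origin, normal)
```

The noisy third point comes back at z = 2, which illustrates why triangulating in the plane avoids out-of-plane noise.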

[0075] In step 402: The texture of a plane can be created from the raw RGB values of the pixels by mapping the pixels to the points in the segmented plane. Optionally, the texture resolution can be adapted to the mesh resolution configuration.

[0076] In step 403: A plane with a textured mesh is created using the desired 3D representation. Optionally, the planar mesh can be extracted, optimized, and grouped into a complete object, with hole filling performed on the surface.

[0077] Figure 7C is a schematic diagram of the triangulation result of a segmented plane according to an embodiment of the present disclosure. Figure 7D is a schematic diagram of a 3D model according to an embodiment of the present disclosure.

[0078] The following describes an exemplary process for training a semantic segmentation model. Machine learning methods can be used to implement a context-based training process, which takes the context-based features of points into account when training a point cloud segmentation model. Many unsupervised or supervised methods exist for implementing such model training; the algorithms or methods are not limited. Here, an example of context-based training is presented that uses a random forest classifier to train a model for the electronics domain with factory-level objects.

[0079] Figure 5 is a flowchart illustrating the training of a semantic segmentation model according to embodiments of the present disclosure. The flowchart includes: In step 501: A plane classification model is trained based on the contextual features of the segmented planes, which are obtained from the input points by running the plane segmentation model. This means that a training dataset can be generated from the plane segmentation model, and context-based and geometric features can be computed automatically for the subsequent labeling job. Training requires only 200-300 labeled samples out of the total number of planes generated from the entire factory (typically more than 10,000 planes).

[0080] In step 502: Train a model to classify factory-level objects used for feature engineering. The types and features of the planes required as input variables for training are shown in Table 1:

[0081] Table 1

[0082] The definition of variables and relevant contextual features is important for training a model with high accuracy. There are point-based features, planar geometry-based features, and spatial features used for training inputs, which together are correlated with the type of plane.

[0083] (1) Point-based features: 1. Bulk density is the normalized point density of a planar volume.

[0084] 2. A point in a plane is a set of points in a plane based on a predefined density standard.

[0085] 3. Adjacent points are the normalized number of points in all adjacent planes within a radius of 0.02m.

[0086] (2) Planar geometry-based features: 1. The maximum plane length is the maximum length of the plane.

[0087] 2. The ratio of length to width is the ratio of the maximum length to the second-longest length of the plane.

[0088] 3. The slope of a plane is the normalized value of the ratio of the maximum length along the z-axis to the length of the plane.

[0089] 4. The normal value is the normal vector of the plane, expressed as (x, y, z).

[0090] 5. The RGB value is the average of the RGB values of all points in the plane.

[0091] (3) Spatial characteristics: 1. Maximum spatial height is the normalized value of the maximum plane height in the entire point cloud space.

[0092] 2. Minimum spatial height is the normalized value of the minimum plane height in the entire point cloud space.

[0093] The above exemplary description provides typical examples of multidimensional contextual features, and those skilled in the art will recognize that this specification is exemplary and not intended to limit the scope of protection of the embodiments disclosed herein.
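As a rough illustration of how a few of the listed features might be computed for one segmented plane, the following is a minimal sketch in Python. The function name `plane_features`, the point layout `(x, y, z, r, g, b)`, the use of axis-aligned extents as an approximation of the plane's lengths, and the normalization of heights by the scene's height range are all assumptions made for illustration; the disclosure does not specify them.

```python
from statistics import mean

def plane_features(points, scene_z_min, scene_z_max):
    """Compute a few illustrative per-plane features.

    points: list of (x, y, z, r, g, b) tuples belonging to one plane.
    scene_z_min, scene_z_max: height range of the whole point cloud,
    used to normalize heights (an assumed normalization; must differ).
    """
    xs, ys, zs = ([p[i] for p in points] for i in range(3))
    # Axis-aligned extents approximate the plane's side lengths (assumption).
    extents = sorted(
        (max(xs) - min(xs), max(ys) - min(ys), max(zs) - min(zs)),
        reverse=True,
    )
    max_len, second_len = extents[0], extents[1]
    z_range = scene_z_max - scene_z_min
    return {
        "max_length": max_len,
        "length_width_ratio": max_len / second_len if second_len > 0 else float("inf"),
        # Extent along the z-axis relative to the plane length.
        "slope": (max(zs) - min(zs)) / max_len if max_len > 0 else 0.0,
        "mean_rgb": tuple(mean(p[i] for p in points) for i in (3, 4, 5)),
        "max_height": (max(zs) - scene_z_min) / z_range,
        "min_height": (min(zs) - scene_z_min) / z_range,
    }

# A 4 m x 2 m horizontal patch at z = 1 m in a scene spanning 0 to 4 m.
floor_patch = [
    (0, 0, 1, 100, 100, 100), (4, 0, 1, 120, 120, 120),
    (4, 2, 1, 100, 100, 100), (0, 2, 1, 120, 120, 120),
]
features = plane_features(floor_patch, scene_z_min=0.0, scene_z_max=4.0)
```

Point-density and adjacency features are omitted here because they depend on a neighbor search that the text does not detail.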

[0094] In step 503: Train the random forest classifier. Random forest is an ensemble supervised learning method: multiple prediction models are generated simultaneously, and their results are combined to improve prediction accuracy. During training, the maximum tree depth is set to 40, the number of trees is set to 200, and the split variable value is set to 4. At the end of training, the out-of-bag (OOB) error is approximately 0.05.
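The hyperparameters above can be sketched with scikit-learn's `RandomForestClassifier`. This is only an illustration under stated assumptions: the disclosure does not name a library, the data below is synthetic, and mapping the "split variable value" to `max_features` (the number of features considered at each split) is one possible reading rather than a confirmed one.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for roughly 300 labeled planes with 10 features each.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=6, n_classes=3, random_state=0
)

clf = RandomForestClassifier(
    n_estimators=200,   # number of trees
    max_depth=40,       # maximum tree depth
    max_features=4,     # assumed reading of the "split variable value"
    oob_score=True,     # score each sample using trees that did not see it
    random_state=0,
)
clf.fit(X, y)

# scikit-learn reports OOB accuracy; the OOB error is its complement.
oob_error = 1.0 - clf.oob_score_
```

The OOB error obtained on synthetic data says nothing about the value of approximately 0.05 reported for the factory dataset.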

[0095] In step 504: Generate a 3D model. The model can be used after step 503 (running the plane segmentation model) in the system workflow. The planes grown from the input point cloud can be classified with semantic labels using the trained model. Note that predicted labels are not adopted for very small planes (fewer than 1000 points) or for planes with low confidence (less than 0.7). These unadopted planes are then grouped with the adopted classified planes into complete 3D objects using Euclidean distance clustering.
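The label adoption rule and the grouping step might look like the following minimal sketch. The thresholds of 1000 points and 0.7 confidence come from the text; everything else is an assumption for illustration, including clustering on plane centroids rather than on raw points, the `max_dist` parameter, and giving each cluster the most common adopted label among its member planes.

```python
import math
from collections import Counter

MIN_POINTS = 1000      # smaller planes keep no predicted label
MIN_CONFIDENCE = 0.7   # less confident planes keep no predicted label

def adopt_labels(planes):
    """Return each plane's label, or None if the prediction is not adopted."""
    return [
        p["label"]
        if p["n_points"] >= MIN_POINTS and p["confidence"] >= MIN_CONFIDENCE
        else None
        for p in planes
    ]

def euclidean_clusters(centroids, max_dist):
    """Group indices whose centroids are chained within max_dist of each other."""
    parent = list(range(len(centroids)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if math.dist(centroids[i], centroids[j]) <= max_dist:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(centroids)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

def group_into_objects(planes, max_dist):
    """Cluster planes and label each cluster from its adopted planes."""
    labels = adopt_labels(planes)
    objects = []
    for members in euclidean_clusters([p["centroid"] for p in planes], max_dist):
        adopted = [labels[i] for i in members if labels[i] is not None]
        label = Counter(adopted).most_common(1)[0][0] if adopted else None
        objects.append({"label": label, "planes": members})
    return objects

planes = [
    {"centroid": (0.0, 0.0, 0.0), "n_points": 5000, "confidence": 0.90, "label": "conveyor"},
    {"centroid": (0.3, 0.0, 0.0), "n_points": 400, "confidence": 0.80, "label": "pipe"},
    {"centroid": (9.0, 9.0, 0.0), "n_points": 8000, "confidence": 0.95, "label": "pillar"},
]
objects = group_into_objects(planes, max_dist=0.5)
```

In this example the small second plane loses its own "pipe" prediction and is absorbed into the nearby conveyor object, while the distant pillar forms its own object.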

[0096] Figure 7E is a schematic diagram of semantic classification results generated by a trained random forest model according to an embodiment of the present disclosure.

[0097] Options for user practice of embodiments of this disclosure may include (1) allowing the user to select a UI mode for GUI operation or command-line operation; (2) allowing the user to select an input file and preview the input; (3) allowing the user to configure the model and method based on the input point cloud; and (4) allowing the user to view and save the results. All of the above processes can also be implemented by running a program API for integration with external 3D software.

[0098] Figure 6 is a structural diagram of a system for generating 3D models of industrial scenes according to embodiments of the present disclosure.

[0099] As shown in Figure 6, system 10 includes: an interface module 11 configured to acquire point cloud data for an industrial scene; a model library 12 configured to store multiple trained semantic segmentation models associated with corresponding domains; a segmentation module 13 configured to determine the domain of the industrial scene, acquire the trained semantic segmentation model associated with the domain from the model library 12, and perform semantic segmentation on the point cloud data based on the trained semantic segmentation model associated with the domain to obtain multiple segmented objects; and a generation module 14 configured to mesh the multiple segmented objects and generate a 3D model of the industrial scene based on the meshed multiple segmented objects.
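The data flow between these modules can be sketched as follows. This is a structural illustration only: the class name `System`, its method names, the toy segmentation model, and the placeholder "meshing" that merely counts points are hypothetical and stand in for components the disclosure describes at a functional level.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float, float]
Segments = Dict[str, List[Point]]

@dataclass
class System:
    # Model library: domain name -> segmentation model (a plain callable here).
    model_library: Dict[str, Callable[[List[Point]], Segments]] = field(default_factory=dict)

    def determine_domain(self, points: List[Point], user_hint: str) -> str:
        # Placeholder: trust the user's hint; a trained domain model could go here.
        return user_hint

    def segment(self, points: List[Point], domain: str) -> Segments:
        model = self.model_library[domain]  # fetch the domain-specific model
        return model(points)                # label -> points of that object

    def generate(self, segments: Segments) -> List[dict]:
        # Placeholder for meshing: record each object's label and point count.
        return [{"label": label, "n_points": len(pts)} for label, pts in segments.items()]

    def run(self, points: List[Point], user_hint: str) -> List[dict]:
        domain = self.determine_domain(points, user_hint)
        return self.generate(self.segment(points, domain))

def toy_electronics_model(points: List[Point]) -> Segments:
    # Hypothetical stand-in: points below z = 0.1 are floor, the rest machine.
    return {
        "floor": [p for p in points if p[2] < 0.1],
        "machine": [p for p in points if p[2] >= 0.1],
    }

system = System(model_library={"electronics": toy_electronics_model})
model_3d = system.run([(0, 0, 0.0), (1, 0, 0.0), (0, 0, 1.2)], user_hint="electronics")
```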

[0100] In one embodiment, the segmentation module 13 includes a domain selector 131 configured to input point cloud data into a trained domain determination model and obtain a domain from the output of the trained domain determination model.

[0101] In one embodiment, system 10 includes a training module 16 configured to: input first point cloud training data with domain labels into a first neural network model; receive a first classification result of the domain to which the first point cloud training data belongs from the first neural network model; determine a first loss function value based on the difference between the domain label and the first classification result; configure the model parameters of the first neural network model so that the first loss function value is lower than a first preset threshold; and determine the configured first neural network model as a trained domain determination model.

[0102] Preferably, the training module 16 is configured to: input second point cloud training data with semantic labels into a second neural network model, wherein the second point cloud training data includes multi-dimensional contextual features and is associated with a domain; receive semantic segmentation results of the second point cloud training data from the second neural network model; determine a second loss function value based on the difference between the semantic labels and the semantic segmentation results; configure the model parameters of the second neural network model so that the second loss function value is lower than a second preset threshold; determine the configured second neural network model as a domain-associated trained semantic segmentation model; and store the domain-associated trained semantic segmentation model in the model library 12.
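The stopping criterion shared by both training procedures (adjust parameters until the loss falls below a preset threshold) can be illustrated with a deliberately tiny model. In this sketch a one-feature logistic classifier trained by gradient descent stands in for the neural network; the function name, learning rate, epoch limit, and toy data are assumptions, and the disclosure does not prescribe a particular loss or optimizer.

```python
import math

def train_until_threshold(samples, labels, threshold, lr=0.5, max_epochs=10000):
    """Fit a 1-feature logistic classifier until mean cross-entropy < threshold."""
    w, b = 0.0, 0.0
    loss = float("inf")
    for _ in range(max_epochs):
        # Forward pass: predicted probability of class 1 for each sample.
        probs = [1.0 / (1.0 + math.exp(-(w * x + b))) for x in samples]
        # Loss value from the difference between labels and predictions.
        loss = -sum(
            y * math.log(p) + (1 - y) * math.log(1 - p)
            for p, y in zip(probs, labels)
        ) / len(samples)
        if loss < threshold:
            break  # parameters are now "configured": loss is below the threshold
        # Gradient step on the model parameters.
        grad_w = sum((p - y) * x for p, x, y in zip(probs, samples, labels)) / len(samples)
        grad_b = sum(p - y for p, y in zip(probs, labels)) / len(samples)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, loss

samples = [-2.0, -1.0, 1.0, 2.0]
labels = [0, 0, 1, 1]
w, b, final_loss = train_until_threshold(samples, labels, threshold=0.1)
```

A practical implementation would also bound the number of epochs, as done here with `max_epochs`, since a loss threshold alone may never be reached on data that is not separable.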

[0103] The following is a detailed description of the modules in System 10.

[0104] Interface Module 11: This module handles input and output. Input data can come from a file system or any other stream in various point cloud formats, and the output can be a structured model generated in various formats. The input loader supports point cloud data in formats such as E57, PLY, and XYZ files. The output results can be saved as 3D models in formats such as OBJ, IGES, and JT files.
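For the simplest of these formats, loading and saving might be sketched as below. The function names `parse_xyz` and `to_obj` are hypothetical, and the sketch assumes an XYZ file with one point per line and no header or comment lines; E57, PLY, IGES, and JT need dedicated parsers or libraries and are not covered.

```python
def parse_xyz(text):
    """Parse XYZ text: one point per line, first three fields are x, y, z."""
    points = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:  # skip blank or short lines
            points.append(tuple(float(v) for v in fields[:3]))
    return points

def to_obj(vertices, faces):
    """Serialize vertices and triangles (0-based indices) as Wavefront OBJ text."""
    lines = [f"v {x} {y} {z}" for x, y, z in vertices]
    # OBJ face indices are 1-based.
    lines += [f"f {a + 1} {b + 1} {c + 1}" for a, b, c in faces]
    return "\n".join(lines) + "\n"

points = parse_xyz("0 0 0\n1 0 0\n0 1 0\n")
obj_text = to_obj(points, faces=[(0, 1, 2)])
```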

[0105] User guidance for domain and context selection can be implemented as a UI mode switcher 18 (supporting both GUI and command-line input). User guidance can come from general statements about the domain (e.g., the electronics assembly industry, factory level) or more precise statements (e.g., the electronics assembly industry, storage area).

[0106] Point Cloud Processing Module: This is the core module of the system, comprising a segmentation module 13, a generation module 14, and a model library 12. The model library 12 is a collection of models trained with different segmentation and meshing algorithms based on specific domains of industrial scenes. The segmentation module 13 provides a domain selector 131 to select a suitable model for the input point cloud scene. The generation module 14 provides a resolution adapter 141 that lets the user configure the meshing resolution of segmented objects based on their semantic classification. The meshing resolution can be offered as high, medium, and low options; the resolution adapter 141 adjusts the triangulation and texture processing algorithms accordingly and can also configure the resolution for specific objects. For example, a segmented object with the semantic label "chair" can be assigned a high resolution, while the object "pillar" can be assigned a low resolution.
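The per-label resolution selection could be sketched as a simple lookup. The "chair" and "pillar" assignments come from the example in the text; the target edge lengths, the "medium" default, and the function name `meshing_resolution` are hypothetical values chosen for illustration.

```python
# Hypothetical target triangle edge lengths (meters) per resolution option.
EDGE_LENGTH = {"high": 0.01, "medium": 0.05, "low": 0.20}

# Per-label configuration; labels not listed here use the default option.
LABEL_RESOLUTION = {"chair": "high", "pillar": "low"}

def meshing_resolution(label, default="medium", overrides=None):
    """Return (resolution option, target edge length) for a semantic label."""
    table = {**LABEL_RESOLUTION, **(overrides or {})}
    option = table.get(label, default)
    return option, EDGE_LENGTH[option]

chair = meshing_resolution("chair")
pillar = meshing_resolution("pillar")
conveyor = meshing_resolution("conveyor")
custom = meshing_resolution("conveyor", overrides={"conveyor": "high"})
```

The `overrides` argument reflects the idea that a user can configure the resolution for specific objects without changing the built-in table.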

[0107] Significantly, prior knowledge from general industrial plant layouts and on-site asset categories is leveraged to predefine semantic segmentation models associated with the corresponding domains. In the detailed design, the domain selector 131 provides options for the trained models hierarchically. Various industrial domains exist (electronics, robotics, warehousing and office scenarios, processing industries, etc.), and importantly, within each domain, user-configurable segmentation levels are defined to provide more accurate models in specific situations. Multiple levels can be provided, typically training plant-level models for object categories such as floors / walls / ceilings / pipes / pillars / long production lines / large storage. Then, at this level, a scene-level model can be trained in the volume of production areas (such as assembly stations, machines, conveyors), and another scene-level model can be trained in the volume of warehousing areas (such as storage piles, forklifts, material containers). Following this hierarchical structure, more scenarios can be defined, and corresponding models can be trained into the model library 12.
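One way to picture the hierarchical selection is a library keyed by domain and level. The class below is a minimal sketch: the names `ModelLibrary`, `register`, and `select`, the string placeholders used as models, and the fallback from a missing scene-level model to the plant-level model are assumptions, not behavior stated in the disclosure.

```python
class ModelLibrary:
    """Models keyed by (domain, level), with fallback to the plant-level model."""

    def __init__(self):
        self._models = {}

    def register(self, domain, level, model):
        self._models[(domain, level)] = model

    def select(self, domain, level="plant"):
        # Prefer the requested scene-level model; otherwise use the plant level.
        for key in ((domain, level), (domain, "plant")):
            if key in self._models:
                return self._models[key]
        raise KeyError(f"no model registered for domain {domain!r}")

library = ModelLibrary()
library.register("electronics", "plant", "electronics-plant-model")
library.register("electronics", "production", "electronics-production-model")

production_model = library.select("electronics", "production")
fallback_model = library.select("electronics", "warehousing")
```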

[0108] Training Module 16: This module supports expanding the model library 12 with more domain-specific models. Essentially, all models in the model library 12 are trained based on the contextual features in Training Module 16, but different semantic segmentation methods can be applied according to specific domains and requirements, such as planar segmentation algorithms based on region growing, deep learning algorithms based on the PointNet++ backbone, and scene semantic segmentation methods based on unsupervised contrastive learning. System 10 can have built-in trained models for general domains such as electronics, robotics, warehousing, and office scenarios, and also provides the possibility of expanding the model library 12 within Training Module 16.

[0109] API Library 17: This module is a collection of APIs for the various functions of Interface Module 11, UI Mode Switcher 18, and Point Cloud Processing Module 15. It can be easily accessed by external industrial 3D modeling software (e.g., Siemens ProcessSimulate or NX software), and it also supports integration into such software as a plug-in for direct use.

[0110] Based on the above description, the embodiments of this disclosure present at least one of the following advantages: (1) A novel planar point cloud processing method connects the segmentation and meshing processes within a single system, making point cloud processing more efficient and flexible. It processes large-scale inputs (even exceeding 100 million points) much faster because the workload of processing a set of planes is significantly smaller than that of processing a large number of individual points. The planar growth process can also remove noise interference ahead of classification and triangulation. Plane-based machine learning has the following advantages: the model is lightweight and easy to implement; it requires only a small training dataset and minimal labeling work; and the input features of the planes are interpretable, readily available, and computationally efficient.

[0111] (2) Support for industrial context processing that is hierarchical from the plant level to the object level. Various industry-specific, domain-oriented models provide flexible and scalable processing from raw data to the desired results. In particular, adaptable processing is proposed in which machine learning models and the processing flow are automatically aligned with the target content.

[0112] (3) Modular componentization to support various implementation architectures and integration into any engineering workflow.

[0113] Embodiments of this disclosure also propose an electronic device having a processor-memory architecture. Figure 8 is an exemplary structural diagram of an electronic device having a processor-memory architecture according to embodiments of the present disclosure. As shown in Figure 8, the electronic device 800 includes a processor 801, a memory 802, and a computer program stored on the memory 802 that can run on the processor 801. When the computer program is executed by the processor 801, a method for generating a 3D model of an industrial scene as described in any of the preceding embodiments is implemented. The memory 802 can be implemented as various storage media, such as electrically erasable programmable read-only memory (EEPROM), flash memory, programmable read-only memory (PROM), etc. The processor 801 can be implemented as including one or more central processing units (CPUs) or one or more field-programmable gate arrays (FPGAs), wherein the FPGA integrates one or more CPU cores. Specifically, the central processing unit or core can be implemented as a CPU, MCU, DSP, etc.

[0114] It should be noted that not all steps and modules in the above process and structure diagrams are necessary, and some steps or modules may be omitted as needed. The execution order of each step is not fixed and can be adjusted as required. The division of each module is merely for the convenience of describing the functional division used. In actual implementations, a module can be divided into multiple modules, and the functions of multiple modules can also be implemented by the same module. These modules can be in the same device or different devices.

[0115] The hardware modules in each implementation can be implemented mechanically or electronically. For example, a hardware module may contain specially designed permanent circuitry or logic devices (e.g., a dedicated processor, such as an FPGA or ASIC) to perform a specific operation. A hardware module may also contain programmable logic devices or circuitry (e.g., a general-purpose processor or other programmable processor) temporarily configured by software to perform a specific operation. The specific use of mechanical methods, or the use of dedicated permanent circuitry, or the use of circuitry (e.g., software configuration) temporarily configured to implement the hardware module, can be determined based on cost and time considerations.

[0116] The above are merely preferred embodiments of this disclosure and are not intended to limit the scope of protection of this disclosure. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this disclosure should be included within the scope of protection of this disclosure.

Claims

1. A method for generating a 3D model of an industrial scene, comprising: acquiring (101) point cloud data for the industrial scene; determining (102) a domain of the industrial scene; obtaining (103) a trained semantic segmentation model associated with the domain from a model library, wherein the model library includes a plurality of trained semantic segmentation models associated with respective domains; performing (104) semantic segmentation on the point cloud data based on the trained semantic segmentation model associated with the domain to obtain a plurality of segmented objects; meshing (105) the plurality of segmented objects; and generating (106) a 3D model of the industrial scene based on the meshed plurality of segmented objects.

2. The method according to claim 1, wherein determining (102) the domain of the industrial scene includes: receiving user input including descriptive information about the domain; and determining the domain based on the descriptive information.

3. The method according to claim 1, wherein determining (102) the domain of the industrial scene includes: inputting the point cloud data into a trained domain determination model; and obtaining the domain from the output of the trained domain determination model.

4. The method according to claim 3, comprising: inputting first point cloud training data with domain labels into a first neural network model; receiving, from the first neural network model, a first classification result of the domain to which the first point cloud training data belongs; determining a first loss function value based on the difference between the domain labels and the first classification result; configuring the model parameters of the first neural network model so that the first loss function value is lower than a first preset threshold; and determining the configured first neural network model as the trained domain determination model.

5. The method of claim 1, wherein meshing (105) the plurality of segmented objects comprises: evaluating the importance of each segmented object based on the category into which it is classified; determining a respective resolution for each segmented object based on its respective importance; and meshing each segmented object based on its respective resolution.

6. The method of claim 5, wherein determining the respective resolution follows at least one of the following criteria: the more important the importance evaluation result indicates a segmented object to be, the higher its resolution; the less important the importance evaluation result indicates a segmented object to be, the lower its resolution.

7. The method according to any one of claims 1 to 6, comprising: performing plane segmentation on the point cloud data to obtain a plurality of planes; classifying the plurality of planes based on the trained semantic segmentation model; and determining the plurality of segmented objects, each of which includes planes of the same category.

8. The method according to any one of claims 1 to 6, the method comprising a process of training a semantic segmentation model, the process comprising: inputting second point cloud training data with semantic labels into a second neural network model, wherein the second point cloud training data includes multi-dimensional contextual features and is associated with the domain; receiving semantic segmentation results of the second point cloud training data from the second neural network model; determining a second loss function value based on the difference between the semantic labels and the semantic segmentation results; configuring the model parameters of the second neural network model so that the second loss function value is lower than a second preset threshold; determining the configured second neural network model as the trained semantic segmentation model associated with the domain; and storing the trained semantic segmentation model associated with the domain in the model library.

9. A system for generating a 3D model of an industrial scene, comprising: an interface module (11) configured to acquire point cloud data for the industrial scene; a model library (12) configured to store a plurality of trained semantic segmentation models associated with respective domains; a segmentation module (13) configured to determine the domain of the industrial scene, obtain a trained semantic segmentation model associated with the domain from the model library (12), and perform semantic segmentation on the point cloud data based on the trained semantic segmentation model associated with the domain to obtain a plurality of segmented objects; and a generation module (14) configured to mesh the plurality of segmented objects and generate a 3D model of the industrial scene based on the meshed plurality of segmented objects.

10. The system of claim 9, wherein the segmentation module (13) includes a domain selector (131) configured to input the point cloud data into a trained domain determination model and obtain the domain from the output of the trained domain determination model.

11. The system according to claim 10, comprising a training module (16), the training module being configured to: input first point cloud training data having domain labels into a first neural network model; receive from the first neural network model a first classification result of the domain to which the first point cloud training data belongs; determine a first loss function value based on the difference between the domain labels and the first classification result; configure the model parameters of the first neural network model so that the first loss function value is lower than a first preset threshold; and determine the configured first neural network model as the trained domain determination model.

12. The system according to claim 11, wherein the training module (16) is configured to: input second point cloud training data with semantic labels into a second neural network model, wherein the second point cloud training data includes multi-dimensional contextual features and is associated with a domain; receive semantic segmentation results of the second point cloud training data from the second neural network model; determine a second loss function value based on the difference between the semantic labels and the semantic segmentation results; configure the model parameters of the second neural network model so that the second loss function value is lower than a second preset threshold; determine the configured second neural network model as a trained semantic segmentation model associated with the domain; and store the trained semantic segmentation model associated with the domain in the model library (12).

13. An electronic device comprising a processor (801) and a memory (802), wherein an application program executable by the processor (801) is stored in the memory (802) to cause the processor (801) to execute a method for generating a 3D model of an industrial scene according to any one of claims 1 to 8.

14. A computer-readable medium comprising computer-readable instructions stored thereon, wherein the computer-readable instructions are configured to perform a method for generating a 3D model of an industrial scene according to any one of claims 1 to 8.

15. A computer program product comprising a computer program, which, when executed by a processor, performs a method for generating a 3D model of an industrial scene according to any one of claims 1 to 8.