A method, device and medium for fast retrieval of a three-dimensional model

By performing image segmentation and feature fusion on a user-drawn partial 2D sketch, a second key feature is generated, which solves the problem of insufficient retrieval accuracy caused by incomplete sketches in existing technologies. This enables fast and accurate retrieval from incomplete sketches to target 3D models, improving design efficiency and user experience.

CN121808097B · Active · Publication Date: 2026-06-26 · SHANDONG HUAYUN 3D TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: SHANDONG HUAYUN 3D TECH CO LTD
Filing Date: 2026-03-10
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

In the existing technology, sketch-based 3D model retrieval methods require users to draw relatively complete and accurate sketches. However, due to users' vague memories or incomplete initial design, only partial or incomplete sketches can be provided during retrieval, making it difficult to accurately match the user's design requirements.

Method used

The user's partial 2D sketch is segmented into multiple sub-images. Each sub-image is mapped to an initial feature, attention calculation over the initial features yields multiple first key features, and these are fused into a second key feature. The second key feature is then used for similarity matching in a feature database, so that a 3D model matching the user's design intent can be retrieved quickly and accurately from a massive database.

Benefits of technology

It significantly improves the accuracy and efficiency of 3D model retrieval: users can start a search from an incomplete sketch and gradually find the target model by iteratively adding details, which improves design efficiency and user experience.

✦ Generated by Eureka AI based on patent content.

Abstract

The application discloses a three-dimensional model fast retrieval method, equipment and medium, and relates to the computer technical field. The method comprises the following steps: in response to receiving a part of a two-dimensional sketch of a three-dimensional model drawn by a user, performing image segmentation on the part of the two-dimensional sketch to obtain a plurality of sub-images; performing mapping processing on each sub-image to obtain an initial feature corresponding to each sub-image, and performing attention calculation on the initial features corresponding to the plurality of sub-images to obtain a plurality of first key features; fusing the plurality of first key features to obtain a second key feature corresponding to the part of the two-dimensional sketch; based on the second key feature, performing similarity matching on a feature database to obtain a plurality of target features, and searching a three-dimensional model set corresponding to the plurality of target features from a three-dimensional model database. In this way, the target three-dimensional model meeting the design intention of the user can be quickly and accurately retrieved according to the part of the sketch of the model drawn by the user, so that the design efficiency and the user experience are significantly improved.

Description

Technical Field

[0001] This application relates to the field of computer technology, and in particular to a method, device and medium for rapid retrieval of three-dimensional models.

Background Technology

[0002] With the widespread application of computer-aided design (CAD) and 3D modeling technologies, users have an increasing demand for quickly and accurately retrieving models that match their design intent from massive 3D model libraries. Keyword or tag-based search methods are limited in the field of CAD due to their difficulty in accurately describing complex geometric shapes. However, in the field of 3D modeling, users prefer to intuitively express their search intent by drawing simple 2D outlines, making sketch-based 3D model retrieval methods a focus of widespread attention.

[0003] In related technologies, sketch-based 3D model retrieval methods often require users to draw relatively complete and accurate sketches. However, in practical application scenarios, due to reasons such as users' vague memory, focusing only on local features of the model, or incomplete initial design concepts, only partial or incomplete model sketches can be provided during retrieval. As a result, the accuracy of the retrieved model matching is insufficient, making it difficult to accurately meet the user's design needs.

Summary of the Invention

[0004] This application provides a method, device, and medium for rapid retrieval of 3D models to solve the following technical problem: how to quickly and accurately retrieve a target 3D model that conforms to the user's design intent based on a partial sketch of the model drawn by the user.

[0005] In a first aspect, embodiments of this application provide a method for rapid retrieval of three-dimensional models, the method comprising:

[0006] In response to receiving a partial two-dimensional sketch of a three-dimensional model drawn by a user, the partial two-dimensional sketch is image segmented to obtain multiple sub-images;

[0007] Each of the sub-images is mapped to obtain the initial features corresponding to each sub-image, and attention is calculated on the initial features corresponding to multiple sub-images to obtain multiple first key features;

[0008] By fusing multiple first key features, a second key feature corresponding to the partial two-dimensional sketch is obtained;

[0009] Based on the second key feature, similarity matching is performed on the feature database to obtain multiple target features, and a set of three-dimensional models corresponding to the multiple target features is searched from the three-dimensional model database. The set of three-dimensional models contains one or more three-dimensional models.

[0010] In a second aspect, embodiments of this application also provide a rapid retrieval device for three-dimensional models, the device comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the rapid retrieval method for three-dimensional models described above.

[0011] In a third aspect, embodiments of this application also provide a computer storage medium storing computer-executable instructions, which, when executed, implement the method for rapid retrieval of three-dimensional models described above.

[0012] The present application provides a method, device, and medium for rapid retrieval of three-dimensional models, which has the following beneficial effects:

[0013] First, image segmentation is performed on the partial 2D sketch of the 3D model to obtain multiple sub-images. This transforms continuous, unstructured visual information into machine-processable discretized units, laying the foundation for subsequent refined feature extraction. Next, each sub-image undergoes mapping processing, abstracting low-level pixel information into high-dimensional initial features. Attention calculation is then applied to these initial features to obtain multiple first key features containing the local details and global structure of the partial 2D sketch. Subsequently, these dispersed first key features are fused, compressing the model's massive and complex spatial information into a highly condensed and standardized global vector, yielding a second key feature that represents the entire partial 2D sketch. This provides data support for rapid and accurate feature comparison against a massive database. Then, similarity matching is performed on a pre-constructed feature database based on the second key feature to filter out multiple target features highly similar to the partial 2D sketch. Based on the correspondence between target features and 3D models, highly similar 3D models in the 3D model database are determined for user selection. This allows for rapid and accurate retrieval of target 3D models that match the user's design intent based on the user-drawn partial sketch, significantly improving design efficiency and user experience.

Attached Figure Description

[0014] The accompanying drawings, which are included to provide a further understanding of this application and form part of this application, illustrate exemplary embodiments and are used to explain this application, but do not constitute an undue limitation of this application. In the drawings:

[0015] Figure 1 is a flowchart of the fast retrieval method for three-dimensional models provided in an embodiment of this application;

[0016] Figure 2 is an overall structural diagram of the fast retrieval method for three-dimensional models provided in an embodiment of this application;

[0017] Figure 3 is a schematic diagram of the internal structure of a rapid retrieval device for three-dimensional models provided in an embodiment of this application.

Detailed Implementation

[0018] To make the objectives, technical solutions, and advantages of this application clearer, the technical solutions of this application will be clearly and completely described below in conjunction with specific embodiments and corresponding drawings. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of them. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0019] It is understood that in the embodiments of this disclosure, data related to user information (such as user accounts) is involved. When the embodiments of this disclosure are applied to specific products or technologies, user permission or consent is required, and the collection, use and processing of related data must comply with relevant laws, regulations and standards.

[0020] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. The terminology used herein is for the purpose of describing embodiments of this disclosure only and is not intended to be limiting of this disclosure.

[0021] In the following description, the terms “first, second, ...” are used merely to distinguish similar objects and do not represent a specific ordering of objects. It is understood that “first, second, ...” may be interchanged in a specific order or sequence where permitted, so that the embodiments of this disclosure described herein can be implemented in an order other than that illustrated or described herein.

[0022] With the widespread application of computer-aided design (CAD) and 3D modeling technologies, users have an increasing demand for quickly and accurately retrieving models that match their design intent from massive 3D model libraries. Keyword or tag-based search methods are limited in the field of CAD due to their difficulty in accurately describing complex geometric shapes. However, in the field of 3D modeling, users prefer to intuitively express their search intent by drawing simple 2D outlines, making sketch-based 3D model retrieval methods a focus of widespread attention.

[0023] In related technologies, sketch-based 3D model retrieval methods often require users to draw relatively complete and accurate sketches. However, in practical application scenarios, due to reasons such as users' vague memory, focusing only on local features of the model, or incomplete initial design concepts, only partial or incomplete model sketches can be provided during retrieval. As a result, the accuracy of the retrieved model matching is insufficient, making it difficult to accurately meet the user's design needs.

[0024] Based on this, embodiments of this application provide a fast retrieval method for 3D models, which can accurately retrieve target 3D models that meet the user's design intent based on the user's partial sketch of the model, thereby significantly improving design efficiency and user experience.

[0025] The technical solutions proposed in the embodiments of this application will be described in detail below with reference to the accompanying drawings.

[0026] Figure 1 is a flowchart of a rapid retrieval method for 3D models provided in one or more embodiments of this application. This method can be applied to various types of 3D modeling scenarios, such as mechanical parts design, architectural and interior design, consumer product design, reverse engineering, and model repair. Certain input parameters or intermediate results in the process can be manually adjusted to help improve accuracy.

[0027] This disclosure provides a method for rapid retrieval of 3D models. It should be noted that the execution entity in these embodiments can be a server or any terminal device with data processing capabilities. For example, the server can be a standalone physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDNs), and big data and artificial intelligence platforms. The terminal device can be a smartphone, tablet, laptop, desktop computer, smart speaker, smartwatch, in-vehicle terminal, etc., but is not limited to these.

[0028] As shown in Figure 1, the fast retrieval method for three-dimensional models provided in this disclosure specifically includes the following steps:

[0029] Step 101: In response to receiving a partial two-dimensional sketch of a three-dimensional model drawn by the user, perform image segmentation on the partial two-dimensional sketch to obtain multiple sub-images.

[0030] It should be noted that a partial two-dimensional sketch of a three-dimensional model refers to a complete or partial two-dimensional projection image of the target three-dimensional model from a certain preset viewpoint (such as the front view, top view, left view, etc.). Image segmentation can be performed by uniform meshing according to a fixed size (such as 4×4 pixels) and step size, or by dynamic segmentation through deformable convolution, or by segmentation through sliding windows, etc., without specific limitations here.

[0031] As an example, suppose a portion of a 2D sketch of a 3D model (e.g., 224×224 pixels) is uniformly meshed according to a fixed size (e.g., 16×16 pixels). First, the input portion of the 2D sketch is preprocessed by standardization (e.g., adjusting the brightness and contrast of the sketch) to obtain a standardized sketch. Then, the standardized sketch is segmented by an image segmentation layer. The standardized sketch (e.g., 224×224 pixels) can be cut into 14 columns of small squares in the horizontal direction and 14 rows of small squares in the vertical direction. Finally, the standardized sketch is cut into 196 sub-images.
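The uniform meshing in this example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the sketch is assumed to be a 2D list of pixel values whose side lengths are multiples of the patch size, and the function name `split_into_patches` is hypothetical.

```python
def split_into_patches(image, patch):
    """Cut a 2D list (rows of pixel values) into non-overlapping patch x patch tiles."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    patches = []
    for top in range(0, height, patch):          # 14 rows of tiles for 224 / 16
        for left in range(0, width, patch):      # 14 columns of tiles for 224 / 16
            tile = [row[left:left + patch] for row in image[top:top + patch]]
            patches.append(tile)
    return patches


# A blank 224x224 sketch stands in for the standardized input.
sketch = [[0] * 224 for _ in range(224)]
sub_images = split_into_patches(sketch, 16)
print(len(sub_images))  # 14 * 14 = 196
```

The sub-images are returned in row-major order, which keeps the position of each tile recoverable from its index.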

[0032] Step 102: Perform mapping processing on each of the sub-images to obtain the initial features corresponding to each sub-image.

[0033] It should be noted that mapping an image to obtain its initial features refers to transforming low-level pixel information into high-level initial features. This can be achieved through methods such as single-layer linear mapping, multilayer perceptrons, and convolutional neural network autoencoders, without specific limitations here.

[0034] In some embodiments, step 102 described above can be implemented as follows: for each sub-image, the following processing is performed: converting the pixel data of the sub-image to a one-dimensional vector space to obtain a one-dimensional pixel feature corresponding to the sub-image; performing linear embedding processing on the one-dimensional pixel feature to obtain a high-dimensional pixel feature corresponding to the sub-image; and encoding the high-dimensional pixel feature based on the position information of each pixel in the sub-image to obtain an initial feature corresponding to the sub-image.

[0035] In this way, the pixel data of the sub-image is converted into a one-dimensional vector space, and the unstructured image data is digitized and structured to transform a two-dimensional pixel matrix into a linear digital sequence, allowing the system to perform subsequent processing in sequence. Then, the one-dimensional features are linearly embedded to map the features to a high-dimensional semantic space, so that the primary geometric information such as lines, angles, and textures represented by the pixel combinations are initially encoded, thus obtaining high-dimensional features rich in semantic information. Finally, based on the position information of each pixel in the sub-image, the generated high-dimensional features are positionally encoded, and spatial correlation information is added to each element in the high-dimensional features, which can provide a data foundation for subsequent attention mechanisms to perform relational reasoning.

[0036] As an example, suppose a 224×224 pixel partial 2D sketch of a 3D model is segmented into 196 sub-images based on a preset size (16×16 pixels). For each sub-image, the 256 pixels of the sub-image are arranged in a fixed order to form a one-dimensional pixel feature X of length 256, [p1, p2, p3, ..., p256]. Then, the one-dimensional pixel feature X is used as input to a fully connected layer for linear embedding processing: X is multiplied by a learnable weight matrix W, a learnable bias vector b is added to the product, and the output is the high-dimensional pixel feature Y corresponding to the sub-image. After that, the position information of each pixel in the sub-image is encoded to generate a position vector D of the same size as Y, and Y is added to D element by element to obtain the initial feature F corresponding to the sub-image.
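The flatten, embed, and add-position sequence described above amounts to F = (X·W + b) + D. The following sketch illustrates it with plain Python lists. Several details are assumptions made only for illustration: the weights are random stand-ins for learned parameters, the output size of 96 is borrowed from the 96-dimensional example used later in this description, and `embed_patch` is a hypothetical name.

```python
import random


def embed_patch(patch, weight, bias, position):
    """Flatten a patch, apply Y = X W + b, then add the position vector D."""
    x = [pixel for row in patch for pixel in row]      # one-dimensional pixel feature X
    dim = len(bias)
    y = [sum(x[i] * weight[i][j] for i in range(len(x))) + bias[j]
         for j in range(dim)]                          # high-dimensional pixel feature Y
    return [y[j] + position[j] for j in range(dim)]    # initial feature F = Y + D


random.seed(0)
PATCH, DIM = 16, 96  # DIM = 96 is an assumed embedding size
# Random values stand in for the learnable W, b and the position vector D.
weight = [[random.gauss(0, 0.02) for _ in range(DIM)] for _ in range(PATCH * PATCH)]
bias = [0.0] * DIM
position = [random.gauss(0, 0.02) for _ in range(DIM)]

patch = [[1.0] * PATCH for _ in range(PATCH)]
feature = embed_patch(patch, weight, bias, position)
print(len(feature))  # 96
```

In a trained system W, b, and D would be learned or computed from the position encoding scheme; only the data flow is shown here.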

[0037] Step 103: Perform attention calculation on the initial features corresponding to the multiple sub-images to obtain multiple first key features.

[0038] It should be noted that attention calculation can be achieved through self-attention, cross-attention, sparse attention, and multi-head attention, without specific limitations here; the first key feature refers to the features extracted from each sub-image and the cross-regional information between sub-images.

[0039] In some embodiments, step 103 described above can be implemented as follows: based on a preset window size, the initial features corresponding to the multiple sub-images are divided into multiple first windows; for each first window, attention calculation is performed on the initial features contained in the first window to obtain a first key feature corresponding to the first window; based on a preset window movement size, a shift operation is performed on each first window to obtain multiple second windows; for each second window, attention calculation is performed on the initial features contained in the second window to obtain a first key feature corresponding to the second window.

[0040] In this way, the first attention calculation is performed on a window-by-window basis, which can accurately learn the close relationship between sub-images with low computational cost, thereby efficiently capturing local contextual information. At the same time, by shifting the window and performing the second attention calculation, the relationship between different windows that cannot be obtained by the first attention calculation can be captured, thereby establishing a connection between different isolated local contextual information and improving the accuracy of subsequent model retrieval.

[0041] As an example, suppose the input model sketch is segmented into multiple 4×4 pixel sub-images, each corresponding to an initial feature. First, based on a preset window size (e.g., 12×12 pixels, i.e., 3×3 sub-images), the system groups every nine adjacent 4×4 sub-images into one first window, so each first window contains nine initial features. For each first window, the system performs multi-head self-attention computation on its nine initial features to obtain the first key feature corresponding to that first window. This first key feature mainly captures the correlation between the initial features within the first window, i.e., the feature of the first window itself. Subsequently, based on a preset window shift size (e.g., one sub-image unit), all first windows are shifted to form multiple second windows. This shifting operation recombines initial features that originally belonged to different first windows. For each second window, the system again performs multi-head self-attention computation on its initial features to obtain the first key feature corresponding to that second window. This first key feature mainly captures the correlation between the initial features within the second window, i.e., the correlation between neighboring first windows.
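The window division and shift can be illustrated on a small grid of feature indices. This is a sketch under assumptions: the grid side is a multiple of the window size, and the shift is implemented as a cyclic roll that wraps around the edges (the description only says the windows are shifted, so the wrap-around is an illustrative choice). The function names are hypothetical.

```python
def partition_windows(grid, win):
    """Group a square grid of features into non-overlapping win x win windows."""
    size = len(grid)
    windows = []
    for top in range(0, size, win):
        for left in range(0, size, win):
            windows.append([grid[r][c]
                            for r in range(top, top + win)
                            for c in range(left, left + win)])
    return windows


def cyclic_shift(grid, shift):
    """Roll the grid up and left by `shift` cells, wrapping around the edges."""
    size = len(grid)
    return [[grid[(r + shift) % size][(c + shift) % size] for c in range(size)]
            for r in range(size)]


# A 6x6 grid whose cells hold the index of the sub-image they came from.
grid = [[r * 6 + c for c in range(6)] for r in range(6)]
first_windows = partition_windows(grid, 3)                     # 3x3 sub-images per window
second_windows = partition_windows(cyclic_shift(grid, 1), 3)   # shift by one sub-image

print(first_windows[0])   # [0, 1, 2, 6, 7, 8, 12, 13, 14]
print(second_windows[0])  # [7, 8, 9, 13, 14, 15, 19, 20, 21]
```

The second window mixes indices that sat in four different first windows, which is what lets the second attention pass relate features across the original window borders.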

[0042] In some embodiments, the above-described attention calculation of the initial features contained in the first window to obtain the first key feature corresponding to the first window can be implemented in the following way: performing a linear transformation on each initial feature contained in the first window to obtain a first query vector, a first key vector, and a first value vector corresponding to each initial feature; performing attention head partitioning on the first query vector, first key vector, and first value vector corresponding to each initial feature to obtain a second query vector, a second key vector, and a second value vector corresponding to each attention head for each initial feature; performing the following processing on each initial feature of each attention head: calculating attention scores for the second query vector corresponding to the initial feature and the second key vector corresponding to each initial feature in the first window respectively to obtain the attention score between the initial feature and each initial feature in the first window; fusing the second value vector corresponding to each initial feature in the first window based on the attention score between the initial feature and each initial feature in the first window to obtain the output vector corresponding to the initial feature; concatenating the output vector corresponding to the initial feature in each attention head for each initial feature in the first window, and performing a linear transformation on the concatenation result to obtain the first key feature corresponding to each initial feature in the first window.

[0043] Thus, by generating query vectors, key vectors, and value vectors through linear transformation, a data foundation can be provided for subsequent attention calculations. By abstracting the initial features into vectors containing different information, the foundation is laid for asymmetric information exchange between features through attention calculations. Subsequently, through multi-head attention partitioning, each feature is assigned to different attention heads, and attention scores are calculated and fused within each attention head. This allows the model to quantify the correlation between features, achieve accurate information filtering and weighted aggregation, and concatenate and linearly transform the calculation results of each attention head. By comprehensively weighing the calculation results of different attention heads, the obtained first key feature is more accurate and comprehensive.

[0044] As an example, suppose the first window contains 9 sub-images. First, the system performs a linear transformation on the initial feature (e.g., a 96-dimensional vector) of each sub-image within the first window to obtain the first query vector Q, the first key vector K, and the first value vector V corresponding to each initial feature. Then, the system divides the 96-dimensional Q, K, and V vectors corresponding to each initial feature among 4 attention heads, each receiving a 24-dimensional sub-vector, namely {Attention Head 1: Q1, K1, V1; Attention Head 2: Q2, K2, V2; Attention Head 3: Q3, K3, V3; Attention Head 4: Q4, K4, V4}. Afterward, within each attention head, for each initial feature within the window, the second query vector of the initial feature is multiplied by the second key vectors of the nine initial features within the window to obtain the attention score between that initial feature and each initial feature within the window. Then, for each initial feature, the system uses these attention scores as weights to perform a weighted summation of the second value vectors of the nine initial features, resulting in the output vector corresponding to the initial feature. Thus, each attention head obtains nine output vectors. Finally, for each initial feature, the output vectors corresponding to the initial feature in each attention head are concatenated and linearly transformed through a fully connected layer to obtain the first key feature corresponding to the initial feature.
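The per-window computation in this example can be sketched as follows. The sketch follows the common scaled dot-product form of multi-head self-attention; the division by the square root of the head size and the softmax normalization of the scores are assumptions, since the description only states that scores are obtained by multiplication and then used as weights. The projection matrices are random stand-ins for learned parameters, and all function names are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (lists of lists)."""
    return [[sum(row[i] * b[i][j] for i in range(len(b))) for j in range(len(b[0]))]
            for row in a]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def window_attention(features, wq, wk, wv, wo, heads):
    """Multi-head self-attention over the initial features of one window."""
    q, k, v = matmul(features, wq), matmul(features, wk), matmul(features, wv)
    head_dim = len(wq[0]) // heads
    outputs = [[] for _ in features]
    for h in range(heads):
        lo, hi = h * head_dim, (h + 1) * head_dim   # this head's slice of Q, K, V
        for i, q_i in enumerate(q):
            # Score this feature's query against every key in the window.
            scores = [sum(a * b for a, b in zip(q_i[lo:hi], k_j[lo:hi])) / math.sqrt(head_dim)
                      for k_j in k]
            weights = softmax(scores)
            # Weighted sum of value sub-vectors; extending per head concatenates the heads.
            outputs[i].extend(sum(w * v_j[d] for w, v_j in zip(weights, v))
                              for d in range(lo, hi))
    return matmul(outputs, wo)                      # final linear transformation


def rand_matrix(rows, cols):
    return [[random.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
DIM, HEADS, TOKENS = 96, 4, 9                       # 4 heads of 24 dimensions each
window = rand_matrix(TOKENS, DIM)
wq, wk, wv, wo = (rand_matrix(DIM, DIM) for _ in range(4))
key_features = window_attention(window, wq, wk, wv, wo, HEADS)
print(len(key_features), len(key_features[0]))      # 9 96
```

The same function applies unchanged to a second window, since only the set of input features differs.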

[0045] In some embodiments, the above-described attention calculation of the initial features contained in the second window to obtain the first key feature corresponding to the second window can be implemented in the following way: performing a linear transformation on each initial feature contained in the second window to obtain a first query vector, a first key vector, and a first value vector corresponding to each initial feature; performing attention head partitioning on the first query vector, first key vector, and first value vector corresponding to each initial feature to obtain a second query vector, a second key vector, and a second value vector corresponding to each attention head for each initial feature; performing the following processing on each initial feature of each attention head: calculating attention scores for the second query vector corresponding to the initial feature and the second key vector corresponding to each initial feature in the second window respectively to obtain the attention score between the initial feature and each initial feature in the second window; fusing the second value vector corresponding to each initial feature in the second window based on the attention score between the initial feature and each initial feature in the second window to obtain the output vector corresponding to the initial feature; concatenating the output vector corresponding to the initial feature in each attention head for each initial feature in the second window, and performing a linear transformation on the concatenation result to obtain the first key feature corresponding to each initial feature in the second window.

[0046] Thus, by generating query vectors, key vectors, and value vectors through linear transformation, a data foundation can be provided for subsequent attention calculations. By abstracting the initial features into vectors containing different information, the foundation is laid for asymmetric information exchange between features through attention calculations. Subsequently, through multi-head attention partitioning, each feature is assigned to different attention heads, and attention scores are calculated and fused within each attention head. This allows the model to quantify the correlation between features, achieve accurate information filtering and weighted aggregation, and concatenate and linearly transform the calculation results of each attention head. By comprehensively weighing the calculation results of different attention heads, the obtained first key feature is more accurate and comprehensive.

[0047] It should be noted that the implementation method of performing attention calculation on the initial features contained in the second window to obtain the first key feature corresponding to the second window is similar to the implementation method of performing attention calculation on the initial features contained in the first window to obtain the first key feature corresponding to the first window as described above. For details, please refer to the specific implementation method of performing attention calculation on the initial features contained in the first window to obtain the first key feature corresponding to the first window, which will not be repeated here.

[0048] Step 104: Fuse the multiple first key features to obtain the second key feature corresponding to the partial two-dimensional sketch.

[0049] It should be noted that the above-mentioned fusion implementation methods may include global average pooling, global max pooling, flattening followed by a fully connected layer, pooling with attention mechanism, etc., and no specific limitation is made here; the second key feature refers to the overall features of the drawn part of the two-dimensional sketch.

[0050] In some embodiments, step 104 described above can be implemented as follows: performing hierarchical feature fusion on multiple first key features to obtain a third key feature containing multiple channels; performing average pooling on the third key feature to obtain a pooled feature; and performing a linear transformation on the pooled feature to obtain a second key feature corresponding to the partial two-dimensional sketch.

[0051] Thus, by performing hierarchical feature fusion on the multiple first key features, neighboring local features can be combined into more representative regional features. At the same time, the number of channels is increased to carry richer semantic information, achieving a progression from fine-grained to coarse-grained representation. Furthermore, the resulting third key feature is subjected to average pooling: by calculating the average value of each channel across all spatial locations, the two-dimensional feature map is compressed into a one-dimensional vector, converging scattered information into a unified whole. This improves tolerance of differences in users' drawing habits and the accuracy of matching. Moreover, a linear transformation is performed on the pooled feature, mapping the feature vector output by the network to exactly the same dimension as the model features in the feature database, which ensures efficient subsequent matching.

[0052] As an example, suppose there are 196 first key features of 96 dimensions (arranged in a 14×14 pattern). First, the system performs hierarchical feature fusion on them through patch merging, which concatenates and linearly transforms each group of adjacent 2×2 first key features, merging them into a new feature with doubled channel count (192 dimensions), i.e., the third key feature. This halves the spatial resolution along each axis (i.e., reduces the number of features to one quarter), resulting in 49 192-dimensional third key features (arranged in a 7×7 pattern). Subsequently, the system performs global average pooling on the 49 192-dimensional third key features, that is, calculates the average of the 49 third key features in each channel to obtain a 192-dimensional pooled feature, which contains global statistical information of the entire partial 2D sketch. Finally, the 192-dimensional pooled feature is input into a preset fully connected layer with an output dimension of 1024 for linear transformation, mapping the 192-dimensional pooled feature to a 1024-dimensional feature space optimized for retrieval tasks, resulting in a 1024-dimensional key feature, which is the second key feature corresponding to the partial 2D sketch.
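The three fusion stages (patch merging, global average pooling, final linear transformation) can be sketched as below. To keep the pure-Python example fast it uses toy sizes (a 4×4 grid of 6-dimensional features mapped to a 16-dimensional output) in place of the 14×14 grid, 96 dimensions, and 1024-dimensional output of the example above. The weights are random stand-ins, the linear maps omit bias terms for brevity, and all names are hypothetical.

```python
import random


def linear(vec, weight):
    """Apply a bias-free linear map: vec (length k) times weight (k x m)."""
    return [sum(vec[i] * weight[i][j] for i in range(len(vec)))
            for j in range(len(weight[0]))]


def patch_merge(grid, weight):
    """Concatenate each 2x2 block of neighbouring features and project it linearly."""
    merged = []
    for r in range(0, len(grid), 2):
        row = []
        for c in range(0, len(grid[0]), 2):
            block = grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1]
            row.append(linear(block, weight))       # 4*dim inputs -> 2*dim outputs
        merged.append(row)
    return merged


def global_average_pool(grid):
    """Average every channel over all spatial positions."""
    cells = [cell for row in grid for cell in row]
    return [sum(cell[d] for cell in cells) / len(cells) for d in range(len(cells[0]))]


def fuse(grid, merge_weight, head_weight):
    """First key features -> third key features -> pooled feature -> second key feature."""
    return linear(global_average_pool(patch_merge(grid, merge_weight)), head_weight)


random.seed(0)
SIDE, DIM, OUT = 4, 6, 16  # toy sizes; the example above uses 14, 96 and 1024
grid = [[[random.random() for _ in range(DIM)] for _ in range(SIDE)] for _ in range(SIDE)]
merge_weight = [[random.gauss(0, 0.1) for _ in range(2 * DIM)] for _ in range(4 * DIM)]
head_weight = [[random.gauss(0, 0.1) for _ in range(OUT)] for _ in range(2 * DIM)]

second_key = fuse(grid, merge_weight, head_weight)
print(len(second_key))  # 16
```

Because pooling averages over all positions, the output length depends only on the final layer, which is what lets a sketch of any extent be compared against fixed-size database features.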

[0053] Step 105: Based on the second key feature, perform similarity matching on the feature database to obtain multiple target features.

[0054] It should be noted that similarity matching of feature databases can be performed using methods such as exact nearest neighbor search, approximate nearest neighbor search, hash algorithms, and deep learning-based metric learning / end-to-end retrieval, without any specific limitations here.

[0055] In some embodiments, step 105 above can be implemented as follows: based on the second key feature, perform an approximate nearest neighbor search on the feature database to obtain multiple candidate features; based on the similarity score between each candidate feature and the second key feature, sort the multiple candidate features in descending order, and take the top M candidate features as the target features, where M is a positive integer.

[0056] Thus, by using approximate nearest neighbor search and leveraging a pre-built, efficient index, a massive feature database can be searched quickly. The search scope can be narrowed down to a set of candidate features with high similarity in milliseconds, which greatly improves the feature search speed. Then, based on the similarity score between each candidate feature and the second key feature, the candidate features in the candidate feature set are sorted, and the Top-M are selected as the target features. This quickly and accurately provides target features similar to the partial 2D sketch, so that the target 3D model that meets the user's design intent can be retrieved quickly and accurately from the partial sketch drawn by the user, significantly improving design efficiency and user experience.

[0057] It should be noted that similarity scores can be calculated using Euclidean distance, cosine similarity, or other commonly used similarity calculation methods; no specific limitations are made here.
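As an illustration of the two similarity measures named above, the following minimal sketch computes both with the standard library only. It assumes non-zero input vectors; the function names are hypothetical. It also shows why the choice matters less once vectors are L2-normalized: for unit vectors, the squared Euclidean distance equals 2 − 2·cos, so both measures produce the same ranking.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (assumes neither is the zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line (L2) distance between a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def l2_normalize(v):
    """Scale v to unit length (assumes v is not the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

query = [3.0, 4.0]
candidate = [6.0, 8.0]          # same direction as the query, twice the magnitude
print(cosine_similarity(query, candidate))    # 1.0: identical direction
print(euclidean_distance(query, candidate))   # 5.0: magnitudes still differ

# After normalization the two measures are tied together: d^2 = 2 - 2*cos.
u, v = l2_normalize([1.0, 2.0, 2.0]), l2_normalize([2.0, 0.0, 1.0])
print(abs(euclidean_distance(u, v) ** 2 - (2 - 2 * cosine_similarity(u, v))) < 1e-9)  # True
```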

[0058] As an example, suppose a partial 2D sketch of a 3D model yields a 1024-dimensional second key feature after feature extraction. First, based on the second key feature, the system performs an Approximate Nearest Neighbor (ANN) search on a pre-built feature database containing tens of millions of 3D model view features. This database is indexed using the HNSW algorithm. The system utilizes the hierarchical navigation structure of this index to quickly locate a highly similar candidate region among the tens of millions of features and extracts 100 initially matching model view features from this region, thus obtaining multiple candidate features. Subsequently, the system calculates the similarity score between the second key feature of the partial 2D sketch and each of these 100 candidate features. For example, the score with model view A might be 0.92, the score with model view B might be 0.89, while the score with model view C might only be 0.65. Finally, with M set to 10, the system sorts the 100 similarity scores in descending order and uses the 10 candidate features with the highest similarity scores as the final target features.
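The final Top-M selection can be sketched as follows, using the three illustrative scores from the example. The sketch assumes the score is a similarity (higher is better); for a distance, `heapq.nsmallest` would be used instead. The name `top_m` and the view identifiers are hypothetical.

```python
import heapq

def top_m(scored_candidates, m):
    """Return the m (view_id, score) pairs with the highest scores, best first."""
    return heapq.nlargest(m, scored_candidates, key=lambda pair: pair[1])

# Candidate features as returned by the ANN stage, in arbitrary order.
candidates = [("view_C", 0.65), ("view_A", 0.92), ("view_B", 0.89)]
print(top_m(candidates, 2))  # [('view_A', 0.92), ('view_B', 0.89)]
```

Because only M of the candidates are needed, a heap-based selection avoids fully sorting the candidate list, although for 100 candidates a plain sort would be equally acceptable.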

[0059] In some embodiments, before performing step 105 above, the following processing may also be performed: for each three-dimensional model in the three-dimensional model database, the following processing is performed respectively: projecting the three-dimensional model from multiple preset perspectives to obtain a two-dimensional wireframe view of the three-dimensional model; extracting features from the two-dimensional wireframe view of the three-dimensional model to obtain the model features corresponding to the three-dimensional model; associating the model features corresponding to the three-dimensional model with the identification information of the three-dimensional model and storing them in the feature database.

[0060] In this way, by projecting the 3D model from multiple preset perspectives into a 2D wireframe view, the overall shape of the model can be fully captured, avoiding information loss due to a single perspective. Furthermore, by extracting features from the 2D wireframe view, the model can be encoded from its geometric shape to digital information. The model's visual appearance can be encoded into a digital form that machines can understand and compare, providing a foundation for subsequent data processing. Finally, the model features are associated with and stored with the model's identification information, establishing a fast indexing channel between features and the model. This allows for quick location and retrieval of the corresponding model information based on the features for user use, thereby improving the search speed of 3D models.

[0061] It should be noted that a 2D wireframe view is a skeleton outline or line drawing of a 3D object at a specific angle. It only displays structural information such as the object's edges, outlines, and structural lines, while ignoring texture information such as the object's surface color, material, lighting, and shadows.
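A wireframe view of this kind can be illustrated with a minimal sketch that orthographically projects the edges of a unit cube. It assumes an axis convention (x right, y depth, z up) and keeps all edges without hidden-line removal; neither choice is specified in this application, and the names are hypothetical.

```python
# Axis dropped by each orthographic view (assumed convention: x right, y depth, z up).
VIEW_DROPPED_AXIS = {"front": 1, "top": 2, "left": 0}

def project_wireframe(vertices, edges, view):
    """Orthographically project 3D edges to a set of unique 2D line segments."""
    drop = VIEW_DROPPED_AXIS[view]
    points = [tuple(c for i, c in enumerate(v) if i != drop) for v in vertices]
    segments = set()
    for a, b in edges:
        if points[a] != points[b]:   # edges parallel to the view direction collapse to a point
            segments.add(tuple(sorted((points[a], points[b]))))
    return segments

# Unit cube: 8 vertices; two vertices share an edge when they differ in exactly one coordinate.
cube_vertices = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
cube_edges = [(a, b) for a in range(8) for b in range(a + 1, 8)
              if sum(p != q for p, q in zip(cube_vertices[a], cube_vertices[b])) == 1]

for view in VIEW_DROPPED_AXIS:
    print(view, len(project_wireframe(cube_vertices, cube_edges, view)))  # 4 segments per view
```

Each view of the cube reduces its 12 edges to the 4 sides of a square, which shows how a single view loses depth information and why several preset perspectives are combined.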

[0062] As an example, before a user performs a search, the system needs to preprocess the 3D model database in the background to build an efficient feature vector database. First, by traversing each 3D model in the 3D model database, the system automatically performs 2D projection along multiple preset viewpoints (such as front view, top view, left view, etc.) for each 3D model to obtain a set of 2D wireframe views that can comprehensively describe the geometric features of the 3D model. Next, all 2D wireframe views are input into the feature information extraction module for feature extraction. Using deep learning models or other image feature extraction algorithms, a high-dimensional model feature vector is generated for each 2D wireframe view. This feature vector is a digital representation of the geometric features of the view. Finally, the generated feature vectors are associated with the corresponding 3D model identifier, view angle, and other information, and stored uniformly in the feature database to provide a basis for fast matching in subsequent searches.
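The association between feature vectors, model identifiers, and view angles described above might be organised as in the following sketch. The extractor here is a deliberately trivial, hypothetical placeholder (segment count and total line length) standing in for the deep learning model; the record layout is likewise an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FeatureRecord:
    model_id: str      # identification information of the 3D model
    view: str          # preset perspective the feature was extracted from
    feature: tuple     # feature vector of that 2D wireframe view

def extract_feature(segments):
    """Hypothetical placeholder extractor: (segment count, total line length)."""
    total = sum(((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
                for (x1, y1), (x2, y2) in segments)
    return (float(len(segments)), total)

def build_feature_database(model_views):
    """model_views: {model_id: {view_name: [((x1, y1), (x2, y2)), ...]}}"""
    records = []
    for model_id, views in model_views.items():
        for view, segments in views.items():
            records.append(FeatureRecord(model_id, view, extract_feature(segments)))
    return records

model_views = {
    "Chair_Model_001": {
        "front": [((0, 0), (1, 0)), ((1, 0), (1, 1))],
        "top": [((0, 0), (3, 4))],
    },
}
feature_database = build_feature_database(model_views)
for record in feature_database:
    print(record)
```

Storing one record per view, rather than one per model, means several target features can later resolve to the same model identifier.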

[0063] Step 106: Search the three-dimensional model database for a set of three-dimensional models corresponding to the multiple target features.

[0064] Here, the 3D model collection contains one or more 3D models.

[0065] In some embodiments, step 106 described above can be implemented by: querying the identification information corresponding to each target feature from the feature database; and obtaining a three-dimensional model corresponding to each target feature from the three-dimensional model database based on the identification information corresponding to each target feature.

[0066] In this way, querying the identification information corresponding to the target feature from the feature database achieves a complete separation between the retrieval logic and data delivery, which can greatly improve the speed of subsequent retrieval of the corresponding 3D model based on the user's sketch, thus ensuring retrieval performance. Furthermore, obtaining the corresponding model from the 3D model database based on the identification information enables on-demand and dynamic data loading, which can greatly save network resources and ensure that the entire system can still provide users with a smooth and efficient retrieval experience even in a massive database.

[0067] As an example, assuming the system obtains 10 target features after similarity matching, firstly, the system sends a query request to the feature database based on these 10 target features to obtain the identification information corresponding to each target feature. The feature database can return a list containing 10 IDs (i.e., identification information), for example: ID: "Chair_Model_001", ID: "Office_Chair_045", ID: "Dining_Chair_B2", ...; Subsequently, the system sends a request to a separately stored 3D model database based on the 10 IDs as query conditions. This 3D model database stores the actual model files (e.g., .step, .iges files) and their metadata. For each identification information (ID) query request, the model database will return complete information such as the corresponding 3D model file itself, thumbnail, model name, and source "open source component library A". Finally, complete information about 10 3D models can be obtained.
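The two-stage lookup in this example can be sketched with plain dictionaries standing in for the feature database and the 3D model database. The feature keys, file names, and the choice to drop duplicate IDs (for the case where several views of one model are among the target features) are assumptions made for illustration.

```python
def resolve_models(target_feature_keys, feature_db, model_db):
    """Map target features to model IDs, then to model records.

    Ranking order is preserved and each model is returned at most once.
    """
    seen, models = set(), []
    for key in target_feature_keys:
        model_id = feature_db[key]["model_id"]
        if model_id not in seen:
            seen.add(model_id)
            models.append(model_db[model_id])
    return models

# Stand-in for the feature database: feature key -> identification information.
feature_db = {
    "f1": {"model_id": "Chair_Model_001", "view": "front"},
    "f2": {"model_id": "Office_Chair_045", "view": "top"},
    "f3": {"model_id": "Chair_Model_001", "view": "left"},
}
# Stand-in for the separately stored 3D model database (file names are hypothetical).
model_db = {
    "Chair_Model_001": {"name": "Model 1", "file": "chair_001.step"},
    "Office_Chair_045": {"name": "Model 2", "file": "office_chair_045.step"},
}

results = resolve_models(["f1", "f2", "f3"], feature_db, model_db)
print([m["file"] for m in results])  # ['chair_001.step', 'office_chair_045.step']
```

Only the lightweight identifiers pass through the matching stage; the model files themselves are fetched once per distinct ID.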

[0068] In some embodiments, after performing step 106, the following processing may also be performed: returning the set of 3D models corresponding to the plurality of target features to the client interface for display, so that the user can browse and select. Returning the set of 3D models to the client interface in this way gives the user a clear and intuitive basis for decision-making, allowing the user to directly observe, compare, and judge the models.

[0069] As an example, suppose the system backend completes all calculations and returns a data packet containing complete information on 10 3D models to the client. This data packet includes the name, source, a low-resolution thumbnail, and a unique identifier ID for each model. Upon receiving this data packet, the client interface renders a result grid area, where each grid cell displays a thumbnail of a model, clearly labeled with its name below the image, such as "Model 1" and its source "Open Source Part Library A". Users can browse these 10 results by scrolling or paginating. When the user's mouse hovers over a thumbnail, the client asynchronously sends a request to retrieve a medium-resolution preview of that model and displays it. The display allows users to quickly view the 3D effect of a model without loading the complete model file. If a user is interested in a particular model (e.g., a chair with the ID "Chair_Model_001"), they can click the "Browse" button on the interface. At this time, the client will instruct the background to load the 3D data of the model and display it in a separate 3D window. Users can rotate and zoom 360 degrees to fully examine its structure. Once the user confirms that this is the model they need, they can click the "Insert Model" button. The client will then use API calls to directly load the geometric data of the 3D model into the current design scene, allowing the user to immediately move, zoom, or perform further editing.

[0070] In some embodiments, when a user browses the initial results returned on the client and is not satisfied with them, the user does not need to redraw the sketch. Instead, the user can continue to add details and correct lines on the original sketch to make it more information-rich. The user then searches again, and the system repeats steps 101 to 106 above, this time with the richer sketch as input. Since the query feature vector now contains more details, the search range is narrowed and the similarity calculation results are more accurate, so better-matching results are returned. This process is iterated until the user finds a satisfactory model.

[0071] The following will describe an exemplary application of the embodiments of this disclosure in a practical application scenario.

[0072] With the widespread application of computer-aided design (CAD) and 3D modeling technologies, users have an increasing demand for quickly and accurately retrieving models that match their design intent from massive 3D model libraries. Keyword- or tag-based search methods are of limited use in the CAD field because they have difficulty accurately describing complex geometric shapes. In 3D modeling, users instead prefer to express their search intent intuitively by drawing simple 2D outlines, which has made sketch-based 3D model retrieval methods a focus of widespread attention.

[0073] In related technologies, sketch-based 3D model retrieval methods often require users to draw relatively complete and accurate sketches. However, in practical application scenarios, due to reasons such as users' vague memory, focusing only on local features of the model, or incomplete initial design concepts, only partial or incomplete model sketches can be provided during retrieval. As a result, the accuracy of the retrieved model matching is insufficient, making it difficult to accurately meet the user's design needs.

[0074] Based on this, embodiments of this application provide a fast retrieval method for 3D models, allowing users to start the retrieval from an incomplete, partial sketch, and iterate by gradually adding details until the target 3D model is accurately found, thereby significantly improving the flexibility, fault tolerance, and user experience of the retrieval.

[0075] In some embodiments, see Figure 2, which is an overall structural diagram of the fast retrieval method for 3D models provided in the embodiments of this application. As shown in Figure 2, this application comprises two stages: offline feature database construction and online iterative retrieval. The processing procedures of the two stages are described in detail below with reference to Figure 2.

[0076] During the offline feature database construction phase, before users can perform searches, the backend 3D model database needs to be preprocessed to build an efficient feature database. First, by traversing each 3D model in the database, the system automatically performs 2D projections along multiple preset viewpoints (e.g., front view, top view, left view, etc.) for each 3D model, obtaining a set of 2D wireframe views that comprehensively describe the geometric features of the 3D model. Next, all 2D wireframe views are input into the feature information extraction module for feature extraction. Using deep learning models or other image feature extraction algorithms, a high-dimensional model feature vector is generated for each 2D wireframe view. This feature vector is a digital representation of the geometric features of that view. Finally, the generated feature vectors are associated with the corresponding 3D model identifier, view angle, and other information, and stored uniformly in the feature database to provide a basis for rapid matching in subsequent searches.

[0077] In the online iterative retrieval phase, the user first draws a 2D model sketch based on memory or design intent on the client's sketching interface and clicks the "Search" button to send the sketch data to the service scheduler. Upon receiving the request, the service scheduler creates a unique task ID and stores it in the task status database to track the retrieval progress, then forwards the task ID and the user's sketch data to the sketch search service. In the sketch search service, the feature extraction module processes the received sketch, using the same algorithm as in the offline feature database construction phase to generate the corresponding feature vector. The feature extraction module employs a deep neural network based on Swin Transformer (Shifted Window Transformer) as its backbone network. Using a self-attention mechanism, this network effectively captures both the local details of sketch lines and the global dependencies of the overall geometric topology. The network performs normalization preprocessing on the input sketch image (e.g., 224×224 pixels); through the Patch Partition layer, it divides the image into multiple non-overlapping patches (e.g., 4×4 pixels each) and maps the original pixel values of each patch to initial feature embeddings (Linear Embeddings). Then, through a Shifted Window operation, shifted-window multi-head self-attention (SW-MSA) is computed across windows, enabling the network to establish cross-region feature interactions and understand the relative positional relationships between lines. Thus, even when the input is a partial sketch of the model, the network can still infer the underlying global geometric structure. The Patch Merging layer gradually reduces the feature resolution and increases the number of channels to generate multi-scale feature representations. 
Finally, the output of the last layer of the network is mapped through Global Average Pooling and a fully connected layer to generate a fixed-dimensional high-dimensional feature vector (e.g., 1024 dimensions), which serves as the feature vector of the sketch (i.e., the second key feature). Next, fuzzy similarity matching is performed based on the feature vector of the sketch. Using the feature vector of the sketch as input, a fast approximate nearest neighbor search is performed in the feature vector database. Candidate results with high similarity to the current incomplete user sketch in terms of geometric features are selected from the massive features in the database, and a list of the most similar feature vectors after initial screening (i.e., multiple target features) is returned. Here, the system uses the Milvus distributed vector database as the core retrieval engine. In the offline stage, the feature vectors generated from all 3D model views are stored in a Milvus collection, and a high-performance vector index is built based on the HNSW (Hierarchical Navigable Small World) or IVF_FLAT algorithm. At the same time, the similarity calculation module inputs the query feature vector output by the Swin Transformer into Milvus. Milvus internally calculates the Euclidean distance (L2) between the feature vector of the sketch and the feature vectors in the feature database. For sketch retrieval scenarios, cosine similarity (or cosine distance) focuses more on the directional consistency of feature vectors, i.e., the similarity of geometric shapes rather than the absolute difference in image pixel intensity. Milvus utilizes the pre-built index structure to perform approximate nearest neighbor search in the massive feature space. The search process uses the hierarchical graph structure of the HNSW index to quickly locate the vector clusters whose feature vector distribution is closest to that of the sketch. 
Even if the input is an incomplete local sketch, the strong semantic robustness of the feature vectors extracted by the Swin Transformer allows Milvus to recall, based on the local clustering characteristics in the vector space, the complete model view vectors that closely match the local features. Milvus then returns the Top-K model IDs (identification information) with the highest similarity scores as the initial screening results, for user confirmation or for the next round of iteration. Finally, the sketch search service returns the list of model IDs corresponding to the initial screening results to the service scheduler. The service scheduler retrieves the complete information of each model (such as model name, thumbnail, source, etc.) from the 3D model database based on the model ID, and then returns the integrated complete search results to the client interface for display.
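The patch and window arithmetic of the backbone described in this paragraph can be illustrated with a minimal index-only sketch. The 224×224 image size and 4×4 patch size come from the text; the window size of 7 and shift of 3 are assumed values (the commonly used Swin Transformer defaults), not values stated in this application. The sketch only groups token indices: it does not compute attention and omits the attention mask that a full implementation applies to tokens wrapped around by the cyclic shift.

```python
def window_partition(grid, window, shift=0):
    """Group token indices of a grid x grid token map into window x window windows.

    With shift > 0 the token map is cyclically shifted first, so each window
    mixes tokens that sat in different windows before the shift.
    """
    windows = {}
    for r in range(grid):
        for c in range(grid):
            sr, sc = (r - shift) % grid, (c - shift) % grid   # position after the cyclic shift
            windows.setdefault((sr // window, sc // window), []).append(r * grid + c)
    return list(windows.values())

image_size, patch_size = 224, 4      # example values from the text
grid = image_size // patch_size      # 56 patches per side
window, shift = 7, 3                 # assumed values, not stated in the text

regular = window_partition(grid, window)          # first windows
shifted = window_partition(grid, window, shift)   # second windows
print(grid * grid, len(regular), len(regular[0]))  # 3136 64 49

# Token 0 (top-left patch) shares its shifted window with tokens from the far side of the map.
window_of_token_0 = next(w for w in shifted if 0 in w)
print(sorted(set(window_of_token_0) - set(regular[0]))[:3])
```

Alternating between the regular and shifted groupings is what lets information propagate between neighbouring windows even though attention is only ever computed inside one window at a time.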

[0078] Furthermore, if a user is not satisfied with the initial results when browsing them on the client, they do not need to redraw the sketch. Instead, they can continue to add details and correct lines to enrich the sketch information. The user then searches again, and the system repeats steps 101 to 106, this time with the more information-rich sketch as input. Since the query feature vector now contains more details, the search range is narrowed and the similarity calculation results are more accurate, so better-matching results are returned. This process is iterated until the user finds a satisfactory model.

[0079] In summary, the rapid retrieval method for 3D models provided in this application allows users to initiate searches using incomplete or imprecise sketches, significantly reducing the requirements on users' drawing skills and memory integrity, making the user interaction more user-friendly. Furthermore, by building a feature database offline and performing online approximate nearest neighbor search, it achieves millisecond-level rapid matching and filtering from massive models, resulting in a swift response. In addition, based on a cloud-based CAD platform, the retrieved models can be reused in the current design environment with a single click, thereby greatly improving overall design efficiency.

[0080] The above are embodiments of the method proposed in this application. Based on the same inventive concept, embodiments of this application also provide a rapid retrieval device for three-dimensional models, the structure of which is shown in Figure 3.

[0081] Figure 3 is a schematic diagram of the internal structure of a rapid retrieval device for three-dimensional models provided in an embodiment of this application. As shown in Figure 3, the device includes:

[0082] At least one processor 201;

[0083] And a memory 202 that is communicatively connected to at least one processor;

[0084] The memory 202 stores instructions that can be executed by at least one processor. The instructions are executed by at least one processor 201 to enable at least one processor 201 to perform the steps of the method corresponding to any of the above embodiments.

[0085] Some embodiments of this application provide a non-volatile computer storage medium corresponding to Figure 1, which stores computer-executable instructions configured to perform the steps of the method corresponding to any of the above embodiments.

[0086] The various embodiments in this application are described in a progressive manner. Similar or identical parts of the embodiments can be understood by cross-reference, and each embodiment focuses on its differences from the others. In particular, the device and medium embodiments are basically similar to the method embodiments, so their description is relatively brief; for relevant details, refer to the description of the method embodiments.

[0087] The systems, media, and methods provided in this application correspond to one another one-to-one. Therefore, the systems and media have beneficial technical effects similar to those of their corresponding methods. Since the beneficial technical effects of the methods have been described in detail above, those of the systems and media will not be repeated here.

[0088] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0089] This application is described with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It will be understood that each block of the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, produce a device for implementing the functions specified in one or more processes of the flowchart and / or one or more boxes of the block diagram.

[0090] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more processes of the flowchart and / or one or more boxes of the block diagram.

[0091] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes of the flowchart and / or one or more boxes of the block diagram.

[0092] In a typical configuration, a computing device includes one or more processors (CPU), input / output interfaces, network interfaces, and memory.

[0093] Memory may include non-persistent storage in computer-readable media, such as random access memory (RAM) and / or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of computer-readable media.

[0094] Computer-readable media include both permanent and non-permanent, removable and non-removable media that can store information by any method or technology. Information can be computer-readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.

[0095] It should also be noted that the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0096] The above description is merely an embodiment of this application and is not intended to limit this application. Various modifications and variations can be made to this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principle of this application should be included within the scope of the claims of this application.

Claims

1. A fast retrieval method for three-dimensional models, characterized in that, The method includes: In response to receiving a partial two-dimensional sketch of a three-dimensional model drawn by a user, the partial two-dimensional sketch is image segmented to obtain multiple sub-images; Each sub-image is mapped to obtain the initial features corresponding to each sub-image; Based on a preset window size, the initial features corresponding to the multiple sub-images are divided into multiple first windows; For each of the first windows, a linear transformation is performed on each initial feature contained in the first window to obtain a first query vector, a first key vector, and a first value vector corresponding to each initial feature; attention heads are divided for each of the first query vector, first key vector, and first value vector corresponding to each initial feature to obtain a second query vector, a second key vector, and a second value vector corresponding to each attention head for each initial feature; for each initial feature of each attention head, the following processing is performed: attention scores are calculated for the second query vector corresponding to the initial feature and the second key vector corresponding to each initial feature in the first window, respectively, to obtain the attention score between the initial feature and each initial feature in the first window; based on the attention score between the initial feature and each initial feature in the first window, the second value vector corresponding to each initial feature in the first window is fused to obtain the output vector corresponding to the initial feature; for each initial feature in the first window, the output vector corresponding to the initial feature in each attention head is concatenated, and a linear transformation is performed on the concatenation result to obtain a first key feature corresponding to each initial feature in the first window; Based on a preset window movement size, each of the first windows is shifted to obtain multiple second windows; For each of the second windows, attention is calculated on the initial features contained in the second window to obtain the first key feature corresponding to the second window; By fusing multiple first key features, a second key feature corresponding to the partial two-dimensional sketch is obtained; Based on the second key feature, similarity matching is performed on the feature database to obtain multiple target features, and a set of three-dimensional models corresponding to the multiple target features is searched from the three-dimensional model database. The set of three-dimensional models contains one or more three-dimensional models.

2. The method according to claim 1, characterized in that, The mapping process for each sub-image to obtain the initial features corresponding to each sub-image includes: For each of the sub-images, the following processing is performed: The pixel data of the sub-image is converted into a one-dimensional vector space to obtain the one-dimensional pixel features corresponding to the sub-image; The one-dimensional pixel features are linearly embedded to obtain the high-dimensional pixel features corresponding to the sub-image. Based on the position information of each pixel in the sub-image, the high-dimensional pixel features are encoded with position information to obtain the initial features corresponding to the sub-image.

3. The method according to claim 1, characterized in that, The step of fusing multiple first key features to obtain the second key feature corresponding to the partial two-dimensional sketch includes: Layered feature fusion is performed on multiple first key features to obtain a third key feature containing multiple channels; The third key feature is subjected to average pooling to obtain pooled features; A linear transformation is performed on the pooling features to obtain the second key features corresponding to the partial two-dimensional sketch.

4. The method according to claim 1, characterized in that, Based on the second key feature, similarity matching is performed on the feature database to obtain multiple target features, including: Based on the second key feature, an approximate nearest neighbor search is performed on the feature database to obtain multiple candidate features; Based on the similarity score between each candidate feature and the second key feature, the candidate features are sorted in descending order, and the top M candidate features are selected as the target features, where M is a positive integer.

5. The method according to claim 1, characterized in that, Before performing similarity matching on the feature database based on the second key feature to obtain multiple target features, the method further includes: For each 3D model in the 3D model database, the following processing is performed: The three-dimensional model is projected from multiple preset perspectives to obtain a two-dimensional wireframe view of the three-dimensional model; Feature extraction is performed on the two-dimensional wireframe view of the three-dimensional model to obtain the model features corresponding to the three-dimensional model; The model features corresponding to the 3D model are associated with the identification information of the 3D model and stored in the feature database.

6. The method according to claim 1, characterized in that, The step involves searching a set of 3D models corresponding to the multiple target features from a 3D model database. The set of 3D models includes one or more 3D models, including: Query the feature database for the identification information corresponding to each target feature; Based on the identification information corresponding to each target feature, a three-dimensional model corresponding to each target feature is obtained from the three-dimensional model database.

7. A rapid retrieval device for three-dimensional models, characterized in that, The device includes: At least one processor; And, a memory communicatively connected to the at least one processor; The memory stores instructions that can be executed by the at least one processor, which, when executed by the at least one processor, enables the at least one processor to perform a fast retrieval method for a three-dimensional model as described in any one of claims 1-6.

8. A computer storage medium storing computer-executable instructions, characterized in that, When the computer-executable instructions are executed, a method for rapid retrieval of a three-dimensional model as described in any one of claims 1-6 is implemented.