Three-dimensional entity positioning method and apparatus, and electronic device and readable medium

By rendering the 3D entities in a neural radiance field scene model at multiple poses and storing the resulting pose image data, the method addresses the low positioning accuracy of 3D entities, aligns virtual entities with real-world scenes more reliably, and reduces wasted computing resources.

WO2026139105A1 · PCT designated stage · Publication Date: 2026-07-02 · LINGBAN INTELLIGENT (HANGZHOU) INFORMATION TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
LINGBAN INTELLIGENT (HANGZHOU) INFORMATION TECHNOLOGY CO LTD
Filing Date
2026-02-26
Publication Date
2026-07-02

AI Technical Summary

Technical Problem

In existing technologies, the accuracy of 3D entity localization is low because the map is usually built and the virtual objects are placed by different operators. Virtual entities are also easily omitted or duplicated, which wastes computing resources.

Method used

Image rendering processing is performed at various poses on the 3D entities in a pre-created neural radiance field scene model, and the resulting set of pose image data pairs is stored in a database. When an image of the entity to be located is received, positioning pose information is generated from it, and the positioning status of the 3D entity is displayed using the neural radiance field scene model.

Benefits of technology

It improves the accuracy of 3D entity positioning, reduces the possibility of virtual entity position deviation and omission or duplication, and reduces the waste of computing resources.


Abstract

A three-dimensional entity positioning method and apparatus, and an electronic device and a readable medium. The method comprises: performing image rendering processing on each of at least one three-dimensional entity at each pose in a pre-created neural radiance field scenario model, so as to obtain pose image data pair sets (101); storing the obtained pose image data pair sets in a preset database (102); in response to having received an entity image to be subjected to positioning sent by a preset terminal device, generating, on the basis of said entity image and the pose image data pair sets stored in the preset database, positioning pose information corresponding to said entity image (103); and on the basis of the positioning pose information and the neural radiance field scenario model, displaying, in the preset terminal device, positioning entity state information, in the neural radiance field scenario model, of a three-dimensional entity corresponding to said entity image (104). The method reduces the waste of computing power resources.

Description

Three-dimensional entity positioning method and apparatus, electronic device, and readable medium

[0001] Cross-reference of related applications

[0002] This application claims priority to Chinese Patent Application No. 202411957902.4, filed with the Chinese Patent Office on December 26, 2024, the entire contents of which are incorporated herein by reference.

Technical Field

[0003] The embodiments of this application relate to the field of computer technology, and more specifically to three-dimensional entity positioning methods, apparatuses, electronic devices, and computer-readable media.

Background Technology

[0004] With the rise of virtual reality technology, visual positioning technology has been further developed. 3D entity positioning refers to the technology of determining the position and orientation of a physical entity in the real world within three-dimensional space. Currently, in visualization scenarios such as AR, the common method for locating 3D entities is to input a virtual representation of the physical entity into a pre-built 3D map, and then overlay this virtual entity onto the real scene on a display device (e.g., AR glasses or other electronic devices), thereby achieving visual positioning of the 3D entity.

[0005] However, when using the above method to locate 3D entities, the following technical problems often arise: When placing virtual entities on a pre-built map, since the operator who builds the map and the operator who places the virtual objects are usually not the same person, there is a discrepancy between the position of the virtual entity on the map and the position of its corresponding physical entity in the real world. This results in low accuracy of 3D entity positioning, requiring repositioning. Furthermore, during map digitization, the easy omission or duplicate entry of some virtual entities leads to omissions or duplicates in the placement of virtual entities on the map, increasing the number of positioning failures and wasting computing resources.

[0006] The information disclosed in this background section is only intended to enhance the understanding of the background of the inventive concept, and therefore may contain information that does not form prior art known to those skilled in the art.

Summary of the Invention

[0007] The summary section of this application is intended to provide a brief overview of the concepts, which will be described in detail in the detailed description section below. This summary section is not intended to identify key or essential features of the claimed technical solutions, nor is it intended to limit the scope of the claimed technical solutions.

[0008] Some embodiments of this application provide three-dimensional entity positioning methods, apparatuses, electronic devices, and computer-readable media to solve one or more of the technical problems mentioned in the background section above.

[0009] In a first aspect, some embodiments of this application provide a three-dimensional entity localization method, which includes: performing image rendering processing, at various poses, on each of at least one three-dimensional entity in a pre-created neural radiance field scene model to obtain a pose image data pair set; storing the obtained pose image data pair set in a preset database; in response to receiving an entity image to be localized sent by a preset terminal device, generating localization pose information corresponding to the entity image to be localized based on that image and the pose image data pair set stored in the preset database; and, based on the localization pose information and the neural radiance field scene model, displaying on the preset terminal device the localization entity status information, in the neural radiance field scene model, of the three-dimensional entity corresponding to the entity image to be localized.

[0010] In a second aspect, some embodiments of this application provide a three-dimensional entity localization apparatus, comprising: an image rendering unit configured to perform image rendering processing, at various poses, on each of at least one three-dimensional entity in a pre-created neural radiance field scene model to obtain a pose image data pair set; a storage unit configured to store the obtained pose image data pair set in a preset database; a generation unit configured to, in response to receiving an entity image to be localized sent by a preset terminal device, generate localization pose information corresponding to the entity image to be localized based on that image and the pose image data pair set stored in the preset database; and a display unit configured to, based on the localization pose information and the neural radiance field scene model, display on the preset terminal device the localization entity status information, in the neural radiance field scene model, of the three-dimensional entity corresponding to the entity image to be localized.

[0011] In a third aspect, some embodiments of this application provide an electronic device, including: one or more processors; and a storage device having one or more programs stored thereon, wherein when the one or more programs are executed by the one or more processors, the one or more processors implement the method described in any implementation of the first aspect above.

[0012] In a fourth aspect, some embodiments of this application provide a computer-readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method described in any implementation of the first aspect above.

[0013] The 3D entity localization method of some embodiments of this application improves the accuracy of 3D entity localization. Specifically: First, image rendering processing is performed, at various poses, on each 3D entity in a pre-created neural radiance field scene model to obtain a pose image data pair set for each 3D entity. Because the neural radiance field scene model reproduces the real-world 3D scene more accurately, virtual entities can be aligned with the real-world 3D scene. This reduces the positional deviations that arise when the map is built and the virtual entities are placed by different operators, and reduces the possibility of virtual entities being omitted or duplicated. Second, the pose image data pair set for each 3D entity in the neural radiance field scene model is stored in a preset database, establishing a data foundation for subsequent rapid localization and matching. Next, in response to receiving an entity image to be localized sent by a preset terminal device, that image is matched against the pose image data pair sets stored in the preset database to generate the corresponding localization pose information. Finally, based on the localization pose information and the neural radiance field scene model, the localization entity status information, in the neural radiance field scene model, of the 3D entity corresponding to the entity image to be localized is displayed on the preset terminal device. Since the pre-built neural radiance field scene model aligns virtual entities with the real-world 3D scene, it reduces the positional deviation between virtual entities and physical entities in the real 3D scene and reduces the possibility of virtual entities being omitted or duplicated. It therefore reduces the computing resources wasted on positioning failures and repositioning.

Attached Figure Description

[0014] The above and other features, advantages, and aspects of the embodiments of this application will become more apparent from the accompanying drawings and the following detailed description. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic, and elements are not necessarily drawn to scale.

[0015] Figure 1 is a flowchart of some embodiments of the three-dimensional entity positioning method according to this application;

[0016] Figure 2 is a structural schematic diagram of some other embodiments of the three-dimensional entity positioning method according to this application;

[0017] Figure 3 is a schematic diagram of the structure of some embodiments of the three-dimensional entity positioning apparatus according to this application;

[0018] Figure 4 is a schematic diagram of the structure of an electronic device suitable for implementing some embodiments of this application.

Detailed Implementation

[0019] Embodiments of this application will now be described in more detail with reference to the accompanying drawings. While some embodiments of this application are shown in the drawings, it should be understood that this application can be implemented in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided to provide a more thorough and complete understanding of this application. It should be understood that the drawings and embodiments of this application are for illustrative purposes only and are not intended to limit the scope of protection of this application.

[0020] It should also be noted that, for ease of description, only the parts relevant to the invention are shown in the accompanying drawings. Unless otherwise specified, the embodiments and features described herein can be combined with each other.

[0021] It should be noted that the concepts of "first" and "second" mentioned in this application are only used to distinguish different devices, modules or units, and are not used to limit the order of functions performed by these devices, modules or units or their interdependencies.

[0022] It should be noted that the terms "a" and "a plurality of" used in this application are illustrative rather than restrictive, and those skilled in the art should understand that, unless otherwise expressly indicated in the context, they should be understood as "one or more".

[0023] The names of the messages or information exchanged between multiple devices in the embodiments of this application are for illustrative purposes only and are not intended to limit the scope of these messages or information.

[0024] The present application will now be described in detail with reference to the accompanying drawings and embodiments.

[0025] Figure 1 illustrates a flowchart 100 of some embodiments of the three-dimensional entity localization method according to this application. The three-dimensional entity localization method includes the following steps:

[0026] Step 101: Perform image rendering processing, at various poses, on each of at least one three-dimensional entity in the pre-created neural radiance field scene model to obtain a pose image data pair set.

[0027] In some embodiments, the execution entity (e.g., a server) of the 3D entity localization method can perform image rendering processing, at various poses, on each of at least one 3D entity in a pre-created neural radiance field scene model to obtain a pose image data pair set. The execution entity can be a server that locates the entity shown in an entity image to be localized based on that image.

[0028] The aforementioned 3D entities can be 3D objects or parts of a 3D scene. In practice, the executing entity can use 3D rendering technology to render each 3D entity according to pose information, obtaining a rendered image. Then, the executing entity can determine the pose information and the rendered image as a pose image data pair. Finally, the executing entity can determine the obtained pose image data pairs as the pose image data pair set. Here, the pose information represents the pose from which the 3D entity is rendered.

[0029] In some optional implementations of certain embodiments, the above neural radiance field scene model is created through the following steps:

[0030] The first step is to acquire scene point cloud data. In practice, the aforementioned execution entity can obtain scene point cloud data from a pre-set database. Optionally, the aforementioned execution entity can obtain scene point cloud data by scanning the 3D scene where the 3D entity is located using LiDAR.

[0031] The second step is to create an initial neural radiance field scene model based on the aforementioned scene point cloud data. In practice, the execution entity can use 3DGS (3D Gaussian Splatting) technology to create a 3D model from the scene point cloud data as the initial neural radiance field scene model. The scene point cloud data can be the point cloud data of the 3D scene in which the 3D entity to be located is situated (e.g., LiDAR point cloud data).

[0032] The third step is to perform scene editing on the initial neural radiance field scene model to obtain an augmented reality scene model. In practice, the executing entity can first call a preset scene editing interface to add pre-created 3D object models (e.g., virtual furniture, virtual decorations) to the initial neural radiance field scene model. Then, the executing entity can determine the initial neural radiance field scene model with the added 3D object models as the augmented reality scene model. The preset scene editing interface can be an interface for adding 3D object models to a 3D scene model (e.g., a Unity3D interface).

[0033] The fourth step is to determine the above augmented reality scene model as the neural radiance field scene model.

[0034] In some optional implementations of certain embodiments, the aforementioned execution entity can perform image rendering processing on the aforementioned three-dimensional entity at various poses through the following steps to obtain a set of pose image data pairs:

[0035] The first step is to establish a three-dimensional spatial coordinate system with the center point of the aforementioned three-dimensional entity as the origin.

[0036] The second step involves rendering images of the three-dimensional entity in various poses based on preset stepping parameters and the aforementioned three-dimensional coordinate system, resulting in a set of pose image data pairs. The preset stepping parameters include: starting point rendering pose information, horizontal angle interval information, pitch angle interval information, and sphere radius information. Each pose image data pair in the set includes rendering pose information and a rendered image. The starting point rendering pose information represents the pose at which rendering of the three-dimensional entity begins. For example, the starting point rendering pose information could be "horizontal angle 0 degrees, pitch angle 0 degrees, distance from the origin 10m". The horizontal angle interval information represents the horizontal interval angle for rendering the three-dimensional entity in the horizontal direction. For example, the horizontal angle interval information could be "5 degrees". The pitch angle interval information represents the vertical interval angle for rendering the three-dimensional entity in the vertical direction. For example, the pitch angle interval information could be "5 degrees".

[0037] Conventionally, when generating pose image data pairs for a 3D entity, poses are selected only by rotating around the y-axis (vertical axis) through the center point of the 3D entity, at a preset pitch angle (e.g., 0 degrees) and within a preset rotation range, and rendering at those poses yields pose image data pairs for different sides of the 3D entity during horizontal rotation. Since this approach only covers horizontally rotated views of the 3D entity, if the acquired image of the entity to be localized is not a horizontal side view (e.g., it is a view from directly above or below the 3D entity), it may be impossible to match a corresponding pose image data pair. This increases the number of 3D entity localization failures and relocalization attempts, resulting in a waste of computing resources.

[0038] To address this technical problem, the following solution is adopted:

[0039] In some optional implementations of certain embodiments, the execution entity may perform image rendering processing on the three-dimensional entity at various poses based on preset step parameter information and the three-dimensional spatial coordinate system through the following steps to obtain a set of pose image data pairs:

[0040] Step 1: Determine the preset stepping parameter information, including: starting point rendering pose information, horizontal angle interval information, pitch angle interval information, and sphere radius information. The starting point rendering pose information includes: starting point horizontal angle and starting point pitch angle. The sphere radius, as represented by the sphere radius information, is used as the observation distance.

[0041] Step 2: Determine the starting horizontal angle as the initial horizontal angle.

[0042] Step 3: Determine the starting pitch angle as the initial pitch angle.

[0043] Step 4: Based on the initial horizontal angle, initial pitch angle, and observation distance, generate a set of pose image data pairs for the 3D entity.

[0044] Step 5: Generate an updated pitch angle based on the initial pitch angle and pitch angle interval information. In practice, the above-mentioned execution entity can use the sum of the initial pitch angle and pitch angle interval information as the updated pitch angle.

[0045] Step 6: Set the updated pitch angle as the initial pitch angle to complete the initial pitch angle update.

[0046] Step 7: Following step 6, in response to the updated initial pitch angle being less than or equal to a first preset angle, return to step 4 based on the updated initial pitch angle. This generates a pose image data pair corresponding to the updated initial pitch angle. The first preset angle can be 90 degrees. The initial pitch angle can range from -90 degrees to 90 degrees.

[0047] Step 8: Following Step 6, in response to the updated initial pitch angle being greater than the first preset angle, determine that the pitch angle traversal under the current initial horizontal angle is complete, and continue to execute Step 9.

[0048] Step 9: Generate an updated horizontal angle based on the current initial horizontal angle and the horizontal angle interval information. In practice, the executing entity can determine the updated horizontal angle as the sum of the current initial horizontal angle and the horizontal interval angle indicated by the horizontal angle interval information. The value range of the updated horizontal angle can be from 0 degrees to 360 degrees.

[0049] Step 10: Following step 9, in response to determining that the updated horizontal angle is less than or equal to a second preset angle, the updated horizontal angle is set as the initial horizontal angle. Based on the updated initial horizontal angle, step 3 is executed again and the pitch angles under the updated initial horizontal angle are traversed, yielding a pose image data pair for each pitch angle under the updated initial horizontal angle. The second preset angle can be 360 degrees.

[0050] Step 11: Following step 9, in response to the determination that the updated horizontal angle is greater than the second preset angle, the traversal of the horizontal angle is also completed, and the pose image data pairs generated in the above process are determined as pose image data pair sets.

[0051] Therefore, through steps 1-11 above, a set of pose image data pairs covering different sides (different horizontal angles) and different pitch angles of a 3D entity in 3D space are generated. These pose image data pairs show a comprehensive view of the 3D entity from top to bottom at different tilt angles, no longer limited to side pose images during the horizontal rotation of the 3D entity, thus improving the comprehensiveness of pose coverage. Even if the acquired image of the entity to be localized is not a horizontally rotated side image (e.g., a view directly above or below the 3D entity; for example, the initial pitch angle ranges from -90° to 90°, with -90° corresponding to the view directly below and 90° corresponding to the view directly above), the corresponding pose image data pair can still be matched through the more comprehensive pose image data set. This effectively reduces the number of 3D entity localization failures and repeated localization attempts, and reduces the waste of computer computing resources.
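
The traversal in steps 1-11 amounts to a nested sweep over horizontal and pitch angles. The following is a minimal Python sketch of that loop order, not the applicant's implementation. The function name `sweep_poses`, its parameter names, and the default values are illustrative assumptions, and the sketch assumes the starting angles already lie within the preset bounds.

```python
from typing import Iterator, Tuple


def sweep_poses(
    start_horizontal_deg: float = 0.0,
    start_pitch_deg: float = -90.0,
    horizontal_step_deg: float = 5.0,
    pitch_step_deg: float = 5.0,
    first_preset_deg: float = 90.0,    # pitch bound checked in steps 7 and 8
    second_preset_deg: float = 360.0,  # horizontal bound checked in steps 10 and 11
) -> Iterator[Tuple[float, float]]:
    """Yield (horizontal, pitch) angle pairs in the order described by steps 2-11."""
    horizontal = start_horizontal_deg           # step 2
    while horizontal <= second_preset_deg:      # steps 10 and 11
        pitch = start_pitch_deg                 # step 3
        while pitch <= first_preset_deg:        # steps 7 and 8
            yield horizontal, pitch             # step 4: a render would happen here
            pitch += pitch_step_deg             # steps 5 and 6
        horizontal += horizontal_step_deg       # step 9


# Coarse steps keep the example small; the text gives 5 degrees as an example interval.
poses = list(sweep_poses(horizontal_step_deg=90.0, pitch_step_deg=45.0))
print(len(poses), poses[0], poses[-1])
```

With the inclusive 360-degree bound stated in step 10, the columns at 0 degrees and 360 degrees view the entity from the same direction, so an implementation may choose to deduplicate them.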

[0052] In some optional implementations of certain embodiments, the aforementioned execution entity can generate pose image data pairs corresponding to the three-dimensional entity based on the initial horizontal angle, initial pitch angle, and observation distance through the following steps:

[0053] The first step is to generate the coordinate information of the observation point in the three-dimensional coordinate system based on the initial horizontal angle, initial pitch angle, and observation distance. In practice, the execution entity can determine the cosine of the initial pitch angle as the first value, the cosine of the initial horizontal angle as the second value, the product of the observation distance and the first value as the first product, and the product of the first product and the second value as the abscissa value. The execution entity can determine the sine of the initial horizontal angle as the third value, and the product of the first product and the third value as the ordinate value. The execution entity can determine the sine of the initial pitch angle as the fourth value, and the product of the observation distance and the fourth value as the vertical coordinate value. Then, the execution entity can determine the abscissa value, ordinate value, and vertical coordinate value as the coordinate information of the observation point in the three-dimensional coordinate system. The observation point can be a rendering point, that is, the point where the center of the virtual camera lens is located during rendering (equivalent to shooting or observation).

[0054] The second step is to determine the coordinate information, initial pitch angle, and initial horizontal angle as the rendering pose information.

[0055] The third step involves rendering the 3D entity based on the rendering pose information to obtain a rendered image. In practice, the aforementioned execution entity can use 3D rendering technology to render the 3D entity and obtain a 2D image of the 3D entity in the pose corresponding to the rendering pose information as the rendered image.

[0056] The fourth step is to determine the rendered pose information and the rendered image as a pose image data pair.
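
The coordinate computation in the first step is a spherical-to-Cartesian conversion with the observation distance as the radius. The sketch below illustrates that arithmetic under stated assumptions. The names `RenderingPose` and `observation_pose` are hypothetical, angles are assumed to be given in degrees, and the rendering itself (third step) is left out because the text does not name a specific renderer API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RenderingPose:
    x: float               # abscissa value
    y: float               # ordinate value
    z: float               # third coordinate value, driven by the pitch angle
    horizontal_deg: float
    pitch_deg: float


def observation_pose(horizontal_deg: float, pitch_deg: float, distance: float) -> RenderingPose:
    """Place the virtual camera on a sphere centred on the entity's centre point."""
    h = math.radians(horizontal_deg)
    p = math.radians(pitch_deg)
    first_product = distance * math.cos(p)      # observation distance x first value
    return RenderingPose(
        x=first_product * math.cos(h),          # first product x second value
        y=first_product * math.sin(h),          # first product x third value
        z=distance * math.sin(p),               # observation distance x fourth value
        horizontal_deg=horizontal_deg,
        pitch_deg=pitch_deg,
    )


front = observation_pose(0.0, 0.0, 10.0)   # the example starting pose from the text
top = observation_pose(0.0, 90.0, 10.0)    # view from directly above
print(front)
print(top)
```

Whatever the angles, the observation point stays at the observation distance from the origin, which is why the sphere radius serves as the observation distance.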

[0057] Step 102: Store the obtained pose image data pair set in a preset database.

[0058] In some embodiments, the execution entity may store the obtained pose image data pair set in a preset database. For example, the preset database may be a MySQL database, a NoSQL database, etc.
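
As an illustration of step 102, the sketch below stores pose image data pairs in a relational table. It uses Python's built-in `sqlite3` module as a stand-in for the MySQL or NoSQL database named in the text. The table name, column names, entity identifier, and placeholder image bytes are all assumptions made for the example.

```python
import sqlite3
from typing import Iterable, Tuple

# (horizontal_deg, pitch_deg, distance, encoded image bytes): hypothetical layout
PairRow = Tuple[float, float, float, bytes]


def create_pose_image_store(conn: sqlite3.Connection) -> None:
    """Create the table that holds one row per pose image data pair."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS pose_image_pairs (
               id INTEGER PRIMARY KEY,
               entity_id TEXT NOT NULL,
               horizontal_deg REAL NOT NULL,
               pitch_deg REAL NOT NULL,
               distance REAL NOT NULL,
               image BLOB NOT NULL
           )"""
    )


def store_pairs(conn: sqlite3.Connection, entity_id: str, pairs: Iterable[PairRow]) -> None:
    """Insert all pairs for one 3D entity in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO pose_image_pairs "
            "(entity_id, horizontal_deg, pitch_deg, distance, image) "
            "VALUES (?, ?, ?, ?, ?)",
            [(entity_id, h, p, d, img) for h, p, d, img in pairs],
        )


conn = sqlite3.connect(":memory:")
create_pose_image_store(conn)
store_pairs(conn, "entity-01", [(0.0, 0.0, 10.0, b"img-a"), (5.0, 0.0, 10.0, b"img-b")])
count = conn.execute("SELECT COUNT(*) FROM pose_image_pairs").fetchone()[0]
print(count)
```

Storing the angles as separate columns would let a later lookup filter by entity or by pose range before any image comparison takes place.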

[0059] Step 103: In response to receiving the image of the entity to be located sent by the preset terminal device, generate positioning pose information corresponding to the image of the entity to be located based on that image and the pose image data pair set stored in the preset database.

[0060] In one example, the execution entity can determine each rendered image in the pose image data pair set as a rendered image to be matched. Then, the execution entity can extract rendered image feature information from each rendered image to be matched using an image feature extraction algorithm (e.g., the HOG feature extraction algorithm), and extract entity image feature information from the entity image to be located using the same algorithm. Next, the execution entity can determine the similarity between the entity image feature information and each item of rendered image feature information as a matching similarity, and determine the matching similarity with the highest value as the target matching similarity. The execution entity can then determine the pose image data pair containing the rendered image to be matched that corresponds to the target matching similarity as the target pose image data pair. Finally, the execution entity can determine the rendering pose information included in the target pose image data pair as the localization pose information corresponding to the entity image to be located.

[0061] The feature information of the entity image to be located can be represented by a feature vector (e.g., a HOG feature vector). The feature information of each rendered image to be matched can also be represented by a feature vector. The preset terminal device can be a mobile terminal (e.g., a head-mounted display device, mobile phone, tablet, etc.). The head-mounted display device can include, but is not limited to, one of the following: AR glasses, VR glasses, MR glasses.
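
The feature-based matching in this example reduces to a nearest-neighbour search over feature vectors. The sketch below shows one way to do it. The text does not specify the similarity measure, so cosine similarity is an assumption here. The feature vectors are taken as already extracted (for instance by a HOG extractor), and the names `cosine_similarity` and `locate_by_features` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Pose = Tuple[float, float]  # (horizontal_deg, pitch_deg), an illustrative pose layout


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def locate_by_features(
    query_features: Sequence[float],
    candidates: List[Tuple[Pose, Sequence[float]]],
) -> Tuple[Pose, float]:
    """Return the rendering pose whose features best match the query, with its score."""
    if not candidates:
        raise ValueError("no pose image data pairs to match against")
    scored = [(cosine_similarity(query_features, feats), pose) for pose, feats in candidates]
    best_score, best_pose = max(scored, key=lambda item: item[0])
    return best_pose, best_score


candidates = [
    ((0.0, 0.0), [1.0, 0.0, 0.0]),
    ((90.0, 0.0), [0.0, 1.0, 0.0]),
    ((0.0, 90.0), [0.0, 0.0, 1.0]),
]
pose, score = locate_by_features([0.1, 0.9, 0.2], candidates)
print(pose, round(score, 3))
```

Returning the score alongside the pose leaves room for a caller to reject weak matches, although the text itself does not describe a rejection threshold.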

[0062] In another example, the execution entity can also generate the positioning pose information corresponding to the entity image to be located, based on that image and the pose image data pair set stored in the preset database, through the following steps:

[0063] The first step is to match, from the pose image data pair set, the pose image data pair that corresponds to the entity image to be located, as the matching pose image data pair. In practice, the executing entity can determine the entity image to be located as a template image. Then, using template matching technology, the executing entity can compare each rendered image in the pose image data pair set with the template image and determine the rendered image most similar to the template image as the target rendered image. Next, the executing entity can determine the pose image data pair in the set that contains the target rendered image as the target pose image data pair. Finally, the executing entity can determine this target pose image data pair as the matching pose image data pair.

[0064] The second step is to determine the rendered images included in the above-mentioned matched pose image data pair as the localization rendered images.

[0065] The third step is, in response to determining that the three-dimensional entity corresponding to the localization rendered image has no corresponding scene point cloud data, to determine the rendering pose information included in the above-mentioned matching pose image data pair as the localization pose information.
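
The template-matching route in the three steps above can be sketched as follows. This is a simplified illustration rather than the method of the application: it assumes grayscale images of identical size given as nested lists, uses zero-mean normalized cross-correlation as the similarity score (the text only says "template matching technology"), and the names `zncc` and `match_pose` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[float]]  # grayscale pixels, row by row
Pose = Tuple[float, float]         # (horizontal_deg, pitch_deg), an illustrative pose layout


def zncc(a: Image, b: Image) -> float:
    """Zero-mean normalized cross-correlation of two equally sized images, in [-1, 1]."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    if not flat_a or len(flat_a) != len(flat_b):
        raise ValueError("images must be non-empty and the same size")
    mean_a = sum(flat_a) / len(flat_a)
    mean_b = sum(flat_b) / len(flat_b)
    numerator = sum((x - mean_a) * (y - mean_b) for x, y in zip(flat_a, flat_b))
    denominator = math.sqrt(
        sum((x - mean_a) ** 2 for x in flat_a) * sum((y - mean_b) ** 2 for y in flat_b)
    )
    return numerator / denominator if denominator else 0.0


def match_pose(template: Image, pairs: List[Tuple[Pose, Image]]) -> Tuple[Pose, Image]:
    """Return the pose image data pair whose rendered image best matches the template."""
    return max(pairs, key=lambda pair: zncc(template, pair[1]))


template = [[0, 1], [1, 0]]  # stands in for the entity image to be located
pairs = [
    ((0.0, 0.0), [[1, 0], [0, 1]]),   # inverted pattern, correlation -1
    ((90.0, 0.0), [[0, 2], [2, 0]]),  # same pattern at higher contrast, correlation 1
]
matching_pose, localization_image = match_pose(template, pairs)
print(matching_pose)
```

When the matched entity has no scene point cloud data, `matching_pose` would be used directly as the localization pose, as the third step describes.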

[0066] In some optional implementations of certain embodiments, the aforementioned execution entity may further perform the following steps:

[0067] In response to determining that the 3D entity corresponding to the localization rendered image has corresponding scene point cloud data, localization pose information is generated based on the entity image to be localized and the scene point cloud data. In practice, the execution entity can use the SIFT (Scale-Invariant Feature Transform) algorithm to extract feature points from the entity image to be localized, obtaining an image feature point information group. Each item of image feature point information in the group can represent an image feature point in the entity image to be localized, and can include, but is not limited to, two-dimensional coordinates and intensity information. Intensity information can represent the color value of the image feature point. The execution entity can use a keypoint extraction algorithm (e.g., SIFT keypoint detection or SUSAN (Smallest Univalue Segment Assimilating Nucleus) keypoint detection) to extract point cloud feature points from the scene point cloud data, obtaining a point cloud feature point information group. Each item of point cloud feature point information in the group represents a feature point at the corresponding spatial point of the scene point cloud data, and may include, but is not limited to, 3D coordinates, texture information, and color values. The color value represents the color of the spatial point corresponding to the point cloud feature point information. The execution entity then inputs the image feature point information group and the point cloud feature point information group into a pre-trained feature point matching model to obtain the matching feature point information groups. Each matching feature point information group includes one item of image feature point information and one item of point cloud feature point information. The feature point matching model can be the SuperGlue feature matching network model. For each matching feature point information group, the execution entity determines positional correspondence information based on the 2D coordinates and 3D coordinates it includes. Then, the execution entity uses the PnP (Perspective-n-Point) algorithm to generate the localization pose information of the entity image to be localized based on the obtained positional correspondence information.

[0068] The positioning pose information can represent the position and orientation of the acquisition device in three-dimensional space at the time it captured the image of the entity to be positioned.

[0069] Therefore, when the 3D entity corresponding to the positioning rendering image has corresponding scene point cloud data, the positioning pose information of the entity to be positioned can be generated with higher accuracy by using the entity image to be positioned and the scene point cloud data.
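The choice between the two sources of positioning pose information (the rendering pose of the matched pose image data pair when no scene point cloud data exists, as recited in claim 6, and the point-cloud-based pose otherwise) can be sketched as a small dispatch function. The function name, the `Pose` tuple layout, and the `refine_with_point_cloud` callback are hypothetical and introduced only for illustration.

```python
from typing import Callable, Optional, Sequence, Tuple

# Assumed layout: x, y, z, horizontal angle, pitch angle
Pose = Tuple[float, float, float, float, float]
Point3D = Tuple[float, float, float]


def choose_localization_pose(
    rendering_pose: Pose,
    scene_point_cloud: Optional[Sequence[Point3D]],
    refine_with_point_cloud: Callable[[Sequence[Point3D]], Pose],
) -> Pose:
    """Return the pose to report for the image of the entity to be positioned."""
    if not scene_point_cloud:
        # No point cloud for this entity: fall back to the rendering pose
        # stored in the matched pose image data pair.
        return rendering_pose
    # Point cloud available: delegate to the feature matching + PnP path.
    return refine_with_point_cloud(scene_point_cloud)
```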

[0070] Step 104: Based on the positioning pose information and the neural radiance field scene model, display, on the preset terminal device, the positioning entity status information of the three-dimensional entity corresponding to the image of the entity to be positioned in the neural radiance field scene model.

[0071] In some embodiments, the execution entity may, based on the positioning pose information and the neural radiance field scene model, display on the preset terminal device the positioning entity status information of the three-dimensional entity corresponding to the image of the entity to be positioned in the neural radiance field scene model through the following steps:

[0072] The first step is to determine the pose image data pair corresponding to the positioning pose information.

[0073] The second step is to identify the three-dimensional entity corresponding to that pose image data pair in the neural radiance field scene model as the localized three-dimensional entity.

[0074] The third step is to determine the model data of the localized three-dimensional entity as the model data to be sent.

[0075] The fourth step is to send the model data to be sent, the positioning pose information, and the preset pose floating range information to the preset terminal device, so that the preset terminal device can display the positioning entity status information of the three-dimensional entity corresponding to the image of the entity to be positioned in the neural radiance field scene model.
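The four steps above can be sketched as a function that assembles the message sent to the terminal device. Everything named here is an assumption for illustration: the `PoseImagePair` type, the nearest-pose lookup used for the first step (the text does not say how the pair is determined), the JSON message format, and the field names. The squared-difference lookup also mixes meters and degrees, which a real system would weight or normalize.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed layout: x, y, z, horizontal angle, pitch angle
Pose = Tuple[float, float, float, float, float]


@dataclass(frozen=True)
class PoseImagePair:
    entity_id: str          # which 3D entity was rendered (hypothetical field)
    rendering_pose: Pose
    image_path: str


def nearest_pair(pairs: List[PoseImagePair], pose: Pose) -> PoseImagePair:
    """First step (assumed strategy): pick the pair whose rendering pose is closest."""
    def squared_distance(pair: PoseImagePair) -> float:
        return sum((a - b) ** 2 for a, b in zip(pair.rendering_pose, pose))
    return min(pairs, key=squared_distance)


def build_payload(pairs: List[PoseImagePair], models: Dict[str, str],
                  pose: Pose, floating_range: Tuple[float, float]) -> str:
    pair = nearest_pair(pairs, pose)      # first step
    entity_id = pair.entity_id            # second step: the localized entity
    model_data = models[entity_id]        # third step: model data to be sent
    return json.dumps({                   # fourth step: message for the terminal
        "entity_id": entity_id,
        "model_data": model_data,
        "positioning_pose": pose,
        "pose_floating_range": {
            "horizontal_deg": floating_range[0],
            "pitch_deg": floating_range[1],
        },
    })
```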

[0076] The positioning entity state information can represent the display of a 3D entity positioned within a preset pose range. The preset pose range can be a floating pose range that includes the pose corresponding to the positioning pose information. As an example, when the positioning pose information is "x-coordinate is 10m, y-coordinate is 0m, z-coordinate is 0m, horizontal angle is 0 degrees, and pitch angle is 0 degrees", the floating pose range can be set as follows: with the spatial point (x-coordinate is 10m, y-coordinate is 0m, z-coordinate is 0m) as the reference, the horizontal angle is allowed to float by ±2°, and the pitch angle is allowed to float by ±1°.
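The floating pose range in the example above (horizontal angle ±2°, pitch angle ±1° around the reference pose) can be checked with a few lines of code. This is a sketch only; the function names and the wrap-around handling of angles near 0°/360° are assumptions that the text does not specify.

```python
def angle_difference(a_deg: float, b_deg: float) -> float:
    """Signed smallest difference a - b in degrees, in the range [-180, 180)."""
    return (a_deg - b_deg + 180.0) % 360.0 - 180.0


def within_floating_range(horizontal_deg: float, pitch_deg: float,
                          ref_horizontal_deg: float, ref_pitch_deg: float,
                          horizontal_tol_deg: float = 2.0,
                          pitch_tol_deg: float = 1.0) -> bool:
    """True if the viewing angles fall inside the preset floating pose range."""
    return (abs(angle_difference(horizontal_deg, ref_horizontal_deg)) <= horizontal_tol_deg
            and abs(angle_difference(pitch_deg, ref_pitch_deg)) <= pitch_tol_deg)
```

With the reference pose from the example (horizontal angle 0°, pitch angle 0°), a view at horizontal 1.5° and pitch -0.5° is inside the range, while a view at horizontal 2.5° is outside it.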

[0077] Therefore, a three-dimensional entity within a preset pose range can be displayed on a preset terminal device.

[0078] The 3D entity localization method of some embodiments of this application improves the accuracy of 3D entity localization. Specifically: First, image rendering processing is performed on each 3D entity in a pre-created neural radiance field scene model under various poses to obtain a set of pose image data pairs for each 3D entity. Relying on the neural radiance field scene model, the real-world 3D scene can be reproduced more accurately and virtual entities can be aligned with the real-world 3D scene. This reduces positional deviations caused by the map being built and the virtual entities being placed by different operators, and reduces the possibility of virtual entities being omitted or duplicated. Second, the set of pose image data pairs for each 3D entity in the neural radiance field scene model is stored in a preset database to establish a data foundation for subsequent rapid localization and matching. Next, in response to receiving an image of the entity to be located sent by a preset terminal device, the image of the entity to be located is matched against the sets of pose image data pairs stored in the preset database to generate localization pose information corresponding to the image of the entity to be located. Finally, based on the localization pose information and the neural radiance field scene model, the localization entity status information of the 3D entity corresponding to the image of the entity to be located in the neural radiance field scene model is displayed on the preset terminal device. Because the pre-built neural radiance field scene model aligns virtual entities with the real-world 3D scene, it reduces the positional deviation between virtual entities and 3D entities in the real 3D scene and reduces the possibility of virtual entities being omitted or duplicated. It therefore reduces the waste of computing resources caused by positioning failure and repositioning.
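The storage step summarized above (keeping the pose image data pairs in a preset database so they can be looked up at positioning time) might look like the following sketch. The use of SQLite, the table name `pose_image_pairs`, and the column layout are assumptions made for illustration; the text does not name a database or schema.

```python
import sqlite3
from typing import List, Tuple

# Assumed row layout: horizontal angle (deg), pitch angle (deg), distance (m), image bytes
Row = Tuple[float, float, float, bytes]


def create_store(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) table that holds pose image data pairs."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pose_image_pairs ("
        "entity_id TEXT NOT NULL, horizontal_deg REAL, pitch_deg REAL, "
        "distance_m REAL, image BLOB)"
    )


def store_pairs(conn: sqlite3.Connection, entity_id: str, pairs: List[Row]) -> None:
    """Insert every rendered pair for one 3D entity."""
    conn.executemany(
        "INSERT INTO pose_image_pairs VALUES (?, ?, ?, ?, ?)",
        [(entity_id, h, p, d, image) for (h, p, d, image) in pairs],
    )
    conn.commit()


def load_pairs(conn: sqlite3.Connection, entity_id: str) -> List[Row]:
    """Fetch the stored pairs for one entity, ordered by viewing angles."""
    cursor = conn.execute(
        "SELECT horizontal_deg, pitch_deg, distance_m, image FROM pose_image_pairs "
        "WHERE entity_id = ? ORDER BY horizontal_deg, pitch_deg",
        (entity_id,),
    )
    return cursor.fetchall()
```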

[0079] Referring further to Figure 2, which illustrates a flow 200 of a three-dimensional entity localization method according to another embodiment, including the following steps:

[0080] Step 201: Obtain scene point cloud data.

[0081] In some embodiments, the aforementioned execution entity may acquire scene point cloud data.

[0082] Step 202: Create an initial neural radiance field scene model based on the scene point cloud data.

[0083] In some embodiments, the aforementioned execution entity may create an initial neural radiance field scene model based on the aforementioned scene point cloud data.

[0084] Step 203: Perform scene editing processing on the initial neural radiance field scene model to obtain an augmented reality scene model.

[0085] In some embodiments, the aforementioned execution entity may perform scene editing processing on the aforementioned initial neural radiance field scene model to obtain an augmented reality scene model.

[0086] Step 204: Determine the augmented reality scene model as the neural radiance field scene model.

[0087] In some embodiments, the aforementioned execution entity may determine the aforementioned augmented reality scene model as the neural radiance field scene model.

[0088] Step 205: Perform image rendering processing, under various poses, on each three-dimensional entity of at least one three-dimensional entity in the pre-created neural radiance field scene model to obtain sets of pose image data pairs.

[0089] Step 206: Store the obtained sets of pose image data pairs in a preset database.

[0090] Step 207: In response to receiving the image of the entity to be located sent by the preset terminal device, generate positioning pose information corresponding to the image of the entity to be located based on the image of the entity to be located and the sets of pose image data pairs stored in the preset database.

[0091] Step 208: Based on the positioning pose information and the neural radiance field scene model, display, on the preset terminal device, the positioning entity status information of the three-dimensional entity corresponding to the image of the entity to be positioned in the neural radiance field scene model.

[0092] As can be seen from Figure 2, compared with the description of some embodiments corresponding to Figure 1, the flow 200 of the 3D entity localization method in some embodiments corresponding to Figure 2 adds the creation of an initial neural radiance field scene model from scene point cloud data. The initial neural radiance field scene model is then processed through scene editing to obtain an augmented reality scene model, which serves as the neural radiance field scene model. Through scene editing, various virtual entities, such as virtual furniture and virtual decorations, can be added to the initial neural radiance field scene model, thereby enhancing the interactivity and realism of the neural radiance field scene model. Users can place specific virtual entities in the initial neural radiance field scene model according to their needs, overlaying the virtual scene onto the real scene and thus achieving an augmented reality effect.
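The scene editing described in steps 202 to 204 can be pictured with a deliberately simplified data model. A real scene model of this kind is a trained network rather than a record, so `SceneModel`, `VirtualEntity`, `create_initial_model`, and `edit_scene` are hypothetical stand-ins that only show the flow: build an initial model from the point cloud, add virtual entities, and use the edited result as the scene model.

```python
from dataclasses import dataclass, replace
from typing import Sequence, Tuple

Point3D = Tuple[float, float, float]


@dataclass(frozen=True)
class VirtualEntity:
    name: str            # e.g. "virtual sofa"
    position: Point3D    # placement in scene coordinates


@dataclass(frozen=True)
class SceneModel:
    point_count: int                            # stand-in for the trained scene representation
    entities: Tuple[VirtualEntity, ...] = ()    # virtual entities placed by the user


def create_initial_model(point_cloud: Sequence[Point3D]) -> SceneModel:
    """Step 202 (placeholder): a real system would train a scene model here."""
    return SceneModel(point_count=len(point_cloud))


def edit_scene(model: SceneModel, additions: Sequence[VirtualEntity]) -> SceneModel:
    """Step 203: return an edited copy with the virtual entities added."""
    return replace(model, entities=model.entities + tuple(additions))
```

Returning a new object from `edit_scene` keeps the initial model unchanged, which mirrors the distinction the text draws between the initial model and the augmented reality scene model.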

[0093] In some embodiments, for the specific implementation of steps 205-208 and the resulting technical effects, reference may be made to steps 101-104 in the embodiments corresponding to Figure 1; these are not repeated here.

[0094] Referring further to Figure 3, as an implementation of the methods shown in the above figures, this application provides some embodiments of a three-dimensional entity positioning device. These device embodiments correspond to the method embodiments shown in Figure 1, and the device can be applied to various electronic devices.

[0095] As shown in Figure 3, a three-dimensional entity localization device 300 in some embodiments includes: an image rendering unit 301, a storage unit 302, a generation unit 303, and a display unit 304. The image rendering unit 301 is configured to perform image rendering processing, at various poses, on each three-dimensional entity of at least one three-dimensional entity within a pre-created neural radiance field scene model, obtaining sets of pose image data pairs. The storage unit 302 is configured to store the obtained sets of pose image data pairs in a preset database. The generation unit 303 is configured to, in response to receiving an image of the entity to be localized sent by a preset terminal device, generate localization pose information corresponding to the image of the entity to be localized, based on the image of the entity to be localized and the sets of pose image data pairs stored in the preset database. The display unit 304 is configured to, based on the localization pose information and the neural radiance field scene model, display on the preset terminal device the localization entity state information of the three-dimensional entity corresponding to the image of the entity to be localized in the neural radiance field scene model.

[0096] It is understood that the units described in the apparatus 300 correspond to the various steps in the method described with reference to Figure 1. Therefore, the operations, features, and beneficial effects described above for the method also apply to the apparatus 300 and the units contained therein, and will not be repeated here.

[0097] Referring now to Figure 4, a schematic diagram of the structure of an electronic device 400 suitable for implementing some embodiments of the present application is shown. The electronic device shown in Figure 4 is merely an example and should not impose any limitation on the functionality and scope of use of the embodiments of the present application.

[0098] As shown in Figure 4, the electronic device 400 may include a processing unit (e.g., a central processing unit, a graphics processing unit, etc.) 401, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage device 408 into a random access memory (RAM) 403. The RAM 403 also stores various programs and data required for the operation of the electronic device 400. The processing unit 401, ROM 402, and RAM 403 are interconnected via a bus 404. An input / output (I / O) interface 405 is also connected to the bus 404.

[0099] Typically, the following devices can be connected to the I / O interface 405: input devices 406 including, for example, touchscreens, touchpads, keyboards, mice, cameras, microphones, accelerometers, gyroscopes, etc.; output devices 407 including, for example, liquid crystal displays (LCDs), speakers, vibrators, etc.; storage devices 408 including, for example, magnetic tapes, hard disks, etc.; and communication devices 409. The communication device 409 allows the electronic device 400 to exchange data with other devices over a wireless or wired connection. Although Figure 4 shows the electronic device 400 with various devices, it is not required that all of the devices shown be implemented or present. More or fewer devices may alternatively be implemented or present. Each box shown in Figure 4 may represent one device or multiple devices as needed.

[0100] In particular, according to some embodiments of this application, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, some embodiments of this application include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via the communication device 409, installed from the storage device 408, or installed from the ROM 402. When the computer program is executed by the processing unit 401, it performs the functions defined in the methods of some embodiments of this application.

[0101] It should be noted that, in some embodiments of this application, the computer-readable medium described may be a computer-readable signal medium or a computer-readable storage medium, or any combination of both. A computer-readable storage medium may be, for example,—but not limited to—an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of a computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination thereof. In some embodiments of this application, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In some embodiments of this application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such propagated data signals may take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A computer-readable signal medium can be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium can be transmitted using any suitable medium, including but not limited to: wires, optical fibers, RF (radio frequency), etc., or any suitable combination thereof.

[0102] In some implementations, clients and servers can communicate using any currently known or future-developed network protocol, such as HTTP (Hypertext Transfer Protocol), and can be interconnected with digital data communication (e.g., a communication network) of any form or medium. Examples of communication networks include local area networks ("LANs"), wide area networks ("WANs"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed networks.

[0103] The computer-readable medium may be included in an electronic device or may exist independently without being assembled into the electronic device. The computer-readable medium carries one or more programs that, when executed by the electronic device, cause the electronic device to: perform image rendering processing, at various poses, on each three-dimensional entity of at least one three-dimensional entity in a pre-created neural radiance field scene model, obtaining sets of pose image data pairs; store the obtained sets of pose image data pairs in a preset database; in response to receiving an image of an entity to be located sent by a preset terminal device, generate positioning pose information corresponding to the image of the entity to be located based on the image of the entity to be located and the sets of pose image data pairs stored in the preset database; and display, on the preset terminal device, the positioning entity state information of the three-dimensional entity corresponding to the image of the entity to be located in the neural radiance field scene model.

[0104] Computer program code for performing operations of some embodiments of this application can be written in one or more programming languages or a combination thereof. These programming languages include object-oriented programming languages, such as Java, Smalltalk, and C++, and conventional procedural programming languages, such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer can be connected to the user's computer via any type of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (e.g., via the Internet using an Internet service provider).

[0105] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.

[0106] The units described in some embodiments of this application can be implemented in software or hardware. The described units can also be provided in a processor; for example, a processor may be described as including an image rendering unit, a storage unit, a generation unit, and a display unit. In some cases, the names of these units do not limit the units themselves; for example, the storage unit may also be described as "a unit that stores the obtained sets of pose image data pairs in a preset database."

[0107] The functions described above in this document can be performed, at least in part, by one or more hardware logic components. For example, exemplary types of hardware logic components that can be used include, without limitation: Field Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems-on-Chip (SoCs), Complex Programmable Logic Devices (CPLDs), and so on.

[0108] The above description is merely a selection of preferred embodiments of this application and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of the invention involved in the embodiments of this application is not limited to technical solutions formed by specific combinations of the above technical features, but should also cover other technical solutions formed by arbitrary combinations of the above technical features or their equivalents without departing from the inventive concept. One example is a technical solution formed by interchanging the above features with (but not limited to) technical features having similar functions disclosed in the embodiments of this application.

Claims

1. A method for locating a three-dimensional entity, comprising: performing image rendering processing, under various poses, on each three-dimensional entity of at least one three-dimensional entity in a pre-created neural radiance field scene model to obtain sets of pose image data pairs; storing the obtained sets of pose image data pairs in a preset database; in response to receiving an image of an entity to be located sent by a preset terminal device, generating positioning pose information corresponding to the image of the entity to be located based on the image of the entity to be located and the sets of pose image data pairs stored in the preset database; and based on the positioning pose information and the neural radiance field scene model, displaying, in the preset terminal device, the positioning entity status information of the three-dimensional entity corresponding to the image of the entity to be located in the neural radiance field scene model.

2. The method of claim 1, wherein the method further comprises: acquiring scene point cloud data; creating an initial neural radiance field scene model based on the scene point cloud data; performing scene editing processing on the initial neural radiance field scene model to obtain an augmented reality scene model; and determining the augmented reality scene model as the neural radiance field scene model.

3. The method of claim 1, wherein performing image rendering processing, under various poses, on each three-dimensional entity of at least one three-dimensional entity in the pre-created neural radiance field scene model to obtain sets of pose image data pairs comprises: establishing a three-dimensional spatial coordinate system with the center point of the three-dimensional entity as the origin; and based on preset stepping parameter information and the three-dimensional spatial coordinate system, performing image rendering processing on the three-dimensional entity under various poses to obtain a pose image data pair set, wherein each pose image data pair in the pose image data pair set includes rendering pose information and a rendered image.

4. The method of claim 3, wherein the preset stepping parameter information includes: starting point rendering pose information, horizontal angle interval information, pitch angle interval information, and sphere radius information; and wherein performing image rendering processing on the three-dimensional entity under various poses, based on the preset stepping parameter information and the three-dimensional spatial coordinate system, to obtain a pose image data pair set comprises:
Step 1: Determine the starting point rendering pose information included in the preset stepping parameter information, wherein the starting point rendering pose information includes a starting point horizontal angle and a starting point pitch angle, and the sphere radius represented by the sphere radius information is used as the observation distance;
Step 2: Determine the starting point horizontal angle as the initial horizontal angle;
Step 3: Determine the starting point pitch angle as the initial pitch angle;
Step 4: Based on the initial horizontal angle, the initial pitch angle, and the observation distance, generate a pose image data pair for the corresponding three-dimensional entity;
Step 5: Generate an updated pitch angle based on the initial pitch angle and the pitch angle interval information;
Step 6: Set the updated pitch angle as the initial pitch angle to update the initial pitch angle;
Step 7: Following step 6, in response to the updated initial pitch angle being less than or equal to a first preset angle, return to step 4 based on the updated initial pitch angle;
Step 8: Following step 6, in response to determining that the updated initial pitch angle is greater than the first preset angle, determine that the pitch angle traversal under the current initial horizontal angle is complete;
Step 9: Generate an updated horizontal angle based on the initial horizontal angle and the horizontal angle interval information;
Step 10: Following step 9, in response to determining that the updated horizontal angle is less than or equal to a second preset angle, set the updated horizontal angle as the initial horizontal angle and, based on the updated initial horizontal angle, execute step 3 again, traversing the pitch angles under the updated initial horizontal angle to obtain the pose image data pairs corresponding to each pitch angle under the updated initial horizontal angle;
Step 11: Following step 9, in response to determining that the updated horizontal angle is greater than the second preset angle, determine that the horizontal angle traversal is complete, and determine the generated pose image data pairs as the pose image data pair set.

5. The method of claim 4, wherein generating a pose image data pair for the corresponding three-dimensional entity based on the initial horizontal angle, the initial pitch angle, and the observation distance comprises: generating, based on the initial horizontal angle, the initial pitch angle, and the observation distance, the coordinate information of the observation point in the three-dimensional spatial coordinate system; determining the coordinate information, the initial pitch angle, and the initial horizontal angle as the rendering pose information; rendering the three-dimensional entity based on the rendering pose information to obtain a rendered image; and determining the rendering pose information and the rendered image as a pose image data pair.

6. The method of claim 1, wherein generating positioning pose information corresponding to the image of the entity to be located based on the image of the entity to be located and the sets of pose image data pairs stored in the preset database comprises: matching, from the sets of pose image data pairs, the target pose image data pair corresponding to the image of the entity to be located as the matching pose image data pair; determining the rendered image included in the matching pose image data pair as the positioning rendering image; and in response to determining that the three-dimensional entity corresponding to the positioning rendering image has no corresponding scene point cloud data, determining the rendering pose information included in the matching pose image data pair as the positioning pose information.

7. The method of claim 6, wherein the method further comprises: in response to determining that the three-dimensional entity corresponding to the positioning rendering image has corresponding scene point cloud data, generating positioning pose information based on the image of the entity to be located and the scene point cloud data.

8. The method of claim 1, wherein displaying, in the preset terminal device, the positioning entity status information of the three-dimensional entity corresponding to the image of the entity to be located in the neural radiance field scene model, based on the positioning pose information and the neural radiance field scene model, comprises: determining the pose image data pair corresponding to the positioning pose information; identifying the three-dimensional entity in the neural radiance field scene model that corresponds to the pose image data pair as the located three-dimensional entity; determining the model data of the located three-dimensional entity as the model data to be sent; and sending the model data to be sent, the positioning pose information, and preset pose floating range information to the preset terminal device, so that the preset terminal device can display the positioning entity status information of the three-dimensional entity corresponding to the image of the entity to be located in the neural radiance field scene model.

9. A three-dimensional entity positioning device, comprising: an image rendering unit configured to perform image rendering processing, at various poses, on each three-dimensional entity of at least one three-dimensional entity in a pre-created neural radiance field scene model to obtain sets of pose image data pairs; a storage unit configured to store the obtained sets of pose image data pairs in a preset database; a generation unit configured to, in response to receiving an image of an entity to be located sent by a preset terminal device, generate positioning pose information corresponding to the image of the entity to be located based on the image of the entity to be located and the sets of pose image data pairs stored in the preset database; and a display unit configured to display, in the preset terminal device, the positioning entity status information of the three-dimensional entity corresponding to the image of the entity to be located in the neural radiance field scene model, based on the positioning pose information and the neural radiance field scene model.

10. An electronic device, comprising: one or more processors; and a storage device on which one or more programs are stored, wherein when the one or more programs are executed by the one or more processors, the one or more processors implement the method as described in any one of claims 1 to 8.

11. A computer readable medium having stored thereon a computer program, wherein when the program is executed by a processor, it implements the method as described in any one of claims 1 to 8.
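For illustration only, the pose traversal recited in claims 4 and 5 can be sketched as two nested loops over horizontal and pitch angles. The names `observation_point` and `sweep_poses`, the `render` callback, and the spherical-coordinate convention (z axis up, horizontal angle measured from the x axis) are assumptions introduced here; the claims do not fix a convention. The loops generate a pair before testing the updated angle, which follows the order of steps 4 to 11.

```python
from math import cos, radians, sin
from typing import Callable, List, Tuple

# Assumed layout: x, y, z, pitch angle, horizontal angle
Pose = Tuple[float, float, float, float, float]


def observation_point(horizontal_deg: float, pitch_deg: float,
                      radius: float) -> Tuple[float, float, float]:
    """Observation point on a sphere around the entity's center (assumed convention)."""
    h, p = radians(horizontal_deg), radians(pitch_deg)
    return (radius * cos(p) * cos(h), radius * cos(p) * sin(h), radius * sin(p))


def sweep_poses(start_h: float, start_p: float, h_step: float, p_step: float,
                radius: float, max_p: float, max_h: float,
                render: Callable[[Pose], object]) -> List[Tuple[Pose, object]]:
    """Generate (rendering pose, rendered image) pairs over the angle grid."""
    if h_step <= 0 or p_step <= 0:
        raise ValueError("angle intervals must be positive")
    pairs: List[Tuple[Pose, object]] = []
    h = start_h                                   # step 2
    while True:
        p = start_p                               # step 3
        while True:
            x, y, z = observation_point(h, p, radius)
            pose = (x, y, z, p, h)                # claim 5: coordinates, pitch, horizontal
            pairs.append((pose, render(pose)))    # step 4
            p += p_step                           # steps 5 and 6
            if p > max_p:                         # step 8: pitch traversal complete
                break
        h += h_step                               # step 9
        if h > max_h:                             # step 11: horizontal traversal complete
            break
    return pairs
```

With a horizontal interval of 90° up to 270°, a pitch interval of 30° up to 60°, and a radius of 10 m, the sweep visits 4 horizontal angles and 3 pitch angles, producing 12 pairs.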