Multi-user real-time localization and mapping (SLAM)

By using SLAM technology on independent user devices, the physical environment information of multiple users is captured and merged, solving the problem of multiple users sharing virtual objects and realizing a simplified system without central device coordination and a consistent CGR experience.

CN122308610A · Pending · Publication Date: 2026-06-30 · APPLE INC

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
APPLE INC
Filing Date
2020-04-26
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing technologies make it difficult for multiple users to share virtual objects in a physical environment, and require a central device to coordinate user device information, which increases system complexity.

Method used

SLAM is performed using independent user equipment, which uses image sensors to capture images of the physical environment, generates keyframes and reconstructs the physical environment locally, and receives and merges information from other user equipment to achieve the sharing of virtual objects and a consistent CGR experience.

Benefits of technology

This enables multi-user SLAM to achieve independent reconstruction of the physical environment by each user device without the need for central device coordination, ensuring consistent positioning of virtual objects across different devices, simplifying the system structure and improving the consistency of user experience.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122308610A_ABST

Abstract

This disclosure relates to multi-user real-time localization and mapping (SLAM). In some embodiments, a first electronic device, including a first image sensor, uses a processor to execute a method. The method involves acquiring a first set of keyframes based on images of a physical environment captured by the first image sensor. The method generates a mapping that defines the relative positions of keyframes within the first set of keyframes. The method receives keyframes corresponding to images of the physical environment captured at different second electronic devices and positions the received keyframes to the mapping. The method then receives anchor points from the second electronic devices, which define the position of a virtual object relative to the keyframes. The method displays a CGR environment including the virtual object at the location based on the anchor points and the mapping.

Description

Case Analysis

[0001] This application is a divisional application of Chinese invention patent application No. 202010340386.6, filed on April 26, 2020, entitled "Multi-user Real-time Localization and Mapping (SLAM)".

Technical Field

[0002] This disclosure relates generally to computer vision, and more particularly to systems, methods and apparatus for performing localization and mapping.

Background Technology

[0003] For a single user or a single device, various technologies exist for performing Simultaneous Localization and Mapping (SLAM). There is a need for technologies that allow multiple users to share virtual objects while performing SLAM in a physical environment.

Summary of the Invention

[0004] The various embodiments disclosed herein include devices, systems, and methods capable of sharing information about a physical environment or virtual objects across different user devices performing multi-user SLAM in a physical environment. In some embodiments, each user device contributes to reconstructing a physical environment that can be used to enhance user experience, such as a computer-generated reality (CGR) experience. In some embodiments, each user device in multi-user SLAM creates a locally unique reconstruction of a physical environment that includes contributions from each of the other users reconstructing the physical environment. In some embodiments, contributions include information about virtual objects included in a user device's CGR experience to more consistently locate virtual objects in CGR experiences on other user devices.

[0005] In some implementations, a distributed approach to multi-user SLAM is implemented, where a central or master device does not need to coordinate information from the involved user devices. In some implementations, each user device involved in multi-user SLAM independently performs SLAM relative to its own mapping in its own three-dimensional (3D) coordinate space. In some implementations, each user device incorporates contributions from each other user to reconstruct the physical environment to provide more consistent reconstruction results or more effectively provide reconstructions among different user devices included in the multi-user SLAM of the physical environment. In some implementations, redundant mappings of the physical environment maintained at multiple user electronic devices eliminate the need for a central or master non-user device, while also allowing any user device to join or leave the multi-user SLAM.
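The distributed arrangement described above can be illustrated with a minimal sketch. It assumes a hypothetical `UserDevice` class in which every device keeps its own redundant copy of all shared keyframes and sends new keyframes directly to its peers. The class, its method names, and the use of plain keyframe identifiers are illustrative assumptions, not details taken from this disclosure.

```python
class UserDevice:
    """Hypothetical peer in a multi-user SLAM session with no central device."""

    def __init__(self, name):
        self.name = name
        self.peers = set()
        # Redundant local copy: keyframe id -> name of the capturing device.
        self.keyframes = {}

    def join(self, session):
        """Join an existing session (a list of devices) and sync keyframes."""
        for peer in session:
            self.keyframes.update(peer.keyframes)
        for peer in session:
            peer.keyframes.update(self.keyframes)
            peer.peers.add(self)
            self.peers.add(peer)

    def leave(self):
        """Leave the session; the remaining peers keep their own copies."""
        for peer in self.peers:
            peer.peers.discard(self)
        self.peers.clear()

    def capture(self, keyframe_id):
        """Record a new local keyframe and share it directly with every peer."""
        self.keyframes[keyframe_id] = self.name
        for peer in self.peers:
            peer.keyframes[keyframe_id] = self.name


a, b, c = UserDevice("a"), UserDevice("b"), UserDevice("c")
a.capture("a1")
b.join([a])
b.capture("b1")
c.join([a, b])
a.leave()          # no coordinator is needed for a device to drop out
c.capture("c1")
```

In this sketch the keyframe contributed by device `a` remains available on `b` and `c` after `a` leaves, because each device holds its own copy.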

[0006] In some embodiments, a first electronic device, including a first image sensor, uses a processor to execute the method. The method involves acquiring a first set of keyframes based on images of the physical environment captured by the first image sensor. The method generates a mapping defining the relative positions of the keyframes within the first set of keyframes. In some embodiments, this mapping is in a first 3D coordinate system maintained by the first electronic device. For example, the mapping may include the 3D positions of the keyframes in the first set (or of the physical features shown in those keyframes) relative to each other and relative to the first image sensor, expressed in the 3D coordinate system maintained by the first electronic device.

[0007] The method also involves receiving and using information about the physical environment or a virtual object captured by or used at a different, second electronic device. Specifically, the method receives keyframes corresponding to images of the physical environment captured at the second electronic device and localizes the received keyframes to the first electronic device's own mapping. For example, the mapping of the first electronic device can be modified to add the 3D positions of the keyframes received from the second electronic device or of the physical features shown in the received keyframes. The modified mapping thus provides the positions of the received keyframes relative to keyframes already represented in the mapping and relative to the first image sensor.
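As a rough illustration of adding a received keyframe to a local mapping, the sketch below stores each keyframe as a 4x4 homogeneous pose in the first device's coordinate system. It assumes that the relative pose between a received keyframe and one of the device's own keyframes has already been estimated (for example by feature matching, which is not shown). The names `Mapping`, `yaw_pose`, and `compose` are hypothetical.

```python
import math


def yaw_pose(yaw_deg, tx, ty, tz):
    """4x4 homogeneous pose: rotation about the z axis plus a translation."""
    angle = math.radians(yaw_deg)
    c, s = math.cos(angle), math.sin(angle)
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def compose(a, b):
    """Matrix product a @ b for 4x4 poses stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


class Mapping:
    """Keyframe poses expressed in the first device's own coordinate system."""

    def __init__(self):
        self.poses = {}

    def add_local_keyframe(self, keyframe_id, pose_in_map):
        self.poses[keyframe_id] = pose_in_map

    def localize_received_keyframe(self, keyframe_id, reference_id,
                                   reference_to_received):
        # Assumption: reference_to_received is the pose of the received
        # keyframe relative to an already-mapped keyframe.
        self.poses[keyframe_id] = compose(self.poses[reference_id],
                                          reference_to_received)

    def position(self, keyframe_id):
        pose = self.poses[keyframe_id]
        return (pose[0][3], pose[1][3], pose[2][3])


mapping = Mapping()
mapping.add_local_keyframe("first-1", yaw_pose(90, 1.0, 0.0, 0.0))
mapping.localize_received_keyframe("second-1", "first-1",
                                   yaw_pose(0, 2.0, 0.0, 0.0))
received_position = mapping.position("second-1")
```

After this call the mapping holds the received keyframe at a position expressed in the same coordinate system as the device's own keyframes.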

[0008] The method may also receive anchor points from the second electronic device, which define the position of the virtual object relative to the received keyframes. For example, on the second electronic device, a user may have added a virtual object at a location defined (e.g., by an anchor point) relative to already-tracked locations in one or more keyframes of the second electronic device, for example, relative to a keyframe that the first electronic device has already received. The first electronic device receives the anchor points and thus has information about the position of the virtual object relative to the previously received, localized keyframes and, through them, relative to the first electronic device's own mapping.

[0009] The method therefore displays the CGR environment including the virtual object at a location based on the anchor points and the mapping. In some implementations, the method uses the 3D coordinate system of the first electronic device to display the CGR environment with the virtual object on the display. The CGR experiences on the first and second electronic devices can be more consistent with each other because the virtual object is positioned relative to the same keyframe, and that keyframe is included or otherwise used in the corresponding mapping on each device. For example, a virtual vase placed on a real-world table on the second electronic device can also appear on that table on the first electronic device. Incorporating the same keyframe and anchor point into the mappings on both devices helps ensure that the vase is positioned more accurately or more consistently on the table in both CGR experiences.
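The role of an anchor point can be sketched as follows. The example assumes an anchor is simply a keyframe identifier plus an offset expressed in that keyframe's frame, and that each device stores the shared keyframe's pose in its own coordinate system. `Pose`, `Anchor`, and `resolve_anchor` are hypothetical names, and poses are limited to a rotation about the vertical axis to keep the example short.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    """Keyframe pose in one device's coordinate system (yaw in radians)."""
    yaw: float
    x: float
    y: float
    z: float

    def transform(self, point):
        """Map a point from the keyframe's frame into device coordinates."""
        c, s = math.cos(self.yaw), math.sin(self.yaw)
        px, py, pz = point
        return (c * px - s * py + self.x, s * px + c * py + self.y, pz + self.z)


@dataclass(frozen=True)
class Anchor:
    """Position of a virtual object relative to a shared keyframe."""
    keyframe_id: str
    offset: tuple


def resolve_anchor(anchor, mapping):
    """Return the virtual object's position in the mapping's coordinates."""
    return mapping[anchor.keyframe_id].transform(anchor.offset)


# The same keyframe has different poses in each device's own mapping.
first_mapping = {"kf-shared": Pose(0.0, 1.0, 0.0, 0.0)}
second_mapping = {"kf-shared": Pose(math.pi / 2, 5.0, 5.0, 0.0)}

vase = Anchor("kf-shared", (0.5, 0.0, 0.2))
vase_on_first = resolve_anchor(vase, first_mapping)
vase_on_second = resolve_anchor(vase, second_mapping)
```

The two devices compute different coordinates for the vase, but both place it at the same offset from the shared keyframe, which is what keeps it on the same physical table in both experiences.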

[0010] In some implementations, the method is performed by a first electronic device including a first image sensor and a second electronic device including a second image sensor. The method involves acquiring a first set of one or more keyframes based on an image of a physical environment captured by the first image sensor, the first set of keyframes being defined in a first coordinate system. In this method, the first electronic device receives a second set of one or more keyframes corresponding to an image of the physical environment captured at the second electronic device, the second set of keyframes being defined in a second coordinate system different from the first coordinate system. In this method, the first electronic device generates a first mapping that defines the relative positions of keyframes in the first set of keyframes and the second set of keyframes in the first coordinate system. In this method, the second electronic device receives the first set of keyframes corresponding to an image of the physical environment captured at the first electronic device and generates a second mapping that defines the relative positions of keyframes in the first set of keyframes and the second set of keyframes in the second coordinate system. In some implementations, the method uses shared keyframes to implement concurrent independent mappings (e.g., pose maps) with different 3D coordinate systems.
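One way to picture two independent mappings that share keyframes is the planar sketch below. It assumes that a keyframe known in both coordinate systems is used to express the second device's keyframes in the first device's coordinates. The disclosure does not prescribe this computation, and `Pose2D` and its methods are hypothetical; a real system would work with full 3D poses.

```python
import math


class Pose2D:
    """Planar rigid pose (x, y, heading in radians), a simplification of 3D."""

    def __init__(self, x, y, theta):
        self.x, self.y, self.theta = x, y, theta

    def compose(self, other):
        """Return self followed by other (other is expressed in self's frame)."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        return Pose2D(self.x + c * other.x - s * other.y,
                      self.y + s * other.x + c * other.y,
                      self.theta + other.theta)

    def inverse(self):
        """Return the pose that undoes self."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        return Pose2D(-(c * self.x + s * self.y),
                      -(-s * self.x + c * self.y),
                      -self.theta)


# The same shared keyframe, as posed in each device's own coordinate system.
shared_in_first = Pose2D(2.0, 0.0, 0.0)
shared_in_second = Pose2D(0.0, 3.0, math.pi / 2)

# Transform taking poses from the second coordinate system into the first.
second_to_first = shared_in_first.compose(shared_in_second.inverse())

# Another keyframe from the second set, re-expressed in the first system.
other_in_second = Pose2D(1.0, 3.0, math.pi / 2)
other_in_first = second_to_first.compose(other_in_second)
```

The second device can apply the inverse transform to the first device's keyframes, so each device ends up with both keyframe sets in its own coordinate system without either mapping being the master copy.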

[0011] According to some embodiments, an apparatus includes one or more processors, non-transitory memory, and one or more programs; the one or more programs are stored in the non-transitory memory and configured to be executed by the one or more processors, and the one or more programs include instructions for performing or causing to perform any of the methods described herein. According to some embodiments, a non-transitory computer-readable storage medium stores instructions that, when executed by one or more processors of the apparatus, cause the apparatus to perform or cause to perform any of the methods described herein. According to some embodiments, an apparatus includes: one or more processors, non-transitory memory, and means for performing or causing to perform any of the methods described herein.

Attached Figure Description

[0012] So that this disclosure can be understood by those skilled in the art, a more detailed description is provided with reference to aspects of some exemplary embodiments, some of which are shown in the accompanying drawings.

[0013] Figure 1 is a block diagram of an exemplary operating environment according to some implementations.

[0014] Figure 2 is a block diagram of an exemplary controller according to some implementations.

[0015] Figure 3 is a block diagram of an exemplary head-mounted device (HMD) according to some implementations.

[0016] Figures 4A-4D are schematic diagrams of electronic devices using multi-user SLAM techniques according to some implementations.

[0017] Figures 5A-5U illustrate exemplary scenarios and techniques, according to some implementations, that allow multiple users, each performing localization and mapping of the physical environment, to share virtual objects.

[0018] Figure 6 is a flowchart representation of a method for presenting virtual objects in a CGR experience according to some implementations.

[0019] As is customary, the various features shown in the accompanying drawings may not be drawn to scale. Therefore, for clarity, the dimensions of various features may be arbitrarily expanded or reduced. Additionally, some drawings may not depict all components of a given system, method, or apparatus. Finally, similar reference numerals may be used throughout the specification and drawings to denote similar features.

Detailed Implementation

[0020] Numerous details are described to provide a thorough understanding of the exemplary embodiments shown in the accompanying drawings. However, the drawings illustrate only some exemplary aspects of this disclosure and should not be considered limiting. Those skilled in the art will recognize that other effective aspects or variations do not include all the specific details described herein. Furthermore, well-known systems, methods, components, devices, and circuits have not been described in exhaustive detail so as not to obscure more pertinent aspects of the exemplary embodiments described herein. Although Figures 1-3 show exemplary implementations involving head-mounted devices (HMDs), other implementations need not involve HMDs and may involve other types of devices, including but not limited to watches and other wearable electronic devices, mobile devices, laptops, desktop computers, gaming devices, home automation devices, and other devices that include or use image capture devices.

[0021] Figure 1 is a block diagram of an exemplary operating environment 100 according to some specific implementations. Although relevant features are shown, those skilled in the art will recognize from this disclosure that various other features are not shown for brevity and to avoid obscuring more pertinent aspects of the exemplary implementations disclosed herein. To that end, as a non-limiting example, operating environment 100 includes a controller 110 and a head-mounted device (HMD) 120, one or both of which may be located in a physical environment. A physical environment refers to the physical world that people can sense and / or interact with without the assistance of electronic systems. Physical environments, such as physical parks, include physical objects such as physical trees, physical buildings, and physical people. People can directly sense and / or interact with physical environments, such as through sight, touch, hearing, taste, and smell.

[0022] In some implementations, controller 110 is configured to manage and coordinate the user's computer-generated reality (CGR) experience. In some implementations, controller 110 includes a suitable combination of software, firmware, or hardware. The controller 110 is described in more detail below with reference to Figure 2. In some implementations, the controller 110 is a computing device located locally or remotely relative to the physical environment 105.

[0023] In one example, controller 110 is a local server located within physical environment 105. In another example, controller 110 is a remote server (e.g., a cloud server, central server, etc.) located outside physical environment 105. In some implementations, controller 110 is communicatively coupled to HMD 120 via one or more wired or wireless communication channels 144 (e.g., Bluetooth, IEEE 802.11x, IEEE 802.16x, IEEE 802.3x, etc.).

[0024] In some implementations, controller 110 and HMD 120 are configured to present the CGR experience to the user together.

[0025] In some implementations, the HMD 120 is configured to present a CGR experience to the user. In some implementations, the HMD 120 includes a suitable combination of software, firmware, or hardware. The HMD 120 is described in more detail below with reference to Figure 3. In some specific implementations, the functionality of controller 110 is provided by or combined with HMD 120, for example, when the HMD 120 is used as a stand-alone unit.

[0026] According to some specific implementations, when a user is present within physical environment 105, HMD 120 presents a CGR experience to the user. A CGR environment refers to a fully or partially simulated environment that people sense and / or interact with via electronic systems. In CGR, a subset of a person's physical motion, or a representation thereof, is tracked, and in response, one or more characteristics of one or more virtual objects simulated in the CGR environment are adjusted in a manner consistent with at least one physical law. For example, a CGR system may detect a person's head rotation and, in response, adjust the graphical content and sound field presented to the person in a manner similar to how such views and sounds change in a physical environment. In some cases (e.g., for accessibility reasons), the adjustment of characteristics of virtual objects in the CGR environment may be responsive to a representation of physical motion (e.g., a voice command).

[0027] Humans can use any of their senses to sense and / or interact with CGR objects, including sight, hearing, touch, taste, and smell. For example, a person can sense and / or interact with audio objects that create a 3D or spatial audio environment that provides the perception of a point audio source in 3D space. As another example, audio objects can enable audio transparency, which selectively introduces ambient sound from the physical environment, with or without computer-generated audio. In some CGR environments, a person can sense and / or interact only with audio objects.

[0028] Examples of CGR include virtual reality and mixed reality. A virtual reality (VR) environment is a simulated environment designed to be entirely based on computer-generated sensory input for one or more senses. A VR environment includes virtual objects that a person can sense and / or interact with. For example, computer-generated images of trees, buildings, and avatars representing people are examples of virtual objects. A person can sense and / or interact with virtual objects in a VR environment through a simulation of the person's presence within the computer-generated environment, and / or through a simulation of a subset of the person's physical movements within the computer-generated environment.

[0029] In contrast to VR environments, which are designed to be entirely based on computer-generated sensory input, mixed reality (MR) environments are simulated environments designed to incorporate sensory input from the physical environment, or representations thereof, in addition to computer-generated sensory input (e.g., virtual objects). On the virtuality continuum, a mixed reality environment lies anywhere between a purely physical environment at one end and a virtual reality environment at the other end, but does not include either end.

[0030] In some MR environments, computer-generated sensory input can respond to changes in sensory input from the physical environment. Additionally, some electronic systems used to present MR environments can track position and / or orientation relative to the physical environment, enabling virtual objects to interact with real objects (i.e., physical objects from the physical environment or representations of them). For example, the system can account for motion so that a virtual tree appears stationary relative to the physical ground.

[0031] Examples of mixed reality include augmented reality and augmented virtuality. An augmented reality (AR) environment is a simulated environment in which one or more virtual objects are superimposed on the physical environment or a representation thereof. For example, an electronic system for presenting an AR environment may have a transparent or semi-transparent display through which a person can directly view the physical environment. The system can be configured to present virtual objects on the transparent or semi-transparent display, allowing a person to perceive the virtual objects superimposed on the physical environment using the system. Alternatively, the system may have an opaque display and one or more imaging sensors that capture images or videos of the physical environment, which are representations of the physical environment. The system combines the images or videos with virtual objects and presents the combination on the opaque display. A person uses the system to indirectly view the physical environment via images or videos of the physical environment and perceive the virtual objects superimposed on the physical environment. As used herein, video of the physical environment displayed on an opaque display is referred to as “pass-through video,” meaning that the system uses one or more image sensors to capture images of the physical environment and uses those images when presenting the AR environment on the opaque display. Alternatively, the system may have a projection system that projects virtual objects onto a physical environment, for example, as a hologram or on a physical surface, so that a person can use the system to perceive the virtual objects superimposed on the physical environment.

[0032] Augmented reality environments also refer to simulated environments where the representation of the physical environment is transformed by computer-generated sensory information. For example, in providing pass-through video, a system can transform images from one or more sensors to apply a selected perspective (e.g., viewpoint) different from the perspective captured by the imaging sensors. Alternatively, the representation of the physical environment can be transformed by graphically modifying (e.g., magnifying) portions of it, such that the modified portion is a representative but not realistic version of the original captured image. Furthermore, the representation of the physical environment can be transformed by graphically removing or blurring portions of it.

[0033] Augmented virtuality (AV) environments are simulated environments in which a virtual or computer-generated environment is combined with one or more sensory inputs from a physical environment. Sensory inputs can be representations of one or more features of the physical environment. For example, an AV park could have virtual trees and virtual buildings, but a person's face could be realistically reproduced from an image taken of a physical person. Similarly, virtual objects could adopt the shape or color of a physical object imaged by one or more imaging sensors. Furthermore, virtual objects could adopt shadows that correspond to the sun's position within the physical environment.

[0034] Many different types of electronic systems enable people to sense and / or interact with a variety of CGR environments. Examples include head-mounted systems, projection-based systems, head-up displays (HUDs), vehicle windshields with integrated display capabilities, windows with integrated display capabilities, displays shaped as lenses designed to be placed on a person's eyes (e.g., similar to contact lenses), headphones / earpieces, speaker arrays, input systems (e.g., wearable or handheld controllers with or without haptic feedback), smartphones, tablets, and desktop / laptop computers. Head-mounted systems may have one or more speakers and an integrated opaque display. Alternatively, head-mounted systems may be configured to receive an external opaque display (e.g., a smartphone). Head-mounted systems may incorporate one or more imaging sensors for capturing images or video of the physical environment, and / or one or more microphones for capturing audio of the physical environment. Head-mounted systems may have transparent or semi-transparent displays instead of opaque displays. Transparent or semi-transparent displays may have a medium through which light representing the image is directed to the person's eyes. The display can utilize digital light projection, OLED, LED, uLED, liquid crystal on silicon, laser scanning light source, or any combination of these technologies. The medium can be an optical waveguide, holographic medium, optical combiner, optical reflector, or any combination thereof. In one embodiment, a transparent or semi-transparent display can be configured to selectively become opaque. Projection-based systems can employ retinal projection technology, which projects graphic images onto the human retina. Projection systems can also be configured to project virtual objects onto a physical environment, for example, as holograms or on a physical surface.

[0035] Figure 2 is a block diagram of an example controller 110 according to some specific implementations. Although some specific features are shown, those skilled in the art will recognize from this disclosure that various other features are not shown for the sake of brevity and in order not to obscure further relevant aspects of the specific implementations disclosed herein. Therefore, as a non-limiting example, in some specific implementations, controller 110 includes one or more processing units 202 (e.g., microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), graphics processing units (GPUs), central processing units (CPUs), processing cores, etc.), one or more input / output (I / O) devices 206, one or more communication interfaces 208 (e.g., Universal Serial Bus (USB), FireWire, Thunderbolt, IEEE 802.3x, IEEE 802.11x, IEEE 802.16x, Global System for Mobile Communications (GSM), Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Global Positioning System (GPS), Infrared (IR), Bluetooth, ZigBee, or similar type interfaces), one or more programming (e.g., I / O) interfaces 210, memory 220, and one or more communication buses 204 for interconnecting these components and various other components.

[0036] In some embodiments, the one or more communication buses 204 include circuitry that interconnects system components and controls communication between them. In some embodiments, one or more I / O devices 206 include at least one of a keyboard, mouse, touchpad, joystick, one or more microphones, one or more speakers, one or more image capture devices or other sensors, one or more displays, etc.

[0037] Memory 220 includes high-speed random access memory, such as dynamic random access memory (DRAM), static random access memory (SRAM), double data rate random access memory (DDR RAM), or other random access solid-state memory devices. In some embodiments, memory 220 includes non-volatile memory, such as one or more disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. Memory 220 optionally includes one or more storage devices located remotely from the one or more processing units 202. Memory 220 includes a non-transitory computer-readable storage medium. In some embodiments, memory 220 or the non-transitory computer-readable storage medium of memory 220 stores programs, modules, and data structures or subsets thereof, including optional operating system 230 and computer-generated reality (CGR) experience module 240.

[0038] The operating system 230 includes processes for handling various basic system services and for performing hardware-related tasks.

[0039] In some implementations, CGR module 240 is configured to create, edit, or experience CGR experiences. In some implementations, CGR module 240 includes a 3D content creation unit 242 and a collaborative SLAM unit 244. 3D content creation unit 242 is configured to create and edit 3D content that will be used as part of a CGR experience for one or more users (e.g., a single CGR experience for one or more users, or multiple CGR experiences for a corresponding group of one or more users). Content creation CGR experiences can be provided by CGR module 240 to facilitate the creation of such content. For example, users can view or otherwise experience a CGR-based user interface that allows users to select, place, move, and otherwise configure virtual objects in the 3D content being created or edited, for example, based on input provided via gestures, voice commands, input device input, etc. Collaborative SLAM unit 244 is configured to facilitate the sharing of virtual objects among users in a multi-user SLAM during such 3D content creation or editing experiences using one or more merging techniques based on shared relative information from another user in a multi-user SLAM.

[0040] Although these modules and units are shown residing on a single device (e.g., controller 110), it should be understood that in other specific implementations, any combination of these modules and units may reside in a separate computing device. Furthermore, Figure 2 is intended more as a functional description of the various features present in a specific implementation than as a structural schematic of the specific implementations described herein. As those skilled in the art will recognize, items shown individually can be combined, and some items can be separated. For example, some functional modules shown individually in Figure 2 can be implemented in a single module, and the various functions of a single functional block can be implemented in various specific implementations through one or more functional blocks. The actual number of modules and the division of specific functions, as well as how features are allocated among them, will vary depending on the specific implementation, and in some specific implementations depend in part on the specific combination of hardware, software, or firmware chosen for that particular implementation.

[0041] Figure 3 is a block diagram of an example of a head-mounted device (HMD) 120 according to some specific implementations. Although some specific features are shown, those skilled in the art will recognize from this disclosure that various other features are not shown for the sake of brevity and in order not to obscure further relevant aspects of the specific implementations disclosed herein. Therefore, as a non-limiting example, in some specific implementations, the HMD 120 includes one or more processing units 302 (e.g., microprocessors, ASICs, FPGAs, GPUs, CPUs, processing cores, etc.), one or more input / output (I / O) devices and sensors 306, one or more communication interfaces 308 (e.g., USB, FireWire, Thunderbolt, IEEE 802.3x, IEEE 802.11x, IEEE 802.16x, GSM, CDMA, TDMA, GPS, IR, Bluetooth, ZigBee, SPI, I2C, or similar interfaces), one or more programming (e.g., I / O) interfaces 310, one or more displays 312, one or more internal or external image sensors 314, a memory 320, and one or more communication buses 304 for interconnecting these components and various other components.

[0042] In some implementations, one or more communication buses 304 include circuitry that interconnects system components and controls communication between them. In some implementations, one or more I / O devices and sensors 306 include inertial measurement units (IMUs), accelerometers, magnetometers, gyroscopes, thermometers, one or more physiological sensors (e.g., blood pressure monitors, heart rate monitors, blood oxygen sensors, blood glucose sensors, etc.), one or more microphones, one or more speakers, haptic engines, or one or more depth sensors (e.g., structured light, time-of-flight, etc.).

[0043] In some embodiments, one or more displays 312 are configured to present a CGR experience to a user. In some embodiments, one or more displays 312 correspond to holographic, digital light processing (DLP), liquid crystal display (LCD), liquid crystal on silicon (LCoS), organic light-emitting field-effect transistor (OLET), organic light-emitting diode (OLED), surface-conduction electron-emitter display (SED), field emission display (FED), quantum dot light-emitting diode (QD-LED), microelectromechanical system (MEMS), or similar display types. In some embodiments, one or more displays 312 correspond to diffractive, reflective, polarized, holographic, or similar waveguide displays. For example, HMD 120 includes a single display. Alternatively, HMD 120 may include a display for each of the user's eyes.

[0044] Memory 320 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM, or other random access solid-state memory devices. In some embodiments, memory 320 includes non-volatile memory, such as one or more disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. Memory 320 optionally includes one or more storage devices located remotely from the one or more processing units 302. Memory 320 includes a non-transitory computer-readable storage medium. In some embodiments, memory 320 or the non-transitory computer-readable storage medium of memory 320 stores programs, modules, and data structures, or subsets thereof, including optional operating system 330 and CGR module 340.

[0045] Operating system 330 includes processes for handling various basic system services and for performing hardware-related tasks.

[0046] In some implementations, CGR module 340 is configured to create, edit, or experience CGR experiences. In some implementations, CGR module 340 includes a 3D content creation unit 342 and a collaborative SLAM unit 344. 3D content creation unit 342 is configured to create and edit 3D content that will be used as part of a CGR experience for one or more users (e.g., a single CGR experience for one or more users, or multiple CGR experiences for a corresponding group of one or more users). Content creation CGR experiences can be provided by CGR module 340 to facilitate the creation of such content. For example, users can view or otherwise experience a CGR-based user interface that allows users to select, place, move, and otherwise configure virtual objects in the 3D content being created or edited, for example, based on input provided via gestures, voice commands, input device input, etc. Collaborative SLAM unit 344 is configured to facilitate the sharing of virtual objects among users in a multi-user SLAM during a 3D content creation or editing experience using one or more merging techniques based on shared relative information from another user in a multi-user SLAM. Although these modules and units are shown as residing on a single device (e.g., HMD 120), it should be understood that in other specific implementations, any combination of these modules and units may reside in a separate computing device.

[0047] Moreover, Figure 3 is intended more as a functional description of the various features present in a particular implementation than as a structural schematic of the implementations described herein. As those skilled in the art will recognize, items shown separately can be combined, and some items can be separated. For example, some functional modules shown separately in Figure 3 can be implemented in a single module, and the various functions of a single functional block can be implemented by one or more functional blocks in various implementations. The actual number of modules, the division of specific functions, and how features are allocated among them will vary from one implementation to another and, in some implementations, depend in part on the particular combination of hardware, software, or firmware chosen for that implementation.

[0048] Figure 4A shows electronic devices 400A and 400B. Electronic device 400A or electronic device 400B may include some or all of the features of one or both of controller 110 and HMD 120.

[0049] In Figure 4A, electronic devices 400A and 400B provide a multi-user CGR experience. Electronic devices 400A and 400B display images of the physical environment captured by the image sensors of the respective devices. In addition to displaying images of physical objects such as cubes 402A and 402B, electronic devices 400A and 400B also display virtual objects so that they appear to exist in the physical environment, thereby augmenting the user's view of the physical environment. However, in order to display or share virtual objects (or to augment the physical environment in some other way, e.g., by changing the apparent color of physical objects), it is advantageous for electronic devices 400A and 400B to consistently determine the mapping of the physical environment or their own image sensor poses (e.g., position and orientation) relative to it.

[0050] In accordance with some implementations, techniques for determining a more consistent mapping of the physical environment or a more consistent estimate of the pose of the electronic devices are described below with reference to Figures 4B-4D.

[0051] In Figure 4B, electronic device 400A initiates a process for mapping the physical environment and localizing device 400A relative to it using, for example, SLAM techniques. Electronic device 400A captures images of cubes 402A and 402B via an image sensor located on electronic device 400A. Electronic device 400A displays these captured images via a CGR experience 405A on display 401A. In some embodiments, to determine its pose relative to the physical environment, electronic device 400A combines the captured images with data acquired via additional sensors (e.g., motion sensors, depth sensors, orientation sensors, etc.) and corresponding sensor parameters. In some embodiments, electronic device 400A detects salient features (e.g., lines, segments, planes, points, or other 3D geometric elements and shapes, such as edges or corners of cubes 402A and 402B in the field of view of the image sensor) in the captured images and estimates the positions of those features in 3D space while also estimating its own pose, by iteratively reducing or minimizing an error function of the 3D position and pose estimates using the captured images and the data acquired via the image sensor and additional sensors. Electronic device 400A can create and store keyframes that include images, the locations of features within those images, or the image sensor pose associated with those images. As shown in Figure 4C, during the localization and mapping process, electronic device 400A moves to another location in the physical environment. While cubes 402A and 402B are still within the field of view of the image sensor, electronic device 400A captures images of cubes 402A and 402B from another perspective. Electronic device 400A displays these captured images via CGR experience 405A on display 401A. Electronic device 400A detects at least some of the features detected in Figure 4B. By comparing the positions of features in the captured images or incorporating data from additional sensors, electronic device 400A updates its estimates of the 3D positions of the features (e.g., the position of a point in 3D space) and its own estimated pose relative to the physical environment. Electronic device 400A may create and store keyframes, each comprising an image, the positions of the features shown in the image, or the image sensor pose associated with the image. The features of such keyframes, image sensor pose information, and information from other sources (e.g., device motion detection data) can be used to determine a mapping that provides the relative positions of the keyframes to each other in a 3D coordinate space. In some specific implementations, electronic device 400A performs SLAM by simultaneously determining its current pose (e.g., localization) and determining the relative keyframe positions (e.g., mapping).
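As an illustration of a mapping that provides the relative positions of keyframes, the following minimal Python sketch stores a pose with each keyframe and derives the pose of one keyframe as seen from another. It is a simplification under stated assumptions: poses are planar (x, y, heading) rather than full 3D poses, and the `Keyframe` record, its field names, and the helper functions are hypothetical and do not appear in the disclosure.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Keyframe:
    """Hypothetical keyframe record (field names are illustrative only)."""
    name: str
    pose: tuple                                   # (x, y, heading) in the device's map frame
    features: dict = field(default_factory=dict)  # feature id -> (u, v) image location


def compose(a, b):
    """Express pose b, given in the frame of pose a, in a's parent frame."""
    ax, ay, at = a
    bx, by, bt = b
    c, s = math.cos(at), math.sin(at)
    return (ax + c * bx - s * by, ay + s * bx + c * by, at + bt)


def inverse(p):
    """Pose that undoes p, so that compose(p, inverse(p)) is the identity pose."""
    x, y, t = p
    c, s = math.cos(t), math.sin(t)
    return (-(c * x + s * y), s * x - c * y, -t)


def relative_pose(kf_from, kf_to):
    """Pose of kf_to as seen from kf_from; independent of where the map origin is."""
    return compose(inverse(kf_from.pose), kf_to.pose)


kf1 = Keyframe("KF_1", (1.0, 2.0, math.pi / 2))
kf2 = Keyframe("KF_2", (1.0, 3.0, math.pi))
rel = relative_pose(kf1, kf2)  # KF_2 lies one unit ahead of KF_1 and is turned a quarter turn
```

Because only relative poses are compared, the same relationship holds regardless of the coordinate origin each device happens to choose for its own map.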

[0052] In some implementations, upon successful localization and mapping, electronic device 400A is able to display virtual content at the appropriate location within the CGR experience. In one example, electronic device 400A uses the determined pose of its image sensor relative to the physical environment to determine where to display virtual object 404A. In some implementations, after successful localization and mapping, electronic device 400A uses the pose estimate to position virtual object 404A within CGR experience 405A on display 401A. Alternatively, electronic device 400A anchors virtual object 404A to keyframe feature locations and positions the virtual object accordingly within the generated view of the CGR experience.

[0053] In some embodiments, after successful localization and mapping, electronic device 400A receives a captured image and an estimated pose of electronic device 400B from electronic device 400B. In some embodiments, after successful localization and mapping, electronic device 400A receives data from an additional sensor and corresponding sensor parameters of electronic device 400B. Using the captured image and the corresponding pose of electronic device 400B (and optionally the additional data), electronic device 400A performs reconstruction by applying a similar mapping function to estimate positional data of salient features in the captured image (e.g., a set of 3D points, lines, segments, planes, and/or other 3D geometric elements and shapes). For example, the positional data includes the Cartesian coordinates of the corners of cubes 402A and 402B captured in the image. In some embodiments, electronic device 400A receives a keyframe associated with the image captured at electronic device 400B. The received keyframe may include the image, the locations of features in the image, or the image sensor pose of electronic device 400B. In some implementations, electronic device 400A receives information from electronic device 400B using network protocols, layers, or services. In some implementations, electronic device 400A receives information from electronic device 400B only after electronic device 400B has successfully performed localization and mapping.

[0054] In some implementations, electronic device 400A then attempts local registration by comparing a physical scene reconstructed using information received from electronic device 400B with CGR experience 405A. In some implementations, electronic device 400A then performs localization between the physical scene reconstructed using information received from electronic device 400B and CGR experience 405A. In some implementations, the localization at electronic device 400A is a relative transformation between several salient features in the physical scene reconstructed using information received from electronic device 400B and the corresponding features in CGR experience 405A. Once electronic device 400A matches the physical scene reconstructed using information received from electronic device 400B with CGR experience 405A, electronic device 400A uses the relative transformation to add (e.g., merge) the information received from electronic device 400B to CGR experience 405A on display 401A. In some implementations, after adding the information received from electronic device 400B to CGR experience 405A, electronic device 400A updates its own estimated pose with respect to the CGR experience. In some implementations, electronic device 400A determines the positions of the received keyframes relative to keyframes already included in its own mapping by matching common features found in both. In some implementations, electronic device 400A modifies its own mapping / pose map to include the received keyframes.
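The relative transformation between matched salient features can be illustrated with a short Python sketch. The disclosure does not specify an estimation method, so the closed-form least-squares rigid alignment below is a swapped-in, commonly used technique; it is restricted to 2D points for brevity, and the function names are hypothetical.

```python
import math


def estimate_rigid_transform(src, dst):
    """Least-squares rotation and translation mapping src points onto dst points (2D)."""
    n = len(src)
    scx = sum(p[0] for p in src) / n
    scy = sum(p[1] for p in src) / n
    dcx = sum(q[0] for q in dst) / n
    dcy = sum(q[1] for q in dst) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - scx, py - scy      # source point relative to its centroid
        bx, by = qx - dcx, qy - dcy      # destination point relative to its centroid
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)       # rotation that best aligns the centered sets
    c, s = math.cos(theta), math.sin(theta)
    tx = dcx - (c * scx - s * scy)       # translation that aligns the centroids
    ty = dcy - (s * scx + c * scy)
    return theta, (tx, ty)


def apply_transform(theta, t, p):
    """Rotate point p by theta and then translate it by t."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + t[0], s * p[0] + c * p[1] + t[1])


# Feature locations reconstructed from the other device's data (its frame) ...
remote_features = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
# ... and the same features as located in the local map (local frame).
local_features = [apply_transform(math.pi / 6, (2.0, -1.0), p) for p in remote_features]

theta, t = estimate_rigid_transform(remote_features, local_features)
```

Once `theta` and `t` are known, later points received from the other device can be brought into the local frame with `apply_transform` without repeating the matching step.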

[0055] In some implementations, after electronic device 400A locally registers the reconstructed physical scene to CGR experience 405A using information received from electronic device 400B, subsequent information received from electronic device 400B can be directly added to CGR experience 405A at electronic device 400A. In some implementations, subsequent information received from electronic device 400B can be directly added to CGR experience 405A at electronic device 400A using locally determined relative transformations. In some implementations, subsequent information received from electronic device 400B can be directly added to CGR experience 405A at electronic device 400A using previously added information received from electronic device 400B and already in CGR experience 405A.

[0056] In some embodiments, the information received at electronic device 400A from electronic device 400B includes representations of one or more features in a three-dimensional space (e.g., a physical environment) acquired by a second electronic device or using information acquired by the second electronic device. In some examples, the representation of the one or more features is a keyframe or the Cartesian coordinates of one or more features in the physical environment (e.g., the corners of cubes 402A and 402B in the field of view of an image sensor). In some examples, features include points, lines, segments, planes, and/or other 3D geometric elements and shapes. In some examples, the representation of one or more features corresponds to a physical object in the physical environment (e.g., 402A, 402B) (e.g., the representation of one or more features includes the spatial location of certain features of the physical object). In some embodiments, the information received at electronic device 400A from electronic device 400B includes locally determined relative transformations between images or additional sensor parameters, map registration data, virtual object information, or CGR experience 405B at electronic device 400B and information received at electronic device 400B from other electronic devices in multi-user SLAM.

[0057] Returning to Figure 4B, electronic device 400B may also initiate a process for mapping the physical environment and localizing itself relative to it using, for example, Simultaneous Localization and Mapping (SLAM) techniques. Electronic device 400B captures images of cubes 402A and 402B via an image sensor located on the back of the device. Electronic device 400B displays these captured images via a display 401B. In some embodiments, to determine its pose relative to the physical environment, electronic device 400B combines the captured images with data acquired via additional sensors (e.g., motion sensors, depth sensors, orientation sensors, etc.) and corresponding sensor parameters. In some embodiments, electronic device 400B detects salient features (e.g., lines, segments, planes, points, and/or other 3D geometric elements and shapes, such as edges or corners of cubes 402A and 402B in the field of view of the image sensor) in the captured images and estimates the positions of those features in 3D space while also estimating its own pose, by iteratively reducing or minimizing an error function of the 3D position and pose estimates using the captured images and the data acquired via the image sensor and additional sensors. When electronic device 400B is moved, it can update its 3D position and pose estimates using additional captured images and data acquired via the additional sensors. In some specific implementations, upon successful localization and mapping, electronic device 400B can provide a CGR experience on a display because it can use a determined pose relative to the physical environment. Therefore, after successful localization and mapping, electronic device 400B displays CGR experience 405B on display 401B using the pose estimate.

[0058] As described above with respect to electronic device 400A, after successfully performing localization and mapping, electronic device 400B receives information from electronic device 400A, such as captured images and the estimated pose of electronic device 400A when it captured the images, via network protocols, layers, or services. As described above with respect to electronic device 400A, when electronic device 400B locally registers the physical scene reconstructed using the information received from electronic device 400A to CGR experience 405B, electronic device 400B uses the relative transformation to add (e.g., merge) the information received from electronic device 400A into CGR experience 405B on display 401B.

[0059] As shown in Figure 4D, after the physical scene reconstructed by electronic device 400B using information received from electronic device 400A is locally registered to CGR experience 405B, virtual object 404A is consistently displayed. As described above with respect to electronic device 400A, after electronic device 400B locally registers the reconstructed physical scene to CGR experience 405B using information received from electronic device 400A, subsequent information received from electronic device 400A can be directly added to CGR experience 405B at electronic device 400B. In some specific implementations, electronic device 400B and electronic device 400A exchange the same type of information.

[0060] In accordance with some implementations, techniques for sharing virtual objects among electronic devices in multi-user SLAM will now be described. Figures 5A-5U are schematic diagrams illustrating an exemplary scenario in which multiple users, each performing SLAM in their own physical environment, share a virtual object.

[0061] In various specific implementations, two users each begin a separate CGR experience (e.g., localization and mapping) within a shared physical environment. As shown in Figure 5A, electronic device 500A starts, and upon successful localization and mapping, electronic device 500A (e.g., a first user) has two keyframes KF_A1 and KF_A2. Electronic device 500A displays these captured images via CGR experience 505A. Similarly, electronic device 500B starts, and upon successful localization and mapping, electronic device 500B (e.g., a second user) has two keyframes KF_B1 and KF_B2. Electronic device 500B displays these captured images via CGR experience 505B. As shown in Figure 5A, keyframes KF_A1, KF_A2, KF_B1, and KF_B2 are highlighted.

[0062] In some implementations, keyframes are a subset of all image sensor frames generated by an image sensor (e.g., an RGB camera or an RGB-D camera), for example within a CGR segment. In some implementations, each keyframe, like all frames of camera data, includes aligned image (e.g., RGB color) information and additional sensor information (e.g., depth information) associated with the camera pose (e.g., position and orientation in space) at a known time. In various implementations, keyframes are selected so that together they adequately represent the physical environment of the CGR experience. In various implementations, keyframes may be identified (e.g., selected from multiple frames) based on camera motion. A new keyframe is created or initiated when there is sufficient movement (e.g., a 3D spatial distance above a threshold) between the current camera frame or viewpoint and a neighboring keyframe (e.g., the keyframe immediately preceding it). In alternative implementations, keyframe initiation may be based on other camera characteristics (such as time, movement speed, etc.) or on the physical environment. Each keyframe can be stored in memory and includes RGB information (e.g., a frame of pixel data), depth information (e.g., a frame of depth information), and pose (e.g., orientation and 3D position in a 3D coordinate system).
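The motion-based keyframe selection described above can be sketched as follows. This is a minimal illustration, assuming that only the 3D camera position is compared against the immediately preceding keyframe; the function name and the threshold value are hypothetical.

```python
import math


def select_keyframes(camera_positions, min_distance):
    """Return indices of frames promoted to keyframes.

    A frame becomes a keyframe when the camera has moved more than
    min_distance (3D Euclidean distance) since the previous keyframe.
    The first frame is always a keyframe.
    """
    keyframe_indices = []
    for index, position in enumerate(camera_positions):
        if not keyframe_indices:
            keyframe_indices.append(index)
            continue
        last_position = camera_positions[keyframe_indices[-1]]
        if math.dist(position, last_position) > min_distance:
            keyframe_indices.append(index)
    return keyframe_indices


# Camera positions for five consecutive frames (arbitrary units).
positions = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.6, 0.0, 0.0),
             (0.7, 0.0, 0.0), (1.3, 0.0, 0.0)]
selected = select_keyframes(positions, min_distance=0.5)
```

A time-based or speed-based criterion, as mentioned for alternative implementations, could replace the distance test without changing the surrounding loop.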

[0063] A history of the movement of an image sensor (e.g., of an electronic device) can be maintained; this is known as a pose map. In some implementations, keyframes are assigned to or located along the pose map, and the current camera position can be highlighted. Depending on the implementation, the pose map is displayed within the global point cloud of the currently viewed segment.

[0064] As shown in Figure 5B, in various implementations, to begin a shared multi-user SLAM process, electronic device 500A connects to electronic device 500B, or electronic device 500B connects to electronic device 500A. In various implementations, a network layer can be used to connect the multi-user SLAM experiences. In some implementations, the network layer can be any conventional network layer implemented by the electronic devices. In some implementations, the network layer does not have latency requirements (e.g., minimum message transmission time). In some implementations, map registration data is shared or exchanged when connecting the shared CGR experience and beginning the multi-user SLAM process. In some implementations, sharing map registration data includes sending the current state of each electronic device's local 3D map to all other electronic devices in the shared CGR experience for multi-user SLAM.

[0065] In various embodiments, for electronic device 500A, the current state of the local 3D map 510 includes all generated keyframes (e.g., the pose map of electronic device 500A) and 3D map registration data. In some embodiments, the 3D map registration data includes all hardware information of electronic device 500A. For example, the hardware information includes image sensor (e.g., camera) parameters and additional sensor (e.g., depth, motion, inertial) parameters so that each other electronic device can correctly use the data (e.g., keyframe data) from electronic device 500A. In some embodiments, keyframes and 3D map registration data are sent separately. In some embodiments, 3D map registration data is sent first, followed by each keyframe. In some embodiments, only a subset or a predetermined number of keyframes representing the current state of the electronic device are sent (e.g., to reduce data volume). In some embodiments, only 3D map registration data or hardware information is sent.
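One possible way to order the transmission of a map state, with registration data first and keyframes afterwards, is sketched below in Python. The `MapRegistration` record, its fields, the message tuples, and the choice to send the most recent keyframes when a limit is given are all illustrative assumptions; the disclosure does not define a message format or say which subset is sent.

```python
from dataclasses import dataclass, field


@dataclass
class MapRegistration:
    """Hypothetical 3D map registration record (hardware information)."""
    device_id: str
    camera_params: dict = field(default_factory=dict)
    extra_sensor_params: dict = field(default_factory=dict)


def map_state_messages(registration, keyframes, max_keyframes=None):
    """Yield the registration data first, then keyframes one by one.

    When max_keyframes is given, only that many of the most recent
    keyframes are sent (an assumed policy); 0 sends registration data only.
    """
    yield ("registration", registration)
    if max_keyframes is None:
        start = 0
    else:
        start = max(0, len(keyframes) - max_keyframes)
    for keyframe in keyframes[start:]:
        yield ("keyframe", keyframe)


registration_a = MapRegistration("500A", {"focal_length_px": 1400.0}, {"depth": "time-of-flight"})
messages = list(map_state_messages(registration_a, ["KF_A1", "KF_A2", "KF_A3"], max_keyframes=2))
```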

[0066] As shown in Figure 5B, electronic device 500B receives the current state of the 3D map 510 (e.g., MAP_A) from electronic device 500A and, in some specific implementations, stores the MAP_A data in the user queue 552 at electronic device 500B. Additionally, as shown in Figure 5B, electronic device 500A receives the current state of the 3D map 550 (e.g., MAP_B) from electronic device 500B and, in some specific implementations, stores the MAP_B data in the user queue 502 at electronic device 500A. As shown in Figure 5B, MAP_A and MAP_B are highlighted.

[0067] In various specific implementations, the information stored in the queues at electronic devices 500B and 500A is held there until the corresponding electronic device is able to correctly process the queued information.

[0068] As shown in Figure 5C, electronic device 500B has created a local 3D external map 559 of the shared physical environment using the MAP_A information (e.g., locally reconstructing CGR experience 505A). Therefore, the MAP_A information has been removed from the user queue 552. Electronic device 500B has registered a first user (e.g., electronic device 500A) by creating a first user external map 559. Electronic device 500B has created a local copy of the 3D map of electronic device 500A (e.g., as of when electronic device 500A connected). In some specific implementations, as shown in Figure 5C, the local external map 559 uses the two keyframes KF_A1 and KF_A2 as well as the hardware information of electronic device 500A.

[0069] In some specific implementations, the keyframe queue 554, RL queue 556, or anchor queue 558 cannot be processed until the electronic device 500B creates an external 3D map for the corresponding electronic device (e.g., creates an external 3D map 559 for the electronic device 500A).
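The gating described above, in which queued keyframes wait until an external map exists for the sending device, might look like the following Python sketch. The `PeerInbox` class and its method names are hypothetical, only the user queue and keyframe queue are modeled, and the external map is reduced to a plain list of keyframe names.

```python
from collections import deque


class PeerInbox:
    """Hypothetical per-peer queues held by a receiving device."""

    def __init__(self):
        self.user_queue = deque()      # map states (e.g., MAP_A) awaiting registration
        self.keyframe_queue = deque()  # keyframes received after the map state
        self.external_map = None       # created once a map state has been processed

    def receive_map_state(self, keyframes):
        self.user_queue.append(list(keyframes))

    def receive_keyframe(self, keyframe):
        self.keyframe_queue.append(keyframe)

    def process(self):
        """Create the external map if possible, then drain the keyframe queue."""
        if self.external_map is None and self.user_queue:
            self.external_map = self.user_queue.popleft()
        if self.external_map is None:
            return  # keyframes stay queued until the external map exists
        while self.keyframe_queue:
            self.external_map.append(self.keyframe_queue.popleft())


inbox = PeerInbox()
inbox.receive_keyframe("KF_A3")
inbox.process()                        # no external map yet, so KF_A3 remains queued
still_queued = list(inbox.keyframe_queue)
inbox.receive_map_state(["KF_A1", "KF_A2"])
inbox.process()                        # external map created, then KF_A3 is added to it
```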

[0070] As shown in Figure 5C, at the same time, electronic device 500A adds a third keyframe KF_A3 to its 3D map. Electronic device 500B receives the third keyframe KF_A3 and stores it in its local keyframe queue 554. Electronic device 500B holds the third keyframe KF_A3 in the keyframe queue 554 until it has the time or all the relevant information needed to process the queued information.

[0071] As shown in Figure 5D, electronic device 500A has created a local 3D external map 509 of the shared physical environment using the MAP_B information (e.g., locally reconstructing CGR experience 505B). Therefore, the MAP_B information has been removed from the user queue 502. Electronic device 500A has registered a second user (e.g., electronic device 500B) by creating a second user external map 509. Electronic device 500A has created a local copy of the 3D map of electronic device 500B (e.g., as of when electronic device 500B connected). As shown in Figure 5D, the local external map 509 uses the two keyframes KF_B1 and KF_B2 as well as the hardware information of electronic device 500B.

[0072] Similarly, as shown in Figure 5D, electronic device 500B has used the MAP_A information to create a local 3D external map 559 of the shared physical environment. At this point, as shown in Figure 5E, electronic device 500B can process the third keyframe KF_A3 because the external map 559 has been created.

[0073] In various specific implementations, as shown in Figure 5E, electronic device 500A has a corresponding 3D map 509 for electronic device 500B in CGR experience 505A. Similarly, electronic device 500B has a corresponding 3D map 559 for electronic device 500A. However, in CGR experience 505A, electronic device 500A cannot display or see any information from electronic device 500B. Furthermore, in CGR experience 505B, electronic device 500B cannot display or see any information from electronic device 500A. Electronic device 500A has not yet incorporated any information from electronic device 500B into its local 3D map 510 of CGR experience 505A.

[0074] As shown in Figure 5F, electronic device 500B adds the first virtual object (VO) anchor point OA_B2 to its own 3D map 550. In some specific implementations, each anchor point is associated with or attached to a keyframe. In Figure 5F, the second user creates a first VO at electronic device 500B, and the first VO is attached to keyframe KF_B2 (e.g., placed in CGR experience 505B). At the same time, electronic device 500B transmits the first anchor point OA_B2 to the other electronic devices sharing the CGR experience. Therefore, electronic device 500A receives the first anchor point OA_B2 and stores it in its local anchor queue 508. Electronic device 500A holds the first anchor point OA_B2 in the anchor queue 508 until it has the time or the relevant information needed to process the queued information.

[0075] As shown in Figure 5F, at the same time, electronic device 500A adds a fourth keyframe KF_A4 to its 3D map 510. Electronic device 500B receives the fourth keyframe KF_A4 and stores it in its local keyframe queue 554. Electronic device 500B keeps the fourth keyframe KF_A4 in the keyframe queue 554 until it can correctly process the queued information.

[0076] In some implementations, entries in each queue are processed at different preset frequencies. In some implementations, user queues, KF queues, RL queues, and anchor queues are processed at different variable rates at the corresponding electronic devices.
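Processing each queue at its own preset frequency can be illustrated with a simple tick-based schedule. The sketch below is only one possible arrangement; the queue names follow the description, while the period values and the function name are invented for illustration.

```python
def due_queues(tick, periods):
    """Return the names of the queues whose period divides the current tick."""
    return [name for name, period in periods.items() if tick % period == 0]


# Hypothetical periods, in scheduler ticks, for the four queue types.
queue_periods = {"user": 1, "keyframe": 2, "rl": 4, "anchor": 8}

schedule = {tick: due_queues(tick, queue_periods) for tick in range(1, 9)}
```

With these example periods, the user queue is serviced on every tick and the anchor queue only on every eighth tick; a variable-rate scheme would adjust the periods at run time instead of fixing them in advance.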

[0077] As shown in Figure 5G, electronic device 500A uses anchor point OA_B2 to add the first virtual object to its external 3D map 509. In some specific implementations, the first anchor point OA_B2 is again attached to keyframe KF_B2 in the external 3D map 509 (e.g., the same keyframe used to create the first anchor point OA_B2 at electronic device 500B).
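Attaching an anchor point to a keyframe, rather than to absolute map coordinates, can be sketched as follows. The example is a simplification under stated assumptions: poses are planar (x, y, heading), the anchor is stored as an offset in the keyframe's own frame, and the dictionary layout, pose values, and function name are hypothetical.

```python
import math


def resolve_anchor(anchor, keyframe_poses):
    """Return the anchor position in the frame of the given map.

    The anchor stores only the keyframe it is attached to and an offset in
    that keyframe's frame, so each device resolves it using its own estimate
    of the keyframe pose.
    """
    x, y, heading = keyframe_poses[anchor["keyframe"]]
    ox, oy = anchor["offset"]
    c, s = math.cos(heading), math.sin(heading)
    return (x + c * ox - s * oy, y + s * ox + c * oy)


# Anchor OA_B2 as created at device 500B: one unit ahead of keyframe KF_B2.
oa_b2 = {"keyframe": "KF_B2", "offset": (1.0, 0.0)}

map_550 = {"KF_B2": (2.0, 0.0, 0.0)}          # KF_B2 in the local frame of 500B (example values)
map_509 = {"KF_B2": (0.0, 5.0, math.pi / 2)}  # KF_B2 as held at 500A (example values)

position_at_b = resolve_anchor(oa_b2, map_550)
position_at_a = resolve_anchor(oa_b2, map_509)
```

The two resolved positions differ numerically because each map uses its own coordinate system, yet both place the virtual object at the same spot relative to keyframe KF_B2.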

[0078] Once electronic device 500B creates a local 3D external map 559 of the shared physical environment using MAP_A information, electronic device 500B attempts to match the 3D external map 559 with its local 3D map 550. In various embodiments, the matching includes various known optimization techniques. In some embodiments, the matching includes 3D-to-3D feature matching techniques between multiple common features in the 3D external map 559 and the local 3D map 550. In some embodiments, the matching includes 2D-to-3D feature matching techniques between multiple common features in the 3D external map 559 and the local 3D map 550. In some embodiments, image data (or additional sensor data, such as depth) captured by electronic devices 500A and 500B corresponds to one or more portions of the same physical object in the physical environment (e.g., including data about it). In some specific implementations, multiple matching common features in the 3D external map 559 and the local 3D map 550 generate a repositioning (e.g., 3D spatial transformation) between the estimated pose of the camera of the electronic device 500B in the 3D map 550 and the estimated pose of the camera of the electronic device 500A in the 3D map 559.

[0079] In some specific implementations, multiple matching common features are located in a single keyframe in the 3D external map 559 and in a single keyframe in the local 3D map 550, and a repositioning result is generated between the estimated pose of the camera in a keyframe of the 3D map 550 and the estimated pose of the camera in the corresponding keyframe of the 3D map 559.

[0080] As shown in Figure 5H, the repositioning result RL_A4_B1 is used to transform the estimated 3D geometric pose of the camera of electronic device 500A at keyframe A4 of 3D map 510 into the estimated 3D geometric pose of the camera of electronic device 500B at keyframe B1 of 3D external map 509. In some embodiments, the repositioning data allows electronic device 500A to incorporate the 3D external map 509 into the local 3D map 510 (e.g., a pose map). In some embodiments, electronic device 500A uses the repositioning result to incorporate a portion of the 3D external map 509 into the local 3D map 510.

[0081] As shown in Figure 5I, electronic device 500A merges the 3D external map 509 of electronic device 500B into the local 3D map 510, and shares information used by electronic device 500A to perform the merging. In some embodiments, the repositioning result RL_A4_B1 is determined by matching features of keyframes from electronic device 500B in the external 3D map 509 with features of keyframes from electronic device 500A in the 3D map 510. In some embodiments, the matching uses 2D or 3D spatial location estimates of features from keyframes from electronic device 500B and 2D or 3D spatial location estimates of features from keyframes from electronic device 500A. In some embodiments, the matching of spatial location estimates of features from keyframes from electronic device 500B with spatial location estimates of features from keyframes from electronic device 500A depends on the type of imaging sensor or camera on the respective electronic device. In some embodiments, the matching of spatial location estimates of features from electronic device 500B with spatial location estimates of features from electronic device 500A depends on the type of additional sensors (e.g., depth sensor, inertial sensor, IR sensor, motion sensor, etc.) on the respective electronic device.

[0082] In various implementations, the repositioning result RL_A4_B1 uses the correspondence of matching features from electronic device 500B in external 3D map 509 and electronic device 500A in 3D map 510 to select the estimated pose of electronic device 500B that reduces or minimizes the error between the estimated spatial locations of the matching features (e.g., an optimization process). In some implementations, the repositioning result RL_A4_B1 is a transformation between the selected pose of the camera of electronic device 500B for keyframe B1 and the estimated pose of the camera of electronic device 500A for known keyframe A4.

[0083] In some specific implementations, because 3D map 509 includes information relating each keyframe (e.g., KF_B1, KF_B2) to every other keyframe in 3D map 509, once keyframe B1 is merged into 3D map 510, all other keyframes from 3D map 509 (e.g., electronic device 500B) can be merged into 3D map 510 using the relationship information relative to the merged keyframe B1. Therefore, as shown in Figure 5I, at electronic device 500A, the repositioning result RL_A4_B1 is used to merge the information of 3D map 509 into 3D map 510.
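The way a single repositioning result carries every keyframe of the external map into the local map can be sketched in Python. This is an illustration under stated assumptions: poses are planar (x, y, heading) rather than 3D, RL_A4_B1 is interpreted as the pose of keyframe B1 expressed in the frame of keyframe A4, and the function names and numeric values are hypothetical.

```python
import math


def compose(a, b):
    """Express pose b, given in the frame of pose a, in a's parent frame."""
    ax, ay, at = a
    bx, by, bt = b
    c, s = math.cos(at), math.sin(at)
    return (ax + c * bx - s * by, ay + s * bx + c * by, at + bt)


def inverse(p):
    """Pose that undoes p."""
    x, y, t = p
    c, s = math.cos(t), math.sin(t)
    return (-(c * x + s * y), s * x - c * y, -t)


def merge_external_map(local_pose_a4, rl_a4_b1, external_poses, matched="KF_B1"):
    """Re-express all external keyframe poses in the local map frame.

    The matched keyframe is placed using the repositioning result; every
    other keyframe follows through its stored relationship to that keyframe.
    """
    local_b1 = compose(local_pose_a4, rl_a4_b1)
    external_to_local = compose(local_b1, inverse(external_poses[matched]))
    return {name: compose(external_to_local, pose)
            for name, pose in external_poses.items()}


map_509 = {"KF_B1": (0.0, 0.0, 0.0), "KF_B2": (1.0, 0.0, 0.0)}  # external map, frame of 500B
pose_a4_in_510 = (2.0, 3.0, math.pi / 2)                         # KF_A4 in the local map 510
rl_a4_b1 = (0.0, 1.0, 0.0)                                       # B1 relative to A4 (example value)

merged = merge_external_map(pose_a4_in_510, rl_a4_b1, map_509)
```

Only one matched keyframe pair is needed here because the external map already encodes how its keyframes relate to one another.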

[0084] In addition, as shown in Figure 5I, at the same time, electronic device 500B adds the fourth keyframe KF_A4 to the external 3D map 559, and electronic device 500A adds the second virtual object anchor point OA_A1 to its own 3D map 510. In Figure 5I, the first user creates a second VO at electronic device 500A, which is associated with or attached to keyframe KF_A1. At the same time, electronic device 500A transmits the second anchor point OA_A1 to the other electronic devices sharing the CGR experience. Therefore, electronic device 500B receives the second anchor point OA_A1 and stores it in its local anchor queue 558. Electronic device 500B holds the second anchor point OA_A1 in the anchor queue 558 until it has time to process the queued information.

[0085] As shown in Figure 5J, the repositioning result RL_B1_A4 is used to transform the estimated 3D geometric pose of the camera of electronic device 500B at keyframe B1 of 3D map 550 into the estimated 3D geometric pose of the camera of electronic device 500A at keyframe A4 of 3D external map 559. In some specific implementations, the repositioning data allows electronic device 500B to incorporate part or all of the 3D external map 559 into the local 3D map 550. Exemplary techniques for determining the repositioning result RL_B1_A4 are described herein with reference to Figure 5H and electronic device 500A.

[0086] As shown in Figure 5K, electronic device 500B uses the repositioning result RL_B1_A4 to merge 3D map information from keyframe A4 into 3D map 550. Because 3D map 559 (e.g., a pose map) includes information relating each keyframe (e.g., KF_A1, KF_A2, KF_A3, KF_A4) to every other keyframe in 3D map 559, once keyframe A4 is merged into 3D map 550, all other keyframes from 3D map 559 (e.g., from electronic device 500A) are merged into 3D map 550 using the relationship information relative to the merged keyframe A4. Therefore, as shown in Figure 5K, at electronic device 500B, the repositioning result RL_B1_A4 is used to merge the information of 3D map 559 into 3D map 550. In some specific implementations, each individual electronic device in a set of electronic devices sharing the CGR experience can use a different pair of keyframes to merge the corresponding external 3D map into its local 3D map.

[0087] As shown in Figure 5K, electronic device 500B adds the second virtual object anchor point OA_A1 to its 3D map 550. In Figure 5K, the second VO anchor point OA_A1 is attached to keyframe KF_A1 in 3D map 550 (e.g., the same keyframe to which the second VO anchor point OA_A1 was attached when it was created at electronic device 500A).
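
Attaching an anchor to a keyframe, rather than to absolute coordinates, is what lets each device place the virtual object in its own coordinate system. The sketch below assumes planar poses `(x, y, theta)` and an anchor that stores an `(x, y)` offset in the keyframe's frame; the `Anchor` class, `apply`, and `resolve_anchor` are hypothetical names used for illustration only.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Anchor:
    anchor_id: str      # e.g. "OA_A1"
    keyframe_id: str    # keyframe the virtual object is attached to
    offset: tuple       # (x, y) of the object in that keyframe's frame

def apply(pose, point):
    """Map a point from the keyframe's frame into the map's coordinate system."""
    c, s = math.cos(pose[2]), math.sin(pose[2])
    return (pose[0] + c * point[0] - s * point[1],
            pose[1] + s * point[0] + c * point[1])

def resolve_anchor(anchor, keyframe_poses):
    """Position of the anchored virtual object in the local coordinate system."""
    return apply(keyframe_poses[anchor.keyframe_id], anchor.offset)

# The same anchor resolved against two local maps that hold KF_A1 at
# different poses because each map uses its own coordinate system.
anchor = Anchor("OA_A1", "KF_A1", (1.0, 2.0))
map_510 = {"KF_A1": (0.0, 0.0, 0.0)}
map_550 = {"KF_A1": (5.0, 0.0, math.pi)}
position_at_500a = resolve_anchor(anchor, map_510)
position_at_500b = resolve_anchor(anchor, map_550)
```

The two resulting coordinates differ numerically, but each is the same offset from KF_A1, so both devices render the object at the same place in the physical environment.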

[0088] As shown in Figure 5K, electronic devices 500A and 500B have corresponding or "similar" 3D maps for their respective CGR experiences 505A and 505B. In some embodiments, when each has the same amount of information (e.g., keyframes, pose maps, sensor parameters, virtual objects, etc.), 3D map 510 corresponds to 3D map 550, and each electronic device uses this information individually (e.g., in its SLAM optimizer) to estimate the keyframe poses in its individual 3D map. In some embodiments, when each has the same information, 3D map 510 corresponds to 3D map 550, but each electronic device uses this information individually (e.g., in its SLAM optimizer) to estimate the keyframe poses in its individual 3D map in its local 3D coordinate system. In some embodiments, when each has the same relative information, 3D map 510 corresponds to 3D map 550, but each electronic device merges this information separately and individually into its local 3D map. In some specific implementations, when each has the same amount of relative information, 3D map 510 corresponds to 3D map 550, but control of the CGR experience is distributed to each individual electronic device in the CGR experience at its local 3D map.
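
One way to read "the same relative information" is that the pose of any keyframe relative to any other is the same in both maps, even though absolute coordinates differ. The check below is an illustrative interpretation, not a test defined by this disclosure; it assumes planar poses `(x, y, theta)`, and `maps_correspond` and its helpers are hypothetical names.

```python
import math

def compose(a, b):
    c, s = math.cos(a[2]), math.sin(a[2])
    return (a[0] + c * b[0] - s * b[1], a[1] + s * b[0] + c * b[1], a[2] + b[2])

def inverse(t):
    c, s = math.cos(t[2]), math.sin(t[2])
    return (-(c * t[0] + s * t[1]), s * t[0] - c * t[1], -t[2])

def relative_pose(poses, ref_id, kf_id):
    """Pose of keyframe kf_id expressed in the frame of keyframe ref_id."""
    return compose(inverse(poses[ref_id]), poses[kf_id])

def maps_correspond(map_a, map_b, tol=1e-6):
    """True if both maps hold the same keyframes with the same relative poses."""
    if set(map_a) != set(map_b):
        return False
    ids = sorted(map_a)
    if not ids:
        return True
    ref = ids[0]
    for kf in ids[1:]:
        ra = relative_pose(map_a, ref, kf)
        rb = relative_pose(map_b, ref, kf)
        # Compare angles modulo a full turn.
        dtheta = (ra[2] - rb[2] + math.pi) % (2 * math.pi) - math.pi
        if abs(ra[0] - rb[0]) > tol or abs(ra[1] - rb[1]) > tol or abs(dtheta) > tol:
            return False
    return True
```

Comparing every keyframe against a single reference is enough here because relative poses between any other pair follow by composition. In practice each device's optimizer would produce slightly different estimates, so a tolerance appropriate to the sensor noise would be needed.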

[0089] As shown in Figure 5L, electronic device 500A adds a fifth keyframe KF_A5 to its 3D map 510. Electronic device 500B receives the fifth keyframe KF_A5 and can add the fifth keyframe KF_A5 directly from keyframe queue 554 to 3D map 550 (see Figure 5M). In some implementations, electronic device 500B may add the fifth keyframe KF_A5 directly from keyframe queue 554 to 3D map 550 because its relationship with one or more other keyframes of electronic device 500A is known, which allows the fifth keyframe KF_A5 to be included in 3D map 550 immediately. In some implementations, external map A 559 is no longer used once the repositioning result has allowed external map A 559 to be merged into 3D map 550.
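
The queue handling described above can be sketched as a simple drain step. The sketch assumes each queued keyframe records its originating device and that a device tracks which remote maps it has already merged; leaving other keyframes in the queue is an assumption made for illustration, and `drain_keyframe_queue` is a hypothetical name.

```python
from collections import deque

def drain_keyframe_queue(queue, merged_devices, local_map):
    """Add queued keyframes to the local map when their source map is already merged.

    queue: deque of dicts such as {"id": "KF_A5", "device": "A"}.
    merged_devices: set of device names whose maps were merged via repositioning.
    local_map: list of keyframe ids in the local 3D map (simplified).
    """
    remaining = deque()
    while queue:
        keyframe = queue.popleft()
        if keyframe["device"] in merged_devices:
            # Relation to already merged keyframes is known: add directly.
            local_map.append(keyframe["id"])
        else:
            # No repositioning result for this device yet: keep it queued.
            remaining.append(keyframe)
    queue.extend(remaining)
    return local_map
```

Called at electronic device 500B with device A already merged, a queued KF_A5 moves straight into the local map while a keyframe from a not-yet-merged device stays in the queue.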

[0090] As shown in Figure 5M, the two users are part of a shared multi-user CGR experience within a shared physical environment. As also shown in Figure 5M, once electronic device 500C starts up and successfully performs localization and mapping, electronic device 500C (e.g., a third user) joins the shared CGR experience with a single keyframe KF_C1.

[0091] In some implementations, shared map registration data is exchanged when connecting to a shared CGR experience and initiating a multi-user SLAM process. In some implementations, exchanging shared map registration data includes sending the current state of each electronic device's local 3D map to all other electronic devices in the multi-user SLAM shared CGR experience. The shared map registration data is described herein with reference to Figure 5B.

[0092] As shown in Figure 5M, electronic device 500A receives the current state of the 3D map 570 (e.g., MAP_C) of electronic device 500C from electronic device 500C, and stores the MAP_C data in user queue 502 at electronic device 500A. Electronic device 500B receives the current state of the 3D map 570 (e.g., MAP_C) of electronic device 500C from electronic device 500C, and stores the MAP_C data in user queue 552 at electronic device 500B. As shown in Figure 5M, electronic device 500C receives the current state of electronic device 500A's 3D map 510 (e.g., MAP_A) from electronic device 500A and the current state of electronic device 500B's 3D map 550 (e.g., MAP_B) from electronic device 500B, and stores the MAP_A and MAP_B data in user queue 572 at electronic device 500C. In an alternative embodiment, one electronic device sharing the CGR experience transmits the current state of all electronic devices to the connecting electronic device (e.g., electronic device 500A sends the MAP_A and MAP_B data to electronic device 500C). In some embodiments, responsibility for transmitting the current state of electronic devices 500A and 500B may be assigned in other ways.
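
The peer-to-peer exchange described above, in which no central device coordinates the session, can be pictured with a toy model. The `Device` class, its fields, and `join_session` are hypothetical; a real map state would also carry the hardware and sensor information mentioned in this disclosure, and the transport between devices is omitted.

```python
from dataclasses import dataclass, field

@dataclass
class Device:
    name: str
    keyframes: list
    user_queue: list = field(default_factory=list)

    def map_state(self):
        """Snapshot of the local map as it would be sent to other devices."""
        return {"map": f"MAP_{self.name}", "keyframes": list(self.keyframes)}

def join_session(session, newcomer):
    """Exchange current map states between a joining device and every member."""
    for member in session:
        # Each existing member queues the newcomer's map state ...
        member.user_queue.append(newcomer.map_state())
        # ... and the newcomer queues each member's map state.
        newcomer.user_queue.append(member.map_state())
    session.append(newcomer)

device_a = Device("A", ["KF_A1", "KF_A2", "KF_A3", "KF_A4", "KF_A5"])
device_b = Device("B", ["KF_B1", "KF_B2"])
device_c = Device("C", ["KF_C1"])
session = [device_a, device_b]
join_session(session, device_c)
```

After the call, the user queues of devices A and B each hold MAP_C, and the user queue of device C holds MAP_A followed by MAP_B, mirroring queues 502, 552, and 572.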

[0093] As shown in Figure 5N, electronic device 500A has used the MAP_C information (e.g., keyframe KF_C1 and hardware information of electronic device 500C) to create a local external 3D map 507 of the shared physical environment. As shown in Figure 5N, electronic device 500B has used the MAP_C information to create a local external 3D map 557 of the shared physical environment. A local copy, or external 3D map, is described herein with reference to Figure 5C.

[0094] In addition, in Figure 5N, electronic device 500B sends the repositioning result RL_B1_A4 to electronic device 500C, which stores it in RL queue 576. Similarly, electronic device 500A sends the repositioning result RL_A4_B1 to electronic device 500C, which stores it in RL queue 576.

[0095] As shown in Figure 5O, electronic device 500C has used the MAP_A information to create a local external 3D map 577 of the shared physical environment. Therefore, the MAP_A information has been removed from user queue 572. In some specific implementations, as shown in Figure 5O, the external 3D map 577 uses keyframes KF_A1, KF_A2, KF_A3, KF_A4, and KF_A5, virtual object A1, and hardware information of electronic device 500A. At this time, as shown in Figure 5O, electronic device 500B creates a third keyframe KF_B3 in its 3D map 550. Electronic devices 500A and 500C receive the third keyframe KF_B3 and store it in KF queue 504 and KF queue 574, respectively.

[0096] As shown in Figure 5P, electronic device 500C has used the MAP_B information to create a local external 3D map 579 of the shared physical environment. Therefore, the MAP_B information has been removed from user queue 572. In some specific implementations, as shown in Figure 5P, the external 3D map 579 uses keyframes KF_B1, KF_B2, and KF_B3, virtual object B2, and hardware information of electronic device 500B. Additionally, as shown in Figure 5P, electronic device 500A adds the third keyframe KF_B3 directly from keyframe queue 504 to 3D map 510.

[0097] As shown in Figure 5Q, the repositioning result RL_B3_C1 is used to convert the pose of the estimated 3D geometry of the camera of electronic device 500B in keyframe B3 of 3D map 550 into the pose of the estimated 3D geometry of the camera of electronic device 500C in keyframe C1 of external 3D map 557. In some specific implementations, the repositioning data allows electronic device 500B to incorporate part or all of external 3D map 557 into the local 3D map 550. Exemplary techniques for determining and using repositioning results (e.g., RL_B1_A4) are described herein with reference to Figure 5H. Additionally, in Figure 5Q, electronic device 500A receives the repositioning result RL_B3_C1 and stores it in RL queue 506, and electronic device 500C receives the repositioning result RL_B3_C1 and stores it in RL queue 576.

[0098] As shown in Figure 5R, electronic device 500B uses the repositioning result RL_B3_C1 to incorporate 3D map information from keyframe C1 into 3D map 550. Similarly, as shown in Figure 5R, the repositioning result RL_C1_B3 is used to convert the pose of the estimated 3D geometry of the camera of electronic device 500C in keyframe C1 of 3D map 570 into the pose of the estimated 3D geometry of the camera of electronic device 500B in keyframe B3 of external 3D map 579. In some implementations, when the repositioning result RL_C1_B3 is determined, the repositioning result RL_B3_C1 is removed from the RL queue. In some implementations, the repositioning result RL_B3_C1 helps to determine the repositioning result RL_C1_B3 at electronic device 500C.

[0099] As shown in Figure 5S, electronic device 500A uses the repositioning result RL_B3_C1 to incorporate 3D map information from keyframe C1 into 3D map 510. Similarly, as shown in Figure 5S, electronic device 500C uses the repositioning result RL_A4_B1 to merge 3D map information from keyframes KF_A1-KF_A5 into 3D map 570.
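
A device that reuses a repositioning result computed between two other devices is, in effect, chaining coordinate transforms. The sketch below models devices as graph nodes and repositioning results as edges, then composes transforms along a path found by breadth-first search. The planar `(x, y, theta)` representation, the convention that `relocs[(a, b)]` maps coordinates from b's system into a's system, and the name `frame_transform` are all assumptions made for illustration.

```python
import math
from collections import deque

def compose(a, b):
    c, s = math.cos(a[2]), math.sin(a[2])
    return (a[0] + c * b[0] - s * b[1], a[1] + s * b[0] + c * b[1], a[2] + b[2])

def inverse(t):
    c, s = math.cos(t[2]), math.sin(t[2])
    return (-(c * t[0] + s * t[1]), s * t[0] - c * t[1], -t[2])

def frame_transform(relocs, target, source):
    """Transform taking coordinates in source's system into target's, or None."""
    neighbors = {}
    for (a, b), a_from_b in relocs.items():
        neighbors.setdefault(a, []).append((b, a_from_b))
        # The opposite direction is available as the inverse transform.
        neighbors.setdefault(b, []).append((a, inverse(a_from_b)))
    seen = {target}
    frontier = deque([(target, (0.0, 0.0, 0.0))])
    while frontier:
        device, target_from_device = frontier.popleft()
        if device == source:
            return target_from_device
        for other, device_from_other in neighbors.get(device, []):
            if other not in seen:
                seen.add(other)
                frontier.append((other, compose(target_from_device, device_from_other)))
    return None
```

With one result linking A and B and another linking B and C, `frame_transform(relocs, "A", "C")` yields a transform for placing device C's keyframes in device A's map even though A and C never matched features directly. The inverse edge also illustrates how one direction of a repositioning result can help determine the other.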

[0100] In some implementations, electronic device 500A uses the repositioning result RL_C1_B3 to modify the 3D map information of 3D map 510. In some implementations, electronic device 500A ignores the repositioning result RL_C1_B3 in Figure 5S. In some implementations, electronic device 500C uses the repositioning result RL_B1_A4 to modify the 3D map information of 3D map 570. In some implementations, electronic device 500C optionally ignores the repositioning result RL_B1_A4 in Figure 5S.

[0101] As shown in Figure 5T, electronic devices 500A, 500B, and 500C have corresponding or "similar" 3D maps for their respective CGR experiences 505A, 505B, and 505C. In some embodiments, when each has the same amount of information (e.g., keyframes, pose maps, sensor parameters, virtual objects, etc.), 3D map 510 corresponds to 3D map 550, and each electronic device uses this information individually (e.g., in its SLAM optimizer) to estimate the keyframe poses in its individual 3D map. According to some embodiments, corresponding 3D maps are described herein with reference to Figure 5T.

[0102] As shown in Figure 5U, electronic device 500A leaves the shared multi-user CGR experience, while electronic devices 500B and 500C continue to share the multi-user CGR experience.

[0103] Figure 6 is a flowchart representation of a method 600 for representing, at a first user, virtual objects in a CGR experience that originate from different users (e.g., among users sharing a multi-user CGR experience). In some specific implementations, method 600 is performed by an electronic device (e.g., Figures 1-3). Method 600 may be performed on a mobile device, HMD, desktop computer, laptop computer, or server device. Method 600 may be performed on a head-mounted device having a screen for displaying 2D images or a screen for viewing stereoscopic images. In some embodiments, method 600 is performed by processing logic, including hardware, firmware, software, or a combination thereof. In some embodiments, method 600 is performed by a processor that executes code stored in a non-transitory computer-readable medium (e.g., memory).

[0104] At block 610, method 600 acquires a first set of keyframes (e.g., one or more keyframes) based on images of the physical environment captured by a first image sensor (e.g., a camera) of a first electronic device. In some embodiments, the keyframes include information from additional sensors at the first electronic device. In some embodiments, the keyframes include feature data that defines the position of features relative to a first pose of the first image sensor.

[0105] At block 620, method 600 generates a mapping that defines the relative positions of keyframes in the first set of keyframes of the first electronic device. In some embodiments, the mapping includes a pose map. In some embodiments, the mapping includes a 3D map of a shared CGR environment in a 3D coordinate system.

[0106] At block 630, method 600 receives, at the first electronic device, a keyframe corresponding to an image of the physical environment captured at a second electronic device. In some embodiments, the first electronic device also receives additional information acquired by the second electronic device. In some embodiments, the first electronic device also receives positioning information, image sensor parameters, and depth sensor parameters associated with the second electronic device.

[0107] At block 640, method 600 locates the received keyframe to the mapping. In some embodiments, the first electronic device locates the received keyframe to the mapping based on determined or received relocation data. In some embodiments, the first electronic device locates the received keyframe to the mapping by determining the relative position of the keyframe of the second electronic device to one or more keyframes from a set of keyframes already part of the mapping from the first electronic device. In some embodiments, the first electronic device locates the received keyframe to the mapping by determining the relative position of the keyframe of the second electronic device to the estimated pose of the first electronic device using a first coordinate system at the first electronic device.

[0108] At block 650, method 600 receives, at the first electronic device, an anchor point from the second electronic device, wherein the anchor point defines the position of a virtual object relative to the received keyframe. In some implementations, the anchor point is associated with a feature in the received keyframe.

[0109] At block 660, method 600 displays a CGR environment including the virtual object at a location based on the anchor point and the mapping. In some embodiments, method 600 displays the CGR environment in a first 3D coordinate system at the first electronic device. In some embodiments, the first electronic device displays the CGR environment on a display at the first electronic device.
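
Blocks 610 through 660 can be strung together in a compact sketch from the point of view of the first electronic device. Everything here is a simplification under stated assumptions: poses are planar `(x, y, theta)` tuples, the relative pose of a received keyframe is taken as given (standing in for repositioning), "display" is reduced to computing where each virtual object should appear, and the `FirstDevice` class and its method names are hypothetical.

```python
import math

def compose(a, b):
    c, s = math.cos(a[2]), math.sin(a[2])
    return (a[0] + c * b[0] - s * b[1], a[1] + s * b[0] + c * b[1], a[2] + b[2])

class FirstDevice:
    def __init__(self):
        self.mapping = {}   # keyframe id -> pose in the first coordinate system
        self.anchors = []   # (keyframe id, (x, y) offset in that keyframe's frame)

    def acquire_keyframes(self, local_keyframes):
        """Blocks 610 and 620: acquire local keyframes and record their poses."""
        self.mapping.update(local_keyframes)

    def receive_keyframe(self, kf_id, known_kf_id, pose_relative_to_known):
        """Blocks 630 and 640: localize a received keyframe to the mapping."""
        if known_kf_id not in self.mapping:
            raise KeyError(f"{known_kf_id} is not part of the mapping")
        self.mapping[kf_id] = compose(self.mapping[known_kf_id], pose_relative_to_known)

    def receive_anchor(self, keyframe_id, offset):
        """Block 650: store an anchor defined relative to a keyframe."""
        self.anchors.append((keyframe_id, offset))

    def display_positions(self):
        """Block 660: where each virtual object appears in the first coordinate system."""
        positions = []
        for keyframe_id, (ox, oy) in self.anchors:
            x, y, _ = compose(self.mapping[keyframe_id], (ox, oy, 0.0))
            positions.append((x, y))
        return positions
```

For instance, after acquiring KF_A1 at the origin, localizing a received KF_B1 at `(2, 0, pi/2)` relative to KF_A1, and receiving an anchor one unit in front of KF_B1, the object is placed at approximately `(2, 1)` in the first device's coordinates.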

[0110] In some implementations, the system includes a non-transitory computer-readable storage medium; and one or more processors coupled to the non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium includes program instructions that, when executed on the one or more processors, cause the system to perform the following operations at a first electronic device having a first image sensor: acquiring a first set of keyframes based on an image of a physical environment captured by the first image sensor; generating a mapping defining the relative positions of keyframes in the first set of keyframes; receiving keyframes corresponding to an image of the physical environment captured at a second electronic device; locating the keyframes to the mapping; receiving anchor points from the second electronic device, the anchor points defining the position of a virtual object relative to the keyframes; and displaying a CGR environment including the virtual object at the location based on the anchor points and the mapping.

[0111] In some implementations, the system includes a non-transitory computer-readable storage medium; and one or more processors coupled to the non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium includes program instructions that, when executed on the one or more processors, cause the system to perform the following operations at a first electronic device having a first image sensor: acquiring a first set of keyframes based on an image of a physical environment captured by the first image sensor, the first set of keyframes being defined in a first coordinate system; receiving a second set of keyframes corresponding to an image of the physical environment captured at a second electronic device, the second set of keyframes being defined in a second coordinate system different from the first coordinate system; generating a first mapping that defines the relative positions of keyframes in the first set of keyframes and the second set of keyframes in the first coordinate system; and at a second electronic device having a second image sensor: receiving the first set of keyframes corresponding to the image of the physical environment captured at the first electronic device; and generating a second mapping that defines the relative positions of keyframes in the first set of keyframes and the second set of keyframes in the second coordinate system.

[0112] In some implementations, a non-transitory computer-readable storage medium stores computer-executable program instructions on a computer to perform operations, including at a first electronic device having a first image sensor: acquiring a first set of keyframes based on an image of a physical environment captured by the first image sensor; generating a mapping defining the relative positions of keyframes in the first set of keyframes; receiving keyframes corresponding to an image of the physical environment captured at a second electronic device; locating the keyframes to the mapping; receiving anchor points from the second electronic device, the anchor points defining the position of a virtual object relative to the keyframes; and displaying a CGR environment including the virtual object at the location based on the anchor points and the mapping.

[0113] In some specific implementations, a non-transitory computer-readable storage medium stores computer-executable program instructions on a computer to perform operations, including at a first electronic device having a first image sensor: acquiring a first set of keyframes based on an image of a physical environment captured by the first image sensor, the first set of keyframes being defined in a first coordinate system; receiving a second set of keyframes corresponding to an image of the physical environment captured at a second electronic device, the second set of keyframes being defined in a second coordinate system different from the first coordinate system; generating a first mapping that defines the relative positions of keyframes in the first set of keyframes and the second set of keyframes in the first coordinate system; and at a second electronic device having a second image sensor: receiving the first set of keyframes corresponding to an image of the physical environment captured at the first electronic device; and generating a second mapping that defines the relative positions of keyframes in the first set of keyframes and the second set of keyframes in the second coordinate system.

[0114] This document sets forth numerous specific details to provide a comprehensive understanding of the claimed subject matter. However, those skilled in the art will understand that the claimed subject matter can be practiced without these specific details. In other instances, methods, apparatus, or systems known to a person of ordinary skill have not been described in detail so as not to obscure the claimed subject matter.

[0115] Unless otherwise specifically stated, it should be understood that throughout this specification, discussions using terms such as “processing,” “calculating,” “computing,” “determining,” and “identifying” refer to the actions or processes of computing devices, such as one or more computers or similar electronic computing devices, which manipulate or convert data representing physical electronic or magnetic quantities within the memory, registers, or other information storage, transmission, or display devices of a computing platform.

[0116] The one or more systems discussed herein are not limited to any particular hardware architecture or configuration. A computing device may include any suitable arrangement of components that provide results conditioned on one or more inputs. Suitable computing devices include computer systems based on multi-purpose microprocessors that access stored software that programs or configures the computing system from a general-purpose computing device to a special-purpose computing device that implements one or more specific embodiments of the subject matter of this invention. The teachings contained herein can be implemented in the software used for programming or configuring the computing device using any suitable programming, scripting, or other type of language or combination of languages.

[0117] Specific implementations of the methods disclosed herein can be performed in the operation of such computing devices. The order of the blocks presented in the above examples can be changed; for example, the blocks can be reordered, combined, or divided into sub-blocks. Some blocks or processes can be executed in parallel.

[0118] The use of “adapted to” or “configured to” in this document is meant as open and inclusive language that does not exclude devices adapted to or configured to perform additional tasks or steps. Similarly, the use of “based on” is meant to be open and inclusive, in that a process, step, calculation, or other action “based on” one or more stated conditions or values may in practice be based on additional conditions or values beyond those stated. The headings, lists, and numbering included in this document are for illustrative purposes only and are not intended to be restrictive.

[0119] It will also be understood that while terms such as "first," "second," etc., may be used in this document to describe various elements, these elements should not be limited by these terms. These terms are merely used to distinguish one element from another. For example, a first node can be called a second node, and similarly, a second node can be called a first node, without changing the meaning of the description, provided that all occurrences of "first node" are consistently renamed and all occurrences of "second node" are consistently renamed. The first node and the second node are both nodes, but they are not the same node.

[0120] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the claims. As used in the description of these embodiments and the appended claims, the singular forms “a” and “the” are intended to also cover the plural forms, unless the context clearly indicates otherwise. It will also be understood that the term “and / or” as used herein refers to and covers any and all possible combinations of one or more of the associated listed items. It will also be understood that the term “comprises” (or “comprising”) as used in this specification specifies the presence of the stated features, integers, steps, operations, elements, or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, or groups thereof.

[0121] As used herein, the term "if" can be interpreted as meaning "when" or "upon" or "in response to determining" or "in accordance with a determination" or "in response to detecting" that a stated prerequisite is true, depending on the context. Similarly, the phrases "if it is determined [that a stated prerequisite is true]" or "if [a stated prerequisite is true]" or "when [a stated prerequisite is true]" can be interpreted as meaning "upon determining" or "in response to determining" or "in accordance with a determination" or "upon detecting" or "in response to detecting" that the stated prerequisite is true, depending on the context.

[0122] The foregoing description and overview of this disclosure should be understood as illustrative and exemplary in every respect, and not restrictive, and the scope of this disclosure is to be determined not only from the detailed description of the illustrative specific embodiments, but according to the full breadth permitted by patent law. It should be understood that the specific embodiments shown and described herein are merely illustrative of the principles of this disclosure, and that various modifications can be made by those skilled in the art without departing from the scope and spirit of this disclosure.

Claims

1. A method for localization and mapping, comprising:
at a first electronic device having a first image sensor:
obtaining a first set of keyframes based on images of a physical environment captured by the first image sensor, wherein the first set of keyframes is defined in a first coordinate system and includes a plurality of images, each of which is associated with the pose of the corresponding electronic device at the time the corresponding image is acquired;
receiving, via communication, a second set of keyframes corresponding to an image of the physical environment captured by a second image sensor at a second electronic device, wherein the second set of keyframes is defined in a second coordinate system different from the first coordinate system and includes a plurality of images, each of which is associated with the pose of the corresponding electronic device at the time the corresponding image is acquired; and
generating a first mapping, the first mapping defining the relative positions of keyframes in the first set of keyframes and the second set of keyframes in the first coordinate system, wherein generating the first mapping includes: generating an internal mapping, the internal mapping defining the relative positions of the first set of keyframes in the first coordinate system; generating an external mapping, the external mapping defining the relative positions of the second set of keyframes; and pairing a keyframe from the first set of keyframes with a keyframe from the second set of keyframes; and
at the second electronic device having the second image sensor:
receiving, via communication, the first set of keyframes corresponding to an image of the physical environment captured at the first electronic device; and
generating a second mapping, the second mapping defining the relative positions of keyframes in the first set of keyframes and keyframes in the second set of keyframes in the second coordinate system, wherein generating the second mapping includes: generating an internal mapping, which defines the relative positions of keyframes in the second set of keyframes in the second coordinate system; generating an external mapping, which defines the relative positions of keyframes in the first set of keyframes; and pairing keyframes in the first set of keyframes with keyframes in the second set of keyframes,
wherein the first mapping and the second mapping are generated based on the exchange of keyframes between the first electronic device and the second electronic device.

2. The method according to claim 1, further comprising:
at the first electronic device:
receiving a third set of keyframes corresponding to an image of the physical environment captured at a third electronic device, the third set of keyframes being defined in a third coordinate system different from the first coordinate system; and
modifying the first mapping by including the relative positions of selected keyframes in the third set of keyframes in the first coordinate system; and
at the second electronic device:
receiving the third set of keyframes corresponding to the image of the physical environment captured at the third electronic device; and
modifying the second mapping by including the relative positions of selected keyframes in the third set of keyframes in the second coordinate system.

3. The method according to claim 2, further comprising:
at the third electronic device:
receiving the first set of keyframes corresponding to an image of the physical environment captured at the first electronic device;
receiving the second set of keyframes corresponding to the image of the physical environment captured at the second electronic device; and
generating a third mapping, which defines the relative positions of the keyframes in the first set of keyframes, the second set of keyframes, and the third set of keyframes in the third coordinate system.

4. The method according to claim 1, further comprising:
at the first electronic device:
receiving a second anchor point from the second electronic device, the second anchor point defining the position of a second virtual object relative to one of the keyframes in the second set of keyframes; and
displaying, based on the second anchor point and the first mapping, a computer-generated reality (CGR) experience including the second virtual object at the location; and
at the second electronic device:
receiving a first anchor point from the first electronic device, the first anchor point defining the position of a first virtual object relative to a keyframe in the first set of keyframes; and
displaying, based on the first anchor point and the second mapping, a CGR experience including the first virtual object at the location.

5. The method of claim 1, wherein the keyframe further comprises camera image data and additional sensor data.

6. The method of claim 1, wherein the keyframe includes a representation of one or more features in the physical environment or a representation of a virtual object.

7. The method of claim 1, wherein the first mapping at the first electronic device or the second mapping at the second electronic device includes a representation of relocation information.

8. The method of claim 1, wherein the first mapping at the first electronic device or the second mapping at the second electronic device includes additional sensor information or map registration data.

9. The method of claim 1, wherein the first electronic device is performing localization and mapping, and wherein the second electronic device is performing localization and mapping.

10. The method according to claim 1, further comprising:
at the first electronic device:
pairing a keyframe from the first set of keyframes with a keyframe from the second set of keyframes, the paired keyframes including a set of matching features;
determining first positioning information from the paired keyframes; and
merging the external mapping into the internal mapping using the first positioning information; and
at the second electronic device:
pairing a keyframe from the first set of keyframes with a keyframe from the second set of keyframes, the paired keyframes including a set of matching features;
determining second positioning information from the paired keyframes; and
merging the external mapping into the internal mapping using the second positioning information.

11. The method according to claim 1, further comprising:
at the first electronic device:
displaying, based on the first mapping, a computer-generated reality (CGR) experience including representations of physical objects in the physical environment and virtual objects; and
at the second electronic device:
displaying, based on the second mapping, the CGR experience including representations of physical objects in the physical environment and virtual objects.