Mapping objects in a local area around a head-mounted device to a local area model maintained by the head-mounted device
By integrating a depth sensor into a head-mounted device to generate a 3D model and combining it with an imaging device to detect the 2D image of the object, the problem of inaccurate object positioning and tracking in traditional methods is solved, achieving precise positioning and simplified navigation.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CTRL-LABS CORP
- Filing Date
- 2024-07-28
- Publication Date
- 2026-06-26
AI Technical Summary
Traditional computer vision methods cannot accurately locate objects in the three-dimensional local area around a head-mounted device, nor can they track or identify a specific object in different images, leading to navigation difficulties for users.
By integrating a depth sensor into a head-mounted device to generate a 3D model of a local area, and combining this with an imaging device to detect a 2D image of the object, the position of the object in the 3D model is determined and associated with the identification information, thus achieving precise positioning and navigation of the object.
It enables precise positioning and navigation of objects around the head-mounted device, simplifies information presentation, and enhances the user experience.
Figure CN122295697A_ABST
Abstract
Description
Technical Field
[0001] This disclosure generally relates to artificial reality systems, and more specifically to object detection within local regions of artificial reality systems.
Background Technology
[0002] Various devices (e.g., augmented reality (AR) headsets) employ one or more computer vision methods to detect or recognize objects contained in images. For example, AR headsets include imaging devices that acquire images of local areas surrounding the AR headset and detect one or more objects from these images. Object recognition within local areas can be performed at the category level, where objects are detected within a specific set of categories; or at the instance level, where a specific object is detected from the local area based on training with example images of a specific set of objects. One or more object registration methods are used to train the device (e.g., the AR headset) to recognize specific objects.
[0003] Traditional methods for detecting objects in localized regions use two-dimensional images of the localized region from an imaging device (e.g., an imaging device on a head-mounted display). However, identifying one or more objects from a two-dimensional image does not allow for the localization of objects in a three-dimensional localized region surrounding the head-mounted display. If the three-dimensional position of an object relative to the head-mounted display in the localized region cannot be determined, the head-mounted display cannot guide the user to that position within the localized region surrounding the head-mounted display.
[0004] Furthermore, many traditional computer vision methods for object detection treat each received image of a local region independently. Because objects are identified independently in each image, many traditional object detection methods have difficulty determining whether an image contains a new object or another perspective of an object detected in one or more previously received images. Independent detection of objects in different images prevents traditional object detection methods from tracking or identifying a specific object across different images of a local region.
Summary of the Invention
[0005] A device (e.g., an augmented reality (AR) head-mounted device) detects objects in a localized area around the device. In various embodiments, the imaging device acquires images of the localized area and detects one or more objects from these images. Because the images of the localized area are two-dimensional, detecting objects from the images of the localized area allows for object identification but cannot determine the object's position relative to the device (e.g., the head-mounted device) within the localized area.
[0006] To enhance object detection within a local region by leveraging the object's position relative to the head-mounted device, the head-mounted device determines a local region model based on depth information obtained from one or more depth sensors. This local region model is a three-dimensional representation of the local region surrounding the head-mounted device. For example, the head-mounted device includes one or more depth sensors that determine distances from the head-mounted device to various parts of the local region surrounding it. Based on the depth information, the head-mounted device determines the local region model, which represents the distances from the head-mounted device to different parts of the local region. The head-mounted device determines the object's position and / or bounding box within the local region model based on the position and / or size of the object detected in an image of the acquired local region, as well as parameters of the imaging device used to acquire the image. By storing the object's position within the local region model, the head-mounted device's display elements can display interface elements at positions determined based on the object's position within the local region model. Placing interface elements based on the object's position within the local region model allows for closer proximity to the object, simplifying the presentation of information about the object to the user. Furthermore, interface elements contribute to contextual understanding of the information presented by the head-mounted device.
[0007] According to one aspect, a method is provided, the method comprising: acquiring one or more images of a local region surrounding a head-mounted device worn by a user via one or more imaging devices included in the head-mounted device; detecting an object in the local region from the images of the local region acquired by the imaging devices; determining a bounding box of the object, the bounding box specifying the size of the region in the image including the object; determining a local region model of the local region based on depth information generated by one or more depth sensors included in the head-mounted device, the local region model including a three-dimensional reconstruction of the local region; determining the position of the object in the local region model based on the bounding box of the object and one or more parameters of the imaging devices; and storing the position of the object in the local region in association with information identifying the object.
[0008] Determining the position of an object in a local region model based on its bounding box and one or more parameters of the imaging device may include: determining the center of the bounding box based on its dimensions; generating a ray in the coordinate system of the local region model based on the parameters of the imaging device, the ray intersecting the center of the bounding box; and determining the position of the object in the local region model as the position where the ray intersects the local region model.
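The ray-based localization above can be illustrated with a minimal sketch. It assumes a pinhole camera model (focal lengths `fx`, `fy` and principal point `cx`, `cy`), a camera-to-model rotation matrix, and a local region model represented as a list of 3D points; all of these names and the point-cloud representation are illustrative assumptions rather than details given in this disclosure.

```python
import math

def bbox_center(x_min, y_min, x_max, y_max):
    """Center pixel of a 2D bounding box given its corner coordinates."""
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)

def pixel_ray(u, v, fx, fy, cx, cy, rotation, origin):
    """Ray through pixel (u, v), expressed in the model coordinate system.

    Assumes a pinhole camera looking along +z; `rotation` is a 3x3
    camera-to-model matrix and `origin` is the camera position.
    """
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    d = [sum(rotation[i][j] * d_cam[j] for j in range(3)) for i in range(3)]
    norm = math.sqrt(sum(c * c for c in d))
    return origin, tuple(c / norm for c in d)

def first_hit(origin, direction, model_points, radius):
    """Nearest model point (along the ray) lying within `radius` of the ray."""
    best, best_t = None, math.inf
    for p in model_points:
        rel = [p[i] - origin[i] for i in range(3)]
        t = sum(rel[i] * direction[i] for i in range(3))  # distance along ray
        if t <= 0:
            continue  # point is behind the camera
        closest = [origin[i] + t * direction[i] for i in range(3)]
        if math.dist(p, closest) <= radius and t < best_t:
            best, best_t = p, t
    return best

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
u, v = bbox_center(300, 220, 340, 260)
origin, direction = pixel_ray(u, v, 500.0, 500.0, 320.0, 240.0,
                              identity, (0.0, 0.0, 0.0))
model = [(0.0, 0.0, 2.0), (0.0, 0.0, 5.0), (1.0, 0.0, 1.0)]
hit = first_hit(origin, direction, model, radius=0.05)
```

In this example the ray passes through both points on the optical axis, and the nearer one is taken as the object's position. A mesh or voxel model would need a different intersection test.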
[0009] The local region model may include one or more candidate regions, each candidate region having depth information that differs from the depth information of its neighbors by at least a threshold amount. Determining the position of the object in the local region model as the position where the ray intersects the local region model may include: determining the position of the object as a candidate region that the ray intersects, or determining the position of the object as a candidate region located within a threshold distance of the position where the ray intersects the local region model.
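A minimal sketch of the candidate-region selection follows. It assumes each candidate region is approximated by an axis-aligned box in the model coordinate system; the `CandidateRegion` type, its fields, and the threshold value are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class CandidateRegion:
    """Hypothetical candidate region, approximated by an axis-aligned box."""
    label: str
    min_corner: tuple
    max_corner: tuple

    def distance_to(self, point):
        # Distance from the point to the box; 0.0 when the point is inside.
        gaps = [max(lo - p, 0.0, p - hi)
                for p, lo, hi in zip(point, self.min_corner, self.max_corner)]
        return math.sqrt(sum(g * g for g in gaps))

def select_candidate(hit, candidates, threshold):
    """Candidate containing the ray hit, else the nearest one within threshold."""
    best = min(candidates, key=lambda c: c.distance_to(hit), default=None)
    if best is None or best.distance_to(hit) > threshold:
        return None
    return best

cup = CandidateRegion("cup", (0.0, 0.0, 1.9), (0.2, 0.2, 2.1))
wall = CandidateRegion("wall", (-5.0, -5.0, 4.9), (5.0, 5.0, 5.1))
inside = select_candidate((0.1, 0.1, 2.0), [cup, wall], threshold=0.15)
nearby = select_candidate((0.3, 0.1, 2.0), [cup, wall], threshold=0.15)
far = select_candidate((1.0, 1.0, 3.0), [cup, wall], threshold=0.15)
```

Here a hit inside the cup's box, or one that misses it by less than the threshold, resolves to the cup; a hit far from every candidate resolves to nothing.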
[0010] Storing the location of the object in a local area in association with information identifying the object may include: storing the location of the object in the local area in association with one or more tags obtained from the detection of the object.
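As an illustration of storing a position in association with identifying information, the sketch below keeps one record per object and allows lookup by tag. The `ObjectStore` class, its method names, and the use of string tags are assumptions made for this example; the disclosure does not specify a storage format.

```python
from dataclasses import dataclass, field

@dataclass
class StoredObject:
    """Hypothetical record: a model-space position plus detection tags."""
    object_id: str
    position: tuple
    tags: set = field(default_factory=set)

class ObjectStore:
    """Keeps the latest known position of each detected object."""

    def __init__(self):
        self._objects = {}

    def store(self, object_id, position, tags):
        # Re-detecting a known object overwrites its earlier position, so
        # the same object seen in a later image does not create a duplicate.
        self._objects[object_id] = StoredObject(object_id, position, set(tags))

    def get(self, object_id):
        return self._objects.get(object_id)

    def find_by_tag(self, tag):
        return [o for o in self._objects.values() if tag in o.tags]

store = ObjectStore()
store.store("keys-1", (0.1, 0.8, 2.0), ["keys", "metal"])
store.store("mug-1", (1.5, 0.9, 3.2), ["mug"])
store.store("keys-1", (0.4, 0.8, 2.5), ["keys", "metal"])  # updated sighting
```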
[0011] The method may further include: receiving input from a user at a head-mounted device identifying the object and requesting navigation to the object; retrieving a stored location of the object in a local region model; determining the current location of the head-mounted device in the local region model based on depth information; generating a route from the current location of the head-mounted device in the local region model to the stored location of the object in the local region model; and displaying at least a portion of the generated route to the user via one or more display elements.
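Route generation could, for example, be sketched as a breadth-first search over a 2D occupancy grid derived from the local region model. The grid representation (0 for free space, 1 for an obstacle) and the choice of breadth-first search are assumptions for illustration; the disclosure does not name a specific routing algorithm.

```python
from collections import deque

def shortest_route(grid, start, goal):
    """Breadth-first search on a grid; returns a list of cells or None.

    `grid[r][c] == 0` marks free space and `1` marks an obstacle.
    `start` and `goal` are (row, column) cells, standing in for the
    current device position and the stored object position.
    """
    rows, cols = len(grid), len(grid[0])
    previous = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            route = []
            while cell is not None:
                route.append(cell)
                cell = previous[cell]
            return route[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in previous):
                previous[(nr, nc)] = cell
                queue.append((nr, nc))
    return None  # goal is unreachable

room = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
route = shortest_route(room, start=(0, 0), goal=(2, 0))
```

The display elements would then show at least the first portion of `route` to the user.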
[0012] Generating a route from the current location of the head-mounted device in a local region model to the stored location of the object in the local region model may include: retrieving a stored confidence value associated with the stored location of the object in the local region model, the confidence value being based on the time the location of the object in the local region model was stored and a decay factor that decreases the confidence value over the time elapsed between the time the location was stored and the current time; and displaying a message to the user via the one or more display elements in response to the confidence value falling below a threshold confidence value.
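One way to picture the decaying confidence value is the sketch below, which assumes exponential decay expressed as a half-life. The decay form, the half-life of one hour, the threshold of 0.3, and the message text are all illustrative assumptions; the disclosure only states that a decay factor lowers the confidence as time passes.

```python
def decayed_confidence(initial, stored_at, now, half_life_s):
    """Confidence after (now - stored_at) seconds, halving every half_life_s."""
    elapsed = max(0.0, now - stored_at)
    return initial * 0.5 ** (elapsed / half_life_s)

def stale_location_message(confidence, threshold):
    """Message for the display when confidence is below the threshold."""
    if confidence < threshold:
        return "The object may have moved since it was last seen."
    return None

fresh = decayed_confidence(1.0, stored_at=0.0, now=0.0, half_life_s=3600.0)
one_hour = decayed_confidence(1.0, stored_at=0.0, now=3600.0, half_life_s=3600.0)
two_hours = decayed_confidence(1.0, stored_at=0.0, now=7200.0, half_life_s=3600.0)
message = stale_location_message(two_hours, threshold=0.3)
```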
[0013] The method may further include: displaying an interface element to a user via one or more display elements of a head-mounted device, the interface element being displayed at a position in a local region model relative to the position of the object in the local region model.
[0014] The position of the interface element in the local region model can be determined by an offset relative to a portion of the object's bounding box.
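The offset-based placement in paragraph [0014] can be sketched as follows. The example assumes a 3D bounding box given by two corners in the model coordinate system with the y axis pointing up, and it anchors the element to the center of the box's top face; the anchor choice and the function name are hypothetical.

```python
def interface_element_position(box_min, box_max, offset):
    """Position of an element placed at an offset from the box's top face.

    Assumes +y is "up" in the model coordinate system. The anchor is the
    center of the top face of the bounding box.
    """
    anchor = (
        (box_min[0] + box_max[0]) / 2.0,
        box_max[1],
        (box_min[2] + box_max[2]) / 2.0,
    )
    return tuple(a + o for a, o in zip(anchor, offset))

# Place a label 5 cm above an object whose box spans these corners.
label_position = interface_element_position(
    box_min=(0.0, 0.0, 2.0),
    box_max=(0.2, 0.3, 2.2),
    offset=(0.0, 0.05, 0.0),
)
```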
[0015] A portion of this interface element can interact with a portion of the bounding box of the object in the local region model.
[0016] According to another aspect, a head-mounted device is provided, the head-mounted device comprising: a frame; one or more display elements coupled to the frame, each display element configured to generate image light for presentation to a user; one or more imaging devices coupled to the frame, configured to acquire images of a local region surrounding the frame; a depth camera assembly configured to acquire depth information between the head-mounted device and multiple portions of the local region; and an object detection module including a processor and a non-transitory computer-readable storage medium having instructions encoded thereon that, when executed by the processor, cause the head-mounted device to: detect an object in the local region from an image of the local region acquired by the imaging devices; determine a bounding box of the object, the bounding box specifying the size of the region in the image that includes the object; determine a local region model of the local region based on the depth information, the local region model including a three-dimensional reconstruction of the local region; determine the position of the object in the local region model based on the bounding box of the object and one or more parameters of the imaging devices; and store the position of the object in the local region in association with information identifying the object.
[0017] Determining the position of an object in a local region model based on its bounding box and one or more parameters of the imaging device may include: determining the center of the bounding box based on its dimensions; generating a ray in the coordinate system of the local region model based on the parameters of the imaging device, the ray intersecting the center of the bounding box; and determining the position of the object in the local region model as the position where the ray intersects the local region model.
[0018] The local region model may include one or more candidate regions, each candidate region having depth information that differs from the depth information of its neighbors by at least a threshold amount. Determining the position of the object in the local region model as the position where the ray intersects the local region model may include: determining the position of the object as a candidate region that the ray intersects, or determining the position of the object as a candidate region located within a threshold distance of the position where the ray intersects the local region model.
[0019] Storing the location of the object in a local area in association with information identifying the object may include: storing the location of the object in the local area in association with one or more tags obtained from the detection of the object.
[0020] The non-transitory computer-readable storage medium may also have instructions encoded thereon that, when executed by a processor, cause the head-mounted device to perform the following operations: receiving, at the head-mounted device, input from a user identifying the object and requesting navigation to the object; retrieving the stored position of the object in a local region model; determining the current position of the head-mounted device in the local region model based on depth information; generating a route from the current position of the head-mounted device in the local region model to the stored position of the object in the local region model; and displaying at least a portion of the generated route to the user via one or more display elements.
[0021] Generating a route from the current location of the head-mounted device in a local region model to the stored location of the object in the local region model may include: retrieving a stored confidence value associated with the stored location of the object in the local region model, the confidence value being based on the time the location of the object in the local region model was stored and a decay factor that decreases the confidence value over the time elapsed between the time the location was stored and the current time; and displaying a message to the user via the one or more display elements in response to the confidence value falling below a threshold confidence value.
[0022] The non-transitory computer-readable storage medium may also have instructions encoded thereon that, when executed by a processor, cause the head-mounted device to perform the following operations: displaying interface elements to a user through one or more display elements of the head-mounted device, the interface elements being displayed in a local region model at a position relative to the position of an object in the local region model.
[0023] The position of the interface element in the local region model can be determined by an offset relative to a portion of the object's bounding box.
[0024] A portion of this interface element can interact with a portion of the bounding box of the object in the local region model.
[0025] According to another aspect, a non-transitory computer-readable storage medium according to claim 15 is provided.
[0026] In various embodiments, one or more imaging devices included in a user-worn head-mounted device acquire one or more images of a local region surrounding the head-mounted device. One or more objects in the local region are detected from the images acquired by the imaging devices, and bounding boxes are determined for the objects. The bounding boxes specify the size of the region in the image containing the object. A local region model of the local region is determined based on depth information generated by one or more depth sensors included in the head-mounted device. The local region model is a three-dimensional reconstruction of the local region. The position of the object in the local region model is determined based on the object's bounding box and one or more parameters of the imaging devices, and the position of the object in the local region model is stored in association with information identifying the object.
[0027] In some embodiments, the head-mounted device includes one or more display elements coupled to a frame, each display element being configured to generate image light presented to a user. One or more imaging devices coupled to the frame are configured to acquire images of a local region surrounding the frame, while a depth camera assembly is configured to acquire depth information between the head-mounted device and multiple portions of the local region. The object detection module includes a processor and a non-transitory computer-readable storage medium having instructions encoded thereon that, when executed by the processor, cause the head-mounted device to detect objects in the local region from images of the local region acquired by the imaging devices and to determine the bounding box of the object. The bounding box specifies the size of the region in the image containing the object. When executed by the processor, these instructions also cause the processor to determine a local region model of the local region based on depth information obtained from one or more depth sensors, the local region model including a three-dimensional reconstruction of the local region. Furthermore, when executed by the processor, the instructions cause the processor to determine the position of the object in the local region model based on the object's bounding box and one or more parameters of the imaging devices, and to store the object's position in the local region in association with information identifying the object.
[0028] It will be appreciated that any feature described herein that is suitable for incorporation into one or more aspects or embodiments of this disclosure is intended to be generalizable in any and all aspects and embodiments of this disclosure. Other aspects of this disclosure will be understood by those skilled in the art based on the specification, claims, and drawings of this disclosure. The foregoing general description and the following detailed description are exemplary and illustrative only, and not intended to limit the scope of the claims.
Attached Figure Description
[0029] Figure 1A is a perspective view of a head-mounted device implemented as an eyeglass device according to one or more embodiments.
[0030] Figure 1B is a perspective view of a head-mounted device implemented as a head-mounted display according to one or more embodiments.
[0031] Figure 2 is a block diagram of an object detection module included in a head-mounted device according to one or more embodiments.
[0032] Figure 3 illustrates a method for determining the position of an object in a model of a local area surrounding a head-mounted device, according to one or more embodiments.
[0033] Figure 4 is a process flowchart of a method for determining the position of an object in a model of a local area surrounding a head-mounted device, according to one or more embodiments.
[0034] Figure 5 is an example of a head-mounted device, according to one or more embodiments, displaying interface elements based on the position of an object in a local region model of a local area surrounding the head-mounted device.
[0035] Figure 6 shows a system including a head-mounted device according to one or more embodiments.
[0036] These accompanying drawings depict various embodiments for illustrative purposes only. Those skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods shown herein can be employed without departing from the principles described herein.
Detailed Implementation
[0037] Various devices (e.g., augmented reality (AR) headsets) employ one or more computer vision methods to detect objects within images. When detecting objects, a headset applies one or more object detection models to an image of a localized region surrounding the headset. One or more imaging devices included in or coupled to the headset acquire the image of the localized region. While detecting objects from an image of a localized region allows the headset to detect objects within that region, because the image is two-dimensional, detecting objects from an image provides the headset with only limited information about the object's position relative to the headset.
[0038] To determine the position of an object relative to a head-mounted device in a detected local region, one or more depth sensors included in the head-mounted device acquire depth information about the local region. This depth information is used to identify the distances between the head-mounted device and different parts of the local region, thus providing information about the distance between the object and the head-mounted device. The head-mounted device then generates a local region model based on the depth information; this model is a 3D reconstruction of the local region based on the distances between the head-mounted device and the various parts of the local region.
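As a rough illustration of turning depth information into a three-dimensional reconstruction, the sketch below back-projects a depth image into a list of 3D points in the sensor's coordinate system. It assumes a pinhole depth sensor with intrinsics `fx`, `fy`, `cx`, `cy` and treats non-positive depth values as missing; a point list is only one possible form of local region model and is an assumption of this example.

```python
def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (rows of distances) into 3D points.

    Each valid pixel (u, v) with depth z maps to
    ((u - cx) * z / fx, (v - cy) * z / fy, z) in the sensor frame.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z is None or z <= 0:
                continue  # no depth measured at this pixel
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points

depth_image = [
    [2.0, 0.0],
    [0.0, 4.0],
]
points = depth_to_points(depth_image, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
```

Points from successive frames would still need to be transformed into a common coordinate system using the device's pose before they form a single model.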
[0039] For an object detected in a local region, the head-mounted device determines the object's position within a local region model to determine the distance between the head-mounted device and the object. To determine the object's position within the local region model, the head-mounted device determines the center of the object's bounding box. In various embodiments, when an object is detected from an image, a bounding box is determined, where the size of the bounding box is determined based on the characteristics of the region in the image that includes the object. The head-mounted device determines the center of the bounding box based on its size and generates a ray based on parameters of the imaging device, including the imaging device's position and orientation within the local region, which intersects the center of the bounding box. The location where the ray intersects the local region model is determined as the object's position within the local region model, where the object's size within the local region is determined based on the imaging device's parameters. Storing the object's position within the local region model in association with information identifying the object allows the head-mounted device to subsequently locate the object within the local region model, or determine the positions of one or more interface elements relative to the object's position within the local region model.
[0040] Embodiments of the present invention may include, or may combine with, an artificial reality system. An artificial reality is a form of reality that has been adjusted in some way before being presented to a user. This artificial reality may include, for example, virtual reality (VR), augmented reality (AR), mixed reality (MR), hybrid reality, or some combination and / or derivative thereof. Artificial reality content may include fully generated content or generated content combined with captured (e.g., real-world) content. Artificial reality content may include video, audio, haptic feedback, or some combination thereof, any of which may be presented in a single-channel or multi-channel manner (e.g., stereoscopic video that produces a three-dimensional effect for the viewer). Furthermore, in some embodiments, the artificial reality may also be associated with applications, products, accessories, services, or some combination thereof for creating content in the artificial reality and / or otherwise using it in the artificial reality. Artificial reality systems that deliver artificial reality content can be implemented on a variety of platforms, including wearable devices (e.g., head-mounted devices) connected to a host computer system, standalone wearable devices (e.g., head-mounted devices), mobile devices or computing systems, or any other hardware platform capable of delivering artificial reality content to one or more viewers.
[0041] Figure 1A is a perspective view of a head-mounted device 100 implemented as an eyewear device according to one or more embodiments. In some embodiments, the eyewear device is a near-eye display (NED). Typically, the head-mounted device 100 can be worn on a user's face to present content (e.g., media content) using a display component and / or an audio system. However, the head-mounted device 100 can also be used to present media content to a user in different ways. Examples of media content presented by the head-mounted device 100 include one or more images, videos, audio, or some combination thereof. The head-mounted device 100 includes a frame and may include a display component including one or more display elements 120, a depth camera assembly (DCA), an audio system, and other components such as a position sensor 190. Although Figure 1A shows example locations of components of the head-mounted device 100, these components may be located elsewhere on the head-mounted device 100, on a peripheral device paired with the head-mounted device 100, or some combination thereof. Similarly, the head-mounted device 100 may have more or fewer components than those shown in Figure 1A.
[0042] Frame 110 holds other components of the head-mounted device 100. Frame 110 includes a front component that holds one or more display elements 120, and an end component (e.g., a temple) that attaches to the user's head. The front component of frame 110 extends across the top of the user's nose. The length of the end component may be adjustable (e.g., an adjustable temple length) to fit different users. The end component may also include a curved portion behind the user's ears (e.g., a temple tip, an ear piece).
[0043] One or more display elements 120 provide light to a user wearing the head-mounted device 100. As shown, the head-mounted device includes a display element 120 for each of the user's eyes. In some embodiments, the display elements 120 generate image light that is provided to the eye box of the head-mounted device 100. The eye box is the location in space that the user's eye occupies while the user wears the head-mounted device 100. For example, the display element 120 may be a waveguide display. A waveguide display includes a light source (e.g., a two-dimensional source, one or more line sources, one or more point sources, etc.) and one or more waveguides. Light from the light source is coupled into one or more waveguides, which output light in such a way that pupil replication exists in the eye box of the head-mounted device 100. The coupling of light into and / or out of the one or more waveguides may be accomplished using one or more diffraction gratings. In some embodiments, the waveguide display includes a scanning element (e.g., a waveguide, a mirror, etc.) that scans the light as it is coupled into the one or more waveguides. Note that in some embodiments, one or both of the display elements 120 are opaque and do not transmit light from the local area surrounding the head-mounted device 100. The local area is the area surrounding the head-mounted device 100. For example, the local area could be a room that a user wearing the head-mounted device 100 is inside, or the user wearing the head-mounted device 100 might be outdoors, in which case the local area is an outdoor area. In this context, the head-mounted device 100 generates VR content. Alternatively, in some embodiments, one or both of the display elements 120 are at least partially transparent, such that light from the local area can be combined with light from the one or more display elements to generate AR and / or MR content.
[0044] In some embodiments, the display element 120 does not generate image light, but rather acts as a lens that transmits light from the local area to the eye box. For example, one or both of the display elements 120 may be a non-corrective (non-prescription) lens or a prescription lens (e.g., a single-vision lens, bifocal lens, trifocal lens, or progressive lens) that helps correct a user's visual impairment. In some embodiments, the display element 120 may be polarized and / or tinted to protect the user's eyes from the sun's rays.
[0045] In some embodiments, display element 120 may include an additional optical component block (not shown). The optical component block may include one or more optical elements (e.g., lenses, Fresnel lenses, etc.) that guide light from display element 120 to the eye box. The optical component block may, for example, correct some or all aberrations in the image content, magnify some or all of the image, or some combination thereof.
[0046] The DCA determines depth information for a portion of the local area surrounding the head-mounted device 100. The DCA includes one or more imaging devices 130 and a DCA controller (not shown in Figure 1A), and may also include an illuminator 140. In some embodiments, the illuminator 140 illuminates a portion of the local area with light. This light may be, for example, infrared (IR) structured light (e.g., dot-patterned structured light, strip structured light, etc.), an IR flash for time-of-flight (ToF), etc. In some embodiments, one or more imaging devices 130 acquire an image of the portion of the local area that includes the light from the illuminator 140. As illustrated, Figure 1A shows a single illuminator 140 and two imaging devices 130. In an alternative embodiment, there is no illuminator 140 but at least two imaging devices 130.
[0047] The DCA controller uses the acquired images and one or more depth determination techniques to calculate the depth information of that portion of a local area. The depth determination techniques can be, for example, direct time-of-flight (ToF) depth sensing, indirect ToF depth sensing, structured light, passive stereo analysis, active stereo analysis (using textures added to the scene by light from illuminator 140), some other techniques for determining the depth of the scene, or some combination thereof.
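Two of the listed techniques reduce to short formulas, sketched below under standard textbook assumptions: direct time-of-flight converts a measured round-trip time into distance as d = c·t / 2, and rectified stereo converts a pixel disparity into depth as z = f·B / disparity. The function and parameter names are illustrative and are not taken from this disclosure.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0

def tof_depth(round_trip_time_s):
    """Direct time-of-flight: light travels to the surface and back."""
    return SPEED_OF_LIGHT_M_PER_S * round_trip_time_s / 2.0

def stereo_depth(focal_length_px, baseline_m, disparity_px):
    """Rectified stereo: depth is inversely proportional to disparity."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px

# A 20 ns round trip corresponds to roughly 3 m.
d_tof = tof_depth(20e-9)
# Two cameras 10 cm apart, 500 px focal length, 25 px disparity.
d_stereo = stereo_depth(500.0, 0.1, 25.0)
```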
[0048] The DCA may include an eye-tracking unit that determines eye-tracking information. The eye-tracking information may include information about the position and orientation of one or both eyes (within their respective eye boxes). The eye-tracking unit may include one or more cameras. The eye-tracking unit estimates the angular orientation of one or both eyes based on images acquired by the one or more cameras. In some embodiments, the eye-tracking unit may also include one or more illuminators that illuminate one or both eyes with an illumination pattern (e.g., structured light, flash, etc.). The eye-tracking unit can use the illumination pattern in the acquired images to determine the eye-tracking information. The head-mounted device 100 may prompt a user to opt in to allow the eye-tracking unit to operate. For example, after the user opts in, the head-mounted device 100 may detect and store any images of the user or the user's eye-tracking information.
[0049] The eye-tracking unit determines the user's gaze direction based on information related to the position and orientation of one or both of the user's eyes. For example, the eye-tracking unit determines a vector or ray representing the user's gaze point relative to the user's head position. In various embodiments, the eye-tracking unit determines the user's gaze point for each eye based on the position and orientation of each of the user's eyes. In various embodiments, the eye-tracking unit may employ various models or combinations of models to determine the user's gaze direction based on the position and orientation information of one or both of the user's eyes.
An audio system provides audio content. The audio system includes a transducer array, a sensor array, and an audio controller 150. However, in other embodiments, the audio system may include different components and / or additional components. Similarly, in some cases, the functions described with reference to the components of the audio system may be distributed among the components in a manner different from that described herein. For example, some or all of the functions of the controller may be performed by a remote server.
[0050] In various embodiments, the DCA is coupled to the object detection module, as further described below in conjunction with Figure 2. The object detection module generates a local region model based on the depth information calculated by the DCA. This local region model is a 3D reconstruction of the local area surrounding the head-mounted device 100. Therefore, the local region model provides an indication of the distances between the head-mounted device 100 and various parts of the local area surrounding the head-mounted device 100. As further described below in conjunction with Figures 2 to 5, the object detection module also receives images of the local area from one or more imaging devices 130 and detects objects within the local area from the acquired images. For each detected object, the object detection module uses the local region model to determine the object's position within the local region model, thereby allowing the head-mounted device 100 to recognize the object and determine its distance and orientation relative to the head-mounted device 100. As further described below in conjunction with Figures 2 to 5, storing the location of an object in the local region model allows the head-mounted device 100 to locate the object, or to account for its location in the local region model, when displaying one or more interface elements via the display element 120.
[0051] A transducer array presents sound to the user. This transducer array includes multiple transducers. The transducers may be speakers 160 or tissue transducers 170 (e.g., bone conduction transducers or cartilage conduction transducers). Although speakers 160 are shown external to frame 110, they may be enclosed within frame 110. In some embodiments, the head-mounted device 100 includes a speaker array instead of individual speakers for each ear, comprising multiple speakers integrated into frame 110 to improve the directionality of the presented audio content. Tissue transducers 170 are coupled to the user's head and directly vibrate the user's tissue (e.g., bone or cartilage) to generate sound. The number and/or locations of the transducers may differ from what is shown in Figure 1A.
[0052] A sensor array detects sound within a localized area of a head-mounted device 100. The sensor array includes multiple acoustic sensors 180. Each acoustic sensor 180 collects sound emitted from one or more sound sources within a localized area (e.g., a room). Each acoustic sensor is configured to detect sound and convert the detected sound into an electronic format (analog or digital). The acoustic sensor 180 may be a sound wave sensor, a microphone, a sound transducer, or a similar sensor suitable for detecting sound.
[0053] In some embodiments, one or more acoustic sensors 180 may be placed in the ear canal of each ear (e.g., acting as binaural microphones). In some embodiments, these acoustic sensors 180 may be placed on the outer surface of the head-mounted device 100, on the inner surface of the head-mounted device 100, separate from the head-mounted device 100 (e.g., as part of some other device), or some combination of the above locations. The number and/or locations of the acoustic sensors 180 may differ from what is shown in Figure 1A. For example, the number of acoustic detection locations can be increased to increase the amount of audio information collected and improve the sensitivity and/or accuracy of that information. The acoustic detection locations can be oriented such that the microphones can detect sound in a wide range of directions around the user wearing the head-mounted device 100.
[0054] Audio controller 150 processes information from the sensor array describing the sound detected by the sensor array. Audio controller 150 may include a processor and a computer-readable storage medium. Audio controller 150 may be configured to generate direction of arrival (DOA) estimates, generate acoustic transfer functions (e.g., array transfer functions and/or head-related transfer functions), track the location of sound sources, form beams in the direction of sound sources, classify sound sources, generate sound filters for speakers 160, or some combination thereof.
[0055] Position sensor 190 generates one or more measurement signals in response to motion of head-mounted device 100. Position sensor 190 may be located on a portion of frame 110 of head-mounted device 100. Position sensor 190 may include an inertial measurement unit (IMU). Examples of position sensor 190 include one or more accelerometers, one or more gyroscopes, one or more magnetometers, other suitable types of sensors for detecting motion, a class of sensors for error correction of the IMU, or some combination thereof. Position sensor 190 may be located external to the IMU, internal to the IMU, or some combination thereof.
[0056] In some embodiments, the head-mounted device 100 can provide simultaneous localization and mapping (SLAM) for updating the position of the head-mounted device 100 and the model of the local region. For example, the head-mounted device 100 may include a passive camera assembly (PCA) that generates color image data. The PCA may include one or more RGB (red, green, blue) cameras that acquire images of some or all of the local region. In some embodiments, some or all of the imaging devices 130 of the DCA may also be used as the PCA. The images acquired by the PCA and the depth information determined by the DCA can be used to determine parameters of the local region, generate a model of the local region, update the model of the local region, or some combination thereof. In addition, the position sensor 190 tracks the position (e.g., location and orientation) of the head-mounted device 100 within a room. Additional details regarding the components of the head-mounted device 100 are discussed below in conjunction with Figure 6.
[0057] Figure 1B is a perspective view of a head-mounted device 105 implemented as an HMD according to one or more embodiments. In embodiments describing AR and/or MR systems, a portion of the front of the HMD is at least partially transparent in the visible wavelength range (approximately 380 nm to 750 nm), and a portion of the HMD between the front of the HMD and the user's eyes is at least partially transparent (e.g., a partially transparent electronic display). The HMD includes a front rigid body 115 and a band 175. The head-mounted device 105 includes many of the same components described above with reference to Figure 1A, but modified to integrate with the HMD form factor. For example, the HMD includes a display assembly, a DCA, an audio system, and a position sensor 190. Figure 1B shows an illuminator 140, multiple speakers 160, multiple imaging devices 130, multiple acoustic sensors 180, and a position sensor 190. These speakers 160 can be located in various positions, such as being coupled to the band 175 (as shown), coupled to the front rigid body 115, or configured to be inserted into a user's ear canal.
[0058] Figure 2 is a block diagram of one embodiment of the object detection module 200. In various embodiments, the object detection module 200 is included in the head-mounted device 100. For example, the object detection module 200 is included in or coupled to the frame 110 of the head-mounted device 100. In other embodiments, the object detection module 200 is physically separate from the frame 110 of the head-mounted device 100 and communicatively coupled to one or more components of the frame 110. For example, the object detection module 200 is included in a server or other computing device that is communicatively coupled to the frame 110 via a network or other communication channel. In the example shown in Figure 2, the object detection module 200 includes an object detection model 205, a position tracker 210, a local region modeler 215, an object mapping module 220, and an object index 225. In other embodiments, the object detection module 200 includes additional, different, or fewer components than those described in conjunction with Figure 2.
[0059] Furthermore, the object detection module 200 includes a processor and one or more non-transitory computer-readable storage media. The one or more non-transitory computer-readable storage media have instructions encoded thereon that, when executed by the processor, cause the processor to provide the functionality further described in conjunction with Figure 2.
[0060] Object detection model 205 includes one or more trained models that detect objects in images of the local region surrounding head-mounted device 100. For example, one or more imaging devices 130 acquire images of the local region. In various embodiments, imaging devices 130 are located on the frame of head-mounted device 100, as further described above in conjunction with Figures 1A and 1B. For example, the field of view of the one or more imaging devices 130 at least partially overlaps with the field of view of a user wearing head-mounted device 100.
[0061] In various embodiments, the object detection model 205 includes a category classifier. The category classifier is a trained model applied to received images of local regions from one or more imaging devices 130. In various embodiments, the category classifier is a trained region-based convolutional neural network (R-CNN). The category classifier identifies one or more categories or types of objects contained within a region of the image based on the characteristics of different regions of the local region image. Instead of identifying specific objects from the image, the category classifier identifies regions in the image that contain one or more categories or types of objects, and the category classifier is trained for these categories or types. For example, the category classifier might identify a region in the image containing an object of category "cup" as a candidate object, but it does not distinguish between different objects of category "cup" in the image. Therefore, the category classifier identifies regions in the image of a local region that may contain one or more categories of objects.
[0062] In various embodiments, alternatively or additionally, the object detection model 205 includes an instance classifier. The instance classifier detects specific objects in images from the imaging device 130. To detect a specific object, a registration process is first initiated and performed by a user of the head-mounted device 100 to register the specific object with the instance classifier. The instance classifier detects the identified specific object, rather than the object's category, thereby allowing it to distinguish different objects of the same category.
[0063] In some embodiments, object detection model 205 includes a zero-shot object detection model. The zero-shot object detection model utilizes an open-vocabulary object detector that detects objects in an image based on free-text queries without requiring fine-tuning of the model using a labeled dataset. Open-vocabulary detection is achieved by embedding a free-text query using a text encoder and using it as input to an object classification and localization head. The zero-shot instance-level object detection model can identify a specific instance without requiring any prior training data from the user for the object instance (e.g., “blue pen”). In one or more example embodiments, the zero-shot object detection model can identify objects based on free-text queries. For example, in response to the question “Do you see my blue pen around here?”, the zero-shot instance-level object detection model can identify the specific pen without any prior labeled training data for the queried specific pen.
[0064] In some embodiments, an instance classifier is a machine learning model that includes a set of weights. These weights are parameters used by the machine learning model to transform input data received by the model into output data. For an instance classifier, the input data includes one or more images of an object, and the output is a label applied to the object, which is used by a user to identify the object. The weights can be generated through a training process, wherein the machine learning model is trained based on a set of training examples and labels associated with those training examples. In various embodiments, the training process includes: applying the machine learning model to the training examples, comparing the output of the machine learning model with the labels associated with the training examples, and updating the weights associated with the machine learning model through a backpropagation process. The weights can be stored on one or more computer-readable media to constitute the instance classifier. The training examples are images of identified objects acquired by one or more imaging devices 130. Subsequently, the instance classifier receives one or more images and detects objects in the one or more images. Thus, the instance classifier can detect specific objects in the images, while the category classifier identifies the type or category of objects contained in the one or more images. In various embodiments, the object detection model 205 includes a category classifier and an instance classifier.
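The training loop described in the preceding paragraph (apply the model, compare its output with the label, update the weights) can be illustrated with a deliberately small stand-in. The following Python sketch is only an illustration under stated assumptions: it uses a single-layer logistic model over hand-made two-element feature vectors rather than images, so backpropagation reduces to one gradient step per example. The names `predict` and `train_instance_classifier`, the learning rate, and the feature values are hypothetical and are not taken from this disclosure.

```python
import math

def predict(weights, bias, features):
    """Return the probability that the features belong to the registered object."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def train_instance_classifier(examples, labels, epochs=500, lr=0.5):
    """Fit weights by repeatedly applying the model, comparing its output
    with the label, and updating the weights along the gradient."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(examples, labels):
            error = predict(weights, bias, features) - label  # output vs. label
            # Gradient of the cross-entropy loss for a single logistic unit.
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias

# Hypothetical feature vectors: 1 = the registered object, 0 = anything else.
examples = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels = [1, 1, 0, 0]
weights, bias = train_instance_classifier(examples, labels)
```

An image-based instance classifier would replace the feature vectors with learned image features and the single logistic unit with a multi-layer network, but the apply, compare, and update cycle has the same shape.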
[0065] For a region in an image where the object detection model 205 detects an object, the object detection model 205 also specifies the size of the object. In various embodiments, the object detection model 205 determines the bounding box of the object, which is included in the region enclosed by the bounding box in the image. The object detection model 205 determines the size of the bounding box of each object based on the characteristics of the region including the object, so different objects may be surrounded by bounding boxes of different sizes. Furthermore, in various embodiments, the object detection model 205 determines the size of the bounding box without user input, thereby simplifying the identification of objects in the image. In various embodiments, the object detection model 205 identifies the coordinates of the bounding box of each detected object in the image and associates an object identifier with each bounding box to identify different objects. The object detection model 205 also associates a label with each detected object, wherein the label identifies: a category corresponding to the object, an instance corresponding to the object, or a combination of a category corresponding to the object and an instance corresponding to the object.
[0066] Although Figure 2 shows an example where object detection model 205 is included in object detection module 200, in other embodiments object detection model 205 is included in a device separate from object detection module 200. For example, object detection module 200 is included in frame 110 of head-mounted device 100, while a server or other computing device communicatively coupled to head-mounted device 100 executes object detection model 205. In the preceding example, the computing device executing object detection model 205 receives images of the local region from one or more imaging devices 130 included in head-mounted device 100 and transmits information to head-mounted device 100 identifying one or more objects detected in the one or more images.
[0067] Position tracker 210 acquires one or more measurement signals from one or more position sensors 190. One or more of these measurement signals are generated in response to motion of head-mounted device 100. In some embodiments, one or more measurement signals are received from a depth camera assembly (DCA) of head-mounted device 100 or one or more other depth sensors included in head-mounted device 100. Position tracker 210 determines the position of head-mounted device 100 in a local region surrounding head-mounted device 100 based on the one or more measurement signals. When determining the position of head-mounted device 100, position tracker 210 determines the coordinate system of the local region surrounding head-mounted device 100 and the position of head-mounted device 100 in the coordinate system of that local region. In various embodiments, position tracker 210 uses one or more Simultaneous Localization and Mapping (SLAM) methods to determine the three-dimensional position and orientation of head-mounted device 100 in the local region.
[0068] In various embodiments, the position tracker 210 performs one or more methods to reduce errors in the determined position of the head-mounted device 100 in a local region over time. When a user moves within a local region while wearing the head-mounted device 100, the previously determined position of the head-mounted device 100 deviates from a corresponding feature (e.g., a corresponding object) in the local region. To compensate for such errors, the position tracker 210 caches features of the local region (e.g., a reference position or reference object in the local region) and user head transformations determined at different times for a specific position in the local region, where the transformations represent the position and orientation of the head-mounted device 100 in the local region. The position tracker 210 matches the reference position or reference object in images acquired by one or more imaging devices 130 and modifies the head transformation for the user based on a comparison of the reference position or reference object with the cached reference position or reference object. Modifying the transformation for the user based on the comparison allows the position tracker 210 to reduce the cumulative error in the determined position of the head-mounted device 100 in the local region.
[0069] Local region modeler 215 receives data from the depth camera assembly (DCA) of head-mounted device 100 and generates a local region model, which is a three-dimensional representation of a local area surrounding head-mounted device 100. Based on depth information from the DCA or other depth sensors, the local region model includes information that roughly identifies the position and shape of objects within the local region. Different distances between different parts of the local region model and head-mounted device 100 are determined by the depth information; therefore, the local region model provides a representation of the distances between head-mounted device 100 and different parts of the local region.
[0070] Object mapping module 220 identifies the position of objects detected by object detection model 205 within the local region model. Object mapping module 220 receives an image from imaging device 130 of head-mounted device 100, and a set of objects detected in the image by object detection model 205. In various embodiments, each object in the set of objects detected in the image includes a corresponding bounding box and one or more labels (e.g., category label, instance label). Object mapping module 220 determines the position of the object in the local region model by determining a ray passing through the bounding box of the object and projecting that ray onto the local region model. Object mapping module 220 identifies the position of the object as the location where the ray passing through the bounding box of the object intersects the local region model, as further described below in conjunction with Figures 3 and 4. In addition, the object mapping module 220 determines the size of the object in the local region model based on the size of the object's bounding box, as further described below in conjunction with Figures 3 and 4.
[0071] The object mapping module 220 determines the position of an interface element to be displayed near the object by one or more display elements 120 of the head-mounted device 100, based on the determined position of the object in the local region model. For example, the object mapping module 220 determines the offset between the boundary of the interface element in the local region model and a portion of the object's bounding box, and positions a portion of the interface element at the determined offset, so that the interface element is displayed with the offset between the boundary of the interface element and a portion of the bounding box. This placement allows the interface element to be displayed near an object that is visible to the user in the local region. In other embodiments, other methods of interacting with the detected object may be used to display the interface element. For example, if the user prompts the object detection module 200 to navigate to a specific cached object, the interface element can provide navigation guidance to the location of that object (e.g., located in a different room).
[0072] In response to determining the location of an object in the local region model, object mapping module 220 updates data stored in object index 225 based on the determined object location. In various embodiments, object mapping module 220 stores the object's location in the local region model in combination with one or more labels of the object determined by object detection model 205. For example, object mapping module 220 determines coordinates in the local region model and associates these coordinates with one or more labels identifying the object's category or instance. The combination of coordinates and one or more labels is stored in object index 225 for subsequent retrieval of the object's location in the local region.
[0073] Object index 225 includes combinations of previously determined object coordinates within a local region model, each of which is associated with one or more labels. In various embodiments, object mapping module 220 updates the stored combinations of object coordinates within the local region model that correspond to one or more labels. Object mapping module 220 compares the combinations of coordinates within the local region model with one or more labels stored in object index 225. In response to determining that a combination of coordinates within the local region model with one or more labels does not match at least one such combination in object index 225, object mapping module 220 stores the combinations of coordinates within the local region model with one or more labels in object index 225 to identify newly detected objects. In various embodiments, object mapping module 220 determines whether the combination of coordinates within the local region model with one or more labels is within a threshold distance of the combinations of coordinates within the local region model with one or more labels stored in object index 225. The object mapping module 220 updates the combination of coordinates within the local region model and one or more labels stored in the object index 225 to include the determined combination of coordinates within the local region model, in response to determining that the combination of coordinates within the local region model and one or more labels is within a threshold distance of the combination of coordinates within the local region model and one or more labels stored in the object index 225. This allows the data stored in the object index 225 to be updated to reflect changes in the position of objects within the local region model over time, thereby allowing the object index 225 to maintain the current position of objects within the local region model.
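The update rule in the preceding paragraph can be sketched in a few lines of Python. This is a minimal illustration, not the claimed implementation: it assumes the index is a plain list of (label, coordinates) pairs, that distance is Euclidean, and that the threshold is 0.5 in the model's units. The function name `update_object_index` and all values are hypothetical.

```python
import math

def update_object_index(index, label, position, threshold=0.5):
    """Update a list-based index of (label, position) entries.

    If an entry with the same label lies within `threshold` of `position`,
    its stored coordinates are replaced; otherwise a new entry is appended.
    Returns the list position of the entry that was written.
    """
    for i, (stored_label, stored_position) in enumerate(index):
        if stored_label == label and math.dist(stored_position, position) <= threshold:
            index[i] = (label, position)  # same object, refreshed location
            return i
    index.append((label, position))  # no match: treat as a newly detected object
    return len(index) - 1

index = []
update_object_index(index, "cup", (1.0, 0.0, 2.0))
update_object_index(index, "cup", (1.1, 0.0, 2.0))  # within threshold: update
update_object_index(index, "cup", (4.0, 0.0, 2.0))  # far away: new entry
```

After these three calls the index holds two "cup" entries: the first at its refreshed coordinates and a second, distinct one. A spatial structure such as a grid or k-d tree could replace the linear scan when the index grows large.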
[0074] In some embodiments, the object mapping module 220 may group objects together to form a single object. In these embodiments, the object mapping module 220 determines whether the bounding boxes of two objects sufficiently intersect and whether the two objects have the same label. If so, the object mapping module 220 may group the two objects into a single object. In other embodiments, the object mapping module 220 may group objects based on the sufficient intersection of their shapes (e.g., three-dimensional or two-dimensional shapes).
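One way to read "sufficiently intersect" is an intersection-over-union (IoU) test on two-dimensional bounding boxes. The Python sketch below makes that assumption explicit; the 0.5 IoU cutoff, the choice to merge into the smallest enclosing box, and the names `iou` and `merge_if_same_object` are illustrative and are not specified by this disclosure.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def merge_if_same_object(det_a, det_b, min_iou=0.5):
    """Return one merged (label, box) detection when both detections share a
    label and overlap enough; otherwise return None."""
    (label_a, box_a), (label_b, box_b) = det_a, det_b
    if label_a != label_b or iou(box_a, box_b) < min_iou:
        return None
    merged = (min(box_a[0], box_b[0]), min(box_a[1], box_b[1]),
              max(box_a[2], box_b[2]), max(box_a[3], box_b[3]))
    return (label_a, merged)

merged = merge_if_same_object(("cup", (0, 0, 10, 10)), ("cup", (2, 0, 12, 10)))
different = merge_if_same_object(("cup", (0, 0, 10, 10)), ("pen", (2, 0, 12, 10)))
```

Here the two "cup" boxes overlap with an IoU of 80/120 and are merged, while the "cup" and "pen" pair is left separate because the labels differ.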
[0075] Figure 3 is a flowchart of a method for determining the position of an object in a model of a local region surrounding a head-mounted device 100, according to one or more embodiments. The process shown in Figure 3 can be performed by components of an object detection system (e.g., object detection module 200). In other embodiments, other entities can perform some or all of the steps of the process shown in Figure 3. Embodiments may include different or additional steps, or may perform the steps in a different order.
[0076] As further described above in conjunction with Figures 1A and 1B, the head-mounted device 100 includes one or more imaging devices 130 that acquire 305 images of a local area surrounding the head-mounted device 100. In various embodiments, the imaging devices 130 are positioned on the frame 110 such that the field of view of the imaging devices 130 at least partially overlaps with the field of view of a user wearing the head-mounted device 100. Such embodiments allow the images acquired 305 by the one or more imaging devices 130 to reflect the view of the local area as seen by the user. In various embodiments, the local area includes one or more objects that the user can see through the head-mounted device 100.
[0077] To enhance the visibility of objects in the local area to the user using additional information generated or received by the head-mounted device 100, the head-mounted device 100 detects one or more objects in one or more images of the local region acquired by the imaging device 130. As further described above in conjunction with Figure 2, in various embodiments, the head-mounted device 100 includes an object detection module 200 that applies one or more object detection models 205 to images of the local region from the imaging device 130. The object detection model 205 identifies regions in the image containing objects based on characteristics of different regions of the image. As further described above in conjunction with Figure 2, in various embodiments, the object detection model 205 can identify the category of an object in an image, can identify instances of an object in an image, or can identify a combination of the category of an object and instances of an object in an image.
[0078] When an object is detected 310 in an image, the size of each object is identified in the image where the object is detected. Object detection model 205 determines 315 a bounding box for each object to represent the size of the region in the image corresponding to that object. The region in the image corresponding to the object is enclosed within the bounding box to distinguish the object from other regions of the image. Object detection model 205 determines 315 the size of the bounding box based on the characteristics of the region containing the object; therefore, different objects may be enclosed by bounding boxes of different sizes. In various embodiments, the size of the bounding box of the region containing the object in the image can be determined 315 without user input. For example, head-mounted device 100 identifies the coordinates within the image of the bounding box of each region in the image that includes a detected object and associates an identifier with each detected object and its corresponding bounding box. In various embodiments, one or more labels are associated with the coordinates of each bounding box in the image. The labels may identify the following: the category of the detected object, an instance of the detected object, a combination of an instance of the detected object and its category, or other information describing or identifying the detected object. Alternatively or additionally, an object identifier is associated with the bounding box to distinguish the bounding box and its corresponding object from other objects in the image.
[0079] One or more depth sensors (e.g., a depth camera assembly (DCA)) included in the head-mounted device 100 acquire measurement signals describing the local region. These measurement signals include depth information that identifies the distances between multiple parts of the local region (e.g., objects) and the head-mounted device 100. The head-mounted device 100 determines its position within the local region based on the depth information in one or more measurement signals. In various embodiments, the head-mounted device 100 determines a coordinate system of the local region surrounding the head-mounted device 100 based on the measurement signals, and determines the position of the head-mounted device 100 within that coordinate system. For example, the head-mounted device 100 uses one or more Simultaneous Localization and Mapping (SLAM) methods to determine its position within the coordinate system of the local region. As further described above in conjunction with Figure 2, in various embodiments, the head-mounted device 100 performs one or more methods to reduce errors in the determined position of the head-mounted device 100 in the local region over time.
[0080] The head-mounted device 100 determines a local region model of the local area surrounding the head-mounted device 100 based on measurement signals and the position of the head-mounted device 100 within a local region. The local region model is a three-dimensional representation of the local area surrounding the head-mounted device 100. Different parts of the local region model correspond to portions of the local area at different distances from the head-mounted device 100.
[0081] In various embodiments, the local region model includes candidate regions corresponding to potential locations of objects within the local region, wherein these candidate regions are determined based on depth information. For example, a candidate region corresponds to a region within the local region whose depth information differs from that of adjacent regions within the local region by at least a threshold amount. In the preceding example, the differences in depth information between different regions of the local region determine the boundaries of the candidate regions in the local region model. Identifying candidate regions in the local region model allows the head-mounted device 100 to utilize the depth information of the local region to identify potential regions for one or more objects within the local region model.
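A possible way to derive candidate regions from depth discontinuities is to segment a depth grid so that region boundaries fall wherever neighbouring depths differ by at least the threshold. The Python sketch below assumes the depth information is available as a small two-dimensional grid of distances and uses a breadth-first flood fill; the function name `segment_depth`, the grid, and the 0.5 threshold are hypothetical.

```python
from collections import deque

def segment_depth(depth, threshold):
    """Label connected regions of a 2D depth grid.

    Two 4-connected neighbours join the same region only if their depths
    differ by less than `threshold`, so region boundaries fall where depth
    jumps by at least `threshold`. Returns (labels, region_count).
    """
    rows, cols = len(depth), len(depth[0])
    labels = [[-1] * cols for _ in range(rows)]
    next_label = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0] != -1:
                continue  # already assigned to a region
            labels[r0][c0] = next_label
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and labels[nr][nc] == -1
                            and abs(depth[nr][nc] - depth[r][c]) < threshold):
                        labels[nr][nc] = next_label
                        queue.append((nr, nc))
            next_label += 1
    return labels, next_label

# A wall at 3.0 with a nearer object at 1.2 in the middle of the view.
depth = [
    [3.0, 3.0, 3.0, 3.0],
    [3.0, 1.2, 1.2, 3.0],
    [3.0, 1.2, 1.2, 3.0],
    [3.0, 3.0, 3.0, 3.0],
]
labels, count = segment_depth(depth, threshold=0.5)
```

In this toy grid the nearer 2x2 block is separated from the surrounding wall, giving two regions; the nearer block would be the candidate region for an object.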
[0082] For an object 310 detected in an image of a local region, head-mounted device 100 determines 325 the position of the object in the local region model based on a bounding box 315 defined for the object. In various embodiments, head-mounted device 100 determines the coordinates of the bounding box of the object in the acquired image where the object was detected 310. Head-mounted device 100 determines the center of the bounding box based on the coordinates of the bounding box using one or more methods. For example, head-mounted device 100 determines the center of the bounding box based on the width and height of the bounding box. When determining the center of the bounding box of the object, head-mounted device 100 also considers the characteristics of the imaging device 130 that acquired the image of the local region. For example, the head-mounted device (e.g., the object mapping module 220 of the object detection module 200) determines the height and width of the imaging device 130 to take into account the resolution of the imaging device 130, and modifies the center of the determined bounding box based on the height and width of the camera, so that the size of the bounding box used to determine its center takes into account the parameters of the imaging device (e.g., the resolution or aspect ratio of the imaging device 130).
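As a small illustration of the bounding-box center computation, the Python sketch below derives the center from the box's position, width, and height, and then divides by the image width and height. Reading "takes into account the resolution of the imaging device" as a normalization to the range 0 to 1 is an assumption, as are the function name `bounding_box_center` and the (x, y, width, height) box layout.

```python
def bounding_box_center(box, image_width, image_height):
    """Return the box centre in pixels and in normalized [0, 1] image coordinates.

    `box` is (x, y, width, height), with (x, y) the top-left corner in pixels.
    """
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0
    return (cx, cy), (cx / image_width, cy / image_height)

# Hypothetical 200x100 box in a 640x480 image.
pixel_center, normalized_center = bounding_box_center((100, 50, 200, 100), 640, 480)
```

Normalized coordinates make the result independent of the resolution at which a particular imaging device captured the image.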
[0083] After determining the center of the bounding box of the object, the head-mounted device 100 (e.g., object detection module 200) generates a ray that intersects the center of the bounding box in a coordinate system corresponding to the local region model. In various embodiments, the ray is perpendicular to the bounding box and intersects the bounding box at its center. When determining the ray passing through the center of the bounding box, the head-mounted device 100 uses an imaging device transformation and an imaging device projection transformation. The imaging device transformation is based on the position and orientation of the imaging device 130 within the local region (determined by the position sensor 190 of the head-mounted device); the imaging device projection transformation is determined based on the imaging device transformation (the position and orientation of the imaging device 130 within the local region) combined with parameters of the imaging device 130 (e.g., focal length, pixel size, image origin, etc.). For example, the center of the bounding box containing the object is scaled based on the imaging device transformation and the imaging device projection transformation to generate a ray passing through the center of the bounding box in a coordinate system corresponding to the local region model.
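The ray construction can be sketched with a standard pinhole camera model. The Python example below is an illustration under assumptions rather than the disclosed transformations themselves: the intrinsic parameters (`fx`, `fy`, `cx`, `cy`) stand in for the imaging device parameters, a 3x3 camera-to-world rotation plus a position stand in for the imaging device transformation, and the camera is assumed to look along its +z axis. The function name `pixel_to_world_ray` is hypothetical.

```python
import math

def pixel_to_world_ray(u, v, fx, fy, cx, cy, rotation, position):
    """Return (origin, unit direction) of the ray through pixel (u, v).

    fx, fy, cx, cy are pinhole intrinsics in pixels; `rotation` is a 3x3
    camera-to-world matrix (row-major nested lists) and `position` is the
    camera location in the coordinate system of the local region model.
    """
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)  # camera looks along +z
    d_world = [sum(rotation[i][j] * d_cam[j] for j in range(3)) for i in range(3)]
    norm = math.sqrt(sum(c * c for c in d_world))
    return tuple(position), tuple(c / norm for c in d_world)

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Bounding-box centre at the principal point of a 640x480 image.
origin, direction = pixel_to_world_ray(320, 240, 500, 500, 320, 240,
                                       identity, (0, 1.6, 0))
```

With an identity rotation and the pixel at the principal point, the ray leaves the camera position straight along the viewing axis; a pixel to the right of the principal point tilts the ray toward +x.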
[0084] The head-mounted device 100 determines 325 the position of an object in the local region model as the position where a ray through the center of the object's bounding box intersects the local region model. In various embodiments, in response to the ray from the center of the object's bounding box intersecting at least a portion of a candidate region included in the local region model, the head-mounted device 100 determines the position of the object in the local region model as the position of that candidate region. In some embodiments, the head-mounted device 100 may determine 325 the position of the object as the candidate region in response to the ray intersecting the local region model within a threshold distance of the candidate region. Furthermore, in some embodiments, the head-mounted device 100 ignores a detected object in response to the distance between the candidate region and the point where the ray intersects the local region model being greater than the threshold distance. This allows the head-mounted device 100 to take the depth information of the local region into account, mapping the object to a portion of the local region model corresponding to a region whose depth information differs from that of adjacent regions of the local region.
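The threshold test described above can be sketched as follows. The sketch assumes that the point where the ray meets the local region model has already been computed, and that each candidate region is approximated by an axis-aligned box; both are illustrative assumptions, and the names `point_to_box_distance` and `assign_candidate_region` are hypothetical.

```python
import math


def point_to_box_distance(point, box_min, box_max):
    """Euclidean distance from a point to an axis-aligned box (0 if inside)."""
    return math.sqrt(sum(
        max(lo - p, 0.0, p - hi) ** 2
        for p, lo, hi in zip(point, box_min, box_max)
    ))


def assign_candidate_region(hit_point, regions, threshold):
    """Return the id of the nearest candidate region within `threshold`.

    hit_point: point where the ray intersects the model.
    regions: dict mapping a region id to (box_min, box_max).
    Returns None when no region is close enough, in which case the
    detected object would be ignored.
    """
    best_id, best_dist = None, float("inf")
    for region_id, (box_min, box_max) in regions.items():
        dist = point_to_box_distance(hit_point, box_min, box_max)
        if dist < best_dist:
            best_id, best_dist = region_id, dist
    return best_id if best_dist <= threshold else None
```

A hit point inside a box has distance zero, so direct intersection with a candidate region is covered by the same test as the near-miss case.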
[0085] In addition to determining the position of the object in the local region model, the head-mounted device 100 also determines the distance from the head-mounted device 100 to the object. In various embodiments, the head-mounted device 100 determines the distance between itself and the object in the local region model by scaling the distance using an imaging device transformation. Since the imaging device transformation is based on the position and orientation of the imaging device 130 within the local region, determining the distance between the head-mounted device 100 and the object based on the imaging device transformation improves the accuracy of determining the distance by taking into account the position and orientation of the head-mounted device 100 within the local region.
[0086] Furthermore, in some embodiments, the head-mounted device 100 determines the size of an object in the local region model. To do so, the head-mounted device 100 first determines the size of the object in the image in which the object was detected. In various embodiments, the head-mounted device 100 determines the size of the object in the image based on the size of the bounding box corresponding to the object. The head-mounted device 100 scales the determined size of the object in the image based on parameters of the imaging device 130 that acquired the image and on the imaging device projection transformation to determine the size of the object in the local region model. For example, the head-mounted device 100 determines the size of the object in the local region model as the ratio of the size of the object in the image (e.g., height, width) to the product of the corresponding size (e.g., height, width) of the imaging device 130 that acquired the image and the corresponding element of the imaging device projection transformation. In the preceding example, the head-mounted device 100 determines both the width and the height of the object, so that the representation of the object in the local region model reflects the size of the object in the local region.
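The ratio in the preceding example can be written out directly. This is only a sketch of the stated arithmetic: the reading of "corresponding element" as one scalar per axis (`proj_x`, `proj_y`) is an assumption, and all names are hypothetical.

```python
def model_size(bbox_w, bbox_h, device_w, device_h, proj_x, proj_y):
    """Size of the object in the model, per the ratio described above.

    bbox_w, bbox_h: size of the object's bounding box in the image.
    device_w, device_h: width and height of the imaging device.
    proj_x, proj_y: assumed per-axis elements of the imaging device
        projection transformation.
    """
    width = bbox_w / (device_w * proj_x)
    height = bbox_h / (device_h * proj_y)
    return width, height
```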
[0087] In response to determining 325 the position of the object in the local region model, the head-mounted device 100 stores data identifying the object together with the object's position in the local region model. In some embodiments, the data identifying the object includes one or more labels determined for the object, such as the object's category or instance. In other embodiments, the data identifying the object is the already identified object. For example, the head-mounted device 100 stores the determined 325 coordinates of the object in the local region model and one or more labels identifying the object's category or instance in the object index 225, as further described above in conjunction with Figure 2. In various embodiments, the head-mounted device 100 updates the previously determined 325 coordinates of an object to the most recently determined 325 coordinates of the object in the local region model, thereby enabling the head-mounted device 100 to maintain the most recent position of the object in the local region model. For example, the head-mounted device 100 compares a combination of the object's coordinates in the local region model and one or more labels of the object to a stored combination of coordinates in the local region model and one or more stored labels. In response to the determined position of the object in the local region model being within a threshold distance of the stored coordinates, and at least a threshold amount of the object's labels matching the labels associated with the stored coordinates (or at least a threshold amount of the data identifying the object matching data associated with the stored coordinates), the head-mounted device 100 updates the stored coordinates associated with the stored object to the coordinates of the determined object. This allows the head-mounted device 100 to update the stored object's position in the local region using the most recently determined 325 object position.
In various embodiments, the head-mounted device 100 stores an object identifier associated with the combination of the determined 325 position of the object in the local region model and one or more labels of the object, to simplify updating the object's position in the local region model and to simplify subsequent retrieval and identification of that position.
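The update rule for stored object positions can be sketched as below. The index layout, the function name `upsert_object`, and the use of label overlap (shared labels divided by all labels) as the "threshold amount" of matching labels are assumptions made for illustration only.

```python
import math


def upsert_object(index, position, labels, max_distance, min_label_overlap):
    """Update a matching stored entry or add a new one; return its identifier.

    index: list of dicts with keys "id", "position", and "labels" (a set).
    A stored entry matches when it lies within `max_distance` of `position`
    and shares at least `min_label_overlap` of the combined labels.
    """
    labels = set(labels)
    for entry in index:
        close = math.dist(entry["position"], position) <= max_distance
        union = labels | entry["labels"]
        overlap = len(labels & entry["labels"]) / max(len(union), 1)
        if close and overlap >= min_label_overlap:
            # Keep only the most recently determined position.
            entry["position"] = tuple(position)
            return entry["id"]
    new_id = len(index)
    index.append({"id": new_id, "position": tuple(position), "labels": labels})
    return new_id
```

Returning the identifier in both branches mirrors the role of the object identifier described above: later lookups can use it without repeating the distance and label comparison.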
[0088] In various embodiments, based on the determined 325 position of the object in the local region model and the determined size of the object in the local region model, the head-mounted device 100 generates 330 an interface element for the object, to be displayed by one or more display elements 120 of the head-mounted device 100. The position of the interface element in the local region model is based on the position of the object in the local region model. For example, a portion of the interface element contacts a portion of the object's bounding box in the local region model. As another example, the head-mounted device 100 determines an offset between a boundary of the interface element in the local region model and a portion of the object's bounding box, and positions the interface element at the determined offset. This allows the interface element to be displayed close to the object within the local region, making it easier to present information about the object to the user. For example, in response to the head-mounted device 100 receiving one or more inputs from the user, the interface element is displayed to the user via display element 120.
[0089] In some embodiments, the interface element displays information identifying the object (e.g., a label or identifier for the object specified by the user). In some embodiments, the interface element displays characteristics of the object, or other information about the object, that the head-mounted device 100 received from the user. The interface element therefore allows the head-mounted device 100 to display information about the object to the user near the object. Determining the object's position in the local region model allows the display element 120 of the head-mounted device 100 to display the interface element near the object as seen by the user within the local region. Furthermore, generating the interface element based on the object's position in the local region model allows the head-mounted device 100 to update the display position of the interface element, so that when the object's position relative to the head-mounted device 100 changes, the position of the interface element in the display element 120 changes accordingly.
[0090] Furthermore, in some embodiments, the head-mounted device 100 utilizes stored information identifying the location of a determined object in a local region model to display information guiding the user to the object. For example, the head-mounted device 100 receives input from the user identifying an object and requesting navigation to that object. Based on the information identifying the object in the input (e.g., a label associated with the object), the head-mounted device 100 retrieves the stored location of the object in the local region model. For example, the head-mounted device 100 retrieves the coordinates of the object in the local region model stored in object index 225 that are associated with a label matching a label in the received input, or stored in association with an object identifier contained in the received input.
[0091] As further described above in conjunction with Figure 2, the head-mounted device 100 determines its position within the local region and its corresponding position within the local region model. Based on its position within the local region model and the stored position of the object within the local region model, the head-mounted device 100 determines a route from its own position to the stored position of the object. To guide the user to the object, the head-mounted device 100 displays one or more messages that include at least a portion of the route to the stored position of the object within the local region model. For example, the head-mounted device 100 displays prompts via one or more display elements 120 that direct the user to move in a specific direction or to face a specific direction, such that following the prompts reduces the distance between the position of the head-mounted device 100 and the stored position of the object within the local region model. This allows the head-mounted device 100 to use the previously determined position of an object in the local region to guide the user wearing the head-mounted device 100 to the object, making the object easier to find.
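One way to turn the two positions into a prompt is sketched below. It assumes a straight-line route on a horizontal plane that ignores obstacles, a heading measured counterclockwise from the +x axis, and hypothetical prompt strings and tolerances; the disclosure does not prescribe any of these.

```python
import math


def guidance_prompt(device_xy, heading_rad, object_xy,
                    arrive_radius=0.5, facing_tolerance_rad=math.radians(15)):
    """Return a prompt that reduces the distance to the stored object position.

    device_xy, object_xy: (x, y) positions in the model's horizontal plane.
    heading_rad: direction the device faces, counterclockwise from +x.
    """
    dx = object_xy[0] - device_xy[0]
    dy = object_xy[1] - device_xy[1]
    if math.hypot(dx, dy) <= arrive_radius:
        return "arrived"
    bearing = math.atan2(dy, dx)
    # Signed smallest angle from the current heading to the bearing.
    turn = math.atan2(math.sin(bearing - heading_rad),
                      math.cos(bearing - heading_rad))
    if abs(turn) <= facing_tolerance_rad:
        return "move forward"
    return "turn left" if turn > 0 else "turn right"
```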
[0092] In various embodiments, when determining the route from the position of the head-mounted device 100 in the local region model to the stored position of an object in the local region model, the head-mounted device 100 considers how recently the object's position was stored. For example, the head-mounted device 100 stores a confidence value associated with the object's position in the local region model and maintains a decay factor that decreases the confidence value over time. As the interval between the current time and the time at which the object's position was stored increases, the decay factor decreases the confidence value. In various embodiments, in response to the confidence value stored in association with the object's position in the local region model being less than a threshold confidence value, the head-mounted device 100 displays a message to the user via one or more display elements 120. This message may indicate that the head-mounted device 100 has decreased confidence that the stored position of the object in the local region is the current position of the object in the local region.
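A minimal sketch of such a decay follows. The exponential form and the half-life parameter are assumptions; the disclosure only requires that the confidence value decrease as the stored position ages. The function names are hypothetical.

```python
def decayed_confidence(initial, stored_at, now, half_life_s):
    """Confidence after exponential decay (assumed form) since `stored_at`.

    Times are in seconds; the confidence halves every `half_life_s` seconds.
    """
    age = max(now - stored_at, 0.0)
    return initial * 0.5 ** (age / half_life_s)


def is_stale(initial, stored_at, now, half_life_s, threshold):
    """True when the decayed confidence falls below the threshold,
    which is when a low-confidence message would be shown to the user."""
    return decayed_confidence(initial, stored_at, now, half_life_s) < threshold
```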
[0093] Figure 4 is an example of how a head-mounted device 100 determines the position of an object within a local region model of a local area surrounding the head-mounted device 100. Figure 4 shows an example local area 400 surrounding the head-mounted device 100. In the example of Figure 4, local region 400 includes objects 405, 410, and 415. However, in other embodiments, local region 400 includes a different number of objects. Each of these objects is located within the field of view of one or more imaging devices 130 of the head-mounted device 100, which are positioned to acquire an image of local region 400. Local region 400 is visible to a user wearing head-mounted device 100, so Figure 4 shows local region 400 from the perspective of a user wearing the head-mounted device 100. At least one imaging device 130 of the head-mounted device 100 has a viewing angle of the local region 400 that matches the user's viewing angle of the local region 400, so that the imaging device 130 can acquire an image of the local region 400 including object 405, object 410, and object 415.
[0094] As further described above in conjunction with Figure 2 and Figure 3, the head-mounted device 100 detects one or more objects in an image of the local region 400. In some embodiments, an object detection module 200 included in the head-mounted device 100 detects the objects; in other embodiments, the object detection module 200 is a discrete component communicatively coupled to the head-mounted device 100 that receives an image of the local region 400 from the imaging device 130. One or more object detection models 205 are applied to the image of the local region 400 to detect objects 405, 410, and 415 in the image of the local region 400.
[0095] When object detection module 200 detects objects 405, 410, and 415, it determines a bounding box for each detected object. The bounding box of an object specifies the boundary of the region in the image that contains the object. Therefore, the region within the bounding box in the image contains the object, while the region outside the bounding box does not contain the object. In the example of Figure 4, bounding box 420 specifies the boundary of object 405, bounding box 425 specifies the boundary of object 410, and bounding box 430 specifies the boundary of object 415. As shown in Figure 4, different bounding boxes have different sizes, reflecting the different sizes of the corresponding objects.
[0096] In addition to detecting objects in the image of local region 400 and determining the bounding box of each detected object, the head-mounted device 100 (e.g., object detection module 200) also determines a local region model 435 based on depth information or measurement signals obtained from one or more depth sensors of the head-mounted device 100. The depth information is used to identify the distance between the head-mounted device 100 and one or more regions within local region 400 (e.g., portions of objects within local region 400). Therefore, the local region model 435 is a three-dimensional representation of local region 400. In the example of Figure 4, local region model 435 includes candidate regions corresponding to the positions of objects in local region 400, wherein these candidate regions are based on depth information about the local region from the one or more depth sensors. For example, candidate regions correspond to regions in local region 400 whose depth information differs from the depth information of neighboring regions in local region 400 by at least a threshold amount. In the example of Figure 4, the local region model 435 includes candidate regions 440, 445, and 450 based on depth information obtained from one or more depth sensors. In Figure 4, each of these candidate regions is based on the difference between the distance from the head-mounted device 100 to each of objects 405, 410, and 415 and the distance from the head-mounted device 100 to other parts of the local region 400. The size of each candidate region is determined based on depth information, so different candidate regions have different dimensions (e.g., length, width), as shown in the example of Figure 4.
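The idea of candidate regions whose depth differs from that of neighboring regions by at least a threshold can be sketched on a small depth grid. The grid representation, the flood-fill grouping, and the choice to treat the largest group as background are all illustrative assumptions; the function names are hypothetical.

```python
from collections import deque


def depth_regions(depth, threshold):
    """Group grid cells into regions of similar depth.

    depth: 2D list of distances from the device. Adjacent cells join the
    same region when their depths differ by less than `threshold`.
    Returns a list of sets of (row, col) cells.
    """
    rows, cols = len(depth), len(depth[0])
    seen, regions = set(), []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen:
                continue
            region, queue = set(), deque([(r, c)])
            seen.add((r, c))
            while queue:
                cr, cc = queue.popleft()
                region.add((cr, cc))
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and (nr, nc) not in seen
                            and abs(depth[nr][nc] - depth[cr][cc]) < threshold):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            regions.append(region)
    return regions


def candidate_regions(depth, threshold):
    """Every region except the largest one (assumed to be background)."""
    regions = depth_regions(depth, threshold)
    background = max(regions, key=len)
    return [region for region in regions if region is not background]
```

Because each region is a set of cells, its extent follows from the depth data itself, which is consistent with candidate regions having different dimensions.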
[0097] Head-mounted device 100 (e.g., object detection module 200) determines the position of each of one or more objects detected in the image of local region 400 within local region model 435. As further described above in conjunction with Figure 2 and Figure 3, the head-mounted device 100 determines the position of the object in the local region model 435 based on the bounding box defined for the object and parameters of the imaging device 130 that acquires an image of the local region 400. In the example of Figure 4, head-mounted device 100 determines the position of object 410 in local region model 435. For example, head-mounted device 100 receives input from a user selecting object 410. In other examples, head-mounted device 100 selects each object detected in an image of local region 400 and determines the corresponding position of the object in local region model 435.
[0098] As further described above in conjunction with Figure 2 and Figure 3, the head-mounted device 100 determines the position of object 410 in local region model 435 based on the dimensions of the bounding box 425 of object 410. The head-mounted device 100 determines the center of the bounding box 425 based on its dimensions. The head-mounted device 100 generates a ray 455 passing through the center of the bounding box 425 based on parameters of the imaging device 130 that acquired the image of local region 400. In various embodiments, the ray 455 is perpendicular to the bounding box 425. When generating the ray 455 for the bounding box 425, parameters of imaging device 130 are taken into account, thereby generating the ray 455 in the coordinate system of local region model 435.
[0099] The head-mounted device 100 determines the location at which ray 455 intersects the local region model 435. This location specifies the position of object 410 within the local region model 435. In various embodiments, the head-mounted device 100 selects the candidate region in the local region model 435 that intersects ray 455 as the position of object 410 within the local region model 435. The head-mounted device 100 may instead determine the position of object 410 as a candidate region that is within a threshold distance of the location where ray 455 intersects the local region model 435. In the example of Figure 4, head-mounted device 100 determines that ray 455 intersects the local region model 435 within candidate region 445, and therefore determines that candidate region 445 is the location of object 410 within local region model 435. Head-mounted device 100 stores the location in local region model 435 in association with information identifying object 410, thereby allowing subsequent retrieval of the location of object 410 within local region model 435.
[0100] Having determined the position of object 410 within the local region model 435, head-mounted device 100 can display one or more interface elements to a user wearing head-mounted device 100 based on that position. Using the position of object 410 within the local region model allows one or more interface elements to be displayed close to object 410 via the display element 120 of head-mounted device 100. Figure 5 shows an example of the head-mounted device 100 displaying an interface element 505 close to an object in the local area.
[0101] As further described above in conjunction with Figure 4, the local area 400 is visible to the user wearing the head-mounted device 100, so Figure 5 shows the local area 400 from the perspective of a user wearing the head-mounted device 100. In the example shown in Figure 5, a user views objects 405, 410, and 415 in the local area 400 through the head-mounted device 100. For example, the head-mounted device 100 includes a transparent or translucent display element 120, thereby allowing the user to view the local area 400 while wearing the head-mounted device 100.
[0102] As further described above in conjunction with Figures 2 to 4, the head-mounted device 100 defines a local region model that is a three-dimensional representation of the local region 400 surrounding the head-mounted device 100. Also as described above in conjunction with Figures 2 to 4, the head-mounted device 100 determines the position of one or more objects within the local region model based on an image of the local region and depth information of the local region. The head-mounted device 100 uses the position of one or more objects within the local region model to determine the position of one or more interface elements displayed in the display element 120 while the user views the local region 400 through the head-mounted device 100. In the example of Figure 5, while the user views the local region 400 through the head-mounted device 100, the display element 120 of the head-mounted device 100 displays an interface element 505. The head-mounted device 100 positions the interface element 505 based on the position of the object 410 within the local region model, such that the interface element 505 is close to the object 410 when displayed to the user. For example, the interface element 505 includes information about the object 410 (e.g., an identifier for the object 410, user-provided characteristics of the object 410), so displaying the interface element 505 close to the object 410 allows the user to easily find information about the object 410. In the example of Figure 5, interface element 505 is positioned within the local region model of local region 400 such that a portion of interface element 505 contacts a portion of bounding box 425, which specifies the boundary of object 410. As further described above in conjunction with Figures 2 to 4, the head-mounted device 100 determines the bounding box 425 when it detects the object 410 in one or more images of the local region 400.
Alternatively, the interface element 505 is positioned in the local region model relative to the position of the object 410 in the local region model. For example, the position of a portion of the interface element 505 in the local region model has a specific offset from a portion of the bounding box 425 of the object 410 in the local region model, so the display position of the interface element 505 is determined based on the position of the object 410 in the local region model. Positioning the interface element 505 in the local region model based on the position of the object 410 in the local region model allows the interface element 505 to remain in a specific position relative to the object 410, thereby allowing the interface element 505 to be repositioned as the position of the object 410 changes.
[0103] Figure 6 shows a system 600 that includes a head-mounted device 605 according to one or more embodiments. In some embodiments, the head-mounted device 605 may be the head-mounted device 100 of Figure 1A or the head-mounted device 105 of Figure 1B. The system 600 can operate in artificial reality environments (e.g., virtual reality environments, augmented reality environments, mixed reality environments, or some combination thereof). The system 600 shown in Figure 6 includes a head-mounted device 605, an input / output (I / O) interface 610 coupled to a console 615, a network 620, and a map building server 625. Although the example system 600 illustrated in Figure 6 includes one head-mounted device 605 and one I / O interface 610, in other embodiments system 600 may include any number of these components. For example, multiple head-mounted devices may be present, each having an associated I / O interface 610, wherein each head-mounted device and I / O interface 610 communicates with a console 615. In alternative configurations, system 600 may include different components and / or additional components. Furthermore, in some embodiments, the functions described in conjunction with one or more of the components shown in Figure 6 may be distributed among the components in a manner different from that described in conjunction with Figure 6. For example, some or all of the functions of the console 615 may be provided by the head-mounted device 605.
[0104] Head-mounted device 605 includes a display assembly 630, an optical component block 635, one or more position sensors 640, and a DCA 645. Some embodiments of head-mounted device 605 have components different from those described in conjunction with Figure 6. Additionally, in other embodiments, the functions provided by the various components described in conjunction with Figure 6 may be distributed differently among the components of the head-mounted device 605, or may be embodied in separate components remote from the head-mounted device 605.
[0105] Display assembly 630 displays content to the user based on data received from console 615. Display assembly 630 uses one or more display elements (e.g., display element 120) to display content. Display elements may be, for example, electronic displays. In various embodiments, display assembly 630 includes a single display element or multiple display elements (e.g., one display for each of the user's eyes). Examples of electronic displays include: liquid crystal displays (LCDs), organic light-emitting diode (OLED) displays, active-matrix organic light-emitting diode (AMOLED) displays, waveguide displays, some other type of display, or some combination thereof. Note that in some embodiments, display element 120 may also include some or all of the functions of optical component block 635.
[0106] Optical component block 635 can magnify image light received from an electronic display, correct optical errors associated with the image light, and present the corrected image light to one or both eyeboxes of head-mounted device 605. In various embodiments, optical component block 635 includes one or more optical elements. Example optical elements included in optical component block 635 include: apertures, Fresnel lenses, convex lenses, concave lenses, filters, reflective surfaces, or any other suitable optical elements that affect image light. Furthermore, optical component block 635 can include combinations of different optical elements. In some embodiments, one or more optical elements in optical component block 635 may have one or more coatings, such as partially reflective coatings or antireflective coatings.
[0107] The magnification and focusing of image light by the optical component block 635 allows the electronic display to be physically smaller, lighter, and less power-hungry than a larger display. Furthermore, the magnification increases the field of view of the content presented on the electronic display. For example, the field of view of the displayed content is such that almost the entire field of view of the user (e.g., approximately 110 degrees diagonally) is used to present the displayed content, and in some cases the entire field of view of the user is used. Additionally, in some embodiments, the amount of magnification can be adjusted by adding or removing optical elements.
[0108] In some embodiments, the optical component block 635 may be designed to correct one or more types of optical errors. Examples of optical errors include barrel or pincushion distortion and longitudinal or lateral chromatic aberration. Other types of optical errors may include spherical aberration, chromatic aberration, errors due to lens field curvature, astigmatism, or any other type of optical error. In some embodiments, the content provided to the electronic display for display is pre-distorted, and the optical component block 635 corrects this distortion when it receives image light from the electronic display (which is generated based on the content).
[0109] Position sensor 640 is an electronic device that generates data indicating the position of head-mounted device 605. Position sensor 640 generates one or more measurement signals in response to movement of head-mounted device 605. Position sensor 190 is an embodiment of position sensor 640. Examples of position sensor 640 include one or more IMUs, one or more accelerometers, one or more gyroscopes, one or more magnetometers, another suitable type of sensor for detecting motion, or some combination thereof. Position sensor 640 may include multiple accelerometers for measuring translational motion (forward / backward, up / down, left / right) and multiple gyroscopes for measuring rotational motion (e.g., pitch, yaw, roll). In some embodiments, the IMU rapidly samples the measurement signals and calculates an estimated position of head-mounted device 605 based on the sampled data. For example, the IMU integrates the measurement signals received from the accelerometers over time to estimate a velocity vector, and integrates the velocity vector over time to determine an estimated position of a reference point on head-mounted device 605. A reference point is a point that can be used to describe the position of the head-mounted device 605. Although a reference point can generally be defined as a point in space, this reference point is actually defined as a point within the head-mounted device 605.
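The double integration described for the IMU can be sketched as follows. The sketch assumes evenly spaced samples, accelerations already expressed in the frame of the local region with gravity removed, and a simple Euler-style update; the function name `integrate_imu` is hypothetical.

```python
def integrate_imu(accel_samples, dt, velocity=(0.0, 0.0, 0.0),
                  position=(0.0, 0.0, 0.0)):
    """Estimate velocity and position of the reference point.

    accel_samples: iterable of (ax, ay, az) accelerometer readings,
        assumed gravity-compensated and taken every `dt` seconds.
    Returns (velocity, position) after the last sample.
    """
    v, p = list(velocity), list(position)
    for a in accel_samples:
        for i in range(3):
            v[i] += a[i] * dt  # integrate acceleration into velocity
            p[i] += v[i] * dt  # integrate velocity into position
    return tuple(v), tuple(p)
```

Because every sample's error is integrated twice, an estimate of this kind drifts over time, which is one reason a tracking component may combine it with other information such as the depth data from the DCA 645.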
[0110] The DCA 645 generates depth information for a portion of a local region. The DCA 645 includes one or more imaging devices and a DCA controller. The DCA 645 may also include an illuminator. The operation and structure of the DCA 645 are described above in conjunction with Figure 1A. In various embodiments, the DCA 645 includes an object detection module 200. As further described above in conjunction with Figures 2 to 5, the object detection module 200 detects objects within a local area surrounding the head-mounted device 605 and determines the position of the objects in the local area model based on depth information generated by the DCA 645, thereby providing a three-dimensional representation of the local area to the head-mounted device 605. In various embodiments, the position of an object in the local area model determines the position, in the local area model, of an interface element displayed to the user by the display assembly 630.
[0111] Audio system 650 provides audio content to a user of head-mounted device 605. Audio system 650 may include one or more acoustic sensors, one or more transducers, and an audio controller. Audio system 650 may provide spatialized audio content to the user. In some embodiments, audio system 650 may request acoustic parameters from map-building server 625 via network 620. The acoustic parameters describe one or more acoustic properties of a local area (e.g., room impulse response, reverberation time, reverberation level). Audio system 650 may provide information describing at least a portion of the local area (e.g., from DCA 645) and / or position information of head-mounted device 605 from position sensor 640. Audio system 650 may use one or more acoustic parameters received from map-building server 625 to generate one or more sound filters and use those sound filters to provide audio content to the user.
[0112] I / O interface 610 is a device that allows a user to send action requests to console 615 and receive responses from console 615. An action request is a request to perform a specific action. For example, an action request may be an instruction to start or stop the acquisition of image or video data, or an instruction to perform a specific action within an application. I / O interface 610 may include one or more input devices. Example input devices include a keyboard, mouse, game controller, or any other suitable device for receiving action requests and transmitting them to console 615. Action requests received by I / O interface 610 are transmitted to console 615, which performs the action corresponding to the action request. In some embodiments, I / O interface 610 includes an IMU that acquires calibration data indicating an estimated position of I / O interface 610 relative to its initial position. In some embodiments, I / O interface 610 may provide haptic feedback to a user based on instructions received from console 615. For example, haptic feedback can be provided when an action request is received, or when the console 615 performs an action, the console 615 sends instructions to the I / O interface 610, thereby causing the I / O interface 610 to generate haptic feedback.
[0113] Console 615 provides content to head-mounted device 605 for processing based on information received from one or more of the following: DCA 645, head-mounted device 605, and I / O interface 610. In the example shown in Figure 6, console 615 includes application repository 655, tracking module 660, and engine 665. Some embodiments of console 615 have different modules or components than those described in conjunction with Figure 6. Similarly, the functions further described below may be distributed among the components of console 615 in a different manner than described in conjunction with Figure 6. In some embodiments, the functions of console 615 discussed herein may be implemented in head-mounted device 605 or in a remote system.
[0114] Application repository 655 stores one or more applications for execution by console 615. An application is a set of instructions that, when executed by a processor, generate content to be presented to a user. The content generated by the application may respond to input received from the user via movement of head-mounted device 605 or I / O interface 610. Examples of applications include: game applications, conferencing applications, video playback applications, or other suitable applications.
[0115] Tracking module 660 uses information from DCA 645, one or more position sensors 640, or some combination thereof, to track the movement of head-mounted device 605 or I / O interface 610. For example, tracking module 660 determines the position of a reference point of head-mounted device 605 in a mapping of a local region based on information from head-mounted device 605. Tracking module 660 can also determine the position of an object or virtual object. Additionally, in some embodiments, tracking module 660 can use portions of the data from position sensors 640 that indicate the position of head-mounted device 605, together with a representation of the local region from DCA 645, to predict the future position of head-mounted device 605. Tracking module 660 provides engine 665 with the estimated or predicted future position of head-mounted device 605 or I / O interface 610.
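The specification does not say how a future position is predicted. As a minimal sketch, assuming a constant-velocity model and two timestamped position samples, the prediction could look like the following; the function name `predict_position` and its parameters are hypothetical.

```python
def predict_position(p_prev, t_prev, p_curr, t_curr, t_future):
    """Extrapolate a 3D position to t_future.

    Assumption: the device keeps the velocity observed between the two
    most recent samples (constant-velocity model).
    """
    dt = t_curr - t_prev
    if dt <= 0:
        raise ValueError("samples must be in increasing time order")
    # Finite-difference velocity estimate per axis.
    velocity = tuple((c - p) / dt for p, c in zip(p_prev, p_curr))
    horizon = t_future - t_curr
    return tuple(c + v * horizon for c, v in zip(p_curr, velocity))


# Device moved from the origin to (1, 0, 2) in one second;
# predict where it will be half a second later.
predicted = predict_position((0.0, 0.0, 0.0), 0.0, (1.0, 0.0, 2.0), 1.0, 1.5)
```

A practical tracker would likely fuse IMU and depth data with a filter rather than rely on two samples; the sketch only illustrates the idea of handing a predicted position to engine 665.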
[0116] Engine 665 executes applications and receives position information, acceleration information, velocity information, predicted future positions, or some combination thereof, of head-mounted device 605 from tracking module 660. Based on the received information, engine 665 determines the content to be presented to the user on head-mounted device 605. For example, if the received information indicates that the user has looked to the left, engine 665 generates content for head-mounted device 605 that mirrors the user's movement in a virtual local area or in a local area augmented with additional content. Additionally, in response to an action request received from I / O interface 610, engine 665 executes an in-application action on console 615 and provides feedback to the user that the action has been performed. The provided feedback may be visual or auditory feedback via head-mounted device 605, or haptic feedback via I / O interface 610.
[0117] Network 620 couples head-mounted device 605 and / or console 615 to map building server 625. Network 620 may include any combination of local area networks and / or wide area networks using wireless communication systems and / or wired communication systems. For example, network 620 may include the Internet and mobile phone networks. In some embodiments, network 620 uses standard communication technologies and / or protocols. Therefore, network 620 may include links using technologies such as Ethernet, 802.11, worldwide interoperability for microwave access (WiMAX), 2G / 3G / 4G mobile communication protocols, digital subscriber line (DSL), asynchronous transfer mode (ATM), InfiniBand, PCI Express Advanced Switching, etc. Similarly, networking protocols used on network 620 may include Multiprotocol Label Switching (MPLS), Transmission Control Protocol / Internet Protocol (TCP / IP), User Datagram Protocol (UDP), Hypertext Transfer Protocol (HTTP), Simple Mail Transfer Protocol (SMTP), File Transfer Protocol (FTP), etc. Data exchanged through network 620 may be represented using technologies and / or formats including binary image data (e.g., Portable Network Graphics (PNG)), Hypertext Markup Language (HTML), Extensible Markup Language (XML), etc. In addition, all or some of the links can be encrypted using conventional encryption techniques, such as Secure Sockets Layer (SSL), Transport Layer Security (TLS), Virtual Private Network (VPN), and Internet Protocol Security (IPsec).
[0118] Map building server 625 may include a database storing virtual models describing multiple spaces, wherein a location in the virtual model corresponds to the current configuration of a local area of head-mounted device 605. Map building server 625 receives information describing at least a portion of the local area and / or location information of the local area from head-mounted device 605 via network 620. Users can adjust privacy settings to allow or prevent head-mounted device 605 from sending information to map building server 625. Map building server 625 determines the location in the virtual model associated with the local area of head-mounted device 605 based on the received information and / or location information. Map building server 625 determines (e.g., retrieves) one or more acoustic parameters associated with the local area, in part based on the determined location in the virtual model and any acoustic parameters associated with the determined location. Map building server 625 may send the location of the local area and the values of any acoustic parameters associated with the local area to head-mounted device 605.
[0119] One or more components in system 600 may include a privacy module that stores one or more privacy settings for user data elements. The user data elements describe a user or head-mounted device 605. For example, a user data element may describe the user's physical characteristics, actions performed by the user, the location of the user of head-mounted device 605, the location of head-mounted device 605, the user's HRTF, etc. The privacy settings (or "access settings") of the user data elements may be stored in any suitable manner, such as being stored in association with the user data element, stored in an index on an authorization server, stored in another suitable manner, or any suitable combination thereof.
[0120] Privacy settings for a user data element specify how the user data element (or specific information associated with the user data element) can be accessed, stored, or otherwise used (e.g., viewed, shared, modified, copied, executed, displayed, or identified). In some embodiments, privacy settings for a user data element may specify a “blacklist” of entities that may be denied access to certain information associated with the user data element. Privacy settings associated with a user data element may specify any appropriate granularity for allowing or denying access. For example, some entities may have the right to see whether a specific user data element exists, some entities may have the right to see the content of a specific user data element, and some entities may have the right to modify a specific user data element. Privacy settings may let a user permit other entities to access or store the user data element for a limited period of time.
[0121] Privacy settings allow users to specify one or more geographic locations from which user data elements can be accessed. Access to or denial of access to user data elements can depend on the geographic location of the entity attempting to access the user data element. For example, a user can allow access to a user data element and specify that the user data element is only accessible to an entity while the user is in a specific location. If the user leaves that specific location, the user data element may no longer be accessible to that entity. As another example, a user can specify that a user data element is only accessible to entities within a threshold distance of the user (such as another user of a headset in the same local area as the user). If the user subsequently changes location, the entity with access to the user data element may lose access, while a new set of entities may gain access when they come within the threshold distance of the user.
[0122] System 600 may include one or more authorization / privacy servers for implementing privacy settings. A request from an entity for a specific user data element can identify the entity associated with the request, and the authorization server can send the user data element to that entity only if it determines, based on the privacy settings associated with the user data element, that the entity is authorized to access the user data element. If the requesting entity is not authorized to access the user data element, the authorization server can prevent the requested user data element from being retrieved or from being sent to the entity. Although this disclosure describes implementing privacy settings in a particular manner, this disclosure contemplates implementing privacy settings in any suitable manner.
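The authorization behavior described in the preceding paragraphs can be sketched as follows. This is an illustrative Python example, not the disclosed implementation: the `PrivacySetting` class, its fields, and the functions `is_authorized` and `handle_request` are hypothetical names, and only two of the described rules (a list of denied entities and a threshold distance from the user) are modeled.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PrivacySetting:
    """Hypothetical privacy setting for one user data element."""
    blocked: set = field(default_factory=set)  # entities denied access
    max_distance_m: Optional[float] = None     # None means no distance rule


def is_authorized(entity_id, entity_pos, user_pos, setting):
    """Return True if the entity may access the user data element."""
    if entity_id in setting.blocked:
        return False
    if setting.max_distance_m is not None:
        # Access is allowed only within the threshold distance of the user.
        return math.dist(entity_pos, user_pos) <= setting.max_distance_m
    return True


def handle_request(entity_id, entity_pos, user_pos, setting, element):
    """Send the element only to an authorized entity, otherwise withhold it."""
    if is_authorized(entity_id, entity_pos, user_pos, setting):
        return element
    return None
```

Because the distance check is evaluated on each request, an entity that was authorized loses access once the user moves beyond the threshold distance, which matches the behavior described for location-dependent settings.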
Additional Configuration Information
[0123] The above description of embodiments has been presented for illustrative purposes and is not intended to be exhaustive or to limit the patent rights to the precise forms disclosed. Those skilled in the art will understand that many modifications and variations are possible in light of the foregoing disclosure.
[0124] Some portions of this specification describe the embodiments in terms of algorithms and symbolic representations of operations on information. Those skilled in the art of data processing commonly use these algorithmic descriptions and representations to effectively communicate the substance of their work to others skilled in the art. Although these operations are described functionally, computationally, or logically, they should be understood as being implemented by computer programs or equivalent circuits, microcode, etc. Furthermore, without loss of generality, it has sometimes proven convenient to refer to these arrangements of operations as modules. The described operations and their associated modules can be embodied in software, firmware, hardware, or any combination thereof.
[0125] Any step, operation, or process described herein may be performed or implemented with one or more hardware or software modules, alone or in combination with other devices. In some embodiments, a software module is implemented using a computer program product comprising a computer-readable medium containing computer program code that can be executed by a computer processor to perform any or all of the steps, operations, or processes described herein.
[0126] The embodiments may also relate to an apparatus for performing the operations described herein. This apparatus may be specifically constructed for the desired purpose, and / or this apparatus may include a general-purpose computing device selectively activated or reconfigured by a computer program stored in a computer. Such a computer program may be stored in a non-transitory tangible computer-readable storage medium that can be coupled to a computer system bus, or in any type of medium suitable for storing electronic instructions. Furthermore, any computing system mentioned in this specification may include a single processor, or may employ an architecture employing a multiprocessor design to achieve increased computing power.
[0127] The embodiments may also relate to products generated by the computational processes described herein. Such products may include information generated by the computational processes, wherein the information is stored on a non-transitory tangible computer-readable storage medium, and such products may include any embodiment of the computer program product or other combination of data described herein.
[0128] Finally, the terminology used in this specification has been chosen primarily for readability and instructional purposes, and may not have been chosen to delineate or limit the patent rights. Therefore, the scope of the patent rights is limited not by this detailed description but by any claims that issue on an application based hereon. Thus, the disclosure of the various embodiments is intended to illustrate, rather than limit, the scope of patent rights, which is set forth in the appended claims.
Claims
1. A method, comprising: acquiring, at a head-mounted device worn by a user, one or more images of a local region surrounding the head-mounted device by one or more imaging devices included in the head-mounted device; detecting an object in the local region from an image of the local region acquired by an imaging device; determining a bounding box of the object, the bounding box specifying the size of a region in the image that includes the object; determining a local region model of the local region based on depth information generated by one or more depth sensors included in the head-mounted device, the local region model including a three-dimensional reconstruction of the local region; determining a position of the object in the local region model based on the bounding box of the object and one or more parameters of the imaging device; and storing the location of the object in the local region in association with information identifying the object.
2. The method according to claim 1, wherein determining the position of the object in the local region model based on the bounding box of the object and the one or more parameters of the imaging device includes: determining a center of the bounding box based on the dimensions of the bounding box; generating, based on the parameters of the imaging device, a ray in the coordinate system of the local region model, the ray passing through the center of the bounding box; and determining the position of the object in the local region model as a position in the local region model that the ray intersects.
3. The method according to claim 2, wherein the local region model includes one or more candidate regions, the depth information of each candidate region differing from the depth information of its neighbors by at least a threshold amount, and wherein determining the position of the object in the local region model as the position that the ray intersects includes: determining the position of the object in the local region model as a candidate region in the local region model that the ray intersects, or determining the position of the object in the local region model as a candidate region in the local region model that is located within a threshold distance of the position in the local region model that the ray intersects.
4. The method according to any of the preceding claims, wherein storing the location of the object in the local region in association with information identifying the object includes: storing the location of the object in the local region in association with one or more tags obtained from the detection of the object.
5. The method according to claim 4, further comprising: receiving input from the user identifying the object and requesting navigation to the object; retrieving the stored position of the object in the local region model; determining a current position of the head-mounted device in the local region model based on the depth information; generating a route from the current position of the head-mounted device in the local region model to the stored position of the object in the local region model; and displaying at least a portion of the generated route to the user through one or more display elements.
6. The method according to claim 5, wherein generating the route from the current position of the head-mounted device in the local region model to the stored position of the object in the local region model includes: retrieving a stored confidence value associated with the stored position of the object in the local region model, the confidence value being based on the time at which the position of the object in the local region model was stored and on a decay factor that decreases as the amount of time between the stored time and the current time increases; and in response to the confidence value being lower than a threshold confidence value, displaying a message to the user through the one or more display elements.
7. The method according to any of the preceding claims, further comprising: displaying an interface element to the user via one or more display elements of the head-mounted device, the interface element being displayed in the local region model at a position relative to the position of the object in the local region model; optionally, wherein: the position of the interface element displayed in the local region model is determined based on an offset relative to a portion of the bounding box of the object, and / or a portion of the interface element contacts a portion of the bounding box of the object in the local region model.
8. A head-mounted device, comprising: a frame; one or more display elements coupled to the frame, each display element being configured to generate image light for presentation to a user; one or more imaging devices coupled to the frame, the one or more imaging devices being configured to acquire images of a local region surrounding the frame; a depth camera assembly configured to acquire depth information between the head-mounted device and multiple portions of the local region; and an object detection module comprising a processor and a non-transitory computer-readable storage medium having instructions encoded thereon, which, when executed by the processor, cause the head-mounted device to: detect an object in the local region from an image of the local region acquired by an imaging device; determine a bounding box of the object, the bounding box specifying the size of a region in the image that includes the object; determine a local region model of the local region based on the depth information, the local region model including a three-dimensional reconstruction of the local region; determine a position of the object in the local region model based on the bounding box of the object and one or more parameters of the imaging device; and store the location of the object in the local region in association with information identifying the object.
9. The head-mounted device according to claim 8, wherein determining the position of the object in the local region model based on the bounding box of the object and the one or more parameters of the imaging device includes: determining a center of the bounding box based on the dimensions of the bounding box; generating, based on the parameters of the imaging device, a ray in the coordinate system of the local region model, the ray passing through the center of the bounding box; and determining the position of the object in the local region model as a position in the local region model that the ray intersects.
10. The head-mounted device according to claim 9, wherein the local region model includes one or more candidate regions, the depth information of each candidate region differing from the depth information of its neighbors by at least a threshold amount, and wherein determining the position of the object in the local region model as the position that the ray intersects includes: determining the position of the object in the local region model as a candidate region in the local region model that the ray intersects, or determining the position of the object in the local region model as a candidate region in the local region model that is located within a threshold distance of the position in the local region model that the ray intersects.
11. The head-mounted device according to any one of claims 8 to 10, wherein storing the location of the object in the local region in association with information identifying the object includes: storing the location of the object in the local region in association with one or more tags obtained from the detection of the object.
12. The head-mounted device according to any one of claims 8 to 11, wherein the non-transitory computer-readable storage medium also has instructions encoded thereon that, when executed by the processor, cause the head-mounted device to: receive input from the user identifying the object and requesting navigation to the object; retrieve the stored position of the object in the local region model; determine a current position of the head-mounted device in the local region model based on the depth information; generate a route from the current position of the head-mounted device in the local region model to the stored position of the object in the local region model; and display at least a portion of the generated route to the user through the one or more display elements.
13. The head-mounted device according to claim 12, wherein generating the route from the current position of the head-mounted device in the local region model to the stored position of the object in the local region model includes: retrieving a stored confidence value associated with the stored position of the object in the local region model, the confidence value being based on the time at which the position of the object in the local region model was stored and on a decay factor that decreases as the amount of time between the stored time and the current time increases; and in response to the confidence value being lower than a threshold confidence value, displaying a message to the user through the one or more display elements.
14. The head-mounted device according to any one of claims 8 to 13, wherein the non-transitory computer-readable storage medium also has instructions encoded thereon that, when executed by the processor, cause the head-mounted device to: display an interface element to the user via the one or more display elements of the head-mounted device, the interface element being displayed in the local region model at a position relative to the position of the object in the local region model; and optionally, wherein: the position of the interface element displayed in the local region model is determined based on an offset relative to a portion of the bounding box of the object, and / or a portion of the interface element contacts a portion of the bounding box of the object in the local region model.
15. A non-transitory computer-readable storage medium having instructions encoded thereon, which, when executed by a processor, cause a head-mounted device to perform the method according to any one of claims 1 to 7.
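For readers who want a concrete picture of the localization steps recited in claims 2 and 3, the following Python sketch casts a ray through the center of a bounding box and snaps it to a nearby candidate region. It is an illustration under stated assumptions, not the claimed implementation: a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy` stands in for the "parameters of the imaging device", the camera pose is given as a rotation matrix and an origin in model coordinates, and the local region model is reduced to a dictionary of candidate-region centroids. All function names are hypothetical.

```python
import math


def bbox_center(x, y, width, height):
    """Center of a bounding box given its top-left corner and dimensions."""
    return (x + width / 2.0, y + height / 2.0)


def pixel_ray(u, v, fx, fy, cx, cy, rotation, origin):
    """Ray through pixel (u, v), expressed in model coordinates.

    Assumption: pinhole camera; `rotation` is a 3x3 camera-to-model
    rotation (list of rows) and `origin` is the camera position.
    """
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    d = [sum(rotation[r][c] * d_cam[c] for c in range(3)) for r in range(3)]
    norm = math.sqrt(sum(x * x for x in d))
    return tuple(origin), tuple(x / norm for x in d)


def locate(origin, direction, candidates, threshold):
    """Name of the candidate region closest to the ray, or None.

    `candidates` maps a region name to its centroid; a region qualifies
    only if it lies within `threshold` of the ray and in front of the camera.
    """
    best, best_dist = None, threshold
    for name, point in candidates.items():
        rel = tuple(point[i] - origin[i] for i in range(3))
        along = sum(rel[i] * direction[i] for i in range(3))
        if along <= 0:
            continue  # behind the camera
        closest = tuple(origin[i] + along * direction[i] for i in range(3))
        dist = math.dist(tuple(point), closest)
        if dist <= best_dist:
            best, best_dist = name, dist
    return best


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
u, v = bbox_center(300, 220, 40, 40)
ray_origin, ray_dir = pixel_ray(u, v, 500.0, 500.0, 320.0, 240.0,
                                identity, (0.0, 0.0, 0.0))
regions = {"mug": (0.05, 0.0, 2.0), "lamp": (1.0, 0.0, 2.0)}
hit = locate(ray_origin, ray_dir, regions, threshold=0.1)
```

The position stored for the object would then be the centroid of the returned region, `regions[hit]` in this example. A real system would intersect the ray with the reconstructed geometry rather than with centroids.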