A device, method, and graphical user interface for displaying the movement of virtual objects within a communication session.

The computer system with advanced interfaces addresses inefficiencies in virtual and augmented reality interactions by using touch-sensitive and gaze-tracking technologies, enhancing user efficiency and reducing energy consumption.

JP2026520151A · Pending · Publication Date: 2026-06-22 · APPLE INC

Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: APPLE INC
Filing Date: 2024-06-04
Publication Date: 2026-06-22

AI Technical Summary

Technical Problem

Existing methods for interacting with virtual and augmented reality environments are cumbersome, inefficient, and place a significant cognitive burden on users, often requiring multiple inputs and providing insufficient feedback, leading to wasted energy and potential errors, particularly in battery-powered devices.

Method used

The system employs a computer system with advanced interfaces that include touch-sensitive displays, cameras, eye-tracking, and hand-tracking components, providing intuitive interaction through reduced user inputs, enhanced visual feedback, and efficient energy use by optimizing user input and reducing the need for additional controls.

Benefits of technology

The system enhances user interaction efficiency, reduces errors, conserves power, and extends battery life by minimizing unnecessary inputs, while offering improved usability and a more immersive experience through realistic and diverse visual feedback.

Abstract

This invention provides a device, method, and graphical user interface for displaying the movement of virtual objects within a communication session. [Solution] The computer system displays a representation of the user's pose in a three-dimensional environment according to the movement of the user's current viewpoint. The computer system displays different representations of the user's virtual representation's movement based on the type of virtual representation. The computer system reduces the visual prominence of virtual representations while changing the spatial arrangement of virtual objects shared in a communication session. The computer system displays different visual feedback while moving virtual objects depending on whether the virtual objects are shared or not in a communication session. The computer system displays visual feedback indicating audio provided by another user. The computer system displays feedback indicating that a participant corresponds to a position. The computer system displays a visual transition sequence when displaying the visual representations of participants in a communication session.

Description

Technical Field

[0001] (Cross-Reference to Related Applications) This application claims the benefit of U.S. Provisional Patent Application No. 63/578,962, filed on August 25, 2023; U.S. Provisional Patent Application No. 63/515,122, filed on July 23, 2023; U.S. Provisional Patent Application No. 63/506,119, filed on June 4, 2023; and U.S. Provisional Patent Application No. 63/506,115, filed on June 4, 2023, the contents of which are hereby incorporated by reference in their entirety for all purposes.

[0002] The present disclosure generally relates to computer systems that provide computer-generated experiences, including, but not limited to, electronic devices that provide virtual reality and mixed reality experiences via a display.

Background Art

[0003] The development of computer systems for augmented reality has increased significantly in recent years. Exemplary augmented reality environments include at least some virtual elements that replace or enhance the physical world. Input devices for computer systems and other electronic computing devices, such as cameras, controllers, joysticks, touch-sensitive surfaces, and touch-screen displays, are used to interact with virtual/augmented reality environments. Exemplary virtual elements include virtual objects such as digital images, videos, text, icons, and control elements such as buttons and other graphics.

Summary of the Invention

[0004] Some methods and interfaces for interacting with environments that include at least some virtual elements (e.g., applications, augmented reality environments, mixed reality environments, and virtual reality environments) are cumbersome, inefficient, and restrictive. For example, systems that provide insufficient feedback for performing actions associated with virtual objects, systems that require a series of inputs to achieve desired results in augmented reality environments, and systems where manipulating virtual objects is complex and error-prone impose a significant cognitive burden on the user and detract from the virtual/augmented reality experience. In addition, these methods are unnecessarily time-consuming, thereby wasting the energy of the computer system. This latter consideration is particularly important in battery-powered devices.

[0005] Therefore, there is a need for computer systems with improved methods and interfaces to provide users with computer-generated experiences that make interaction with the computer system more efficient and intuitive for the user. Such methods and interfaces can optionally complement or replace conventional methods of providing users with extended reality experiences. Such methods and interfaces reduce the number, extent, and/or types of user input by helping the user understand the connection between the inputs provided and the device response to those inputs, thereby generating a more efficient human-machine interface.

[0006] The above-mentioned drawbacks and other problems associated with the user interfaces of computer systems are mitigated or eliminated by the disclosed systems. In some embodiments, the computer system is a desktop computer with an associated display. In some embodiments, the computer system is a portable device (e.g., a notebook computer, tablet computer, or handheld device). In some embodiments, the computer system is a personal electronic device (e.g., a wearable electronic device such as a wristwatch or a head-mounted device). In some embodiments, the computer system has a touchpad. In some embodiments, the computer system has one or more cameras. In some embodiments, the computer system has (e.g., includes or communicates with) a display generation component (e.g., a display device such as a head-mounted device (HMD), a display, a projector, a touch-sensitive display (also known as a “touchscreen” or “touchscreen display”), or another device or component that presents visual content to the user, where the content is visible on or in the display generation component itself or is generated from the display generation component and visible elsewhere). In some embodiments, the computer system has one or more eye-tracking components. In some embodiments, the computer system has one or more hand-tracking components. In some embodiments, the computer system has one or more output devices in addition to the display generation component, and the output devices include one or more tactile output generators and/or one or more audio output devices. In some embodiments, the computer system has a graphical user interface (GUI), one or more processors, memory, and one or more modules, programs, or instruction sets stored in the memory for performing multiple functions. In some embodiments, the user interacts with the GUI through stylus and/or finger contacts and gestures on a touch-sensitive surface, through movement of the user's eyes and hands in space relative to the GUI (and/or the computer system) or the user's body, as captured by cameras and other motion sensors, and/or through voice input as captured by one or more audio input devices. In some embodiments, the functions performed through the interaction optionally include image editing, drawing, presentation, word processing, spreadsheet creation, gameplay, making phone calls, video conferencing, sending emails, instant messaging, training support, digital photography, digital videography, web browsing, digital music playback, note-taking, and/or digital video playback. The executable instructions for performing those functions are optionally included in a transitory computer-readable storage medium and/or a non-transitory computer-readable storage medium, or in another computer program product configured for execution by one or more processors.

[0007] There is a need for electronic devices with improved methods and interfaces for interacting with three-dimensional environments. Such methods and interfaces can complement or replace conventional methods for interacting with three-dimensional environments. Such methods and interfaces reduce the number, degree, and/or type of user input, resulting in a more efficient human-machine interface. In the case of battery-operated computing devices, such methods and interfaces conserve power and increase the interval between battery charges.

[0008] In some embodiments, the computer system displays the user's virtual representation in one or more poses within a three-dimensional environment in response to a shift in the user's current viewpoint. In some embodiments, the computer system displays different representations of the movement of the user's virtual representation based on whether the virtual representation is of a first type or a second type. In some embodiments, the computer system reduces the visual prominence of one or more virtual representations while changing the spatial arrangement of virtual objects shared in a communication session. In some embodiments, the computer system displays different visual feedback while moving virtual objects depending on whether or not the virtual objects are shared within a communication session. In some embodiments, the computer system displays visual feedback corresponding to audio provided by another user.

[0009] It should be noted that the various embodiments described herein can be combined with any other embodiments described herein. The features and advantages described herein are not exhaustive, and many additional features and advantages will become apparent to those skilled in the art, in particular, in light of the drawings, specification and claims. Furthermore, it should be noted that the language used herein has been selected solely for readability and explanatory purposes and not to define or limit the subject matter of the invention.

[0010] For a better understanding of the various described embodiments, reference should be made to the “Modes for Carrying Out the Invention” below, in conjunction with the following drawings, in which like reference numerals refer to corresponding parts throughout the drawings.

Brief Description of the Drawings

[0011] [Figure 1A] A block diagram showing the operating environment of a computer system for providing an XR experience, according to some embodiments.

[0012] [Figures 1B-1P] Examples of a computer system for providing an XR experience in the operating environment shown in Figure 1A.

[0013] [Figure 2] A block diagram showing a controller of a computer system configured to manage and adjust an XR experience for a user, according to some embodiments.

[0014] [Figure 3] A block diagram showing a display generation component of a computer system configured to provide visual components of an XR experience to a user, according to some embodiments.

[0015] [Figure 4] A block diagram showing a hand tracking unit of a computer system configured to capture a user's gesture input, according to some embodiments.

[0016] [Figure 5] A block diagram showing an eye tracking unit of a computer system configured to capture a user's gaze input, according to some embodiments.

[0017] [Figure 6] A flowchart showing a glint-assisted gaze tracking pipeline, according to some embodiments.

[0018] [Figures 7A-7S, including Figures 7K1 and 7L1] Exemplary techniques for displaying movement of a user's virtual representation in different poses in response to movement of the user's current viewpoint, according to some embodiments.

[0019] [Figure 8] This flowchart illustrates an exemplary method, according to several embodiments, for displaying a virtual representation of a user in one or more poses within a three-dimensional environment, depending on the user's current viewpoint.

[0020] [Figure 9] This flowchart illustrates an exemplary method, according to several embodiments, for displaying different representations of the movement of a virtual representation based on whether the virtual representation is a first type of virtual representation or a second type of virtual representation.

[0021] [Figures 10A-10AA, including Figure 10A1] Exemplary techniques for changing the spatial arrangement of virtual objects in a three-dimensional environment, according to some embodiments.

[0022] [Figure 11] This flowchart illustrates an exemplary method, according to several embodiments, for reducing the visual prominence of one or more virtual representations while changing the spatial arrangement of virtual objects shared within a communication session.

[0023] [Figure 12] This flowchart illustrates an exemplary method, according to several embodiments, of displaying different visual feedback while moving a virtual object, depending on whether the virtual object is shared or not within a communication session.

[0024] [Figures 13A-13F] Exemplary techniques for providing visual feedback indicating audio provided by participants in a communication session, according to some embodiments.

[0025] [Figure 14] This flowchart illustrates an exemplary method, according to several embodiments, for displaying visual feedback indicating audio provided by participants in a communication session.

[0026] [Figures 15A-15M, including Figures 15E-1 and 15E-2] Examples of computer systems that provide feedback indicating the spatial position of communication session participants, according to some embodiments.

[0027] [Figure 16] This flowchart shows an exemplary method, according to several embodiments, for providing feedback indicating the spatial position of participants in a communication session.

[0028] [Figures 17A-17I] Exemplary techniques for facilitating the visual transition of spatial representations of participants in a communication session, according to some embodiments.

[0029] [Figure 18] This flowchart illustrates an exemplary method for displaying the visual transition of spatial representations of participants in a communication session, according to several embodiments.

Modes for Carrying Out the Invention

[0030] This disclosure relates to user interfaces that, in some embodiments, provide users with extended reality (XR) experiences.

[0031] The systems, methods, and GUIs described herein improve user interface interactions with virtual/augmented reality environments in multiple ways.

[0032] In some embodiments, during a communication session with a second computer system associated with a second user, a first computer system associated with a first user displays, within a three-dimensional environment, a first virtual object in a first orientation representing a first viewpoint of the second user relative to the three-dimensional environment. In some embodiments, while displaying the first virtual object in the first orientation within the three-dimensional environment, the first computer system receives an indication from the second computer system corresponding to the second user's current viewpoint relative to the three-dimensional environment. In some embodiments, in response to receiving the indication, and according to a determination that the movement of the second user's current viewpoint from the first viewpoint to a second viewpoint satisfies one or more criteria, including a criterion that is met when the movement of the second user's current viewpoint relative to the three-dimensional environment exceeds a threshold, the first computer system displays the first virtual object within the three-dimensional environment in a second orientation, different from the first orientation, representing the second user's second viewpoint. In some embodiments, according to a determination that the movement of the second user's current viewpoint does not satisfy the one or more criteria because the movement does not exceed the threshold relative to the three-dimensional environment, the first computer system maintains the display of the first virtual object in the first orientation within the three-dimensional environment.
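
The threshold test in paragraph [0032] can be pictured with a short sketch. This is an illustration only, not the disclosed implementation: it assumes orientation is reduced to a single yaw angle in degrees, and the names `RemoteAvatar`, `angular_delta`, `on_viewpoint_indication`, and the 15-degree value are all hypothetical, since the source specifies neither a unit nor a threshold value.

```python
from dataclasses import dataclass

# Hypothetical value: the source gives no number or unit for the threshold.
VIEWPOINT_THRESHOLD_DEG = 15.0


@dataclass
class RemoteAvatar:
    """Stands in for the first virtual object shown to the first user."""
    displayed_yaw_deg: float  # orientation currently on screen


def angular_delta(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two headings, in degrees."""
    d = (b_deg - a_deg) % 360.0
    return min(d, 360.0 - d)


def on_viewpoint_indication(avatar: RemoteAvatar,
                            reported_yaw_deg: float,
                            threshold_deg: float = VIEWPOINT_THRESHOLD_DEG) -> bool:
    """Handle an indication received from the second computer system.

    Returns True if the displayed orientation changed, False if it was
    maintained because the movement did not exceed the threshold.
    """
    moved = angular_delta(avatar.displayed_yaw_deg, reported_yaw_deg)
    if moved > threshold_deg:
        # Criterion met: show the object in the second orientation.
        avatar.displayed_yaw_deg = reported_yaw_deg % 360.0
        return True
    # Criterion not met: keep the first orientation.
    return False
```

In this sketch, small viewpoint movements by the second user leave the displayed object untouched, and only a movement larger than the threshold causes it to be redrawn in a new orientation.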

[0033] In some embodiments, during a communication session with a second computer system, the first computer system displays, at a first location in a three-dimensional environment, a virtual representation of the user's current viewpoint orientation in the three-dimensional environment. In some embodiments, while displaying the virtual representation at the first location, the first computer system receives an indication from the second computer system corresponding to the user's current viewpoint orientation in the three-dimensional environment. In some embodiments, upon receiving the indication, and according to a determination that the user's virtual representation in the second computer system is a first type of virtual representation, the first computer system displays a first representation of the movement of the user's virtual representation corresponding to the change in the user's current viewpoint orientation in the three-dimensional environment from a first orientation to a second orientation. In some embodiments, according to a determination that the user's virtual representation is a second type of virtual representation, different from the first type, the first computer system displays a second representation, different from the first representation, of the movement of the user's virtual representation corresponding to the change in the user's current viewpoint orientation in the three-dimensional environment from the first orientation to the second orientation.

[0034] In some embodiments, during a communication session with one or more computer systems, the first computer system displays a three-dimensional environment from a first viewpoint of a first user of the first computer system, and the three-dimensional environment includes one or more virtual objects, including one or more virtual representations of one or more users of the one or more computer systems. In some embodiments, while displaying the three-dimensional environment from the first viewpoint of the first user, the first computer system receives a first input corresponding to a request to change the spatial arrangement of the one or more virtual objects in the three-dimensional environment from a first spatial arrangement to a second spatial arrangement relative to the first viewpoint of the first user. In some embodiments, while receiving the first input, the first computer system reduces the visual prominence of the one or more virtual representations of the one or more users relative to the three-dimensional environment and, while the one or more virtual representations have reduced visual prominence, changes the spatial arrangement of the one or more virtual objects relative to the first viewpoint of the first user according to the first input.

[0035] In some embodiments, during a communication session with one or more computer systems, the first computer system displays a three-dimensional environment containing a first virtual object. In some embodiments, while displaying the three-dimensional environment containing the first virtual object at a first location relative to a first viewpoint of a first user of the first computer system, the first computer system detects a first input corresponding to a request to move the first virtual object from the first location to a second location different from the first location, relative to the first viewpoint of the first user in the three-dimensional environment. In some embodiments, while detecting the first input, according to a determination that the first virtual object is shared with one or more computer systems in the communication session, the first computer system displays first visual feedback in the three-dimensional environment while moving the first virtual object from the first location to the second location. In some embodiments, according to a determination that the first virtual object is not shared with one or more computer systems in the communication session, the first computer system displays second visual feedback different from the first visual feedback in the three-dimensional environment while moving the first virtual object from the first location to the second location.

[0036] In some embodiments, a computer system displays a visual representation of another user on another computer system while the computer system is engaged in a communication session. In some embodiments, the computer system obtains information from other computer systems, etc. In some embodiments, according to a determination that one or more criteria are met, the computer system maintains the display of the other user's visual representation and displays visual feedback corresponding to audio obtained from the other user. In some embodiments, the visual appearance of the visual feedback is modified according to the spatial relationship between the other user's current direction of attention and the current orientation of the other user's visual representation. In some embodiments, the computer system moves the other user's visual representation according to information obtained from the other user.

[0037] In some embodiments, the computer system generates feedback indicating the position of a visual representation of another user on another computer system while the computer system is engaged in a communication session. In some embodiments, the computer system displays a simulated glow effect indicating the relative position of other users, who may be referred to herein as participants in the communication session. In some embodiments, the computer system additionally or alternatively generates audio that mimics the effect of a physical audio source playing the audio, thus giving the audio a spatial quality relative to the user's viewpoint in a three-dimensional environment. In some embodiments, the computer system plays one or more tones contained in such audio. In some embodiments, the computer system plays a series of sounds to indicate that multiple participants correspond to positions in the three-dimensional environment. In some embodiments, the simulated position of the audio source corresponds to the position of a participant when the position is not in the computer system's viewport. In some embodiments, the simulated position of the audio source corresponds to a region to which the position corresponds, and the region is defined with respect to the user's viewpoint. In some embodiments, the audio is played regardless of whether the position is in the user's viewport. In some embodiments, the computer system generates non-spatialized audio indicating that one or more representations of participants will be included in and / or will no longer be included in the three-dimensional environment. In some embodiments, the computer system refrains from providing separate feedback for events associated with different participants, based on a determination that similar feedback has been presented relatively recently.
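
The last behavior in paragraph [0037], refraining from separate feedback when similar feedback was presented recently, could be sketched in Python as a simple rate limiter. The class name, the feedback kinds, and the two-second window are assumptions made for illustration; the disclosure does not define what "relatively recently" means.

```python
class FeedbackLimiter:
    """Suppresses feedback of a kind that was already played recently."""

    def __init__(self, min_interval_s: float = 2.0):
        # Hypothetical window within which similar feedback is not repeated.
        self.min_interval_s = min_interval_s
        self._last_played = {}  # feedback kind -> time it was last played

    def should_play(self, kind: str, now_s: float) -> bool:
        """Return True and record the time if feedback of this kind may play."""
        last = self._last_played.get(kind)
        if last is not None and now_s - last < self.min_interval_s:
            # Similar feedback was presented recently, for any participant.
            return False
        self._last_played[kind] = now_s
        return True
```

For example, if two participants join within one second of each other, only the first join event produces a tone in this sketch.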

[0038] In some embodiments, a computer system displays a visual representation of another user on another computer system while the computer system is engaged in a communication session. In some embodiments, the visual representation is visually transitioned to a three-dimensional representation according to a transition sequence that includes initially displaying the visual representation according to a low-fidelity visual model when it is first displayed by the computer, and gradually transitioning the visual representation to be displayed according to a high-fidelity visual model. In some embodiments, both the low-fidelity and high-fidelity visual models are configured to provide the user with a visual indication of the status of the visual representation. For example, the low-fidelity visual model includes displaying the visual representation with noise and colors selected from a given color palette to indicate that the visual representation has not been fully rendered (for example, due to the computer system still acquiring information about the visual representation). In some embodiments, the high-fidelity visual model includes displaying the visual representation according to one or more images associated with the participant that the visual representation is supposed to represent. For example, the high-fidelity representation may include portions of the visual representation that resemble the participant associated with the visual representation.
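
One possible way to picture the gradual transition in paragraph [0038] is a blend factor that moves from the low-fidelity model to the high-fidelity model over time. The following Python sketch is an assumption for illustration only: the linear ramp, the 1.5-second duration, and the per-channel color blend are not described in the disclosure.

```python
def fidelity_blend(elapsed_s: float, duration_s: float = 1.5) -> float:
    """Blend factor: 0.0 is fully low-fidelity, 1.0 is fully high-fidelity."""
    if duration_s <= 0:
        return 1.0
    # Clamp a linear ramp to the range [0.0, 1.0].
    return max(0.0, min(1.0, elapsed_s / duration_s))


def blend_color(low, high, t):
    """Interpolate each color channel between the two visual models."""
    return tuple(l + (h - l) * t for l, h in zip(low, high))
```

A renderer could evaluate `fidelity_blend` each frame and use the result to mix the palette-based placeholder color with the color taken from the participant's images.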

[0039] Figures 1A to 6 provide a description of exemplary computer systems for providing an XR experience to a user (as described below with respect to methods 800, 900, 1100, 1200, 1400, 1600, and / or 1800). Figures 7A to 7S illustrate exemplary techniques, according to several embodiments, for displaying the movement of a user's virtual representation in different poses in response to a movement of the user's current viewpoint. Figure 8 is a flowchart of an exemplary method, according to several embodiments, for displaying a user's virtual representation in one or more poses in a three-dimensional environment in response to a movement of the user's current viewpoint. The user interfaces in Figures 7A to 7S are used to illustrate the process in Figure 8. Figure 9 is a flowchart of an exemplary method, according to several embodiments, for displaying different representations of the movement of a virtual representation based on whether the virtual representation is a first type of virtual representation or a second type of virtual representation. The user interfaces in Figures 7A to 7S are used to illustrate the process in Figure 9. Figures 10A to 10AA illustrate exemplary techniques, according to several embodiments, for changing the spatial arrangement of virtual objects in a three-dimensional environment. Figure 11 is a flowchart illustrating an exemplary method, according to several embodiments, for reducing the visual prominence of one or more virtual representations while changing the spatial arrangement of virtual objects shared within a communication session. The user interfaces in Figures 10A to 10AA are used to illustrate the process in Figure 11. Figure 12 is a flowchart illustrating an exemplary method, according to several embodiments, for displaying different visual feedback while moving virtual objects depending on whether or not the virtual objects are shared within a communication session. The user interfaces in Figures 10A to 10AA are used to illustrate the process in Figure 12.
Figures 13A to 13F illustrate exemplary techniques, according to several embodiments, for providing visual feedback indicating audio provided by participants in a communication session. Figure 14 is a flowchart illustrating an exemplary method, according to several embodiments, for displaying visual feedback indicating audio provided by participants in a communication session. The user interfaces in Figures 13A to 13F are used to illustrate the process in Figure 14. Figures 15A to 15M show examples of computer systems that provide feedback indicating the spatial position of communication session participants, according to several embodiments. Figure 16 is a flowchart illustrating an exemplary method of providing feedback indicating the spatial position of communication session participants, according to several embodiments. The user interfaces in Figures 15A to 15M are used to illustrate the process in Figure 16. Figures 17A to 17I show exemplary techniques for visually transitioning spatial representations of participants in a video communication session into and out of view, according to several embodiments. Figure 18 is a flowchart illustrating an exemplary method of visually transitioning the spatial representation of participants in a video communication session, according to several embodiments. The user interfaces in Figures 17A to 17I are used to illustrate the process in Figure 18.

[0040] The processes described below enhance the usability of the device and make the user device interface more efficient (for example, by helping the user provide appropriate input and reducing user errors when operating / interacting with the device) through various technologies, including providing the user with improved visual feedback, reducing the number of inputs required to perform actions, providing additional control options without cluttering the user interface with additional displayed controls, performing actions without requiring further user input when a set of conditions is met, improving privacy and / or security, providing a more diverse, detailed, and / or realistic user experience while saving memory space, and / or additional technologies. These technologies also reduce power consumption and improve the battery life of the device by enabling the user to use the device more quickly and efficiently. Saving battery power, and therefore weight, improves the ergonomics of the device. These technologies also enable real-time communication, allow the use of fewer and / or less accurate sensors, resulting in more compact, lighter, and less expensive devices, and enabling the device to be used in a variety of lighting conditions. These technologies reduce energy consumption and thereby reduce the heat emitted by the device, which is especially important for wearable devices that can become uncomfortable for the user to wear if they generate excessive heat, even if the device is well within the operating parameters for its components.

[0041] Furthermore, in any method described herein in which one or more steps are contingent upon one or more conditions being met, it should be understood that the described method can be repeated in multiple iterations so that, over the course of the iterations, all of the conditions upon which steps of the method are contingent are met in different iterations of the method. For example, if a method requires that a first step be performed if a condition is met, and a second step be performed if the condition is not met, a person skilled in the art will understand that the described steps are repeated, in no particular order, until the condition has been both met and not met. Thus, a method described with one or more steps that are contingent upon one or more conditions being met can be rewritten as a method that is repeated until each of the conditions described in the method has been met. However, this is not required of a system or computer-readable medium claim in which the system or computer-readable medium includes instructions for performing the contingent operations based on the satisfaction of the corresponding one or more conditions, and which can therefore determine whether a contingency has been met without explicitly repeating the steps of the method until all of the conditions upon which steps of the method are contingent have been met. Those skilled in the art will also understand that, as with a method having contingent steps, a system or computer-readable storage medium may repeat the steps of the method as many times as necessary to ensure that all of the contingent steps have been performed.
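
The repetition described in paragraph [0041] can be pictured with a short Python sketch. The function names and the step labels are hypothetical; the sketch only shows a two-branch method being repeated, in whatever order the conditions arrive, until both the "condition met" step and the "condition not met" step have been performed.

```python
def run_method(condition_met: bool) -> str:
    """One iteration of a method with a contingent first and second step."""
    return "first step" if condition_met else "second step"


def iterate_until_both_branches(conditions):
    """Repeat the method until both contingent steps have been performed."""
    performed = []
    for condition_met in conditions:
        performed.append(run_method(condition_met))
        if {"first step", "second step"} <= set(performed):
            break  # the condition has now been both met and not met
    return performed
```

Calling `iterate_until_both_branches([True, True, False, True])` stops after the third iteration, because by then each contingent step has been performed at least once.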

[0042] In some embodiments, as shown in Figure 1A, the XR experience is provided to the user via an operating environment 100 which includes a computer system 101. The computer system 101 includes a controller 110 (e.g., a processor of a portable electronic device or remote server), display generation components 120 (e.g., a head-mounted device (HMD), a display, a projector, a touchscreen, etc.), one or more input devices 125 (e.g., an eye-tracking device 130, a hand-tracking device 140, other input devices 150), one or more output devices 155 (e.g., a speaker 160, a tactile output generator 170, and other output devices 180), one or more sensors 190 (e.g., an image sensor, a light sensor, a depth sensor, a tactile sensor, an orientation sensor, a proximity sensor, a temperature sensor, a location sensor, a motion sensor, a velocity sensor, etc.), and optionally one or more peripheral devices 195 (e.g., home appliances, wearable devices, etc.). In some embodiments, one or more of the input device 125, output device 155, sensor 190, and peripheral device 195 are integrated with the display generation component 120 (for example, within a head-mounted device or handheld device).

[0043] When describing an XR experience, various terms are used to refer individually to several related but distinct environments that the user can perceive and / or interact with (for example, using inputs detected by the computer system 101 that generates the XR experience, causing the computer system generating the XR experience to generate audio, visual, and / or haptic feedback corresponding to various inputs provided to the computer system 101). The following is a subset of these terms.

[0044] Physical Environment: The physical environment refers to the physical world that people can perceive and / or interact with without the help of electronic systems. A physical environment, such as a physical park, includes physical objects such as physical trees, physical buildings, and physical people. People can directly perceive and / or interact with the physical environment through their senses of sight, touch, hearing, taste, and smell.

[0045] Extended reality: In contrast, an extended reality (XR) environment refers to a fully or partially simulated environment that people perceive and / or interact with through an electronic system. In XR, a subset of a person's bodily movements or their representation is tracked, and accordingly, one or more properties of one or more virtual objects simulated within the XR environment are adjusted to behave according to at least one law of physics. For example, an XR system may detect a person's head rotation and, accordingly, adjust the graphical content and sound field presented to the person in a similar way to how such views and sounds would change in a physical environment. In some situations (e.g., for reasons of accessibility), adjustments to the properties of one or more virtual objects within the XR environment may be made in response to a representation of physical movement (e.g., a voice command). A person may perceive and / or interact with an XR object using any one of their senses, including sight, sound, touch, taste, and smell. For example, a person can perceive and / or interact with audio objects that create a 3D or spatial audio environment, providing the perception of point audio sources in 3D space. In another example, audio objects may enable audio transparency, selectively incorporating ambient sounds from the physical environment, with or without computer-generated audio. In some XR environments, a person may perceive and / or interact with only audio objects.

[0046] Examples of XR include virtual reality and mixed reality.

[0047] Virtual reality: A virtual reality (VR) environment refers to a simulated environment designed to be entirely based on computer-generated sensory input for one or more senses. A VR environment includes multiple virtual objects that a person can perceive and / or interact with. For example, computer-generated images of trees, buildings, and avatars representing people are examples of virtual objects. A person can perceive and / or interact with virtual objects in a VR environment through a simulation of their presence within the computer-generated environment and / or through a simulation of a subset of their physical movement within the computer-generated environment.

[0048] Mixed Reality: In contrast to a virtual reality (VR) environment, which is designed to rely entirely on computer-generated sensory input, a mixed reality (MR) environment is a simulated environment designed to incorporate sensory input from a physical environment, or a representation of such input, in addition to computer-generated sensory input (e.g., virtual objects). On a virtuality continuum, a mixed reality environment lies anywhere between, but does not include, a wholly physical environment at one end and a virtual reality environment at the other. In some MR environments, computer-generated sensory input may respond to changes in sensory input from the physical environment. Also, some electronic systems for presenting an MR environment may track location and / or orientation relative to the physical environment to enable virtual objects to interact with real objects (i.e., physical articles from the physical environment or their representations). For example, the system may take movement into account so that a virtual tree appears stationary relative to the physical ground.

[0049] Examples of mixed reality include augmented reality and augmented virtuality.

[0050] Augmented Reality: An augmented reality (AR) environment refers to a simulated environment in which one or more virtual objects are superimposed on or onto a physical environment. For example, an electronic system for presenting an AR environment may have a transparent or translucent display that allows a person to directly view the physical environment. The system may also be configured to present virtual objects on the transparent or translucent display, thereby allowing a person to use the system to perceive the virtual objects superimposed on the physical environment. Alternatively, the system may have an opaque display and one or more imaging sensors that capture an image or video of the physical environment, which is a representation of the physical environment. The system composites the image or video with the virtual objects and presents the composite on the opaque display. A person uses this system to perceive the virtual objects superimposed on the physical environment by indirectly viewing the physical environment through the image or video of the physical environment. As used herein, a video of the physical environment displayed on an opaque display is referred to as “pass-through video,” meaning that the system uses one or more image sensors to capture images of the physical environment and uses those images when presenting the AR environment on the opaque display. Alternatively, the system may have a projection system that projects virtual objects, for example, as holograms, into or onto the physical environment, thereby allowing a person to perceive the virtual objects superimposed on the physical environment using the system. An augmented reality environment also refers to a simulated environment in which the representation of the physical environment is transformed by computer-generated sensory information. 
For example, when providing pass-through video, the system may transform one or more sensor images to impose a selected perspective (e.g., viewpoint) different from the perspective captured by the imaging sensor. As another example, the representation of the physical environment may be transformed by graphically modifying (e.g., enlarging) a portion of it, thereby making the modified portion a non-photorealistic altered version of the original captured image. As yet another example, the representation of the physical environment may be transformed by graphically removing or obscuring a portion of it.

[0051] Augmented Virtuality: An augmented virtuality (AV) environment refers to a simulated environment in which a virtual or computer-generated environment incorporates one or more sensory inputs from a physical environment. These sensory inputs may be representations of one or more characteristics of the physical environment. For example, an AV park might have virtual trees and virtual buildings, while people's faces are realistically reproduced from images of real people. As another example, a virtual object might adopt the shape or color of a physical article captured by one or more imaging sensors. As a further example, a virtual object might adopt shadows that correspond to the position of the sun in the physical environment.

[0052] In augmented reality, mixed reality, or virtual reality environments, a view of a three-dimensional environment is visible to the user. Typically, the view of the three-dimensional environment is visible to the user through one or more display-generating components (e.g., a display or a pair of display modules providing stereoscopic content to different eyes of the same user) via a virtual viewport having a viewport boundary that defines the extent of the three-dimensional environment visible to the user through one or more display-generating components. In some embodiments, the area defined by the viewport boundary is smaller than the user's field of view in one or more dimensions (e.g., based on the user's field of view, the size, optical properties, or other physical characteristics of one or more display-generating components, and / or the location and / or orientation of one or more display-generating components relative to the user's eyes). In some embodiments, the area defined by the viewport boundary is larger than the user's field of view in one or more dimensions (e.g., based on the user's field of view, the size, optical properties, or other physical characteristics of one or more display-generating components, and / or the location and / or orientation of one or more display-generating components relative to the user's eyes). Viewports and viewport boundaries typically move as one or more display-generating components move (for example, with the user's head in the case of a head-mounted device, or with the user's hand in the case of a handheld device such as a tablet or smartphone). The user's viewpoint determines which content is visible within the viewport, and the viewpoint generally specifies a location and a direction relative to the three-dimensional environment, so that as the viewpoint shifts, the view of the three-dimensional environment also shifts within the viewport.
In the case of head-mounted devices, the viewpoint is typically based on the location and orientation of the user's head, face, and / or eyes to provide a view of the three-dimensional environment that is perceptually accurate and provides an immersive experience when the user is using the head-mounted device. For handheld or stationary devices, the viewpoint shifts as the handheld or stationary device moves and / or as the user's position relative to the handheld or stationary device changes (e.g., as the user moves toward or away from the device, above or below the device, to the right of the device, and / or to the left of the device). For devices that include display-generating components with virtual passthrough, the portion of the physical environment visible (e.g., displayed and / or projected) through one or more display-generating components is based on the field of view of one or more cameras communicating with the display-generating components, which typically moves with the display-generating components (e.g., moves with the user's head in a head-mounted device, or moves with the user's hand in a handheld device such as a tablet or smartphone), because the user's viewpoint moves as the field of view of one or more cameras moves (and the appearance of one or more virtual objects displayed through one or more display-generating components is updated based on the user's viewpoint (e.g., the displayed position and orientation of the virtual objects are updated based on the user's viewpoint)).
In the case of a display generation component with optical passthrough, parts of the physical environment that are visible through one or more display generation components (for example, optically visible through one or more partially or completely transparent parts of the display generation component) are based on the user's field of view through the partially or completely transparent parts of the display generation component (for example, moving with the user's head in the case of a head-mounted device, or moving with the user's hand in the case of a handheld device such as a tablet or smartphone), because the user's viewpoint moves as the user's field of view moves through the partially or completely transparent parts of the display generation component (one or more), and the appearance of one or more virtual objects is updated based on the user's viewpoint.

[0053] In some embodiments, the representation of the physical environment (e.g., displayed via virtual passthrough or optical passthrough) can be partially or completely obscured by the virtual environment. In some embodiments, the amount of the virtual environment displayed (e.g., the amount of the physical environment not displayed) is based on the level of immersion of the virtual environment (e.g., relative to the representation of the physical environment). For example, increasing the immersion level optionally displays more of the virtual environment and replaces and / or obscures more of the physical environment, while decreasing the immersion level optionally displays less of the virtual environment and reveals portions of the physical environment that were not previously displayed and / or obscured. In some embodiments, at a certain level of immersion, one or more first background objects (e.g., in the representation of the physical environment) are visually less emphasized than one or more second background objects (e.g., dimmed, blurred, and / or displayed with increased transparency), and one or more third background objects are discontinued from being displayed. 
In some embodiments, the immersion level includes the relevant degree to which the virtual content displayed by the computer system (e.g., a virtual environment and / or virtual content) obscures the background content surrounding / behind the virtual content (e.g., content other than the virtual environment and / or virtual content), and optionally includes the number of items of the background content displayed and / or the visual characteristics of the background content on which it is displayed (e.g., color, contrast, and / or opacity), the angular range of the virtual content displayed through the display-generating components (e.g., 60-degree content displayed at low immersion, 120-degree content displayed at medium immersion, or 180-degree content displayed at high immersion), and / or the percentage of the field of view displayed through the display-generating components consumed by the virtual content (e.g., 33% of the field of view consumed by the virtual content at low immersion, 66% of the field of view consumed by the virtual content at medium immersion, or 100% of the field of view consumed by the virtual content at high immersion). 
In some embodiments, the background content is included in the background on which the virtual content is displayed (e.g., background content within a representation of a physical environment). In some embodiments, background content includes a user interface (e.g., a user interface generated by a computer system corresponding to the application), virtual objects not associated with or included in the virtual environment and / or virtual content (e.g., files or representations of other users generated by the computer system), and / or real objects (e.g., pass-through objects representing real objects in the physical environment around the user, which are visible so as to be displayed through the display generation components and / or are visible through transparent or translucent components of the display generation components so as not to obscure / hinder their visibility through the display generation components by the computer system). In some embodiments, at low immersion levels (e.g., a first immersion level), the background, virtual and / or real objects are displayed in a non-obscuring manner. For example, a low-immersion virtual environment is optionally displayed simultaneously with the background content, and the background content is optionally displayed with full brightness, color, and / or transparency. In some embodiments, at higher immersion levels (e.g., a second immersion level higher than a first immersion level), backgrounds, virtual and / or real objects are displayed in an obscured manner (e.g., dimmed, blurred, or removed from the display). For example, a separate virtual environment with a high immersion level is displayed without simultaneously displaying background content (e.g., in full-screen or fully immersive mode). As another example, a virtual environment displayed at an intermediate immersion level is displayed simultaneously with background content that is dimmed, blurred, or otherwise de-emphasized.
In some embodiments, the visual characteristics of background objects differ among them. For example, at a particular immersion level, one or more first background objects are visually de-emphasized more than one or more second background objects (e.g., dimmed, blurred, and / or displayed with increased transparency), and one or more third background objects are not displayed at all. In some embodiments, a null or zero immersion level corresponds to the discontinuation of the display of the virtual environment, and instead, the representation of the physical environment is displayed (optionally together with one or more virtual objects such as applications, windows, or virtual three-dimensional objects) without the representation of the physical environment being obscured by the virtual environment. Adjusting the immersion level using physical input elements provides a quick and efficient way to adjust immersion, improving the usability of the computer system and making the user-device interface more efficient.
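As a non-limiting illustration of the immersion levels described above, the following Python sketch maps each level to the example angular ranges and field-of-view percentages given in the text, and steps the level up or down in response to a physical input element. The names ImmersionPreset, PRESETS, LEVELS, and step_level are hypothetical and do not appear in the disclosure, and the background treatment assigned to each level is one possible reading of the examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImmersionPreset:
    angular_range_deg: float  # angular range of the displayed virtual content
    fov_fraction: float       # fraction of the field of view the content consumes
    background: str           # how background content is treated (assumed labels)


# Numeric values are the examples from the text; the labels are assumptions.
PRESETS = {
    "none": ImmersionPreset(0.0, 0.0, "unobscured"),
    "low": ImmersionPreset(60.0, 0.33, "unobscured"),
    "medium": ImmersionPreset(120.0, 0.66, "dimmed"),
    "high": ImmersionPreset(180.0, 1.0, "hidden"),
}

# Levels ordered from least to most immersive.
LEVELS = ["none", "low", "medium", "high"]


def step_level(current: str, steps: int) -> str:
    """Move `steps` levels up (positive) or down (negative), clamped to the ends."""
    index = LEVELS.index(current) + steps
    return LEVELS[max(0, min(len(LEVELS) - 1, index))]


if __name__ == "__main__":
    level = step_level("low", 1)  # e.g., one detent of a rotatable input element
    print(level, PRESETS[level])
```

In this sketch the zero level corresponds to ceasing display of the virtual environment, and clamping at both ends keeps repeated rotation of the input element from wrapping around.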

[0054] Viewpoint-locked virtual objects: A virtual object is viewpoint-locked when the computer system displays the virtual object in the same location and / or position within the user's view, even if the user's viewpoint shifts (e.g., changes). In embodiments where the computer system is a head-mounted device, the user's viewpoint is locked in the forward direction of the user's head (e.g., the user's viewpoint is at least a portion of the user's field of view when the user is looking straight ahead). Thus, the user's viewpoint remains fixed even if the user's gaze shifts, without moving the user's head. In embodiments where the computer system has a display-generating component (e.g., a display screen) that can be repositioned relative to the user's head, the user's viewpoint is the augmented reality view presented to the user on the display-generating component of the computer system. For example, a viewpoint-locked virtual object displayed in the upper-left corner of the user's viewpoint when the user's viewpoint is in a first orientation (e.g., the user's head is facing north) will continue to be displayed in the upper-left corner of the user's viewpoint even if the user's viewpoint changes to a second orientation (e.g., the user's head is facing west). In other words, the location and / or position in which a viewpoint-locked virtual object is displayed from the user's viewpoint is independent of the user's position and / or orientation in the physical environment. In embodiments where the computer system is a head-mounted device, the user's viewpoint is locked to the orientation of the user's head, so that the virtual object is also referred to as a "head-locked virtual object."
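As a non-limiting illustration, the viewpoint-locked behavior described above can be sketched in a top-down, two-dimensional model. The object is stored as a fixed offset in view coordinates (distance to the right, distance forward), and its world position is recomputed whenever the viewpoint moves. The function names and the heading convention (0 degrees = east, 90 degrees = north, 180 degrees = west) are assumptions made for this example only.

```python
import math


def view_axes(heading_deg):
    """Return the (forward, right) unit vectors for a heading in degrees."""
    h = math.radians(heading_deg)
    forward = (math.cos(h), math.sin(h))
    right = (math.sin(h), -math.cos(h))
    return forward, right


def head_locked_world_position(offset, viewer_xy, heading_deg):
    """World position that keeps an object at a fixed (right, forward) view offset."""
    forward, right = view_axes(heading_deg)
    off_right, off_forward = offset
    x = viewer_xy[0] + right[0] * off_right + forward[0] * off_forward
    y = viewer_xy[1] + right[1] * off_right + forward[1] * off_forward
    return (x, y)


def world_to_view(point_xy, viewer_xy, heading_deg):
    """Express a world point as (right, forward) coordinates in the view."""
    forward, right = view_axes(heading_deg)
    dx, dy = point_xy[0] - viewer_xy[0], point_xy[1] - viewer_xy[1]
    return (dx * right[0] + dy * right[1], dx * forward[0] + dy * forward[1])


if __name__ == "__main__":
    offset = (-0.3, 1.0)  # 0.3 m to the left, 1 m ahead
    for heading in (90.0, 180.0):  # facing north, then facing west
        world = head_locked_world_position(offset, (0.0, 0.0), heading)
        print(heading, world, world_to_view(world, (0.0, 0.0), heading))
```

The world position changes as the head turns from north to west, while the position in the view stays at the same offset, which is the defining property of a viewpoint-locked object.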

[0055] Environment-Locked Virtual Objects: A virtual object is environment-locked (or "world-locked") when a computer system displays it at a location and / or position in the user's viewpoint that is based on (e.g., selected by reference to and / or fixed to) a location and / or object in a three-dimensional environment (e.g., a physical or virtual environment). As the user's viewpoint shifts, the location and / or object in the environment relative to the user's viewpoint changes, and as a result, the environment-locked virtual object will appear at a different location and / or position in the user's viewpoint. For example, an environment-locked virtual object locked to a tree directly in front of the user will appear centered in the user's viewpoint. If the user's viewpoint shifts to the right (e.g., the user's head is turned to the right) and the tree becomes left of center in the user's viewpoint (e.g., the tree's position in the user's viewpoint shifts), the environment-locked virtual object locked to the tree will appear left of center in the user's viewpoint. In other words, the location and / or position in which an environment-locked virtual object is displayed in the user's viewpoint depends on the position and / or orientation of the location and / or object in the environment to which the virtual object is locked. In some embodiments, the computer system uses a stationary reference frame (e.g., a fixed location in the physical environment and / or a coordinate system fixed to an object) to determine the position in which the environment-locked virtual object is displayed from the user's viewpoint. 
The environment-locked virtual object can be locked to a stationary part of the environment (e.g., a floor, wall, table, or other stationary object) or to a moving part of the environment (e.g., a vehicle, animal, person, or a representation of a part of the user's body that moves independently of the user's viewpoint, such as the user's hands, wrists, arms, or feet), so that the virtual object moves as the viewpoint or the part of the environment moves in order to maintain a fixed relationship between the virtual object and the part of the environment.
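As a non-limiting illustration, the tree example above can be reproduced with a top-down, two-dimensional calculation. The anchor stays at a fixed world position, and its signed angle from the center of the view is recomputed as the viewpoint turns. The function name bearing_in_view and the heading convention (90 degrees = north, with smaller headings turning to the right) are assumptions made for this example only.

```python
import math


def bearing_in_view(anchor_xy, viewer_xy, heading_deg):
    """Signed angle in degrees of a world anchor from the view center.

    Negative values are left of center, positive values are right of center.
    """
    h = math.radians(heading_deg)
    forward = (math.cos(h), math.sin(h))
    right = (math.sin(h), -math.cos(h))
    dx = anchor_xy[0] - viewer_xy[0]
    dy = anchor_xy[1] - viewer_xy[1]
    right_component = dx * right[0] + dy * right[1]
    forward_component = dx * forward[0] + dy * forward[1]
    return math.degrees(math.atan2(right_component, forward_component))


if __name__ == "__main__":
    tree = (0.0, 5.0)    # anchor 5 m north of the user
    user = (0.0, 0.0)
    print(bearing_in_view(tree, user, 90.0))  # facing the tree: approximately 0
    print(bearing_in_view(tree, user, 60.0))  # head turned 30 degrees right: approximately -30
```

When the anchor itself moves (e.g., a hand or a vehicle), the same calculation is repeated with the anchor's updated world position, so the object keeps a fixed relationship to that part of the environment.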

[0056] In some embodiments, an environment-locked or viewpoint-locked virtual object exhibits delayed tracking behavior, reducing or delaying its movement in response to the movement of a reference point that the virtual object is following. In some embodiments, when exhibiting delayed tracking behavior, the computer system detects movement of the reference point that the virtual object is following (e.g., a part of the environment, a viewpoint, or a point fixed to the viewpoint, such as a point between 5 and 300 cm from the viewpoint) and intentionally delays the movement of the virtual object. For example, when the reference point (e.g., a part of the environment or the viewpoint) moves at a first velocity, the virtual object is moved by the device so as to remain locked to the reference point, but at a second velocity slower than the first velocity (e.g., until the reference point stops or slows down, at which point the virtual object begins to catch up to the reference point). In some embodiments, when a virtual object exhibits delayed tracking behavior, the device ignores small movements of the reference point (e.g., ignoring movements of the reference point that are below a threshold movement amount, such as a movement of 0 to 5 degrees or a movement of 0 to 50 cm). 
For example, when the reference point (e.g., the part of the environment or the viewpoint to which the virtual object is locked) moves by a first amount, the distance between the reference point and the virtual object increases (e.g., because the virtual object is displayed so as to maintain a fixed or substantially fixed position relative to a viewpoint or part of the environment that is different from the reference point to which the virtual object is locked), and when the reference point moves by a second amount greater than the first amount, the distance between the reference point and the virtual object first increases (e.g., because the virtual object is displayed so as to maintain a fixed or substantially fixed position relative to a viewpoint or part of the environment that is different from the reference point to which the virtual object is locked), and then decreases as the amount of movement of the reference point increases beyond a threshold (e.g., a "delayed tracking" threshold), because the virtual object is moved by the computer system to maintain a fixed or substantially fixed position relative to the reference point. In some embodiments, a virtual object that maintains a substantially fixed position with respect to a reference point includes the virtual object being displayed within a threshold distance (e.g., 1, 2, 3, 5, 15, 20, 50 cm) of the reference point in one or more dimensions (e.g., above / below, left / right, and / or forward / behind the position of the reference point).
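As a non-limiting illustration, the delayed tracking behavior described above can be sketched in one dimension as a dead zone combined with a speed limit. Movements of the reference point that leave the gap within the threshold are ignored, and larger gaps are closed at a limited speed until the object is back within the threshold distance. The function name lazy_follow_step and the parameter values are assumptions made for this example, and this is only one of several ways the behavior could be realized.

```python
import math


def lazy_follow_step(obj, ref, dead_zone, max_speed, dt):
    """Advance the object position by one time step toward the reference point.

    obj, ref  : current positions along one axis (e.g., in meters)
    dead_zone : gap below which movement of the reference point is ignored
    max_speed : upper bound on the object's speed (slower than the reference)
    dt        : duration of the time step in seconds
    """
    gap = ref - obj
    if abs(gap) <= dead_zone:
        return obj  # small movement: the object stays where it is
    # Close the gap only down to the dead zone, and never faster than max_speed.
    step = min(abs(gap) - dead_zone, max_speed * dt)
    return obj + math.copysign(step, gap)


if __name__ == "__main__":
    obj = 0.0
    # A 3 cm movement of the reference point is below the 5 cm threshold.
    obj = lazy_follow_step(obj, 0.03, dead_zone=0.05, max_speed=0.5, dt=0.1)
    print(obj)
    # A 1 m movement exceeds the threshold; the object catches up gradually.
    for _ in range(30):
        obj = lazy_follow_step(obj, 1.0, dead_zone=0.05, max_speed=0.5, dt=0.1)
    print(obj)
```

With these assumed values the object ends up about 5 cm short of the reference point, which corresponds to the "substantially fixed position" within a threshold distance described above.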

[0057] Hardware: There are many different types of electronic systems that enable a person to perceive and / or interact with various XR environments. Examples include head-mounted systems, projection-based systems, head-up displays (HUDs), vehicle windshields with integrated display capabilities, windows with integrated display capabilities, displays formed as lenses designed to be positioned over a person's eyes (e.g., contact lenses), headphones / earphones, speaker arrays, input systems (e.g., wearable or handheld controllers with or without haptic feedback), smartphones, tablets, and desktop / laptop computers. A head-mounted system may have one or more speakers and an integrated opaque display. Alternatively, a head-mounted system may be configured to accept an external opaque display (e.g., a smartphone). A head-mounted system may incorporate one or more imaging sensors for capturing images or videos of the physical environment and / or one or more microphones for capturing audio of the physical environment. A head-mounted system may have a transparent or translucent display instead of an opaque display. A transparent or translucent display may have a medium through which light representing an image is directed to a person's eye. The display may utilize digital light projection, OLED, LED, uLED, liquid crystal on silicon, laser scanning light source, or any combination of these technologies. The medium may be an optical waveguide, a holographic medium, an optical coupler, an optical reflector, or any combination thereof. In one embodiment, the transparent or translucent display may be configured to be selectively opaque. The projection-based system may employ retinal projection technology to project a graphical image onto a person's retina. The projection system may also be configured to project virtual objects into the physical environment, for example, as a hologram or onto a physical surface. 
In some embodiments, the controller 110 is configured to manage and adjust the XR experience for the user. In some embodiments, the controller 110 includes a suitable combination of software, firmware, and / or hardware. The controller 110 is described in more detail below with reference to Figure 2. In some embodiments, the controller 110 is a computing device that is local or remote to the scene 105 (e.g., the physical environment). For example, the controller 110 is a local server located within the scene 105. In another example, the controller 110 is a remote server located outside the scene 105 (e.g., a cloud server, a central server, etc.). In some embodiments, the controller 110 is communicably coupled to a display generation component 120 (e.g., an HMD, a display, a projector, a touchscreen, etc.) via one or more wired or wireless communication channels 144 (e.g., BLUETOOTH®, IEEE802.11x, IEEE802.16x, IEEE802.3x, etc.). In another example, the controller 110 is contained within a housing (e.g., a physical housing) of one or more of the display generation components 120 (e.g., a portable electronic device including a display and one or more processors), one or more of the input devices 125, one or more of the output devices 155, one or more of the sensors 190, and / or peripheral devices 195, or shares the same physical housing or support structure as one or more of the above.

[0058] In some embodiments, the display generation component 120 is configured to provide the user with an XR experience (e.g., at least the visual components of the XR experience). In some embodiments, the display generation component 120 includes a suitable combination of software, firmware, and / or hardware. The display generation component 120 is described in more detail below with reference to Figure 3. In some embodiments, the functions of the controller 110 are provided by and / or combined with the display generation component 120.

[0059] According to some embodiments, the display generation component 120 provides the user with an XR experience while the user is virtually and / or physically present in the scene 105.

[0060] In some embodiments, the display generation component is mounted on a part of the user's body (e.g., their head or hand). Thus, the display generation component 120 includes one or more XR displays provided for displaying XR content. For example, in various embodiments, the display generation component 120 surrounds the user's field of view. In some embodiments, the display generation component 120 is a handheld device (such as a smartphone or tablet) configured to present XR content, and the user holds the device, which has a display directed towards the user's field of view and a camera directed towards the scene 105. In some embodiments, the handheld device is optionally placed in a housing mounted on the user's head. In some embodiments, the handheld device is optionally placed on a support in front of the user (e.g., a tripod). In some embodiments, the display generation component 120 is an XR chamber, housing, or room configured to present XR content without the user wearing or holding the display generation component 120. Many user interfaces described with reference to one type of hardware for displaying XR content (e.g., a handheld device or a device on a tripod) may be implemented on another type of hardware for displaying XR content (e.g., an HMD or other wearable computing device). For example, a user interface showing interaction with XR content triggered based on interaction occurring in the space in front of a handheld or tripod-mounted device may be implemented similarly with an HMD, where the interaction occurs in the space in front of the HMD and the XR content response is displayed through the HMD. 
Similarly, a user interface showing interaction with XR content triggered based on the movement of a handheld or tripod-mounted device relative to a physical environment (e.g., the scene 105 or a part of the user's body (e.g., the user's eyes, head, or hands)) may be implemented similarly with an HMD, where the interaction is triggered by the movement of the HMD relative to the physical environment (e.g., the scene 105 or a part of the user's body (e.g., the user's eyes, head, or hands)).

[0061] While relevant features of the operating environment 100 are shown in Figure 1A, those skilled in the art will understand from this disclosure that various other features have been omitted for brevity and so as not to obscure the more pertinent aspects of the exemplary embodiments disclosed herein.

[0062] Figures 1A to 1P show various examples of computer systems used to carry out the method and to provide audio, visual, and / or haptic feedback as part of the user interface described herein. In some embodiments, the computer system optionally includes one or more display generating components (e.g., first and second display assemblies 1-120a, 1-120b and / or first and second optical modules 11.1.1-104a and 11.1.1-104b) for displaying to the user of the computer system a representation of virtual elements and / or a physical environment generated based on detected events and / or user input detected by the computer system. The user interface generated by the computer system is optionally corrected by one or more corrective lenses 11.3.2-216 (optionally detachably attached to one or more of the optical modules) to make it easier for users who otherwise correct their vision using glasses or contact lenses to view the user interface. While many of the user interfaces shown herein show a single view of the user interface, the user interface in the HMD is optionally displayed using two optical modules (e.g., first and second display assemblies 1-120a, 1-120b and / or first and second optical modules 11.1.1-104a and 11.1.1-104b), one for the user's right eye and a different one for the user's left eye, with slightly different images presented to the two different eyes to create a three-dimensional depth illusion, and the single view of the user interface is typically either the right-eye or left-eye view, and the depth effect is described in text or using other schematic diagrams or views. In some embodiments, the computer system includes one or more external displays (e.g., display assembly 1-108) for displaying status information of the computer system to the user of the computer system (e.g., when the computer system is not being worn) and / or to other people near the computer system, which is optionally generated based on detected events and / or user input detected by the computer system. 
In some embodiments, the computer system includes one or more audio output components (e.g., electronic component 1-112) for generating audio feedback, which is optionally generated based on detected events and / or user input detected by the computer system. In some embodiments, the computer system includes one or more input devices for detecting inputs, such as one or more sensors (e.g., sensor assembly 1-356 and / or one or more sensors in Figure 1I) for detecting information about the physical environment of the device, which can be used (optionally in conjunction with one or more illuminators, such as the illuminator shown in Figure 1I) to generate a digital passthrough image, capture a visual medium (e.g., photograph and / or video) corresponding to the physical environment, or determine the pose (e.g., position and / or orientation) of physical objects and / or surfaces in the physical environment, so that virtual objects can be positioned based on the detected pose of the physical objects and / or surfaces. In some embodiments, the computer system includes one or more input devices for detecting input, such as one or more sensors for detecting the position and / or movement of a hand (e.g., sensor assembly 1-356 and / or one or more sensors in Figure 1I), which may be used to determine when one or more air gestures were performed (optionally in conjunction with one or more illuminators, such as illuminator 6-124 shown in Figure 1I). In some embodiments, the computer system includes one or more input devices for detecting input, such as one or more sensors for detecting eye movement (e.g., eye-tracking and gaze-tracking sensors in Figure 1I), which may be used (optionally, in conjunction with one or more lights, such as lights 11.3.2-110 in Figure 1O) to determine attention or gaze position and / or gaze movement, which may be used to detect gaze-only input based on gaze movement and / or dwell time. 
Using the various combinations of sensors described above, it may be possible to determine the user's facial expressions and / or hand movements for use when generating the user's avatar or representation, such as a personified avatar or representation for use in a real-time communication session, the avatar having facial expressions, hand movements and / or body movements that are based on or similar to the detected facial expressions, hand movements and / or body movements of the user of the device. Gaze and / or attention information is optionally combined with hand tracking information to determine interactions between the user and one or more user interfaces based on direct and / or indirect inputs such as air gestures or inputs using one or more hardware input devices, including buttons (e.g., first buttons 1-128, buttons 11.1.1-114, second buttons 1-132, and / or dials or buttons 1-328), knobs (e.g., first buttons 1-128, buttons 11.1.1-114, and / or dials or buttons 1-328), digital crowns (e.g., pressable, twistable, or rotatable first buttons 1-128, buttons 11.1.1-114, and / or dials or buttons 1-328), trackpads, touchscreens, keyboards, mice, and / or other input devices. One or more buttons (for example, the first buttons 1-128, buttons 11.1.1-114, the second button 1-132, and / or the dial or button 1-328) are optionally used to perform system actions such as re-centering content in a three-dimensional environment visible to the device user, displaying a home user interface for launching an application, starting a real-time communication session, or starting to display a virtual three-dimensional background. 
A knob or digital crown (e.g., a first button 1-128, button 11.1.1-114, and / or dial or button 1-328, which is pressable and twistable or rotatable) is optionally rotatable to adjust parameters of the visual content, such as the level of immersion of the virtual three-dimensional environment (e.g., the extent to which the virtual content occupies the user's viewport into the three-dimensional environment), or other parameters associated with the virtual content displayed via the three-dimensional environment and optical modules (e.g., first and second display assemblies 1-120a, 1-120b, and / or first and second optical modules 11.1.1-104a and 11.1.1-104b).
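As a non-limiting illustration of the gaze-only input based on dwell time mentioned above, the following Python sketch scans a sequence of timestamped gaze samples and reports a dwell once the gaze has stayed within a small radius of a point for a minimum duration. The function name detect_dwell, the sample format, and the threshold values are assumptions made for this example and are not taken from the disclosure.

```python
import math


def detect_dwell(samples, dwell_time, radius):
    """Return the (x, y) point on which the gaze dwelled, or None.

    samples    : iterable of (t, x, y) gaze samples in increasing time order
    dwell_time : minimum time in seconds the gaze must stay near one point
    radius     : maximum distance from that point that still counts as dwelling
    """
    anchor = None  # (t, x, y) of the sample that started the current dwell
    for t, x, y in samples:
        if anchor is None or math.hypot(x - anchor[1], y - anchor[2]) > radius:
            anchor = (t, x, y)  # gaze moved away: restart the dwell timer here
        elif t - anchor[0] >= dwell_time:
            return (anchor[1], anchor[2])
    return None


if __name__ == "__main__":
    steady = [(0.0, 0.0, 0.0), (0.2, 0.01, 0.0), (0.4, 0.0, 0.01), (0.6, 0.01, 0.01)]
    print(detect_dwell(steady, dwell_time=0.5, radius=0.05))
```

A practical system would likely combine such a detector with the hand-tracking and hardware inputs described above, for example by treating a dwell as a selection only when no conflicting air gesture is in progress.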

[0063] Figure 1B shows front, top, and perspective views of an example of a head-mountable display (HMD) device 1-100, which is worn by a user and configured to provide a virtual and augmented / mixed reality (VR / AR) experience. The HMD 1-100 may include a display unit 1-102 or assembly, an electronic strap assembly 1-104 connected to and extending from the display unit 1-102, and a band assembly 1-106 fixed to the electronic strap assembly 1-104 at either end. The electronic strap assembly 1-104 and the band assembly 1-106 may be part of a retaining assembly configured to wrap around the user's head to hold the display unit 1-102 against the user's face.

[0064] In at least one example, the band assembly 1-106 may include a first band 1-116 configured to wrap around the back of the user's head and a second band 1-117 configured to extend over the top of the user's head. The second band 1-117 may extend between the first electronic strap 1-105a and the second electronic strap 1-105b of the electronic strap assembly 1-104, as shown. The electronic strap assembly 1-104 and the band assembly 1-106 may be part of a fastening mechanism that extends rearward from the display unit 1-102 and is configured to hold the display unit 1-102 against the user's face.

[0065] In at least one example, the fastening mechanism includes a first electronic strap 1-105a, which includes a first proximal end 1-134 coupled to a housing 1-150 of the display unit 1-102, for example, and a first distal end 1-136 opposite the first proximal end 1-134. The fastening mechanism may also include a second electronic strap 1-105b, which includes a second proximal end 1-138 coupled to the housing 1-150 of the display unit 1-102, and a second distal end 1-140 opposite the second proximal end 1-138. The fastening mechanism may also include a first band 1-116 having a first end 1-142 coupled to the first distal end 1-136 and a second end 1-144 coupled to the second distal end 1-140, and a second band 1-117 extending between the first electronic strap 1-105a and the second electronic strap 1-105b. The straps 1-105a and 1-105b and the band 1-116 may be connected via a connecting mechanism or assembly 1-114. In at least one example, the second band 1-117 includes a first end 1-146 coupled to the first electronic strap 1-105a between the first proximal end 1-134 and the first distal end 1-136, and a second end 1-148 coupled to the second electronic strap 1-105b between the second proximal end 1-138 and the second distal end 1-140.

[0066] In at least one example, the first and second electronic straps 1-105a-b include plastic, metal, or other structural material that forms the shape of substantially rigid straps 1-105a-b. In at least one example, the first and second bands 1-116, 1-117 are formed from an elastic flexible material, including woven fabric, rubber, etc. The first and second bands 1-116, 1-117 may be flexible to conform to the shape of the user's head when the HMD 1-100 is worn.

[0067] In at least one example, one or more of the first and second electronic straps 1-105a-b may define an internal strap volume and include one or more electronic components disposed within that internal strap volume. In one example, as shown in Figure 1B, the first electronic strap 1-105a may include electronic component 1-112. In one example, electronic component 1-112 may include a speaker. In another example, electronic component 1-112 may include a computing component such as a processor.

[0068] In at least one example, the housing 1-150 defines a first forward-facing opening 1-152. The display assembly 1-108 is positioned to block the first opening 1-152 from view when the HMD 1-100 is assembled, so the forward-facing opening is labeled with a dotted line at 1-152 in Figure 1B. The housing 1-150 may also define a second rearward-facing opening 1-154. The housing 1-150 also defines an internal volume between the first opening 1-152 and the second opening 1-154. In at least one example, the HMD 1-100 includes a display assembly 1-108, which may include a front cover and a display screen (shown in other figures) disposed within or across the front opening 1-152 to block the front opening 1-152. In at least one example, the display screen of display assembly 1-108 has a curvature configured to follow the curvature of the user's face, as well as the display assembly 1-108 as a whole. The display screen of display assembly 1-108 can be curved to complement the features of the user's face and the overall curvature from one side of the face to the other, for example, from left to right and / or from top to bottom when the display unit 1-102 is pressed.

[0069] In at least one example, the housing 1-150 may define a first aperture 1-126 between the first opening 1-152 and the second opening 1-154, and a second aperture 1-130 between the first opening 1-152 and the second opening 1-154. The HMD 1-100 may also include a first button 1-128 disposed in the first aperture 1-126 and a second button 1-132 disposed in the second aperture 1-130. The first and second buttons 1-128 and 1-132 may be pressable through their respective apertures 1-126 and 1-130. In at least one example, the first button 1-128 and / or the second button 1-132 may be a twistable dial as well as a pressable button. In at least one example, the first button 1-128 is a pressable and twistable dial button, and the second button 1-132 is a pressable button.

[0070] Figure 1C shows a rear perspective view of the HMD 1-100. The HMD 1-100 may include an optical seal 1-110 extending rearward from the housing 1-150 of the display assembly 1-108 and around the outer periphery of the housing 1-150, as shown. The optical seal 1-110 may be configured to extend from the housing 1-150 to the user's face around the user's eyes to block external light from being visible. In one example, the HMD 1-100 may include first and second display assemblies 1-120a, 1-120b disposed in or within a rearward-facing second opening 1-154 defined by the housing 1-150 and / or disposed within the internal volume of the housing 1-150 and configured to project light through the second opening 1-154. In at least one example, each display assembly 1-120a-b may include respective display screens 1-122a, 1-122b configured to project light backward through the second opening 1-154 toward the user's eyes.

[0071] In at least one example, referring to both Figures 1B and 1C, the display assembly 1-108 may be a forward-facing display assembly including a display screen configured to project light in a first forward direction, and the rear-facing display screens 1-122a-b may be configured to project light in a second rear direction opposite to the first direction. As described above, the optical seal 1-110 may be configured to block light external to the HMD 1-100, including light projected by the forward-facing display screen of the display assembly 1-108 shown in the front perspective view of Figure 1B, from reaching the user's eyes. In at least one example, the HMD 1-100 may also include a curtain 1-124 that closes the second opening 1-154 between the housing 1-150 and the rear-facing display assemblies 1-120a-b. In at least one example, the curtain 1-124 may be elastic or at least partially elastic.

[0072] Any of the features, components, and / or parts shown in Figures 1B and 1C, including their arrangement and configuration, may be included, individually or in any combination, in any other example of devices, features, components, and parts shown in Figures 1D to 1F and described herein. Similarly, any of the features, components, and / or parts shown and described with reference to Figures 1D to 1F, including their arrangement and configuration, may be included, individually or in any combination, in the examples of devices, features, components, and parts shown in Figures 1B and 1C.

[0073] Figure 1D shows an exploded view of an example of an HMD 1-200, which includes various parts or components separated according to modularity and the selective coupling of their components. For example, the HMD 1-200 may include a band 1-216 that can be selectively coupled to first and second electronic straps 1-205a, 1-205b. The first fastening strap 1-205a may include a first electronic component 1-212a, and the second fastening strap 1-205b may include a second electronic component 1-212b. In at least one example, the first and second straps 1-205a and 1-205b may be detachably coupled to a display unit 1-202.

[0074] In addition, the HMD 1-200 may include an optical seal 1-210 configured to be detachably coupled to the display unit 1-202. The HMD 1-200 may also include a lens 1-218 that can be detachably coupled to the display unit 1-202, for example, on first and second display assemblies including a display screen. The lens 1-218 may include a customized prescription lens configured for vision correction. As stated, each component shown in the exploded view of Figure 1D and described above may be detachably coupled, mounted, remounted, and replaced in order to update or replace parts for different users. For example, bands such as band 1-216, optical seals such as optical seal 1-210, lenses such as lens 1-218, and electronic straps such as straps 1-205a-b may be replaced on a user-by-user basis so that these components are customized to fit and correspond to individual users of the HMD 1-200.

[0075] Any of the features, components, and / or parts shown in Figure 1D, including their arrangement and configuration, may be included, individually or in any combination, in any other example of devices, features, components, and parts shown in Figures 1B, 1C, and 1E-1F and described herein. Similarly, any of the features, components, and / or parts shown and described with reference to Figures 1B, 1C, and 1E-1F, including their arrangement and configuration, may be included, individually or in any combination, in the examples of devices, features, components, and parts shown in Figure 1D.

[0076] Figure 1E shows an exploded view of an example of a display unit 1-306 of an HMD. Display unit 1-306 may include a front display assembly 1-308, a frame / housing assembly 1-350, and a curtain assembly 1-324. Display unit 1-306 may also include a sensor assembly 1-356, a logic board assembly 1-358, and a cooling assembly 1-360, disposed between the frame assembly 1-350 and the front display assembly 1-308. In at least one example, display unit 1-306 may also include a rear-facing display assembly 1-320, which includes first and second rear-facing display screens 1-322a, 1-322b, disposed between the frame 1-350 and the curtain assembly 1-324.

[0077] In at least one example, the display unit 1-306 may also include a motor assembly 1-362 configured as an adjustment mechanism for adjusting the position of the display screens 1-322a-b of the display assembly 1-320 relative to the frame 1-350. In at least one example, the display assembly 1-320 is mechanically coupled to a motor assembly 1-362 with at least one motor for each display screen 1-322a-b, so that the motors can translate the display screens 1-322a-b to match the interpupillary distance of the user's eyes.
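As a non-limiting illustration of the interpupillary adjustment described above, the following Python sketch computes a target lateral position for each display screen from a measured interpupillary distance, and the travel each motor would need from its current position. The function names, the symmetric placement about the center line, and the 54 mm to 74 mm adjustment range are all assumptions made for this example and are not taken from the disclosure.

```python
def target_screen_positions_mm(ipd_mm, min_ipd_mm=54.0, max_ipd_mm=74.0):
    """Return (left, right) lateral screen centers in mm for a given IPD.

    Positions are measured from the center line of the display unit; the IPD
    is clamped to an assumed mechanical adjustment range.
    """
    ipd = max(min_ipd_mm, min(max_ipd_mm, ipd_mm))
    half = ipd / 2.0
    return (-half, half)


def motor_travel_mm(current_positions, ipd_mm):
    """Return the signed travel in mm for the left and right screen motors."""
    target_left, target_right = target_screen_positions_mm(ipd_mm)
    current_left, current_right = current_positions
    return (target_left - current_left, target_right - current_right)


if __name__ == "__main__":
    # Screens currently set for a 60 mm IPD; the new user measures 64 mm.
    print(motor_travel_mm((-30.0, 30.0), 64.0))
```

Using one motor per display screen, as in the example above, would also allow asymmetric placement; this sketch keeps the two screens symmetric for simplicity.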

[0078] In at least one example, the display unit 1-306 may include a dial or button 1-328 that is pressable relative to the frame 1-350 and accessible to the user outside the frame 1-350. The button 1-328 may be electronically connected to the motor assembly 1-362 via a controller so that the user can operate the button 1-328 to cause the motors of the motor assembly 1-362 to adjust the position of the display screens 1-322a-b.

[0079] Any of the features, components, and / or parts shown in Figure 1E, including their arrangement and configuration, may be included, individually or in any combination, in any other example of devices, features, components, and parts shown in Figures 1B, 1D, and 1F and described herein. Similarly, any of the features, components, and / or parts shown and described with reference to Figures 1B, 1D, and 1F, including their arrangement and configuration, may be included, individually or in any combination, in the examples of devices, features, components, and parts shown in Figure 1E.

[0080] Figure 1F shows an exploded view of another example of a display unit 1-406 of an HMD device similar to other HMD devices described herein. Display unit 1-406 may include a forward display assembly 1-402, a sensor assembly 1-456, a logic board assembly 1-458, a cooling assembly 1-460, a frame assembly 1-450, a rear-facing display assembly 1-421, and a curtain assembly 1-424. Display unit 1-406 may also include a motor assembly 1-462 for adjusting the positions of the first and second display subassemblies 1-420a, 1-420b of the rear-facing display assembly 1-421, which include the first and second display screens, respectively, for interpupillary adjustment, as described above.

[0081] Various components, systems, and assemblies shown in the exploded view of Figure 1F are described in more detail herein with reference to Figures 1B to 1E and subsequent figures referenced herein. The display unit 1-406 shown in Figure 1F may be assembled and integrated with the fastening mechanisms shown in Figures 1B to 1E, including the electronic straps, bands, and other components such as optical seals and connecting assemblies.

[0082] Any of the features, components, and / or parts shown in Figure 1F, including their arrangement and configuration, may be included, individually or in any combination, in any other example of devices, features, components, and parts shown in Figures 1B to 1E and described herein. Similarly, any of the features, components, and / or parts shown and described with reference to Figures 1B to 1E, including their arrangement and configuration, may be included, individually or in any combination, in the examples of devices, features, components, and parts shown in Figure 1F.

[0083] Figure 1G shows a perspective exploded view of a front cover assembly 3-100 of an HMD device described herein, for example, front cover assembly 3-1 of the HMD 3-100 shown in Figure 1G, or any other HMD device illustrated and described herein. The front cover assembly 3-100 shown in Figure 1G may include a transparent or translucent cover 3-102, a shroud 3-104 (or "canopy"), an adhesive layer 3-106, a display assembly 3-108 including a lenticular lens panel or array 3-110, and a structural trim 3-112. The adhesive layer 3-106 can fasten the shroud 3-104 and / or the transparent cover 3-102 to the display assembly 3-108 and / or the trim 3-112. The trim 3-112 can fasten various components of the front cover assembly 3-100 to the frame or chassis of the HMD device.

[0084] In at least one example, as shown in Figure 1G, the transparent cover 3-102, the shroud 3-104, and the display assembly 3-108 including the lenticular lens array 3-110 can be curved to adapt to the curvature of the user's face. The transparent cover 3-102 and shroud 3-104 can be curved in two or three dimensions, for example, curving vertically in the Z direction inside and outside the ZX plane, and curving horizontally in the X direction inside and outside the ZX plane. In at least one example, the display assembly 3-108 may include a display panel having a lenticular lens array 3-110, as well as pixels configured to project light through the shroud 3-104 and the transparent cover 3-102. The display assembly 3-108 can be curved in at least one direction, for example, horizontally, to adapt to the curvature of the user's face from one side (e.g., left side) to the other side (e.g., right side). In at least one example, as shown and described in more detail in subsequent figures, each layer or component of the display assembly 3-108, which may include a lenticular lens array 3-110 and a display layer, can be curved horizontally, similarly or concentrically, to adapt to the curvature of the user's face.

[0085] In at least one example, the shroud 3-104 may include a transparent or translucent material from which the display assembly 3-108 projects light. In one example, the shroud 3-104 may include one or more opaque portions, such as opaque ink-printed portions or other opaque film portions, on the rear surface of the shroud 3-104. The rear surface may be the surface of the shroud 3-104 that faces the user's eyes when the HMD device is worn. In at least one example, the opaque portions may be on the front surface of the shroud 3-104 opposite the rear surface. In at least one example, one or more opaque portions of the shroud 3-104 may include perimeter portions that visually conceal any components around the perimeter of the display screen of the display assembly 3-108. In this way, the opaque portions of the shroud conceal any other components, including electronic components, structural components, etc., of the HMD device that would otherwise be visible through the transparent or translucent cover 3-102 and / or the shroud 3-104.

[0086] In at least one example, the shroud 3-104 can define one or more apertures or transparent portions 3-120 through which a sensor can send and receive signals. In one example, portion 3-120 is an aperture through which a sensor can extend or send and receive signals. In one example, portion 3-120 is a transparent portion, or a portion more transparent than the surrounding translucent or opaque portions of the shroud, through which the sensor can send and receive signals through the shroud and through the transparent cover 3-102. In one example, the sensor may include a camera, an IR sensor, a LUX sensor, or any other visual or non-visual environment sensor of the HMD device.

[0087] Any of the features, components, and / or parts shown in Figure 1G, including their arrangement and configuration, may be included, individually or in any combination, in any other example of devices, features, components, and parts described herein. Similarly, any of the features, components, and / or parts shown and described herein, including their arrangement and configuration, may be included, individually or in any combination, in the example of devices, features, components, and parts shown in Figure 1G.

[0088] Figure 1H shows an exploded view of an example of HMD device 6-100. HMD device 6-100 may include a sensor array or system 6-102 which includes one or more sensors, cameras, projectors, etc., attached to one or more components of HMD 6-100. In at least one example, the sensor system 6-102 may include a bracket 1-338 to which one or more sensors of the sensor system 6-102 can be fixed / attached.

[0089] Figure 1I shows a portion of the HMD device 6-100, including the front transparent cover 6-104 and the sensor system 6-102. The sensor system 6-102 may include multiple different sensors, emitters, and receivers, including a camera, IR sensor, and projector. The transparent cover 6-104 is shown in front of the sensor system 6-102 to show the relative positions of the various sensors and emitters and the orientation of each sensor / emitter in the system 6-102. As used herein, “lateral,” “side,” “horizontal,” and other similar terms refer to orientation or direction as indicated by the X-axis shown in Figure 1J. Terms such as “vertical,” “up,” “down,” and similar terms refer to orientation or direction as indicated by the Z-axis shown in Figure 1J. Terms such as “forward,” “backward,” “front,” “rear,” and similar terms refer to orientation or direction as indicated by the Y-axis shown in Figure 1J.

[0090] In at least one example, a transparent cover 6-104 can define the outer front surface of the HMD device 6-100, and a sensor system 6-102, including various sensors and their components, can be positioned behind the cover 6-104 in the Y-axis / direction. The cover 6-104 may be transparent or translucent to allow both the light detected by the sensor system 6-102 and the light emitted thereby to pass through the cover 6-104.

[0091] As described elsewhere in this specification, the HMD device 6-100 may include one or more controllers, including processors, for electrically coupling the various sensors and emitters of the sensor system 6-102 to one or more motherboards, processing units, and other electronic devices such as display screens. In addition, as will be shown in more detail below with reference to other figures, the various sensors, emitters, and other components of the sensor system 6-102 may be coupled to various structural frame members, brackets, etc. of the HMD device 6-100, which are not shown in Figure 1I. For illustrative clarity, Figure 1I shows the components of the sensor system 6-102 unattached to, and electrically uncoupled from, other components.

[0092] In at least one example, the device may include one or more controllers having processors configured to execute instructions stored on memory components electrically coupled to the processors. The instructions may include, or cause the processors to execute, one or more algorithms for self-correcting the angles and positions of the various cameras described herein over time with use, as the initial positions, angles, or orientations of the cameras are bumped or deformed due to an unintended drop event or other event.
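
The specification does not describe the self-correction algorithm itself. Purely as an illustration of the general idea, the following sketch keeps a running estimate of one camera's angular offset and blends in each new measurement with an exponential moving average. The function name, the `smoothing` parameter, and the use of an exponential moving average are all assumptions made for this example and are not taken from the specification.

```python
def update_offset_estimate(previous_deg, measured_deg, smoothing=0.1):
    """Blend a new angular-offset measurement into a running estimate.

    previous_deg: current estimate of the camera's angular offset (degrees).
    measured_deg: newly measured offset (degrees), e.g. after a drop event.
    smoothing:    hypothetical blend factor in (0, 1]; larger values react
                  faster to a sudden change but pass through more noise.
    """
    if not 0.0 < smoothing <= 1.0:
        raise ValueError("smoothing must be in (0, 1]")
    # Exponential moving average: weighted mix of old estimate and new sample.
    return (1.0 - smoothing) * previous_deg + smoothing * measured_deg


# Usage: the estimate drifts toward a new, persistent offset over time.
estimate = 0.0
for _ in range(200):
    estimate = update_offset_estimate(estimate, 0.3)
```

With repeated measurements of the same offset, the estimate converges toward that offset, which could then be applied as a correction when combining images from several cameras.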

[0093] In at least one example, the sensor system 6-102 may include one or more scene cameras 6-106. System 6-102 may include two scene cameras 6-106 positioned on either side of the bridge or arch of the HMD device 6-100, such that each of the two cameras 6-106 roughly corresponds to the position of one of the user's left and right eyes behind the cover 6-104. In at least one example, the scene cameras 6-106 are generally oriented forward in the Y direction to capture images in front of the user while the HMD 6-100 is in use. In at least one example, the scene cameras are color cameras and provide images and content for MR video passthrough to a display screen facing the user's eyes when the HMD device 6-100 is in use. The scene cameras 6-106 can also be used for environment and object reconstruction.

[0094] In at least one example, the sensor system 6-102 may include a first depth sensor 6-108 that is generally oriented forward in the Y direction. In at least one example, the first depth sensor 6-108 can be used for reconstructing the environment and objects, as well as tracking the user's hands and body. In at least one example, the sensor system 6-102 may include a second depth sensor 6-110 that is centrally positioned along the width of the HMD device 6-100 (for example, along the X axis). For example, the second depth sensor 6-110 can be positioned to align with the central bridge or feature above the user's nose when the HMD 6-100 is worn. In at least one example, the second depth sensor 6-110 can be used for reconstructing the environment and objects, as well as tracking the hands and body. In at least one example, the second depth sensor may include a LIDAR sensor.

[0095] In at least one example, the sensor system 6-102 may include a generally forward-facing depth projector 6-112 to project electromagnetic waves, for example, in the form of a predetermined pattern of light dots, into and within the field of view of the user and / or scene camera 6-106, or into and within the field of view including and beyond the field of view of the user and / or scene camera 6-106. In at least one example, the depth projector may project electromagnetic waves of light in the form of a dot light pattern that is reflected from objects and returned to the aforementioned depth sensors, including depth sensors 6-108, 6-110. In at least one example, the depth projector 6-112 may be used for environment and object reconstruction and hand and body tracking.

[0096] In at least one example, the sensor system 6-102 may include a downward-facing camera 6-114 having a field of view generally directed downward relative to the HMD device 6-100 in the Z-axis. In at least one example, the downward-facing camera 6-114 may be positioned on the left and right sides of the HMD device 6-100 as shown in the figure and may be used for hand and body tracking, headset tracking, and face avatar detection and creation in order to display a user avatar on the forward-facing display screen of the HMD device 6-100 as described elsewhere in this specification. The downward-facing camera 6-114 may be used to capture the facial expressions and movements of the user below the HMD device 6-100, including, for example, the cheeks, mouth, and chin.

[0097] In at least one example, the sensor system 6-102 may include a jaw camera 6-116. In at least one example, the jaw camera 6-116 may be positioned on the left and right sides of the HMD device 6-100 as shown in the figure and may be used for hand and body tracking, headset tracking, and face avatar detection and creation in order to display a user avatar on the forward-facing display screen of the HMD device 6-100 as described elsewhere in this specification. The jaw camera 6-116 may be used to capture the user's facial expressions and movements below the HMD device 6-100, including, for example, the user's jaw, cheeks, mouth, and chin.

[0098] In at least one example, the sensor system 6-102 may include a side camera 6-118. The side camera 6-118 may be oriented to capture left and right side views in the X-axis or direction relative to the HMD device 6-100. In at least one example, the side camera 6-118 may be used for hand and body tracking, headset tracking, and detection and reproduction of a facial avatar.

[0099] In at least one example, the sensor system 6-102 may include multiple eye-tracking and gaze-tracking sensors for determining the user's eye identification information, status, and gaze direction during and / or before use. In at least one example, the eye / gaze-tracking sensors may include nasal eye cameras 6-120 positioned on either side of the user's nose and adjacent to the user's nose when the HMD device 6-100 is worn. The eye / gaze-tracking sensors may also include lower eye cameras 6-122 positioned below each of the user's eyes for capturing images of the eyes for face avatar detection and creation, gaze tracking, and iris recognition functions.

[0100] In at least one example, the sensor system 6-102 includes an infrared illuminator 6-124 directed outward from the HMD device 6-100, which can illuminate the external environment and any objects within it with IR light for IR detection by one or more IR sensors of the sensor system 6-102. In at least one example, the sensor system 6-102 may include a flicker sensor 6-126 and an ambient light sensor 6-128. In at least one example, the flicker sensor 6-126 may detect the overhead light refresh rate to avoid display flicker. In one example, the infrared illuminator 6-124 may include a light-emitting diode and can be used in low-light environments, in particular, to illuminate the user's hand and other objects with low light for detection by the infrared sensors of the sensor system 6-102.

[0101] In at least one example, multiple sensors, including a scene camera 6-106, a downward-facing camera 6-114, a jaw camera 6-116, a side camera 6-118, a depth projector 6-112, and depth sensors 6-108 and 6-110, can be used in combination with an electrically coupled controller to combine depth data with camera data for hand tracking and sizing, for better hand tracking and object recognition and tracking capabilities of the HMD device 6-100. In at least one example, the downward-facing camera 6-114, jaw camera 6-116, and side camera 6-118 described above and shown in Figure 1I may be wide-angle cameras capable of operating in the visible and infrared spectra. In at least one example, these cameras 6-114, 6-116, and 6-118 may operate with monochrome light detection only to simplify image processing and increase sensitivity.
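
As a hedged illustration of how depth data and camera data might be combined for sizing, the sketch below estimates the physical extent of a hand from its extent in pixels and its measured depth. It assumes an ideal pinhole camera with a known focal length expressed in pixels; the function name and parameters are hypothetical and the specification does not state which camera model or method is used.

```python
def physical_extent_m(pixel_extent, depth_m, focal_length_px):
    """Estimate the real-world extent of an object under a pinhole model.

    pixel_extent:    size of the object in the image, in pixels.
    depth_m:         distance to the object from a depth sensor, in meters.
    focal_length_px: camera focal length in pixels (assumed known from
                     calibration; not given in the specification).
    """
    if depth_m <= 0 or focal_length_px <= 0:
        raise ValueError("depth and focal length must be positive")
    # Similar triangles: extent / depth == pixel_extent / focal_length.
    return pixel_extent * depth_m / focal_length_px


# Usage: a hand spanning 200 px at 0.5 m, seen with a 1000 px focal length.
hand_width_m = physical_extent_m(200, 0.5, 1000)
```

Without the depth measurement, the same 200-pixel span could belong to a small hand close to the camera or a large one farther away, which is why pairing the two data sources helps with sizing.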

[0102] Any of the features, components, and / or parts shown in Figure 1I, including their arrangement and configuration, may be included, individually or in any combination, in any other example of devices, features, components, and parts shown in Figures 1J to 1L and described herein. Similarly, any of the features, components, and / or parts shown and described with reference to Figures 1J to 1L, including their arrangement and configuration, may be included, individually or in any combination, in the examples of devices, features, components, and parts shown in Figure 1I.

[0103] Figure 1J shows a downward perspective view of an example of the HMD 6-200, including a cover or shroud 6-204 fixed to the frame 6-230. In at least one example, the sensor 6-203 of the sensor system 6-202 may be positioned around the periphery of the HMD 6-200 such that the sensor 6-203 is positioned outward around the periphery of the display region or area 6-232 so as not to obstruct the view of the displayed light. In at least one example, the sensor may be positioned behind the shroud 6-204 and aligned with the transparent portion of the shroud to allow the sensor and projector to pass light back and forth through the shroud 6-204. In at least one example, opaque ink or other opaque material or film / layer can be placed on the shroud 6-204 around the display area 6-232 to conceal components of the HMD 6-200 outside the display area 6-232 other than the transparent portion defined by the opaque portion, through which sensors and projectors transmit and receive light and electromagnetic signals during operation. In at least one example, the shroud 6-204 allows light to pass through from the display (e.g., within the display area 6-232) but not radially outward from the display area around the periphery of the display and the shroud 6-204.

[0104] In some examples, the shroud 6-204 includes a transparent portion 6-205 and an opaque portion 6-207, as described above and elsewhere in this specification. In at least one example, the opaque portion 6-207 of the shroud 6-204 can define one or more transparent regions 6-209 from which sensors 6-203 of the sensor system 6-202 can send and receive signals. In the illustrated example, the sensor 6-203 of the sensor system 6-202, which transmits and receives signals through the shroud 6-204, or more specifically through the transparent area 6-209 of (or defined by) the opaque portion 6-207 of the shroud 6-204, may include the same or similar sensors as those shown in the example in Figure 1I, e.g., depth sensors 6-108 and 6-110, depth projector 6-112, first and second scene cameras 6-106, first and second downward-facing cameras 6-114, first and second side cameras 6-118, and first and second infrared illuminators 6-124. These sensors are also shown in the examples in Figures 1K and 1L. Other sensors, sensor types, number of sensors, and their relative positions may be included in one or more other examples of the HMD.

[0105] Any of the features, components, and / or parts shown in Figure 1J, including their arrangement and configuration, may be included, either individually or in any combination, in any other example of devices, features, components, and parts shown in Figures 1I and 1K-1L and described herein. Similarly, any of the features, components, and / or parts shown and described with reference to Figures 1I and 1K-1L, including their arrangement and configuration, may be included, either individually or in any combination, in the examples of devices, features, components, and parts shown in Figure 1J.

[0106] Figure 1K shows a partial front view of an example of an HMD device 6-300, including a display 6-334, brackets 6-336 and 6-338, and a frame or housing 6-330. The example shown in Figure 1K does not include a front cover or shroud to show brackets 6-336 and 6-338. For example, the shroud 6-204 shown in Figure 1J includes an opaque portion 6-207 that visually covers / obscures the view of anything outside (e.g., radially / circumferentially outward) of the display / display area 6-334, including sensors 6-303 and brackets 6-338.

[0107] In at least one example, various sensors of sensor system 6-302 are coupled to brackets 6-336, 6-338. In at least one example, scene cameras 6-306 have tight tolerances for angles relative to each other. For example, the tolerance for the mounting angle between two scene cameras 6-306 may be 0.5 degrees or less, e.g., 0.3 degrees or less. To achieve and maintain such tight tolerances, in one example, scene cameras 6-306 can be mounted to bracket 6-338 rather than to the shroud. The bracket may include a cantilever arm to which scene cameras 6-306 and other sensors of sensor system 6-302 can be mounted, such that their position and orientation remain undeformed in the event of a user-induced drop event resulting in any deformation of the other bracket 6-336, housing 6-330, and / or shroud.
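
To make the stated tolerance concrete, the following sketch checks whether the angle between two camera optical axes stays within 0.5 degrees of a nominal mounting angle. Representing each axis as a 3D direction vector, the `nominal_deg` parameter, and the function names are assumptions made for this illustration; the specification gives only the tolerance values.

```python
import math


def angle_between_deg(a, b):
    """Angle in degrees between two 3D direction vectors.

    Uses atan2(|a x b|, a . b), which stays accurate for the very small
    angles that matter when checking sub-degree tolerances.
    """
    ax, ay, az = a
    bx, by, bz = b
    cross = (ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)
    cross_norm = math.sqrt(sum(c * c for c in cross))
    dot = ax * bx + ay * by + az * bz
    return math.degrees(math.atan2(cross_norm, dot))


def within_mounting_tolerance(axis_a, axis_b, nominal_deg=0.0, tolerance_deg=0.5):
    """True if the measured angle deviates from nominal by at most tolerance."""
    deviation = abs(angle_between_deg(axis_a, axis_b) - nominal_deg)
    return deviation <= tolerance_deg


# Usage: one camera points along +Y, the other is tilted 0.2 degrees toward +X.
tilt = math.radians(0.2)
ok = within_mounting_tolerance((0.0, 1.0, 0.0), (math.sin(tilt), math.cos(tilt), 0.0))
```

Passing `tolerance_deg=0.3` applies the tighter of the two example limits mentioned in the text.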

[0108] Any of the features, components, and / or parts shown in Figure 1K, including their arrangement and configuration, may be included, either individually or in any combination, in any other example of devices, features, components, and parts shown in Figures 1I, 1J, and 1L and described herein. Similarly, any of the features, components, and / or parts shown and described with reference to Figures 1I-1J and 1L, including their arrangement and configuration, may be included, either individually or in any combination, in the examples of devices, features, components, and parts shown in Figure 1K.

[0109] Figure 1L shows a bottom view of an example of the HMD 6-400, including the front display / cover assembly 6-404 and the sensor system 6-402. The sensor system 6-402 may be similar to other sensor systems described above and elsewhere in this specification, including referring to Figures 1I to 1K. In at least one example, the jaw camera 6-416 may be oriented downward to capture an image of the user's lower facial features. In one example, the jaw camera 6-416 may be directly coupled to the frame or housing 6-430, or to one or more internal brackets directly coupled to the illustrated frame or housing 6-430. The frame or housing 6-430 may include one or more apertures / openings 6-415 from which the jaw camera 6-416 can send and receive signals.

[0110] Any of the features, components, and / or parts shown in Figure 1L, including their arrangement and configuration, may be included, individually or in any combination, in any other example of devices, features, components, and parts shown in Figures 1I to 1K and described herein. Similarly, any of the features, components, and / or parts shown and described with reference to Figures 1I to 1K, including their arrangement and configuration, may be included, individually or in any combination, in the examples of devices, features, components, and parts shown in Figure 1L.

[0111] Figure 1M shows a rear perspective view of the interpupillary distance (IPD) adjustment system 11.1.1-102, which includes first and second optical modules 11.1.1-104a-b that are slidably engaged / coupled to the respective guide rods 11.1.1-108a-b and motors 11.1.1-110a-b of the left and right adjustment subsystems 11.1.1-106a-b. The IPD adjustment system 11.1.1-102 may include buttons 11.1.1-114 that are coupled to the bracket 11.1.1-112 and communicate electrically with the motors 11.1.1-110a-b. In at least one example, the buttons 11.1.1-114 can electrically communicate with the first and second motors 11.1.1-110a-b via a processor or other circuit component to activate the first and second motors 11.1.1-110a-b and change the positions of the first and second optical modules 11.1.1-104a-b relative to each other.

[0112] In at least one example, the first and second optical modules 11.1.1-104a-b may include respective display screens configured to project light toward the user's eyes when the HMD 11.1.1-100 is worn. In at least one example, the user can operate (e.g., press and / or rotate) the button 11.1.1-114 to activate the position adjustment of the optical modules 11.1.1-104a-b to match the interpupillary distance of the user's eyes. The optical modules 11.1.1-104a-b may also include one or more cameras or other sensors / sensor systems for imaging and measuring the user's IPD so that the optical modules 11.1.1-104a-b can be adjusted to match the IPD.

[0113] In one example, the user can operate buttons 11.1.1-114 to trigger automatic position adjustment of the first and second optical modules 11.1.1-104a-b. In another example, the user can operate buttons 11.1.1-114 to trigger manual adjustment, for example, by rotating buttons 11.1.1-114 in one or the other direction, so that the optical modules 11.1.1-104a-b move further away or closer until the user visually matches their IPD. In another example, the manual adjustment is communicated electronically via one or more circuits, and power for the movement of the optical modules 11.1.1-104a-b via motors 11.1.1-110a-b is provided by a power supply. In yet another example, the adjustment and movement of the optical modules 11.1.1-104a-b via the operation of buttons 11.1.1-114 is mechanically actuated via the movement of buttons 11.1.1-114.
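
As a rough illustration of the automatic adjustment case, the sketch below computes how far each optical module would need to translate so that the module separation matches a measured IPD. The even split of the correction between the two modules, the per-motor travel limit, and the function name are assumptions for this example; the specification does not give these details.

```python
def ipd_adjustment_mm(current_separation_mm, measured_ipd_mm, max_travel_mm=5.0):
    """Return (left_shift_mm, right_shift_mm) for the two optical modules.

    Positive values move a module outward, away from the nose; negative
    values move it inward. The correction is split evenly between both
    modules and clamped to a hypothetical per-motor travel limit.
    """
    error_mm = measured_ipd_mm - current_separation_mm
    per_module_mm = error_mm / 2.0
    # Clamp so that neither motor is asked to exceed its assumed travel range.
    per_module_mm = max(-max_travel_mm, min(max_travel_mm, per_module_mm))
    return per_module_mm, per_module_mm


# Usage: modules currently 63 mm apart, user's IPD measured as 65 mm.
left_shift, right_shift = ipd_adjustment_mm(63.0, 65.0)
```

In the manual case described above, the same per-module shift could instead be driven incrementally by each rotation step of the button until the user is satisfied.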

[0114] Any of the features, components, and / or parts shown in Figure 1M, including their arrangement and configuration, may be included, individually or in any combination, in any other example of devices, features, components, and parts shown in any other figures shown and described herein. Similarly, any of the features, components, and / or parts shown and described with reference to any other figures shown and described herein, including their arrangement and configuration, may be included, individually or in any combination, in the examples of devices, features, components, and parts shown in Figure 1M.

[0115] Figure 1N shows a partial front perspective view of the HMD 11.1.2-100, including an outer structural frame 11.1.2-102 and an inner or intermediate structural frame 11.1.2-104 that define the first and second apertures 11.1.2-106a and 11.1.2-106b. Views of apertures 11.1.2-106a-b may be obstructed by one or more other components of the HMD 11.1.2-100 coupled to the inner frame 11.1.2-104 and / or the outer frame 11.1.2-102, as shown in the figure; therefore, apertures 11.1.2-106a-b are shown as dotted lines in Figure 1N. In at least one example, the HMD 11.1.2-100 may include a first mounting bracket 11.1.2-108 coupled to the inner frame 11.1.2-104. In at least one example, the mounting bracket 11.1.2-108 is coupled to the inner frame 11.1.2-104 between the first and second apertures 11.1.2-106a and 11.1.2-106b.

[0116] The mounting bracket 11.1.2-108 may include an intermediate or central portion 11.1.2-109 coupled to the inner frame 11.1.2-104. In some examples, the intermediate or central portion 11.1.2-109 may not be the geometric middle or center of the bracket 11.1.2-108. Rather, the intermediate / central portion 11.1.2-109 may be positioned between a first cantilever extension arm and a second cantilever extension arm extending away from the intermediate portion 11.1.2-109. In at least one example, the mounting bracket 11.1.2-108 includes a first cantilever arm 11.1.2-112 and a second cantilever arm 11.1.2-114 that extend away from the intermediate portion 11.1.2-109 of the mounting bracket 11.1.2-108 coupled to the inner frame 11.1.2-104.

[0117] As shown in Figure 1N, the outer frame 11.1.2-102 may be defined with a curved shape on its underside to accommodate the user's nose when the user wears the HMD 11.1.2-100. The curved shape may be referred to as the nose bridge 11.1.2-111 and may be located in the center of the underside of the HMD 11.1.2-100 as shown. In at least one example, the mounting bracket 11.1.2-108 may be connected to the inner frame 11.1.2-104 between apertures 11.1.2-106a-b, such that the cantilever arms 11.1.2-112, 11.1.2-114 extend downward and laterally outward away from the intermediate portion 11.1.2-109 to complement the shape of the nose bridge 11.1.2-111 of the outer frame 11.1.2-102. In this way, the mounting bracket 11.1.2-108 is configured to accommodate the user's nose as described above. The nose bridge 11.1.2-111 accommodates the nose by providing a curvature that curves with, above, over, and around the user's nose for comfort and fit.

[0118] The first cantilever arm 11.1.2-112 may extend away from the intermediate portion 11.1.2-109 of the mounting bracket 11.1.2-108 in a first direction, and the second cantilever arm 11.1.2-114 may extend away from the intermediate portion 11.1.2-109 of the mounting bracket 11.1.2-108 in a second direction opposite to the first direction. The first and second cantilever arms 11.1.2-112 and 11.1.2-114 are referred to as "cantilevered" or "cantilever" arms because each arm 11.1.2-112 and 11.1.2-114 includes a distal free end 11.1.2-116 and 11.1.2-118, respectively, that is not fixed to the inner and outer frames 11.1.2-104 and 11.1.2-102. In this way, arms 11.1.2-112 and 11.1.2-114 are cantilevered from the intermediate portion 11.1.2-109, which can be connected to the inner frame 11.1.2-104, with their distal ends 11.1.2-116 and 11.1.2-118 not attached.

[0119] In at least one example, the HMD 11.1.2-100 may include one or more components coupled to the mounting bracket 11.1.2-108. In one example, the components include a plurality of sensors 11.1.2-110a-f. Each of the plurality of sensors 11.1.2-110a-f may include various types of sensors, such as cameras and IR sensors. In some examples, one or more of the sensors 11.1.2-110a-f may be used for object recognition in three-dimensional space, such that it is important to maintain the precise relative positions of two or more of the plurality of sensors 11.1.2-110a-f. The cantilevered nature of the mounting bracket 11.1.2-108 can protect the sensors 11.1.2-110a-f from damage and displacement in the event of an accidental drop by the user. Since sensors 11.1.2-110a-f are cantilevered on arms 11.1.2-112 and 11.1.2-114 of mounting bracket 11.1.2-108, stresses and deformations in the inner and / or outer frames 11.1.2-104 and 11.1.2-102 are not transmitted to the cantilever arms 11.1.2-112 and 11.1.2-114, and therefore do not affect the relative positioning of sensors 11.1.2-110a-f coupled to / mounted on mounting bracket 11.1.2-108.

[0120] Any of the features, components, and / or parts shown in Figure 1N, including their arrangement and configuration, may be included, individually or in any combination, in any other example of devices, features, components, and parts described herein. Similarly, any of the features, components, and / or parts shown and described herein, including their arrangement and configuration, may be included, individually or in any combination, in the examples of devices, features, components, and parts shown in Figure 1N.

[0121] Figure 1O shows an example of an optical module 11.3.2-100 for use in electronic devices such as HMDs, including the HMD devices described herein. As shown in one or more other examples described herein, the optical module 11.3.2-100 may be one of two optical modules in an HMD, each optical module being positioned to project light toward the user's eye. In this way, the first optical module can project light toward the user's first eye via a display screen, and the second optical module of the same device can project light toward the user's second eye via another display screen.

[0122] In at least one example, the optical module 11.3.2-100 may include an optical frame or housing 11.3.2-102, which may also be referred to as a barrel or optical module barrel. The optical module 11.3.2-100 may also include a display 11.3.2-104, which includes one or more display screens, coupled to the housing 11.3.2-102. The display 11.3.2-104 may be coupled to the housing 11.3.2-102 such that the display 11.3.2-104 is configured to project light toward the user's eyes when the HMD, of which the optical module 11.3.2-100 is part, is worn in use. In at least one example, the housing 11.3.2-102 may surround the display 11.3.2-104 and provide a coupling mechanism for coupling other components of the optical module described herein.

[0123] In one example, the optical module 11.3.2-100 may include one or more cameras 11.3.2-106 coupled to the housing 11.3.2-102. The cameras 11.3.2-106 may be positioned relative to the display 11.3.2-104 and the housing 11.3.2-102 so that the cameras 11.3.2-106 are configured to capture one or more images of the user's eyes while in use. In at least one example, the optical module 11.3.2-100 may also include a light strip 11.3.2-108 surrounding the display 11.3.2-104. In one example, the light strip 11.3.2-108 is positioned between the display 11.3.2-104 and the cameras 11.3.2-106. The light strip 11.3.2-108 may include multiple lights 11.3.2-110. The multiple lights 11.3.2-110 may include one or more light-emitting diodes (LEDs) or other lights configured to project light toward the user's eyes when the HMD is worn. The individual lights 11.3.2-110 can be spaced apart at various locations around the light strip 11.3.2-108, and thus can be spaced uniformly or unevenly around the display 11.3.2-104.

[0124] In at least one example, the housing 11.3.2-102 defines a viewing aperture 11.3.2-101 through which the user can see the display 11.3.2-104 when the HMD device is worn. In at least one example, LEDs are configured and positioned to emit light over the user's eyes through the viewing aperture 11.3.2-101. In one example, a camera 11.3.2-106 is configured to capture one or more images of the user's eyes through the viewing aperture 11.3.2-101.

[0125] As described above, each of the components and features of the optical module 11.3.2-100 shown in Figure 1O can be replicated in another (e.g., a second) optical module arranged with the HMD to interact with the user's other eye (e.g., project light and capture images).

[0126] Any feature, component, and / or part shown in Figure 1O, including their arrangement and configuration, alone or in any combination, may be included in any other example of devices, features, components, and parts shown in Figure 1P or otherwise described herein. Similarly, any feature, component, and / or part illustrated and described with reference to Figure 1P or otherwise described herein, including their arrangement and configuration, alone or in any combination, may be included in the examples of devices, features, components, and parts shown in Figure 1O.

[0127] Figure 1P shows a cross-sectional view of an example of an optical module 11.3.2-200, which includes a housing 11.3.2-202, a display assembly 11.3.2-204 coupled to the housing 11.3.2-202, and a lens 11.3.2-216 coupled to the housing 11.3.2-202. In at least one example, the housing 11.3.2-202 defines a first aperture or channel 11.3.2-212 and a second aperture or channel 11.3.2-214. Channels 11.3.2-212 and 11.3.2-214 may be configured to slidably engage with the respective rails or guide rods of the HMD device to allow the optical module 11.3.2-200 to adjust its position relative to the user's eyes to match the user's interpupillary distance (IPD). The housing 11.3.2-202 can slidably engage with the guide rod to fix the optical module 11.3.2-200 in place within the HMD.

[0128] In at least one example, the optical module 11.3.2-200 may also include a lens 11.3.2-216 coupled to the housing 11.3.2-202 and positioned between the display assembly 11.3.2-204 and the user's eyes when the HMD is worn. The lens 11.3.2-216 may be configured to direct light from the display assembly 11.3.2-204 to the user's eyes. In at least one example, the lens 11.3.2-216 may be part of a lens assembly that includes a corrective lens detachably attached to the optical module 11.3.2-200. In at least one example, lens 11.3.2-216 is positioned above light strip 11.3.2-208 and one or more eye-tracking cameras 11.3.2-206, so that the cameras 11.3.2-206 are configured to capture an image of the user's eye through lens 11.3.2-216, and light strip 11.3.2-208 includes a light configured to project light onto the user's eye through lens 11.3.2-216 during use.

[0129] Any of the features, components, and / or parts shown in Figure 1P, including their arrangement and configuration, may be included, individually or in any combination, in any other example of devices, features, components, and parts described herein. Similarly, any of the features, components, and / or parts shown and described herein, including their arrangement and configuration, may be included, individually or in any combination, in the examples of devices, features, components, and parts shown in Figure 1P.

[0130] Figure 2 is a block diagram of an example of a controller 110 according to several embodiments. While certain features are shown, those skilled in the art will understand from this disclosure that various other features have been omitted for brevity so as not to obscure more pertinent aspects of the embodiments disclosed herein. To that end, as a non-limiting example, in some embodiments the controller 110 includes one or more processing units 202 (e.g., a microprocessor, application-specific integrated circuit (ASIC), field-programmable gate array (FPGA), graphics processing unit (GPU), central processing unit (CPU), processing core, etc.), one or more input / output (I / O) devices 206, one or more communication interfaces 208 (e.g., Universal Serial Bus (USB), FIREWIRE®, THUNDERBOLT®, IEEE 802.3x, IEEE 802.11x, IEEE 802.16x, Global System for Mobile Communications (GSM), Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Global Positioning System (GPS), infrared (IR), BLUETOOTH®, ZIGBEE®, and / or similar types of interfaces), one or more programming (e.g., I / O) interfaces 210, memory 220, and one or more communication buses 204 for interconnecting these and various other components.

[0131] In some embodiments, one or more communication buses 204 include circuits for interconnecting and controlling communication between system components. In some embodiments, one or more I / O devices 206 include at least one of the following: a keyboard, mouse, touchpad, joystick, one or more microphones, one or more speakers, one or more image sensors, one or more displays, etc.

[0132] Memory 220 includes high-speed random-access memory such as dynamic random-access memory (DRAM), static random-access memory (SRAM), double-data-rate random-access memory (DDR RAM), or other random-access solid-state memory devices. In some embodiments, memory 220 includes non-volatile memory such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. Memory 220 optionally includes one or more storage devices located remotely from one or more processing units 202. Memory 220 includes a non-transitory computer-readable storage medium. In some embodiments, memory 220, or the non-transitory computer-readable storage medium of memory 220, stores the following programs, modules, and data structures, or subsets thereof, including an optional operating system 230 and XR experience module 240.

[0133] The operating system 230 handles various basic system services and includes instructions for performing hardware-dependent tasks. In some embodiments, the XR experience module 240 is configured to manage and coordinate one or more XR experiences for one or more users (e.g., a single XR experience for one or more users, or multiple XR experiences for each group of one or more users). To this end, in various embodiments, the XR experience module 240 includes a data acquisition unit 241, a tracking unit 242, a coordination unit 246, and a data transmission unit 248.

[0134] In some embodiments, the data acquisition unit 241 is configured to acquire data (e.g., presentation data, interaction data, sensor data, location data, etc.) from at least the display generation component 120 of Figure 1A, and optionally from one or more of the input device 125, output device 155, sensor 190, and / or peripheral device 195. For this purpose, in various embodiments, the data acquisition unit 241 includes instructions and / or logic for that purpose, as well as heuristics and metadata for that purpose.

[0135] In some embodiments, the tracking unit 242 is configured to map the scene 105 and to track the position / location of at least the display generation component 120, and optionally of one or more of the input device 125, output device 155, sensor 190, and / or peripheral device 195, relative to the scene 105 in Figure 1A. To this end, in various embodiments, the tracking unit 242 includes instructions and / or logic for this purpose, as well as heuristics and metadata for this purpose. In some embodiments, the tracking unit 242 includes a hand tracking unit 244 and / or an eye tracking unit 243. In some embodiments, the hand tracking unit 244 is configured to track the position / location of one or more parts of the user's hand, and / or the movement of one or more parts of the user's hand, relative to the scene 105 in Figure 1A, relative to the display generation component 120, and / or relative to a coordinate system defined relative to the user's hand. The hand tracking unit 244 is described in more detail below with respect to Figure 4. In some embodiments, the eye-tracking unit 243 is configured to track the position and movement of the user's gaze (or, more broadly, the user's eyes, face, or head) relative to the scene 105 (e.g., relative to the physical environment and / or the user (e.g., the user's hands)) or relative to XR content displayed via the display generation component 120. The eye-tracking unit 243 is described in more detail below with reference to Figure 5.

[0136] In some embodiments, the coordination unit 246 is configured to manage and coordinate the XR experience presented to the user by the display generation component 120 and optionally by one or more of the output devices 155 and / or peripheral devices 195. For this purpose, in various embodiments, the coordination unit 246 includes instructions and / or logic for that purpose, as well as heuristics and metadata for that purpose.

[0137] In some embodiments, the data transmission unit 248 is configured to transmit data (e.g., presentation data, location data, etc.) to at least the display generation component 120, and optionally to one or more of the input device 125, output device 155, sensor 190, and / or peripheral device 195. To this end, in various embodiments, the data transmission unit 248 includes instructions and / or logic for that purpose, as well as heuristics and metadata for that purpose.

[0138] While the data acquisition unit 241, tracking unit 242 (including, for example, eye-tracking unit 243 and hand-tracking unit 244), coordination unit 246, and data transmission unit 248 are shown as residing on a single device (e.g., controller 110), it should be understood that in other embodiments, any combination of the data acquisition unit 241, tracking unit 242 (including, for example, eye-tracking unit 243 and hand-tracking unit 244), coordination unit 246, and data transmission unit 248 may be located in separate computing devices.

[0139] Furthermore, Figure 2 is intended to illustrate the function of various features that may be present in a particular embodiment, in contrast to the structural schematics of the embodiments described herein. As will be recognized by those skilled in the art, the separately shown items can be combined, and some items can be separated. For example, several functional modules separately shown in Figure 2 can be realized in a single module, and the various functions of a single functional block can be realized by one or more functional blocks in various embodiments. The actual number of modules, as well as the division of certain functions and how functions are assigned between them, will vary depending on the implementation and, in some embodiments, will partially depend on a particular combination of hardware, software, and / or firmware selected for a particular implementation.

[0140] Figure 3 is a block diagram of an example of a display generation component 120 according to several embodiments. While certain features are shown, those skilled in the art will understand from this disclosure that various other features have been omitted for brevity so as not to obscure more pertinent aspects of the embodiments disclosed herein. To that end, in some non-limiting examples, the display generation component 120 (e.g., HMD) may include one or more processing units 302 (e.g., microprocessors, ASICs, FPGAs, GPUs, CPUs, processing cores, etc.), one or more input / output (I / O) devices and sensors 306, one or more communication interfaces 308 (e.g., USB, FIREWIRE®, THUNDERBOLT®, IEEE 802.3x, IEEE 802.11x, IEEE 802.16x, GSM, CDMA, TDMA, GPS, infrared, BLUETOOTH®, ZIGBEE®, and / or similar types of interfaces), one or more programming (e.g., I / O) interfaces 310, one or more XR displays 312, one or more optional in-facing and / or out-facing image sensors 314, memory 320, and one or more communication buses 304 for interconnecting these and various other components.

[0141] In some embodiments, one or more communication buses 304 include circuits for interconnecting and controlling communication between system components. In some embodiments, one or more I / O devices and sensors 306 include at least one of the following: an inertial measuring unit (IMU), an accelerometer, a gyroscope, a thermometer, one or more physiological sensors (e.g., a blood pressure monitor, a heart rate monitor, a blood oxygen sensor, a blood glucose sensor, etc.), one or more microphones, one or more speakers, a haptic engine, one or more depth sensors (e.g., structured light, time of flight, etc.).

[0142] In some embodiments, one or more XR displays 312 are configured to provide the user with an XR experience. In some embodiments, one or more XR displays 312 correspond to holographic, digital light processing (DLP), liquid crystal display (LCD), liquid crystal on silicon (LCoS), organic light-emitting field-effect transistor (OLET), organic light-emitting diode (OLED), surface conduction electron emission display (SED), field emission display (FED), quantum dot light-emitting diode (QD-LED), MEMS, and / or similar display types. In some embodiments, one or more XR displays 312 correspond to waveguide displays such as diffraction, reflection, polarization, and holographic. For example, a display generation component 120 (e.g., HMD) includes a single XR display. In another example, the display generation component 120 includes an XR display for each of the user's eyes. In some embodiments, one or more XR displays 312 can present MR or VR content.

[0143] In some embodiments, one or more image sensors 314 are configured to acquire image data corresponding to at least a portion of the user's face, including the user's eyes (and may be referred to as an eye-tracking camera). In some embodiments, one or more image sensors 314 are configured to acquire image data corresponding to at least a portion of the user's hands and optionally a portion of the user's arms (and may be referred to as a hand-tracking camera). In some embodiments, one or more image sensors 314 are configured to face forward to acquire image data corresponding to a scene that the user would view if a display generation component 120 (e.g., an HMD) were not present (and may be referred to as a scene camera). One or more optional image sensors 314 may include one or more RGB cameras (e.g., complementary metal-oxide-semiconductor (CMOS) image sensors or charge-coupled device (CCD) image sensors), one or more infrared (IR) cameras, one or more event-based cameras, and / or similar.

[0144] Memory 320 includes high-speed random-access memory, such as DRAM, SRAM, DDR RAM, or other random-access solid-state memory devices. In some embodiments, memory 320 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. Memory 320 optionally includes one or more storage devices located remotely from one or more processing units 302. Memory 320 includes a non-transitory computer-readable storage medium. In some embodiments, memory 320, or the non-transitory computer-readable storage medium of memory 320, stores the following programs, modules, and data structures, or subsets thereof, including an optional operating system 330 and XR presentation module 340.

[0145] The operating system 330 includes instructions for handling various basic system services and instructions for performing hardware-dependent tasks. In some embodiments, the XR presentation module 340 is configured to present XR content to the user via one or more XR displays 312. For this purpose, in various embodiments, the XR presentation module 340 includes a data acquisition unit 342, an XR presentation unit 344, an XR map generation unit 346, and a data transmission unit 348.

[0146] In some embodiments, the data acquisition unit 342 is configured to acquire data (e.g., presentation data, interaction data, sensor data, location data, etc.) from at least the controller 110 in Figure 1A. To this end, in various embodiments, the data acquisition unit 342 includes instructions and / or logic for that purpose, as well as heuristics and metadata for that purpose.

[0147] In some embodiments, the XR presentation unit 344 is configured to present XR content via one or more XR displays 312. For this purpose, in various embodiments, the XR presentation unit 344 includes instructions and / or logic therefor, as well as heuristics and metadata therefor.

[0148] In some embodiments, the XR map generation unit 346 is configured to generate an XR map (for example, a 3D map of a mixed reality scene or a map of a physical environment in which computer-generated objects can be placed to generate extended reality) based on media content data. For this purpose, in various embodiments, the XR map generation unit 346 includes instructions and / or logic for that purpose, as well as heuristics and metadata for that purpose.

[0149] In some embodiments, the data transmission unit 348 is configured to transmit data (e.g., presentation data, location data, etc.) to at least the controller 110 and optionally to one or more of the input device 125, output device 155, sensor 190, and / or peripheral device 195. To this end, in various embodiments, the data transmission unit 348 includes instructions and / or logic for that purpose, as well as heuristics and metadata for that purpose.

[0150] Although the data acquisition unit 342, XR presentation unit 344, XR map generation unit 346, and data transmission unit 348 are shown as residing on a single device (e.g., the display generation component 120 in Figure 1A), it should be understood that in other embodiments, any combination of the data acquisition unit 342, XR presentation unit 344, XR map generation unit 346, and data transmission unit 348 may reside in separate computing devices.

[0151] Furthermore, Figure 3 is intended to illustrate the functionality of various features that may be present in a particular implementation, in contrast to the structural schematics of the embodiments described herein. As will be recognized by those skilled in the art, the separately shown items can be combined, and some items can be separated. For example, several functional modules shown separately in Figure 3 can be realized within a single module, and the various functions of a single functional block can be performed by one or more functional blocks in various embodiments. The actual number of modules, as well as the division of certain functions and how functions are assigned between them, will vary depending on the implementation and, in some embodiments, will partially depend on a particular combination of hardware, software, and / or firmware selected for a particular implementation.

[0152] Figure 4 is a schematic diagram of an exemplary embodiment of the hand tracking device 140. In some embodiments, the hand tracking device 140 (Figure 1A) is controlled by the hand tracking unit 244 (Figure 2) to track the position / location of one or more parts of the user's hand and / or the movement of one or more parts of the user's hand relative to the scene 105 in Figure 1A (e.g., relative to parts of the physical environment surrounding the user, relative to the display generation component 120, or relative to parts of the user (e.g., the user's face, eyes, or head), and / or relative to a coordinate system defined for the user's hand). In some embodiments, the hand tracking device 140 is part of the display generation component 120 (e.g., embedded in or attached to a head-mounted device). In some embodiments, the hand tracking device 140 is separate from the display generation component 120 (e.g., located in a separate housing or attached to a separate physical support structure).

[0153] In some embodiments, the hand tracking device 140 includes an image sensor 404 (e.g., one or more IR cameras, 3D cameras, depth cameras, and / or color cameras) that captures three-dimensional scene information including at least the hand 406 of a human user. The image sensor 404 captures a hand image with sufficient resolution to allow for the distinction of fingers and their respective positions. The image sensor 404 can typically capture images of other parts of the user's body, or images of the entire body, and may have either a zoom function or a dedicated sensor with high magnification to capture an image of the hand at a desired resolution. In some embodiments, the image sensor 404 also captures a 2D color video image of the hand 406 and other elements of the scene. In some embodiments, the image sensor 404 is used in conjunction with other image sensors that capture the physical environment of the scene 105, or functions as an image sensor that captures the physical environment of the scene 105. In some embodiments, the image sensor 404 is positioned relative to the user or the user's environment such that the field of view of the image sensor or a portion thereof is used to define an interaction space in which hand movements captured by the image sensor are processed as input to the controller 110.

[0154] In some embodiments, the image sensor 404 outputs a sequence of frames containing 3D map data (and possibly color image data) to the controller 110, which extracts high-level information from the map data. This high-level information is typically provided via an application programming interface (API) to an application running on the controller, which drives the display generation component 120 accordingly. For example, a user can interact with software running on the controller 110 by moving their hand 406 and changing their hand posture.

[0155] In some embodiments, the image sensor 404 projects a spot pattern onto a scene including the hand 406 and captures an image of the projected pattern. In some embodiments, the controller 110 calculates the 3D coordinates of points in the scene (including points on the surface of the user's hand) by triangulation based on the lateral shift of the spot in the pattern. This approach is advantageous in that the user does not need to hold or wear any kind of beacon, sensor, or other marker. This gives the depth coordinates of points in the scene relative to a given reference plane at a specific distance from the image sensor 404. In this disclosure, it is assumed that the image sensor 404 defines an orthogonal set of x, y, and z axes such that the depth coordinates of points in the scene correspond to a z component measured by the image sensor. Alternatively, the image sensor 404 (e.g., a hand tracking device) may use other 3D mapping methods such as stereoscopic imaging or time-of-flight measurement based on one or more cameras or other types of sensors.
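
The triangulation step described above can be illustrated with a minimal sketch. It assumes a simple pinhole model in which a spot observed on the reference plane has zero shift and a spot on a nearer surface shifts by a positive number of pixels; the function name, parameter names, and sign convention are illustrative assumptions and are not taken from this disclosure.

```python
def depth_from_spot_shift(shift_px, focal_px, baseline_m, ref_depth_m):
    """Estimate the z coordinate of a projected spot from its lateral shift.

    Assumed model (not stated in the disclosure):
        shift_px = focal_px * baseline_m * (1 / z - 1 / ref_depth_m)
    where baseline_m is the projector-to-sensor distance and ref_depth_m is
    the distance of the reference plane from the image sensor.
    """
    inverse_z = 1.0 / ref_depth_m + shift_px / (focal_px * baseline_m)
    if inverse_z <= 0.0:
        # A shift this negative would place the point at or beyond infinity.
        raise ValueError("shift is outside the range the model can represent")
    return 1.0 / inverse_z
```

Under this model a zero shift returns the reference-plane distance itself, and larger positive shifts correspond to points closer to the image sensor.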

[0156] In some embodiments, the hand tracking device 140 captures and processes a time sequence of depth maps containing the user's hand while the user moves their hand (e.g., the entire hand or one or more fingers). Software running on the processor in the image sensor 404 and / or controller 110 processes the 3D map data to extract patch descriptors of the hand within these depth maps. Based on previous training, the software matches these descriptors against patch descriptors stored in the database 408 to estimate the hand pose in each frame. The pose typically includes the 3D location of the user's wrist and fingertips.
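
As a rough illustration of matching a descriptor against stored descriptors, the sketch below uses a plain nearest-neighbor search with Euclidean distance. The disclosure does not specify the matching method, the descriptor format, or the layout of the database 408, so the metric, the tuple-based database, and the name `nearest_pose` are all hypothetical.

```python
import math


def nearest_pose(descriptor, database):
    """Return (pose, distance) of the stored descriptor closest to `descriptor`.

    `database` is assumed to be an iterable of (stored_descriptor, pose)
    pairs, with descriptors given as equal-length sequences of numbers.
    """
    best_pose, best_distance = None, math.inf
    for stored_descriptor, pose in database:
        distance = math.dist(descriptor, stored_descriptor)
        if distance < best_distance:
            best_pose, best_distance = pose, distance
    return best_pose, best_distance
```

A practical system would likely use a trained model or an indexed search rather than a linear scan; the scan is used here only to keep the example short.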

[0157] The software can also analyze the trajectory of the hand and / or fingers across multiple frames in a sequence to identify gestures. The posture estimation function described herein may be interleaved with the motion tracking function, so that patch-based posture estimation is performed only once every two (or more) frames, while tracking is used to detect changes in posture that occur over the remaining frames. Posture, motion, and gesture information is provided to an application program running on the controller 110 via the API described above. This program can, for example, move and modify the image presented on the display generation component 120, or perform other functions, depending on the posture and / or gesture information.
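
The interleaving of posture estimation and tracking can be sketched as a simple frame loop. The callables `estimate_pose` and `track_pose` are hypothetical placeholders for the patch-based estimator and the lighter-weight tracker; only the scheduling (a full estimate once every `interval` frames, tracking in between) reflects the paragraph above.

```python
def process_frames(frames, estimate_pose, track_pose, interval=2):
    """Run full pose estimation every `interval` frames and track otherwise.

    estimate_pose(frame) -> pose            (assumed: expensive, patch-based)
    track_pose(previous_pose, frame) -> pose (assumed: cheap incremental update)
    """
    poses = []
    pose = None
    for index, frame in enumerate(frames):
        if index % interval == 0 or pose is None:
            pose = estimate_pose(frame)
        else:
            pose = track_pose(pose, frame)
        poses.append(pose)
    return poses
```

With `interval=2`, frames 0, 2, 4, ... receive a full estimate and the remaining frames are handled by tracking alone.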

[0158] In some embodiments, the gesture includes an air gesture. An air gesture is a gesture detected without the user touching (or independently of) an input element that is part of a device (e.g., a computer system 101, one or more input devices 125, and / or a hand tracking device 140), and is based on detected movement of a part of the user's body in the air (e.g., head, one or more arms, one or more hands, one or more fingers, and / or one or more legs), including movement of the user's body relative to an absolute reference (e.g., the angle of the user's arm relative to the ground, or the distance of the user's hand relative to the ground), movement of the user's body relative to another part of the user's body (e.g., movement of the user's hand relative to the user's shoulder, movement of one of the user's hands relative to the user's other hand, and / or movement of the user's fingers relative to another finger or part of the user's hand), and / or absolute movement of a part of the user's body (e.g., a tap gesture including movement of the hand in a predetermined posture by a predetermined amount and / or speed, or a shake gesture including a predetermined speed or amount of rotation of a part of the user's body).

[0159] In some embodiments, the input gestures used in the various examples and embodiments described herein include air gestures performed by moving one or more of the user's fingers relative to other fingers or parts of the user's hand in order to interact with an XR environment (e.g., a virtual or mixed reality environment). In some embodiments, an air gesture is a gesture detected without the user touching (or independently of) an input element that is part of the device, and is based on detected movement of a part of the user's body in the air, including movement of the user's body relative to an absolute reference (e.g., the angle of the user's arm relative to the ground, or the distance of the user's hand relative to the ground), movement of the user's body relative to another part of the user's body (e.g., movement of the user's hand relative to the user's shoulder, movement of one of the user's hands relative to the other hand, and / or movement of the user's fingers relative to another finger or part of the user's hand), and / or absolute movement of a part of the user's body (e.g., a tap gesture involving movement of the hand in a predetermined posture by a predetermined amount and / or speed, or a shake gesture involving rotation of a part of the user's body by a predetermined speed or amount).

[0160] In some embodiments where the input gesture is an air gesture (i.e., no physical contact with an input device that provides the computer system with information about which user interface element is the target of user input, such as contact with a user interface element displayed on a touchscreen or contact with a mouse or trackpad to move a cursor over a user interface element), the gesture takes into account the user's attention (e.g., gaze) to determine the target of user input (e.g., in the case of direct input, as described below). Thus, in implementations involving air gestures, the input gesture is the detected attention (e.g., gaze) to the user interface element in combination (e.g., simultaneously) with the movement of the user's fingers (one or more) and / or hand to perform pinch and / or tap input, as described in more detail below.

[0161] In some embodiments, input gestures directed toward a user interface object are performed directly or indirectly with reference to the user interface object. For example, user input is performed directly on the user interface object when the user performs an input gesture with their hand at a position corresponding to the position of the user interface object in a three-dimensional environment (e.g., as determined based on the user's current viewpoint). In some embodiments, the input gesture is performed indirectly on the user interface object when the user performs the input gesture while the user's hand is not at a position corresponding to the position of the user interface object in the three-dimensional environment, and while the user's attention (e.g., gaze) toward the user interface object is detected. For example, in the case of a direct input gesture, the user can direct their input toward the user interface object by initiating the gesture at or near a position corresponding to the display position of the user interface object (e.g., within 0.5 cm, 1 cm, 5 cm, or a distance of 0-5 cm, measured from the outer edge or the central portion of the option). In the case of an indirect input gesture, the user can direct their input toward the user interface object by paying attention to the user interface object (e.g., by gazing at the user interface object) and, while paying attention to the option, initiating the input gesture (e.g., at any position detectable by the computer system, such as a position that does not correspond to the display position of the user interface object).
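
The distinction between direct and indirect input can be sketched as a small classifier. The 5 cm default threshold is one of the example distances mentioned above; the function name, the coordinate representation (3D positions in meters), and the boolean `gaze_on_object` flag are illustrative assumptions rather than details from this disclosure.

```python
import math


def classify_input(hand_pos, object_pos, gaze_on_object, direct_threshold_m=0.05):
    """Classify where an input gesture is directed.

    Returns "direct" if the hand starts the gesture at or near the object's
    displayed position, "indirect" if the hand is elsewhere but the user's
    attention (e.g., gaze) is on the object, and None otherwise.
    """
    distance = math.dist(hand_pos, object_pos)
    if distance <= direct_threshold_m:
        return "direct"
    if gaze_on_object:
        return "indirect"
    return None
```

In this sketch proximity takes priority over gaze, which is a design choice of the example and not something the text above prescribes.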

[0162] In some embodiments, the input gestures (e.g., air gestures) used in the various examples and embodiments described herein include pinch and tap inputs for interacting with virtual or mixed reality environments, as in some embodiments. For example, the pinch and tap inputs described later are performed as air gestures.

[0163] In some embodiments, a pinch input is part of an air gesture that includes one or more of the following: a pinch gesture, a long pinch gesture, a pinch-and-drag gesture, or a double pinch gesture. For example, a pinch gesture that is an air gesture involves moving two or more fingers of a hand into contact with each other, optionally followed by an immediate break in contact (e.g., within 0 to 1 second). A long pinch gesture that is an air gesture involves moving two or more fingers of a hand into contact with each other and holding that contact for at least a threshold amount of time (e.g., at least 1 second) before a break in contact is detected. For example, a long pinch gesture includes the user holding a pinch gesture (e.g., with two or more fingers in contact), and the long pinch gesture continues until a break in contact between the two or more fingers is detected. In some embodiments, a double pinch gesture that is an air gesture includes two (or more) pinch inputs (e.g., performed with the same hand) that are detected in immediate succession (e.g., within a predetermined period of time of each other). For example, the user performs a first pinch input (e.g., a pinch input or a long pinch input), releases the first pinch input (e.g., breaks contact between two or more fingers), and then performs a second pinch input within a predetermined period (e.g., within 1 second or 2 seconds) after releasing the first pinch input.
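
The timing rules for pinch, long pinch, and double pinch can be sketched as follows. The input format (a list of finger-contact intervals in seconds for one hand), the function name, and the rule that any quick follow-up pinch is merged into a single "double pinch" event are assumptions made for illustration; the 1-second thresholds are the example values given above.

```python
def classify_pinches(contacts, long_threshold_s=1.0, double_gap_s=1.0):
    """Label finger-contact intervals as pinch, long pinch, or double pinch.

    `contacts` is a time-ordered list of (contact_start, contact_end) pairs.
    A contact held for at least `long_threshold_s` is a long pinch. A contact
    that starts within `double_gap_s` of the previous release is merged with
    the previous event into a double pinch.
    """
    events = []
    previous_end = None
    for start, end in contacts:
        if previous_end is not None and start - previous_end <= double_gap_s:
            events[-1] = "double pinch"
        elif end - start >= long_threshold_s:
            events.append("long pinch")
        else:
            events.append("pinch")
        previous_end = end
    return events
```

For example, two short contacts separated by 0.3 seconds yield a single double pinch, whereas the same contacts separated by several seconds yield two separate pinch events.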

[0164] In some embodiments, a pinch-and-drag gesture that is an air gesture includes a pinch gesture (e.g., a pinch gesture or a long pinch gesture) performed in conjunction with (e.g., followed by) a drag input that changes the user's hand position from a first position (e.g., a drag start position) to a second position (e.g., a drag end position). In some embodiments, the user maintains the pinch gesture while performing the drag input and releases the pinch gesture (e.g., by spreading two or more fingers apart) to end the drag gesture (e.g., at the second position). In some embodiments, the pinch input and the drag input are performed by the same hand (e.g., the user pinches two or more fingers into contact with each other and then moves the same hand to the second position in the air with a drag gesture). In some embodiments, the pinch input is performed by the user's first hand and the drag input is performed by the user's second hand (e.g., the user's second hand moves from the first position to the second position in the air while the user continues the pinch input with the user's first hand). In some embodiments, an input gesture that is an air gesture includes inputs (e.g., pinch inputs and / or tap inputs) performed using both of the user's hands. For example, an input gesture includes two (e.g., or more) pinch inputs performed in conjunction with each other (e.g., simultaneously or within a predetermined period of time of each other), such as a first pinch gesture (e.g., a pinch input, a long pinch input, or a pinch-and-drag input) performed using the user's first hand and a second pinch input performed using the user's other hand in conjunction with the pinch input performed using the first hand.

[0165] In some embodiments, a tap input performed as an air gesture (e.g., directed toward a user interface element) includes movement of one or more of the user's fingers toward the user interface element, movement of the user's hand toward the user interface element, optionally with one or more of the user's fingers extended toward the user interface element, a downward movement of the user's finger (e.g., mimicking a mouse click or a tap on a touchscreen), or another predefined movement of the user's hand. In some embodiments, a tap input performed as an air gesture is detected based on movement characteristics of the finger or hand performing the tap gesture: movement of the finger or hand away from the user's viewpoint and / or toward the object that is the target of the tap input, followed by an end of the movement. In some embodiments, the end of the movement is detected based on a change in the movement characteristics of the finger or hand performing the tap gesture (e.g., an end of the movement away from the user's viewpoint and / or toward the object that is the target of the tap input, a reversal of the direction of movement of the finger or hand, and / or a reversal of the direction of acceleration of the movement of the finger or hand).

[0166] In some embodiments, the user's attention is determined to be directed toward a part of the three-dimensional environment based on the detection of a gaze directed toward that part (optionally, without requiring any other conditions). In some embodiments, the device determines that the user's attention is directed toward a part of the three-dimensional environment based on the detection of a gaze directed toward that part together with one or more additional conditions, such as the gaze being directed toward that part for at least a threshold duration (e.g., a dwell time) and / or the gaze being directed toward that part while the user's viewpoint is within a distance threshold of it. If one of the additional conditions is not met, the device determines that the user's attention is not directed toward the part of the three-dimensional environment to which the gaze is directed (e.g., until the one or more additional conditions are met).
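As a non-limiting sketch, the attention determination described above (gaze alone, or gaze combined with a dwell-time and / or viewpoint-distance condition) might be expressed as follows. The `AttentionPolicy` type, its field names, and the units are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttentionPolicy:
    """Optional extra conditions; None means the condition is not required."""
    min_dwell_s: Optional[float] = None      # assumed unit: seconds
    max_distance_m: Optional[float] = None   # assumed unit: meters


def attention_is_directed(
    gaze_on_region: bool,
    dwell_s: float,
    viewpoint_distance_m: float,
    policy: AttentionPolicy,
) -> bool:
    """Return True when gaze plus every configured extra condition is met."""
    if not gaze_on_region:
        return False
    if policy.min_dwell_s is not None and dwell_s < policy.min_dwell_s:
        return False
    if (policy.max_distance_m is not None
            and viewpoint_distance_m > policy.max_distance_m):
        return False
    return True


gaze_only = AttentionPolicy()
strict = AttentionPolicy(min_dwell_s=0.5, max_distance_m=3.0)
print(attention_is_directed(True, 0.1, 10.0, gaze_only))  # gaze alone suffices
print(attention_is_directed(True, 0.1, 1.0, strict))      # dwell time not yet met
```

Because unmet conditions simply yield False, calling the function again on later frames models attention being withheld until the conditions are met.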

[0167] In some embodiments, a ready state configuration of the user or a part of the user is detected by the computer system. The detection of a ready state configuration of the hand is used by the computer system as an indication that the user is likely preparing to interact with the computer system using one or more air gesture inputs performed by the hand (e.g., pinch, tap, pinch and drag, double pinch, long pinch, or other air gestures described herein). For example, the ready state of a hand is determined based on whether the hand has a predetermined hand shape (e.g., a pre-pinch shape where the thumb and one or more fingers are extended and spaced apart, ready to perform a pinch or grab gesture, or a pre-tap shape where one or more fingers are extended and the palm is facing away from the user), whether the hand is in a predetermined position relative to the user's viewpoint (e.g., below the user's head, above the user's waist, or extended at least 15 cm, 20 cm, 25 cm, 30 cm, or 50 cm from the body), and / or whether the hand has moved in a particular manner (e.g., moved toward the area in front of the user above the user's waist, below the user's head, or away from the user's body or legs). In some embodiments, the ready state is used to determine whether an interactive element of the user interface is responsive to attention (e.g., gaze) input.
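For illustration only, one possible combination of the ready-state criteria above is sketched below. The text allows the criteria to be combined in any way ("and / or"); this sketch assumes, purely as an example, that both a predetermined hand shape and a predetermined position are required, and it omits the movement criterion. All names, labels, and the 15 cm default are assumptions.

```python
# Assumed shape labels, standing in for the output of a hand-shape classifier.
PRE_GESTURE_SHAPES = {"pre-pinch", "pre-tap"}
MIN_EXTENSION_CM = 15.0  # one of the example distances listed in the text


def hand_is_ready(
    shape: str,
    hand_height_cm: float,
    waist_height_cm: float,
    head_height_cm: float,
    extension_cm: float,
) -> bool:
    """Return True when the hand has a ready shape and is in the ready zone.

    The ready zone here is above the waist, below the head, and extended at
    least MIN_EXTENSION_CM from the body (an assumed combination).
    """
    has_ready_shape = shape in PRE_GESTURE_SHAPES
    in_ready_zone = (
        waist_height_cm < hand_height_cm < head_height_cm
        and extension_cm >= MIN_EXTENSION_CM
    )
    return has_ready_shape and in_ready_zone


print(hand_is_ready("pre-pinch", 120.0, 100.0, 170.0, 25.0))  # raised, extended
print(hand_is_ready("fist", 120.0, 100.0, 170.0, 25.0))       # no ready shape
```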

[0168] In scenarios where an input is described with reference to an air gesture, similar gestures may also be detected using hardware input devices attached to or held by one or more of the user's hands, in which case the position of the hardware input device in space may be tracked using optical tracking, one or more accelerometers, one or more gyroscopes, one or more magnetometers, and / or one or more inertial measurement units. It should be understood that the position and / or movement of the hardware input device is then used instead of the position and / or movement of one or more hands in the corresponding air gesture(s). User input can also be detected using controls included in hardware input devices, such as one or more touch-sensitive input elements, one or more pressure-sensitive input elements, one or more buttons, one or more knobs, one or more dials, one or more joysticks, one or more hand or finger coverings capable of detecting the position or change in position of parts of the hands and / or fingers relative to each other, relative to the user's body, and / or relative to the user's physical environment, and / or other hardware input device controls. User input using controls included in hardware input devices is used in place of hand and / or finger gestures, such as air taps or air pinches, in the corresponding air gesture(s). For example, a selection input described as being performed by an air tap or air pinch input can alternatively be detected by a button press, a tap on a touch-sensitive surface, a press on a pressure-sensitive surface, or other hardware input. As another example, a movement input described as being performed by an air pinch-and-drag (e.g., an air drag gesture or an air swipe gesture) can alternatively be detected based on interaction with hardware input controls, such as a button press-and-hold, a touch on a touch-sensitive surface, a press on a pressure-sensitive surface, or other hardware input, followed by movement of the hardware input device through space (e.g., together with the hand with which the hardware input device is associated). Similarly, two-handed inputs, including movements of both hands relative to each other, can be performed with one air gesture and one hardware input device held in the hand not performing the air gesture, with two hardware input devices held in separate hands, or with two air gestures performed by separate hands, using various combinations of air gestures and / or inputs detected by one or more of the aforementioned hardware input devices.

[0169] In some embodiments, the software may be downloaded to the controller 110 in electronic form, for example, over a network, or it may alternatively be provided on a tangible, non-transitory medium such as an optical, magnetic, or electronic memory medium. In some embodiments, the database 408 is similarly stored in memory associated with the controller 110. Alternatively or additionally, some or all of the described functions of the controller may be implemented in dedicated hardware such as a custom or semi-custom integrated circuit or a programmable digital signal processor (DSP). Although the controller 110 is shown in Figure 4, by way of example, as a separate unit from the image sensor 404, some or all of the controller's processing functions may be performed by a suitable microprocessor and software, or by dedicated circuitry, within the housing of the image sensor 404 (e.g., a hand-tracking device) or otherwise associated with the image sensor 404. In some embodiments, at least some of these processing functions may be performed by a suitable processor integrated with the display generation component 120 (e.g., in a television set, handheld device, or head-mounted device), or by any other suitable computerized device such as a game console or media player. The sensing function of the image sensor 404 can likewise be integrated into the computer or other computerized device that is controlled by the sensor output.

[0170] Figure 4 further includes a schematic representation of a depth map 410 captured by the image sensor 404, according to some embodiments. The depth map includes a matrix of pixels, each having a depth value, as described above. Pixels 412 corresponding to the hand 406 are segmented in this map from the background and the wrist. The brightness of each pixel in the depth map 410 varies inversely with its depth value, i.e., the measured z-distance from the image sensor 404, with the shade of gray becoming darker as the depth increases. The controller 110 processes these depth values to identify and segment image components (i.e., groups of adjacent pixels) that have the characteristics of a human hand. These characteristics may include, for example, overall size, shape, and frame-to-frame movement across the sequence of depth maps.
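As a simplified, non-limiting sketch, a depth map can be rendered so that nearer pixels appear brighter, and a crude foreground mask can be obtained by thresholding depth. Both functions are hypothetical: the linear mapping is only one assumed way to make brightness fall as depth increases, and the single depth threshold stands in for the size, shape, and motion criteria that the controller 110 is described as using.

```python
from typing import List

DepthMap = List[List[float]]


def depth_to_brightness(depth_map: DepthMap, max_value: int = 255) -> List[List[int]]:
    """Map depth to gray level so that the nearest pixel is brightest.

    Assumption: a linear mapping between the nearest and farthest depth
    values in the map; the farthest pixel becomes 0 (black).
    """
    depths = [d for row in depth_map for d in row]
    near, far = min(depths), max(depths)
    span = far - near
    if span == 0:
        return [[max_value for _ in row] for row in depth_map]
    return [[round(max_value * (far - d) / span) for d in row] for row in depth_map]


def segment_near_pixels(depth_map: DepthMap, max_depth: float) -> List[List[bool]]:
    """Mark pixels at or nearer than max_depth as candidate hand pixels."""
    return [[d <= max_depth for d in row] for row in depth_map]


depth = [[0.4, 0.4, 2.0],
         [0.5, 2.0, 2.0]]  # assumed units: meters from the sensor
print(depth_to_brightness(depth))
print(segment_near_pixels(depth, max_depth=1.0))
```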

[0171] Figure 4 also schematically shows the hand skeleton 414 that the controller 110 ultimately extracts from the depth map 410 of the hand 406, according to some embodiments. In Figure 4, the hand skeleton 414 is superimposed on the hand background 416, which has been segmented from the original depth map. In some embodiments, major feature points of the hand (e.g., points corresponding to the knuckles, fingertips, center of the palm, and end of the hand connected to the wrist), and optionally of the wrist or arm connected to the hand, are identified and positioned on the hand skeleton 414. In some embodiments, the location and movement of these major feature points across multiple image frames are used by the controller 110 to determine a hand gesture performed by the hand or the current state of the hand.

[0172] Figure 5 shows an exemplary embodiment of the eye-tracking device 130 (Figure 1A). In some embodiments, the eye-tracking device 130 is controlled by an eye-tracking unit 243 (Figure 2) to track the position and movement of the user's gaze to the scene 105 or to the XR content displayed via the display generation component 120. In some embodiments, the eye-tracking device 130 is integrated with the display generation component 120. For example, in some embodiments, if the display generation component 120 is a head-mounted device such as a headset, helmet, goggles, or glasses, or a handheld device positioned in a wearable frame, the head-mounted device includes both a component for generating XR content for user viewing and a component for tracking the user's gaze to the XR content. In some embodiments, the eye-tracking device 130 is separate from the display generation component 120. For example, if the display generation component is a handheld device or an XR chamber, the eye-tracking device 130 is optionally a separate device from the handheld device or XR chamber. In some embodiments, the eye-tracking device 130 is a head-mounted device or part of a head-mounted device. In some embodiments, the head-mounted eye-tracking device 130 is optionally used with a display generation component that is mounted on the head or a display generation component that is not mounted on the head. In some embodiments, the eye-tracking device 130 is not a head-mounted device, but is optionally used in combination with a head-mounted display generation component. In some embodiments, the eye-tracking device 130 is not a head-mounted device, but is optionally part of a non-head-mounted display generation component.

[0173] In some embodiments, the display generation component 120 uses a display mechanism (e.g., left and right near-eye display panels) that displays frames containing left and right images in front of the user's eyes to provide the user with a 3D virtual view. For example, the head-mounted display generation component may include left and right optical lenses (referred to herein as eyepieces) positioned between the display and the user's eyes. In some embodiments, the display generation component may include, or be coupled to, one or more external video cameras that capture video of the user's environment for display. In some embodiments, the head-mounted display generation component may have a transparent or translucent display through which the user can directly view the physical environment, and may display virtual objects on the transparent or translucent display. In some embodiments, the display generation component projects virtual objects onto the physical environment. The virtual objects are projected, for example, onto a physical surface or as holograms, so that the individual can use the system to observe the virtual objects superimposed on the physical environment. In such cases, separate display panels and image frames for the left and right eyes may not be required.

[0174] As shown in Figure 5, in some embodiments, the eye-tracking device 130 (e.g., gaze tracking device) includes at least one eye-tracking camera (e.g., an infrared (IR) camera or a near-IR (NIR) camera) and an illumination source (e.g., an IR or NIR light source such as an array or ring of LEDs) that emits light (e.g., IR or NIR light) toward the user's eye. The eye-tracking camera may be directed toward the user's eye to receive reflected IR or NIR light from the light source directly from the eye, or alternatively, it may be directed toward a "hot" mirror positioned between the user's eye and a display panel that reflects IR or NIR light from the eye to the eye-tracking camera while allowing visible light to pass through. The eye-tracking device 130 optionally captures images of the user's eye (e.g., as a video stream captured at 60-120 frames per second (fps)), analyzes the images to generate gaze tracking information, and communicates the gaze tracking information to the controller 110. In some embodiments, both of the user's eyes are tracked separately by their respective eye-tracking cameras and illumination sources. In some embodiments, only one of the user's eyes is tracked by a separate eye-tracking camera and light source.

[0175] In some embodiments, the eye-tracking device 130 is calibrated using a device-specific calibration process to determine the parameters of the eye-tracking device for a specific operating environment 100, e.g., the 3D geometric relationships and parameters of the LEDs, camera, hot mirror (if present), eyepiece, and display screen. The device-specific calibration process may be performed at the factory or another facility before delivery of the AR / VR device to the end user. The device-specific calibration process may be an automated calibration process or a manual calibration process. A user-specific calibration process may include estimating the eye parameters of a particular user, e.g., pupil location, fovea location, optical axis, visual axis, and interpupillary distance. According to some embodiments, once the device-specific and user-specific parameters for the eye-tracking device 130 are determined, the images captured by the eye-tracking camera can be processed using a glint-assisted method to determine the user's current visual axis and gaze point relative to the display.

[0176] As shown in Figure 5, the eye-tracking device 130 (e.g., 130A or 130B) includes eyepiece(s) 520 and a gaze tracking system that includes at least one eye-tracking camera 540 (e.g., an infrared (IR) or near-IR (NIR) camera) positioned on the side of the user's face for which eye tracking is performed, and an illumination source 530 (e.g., an IR or NIR light source such as an array or ring of NIR light-emitting diodes (LEDs)) that emits light (e.g., IR or NIR light) toward the user's eye(s) 592. The eye-tracking camera 540 may be directed toward a mirror 550, positioned between the user's eye(s) 592 and the display 510 (e.g., the left or right display panel of a head-mounted display, or the display or projector of a handheld device), that reflects IR or NIR light from the eye(s) 592 while transmitting visible light (e.g., as shown at the top of Figure 5), or may be directed toward the user's eye(s) 592 to receive reflected IR or NIR light from the eye(s) 592 (e.g., as shown at the bottom of Figure 5).

[0177] In some embodiments, the controller 110 renders AR or VR frames 562 (e.g., left and right frames for left and right display panels) and provides the frames 562 to the display 510. For various purposes, for example, when processing the frames 562 for display, the controller 110 uses gaze tracking input 542 from the eye-tracking camera 540. The controller 110 optionally uses a glint-assisted method or other appropriate method to estimate the user's gaze point on the display 510 based on the gaze tracking input 542 obtained from the eye-tracking camera 540. The gaze point estimated from the gaze tracking input 542 is optionally used to determine the direction the user is currently looking.

[0178] The following describes several possible use cases for the user's current gaze direction and is not intended to be limiting. As an exemplary use case, the controller 110 may render virtual content differently based on the determined gaze direction of the user. For example, the controller 110 may generate virtual content at a higher resolution in a foveal region determined from the user's current gaze direction than in peripheral regions. As another example, the controller may position or move virtual content within the view based at least partially on the user's current gaze direction. As yet another example, the controller may display specific virtual content within the view based at least partially on the user's current gaze direction. As another exemplary use case in an AR application, the controller 110 may direct an external camera that captures the physical environment for the XR experience to focus in the determined direction. The external camera's autofocus mechanism can then focus on an object or surface in the environment that the user is currently viewing on the display 510. In another exemplary use case, the eyepiece 520 may be a focusable lens, and the gaze tracking information is used by the controller to adjust the focus of the eyepiece 520 so that the virtual object the user is currently viewing has the proper vergence to match the convergence of the user's eyes 592. The controller 110 can use the gaze tracking information to direct the eyepiece 520 to adjust its focus so that nearby objects the user is viewing appear at the correct distance.

[0179] In some embodiments, the eye-tracking device is part of a head-mounted device, which is housed within a wearable housing and includes a display (e.g., display 510), two eyepieces (e.g., one or more eyepieces 520), an eye-tracking camera (e.g., one or more eye-tracking cameras 540), and a light source (e.g., an illumination source 530 (e.g., IR or NIR LEDs)). The light source emits light (e.g., IR or NIR light) towards the user's eye(s) 592. In some embodiments, the light sources may be arranged in a ring or circular pattern around each lens, as shown in Figure 5. In some embodiments, as an example, eight illumination sources 530 (e.g., LEDs) are arranged around each lens 520. However, more or fewer illumination sources 530 may be used, and other arrangements and locations of the illumination sources 530 may be used.

[0180] In some embodiments, the display 510 emits light within the visible light range and does not emit light within the IR or NIR range, thus not introducing noise into the gaze tracking system. Note that the location and angle of the eye-tracking camera(s) 540 are given as examples and are not intended to be limiting. In some embodiments, a single eye-tracking camera 540 is positioned on each side of the user's face. In some embodiments, two or more NIR cameras 540 can be used on each side of the user's face. In some embodiments, a camera 540 with a wider field of view (FOV) and a camera 540 with a narrower FOV may be used on each side of the user's face. In some embodiments, a camera 540 operating at one wavelength (e.g., 850 nm) and a camera 540 operating at a different wavelength (e.g., 940 nm) may be used on each side of the user's face.

[0181] Embodiments of gaze tracking systems, such as those shown in Figure 5, can be used, for example, in computer-generated reality, virtual reality, and / or mixed reality applications to provide users with computer-generated reality, virtual reality, augmented reality, and / or augmented virtual experiences.

[0182] Figure 6 shows glint-assisted gaze tracking pipelines according to several embodiments. In some embodiments, the gaze tracking pipeline is implemented by a glint-assisted gaze tracking system (e.g., an eye-tracking device 130 as shown in Figures 1A and 5). The glint-assisted gaze tracking system can maintain a tracking state. Initially, the tracking state is off or "no". When in the tracking state, the glint-assisted gaze tracking system tracks the pupil contour and glint in the current frame by using prior information from previous frames when analyzing the current frame. When not in the tracking state, the glint-assisted gaze tracking system attempts to detect the pupil and glint in the current frame, and if successful, initializes the tracking state to "yes" and continues in the tracking state for the next frame.

[0183] As shown in Figure 6, at 600, the gaze tracking camera can capture left and right images of the user's left and right eyes. The captured images are then fed into the gaze tracking pipeline for processing, beginning at 610. As indicated by the arrow returning to element 600, the gaze tracking system can continue to capture images of the user's eyes at a rate of, for example, 60 to 120 frames per second. In some embodiments, each set of captured images may be fed into the pipeline for processing. However, in some embodiments, or under some conditions, not all captured frames are processed by the pipeline.

[0184] At 610, if the tracking state is yes for the currently captured image, the method proceeds to element 640. At 610, if the tracking state is no, the image is analyzed to detect the user's pupil and glint in the image, as shown at 620. At 630, if the pupil and glint are successfully detected, the method proceeds to element 640. If they are not successfully detected, the method returns to element 610 and processes the next image of the user's eyes.

[0185] At 640, if proceeding from element 610, the current frame is analyzed to track the pupil and glint based in part on previous information from the previous frame. At 640, if proceeding from element 630, the tracking state is initialized based on the detected pupil and glint in the current frame. The results of processing at element 640 are checked to confirm that the tracking or detection results are reliable. For example, the results may be checked to determine whether a sufficient number of glints for pupil and gaze estimation are successfully tracked or detected in the current frame. At 650, if the results are unreliable, the tracking state is set to no at element 660, and the method returns to element 610 to process the next image of the user's eye. At 650, if the results are reliable, the method proceeds to element 670. At 670, the tracking state is set to yes (if not already yes), and the pupil and glint information is passed to element 680 to estimate the user's gaze point.
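The control flow of elements 600 through 680 can be sketched, by way of example only, as a small state machine. The four callables (`detect`, `track`, `is_reliable`, `estimate_gaze`) are hypothetical placeholders for the image-analysis steps, which the description does not specify in code form; only the branching on the tracking state follows the text.

```python
from typing import Any, Callable, Iterable, List, Optional


def run_gaze_pipeline(
    frames: Iterable[Any],
    detect: Callable[[Any], Optional[Any]],
    track: Callable[[Any, Any], Any],
    is_reliable: Callable[[Any], bool],
    estimate_gaze: Callable[[Any], Any],
) -> List[Any]:
    """Process frames and return one gaze estimate per reliably tracked frame."""
    tracking = False  # the tracking state is initially "no"
    prior = None      # pupil and glint information from the previous frame
    gaze_points = []
    for frame in frames:                  # 600: next captured image
        if tracking:                      # 610: tracking state is "yes"
            result = track(frame, prior)  # 640: track using prior information
        else:
            result = detect(frame)        # 620: detect pupil and glints
            if result is None:            # 630: detection failed
                continue                  # return to 610 with the next image
            # 640 (reached from 630): the detection initializes tracking
        if not is_reliable(result):       # 650: results cannot be trusted
            tracking = False              # 660: tracking state set to "no"
            prior = None
            continue
        tracking = True                   # 670: tracking state set to "yes"
        prior = result
        gaze_points.append(estimate_gaze(result))  # 680: estimate gaze point
    return gaze_points


# Toy stand-ins: frame 0 has no detectable pupil and frame 99 is unreliable.
points = run_gaze_pipeline(
    frames=[0, 1, 2, 99, 3],
    detect=lambda f: f if f > 0 else None,
    track=lambda f, prior: f,
    is_reliable=lambda r: r != 99,
    estimate_gaze=lambda r: r * 10,
)
print(points)
```

In the toy run, the unreliable frame resets the tracking state, so the following frame goes through detection again rather than tracking.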

[0186] Figure 6 is intended to serve as an example of an eye-tracking technology that may be used in a particular implementation. As will be recognized by those skilled in the art, other eye-tracking technologies that currently exist or may be developed in the future may be used in computer system 101 to provide users with XR experiences in various embodiments, either in place of or in combination with the glint-assisted eye-tracking technology described herein.

[0187] In some embodiments, the captured portion of the real-world environment 602 is used to provide the user with an XR experience, for example, a mixed reality environment in which one or more virtual objects are superimposed on a representation of the real-world environment 602.

[0188] Accordingly, this description describes several embodiments of three-dimensional environments (e.g., XR environments) that include representations of real-world objects and virtual objects. For example, a three-dimensional environment optionally includes a representation of a table existing in a physical environment, which is captured and displayed within the three-dimensional environment (e.g., actively via a computer system's camera and display, or passively via a computer system's transparent or translucent display). As described above, a three-dimensional environment is optionally a mixed reality system in which the three-dimensional environment is based on a physical environment that is captured by one or more sensors of a computer system and displayed via a display generation component. As a mixed reality system, the computer system may optionally selectively display parts and / or objects of the physical environment so that each part and / or object of the physical environment appears to exist in the three-dimensional environment displayed by the computer system. Similarly, the computer system may optionally display virtual objects in a three-dimensional environment so that the virtual objects appear to exist in the real world (e.g., the physical environment) by placing the virtual objects at locations within the three-dimensional environment that have corresponding locations in the real world. For example, a computer system may optionally display a vase in such a way that it appears as if a real vase were placed on a table in a physical environment. In some embodiments, individual locations in a three-dimensional environment have corresponding locations in the physical environment. Therefore, when a computer system is described as displaying a virtual object at a particular location relative to a physical object (e.g., at or near the location of the user's hand, or on or near a physical table), the computer system displays the virtual object at a specific location within the three-dimensional environment so that it appears to be at or near the physical object in the physical world (e.g., the virtual object is displayed at the location within the three-dimensional environment that corresponds to the location within the physical environment where the virtual object would be displayed if it were a real object at that specific location).

[0189] In some embodiments, real-world objects that exist in a physical environment and are displayed within a three-dimensional environment (e.g., and / or are visible via the display generation component) can interact with virtual objects existing only within the three-dimensional environment. For example, the three-dimensional environment may include a table and a vase placed on the table, where the table is a view (or representation) of a physical table in the physical environment, and the vase is a virtual object.

[0190] In a three-dimensional environment (for example, a real environment, a virtual environment, or an environment including a mixture of real and virtual objects), an object may be said to have depth or simulated depth, or an object may be said to be visible, displayed, or positioned at a different depth. In this context, depth refers to dimensions other than height or width. In some embodiments, depth is defined relative to a fixed set of coordinates (for example, a room or object has height, depth, and width defined relative to a fixed set of coordinates). In some embodiments, depth is defined relative to the user's location or viewpoint, in which case the depth dimension varies based on the user's location and / or the location and angle of the user's viewpoint. In some embodiments where depth is defined relative to the user's location positioned with respect to the surface of the environment (e.g., the floor or ground surface of the environment), objects that are away from the user along a line extending parallel to the surface are considered to have a greater depth in the environment, and / or the depth of an object is measured along an axis that extends outward from the user's location and is parallel to the surface of the environment (e.g., depth is defined in a coordinate system of a cylinder or substantially a cylinder, with the user's position at the center of a cylinder extending from the user's head to the user's feet). In some embodiments, depth is defined relative to the user's viewpoint (e.g., a direction relative to a point in space that determines which parts of the environment are visible through a head-mounted device or other display). 
Objects that are farther away from the user's viewpoint along a line extending parallel to the user's viewpoint are considered to have greater depth in the environment, and / or the depth of an object is measured along an axis extending outward from a line that extends from the user's viewpoint and is parallel to the user's viewpoint (e.g., depth is defined in a spherical or substantially spherical coordinate system with the origin of the viewpoint at the center of a sphere extending outward from the user's head).In some embodiments, depth is defined relative to a user interface container (e.g., a window or application on which application and / or system content is displayed), where the user interface container has height and / or width, and depth is a dimension orthogonal to the height and / or width of the user interface container. In some embodiments, where depth is defined relative to a user interface container, the height and / or width of the container is typically orthogonal or substantially orthogonal to a line extending from a user-based location (e.g., the user's viewpoint or the user's location) to the user interface container (e.g., the center of the user interface container, or another feature point of the user interface container) when the container is placed in a three-dimensional environment or is first displayed (e.g., consequently, the depth dimension of the container extends outward away from the user or the user's viewpoint). In some embodiments, where depth is defined relative to a user interface container, the depth of an object relative to the user interface container refers to the position of the object along the depth dimension of the user interface container. In some embodiments, multiple different containers may have different depth dimensions (e.g., different depth dimensions extending in different directions from the user or the user's viewpoint and / or away from different starting points). 
In some embodiments, when depth is defined relative to a user interface container, the direction of the depth dimension remains constant relative to the user interface container when the location of the user interface container, the user, and / or the user's viewpoint changes (e.g., when multiple different viewers are viewing the same container in a three-dimensional environment, such as during a face-to-face collaboration session, and / or when multiple participants are in a real-time communication session with shared virtual content containing the container). In some embodiments, for curved containers (e.g., including containers with curved surfaces or curved content areas), the depth dimension optionally extends within the surface of the curved container. In some contexts, z-separation (e.g., separation of two objects in the depth dimension), z-height (e.g., distance of one object from another object in the depth dimension), z-position (e.g., position of one object in the depth dimension), z-depth (e.g., position of one object in the depth dimension), or simulated z-dimension (e.g., depth used as an object dimension, an environment dimension, an orientation in space, and / or an orientation in simulated space) are used to refer to the concepts of depth as described above.
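
The three depth conventions described in paragraph [0190] can be compared with a minimal sketch. Everything here is illustrative rather than taken from the disclosure: the `Vec3` type, the function names, and the choice of `y` as the vertical axis are assumptions. The sketch treats the cylinder-style depth as horizontal distance from the user's location, the sphere-style depth as radial distance from the viewpoint, and the container-relative depth as the signed position along the container's unit normal.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Vec3:
    """Hypothetical 3D point; y is assumed to be height above the floor."""
    x: float
    y: float
    z: float

    def __sub__(self, other):
        return Vec3(self.x - other.x, self.y - other.y, self.z - other.z)

    def dot(self, other):
        return self.x * other.x + self.y * other.y + self.z * other.z

    def length(self):
        return math.sqrt(self.dot(self))


def cylindrical_depth(user_location, obj):
    """Depth measured parallel to the floor; height differences are ignored."""
    d = obj - user_location
    return math.hypot(d.x, d.z)


def spherical_depth(viewpoint, obj):
    """Depth measured radially outward from the viewpoint."""
    return (obj - viewpoint).length()


def container_depth(container_origin, container_normal, obj):
    """Signed position along the container's own depth axis.

    container_normal is assumed to be a unit vector orthogonal to the
    container's height and width.
    """
    return (obj - container_origin).dot(container_normal)


user = Vec3(0.0, 0.0, 0.0)
obj = Vec3(3.0, 4.0, 0.0)
print(cylindrical_depth(user, obj))  # horizontal distance only
print(spherical_depth(user, obj))    # full radial distance
```

Because the container-relative depth depends only on the container's origin and normal, it stays the same for every viewer of that container, which matches the statement that the depth direction remains constant relative to the container when the user or viewpoint moves.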

[0191] In some embodiments, the user may optionally interact with virtual objects in a three-dimensional environment using one or more hands, as if the virtual objects were real objects in a physical environment. For example, as described above, one or more sensors in the computer system may optionally capture one or more of the user's hands and display a representation of the user's hands in the three-dimensional environment (in a similar manner to, for example, displaying real-world objects in the three-dimensional environment as described above), or, in some embodiments, the user's hands are visible through the display-generating component because the physical environment can be seen through the user interface, owing to the transparency / translucency of the portion of the display-generating component that displays the user interface, the projection of the user interface onto a transparent / translucent surface, or the projection of the user interface onto the user's eyes or into the user's field of view. Thus, in some embodiments, the user's hands are displayed at respective locations in the three-dimensional environment and are treated as if they were objects in the three-dimensional environment that can interact with virtual objects in the three-dimensional environment as if they were real physical objects in the physical environment. In some embodiments, the computer system may update the display of the user's hands in the three-dimensional environment in conjunction with the movement of the user's hands in the physical environment.

[0192] In some of the embodiments described below, for example, to determine whether a physical object is directly interacting with a virtual object (e.g., whether a hand is touching, grasping, or holding a virtual object, or whether it is within a threshold distance from the virtual object), the computer system may optionally determine the "effective" distance between the physical object in the physical world and the virtual object in the three-dimensional environment. For example, a hand directly interacting with a virtual object may optionally include one or more of the fingers of a hand pressing a virtual button, a user's hand grasping a virtual vase, two fingers of a user's hand pinching / holding an application's user interface together, and other types of interactions described herein. For example, when determining whether a user is interacting with a virtual object and / or how a user is interacting with a virtual object, the computer system may optionally determine the distance between the user's hand and the virtual object. In some embodiments, the computer system determines the distance between the user's hand and the virtual object by determining the distance between the location of the hand in the three-dimensional environment and the location of the virtual object of interest in the three-dimensional environment. For example, one or more of the user's hands are located in a specific position in the physical world, which the computer system optionally captures and displays at a specific corresponding position in a three-dimensional environment (e.g., the position in the three-dimensional environment where the hands are displayed, if the hands are virtual hands rather than physical hands). The position of the hands in the three-dimensional environment is optionally compared to the position of a target virtual object in the three-dimensional environment to determine the distance between the one or more of the user's hands and the virtual object. 
In some embodiments, the computer system optionally determines the distance between the physical object and the virtual object by comparing positions in the physical world (as opposed to comparing positions in the three-dimensional environment). For example, when determining the distance between one or more of the user's hands and a virtual object, the computer system optionally determines the corresponding location of the virtual object in the physical world (for example, the position in the physical world where the virtual object would be located if it were a physical object rather than a virtual object), and then determines the distance between the corresponding physical position and one or more of the user's hands. In some embodiments, the same technique is optionally used to determine the distance between any physical object and any virtual object. Thus, when determining whether a physical object is in contact with a virtual object, or whether a physical object is within a threshold distance of a virtual object, as described herein, the computer system optionally performs one of the techniques described above to map the location of the physical object to a three-dimensional environment and / or to map the location of the virtual object to a physical environment.
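
The two ways of computing the "effective" distance in paragraph [0192] can be sketched as follows. This is a simplified illustration, not the disclosed implementation: the mapping between physical and environment coordinates is assumed to be a pure translation (`origin_offset`), and the function names and the 0.05 threshold are hypothetical.

```python
import math


def to_environment(physical_point, origin_offset):
    """Map a physical-world point into environment coordinates (assumed translation)."""
    return tuple(p - o for p, o in zip(physical_point, origin_offset))


def to_physical(env_point, origin_offset):
    """Map an environment point to where it would sit in the physical world."""
    return tuple(e + o for e, o in zip(env_point, origin_offset))


def distance_in_environment(hand_physical, object_env, origin_offset):
    """First technique: move the hand into the environment, then compare."""
    return math.dist(to_environment(hand_physical, origin_offset), object_env)


def distance_in_physical_world(hand_physical, object_env, origin_offset):
    """Second technique: move the virtual object into the physical world, then compare."""
    return math.dist(hand_physical, to_physical(object_env, origin_offset))


def is_direct_interaction(distance, threshold=0.05):
    """Hypothetical threshold test for touching, grasping, or holding."""
    return distance <= threshold


offset = (1.0, 0.0, 0.0)
hand = (1.0, 1.0, 0.5)          # hand position in the physical world
virtual_vase = (0.0, 1.0, 0.53)  # vase position in the environment
d_env = distance_in_environment(hand, virtual_vase, offset)
d_phys = distance_in_physical_world(hand, virtual_vase, offset)
print(is_direct_interaction(d_env), is_direct_interaction(d_phys))
```

Under a distance-preserving mapping such as the translation assumed here, both techniques give the same result, so the choice between them is a matter of which coordinate frame is more convenient.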

[0193] In some embodiments, the same or similar techniques are used to determine where and what the user's gaze is directed, and / or where and what the physical stylus held by the user is directed. For example, if the user's gaze is directed to a particular position in the physical environment, the computer system optionally determines the corresponding position in the three-dimensional environment (e.g., the virtual position of the gaze), and if a virtual object is located at that corresponding virtual position, the computer system optionally determines that the user's gaze is directed to that virtual object. Similarly, the computer system optionally determines, based on the orientation of the physical stylus, where in the physical environment the stylus is pointing. In some embodiments, based on this determination, the computer system optionally determines the corresponding virtual position in the three-dimensional environment corresponding to the location in the physical environment that the stylus is pointing to, and optionally determines that the stylus is pointing to the corresponding virtual position in the three-dimensional environment.
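
One way to picture the gaze and stylus targeting in paragraph [0193] is as a ray test against the virtual objects. The sketch below is an assumption-laden illustration: it models each virtual object as a bounding sphere, treats the gaze or stylus as a ray already mapped into environment coordinates, and uses hypothetical names throughout. The disclosure does not specify a hit-testing method.

```python
import math


def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def ray_hits_sphere(origin, direction, center, radius):
    """Return the distance along the ray to its closest approach to the
    sphere's center if the ray passes through the sphere, else None."""
    d = normalize(direction)
    oc = tuple(c - o for c, o in zip(center, origin))
    t = sum(a * b for a, b in zip(oc, d))  # projection of center onto the ray
    if t < 0:
        return None  # sphere center is behind the ray origin
    closest_sq = sum(a * a for a in oc) - t * t
    if closest_sq > radius * radius:
        return None  # ray passes outside the sphere
    return t


def pointed_target(origin, direction, objects):
    """Return the name of the nearest object the ray points at, or None.

    objects maps a name to a (center, radius) bounding sphere.
    """
    hits = []
    for name, (center, radius) in objects.items():
        t = ray_hits_sphere(origin, direction, center, radius)
        if t is not None:
            hits.append((t, name))
    return min(hits)[1] if hits else None


scene = {
    "vase": ((0.0, 0.0, -2.0), 0.5),
    "button": ((0.0, 0.0, -5.0), 0.5),
}
print(pointed_target((0.0, 0.0, 0.0), (0.0, 0.0, -1.0), scene))
```

The same function serves for a stylus by passing the stylus tip as `origin` and its pointing direction as `direction`.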

[0194] Similarly, embodiments described herein may refer to the location of a user (e.g., a user of a computer system) and / or the location of a computer system in a three-dimensional environment. In some embodiments, the user of a computer system is holding, wearing, or otherwise positioned near the computer system. Thus, in some embodiments, the location of the computer system is used as a proxy for the user's location. In some embodiments, the location of the computer system and / or the user in the physical environment corresponds to individual locations in the three-dimensional environment. For example, if a user stands in a location facing an individual part of the physical environment that is visible through a display-generating component, the location of the computer system is the location in the physical environment (and its corresponding location in the three-dimensional environment) where the user will see objects in the physical environment in the same position, orientation, and / or size (e.g., absolutely and / or relative to each other) as the objects are visible through the display-generating component of the computer system in the three-dimensional environment. Similarly, if a virtual object displayed in a three-dimensional environment is a physical object in a physical environment (for example, the physical object is located in the same physical environment location as the one in the three-dimensional environment and has the same size and orientation as the one in the three-dimensional environment), then the computer system and / or user's location is the position from which the user will view the virtual object in the physical environment in the same position, orientation, and / or size (for example, absolutely, and / or relative to each other, and in relation to real-world objects) as it was displayed by the computer system's display generation components in the three-dimensional environment.

[0195] This disclosure describes various input methods for interaction with computer systems. Where one example is provided using one input device or method, and another example is provided using a different input device or method, each example may be compatible with the input device or method described in the other example, and their use should be considered optional. Similarly, various output methods for interaction with computer systems are described. Where one example is provided using one output device or method, and another example is provided using a different output device or method, each example may be compatible with the output device or method described in the other example, and their use should be considered optional. Similarly, various methods for interaction with virtual or mixed reality environments via computer systems are described. Where one example is provided using interaction with a virtual environment, and another example is provided using a mixed reality environment, each example may be compatible with the method described in the other example, and their use should be considered optional. Therefore, this disclosure discloses embodiments that are combinations of features of multiple examples, without exhaustively listing all features of the embodiments in the description of each exemplary embodiment.

[0196] User interface and related processes: here, we focus on embodiments of a user interface ("UI") and related processes that may be performed on a computer system, such as a portable multifunction device or head-mounted device, that includes a display generation component, one or more input devices, and (optionally) one or more cameras.

[0197] Figures 7A to 7S show an example of a first computer system that displays a virtual representation of the current viewpoint orientation of a user of a second computer system communicating with a first computer system in multiple orientations within a three-dimensional environment, depending on the movement of the user's current viewpoint. Figures 7A to 7S also show an example of a first computer system that displays different representations of the movement of the virtual representation based on whether the virtual representation is a first type virtual representation or a second type virtual representation.

[0198] Figure 7A shows a first computer system (e.g., electronic device) 101 that displays a three-dimensional environment 702 from the viewpoint of a first user (e.g., user 708a) of the first computer system 101 (e.g., facing the back wall of the physical environment in which the first computer system 101 is located) via a display generation component (e.g., display generation component 120 in Figure 1). In some embodiments, the first computer system 101 includes a display generation component (e.g., a touchscreen) and a plurality of image sensors (e.g., image sensor 314 in Figure 3). The image sensors optionally include one or more of a visible light camera, an infrared camera, a depth sensor, or any other sensors that the first computer system 101 can use to capture one or more images of the user or a part of the user (e.g., one or more of the user's hands) while the user is interacting with the computer system 101. In some embodiments, the user interfaces illustrated and described below may also be implemented on a head-mounted display, which includes a display generating component that displays the user interface or three-dimensional environment to the user, and sensors for detecting the physical environment and / or the movement of the user's hands (e.g., external sensors facing outward from the user) and / or the user's attention (e.g., gaze) (e.g., internal sensors facing inward towards the user's face).

[0199] Figures 7A to 7S show alternative views (e.g., a first alternative view 730a and a second alternative view 730b) of the three-dimensional environment 702 as displayed by the first computer system 101. In some embodiments, the first alternative view 730a and the second alternative view 730b of the three-dimensional environment 702 include alternative types of virtual representations of one or more users of one or more computer systems in a communication session with the first computer system 101 (e.g., the communication session has one or more characteristics of a communication session described with reference to Methods 800 and / or 900). In some embodiments, the first alternative view 730a and the second alternative view 730b of the three-dimensional environment 702 in Figures 7A to 7S include alternative representations of the movement of the virtual representation of the user of the second computer system in a communication session with the first computer system 101, depending on the movement of the user's current viewpoint relative to the three-dimensional environment 702.

[0200] Figures 7A to 7M show an overhead view 706 of the three-dimensional environment 702. As shown in the overhead view 706, the three-dimensional environment 702 is a shared environment between a first user 708a of the first computer system 101 and a second user 708b of the second computer system communicating with the first computer system 101. For example, the first user 708a views the three-dimensional environment 702 from a first perspective (e.g., a first viewpoint to the three-dimensional environment 702), and the second user 708b views the three-dimensional environment 702 from a second perspective (e.g., a second viewpoint to the three-dimensional environment 702). In some embodiments, the three-dimensional environment 702 is shared between the first computer system 101 and the second computer system within a communication session. For example, the first computer system 101 displays a first version of the three-dimensional environment 702 from the current viewpoint of the first user 708a, and the second computer system displays a second version of the three-dimensional environment 702 from the current viewpoint of the second user 708b (for example, the first user 708a and the second user 708b view and / or interact with the same shared three-dimensional environment (for example, including one or more virtual objects displayed in the three-dimensional environment shared in the communication session) during a communication session). In the overhead view 706, the location of the first user 708a corresponds to the location of the first user 708a's current viewpoint relative to the three-dimensional environment 702. 
The overhead view 706 further illustrates the representation of the orientation 710a of the first user 708a's current viewpoint relative to the three-dimensional environment 702 (for example, the orientation 710a is represented by an arrow (for example, indicating the direction of the first user 708a's current viewpoint relative to the three-dimensional environment 702)). In the overhead view 706, the location of the second user 708b corresponds to the location of the second user 708b's current viewpoint relative to the three-dimensional environment 702. The overhead view 706 further illustrates the representation of the orientation 710b of the second user 708b's current viewpoint relative to the three-dimensional environment 702 (for example, the orientation 710b is represented by an arrow (for example, indicating the direction of the second user 708b's current viewpoint relative to the three-dimensional environment 702)).

[0201] Figure 7A shows a first computer system 101 displaying a virtual representation 704a of the second user 708b in a three-dimensional environment 702 (for example, in both a first alternate view 730a and a second alternate view 730b). In some embodiments, the user interface shown in Figure 7A and described below is implemented on a head-mounted display that displays the three-dimensional environment 702 to the first user 708a (for example, as an AR, VR, MR, or XR environment). As shown in Figure 7A, the alternate views of the three-dimensional environment 702 include the same type of virtual representation of the second user 708b. In some embodiments, the virtual representation 704a is an avatar (including one or more characteristics of the second type of virtual representation, as described with reference to Method 900, for example). The virtual representation 704a is displayed (for example, from the current viewpoint of the first user 708a) in a spatial arrangement (e.g., location and / or orientation) within the three-dimensional environment 702 that corresponds to the pose (e.g., location and / or orientation) of the second user 708b's current viewpoint relative to the three-dimensional environment 702.

[0202] Figure 7B shows that in a first alternative view 730a of the three-dimensional environment 702, the first computer system 101 displays a virtual representation 704b of the second user 708b within the three-dimensional environment 702 in accordance with the virtual representation change criteria being met. In some embodiments, the user interface shown in Figure 7B and described below is implemented on a head-mounted display that displays the three-dimensional environment 702 to the first user 708a (e.g., as an AR, VR, MR, or XR environment). In some embodiments, the location and / or orientation of the virtual representation 704b of the second user 708b corresponds to the current orientation of the second user 708b's current viewpoint relative to the three-dimensional environment 702 (e.g., the virtual representation 704b is an alternative representation of the current orientation of the second user 708b's current viewpoint compared to the virtual representation 704a). In the second alternative view 730b, the virtual representation change criteria are not met, and the first computer system 101 continues to display the virtual representation 704a of the second user 708b. In some embodiments, the virtual representation 704b is a representation of the second user 708b that is different from the avatar. In some embodiments, the virtual representation 704b includes one or more characteristics of the first virtual object and / or the first type of virtual representation described with reference to Method 800. In some embodiments, the virtual representation 704b is displayed as a three-dimensional shape in the three-dimensional environment 702 (for example, including one or more shapes as described with reference to Method 800). In some embodiments, the virtual representation 704b includes one or more surfaces (for example, a first surface and / or a second surface as described with reference to Method 800).
As shown in Figure 7B, the first surface 732a of the virtual representation 704b (including, for example, one or more properties of the first surface of the first virtual object, as described with reference to Method 800) is displayed oriented to the current viewpoint of the first user 708a (for example, the first surface 732a corresponds to the front of the virtual representation 704b because the current viewpoint of the second user 708b is oriented (e.g., directed) to the current viewpoint of the first user 708a, as shown in the overhead view 706). In the first alternate view 730a, the first surface 732a is displayed along with the identifier of the second user 708b (e.g., the initials "JD"). In some embodiments, the identifier of the second user 708b displayed on the first surface 732a includes one or more characteristics of the identifier of the second user displayed on the first surface of the first virtual object, as described with reference to Method 800.

[0203] As shown in Figure 7B, the virtual representation 704b is displayed simultaneously with the indication 718. In some embodiments, the indication 718 includes one or more characteristics of the indication that correspond to the identifier of the second user, as described with reference to Method 800. As shown in Figure 7B, the indication 718 includes the name of the second user 708b (e.g., "John Doe"). In some embodiments, the name included in the indication 718 is associated with the user profile of the second user 708b and / or with the username of the second user 708b. As shown in the second alternate view 730b of the three-dimensional environment 702, the virtual representation 704a is not displayed simultaneously with the indication 718.

[0204] In some embodiments, the virtual representation 704b is displayed in the three-dimensional environment 702 using animations independent of the movement of the second user 708b's current viewpoint relative to the three-dimensional environment 702 (for example, including one or more features that display a first virtual object using animations independent of the movement of the second user's current viewpoint relative to the three-dimensional environment, as described with reference to Method 800, and / or one or more features that display animations that include periodic movement of the virtual representation of a user of a second computer system, as described with reference to Method 900). As shown in Figure 7B, the virtual representation 704b is displayed with animations 714 (for example, represented by double-headed arrows on both sides of the virtual representation 704b in a first alternate view 730a) that correspond to the oscillation of the virtual representation 704b around the current location of the virtual representation 704b in the three-dimensional environment 702 (for example, including one or more features that display a first virtual object oscillating around the current location of the second user's current viewpoint relative to the three-dimensional environment, as described with reference to Method 800). As shown in the second alternative view 730b of the three-dimensional environment 702, the virtual representation 704a is not displayed along with the animation 714.

[0205] In some embodiments, the virtual representation change criteria that are satisfied to change the second user 708b's individual virtual representation from virtual representation 704a to virtual representation 704b (for example, as shown in the first alternate view 730a of Figure 7B) include one or more criteria (for example, the one or more criteria are optionally also used to change the second user 708b's individual virtual representation from virtual representation 704b to virtual representation 704a). For example, the criteria include receiving an indication of user input (for example, from the first user 708a or the second user 708b) to change the second user 708b's individual virtual representation from virtual representation 704a to virtual representation 704b, or optionally from virtual representation 704b to virtual representation 704a (for example, the second computer system sends the indication to the first computer system 101) (for example, the user input indication has one or more characteristics of the user input indication described with reference to Method 900). For example, the criteria include receiving an indication independent of user input (including, for example, one or more characteristics of the indication that one or more criteria are met independently of user input, as described with reference to Method 900). In some embodiments, in response to the first computer system 101 and / or the second computer system detecting a loss of tracking of the second user 708b's current viewpoint relative to the three-dimensional environment 702, the first computer system 101 optionally changes the display of the second user 708b's individual virtual representation from virtual representation 704a to virtual representation 704b (or, optionally, from virtual representation 704b to virtual representation 704a).
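
The change criteria in paragraph [0205] can be read as a small decision function. The sketch below is one hedged interpretation: the names are hypothetical, the precedence of an explicit user request over a tracking-loss indication is an assumption the text does not state, and only the 704a to 704b direction is modeled for tracking loss.

```python
from enum import Enum


class Representation(Enum):
    AVATAR = "704a"  # second type of virtual representation
    SHAPE = "704b"   # first type of virtual representation


def next_representation(current, requested=None, tracking_lost=False):
    """Pick the representation to display for the second user.

    requested: a Representation carried by an indication of user input, or None.
    tracking_lost: an indication, independent of user input, that tracking of
    the second user's viewpoint was lost.
    """
    if requested is not None:
        # Assumption: an explicit user request takes precedence.
        return requested
    if tracking_lost:
        return Representation.SHAPE
    return current


print(next_representation(Representation.AVATAR, tracking_lost=True))
```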

[0206] Figure 7C shows a second user 708b providing audio input (represented, for example, by "x" shown adjacent to the second user 708b in the overhead view 706) during a communication session with the first user 708a. In some embodiments, the user interface shown in Figure 7C and described below is implemented on a head-mounted display that displays a three-dimensional environment 702 (e.g., as an AR, VR, MR, or XR environment) to the first user 708a. In some embodiments, the second computer system transmits to the first computer system 101 an indication corresponding to the audio input received by the second computer system (e.g., from the second user 708b), which includes one or more characteristics of the indication corresponding to the audio input received by the second computer system from the second user, as described with reference to Method 800. As shown in Figure 7C, in response to the audio input provided by the second user 708b, the virtual representation 704b is displayed with the animation 720 (for example, displaying the virtual representation 704b with the animation 720 includes one or more properties of displaying the first virtual object in the three-dimensional environment with an animation based on the audio input received by the second computer system, as described with reference to Method 800). As shown in the second alternative view 730b of the three-dimensional environment 702, the virtual representation 704a is not displayed with the animation 720 based on the audio input provided by the second user 708b. As shown in Figure 7C, the virtual representation 704b is displayed with the animation 720 and without the animation 714. Optionally, the first computer system 101 displays the virtual representation 704b with the animation 720 simultaneously with the animation 714 in response to receiving an indication corresponding to the audio input received by the second computer system.

[0207] Figure 7C shows a side view 712 of the physical environment of the second user 708b. In some embodiments, the physical environment of the second user includes one or more physical environments described with reference to Methods 800, 900, 1100, and / or 1200. As shown in the side view 712, user 708b is currently sitting in the physical environment (e.g., in a chair) (e.g., the current viewpoint of the second user 708b is currently located at a first height and / or oriented relative to the three-dimensional environment 702).

[0208] Figure 7D shows the vertical movement of the second user 708b's current viewpoint relative to the three-dimensional environment 702. In some embodiments, the user interface shown in Figure 7D and described below is implemented on a head-mounted display that displays the three-dimensional environment 702 to the first user 708a (e.g., as an AR, VR, MR, or XR environment). As shown in side view 712 of the second user 708b's physical environment, the second user 708b has changed their vertical position within the physical environment compared to that shown in Figure 7C (e.g., the second user is standing instead of sitting). In some embodiments, in accordance with a determination that the second user 708b's current viewpoint is represented within the three-dimensional environment 702 by the virtual representation 704a (e.g., as shown in the second alternate view 730b of the three-dimensional environment 702), the first computer system 101 displays the movement of the virtual representation 704a according to the vertical movement of the second user 708b's current viewpoint relative to the three-dimensional environment 702. For example, as shown in Figure 7D, the virtual representation 704a is displayed at a new height in the three-dimensional environment 702 relative to the current viewpoint of the first user 708a, compared to what is shown in Figure 7C. In some embodiments, in accordance with a determination that the current viewpoint of the second user 708b is represented in the three-dimensional environment 702 by the virtual representation 704b (for example, as shown in the first alternate view 730a of the three-dimensional environment 702), the first computer system 101 forgoes displaying movement of the virtual representation 704b in accordance with the vertical movement of the second user 708b's current viewpoint relative to the three-dimensional environment 702.
For example, as shown in Figure 7D, the virtual representation 704b is displayed at the same height in the three-dimensional environment 702 relative to the current viewpoint of the first user 708a as in Figure 7C. In Figure 7D, the second user 708b has stopped providing the audio input shown in Figure 7C. Therefore, the first computer system 101 stops displaying the animation 720 in the three-dimensional environment 702 (for example, optionally redisplaying the animation 714).
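
The behavior shown in Figures 7C and 7D can be summarized in a short sketch: the avatar (704a) follows the vertical movement of the second user's viewpoint and shows neither animation, while the shape (704b) keeps its displayed height and switches between the idle animation 714 and the audio-driven animation 720. The function and field names are hypothetical, and always redisplaying animation 714 when audio stops is an assumption, since the text describes that step as optional.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Display:
    height: float
    animations: frozenset


def update_display(kind, shown_height, viewpoint_height, audio_active,
                   combine_animations=False):
    """Return what the first computer system displays for the second user.

    kind: "avatar" for virtual representation 704a, "shape" for 704b.
    combine_animations: models the optional case where animations 720 and
    714 are displayed simultaneously during audio input.
    """
    if kind == "avatar":
        # 704a tracks vertical viewpoint movement; no 714 / 720 animation.
        return Display(viewpoint_height, frozenset())
    # 704b stays at its displayed height regardless of vertical movement.
    if audio_active:
        names = {"720", "714"} if combine_animations else {"720"}
    else:
        names = {"714"}  # assumption: idle animation resumes when audio stops
    return Display(shown_height, frozenset(names))


# Second user stands up (1.2 -> 1.7) and stops talking, as in Figure 7D.
print(update_display("avatar", 1.2, 1.7, audio_active=False))
print(update_display("shape", 1.2, 1.7, audio_active=False))
```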

[0209] Figure 7E shows the movement of the second user 708b's current viewpoint relative to the three-dimensional environment 702. In some embodiments, the user interface shown in Figure 7E and described below is implemented on a head-mounted display that displays the three-dimensional environment 702 to the first user 708a (for example, as an AR, VR, MR, or XR environment). In some embodiments, in response to the movement of the second user 708b's current viewpoint relative to the three-dimensional environment 702, the first computer system 101 displays the virtual representation 704b with a first representation of movement (as described with reference to, for example, Method 900) in accordance with a determination that the individual virtual representation of the second user 708b is the virtual representation 704b, and the first computer system 101 displays the virtual representation 704a with a second representation of movement (as described with reference to, for example, Method 900) in accordance with a determination that the individual virtual representation of the second user 708b is the virtual representation 704a. In some embodiments, displaying the virtual representation 704b with the first representation of movement includes displaying the movement of the virtual representation 704b in accordance with the movement of the second user 708b's current viewpoint beyond a threshold amount relative to the three-dimensional environment 702 (for example, the threshold amount having one or more characteristics of the threshold amount described with reference to Method 800).
In Figure 7E, the overhead view 706 includes an orientation threshold 722a (for example, corresponding to a threshold amount of change in the orientation of the second user 708b's current viewpoint relative to the three-dimensional environment 702) and a distance (for example, or optionally magnitude) threshold 722b (for example, corresponding to a threshold amount of distance of the movement of the second user 708b's current viewpoint relative to the three-dimensional environment 702). In some embodiments, the first computer system 101 changes the location and / or orientation (e.g., pose) of the virtual representation 704b in the three-dimensional environment 702 (e.g., with respect to the current viewpoint of the first user 708a) in accordance with the movement of the second user 708b's current viewpoint exceeding the orientation threshold 722a and / or the distance threshold 722b with respect to the three-dimensional environment 702. In some embodiments, in accordance with a determination that the individual virtual representation of the second user 708b is the virtual representation 704a, the first computer system 101 changes the location and / or orientation (e.g., pose) of the virtual representation 704a in the three-dimensional environment 702 (e.g., with respect to the current viewpoint of the first user 708a), independently of whether the movement of the second user 708b's current viewpoint with respect to the three-dimensional environment 702 exceeds the orientation threshold 722a and / or the distance threshold 722b. In Figure 7E, the movement of the second user 708b's current viewpoint does not exceed the orientation threshold 722a (e.g., the orientation 710b of the second user 708b's current viewpoint is within the orientation threshold 722a in the overhead view 706) and / or the distance threshold 722b (e.g., the location of the second user 708b's current viewpoint is within the distance threshold 722b in the overhead view 706).
Therefore, in the first alternate view 730a of the three-dimensional environment 702, the first computer system 101 does not change the orientation of the virtual representation 704b in the three-dimensional environment 702 based on the movement of the second user 708b's current viewpoint, and in the second alternate view 730b of the three-dimensional environment 702, the first computer system 101 changes the orientation of the virtual representation 704a in the three-dimensional environment 702 based on the movement of the second user 708b's current viewpoint.
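
By way of a non-limiting illustration, the threshold behavior described for Figure 7E can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the names `Pose`, `angle_diff_deg`, and `displayed_pose` are hypothetical, the 30 degree and 0.5 meter threshold values are assumed, and the sketch updates the displayed pose directly rather than animating it.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose:
    x: float        # location in the horizontal plane, in meters
    z: float
    yaw_deg: float  # heading, in degrees


def angle_diff_deg(a: float, b: float) -> float:
    """Smallest absolute difference between two headings, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def displayed_pose(kind: str, shown: Pose, anchor: Pose, viewpoint: Pose,
                   orientation_threshold_deg: float = 30.0,  # assumed value
                   distance_threshold_m: float = 0.5) -> Pose:  # assumed value
    """Return the pose at which the remote user's representation is shown.

    kind "continuous" stands in for a representation such as 704a, which
    follows the viewpoint regardless of the thresholds. Any other kind
    stands in for a representation such as 704b, whose orientation and
    location change only once the matching threshold is exceeded.
    `anchor` is the viewpoint pose around which the thresholds are centered.
    """
    if kind == "continuous":
        return Pose(viewpoint.x, viewpoint.z, viewpoint.yaw_deg)
    turned = angle_diff_deg(viewpoint.yaw_deg, anchor.yaw_deg) > orientation_threshold_deg
    moved = math.hypot(viewpoint.x - anchor.x, viewpoint.z - anchor.z) > distance_threshold_m
    return Pose(
        viewpoint.x if moved else shown.x,
        viewpoint.z if moved else shown.z,
        viewpoint.yaw_deg if turned else shown.yaw_deg,
    )
```

With the thresholds centered on the origin, a viewpoint that turns 10 degrees and moves 0.1 meters leaves the thresholded representation unchanged, while the continuous representation follows the viewpoint immediately.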

[0210] Figure 7F shows movement of the second user 708b's current viewpoint that exceeds the orientation threshold 722a relative to the three-dimensional environment 702. In some embodiments, the user interface shown in Figure 7F and described below is implemented on a head-mounted display that displays the three-dimensional environment 702 to the first user 708a (e.g., as an AR, VR, MR, XR, or AR environment). In some embodiments, the movement of the second user 708b's current viewpoint shown in Figure 7F is a continuation of the movement of the second user 708b's current viewpoint from Figure 7E. As shown in the overhead view 706, the movement causes the orientation 710b of the second user 708b's current viewpoint to fall outside the orientation threshold 722a. In accordance with the movement of the second user 708b's current viewpoint exceeding the orientation threshold 722a relative to the three-dimensional environment 702, the first computer system 101 (for example, in the first alternate view 730a) changes the orientation of the virtual representation 704b in the three-dimensional environment 702 (for example, relative to the first user 708a's current viewpoint). As shown in the second alternate view 730b, the first computer system 101 changes (for example, or optionally continues to change) the orientation of the virtual representation 704a in the three-dimensional environment 702 (for example, relative to the first user 708a's current viewpoint). In some embodiments, the first computer system 101 displays the continuous movement of the virtual representation 704a based on the movement of the second user 708b's current viewpoint, as shown in Figures 7E to 7F (for example, the first computer system 101 does not stop displaying the virtual representation 704a and / or does not stop displaying the movement of the virtual representation 704a).

[0211] As shown in the first alternate view 730a of the three-dimensional environment 702 in Figure 7F, the virtual representation 704b continues to be displayed with animation 714 as the virtual representation 704b moves in accordance with the movement of the second user 708b's current viewpoint relative to the three-dimensional environment 702. In some embodiments, the first computer system 101 stops displaying the virtual representation 704b with animation 714 while the virtual representation 704b moves in the three-dimensional environment 702 (for example, relative to the first user 708a's current viewpoint) based on the movement of the second user 708b's current viewpoint.

[0212] In Figure 7F, the indication 718 continues to be displayed along with the virtual representation 704b as the virtual representation 704b moves within the three-dimensional environment 702 (for example, relative to the current viewpoint of the first user 708a). As shown in the first alternate view 730a, the indication 718 is displayed within the three-dimensional environment 702 (for example, relative to the current viewpoint of the first user 708a) in the same orientation as shown in Figures 7B-7E. In some embodiments, the first computer system 101 maintains the orientation of the indication 718 relative to the three-dimensional environment 702 (for example, relative to the current viewpoint of the first user 708a) when displaying the movement of the virtual representation 704b (for example, changes in orientation and / or location relative to the current viewpoint of the first user 708a). In some embodiments, maintaining the orientation of the indication 718 when displaying the movement of the virtual representation 704b includes one or more properties that maintain the display of the indication corresponding to the identifier of a second user in the three-dimensional environment in a first orientation relative to the three-dimensional environment, in response to the receipt of the indication, as described with reference to Method 800.

[0213] Figure 7G shows the current viewpoint movement of the second user 708b relative to the three-dimensional environment 702. In some embodiments, the user interface shown in Figure 7G and described below is implemented on a head-mounted display that displays the three-dimensional environment 702 to the first user 708a (e.g., as an AR, VR, MR, XR, or AR environment). In some embodiments, the current viewpoint movement of the second user 708b is a continuous movement of the second user 708b's current viewpoint as shown in Figures 7E-7F. In some embodiments, the current viewpoint movement of the second user 708b continues to exceed the orientation threshold 722a (e.g., as shown in Figures 7E-7F). Thus, in the first alternate view 730a of the three-dimensional environment 702, the first computer system 101 changes (e.g., or optionally continues to change) the orientation of the virtual representation 704b in the three-dimensional environment 702 (e.g., from the first user 708a's current viewpoint). As shown in the first alternate view 730a of the three-dimensional environment 702, the second surface 732b of the virtual representation 704b is shown (for example, because the orientation 710b of the second user 708b's current viewpoint is oriented away from the first user 708a's current viewpoint). In some embodiments, the second surface 732b is a surface of the virtual representation 704b that includes an orientation opposite to that of the first surface 732a (for example, as shown in Figures 7B to 7E) and does not include the identifier of the second user 708b shown on the first surface 732a (for example, as described with reference to Figure 7B). 
In the second alternate view 730b of the three-dimensional environment 702, the first computer system 101 changes the orientation of the virtual representation 704a (for example, or optionally continues to change) based on the change in the orientation of the second user 708b's current viewpoint relative to the three-dimensional environment 702 (for example, and relative to the first user 708a's current viewpoint). In some embodiments, the first computer system 101 displays a continuous movement of the virtual representation 704a based on the movement of the current viewpoint of the second user 708b shown in Figures 7E to 7G (for example, the first computer system 101 does not stop displaying the virtual representation 704a and / or does not stop displaying the movement of the virtual representation 704a).

[0214] As shown in the overhead view 706 of Figure 7G, the movement of the second user 708b's current viewpoint relative to the three-dimensional environment 702 includes a change in the location of the second user 708b's current viewpoint relative to the three-dimensional environment 702 (compared to, for example, the location of the second user 708b's current viewpoint shown in the overhead views 706 of Figures 7A to 7F). The overhead view 706 shows that the change in the location of the second user 708b's current viewpoint does not exceed the distance threshold 722b. Therefore, in the first alternate view 730a of the three-dimensional environment 702, the first computer system 101 does not change the location of the virtual representation 704b relative to the three-dimensional environment 702 (for example, relative to the first user 708a's current viewpoint) based on the change in the location of the second user 708b's current viewpoint. In the overhead view 706, the virtual representation 704b is shown in a location within the three-dimensional environment 702 (e.g., and orientation represented by an arrow shown adjacent to the virtual representation 704b in the overhead view 706) to represent the difference in location of the virtual representation 704b compared to the current viewpoint of the second user 708b within the three-dimensional environment 702. In the second alternate view 730b of the three-dimensional environment 702, the first computer system 101 changes the location of the virtual representation 704a relative to the three-dimensional environment 702 (e.g., relative to the current viewpoint of the first user 708a) based on the change in the location of the current viewpoint of the second user 708b.

[0215] Figure 7H shows the current viewpoint movement of the second user 708b beyond the distance threshold 722b relative to the three-dimensional environment 702. In some embodiments, the user interface shown in Figure 7H and described below is implemented on a head-mounted display that displays the three-dimensional environment 702 to the first user 708a (e.g., as an AR, VR, MR, XR, or AR environment). In some embodiments, the current viewpoint movement of the second user 708b shown in Figure 7H is a continuation of the current viewpoint movement of the second user 708b shown in Figures 7E to 7G. As shown in the overhead view 706 of Figure 7H, the current viewpoint movement of the second user 708b means that the location of the second user 708b's current viewpoint is no longer within the distance threshold 722b. Therefore, in a first alternative view 730a of the three-dimensional environment 702, the first computer system 101 displays an animation 740 of the movement of the virtual representation 704b based on the movement of the second user 708b's current viewpoint relative to the three-dimensional environment 702 (for example, relative to the current viewpoint of the first user 708a). In some embodiments, displaying the virtual representation 704b having the animation 740 includes one or more properties of displaying a first virtual object having an animation corresponding to the movement of the first virtual object from a first pose to a second pose based on the movement of the second user's current viewpoint, as described with reference to Method 800. 
As shown in the first alternate view 730a of the three-dimensional environment 702 in Figure 7H, the animation 740 includes movement corresponding to the movement of the second user 708b's current viewpoint (represented, for example, by arrows displayed on both sides of the virtual representation 704b) and simultaneously changes the visual prominence of the virtual representation 704b relative to the three-dimensional environment 702 (represented, for example, by a change in the visual appearance of the virtual representation 704b). For example, displaying the animation 740 includes increasing the transparency of the virtual representation 704b relative to the three-dimensional environment 702 (for example, such that the virtual representation 704b appears to gradually disappear from the first user 708a's current viewpoint). As shown in the second alternate view 730b of the three-dimensional environment 702, the first computer system 101 changes the location of the virtual representation 704a based on the movement of the second user 708b's current viewpoint. In some embodiments, the first computer system 101 displays the continuous movement of the virtual representation 704a based on the movement of the current viewpoint of the second user 708b as shown in Figures 7E to 7H (for example, the first computer system 101 does not stop displaying the virtual representation 704a and / or does not stop displaying the movement of the virtual representation 704a).

[0216] As shown in Figure 7H, while the first computer system 101 is displaying animation 740 (for example, including the movement of the virtual representation 704b to a greater distance from the current viewpoint of the first user 708a), the first computer system 101 maintains the size of the virtual representation 704b relative to the three-dimensional environment 702 (for example, the display size of the virtual representation 704b decreases as the virtual representation 704b moves to a greater distance from the current viewpoint of the first user 708a). While displaying animation 740, the first computer system 101 maintains the display size of the indication 718 relative to the current viewpoint of the first user 708a (for example, the size of the indication 718 is changed by the first computer system 101 relative to the three-dimensional environment 702 so that the indication 718 is displayed at a consistent display size while the movement of the virtual representation 704b is being displayed).
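
As a hedged illustration of the size behavior described for Figure 7H, the following Python sketch contrasts an object that keeps its size relative to the environment with an object that keeps its display size relative to the viewer. The function names and the use of angular size as a stand-in for "display size" are assumptions made for this example only.

```python
import math


def angular_size_rad(world_size_m: float, distance_m: float) -> float:
    """Angle subtended at the viewpoint by an object of the given size."""
    return 2.0 * math.atan(world_size_m / (2.0 * distance_m))


def indication_world_size(reference_size_m: float,
                          reference_distance_m: float,
                          distance_m: float) -> float:
    """Size relative to the environment that keeps the display size constant.

    Scaling the size linearly with distance keeps the ratio size / distance
    fixed, so the subtended angle does not change as the object moves away.
    """
    return reference_size_m * (distance_m / reference_distance_m)
```

Under these assumptions, a representation with a fixed size of 0.5 meters subtends a smaller angle at 3 meters than at 1 meter, whereas an indication that is 0.1 meters wide at 1 meter would be resized to about 0.3 meters at 3 meters and would subtend the same angle at both distances.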

[0217] Figure 7I shows a further shift in the current viewpoint location of the second user 708b relative to the three-dimensional environment 702. In some embodiments, the user interface shown in Figure 7I and described below is implemented on a head-mounted display that displays the three-dimensional environment 702 to the first user 708a (e.g., as an AR, VR, MR, XR, or AR environment). In some embodiments, the shift in the current viewpoint of the second user 708b relative to the three-dimensional environment 702 shown in Figure 7I is a continuation of the shift in the current viewpoint of the second user 708b shown in Figures 7E-7H. In some embodiments, displaying a first representation of the movement of the virtual representation 704b includes discontinuing the display of the virtual representation 704b in the three-dimensional environment 702 and redisplaying the virtual representation 704b in the three-dimensional environment 702 based on the movement of the current viewpoint of a second user 708b (including, for example, one or more properties of discontinuing the display of the first virtual object in the three-dimensional environment before the first virtual object reaches a second pose, and then redisplaying the first virtual object in the three-dimensional environment, as described with reference to Method 800). As shown in the first alternate view 730a of the three-dimensional environment 702, the first computer system 101 discontinues the display of the virtual representation 704b in the three-dimensional environment 702 during the movement of the virtual representation 704b (for example, from a first pose to a second pose, as described with reference to Method 800), based on the movement of the current viewpoint of a second user 708b. 
As shown in the second alternative view 730b of the three-dimensional environment 702, the first computer system 101 maintains the display of the virtual representation 704a and changes the location of the virtual representation 704a based on the movement of the second user 708b's current viewpoint relative to the three-dimensional environment 702 (e.g., relative to the first user 708a's current viewpoint) (e.g., a change in location). In some embodiments, the first computer system 101 displays the continuous movement of the virtual representation 704a based on the movement of the second user 708b's current viewpoint as shown in Figures 7E to 7I (e.g., the first computer system 101 does not stop displaying the virtual representation 704a and / or stop displaying the movement of the virtual representation 704a).

[0218] Figure 7J shows a further shift in the current viewpoint of the second user 708b, including changes in location and orientation relative to the three-dimensional environment 702 (for example, as shown in the overhead view 706). In some embodiments, the user interface shown in Figure 7J and described below is implemented on a head-mounted display that displays the three-dimensional environment 702 to the first user 708a (for example, as an AR, VR, MR, XR, or AR environment). In some embodiments, the shift in the current viewpoint of the second user 708b relative to the three-dimensional environment 702 shown in Figure 7J is a continuation of the shift in the current viewpoint of the second user 708b shown in Figures 7E to 7I. In some embodiments, displaying a first representation of the movement of the virtual representation 704b includes displaying the virtual representation 704b in one or more intermediate poses (including, for example, location and / or orientation relative to the three-dimensional environment 702) during the movement of the virtual representation 704b (from a first pose to a second pose, as described with reference to Method 800). For example, the first computer system 101 displays the virtual representation 704b in intermediate poses within the three-dimensional environment 702 in accordance with the movement of the second user 708b's current viewpoint beyond a threshold distance relative to the three-dimensional environment 702 (e.g., 0.1, 0.2, 0.5, 1, 2, 5, or 10 meters). 
The first alternate view 730a of the three-dimensional environment 702 shown in Figure 7J shows a virtual representation 704b displayed in an intermediate pose (e.g., location and orientation relative to the three-dimensional environment 702) corresponding to the orientation of the second user 708b's current viewpoint as the second user 708b moves its current viewpoint relative to the three-dimensional environment 702 (e.g., the intermediate pose corresponds to the location and orientation of the second user 708b's current viewpoint as shown in the overhead view 706). In some embodiments, displaying the virtual representation 704b in an intermediate pose includes one or more properties that display the first virtual object in one or more intermediate poses in the three-dimensional environment between the first and second poses as the movement of the second user's current viewpoint from the first viewpoint to the second viewpoint exceeds a threshold distance relative to the three-dimensional environment, as described with reference to Method 800. As shown in the second alternate view 730b of the three-dimensional environment 702, the first computer system 101 maintains the display of the virtual representation 704a and changes the location of the virtual representation 704a based on the movement of the second user 708b's current viewpoint relative to the three-dimensional environment 702 (e.g., a change in location and orientation). In some embodiments, the first computer system 101 displays the continuous movement of the virtual representation 704a based on the movement of the second user 708b's current viewpoint as shown in Figures 7E to 7J (e.g., the first computer system 101 does not stop displaying the virtual representation 704a and / or stop displaying the movement of the virtual representation 704a).

[0219] In Figure 7J, the second user 708b provides an audio input to the second computer system. In some embodiments, the second computer system sends an indication corresponding to the audio input provided by the second user 708b to the first computer system 101. As shown in Figure 7J, in response to the audio input (represented in the overhead view 706 as "x" shown adjacent to the second user 708b), the first computer system 101 displays the virtual representation 704b with an animation 720 (as shown in the first alternate view 730a of the three-dimensional environment 702, for example). In some embodiments, upon receiving the indication from the second computer system corresponding to the audio input provided by the second user 708b, the first computer system 101 provides the first user 708a with an audio output spatialized to the current pose (e.g., location and / or orientation) of the virtual representation 704b in the three-dimensional environment 702 (including, for example, one or more properties of providing an audio output corresponding to an audio input received by the second computer system, spatialized to a first pose (e.g., or second pose) of the first virtual object in the three-dimensional environment, as described with reference to Method 800). For example, in accordance with the virtual representation 704b being displayed in the three-dimensional environment 702 at a location different from that of the second user 708b's current viewpoint (e.g., as shown in Figures 7G-7H), the audio output is spatialized to the location of the virtual representation 704b in the three-dimensional environment 702 (e.g., not to the location of the second user 708b's current viewpoint relative to the three-dimensional environment 702).
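
The choice of audio source location described above can be illustrated with a short Python sketch. It is a simplified example under assumptions: the function name `azimuth_deg` and the coordinate convention (heading 0 along +z, positive angles toward +x) are hypothetical, and a real spatial audio renderer would take many more factors into account.

```python
import math


def azimuth_deg(listener_xz, listener_yaw_deg, source_xz):
    """Direction of a sound source relative to the listener's heading.

    Returns degrees in the range [-180, 180); 0 means straight ahead and
    positive values are toward the listener's +x side.
    """
    dx = source_xz[0] - listener_xz[0]
    dz = source_xz[1] - listener_xz[1]
    bearing = math.degrees(math.atan2(dx, dz))
    return (bearing - listener_yaw_deg + 180.0) % 360.0 - 180.0


# Hypothetical positions: the representation is displayed at one location
# while the speaking user's actual viewpoint is at another.
representation_xz = (1.0, 1.0)
speaker_viewpoint_xz = (-1.0, 1.0)

# The audio direction is computed from the displayed representation,
# not from the speaking user's viewpoint.
direction = azimuth_deg((0.0, 0.0), 0.0, representation_xz)
```

In this example the listener hears the audio from the side on which the representation is displayed, even though the speaking user's viewpoint lies on the opposite side.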

[0220] Figure 7K shows a further shift in the current viewpoint of the second user 708b, including changes in location and orientation relative to the three-dimensional environment 702. In some embodiments, the shift in the current viewpoint of the second user 708b relative to the three-dimensional environment 702 shown in Figure 7K is a continuation of the shift in the current viewpoint of the second user 708b shown in Figures 7E to 7J. In some embodiments, the user interface shown in Figure 7K and described below is implemented on a head-mounted display that displays the three-dimensional environment 702 to the first user 708a (e.g., as an AR, VR, MR, XR, or AR environment). As shown in Figure 7K (for example, in the first alternate view 730a of the three-dimensional environment 702), the first computer system 101 discontinues displaying the virtual representation 704b in the three-dimensional environment 702 while the virtual representation 704b is moving, based on the movement of the second user 708b's current viewpoint relative to the three-dimensional environment 702 (for example, the continued movement of the second user 708b's current viewpoint relative to the three-dimensional environment 702 causes the first computer system 101 to discontinue displaying the virtual representation 704b in accordance with displaying the first representation of the movement (for example, as described with reference to Figure 7I)). As shown in the second alternate view 730b of the three-dimensional environment 702, the first computer system 101 maintains the display of the virtual representation 704a and changes the location and orientation of the virtual representation 704a in the three-dimensional environment 702 based on the movement of the second user 708b's current viewpoint relative to the three-dimensional environment 702 (for example, relative to the first user 708a's current viewpoint). 
In some embodiments, the first computer system 101 displays the continuous movement of the virtual representation 704a based on the movement of the current viewpoint of the second user 708b, as shown in Figures 7E to 7K (for example, the first computer system 101 does not stop displaying the virtual representation 704a and / or does not stop displaying the movement of the virtual representation 704a).

[0221] In some embodiments, in Figure 7K, the movement of the second user 708b's current viewpoint relative to the three-dimensional environment 702 settles (for example, the second user 708b stops moving relative to the three-dimensional environment 702 (e.g., stops changing location and / or orientation) for a threshold period (e.g., 0.1, 0.5, 1, 2, 5, or 10 seconds) (e.g., optionally corresponding to less than a threshold movement amount (e.g., a movement distance and / or change in orientation))). In some embodiments, the current viewpoint of the second user 708b settles into an updated pose relative to the three-dimensional environment 702 (e.g., including one or more characteristics of a second pose as described with reference to Method 800). In some embodiments, detecting that the movement of the second user 708b's current viewpoint has settled relative to the three-dimensional environment 702 includes one or more characteristics of detecting an event that involves movement of the second user's current viewpoint of less than a threshold movement amount for longer than a time threshold, as described with reference to Method 800.
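
One possible way to detect that a viewpoint has settled, offered only as a sketch, is shown below in Python. The class name `SettleDetector`, its method, and the default values (0.05 meters and 1 second) are assumptions for illustration; the sketch tracks location only and omits changes in orientation for brevity.

```python
import math


class SettleDetector:
    """Reports when movement stays under a threshold for long enough."""

    def __init__(self, movement_threshold_m: float = 0.05,
                 time_threshold_s: float = 1.0):
        self.movement_threshold_m = movement_threshold_m
        self.time_threshold_s = time_threshold_s
        self._reference = None  # location at which the quiet period began
        self._since = None      # timestamp at which the quiet period began

    def update(self, t: float, position) -> bool:
        """Feed one sample (time in seconds, (x, z) location).

        Returns True once the location has stayed within the movement
        threshold of the reference location for longer than the time
        threshold. Any larger movement restarts the quiet period.
        """
        if (self._reference is None
                or math.dist(position, self._reference) >= self.movement_threshold_m):
            self._reference = position
            self._since = t
            return False
        return (t - self._since) > self.time_threshold_s
```

A caller would feed the detector one sample per frame or per received pose update and treat the first `True` result as the event at which the updated pose is adopted.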

[0222] Figure 7K1 shows similar and / or the same concepts as those shown in Figure 7K (with many of the same reference numerals). Unless otherwise noted below, it should be understood that elements shown in Figure 7K1 that have the same reference numerals as elements shown in Figures 7A to 7N have one or more or all of the same characteristics. Figure 7K1 includes a computer system 101 which includes (or is the same as) a display generation component 120. In some embodiments, the computer system 101 and the display generation component 120 each have one or more characteristics of the computer system 101 shown in Figures 7A to 7N and the display generation component 120 shown in Figures 1 and 3, and in some embodiments, the computer system 101 and the display generation component 120 shown in Figures 7A to 7N have one or more characteristics of the computer system 101 and the display generation component 120 shown in Figure 7K1.

[0223] In Figure 7K1, the display generation component 120 includes one or more internal image sensors 314a (e.g., the eye-tracking camera 540 described with reference to Figure 5) oriented toward the user's face. In some embodiments, the internal image sensors 314a are used for eye tracking (e.g., detecting the user's gaze). The internal image sensors 314a are optionally disposed on the left and right portions of the display generation component 120 to enable eye tracking of the user's left and right eyes. The display generation component 120 also includes external image sensors 314b and 314c facing outward from the user to detect and / or capture the physical environment and / or the movement of the user's hand. In some embodiments, the image sensors 314a, 314b, and 314c have one or more of the characteristics of the image sensor 314 described with reference to Figures 7A to 7N.

[0224] In Figure 7K1, the display generation component 120 is shown as being configured to display content that optionally corresponds to the content described as being displayed and / or visible via the display generation component 120 with reference to Figures 7A to 7N. In some embodiments, the content is displayed by a single display (e.g., display 510 of Figure 5) included in the display generation component 120. In some embodiments, the display generation component 120 includes two or more displays (e.g., left and right display panels for the user's left and right eyes, as described with reference to Figure 5) having displayed outputs that are merged (e.g., by the user's brain) to create a view of the content shown in Figure 7K1.

[0225] The display generation component 120 has a field of view corresponding to the content shown in Figure 7K1 (e.g., a field of view captured by the external image sensors 314b and 314c and / or visible to the user via the display generation component 120). Since the display generation component 120 is optionally a head-mounted device, the field of view of the display generation component 120 is optionally the same as or similar to the user's field of view.

[0226] In some embodiments, the computer system 101 responds to user input as described with reference to Figures 7A to 7N. It should be understood that one or more or all aspects of the present disclosure illustrated or described with reference to Figures 7A to 7N and / or described with reference to the corresponding method(s) are optionally implemented on the computer system 101 and the display generation component 120 in a manner similar or analogous to that shown in Figure 7K1.

[0227] Figure 7L shows the first computer system 101 redisplaying the virtual representation 704b in accordance with the movement of the second user 708b's current viewpoint settling relative to the three-dimensional environment 702 in Figure 7K. In some embodiments, the user interface shown in Figure 7L and described below is implemented on a head-mounted display that displays the three-dimensional environment 702 to the first user 708a (e.g., as an AR, VR, MR, XR, or AR environment). As shown in the first alternate view 730a of the three-dimensional environment 702, the first computer system 101 redisplays the virtual representation 704b within the three-dimensional environment 702 (e.g., the first computer system 101 gradually increases the opacity of the virtual representation 704b relative to the three-dimensional environment 702 (e.g., represented by the visual appearance of the virtual representation 704b in Figure 7L)). In some embodiments, while increasing the visual prominence of the virtual representation 704b, the first computer system 101 displays an animation 740 (as shown and described with reference to, for example, Figure 7H). For example, the first computer system 101 displays the movement of the virtual representation 704b toward a location in the three-dimensional environment 702 that corresponds to the updated pose of the second user 708b's current viewpoint relative to the three-dimensional environment 702. 
As shown in the overhead view 706, the virtual representation 704b is redisplayed at a location in the three-dimensional environment 702 that does not correspond to the updated pose of the second user 708b's current viewpoint (for example, because the first representation of the movement of the virtual representation 704b includes redisplaying the virtual representation 704b with an animation corresponding to the movement of the virtual representation 704b toward a location in the three-dimensional environment 702 that corresponds to the updated pose of the second user 708b's current viewpoint). In some embodiments, redisplaying the virtual representation 704b in the three-dimensional environment 702 includes one or more properties of displaying the movement of the first virtual object toward a second pose corresponding to the movement of the user's current viewpoint toward a second viewpoint, as described with reference to Method 800. As shown in the second alternate view 730b of the three-dimensional environment 702, the first computer system 101 maintains the display of the virtual representation 704a at the location and orientation within the three-dimensional environment 702 shown in Figure 7K (e.g., because, with the second representation of the movement of the virtual representation 704a displayed, the pose of the virtual representation 704a within the three-dimensional environment 702 shown in Figure 7L currently reflects the updated pose of the second user 708b's current viewpoint relative to the three-dimensional environment 702).
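
The hide and redisplay sequence described for Figures 7H to 7L can be sketched as a small state machine. The following Python example is illustrative only: the phase names, the `step` function, and the 0.5 second fade duration are assumptions, and the sketch omits the intermediate poses described with reference to Figure 7J as well as the accompanying movement animation.

```python
from enum import Enum


class Phase(Enum):
    VISIBLE = 1
    FADING_OUT = 2
    HIDDEN = 3
    FADING_IN = 4


def step(phase: Phase, opacity: float, exceeded: bool, settled: bool,
         dt: float, fade_s: float = 0.5):
    """Advance the fade state by dt seconds and return (phase, opacity).

    exceeded: the remote viewpoint has moved beyond a movement threshold.
    settled:  the remote viewpoint has settled into an updated pose.
    """
    # Start fading out when the threshold is exceeded, and start fading
    # back in once the viewpoint has settled.
    if phase is Phase.VISIBLE and exceeded:
        phase = Phase.FADING_OUT
    elif phase is Phase.HIDDEN and settled:
        phase = Phase.FADING_IN

    if phase is Phase.FADING_OUT:
        opacity = max(0.0, opacity - dt / fade_s)
        if opacity == 0.0:
            phase = Phase.HIDDEN
    elif phase is Phase.FADING_IN:
        opacity = min(1.0, opacity + dt / fade_s)
        if opacity == 1.0:
            phase = Phase.VISIBLE
    return phase, opacity
```

Calling `step` once per frame with the current threshold and settle signals yields a gradual decrease in opacity while the remote viewpoint is in motion and a gradual increase once it has settled.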

[0228] Figure 7L1 shows concepts similar to and / or the same as those shown in Figure 7L (with many of the same reference numerals). Unless otherwise indicated below, elements shown in Figure 7L1 having the same reference numerals as the elements shown in Figures 7A to 7N are to be understood to have one or more or all of the same characteristics. Figure 7L1 includes a computer system 101 which includes (or is the same as) a display generation component 120. In some embodiments, the computer system 101 and the display generation component 120 each have one or more of the characteristics of the computer system 101 shown in Figures 7A to 7N and the display generation component 120 shown in Figures 1 and 3, and in some embodiments, the computer system 101 and the display generation component 120 shown in Figures 7A to 7N have one or more of the characteristics of the computer system 101 and the display generation component 120 shown in Figure 7L1.

[0229] In Figure 7L1, the display generation component 120 includes one or more internal image sensors 314a (e.g., the eye-tracking camera 540 described with reference to Figure 5) oriented toward the user's face. In some embodiments, the internal image sensors 314a are used for eye tracking (e.g., detecting the user's gaze). The internal image sensors 314a are optionally positioned on the left and right portions of the display generation component 120 to enable eye tracking of the user's left and right eyes. The display generation component 120 also includes external image sensors 314b and 314c facing outward from the user to detect and / or capture the physical environment and / or the user's hand movements. In some embodiments, the image sensors 314a, 314b, and 314c have one or more of the characteristics of the image sensor 314 described with reference to Figures 7A to 7N.

[0230] In Figure 7L1, the display generation component 120 is shown as displaying content that optionally corresponds to the content described as being displayed and / or visible via the display generation component 120 with reference to Figures 7A to 7N. In some embodiments, the content is displayed by a single display included in the display generation component 120 (e.g., display 510 in Figure 5). In some embodiments, the display generation component 120 includes two or more displays (e.g., left and right display panels for the user's left and right eyes, respectively, as described with reference to Figure 5) having displayed outputs that are merged (e.g., by the user's brain) to create a view of the content shown in Figure 7L1.

[0231] The display generation component 120 has a field of view corresponding to the content shown in Figure 7L1 (for example, a field of view captured by external image sensors 314b and 314c and / or visible to the user via the display generation component 120). Since the display generation component 120 is optionally a head-mounted device, the field of view of the display generation component 120 is optionally the same as or similar to the user's field of view.

[0232] In some embodiments, the computer system 101 responds to user input as described with reference to Figures 7A to 7N. It should be understood that one or more or all aspects of the present disclosure illustrated or described with reference to Figures 7A to 7N and / or described with reference to the corresponding method(s) may optionally be implemented on the computer system 101 and the display generation component 120 in a manner similar to or the same as that shown in Figure 7L1.

[0233] Figure 7M shows a virtual representation 704b displayed in an updated pose within the three-dimensional environment 702, corresponding to the updated pose of the second user 708b's current viewpoint (for example, within a first alternative view 730a of the three-dimensional environment 702). In some embodiments, the user interface shown in Figure 7M and described below is implemented on a head-mounted display that displays the three-dimensional environment 702 to the first user 708a (for example, as an AR, VR, MR, or XR environment). In some embodiments, in accordance with the movement of the second user 708b's current viewpoint relative to the three-dimensional environment 702, the first computer system 101 updates a threshold movement amount (for example, for displaying a first representation of the movement of the virtual representation 704b) relative to the updated pose of the second user 708b's current viewpoint. Therefore, in the overhead view 706 of Figure 7M, the orientation threshold 722a and the distance threshold 722b are shown relative to the updated orientation of the second user 708b's current viewpoint relative to the three-dimensional environment 702. In some embodiments, the second user 708b is represented by a virtual representation 704b in the three-dimensional environment 702, and as the second user 708b begins to move the current viewpoint relative to the three-dimensional environment 702 beyond the orientation threshold 722a and / or distance threshold 722b shown in the overhead view 706, the first computer system 101 displays a first representation of the movement of the virtual representation 704b in accordance with the movement of the second user 708b's current viewpoint.
As shown in the second alternative view 730b of the three-dimensional environment 702, the first computer system 101 maintains the display of the virtual representation 704a in the location and orientation within the three-dimensional environment 702 shown in Figures 7K to 7L (for example, because the orientation of the virtual representation 704a in the three-dimensional environment 702 shown in Figures 7K to 7L already reflects the updated orientation of the second user 708b's current viewpoint relative to the three-dimensional environment 702, as the orientation of the virtual representation 704a in the three-dimensional environment 702 shown in Figures 7K to 7L displays the second representation of the movement of the virtual representation 704a).
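The threshold behavior described in paragraph [0233] can be illustrated with a minimal sketch. It assumes a simplified overhead-view pose (two position coordinates plus a heading), and the names `Pose`, `ThresholdTracker`, and the numeric limits are hypothetical; the source does not specify threshold values or any implementation. The sketch shows an orientation threshold and a distance threshold that are re-centered on the updated viewpoint each time either one is exceeded.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    """Overhead-view pose: position in metres, heading (yaw) in degrees."""
    x: float
    z: float
    yaw_deg: float


def yaw_delta_deg(a: float, b: float) -> float:
    """Smallest absolute angle between two headings, in [0, 180]."""
    d = (b - a) % 360.0
    return min(d, 360.0 - d)


class ThresholdTracker:
    """Tracks a remote viewpoint against thresholds centred on an anchor pose."""

    def __init__(self, anchor: Pose, max_yaw_deg: float = 30.0, max_dist_m: float = 0.5):
        # Both limits are illustrative values, not taken from the source.
        self.anchor = anchor
        self.max_yaw_deg = max_yaw_deg
        self.max_dist_m = max_dist_m

    def update(self, current: Pose) -> bool:
        """Return True when the move exceeds either threshold.

        On True the anchor is moved to the new pose, so later moves are
        measured from the updated viewpoint.
        """
        moved = math.hypot(current.x - self.anchor.x, current.z - self.anchor.z)
        turned = yaw_delta_deg(self.anchor.yaw_deg, current.yaw_deg)
        if turned > self.max_yaw_deg or moved > self.max_dist_m:
            self.anchor = current
            return True
        return False
```

In this sketch a return value of `True` stands in for the point at which a first representation of the movement would be displayed; smaller moves leave the anchor, and therefore the thresholds, unchanged.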

[0234] In some embodiments, displaying the virtual representation 704b in an updated orientation relative to the three-dimensional environment 702 of Figure 7M includes displaying the animation 714 (as shown and described with reference to, for example, Figure 7B). In some embodiments, displaying the virtual representation 704b in an updated orientation relative to the three-dimensional environment 702 of Figure 7M includes displaying the indication 718 with the same orientation (compared to, for example, those shown in Figures 7B-7H) and the same display size (compared to, for example, those shown in Figures 7B-7H) relative to the current viewpoint of the first user 708a.

[0235] Figure 7N shows a first individual virtual representation of a second user of a second computer system communicating with a first computer system 101, and a second individual virtual representation of a third user of a third computer system communicating with the first computer system 101. In some embodiments, the user interface shown in Figure 7N and described below is implemented on a head-mounted display that displays a three-dimensional environment 702 (e.g., as an AR, VR, MR, or XR environment) to the first user 708a. In some embodiments, virtual representations 724a (e.g., shown in the second alternate view 730b) and 724b (e.g., shown in the first alternate view 730a) are the respective virtual representations of the second user. In some embodiments, virtual representations 726a (e.g., shown in the first alternate view 730a) and 726b (e.g., shown in the second alternate view 730b) are the respective virtual representations of the third user. In some embodiments, virtual representations 724a and 726a have one or more characteristics of the virtual representation 704a shown in Figures 7A to 7M and described above. In some embodiments, virtual representations 724b and 726b have one or more characteristics of the virtual representation 704b shown in Figures 7B to 7M and described above.

[0236] Figure 7O shows a first computer system 101 that changes the display of a third user's individual virtual representation (for example, in both the first alternate view 730a and the second alternate view 730b of the three-dimensional environment 702) in response to the fulfillment of virtual representation change criteria. In some embodiments, the user interface shown in Figure 7O and described below is implemented on a head-mounted display that displays the three-dimensional environment 702 to the first user 708a (for example, as an AR, VR, MR, or XR environment). In some embodiments, the virtual representation change criteria have one or more characteristics of the virtual representation change criteria described with reference to Figure 7B. In particular, in the first alternate view 730a of the three-dimensional environment 702, the first computer system 101 changes the display of the third user's individual virtual representation from virtual representation 726a to virtual representation 726b (for example, as shown in Figure 7N). In the second alternative view 730b of the three-dimensional environment 702, the first computer system 101 changes the display of the third user's individual virtual representation from virtual representation 726b to virtual representation 726a (for example, as shown in Figure 7N).

[0237] As shown in Figure 7O (for example, in the first alternate view 730a), the virtual representations 724b and 726b are displayed in the three-dimensional environment 702 with different visual characteristics (including, for example, color, brightness, and / or saturation), as represented by the difference in appearance between the virtual representation 724b in the first alternate view 730a of the three-dimensional environment 702 and the virtual representation 726b in the second alternate view 730b of the three-dimensional environment 702. In some embodiments, displaying the virtual representations 724b and 726b with different visual characteristics includes one or more characteristics of displaying a first virtual object with a distinct visual characteristic having a first value, and displaying a second virtual object with a distinct visual characteristic having a second value different from the first value, as described with reference to Method 800. In Figure 7O, the indications 718 displayed with the virtual representations 724b and 726b respectively include different identifiers (for example, corresponding to different names of a second user and a third user). In Figure 7O, the first surfaces 732a of virtual representations 724b and 726b each contain different identifiers (for example, optionally corresponding to different initials of a second user and a third user).

[0238] As shown in Figure 7O (for example, in the second alternative view 730b), virtual representations 724a and 726a correspond to avatars displayed in the three-dimensional environment 702 with different visual characteristics (for example, virtual representation 726a is displayed with a hat, and virtual representation 724a is not displayed with a hat). In some embodiments, virtual representation 726a is an avatar having one or more visual features that can be customized by a third user (for example, virtual representation 726a has features that correspond to the physical characteristics of the third user (for example, a preferred visual appearance of virtual representation 726a (for example, associated with the user profile of the third user and stored in the memory of the third computer system))). In some embodiments, the virtual representation 724a is an avatar having one or more visual features that can be customized by a second user (for example, the virtual representation 724a has features corresponding to the physical characteristics of the second user (for example, a preferred visual appearance of the virtual representation 724a (for example, associated with the second user's user profile and stored in the memory of the second computer system))).

[0239] Figure 7P shows a first computer system 101 displaying a virtual representation 740 of user 744b in a three-dimensional environment 702, in a pose independent of user 744b's current viewpoint. In some embodiments, the virtual representation 740 corresponds to a placeholder representation that the first computer system 101 displays in the three-dimensional environment 702 according to the fulfillment of one or more criteria (e.g., the fulfillment of a user status change criterion, as shown in Figure 7P). For example, one or more criteria include a criterion that a second computer system (e.g., the second computer system is communicating with the first computer system 101 in a communication session) can no longer detect, and / or does not expect to detect, a shift in user 744b's current viewpoint relative to the three-dimensional environment 702 (e.g., relative to a second three-dimensional environment that is visible to user 744b and corresponds to the three-dimensional environment 702 displayed by the second computer system). For example, the second computer system no longer tracks one or more parts of user 744b (for example, corresponding to one or more physical parts of user 744b's body (e.g., head, eyes, hands, arms, and / or torso)) (for example, due to one or more errors in one or more input devices of the second computer system). For example, the second computer system loses network connectivity and does not communicate the current orientation of user 744b's current viewpoint to the first computer system 101 (for example, through indications as described with reference to methods 800 and / or 900). In some embodiments, the virtual representation 740 has one or more characteristics of a third type of virtual representation, as described with reference to method 900. 
In some embodiments, if one or more criteria are no longer met (for example, due to the first computer system 101 receiving an indication from a second computer system corresponding to the user 744b's current viewpoint orientation (for example, corresponding to the user 744b's current viewpoint movement)), the first computer system 101 ceases to display the virtual representation 740 and displays a different virtual representation (for example, virtual representations 704a and / or 704b). For example, based on an indication received from the second computer system, the first computer system 101 displays one or more representations of the movement of a virtual representation different from the virtual representation 740, corresponding to the movement of the user 744b's current viewpoint.
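The switch between the placeholder representation and a pose-driven representation described in paragraph [0239] can be sketched as a small selection function. This is an illustration under stated assumptions, not the claimed method: the names `RemoteParticipantState`, `choose_representation`, the two-second staleness limit, and the status strings are all hypothetical. The returned status text stands in for the information shown in the indication 742.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Tuple


class RepresentationKind(Enum):
    POSE_DRIVEN = auto()   # follows the remote viewpoint (e.g. 704a / 704b)
    PLACEHOLDER = auto()   # independent of the remote viewpoint (e.g. 740)


@dataclass(frozen=True)
class RemoteParticipantState:
    """What the local system knows about a remote participant (hypothetical)."""
    tracking_available: bool        # remote system can track head, hands, etc.
    seconds_since_last_pose: float  # time since the last pose indication arrived


def choose_representation(
    state: RemoteParticipantState,
    stale_after_s: float = 2.0,  # illustrative timeout, not from the source
) -> Tuple[RepresentationKind, Optional[str]]:
    """Pick a representation kind and an optional status message."""
    if not state.tracking_available:
        return RepresentationKind.PLACEHOLDER, "Tracking unavailable"
    if state.seconds_since_last_pose > stale_after_s:
        return RepresentationKind.PLACEHOLDER, "Poor connection"
    # Criteria no longer met: return to a representation that follows the pose.
    return RepresentationKind.POSE_DRIVEN, None
```

Calling the function again whenever a new indication arrives (or fails to arrive) reproduces the described behavior of ceasing to display the placeholder once pose indications resume.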

[0240] As shown in the overhead view 706 of Figure 7P, user 744a (for example, a user associated with the first computer system 101) has a direct view of the virtual representation 740 of user 744b within their field of view (for example, the fields of view of users 744a, 744b, and 744c are represented by arrows 748a, 748b, and 748c, respectively). Furthermore, as shown in the overhead view 706, a third user 744c (for example, associated with a third computer system communicating with the first computer system 101 and the second computer system) is represented in the three-dimensional environment 702 by a virtual representation (for example, the virtual representation 750 shown and described with reference to Figure 7S). In some embodiments, the virtual representation of user 744c is not of the same type as the virtual representation 740 (for example, the virtual representation of user 744c is displayed as the same type of virtual representation as the virtual representations 704a and / or 704b shown and described with reference to Figures 7A to 7M). In Figure 7P, based on the location of user 744c's current viewpoint relative to user 744a's current viewpoint in the three-dimensional environment 702, user 744c's virtual representation is not visible within user 744a's field of view in the three-dimensional environment 702.

[0241] In Figure 7P, the virtual representation 740 is displayed together with an indication 718. In some embodiments, the indication 718 includes one or more characteristics of the indication 718 shown and described with reference to Figures 7A to 7O. As shown in Figure 7P, an indication 742 is displayed together with the virtual representation 740. In some embodiments, the indication 742 includes information about the status of user 744b in a communication session. For example, as shown in Figure 7P, the indication 742 includes information about why user 744b is represented by the virtual representation 740 in the three-dimensional environment 702 (for example, the indication 742 includes information that user 744b has a poor network connection (for example, that the first computer system 101 does not receive one or more indications from the second computer system corresponding to the location and / or orientation of user 744b's current viewpoint)). In Figure 7P, the indication 742 is displayed on top of the virtual representation 740. In some embodiments, the indication 742 is displayed at different locations within the three-dimensional environment 702 (e.g., below and / or to the side of the virtual representation 740 from the current viewpoint of user 744a). In some embodiments, the indication 742 has one or more characteristics of an indication that correspond to the current status of a user of a second computer system in a communication session, as described with reference to Method 900.

[0242] In some embodiments, the virtual representation 740 is displayed in a location within the three-dimensional environment 702 that is independent of the user 744b's current viewpoint orientation (e.g., location and / or orientation). For example, the location and / or orientation of the virtual representation 740 within the three-dimensional environment 702 is not based on the current location and / or orientation of the user 744b's current viewpoint relative to the three-dimensional environment 702. In some embodiments, the location and / or orientation of the virtual representation 740 within the three-dimensional environment 702 is based on the current location and / or orientation of the user 744a's current viewpoint relative to the three-dimensional environment 702. For example, the virtual representation 740 is displayed in an orientation within the three-dimensional environment 702 such that the user 744a has a direct field of view of the first surface 732a from the user 744a's current viewpoint. For example, the virtual representation 740 is displayed at a height relative to the three-dimensional environment 702 based on the height of user 744a's current viewpoint relative to the three-dimensional environment 702 (for example, the virtual representation is not displayed at a height within the three-dimensional environment 702 corresponding to the height of user 744b's current viewpoint relative to the three-dimensional environment 702).
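The placement rule in paragraph [0242] can be sketched as follows, assuming a coordinate convention in which `y` is height and a yaw of 0 degrees means the front surface faces the +z direction. The names `Viewpoint`, `PlaceholderPose`, and `placeholder_pose` are hypothetical. Note that the remote user's viewpoint is deliberately not an input: the pose depends only on the viewing user's viewpoint and a fixed anchor location.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Viewpoint:
    """Viewing user's current viewpoint; y is height (assumed convention)."""
    x: float
    y: float
    z: float


@dataclass(frozen=True)
class PlaceholderPose:
    x: float
    y: float
    z: float
    yaw_deg: float  # 0 means the front surface faces +z; positive turns toward +x


def placeholder_pose(viewer: Viewpoint, anchor_x: float, anchor_z: float) -> PlaceholderPose:
    """Pose for a placeholder at a fixed anchor, derived only from the viewer.

    The height is copied from the viewer's viewpoint, and the yaw turns the
    front surface toward the viewer so the viewer sees it head-on.
    """
    yaw = math.degrees(math.atan2(viewer.x - anchor_x, viewer.z - anchor_z))
    return PlaceholderPose(anchor_x, viewer.y, anchor_z, yaw)
```

Recomputing the pose whenever the viewing user moves keeps the front surface facing that user, while movement of the represented user has no effect because it never enters the calculation.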

[0243] Figure 7Q shows a first computer system 101 that maintains the display of a virtual representation 740 of user 744b in the same orientation relative to the three-dimensional environment 702 in response to a change in user 744b's current viewpoint. As shown in the overhead view 706, user 744b moves relative to the three-dimensional environment 702 (for example, user 744b's movement from the position shown in Figure 7P is represented by arrow 746a). In response to the movement of user 744b's current viewpoint relative to the three-dimensional environment 702, the first computer system 101 maintains the display of the virtual representation 740 in the same location and / or orientation within the three-dimensional environment 702 (for example, overhead view 706 shows that the virtual representation 740 does not change position and / or orientation compared to that shown in Figure 7P). In some embodiments, the movement of user 744b's current viewpoint exceeds a threshold movement amount (for example, thresholds 722a and / or 722b shown and described with reference to Figures 7E to 7I). In some embodiments, the second computer system is unable to detect a shift in the user 744b's current viewpoint that exceeds a threshold shift and / or fails to provide the first computer system 101 with an indication corresponding to the user 744b's current viewpoint shift. Therefore, the first computer system 101 maintains the display of the virtual representation 740 in a pose within the three-dimensional environment 702 independent of the user 744b's current viewpoint shift (for example, in accordance with a virtual representation different from the virtual representation 740 (e.g., virtual representations 704a and / or 704b) being displayed in the three-dimensional environment to represent the user 744b, the first computer system 101 displays a representation of the movement of that virtual representation corresponding to the user 744b's current viewpoint shift).
In some embodiments, maintaining the display of the virtual representation 740 in a pose within the three-dimensional environment 702 independent of the user 744b's current viewpoint shift includes maintaining the display of the virtual representation 740 at the same height relative to the three-dimensional environment 702 (e.g., the height is based on the height of the user 744a's current viewpoint relative to the three-dimensional environment 702).

[0244] Figure 7R shows a first computer system 101 displaying a virtual representation 740 in the same orientation relative to user 744a's current viewpoint in response to a change in user 744a's current viewpoint relative to the three-dimensional environment 702. As shown in the overhead view 706, user 744a moves to a different location relative to the three-dimensional environment 702 (for example, causing a change in user 744a's current viewpoint relative to the three-dimensional environment 702). Additionally, as shown in the overhead view 706, user 744b continues to move relative to the three-dimensional environment 702 (for example, represented by the length of arrow 746b compared to arrow 746a shown in Figure 7Q). In response to the user 744a's current viewpoint shift in relation to the three-dimensional environment 702, the first computer system 101 changes the orientation of the virtual representation 740 relative to the three-dimensional environment 702 so that the first surface 732a is displayed in the same orientation relative to the user 744a's current viewpoint as it was displayed before the user 744a's current viewpoint shift (for example, in the orientation relative to the user 744a's current viewpoint shown in Figure 7Q). For example, as shown in Figure 7R, the user 744a has a direct field of view from the user 744a's current viewpoint to the virtual representation 740 (for example, represented by the direction of arrow 748a).
Furthermore, as shown in Figure 7R, in response to the movement of user 744b's current viewpoint relative to the three-dimensional environment 702, the first computer system 101 continues to maintain the display of the virtual representation 740 in an orientation within the three-dimensional environment 702 that is independent of the change in user 744b's current viewpoint (for example, changes in the orientation of the virtual representation 740 are based on changes in user 744a's current viewpoint, and not on the movement of user 744b's current viewpoint).

[0245] Figure 7S shows a first computer system 101 displaying a virtual representation 750 and a virtual representation 740 simultaneously in a three-dimensional environment 702. In some embodiments, the virtual representation 750 corresponds to a virtual representation of user 744c (a different user from user 744b associated with the computer system in a communication session with the first computer system 101). In some embodiments, the virtual representation 750 corresponds to a virtual representation of the same type as the virtual representation 740 (for example, the virtual representations 740 and 750 have one or more characteristics of a third type of virtual representation as described with reference to Method 900). In some embodiments, the virtual representation 750 is a representation of user 744c (for example, shown in the overhead view 706 in Figures 7P to 7R), and user 744c is associated with the third computer system in a communication session with the first computer system 101 and the second computer system. In some embodiments, the third computer system is unable to track the movement of one or more parts of user 744c (e.g., corresponding to the head, eyes, arms, hands, and / or torso). Therefore, the first computer system 101 displays the representation 750 in the three-dimensional environment 702 in a pose independent of user 744c's current viewpoint (for example, because the third computer system cannot detect the movement of user 744c's current viewpoint relative to the three-dimensional environment and / or does not communicate the position and / or orientation of user 744c's current viewpoint to the first computer system 101). 
As shown in Figure 7S, the virtual representation 750 is displayed with an indication 742 containing information about the current status of the user represented by the virtual representation 750 (for example, the indication 742 contains information that the third computer system is currently unable to track one or more parts of user 744c). Furthermore, as shown in Figure 7S, displaying virtual representations 740 and 750 within the three-dimensional environment 702 includes displaying virtual representations 740 and 750 at the same height relative to the three-dimensional environment 702 (for example, virtual representations 740 and 750 are displayed at a height within the three-dimensional environment 702 based on the height of the current viewpoint of user 744a (e.g., the user viewing the three-dimensional environment 702) relative to the three-dimensional environment 702). Additionally, as shown in Figure 7S, displaying virtual representations 740 and 750 within the three-dimensional environment 702 includes displaying virtual representations 740 and 750 in an orientation relative to the three-dimensional environment 702 that allows user 744a (e.g., the user viewing the three-dimensional environment 702) to have a direct field of view of virtual representations 740 and 750 from user 744a's current viewpoint.

[0246] Figure 8 is a flowchart illustrating an exemplary method 800 for displaying a virtual representation of a user in one or more poses in a three-dimensional environment in response to a shift in the user's current viewpoint, according to several embodiments. In some embodiments, method 800 is executed on a computer system (e.g., computer system 101 in Figure 1, such as a tablet, smartphone, wearable computer, or head-mounted device) that includes a display generating component (e.g., display generating component 120 in Figures 1, 3, and 4) (e.g., a head-up display, a display, a touchscreen, and / or a projector) and one or more cameras (e.g., a camera pointing downward at the user's hand (e.g., a color sensor, an infrared sensor, and other depth-sensing cameras), or a camera pointing forward from the user's head). In some embodiments, method 800 is governed by instructions that are stored on a non-transitory computer-readable storage medium and executed by one or more processors of the computer system, such as one or more processors 202 of the computer system 101 (e.g., control unit 110 in Figure 1A). Some operations of method 800 are optionally combined, and / or the order of some operations is optionally changed.

[0247] In some embodiments, Method 800 is performed in a first computer system communicating with a display generating component, one or more input devices, and a second computer system. In some embodiments, the first computer system is or includes an electronic device such as a mobile device (e.g., a tablet, smartphone, media player, or wearable device) or a computer. In some embodiments, the display generating component is a display integrated with the first computer system (optionally, a touchscreen display), an external display such as a monitor, projector, or television, or a hardware component (optionally, integrated or external) for projecting a user interface or making a user interface visible to one or more users. In some embodiments, one or more input devices include an electronic device or component that can receive user input (e.g., capture or detect user input) and transmit information associated with the user input to the electronic device. Examples of input devices include image sensors (e.g., cameras), location sensors, hand tracking sensors, eye tracking sensors, motion sensors (e.g., hand motion sensors), orientation sensors, microphones (and / or other audio sensors), touchscreens (optionally integrated or external), remote control devices (e.g., external), another mobile device (e.g., separate from the electronic device), handheld devices (e.g., external), and / or controllers. In some embodiments, a second computer system has one or more characteristics of the first computer system (e.g., a display generation component and one or more input devices having one or more characteristics of the display generation component and one or more input devices described with reference to the first computer system and communicating with the display generation component and one or more input devices).

[0248] In some embodiments, during a communication session with a second computer system, in which the first computer system is associated with a first user and the second computer system is associated with a second user, the first computer system displays a first virtual object (802a) via a display generation component that represents the orientation (e.g., location and / or orientation) of the second user's current viewpoint in the second computer system relative to a three-dimensional environment, and the first virtual object is displayed in a first orientation (e.g., position and / or orientation) in the three-dimensional environment representing the first viewpoint of the second user, such as a virtual representation 704b displayed in the three-dimensional environment 702 in Figure 7B. In some embodiments, the three-dimensional environment is generated, displayed, or otherwise made viewable by the first computer system. For example, the three-dimensional environment is an extended reality (XR) environment such as a virtual reality (VR) environment, a mixed reality (MR) environment, or an augmented reality (AR) environment. In some embodiments, the three-dimensional environment includes one or more virtual objects and / or representations of objects in the physical environment of the user of the computer system. In some embodiments, the three-dimensional environment has one or more characteristics of the three-dimensional and / or virtual environments described with reference to Methods 900, 1100, 1300, and / or 1500. 
In some embodiments, the communication session is a real-time (e.g., or near real-time) communication session including audio (e.g., real-time voice audio from the first user and / or the second user, and / or audio content from media shared between the first user and the second user), video (e.g., real-time video of the environments of the first user and / or the second user, and / or video content from media shared between the first user and the second user), and / or other shared content (e.g., images, applications, and / or interactive media (e.g., video game media)). In some embodiments, the first computer system optionally initiates and / or receives a request to join a communication session with the second computer system. In some embodiments, upon initiating and / or receiving a request to join a communication session, the first and / or second computer systems initiate a display of a three-dimensional environment to facilitate communication between the first user of the first computer system and the second user of the second computer system. In some embodiments, the first virtual object is a virtual representation of the second user that is not an avatar (for example, the virtual object does not include a virtual representation of one or more physical characteristics of the second user, a person, and / or an animal). In some embodiments, the first virtual object includes a virtual representation of a shape such as a circle (e.g., a coin), an ellipse, a square, a rhombus, a triangle, a sphere, a cylinder, a cube, a cone, or a cuboid. For example, the shape of the first virtual object includes three dimensions (e.g., length, width, and depth relative to a three-dimensional environment).
In some embodiments, the first virtual object has one or more standard visual characteristics (e.g., shape and / or size) used by the computer system to represent one or more different users in a three-dimensional environment (e.g., the size, shape, color and / or brightness of the first virtual object does not differ based on, and / or is not customizable by, different users in a communication session (e.g., a communication session includes a first user, a second user and optionally one or more additional users)). In some embodiments, displaying the first virtual object includes displaying an annotation adjacent to the first virtual object (e.g., above, below, or to the side of it). For example, the annotation includes the name of a second user of the second computer system. In some embodiments, the first orientation of the first virtual object corresponds to the orientation (e.g., including location and / or orientation) of the current viewpoint of the second user of the second computer system relative to the three-dimensional environment. For example, the position of the first virtual object includes an orientation (e.g., based on spherical or polar coordinates) relative to the three-dimensional environment (e.g., relative to a reference location within the three-dimensional environment) based on the orientation of the current viewpoint of the second user of the second computer system relative to the three-dimensional environment. For example, the position of the first virtual object includes an orientation with respect to the current viewpoint of the first user of the first computer system, based on the orientation of the current viewpoint of the second user of the second computer system with respect to the current viewpoint of the first user of the first computer system.
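As an illustration of expressing a position in polar coordinates relative to a reference, as mentioned in paragraph [0248], the following sketch converts a target location into a distance and bearing relative to a reference pose in an overhead view. The function name `polar_relative_to` and the coordinate convention (a yaw of 0 degrees points along +z, positive bearings turn toward +x) are assumptions made for the example; the source does not define a coordinate system.

```python
import math
from typing import Tuple


def polar_relative_to(
    ref_x: float,
    ref_z: float,
    ref_yaw_deg: float,
    target_x: float,
    target_z: float,
) -> Tuple[float, float]:
    """Return (distance, bearing_deg) of a target relative to a reference pose.

    bearing_deg is measured from the reference heading and normalised to
    the range [-180, 180).
    """
    dx = target_x - ref_x
    dz = target_z - ref_z
    distance = math.hypot(dx, dz)
    # World-frame bearing of the target, then rotated into the reference frame.
    bearing_world = math.degrees(math.atan2(dx, dz))
    bearing = (bearing_world - ref_yaw_deg + 180.0) % 360.0 - 180.0
    return distance, bearing
```

The reference pose can be either a fixed reference location in the environment or the first user's current viewpoint, which corresponds to the two alternatives given in the paragraph.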

[0249] In some embodiments, while displaying the first virtual object in the first pose within the three-dimensional environment, the first computer system receives, from the second computer system, an indication (802b) corresponding to the pose (e.g., position and / or orientation) of the second user's current viewpoint relative to the three-dimensional environment, such as a change in the pose of the second user's current viewpoint as shown in the overhead view 706 of Figures 7E to 7K1. In some embodiments, the indication is a signal received from the second computer system (e.g., through a network such as a personal, local, or wide-area network) or from one or more servers communicating with the first and second computer systems, corresponding to an input received by one or more input devices of the second computer system. In some embodiments, the indication includes information regarding the movement of the second user's current viewpoint to the pose. For example, input received by one or more input devices of the second computer system optionally includes a physical movement of at least a portion of the second user (e.g., head, neck, and / or torso) relative to the second user's physical environment, from a first posture of the portion of the second user to a second posture of the portion of the second user (e.g., the second user's physical environment is optionally not the first user's physical environment). In some embodiments, the physical movement of the second user corresponds to a movement of the second user's current viewpoint relative to the three-dimensional environment. In some embodiments, a movement of the second user's current viewpoint to the pose includes a movement of the second user's head and / or eyes relative to the three-dimensional environment.
In some embodiments, a movement of the second user's current viewpoint to the pose includes a physical movement of the second user relative to the second user's physical environment (e.g., the second user sits or stands, the second user rotates one or more parts of their body, or the second user moves from a first location in the physical environment to a second location in the physical environment). In some embodiments, the movement of the second user's current viewpoint to the pose optionally does not involve physical movement of the second user relative to the second user's physical environment. For example, a shift in the second user's current viewpoint to the pose is triggered by an input received by the second computer system that corresponds to a request from the second user to shift the current viewpoint relative to the three-dimensional environment (e.g., the input is a touch input provided on a touch-sensitive surface of the second computer system, or the input is an audio input (e.g., a voice command) provided by the second user to the second computer system). In some embodiments, the first computer system receives the indication from the second computer system when the pose corresponds to a shift (e.g., in position and / or orientation) of the second user's current viewpoint relative to the three-dimensional environment from a previous pose, where the shift satisfies one or more criteria (e.g., as described later). For example, the second computer system determines whether the pose of the second user's current viewpoint satisfies the one or more criteria before sending the indication to the first computer system. In some embodiments, the indication received by the first computer system from the second computer system optionally does not include movement information.
For example, the second computer system sends one or more indications to the first computer system (for example, periodically during a communication session) that correspond to the current orientation of the second user's current viewpoint (for example, with respect to a three-dimensional environment), and based on the one or more indications received, the first computer system optionally determines whether the change in the current orientation of the second user's current viewpoint satisfies one or more criteria (for example, one or more criteria described below), and displays the first virtual object in a second orientation different from the first orientation (for example, as described below).
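
The receiver-side variant described above, in which the first computer system receives periodic indications and decides for itself whether to update the display, can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the described system: the names `Pose`, `RepresentationTracker`, `on_indication`, and `moved_more_than_half_meter` are hypothetical, the pose is reduced to a position plus a yaw angle, and the choice to compare each report against the currently displayed pose is an assumption.

```python
import math
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Pose:
    """Hypothetical pose report: position in meters plus a yaw angle in degrees."""
    position: Tuple[float, float, float]
    yaw_degrees: float


class RepresentationTracker:
    """Receiver-side state for one remote participant (illustrative only)."""

    def __init__(self, initial: Pose, criteria: Callable[[Pose, Pose], bool]):
        self.displayed = initial   # pose the virtual object is currently shown in
        self._criteria = criteria  # predicate: (displayed pose, reported pose) -> bool

    def on_indication(self, reported: Pose) -> bool:
        """Handle one indication; return True if the displayed pose was updated."""
        if self._criteria(self.displayed, reported):
            self.displayed = reported
            return True
        return False               # criteria not satisfied: display is maintained


def moved_more_than_half_meter(displayed: Pose, reported: Pose) -> bool:
    # Assumed criterion; 0.5 m is one of the example threshold distances in the text.
    return math.dist(displayed.position, reported.position) > 0.5


tracker = RepresentationTracker(Pose((0.0, 0.0, 0.0), 0.0), moved_more_than_half_meter)
tracker.on_indication(Pose((0.2, 0.0, 0.0), 0.0))  # small shift: display is maintained
tracker.on_indication(Pose((0.8, 0.0, 0.0), 0.0))  # large shift: display is updated
```

Because each report is compared with the displayed pose rather than with the previous report, a slow drift eventually crosses the threshold and triggers an update. Whether the described system behaves this way is not stated in the text.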

[0250] In some embodiments, in response to receiving the indication (802c), in accordance with a determination that the movement of the second user's current viewpoint relative to the three-dimensional environment satisfies one or more criteria, including a criterion that is satisfied when the movement exceeds a threshold, the computer system displays the first virtual object in a second pose (e.g., position and / or orientation), different from the first pose (e.g., position and / or orientation), in the three-dimensional environment representing the second user's second viewpoint, such as displaying the virtual representation 704b in the updated pose of Figure 7M (802d). In some embodiments, the threshold for the movement of the second user's current viewpoint includes a threshold distance from the location of the first viewpoint in the three-dimensional environment to the location of the second viewpoint in the three-dimensional environment. For example, the threshold distance between the location of the second viewpoint and the location of the first viewpoint in the three-dimensional environment is optionally 0.1, 0.2, 0.5, 1, 2, 5, or 10 m. In some embodiments, the threshold includes a threshold change in the orientation of the second viewpoint relative to the first viewpoint in the three-dimensional environment. For example, the threshold change in the orientation of the second viewpoint is optionally -90, -75, -60, -45, -30, -15, 15, 30, 45, 60, 75, or 90 degrees relative to the orientation of the first viewpoint in the three-dimensional environment. In some embodiments, displaying the first virtual object at the second location includes displaying the first virtual object with a new orientation relative to the three-dimensional environment (e.g., relative to a reference location in the three-dimensional environment, or relative to the current viewpoint of the first user in the three-dimensional environment). For example, a change in the orientation of the first virtual object optionally corresponds to a change in the orientation of the second user's current viewpoint.
In some embodiments, displaying the first virtual object at a second location in the three-dimensional environment includes displaying an annotation associated with the first virtual object (e.g., the name of the second user or another identifier) at the second location in the three-dimensional environment. In some embodiments, the determination that the movement of the second user's current viewpoint relative to the three-dimensional environment satisfies the one or more criteria is made by the second computer system (for example, optionally, before the first computer system receives the indication). For example, the indication is transmitted by the second computer system in accordance with the determination that the movement of the second user's current viewpoint relative to the three-dimensional environment satisfies the one or more criteria. For example, the indication received by the first computer system contains information about the determination made by the second computer system.
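
The threshold criteria described above (a threshold distance and a threshold change in orientation between the first viewpoint and the second viewpoint) can be illustrated with a short sketch. It is an assumption-laden example rather than the method of the described system: the function names are hypothetical, orientation is simplified to a single yaw angle in degrees, the default values of 0.5 m and 30 degrees are picked from the example values listed in the text, and treating the two thresholds as alternatives ("or") is one reading of "and / or".

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def angular_difference_degrees(first_yaw: float, second_yaw: float) -> float:
    """Signed smallest rotation from first_yaw to second_yaw, in [-180, 180)."""
    return (second_yaw - first_yaw + 180.0) % 360.0 - 180.0


def movement_satisfies_criteria(
    first_position: Vec3,
    first_yaw: float,
    second_position: Vec3,
    second_yaw: float,
    distance_threshold_m: float = 0.5,         # assumed example value
    orientation_threshold_deg: float = 30.0,   # assumed example value
) -> bool:
    """Return True when the viewpoint movement exceeds either threshold."""
    moved = math.dist(first_position, second_position)
    # The absolute value covers rotation in either direction, matching the
    # signed example thresholds (e.g., -30 or 30 degrees).
    turned = abs(angular_difference_degrees(first_yaw, second_yaw))
    return moved > distance_threshold_m or turned > orientation_threshold_deg
```

In the sender-side variant, the second computer system would evaluate a check of this kind before transmitting the indication. In the receiver-side variant, the first computer system would evaluate it on each received pose.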

[0251] In some embodiments, in accordance with a determination that the movement of the second user's current viewpoint does not satisfy the one or more criteria because the movement does not exceed the threshold (e.g., in position and / or orientation) relative to the three-dimensional environment, the computer system maintains the display of the first virtual object in the first pose (e.g., position and / or orientation) in the three-dimensional environment (802e), such as maintaining the virtual representation 704b in the same pose in the three-dimensional environment in Figure 7E as in Figure 7D, in response to the movement of the second user's current viewpoint not exceeding an orientation threshold 722a and / or a distance threshold 722b. In some embodiments, the location of the second viewpoint relative to the three-dimensional environment does not differ from the location of the first viewpoint relative to the three-dimensional environment by a threshold distance amount, and therefore the one or more criteria are not satisfied. In some embodiments, the orientation of the second viewpoint relative to the three-dimensional environment does not differ from the orientation of the first viewpoint relative to the three-dimensional environment by a threshold orientation amount, and therefore the one or more criteria are not satisfied. In some embodiments, the one or more criteria are not met because the movement of the current viewpoint does not exceed the threshold velocity, threshold magnitude of movement, threshold orientation change, and / or threshold movement distance described below. In some embodiments, maintaining the display of the first virtual object at the first location in the three-dimensional environment includes maintaining the same position and / or orientation (e.g., polar or spherical coordinates) with respect to the three-dimensional...

Claims

1. A method comprising: at a first computer system in communication with a display generation component, one or more input devices, and a second computer system, wherein the first computer system is associated with a first user and the second computer system is associated with a second user: during a communication session with the second computer system, displaying, via the display generation component, a first virtual object that represents the pose of the current viewpoint of the second user of the second computer system relative to a three-dimensional environment, wherein the first virtual object is displayed in a first pose in the three-dimensional environment that represents a first viewpoint of the second user; while the first virtual object is displayed in the first pose in the three-dimensional environment, receiving, from the second computer system, an indication corresponding to the pose of the second user's current viewpoint relative to the three-dimensional environment; and in response to receiving the indication: in accordance with a determination that movement of the second user's current viewpoint relative to the three-dimensional environment from the first viewpoint of the second user to a second viewpoint of the second user satisfies one or more criteria, including a criterion that is satisfied when the movement of the second user's current viewpoint exceeds a threshold relative to the three-dimensional environment, displaying the first virtual object in the three-dimensional environment in a second pose, different from the first pose, that represents the second viewpoint of the second user; and in accordance with a determination that the movement of the second user's current viewpoint does not satisfy the one or more criteria because the movement of the second user's current viewpoint does not exceed the threshold relative to the three-dimensional environment, maintaining display of the first virtual object in the first pose in the three-dimensional environment.

2. The method according to claim 1, wherein the threshold includes a threshold velocity of the movement of the second user's current viewpoint from the second user's first viewpoint to the second user's second viewpoint relative to the three-dimensional environment.

3. The method according to claim 1 or 2, wherein the threshold includes a threshold magnitude for the movement of the second user's current viewpoint from the second user's first viewpoint to the second user's second viewpoint relative to the three-dimensional environment.

4. The method according to any one of claims 1 to 3, wherein the threshold includes a threshold change in the orientation of the second user's current viewpoint from the second user's first viewpoint to the second user's second viewpoint with respect to the three-dimensional environment.

5. The method according to any one of claims 1 to 4, wherein the threshold includes a threshold distance of the movement of the second user's current viewpoint from the second user's first viewpoint to the second user's second viewpoint with respect to the three-dimensional environment.

6. The method according to any one of claims 1 to 5, wherein displaying the first virtual object in the first orientation within the three-dimensional environment includes displaying the first virtual object in a first orientation to the three-dimensional environment corresponding to the orientation of the second user's first viewpoint to the three-dimensional environment, and displaying the first virtual object in the second orientation within the three-dimensional environment includes displaying the first virtual object in a second orientation different from the first orientation to the three-dimensional environment corresponding to the orientation of the second user's second viewpoint to the three-dimensional environment.

7. The method according to any one of claims 1 to 6, further comprising: while the first virtual object is displayed in the first pose in the three-dimensional environment, displaying, in the three-dimensional environment, an indication corresponding to an identifier of the second user in a first orientation relative to the three-dimensional environment, wherein the first orientation is based on the current viewpoint of the first user relative to the three-dimensional environment; and in response to receiving the indication corresponding to the pose of the second user's current viewpoint, maintaining display of the indication corresponding to the identifier of the second user in the first orientation relative to the three-dimensional environment.

8. The method according to any one of claims 1 to 7, wherein displaying the first virtual object in the three-dimensional environment includes: displaying the first virtual object with a first surface oriented in a first direction corresponding to the second user's current viewpoint relative to the three-dimensional environment, wherein the first surface is displayed with a first visual appearance; and displaying the first virtual object with a second surface oriented in a second direction opposite to the first direction, wherein the second surface is displayed with a second visual appearance different from the first visual appearance.

9. The method according to claim 8, wherein displaying the first surface in the first visual appearance includes displaying the second user identifier on the first surface, and displaying the second surface in the second visual appearance includes displaying the second surface without including the second user identifier on the second surface.

10. The method according to any one of claims 1 to 9, wherein: the first virtual object in the first pose is located at a first distance from the first viewpoint of the first user and has a first size relative to the three-dimensional environment; and the first virtual object in the second pose is located at a second distance, different from the first distance, from the first viewpoint of the first user and has the first size relative to the three-dimensional environment.

11. The method according to any one of claims 1 to 10, wherein: the first virtual object in the first pose includes an indication corresponding to an identifier of the second user, the indication being located at a first distance from the first viewpoint of the first user and having a first size relative to the three-dimensional environment; and the first virtual object in the second pose includes the indication corresponding to the identifier of the second user, the indication being located at a second distance from the first viewpoint of the first user and having a second size, different from the first size, relative to the three-dimensional environment.

12. The method according to any one of claims 1 to 11, further comprising displaying a second virtual object in the three-dimensional environment while the first virtual object is displayed in the three-dimensional environment, the second virtual object representing the orientation of the current viewpoint of a third user of the third computer system in the communication session with respect to the three-dimensional environment, wherein the first virtual object is displayed with a distinct visual characteristic having a first value, and the second virtual object is displayed with the distinct visual characteristic having a second value different from the first value.

13. The method according to any one of claims 1 to 12, wherein displaying the first virtual object in the three-dimensional environment includes displaying the first virtual object in an animation independent of the movement of the second user's current viewpoint relative to the three-dimensional environment.

14. The method according to claim 13, wherein displaying the animation includes displaying the first virtual object oscillating around the current location of the second user's current viewpoint relative to the three-dimensional environment.

15. The method according to any one of claims 1 to 14, wherein displaying the first virtual object in the three-dimensional environment includes: displaying the first virtual object with a first surface oriented in a first direction relative to the three-dimensional environment, wherein the first surface includes a flat surface having a dimension with a first value relative to the three-dimensional environment; and displaying the first virtual object with a second surface oriented in a second direction, opposite to the first direction, relative to the three-dimensional environment, wherein the second surface includes a flat surface having the dimension with the first value relative to the three-dimensional environment, and the first surface is positioned at a first distance from the second surface, the first distance having a second value smaller than the first value relative to the three-dimensional environment.

16. The method according to claim 15, wherein displaying the first virtual object in the three-dimensional environment includes displaying the first virtual object as a three-dimensional virtual object including the first distance between the first surface and the second surface.

17. The method according to any one of claims 1 to 16, further comprising: while the first virtual object is displayed in the first pose in the three-dimensional environment, receiving, from the second computer system, an indication corresponding to audio input received by the second computer system from the second user; and in response to receiving the indication corresponding to the audio input received by the second computer system, displaying the first virtual object in the three-dimensional environment with an animation based on the audio input received by the second computer system.

18. The method according to any one of claims 1 to 17, wherein displaying the first virtual object in the second pose in the three-dimensional environment includes displaying an animation corresponding to movement of the first virtual object from the first pose to the second pose based on the movement of the second user's current viewpoint, and displaying the animation includes ceasing to display the first virtual object in the three-dimensional environment before the first virtual object reaches the second pose, and then redisplaying the first virtual object in the three-dimensional environment.

19. The method according to claim 18, wherein displaying the animation includes, before ceasing to display the first virtual object in the three-dimensional environment, displaying movement of the first virtual object away from the first pose corresponding to movement of the second user's current viewpoint away from the first viewpoint.

20. The method according to claim 19, wherein displaying the movement of the first virtual object away from the first pose includes displaying the movement of the first virtual object relative to the three-dimensional environment at a nonlinear velocity.

21. The method according to any one of claims 18 to 20, wherein redisplaying the first virtual object in the three-dimensional environment includes displaying movement of the first virtual object toward the second pose corresponding to movement of the second user's current viewpoint toward the second viewpoint.

22. The method according to claim 21, wherein displaying the movement of the first virtual object toward the second pose includes displaying the movement of the first virtual object relative to the three-dimensional environment at a nonlinear velocity.

23. The method according to any one of claims 18 to 22, wherein displaying the animation includes: after ceasing to display the first virtual object in the three-dimensional environment, in accordance with a determination that the movement of the second user's current viewpoint from the first viewpoint to the second viewpoint exceeds a threshold distance relative to the three-dimensional environment, displaying the first virtual object in one or more intermediate poses in the three-dimensional environment between the first pose and the second pose, wherein the one or more intermediate poses are associated with one or more poses of the second user's current viewpoint during the movement of the second user's current viewpoint.

24. The method according to any one of claims 18 to 23, wherein displaying the animation includes: while the first virtual object is not displayed in the three-dimensional environment, detecting an event that includes movement of the second user's current viewpoint remaining below a threshold amount of movement for longer than a time threshold; and in response to detecting the event, redisplaying the first virtual object in the three-dimensional environment in a respective pose corresponding to the current viewpoint of the second user.

25. The method according to any one of claims 1 to 24, further comprising: while the first virtual object is displayed in the three-dimensional environment, receiving, from the second computer system, an indication corresponding to audio input received by the second computer system from the second user; and in response to receiving the indication corresponding to the audio input received by the second computer system: in accordance with a determination that the first virtual object is displayed in the first pose, providing audio output corresponding to the audio input received by the second computer system that is spatialized at the first pose of the first virtual object in the three-dimensional environment; and in accordance with a determination that the first virtual object is displayed in the second pose, providing audio output corresponding to the audio input received by the second computer system that is spatialized at the second pose of the first virtual object in the three-dimensional environment.

26. The method according to any one of claims 1 to 25, further comprising: while the first virtual object is displayed in the second pose in the three-dimensional environment, receiving, from the second computer system, a second indication, different from the indication, corresponding to the pose of the second user's current viewpoint relative to the three-dimensional environment; and in response to receiving the second indication: in accordance with a determination that movement of the second user's current viewpoint relative to the three-dimensional environment from the second viewpoint to a third viewpoint, different from the second viewpoint, satisfies the one or more criteria, including the criterion that is satisfied when the movement from the second viewpoint to the third viewpoint exceeds the threshold relative to the three-dimensional environment, displaying the first virtual object in a third pose different from the second pose; and in accordance with a determination that the movement of the second user's current viewpoint from the second viewpoint to the third viewpoint does not satisfy the one or more criteria, maintaining display of the first virtual object in the second pose in the three-dimensional environment.

27. A first computer system that is in communication with a display generation component, one or more input devices, and a second computer system, wherein the first computer system is associated with a first user and the second computer system is associated with a second user, the first computer system comprising: one or more processors; memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for: during a communication session with the second computer system, displaying, via the display generation component, a first virtual object that represents the pose of the current viewpoint of the second user of the second computer system relative to a three-dimensional environment, wherein the first virtual object is displayed in a first pose in the three-dimensional environment that represents a first viewpoint of the second user; while the first virtual object is displayed in the first pose in the three-dimensional environment, receiving, from the second computer system, an indication corresponding to the pose of the second user's current viewpoint relative to the three-dimensional environment; and in response to receiving the indication: in accordance with a determination that movement of the second user's current viewpoint relative to the three-dimensional environment from the first viewpoint of the second user to a second viewpoint of the second user satisfies one or more criteria, including a criterion that is satisfied when the movement of the second user's current viewpoint exceeds a threshold relative to the three-dimensional environment, displaying the first virtual object in the three-dimensional environment in a second pose, different from the first pose, that represents the second viewpoint of the second user; and in accordance with a determination that the movement of the second user's current viewpoint does not satisfy the one or more criteria because the movement of the second user's current viewpoint does not exceed the threshold relative to the three-dimensional environment, maintaining display of the first virtual object in the first pose in the three-dimensional environment.

28. A non-transitory computer-readable storage medium storing one or more programs, wherein the one or more programs include instructions that, when executed by one or more processors of a first computer system that is in communication with a display generation component, one or more input devices, and a second computer system, wherein the first computer system is associated with a first user and the second computer system is associated with a second user, cause the first computer system to perform a method including: during a communication session with the second computer system, displaying, via the display generation component, a first virtual object that represents the pose of the current viewpoint of the second user of the second computer system relative to a three-dimensional environment, wherein the first virtual object is displayed in a first pose in the three-dimensional environment that represents a first viewpoint of the second user; while the first virtual object is displayed in the first pose in the three-dimensional environment, receiving, from the second computer system, an indication corresponding to the pose of the second user's current viewpoint relative to the three-dimensional environment; and in response to receiving the indication: in accordance with a determination that movement of the second user's current viewpoint relative to the three-dimensional environment from the first viewpoint of the second user to a second viewpoint of the second user satisfies one or more criteria, including a criterion that is satisfied when the movement of the second user's current viewpoint exceeds a threshold relative to the three-dimensional environment, displaying the first virtual object in the three-dimensional environment in a second pose, different from the first pose, that represents the second viewpoint of the second user; and in accordance with a determination that the movement of the second user's current viewpoint does not satisfy the one or more criteria because the movement of the second user's current viewpoint does not exceed the threshold relative to the three-dimensional environment, maintaining display of the first virtual object in the first pose in the three-dimensional environment.

29. A first computer system that is in communication with a display generation component, one or more input devices, and a second computer system, wherein the first computer system is associated with a first user and the second computer system is associated with a second user, the first computer system comprising: one or more processors; memory; means for, during a communication session with the second computer system, displaying, via the display generation component, a first virtual object that represents the pose of the current viewpoint of the second user of the second computer system relative to a three-dimensional environment, wherein the first virtual object is displayed in a first pose in the three-dimensional environment that represents a first viewpoint of the second user; means for, while the first virtual object is displayed in the first pose in the three-dimensional environment, receiving, from the second computer system, an indication corresponding to the pose of the second user's current viewpoint relative to the three-dimensional environment; and means for, in response to receiving the indication: in accordance with a determination that movement of the second user's current viewpoint relative to the three-dimensional environment from the first viewpoint of the second user to a second viewpoint of the second user satisfies one or more criteria, including a criterion that is satisfied when the movement of the second user's current viewpoint exceeds a threshold relative to the three-dimensional environment, displaying the first virtual object in the three-dimensional environment in a second pose, different from the first pose, that represents the second viewpoint of the second user; and in accordance with a determination that the movement of the second user's current viewpoint does not satisfy the one or more criteria because the movement of the second user's current viewpoint does not exceed the threshold relative to the three-dimensional environment, maintaining display of the first virtual object in the first pose in the three-dimensional environment.

30. A computer system that is in communication with a display generation component and one or more input devices, the computer system comprising: one or more processors; memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs include instructions for performing the method according to any one of claims 1 to 26.

31. A non-transitory computer-readable storage medium storing one or more programs, wherein the one or more programs include instructions that, when executed by one or more processors of a computer system that is in communication with a display generation component and one or more input devices, cause the computer system to perform the method according to any one of claims 1 to 26.

32. A computer system that is in communication with a display generation component and one or more input devices, the computer system comprising: one or more processors; memory; and means for performing the method according to any one of claims 1 to 26.

33. A method comprising: at a first computer system that communicates with a display generation component, one or more input devices, and a second computer system: during a communication session with the second computer system: displaying, via the display generation component, at a first location within a three-dimensional environment, a virtual representation of a user of the second computer system that represents an orientation of a current viewpoint of the user of the second computer system relative to the three-dimensional environment; while displaying the virtual representation of the user of the second computer system at the first location within the three-dimensional environment, receiving, from the second computer system, an indication corresponding to the orientation of the current viewpoint of the user of the second computer system relative to the three-dimensional environment; and in response to receiving the indication: in accordance with a determination that the virtual representation of the user of the second computer system is a first type of virtual representation, displaying, in the three-dimensional environment, a first representation of movement of the virtual representation of the user of the second computer system corresponding to a change in the current viewpoint of the user of the second computer system from a first pose in the three-dimensional environment to a second pose in the three-dimensional environment; and in accordance with a determination that the virtual representation of the user of the second computer system is a second type of virtual representation different from the first type, displaying, in the three-dimensional environment, a second representation of the movement of the virtual representation of the user of the second computer system corresponding to the change in the current viewpoint of the user of the second computer system from the first pose in the three-dimensional environment to the second pose in the three-dimensional environment.

34. The method according to claim 33, wherein: displaying the first representation of the movement of the virtual representation of the user of the second computer system includes displaying a first amount of movement of the virtual representation of the user of the second computer system corresponding to the change in the current viewpoint of the user of the second computer system from the first pose in the three-dimensional environment to the second pose in the three-dimensional environment; and displaying the second representation of the movement of the virtual representation of the user of the second computer system includes displaying a second amount of movement of the virtual representation of the user of the second computer system, greater than the first amount of movement, corresponding to the change in the current viewpoint of the user of the second computer system from the first pose in the three-dimensional environment to the second pose in the three-dimensional environment.

35. The method according to claim 33 or 34, wherein: displaying the first representation of the movement of the virtual representation of the user of the second computer system includes discontinuing display of the virtual representation of the user of the second computer system in the three-dimensional environment and redisplaying the virtual representation of the user of the second computer system in the three-dimensional environment while changing the virtual representation of the user of the second computer system from the first pose in the three-dimensional environment to the second pose in the three-dimensional environment; and displaying the second representation of the movement of the virtual representation of the user of the second computer system does not include discontinuing display of the virtual representation of the user of the second computer system in the three-dimensional environment while the virtual representation of the user of the second computer system is being changed from the first pose in the three-dimensional environment to the second pose in the three-dimensional environment.

36. The method according to claim 35, further comprising: during the communication session with the second computer system, while displaying the virtual representation of the user of the second computer system: displaying, via the display generation component, at a second location within the three-dimensional environment, a virtual representation of a user of a third computer system, different from the first computer system and the second computer system, that represents an orientation of a current viewpoint of the user of the third computer system relative to the three-dimensional environment; while displaying the virtual representation of the user of the third computer system at the second location within the three-dimensional environment, receiving, from the third computer system, an indication corresponding to the orientation of the current viewpoint of the user of the third computer system relative to the three-dimensional environment; and in response to receiving the indication from the third computer system: in accordance with a determination that the virtual representation of the user of the third computer system is the first type of virtual representation, displaying, in the three-dimensional environment, a separate first representation of movement of the virtual representation of the user of the third computer system corresponding to a change in the current viewpoint of the user of the third computer system from a third pose in the three-dimensional environment to a fourth pose in the three-dimensional environment; and in accordance with a determination that the virtual representation of the user of the third computer system is the second type of virtual representation different from the first type, displaying, in the three-dimensional environment, a separate second representation of the movement of the virtual representation of the user of the third computer system, different from the separate first representation, corresponding to the change in the current viewpoint of the user of the third computer system from the third pose in the three-dimensional environment to the fourth pose in the three-dimensional environment.

37. The method according to any one of claims 33 to 36, wherein: displaying the first representation of the movement of the virtual representation of the user of the second computer system includes: in accordance with a determination that the change in the current viewpoint of the user of the second computer system does not exceed a threshold amount of movement, displaying a first amount of movement of the virtual representation of the user of the second computer system relative to the three-dimensional environment without discontinuing display of the virtual representation of the user of the second computer system within the three-dimensional environment; and in accordance with a determination that the change in the current viewpoint of the user of the second computer system exceeds the threshold amount of movement, discontinuing display of the virtual representation of the user of the second computer system within the three-dimensional environment; and displaying the second representation of the movement of the virtual representation of the user of the second computer system includes displaying a second amount of movement of the virtual representation of the user of the second computer system relative to the three-dimensional environment, greater than the first amount of movement, without discontinuing display of the virtual representation of the user of the second computer system in the three-dimensional environment, independently of whether the change in the current viewpoint of the user of the second computer system exceeds the threshold amount of movement.
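
The type-dependent behavior of claims 33 to 37 can be sketched as a small decision function. This is a hedged illustration only: the `RepresentationType` enum, the `MovementPlan` record, the `first_scale` damping factor, and the choice to show zero movement when display is discontinued are hypothetical and do not come from the claims.

```python
from dataclasses import dataclass
from enum import Enum, auto


class RepresentationType(Enum):
    FIRST = auto()   # moves less, and is hidden for large viewpoint changes
    SECOND = auto()  # always moves continuously


@dataclass(frozen=True)
class MovementPlan:
    displayed_movement: float  # amount of movement shown for the representation
    discontinue_display: bool  # True if the representation is hidden and later redisplayed


def plan_movement(kind: RepresentationType, viewpoint_change: float,
                  threshold: float, first_scale: float = 0.5) -> MovementPlan:
    """Choose how to represent movement for a given representation type.

    first_scale is an assumed damping factor (< 1) so that the first type
    shows less movement than the second type for the same viewpoint change.
    """
    if kind is RepresentationType.SECOND:
        # Second type: full movement, never hidden, regardless of threshold.
        return MovementPlan(viewpoint_change, False)
    if viewpoint_change > threshold:
        # First type, large change: stop displaying, then redisplay at the new pose.
        return MovementPlan(0.0, True)
    # First type, small change: reduced movement without hiding.
    return MovementPlan(viewpoint_change * first_scale, False)
```

For a viewpoint change of 1.0 and a threshold of 2.0, the first type is planned with movement 0.5 and the second type with movement 1.0. Raising the change to 3.0 hides only the first type.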

38. The method according to any one of claims 33 to 37, wherein: displaying the first representation of the movement of the virtual representation of the user of the second computer system includes, in accordance with a determination that the change in the current viewpoint of the user of the second computer system includes movement in a first direction relative to the three-dimensional environment, discontinuing display of movement of the virtual representation of the user of the second computer system in a direction corresponding to the first direction relative to the three-dimensional environment; and displaying the second representation of the movement of the virtual representation of the user of the second computer system includes, in accordance with the determination that the change in the current viewpoint of the user of the second computer system includes the movement in the first direction relative to the three-dimensional environment, displaying movement of the virtual representation of the user of the second computer system in the direction corresponding to the first direction relative to the three-dimensional environment.

39. The method according to any one of claims 33 to 38, further comprising: while displaying the virtual representation of the user of the second computer system within the three-dimensional environment, and while not receiving, from the second computer system, an indication corresponding to a change in the orientation of the current viewpoint of the user of the second computer system relative to the three-dimensional environment: in accordance with a determination that the virtual representation of the user of the second computer system is the first type of virtual representation, displaying an animation that includes periodic movement of the virtual representation of the user of the second computer system; and in accordance with a determination that the virtual representation of the user of the second computer system is the second type of virtual representation, displaying the virtual representation of the user of the second computer system without displaying movement of the virtual representation of the user of the second computer system in the three-dimensional environment.
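
The idle behavior of claim 39 (periodic movement for the first type, no movement for the second type) might be sketched as follows. The sinusoidal vertical offset, the `amplitude` and `period` defaults, and the string type labels are assumptions chosen for illustration; the claim does not specify the form of the periodic movement.

```python
import math


def idle_offset(kind: str, t: float, amplitude: float = 0.02,
                period: float = 4.0) -> float:
    """Return an assumed vertical offset at time t (seconds) while no
    viewpoint-change indication is being received.

    kind is "first" or "second" (hypothetical labels for the two types).
    """
    if kind == "first":
        # Periodic movement: one full sine cycle every `period` seconds.
        return amplitude * math.sin(2.0 * math.pi * t / period)
    # Second type: displayed without movement.
    return 0.0
```

A renderer could add this offset to the representation's height on each frame while the remote viewpoint is stationary.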

40. The method according to any one of claims 33 to 39, further comprising: while displaying the virtual representation of the user of the second computer system, receiving an indication of user input; and in response to receiving the indication of the user input, changing display of the virtual representation of the user of the second computer system from the first type of virtual representation to the second type of virtual representation.

41. The method according to any one of claims 33 to 40, further comprising: while displaying the virtual representation of the user of the second computer system, receiving an indication that one or more criteria are satisfied independently of user input; and in response to receiving the indication that the one or more criteria are satisfied independently of user input, changing display of the virtual representation of the user of the second computer system from the first type of virtual representation to the second type of virtual representation.

42. The method according to any one of claims 33 to 41, further comprising: while displaying the virtual representation of the user of the second computer system, receiving an indication of user input; and in response to receiving the indication of the user input, changing display of the virtual representation of the user of the second computer system from the second type of virtual representation to the first type of virtual representation.

43. The method according to any one of claims 33 to 42, further comprising: while displaying the virtual representation of the user of the second computer system, receiving an indication that one or more criteria are satisfied independently of user input; and in response to receiving the indication that the one or more criteria are satisfied independently of user input, changing display of the virtual representation of the user of the second computer system from the second type of virtual representation to the first type of virtual representation.

44. The method according to any one of claims 33 to 43, further comprising, in response to receiving the indication: in accordance with a determination that the virtual representation of the user of the second computer system is a third type of virtual representation different from the first type and the second type, displaying the virtual representation of the user of the second computer system in a third orientation in the three-dimensional environment, wherein the third orientation of the virtual representation of the user of the second computer system in the three-dimensional environment is independent of the orientation of the current viewpoint of the user of the second computer system relative to the three-dimensional environment.

45. The method according to claim 44, wherein the third type of virtual representation includes an indication corresponding to the current status of the user of the second computer system within the communication session.

46. The method according to claim 44 or 45, wherein displaying the virtual representation of the user of the second computer system in the third orientation within the three-dimensional environment includes discontinuing display of a representation of movement of the virtual representation of the user of the second computer system corresponding to the change in the current viewpoint of the user of the second computer system relative to the three-dimensional environment.

47. The method according to any one of claims 44 to 46, wherein displaying the virtual representation of the third type of the user of the second computer system includes: in accordance with a determination that the current viewpoint of the user of the first computer system has a first spatial arrangement relative to the virtual representation of the third type of the user of the second computer system, displaying a first surface of the virtual representation of the third type of the user of the second computer system oriented in a first direction relative to the current viewpoint of the user of the first computer system in the three-dimensional environment; and in accordance with a determination that the current viewpoint of the user of the first computer system has a second spatial arrangement, different from the first spatial arrangement, relative to the virtual representation of the third type of the user of the second computer system, displaying the first surface of the virtual representation of the third type of the user of the second computer system oriented in the first direction relative to the current viewpoint of the user of the first computer system in the three-dimensional environment.

48. The method according to any one of claims 44 to 47, wherein: displaying the first representation of the movement of the virtual representation of the user of the second computer system includes displaying the virtual representation of the user of the second computer system at a first height relative to the three-dimensional environment that corresponds to a height of the current viewpoint of the user of the second computer system relative to the three-dimensional environment; displaying the second representation of the movement of the virtual representation of the user of the second computer system includes displaying the virtual representation of the user of the second computer system at the first height relative to the three-dimensional environment that corresponds to the height of the current viewpoint of the user of the second computer system relative to the three-dimensional environment; and displaying the virtual representation of the user of the second computer system in the third orientation within the three-dimensional environment includes displaying the virtual representation of the user of the second computer system at a second height relative to the three-dimensional environment based on a height of the current viewpoint of the user of the first computer system relative to the three-dimensional environment.

49. The method according to any one of claims 44 to 48, wherein displaying the virtual representation of the user of the second computer system in the third orientation within the three-dimensional environment includes displaying the virtual representation of the user of the second computer system at a first height relative to the three-dimensional environment based on a height of the current viewpoint of the user of the first computer system relative to the three-dimensional environment, the method further comprising: while displaying the virtual representation of the user of the second computer system within the three-dimensional environment, displaying, in the three-dimensional environment, a virtual representation of a user of a third computer system in the communication session, including, in accordance with a determination that the virtual representation of the user of the third computer system is the third type of virtual representation, displaying the virtual representation of the user of the third computer system at the first height relative to the three-dimensional environment based on the height of the current viewpoint of the user of the first computer system relative to the three-dimensional environment.
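
Claims 44 to 49 describe a third type of representation whose orientation and height follow the local viewer rather than the remote user. A minimal sketch under stated assumptions: orientation is reduced to a yaw angle in the horizontal plane, "based on the height" is simplified to "equal to the height", and the names `facing_yaw_deg` and `display_height` are hypothetical.

```python
import math
from enum import Enum, auto


class RepresentationType(Enum):
    FIRST = auto()
    SECOND = auto()
    THIRD = auto()


def facing_yaw_deg(rep_xz: tuple[float, float],
                   viewer_xz: tuple[float, float]) -> float:
    """Yaw (degrees, 0 = +z axis) that turns the representation's first
    surface toward the local viewer, wherever the viewer stands."""
    dx = viewer_xz[0] - rep_xz[0]
    dz = viewer_xz[1] - rep_xz[1]
    return math.degrees(math.atan2(dx, dz))


def display_height(kind: RepresentationType, remote_viewpoint_height: float,
                   local_viewpoint_height: float) -> float:
    """First and second types track the remote user's viewpoint height;
    the third type uses the local viewer's height (a simplification)."""
    if kind is RepresentationType.THIRD:
        return local_viewpoint_height
    return remote_viewpoint_height
```

Because the yaw is recomputed from the viewer's position, the same surface faces the viewer in any spatial arrangement, which matches the two branches of claim 47.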

50. A first computer system that communicates with a display generation component and one or more input devices, the first computer system comprising: one or more processors; memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for: during a communication session with a second computer system: displaying, via the display generation component, at a first location within a three-dimensional environment, a virtual representation of a user of the second computer system that represents an orientation of a current viewpoint of the user of the second computer system relative to the three-dimensional environment; while displaying the virtual representation of the user of the second computer system at the first location within the three-dimensional environment, receiving, from the second computer system, an indication corresponding to the orientation of the current viewpoint of the user of the second computer system relative to the three-dimensional environment; and in response to receiving the indication: in accordance with a determination that the virtual representation of the user of the second computer system is a first type of virtual representation, displaying, in the three-dimensional environment, a first representation of movement of the virtual representation of the user of the second computer system corresponding to a change in the current viewpoint of the user of the second computer system from a first pose in the three-dimensional environment to a second pose in the three-dimensional environment; and in accordance with a determination that the virtual representation of the user of the second computer system is a second type of virtual representation different from the first type, displaying, in the three-dimensional environment, a second representation of the movement of the virtual representation of the user of the second computer system, different from the first representation, corresponding to the change in the current viewpoint of the user of the second computer system from the first pose in the three-dimensional environment to the second pose in the three-dimensional environment.

51. A non-transitory computer-readable storage medium storing one or more programs, wherein the one or more programs include instructions that, when executed by one or more processors of a first computer system communicating with a display generation component and one or more input devices, cause the first computer system to perform a method comprising: during a communication session with a second computer system: displaying, via the display generation component, at a first location within a three-dimensional environment, a virtual representation of a user of the second computer system that represents an orientation of a current viewpoint of the user of the second computer system relative to the three-dimensional environment; while displaying the virtual representation of the user of the second computer system at the first location within the three-dimensional environment, receiving, from the second computer system, an indication corresponding to the orientation of the current viewpoint of the user of the second computer system relative to the three-dimensional environment; and in response to receiving the indication: in accordance with a determination that the virtual representation of the user of the second computer system is a first type of virtual representation, displaying, in the three-dimensional environment, a first representation of movement of the virtual representation of the user of the second computer system corresponding to a change in the current viewpoint of the user of the second computer system from a first pose in the three-dimensional environment to a second pose in the three-dimensional environment; and in accordance with a determination that the virtual representation of the user of the second computer system is a second type of virtual representation different from the first type, displaying, in the three-dimensional environment, a second representation of the movement of the virtual representation of the user of the second computer system, different from the first representation, corresponding to the change in the current viewpoint of the user of the second computer system from the first pose in the three-dimensional environment to the second pose in the three-dimensional environment.

52. A first computer system that communicates with a display generation component and one or more input devices, the first computer system comprising: one or more processors; memory; means for displaying, during a communication session with a second computer system, via the display generation component, at a first location within a three-dimensional environment, a virtual representation of a user of the second computer system that represents an orientation of a current viewpoint of the user of the second computer system relative to the three-dimensional environment; means for receiving, while displaying the virtual representation of the user of the second computer system at the first location within the three-dimensional environment, an indication from the second computer system corresponding to the orientation of the current viewpoint of the user of the second computer system relative to the three-dimensional environment; and means for, in response to receiving the indication: in accordance with a determination that the virtual representation of the user of the second computer system is a first type of virtual representation, displaying, in the three-dimensional environment, a first representation of movement of the virtual representation of the user of the second computer system corresponding to a change in the current viewpoint of the user of the second computer system from a first pose in the three-dimensional environment to a second pose in the three-dimensional environment; and in accordance with a determination that the virtual representation of the user of the second computer system is a second type of virtual representation different from the first type, displaying, in the three-dimensional environment, a second representation of the movement of the virtual representation of the user of the second computer system, different from the first representation, corresponding to the change in the current viewpoint of the user of the second computer system from the first pose in the three-dimensional environment to the second pose in the three-dimensional environment.

53. A computer system that communicates with a display generation component and one or more input devices, the computer system comprising: one or more processors; memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing the method according to any one of claims 33 to 49.

54. A non-transitory computer-readable storage medium storing one or more programs, wherein the one or more programs include instructions that, when executed by one or more processors of a computer system communicating with a display generation component and one or more input devices, cause the computer system to perform the method according to any one of claims 33 to 49.

55. A computer system that communicates with a display generation component and one or more input devices, the computer system comprising: one or more processors; memory; and means for performing the method according to any one of claims 33 to 49.

56. A method comprising: at a first computer system that communicates with a display generation component and one or more input devices: during a communication session with one or more computer systems other than the first computer system: displaying, via the display generation component, a three-dimensional environment from a first viewpoint of a first user of the first computer system, wherein the three-dimensional environment includes one or more virtual objects, including one or more virtual representations of one or more users of the one or more computer systems; while displaying the three-dimensional environment from the first viewpoint of the first user, receiving, via the one or more input devices, a first input corresponding to a request to change a spatial arrangement of a first virtual object of the one or more virtual objects in the three-dimensional environment from a first spatial arrangement relative to the first viewpoint of the first user to a second spatial arrangement relative to the first viewpoint of the first user; and while receiving the first input: reducing a visual prominence of the one or more virtual representations of the one or more users relative to the three-dimensional environment; and changing the spatial arrangement of the first virtual object relative to the first viewpoint of the first user in accordance with the first input while the one or more virtual representations of the one or more users have the reduced visual prominence relative to the three-dimensional environment.

57. The method according to claim 56, wherein the first virtual object is shared content shared with one or more computer systems within the communication session.

58. The method according to claim 56 or 57, wherein the first virtual object is a virtual representation of one or more users.

59. The method according to any one of claims 56 to 58, wherein the one or more virtual objects include a plurality of virtual objects that have a shared spatial arrangement relative to each other, the method further comprising: while receiving the first input and changing the spatial arrangement of the first virtual object relative to the first viewpoint of the first user from the first spatial arrangement to the second spatial arrangement, maintaining the shared spatial arrangement of the plurality of virtual objects relative to each other.

60. The method according to any one of claims 56 to 59, wherein, in a view of the communication session from the perspective of a second user of the one or more users, while the first input is being received by the first computer system, a spatial arrangement of a virtual representation of the first user of the first computer system in a second three-dimensional environment changes in accordance with the changing spatial arrangement of the first virtual object relative to the first viewpoint of the first user.

61. The method according to any one of claims 56 to 60, wherein the first virtual object is a virtual representation of a user of the one or more users, the method further comprising, after receiving the first input: in accordance with a determination that the virtual representation of the user has a spatial arrangement relative to the first viewpoint of the first user that exceeds a spatial arrangement threshold, displaying the virtual representation of the user with a visual prominence greater than the reduced visual prominence relative to the three-dimensional environment; and in accordance with a determination that the virtual representation of the user has a spatial arrangement relative to the first viewpoint of the first user that does not exceed the spatial arrangement threshold, displaying the virtual representation of the user with the reduced visual prominence relative to the three-dimensional environment.

62. The method according to any one of claims 56 to 61, wherein, in a view of the communication session from the perspective of a second user of the one or more users, while the first input is being received by the first computer system, a visual prominence of a virtual representation of the first user of the first computer system is reduced relative to a second three-dimensional environment.

63. The method according to any one of claims 56 to 62, wherein reducing the visual prominence of the one or more virtual representations of the one or more users with respect to the three-dimensional environment includes increasing the transparency of the one or more virtual representations of the one or more users with respect to the three-dimensional environment.

64. The method according to any one of claims 56 to 62, wherein reducing the visual prominence of the one or more virtual representations of the one or more users in the three-dimensional environment includes discontinuing the display of the one or more virtual representations of the one or more users in the three-dimensional environment.

65. The method according to any one of claims 56 to 62, wherein reducing the visual prominence of the one or more virtual representations of the one or more users in the three-dimensional environment includes changing the display of the one or more virtual representations of the one or more users from one or more virtual representations of a first type to one or more virtual representations of a second type.

66. The method according to any one of claims 56 to 65, wherein reducing the visual prominence of one or more virtual representations of one or more users with respect to the three-dimensional environment includes reducing the visual prominence of multiple virtual representations of multiple users with respect to the three-dimensional environment.

67. The method according to any one of claims 56 to 66, further comprising: detecting an end of the first input; and in response to detecting the end of the first input, increasing the visual prominence of the one or more virtual representations of the one or more users to an amount of visual prominence greater than the reduced visual prominence.

68. The method according to any one of claims 56 to 67, further comprising, in response to detecting an end of the first input, increasing the visual prominence of multiple virtual representations of multiple users to an amount of visual prominence greater than the reduced visual prominence.

69. The method according to any one of claims 56 to 68, wherein the first virtual object is shared content shared with the one or more computer systems within the communication session, the method further comprising: after receiving the first input, receiving a second input corresponding to a request to change a spatial arrangement of a second virtual object of the one or more virtual objects from a third spatial arrangement relative to the first viewpoint of the first user to a fourth spatial arrangement relative to the first viewpoint of the first user; and while receiving the second input, in accordance with a determination that the second virtual object is not shared with the one or more computer systems within the communication session: maintaining the visual prominence of the one or more virtual representations of the one or more users relative to the three-dimensional environment; and changing the spatial arrangement of the second virtual object relative to the first viewpoint of the first user in accordance with the second input while the one or more virtual representations of the one or more users have the maintained visual prominence relative to the three-dimensional environment.
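
The prominence handling of claims 56, 57, 63, 67, and 69 can be sketched as below. This is an illustrative simplification, not the claimed implementation: visual prominence is modeled as a single opacity value (claim 63 mentions transparency as one option), the `REDUCED_OPACITY` constant is arbitrary, and the `Representation`, `begin_move`, and `end_move` names are hypothetical.

```python
from dataclasses import dataclass

REDUCED_OPACITY = 0.3  # assumed value; the claims do not specify an amount
FULL_OPACITY = 1.0


@dataclass
class Representation:
    """Hypothetical virtual representation of a session participant."""
    name: str
    opacity: float = FULL_OPACITY


def begin_move(representations: list[Representation],
               object_is_shared: bool) -> None:
    """Called while a move input is being received.

    Moving shared content reduces the prominence of participant
    representations; moving non-shared content leaves it unchanged.
    """
    if object_is_shared:
        for rep in representations:
            rep.opacity = REDUCED_OPACITY


def end_move(representations: list[Representation]) -> None:
    """Called when the end of the move input is detected: restore prominence."""
    for rep in representations:
        rep.opacity = FULL_OPACITY
```

A caller would invoke `begin_move` when the drag starts and `end_move` when it ends. The case of claim 58, where the moved object is itself a participant representation, is not modeled here.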

70. The method according to any one of claims 56 to 69, wherein the first input includes an air gesture performed by the first user.

71. The method according to any one of claims 56 to 70, wherein the first input includes the attention of the first user directed to a first location in the three-dimensional environment associated with the first virtual object.

72. The method according to claim 71, wherein the first location in the three-dimensional environment corresponds to a region of the three-dimensional environment outside the first virtual object, having a predetermined spatial relationship with respect to the first virtual object in the three-dimensional environment.

73. The method according to claim 71 or 72, further comprising displaying first visual feedback in the three-dimensional environment indicating that the spatial arrangement of the first virtual object relative to the first viewpoint of the first user may be changed in response to further input, in response to detection of the attention of the first user directed to the first location in the three-dimensional environment.

74. The method according to claim 73, wherein the first visual feedback includes a second virtual object displayed on the surface in the three-dimensional environment.

75. The method according to claim 73 or 74, wherein the first visual feedback includes a virtual representation of the three-dimensional environment, the first visual feedback including one or more virtual elements corresponding to one or more current spatial arrangements of one or more virtual objects in the three-dimensional environment.

76. The method according to claim 75, further comprising changing the visual appearance of one or more virtual elements included in the virtual representation of the three-dimensional environment while simultaneously changing the spatial arrangement of the first virtual object with respect to the first viewpoint of the first user while receiving the first input.

77. The method according to any one of claims 56 to 76, further comprising: while the three-dimensional environment is being displayed from the first viewpoint of the first user, and before the first input is received, displaying the three-dimensional environment from the first viewpoint of the first user in a first visual appearance, wherein the first visual appearance is not based on the first spatial arrangement of the first virtual object; and while receiving the first input, displaying the three-dimensional environment from the first viewpoint of the first user in a second visual appearance different from the first visual appearance, wherein the second visual appearance is independent of the second spatial arrangement of the first virtual object.

78. The method according to claim 77, wherein displaying the three-dimensional environment with the second visual appearance includes displaying the three-dimensional environment with reduced brightness compared to displaying the three-dimensional environment with the first visual appearance.

79. The method according to claim 77 or 78, wherein displaying the three-dimensional environment in the second visual appearance includes displaying one or more virtual objects included in the three-dimensional environment in the second visual appearance.

80. The method according to any one of claims 77 to 79, wherein displaying the three-dimensional environment in the second visual appearance includes displaying one or more representations of one or more objects in the first user's physical environment in the second visual appearance.

81. The method according to any one of claims 77 to 80, wherein displaying the three-dimensional environment in the second visual appearance includes displaying a boundary around at least a first portion of the one or more virtual objects.

82. The method according to claim 81, wherein the boundary is displayed in the three-dimensional environment according to the determination that at least the first portion of the one or more virtual objects is shared content shared with the one or more computer systems within the communication session.

83. The method according to claim 81 or 82, wherein the boundary is displayed relative to the surface in the three-dimensional environment.

84. The method according to any one of claims 81 to 83, wherein: in accordance with at least the first portion of the one or more virtual objects in the three-dimensional environment having a first shared spatial arrangement, the boundary is displayed with a first size based on the first shared spatial arrangement; and in accordance with at least the first portion of the one or more virtual objects in the three-dimensional environment having a second shared spatial arrangement different from the first shared spatial arrangement, the boundary is displayed with a second size, different from the first size, based on the second shared spatial arrangement.

85. The method according to any one of claims 81 to 84, wherein displaying the boundary around the first portion of the one or more virtual objects includes displaying the boundary around one or more locations in the three-dimensional environment that correspond to one or more available viewpoint locations for participants of the communication session that correspond to the shared content in the three-dimensional environment.

86. The method according to any one of claims 82 to 85, wherein displaying the boundary around the first portion of the one or more virtual objects includes: displaying a first portion of the boundary, which is displayed at a first distance from the first viewpoint of the first user in the three-dimensional environment, with a first visual prominence; and displaying a second portion of the boundary, which is displayed at a second distance greater than the first distance from the first viewpoint of the first user in the three-dimensional environment, with a second visual prominence less than the first visual prominence.
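
One way to picture the distance-dependent prominence of claim 86 is a falloff function evaluated per boundary point. The sketch below is purely illustrative: the linear falloff, the `near` and `far` distances, and the minimum prominence are assumptions, since the claim only requires that the farther portion be less prominent than the nearer one.

```python
import math


def boundary_prominence(point, viewpoint, near=1.0, far=5.0, min_prominence=0.2):
    """Return a prominence in [min_prominence, 1.0] that falls off with distance.

    `point` and `viewpoint` are (x, y, z) tuples. All thresholds are
    hypothetical values chosen for illustration.
    """
    distance = math.dist(point, viewpoint)
    if distance <= near:
        return 1.0
    if distance >= far:
        return min_prominence
    # Linear interpolation between full prominence at `near`
    # and `min_prominence` at `far`.
    t = (distance - near) / (far - near)
    return 1.0 - t * (1.0 - min_prominence)
```

With these assumed defaults, a boundary point 1 unit from the viewpoint is drawn at full prominence and a point 3 units away at a lower prominence, which matches the ordering the claim requires.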

87. The method according to any one of claims 82 to 86, further comprising: while changing the spatial arrangement of the first virtual object with respect to the first viewpoint of the first user, changing the distance of the boundary from the first viewpoint of the first user in accordance with the change in the spatial arrangement of the first virtual object with respect to the first viewpoint of the first user; and in response to changing the distance of the boundary from the first viewpoint of the first user, changing the visual prominence of the boundary with respect to the three-dimensional environment.

88. The method according to any one of claims 82 to 87, further comprising: while changing the spatial arrangement of the first virtual object with respect to the first viewpoint of the first user, changing the spatial arrangement between the boundary and an object visible in the three-dimensional environment; and while changing the spatial arrangement between the boundary and the object visible in the three-dimensional environment, in accordance with a determination that at least a portion of the boundary has a spatial conflict with the object with respect to the first viewpoint of the first user, changing the visual prominence of at least the portion of the boundary.

89. The method according to any one of claims 82 to 88, wherein displaying the boundary around the first portion of the one or more virtual objects includes: displaying a first portion of the boundary around a first region of the three-dimensional environment that includes one or more locations corresponding to at least the first portion of the one or more virtual objects, wherein the first region of the three-dimensional environment is at a first distance from the first viewpoint of the user; and displaying a second portion of the boundary around a second region of the three-dimensional environment that does not include the one or more locations corresponding to at least the first portion of the one or more virtual objects, wherein the second region of the three-dimensional environment is at a second distance less than the first distance from the first viewpoint of the user.

90. The method according to any one of claims 82 to 89, wherein displaying the boundary around at least the first portion of the one or more virtual objects includes displaying a first portion of the outer perimeter of the boundary perpendicular to a vector extending from a location in the three-dimensional environment corresponding to the first viewpoint of the first user to a location in the three-dimensional environment corresponding to the center of the boundary.

91. The method according to any one of claims 82 to 90, wherein displaying the boundary around at least the first portion of the one or more virtual objects includes: in accordance with a determination that at least the first portion of the one or more virtual objects includes shared content shared with the one or more computer systems within the communication session, displaying the boundary in a first orientation in the three-dimensional environment based on the orientation of the shared content in the three-dimensional environment; and in accordance with a determination that at least the first portion of the one or more virtual objects does not include content shared with the one or more computer systems within the communication session, displaying the boundary in a second orientation different from the first orientation, including displaying a first portion of the outer perimeter of the boundary perpendicular to a vector extending from a location in the three-dimensional environment corresponding to the first viewpoint of the first user to a location in the three-dimensional environment corresponding to the center of the boundary.

92. The method according to any one of claims 77 to 91, further comprising, in response to detecting the end of the first input, displaying the three-dimensional environment via the display generation component in the first visual appearance from the first viewpoint of the first user.

93. The method according to any one of claims 56 to 92, wherein changing the spatial arrangement of the first virtual object relative to the first viewpoint of the first user according to the first input includes: in accordance with a determination that the first input includes input from a first portion of the first user but does not include input from a second portion of the first user, pivoting the first virtual object and a first portion of the one or more virtual objects shared with the one or more computer systems within the communication session around a first location in the three-dimensional environment corresponding to the first viewpoint of the first user.

94. The method according to claim 93, wherein changing the spatial arrangement of the first virtual object relative to the first viewpoint of the first user according to the first input includes: in accordance with a determination that the first input includes input from the first portion of the first user and the second portion of the first user, pivoting the first virtual object and the first portion of the one or more virtual objects around a respective location in the three-dimensional environment that is different from the first location, wherein: in accordance with a determination that the first input is directed to the first virtual object, the respective location corresponds to a second location in the three-dimensional environment associated with the first virtual object; and in accordance with a determination that the first input is directed to a second virtual object among the one or more virtual objects, the respective location corresponds to a third location in the three-dimensional environment associated with the second virtual object.

95. The method according to any one of claims 56 to 94, wherein the first virtual object is a virtual object among the one or more virtual objects that is different from the one or more virtual representations of the one or more users, and the first input is directed to the first virtual object.

96. The method according to any one of claims 56 to 95, wherein the first virtual object is a first virtual representation of the one or more virtual representations of the one or more users, and the first input is directed to the first virtual representation.

97. The method according to claim 96, further comprising, while receiving the first input, displaying, with a respective spatial arrangement relative to the first virtual object in the three-dimensional environment, a visual indication corresponding to the request to change the spatial arrangement of the first virtual object in the three-dimensional environment.

98. The method according to claim 97, wherein the visual indication is displayed on a surface that is visible in the three-dimensional environment below the first virtual object with respect to the first viewpoint of the first user.

99. The method according to claim 97 or 98, further comprising, while receiving the first input corresponding to the request to change the spatial arrangement of the first virtual object in the three-dimensional environment, forgoing display, with respective spatial arrangements relative to one or more virtual representations of one or more users different from the first virtual representation, of a visual indication corresponding to a request to change the spatial arrangement of a virtual object in the three-dimensional environment.

100. The method according to any one of claims 56 to 99, further comprising: while the three-dimensional environment is being displayed from the first viewpoint of the first user, receiving, via the one or more input devices, a second input corresponding to a request to change the spatial arrangement of the first virtual object among the one or more virtual objects in the three-dimensional environment relative to the first viewpoint of the first user; and in response to detecting the second input: in accordance with a determination that the first virtual object is a virtual representation of the one or more virtual representations of the one or more users and that the second input satisfies one or more first criteria, including a criterion that is satisfied when the first input includes a first air gesture, changing the spatial arrangement of the first virtual object relative to the first viewpoint of the first user according to the second input; in accordance with a determination that the first virtual object is shared content shared with the one or more computer systems within the communication session, is not a virtual representation of a user, and that the first input satisfies the one or more first criteria, changing the spatial arrangement of the first virtual object relative to the first viewpoint of the first user according to the second input; in accordance with a determination that the first virtual object is shared content shared with the one or more computer systems within the communication session, is not a virtual representation of a user, and that the first input satisfies one or more second criteria different from the one or more first criteria, including a criterion that is satisfied when the first input includes a second air gesture different from the first air gesture, changing the spatial arrangement of the first virtual object relative to the first viewpoint of the first user according to the second input; and in accordance with a determination that the first virtual object is a virtual representation of the one or more virtual representations of the one or more users and that the first input satisfies the one or more second criteria, forgoing changing the spatial arrangement of the first virtual object with respect to the first viewpoint of the user according to the second input.

101. The method according to any one of claims 56 to 100, wherein changing the spatial arrangement of the first virtual object includes: in accordance with a determination that the first input is directed to a respective virtual representation of the one or more virtual representations of the one or more users, pivoting the first virtual object and a first portion of the one or more virtual objects shared with the one or more computer systems within the communication session around a first location in the three-dimensional environment corresponding to the first viewpoint of the first user, using a first pivot radius that extends from the first viewpoint of the user to the respective virtual representation; and in accordance with a determination that the first input is directed to a respective virtual object, among the one or more virtual objects shared with the one or more computer systems within the communication session, that is different from the one or more virtual representations of the one or more users, pivoting the first virtual object and the first portion of the one or more virtual objects around the first location, using a second pivot radius extending from the first viewpoint of the user to the respective virtual object.
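
The pivot of claim 101 can be illustrated geometrically. The sketch below rotates shared content about a vertical axis through the viewpoint and takes the pivot radius as the horizontal distance from the viewpoint to whatever the input targets. The mapping from a drag length to an angle (`arc_length / radius`), the choice of a vertical axis, and all function names are assumptions of this sketch, not details stated in the claims.

```python
import math


def pivot_radius(viewpoint, target):
    """Horizontal distance from the viewpoint to the object the input targets."""
    return math.hypot(target[0] - viewpoint[0], target[2] - viewpoint[2])


def pivot_about_viewpoint(points, viewpoint, angle):
    """Rotate each (x, y, z) point about the vertical axis through the viewpoint."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    rotated = []
    for x, y, z in points:
        dx, dz = x - viewpoint[0], z - viewpoint[2]
        rotated.append((viewpoint[0] + dx * cos_a + dz * sin_a,
                        y,
                        viewpoint[2] - dx * sin_a + dz * cos_a))
    return rotated


def pivot_shared_content(shared_points, viewpoint, target, arc_length):
    """Pivot shared content so the targeted object travels `arc_length` along
    a circle whose radius is the pivot radius (assumed drag-to-angle mapping)."""
    radius = pivot_radius(viewpoint, target)
    if radius == 0.0:
        return list(shared_points)  # target coincides with the viewpoint
    return pivot_about_viewpoint(shared_points, viewpoint, arc_length / radius)
```

Under this assumed mapping, the same drag produces a smaller rotation when the targeted item is farther from the viewpoint, which is one reason a per-target pivot radius could matter.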

102. The method according to any one of claims 56 to 101, further comprising, while receiving the first input, in accordance with a determination that the first virtual object is a virtual representation of the one or more virtual representations of the one or more users, displaying a boundary around the one or more virtual representations of the one or more users in the three-dimensional environment, wherein: in accordance with a determination that the spatial distribution of one or more locations in the three-dimensional environment corresponding to one or more current viewpoints of the one or more users in the three-dimensional environment is a first spatial distribution, the boundary has a first size; and in accordance with a determination that the spatial distribution of the one or more locations in the three-dimensional environment corresponding to the one or more current viewpoints of the one or more users in the three-dimensional environment is a second spatial distribution different from the first spatial distribution, the boundary has a second size different from the first size.
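
Claim 102 ties the boundary size to how participant viewpoints are distributed. A minimal sketch of one such dependency follows, assuming a circular boundary on the floor plane centered on the centroid of the viewpoints; the circular shape, the centroid choice, and the `margin` value are all hypothetical.

```python
import math


def boundary_for_viewpoints(viewpoints, margin=0.5):
    """Return (center, radius) of a circular floor boundary around (x, z) viewpoints.

    The radius is the largest distance from the centroid to any viewpoint
    plus an assumed margin, so a wider distribution yields a larger boundary.
    """
    if not viewpoints:
        raise ValueError("at least one viewpoint is required")
    cx = sum(x for x, _ in viewpoints) / len(viewpoints)
    cz = sum(z for _, z in viewpoints) / len(viewpoints)
    spread = max(math.hypot(x - cx, z - cz) for x, z in viewpoints)
    return (cx, cz), spread + margin
```

Two participants standing 1 unit apart would get a smaller boundary than two standing 4 units apart, mirroring the first-size and second-size cases in the claim.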

103. The method according to any one of claims 56 to 102, wherein the first virtual object is a virtual representation of a second user of a second computer system among the one or more computer systems, the method further comprising, while receiving the first input: receiving, from the second computer system, information corresponding to an updated orientation of the current viewpoint of the second user relative to the three-dimensional environment; and in response to receiving the information from the second computer system, changing the spatial arrangement of the first virtual object with respect to the first viewpoint of the first user in accordance with the first input and in accordance with the information received from the second computer system.

104. The method according to any one of claims 56 to 102, wherein the first virtual object is a virtual representation of a second user of a second computer system among the one or more computer systems, the method further comprising: while receiving the first input, changing the spatial arrangement of the first virtual object relative to the first viewpoint of the first user according to the first input, without regard to information corresponding to an updated orientation of the current viewpoint of the second user relative to the three-dimensional environment; and after receiving the first input, changing the spatial arrangement of the first virtual object relative to the first viewpoint of the first user in accordance with the information corresponding to the updated orientation of the current viewpoint of the second user relative to the three-dimensional environment.

105. A computer system that communicates with a display generation component and one or more input devices, the computer system comprising: one or more processors; memory; and one or more programs, wherein the one or more programs are stored in the memory and executed by the one or more processors, the one or more programs including instructions for: during a communication session with one or more computer systems other than the computer system, displaying, via the display generation component, a three-dimensional environment from a first viewpoint of a first user of the first computer system, wherein the three-dimensional environment includes one or more virtual objects, each including one or more virtual representations of one or more users of the one or more computer systems; while the three-dimensional environment is being displayed from the first viewpoint of the first user, receiving, via the one or more input devices, a first input corresponding to a request to change the spatial arrangement of a first virtual object among the one or more virtual objects in the three-dimensional environment from a first spatial arrangement relative to the first viewpoint of the first user to a second spatial arrangement relative to the first viewpoint of the first user; and while receiving the first input: reducing the visual prominence of the one or more virtual representations of the one or more users with respect to the three-dimensional environment; and changing the spatial arrangement of the first virtual object relative to the first viewpoint of the first user in accordance with the first input, while the one or more virtual representations of the one or more users have the reduced visual prominence with respect to the three-dimensional environment.

106. A non-transitory computer-readable storage medium storing one or more programs, wherein the one or more programs include instructions that, when executed by one or more processors of a computer system communicating with a display generation component and one or more input devices, cause the computer system to perform a method comprising: during a communication session with one or more computer systems other than the computer system, displaying, via the display generation component, a three-dimensional environment from a first viewpoint of a first user of the first computer system, wherein the three-dimensional environment includes one or more virtual objects, each including one or more virtual representations of one or more users of the one or more computer systems; while the three-dimensional environment is being displayed from the first viewpoint of the first user, receiving, via the one or more input devices, a first input corresponding to a request to change the spatial arrangement of a first virtual object among the one or more virtual objects in the three-dimensional environment from a first spatial arrangement relative to the first viewpoint of the first user to a second spatial arrangement relative to the first viewpoint of the first user; and while receiving the first input: reducing the visual prominence of the one or more virtual representations of the one or more users with respect to the three-dimensional environment; and changing the spatial arrangement of the first virtual object relative to the first viewpoint of the first user in accordance with the first input, while the one or more virtual representations of the one or more users have the reduced visual prominence with respect to the three-dimensional environment.

107. A computer system that communicates with a display generation component and one or more input devices, the computer system comprising: one or more processors; memory; means for displaying, during a communication session with one or more computer systems other than the computer system and via the display generation component, a three-dimensional environment from a first viewpoint of a first user of the first computer system, wherein the three-dimensional environment includes one or more virtual objects, each including one or more virtual representations of one or more users of the one or more computer systems; means for receiving, while the three-dimensional environment is being displayed from the first viewpoint of the first user and via the one or more input devices, a first input corresponding to a request to change the spatial arrangement of a first virtual object among the one or more virtual objects in the three-dimensional environment from a first spatial arrangement with respect to the first viewpoint of the first user to a second spatial arrangement with respect to the first viewpoint of the first user; and means for, while receiving the first input: reducing the visual prominence of the one or more virtual representations of the one or more users with respect to the three-dimensional environment; and changing the spatial arrangement of the first virtual object with respect to the first viewpoint of the first user in accordance with the first input, while the one or more virtual representations of the one or more users have the reduced visual prominence with respect to the three-dimensional environment.

108. A computer system that communicates with a display generation component and one or more input devices, the computer system comprising: one or more processors; memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs include instructions for performing the method according to any one of claims 56 to 104.

109. A non-transitory computer-readable storage medium storing one or more programs, wherein the one or more programs include instructions that, when executed by one or more processors of a computer system communicating with a display generation component and one or more input devices, cause the computer system to perform the method according to any one of claims 56 to 104.

110. A computer system that communicates with a display generation component and one or more input devices, the computer system comprising: one or more processors; memory; and means for performing the method according to any one of claims 56 to 104.

111. A method comprising: at a first computer system communicating with a display generation component and one or more input devices: during a communication session with one or more computer systems other than the first computer system, displaying, via the display generation component, a three-dimensional environment including a first virtual object; while the three-dimensional environment including the first virtual object is displayed with the first virtual object at a first location relative to a first viewpoint of a first user of the first computer system, detecting, via the one or more input devices, a first input corresponding to a request to move the first virtual object from the first location relative to the first viewpoint of the first user in the three-dimensional environment to a second location different from the first location; and while the first input is being detected: in accordance with a determination that the first virtual object is shared with the one or more computer systems within the communication session, displaying a first visual feedback in the three-dimensional environment while moving the first virtual object from the first location to the second location relative to the first viewpoint of the first user in the three-dimensional environment; and in accordance with a determination that the first virtual object is not shared with the one or more computer systems within the communication session, displaying a second visual feedback, different from the first visual feedback, in the three-dimensional environment while moving the first virtual object from the first location to the second location relative to the first viewpoint of the first user in the three-dimensional environment.

112. The method according to claim 111, wherein displaying the first visual feedback within the three-dimensional environment includes changing the visual appearance of the three-dimensional environment outside the first virtual object, and displaying the second visual feedback within the three-dimensional environment does not include changing the visual appearance of the three-dimensional environment outside the first virtual object.

113. The method according to claim 112, wherein changing the visual appearance of the three-dimensional environment includes changing the visual appearance of one or more virtual objects, including the first virtual object, that are displayed within the three-dimensional environment.

114. The method according to claim 112 or 113, wherein changing the visual appearance of the three-dimensional environment includes changing the visual appearance of one or more parts of the first user's physical environment that are visible within the three-dimensional environment.

115. The method according to any one of claims 111 to 114, further comprising, while the three-dimensional environment including the first virtual object is being displayed, and before the first input is detected: in accordance with a determination that the first virtual object is not shared with the one or more computer systems within the communication session, displaying, together with the first virtual object, a first virtual element that is selectable to move the first virtual object relative to the first viewpoint of the first user in the three-dimensional environment; and in accordance with a determination that the first virtual object is shared with the one or more computer systems within the communication session, forgoing displaying the first virtual element together with the first virtual object.

116. The method according to claim 115, wherein displaying the first virtual element together with the first virtual object includes displaying the first virtual element together with the first virtual object independently of whether the attention of the first user is directed to a location associated with the first virtual object in the three-dimensional environment.

117. The method according to any one of claims 111 to 116, further comprising, while the three-dimensional environment including the first virtual object is being displayed, and before the first input is detected: in accordance with a determination that the first virtual object is shared with the one or more computer systems within the communication session, displaying, together with the first virtual object, a first virtual element that is selectable to move one or more virtual objects, including the first virtual object, relative to the first viewpoint of the first user in the three-dimensional environment; and in accordance with a determination that the first virtual object is not shared with the one or more computer systems within the communication session, forgoing displaying the first virtual element together with the first virtual object.

118. The method according to claim 117, wherein displaying the first virtual element together with the first virtual object includes displaying the first virtual element together with the first virtual object when the attention of the first user is directed to a location associated with the first virtual object in the three-dimensional environment, and not displaying the first virtual element together with the first virtual object when the attention of the first user is not directed to the location associated with the first virtual object in the three-dimensional environment.

119. The method according to any one of claims 111 to 118, wherein: displaying the first visual feedback in the three-dimensional environment includes displaying, in accordance with the first input, movement of one or more virtual objects different from the first virtual object relative to the first viewpoint of the first user in the three-dimensional environment; and displaying the second visual feedback in the three-dimensional environment includes displaying, in accordance with the first input, movement of the first virtual object relative to the first viewpoint of the first user in the three-dimensional environment, without displaying movement of the one or more virtual objects other than the first virtual object relative to the first viewpoint of the first user in the three-dimensional environment.

120. The method according to any one of claims 111 to 119, further comprising: while displaying, via the display generation component, a virtual representation of a second user of a second computer system among the one or more computer systems at a third location relative to the first viewpoint of the first user in the three-dimensional environment, detecting, via the one or more input devices, a second input corresponding to a request to move the virtual representation of the second user from the third location to a fourth location, different from the third location, relative to the first viewpoint of the first user in the three-dimensional environment; and in response to detecting the second input, displaying the first visual feedback in the three-dimensional environment while moving the virtual representation of the second user from the third location to the fourth location relative to the first viewpoint of the first user in the three-dimensional environment.

121. The method according to any one of claims 111 to 120, wherein displaying the first visual feedback within the three-dimensional environment includes reducing the visual prominence of one or more virtual representations of one or more users of the one or more computer systems displayed within the three-dimensional environment.

122. The method according to any one of claims 111 to 121, further comprising displaying a first virtual element that can be selected to move the first virtual object relative to the first viewpoint of the first user in the three-dimensional environment while the three-dimensional environment including the first virtual object is being displayed, wherein the first input includes an input directed to the first virtual element.

123. The method according to any one of claims 111 to 122, further comprising, while the three-dimensional environment including the first virtual object is being displayed: in accordance with a determination that the first virtual object is not shared with the one or more computer systems within the communication session, displaying, together with the first virtual object, a selectable option that is selectable to cease displaying the first virtual object in the three-dimensional environment; and in accordance with a determination that the first virtual object is shared with the one or more computer systems within the communication session, forgoing display, together with the first virtual object, of the selectable option that is selectable to cease displaying the first virtual object in the three-dimensional environment.

124. The method according to any one of claims 111 to 123, further comprising, while the three-dimensional environment including the first virtual object is being displayed: in accordance with a determination that the first virtual object is not shared with the one or more computer systems within the communication session, displaying, together with the first virtual object, a selectable option that is selectable to change the size of the first virtual object relative to the three-dimensional environment; and in accordance with a determination that the first virtual object is shared with the one or more computer systems within the communication session, forgoing display, together with the first virtual object, of the selectable option that is selectable to change the size of the first virtual object relative to the three-dimensional environment.

125. The method according to any one of claims 111 to 124, further comprising, while the first input is being detected: in accordance with a determination that the first virtual object is not shared with the one or more computer systems within the communication session, permitting movement of the first virtual object in a plurality of dimensions, including a respective dimension, relative to the first viewpoint of the first user in the three-dimensional environment; and in accordance with a determination that the first virtual object is shared with the one or more computer systems within the communication session, permitting movement of the first virtual object in a plurality of dimensions, not including the respective dimension, relative to the first viewpoint of the first user in the three-dimensional environment.

126. The method according to claim 125, wherein the respective dimension is a vertical dimension relative to the first viewpoint of the first user in the three-dimensional environment.
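
As a rough illustration of claims 125 and 126, the sketch below filters a requested movement so that a shared object cannot move along one dimension, taken here to be the vertical one as in claim 126. The tuple layout (x, y, z) with y as the vertical axis and the function name `constrain_move` are assumptions made for the example, not part of the claims.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def constrain_move(delta: Vec3, is_shared: bool) -> Vec3:
    """Return the movement that is actually applied to the object.

    Assumption: index 1 (y) is the vertical dimension relative to the viewpoint.
    """
    if is_shared:
        # Shared object: movement is permitted in x and z only.
        return (delta[0], 0.0, delta[2])
    # Unshared object: movement is permitted in all three dimensions.
    return delta
```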

127. The method according to any one of claims 111 to 126, wherein: displaying the second visual feedback in the three-dimensional environment includes changing the size of the first virtual object relative to the three-dimensional environment based on the distance of the first virtual object from the first viewpoint of the first user; and displaying the first visual feedback in the three-dimensional environment does not include changing the size of the first virtual object relative to the three-dimensional environment based on the distance of the first virtual object from the first viewpoint of the first user.
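
Claim 127 does not state how the size depends on distance. One plausible reading, sketched below purely as an assumption, scales an unshared object in proportion to its distance so that it keeps roughly the same apparent size, while a shared object keeps its size. The name `size_during_move` and the proportional rule are hypothetical.

```python
def size_during_move(
    base_size: float, base_distance: float, new_distance: float, is_shared: bool
) -> float:
    """Return the object's size in the environment while it is being moved."""
    if is_shared:
        # First visual feedback: no distance-based resizing.
        return base_size
    if base_distance <= 0.0:
        raise ValueError("base_distance must be positive")
    # Second visual feedback (assumed rule): size grows linearly with distance.
    return base_size * (new_distance / base_distance)
```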

128. The method according to any one of claims 111 to 127, further comprising, while the three-dimensional environment including the first virtual object is being displayed: in accordance with a determination that the first virtual object is shared with the one or more computer systems within the communication session, displaying, together with the first virtual object, a visual indication that the first virtual object is shared with the one or more computer systems within the communication session; and in accordance with a determination that the first virtual object is not shared with the one or more computer systems within the communication session, forgoing display of the visual indication together with the first virtual object.

129. The method according to claim 128, wherein the first virtual object is shared with the one or more computer systems within the communication session, the method further comprising: while displaying the three-dimensional environment including the first virtual object with the visual indication, detecting a second input corresponding to selection of the visual indication displayed together with the first virtual object; and in response to detecting the second input, ceasing to share the first virtual object with the one or more computer systems within the communication session.

130. The method according to any one of claims 111 to 129, further comprising displaying a second virtual object in the three-dimensional environment simultaneously with the first virtual object while the three-dimensional environment including the first virtual object is being displayed, wherein the first virtual object is shared with the one or more computer systems within the communication session, and the second virtual object is not shared with the one or more computer systems within the communication session.

131. The method according to claim 130, wherein displaying the first visual feedback includes displaying the movement of the second virtual object relative to the first viewpoint of the first user in the three-dimensional environment in accordance with the first input.

132. The method according to claim 130 or 131, wherein displaying the second visual feedback does not include displaying the movement of the first virtual object relative to the first viewpoint of the first user in the three-dimensional environment in accordance with the first input.

133. The method according to any one of claims 111 to 132, further comprising, while moving the first virtual object away from the first location relative to the first viewpoint of the first user in the three-dimensional environment in accordance with the first input: in accordance with a determination that the location of the first virtual object corresponds to a movement limit in the three-dimensional environment, and in response to detecting further movement input to move the first virtual object beyond the movement limit, ceasing movement of the first virtual object beyond the movement limit.

134. The method according to any one of claims 111 to 132, further comprising: while moving the first virtual object away from the first location relative to the first viewpoint of the first user in the three-dimensional environment in accordance with the first input, and in accordance with a determination that the location of the first virtual object corresponds to a movement limit in the three-dimensional environment, moving the first virtual object beyond the movement limit in response to detecting further movement input to move the first virtual object beyond the movement limit; after moving the first virtual object beyond the movement limit, detecting an end of the first input; and in response to detecting the end of the first input while the first virtual object is beyond the movement limit, displaying the first virtual object at a location within the movement limit.
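
Claims 133 and 134 describe two alternative behaviors at a movement limit: stopping at the limit, or allowing the object past it and returning it inside the limit when the input ends. The sketch below models the limit as a maximum distance from the viewpoint, which is an assumption; the claims do not say what shape the limit has. All function names are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def clamp_to_limit(pos: Vec3, limit: float) -> Vec3:
    """Pull `pos` (relative to the viewpoint) back onto a sphere of radius `limit`."""
    distance = math.sqrt(sum(c * c for c in pos))
    if distance <= limit or distance == 0.0:
        return pos
    scale = limit / distance
    return (pos[0] * scale, pos[1] * scale, pos[2] * scale)


def position_during_drag(requested: Vec3, limit: float, hard_stop: bool) -> Vec3:
    """Claim 133 style (hard_stop=True) stops at the limit; claim 134 style overshoots."""
    return clamp_to_limit(requested, limit) if hard_stop else requested


def position_on_release(current: Vec3, limit: float) -> Vec3:
    """Claim 134 style: when the input ends, show the object within the limit."""
    return clamp_to_limit(current, limit)
```

With a limit of 5.0, a requested position of (6.0, 0.0, 8.0) is held at (3.0, 0.0, 4.0) under the hard-stop variant, whereas the overshoot variant follows the input to (6.0, 0.0, 8.0) and only returns to (3.0, 0.0, 4.0) when the input ends.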

135. The method according to any one of claims 111 to 134, further comprising, while detecting the first input, and in accordance with the determination that the first virtual object is shared with the one or more computer systems within the communication session: in accordance with a determination that the movement of the first virtual object to the communication session is permitted, the first virtual object moves relative to a second three-dimensional environment, in a view of the communication session from the perspective of a second user of a second computer system among the one or more computer systems, in accordance with the movement of the first virtual object from the first location to the second location relative to the first viewpoint of the first user in the three-dimensional environment; and in accordance with a determination that the movement of the first virtual object to the communication session is not permitted, the first virtual object does not move relative to the second three-dimensional environment, in the view of the communication session from the perspective of the second user of the second computer system, in accordance with the movement of the first virtual object from the first location to the second location relative to the first viewpoint of the first user in the three-dimensional environment.

136. The method according to claim 135, further comprising: while displaying the three-dimensional environment including the first virtual object, displaying, in the three-dimensional environment, a virtual element that is selectable to change a current status of the virtual element; and while detecting the first input, and in accordance with the determination that the first virtual object is shared with the one or more computer systems within the communication session: in accordance with the current status of the virtual element being a first status, permitting the movement of the first virtual object to the communication session; and in accordance with the current status of the virtual element being a second status different from the first status, forgoing permitting the movement of the first virtual object to the communication session.

137. The method according to claim 135, further comprising, while detecting the first input, and in accordance with the determination that the first virtual object is shared with the one or more computer systems within the communication session: in accordance with the first input corresponding to a first air gesture, permitting the movement of the first virtual object to the communication session; and in accordance with the first input corresponding to a second air gesture different from the first air gesture, forgoing permitting the movement of the first virtual object to the communication session.

138. A computer system in communication with a display generation component and one or more input devices, the computer system comprising: one or more processors; memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for: during a communication session with one or more computer systems other than the computer system, displaying, via the display generation component, a three-dimensional environment including a first virtual object; while displaying the three-dimensional environment including the first virtual object at a first location relative to a first viewpoint of a first user of the computer system, detecting, via the one or more input devices, a first input corresponding to a request to move the first virtual object from the first location to a second location, different from the first location, relative to the first viewpoint of the first user in the three-dimensional environment; and while detecting the first input: in accordance with a determination that the first virtual object is shared with the one or more computer systems within the communication session, displaying a first visual feedback in the three-dimensional environment while moving the first virtual object from the first location to the second location relative to the first viewpoint of the first user in the three-dimensional environment; and in accordance with a determination that the first virtual object is not shared with the one or more computer systems within the communication session, displaying a second visual feedback, different from the first visual feedback, in the three-dimensional environment while moving the first virtual object from the first location to the second location relative to the first viewpoint of the first user in the three-dimensional environment.

139. A non-transitory computer-readable storage medium storing one or more programs, the one or more programs including instructions that, when executed by one or more processors of a computer system in communication with a display generation component and one or more input devices, cause the computer system to perform a method including: during a communication session with one or more computer systems other than the computer system, displaying, via the display generation component, a three-dimensional environment including a first virtual object; while displaying the three-dimensional environment including the first virtual object at a first location relative to a first viewpoint of a first user of the computer system, detecting, via the one or more input devices, a first input corresponding to a request to move the first virtual object from the first location to a second location, different from the first location, relative to the first viewpoint of the first user in the three-dimensional environment; and while detecting the first input: in accordance with a determination that the first virtual object is shared with the one or more computer systems within the communication session, displaying a first visual feedback in the three-dimensional environment while moving the first virtual object from the first location to the second location relative to the first viewpoint of the first user in the three-dimensional environment; and in accordance with a determination that the first virtual object is not shared with the one or more computer systems within the communication session, displaying a second visual feedback, different from the first visual feedback, in the three-dimensional environment while moving the first virtual object from the first location to the second location relative to the first viewpoint of the first user in the three-dimensional environment.

140. A computer system in communication with a display generation component and one or more input devices, the computer system comprising: one or more processors; memory; means for displaying, via the display generation component, a three-dimensional environment including a first virtual object during a communication session with one or more computer systems other than the computer system; means for detecting, via the one or more input devices, while the three-dimensional environment including the first virtual object is displayed at a first location relative to a first viewpoint of a first user of the computer system, a first input corresponding to a request to move the first virtual object from the first location to a second location, different from the first location, relative to the first viewpoint of the first user in the three-dimensional environment; and means for, while the first input is being detected: in accordance with a determination that the first virtual object is shared with the one or more computer systems within the communication session, displaying a first visual feedback in the three-dimensional environment while moving the first virtual object from the first location to the second location relative to the first viewpoint of the first user in the three-dimensional environment; and in accordance with a determination that the first virtual object is not shared with the one or more computer systems within the communication session, displaying a second visual feedback, different from the first visual feedback, in the three-dimensional environment while moving the first virtual object from the first location to the second location relative to the first viewpoint of the first user in the three-dimensional environment.

141. A computer system in communication with a display generation component and one or more input devices, the computer system comprising: one or more processors; memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs include instructions for performing the method according to any one of claims 111 to 137.

142. A non-transitory computer-readable storage medium storing one or more programs, the one or more programs including instructions that, when executed by one or more processors of a computer system in communication with a display generation component and one or more input devices, cause the computer system to perform the method according to any one of claims 111 to 137.

143. A computer system in communication with a display generation component and one or more input devices, the computer system comprising: one or more processors; memory; and means for performing the method according to any one of claims 111 to 137.

144. A method comprising, at a computer system in communication with one or more input devices and a display generation component: while a user of the computer system is participating in a communication session with a first participant of the communication session, displaying, via the display generation component, a visual representation of the first participant in a three-dimensional environment, wherein the visual representation is a first type of visual representation and has a first spatial arrangement relative to a current viewpoint of the user of the computer system; while displaying the visual representation of the first participant in the three-dimensional environment, obtaining first information including an indication of audio provided to the communication session by the first participant; and in response to obtaining the first information, and in accordance with a determination that information obtained regarding a direction of attention of the first participant satisfies one or more first criteria: maintaining display of the visual representation of the first participant as the first type of visual representation having the first spatial arrangement relative to the current viewpoint of the user of the computer system; in accordance with a determination that the information obtained regarding the direction of attention of the first participant indicates that the direction of attention of the first participant is a first direction, presenting, at the visual representation of the first participant, a first visual feedback associated with the audio provided by the first participant, wherein the first visual feedback has a first visual appearance; and in accordance with a determination that the information obtained regarding the direction of attention of the first participant indicates that the direction of attention of the first participant is a second direction different from the first direction, presenting, at the visual representation of the first participant, a second visual feedback associated with the audio provided by the first participant, wherein the second visual feedback has a second visual appearance different from the first visual appearance.

145. The method according to claim 144, further comprising, in response to obtaining the first information, and in accordance with the determination that the one or more first criteria are met: in accordance with a determination that an offset of the direction of attention relative to the current orientation of the visual representation is a first degree of offset, displaying the first visual feedback with a third visual appearance indicating the first degree of offset; and in accordance with a determination that the offset of the direction of attention relative to the current orientation of the visual representation is a second degree of offset different from the first degree of offset, displaying the first visual feedback with a fourth visual appearance, different from the third visual appearance, indicating the second degree of offset.
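
One way to picture claim 145 is as a mapping from the angular offset between the participant's attention and the orientation of their representation to a lateral bias of the feedback. The sketch below is only an assumed model: it treats both directions as yaw angles in degrees, and the function name `glow_bias` and the 90-degree saturation point are hypothetical.

```python
def glow_bias(
    attention_yaw_deg: float,
    representation_yaw_deg: float,
    full_bias_deg: float = 90.0,
) -> float:
    """Map the attention offset to a bias in [-1.0, 1.0].

    0.0 means the feedback is centered; the sign gives the side toward which
    it is biased. Angles are yaw values in degrees (assumption).
    """
    # Wrap the difference into the range [-180, 180).
    offset = (attention_yaw_deg - representation_yaw_deg + 180.0) % 360.0 - 180.0
    return max(-1.0, min(1.0, offset / full_bias_deg))
```

Under these assumptions, an attention direction 45 degrees to one side of the representation's orientation gives a bias of 0.5, and offsets of 90 degrees or more saturate at full bias, so different degrees of offset produce visibly different appearances.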

146. The method according to claim 144 or 145, wherein the first visual feedback includes a first simulated glow effect, and the second visual feedback includes a second simulated glow effect.

147. The method according to claim 146, wherein a respective simulated glow effect is displayed extending from an edge of the visual representation of the participant toward a central portion of the visual representation of the participant.

148. The method according to claim 146 or 147, wherein a respective simulated glow effect is displayed at an edge of the visual representation of the participant.

149. The method according to any one of claims 144 to 148, wherein: the first visual appearance includes displaying the first visual feedback at a first size at a first portion of the visual representation of the participant corresponding to the first direction, and displaying the first visual feedback at a second size, different from the first size, at a second portion of the visual representation of the participant corresponding to the second direction; and the second visual appearance includes displaying the second visual feedback at a third size at the first portion of the visual representation, and displaying the second visual feedback at a fourth size, different from the third size, at the second portion of the visual representation.

150. The method according to any one of claims 144 to 149, wherein the spatial arrangement of the visual representation of the participant relative to the current viewpoint of the user includes a current orientation of the visual representation relative to the three-dimensional environment, and wherein the information obtained regarding the direction of the participant's attention indicates the direction of the participant's attention relative to the current orientation of the visual representation.

151. The method according to any one of claims 144 to 150, wherein: presenting the first visual appearance at the visual representation of the participant includes displaying the first visual feedback with a directional bias relative to the visual representation of the participant; and presenting the second visual appearance at the visual representation of the participant includes displaying the second visual feedback without a directional bias relative to the visual representation of the participant.

152. The method according to any one of claims 144 to 151, further comprising, in response to obtaining the first information including the indication of the audio, in accordance with the determination that the one or more first criteria are met, and in accordance with the determination that the direction of the participant's attention is the first direction: in accordance with a determination that the indication of the audio indicates a first volume of the audio, displaying the first glow effect at a first size; and in accordance with a determination that the indication of the audio indicates a second volume of the audio different from the first volume, displaying the first glow effect at a second size different from the first size.
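
Claim 152 ties the size of the glow to the volume of the participant's audio without fixing the relationship. A minimal sketch, assuming a volume normalized to the range 0.0 to 1.0 and a simple linear mapping (both assumptions, as are the name `glow_size` and its default bounds), might look like this:

```python
def glow_size(volume: float, min_size: float = 0.0, max_size: float = 1.0) -> float:
    """Map a normalized audio volume to a glow size between min_size and max_size."""
    # Clamp out-of-range volumes so the glow never exceeds its bounds.
    v = min(max(volume, 0.0), 1.0)
    return min_size + (max_size - min_size) * v
```

Any monotonic mapping would satisfy the claim's requirement that different volumes yield different sizes; the linear form is chosen here only for readability.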

153. The method according to any one of claims 144 to 152, wherein, in response to obtaining the indication of the audio: in accordance with a determination that a first visual characteristic of the visual representation of the participant has a first value, presenting the first visual appearance at the visual representation of the participant includes displaying the first visual feedback with a second visual characteristic having a second value; and in accordance with a determination that the first visual characteristic of the visual representation of the participant has a third value different from the first value, presenting the first visual appearance at the visual representation of the participant includes displaying the first visual feedback with the second visual characteristic having a fourth value different from the third value.

154. In response to obtaining the aforementioned information, The method according to any one of claims 144 to 153, further comprising, in accordance with the determination that one or more of the first criteria are not met, changing the spatial arrangement of the participant's visual representation with respect to the three-dimensional environment according to the first information, wherein the participant's visual representation has a current orientation with respect to the three-dimensional environment corresponding to the direction of the participant's attention to the three-dimensional environment, and having a second spatial relationship with respect to the user's current viewpoint that is different from the first spatial relationship at the end of the change according to the second information.

155. The method according to claim 154, further comprising: while displaying the visual representation of the participant with the second spatial relationship relative to the current viewpoint of the user, obtaining second information, different from the first information, including a respective indication of audio provided by the participant to the communication session; and in response to obtaining the second information, and in accordance with a determination that the second information associated with the direction of the participant's attention satisfies the one or more first criteria: maintaining display of the visual representation of the participant having the second spatial arrangement relative to the current viewpoint of the user; in accordance with a determination that the second information obtained regarding the direction of the participant's attention indicates that the direction of the participant's attention is a third direction, presenting, at the visual representation of the participant, a third visual feedback associated with the audio provided by the participant, wherein the third visual feedback has a third visual appearance; and in accordance with a determination that the second information obtained regarding the direction of the participant's attention indicates that the direction of the participant's attention is a fourth direction different from the third direction, presenting, at the visual representation of the participant, a fourth visual feedback associated with the audio provided by the participant, wherein the fourth visual feedback has a fourth visual appearance different from the first visual appearance.

156. The method according to any one of claims 144 to 155, further comprising: while the user of the computer system is participating in the communication session with a second participant of the communication session, displaying, via the display generation component, a visual representation of the second participant in the three-dimensional environment, wherein the visual representation of the second participant is a second type of visual representation different from the first type of visual representation and has a second spatial arrangement relative to the current viewpoint of the user of the computer system; while displaying the visual representation of the second participant that is the second type of visual representation, obtaining second information associated with the second participant; and in response to obtaining the second information associated with the second participant, and in accordance with a determination that the second information indicates a direction of attention of the second participant, moving the visual representation of the second participant in accordance with the second information.

157. The method according to claim 156, wherein the visual representation of the first participant and the visual representation of the second participant are displayed simultaneously.

158. The method according to any one of claims 144 to 157, further comprising, while the user of the computer system is participating in the communication session with the participant and with a second participant of the communication session different from the participant, and while displaying the visual representation of the participant that is the first type of visual representation, displaying, via the display generation component, a visual representation of the second participant, different from the visual representation of the participant, in the three-dimensional environment, wherein the visual representation of the second participant is the first type of visual representation.

159. A computer system in communication with a display generation component and one or more input devices, the computer system comprising: one or more processors; memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for: while a user of the computer system is participating in a communication session with a first participant of the communication session, displaying, via the display generation component, a visual representation of the first participant in a three-dimensional environment, wherein the visual representation is a first type of visual representation and has a first spatial arrangement relative to a current viewpoint of the user of the computer system; while displaying the visual representation of the first participant in the three-dimensional environment, obtaining first information including an indication of audio provided to the communication session by the first participant; and in response to obtaining the first information, and in accordance with a determination that information obtained regarding a direction of attention of the first participant satisfies one or more first criteria: maintaining display of the visual representation of the first participant as the first type of visual representation having the first spatial arrangement relative to the current viewpoint of the user of the computer system; in accordance with a determination that the information obtained regarding the direction of attention of the first participant indicates that the direction of attention of the first participant is a first direction, presenting, at the visual representation of the first participant, a first visual feedback associated with the audio provided by the first participant, wherein the first visual feedback has a first visual appearance; and in accordance with a determination that the information obtained regarding the direction of attention of the first participant indicates that the direction of attention of the first participant is a second direction different from the first direction, presenting, at the visual representation of the first participant, a second visual feedback associated with the audio provided by the first participant, wherein the second visual feedback has a second visual appearance different from the first visual appearance.

160. A non-transitory computer-readable storage medium storing one or more programs, the one or more programs including instructions that, when executed by one or more processors of an electronic device that communicates with a display generation component and one or more input devices, cause the electronic device to perform a method comprising: while a user of the computer system is participating in a communication session with a first participant of the communication session, displaying, via the display generation component, a visual representation of the first participant in a three-dimensional environment, wherein the visual representation is of a first type of visual representation and has a first spatial arrangement with respect to a current viewpoint of the user of the computer system; while displaying the visual representation of the first participant in the three-dimensional environment, obtaining first information including an indication of audio provided to the communication session by the first participant; and in response to obtaining the first information, and in accordance with a determination that information obtained regarding a direction of attention of the first participant satisfies one or more first criteria: maintaining display of the visual representation of the first participant, wherein the visual representation is of the first type of visual representation and has the first spatial arrangement with respect to the current viewpoint of the user of the computer system; in accordance with a determination that the information obtained regarding the direction of attention of the first participant indicates that the direction of attention is a first direction, presenting, with respect to the visual representation of the first participant, first visual feedback associated with the audio provided by the first participant, wherein the first visual feedback has a first visual appearance; and in accordance with a determination that the information obtained regarding the direction of attention of the first participant indicates that the direction of attention is a second direction different from the first direction, presenting, with respect to the visual representation of the first participant, second visual feedback associated with the audio provided by the first participant, wherein the second visual feedback has a second visual appearance different from the first visual appearance.

161. An electronic device that communicates with a display generation component and one or more input devices, the electronic device comprising: one or more processors; memory; means for, while a user of the computer system is participating in a communication session with a first participant of the communication session, displaying, via the display generation component, a visual representation of the first participant in a three-dimensional environment, wherein the visual representation is of a first type of visual representation and has a first spatial arrangement with respect to a current viewpoint of the user of the computer system; means for, while the visual representation of the first participant is displayed in the three-dimensional environment, obtaining first information including an indication of audio provided to the communication session by the first participant; and means for, in response to obtaining the first information, and in accordance with a determination that information obtained regarding a direction of attention of the first participant satisfies one or more first criteria: maintaining display of the visual representation of the first participant, wherein the visual representation is of the first type of visual representation and has the first spatial arrangement with respect to the current viewpoint of the user of the computer system; in accordance with a determination that the information obtained regarding the direction of attention of the first participant indicates that the direction of attention is a first direction, presenting, with respect to the visual representation of the first participant, first visual feedback associated with the audio provided by the first participant, wherein the first visual feedback has a first visual appearance; and in accordance with a determination that the information obtained regarding the direction of attention of the first participant indicates that the direction of attention is a second direction different from the first direction, presenting, with respect to the visual representation of the first participant, second visual feedback associated with the audio provided by the first participant, wherein the second visual feedback has a second visual appearance different from the first visual appearance.

162. An electronic device comprising: one or more processors; memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs include instructions for performing the method according to any one of claims 144 to 158.

163. A non-transitory computer-readable storage medium storing one or more programs, wherein the one or more programs include instructions that, when executed by one or more processors of an electronic device, cause the electronic device to perform the method according to any one of claims 144 to 158.

164. An electronic device comprising: one or more processors; memory; and means for performing the method according to any one of claims 144 to 158.

165. A method comprising: at a computer system that communicates with one or more input devices and a display generation component: while a user of the computer system is participating in a communication session with one or more participants, and while the user has a current viewpoint of a three-dimensional environment of the computer system, obtaining information that a position of a first participant in the communication session corresponds to a first position in the three-dimensional environment of the computer system relative to the current viewpoint of the user; and in response to obtaining the information: in accordance with a determination that one or more first criteria are met, including a criterion that is met when the position corresponding to the first participant is outside a viewport of the computer system, presenting first feedback associated with the first position of the first participant, wherein the first feedback indicates a spatial relationship between the current viewpoint of the user and the first position; and in accordance with a determination that the one or more first criteria are not met, forgoing presenting the first feedback.
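
Editorial illustration (not part of the claims): the branch in claim 165 can be sketched as a check of whether a participant's position falls outside the viewport, with feedback returned only in that case. This is a minimal sketch under stated assumptions: the `Viewpoint` type, the top-down 2D geometry, the 90-degree horizontal field of view, and the names `relative_azimuth_deg` and `first_feedback` are all hypothetical and do not come from the patent text.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Viewpoint:
    """Hypothetical top-down (2D) stand-in for the user's current viewpoint."""
    x: float
    z: float
    yaw_deg: float          # 0 = looking along +z; positive turns toward +x
    fov_deg: float = 90.0   # assumed horizontal extent of the viewport


def relative_azimuth_deg(view: Viewpoint, px: float, pz: float) -> float:
    """Signed angle from the view direction to (px, pz), in [-180, 180)."""
    bearing = math.degrees(math.atan2(px - view.x, pz - view.z))
    return (bearing - view.yaw_deg + 180.0) % 360.0 - 180.0


def first_feedback(view: Viewpoint, px: float, pz: float) -> Optional[dict]:
    """Return a feedback descriptor if the position is outside the viewport.

    Returns None when the criterion is not met, i.e. feedback is forgone.
    """
    azimuth = relative_azimuth_deg(view, px, pz)
    if abs(azimuth) > view.fov_deg / 2.0:
        # The descriptor encodes the spatial relationship to the viewpoint.
        return {"side": "right" if azimuth > 0 else "left", "azimuth_deg": azimuth}
    return None
```

In this sketch a participant directly ahead yields no feedback, while a participant to the user's side yields a descriptor whose `side` and `azimuth_deg` fields stand in for the claimed spatial relationship.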

166. The method according to claim 165, wherein providing the first feedback includes playing audio corresponding to the first participant.

167. The method according to claim 166, wherein the audio corresponding to the first participant is generated as if it were originating from the first position in the three-dimensional environment.
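
Editorial illustration (not part of the claims): one simple way to make audio seem to originate from a position is constant-power stereo panning driven by the source's azimuth. The claim does not say how the audio is generated, and a real spatial audio renderer would typically use head-related transfer functions; the function name `constant_power_pan` and the clamping of the azimuth to [-90, 90] degrees are assumptions made for this sketch.

```python
import math


def constant_power_pan(azimuth_deg: float) -> tuple:
    """Return (left_gain, right_gain) for a source at the given azimuth.

    -90 is fully left, 0 is centered, +90 is fully right. Azimuths outside
    that range are clamped (an assumption of this sketch).
    """
    az = max(-90.0, min(90.0, azimuth_deg))
    # Map [-90, 90] degrees onto [0, pi/2] so that left^2 + right^2 == 1.
    theta = (az + 90.0) / 180.0 * (math.pi / 2.0)
    return math.cos(theta), math.sin(theta)
```

Because the two gains always satisfy left² + right² = 1, the perceived loudness stays roughly constant as the simulated source moves from one side to the other.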

168. The method according to any one of claims 165 to 167, further comprising: while the user of the computer system is participating in the communication session with the one or more participants, obtaining second information that a position of a second participant, different from the first participant, in the communication session corresponds to a second position in the three-dimensional environment of the computer system relative to the current viewpoint of the user; and in response to obtaining the second information, and in accordance with a determination that one or more second criteria are met, including a criterion that is met when the position of the first participant corresponds to the first position and the position of the second participant corresponds to the second position within a threshold time period of each other, presenting, more than the threshold time period after presenting the first feedback, second feedback, different from the first feedback, associated with the second position of the second participant, wherein the second feedback indicates a spatial relationship between the second position and the current viewpoint of the user.

169. The method according to any one of claims 165 to 168, wherein the first feedback includes audio and visual feedback indicating the spatial relationship between the user's current viewpoint and the first position relative to the user's current viewpoint.

170. The method according to any one of claims 165 to 169, wherein the first feedback includes a simulated glow effect displayed on a separate portion of the user's current viewport, and the spatial relationship of the separate portion to the viewport corresponds to the spatial relationship between the user's current viewpoint and the first position.
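
Editorial illustration (not part of the claims): the glow placement in claim 170 can be sketched as choosing the viewport edge that lies in the direction of the participant. The function name `glow_edge`, the four-edge simplification, and the rule of picking the dominant angular component are hypothetical choices for illustration only.

```python
def glow_edge(azimuth_deg: float, elevation_deg: float = 0.0) -> str:
    """Pick the viewport edge on which to draw the simulated glow.

    azimuth_deg: signed horizontal angle to the participant (positive = right).
    elevation_deg: signed vertical angle to the participant (positive = up).
    """
    if abs(elevation_deg) > abs(azimuth_deg):
        # The vertical offset dominates, so glow along the top or bottom edge.
        return "top" if elevation_deg > 0 else "bottom"
    return "right" if azimuth_deg >= 0 else "left"
```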

171. The method according to any one of claims 165 to 170, further comprising: while the user of the computer system is participating in the communication session, and before obtaining the information that the position of the first participant corresponds to the first position in the three-dimensional environment, presenting first audio different from the first feedback, wherein the information is obtained while the first audio is being presented; and in response to obtaining the information, and in accordance with the determination that the one or more first criteria are met, modifying one or more characteristics of the first audio.

172. The method according to any one of claims 165 to 171, wherein presenting the first feedback includes playing one or more first tones, the method further comprising: while the user of the computer system is participating in the communication session with the one or more participants, and while the user has the current viewpoint of the three-dimensional environment of the computer system, obtaining second information, different from the first information, that a position of a second participant, different from the first participant, in the communication session corresponds to a second position in the three-dimensional environment of the computer system relative to the current viewpoint of the user; and in response to obtaining the second information, and in accordance with a determination that one or more second criteria are met, including a criterion that is met when the position corresponding to the second participant is outside the viewport of the computer system, presenting second feedback indicating a spatial relationship between the current viewpoint and the second position, wherein presenting the second feedback includes playing one or more second tones that are different from the one or more first tones.

173. The method according to claim 172, wherein the first one or more tones include a first tone, the second one or more tones include a second tone, and the first tone and the second tone are separated by one or more intervals.
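
Editorial illustration (not part of the claims): if "intervals" is read as musical intervals, distinct per-participant tones can be sketched by stepping a base pitch by a fixed number of semitones in equal temperament. That reading, the 440 Hz base pitch, the four-semitone step, and the names `tone_frequency` and `assign_tones` are assumptions; the claims do not specify pitches or tuning.

```python
def tone_frequency(base_hz: float, semitones: float) -> float:
    """Equal-temperament frequency `semitones` above `base_hz`."""
    return base_hz * 2.0 ** (semitones / 12.0)


def assign_tones(participant_ids, base_hz: float = 440.0, step_semitones: int = 4):
    """Give each participant a distinct tone, separated by a fixed interval.

    Participants are assigned tones in the order given; the i-th participant
    receives the base pitch raised by i * step_semitones semitones.
    """
    return {
        pid: tone_frequency(base_hz, index * step_semitones)
        for index, pid in enumerate(participant_ids)
    }
```

With the defaults, three participants receive tones a major third apart, so successive feedback tones are distinguishable by pitch as well as by position.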

174. The method according to claim 172 or 173, further comprising: while the user of the computer system is participating in the communication session with the one or more participants, and while the user has the current viewpoint of the three-dimensional environment of the computer system, obtaining third information, different from the information and different from the second information, that a position of a third participant in the communication session, different from the first participant and different from the second participant, corresponds to a third position in the three-dimensional environment of the computer system relative to the current viewpoint of the user; and in response to obtaining the third information, and in accordance with a determination that the one or more second criteria are met, presenting third feedback indicating a spatial relationship between the current viewpoint and the third position, wherein presenting the third feedback includes playing one or more third tones that are different from the one or more first tones and different from the one or more second tones.

175. The method according to any one of claims 172 to 174, further comprising: while the user of the computer system is participating in the communication session with the one or more participants: obtaining third information, different from the first information and different from the second information, including an indication of a request to cease including a representation of the second participant, different from the first participant, in the three-dimensional environment; in response to obtaining the third information, ceasing to include the representation of the second participant in the three-dimensional environment; after obtaining the third information, obtaining fourth information, different from the first information, the second information, and the third information, that the position of the first participant in the communication session corresponds to a fourth position in the three-dimensional environment of the computer system relative to the current viewpoint of the user; and in response to obtaining the fourth information, and in accordance with a determination that one or more second criteria different from the one or more first criteria are met, presenting fourth feedback that includes playing the one or more first tones indicating a spatial relationship between the current viewpoint of the user and the fourth position.

176. The method of claim 175, further comprising: while the user of the computer system is participating in the communication session with the one or more participants, while the user has the current viewpoint of the three-dimensional environment of the computer system, and after the second feedback is presented, obtaining fourth information that a location of a fourth participant corresponds to a fourth position in the three-dimensional environment relative to the current viewpoint of the user; and in response to obtaining the fourth information: in accordance with a determination that one or more third criteria are met, including a criterion that is met when a period elapsed since presenting the second feedback is less than a threshold period, presenting fourth feedback, different from the first feedback, that includes playing one or more fourth tones that are different from the one or more first tones; and in accordance with a determination that the one or more third criteria are not met, presenting fifth feedback, different from the first feedback and the fourth feedback, that includes playing the one or more first tones.

177. The method according to any one of claims 165 to 176, further comprising: while the user and the first participant are participating in the communication session, obtaining second information, different from the information, including an indication of a request to cease including a representation of the first participant in the three-dimensional environment; and in response to obtaining the second information, presenting second feedback, different from the first feedback, that includes playing first audio associated with ceasing to include the representation of the first participant in the three-dimensional environment.

178. The method according to claim 177, wherein providing the first feedback includes playing a second separate audio, the second separate audio being generated as if it were emanating from the first position, and the first audio being generated as if it were emanating from a separate position in the three-dimensional environment associated with the first participant.

179. The method according to any one of claims 165 to 178, further comprising: while the user of the computer system is participating in a second communication session with one or more participants, different from the communication session, obtaining second information that the position of the first participant in the second communication session corresponds to a second position in the three-dimensional environment of the computer system relative to the current viewpoint of the user; and in response to obtaining the second information: in accordance with a determination that one or more second criteria are met, including a criterion that is met when the position corresponding to the first participant corresponds to one or more first positions in the three-dimensional environment, presenting second feedback associated with the second position of the first participant, wherein the second feedback indicates a spatial relationship between the current viewpoint of the user and the one or more first positions; and in accordance with a determination that the one or more second criteria are not met, forgoing presenting the second feedback.

180. The method according to claim 179, wherein presenting the second feedback includes playing audio corresponding to the one or more first positions, the method further comprising: while the user and the one or more participants are participating in the communication session, obtaining third information, different from the second information, including an indication of a request to cease including a representation of a second participant, different from the first participant, in the three-dimensional environment; and in response to obtaining the second information: in accordance with a determination that one or more third criteria different from the one or more second criteria are met, presenting third feedback, different from the first feedback and the second feedback, that includes playing first audio associated with ceasing to include the representation of the second participant in the three-dimensional environment; and in accordance with a determination that the one or more third criteria are not met, presenting fourth feedback, different from the first feedback, that includes playing second audio associated with ceasing to include the second participant in the three-dimensional environment.

181. The method according to claim 180, wherein the one or more third criteria include a criterion that is met when a period of time longer than a time threshold has elapsed since the individual audio associated with the discontinuation of including individual representations of individual participants of the one or more participants in the three-dimensional environment was played, and the method further comprises discontinuing to provide the third feedback in accordance with the determination that the one or more third criteria are not met.
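
Editorial illustration (not part of the claims): the time-based criterion in claims 180 and 181 can be sketched as a comparison between the time elapsed since the last "participant left" audio and a threshold. The function name `choose_leave_audio`, the two-second default, the string labels, and the choice to treat "no earlier audio" as meeting the criterion are all assumptions made for this sketch.

```python
from typing import Optional


def choose_leave_audio(
    now_s: float,
    last_leave_audio_s: Optional[float],
    threshold_s: float = 2.0,
) -> str:
    """Select which audio to play when a participant's representation is removed.

    The criterion is met when more than `threshold_s` seconds have elapsed
    since leave audio was last played (or, by assumption, when none has been
    played yet). Otherwise the alternate audio is selected.
    """
    if last_leave_audio_s is None or now_s - last_leave_audio_s > threshold_s:
        return "first_audio"
    return "second_audio"
```

A gate of this kind keeps several participants leaving in quick succession from triggering the same full sound each time.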

182. The method according to any one of claims 179 to 181, further comprising: while the user of the computer system is participating in a second communication session with one or more participants, different from the communication session: obtaining third information that a position of a second participant in the second communication session corresponds to a third position in the three-dimensional environment of the computer system relative to the current viewpoint of the user; in response to obtaining the third information, and in accordance with a determination that the one or more second criteria are met, presenting third feedback associated with the third position of the second participant, wherein the third feedback indicates a spatial relationship between the current viewpoint of the user and the one or more first positions; obtaining fourth information that a location of a third participant in the second communication session corresponds to a fourth position in the three-dimensional environment of the computer system relative to the current viewpoint of the user; in response to obtaining the fourth information, and in accordance with a determination that the one or more second criteria are met, presenting fourth feedback associated with the fourth position of the third participant, wherein the fourth feedback indicates a spatial relationship between the current viewpoint of the user and the one or more first positions; obtaining fifth information including an indication of a request to cease including the representation of the second participant in the three-dimensional environment; in response to obtaining the fifth information, ceasing to include the representation of the second participant in the three-dimensional environment; after obtaining the fifth information, obtaining sixth information that the position of the second participant corresponds to a fifth position in the three-dimensional environment of the computer system relative to the current viewpoint of the user; and in response to obtaining the sixth information, and in accordance with a determination that the one or more second criteria are met, presenting the third feedback.

183. The method according to any one of claims 179 to 182, further comprising: after obtaining the second information, obtaining third information, different from the second information, corresponding to a request to cease including the first participant in the three-dimensional environment; and in response to obtaining the third information: in accordance with a determination that one or more third criteria are met, including a criterion that is met when a threshold time has elapsed since inclusion of the first participant in the three-dimensional environment was discontinued, presenting third feedback indicating that inclusion of the first participant in the three-dimensional environment has been discontinued; and in accordance with a determination that the one or more third criteria are not met, forgoing presenting the third feedback.

184. The method according to any one of claims 179 to 183, wherein the second feedback includes playing back a first audio having one or more characteristics configured to simulate an audio source providing the audio located at positions corresponding to the first one or more positions in the three-dimensional environment.

185. The method according to any one of claims 179 to 184, further comprising: while the user of the computer system is participating in the communication session with the one or more participants, obtaining third information, different from the second information, that a position of a second participant, different from the first participant, in the communication session corresponds to a third position in the three-dimensional environment of the computer system relative to the current viewpoint of the user; and in response to obtaining the third information, and in accordance with a determination that one or more second criteria are met, including a criterion that is met when the location corresponding to the first participant and the location corresponding to the second participant are included in one or more second positions of the three-dimensional environment, presenting third feedback, wherein presenting the second feedback includes playing first audio having one or more characteristics configured to simulate an audio source, providing the audio, located at a position corresponding to the one or more second positions of the three-dimensional environment.

186. The method according to any one of claims 179 to 185, wherein the one or more first positions are associated with the current viewpoint of the user relative to the three-dimensional environment, the method further comprising: in response to obtaining the second information, and in accordance with the determination that the one or more second criteria are met: in accordance with a determination that the current viewpoint is a first viewpoint relative to the three-dimensional environment, presenting the first feedback, wherein the one or more first positions correspond to a first location relative to the three-dimensional environment; and in accordance with a determination that the current viewpoint is a second viewpoint, different from the first viewpoint, relative to the three-dimensional environment, presenting the first feedback, wherein the one or more first positions correspond to a second location relative to the three-dimensional environment.

187. The method according to any one of claims 165 to 186, wherein the one or more first criteria are not met when the location corresponding to the first participant is inside the viewport of the computer system.

188. The method according to any one of claims 165 to 187, further comprising: while the user of the computer system is participating in a second communication session with one or more participants, different from the communication session, obtaining second information that the position of the first participant in the second communication session corresponds to a second position in the three-dimensional environment of the computer system relative to the current viewpoint of the user; and in response to obtaining the second information, presenting second feedback, different from the first feedback, associated with the second position of the first participant, wherein the second feedback indicates a spatial relationship between the current viewpoint of the user and the second position, and wherein the second feedback is presented independently of the spatial relationship between the current viewpoint of the user and the second position.

189. The method according to any one of claims 165 to 188, further comprising: while the user and the first participant are participating in the communication session, displaying a visual representation of the first participant at a respective position in the three-dimensional environment with a first level of visual prominence, wherein the visual representation of the first participant has a first spatial arrangement with respect to the current viewpoint of the user; while the visual representation of the first participant is displayed with the first level of visual prominence and has the first spatial arrangement with respect to the current viewpoint of the user, detecting, via the one or more input devices, an indication of a request to share respective first content in the communication session; in response to detecting the indication of the request, while maintaining the current viewpoint of the user: displaying, via the display generation component, the respective first content at an initial position in the three-dimensional environment, in a first spatial relationship with the visual representation of the first participant displayed with the first level of visual prominence; reducing the visual prominence of the visual representation of the first participant to a second level of visual prominence different from the first level of visual prominence; and presenting second feedback, different from the first feedback, indicating the reduction of the visual prominence of the visual representation of the first participant from the first level of visual prominence to the second level of visual prominence; and after reducing the visual prominence of the visual representation of the first participant to the second level of visual prominence: displaying, via the display generation component, the visual representation of the first participant with a third level of visual prominence greater than the second level of visual prominence and with a second spatial relationship, different from the first spatial relationship, with respect to the current viewpoint of the user; and presenting third feedback, different from the second feedback, indicating the second spatial relationship between the position corresponding to the first participant and the current viewpoint of the user.

190. The method according to claim 189, further comprising: while displaying the respective first content at the initial position, and while displaying the visual representation of the first participant with the third level of visual prominence and the second spatial relationship with respect to the current viewpoint of the user, detecting, via the one or more input devices, an indication of a request to replace the respective first content with respective second content different from the respective first content; in response to detecting the indication of the request to replace the respective first content: replacing the respective first content with the respective second content; and ceasing to display the visual representation of the first participant; and after ceasing to display the visual representation of the first participant: displaying, via the display generation component, the visual representation of the first participant with an updated spatial relationship with respect to the current viewpoint of the user; and presenting fourth feedback, different from the second feedback, indicating the updated spatial relationship between the first participant and the current viewpoint of the user.

191. The method according to any one of claims 165 to 190, wherein the information that the position of the first participant in the communication session corresponds to the first position in the three-dimensional environment of the computer system is associated with a request to update the spatial arrangement of the elements of the communication session with respect to the user's current viewpoint.

192. The method according to any one of claims 165 to 191, wherein the information that the position of the first participant in the communication session corresponds to the first position in the three-dimensional environment of the computer system is associated with the first participant participating in the communication session.

193. The method according to any one of claims 165 to 192, wherein the information that the position of the first participant in the communication session corresponds to the first position in the three-dimensional environment of the computer system is associated with an input for moving the first participant in the individual three-dimensional environment of the first participant.

194. The method according to any one of claims 165 to 193, wherein the information that the position of the first participant in the communication session corresponds to the first position in the three-dimensional environment of the computer system is associated with a request to change the visual representation of the first participant in the communication session from a first type of visual representation to a second type of visual representation.

195. The method according to any one of claims 165 to 194, wherein the information that the position of the first participant in the communication session corresponds to the first position in the three-dimensional environment of the computer system is associated with a request to change the spatial arrangement of elements of the communication session, including one or more visual representations of the one or more participants of the communication session relative to each other.

196. The method according to any one of claims 165 to 195, further comprising, in response to the acquisition of the information while the user is participating in the communication session, presenting first audio which is not localized to the first position associated with the first participant, in accordance with the determination that one or more first criteria are met, and before presenting the first feedback, wherein the first audio is different from the first feedback.

197. The method according to claim 196, further comprising: while a representation corresponding to the first participant is included in the three-dimensional environment, acquiring second information, different from the first information, that includes a request to stop including the representation of the participant in the communication session; and in response to obtaining the second information, presenting second audio, different from the first audio, that is not localized to one or more positions associated with respective participants, including the first participant.

198. A computer system that communicates with a display generation component and one or more input devices, the computer system comprising: one or more processors; memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for: while a user of the computer system is participating in a communication session with one or more participants, and while the user has a current viewpoint of a three-dimensional environment of the computer system, acquiring information that a position of a first participant in the communication session corresponds to a first position in the three-dimensional environment of the computer system relative to the current viewpoint of the user; and in response to obtaining the information: in accordance with a determination that one or more first criteria are met, including a criterion that is met when the position corresponding to the first participant is outside a viewport of the computer system, providing first feedback associated with the first position of the first participant, wherein the first feedback indicates a spatial relationship between the current viewpoint of the user and the first position; and in accordance with a determination that one or more of the first criteria are not met, ceasing to provide the first feedback.
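The viewport criterion recited in claim 198 can be illustrated with a minimal sketch. It is not taken from the application: it assumes a flat, horizontal-plane simplification, a hypothetical `horizontal_fov_deg` parameter standing in for the viewport, and a hypothetical dictionary as the "first feedback". The claims do not specify how the viewport test or the feedback is implemented.

```python
import math


def bearing_to(viewpoint_xz, yaw_deg, target_xz):
    """Signed horizontal angle in degrees from the view direction to the target.

    Assumed convention: yaw 0 faces +z, positive angles are to the user's right.
    """
    dx = target_xz[0] - viewpoint_xz[0]
    dz = target_xz[1] - viewpoint_xz[1]
    target_deg = math.degrees(math.atan2(dx, dz))
    # Wrap the difference into the range [-180, 180).
    return (target_deg - yaw_deg + 180.0) % 360.0 - 180.0


def first_feedback(viewpoint_xz, yaw_deg, target_xz, horizontal_fov_deg=90.0):
    """Return hypothetical feedback when the target is outside the viewport.

    Returns None when the target is inside the assumed field of view,
    i.e. when the illustrative criterion is not met.
    """
    angle = bearing_to(viewpoint_xz, yaw_deg, target_xz)
    if abs(angle) <= horizontal_fov_deg / 2.0:
        return None
    # The feedback encodes the spatial relationship between viewpoint and position.
    return {"side": "right" if angle > 0 else "left", "angle_deg": angle}
```

In this sketch a participant directly ahead yields no feedback, while a participant directly to the user's right yields feedback naming the right side and an angle of about 90 degrees.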

199. A non-transitory computer-readable storage medium storing one or more programs, the one or more programs including instructions that, when executed by one or more processors of an electronic device that communicates with a display generation component and one or more input devices, cause the electronic device to perform a method including: while a user of the computer system is participating in a communication session with one or more participants, and while the user has a current viewpoint of a three-dimensional environment of the computer system, acquiring information that a position of a first participant in the communication session corresponds to a first position in the three-dimensional environment of the computer system relative to the current viewpoint of the user; and in response to obtaining the information: in accordance with a determination that one or more first criteria are met, including a criterion that is met when the position corresponding to the first participant is outside a viewport of the computer system, providing first feedback associated with the first position of the first participant, wherein the first feedback indicates a spatial relationship between the current viewpoint of the user and the first position; and in accordance with a determination that one or more of the first criteria are not met, ceasing to provide the first feedback.

200. An electronic device that communicates with a display generation component and one or more input devices, the electronic device comprising: one or more processors; memory; means for acquiring, while a user of the computer system is participating in a communication session with one or more participants, and while the user has a current viewpoint of a three-dimensional environment of the computer system, information that a position of a first participant in the communication session corresponds to a first position in the three-dimensional environment of the computer system relative to the current viewpoint of the user; and means for, in response to obtaining the information: in accordance with a determination that one or more first criteria are met, including a criterion that is met when the position corresponding to the first participant is outside a viewport of the computer system, providing first feedback associated with the first position of the first participant, wherein the first feedback indicates a spatial relationship between the current viewpoint of the user and the first position; and in accordance with a determination that one or more of the first criteria are not met, ceasing to provide the first feedback.

201. An electronic device comprising: one or more processors; memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs include instructions for performing the method according to any one of claims 165 to 197.

202. A non-transitory computer-readable storage medium storing one or more programs, wherein the one or more programs include instructions that, when executed by one or more processors of an electronic device, cause the electronic device to perform the method according to any one of claims 165 to 197.

203. An electronic device comprising: one or more processors; memory; and means for carrying out the method according to any one of claims 165 to 197.

204. A method comprising: at a computer system that communicates with one or more input devices and a display generation component: while a user of the computer system is participating in a communication session with one or more participants, and while a three-dimensional environment is visible via the display generation component, receiving an indication to display a first spatial representation of a first participant of the one or more participants in the three-dimensional environment; and in response to receiving the indication to display the first spatial representation of the first participant in the three-dimensional environment, displaying, via the display generation component, the first spatial representation of the first participant in the three-dimensional environment in accordance with a first transition sequence, wherein the first transition sequence includes: displaying the first spatial representation of the first participant in the three-dimensional environment according to a first visual model, wherein the first visual model defines one or more visual properties of the first spatial representation according to a first set of values when displayed according to the first visual model; and gradually transitioning the displayed first spatial representation of the first participant from being displayed according to the first visual model to being displayed according to a second visual model, wherein the first set of values for the one or more visual properties is different from a second set of values for the one or more visual properties.
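As an illustration of the gradual transition in claim 204, the following sketch linearly interpolates a set of visual property values between two models. The property names (`opacity`, `detail`, `noise`), the linear blend, and the step count are assumptions made for illustration only; the claims do not state which properties are used or how the transition is computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisualModel:
    """Hypothetical set of values for the visual properties of a representation."""
    opacity: float
    detail: float
    noise: float


def blend(first, second, progress):
    """Blend two visual models; progress 0.0 gives `first`, 1.0 gives `second`."""
    t = min(1.0, max(0.0, progress))

    def mix(a, b):
        # This form returns the exact endpoint values at t = 0 and t = 1.
        return a * (1.0 - t) + b * t

    return VisualModel(
        opacity=mix(first.opacity, second.opacity),
        detail=mix(first.detail, second.detail),
        noise=mix(first.noise, second.noise),
    )


def transition_frames(first, second, steps):
    """Return steps + 1 intermediate models, from `first` through to `second`."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return [blend(first, second, i / steps) for i in range(steps + 1)]
```

With a first model whose `noise` value is higher than that of the second model, successive frames show steadily decreasing noise, which is one way to picture the gradual noise reduction described in claim 214.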

205. The method according to claim 204, wherein gradually transitioning the displayed first spatial representation of the first participant from being displayed according to the first visual model to being displayed according to the second visual model includes: ceasing to display a first portion of the first spatial representation according to the first visual model; displaying the first portion of the first spatial representation according to the second visual model while one or more portions of the first spatial representation other than the first portion are not displayed according to the second visual model; and gradually transitioning the one or more portions of the first spatial representation other than the first portion to being displayed according to the second visual model, wherein an order in which the one or more portions are gradually transitioned to being displayed according to the second visual model is based on a location of each of the one or more portions other than the first portion.

206. The method according to claim 204 or 205, wherein the first visual model is a low-fidelity visual model and the second visual model is a high-fidelity visual model.

207. The method according to any one of claims 204 to 206, wherein the first spatial representation of the first participant includes a face region, the one or more visual properties include an appearance of the face region, the visual appearance of the face region in the second visual model is based on an image of a face associated with the first participant, and the visual appearance of the face region in the first visual model is not based on an image of the face associated with the first participant.

208. The method according to any one of claims 204 to 207, wherein the one or more visual properties include a first color associated with one or more parts of the first spatial representation, the first color in the second visual model is a skin tone associated with the first participant, and the first color in the low-fidelity visual model is a color not based on the skin tone associated with the first participant.

209. The method according to any one of claims 204 to 208, wherein the first spatial representation includes size and shape, and the size and shape of the first spatial representation are based on one or more spatial characteristics associated with the first participant.

210. The method according to any one of claims 204 to 209, further comprising: while the first spatial representation of the first participant is displayed in the three-dimensional environment according to the first visual model or the second visual model, receiving an indication that one or more parts of the body of the first participant have moved; and in response to receiving the indication that one or more parts of the body of the first participant have moved, and independently of whether the first spatial representation of the first participant is displayed according to the first visual model or the second visual model, modifying the display of the first spatial representation of the first participant in accordance with the received indication.

211. The method according to any one of claims 204 to 210, wherein the first spatial representation of the first participant includes a central region, and gradually transitioning the displayed first spatial representation of the first participant from the first visual model to the second visual model includes: ceasing to display the central region of the first spatial representation according to the first visual model; after ceasing to display the central region of the first spatial representation according to the first visual model, and while one or more non-central regions of the first spatial representation are not displayed according to the second visual model, displaying the central region of the first spatial representation according to the second visual model; and gradually transitioning the one or more non-central regions of the first spatial representation to being displayed according to the second visual model.

212. The method according to any one of claims 204 to 211, wherein the gradual transition of the displayed first spatial representation of the first participant from being displayed according to a first visual model to being displayed according to a second visual model includes starting the transition from a portion of the first spatial representation closer to the user's viewpoint and ending the transition with a portion of the first spatial representation further from the user's viewpoint.
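The near-to-far ordering described in claim 212 can be sketched as a sort by distance from the user's viewpoint. The part names, the coordinates, and the `stagger_s` delay between parts are hypothetical; the claims state only that the transition starts with a portion closer to the viewpoint and ends with a portion further from it.

```python
import math


def transition_schedule(parts, viewpoint, stagger_s=0.1):
    """Order parts from nearest to farthest and assign staggered start times.

    `parts` maps a part name to its (x, y, z) position; `stagger_s` is an
    assumed delay in seconds between the start of successive parts.
    """
    ordered = sorted(parts, key=lambda name: math.dist(parts[name], viewpoint))
    return [(name, index * stagger_s) for index, name in enumerate(ordered)]
```

For example, if a hand of the representation is closer to the viewpoint than the head and the torso, the hand is scheduled first and the farthest part last.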

213. The method according to any one of claims 204 to 212, wherein gradually transitioning the displayed first spatial representation of the first participant from being displayed according to a first visual model to being displayed according to a second visual model includes correcting blurring of at least a portion of the three-dimensional environment behind the displayed first spatial representation with respect to the user's viewpoint.

214. The method according to any one of claims 204 to 213, wherein the one or more visual properties include visual noise, and gradually transitioning the displayed first spatial representation of the first participant from being displayed according to the first visual model to being displayed according to the second visual model includes gradually reducing the magnitude of the visual noise displayed on the first spatial representation.

215. The method according to any one of claims 204 to 214, wherein the shape of the displayed first spatial representation is based on the personification of the first participant.

216. The method according to any one of claims 204 to 215, wherein the shape of the displayed first spatial representation is based on the placeholder representation of the first participant.

217. The method according to any one of claims 204 to 216, further comprising: receiving an indication to cease displaying the first spatial representation of the first participant in the three-dimensional environment; and in response to receiving the indication to cease displaying the first spatial representation of the first participant, initiating a process to cease displaying the first spatial representation, including displaying a second transition sequence in the three-dimensional environment, wherein the second transition sequence includes gradually transitioning the displayed first spatial representation of the first participant from being displayed according to the second visual model to being displayed according to the first visual model.

218. The method according to any one of claims 204 to 217, wherein the received indication for displaying the first spatial representation of the first participant is based on the first participant joining the communication session.

219. The method according to any one of claims 204 to 218, wherein the received indication for displaying the first spatial representation of the first participant is based on a request to modify the position of the first participant relative to one or more virtual elements in the communication session.

220. The method according to any one of claims 204 to 219, wherein the received indication for displaying the first spatial representation of the first participant is based on a request to change the representation type associated with the first participant.

221. The method according to any one of claims 204 to 220, further comprising: while displaying the first spatial representation of the first participant, receiving an indication to display a second spatial representation of the first participant in the three-dimensional environment; and in response to receiving the indication to display the second spatial representation of the first participant in the three-dimensional environment: displaying the first spatial representation of the first participant according to a second transition sequence, wherein the second transition sequence includes gradually transitioning the displayed first spatial representation of the first participant from being displayed according to the second visual model to being displayed according to the first visual model; after the first spatial representation is displayed according to the first visual model, ceasing to display the first spatial representation; and after ceasing to display the first spatial representation, displaying the second spatial representation of the first participant according to a third transition sequence, wherein the third transition sequence includes: displaying the second spatial representation of the first participant in the three-dimensional environment according to the first visual model; and gradually transitioning the displayed second spatial representation of the first participant from being displayed according to the first visual model to being displayed according to the second visual model.
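The ordering of steps in claim 221 can be pictured as a simple event sequence. The sketch below is illustrative only: the event tuples, the `fidelity` scale (0.0 for the first visual model, 1.0 for the second), and the `steps` parameter are assumptions, not part of the claims.

```python
def swap_representation(current, replacement, steps=2):
    """Return the ordered events for replacing one representation with another.

    Each event is (action, representation, fidelity), where fidelity 1.0 stands
    for the second visual model and 0.0 for the first visual model.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    events = []
    # Second transition sequence: current representation, second model to first.
    for i in range(steps + 1):
        events.append(("show", current, 1.0 - i / steps))
    # Cease displaying the current representation once it is at the first model.
    events.append(("hide", current, None))
    # Third transition sequence: replacement, first model to second.
    for i in range(steps + 1):
        events.append(("show", replacement, i / steps))
    return events
```

The replacement representation never appears before the current one has been hidden, which mirrors the "after ceasing to display" condition in the claim.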

222. A computer system that communicates with a display generation component and one or more input devices, the computer system comprising: one or more processors; memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for: while a user of the computer system is participating in a communication session with one or more participants, and while a three-dimensional environment is visible via the display generation component, receiving an indication to display a first spatial representation of a first participant of the one or more participants in the three-dimensional environment; and in response to receiving the indication to display the first spatial representation of the first participant in the three-dimensional environment, displaying, via the display generation component, the first spatial representation of the first participant in the three-dimensional environment in accordance with a first transition sequence, wherein the first transition sequence includes: displaying the first spatial representation of the first participant in the three-dimensional environment according to a first visual model, wherein the first visual model defines one or more visual properties of the first spatial representation according to a first set of values when displayed according to the first visual model; and gradually transitioning the displayed first spatial representation of the first participant from being displayed according to the first visual model to being displayed according to a second visual model, wherein the first set of values for the one or more visual properties is different from a second set of values for the one or more visual properties.

223. A non-transitory computer-readable storage medium storing one or more programs, the one or more programs including instructions that, when executed by one or more processors of an electronic device that communicates with a display generation component and one or more input devices, cause the electronic device to perform a method including: while a user of the computer system is participating in a communication session with one or more participants, and while a three-dimensional environment is visible via the display generation component, receiving an indication to display a first spatial representation of a first participant of the one or more participants in the three-dimensional environment; and in response to receiving the indication to display the first spatial representation of the first participant in the three-dimensional environment, displaying, via the display generation component, the first spatial representation of the first participant in the three-dimensional environment in accordance with a first transition sequence, wherein the first transition sequence includes: displaying the first spatial representation of the first participant in the three-dimensional environment according to a first visual model, wherein the first visual model defines one or more visual properties of the first spatial representation according to a first set of values when displayed according to the first visual model; and gradually transitioning the displayed first spatial representation of the first participant from being displayed according to the first visual model to being displayed according to a second visual model that defines the one or more visual properties of the first spatial representation when displayed according to the second visual model, wherein the first set of values for the one or more visual properties is different from a second set of values for the one or more visual properties.

224. An electronic device that communicates with a display generation component and one or more input devices, the electronic device comprising: one or more processors; memory; means for receiving, while a user of the computer system is participating in a communication session with one or more participants, and while a three-dimensional environment is visible via the display generation component, an indication to display a first spatial representation of a first participant of the one or more participants in the three-dimensional environment; and means for displaying, via the display generation component and in response to receiving the indication to display the first spatial representation of the first participant in the three-dimensional environment, the first spatial representation of the first participant in the three-dimensional environment in accordance with a first transition sequence, wherein the first transition sequence includes: displaying the first spatial representation of the first participant in the three-dimensional environment according to a first visual model, wherein the first visual model defines one or more visual properties of the first spatial representation according to a first set of values when displayed according to the first visual model; and gradually transitioning the displayed first spatial representation of the first participant from being displayed according to the first visual model to being displayed according to a second visual model, wherein the first set of values for the one or more visual properties is different from a second set of values for the one or more visual properties.

225. An electronic device comprising: one or more processors; memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs include instructions for performing the method according to any one of claims 204 to 221.

226. A non-transitory computer-readable storage medium storing one or more programs, wherein the one or more programs include instructions that, when executed by one or more processors of an electronic device, cause the electronic device to perform the method according to any one of claims 204 to 221.

227. An electronic device comprising: one or more processors; memory; and means for carrying out the method according to any one of claims 204 to 221.