Detection and obscuring of display screens in augmented reality content

CN122200293A · Pending · Publication Date: 2026-06-12 · SNAP INC

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SNAP INC
Filing Date
2021-12-23
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

In augmented reality technology, existing technologies struggle to effectively detect and process the displays of electronic devices, resulting in poor interactivity and consistency between virtual content and the real environment.

Method used

A machine learning model detects the representation of an electronic device's display in captured image data. A bounding box is then generated around the display and the display's visual appearance is adjusted, including changing the luminance values of pixels, to enhance the visibility and interactivity of the display.

Benefits of technology

It improves the detection and processing efficiency of displays in augmented reality systems, and enhances the interactivity and consistency between virtual content and the real environment.

✦ Generated by Eureka AI based on patent content.


Abstract

The invention relates to detection and obscuring of display screens in augmented reality content. The invention relates to a method comprising: receiving first image data captured by a camera of an eyewear device; detecting, using a machine learning model, a representation of a display screen of an electronic device in the first image data; selecting at least a portion of the representation of the display screen of the electronic device; adjusting a visual appearance of the portion of the representation of the display screen while the representation of the display screen is detected in a field of view of a user who is using the eyewear device; and causing a display system using the eyewear device to display the adjusted visual appearance.

Description

[0001] This application is a divisional application of the invention patent application filed on December 23, 2021, with application number 202180088792.0 (international phase application number PCT/US2021/065154) and entitled "Detection and blurring of display screen in augmented reality content".

Priority Claim

[0002] This application claims priority to U.S. Provisional Patent Application No. 63/132,955, filed December 31, 2020, the entire contents of which are incorporated herein by reference for all purposes.

Background Technology

[0003] With the increasing use of digital images, the affordability of portable computing devices, the availability of increased capacity digital storage media, and the increased bandwidth and accessibility of network connections, digital images have become an integral part of the daily lives of more and more people.

[0004] Some electronic eyewear devices (such as so-called smart glasses) allow users to interact with virtual content while engaging in an activity. Users wear these devices and can observe the real-world environment through them while interacting with virtual content displayed on the devices.

Summary of the Invention

[0005] According to a first aspect of the present invention, a method is provided, comprising: receiving first image data captured by a camera device of an eyewear device; detecting, using a machine learning model, a representation of a display screen of an electronic device in the first image data; selecting at least a portion of the representation of the display screen of the electronic device; and adjusting the visual appearance of the portion of the representation of the display screen while the representation of the display screen of the electronic device is detected in the field of view of a user using the eyewear device, wherein adjusting the visual appearance of the portion of the representation of the display screen while the representation of the display screen of the electronic device is detected in the field of view of a user using the eyewear device comprises at least: generating a representation of a bounding box surrounding the representation of the display screen, the representation of the bounding box comprising a set of pixels corresponding to at least four sides; and modifying a second set of pixels corresponding to the representation of the display screen by changing a first luminance value of at least a third pixel to a second luminance value, wherein the second luminance value is greater than the first luminance value; and causing a display system using the eyewear device to display the adjusted visual appearance.
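As a non-limiting illustration, the adjusting operation of the first aspect could be sketched as follows. The sketch assumes a grayscale image held as nested lists of luminance values (0 to 255) and assumes that the detecting operation has already produced a box; the `Box` type, the `brighten_and_outline` function, and the `gain` and `border_value` parameters are hypothetical names introduced only for this example and are not taken from the application.

```python
from dataclasses import dataclass
from typing import List

Image = List[List[int]]  # luminance values 0-255, indexed as image[row][column]


@dataclass(frozen=True)
class Box:
    """Hypothetical bounding box; all bounds are inclusive pixel indices."""
    top: int
    left: int
    bottom: int
    right: int


def brighten_and_outline(image: Image, box: Box, gain: int = 40,
                         border_value: int = 255) -> Image:
    """Return a copy in which the pixels strictly inside `box` are brightened
    and the four sides of `box` are drawn as an outline."""
    out = [row[:] for row in image]  # leave the captured image untouched

    # Pixels representing the display screen: raise each luminance value,
    # capping at 255 so that a value is never lowered.
    for y in range(box.top + 1, box.bottom):
        for x in range(box.left + 1, box.right):
            out[y][x] = min(255, out[y][x] + gain)

    # Pixels of the bounding box: top and bottom sides, then left and right.
    for x in range(box.left, box.right + 1):
        out[box.top][x] = border_value
        out[box.bottom][x] = border_value
    for y in range(box.top, box.bottom + 1):
        out[y][box.left] = border_value
        out[y][box.right] = border_value
    return out
```

In this sketch a pixel already at the maximum value stays unchanged, so the "greater than" condition holds only for pixels below the cap; a real implementation would work on color image data and on the output of the machine learning model.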

[0006] According to a second aspect of the invention, a system is provided, comprising: a processor; and a memory including instructions that, when executed by the processor, cause the processor to perform operations, the operations including: receiving first image data captured by a camera device of an eyewear device; detecting, using a machine learning model, a representation of a display screen of an electronic device in the first image data; selecting at least a portion of the representation of the display screen of the electronic device; and adjusting the visual appearance of the portion of the representation of the display screen while the representation of the display screen of the electronic device is detected in the field of view of a user using the eyewear device, wherein adjusting the visual appearance of the portion of the representation of the display screen while the representation of the display screen of the electronic device is detected in the field of view of a user using the eyewear device includes at least: generating a representation of a bounding box surrounding the representation of the display screen, the representation of the bounding box including a set of pixels corresponding to at least four sides; and modifying a second set of pixels corresponding to the representation of the display screen by changing a first luminance value of at least a third pixel to a second luminance value, wherein the second luminance value is greater than the first luminance value; and causing a display system using the eyewear device to display the adjusted visual appearance.

[0007] According to a third aspect of the invention, a non-transitory computer-readable medium is provided, the medium including instructions that, when executed by a computing device, cause the computing device to perform operations, the operations including: receiving first image data captured by a camera device of an eyewear device; detecting, using a machine learning model, a representation of a display screen of an electronic device in the first image data; selecting at least a portion of the representation of the display screen of the electronic device; and adjusting the visual appearance of the portion of the representation of the display screen while the representation of the display screen of the electronic device is detected in the field of view of a user using the eyewear device, wherein adjusting the visual appearance of the portion of the representation of the display screen while the representation of the display screen of the electronic device is detected in the field of view of a user using the eyewear device includes at least: generating a representation of a bounding box surrounding the representation of the display screen, the representation of the bounding box including a set of pixels corresponding to at least four sides; and modifying a second set of pixels corresponding to the representation of the display screen by changing a first luminance value of at least a third pixel to a second luminance value, wherein the second luminance value is greater than the first luminance value; and causing a display system using the eyewear device to display the adjusted visual appearance.

Description of the Drawings

[0008] To facilitate the identification of any particular element or action in discussion, one or more highest-order digits in the figure reference numerals indicate the figure number in which the element was first introduced.

[0009] Figure 1 is a diagrammatic representation of a networked environment in which the present disclosure can be deployed, according to some example implementations.

[0010] Figure 2 is a diagrammatic representation of a messaging client application, according to some example implementations.

[0011] Figure 3 is a diagrammatic representation of a data structure maintained in a database, according to some example implementations.

[0012] Figure 4 is a diagrammatic representation of a message, according to some example implementations.

[0013] Figure 5 shows a front perspective view of an eyewear device in the form of smart glasses that includes an eyewear system, according to an example embodiment.

[0014] Figure 6 is a schematic diagram illustrating, according to some embodiments, a structure of the message annotations described in Figure 4, including additional information corresponding to a given message.

[0015] Figure 7 is a block diagram illustrating various modules of an eyewear system, according to certain example embodiments.

[0016] Figure 8 shows an example of AR content in which a display screen (e.g., one included in a given electronic device) is detected in the user's field of view while the user is using an eyewear device.

[0017] Figure 9 is a flowchart illustrating a method, according to some example implementations.

[0018] Figure 10 is a block diagram illustrating a software architecture in which the present disclosure can be implemented, according to some example embodiments.

[0019] Figure 11 is a diagrammatic representation of a machine in the form of a computer system within which a set of instructions can be executed to cause the machine to perform any one or more of the methods discussed herein, according to some example embodiments.

Detailed Description

[0020] Users with broad interests from various locations can capture digital images of a wide range of subjects and make the captured images available to others via networks such as the Internet. Enhancing the user experience of digital images and providing a variety of features, enabling computing devices to perform image processing operations on various objects and / or features captured under varying conditions (e.g., changes in image scale, noise, lighting, motion, or geometric distortion), can be challenging and computationally intensive.

[0021] Augmented reality (AR) technology aims to bridge the gap between virtual and real-world environments by providing an augmented real-world environment that utilizes electronic information. Therefore, the electronic information appears to be part of the real-world environment as perceived by the user. In this example, AR technology also provides a user interface for interacting with the electronic information overlaid on the augmented real-world environment.

[0022] Augmented reality (AR) systems enable the combination of real and virtual environments to varying degrees to facilitate real-time interaction with the user. Therefore, an AR system as described herein can include a variety of possible combinations of real and virtual environments, including augmented reality that primarily comprises real elements and is closer to the real environment than a virtual environment (e.g., without real elements). In this way, the real environment can be connected to the virtual environment via the AR system. Users immersed in the AR environment can navigate through it, and the AR system can track the user's viewpoint to provide visualizations based on how the user is positioned within the environment. Augmented reality (AR) experiences can be provided in messaging client applications (or messaging systems) as described in the embodiments herein.

[0023] Implementations of the subject technology described herein support various operations involving AR content, which utilize given electronic devices (such as wearable headset devices, e.g., eyewear devices) and mobile computing devices to capture and modify such content.

[0024] In various scenarios, users of mobile computing devices increasingly rely on messaging systems to provide different types of functionality in a convenient manner. As described herein, the subject messaging system includes practical applications that improve the capture of image data and the rendering of AR content (e.g., images, videos, etc.) based on the captured image data, at least by providing technical improvements for capturing image data with power- and resource-constrained electronic devices. These improvements, achieved through the techniques of the subject technology, reduce latency and increase efficiency in processing the captured image data, thereby also reducing power consumption in the capturing device.

[0025] As further discussed herein, the subject infrastructure supports the creation and sharing of interactive media, including 3D content or AR effects, referred to herein as messages, across all the various components of a messaging system. In the example implementations described herein, messages may enter the system from a live camera or via a storage device (e.g., where messages including 3D content and/or AR effects are stored in memory or a database). The subject system supports motion sensor input as well as the loading of external effects and asset data.

[0026] As referred to herein, the phrases "augmented reality experience," "augmented reality content item," and "augmented reality content generator" include or refer to various image processing operations corresponding to image modification, filtering, AR content generators, media overlays, transformations, etc., and may also include the playback of audio or music content during the presentation of AR content or media content, as further described herein.

[0027] Figure 1 is a block diagram illustrating an example messaging system 100 for exchanging data (e.g., messages and associated content) over a network. The messaging system 100 includes multiple instances of client devices 102, each instance hosting multiple applications including a messaging client application 104. Each messaging client application 104 is communicatively coupled to other instances of the messaging client application 104 and a messaging server system 108 via a network 106 (e.g., the Internet).

[0028] The messaging client application 104 can communicate and exchange data with another messaging client application 104 and the messaging server system 108 via the network 106. The data exchanged between messaging client applications 104 and between messaging client application 104 and messaging server system 108 includes functions (e.g., commands that call functions) and payload data (e.g., text, audio, video, or other multimedia data).
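Purely as an illustration of the exchanged data described above, a unit of exchange could be modeled as a function invocation plus payload data. The `Envelope` type, its field names, and the JSON encoding are assumptions made for this sketch; the application does not specify a wire format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass
class Envelope:
    """Hypothetical unit of exchange: a command that calls a function,
    together with its payload data."""
    function: str            # name of the function to invoke
    args: Dict[str, Any]     # arguments for that function
    payload: Dict[str, Any]  # e.g., text, or references to audio/video data


def encode(envelope: Envelope) -> str:
    """Serialize an envelope to JSON for transmission over the network."""
    return json.dumps(asdict(envelope), sort_keys=True)


def decode(raw: str) -> Envelope:
    """Rebuild an envelope from its JSON form."""
    return Envelope(**json.loads(raw))
```

For example, `decode(encode(Envelope("send_message", {"to": "user-2"}, {"text": "hi"})))` yields an envelope equal to the original.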

[0029] The messaging system 100 includes an eyewear device 150, which hosts an eyewear system 160 and other applications. The eyewear device 150 is communicatively coupled to a client device 102 via a network 106, which may include a direct connection via a dedicated short-range communication path, such as Bluetooth™ or Wi-Fi.

[0030] The eyewear device 150 can be a head-mounted portable system worn by a user, including a display system (e.g., a head-mounted display device) capable of presenting a visualization of an augmented reality environment to the user. The eyewear device 150 can be powered by a battery. In an example, the display system controlled by the eyewear system 160 of the eyewear device 150 provides the user with a stereoscopic presentation of the augmented reality environment, enabling a three-dimensional visual display of a rendering of a particular scene. Furthermore, the eyewear device 150 can include various sensors, including but not limited to camera devices, image sensors, touch sensors, microphones, inertial measurement units (IMUs), heart rate sensors, temperature sensors, and other types of sensors. Moreover, the eyewear device 150 can include hardware elements, such as hardware buttons or switches, capable of receiving user input. User input detected by such sensors and/or hardware elements corresponds to various input modalities for initiating specific operations. For example, such input modalities may include, but are not limited to, face tracking, eye tracking (e.g., gaze direction), hand tracking, posture tracking, biometric readings (e.g., heart rate, pulse, pupil dilation, respiration, temperature, brain waves, smell), speech or audio recognition (e.g., specific hot words), and activation of buttons or switches.

[0031] The eyewear device 150 can be communicatively coupled to a base device such as the client device 102. Typically, such a base device may include more computing resources and/or available power than the eyewear device 150. In an example, the eyewear device 150 can operate in various modes. For example, the eyewear device 150 can operate in a standalone mode, independent of any base device.

[0032] The eyewear device 150 can also operate in a wireless tethered mode (e.g., connected wirelessly to a base device such as the client device 102), working in conjunction with a given base device. When the eyewear device 150 operates in the wireless tethered mode, at least a portion of processing user input and/or rendering the augmented reality environment can be offloaded to the base device, thereby reducing the processing burden on the eyewear device 150. For example, in one implementation, the eyewear device 150 works with the client device 102 to generate an augmented reality environment that includes physical and/or virtual objects, enabling different forms of real-time interaction (e.g., visual, auditory, and/or physical or tactile interaction) between the user and the generated augmented reality environment. In an example, the eyewear device 150 provides a rendering of a scene corresponding to an augmented reality environment that can be perceived and interacted with by the user in real time. Additionally, as part of presenting the rendered scene, the eyewear device 150 can provide the user with sound, haptic, or tactile feedback. The content of a given rendered scene can depend on available processing power, network availability and capacity, available battery power, and the current system workload.

[0033] In one implementation, the eyewear system 160 generates a message that includes a recording of the real environment and generates an augmented reality environment that includes two-dimensional (2D) video for sharing and playback. In another implementation, the eyewear system 160 generates a message and subsequently generates a three-dimensional (3D) representation that incorporates information from all sensors and/or combines the recording with messages from other users (e.g., from different points of view (POVs)). It should also be understood that the client device 102 can generate such an augmented reality environment either in conjunction with or independently of the eyewear device 150.

[0034] As the user wearing the eyewear device 150 moves around, the eyewear system 160 automatically or selectively moves augmented reality or virtual reality content from one virtual location to another. For example, the user or wearer of the eyewear device 150 may initially look towards a first part of the real-world environment (e.g., the first room in a house). The user can provide input (e.g., using a voice-activated or touch-activated interface of the client device 102 or the eyewear device 150) to initiate or access virtual content comprising one or more objects.

[0035] The messaging server system 108 provides server-side functionality to a particular messaging client application 104 via the network 106. While some functions of the messaging system 100 are described herein as being performed by the messaging client application 104 or by the messaging server system 108, the location of certain functions within the messaging client application 104 or the messaging server system 108 is a design choice. For example, it may be technically preferable to initially deploy certain technologies and functions within the messaging server system 108 and later migrate them to the messaging client application 104 where the client device 102 has sufficient processing capability.

[0036] The messaging server system 108 supports various services and operations provided to the messaging client application 104. Such operations include sending data to and receiving data from the messaging client application 104, and processing data generated by the messaging client application 104. For example, this data may include message content, client device information, geolocation information, media annotations and overlays, message content persistence conditions, social network information, and live event information. Data exchange within the messaging system 100 is invoked and controlled through functions available via the user interface (UI) of the messaging client application 104.

[0037] Turning now specifically to the messaging server system 108, an application programming interface (API) server 110 is coupled to an application server 112 and provides a programming interface to the application server 112. The application server 112 is communicatively coupled to a database server 118, which facilitates access to a database 120 that stores data associated with messages processed by the application server 112.

[0038] The application programming interface (API) server 110 receives and transmits message data (e.g., commands and message payloads) between the client device 102 and the application server 112. Specifically, the API server 110 provides a set of interfaces (e.g., routines and protocols) that the messaging client application 104 can call or query to invoke functions of the application server 112. The API server 110 exposes various functions supported by the application server 112, including: account registration; login functionality; sending messages from one messaging client application 104 to another messaging client application 104 via the application server 112; sending media files (e.g., images or videos) from a messaging client application 104 to the messaging server application 114 for possible access by another messaging client application 104; setting up a collection of media data (e.g., a story); retrieving the friend list of the user of the client device 102; retrieving such collections; retrieving messages and content; adding friends to and deleting friends from the social graph; locating friends within the social graph; and opening application events (e.g., involving the messaging client application 104).

[0039] Application server 112 hosts multiple applications and subsystems, including messaging server application 114, image processing system 116, and social networking system 122. Messaging server application 114 implements numerous messaging techniques and functions, particularly involving the aggregation and other processing of content (e.g., text and multimedia content) received from multiple instances of messaging client application 104. As will be described in further detail, text and media content from multiple sources can be aggregated into collections of content (e.g., referred to as stories or libraries). Messaging server application 114 then makes these collections available to messaging client application 104. Considering the hardware requirements for additional processor- and memory-intensive processing of data, such processing can also be performed on the server side by messaging server application 114.

[0040] The application server 112 also includes an image processing system 116, which is dedicated to performing various image processing operations, typically on images or videos received within the payload of a message at the messaging server application 114.

[0041] The social networking system 122 supports various social networking functions and services, and makes these functions and services available to the messaging server application 114. To this end, the social networking system 122 maintains and accesses the entity graph 304 (as shown in Figure 3) within the database 120. Examples of the functions and services supported by the social networking system 122 include identifying other users of the messaging system 100 with whom a particular user has a relationship or whom a particular user "follows", as well as identifying the interests of a particular user and other entities.

[0042] The application server 112 is communicatively coupled to the database server 118, which facilitates access to the database 120, in which data associated with messages processed by the messaging server application 114 is stored.

[0043] Figure 2 is a block diagram illustrating further details of the messaging system 100 according to an example implementation. Specifically, the messaging system 100 is shown as including the messaging client application 104 and the application server 112, which in turn include several subsystems, namely an ephemeral timer system 202, a collection management system 204, and an annotation system 206.

[0044] The ephemeral timer system 202 is responsible for enforcing temporary access to content permitted by the messaging client application 104 and the messaging server application 114. To this end, the ephemeral timer system 202 incorporates multiple timers that, based on duration and display parameters associated with a message or a collection of messages (e.g., a story), selectively display the messages and associated content and enable access to them via the messaging client application 104. Further details regarding the operation of the ephemeral timer system 202 are provided below.
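By way of a hedged example only, the time-limited access enforced by the timer system 202 could be reduced to a check of a duration parameter against the current time. The `EphemeralMessage` type, its fields, and the half-open time window are assumptions made for this sketch rather than details given in the application.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class EphemeralMessage:
    message_id: str
    posted_at: float        # seconds since an arbitrary epoch (assumed time base)
    display_seconds: float  # hypothetical duration parameter for this message


def is_accessible(message: EphemeralMessage, now: float) -> bool:
    """A message is accessible from posting until its duration has elapsed."""
    return message.posted_at <= now < message.posted_at + message.display_seconds


def visible_messages(messages: Iterable[EphemeralMessage], now: float) -> List[str]:
    """Return the identifiers of the messages that may still be displayed."""
    return [m.message_id for m in messages if is_accessible(m, now)]
```

The same check could be applied to a collection as a whole by giving the collection its own duration parameter.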

[0045] The collection management system 204 is responsible for managing collections of media (e.g., collections of text, image, video, and audio data). In some examples, collections of content (e.g., messages, including images, videos, text, and audio) can be organized into 'event libraries' or 'event stories'. Such collections can be made available for a specified time period (e.g., the duration of an event related to the content). For example, content related to a concert can be made available as a 'story' during the duration of the concert. The collection management system 204 can also be responsible for publishing icons that notify the user interface of the messaging client application 104 of the existence of a specific collection.

[0046] Furthermore, the collection management system 204 includes a curation interface 208, which enables collection managers to manage and curate specific content collections. For example, the curation interface 208 allows event organizers to curate collections of content related to a specific event (e.g., removing inappropriate content or redundant messages). Additionally, the collection management system 204 employs machine vision (or image recognition technology) and content rules to automatically curate content collections. In some embodiments, users may be paid compensation to include user-generated content in the collection. In such cases, the curation interface 208 operates to automatically pay such users for using their content.

[0047] The annotation system 206 provides various functions that enable users to annotate or otherwise modify or edit media content associated with messages. For example, the annotation system 206 provides functions related to generating and publishing media overlays for messages processed by the messaging system 100. The annotation system 206 operatively supplies media overlays or supplements (e.g., image filters) to the messaging client application 104 based on the geographic location of the client device 102. In another example, the annotation system 206 operatively supplies media overlays to the messaging client application 104 based on other information, such as the social network information of the user of the client device 102. Media overlays may include audio and visual content as well as visual effects. Examples of audio and visual content include images, text, logos, animations, and sound effects. Examples of visual effects include color overlays. Audio and visual content or visual effects may be applied to media content items (e.g., photographs) at the client device 102. For example, a media overlay may include text that can be overlaid on a photograph taken by the client device 102. In another example, a media overlay includes a location identifier overlay (e.g., Venice Beach), the name of a live event, or a merchant name overlay (e.g., a beach cafe). In another example, the annotation system 206 uses the geographic location of the client device 102 to identify a media overlay that includes the name of a merchant at that geographic location. The media overlay may include other indicia associated with the merchant. Media overlays may be stored in the database 120 and accessed through the database server 118.

[0048] In one example implementation, the annotation system 206 provides a user-based publishing platform that allows users to select geographic locations on a map and upload content associated with those locations. Users can also specify the circumstances under which particular media overlays should be provided to other users. The annotation system 206 generates media overlays that include the uploaded content and associates the uploaded content with the selected geographic location.

[0049] In another example implementation, the annotation system 206 provides a merchant-based publishing platform that enables merchants to select, through a bidding process, particular media overlays associated with geographic locations. For example, the annotation system 206 associates the media overlay of the highest bidder with the corresponding geographic location for a predefined time period.
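The selection step of such a bidding process could, for illustration, be sketched as follows. The `Bid` type and the `winning_overlays` function are hypothetical, ties are resolved in favor of the earlier bid as an arbitrary choice, and the predefined time period is left out for brevity.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Bid:
    merchant: str
    overlay_id: str
    location: str   # identifier of a geographic location (assumed to be a string key)
    amount: float


def winning_overlays(bids: Iterable[Bid]) -> Dict[str, str]:
    """Map each location to the overlay of its highest bidder."""
    best: Dict[str, Bid] = {}
    for bid in bids:
        current = best.get(bid.location)
        # Strict comparison: on a tie, the earlier bid keeps the location.
        if current is None or bid.amount > current.amount:
            best[bid.location] = bid
    return {location: bid.overlay_id for location, bid in best.items()}
```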

[0050] Figure 3 is a schematic diagram illustrating a data structure 300 that can be stored in the database 120 of the messaging server system 108 according to some example implementations. Although the contents of the database 120 are shown as including multiple tables, it should be understood that the data can be stored in other types of data structures (e.g., as an object-oriented database).

[0051] The database 120 includes message data stored in a message table 314. An entity table 302 stores entity data, including an entity graph 304. Entities for which records are maintained in the entity table 302 can include individuals, corporate entities, organizations, objects, places, events, etc. Regardless of type, any entity about which the messaging server system 108 stores data can be a recognized entity. Each entity is provided with a unique identifier as well as an entity type identifier (not shown).

[0052] Entity Graph 304 also stores information about the relationships and associations between entities. As an example only, such relationships could be social relationships or professional relationships based on interests or activities (e.g., working in a common company or organization).
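As a minimal sketch of the entity data described above, entities can be keyed by a unique identifier and a type, with relationships stored as labeled, undirected edges. The class and method names below are assumptions for illustration and do not describe the actual layout of the entity table 302 or the entity graph 304.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Entity:
    entity_id: str    # unique identifier
    entity_type: str  # e.g., "user", "company", "event"


class EntityGraph:
    """Hypothetical in-memory graph of entities and their relationships."""

    def __init__(self) -> None:
        self._entities: Dict[str, Entity] = {}
        self._edges: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def add_entity(self, entity: Entity) -> None:
        if entity.entity_id in self._entities:
            raise ValueError(f"duplicate entity id: {entity.entity_id}")
        self._entities[entity.entity_id] = entity

    def relate(self, a: str, b: str, kind: str) -> None:
        """Record a relationship of the given kind between two known entities."""
        for entity_id in (a, b):
            if entity_id not in self._entities:
                raise KeyError(entity_id)
        self._edges[a].add((kind, b))
        self._edges[b].add((kind, a))

    def related(self, entity_id: str, kind: str) -> List[str]:
        """Return the sorted ids of entities related to `entity_id` by `kind`."""
        return sorted(other for k, other in self._edges.get(entity_id, ()) if k == kind)
```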

[0053] The database 120 also stores annotation data, in the example form of filters, in an annotation table 312. Filters stored in the annotation table 312 are associated with and applied to videos (stored in a video table 310) and/or images (stored in an image table 308). In one example, a filter is an overlay that is displayed over an image or video during presentation to the recipient user. Filters can be of various types, including user-selected filters from a library of filters presented to the sending user by the messaging client application 104 when the sending user is composing a message. Other types of filters include geolocation filters (also known as geo-filters), which can be presented to the sending user based on geographic location. For example, based on geographic location information determined by the GPS unit of the client device 102, the messaging client application 104 can present neighborhood-specific or location-specific geolocation filters within the user interface. Another type of filter is a data filter, which can be selectively presented to the sending user by the messaging client application 104 based on other inputs or information collected by the client device 102 during the message creation process. Examples of data filters include the current temperature at a specific location, the current speed at which the sending user is traveling, the battery life of the client device 102, or the current time.
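For illustration, presenting location-specific geolocation filters could amount to testing the device's reported coordinates against a region attached to each filter. The rectangular latitude/longitude regions and the names `GeoFilter` and `filters_for_location` are assumptions of this sketch; the application does not state how filter regions are represented.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class GeoFilter:
    """Hypothetical geolocation filter with a rectangular coverage region."""
    name: str
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def covers(self, lat: float, lon: float) -> bool:
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


def filters_for_location(filters: Iterable[GeoFilter],
                         lat: float, lon: float) -> List[str]:
    """Return the names of the filters whose region contains the location."""
    return [f.name for f in filters if f.covers(lat, lon)]
```

A data filter could be handled in the same way, with the test replaced by a lookup of the relevant device reading (e.g., temperature, speed, or battery level).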

[0054] Other annotation data that can be stored in image table 308 includes augmented reality content generators (e.g., corresponding to applying an AR content generator, an augmented reality experience, or an augmented reality content item). An augmented reality content generator can be a real-time special effect and sound that can be added to an image or video.

[0055] As described above, augmented reality content generators, augmented reality content items, overlays, image transformations, AR images, and similar terms refer to modifications that can be made to videos or images. This includes real-time modifications, which modify an image as it is captured using the device's sensors and then display the modified image on the device's screen. It also includes modifications to stored content, such as video clips in a library that can be modified. For example, in a device that can access multiple augmented reality content generators, a user can use a single video clip with multiple augmented reality content generators to see how different generators will modify the stored clip. For example, by selecting different augmented reality content generators for the content, multiple generators applying different pseudo-random motion models can be applied to the same content. Similarly, real-time video capture can be used with the illustrated modifications to show how the modifications would alter the video images currently being captured by the device's sensors. Such data can simply be displayed on the screen and not stored in memory, or the content captured by the device's sensors can be recorded and stored in memory with or without modification (or both). In some systems, a preview feature can show how different augmented reality content generators look simultaneously in different windows on the display. This can, for example, enable the simultaneous viewing of multiple windows with different pseudo-random animations on a display.

[0056] Therefore, data and various systems that use augmented reality content generators or other such transformation systems to modify content using that data can involve: detection of objects (e.g., faces, hands, bodies, cats, dogs, surfaces, objects, etc.); tracking of such objects as they leave, enter, and move within the field of view of a video frame; and modification or transformation of such objects while they are being tracked. In various implementations, different methods can be used to implement such transformations. For example, some implementations may involve generating a three-dimensional mesh model of one or more objects and using transformations and animated textures of the model within the video to implement the transformation. In other implementations, tracking points on the object can be used to place an image or texture (which may be two-dimensional or three-dimensional) at the tracked location. In yet another implementation, neural network analysis of video frames can be used to place images, models, or textures within content (e.g., images or frames of a video). Thus, augmented reality content generators involve both images, models, and textures used to create transformations within content, and additional modeling and analysis information required to implement such transformations using object detection, tracking, and placement.

[0057] Real-time video processing can be performed using any type of video data (e.g., video streams, video files, etc.) stored in the memory of any type of computerized system. For example, a user can load video files and store them in the device's memory, or the device's sensors can be used to generate video streams. Additionally, computer-animated models can be used to process any object, such as a human face and body parts, animals, or inanimate objects (such as chairs, cars, or other objects).

[0058] In some implementations, when a specific modification is selected along with the content to be transformed, the computing device identifies the elements to be transformed and then detects and tracks them if they exist in the video frames. The elements of the object are modified according to the modification request, thereby transforming the frames of the video stream. The transformation of the video stream frames can be performed using different methods for different types of transformations. For example, for a transformation of frames that primarily involves changing the form of object elements, feature points are calculated for each element of the object (e.g., using an Active Shape Model (ASM) or other known methods). Then, for each element of at least one element of the object, a feature point-based mesh is generated. This mesh is used for subsequent stages of tracking the elements of the object in the video stream. During tracking, the aforementioned mesh for each element is aligned with the position of each element. Additional points are then generated on the mesh. A first set of first points is generated for each element based on the modification request, and a second set of points is generated for each element based on the first set of points and the modification request. The frames of the video stream can then be transformed by modifying the elements of the object based on the first set of points, the second set of points, and the mesh. In this method, the background of the modified object can also be changed or deformed by tracking and modifying the background.

[0059] In one or more embodiments, a transformation of some regions of an object using the elements of the object can be performed by calculating feature points for each element of the object and generating a mesh based on the calculated feature points. Points are generated on the mesh, and then various regions based on these points are generated. The elements of the object are then tracked by aligning the regions for each element with the positions for each of at least one element, and the frames of the video stream can be transformed by modifying the characteristics of the regions based on a request for modification. Depending on the specific request for modification, the characteristics of the mentioned regions can be transformed in different ways. Such modifications may involve: changing the color of the region; removing at least some portions of the region from the frames of the video stream; including one or more new objects into the region based on the request for modification; and modifying or deforming the elements of the region or object. In various embodiments, any combination of such modifications or other similar modifications may be used. For certain models to be animated, some feature points can be selected as control points to determine the entire state space of options for model animation.

[0060] In some implementations of computer animation models that use face detection to transform image data, a specific face detection algorithm (e.g., Viola-Jones) is used to detect faces in the image. Then, an Active Shape Model (ASM) algorithm is applied to the facial regions of the image to detect facial feature reference points.

[0061] In other implementations, other methods and algorithms suitable for face detection can be used. For example, in some implementations, features are located using landmarks representing distinguishable points present in most of the images considered. For example, for facial landmarks, the location of the left pupil could be used. Secondary landmarks can be used if the initial landmarks are not identifiable (e.g., if the person is wearing an eye patch). Such a landmark identification process can be used for any such object. In some implementations, the set of landmarks forms a shape. The shape can be represented as a vector using the coordinates of the points in the shape. One shape is aligned with another shape using a similarity transformation (allowing translation, scaling, and rotation) that minimizes the average Euclidean distance between the points of the shapes. The average shape is the average of the aligned training shapes.
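
As a non-limiting illustration only, aligning one landmark shape with another using a similarity transformation could be sketched as follows. The sketch assumes two-dimensional landmarks in matching order and minimizes the sum of squared point distances, a common least-squares stand-in for the distance criterion mentioned above. Points are treated as complex numbers so that scaling and rotation reduce to multiplication by a single complex factor. The function names are hypothetical.

```python
def align_similarity(shape: list, target: list) -> list:
    """Map `shape` onto `target` with the least-squares similarity transform.

    Both arguments are lists of (x, y) landmarks in corresponding order.
    """
    z = [complex(x, y) for x, y in shape]
    w = [complex(x, y) for x, y in target]
    n = len(z)
    z_mean = sum(z) / n
    w_mean = sum(w) / n
    zc = [p - z_mean for p in z]  # centred source points
    wc = [p - w_mean for p in w]  # centred target points
    # Complex factor encoding scale and rotation that minimizes sum |a*zc - wc|^2.
    a = sum(p.conjugate() * q for p, q in zip(zc, wc)) / sum(abs(p) ** 2 for p in zc)
    aligned = [a * p + w_mean for p in zc]  # translation moves the centroid onto the target's
    return [(p.real, p.imag) for p in aligned]


def mean_shape(shapes: list) -> list:
    """Point-wise average of shapes that have already been aligned."""
    n = len(shapes)
    return [
        (sum(s[i][0] for s in shapes) / n, sum(s[i][1] for s in shapes) / n)
        for i in range(len(shapes[0]))
    ]
```

In a training setting, each training shape would first be aligned to a common reference with `align_similarity`, and `mean_shape` would then be applied to the aligned set.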

[0062] In some implementations, the search begins with a landmark search based on an average shape aligned with the position and size of the face determined by a global face detector. This search then repeats the following steps: adjusting the position of shape points to suggest a provisional shape by template matching the image texture around each point, and then conforming the provisional shape to a global shape model until convergence occurs. In some systems, individual template matching is unreliable, and the shape model aggregates the results of weak template matchers to form a stronger overall classifier. The entire search is repeated at each level of the image pyramid, from coarse to fine resolution.

[0063] The transformation system can be implemented by capturing image or video streams on a client device (e.g., client device 102) and performing complex image manipulations locally on client device 102 while maintaining an appropriate user experience, computation time, and power consumption. Complex image manipulations can include size and shape changes, emotion transformations (e.g., changing a face from frowning to smiling), state transformations (e.g., aging a subject, reducing apparent age, changing gender), style transformations, application of graphical elements, and any other suitable image or video manipulations implemented by a convolutional neural network that has been configured to execute efficiently on client device 102.

[0064] In some example implementations, a computer animation model for transforming image data can be used by a system in which a user can capture an image or video stream (e.g., a selfie) using a client device 102 having a neural network operating as part of a messaging client application 104 operating on the client device 102. A transformation system operating within the messaging client application 104 determines the presence of a face within the image or video stream and provides modification icons associated with the computer animation model for transforming the image data, or the computer animation model may exist in association with the interface described herein. The modification icons include changes that may be used to modify the user's face within the image or video stream as part of a modification operation. Once a modification icon is selected, the transformation system initiates processing to transform the user's image to reflect the selected modification icon (e.g., generating a smiling face on the user). In some implementations, once an image or video stream is captured and a specified modification is selected, the modified image or video stream can be presented in a graphical user interface displayed on a mobile client device. The transformation system may implement a complex convolutional neural network on a portion of the image or video stream to generate and apply the selected modification. In other words, the user can capture an image or video stream and, once a modification icon is selected, be presented with the modified result in real time or near real time. Furthermore, the modification can be persistent while the video stream is being captured and the selected modification icon remains toggled. Machine-trained neural networks can be used to achieve such modifications.

[0065] In some implementations, the graphical user interface (GUI) presenting the modifications performed by the transformation system may offer the user additional interactive options. Such options may be based on the interface used to initiate content capture and selection of a specific computer animation model (e.g., initiated from a content creator GUI). In various implementations, the modification may be persistent after the initial selection of the modification icon. The user can toggle the modification on or off by tapping or otherwise selecting the face being modified by the transformation system, and can save the result for later viewing or browse to other areas of the imaging application. In cases where the transformation system modifies multiple faces, the user can globally toggle the modification on or off by tapping or selecting a single face modified and displayed within the GUI. In some implementations, individual faces within a group of multiple faces can be modified separately, or such modifications can be toggled individually by tapping or selecting individual faces or a series of individual faces displayed within the GUI.

[0066] In some example implementations, a graphics processing pipeline architecture is provided that enables the application of different augmented reality experiences (e.g., AR content generators) in corresponding different layers. Such a graphics processing pipeline provides a scalable rendering engine for providing multiple augmented reality experiences included in composite media (e.g., images or videos) or composite AR content for rendering by messaging client application 104 (or messaging system 100).

[0067] As mentioned above, video table 310 stores video data, which in one embodiment is associated with messages maintained in message table 314. Similarly, image table 308 stores image data associated with messages stored in entity table 302. Entity table 302 can associate various annotations from annotation table 312 with various images and videos stored in image table 308 and video table 310.

[0068] Story table 306 stores data related to collections of messages and associated image, video, or audio data, compiled into collections (e.g., stories or libraries). The creation of a specific collection can be initiated by a specific user (e.g., each user whose record is maintained in entity table 302). A user can create a 'personal story' in the form of a collection of content that has already been created and sent / broadcast by that user. For this purpose, the user interface of messaging client application 104 may include user-selectable icons that allow the sending user to add specific content to his or her personal story.

[0069] Collections can also constitute 'live stories,' which are collections of content from multiple users created manually, automatically, or using a combination of manual and automatic techniques. For example, a 'live story' can constitute a curated stream of user-submitted content from different locations and events. Users whose client devices have location services enabled and who are present at a common location or event at a specific time can be presented with an option, for example, via the user interface of messaging client application 104, to contribute content to a specific live story. The messaging client application 104 can identify live stories to users based on their location. The end result is a 'live story' told from a community perspective.

[0070] Another type of content collection is called a 'location story', which allows users whose client devices 102 are located in a specific geographic location (e.g., on a college or university campus) to contribute to a specific collection. In some implementations, contributing to a location story may require a second level of authentication to verify that the end user belongs to a specific organization or other entity (e.g., is a student on a university campus).

[0071] Figure 4 is a schematic diagram illustrating the structure of a message 400, according to some embodiments, generated by a messaging client application 104 or an eyewear system 160 for transmission to another messaging client application 104 or to the messaging server application 114. The content of a particular message 400 is used to populate the message table 314 stored in the database 120, which is accessible to the messaging server application 114. Similarly, the content of message 400 is stored in memory as 'in-transit' or 'in-flight' data of the client device 102 or the application server 112. Message 400 is shown as including the following elements:

[0072] Message Identifier 402: A unique identifier that identifies message 400.

[0073] Message text payload 404: Text generated by the user via the user interface of the client device 102 and included in message 400.

[0074] Message image payload 406: Image data captured by the camera component of the client device 102 or retrieved from the memory component of the client device 102 and included in message 400.

[0075] Message video payload 408: Video data captured by the camera device component or retrieved from the memory component of the client device 102 and included in message 400.

[0076] Message audio payload 410: Audio data captured by the microphone or retrieved from the memory component of the client device 102 and included in message 400.

[0077] Message annotation 412: Annotation data (e.g., filters, stickers or other enhancements) representing annotations to be applied to message image payload 406, message video payload 408 or message audio payload 410 of message 400.

[0078] Message duration parameter 414: A parameter value, in seconds, indicating the amount of time for which the content of the message (e.g., message image payload 406, message video payload 408, message audio payload 410) will be presented or made accessible to the user via the messaging client application 104.

[0079] Message geolocation parameter 416: Geographic location data (e.g., latitude and longitude coordinates) associated with the content payload of the message. Multiple message geolocation parameter 416 values may be included in the payload, each of which is associated with a content item included in the content (e.g., a specific image within the message image payload 406 or a specific video within the message video payload 408).

[0080] Message story identifier 418: Identifier values that identify one or more content collections (e.g., 'stories') with which a specific content item in the message image payload 406 of message 400 is associated. For example, multiple images within the message image payload 406 may each be associated with multiple content collections using identifier values.

[0081] Message Tag 420: Each message 400 can be labeled with multiple tags, each tag indicating the subject of the content included in the message payload. For example, in the case where a specific image included in the message image payload 406 depicts an animal (e.g., a lion), the tag value can be included within the message tag 420 indicating the relevant animal. The tag value can be manually generated based on user input or can be automatically generated using, for example, image recognition.

[0082] Message sender identifier 422: An identifier (e.g., a messaging system identifier, email address, or device identifier) indicating the user of the client device 102 on which message 400 is generated and from which message 400 is sent.

[0083] Message receiver identifier 424: An identifier (e.g., a messaging system identifier, email address, or device identifier) indicating the user of the client device 102 to which message 400 is addressed.

[0084] The content (e.g., values) of each element of message 400 can be pointers to locations in tables where content data values are stored. For example, image values in message image payload 406 can be pointers to locations (or their addresses) within image table 308. Similarly, values in message video payload 408 can point to data stored in video table 310, values in message annotation 412 can point to data stored in annotation table 312, values in message story identifier 418 can point to data stored in story table 306, and values in message sender identifier 422 and message receiver identifier 424 can point to user records stored in entity table 302.
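
As a non-limiting illustration only, the elements of message 400 and the pointer-style references into the tables could be sketched as follows. The field names, the use of string keys as references, and the `resolve` helper are hypothetical; the reference numerals in the comments indicate which described element each field stands in for.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """Illustrative stand-in for message 400."""
    message_id: str                                       # message identifier 402
    sender_id: str                                        # message sender identifier 422
    receiver_id: str                                      # message receiver identifier 424
    text: str = ""                                        # message text payload 404
    image_refs: list = field(default_factory=list)        # 406: keys into image table 308
    video_refs: list = field(default_factory=list)        # 408: keys into video table 310
    audio_refs: list = field(default_factory=list)        # message audio payload 410
    annotation_refs: list = field(default_factory=list)   # 412: keys into annotation table 312
    duration_s: int = 0                                   # message duration parameter 414
    geolocations: list = field(default_factory=list)      # 416: (latitude, longitude) pairs
    story_ids: list = field(default_factory=list)         # 418: keys into story table 306
    tags: list = field(default_factory=list)              # message tag 420


def resolve(refs: list, table: dict) -> list:
    """Follow each reference into its table, skipping references with no entry."""
    return [table[ref] for ref in refs if ref in table]
```

Storing references rather than the media itself keeps the message record small; the same image entry can then be referenced by several messages or content collections.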

[0085] Figure 5 shows a front perspective view of an eyewear device 150 in the form of a pair of smart glasses, including an eyewear system 160, according to an example embodiment. The eyewear device 150 includes a body 503, which includes a front component or frame 506 and a pair of temples 509 connected to the frame 506 to support the frame 506 in position on the user's face when the eyewear device 150 is worn. The frame 506 can be made of any suitable material, such as plastic or metal, including any suitable shape memory alloy.

[0086] The eyewear device 150 includes a pair of optical elements in the form of a pair of lenses 512, held by corresponding optical element holders in the form of a pair of rims 515 forming part of the frame 506. The rims 515 are connected via a nose bridge 518. In other embodiments, one or both of the optical elements may be a display, a display assembly, or a combination of a lens and a display.

[0087] Frame 506 includes a pair of end members 521 defining the lateral ends of frame 506. In this example, various electronic components are housed in one or both of the end members 521. Temples 509 are coupled to the respective end members 521. In this example, temples 509 are coupled to frame 506 via corresponding hinges to allow hinged movement between a wearable mode and a folded mode, in which the temples 509 pivot toward frame 506 to rest substantially flat against frame 506. In other embodiments, temples 509 may be coupled to frame 506 by any suitable means, or may be rigidly or fixedly attached to frame 506 to form an integral part thereof.

[0088] Each of the temples 509 includes a front portion that attaches to the frame 506 and any suitable rear portion for attachment to the user's ear, such as the curved or pointed piece shown in the example embodiment of Figure 5. In some embodiments, the frame 506 is formed from a single piece of material so as to have a uniform or monolithic construction. In some embodiments, the entire body 503 (including both the frame 506 and the temples 509) may be of uniform or monolithic construction.

[0089] The eyewear device 150 has onboard electronic components, including a computing device such as a computer 524 or a low-power processor, which in different embodiments can be of any type suitable to be carried by the body 503. In some embodiments, the computer 524 is at least partially housed in one or both of the temples 509. In this embodiment, the various components of the computer 524 are housed in the lateral end members 521 of the frame 506. The computer 524 includes one or more processors with memory (e.g., volatile storage devices such as random access memory or registers), storage devices (e.g., non-volatile storage devices), wireless communication circuitry (e.g., BLE communication devices and/or WiFi Direct devices), and a power supply. The computer 524 includes low-power circuitry, high-speed circuitry, and, in some embodiments, a display processor. Various embodiments may include these elements in different configurations or integrated in different ways.

[0090] The computer 524 further includes a battery 527 or other suitable portable power supply. In one embodiment, the battery 527 is disposed in one of the temples 509. In the eyewear device 150 shown in Figure 5, the battery 527 is shown as being disposed in one of the end members 521 and electrically coupled to the remainder of the computer 524 housed in the corresponding end member 521.

[0091] The eyewear device 150 supports camera functionality, and in this example includes a camera device 530, which is mounted in one of the end members 521 and faces forward so as to be more or less aligned with the line of sight of the wearer of the eyewear device 150. The camera device 530 is configured to capture digital images (also referred to herein as digital photographs or pictures) and digital video content. The operation of the camera device 530 is controlled by a camera device controller provided by the computer 524, with image data representing the images or videos captured by the camera device 530 being temporarily stored in a memory forming part of the computer 524. In some embodiments, the eyewear device 150 may have, for example, a pair of camera devices 530 housed in the corresponding end members 521.

[0092] As will be described in more detail below, the onboard computer 524 and the lenses 512 are configured together to provide an eyewear system 160 that automatically and selectively recenters virtual content by moving it from a first virtual position to a second virtual position, bringing the virtual content into the field of view of the lenses 512. Specifically, the lenses 512 may display virtual content or one or more virtual objects. This makes the virtual content appear to the user as integrated into the real-world environment viewed through the lenses 512. In some embodiments, the virtual content is received from client device 102. In some embodiments, the virtual content is received directly from application server 112.

[0093] The eyewear device 150 includes an accelerometer, a touch interface, and a voice command system. Based on input received by the eyewear device 150 from the accelerometer, touch interface, and voice command system, the eyewear device 150 can control the user's interaction with virtual content. In one example, user interaction can control the playback of content presented on the lenses 512. In another example, user interaction can browse playlists or music or video libraries. In yet another example, user interaction can navigate through a conversation in which the user is participating, for example, by scrolling through various three-dimensional or two-dimensional conversation elements (e.g., chat bubbles) and selecting individual conversation elements to generate messages to be sent to the participants in the conversation.

[0094] The eyewear system 160 (which may be implemented by the computer 524) assigns virtual content to virtual locations. The eyewear system 160 monitors the current virtual location within the field of view of the real-world environment. The eyewear system 160 retrieves virtual content for display within a specified range of the current virtual location within the field of view. When the eyewear device 150 moves to point to a new portion of the real-world environment associated with a different set of virtual locations, the eyewear system 160 excludes any virtual content that is not within the range of that different set of virtual locations. For example, when the eyewear device 150 moves to point to a new portion of the real-world environment that does not overlap with a previously displayed portion of the real-world environment, the eyewear system 160 excludes any virtual content that is not within the range of that different set of virtual locations.

[0095] The eyewear system 160 can receive a request to bring virtual content into the current field of view. In response, the eyewear system 160 updates the assigned virtual position associated with the virtual content to a virtual position associated with the current field of view of the real-world environment. As a result, the virtual content is moved from outside the field of view into the current field of view, allowing the user to interact with the virtual content. In some cases, the user may interact only with virtual content that is within the field of view of the lenses 512. If the user moves to face a different direction that causes the virtual content to disappear from the field of view, user input can no longer control or interact with the previously displayed virtual content until the virtual content is brought back into the field of view.
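
As a non-limiting illustration only, the selection of virtual content within a specified range of the current virtual location, and the recentering of content on request, could be sketched as follows. The sketch assumes that each virtual location is reduced to a single horizontal angle in degrees, which is a simplification made for the example; the function names are hypothetical.

```python
def angular_distance(a_deg: float, b_deg: float) -> float:
    """Smallest absolute angle between two headings, with wraparound at 360 degrees."""
    return abs((a_deg - b_deg + 180.0) % 360.0 - 180.0)


def visible_content(positions: dict, heading_deg: float, range_deg: float) -> list:
    """Names of content items whose virtual location lies within range of the heading."""
    return sorted(
        name for name, pos in positions.items()
        if angular_distance(pos, heading_deg) <= range_deg
    )


def recenter(positions: dict, name: str, heading_deg: float) -> dict:
    """Return a copy of the positions with one item moved to the current heading."""
    updated = dict(positions)
    updated[name] = heading_deg % 360.0
    return updated
```

Content outside the range is simply not returned by `visible_content`, which corresponds to excluding it from display; calling `recenter` for an item and then recomputing the visible set brings that item back into view.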

[0096] The eyewear device 150 also includes one or more communication devices, such as a Bluetooth Low Energy (BLE) communication interface. This BLE communication interface enables the eyewear device 150 to communicate wirelessly with the client device 102. Alternatively, or in addition to the BLE communication interface, other forms of wireless communication, such as a WiFi Direct interface, may be used. The BLE communication interface implements a number of standard BLE communication protocols.

[0097] A first communication protocol implemented by the BLE interface of the eyewear device 150 enables the establishment of an unencrypted link between the eyewear device 150 and the client device 102. In this first protocol, the link-layer communication (physical interface or medium) between the eyewear device 150 and the client device 102 includes unencrypted data. In this first protocol, the application layer (the communication layer that operates on the physically exchanged data) encrypts and decrypts the data that is physically exchanged in unencrypted form at the link layer of the BLE communication interface. In this way, the data exchanged at the physical layer can be freely read by an eavesdropping device, but the eavesdropping device cannot decipher the exchanged data without performing a decryption operation at the application layer.

[0098] A second communication protocol implemented by the BLE interface of the eyewear device 150 enables the establishment of an encrypted link between the eyewear device 150 and the client device 102. In this second protocol, the link-layer communication (physical interface) between the eyewear device 150 and the client device 102 receives data from the application layer and adds a first type of encryption to the data before exchanging the data over the physical medium. In this second protocol, the application layer (the communication layer that operates on the physically exchanged data) may or may not encrypt and decrypt the data that is physically exchanged in encrypted form, using the first type of encryption, at the link layer of the BLE communication interface. That is, the data can be encrypted first by the application layer and then further encrypted by the physical layer before being exchanged over the physical medium. After being exchanged over the physical medium, the data is decrypted by the physical layer and then decrypted again by the application layer (e.g., using a different type of encryption). In this way, because the data is encrypted over the physical medium, the data exchanged over the physical layer cannot be read by eavesdropping devices.

[0099] In some implementations, client device 102 uses the first protocol to communicate with eyewear device 150 in order to exchange images, videos, or virtual content between messaging client application 104 and eyewear device 150.

[0100] As described above, media overlays, such as AR content generators, overlays, image transformations, AR images, and similar terms, refer to modifications that can be made to video or images. This includes real-time modifications, which modify an image as it is captured using the device's sensors and then display the modified image on the device's screen. It also includes modifications to stored content, such as video clips in a library that can be modified. For example, in a device that can access multiple media overlays (e.g., AR content generators), a user can use multiple AR content generators on a single video clip to see how different AR content generators will modify the stored clip. For example, by selecting different AR content generators for the same content, multiple AR content generators applying different pseudo-random motion models can be applied to that same content. Similarly, real-time video capture can be used in conjunction with the illustrated modifications to show how the modifications would alter the video images currently being captured by the device's sensors. Such data can simply be displayed on the screen without being stored in memory, or the content captured by the device's sensors can be recorded and stored in memory with or without modification (or both). In some systems, a preview feature can show how different AR content generators look simultaneously in different windows on the display. For example, this can make it possible to view multiple windows with different pseudo-random animations on the display at the same time.

[0101] Therefore, data and various systems that use AR content generators or other such transformation systems to modify content can involve: detection of objects (e.g., faces, hands, bodies, cats, dogs, surfaces, objects, etc.); tracking of such objects as they leave, enter, and move within the field of view of a video frame; and modification or transformation of such objects while they are being tracked. In various implementations, different methods can be used to implement such transformations. For example, some implementations may involve generating a 3D mesh model of one or more objects and using transformations of the model and animated textures in the video to implement the transformation. In other implementations, tracking points on an object can be used to place an image or texture (which may be two-dimensional or three-dimensional) at the tracked location. In a further implementation, neural network analysis of video frames can be used to place images, models, or textures within content (e.g., images or video frames). Thus, lens data involves both images, models, and textures used to create transformations within the content, and additional modeling and analysis information required to implement such transformations using object detection, tracking, and placement.

[0102] Real-time video processing can be performed using any type of video data (e.g., video streams, video files, etc.) stored in the memory of any type of computerized system. For example, a user can load a video file and store it in the device's memory, or a video stream can be generated using the device's sensors. Furthermore, computer-animated models can be used to process any object, such as a human face and parts of the human body, animals, or inanimate objects (such as chairs, cars, or other objects).

[0103] In some implementations, when a specific modification is selected along with the content to be transformed, the computing device identifies the elements to be transformed and then detects and tracks those elements if they are present in the frames of the video. The elements of the object are modified according to the modification request, thus transforming the frames of the video stream. The transformation of the video stream frames can be performed using different methods for different kinds of transformations. For example, for frame transformations that primarily involve changing the form of an object's elements, feature points are calculated for each element of the object (e.g., using an Active Shape Model (ASM) or other known methods). A mesh based on the feature points is then generated for each of the at least one element of the object. This mesh is used in the subsequent stage of tracking the elements of the object in the video stream. During tracking, the mesh for each element is aligned with the position of that element. Additional points are then generated on the mesh. A set of first points is generated for each element based on the modification request, and a set of second points is generated for each element based on the set of first points and the modification request. The frames of the video stream can then be transformed by modifying the elements of the object based on the sets of first and second points and the mesh. In this method, the background of the modified object can also be changed or deformed by tracking and modifying the background.

[0104] In one or more embodiments, transformations that alter some regions of an object using its elements can be performed by calculating feature points for each element of the object and generating a mesh based on the calculated feature points. Points are generated on the mesh, and then various regions based on these points are generated. The elements of the object are then tracked by aligning the regions for each element with positions for each of at least one element, and the properties of the regions can be modified based on requests for modification, thereby transforming frames of the video stream. Depending on the specific request for modification, the characteristics of the mentioned regions can be transformed in different ways. Such modifications may involve: changing the color of the region; removing at least some portions of the region from frames of the video stream; including one or more new objects into the region based on the request for modification; and modifying or deforming the elements of the region or object. In various embodiments, any combination of such modifications or other similar modifications can be used. For certain models to be animated, some feature points can be selected as control points to determine the entire state space of options for model animation.

[0105] In some implementations of computer animation models that use face detection to transform image data, a specific face detection algorithm (e.g., Viola-Jones) is used to detect faces in the image. Then, an Active Shape Model (ASM) algorithm is applied to the facial regions of the image to detect facial feature reference points.

[0106] In other implementations, other methods and algorithms suitable for face detection can be used. For example, in some implementations, features are located using landmarks representing distinguishable points present in most of the images considered. For example, for facial landmarks, the location of the left pupil could be used. Secondary landmarks can be used if the initial landmarks are not identifiable (e.g., if the person is wearing an eye patch). Such a landmark identification process can be used for any such object. In some implementations, the set of landmarks forms a shape. The shape can be represented as a vector using the coordinates of the points in the shape. One shape is aligned with another shape using a similarity transformation (allowing translation, scaling, and rotation) that minimizes the average Euclidean distance between the points of the shapes. The average shape is the average of the aligned training shapes.
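The shape alignment described above can be sketched as follows. This is a minimal illustration only, assuming 2D landmarks and a least-squares criterion (it minimizes the sum of squared point distances, a common stand-in for the average-distance criterion mentioned above). The function names `align_similarity` and `mean_shape` are hypothetical and do not come from the described system.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def align_similarity(shape: List[Point], target: List[Point]) -> List[Point]:
    """Align `shape` to `target` with a least-squares similarity transform
    (translation, uniform scaling, rotation). Points are treated as complex
    numbers so that scale and rotation become a single complex factor."""
    if len(shape) != len(target) or len(shape) < 2:
        raise ValueError("shapes need the same number (>= 2) of points")
    src = [complex(x, y) for x, y in shape]
    dst = [complex(x, y) for x, y in target]
    n = len(src)
    src_mean = sum(src) / n
    dst_mean = sum(dst) / n
    src_c = [z - src_mean for z in src]  # centered source points
    dst_c = [z - dst_mean for z in dst]  # centered target points
    denom = sum(abs(z) ** 2 for z in src_c)
    if denom == 0:
        raise ValueError("degenerate shape: all points coincide")
    # |a| is the scale and arg(a) is the rotation angle.
    a = sum(z.conjugate() * w for z, w in zip(src_c, dst_c)) / denom
    aligned = [a * z + dst_mean for z in src_c]
    return [(z.real, z.imag) for z in aligned]


def mean_shape(shapes: List[List[Point]]) -> List[Point]:
    """Average a list of already-aligned shapes point by point."""
    n = len(shapes)
    return [
        (sum(s[i][0] for s in shapes) / n, sum(s[i][1] for s in shapes) / n)
        for i in range(len(shapes[0]))
    ]
```

In this sketch, an average shape would be obtained by aligning each training shape to a common reference with `align_similarity` and then passing the aligned shapes to `mean_shape`.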

[0107] In some implementations, a search for landmarks starts from the average shape aligned to the position and size of the face determined by a global face detector. The search then repeats the following steps until convergence occurs: suggesting a provisional shape by adjusting the positions of the shape points through template matching of the image texture around each point, and then conforming the provisional shape to a global shape model. In some systems, individual template matches are unreliable, and the shape model pools the results of the weak template matchers to form a stronger overall classifier. The entire search is repeated at each level of an image pyramid, from coarse to fine resolution.

[0108] The transformation system can be implemented by capturing image or video streams on a client device and performing complex image manipulations locally on the client device (such as client device 102) while maintaining an appropriate user experience, computation time, and power consumption. Complex image manipulations can include size and shape changes, emotion transformations (e.g., changing a face from frowning to smiling), state transformations (e.g., aging a subject, reducing apparent age, changing gender), style transformations, application of graphical elements, and any other suitable image or video manipulations implemented by a convolutional neural network that has been configured to execute efficiently on the client device.

[0109] In some example implementations, a computer animation model for transforming image data can be used by a system in which a user can capture an image or video stream (e.g., a selfie) using a client device 102 having a neural network operating as part of a messaging client application 104 operating on the client device 102. A transformation system operating within the messaging client application 104 determines the presence of a face within the image or video stream and provides modification icons associated with the computer animation model for transforming the image data, or the computer animation model may be present in association with the interface described herein. The modification icons include changes that may serve as the basis for modifying the user's face within the image or video stream as part of a modification operation. Once a modification icon is selected, the transformation system initiates processing to transform the user's image to reflect the selected modification icon (e.g., generating a smiling face on the user). In some implementations, once an image or video stream is captured and a specified modification is selected, the modified image or video stream can be presented in a graphical user interface displayed on the mobile client device. The transformation system may implement a complex convolutional neural network on a portion of the image or video stream to generate and apply the selected modification. In other words, the user can capture an image or video stream and, once a modification icon is selected, be presented with the modified result in real time or near real time. Furthermore, the modification can persist while the video stream is being captured and the selected modification icon remains toggled. Machine-trained neural networks can be used to achieve such modifications.

[0110] In some implementations, the graphical user interface (GUI) presenting the modifications performed by the transformation system may offer the user additional interactive options. Such options may be based on the interface used to initiate content capture and selection of a specific computer animation model (e.g., initiated from a content creator GUI). In various implementations, the modification may persist after the initial selection of the modification icon. The user can toggle the modification on or off by tapping or otherwise selecting the face being modified by the transformation system, and can store it for later viewing or browse to other areas of the imaging application. In cases where the transformation system modifies multiple faces, the user can globally toggle the modification on or off by tapping or selecting a single face modified and displayed within the GUI. In some implementations, individual faces within a group of multiple faces can be modified separately, or such modifications can be toggled individually by tapping or selecting an individual face or a series of individual faces displayed within the GUI.

[0111] In some example implementations, a graphics processing pipeline architecture is provided that enables the application of different media overlays in corresponding different layers. Such a graphics processing pipeline provides a scalable rendering engine for providing multiple augmented reality content generators included in composite media (e.g., images or videos) or composite AR content rendered by messaging client application 104 (or messaging system 100).

[0112] As discussed herein, the subject infrastructure supports the creation and sharing of interactive messages with interactive effects across all the various components of messaging system 100. In the example, to provide such interactive effects, a given interactive message may include image data as well as 2D or 3D data. The infrastructure described herein enables other forms of 3D and interactive media (e.g., 2D media content) to be provided across the subject system, allowing such interactive media to be shared across messaging system 100 along with photo and video messages. In the example implementation described herein, messages may enter the system from a live camera device or from storage (e.g., where messages with 2D or 3D content or augmented reality (AR) effects (e.g., 3D effects or other interactive effects) are stored in memory or a database). In the example of interactive messages with 3D data, the subject system supports motion sensor input and manages the transmission and storage of 3D data, as well as the loading of external effects and asset data.

[0113] As described above, interactive messages include images combined with 2D or 3D effects and depth data. In the example implementation, the subject system is used to render the message so as to visualize, in addition to conventional image textures, the spatial detail and geometry as seen by the camera device. When a viewer interacts with the message by moving their client device, the movement triggers a corresponding change in the viewpoint from which the image and geometry are rendered to the viewer.

[0114] In this implementation, the system provides AR effects (which may include 3D effects using 3D data or interactive 2D effects without using 3D data) that work in conjunction with other components of the system to provide particles, shaders, 2D assets, and 3D geometry that can occupy different 3D planes within a message. In the example, AR effects as described herein are rendered to the user in real-time.

[0115] As referred to herein, gyroscope-based interaction refers to a type of interaction in which the rotation of a given client device is used as input to change an aspect of an effect (e.g., rotating a phone about the x-axis to change the color of light in a scene).

[0116] As referred to herein, an augmented reality content generator refers to content that can add real-time special effects and/or sound to a message and that modifies images and/or 3D data with AR effects and/or other 3D content (such as 3D animated graphical elements), 3D objects (e.g., non-animated), and the like.

[0117] The following discussion relates to example data that is stored in connection with such a message according to some implementations.

[0118] Figure 6 is a schematic diagram illustrating the structure of message annotation 412, as described above with reference to Figure 4, including additional information corresponding to a given message, generated by the messaging client application 104 or the eye-wearing system 160, according to some embodiments.

[0119] In an implementation, as shown in Figure 3, the content of a specific message 400, including the additional data shown in Figure 6, is used to populate the message table 314 stored in database 120 for a given message, which is then accessible by the messaging client application 104. As shown in Figure 6, message annotation 412 includes the following elements corresponding to various data:

• Augmented reality (AR) content identifier 652: an identifier of the AR content generator used in the message.
• Message identifier 654: an identifier of the message.
• Asset identifiers 656: a set of identifiers for assets in the message. For example, a respective asset identifier may be included for each asset determined by a specific AR content generator. In an implementation, such assets are created by the AR content generator on the sender-side client device, uploaded to the messaging server application 114, and used on the receiver-side client device to recreate the message. Examples of typical assets include:
    o The raw still RGB images captured by the camera device
    o The post-processed images with AR content generator effects applied to the original images
• Augmented reality (AR) content metadata 658: additional metadata associated with the AR content generator corresponding to AR identifier 652, such as:
    o AR content generator category: corresponds to the type or category of a specific AR content generator.
    o AR content generator carousel index
    o Carousel group: can be populated and utilized when an eligible post-capture AR content generator is inserted into the carousel interface. In an implementation, a new value "AR_DEFAULT_GROUP" (e.g., a default group assigned to an AR content generator) can be added to the list of valid group names.

[0120] • Capture metadata 660, corresponding to additional metadata, such as:
    o Camera device image metadata
        • Camera device intrinsic data: focal length, principal point
        • Other camera device information (e.g., camera device location)
    o Sensor information
        • Gyroscope sensor data
        • Positioning sensor data
        • Accelerometer sensor data
        • Other sensor data
        • Position sensor data
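For illustration only, the elements of message annotation 412 listed above could be modeled as a nested record. The sketch below is an assumption about one possible layout; the class and field names are hypothetical, and only the reference numerals and the "AR_DEFAULT_GROUP" value come from the description.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ARContentMetadata:
    """AR content metadata 658 (field names are illustrative)."""
    category: str
    carousel_index: int
    # Assumed default, based on the "AR_DEFAULT_GROUP" value named above.
    carousel_group: str = "AR_DEFAULT_GROUP"


@dataclass
class CaptureMetadata:
    """Capture metadata 660: camera intrinsics plus raw sensor readings."""
    focal_length: Optional[float] = None
    principal_point: Optional[Tuple[float, float]] = None
    # Keyed by sensor name, e.g. "gyroscope" or "accelerometer".
    sensor_data: Dict[str, List[float]] = field(default_factory=dict)


@dataclass
class MessageAnnotation:
    """Message annotation 412."""
    ar_content_id: str                                   # 652
    message_id: str                                      # 654
    asset_ids: List[str] = field(default_factory=list)   # 656
    ar_content_metadata: Optional[ARContentMetadata] = None
    capture_metadata: Optional[CaptureMetadata] = None


annotation = MessageAnnotation(
    ar_content_id="ar-001",
    message_id="msg-042",
    asset_ids=["raw-rgb", "post-processed"],
    ar_content_metadata=ARContentMetadata(category="world", carousel_index=3),
    capture_metadata=CaptureMetadata(focal_length=4.2, principal_point=(320.0, 240.0)),
)
```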

[0121] Figure 7 is a block diagram illustrating various modules of an eye-wearing system 160 according to some example embodiments. The eye-wearing system 160 is shown to include an AR content recording system 700. As further shown, the AR content recording system 700 includes a camera module 702, a capture module 704, an image data processing module 706, a rendering module 708, and a content recording module 710. The various modules of the AR content recording system 700 are configured to communicate with each other (e.g., via a bus, shared memory, or switch). Any one or more of these modules can be implemented using one or more computer processors 720 (e.g., by configuring one or more such computer processors to perform the functions described for that module), and therefore may include one or more computer processors 720 (e.g., a set of processors provided by the eye-wearing device 150).

[0122] Any one or more of the modules described may be implemented using hardware alone (e.g., one or more computer processors 720 of a machine (e.g., machine 1100)) or using a combination of hardware and software. For example, any described module of the eye-wearing system 160 may physically include an arrangement of one or more computer processors 720 (e.g., a subset of, or from among, the one or more computer processors of a machine (e.g., machine 1100)) configured to perform the operations described herein for that module. As another example, any module of the AR content recording system 700 may include software, hardware, or both, that configures an arrangement of one or more computer processors 720 (e.g., from among the one or more computer processors of a machine (e.g., machine 1100)) to perform the operations described herein for that module. Thus, different modules of the AR content recording system 700 may include and configure different arrangements of such computer processors 720, or a single arrangement of such computer processors 720 at different points in time. Furthermore, any two or more modules of the eye-wearing system 160 can be combined into a single module, and the functionality described herein for a single module can be subdivided across multiple modules. Additionally, according to various example implementations, modules described herein as implemented in a single machine, database, or device can be distributed across multiple machines, databases, or devices.

[0123] Camera device module 702 performs camera device-related operations, including functions relating to the operation of one or more camera devices of the eye-wearing device 150. In the example, camera device module 702 can access camera device functions across different processes executing on the eye-wearing device 150 to determine surfaces for face or surface tracking and, in response to various requests from such processes for camera device data or image data (e.g., frames) (e.g., image data of a specific resolution or format), can provide metadata to the processes that consume the requested camera device data or image data. As referred to herein, a “process” or “computational process” can refer to an instance of a computer program executed by one or more threads of a given processor.

[0124] As referred to herein, surface tracking refers to the operation of tracking one or more representations of surfaces corresponding to planes (e.g., a given horizontal plane, floor, table) in an input frame. In the example, surface tracking is accomplished using hit testing and/or ray casting techniques. In the example, hit testing determines whether selected points (e.g., pixels or sets of pixels) in the input frame intersect with a surface or plane representing a physical object in the input frame. In the example, ray casting utilizes a Cartesian-based coordinate system (e.g., x and y coordinates) and projects rays (e.g., vectors) into the camera device's view of the world, as captured in the input frame, to detect planes that the rays intersect.
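As a minimal sketch of the ray casting idea above, the following tests whether a ray intersects a plane and returns the hit point. It assumes the ray has already been converted from frame coordinates into a 3D world coordinate system, and that the plane is given by a point and a normal; the function name `ray_plane_hit` and these conventions are illustrative assumptions, not part of the described system.

```python
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def ray_plane_hit(
    origin: Vec3,
    direction: Vec3,
    plane_point: Vec3,
    plane_normal: Vec3,
    eps: float = 1e-9,
) -> Optional[Vec3]:
    """Return the point where the ray meets the plane, or None if the ray is
    parallel to the plane or the plane lies behind the ray origin."""
    denom = dot(direction, plane_normal)
    if abs(denom) < eps:
        return None  # ray runs parallel to the plane
    to_plane = tuple(p - o for p, o in zip(plane_point, origin))
    t = dot(to_plane, plane_normal) / denom
    if t < 0:
        return None  # intersection is behind the camera
    return tuple(o + t * d for o, d in zip(origin, direction))


# A camera one unit above a floor (the plane y = 0), looking down and forward.
hit = ray_plane_hit((0, 1, 0), (0, -1, 1), (0, 0, 0), (0, 1, 0))
```

A hit test for a selected pixel would, under these assumptions, cast one such ray through that pixel and check whether a non-None point comes back for a tracked plane.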

[0125] As further shown, the camera device module 702 receives an input frame (or, alternatively, a copy of the input frame in an embodiment). The camera device module 702 may include various tracking functions based on the type of object to be tracked. In the example, the camera device module 702 includes tracking capabilities for surface tracking, face tracking, object tracking, etc. In an implementation, the camera device module 702 may execute only one of the multiple tracking processes at a time to facilitate the management of computing resources at the client device 102 or the eye-wearing device 150. Additionally, the camera device module 702 may perform one or more object recognition or detection operations on the input frame.

[0126] As mentioned herein, tracking refers to the operation used to determine the spatial properties (e.g., position and / or orientation) of a given object (or a portion thereof) during a post-processing phase. In an implementation, the position and orientation of an object are measured continuously during tracking. Different objects may be tracked, such as a user's head, eyes, or limbs, surfaces, or other objects. Tracking involves dynamic sensing and measurement to enable the rendering of virtual objects and / or effects relative to physical objects in a three-dimensional space corresponding to the scene (e.g., an input frame). Therefore, camera module 702 determines metrics corresponding to at least the relative position and orientation of one or more physical objects in the input frame and includes these metrics in the tracking data provided to rendering module 708. In the example, camera module 702 updates such metrics from frame to subsequent frame (e.g., tracking over time).

[0127] In an implementation, the camera device module 702 provides tracking data (e.g., metadata) corresponding to the aforementioned metrics (e.g., position and orientation) as output. In some instances, the camera device module 702 includes logic for shape recognition, edge detection, or any other suitable object detection mechanism. The camera device module 702 may also determine that an object of interest is an instance of a predetermined object type by matching shapes, edges, or landmarks within a range to an object type in a set of predetermined object types.

[0128] In one implementation, the camera module 702 may utilize techniques that combine information from the device's motion sensors (e.g., accelerometer and gyroscope sensors) with analysis of the scene provided in the input frames. For example, the camera module 702 detects features in the input frames and then tracks the differences in the respective positions of those features across several input frames, using information derived at least in part from data from the device's motion sensors.

[0129] As mentioned herein, face tracking refers to operations used to track representations of facial features (such as portions of a user's face) in an input frame. In some implementations, camera device module 702 includes face tracking logic to identify all or part of a face within one or more images and to track facial landmarks across a set of images in a video stream. As mentioned herein, object tracking refers to tracking representations of physical objects in an input frame.

[0130] In one implementation, the camera module 702 utilizes machine learning techniques to detect whether a representation of a physical object corresponding to a display screen is included in captured image data (e.g., image data from the current field of view of the eye-wearing device 150). Examples of such detection are discussed in more detail below with reference to Figure 8 and Figure 9.

[0131] In the example, camera module 702 utilizes a machine learning model, such as a neural network, to detect the representation of the display screen in the image data. The neural network model can refer to a feedforward deep neural network implemented as an approximation of a function f. This type of model is called a feedforward model because information flows from the input x, through one or more intermediate operations that define f, and finally to the output y. Feedforward deep neural networks are called networks because they can be represented by connecting different operations together. A model of a feedforward deep neural network can be represented as a graph showing how operations are connected together from the input layer, through one or more hidden layers, and finally to the output layer. In the example, each node in such a graph represents an operation to be performed. However, it should be understood that other types of neural networks are contemplated by the implementations described herein. For example, recurrent neural networks such as Long Short-Term Memory (LSTM) neural networks can be provided for annotation, or convolutional neural networks (CNNs) can be utilized.
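The feedforward flow from input x through hidden layers to output y can be sketched as below. This is an illustrative toy only: the layer sizes, weights, ReLU hidden activation, and sigmoid output are assumptions chosen for the example and are not taken from the described model.

```python
import math
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weight rows, biases)


def dense(x: Sequence[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """One fully connected operation: weights @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(values: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in values]


def sigmoid(value: float) -> float:
    return 1.0 / (1.0 + math.exp(-value))


def feedforward(x: Sequence[float], layers: List[Layer]) -> float:
    """Pass x through every layer in order: input -> hidden layers -> output.
    Hidden layers use ReLU; the single output unit is squashed to (0, 1) so it
    can be read as a score, e.g. for 'display screen present'."""
    h = list(x)
    for index, (weights, bias) in enumerate(layers):
        h = dense(h, weights, bias)
        if index < len(layers) - 1:
            h = relu(h)
    return sigmoid(h[0])


# Toy network: 2 inputs -> 2 hidden units -> 1 output (weights are made up).
toy_layers: List[Layer] = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0]], [-1.0]),
]
score = feedforward([2.0, 1.0], toy_layers)
```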

[0132] In this example, for the computer vision techniques of the subject technology, the camera device module 702 utilizes a convolutional neural network model to detect representations of displays (or other applicable objects) in image data. Such a convolutional neural network (CNN) can be trained using training data comprising images of thousands of displays, so that the trained CNN can be provided with input data (e.g., image or video data) and perform the task of detecting the presence of displays in the input data. Convolution operations involve finding local patterns in input data such as image data. Patterns learned by the CNN can therefore be identified in any other part of the image data, advantageously providing translation invariance. For example, an image of a display viewed from the side can still be correctly classified as a display, as if it were viewed from the front. Similarly, in cases of occlusion, when the object to be detected (e.g., a display) is partially occluded and not fully visible, the CNN can still detect the object in the image data.
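The property that a learned local pattern is found wherever it appears can be illustrated with a single hand-written filter. The sketch below slides one 3x3 kernel over a small image (cross-correlation, as is usual in CNN layers) and shows that the peak response follows the pattern when it is moved. The pattern, kernel, and helper names are illustrative assumptions, not a trained detector.

```python
from typing import List, Tuple

Grid = List[List[int]]


def correlate2d(image: Grid, kernel: Grid) -> Grid:
    """Slide the kernel over the image (no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def argmax2d(grid: Grid) -> Tuple[int, int]:
    """Row and column of the largest response."""
    _, row, col = max((v, r, c) for r, line in enumerate(grid) for c, v in enumerate(line))
    return row, col


def place(pattern: Grid, top: int, left: int, size: int = 8) -> Grid:
    """Put the pattern on an otherwise empty size x size image."""
    image = [[0] * size for _ in range(size)]
    for i, line in enumerate(pattern):
        for j, value in enumerate(line):
            image[top + i][left + j] = value
    return image


# A bright frame with a dark center, loosely like a screen bezel.
PATTERN: Grid = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
# A filter that rewards the bright frame and penalizes a bright center.
KERNEL: Grid = [[1, 1, 1], [1, -1, 1], [1, 1, 1]]

peak_a = argmax2d(correlate2d(place(PATTERN, 1, 1), KERNEL))
peak_b = argmax2d(correlate2d(place(PATTERN, 4, 3), KERNEL))
```

The same kernel weights are reused at every position, which is why no separate detector is needed for each location in the frame.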

[0133] In this implementation, the camera module 702 acts as an intermediary between the capture module 704 and other components of the AR content recording system 700. As described above, the camera module 702 can receive requests for captured image data from the image data processing module 706. The camera module 702 can also receive requests for captured image data from the content recording module 710. The camera module 702 can forward such requests to the capture module 704 for processing.

[0134] The capture module 704 captures images (which may also include depth data) using one or more camera devices of the eye-wearing device 150 (e.g., in response to an aforementioned request from another component). For example, an image is a photograph captured by an optical sensor (e.g., a camera device) of the eye-wearing device 150. The image includes one or more real-world features, such as a user's face or a real-world object detected in the image. In some embodiments, the image includes metadata describing the image. Each captured image may be included in a data structure referred to herein as a "frame," which may include raw image data along with metadata and other information. In embodiments, the capture module 704 may send the captured image data and metadata as (captured) frames to one or more components of the AR content recording system 700. The transmission of captured frames may occur asynchronously, which can lead to synchronization problems, because one component may receive and process a given frame shortly before or after another component receives and processes the same frame. In applications that render AR effects and AR environments, such synchronization problems can cause lag that is perceptible from the user's viewpoint (e.g., a glitch or a perception of non-responsiveness), which detracts from the immersive experience of the AR environment. As discussed further below, implementations of the subject technology therefore generate temporal information (e.g., a timestamp) for each captured frame to facilitate synchronization of operations and improve the rendering of AR effects and AR environments presented to the viewing user of the eye-wearing device 150.
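One way per-frame timestamps can help with asynchronous delivery is sketched below: frames and the tracking data derived from them may arrive in either order, and a small synchronizer releases them only as a matched pair. The `Frame` and `FrameSynchronizer` names, the nanosecond timestamp, and the exact-match policy are assumptions made for illustration and are not the described implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass
class Frame:
    """A captured frame: raw image data plus metadata and a capture timestamp."""
    timestamp_ns: int
    image: bytes
    metadata: Dict[str, Any] = field(default_factory=dict)


Pair = Tuple[Frame, Dict[str, Any]]


class FrameSynchronizer:
    """Pairs each frame with the tracking data that carries the same timestamp,
    regardless of which of the two arrives first."""

    def __init__(self) -> None:
        self._frames: Dict[int, Frame] = {}
        self._tracking: Dict[int, Dict[str, Any]] = {}

    def add_frame(self, frame: Frame) -> Optional[Pair]:
        self._frames[frame.timestamp_ns] = frame
        return self._match(frame.timestamp_ns)

    def add_tracking(self, timestamp_ns: int, tracking: Dict[str, Any]) -> Optional[Pair]:
        self._tracking[timestamp_ns] = tracking
        return self._match(timestamp_ns)

    def _match(self, timestamp_ns: int) -> Optional[Pair]:
        # Release a pair only when both halves for this timestamp are present.
        if timestamp_ns in self._frames and timestamp_ns in self._tracking:
            return self._frames.pop(timestamp_ns), self._tracking.pop(timestamp_ns)
        return None
```

In this sketch a renderer would draw an AR effect only when a call returns a pair, so that the effect is positioned using tracking data computed from the frame being displayed.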

[0135] Image data processing module 706 generates tracking data and other metadata for the captured image data, including metadata associated with operations used to generate AR content and AR effects applied to the captured image data. Image data processing module 706 performs operations on the received image data. For example, various image processing operations are performed by image data processing module 706. Image data processing module 706 performs these operations based on algorithms or techniques corresponding to animation and / or providing visual and / or auditory effects to the received image data. In an implementation, a given augmented reality content generator may utilize image data processing module 706 to perform operations as part of generating AR content and AR effects, which are then provided to a rendering process to render such AR content and AR effects (e.g., including 2D or 3D effects), etc.

[0136] Rendering module 708 performs AR content rendering for display by eye-wearing system 160 based on data provided by at least one of the aforementioned modules. In the example, rendering module 708 utilizes a graphics processing pipeline to perform graphics operations to render AR content for display. In the example, rendering module 708 implements a scalable rendering engine that supports multiple image processing operations corresponding to various augmented reality content generators. In the example, rendering module 708 can receive composite AR content items for rendering on a display provided by eye-wearing device 150.

[0137] In some implementations, the rendering module 708 provides a graphics system for rendering two-dimensional (2D) objects, or objects from a three-dimensional (3D) world (real or imaginary), onto a 2D display screen. In some implementations, such a graphics system (e.g., a graphics system included on an eye-wearing device 150) includes a graphics processing unit (GPU) for performing image processing operations and rendering graphic elements for display.

[0138] In implementations, the GPU includes a logical graphics processing pipeline that can receive a representation of a 2D or 3D scene and provide a bitmap output representing a 2D image for display. Existing application programming interfaces (APIs) have implemented graphics pipeline models. Examples of such APIs include the Open Graphics Library (OPENGL) API and the Metal API. The graphics processing pipeline comprises many stages that transform a set of vertices, textures, buffers, and state information into image frames on the screen. In implementations, one stage of the graphics processing pipeline is the shader, which can be used as part of a specific augmented reality content generator applied to the input frame (e.g., an image or video). Shaders can be implemented as code running on a dedicated processing unit (also called a shader unit or shader processor) that typically executes several computation threads, programmed to generate appropriate levels of color and / or special effects for the fragment being rendered. For example, a vertex shader processes vertex attributes (position, texture coordinates, color, etc.), and a pixel shader processes pixel attributes (texture values, color, z-depth, and alpha value). In some instances, the pixel shader is called a fragment shader.

[0139] It should be understood that other types of shader processing can be provided. In the example, the entire frame is rendered using a specific sampling rate within the graphics processing pipeline, and / or pixel shading is performed at a specific per-pixel rate. In this way, a given electronic device (e.g., an eye-wearing device 150) operates the graphics processing pipeline to convert information corresponding to objects into bitmaps that can be displayed by the electronic device.

[0140] The content recording module 710 sends a request to the camera device module 702 to initiate the recording of image data via one or more cameras provided by the eye-wearing device 150. In this embodiment, the camera device module 702 acts as an intermediary between other components in the AR content recording system. For example, the camera device module may receive a request from the content recording module 710 to initiate recording and forward the request to the capture module 704 for processing. Upon receiving the request from the camera device module 702, the capture module 704 performs an operation to initiate image data capture by the cameras provided by the eye-wearing device 150. The captured image data (including timestamp information for each frame of the captured image data) can then be sent to the content recording module 710 for processing. In this example, the content recording module 710 may perform an operation to process the captured image data for rendering by the rendering module 708.

[0141] In one implementation, components of the AR content recording system 700 can communicate using an inter-process communication (IPC) protocol. Alternatively, components of the AR content recording system 700 can communicate via an API provided by the AR content recording system 700.

[0142] In one implementation, the camera module 702 receives a signal or command (or request) to stop recording image data (e.g., sent from the content recording module 710). In response, the camera module 702 sends a request to the capture module 704 to stop capturing image data. In response to the request, the capture module 704 ceases further operations of capturing image data using the one or more cameras of the eyewear device 150. After receiving the signal or command to stop recording, the camera module 702 may also asynchronously send a signal to the image data processing module 706 indicating that recording of image data (e.g., capture of image data by the capture module 704) has stopped or has been requested to stop. Upon receiving the signal, the image data processing module 706 performs operations to complete or terminate image processing operations, including operations to generate metadata related to AR content items and AR effects. This metadata can then be sent to the capture module 704, which generates a composite AR content item including the metadata. The composite AR content item can be received by the rendering module 708 and rendered for display on a display device provided by the eyewear device 150.

[0143] Figure 8 shows examples of AR content in which a display screen (e.g., included in a given electronic device) is detected in the user's field of view while the user is using the eyewear device 150.

[0144] As shown in the first AR environment 800, the field of view 810 includes an object (e.g., a blanket) annotated by AR content 815, which provides a note that the current object in the field of view 810 is not detected as a display screen. In an implementation, the visual appearance of the current object in the field of view 810 is adjusted (e.g., by increasing brightness or luminance, changing color values, etc.).

[0145] In an example, other types of adjustments can be performed within the field of view 810. For example, the current object can be dimmed (its brightness or luminance reduced), blurred, color-inverted, or converted to grayscale.
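
As a minimal sketch of such per-pixel adjustments, the following Python functions dim or brighten, invert, and convert a single RGB pixel to grayscale. The function names are hypothetical, and the Rec. 601 luma weights used for the grayscale conversion are an assumed, commonly used choice rather than something stated in this disclosure. Blurring is omitted because it operates on a neighborhood of pixels rather than on one pixel.

```python
from typing import Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def scale_brightness(pixel: Pixel, factor: float) -> Pixel:
    """Scale every channel by `factor` and clamp to 0..255.

    A factor below 1.0 dims the pixel; a factor above 1.0 brightens it.
    """
    r, g, b = (max(0, min(255, round(c * factor))) for c in pixel)
    return (r, g, b)


def invert(pixel: Pixel) -> Pixel:
    """Invert each color channel."""
    r, g, b = pixel
    return (255 - r, 255 - g, 255 - b)


def to_grayscale(pixel: Pixel) -> Pixel:
    """Convert to gray using Rec. 601 luma weights (an assumed choice)."""
    r, g, b = pixel
    y = round(0.299 * r + 0.587 * g + 0.114 * b)
    return (y, y, y)


dimmed = scale_brightness((100, 200, 50), 0.5)
inverted = invert((0, 128, 255))
gray = to_grayscale((0, 0, 0))
```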

[0146] As shown in the second AR environment 850, the field of view 860 includes an object 870 (e.g., a blanket) overlaid with AR content 865, which provides an annotation that the current object in the field of view 860 is detected as a display screen. In this example, a portion of the field of view 860 corresponding to the detected display screen is selected, and the visual appearance of that portion is adjusted (e.g., brightness or luminance is reduced).

[0147] In one implementation, when the user changes the field of view displayed by the display system of the eyewear device 150, the visual appearance of the AR environment can be adjusted to increase the brightness or luminance of objects not detected as a display screen, or to decrease the brightness or luminance of objects detected as a display screen (while retaining or discarding modifications to other objects in the field of view).

[0148] In an implementation, as described above, the camera module 702 uses a machine learning model such as a CNN to detect objects in input image data, such as representations of a display screen. For example, the input image data (e.g., provided by the capture module 704) is fed into the CNN, and the input is interpreted as a tensor with multiple dimensions (e.g., four dimensions). In this example, the first axis represents the batch size (1 in this case), the second axis represents the height of the image data, the third axis represents the width of the image data, and the fourth axis represents the number of color channels. This representation uses a channel-last convention, in which the number of channels appears in the fourth dimension (N × H × W × C). Alternatively, a channel-first convention can be used, which places the number of channels immediately after the batch axis (N × C × H × W). If the input image is a color image with red, green, and blue (RGB) channels, the number of channels is 3; for a monochrome image, the number of channels is 1.
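
The two layout conventions can be made concrete with a short Python sketch that uses nested lists in place of a tensor library. The helper names `shape` and `nhwc_to_nchw` and the sample image values are hypothetical and serve only to show how the same data is indexed under each convention.

```python
def shape(tensor):
    """Return the dimensions of a regularly nested list as a tuple."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)


def nhwc_to_nchw(batch):
    """Reorder a channel-last batch (N x H x W x C) to channel-first (N x C x H x W)."""
    return [
        [
            [[row[w][c] for w in range(len(row))] for row in image]
            for c in range(len(image[0][0]))
        ]
        for image in batch
    ]


# One RGB image, 2 pixels high and 4 pixels wide, in channel-last layout.
image = [
    [[10 * h + w, 100 + 10 * h + w, 200 + 10 * h + w] for w in range(4)]
    for h in range(2)
]
batch_nhwc = [image]                     # batch size N = 1
batch_nchw = nhwc_to_nchw(batch_nhwc)

nhwc_shape = shape(batch_nhwc)           # (1, 2, 4, 3)
nchw_shape = shape(batch_nchw)           # (1, 3, 2, 4)
```

The green value of the pixel at row 1, column 2 is `batch_nhwc[0][1][2][1]` in the channel-last layout and `batch_nchw[0][1][1][2]` in the channel-first layout.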

[0149] The CNN used by the camera module 702 then performs convolution operations involving several filters on the image data to generate corresponding feature maps. In this example, such feature maps represent features automatically learned by the CNN. Subsequent operations involve pooling, which reduces the feature maps to a smaller size. In one example, max pooling or average pooling can be used. Max pooling selects the largest feature value in a sliding window, and average pooling averages the feature values in the window. The downsampled feature maps are then further convolved and passed deeper into the CNN. In the CNN, earlier layers detect simple features such as edges, while later layers combine these previously detected features into complex features such as patterns, object parts, etc. As data moves deeper into the CNN, the spatial size of the image data decreases while the depth (e.g., the number of channels) increases. The high-level feature maps are fed into a fully connected neural network that includes dense layers and activation functions. The CNN then uses the processing provided by this dense network to determine the final prediction (e.g., whether the image data includes a representation of a display screen).
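
The pooling step can be sketched in plain Python as follows. The function `pool2d`, the non-overlapping 2 × 2 window, and the sample feature map are illustrative assumptions; the disclosure does not fix a window size or stride.

```python
def pool2d(feature_map, size, reduce_fn):
    """Downsample a 2D feature map with non-overlapping size x size windows.

    `reduce_fn` maps the list of values in a window to one value,
    e.g. `max` for max pooling or `mean` for average pooling.
    """
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for top in range(0, height - size + 1, size):
        row = []
        for left in range(0, width - size + 1, size):
            window = [
                feature_map[top + i][left + j]
                for i in range(size)
                for j in range(size)
            ]
            row.append(reduce_fn(window))
        pooled.append(row)
    return pooled


def mean(values):
    return sum(values) / len(values)


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 3, 7, 8],
]

max_pooled = pool2d(feature_map, 2, max)    # [[4, 2], [3, 8]]
avg_pooled = pool2d(feature_map, 2, mean)   # [[2.5, 1.0], [1.5, 6.5]]
```

Both variants reduce the 4 × 4 feature map to 2 × 2, which is the size reduction the paragraph above refers to.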

[0150] Figure 9 is a flowchart illustrating a method 900 according to some example implementations. Method 900 can be embodied in computer-readable instructions for execution by one or more computer processors, such that the operations of method 900 can be performed in part or in whole by the eyewear device 150, particularly by the various components of the AR content recording system 700 described above with respect to Figure 7; accordingly, method 900 is described below by way of example with reference thereto. However, it should be understood that at least some of the operations of method 900 can be deployed on various other hardware configurations, and method 900 is not intended to be limited to the AR content recording system 700.

[0151] At operation 902, the camera module 702 receives first image data captured by a camera of the eyewear device.

[0152] At operation 904, the camera module 702 uses a machine learning model to detect a representation of a display screen in the first image data.

[0153] At operation 906, the image data processing module 706 selects at least a portion of the display screen's representation.

[0154] At operation 908, the image data processing module 706 adjusts the visual appearance of the portion of the representation of the display screen.

[0155] At operation 910, the rendering module 708 causes a display system of the eyewear device to display the adjusted visual appearance.
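
Operations 902 to 910 can be summarized in a minimal Python sketch. Everything named here is hypothetical: `run_method_900`, the grayscale image type, and in particular `stub_detector`, which is a simple brightness threshold used purely as a placeholder for the machine learning model and is not the detection technique of this disclosure.

```python
from typing import Callable, List, Optional, Tuple

Image = List[List[int]]           # grayscale luminance values, 0..255
Box = Tuple[int, int, int, int]   # (top, left, bottom, right), end-exclusive


def run_method_900(
    image: Image,
    detect: Callable[[Image], Optional[Box]],
    adjust: Callable[[int], int],
) -> Image:
    """Sketch of operations 902-910 on one frame."""
    # Operation 902: `image` is the received first image data.
    # Operation 904: detect a representation of a display screen.
    box = detect(image)
    output = [row[:] for row in image]
    if box is None:
        return output
    # Operation 906: select the portion inside the detected box.
    top, left, bottom, right = box
    # Operation 908: adjust the visual appearance of that portion.
    for y in range(top, bottom):
        for x in range(left, right):
            output[y][x] = adjust(output[y][x])
    # Operation 910: the caller hands `output` to the display system.
    return output


def stub_detector(image: Image) -> Optional[Box]:
    """Placeholder for the model: bounding box of pixels brighter than 200."""
    hits = [(y, x) for y, row in enumerate(image) for x, v in enumerate(row) if v > 200]
    if not hits:
        return None
    ys = [y for y, _ in hits]
    xs = [x for _, x in hits]
    return (min(ys), min(xs), max(ys) + 1, max(xs) + 1)


def dim(value: int) -> int:
    return value // 2


frame = [
    [10, 10, 10, 10],
    [10, 250, 240, 10],
    [10, 230, 220, 10],
    [10, 10, 10, 10],
]
adjusted = run_method_900(frame, stub_detector, dim)
```

Passing the detector and the adjustment as parameters keeps the sketch independent of any particular model or visual effect.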

[0156] In one implementation, the image data processing module 706 uses a machine learning model to detect the representation of the display screen, including performing object detection processing on the first image data to determine the representation of the display screen, wherein the machine learning model determines a prediction of the representation of the display screen included in the first image data.

[0157] In an implementation, the machine learning model includes a convolutional neural network (CNN).

[0158] In one implementation, the CNN determines a region of interest where a representation of the display screen exists, the region of interest including a portion of the first image data.

[0159] In an implementation, the region of interest includes a candidate bounding box for the representation of the display screen.

[0160] In an implementation, the CNN generates a feature map based on the region of interest and provides as output a vector of values corresponding to the feature map, the vector including respective values that describe the content of the region of interest.

[0161] In one implementation, the feature map includes a set of features, and a classifier model generates a classification from the set of features in the feature map, the classification including a display object.
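
One simple way to picture a classifier operating on such a feature vector is a linear scoring step followed by a softmax, sketched below in Python. This is an assumed, generic classifier head, not the classifier model of this disclosure; the labels, weights, and feature values are invented for illustration.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features: List[float], weights: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each label as a dot product with the features, then normalize."""
    labels = list(weights)
    scores = [sum(f * w for f, w in zip(features, weights[label])) for label in labels]
    return dict(zip(labels, softmax(scores)))


# Hypothetical per-label weights and a hypothetical feature vector.
weights = {
    "display_screen": [2.0, -1.0, 0.5],
    "other": [-1.0, 1.5, 0.0],
}
features = [0.9, 0.1, 0.4]

probabilities = classify(features, weights)
predicted_label = max(probabilities, key=probabilities.get)
```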

[0162] In one implementation, the image data processing module 706 adjusts the visual appearance of the portion of the representation of the display screen by generating a representation of a bounding box surrounding the representation of the display screen, the representation of the bounding box including a set of pixels corresponding to at least four sides. The set of pixels is modified by changing at least a first color value of a first pixel to a second color value that is different from the first color value, and a second set of pixels corresponding to the representation of the display screen is modified by changing at least a first luminance value of a third pixel to a second luminance value that is greater than the first luminance value.
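
The two pixel modifications described above can be sketched in Python as follows. The function names, the inclusive box coordinates, and the choice to raise luminance by adding a fixed offset to each RGB channel are assumptions made for illustration only.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]       # (R, G, B), each channel in 0..255
Image = List[List[Pixel]]
Box = Tuple[int, int, int, int]    # (top, left, bottom, right), inclusive


def brighten(pixel: Pixel, delta: int) -> Pixel:
    """Raise luminance by adding `delta` to each channel, clamped at 255."""
    r, g, b = pixel
    return (min(255, r + delta), min(255, g + delta), min(255, b + delta))


def highlight_screen(image: Image, box: Box, border_color: Pixel, delta: int) -> Image:
    """Recolor the four sides of the box and brighten the pixels inside it."""
    top, left, bottom, right = box
    output = [row[:] for row in image]
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            on_border = y in (top, bottom) or x in (left, right)
            if on_border:
                # First pixel set: the bounding box sides get a new color.
                output[y][x] = border_color
            else:
                # Second pixel set: the enclosed screen pixels get brighter.
                output[y][x] = brighten(output[y][x], delta)
    return output


gray_image = [[(100, 100, 100) for _ in range(5)] for _ in range(5)]
highlighted = highlight_screen(gray_image, (1, 1, 3, 3), (0, 255, 0), 50)
```

In this sketch the input image is copied rather than modified in place, so the original frame remains available for later processing.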

[0163] In one implementation, the image data processing module 706 modifies the pixel set by generating at least a message indicating that the screen has been detected.

[0164] In an implementation, the image data processing module 706 performs the following operations: receiving second image data captured by the camera, the second image data being received when the camera of the eyewear device moves based on a change in the position of the head of the user wearing the eyewear device; detecting that the representation of the display screen is no longer present in the second image data; and modifying the second image data to indicate that the representation of the display screen is no longer present, the modification including dimming a portion of the second image data by reducing at least one luminance value of pixels of the second image data.

[0165] Figure 10 is a block diagram illustrating an example software architecture 1006, which can be used in conjunction with various hardware architectures described herein. Figure 10 is a non-limiting example of a software architecture, and it should be understood that many other architectures can be implemented to facilitate the functionality described herein. Software architecture 1006 can execute on hardware such as the machine 1100 of Figure 11, which includes, among other things, processor 1104, memory 1114, and input / output (I / O) components 1118. A representative hardware layer 1052 is shown, and this representative hardware layer 1052 can represent, for example, the machine 1100 of Figure 11. The representative hardware layer 1052 includes a processing unit 1054 having associated executable instructions 1004. The executable instructions 1004 represent executable instructions of the software architecture 1006, including implementations of the methods, components, etc., described herein. Hardware layer 1052 also includes memory and / or storage modules, shown as memory / storage device 1056, which also have executable instructions 1004. Hardware layer 1052 may also include other hardware 1058.

[0166] In the example architecture of Figure 10, software architecture 1006 can be conceptualized as a stack of layers, each providing specific functionality. For example, software architecture 1006 may include layers such as operating system 1002, library 1020, framework / middleware 1018, application 1016, and presentation layer 1014. Operationally, application 1016 and / or other components within these layers can invoke API calls 1008 through the software stack and receive responses to the API calls 1008 in the form of messages 1012. The layers shown are representative in nature, and not all software architectures have all layers. For example, some mobile operating systems or dedicated operating systems may not provide framework / middleware 1018, while other operating systems may provide such a layer. Other software architectures may include additional or different layers.

[0167] Operating system 1002 can manage hardware resources and provide common services. Operating system 1002 may include, for example, a kernel 1022, services 1024, and drivers 1026. Kernel 1022 can serve as an abstraction layer between hardware and other software layers. For example, kernel 1022 may be responsible for memory management, processor management (e.g., scheduling), component management, networking, security settings, etc. Services 1024 can provide other common services to other software layers. Drivers 1026 are responsible for controlling or interfacing with the underlying hardware. For example, depending on the hardware configuration, drivers 1026 may include display drivers, camera drivers, Bluetooth® drivers, flash memory drivers, serial communication drivers (e.g., Universal Serial Bus (USB) drivers), Wi-Fi® drivers, audio drivers, power management drivers, etc.

[0168] Library 1020 provides common infrastructure used by application 1016 and / or other components and / or layers. Library 1020 provides functionality that allows other software components to perform tasks more easily than by directly interfacing with the functions of the underlying operating system 1002 (e.g., kernel 1022, services 1024, and / or drivers 1026). Library 1020 may include system library 1044 (e.g., the C standard library), which provides functions such as memory allocation functions, string manipulation functions, mathematical functions, etc. Additionally, library 1020 may include API library 1046, such as media libraries (e.g., libraries supporting the rendering and manipulation of various media formats such as MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG), graphics libraries (e.g., an OpenGL framework that can be used to render 2D and 3D graphical content on a display), database libraries (e.g., SQLite, which provides various relational database functions), network libraries (e.g., WebKit, which provides web browsing functionality), etc. Library 1020 may also include various other libraries 1048 to provide many other APIs to application 1016 and other software components / modules.

[0169] The framework / middleware 1018 (sometimes referred to as middleware) provides a higher level of common infrastructure that can be used by application 1016 and / or other software components / modules. For example, the framework / middleware 1018 can provide various graphical user interface (GUI) functions, advanced resource management, advanced location services, etc. The framework / middleware 1018 can provide a wide range of other APIs that can be used by application 1016 and / or other software components / modules, some of which may be specific to a particular operating system 1002 or platform.

[0170] Application 1016 includes built-in application 1038 and / or third-party application 1040. Examples of representative built-in applications 1038 may include, but are not limited to: contact applications, browser applications, book reader applications, location applications, media applications, messaging applications, and / or game applications. Third-party applications 1040 may include applications developed using the Android™ or iOS™ Software Development Kit (SDK) by entities other than the vendor of the particular platform, and may be mobile software running on mobile operating systems such as iOS™, Android™, Windows® Phone, or other mobile operating systems. Third-party applications 1040 may invoke API calls 1008 provided by the mobile operating system (such as operating system 1002) to facilitate the functions described herein.

[0171] Application 1016 can use built-in operating system functions (e.g., kernel 1022, service 1024, and / or driver 1026), libraries 1020, and frameworks / middleware 1018 to create user interfaces to interact with the system's users. Alternatively or additionally, in some systems, interaction with the user can be achieved through a presentation layer such as presentation layer 1014. In these systems, the application / component's 'logic' can be separated from the application / component's user-interacting aspects.

[0172] Figure 11 is a block diagram illustrating components of a machine 1100, according to some example embodiments, capable of reading instructions from a machine-readable medium (e.g., a machine-readable storage medium) and performing any one or more of the methods discussed herein. Specifically, Figure 11 shows a schematic representation of the machine 1100 in the form of an example computer system, within which instructions 1110 (e.g., software, programs, applications, applets, or other executable code) can be executed to cause the machine 1100 to perform any one or more of the methods discussed herein. Thus, instructions 1110 can be used to implement the modules or components described herein. Instructions 1110 transform a general, unprogrammed machine 1100 into a specific machine 1100 programmed to perform the described and illustrated functions in the described manner. In alternative embodiments, machine 1100 operates as a standalone device or can be coupled (e.g., networked) to other machines. In a networked deployment, machine 1100 can operate as a server machine or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. Machine 1100 may include, but is not limited to: server computers, client computers, personal computers (PCs), tablet computers, laptop computers, netbooks, set-top boxes (STBs), personal digital assistants (PDAs), entertainment media systems, cellular phones, smartphones, mobile devices, wearable devices (e.g., smartwatches), smart home devices (e.g., smart appliances), other smart devices, web devices, network routers, network switches, network bridges, or any machine capable of executing, sequentially or otherwise, instructions 1110 that specify actions to be taken by machine 1100. Furthermore, although only a single machine 1100 is shown, the term "machine" should also be taken to include a collection of machines that individually or jointly execute instructions 1110 to perform any one or more of the methods discussed herein.

[0173] Machine 1100 may include processors 1104 (including processors 1108 to 1112), memory / storage device 1106, and I / O components 1118, which may be configured to communicate with each other, such as via bus 1102. Memory / storage device 1106 may include memory 1114, such as main memory or other memory storage device, and storage unit 1116, both of which are accessible to processors 1104, such as via bus 1102. Storage unit 1116 and memory 1114 store instructions 1110 embodying any one or more of the methods or functions described herein. Instructions 1110 may also reside, wholly or partially, within memory 1114, within storage unit 1116, within at least one of processors 1104 (e.g., within the processor's cache memory), or in any suitable combination thereof, during execution by machine 1100. Thus, memory 1114, storage unit 1116, and the memory of processors 1104 are examples of machine-readable media.

[0174] I / O component 1118 may include a wide variety of components for receiving input, providing output, generating output, transmitting information, exchanging information, capturing measurement results, etc. The specific I / O component 1118 included in a particular machine 1100 will depend on the type of machine. For example, a portable machine such as a mobile phone will likely include a touch input device or other such input mechanism, while a headless server machine will likely not include such a touch input device. It will be understood that I / O component 1118 may include many other components not shown in Figure 11. For the sake of simplicity in the discussion below, I / O components 1118 are grouped according to function, and this grouping is by no means limiting. In various example embodiments, I / O components 1118 may include output components 1126 and input components 1128. Output components 1126 may include visual components (e.g., displays such as plasma display panels (PDPs), light-emitting diode (LED) displays, liquid crystal displays (LCDs), projectors, or cathode ray tube (CRT) displays), auditory components (e.g., speakers), haptic components (e.g., vibration motors, resistance mechanisms), other signal generators, etc. Input components 1128 may include alphanumeric input components (e.g., keyboards, touchscreens configured to receive alphanumeric input, photo-optical keyboards, or other alphanumeric input components), point-based input components (e.g., mice, touchpads, trackballs, joysticks, motion sensors, or other pointing instruments), haptic input components (e.g., physical buttons, touchscreens or other haptic input components that provide the position and / or force of touches or touch gestures), audio input components (e.g., microphones), etc.

[0175] In other example implementations, I / O component 1118 may include biometric identification component 1130, motion component 1134, environmental component 1136 or positioning component 1138, and various other components. For example, biometric identification component 1130 may include components for detecting expressions (e.g., hand gestures, facial expressions, voice expressions, body posture, or eye tracking), measuring biosignals (e.g., blood pressure, heart rate, body temperature, sweating, or brain waves), and identifying a person (e.g., voice recognition, retinal recognition, facial recognition, fingerprint recognition, or EEG-based recognition). Motion component 1134 may include acceleration sensor components (e.g., accelerometers), gravity sensor components, rotation sensor components (e.g., gyroscopes), etc. Environmental component 1136 may include, for example, an illumination sensor component (e.g., a photometer), a temperature sensor component (e.g., one or more thermometers that detect ambient temperature), a humidity sensor component, a pressure sensor component (e.g., a barometer), an acoustic sensor component (e.g., one or more microphones that detect background noise), a proximity sensor component (e.g., an infrared sensor that detects nearby objects), a gas sensor (e.g., a gas detection sensor that detects the concentration of hazardous gases for safety reasons or measures pollutants in the atmosphere), or other components that can provide indications, measurements, or signals corresponding to the surrounding physical environment. Positioning component 1138 may include a position sensor component (e.g., a GPS receiver component), an altitude sensor component (e.g., an altimeter or barometer that detects air pressure from which altitude can be derived), an orientation sensor component (e.g., a magnetometer), etc.

[0176] A wide variety of technologies can be used to implement communication. I / O component 1118 may include communication component 1140, which is operable to couple machine 1100 to network 1132 or device 1120 via coupling 1124 and coupling 1122, respectively. For example, communication component 1140 may include network interface component or other suitable device to interface with network 1132. In other examples, communication component 1140 may include wired communication component, wireless communication component, cellular communication component, near field communication (NFC) component, Bluetooth® component (e.g., Bluetooth Low Energy®), Wi-Fi® component, and other communication components that provide communication via other modalities. Device 1120 may be another machine or any peripheral device from a variety of peripheral devices (e.g., a peripheral device coupled via USB).

[0177] Furthermore, the communication component 1140 can detect identifiers or include components operable to detect identifiers. For example, the communication component 1140 may include a radio frequency identification (RFID) tag reader component, an NFC smart tag detection component, an optical reader component (e.g., an optical sensor for detecting one-dimensional barcodes such as Universal Product Code (UPC) barcodes; multi-dimensional barcodes such as Quick Response (QR) codes, Aztec codes, Data Matrix, Dataglyph, MaxiCode, PDF417, Ultra Code, and UCC RSS-2D barcodes; and other optical codes), or an acoustic detection component (e.g., a microphone for identifying tagged audio signals). Additionally, various information can be obtained via the communication component 1140, such as location obtained via Internet Protocol (IP) geolocation, location obtained via Wi-Fi® signal triangulation, location obtained by detecting NFC beacon signals that can indicate a specific location, etc.

[0178] The following discussion relates to various terms or phrases mentioned throughout the present disclosure.

[0179] "Signal medium" means any intangible medium capable of storing, encoding, or carrying instructions executable by a machine, and includes digital or analog communication signals or other intangible media to facilitate the communication of software or data. The term "signal medium" should be considered to include any form of modulated data signal, carrier wave, etc. The term "modulated data signal" means a signal whose characteristics are set or altered in such a way as to encode information in the signal. The terms "transmission medium" and "signal medium" mean the same thing and may be used interchangeably in this disclosure.

[0180] "Communications network" refers to one or more parts of a network, which can be an ad hoc network, intranet, extranet, virtual private network (VPN), local area network (LAN), wireless LAN (WLAN), wide area network (WAN), wireless WAN (WWAN), metropolitan area network (MAN), the Internet, a part of the Internet, a part of the Public Switched Telephone Network (PSTN), a plain old telephone service (POTS) network, a cellular telephone network, a wireless network, a Wi-Fi® network, another type of network, or a combination of two or more such networks. For example, a network or part of a network may include a wireless network or a cellular network, and the coupling may be a Code Division Multiple Access (CDMA) connection, a Global System for Mobile Communications (GSM) connection, or another type of cellular or wireless coupling. In this example, the coupling can implement any of various types of data transmission technology, such as Single Carrier Radio Transmission Technology (1xRTT), Evolution-Data Optimized (EVDO) technology, General Packet Radio Service (GPRS) technology, Enhanced Data rates for GSM Evolution (EDGE) technology, Third Generation Partnership Project (3GPP) technology including 3G, fourth generation wireless (4G) networks, Universal Mobile Telecommunications System (UMTS), High-Speed Packet Access (HSPA), Worldwide Interoperability for Microwave Access (WiMAX), Long Term Evolution (LTE) standards, other data transmission technologies defined by various standards-setting organizations, other long-range protocols, or other data transmission technologies.

[0181] A "processor" refers to any circuit or virtual circuit (a physical circuit emulated by logic executing on an actual processor) that manipulates data values according to control signals (e.g., "commands," "opcodes," "machine code," etc.) and generates corresponding output signals applied to operate a machine. For example, a processor can be a Central Processing Unit (CPU), a Reduced Instruction Set Computing (RISC) processor, a Complex Instruction Set Computing (CISC) processor, a Graphics Processing Unit (GPU), a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Radio Frequency Integrated Circuit (RFIC), or any combination thereof. A processor can also be a multi-core processor having two or more independent processors (sometimes referred to as "cores") capable of executing instructions simultaneously.

[0182] "Machine storage medium" refers to one or more storage devices and / or media (e.g., centralized or distributed databases and / or associated caches and servers) that store executable instructions, routines, and / or data. Therefore, the above term should be considered to include, but is not limited to, solid-state memory and optical and magnetic media, including memory internal or external to the processor. Specific examples of machine storage media, computer storage media, and / or device storage media include, by way of example, non-volatile memory, including, for example, semiconductor memory devices such as erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), FPGA, and flash memory devices; disks, such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The terms "machine storage medium," "device storage medium," and "computer storage medium" mean the same thing and are used interchangeably in this disclosure. The terms "machine storage medium," "computer storage medium," and "device storage medium" expressly exclude carrier waves, modulated data signals, and other such media, at least some of which are covered by the term "signal medium."

[0183] A "component" refers to a device, physical entity, or logic having boundaries defined by function or subroutine calls, branch points, APIs, or other technologies that provide for partitioning or modularizing specific processing or control functions. Components can be combined with other components via their interfaces to perform machine processing. A component can be a packaged functional hardware unit designed for use with other components, or a part of a program that typically performs a particular function among related functions. Components can constitute software components (e.g., code embodied on a machine-readable medium) or hardware components. A "hardware component" is a tangible unit capable of performing certain operations and can be configured or arranged in some physical manner. In various example implementations, one or more computer systems (e.g., standalone computer systems, client computer systems, or server computer systems) or one or more hardware components (e.g., processors or processor groups) of a computer system can be configured by software (e.g., an application or application portion) to operate as hardware components performing certain operations as described herein. Hardware components can also be implemented mechanically, electronically, or in any suitable combination thereof. For example, a hardware component may include dedicated circuitry or logic permanently configured to perform certain operations. Hardware components can be dedicated processors, such as field-programmable gate arrays (FPGAs) or application-specific integrated circuits (ASICs). Hardware components can also include programmable logic or circuitry that is temporarily configured by software to perform certain operations. For example, a hardware component may include software executed by a general-purpose processor or other programmable processor. 
Once configured by such software, the hardware component becomes a specific machine (or a specific component of a machine) uniquely tailored to perform the configured function and is no longer a general-purpose processor. It will be understood that the decision to implement a hardware component mechanically, in dedicated and permanently configured circuitry, or in temporarily configured (e.g., software-configured) circuitry may be driven by cost and time considerations. Therefore, the phrase "hardware component" (or "hardware-implemented component") should be understood to include tangible entities, i.e., entities that are physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a certain way or perform certain operations described herein. Considering implementations in which hardware components are temporarily configured (e.g., programmed), each of the hardware components need not be configured or instantiated at any one instant in time. For example, in cases where the hardware components include a general-purpose processor configured as a dedicated processor via software, this general-purpose processor can be configured as different dedicated processors (e.g., including different hardware components) at different times. The software accordingly configures one or more specific processors to constitute a specific hardware component at one time and different hardware components at different times. Hardware components can provide information to and receive information from other hardware components. Accordingly, the described hardware components can be considered communicatively coupled. Where multiple hardware components exist at the same time, communication can be achieved through signal transmission (e.g., via appropriate circuitry and buses) between or among two or more hardware components. 
In embodiments where multiple hardware components are configured or instantiated at different times, communication between such hardware components can be achieved, for example, by storing information in a memory structure accessible to the multiple hardware components and retrieving information from that memory structure. For example, one hardware component can perform an operation and store the output of that operation in a memory device to which it is communicatively coupled. Another hardware component can then access the memory device at a later time to retrieve and process the stored output. Hardware components can also initiate communication with input or output devices and can operate on a resource (e.g., a collection of information). The various operations of the example methods described herein can be performed, at least in part, by one or more processors that are temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors can constitute processor-implemented components that operate to perform one or more operations or functions described herein. As used herein, a "processor-implemented component" refers to a hardware component implemented using one or more processors. Similarly, the methods described herein can be implemented at least in part by processors, where a particular processor or one or more processors are examples of hardware. For example, at least some operations of the methods can be performed by one or more processors or processor-implemented components. Furthermore, one or more processors can also operate to support the execution of related operations in a "cloud computing" environment or as "Software as a Service" (SaaS). 
For example, at least some operations can be performed by a group of computers (as an example of a machine including processors), where these operations can be accessed via a network (e.g., the Internet) and via one or more appropriate interfaces (e.g., APIs). The execution of certain operations can be distributed among processors, residing not only within a single machine but also deployed across multiple machines. In some example implementations, the processor or processor-implemented component can reside in a single geographic location (e.g., in a home environment, office environment, or server cluster). In other example implementations, the processor or processor-implemented component can be distributed across multiple geographic locations.

[0184] "Carrier signal" refers to any intangible medium capable of storing, encoding, or carrying instructions to be executed by a machine, and includes digital or analog communication signals or other intangible media to facilitate the communication of such instructions. Instructions can be sent or received over a network using a transmission medium via a network interface device.

[0185] "Computer-readable medium" refers to both machine storage media and transmission media. Therefore, these terms include both storage devices / media and carrier / modulated data signals. The terms "machine-readable medium," "computer-readable medium," and "device-readable medium" refer to the same thing and may be used interchangeably in this disclosure.

[0186] "Client device" refers to any machine that interfaces with a communication network to obtain resources from one or more server systems or other client devices. A client device can be, but is not limited to, a mobile phone, desktop computer, laptop computer, portable digital assistant (PDA), smartphone, tablet computer, ultrabook, netbook, multiprocessor system, microprocessor-based or programmable consumer electronics device, game console, set-top box, or any other communication device that a user can use to access a network. In this disclosure, client devices are also referred to as "electronic devices."

[0187] A "brief message" is a message that is accessible for a limited period of time. Brief messages can be text, images, videos, etc. The access time for a brief message can be set by the message sender. Alternatively, the access time can be a default setting or a setting specified by the recipient. Regardless of the setting method, the message is temporary.

[0188] "Signal medium" means any intangible medium capable of storing, encoding, or carrying instructions executable by a machine, and includes digital or analog communication signals or other intangible media to facilitate the communication of software or data. The term "signal medium" should be considered to include any form of modulated data signal, carrier wave, etc. The term "modulated data signal" means a signal whose characteristics are set or altered in such a way as to encode information in the signal. The terms "transmission medium" and "signal medium" mean the same thing and may be used interchangeably in this disclosure.

[0189] "Communications network" refers to one or more portions of a network, which can be an ad hoc network, an intranet, an extranet, a virtual private network (VPN), a local area network (LAN), a wireless LAN (WLAN), a wide area network (WAN), a wireless WAN (WWAN), a metropolitan area network (MAN), the Internet, a portion of the Internet, a portion of the Public Switched Telephone Network (PSTN), a plain old telephone service (POTS) network, a cellular telephone network, a wireless network, a Wi-Fi® network, another type of network, or a combination of two or more such networks. For example, a network or a portion of a network may include a wireless network or a cellular network, and the coupling may be a Code Division Multiple Access (CDMA) connection, a Global System for Mobile Communications (GSM) connection, or another type of cellular or wireless coupling. In this example, the coupling can implement any of various types of data transmission technology, such as Single Carrier Radio Transmission Technology (1xRTT), Evolution-Data Optimized (EVDO) technology, General Packet Radio Service (GPRS) technology, Enhanced Data rates for GSM Evolution (EDGE) technology, Third Generation Partnership Project (3GPP) technologies including 3G, fourth-generation wireless (4G) networks, Universal Mobile Telecommunications System (UMTS), High-Speed Packet Access (HSPA), Worldwide Interoperability for Microwave Access (WiMAX), the Long Term Evolution (LTE) standard, other data transmission technologies defined by various standards-setting organizations, other long-range protocols, or other data transmission technologies.

[0190] A "processor" refers to any circuit or virtual circuit (a physical circuit simulated by logic executed on an actual processor) that manipulates data values according to control signals (e.g., "commands," "opcodes," "machine codes," etc.) and generates corresponding output signals applied to operate a machine. For example, a processor can be a Central Processing Unit (CPU), a Reduced Instruction Set Computing (RISC) processor, a Complex Instruction Set Computing (CISC) processor, a Graphics Processing Unit (GPU), a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Radio Frequency Integrated Circuit (RFIC), or any combination thereof. A processor can also be a multi-core processor having two or more independent processors (sometimes referred to as "cores") capable of executing instructions simultaneously.

[0191] "Machine storage medium" refers to one or more storage devices and / or media (e.g., centralized or distributed databases and / or associated caches and servers) that store executable instructions, routines, and / or data. Therefore, the above term should be considered to include, but is not limited to, solid-state memory and optical and magnetic media, including memory internal or external to the processor. Specific examples of machine storage media, computer storage media, and / or device storage media include, by way of example, non-volatile memory, including, for example, semiconductor memory devices such as erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), FPGA, and flash memory devices; disks, such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The terms "machine storage medium," "device storage medium," and "computer storage medium" mean the same thing and are used interchangeably in this disclosure. The terms "machine storage medium," "computer storage medium," and "device storage medium" expressly exclude carrier waves, modulated data signals, and other such media, at least some of which are covered by the term "signal medium."

[0197] Furthermore, according to embodiments of this disclosure, the following configurations 1-20 are provided.

[0198] 1. A method comprising: receiving first image data captured by a camera of an eyewear device; detecting, using a machine learning model, a representation of a display screen in the first image data; selecting at least a portion of the representation of the display screen; adjusting a visual appearance of the portion of the representation of the display screen; and causing a display system of the eyewear device to display the adjusted visual appearance.

[0199] 2. The method according to configuration 1, wherein detecting the representation of the display screen using the machine learning model comprises: performing object detection processing on the first image data to determine the representation of the display screen, wherein the machine learning model determines a prediction that the representation of the display screen is included in the first image data.

[0200] 3. The method according to configuration 2, wherein the machine learning model includes a convolutional neural network (CNN).

[0201] 4. The method according to configuration 3, wherein the CNN determines the presence of a region of interest of the representation of the display screen, the region of interest including a portion of the first image data.

[0202] 5. The method according to configuration 4, wherein the region of interest includes candidate bounding boxes, and the candidate bounding boxes include a representation of the display screen.

[0203] 6. The method according to configuration 4, wherein the CNN generates a feature map based on the region of interest and provides, as output, a vector of values corresponding to the feature map, the vector including respective values describing the content of the region of interest.

[0204] 7. The method according to configuration 6, wherein the feature map includes a feature set, and a classifier model generates a classification from the feature set of the feature map, the classification including display objects.

[0205] 8. The method according to configuration 1, wherein adjusting the visual appearance of the portion of the representation of the display screen comprises: generating a representation of a bounding box surrounding the representation of the display screen, the representation of the bounding box including a set of pixels corresponding to at least four sides; modifying the set of pixels by changing at least a first color value of a first pixel to a second color value, the second color value being different from the first color value; and modifying a second set of pixels corresponding to the representation of the display screen by changing at least a first luminance value of a third pixel to a second luminance value, the second luminance value being greater than the first luminance value.

[0206] 9. The method according to configuration 8, further comprising: modifying the set of pixels by at least generating a message indicating that the display screen has been detected.

[0207] 10. The method according to configuration 1, further comprising: receiving second image data captured by the camera, the second image data being received as the camera of the eyewear device moves based on a change in the position of the head of the user wearing the eyewear device; determining that the representation of the display screen is no longer present in the second image data; and modifying the second image data to indicate that the representation of the display screen is no longer present, the modifying comprising darkening a portion of the second image data to lower at least one luminance value of a pixel of the second image data.

[0208] 11. A system comprising: a processor; and a memory including instructions that, when executed by the processor, cause the processor to perform operations comprising: receiving first image data captured by a camera of an eyewear device; detecting, using a machine learning model, a representation of a display screen in the first image data; selecting at least a portion of the representation of the display screen; adjusting a visual appearance of the portion of the representation of the display screen; and causing a display system of the eyewear device to display the adjusted visual appearance.

[0209] 12. The system according to configuration 11, wherein detecting the representation of the display screen using the machine learning model comprises: performing object detection processing on the first image data to determine the representation of the display screen, wherein the machine learning model determines a prediction that the representation of the display screen is included in the first image data.

[0210] 13. The system according to configuration 12, wherein the machine learning model includes a convolutional neural network (CNN).

[0211] 14. The system according to configuration 13, wherein the CNN determines the presence of a region of interest of the representation of the display screen, the region of interest including a portion of the first image data.

[0212] 15. The system according to configuration 14, wherein the region of interest includes candidate bounding boxes, the candidate bounding boxes including a representation of the display screen.

[0213] 16. The system according to configuration 14, wherein the CNN generates a feature map based on the region of interest and provides, as output, a vector of values corresponding to the feature map, the vector including respective values describing the content of the region of interest.

[0214] 17. The system according to configuration 16, wherein the feature map includes a feature set, and a classifier model generates a classification from the feature set of the feature map, the classification including display objects.

[0215] 18. The system according to configuration 11, wherein adjusting the visual appearance of the portion of the representation of the display screen comprises: generating a representation of a bounding box surrounding the representation of the display screen, the representation of the bounding box including a set of pixels corresponding to at least four sides; modifying the set of pixels by changing at least a first color value of a first pixel to a second color value, the second color value being different from the first color value; and modifying a second set of pixels corresponding to the representation of the display screen by changing at least a first luminance value of a third pixel to a second luminance value, the second luminance value being greater than the first luminance value.

[0216] 19. The system according to configuration 18, wherein the operations further comprise: modifying the set of pixels by at least generating a message indicating that the display screen has been detected.

[0217] 20. A non-transitory computer-readable medium including instructions that, when executed by a computing device, cause the computing device to perform operations comprising: receiving first image data captured by a camera of an eyewear device; detecting, using a machine learning model, a representation of a display screen in the first image data; selecting at least a portion of the representation of the display screen; adjusting a visual appearance of the portion of the representation of the display screen; and causing a display system of the eyewear device to display the adjusted visual appearance.
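
The appearance adjustment recited in configurations 8 and 18 can be illustrated with a minimal Python sketch. This is not the disclosed implementation; it only shows one way the two pixel sets could be treated, under stated assumptions. The image is assumed to be a list of rows of (R, G, B) tuples, the bounding box is assumed to be already known (the detection step is out of scope here), and luminance is approximated with Rec. 601 weights. The names `adjust_display_region`, `scale_pixel`, and `luma`, the red border color, and the gain of 1.5 are hypothetical choices made for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]      # (R, G, B), each channel 0..255
Image = List[List[Pixel]]         # rows of pixels
Box = Tuple[int, int, int, int]   # (left, top, right, bottom), inclusive


def luma(p: Pixel) -> float:
    """Approximate luminance of a pixel using Rec. 601 weights."""
    r, g, b = p
    return 0.299 * r + 0.587 * g + 0.114 * b


def scale_pixel(p: Pixel, gain: float) -> Pixel:
    """Scale every channel by gain and clamp the result to 0..255."""
    return tuple(max(0, min(255, round(c * gain))) for c in p)


def adjust_display_region(image: Image, box: Box,
                          border: Pixel = (255, 0, 0),
                          gain: float = 1.5) -> Image:
    """Return a copy of image with the box outlined and its interior brightened.

    Assumption: the first pixel set is the four sides of the box and the
    second pixel set is everything strictly inside it.
    """
    left, top, right, bottom = box
    out = [list(row) for row in image]  # copy so the input is left unchanged
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            on_side = y in (top, bottom) or x in (left, right)
            if on_side:
                out[y][x] = border                        # change color value
            else:
                out[y][x] = scale_pixel(out[y][x], gain)  # raise luminance
    return out


# Usage: a uniform dark-gray 6x5 frame with a hypothetical detected box.
frame = [[(40, 40, 40)] * 6 for _ in range(5)]
adjusted = adjust_display_region(frame, (1, 1, 4, 3))
```

A gain greater than 1 raises the luminance of the interior pixels, matching the "second luminance value being greater than the first" condition; a different color space or a per-pixel offset would serve equally well in such a sketch.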

Claims

1. A method comprising: receiving first image data captured by a camera of an eyewear device; detecting, using a machine learning model, a representation of a display screen of an electronic device in the first image data; selecting at least a portion of the representation of the display screen of the electronic device; adjusting a visual appearance of the portion of the representation of the display screen while the representation of the display screen of the electronic device is detected in a field of view of a user who is using the eyewear device, wherein the adjusting comprises at least: generating a representation of a bounding box surrounding the representation of the display screen, the representation of the bounding box including a set of pixels corresponding to at least four sides, and modifying a second set of pixels corresponding to the representation of the display screen by changing at least a first luminance value of a third pixel to a second luminance value, the second luminance value being greater than the first luminance value; and causing a display system of the eyewear device to display the adjusted visual appearance.

2. The method according to claim 1, wherein detecting the representation of the display screen using the machine learning model comprises: performing object detection processing on the first image data to determine the representation of the display screen, wherein the machine learning model determines a prediction that the representation of the display screen is included in the first image data.

3. The method according to claim 2, wherein the machine learning model includes a convolutional neural network (CNN).

4. The method according to claim 3, wherein the CNN determines the presence of a region of interest of the representation of the display screen, the region of interest including a portion of the first image data.

5. The method according to claim 4, wherein the region of interest includes candidate bounding boxes, the candidate bounding boxes including the representation of the display screen.

6. The method according to claim 4, wherein the CNN generates a feature map based on the region of interest and provides, as output, a vector of values corresponding to the feature map, the vector including respective values describing the content of the region of interest.

7. The method according to claim 6, wherein the feature map includes a feature set, and a classifier model generates a classification from the feature set of the feature map, the classification including display objects.

8. The method according to claim 1, wherein adjusting the visual appearance of the portion of the representation of the display screen further comprises: modifying the second set of pixels corresponding to the representation of the display screen by changing at least a particular luminance value of a fourth pixel to a second particular luminance value, the second particular luminance value being less than the particular luminance value.

9. The method according to claim 1, wherein adjusting the visual appearance of the portion of the representation of the display screen further comprises: modifying the second set of pixels corresponding to the representation of the display screen by changing at least a color value of a fourth pixel to a second color value.

10. The method according to claim 1, further comprising: receiving second image data captured by the camera, the second image data being received as the camera of the eyewear device moves based on a change in the position of the head of the user wearing the eyewear device; determining that the representation of the display screen is no longer present in the second image data; and modifying the second image data to indicate that the representation of the display screen is no longer present, the modifying comprising darkening a portion of the second image data to lower at least one luminance value of a pixel of the second image data.

11. A system comprising: a processor; and a memory including instructions that, when executed by the processor, cause the processor to perform operations comprising: receiving first image data captured by a camera of an eyewear device; detecting, using a machine learning model, a representation of a display screen of an electronic device in the first image data; selecting at least a portion of the representation of the display screen of the electronic device; adjusting a visual appearance of the portion of the representation of the display screen while the representation of the display screen of the electronic device is detected in a field of view of a user who is using the eyewear device, wherein the adjusting comprises at least: generating a representation of a bounding box surrounding the representation of the display screen, the representation of the bounding box including a set of pixels corresponding to at least four sides, and modifying a second set of pixels corresponding to the representation of the display screen by changing at least a first luminance value of a third pixel to a second luminance value, the second luminance value being greater than the first luminance value; and causing a display system of the eyewear device to display the adjusted visual appearance.

12. The system according to claim 11, wherein detecting the representation of the display screen using the machine learning model comprises: performing object detection processing on the first image data to determine the representation of the display screen, wherein the machine learning model determines a prediction that the representation of the display screen is included in the first image data.

13. The system according to claim 12, wherein the machine learning model includes a convolutional neural network (CNN).

14. The system according to claim 13, wherein the CNN determines the presence of a region of interest of the representation of the display screen, the region of interest including a portion of the first image data.

15. The system according to claim 14, wherein the region of interest includes candidate bounding boxes, the candidate bounding boxes including the representation of the display screen.

16. The system according to claim 14, wherein the CNN generates a feature map based on the region of interest and provides, as output, a vector of values corresponding to the feature map, the vector including respective values describing the content of the region of interest.

17. The system according to claim 16, wherein the feature map includes a feature set, and a classifier model generates a classification from the feature set of the feature map, the classification including display objects.

18. The system according to claim 11, wherein adjusting the visual appearance of the portion of the representation of the display screen further comprises: modifying the second set of pixels corresponding to the representation of the display screen by changing at least a particular luminance value of a fourth pixel to a second particular luminance value, the second particular luminance value being less than the particular luminance value.

19. The system according to claim 11, wherein adjusting the visual appearance of the portion of the representation of the display screen further comprises: modifying the second set of pixels corresponding to the representation of the display screen by changing at least a color value of a fourth pixel to a second color value.

20. A non-transitory computer-readable medium comprising instructions that, when executed by a computing device, cause the computing device to perform operations comprising: receiving first image data captured by a camera of an eyewear device; detecting, using a machine learning model, a representation of a display screen of an electronic device in the first image data; selecting at least a portion of the representation of the display screen of the electronic device; adjusting a visual appearance of the portion of the representation of the display screen while the representation of the display screen of the electronic device is detected in a field of view of a user who is using the eyewear device, wherein the adjusting comprises at least: generating a representation of a bounding box surrounding the representation of the display screen, the representation of the bounding box including a set of pixels corresponding to at least four sides, and modifying a second set of pixels corresponding to the representation of the display screen by changing at least a first luminance value of a third pixel to a second luminance value, the second luminance value being greater than the first luminance value; and causing a display system of the eyewear device to display the adjusted visual appearance.
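
The handling of second image data in claim 10, where the display screen is no longer detected after the user's head moves, can be sketched in Python as follows. This is a hedged illustration rather than the claimed implementation. The claim does not say which portion of the second image data is darkened, so the sketch assumes it is the region where the display screen was last detected. The names `darken_region` and `handle_second_frame`, the image layout (rows of (R, G, B) tuples), and the gain of 0.5 are all hypothetical.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]      # (R, G, B), each channel 0..255
Image = List[List[Pixel]]         # rows of pixels
Box = Tuple[int, int, int, int]   # (left, top, right, bottom), inclusive


def darken_region(image: Image, box: Box, gain: float = 0.5) -> Image:
    """Return a copy of image with every pixel inside box scaled by gain.

    A gain below 1 lowers each channel, and therefore the luminance.
    """
    left, top, right, bottom = box
    out = [list(row) for row in image]  # copy so the input is left unchanged
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            out[y][x] = tuple(min(255, round(c * gain)) for c in out[y][x])
    return out


def handle_second_frame(image: Image,
                        detection: Optional[Box],
                        last_box: Optional[Box]) -> Image:
    """Darken the last known display region once the display is gone.

    detection is the box found in the second image data, or None when the
    detector reports no display screen. last_box is the box from the first
    image data (an assumption of this sketch, not stated in the claim).
    """
    if detection is None and last_box is not None:
        return darken_region(image, last_box)
    return image  # display still detected: leave the frame unchanged


# Usage: the detector finds nothing in the second frame.
second_frame = [[(200, 100, 40)] * 4 for _ in range(3)]
result = handle_second_frame(second_frame, None, (1, 0, 2, 1))
```

Returning the frame unchanged when a detection is still present keeps this step separate from the brightening adjustment applied while the display screen is in view.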