System and method for virtual fitting during live streaming
By acquiring 3D models of users and products, simulating fitting postures, and rendering them into streaming media data, the problem of users having difficulty imagining the appearance of products is solved, achieving a more accurate product evaluation and fitting experience.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents (China)
- Current Assignee / Owner
- GUANGDONG OPPO MOBILE TELECOMMUNICATIONS CORP LTD
- Filing Date
- 2021-02-26
- Publication Date
- 2026-06-23
AI Technical Summary
Users find it difficult to accurately imagine how a product will look and perform on themselves based on streaming data, especially in reviews of clothing or wearable products.
By acquiring 3D models of users and products, the system simulates the product's appearance on the user and renders it into streaming media data to enhance the user's visual experience.
It provides more accurate product appearance and performance evaluation, enhances the user's try-on experience, and helps users better understand how the product will look on them.
Figure CN120017918B_ABST
Abstract
Description
[0001] Statement on the Divisional Application
[0002] This application is a divisional application of Chinese Patent No. 202180014731.X, filed on February 26, 2021, entitled "System and Method for Virtual Fitting During Live Streaming".
[0003] Cross-references to related applications
[0004] This application is based on and claims priority to U.S. Provisional Patent Application No. 62/987,474, filed on March 10, 2020, entitled "System and Method for Virtual Fitting During Live Streaming", the entire contents of which are hereby incorporated herein by reference.
Technical Field
[0005] This invention generally relates to methods and systems associated with virtual try-on applications. More specifically, embodiments of the invention provide methods and systems for enhancing streaming media data to include virtual try-on data.
Background
[0006] When considering whether to buy a product online, users often struggle to visualize how it will look or perform. This is especially true for clothing and other wearable products. To better understand a product's attributes, users frequently watch images and/or videos in which a presenter reviews and comments on the product. These images and videos show the product on the presenter, who discusses its various advantages and disadvantages. However, because body shapes differ, seeing the product on a presenter does not necessarily help a user fully visualize how it would look or perform on himself or herself.
[0007] Embodiments of the present invention address these and other problems individually and collectively.
Summary of the Invention
[0008] A method is provided that includes evaluating streaming data to obtain a first three-dimensional (3D) model associated with a product. The method also includes obtaining a second 3D model associated with a user. The first 3D model is then fitted onto the second 3D model, and the fitted model is posed according to a pose estimated from the presenter in the streaming data. The posed model is then rendered and presented to a viewer along with the streaming data. Embodiments of the invention are applicable to various applications in virtual reality and computer-based fitting systems.
[0009] One embodiment of the present invention relates to a method comprising receiving an indication of media content being viewed by a user, identifying a product associated with the media content, obtaining a first 3D model representing the product, obtaining a second 3D model representing the user, determining a demonstration pose based on the media content, applying the demonstration pose to the second 3D model, generating a third 3D model by having the second 3D model try on the first 3D model, and presenting the third 3D model to the user in the demonstration pose.
[0010] Another embodiment of the present invention relates to a system including a processor and a memory including instructions, wherein when the instructions are executed by the processor, the system at least receives an indication of media content being viewed by a user, identifies a product associated with the media content, acquires a first 3D model representing the product, acquires a second 3D model representing the user, determines a demonstration pose based on the media content, applies the demonstration pose to the second 3D model, generates a third 3D model by having the second 3D model try on the first 3D model, and presents the third 3D model to the user in the demonstration pose.
[0011] Another embodiment of this disclosure relates to a non-transitory computer-readable medium storing specific computer-executable instructions that, when executed by a processor, at least cause a computer system to receive an indication of media content being viewed by a user, identify a product associated with the media content, obtain a first 3D model representing the product, obtain a second 3D model representing the user, determine a demonstration pose based on the media content, apply the demonstration pose to the second 3D model, generate a third 3D model by having the second 3D model try on the first 3D model, and present the third 3D model to the user in the demonstration pose.
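The steps recited in paragraphs [0009] to [0011] can be read as a simple pipeline. The following Python sketch is illustrative only: the types `Model3D` and `MediaContent` and all helper functions are hypothetical names, and pose estimation and garment fitting are reduced to trivial placeholders so that only the order of operations (identify the product, obtain both models, determine the demonstration pose, apply it, try on, present) is shown.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Vertex = Tuple[float, float, float]
Pose = Dict[str, float]  # hypothetical: joint name -> angle in degrees


@dataclass
class Model3D:
    name: str
    vertices: List[Vertex] = field(default_factory=list)
    pose: Pose = field(default_factory=dict)


@dataclass
class MediaContent:
    content_id: str
    product_id: str       # assumed to be known, e.g. from stream metadata
    presenter_pose: Pose  # assumed output of a pose estimator


def identify_product(media: MediaContent) -> str:
    return media.product_id


def determine_demonstration_pose(media: MediaContent) -> Pose:
    # Stand-in for running pose estimation on the presenter.
    return dict(media.presenter_pose)


def apply_pose(model: Model3D, pose: Pose) -> Model3D:
    # Stand-in for re-posing a rigged body model; only the pose is recorded here.
    return Model3D(model.name, list(model.vertices), dict(pose))


def try_on(user: Model3D, product: Model3D) -> Model3D:
    # Stand-in for garment fitting; the vertex lists are simply combined.
    return Model3D(f"{user.name}+{product.name}", user.vertices + product.vertices, dict(user.pose))


def virtual_try_on(media: MediaContent, catalog: Dict[str, Model3D], user_model: Model3D) -> Model3D:
    product_model = catalog[identify_product(media)]  # first 3D model
    pose = determine_demonstration_pose(media)         # demonstration pose
    posed_user = apply_pose(user_model, pose)          # second 3D model, posed
    return try_on(posed_user, product_model)           # third 3D model, to be presented


catalog = {"shirt-01": Model3D("shirt-01", [(0.0, 1.2, 0.0)])}
user = Model3D("alice", [(0.0, 1.0, 0.0)])
media = MediaContent("live-42", "shirt-01", {"left_elbow": 90.0})
result = virtual_try_on(media, catalog, user)
print(result.name, result.pose)  # alice+shirt-01 {'left_elbow': 90.0}
```

Each placeholder marks where a real component (a pose estimator, a cloth-fitting step, a renderer) would plug in without changing the overall order.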
[0012] Compared with conventional systems, this system offers several advantages. For example, embodiments of the invention relate to methods and systems for providing users with a more accurate assessment of how clothing or other wearable products will look on them. In this system, streaming media data is enhanced with virtual try-on data for the user. To this end, a product model associated with the streaming media data is identified, and a user model associated with the viewer of the streaming media data is obtained. The product model is fitted onto the user model, which is posed similarly to the presenter in the streaming media data. The product model and the user model are rendered and presented together with the streaming media data (e.g., as an enhancement within the streaming media data).
Brief Description of the Drawings
[0013] Figure 1 shows an illustrative example of a system that can use virtual try-on information to enhance streaming video, according to at least some embodiments.
[0014] Figure 2 shows a system architecture for enhancing streaming data using virtual fitting information, according to at least some embodiments.
[0015] Figure 3 shows a simplified flowchart of a method for presenting a data stream enhanced with virtual fitting data, according to an embodiment of the present invention.
[0016] Figure 4 shows an illustrative example of a technique for obtaining a 3D model using sensor data, according to at least some embodiments.
[0017] Figure 5 shows an example of a graphical user interface (GUI) demonstrating features that can be implemented according to an embodiment of the present invention.
[0018] Figure 6 shows a flowchart illustrating a process for presenting virtual fitting data to a user, according to at least some embodiments.
[0019] Figure 7 shows examples of components of a computer system according to some embodiments.
[0020] Figure 8 shows a block diagram of an apparatus for presenting virtual fitting data to a user, according to at least some embodiments.
Detailed Description
[0021] This invention generally relates to methods and systems related to virtual reality applications. More specifically, embodiments of the invention provide methods and systems for determining the fit of a user and a product. Embodiments of the invention are applicable to various applications in virtual reality and computer-based try-on systems.
[0022] Figure 1 shows an illustrative example of a system that can enhance streaming video using virtual try-on information, according to at least some embodiments. In Figure 1, user equipment 102 is used to provide a request for virtual try-on information to mobile application server 104. In some cases, the user equipment can be used to obtain user data 106, which can be provided to mobile application server 104 for generating the virtual try-on information.
[0023] In one example, user equipment 102 represents a suitable computing device that includes one or more graphics processing units (GPUs), one or more general-purpose processors (GPPs), and one or more memories storing computer-readable instructions that can be executed by at least one of the processors to perform various functions of embodiments of the present invention. For example, user equipment 102 can be any of a smartphone, tablet, laptop, personal computer, game console, or smart TV. User equipment 102 may also include a ranging camera (i.e., a depth sensor) and/or an RGB optical sensor (e.g., a camera).
[0024] The user device can be used to capture and/or generate user data 106. User data 106 may include information related to a specific user (e.g., the user of user device 102) for whom virtual try-on data is to be created, including any data about the user that can be used to generate the virtual try-on data. For example, user data 106 may include the user's body dimensions. User data 106 can be captured in any suitable format. For example, user data 106 may include point clouds, 3D meshes or models, or strings containing measurements at predetermined locations. In some cases, capturing user data 106 includes receiving information about the user that is entered manually into user device 102. For example, the user may type measurements of various parts of his or her body on a keyboard. In some cases, acquiring user data 106 may include using a camera and/or depth sensor to acquire image/depth information related to the user. User device 102 may also be configured to generate a 3D model based on the acquired image/depth information. This process is described in more detail below with reference to Figure 1.
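Paragraph [0024] mentions strings containing measurements at predetermined locations but does not define a format. As a minimal sketch, assuming a hypothetical comma-separated `location:value` format in centimeters and an assumed set of required locations (`REQUIRED_LOCATIONS`), such a string could be parsed and validated as follows.

```python
from typing import Dict

# Hypothetical set of predetermined measurement locations (values in centimeters).
REQUIRED_LOCATIONS = ("chest", "waist", "hip")


def parse_measurements(text: str) -> Dict[str, float]:
    """Parse 'location:value' pairs separated by commas, e.g. 'chest:96,waist:80,hip:98'."""
    measurements: Dict[str, float] = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        location, _, value = item.partition(":")
        location = location.strip().lower()
        if not location or not value.strip():
            raise ValueError(f"malformed measurement: {item!r}")
        measurements[location] = float(value)
    # Reject input that omits any of the predetermined locations.
    missing = [loc for loc in REQUIRED_LOCATIONS if loc not in measurements]
    if missing:
        raise ValueError(f"missing measurements: {', '.join(missing)}")
    return measurements


print(parse_measurements("Chest: 96, waist:80, hip:98.5"))
# {'chest': 96.0, 'waist': 80.0, 'hip': 98.5}
```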
[0025] Mobile application server 104 may be any computing device capable of generating a data stream enhanced with virtual try-on data for a user according to the technology described herein. To generate the enhanced data stream, mobile application server 104 can receive user data 106 from user device 102. It should be noted that although mobile application server 104 can receive user data 106 together with a request to generate virtual try-on data, mobile application server 104 can also receive user data 106 before, and independently of, any request to generate virtual try-on data. For example, mobile application server 104 can receive user data 106 during a registration phase in which a user creates an account on mobile application server 104.
[0026] A request for virtual try-on data may refer to streaming data 108. Streaming data 108 may be streaming video (e.g., a live stream) or other suitable dynamic media content. Streaming data 108 may depict at least one presenter 110 and at least one product 112. Mobile application server 104 may obtain, from streaming data 108, an identifier for the at least one product 112 (product identifier 114) and data related to the presenter's pose (pose data 116). In some embodiments, product identifier 114 and/or pose data 116 may be associated with streaming data 108 via metadata attached to streaming data 108. In some embodiments, one or more machine vision techniques may be used to determine product identifier 114 and/or pose data 116 from images within streaming data 108.
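Paragraph [0026] allows product identifier 114 and pose data 116 to come either from metadata attached to the stream or from machine vision. The sketch below assumes a hypothetical JSON metadata layout (`product_ids`, `pose_keypoints`) and takes the machine-vision path as an injected callback, so it only illustrates the "metadata first, vision as fallback" decision, not any particular detector.

```python
import json
from typing import Callable, Dict, List, Optional, Tuple

Keypoints = Dict[str, Tuple[float, float]]


def extract_stream_info(
    metadata_json: Optional[str],
    detect_from_frame: Callable[[], Tuple[List[str], Keypoints]],
) -> Tuple[List[str], Keypoints]:
    """Return (product identifiers, presenter keypoints) for a stream.

    Metadata attached to the stream is preferred; detect_from_frame, a
    hypothetical machine-vision hook, fills in whatever the metadata lacks.
    """
    product_ids: List[str] = []
    pose: Keypoints = {}
    if metadata_json:
        meta = json.loads(metadata_json)
        product_ids = [str(pid) for pid in meta.get("product_ids", [])]
        pose = {name: (float(x), float(y))
                for name, (x, y) in meta.get("pose_keypoints", {}).items()}
    if not product_ids or not pose:
        detected_ids, detected_pose = detect_from_frame()
        product_ids = product_ids or detected_ids
        pose = pose or detected_pose
    return product_ids, pose


meta = '{"product_ids": ["shirt-01"], "pose_keypoints": {"neck": [0.5, 0.2]}}'
ids, pose = extract_stream_info(meta, lambda: ([], {}))
print(ids, pose)  # ['shirt-01'] {'neck': (0.5, 0.2)}
```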
[0027] Mobile application server 104 may include or access object model data 118, from which product data 120 may be obtained to fulfill a request. Object model data 118 may include any computer-readable storage medium on which one or more 3D models are stored. For example, object model data 118 may be a database maintained by mobile application server 104 or another server. The 3D models stored in object model data 118 may represent products that can be worn by a user, such as clothing (e.g., garments) or accessories. In some embodiments, object model data 118 may store 3D models of multiple versions of a product (e.g., different sizes and/or styles). When a product identifier 114 for a specific product is received, mobile application server 104 retrieves from object model data 118 product data 120, which includes a 3D model associated with that specific product.
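Object model data 118 may hold several versions of one product (e.g., sizes and styles). An in-memory sketch of such a lookup follows; the `ProductModel` fields, the `(product, size, style)` key, and the `mesh_uri` pointer are assumptions for illustration rather than a storage schema from this application.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ProductModel:
    product_id: str
    size: str
    style: str
    mesh_uri: str  # hypothetical pointer to the stored 3D mesh


class ObjectModelStore:
    """In-memory stand-in for object model data, keyed by (product, size, style)."""

    def __init__(self) -> None:
        self._models: Dict[Tuple[str, str, str], ProductModel] = {}

    def add(self, model: ProductModel) -> None:
        self._models[(model.product_id, model.size, model.style)] = model

    def get(self, product_id: str, size: Optional[str] = None,
            style: Optional[str] = None) -> ProductModel:
        """Return the first stored version matching the identifier and any given size/style."""
        for (pid, s, st), model in self._models.items():
            if pid == product_id and size in (None, s) and style in (None, st):
                return model
        raise KeyError(product_id)


store = ObjectModelStore()
store.add(ProductModel("shirt-01", "M", "slim", "models/shirt-01-M-slim.glb"))
store.add(ProductModel("shirt-01", "L", "regular", "models/shirt-01-L-regular.glb"))
print(store.get("shirt-01", size="L").mesh_uri)  # models/shirt-01-L-regular.glb
```

When no size or style is given, the first stored version is returned; a real catalog would more likely pick the version closest to the user's measurements.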
[0028] Mobile application server 104 can be configured to combine user data 106 and product data 120 to generate a try-on avatar for the user. Mobile application server 104 can also pose the try-on avatar based on pose data 116. Once the try-on avatar is generated, mobile application server 104 can use it to enhance streaming data 108 to generate enhanced streaming data 122. Once enhanced streaming data 122 is generated, it can be sent back to user device 102, where it can be rendered on a display for the user to view.
[0029] For clarity, a certain number of components are shown in Figure 1. It should be understood, however, that embodiments of the present invention may include more than one of each component. Furthermore, some embodiments of the present invention may include fewer or more than all of the components shown in Figure 1. In addition, the components in Figure 1 may communicate via any suitable communication medium (including the Internet) using any suitable communication protocol.
[0030] Figure 2 shows the architecture of a system for enhancing streaming data using virtual fitting information, according to at least some embodiments. In Figure 2, user equipment 202 can communicate with at least several other components, including mobile application server 204. Mobile application server 204 can perform at least a portion of the processing functions required by a mobile application installed on the user equipment. User equipment 202 and mobile application server 204 may be examples of user device 102 and mobile application server 104, respectively, described with reference to Figure 1.
[0031] User equipment 202 can be any suitable electronic device having at least some of the functions described in this invention. Specifically, user equipment 202 can be any electronic device capable of capturing user data and/or presenting enhanced data streams on a display. In some embodiments, the user equipment may be able to establish a communication session with another electronic device (e.g., mobile application server 204) and send data to and receive data from that electronic device. The user equipment may be capable of downloading and/or executing mobile applications. User equipment includes mobile communication devices as well as personal computers and thin client devices. In some embodiments, the user equipment includes any portable electronic device with basic communication-related functions. For example, the user equipment can be a smartphone, a personal digital assistant (PDA), or any other suitable handheld device. The user equipment can be implemented as a self-contained unit with various components integrated into it (e.g., input sensors, one or more processors, memory, etc.). References in this invention to the "output" of a component or of a sensor do not necessarily mean that the output is sent outside the user equipment. The outputs of various components may remain within the self-contained unit that defines the user equipment.
[0032] In one illustrative configuration, user equipment 202 may include at least one memory 206 and one or more processing units (or processors) 208. Processor 208 may be suitably implemented as hardware, computer-executable instructions, firmware, or a combination thereof. Computer-executable instruction or firmware implementations of processor 208 may include computer-executable or machine-executable instructions written in any suitable programming language for performing the various functions described above. User equipment 202 may also include one or more input sensors 210 for receiving user input and/or environmental input. Various input sensors 210 capable of detecting user input or environmental input may be present, such as accelerometers, camera devices, depth sensors, microphones, global positioning system (GPS) receivers, etc. The one or more input sensors 210 may include ranging camera devices (e.g., depth sensors) capable of generating depth images and camera devices for acquiring image information.
[0033] For the purposes of this invention, a ranging camera (e.g., a depth sensor) can be any device used to identify the distance, or range, between one or more objects and the ranging camera. In some embodiments, the ranging camera can generate a depth image (or depth map) in which each pixel value corresponds to the distance detected at that pixel. Pixel values can be obtained directly in physical units (e.g., meters). In at least some embodiments of the invention, the user equipment can employ a ranging camera that operates using structured light. In such a ranging camera, a projector projects light onto one or more objects in a structured pattern. The light can be outside the visible range (e.g., infrared or ultraviolet). The ranging camera has one or more camera devices for acquiring images of the objects on which the pattern is reflected. Distance information can then be generated based on distortions detected in the pattern. It should be noted that although this invention focuses on ranging cameras that use structured light, any suitable type of ranging camera, including those that operate using stereo triangulation, sheet-of-light triangulation, time-of-flight, interferometry, coded aperture, or any other technique suitable for distance detection, is applicable to the described system.
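For structured-light and stereo ranging cameras, distance is commonly recovered by triangulation: for a rectified projector-camera (or camera-camera) pair with focal length f in pixels and baseline B in meters, a pattern feature shifted by a disparity of d pixels lies at depth Z = f * B / d. The sketch below applies that standard relation and then back-projects a depth pixel to a 3D point with a pinhole model (equal horizontal and vertical focal lengths assumed). The calibration values are made up for illustration, and real sensors also correct for lens distortion.

```python
from typing import Tuple


def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Triangulated depth Z = f * B / d for a rectified projector-camera (or stereo) pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def back_project(u: float, v: float, depth_m: float, focal_px: float,
                 cx: float, cy: float) -> Tuple[float, float, float]:
    """Convert depth-image pixel (u, v) into a camera-frame 3D point in meters (pinhole model)."""
    x = (u - cx) * depth_m / focal_px
    y = (v - cy) * depth_m / focal_px
    return (x, y, depth_m)


z = depth_from_disparity(disparity_px=40.0, focal_px=500.0, baseline_m=0.08)
print(z)  # 1.0
print(back_project(370.0, 240.0, z, focal_px=500.0, cx=320.0, cy=240.0))  # (0.1, 0.0, 1.0)
```

Applying `back_project` to every pixel of a depth image yields the kind of point cloud that paragraph [0024] lists as one possible form of user data 106.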
[0034] Memory 206 stores program instructions that can be loaded and executed on processor 208, as well as data generated during the execution of these programs. Depending on the configuration and type of user equipment 202, memory 206 may be volatile (e.g., random access memory (RAM)) and/or non-volatile (e.g., read-only memory (ROM), flash memory, etc.). User equipment 202 also includes additional memory 212, such as removable or non-removable memory, including but not limited to magnetic storage, optical disk, and/or magnetic tape storage. Disk drives and their associated computer-readable media can provide non-volatile storage of computer-readable instructions, data structures, program modules, and other data for computing devices. In some embodiments, memory 206 may include multiple different types of memory, such as static random access memory (SRAM), dynamic random access memory (DRAM), or ROM. The contents of memory 206 are described in more detail below. Memory 206 includes operating system 214 and one or more applications or services for implementing the features disclosed in this invention, including at least mobile application 216. Memory 206 also includes application data 218, which provides information generated and/or used by mobile application 216. In some embodiments, application data 218 may be stored in a database.
[0035] For the purposes of this invention, a mobile application can be any set of computer-executable instructions that can be installed on and executed by user device 202. The mobile application may be installed on the user device by the user device manufacturer or by another entity. In some embodiments, mobile application 216 may enable the user device to establish a session with mobile application server 204, which provides backend support for mobile application 216. Mobile application server 204 may maintain account information associated with a specific user device and/or user. In some embodiments, the user may be required to log in to the mobile application to access the functionality provided by mobile application 216.
[0036] According to at least some embodiments, mobile application 216 is configured to provide user information to mobile application server 204 and present information received from mobile application server 204 to the user. More specifically, mobile application 216 is configured to acquire the user's measurement data and submit the measurement data to mobile application server 204 in connection with a request for streaming data enhanced with virtual try-on data. In some embodiments, mobile application 216 may also receive an instruction for a data stream enhanced with virtual try-on data.
[0037] According to at least some embodiments, mobile application 216 can receive output from input sensor 210 and generate a 3D model based on that output. For example, mobile application 216 can receive depth information (e.g., depth image) from a depth sensor (e.g., a ranging camera), where the depth sensor can be, for example, the depth sensor previously described with respect to input sensor 210, and mobile application 216 can also receive image information from a camera input sensor. Based on this information, mobile application 216 can determine the edges of an object to be identified (e.g., a user). For example, a sudden change in depth within the depth information can indicate the edges or contours of an object. In another example, mobile application 216 can use one or more machine vision techniques and / or machine learning to identify the edges of an object. In this example, mobile application 216 can receive image information from camera input sensor 210 and can identify potential objects within the image information based on differences in color or texture data detected within the image or based on learned patterns. In some embodiments, mobile application 216 may send the output obtained from input sensor 210 by user device 202 to mobile application server 204, which may then perform one or more object recognition techniques on the output to generate a 3D model of an object.
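Paragraph [0037] notes that a sudden change in depth can indicate the edge or contour of an object. A minimal pure-Python sketch of that idea follows, assuming a depth map given as rows of distances in meters and an arbitrary jump threshold of 0.1 m; it marks the nearer pixel of each discontinuous neighbor pair as part of the contour. The machine-vision and machine-learning alternatives mentioned in the same paragraph are not shown.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]


def depth_edges(depth: List[List[float]], jump_m: float = 0.1) -> List[Pixel]:
    """Return (row, col) pixels lying on a depth discontinuity larger than jump_m."""
    rows, cols = len(depth), len(depth[0])
    edges: Set[Pixel] = set()
    for r in range(rows):
        for c in range(cols):
            # Compare each pixel with its right and lower neighbors, so every
            # horizontally or vertically adjacent pair is checked exactly once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols and abs(depth[r][c] - depth[rr][cc]) > jump_m:
                    # The nearer pixel belongs to the foreground object's contour.
                    edges.add((r, c) if depth[r][c] < depth[rr][cc] else (rr, cc))
    return sorted(edges)


# A 3 x 3 "user" at 1.0 m in front of a wall at 3.0 m.
depth_map = [
    [3.0, 3.0, 3.0, 3.0, 3.0],
    [3.0, 1.0, 1.0, 1.0, 3.0],
    [3.0, 1.0, 1.0, 1.0, 3.0],
    [3.0, 1.0, 1.0, 1.0, 3.0],
    [3.0, 3.0, 3.0, 3.0, 3.0],
]
edges = depth_edges(depth_map)
print(edges)  # [(1, 1), (1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2), (3, 3)]
```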
[0038] User equipment 202 also includes a communication interface 220 that enables user equipment 202 to communicate with any other suitable electronic device. In some embodiments, communication interface 220 enables user equipment 202 to communicate with other electronic devices on a network (e.g., a private network). For example, user equipment 202 may include a Bluetooth™ wireless communication module that allows it to communicate with another electronic device. User equipment 202 also includes input/output (I/O) devices and/or ports 222, for example, for enabling connections to keyboards, mice, pens, voice input devices, touch input devices, displays, speakers, printers, etc.
[0039] In some embodiments, user equipment 202 may communicate with mobile application server 204 via a communication network. The communication network may include any one or a combination of several different types of networks, such as cable networks, the Internet, wireless networks, cellular networks, and other private and/or public networks. Furthermore, the communication network may itself comprise multiple different networks. For example, user equipment 202 may communicate with a wireless router via a wireless local area network (WLAN), which may then route the communication to mobile application server 204 via a public network (e.g., the Internet).
[0040] Mobile application server 204 can be any computing device, or multiple computing devices, used to perform one or more computations for mobile application 216 on user device 202. In some embodiments, mobile application 216 can communicate periodically with mobile application server 204. For example, mobile application 216 can receive updates, push notifications, or other instructions from mobile application server 204. In some embodiments, mobile application 216 and mobile application server 204 may use proprietary encryption and/or decryption schemes to protect their communication. In some embodiments, mobile application server 204 may be executed by one or more virtual machines implemented in a managed computing environment. The managed computing environment includes one or more computing resources that can be rapidly provisioned and released, including computing, networking, and/or storage devices. A managed computing environment may also be referred to as a cloud computing environment.
[0041] In one illustrative configuration, the mobile application server 204 may include at least one memory 224 and one or more processing units (or processors) 226. The processor 226 may be suitably implemented as hardware, computer-executable instructions, firmware, or a combination thereof. The computer-executable instructions or firmware implementation of the processor 226 may include computer-executable instructions or machine-executable instructions written in any suitable programming language for performing the various functions described above.
[0042] Memory 224 may store program instructions that are loadable and executable on processor 226, as well as data generated during the execution of these programs. Depending on the configuration and type of mobile application server 204, memory 224 may be volatile (e.g., RAM) and/or non-volatile (e.g., ROM, flash memory, etc.). Mobile application server 204 also includes additional memory 228, such as removable or non-removable memory, including but not limited to magnetic storage, optical disc, and/or magnetic tape storage. Disk drives and their associated computer-readable media may provide non-volatile storage of computer-readable instructions, data structures, program modules, and other data for computing devices. In some embodiments, memory 224 may include multiple different types of memory, such as SRAM, DRAM, or ROM. The contents of memory 224 are described in more detail below. Memory 224 includes operating system 230 and one or more applications or services for implementing the features disclosed in this invention, including at least a module for fitting a product 3D model onto a user 3D model (try-on module 232) and/or a module for determining a pose and applying the pose to the product 3D model and the user 3D model (pose module 234). Memory 224 also includes account data 236, which provides information associated with user accounts maintained by the system; user model data 238, which maintains a 3D model associated with the user of each account; and/or object model data 240, which maintains 3D models associated with multiple objects (products). In some embodiments, one or more of account data 236, user model data 238, or object model data 240 may be stored in a database. In some embodiments, object model data 240 may be an electronic catalog that includes data related to objects available for sale from resource providers (e.g., retailers or other suitable merchants).
[0043] Memory 224 and additional memory 228 are examples of computer-readable storage media, which may be removable or non-removable. For example, computer-readable storage media may include volatile or non-volatile, removable or non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. As used herein, the term "module" refers to a programming module executed by a computing system (e.g., a processor) installed on and/or executing from mobile application server 204. Mobile application server 204 also includes a communication connection 242 that allows mobile application server 204 to communicate with a stored database, another computing device or server, a user terminal, and/or other components of the system. Mobile application server 204 also includes input/output (I/O) devices and/or ports 244, for example, for enabling connections to a keyboard, mouse, pen, voice input device, touch input device, display, speaker, printer, etc.
[0044] The contents of memory 224 are described in more detail below. Memory 224 includes try-on module 232, pose module 234, a database containing account data 236, a database containing user model data 238, and/or a database containing object model data 240.
[0045] In some embodiments, try-on module 232 may be configured to work with processor 226 to deform a product 3D model in order to fit it onto a user 3D model. Try-on module 232 may access one or more rules describing how to deform (e.g., stretch and/or bend) a particular product type (e.g., shirt, trousers, etc.) to fit it onto the user model. To fit the product 3D model onto the user 3D model, try-on module 232 may align certain portions of the product 3D model with specific portions of the user 3D model. For example, a shirt 3D model may be positioned such that the sleeves of the shirt 3D model surround the arms of the user 3D model. Furthermore, the shirt 3D model may be positioned such that its collar surrounds the neck of the user 3D model. The remaining portions of the shirt 3D model can then be deformed by stretching and bending them, such that the inner surface of the shirt 3D model lies outside of, or along, the outer surface of the user 3D model.
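The final deformation step of paragraph [0045] keeps the garment from penetrating the body. The sketch below illustrates only that penetration-resolution step under a deliberately crude assumption: the torso is approximated by a vertical cylinder (a hypothetical stand-in for the user 3D model), and any garment vertex inside it is pushed radially out to the surface plus a small clearance. The rule-based alignment of sleeves and collar, and any cloth simulation, are not shown.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def resolve_penetration(garment: List[Vec3], axis_xz: Tuple[float, float],
                        radius: float, clearance: float = 0.005) -> List[Vec3]:
    """Push garment vertices lying inside a vertical cylinder out to its surface.

    The cylinder (axis through axis_xz, parallel to y) is a crude stand-in for
    the torso of the user 3D model; clearance keeps the cloth slightly off the skin.
    """
    ax, az = axis_xz
    target = radius + clearance
    fitted: List[Vec3] = []
    for x, y, z in garment:
        dx, dz = x - ax, z - az
        dist = math.hypot(dx, dz)
        if dist < target:
            if dist == 0.0:
                # Degenerate case: vertex on the axis, so push it along +x.
                dx, dz, dist = 1.0, 0.0, 1.0
            scale = target / dist
            x, z = ax + dx * scale, az + dz * scale
        fitted.append((x, y, z))
    return fitted


shirt = [(0.1, 1.2, 0.0), (0.0, 1.0, 0.3)]  # the first vertex penetrates the body
print(resolve_penetration(shirt, axis_xz=(0.0, 0.0), radius=0.2, clearance=0.0))
# [(0.2, 1.2, 0.0), (0.0, 1.0, 0.3)]
```

Vertices already outside the body are left untouched, which matches the requirement that the garment lie outside of, or along, the body surface.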
[0046] In some embodiments, pose module 234 may be configured to work with processor 226 to identify the pose of a presenter (i.e., a human body) within the streaming data and apply that pose to the combination of a product 3D model and a user 3D model generated by try-on module 232. This may include using one or more pose estimation techniques to determine the presenter's current pose within the data stream. For example, pose module 234 may use machine learning to determine the presenter's pose within the data stream. Those skilled in the art will recognize that a variety of suitable pose estimation techniques can be employed. In some embodiments, pose module 234 may apply the determined pose to a user model onto which the product model has been fitted. This may include repositioning one or more appendages or body parts of the user model until the determined pose is achieved. In some embodiments, pose module 234 may monitor the presenter's pose within the streaming data and, when a change in the presenter's pose is detected, adjust the user model's pose to match.
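Paragraph [0046] describes monitoring the presenter's pose and re-posing the user model when a change is detected. One way to decide what counts as a change, sketched here under assumptions, is to compare 2D keypoints (hypothetical names, normalized image coordinates) against the last applied pose and trigger an update only when some keypoint moves beyond a threshold, which suppresses frame-to-frame jitter from the pose estimator. `PoseFollower` and the 0.02 threshold are illustrative choices, not part of this application.

```python
import math
from typing import Dict, Tuple

Keypoints = Dict[str, Tuple[float, float]]


def pose_changed(previous: Keypoints, current: Keypoints, threshold: float = 0.02) -> bool:
    """True if the keypoint set differs or any keypoint moved farther than threshold."""
    if previous.keys() != current.keys():
        return True
    return any(math.dist(previous[k], current[k]) > threshold for k in current)


class PoseFollower:
    """Re-pose the fitted user model only when the presenter's pose actually changes."""

    def __init__(self, threshold: float = 0.02) -> None:
        self.threshold = threshold
        self.last_pose: Keypoints = {}
        self.updates = 0

    def on_frame(self, presenter_pose: Keypoints) -> bool:
        if pose_changed(self.last_pose, presenter_pose, self.threshold):
            self.last_pose = dict(presenter_pose)
            self.updates += 1  # here the avatar would be re-posed and re-rendered
            return True
        return False


follower = PoseFollower(threshold=0.02)
results = [
    follower.on_frame({"wrist": (0.50, 0.50)}),   # first frame: pose applied
    follower.on_frame({"wrist": (0.505, 0.50)}),  # small jitter: ignored
    follower.on_frame({"wrist": (0.60, 0.50)}),   # real movement: re-pose
]
print(results, follower.updates)  # [True, False, True] 2
```

Comparing against the last applied pose, rather than the previous frame, means that slow drift still triggers an update once it accumulates past the threshold.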
[0047] After the pose of the combined user model and product model has been adjusted, pose module 234 can render the combined user model and product model. In some embodiments, the combined user model and product model can be rendered in a small window that is placed in an inconspicuous position within the streaming data. For example, when the streaming data is video, the combined user model and product model can be rendered in a window in a lower corner of the video. This rendering allows the user to visualize the product being worn by him or her. It should be noted that although pose module 234 and try-on module 232 are described with reference to mobile application server 204, the functions described as being performed by one or more of these modules can instead be performed by the mobile application on user device 202.
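For the lower-corner window described in paragraph [0047], the placement reduces to simple arithmetic on the frame size. In the sketch below, the function name `corner_window`, the scale factor of 0.25, and the 16-pixel margin are arbitrary illustrative choices, not values from this application.

```python
from typing import Tuple


def corner_window(frame_w: int, frame_h: int, scale: float = 0.25, margin: int = 16,
                  corner: str = "bottom-right") -> Tuple[int, int, int, int]:
    """Return (x, y, width, height) of an overlay window in a lower corner of the frame."""
    w, h = int(frame_w * scale), int(frame_h * scale)
    y = frame_h - h - margin
    if corner == "bottom-right":
        x = frame_w - w - margin
    elif corner == "bottom-left":
        x = margin
    else:
        raise ValueError(f"unsupported corner: {corner}")
    return x, y, w, h


print(corner_window(1920, 1080))  # (1424, 794, 480, 270)
```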
[0048] In some embodiments, each object entry within the object model database 240 may be associated with a 3D model of that object. In these embodiments, the 3D model may be combined with a second 3D model of the user and provided to the mobile application 216, causing the user device 202 to display the combination of 3D models on the user device's display as an enhancement to the streaming data. As the presenter's pose within the streaming data is updated, the mobile application 216 may dynamically update the pose of the combination of 3D models on the user device's display.
[0049] Figure 3 shows a simplified flowchart of a method for presenting a data stream enhanced with virtual fitting data, according to an embodiment of the present invention. The flow is described in connection with a computer system that is an example of the computer systems described in this invention. Some or all of the operations of this flow can be implemented by specific hardware on the computer system and/or as computer-readable instructions stored on a non-transitory computer-readable medium of the computer system. As stored, the computer-readable instructions represent programmable modules comprising code executable by a processor of the computer system. Execution of such instructions configures the computer system to perform the corresponding operations. Each programmable module, together with the processor, represents a means for performing the corresponding operation. Although these operations are shown in a specific order, it should be understood that this order is not mandatory and that one or more operations may be omitted, skipped, and/or reordered.
[0050] At the beginning of process 300, one or more products 302 are scanned to generate product model data 304. Each generated product model 304 serves as a 3D virtual representation of a product 302. It can be generated by scanning the product 302 from multiple perspectives using a camera and/or depth sensor. In step 1 of process 300, multiple generated product models 304 are provided to mobile application server 204 for storage in object model data 240. Product models 304 can be generated by multiple different entities. For example, the product model for a specific product can be generated by the manufacturer of that product.
[0051] Additionally, in step 2 of process 300, the user is scanned using a camera and/or depth sensor mounted on user device 202 to generate user model data 308. Some example techniques for generating models of objects (e.g., users) are described in more detail below with reference to Figure 4. In step 3 of process 300, user model data 308 is sent to mobile application server 204 for storage in user model data 238. In some cases, user model data 308 may be stored in association with an account maintained for the scanned user.
[0052] Mobile application server 204 receives a request from a user to consume streaming data 306. In step 4 of process 300, upon receiving the request, mobile application server 204 retrieves streaming data 306 from its location. In some embodiments, streaming data 306 may be maintained by mobile application server 204. In other embodiments, streaming data 306 may be maintained by an entity separate from mobile application server 204. For example, a user may request, through mobile application server 204, to watch content from YouTube.com™, which hosts video files. In this case, mobile application server 204 supports the mobile application installed on the user's mobile device. The user can provide a Uniform Resource Locator (URL) or other identifier for the video, and mobile application server 204 can then retrieve the video file by accessing the URL. Once streaming data 306 is retrieved, mobile application server 204 identifies one or more relevant products and the presenter's pose. As described elsewhere, this can be accomplished using the try-on module 232 and/or the pose module 234 described with respect to Figure 2.
[0053] The steps performed by pose module 234 are described below. In step 5, process 300 includes determining the pose of the presenter within streaming data 306. This involves two stages. First, the presenter is identified within the streaming data (e.g., using one or more machine vision techniques). Second, the presenter's pose is estimated using any suitable pose estimation technique; those skilled in the art will recognize that a variety of such techniques can be employed. Generally, an object's pose indicates its position and orientation. For a presenter, the estimated pose includes a record of the position and orientation of each of the presenter's body parts or joints.
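The per-joint record described in [0053] can be held in a simple data structure. The following is a minimal sketch only. The joint names, the coordinate frame and the quaternion convention for orientation are all illustrative assumptions; this disclosure does not specify any of them.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (w, x, y, z) unit quaternion (assumed convention)

@dataclass
class JointState:
    position: Vec3     # e.g., camera-space coordinates in meters (assumed)
    orientation: Quat  # rotation of the body part attached to this joint

@dataclass
class Pose:
    """Record of per-joint positions and orientations for one video frame."""
    joints: Dict[str, JointState] = field(default_factory=dict)

    def set_joint(self, name: str, position: Vec3,
                  orientation: Quat = (1.0, 0.0, 0.0, 0.0)) -> None:
        self.joints[name] = JointState(position, orientation)

    def bone_vector(self, parent: str, child: str) -> Vec3:
        """Vector from a parent joint to a child joint (e.g., hip to knee)."""
        p, c = self.joints[parent].position, self.joints[child].position
        return (c[0] - p[0], c[1] - p[1], c[2] - p[2])

# Hypothetical output of a pose estimator for one frame.
presenter_pose = Pose()
presenter_pose.set_joint("left_hip", (0.1, 1.0, 2.5))
presenter_pose.set_joint("left_knee", (0.1, 0.5, 2.5))
```

Keeping parent-to-child bone vectors available makes it straightforward to transfer the pose to a differently proportioned user model.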
[0054] After estimating the presenter's pose, pose module 234 applies the pose to the user model. To do this, in step 6 of process 300, pose module 234 retrieves the user model. The user model is a 3D model representing a person. It is stored in user model data 238 in association with that person or with an account linked to that person. When a request from a specific user to consume streaming data 306 is received, pose module 234 can retrieve the user model associated with that user from user model data 238. Pose module 234 then applies the presenter's estimated pose to the retrieved user model. It does so by repositioning the individual body parts of the user model to match the recorded positions and orientations of the presenter's corresponding body parts. Then, in step 7 of process 300, the posed user model is provided to try-on module 232.
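One way to reposition the body parts of the user model is a simple form of skeleton retargeting, sketched below. The approach has three parts:

* The presenter's bone directions are taken from the estimated pose.
* The user's own bone lengths are kept.
* Joint positions are rebuilt outward from the root.

This is a minimal sketch under several assumptions:

* Both skeletons share the hypothetical joint hierarchy `SKELETON`.
* Only joint positions are transferred, not per-joint orientations.
* A real system would also need to deform the mesh (e.g., by skinning).

```python
import math
from typing import Dict, List, Optional, Tuple

Vec3 = Tuple[float, float, float]

# Hypothetical kinematic chain: (joint, parent), listed so parents precede children.
SKELETON: List[Tuple[str, Optional[str]]] = [
    ("pelvis", None),
    ("spine", "pelvis"),
    ("neck", "spine"),
    ("left_hip", "pelvis"),
    ("left_knee", "left_hip"),
]

def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _norm(a: Vec3) -> float:
    return math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)

def retarget(presenter: Dict[str, Vec3], user_rest: Dict[str, Vec3],
             root_position: Vec3) -> Dict[str, Vec3]:
    """Pose the user's skeleton like the presenter while keeping the user's bone lengths."""
    posed: Dict[str, Vec3] = {}
    for joint, parent in SKELETON:
        if parent is None:
            posed[joint] = root_position
            continue
        direction = _sub(presenter[joint], presenter[parent])
        length = _norm(direction)
        bone_len = _norm(_sub(user_rest[joint], user_rest[parent]))
        if length == 0.0:
            # Degenerate presenter bone: fall back to the user's rest-pose offset.
            offset = _sub(user_rest[joint], user_rest[parent])
        else:
            # Presenter's bone direction, scaled to the user's bone length.
            offset = tuple(c / length * bone_len for c in direction)
        p = posed[parent]
        posed[joint] = (p[0] + offset[0], p[1] + offset[1], p[2] + offset[2])
    return posed
```

Preserving the user's bone lengths keeps the model recognizably the user's body, even though the pose comes from the presenter.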
[0055] The steps performed by try-on module 232 are described below. In step 8, process 300 includes retrieving one or more product models. First, try-on module 232 identifies one or more products associated with streaming data 306. In some embodiments, streaming data 306 may include indications of one or more products. For example, streaming data 306 may carry metadata indicating a stock keeping unit (SKU) or other product identifier. In some embodiments, machine vision techniques (e.g., object recognition) may be used to identify one or more products from streaming data 306. For example, try-on module 232 may identify a specific product (e.g., a shirt or trousers) worn by a presenter within streaming data 306. It does so by comparing the product's visual attributes with the attributes of multiple products stored in an electronic catalog maintained by mobile application server 204. In this example, try-on module 232 may select the catalog product that best matches the product worn by the presenter. Once try-on module 232 has identified the products associated with streaming data 306, the product models of these products are retrieved from object model data 240.
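The two identification routes in [0055] can be combined as follows: use a metadata identifier if one is present, and otherwise fall back to nearest-match visual comparison. This is an illustrative sketch only. The catalog contents, the `"sku"` metadata key and the four-number attribute vector are hypothetical. Euclidean distance stands in for whatever similarity measure a real object-recognition pipeline would use.

```python
import math
from typing import Dict, List, Optional, Sequence

# Hypothetical electronic catalog: SKU -> visual attribute vector
# (e.g., normalized mean RGB color plus a coarse shape descriptor).
CATALOG: Dict[str, List[float]] = {
    "SKU-SHIRT-RED": [0.9, 0.1, 0.1, 0.3],
    "SKU-SHIRT-BLUE": [0.1, 0.2, 0.9, 0.3],
    "SKU-TROUSERS-BLACK": [0.05, 0.05, 0.05, 0.8],
}

def identify_product(metadata: Dict[str, str],
                     observed_attributes: Optional[Sequence[float]]) -> Optional[str]:
    # 1) Prefer an explicit product identifier carried in the stream metadata.
    sku = metadata.get("sku")
    if sku in CATALOG:
        return sku
    # 2) Otherwise pick the catalog entry whose attributes are closest
    #    (Euclidean distance) to those observed on the presenter.
    if observed_attributes is None:
        return None
    return min(CATALOG, key=lambda k: math.dist(CATALOG[k], observed_attributes))
```

For example, `identify_product({}, [0.85, 0.15, 0.1, 0.35])` falls back to visual matching and selects the red shirt.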
[0056] In step 9 of process 300, the one or more product models obtained in step 8 are fitted onto the posed user model provided in step 7. A product model is fitted onto the posed user model by adjusting a set of parameters that control the deformation of one or more regions of interest of the product model, until the product model matches the user model. This set of parameters can be defined as a set of measurements, such as the displacement of each vertex of the product model. The fitting can therefore be framed as an optimization problem. Any of several optimization algorithms can be used to find the set of parameters that minimizes one or more cost functions. A cost function can be defined, for example, as the number of penetrations between the meshes of the two 3D models, or as the average distance between the vertices of the body mesh and the vertices of the clothing mesh. U.S. Patent Application No. 62/987,196, entitled "System and Method for Virtual Fitting," describes further examples of techniques for rendering product models fitted onto user models in more detail. The entire contents of that application are incorporated herein by reference for all purposes.
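The optimization in [0056] can be illustrated with plain gradient descent over per-vertex displacements. This is a minimal sketch, not the referenced application's method, and it simplifies in several ways:

* Every garment vertex is treated as free, rather than only certain regions of interest.
* The only cost is the average squared distance to the nearest body vertex; the penetration-count term is omitted.
* A displacement penalty (`reg`) keeps the garment from collapsing onto the body.
* Nearest neighbors are found by brute force.
* `reg`, `lr` and `iters` are arbitrary illustrative values.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

def _add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def _sq_dist(a: Vec3, b: Vec3) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2

def _nearest(p: Vec3, body: List[Vec3]) -> Vec3:
    return min(body, key=lambda b: _sq_dist(p, b))

def fit_cost(garment: List[Vec3], disp: List[Vec3], body: List[Vec3], reg: float) -> float:
    """Mean squared garment-to-body vertex distance plus a penalty on large displacements."""
    total = 0.0
    for g, d in zip(garment, disp):
        p = _add(g, d)
        total += _sq_dist(p, _nearest(p, body)) + reg * _sq_dist(d, (0.0, 0.0, 0.0))
    return total / len(garment)

def fit_garment(garment: List[Vec3], body: List[Vec3], reg: float = 0.5,
                lr: float = 0.2, iters: int = 50) -> List[Vec3]:
    """Gradient descent on per-vertex displacements (the deformation parameters)."""
    disp: List[Vec3] = [(0.0, 0.0, 0.0) for _ in garment]
    for _ in range(iters):
        updated = []
        for g, d in zip(garment, disp):
            p = _add(g, d)
            b = _nearest(p, body)  # correspondence held fixed within this step
            # Gradient of ||p - b||^2 + reg * ||d||^2 with respect to d.
            grad = tuple(2.0 * (p[k] - b[k]) + 2.0 * reg * d[k] for k in range(3))
            updated.append(tuple(d[k] - lr * grad[k] for k in range(3)))
        disp = updated
    return disp
```

With `reg = 0.5`, each vertex settles one third of the way from its nearest body vertex to its starting position. The cost therefore drops to one third of its initial value.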
[0057] Once the product model has been fitted onto the user model, the models are rendered in step 10. Rendering is the process of producing the visual appearance of a 3D model, for example by applying shading and color. Those skilled in the art will recognize that various suitable techniques exist for rendering product models fitted onto posed user models. In some embodiments, the rendered model is used to enhance streaming data 306. For example, the rendered model may be placed as enhanced visual data within a small window inside streaming data 306. A viewer of streaming data 306 (e.g., the user) can then view the rendered model while viewing streaming data 306.
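Placing the rendered model in a small window inside the stream amounts to compositing one image onto another. The sketch below shows standard per-pixel alpha blending. It assumes frames are nested lists of RGB tuples and that the rendered window carries an alpha value in [0, 1]. Both are illustrative choices; a real player would composite on the GPU or with an imaging library.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, float]  # alpha in [0, 1]

def composite_window(frame: List[List[RGB]], window: List[List[RGBA]],
                     top: int, left: int) -> List[List[RGB]]:
    """Alpha-blend a rendered window (e.g., the posed, fitted model) into a copy of a frame."""
    out = [row[:] for row in frame]  # leave the original frame untouched
    for y, row in enumerate(window):
        for x, (r, g, b, a) in enumerate(row):
            fy, fx = top + y, left + x
            if 0 <= fy < len(out) and 0 <= fx < len(out[0]):  # clip to the frame
                br, bg, bb = out[fy][fx]
                out[fy][fx] = (round(a * r + (1 - a) * br),
                               round(a * g + (1 - a) * bg),
                               round(a * b + (1 - a) * bb))
    return out
```

Transparent pixels (alpha 0) around the rendered model let the underlying video show through, so only the model itself obstructs the stream.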
[0058] Then, in step 11 of process 300, the rendered model (e.g., enhanced streaming data) is provided to user device 202. Upon receiving the rendered model, user device 202 can present the rendered model to the user. For example, user device 202 can play the enhanced streaming data via a media player application.
[0059] In some embodiments, additional processing can be performed while the enhanced streaming data 306 is being consumed. For example, while presenting the enhanced streaming data 306, user device 202 can capture image information of the viewing user through a front-facing camera mounted on user device 202. In this example, the user's facial data can be extracted from the image information and overlaid onto the rendered user model, so that the user model reflects the user's face and facial expressions.
[0060] It should be understood that the specific steps illustrated in Figure 3 provide a particular method for presenting a data stream augmented with virtual try-on data, according to embodiments of the present invention. Other sequences of steps may also be performed according to alternative embodiments; for example, alternative embodiments of the present invention may perform the steps outlined above in a different order. Moreover, each step shown in Figure 3 may include multiple sub-steps, which may be performed in various sequences as appropriate to the individual step. Furthermore, steps may be added or removed depending on the particular application. Many variations, modifications, and alternatives will be apparent to those skilled in the art.
[0061] Figure 4 shows an illustrative example of a technique for obtaining a 3D model using sensor data, according to at least some embodiments. According to at least some embodiments, sensor data 402 can be acquired from one or more input sensors mounted on a user device. The captured sensor data 402 includes image information 404 captured by a camera device and depth map information 406 captured by a depth sensor.
[0062] As described above, sensor data 402 includes image information 404. One or more image processing techniques can be applied to image information 404 to identify one or more objects within it. For example, edge detection can be used to identify a region 408 within image information 404 that includes an object. To do this, discontinuities in brightness, color, and/or texture can be identified across the image to detect the edges of the objects within it. Region 408 shows an example image of a chair in which such discontinuities are highlighted.
[0063] As described above, sensor data 402 includes depth information 406. In the depth information 406, a value can be assigned to each pixel, representing the distance between the user device and a specific point corresponding to that pixel's location. The depth information 406 can be analyzed to detect sudden changes in depth within it. For example, a sudden change in distance can indicate the edge or boundary of an object within the depth information 406.
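Detecting sudden changes in depth can be as simple as thresholding the difference between neighboring pixels. The sketch below marks both pixels on either side of any jump larger than a caller-chosen `threshold`. The threshold value and the nested-list depth format are assumptions for illustration. Practical systems typically smooth the depth map first and may scale the threshold with distance, since depth noise grows with range.

```python
from typing import List

def depth_edges(depth: List[List[float]], threshold: float) -> List[List[bool]]:
    """Flag pixels where depth jumps by more than `threshold` to a right or down neighbor."""
    h, w = len(depth), len(depth[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0)):  # compare with right and lower neighbors
                ny, nx = y + dy, x + dx
                if ny < h and nx < w and abs(depth[y][x] - depth[ny][nx]) > threshold:
                    # Mark both sides of the discontinuity as object boundary.
                    edges[y][x] = True
                    edges[ny][nx] = True
    return edges
```

For example, an object 1 m away in front of a wall 3 m away produces a band of flagged pixels along the object's outline.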
[0064] In some embodiments, sensor data 402 includes both image information 404 and depth information 406. In at least some of these embodiments, an object may first be identified in either image information 404 or depth information 406, and various attributes of the object may then be determined from the other. For example, edge detection techniques may be used to identify a region 408 within image information 404 that includes the object. Region 408 may then be mapped to a corresponding region 410 in the depth information to determine the depth information (e.g., a point cloud) of the identified object. In another example, a region 410 including the object may first be identified within depth information 406. Region 410 may then be mapped to a corresponding region 408 in the image information to determine the appearance attributes (e.g., color or texture values) of the identified object.
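The mapping from region 408 to region 410, followed by extraction of a point cloud, can be sketched in two steps:

1. Rescale the region's bounding box between image and depth resolutions.
2. Back-project the depth pixels inside it with a pinhole camera model.

This sketch rests on several assumptions:

* The color and depth images are already registered to the same viewpoint.
* Regions are axis-aligned boxes.
* Depth values are distances along the optical axis, with 0 meaning "no reading".
* The intrinsics `fx`, `fy`, `cx` and `cy` are known for the depth camera.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), upper bounds exclusive

def map_box(box: Box, image_size: Tuple[int, int], depth_size: Tuple[int, int]) -> Box:
    """Scale a region found in the color image (e.g., region 408) to depth-map pixels (region 410).

    Sizes are (width, height); the two images are assumed to be registered.
    """
    sx = depth_size[0] / image_size[0]
    sy = depth_size[1] / image_size[1]
    x0, y0, x1, y1 = box
    return (int(x0 * sx), int(y0 * sy), int(round(x1 * sx)), int(round(y1 * sy)))

def region_point_cloud(depth: List[List[float]], box: Box, fx: float, fy: float,
                       cx: float, cy: float) -> List[Tuple[float, float, float]]:
    """Back-project the depth pixels inside `box` to 3D points with a pinhole model."""
    x0, y0, x1, y1 = box
    points = []
    for v in range(y0, y1):
        for u in range(x0, x1):
            z = depth[v][u]
            if z > 0:  # 0 = missing depth reading (assumed convention)
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points
```

The reverse direction (from region 410 to region 408) uses the same scaling with the sizes swapped, and then reads color values instead of depth values.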
[0065] In some embodiments, various attributes of the object identified in the sensor data 402 (e.g., color, texture, point cloud data, object edges) can be used as input to a machine learning module that identifies or generates a 3D model 412 matching the identified object. In some embodiments, a point cloud of the object can be generated from the depth information and/or image information and compared with point cloud data stored in a database to identify the best-matching 3D model. Alternatively, a 3D model of an object (e.g., a user or product) can be generated using the sensor data 402. For this purpose, a mesh can be created from the point cloud data obtained from region 410 of depth information 406. The system can then map appearance data from the corresponding region of image information 404 onto the mesh to generate a basic 3D model. Although specific techniques have been described, it should be noted that many other techniques exist for identifying specific objects from sensor output.
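Comparing an observed point cloud against stored point clouds requires a distance between point sets. The sketch below uses the symmetric Chamfer distance, one common choice that is swapped in here for illustration; this disclosure does not name a specific measure. It assumes the clouds are already aligned and scaled consistently, and its nearest-neighbor search is brute force, O(N·M). Real systems typically register the clouds first and use a spatial index such as a k-d tree. The database names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]

def chamfer(a: List[Point], b: List[Point]) -> float:
    """Symmetric Chamfer distance: mean nearest-neighbor distance in both directions."""
    a_to_b = sum(min(math.dist(p, q) for q in b) for p in a) / len(a)
    b_to_a = sum(min(math.dist(q, p) for p in a) for q in b) / len(b)
    return a_to_b + b_to_a

def best_match(observed: List[Point], database: Dict[str, List[Point]]) -> str:
    """Return the name of the stored model whose point cloud is closest to `observed`."""
    return min(database, key=lambda name: chamfer(observed, database[name]))
```

Measuring in both directions penalizes a stored model that covers the observation but also has extra parts, as well as one that only covers part of it.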
[0066] As described elsewhere, sensor data captured by a user device (e.g., user device 102 of Figure 1) can be used to generate a 3D model of the user using the techniques described above. This user 3D model can then be provided as user data to the mobile application server. In some embodiments, sensor data can also be used to generate a 3D model of a product, which can be stored in object model data 240. For example, a user wishing to sell a product can capture sensor data of the product with their user device. The user device can then generate a 3D model as described above and provide it to the mobile application server.
[0067] Figure 5 shows an example graphical user interface (GUI) according to embodiments of the present invention, demonstrating some example features that can be implemented. In the example of Figure 5, user device 502 is shown having a display screen on which visual data can be presented. User device 502 is an example of user device 202 described above with respect to Figure 2.
[0068] As shown in Figure 5, the GUI of a software application (e.g., a media viewer application) installed on user device 502 can be used to present streaming data 504. Streaming data 504 includes at least a presenter 506 and a product 508, where the presenter is a person shown in streaming data 504. Product 508 may be worn by presenter 506 within streaming data 504 or otherwise presented.
[0069] As described elsewhere, the posed and fitted model 510 can be presented alongside streaming data 504. For example, model 510 can be presented within a separate window 512 positioned to minimize any obstruction of streaming data 504. This arrangement is sometimes referred to as picture-in-picture. Model 510 includes a user model representing the current viewer of streaming data 504. The user model has been posed similarly to presenter 506 and has tried on a product model representing product 508.
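One simple way to position window 512 so that it obstructs the stream as little as possible is to try each frame corner and pick the one that overlaps the presenter least. This sketch assumes the presenter's bounding box is available from the detection step. The window `scale`, the `margin` and the preference for the bottom-right corner on ties are all illustrative choices.

```python
from typing import Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)

def overlap_area(a: Rect, b: Rect) -> int:
    """Area of intersection of two axis-aligned rectangles (0 if disjoint)."""
    w = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    h = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    return max(0, w) * max(0, h)

def place_pip_window(frame_w: int, frame_h: int, presenter_box: Rect,
                     scale: float = 0.25, margin: int = 16) -> Rect:
    """Pick the corner where the picture-in-picture window covers the least of the presenter."""
    w, h = int(frame_w * scale), int(frame_h * scale)
    corners = [
        (frame_w - w - margin, frame_h - h - margin),  # bottom-right (preferred on ties)
        (margin, frame_h - h - margin),                # bottom-left
        (frame_w - w - margin, margin),                # top-right
        (margin, margin),                              # top-left
    ]
    candidates = [(x, y, w, h) for x, y in corners]
    # min() keeps the first candidate among equal overlaps, giving the corner order above.
    return min(candidates, key=lambda r: overlap_area(r, presenter_box))
```

Re-running the placement as the presenter moves would let the window relocate. In practice some hysteresis may be wanted so that the window does not jump between corners every frame.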
[0070] Figure 6 shows a flowchart illustrating a process for presenting virtual try-on data to a user, according to at least some embodiments. The process 600 shown in Figure 6 can be performed by a mobile application server (e.g., mobile application server 204 of Figure 2) in communication with a user device (e.g., user device 202 of Figure 2).
[0071] At 602, process 600 includes receiving an indication that the user is consuming media content. For example, the indication may be that the user is watching a streaming video, streaming video being one type of media content. The indicated media content may include a depiction of a presenter, where the presenter is a person other than the user. The indicated media content may also include a depiction of a product presented by the presenter. For example, the product might be clothing worn by the presenter in the media content.
[0072] At 604, process 600 includes identifying a product associated with the media content. In some embodiments, the product is identified by an identifier of the product contained in the metadata of the media content. In one example, this identifier is a SKU number. In some embodiments, the product associated with the media content is identified through object recognition.
[0073] At 606, process 600 includes acquiring a first 3D model representing the product. To do this, a 3D model associated with the product identified at 604 is retrieved from a database in which object model data is stored (e.g., object model data 240 of Figure 2). In some embodiments, an appropriate size and/or style of the product may be selected based on stored information about the user who is watching the media content.
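Selecting an appropriate size from stored user information could look like the sketch below. It picks the smallest size that still leaves some room over the user's measurement. The size chart, the use of chest circumference as the only measurement and the 6 cm ease allowance are hypothetical values chosen for illustration.

```python
from typing import Dict, Optional

# Hypothetical size chart: size label -> garment chest circumference in cm.
SIZE_CHART: Dict[str, float] = {"S": 92.0, "M": 100.0, "L": 108.0, "XL": 116.0}

def select_size(user_chest_cm: float, chart: Dict[str, float] = SIZE_CHART,
                ease_cm: float = 6.0) -> Optional[str]:
    """Return the smallest size whose chest measurement leaves at least `ease_cm` of room."""
    fitting = [(chest, label) for label, chest in chart.items()
               if chest >= user_chest_cm + ease_cm]
    # min() over (chest, label) tuples picks the smallest qualifying garment.
    return min(fitting)[1] if fitting else None
```

Returning `None` when nothing fits lets the caller fall back to the largest available model or notify the user.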
[0074] At 608, process 600 includes acquiring a second 3D model representing the user. In some embodiments, user models may be stored in association with one or more accounts. In these embodiments, the second 3D model can be identified and retrieved because it is stored in association with the account being used to watch the media content. In some embodiments, the second 3D model representing the user may be received from the user device that is being used to watch the media content.
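The two retrieval paths at 608 (stored with the viewing account, or received from the viewing device) can be combined behind a single lookup. In this sketch, `UserModelStore` is a hypothetical in-memory stand-in for user model data 238, and a mesh is reduced to a vertex list. Caching a device-supplied model under the account is an added design choice, not something this disclosure states.

```python
from typing import Dict, List, Optional, Tuple

Mesh = List[Tuple[float, float, float]]  # vertex list only, for illustration

class UserModelStore:
    """Hypothetical in-memory stand-in for user model data 238, keyed by account ID."""

    def __init__(self) -> None:
        self._by_account: Dict[str, Mesh] = {}

    def save(self, account_id: str, model: Mesh) -> None:
        self._by_account[account_id] = model

    def get_user_model(self, account_id: str,
                       device_model: Optional[Mesh] = None) -> Optional[Mesh]:
        # Prefer the model stored with the viewing account.
        model = self._by_account.get(account_id)
        if model is None and device_model is not None:
            # Fall back to a model uploaded by the viewing device and cache it
            # for later sessions (an assumed design choice).
            self.save(account_id, device_model)
            model = device_model
        return model
```
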
[0075] At 610, process 600 includes determining a presentation pose based on the media content. The presentation pose is determined as the presenter's current pose within the media content. This can be done using any suitable pose estimation technique. The determined presentation pose includes indications of various parts of the user model (e.g., body parts) and their respective positions and orientations.
[0076] At 612, process 600 includes applying the presentation pose to the second 3D model. To do this, the positions and orientations of the various parts (e.g., body parts) of the second 3D model can be adjusted to match the corresponding positions and orientations in the presentation pose data.
[0077] At 614, process 600 includes generating a third 3D model by having the second 3D model try on the first 3D model, that is, by fitting the product model onto the user model. Here, the first 3D model is deformed to minimize the distance between the first 3D model and the second 3D model.
[0078] At 616, process 600 includes presenting the third 3D model to the user. This involves rendering the third 3D model and providing it to the user device that is presenting media content. The third 3D model is presented along with the media content. For example, the media content may be enhanced to include the third 3D model (e.g., in a separate window within the media content).
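The blocks of process 600 can be tied together as shown below. This is a structural sketch only: each block is supplied as a callable, so the control flow is visible and runnable with stand-in implementations. The `TryOnSteps` container and all hook names are hypothetical. Block 602 is represented simply by the function being invoked for a given account and piece of media.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class TryOnSteps:
    """Hypothetical hooks for the blocks of process 600; real implementations are out of scope."""
    identify_product: Callable[[Any], str]          # 604
    get_product_model: Callable[[str], Any]         # 606
    get_user_model: Callable[[str], Any]            # 608
    estimate_pose: Callable[[Any], Any]             # 610
    apply_pose: Callable[[Any, Any], Any]           # 612
    fit: Callable[[Any, Any], Any]                  # 614: (product model, posed user model)
    render_with_content: Callable[[Any, Any], Any]  # 616: (third model, media)

def process_600(media: Any, account_id: str, steps: TryOnSteps) -> Any:
    # 602: this call is the indication that `account_id` is consuming `media`.
    product_id = steps.identify_product(media)
    first_model = steps.get_product_model(product_id)
    second_model = steps.get_user_model(account_id)
    pose = steps.estimate_pose(media)
    posed_user = steps.apply_pose(second_model, pose)
    third_model = steps.fit(first_model, posed_user)
    return steps.render_with_content(third_model, media)
```

Injecting the steps keeps the flow independent of where each block runs. Per [0047], the pose and try-on work may be done on the server or on the user device.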
[0079] It should be understood that the specific steps illustrated in Figure 6 provide a particular method for presenting virtual try-on data to a user, according to embodiments of the present invention. Other sequences of steps may also be performed according to alternative embodiments; for example, alternative embodiments of the present invention may perform the steps outlined above in a different order. Moreover, each step shown in Figure 6 may include multiple sub-steps, which may be performed in various sequences as appropriate to the individual step. Furthermore, steps may be added or removed depending on the particular application. Many variations, modifications, and alternatives will be apparent to those skilled in the art.
[0080] Figure 7 shows example components of a computer system, according to some embodiments. Computer system 700 is an example of the computer systems described in this disclosure. Although these components are shown as belonging to the same computer system 700, computer system 700 may also be distributed.
[0081] The computing system 700 includes at least a processor 702, a memory 704, a storage device 706, input/output (I/O) peripherals 708, communication peripherals 710, and an interface bus 712. The interface bus 712 can be used to communicate, send, and transmit data, control signals, and commands among the various components of the computing system 700. The memory 704 and storage device 706 may include computer-readable storage media. Examples include RAM, ROM, electrically erasable programmable read-only memory (EEPROM), hard disk drives, CD-ROMs, optical storage devices, magnetic storage devices, electronic non-volatile computer storage (e.g., flash memory), and other tangible storage media. Any such computer-readable storage medium can be used to store instructions or program code implementing aspects of this disclosure. The memory 704 and storage device 706 may also include computer-readable signal media. A computer-readable signal medium includes a propagated data signal containing computer-readable program code. Such a propagated signal can take any of a variety of forms, including but not limited to electromagnetic, optical, or any combination thereof. A computer-readable signal medium includes any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate, or transport a program for use in connection with the computing system 700.
[0082] Furthermore, the memory 704 may include an operating system, programs, and applications. The processor 702 can be configured to execute the stored instructions. It includes, for example, a logic processing unit, a microprocessor, a digital signal processor, and other processors. The memory 704 and/or processor 702 can be virtualized and can be hosted in another computing system, such as a cloud network or data center. The I/O peripherals 708 may include user interfaces such as a keyboard, a screen (e.g., a touchscreen), a microphone, a speaker, and other input/output devices. They may also include computing components such as a graphics processing unit, serial ports, parallel ports, a universal serial bus, and other input/output peripherals. The I/O peripherals 708 are connected to the processor 702 through any of the ports coupled to the interface bus 712. The communication peripherals 710 can be configured to facilitate communication between the computing system 700 and other computing devices over a communication network. They include, for example, a network interface controller, a modem, wireless and wired interface cards, antennas, and other communication peripherals.
[0083] Figure 8 shows a block diagram of an apparatus for presenting virtual try-on data to a user, according to at least some embodiments. The device 800 shown in Figure 8 can be implemented in a mobile application server (e.g., mobile application server 204 of Figure 2) in communication with a user device (e.g., user device 202 of Figure 2).
[0084] The device 800 includes a receiving module 802 configured to receive an indication that a user is watching media content. For example, it may receive an indication that the user is watching a streaming video, streaming video being one type of media content. The indicated media content may include a depiction of a presenter, where the presenter is a person other than the user. The indicated media content may also include a depiction of a product presented by the presenter. For example, the product might be clothing worn by the presenter in the media content.
[0085] The apparatus 800 also includes an identification module 804 configured to identify a product associated with the media content. In some embodiments, the product is identified by an identifier of the product contained in the metadata of the media content. In one example, this identifier is a stock keeping unit (SKU) number. In some embodiments, the product associated with the media content is identified through object recognition.
[0086] The device 800 also includes an acquisition module 806 configured to acquire a first three-dimensional (3D) model representing the product. To do this, a 3D model associated with the product identified by identification module 804 is retrieved from a database in which object model data is stored (e.g., object model data 240 of Figure 2). In some embodiments, an appropriate size and/or style of the product may be selected based on stored information about the user who is watching the media content.
[0087] The acquisition module 806 is also configured to acquire a second 3D model representing the user. In some embodiments, user models may be stored in association with one or more accounts. In these embodiments, the second 3D model can be identified and retrieved because it is stored in association with the account being used to watch the media content. In some embodiments, the second 3D model representing the user may be received from the user device that is being used to watch the media content.
[0088] The device 800 also includes a determination module 808 configured to determine a presentation pose based on the media content. The presentation pose is determined as the presenter's current pose within the media content. Any suitable pose estimation technique can be used to accomplish this. The determined presentation pose may include indications of various parts of the user model (e.g., body parts) and their respective positions and orientations.
[0089] The device 800 also includes an application module 810 configured to apply the presentation pose to the second 3D model. To do this, the positions and orientations of the various parts (e.g., body parts) of the second 3D model can be adjusted to match the corresponding positions and orientations in the presentation pose data.
[0090] The apparatus 800 also includes a generation module 812 configured to generate a third 3D model by having the second 3D model try on the first 3D model, that is, by fitting the product model onto the user model. Here, the first 3D model is deformed to minimize the distance between the first 3D model and the second 3D model.
[0091] The device 800 also includes a presentation module 814 configured to present the third 3D model to the user. This involves rendering the third 3D model and providing it to the user device that is presenting media content. The third 3D model is presented along with the media content. For example, the media content may be enhanced to include the third 3D model (e.g., in a separate window within the media content).
[0092] Although this subject matter has been described in detail with reference to specific embodiments thereof, it should be understood that those skilled in the art, upon gaining an understanding of the foregoing, can readily generate changes, variations, and equivalents to these embodiments. Therefore, it should be understood that this disclosure is presented for illustrative purposes rather than for limitation, and does not exclude the inclusion of such modifications, variations, and / or additions to the subject matter that are obvious to those of ordinary skill. In fact, the methods and systems described in this disclosure can be implemented in a variety of other forms; furthermore, various omissions, substitutions, and changes can be made to the form of the methods and systems described in this disclosure without departing from the spirit of this disclosure. The appended claims and their equivalents are intended to cover such forms or modifications that fall within the scope and spirit of this disclosure.
[0093] Unless otherwise expressly stated, it should be understood that throughout the discussion in this specification, terms such as “processing,” “computing,” “calculating,” “determining,” and “identifying” refer to the actions or processes of a computing device (such as one or more computers or similar electronic computing devices) that manipulate or convert data represented as physical electronic or magnetic quantities in the memory, registers, or other information storage, transmission, or display devices of a computing platform.
[0094] The one or more systems discussed herein are not limited to any particular hardware architecture or configuration. A computing device may include any suitable arrangement of components that provides a result conditioned on one or more inputs. Suitable computing devices include microprocessor-based multipurpose computer systems that access stored software that programs or configures the computing system from a general-purpose computing device to a dedicated computing device that implements one or more embodiments of this subject matter. Any suitable programming, scripting, or other type of language or combination of languages may be used to implement the teachings contained herein in software used for programming or configuring the computing device.
[0095] Embodiments of the methods disclosed herein can be performed in the operation of such computing devices. The order of the blocks presented in the examples above can be varied; for example, blocks can be reordered, combined, and/or broken into sub-blocks. Certain blocks or processes can be performed in parallel.
[0096] Conditional language used herein, such as “can,” “could,” “might,” “may,” and “for example,” is generally intended to convey that certain examples include certain features, elements, and/or steps while other examples do not, unless specifically stated otherwise or otherwise understood within the context as used. Thus, such conditional language is not generally intended to imply that features, elements, and/or steps are in any way required for one or more examples. Nor does it imply that one or more examples necessarily include logic for deciding whether these features, elements, and/or steps are included or are to be performed in any particular example, with or without author input or prompting.
[0097] The terms “including,” “comprising,” “having,” and the like are synonymous. They are used inclusively, in an open-ended fashion, and do not exclude additional elements, features, acts, operations, and so forth. Also, the term “or” is used in its inclusive (and not its exclusive) sense. Thus, when used, for example, to connect a list of elements, the term “or” means one, some, or all of the elements in the list. The use of “adapted to” or “configured to” herein is meant as open and inclusive language that does not foreclose devices adapted to or configured to perform additional tasks or steps. Additionally, the use of “based on” is meant to be open and inclusive: a process, step, calculation, or other action “based on” one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. Similarly, the use of “based at least in part on” is meant to be open and inclusive: a process, step, calculation, or other action “based at least in part on” one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. The headings, lists, and numbering included herein are for ease of explanation only and are not meant to be limiting.
[0098] The various features and processes described above can be used independently of each other or in combination in various ways. All possible combinations and sub-combinations are intended to fall within the scope of this disclosure. Furthermore, certain method or process blocks may be omitted in some embodiments. The methods and processes described herein are not limited to any particular order, and the blocks or states associated with them may be performed in other suitable orders. For example, the described blocks or states may be performed in an order different from that specifically disclosed, or multiple blocks or states may be combined in a single block or state. Example blocks or states may be performed serially, in parallel, or in some other manner. Blocks or states may be added to or removed from the disclosed examples. Similarly, the example systems and components described herein may be configured differently from those described. For example, elements may be added, removed, or rearranged compared to the disclosed examples.
Claims
1. A method for virtual try-on, comprising: acquiring a product associated with media content that a user is watching; obtaining a first three-dimensional (3D) model representing the product; obtaining a second 3D model representing the user; and generating a third 3D model by having the second 3D model try on the first 3D model; wherein the method further comprises: determining a presentation pose based on the media content; applying the presentation pose to the second 3D model; and presenting the third 3D model to the user in the presentation pose; and wherein applying the presentation pose to the second 3D model comprises: retrieving the second 3D model from user model data, and repositioning the body parts of the second 3D model to match the positions and orientations of the presenter's body parts, thereby applying the presentation pose to the second 3D model.
2. The method according to claim 1, wherein acquiring the product associated with the media content that the user is watching comprises: receiving an indication of the media content the user is currently watching; and identifying the product associated with the media content.
3. The method according to claim 1, wherein the product associated with the media content is identified by an identifier associated with the product, the identifier being contained in metadata of the media content.
4. The method according to claim 1, wherein the product is clothing worn by the presenter in the media content, and the presentation pose comprises a pose of the presenter.
5. The method according to claim 4, wherein the presenter comprises a second user different from the user.
6. The method according to claim 1, wherein generating the third 3D model by having the second 3D model try on the first 3D model comprises: fitting the first 3D model onto the second 3D model by adjusting parameters that control deformation of one or more regions of interest of the first 3D model until the first 3D model matches the second 3D model, thereby generating the third 3D model.
7. The method according to claim 1, wherein the method further comprises: capturing image information of the user; extracting facial data of the user from the image information; and overlaying the facial data onto the second 3D model of the user.
8. A system for virtual try-on, comprising: a processor; and a memory including instructions that, when executed by the processor, cause the system to at least: acquire a product associated with media content that a user is watching; obtain a first three-dimensional (3D) model representing the product; obtain a second 3D model representing the user; generate a third 3D model by having the second 3D model try on the first 3D model; determine a presentation pose based on the media content; retrieve the second 3D model from user model data and reposition the body parts of the second 3D model to match the positions and orientations of the corresponding body parts of a presenter; apply the presentation pose to the second 3D model; and present the third 3D model to the user in the presentation pose.
9. A non-transitory computer-readable medium storing computer-executable instructions that, when executed by a processor, cause a computer system to at least: acquire a product associated with media content that a user is watching; obtain a first three-dimensional (3D) model representing the product; obtain a second 3D model representing the user; generate a third 3D model by having the second 3D model try on the first 3D model; determine a presentation pose based on the media content; retrieve the second 3D model from user model data and reposition the body parts of the second 3D model to match the positions and orientations of the corresponding body parts of a presenter; apply the presentation pose to the second 3D model; and present the third 3D model to the user in the presentation pose.
10. An apparatus for virtual try-on, comprising: an acquisition module configured to acquire a product associated with media content that a user is watching, a first three-dimensional (3D) model representing the product, and a second 3D model representing the user; a generation module configured to generate a third 3D model by having the second 3D model try on the first 3D model; a determination module configured to determine a presentation pose based on the media content; an application module configured to retrieve the second 3D model from user model data, reposition the body parts of the second 3D model to match the positions and orientations of the corresponding body parts of a presenter, and apply the presentation pose to the second 3D model; and a presentation module configured to present the third 3D model to the user in the presentation pose.
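The claimed steps (identify the product, fit the product model to the user model, then pose the user model like the presenter) can be illustrated with a minimal Python sketch. It is a hypothetical illustration, not the claimed implementation: it assumes the product identifier sits under a `product_id` metadata key (claim 3), that each body part is a position plus a unit-quaternion orientation, and that the deformation parameters of claim 6 reduce to one scale factor per region of interest driven by circumference measurements. All names (`identify_product`, `fit_garment`, `apply_presentation_pose`, `virtual_try_on`) and all sample values are assumptions; the claims do not prescribe any data format or fitting algorithm.

```python
from dataclasses import dataclass, field

# Hypothetical data types: the claims do not specify any model representation.
@dataclass(frozen=True)
class BodyPart:
    position: tuple[float, float, float]            # world-space position
    orientation: tuple[float, float, float, float]  # unit quaternion (w, x, y, z)

@dataclass
class UserModel:
    parts: dict[str, BodyPart] = field(default_factory=dict)
    measurements: dict[str, float] = field(default_factory=dict)  # circumference per region, cm

@dataclass
class TryOnResult:
    product_id: str
    region_scales: dict[str, float]  # one deformation parameter per region of interest
    posed_user: UserModel

def identify_product(metadata: dict[str, str]) -> str:
    """Claim 3 sketch: read a product identifier from media metadata (key name assumed)."""
    return metadata["product_id"]

def fit_garment(garment: dict[str, float], body: dict[str, float],
                ease: float = 0.02, tol: float = 1e-4, max_iter: int = 100) -> dict[str, float]:
    """Claim 6 sketch: adjust a scale parameter per region of interest until the garment
    circumference matches the body circumference plus a small ease allowance."""
    scales = {roi: 1.0 for roi in garment}
    for _ in range(max_iter):
        converged = True
        for roi, base in garment.items():
            error = body[roi] * (1.0 + ease) - base * scales[roi]
            if abs(error) > tol:
                converged = False
                scales[roi] += 0.5 * error / base  # damped update: halves the error each pass
        if converged:
            break
    return scales

def apply_presentation_pose(user: UserModel, presenter: dict[str, BodyPart]) -> UserModel:
    """Claim 1 sketch: move each user body part to the position and orientation of the
    presenter's matching body part; parts with no match keep their current pose."""
    posed = {name: presenter.get(name, part) for name, part in user.parts.items()}
    return UserModel(parts=posed, measurements=dict(user.measurements))

def virtual_try_on(metadata: dict[str, str], catalog: dict[str, dict[str, float]],
                   user: UserModel, presenter_pose: dict[str, BodyPart]) -> TryOnResult:
    product_id = identify_product(metadata)
    scales = fit_garment(catalog[product_id], user.measurements)
    return TryOnResult(product_id, scales, apply_presentation_pose(user, presenter_pose))

# Usage with made-up sample values.
catalog = {"sku-123": {"waist": 80.0, "chest": 96.0}}
user = UserModel(parts={"left_arm": BodyPart((0.0, 1.4, 0.0), (1.0, 0.0, 0.0, 0.0))},
                 measurements={"waist": 85.0, "chest": 98.0})
presenter = {"left_arm": BodyPart((0.2, 1.5, 0.1), (0.7071, 0.7071, 0.0, 0.0))}
result = virtual_try_on({"product_id": "sku-123"}, catalog, user, presenter)
```

Copying absolute positions, as claim 1 literally describes, ignores differences in limb length between the user and the presenter. A common alternative in skeletal animation is to transfer only joint rotations and recompute positions from the user's own skeleton, which preserves the user's proportions.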