Interactive data processing methods, apparatus, devices, systems and storage media

A head-mounted display device collects the user's facial data and uses it to drive a digital human displayed on a terminal device, solving the problem of the fixed digital human image on remote surrogate robots and achieving a more vivid remote interactive experience.

CN122308591A · Pending · Publication Date: 2026-06-30 · GRAVITYXR ELECTRONICS & TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: GRAVITYXR ELECTRONICS & TECH CO LTD
Filing Date: 2024-12-28
Publication Date: 2026-06-30

AI Technical Summary

Technical Problem

Existing remote avatar robots display a fixed digital human image with rigid, stereotyped expressions, resulting in a poor interactive experience.

Method used

The head-mounted display device collects the user's facial data and uses it to drive the digital human on the terminal device, so that the digital human's expressions and movements stay synchronized with the user's face. Using the multimedia data returned by the terminal device, the head-mounted display device shows stereoscopic images on its screen and can also provide simulated tactile feedback.

Benefits of technology

It improves the flexibility and vividness of the digital human displayed on the terminal device, and enhances the experience of remote interaction.

✦ Generated by Eureka AI based on patent content.

Figure CN122308591A_ABST

Abstract

This application provides an interactive data processing method, apparatus, device, system, and storage medium. The interactive data processing method is applied to a head-mounted display device that interacts with at least one terminal device. The method includes: sending collected facial data of a user, so that the at least one terminal device adjusts the displayed digital persona corresponding to the user based on the facial data; receiving multimedia data, wherein the multimedia data includes image data collected by the image acquisition device of the at least one terminal device; and displaying a corresponding image on the display screen of the head-mounted display device according to the image data. The method lets the user view, through the head-mounted display device, the images collected by the terminal device, and dynamically drives the digital persona displayed on the terminal device with facial data collected by the head-mounted display device, improving the dynamic effect of that digital persona.

Description

Technical Field

[0001] This application relates to the field of multi-device interaction technology, and in particular to an interactive data processing method, apparatus, device, system and storage medium.

Background Technology

[0002] Remote devices, such as remote surrogate robots, can operate in spaces that are inaccessible or inconvenient for humans, such as high-risk environments and confined spaces, overcoming geographical limitations.

[0003] For some remote surrogate robots that need to interact with users in their environment, digital humans can be used to enhance the interactive experience and intelligence. However, existing remote surrogate robots display fixed digital human images with stereotypical expressions, resulting in a poor interactive experience.

Summary of the Invention

[0004] This application provides an interactive data processing method, apparatus, device, system, and storage medium, which realizes the use of user data collected by a head-mounted display device to drive a digital human on a terminal device, such as a remote device, thereby improving the flexibility and richness of the digital human displayed on the terminal device and enhancing the interactive experience.

[0005] In a first aspect, embodiments of this application provide an interactive data processing method applied to a head-mounted display device, the head-mounted display device being used to interact with at least one terminal device, the method comprising: sending collected facial data of a user, so that the at least one terminal device adjusts the displayed digital person corresponding to the user based on the facial data; receiving multimedia data; wherein the multimedia data includes image data collected by the image acquisition device of the at least one terminal device; and displaying a corresponding image on the display screen of the head-mounted display device according to the image data.

[0006] In one possible implementation, the display screen of the head-mounted display device includes a first display screen corresponding to the user's left eye and a second display screen corresponding to the user's right eye. Displaying the corresponding image on the display screen of the head-mounted display device according to the image data includes: generating a left-eye image and a right-eye image based on the image data; and synchronously displaying the left-eye image and the right-eye image through the first display screen and the second display screen, respectively.

[0007] In one possible implementation, generating a left-eye image and a right-eye image based on image data includes: correcting the viewing angle of the image data based on a viewing angle correction matrix to obtain a processed image under the viewing angle corresponding to the head-mounted display device; and generating the left-eye image and the right-eye image based on the processed image.

[0008] In one possible implementation, the method further includes: acquiring height information and a pitch angle of the image acquisition device of the terminal device, the height information describing the height of the image acquisition device of the terminal device relative to the eyes of the displayed digital human; determining a translation matrix based on the height information; determining a perspective transformation matrix based on the pitch angle; and determining the viewing angle correction matrix based on the translation matrix and the perspective transformation matrix.
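The paragraph does not give formulas for either matrix. As a minimal sketch only, assuming a pinhole camera with focal length f (in pixels) and principal point (cx, cy), one option is to model the perspective transformation matrix as the pure-rotation homography K * R_x(pitch) * K^-1 and the translation matrix as a vertical pixel shift of -f*h/Z for a camera h metres above the digital human's eyes viewing a scene at an assumed depth Z. The function names and the fixed-depth simplification are hypothetical, not taken from the application.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def translation_matrix(height_m: float, depth_m: float, f: float) -> Matrix:
    """Vertical pixel shift that moves the camera viewpoint down to eye level.
    A camera h metres above the eyes sees a point at depth Z about f*h/Z pixels
    lower in the image, so the correction subtracts that amount (fixed-depth model)."""
    dv = -f * height_m / depth_m
    return [[1.0, 0.0, 0.0], [0.0, 1.0, dv], [0.0, 0.0, 1.0]]


def perspective_matrix(pitch_rad: float, f: float, cx: float, cy: float) -> Matrix:
    """Pure-rotation homography K * R_x(pitch) * K^-1 about the camera x-axis.
    Flip the sign of pitch_rad if the device reports pitch with the opposite convention."""
    c, s = math.cos(pitch_rad), math.sin(pitch_rad)
    k = [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]
    k_inv = [[1.0 / f, 0.0, -cx / f], [0.0, 1.0 / f, -cy / f], [0.0, 0.0, 1.0]]
    r_x = [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]
    return matmul(matmul(k, r_x), k_inv)


def viewing_angle_correction(height_m: float, pitch_rad: float, depth_m: float,
                             f: float, cx: float, cy: float) -> Matrix:
    """Compose the two parts: M = T * P (rotate first, then shift)."""
    return matmul(translation_matrix(height_m, depth_m, f),
                  perspective_matrix(pitch_rad, f, cx, cy))


def apply(m: Matrix, u: float, v: float) -> tuple[float, float]:
    """Map pixel (u, v) through homography m, including the perspective division."""
    x = m[0][0] * u + m[0][1] * v + m[0][2]
    y = m[1][0] * u + m[1][1] * v + m[1][2]
    w = m[2][0] * u + m[2][1] * v + m[2][2]
    return x / w, y / w
```

Because M is a 3x3 homography, each pixel of the terminal image would be mapped through it with a perspective division, as `apply` does. A single fixed depth Z is a strong simplification for scenes with large depth variation.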

[0009] In one possible implementation, the method further includes: after a calibration sample that includes multiple corner points is placed at a preset distance directly in front of the image acquisition device of the terminal device, acquiring a first test image of the calibration sample with the image acquisition device of the terminal device; after the calibration sample is placed at the preset distance directly in front of the display screen of the head-mounted display device, acquiring a second test image of the calibration sample with the image acquisition device of the head-mounted display device; and determining the viewing angle correction matrix based on the positions of the corner points in the first test image and the second test image.
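How the matrix is derived from the corner positions is not specified. One common choice, assumed here, is the direct linear transform: with the bottom-right entry fixed to 1, four corner correspondences give eight linear equations in the remaining eight entries. The sketch below solves that system in plain Python; with more corners, a least-squares fit (for example OpenCV's `cv2.findHomography`) would typically be used instead. All names are illustrative.

```python
Matrix = list[list[float]]


def solve(a: Matrix, b: list[float]) -> list[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner layout (e.g. three collinear corners)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography_from_corners(src, dst) -> Matrix:
    """Estimate H (with H[2][2] = 1) mapping four corners in the first test image (src)
    to the same corners in the second test image (dst)."""
    if len(src) != 4 or len(dst) != 4:
        raise ValueError("this sketch uses exactly four corner correspondences")
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), rearranged to be linear in h
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def project(h: Matrix, x: float, y: float) -> tuple[float, float]:
    """Map a point through homography h."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```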

[0010] In one possible implementation, generating a left-eye image and a right-eye image based on the processed image includes: obtaining the user's interpupillary distance and the focal length of the head-mounted display device; determining the offset distance of each pixel in the processed image based on the depth information of the pixel, the user's interpupillary distance, and the focal length of the head-mounted display device; and shifting each pixel in the processed image to the left and right based on the offset distance of each pixel in the processed image to obtain the left-eye image and the right-eye image.
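A minimal sketch of this step, assuming the standard stereo relation offset = f * IPD / depth (focal length f in pixels, depth and interpupillary distance in metres) and splitting the offset evenly between the two eyes. The z-buffer that lets nearer pixels win, and leaving unfilled pixels as `None` for later hole filling, are assumptions of this sketch rather than details from the application.

```python
import math


def pixel_offset(depth_m: float, ipd_m: float, focal_px: float) -> float:
    """Horizontal offset (disparity) in pixels for a point at the given depth: f * IPD / Z."""
    return focal_px * ipd_m / depth_m


def render_stereo(image, depth, ipd_m: float, focal_px: float):
    """Build left- and right-eye images from one image plus per-pixel depth.
    The left eye sees content shifted right and the right eye shifted left, each by
    half the offset. Returns (left, right); None marks holes left for later filling."""
    h, w = len(image), len(image[0])
    left = [[None] * w for _ in range(h)]
    right = [[None] * w for _ in range(h)]
    left_z = [[math.inf] * w for _ in range(h)]
    right_z = [[math.inf] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            z = depth[y][x]
            half = pixel_offset(z, ipd_m, focal_px) / 2.0
            for target, zbuf, xs in ((left, left_z, x + half), (right, right_z, x - half)):
                xi = round(xs)
                # When two source pixels land on the same target, keep the nearer one.
                if 0 <= xi < w and z < zbuf[y][xi]:
                    target[y][xi] = image[y][x]
                    zbuf[y][xi] = z
    return left, right
```

For example, with f = 500 px and IPD = 0.064 m, a point 16 m away gets a 2-pixel offset, so each eye's copy moves by 1 pixel in opposite directions.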

[0011] In one possible implementation, the multimedia data further includes spatial audio data, which is obtained by the server processing audio data collected by the terminal device based on a spatial model of the space where the terminal device is located; the method further includes: synchronously playing the spatial audio data while displaying the corresponding image.
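The application does not describe the spatial model. Purely as an illustration, the sketch below swaps in a much simpler technique: inverse-distance attenuation plus constant-power stereo panning based on the sound source's direction relative to the listener. A production system would more likely use a room model and head-related transfer functions; all names here are hypothetical.

```python
import math


def spatialize(samples, source_xy, listener_xy, listener_yaw_rad=0.0, ref_dist_m=1.0):
    """Turn a mono signal into (left, right) channels.
    Coordinates are in metres; yaw 0 means facing +y, positive yaw turns toward +x.
    Front and back are not distinguished by this simple panner."""
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    dist = max(math.hypot(dx, dy), ref_dist_m)
    gain = ref_dist_m / dist  # inverse-distance law, clamped inside ref_dist_m
    azimuth = math.atan2(dx, dy) - listener_yaw_rad  # 0 = straight ahead, + = to the right
    pan = math.sin(azimuth)  # -1 hard left .. +1 hard right
    theta = (pan + 1.0) * math.pi / 4.0  # map pan to 0 .. pi/2
    g_left, g_right = gain * math.cos(theta), gain * math.sin(theta)  # constant power
    return [s * g_left for s in samples], [s * g_right for s in samples]
```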

[0012] In one possible implementation, the method further includes at least one of the following steps: sending collected user limb movement data to enable the at least one terminal device to control the movement of a robotic arm based on the limb movement data; the multimedia data further includes odor data collected by the device, and controlling an odor synthesizer to synthesize an odor based on the odor data; the multimedia data further includes tactile data collected by the terminal device, and controlling at least one of vibration parameters, pressure, and temperature of a tactile glove attached to the head-mounted display device based on the tactile data.
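A minimal sketch of how a head-mounted display device might route the optional odor and tactile parts of received multimedia data to local actuators. The payload fields, the value ranges in `LIMITS`, and the `set_glove` / `synthesize_odor` callbacks are hypothetical; clamping to a safe range is a design choice of the sketch, not something the application specifies.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TactileData:
    vibration_hz: float
    pressure_kpa: float
    temperature_c: float


@dataclass
class MultimediaData:
    image: Optional[bytes] = None
    odor: Optional[dict[str, float]] = None  # hypothetical: odorant name -> intensity 0..1
    tactile: Optional[TactileData] = None


# Hypothetical actuator limits for the tactile glove.
LIMITS = {"vibration_hz": (0.0, 300.0), "pressure_kpa": (0.0, 20.0), "temperature_c": (15.0, 40.0)}


def clamp(value: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, value))


def dispatch(data: MultimediaData,
             set_glove: Callable[[dict], None],
             synthesize_odor: Callable[[dict], None]) -> None:
    """Forward the optional odor and tactile parts of a payload to local actuators."""
    if data.odor:
        synthesize_odor({name: clamp(v, 0.0, 1.0) for name, v in data.odor.items()})
    if data.tactile is not None:
        set_glove({name: clamp(getattr(data.tactile, name), lo, hi)
                   for name, (lo, hi) in LIMITS.items()})
```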

[0013] In a second aspect, embodiments of this application provide another interactive data processing method applied to a terminal device, the terminal device being used to interact with at least one head-mounted display device, the method comprising: sending collected multimedia data to cause the at least one head-mounted display device to display a corresponding image based on the image data in the multimedia data; receiving user data of a target user collected by the at least one head-mounted display device, the user data including facial data; and adjusting the digital person corresponding to the target user displayed on the display of the terminal device based on the facial data.

[0014] In one possible implementation, the display of the terminal device is a three-dimensional display. Adjusting the digital person corresponding to the target user displayed on the display of the terminal device based on the facial data includes: generating facial image data of the digital person corresponding to the target user based on the facial data; determining a multi-view rendering table corresponding to the digital person corresponding to the target user based on the association features of the facial data; and mapping the facial image data to the three-dimensional display based on the multi-view rendering table to adjust the displayed digital person corresponding to the target user.

[0015] In one possible implementation, mapping the facial image data to the 3D display based on the multi-view rendering table includes: determining, based on the multi-view rendering table, the pixel in the facial image data mapped to each point in the 3D display; determining the pixel value of each point in the 3D display based on the pixel values of multiple pixels within a preset range of the mapped pixel; and driving the 3D display based on the pixel values of each point in the 3D display to display the adjusted digital human.
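The format of the multi-view rendering table is not specified. Assuming it maps each point of the 3D display to a pixel coordinate in the facial image, and taking the "preset range" to be a square window of a given radius whose pixel values are averaged, the mapping step could look like this (grayscale values, hypothetical names):

```python
def render_to_display(face, table, radius=1):
    """face: 2-D list of grayscale values.
    table: {display_point: (x, y)} mapping each 3D-display point to a source pixel.
    Returns {display_point: value}, averaging the window around the mapped pixel
    (clipped at the image border)."""
    h, w = len(face), len(face[0])
    out = {}
    for point, (x, y) in table.items():
        vals = [face[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))]
        out[point] = sum(vals) / len(vals)
    return out
```

Averaging a small neighbourhood rather than copying a single pixel smooths the interleaved views that a multi-view display shows side by side; `radius=0` reduces to a plain lookup.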

[0016] In one possible implementation, generating facial image data of a digital human corresponding to the target user based on the facial data includes: determining a gaze direction based on eye data in the facial data; determining a target eye image from multiple stored eye images of the digital human corresponding to the target user based on the gaze direction; generating a lip shape image of the digital human corresponding to the target user based on mouth data in the facial data and a pre-trained lip shape driving model; and obtaining facial image data of the digital human corresponding to the target user based on the target eye image and the lip shape image.
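One way to realize the eye-image selection, assuming the gaze direction is available as yaw and pitch angles and that each stored eye image is tagged with the gaze direction it depicts: pick the stored image whose direction makes the smallest angle with the measured gaze. The lip-shape driving model is out of scope here; names and axis conventions are hypothetical.

```python
import math


def gaze_vector(yaw_rad: float, pitch_rad: float) -> tuple[float, float, float]:
    """Unit gaze vector: z forward, positive yaw toward +x (right), positive pitch toward +y (up)."""
    return (math.sin(yaw_rad) * math.cos(pitch_rad),
            math.sin(pitch_rad),
            math.cos(yaw_rad) * math.cos(pitch_rad))


def select_eye_image(gaze, library):
    """library: list of (direction, image_id) pairs. Return the id whose stored direction
    makes the smallest angle with the (unit) gaze vector, i.e. the largest cosine."""
    def cosine(d):
        return sum(a * b for a, b in zip(gaze, d)) / math.sqrt(sum(c * c for c in d))
    return max(library, key=lambda entry: cosine(entry[0]))[1]
```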

[0017] In one possible implementation, generating facial image data of a digital human corresponding to the target user based on the facial data includes: determining a target lip shape image and a target facial image from multiple lip shape images and multiple facial images of the digital human corresponding to the target user, respectively, based on the facial data; wherein the lip shape corresponding part in the facial image is a default lip shape or default color; and fusing the target lip shape image and the target facial image to obtain facial image data of the digital human corresponding to the target user.
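The fusion method is not specified. Since the lip region of the target facial image holds a default lip shape or color, a simple assumed approach is per-pixel alpha blending with a mask that is 1 over the lip region and fades to 0 at its border. The rectangular, feathered mask below is a hypothetical stand-in for a real lip-region mask.

```python
def fuse(face, lip, mask):
    """Per-pixel alpha blend: mask 1.0 takes the lip image, 0.0 keeps the face image."""
    return [[m * lp + (1.0 - m) * fc for fc, lp, m in zip(f_row, l_row, m_row)]
            for f_row, l_row, m_row in zip(face, lip, mask)]


def feathered_box_mask(h, w, top, bottom, left, right, feather=1):
    """Hypothetical lip-region mask: 1 inside rows top..bottom and columns left..right
    (inclusive), fading linearly to 0 over `feather` pixels outside the box."""
    mask = []
    for y in range(h):
        row = []
        for x in range(w):
            d = max(top - y, y - bottom, left - x, x - right, 0)  # distance outside the box
            row.append(max(0.0, 1.0 - d / (feather + 1)))
        mask.append(row)
    return mask
```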

[0018] In one possible implementation, the terminal device further includes a robotic arm, and the method further includes at least one of the following steps: the user data further includes limb motion data, and the robotic arm is controlled to move based on the limb motion data; the collected odor data and / or tactile data are sent to enable the at least one head-mounted display device to control an odor synthesizer to synthesize an odor based on the odor data, and at least one of vibration parameters, pressure, and temperature of a tactile glove attached to the head-mounted display device is controlled based on the tactile data.

[0019] In a third aspect, embodiments of this application provide an interactive data processing device applied to a head-mounted display device, the head-mounted display device being used to interact with at least one terminal device. The device includes: a facial data transmitting module, used to transmit collected facial data of a user, so that the at least one terminal device adjusts the displayed digital person corresponding to the user based on the facial data; a multimedia data receiving module, used to receive multimedia data, wherein the multimedia data includes image data collected by the image acquisition device of the at least one terminal device; and a processing module, used to display a corresponding image on the display screen of the head-mounted display device according to the image data.

[0020] In a fourth aspect, embodiments of this application provide an interactive data processing device applied to a terminal device, the terminal device being used to interact with at least one head-mounted display device. The device includes: a multimedia data transmitting module for transmitting collected multimedia data so that the at least one head-mounted display device displays a corresponding image based on the image data in the multimedia data; a user data receiving module for receiving user data of a target user collected by the at least one head-mounted display device, the user data including facial data; and a digital human processing module for adjusting, based on the facial data, the digital human corresponding to the target user displayed on the terminal device's screen.

[0021] In a fifth aspect, embodiments of this application provide an interactive data processing device, comprising a memory and a processor; the memory stores computer-executable instructions, and the processor executes the computer-executable instructions stored in the memory, causing the processor to perform the methods provided in the first and/or second aspects above.

[0022] In a sixth aspect, embodiments of this application provide a head-mounted display device, comprising a fixing component and a main body, the main body including a first display screen, a first data acquisition unit and a first processor; the fixing component is used to fix the main body to the user's head when the user wears the head-mounted display device; the first data acquisition unit is used to acquire the user's facial data; and the first processor is used to execute the method provided in the first aspect above.

[0023] In a seventh aspect, embodiments of this application provide a terminal device, including: a second display screen, a second data acquisition unit, and a second processor; the second data acquisition unit is used to acquire multimedia data from the external environment; the second processor is used to execute the method provided in the second aspect above.

[0024] In an eighth aspect, embodiments of this application provide an interactive system including at least one head-mounted display device provided in the sixth aspect and at least one terminal device provided in the seventh aspect.

[0025] In a ninth aspect, embodiments of this application provide a computer-readable storage medium storing computer-executable instructions, which, when executed by a processor, are used to implement the methods provided in the first and/or second aspects above.

[0026] In a tenth aspect, embodiments of this application provide a computer program product, including a computer program that, when executed by a processor, implements the methods provided in the first and / or second aspects above.

[0027] The interactive data processing method, apparatus, device, system, and storage medium provided in this application target a head-mounted display device and a terminal device that interact with each other. Based on the image data in the multimedia data collected by the terminal device, a corresponding image is displayed on the screen of the head-mounted display device, so that images collected by the terminal device are transmitted back to the head-mounted display device in real time. In remote interaction scenarios, this allows the external environment captured by the remote terminal device to be viewed through the head-mounted display device. On the terminal device side, a digital human matching the image of the head-mounted display device's user is displayed, improving the interactive experience of the person interacting with the terminal device. In addition, the collected facial data of the user, such as the wearer, is sent to the terminal device, so that the terminal device can dynamically adjust the displayed digital human based on the facial data, making the digital human more vivid and further enhancing that interactive experience.

Attached Figure Description

[0028] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application.

[0029] Figure 1 is a schematic diagram of an application scenario provided in an embodiment of this application;

[0030] Figure 2 is a flowchart of an interactive data processing method provided in an embodiment of this application;

[0031] Figure 3 is a schematic structural diagram of a head-mounted display device provided in an embodiment of this application;

[0032] Figure 4 is a schematic diagram of another head-mounted display device provided in an embodiment of this application;

[0033] Figure 5 is a schematic diagram of a robot provided in an embodiment of this application;

[0034] Figure 6 is a flowchart of step S203 in the embodiment shown in Figure 2;

[0035] Figure 7 is a hardware block diagram of a head-mounted display device provided in an embodiment of this application;

[0036] Figure 8 is a software framework diagram of a head-mounted display device provided in an embodiment of this application;

[0037] Figure 9 is a flowchart of another interactive data processing method provided in an embodiment of this application;

[0038] Figure 10 is a schematic diagram of the digital human displayed on a terminal device before and after adjustment, provided in an embodiment of this application;

[0039] Figure 11 is a flowchart of step S903 in the embodiment shown in Figure 9;

[0040] Figure 12 is a hardware block diagram of a terminal device provided in an embodiment of this application;

[0041] Figure 13 is a schematic structural diagram of an interactive data processing device provided in an embodiment of this application;

[0042] Figure 14 is a software block diagram of an interactive system provided in an embodiment of this application.

[0043] The accompanying drawings illustrate specific embodiments of this application, which will be described in more detail below. These drawings and descriptions are not intended to limit the scope of the concept in any way, but rather to illustrate the concept of this application to those skilled in the art through reference to particular embodiments.

Detailed Implementation

[0044] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with this application. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this application as detailed in the appended claims.

[0045] It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, stored data, displayed data, etc.) involved in one or more embodiments of this specification are all information and data authorized by the user or fully authorized by all parties. Furthermore, the collection, use and processing of related data must comply with relevant laws, regulations and standards, and corresponding operation entry points are provided for users to choose to authorize or refuse.

[0046] Head-mounted displays (HMDs) are widely used due to their convenience, flexibility, and immersive experience. In some scenarios, HMDs can interact with other terminal devices, such as mobile phones, computers, and remote robots.

[0047] The terminal device and the head-mounted display device can be connected via the Internet. The terminal device can act as a remote device, transmitting remotely collected data back to the head-mounted display device for display. Alternatively, the head-mounted display device can control the terminal device, such as controlling its movement, collecting data, or grasping objects.

[0048] For example, Figure 1 is a schematic diagram of an application scenario provided in an embodiment of this application. As shown in Figure 1, to overcome geographical limitations, a user in Area A who wears the head-mounted display device can view on-site data collected by a remote terminal device located in Area B and connected to the head-mounted display device. The user can also operate the remote terminal device through the head-mounted display device for various purposes, such as telemedicine, childcare, online education, multi-device collaborative work, and cloud video conferencing.

[0049] Taking telemedicine as an example, the remote terminal device can be a medical robot, which can act as a family doctor for the user. The remote doctor can view the user's data collected by the medical robot through a head-mounted display device, such as images of the body parts where the user feels unwell, the symptoms the user describes, and the user's body temperature. The remote doctor can also operate the medical robot's robotic arm through haptic gloves attached to the head-mounted display device to conduct medical examinations and even carry out some treatment plans.

[0050] Taking childcare scenarios as an example, the remote terminal device can be a companion robot. Parents can remotely check on their child's current situation through wearable devices and can also talk to the child through the companion robot, such as telling stories. When parents cannot find their child in the images captured by the companion robot, they can adjust the robot's position through a head-mounted display device.

[0051] In addition to the aforementioned remote interaction scenarios, the solutions provided in this application are also applicable to some short-range interaction scenarios, such as when a head-mounted display device is connected to other terminal devices via Bluetooth, WiFi, wired connections, etc. This application does not limit the connection method between the head-mounted display device and other terminal devices.

[0052] For terminal devices that interact with head-mounted displays, digital humans can be displayed to improve the user experience. However, for robots, the displayed digital human image is often static and lacks coherent expression; for example, facial expressions might be simple cartoon drawings, or only the mouth might open and close during speech, with the actual lip movements inconsistent with the actual speech.

[0053] Based on this, this application provides an interactive data processing method for scenarios involving interaction between a head-mounted display device and a terminal device. In addition to displaying image data collected by the terminal device through the head-mounted display device, it also enables the use of facial data collected by the head-mounted display device, such as dynamic facial data, to drive the display of a digital human on the terminal device. This allows the digital human's form, including expressions and movements, to change according to the user's facial data, improving the flexibility and dynamic effects of the digital human displayed on the terminal device and enhancing the interactive experience.

[0054] Figure 2 is a flowchart of an interactive data processing method provided in an embodiment of this application. The method of this embodiment is applied to a head-mounted display device that interacts with at least one terminal device. As shown in Figure 2, the interactive data processing method includes the following steps:

[0055] Step S201: Send the collected facial data of the user so that at least one terminal device adjusts the displayed digital person corresponding to the user based on the facial data.

[0056] Facial data is used to describe a user's facial information, which may include facial expressions, eye gaze direction, mouth shape, etc. Specifically, the user can be someone wearing a head-mounted device, also known as a wearer.

[0057] The head-mounted display device can periodically collect facial data from the user and send the collected facial data for each period. The collection period can be 1 second, 3 seconds, or other values.

[0058] Head-mounted display devices can also output updated facial data when they detect changes in the user's facial data.
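The two sending policies above can be combined, as in the following sketch: send when the collection period has elapsed, or earlier when any facial coefficient changes by more than a threshold. Representing facial data as a dict of named coefficients, and the `period_s` and `threshold` values, are assumptions for illustration.

```python
class FacialDataSender:
    """Decide when to send facial data: every `period_s` seconds, or sooner when any
    coefficient changes by more than `threshold` (hypothetical coefficient dict)."""

    def __init__(self, send, period_s=1.0, threshold=0.05):
        self.send = send
        self.period_s = period_s
        self.threshold = threshold
        self.last_sent = None
        self.last_time = None

    def _changed(self, data):
        if self.last_sent is None:
            return True
        keys = set(data) | set(self.last_sent)
        return any(abs(data.get(k, 0.0) - self.last_sent.get(k, 0.0)) > self.threshold
                   for k in keys)

    def on_sample(self, data, now_s):
        """Called for every captured sample; returns True if the sample was sent."""
        due = self.last_time is None or now_s - self.last_time >= self.period_s
        if due or self._changed(data):
            self.send(dict(data))
            self.last_sent, self.last_time = dict(data), now_s
            return True
        return False
```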

[0059] Head-mounted display devices can directly send facial data to one or more connected terminal devices. Alternatively, they can upload facial data to a server, which then forwards it to the terminal devices. When sending facial data, the head-mounted display device can also add an identifier for the terminal device, instructing the server to forward the data to the appropriate terminal device.
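For the server-relay path, a minimal sketch of an envelope that carries the terminal identifiers and a server that forwards on them. The JSON field names (`targets`, `payload`, and so on) and the in-process delivery callbacks are hypothetical stand-ins for a real transport.

```python
import json


def make_envelope(user_id, terminal_ids, facial_data):
    """HMD side: wrap facial data with the identifiers of the terminals that should receive it."""
    return json.dumps({"type": "facial_data", "user": user_id,
                       "targets": list(terminal_ids), "payload": facial_data})


class RelayServer:
    """Server side: forward each envelope to every registered target terminal."""

    def __init__(self):
        self.terminals = {}  # terminal id -> callable that delivers a message

    def register(self, terminal_id, deliver):
        self.terminals[terminal_id] = deliver

    def forward(self, raw):
        """Deliver to known targets; return the ids that could not be reached."""
        msg = json.loads(raw)
        missing = []
        for tid in msg["targets"]:
            deliver = self.terminals.get(tid)
            if deliver is None:
                missing.append(tid)  # target offline or unknown
            else:
                deliver({"user": msg["user"], "payload": msg["payload"]})
        return missing
```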

[0060] After receiving facial data, if the facial data is the first facial data received by the terminal device for the user, an image of a digital person corresponding to the user is generated based on the facial data and displayed on the terminal device's screen. If the facial data is not the first facial data received by the terminal device for the user, the digital person corresponding to the user displayed by the terminal device is driven based on the facial data, so that the digital person presents changes corresponding to the facial data, such as the digital person's expression matching the facial expression described by the facial data, the digital person's mouth shape matching the mouth shape described by the facial data, and the digital person's eye gaze direction matching the eye gaze direction described by the facial data.
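The terminal-side logic of this paragraph can be sketched as follows: the first facial data for a user creates the digital human, and later data drives it. Keeping the avatar state as a dict of facial parameters is a simplification assumed here; actual generation and rendering are omitted.

```python
class DigitalHumanDisplay:
    """Terminal side: create a digital human on the first facial data for a user, then drive it."""

    def __init__(self):
        self.avatars = {}  # user id -> avatar state

    def on_facial_data(self, user_id, facial_data):
        if user_id not in self.avatars:
            # First facial data for this user: generate the digital human from it.
            self.avatars[user_id] = {"created_from": dict(facial_data), "state": dict(facial_data)}
            return "created"
        # Later data: update expression, mouth shape and gaze to match the user.
        self.avatars[user_id]["state"].update(facial_data)
        return "driven"
```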

[0061] When the computing power of the terminal device is low, the server can generate an image of the digital human based on facial data and send the image of the digital human to the terminal device for display.

[0062] Alternatively, when the computing power of the terminal device is low, the server can generate the image of the digital human based on the facial data and determine the pixel value of each point on the terminal device's screen based on the rendering table corresponding to the digital human. The server then sends these pixel values to the terminal device to drive its screen, so that it displays a digital human whose form matches the facial data.

[0063] A head-mounted display device can be equipped with multiple acquisition devices. Different acquisition devices can acquire data from different areas of the user's face, or the acquisition range of some acquisition devices can be adjusted, for example to cover the entire facial area or only part of it.

[0064] Optionally, the area of the user's face corresponding to the head-mounted display device is designated as the first area. The acquisition devices of the head-mounted display device include first-type acquisition devices and second-type acquisition devices, and the facial data includes data acquired by both types. The acquisition area of each first-type acquisition device intersects the first area, and the union of the acquisition areas of all first-type acquisition devices covers the first area. Similarly, the acquisition area of each second-type acquisition device intersects the second area, and the union of the acquisition areas of all second-type acquisition devices covers the second area. The second area is the portion of the facial area that remains after the first area is removed.

[0065] For example, the first region is the eye region, and the second region is the mouth region.
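The coverage conditions of the two preceding paragraphs can be checked mechanically. In this sketch the face and the acquisition areas are modelled, as an assumption, as sets of grid cells: each device area must intersect its region, the union of the areas must cover the region, and the second area is the facial area minus the first.

```python
def cells(x0, y0, x1, y1):
    """Grid cells of the half-open rectangle [x0, x1) x [y0, y1)."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


def layout_ok(region, device_areas):
    """True if every device area intersects the region and together they cover it."""
    union = set().union(*device_areas) if device_areas else set()
    return all(area & region for area in device_areas) and region <= union
```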

[0066] The data acquisition device can be a camera, a webcam, an integrated circuit for data acquisition, etc.

[0067] The first type of acquisition device is used to collect data about the user's eyes, such as the location of eye feature points and eye images. The second type of acquisition device is used to collect data about the user's mouth, such as the location of mouth feature points and mouth images.

[0068] The first type of acquisition device can be deployed on the side of the head-mounted display device facing the user. Multiple first type acquisition devices can be distributed along the edge of this side to collect the user's eye data from different angles, ensuring the comprehensiveness of eye data collection.

[0069] The second type of acquisition device can be deployed on the side of the head-mounted display device away from the user, or on the lower edge of the head-mounted display device, close to the user's mouth.

[0070] For example, Figure 3 is a schematic structural diagram of a head-mounted display device provided in an embodiment of this application. As shown in Figure 3, the head-mounted display device covers the upper half of the user's face, primarily the eye area. Cameras 31 to 35 are deployed on the front of the head-mounted display device, i.e., the side facing away from the user. Cameras 31 and 32 are located at the lower edge of the front and are second-type acquisition devices used to capture images of the lower half of the user's face. Camera 33 is located at the center of the upper edge of the front and captures a frontal image of the user, which can be used to generate the digital human, the multi-view rendering table, and the corresponding virtual avatar. Cameras 34 and 35 are located on the left and right sides of the lower front and capture the user's body movements, including hand movements.

[0071] Four first-type acquisition devices, namely cameras 36 to 39, are evenly arranged along the lower part of the back of the head-mounted display device (i.e., the side facing the user) to capture images of the user's eyes.

[0072] Head-mounted display devices may also include infrared sensors and depth sensors to assist in image acquisition.

[0073] The number and layout of the cameras shown in Figure 3, as well as the structure of the head-mounted display device, are for illustration only and can be adjusted according to specific needs in practical applications. The content shown in Figure 3 should not be construed as limiting this application.

[0074] By using comprehensive facial data, including the eyes and mouth, terminal devices can generate more vivid and lifelike digital humans. For example, the shape of the digital human's mouth and eyes can be adjusted to match the eyes and mouth of the user of the head-mounted display device.

[0075] Eye data can be input into a pre-trained expression-driven model, such as a large model, to obtain the digital human's expression image. Mouth data can be input into a pre-trained lip-shape-driven model, such as a large model, to obtain the digital human's lip-shape image. By stitching the expression image and the lip-shape image together, a complete digital human facial image can be obtained. This facial image can be mapped onto the screen of a terminal device, thereby enabling the digital human displayed on the terminal device to be updated, so that the digital human presents an expression consistent with the user's face.
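The stitching step can be illustrated with a short NumPy sketch. It assumes, hypothetically, that the expression image covers the upper face and the lip-shape image covers the lower face, that both share the same width and channel count, and that a few rows at the seam are linearly cross-faded; the function name `stitch_face` and the `overlap` parameter are illustrative and not taken from the application.

```python
import numpy as np


def stitch_face(expression_img: np.ndarray, lip_img: np.ndarray, overlap: int = 0) -> np.ndarray:
    """Stack the upper-face expression image on top of the lower-face lip-shape image.

    Hypothetical layout: both inputs are H x W (x C) arrays with the same width and
    channel count; `overlap` seam rows are linearly cross-faded to hide the join.
    """
    if expression_img.shape[1:] != lip_img.shape[1:]:
        raise ValueError("images must share width and channel count")
    if overlap == 0:
        return np.vstack([expression_img, lip_img])
    upper = expression_img.astype(np.float32)
    lower = lip_img.astype(np.float32)
    # Weights fall from 1 (all upper image) to 0 (all lower image) across the seam.
    alpha = np.linspace(1.0, 0.0, overlap, dtype=np.float32)
    alpha = alpha.reshape(-1, *([1] * (upper.ndim - 1)))
    seam = alpha * upper[-overlap:] + (1.0 - alpha) * lower[:overlap]
    face = np.vstack([upper[:-overlap], seam, lower[overlap:]])
    return face.astype(expression_img.dtype)
```

With `overlap=0` the two images are simply stacked; a small overlap trades a few rows of height for a less visible seam.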

[0076] Step S202: Receive multimedia data; the multimedia data includes image data.

[0077] Multimedia data includes data about the external environment collected by at least one terminal device, including image data.

[0078] The timing of steps S201 and S202 can be arbitrary, and can be executed serially or in parallel. For example, step S202 can be executed first and then step S201 can be executed, or steps S202 and S201 can be executed in parallel. This application does not limit this.

[0079] Step S203: Display the corresponding image on the display screen of the head-mounted display device according to the image data.

[0080] Image acquisition devices, also known as image sensors, can be cameras, webcams, etc.

[0081] For example, the image acquisition device can be a 2D camera or a 3D camera.

[0082] The image data acquired by the image acquisition device of the terminal device can be image or video data of the external environment of the terminal device, and can be one or more matrices, including the coordinates and pixel values of each pixel in the image or video frame.

[0083] After receiving multimedia data, the head-mounted display device can display corresponding images on its screen based on the image data within it, allowing users to view the content that the terminal device "sees" through the head-mounted display device.

[0084] In addition to visual data such as image data, multimedia data can also include other data to form a multimodal dataset, such as audio data, odor data, tactile data, and other sensory data.

[0085] When the multimedia data includes multimodal data, data of the other modalities can be presented through the head-mounted display device, or through devices paired with it, while the image is displayed based on the image data. Taking image and audio data as an example, while displaying the image, the head-mounted display device can simultaneously play the audio data through its speakers. Similarly, taking image and tactile data as an example, while displaying the image, the head-mounted display device can simultaneously simulate the tactile data through a tactile glove paired with it, for example by controlling parameters of the tactile glove such as vibration mode, temperature, and pressure.

[0086] The display screen of a head-mounted display device can be a three-dimensional display screen. It can synthesize three-dimensional image data based on image data for display on the three-dimensional display screen, and display the stereoscopic image corresponding to the image data, such as a naked-eye stereoscopic image, through the three-dimensional display screen.

[0087] The interactive data processing method provided in this embodiment is directed at a head-mounted display device and a terminal device that interact with each other. Image data in the multimedia data collected by the terminal device is displayed on the screen of the head-mounted display device, so images collected by the terminal device are transmitted back to the head-mounted display device in real time. In remote interaction scenarios, this allows the wearer to view images of the remote terminal's external environment through the head-mounted display device. On the terminal device side, a digital human matching the image of the head-mounted display device's user can be displayed, improving the experience of those interacting with the terminal device. In addition, the collected facial data of the user (the wearer) is sent to the terminal device, so the terminal device can dynamically adjust the displayed digital human based on that facial data, making the digital human more vivid and further enhancing the interactive experience of those interacting with the terminal device.

[0088] Figure 4 is a schematic diagram of the structure of another head-mounted display device provided in an embodiment of this application. As shown in Figure 4, the display screen of the head-mounted display device includes a first display screen corresponding to the user's left eye and a second display screen corresponding to the user's right eye. When the corresponding image is displayed on the display screen of the head-mounted display device according to the image data, it can be displayed on the first display screen and the second display screen respectively. The images displayed on the first display screen and the second display screen can be the same.

[0089] To enhance the stereoscopic effect of the displayed image, the first and second displays can show parallax images during image display. Optionally, displaying the corresponding image on the display screen of the head-mounted display device based on the image data includes: generating a left-eye image and a right-eye image based on the image data; and synchronously displaying the left-eye image and the right-eye image on the first and second displays, respectively.

[0090] When the computing power of the head-mounted display device is low, the aforementioned steps of determining the left-eye and right-eye images can also be performed by the cloud, such as by a cloud server, and the generated left-eye and right-eye images can be sent to the head-mounted display device for display.

[0091] The left-eye and right-eye images can be obtained by shifting pixels in the image data. The left-eye image is obtained by shifting pixels in the image data to the left, and the right-eye image is obtained by shifting pixels in the image data to the right.

[0092] By displaying different left-eye and right-eye images on the displays of the head-mounted display device that correspond to the user's left and right eyes, namely the first display screen and the second display screen, the user sees parallax images, which the brain fuses into a naked-eye stereoscopic image, improving the stereoscopic effect of the images seen by the user.

[0093] Since the image data is acquired from the perspective of the image acquisition device of the terminal device, which is different from the perspective from which the user views the image in the head-mounted display device, the image data needs to be corrected for perspective before generating the left-eye and right-eye images to conform to the user's viewing habits.

[0094] Optionally, based on the image data, generating left-eye and right-eye images respectively includes: correcting the viewing angle of the image data based on the viewing angle correction matrix to obtain a processed image under the viewing angle corresponding to the head-mounted display device; and generating the left-eye and right-eye images respectively based on the processed image.

[0095] The viewing angle correction matrix can be obtained offline and stored in the head-mounted display device. Upon receiving image data from the terminal device, this matrix is used to correct the viewing angle of the image data, resulting in a processed image from the user's perspective on the head-mounted display device. When generating the left and right eye images, this processed image is used as a basis for left and right offsets, respectively, to achieve the display of a naked-eye stereoscopic image.

[0096] Image data can be viewed as a matrix. The viewpoint correction of the image data can be achieved by left-multiplying the image data by the viewpoint correction matrix, thus obtaining the processed image (or processed image data).
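A minimal NumPy sketch of applying a 3x3 correction matrix to an image follows. It interprets "left-multiplying the image data" as multiplying the homogeneous coordinates (x, y, 1) of each pixel by the matrix, and fills the output by inverse mapping with nearest-neighbour sampling so that no holes appear. `warp_with_matrix` is a hypothetical name; a production system would more likely use an optimized library routine with interpolation.

```python
import numpy as np


def warp_with_matrix(image: np.ndarray, H: np.ndarray) -> np.ndarray:
    """Apply a 3x3 correction matrix H to an image using nearest-neighbour sampling.

    H maps homogeneous source pixel coordinates (x, y, 1) to output coordinates.
    Each output pixel is looked up through H^-1 (inverse warping), so the result
    has no holes; output pixels whose source falls outside the image stay 0.
    """
    h, w = image.shape[:2]
    H_inv = np.linalg.inv(H)
    ys, xs = np.mgrid[0:h, 0:w]
    ys, xs = ys.ravel(), xs.ravel()
    dst = np.stack([xs, ys, np.ones_like(xs)]).astype(float)  # 3 x N homogeneous coords
    src = H_inv @ dst
    src_x = np.rint(src[0] / src[2]).astype(int)
    src_y = np.rint(src[1] / src[2]).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(image)
    out[ys[valid], xs[valid]] = image[src_y[valid], src_x[valid]]
    return out
```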

[0097] Specifically, the viewing angle correction matrix can be determined based on parameters of the terminal device's image acquisition device, such as its position and pitch angle.

[0098] By using a viewing angle correction matrix, image data is displayed from the perspective corresponding to the head-mounted display device, avoiding the discomfort caused by users looking up or down, and improving the user experience.

[0099] Optionally, the method further includes: acquiring the height information and pitch angle of the image acquisition device of the terminal device, wherein the height information is used to describe the height of the image acquisition device of the terminal device relative to the eyes of the displayed digital human; determining a translation matrix based on the height information; determining a perspective transformation matrix based on the pitch angle; and determining a viewpoint correction matrix based on the translation matrix and the perspective transformation matrix.

[0100] The position of the digital human displayed on the terminal device is usually fixed, so the height of the digital human's eyes is also fixed. The height information of the image acquisition device is used to describe the height difference between the image acquisition device and the digital human's eyes. The pitch angle is the angle between the lens of the image acquisition device and the horizontal plane.

[0101] Taking a robot as the terminal device and a camera or webcam as the image acquisition device as an example, Figure 5 is a schematic diagram of the robot structure provided in an embodiment of this application. As shown in Figure 5, the robot's camera or webcam is usually positioned directly above the display screen, so the camera captures images at a certain downward angle, whereas head-mounted displays typically present images from a frontal viewpoint. Without viewing angle correction, the user of the head-mounted display would see the scene as if looking down on it from above, resulting in a poor user experience.

[0102] The camera of the terminal device may also be deployed below or to one side of the display screen, and this application does not limit this.

[0103] A camera coordinate system can be established for the image acquisition device. A translation matrix determined from the height information translates the origin of this coordinate system to the height of the digital human's eyes. A rotation matrix determined from the pitch angle then rotates the translated camera coordinate system so that the lens direction (positive Z-axis) is parallel to the horizontal plane. Combining these with the transformation matrix between the camera coordinate system and the image coordinate system achieves perspective correction of the acquired image (or image data). The perspective transformation matrix is the product of the aforementioned rotation matrix and transformation matrix.

[0104] The viewing angle correction matrix is specifically the product of the translation matrix and the perspective transformation matrix.

[0105] Determining the viewing angle correction matrix by calculation is efficient and requires no manual operation.
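The calculation in [0099] to [0104] can be sketched as follows, under several assumptions the application does not spell out: a pinhole intrinsic matrix K, a camera frame with x to the right, y down, and z forward, a rotation about the x-axis by the pitch angle so that the perspective transformation becomes the standard pure-rotation homography K R K^-1, and a height offset approximated as a vertical pixel shift of fy * h / Z_ref at a hypothetical reference depth Z_ref. All function names are illustrative.

```python
import numpy as np


def intrinsics(fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Pinhole intrinsic matrix (camera frame -> image pixels); an assumed camera model."""
    return np.array([[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]])


def perspective_matrix(K: np.ndarray, pitch_rad: float) -> np.ndarray:
    """Homography that levels a camera pitched down by pitch_rad: K @ R @ K^-1.

    R rotates the camera frame (x right, y down, z forward) into a frame whose
    z-axis is horizontal.
    """
    c, s = np.cos(pitch_rad), np.sin(pitch_rad)
    R = np.array([[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]])
    return K @ R @ np.linalg.inv(K)


def translation_matrix(fy: float, height_mm: float, ref_depth_mm: float) -> np.ndarray:
    """Approximate lowering the camera by height_mm as a vertical pixel shift.

    Hypothetical simplification: exact only for points at ref_depth_mm; lowering
    the camera moves those points up by fy * height / depth pixels.
    """
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, -fy * height_mm / ref_depth_mm],
                     [0.0, 0.0, 1.0]])


def view_correction_matrix(K, height_mm, pitch_rad, ref_depth_mm):
    """Viewing angle correction matrix = translation matrix @ perspective transformation matrix."""
    return translation_matrix(K[1, 1], height_mm, ref_depth_mm) @ perspective_matrix(K, pitch_rad)
```

With these conventions, the image center of a camera pitched down by 45° maps fy * tan(45°) pixels below the center of the corrected image, consistent with that ray pointing below the horizon.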

[0106] In some embodiments, the viewing angle correction matrix can be determined by calibration. The steps for determining the viewing angle correction matrix provided in this application can be performed by a head-mounted display device or by other devices such as a cloud server. The allocation of different execution entities for these steps can be based on the computing power of the head-mounted display device.

[0107] Optionally, the method further includes: after placing the calibration sample at a preset distance directly in front of the image acquisition device of the terminal device, acquiring first test image data of the calibration sample based on the image acquisition device of the terminal device; the calibration sample includes multiple corner points; after placing the calibration sample at the preset distance directly in front of the display screen of the head-mounted display device, acquiring a second test image of the calibration sample based on the camera of the head-mounted display device; and determining the viewing angle correction matrix based on the positions of the corner points in the first test image and the second test image.

[0108] The calibration sample can include several mutually perpendicular lines that form multiple corner points, a corner point being the vertex of the angle formed by two mutually perpendicular lines.

[0109] For example, the pattern in the calibration sample may include multiple cross symbols arranged in a dispersed manner.

[0110] For example, the pattern in the calibration sample can be a checkerboard pattern.

[0111] During calibration, the calibration sample is placed directly in front of the image acquisition device of the terminal device and directly in front of the display screen of the head-mounted display device, respectively, with its patterned side facing the terminal device or the head-mounted display device. The patterned side must also be perpendicular to the optical axis of the image acquisition device's lens and to the screen normal of the head-mounted display device's display screen, so that the calibration sample squarely faces the lens and the screen, respectively.

[0112] After the calibration sample is placed, images of the calibration sample are acquired by the image acquisition devices of the terminal device (such as a camera) and the head-mounted display device (such as a webcam), respectively, to obtain the first test image and the second test image.

[0113] The first test image and the second test image can both be one or more images.

[0114] Identify corner points in the first and second test images, such as the corner point at the center (denoted as the center corner point). Determine the translation pixel amount d using the image coordinates of the center corner point in the two test images. Select at least four relatively dispersed corner points from the first and second test images. Calculate the perspective transformation matrix M based on the image coordinates of these four corner points in the first and second test images. The perspective transformation matrix M can be obtained by solving the following equation:

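[0115] (x', y', 1) = M * (x, y, 1)

[0116] Where (x', y') are the image coordinates of the selected corner point in the second test image, and (x, y) are the image coordinates of the same corner point in the first test image. Both sides are homogeneous coordinates, so the right-hand side is divided by its third component before it is compared with (x', y', 1).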

[0117] The perspective transformation matrix M can be calculated using the least squares method.

[0118] Based on the translation pixel amount d and perspective transformation matrix M obtained from calibration, the viewpoint correction matrix can be calculated. For example, the translation pixel amount d can be converted into a translation matrix D, and the viewpoint correction matrix can be determined as the product of the translation matrix D and the perspective transformation matrix M.
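The calibration route in [0107] to [0118] can be sketched in NumPy. The sketch estimates M by linear least squares with its bottom-right entry fixed to 1 (a common way to solve the homogeneous equation above), takes d as the displacement of the center corner point between the two test images, and composes D and M in the order the text describes. Function names and the fixed-entry normalization are assumptions for illustration.

```python
import numpy as np


def estimate_perspective_matrix(src_pts, dst_pts) -> np.ndarray:
    """Least-squares M with (x', y', 1) ~ M (x, y, 1) and M[2, 2] fixed to 1.

    src_pts: (N, 2) corner coordinates in the first test image, N >= 4.
    dst_pts: (N, 2) coordinates of the same corners in the second test image.
    """
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    if src.shape != dst.shape or src.shape[0] < 4:
        raise ValueError("need at least four matching corner points")
    rows, rhs = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        # Cross-multiplied x' = (m11 x + m12 y + m13) / (m31 x + m32 y + 1), and likewise y'.
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -x * xp, -y * xp])
        rhs.append(xp)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -x * yp, -y * yp])
        rhs.append(yp)
    m, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return np.append(m, 1.0).reshape(3, 3)


def pixel_translation(d) -> np.ndarray:
    """Translation matrix D built from the translation pixel amount d = (dx, dy)."""
    dx, dy = d
    return np.array([[1.0, 0.0, dx], [0.0, 1.0, dy], [0.0, 0.0, 1.0]])


def calibrate(center_first, center_second, src_pts, dst_pts):
    """Return (correction matrix D @ M, d, M) from the two test images' corner points."""
    d = np.subtract(center_second, center_first)  # translation pixel amount of the center corner
    M = estimate_perspective_matrix(src_pts, dst_pts)
    return pixel_translation(d) @ M, d, M
```

Using more than four well-spread corners (for example, every corner of a checkerboard) lets the least-squares fit average out corner-detection noise.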

[0119] By determining the viewing angle correction matrix through calibration, the accuracy of the viewing angle correction matrix determination is improved, thereby improving the image quality displayed by the head-mounted display device.

[0120] Optionally, Figure 6 is a flowchart of step S203 in the embodiment shown in Figure 2 of this application. As shown in Figure 6, step S203 may specifically include the following steps:

[0121] Step S601: Obtain the user's interpupillary distance and the focal length of the head-mounted display device.

[0122] The user's interpupillary distance (in millimeters) can be obtained through image recognition or input by the user in advance.

[0123] The focal length f (in pixels) of a head-mounted display device can be determined by the width w of the head-mounted display screen and the field of view θ of the head-mounted display device. The calculation formula is as follows:

[0124] f = w / (2*tan(θ / 2))
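As a quick numeric check of this formula, here is a sketch assuming θ is the horizontal field of view that corresponds to the screen width w; the screen width and field of view in the example call are hypothetical values.

```python
import math


def hmd_focal_length_px(screen_width_px: float, fov_deg: float) -> float:
    """f = w / (2 * tan(theta / 2)); w in pixels, theta in degrees, result in pixels."""
    return screen_width_px / (2.0 * math.tan(math.radians(fov_deg) / 2.0))


print(round(hmd_focal_length_px(1920, 90), 3))  # prints 960.0
```

Since tan(45°) = 1, a 1920-pixel-wide screen with a 90° field of view gives f = 1920 / 2 = 960 pixels; a narrower field of view yields a longer focal length.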

[0125] Step S602: Based on the viewing angle correction matrix, correct the viewing angle of the image data to obtain the processed image under the viewing angle corresponding to the head-mounted display device.

[0126] Step S603: For each pixel in the processed image, based on the depth information of the pixel, the interpupillary distance of the user, and the focal length of the head-mounted display device, determine the offset distance of the pixel.

[0127] Depth information can be extracted from image data. If the acquired image data is two-dimensional, meaning it does not contain depth information, then a deep learning model can be used to extract the depth information of each pixel in the image data. Image data can also be three-dimensional, where one dimension represents depth; in this case, depth information can be obtained by extracting data from that dimension.

[0128] After obtaining the processed image, the depth information of the pixels in the processed image can be smoothed to reduce the discontinuity caused by the offset.

[0129] The offset distance l is calculated as follows: l = e * f / z, where z is the depth information of the pixel or the smoothed depth information (in millimeters), f is the focal length of the head-mounted display device (in pixels), and e is the user's interpupillary distance (in millimeters).
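The depth smoothing of [0128] and the offset formula of [0129] can be combined in a short sketch. The application does not specify a smoothing method, so a simple box filter with edge padding is assumed here, and depth is clamped to a small minimum to avoid division by zero; both choices are illustrative.

```python
import numpy as np


def smooth_depth(depth_mm: np.ndarray, radius: int = 1) -> np.ndarray:
    """Box-filter a depth map with edge padding (assumed smoothing method)."""
    k = 2 * radius + 1
    h, w = depth_mm.shape
    padded = np.pad(depth_mm.astype(float), radius, mode="edge")
    acc = np.zeros((h, w), dtype=float)
    for dy in range(k):
        for dx in range(k):
            acc += padded[dy:dy + h, dx:dx + w]
    return acc / (k * k)


def offset_distance_px(depth_mm, ipd_mm: float, focal_px: float, min_depth_mm: float = 1.0) -> np.ndarray:
    """Per-pixel offset l = e * f / z, with e and z in millimetres and f, l in pixels."""
    z = np.maximum(np.asarray(depth_mm, dtype=float), min_depth_mm)  # avoid division by zero
    return ipd_mm * focal_px / z
```

With a hypothetical interpupillary distance e = 64 mm and f = 960 px, a pixel at z = 1000 mm gets l = 64 * 960 / 1000 = 61.44 px, and a pixel twice as far away gets half that offset.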

[0130] After the offset distance of each pixel in the processed image is obtained, the offset distances can be smoothed. For example, the offset distance of a pixel can be weighted using the pixel values of one or more pixels on its left and right, yielding a sub-pixel-level offset distance. Taking weighting based on the two pixels immediately to the left and right as an example, the pixel value of the left pixel can be used as one weight and the pixel value of the right pixel as the other; the sum of these two weights multiplied by the pixel's offset distance gives the weighted offset distance, i.e., the sub-pixel-level offset distance.

[0131] This weighting further reduces the jagged artifacts caused by the pixel shift and improves the quality of the parallax image presented by the left-eye and right-eye images, thereby enhancing the stereoscopic effect.

[0132] Step S604: Based on the offset distance of each pixel in the processed image, each pixel in the processed image is offset to the left and to the right respectively to obtain the left eye image and the right eye image.

[0133] After determining the offset distance l or sub-pixel level offset distance of each pixel, the processed image is shifted pixel by pixel to the left and right (parallel to the direction of the user's eyes) according to the offset distance l or sub-pixel level offset distance of each pixel. The image obtained by shifting to the left is the left eye image, and the image obtained by shifting to the right is the right eye image.
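Step S604 amounts to forward-warping the processed image horizontally. The sketch below is an illustration under stated assumptions: offsets are rounded to whole pixels (so the sub-pixel offsets of [0130] would need interpolation instead), when two source pixels land on the same target the one with the larger offset (the nearer one) is kept, and uncovered positions are left as zero-valued holes because the text does not describe hole filling. Function names are hypothetical.

```python
import numpy as np


def shift_image(image: np.ndarray, offsets_px: np.ndarray, direction: int) -> np.ndarray:
    """Shift each pixel horizontally by direction * offset (-1 = left, +1 = right).

    Offsets are rounded to whole pixels. When several pixels land on the same
    target, the one with the largest offset (the nearest) is kept; target
    positions that receive no pixel remain 0.
    """
    h, w = offsets_px.shape
    ys, xs = np.mgrid[0:h, 0:w]
    order = np.argsort(-offsets_px, axis=None, kind="stable")  # nearest pixels first
    ys, xs = ys.ravel()[order], xs.ravel()[order]
    new_xs = xs + direction * np.rint(offsets_px.ravel()[order]).astype(int)
    valid = (new_xs >= 0) & (new_xs < w)
    ys, xs, new_xs = ys[valid], xs[valid], new_xs[valid]
    # np.unique returns the first occurrence of each target, i.e. the nearest pixel.
    _, first = np.unique(ys * w + new_xs, return_index=True)
    out = np.zeros_like(image)
    out[ys[first], new_xs[first]] = image[ys[first], xs[first]]
    return out


def make_stereo_pair(processed: np.ndarray, offsets_px: np.ndarray):
    """Left-eye image from the leftward shift, right-eye image from the rightward shift."""
    return shift_image(processed, offsets_px, -1), shift_image(processed, offsets_px, +1)
```

Resolving collisions explicitly with `np.unique` avoids relying on the write order of NumPy fancy-index assignment, which is not guaranteed when indices repeat.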

[0134] Step S605: The left eye image and the right eye image are simultaneously displayed on the first display screen and the second display screen, respectively.

[0135] In this embodiment, the offset distance is determined pixel by pixel based on depth, interpupillary distance, and focal length of the head-mounted display device, which improves the accuracy of offset distance determination, thereby improving the quality of the parallax image seen by the user's eyes and enhancing the stereoscopic effect.

[0136] In some embodiments, the multimedia data collected by the terminal device includes image data and its corresponding audio data. The terminal device can upload the image data and its corresponding audio data to the server, which then forwards them to the head-mounted display device. Thus, the head-mounted display device can simultaneously play the corresponding audio data while displaying the image data.

[0137] Because the sound field environment at the terminal device differs from the user's own, the audio data can be processed so that what the user hears is close to what would be heard at the terminal device's location, improving the user's sense of presence.

[0138] The audio data collected by the terminal device can be processed by the head-mounted display device to obtain processed audio data simulating the sound field space on the terminal device side. Alternatively, the audio data can be processed by a server and sent to the head-mounted display device to reduce the data processing load on the head-mounted display device.

[0139] Optionally, the multimedia data further includes spatial audio data, which is obtained by the server processing audio data collected by the terminal device based on a spatial model of the space where the terminal device is located; the method further includes: synchronously playing the spatial audio data while displaying the corresponding image.

[0140] In some embodiments, when the computing power of the terminal device or head-mounted display device is high, the terminal device or head-mounted display device can also process the audio data into spatial audio data. This application does not limit the subject performing this step.

[0141] A spatial model is a simplified model of the space or environment in which the terminal device is located, which may include room size, wall materials, etc. Based on this spatial model, the propagation path and reverberation properties of audio data are adjusted in real time to improve the dynamic effect and spatial sense of the sound.

[0142] The server can adjust the propagation path and reverberation properties of audio data based on the location and orientation of the terminal device in space, as well as the spatial model of that space, to obtain processed audio data, namely spatial audio data, and then send the spatial audio data to the head-mounted display device for playback.
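A minimal sketch of the reverberation part of this processing is shown below; it is not the server's actual algorithm. It assumes a shoebox-shaped spatial model with a single average absorption coefficient for the wall materials, estimates the reverberation time with Sabine's formula RT60 = 0.161 * V / A (V in cubic metres, A the total absorption in square metres), and convolves the audio with a synthetic exponentially decaying impulse response. Propagation paths and the device's position and orientation are not modelled, and all names and the `wet` mix parameter are hypothetical.

```python
import numpy as np


def sabine_rt60(length_m: float, width_m: float, height_m: float, absorption: float) -> float:
    """Reverberation time (s) of a shoebox room via Sabine: RT60 = 0.161 * V / A."""
    volume = length_m * width_m * height_m
    surface = 2.0 * (length_m * width_m + length_m * height_m + width_m * height_m)
    return 0.161 * volume / (surface * absorption)


def synthetic_impulse_response(rt60_s: float, sample_rate: int, seed: int = 0) -> np.ndarray:
    """Noise with an exponential envelope that decays by 60 dB after rt60_s seconds."""
    rng = np.random.default_rng(seed)
    n = max(1, int(rt60_s * sample_rate))
    t = np.arange(n) / sample_rate
    envelope = 10.0 ** (-3.0 * t / rt60_s)  # amplitude factor 1e-3 (-60 dB) at t = rt60_s
    ir = rng.standard_normal(n) * envelope
    ir[0] = 1.0  # direct sound
    return ir / np.max(np.abs(ir))


def spatialize(dry: np.ndarray, room_dims_m, absorption: float, sample_rate: int, wet: float = 0.3) -> np.ndarray:
    """Mix the dry signal with a reverberant copy derived from the room model."""
    ir = synthetic_impulse_response(sabine_rt60(*room_dims_m, absorption), sample_rate)
    reverb = np.convolve(dry, ir)[: len(dry)]
    return (1.0 - wet) * dry + wet * reverb
```

For example, a 5 m by 4 m by 3 m room with average absorption 0.2 has V = 60 m³ and A = 94 * 0.2 = 18.8 m², giving an RT60 of about 0.51 s.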

[0143] Multimedia data can also include odor data, tactile data, or other modal data.

[0144] Head-mounted display devices can synthesize odors based on odor data collected from terminal devices, allowing users to smell the corresponding odors.

[0145] Head-mounted display devices may also include haptic gloves, or be equipped with haptic gloves. The controllable parameters of the haptic gloves are adjusted based on haptic data collected by the terminal device to simulate corresponding tactile sensations. The controllable parameters of the haptic gloves may include temperature, vibration parameters such as vibration mode, and pressure.

[0146] Optionally, the terminal device may also include a robotic arm. The head-mounted display device may collect the user's limb movement data, such as hand images, and send it to the terminal device containing the robotic arm, either directly or via a server, so that this terminal device controls the robotic arm based on the limb movement data to reproduce the user's limb movements, such as grasping an object.

[0147] Optionally, the interactive data processing method further includes at least one of the following steps: sending collected user limb movement data so that the at least one terminal device controls the movement of a robotic arm based on the limb movement data; where the multimedia data further includes odor data collected by the terminal device, controlling an odor synthesizer to synthesize odors based on the odor data; and where the multimedia data further includes tactile data collected by the terminal device, controlling at least one of the vibration parameters, pressure, and temperature of a tactile glove bound to the head-mounted display device based on the tactile data.

[0148] Specifically, the odor synthesizer can generate a specific odor by mixing various basic odor chemical components, and release the synthesized odor through the odor release module, thereby simulating the odor of the environment where the terminal device is located.

[0149] The robotic arm of the terminal device can be equipped with multiple tactile sensors to collect information such as the texture, temperature, and pressure of the touched object, thus obtaining tactile data. On the head-mounted display device side, tactile gloves simulate the tactile sensation of the terminal device through controllable vibration parameters, pressure, and temperature.

[0150] Through this multi-dimensional data interaction, the terminal device can better reproduce the appearance of the head-mounted display device's user and better stand in for that user when interacting with people on the terminal device side, such as children who need companionship, patients, or engineers, improving the interactive experience. The user can also operate the terminal device's robotic arm through body movements, which is convenient and not limited by region, space, or the object being operated. At the same time, the exchange of multi-sensory data covering vision, hearing, touch, and smell improves the user's sense of presence.

[0151] Figure 7 is a hardware block diagram of the head-mounted display device provided in an embodiment of this application. As shown in Figure 7, the head-mounted display device includes a processor, a display chip, a display screen, cameras/webcams, a recording device, a speaker, a depth sensor, an infrared sensor, an ambient light sensor (ALS), a tactile glove, and an odor synthesizer.

[0152] Among them, the ambient light sensor is used to collect the color temperature and brightness of the ambient light, which can be output to the terminal device to adjust the skin color and brightness of the digital human displayed on the terminal device.

[0153] Recording equipment is used to record users' voice data and output it to terminal devices for playback.

[0154] The display chip is a chip that provides display functions. After the image data is processed by the processor, the corresponding image is displayed on the display screen through the display chip.

[0155] The processor is used to implement the interactive data processing method provided in the foregoing embodiments, as well as other processing methods.

[0156] Figure 8 is a software framework diagram of the head-mounted display device provided in an embodiment of this application. As shown in Figure 8, to implement the interactive data processing method of the foregoing embodiments and thereby display naked-eye stereoscopic images, the head-mounted display device may include a visual conversion module, a parallax module, a tactile conversion module, and an odor conversion module.

[0157] The visual conversion module is used to correct the viewing angle of the image data collected by the terminal device to obtain the processed image; the parallax module is used to offset the processed image into left-eye and right-eye images; the tactile conversion module is used to convert the tactile data collected by the terminal device to generate control signals for the tactile glove, thereby controlling the tactile glove to adjust its controllable parameters; and the odor conversion module is used to convert the odor data collected by the terminal device to generate input signals for the odor synthesizer, thereby enabling the odor synthesizer to synthesize and release the corresponding odor.

[0158] Figure 9 is a flowchart of another interactive data processing method provided in an embodiment of this application. This method is applied to a terminal device that interacts with at least one of the aforementioned head-mounted display devices. As shown in Figure 9, the interactive data processing method includes the following steps:

[0159] Step S901: Send the collected multimedia data so that the at least one head-mounted display device can display the corresponding image based on the image data in the multimedia data.

[0160] Step S902: Receive user data of the target user collected by at least one head-mounted display device, including facial data.

[0161] Step S903: Based on the facial data, adjust the digital person corresponding to the target user displayed on the display of the terminal device.

[0162] Specifically, facial data can be used as driving data for digital humans to change the form of the digital human displayed on the terminal device, such as changing the digital human's mouth shape, gaze direction, emotions, head posture, etc.

[0163] Based on eye data in facial data, a matching target eye image can be determined from multiple pre-stored eye images of the digital human, and based on mouth data in facial data, a matching target lip image can be determined from multiple pre-stored lip images of the digital human. By stitching the target eye image and the target lip image together, a complete facial image of the digital human can be obtained and displayed, thereby enabling the updating of the digital human.
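The lookup described here can be sketched as a nearest-neighbour search. The sketch assumes, hypothetically, that each pre-stored eye or lip image is indexed by a feature vector comparable to the eye or mouth data (for example, a few expression coefficients), and that the two selected images are stacked vertically; the feature representation and function names are illustrative only.

```python
import numpy as np


def select_nearest(query, keys) -> int:
    """Index of the stored feature vector closest (Euclidean) to the query."""
    diffs = np.asarray(keys, dtype=float) - np.asarray(query, dtype=float)
    return int(np.argmin(np.linalg.norm(diffs, axis=1)))


def build_face(eye_feat, mouth_feat, eye_keys, eye_images, lip_keys, lip_images) -> np.ndarray:
    """Pick the target eye and lip images and stitch them into one facial image."""
    eye_img = eye_images[select_nearest(eye_feat, eye_keys)]
    lip_img = lip_images[select_nearest(mouth_feat, lip_keys)]
    return np.vstack([eye_img, lip_img])  # eye region above, mouth region below
```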

[0164] In some embodiments, the target lip shape image can also be obtained through online inference. Specifically, mouth data from facial data can be input into a pre-trained lip shape-driven model, and the target lip shape image can be obtained through model inference.

[0165] For example, Figure 10 shows the digital human displayed on the terminal device before and after adjustment, as provided in an embodiment of this application. As shown in the left image of Figure 10, when no facial data has been received, the digital human is in a silent state and its face shows the default expression. As shown in the right image of Figure 10, the adjusted digital human is displayed after facial data is received: compared with the left image, its mouth is slightly open, the corners of the mouth are upturned, and the eyes are curved, indicating a happy expression.

[0166] In some embodiments, the display of the terminal device can be a three-dimensional stereoscopic display, which can then display a three-dimensional digital human based on the display parameters of the three-dimensional stereoscopic display.

[0167] To improve the 3D effect of the digital human and reduce local deformation, a multi-view rendering table corresponding to the digital human can be used to display the 3D digital human. The multi-view rendering table is a pre-generated table describing the mapping relationship between each pixel in the facial image used to drive the digital human and each point on the screen of the 3D display. Therefore, when displaying or adjusting the digital human, the facial image can be mapped to the screen through the multi-view rendering table corresponding to the digital human. Points on the screen are also called sub-pixels; to avoid confusion with pixels in an image, this application refers to sub-pixels on the screen simply as points on the screen.

[0168] Digital humans can use different multi-view rendering tables for different forms or viewing angles. That is, multiple multi-view rendering tables for the digital human are pre-generated, each for a different rendering scenario. Therefore, when rendering or displaying the digital human, it is also necessary to combine the specific rendering scenario to determine the multi-view rendering table to use.

[0169] Optionally, adjusting the digital persona corresponding to the target user displayed on the terminal device's screen based on the facial data includes: generating facial image data of the digital persona corresponding to the target user based on the facial data; determining, based on the associated features of the facial data, a multi-view rendering table corresponding to that digital persona; and mapping the facial image data to the 3D display through the multi-view rendering table to adjust the displayed digital persona corresponding to the target user.

[0170] A target user is a user whose facial data is collected by the head-mounted display device.

[0171] The associated features of the facial data include internal features extracted from the facial data itself and may also include external features. External features include features of the user viewing the digital human (the viewer) at the time the display of the digital human is adjusted according to the facial data, such as the viewer's position or height, as well as display features of the digital human, such as its display position or display area.

[0172] The associated features of facial data can include the pose features of the digital human corresponding to the facial data, specifically head pose features such as head orientation and head rotation angle. The associated features of facial data can also include the relative positional relationship between the viewer and the digital human when the digital human is displayed; this relative positional relationship can be determined by the display center of the digital human and the viewer's position.

[0173] Specifically, based on the associated features of the facial data, a multi-view rendering table matching those features can be selected from multiple stored multi-view rendering tables and used as the multi-view rendering table corresponding to the digital human. The facial image data can then be mapped onto the screen of the 3D display through this table, so that the adjusted digital human is displayed in 3D.
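One way to perform this matching, sketched under stated assumptions: each stored table is keyed by a hypothetical pair (head yaw of the digital human, horizontal angle of the viewer relative to the display center), both in degrees, and the table with the nearest key is chosen. The key layout, the floor-plane coordinate convention, and the function names are illustrative; the source does not specify how matching is scored.

```python
import math
from typing import Dict, Tuple

# Hypothetical key: (head yaw of the digital human, viewer azimuth), in degrees.
TableKey = Tuple[float, float]

def viewer_azimuth_deg(display_center: Tuple[float, float],
                       viewer_pos: Tuple[float, float]) -> float:
    """Horizontal angle of the viewer relative to the digital human's display
    center, from (x, z) floor-plane coordinates with z pointing away from the screen."""
    dx = viewer_pos[0] - display_center[0]
    dz = viewer_pos[1] - display_center[1]
    return math.degrees(math.atan2(dx, dz))

def select_rendering_table(tables: Dict[TableKey, str],
                           head_yaw_deg: float,
                           azimuth_deg: float) -> str:
    """Return the stored table whose key is closest to the observed features."""
    def distance(key: TableKey) -> float:
        return math.hypot(key[0] - head_yaw_deg, key[1] - azimuth_deg)
    return tables[min(tables, key=distance)]


# Placeholder strings stand in for the actual table arrays.
tables = {(0.0, 0.0): "front", (30.0, 0.0): "yaw_right", (0.0, 30.0): "viewer_right"}
az = viewer_azimuth_deg((0.0, 0.0), (1.0, 1.0))   # about 45 degrees
chosen = select_rendering_table(tables, 5.0, az)   # "viewer_right"
```

A nearest-key lookup keeps selection cheap. Weighting the two features differently, or interpolating between neighboring tables, would be equally plausible design choices.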

[0174] Facial data can be the data corresponding to a complete image of the target user's face. Based on this facial data, a facial image that matches the facial data is selected from multiple stored digital human facial images corresponding to the target user. The data of this facial image is the facial image data.

[0175] Because the face includes multiple parts, such as the eyes, eyebrows, and mouth, and each part can take different shapes, the number of part-shape combinations is large. Storing a complete digital human facial image for every combination would therefore require a large number of images and occupy too much memory.

[0176] To reduce the memory occupied by digital human facial images, the face can be split, for example into upper and lower regions or by facial part, and images of each split part in its different shapes can be stored. Each split part is matched against the facial data to obtain the corresponding part image, and the part images are stitched together to obtain a complete facial image.
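The saving can be made concrete with a little arithmetic. Storing whole faces requires one image per combination, which is the product of the per-part shape counts. Storing split parts requires only the sum. The counts below are hypothetical and serve only to illustrate the difference.

```python
from math import prod

def images_needed(shapes_per_part):
    """Compare image counts for whole-face storage vs. per-part storage."""
    whole_faces = prod(shapes_per_part)   # every combination stored as one image
    split_parts = sum(shapes_per_part)    # each part stored once per shape
    return whole_faces, split_parts

# Hypothetical: 20 upper-half (eye) states and 30 lower-half (mouth) shapes.
print(images_needed([20, 30]))  # (600, 50)
```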

[0177] For example, the face can be divided into an upper half and a lower half, with the upper half including the eyes and the lower half including the mouth.

[0178] Optionally, generating facial image data of the digital human corresponding to the target user based on the facial data includes: determining, based on the facial data, a target lip shape image from multiple lip shape images and a target facial image from multiple facial images of the digital human corresponding to the target user, wherein the lip region of each facial image shows a default lip shape or is filled with a default color; and fusing the target lip shape image with the target facial image to obtain the facial image data of the digital human corresponding to the target user.

[0179] Specifically, the target facial image can be determined from multiple facial images of the digital human corresponding to the target user based on eye data in the facial data, and the target lip image can be determined from multiple lip images of the digital human corresponding to the target user based on mouth data in the facial data.

[0180] Fusing the target lip shape image with the target facial image specifically means stitching them together: the target lip shape image is stitched onto the lip region of the target facial image.
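A minimal sketch of this stitching step as a plain rectangular paste. It assumes the upper-left corner of the mouth region in the stored facial image is already known (the `top` and `left` parameters are hypothetical). A production system might also blend the seam, which the source does not describe.

```python
import numpy as np

def fuse_lip_into_face(face: np.ndarray, lip: np.ndarray,
                       top: int, left: int) -> np.ndarray:
    """Stitch a lip shape image onto the mouth region of a facial image.

    (top, left) is the assumed upper-left corner of the mouth region, which
    in the stored facial image holds a default lip shape or default color.
    """
    h, w = lip.shape[:2]
    if top < 0 or left < 0 or top + h > face.shape[0] or left + w > face.shape[1]:
        raise ValueError("lip image does not fit inside the facial image")
    fused = face.copy()                      # leave the stored facial image unchanged
    fused[top:top + h, left:left + w] = lip  # overwrite the default mouth region
    return fused
```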

[0181] The system can use a large expression-driving model to generate facial images of the target user under various expressions. It then fills the mouth region of each generated image (which may include only the mouth, or both the mouth and nose) with a default color, or replaces that region with an image of a default mouth shape, and stores the processed facial images on the terminal device.

[0182] The number of target users can be one or more. For example, multiple target users can interact with the terminal device through their respective wearable devices, and the terminal device can display the digital human of different target users through different areas of the screen.

[0183] Figure 11 is a flowchart of step S903 shown in Figure 9 of this application. As shown in Figure 11, step S903 may specifically include the following steps:

[0184] Step S1101: Determine the direction of gaze based on the eye data in the facial data.

[0185] The gaze direction describes the direction in which the target user's eyes are looking. The gaze direction can be determined by recognizing the pose of the target user's pupils relative to their eye sockets using facial data.
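A simplified sketch of estimating gaze from the pupil's position within the eye socket. The normalization by socket half-size and the linear mapping to a maximum angle are assumptions made for illustration. Practical eye trackers use calibrated geometric or learned models.

```python
from typing import Tuple

def gaze_direction(pupil: Tuple[float, float],
                   socket_center: Tuple[float, float],
                   socket_half_size: Tuple[float, float],
                   max_angle_deg: float = 30.0) -> Tuple[float, float]:
    """Estimate (yaw, pitch) in degrees from the pupil's offset in the eye socket.

    Image coordinates: x to the right, y downward. Offsets are normalized by
    the socket half-width/half-height and clamped to [-1, 1]; the linear map
    to +/- max_angle_deg is a simplifying assumption.
    """
    nx = (pupil[0] - socket_center[0]) / socket_half_size[0]
    ny = (pupil[1] - socket_center[1]) / socket_half_size[1]
    nx = max(-1.0, min(1.0, nx))
    ny = max(-1.0, min(1.0, ny))
    yaw = nx * max_angle_deg       # positive: toward the image's right side
    pitch = -ny * max_angle_deg    # positive: looking up
    return yaw, pitch
```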

[0186] Step S1102: Based on the gaze direction, determine the target eye image from multiple eye images of the digital human corresponding to the target user stored in the database.

[0187] The stored data for the digital human includes multiple eye images showing the eyes looking in different directions. Each eye image can be labeled with a gaze direction. The target eye image is then identified by matching the gaze direction determined from the facial data against these labels.

[0188] The facial data can also include the target user's emotion. In that case, the target eye image can be determined by combining the emotion with the gaze direction, that is, by selecting the eye image that matches both.
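The label matching in the two paragraphs above can be sketched as a nearest-neighbor search over labeled eye images, optionally restricted to one emotion. The `EyeImage` record, the degree-valued gaze labels, and the fallback to all images when no image carries the requested emotion are illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class EyeImage:
    name: str                        # placeholder for the stored image
    gaze: Tuple[float, float]        # labeled (yaw, pitch) in degrees
    emotion: Optional[str] = None    # optional emotion label

def select_eye_image(images: List[EyeImage], gaze: Tuple[float, float],
                     emotion: Optional[str] = None) -> EyeImage:
    """Pick the eye image whose gaze label is closest to the estimated gaze,
    restricted to the given emotion when any image carries that label."""
    candidates = images
    if emotion is not None:
        same_emotion = [img for img in images if img.emotion == emotion]
        if same_emotion:
            candidates = same_emotion
    return min(candidates,
               key=lambda img: math.hypot(img.gaze[0] - gaze[0],
                                          img.gaze[1] - gaze[1]))
```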

[0189] Step S1103: Based on the mouth data in the facial data and the pre-trained lip-shape driving model, generate the lip-shape image of the digital human corresponding to the target user.

[0190] The input to the lip shape driving model is the mouth data of a user such as the target user, which can be a mouth image, a mouth curvature curve, or the like. The model outputs the corresponding lip shape image through inference.

[0191] The facial data can also include the target user's emotion, in which case the emotion and the mouth data can be combined when generating the lip shape image.

[0192] Step S1104: Based on the target eye image and the mouth shape image, obtain the facial image data of the digital human corresponding to the target user.

[0193] The target eye image and the lip shape image can be stitched together according to their positions on the face to obtain a complete facial image of the digital human, and thus the digital human's facial image data, such as the matrix corresponding to the facial image.
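For the upper/lower split described in [0177], stitching by position reduces to stacking the eye region above the mouth region. A minimal sketch, assuming both regions share the same width and channel count. The result is the matrix that serves as the facial image data.

```python
import numpy as np

def stitch_face(eye_region: np.ndarray, mouth_region: np.ndarray) -> np.ndarray:
    """Stack the upper-face (eye) image above the lower-face (mouth) image.

    Assumes an upper/lower split with regions of equal width and channels.
    """
    if eye_region.shape[1:] != mouth_region.shape[1:]:
        raise ValueError("regions must share width and channel count")
    return np.vstack([eye_region, mouth_region])  # the face image matrix
```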

[0194] Step S1105: Based on the association features of the facial data, determine the multi-view rendering table corresponding to the digital human of the target user.

[0195] Step S1106: Based on the multi-view rendering table, determine the pixel points in the facial image data mapped to each point in the 3D display.

[0196] Step S1107: Determine the pixel value of each point in the three-dimensional display based on the pixel values of multiple pixels within a preset range of the mapped pixel.

[0197] The preset range can correspond to a filtering window, such as a 2*2, 3*3, or other size window. Multiple pixels are sampled from the filtering window, and their average or weighted average is used as the pixel value of the corresponding point on the screen.

[0198] For example, the average of the pixel values of the four pixels adjacent to the mapped pixel in the up, down, left, and right directions can be used as the pixel value of the point on the three-dimensional display screen to which that pixel is mapped.
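A sketch of both filtering variants for a grayscale facial image: a centered square window (odd sizes only here; a 2*2 window would need an anchoring convention the source does not give) and the four-neighbor average from the example above. Pixels outside the image are skipped. This border handling is an assumption.

```python
import numpy as np

def filtered_value(image: np.ndarray, row: int, col: int,
                   radius: int = 1) -> float:
    """Mean of the (2*radius+1)^2 window centered on the mapped pixel,
    clipped at the image border."""
    r0, r1 = max(0, row - radius), min(image.shape[0], row + radius + 1)
    c0, c1 = max(0, col - radius), min(image.shape[1], col + radius + 1)
    return float(image[r0:r1, c0:c1].mean())

def four_neighbour_value(image: np.ndarray, row: int, col: int) -> float:
    """Mean of the up/down/left/right neighbors that lie inside the image."""
    h, w = image.shape[:2]
    neighbours = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    values = [image[r, c] for r, c in neighbours if 0 <= r < h and 0 <= c < w]
    return float(np.mean(values))
```

In a full renderer, one of these functions would be evaluated at the source pixel that the multi-view rendering table assigns to each screen point.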

[0199] Step S1108: Based on the pixel values of each point in the three-dimensional display, drive the three-dimensional display to display the adjusted digital human.

[0200] The 3D display is driven according to the pixel values of each point in the 3D display, so that each sub-pixel of the 3D display presents the corresponding color value, thereby displaying the adjusted digital human.

[0201] In this embodiment, only eye images are stored, while the lip shape image is obtained through inference, which reduces the memory required for image storage. A multi-view rendering table matching the associated features of the facial data establishes the mapping between points on the screen and pixels in the image, enabling display of the 3D digital human. In addition, determining the value of each mapped screen point from the pixel values of multiple pixels around the mapped pixel improves the display of the 3D digital human, increases the signal-to-noise ratio, and reduces abrupt changes in detail.

[0202] Optionally, the terminal device further includes a robotic arm, and the method further includes at least one of the following steps: when the user data further includes limb motion data, controlling the movement of the robotic arm based on the limb motion data; and sending collected odor data and/or tactile data, so that at least one head-mounted display device controls an odor synthesizer to synthesize an odor based on the odor data, and controls at least one of the vibration parameters, pressure, and temperature of a tactile glove attached to the head-mounted display device based on the tactile data.

[0203] For example, Figure 12 is a hardware block diagram of a terminal device provided in an embodiment of this application. As shown in Figure 12, the terminal device includes a processor, memory, display screen, display chip, camera / video camera, robotic arm, speaker, tactile sensor, odor sensor, and recording device.

[0204] The processor, combined with the display chip, enables interactive data processing methods applicable to terminal devices. Recording devices are used to record ambient sounds to obtain audio data; tactile sensors and odor sensors are used to collect tactile and odor data, respectively; and cameras / video cameras are used to collect images / videos. All of the aforementioned data are collectively referred to as multimedia data.

[0205] The terminal device's memory can store one or more multi-view rendering tables corresponding to each digital human. Each digital human can have one or more multi-view rendering tables. To reduce the storage space required for the multi-view rendering tables, they can be compressed.
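The source does not specify a compression scheme. As one lossless possibility, a table that stores a flattened source-pixel index per screen point (a hypothetical layout) can be narrowed to 32-bit integers and passed through zlib, with the shape kept in a small header.

```python
import zlib
import numpy as np

def compress_table(table: np.ndarray) -> bytes:
    """Losslessly compress a 2D integer rendering table with zlib.

    Assumes every entry fits in int32; the shape is written as a
    16-byte header so the table can be restored.
    """
    if table.ndim != 2:
        raise ValueError("this sketch handles 2D tables only")
    data = table.astype(np.int32)
    header = np.array(data.shape, dtype=np.int64).tobytes()  # 16 bytes
    return header + zlib.compress(data.tobytes(), 9)

def decompress_table(blob: bytes) -> np.ndarray:
    rows, cols = np.frombuffer(blob[:16], dtype=np.int64)
    data = np.frombuffer(zlib.decompress(blob[16:]), dtype=np.int32)
    return data.reshape(int(rows), int(cols))
```

Lookup tables of this kind tend to be highly regular, which is why a general-purpose compressor can shrink them substantially.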

[0206] The terminal device's memory also stores one or more digital human eye images or facial images. Each digital human has multiple eye images, with different eye images looking in different directions. There are also multiple facial images, with different facial expressions in different facial images.

[0207] Corresponding to the interactive data processing method for head-mounted devices provided in the foregoing embodiments, this application also provides an interactive data processing apparatus for head-mounted devices. The head-mounted display device is used to interact with at least one terminal device. The interactive data processing apparatus includes: a facial data sending module for sending collected facial data of a user, so that at least one terminal device adjusts the displayed digital person corresponding to the user based on the facial data; a multimedia data receiving module for receiving multimedia data, wherein the multimedia data includes image data collected by the image acquisition device of at least one terminal device; and a processing module for displaying a corresponding image on the display screen of the head-mounted display device according to the image data.

[0208] Optionally, the display screen of the head-mounted display device includes a first display screen corresponding to the user's left eye and a second display screen corresponding to the user's right eye. The processing module includes: an offset unit for generating left-eye and right-eye images respectively based on image data; and a synchronous display unit for synchronously displaying the left-eye and right-eye images through the first and second display screens respectively.

[0209] Optionally, the offset unit includes: a correction subunit, used to correct the viewing angle of the image data based on the viewing angle correction matrix to obtain the processed image under the viewing angle corresponding to the head-mounted display device; and an offset subunit, used to generate left-eye and right-eye images respectively based on the processed image.

[0210] Optionally, the device further includes a viewing angle correction matrix determination module, used to: acquire the height information and pitch angle of the image acquisition device of the terminal device, wherein the height information describes the height of the image acquisition device of the terminal device relative to the eyes of the displayed digital human; determine a translation matrix based on the height information; determine a perspective transformation matrix based on the pitch angle; and determine the viewing angle correction matrix based on the translation matrix and the perspective transformation matrix.
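A sketch of one way to build the two matrices under pinhole-camera assumptions that the source does not state. The translation is modeled as a vertical pixel shift of focal length times height divided by a hypothetical `subject_distance`, which is exact only for points at that distance. The perspective transform is modeled as the rotation homography K R K^-1 that re-levels a camera pitched downward. The intrinsics (`focal_px`, `cx`, `cy`), the sign conventions, and the multiplication order are all assumptions.

```python
import numpy as np

def translation_matrix(height: float, focal_px: float,
                       subject_distance: float) -> np.ndarray:
    """Vertical image shift compensating for a camera `height` above the
    digital human's eyes (image y axis points down)."""
    ty = -focal_px * height / subject_distance
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, ty],
                     [0.0, 0.0, 1.0]])

def perspective_matrix(pitch_rad: float, focal_px: float,
                       cx: float, cy: float) -> np.ndarray:
    """Homography K @ R @ K^-1 that re-levels a camera pitched down by pitch_rad."""
    K = np.array([[focal_px, 0.0, cx],
                  [0.0, focal_px, cy],
                  [0.0, 0.0, 1.0]])
    c, s = np.cos(pitch_rad), np.sin(pitch_rad)
    # Rotation about the x axis taking rays from the tilted camera frame
    # to a level frame (x right, y down, z forward).
    R = np.array([[1.0, 0.0, 0.0],
                  [0.0, c, s],
                  [0.0, -s, c]])
    return K @ R @ np.linalg.inv(K)

def viewing_angle_correction_matrix(height, pitch_rad, focal_px, cx, cy,
                                    subject_distance):
    # Apply the pitch correction first, then the height-induced shift.
    return (translation_matrix(height, focal_px, subject_distance)
            @ perspective_matrix(pitch_rad, focal_px, cx, cy))
```

With zero height and zero pitch the result is the identity, so the image passes through unchanged.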

[0211] Optionally, the device further includes a viewing angle correction matrix calibration module, used for: after placing the correction sample at a preset distance in front of the image acquisition device of the terminal device, acquiring first test image data of the correction sample based on the image acquisition device of the terminal device; the correction sample includes multiple corner points; after placing the correction sample at a preset distance in front of the display screen of the head-mounted display device, acquiring a second test image of the correction sample based on the image acquisition device of the head-mounted display device; and determining the viewing angle correction matrix based on the positions of the corner points in the first test image and the second test image.
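If the viewing angle correction matrix is treated as a 3x3 planar homography (an assumption; the source does not name the model), the corner correspondences from the two test images determine it through the direct linear transform (DLT) below. OpenCV's `cv2.findHomography` offers a robust alternative. With noisy corners, normalizing the coordinates first (Hartley normalization) improves conditioning.

```python
import numpy as np

def estimate_homography(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """DLT: find H with dst ~ H @ src from at least four corner pairs.

    src: corner positions in the terminal-camera test image, shape (N, 2).
    dst: the same corners in the head-mounted-display test image, shape (N, 2).
    """
    if src.shape != dst.shape or src.shape[0] < 4:
        raise ValueError("need at least four matching corner pairs")
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows, dtype=float)
    # The solution is the right singular vector for the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]
```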

[0212] Optionally, the offset subunit is specifically used for: obtaining the user's interpupillary distance and the focal length of the head-mounted display device; determining the offset distance of each pixel in the processed image based on the pixel's depth information, as well as the user's interpupillary distance and the focal length of the head-mounted display device; and offsetting each pixel in the processed image to the left and right based on the offset distance of each pixel in the processed image to obtain the left-eye image and the right-eye image.
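A sketch of the per-pixel offset under a standard pinhole stereo assumption that the source does not spell out. The disparity is focal length times interpupillary distance divided by depth, and each eye receives half of it. The focal length is in pixels, and the depth and interpupillary distance share one unit. Content moves right in the left-eye view and left in the right-eye view. Occlusion ordering and hole filling are not handled, and uncovered pixels stay zero.

```python
import numpy as np

def stereo_pair(image: np.ndarray, depth: np.ndarray,
                ipd: float, focal_px: float):
    """Shift each pixel horizontally by half its disparity to form left/right views.

    depth must be positive and in the same unit as ipd; focal_px is in pixels.
    """
    h, w = depth.shape
    if image.shape[:2] != (h, w):
        raise ValueError("image and depth map must have the same height and width")
    if np.any(depth <= 0):
        raise ValueError("depth must be positive")
    shift = np.rint(focal_px * ipd / (2.0 * depth)).astype(int)
    rows, cols = np.indices((h, w))
    left = np.zeros_like(image)
    right = np.zeros_like(image)
    # Left-eye view: content moves right; right-eye view: content moves left.
    for out, target in ((left, cols + shift), (right, cols - shift)):
        ok = (target >= 0) & (target < w)
        out[rows[ok], target[ok]] = image[rows[ok], cols[ok]]
    return left, right
```

Because nearer pixels have larger offsets, they separate more between the two views, which produces the stereoscopic depth cue.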

[0213] Optionally, the multimedia data also includes spatial audio data, which is obtained by the server processing the audio data collected by the terminal device based on the spatial model of the space where the terminal device is located; the device also includes an audio synchronization playback module, used to: synchronously play the spatial audio data when displaying the corresponding image.

[0214] Optionally, the device further includes at least one of the following modules: a limb data transmission module for transmitting collected limb movement data of the user, so that at least one terminal device controls the movement of the robotic arm based on the limb movement data; an odor synthesis module for controlling an odor synthesizer to synthesize odors based on odor data, in which case the multimedia data also includes odor data collected by the terminal device; and a tactile simulation module for controlling at least one of the vibration parameters, pressure, and temperature of a tactile glove attached to the head-mounted display device based on tactile data, in which case the multimedia data also includes tactile data collected by the terminal device.

[0215] The interactive data processing device provided in this embodiment can execute the interactive data processing method for head-mounted display devices provided in the above embodiments. Its implementation principle and technical effect are similar, and will not be described in detail here.

[0216] Corresponding to the interactive data processing method for terminal devices provided in the foregoing embodiments, this application also provides an interactive data processing apparatus for use with a terminal device. The terminal device interacts with at least one head-mounted display device. The interactive data processing apparatus includes: a multimedia data sending module for sending collected multimedia data to enable at least one head-mounted display device to display a corresponding image based on the image data in the multimedia data; a user data receiving module for receiving user data of a target user collected by at least one head-mounted display device, the user data including facial data; and a digital human processing module for adjusting the digital human corresponding to the target user displayed on the terminal device's screen based on the facial data.

[0217] Optionally, the terminal device's display is a 3D display, and the digital human processing module includes: a facial image generation unit, used to generate facial image data of a digital human corresponding to the target user based on facial data; a rendering table determination unit, used to determine the multi-view rendering table corresponding to the digital human corresponding to the target user based on the associated features of the facial data; and a mapping unit, used to map the facial image data to the 3D display based on the multi-view rendering table to adjust the displayed digital human corresponding to the target user.

[0218] Optionally, the mapping unit is specifically used to: determine the pixels in the facial image data mapped to each point in the 3D display based on the multi-view rendering table; determine the pixel value of each point in the 3D display based on the pixel values of multiple pixels within a preset range of the mapped pixels; and drive the 3D display to display the adjusted digital human based on the pixel values of each point in the 3D display.

[0219] Optionally, the facial image generation unit is specifically used for: determining the gaze direction based on eye data in the facial data; determining the target eye image from multiple eye images of the digital human corresponding to the target user based on the gaze direction; generating a lip shape image of the digital human corresponding to the target user based on mouth data in the facial data and a pre-trained lip shape driving model; and obtaining facial image data of the digital human corresponding to the target user based on the target eye image and the lip shape image.

[0220] Optionally, the facial image generation unit is specifically used to: determine, based on the facial data, a target lip shape image from multiple lip shape images and a target facial image from multiple facial images of the digital human corresponding to the target user, wherein the lip region of each facial image shows a default lip shape or is filled with a default color; and fuse the target lip shape image with the target facial image to obtain the facial image data of the digital human corresponding to the target user.

[0221] Optionally, the terminal device also includes a robotic arm, and the device further includes at least one of the following modules: a robotic arm control module for controlling the movements of the robotic arm based on limb motion data when the user data includes limb motion data; and an odor/tactile data transmission module for transmitting collected odor data and/or tactile data, so that at least one head-mounted display device controls an odor synthesizer to synthesize an odor based on the odor data, and controls at least one of the vibration parameters, pressure, and temperature of a tactile glove attached to the head-mounted display device based on the tactile data.

[0222] The interactive data processing device provided in this embodiment can execute the interactive data processing method for terminal devices provided in the above embodiment. Its implementation principle and technical effect are similar, and will not be described in detail here.

[0223] Figure 13 is a schematic structural diagram of an interactive data processing device provided in an embodiment of this application. As shown in Figure 13, the interactive data processing device provided in this embodiment includes a memory 1302 and at least one processor 1301. The memory 1302 stores computer-executable instructions.

[0224] In a specific implementation, the at least one processor 1301 executes the computer-executable instructions stored in the memory 1302, causing the at least one processor 1301 to perform the method described above.

[0225] Optionally, the interactive data processing device further includes a communication component 1303. The processor 1301, memory 1302, and communication component 1303 are connected via a bus 1304.

[0226] The specific implementation process of processor 1301 can be found in the above method embodiments, and its implementation principle and technical effect are similar. It will not be repeated here.

[0227] In the above embodiments, it should be understood that the processor can be a Central Processing Unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), etc. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the method disclosed in this invention can be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules within the processor.

[0228] The memory may include random access memory (RAM) and may also include non-volatile memory (NVM), such as at least one disk storage device.

[0229] The bus can be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. Buses can be categorized as address buses, data buses, control buses, and so on. For ease of illustration, the bus shown in the accompanying drawings does not imply that there is only one bus or only one type of bus.

[0230] This application also provides a head-mounted display device, including a fixing component and a main body. The main body includes a first display screen, a first data acquisition unit, and a first processor. The fixing component is used to fix the main body to the user's head when the user wears the head-mounted display device. The first data acquisition unit is used to acquire the user's facial data. The first processor is used to execute the interactive data processing method applied to the head-mounted display device.

[0231] This application also provides a terminal device, including: a second display screen, a second data acquisition unit, and a second processor; the second data acquisition unit is used to acquire multimedia data from the external environment; the second processor is used to execute an interactive data processing method applied to the terminal device.

[0232] This application also provides an interactive system, including at least one of the aforementioned head-mounted display devices and at least one of the aforementioned terminal devices.

[0233] Furthermore, the interactive system also includes a server for forwarding some of the data used for interaction between the head-mounted display device and the terminal device.

[0234] Figure 14 is a software block diagram of an interactive system provided in an embodiment of this application. As shown in Figure 14, the interactive system includes a cloud server, a remote robot, and a head-mounted display device.

[0235] The cloud server is used to send multimedia data collected by the remote robot to the head-mounted display device, and to send user data collected by the head-mounted display device, including facial data and body movement data, to the remote robot.

[0236] The cloud server is also used to input user frontal photos captured by head-mounted display devices into the digital human driving model, obtaining multiple facial images, or multiple eye images and multiple mouth images, or other split images describing different forms of the digital human, and storing these images. The cloud server is also used to perform spatial audio processing on audio data collected by remote robots, obtaining spatial audio data.

[0237] For the facial data in the user data collected by the head-mounted display device, the remote robot uses a facial capture algorithm to infer the expression of the lower half of the user's face, such as lip movements, and obtains a lip movement image of the digital human. Using an eye-tracking algorithm, it identifies the user's gaze direction and, based on that gaze direction, determines the target eye image from the eye images stored on the cloud server. Stitching the two together yields a complete facial image of the digital human. Through 3D rendering, the complete facial image is mapped onto the screen using the multi-view rendering table corresponding to the digital human, realizing the display and adjustment of the three-dimensional digital human.

[0238] The remote robot is also used to extract the corresponding movements from the user's limb movement data through motion capture algorithms, and control the movement of the remote robot's robotic arm based on the extracted movements, so that the robotic arm can perform the corresponding movements.

[0239] In an interactive system, there can be one or more head-mounted display devices, and there can also be one or more remote robots. A head-mounted display device can interact with one or more remote robots, and a remote robot can also interact with one or more head-mounted display devices.

[0240] This application also provides a computer program product, including a computer program that, when executed by a processor, implements the above-described method.

[0241] This application also provides a computer-readable storage medium storing computer-executable instructions, which, when executed by a processor, implement the above-described method.

[0242] The aforementioned readable storage medium can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk. The readable storage medium can be any available medium accessible to a general-purpose or special-purpose computer.

[0243] An exemplary readable storage medium is coupled to a processor, enabling the processor to read information from and write information to the readable storage medium. Of course, the readable storage medium can also be a component of the processor. The processor and the readable storage medium can reside in an Application Specific Integrated Circuit (ASIC). Alternatively, the processor and the readable storage medium can exist as discrete components in the device.

[0244] The division of units is merely a logical functional division; in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or other forms.

[0245] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0246] In addition, the functional units in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.

[0247] If a function is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this invention, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of the various embodiments of this invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0248] Those skilled in the art will understand that all or part of the steps of the above-described method embodiments can be implemented by hardware related to program instructions. The aforementioned program can be stored in a computer-readable storage medium. When executed, the program performs the steps of the above-described method embodiments; and the aforementioned storage medium includes various media capable of storing program code, such as ROM, RAM, magnetic disks, or optical disks.

[0249] Finally, it should be noted that other embodiments of the invention will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention that follow the general principles of the invention and include common knowledge or customary techniques in the art not disclosed herein, and is not limited to the precise structures described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of the invention is limited only by the appended claims.

Claims

1. An interactive data processing method, applied to a head-mounted display device for interacting with at least one terminal device, characterized in that the method includes: sending collected facial data of a user, so that the at least one terminal device adjusts the displayed digital persona corresponding to the user based on the facial data; receiving multimedia data, wherein the multimedia data includes image data acquired by the image acquisition device of the at least one terminal device; and displaying, based on the image data, a corresponding image on the display screen of the head-mounted display device.

2. The method of claim 1, wherein the display screen of the head-mounted display device includes a first display screen corresponding to the user's left eye and a second display screen corresponding to the user's right eye, and displaying, based on the image data, the corresponding image on the display screen of the head-mounted display device includes: generating a left-eye image and a right-eye image based on the image data; and displaying the left-eye image and the right-eye image synchronously on the first display screen and the second display screen, respectively.

3. The method of claim 2, wherein generating the left-eye image and the right-eye image based on the image data includes: correcting the viewing angle of the image data based on a viewing angle correction matrix to obtain a processed image under the viewing angle corresponding to the head-mounted display device; and generating the left-eye image and the right-eye image based on the processed image.

4. The method of claim 3, further including: obtaining height information and a pitch angle of the image acquisition device of the terminal device, wherein the height information describes the height of the image acquisition device of the terminal device relative to the eyes of the displayed digital human; determining a translation matrix based on the height information; determining a perspective transformation matrix based on the pitch angle; and determining the viewing angle correction matrix based on the translation matrix and the perspective transformation matrix.
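Claim 4 builds the correction matrix from a translation term (height) and a perspective term (pitch) but does not give formulas. The following is a minimal pure-Python sketch under stated assumptions: a pinhole camera with focal length `focal_px` and principal point (`cx`, `cy`) in pixels; a hypothetical viewing distance `distance_m` that converts the camera-to-eye height offset into a vertical pixel shift; and a rotation-only homography K·R·K⁻¹ for the pitch. The sign conventions and the composition order (perspective first, then translation) are assumptions, not taken from the patent.

```python
import math

Matrix = list[list[float]]


def matmul3(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def translation_matrix(height_m: float, focal_px: float, distance_m: float) -> Matrix:
    # Hypothetical model: a camera-to-eye height offset seen at distance_m
    # becomes a vertical image shift of focal_px * height_m / distance_m pixels.
    dy = focal_px * height_m / distance_m
    return [[1.0, 0.0, 0.0], [0.0, 1.0, dy], [0.0, 0.0, 1.0]]


def perspective_matrix(pitch_rad: float, focal_px: float, cx: float, cy: float) -> Matrix:
    # Rotation-only homography H = K * R_x(-pitch) * K^-1, which re-renders the
    # image as if the camera had zero pitch (pinhole intrinsics assumed).
    c, s = math.cos(-pitch_rad), math.sin(-pitch_rad)
    k = [[focal_px, 0.0, cx], [0.0, focal_px, cy], [0.0, 0.0, 1.0]]
    r = [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]
    k_inv = [[1.0 / focal_px, 0.0, -cx / focal_px],
             [0.0, 1.0 / focal_px, -cy / focal_px],
             [0.0, 0.0, 1.0]]
    return matmul3(matmul3(k, r), k_inv)


def correction_matrix(height_m, pitch_rad, focal_px, cx, cy, distance_m) -> Matrix:
    # Combined correction: perspective (pitch) first, then translation (height).
    return matmul3(translation_matrix(height_m, focal_px, distance_m),
                   perspective_matrix(pitch_rad, focal_px, cx, cy))


def apply_homography(m: Matrix, x: float, y: float) -> tuple[float, float]:
    """Map pixel (x, y) through a 3x3 matrix in homogeneous coordinates."""
    u = m[0][0] * x + m[0][1] * y + m[0][2]
    v = m[1][0] * x + m[1][1] * y + m[1][2]
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return u / w, v / w
```

With zero height and zero pitch the result is the identity; with pitch alone, the principal point moves vertically by `focal_px * tan(pitch)`, which is the expected behavior of a pure-rotation homography.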

5. The method of claim 3, further including: after a calibration sample is placed at a preset distance directly in front of the image acquisition device of the terminal device, acquiring a first test image of the calibration sample with the image acquisition device of the terminal device, the calibration sample including multiple corner points; after the calibration sample is placed at the preset distance directly in front of the display screen of the head-mounted display device, acquiring a second test image of the calibration sample with the image acquisition device of the head-mounted display device; and determining the viewing angle correction matrix based on the positions of the corner points in the first test image and the second test image.
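Claim 5 derives the correction matrix from matching corner points rather than from measured height and pitch. One standard way to do this, assumed here rather than stated in the patent, is to fit a planar homography with the direct linear transform (DLT), fixing the bottom-right entry to 1 and solving the least-squares normal equations so that more than four corners can be used. The sketch is pure Python for self-containment; a production system would more likely use OpenCV's `findHomography` with coordinate normalization and outlier rejection.

```python
Point = tuple[float, float]


def solve_linear(a, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner layout (singular system)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography_from_corners(src: list[Point], dst: list[Point]):
    """Least-squares DLT homography (h33 = 1) mapping corners in the first test
    image (src) to the same corners in the second test image (dst)."""
    if len(src) != len(dst) or len(src) < 4:
        raise ValueError("need at least 4 matching corners")
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u * (h31*x + h32*y + 1) = h11*x + h12*y + h13, and likewise for v.
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        rhs.append(v)
    # Normal equations (A^T A) h = A^T b accept any number of corners >= 4.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    atb = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(8)]
    h = solve_linear(ata, atb)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]
```

For example, if every corner in the second image sits at twice its first-image coordinates plus an offset of (2, 3), the fitted matrix is `[[2, 0, 2], [0, 2, 3], [0, 0, 1]]`.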

6. The method of claim 3, wherein generating the left-eye image and the right-eye image respectively based on the processed image includes: obtaining the user's interpupillary distance and the focal length of the head-mounted display device; for each pixel in the processed image, determining an offset distance of the pixel based on the depth information of the pixel, the user's interpupillary distance, and the focal length of the head-mounted display device; and offsetting each pixel in the processed image to the left and to the right according to its offset distance to obtain the left-eye image and the right-eye image.
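The per-pixel offset in claim 6 resembles depth-image-based rendering. A minimal sketch, assuming a parallel-axis pinhole model in which the full disparity is `focal_px * ipd_m / depth_m` pixels and each eye receives half of it (left-eye image shifted right, right-eye image shifted left). The half-and-half split, the rounding, and the z-buffer tie-break are illustrative choices, not details from the patent.

```python
import math


def offset_distance(depth_m: float, ipd_m: float, focal_px: float) -> float:
    """Full stereo disparity in pixels for a pixel at depth_m (pinhole model)."""
    return focal_px * ipd_m / depth_m


def synthesize_stereo(image, depth, ipd_m, focal_px):
    """Forward-warp each pixel half its disparity to the right (left-eye image)
    and to the left (right-eye image)."""
    h, w = len(image), len(image[0])
    left = [[None] * w for _ in range(h)]
    right = [[None] * w for _ in range(h)]
    # Z-buffers: when two source pixels land on the same target, the nearer wins.
    z_left = [[math.inf] * w for _ in range(h)]
    z_right = [[math.inf] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            z = depth[y][x]
            half = offset_distance(z, ipd_m, focal_px) / 2.0
            for out, zbuf, target_x in ((left, z_left, x + half),
                                        (right, z_right, x - half)):
                nx = math.floor(target_x + 0.5)  # round half up
                if 0 <= nx < w and z < zbuf[y][nx]:
                    zbuf[y][nx] = z
                    out[y][nx] = image[y][x]
    return left, right  # None marks disocclusion holes to be filled later
```

Nearer pixels receive larger offsets, which is what produces the depth impression; the `None` holes at the borders would normally be filled by inpainting.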

7. The method according to any one of claims 1 to 6, wherein the multimedia data further includes spatial audio data, the spatial audio data being obtained by a server processing audio data collected by the terminal device based on a spatial model of the space in which the terminal device is located; and the method further includes: playing the spatial audio data synchronously while the corresponding image is displayed.
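Claim 7 leaves the server's spatial model unspecified. To illustrate only the basic idea of rendering a mono capture as direction-dependent stereo, the sketch below swaps in a much simpler technique: constant-power panning by azimuth plus inverse-distance attenuation. It does not reproduce any room model, and the parameter names (`azimuth_deg`, `distance_m`, `ref_distance_m`) are hypothetical.

```python
import math


def spatialize(samples, azimuth_deg, distance_m, ref_distance_m=1.0):
    """Render a mono signal to stereo with constant-power panning and 1/r attenuation.

    azimuth_deg runs from -90 (hard left) through 0 (straight ahead) to +90 (hard right).
    """
    az = max(-90.0, min(90.0, azimuth_deg))
    pan = (az + 90.0) / 180.0                                 # 0 = left, 1 = right
    gain = ref_distance_m / max(distance_m, ref_distance_m)   # inverse-distance law
    g_left = math.cos(pan * math.pi / 2) * gain
    g_right = math.sin(pan * math.pi / 2) * gain
    return [(s * g_left, s * g_right) for s in samples]
```

Because cos² + sin² = 1, the total power stays constant as a source moves across the stereo field, so only distance changes perceived loudness.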

8. The method according to any one of claims 1 to 6, further including at least one of the following steps: sending collected body movement data of the user, so that the at least one terminal device controls the movement of a robotic arm based on the body movement data; where the multimedia data further includes odor data collected by the terminal device, controlling an odor synthesizer to synthesize odors based on the odor data; and where the multimedia data further includes tactile data collected by the terminal device, controlling at least one of the vibration parameters, pressure, and temperature of a tactile glove bound to the head-mounted display device based on the tactile data.

9. An interactive data processing method, characterized in that the method is applied to a terminal device for interacting with at least one head-mounted display device and includes: transmitting collected multimedia data, so that the at least one head-mounted display device displays a corresponding image based on the image data in the multimedia data; receiving user data of a target user collected by the at least one head-mounted display device, the user data including facial data; and adjusting, based on the facial data, the digital human corresponding to the target user displayed on the display screen of the terminal device.
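Claims 1 and 9 describe the same bidirectional exchange from the two ends: facial data flows from the head-mounted display to the terminal, and multimedia data flows back. A minimal in-process sketch of that message flow, in which in-memory queues stand in for the network link and facial data is assumed, hypothetically, to be a dictionary of expression coefficients. The class and method names are illustrative, not the patent's.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class FacialData:
    # Hypothetical encoding: expression coefficients such as {"jaw_open": 0.4}.
    coefficients: dict


@dataclass
class MultimediaData:
    image: list  # frame captured by the terminal's image acquisition device


class TerminalDevice:
    def __init__(self):
        self.digital_human = {}  # current expression state of the displayed avatar
        self.outbox = deque()

    def transmit_multimedia(self, frame):
        self.outbox.append(MultimediaData(frame))

    def receive_user_data(self, data: FacialData):
        # Adjust the displayed digital human from the received facial data.
        self.digital_human.update(data.coefficients)


class HeadMountedDisplay:
    def __init__(self):
        self.displayed_image = None
        self.outbox = deque()

    def send_facial_data(self, coefficients):
        self.outbox.append(FacialData(dict(coefficients)))

    def receive_multimedia(self, data: MultimediaData):
        self.displayed_image = data.image  # stand-in for stereo rendering


def exchange(hmd: HeadMountedDisplay, terminal: TerminalDevice):
    """Deliver all queued messages in both directions (stand-in for a network link)."""
    while hmd.outbox:
        terminal.receive_user_data(hmd.outbox.popleft())
    while terminal.outbox:
        hmd.receive_multimedia(terminal.outbox.popleft())
```

Keeping the two directions as independent queues mirrors the claims, where sending facial data and receiving multimedia data are separate steps with no required ordering between them.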

10. The method according to claim 9, wherein the display screen of the terminal device is a three-dimensional display, and adjusting, based on the facial data, the digital human corresponding to the target user displayed on the display screen of the terminal device includes: generating, based on the facial data, facial image data of the digital human corresponding to the target user; determining, based on associated features of the facial data, a multi-view rendering table corresponding to the digital human of the target user; and mapping the facial image data onto the three-dimensional display based on the multi-view rendering table, so as to adjust the digital human corresponding to the target user.

11. The method according to claim 10, wherein mapping the facial image data onto the three-dimensional display based on the multi-view rendering table includes: determining, based on the multi-view rendering table, the pixel in the facial image data to which each point of the three-dimensional display is mapped; determining the pixel value of each point of the three-dimensional display based on the pixel values of multiple pixels within a preset range of the mapped pixel; and driving the three-dimensional display to display the adjusted digital human based on the pixel values of the points of the three-dimensional display.
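Claim 11 computes each display point's value from several pixels around its mapped pixel rather than from the single mapped pixel. A minimal sketch, assuming the multi-view rendering table is a dictionary from display point to mapped pixel coordinates and that the "preset range" is a square window of hypothetical half-width `radius` averaged with a box filter. Real lenticular or light-field drivers use device-specific tables and may weight neighbors differently.

```python
def render_display_points(face_image, rendering_table, radius=1):
    """Compute a value for each 3D-display point from a neighborhood of its mapped pixel.

    rendering_table: hypothetical dict {display_point: (px, py)} giving the
    facial-image pixel that each display point samples for its view.
    """
    h, w = len(face_image), len(face_image[0])
    values = {}
    for point, (px, py) in rendering_table.items():
        # Collect pixels within the preset range, clipped at the image border.
        window = [face_image[y][x]
                  for y in range(max(0, py - radius), min(h, py + radius + 1))
                  for x in range(max(0, px - radius), min(w, px + radius + 1))]
        values[point] = sum(window) / len(window)  # box-filter average
    return values
```

Averaging over a small window acts as a low-pass filter, which reduces aliasing when many display points sample a facial image of different resolution.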

12. The method according to claim 10, wherein generating, based on the facial data, the facial image data of the digital human corresponding to the target user includes: determining a gaze direction based on eye data in the facial data; determining, based on the gaze direction, a target eye image from multiple eye images of the digital human corresponding to the target user stored in a database; generating a lip-shape image of the digital human corresponding to the target user based on mouth data in the facial data and a pre-trained lip-shape driving model; and obtaining the facial image data of the digital human corresponding to the target user based on the target eye image and the lip-shape image.
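The eye-image lookup in claim 12 can be read as a nearest-neighbor search over stored gaze directions. A minimal sketch, assuming the eye data has already been reduced to yaw and pitch angles and that each database entry is a hypothetical `(yaw_deg, pitch_deg, image)` tuple; the closest entry is chosen by cosine similarity of unit gaze vectors. The pre-trained lip-shape driving model is not sketched here.

```python
import math


def gaze_vector(yaw_deg, pitch_deg):
    """Unit gaze direction from yaw/pitch angles (x right, y up, z forward)."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    return (math.cos(pitch) * math.sin(yaw),
            math.sin(pitch),
            math.cos(pitch) * math.cos(yaw))


def select_eye_image(yaw_deg, pitch_deg, eye_database):
    """Return the stored eye image whose gaze is angularly closest to the current gaze.

    eye_database: hypothetical list of (yaw_deg, pitch_deg, image) entries.
    """
    g = gaze_vector(yaw_deg, pitch_deg)

    def similarity(entry):
        v = gaze_vector(entry[0], entry[1])
        return sum(a * b for a, b in zip(g, v))  # cosine of the angle (unit vectors)

    return max(eye_database, key=similarity)[2]
```

Comparing unit vectors instead of raw angle differences avoids wrap-around problems and treats yaw and pitch consistently.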

13. The method according to claim 10, wherein generating, based on the facial data, the facial image data of the digital human corresponding to the target user includes: determining, based on the facial data, a target lip-shape image and a target facial image from multiple lip-shape images and multiple facial images, respectively, of the digital human corresponding to the target user, wherein the part of each facial image corresponding to the lips has a default lip shape or a default color; and fusing the target lip-shape image and the target facial image to obtain the facial image data of the digital human corresponding to the target user.
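Claim 13 does not say how the fusion is performed. Because the lip region of each facial image holds a default shape or color, a mask-based alpha blend is one plausible approach, assumed here: the lip-shape image replaces the face inside a lip mask, with fractional mask weights softening the seam. Images are plain 2D lists of grayscale values, and the mask is a hypothetical input of the same size.

```python
def fuse_lip_into_face(face, lips, mask):
    """Alpha-blend the target lip-shape image into the target facial image.

    mask holds weights in [0, 1]: 1 inside the lip region, 0 elsewhere, with
    fractional values along the border for a soft seam.
    """
    return [[f * (1.0 - m) + l * m for f, l, m in zip(frow, lrow, mrow)]
            for frow, lrow, mrow in zip(face, lips, mask)]
```

A feathered mask hides the boundary between the default lip region and the inserted lip shape; a hard 0/1 mask would leave a visible edge.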

14. The method according to any one of claims 9 to 13, wherein the terminal device further includes a robotic arm, and the method further includes at least one of the following steps: where the user data further includes limb movement data, controlling movements of the robotic arm based on the limb movement data; and transmitting collected odor data and/or tactile data, so that the at least one head-mounted display device controls an odor synthesizer to synthesize odors based on the odor data and controls at least one of the vibration parameters, pressure, and temperature of a tactile glove attached to the head-mounted display device based on the tactile data.

15. An interactive data processing apparatus, characterized in that the apparatus is applied to a head-mounted display device for interacting with at least one terminal device and includes: a facial data transmission module, configured to transmit collected facial data of a user, so that the at least one terminal device adjusts the displayed digital human corresponding to the user based on the facial data; a multimedia data receiving module, configured to receive multimedia data, wherein the multimedia data includes image data acquired by the image acquisition device of the at least one terminal device; and a processing module, configured to display a corresponding image on the display screen of the head-mounted display device according to the image data.

16. An interactive data processing apparatus, characterized in that the apparatus is applied to a terminal device for interacting with at least one head-mounted display device and includes: a multimedia data transmission module, configured to transmit collected multimedia data, so that the at least one head-mounted display device displays a corresponding image based on the image data in the multimedia data; a user data receiving module, configured to receive user data of a target user collected by the at least one head-mounted display device, the user data including facial data; and a digital human processing module, configured to adjust, based on the facial data, the digital human corresponding to the target user displayed on the display screen of the terminal device.

17. An interactive data processing device, characterized by including: a memory and a processor; wherein the memory stores computer-executable instructions; and the processor executes the computer-executable instructions stored in the memory, causing the processor to perform the method according to any one of claims 1 to 14.

18. A head-mounted display device, characterized by including a fixing component and a main body, wherein the main body includes a first display screen, a first data acquisition unit, and a first processor; the fixing component is configured to fix the main body to the user's head when the user wears the head-mounted display device; the first data acquisition unit is configured to acquire the user's facial data; and the first processor is configured to perform the method according to any one of claims 1 to 8.

19. A terminal device, characterized by including: a second display screen, a second data acquisition unit, and a second processor; wherein the second data acquisition unit is configured to acquire multimedia data from the external environment; and the second processor is configured to perform the method according to any one of claims 9 to 14.

20. An interactive system, characterized by including at least one head-mounted display device according to claim 18 and at least one terminal device according to claim 19.

21. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions which, when executed by a processor, implement the method according to any one of claims 1 to 14.

22. A computer program product, characterized by including a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 14.