Screen recording method for rendered picture, and head-mounted display device, readable medium and product
By acquiring and processing the pose information and shared texture data of the head-mounted display device, distortion correction and temporal smoothing are performed, solving the problems of screen tearing and jitter in screen recording of rendering images from the head-mounted display device, and improving the quality of the recorded video.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- HANGZHOU LINGBAN TECH CO LTD
- Filing Date
- 2026-02-13
- Publication Date
- 2026-07-02
AI Technical Summary
In existing technologies, screen recording of rendering images from head-mounted display devices suffers from issues such as screen tearing, snow-like spots, and frequent jitter, resulting in a decrease in the quality of the recorded images.
By acquiring the pose information of the head-mounted display device, the target frame to be rendered, and the shared texture data of the target application process, distortion correction and temporal smoothing are performed. Combined with low-latency rendering technology, high-quality rendered frames are generated and screen recording is performed.
It effectively avoids screen tearing and snow in recorded videos, reduces recording latency, minimizes video jitter, and improves the quality of recorded footage.
Description
Screen recording methods for rendering images, head-mounted display devices, readable media and products
[0001] Cross-references to related applications
[0002] This application claims priority to Chinese Patent Application No. 202411947901.1, filed with the Chinese Patent Office on December 26, 2024, the entire contents of which are incorporated herein by reference.
Technical Field
[0003] Embodiments of this disclosure relate to the field of computer technology, and more specifically to screen recording methods for rendered images, head-mounted display devices, readable media, and products.
Background Technology
[0004] Screen recording technology for head-mounted displays involves projecting a superimposed image of virtual and real images onto an external screen and recording it as video. The typical method for recording the rendered image is as follows: the application process independently renders the image onto the screen using a single-pass stereoscopic rendering algorithm and an auto-refresh rendering algorithm, obtaining a series of raw rendered frames. Then, the server uses a virtual screen display method (SurfaceFlinger) to composite these raw rendered frames, generating a virtual screen image. Next, this virtual screen image is rendered onto a canvas (Surface). Finally, the virtual screen image on the canvas is encoded and recorded.
[0005] However, in practice, it has been found that when the above method is used to record the rendered image, the following technical problems often occur. First, because a single-pass stereoscopic rendering algorithm and an auto-refresh rendering algorithm are used, the server cannot accurately obtain the rendering status of the image, resulting in screen tearing or snow-like spots in the recorded image, which reduces the quality of the recorded image. Second, the image is rendered based on three-degree-of-freedom pose information, and head-mounted display devices are highly sensitive: even the slightest movements of the human body are reflected in the rendered image, causing frequent jitter in the recording, which reduces the quality of the recorded image.
Summary of the Invention
[0006] The summary portion of this disclosure is intended to provide a brief overview of the concepts, which will be described in detail in the detailed description portion. This summary portion is not intended to identify key or essential features of the claimed technical solutions, nor is it intended to limit the scope of the claimed technical solutions.
[0007] Some embodiments of this disclosure provide a screen recording method for rendered images, a head-mounted display device, a readable medium, and a product to address one or more of the technical problems mentioned in the background section above.
[0008] In a first aspect, some embodiments of this disclosure provide a screen recording method for rendering a screen, including: acquiring pose information of a head-mounted display device, a target screen frame to be rendered, and shared texture data of a target application process; determining distortion correction information of the target screen frame to be rendered based on the pose information; in response to detecting jitter in a historical augmented reality screen recording video, performing time-dimensional smoothing processing on the pose information to obtain smoothed pose information, wherein the historical augmented reality screen recording video is a screen recording video corresponding to each screen frame to be rendered before the target screen frame to be rendered; and rendering the target screen frame to be rendered as a whole based on the distortion correction information, the smoothed pose information, and the shared texture data to obtain a rendered screen frame, which is then used by a server to perform screen recording processing on the historical augmented reality screen recording video and the rendered screen frame to obtain an augmented reality screen recording video.
[0009] In one embodiment, the method further includes: in response to detecting that the target application process has finished rendering and that the current application process meets the conditions for starting the rendering operation, acquiring the shared texture data corresponding to the current application process, the pose information of the head-mounted display device, and the target frame to be rendered, so as to perform overall rendering of the screen.
[0010] In one possible implementation, determining the distortion correction information of the target image frame to be rendered based on the pose information includes: acquiring device calibration information and image display position information of the head-mounted display device, wherein the image display position information is position information extracted from the pose information at the previous time point; determining a first position difference between the image display position information and the pose information; determining the target mapping display position information of the target image frame to be rendered based on the first position difference; performing anti-distortion fusion processing on the device calibration information and the first position difference to obtain a second position difference; and determining the target mapping display position information and the second position difference as distortion correction information.
[0011] In one possible implementation, the above-mentioned temporal smoothing processing of the pose information to obtain smoothed pose information includes: determining the average pose information of the target frame to be rendered and the previous frame to be rendered; determining the frame rotation angle based on the average pose information; determining the product of the derivative of the frame rotation angle with respect to time and the angular velocity in the pose information as a frame offset; determining the frame smoothness of the target frame to be rendered based on the frame offset; and generating smoothed pose information based on the frame smoothness and the average pose information.
[0012] In one possible implementation, the method further includes: controlling the low-latency rendering thread included in the target application process, performing low-latency parallel rendering on the target screen frame to be rendered according to the shared texture data, to obtain low-latency rendered screen frames; and determining each obtained low-latency rendered screen frame as a projection video, wherein each of the low-latency rendered screen frames is a screen frame obtained through low-latency rendering.
[0013] In one possible implementation, the acquisition of pose information of the head-mounted display device, the target frame to be rendered, and shared texture data of the target application process includes: controlling the low-latency rendering thread and the screen recording rendering thread included in the target application process to share an OpenGL (Open Graphics Library) context, wherein the OpenGL context is the rendering state and rendering resource information created by the low-latency rendering thread or the screen recording rendering thread; acquiring shared texture data through the OpenGL context; acquiring pose information through the head-mounted display device; and acquiring the target frame to be rendered corresponding to the target application process.
[0014] In one possible implementation, the augmented reality screen recording video is obtained through the following steps: in response to determining that a communication connection is established with the target application process through an image canvas communication mechanism, an initial rendered frame corresponding to the target application process is obtained, and a recording operation is started, wherein the initial rendered frame is the first rendered frame in the historical augmented reality screen recording video; in response to detecting that the recording operation has been successfully started, video encoding is performed on the historical augmented reality screen recording video and the rendered frame to obtain an augmented reality screen recording video bitstream, which serves as the augmented reality screen recording video.
[0015] In a second aspect, some embodiments of this disclosure provide a head-mounted display device, including: one or more processors; a storage device for storing one or more programs; at least one display screen and optical elements for imaging in front of a user; and when one or more programs are executed by one or more processors, causing the one or more processors to implement the method described in any implementation of the first aspect above.
[0016] In a third aspect, some embodiments of this disclosure provide a computer-readable medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method described in any implementation of the first aspect above.
[0017] In a fourth aspect, some embodiments of this disclosure provide a computer program product, including a computer program that, when executed by a processor, implements the method described in any implementation of the first aspect above.
[0018] The above embodiments of this disclosure have the following beneficial effects: the screen recording method for rendered images in some embodiments of this disclosure avoids screen tearing and snow-like spots in the recorded video, reduces the latency of the recorded video, and avoids jitter in the recorded video. Specifically, tearing, snow-like spots, and frequent jitter degrade the recorded image for two reasons. First, because a single-pass stereoscopic rendering algorithm and an auto-refresh rendering algorithm are used, the server cannot accurately obtain the rendering status of the image, which causes screen tearing or snow-like spots in the recorded image. Second, the image is rendered based on three-degree-of-freedom pose information, and the head-mounted display device is highly sensitive, so even small movements of the human body are reflected in the rendered image and cause frequent jitter in the recording. Based on this, the screen recording method of some embodiments of this disclosure first acquires the pose information of the head-mounted display device, the target frame to be rendered, and the shared texture data of the target application process, all of which are used in the subsequent rendering of the target frame to be rendered. Second, distortion correction information for the target frame to be rendered is determined based on the pose information. This distortion correction information removes distortion from the virtual image in the head-mounted display device and corrects the pose information, which reduces rendering latency and image distortion in the video subsequently recorded by the server and thus improves image quality. Then, in response to detecting jitter in the historical augmented reality screen recording video, temporal smoothing is performed on the pose information to obtain smoothed pose information, where the historical augmented reality screen recording video is the screen recording video corresponding to the frames to be rendered that precede the target frame to be rendered. Because head-mounted display devices are highly sensitive, even slight human body movements are reflected in the virtual reality image and cause jitter; temporal smoothing counteracts this jitter and so avoids jitter in the recorded video. Finally, based on the distortion correction information, the smoothed pose information, and the shared texture data, the target frame to be rendered is rendered as a whole to obtain a rendered frame, which the server uses, together with the historical augmented reality screen recording video, to obtain an augmented reality screen recording video. Rendering the target frame as a whole avoids screen tearing and snow-like spots, improves the quality of the recorded video, and further reduces rendering latency.
Attached Figure Description
[0019] The above and other features, advantages, and aspects of the embodiments of this disclosure will become more apparent from the accompanying drawings and the following detailed description. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic, and elements are not necessarily drawn to scale.
[0020] Figure 1 is a schematic diagram of an application scenario of the screen recording method for rendering screens according to some embodiments of the present disclosure;
[0021] Figure 2 is a flowchart of some embodiments of the screen recording method for rendering screens according to the present disclosure;
[0022] Figure 3 is a flowchart of some other embodiments of the screen recording method for rendering screens according to the present disclosure;
[0023] Figure 4 is a schematic diagram of the structure of an electronic device suitable for implementing some embodiments of the present disclosure.
Detailed Implementation
[0024] Embodiments of this disclosure will now be described in more detail with reference to the accompanying drawings. While some embodiments of this disclosure are shown in the drawings, it should be understood that this disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided to provide a more thorough and complete understanding of this disclosure. It should be understood that the accompanying drawings and embodiments of this disclosure are for illustrative purposes only and are not intended to limit the scope of protection of this disclosure.
[0025] It should also be noted that, for ease of description, only the parts relevant to the invention are shown in the accompanying drawings. Unless otherwise specified, the embodiments and features described in this disclosure can be combined with each other.
[0026] It should be noted that the concepts of "first" and "second" mentioned in this disclosure are used only to distinguish different devices, modules or units, and are not used to limit the order of functions performed by these devices, modules or units or their interdependencies.
[0027] It should be noted that the terms "a" and "a plurality of" used in this disclosure are illustrative rather than restrictive, and those skilled in the art should understand that, unless otherwise expressly indicated in the context, they should be understood as "one or more".
[0028] The names of messages or information exchanged between multiple devices in the embodiments of this disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
[0029] This disclosure will now be described in detail with reference to the accompanying drawings and embodiments.
[0030] Figure 1 is a schematic diagram of an application scenario of the screen recording method for rendering screens according to some embodiments of the present disclosure.
[0031] As shown in Figure 1, the pose information of the head-mounted display device 101, the target frame to be rendered 102, and the shared texture data of the target application process are acquired. Based on the pose information, distortion correction information 103 for the target frame to be rendered is determined. In response to the detection of jitter in the historical augmented reality screen recording video, the pose information is smoothed over time to obtain smoothed pose information 104. The historical augmented reality screen recording videos are the screen recording videos corresponding to each frame to be rendered preceding the target frame to be rendered 102. Based on the distortion correction information 103, the smoothed pose information 104, and the shared texture data, the target frame to be rendered 102 is rendered to obtain a rendered frame 105. This rendered frame 105 is then used by the server 107 to perform screen recording processing on the historical augmented reality screen recording video and the rendered frame 105 to obtain an augmented reality screen recording video 108. In this application scenario, the connection to the server can be established via network 106.
[0032] It is understood that the execution entity of the screen recording method can be the processor of a head-mounted display device or a related terminal device. The head-mounted display device can be an AR (Augmented Reality) head-mounted display device, a VR (Virtual Reality) head-mounted display device, or an MR (Mixed Reality) head-mounted display device. When the execution entity of the screen recording method is software, it can be installed on the head-mounted display devices listed above. It can be implemented as multiple software programs or software modules, for example, to provide distributed services, or it can be implemented as a single software program or software module. No specific limitations are made here.
[0033] Referring again to Figure 2, a flow 200 of some embodiments of the screen recording method for rendering a screen according to this disclosure is shown. This screen recording method for rendering a screen includes the following steps:
[0034] Step 201: Obtain the pose information of the head-mounted display device, the target frame to be rendered, and the shared texture data of the target application process.
[0035] In some embodiments, the execution entity of the above-described screen recording method (e.g., a head-mounted display device) can acquire the pose information of the head-mounted display device, the target image frame to be rendered, and the shared texture data of the target application process via a wired or wireless connection. The pose information can be the angle and position information of the target user wearing the head-mounted display device rotating around the horizontal, vertical, and longitudinal axes. The target image frame to be rendered can be a frame of image currently awaiting rendering. The target application process can be an application process installed on the head-mounted display device, awaiting image rendering. The shared texture data can be image data used to color the target image frame to be rendered and provide image details. It should be noted that the wireless connection method can include, but is not limited to, 3G / 4G connections, WiFi connections, Bluetooth connections, WiMAX connections, Zigbee connections, UWB (ultra-wideband) connections, and other currently known or future known wireless connection methods.
[0036] In some optional implementations of certain embodiments, the aforementioned execution entity may obtain the pose information of the head-mounted display device, the target frame to be rendered, and the shared texture data of the target application process through the following steps:
[0037] The first step is to control the low-latency rendering thread and the screen recording rendering thread included in the target application process to share an OpenGL (Open Graphics Library) context. This context can be the rendering state and the resource information required for rendering, created by either the low-latency rendering thread or the screen recording rendering thread. The low-latency rendering thread can be a thread that renders the image to the screen with reduced latency. The screen recording rendering thread can be a rendering thread used for recording the entire rendered image on the server. The rendering state can be information recording the state of rendering-related operations in the OpenGL environment. Rendering-related operations can include, but are not limited to, at least one of the following: texture unit and texture object binding operations, texture filtering operations, and transparent object rendering operations. The resource information required for rendering can include, but is not limited to, at least one of the following: texture data, shaders, shader programs, and buffer objects. Shaders can be software programs used to handle vertex or fragment shading operations. Buffer objects can be GPU (Graphics Processing Unit) memory areas used to store vertex data, index data, and rendering results. It should be noted that, by rendering the screen from shared texture data, the low-latency rendering thread and the screen recording rendering thread can simplify rendering operations and reduce the storage space required for rendering, which further reduces the rendering load.
[0038] The second step is to obtain shared texture data through the OpenGL context. In practice, the executing entity can obtain shared texture data from the Unity development platform software through the OpenGL context.
[0039] The third step is to acquire pose information through a head-mounted display device. In practice, the executing entity can acquire pose information through the gyroscope sensor in the head-mounted display device.
[0040] The fourth step is to obtain the target frame to be rendered corresponding to the target application process.
[0041] Step 202: Determine the distortion correction information of the target frame to be rendered based on the pose information.
[0042] In some embodiments, the executing entity can determine the distortion correction information of the target frame to be rendered based on the pose information. The distortion correction information can be pose information obtained after correcting the pose information.
[0043] As an example, the executing entity can use ray propagation path tracking algorithms and wavelet analysis algorithms to determine the distortion correction information of the target frame to be rendered based on the pose information.
[0044] In some optional implementations of certain embodiments, the executing entity can determine the distortion correction information of the target frame to be rendered based on the pose information through the following steps:
[0045] The first step is to obtain the device calibration information and image display position information of the head-mounted display device. The image display position information can be extracted from the pose information of the previous time point. The device calibration information can represent the skew and rotation angles between the real screen and the augmented reality image. The image display position information can represent the two-dimensional coordinate position of the image rendered onto the canvas (surface).
[0046] The second step is to determine the first position difference between the image display position information and the pose information. This first position difference represents the difference between the image display position information and the pose information.
[0047] As an example, the executing entity can determine the product of the inverse matrix of the image display position information and the position matrix corresponding to the pose information as the first position difference.
[0048] The third step is to determine the target mapping display position information of the target image frame to be rendered based on the first position difference. The target mapping display position information can represent the two-dimensional mapped coordinates of each coordinate point in the target image frame to be rendered.
[0049] As an example, the executing entity can first obtain the previous frame's mapped display position information corresponding to the pose information at the previous time point and the projection matrix in the calibration of the head-mounted display device. The projection matrix represents the transformation relationship from the camera coordinate system to the screen coordinate system during image rendering. Then, the matrix product of the inverse of the projection matrix, the first position difference, and the previous frame's mapped display position information is determined and used as the target mapped display position information.
[0050] The fourth step is to perform anti-distortion fusion processing on the device calibration information and the first position difference to obtain the second position difference. The second position difference represents the position difference between the device calibration information and the first position difference.
[0051] As an example, the aforementioned execution entity can determine the matrix product of the inverse matrix of the device calibration information and the matrix of the first position difference to obtain the second position difference.
[0052] The fifth step is to determine the target mapping display position information and the second position difference as the distortion correction information.
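The matrix products named in the examples for the second to fifth steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: every input is treated as a square matrix of the same size, and the function and parameter names (`distortion_correction`, `display_pos`, `prev_mapped`, and so on) are hypothetical and do not come from the disclosure.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inverse(m: Matrix) -> Matrix:
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    # Augment m with the identity matrix on the right.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    # The right half of the augmented matrix now holds the inverse.
    return [row[n:] for row in aug]


def distortion_correction(display_pos: Matrix, pose: Matrix, projection: Matrix,
                          prev_mapped: Matrix, calibration: Matrix
                          ) -> Tuple[Matrix, Matrix]:
    """Hypothetical helper following the example matrix products of steps 2 to 5."""
    # Step 2: first position difference = inverse(display position) * pose.
    first_diff = matmul(inverse(display_pos), pose)
    # Step 3: target mapping = inverse(projection) * first difference * previous mapping.
    target_mapping = matmul(matmul(inverse(projection), first_diff), prev_mapped)
    # Step 4: second position difference = inverse(calibration) * first difference.
    second_diff = matmul(inverse(calibration), first_diff)
    # Step 5: both results together form the distortion correction information.
    return target_mapping, second_diff
```

With identity matrices for everything except the pose, both outputs reduce to the pose matrix itself, which is a quick sanity check on the order of the products.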
[0053] Step 203: In response to the detection of jitter in the historical augmented reality screen recording video, the pose information is smoothed in the time dimension to obtain smoothed pose information.
[0054] In some embodiments, the executing entity may, in response to detecting jitter in a historical augmented reality screen recording video, perform temporal smoothing on the pose information to obtain smoothed pose information. The historical augmented reality screen recording video can be a screen recording video corresponding to each frame to be rendered preceding the target frame. The recorded video can be a video obtained by recording and encoding each frame to be rendered. The smoothed pose information represents the temporally smoothed pose information to avoid jitter in the recorded image caused by minor human body movements.
[0055] As an example, the executing entity can respond to the detection of jitter in a historical augmented reality screen recording video by inputting pose information into a pose-time smoothing function to obtain smoothed pose information. The pose-time smoothing function can be... Here, $X_k$, $Y_k$, $Z_k$ can represent the smoothed pose information, i.e., the angles in the horizontal, vertical, and axial directions. $X_{k-1}$, $Y_{k-1}$, $Z_{k-1}$ can represent the pose information, that is, the angles in the horizontal, vertical, and axial directions before time-dimension smoothing. $XD_M$, $YD_M$, $ZD_M$ can represent the angles detected by the attitude sensor in the head-mounted display device along the horizontal, vertical, and axial axes. $T_M$ can represent the time it takes for the attitude sensor to upload the detected angle. $T_k$ can represent the rendering time of the target frame to be rendered. $T_{k-1}$ can represent the rendering time of the previous frame to be rendered. $c$ can represent a constant value in the range [0, 1].
[0056] In some optional implementations of certain embodiments, the executing entity may perform time-dimension smoothing on the pose information to obtain smoothed pose information by performing the following steps:
[0057] The first step is to determine the average pose information of the target frame to be rendered and the previous frame to be rendered. The previous frame to be rendered can be a frame that precedes the target frame to be rendered but has not yet been rendered. The average pose information represents the average pose information between the target frame to be rendered and the previous frame to be rendered over time.
[0058] As an example, the executing entity can input the screen coordinates corresponding to the target frame to be rendered and the screen coordinates corresponding to the previous frame to be rendered into the screen coordinate mean constraint function to obtain the average pose information of the target frame to be rendered and the previous frame to be rendered. The screen coordinate mean constraint function can be... Here, Average(p1, p2, t) represents the average pose information. p1 represents the screen coordinates of the target frame to be rendered. p2 represents the screen coordinates of the previous frame to be rendered. t represents the time dimension. θ represents the angle of rotation of the head-mounted display device along the horizontal, vertical, and axial axes, i.e., Euler angles.
[0059] The second step is to determine the frame rotation angle based on the average pose information. The frame rotation angle can represent the rotation angle between the target frame to be rendered and the previous frame to be rendered, corresponding to the user's viewpoint.
[0060] As an example, the executing entity can first determine the sum of the main-diagonal elements of the pose matrix of the average pose information as the pose matrix trace. Then, it can determine the ratio of the difference between the pose matrix trace and 1 to 2 as the matrix trace ratio. Finally, it can determine the inverse cosine of the matrix trace ratio as the frame rotation angle, i.e., θ = arccos((tr(R) - 1) / 2), where R denotes the pose matrix.
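The trace-based angle computation in this example can be illustrated with a short Python sketch. It assumes the pose matrix is a 3x3 rotation matrix; the helper names `rotation_angle` and `rotation_about_z` are hypothetical and are introduced only for this illustration.

```python
import math
from typing import List, Sequence


def rotation_angle(pose_matrix: Sequence[Sequence[float]]) -> float:
    """Return the rotation angle in radians encoded by a 3x3 rotation matrix."""
    # Trace: sum of the main-diagonal elements.
    trace = sum(pose_matrix[i][i] for i in range(3))
    # Trace ratio: (trace - 1) / 2, which equals cos(angle) for a rotation matrix.
    ratio = (trace - 1.0) / 2.0
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    ratio = max(-1.0, min(1.0, ratio))
    return math.acos(ratio)


def rotation_about_z(angle: float) -> List[List[float]]:
    """Build a rotation matrix about the z axis, used here only as test input."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
```

For example, `rotation_angle(rotation_about_z(math.pi / 2))` recovers a quarter turn, and the identity matrix gives an angle of zero. The clamp is a defensive choice for floating-point input and is not part of the described method.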
[0061] The third step is to determine the product of the derivative of the frame rotation angle with respect to time and the angular velocity in the pose information as the frame offset. The frame offset represents the change between the frame coordinates of the target frame to be rendered and the corresponding frame coordinates in the previous frame. The angular velocity represents the angular velocity at which the head-mounted display device rotates between the target frame and the previous frame.
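As a rough illustration of this product, the sketch below approximates the time derivative of the rotation angle by a finite difference between two samples. The finite-difference approximation, the scalar angular velocity, and the function name `frame_offset` are all assumptions made for this example; the disclosure does not state how the derivative is evaluated.

```python
def frame_offset(angle_now: float, angle_prev: float,
                 t_now: float, t_prev: float,
                 angular_velocity: float) -> float:
    """Approximate offset = d(angle)/dt * angular velocity (hypothetical helper)."""
    if t_now <= t_prev:
        raise ValueError("t_now must be later than t_prev")
    # Finite-difference estimate of the derivative of the rotation angle.
    angle_rate = (angle_now - angle_prev) / (t_now - t_prev)
    # Product of the angle rate and the angular velocity from the pose information.
    return angle_rate * angular_velocity
```

When the rotation angle does not change between the two samples, the offset is zero regardless of the angular velocity.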
[0062] The fourth step is to determine the smoothness of the target frame to be rendered based on the frame offset. Frame smoothness characterizes the fluidity and continuity of the changes in the frame coordinates of the target frame and the corresponding frame coordinates in the previous frame.
[0063] As an example, the executing entity can first, in response to determining that the frame offset is less than a preset offset threshold, increase the historical frame smoothness by 1. The preset offset threshold can be a pre-defined offset threshold, for example, 0.05. The historical frame smoothness can be the frame offset corresponding to a historical augmented reality screen recording video. Then, in response to determining that the frame offset is greater than the preset offset threshold, the executing entity can decrease the historical frame smoothness by 1. Finally, in response to determining that the frame offset equals the preset offset threshold, the executing entity can determine the historical frame smoothness as the final frame smoothness.
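The three-way threshold rule in this example can be written as a small Python function. The name `update_smoothness` and the integer counter type are assumptions for illustration; the default threshold of 0.05 is the example value given in the text.

```python
def update_smoothness(history: int, offset: float, threshold: float = 0.05) -> int:
    """Apply the threshold rule to a smoothness counter (hypothetical helper)."""
    if offset < threshold:
        # Offset below the threshold: raise the smoothness by 1.
        return history + 1
    if offset > threshold:
        # Offset above the threshold: lower the smoothness by 1.
        return history - 1
    # Offset exactly at the threshold: keep the historical value.
    return history
```

Because exact equality between floating-point values is rare in practice, the third branch will seldom be taken; an implementation might compare within a small tolerance instead, though the text does not say so.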
[0064] The fifth step is to generate the smoothed pose information based on the screen smoothness and the average pose information.
[0065] As an example, the executing entity can normalize the sum of the preset screen smoothness and the average pose information to obtain the smoothed pose information.
[0066] Step 204: Based on the distortion correction information, smoothed pose information and shared texture data, perform overall rendering of the target image frame to be rendered to obtain the rendered image frame, which is then used by the server to record the historical augmented reality screen recording video and the rendered image frame to obtain the augmented reality screen recording video.
[0067] In some embodiments, the executing entity can perform overall rendering of the target frame to be rendered based on distortion correction information, smoothed pose information, and shared texture data to obtain a rendered frame. This rendered frame is then used by the server to record historical augmented reality screen recordings and the rendered frame to obtain an augmented reality screen recording video. The rendered frame can be a frame that has been rendered and is displayed on the head-mounted display, representing a fusion of virtual and real images. The augmented reality screen recording video can be a video of each rendered frame recorded by the server through a communication connection with the target application process.
[0068] As an example, the executing entity can map the shared texture data to the corresponding image positions based on the distortion correction information and the smoothed pose information to obtain the rendered image frame. The server can then use the SurfaceFlinger component to perform screen recording processing on the historical augmented reality screen recording video and the rendered image frame to obtain the augmented reality screen recording video. The SurfaceFlinger component can be a component that composites multiple layers into a single display buffer and sends it to the display.
[0069] In some alternative implementations of certain embodiments, augmented reality screen recording video can be obtained through the following steps:
[0070] The first step is, in response to determining that a communication connection has been established with the target application process via the surface inter-process communication (SIC) mechanism, to obtain the initial rendered frame corresponding to the target application process and start the recording operation. The initial rendered frame can be the first rendered frame of a historical augmented reality screen recording video. The surface inter-process communication (SIC) mechanism can be a surface communication mechanism between the server and the application process running in the executing entity. The recording operation can be an operation performed on the server to record the rendered screen.
[0071] The second step is, in response to detecting that the recording operation has been successfully started, to perform video encoding on the historical augmented reality screen recording video and the rendered frame to obtain an augmented reality screen recording video bitstream, which serves as the augmented reality screen recording video.
[0072] In addressing the first technical problem mentioned above, a second technical problem often arises: Since historical augmented reality screen recordings are 3D videos, including multiple perspectives of 2D texture video maps and depth maps, and because depth maps differ from 2D video frames, containing numerous flat and sharp edge regions, how can depth images be encoded to reduce the video encoding bitstream and minimize the waste of video storage resources? A conventional solution for this second technical problem is to use a quadtree-based coding unit partitioning method to divide the depth map into coding units, obtaining a depth coding unit set. Then, based on the depth coding unit set, intra-frame encoding is performed on the depth map to obtain the depth encoded bitstream. However, this conventional solution still suffers from the following problems: recursive partitioning is used, and the sum of rate-distortion costs for each coding unit is calculated to determine whether further partitioning is necessary. Depth maps with sharp edge regions require more coding units, leading to low partitioning efficiency, prolonged partitioning time, and a large amount of computational resources required, increasing the load on the server-side video encoding. Considering the shortcomings of the conventional solution and leveraging the advantages and current technology of 3D video encoding within the inventor's company, the inventor adopts the following solution:
[0073] In some optional implementations of certain embodiments, the executing entity may perform video encoding on historical augmented reality screen recording videos and rendered frame images through the following steps to obtain augmented reality screen recording videos:
[0074] The first step, in response to the determination that the rendered frame and the historical termination augmented reality screen recording video frame are not located in the same rendered frame group, is to perform intra-frame encoding on the 2D texture video frame to obtain a 2D video bitstream. The rendered frame group can be a collection of rendered frames corresponding to different shooting positions at the same time, and the historical termination augmented reality screen recording video frame can be a video frame located at the termination position in the historical augmented reality screen recording video. The 2D texture video frame can be a video frame that displays the appearance information of the objects included in the rendered frame in a 2D format. The rendered frame also includes a depth map. The depth map can be an image representing the distance between the camera and the object using grayscale pixels. Intra-frame encoding can be performed using intra-frame encoding in the HEVC (High Efficiency Video Coding) algorithm.
[0075] The second step is to divide the depth map included in the rendered frame into coding units to obtain a depth coding unit set. This set can include depth coding units of different sizes. For example, the depth coding unit set can include 64×64, 32×32, 16×16, and 8×8 depth coding units.
[0076] Third, for each depth coding unit in the depth coding unit set, perform the following determination steps:
[0077] Sub-step 1 involves inputting the depth coding unit into the first convolutional pooling layer of the coding unit partitioning model to obtain a first depth coding feature map. The first convolutional pooling layer may include a first convolutional layer and a first max-pooling layer. The coding unit partitioning model may also include a second convolutional pooling layer, a non-local self-attention mechanism layer, a third convolutional pooling layer, a fourth convolutional pooling layer, a spatial pyramid pooling network, and multiple fully connected layers. The first convolutional layer can be a convolutional layer with 64 convolutional kernels of size 3x3 and a stride of 1. The first max-pooling layer can be a max-pooling layer with a 2x2 kernel and a stride of 2. The coding unit partitioning model can be a model used to determine whether the input depth coding unit needs further partitioning. The non-local self-attention mechanism layer can be an attention mechanism layer that identifies sharp edge regions in the depth map and determines the weight of each pixel in those regions. The spatial pyramid pooling network can be a convolutional neural network that generates a fixed-size output from feature maps of different sizes. The second, third, and fourth convolutional pooling layers can have the same convolutional and pooling structure as the first convolutional pooling layer and differ from it only in their input data.
[0078] Sub-step 2 involves inputting the first depth coding feature map into the second convolutional pooling layer to obtain the second depth coding feature map.
[0079] Sub-step 3 involves inputting the second depth coding feature map into the non-local self-attention mechanism layer to obtain a third depth coding feature map. The third depth coding feature map represents the feature information of sharp edge regions in the depth map. The non-local self-attention mechanism layer determines the weight of each feature in the second depth coding feature map through the following steps:
[0080] First, the second depth coding feature map is input into three 1x1 convolutional layers to obtain the first, second, and third convolutional feature maps. The size of the second depth coding feature map can be c × w × h, and the sizes of the first, second, and third convolutional feature maps can all be (c/2) × w × h. Here, c represents the number of channels, w represents the width of the feature map, and h represents the height of the feature map. The first and second convolutional feature maps represent the similarity between pixels at two locations. The third convolutional feature map represents the texture features of any pixel in the depth map, facilitating a weighted summation operation between the first and second convolutional feature maps.
[0081] Next, the first and third convolutional feature maps are scaled to obtain the fourth and fifth convolutional feature maps. The dimensions of the fourth and fifth convolutional feature maps can be (c/2) × (w/2) × (h/2).
[0082] Next, feature stretching is performed on the fourth, second, and fifth convolutional feature maps to obtain the first, second, and third convolutional stretched feature maps. The second convolutional stretched feature map can be a one-dimensional feature map of size (c/2) × wh. The first and third convolutional stretched feature maps can be one-dimensional feature maps of size (c/2) × (wh/4).
[0083] Subsequently, the second and third convolutional stretched feature maps are transposed to obtain the first and second transposed feature maps. The first transposed feature map can be a feature map of size wh × (c/2), and the second transposed feature map can be a feature map of size (wh/4) × (c/2).
[0084] Next, the matrix product of the first transposed feature map and the first convolutional stretched feature map is determined to obtain the first similarity feature map. The first similarity feature map can represent the similarity between any two pixels in the depth map. Given the operand sizes wh × (c/2) and (c/2) × (wh/4), the first similarity feature map can be a feature map of size wh × (wh/4).
[0085] Next, the normalized first similarity feature map is matrix-multiplied by the second transposed feature map to obtain the second similarity feature map. The second similarity feature map can be a feature map of size wh × (c/2).
[0086] Then, dimension stretching is performed on the second similarity feature map to obtain a dimension-stretched feature map. The dimension-stretched feature map can be a feature map of size w × h × (c/2).
[0087] Finally, the dimension-stretched feature map is expanded and concatenated with the second depth coding feature map to obtain the third depth coding feature map.
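The shape bookkeeping of the non-local self-attention steps above can be checked with a short NumPy sketch. Several choices here are assumptions, not statements of the disclosure: the scaling step is modelled as 2x2 max pooling, the normalization as a row-wise softmax, the expansion as a 1x1 convolution back to c channels, and the concatenation as being along the channel axis. The weights are random placeholders, and all function names are hypothetical.

```python
import numpy as np

def conv1x1(x, weight):
    """1x1 convolution: x has shape (c_in, h, w), weight has shape (c_out, c_in)."""
    return np.einsum("oc,chw->ohw", weight, x)

def max_pool2(x):
    """2x2 max pooling with stride 2 (h and w must be even)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def non_local_block(x, w_first, w_second, w_third, w_expand):
    c, h, w = x.shape
    first = conv1x1(x, w_first)      # (c/2, h, w)
    second = conv1x1(x, w_second)    # (c/2, h, w)
    third = conv1x1(x, w_third)      # (c/2, h, w)

    fourth = max_pool2(first)        # (c/2, h/2, w/2)
    fifth = max_pool2(third)         # (c/2, h/2, w/2)

    first_stretched = fourth.reshape(c // 2, -1)    # (c/2, wh/4)
    second_stretched = second.reshape(c // 2, -1)   # (c/2, wh)
    third_stretched = fifth.reshape(c // 2, -1)     # (c/2, wh/4)

    first_transposed = second_stretched.T           # (wh, c/2)
    second_transposed = third_stretched.T           # (wh/4, c/2)

    similarity = first_transposed @ first_stretched              # (wh, wh/4)
    attended = softmax(similarity, axis=1) @ second_transposed   # (wh, c/2)

    stretched = attended.T.reshape(c // 2, h, w)    # back to a spatial layout
    expanded = conv1x1(stretched, w_expand)         # (c, h, w)
    return np.concatenate([expanded, x], axis=0)    # (2c, h, w)

rng = np.random.default_rng(0)
c, h, w = 8, 4, 6
x = rng.standard_normal((c, h, w))
out = non_local_block(
    x,
    rng.standard_normal((c // 2, c)),
    rng.standard_normal((c // 2, c)),
    rng.standard_normal((c // 2, c)),
    rng.standard_normal((c, c // 2)),
)
```

Under these assumptions the similarity matrix has wh rows and wh/4 columns, so each full-resolution position attends over the downsampled positions, which keeps the matrix four times smaller than a full wh × wh comparison.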
[0088] Sub-step 4: Input the third depth coding feature map into the third convolutional pooling layer to obtain the fourth depth coding feature map.
[0089] Sub-step 5: Input the fourth depth coding feature map into the fourth convolutional pooling layer to obtain the fifth depth coding feature map.
[0090] Sub-step 6 involves inputting the third depth coding feature map and its corresponding preset quantization parameter set, as well as the fifth depth coding feature map and its corresponding preset quantization parameter set, into the spatial pyramid pooling network to obtain the sixth depth coding feature map set. The preset quantization parameters in the preset quantization parameter set can be pre-defined values used to quantize the coefficients after discrete cosine transform, thereby reducing video precision. All sixth depth coding feature maps included in the sixth depth coding feature map set are of the same size.
[0091] Sub-step 7 involves concatenating the sixth depth coding feature maps in the set and inputting the result into the multi-layer fully connected layer to obtain the coding unit partitioning result. This multi-layer fully connected layer can be a three-layer fully connected layer. The coding unit partitioning result indicates whether the depth coding unit should be partitioned further.
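The property that sub-steps 6 and 7 rely on, namely that spatial pyramid pooling turns feature maps of different sizes into outputs of one fixed size that can be concatenated, can be illustrated with the sketch below. The pyramid levels (1, 2, 4), the use of max pooling, and the function name are assumptions; the handling of the preset quantization parameter sets is not modelled.

```python
import numpy as np

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (c, h, w) map into a vector whose length does not depend on h or w."""
    c, h, w = feature_map.shape
    pooled = []
    for n in levels:
        # Bin edges that cover the whole map even when h or w is not divisible by n.
        row_edges = np.linspace(0, h, n + 1)
        col_edges = np.linspace(0, w, n + 1)
        for i in range(n):
            r0 = int(np.floor(row_edges[i]))
            r1 = max(int(np.ceil(row_edges[i + 1])), r0 + 1)
            for j in range(n):
                c0 = int(np.floor(col_edges[j]))
                c1 = max(int(np.ceil(col_edges[j + 1])), c0 + 1)
                # One value per channel for this bin.
                pooled.append(feature_map[:, r0:r1, c0:c1].max(axis=(1, 2)))
    return np.concatenate(pooled)

rng = np.random.default_rng(0)
large = rng.standard_normal((3, 8, 8))
small = rng.standard_normal((3, 5, 7))
pooled_large = spatial_pyramid_pool(large)
pooled_small = spatial_pyramid_pool(small)
# Both vectors have 3 * (1 + 4 + 16) entries, so they can be concatenated
# and fed to fully connected layers with a fixed input width.
fc_input = np.concatenate([pooled_large, pooled_small])
```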
[0092] Sub-step 8: In response to determining that the coding unit partitioning result indicates the end of partitioning, perform intra-frame coding on the depth coding unit to obtain a depth-coded bitstream.
[0093] The fourth step is, in response to determining that the coding unit partitioning result indicates that partitioning should continue, to partition the depth coding unit again to obtain a depth partitioned coding unit set, and to determine the depth partitioned coding unit set as the depth coding unit set, so as to continue performing the above determination steps.
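The control flow of the determination steps, in which a unit is either finalized for intra-frame coding or split and re-examined, can be sketched as a worklist loop. The callable `should_split` is a hypothetical stand-in for the coding unit partitioning model. Starting from 64×64 units and stopping at 8×8 follows the example sizes given for the depth coding unit set, and the sketch assumes the depth map dimensions are multiples of 64.

```python
from typing import Callable, List, Tuple

Unit = Tuple[int, int, int]  # (x, y, size) of a square depth coding unit

def partition_depth_map(
    width: int,
    height: int,
    should_split: Callable[[Unit], bool],
    max_size: int = 64,
    min_size: int = 8,
) -> List[Unit]:
    """Return the units whose partitioning has ended (candidates for intra-frame coding)."""
    pending = [(x, y, max_size)
               for y in range(0, height, max_size)
               for x in range(0, width, max_size)]
    leaves = []
    while pending:
        x, y, size = pending.pop()
        if size > min_size and should_split((x, y, size)):
            # Partitioning continues: replace the unit with its four quadrants.
            half = size // 2
            pending.extend([(x, y, half), (x + half, y, half),
                            (x, y + half, half), (x + half, y + half, half)])
        else:
            # Partitioning ends for this unit.
            leaves.append((x, y, size))
    return leaves

# Toy decision rule: keep splitting only the unit that contains pixel (0, 0),
# as if a sharp edge were located in the top-left corner.
leaves = partition_depth_map(64, 64, lambda unit: unit[0] == 0 and unit[1] == 0)
```

Because each unit is judged once by the predicate instead of by comparing rate-distortion costs across all candidate splits, the loop visits only the units that are actually produced.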
[0094] The fifth step is to determine the two-dimensional video bitstream and the obtained depth-coded bitstreams as the rendering bitstream.
[0095] The sixth step is, in response to determining that the rendered frame and the historical termination augmented reality screen recording video frame are located in the same rendered frame group, to perform inter-frame encoding on the two-dimensional texture video frame and the depth map included in the rendered frame according to the historical augmented reality screen recording video to obtain the rendered screen bitstream.
[0096] In practice, the executing entity can utilize the disparity compensation prediction, inter-viewpoint motion prediction, and inter-viewpoint redundancy prediction in the 3D-HEVC (3D High Efficiency Video Coding) algorithm to perform inter-frame coding on the remaining video frame groups based on the initial rendered frame and the aforementioned initial depth map, thereby obtaining the remaining two-dimensional video bitstream group and the remaining depth-coded bitstream group.
[0097] The seventh step is to identify the bitstream of the rendered screen and the bitstream corresponding to the historical augmented reality screen recording video as the augmented reality screen recording video bitstream.
[0098] The first to seventh steps and related content described above constitute an inventive point of this disclosure, solving the second technical problem mentioned in the background: Using a recursive approach for layer-by-layer partitioning and calculating the total rate-distortion cost of each coding unit to determine whether further partitioning is needed, the depth map has sharp edge regions requiring more coding units, leading to low partitioning efficiency, prolonged partitioning time, and a large amount of computational resources required, increasing the load on the server-side video encoding. The factors leading to low partitioning efficiency, prolonged partitioning time, and a large amount of computational resources required, increasing the load on the server-side video encoding, are often as follows: using a recursive approach for layer-by-layer partitioning and calculating the total rate-distortion cost of each coding unit to determine whether further partitioning is needed; the depth map has sharp edge regions requiring more coding units. Solving these factors can improve partitioning efficiency, reduce partitioning time, reduce waste of computational resources, and reduce the load on the server-side video encoding. To achieve this effect, this disclosure first determines whether the rendered frame and the historical terminated augmented reality screen recording video frame are located in the same rendered frame group, in order to determine the encoding method of the rendered frame under different conditions. Secondly, when the two-dimensional texture video frames are located in different rendering frame groups, the HEVC encoding algorithm is used to encode the two-dimensional texture video frames included in the rendering frame. This can reduce the encoding bitstream and storage resource consumption while ensuring the resolution of the two-dimensional texture video frames. 
Subsequently, the depth map is divided into a depth coding unit set, which facilitates the subsequent determination of whether each depth coding unit should be further divided. Then, for each depth coding unit in the depth coding unit set, the following determination steps are performed. First, two rounds of feature extraction are performed on the depth coding unit through convolutional and pooling layers, which can fully learn the feature information of the depth coding unit. Second, the depth coding feature map extracted by the convolutional and pooling layers is input into the non-local self-attention mechanism layer. Since the depth map includes many flat regions and sharp edge regions, sharp edge regions can be assigned larger weight values, and flat regions can be assigned smaller weight values. This enhances the attention to important information, removes a large amount of redundant information, improves the recognition of the depth map, and thus improves the accuracy of the depth map coding unit division. Third, the third depth coding feature map is subjected to two further rounds of convolutional pooling, which can fully extract the detailed information of the depth map. Fourth, the third depth coding feature map, the fifth depth coding feature map, and their respective preset quantization parameter sets are input into the spatial pyramid pooling network. By pooling inputs of different sizes, outputs of the same size can be obtained, which resolves the problem of differently sized inputs and avoids the loss of information contained in the feature maps. Fifth, the concatenated sixth depth coding feature maps are input into the multi-layer fully connected layer, and intra-frame coding is performed on the depth coding units whose partitioning result indicates the end of partitioning. 
This improves the accuracy of depth coding unit partitioning, allowing for the precise early termination of the partitioning of useless coding units, thus increasing the partitioning efficiency of depth coding units and the coding efficiency of the depth map. Finally, when the frames are in the same rendering frame group, inter-frame coding is performed on the rendering frames, which reduces the bitstream generated after coding and reduces the occupation of storage resources.
[0099] Optionally, after rendering the target image frame based on distortion correction information, smoothed pose information, and shared texture data to obtain a rendered image frame, which is then used by the server to record historical augmented reality screen recording videos and rendered image frames to obtain augmented reality screen recording videos, the above method may further include:
[0100] The first step involves controlling the low-latency rendering thread within the target application process. Based on shared texture data, this thread performs low-latency parallel rendering of the target image frame to be rendered, resulting in a low-latency rendered image frame. This low-latency rendered image frame can be an image obtained by reducing the time interval between the input of the target image frame and the completion of rendering, thus rendering the color and texture details of the target image. It should be noted that the low-latency rendering thread and the recording rendering thread are independent, parallel rendering threads.
[0101] As an example, the aforementioned execution entity can control the low-latency rendering thread included in the target application process, and use a low-latency rendering algorithm to perform low-latency parallel rendering of the target image frame to be rendered based on shared texture data, thereby obtaining a low-latency rendered image frame. The low-latency rendering algorithm can be a Single Buffer rendering algorithm or a Positional Time Warp (PTW) algorithm.
[0102] The second step is to identify the obtained low-latency rendered frames as the projected video. Each low-latency rendered frame can be a frame obtained through low-latency rendering. The projected video can be a video obtained by projecting each low-latency rendered frame onto a real screen.
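The relationship between the recording rendering thread and the low-latency rendering thread, two independent threads that read the same shared texture data in parallel, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the function `render_pass`, the tuple standing in for shared texture data, and the summation standing in for actual rendering. It is not a Single Buffer or Positional Time Warp implementation.

```python
import threading

def render_pass(name, shared_texture, frame_ids, results, lock):
    """Toy render loop: reads the shared texture and emits one result per frame."""
    for frame_id in frame_ids:
        # Stand-in for real rendering; the pass only reads the shared texture.
        rendered = (name, frame_id, sum(shared_texture))
        with lock:
            results.append(rendered)

shared_texture = (1, 2, 3, 4)  # immutable, so concurrent reads are safe
results = []
lock = threading.Lock()
threads = [
    threading.Thread(target=render_pass,
                     args=("recording", shared_texture, range(3), results, lock)),
    threading.Thread(target=render_pass,
                     args=("low_latency", shared_texture, range(3), results, lock)),
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

# The low-latency frames form the projected video; the recording frames go to the recorder.
projected_video = [frame for frame in results if frame[0] == "low_latency"]
recorded_frames = [frame for frame in results if frame[0] == "recording"]
```

Keeping the shared texture read-only in both passes is what lets the two threads proceed without waiting on each other; only the append to the shared result list is guarded by a lock.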
[0103] The above embodiments of this disclosure have the following beneficial effects: the screen recording method for rendering images in some embodiments of this disclosure can avoid the problems of screen tearing and snow in the recorded video, further reduce the latency of the recorded video, and avoid the problem of video screen shaking during recording. Specifically, the problems of screen tearing or snow in the recorded image, reducing the quality of the recorded image, and the frequent shaking of the screen recording image are caused by the following reasons: due to the use of a single-channel stereo rendering algorithm and an automatic refresh rendering algorithm, the server cannot accurately obtain the rendering status of the image, resulting in screen tearing or snow in the recorded image, reducing the quality of the recorded image. In addition, the image rendering is based on three-degree-of-freedom pose information. The head-mounted display device is highly sensitive, and even small movements of the human body will be reflected in the rendered image, resulting in frequent shaking of the screen recording image, reducing the quality of the recorded image. Based on this, the screen recording method for rendering images in some embodiments of this disclosure can first obtain the pose information of the head-mounted display device, the target image frame to be rendered, and the shared texture data of the target application process. Therefore, the obtained pose information of the head-mounted display device, the target frame to be rendered, and the shared texture data of the target application process can be used for subsequent rendering of the target frame to be rendered. Secondly, based on the pose information, distortion correction information for the target frame to be rendered is determined. 
This distortion correction information removes distortion from the virtual image in the head-mounted display device and corrects the pose information, reducing rendering latency and image distortion in the video subsequently recorded by the server, thus improving image quality. Then, in response to the detection of jitter in the historical augmented reality screen recording video, temporal smoothing is performed on the pose information to obtain smoothed pose information. The historical augmented reality screen recording video is the screen recording video corresponding to the frames to be rendered preceding the target frame to be rendered. Because head-mounted display devices are highly sensitive, even slight human body movements are reflected in the rendered image and cause image jitter. Temporal smoothing can counteract the image jitter caused by slight human body movements, avoiding image jitter in the recorded video. Finally, based on the aforementioned distortion correction information, smoothed pose information, and shared texture data, the target image frame to be rendered is rendered as a whole to obtain a rendered image frame. This rendered image frame is then used by the server to perform screen recording processing on the historical augmented reality screen recording video and the rendered image frame to obtain an augmented reality screen recording video. Therefore, by rendering the target image frame as a whole, screen tearing and snow-like artifacts can be avoided, improving the quality of the recorded video and further reducing rendering latency.
[0104] Referring further to Figure 3, a flow 300 of another embodiment of the screen recording method for rendering a screen according to this disclosure is shown. This screen recording method for rendering a screen includes the following steps:
[0105] Step 301: Obtain the pose information of the head-mounted display device, the target frame to be rendered, and the shared texture data of the target application process.
[0106] Step 302: Determine the distortion correction information of the target frame to be rendered based on the pose information.
[0107] Step 303: In response to the detection of jitter in the historical augmented reality screen recording video, the pose information is smoothed in the time dimension to obtain smoothed pose information.
[0108] Step 304: Based on the distortion correction information, smoothed pose information and shared texture data, perform overall rendering of the target image frame to be rendered to obtain the rendered image frame, which is then used by the server to record the historical augmented reality screen recording video and the rendered image frame to obtain the augmented reality screen recording video.
[0109] In some embodiments, for the specific implementation of steps 301-304 and the resulting technical effects, reference may be made to steps 201-204 in the embodiment corresponding to Figure 2; the details are not repeated here.
[0110] Step 305: In response to detecting that the target application process has finished rendering and that the current application process meets the conditions for starting the rendering operation, the shared texture data corresponding to the current application process, the pose information of the head-mounted display device, and the target frame to be rendered are obtained to perform overall rendering of the screen.
[0111] In some embodiments, the execution entity may, in response to detecting that the target application process has finished rendering and that the current application process meets the conditions for starting the rendering operation, acquire the shared texture data corresponding to the current application process, the pose information of the head-mounted display device, and the target frame to be rendered, so as to perform overall image rendering. The current application process may be an application process that is different from the target application process and requires image rendering.
[0112] As can be seen from Figure 3, compared with the description of some embodiments corresponding to Figure 2, the flow 300 of the screen recording method for rendering screens in some embodiments corresponding to Figure 3 embodies the steps of continuous recording of rendering screens between the target application process and the current application process. Therefore, the solutions described in these embodiments can continuously record rendering screens while different application processes are constantly exiting and starting rendering operations, thereby achieving seamless recording between multiple application processes and improving the smoothness of recording and the quality of the screen-recorded rendering screens.
[0113] Referring now to Figure 4, a schematic diagram of the hardware structure of a head-mounted display device suitable for implementing some embodiments of the present disclosure is shown. The head-mounted display device shown in Figure 4 is merely an example and should not impose any limitation on the functionality and scope of use of the embodiments of the present disclosure.
[0114] As shown in Figure 4, the AR device 400 includes a processing device (e.g., a central processing unit, an image processor, etc.) 401, a memory (ROM) 402, an input unit 403, and an output unit 404, wherein the processing device 401, the memory 402, the input unit 403, and the output unit 404 are interconnected via a bus 405. Here, the methods according to some embodiments of this disclosure can be implemented as a computer program and stored in the memory 402. The processing device 401 in the AR device 400 implements the 3D spatial interface interaction function defined in some embodiments of this disclosure by calling the aforementioned computer program stored in the memory 402. In some implementations, the input unit 403 may include devices such as a camera, microphone, gyroscope, accelerometer, and magnetometer, and the output unit 404 may be a display module or other device for displaying content. The display module may include an optomechanical system and optical elements. The optical elements may include prisms, freeform surfaces, birdbaths, optical waveguides, and other optical components. Therefore, when the processing device 401 calls the aforementioned computer program to execute the interface interaction function in 3D space, it can control the input unit 403 to obtain the user's gestures, voice and other operation commands, and control the output unit 404 to display the display content.
[0115] It should be noted that, in some embodiments of this disclosure, the computer-readable medium described above may be a computer-readable signal medium or a computer-readable storage medium, or any combination thereof. A computer-readable storage medium may be, for example,—but not limited to—an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of a computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination thereof. In some embodiments of this disclosure, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In some embodiments of this disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such propagated data signals may take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A computer-readable signal medium can be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium can be transmitted using any suitable medium, including but not limited to: wires, optical fibers, RF (radio frequency), etc., or any suitable combination thereof.
[0116] In some implementations, clients and servers can communicate using any currently known or future-developed network protocol, such as HTTP (Hypertext Transfer Protocol), and can be interconnected with digital data communication (e.g., a communication network) of any form or medium. Examples of communication networks include local area networks (“LANs”), wide area networks (“WANs”), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed networks.
[0117] The aforementioned computer-readable medium may be included in the aforementioned electronic device; or it may exist independently and not assembled into the electronic device. The aforementioned computer-readable medium carries one or more programs that, when executed by the electronic device, cause the electronic device to: acquire pose information of the head-mounted display device, the target frame to be rendered, and shared texture data of the target application process; determine distortion correction information of the target frame to be rendered based on the pose information; in response to detecting jitter in a historical augmented reality screen recording video, perform temporal smoothing processing on the pose information to obtain smoothed pose information, wherein the historical augmented reality screen recording video refers to the screen recording videos corresponding to each frame to be rendered preceding the target frame to be rendered; and perform overall rendering of the target frame to be rendered based on the distortion correction information, the smoothed pose information, and the shared texture data to obtain a rendered frame, which is then used by the server to perform screen recording processing on the historical augmented reality screen recording video and the rendered frame to obtain an augmented reality screen recording video.
[0118] Computer program code for performing operations of some embodiments of this disclosure can be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer can be connected to the user's computer via any type of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (e.g., via the Internet using an Internet service provider).
[0119] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than that indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.
[0120] The functions described above in this document can be performed, at least in part, by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that can be used include: Field Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems-on-Chip (SoCs), Complex Programmable Logic Devices (CPLDs), and so on.
[0121] Some embodiments of this disclosure also provide a computer program product, including a computer program that, when executed by a processor, implements any of the above-described screen recording methods for rendering images.
[0122] The above description is merely a selection of preferred embodiments of this disclosure and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of the invention involved in the embodiments of this disclosure is not limited to technical solutions formed by specific combinations of the above-described technical features, but should also cover other technical solutions formed by arbitrary combinations of the above-described technical features or their equivalents without departing from the above-described inventive concept. For example, it also covers technical solutions formed by interchanging the above-described features with (but not limited to) technical features having similar functions disclosed in the embodiments of this disclosure.
Claims
1. A screen recording method for a rendered picture, comprising: acquiring pose information of a head-mounted display device, a target frame to be rendered, and shared texture data of a target application process; determining distortion correction information of the target frame to be rendered based on the pose information; in response to detecting jitter in a historical augmented reality screen recording video, performing temporal smoothing on the pose information to obtain smoothed pose information, wherein the historical augmented reality screen recording video is the screen recording video corresponding to each frame to be rendered that precedes the target frame to be rendered; and rendering the target frame to be rendered as a whole based on the distortion correction information, the smoothed pose information, and the shared texture data to obtain a rendered frame, the rendered frame being used by a server to perform screen recording processing on the historical augmented reality screen recording video and the rendered frame to obtain an augmented reality screen recording video.
2. The method according to claim 1, wherein the method further comprises: in response to detecting that the target application process has finished rendering and that a current application process meets the conditions for starting a rendering operation, acquiring the shared texture data corresponding to the current application process, the pose information of the head-mounted display device, and the target frame to be rendered, so as to perform overall rendering of the screen.
3. The method according to claim 1, wherein determining the distortion correction information of the target frame to be rendered based on the pose information comprises: acquiring device calibration information and screen display position information of the head-mounted display device, wherein the screen display position information is position information extracted from the pose information of the previous time point; determining a first position difference between the screen display position information and the pose information; determining target mapping display position information of the target frame to be rendered based on the first position difference; performing anti-distortion fusion processing on the device calibration information and the first position difference to obtain a second position difference; and determining the target mapping display position information and the second position difference as the distortion correction information.
4. The method according to claim 1, wherein performing temporal smoothing on the pose information to obtain smoothed pose information comprises: determining average pose information of the target frame to be rendered and the previous frame to be rendered; determining a screen rotation angle based on the average pose information; determining the product of the time derivative of the screen rotation angle and the angular velocity in the pose information as a screen offset; determining an image smoothness of the target frame to be rendered based on the screen offset; and generating the smoothed pose information based on the image smoothness and the average pose information.
5. The method according to claim 1, wherein the target application process includes a low-latency rendering thread, and the method further comprises: performing, by the low-latency rendering thread, low-latency parallel rendering on the target frame to be rendered based on the shared texture data to obtain a low-latency rendered frame; and determining the low-latency rendered frames as a projected video, wherein each low-latency rendered frame is a frame obtained through low-latency rendering.
6. The method according to claim 1, wherein acquiring the pose information of the head-mounted display device, the target frame to be rendered, and the shared texture data of the target application process comprises: controlling a low-latency rendering thread and a screen recording rendering thread of the target application process to share an open image library context, wherein the open image library context is the rendering state and rendering resource information created by the low-latency rendering thread or the screen recording rendering thread; acquiring the shared texture data through the open image library context; acquiring the pose information through the head-mounted display device; and acquiring the target frame to be rendered corresponding to the target application process.
7. The method according to claim 1, wherein the augmented reality screen recording video is obtained through the following steps: in response to determining that a communication connection has been established with the target application process through an image canvas communication mechanism, acquiring an initial rendered frame corresponding to the target application process and starting a recording operation, wherein the initial rendered frame is the first rendered frame in the historical augmented reality screen recording video; and in response to detecting that the recording operation has started successfully, video encoding the historical augmented reality screen recording video and the rendered frame to obtain an augmented reality screen recording video bitstream, which is used as the augmented reality screen recording video.
8. A head-mounted display device, comprising: one or more processors; a storage device storing one or more programs; and at least one display screen and optical elements for imaging in front of the user's eyes; wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-7.
9. A computer-readable medium having a computer program stored thereon, wherein, when the computer program is executed by a processor, the method according to any one of claims 1-7 is implemented.
10. A computer program product comprising a computer program that, when executed by a processor, implements the method according to any one of claims 1-7.
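By way of a non-limiting illustration only, the temporal smoothing steps recited in claim 4 may be sketched as follows. The sketch reduces the pose to a single rotation axis and makes several assumptions that are not stated in the claims: the screen rotation angle is taken to be the yaw of the average pose, its time derivative is approximated by a finite difference against the average of the two preceding samples, the image smoothness is mapped from the screen offset as 1 / (1 + |offset|), and the smoothed pose blends the current pose toward the average pose by that smoothness. The names `PoseSample`, `average_pose`, and `smooth_pose` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoseSample:
    t: float      # timestamp in seconds
    yaw: float    # rotation about one axis, in radians (single axis for brevity)
    omega: float  # angular velocity about the same axis, in rad/s


def average_pose(a: PoseSample, b: PoseSample) -> PoseSample:
    """Average pose information of two consecutive frames."""
    return PoseSample((a.t + b.t) / 2, (a.yaw + b.yaw) / 2, (a.omega + b.omega) / 2)


def smooth_pose(older: PoseSample, prev: PoseSample, cur: PoseSample) -> PoseSample:
    """Smoothed pose for `cur`, given the two samples that precede it."""
    prev_avg = average_pose(older, prev)
    avg = average_pose(prev, cur)

    # Screen rotation angle taken from the average pose (assumption).
    angle, prev_angle = avg.yaw, prev_avg.yaw

    dt = avg.t - prev_avg.t
    if dt <= 0:
        raise ValueError("samples must have strictly increasing timestamps")

    # Finite-difference estimate of d(angle)/dt (assumption).
    angle_rate = (angle - prev_angle) / dt

    # Screen offset: product of the angle's time derivative and the
    # angular velocity carried in the current pose information.
    offset = angle_rate * cur.omega

    # Assumed mapping from offset to smoothness in (0, 1]:
    # a larger offset gives a lower smoothness.
    smoothness = 1.0 / (1.0 + abs(offset))

    # Assumed blend: low smoothness pulls the pose toward the average.
    yaw = smoothness * cur.yaw + (1.0 - smoothness) * avg.yaw
    return PoseSample(cur.t, yaw, cur.omega)
```

With this mapping, a pose whose angular velocity is zero is returned unchanged, while a fast rotation is pulled toward the two-frame average; other mappings from offset to smoothness would fit the claim language equally well.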