Keyframe insertion method and device, head-mounted display device and storage medium

By calculating the average optical flow field value between two adjacent frames and the camera pose at each frame, the method determines whether to insert a keyframe. This addresses the problem of too many or too few keyframes being inserted in SLAM, balances recognition accuracy against efficiency, and reduces the impact of fast-moving objects.

CN122199656A · Pending · Publication Date: 2026-06-12 · GOERTEK INC

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Application (China)
Current Assignee / Owner: GOERTEK INC
Filing Date: 2024-12-04
Publication Date: 2026-06-12

AI Technical Summary

Technical Problem

In SLAM technology, inserting too many keyframes will increase the computational burden and storage overhead, while inserting too few may miss important environmental information. Furthermore, moving objects in dynamic scenes will affect feature point tracking and pose estimation, leading to map construction errors.

Method used

By calculating the average optical flow field value and camera pose of two adjacent frames, it is determined whether the insertion conditions are met. If they are met, a key frame is inserted. The system considers rapidly changing areas in the image and changes in camera motion to avoid redundancy or sparseness in key frame insertion.

Benefits of technology

While ensuring recognition accuracy, the number of keyframes inserted was balanced, reducing the impact of fast-moving objects on map building and improving the accuracy and efficiency of map building.

Generated by Eureka AI based on patent content.

Abstract

The application discloses a key frame insertion method and device, a head-mounted display device and a storage medium, and relates to the technical field of computer vision. The method comprises the following steps: acquiring a target video through a camera, and traversing two adjacent target images in the target video; calculating an average optical flow field value between the two target images; acquiring target poses of the camera when the two target images are shot; calculating a motion change amount of the camera between the two target images according to the target poses; judging whether the average optical flow field value and the motion change amount meet a preset insertion condition, and if the insertion condition is met, inserting a key frame in the two target images. The application reduces the influence of fast-moving objects on map construction, avoids the redundancy or sparseness of key frame insertion, and balances the number of key frame insertion while ensuring the recognition accuracy.

Description

Technical Field

[0001] This application relates to the field of computer vision technology, and in particular to a keyframe insertion method, apparatus, head-mounted display device, and storage medium.

Background Technology

[0002] In SLAM (Simultaneous Localization and Mapping) technology, keyframe insertion is a core issue. This mechanism accurately captures and saves the camera's pose at a specific moment and its associated important information (such as feature points and descriptors), laying a solid foundation for the entire system to maintain high-quality map construction. It also provides strong support for subsequent key steps such as map optimization and loop closure detection.

[0003] However, inserting too many keyframes will increase the computational burden and storage overhead of the system, while inserting too few keyframes may miss important environmental information. Furthermore, in dynamic scenes, moving objects can affect the tracking and matching of feature points, leading to incorrect pose estimation and thus affecting map construction.

[0004] Therefore, how to balance the number of keyframes inserted while ensuring recognition accuracy has become an urgent problem to be solved in this field.

[0005] The above content is only used to help understand the technical solution of this application and does not represent an admission that the above content is prior art.

Summary of the Invention

[0006] The main objective of this application is to provide a keyframe insertion method, apparatus, head-mounted display device, and storage medium, aiming to solve the technical problem of balancing the number of keyframes inserted while ensuring recognition accuracy.

[0007] To achieve the above objectives, this application proposes a keyframe insertion method, which includes:

[0008] The target video is acquired by the camera, and the target images of two adjacent frames in the target video are traversed.

[0009] Calculate the average optical flow field value between the two target images;

[0010] The target pose of the camera when capturing the two target images is obtained respectively;

[0011] Calculate the motion change of the camera between the two target images based on the respective target poses;

[0012] Determine whether the average optical flow field value and the amount of motion change meet the preset insertion conditions. If the insertion conditions are met, insert a keyframe into the two target images.

[0013] In one embodiment, the step of calculating the average optical flow field value between the two target images includes:

[0014] Calculate the displacement vector of each pixel in the two target images between the two target images to obtain the optical flow field of each pixel;

[0015] The optical flow field value of each optical flow field is calculated using the Euclidean norm.

[0016] The optical flow field values of all pixels are averaged to obtain the average optical flow field value between the two target images.

[0017] In one embodiment, the step of calculating the average optical flow field value between the two target images includes:

[0018] The two target images are scaled to construct an image pyramid;

[0019] At each scale of the image pyramid, the displacement vector of each pixel in the two target images between the two target images is calculated to obtain the optical flow field of each pixel.

[0020] The optical flow field value of each optical flow field is calculated using the Euclidean norm.

[0021] The average optical flow field value between the two target images is obtained by taking a weighted average of the optical flow field values according to each scale.

[0022] In one embodiment, the target pose includes: a rotation matrix and a translation vector, and the motion changes include: the rate of change of position and the rotational angular velocity;

[0023] The step of calculating the motion change of the camera between the two target images based on the respective target poses includes:

[0024] Calculate the norm of the difference between each translation vector, and use the norm as the rate of change of position;

[0025] The rotational angular velocity is calculated based on each of the rotation matrices.

[0026] In one embodiment, the step of determining whether the average optical flow field value and the amount of motion change satisfy a preset insertion condition includes:

[0027] The comprehensive score between the two target images is calculated based on the preset optical flow field weight, the average optical flow field value, the preset position weight, the position change rate, the preset angular velocity weight, and the rotational angular velocity.

[0028] If the comprehensive score is greater than the preset score threshold, then the average optical flow field value and the motion change amount are determined to meet the preset insertion conditions.
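The scoring rule in this embodiment can be written as a weighted sum followed by a threshold test. The following is only a reading of the text, not a formula quoted from the application: the symbols $S$, $w_f$, $w_p$, $w_\omega$, $\omega$ and $S_{th}$ are labels introduced here for illustration, while $M_{avg}$ (average optical flow field value) and $\Delta t$ (rate of change of position) follow the notation used in the detailed description.

```latex
S = w_f \, M_{avg} + w_p \, \Delta t + w_\omega \, \omega,
\qquad
\text{insert a keyframe if } S > S_{th}
```

Here $w_f$, $w_p$ and $w_\omega$ stand for the preset optical flow field weight, position weight and angular velocity weight, $\omega$ for the rotational angular velocity, and $S_{th}$ for the preset score threshold.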

[0029] In one embodiment, the step of determining whether the average optical flow field value and the amount of motion change satisfy a preset insertion condition further includes:

[0030] If the average optical flow field value is greater than a preset first optical flow field threshold, and / or the rotational angular velocity is greater than a preset angular velocity threshold, and / or the position change rate is greater than a preset position threshold, then the average optical flow field value and the motion change amount are determined to meet the preset insertion conditions.

[0031] In one embodiment, after the step of calculating the optical flow field value of each of the optical flow fields using the Euclidean norm, the method further includes:

[0032] Target pixels with optical flow field values greater than a preset second optical flow field threshold are selected from the pixels.

[0033] If the two target images contain a target region and the area of the target region is greater than a preset area threshold, then a keyframe is inserted into the two target images of the target video, wherein the distance between target pixels in the target region is less than a preset pixel threshold.
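One possible reading of the region test above is sketched below in plain Python. Several points are assumptions made for illustration and are not stated in the application: target pixels are grouped into a region when they can be chained together by steps shorter than the pixel threshold, the area of a region is taken to be its number of target pixels, and all function and parameter names are hypothetical.

```python
import math
from collections import deque
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int]  # (row, column)


def select_target_pixels(magnitudes: Sequence[Sequence[float]],
                         flow_threshold: float) -> List[Pixel]:
    """Keep pixels whose optical flow field value exceeds the second threshold."""
    return [(r, c)
            for r, row in enumerate(magnitudes)
            for c, value in enumerate(row)
            if value > flow_threshold]


def largest_region_area(pixels: Sequence[Pixel], pixel_threshold: float) -> int:
    """Size of the largest group of target pixels linked by short distances.

    Assumption: two target pixels are linked when their Euclidean distance
    is below pixel_threshold, and regions are the resulting chains.
    """
    unvisited = set(pixels)
    largest = 0
    while unvisited:
        queue = deque([unvisited.pop()])
        area = 1
        while queue:
            r, c = queue.popleft()
            near = [p for p in unvisited
                    if math.hypot(p[0] - r, p[1] - c) < pixel_threshold]
            for p in near:
                unvisited.remove(p)
                queue.append(p)
            area += len(near)
        largest = max(largest, area)
    return largest


def has_fast_region(magnitudes: Sequence[Sequence[float]],
                    flow_threshold: float,
                    pixel_threshold: float,
                    area_threshold: int) -> bool:
    """True when some region of fast pixels is larger than the area threshold."""
    pixels = select_target_pixels(magnitudes, flow_threshold)
    return largest_region_area(pixels, pixel_threshold) > area_threshold
```

The pairwise distance search is quadratic in the number of target pixels, which is acceptable for a sketch but would likely be replaced by a connected-component pass over a binary mask in practice.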

[0034] Furthermore, to achieve the above objectives, this application also proposes a keyframe insertion device, which includes:

[0035] The image acquisition module is used to acquire target video through a camera and traverse two adjacent target images in the target video;

[0036] The optical flow calculation module is used to calculate the average optical flow field value between the two target images;

[0037] A motion monitoring module is used to acquire the target pose of the camera when capturing the two target images; and to calculate the motion change of the camera between the two target images based on each target pose.

[0038] The keyframe insertion module is used to determine whether the average optical flow field value and the motion change amount meet the preset insertion conditions. If the insertion conditions are met, a keyframe is inserted into the two target images.

[0039] In addition, to achieve the above objectives, this application also proposes a head-mounted display device, the device comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, the computer program being configured to implement the steps of the keyframe insertion method as described above.

[0040] In addition, to achieve the above objectives, this application also proposes a storage medium, which is a computer-readable storage medium, on which a computer program is stored, and which, when executed by a processor, implements the steps of the keyframe insertion method described above.

[0041] In addition, to achieve the above objectives, this application also provides a computer program product, which includes a computer program that, when executed by a processor, implements the steps of the keyframe insertion method described above.

[0042] This application provides a keyframe insertion method. First, the target video into which the keyframe needs to be inserted is acquired. Then, two adjacent target images in the target video are traversed. Next, the average optical flow field value between the two adjacent target images is calculated. Then, it is detected whether the adjacent target images include rapidly changing regions. Then, the target pose of the camera when capturing the two adjacent target images is detected. Based on the target pose, the amount of motion change between the two adjacent target images can be obtained, thereby detecting the degree of scene change in the adjacent target images. Finally, based on the average optical flow field value and the amount of motion change, it is determined whether a keyframe needs to be inserted between the adjacent target images. If the insertion condition is met, the keyframe is inserted.

[0043] In summary, this application detects whether adjacent target images contain rapidly changing regions by detecting the average optical flow field value, and detects the degree of scene change in adjacent target images by detecting the amount of camera motion change. Compared with traditional keyframe insertion methods, this application considers the impact of rapidly moving objects in the image on pose estimation, uses the camera pose when capturing adjacent images as a reference indicator, reduces the impact of rapidly moving objects on map construction, and uses whether the images contain rapidly changing regions as a reference indicator to avoid redundancy or sparsity in keyframe insertion, thus balancing the number of keyframe insertions while ensuring recognition accuracy.

Attached Figure Description

[0044] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application.

[0045] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0046] Figure 1 is a flowchart illustrating an embodiment of the keyframe insertion method of this application;

[0047] Figure 2 is a schematic diagram of the scale transformation involved in one embodiment of the keyframe insertion method of this application;

[0048] Figure 3 is a flowchart illustrating Embodiment 2 of the keyframe insertion method of this application;

[0049] Figure 4 is a schematic diagram of the module structure of the keyframe insertion device according to an embodiment of this application;

[0050] Figure 5 is a schematic diagram of the device structure of the hardware operating environment involved in the keyframe insertion method in the embodiments of this application.

[0051] The purpose, features, and advantages of this application will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed Implementation

[0052] It should be understood that the specific embodiments described herein are merely illustrative of the technical solutions of this application and are not intended to limit this application.

[0053] To better understand the technical solution of this application, a detailed description will be provided below in conjunction with the accompanying drawings and specific implementation methods.

[0054] The main solution of this application embodiment is as follows: acquire target video through a camera, and traverse two adjacent target images in the target video; calculate the average optical flow field value between the two target images; acquire the target pose of the camera when capturing the two target images respectively; calculate the motion change of the camera between the two target images based on each target pose; determine whether the average optical flow field value and the motion change value meet the preset insertion conditions; if the insertion conditions are met, insert a keyframe in the two target images.

[0055] In this embodiment, for ease of description, the following description uses a head-mounted display device as the execution subject.

[0056] In SLAM technology, keyframe insertion is a core issue. This mechanism accurately captures and preserves the camera's pose at a specific moment and its associated important information (such as feature points and descriptors), laying a solid foundation for the entire system to maintain high-quality map construction. It also provides strong support for subsequent key steps such as map optimization and loop closure detection.

[0057] However, inserting too many keyframes will increase the computational burden and storage overhead of the system, while inserting too few keyframes may miss important environmental information. Furthermore, in dynamic scenes, moving objects can affect the tracking and matching of feature points, leading to incorrect pose estimation and thus affecting map construction.

[0058] Therefore, how to balance the number of keyframes inserted while ensuring recognition accuracy has become an urgent problem to be solved in this field.

[0059] To address the aforementioned issues, this application provides a keyframe insertion method. This method first acquires the target video into which keyframes need to be inserted, then iterates through two adjacent target images in the target video. Next, it calculates the average optical flow value between the two adjacent target images, detects whether the adjacent target images contain rapidly changing regions, and then detects the target pose of the camera when capturing the two adjacent target images. Based on the target pose, the amount of motion change between the two adjacent target images can be obtained, thereby detecting the degree of scene change in the adjacent target images. Finally, based on the average optical flow value and the amount of motion change, it is determined whether a keyframe needs to be inserted between the adjacent target images. If the insertion conditions are met, the keyframe is inserted.

[0060] In summary, this application detects whether adjacent target images contain rapidly changing regions by detecting the average optical flow field value, and detects the degree of scene change in adjacent target images by detecting the amount of camera motion change. Compared with traditional keyframe insertion methods, this application considers the impact of rapidly moving objects in the image on pose estimation, uses the camera pose when capturing adjacent images as a reference indicator, reduces the impact of rapidly moving objects on map construction, and uses whether the images contain rapidly changing regions as a reference indicator to avoid redundancy or sparsity in keyframe insertion, thus balancing the number of keyframe insertions while ensuring recognition accuracy.

[0061] It should be noted that the executing entity in this embodiment can be a computing service device with data processing, network communication, and program execution functions, such as a tablet computer, personal computer, or mobile phone, or an electronic device or head-mounted display device capable of performing the above functions. The following description uses a head-mounted display device as an example to illustrate this embodiment and the subsequent embodiments.

[0062] Based on this, embodiments of this application provide a keyframe insertion method. Refer to Figure 1, which is a flowchart illustrating the first embodiment of the keyframe insertion method of this application.

[0063] In this embodiment, the keyframe insertion method includes steps S10 to S50:

[0064] Step S10: Acquire the target video through the camera and traverse two adjacent target images in the target video;

[0065] In this embodiment, when running SLAM, in order to build and update the environment map and locate the head-mounted display device (hereinafter referred to as the head-mounted display) in real time, the head-mounted display acquires the target video through its onboard camera and traverses each pair of adjacent target images in the target video.
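The traversal in step S10 can be pictured as a sliding window of width two over the frame sequence. The sketch below is a minimal illustration in Python; the function name `adjacent_pairs` is hypothetical, and the frames can be any objects (here strings stand in for images).

```python
from typing import Iterable, Iterator, Tuple, TypeVar

Frame = TypeVar("Frame")


def adjacent_pairs(frames: Iterable[Frame]) -> Iterator[Tuple[Frame, Frame]]:
    """Yield (previous, current) for every two adjacent frames of a video."""
    iterator = iter(frames)
    try:
        previous = next(iterator)
    except StopIteration:
        return  # empty video: there is nothing to traverse
    for current in iterator:
        yield previous, current
        previous = current
```

Because the function consumes an iterator lazily, it also works on a live camera stream where the full video is never held in memory.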

[0066] Step S20: Calculate the average optical flow field value between the two target images;

[0067] It should be noted that, in this embodiment, optical flow field is a method to describe the motion of pixels in an image. The optical flow field can be obtained by calculating the displacement vectors of corresponding pixels in two adjacent frames.

[0068] In this embodiment, the average optical flow field value between the two target images is calculated. This value reflects the average speed and direction of the overall pixel motion in the image. The calculation of the average optical flow field value helps to evaluate the speed and direction of movement of the head-mounted display device in the environment, as well as the changes in the environment.

[0069] Furthermore, in a feasible implementation, step S20 described above may include steps A10 to A30:

[0070] Step A10: Calculate the displacement vector of each pixel in the two target images between the two target images to obtain the optical flow field of each pixel.

[0071] In this embodiment, for each pixel in each frame, its displacement vector between the two frames is calculated to obtain the optical flow field of each pixel.

[0072] Step A20: Calculate the optical flow field value of each optical flow field using the Euclidean norm;

[0073] In this embodiment, the Euclidean norm is then used to quantify the optical flow field value of each pixel. The Euclidean norm is a method for measuring the length of a vector, and here it is used to calculate the magnitude of the displacement vector of each pixel, that is, the distance that the point moves between two frames.

[0074] Step A30: Calculate the average value of the optical flow field value of each pixel to obtain the average optical flow field value between the two target images.

[0075] In this embodiment, the average optical flow field value between the two target images is obtained by averaging the optical flow field values of all pixels. This average value can comprehensively reflect the overall motion of the two frames and is one of the important indicators for evaluating the dynamic changes of the video sequence.

[0076] For example, the optical flow field of two adjacent frames, $V(x,y) = (u(x,y), v(x,y))$, is defined as the displacement vector of each pixel $(x,y)$ between images $I_t$ and $I_{t+1}$, where $u(x,y)$ and $v(x,y)$ represent the horizontal and vertical displacements, respectively.

[0077] (1) The magnitude of the optical flow field can be expressed by the Euclidean norm of the vector as:

[0078] $$\|V(x,y)\| = \sqrt{u(x,y)^2 + v(x,y)^2}$$

[0079] Where $\|V(x,y)\|$ represents the amount of motion at the pixel.

[0080] (2) The amount of motion in rapidly changing regions can be represented as the average motion of the optical flow field across the entire image:

[0081] $$M_{avg} = \frac{1}{N} \sum_{(x,y) \in \Omega} \|V(x,y)\|$$

[0082] Where $N$ is the total number of pixels in the image, and $\Omega$ is the image domain. $M_{avg}$ represents the average motion speed of the image.

[0083] By implementing the detailed steps described above, we can more accurately capture subtle motion changes of target objects in videos, which is of great significance for subsequent target location detection and intelligent keyframe insertion.
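Steps A10 to A30 can be sketched in a few lines once a dense optical flow field is available. The example below assumes the flow has already been estimated by some other means (the application does not name an estimator) and is stored as rows of (u, v) displacement pairs; the type alias and function names are hypothetical.

```python
import math
from typing import List, Tuple

# Rows of per-pixel (u, v) displacement vectors between the two frames.
FlowField = List[List[Tuple[float, float]]]


def flow_magnitudes(flow: FlowField) -> List[List[float]]:
    """Step A20: Euclidean norm of each pixel's displacement vector."""
    return [[math.hypot(u, v) for (u, v) in row] for row in flow]


def average_flow_value(flow: FlowField) -> float:
    """Step A30: mean of the per-pixel magnitudes over the whole image."""
    magnitudes = [m for row in flow_magnitudes(flow) for m in row]
    if not magnitudes:
        raise ValueError("flow field is empty")
    return sum(magnitudes) / len(magnitudes)
```

For a 2x2 field in which one pixel moves by (3, 4) and the others are static, the per-pixel magnitudes are 5, 0, 0, 0 and the average is 1.25.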

[0084] Alternatively, in another feasible implementation, refer to Figure 2, which is a schematic diagram illustrating the scale transformation process involved in one embodiment of the keyframe insertion method of this application. As shown in Figure 2, step S20 above may further include steps B10 to B40:

[0085] Step B10: Scale transformation is performed on the two target images to construct an image pyramid;

[0086] It should be noted that, in this embodiment, the image pyramid is a type of multi-scale image representation, an effective yet conceptually simple structure for interpreting images at multiple resolutions. An image pyramid is a series of images, all derived from the same original image, whose resolution gradually decreases from the bottom of the pyramid to the top. The higher the level, the smaller the image and the lower the resolution. The pyramid is obtained through stepwise downsampling until a certain termination condition is met.

[0087] In this embodiment, when calculating the optical flow field value, the scale of two consecutive selected frames can be transformed to construct an image pyramid. An image pyramid is a multi-resolution representation method that can analyze images at different scales, thereby better capturing features and motion at different scales.
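The stepwise downsampling described above can be sketched as follows. This is a simplified illustration under stated assumptions: each level halves the resolution by averaging non-overlapping 2x2 blocks (practical pipelines often apply a Gaussian blur before subsampling), and the termination condition is a hypothetical minimum side length `min_size`; neither choice is specified by the application.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of intensities


def downsample_by_two(image: Image) -> Image:
    """Halve each dimension by averaging non-overlapping 2x2 blocks."""
    height, width = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(width)
        ]
        for r in range(height)
    ]


def build_pyramid(image: Image, min_size: int = 2) -> List[Image]:
    """Return [original, half, quarter, ...], finest level first.

    Assumed termination condition: stop before either side of the next
    level would drop below min_size pixels.
    """
    pyramid = [image]
    while (len(pyramid[-1]) // 2 >= min_size
           and len(pyramid[-1][0]) // 2 >= min_size):
        pyramid.append(downsample_by_two(pyramid[-1]))
    return pyramid
```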

[0088] Step B20: Calculate the displacement vector of each pixel in the two target images between the two target images at each scale of the image pyramid to obtain the optical flow field of each pixel.

[0089] In this embodiment, the displacement vector of each pixel in each frame between two frames is calculated at each scale of the image pyramid to obtain the optical flow field of each pixel. By detecting the motion of pixels at multiple scales, the sensitivity to large-scale and smooth motion can be improved.

[0090] Step B30: Calculate the optical flow field value of each optical flow field using the Euclidean norm;

[0091] In this embodiment, the optical flow field value of each pixel at each scale is then calculated using the Euclidean norm.

[0092] Step B40: Perform a weighted average of the optical flow field values according to each of the scales to obtain the average optical flow field value between the two target images.

[0093] In this embodiment, the optical flow field values at each scale are weighted and averaged according to their importance to obtain the average optical flow field value between the two target images. The weights of each optical flow field value can be set according to the actual situation. When it is necessary to focus on a large area, the weights corresponding to larger scale levels can be increased, and vice versa. The weighted averaging process takes into account the influence of different scales on the final result. Generally, smaller scale images can provide finer motion details, while larger scale images are helpful in capturing a wide range of motion.

[0094] Through the above steps, this method can more comprehensively analyze the motion of target objects in target videos, and performs particularly well when dealing with complex scenes containing motion at multiple scales.
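Steps B30 and B40 can be sketched as a weighted average of per-scale mean magnitudes. The example assumes one flow field per pyramid level has already been computed, normalizes by the sum of the weights, and does not rescale displacements between levels; the application leaves these details open, so they are illustrative choices, as are the function names.

```python
import math
from typing import List, Sequence, Tuple

FlowField = List[List[Tuple[float, float]]]  # rows of (u, v) vectors


def mean_magnitude(flow: FlowField) -> float:
    """Mean Euclidean norm of the displacement vectors at one scale."""
    values = [math.hypot(u, v) for row in flow for (u, v) in row]
    if not values:
        raise ValueError("flow field is empty")
    return sum(values) / len(values)


def multiscale_average_flow(flows: Sequence[FlowField],
                            weights: Sequence[float]) -> float:
    """Weighted average of the per-scale mean flow magnitudes."""
    if not flows or len(flows) != len(weights):
        raise ValueError("need exactly one weight per scale")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(w * mean_magnitude(f) for f, w in zip(flows, weights))
    return weighted / total_weight
```

With a fine level whose mean magnitude is 5, a coarse level whose mean magnitude is 1, and weights 3 and 1, the result is (3·5 + 1·1) / 4 = 4.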

[0095] The above are only two feasible implementation methods of step S20 provided in this embodiment. This embodiment does not specifically limit the specific implementation method of step S20.

[0096] Step S30: Obtain the target pose of the camera when capturing the two target images;

[0097] In this embodiment, the head-mounted display also needs to acquire the position and orientation information of the camera when capturing these two frames of images, i.e., the target pose, through built-in sensors. The target pose reflects the camera's motion state in space.

[0098] Step S40: Calculate the motion change of the camera between the two target images based on the pose of each target;

[0099] In this embodiment, the target pose information is used to calculate the changes in position and orientation of the camera between two adjacent frames. The changes in position and orientation reflect the motion of the camera when capturing these two frames.

[0100] Further, in a feasible implementation, the target pose includes: a rotation matrix and a translation vector, and the motion changes include: a rate of change of position and a rotational angular velocity; step S40 above may include steps S41 to S42:

[0101] Step S41: Calculate the norm of the difference between each translation vector, and use the norm as the rate of change of position;

[0102] In this embodiment, after obtaining the target poses (each including a rotation matrix and a translation vector) of the camera when capturing the two adjacent target images, the difference between the two translation vectors is first calculated, and the norm of the difference (i.e., the length or modulus of the vector) is obtained. The norm reflects how far the camera position changed between the capture of the two adjacent images and is used as the rate of change of position.

[0103] Specifically, the camera pose acquired at time $t$ is denoted $T_t = [R_t, t_t]$, where $R_t$ is the rotation matrix and $t_t$ is the translation vector, and the camera pose acquired at time $t-1$ is denoted $T_{t-1} = [R_{t-1}, t_{t-1}]$. The rate of change of the camera position between the two time steps is calculated as follows:

[0104] Rate of change of position: $\Delta t = \|t_t - t_{t-1}\|$.
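Step S41 reduces to a vector norm. A minimal sketch, assuming translation vectors are given as plain sequences of coordinates; the function name is hypothetical.

```python
import math
from typing import Sequence


def position_change_rate(t_prev: Sequence[float],
                         t_curr: Sequence[float]) -> float:
    """Euclidean norm of the difference between two translation vectors."""
    if len(t_prev) != len(t_curr):
        raise ValueError("translation vectors must have the same length")
    return math.sqrt(sum((c - p) ** 2 for p, c in zip(t_prev, t_curr)))
```

For example, moving from the origin to (1, 2, 2) gives a value of 3. Note that the quantity is a per-frame-pair displacement; dividing by the frame interval would turn it into a speed, which the application does not require.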

[0105] Step S42: Calculate the rotational angular velocity based on each of the rotation matrices.

[0106] In this embodiment, the rotation matrix describes the change in orientation of the camera when capturing two adjacent frames. To calculate the rotation angular velocity, matrix operations or quaternion operations can be used to compare the two rotation matrices, thereby obtaining the rotation angular velocity of the camera during the capture of two adjacent frames. The angular velocity reflects the rate of rotation of the camera around its center of mass.

[0107] By analyzing the target pose (rotation matrix and translation vector), the changes in camera motion (rate of change of position and rotational angular velocity) can be accurately calculated. These quantified metrics provide a solid foundation for subsequent insertion condition determination, enabling the head-mounted display to more accurately identify the moments when keyframes need to be inserted.
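The application leaves the exact computation of step S42 open ("matrix operations or quaternion operations"). One common matrix-based approach, shown here as an assumption rather than as the application's method, is to form the relative rotation $R_{t-1}^{T} R_t$, recover its rotation angle from the trace, and divide by the time between frames. The `frame_interval` parameter and the function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # 3x3 rotation matrix as nested lists


def relative_rotation_angle(r_prev: Matrix, r_curr: Matrix) -> float:
    """Angle in radians of the relative rotation R_prev^T * R_curr.

    Uses trace(R_prev^T R_curr) = sum over i, j of r_prev[i][j] * r_curr[i][j]
    and angle = arccos((trace - 1) / 2).
    """
    trace = sum(r_prev[i][j] * r_curr[i][j]
                for i in range(3) for j in range(3))
    # Clamp to guard against round-off pushing the cosine outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.acos(cos_angle)


def rotational_angular_velocity(r_prev: Matrix, r_curr: Matrix,
                                frame_interval: float) -> float:
    """Rotation angle divided by the time between the two frames (rad/s)."""
    if frame_interval <= 0:
        raise ValueError("frame_interval must be positive")
    return relative_rotation_angle(r_prev, r_curr) / frame_interval
```

As a check, a 90 degree turn about the z axis captured over half a second corresponds to an angular velocity of π radians per second.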

[0108] Step S50: Determine whether the average optical flow field value and the motion change amount meet the preset insertion conditions. If the insertion conditions are met, insert a keyframe into the two target images.

[0109] In this embodiment, based on the data obtained in the preceding steps, the average optical flow field value and the amount of camera motion change are further analyzed to determine whether keyframes need to be inserted into the video stream. Here, the "insertion condition" can refer to the average optical flow field value showing significant motion change, or to the amount of camera motion change exceeding a certain threshold. When the condition is met, in order to improve the accuracy and efficiency of subsequent processing, the head-mounted display device inserts one or more keyframes between the two frames currently being processed.

[0110] This application detects whether adjacent target images contain rapidly changing regions by detecting the average optical flow field value, and detects the degree of scene change in adjacent target images by detecting the amount of camera motion change. Compared with traditional keyframe insertion methods, this application considers the impact of rapidly moving objects in the image on pose estimation, and uses the camera pose when capturing adjacent images as a reference indicator, reducing the impact of rapidly moving objects on map construction. At the same time, using whether the images contain rapidly changing regions as a reference indicator avoids redundancy or sparsity in keyframe insertion, thereby balancing the number of keyframe insertions while ensuring recognition accuracy.

[0111] Based on the first embodiment of this application, in the second embodiment of this application, content that is the same as or similar to the first embodiment can be found in the description above and is not repeated here. On this basis, the step in S50 above of determining whether the average optical flow field value and the amount of motion change meet the preset insertion conditions may include steps C10 to C20:

[0112] Step C10: Calculate the comprehensive score between the two target images based on the preset optical flow field weight, the average optical flow field value, the preset position weight, the position change rate, the preset angular velocity weight, and the rotational angular velocity.

[0113] In this embodiment, the importance of the average optical flow value, the rate of change of position, and the rotational angular velocity are first evaluated based on preset weight values (including optical flow field weight, position weight, and angular velocity weight). These weight values reflect the sensitivity and importance that the head-mounted display device attaches to different types of motion changes. Then, these weight values are multiplied by the corresponding motion change amounts, and the results are added together to obtain a comprehensive score. The comprehensive score reflects the overall degree of motion change between two adjacent target image frames.

[0114] Step C20: If the comprehensive score is greater than the preset score threshold, then the average optical flow field value and the motion change amount are determined to meet the preset insertion conditions.

[0115] In this embodiment, the calculated comprehensive score is compared with a preset scoring threshold. If the comprehensive score is greater than the scoring threshold, the motion change between two adjacent target images is considered significant enough, and a keyframe needs to be inserted between these two images to capture this change. Conversely, if the comprehensive score is less than or equal to the scoring threshold, the motion change between these two images is considered insufficient, and no keyframe needs to be inserted.

[0116] By weighting and jointly considering the average optical flow field value, the position change rate, and the rotational angular velocity, the overall motion change between two adjacent target images can be assessed more accurately. The decision on whether to insert a keyframe is then made from the comprehensive score and the preset score threshold. This decision method considers both the dynamic changes in image content and the camera's motion state, thereby improving the accuracy and efficiency of video processing. Furthermore, by adjusting the weight values and the score threshold, the head-mounted display device can be adapted to different application scenarios and needs.
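As an illustration of steps C10 and C20, the following minimal Python sketch computes the comprehensive score as a weighted sum and compares it with the score threshold. The names `ScoreWeights`, `comprehensive_score`, and `meets_insertion_condition` are hypothetical and do not appear in this application, and the numeric values used with them are arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreWeights:
    """Hypothetical container for the three preset weights."""
    flow: float      # optical flow field weight
    position: float  # position weight
    angular: float   # angular velocity weight


def comprehensive_score(mean_flow, position_rate, angular_velocity, weights):
    """Step C10: weighted sum of the three motion indicators."""
    return (weights.flow * mean_flow
            + weights.position * position_rate
            + weights.angular * angular_velocity)


def meets_insertion_condition(score, score_threshold):
    """Step C20: the condition is met only if the score is strictly greater."""
    return score > score_threshold
```

A score equal to the threshold does not trigger insertion, which matches the "less than or equal to" branch described for step C20.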

[0117] In another feasible implementation, the step of determining whether the average optical flow field value and the amount of motion change satisfy the preset insertion conditions in step S50 above may further include step C30:

[0118] Step C30: If the average optical flow field value is greater than a preset first optical flow field threshold, and/or the rotational angular velocity is greater than a preset angular velocity threshold, and/or the position change rate is greater than a preset position threshold, then it is determined that the average optical flow field value and the motion change amount meet the preset insertion conditions.

[0119] In this embodiment, three thresholds are preset: a first optical flow field threshold, an angular velocity threshold, and a position threshold. If the average optical flow field value is greater than the first optical flow field threshold (i.e., there is a rapidly moving region in two adjacent frames, requiring keyframes to be added to ensure that important environmental information is not missed), or if the rotational angular velocity is greater than the preset angular velocity threshold, or if the rate of change of position is greater than the preset position threshold (i.e., the motion amplitude of the head-mounted display device is large in two adjacent frames, requiring keyframes to be added to improve the accuracy of pose estimation), then the preset insertion conditions are directly determined to be met, and keyframes need to be inserted between the current two frames of the target video.

[0120] For example, to help understand the implementation flow of the keyframe insertion method obtained by combining this embodiment with the first embodiment described above, please refer to Figure 3, which provides a simplified flowchart of a keyframe insertion method. Specifically:

[0121] When inserting keyframes, the image at time T is first input, and the image at time T-1 is retrieved from the video frame queue. Then, in the optical flow calculation module, the magnitude of the optical flow field is obtained, and the average motion of the optical flow field over the image is calculated. Simultaneously, in the motion monitoring module, the rate of change of position and the rotational angular velocity are calculated through the model. If the average optical flow field value is greater than the preset first optical flow field threshold, a keyframe is inserted; alternatively, if the rate of change of position Δt is greater than the position threshold Δt_thr, or the rotational angular velocity Δω is greater than the angular velocity threshold Δω_thr, a keyframe is inserted.

[0122] Then, a threshold is applied to the total score: when the total score exceeds the threshold, a keyframe is inserted; otherwise, no keyframe is inserted, and the next frame is processed.

[0123] Through an independent threshold judgment mechanism, the system can accurately assess the degree of motion change between two adjacent target images and make a decision on whether to insert a keyframe based on these assessment results.
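The flow described for Figure 3 can be sketched as follows. This is only one possible reading, assuming the independent thresholds are checked first and the weighted score is used as a fallback. The function name `should_insert_keyframe` and all parameter names are hypothetical.

```python
def should_insert_keyframe(mean_flow, position_rate, angular_velocity,
                           flow_thr, position_thr, angular_thr,
                           weights, score_thr):
    """Return True if a keyframe should be inserted for this frame pair.

    weights is a (flow, position, angular) tuple of preset weights.
    """
    # Independent threshold checks (step C30): any single indicator
    # exceeding its own threshold is enough.
    if (mean_flow > flow_thr
            or position_rate > position_thr
            or angular_velocity > angular_thr):
        return True

    # Assumed fallback: weighted score compared with the score threshold
    # (steps C10 and C20).
    w_flow, w_pos, w_ang = weights
    score = (w_flow * mean_flow
             + w_pos * position_rate
             + w_ang * angular_velocity)
    return score > score_thr
```

When this returns False, the caller would move on to the next frame pair without inserting a keyframe.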

[0124] The third embodiment of this application builds on the first and/or second embodiments. For content that is the same as or similar to the first and/or second embodiments, refer to the description above, which is not repeated here. Furthermore, after step A20 above, the method may further include steps D10 to D20:

[0125] Step D10: Select target pixels whose optical flow field value is greater than a preset second optical flow field threshold from among the pixels.

[0126] In this embodiment, a second optical flow field threshold is set, and target pixels whose optical flow field values are greater than this threshold are selected from all pixels. These target pixels represent regions of the image where motion is significant.

[0127] Step D20: If the two target images contain a target region and the area of the target region is greater than a preset area threshold, then insert a keyframe into the two target images of the target video, wherein the distance between each target pixel in the target region is less than a preset pixel threshold.

[0128] In this embodiment, it is checked whether the selected target pixels form a connected region (i.e., a target region). Specifically, if the distance between target pixels is less than a preset pixel threshold, these pixels are considered to belong to the same target region. The area of the target region, i.e., the number of target pixels contained in the target region, is then calculated. If the area of the target region is greater than a preset area threshold, it is determined that a keyframe needs to be inserted between the current two frames of the target video.

[0129] When determining the target region, the target pixels in the image domain can be traversed. For each target pixel, it is determined whether any of its adjacent pixels are also target pixels. If so, the target pixel and those adjacent target pixels are taken as an initial target region. It is then determined whether any pixels adjacent to the initial target region are target pixels; if so, they are added to the initial target region. This process continues until no target pixels remain adjacent to the initial target region, at which point the initial target region is taken as a target region. The remaining isolated target pixels are then traversed, and the region aggregation operation is performed in the same way until the image domain contains only target regions and isolated target pixels. This results in multiple target regions and multiple isolated target pixels.
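The region aggregation described above resembles a standard breadth-first region-growing pass. The sketch below is a minimal pure-Python illustration, assuming that "adjacent" means the 8-neighbourhood of a pixel (this application instead specifies a preset pixel distance threshold) and that the per-pixel optical flow field values are supplied as a list of rows. The names `target_region_areas` and `region_triggers_keyframe` are hypothetical.

```python
from collections import deque


def target_region_areas(flow_magnitude, second_flow_thr):
    """Return the pixel count of every group of connected target pixels.

    An isolated target pixel is reported as a group of area 1.
    """
    height = len(flow_magnitude)
    width = len(flow_magnitude[0]) if height else 0
    is_target = [[v > second_flow_thr for v in row] for row in flow_magnitude]
    visited = [[False] * width for _ in range(height)]
    areas = []
    for row in range(height):
        for col in range(width):
            if not is_target[row][col] or visited[row][col]:
                continue
            # Grow a region outward from this unvisited target pixel.
            visited[row][col] = True
            queue = deque([(row, col)])
            area = 0
            while queue:
                r, c = queue.popleft()
                area += 1
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < height and 0 <= nc < width
                                and is_target[nr][nc]
                                and not visited[nr][nc]):
                            visited[nr][nc] = True
                            queue.append((nr, nc))
            areas.append(area)
    return areas


def region_triggers_keyframe(flow_magnitude, second_flow_thr, area_thr):
    """Step D20: True if any region's area exceeds the area threshold."""
    areas = target_region_areas(flow_magnitude, second_flow_thr)
    return any(area > area_thr for area in areas)
```

Each pixel is visited at most once, so the pass is linear in the number of pixels.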

[0130] By selecting target pixels with larger optical flow field values, the system can more accurately identify regions with significant motion in the image, thereby improving detection accuracy.

[0131] It should be noted that the above examples are only for understanding this application and do not constitute a limitation on the keyframe insertion method of this application. Any simple transformations based on this technical concept are all within the protection scope of this application.

[0132] This application also provides a keyframe insertion device. Referring to Figure 4, the keyframe insertion device includes:

[0133] The image acquisition module 10 is used to acquire target video through a camera and traverse two adjacent target images in the target video;

[0134] Optical flow calculation module 20 is used to calculate the average optical flow field value between the two target images;

[0135] The motion monitoring module 30 is used to acquire the target pose of the camera when capturing the two target images respectively; and to calculate the motion change of the camera between the two target images based on each target pose.

[0136] The keyframe insertion module 40 is used to determine whether the average optical flow field value and the motion change amount meet the preset insertion conditions. If the insertion conditions are met, a keyframe is inserted into the two target images.

[0137] Optionally, the optical flow calculation module 20 is also used for:

[0138] Calculate the displacement vector of each pixel in the two target images between the two target images to obtain the optical flow field of each pixel;

[0139] The optical flow field value of each optical flow field is calculated using the Euclidean norm.

[0140] The optical flow field values of the pixels are averaged to obtain the average optical flow field value between the two target images.
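The three operations above can be sketched in a few lines of Python, assuming the per-pixel displacement vectors have already been estimated by some dense optical flow method and are supplied as rows of `(dx, dy)` pairs. The function name `mean_flow_magnitude` is hypothetical.

```python
import math


def mean_flow_magnitude(flow_field):
    """Average Euclidean norm of the per-pixel displacement vectors.

    flow_field: list of rows, each row a list of (dx, dy) tuples.
    """
    magnitudes = [math.hypot(dx, dy) for row in flow_field for dx, dy in row]
    if not magnitudes:
        raise ValueError("flow field is empty")
    return sum(magnitudes) / len(magnitudes)
```

The estimation of the displacement vectors themselves is outside the scope of this sketch.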

[0141] Optionally, the optical flow calculation module 20 is also used for:

[0142] The two target images are scaled to construct an image pyramid;

[0143] At each scale of the image pyramid, the displacement vector of each pixel in the two target images between the two target images is calculated to obtain the optical flow field of each pixel.

[0144] The optical flow field value of each optical flow field is calculated using the Euclidean norm.

[0145] The average optical flow field value between the two target images is obtained by weighting and averaging the optical flow field values according to each scale.
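One way to read the multi-scale variant is to average the optical flow field values within each pyramid scale and then combine the per-scale averages with per-scale weights. The sketch below follows that reading. The normalisation by the sum of the weights and the names `scale_mean_magnitude` and `pyramid_mean_flow` are assumptions and are not taken from this application.

```python
import math


def scale_mean_magnitude(flow_field):
    """Mean Euclidean norm of the (dx, dy) vectors at one pyramid scale."""
    magnitudes = [math.hypot(dx, dy) for row in flow_field for dx, dy in row]
    if not magnitudes:
        raise ValueError("flow field is empty")
    return sum(magnitudes) / len(magnitudes)


def pyramid_mean_flow(flow_fields_by_scale, scale_weights):
    """Weighted average of the per-scale mean optical flow field values."""
    if len(flow_fields_by_scale) != len(scale_weights):
        raise ValueError("one weight is required per scale")
    total_weight = sum(scale_weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(
        weight * scale_mean_magnitude(field)
        for field, weight in zip(flow_fields_by_scale, scale_weights)
    )
    return weighted_sum / total_weight
```

Displacements measured at coarser scales are smaller in pixel units. Whether they should be rescaled to the original resolution before weighting is not stated here, so the sketch leaves them unscaled.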

[0146] Optionally, the target pose includes: a rotation matrix and a translation vector, and the motion changes include: the rate of change of position and the rotational angular velocity; the motion monitoring module 30 is further configured to:

[0147] Calculate the norm of the difference between each translation vector, and use the norm as the rate of change of position;

[0148] The rotational angular velocity is calculated based on each of the rotation matrices.
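A minimal sketch of the two motion quantities follows. The position change rate is the norm of the translation difference, as stated above. For the rotational angular velocity, the sketch assumes the angle of the relative rotation between the two poses, divided by the frame interval. That formula, the `frame_interval` parameter, and the function names are illustrative assumptions.

```python
import math


def position_change_rate(t_prev, t_curr):
    """Euclidean norm of the difference between two translation vectors."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(t_prev, t_curr)))


def rotation_angle(r_prev, r_curr):
    """Angle (radians) of the relative rotation between two 3x3 matrices.

    Uses trace(R_prev^T R_curr) = sum_ij R_prev[i][j] * R_curr[i][j].
    """
    trace = sum(r_prev[i][j] * r_curr[i][j]
                for i in range(3) for j in range(3))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.acos(cos_angle)


def rotational_angular_velocity(r_prev, r_curr, frame_interval):
    """Assumed definition: relative rotation angle per unit time."""
    if frame_interval <= 0:
        raise ValueError("frame_interval must be positive")
    return rotation_angle(r_prev, r_curr) / frame_interval
```

For example, a quarter turn about one axis between frames captured 0.5 s apart gives an angular velocity of π rad/s under this definition.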

[0149] Optionally, the keyframe insertion module 40 is also used for:

[0150] The comprehensive score between the two target images is calculated based on the preset optical flow field weight, the average optical flow field value, the preset position weight, the position change rate, the preset angular velocity weight, and the rotational angular velocity.

[0151] If the comprehensive score is greater than the preset score threshold, then the average optical flow field value and the motion change amount are determined to meet the preset insertion conditions.

[0152] Optionally, the keyframe insertion module 40 is also used for:

[0153] If the average optical flow field value is greater than a preset first optical flow field threshold, and/or the rotational angular velocity is greater than a preset angular velocity threshold, and/or the position change rate is greater than a preset position threshold, then the average optical flow field value and the motion change amount are determined to meet the preset insertion conditions.

[0154] Optionally, the keyframe insertion device is also used for:

[0155] Target pixels with optical flow field values greater than a preset second optical flow field threshold are selected from each of the pixels.

[0156] If the two target images contain a target region and the area of the target region is greater than a preset area threshold, then a keyframe is inserted into the two target images of the target video, wherein the distance between each target pixel in the target region is less than a preset pixel threshold.

[0157] The keyframe insertion device provided in this application, employing the keyframe insertion method described in the above embodiments, can solve the technical problem of balancing the number of keyframes inserted while ensuring recognition accuracy. Compared with the prior art, the beneficial effects of the keyframe insertion device provided in this application are the same as those of the keyframe insertion method described in the above embodiments, and other technical features in the keyframe insertion device are the same as those disclosed in the methods of the above embodiments, and will not be repeated here.

[0158] This application provides a head-mounted display device, which includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, which are executed by the at least one processor to enable the at least one processor to perform the keyframe insertion method in Embodiment 1 above.

[0159] The head-mounted display device in this application embodiment may include, but is not limited to, head-mounted display devices such as Mixed Reality (MR) devices (e.g., MR glasses or MR helmets), Augmented Reality (AR) devices (e.g., AR glasses or AR helmets), Virtual Reality (VR) devices (e.g., VR glasses or VR helmets), Extended Reality (XR) devices, or some combination thereof.

[0160] As shown in Figure 5, the head-mounted display device may include a processing unit 1001 (e.g., a central processing unit, a graphics processing unit, etc.), which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1002 or a program loaded from a storage device 1003 into a random access memory (RAM) 1004. The RAM 1004 also stores various programs and data required for the operation of the head-mounted display device. The processing unit 1001, ROM 1002, and RAM 1004 are interconnected via a bus 1005. An input/output (I/O) interface 1006 is also connected to the bus. Typically, the following systems can be connected to the I/O interface 1006: input devices 1007 including, for example, a touchscreen, touchpad, keyboard, mouse, image sensor, microphone, accelerometer, gyroscope, etc.; output devices 1008 including, for example, a liquid crystal display (LCD), speaker, vibrator, etc.; storage devices 1003 including, for example, magnetic tape, hard disk, etc.; and communication devices 1009. The communication device 1009 allows the head-mounted display device to communicate with other devices over a wireless or wired connection to exchange data. Although head-mounted display devices with various systems are shown in the figures, it should be understood that not all of the systems shown need to be implemented or present. More or fewer systems may be implemented instead.

[0161] Specifically, according to the embodiments disclosed in this application, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments disclosed in this application include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via a communication device, or installed from storage device 1003, or installed from ROM 1002. When the computer program is executed by processing unit 1001, it performs the functions defined in the methods of the embodiments disclosed in this application.

[0162] The head-mounted display device provided in this application, employing the keyframe insertion method described in the above embodiments, solves the technical problem of balancing the number of keyframes inserted while ensuring recognition accuracy. Compared with the prior art, the beneficial effects of the head-mounted display device provided in this application are the same as those of the keyframe insertion method described in the above embodiments, and other technical features of this head-mounted display device are the same as those disclosed in the previous embodiment method, and will not be repeated here.

[0163] It should be understood that the various parts disclosed in this application can be implemented using hardware, software, firmware, or a combination thereof. In the description of the above embodiments, specific features, structures, materials, or characteristics can be combined in any suitable manner in one or more embodiments or examples.

[0164] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

[0165] This application provides a computer-readable storage medium having computer-readable program instructions (i.e., a computer program) stored thereon, the computer-readable program instructions being used to execute the keyframe insertion method in the above embodiments.

[0166] The computer-readable storage medium provided in this application may be, for example, a USB flash drive, but is not limited thereto; it may also be an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: electrical connections having one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fibers, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof. In this embodiment, the computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. The program code contained on the computer-readable storage medium may be transmitted using any suitable medium, including but not limited to: wires, optical cables, RF (Radio Frequency), etc., or any suitable combination thereof.

[0167] The aforementioned computer-readable storage medium may be included in the head-mounted display device; or it may exist independently and not assembled into the head-mounted display device.

[0168] The aforementioned computer-readable storage medium carries one or more programs that, when executed by a head-mounted display device, cause the head-mounted display device to: acquire a target video using a camera and traverse two adjacent target images in the target video; calculate the average optical flow field value between the two target images; acquire the target pose of the camera when capturing the two target images; calculate the motion change of the camera between the two target images based on each target pose; determine whether the average optical flow field value and the motion change value satisfy a preset insertion condition; and if the insertion condition is satisfied, insert a keyframe into the two target images.

[0169] Computer program code for performing the operations of this application can be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving remote computers, the remote computer can be connected to the user's computer via any type of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (e.g., via the Internet using an Internet service provider).

[0170] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.

[0171] The modules described in the embodiments of this application can be implemented in software or hardware. The names of the modules do not necessarily limit the functionality of the unit itself.

[0172] The readable storage medium provided in this application is a computer-readable storage medium that stores computer-readable program instructions (i.e., a computer program) for executing the above-described keyframe insertion method, thereby solving the technical problem of balancing the number of keyframes inserted while ensuring recognition accuracy. Compared with the prior art, the beneficial effects of the computer-readable storage medium provided in this application are the same as those of the keyframe insertion method provided in the above embodiments, and will not be repeated here.

[0173] This application also provides a computer program product, including a computer program that, when executed by a processor, implements the steps of the keyframe insertion method described above.

[0174] The computer program product provided in this application solves the technical problem of balancing the number of keyframes inserted while ensuring recognition accuracy. Compared with the prior art, the beneficial effects of the computer program product provided in this application are the same as those of the keyframe insertion method provided in the above embodiments, and will not be repeated here.

[0175] The above description is only a part of the embodiments of this application and does not limit the patent scope of this application. All equivalent structural transformations made under the technical concept of this application and using the contents of the specification and drawings of this application, or direct / indirect applications in other related technical fields, are included in the patent protection scope of this application.

Claims

1. A keyframe insertion method, characterized in that, The method includes: The target video is acquired by the camera, and the target images of two adjacent frames in the target video are traversed. Calculate the average optical flow field value between the two target images; The target pose of the camera when capturing the two target images is obtained respectively; Calculate the motion change of the camera between the two target images based on the respective target poses; Determine whether the average optical flow field value and the amount of motion change meet the preset insertion conditions. If the insertion conditions are met, insert a keyframe into the two target images.

2. The keyframe insertion method as described in claim 1, characterized in that, The step of calculating the average optical flow field value between the two target images includes: Calculate the displacement vector of each pixel in the two target images between the two target images to obtain the optical flow field of each pixel; The optical flow field values of each optical flow field are calculated using Euclidean norm. The average optical flow field value of each pixel is calculated to obtain the average optical flow field value between the two target images.

3. The keyframe insertion method as described in claim 1, characterized in that, The step of calculating the average optical flow field value between the two target images includes: The two target images are scaled to construct an image pyramid; At each scale of the image pyramid, the displacement vector of each pixel in the two target images between the two target images is calculated to obtain the optical flow field of each pixel. The optical flow field values of each optical flow field are calculated using Euclidean norm. The average optical flow field value between the two target images is obtained by weighting and averaging the optical flow field values according to each scale.

4. The keyframe insertion method as described in claim 1, characterized in that, The target pose includes: a rotation matrix and a translation vector, and the motion changes include: the rate of change of position and the rotational angular velocity; The step of calculating the motion change of the camera between the two target images based on the respective target poses includes: Calculate the norm of the difference between each translation vector, and use the norm as the rate of change of position; The rotational angular velocity is calculated based on each of the rotation matrices.

5. The keyframe insertion method as described in claim 4, characterized in that, The step of determining whether the average optical flow field value and the change in motion meet the preset insertion conditions includes: The comprehensive score between the two target images is calculated based on the preset optical flow field weight, the average optical flow field value, the preset position weight, the position change rate, the preset angular velocity weight, and the rotational angular velocity. If the overall score is greater than the preset score threshold, then the average optical flow field value and the motion change amount are determined to meet the preset insertion conditions.

6. The keyframe insertion method as described in claim 5, characterized in that, The step of determining whether the average optical flow field value and the change in motion meet the preset insertion conditions further includes: If the average optical flow field value is greater than a preset first optical flow field threshold, and / or the rotational angular velocity is greater than a preset angular velocity threshold, and / or the position change rate is greater than a preset position threshold, then the average optical flow field value and each of the detection confidence scores are determined to meet preset insertion conditions.

7. The keyframe insertion method as described in claim 2, characterized in that, After the step of calculating the optical flow field values of each optical flow field using the Euclidean norm, the method further includes: Target pixels with optical flow field values greater than a preset second optical flow field threshold are selected from each of the pixels. If the two target images contain a target region and the area of the target region is greater than a preset area threshold, then a keyframe is inserted into the two target images of the target video, wherein the distance between each target pixel in the target region is less than a preset pixel threshold.

8. A keyframe insertion device, characterized in that, The keyframe insertion device includes: The image acquisition module is used to acquire target video through a camera and traverse two adjacent target images in the target video; The optical flow calculation module is used to calculate the average optical flow field value between the two target images; A motion monitoring module is used to acquire the target pose of the camera when capturing the two target images; and to calculate the motion change of the camera between the two target images based on each target pose. The keyframe insertion module is used to determine whether the average optical flow field value and the motion change amount meet the preset insertion conditions. If the insertion conditions are met, a keyframe is inserted into the two target images.

9. A head-mounted display device, characterized in that, The head-mounted display device includes: a memory, a processor, and a computer program stored in the memory and executable on the processor, the computer program being configured to implement the steps of the keyframe insertion method as described in any one of claims 1 to 7.

10. A storage medium, characterized in that, The storage medium is a computer-readable storage medium, and a computer program is stored on the storage medium. When the computer program is executed by a processor, it implements the steps of the keyframe insertion method as described in any one of claims 1 to 7.