Keyframe insertion method and device, head-mounted display device and storage medium

Whether to insert a keyframe is determined from the average optical flow field value and the detection confidence of the target images. This addresses the problem of balancing the number of keyframes inserted in SLAM technology and improves recognition accuracy and system performance.

CN122199655A · Pending · Publication Date: 2026-06-12 · GOERTEK INC

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
GOERTEK INC
Filing Date
2024-12-04
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

In SLAM technology, the number of inserted keyframes must be balanced: enough keyframes are needed to ensure recognition accuracy, but too many impose computational and storage overhead. In dynamic scenes, moving objects must also be prevented from disrupting feature point tracking and matching.

Method used

Whether the insertion conditions are met is determined from the average optical flow field value between the target images and the confidence level of the detected target objects. If they are met, a keyframe is inserted. This takes into account the influence of rapidly changing and difficult-to-track regions in the image.

Benefits of technology

While ensuring recognition accuracy, the method balances the number of inserted keyframes, reducing the impact of fast-moving objects on map construction and avoiding redundant or sparse insertions.

✦ Generated by Eureka AI based on patent content.


Abstract

The application discloses a key frame insertion method and device, a head-mounted display device and a storage medium, and relates to the technical field of computer vision. The method comprises the following steps: acquiring any two adjacent target images in a target video; calculating average optical flow field values between the target images; detecting target positions of target objects in the target images, and acquiring detection confidence degrees corresponding to the target positions; judging whether the average optical flow field values and the detection confidence degrees meet preset insertion conditions; and if the insertion conditions are met, inserting key frames into the target images of the target video. The application reduces the influence of fast-moving objects on map construction, avoids redundancy or sparseness of key frame insertion, and balances the number of key frame insertion while ensuring recognition accuracy.

Description

Technical Field

[0001] This application relates to the field of computer vision technology, and in particular to a keyframe insertion method, apparatus, head-mounted display device, and storage medium.

Background Technology

[0002] In SLAM (Simultaneous Localization and Mapping) technology, keyframe insertion is a core issue. This mechanism accurately captures and saves the camera's pose at a specific moment and its associated important information (such as feature points and descriptors), laying a solid foundation for the entire system to maintain high-quality map construction. It also provides strong support for subsequent key steps such as map optimization and loop closure detection.

[0003] However, inserting too many keyframes will increase the computational burden and storage overhead of the system, while inserting too few keyframes may miss important environmental information. Furthermore, in dynamic scenes, moving objects can affect the tracking and matching of feature points, leading to incorrect pose estimation and thus affecting map construction.

[0004] Therefore, how to balance the number of keyframes inserted while ensuring recognition accuracy has become an urgent problem to be solved in this field.

[0005] The above content is only used to help understand the technical solution of this application and does not constitute an admission that the above content is prior art.

Summary of the Invention

[0006] The main objective of this application is to provide a keyframe insertion method, apparatus, head-mounted display device, and storage medium, aiming to solve the technical problem of balancing the number of keyframes inserted while ensuring recognition accuracy.

[0007] To achieve the above objectives, this application proposes a keyframe insertion method, which includes:

[0008] Obtain any two adjacent target images from the target video;

[0009] Calculate the average optical flow field value between the target images;

[0010] Detect the target location of each target object in each target image, and obtain the detection confidence score corresponding to each target location;

[0011] Determine whether the average optical flow field value and each of the detection confidence scores meet the preset insertion conditions. If the insertion conditions are met, then insert keyframes into each of the target images in the target video.

[0012] In one embodiment, the step of calculating the average optical flow field value between the target images includes:

[0013] Calculate the displacement vector of each pixel in each target image between the target images to obtain the optical flow field of each pixel;

[0014] The optical flow field value of each optical flow field is calculated using the Euclidean norm;

[0015] The optical flow field values of all pixels are averaged to obtain the average optical flow field value between the target images.

[0016] In one embodiment, the step of calculating the average optical flow field value between the target images includes:

[0017] Scale transformation is performed on each of the target images to construct an image pyramid;

[0018] At each scale of the image pyramid, the displacement vector of each pixel in each target image is calculated between the target images to obtain the optical flow field of each pixel.

[0019] The optical flow field value of each optical flow field is calculated using the Euclidean norm;

[0020] The average optical flow field value between the target images is obtained by computing a weighted average of the optical flow field values according to their scales.

[0021] In one embodiment, the step of detecting the target location of each target object in each target image and obtaining the detection confidence score corresponding to each target location includes:

[0022] Each of the target images is input into a preset target detection model to obtain a set of target objects output by the target detection model. The set of target objects includes the target location of each target object and the detection confidence level corresponding to each target location.

[0023] In one embodiment, the step of determining whether the average optical flow field value and each of the detection confidence levels meet the preset insertion conditions includes:

[0024] Calculate the confidence complement for each of the aforementioned detection confidence levels;

[0025] Calculate the mean of the confidence complements to obtain the average detection difficulty over the target locations;

[0026] The comprehensive score between the target images is calculated based on the preset optical flow field weight, the average optical flow field value, the preset confidence weight, and the average detection difficulty.

[0027] If the comprehensive score is greater than the preset score threshold, then the average optical flow field value and each of the detection confidence scores are determined to meet the preset insertion conditions.

[0028] In one embodiment, after the step of calculating the mean of each confidence complement to obtain the average detection difficulty of each target location, the method further includes:

[0029] If the average optical flow field value is greater than a preset first optical flow field threshold and/or the average detection difficulty is greater than a preset confidence threshold, then the average optical flow field value and each of the detection confidence levels are determined to meet the preset insertion conditions.
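The threshold variant in [0029] can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function and parameter names are hypothetical, the threshold values are left to the caller because the patent does not give them, and the `require_both` flag is an assumed way of expressing the "and/or" wording (either both indicators must exceed their thresholds, or either one suffices).

```python
def meets_threshold_condition(
    avg_flow: float,
    avg_difficulty: float,
    flow_threshold: float,
    difficulty_threshold: float,
    require_both: bool = False,
) -> bool:
    """Threshold-based insertion test (hypothetical helper).

    avg_flow:             average optical flow field value between the two images
    avg_difficulty:       mean of (1 - c_i) over all detected target locations
    flow_threshold:       the "first optical flow field threshold"
    difficulty_threshold: the "confidence threshold" compared with avg_difficulty
    require_both:         True selects the "and" reading, False the "or" reading
    """
    flow_hit = avg_flow > flow_threshold
    difficulty_hit = avg_difficulty > difficulty_threshold
    if require_both:
        return flow_hit and difficulty_hit
    return flow_hit or difficulty_hit
```

With the "or" reading, strong motion alone (for example an average flow of 3.2 against a threshold of 2.0) is enough to trigger insertion even when detections are confident; the "and" reading inserts fewer keyframes.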

[0030] In one embodiment, after the step of calculating the optical flow field value of each optical flow field using the Euclidean norm, the method further includes:

[0031] Target pixels whose optical flow field values are greater than a preset second optical flow field threshold are selected from the pixels.

[0032] If each of the target images contains a target region whose area is greater than a preset area threshold, then a keyframe is inserted into each of the target images in the target video, wherein the distance between the target pixels in the target region is less than a preset pixel threshold.
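A minimal sketch of the [0031]-[0032] check follows. It assumes the per-pixel optical flow field values are already available as a 2-D NumPy array, and it reads "distance less than the pixel threshold" as single-linkage grouping: two target pixels join the same region when their Euclidean distance is below `pixel_threshold`. Region area is counted in pixels. The function and parameter names are hypothetical; the patent does not specify the grouping algorithm.

```python
from collections import deque

import numpy as np


def has_large_fast_region(
    magnitude: np.ndarray,
    flow_threshold: float,
    pixel_threshold: float,
    area_threshold: int,
) -> bool:
    """Return True if some group of fast-moving pixels covers more than area_threshold pixels."""
    mask = magnitude > flow_threshold  # target pixels: flow value above the second threshold
    h, w = mask.shape
    r = int(np.ceil(pixel_threshold))
    # Neighbor offsets whose Euclidean distance is strictly below pixel_threshold.
    offsets = [
        (dy, dx)
        for dy in range(-r, r + 1)
        for dx in range(-r, r + 1)
        if (dy, dx) != (0, 0) and dy * dy + dx * dx < pixel_threshold ** 2
    ]
    visited = np.zeros_like(mask, dtype=bool)
    for y0, x0 in zip(*np.nonzero(mask)):
        if visited[y0, x0]:
            continue
        # Breadth-first flood fill collects one region of linked target pixels.
        visited[y0, x0] = True
        queue = deque([(y0, x0)])
        area = 0
        while queue:
            y, x = queue.popleft()
            area += 1
            for dy, dx in offsets:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not visited[ny, nx]:
                    visited[ny, nx] = True
                    queue.append((ny, nx))
        if area > area_threshold:
            return True
    return False
```

With `pixel_threshold = 1.5` the grouping reduces to ordinary 8-connectivity; larger values merge scattered fast pixels into one region, which makes the check more permissive.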

[0033] Furthermore, to achieve the above objectives, this application also proposes a keyframe insertion device, which includes:

[0034] The image acquisition module is used to acquire any two adjacent target images in the target video;

[0035] An optical flow calculation module is used to calculate the average optical flow field value between the target images;

[0036] The target detection module is used to detect the target position of each target object in each target image and obtain the detection confidence score corresponding to each target position.

[0037] A keyframe insertion module is used to determine whether the average optical flow field value and each of the detection confidence scores meet preset insertion conditions. If the insertion conditions are met, a keyframe is inserted into each of the target images in the target video.

[0038] In addition, to achieve the above objectives, this application also proposes a head-mounted display device, the device comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, the computer program being configured to implement the steps of the keyframe insertion method as described above.

[0039] In addition, to achieve the above objectives, this application also proposes a storage medium, which is a computer-readable storage medium, on which a computer program is stored, and which, when executed by a processor, implements the steps of the keyframe insertion method described above.

[0040] In addition, to achieve the above objectives, this application also provides a computer program product, which includes a computer program that, when executed by a processor, implements the steps of the keyframe insertion method described above.

[0041] This application provides a keyframe insertion method. First, it acquires any two adjacent target images from the target video into which the keyframe needs to be inserted. Then, it calculates the average optical flow field value between the two adjacent target images, detects whether the adjacent target images contain rapidly changing regions, detects the target positions of each target object in the adjacent target images, and obtains the detection confidence for the target positions. This allows for the detection of regions that are difficult to track in the adjacent target images. Finally, it determines whether a keyframe needs to be inserted between the adjacent target images based on the average optical flow field value and the detection confidence. If the insertion conditions are met, the keyframe is inserted.

[0042] In summary, this application detects whether adjacent target images contain rapidly changing regions by detecting the average optical flow field value, and detects difficult-to-track regions in adjacent target images by detecting target objects and confidence levels. Compared with traditional keyframe insertion methods, this application considers the impact of fast-moving objects in the image on pose estimation, uses the existence of difficult-to-track regions in the image as a reference indicator, reduces the impact of fast-moving objects on map construction, and uses whether the images contain rapidly changing regions as a reference indicator to avoid redundancy or sparsity in keyframe insertion, thereby balancing the number of keyframe insertions while ensuring recognition accuracy.

Attached Figure Description

[0043] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application.

[0044] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0045] Figure 1 is a flowchart illustrating an embodiment of the keyframe insertion method of this application;

[0046] Figure 2 is a schematic diagram of a scenario involving target object detection in one embodiment of the keyframe insertion method of this application;

[0047] Figure 3 is a flowchart illustrating Embodiment 2 of the keyframe insertion method of this application;

[0048] Figure 4 is a schematic diagram of the module structure of the keyframe insertion device according to an embodiment of this application;

[0049] Figure 5 is a schematic diagram of the device structure of the hardware operating environment involved in the keyframe insertion method in the embodiments of this application.

[0050] The purpose, features, and advantages of this application will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed Implementation

[0051] It should be understood that the specific embodiments described herein are merely illustrative of the technical solutions of this application and are not intended to limit this application.

[0052] To better understand the technical solution of this application, a detailed description will be provided below in conjunction with the accompanying drawings and specific implementation methods.

[0053] The main solution of the embodiments of this application is: acquire any two adjacent target images in the target video; calculate the average optical flow field value between the target images; detect the target position of each target object in each target image and acquire the detection confidence level corresponding to each target position; determine whether the average optical flow field value and each detection confidence level meet the preset insertion conditions; and, if the insertion conditions are met, insert keyframes into the target images of the target video.

[0054] In this embodiment, for ease of description, the following description uses a head-mounted display device as the execution subject.

[0055] In SLAM technology, keyframe insertion is a core issue. This mechanism accurately captures and preserves the camera's pose at a specific moment and its associated important information (such as feature points and descriptors), laying a solid foundation for the entire system to maintain high-quality map construction. It also provides strong support for subsequent key steps such as map optimization and loop closure detection.

[0056] However, inserting too many keyframes will increase the computational burden and storage overhead of the system, while inserting too few keyframes may miss important environmental information. Furthermore, in dynamic scenes, moving objects can affect the tracking and matching of feature points, leading to incorrect pose estimation and thus affecting map construction.

[0057] Therefore, how to balance the number of keyframes inserted while ensuring recognition accuracy has become an urgent problem to be solved in this field.

[0058] To address the aforementioned issues, this application provides a keyframe insertion method. First, it acquires any two adjacent target images from the target video into which the keyframe needs to be inserted. Then, it calculates the average optical flow field value between the two adjacent target images, detects whether the adjacent target images contain rapidly changing regions, detects the target positions of each target object in the adjacent target images, and simultaneously obtains the detection confidence score for the target position detection. This allows for the detection of difficult-to-track regions in the adjacent target images. Finally, based on the average optical flow field value and the detection confidence score, it determines whether a keyframe needs to be inserted between the adjacent target images. If the insertion conditions are met, the keyframe is inserted.

[0059] In summary, this application detects whether adjacent target images contain rapidly changing regions by detecting the average optical flow field value, and detects difficult-to-track regions in adjacent target images by detecting target objects and confidence levels. Compared with traditional keyframe insertion methods, this application considers the impact of fast-moving objects in the image on pose estimation, uses the existence of difficult-to-track regions in the image as a reference indicator, reduces the impact of fast-moving objects on map construction, and uses whether the images contain rapidly changing regions as a reference indicator to avoid redundancy or sparsity in keyframe insertion, thereby balancing the number of keyframe insertions while ensuring recognition accuracy.

[0060] It should be noted that the executing entity in this embodiment can be a computing service device with data processing, network communication, and program execution functions, such as a tablet computer, personal computer, or mobile phone, or an electronic device or head-mounted display device capable of performing the above functions. The following description uses a head-mounted display device as an example to illustrate this embodiment and the subsequent embodiments.

[0061] Based on this, embodiments of this application provide a keyframe insertion method. Refer to Figure 1, which is a flowchart illustrating the first embodiment of the keyframe insertion method of this application.

[0062] In this embodiment, the keyframe insertion method includes steps S10 to S40:

[0063] Step S10: Obtain any two adjacent target images from the target video;

[0064] In this embodiment, when SLAM is performed to build and update the environment map and to locate the head-mounted display device in real time, the system selects any two consecutive frames from the video stream as the basic unit for processing.

[0065] Step S20: Calculate the average optical flow field value between the target images;

[0066] It should be noted that, in this embodiment, the optical flow field describes the motion of pixels in an image. It can be obtained by calculating the displacement vectors of corresponding pixels in two adjacent frames.

[0067] In this embodiment, the average optical flow field value between the target images is calculated. Because it averages the magnitudes of the per-pixel displacement vectors, this value reflects the overall amount of pixel movement in the image, that is, its average speed. Calculating it helps to evaluate how fast the head-mounted display device is moving in the environment, as well as changes in the environment.

[0068] Furthermore, in one feasible implementation, step S20 above may include steps A10 to A30:

[0069] Step A10: Calculate the displacement vector of each pixel in each target image between the target images to obtain the optical flow field of each pixel.

[0070] In this embodiment, for each pixel in each frame, its displacement vector between the two frames is calculated to obtain the optical flow field of each pixel.

[0071] Step A20: Calculate the optical flow field value of each optical flow field using the Euclidean norm;

[0072] In this embodiment, the Euclidean norm is then used to quantify the optical flow field value of each pixel. The Euclidean norm is a method for measuring the length of a vector, and here it is used to calculate the magnitude of the displacement vector of each pixel, that is, the distance that the point moves between two frames.

[0073] Step A30: Calculate the average value of the optical flow field value of each pixel to obtain the average optical flow field value among the target images.

[0074] In this embodiment, the average optical flow field value between the two target images is obtained by averaging the optical flow field values of all pixels. This average value comprehensively reflects the overall motion between the two frames and is one of the important indicators for evaluating the dynamic changes of the video sequence.

[0075] For example, the optical flow field $V(x,y) = (u(x,y), v(x,y))$ of two adjacent frames is defined as the displacement vector of each pixel $(x, y)$ between images $I_t$ and $I_{t+1}$, where $u(x,y)$ and $v(x,y)$ represent the horizontal and vertical displacements, respectively.

[0076] (1) The magnitude of the optical flow field can be expressed by the Euclidean norm of the vector as:

[0077] $\|V(x,y)\| = \sqrt{u(x,y)^2 + v(x,y)^2}$

[0078] where $\|V(x,y)\|$ is the magnitude of motion at pixel $(x, y)$.

[0079] (2) The amount of motion in rapidly changing regions can be represented as the average motion of the optical flow field across the entire image:

[0080] $M_{avg} = \frac{1}{N} \sum_{(x,y) \in \Omega} \|V(x,y)\|$

[0081] where $N$ is the total number of pixels in the image, $\Omega$ is the image domain, and $M_{avg}$ represents the average motion speed of the image.
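The Euclidean norm and averaging described in [0076] to [0081] translate directly into array code. This is a minimal sketch, assuming the horizontal and vertical displacement fields `u` and `v` have already been estimated by some dense optical flow method and are stored as equally shaped NumPy arrays; the function name is hypothetical.

```python
import numpy as np


def average_flow_magnitude(u: np.ndarray, v: np.ndarray) -> float:
    """M_avg: mean Euclidean norm of the per-pixel displacement vectors (u, v)."""
    if u.shape != v.shape:
        raise ValueError("u and v must have the same shape")
    magnitude = np.hypot(u, v)  # ||V(x, y)|| = sqrt(u^2 + v^2) at every pixel
    return float(magnitude.mean())  # average over all N pixels of the image domain
```

The patent does not name a particular optical flow estimator. As one possibility, OpenCV's dense Farneback method `cv2.calcOpticalFlowFarneback` returns an H×W×2 array whose last axis holds the (u, v) displacements, so `flow[..., 0]` and `flow[..., 1]` could be passed in directly.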

[0082] By implementing the detailed steps described above, we can more accurately capture subtle motion changes of target objects in videos, which is of great significance for subsequent target location detection and intelligent keyframe insertion.

[0083] In another feasible implementation, step S20 may further include steps B10 to B40:

[0084] Step B10: Scale transformation is performed on each of the target images to construct an image pyramid;

[0085] It should be noted that, in this embodiment, the image pyramid is a multi-scale image representation: an effective yet conceptually simple structure for interpreting images at multiple resolutions. An image pyramid is a series of images derived from the same original image whose resolution decreases step by step from the bottom of the pyramid to the top. The higher the level, the smaller the image and the lower the resolution. The pyramid is obtained by stepwise downsampling until a termination condition is met.

[0086] In this embodiment, when calculating the optical flow field value, the scale of two consecutive selected frames can be transformed to construct an image pyramid. An image pyramid is a multi-resolution representation method that can analyze images at different scales, thereby better capturing features and motion at different scales.

[0087] Step B20: Calculate the displacement vector of each pixel in each target image at each scale of the image pyramid to obtain the optical flow field of each pixel.

[0088] In this embodiment, the displacement vector of each pixel in each frame between two frames is calculated at each scale of the image pyramid to obtain the optical flow field of each pixel. By detecting the motion of pixels at multiple scales, the sensitivity to large-scale and smooth motion can be improved.

[0089] Step B30: Calculate the optical flow field value of each optical flow field using the Euclidean norm;

[0090] In this embodiment, the optical flow field value of each pixel at each scale is then calculated using the Euclidean norm.

[0091] Step B40: Perform a weighted average of the optical flow field values according to their scales to obtain the average optical flow field value between the target images.

[0092] In this embodiment, the optical flow field values at each scale are weighted and averaged according to their importance to obtain the average optical flow field value between the two target images. The weight of each scale can be set according to the actual situation: when it is necessary to focus on large areas, the weights of larger-scale (coarser) levels can be increased, and vice versa. The weighted averaging thus takes into account the influence of different scales on the final result. Generally, smaller-scale (finer) levels provide more detailed motion information, while larger-scale (coarser) levels help capture large-range motion.

[0093] Through the above steps, this method can more comprehensively analyze the motion of target objects in target videos, and performs particularly well when dealing with complex scenes containing motion at multiple scales.
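Steps B10 to B40 can be illustrated with the following sketch, under several assumptions that the patent leaves open: the pyramid is built by repeated 2×2 averaging (OpenCV's `cv2.pyrDown`, which applies Gaussian smoothing before downsampling, would be a common alternative); the per-level flow fields are supplied by an unspecified estimator; each level's mean magnitude is multiplied by `2 ** level` so that all levels are expressed in full-resolution pixel units; and the weights are normalized by their sum. All names are hypothetical.

```python
import numpy as np


def build_pyramid(image: np.ndarray, levels: int) -> list:
    """Level 0 is the original image; each higher level halves both sides by 2x2 averaging."""
    pyramid = [image.astype(np.float64)]
    for _ in range(1, levels):
        prev = pyramid[-1]
        h, w = (prev.shape[0] // 2) * 2, (prev.shape[1] // 2) * 2  # crop to even size
        if h == 0 or w == 0:
            break  # termination condition: image too small to downsample further
        p = prev[:h, :w]
        pyramid.append(0.25 * (p[0::2, 0::2] + p[1::2, 0::2] + p[0::2, 1::2] + p[1::2, 1::2]))
    return pyramid


def weighted_multiscale_flow(flows: list, weights: list) -> float:
    """Weighted average of per-level mean flow magnitudes.

    flows:   list of (u, v) array pairs, one per pyramid level (level 0 = full resolution)
    weights: one non-negative weight per level (hypothetical tuning parameters)
    """
    if len(flows) != len(weights) or sum(weights) <= 0:
        raise ValueError("need one weight per level and a positive weight sum")
    total = 0.0
    for level, ((u, v), w) in enumerate(zip(flows, weights)):
        # Assumption: rescale coarse-level displacements to full-resolution pixel units.
        level_mean = float(np.hypot(u, v).mean()) * (2 ** level)
        total += w * level_mean
    return total / sum(weights)
```

Raising the weights of coarser levels emphasizes large-area motion, in line with [0092]. Dropping the `2 ** level` rescaling is equally defensible if the weights are tuned for raw per-level magnitudes.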

[0094] The above are only two feasible implementation methods of step S20 provided in this embodiment. This embodiment does not specifically limit the specific implementation method of step S20.

[0095] Step S30: Detect the target position of each target object in each target image and obtain the detection confidence score corresponding to each target position;

[0096] In this embodiment, the device uses image processing algorithms (such as deep learning models) to detect target objects (such as furniture, walls, etc.) in a target image and determine their positions. Simultaneously, it needs to calculate the detection confidence score for each target position, i.e., an assessment of the accuracy of that target position. The confidence score reflects the difficulty the device faces in detecting the target object.

[0097] Furthermore, in one feasible implementation, step S30 above may include step S31:

[0098] Step S31: Input each of the target images into a preset target detection model to obtain a target object set output by the target detection model, wherein the target object set includes the target position of each target object and the detection confidence level corresponding to each target position.

[0099] In this embodiment, the two target images are first input into a preset target detection model. The YOLO target detection network (a deep learning target detection algorithm) is used to identify and locate the target objects in the images, and the model outputs a set of target objects, including the bounding box of each target object. Refer to Figure 2, which is a schematic diagram of a scene involving target object detection according to an embodiment of the keyframe insertion method of this application. As shown in Figure 2, each bounding box corresponds to a detected target object and its position in the image. The detection confidence of each target position represents the reliability of the model's judgment on that position and is usually a value between 0 and 1.

[0100] For example, the device uses the YOLO object detection network to identify objects in an image and to evaluate whether these objects are easy to track. The output of the object detection network can be defined as a set:

[0101] $O = \{o_1, o_2, \ldots, o_k\}$

[0102] Each object $o_i$ corresponds to a bounding box $B_i$ and a confidence level $c_i$, that is:

[0103] $o_i = (B_i, c_i)$

[0104] The bounding box $B_i$ is a rectangular region representing the location of the detected object, and the confidence level $c_i$ indicates the reliability of the detection.
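One way to hold the detection set O in code is a small immutable record per object. The (x_min, y_min, x_max, y_max) box layout and the raw `(x0, y0, x1, y1, score)` row format are assumptions made for illustration; real detectors, YOLO variants included, differ in how they report boxes, so the conversion function is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One element o_i = (B_i, c_i) of the detection set O."""

    box: tuple  # B_i as (x_min, y_min, x_max, y_max); this layout is an assumption
    confidence: float  # c_i, expected to lie in [0, 1]

    def __post_init__(self):
        x0, y0, x1, y1 = self.box
        if x1 < x0 or y1 < y0:
            raise ValueError("box must satisfy x_min <= x_max and y_min <= y_max")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")


def to_detection_set(raw_rows) -> list:
    """Convert hypothetical raw detector rows (x0, y0, x1, y1, score) into O = [o_1, ..., o_k]."""
    return [
        Detection(box=tuple(float(c) for c in row[:4]), confidence=float(row[4]))
        for row in raw_rows
    ]
```

Validating the confidence range at construction time keeps the later confidence-complement computation (1 - c_i) within [0, 1].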

[0105] By using a pre-trained object detection model, the system can accurately identify and locate various objects in the target image, providing reliable data support for subsequent processing.

[0106] Alternatively, feature extractors such as SIFT (Scale Invariant Feature Transform) or HOG (Histogram of Oriented Gradients) can be used in conjunction with machine learning classifiers such as SVM (Support Vector Machine) to detect and classify target objects, and the confidence level can be evaluated by the output of the classifier or the result of feature matching.

[0107] Although the method of identifying target objects through feature extractors may not perform well in the face of complex backgrounds or occlusions, it has low computational cost and can achieve relatively accurate identification if the background of the image is simple.

[0108] Step S40: Determine whether the average optical flow field value and each of the detection confidence scores meet the preset insertion conditions. If the insertion conditions are met, insert keyframes into each of the target images in the target video.

[0109] In this embodiment, based on the data obtained in the two preceding steps, the average optical flow field value and the detection confidence of each target location are further analyzed to determine whether keyframes need to be inserted into the video stream. Here, the insertion condition is met when the average optical flow field value shows significant motion changes or the detection confidence of the target locations falls below a certain threshold; one or more keyframes are then inserted between the currently processed frames. This improves the accuracy and efficiency of subsequent processing and ensures more stable and accurate tracking results under rapidly changing or highly uncertain conditions.

[0110] This application detects whether adjacent target images contain rapidly changing regions by detecting the average optical flow field value, and detects difficult-to-track regions in adjacent target images by detecting target objects and confidence levels. Compared with traditional keyframe insertion methods, this application considers the impact of fast-moving objects in the image on pose estimation, and uses the existence of difficult-to-track regions in the image as a reference indicator, reducing the impact of fast-moving objects on map construction. At the same time, using whether the images contain rapidly changing regions as a reference indicator avoids redundancy or sparsity in keyframe insertion, thereby balancing the number of keyframe insertions while ensuring recognition accuracy.

[0111] Based on the first embodiment of this application, in the second embodiment of this application, content that is the same as or similar to the first embodiment can be found in the description above and is not repeated here. On this basis, the step in step S40 of determining whether the average optical flow field value and each of the detection confidence levels meet the preset insertion conditions may include steps C10 to C40:

[0112] Step C10: Calculate the confidence complement of each detection confidence level;

[0113] In this embodiment, for the detection confidence $c_i$ of each target location, the confidence complement $1 - c_i$ is calculated. The confidence complement reflects the difficulty the model faces in detecting the target's location: a detection confidence close to 1 means the model is very confident about the target's location, whereas a detection confidence close to 0 means the model is very uncertain about it.

[0114] Step C20: Calculate the mean of each confidence complement to obtain the average detection difficulty of each target location;

[0115] In this embodiment, the confidence complements of all target locations are averaged to obtain the average detection difficulty. The higher the average detection difficulty, the higher the uncertainty of the overall detection, that is, the greater the detection difficulty.
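Steps C10 and C20 reduce to a mean of confidence complements. This is a minimal sketch; the function name is hypothetical, and returning 0 when no objects are detected is an assumption, since the patent does not address the empty case.

```python
def average_detection_difficulty(confidences) -> float:
    """Mean of the confidence complements (1 - c_i) over all detected target locations."""
    confidences = list(confidences)
    if not confidences:
        # Assumption: with no detections there is nothing hard to track.
        return 0.0
    complements = [1.0 - c for c in confidences]  # Step C10: confidence complements
    return sum(complements) / len(complements)  # Step C20: their mean
```

For confidences 0.9 and 0.5 the complements are 0.1 and 0.5, giving an average detection difficulty of 0.3.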

[0116] Step C30: Calculate the comprehensive score between each target image based on the preset optical flow field weight, the average optical flow field value, the preset confidence weight, and the average detection difficulty.

[0117] In this embodiment, a comprehensive score is calculated based on the preset optical flow field weight and confidence weight, combined with the average optical flow field value and the average detection difficulty. The final comprehensive score takes into account whether there are rapidly changing regions in the image and whether there are rapidly moving objects in the image that are difficult to detect.

[0118] Step C40: If the comprehensive score is greater than a preset score threshold, determine that the average optical flow field value and each detection confidence level meet the preset insertion conditions.

[0119] In this embodiment, a score threshold is set. If the comprehensive score is greater than this threshold, it is determined that the current average optical flow field value and detection confidence levels meet the preset insertion conditions, and a keyframe needs to be inserted between the current two frames of the target video.
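Steps C10 to C40 can be combined into a single decision function, sketched below. It assumes the comprehensive score is a plain weighted sum of the two terms (the form suggested by the "weighted overall score" described for Figure 3); the function names and parameter names are illustrative only.

```python
from __future__ import annotations

from typing import Sequence


def comprehensive_score(avg_flow: float, avg_difficulty: float,
                        flow_weight: float, confidence_weight: float) -> float:
    """Step C30: weighted combination of motion and detection difficulty.

    Assumption: the combination is a plain weighted sum.
    """
    return flow_weight * avg_flow + confidence_weight * avg_difficulty


def meets_insertion_condition(avg_flow: float, confidences: Sequence[float],
                              flow_weight: float, confidence_weight: float,
                              score_threshold: float) -> bool:
    """Steps C10 to C40: True when the comprehensive score exceeds the threshold."""
    complements = [1.0 - c for c in confidences]  # step C10
    # Step C20; an empty detection set counts as zero difficulty (assumption).
    avg_difficulty = sum(complements) / len(complements) if complements else 0.0
    score = comprehensive_score(avg_flow, avg_difficulty,
                                flow_weight, confidence_weight)
    return score > score_threshold  # step C40 uses a strict "greater than"
```

With an average flow of 2.0, confidences of 0.5 and 0.7, a flow weight of 0.5 and a confidence weight of 2.0, the score is 1.0 + 0.8 = 1.8, so a keyframe would be inserted for a threshold of 1.5 but not for a threshold of 2.0.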

[0120] For example, to help understand the implementation flow of the keyframe insertion method obtained by combining this embodiment with the first embodiment described above, refer to Figure 3, which provides a simplified flowchart of the keyframe insertion method, specifically:

[0121] When inserting keyframes, the image at time T is first input, and the image at time T-1 is retrieved from the video frame queue. Then, in the optical flow calculation module, the magnitude of the optical flow field is obtained first, and the average motion $M_{avg}$ of the optical flow field over the image is calculated. Meanwhile, in the object detection module, the model detects the bounding box $B_i$ and confidence level $c_i$ of each target object. Then, to comprehensively evaluate whether a keyframe needs to be inserted, a weighted overall score can be introduced, which sums the scores of rapidly changing regions and difficult-to-track regions. Define:

[0122] $W_{dy}$: the weight reflecting the importance of rapidly changing regions;

[0123] $W_{tr}$: the weight assigned to difficult-to-track regions.

[0124] The overall score $S_{total}$ can be expressed as:

[0125] $$S_{total} = W_{dy} \cdot M_{avg} + W_{tr} \cdot \frac{1}{N}\sum_{i=1}^{N}(1 - c_i)$$

[0126] where:

[0127] $M_{avg}$ is the average motion of the image;

[0128] $\frac{1}{N}\sum_{i=1}^{N}(1 - c_i)$ represents the average difficulty of tracking all difficult-to-track objects, $N$ being the number of detected target objects.

[0129] Then, a threshold $T_{kf}$ is set on the total score $S_{total}$, and a keyframe is inserted when the total score exceeds this threshold:

[0130] $$S_{total} > T_{kf}$$

[0131] Otherwise, do not insert a keyframe and continue processing the next frame.
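The Figure 3 flow can be sketched as a loop over a video. In this sketch, `compute_flow` and `detect` are hypothetical stand-ins for the optical flow calculation module and the object detection module; running detection only on the frame at time T, and using a one-frame deque as the video frame queue, are assumptions made for illustration.

```python
from __future__ import annotations

import math
from collections import deque
from typing import Any, Callable, Iterable, Sequence


def select_keyframes(
    frames: Iterable[Any],
    compute_flow: Callable[[Any, Any], Sequence[tuple[float, float]]],
    detect: Callable[[Any], Sequence[tuple[Any, float]]],
    w_dy: float,
    w_tr: float,
    t_kf: float,
) -> list[int]:
    """Return the indices T of frames at which S_total > T_kf."""
    queue = deque(maxlen=1)  # video frame queue holding the frame at time T-1
    keyframes = []
    for t, frame in enumerate(frames):
        if queue:
            flow = compute_flow(queue[-1], frame)  # per-pixel (dx, dy)
            if not flow:
                raise ValueError("empty optical flow field")
            # M_avg: mean Euclidean magnitude of the per-pixel displacements.
            m_avg = sum(math.hypot(dx, dy) for dx, dy in flow) / len(flow)
            confidences = [c for _box, c in detect(frame)]  # (B_i, c_i) pairs
            difficulty = (sum(1.0 - c for c in confidences) / len(confidences)
                          if confidences else 0.0)
            s_total = w_dy * m_avg + w_tr * difficulty
            if s_total > t_kf:
                keyframes.append(t)  # insert a keyframe at time T
        queue.append(frame)
    return keyframes
```

With toy frames, a constant flow equal to the frame difference, and a single detection of confidence 0.9 per frame, the frame sequence 0, 0, 5, 5 with all weights and the threshold set to 1.0 yields a keyframe only at index 2, where the motion jumps.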

[0132] By calculating the confidence complement and average detection difficulty through the above steps, the system can more accurately assess the uncertainty of target detection, thereby improving detection accuracy. Furthermore, the comprehensive scoring method considers both optical flow field value and detection confidence, making the system more robust when dealing with complex and variable scenes. In addition, by setting a reasonable scoring threshold, the system can intelligently decide when to insert keyframes, thereby reducing unnecessary computational burden and improving the real-time performance of the system while ensuring tracking accuracy.

[0133] In another feasible implementation, after step C20 described above, the method may further include step C50:

[0134] Step C50: If the average optical flow field value is greater than a preset first optical flow field threshold, and/or the average detection difficulty is greater than a preset confidence threshold, then it is determined that the average optical flow field value and each of the detection confidence values meet the preset insertion conditions.

[0135] In this embodiment, two thresholds are set: the first optical flow field threshold $T_{dy}$ and the confidence threshold $T_{tr}$. If the average optical flow field value is greater than the first optical flow field threshold, there is a fast-moving region between the two adjacent frames, and a keyframe needs to be added so that important environmental information is not missed. If the average detection difficulty is greater than the confidence threshold, there is a fast-moving object in the two adjacent frames, and a keyframe needs to be added to reduce the impact of the fast-moving object on feature point matching and tracking. When the applicable condition holds (either one or both, according to the "and/or" of step C50), it is directly determined that the preset insertion conditions are met, and a keyframe needs to be inserted between the current two frames of the target video.

[0136] The steps described above can quickly identify situations where keyframes need to be inserted at an early stage, thereby accelerating decision-making and improving the real-time performance of the system.
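Step C50 can serve as an early exit ahead of the weighted score, as sketched below. The `require_both` flag is a hypothetical parameter covering the two readings of "and/or", and falling back to the comprehensive score of steps C30 and C40 when the early check fails is an assumption about how the two implementations might be combined.

```python
from __future__ import annotations


def early_insertion_check(avg_flow: float, avg_difficulty: float,
                          t_dy: float, t_tr: float,
                          require_both: bool = False) -> bool:
    """Step C50; require_both selects the AND reading of "and/or"."""
    fast_region = avg_flow > t_dy          # fast-moving region between frames
    hard_to_track = avg_difficulty > t_tr  # hard-to-detect (fast-moving) object
    if require_both:
        return fast_region and hard_to_track
    return fast_region or hard_to_track


def should_insert_keyframe(avg_flow: float, avg_difficulty: float, *,
                           t_dy: float, t_tr: float,
                           w_dy: float, w_tr: float, t_kf: float,
                           require_both: bool = False) -> bool:
    """Early decision via step C50, otherwise the weighted score of C30/C40."""
    if early_insertion_check(avg_flow, avg_difficulty, t_dy, t_tr, require_both):
        return True
    return w_dy * avg_flow + w_tr * avg_difficulty > t_kf
```

The early check avoids computing the weighted score when either signal is already decisive on its own.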

[0137] Based on the first and/or second embodiments of this application, a third embodiment of this application is provided. For content that is the same as or similar to the first and/or second embodiments, refer to the description above; it is not repeated here. Furthermore, after step A20 above, the method may further include steps D10 to D20:

[0138] Step D10: Select target pixels whose optical flow field value is greater than a preset second optical flow field threshold from among the pixels.

[0139] In this embodiment, a second optical flow field threshold is set, and the target pixels whose optical flow field values are greater than this threshold are selected from all pixels. These target pixels represent regions of the image with significant motion.

[0140] Step D20: If each of the target images contains a target region and the area of the target region is greater than a preset area threshold, then a keyframe is inserted into each of the target images in the target video, wherein the distance between each target pixel in the target region is less than a preset pixel threshold.

[0141] In this embodiment, it is checked whether the selected target pixels form a connected region (i.e., a target region). Specifically, if the distance between target pixels is less than a preset pixel threshold, these pixels are considered to belong to the same target region. The area of the target region, i.e., the number of target pixels contained in the target region, is calculated. If the area of the target region is greater than a preset area threshold, it is determined that a keyframe needs to be inserted between the current two frames of the target video.
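Steps D10 and D20 can be sketched as follows, assuming a 2D grid of per-pixel optical flow field values for the image pair. Here "distance less than the pixel threshold" is read as single-linkage grouping (union-find over all target-pixel pairs, which is quadratic and suited only to small inputs), and a region is taken to need at least two linked pixels. Both readings are assumptions, and the function names are hypothetical.

```python
from __future__ import annotations

import math
from typing import Sequence


def select_target_pixels(magnitudes: Sequence[Sequence[float]],
                         second_flow_threshold: float) -> list[tuple[int, int]]:
    """Step D10: (row, col) of every pixel whose flow value exceeds the threshold."""
    return [(r, c)
            for r, row in enumerate(magnitudes)
            for c, m in enumerate(row)
            if m > second_flow_threshold]


def group_by_distance(pixels: Sequence[tuple[int, int]],
                      pixel_threshold: float) -> list[list[tuple[int, int]]]:
    """Union-find grouping: pixels closer than pixel_threshold share a group."""
    parent = list(range(len(pixels)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(pixels)):
        for j in range(i + 1, len(pixels)):
            if math.dist(pixels[i], pixels[j]) < pixel_threshold:
                parent[find(i)] = find(j)
    groups: dict[int, list[tuple[int, int]]] = {}
    for i, p in enumerate(pixels):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())


def large_motion_region_present(magnitudes: Sequence[Sequence[float]],
                                second_flow_threshold: float,
                                pixel_threshold: float,
                                area_threshold: int) -> bool:
    """Step D20: True if a target region's area (pixel count) exceeds the threshold."""
    pixels = select_target_pixels(magnitudes, second_flow_threshold)
    return any(len(g) > 1 and len(g) > area_threshold
               for g in group_by_distance(pixels, pixel_threshold))
```

On a 4x4 grid with a 2x2 block of fast-moving pixels and one isolated fast pixel, a pixel threshold of 1.5 links the block (diagonal neighbours are about 1.41 apart) into a region of area 4, which exceeds an area threshold of 3 but not one of 4.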

[0142] When determining target regions, the target pixels in the image domain can be traversed. For each target pixel, it is determined whether any of its adjacent pixels is also a target pixel. If so, the target pixel and those adjacent target pixels are taken as an initial target region. It is then determined whether any pixel adjacent to the initial target region is a target pixel; if so, that pixel is added to the initial target region. This process repeats until no adjacent target pixels remain outside the initial target region, at which point the initial target region is taken as a target region. The remaining target pixels that have not yet been assigned to a region are then traversed, and the same region aggregation is performed until the image domain contains only isolated target pixels and target regions. The result is multiple target regions and multiple isolated target pixels.
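The region aggregation described above is a form of region growing. A minimal breadth-first sketch is shown below, assuming a boolean mask of target pixels and 8-connected adjacency (the source does not say whether diagonal pixels count as adjacent). The function name `grow_regions` is hypothetical.

```python
from __future__ import annotations

from collections import deque
from typing import Sequence


def grow_regions(mask: Sequence[Sequence[bool]]
                 ) -> tuple[list[list[tuple[int, int]]], list[tuple[int, int]]]:
    """Split target pixels into target regions and isolated target pixels."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    visited = [[False] * cols for _ in range(rows)]
    neighbours = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                  if (dr, dc) != (0, 0)]  # 8-connectivity (assumption)
    regions, isolated = [], []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or visited[r][c]:
                continue
            # Grow an initial region from this unvisited target pixel.
            visited[r][c] = True
            region = [(r, c)]
            frontier = deque([(r, c)])
            while frontier:
                cr, cc = frontier.popleft()
                for dr, dc in neighbours:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and mask[nr][nc] and not visited[nr][nc]):
                        visited[nr][nc] = True
                        region.append((nr, nc))
                        frontier.append((nr, nc))
            # No adjacent target pixels remain: finalize the region.
            if len(region) > 1:
                regions.append(region)
            else:
                isolated.append((r, c))
    return regions, isolated
```

The area of each target region is then simply `len(region)`, which can be compared with the preset area threshold of step D20.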

[0143] By selecting target pixels with larger optical flow field values, the system can more accurately identify regions with significant motion in the image, thereby improving detection accuracy.

[0144] It should be noted that the above examples are only for understanding this application and do not constitute a limitation on the keyframe insertion method of this application. Any simple transformations based on this technical concept are all within the protection scope of this application.

[0145] This application also provides a keyframe insertion device. Referring to Figure 4, the keyframe insertion device includes:

[0146] Image acquisition module 10 is used to acquire any two adjacent target images in the target video;

[0147] Optical flow calculation module 20 is used to calculate the average optical flow field value between each of the target images;

[0148] The target detection module 30 is used to detect the target position of each target object in each target image and obtain the detection confidence level corresponding to each target position.

[0149] The keyframe insertion module 40 is used to determine whether the average optical flow field value and each of the detection confidence scores meet the preset insertion conditions. If the insertion conditions are met, a keyframe is inserted into each of the target images in the target video.
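One way to compose modules 10 to 40 in software is sketched below. Each module is injected as a callable, so the class is a hypothetical wiring diagram rather than the device of this application; running detection on both images of each adjacent pair is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable, Sequence


@dataclass
class KeyframeInsertionDevice:
    """Hypothetical composition of modules 10 to 40; callables are stand-ins."""
    compute_avg_flow: Callable[[Any, Any], float]              # module 20
    detect: Callable[[Any], Sequence[tuple[Any, float]]]       # module 30
    meets_condition: Callable[[float, Sequence[float]], bool]  # decision in module 40
    keyframes: list[int] = field(default_factory=list)         # recorded by module 40

    def process(self, video: Sequence[Any]) -> list[int]:
        # Module 10: take every pair of adjacent target images.
        for t in range(1, len(video)):
            prev, cur = video[t - 1], video[t]
            avg_flow = self.compute_avg_flow(prev, cur)
            # Module 30 runs on both target images of the pair (assumption).
            confidences = [c for image in (prev, cur)
                           for _box, c in self.detect(image)]
            if self.meets_condition(avg_flow, confidences):
                self.keyframes.append(t)  # module 40: keyframe at frame t
        return self.keyframes
```

Injecting the modules keeps the optical flow estimator and the detection model replaceable without changing the control flow.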

[0150] Optionally, the optical flow calculation module 20 is also used for:

[0151] Calculate the displacement vector of each pixel in each target image between the target images to obtain the optical flow field of each pixel;

[0152] The optical flow field value of each optical flow field is calculated using the Euclidean norm;

[0153] The optical flow field values of all pixels are averaged to obtain the average optical flow field value between the target images.
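Once a per-pixel displacement field is available, the Euclidean-norm and averaging steps reduce to a few lines. The sketch assumes the field is given as rows of `(dx, dy)` tuples; the source does not name an optical flow algorithm, and a dense estimator such as OpenCV's `calcOpticalFlowFarneback` is only one possible source of such a field.

```python
from __future__ import annotations

import math
from typing import Sequence

FlowRows = Sequence[Sequence[tuple[float, float]]]


def flow_magnitudes(flow: FlowRows) -> list[list[float]]:
    """Optical flow field value of each pixel: Euclidean norm of (dx, dy)."""
    return [[math.hypot(dx, dy) for dx, dy in row] for row in flow]


def average_flow_value(flow: FlowRows) -> float:
    """Average optical flow field value over all pixels of the image pair."""
    values = [v for row in flow_magnitudes(flow) for v in row]
    if not values:
        raise ValueError("flow field is empty")
    return sum(values) / len(values)
```

For a 2x2 field with displacements (3, 4), (0, 0), (6, 8) and (0, 0), the per-pixel values are 5, 0, 10 and 0, and the average optical flow field value is 3.75.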

[0154] Optionally, the optical flow calculation module 20 is also used for:

[0155] Scale transformation is performed on each of the target images to construct an image pyramid;

[0156] At each scale of the image pyramid, the displacement vector of each pixel in each target image is calculated between the target images to obtain the optical flow field of each pixel.

[0157] The optical flow field value of each optical flow field is calculated using the Euclidean norm;

[0158] The optical flow field values are weighted according to each scale to obtain the average optical flow field value between the target images.
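A multi-scale variant can be sketched as follows. The scale transformation here is 2x2 block averaging, the per-level flow estimator `estimate_flow` is a hypothetical callable, and rescaling each level's magnitudes by 2 to the power of the level (so that all scales are expressed in full-resolution pixels) is a design assumption the source does not state; the per-scale weights are parameters.

```python
from __future__ import annotations

import math
from typing import Callable, Sequence

Grid = Sequence[Sequence[float]]


def downsample(image: Grid) -> list[list[float]]:
    """Halve the resolution by 2x2 block averaging; odd trailing rows/cols drop."""
    rows = len(image) // 2
    cols = len(image[0]) // 2 if image else 0
    return [[(image[2 * r][2 * c] + image[2 * r][2 * c + 1]
              + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(cols)]
            for r in range(rows)]


def build_pyramid(image: Grid, levels: int) -> list[list[list[float]]]:
    """Level 0 is the original image; each further level is half the size."""
    pyramid = [[[float(v) for v in row] for row in image]]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def multiscale_average_flow(
    prev: Grid, cur: Grid,
    estimate_flow: Callable[[Grid, Grid], Sequence[Sequence[tuple[float, float]]]],
    weights: Sequence[float],
) -> float:
    """Weighted average of per-scale mean flow values (one weight per level)."""
    if not weights or sum(weights) <= 0:
        raise ValueError("weights must be non-empty with a positive sum")
    per_scale = []
    levels = zip(build_pyramid(prev, len(weights)), build_pyramid(cur, len(weights)))
    for level, (p, c) in enumerate(levels):
        flow = estimate_flow(p, c)
        # Rescale to full-resolution pixel units (assumption).
        values = [math.hypot(dx, dy) * 2 ** level for row in flow for dx, dy in row]
        if not values:
            raise ValueError(f"pyramid level {level} is empty; use fewer levels")
        per_scale.append(sum(values) / len(values))
    return sum(w * v for w, v in zip(weights, per_scale)) / sum(weights)
```

Weighting coarse levels more heavily emphasizes large, fast motions, while fine levels capture small displacements; how the weights are chosen is left open by the source.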

[0159] Optionally, the target detection module 30 is also used for:

[0160] Each of the target images is input into a preset target detection model to obtain a set of target objects output by the target detection model. The set of target objects includes the target location of each target object and the detection confidence level corresponding to each target location.
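The set of target objects returned by the detection model can be represented as a list of location and confidence pairs. In the sketch below, `model` is a hypothetical callable standing in for the preset target detection model, and the `(x_min, y_min, x_max, y_max)` box format is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Iterable, Sequence


@dataclass(frozen=True)
class TargetObject:
    """One entry of the target object set: a target location and its confidence."""
    box: tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max)
    confidence: float


def detect_targets(image: Any,
                   model: Callable[[Any], Iterable[tuple[Sequence[float], float]]]
                   ) -> list[TargetObject]:
    """Run a (hypothetical) detection model and validate its raw output."""
    targets = []
    for box, score in model(image):
        if len(box) != 4:
            raise ValueError(f"expected 4 box coordinates, got {len(box)}")
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"confidence must lie in [0, 1], got {score}")
        x0, y0, x1, y1 = (float(v) for v in box)
        targets.append(TargetObject((x0, y0, x1, y1), float(score)))
    return targets
```

The confidences extracted from this list, for example `[t.confidence for t in targets]`, are the inputs to the confidence complement calculation of step C10.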

[0161] Optionally, the keyframe insertion module 40 is also used for:

[0162] Calculate the confidence complement for each of the aforementioned detection confidence levels;

[0163] Calculate the mean of the confidence complements of each of the aforementioned confidence levels to obtain the average detection difficulty of each of the aforementioned target locations;

[0164] The comprehensive score between each target image is calculated based on the preset optical flow field weight, the average optical flow field value, the preset confidence weight, and the average detection difficulty.

[0165] If the overall score is greater than the preset score threshold, then the average optical flow field value and each of the detection confidence scores are determined to meet the preset insertion conditions.

[0166] Optionally, the keyframe insertion module 40 is also used for:

[0167] If the average optical flow field value is greater than a preset first optical flow field threshold, and / or the average detection difficulty is greater than a preset confidence threshold, then the average optical flow field value and each of the detection confidence levels are determined to meet the preset insertion conditions.

[0168] Optionally, the keyframe insertion device is also used for:

[0169] Target pixels with optical flow field values greater than a preset second optical flow field threshold are selected from each of the pixels.

[0170] If each of the target images contains a target region, and the area of the target region is greater than a preset area threshold, then a keyframe is inserted into each of the target images in the target video, wherein the distance between each target pixel in the target region is less than a preset pixel threshold.

[0171] The keyframe insertion device provided in this application, employing the keyframe insertion method described in the above embodiments, can solve the technical problem of balancing the number of keyframes inserted while ensuring recognition accuracy. Compared with the prior art, the beneficial effects of the keyframe insertion device provided in this application are the same as those of the keyframe insertion method described in the above embodiments, and other technical features in the keyframe insertion device are the same as those disclosed in the methods of the above embodiments, and will not be repeated here.

[0172] This application provides a head-mounted display device, which includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, which are executed by the at least one processor to enable the at least one processor to perform the keyframe insertion method in Embodiment 1 above.

[0173] The head-mounted display device in this application embodiment may include, but is not limited to, head-mounted display devices such as Mixed Reality (MR) devices (e.g., MR glasses or MR helmets), Augmented Reality (AR) devices (e.g., AR glasses or AR helmets), Virtual Reality (VR) devices (e.g., VR glasses or VR helmets), Extended Reality (XR) devices, or some combination thereof.

[0174] As shown in Figure 5, the head-mounted display device may include a processing unit 1001 (e.g., a central processing unit, a graphics processing unit, etc.), which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1002 or a program loaded from a storage device 1003 into a random access memory (RAM) 1004. The RAM 1004 also stores various programs and data required for the operation of the head-mounted display device. The processing unit 1001, ROM 1002, and RAM 1004 are interconnected via a bus 1005. An input/output (I/O) interface 1006 is also connected to the bus 1005. Typically, the following devices can be connected to the I/O interface 1006: input devices 1007 including, for example, a touchscreen, touchpad, keyboard, mouse, image sensor, microphone, accelerometer, gyroscope, etc.; output devices 1008 including, for example, a liquid crystal display (LCD), speaker, vibrator, etc.; storage devices 1003 including, for example, magnetic tape, hard disk, etc.; and communication devices 1009. The communication device 1009 allows the head-mounted display device to communicate with other devices wirelessly or by wire to exchange data. Although the figure shows a head-mounted display device having various components, it should be understood that it is not required to implement or include all of the components shown; more or fewer components may alternatively be implemented.

[0175] Specifically, according to the embodiments disclosed in this application, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments disclosed in this application include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via a communication device, or installed from storage device 1003, or installed from ROM 1002. When the computer program is executed by processing unit 1001, it performs the functions defined in the methods of the embodiments disclosed in this application.

[0176] The head-mounted display device provided in this application, employing the keyframe insertion method described in the above embodiments, solves the technical problem of balancing the number of keyframes inserted while ensuring recognition accuracy. Compared with the prior art, the beneficial effects of the head-mounted display device provided in this application are the same as those of the keyframe insertion method described in the above embodiments, and other technical features of this head-mounted display device are the same as those disclosed in the previous embodiment method, and will not be repeated here.

[0177] It should be understood that the various parts disclosed in this application can be implemented using hardware, software, firmware, or a combination thereof. In the description of the above embodiments, specific features, structures, materials, or characteristics can be combined in any suitable manner in one or more embodiments or examples.

[0178] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

[0179] This application provides a computer-readable storage medium having computer-readable program instructions (i.e., a computer program) stored thereon, the computer-readable program instructions being used to execute the keyframe insertion method in the above embodiments.

[0180] The computer-readable storage medium provided in this application may be, for example, a USB flash drive, but is not limited thereto; it may be an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: electrical connections having one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fibers, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof. In this embodiment, the computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. The program code contained on the computer-readable storage medium may be transmitted using any suitable medium, including but not limited to: wires, optical cables, RF (Radio Frequency), etc., or any suitable combination thereof.

[0181] The aforementioned computer-readable storage medium may be included in the head-mounted display device, or it may exist independently without being assembled into the head-mounted display device.

[0182] The aforementioned computer-readable storage medium carries one or more programs that, when executed by a head-mounted display device, cause the head-mounted display device to: acquire any two adjacent target images in a target video; calculate the average optical flow field value between each target image; detect the target position of each target object in each target image and acquire the detection confidence level corresponding to each target position; determine whether the average optical flow field value and each detection confidence level meet a preset insertion condition; if the insertion condition is met, insert a keyframe into each target image of the target video.

[0183] Computer program code for performing the operations of this application can be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving remote computers, the remote computer can be connected to the user's computer via any type of network—including a local area network (LAN) or a wide area network (WAN)—or can be connected to an external computer (e.g., via the Internet using an Internet service provider).

[0184] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.

[0185] The modules described in the embodiments of this application can be implemented in software or hardware. In some cases, the name of a module does not constitute a limitation on the module itself.

[0186] The readable storage medium provided in this application is a computer-readable storage medium that stores computer-readable program instructions (i.e., a computer program) for executing the above-described keyframe insertion method, thereby solving the technical problem of balancing the number of keyframes inserted while ensuring recognition accuracy. Compared with the prior art, the beneficial effects of the computer-readable storage medium provided in this application are the same as those of the keyframe insertion method provided in the above embodiments, and will not be repeated here.

[0187] This application also provides a computer program product, including a computer program that, when executed by a processor, implements the steps of the keyframe insertion method described above.

[0188] The computer program product provided in this application solves the technical problem of balancing the number of keyframes inserted while ensuring recognition accuracy. Compared with the prior art, the beneficial effects of the computer program product provided in this application are the same as those of the keyframe insertion method provided in the above embodiments, and will not be repeated here.

[0189] The above description is only a part of the embodiments of this application and does not limit the patent scope of this application. All equivalent structural transformations made under the technical concept of this application and using the contents of the specification and drawings of this application, or direct / indirect applications in other related technical fields, are included in the patent protection scope of this application.

Claims

1. A keyframe insertion method, characterized in that, The method includes: Obtain any two adjacent target images from the target video; Calculate the average optical flow field value among the target images; Detect the target location of each target object in each target image, and obtain the detection confidence score corresponding to each target location; Determine whether the average optical flow field value and each of the detection confidence scores meet the preset insertion conditions. If the insertion conditions are met, then insert keyframes into each of the target images in the target video.

2. The keyframe insertion method as described in claim 1, characterized in that, The step of calculating the average optical flow field value among the target images includes: Calculate the displacement vector of each pixel in each target image between the target images to obtain the optical flow field of each pixel; The optical flow field values of each optical flow field are calculated using Euclidean norm. The average optical flow field value of each pixel is calculated to obtain the average optical flow field value among the target images.

3. The keyframe insertion method as described in claim 1, characterized in that, The step of calculating the average optical flow field value among the target images includes: Scale transformation is performed on each of the target images to construct an image pyramid; At each scale of the image pyramid, the displacement vector of each pixel in each target image is calculated between the target images to obtain the optical flow field of each pixel. The optical flow field values of each optical flow field are calculated using Euclidean norm. The average optical flow field value among the target images is obtained by weighting the optical flow field values according to each scale.

4. The keyframe insertion method as described in claim 1, characterized in that, The step of detecting the target location of each target object in each target image and obtaining the detection confidence score corresponding to each target location includes: Each of the target images is input into a preset target detection model to obtain a set of target objects output by the target detection model. The set of target objects includes the target location of each target object and the detection confidence level corresponding to each target location.

5. The keyframe insertion method as described in claim 1, characterized in that, The step of determining whether the average optical flow field value and each of the detection confidence levels meet the preset insertion conditions includes: Calculate the confidence complement for each of the aforementioned detection confidence levels; Calculate the mean of the confidence complements of each of the aforementioned confidence levels to obtain the average detection difficulty of each of the aforementioned target locations; The comprehensive score between each target image is calculated based on the preset optical flow field weight, the average optical flow field value, the preset confidence weight, and the average detection difficulty. If the overall score is greater than the preset score threshold, then the average optical flow field value and each of the detection confidence scores are determined to meet the preset insertion conditions.

6. The keyframe insertion method as described in claim 5, characterized in that, After the step of calculating the mean of each confidence complement to obtain the average detection difficulty of each target location, the method further includes: If the average optical flow field value is greater than a preset first optical flow field threshold, and / or the average detection difficulty is greater than a preset confidence threshold, then the average optical flow field value and each of the detection confidence levels are determined to meet the preset insertion conditions.

7. The keyframe insertion method as described in claim 2, characterized in that, After the step of calculating the optical flow field values of each optical flow field using the Euclidean norm, the method further includes: Target pixels with optical flow field values greater than a preset second optical flow field threshold are selected from each of the pixels. If each of the target images contains a target region, and the area of the target region is greater than a preset area threshold, then a keyframe is inserted into each of the target images in the target video, wherein the distance between each target pixel in the target region is less than a preset pixel threshold.

8. A keyframe insertion device, characterized in that, The keyframe insertion device includes: The image acquisition module is used to acquire any two adjacent target images in the target video; An optical flow calculation module is used to calculate the average optical flow field value between the target images; The target detection module is used to detect the target position of each target object in each target image and obtain the detection confidence score corresponding to each target position. A keyframe insertion module is used to determine whether the average optical flow field value and each of the detection confidence scores meet preset insertion conditions. If the insertion conditions are met, a keyframe is inserted into each of the target images in the target video.

9. A head-mounted display device, characterized in that, The head-mounted display device includes: a memory, a processor, and a computer program stored in the memory and executable on the processor, the computer program being configured to implement the steps of the keyframe insertion method as described in any one of claims 1 to 7.

10. A storage medium, characterized in that, The storage medium is a computer-readable storage medium, and a computer program is stored on the storage medium. When the computer program is executed by a processor, it implements the steps of the keyframe insertion method as described in any one of claims 1 to 7.