Video key frame extraction method, device and equipment and storage medium

By calculating the model fitness score based on the uniformity and distribution characteristics of feature point matching in a video frame sequence, the selection of key frames is optimized, which solves the problem of redundant or missing 3D geometric information in existing technologies and improves the accuracy and efficiency of 3D reconstruction.

CN115393761B · Active · Publication Date: 2026-06-26 · MIGU COMIC CO LTD +2

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
MIGU COMIC CO LTD
Filing Date
2022-08-18
Publication Date
2026-06-26

Abstract

The application discloses a video key frame extraction method, device and equipment and a storage medium. The method comprises the following steps: in a video frame sequence after a current key frame, a candidate video frame matched with the current key frame is determined; according to the feature point matching uniformity between the current key frame and each candidate video frame and the feature point distribution characteristics of the candidate video frame, a model fitness score between the current key frame and each candidate video frame is determined; and according to the model fitness score, a next key frame corresponding to the current key frame is determined from the candidate video frame. The application effectively solves the problems of high redundancy of key frame extraction and high re-projection error of subsequent three-dimensional reconstruction.

Description

Technical Field

[0001] This invention relates to the field of image processing technology, and in particular to a method, apparatus, device, and storage medium for extracting keyframes from video.

Background Technology

[0002] Cameras, camcorders, and mobile phones, among other electronic products, are used to acquire image data and have gradually become necessities in people's lives. Image data is an effective carrier for expressing the real world, and it can realistically reproduce historical features, environmental changes, and scene descriptions. To reproduce the real world using image data, it is necessary to fully extract the three-dimensional geometric information contained within the image data. The selection of keyframes in the image data is particularly crucial for extracting this information; improper keyframe selection can lead to redundancy or loss of three-dimensional geometric information.

[0003] In current keyframe extraction schemes, prior judgments are made based on the camera pose of each video frame in the image data. If the translation or rotation angle of the camera pose in the current frame exceeds a threshold compared to the previous keyframe, the current frame is determined as a keyframe. Due to the over-reliance on the prior camera pose, some areas may have too many keyframes while others may have too few keyframes, resulting in redundancy or loss of 3D geometric information.

Summary of the Invention

[0004] The main objective of this invention is to provide a method, apparatus, device, and storage medium for extracting keyframes from videos, aiming to solve the problem of how to filter keyframes in videos.

[0005] To achieve the above objectives, the present invention provides a video keyframe extraction method, which includes the following steps:

[0006] In the video frame sequence following the current keyframe, identify candidate video frames that match the current keyframe;

[0007] Based on the feature point matching uniformity between the current keyframe and each of the candidate video frames, and the feature point distribution characteristics of the candidate video frames, the model fitness score between the current keyframe and each of the candidate video frames is determined.

[0008] Based on the model fitness score, the next keyframe corresponding to the current keyframe is determined from the candidate video frames.

[0009] Optionally, the step of determining the model fitness score between the current keyframe and each of the candidate video frames based on the feature point matching uniformity between the current keyframe and each of the candidate video frames, and the feature point distribution characteristics of the candidate video frames, includes:

[0010] Based on the feature point matching uniformity, the feature point distribution characteristics, and the degrees of freedom of the first motion model, a first fitness score for the preset first motion model is determined.

[0011] Based on the feature point matching uniformity, the feature point distribution characteristics, and the degrees of freedom of the second motion model, a second fitness score for the preset second motion model is determined.

[0012] The model fitness score is determined based on the first fitness score and the second fitness score.

[0013] Optionally, the step of determining the model fitness score based on the first fitness score and the second fitness score includes:

[0014] Determine the difference between the first fitness score and the second fitness score;

[0015] The ratio of the difference to the first fitness score is determined as the model fitness score.

[0016] Optionally, after the step of determining the next keyframe corresponding to the current keyframe among the candidate video frames based on the model fitness score, the method further includes:

[0017] Obtain the keyframe sequence and extract the image features of the keyframe sequence;

[0018] The matching degree between the keyframes is determined based on the image features;

[0019] The two keyframes with the highest matching degree are identified as the two target keyframes.

[0020] The three-dimensional model corresponding to the video to be processed is determined based on the two target keyframes.

[0021] Optionally, the step of determining the 3D model corresponding to the video to be processed based on the two target keyframes includes:

[0022] Determine the camera's position and attitude information based on the two target keyframes;

[0023] The three-dimensional coordinate points corresponding to the image features are determined based on the pixel position information of the image features of the two target key frames, and the position and pose information of the camera.

[0024] The three-dimensional model of the video to be processed is determined based on the three-dimensional coordinate points corresponding to the image features.

[0025] Optionally, the step of determining candidate video frames that match the current keyframe in the video frame sequence following the current keyframe includes:

[0026] Obtain feature point information of each video frame in the video frame sequence;

[0027] Determine the feature point matching rate between the current keyframe and the video frame;

[0028] Video frames whose feature point matching rate is greater than a preset threshold are identified as candidate video frames.

[0029] Optionally, the step of determining the feature point matching rate between the current keyframe and the video frame includes:

[0030] Determine the total number of feature points in the current keyframe and each video frame, and the number of matching feature points;

[0031] The feature point matching rate is determined based on the total number of feature points and the number of matching feature points.

[0032] To achieve the above objectives, the present invention also provides a video keyframe extraction device, the device comprising:

[0033] The acquisition module is used to determine candidate video frames that match the current keyframe in the video frame sequence following the current keyframe.

[0034] The calculation module is used to determine the model fitness score between the current keyframe and each candidate video frame based on the feature point matching uniformity between the current keyframe and each candidate video frame, and the feature point distribution characteristics of the candidate video frames.

[0035] The determination module is used to determine the next keyframe corresponding to the current keyframe among the candidate video frames based on the model fitness score.

[0036] To achieve the above objectives, the present invention also provides a video keyframe extraction device, which includes a memory, a processor, and a video keyframe extraction program stored in the memory and executable on the processor. When the video keyframe extraction program is executed by the processor, it implements the various steps of the video keyframe extraction method described above.

[0037] To achieve the above objectives, the present invention also provides a computer-readable storage medium storing a video keyframe extraction program, which, when executed by a processor, implements the various steps of the video keyframe extraction method described above.

[0038] This invention provides a method, apparatus, device, and storage medium for extracting keyframes from a video. The method involves identifying candidate video frames that match the current keyframe in a sequence of video frames following the current keyframe; determining a model fitness score between the current keyframe and each candidate video frame based on the uniformity of feature point matching and the feature point distribution characteristics of the candidate video frames; and determining the next keyframe corresponding to the current keyframe based on the model fitness score. By incorporating a scoring mechanism into the keyframe extraction process of the video to be processed, determining the model fitness score between the current keyframe and candidate video frames, and selecting keyframes from the candidate video frames based on the model fitness score, this invention effectively solves the problems of high redundancy in keyframe extraction and high reprojection errors in subsequent 3D reconstruction, thereby improving the accuracy and efficiency of video 3D reconstruction.

Brief Description of the Drawings

[0039] Figure 1 is a schematic diagram of the hardware structure of the video keyframe extraction device according to an embodiment of the present invention;

[0040] Figure 2 is a flowchart illustrating the first embodiment of the video keyframe extraction method of the present invention;

[0041] Figure 3 is a schematic diagram of the keyframe selection process in the first embodiment of the video keyframe extraction method of the present invention;

[0042] Figure 4 is a detailed flowchart of step S20 of the second embodiment of the video keyframe extraction method of the present invention;

[0043] Figure 5 is a flowchart illustrating the third embodiment of the video keyframe extraction method of the present invention;

[0044] Figure 6 is a detailed flowchart of step S10 of the fourth embodiment of the video keyframe extraction method of the present invention;

[0045] Figure 7 is a schematic diagram of feature matching in the fourth embodiment of the video keyframe extraction method of the present invention;

[0046] Figure 8 is a schematic diagram of the logical structure of the video keyframe extraction device according to an embodiment of the present invention.

[0047] The realization of the objective, functional features and advantages of the present invention will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed Description of the Embodiments

[0048] It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

[0049] The main solution of this invention is as follows: in the video frame sequence following the current keyframe, determine the candidate video frames that match the current keyframe; determine the model fitness score between the current keyframe and each candidate video frame based on the feature point matching uniformity between the current keyframe and each candidate video frame, and the feature point distribution characteristics of the candidate video frames; and determine the next keyframe corresponding to the current keyframe based on the model fitness score.

[0050] By incorporating a scoring mechanism during keyframe extraction of the video to be processed, a model fitness score is determined between the current keyframe and candidate video frames. Keyframes are then selected from the candidate video frames based on the model fitness score. This effectively solves the problems of high redundancy in keyframe extraction and high reprojection errors in subsequent 3D reconstruction, thereby improving the accuracy and efficiency of video 3D reconstruction.

[0051] As one implementation, the video keyframe extraction device may be as shown in Figure 1.

[0052] The embodiments of the present invention relate to a video keyframe extraction device, which includes: a processor 101, such as a CPU, a memory 102, and a communication bus 103. The communication bus 103 is used to enable communication between these components.

[0053] The memory 102 can be high-speed RAM or stable memory (non-volatile memory), such as disk storage. As shown in Figure 1, the memory 102, which is a computer-readable storage medium, may include a video keyframe extraction program; and the processor 101 can be used to call the video keyframe extraction program stored in the memory 102 and perform the following operations:

[0054] In the video frame sequence following the current keyframe, identify candidate video frames that match the current keyframe;

[0055] Based on the feature point matching uniformity between the current keyframe and each of the candidate video frames, and the feature point distribution characteristics of the candidate video frames, the model fitness score between the current keyframe and each of the candidate video frames is determined.

[0056] Based on the model fitness score, the next keyframe corresponding to the current keyframe is determined from the candidate video frames.

[0057] Optionally, the processor 101 can be used to invoke the video keyframe extraction program stored in the memory 102 and perform the following operations:

[0058] Based on the feature point matching uniformity, the feature point distribution characteristics, and the degrees of freedom of the first motion model, a first fitness score for the preset first motion model is determined.

[0059] Based on the feature point matching uniformity, the feature point distribution characteristics, and the degrees of freedom of the second motion model, a second fitness score for the preset second motion model is determined.

[0060] The model fitness score is determined based on the first fitness score and the second fitness score.

[0061] Optionally, the processor 101 can be used to invoke the video keyframe extraction program stored in the memory 102 and perform the following operations:

[0062] Determine the difference between the first fitness score and the second fitness score;

[0063] The ratio of the difference to the first fitness score is determined as the model fitness score.

[0064] Optionally, the processor 101 can be used to invoke the video keyframe extraction program stored in the memory 102 and perform the following operations:

[0065] Obtain the keyframe sequence and extract the image features of the keyframe sequence;

[0066] The matching degree between the keyframes is determined based on the image features;

[0067] The two keyframes with the highest matching degree are identified as the two target keyframes.

[0068] The three-dimensional model corresponding to the video to be processed is determined based on the two target keyframes.

[0069] Optionally, the processor 101 can be used to invoke the video keyframe extraction program stored in the memory 102 and perform the following operations:

[0070] Determine the camera's position and attitude information based on the two target keyframes;

[0071] The three-dimensional coordinate points corresponding to the image features are determined based on the pixel position information of the image features of the two target key frames, and the position and pose information of the camera.

[0072] The three-dimensional model of the video to be processed is determined based on the three-dimensional coordinate points corresponding to the image features.

[0073] Optionally, the processor 101 can be used to invoke the video keyframe extraction program stored in the memory 102 and perform the following operations:

[0074] Obtain feature point information of each video frame in the video frame sequence;

[0075] Determine the feature point matching rate between the current keyframe and the video frame;

[0076] Video frames whose feature point matching rate is greater than a preset threshold are identified as candidate video frames.

[0077] Optionally, the processor 101 can be used to invoke the video keyframe extraction program stored in the memory 102 and perform the following operations:

[0078] Determine the total number of feature points in the current keyframe and each video frame, and the number of matching feature points;

[0079] The feature point matching rate is determined based on the total number of feature points and the number of matching feature points.

[0080] Based on the hardware architecture of the video keyframe extraction device described above, an embodiment of the video keyframe extraction method of the present invention is proposed.

[0081] Referring to Figure 2, which shows a first embodiment of the video keyframe extraction method of the present invention, the method includes the following steps:

[0082] Step S10: In the video frame sequence following the current keyframe, determine the candidate video frame that matches the current keyframe.

[0083] Optionally, a video to be processed is obtained. The video to be processed is a video stream that needs to be reconstructed in three dimensions. For example, the scene or people in the video to be processed can be reconstructed in three dimensions.

[0084] Optionally, the video to be processed is divided into multiple video frames to obtain a video frame sequence. Optionally, the video to be processed is sampled according to a preset sampling frequency to obtain a video frame sequence.

[0085] If we were to perform 3D reconstruction on every video frame in the video to be processed, the amount of data required would be enormous, and the 3D reconstruction would be inefficient. Therefore, it is necessary to select keyframes from the video frame sequence, and these keyframes are used for the 3D reconstruction of the video to be processed.

[0086] Optionally, the first frame of the video to be processed is the first keyframe, and the keyframes after the first keyframe are determined sequentially.

[0087] Optionally, the candidate video frames matching the current keyframe can be obtained by sampling the video frame sequence after the current keyframe at a preset sampling frequency. Alternatively, the candidate video frames can be the video frames in the sequence after the current keyframe whose feature matching rate with the current keyframe is greater than a preset threshold. The feature matching rate is determined by the number of feature points matched between the current keyframe and the video frame, and the total number of feature points of the current keyframe and the video frame.

[0088] Step S20: Determine the model fitness score between the current keyframe and each candidate video frame based on the feature point matching uniformity between the current keyframe and each candidate video frame, and the feature point distribution characteristics of the candidate video frames.

[0089] To account for how evenly the matched feature points are spread over the image, a feature point matching uniformity is introduced into the model fitness score. It is calculated from the feature point matching information between each candidate video frame and the current keyframe. Optionally, the video frame is divided into a grid, and the feature point matching uniformity between the current keyframe and each candidate video frame is determined from the proportion of grid cells that contain matched feature points. For example, the video frame is divided into an 8*8 grid, and the feature point matching uniformity S is the number of grid cells occupied by matched feature points. This measure introduces feature point matching uniformity into the evaluation of how well each motion model fits the image information in a simple and efficient way, further improving the image information quality of the next selected keyframe.
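The grid-based uniformity described above can be sketched as follows. This is a minimal illustration, assuming matched feature points are given as pixel coordinates in the candidate frame; the function name `matching_uniformity` and its parameters are hypothetical and do not come from the patent. It returns both the number of occupied cells (the S of the 8*8 example) and the occupied proportion.

```python
from typing import Iterable, Tuple


def matching_uniformity(
    matched_points: Iterable[Tuple[float, float]],
    width: int,
    height: int,
    grid: int = 8,
) -> Tuple[int, float]:
    """Return (occupied_cells, occupied_fraction) for a grid x grid partition.

    matched_points holds (x, y) pixel positions of matched feature points.
    """
    occupied = set()
    for x, y in matched_points:
        if not (0 <= x < width and 0 <= y < height):
            continue  # ignore points that fall outside the image
        col = min(int(x * grid / width), grid - 1)
        row = min(int(y * grid / height), grid - 1)
        occupied.add((row, col))
    count = len(occupied)
    return count, count / (grid * grid)


if __name__ == "__main__":
    # Two points share the top-left cell; one lies in the bottom-right cell.
    points = [(10.0, 10.0), (12.0, 11.0), (630.0, 470.0)]
    print(matching_uniformity(points, width=640, height=480))
```

Counting occupied cells rather than raw matches means that a dense cluster of matches in one corner scores no higher than a single match there, which is the property the uniformity measure is meant to capture.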

[0090] Optionally, the feature point distribution characteristic e_i of the candidate video frame represents the Euclidean distance between a feature point and its matching feature point on the homography plane H.

[0091] The model fitness score between the current keyframe and each candidate video frame is determined based on the feature point matching uniformity and feature point distribution characteristics. Optionally, the model fitness score is used to select candidate video frames and is determined by the first fitness score of the first motion model at the two-dimensional level and the second fitness score of the second motion model at the three-dimensional level. Optionally, the model for the model fitness score includes the first motion model at the two-dimensional level, i.e., the H motion model, and the second motion model at the three-dimensional level, i.e., the F motion model. Optionally, the first motion model and the second motion model are used for the three-dimensional reconstruction of the video to be processed. Both the fundamental matrix F and the homography matrix H can be used to describe the relationship between two images. The homography matrix H is more suitable for describing short baseline cases, while the fundamental matrix F is more suitable for describing the relationship between two images when the baseline distance is long.

[0092] Step S30: Based on the model fitness score, determine the next keyframe corresponding to the current keyframe among the candidate video frames.

[0093] Optionally, this application emphasizes the adaptability of the F-motion model to image information and pays attention to the differences in the adaptability of different motion models to image information, in order to select video frames with as much 3D information as possible and as little 2D information as possible, thereby improving the efficiency of 3D reconstruction. Optionally, the candidate video frame with the highest model fitness score is determined as the next keyframe.

[0094] Optionally, as shown in Figure 3, the first frame of the video to be processed is determined as the first keyframe, and then the keyframes following the first keyframe are determined sequentially. In the sequence of video frames following the first keyframe, the second keyframe is determined. After determining the second keyframe, the third keyframe is determined in the sequence of video frames following the second keyframe, and so on, until the last keyframe of the video to be processed is determined.

[0095] In this embodiment, candidate video frames matching the current keyframe are determined in the video frame sequence following the current keyframe. Based on the feature point matching uniformity between the current keyframe and each candidate video frame, and the feature point distribution characteristics of the candidate video frames, a model fitness score is determined between the current keyframe and each candidate video frame. The next keyframe corresponding to the current keyframe is then determined based on the model fitness score. By incorporating a scoring mechanism during keyframe extraction of the video to be processed, determining the model fitness score between the current keyframe and candidate video frames, and selecting keyframes from the candidate video frames based on the model fitness score, the problems of high redundancy in keyframe extraction and high reprojection errors in subsequent 3D reconstruction are effectively solved, improving the accuracy and efficiency of video 3D reconstruction.

[0096] Referring to Figure 4, in a second embodiment of the video keyframe extraction method of the present invention, based on the first embodiment, step S20 includes:

[0097] Step S21: Determine the first fitness score of the preset first motion model based on the feature point matching uniformity, the feature point distribution characteristics, and the degrees of freedom of the first motion model.

[0098] Step S22: Determine the second fitness score of the preset second motion model based on the feature point matching uniformity, the feature point distribution characteristics, and the degrees of freedom of the second motion model.

[0099] Step S23: Determine the model fitness score based on the first fitness score and the second fitness score.

[0100] Optionally, based on the feature point matching uniformity, feature point distribution characteristics, and the degrees of freedom of the first motion model, a first fitness score for the preset first motion model is determined, as shown in the following formula:

[0101]

[0102]

[0103] Where d represents the dimension of the model, optionally d = 2 for the homography matrix of the H-motion model; n represents the total number of matched features; k represents the degrees of freedom of the model, optionally k = 7 for the homography matrix of the H-motion model; r represents the dimension of the feature point data, for example, r = 4 for two-dimensional points between two frames; σ² represents the variance of the error; λ1, λ2, and λ3 are coefficients, with λ2 = log(r) and λ3 representing the residual; S represents the uniformity of the matching points; and e_i represents the Euclidean distance between a feature point and its matching feature point on the homography plane H.

[0104] Based on the feature point matching uniformity, feature point distribution characteristics, and degrees of freedom of the second motion model, the second fitness score of the preset second motion model is determined, as shown in the following formula:

[0105]

[0106]

[0107] Where d represents the dimension of the model, optionally d = 3 for the fundamental matrix of the F-motion model; n represents the total number of matched features; k represents the degrees of freedom of the model, optionally k = 8 for the fundamental matrix of the F-motion model; r represents the dimension of the feature point data, for example, r = 4 for two-dimensional points between two frames; σ² represents the variance of the error; λ1, λ2, and λ3 are coefficients, with λ2 = log(r) and λ3 representing the residual; S represents the uniformity of the matching points; and e_i represents the Euclidean distance between a feature point and its matching feature point on the fundamental plane.
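The formulas for the two fitness scores are not reproduced in the text above. As a point of reference only, the sketch below implements the classic Geometric Robust Information Criterion in the form published by Torr, GRIC = Σ ρ(e_i²) + λ1·d·n + λ2·k with ρ(e²) = min(e²/σ², λ3·(r − d)), which uses the same symbols d, n, k, r, σ² and e_i. It is a swapped-in baseline, not the scores of this application: it does not include the matching uniformity S, because how S enters the formula is not shown here. The function name `gric` and its argument names are assumptions, and the λ coefficients are left as caller-supplied parameters.

```python
from typing import Sequence


def gric(
    squared_errors: Sequence[float],
    sigma2: float,
    d: int,
    k: int,
    r: int,
    lam1: float,
    lam2: float,
    lam3: float,
) -> float:
    """Classic GRIC: sum_i rho(e_i^2) + lam1 * d * n + lam2 * k.

    rho(e^2) = min(e^2 / sigma2, lam3 * (r - d)) caps the cost of outliers.
    """
    if sigma2 <= 0:
        raise ValueError("sigma2 must be positive")
    n = len(squared_errors)
    cap = lam3 * (r - d)
    data_term = sum(min(e2 / sigma2, cap) for e2 in squared_errors)
    return data_term + lam1 * d * n + lam2 * k


if __name__ == "__main__":
    # Illustrative values only; d and k follow the values quoted in the text.
    errors = [0.5, 1.2, 30.0]
    print("H model:", gric(errors, 1.0, d=2, k=7, r=4, lam1=1.0, lam2=1.0, lam3=2.0))
    print("F model:", gric(errors, 1.0, d=3, k=8, r=4, lam1=1.0, lam2=1.0, lam3=2.0))
```

In practice the residuals passed for the H model and the F model differ, since each model is fitted separately to the same matches.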

[0108] Optionally, the difference between the GRIC_H value and the GRIC_F value is calculated to obtain the difference in how well the different motion models fit the image information of the same video frame; the ratio of this difference to GRIC_H is then calculated to quantify that difference. Optionally, the candidate video frames are compared with the current keyframe in turn to obtain the model fitness score fg of each candidate video frame, where fg(i,j) is the objective function evaluating the quality of the pairing of current keyframe i and candidate keyframe j. That is, the difference between the first fitness score and the second fitness score is determined, and the ratio of this difference to the first fitness score is determined as the model fitness score. For example, the formula for calculating the model fitness score fg is as follows:

[0109] fg(i,j) = (GRIC_H(i,j) - GRIC_F(i,j)) / GRIC_H(i,j)

[0110] Where GRIC_H(i,j) is the GRIC (Geometric Robust Information Criterion) value of the homography matrix, characterizing the fitness of the H motion model (two-dimensional level) to the image information in the video frames, and GRIC_F(i,j) is the GRIC value of the fundamental matrix, characterizing the fitness of the F motion model (three-dimensional level) to the image information in the video frames.

[0111] Optionally, the candidate video frame with the highest fg value is determined as the next keyframe after the current keyframe.
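As a small illustration of the scoring and selection just described, the sketch below computes fg as the ratio of the difference between the first (H) and second (F) fitness scores to the first fitness score, and picks the candidate with the highest fg. The function names and the dictionary layout are hypothetical; the GRIC values are assumed to have been computed elsewhere.

```python
from typing import Dict, Hashable, Tuple


def model_fitness_score(gric_h: float, gric_f: float) -> float:
    """fg = (GRIC_H - GRIC_F) / GRIC_H."""
    if gric_h == 0:
        raise ValueError("GRIC_H must be non-zero")
    return (gric_h - gric_f) / gric_h


def pick_next_keyframe(gric_by_frame: Dict[Hashable, Tuple[float, float]]) -> Hashable:
    """Return the candidate frame id with the highest fg.

    gric_by_frame maps a candidate frame id to (GRIC_H, GRIC_F), both
    computed against the current keyframe.
    """
    if not gric_by_frame:
        raise ValueError("no candidate frames")
    scores = {
        frame: model_fitness_score(h, f) for frame, (h, f) in gric_by_frame.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    candidates = {5: (100.0, 90.0), 6: (100.0, 60.0), 7: (100.0, 80.0)}
    print(pick_next_keyframe(candidates))
```

A larger fg means the F model fits the frame pair much better than the H model does, which is the situation the method favors when looking for frames rich in 3D information.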

[0112] Optionally, when the first frame of the video is used as the first keyframe, the second keyframe is obtained by calculating, in turn, the fg value between the first keyframe and each frame after it, and selecting the candidate keyframe with the maximum fg value as the second keyframe; subsequent keyframes are extracted in the same way. An example is shown in the following formula:

[0113] k_{i+1} = argmax_j fg(k_i, j);

[0114] Where k_i represents the i-th keyframe, j represents the j-th candidate keyframe after keyframe k_i, and k_{i+1} represents the next keyframe after the i-th keyframe.

[0115] Optionally, when calculating the candidate keyframe score, a certain number of candidate video frames can be selected for calculation, or all subsequent candidate video frames can be calculated.
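The sequential selection, including the option of scoring only a limited number of candidates, might look like the following sketch. It assumes frames are identified by their index, that every later frame is an eligible candidate (no matching-rate filter is applied here), and that a scoring function returning fg for a (keyframe, candidate) pair is supplied by the caller. The names `extract_keyframes`, `score` and `max_candidates` are hypothetical.

```python
from typing import Callable, List, Optional


def extract_keyframes(
    num_frames: int,
    score: Callable[[int, int], float],
    max_candidates: Optional[int] = None,
) -> List[int]:
    """Greedy keyframe selection.

    Frame 0 is the first keyframe. Each next keyframe is the later frame
    with the highest score against the current keyframe. If max_candidates
    is given, only that many frames after the current keyframe are scored.
    """
    if max_candidates is not None and max_candidates < 1:
        raise ValueError("max_candidates must be at least 1")
    if num_frames <= 0:
        return []
    keyframes = [0]
    current = 0
    while current < num_frames - 1:
        end = num_frames
        if max_candidates is not None:
            end = min(num_frames, current + 1 + max_candidates)
        candidates = range(current + 1, end)
        # current is read inside the lambda before it is reassigned.
        current = max(candidates, key=lambda j: score(current, j))
        keyframes.append(current)
    return keyframes


if __name__ == "__main__":
    # Toy score that prefers a gap of three frames between keyframes.
    toy_score = lambda i, j: -abs((j - i) - 3)
    print(extract_keyframes(10, toy_score))
    print(extract_keyframes(10, toy_score, max_candidates=2))
```

Limiting the window trades some selection quality for speed, since each step scores at most `max_candidates` frames instead of the whole remainder of the sequence.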

[0116] In this embodiment, a first fitness score for a preset first motion model is determined based on the feature point matching uniformity, feature point distribution characteristics, and the degrees of freedom of the first motion model. A second fitness score for a preset second motion model is determined based on the feature point matching uniformity, feature point distribution characteristics, and the degrees of freedom of the second motion model. Finally, a model fitness score is determined based on the first and second fitness scores. The model fitness score quantifies the differences in image information adaptation capabilities. Keyframes are then selected from candidate video frames based on the model fitness score, effectively solving the problems of high redundancy in keyframe extraction and high reprojection errors in subsequent 3D reconstruction, thus improving the accuracy and efficiency of video 3D reconstruction.

[0117] Referring to Figure 5, in a third embodiment of the video keyframe extraction method of the present invention, based on the first or second embodiment, the method further includes, after step S30:

[0118] Step S40: Obtain the keyframe sequence and extract the image features of the keyframe sequence;

[0119] Step S50: Determine the matching degree between the keyframes based on the image features;

[0120] Step S60: Determine the two keyframes with the highest matching degree as the two target keyframes;

[0121] Step S70: Determine the 3D model corresponding to the video to be processed based on the two target keyframes.

[0122] Optionally, after all keyframes of the video to be processed have been selected, the resulting keyframe sequence is obtained and its image features are extracted. Optionally, the SIFT (Scale-Invariant Feature Transform) algorithm is used to extract the image features. The matching degree between the keyframes is determined based on the image features. Optionally, the FLANN (Fast Library for Approximate Nearest Neighbors) matching algorithm is used to match the image features quickly.

[0123] A database is constructed from the feature matching results to store the matching relationships between the keyframes. The image pair with the best matching relationship is selected; that is, the two keyframes with the highest matching degree are determined as the two target keyframes. The bundle adjustment method is then used to reduce reprojection errors.
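The pair selection described above can be sketched as a pairwise table followed by a maximum lookup. This sketch assumes the matching degree of a keyframe pair is a single number, such as a count of matched features, returned by a caller-supplied function; the names `build_match_table`, `select_target_pair` and `match_degree` are hypothetical, and an in-memory dictionary stands in for the database.

```python
from itertools import combinations
from typing import Callable, Dict, Sequence, Tuple

Pair = Tuple[int, int]


def build_match_table(
    keyframe_ids: Sequence[int],
    match_degree: Callable[[int, int], float],
) -> Dict[Pair, float]:
    """Store the matching degree of every unordered keyframe pair."""
    return {(a, b): match_degree(a, b) for a, b in combinations(keyframe_ids, 2)}


def select_target_pair(table: Dict[Pair, float]) -> Pair:
    """Return the keyframe pair with the highest matching degree."""
    if not table:
        raise ValueError("at least two keyframes are required")
    return max(table, key=table.get)


if __name__ == "__main__":
    # Toy matching degrees, for illustration only.
    counts = {(0, 3): 120, (0, 6): 40, (3, 6): 310}
    table = build_match_table([0, 3, 6], lambda a, b: counts[(a, b)])
    print(select_target_pair(table))
```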

[0124] Optionally, the camera's position and pose information are determined based on the two target keyframes; the three-dimensional coordinate points corresponding to the image features are determined based on the pixel position information of the image features in the two target keyframes and the camera's position and pose information; and the three-dimensional model of the video to be processed is determined based on the three-dimensional coordinate points corresponding to the image features.

[0125] Optionally, after the three-dimensional coordinate points are obtained, their reprojection error needs to be optimized by the bundle adjustment method. The remaining keyframes are then processed in sequence to complete the three-dimensional reconstruction and output the three-dimensional model.
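The reprojection error that bundle adjustment minimizes can be sketched for a single point as follows. This is only the per-point error term, not a bundle adjustment implementation, and it assumes an ideal pinhole camera without lens distortion. The parameter names (`rotation`, `translation`, `focal`, `cx`, `cy`) are illustrative and do not come from the specification.

```python
import math


def project(point, rotation, translation, focal, cx, cy):
    """Project a 3D world point to pixel coordinates (ideal pinhole camera)."""
    # Transform into the camera frame: Xc = R * X + t.
    xc, yc, zc = (
        sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
        for r in range(3)
    )
    if zc <= 0:
        raise ValueError("point lies behind the camera")
    return focal * xc / zc + cx, focal * yc / zc + cy


def reprojection_error(point, observed, rotation, translation, focal, cx, cy):
    """Pixel distance between the projected point and the observed feature."""
    u, v = project(point, rotation, translation, focal, cx, cy)
    return math.hypot(u - observed[0], v - observed[1])
```

Bundle adjustment would adjust the camera poses and the three-dimensional points jointly so that the sum of such errors (typically squared) over all observations is as small as possible.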

[0126] In the technical solution of this embodiment, the 3D model is reconstructed based on key frames, which effectively solves the problems of high redundancy in key frame extraction and high reprojection error in subsequent 3D reconstruction, thereby improving the accuracy and efficiency of video 3D reconstruction.

[0127] Referring to Figure 6, which shows a fourth embodiment of the video keyframe extraction method of the present invention: based on any one of the first to third embodiments, step S10 includes:

[0128] Step S11: Obtain feature point information of each video frame in the video frame sequence;

[0129] Step S12: Determine the feature point matching rate between the current keyframe and the video frame;

[0130] Step S13: Determine the video frames whose feature point matching rate is greater than a preset threshold as the candidate video frames.

[0131] Optionally, the total number of feature points in the current keyframe and each of the video frames, and the number of matching feature points, are determined; the feature point matching rate is determined based on the total number of feature points and the number of matching feature points. The feature point matching rate, denoted Rc, between the latest keyframe and each subsequent frame is calculated as follows:

[0132] Rc = Tc / Tall;

[0133] Where Tc represents the number of matching feature points between two frames, and Tall represents the total number of feature points between the two frames. As shown in Figure 7, the points in the two frames represent the calculated feature points, and the lines represent the feature point matching relationship between the two frames.
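The ratio Rc = Tc / Tall can be written as a small helper. The sketch takes both counts as inputs because the text does not spell out how Tall is accumulated over the two frames; the function name and the input checks are assumptions.

```python
def matching_rate(num_matched, num_total):
    """Feature point matching rate Rc = Tc / Tall.

    num_matched: Tc, the number of matching feature points between two frames.
    num_total: Tall, the total number of feature points between the two frames.
    """
    if num_total <= 0:
        raise ValueError("Tall must be positive")
    if not 0 <= num_matched <= num_total:
        raise ValueError("Tc must lie between 0 and Tall")
    return num_matched / num_total
```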

[0134] The feature point matching rate is inversely related to the camera's motion. A higher feature point matching rate indicates a higher degree of overlap between the two images, meaning that the camera has moved a shorter distance and the baseline between the two images is shorter. To prevent camera pose estimation accuracy from degrading when two frames share too few corresponding feature points, a threshold is applied to the Rc value when filtering keyframes. The threshold range for Rc filtering is set to T1 to T2, and the threshold values can be chosen differently depending on the scene. If the Rc value falls within the range T1 to T2, the current video frame is determined as a candidate video frame; otherwise, the next frame is evaluated in the same way.
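The range check on Rc can be sketched as a simple filter over the matching rates of the frames that follow the current keyframe. Treating both bounds as inclusive is an assumption, and the values used for T1 and T2 in any call are purely illustrative, since the text leaves them scene-dependent.

```python
def select_candidates(rates, t1, t2):
    """Return indices of frames whose matching rate Rc lies within [t1, t2].

    rates: Rc values of the frames following the current keyframe, in order.
    t1, t2: lower and upper thresholds (inclusive bounds are an assumption).
    """
    if t1 > t2:
        raise ValueError("t1 must not exceed t2")
    return [i for i, rc in enumerate(rates) if t1 <= rc <= t2]
```

With hypothetical thresholds of 0.3 and 0.8, the rates `[0.95, 0.6, 0.4, 0.1]` yield candidates at indices 1 and 2: the first frame overlaps too much with the keyframe, and the last shares too few feature points with it.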

[0135] In this embodiment, feature point information of each video frame in the video frame sequence is obtained; the feature point matching rate between the current keyframe and the video frame is determined; and video frames with a feature point matching rate greater than a preset threshold are identified as candidate video frames. By filtering candidate video frames, the efficiency and accuracy of keyframe calculation are improved. Reconstructing the 3D model based on the keyframes effectively solves the problems of high redundancy in keyframe extraction and high reprojection errors in subsequent 3D reconstruction, thus improving the accuracy and efficiency of video 3D reconstruction.

[0136] Referring to Figure 8, the present invention also provides a video keyframe extraction device, the device comprising:

[0137] The acquisition module 100 is used to determine the candidate video frames that match the current keyframe in the video frame sequence following the current keyframe.

[0138] The calculation module 200 is used to determine the model fitness score between the current keyframe and each of the candidate video frames based on the feature point matching uniformity between the current keyframe and each of the candidate video frames, and the feature point distribution characteristics of the candidate video frames.

[0139] The determination module 300 is used to determine the next keyframe corresponding to the current keyframe in the candidate video frames based on the model fitness score.
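One way to picture how the three modules cooperate is the loop below. It is a sketch under stated assumptions: the acquisition and calculation modules are passed in as hypothetical callables `find_candidates` and `fitness_score`, and the determination step simply picks the candidate with the highest score, whereas the text only states that the selection is based on the score.

```python
def extract_keyframes(frames, find_candidates, fitness_score, first_keyframe=0):
    """Chain keyframes by repeatedly choosing the best-scoring candidate.

    find_candidates(frames, k) -> indices of candidate frames after keyframe k
        (stands in for the acquisition module).
    fitness_score(frames, k, c) -> model fitness score between k and c
        (stands in for the calculation module).
    """
    keyframes = [first_keyframe]
    current = first_keyframe
    while True:
        # Keep only candidates that lie after the current keyframe.
        candidates = [
            c for c in find_candidates(frames, current)
            if current < c < len(frames)
        ]
        if not candidates:
            break
        # Determination step (assumed rule: highest score wins).
        current = max(candidates, key=lambda c: fitness_score(frames, current, c))
        keyframes.append(current)
    return keyframes
```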

[0140] Optionally, the step of determining the model fitness score between the current keyframe and each of the candidate video frames based on the feature point matching uniformity between the current keyframe and each of the candidate video frames, and the feature point distribution characteristics of the candidate video frames, includes:

[0141] Based on the feature point matching uniformity, the feature point distribution characteristics, and the degrees of freedom of the first motion model, a first fitness score for the preset first motion model is determined.

[0142] Based on the feature point matching uniformity, the feature point distribution characteristics, and the degrees of freedom of the second motion model, a second fitness score for the preset second motion model is determined.

[0143] The model fitness score is determined based on the first fitness score and the second fitness score.

[0144] Optionally, the step of determining the model fitness score based on the first fitness score and the second fitness score includes:

[0145] Determine the difference between the first fitness score and the second fitness score;

[0146] The ratio of the difference to the first fitness score is determined as the model fitness score.
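The two steps above amount to computing (first score - second score) / first score. A minimal sketch follows; rejecting a zero first score is an assumption, since the text does not say how that case is handled.

```python
def model_fitness_score(first_score, second_score):
    """Ratio of the score difference to the first fitness score.

    first_score: fitness score of the first motion model.
    second_score: fitness score of the second motion model.
    """
    if first_score == 0:
        raise ValueError("first fitness score must be non-zero")
    return (first_score - second_score) / first_score
```

For instance, a first score of 10 and a second score of 4 give a model fitness score of 0.6.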

[0147] Optionally, after the step of determining the next keyframe corresponding to the current keyframe among the candidate video frames based on the model fitness score, the method further includes:

[0148] Obtain the keyframe sequence and extract the image features of the keyframe sequence;

[0149] The matching degree between the keyframes is determined based on the image features;

[0150] The two keyframes with the highest matching degree are identified as the two target keyframes.

[0151] The three-dimensional model corresponding to the video to be processed is determined based on the two target keyframes.

[0152] Optionally, the step of determining the 3D model corresponding to the video to be processed based on the two target keyframes includes:

[0153] Determine the camera's position and attitude information based on the two target keyframes;

[0154] The three-dimensional coordinate points corresponding to the image features are determined based on the pixel position information of the image features of the two target key frames, and the position and pose information of the camera.

[0155] The three-dimensional model of the video to be processed is determined based on the three-dimensional coordinate points corresponding to the image features.

[0156] Optionally, the step of determining candidate video frames that match the current keyframe in the video frame sequence following the current keyframe includes:

[0157] Obtain feature point information of each video frame in the video frame sequence;

[0158] Determine the feature point matching rate between the current keyframe and the video frame;

[0159] Video frames whose feature point matching rate is greater than a preset threshold are identified as candidate video frames.

[0160] Optionally, the step of determining the feature point matching rate between the current keyframe and the video frame includes:

[0161] Determine the total number of feature points in the current keyframe and each video frame, and the number of matching feature points;

[0162] The feature point matching rate is determined based on the total number of feature points and the number of matching feature points.

[0163] The present invention also provides a video keyframe extraction device, the video keyframe extraction device including a memory, a processor, and a video keyframe extraction program stored in the memory and executable on the processor. When the video keyframe extraction program is executed by the processor, it implements the various steps of the video keyframe extraction method as described in the above embodiments.

[0164] The present invention also provides a computer-readable storage medium storing a video keyframe extraction program, which, when executed by a processor, implements the various steps of the video keyframe extraction method described in the above embodiments.

[0165] The sequence numbers of the above embodiments of the present invention are for descriptive purposes only and do not represent the superiority or inferiority of the embodiments.

[0166] It should be noted that, in this document, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, system, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, system, article, or apparatus. Unless otherwise specified, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, system, article, or apparatus that includes that element.

[0167] Through the above description of the embodiments, those skilled in the art can clearly understand that the methods described in the embodiments can be implemented by means of software plus a necessary general-purpose hardware platform. They can also be implemented by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, or the part of it that contributes over the prior art, can be embodied in the form of a software product. This computer software product is stored in a computer-readable storage medium (such as ROM / RAM, magnetic disk, or optical disk) as described above, and includes several instructions that cause a terminal device (which may be a mobile phone, computer, parking management device, air conditioner, or network device, etc.) to execute the methods described in the various embodiments of the present invention.

[0168] The above are merely preferred embodiments of the present invention and do not limit the scope of the patent. Any equivalent structural or procedural transformations made based on the description and drawings of the present invention, or direct or indirect applications in other related technical fields, are similarly included within the scope of patent protection of the present invention.

Claims

1. A method for extracting keyframes from a video, characterized in that the method includes: in the video frame sequence following the current keyframe, identifying candidate video frames that match the current keyframe; determining a first fitness score of a preset first motion model based on the feature point matching uniformity between the current keyframe and each of the candidate video frames, the feature point distribution characteristics of the candidate video frames, and the degrees of freedom of the first motion model, wherein the feature point matching uniformity is determined based on the grid proportion of the matched feature points in the current video frame after grid division, the feature point distribution characteristics represent the Euclidean distance between a feature point and its matching feature point on the homography plane, and the first motion model is a two-dimensional H-motion model; determining a second fitness score of a preset second motion model based on the feature point matching uniformity, the feature point distribution characteristics, and the degrees of freedom of the second motion model, wherein the second motion model is a three-dimensional F-motion model; determining the difference between the first fitness score and the second fitness score; determining the ratio of the difference to the first fitness score as the model fitness score between the current keyframe and each of the candidate video frames; and determining, based on the model fitness score, the next keyframe corresponding to the current keyframe from the candidate video frames.

2. The video keyframe extraction method as described in claim 1, characterized in that, after the step of determining the next keyframe corresponding to the current keyframe among the candidate video frames based on the model fitness score, the method further includes: obtaining the keyframe sequence and extracting the image features of the keyframe sequence; determining the matching degree between the keyframes based on the image features; identifying the two keyframes with the highest matching degree as the two target keyframes; and determining the three-dimensional model corresponding to the video to be processed based on the two target keyframes.

3. The video keyframe extraction method as described in claim 2, characterized in that the step of determining the three-dimensional model corresponding to the video to be processed based on the two target keyframes includes: determining the camera's position and pose information based on the two target keyframes; determining the three-dimensional coordinate points corresponding to the image features based on the pixel position information of the image features in the two target keyframes and the camera's position and pose information; and determining the three-dimensional model of the video to be processed based on the three-dimensional coordinate points corresponding to the image features.

4. The video keyframe extraction method as described in claim 1, characterized in that the step of determining candidate video frames that match the current keyframe in the video frame sequence following the current keyframe includes: obtaining feature point information of each video frame in the video frame sequence; determining the feature point matching rate between the current keyframe and the video frame; and identifying video frames whose feature point matching rate is greater than a preset threshold as candidate video frames.

5. The video keyframe extraction method as described in claim 1, characterized in that the step of determining the feature point matching rate between the current keyframe and the video frame includes: determining the total number of feature points in the current keyframe and each video frame, and the number of matching feature points; and determining the feature point matching rate based on the total number of feature points and the number of matching feature points.

6. A video keyframe extraction device, characterized in that the device includes: an acquisition module, used to determine candidate video frames that match the current keyframe in the video frame sequence following the current keyframe; a calculation module, used to determine a first fitness score of a preset first motion model based on the feature point matching uniformity between the current keyframe and each of the candidate video frames, the feature point distribution characteristics of the candidate video frames, and the degrees of freedom of the first motion model, wherein the feature point matching uniformity is determined based on the grid proportion of the matched feature points in the current video frame after grid division, the feature point distribution characteristics represent the Euclidean distance between a feature point and its matching feature point on the homography plane, and the first motion model is a two-dimensional H-motion model; the calculation module being further used to determine a second fitness score of a preset second motion model based on the feature point matching uniformity, the feature point distribution characteristics, and the degrees of freedom of the second motion model, wherein the second motion model is a three-dimensional F-motion model, to determine the difference between the first fitness score and the second fitness score, and to determine the ratio of the difference to the first fitness score as the model fitness score between the current keyframe and each of the candidate video frames; and a determination module, used to determine the next keyframe corresponding to the current keyframe among the candidate video frames based on the model fitness score.

7. A video keyframe extraction device, characterized in that the video keyframe extraction device includes a memory, a processor, and a video keyframe extraction program stored in the memory and executable on the processor, wherein the video keyframe extraction program, when executed by the processor, implements the steps of the video keyframe extraction method as described in any one of claims 1-5.

8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a video keyframe extraction program which, when executed by a processor, implements the steps of the video keyframe extraction method as described in any one of claims 1-5.