A digital video stabilization method based on decomposition motion compensation

By classifying videos and fusing warping transformation matrices under a decomposition-based motion compensation method, the limited adaptability and processing quality of existing video stabilization technologies are addressed, and stable video is generated efficiently.

CN117714623B · Active · Publication Date: 2026-06-23 · HOHAI UNIV +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
HOHAI UNIV
Filing Date
2023-10-27
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Existing video stabilization technologies cannot adapt to different types of videos, resulting in slow processing speeds, high video cropping rates, and image distortion and blurring.

Method used

A decomposition-based motion compensation method is adopted. It combines motion estimation, translation and rotation component extraction, and depth estimation to classify the video; smooths the motion path with grid flow prediction adaptive path optimization, similar spatiotemporal path optimization, average filtering optimization, and L1 optimal camera path optimization; and fuses the resulting transformation matrices with class-dependent weight coefficients to generate a stable video.

Benefits of technology

It achieves adaptive processing of different types of videos, improves processing speed and reduces image distortion, expands applicable scenarios and enhances the quality of video stabilization.


Figure CN117714623B_ABST

Abstract

The application discloses a digital video stabilization method based on decomposition motion compensation, which comprises the following steps: firstly, calculating the motion path and stability of the shaking video, and combining the video depth to classify the shaking video; then, using the grid flow prediction adaptive path optimization, the similar space-time path optimization, the average filtering optimization and the L1 optimal camera path optimization method to smooth and optimize the shaking motion path, and then storing the corresponding path optimization information into four warping transformation matrices; introducing the corresponding weight coefficients into the four warping transformation matrices and then fusing them to obtain the final transformation matrix; finally, applying the final transformation matrix to each frame in the shaking video sequence through the warping operation to generate a stable video. According to different video types, the application adopts different weight coefficients, reduces the picture distortion while ensuring the processing speed, expands the application range and effectively improves the quality of the video stabilization.

Description

Technical Field

[0001] This invention belongs to the field of video processing, specifically relating to a digital video stabilization method based on decomposition motion compensation.

Background Technology

[0002] Today, video has become one of the main mediums for disseminating information. People can shoot videos freely in any setting, which inevitably leads to shaky videos because the photographer cannot remain stable throughout the recording process. Shaky videos not only affect the viewer's experience but also increase the difficulty of video processing. Therefore, an efficient video stabilization technology is essential. Over the past two decades, traditional video stabilization technologies have been proposed to solve the problem of video shakiness. However, most methods can only handle certain types of video and cannot achieve adaptive processing. Furthermore, most methods, while eliminating shakiness, also introduce problems such as image distortion, excessive cropping of the original footage, and excessively slow processing speed.

[0003] Existing video stabilization algorithms can be broadly categorized into traditional methods and deep learning-based methods. Traditional video stabilization algorithms mainly consist of three steps: motion estimation, motion compensation, and image warping. The first step estimates global motion vectors and establishes a motion model, typically using optical flow or feature point matching. The second step applies various constraints to the estimated motion model to remove unwanted jitter. However, excessive constraints on the motion model can degrade the intended motion of the video, leading to content loss and distortion. The third step is image warping, which applies the optimized motion model to each frame of the video to obtain the final stable video. In recent years, with the development of deep learning, more and more digital video stabilization methods based on convolutional neural networks have been proposed. These methods do not explicitly compute camera paths but instead treat stabilization as a supervised learning problem. While deep learning-based video stabilization methods offer high stability and adaptability, they also have drawbacks. First, they require a large amount of training data and powerful computing resources, placing high demands on the hardware. Second, when processing high-resolution or long videos, they are prone to blurring, noise, or distortion, which degrades the visual quality of the output video. Furthermore, although they excel at removing image jitter, they can also cause side effects such as image distortion, artifacts, or lens flare, and correcting these may require further processing and adjustment. Meanwhile, traditional methods and deep learning-based methods share a common drawback: they are often designed for one specific problem. As a result, they can only handle a fixed type of video and fail on other types. They cannot apply different processing methods to different video types, which significantly limits the scope of application of video stabilization technology.

Summary of the Invention

[0004] The technical problem to be solved by this invention is: to address the problems in the prior art, a digital video stabilization method based on decomposition motion compensation is provided, which, while ensuring processing speed, solves the problems of previous methods being unable to adapt to video types and video content loss due to excessive constraints on motion models, namely, excessively high video cropping rate and image distortion and blurring.

[0005] To solve the above technical problems, the present invention provides the following technical solution: a digital video stabilization method based on decomposition motion compensation, comprising the following steps:

[0006] S1. Perform motion estimation on the reference frame and the current frame of the jittery video to obtain the jittery motion path of the jittery video;

[0007] S2. Extract translation and rotation components for each path in the jitter motion path, evaluate the percentage of low-frequency component energy in the translation and rotation components to obtain the stability of the jitter video, and then use the depth estimation method to determine the depth of the jitter video. Based on the stability and depth of the jitter video, classify the video into categories to obtain the videos of each category.

[0008] S3. Use grid flow prediction adaptive path optimization, similar spatiotemporal path optimization, average filtering optimization and L1 optimal camera path optimization to smooth the jittery motion path and obtain the corresponding path optimization information.

[0009] S4. Store the corresponding path optimization information into four warp transformation matrices respectively;

[0010] S5. Based on the videos of each category obtained in step S2, introduce the corresponding weight coefficients into the four warp transformation matrices and then fuse them to obtain the final transformation matrices.

[0011] S6. Apply each final transformation matrix to each frame of the jittery video sequence through a warping operation to generate a stable video.

[0012] Furthermore, in step S1 above, motion estimation for the reference frame and the current frame of the jittery video is performed using a grid-based fast ultra-robust feature matching method, including the following sub-steps:

[0013] S101. Split the shaky video into a continuous frame sequence;

[0014] S102. Divide each frame in the frame sequence into grid regions, and encapsulate motion smoothness as the statistical likelihood of a preset number of feature matches within a region.

[0015] S103. Obtain motion information between adjacent frames based on the matched feature point pairs;

[0016] S104. Accumulate the motion information between adjacent frames to obtain the jitter motion path of the jitter video.
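The accumulation in S104 can be sketched as follows. This is a minimal illustration, assuming each inter-frame motion is summarized as a (dx, dy, dθ) triple that can simply be summed; the function name `accumulate_path` and the sample values are hypothetical and do not come from the patent.

```python
def accumulate_path(frame_motions):
    """Sum per-frame (dx, dy, dtheta) motions into a cumulative motion path.

    frame_motions[i] is the assumed motion from frame i to frame i + 1.
    The returned list has one path point per frame and starts at the origin.
    """
    path = [(0.0, 0.0, 0.0)]
    for dx, dy, dtheta in frame_motions:
        x, y, theta = path[-1]
        path.append((x + dx, y + dy, theta + dtheta))
    return path


# Hypothetical motions between four consecutive frames.
motions = [(1.0, 0.5, 0.01), (-0.5, 0.25, -0.02), (2.0, -1.0, 0.0)]
jitter_path = accumulate_path(motions)
```

Each component of `jitter_path` (x, y, and angle) can then be smoothed independently by the path optimizations of step S3.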

[0017] Furthermore, in the aforementioned step S2, the minimum metric among translation and rotation is used as the final metric to evaluate the percentage of low-frequency component energy; the video categories include: non-depth video with severe shaking, depth video with severe shaking, non-depth video with normal shaking, and depth video with normal shaking;
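One way to read the stability metric and the four categories is sketched below. It is an illustration under stated assumptions rather than the patented procedure: the number of low-frequency bins (`n_low`), the severity threshold (`severe_below`), and the boolean `has_depth` flag (standing in for the output of a depth estimator) are all hypothetical.

```python
import cmath
import math


def low_freq_energy_ratio(signal, n_low=5):
    """Share of non-DC spectral energy that lies in the lowest n_low bins."""
    n = len(signal)
    energy = []
    # Naive DFT over bins 1..n//2; bin 0 (the DC component) is skipped.
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, x in enumerate(signal))
        energy.append(abs(coeff) ** 2)
    total = sum(energy)
    if total < 1e-12:
        return 1.0  # negligible non-DC energy: treat the path as stable
    return sum(energy[:n_low]) / total


def stability(translation, rotation, n_low=5):
    """Final metric: the minimum of the translation and rotation ratios."""
    return min(low_freq_energy_ratio(translation, n_low),
               low_freq_energy_ratio(rotation, n_low))


def classify(stability_score, has_depth, severe_below=0.5):
    """Map stability and depth onto the four categories named in S2."""
    shake = "severe" if stability_score < severe_below else "normal"
    kind = "depth" if has_depth else "non-depth"
    return f"{kind} video with {shake} shaking"


# Hypothetical 64-sample signals: a slow drift and a high-frequency jitter.
smooth = [math.sin(2 * math.pi * i / 64) for i in range(64)]
jitter = [math.sin(2 * math.pi * 20 * i / 64) for i in range(64)]
label = classify(stability(smooth, jitter), has_depth=True)
```

Taking the minimum means a video is rated by its worst component, so a path that is smooth in translation but jittery in rotation is still treated as shaky.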

[0018] Furthermore, the aforementioned step S3 includes the following sub-steps:

[0019] S301. The grid flow prediction adaptive path optimization specifically involves: dividing the video frame into a 2D grid, detecting image corner points between consecutive frames, generating a motion vector at each matched position, and transferring the motion vectors to their adjacent grid vertices, so that each grid vertex represents the motion in its neighborhood; a downsampling filter is then used to smooth the temporal variation of the motion vector at each grid vertex, yielding the smooth path optimized by grid flow prediction adaptive path optimization;

S302. The similar spatiotemporal path optimization specifically involves: dividing the video frame into grid cells, defining the concatenation of local homographies on the same grid cell over time as a path bundle, modeling the camera motion with multiple camera paths, and using bilateral filtering to smooth each camera path as a whole, yielding the smooth path optimized by similar spatiotemporal path optimization;

[0020] S303. Extract the x-axis, y-axis, and angle motion information from the jitter motion path of the jittery video obtained in step S1. Then apply a moving average filter with a preset fixed neighborhood window size, replacing each value on the path with the average value within its neighborhood window, to obtain the smoothed path after average filtering.

[0021] S304. Optimize the camera path by dividing it into three parts: a constant path representing a static camera, a constant velocity path representing translational or moving photography, and a constant acceleration path representing the transition between static and translational cameras. That is, the camera path is divided into constant segments, linear segments, and parabolic segments for optimization, represented by |D(P)|_1, |D^2(P)|_1 and |D^3(P)|_1 respectively, where D is the differential operator and P(t) is the camera path; the optimization is treated as a constrained L1 minimization problem, and the final optimized camera path is defined as follows:

[0022]

[0023] The optimal stable camera path is obtained by minimizing |D(P)|_1, |D^2(P)|_1 and |D^3(P)|_1 respectively.
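To see why the three norms correspond to constant, linear, and parabolic segments, the sketch below evaluates them on one-dimensional sample paths using forward differences. Treating the path as a scalar sequence, and the helper names `forward_diff` and `l1_of_derivative`, are assumptions made for illustration only.

```python
def forward_diff(seq):
    """Forward difference: D(P)(t) = P(t + 1) - P(t)."""
    return [b - a for a, b in zip(seq, seq[1:])]


def l1_of_derivative(path, order):
    """L1 norm of the order-th forward difference of a scalar path."""
    d = list(path)
    for _ in range(order):
        d = forward_diff(d)
    return sum(abs(v) for v in d)


# Hypothetical sample paths over six frames.
constant = [3.0] * 6                          # static camera
linear = [2.0 * t for t in range(6)]          # constant velocity
parabolic = [0.5 * t * t for t in range(6)]   # constant acceleration

norms = {
    "constant": [l1_of_derivative(constant, k) for k in (1, 2, 3)],
    "linear": [l1_of_derivative(linear, k) for k in (1, 2, 3)],
    "parabolic": [l1_of_derivative(parabolic, k) for k in (1, 2, 3)],
}
```

A constant path has a zero first difference, a linear path a zero second difference, and a parabolic path a zero third difference. Because an L1 penalty favors sparse residuals, minimizing these norms tends to produce paths made of such segments.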

[0024] Furthermore, the aforementioned step S4 includes the following sub-steps:

[0025] S401. Based on the smooth path obtained by grid flow prediction adaptive path optimization, obtain the smooth transformation matrix D1 between adjacent frames;

[0026] S402. Based on the smooth path obtained by similar spatiotemporal path optimization, obtain the smooth transformation matrix D2 between adjacent frames;

[0027] S403. Based on the differences of the smoothed path after average filtering, calculate the transformation matrix D3 between adjacent frames;

[0028] S404. Based on the optimal stable camera path obtained by L1 minimization, obtain the smooth transformation matrix D4 between adjacent frames.

[0029] Furthermore, the aforementioned step S5 includes the following sub-steps:

[0030] S501. Introduce four weight coefficients: α, β, γ, and λ. Based on the video category obtained in step S2 (severely shaking non-depth video, severely shaking depth video, normally shaking non-depth video, or normally shaking depth video), adjust the four coefficients so that they sum to 1:

[0031] α+β+γ+λ=1

[0032] S502. Since the transformation matrices obtained in step S4 are of the same type, assign weights α, β, γ, and λ to the transformation matrices obtained in step S4 respectively, and add the four matrices together to obtain a new transformation matrix D between adjacent frames, as shown in the following formula:

[0033] D = α*D1 + β*D2 + γ*D3 + λ*D4

[0034] 0 < α, β, γ, λ < 1

[0035] S503. Based on the formula for D and the known cropping ratio, stability, and distortion scores, determine the optimal coefficients; applying the resulting transformation D to the original path yields the final stable camera motion path required by this invention.
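The weighted fusion of S501 to S503 can be sketched as below. The weight table is purely hypothetical (the patent does not list numeric coefficients in this text), and the 2x3 matrix layout is an assumption; only the constraints 0 < α, β, γ, λ < 1 and α+β+γ+λ=1 come from the description.

```python
# Hypothetical (alpha, beta, gamma, lambda) per video category.
CATEGORY_WEIGHTS = {
    "non-depth video with severe shaking": (0.1, 0.2, 0.3, 0.4),
    "depth video with severe shaking": (0.3, 0.4, 0.1, 0.2),
    "non-depth video with normal shaking": (0.2, 0.1, 0.4, 0.3),
    "depth video with normal shaking": (0.4, 0.3, 0.2, 0.1),
}


def fuse(matrices, weights):
    """Return D = alpha*D1 + beta*D2 + gamma*D3 + lambda*D4, element-wise."""
    if len(matrices) != 4 or len(weights) != 4:
        raise ValueError("expected four matrices and four weights")
    if not all(0.0 < w < 1.0 for w in weights):
        raise ValueError("each weight must lie strictly between 0 and 1")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(w * m[r][c] for w, m in zip(weights, matrices))
             for c in range(cols)]
            for r in range(rows)]


def translation(tx):
    """Hypothetical 2x3 matrix for a pure horizontal shift by tx pixels."""
    return [[1.0, 0.0, tx], [0.0, 1.0, 0.0]]


D1, D2, D3, D4 = (translation(tx) for tx in (0.0, 10.0, 20.0, 30.0))
D = fuse([D1, D2, D3, D4], (0.4, 0.2, 0.1, 0.3))
```

Because the weights sum to 1, the fused matrix is a convex combination of the four inputs. Note that a linear blend of matrices containing rotations is in general no longer an exact rotation, so an implementation may want to check the distortion score after fusion.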

[0036] Furthermore, in the aforementioned step S301, a downsampling filter is used to smooth the time variation of the motion vector at each grid vertex, and the optimization formula is as follows:

[0037]

[0038] Where O(t) = \sum_t f(t) is the camera path at time t. The first term, ||P(t) - O(t)||^2, encourages the stabilized video to stay close to the original camera path, avoiding excessive cropping and distortion; the second term, ||P(t) - P(s)||^2, enforces temporal smoothness. Ω_t is the temporal smoothing radius, θ_{t,s} is a Gaussian weight set to exp(-||s - t||^2 / (Ω_t / 3)^2), and δ_t balances the two terms. The energy function is quadratic, and its minimum is found with a sparse linear solver.

[0039] Furthermore, in step S302 above, bilateral filtering is used to smooth the individual camera paths as a whole, and the optimization formula is as follows:

[0040]

[0041] Where P(t) is the optimized path, O(t) is the original camera path, and Ω_t is the neighborhood of frame t. The data term ||P(t) - O(t)||^2 forces the new camera path to approximate the original path, reducing cropping and distortion; the smoothness term ||P(t) - P(s)||^2 stabilizes the path; the weights θ_{t,s}(O) preserve motion discontinuities during rapid translation/rotation or scene transitions; and δ_t balances the two terms.

[0042] Further, step S303 described above specifically involves: using a moving average filter to replace each value on the path with the average value within its neighborhood window, resulting in a new smooth path; a moving average filter with a window size of 5 smooths the curve, as shown below:

[0043]

[0044] We store the path curve in array O, then the points on the curve are O[0]…O[n-1], where k is the kth point on the path curve, f[k] updates the value at the kth point on the path, and finally obtains the smoothed path.
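The window-of-5 moving average described above can be sketched as follows. The array name `O` and the output `f` follow the text; shrinking the window at the two ends of the path is an assumption, since the boundary handling is not stated.

```python
def moving_average(path, window=5):
    """Replace each path value with the mean of its neighborhood window."""
    half = window // 2
    smoothed = []
    for k in range(len(path)):
        lo = max(0, k - half)               # window shrinks at the start
        hi = min(len(path), k + half + 1)   # and at the end of the path
        neighborhood = path[lo:hi]
        smoothed.append(sum(neighborhood) / len(neighborhood))
    return smoothed


# Hypothetical jittery x-coordinates of the path, O[0] ... O[n-1].
O = [0.0, 10.0, 0.0, 10.0, 0.0, 10.0, 0.0]
f = moving_average(O, window=5)
```

The same filter would be applied separately to the x-axis, y-axis, and angle components extracted in step S303.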

[0045] Furthermore, in the aforementioned step S304, the original camera path O(t) is known. The video is an image sequence I_1, I_2, ..., I_t; each frame pair (I_{t-1}, I_t) is associated with a linear motion model L_t(t), which models the motion of feature points t from I_1 to I_{t-1}. The formula for calculating the camera path O(t) is:

[0046]

[0047] The formula for calculating the optimized camera path is as follows:

[0048] P(t) = O(t) F_t

[0049] Where F_t = O(t)^{-1} P(t) is the update transform, which, when applied to the original camera path O(t), generates the optimal path P(t); it is then applied to each frame to obtain the final stabilized video.
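The relation between the original path, the optimized path, and the update transform can be checked numerically. The sketch below assumes that O(t) and P(t) are represented as invertible 3x3 homogeneous matrices; the helper names and the sample translations are hypothetical.

```python
def matmul3(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def inverse3(m):
    """Inverse of a 3x3 matrix via the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def update_transform(original, optimized):
    """F_t = O(t)^{-1} P(t), so that O(t) F_t reproduces P(t)."""
    return matmul3(inverse3(original), optimized)


# Hypothetical paths at one time instance: pure translations.
O_t = [[1.0, 0.0, 5.0], [0.0, 1.0, -3.0], [0.0, 0.0, 1.0]]
P_t = [[1.0, 0.0, 4.0], [0.0, 1.0, -2.0], [0.0, 0.0, 1.0]]
F_t = update_transform(O_t, P_t)
reconstructed = matmul3(O_t, F_t)
```

In this example the update transform is a shift of (-1, 1), which is the correction that moves the jittery position onto the smoothed one.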

[0050] Treating the optimization as a constrained L1 minimization problem, the final optimization formula for the camera path is defined as follows:

[0051]

[0052] The optimal stable camera path is obtained by minimizing |D(P)|_1, |D^2(P)|_1 and |D^3(P)|_1 respectively.

[0053] Minimizing |D(P)|_1, |D^2(P)|_1 and |D^3(P)|_1 to obtain the optimal stable camera path includes the following sub-steps, where |D(P)|_1, |D^2(P)|_1 and |D^3(P)|_1 correspond to the first, second, and third derivatives of the path residual, respectively:

[0054] S304-1. Minimize |D(P)|1 using forward difference:

[0055]

[0056] Transform O(t) in the formula into its decomposition form:

[0057]

[0058] Given O(t), we seek to minimize this residual, as shown in the following equation:

[0059]

[0060] S304-2. Minimize |D^2(P)|_1 as follows:

[0061]

[0062] The errors are modeled and summed to minimize the residuals, as shown in the following formula:

[0063] |M_{t+1} - M_t| = |L_t(t+2) F_{t+2} - (I + L_t(t+1)) F_{t+1} + F_t|;

[0064] S304-3. Minimize |D^3(P)|_1:

[0065]

[0066] Compared to existing technologies, the beneficial technical effects of the present invention using the above technical solution are as follows: This invention provides a digital video stabilization method based on decomposition motion compensation, overcoming the limitations of traditional video stabilization methods in terms of applicability and processing effect. This invention can effectively classify and stabilize different types of videos, handling not only regular videos but also videos with severe shaking or depth information. Based on the characteristics of different video types, this invention employs corresponding weight coefficients to minimize image distortion while ensuring processing speed, thereby expanding the applicable scenarios and significantly improving the quality of video stabilization.

Attached Figure Description

[0067] Figure 1 is a framework diagram of a digital video stabilization method based on decomposition motion compensation according to the present invention.

Detailed Implementation

[0068] To better understand the technical content of the present invention, specific embodiments are described below in conjunction with the accompanying drawings.

[0069] Various aspects of the invention are described herein with reference to the accompanying drawings, in which numerous illustrative embodiments are shown. Embodiments of the invention are not limited to those depicted in the drawings. It should be understood that the invention may be implemented through any of the concepts and embodiments introduced above and described in detail below, because the disclosed concepts and embodiments are not limited to any particular implementation. Furthermore, some aspects of the disclosed invention may be used alone or in any suitable combination with other disclosed aspects.

[0070] Referring to Figure 1, the present invention provides a digital video stabilization method based on decomposition motion compensation, comprising the following steps: S1. Perform motion estimation on the reference frame and the current frame of the jittery video to obtain the jittery motion path of the jittery video;

[0071] S2. Extract translation and rotation components for each path in the jitter motion path, evaluate the percentage of low-frequency component energy in the translation and rotation components to obtain the stability of the jitter video, and then use the depth estimation method to determine the depth of the jitter video. Based on the stability and depth of the jitter video, classify the video into categories to obtain the videos of each category.

[0072] S3. Use grid flow prediction adaptive path optimization, similar spatiotemporal path optimization, average filtering optimization and L1 optimal camera path optimization to smooth the jittery motion path and obtain the corresponding path optimization information.

[0073] S4. Store the corresponding path optimization information into four warp transformation matrices respectively;

[0074] S5. Based on the videos of each category obtained in step S2, introduce the corresponding weight coefficients into the four warp transformation matrices and then fuse them to obtain the final transformation matrices.

[0075] S6. Apply each final transformation matrix to each frame of the jittery video sequence through a warping operation to generate a stable video.

[0076] As a preferred embodiment of the digital video stabilization method based on decomposition motion compensation of the present invention, step S1 includes the following sub-steps:

[0077] S101. Split the shaky video into a continuous frame sequence;

[0078] S102. Divide each frame in the frame sequence into grid regions, and encapsulate motion smoothness as the statistical likelihood of a preset number of feature matches within a region, to achieve high-quality matching of feature points.

[0079] Because the regions are small, this invention restricts the analysis to idealized true and false region pairs and ignores partially similar locations. Let t_c be one of the n supporting features of region c, and let k be the probability that t_c is matched correctly. The goal is to derive the rate at which the nearest-neighbor match of t_c lands in region d when {c,d} view the same or different locations. To keep the problem tractable, assume that when t_c is mismatched, its nearest-neighbor match may lie in any of N possible locations, hence the formula:

[0080]

[0081] Where n is the number of features in region d, and ζ is a factor added to accommodate violations of the assumption caused by repetitive structures (such as a row of windows). Let ... denote the probability that, given that {c,d} view the same location, the nearest neighbor of feature t_c lies in region d; ... denotes the event that {c,d} view the same or different locations; and N is the number of possible locations in which the nearest-neighbor match may lie.

[0082] Therefore, the following reasoning holds:

[0083]

[0084] The event occurs only when t_c is matched correctly, or when t_c is matched incorrectly but its match happens to fall within region d. This gives the first line of the above equation. The second line follows from Bayes' theorem. Because the features are pre-matched, ... is independent of T_{cd}; by the assumption above, ... is also independent of T_{cd}. Dropping the condition T_{cd} and substituting these values into the equation yields the final expression.

S103. Based on the matched feature point pairs, obtain the motion information between adjacent frames;

[0085] S104. Accumulate the motion information between adjacent frames to obtain the jitter motion path of the jitter video.
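Read literally, the matching-probability argument in step S102 suggests expressions of the following form. This is a hedged reconstruction from the prose only: the event symbols $t_c^{d}$ (the nearest neighbor of $t_c$ lies in region d) and $t_c^{\mathrm{false}}$ ($t_c$ is mismatched) are notation introduced for this sketch, and $T_{cd}$ is assumed to denote the event that {c,d} view the same location.

```latex
% Mismatched feature: its match is spread over N candidate locations,
% n of which lie in region d, scaled by the repetition factor zeta.
p\left(t_c^{d} \mid t_c^{\mathrm{false}}\right) = \frac{\zeta\, n}{N}

% Same-location case: either the match is correct (probability k),
% or it is wrong (probability 1 - k) but still lands in region d.
p\left(t_c^{d} \mid T_{cd}\right) = k + (1 - k)\,\frac{\zeta\, n}{N}
```

Under these assumptions, a region pair that truly views the same location collects matches at a much higher rate than a false pair whenever k is well above ζn/N, which is what makes counting matches per grid region a usable test.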

[0086] As a preferred embodiment of the digital video stabilization method based on decomposition motion compensation of the present invention, step S2 specifically includes:

[0087] (a) Translation and rotation components are extracted from each path, and the percentage of energy of low-frequency components (excluding DC components) in these one-dimensional signals is evaluated to measure stability. The invention uses the minimum metric for translation and rotation as the final metric;

[0088] (b) Use depth estimation methods to determine the depth of the video;

[0089] (c) Based on the obtained stability and depth, the videos are divided into non-depth videos with severe shaking, depth videos with severe shaking, non-depth videos with normal shaking, and depth videos with normal shaking.

[0090] As a preferred embodiment of the digital video stabilization method based on decomposition motion compensation of the present invention, step S3 includes the following sub-steps:

[0091] S301. The grid flow prediction adaptive path optimization specifically involves: dividing the video frame into a 2D grid, detecting image corner points between consecutive frames, and generating a motion vector at each matched position. The motion vectors are then transferred to their neighboring grid vertices, so that each vertex accumulates several motions from the features around it and represents the motion in its neighborhood. For camera motion smoothing, a downsampling filter is used to smooth the temporal changes of the motion vector at each grid vertex. Applying this filter at every grid vertex handles spatially varying motion naturally, resulting in the smooth path optimized by grid flow prediction adaptive path optimization.

[0092] A filter is applied at each grid vertex to naturally handle spatially varying motion, and the optimization formula is as follows:

[0093]

[0094] Where O(t) = \sum_t f(t) is the camera path at time t (f(t) represents the grid flow at time t, with f(0) = 0). The first term, ||P(t) - O(t)||^2, encourages the stabilized video to stay close to the original camera path, avoiding excessive cropping and distortion. The second term, ||P(t) - P(s)||^2, enforces temporal smoothness. Ω_t is the temporal smoothing radius, θ_{t,s} is a Gaussian weight set to exp(-||s - t||^2 / (Ω_t / 3)^2), and δ_t balances the two terms. The energy function is quadratic and can be minimized using a sparse linear solver. This invention uses a Jacobi-based solver for iterative solution, ultimately obtaining the smooth path optimized by grid flow prediction adaptive path optimization.
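A Jacobi-style iteration for this kind of quadratic energy can be sketched as below. The sketch assumes the energy has the form E(P) = Σ_t ||P(t) - O(t)||^2 + δ Σ_t Σ_{s∈Ω_t} θ_{t,s} ||P(t) - P(s)||^2, which matches the two terms described, with a constant δ, a fixed radius, and a scalar path; these simplifications and all parameter values are assumptions.

```python
import math


def jacobi_smooth(original, radius=6, delta=1.0, iterations=50):
    """Iteratively minimize the data term plus the Gaussian-weighted
    smoothness term for a scalar path, using Jacobi updates."""
    n = len(original)
    sigma = radius / 3.0
    path = list(original)
    for _ in range(iterations):
        updated = []
        for t in range(n):
            numerator = original[t]
            denominator = 1.0
            for s in range(max(0, t - radius), min(n, t + radius + 1)):
                if s == t:
                    continue
                weight = math.exp(-((s - t) ** 2) / sigma ** 2)
                # The factor 2 arises because each symmetric pair (t, s)
                # appears twice in the double sum of the energy.
                numerator += 2.0 * delta * weight * path[s]
                denominator += 2.0 * delta * weight
            updated.append(numerator / denominator)
        path = updated  # Jacobi: only values from the previous sweep are used
    return path


def total_variation(path):
    """Sum of absolute frame-to-frame changes, a simple jitter measure."""
    return sum(abs(b - a) for a, b in zip(path, path[1:]))


# Hypothetical zigzag path standing in for a shaky camera trajectory.
shaky = [float(i % 2) for i in range(20)]
smoothed = jacobi_smooth(shaky)
```

Each update is a weighted average of the original value and the current neighbors, so the linear system is strictly diagonally dominant and the iteration converges. A larger δ or radius gives a smoother path at the cost of drifting further from the original, which in practice means more cropping.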

[0095] S302. Similar spatiotemporal path optimization specifically involves: dividing video frames into grid cells; defining a path as the concatenation of local homographies within the same grid cell over time; modeling the camera motion with multiple camera paths; and smoothing each camera path as a whole using bilateral filtering to obtain the smoothed path after similar spatiotemporal path optimization. The optimization formula is as follows:

[0096]

[0097] Where P(t) is the optimized path, O(t) is the original camera path, and Ω_t is the neighborhood of frame t. The data term ||P(t) - O(t)||^2 forces the new camera path to approximate the original path, reducing cropping and distortion; the smoothness term ||P(t) - P(s)||^2 stabilizes the path; the weights θ_{t,s}(O) preserve motion discontinuities during rapid translation/rotation or scene transitions; and δ_t balances the two terms. This invention uses a Jacobi-based iterative solver to solve the above formula, ultimately obtaining the optimized camera path.

[0098] S303. Extract the x-axis, y-axis, and angle motion information from the jittery motion path of the jittery video obtained in step S1. Then apply a moving average filter with a preset fixed neighborhood window size, replacing each value on the path with the average value within its neighborhood window, to obtain the smoothed path after average filtering. A moving average filter with a window size of 5 smooths the curve, as shown below:

[0099]

[0100] We store the path curve in array O, so the points on the curve are O[0]...O[n-1]. Here, k is the k-th point on the path curve. In this invention, we use f[k] to update the value at the k-th point on the path, and finally obtain the smoothed path.

[0101] S304. Optimize the camera path by dividing it into three parts: a constant path representing a static camera, a constant velocity path representing translational or moving photography, and a constant acceleration path representing the transition between static and translational cameras. That is, the camera path is divided into constant segments, linear segments, and parabolic segments for optimization, represented by |D(P)|_1, |D^2(P)|_1 and |D^3(P)|_1 respectively, where D is the differential operator and P is the camera path. To obtain an optimal path composed of distinct constant, linear, and parabolic segments rather than their superposition, this invention treats the optimization as a constrained L1 minimization problem. The camera path O(t) of the original video has already been computed (from the feature trajectories) and is described by a parametric linear motion model at each time instance. Specifically, the video is an image sequence I_1, I_2, ..., I_t; each frame pair (I_{t-1}, I_t) is associated with a linear motion model L_t(t), which models the motion of feature points t from I_1 to I_{t-1}. The formula for calculating the camera path O(t) is:

[0102]

[0103] The formula for calculating the optimized camera path is as follows:

[0104] P(t) = O(t) F_t

[0105] Where F_t = O(t)^{-1} P(t) is the update transform, which, when applied to the original camera path O(t), generates the optimal path P(t). This invention refers to this as the stabilizing transform, applying it to each frame to obtain the final stabilized video.

[0106] Treating the optimization as a constrained L1 minimization problem, the final optimization formula for the camera path is defined as follows:

[0107]

[0108] The optimal stable camera path is obtained by minimizing |D(P)|_1, |D^2(P)|_1 and |D^3(P)|_1 respectively.

[0109] In step S304, minimizing |D(P)|_1, |D^2(P)|_1 and |D^3(P)|_1 to obtain the optimal stable camera path involves the following sub-steps:

[0110] S304-1. Minimize |D(P)|1 using forward difference:

[0111]

[0112] Transform O(t) in the formula into its decomposition form:

[0113]

[0114] Given O(t), we seek to minimize this residual, as shown in the following equation:

[0115]

[0116] S304-2. Minimize |D^2(P)|_1 as follows:

[0117]

[0118] The errors are modeled and summed to minimize the residuals, as shown in the following formula:

[0119] |M_{t+1} - M_t| = |L_t(t+2) F_{t+2} - (I + L_t(t+1)) F_{t+1} + F_t|;

[0120] S304-3. Minimize |D^3(P)|_1:

[0121]

[0122] As a preferred embodiment of the digital video stabilization method based on decomposition motion compensation of the present invention, step S4 includes the following sub-steps:

[0123] S401. Based on the smooth path obtained by grid flow prediction adaptive path optimization, obtain the smooth transformation matrix D1 between adjacent frames;

[0124] S402. Based on the smoothed path optimized by similar spatiotemporal paths, the smooth transformation matrix D2 between adjacent frames is obtained.

[0125] S403. Calculate the transformation matrix D3 between adjacent frames based on the difference in the smoothed paths after average filtering.

[0126] S404. The smooth transformation matrix D4 between adjacent frames is obtained by minimizing the optimal stable camera path. Based on the path obtained above, this invention can derive the smooth transformation matrix between adjacent frames. In this part, this invention uses affine transformation to achieve this, and the implementation process is as follows:

[0127] b = H * a^T

[0128]

[0129] H is the transformation matrix used in this invention, which transforms point a in one frame into the corresponding point b in the adjacent frame. x and y represent the motion on the x-axis and y-axis, respectively, while sinθ and cosθ are used to represent the angle of motion.
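A small sketch of such an affine transform is given below. The 2x3 layout [[cosθ, -sinθ, x], [sinθ, cosθ, y]] is an assumption inferred from the description of x, y, sinθ and cosθ, and the helper names are hypothetical.

```python
import math


def affine_matrix(x, y, theta):
    """Assumed 2x3 layout: rotation by theta followed by a shift of (x, y)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, x],
            [s, c, y]]


def apply_affine(H, point):
    """Compute b = H * a^T with a = (px, py, 1) in homogeneous coordinates."""
    px, py = point
    a = (px, py, 1.0)
    return tuple(sum(h * v for h, v in zip(row, a)) for row in H)


# Hypothetical motion: rotate by 90 degrees, then shift by (2, 3).
H = affine_matrix(2.0, 3.0, math.pi / 2)
b = apply_affine(H, (1.0, 0.0))
```

Here the point (1, 0) rotates onto (0, 1) and is then shifted to approximately (2, 4). The four matrices D1 to D4 of step S4 would each be built this way from the motion between adjacent frames on the corresponding smoothed path.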

[0130] Furthermore, as a preferred embodiment of the digital video stabilization method based on decomposition motion compensation of the present invention, step S5 includes the following sub-steps:

[0131] S501. Introduce four weight coefficients: α, β, γ, and λ. Based on the video category obtained in step S2 (severely shaking non-depth video, severely shaking depth video, normally shaking non-depth video, or normally shaking depth video), adjust the four coefficients so that they sum to 1:

[0132] α+β+γ+λ=1

[0133] S502. Since the transformation matrices obtained in step S4 are of the same type, assign weights α, β, γ, and λ to the transformation matrices obtained in step S4 respectively, and add the four matrices together to obtain a new transformation matrix D between adjacent frames, as shown in the following formula:

[0134] D = α*D1 + β*D2 + γ*D3 + λ*D4

[0135] 0 < α, β, γ, λ < 1

[0136] S503. Based on the formula for D and the known cropping rate, stability, and distortion scores, determine the optimal coefficients. Applying the resulting transformation D to the original path yields the final stable camera motion path required by this invention.
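Steps S501 to S503 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: matrices are plain nested lists, `fuse_transforms` and `search_weights` are hypothetical names, and `score` is a caller-supplied placeholder standing in for whatever combination of cropping rate, stability, and distortion scores is used to rank candidate coefficients. The exhaustive search over a coarse grid is likewise an assumption; the text does not say how the optimal coefficients are found.

```python
from itertools import product


def fuse_transforms(mats, weights, tol=1e-9):
    """Return D = alpha*D1 + beta*D2 + gamma*D3 + lambda*D4 (element-wise).

    Enforces the stated constraints: each weight lies strictly between
    0 and 1, and the four weights sum to 1.
    """
    if len(mats) != 4 or len(weights) != 4:
        raise ValueError("expected four matrices and four weights")
    if any(not (0.0 < w < 1.0) for w in weights):
        raise ValueError("each weight must satisfy 0 < w < 1")
    if abs(sum(weights) - 1.0) > tol:
        raise ValueError("weights must sum to 1")
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(w * m[r][c] for w, m in zip(weights, mats))
             for c in range(cols)] for r in range(rows)]


def search_weights(mats, score, n=10):
    """Try every weight tuple (i, j, k, l)/n with positive integer parts
    summing to n, and return the (weights, fused matrix) with the highest score.

    `score` is a hypothetical callable: higher means a better result.
    """
    best = None
    for i, j, k in product(range(1, n), repeat=3):
        l = n - i - j - k
        if l < 1:
            continue
        weights = (i / n, j / n, k / n, l / n)
        fused = fuse_transforms(mats, weights)
        s = score(fused)
        if best is None or s > best[0]:
            best = (s, weights, fused)
    if best is None:
        raise ValueError("n must be at least 4")
    return best[1], best[2]
```

Because all four matrices have the same shape, the fusion is a plain element-wise weighted sum; a different `score` could be supplied for each video category so that each category ends up with its own coefficients.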

[0137] While the present invention has been described above with reference to preferred embodiments, it is not intended to limit the invention. Those skilled in the art can make various modifications and refinements without departing from the spirit and scope of the invention. Therefore, the scope of protection of the present invention shall be determined by the claims.

Claims

1. A digital video stabilization method based on decomposition motion compensation, characterized in that it includes the following steps:
S1. Perform motion estimation on the reference frame and the current frame of the jittery video to obtain the jitter motion path of the jittery video;
S2. Extract the translation and rotation components of each path in the jitter motion path, evaluate the percentage of low-frequency component energy in the translation and rotation components to obtain the stability of the jittery video, then use a depth estimation method to determine the depth of the jittery video, and classify the video based on its stability and depth to obtain the videos of each category;
S3. Use grid flow prediction adaptive path optimization, similar spatiotemporal path optimization, average filtering optimization, and L1 optimal camera path optimization to smooth the jitter motion path and obtain the corresponding path optimization information;
S4. Store the corresponding path optimization information in four warping transformation matrices, respectively;
S5. Based on the videos of each category obtained in step S2, introduce the corresponding weight coefficients into the four warping transformation matrices and then fuse them to obtain the final transformation matrix, which specifically includes the following sub-steps:
S501. Introduce four weight coefficients: α, β, γ, and λ. Adjust these four weight coefficients based on the video category obtained in step S2, the video categories being: non-depth videos with severe shaking, depth videos with severe shaking, non-depth videos with normal shaking, and depth videos with normal shaking. The sum of the four weight coefficients is 1: α+β+γ+λ=1;
S502. Since the transformation matrices obtained in step S4 are of the same type, assign the weights α, β, γ, and λ to the transformation matrices obtained in step S4, respectively, and add the four matrices together to obtain a new transformation matrix D between adjacent frames, as shown in the following formula: D = α*D1 + β*D2 + γ*D3 + λ*D4, where 0 < α, β, γ, λ < 1;
S503. Based on the formula for D and the known cropping rate, stability, and distortion scores, determine the optimal coefficients; applying the resulting transformation D to the original path yields the final stable camera motion path;
S6. Apply each final transformation matrix to each frame of the jittery video sequence through a warping operation to generate a stable video.

2. The digital video stabilization method based on decomposition motion compensation according to claim 1, characterized in that, in step S1, motion estimation on the reference frame and the current frame of the jittery video is performed using a grid-based fast ultra-robust feature matching method, which includes the following sub-steps:
S101. Split the jittery video into a continuous frame sequence;
S102. Divide each frame in the frame sequence into grid regions, and encapsulate motion smoothness as the statistical likelihood of a preset number of feature matches within a region;
S103. Obtain motion information between adjacent frames based on the matched feature point pairs;
S104. Accumulate the motion information between adjacent frames to obtain the jitter motion path of the jittery video.

3. The digital video stabilization method based on decomposition motion compensation according to claim 1, characterized in that, in step S2, the lower of the translation and rotation metrics is used as the final metric when evaluating the percentage of low-frequency component energy.

4. The digital video stabilization method based on decomposition motion compensation according to claim 1, characterized in that step S3 includes the following sub-steps:
S301. The grid flow prediction adaptive path optimization specifically involves: dividing the video frame into a 2D grid; obtaining matched image corner points between consecutive frames and generating a motion vector at each matched position; transferring the motion vectors to their adjacent grid vertices, so that each grid vertex represents the motion in its neighborhood; and then using a downsampling filter to smooth the temporal change of the motion vector at each grid vertex, obtaining the smoothed path after grid flow prediction adaptive path optimization;
S302. The similar spatiotemporal path optimization specifically involves: dividing the video frame into grid cells; defining the concatenation of local homographies at the same grid cell over time as a path bundle; using multiple camera paths to model the camera motion; and using bilateral filtering to smooth the camera paths as a whole, obtaining the smoothed path after similar spatiotemporal path optimization;
S303. Extract the x-axis, y-axis, and angle motion information of the jitter motion path of the jittery video obtained in step S1; then use a moving average filter with a neighborhood window of preset fixed size to smooth the jitter motion path, replacing each value on the path with the average value within its neighborhood window, to obtain the smoothed path after average filtering;
S304. Optimize the camera path by dividing it into three parts: a constant path representing a static camera, a constant-velocity path representing panning or moving shots, and a constant-acceleration path representing the transition between static and panning cameras; that is, the camera path is divided into constant, linear, and parabolic segments for optimization, represented by $D(P)$, $D^2(P)$, and $D^3(P)$, respectively, where $D$ is the differential operator and $P$ denotes the camera path. Treating the optimization as a constrained L1 minimization problem, the final optimization formula for the camera path is defined as follows: [formula]; the optimal stable camera path is obtained by minimizing $|D(P)|_1$, $|D^2(P)|_1$, and $|D^3(P)|_1$.

5. The digital video stabilization method based on decomposition motion compensation according to claim 1, characterized in that step S4 includes the following sub-steps:
S401. Based on the smoothed path obtained by grid flow prediction adaptive path optimization, the smooth transformation matrix D1 between adjacent frames is obtained;
S402. Based on the smoothed path obtained by similar spatiotemporal path optimization, the smooth transformation matrix D2 between adjacent frames is obtained;
S403. Based on the difference in the smoothed path after average filtering, the transformation matrix D3 between adjacent frames is calculated;
S404. Based on the optimal stable camera path obtained by minimization, the smooth transformation matrix D4 between adjacent frames is obtained.

6. The digital video stabilization method based on decomposition motion compensation according to claim 4, characterized in that, in step S301, a downsampling filter is used to smooth the temporal variation of the motion vector at each grid vertex, with the following optimization formula: [formula], which is defined in terms of the camera path at time t. The first constraint encourages the stabilized video to keep a path close to the original camera path, avoiding excessive cropping and distortion; the second constraint enforces temporal smoothness. A temporal smoothing radius is set, the weights within it are set to Gaussian weights, and a balancing coefficient is used to balance the two constraints. The energy function is quadratic, and its minimum is found by a sparse linear solver.

7. The digital video stabilization method based on decomposition motion compensation according to claim 4, characterized in that, in step S302, bilateral filtering is used to smooth the camera paths as a whole, with the following optimization formula: [formula], whose symbols denote the optimized path, the original camera path, and the neighborhood of frame t. Its terms are: a data term, which forces the new camera path to closely approximate the original path to reduce cropping and distortion; a smoothing term, which stabilizes the path; weights, which preserve motion discontinuities during rapid translation / rotation or scene transitions; and a coefficient that balances the data term and the smoothing term.

8. The digital video stabilization method based on decomposition motion compensation according to claim 4, characterized in that step S303 specifically involves: using a moving average filter to replace each value on the path with the average value within its neighborhood window, resulting in a new smooth path. The curve is smoothed by a moving average filter with a window size of 5, as shown below: [formula]. The path curve is stored in an array whose elements are the points on the curve, where k denotes the k-th point on the path curve; the value at the k-th point on the path is updated to obtain the smoothed path.

9. The digital video stabilization method based on decomposition motion compensation according to claim 4, characterized in that, in step S304, starting from the original camera path: a video is known to be a sequence of images, and each pair of adjacent frames is associated with a linear motion model that models the motion of feature points from one frame of the pair to the other. The camera path is calculated as: [formula]. The optimized camera path is calculated as: [formula], where the update transformation, when applied to the original camera path, generates the optimal path; applying it to each frame yields the final stabilized video.
Treating the optimization as a constrained L1 minimization problem, the final optimization formula for the camera path is defined as follows: [formula]. The optimal stable camera path is obtained by minimizing $|D(P)|_1$, $|D^2(P)|_1$, and $|D^3(P)|_1$, which correspond to the first, second, and third derivatives of the path residuals, respectively. The minimization involves the following sub-steps:
S304-1. To minimize $|D(P)|_1$, use forward differencing: [formula]; the expression is transformed into its decomposed form: [formula]; the residual is then minimized as follows: [formula];
S304-2. Minimize $|D^2(P)|_1$ as follows: [formula]; the errors are modeled and summed to minimize the residuals, as shown in the following formula: [formula];
S304-3. Minimize $|D^3(P)|_1$: [formula].