A video jitter detection method, apparatus, and device

By segmenting video frame images and performing displacement vector analysis, combined with motion direction consistency, the misjudgment problem in video jitter detection in existing technologies has been solved, achieving higher accuracy and reliability.

CN115550632B · Active · Publication Date: 2026-06-16 · ALIPAY (HANGZHOU) INFORMATION TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
ALIPAY (HANGZHOU) INFORMATION TECH CO LTD
Filing Date
2022-09-14
Publication Date
2026-06-16

Abstract

The embodiments of this specification disclose a video jitter detection method, apparatus, and device. The scheme comprises: obtaining multiple groups of two adjacent frames from a frame image set contained in a video to be detected; segmenting each of the two adjacent frames into blocks to obtain multiple image blocks; calculating displacement vectors of the image blocks, and identifying abnormal blocks among the image blocks according to the displacement vectors; determining jitter degree characterization values of the two adjacent frames according to the remaining blocks other than the abnormal blocks; determining whether the motion directions corresponding to multiple consecutive frames in the frame image set meet a set consistency condition and, if so, adjusting the jitter degree characterization values of at least some of those frames to reduce the jitter degree they represent; and, after the corresponding processing according to the determination result, determining whether the video to be detected is jittery according to the jitter degree characterization values of the images in the frame image set.

Description

Technical Field

[0001] This specification relates to the field of image processing technology, and in particular to a video jitter detection method, apparatus, and device.

Background Technology

[0002] Video jitter exists in both short video and live streaming businesses, and it directly affects the user's viewing experience. In order to help improve the user experience, it is necessary to detect video jitter more accurately and then process it accordingly to reduce the negative experience caused by video jitter.

[0003] Some current video jitter detection solutions take into account moving objects in the image, but the processing method is relatively crude, the extracted motion features are too simple, and the resulting errors are large.

[0004] Therefore, a more accurate and reliable video jitter detection solution is needed.

Summary of the Invention

[0005] This specification provides one or more embodiments of a video jitter detection method, apparatus, device, and storage medium to address the following technical problem: the need for a more accurate and reliable video jitter detection solution.

[0006] To solve the above-mentioned technical problems, one or more embodiments of this specification are implemented as follows:

[0007] This specification provides a video jitter detection method according to one or more embodiments, including:

[0008] Obtain multiple sets of two adjacent frames from the set of frame images contained in the video to be detected;

[0009] The two adjacent images are each segmented into blocks to obtain multiple image blocks;

[0010] Calculate the displacement vector of the image block, identify abnormal blocks among the plurality of image blocks based on the displacement vector, and determine the jitter characterization value of the two adjacent images based on the remaining blocks other than the abnormal blocks.

[0011] Determine whether the motion directions corresponding to consecutive frames in the frame image set meet the set consistency conditions. If so, adjust the jitter degree characterization value of at least some frames in the frame images to reduce the jitter degree it represents.

[0012] After the corresponding processing is performed according to the determination result, determine whether the video to be detected is jittery based on the jitter degree characterization values of the images in the frame image set.

[0013] This specification provides a video jitter detection device according to one or more embodiments, comprising:

[0014] The image acquisition module acquires multiple sets of two adjacent frames from the set of frame images contained in the video to be detected;

[0015] The image segmentation module segments the two adjacent frames into multiple image blocks.

[0016] The local detection module calculates the displacement vector of the image block, identifies abnormal blocks among the multiple image blocks based on the displacement vector, and determines the jitter characterization value of the two adjacent images based on the remaining blocks other than the abnormal blocks.

[0017] The jitter adjustment module determines whether the motion directions corresponding to multiple consecutive frames in the frame image set meet the set consistency conditions. If so, it adjusts the jitter degree characterization value of at least some of the frames in the multiple frames to reduce the jitter degree it represents.

[0018] The overall detection module, after processing the results of the judgment accordingly, determines whether the video to be detected is jittery based on the jitter degree characterization value of the images in the frame image set.

[0019] This specification provides one or more embodiments of a video jitter detection device, comprising:

[0020] At least one processor; and,

[0021] A memory communicatively connected to the at least one processor; wherein,

[0022] The memory stores instructions executable by the at least one processor, which, when executed by the at least one processor, enable the at least one processor to:

[0023] Obtain multiple sets of two adjacent frames from the set of frame images contained in the video to be detected;

[0024] The two adjacent images are each segmented into blocks to obtain multiple image blocks;

[0025] Calculate the displacement vector of the image block, identify abnormal blocks among the plurality of image blocks based on the displacement vector, and determine the jitter characterization value of the two adjacent images based on the remaining blocks other than the abnormal blocks.

[0026] Determine whether the motion directions corresponding to consecutive frames in the frame image set meet the set consistency conditions. If so, adjust the jitter degree characterization value of at least some frames in the frame images to reduce the jitter degree it represents.

[0027] After the corresponding processing is performed according to the determination result, determine whether the video to be detected is jittery based on the jitter degree characterization values of the images in the frame image set.

[0028] This specification provides one or more embodiments of a non-volatile computer storage medium storing computer-executable instructions, wherein the computer-executable instructions are configured as follows:

[0029] Obtain multiple sets of two adjacent frames from the set of frame images contained in the video to be detected;

[0030] The two adjacent images are each segmented into blocks to obtain multiple image blocks;

[0031] Calculate the displacement vector of the image block, identify abnormal blocks among the plurality of image blocks based on the displacement vector, and determine the jitter characterization value of the two adjacent images based on the remaining blocks other than the abnormal blocks.

[0032] Determine whether the motion directions corresponding to consecutive frames in the frame image set meet the set consistency conditions. If so, adjust the jitter degree characterization value of at least some frames in the frame images to reduce the jitter degree it represents.

[0033] After the corresponding processing is performed according to the determination result, determine whether the video to be detected is jittery based on the jitter degree characterization values of the images in the frame image set.

[0034] At least one of the technical solutions adopted in one or more embodiments of this specification can achieve the following beneficial effects: it focuses not only on the features of moving objects in the image, but also on the features of other local areas, including the image background. Based on multiple image blocks obtained by slicing the images in the video, these features can be extracted more precisely. Moreover, based on the displacement vectors of the image blocks, some abnormal local areas in the image can be excluded, or their negative impact on detection can be reduced. In particular, traditional solutions easily misidentify normal video effects such as lens panning and lens stretching as video jitter; this method can accurately identify such normal effects based on whether the overall motion direction of adjacent frames is consistent, thereby reducing misjudgment and improving the reliability and accuracy of detection.

Description of the Drawings

[0035] To more clearly illustrate the technical solutions in the embodiments or prior art of this specification, the drawings used in the description of the embodiments or prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments recorded in this specification. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0036] Figure 1 is a flowchart of a video jitter detection method provided in one or more embodiments of this specification;

[0037] Figure 2 is a flowchart of a specific implementation of the method in Figure 1, provided in one or more embodiments of this specification;

[0038] Figure 3 is a schematic diagram of a direction statistics scheme provided in one or more embodiments of this specification;

[0039] Figure 4 is a schematic diagram of image similarity comparison provided in one or more embodiments of this specification;

[0040] Figures 5(a) and 5(b) are schematic diagrams of the motion direction corresponding to an image, provided in one or more embodiments of this specification;

[0041] Figure 6 is a schematic diagram of the structure of a video jitter detection device provided in one or more embodiments of this specification;

[0042] Figure 7 is a schematic diagram of the structure of a video jitter detection device provided in one or more embodiments of this specification.

Detailed Implementation

[0043] This specification provides a video jitter detection method, apparatus, device, and storage medium through its embodiments.

[0044] To enable those skilled in the art to better understand the technical solutions in this specification, the technical solutions in the embodiments of this specification will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this specification, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of this application.

[0045] The applicant has tried several methods to detect video jitter. However, these methods are limited by their application environment and can only detect video jitter in specific environments, so they exhibit the problems described in the Background section, as analyzed further below. These methods include the feature point method, the dense optical flow method, and the projection method.

[0046] The feature point method mainly extracts features using methods such as SIFT, SURF, FAST, BRIEF, and ORB, then obtains the transformation matrix between the previous and next frames based on the feature points, and finally determines whether there is jitter. However, the feature points tend to cluster on moving objects while very few fall on the background, which causes a large number of false positives. Moreover, feature points are difficult to extract from some videos, which further increases the difficulty of detection.

[0047] The dense optical flow method involves a large amount of computation, and because videos often contain many moving objects, it can lead to many errors and to misjudgment in the final result.

[0048] The projection method requires less computation than the dense optical flow method, but it is prone to misjudgment when the video contains lens stretching or lens translation: although the entire image is in motion in those cases, the motion should not necessarily be regarded as jitter.

[0049] This application addresses the above-mentioned problems and provides improvements. The following section will further explain the solution provided in this application based on this approach.

[0050] Figure 1 is a flowchart of a video jitter detection method provided in one or more embodiments of this specification. This method can be applied to various business sectors, including short video, live streaming, instant messaging, e-commerce, electronic payment, and gaming. The process can be executed on image processing devices, such as smartphones and live streaming servers. Certain input parameters or intermediate results in the process can be manually adjusted to help improve accuracy.

[0051] The process in Figure 1 includes the following steps:

[0052] S102: Obtain multiple sets of two adjacent frames from the set of frame images contained in the video to be detected.

[0053] In one or more embodiments of this specification, the video to be detected consists of a series of consecutive frames at its frame rate. The set of frame images is a subset or all of the frame images that constitute the video to be detected. For example, a subset of images can be sampled from the video to be detected as the set of frame images, which helps to improve detection efficiency.

[0054] In S102, "adjacent" refers to temporal adjacency; the temporal order of the images in the frame image set is the temporal order in the video to be detected. Following the temporal order of the images in the frame image set, each pair of temporally adjacent frames is sequentially acquired, forming a group of two adjacent frames (e.g., frames 1 and 2 as one group, frames 2 and 3 as another, frames 3 and 4 as yet another, and so on). Two adjacent frames can also be called consecutive frames, where the temporal order of the preceding frame is earlier than that of the following frame. For ease of description, the following mainly uses any one group of two adjacent frames as an example to illustrate the processing procedure; other groups of two adjacent frames can be processed in a similar manner until the entire frame image set has been processed.
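
The grouping described in S102 can be sketched as follows. This is only a minimal illustration: the function name adjacent_pairs is hypothetical, and the frames are assumed to be already sorted in temporal order.

```python
from typing import Iterator, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def adjacent_pairs(frames: Sequence[Frame]) -> Iterator[Tuple[Frame, Frame]]:
    """Yield (previous, next) for every temporally adjacent pair of frames."""
    for i in range(len(frames) - 1):
        yield frames[i], frames[i + 1]


# Frames 1..4 give the groups (1, 2), (2, 3), (3, 4).
groups = list(adjacent_pairs([1, 2, 3, 4]))
```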

[0055] S104: The two adjacent images are each sliced into blocks to obtain multiple image blocks.

[0056] In one or more embodiments of this specification, whether a video is jittery is closely related to the motion in the image. To accurately distinguish between normal motion and motion indirectly caused by jitter, more precise detection of motion in the image is necessary. To better achieve this, each frame of the image is segmented into blocks, each block reflecting the local situation of its respective frame. Local analysis is performed first, followed by combining multiple local analyses for global analysis. This approach helps to obtain more accurate global analysis results, eliminates anomalies early on to prevent certain local conditions from misleading the final detection result, and improves subsequent processing efficiency.

[0057] There are various ways to segment the image into blocks. For example, the targets in each frame can first be segmented and classified, and the image then cut into blocks accordingly, or the image can be cut into blocks after classifying the components of a target. The advantage of this approach is that the resulting image blocks themselves have a clearer business meaning, which helps to differentiate and target the subsequent processing. Of course, this approach also requires more computational power. When efficiency is a priority, a simpler segmentation method can be used, such as dividing each frame into image blocks of m rows and n columns (denoted as m*n) according to a matrix partitioning method, and setting the values of m and n as needed, for example, setting both m and n to 4, or both to 5, and so on.

[0058] S106: Calculate the displacement vector of the image block, identify abnormal blocks among the plurality of image blocks based on the displacement vector, and determine the jitter characterization values of the two adjacent frames based on the remaining blocks other than the abnormal blocks.
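
The simpler matrix partitioning into m*n blocks might look like the following sketch. It assumes a frame is held as a NumPy array of shape (height, width) or (height, width, channels); the function name split_into_blocks and the handling of sizes that do not divide evenly are illustrative choices, not taken from the specification.

```python
import numpy as np


def split_into_blocks(frame: np.ndarray, m: int, n: int) -> list:
    """Split a frame into m rows by n columns of blocks, in row-major order."""
    h, w = frame.shape[:2]
    # Integer edges; blocks may differ by one pixel when h or w is not divisible.
    row_edges = np.linspace(0, h, m + 1, dtype=int)
    col_edges = np.linspace(0, w, n + 1, dtype=int)
    blocks = []
    for r in range(m):
        for c in range(n):
            blocks.append(frame[row_edges[r]:row_edges[r + 1],
                                col_edges[c]:col_edges[c + 1]])
    return blocks


frame = np.arange(64).reshape(8, 8)
blocks = split_into_blocks(frame, 4, 4)  # 16 blocks of 2 x 2 pixels
```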

[0059] In one or more embodiments of this specification, the displacement vector of each image block is represented by vector components in multiple specified directions, such as components in the horizontal direction (x-axis direction) and the vertical direction (y-axis direction). If, in a specific business scenario, particular attention is paid to certain special directions (for example, in a ball game video, the angle of the ball may be of interest, so a 45-degree direction or other directions that can represent the elevation angle of the ball when it takes off may be specified), the displacement vector can also be represented based on the components in that direction.

[0060] In one or more embodiments of this specification, it is assumed that an image block (or part of it) in the previous frame image has moved a certain displacement in a specified direction. The correlation between the image block (or part of it) in the subsequent frame image corresponding to that displacement and the content in the previous frame is then verified. A higher correlation indicates that the assumed displacement is more consistent with reality. This approach is followed multiple times to obtain the more likely displacements in each specified direction, and these displacements are then used as components to represent the displacement vector of the image block.
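
One way to read this trial-and-verify search is as exhaustive block matching. The sketch below is an assumption-laden illustration: it works on grayscale NumPy arrays, tries every integer shift within a hypothetical max_shift, and uses mean squared error as a stand-in for the correlation measure (lower error is treated as higher correlation). The specification does not state which measure or search strategy is used.

```python
import numpy as np


def estimate_displacement(prev_frame, next_frame, top, left, size, max_shift=4):
    """Return the (dx, dy) shift that best maps a block of prev_frame into next_frame."""
    ref = prev_frame[top:top + size, left:left + size].astype(np.float64)
    h, w = next_frame.shape[:2]
    best, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > h or x + size > w:
                continue  # assumed shift would leave the image
            cand = next_frame[y:y + size, x:x + size].astype(np.float64)
            err = np.mean((ref - cand) ** 2)  # low error ~ high correlation
            if err < best_err:
                best, best_err = (dx, dy), err
    return best


rng = np.random.default_rng(0)
prev_frame = rng.random((32, 32))
# Synthetic next frame: content moved 2 pixels down and 3 pixels right.
next_frame = np.roll(prev_frame, shift=(2, 3), axis=(0, 1))
vector = estimate_displacement(prev_frame, next_frame, top=10, left=10, size=8)
```

Swapping the arguments (next frame first, previous frame second) gives the displacement in the opposite temporal direction with the same function.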

[0061] In one or more embodiments of this specification, the impact of local regions on the jitter detection result is measured based on image blocks. Multiple local regions are then considered together to obtain the impact of the whole frame on the jitter detection result. In practical applications, some local regions interfere with the overall result and distort the measurement. Therefore, this interference is first excluded completely or reduced to some extent by treating such image blocks as abnormal blocks. The impact of the entire frame on the jitter detection result is then measured mainly based on the other image blocks (i.e., the remaining blocks). A jitter degree characterization value can be predefined to represent this impact. The more frames in the video to be detected that have a high jitter degree characterization value, the higher the probability that the video is jittery. This is only a rough description; the actual relationship need not be linear and may be a more complex positive correlation.

[0062] S108: Determine whether the motion directions corresponding to the consecutive multiple frames in the frame image set meet the set consistency conditions. If so, adjust the jitter degree characterization value of at least some of the frames in the multiple frames to reduce the jitter degree it represents.

[0063] In some methods, including projection, false positives are easily generated when dealing with lens stretching and panning in video (especially in the case of uniform motion). This application argues that these two scenarios are characterized by a consistent direction of motion across multiple consecutive frames, even though the corresponding video frame is moving. Furthermore, in the case of uniform motion, not only the direction of motion but also the length of the displacement remains consistent. Therefore, based on these phenomena, the system can determine whether the motion in the current video is caused by these two scenarios, thus avoiding false positives for jitter.

[0064] In one or more embodiments of this specification, the jitter level characterization value of the multi-frame image in S108 can be calculated according to the preceding steps. For the abnormal blocks in S106, they can be directly excluded, or they can be partially retained, and their negative impact can be reduced by adjusting their jitter level characterization value (if reduced to a sufficiently low value, it may even achieve an effect basically the same as direct exclusion); these two processing methods can be flexibly selected or combined according to actual needs.

[0065] S110: After processing the result of the judgment accordingly, determine whether the video to be detected is jittering based on the jitter degree characterization value of the images in the frame image set.

[0066] In one or more embodiments of this specification, the jitter degree characterization values of these images are accumulated or weighted, and the accumulated value is used to characterize the jitter of the entire video to be detected. Whether the video to be detected is jittery is then determined by comparing the accumulated value with a set threshold. The threshold can be set according to parameters such as the jitter degree characterization values, the block segmentation, and the frame image set, so that it can be adjusted adaptively.

[0067] The method of Figure 1 focuses not only on the features of moving objects in the image, but also on the features of other local regions, including the image background. Based on multiple image blocks obtained by slicing the video images, it extracts these features more precisely. Moreover, based on the displacement vectors of the image blocks, it can exclude or reduce the negative impact that some abnormal local regions in the image may have on detection. It is particularly effective against the common problem of traditional methods misidentifying normal video effects such as lens panning and lens stretching as video jitter: it identifies such normal effects based on whether the overall motion direction of adjacent frames is consistent, reducing misjudgments and improving the reliability and accuracy of detection.

[0068] Based on the method of Figure 1, this specification also provides some specific implementation schemes and extension schemes of the method, which are explained further below.

[0069] In one or more embodiments of this specification, the forward temporal order runs from the previous frame to the next frame, and the displacement that actually occurs in the image follows this order. In actual calculations, however, the displacement is not observed directly but is found by searching for an optimal solution over multiple attempts. Errors may therefore exist, and the search may get stuck in a local optimum. To improve reliability, the displacement is checked in both the forward and reverse directions: by comparing the forward and reverse displacement vectors of corresponding image blocks in two adjacent images, the reliability of the current displacement calculation is determined. If the displacement calculation of an image block is reliable, its forward and reverse displacement vectors should theoretically be opposites, and their sum should be zero.

[0070] Based on this approach, for each image block the forward displacement vector from the previous frame to the next frame and the reverse displacement vector from the next frame to the previous frame are calculated. Whether the image block is an abnormal block is determined by the degree of difference between the forward and reverse displacement vectors. If the sum of the two is zero or within a predetermined threshold of zero (i.e., their magnitudes are essentially the same and their directions opposite), the image block can be considered normal; otherwise it is an abnormal block, and at least one of the forward and reverse displacement vectors has been calculated unreliably. For image blocks initially identified as abnormal, the displacement vectors can be recalculated; if the recalculation yields reliable results, the image block can be considered normal.
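
The forward and reverse check can be expressed compactly. In the sketch below, the function name is_abnormal_fb and the tolerance tol (in pixels) are hypothetical; the specification only says that the sum of the two vectors should be zero or close to it.

```python
import numpy as np


def is_abnormal_fb(forward, backward, tol=1.0):
    """Flag a block whose forward and reverse displacement vectors do not cancel."""
    residual = np.asarray(forward, dtype=float) + np.asarray(backward, dtype=float)
    return bool(np.linalg.norm(residual) > tol)


reliable = is_abnormal_fb((3, 2), (-3, -2))   # vectors cancel: normal block
unreliable = is_abnormal_fb((3, 2), (1, 0))   # residual (4, 2): abnormal block
```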

[0071] In one or more embodiments of this specification, another type of abnormal situation arises in practical applications: the entire frame or part of it appears as a black screen, a white screen, or a frozen picture (stutter). In such cases, the displayed content is likely to be meaningless and will interfere with the detection results. This situation is identified based on the mean and variance. Specifically, the mean and variance (or standard deviation) of the sum of squares of the displacement vectors of the image blocks in the same frame are calculated, for example using the sum of squares itself or its square root. If both the mean and variance are zero (a tolerance can be allowed for errors, for example both being close to zero), each image block is identified as an abnormal block, or the jitter degree characterization value of that frame is reduced.
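
A minimal sketch of this mean and variance test follows, assuming each block's displacement vector is given as an (x, y) pair; the function name is_static_frame and the tolerance eps are illustrative assumptions.

```python
import numpy as np


def is_static_frame(vectors, eps=1e-3):
    """True when the per-block squared displacement has near-zero mean and variance."""
    v = np.asarray(vectors, dtype=float)
    sq = np.sum(v ** 2, axis=1)  # sum of squares of each block's components
    return bool(sq.mean() <= eps and sq.var() <= eps)


black_screen = is_static_frame(np.zeros((16, 2)))       # no block moves
normal_frame = is_static_frame([[3.0, 0.0], [0.0, 0.0]])  # one block moves
```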

[0072] In one or more embodiments of this specification, another situation that may mislead detection results is also considered: in a frame, only a few image blocks experience relatively intense motion. Although this motion is not caused by jitter, it can affect the detection of the entire frame. Such a small number of image blocks that change drastically relative to the entire frame are therefore excluded. To do so, the sum of squares of the displacement vector of each of the image blocks contained in a frame is calculated (if some abnormal blocks have already been identified, they are excluded first). It is then determined whether the dispersion of these sums of squares is greater than a set threshold. If so, at least some of the image blocks that cause the dispersion to increase are identified as abnormal blocks, so that the dispersion of the remaining blocks is no longer greater than the set threshold. The dispersion is represented by the standard deviation or variance, and can also be normalized by the mean to make the set threshold more universal. For example, the dispersion can be positively correlated with the standard deviation or variance of the sums of squares of the displacement vectors and negatively correlated with their mean.
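
The exclusion loop might be sketched as below. Several choices here are assumptions rather than the patented formula: the dispersion is taken as the standard deviation divided by the mean (one quantity that rises with the standard deviation and falls with the mean), the block farthest from the mean is removed first, and the threshold of 1.0 is arbitrary.

```python
import numpy as np


def dispersion(sq):
    """Standard deviation normalized by the mean (assumed form of the dispersion)."""
    sq = np.asarray(sq, dtype=float)
    return float(sq.std() / (sq.mean() + 1e-9))


def exclude_outlier_blocks(vectors, threshold=1.0):
    """Return indices of the remaining blocks after removing dispersion outliers."""
    v = np.asarray(vectors, dtype=float)
    sq = np.sum(v ** 2, axis=1)
    keep = list(range(len(sq)))
    while len(keep) > 1 and dispersion(sq[keep]) > threshold:
        sub = sq[keep]
        worst = int(np.argmax(np.abs(sub - sub.mean())))  # farthest from the mean
        keep.pop(worst)
    return keep


# Fifteen blocks drift slightly; one block moves violently.
vectors = [[1.0, 0.0]] * 15 + [[10.0, 0.0]]
remaining = exclude_outlier_blocks(vectors)
```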

[0073] In one or more embodiments of this specification, after excluding abnormal blocks that cause the aforementioned abnormally large dispersion, the dispersion itself can reasonably reflect the jitter level of the corresponding image. Therefore, the jitter level characterization value can be calculated based on the dispersion. For example, for all remaining blocks in any frame of two adjacent images, the jitter level characterization value of any frame is calculated based on the number of remaining blocks (if there are any unmoved remaining blocks, they can be ignored or directly excluded) and the dispersion of the sum of squares of the displacement vectors.

[0074] In one or more embodiments of this specification, besides the case where a small number of local areas of a frame exhibit violent motion, detection results can also be misled when the overall motion within a frame is violent and chaotic (this is often not due to jitter but reflects what is actually being filmed, such as a close-up shot of a wok while cooking). This application introduces entropy to describe how chaotic the motion within an image is. For all remaining blocks in either frame of two adjacent frames, the motion direction corresponding to each remaining block is counted, and the entropy value of that frame is calculated from these motion directions. The entropy value reflects the degree of chaos of the motion directions within the frame. If the entropy value exceeds a set threshold, the motion directions in that frame are considered too chaotic, and the jitter degree represented by the jitter degree characterization value of that frame is reduced accordingly. For example, the velocity directions of the image blocks contained in an image are counted, the number of blocks in each direction is summed and converted into the proportion of image blocks in each direction, and the entropy value of the image is calculated from these proportions. Excluding excessively large entropy values helps prevent the motion of a large object in a complex moving shot from being misidentified as jitter.

[0075] In practical applications, a video often consists of multiple shots joined by transitions rather than a single, continuous long take. Significant changes occur between frames during these transitions, which easily leads to misidentification as jitter. This application identifies suspected scene transitions based on whether the current jitter degree characterization value changes drastically (e.g., a sharp increase). Then, by comparing the similarity of the preceding and following frames, it determines whether a scene transition has indeed occurred and takes appropriate action. Furthermore, because the interval between frames is often very short, which results in a large number of comparisons, and because most images are highly similar, dhash values are used for comparison to improve efficiency and accuracy.
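
The entropy calculation can be illustrated with a Shannon entropy over direction bins. The number of bins (8), the exclusion of unmoved blocks, and the function name direction_entropy are assumptions; the specification's own direction statistics scheme (Figure 3) may divide directions differently.

```python
import numpy as np


def direction_entropy(vectors, bins=8):
    """Shannon entropy (in bits) of block motion directions quantized into bins."""
    v = np.asarray(vectors, dtype=float)
    moving = v[np.any(v != 0, axis=1)]  # unmoved blocks have no direction
    if len(moving) == 0:
        return 0.0
    angles = np.arctan2(moving[:, 1], moving[:, 0])  # range [-pi, pi]
    idx = np.floor((angles + np.pi) / (2 * np.pi) * bins).astype(int) % bins
    counts = np.bincount(idx, minlength=bins)
    p = counts[counts > 0] / counts.sum()  # proportion of blocks per direction
    return float(-np.sum(p * np.log2(p)))


pan_entropy = direction_entropy([[1.0, 0.0]] * 10)  # every block moves right
chaos_entropy = direction_entropy([[1, 0], [0, 1], [-1, 0], [0, -1]])
```

With every block moving the same way the entropy is zero, and it grows as the directions spread across more bins, so a threshold on this value separates orderly motion from chaotic motion.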

[0076] Specifically, after determining the jitter degree characterization values for two adjacent image frames, a screen switching reference threshold is determined based on the number of image blocks. It is then determined whether the jitter degree characterization value is greater than the screen switching reference threshold. If so, the similarity between the two adjacent image frames is calculated. If the similarity is less than a set threshold, the jitter degree characterization value is adjusted to reduce the jitter degree it represents. When calculating similarity, if the time interval between the comparison objects is very short (e.g., if the two adjacent image frames are two consecutive frames in the video to be detected), the dhash values of the two adjacent image frames can be calculated, and the similarity between the two frames is then calculated from the distance between their dhash values (e.g., the Hamming distance).
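
A difference hash (dhash) and a Hamming-distance similarity can be sketched as follows. This is a simplified illustration on grayscale NumPy arrays: it downsamples by picking evenly spaced pixels instead of a proper image resize, and the function names and the 8 x 8 hash size are assumptions.

```python
import numpy as np


def dhash(gray, hash_size=8):
    """Difference hash: compare horizontally adjacent pixels of a downsampled image."""
    g = np.asarray(gray, dtype=float)
    h, w = g.shape
    rows = np.arange(hash_size) * h // hash_size
    cols = np.arange(hash_size + 1) * w // (hash_size + 1)
    small = g[np.ix_(rows, cols)]  # hash_size x (hash_size + 1) samples
    return (small[:, 1:] > small[:, :-1]).flatten()


def hamming_similarity(a, b):
    """1.0 for identical hashes, 0.0 when every bit differs."""
    return 1.0 - float(np.count_nonzero(a != b)) / a.size


rng = np.random.default_rng(1)
img = rng.random((64, 72))
same = hamming_similarity(dhash(img), dhash(img))
inverted = hamming_similarity(dhash(img), dhash(1.0 - img))
```

A low similarity between two adjacent frames would then be taken as evidence of a scene transition rather than jitter.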

[0077] In one or more embodiments of this specification, for multiple consecutive frames in a frame image set, the sum of displacement vectors corresponding to the previous frame and the sum of displacement vectors corresponding to the next frame are calculated from the displacement vectors of the remaining blocks in the previous frame and in the next frame (either component by component, or after combining the components). The product of these two displacement vector sums is used to determine whether the motion directions corresponding to the previous and next frames meet a set consistency condition. This process is repeated iteratively to determine whether the motion directions corresponding to the multiple consecutive frames meet the set consistency condition. If the motion directions of the two frames are completely identical (in which case the product, when normalized by the lengths of the two sums, is 1), the consistency condition is best met. In practical applications, however, the determination can be more lenient. For example, a product of the two displacement vector sums that is greater than or equal to 0 indicates that the angle between the two directions is not obtuse, meaning the directions are roughly consistent, and this can also be considered to meet the consistency condition. The strictness of the consistency condition can be controlled according to actual needs.
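
The consistency test can be sketched with a normalized dot product (the cosine of the angle between the two displacement vector sums). The function names and the parameter min_cos are hypothetical: min_cos=0.0 corresponds to the lenient "not obtuse" condition, and values closer to 1 make the condition stricter.

```python
import numpy as np


def directions_consistent(prev_vectors, next_vectors, min_cos=0.0):
    """Compare the summed displacement of two frames by the cosine of their angle."""
    s_prev = np.sum(np.asarray(prev_vectors, dtype=float), axis=0)
    s_next = np.sum(np.asarray(next_vectors, dtype=float), axis=0)
    norm = np.linalg.norm(s_prev) * np.linalg.norm(s_next)
    if norm == 0:
        return False  # no overall motion, so no direction to compare
    return bool(np.dot(s_prev, s_next) / norm >= min_cos)


def run_is_consistent(per_frame_vectors, min_cos=0.0):
    """Apply the pairwise check to every adjacent pair in a run of frames."""
    pairs = zip(per_frame_vectors, per_frame_vectors[1:])
    return len(per_frame_vectors) >= 2 and all(
        directions_consistent(a, b, min_cos) for a, b in pairs)


pan = [[[2.0, 0.0], [2.0, 0.0]]] * 4               # steady rightward motion
shake = [[[2.0, 0.0]], [[-2.0, 0.0]], [[2.0, 0.0]]]  # direction flips each frame
pan_ok = run_is_consistent(pan)
shake_ok = run_is_consistent(shake)
```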

[0078] Furthermore, if the motion directions of multiple consecutive frames in a frame image set meet the set consistency conditions, as explained above, this is likely a normal phenomenon rather than jitter. Therefore, it is unreasonable to allow these frames to cause the overall jitter level to continuously accumulate and expand according to the number of images. A small number of representative frames can be selected to contribute to the potential increase in jitter level, while ignoring other frames. This prevents unreasonable expansion while ensuring that these frames as a whole still contribute, since this situation may still be considered jitter.

[0079] Representative frames, such as the first or last frame, are used. Based on this, for example, the jitter level of the first and / or last frame in a multi-frame image is retained, while the jitter level of other intermediate frames is adjusted to minimize the jitter level.
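As a non-limiting illustration, retaining the first and last frames of a direction-consistent run while suppressing the intermediate frames can be sketched in Python as follows. The function name `suppress_run` and its index convention are hypothetical; setting the intermediate values to 0 is one way of minimizing the jitter level they represent.

```python
from typing import List


def suppress_run(t_values: List[float], start: int, end: int) -> List[float]:
    """Return a copy of t_values with the frames strictly between
    start and end (both inclusive indices of the run) set to 0."""
    out = list(t_values)
    for i in range(start + 1, end):
        out[i] = 0.0
    return out
```

For a run covering indices 1 to 4, only the values at indices 2 and 3 are zeroed, so the run as a whole still contributes through its two end frames.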

[0080] Based on the above description, and for a more intuitive understanding, one or more embodiments of this specification also provide a flowchart of a specific implementation of the method of Figure 1, as shown in Figure 2. For ease of understanding, this process employs some exemplary alternatives, calculation formulas, and relevant thresholds, which have yielded good results in actual testing.

[0081] The process in Figure 2 includes the following steps:

[0082] Step 1: Take a video segment to be tested and treat each two-second interval of it as the object of analysis, referred to as a sub-video. Extract 20 frames per second, denoted as chou_zhen=20. If the frame rate of the video is less than 20, extract frames at the video's frame rate instead. Obtain the time interval between adjacent extracted images, denoted as dis_time=1 / chou_zhen.
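As a non-limiting illustration, the sampling parameters of step 1 can be sketched in Python as follows. The function name `sampling_plan` and the returned frame count per sub-video are assumptions made for this sketch; only chou_zhen and dis_time are named in this specification.

```python
def sampling_plan(video_fps: float, target_rate: int = 20,
                  window_seconds: float = 2.0):
    """Return (chou_zhen, dis_time, frames_per_window).

    chou_zhen is capped by the video's own frame rate, because no more
    frames can be extracted per second than the video contains.
    """
    chou_zhen = min(target_rate, video_fps)
    dis_time = 1.0 / chou_zhen
    frames_per_window = int(round(chou_zhen * window_seconds))
    return chou_zhen, dis_time, frames_per_window
```

With a 30 fps video this gives 20 extracted frames per second and 40 frames per two-second sub-video, matching the example in step 19.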

[0083] Step 2: Let sum_t = []. sum_t is used to store the t values in the sub-video that meet the requirements. The t values are the jitter level representation values mentioned above.

[0084] Step 3: Cut two adjacent image frames into blocks, for example, into 5*5 or 4*4 image blocks respectively.
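As a non-limiting illustration, cutting one grayscale image into a grid of blocks can be sketched in Python as follows. The image is assumed to be a list of pixel rows, and the names `split_into_blocks` and `grid_size` are hypothetical.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale pixel values


def split_into_blocks(image: Image, grid_size: int = 5) -> List[Image]:
    """Split an image into grid_size x grid_size blocks in row-major order.

    Block edges are computed with integer division so that any leftover
    pixels are spread across the blocks instead of being dropped.
    """
    height, width = len(image), len(image[0])
    row_edges = [height * i // grid_size for i in range(grid_size + 1)]
    col_edges = [width * j // grid_size for j in range(grid_size + 1)]
    blocks = []
    for i in range(grid_size):
        for j in range(grid_size):
            block = [row[col_edges[j]:col_edges[j + 1]]
                     for row in image[row_edges[i]:row_edges[i + 1]]]
            blocks.append(block)
    return blocks
```

Both adjacent frames would be split with the same `grid_size`, so that the block at a given index covers the same region in each frame.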

[0085] Step 4: For each image block, calculate the displacement vector between frame L-1 and frame L, and the sum of squares of that displacement vector.

[0086] Displacement vector acquisition:

[0087] (1) Calculate the horizontal projection and the vertical projection of the image:

[0088] ;

[0089] ;

[0090] Where m represents the number of columns in the image, n represents the number of rows in the image, and L represents the Lth frame of the image.

[0091] (2) Calculate the mean of the horizontal projection and the mean of the vertical projection:

[0092] ;

[0093] .

[0094] (3) Obtain the horizontal projection offset value and the vertical projection offset value:

[0095] ;

[0096] .

[0097] (4) Calculate the horizontal and vertical displacements from the horizontal and vertical projection offset values. The two calculations are the same, so the horizontal case is given as an example. First, calculate the horizontal correlation function:

[0098] ;

[0099] Where cur represents the current frame image, pre represents the previous frame image, and m is a configurable value that represents the search range, with a recommended value of, for example, 16.

[0100] (5) Within the search range, find the position at which the correlation function reaches its minimum value.

[0101] (6) Obtain the horizontal displacement from the position of that minimum.

[0102] (7) Calculate the sum of squares of the displacement vector and its included angle:

[0103] ;

[0104] ;

[0105] (8) Obtain the forward (positive) displacement vector, from the previous frame to the next frame, and the reverse displacement vector, from the next frame to the previous frame. The forward displacement vector is the one mainly needed in the subsequent steps:

[0106] ;

[0107] .
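As a non-limiting illustration, a projection-based displacement estimate of the kind outlined in items (1) to (7) can be sketched in Python as follows. The exact formulas of this specification are not reproduced here: the sketch assumes a common variant in which the mean-removed projections of the two frames are compared over their overlapping part, and the cost is the mean squared difference. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Block = Sequence[Sequence[float]]  # rows of grayscale pixel values


def column_projection(block: Block) -> List[float]:
    """Sum each column over all rows (profile along the horizontal axis)."""
    return [sum(col) for col in zip(*block)]


def row_projection(block: Block) -> List[float]:
    """Sum each row over all columns (profile along the vertical axis)."""
    return [sum(row) for row in block]


def centered(projection: Sequence[float]) -> List[float]:
    """Subtract the mean, giving the projection offset values."""
    mean = sum(projection) / len(projection)
    return [p - mean for p in projection]


def best_shift(cur: Sequence[float], pre: Sequence[float], search: int) -> int:
    """Return the shift s in [-search, search] that minimizes the mean
    squared difference between cur[i + s] and pre[i] over the overlap."""
    n = len(cur)
    best, best_cost = 0, float("inf")
    for s in range(-search, search + 1):
        lo, hi = max(0, -s), min(n, n - s)
        if hi <= lo:
            continue  # no overlap at this shift
        cost = sum((cur[i + s] - pre[i]) ** 2 for i in range(lo, hi)) / (hi - lo)
        if cost < best_cost:
            best, best_cost = s, cost
    return best


def block_displacement(pre_block: Block, cur_block: Block,
                       search: int = 16) -> Tuple[int, int, int]:
    """Return (dx, dy, dx*dx + dy*dy) from the previous to the current block."""
    dx = best_shift(centered(column_projection(cur_block)),
                    centered(column_projection(pre_block)), search)
    dy = best_shift(centered(row_projection(cur_block)),
                    centered(row_projection(pre_block)), search)
    return dx, dy, dx * dx + dy * dy
```

Calling `block_displacement(cur_block, pre_block)` with the arguments swapped gives the reverse displacement vector of item (8) under the same assumptions.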

[0108] Step 5: Determine whether the displacement vector of each image block is reliable and meaningful. Add the forward and reverse displacement vectors of the image block and check whether the sum is less than a set threshold; for a reliably estimated block the two vectors should largely cancel. If the sum is greater than the threshold, discard the block; otherwise, proceed to the next step.

[0109] ;

[0110] .
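As a non-limiting illustration, the reliability check of step 5 can be sketched in Python as follows. The per-component comparison and the default threshold of 2 are assumptions made for this sketch, and the function names are hypothetical.

```python
from typing import List, Tuple

Vector = Tuple[float, float]


def is_reliable(forward: Vector, reverse: Vector, threshold: float = 2.0) -> bool:
    """A block is kept when the forward and reverse displacement vectors
    nearly cancel along both axes."""
    residual_x = forward[0] + reverse[0]
    residual_y = forward[1] + reverse[1]
    return abs(residual_x) <= threshold and abs(residual_y) <= threshold


def keep_reliable(pairs: List[Tuple[Vector, Vector]],
                  threshold: float = 2.0) -> List[Vector]:
    """Return the forward vectors of the blocks that pass the check."""
    return [fwd for fwd, rev in pairs if is_reliable(fwd, rev, threshold)]
```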

[0111] Step 6: Determine whether the mean and variance of the sum of squares of the displacement vectors are both 0, or whether the total number of remaining blocks in the frame is less than a set threshold. If so, the t value corresponding to the Lth frame image is set to 0 and the process jumps to step 15; otherwise, proceed to the next step. Here, the condition that the mean and variance are both 0 identifies a black screen, a white screen, or stuttering. The threshold for the number of remaining blocks is, for example, 6.

[0112] Step 7: Delete image blocks with abnormal jitter based on the mean and variance of the sum of squares of the displacement vectors.

[0113] Steps to delete a block:

[0114] Determine whether the ratio of the standard deviation (or variance) of the sum of squares of the displacement vectors to their mean is greater than 1. If it is, remove the image block whose sum of squares has the largest absolute distance from the mean, and repeat until the ratio is less than 1.
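As a non-limiting illustration, the iterative removal of abnormal blocks can be sketched in Python as follows. The sketch assumes that the dispersion is the ratio of the population standard deviation to the mean; the function name and the `max_ratio` parameter are hypothetical.

```python
import statistics
from typing import List, Sequence


def remove_abnormal_blocks(values: Sequence[float],
                           max_ratio: float = 1.0) -> List[float]:
    """Drop the value farthest from the mean until std / mean <= max_ratio.

    values holds one sum of squares per block, so every entry is >= 0.
    """
    remaining = list(values)
    while len(remaining) > 1:
        mean = statistics.fmean(remaining)
        if mean == 0:
            break  # all values are zero, nothing to compare against
        if statistics.pstdev(remaining) / mean <= max_ratio:
            break
        worst = max(range(len(remaining)),
                    key=lambda i: abs(remaining[i] - mean))
        del remaining[worst]
    return remaining
```

A single block with a very large displacement is removed first, because it dominates both the standard deviation and the distance from the mean.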

[0115] Step 8: Determine again whether the mean and variance are both 0. If so, set the t value to 0 and jump to step 15.

[0116] Step 9: The check in step 8 prevents abnormally moving blocks in the image from causing a still image to be misjudged. Otherwise, proceed to the next step.

[0117] Step 10: Calculate the entropy value of the remaining block.

[0118] The displacement vectors obtained in step 4 are counted by direction in 8 directions; see Figure 3, which is a schematic diagram of a direction statistics scheme provided in one or more embodiments of this specification. In Figure 3, directions are divided into 0 degrees, 45 degrees, 90 degrees, 135 degrees, 180 degrees, 225 degrees, 270 degrees, and 315 degrees, which yields a count for each direction. The counts are then normalized to obtain the proportion of blocks in each direction.

[0119] The entropy value is then calculated, for example, from these normalized proportions.
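As a non-limiting illustration, the direction statistics and entropy of step 10 can be sketched in Python as follows. The sketch assumes Shannon entropy with the natural logarithm, assigns each vector to the nearest of the 8 directions, and skips zero vectors because they have no direction. Because the logarithm base used in this specification is not stated here, the values of this sketch are not necessarily comparable with the 0.43 threshold of step 11. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def direction_bin(dx: float, dy: float, bins: int = 8) -> int:
    """Map a displacement to the nearest of `bins` evenly spaced directions
    (bin 0 is 0 degrees, bin 1 is 45 degrees, and so on for 8 bins)."""
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    width = 360.0 / bins
    return int((angle + width / 2) // width) % bins


def direction_entropy(vectors: Sequence[Tuple[float, float]],
                      bins: int = 8) -> float:
    """Shannon entropy (natural log) of the normalized direction counts."""
    counts = [0] * bins
    for dx, dy in vectors:
        if dx == 0 and dy == 0:
            continue  # a zero vector has no direction
        counts[direction_bin(dx, dy, bins)] += 1
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c)
```

When all blocks move in one direction the entropy is 0, and it grows as the directions spread over more bins.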

[0120] Step 11: If the entropy value is greater than 0.43 (indicating that the motion directions of the image content are too chaotic; this check avoids misidentifying as jitter a shot in which objects with complex motion occupy a large proportion of the frame), set the t value to 0 and proceed to step 18; otherwise, proceed to the next step.

[0121] Step 12: Calculate the t value of the image from the remaining blocks.

[0122] (1) Sum the horizontal and the vertical displacement vectors of the remaining blocks separately to obtain the horizontal and vertical displacement sums of the entire image:

[0123] ;

[0124] .

[0125] (2) Calculate the mean and variance of the sum of squares (or the square root of the sum of squares) of the remaining displacement vectors, and the number of remaining blocks. .

[0126] Solving for:

[0127] ;

[0128] express single The number of.

[0129] Step 13: Determine whether the t value is greater than the screen switching reference threshold, which is derived from the number of rows and columns in the segmentation; for example, for a 5x5 segmentation, that number is 5. A t value this large appears when the video switches between scenes. Therefore, when such a sharp increase occurs, the two frames are compared based on their dhash values. If no sharp increase occurs, proceed to step 15.

[0130] Specific judgment process:

[0131] (1) First, scale the image, for example, to a size of 9*8 (the number here represents the number of pixels).

[0132] (2) Calculate the difference values to obtain the hash value. Compare the left and right pixels of each adjacent pair in every row: if the pixel value on the left is greater than the pixel value on the right, record 1; otherwise, record 0. Since there are 9 pixels in each row, 8 values are obtained per row, and there are 8 rows in total, so the resulting hash value is a 0-1 sequence of length 64.

[0133] (3) Compare the dhash values of the two image frames using the Hamming distance to obtain the similarity.

[0134] For continuous video, since the images are taken at very short intervals, the continuity of the images is very strong and the similarity of the images is very high. The dhash judgment method is more in line with the usage requirements, and it is more than 100 times faster than the feature point judgment method.

[0135] Calculate the similarity ;

[0136] Here, the distance between the two dhash values is taken as the Hamming distance.
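As a non-limiting illustration, the dhash comparison can be sketched in Python as follows. The input is assumed to be a grayscale image already scaled to 8 rows of 9 pixels. Mapping the Hamming distance d to a similarity of 1 - d / 64 is an assumption made for this sketch; it is consistent with similarity values that are multiples of 1 / 64, such as 0.8125 = 52 / 64. The function names are hypothetical.

```python
from typing import List, Sequence

Pixels = Sequence[Sequence[int]]  # 8 rows x 9 columns of grayscale values


def dhash_bits(pixels: Pixels) -> List[int]:
    """Record 1 where the left pixel is greater than its right neighbour,
    otherwise 0, giving 8 bits per row and 64 bits in total."""
    return [1 if row[c] > row[c + 1] else 0
            for row in pixels for c in range(len(row) - 1)]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which the two bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def dhash_similarity(pixels_a: Pixels, pixels_b: Pixels) -> float:
    """Similarity in [0, 1]: 1 for identical hashes, 0 for opposite ones."""
    bits_a, bits_b = dhash_bits(pixels_a), dhash_bits(pixels_b)
    return 1.0 - hamming_distance(bits_a, bits_b) / len(bits_a)
```

Under this assumption, the 0.65 threshold of step 14 corresponds to more than 22 differing bits out of 64.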

[0137] See Figure 4, which is a schematic diagram of image similarity comparison provided in one or more embodiments of this specification. In Figure 4, the similarity of the dhash values between the first two frames is 0.4843, and the similarity of the dhash values between the last two frames is 0.8125.

[0138] Step 14: Determine whether the similarity of the dhash values between the two image frames is less than 0.65. If so, set the t value to 0 and proceed to the next step; otherwise, skip to step 16.

[0139] Step 15: For the sequence of t values of multiple consecutive frames whose displacement vector directions are consistent, use the counter continue_num, which records the number of such frames, to retain the t values at the beginning and the end of the sequence and set all the t values in the middle to 0. Then proceed to step 18.

[0140] Step 16: Multiply the sum of displacement vectors of the current frame by the sum of displacement vectors of the previous frame in the corresponding direction, and determine from the product whether the motion direction of the current frame is consistent with that of the previous frame. Both the horizontal and vertical directions must be consistent. For lens zooming (stretching) and lens panning, the entire image is moving and the motion directions of individual image blocks may differ, but the horizontal and vertical displacement vector sums of the two frames remain consistent. See Figures 5(a) and 5(b), which are schematic diagrams of the motion directions corresponding to an image provided in one or more embodiments of this specification. Figure 5(a) shows the approximate sum of displacement vectors in the case of lens zooming, and Figure 5(b) shows the approximate sum of displacement vectors in the case of lens panning.

[0141] Consistency judgment conditions can be, for example, as follows: ; .

[0142] Step 17: If they do not match, jump to step 15; if they match, increment the counter continue_num by 1 and proceed to the next step.

[0143] Step 18: If they match, store the t value in sum_t and increment the counter n by 1, where n represents the number of t values in sum_t.

[0144] Step 19: Determine whether the number of images traversed has reached the number of images within the interval. If so, calculate the sum of all t values in sum_t. If the sum is greater than the threshold H, the sub-video is determined to be jittery; otherwise, it is determined not to be jittery. For example, H can be set to all_sum*(qie_num+2)*(qie_num+thresh)*(all_num / 40), where all_sum represents the total number of images extracted within the time interval. Following the previous example, a time interval is 2 seconds and 20 frames are extracted per second, for a total of 40 frames.

[0145] Step 20: If the expected quantity is not reached, proceed to step 3 and continue iterative execution.

[0146] Based on the same idea, one or more embodiments of this specification also provide an apparatus and a device corresponding to the above method, as shown in Figure 6 and Figure 7.

[0147] Figure 6 is a schematic diagram of the structure of a video jitter detection apparatus provided in one or more embodiments of this specification. The apparatus includes:

[0148] The image acquisition module 602 acquires multiple sets of two adjacent frames from the set of frame images contained in the video to be detected;

[0149] The block processing module 604 performs block processing on the two adjacent image frames respectively to obtain multiple image blocks;

[0150] The local detection module 606 calculates the displacement vector of the image block, identifies abnormal blocks in the plurality of image blocks based on the displacement vector, and determines the jitter degree characterization value of the two adjacent images based on the remaining blocks other than the abnormal blocks.

[0151] The jitter adjustment module 608 determines whether the motion directions corresponding to multiple consecutive frames in the frame image set meet the set consistency conditions. If so, it adjusts the jitter degree characterization value of at least some of the frames in the multiple frames to reduce the jitter degree it represents.

[0152] The overall detection module 610, after processing the result of the judgment accordingly, determines whether the video to be detected is jittery based on the jitter degree characterization value of the images in the frame image set.

[0153] Optionally, the local detection module 606 calculates for the image block a positive displacement vector from the previous frame to the next frame in the two adjacent frames, and a negative displacement vector from the next frame to the previous frame.

[0154] Based on the degree of difference between the positive displacement vector and the negative displacement vector, it is determined whether the image block is an abnormal block.

[0155] Optionally, the local detection module 606 calculates the mean and variance of the sum of squares of the displacement vectors of each image block in the same frame of the image based on the displacement vector;

[0156] If both the mean and the variance are zero, then each of the image blocks is identified as an outlier block or the jitter level of the image in the same frame is reduced.

[0157] Optionally, the local detection module 606 detects all remaining blocks in any frame of the two adjacent images;

[0158] The jitter level characterization value of any frame image is calculated based on the number of remaining blocks in all remaining blocks and the dispersion of the sum of squares of the displacement vectors.

[0159] Optionally, the local detection module 606 determines whether the dispersion of the sum of squares of the displacement vectors is greater than a set threshold.

[0160] If so, at least some of the image blocks that cause the dispersion to increase are identified as abnormal blocks, so that the dispersion of the remaining blocks is no longer greater than the set threshold.

[0161] The degree of dispersion is positively correlated with the standard deviation or variance of the sum of squares of the displacement vectors, and negatively correlated with the mean of the sum of squares of the displacement vectors.

[0162] Optionally, the local detection module 606 detects all remaining blocks in any frame of the two adjacent images;

[0163] Calculate the movement directions corresponding to all remaining blocks;

[0164] Based on the corresponding motion directions, the entropy value of any frame image is calculated, and the entropy value reflects the degree of disorder in the motion directions in any frame image;

[0165] If the entropy value is greater than the set threshold, the jitter level represented by the jitter level characterization value of any frame image is reduced accordingly.

[0166] Optionally, after determining the jitter level characterization values of the two adjacent images, the local detection module 606 determines a screen switching reference threshold based on the number of the plurality of image blocks;

[0167] Determine whether the jitter level characterization values of the two adjacent frames are greater than the screen switching reference threshold.

[0168] If so, then calculate the similarity between the two adjacent images;

[0169] If the similarity is less than a set threshold, the jitter level representation value is adjusted to reduce the jitter level it represents.

[0170] Optionally, the local detection module 606 calculates the similarity between the two adjacent images, specifically including: if the two adjacent images are two consecutive frames in the video to be detected, the local detection module 606 calculates the dhash values of the two adjacent images respectively.

[0171] The similarity between the two adjacent images is calculated based on the Hamming distance between their respective dhash values.

[0172] Optionally, the jitter adjustment module 608, for multiple consecutive frames in the frame image set, calculates the sum of displacement vectors corresponding to the previous frame image and the sum of displacement vectors corresponding to the next frame image, respectively, based on the displacement vectors of the remaining blocks in the previous frame image and the displacement vectors of the remaining blocks in the next frame image.

[0173] Based on the product of these two displacement vectors, it is determined whether the motion directions corresponding to the previous frame image and the next frame image meet the set consistency conditions.

[0174] By iteratively executing the above steps, it is determined whether the motion directions corresponding to the consecutive multi-frame images meet the set consistency conditions.

[0175] Optionally, the jitter adjustment module 608 retains the jitter degree characterization value of the first frame image and / or the last frame image in the multi-frame image, and adjusts the jitter degree characterization value of the other intermediate frame images to minimize the jitter degree it represents.

[0176] The step of determining whether the video to be detected is jittery based on the jitter level characterization value of the images in the frame image set specifically includes:

[0177] Based on the cumulative value of the jitter level characterization value of the images in the frame image set, it is determined whether the video to be detected is jittery.

[0178] Figure 7 is a schematic diagram of the structure of a video jitter detection device provided in one or more embodiments of this specification. The device includes:

[0179] At least one processor; and,

[0180] A memory communicatively connected to the at least one processor; wherein,

[0181] The memory stores instructions executable by the at least one processor, which, when executed by the at least one processor, enable the at least one processor to:

[0182] Obtain multiple sets of two adjacent frames from the set of frame images contained in the video to be detected;

[0183] The two adjacent images are each segmented into blocks to obtain multiple image blocks;

[0184] Calculate the displacement vector of the image block, identify abnormal blocks among the plurality of image blocks based on the displacement vector, and determine the jitter characterization value of the two adjacent images based on the remaining blocks other than the abnormal blocks.

[0185] Determine whether the motion directions corresponding to consecutive frames in the frame image set meet the set consistency conditions. If so, adjust the jitter degree characterization value of at least some frames in the frame images to reduce the jitter degree it represents.

[0186] After corresponding processing according to the judgment result, determine whether the video to be detected is jittery based on the jitter level characterization values of the images in the frame image set.

[0187] Based on the same idea, one or more embodiments of this specification also provide a non-volatile computer storage medium corresponding to the method of Figure 1, storing computer-executable instructions, wherein the computer-executable instructions are configured to:

[0188] Obtain multiple sets of two adjacent frames from the set of frame images contained in the video to be detected;

[0189] The two adjacent images are each segmented into blocks to obtain multiple image blocks;

[0190] Calculate the displacement vector of the image block, identify abnormal blocks among the plurality of image blocks based on the displacement vector, and determine the jitter characterization value of the two adjacent images based on the remaining blocks other than the abnormal blocks.

[0191] Determine whether the motion directions corresponding to consecutive frames in the frame image set meet the set consistency conditions. If so, adjust the jitter degree characterization value of at least some frames in the frame images to reduce the jitter degree it represents.

[0192] After corresponding processing according to the judgment result, determine whether the video to be detected is jittery based on the jitter level characterization values of the images in the frame image set.

[0193] In the 1990s, improvements to a technology could be clearly distinguished as either hardware improvements (e.g., improvements to the circuit structure of diodes, transistors, switches, etc.) or software improvements (improvements to the methodology). However, with technological advancements, many methodological improvements today can be considered direct improvements to the hardware circuit structure. Designers almost always obtain the corresponding hardware circuit structure by programming the improved methodology into the hardware circuit. Therefore, it cannot be said that a methodological improvement cannot be implemented using hardware physical modules. For example, a Programmable Logic Device (PLD) (such as a Field Programmable Gate Array (FPGA)) is such an integrated circuit whose logic function is determined by the user programming the device. Designers can program and "integrate" a digital system onto a PLD themselves, without needing chip manufacturers to design and manufacture dedicated integrated circuit chips. Furthermore, nowadays, instead of manually manufacturing integrated circuit chips, this programming is mostly implemented using "logic compiler" software. Similar to the software compiler used in program development, the original code before compilation must also be written in a specific programming language, called a Hardware Description Language (HDL). There are many HDLs, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language). Currently, the most commonly used are VHDL (Very-High-Speed ​​Integrated Circuit Hardware Description Language) and Verilog. 
Those skilled in the art should also understand that by simply performing some logic programming on the method flow using one of these hardware description languages and programming it into an integrated circuit, the hardware circuit implementing the logical method flow can be easily obtained.

[0194] The controller can be implemented in any suitable manner. For example, it can take the form of a microprocessor or processor and a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, application-specific integrated circuits (ASICs), programmable logic controllers, and embedded microcontrollers. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller can also be implemented as part of the control logic of the memory. Those skilled in the art will also recognize that, in addition to implementing the controller in purely computer-readable program code form, the same functionality can be achieved by logically programming the method steps to make the controller take the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, and embedded microcontrollers. Therefore, such a controller can be considered a hardware component, and the means included therein for implementing various functions can also be considered as structures within the hardware component. Alternatively, the means for implementing various functions can be considered as both software modules implementing the method and structures within the hardware component.

[0195] The systems, devices, modules, or units described in the above embodiments can be implemented by computer chips or entities, or by products with certain functions. A typical implementation device is a computer. Specifically, a computer can be, for example, a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or any combination of these devices.

[0196] For ease of description, the above devices are described in terms of function, divided into various units. Of course, in implementing this specification, the functions of each unit can be implemented in one or more software and / or hardware components.

[0197] Those skilled in the art will understand that the embodiments of this specification can be provided as methods, systems, or computer program products. Therefore, the embodiments of this specification can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the embodiments of this specification can take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0198] This specification is described with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this specification. It will be understood that each process and / or block of the flowchart illustrations and / or block diagrams, and combinations of processes and / or blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, produce means for implementing the functions specified in one or more processes of the flowchart and / or one or more blocks of the block diagram.

[0199] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means, which implement the functions specified in one or more processes of the flowchart and / or one or more blocks of the block diagram.

[0200] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes of the flowchart and / or one or more blocks of the block diagram.

[0201] In a typical configuration, a computing device includes one or more processors (CPU), input / output interfaces, network interfaces, and memory.

[0202] Memory may include non-persistent storage in computer-readable media, such as random access memory (RAM) and / or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of computer-readable media.

[0203] Computer-readable media includes both permanent and non-permanent, removable and non-removable media that can store information using any method or technology. Information can be computer-readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media does not include transitory computer-readable media, such as modulated data signals and carrier waves.

[0204] It should also be noted that the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0205] This specification can be described in the general context of computer-executable instructions that are executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform a specific task or implement a specific abstract data type. This specification can also be practiced in distributed computing environments, where tasks are performed by remote processing devices connected via a communication network. In distributed computing environments, program modules can reside in local and remote computer storage media, including storage devices.

[0206] The various embodiments in this specification are described in a progressive manner. Similar or identical parts between embodiments can be referred to mutually. Each embodiment focuses on describing the differences from other embodiments. In particular, the embodiments of apparatus, devices, and non-volatile computer storage media are basically similar to the method embodiments, so the descriptions are relatively simple; relevant parts can be referred to the descriptions of the method embodiments.

[0207] The foregoing has described specific embodiments of this specification. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in a different order than that shown in the embodiments and may still achieve the desired result. Furthermore, the processes depicted in the drawings do not necessarily require the specific or sequential order shown to achieve the desired result. In some embodiments, multitasking and parallel processing are possible or may be advantageous.

[0208] The above description is merely one or more embodiments of this specification and is not intended to limit this specification. Various modifications and variations can be made to the one or more embodiments of this specification by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principle of one or more embodiments of this specification should be included within the scope of the claims of this specification.

Claims

1. A video jitter detection method, comprising: obtaining multiple sets of two adjacent frames from the set of frame images contained in the video to be detected; segmenting each of the two adjacent images into blocks to obtain multiple image blocks; calculating the displacement vectors of the image blocks, and identifying abnormal blocks among the plurality of image blocks based on the displacement vectors; determining, based on the remaining blocks excluding the abnormal blocks, the jitter level characterization values for the two adjacent frames, specifically including: for all remaining blocks in any frame of the two adjacent frames, counting the motion directions corresponding to each of the remaining blocks; calculating the entropy value of said frame based on the corresponding motion directions, the entropy value reflecting the degree of chaos of the motion directions in said frame; if the entropy value is greater than a set threshold, correspondingly reducing the jitter level represented by the jitter level characterization value of said frame, to prevent the complex motion of an object occupying a large proportion of the shot from being misidentified as jitter; determining whether the motion directions corresponding to multiple consecutive frames in the frame image set meet the set consistency condition, and if so, adjusting the jitter level characterization values of at least some of the multiple frames to reduce the jitter level they represent; and after corresponding processing according to the judgment result, determining whether the video to be detected is jittery based on the jitter level characterization values of the images in the frame image set.

2. The method of claim 1, wherein calculating displacement vectors of the image blocks and identifying abnormal blocks among the plurality of image blocks based on the displacement vectors specifically includes: calculating, for an image block, a forward displacement vector from the previous frame to the next frame of the two adjacent frame images, and a backward displacement vector from the next frame to the previous frame; and determining whether the image block is an abnormal block based on the degree of difference between the forward displacement vector and the backward displacement vector.
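
As a non-limiting illustration, the check of claim 2 might be sketched as below. The claim does not define the "degree of difference"; this example assumes that a reliably tracked block has a backward vector that roughly cancels its forward vector, and measures the length of their sum. The name `is_abnormal_block` and the `max_error` tolerance are hypothetical.

```python
import math


def is_abnormal_block(forward, backward, max_error=1.0):
    """Flag a block whose two displacement vectors do not cancel out.

    forward:  (dx, dy) from the previous frame to the next frame.
    backward: (dx, dy) from the next frame to the previous frame.
    The residual |forward + backward| is an assumed difference measure.
    """
    residual = math.hypot(forward[0] + backward[0], forward[1] + backward[1])
    return residual > max_error
```

For example, `is_abnormal_block((2, -1), (-2, 1))` is `False`, because the two vectors cancel exactly, whereas a block tracked to unrelated positions in the two directions is flagged.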

3. The method of claim 1, wherein identifying abnormal blocks among the plurality of image blocks based on the displacement vectors specifically includes: calculating, based on the displacement vectors, the mean and the variance of the sums of squares of the displacement vectors of the image blocks in the same frame image; and, if both the mean and the variance are zero, identifying every one of the image blocks as an abnormal block.
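
As a non-limiting illustration, the condition of claim 3 might be tested as below, reading the "sum of squares" of a displacement vector (dx, dy) as dx² + dy², which is an assumption. The name `all_blocks_abnormal` and the optional tolerance `eps` are hypothetical; with the default `eps=0.0` the comparison is the exact zero test stated in the claim.

```python
import statistics


def all_blocks_abnormal(vectors, eps=0.0):
    """Return True when no block in the frame moved at all.

    The mean and the population variance of the squared magnitudes are
    both compared against `eps` (0.0 reproduces the exact test).
    """
    values = [dx * dx + dy * dy for dx, dy in vectors]
    return statistics.fmean(values) <= eps and statistics.pvariance(values) <= eps
```

A frame pair in which every block has zero displacement, such as a frozen or duplicated frame, therefore contributes no remaining blocks.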

4. The method of claim 1, wherein identifying abnormal blocks among the plurality of image blocks based on the displacement vectors specifically includes: determining whether the dispersion of the sums of squares of the displacement vectors is greater than a set threshold; and, if so, identifying as abnormal blocks at least some of the image blocks that increase the dispersion, so that the dispersion of the remaining blocks is no longer greater than the set threshold; wherein the dispersion is positively correlated with the standard deviation or variance of the sums of squares of the displacement vectors, and negatively correlated with the mean of the sums of squares of the displacement vectors.

5. The method of claim 1, wherein, after determining the jitter degree characterization values of the two adjacent frame images respectively, the method further comprises: determining a screen switching reference threshold based on the number of the plurality of image blocks; determining whether the jitter degree characterization values of the two adjacent frame images are greater than the screen switching reference threshold and, if so, calculating the similarity between the two adjacent frame images; and, if the similarity is less than a set threshold, adjusting the jitter degree characterization values to reduce the jitter degree they represent.

6. The method of claim 5, wherein calculating the similarity between the two adjacent frame images specifically includes: if the two adjacent frame images are two consecutive frames in the video to be detected, calculating a dhash value for each of the two adjacent frame images; and calculating the similarity between the two adjacent frame images based on the Hamming distance between their respective dhash values.
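
As a non-limiting illustration, a difference hash and a Hamming-distance similarity might be sketched as below. The example assumes each image has already been converted to grayscale and downscaled to a small grid (commonly 8 rows by 9 columns, giving 64 bits); the resizing step is omitted, and the mapping from Hamming distance to a similarity in [0, 1] is an assumption. The function names are hypothetical.

```python
def dhash(gray):
    """Difference hash of a small grayscale grid (list of equal-length rows).

    One bit per horizontally adjacent pixel pair: 1 if the left pixel is
    brighter than its right neighbour, else 0.
    """
    bits = 0
    for row in gray:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def similarity(gray_a, gray_b):
    """1.0 for identical hashes, 0.0 when every bit differs (assumed scale)."""
    n_bits = len(gray_a) * (len(gray_a[0]) - 1)
    return 1.0 - hamming(dhash(gray_a), dhash(gray_b)) / n_bits
```

A low similarity between two consecutive frames suggests a change of picture rather than camera shake, which is the situation in which claim 5 reduces the jitter degree characterization value.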

7. The method of claim 1, wherein determining whether the motion directions corresponding to multiple consecutive frame images in the frame image set meet the set consistency condition specifically includes: for the multiple consecutive frame images in the frame image set, calculating the sum of displacement vectors corresponding to a previous frame image based on the displacement vectors of the remaining blocks in the previous frame image, and the sum of displacement vectors corresponding to the next frame image based on the displacement vectors of the remaining blocks in the next frame image; determining, based on the product of the two sums of displacement vectors, whether the motion directions corresponding to the previous frame image and the next frame image meet the set consistency condition; and, by iteratively executing the above steps, determining whether the motion directions corresponding to the multiple consecutive frame images meet the set consistency condition.
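
As a non-limiting illustration, the pairwise test of claim 7 might be sketched as below. The claim speaks only of "the product" of the two sums; this example assumes the dot product is meant and treats a positive dot product as consistent motion. The names `vector_sum`, `same_direction`, and `consistent_run` are hypothetical.

```python
def vector_sum(vectors):
    """Component-wise sum of a frame's remaining-block displacement vectors."""
    return (sum(v[0] for v in vectors), sum(v[1] for v in vectors))


def same_direction(prev_vectors, next_vectors):
    """True when the two summed vectors point the same general way.

    A positive dot product is the assumed consistency condition.
    """
    px, py = vector_sum(prev_vectors)
    nx, ny = vector_sum(next_vectors)
    return px * nx + py * ny > 0


def consistent_run(frames):
    """Apply the pairwise test to every adjacent pair in a run of frames."""
    return len(frames) >= 2 and all(
        same_direction(a, b) for a, b in zip(frames, frames[1:])
    )
```

A steady pan yields summed vectors that keep pointing the same way, so every adjacent pair passes; a back-and-forth shake flips the sign of the dot product and fails the test.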

8. The method of claim 7, wherein adjusting the jitter degree characterization values of at least some of the multiple frame images specifically includes: retaining the jitter degree characterization values of the first frame and the last frame of the multiple frame images, and adjusting the jitter degree characterization values of the intermediate frame images to minimize the jitter degree they represent; and wherein determining whether the video to be detected is jittery based on the jitter degree characterization values of the images in the frame image set specifically includes: determining whether the video to be detected is jittery based on the cumulative value of the jitter degree characterization values of the images in the frame image set.
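
As a non-limiting illustration, the adjustment and the final decision of claim 8 might be sketched as below. The example assumes that "minimizing" an intermediate frame's value means setting it to zero and that the video is judged jittery when the cumulative value exceeds a threshold; both are assumptions, as are the names `damp_consistent_run` and `is_jittery`.

```python
def damp_consistent_run(values, start, end):
    """Zero the values strictly between `start` and `end` (inclusive run bounds).

    `values` holds one jitter degree characterization value per frame; the
    first and last frames of the consistent-motion run keep their values.
    """
    adjusted = list(values)
    for i in range(start + 1, end):
        adjusted[i] = 0.0
    return adjusted


def is_jittery(values, threshold):
    """Decide from the cumulative value (assumed: a plain sum)."""
    return sum(values) > threshold
```

For example, with per-frame values `[5.0, 4.0, 6.0, 3.0, 2.0]` and a consistent run covering frames 1 to 4, the adjusted values are `[5.0, 4.0, 0.0, 0.0, 2.0]`, which lowers the cumulative value from 20.0 to 11.0.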

9. A video jitter detection apparatus, comprising: an image acquisition module, which obtains multiple groups of two adjacent frame images from the frame image set contained in a video to be detected; an image segmentation module, which segments each of the two adjacent frame images into blocks to obtain a plurality of image blocks; a local detection module, which calculates displacement vectors of the image blocks and identifies abnormal blocks among the plurality of image blocks based on the displacement vectors; the local detection module further determining, based on the remaining blocks other than the abnormal blocks, a jitter degree characterization value for each of the two adjacent frame images, which specifically includes: for all remaining blocks in either frame of the two adjacent frame images, counting the motion direction corresponding to each remaining block; calculating an entropy value of that frame based on the counted motion directions, the entropy value reflecting how disordered the motion directions in that frame are; and, if the entropy value is greater than a set threshold, reducing the jitter degree characterization value of that frame accordingly, so that the motion of a large object in a complex moving shot is not misidentified as jitter; a jitter adjustment module, which determines whether the motion directions corresponding to multiple consecutive frame images in the frame image set meet a set consistency condition and, if so, adjusts the jitter degree characterization values of at least some of the multiple frame images to reduce the jitter degree they represent; and an overall detection module, which, after the processing corresponding to the determination result has been performed, determines whether the video to be detected is jittery based on the jitter degree characterization values of the images in the frame image set.

10. The apparatus of claim 9, wherein the local detection module calculates, for an image block, a forward displacement vector from the previous frame to the next frame of the two adjacent frame images, and a backward displacement vector from the next frame to the previous frame; and determines whether the image block is an abnormal block based on the degree of difference between the forward displacement vector and the backward displacement vector.

11. The apparatus of claim 9, wherein the local detection module calculates, based on the displacement vectors, the mean and the variance of the sums of squares of the displacement vectors of the image blocks in the same frame image; and, if both the mean and the variance are zero, identifies every one of the image blocks as an abnormal block.

12. The apparatus of claim 9, wherein the local detection module determines whether the dispersion of the sums of squares of the displacement vectors is greater than a set threshold; and, if so, identifies as abnormal blocks at least some of the image blocks that increase the dispersion, so that the dispersion of the remaining blocks is no longer greater than the set threshold; wherein the dispersion is positively correlated with the standard deviation or variance of the sums of squares of the displacement vectors, and negatively correlated with the mean of the sums of squares of the displacement vectors.

13. The apparatus of claim 9, wherein the local detection module, after determining the jitter degree characterization values of the two adjacent frame images respectively, determines a screen switching reference threshold based on the number of the plurality of image blocks; determines whether the jitter degree characterization values of the two adjacent frame images are greater than the screen switching reference threshold and, if so, calculates the similarity between the two adjacent frame images; and, if the similarity is less than a set threshold, adjusts the jitter degree characterization values to reduce the jitter degree they represent.

14. The apparatus of claim 13, wherein the local detection module calculating the similarity between the two adjacent frame images specifically comprises: if the two adjacent frame images are two consecutive frames in the video to be detected, the local detection module calculates a dhash value for each of the two adjacent frame images; and calculates the similarity between the two adjacent frame images based on the Hamming distance between their respective dhash values.

15. The apparatus of claim 9, wherein the jitter adjustment module, for the multiple consecutive frame images in the frame image set, calculates the sum of displacement vectors corresponding to a previous frame image based on the displacement vectors of the remaining blocks in the previous frame image, and the sum of displacement vectors corresponding to the next frame image based on the displacement vectors of the remaining blocks in the next frame image; determines, based on the product of the two sums of displacement vectors, whether the motion directions corresponding to the previous frame image and the next frame image meet the set consistency condition; and, by iteratively executing the above steps, determines whether the motion directions corresponding to the multiple consecutive frame images meet the set consistency condition.

16. The apparatus of claim 15, wherein the jitter adjustment module retains the jitter degree characterization values of the first frame and the last frame of the multiple frame images, and adjusts the jitter degree characterization values of the intermediate frame images to minimize the jitter degree they represent; and wherein determining whether the video to be detected is jittery based on the jitter degree characterization values of the images in the frame image set specifically includes: determining whether the video to be detected is jittery based on the cumulative value of the jitter degree characterization values of the images in the frame image set.

17. A video jitter detection device, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to: obtain multiple groups of two adjacent frame images from the frame image set contained in a video to be detected; segment each of the two adjacent frame images into blocks to obtain a plurality of image blocks; calculate displacement vectors of the image blocks, and identify abnormal blocks among the plurality of image blocks based on the displacement vectors; determine, based on the remaining blocks other than the abnormal blocks, a jitter degree characterization value for each of the two adjacent frame images, which specifically includes: for all remaining blocks in either frame of the two adjacent frame images, counting the motion direction corresponding to each remaining block; calculating an entropy value of that frame based on the counted motion directions, the entropy value reflecting how disordered the motion directions in that frame are; and, if the entropy value is greater than a set threshold, reducing the jitter degree characterization value of that frame accordingly, so that the motion of a large object in a complex moving shot is not misidentified as jitter; determine whether the motion directions corresponding to multiple consecutive frame images in the frame image set meet a set consistency condition and, if so, adjust the jitter degree characterization values of at least some of the multiple frame images to reduce the jitter degree they represent; and, after performing the processing corresponding to the determination result, determine whether the video to be detected is jittery based on the jitter degree characterization values of the images in the frame image set.