A dynamic background target detection method and system
By extracting deformed background pixels under dynamic backgrounds and performing spatial smoothing and phase labeling, a candidate region self-aligned field sequence is generated, which solves the problem of the target being incorporated into the background main cluster and achieves accurate target detection and high recall.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- ZHEJIANG LANJIAN DEFENSE TECH CO LTD
- Filing Date
- 2026-05-11
- Publication Date
- 2026-06-19
Figure CN122244425A_ABST
Description
Technical Field
[0001] This invention relates to the field of image data processing technology, and more specifically, to a dynamic background target detection method and system.
Background Technology
[0002] Target detection in dynamic backgrounds is widely used in various fields such as maritime monitoring, low-altitude observation, and waterborne operation monitoring. In such scenarios, there are often situations where the motion characteristics of the target and the dynamic background are highly similar. For example, in a water surface environment, a small boat moves synchronously with the waves; in a low-altitude scenario, a drone slowly moves near the clouds; and in a water flow area, floating objects move in the same direction as the water flow. The target motion state and the dynamic changes of the background in these scenarios show a strong correlation, which brings challenges to the implementation of target detection technology.
[0003] In the target detection process described above, relevant techniques typically employ optical flow analysis to distinguish between the background and foreground. This involves extracting optical flow vectors from the image sequence and analyzing them using clustering or robust fitting algorithms to estimate the main motion trend of the background. Regions deviating from this main motion trend are then identified as foreground targets, thus achieving target extraction and detection. To improve the stability of the detection results, these techniques often include a background consistency constraint step to further confirm regions identified as part of the main motion trend of the background, ensuring the integrity of the background region and reducing the likelihood of misidentifying the background as foreground.
[0004] However, in practical applications, the above-mentioned technical solutions may exhibit detection bias when the motion features of the target and the background meet a specific combination of conditions. When the target's motion direction aligns with the main direction of the background's dynamic changes, the magnitude of the target's velocity falls within the variance range of the background's main motion optical flow cluster, and the target's area is relatively small, the cluster centers of the clustering algorithm or the fitting results of the robust fitting algorithm are dominated by the far more numerous background pixels. In this case, the optical flow vectors generated by the target cannot form an independent cluster; instead, they are merged into the background's main motion optical flow cluster. In the subsequent background consistency constraint step, because the optical flow features of the target region fall within the background's main motion range, this region is treated as a set of background inlier points and removed from the detection results, so the target cannot be effectively detected.
[0005] This detection bias directly affects the overall performance of object detection. Specifically, it manifests as a sharp drop in recall at certain velocity ranges, with this drop clearly aligned with the main peak of the background dynamic velocity spectrum. It's important to note that this problem is specific, occurring only under the combined conditions that the target motion falls within the statistical range of the background's main motion optical flow clusters, and the target's proportion in the entire image region is insufficient to shift the cluster centers or robust fitting results. This issue does not occur in all dynamic background object detection scenarios, making it more difficult to identify and resolve in the practical application of related technologies.
[0006] In view of this, the present invention proposes a dynamic background target detection method and system to solve the above problems.
Summary of the Invention
[0007] To overcome the aforementioned deficiencies of the prior art and achieve the above objectives, the present invention provides the following technical solution: a dynamic background target detection method, comprising:
[0008] Acquire a sequence of frames; calculate the displacement direction and amplitude from adjacent frames to obtain the optical flow vector field sequence;
[0009] Deformed background pixels are extracted from the optical flow vector field sequence, and spatial smoothing is performed on the optical flow vector field sequence using the deformed background pixels as constraints to obtain the background deformation reference field sequence.
[0010] Based on the deformed background pixel and background deformed reference field sequence, a phase marker map sequence is calculated; under the constraint of the phase marker map sequence, a micro-deviation residual map sequence is calculated based on the optical flow vector field sequence and the background deformed reference field sequence; the micro-deviation residual map sequence is filtered to obtain a weak deviation candidate region map.
[0011] The frame sequence is registered with the background deformation reference field sequence to obtain the compensation frame sequence; the candidate region self-alignment field sequence is generated by combining the weak deviation candidate region map, phase marker map sequence and micro deviation residual map sequence; the weak deviation candidate region map of the compensation frame sequence is registered with the candidate region self-alignment field sequence to obtain the target support evidence map and the target candidate table.
[0012] An exclusion weight is set for each candidate region in the target candidate table on the compensation frame sequence, and the candidate regions are removed during background processing according to the exclusion weight; the target box sequence is extracted in the compensation frame sequence based on the target support evidence map and the candidate region self-aligned field sequence, and the target box sequence is mapped to the frame sequence to obtain the target detection result.
[0013] Furthermore, adjacent frame pairs are formed in the frame sequence according to the acquisition order. The two adjacent frames in the frame sequence are taken as a group. For each group of adjacent frame pairs, multiple local neighborhoods are selected in the reference frame with a preset fixed step size.
[0014] Define the local neighborhood center position, which is the geometric center pixel coordinate of the local neighborhood in the reference frame pixel coordinate. The local neighborhood center positions are uniformly distributed within the effective imaging area of the reference frame according to a fixed step size. Uniform distribution means that the pixel coordinate spacing between adjacent local neighborhood center positions in the horizontal and vertical directions is equal to the fixed step size.
[0015] Furthermore, taking the optical flow vector field corresponding to each frame in the optical flow vector field sequence as the processing object, at the center position of each local neighborhood of the optical flow vector field, the displacement direction and amplitude of the center position of the local neighborhood are taken as the center vector, and then a consistent neighborhood is set around the center position of the local neighborhood.
[0016] Furthermore, within the consistent neighborhood, the displacement direction and magnitude at each local neighborhood center position are compared one by one with the center vector. During the comparison, the preset upper limit of the second direction difference and the preset upper limit of the second magnitude difference are used to constrain the consistency relationship. Within the consistent neighborhood, only the positions that fall on the grid coordinate set of local neighborhood center positions take part in the comparison.
[0017] The criteria for determining consistency are that the direction difference does not exceed the upper limit of the second direction difference and the amplitude difference does not exceed the upper limit of the second amplitude difference;
[0018] The proportion of neighborhoods that satisfy the consistency relationship is defined as the motion consistency degree, and the positions with a motion consistency degree lower than the motion consistency threshold are marked as deformed background pixels.
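By way of illustration only, the consistency determination of paragraphs [0015] to [0018] may be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names, the dictionary layout of the optical flow vector field, and all numeric limits are hypothetical, and the neighborhood radius is assumed to be a multiple of the fixed step size so that every compared position lies on the grid of local neighborhood center positions.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def motion_consistency(field, center, radius, step, max_dir_diff, max_mag_diff):
    """Fraction of grid neighbours whose flow agrees with the centre vector.

    field maps (x, y) grid centres to (direction_deg, magnitude), or to None
    for positions carrying the preset null value (hypothetical layout).
    """
    cx, cy = center
    c_dir, c_mag = field[center]
    total = agree = 0
    for dx in range(-radius, radius + 1, step):
        for dy in range(-radius, radius + 1, step):
            pos = (cx + dx, cy + dy)
            if pos == center or field.get(pos) is None:
                continue  # only valid grid centres take part in the comparison
            n_dir, n_mag = field[pos]
            total += 1
            if (angle_diff(c_dir, n_dir) <= max_dir_diff
                    and abs(c_mag - n_mag) <= max_mag_diff):
                agree += 1
    return agree / total if total else 0.0
```

A position whose returned value is lower than the motion consistency threshold would then be marked as a deformed background pixel, as stated in [0018].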
[0019] Furthermore, a phase fluctuation elimination constraint is applied to the deformed background pixels;
[0020] The phase fluctuation elimination constraint is based on the phase marker map sequence and a phase window. Within a phase window, the number of times the phase marker value switches at the same local neighborhood center position in the phase marker map sequence is counted. The phase window is a continuous multi-frame interval in the optical flow vector field sequence.
[0021] Set a switching count threshold. For positions where the phase marker value switching count exceeds the switching count threshold, remove that position from the deformed background pixels.
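The switching count of [0020] and the removal rule of [0021] may be sketched as follows. The sketch is illustrative only: the names, the representation of each phase marker map as a dictionary keyed by position, and the half-open window convention are assumptions not taken from the specification.

```python
def count_switches(labels):
    """Number of changes of the phase marker value between consecutive frames."""
    return sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)


def cull_by_phase_fluctuation(deformed, phase_maps, window, max_switches):
    """Return the deformed background positions that survive the constraint.

    deformed:   set of (x, y) local neighborhood center positions
    phase_maps: list of dicts {(x, y): phase marker value}, one per frame
    window:     (start, end) frame indices, end exclusive (assumed convention)
    """
    start, end = window
    kept = set()
    for pos in deformed:
        labels = [phase_maps[t][pos] for t in range(start, end)
                  if pos in phase_maps[t]]
        if count_switches(labels) <= max_switches:
            kept.add(pos)  # switching count does not exceed the threshold
    return kept
```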
[0022] Furthermore, residual elimination constraints are added to the deformed background pixels;
[0023] The residual elimination constraint is based on the micro-deviation residual map sequence and phase window. Within the phase window, the number of consecutive frames at the center position of the same local neighborhood that satisfy the condition that the magnitude residual exceeds the preset residual magnitude threshold or the direction residual exceeds the preset residual direction threshold is counted.
[0024] For positions where the number of consecutive frames reaches a preset threshold, remove those positions from the deformed background pixels.
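The counting step of [0023] may be sketched as follows. The specification does not state how several separate runs within one phase window are to be treated; the sketch assumes the longest run is the one compared with the preset threshold. The function names and the (magnitude residual, direction residual) tuple layout are hypothetical.

```python
def longest_exceeding_run(residuals, mag_thresh, dir_thresh):
    """Longest run of consecutive frames whose residual exceeds either threshold.

    residuals: list of (magnitude_residual, direction_residual) for one local
    neighborhood center position, ordered by frame within the phase window.
    """
    best = run = 0
    for mag_res, dir_res in residuals:
        if mag_res > mag_thresh or dir_res > dir_thresh:
            run += 1
            best = max(best, run)
        else:
            run = 0  # the run is broken by a frame within both thresholds
    return best


def should_remove(residuals, mag_thresh, dir_thresh, min_run):
    """True when the run length reaches the preset threshold of [0024]."""
    return longest_exceeding_run(residuals, mag_thresh, dir_thresh) >= min_run
```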
[0025] Furthermore, a candidate region elimination constraint is applied to the deformed background pixels within the same frame;
[0026] The candidate region elimination constraint is based on the weakly deviated candidate region map, which represents the candidate region coverage location under a coordinate index consistent with the optical flow vector field sequence;
[0027] For a deformed background pixel in a frame, if its coordinates fall within the candidate area coverage position marked by the weak deviation candidate area map of that frame, then the position corresponding to that coordinate is removed from the deformed background pixels.
[0028] Furthermore, the background processing includes the process of calculating motion consistency on the optical flow vector field sequence to mark deformed background pixels, the process of extracting deformed background pixels on the optical flow vector field sequence, the process of calculating a micro-deviation residual map sequence based on the optical flow vector field sequence and the background deformation reference field sequence, and the process of statistically analyzing the background amplitude sample set according to the phase window.
[0029] Furthermore, for each candidate region, the elimination distance is determined based on its exclusion weight;
[0030] The elimination coverage area is formed by extending the elimination distance outward from the candidate area boundary;
[0031] For locations within the exclusion coverage area, a uniform location masking process is performed, which involves marking the location as a preset null value.
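The masking of [0029] to [0031] may be sketched as follows. The specification states only that the elimination distance is determined from the exclusion weight; the linear mapping used here, the weight range of 0 to 1, the rectangular candidate region, and the square (per-axis) outward extension are all assumptions made for the example, as are the function names.

```python
def elimination_distance(weight, max_distance=6):
    """Hypothetical linear mapping from an exclusion weight in [0, 1] to pixels."""
    clamped = max(0.0, min(1.0, weight))
    return round(clamped * max_distance)


def mask_exclusion_area(field, box, weight, null_value=None):
    """Mark every position inside the grown candidate box with the null value.

    field: dict {(x, y): value}; box: (x0, y0, x1, y1), inclusive bounds.
    A new dict is returned; the input field is left unchanged.
    """
    x0, y0, x1, y1 = box
    d = elimination_distance(weight)
    masked = dict(field)
    for (x, y) in field:
        if x0 - d <= x <= x1 + d and y0 - d <= y <= y1 + d:
            masked[(x, y)] = null_value
    return masked
```

Background processing steps that skip null-valued positions would then ignore both the candidate region and the surrounding elimination coverage area.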
[0032] A dynamic background target detection system, comprising:
[0033] The sequence acquisition module is used to acquire frame sequences; the displacement direction and amplitude are calculated from adjacent frames to obtain the optical flow vector field sequence;
[0034] The background extraction module is used to extract deformed background pixels from the optical flow vector field sequence, and to perform spatial smoothing on the optical flow vector field sequence with the deformed background pixels as constraints to obtain the background deformation reference field sequence.
[0035] The sequence filtering module calculates a phase marker map sequence based on the deformed background pixel and background deformation reference field sequence; under the constraint of the phase marker map sequence, it calculates a micro-deviation residual map sequence based on the optical flow vector field sequence and the background deformation reference field sequence; and filters the micro-deviation residual map sequence to obtain a weak deviation candidate region map.
[0036] The sequence registration module is used to register the frame sequence with the background deformation reference field sequence to obtain the compensation frame sequence; combine the weak deviation candidate area map, phase marker map sequence and micro deviation residual map sequence to generate the candidate area self-aligned field sequence; and use the candidate area self-aligned field sequence to perform corresponding region registration on the weak deviation candidate area map of the compensation frame sequence to obtain the target support evidence map and the target candidate table.
[0037] The target extraction module is used to set exclusion weights for each candidate region in the target candidate table on the compensation frame sequence, and to remove candidate regions during background processing according to the exclusion weights; based on the target support evidence map and the candidate region self-aligned field sequence, the target box sequence is extracted in the compensation frame sequence, and the target box sequence is mapped to the frame sequence to obtain the target detection result.
[0038] Compared with the prior art, the technical effects and advantages of the dynamic background target detection method and system of the present invention are as follows:
[0039] This invention first acquires a frame sequence and calculates the displacement direction and amplitude from adjacent frames to obtain an optical flow vector field sequence. It then extracts deformed background pixels from the optical flow vector field sequence and performs spatial smoothing on the optical flow vector field sequence using the deformed background pixels as constraints to obtain a background deformation reference field sequence. Next, it calculates a phase marker map sequence based on the deformed background pixels and the background deformation reference field sequence. Under the constraint of the phase marker map sequence, it combines the optical flow vector field sequence and the background deformation reference field sequence to obtain a micro-deviation residual map sequence, which is filtered to obtain a weak deviation candidate region map. Subsequently, the frame sequence is registered using the background deformation reference field sequence to obtain a compensation frame sequence. A candidate region self-aligned field sequence is then generated by combining the weak deviation candidate region map, the phase marker map sequence, and the micro-deviation residual map sequence. After the corresponding regions of the weak deviation candidate region map are registered in the compensation frame sequence, the target support evidence map and the target candidate table are obtained. Finally, an exclusion weight is set for each candidate region in the target candidate table, and during background processing the corresponding candidate regions are removed according to the exclusion weight. The target box sequence is then extracted in the compensation frame sequence based on the target support evidence map and the candidate region self-aligned field sequence, and mapping the target box sequence back to the frame sequence yields the final target detection result.
Throughout the process, multiple elimination constraints are applied to the deformed background pixels to improve their extraction. The calculation of micro-deviation residuals is refined by the phase window and the residual suppression coefficient. The spatial reference of each sequence is unified by relying on coordinate translation. Position masking is performed around each candidate region according to the elimination distance.
[0040] This invention effectively solves the following problem: when the target's motion direction is consistent with the background's main dynamic direction, its velocity amplitude falls within the variance range of the background's main motion optical flow cluster, and the target area is small, optical flow vector clustering or robust fitting is dominated by the background pixels, so that the target's optical flow is merged into the background's main cluster and erased as background inlier points by the background consistency constraint. It also solves the problem that the target detection recall rate drops sharply in a specific velocity range.
[0041] This invention constructs a background deformation reference field using deformed background pixels as the core constraint, achieving accurate characterization of dynamic background motion features. Combined with phase markers, it distinguishes the local quasi-periodic states of the dynamic background in fine detail and captures the subtle deviations in motion between the target and the background. The candidate region self-aligned field provides an independent registration basis for weak deviation candidate regions. By setting exclusion weights and performing the corresponding position masking, the dominance that the sheer number of background pixels exerts during background processing is effectively weakened. Each processing sequence maintains a unified spatial coordinate index, ensuring that the weak deviation motion clues of the target are preserved at every processing stage. This achieves effective target detection in scenarios where the dynamic background and target motion features are highly similar, and ensures that the detection results keep an accurate positional representation in the original observation frame. At the same time, the relevant parameters of each processing step can be selected flexibly according to the frame sequence attributes and the actual characteristics of the dynamic background, adapting to different dynamic background target detection application scenarios.
Attached Figure Description
[0042] Figure 1 is a schematic diagram of a dynamic background target detection system according to an embodiment of the present invention;
[0043] Figure 2 is a flowchart of a dynamic background target detection method according to an embodiment of the present invention.
Detailed Implementation
[0044] The technical solutions of the embodiments of the present invention will be described in detail, clearly, and completely below with reference to the accompanying drawings. It should be particularly noted that the specific embodiments described below are only for better illustrating and explaining the technical solutions of the present invention, and are intended to enable those skilled in the art to better understand and implement the present invention, and should not be construed as limiting the scope of protection of the present invention. Without departing from the spirit and substance of the present invention, those skilled in the art can modify, adjust, or make equivalent substitutions based on the content disclosed in the present invention, and these should all be considered within the scope of protection of the present invention.
[0045] Example 1:
[0046] As shown in Figure 1, this embodiment discloses a dynamic background target detection system comprising a sequence acquisition module, a background extraction module, a sequence filtering module, a sequence registration module, and a target extraction module. The modules are connected by wired or wireless means to transmit data.
[0047] To resolve the circular dependency between the elimination constraints and the background deformation reference field sequence, this embodiment can adopt a two-stage or multi-round iteration: first, obtain the initial deformed background pixels based on motion consistency and construct the initial background deformation reference field sequence; then generate the phase marker map sequence and the micro-deviation residual map sequence and perform phase fluctuation elimination, residual elimination, and candidate region elimination to obtain the updated deformed background pixel set; then recalculate the background deformation reference field sequence with the updated set, repeating until the change ratio of the deformed background pixel set between two adjacent rounds is lower than the convergence threshold or the preset number of rounds is reached.
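The multi-round iteration of [0047] may be sketched as the following control loop. It is a sketch under assumptions: `update_fn` is a hypothetical callback standing in for one full round (rebuilding the background deformation reference field sequence, regenerating the phase marker and micro-deviation residual map sequences, and applying the three elimination constraints), and the change ratio is assumed to be the size of the symmetric difference divided by the size of the previous set. The default threshold and round count are illustrative values only.

```python
def refine_background(initial_set, update_fn, convergence_ratio=0.02, max_rounds=5):
    """Iterate until the deformed background pixel set stabilises.

    Returns the final set and the number of rounds actually run.
    """
    current = set(initial_set)
    rounds_run = 0
    for rounds_run in range(1, max_rounds + 1):
        updated = set(update_fn(current))
        changed = len(current ^ updated)       # positions added or removed
        ratio = changed / max(len(current), 1)
        current = updated
        if ratio < convergence_ratio:
            break                              # converged before the round limit
    return current, rounds_run
```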
[0048] The sequence acquisition module is used to acquire frame sequences; the displacement direction and amplitude are calculated from adjacent frames to obtain the optical flow vector field sequence.
[0049] A frame sequence is continuously acquired using a camera device at sampling intervals. The sampling interval is used to limit the time interval between adjacent frames in the frame sequence, ensuring that the positional changes of the target contour in the image are comparable between adjacent frames. The preferred range for the sampling interval is 10 milliseconds to 200 milliseconds. The selection rules are as follows: when the frequency of dynamic background changes is high, 10 to 50 milliseconds is used to ensure that the difference in dynamic background morphology between adjacent frames does not exceed the range that can be recognized by the local neighborhood; when the observation distance is far and the target occupies a small proportion in a single frame, 10 to 80 milliseconds is used to ensure that the target retains corresponding contour fragments in adjacent frames; when the lighting conditions are weak and the noise ratio of the frame sequence increases, 50 to 200 milliseconds is used to ensure that the effective texture information density of the frame sequence meets the needs of subsequent displacement direction and amplitude calculations. This sub-step constrains the scale of difference between adjacent frames in the frame sequence by the sampling interval, ensuring that even when the motion characteristics of the target and the dynamic background are highly similar, discernible temporal change clues are still retained, reducing the risk of detection bias caused by the mismatch of the scale of difference between adjacent frames.
[0050] The frame sequence is formed into adjacent frame pairs according to the acquisition order. An adjacent frame pair is formed by taking two adjacent frames in the frame sequence as a group, with the previous frame as the reference frame and the next frame as the comparison frame, so that the time interval corresponding to each group of adjacent frame pairs is consistent with the sampling interval, ensuring that the calculation of displacement direction and amplitude has a consistent time scale.
[0051] For each adjacent frame pair, multiple local neighborhoods are selected in the reference frame at a fixed step size. The fixed step size determines the distance between the center positions of adjacent local neighborhoods, and its preferred range is 1 to 4 sampling points. The selection rule is as follows: when the frame sequence resolution is high and the displacement amplitude of the target within the adjacent frame pair is small, the fixed step size is 1 to 2 sampling points; when the frame sequence noise ratio is high or computational resources are tightly constrained, the fixed step size is 2 to 4 sampling points. Sampling points and pixel coordinates share the same semantics: each sampling point corresponds to a pixel coordinate grid point in the reference frame, and every neighborhood range described as an extension by a number of sampling points is measured in pixel coordinates, so that the neighborhood extension distance is consistent with the image spatial distance.
[0052] The local neighborhood center position is the geometric center pixel coordinate of the local neighborhood within the reference frame's pixel coordinates. These local neighborhood centers are evenly distributed within the effective imaging area of the reference frame with a fixed step size. Even distribution means that the pixel coordinate spacing between adjacent local neighborhood centers in both the horizontal and vertical directions is equal to the fixed step size. To prevent local neighborhoods from exceeding the reference frame boundary, the selected area for the local neighborhood center position is the inner region of the reference frame after removing half the local neighborhood's side length, ensuring that the local neighborhood corresponding to each local neighborhood center position completely falls within the effective imaging area of the reference frame. Positions in the reference frame that fall within the coverage area of a preset null value marker are not selected as local neighborhood center positions, thus preventing pixels without a source from entering the calculation support area for displacement direction and amplitude.
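The placement of local neighborhood center positions described in [0052] may be sketched as follows. The function name, the argument names, and the representation of null-marked positions as a set are hypothetical; the sketch further assumes that a neighborhood of even side length covers the half-open span from `center - side // 2` to `center + side // 2`.

```python
def neighborhood_centers(width, height, side, step, null_mask=frozenset()):
    """Grid of center positions whose neighborhoods lie fully inside the frame.

    width, height: reference frame size in pixels
    side:          local neighborhood side length in sampling points
    step:          fixed step size between adjacent centers
    null_mask:     positions covered by the preset null value marker
    """
    half = side // 2
    centers = []
    for y in range(half, height - half, step):
        for x in range(half, width - half, step):
            if (x, y) not in null_mask:   # skip positions without a source pixel
                centers.append((x, y))
    return centers
```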
[0053] The local neighborhood is used to define the calculation support area for the displacement direction and amplitude. The local neighborhood adopts a square region, and the side length of the local neighborhood is measured in sampling points. The preferred range of the local neighborhood is a side length of 8 to 64 sampling points. The value of the local neighborhood side length is set in correspondence with the target edge detail density and the dynamic background texture density. The target edge detail density is represented by the average gradient magnitude of the texture gradient map sequence around the center position of the local neighborhood in the reference frame. The dynamic background texture density is represented by the proportion of positions with gradient magnitudes higher than the boundary sharpness threshold around the same position.
[0054] When the average gradient magnitude of the texture gradient map sequence around the center of the local neighborhood in the reference frame is less than 0.8 times the average gradient magnitude of the reference frame, the side length of the local neighborhood is taken as 48 to 64 sampling points, so that the local neighborhood covers a larger range and provides more sufficient corresponding constraints; when the average gradient magnitude of the texture gradient map sequence around the center of the local neighborhood in the reference frame is between 0.8 and 1.2 times the average gradient magnitude of the reference frame, the side length of the local neighborhood is taken as 24 to 48 sampling points; when the average gradient magnitude of the texture gradient map sequence around the center of the local neighborhood in the reference frame is greater than 1.2 times the average gradient magnitude of the reference frame, the side length of the local neighborhood is taken as 8 to 24 sampling points. On the other hand, when the proportion of locations with gradient magnitudes higher than the boundary sharpness threshold at the same position is greater than 0.4, the local neighborhood side length is set to 8 to 24 sampling points to reduce aliasing caused by dynamic background morphology changes entering the same local neighborhood; when this proportion is between 0.2 and 0.4, the local neighborhood side length is set to 24 to 48 sampling points; when this proportion is less than 0.2, the local neighborhood side length is set to 48 to 64 sampling points. 
By constructing adjacent frame pairs and local neighborhoods, and with consistent definitions of fixed step size, sampling point coordinate semantics, and local neighborhood center position semantics, the calculation of displacement direction and magnitude is limited to a controllable spatial support range, reducing the risk of subsequent threshold distance distortion, and ensuring that the spatial basis for forming difference cues is still preserved when the target is covered by the background main motion optical flow cluster.
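The two side-length rules of [0054] may be sketched as follows. The specification gives each rule separately and does not state how a disagreement between them is resolved; the tie-break used here (keep the smaller range) is purely an assumption, as are the function names and the treatment of the boundary values 0.8, 1.2, 0.2, and 0.4 as belonging to the middle band. The frame mean gradient magnitude is assumed to be non-zero.

```python
def side_range_from_gradient(local_mean_grad, frame_mean_grad):
    """Side-length range from the local-to-frame mean gradient magnitude ratio."""
    ratio = local_mean_grad / frame_mean_grad
    if ratio < 0.8:
        return (48, 64)   # weak local detail: larger support area
    if ratio <= 1.2:
        return (24, 48)
    return (8, 24)        # rich local detail: smaller support area


def side_range_from_sharp_fraction(sharp_fraction):
    """Side-length range from the share of positions above the sharpness threshold."""
    if sharp_fraction > 0.4:
        return (8, 24)    # dense background texture: limit aliasing
    if sharp_fraction >= 0.2:
        return (24, 48)
    return (48, 64)


def choose_side_range(local_mean_grad, frame_mean_grad, sharp_fraction):
    """Hypothetical tie-break: when the two rules disagree, keep the smaller range."""
    return min(side_range_from_gradient(local_mean_grad, frame_mean_grad),
               side_range_from_sharp_fraction(sharp_fraction))
```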
[0055] For each pair of adjacent frames, a local neighborhood correspondence search is performed to obtain candidate displacements for the local neighborhood. For each local neighborhood in the reference frame, a displacement range is set in the comparison frame based on the center position of that local neighborhood. The displacement range is used to limit the set of possible corresponding positions in the comparison frame, reducing erroneous correspondences caused by similar textures in the dynamic background during large-scale searches. The preferred displacement range is 2 to 20 sampling points extended outward from the center position of the reference frame. The selection rule is that when the sampling interval is small and the displacement amplitude of the target in the adjacent frame pair is small, the displacement range is 2 to 8 sampling points; when the sampling interval is large or the displacement amplitude of the target in the adjacent frame pair is large, the displacement range is 8 to 20 sampling points.
[0056] Within the displacement range, the texture similarity between each candidate position and its local neighborhood in the reference frame is calculated. Texture similarity is calculated based on the grayscale values of the local neighborhood after grayscale processing, which reduces the impact of illumination changes on the comparison results. To reduce errors caused by local brightness bias, zero-mean processing and amplitude normalization are performed on the local neighborhoods of the reference frame and the candidate positions in the comparison frame, respectively, so that the grayscale values of both are based on their own mean and limited to a consistent amplitude range. Then, the degree of similarity between the two grayscale values at their corresponding pixel coordinates is calculated, and this similarity is mapped to 0 to 1 as the quantification result of texture similarity. The closer the texture similarity is to 1, the closer the textures are. A similarity threshold is used to constrain the lower limit of texture similarity. The preferred range for the similarity threshold is 0.6 to 0.9. The selection rule is that when the dynamic background texture is dense, the similarity threshold is 0.8 to 0.9; when the frame sequence noise accounts for a higher proportion, the similarity threshold is 0.6 to 0.8.
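The texture similarity of [0056] may be sketched as follows. The specification names zero-mean processing, amplitude normalization, and a mapping to the range 0 to 1 without fixing the similarity measure; the sketch assumes zero-mean normalized cross-correlation mapped linearly from [-1, 1] to [0, 1]. The function name, the flattened-list patch layout, and the return value of 0 for a textureless patch are likewise assumptions.

```python
import math


def texture_similarity(patch_a, patch_b):
    """Zero-mean, amplitude-normalized correlation of two gray patches in [0, 1].

    patch_a, patch_b: equally long flat lists of grayscale values.
    """
    n = len(patch_a)
    mean_a = sum(patch_a) / n
    mean_b = sum(patch_b) / n
    a = [v - mean_a for v in patch_a]          # zero-mean processing
    b = [v - mean_b for v in patch_b]
    norm_a = math.sqrt(sum(v * v for v in a))  # amplitude of each patch
    norm_b = math.sqrt(sum(v * v for v in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # flat patch: no texture to compare (assumed convention)
    ncc = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    return (ncc + 1.0) / 2.0                   # map [-1, 1] to [0, 1]
```

Because each patch is centered on its own mean, a uniform brightness offset between the reference frame and the comparison frame leaves the result unchanged, which is the robustness to local brightness bias that the paragraph describes.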
[0057] Within the displacement range, the candidate position with the highest texture similarity is selected as the corresponding position, provided that its texture similarity is not below the similarity threshold; this yields the displacement candidate for the local neighborhood. When the texture similarity of every candidate position within the displacement range is below the similarity threshold, the displacement candidate of this local neighborhood in the adjacent frame pair is marked with the preset null value, and the local neighborhood is recorded as a local neighborhood to be expanded. In the subsequent processing of adjacent frame pairs, the corresponding position search is re-executed for this local neighborhood using an expanded displacement range, which extends the original displacement range outward by 2 to 10 sampling points. The selection rule is as follows: when the original displacement range is 2 to 8 sampling points, it is expanded by a further 2 to 6 sampling points; when the original displacement range is 8 to 20 sampling points, it is expanded by a further 6 to 10 sampling points. By establishing the correspondence between adjacent frame pairs through the displacement range, grayscale processing, zero-mean processing, amplitude normalization, and the similarity threshold constraint, repeatable displacement candidates can still be obtained in the local neighborhood when the target and the dynamic background move in the same direction with similar velocity amplitudes, reducing the probability that displacement candidates become indistinguishable because they are dominated by similar textures in the dynamic background.
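The search and fallback of [0057] may be sketched as follows. Several points are assumptions rather than part of the specification: `similarity_at` is a hypothetical callback returning the texture similarity for an offset, a displacement range of exactly 8 sampling points is assigned to the smaller band, the smallest expansion in each band is used, and the retry is run immediately for brevity rather than in the subsequent processing of adjacent frame pairs.

```python
def best_displacement(similarity_at, search_range, threshold):
    """Offset with the highest similarity, or None if none reaches the threshold."""
    best, best_sim = None, -1.0
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            s = similarity_at(dx, dy)
            if s > best_sim:
                best, best_sim = (dx, dy), s
    return best if best_sim >= threshold else None  # None stands for the null value


def expansion_band(original_range):
    """Allowed extra extension (min, max) for a given original displacement range."""
    return (2, 6) if original_range <= 8 else (6, 10)


def search_with_expansion(similarity_at, search_range, threshold):
    """Search once, then retry with the smallest allowed expanded range."""
    found = best_displacement(similarity_at, search_range, threshold)
    if found is not None:
        return found, search_range
    low, _ = expansion_band(search_range)
    wider = search_range + low
    return best_displacement(similarity_at, wider, threshold), wider
```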
[0058] Displacement direction and magnitude are calculated based on displacement candidates. For each local neighborhood, the direction of the line connecting the center position of the local neighborhood in the reference frame and the corresponding position in the comparison frame is defined as the displacement direction, and the distance between them is defined as the magnitude. The displacement direction is expressed as an angle, and the magnitude is expressed in sampling points. Sampling points and pixel coordinates use the same semantics, so that the displacement direction and magnitude have consistent spatial meaning under the coordinate index of the frame sequence. To avoid discrete jumps in displacement direction and magnitude caused by dynamic background morphology changes, consistency constraints are applied to the displacement direction and magnitude obtained for the same local neighborhood in multiple consecutive adjacent frame pairs. For the displacement candidate of the current adjacent frame pair, the direction difference and magnitude difference are calculated relative to the displacement candidate of the same local neighborhood in the previous adjacent frame pair. When the direction difference exceeds the first upper limit of the direction difference or the magnitude difference exceeds the first upper limit of the magnitude difference, the displacement direction and magnitude of the current adjacent frame pair are not directly adopted. Instead, an alternative sample set is selected within the vicinity of the center position of the local neighborhood. The alternative sample set consists of the displacement directions and magnitudes obtained from other local neighborhoods within the current adjacent frame pair. The vicinity is defined as the region extending outward from the center of the local neighborhood by 8 to 24 sampling points. 
The selection rule is as follows: when the step size is 1 to 2 sampling points, the vicinity extends 8 to 16 sampling points; when the step size is 2 to 4 sampling points, the vicinity extends 16 to 24 sampling points. Samples corresponding to preset null values are first removed from the alternative sample set. Then, samples whose direction difference and magnitude difference, measured against the displacement candidate of this local neighborhood in the previous adjacent frame pair, do not exceed the first upper limit of the direction difference and the first upper limit of the magnitude difference, respectively, are selected as the consistent sample set.
[0059] When the number of samples in the consistent sample set is not less than 0.6 times the number of valid samples in the alternative sample set, the amplitude of the sample with the median amplitude in the consistent sample set is used as the alternative amplitude, and the displacement direction of the sample with the median angular distance in the consistent sample set is used as the alternative displacement direction. The displacement direction and amplitude of the current adjacent frame pair for this local neighborhood are replaced with the alternative displacement direction and the alternative amplitude. When the number of samples in the consistent sample set is less than 0.6 times the number of valid samples in the alternative sample set, the displacement direction and amplitude of the current adjacent frame pair for this local neighborhood are marked as preset null values, and are handled as preset null values in subsequent processing based on the optical flow vector field sequence. The preferred range for the first upper limit of the direction difference is 10 degrees to 45 degrees, and the preferred range for the first upper limit of the amplitude difference is 1 to 8 sampling points. The selection rules are as follows: when the first sampling interval is 10 milliseconds to 50 milliseconds, the first upper limit of the direction difference is 10 degrees to 25 degrees and the first upper limit of the amplitude difference is 1 to 4 sampling points; when the first sampling interval is 50 milliseconds to 200 milliseconds, the first upper limit of the direction difference is 25 degrees to 45 degrees and the first upper limit of the amplitude difference is 4 to 8 sampling points. 
This processing section replaces discrete and abrupt displacement candidates with neighboring consistent samples that satisfy the consistency relationship through the calculation of displacement direction and amplitude and consistency constraint processing. This reduces the impact of accidental responses caused by local deformation of the dynamic background on the displacement direction and amplitude, and the displacement direction and amplitude still retain a usable continuity basis when the target and the dynamic background motion characteristics are similar.
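The consistency constraint of the preceding paragraphs can be sketched as follows. The function names, the tuple layout `(direction in degrees, magnitude in sampling points)`, and the use of the lower middle sample when the consistent sample set has an even size are assumptions made for illustration; the differences of the neighboring samples are measured against the previous adjacent frame pair of the same local neighborhood, which is one reading of the description.

```python
def angular_distance(a_deg, b_deg):
    """Smallest angle between two directions, in the range 0 to 180 degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def constrain_displacement(prev, curr, neighbors,
                           max_dir_diff=25.0, max_mag_diff=4.0, min_ratio=0.6):
    """Return the accepted (direction, magnitude), a replacement, or None (null value)."""
    def consistent_with_prev(sample):
        return (angular_distance(sample[0], prev[0]) <= max_dir_diff
                and abs(sample[1] - prev[1]) <= max_mag_diff)

    if consistent_with_prev(curr):
        return curr  # no abrupt jump: adopt the current candidate directly
    valid = [s for s in neighbors if s is not None]  # drop preset null values
    consistent = [s for s in valid if consistent_with_prev(s)]
    if not valid or len(consistent) < min_ratio * len(valid):
        return None  # too little support: mark as preset null value
    mid = (len(consistent) - 1) // 2  # assumption: lower middle for even sizes
    by_magnitude = sorted(consistent, key=lambda s: s[1])
    by_direction = sorted(consistent, key=lambda s: angular_distance(s[0], prev[0]))
    return (by_direction[mid][0], by_magnitude[mid][1])


previous = (90.0, 5.0)
abrupt = (200.0, 5.0)
samples = [(88.0, 5.0), (95.0, 6.0), (92.0, 4.0), None, (270.0, 20.0)]
print(constrain_displacement(previous, abrupt, samples))
```

In the usage shown, three of the four valid neighboring samples are consistent with the previous frame pair, so the abrupt candidate is replaced rather than marked as a null value.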
[0060] The displacement direction and magnitude are arranged in the order of adjacent frame pairs to form an optical flow vector field sequence. For each pair of adjacent frames, the displacement direction and magnitude of the local neighborhood are recorded at the center position of each local neighborhood in the reference frame, thus obtaining the optical flow vector field corresponding to that pair of adjacent frames. The optical flow vector field is constructed using a grid sampling method, with a fixed step size for the grid spacing. The fixed step size is consistent with the selection of the center position of the local neighborhood, so that the optical flow vector field only has displacement direction and magnitude at the center position of the local neighborhood, and the displacement direction and magnitude are not written at other pixel coordinate positions. Multiple optical flow vector fields are arranged in the order of adjacent frame pairs in the frame sequence to obtain the optical flow vector field sequence.
[0061] To ensure a consistent spatial reference between the optical flow vector field sequence and the frame sequence, each optical flow vector field in the sequence uses the same coordinate indexing method as the reference frame. The coordinates of the local neighborhood center correspond one-to-one with the image coordinates of the frame sequence, forming a consistent grid coordinate set under the frame sequence coordinate index. Subsequent motion consistency calculations, spatial smoothing, phase marking, micro-deviation residual calculations, and weak deviation candidate region screening are all performed on this grid coordinate set. Distance thresholds involving the expansion of several sampling points are all based on sampling points. Sampling points and pixel coordinates use the same semantics, with sampling points corresponding to pixel grid points in the image coordinates. This ensures that the distance thresholds have a consistent spatial meaning in each step, without introducing unit switching between grid coordinates and pixel coordinates. By organizing the displacement direction and amplitude using a grid sampling method with a fixed step size, an optical flow vector field sequence oriented towards dynamic background motion characteristics is formed. This allows the target to retain expressible motion cues even when its motion is consistent with the main background motion and its velocity amplitude is similar, reducing the risk of missed detection due to inconsistent spatial references.
[0062] The background extraction module is used to extract deformed background pixels from the optical flow vector field sequence, and to perform spatial smoothing on the optical flow vector field sequence using the deformed background pixels as constraints to obtain a background deformation reference field sequence.
[0063] The extraction benchmark for deformed background pixels is determined within the optical flow vector field sequence. Taking the optical flow vector field corresponding to each frame in the sequence as the processing object, at the center position of each local neighborhood of the optical flow vector field, the displacement direction and magnitude of that position are taken as the center vector. A consistent neighborhood is then set around this position, with the preferred range being 2 to 12 sampling points extending outward from the center position. Within the consistent neighborhood, only positions falling on the grid coordinate set of the local neighborhood center position are selected for comparison, ensuring that the statistical caliber of motion consistency is consistent with the grid sampling caliber of the optical flow vector field.
[0064] The displacement direction and amplitude at the center position of each local neighborhood within the consistent neighborhood are compared one by one. The consistency relationship is constrained by the upper limit of the second direction difference and the upper limit of the second amplitude difference. The criteria for determining the consistency relationship are that the direction difference does not exceed the upper limit of the second direction difference and the amplitude difference does not exceed the upper limit of the second amplitude difference. The preferred range for the upper limit of the second direction difference is 10 degrees to 60 degrees, and the preferred range for the upper limit of the second amplitude difference is 1 to 10 sampling points. The selection rule is that when the dynamic background texture is denser, the upper limit of the second direction difference is 10 degrees to 30 degrees and the upper limit of the second amplitude difference is 1 to 5 sampling points; when the dynamic background texture is sparser, the upper limit of the second direction difference is 30 degrees to 60 degrees and the upper limit of the second amplitude difference is 5 to 10 sampling points. The proportion of neighborhoods that satisfy the consistency relationship is defined as the motion consistency degree. Positions with a motion consistency degree lower than the motion consistency threshold are marked as deformed background pixels. The preferred range of the motion consistency threshold is 0.2 to 0.6. The selection rule is that when the dynamic background shape changes more frequently, the motion consistency threshold is 0.4 to 0.6, and when the dynamic background shape changes more simply, the motion consistency threshold is 0.2 to 0.4.
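The motion consistency degree can be illustrated with the sketch below, which stores one optical flow vector field as a dictionary keyed by grid coordinates. The square shape of the consistent neighborhood, the dictionary layout, and the function names are assumptions for illustration only.

```python
def angular_distance(a_deg, b_deg):
    """Smallest angle between two directions, in the range 0 to 180 degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def motion_consistency(field, center, radius, max_dir_diff=30.0, max_mag_diff=5.0):
    """Proportion of grid neighbors whose vector agrees with the center vector.

    field maps (x, y) grid coordinates to (direction_deg, magnitude) or None.
    radius is the outward extent of the consistent neighborhood in sampling points.
    """
    center_vec = field.get(center)
    if center_vec is None:
        raise ValueError("center position has no valid displacement")
    cx, cy = center
    total = agree = 0
    for (x, y), vec in field.items():
        if (x, y) == center or vec is None:
            continue
        if abs(x - cx) > radius or abs(y - cy) > radius:
            continue  # outside the consistent neighborhood (assumed square)
        total += 1
        if (angular_distance(vec[0], center_vec[0]) <= max_dir_diff
                and abs(vec[1] - center_vec[1]) <= max_mag_diff):
            agree += 1
    return agree / total if total else 0.0


field = {
    (0, 0): (90.0, 5.0),
    (4, 0): (92.0, 5.0),
    (0, 4): (180.0, 5.0),
    (4, 4): (95.0, 6.0),
    (40, 40): (90.0, 5.0),  # outside the neighborhood, ignored
}
degree = motion_consistency(field, (0, 0), radius=8)
is_deformed_background = degree < 0.4  # motion consistency threshold
print(degree, is_deformed_background)
```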
[0065] To reduce the probability of background pixel set shifts caused by targets or occlusion boundaries in regions with low motion consistency, a phase fluctuation culling constraint is added to the deformed background pixels. The phase fluctuation culling constraint is based on a phase marker map sequence and a phase window. The phase window is a continuous multi-frame interval in the optical flow vector field sequence, corresponding to a continuous time period in the frame sequence. The length of the phase window is measured in frames, and the start and end frames covered by the phase window are determined according to the chronological order of the frame sequence. The phase window is set using a sliding method. When processing a frame in the optical flow vector field sequence, that frame is taken as the end frame of the phase window, and consecutive frames are taken forward until the length of the phase window is met, forming the phase window corresponding to that frame. When processing the next frame, the phase window slides forward and updates according to the chronological order of the frame sequence, ensuring that the phase window always covers the continuous multi-frame interval preceding the currently processed frame. The preferred range for the phase window is 5 to 120 frames. The selection rule is that when the sampling interval is smaller and the background amplitude of the background deformation reference field sequence changes more frequently between adjacent frames, the phase window is 5 to 30 frames. When the sampling interval is larger or the background amplitude of the background deformation reference field sequence changes more gently, the phase window is 30 to 120 frames.
[0066] Within the phase window, the number of phase marker value transitions at the center position of the same local neighborhood in the phase marker map sequence is counted. The number of phase marker value transitions characterizes the number of times the phase marker value at that position changes between adjacent frames. A transition threshold is set, with an optimal range of 2 to 10 transitions. The selection rule is that when the phase window is 5 to 30 frames, the transition threshold is 2 to 4 transitions; when the phase window is 30 to 120 frames, the transition threshold is 4 to 10 transitions. For positions where the number of phase marker value transitions exceeds the transition threshold, these positions are removed from the deformed background pixels. This prevents these positions from participating in the spatial smoothing sampling constrained by deformed background pixels, thereby reducing the pull of phase fluctuations introduced by the target or occlusion boundary in the temporal dimension on the deformed background pixel set.
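A minimal sketch of the phase fluctuation culling constraint follows. It assumes the phase marker values of each position are available as a per-frame list; the names `count_transitions` and `cull_by_phase_fluctuation` and the data layout are hypothetical.

```python
def count_transitions(phase_values):
    """Number of times the phase marker value changes between adjacent frames."""
    return sum(1 for a, b in zip(phase_values, phase_values[1:]) if a != b)


def cull_by_phase_fluctuation(phase_history, deformed_positions, end_frame,
                              window=30, transition_threshold=4):
    """Keep only deformed background positions whose phase marker is stable.

    phase_history maps a grid position to its list of phase marker values,
    indexed by frame. The phase window ends at end_frame and covers up to
    `window` consecutive frames before it.
    """
    start = max(0, end_frame - window + 1)
    kept = set()
    for pos in deformed_positions:
        values = phase_history[pos][start:end_frame + 1]
        if count_transitions(values) <= transition_threshold:
            kept.add(pos)
    return kept


history = {
    (0, 0): [0, 0, 1, 1, 2, 2, 3, 3],   # 3 transitions: regular background cycle
    (4, 0): [0, 3, 1, 2, 0, 3, 1, 2],   # 7 transitions: fluctuating position
}
print(cull_by_phase_fluctuation(history, {(0, 0), (4, 0)}, end_frame=7, window=8))
```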
[0067] Before setting the residual amplitude threshold and residual direction threshold, the unit of the amplitude residual is first standardized. The amplitude residual is expressed as a sampling point difference, namely the difference between the amplitude and the background amplitude, in units of sampling points. This ensures that the amplitude residual and the background velocity spectrum are on the same amplitude scale, so that the threshold and the residual remain comparable. The background velocity spectrum is obtained from the background deformation reference field sequence. The statistical time period is limited by a phase window. Within each frame covered by the phase window, the local neighborhood center positions that are not marked with preset null values and whose background amplitude was not inherited from the same position in the previous frame are selected, and the background amplitudes at these positions are included in the background amplitude sample set. Then, the amplitude range is divided into intervals according to the first amplitude bin width, and the cumulative proportion is calculated to obtain the background velocity spectrum. For a given quantile, the amplitude interval at which the cumulative proportion on the background velocity spectrum first reaches that quantile is located, and the upper bound of that interval serves as the amplitude value of the quantile.
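The background velocity spectrum and the quantile lookup can be sketched as follows. The function name `quantile_upper_bound` and the half-open bin convention (an amplitude `a` falls into bin `floor(a / bin_width)`) are assumptions; the description does not fix how bin edges are handled.

```python
import math


def quantile_upper_bound(amplitudes, bin_width, quantile):
    """Upper bound of the first amplitude bin whose cumulative proportion reaches `quantile`.

    amplitudes: background amplitude sample set, in sampling points.
    bin_width: first amplitude bin width, in sampling points.
    """
    if not amplitudes:
        raise ValueError("background amplitude sample set is empty")
    counts = {}
    for a in amplitudes:
        idx = int(math.floor(a / bin_width))
        counts[idx] = counts.get(idx, 0) + 1
    total = len(amplitudes)
    cumulative = 0
    for idx in sorted(counts):
        cumulative += counts[idx]
        if cumulative / total >= quantile:
            return (idx + 1) * bin_width
    return (max(counts) + 1) * bin_width  # reached only through rounding


samples = [0.2, 0.7, 1.1, 1.4, 1.8, 2.3, 2.6, 3.1, 3.9, 4.5]
print(quantile_upper_bound(samples, bin_width=1.0, quantile=0.15))
print(quantile_upper_bound(samples, bin_width=1.0, quantile=0.3))
```

With a bin width of 1 sampling point, the cumulative proportions of the example are 0.2, 0.5, 0.7, 0.9, and 1.0, so the 0.15 quantile maps to an upper bound of 1 sampling point and the 0.3 quantile to 2 sampling points.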
[0068] The residual amplitude threshold is taken as the upper bound of the amplitude interval corresponding to the 0.05 to 0.3 quantile of the displacement amplitude in the background velocity spectrum. The selection rule is that when the amplitude distribution of the background amplitude sample set is more dispersed, the residual amplitude threshold is taken at the 0.15 to 0.3 quantile; when the amplitude distribution of the background amplitude sample set is more concentrated, the residual amplitude threshold is taken at the 0.05 to 0.15 quantile. The preferred range for the residual direction threshold is 5 degrees to 30 degrees. The selection rule is that when the residual amplitude threshold is taken at the 0.05 to 0.15 quantile, the residual direction threshold is 5 degrees to 15 degrees; when the residual amplitude threshold is taken at the 0.15 to 0.3 quantile, the residual direction threshold is 15 degrees to 30 degrees. In the micro-deviation residual map sequence, the directional residual uses an angular distance of 0 to 180 degrees, and the amplitude residual uses the sampling point difference. Positions where the amplitude residual exceeds the residual amplitude threshold and the directional residual exceeds the residual direction threshold are marked as weak deviation pixels.
[0069] A consecutive frame count threshold is set, with a preferred range of 2 to 10 frames. The selection rule is that when the phase window value is small, the consecutive frame count threshold is 2 to 4 frames; when the phase window value is large, the consecutive frame count threshold is 4 to 10 frames. When weak deviation pixels appear at the center of the same local neighborhood in a number of consecutive frames that reaches the consecutive frame count threshold, this position is retained for the formation of the weak deviation candidate region map. Simultaneously, to reduce the probability that a target exhibiting continuous deviation in the micro-deviation residual map sequence still enters the deformed background pixel set, a residual continuous elimination constraint is added to the deformed background pixels. The residual continuous elimination constraint is based on the micro-deviation residual map sequence and the phase window. Within the phase window, the number of consecutive frames in which the center position of the same local neighborhood exceeds the residual amplitude threshold or the residual direction threshold is counted. For positions where the number of consecutive frames reaches the consecutive frame count threshold, the position is removed from the deformed background pixels so that the position does not participate in the spatial smoothing sampling constrained by the deformed background pixels. This prevents a continuously deviating position from entering the background deformation reference field sequence as background support, which would otherwise cause a chain error in which the background reference field drifts toward the motion of the target.
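The marking of weak deviation pixels and the consecutive frame count can be sketched as follows. The function names and the per-position list of `(amplitude residual, directional residual)` pairs are hypothetical. The sketch applies the joint condition used for weak deviation pixels; the residual continuous elimination constraint, which accepts either threshold being exceeded, would use `or` in place of `and`.

```python
def is_weak_deviation(amp_residual, dir_residual, amp_threshold, dir_threshold):
    """True when both residuals exceed their thresholds."""
    return amp_residual > amp_threshold and dir_residual > dir_threshold


def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


def persistent_positions(residuals_by_pos, amp_threshold, dir_threshold, min_frames=3):
    """Positions that stay weakly deviated for at least min_frames consecutive frames."""
    kept = set()
    for pos, series in residuals_by_pos.items():
        flags = [is_weak_deviation(a, d, amp_threshold, dir_threshold)
                 for a, d in series]
        if longest_run(flags) >= min_frames:
            kept.add(pos)
    return kept


residuals = {
    (0, 0): [(2.0, 20.0), (2.5, 25.0), (3.0, 30.0), (0.1, 2.0)],  # 3 frames in a row
    (4, 0): [(2.0, 20.0), (0.1, 2.0), (2.0, 20.0), (0.1, 2.0)],   # isolated responses
}
print(persistent_positions(residuals, amp_threshold=1.0, dir_threshold=10.0))
```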
[0070] To reduce the probability of a target being mistakenly included in the deformed background pixel set, a candidate region culling constraint is added to deformed background pixels within the same frame. This constraint is based on a weak deviation candidate region map, which is the candidate region labeling result obtained from the micro-deviation residual map sequence. The weak deviation candidate region map represents the candidate region coverage position under a coordinate index consistent with the optical flow vector field sequence. For a deformed background pixel in that frame, if its coordinates fall within the candidate region coverage position marked by the weak deviation candidate region map of that frame, the corresponding position is removed from the deformed background pixels, ensuring that the candidate region coverage position does not participate in the spatial smoothing sampling constrained by deformed background pixels.
[0071] The interference of boundary regions on deformed background pixels is suppressed using texture gradient map sequences. Using the same frame corresponding to the optical flow vector field sequence as the index, the texture gradient map of that frame is retrieved. Boundary sharpness is calculated at the center of each local neighborhood. Boundary sharpness is characterized by the difference between the gradient magnitude at that location and the gradient magnitude distribution within its consistent neighborhood. Positions with boundary sharpness exceeding a boundary sharpness threshold are marked as excluded pixels. The preferred range for the boundary sharpness threshold is the 0.7 to 0.95 quantile of the gradient magnitude distribution in the texture gradient map. The selection rule is that when there are fewer target contour details, the boundary sharpness threshold is set to the 0.85 to 0.95 quantile; when there are more target contour details, the boundary sharpness threshold is set to the 0.7 to 0.85 quantile. In summary, deformed background pixels are obtained through the motion consistency constraint, culled using the phase marker map sequence, the micro-deviation residual map sequence, and the weak deviation candidate region map, and excluded pixels are then obtained from the texture gradient map sequence. This concentrates the set of deformed background pixels in the shape change areas of the dynamic background, reduces the probability of the target entering the background support set when the target and the dynamic background move in the same direction with similar velocity amplitudes, and mitigates the problem of the target being overwhelmed by the numerical dominance of background pixels.
[0072] A set of constraints for spatial smoothing is formed on the optical flow vector field sequence. For the optical flow vector field corresponding to each frame, deformed background pixels are first retained, and then excluded pixels are removed from the deformed background pixels to obtain smoothing support pixels. Smoothing support pixels are used to limit the participation positions of spatial smoothing, avoiding the introduction of displacement direction and magnitude at the excluded pixels in spatial smoothing, so that the sampling range of spatial smoothing is consistent with the boundary sharpness constraint.
[0073] To ensure the continuity of the coverage of smoothing support pixels in the image, the number of smoothing support pixels within the consistent neighborhood is counted for each local neighborhood center. Locations where this number is below a first support number threshold are temporarily excluded from spatial smoothing sampling. The preferred range for the first support number threshold is 3 to 12. The selection rule is as follows: when the consistent neighborhood expands outward by 2 to 6 sampling points, the first support number threshold is 3 to 6; when the consistent neighborhood expands outward by 6 to 12 sampling points, the first support number threshold is 6 to 12. To reduce the probability of a missing background description due to breaks in the smoothing support pixel set, a fallback sampling rule is set for locations temporarily excluded from spatial smoothing sampling. The fallback sampling rule employs a progressively expanding sampling range. First, the number of smoothing support pixels is counted within a first smoothing range, preferably expanded by 2 to 16 sampling points. If the number of smoothing support pixels is still lower than the first support number threshold, the count is repeated within a second smoothing range, preferably expanded by 16 to 32 sampling points. If the number of smoothing support pixels is still lower than the first support number threshold within the second smoothing range, the background displacement direction and background amplitude at that location in the current frame are taken from the background displacement direction and background amplitude of the same location in the previous frame. The background displacement direction and background amplitude of the previous frame are taken from the background deformation reference field corresponding to the previous frame in the background deformation reference field sequence. A missing frame threshold is set, preferably ranging from 2 to 20 frames. 
The selection rule is that when the phase window value is small, the missing frame threshold is 2 to 6 frames; when the phase window value is large, the missing frame threshold is 6 to 20 frames. If the same position continuously meets the fallback sampling rule and the number of consecutive frames exceeds the missing frame threshold, the position is marked as a preset null value mark so that the position does not participate in coordinate translation sampling when registering the frame sequence with the background deformation reference field sequence in the future.
[0074] By establishing a spatial smoothing constraint set with smoothing support pixels, and combining it with the fallback sampling rule of progressively expanding the sampling range and inheriting the background displacement direction and background amplitude from the previous frame, spatial smoothing has a usable support source in the dynamic background shape change area. At the same time, it reduces the background description loss caused by the breakage of the support set, and reduces the probability of small target areas being absorbed into the background description by the smoothing process or being affected by the void in the background description, resulting in missed detection.
[0075] Spatial smoothing is performed on the optical flow vector field sequence with smoothing support pixels as constraints to obtain a single-frame result of the background deformation reference field sequence. For the optical flow vector field corresponding to each frame, a first smoothing range is set at the center position of each local neighborhood. The preferred range of the first smoothing range is to expand outward by 2 to 16 sampling points. The selection rule is to take 2 to 8 sampling points when the dynamic background shape changes are more fragmented, and to take 8 to 16 sampling points when the dynamic background shape changes are more continuous. Within the first smoothing range, only the displacement direction and amplitude corresponding to the smoothing support pixels are selected as smoothing samples. The values of the displacement direction and amplitude at the middle position are taken as the smoothing result to obtain the background displacement direction and background amplitude at that position. If the number of smoothed samples within the first smoothing range is lower than the second support number threshold, the first smoothing range is expanded to the second smoothing range and resampling is performed. The preferred range for the second smoothing range is 16 to 32 sampling points, and the preferred range for the second support number threshold is 5 to 20. The selection rule is to use a larger second smoothing range and a larger second support number threshold when the proportion of excluded pixels at the center of the local neighborhood is higher. This sub-step forms the background displacement direction and background amplitude under the constraint of smoothed support pixels. 
The background description comes more from the morphological change area of the dynamic background than the target area, thereby reducing the probability of the target motion being diluted by spatial smoothing when it falls into the main motion range of the background, and mitigating the source of missed detection caused by the target being incorporated into the main motion optical flow cluster of the background.
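The spatial smoothing constrained by smoothing support pixels can be sketched as follows. The function name `smooth_at`, the square sampling range, and the component-wise median (which assumes the sampled directions do not straddle the 0/360 degree wrap) are assumptions made for illustration. `statistics.median_low` is used so that the result is always one of the sampled values.

```python
import statistics


def smooth_at(field, support, center, first_range=8, second_range=24, min_support=5):
    """Median background vector at `center`, sampled only from smoothing support pixels.

    field maps (x, y) to (direction_deg, magnitude) or None.
    support is the set of smoothing support pixel coordinates.
    Returns (background_direction, background_magnitude), or None when even the
    second smoothing range holds fewer than min_support samples.
    """
    cx, cy = center
    for radius in (first_range, second_range):
        samples = [
            field[p] for p in support
            if field.get(p) is not None
            and abs(p[0] - cx) <= radius and abs(p[1] - cy) <= radius
        ]
        if len(samples) >= min_support:
            direction = statistics.median_low([s[0] for s in samples])
            magnitude = statistics.median_low([s[1] for s in samples])
            return direction, magnitude
    return None


field = {
    (0, 0): (10.0, 9.0),    # target-like vector, not a support pixel
    (2, 0): (90.0, 5.0), (0, 2): (92.0, 6.0), (2, 2): (88.0, 4.0),
    (-2, 0): (91.0, 5.0), (0, -2): (89.0, 5.0), (20, 0): (95.0, 7.0),
}
support = set(field) - {(0, 0)}
print(smooth_at(field, support, (0, 0), first_range=4, second_range=24, min_support=5))
```

Because the vector at the center is not a smoothing support pixel, it does not influence the background estimate at its own position. A `None` result corresponds to the case in which the background displacement direction and background amplitude are inherited from the previous frame.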
[0076] The background displacement direction and background amplitude of a single frame are organized into a background deformation reference field sequence. For each frame of the optical flow vector field sequence, the aforementioned spatial smoothing process is repeated. Under the same coordinate index as that frame, the background displacement direction and background amplitude obtained at the center position of each local neighborhood are written into the corresponding position, forming the background deformation reference field for that frame. The background deformation reference fields of each frame are arranged in chronological order according to the optical flow vector field sequence to obtain the background deformation reference field sequence. The coordinate index of each frame in the background deformation reference field sequence is consistent with the optical flow vector field sequence, ensuring that the same position has a consistent spatial meaning in different sequences. This sub-step represents the non-rigid morphological changes of the dynamic background in the form of a background deformation reference field sequence, allowing subsequent processing to use this sequence as a reference for background motion. This fundamentally reduces the dominance of the background pixel quantity advantage in motion description and lowers the probability of the target being swallowed up by the background description when the target and the dynamic background motion features are highly similar.
[0077] The sequence filtering module calculates a phase marker map sequence based on the deformed background pixel and background deformation reference field sequence; under the constraint of the phase marker map sequence, it calculates a micro-deviation residual map sequence based on the optical flow vector field sequence and the background deformation reference field sequence; and filters the micro-deviation residual map sequence to obtain a weak deviation candidate region map.
[0078] A phase window is set based on the deformed background pixels and the background deformed reference field sequence to limit the time support range of the phase marker map sequence. The phase window takes multiple consecutive frames in the optical flow vector field sequence, corresponding to consecutive time periods in the frame sequence. The preferred range of the phase window is 5 to 120 frames. The selection rule is as follows: when the sampling interval is small and the background amplitude of the background deformed reference field sequence changes more frequently between adjacent frames, the phase window is 5 to 30 frames to avoid compressing multiple repetitive changes into the same window and reducing phase discrimination; when the sampling interval is large or the background amplitude of the background deformed reference field sequence changes more gently, the phase window is 30 to 120 frames to cover the complete process of at least one local quasi-periodic change. The start and end positions of the phase window slide and update according to the time sequence of the frame sequence. The preferred range of the update step size is 1 to 10 frames. The selection rule is that when the phase window value is small, the update step size is 1 to 3 frames, and when the phase window value is large, the update step size is 3 to 10 frames. This sub-step constrains the time scale of the phase marker map sequence with a phase window, avoiding the aliasing of the dynamic background's repetitive morphological changes into the same state in the time dimension, thereby reducing the risk of masking caused by the background pixels being statistically dominant in time when the target's motion is in the same direction as the dynamic background's motion.
[0079] Within the phase window, the local quasi-periodic phase is calculated at the corresponding positions of deformed background pixels to obtain the phase marking rules for the phase marker map sequence. Based on the background deformation reference field sequence, the frame-by-frame sequence of background displacement direction and background amplitude within the phase window is taken at each deformed background pixel position. The background amplitude difference between adjacent frames is compared in chronological order, and the sign change points of the background amplitude difference are recorded. The sign change points characterize the moments when the background amplitude changes from increasing to decreasing or from decreasing to increasing. The frame interval between two adjacent sign change points of the same type is used as a candidate periodic frame interval. The median of the candidate periodic frame intervals within the phase window is taken as the periodic frame number. The preferred range of the periodic frame number is 6 to 200 frames. The selection rule is: when the sampling interval is small and the sign change point interval is small, the periodic frame number is 6 to 60 frames; when the sampling interval is large or the sign change point interval is large, the periodic frame number is 60 to 200 frames. The number of phases is set, with a preferred range of 4 to 16. The selection rule is that when the periodic frame number is small, the number of phases is 4 to 8, and when the periodic frame number is large, the number of phases is 8 to 16. The periodic frame number is divided into equal intervals according to the number of phases. Within the phase window, each frame is assigned a phase marker value according to the equal interval into which its position relative to the sign change point falls. 
This sub-step establishes phase marker rules based on the dynamic background region defined by deformed background pixels and the background amplitude change sequence of the background deformation reference field sequence. Even when the target and the dynamic background motion characteristics are similar, the reciprocating changes of the dynamic background can still be separated into distinguishable phase markers, thereby reducing the probability that the differences of the dynamic background at different phases are mistakenly compressed into the same background state.
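The period estimation and phase assignment described in this sub-step can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function names, the "peak"/"trough" labels, the treatment of flat steps, and the use of a single reference sign change point (`anchor_frame`) are all introduced here for illustration.

```python
from statistics import median

def sign_change_points(amplitudes):
    """Return (frame_index, kind) wherever the adjacent-frame amplitude
    difference changes sign. kind is 'peak' (increasing -> decreasing)
    or 'trough' (decreasing -> increasing)."""
    points = []
    prev_diff = None
    for i in range(1, len(amplitudes)):
        diff = amplitudes[i] - amplitudes[i - 1]
        if diff == 0:
            continue  # assumption: a flat step keeps the previous sign
        if prev_diff is not None and (diff > 0) != (prev_diff > 0):
            points.append((i - 1, "peak" if prev_diff > 0 else "trough"))
        prev_diff = diff
    return points

def periodic_frame_number(points):
    """Median of the intervals between adjacent sign change points of the same kind."""
    intervals = []
    for kind in ("peak", "trough"):
        frames = [f for f, k in points if k == kind]
        intervals += [b - a for a, b in zip(frames, frames[1:])]
    return median(intervals) if intervals else None

def phase_markers(num_frames, anchor_frame, period, num_phases):
    """Assign each frame the index of the equal sub-interval of the period
    into which its offset from the reference sign change point falls."""
    return [int(((f - anchor_frame) % period) * num_phases // period)
            for f in range(num_frames)]
```

For a background amplitude that rises and falls every 8 frames, the sketch recovers a periodic frame number of 8, and with 4 phases each phase marker value covers 2 consecutive frames.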
[0080] A phase marker map sequence is generated according to the phase marking rules. Using each frame of the optical flow vector field sequence as an index, the phase marker value for that frame is written to the corresponding position of each deformed background pixel under the same coordinate index, resulting in the phase marker map for that frame. Preset default values are written to the positions of non-deformed background pixels, indicating that the position does not participate in the marking of the local quasi-periodic phase. The phase marker maps are arranged in chronological order according to the optical flow vector field sequence, resulting in the phase marker map sequence. The phase marker map sequence and the background deformation reference field sequence use the same coordinate indexing method to ensure that the spatial meaning of the same position is consistent in different sequences. This sub-step solidifies the local quasi-periodic phase of the dynamic background into the image coordinates in the form of a phase marker map sequence. When the target motion falls near the main peak of the background velocity spectrum, the phase marker map sequence can still be used to limit the background state at the same position to a finer phase condition, thereby reducing the risk of foreground submersion caused by indiscriminate background state.
[0081] Under the constraint of the phase marker map sequence, a micro-deviation residual map sequence is calculated based on the optical flow vector field sequence and the background deformation reference field sequence. For each frame in the optical flow vector field sequence, the phase marker values of the phase marker map are used as grouping conditions to select a set of locations with the same phase marker values that are not preset null values. Within this set of locations, the displacement direction and amplitude of the optical flow vector field sequence, and the background displacement direction and amplitude of the background deformation reference field sequence are taken for each location. The directional residual is characterized by the angular distance between the displacement direction and the background displacement direction, with the angular distance limited to 0 degrees to 180 degrees to standardize the measurement of directional differences. The amplitude residual is characterized by the normalized difference between the amplitude and the background amplitude. Normalization uses the background amplitude plus a lower amplitude limit as the scaling benchmark. The preferred range for the lower amplitude limit is 1 to 3 sampling points. The selection rule is as follows: when the background amplitude is closer to 0 sampling points within the corresponding phase of the phase marker sequence, the lower amplitude limit is 2 to 3 sampling points; when the background amplitude is further away from 0 sampling points within the corresponding phase of the phase marker sequence, the lower amplitude limit is 1 to 2 sampling points. The direction residual and the amplitude residual together constitute the micro-deviation residual at that location.
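The two residual components can be illustrated with a short sketch. The function names are hypothetical, directions are assumed to be given in degrees, and taking the absolute value of the amplitude difference is an assumption (the text only specifies a normalized difference).

```python
def direction_residual_deg(flow_dir_deg, bg_dir_deg):
    """Angular distance between two directions, limited to [0, 180] degrees."""
    d = abs(flow_dir_deg - bg_dir_deg) % 360.0
    return 360.0 - d if d > 180.0 else d

def amplitude_residual(flow_amp, bg_amp, amp_floor=1.0):
    """Amplitude difference normalized by the background amplitude plus a
    lower amplitude limit (amp_floor, 1 to 3 sampling points in the text).
    Assumption: the unsigned difference is used."""
    return abs(flow_amp - bg_amp) / (bg_amp + amp_floor)

def micro_deviation_residual(flow_dir_deg, flow_amp, bg_dir_deg, bg_amp, amp_floor=1.0):
    """Pair of (direction residual, amplitude residual) at one location."""
    return (direction_residual_deg(flow_dir_deg, bg_dir_deg),
            amplitude_residual(flow_amp, bg_amp, amp_floor))
```

The lower amplitude limit keeps the scaling benchmark away from zero, so a near-static background phase does not inflate the amplitude residual.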
[0082] To reduce the impact of background morphology changes at deformed background pixels on the proportion of micro-deviation residuals, a residual suppression coefficient is applied to the amplitude residual at the corresponding position of the deformed background pixel, provided that the sampling source of the background deformation reference field sequence is sufficient. Sufficient sampling source means that the position is not a preset null value marker in the current frame, and the background displacement direction and amplitude at that position are obtained from spatial smoothing sampling of smooth support pixels that meet the first support quantity threshold, rather than from inheritance of the background displacement direction and amplitude from the same position in the previous frame. The preferred range for the residual suppression coefficient is 0.2 to 0.8. The selection rule is that within the phase window and in the set of positions with the same phase marker value in the current frame, when the ratio of the number of deformed background pixels to the number of positions not marked by the preset null value is high, the residual suppression coefficient is 0.2 to 0.5; when the ratio is low, the residual suppression coefficient is 0.5 to 0.8. For positions that do not meet the sufficient sampling source condition, the residual suppression coefficient is 1. The micro-deviation residuals at each location are written according to their coordinate indices to obtain the micro-deviation residual map for that frame. These maps are then arranged in chronological order to form a sequence of micro-deviation residual maps. Under the constraint of the phase marker map sequence, the measurement standards for directional and amplitude residuals are standardized, reducing the probability of different phase background states entering the same residual calculation. 
At the same time, the residual suppression coefficient is applied only at locations where the background deformation reference field sequence has a sufficient sampling source, which reduces the risk that mislabeled deformed background pixels suppress the true target residual. As a result, even when the target's motion direction is consistent with the main direction of dynamic background change and its velocity amplitude is similar, distinguishable micro-deviation residual cues are preserved, making it less likely that the target is erased by being processed as a background inlier.
[0083] The micro-deviation residual map sequence is filtered to obtain weak deviation candidate region maps. Before filtering, the statistical scope of the background velocity spectrum is defined. The background velocity spectrum is obtained from the background amplitude distribution of the background deformation reference field sequence. The statistical time range is limited by a phase window. Within each frame covered by the phase window, the center positions of local neighborhoods that are not marked as preset null values and whose background amplitudes do not originate from the same position inherited from the previous frame are selected. The background amplitudes at these positions are added to the background amplitude sample set sequentially. The amplitude bin width is set. The preferred range for the amplitude bin width is 1 to 3 sampling points. The selection rule is that when the interval between the maximum and minimum amplitudes in the background amplitude sample set does not exceed 20 sampling points, the amplitude bin width is 1 sampling point; when the interval exceeds 20 sampling points, the amplitude bin width is 2 to 3 sampling points. The background amplitude sample set is divided into multiple amplitude intervals according to the amplitude bin width. The proportion of the number of samples in each amplitude interval is counted to form a background velocity spectrum indexed by the amplitude interval. The quantile position of the displacement amplitude is obtained on the background velocity spectrum according to the cumulative proportion. The quantile position is characterized by the upper bound of the amplitude interval when the cumulative proportion first reaches the corresponding quantile, which is used to provide a comparable scale for the residual amplitude threshold.
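A minimal sketch of the background velocity spectrum and its quantile position follows. The function names are hypothetical, and starting the first amplitude interval at the smallest sample, together with the small floating-point tolerance, are assumptions made for illustration.

```python
def background_velocity_spectrum(amplitudes, bin_width):
    """Histogram of background amplitudes as proportions per amplitude interval.
    Returns (lowest amplitude, list of proportions)."""
    lo = min(amplitudes)
    n_bins = int((max(amplitudes) - lo) // bin_width) + 1
    counts = [0] * n_bins
    for a in amplitudes:
        counts[int((a - lo) // bin_width)] += 1
    total = len(amplitudes)
    return lo, [c / total for c in counts]

def quantile_position(lo, proportions, bin_width, q):
    """Upper bound of the amplitude interval at which the cumulative
    proportion first reaches quantile q."""
    cum = 0.0
    for i, p in enumerate(proportions):
        cum += p
        if cum >= q - 1e-12:  # tolerance for floating-point accumulation
            return lo + (i + 1) * bin_width
    return lo + len(proportions) * bin_width
```

With samples `[0, 1, 1, 2, 2, 2, 3, 3, 4, 9]` and an amplitude bin width of 1 sampling point, the 0.5 quantile position is the upper bound of the third interval, that is, 3 sampling points.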
[0084] Locations at which weakly deviating pixels appear in consecutive frames for at least the consecutive frame number threshold are retained to form a candidate pixel map. Adjacent locations in the candidate pixel map are expanded, using a proximity distance threshold to define adjacency relationships. The preferred range for the proximity distance threshold is 1 to 6 sampling points. The selection rule is that when the local neighborhood value is small, the proximity distance threshold is 1 to 3 sampling points; when the local neighborhood value is large, the proximity distance threshold is 3 to 6 sampling points. For each expanded region, the maximum gap on the region's boundary is calculated. Regions whose maximum gap does not exceed a closure gap threshold are designated as closable regions. The preferred range for the closure gap threshold is 1 to 8 sampling points. The selection rule is that when the background velocity spectrum distribution is more concentrated, 1 to 4 sampling points are selected; when the background velocity spectrum distribution is more dispersed, 4 to 8 sampling points are selected. Closable regions are marked as candidate regions using coordinate indices consistent with the frame sequence, resulting in a weak deviation candidate region map. This sub-step uses the background velocity spectrum to constrain the residual amplitude threshold, the consecutive frame number threshold to constrain temporal continuity, and the boundary closure constraint to constrain region morphology. This reduces the scattered residuals that dynamic background morphology changes produce in the micro-deviation residual map sequence. The weak deviation residuals generated by a target within the statistical range of the main background motion are then more likely to form a retainable weak deviation candidate region map, which reduces the risk of a drop in recall for targets in a specific velocity range.
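The temporal continuity constraint can be sketched as below. This is an illustration only: the representation of each frame as a set of weakly deviating positions and the reading of the threshold as a minimum run length of consecutive frames are assumptions.

```python
def persistent_positions(weak_masks, min_consecutive):
    """weak_masks: one set of (row, col) weakly deviating positions per frame.
    Returns the positions that appear in at least min_consecutive
    consecutive frames (the candidate pixel map)."""
    run = {}      # current run length per position
    kept = set()
    for mask in weak_masks:
        # positions absent from this frame drop out, which resets their run
        run = {pos: run.get(pos, 0) + 1 for pos in mask}
        kept |= {pos for pos, n in run.items() if n >= min_consecutive}
    return kept
```

A position that flickers on and off never accumulates a long enough run, which is how scattered residuals from background texture are filtered out in this sketch.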
[0085] The sequence registration module is used to: register the frame sequence with the background deformation reference field sequence to obtain the compensation frame sequence; combine the weak deviation candidate region map, the phase marker map sequence and the micro-deviation residual map sequence to generate the candidate region self-aligned field sequence; and use the candidate region self-aligned field sequence to perform region registration on the weak deviation candidate region map corresponding to the compensation frame sequence, obtaining the target support evidence map and the target candidate table.
[0086] A compensated frame sequence is obtained by registering the frame sequence with the background deformation reference field sequence. The first frame of the frame sequence is used as the reference frame, and the coordinate index of the reference frame is used as the unified coordinate index of the compensated frame sequence. For each frame in the frame sequence, the background deformation reference field at the same moment is taken, and the background deformation reference field gives the background displacement direction and background amplitude at the center position of each local neighborhood. For non-reference frames, the pixel coordinates of the frame are translated according to the background displacement direction and background amplitude, and the pixels of the frame are mapped to the coordinate index of the reference frame to obtain the compensated frame of that frame. The coordinate translation adopts a segmented processing: first, the translation is performed at the center position of the local neighborhood, and then the mapped pixel value is determined by the nearest sampling point within the local neighborhood to avoid pixels without a source. For pixel positions that exceed the boundary after mapping, a preset null value mark is assigned to indicate that the position will not participate in the subsequent candidate area processing in this frame. The compensated frames are arranged in chronological order of the frame sequence to form the compensated frame sequence. This sub-step aligns the displacement components of the dynamic background in the frame sequence with the background deformation reference field sequence. The main motion of the dynamic background is compressed to a smaller range in the compensation frame sequence, reducing the obscuring of the target extraction by the background pixel advantage when the target and the dynamic background move in the same direction and have similar velocity amplitudes.
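A simplified sketch of the compensation step is given below. It assumes a backward mapping (each reference coordinate looks up its source pixel in the current frame), which is one way to avoid pixels without a source. The callable `displacement_at`, its `(dx, dy)` return convention in sampling points, and the use of `None` as the preset null value are hypothetical choices for illustration.

```python
NULL = None  # stands in for the preset null value mark

def compensate_frame(frame, displacement_at):
    """frame: 2D list of pixel values.
    displacement_at(r, c): background displacement (dx, dy) taken from the
    local neighborhood that contains reference position (r, c).
    Returns the frame resampled onto the reference frame's coordinate index."""
    h, w = len(frame), len(frame[0])
    out = [[NULL] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            dx, dy = displacement_at(r, c)
            sr, sc = round(r + dy), round(c + dx)  # nearest sampling point
            if 0 <= sr < h and 0 <= sc < w:
                out[r][c] = frame[sr][sc]
            # otherwise the position keeps the null mark and is skipped later
    return out
```

With a uniform background displacement of one sampling point to the right, the compensated frame is the input shifted back by one column, and the last column is marked null.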
[0087] The candidate regions are located under the unified coordinate index of the compensation frame sequence by combining the weak deviation candidate region map. The weak deviation candidate region map already provides the candidate region positions under the frame sequence coordinate index. To maintain spatial consistency with the compensation frame sequence, a coordinate translation rule consistent with the background deformation reference field sequence is applied to the weak deviation candidate region map. For the positions marked as candidate regions in the weak deviation candidate region map, the corresponding background deformation reference field is taken, and the candidate region positions are translated according to the background displacement direction and background amplitude to obtain the set of candidate region positions under the compensation coordinates, forming the compensated weak deviation candidate region map. For discrete points at the boundaries of candidate regions in the weak deviation candidate region map, neighbor-to-neighbor filling is performed using boundary filling distance. The preferred range of boundary filling distance is 1 to 6 sampling points. The selection rule is to use 1 to 3 sampling points when the candidate region area is smaller in the weak deviation candidate region map, and 3 to 6 sampling points when the candidate region area is larger in the weak deviation candidate region map, to reduce the probability of boundary breakage after coordinate translation. This sub-step establishes a consistent coordinate correspondence between the weak deviation candidate area map and the compensation frame sequence, avoiding positional drift of the candidate area under the compensation coordinates, which would cause the candidate area to be replaced by the background area, thereby reducing the risk that the target has formed a weak deviation candidate area map but is covered by the background in subsequent processing.
[0088] A candidate region self-alignment field sequence is generated by combining the compensated weak deviation candidate region map, the phase marker map sequence, and the micro-deviation residual map sequence. For each candidate region in the compensated weak deviation candidate region map, the distribution of phase marker values at the covered location in the phase marker map sequence is taken to determine the main phase marker set for that candidate region. The main phase marker set is used to limit the candidate region self-alignment to only take residual cues under the condition of in-phase conditions. A phase consistency ratio threshold is set, with an optimal range of 0.5 to 0.9. The selection rule is to use 0.5 to 0.7 when the phase window value is small and 0.7 to 0.9 when the phase window value is large, in order to reduce the mixing of cross-phase residuals. For the candidate region location that satisfies the main phase marker set, the direction residual and amplitude residual at that location in the micro-deviation residual map sequence are taken, and the candidate region residual support location set is obtained by filtering according to the residual amplitude threshold and the residual direction threshold. For the set of candidate region residual support locations, the self-alignment direction of the candidate region is determined according to the consistent direction of the directional residuals, and the self-alignment amplitude of the candidate region is determined by taking the value at the middle position of the amplitude residuals, forming a candidate region self-alignment vector. The candidate region self-alignment vector is assigned to the candidate region coverage location to obtain the candidate region self-alignment field of that frame. The candidate region self-alignment fields of each frame are arranged in chronological order to form a candidate region self-alignment field sequence. 
Under the constraint of the phase marker map sequence, this sub-step transforms the weak deviation clues consistent with the candidate region in the micro-deviation residual map sequence into a candidate region self-alignment field sequence. The candidate region still obtains independent self-alignment evidence in the velocity segment close to the main peak of the background velocity spectrum, reducing the probability that the candidate region residuals are diluted by the background residuals and fail to form target evidence.
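The selection of the main phase marker set and the self-alignment amplitude can be sketched as follows. How the phase consistency ratio threshold enters the selection is not spelled out in the text, so the rule used here (add the most frequent phase marker values until their combined share reaches the threshold) is an assumption, as are the function names.

```python
from collections import Counter
from statistics import median

def main_phase_markers(phase_values, ratio_threshold=0.7):
    """phase_values: phase marker values at the positions covered by one
    candidate region (None stands for the preset null value)."""
    valid = [p for p in phase_values if p is not None]
    if not valid:
        return set()
    main, covered = set(), 0
    for phase, count in Counter(valid).most_common():
        main.add(phase)
        covered += count
        if covered / len(valid) >= ratio_threshold:
            break
    return main

def self_alignment_amplitude(amplitude_residuals):
    """Middle value (median) of the amplitude residuals at the residual support positions."""
    return median(amplitude_residuals)
```

Only residuals at positions whose phase marker value belongs to the returned set would then be passed on to the residual amplitude and direction thresholds.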
[0089] The compensated weak deviation candidate region map corresponding to the compensated frame sequence is registered using the candidate region self-aligned field sequence to form an aligned candidate region sequence, and a target support evidence map is constructed. For each frame of the compensated frame sequence, the compensated weak deviation candidate region map at the same time is taken to define the candidate region coverage area. For the pixel coordinates within the candidate region coverage area, the candidate region self-alignment direction and candidate region self-alignment amplitude of the candidate region self-aligned field sequence at that coordinate are taken, and coordinate translation is performed to map the candidate region coverage area of that frame to the candidate region aligned coordinates, resulting in the aligned candidate region. For the aligned candidate region, the gradient magnitude is recalculated to form the texture gradient map sequence corresponding to the compensated frame sequence. A texture consistency threshold and a directional consistency threshold are set. The preferred range for the texture consistency threshold is the 0.1 quantile to the 0.4 quantile of the gradient magnitude distribution in the texture gradient map sequence. The selection rule is to use the 0.2 quantile to the 0.4 quantile when the frame sequence noise ratio is higher, and the 0.1 quantile to the 0.2 quantile when the frame sequence noise ratio is lower. The preferred range for the directional consistency threshold is 5 degrees to 25 degrees. The selection rule is to use a directional consistency threshold of 5 degrees to 15 degrees when the residual direction threshold is small, and 15 degrees to 25 degrees when the residual direction threshold is large. 
For each position within the aligned candidate region, two percentages are counted over the phase window: the percentage of frames in which the difference in texture gradient magnitude does not exceed the texture consistency threshold, and the percentage of frames in which the deviation of the directional residual from the candidate region's self-alignment direction does not exceed the directional consistency threshold. Positions where both percentages meet the percentage threshold are marked as evidence positions. The preferred range for the percentage threshold is 0.5 to 0.9. The selection rule is that when the consecutive frame number threshold is small, the percentage threshold is 0.5 to 0.7; when the consecutive frame number threshold is large, the percentage threshold is 0.7 to 0.9. The evidence positions are written according to the compensation coordinates to form a target support evidence map. This sub-step performs candidate region self-alignment on the weak deviation candidate region map corresponding to the compensation frame sequence, so that the texture gradient and directional residual within the candidate region are more comparable under the same coordinates. The target forms continuous evidence positions in the target support evidence map, whereas a dynamic background is less likely to form evidence positions that meet the percentage threshold under the same processing, thereby reducing the risk of the target being treated as a background inlier and erased.
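The evidence test at a single position can be sketched as follows. The per-frame input lists and the function name are hypothetical, and both lists are assumed to cover the same non-empty set of frames within the phase window.

```python
def is_evidence_position(texture_diffs, direction_devs_deg,
                         texture_thr, direction_thr_deg, pct_thr=0.7):
    """texture_diffs: per-frame texture gradient magnitude differences.
    direction_devs_deg: per-frame deviations of the directional residual
    from the candidate region self-alignment direction, in degrees.
    A position is evidence when both within-threshold frame percentages
    reach pct_thr (0.5 to 0.9 in the text)."""
    n = len(texture_diffs)
    tex_pct = sum(d <= texture_thr for d in texture_diffs) / n
    dir_pct = sum(d <= direction_thr_deg for d in direction_devs_deg) / n
    return tex_pct >= pct_thr and dir_pct >= pct_thr
```

Requiring both percentages to pass means a position with stable texture but an erratic residual direction, which is typical of rippling background, is not marked as evidence.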
[0090] A target candidate table is generated based on the target support evidence map. Region aggregation is performed on the evidence positions in the target support evidence map. Region aggregation uses a proximity distance threshold to limit adjacency relationships and follows the proximity distance threshold selection rule described above: when the local neighborhood side length is 8 to 32 sampling points, the proximity distance threshold is 1 to 3 sampling points; when the local neighborhood side length is 32 to 64 sampling points, the proximity distance threshold is 3 to 6 sampling points. The aggregated regions form an evidence region set. For each evidence region, the evidence region area is calculated, measured in sampling points. Within the phase window, the proportion of evidence positions corresponding to that evidence region is counted for each frame, and the number of frames in which the evidence position proportion meets the proportion threshold is recorded as the duration frame count.
[0091] A lower limit and an upper limit for the candidate area are set. Both limits are tied to the frame sequence resolution so that they are quantified. The preferred range for the lower limit of the candidate area is 9 to 400 sampling points, and the preferred range for the upper limit of the candidate area is 200 to 20,000 sampling points. The upper limit of the candidate area is segmented according to the total number of pixels of the frame sequence resolution. When the frame sequence resolution is no more than 1280 pixels by 720 pixels, the upper limit of the candidate area is 200 to 5,000 sampling points; when the frame sequence resolution is between 1280 pixels by 720 pixels and 1920 pixels by 1080 pixels, the upper limit of the candidate area is 5,000 to 12,000 sampling points; when the frame sequence resolution is not less than 1920 pixels by 1080 pixels, the upper limit of the candidate area is 12,000 to 20,000 sampling points. The lower limit of the candidate area is determined from the historical records of the target candidate table, based on the proportion of the frame occupied by the target. Within consecutive phase windows of the same scene, the lower limit of the candidate area is taken as 0.2 to 0.6 times the minimum bounding box area of the candidate regions entered in the target candidate table in the previous phase window. If the resulting lower limit falls outside the range of 9 to 400 sampling points, it is clamped to 9 or 400 sampling points, whichever bound is nearer. When there is no historical record of the target candidate table in the current scene, the lower limit of the candidate area is determined in segments according to the frame sequence resolution. When the frame sequence resolution is no more than 1280 pixels by 720 pixels, the lower limit of the candidate area is 9 to 60 sampling points. 
When the frame sequence resolution is between 1280 pixels by 720 pixels and 1920 pixels by 1080 pixels, the lower limit of the candidate area is 60 to 180 sampling points. When the frame sequence resolution is not less than 1920 pixels by 1080 pixels, the lower limit of the candidate area is 180 to 400 sampling points.
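The segmentation by resolution and the clamping of the history-based lower limit can be written out as a small sketch. The function names are hypothetical, and the functions return the preferred ranges rather than choosing a single value inside them.

```python
HD_PIXELS = 1280 * 720
FHD_PIXELS = 1920 * 1080

def area_limit_ranges(width, height):
    """Return ((lower_min, lower_max), (upper_min, upper_max)) in sampling
    points, for the case with no target candidate table history."""
    pixels = width * height
    if pixels <= HD_PIXELS:
        return (9, 60), (200, 5000)
    if pixels < FHD_PIXELS:
        return (60, 180), (5000, 12000)
    return (180, 400), (12000, 20000)

def lower_limit_from_history(prev_min_bbox_area, factor=0.4):
    """Lower limit as factor (0.2 to 0.6) times the minimum bounding box
    area from the previous phase window, clamped to [9, 400]."""
    return min(max(factor * prev_min_bbox_area, 9), 400)
```

For example, a previous minimum bounding box area of 100 sampling points with a factor of 0.5 gives a lower limit of 50 sampling points, while very small or very large historical areas are clamped to 9 or 400.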
[0092] Evidence regions whose area falls between the lower and upper limits of the candidate area are retained. For each retained evidence region, the minimum bounding box is calculated, and the main phase marker set, the candidate region self-alignment amplitude range, and the number of consecutive frames corresponding to that evidence region are recorded, forming a target candidate table. Based on the target support evidence map, this sub-step solidifies the continuous evidence regions formed after candidate region alignment into a target candidate table. The target candidate table constrains cases in which the target area is small and the velocity amplitude falls near the main peak of the background velocity spectrum, reducing the probability that scattered residual regions are treated as target candidates while also reducing the probability that real targets are missed because their evidence is sparse.
[0093] The target extraction module is used to set exclusion weights for each candidate region in the target candidate table on the compensation frame sequence, and to remove candidate regions during background processing according to the exclusion weights; based on the target support evidence map and the candidate region self-aligned field sequence, the target box sequence is extracted in the compensation frame sequence, and the target box sequence is mapped to the frame sequence to obtain the target detection result.
[0094] The following describes a specific implementation method that involves setting exclusion weights for each candidate region in the target candidate table on the compensation frame sequence, removing candidate regions during background processing according to the exclusion weights, extracting the target bounding box sequence in the compensation frame sequence based on the target support evidence map and the self-aligned field sequence of the candidate regions, and mapping the target bounding box sequence to the frame sequence to obtain the target detection result.
[0095] Exclusion weights are set for each candidate region based on the target candidate table and the target supporting evidence map. For each candidate region in the target candidate table, the proportion of evidence positions in the target supporting evidence map is taken. The evidence position proportion is the percentage of positions covered by the candidate region that are marked as evidence positions. The number of consecutive frames within the phase window for that candidate region is also taken. The number of consecutive frames is the number of frames for which the corresponding position of the candidate region meets the proportion threshold. The preferred range for setting the exclusion weight is 0 to 1. The selection rules are as follows: when the evidence position proportion is between 0.6 and 1 and the number of consecutive frames is between 0.5 and 1 of the phase window frame count, the exclusion weight is 0.8 to 1; when the evidence position proportion is between 0.3 and 0.6 and the number of consecutive frames is between 0.3 and 0.5 of the phase window frame count, the exclusion weight is 0.4 to 0.8; when the evidence position proportion is between 0.1 and 0.3 or the number of consecutive frames is between 0.1 and 0.3 of the phase window frame count, the exclusion weight is 0.1 to 0.4. The exclusion weights are recorded in the corresponding candidate region entries of the target candidate table, ensuring that the exclusion weights are consistent with the coordinate indices of the candidate regions in the compensation frame sequence. This sub-step assigns values to the exclusion weights using the evidence density and persistence provided by the target support evidence map, making the exclusion weights change in the same direction as the target probability. 
This reduces the probability that a candidate region is treated as a background inlier and absorbed into background processing when the target's motion direction is consistent with the main direction of the background's dynamic change and the velocity amplitudes are close.
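The exclusion weight selection rules can be expressed as a lookup, sketched below. The function name is hypothetical, the half-open treatment of the shared interval boundaries is an assumption, and combinations that the rules do not cover return `None` instead of a guessed range.

```python
def exclusion_weight_range(evidence_ratio, consecutive_frames, window_frames):
    """Return the (min, max) exclusion weight range for one candidate region.
    evidence_ratio: share of covered positions marked as evidence positions.
    consecutive_frames / window_frames: persistence within the phase window."""
    persistence = consecutive_frames / window_frames
    if 0.6 <= evidence_ratio <= 1.0 and 0.5 <= persistence <= 1.0:
        return (0.8, 1.0)
    if 0.3 <= evidence_ratio < 0.6 and 0.3 <= persistence < 0.5:
        return (0.4, 0.8)
    if 0.1 <= evidence_ratio < 0.3 or 0.1 <= persistence < 0.3:
        return (0.1, 0.4)
    return None  # combination not covered by the stated rules
```

Dense and persistent evidence maps to the highest range, so the weight moves in the same direction as the likelihood that the region is a target.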
[0096] Candidate regions are eliminated during background processing based on exclusion weights, and this elimination is performed under the coordinate index of the compensated frame sequence. The background processing procedure is first defined in scope. It includes: calculating motion consistency on the optical flow vector field sequence to mark deformed background pixels; forming smooth support pixels under constraints of deformed background pixels and excluded pixels and performing spatial smoothing sampling to obtain a background deformation reference field sequence; calculating a micro-deviation residual map sequence based on the optical flow vector field sequence and the background deformation reference field sequence under phase marker map sequence constraints; and statistically analyzing the background amplitude sample set according to the phase window to form a background velocity spectrum. In each frame, the above processes use the coordinate index of the compensated frame sequence or a grid coordinate set consistent with the compensated frame sequence as the position index, and all rely on valid positions to provide displacement direction, amplitude, background displacement direction, background amplitude, and phase marker value.
[0097] For each frame in the compensation frame sequence, the set of candidate regions corresponding to that frame is taken from the target candidate table. For each candidate region, the elimination distance is determined from its exclusion weight, with a preferred range of 1 to 12 sampling points. The selection rule is as follows: when the exclusion weight is between 0.8 and 1, the elimination distance is 6 to 12 sampling points; when the exclusion weight is between 0.4 and 0.8, the elimination distance is 3 to 6 sampling points; and when the exclusion weight is between 0.1 and 0.4, the elimination distance is 1 to 3 sampling points. The candidate region boundary is extended outward by the elimination distance to form the elimination coverage area. Positions within the elimination coverage area are uniformly masked. Masking marks the position with the preset null value, so that the position does not participate in motion consistency calculation, smooth support pixel statistics, spatial smoothing sampling, or micro-deviation residual map sequence calculation during background processing. The position is also not added to the background amplitude sample set and therefore contributes no samples to the background velocity spectrum. This sub-step isolates the candidate region and its neighborhood from participation in background processing under the coordinate index of the compensated frame sequence, so that the elimination coverage area shields every sub-process of background processing consistently. This weakens the dominance that the numerical advantage of background pixels has in background processing and reduces the probability that the target is absorbed by background processing and then erased from the target detection result.
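The mapping from the candidate region's weight to an elimination distance, and the outward extension of the region, can be sketched as follows. The function names are hypothetical, and measuring the extension with a square (Chebyshev) neighborhood is an assumption, since the text does not specify the distance metric.

```python
def elimination_distance_range(weight):
    """Elimination distance range in sampling points for a given weight."""
    if 0.8 <= weight <= 1.0:
        return (6, 12)
    if 0.4 <= weight < 0.8:
        return (3, 6)
    if 0.1 <= weight < 0.4:
        return (1, 3)
    return None  # weight outside the ranges given in the text

def elimination_coverage(region, distance, height, width):
    """region: set of (row, col) positions of one candidate region.
    Returns the region extended outward by `distance` sampling points,
    clipped to the frame; these positions get the preset null value."""
    covered = set()
    for r, c in region:
        for dr in range(-distance, distance + 1):
            for dc in range(-distance, distance + 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < height and 0 <= cc < width:
                    covered.add((rr, cc))
    return covered
```

Every background sub-process would then skip the returned positions, so the masked area contributes neither flow statistics nor background amplitude samples.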
[0098] Based on the candidate region self-aligned field sequence, candidate region registration is performed on the candidate region coverage area in the compensation frame sequence to form a set of evidence locations after candidate region alignment. For each frame of the compensation frame sequence, the candidate region self-aligned field sequence and the target support evidence map at the same time are taken. For each candidate region in the target candidate table, the candidate region coverage area is defined. For the coordinates of the evidence locations marked by the target support evidence map within the candidate region coverage area, the candidate region self-alignment direction and candidate region self-alignment amplitude at that coordinate are taken from the candidate region self-aligned field sequence, and the coordinate translation is performed on that coordinate to obtain the position of the evidence location under the candidate region alignment coordinates. For the evidence location sets under the candidate region alignment coordinates obtained in each frame within the phase window for the same candidate region, the correspondence between the location sets is preserved according to the time sequence of the frame sequence, so that the evidence location sets of the same candidate region in different frames are under the same candidate region alignment coordinates. This sub-step uses the candidate region self-aligned field sequence to align the evidence locations within the candidate region under the same coordinates, reducing the disturbance of the background residual on the evidence location distribution when the target is aligned with the main peak of the dynamic background velocity spectrum, making it easier for the target evidence to form a spatially locatable concentrated area.
[0099] Based on the evidence location set after candidate region alignment, the target bounding box sequence is extracted from the compensation frame sequence. For each frame of the compensation frame sequence and each candidate region in the target candidate table, the evidence location set under the candidate region alignment coordinates of that frame is taken, together with the candidate region coverage area under the coordinate index of the compensation frame sequence. The evidence location set under the candidate region alignment coordinates is translated back to the coordinate index of the compensation frame sequence by reverse coordinate translation according to the candidate region self-aligned field sequence, giving the candidate region evidence location set of that frame. The minimum bounding box of the candidate region evidence location set is calculated to obtain the candidate region bounding box of that frame. The bounding box extension distance is then set, with a preferred range of 0 to 10 sampling points. The selection rule is as follows: when the boundary gap of the candidate region evidence location set does not exceed the closure gap threshold, the bounding box extension distance is 0 to 4 sampling points; when the boundary gap is between the closure gap threshold and twice the closure gap threshold, the bounding box extension distance is 4 to 10 sampling points. The candidate region bounding box is expanded by the bounding box extension distance to obtain the target bounding box of that frame. The target bounding boxes of each frame are arranged in chronological order to form the target bounding box sequence. This sub-step extracts the target bounding box sequence from the evidence positions provided by the target support evidence map and the candidate region self-aligned field sequence. Because the target bounding boxes are defined directly by the evidence location set, scattered weak deviation pixels caused by dynamic background texture pull less on box localization, and box drift or missing boxes occur less often when the target motion falls within the statistical range of the main background motion.
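The bounding box construction in this sub-step can be sketched as below. The function names are hypothetical, the sample coordinates are invented for illustration, and clamping the expanded box to the frame is an assumption the text does not state. As in the text, the extension rule yields a range, so a concrete extension distance must still be chosen within it.

```python
def minimum_bounding_box(points):
    """Return (left, top, right, bottom) enclosing all (x, y) evidence positions."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def extension_range(boundary_gap, gap_threshold):
    """Return the (min, max) extension distance in sampling points.

    Gaps above twice the threshold are not covered by the text, so None
    is returned for them.
    """
    if boundary_gap <= gap_threshold:
        return (0, 4)
    if boundary_gap <= 2 * gap_threshold:
        return (4, 10)
    return None


def expand_box(box, extension, width, height):
    """Expand a box on all sides, clamped to the frame (clamping is assumed)."""
    left, top, right, bottom = box
    return (
        max(0, left - extension),
        max(0, top - extension),
        min(width - 1, right + extension),
        min(height - 1, bottom + extension),
    )


evidence = [(862, 423), (905, 452), (880, 430)]  # illustrative positions only
candidate_box = minimum_bounding_box(evidence)
target_box = expand_box(candidate_box, 2, 1920, 1080)
```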
[0100] The target bounding box sequence is mapped to the frame sequence to obtain the target detection result. The target bounding box sequence is under the coordinate index of the compensated frame sequence, and the frame sequence is under the coordinate index of the original image. For each frame of the frame sequence, the background deformation reference field sequence at the same moment is taken. The background deformation reference field gives the background displacement direction and background amplitude at the center position of each local neighborhood. For the target bounding box of the frame, the background displacement direction and background amplitude corresponding to the target bounding box coverage position are taken. The target bounding box boundary points are translated in reverse coordinates according to the background displacement direction and background amplitude to obtain the mapped box of the frame under the coordinate index of the frame sequence. If the boundary point of the mapped box falls within the coverage area of the preset null value mark, boundary back-off is performed along the normal direction of the target bounding box boundary with a step size of 1 sampling point. The preferred range of the upper limit of back-off is 2 to 20 sampling points. The selection rule is that when the elimination distance value is large, the upper limit of back-off is 10 to 20 sampling points, and when the elimination distance value is small, the upper limit of back-off is 2 to 10 sampling points. The bounding boxes of each frame are arranged in chronological order according to the frame sequence to form the target detection results under the frame sequence coordinate index. This sub-step uses the background deformation reference field sequence to establish the coordinate correspondence between the compensated frame sequence and the frame sequence, so that the target box sequence can be returned to the frame sequence for consistent representation. 
This reduces the positional deviation of the detection results caused by the inconsistency of the coordinate system when the dynamic background and the target motion features are close, and ensures the usability of the target detection results in the original observation frame.
[0101] Example 2:
[0102] As shown in Figure 2, this embodiment provides a dynamic background target detection method, including:
[0103] Acquire a sequence of frames; calculate the displacement direction and amplitude from adjacent frames to obtain the optical flow vector field sequence;
[0104] Deformed background pixels are extracted from the optical flow vector field sequence, and spatial smoothing is performed on the optical flow vector field sequence using the deformed background pixels as constraints to obtain the background deformation reference field sequence.
[0105] Based on the deformed background pixels and the background deformation reference field sequence, a phase marker map sequence is calculated; under the constraint of the phase marker map sequence, a micro-deviation residual map sequence is calculated based on the optical flow vector field sequence and the background deformation reference field sequence; the micro-deviation residual map sequence is filtered to obtain a weak deviation candidate region map.
[0106] The frame sequence is registered with the background deformation reference field sequence to obtain the compensation frame sequence; the candidate region self-alignment field sequence is generated by combining the weak deviation candidate region map, phase marker map sequence and micro deviation residual map sequence; the weak deviation candidate region map of the compensation frame sequence is registered with the candidate region self-alignment field sequence to obtain the target support evidence map and the target candidate table.
[0107] An exclusion weight is set for each candidate region in the target candidate table on the compensation frame sequence, and the candidate regions are removed during background processing according to the exclusion weight; the target box sequence is extracted in the compensation frame sequence based on the target support evidence map and the candidate region self-aligned field sequence, and the target box sequence is mapped to the frame sequence to obtain the target detection result.
[0108] Example 3:
[0109] In one feasible embodiment, the camera device is fixedly installed at the observation position. The camera device outputs an image with a resolution of 1920 pixels by 1080 pixels. The camera device continuously acquires a frame sequence at a sampling interval of 50 milliseconds, and the frame sequence covers 200 consecutive frames. The sampling points and pixel coordinates use the same semantics, and the sampling points correspond to pixel coordinate grid points. The frame sequence forms adjacent frame pairs according to the acquisition order. The previous frame is used as the reference frame and the next frame is used as the comparison frame. In the reference frame, multiple local neighborhoods are selected with a fixed step size of 2 sampling points. The local neighborhood is a square region with a side length of 32 sampling points. The center position of the local neighborhood is the pixel coordinate of the geometric center of the local neighborhood. The center position of the local neighborhood falls within the inner region of the reference frame after removing the boundary of 16 sampling points, so that the local neighborhood completely falls within the effective imaging area of the reference frame.
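The sampling grid in this embodiment can be checked with a short calculation. The sketch assumes 0-based pixel indices and reads "the inner region after removing the boundary of 16 sampling points" as the half-open index interval from 16 up to the frame length minus 16; the function name is hypothetical.

```python
def center_coordinates(length, step=2, border=16):
    """Local neighborhood center indices along one axis.

    Assumes 0-based indices and an inner region of [border, length - border).
    """
    return range(border, length - border, step)


xs = center_coordinates(1920)  # horizontal center indices
ys = center_coordinates(1080)  # vertical center indices
centers_per_frame = len(xs) * len(ys)
adjacent_pairs = 200 - 1  # 200 frames form 199 (reference, comparison) pairs
```

Under these assumptions there are 944 horizontal and 524 vertical center positions, so each reference frame carries 494,656 local neighborhoods, and a 32-point neighborhood centered on any of them stays inside the frame.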
[0110] For each pair of adjacent frames, a local neighborhood search is performed to obtain displacement candidates. The displacement range is taken as the center of the local neighborhood of the reference frame, extended outward by 8 sampling points. The texture similarity is calculated for the candidate positions within the displacement range. The texture similarity is calculated using the grayscale values of the local neighborhood after grayscale processing. Zero-mean processing and amplitude normalization are performed on the local neighborhoods of the reference frame and the candidate positions in the comparison frame, respectively. The similarity is then calculated and mapped to 0 to 1 as the texture similarity, with a similarity threshold of 0.82. For positions within the displacement range where the texture similarity is higher than the similarity threshold, the candidate position with the highest texture similarity is selected as the corresponding position to obtain the displacement candidate. When the texture similarity of all candidate positions in a local neighborhood is lower than the similarity threshold in the adjacent frame pair, the displacement candidate of that local neighborhood is marked as a preset null value, and the displacement range is expanded outward by 12 sampling points in subsequent adjacent frame pair processing to re-perform the corresponding position search. The displacement direction and magnitude are obtained from the displacement candidates. The displacement direction and magnitude are recorded at the center position of each local neighborhood of the reference frame to obtain the optical flow vector field corresponding to the adjacent frame pair. The optical flow vector field sequence is obtained by arranging the adjacent frame pairs in order. The optical flow vector field sequence has displacement direction and magnitude only on the grid coordinate set of the center position of the local neighborhood.
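The similarity measure described above (zero-mean processing, amplitude normalization, then mapping to 0 to 1) resembles zero-mean normalized cross-correlation. The sketch below assumes that reading and assumes the mapping is `(ncc + 1) / 2`, which the text does not specify. Patches are flat lists of grayscale values, and all function names are hypothetical.

```python
import math


def texture_similarity(patch_a, patch_b):
    """Zero-mean normalized correlation of two equal-length patches, mapped to [0, 1]."""
    n = len(patch_a)
    mean_a = sum(patch_a) / n
    mean_b = sum(patch_b) / n
    da = [v - mean_a for v in patch_a]
    db = [v - mean_b for v in patch_b]
    norm_a = math.sqrt(sum(v * v for v in da))
    norm_b = math.sqrt(sum(v * v for v in db))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # textureless patch: treated as no match (assumption)
    ncc = sum(a * b for a, b in zip(da, db)) / (norm_a * norm_b)
    ncc = max(-1.0, min(1.0, ncc))  # guard against rounding overshoot
    return (ncc + 1.0) / 2.0


def best_displacement(ref_patch, candidates, threshold=0.82):
    """Return the (dx, dy) with the highest similarity above the threshold.

    `candidates` maps a displacement (dx, dy) to the comparison-frame patch at
    that offset. None plays the role of the preset null value.
    """
    best, best_score = None, threshold
    for displacement, patch in candidates.items():
        score = texture_similarity(ref_patch, patch)
        if score > best_score:
            best, best_score = displacement, score
    return best


def direction_and_magnitude(dx, dy):
    """Displacement direction in degrees [0, 360) and magnitude in sampling points."""
    return (math.degrees(math.atan2(dy, dx)) % 360.0, math.hypot(dx, dy))


ref = [10, 20, 30, 40]
candidates = {(1, 0): [12, 22, 32, 42], (0, 1): [40, 30, 20, 10]}
match = best_displacement(ref, candidates)
```

Subtracting the mean makes the score insensitive to a uniform brightness offset, which is why the patch `[12, 22, 32, 42]` still matches the reference.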
[0111] Deformed background pixels are extracted from the optical flow vector field sequence to form a background deformation reference field sequence. A consistent neighborhood is defined by extending 6 sampling points outward from the center of the local neighborhood. Within the consistent neighborhood, only positions falling on the grid coordinate set of the local neighborhood centers are selected for comparison. The upper limit of the second directional difference is set to 20 degrees, the upper limit of the second magnitude difference is set to 3 sampling points, and the motion consistency threshold is set to 0.5. Positions whose directional difference does not exceed the upper limit of the second directional difference and whose magnitude difference does not exceed the upper limit of the second magnitude difference are denoted as positions satisfying the consistency relationship. The proportion of positions satisfying the consistency relationship is denoted as the motion consistency degree. Positions with a motion consistency degree lower than the motion consistency threshold are marked as deformed background pixels. The texture gradient map sequence is obtained by calculating the gradient magnitude from the frame sequence. The boundary sharpness threshold is set to the 0.9 quantile of the gradient magnitude distribution in the texture gradient map. Positions with boundary sharpness higher than the boundary sharpness threshold are marked as excluded pixels. Smoothing support pixels are obtained by removing deformed background pixels. A first support number threshold of 8 is set, the first smoothing range extends outward by 10 sampling points, and the second smoothing range extends outward by 24 sampling points. When the number of smoothing support pixels in both the first and second smoothing ranges is lower than the first support number threshold, the background displacement direction and amplitude at that location are taken from the same location in the previous frame. 
Positions continuously exceeding a missing frame number threshold are marked as preset null values. The missing frame number threshold is 8 frames. A background deformation reference field is obtained for each frame according to this method, and the background deformation reference field sequence is obtained by arranging them in chronological order.
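The motion consistency degree defined in this paragraph can be sketched as a simple ratio. The sketch assumes each grid position carries a (direction in degrees, magnitude) pair, that null positions are skipped, and that the direction difference is the wrapped angular distance; the function names are hypothetical.

```python
def angular_difference(a_deg, b_deg):
    """Smallest absolute angle between two directions, in [0, 180] degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def motion_consistency(center, neighbors, max_dir_diff=20.0, max_mag_diff=3.0):
    """Proportion of neighbors consistent with the center vector.

    `center` and each neighbor are (direction_deg, magnitude) tuples; neighbors
    equal to None (preset null value) are ignored. Returns None when no valid
    neighbor exists.
    """
    valid = [n for n in neighbors if n is not None]
    if not valid:
        return None
    consistent = sum(
        1
        for direction, magnitude in valid
        if angular_difference(center[0], direction) <= max_dir_diff
        and abs(center[1] - magnitude) <= max_mag_diff
    )
    return consistent / len(valid)


center = (10.0, 5.0)
neighbors = [(15.0, 5.0), (350.0, 6.0), (100.0, 5.0), (12.0, 9.0), None]
degree = motion_consistency(center, neighbors)
# Per the text, a degree lower than the 0.5 threshold marks a deformed background pixel.
is_deformed_background = degree is not None and degree < 0.5
```

In this example two of the four valid neighbors satisfy both limits, so the degree is 0.5, which is not lower than the threshold.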
[0112] The phase marker map sequence and the micro-deviation residual map sequence are calculated within a phase window of 40 frames, with 8 phases. The phase marker map sequence is obtained by determining the phase marker value from the background amplitude change at the corresponding position of the deformed background pixel in the background deformation reference field sequence and writing it into the coordinate index. The micro-deviation residual map sequence is calculated under the constraints of the phase marker map sequence. The directional residual uses an angular distance range of 0 to 180 degrees, and the amplitude residual uses a normalized difference scale based on the background amplitude plus the lower limit of the amplitude, with the lower limit of the amplitude taken as 2 sampling points. When the position satisfies the requirement of sufficient sampling source of the background deformation reference field sequence, a residual suppression coefficient is applied to the amplitude residual at the corresponding position of the deformed background pixel. The residual suppression coefficient is 0.4. The statistical domain of the deformed background pixel proportion is the set of positions with the same phase marker value in the current frame within the phase window. The ratio of the number of deformed background pixels to the number of positions that are not preset null marker values is used to select the residual suppression coefficient. The background velocity spectrum is obtained from the background amplitude sample set of the background deformation reference field sequence within the phase window. The background amplitude sample set only includes positions that are not marked as preset null values and whose background amplitudes are not inherited from the previous frame. The amplitude bin width is 1 sampling point, and the 0.15 quantile of the displacement amplitude corresponds to the upper limit of the amplitude interval, which is 3 sampling points. 
Under the same scale, the residual amplitude threshold is 0.25, the residual direction threshold is 12 degrees, and the consecutive frame number threshold is 3 frames. Weakly deviated pixels are marked on the micro-deviation residual map sequence to form a candidate pixel map. The proximity distance threshold is 2 sampling points, and the closure gap threshold is 3 sampling points. The number of candidate regions obtained in the weak deviation candidate region map corresponding to the 120th frame is 6, and one candidate region covers an area of 156 sampling points.
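The residual definitions and thresholds in this paragraph can be illustrated as follows. Several points are assumptions rather than statements of the text: the amplitude residual is taken as the absolute amplitude difference divided by the background amplitude plus the 2-point lower limit, the suppression coefficient is applied as a multiplier, and a pixel is marked when either residual exceeds its threshold for the required number of consecutive frames (mirroring the condition in claim 6). All names are hypothetical.

```python
def direction_residual(flow_dir_deg, background_dir_deg):
    """Angular distance between flow and background directions, in [0, 180]."""
    d = abs(flow_dir_deg - background_dir_deg) % 360.0
    return min(d, 360.0 - d)


def amplitude_residual(flow_amp, background_amp, floor=2.0, suppression=1.0):
    """Normalized amplitude difference, optionally scaled by a suppression coefficient."""
    return suppression * abs(flow_amp - background_amp) / (background_amp + floor)


def is_weak_deviation(history, amp_threshold=0.25, dir_threshold=12.0, min_frames=3):
    """True if some run of `min_frames` consecutive frames exceeds a threshold.

    `history` is a per-frame list of (direction_residual, amplitude_residual).
    """
    run = 0
    for dir_res, amp_res in history:
        if amp_res > amp_threshold or dir_res > dir_threshold:
            run += 1
            if run >= min_frames:
                return True
        else:
            run = 0
    return False


plain = amplitude_residual(4.0, 2.0)
suppressed = amplitude_residual(4.0, 2.0, suppression=0.4)
```

The lower limit in the denominator keeps the normalized residual bounded where the background amplitude is close to zero.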
[0113] A compensated frame sequence is obtained by registering the frame sequence with the background deformation reference field sequence, using the coordinate index of the first frame. A compensated weak deviation candidate region map is obtained by translating the weak deviation candidate region map according to the same coordinate translation rules. A candidate region self-alignment field sequence is generated by combining the compensated weak deviation candidate region map, the phase marker map sequence, and the micro-deviation residual map sequence. The phase consistency ratio threshold is set to 0.75, the upper limit of the third direction difference is set to 20 degrees, and the direction consistency ratio threshold is set to 0.8. For the aforementioned candidate region with an area of 156 sampling points, the direction consistency ratio in the 120th frame is 0.84, which exceeds the direction consistency ratio threshold of 0.8; the candidate region self-alignment direction is determined to be 27 degrees, and the candidate region self-alignment amplitude is determined to be 2 sampling points. The candidate region self-alignment vector is assigned to the candidate region coverage position to obtain the candidate region self-alignment field. The candidate-region-aligned region sequence is obtained by registering the corresponding region of the weak deviation candidate region map of the compensated frame sequence with the candidate region self-alignment field sequence. Evidence positions are statistically analyzed within the phase window to form a target support evidence map. The texture consistency threshold is set to the 0.2 quantile of the gradient magnitude distribution of the texture gradient map sequence, the orientation consistency threshold is set to 10 degrees, and the proportion threshold is set to 0.7. The evidence position proportion of this candidate region in the target support evidence map is 0.72, which exceeds the proportion threshold of 0.7, and the duration is 28 frames; the exclusion weight recorded in the target candidate table is set to 0.9.
[0114] Candidate regions are eliminated during background processing according to exclusion weights. The elimination distance is calculated by taking 8 sampling points according to the exclusion weights to form an elimination coverage area. The positions within the elimination coverage area are marked with preset null values, so that these positions are not considered as participating positions in motion consistency calculation, spatial smoothing sampling, micro-deviation residual map sequence calculation, and background velocity spectrum statistics. Based on the target support evidence map and the candidate region self-aligned field sequence, the target box sequence is extracted in the compensated frame sequence. The boundary coordinates of the candidate region in the target box sequence of frame 120 are 860 pixels on the left, 420 pixels on the top, 910 pixels on the right, and 455 pixels on the bottom. The target box sequence is maintained for 60 consecutive frames. The target box sequence is mapped to the frame sequence by performing a reverse coordinate translation according to the background deformation reference field sequence. In frame 120, the boundary coordinates of the target detection result under the frame sequence coordinate index are 858 pixels on the left, 418 pixels on the top, 908 pixels on the right, and 453 pixels on the bottom.
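The two sets of boundary coordinates given for frame 120 differ by the same amount on every edge, which is what a pure reverse translation produces. The sketch below makes that arithmetic explicit. The displacement of 2 pixels in x and 2 pixels in y is inferred from the two boxes rather than stated in the text, and the function name is hypothetical.

```python
def reverse_translate(box, dx, dy):
    """Shift a (left, top, right, bottom) box by the negated displacement."""
    left, top, right, bottom = box
    return (left - dx, top - dy, right - dx, bottom - dy)


compensated_box = (860, 420, 910, 455)  # frame 120, compensated frame coordinates
mapped_box = reverse_translate(compensated_box, 2, 2)  # inferred displacement
width = mapped_box[2] - mapped_box[0]
height = mapped_box[3] - mapped_box[1]
```

The mapped box is (858, 418, 908, 453), matching the reported detection result, and the box keeps its 50 by 35 pixel size because a translation does not change its extent.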
[0115] The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in the present invention should be included within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.
[0116] In conclusion, the above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.
Claims
1. A dynamic background target detection method, characterized in that it comprises: acquiring a frame sequence; calculating the displacement direction and magnitude from adjacent frames to obtain an optical flow vector field sequence; extracting deformed background pixels from the optical flow vector field sequence, and performing spatial smoothing on the optical flow vector field sequence with the deformed background pixels as constraints to obtain a background deformation reference field sequence; calculating a phase marker map sequence based on the deformed background pixels and the background deformation reference field sequence; calculating, under the constraint of the phase marker map sequence, a micro-deviation residual map sequence based on the optical flow vector field sequence and the background deformation reference field sequence; filtering the micro-deviation residual map sequence to obtain a weak deviation candidate region map; registering the frame sequence with the background deformation reference field sequence to obtain a compensated frame sequence; generating a candidate region self-aligned field sequence by combining the weak deviation candidate region map, the phase marker map sequence and the micro-deviation residual map sequence; registering the weak deviation candidate region map of the compensated frame sequence with the candidate region self-aligned field sequence to obtain a target support evidence map and a target candidate table; setting an exclusion weight for each candidate region in the target candidate table on the compensated frame sequence, and removing the candidate regions during background processing according to the exclusion weight; and extracting a target bounding box sequence from the compensated frame sequence based on the target support evidence map and the candidate region self-aligned field sequence, and mapping the target bounding box sequence to the frame sequence to obtain the target detection result.
2. The dynamic background target detection method according to claim 1, characterized in that, The frame sequence is formed into adjacent frame pairs according to the acquisition order. The two adjacent frames in the frame sequence are taken as a group. For each group of adjacent frame pairs, multiple local neighborhoods are selected in the reference frame with a preset fixed step size. Define the local neighborhood center position, which is the geometric center pixel coordinate of the local neighborhood in the reference frame pixel coordinate. The local neighborhood center positions are uniformly distributed within the effective imaging area of the reference frame according to a fixed step size. Uniform distribution means that the pixel coordinate spacing between adjacent local neighborhood center positions in the horizontal and vertical directions is equal to the fixed step size.
3. The dynamic background target detection method according to claim 2, characterized in that, Taking the optical flow vector field corresponding to each frame in the optical flow vector field sequence as the processing object, at the center position of each local neighborhood of the optical flow vector field, the displacement direction and amplitude of the center position of the local neighborhood are taken as the center vector, and then a consistent neighborhood is set around the center position of the local neighborhood.
4. The dynamic background target detection method according to claim 3, characterized in that, The displacement direction and magnitude at each local neighborhood center position within the consistent neighborhood are compared one by one. During the comparison, the preset upper limit of the second direction difference and the preset upper limit of the second magnitude difference are used to constrain the consistency relationship. Only the positions within the consistent neighborhood that fall on the grid coordinate set of the local neighborhood center positions participate in the comparison. The consistency relationship is satisfied when the direction difference does not exceed the upper limit of the second direction difference and the magnitude difference does not exceed the upper limit of the second magnitude difference; The proportion of local neighborhood center positions that satisfy the consistency relationship is defined as the motion consistency degree, and the positions with a motion consistency degree lower than the motion consistency threshold are marked as deformed background pixels.
5. The dynamic background target detection method according to claim 2, characterized in that, A phase fluctuation elimination constraint is added to the deformed background pixels; The phase fluctuation elimination constraint is based on the phase marker map sequence and a phase window. Within the phase window, the number of times the phase marker value switches at the same local neighborhood center position in the phase marker map sequence is counted. The phase window is a continuous multi-frame interval in the optical flow vector field sequence. A switching count threshold is set. Positions where the number of phase marker value switches exceeds the switching count threshold are removed from the deformed background pixels.
6. The dynamic background target detection method according to claim 5, characterized in that, Add residual elimination constraints to deformed background pixels; The residual elimination constraint is based on the micro-deviation residual map sequence and phase window. Within the phase window, the number of consecutive frames at the center position of the same local neighborhood that satisfy the condition that the magnitude residual exceeds the preset residual magnitude threshold or the direction residual exceeds the preset residual direction threshold is counted. For positions where the number of consecutive frames reaches a preset threshold, remove those positions from the deformed background pixels.
7. The dynamic background target detection method according to claim 5, characterized in that, A candidate region elimination constraint is added to the deformed background pixels within the same frame; The candidate region elimination constraint is based on the weak deviation candidate region map, which represents the candidate region coverage positions under a coordinate index consistent with the optical flow vector field sequence; For a deformed background pixel in a frame, if its coordinates fall within a candidate region coverage position marked by the weak deviation candidate region map of that frame, the position corresponding to those coordinates is removed from the deformed background pixels.
8. The dynamic background target detection method according to claim 5, characterized in that, The background processing includes the following steps: calculating motion consistency on the optical flow vector field sequence to mark deformed background pixels; extracting deformed background pixels from the optical flow vector field sequence; calculating a micro-deviation residual map sequence based on the optical flow vector field sequence and the background deformation reference field sequence; and statistically analyzing the background amplitude sample set according to the phase window.
9. The dynamic background target detection method according to claim 1, characterized in that, For each candidate region, the elimination distance is determined based on its exclusion weight; The elimination coverage area is formed by extending the elimination distance outward from the candidate region boundary; For positions within the elimination coverage area, a uniform position masking process is performed, which marks the position as a preset null value.
10. A dynamic background target detection system, used to implement the dynamic background target detection method according to any one of claims 1-9, characterized in that it comprises: a sequence acquisition module, used to acquire a frame sequence and to calculate the displacement direction and magnitude from adjacent frames to obtain the optical flow vector field sequence; a background extraction module, used to extract deformed background pixels from the optical flow vector field sequence, and to perform spatial smoothing on the optical flow vector field sequence with the deformed background pixels as constraints to obtain the background deformation reference field sequence; a sequence filtering module, used to calculate a phase marker map sequence based on the deformed background pixels and the background deformation reference field sequence, to calculate, under the constraint of the phase marker map sequence, a micro-deviation residual map sequence based on the optical flow vector field sequence and the background deformation reference field sequence, and to filter the micro-deviation residual map sequence to obtain a weak deviation candidate region map; a sequence registration module, used to register the frame sequence with the background deformation reference field sequence to obtain the compensated frame sequence, to generate the candidate region self-aligned field sequence by combining the weak deviation candidate region map, the phase marker map sequence and the micro-deviation residual map sequence, and to register the weak deviation candidate region map of the compensated frame sequence with the candidate region self-aligned field sequence to obtain the target support evidence map and the target candidate table; and a target extraction module, used to set an exclusion weight for each candidate region in the target candidate table on the compensated frame sequence, to remove the candidate regions during background processing according to the exclusion weight, to extract the target bounding box sequence from the compensated frame sequence based on the target support evidence map and the candidate region self-aligned field sequence, and to map the target bounding box sequence to the frame sequence to obtain the target detection result.