A time-domain integral image sensing method fusing antenna domain information
By using a time-domain integral image sensing method that integrates antenna domain information, the problem of insufficient target detection stability in existing technologies is solved, achieving high-precision target recognition in complex environments, reducing false detection rate and improving detection stability.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- HEFEI NORMAL UNIV
- Filing Date
- 2026-03-27
- Publication Date
- 2026-06-23
AI Technical Summary
Existing technologies struggle to accurately extract target features in complex environments, especially under dynamic changes or weak signals. They suffer from insufficient target detection stability, high false detection rates, and a lack of comprehensive utilization of multi-source information, which affects the continuity and reliability of target recognition.
By using a time-domain integral image sensing method that integrates antenna domain information, time-domain integral image sequences and antenna detection information are obtained. Motion vector fields are calculated using optical flow analysis. Target regions are divided and identified by combining appearance feature sequences and antenna detection information. Joint discriminative feature vectors are constructed for adaptive classification.
It improves the accuracy and stability of target detection, reduces the false detection and false negative rates, and achieves high-precision target recognition in complex environments.
Figure CN122265670A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of image processing and multi-source information fusion sensing technology, specifically to a time-domain integral image sensing method that fuses antenna domain information.
Background Technology
[0002] In the fields of image processing and target recognition technology, with the development of intelligent sensing systems, image data processing methods based on time-domain information have gradually attracted attention. Time-domain integral images, by accumulating pixel information from consecutive time frames, can enhance weak target signals to a certain extent and improve the detectability of targets, and are therefore widely used in target perception scenarios in complex environments.
[0003] However, most existing technologies still rely on single image data for target detection and recognition, and their use of temporal information remains basic. When the target changes dynamically or the signal is weak, it is difficult to extract effective features in a timely and accurate manner. Furthermore, in complex backgrounds or environments with multiple interference sources, target recognition that relies solely on image features is easily affected by illumination changes, occlusion, and noise, leading to insufficient stability and a high false detection rate. In addition, while existing methods use techniques such as optical flow to describe the temporal changes of pixels when processing target motion information, they lack external auxiliary information to constrain the motion features. This makes it difficult to distinguish real targets from interference areas when target motion is discontinuous or changes abnormally, which affects the selection of candidate target regions. Moreover, during target feature analysis, existing technologies typically judge from image appearance features alone and do not make comprehensive use of multi-source information; in particular, they fail to incorporate the spatial signal information acquired by sensing devices such as antennas. As a result, the system cannot make accurate judgments when the target is occluded, deformed, or briefly disappears, which affects the continuity and reliability of target tracking and recognition.
Summary of the Invention
[0004] The purpose of this invention is to provide a time-domain integral image sensing method that fuses antenna domain information, thereby solving the problems existing in the prior art described above.
[0005] To achieve the above objective, the present invention provides the following technical solution: a time-domain integral image sensing method that fuses antenna domain information, comprising:
S1. Acquire the image sequence data after time-domain integration processing, and simultaneously acquire the antenna detection information corresponding to the image sequence. The antenna detection information includes signal strength distribution or spatial direction information.
S2. Use an optical flow analysis method to calculate the motion vector field of each pixel in the image sequence between adjacent time frames. The motion vector field describes the change trend of the pixel in the time dimension.
S3. Based on the antenna detection information, constrain or weight the motion vector field; according to the consistency of the amplitude and direction of the motion vector field, divide the image into connected regions with coherent motion patterns, and use the connected regions as candidate moving target regions.
S4. For each candidate moving target region, extract its appearance feature sequence in the multi-frame data of the image sequence. The appearance feature sequence includes a color histogram, texture features, and edge gradient features.
S5. If the appearance feature sequence shows a continuous change pattern in the time dimension, determine that the features of the candidate moving target region are stable.
S6. If the appearance feature sequence jumps or disappears periodically in the time dimension, combine the signal changes in the antenna detection information to determine whether an object is blocking the target or the target is rapidly deforming.
S7. Identify candidate moving target regions with stable features as potential targets and record their temporal intervals and spatial trajectories throughout the entire image sequence.
[0006] Preferably, S1 includes: Acquire raw photoelectric data in continuous time frames collected by the sensor, and perform time-domain integration on the raw photoelectric data to generate a time-domain integrated image sequence; Antenna detection information is extracted based on the frame synchronization timestamp of the time-domain integral image sequence. The signal intensity distribution is projected onto the time-domain integral image sequence using the spatial direction information in the antenna detection information to obtain composite image sequence data. If the cumulative pixel intensity of a region in the composite image sequence data is higher than the preset background threshold, the acquired image sequence data after time-domain integration processing and the antenna detection information corresponding to the image sequence acquired simultaneously will be output.
[0007] Preferably, S2 includes: Obtain the spatiotemporal gradient information of pixels in adjacent frames of an image sequence to obtain a gradient data set; Based on the gradient data set, optical flow constraint equations are constructed to obtain an underdetermined constraint relationship model, which contains two unknown velocity components. For the underdetermined constraint relationship model, a smoothing constraint term is introduced to construct an energy functional describing the global motion state, and the global energy objective function is obtained. The global energy objective function is minimized to obtain a two-dimensional velocity component matrix. Vector synthesis is then performed on the two-dimensional velocity component matrix to generate a motion vector field describing the changing trend of pixels in the time dimension.
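The energy functional and its minimization in S2 can be written out as a worked formula. The text does not name a specific algorithm; the sketch below assumes the classical Horn-Schunck form, which matches the described combination of an optical flow constraint with a smoothing term. The symbols $I_x, I_y, I_t$ (spatiotemporal gradients), $u, v$ (horizontal and vertical velocity components), $\alpha$ (smoothing weight), and $\bar{u}, \bar{v}$ (local neighborhood averages) are notation introduced here for illustration.

```latex
% Global energy: data term (optical flow constraint) plus smoothness term
E(u, v) = \iint \Big[ (I_x u + I_y v + I_t)^2
        + \alpha^2 \big( \lVert \nabla u \rVert^2 + \lVert \nabla v \rVert^2 \big) \Big] \, dx \, dy

% Iterative minimization (Jacobi-style update at each pixel)
u^{k+1} = \bar{u}^{k} - \frac{I_x \big( I_x \bar{u}^{k} + I_y \bar{v}^{k} + I_t \big)}{\alpha^2 + I_x^2 + I_y^2},
\qquad
v^{k+1} = \bar{v}^{k} - \frac{I_y \big( I_x \bar{u}^{k} + I_y \bar{v}^{k} + I_t \big)}{\alpha^2 + I_x^2 + I_y^2}

% Vector synthesis of the two velocity components into the motion vector field
\lVert \mathbf{w} \rVert = \sqrt{u^2 + v^2}, \qquad \theta = \operatorname{atan2}(v, u)
```

A larger $\alpha$ favors smoother flow fields at the cost of blurring motion boundaries; the appropriate value is not specified in the source.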
[0008] Preferably, S3 includes: Obtain antenna detection information containing detection azimuth data, and map the antenna detection information to construct a spatially distributed weight matrix; The initial motion vector field is weighted by a spatially distributed weight matrix to obtain the modified motion vector field; Calculate the directional consistency coefficient and amplitude change rate of the corrected motion vector field, and generate a vector similarity map based on the directional consistency coefficient and amplitude change rate; The vector similarity map is segmented into independent connected regions using a region growing algorithm, and these independent connected regions are used as candidate motion target regions.
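The weighting and region-growing steps of S3 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the cosine and magnitude-ratio thresholds, the minimum-magnitude cutoff, and the use of 4-connectivity are all assumptions made for the example.

```python
import math
from collections import deque

def weight_flow(u, v, weights):
    """Scale each motion vector by the antenna-derived spatial weight."""
    rows, cols = len(u), len(u[0])
    uw = [[u[r][c] * weights[r][c] for c in range(cols)] for r in range(rows)]
    vw = [[v[r][c] * weights[r][c] for c in range(cols)] for r in range(rows)]
    return uw, vw

def similar(a, b, min_cos=0.9, max_mag_ratio=1.5):
    """Two vectors are coherent if direction and magnitude are both close (assumed thresholds)."""
    ma, mb = math.hypot(*a), math.hypot(*b)
    if ma == 0 or mb == 0:
        return False
    cos = (a[0] * b[0] + a[1] * b[1]) / (ma * mb)
    return cos >= min_cos and max(ma, mb) / min(ma, mb) <= max_mag_ratio

def grow_regions(u, v, min_mag=0.5):
    """Label 4-connected regions of coherent motion by breadth-first region growing."""
    rows, cols = len(u), len(u[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0

    def moving(r, c):
        return math.hypot(u[r][c], v[r][c]) >= min_mag

    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0] or not moving(r0, c0):
                continue
            count += 1
            labels[r0][c0] = count
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if not (0 <= nr < rows and 0 <= nc < cols):
                        continue
                    if labels[nr][nc] or not moving(nr, nc):
                        continue
                    if similar((u[r][c], v[r][c]), (u[nr][nc], v[nr][nc])):
                        labels[nr][nc] = count
                        queue.append((nr, nc))
    return labels, count

# Right-moving block at top left; left-moving pixels in the last column
# fall where the (hypothetical) antenna weight is low and are suppressed.
u = [[1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, -1.0], [0.0, 0.0, 0.0, -1.0]]
v = [[0.0] * 4 for _ in range(3)]
weights = [[1.0, 1.0, 1.0, 0.2] for _ in range(3)]
uw, vw = weight_flow(u, v, weights)
labels, count = grow_regions(uw, vw)
```

In this toy input only the top-left block survives as a candidate region, because the antenna weights push the other moving pixels below the magnitude cutoff.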
[0009] Preferably, S4 includes: Acquire multi-frame data of an image sequence and locate the foreground pixel set; determine the boundary coordinates of the candidate moving target region based on the foreground pixel set. The target sub-image is segmented based on the boundary coordinates, its pixel distribution is statistically analyzed to generate a color histogram, and the target sub-image is converted to grayscale to obtain a grayscale image. The gray-level co-occurrence matrix is calculated based on the gray-level image to construct texture feature data, and the gradient operator is applied to obtain edge gradient features; The color histogram, texture feature data and edge gradient features are concatenated and stitched together to extract the appearance feature sequence of candidate moving target regions in multi-frame data of image sequence.
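The feature extraction in S4 can be illustrated with a small pure-Python sketch. It assumes an RGB patch with integer values 0..255, a horizontal-neighbor gray-level co-occurrence matrix summarized by its contrast, and central differences as the gradient operator; the bin count, quantization levels, and function names are illustrative choices, not taken from the source.

```python
import math

def color_histogram(patch, bins=4):
    """Per-channel histogram of an RGB patch (ints 0..255); each channel sums to 1."""
    hist = [0.0] * (3 * bins)
    n = sum(len(row) for row in patch)
    for row in patch:
        for pixel in row:
            for ch, value in enumerate(pixel):
                hist[ch * bins + min(value * bins // 256, bins - 1)] += 1.0 / n
    return hist

def to_gray(patch):
    """Luma-weighted grayscale conversion (BT.601 weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in patch]

def glcm_contrast(gray, levels=4):
    """Contrast of the co-occurrence matrix for horizontally adjacent pixels."""
    q = [[min(int(p * levels / 256), levels - 1) for p in row] for row in gray]
    glcm = [[0] * levels for _ in range(levels)]
    for row in q:
        for a, b in zip(row, row[1:]):
            glcm[a][b] += 1
    total = sum(map(sum, glcm)) or 1
    return sum((i - j) ** 2 * glcm[i][j] / total
               for i in range(levels) for j in range(levels))

def mean_edge_gradient(gray):
    """Mean gradient magnitude from central differences on interior pixels."""
    mags = [math.hypot((gray[r][c + 1] - gray[r][c - 1]) / 2,
                       (gray[r + 1][c] - gray[r - 1][c]) / 2)
            for r in range(1, len(gray) - 1)
            for c in range(1, len(gray[0]) - 1)]
    return sum(mags) / len(mags) if mags else 0.0

def appearance_features(patch):
    """Concatenate color, texture, and edge descriptors into one feature vector."""
    gray = to_gray(patch)
    return color_histogram(patch) + [glcm_contrast(gray), mean_edge_gradient(gray)]

black, white = (0, 0, 0), (255, 255, 255)
patch = [[black, black, white] for _ in range(3)]   # vertical edge on the right
features = appearance_features(patch)
```

Calling `appearance_features` once per frame for the same candidate region yields the appearance feature sequence that the later stability analysis consumes.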
[0010] Preferably, S5 includes: Acquire multiple frames of images corresponding to consecutive timestamps in the video stream data, and extract appearance feature vectors for candidate moving target regions in the multiple frames of images to generate appearance feature sequences; The inter-frame difference sequence is obtained by calculating the similarity values of adjacent feature vectors in the appearance feature sequence; Numerical differentiation is performed on the inter-frame difference sequence to obtain the gradient magnitude, and a change rate sequence is generated based on the gradient magnitude. If the inter-frame difference sequence is lower than the preset continuity threshold and the maximum gradient magnitude in the rate of change sequence is less than the preset slow change threshold, then the features of the candidate moving target region are determined to be stable.
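The stability test in S5 can be sketched as below. The source does not fix the similarity measure or the thresholds, so this example assumes the inter-frame difference is 1 minus cosine similarity and uses a first difference as the numerical derivative; `continuity_thr` and `slow_change_thr` are hypothetical values.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def is_feature_stable(features, continuity_thr=0.1, slow_change_thr=0.05):
    """True if adjacent feature vectors stay similar and the difference changes slowly."""
    # Inter-frame difference sequence: 1 - cosine similarity of adjacent vectors.
    diffs = [1.0 - cosine_similarity(a, b) for a, b in zip(features, features[1:])]
    # Change-rate sequence: magnitude of the first difference of the diffs.
    rates = [abs(d2 - d1) for d1, d2 in zip(diffs, diffs[1:])]
    return (bool(diffs)
            and max(diffs) < continuity_thr
            and max(rates, default=0.0) < slow_change_thr)

steady = [[1.0, 0.0], [0.99, 0.05], [0.98, 0.10], [0.97, 0.15]]
jumpy = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0], [0.97, 0.15]]
stable_steady = is_feature_stable(steady)
stable_jumpy = is_feature_stable(jumpy)
```

The `steady` sequence drifts gradually and passes both thresholds, while the abrupt third frame in `jumpy` fails the continuity check.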
[0011] Preferably, S6 includes: Acquire video streams of the monitored area and extract texture vectors, then generate appearance feature sequences based on the texture vectors; Calculate the distance difference between texture vectors in the appearance feature sequence. If the distance difference exceeds the threshold or the texture vector returns to zero, mark the abnormal time segment. Radio frequency (RF) signal segments are extracted from antenna detection information based on abnormal time segments, and the signal strength attenuation evolution and Doppler frequency shift characteristics are obtained by analyzing the RF signal segments. If the signal strength attenuation decreases, it is determined that there is an object blocking the signal. If the signal strength attenuation is stable and the Doppler frequency shift characteristics show spectral broadening, then it is determined that there is rapid deformation of the target.
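The decision rule applied to an abnormal time segment in S6 might look like the following sketch. It reads "signal strength attenuation" as a drop in received signal strength over the segment, which is an interpretation of the translated text; the 6 dB drop and the 1.5x broadening ratio are hypothetical thresholds, and the function name is illustrative.

```python
def classify_anomaly(rssi_dbm, doppler_bandwidth_hz, drop_db=6.0, broaden_ratio=1.5):
    """Label an abnormal time segment from its RF samples (oldest first).

    rssi_dbm: received signal strength samples over the segment.
    doppler_bandwidth_hz: Doppler spectral width measured at each sample.
    """
    # A marked drop in signal strength is taken to mean something blocks the target.
    if rssi_dbm[0] - min(rssi_dbm) >= drop_db:
        return "occlusion"
    # Stable strength plus a widening Doppler spectrum suggests rapid deformation.
    baseline = doppler_bandwidth_hz[0]
    if baseline > 0 and max(doppler_bandwidth_hz) >= broaden_ratio * baseline:
        return "rapid_deformation"
    return "undetermined"

occluded = classify_anomaly([-50.0, -55.0, -60.0], [10.0, 10.0, 10.0])
deforming = classify_anomaly([-50.0, -51.0, -50.0], [10.0, 18.0, 22.0])
unclear = classify_anomaly([-50.0, -50.0], [10.0, 11.0])
```

The third return value is an addition for the example: the source only describes the two positive outcomes and does not say what happens when neither condition holds.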
[0012] Preferably, S7 includes: Acquire image sequences and extract initial candidate regions, calculate the region feature vectors of the initial candidate regions, and determine the feature-stable candidate moving target regions by calculating the cosine similarity of the region feature vectors; Obtain the centroid coordinates of candidate moving target regions with stable features and generate spatiotemporal data points; Construct an associated linked list based on spatiotemporal data points to identify potential targets; Extract the time intervals of potential targets and arrange the centroid coordinates to generate spatial trajectories. Record the time intervals and spatial trajectories of potential targets throughout the entire image sequence.
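The trajectory bookkeeping in S7 can be sketched as follows. The association rule (greedy nearest-neighbor linking within `max_jump` pixels) is an assumption for illustration, and plain Python lists stand in for the "associated linked list" mentioned in the text.

```python
import math

def centroid(pixels):
    """Centroid (row, col) of a region given as a list of (row, col) pixels."""
    n = len(pixels)
    return (sum(p[0] for p in pixels) / n, sum(p[1] for p in pixels) / n)

def link_tracks(detections, max_jump=5.0):
    """Chain spatiotemporal points into tracks.

    detections: list of (frame_index, (row, col)) sorted by frame_index.
    A point joins the track whose latest point is nearest, provided that point
    comes from an earlier frame and lies within max_jump pixels.
    """
    tracks = []
    for frame, pos in detections:
        best = None
        for track in tracks:
            last_frame, last_pos = track[-1]
            dist = math.dist(pos, last_pos)
            if last_frame < frame and dist <= max_jump and (best is None or dist < best[0]):
                best = (dist, track)
        if best:
            best[1].append((frame, pos))
        else:
            tracks.append([(frame, pos)])
    return tracks

def summarize(track):
    """Time interval (first frame, last frame) and the ordered spatial trajectory."""
    return (track[0][0], track[-1][0]), [pos for _, pos in track]

detections = [(0, (1.0, 1.0)), (1, (1.5, 2.0)), (1, (20.0, 20.0)), (2, (2.0, 3.0))]
tracks = link_tracks(detections)
interval, trajectory = summarize(tracks[0])
```

Here the distant detection in frame 1 starts a second track, while the three nearby points form one potential target spanning frames 0 to 2.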
[0013] Preferably, it also includes step S8: determining the corresponding minimum bounding rectangle region on each frame of the original image based on the spatial trajectory and time interval of the potential target, thereby completing target localization, specifically including: Obtain the spatial trajectory and time interval of the potential target, and extract the corresponding frame image from the original image sequence according to the time interval; The initial center point is determined by mapping the geographic coordinates of the frame image using the spatial trajectory, and the candidate region corresponding to the initial center point is calculated by combining the motion vectors between adjacent frame images. If the pixel intensity distribution within the candidate region meets a preset threshold, the minimum bounding rectangle of the candidate region is determined by a boundary refinement algorithm to complete target localization on each frame of the original image.
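The final localization step of S8 reduces to finding the smallest rectangle that encloses the pixels accepted for a candidate region. The sketch below assumes an axis-aligned rectangle and a binary mask of pixels that already passed the intensity threshold; the source's boundary refinement algorithm is not specified and is not reproduced here.

```python
def min_bounding_rect(mask):
    """Smallest axis-aligned rectangle (top, left, bottom, right) covering nonzero pixels.

    Returns None when the mask contains no foreground pixels.
    """
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols))

mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
rect = min_bounding_rect(mask)
```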
[0014] Preferably, the method further includes S9: performing cross-modal feature alignment and weighted fusion on the located potential target region image patch and the corresponding antenna feature information to construct a joint discriminative feature vector, and inputting it into a pre-trained convolutional neural network classifier for adaptive classification to obtain the category confidence of the target and complete the target recognition. Specifically, this includes: After obtaining the image patch of the potential target area after localization and the corresponding antenna feature information, the image pixel coordinates and antenna signal intensity distribution are matched by spatial mapping relationship to obtain the initial aligned original multimodal dataset; Extract image patch texture details and antenna feature phase offset from the original multimodal dataset, and use affine transformation matrix to perform cross-modal feature alignment on feature spaces of different dimensions to obtain aligned high-dimensional feature matrix; The feature matrix is dynamically allocated according to the preset signal-to-noise ratio weights. The visual semantic information and electromagnetic spectrum features are deeply interacted through a weighted fusion algorithm to obtain a joint discriminative feature vector containing multi-dimensional attributes. The joint discriminative feature vector is input into a pre-trained convolutional neural network classifier. If the activation value output by the classifier exceeds the preset discrimination threshold, adaptive classification logic is executed based on the joint discriminative feature vector to obtain the target's category confidence and complete the target recognition.
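The SNR-weighted fusion in S9 can be illustrated with a short sketch. Several choices here are assumptions rather than details from the source: weights proportional to linear SNR, L2 normalization of each modality, and fusion by weighted concatenation. The affine cross-modal alignment and the convolutional classifier are outside the scope of this example.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def snr_weights(snr_image_db, snr_antenna_db):
    """Modality weights proportional to linear SNR, summing to 1."""
    lin_image = 10 ** (snr_image_db / 10)
    lin_antenna = 10 ** (snr_antenna_db / 10)
    total = lin_image + lin_antenna
    return lin_image / total, lin_antenna / total

def fuse(image_feat, antenna_feat, snr_image_db, snr_antenna_db):
    """Weighted concatenation of normalized image and antenna features."""
    w_img, w_ant = snr_weights(snr_image_db, snr_antenna_db)
    return ([w_img * x for x in l2_normalize(image_feat)]
            + [w_ant * x for x in l2_normalize(antenna_feat)])

joint = fuse([3.0, 4.0], [0.0, 2.0], snr_image_db=10.0, snr_antenna_db=10.0)
w_img, w_ant = snr_weights(10.0, 0.0)
```

With equal SNR both modalities contribute equally; with a 10 dB advantage the image features receive a weight of 10/11.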
[0015] As can be seen from the above technical solution, the present invention has the following beneficial effects: This time-domain integral image sensing method, which fuses antenna domain information, enhances weak target signals by introducing time-domain integral images and uses antenna detection information to constrain and assist in the discrimination of target motion features, thus achieving effective fusion of image information and spatial signal information. It also improves the accuracy of target extraction in complex scenes through motion vector field analysis and candidate region screening, and strengthens the judgment of target persistence through stability analysis of multi-frame appearance feature sequences. Furthermore, it uses antenna information to assist in identifying occlusion and deformation, reducing false detection and false negative rates. On this basis, it constructs joint discriminative features through cross-modal feature alignment and weighted fusion to achieve high-precision target classification and recognition, thereby improving the stability, accuracy, and robustness of target detection and recognition in complex environments.
Attached Figure Description
[0016] Figure 1 is a schematic diagram of the overall signal transmission of the present invention; Figure 2 is a flowchart of the overall method of the present invention; Figure 3 is a structural block diagram of the local terminal of an exemplary electronic device of the present invention; Figure 4 is a structural block diagram of the network terminal of an exemplary electronic device of the present invention.
Detailed Implementation
[0017] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0018] As shown in Figure 1 and Figure 2, the present invention provides a technical solution: a time-domain integral image sensing method that fuses antenna domain information, comprising:
S1. Obtain image sequence data after time-domain integration processing. The data contains pixel intensity accumulation information of consecutive time frames. Simultaneously, obtain antenna detection information corresponding to the image sequence. The antenna detection information includes signal intensity distribution or spatial direction information.
S2. Use an optical flow analysis method to calculate the motion vector field of each pixel in the image sequence between adjacent time frames. The motion vector field describes the change trend of the pixel in the time dimension.
S3. Based on the antenna detection information, constrain or weight the motion vector field; according to the consistency of the amplitude and direction of the motion vector field, divide the image into connected regions with coherent motion patterns, and use the connected regions as candidate moving target regions.
S4. For each candidate moving target region, extract its appearance feature sequence in the multi-frame data of the image sequence. The appearance feature sequence includes a color histogram, texture features, and edge gradient features.
S5. If the appearance feature sequence shows a continuous and slow change pattern in the time dimension, determine that the features of the candidate moving target region are stable.
S6. If the appearance feature sequence jumps or disappears periodically in the time dimension, combine the signal changes in the antenna detection information to determine whether an object is blocking the target or the target is rapidly deforming.
S7. Identify candidate moving target regions with stable features as potential targets, and record their temporal intervals and spatial trajectories throughout the entire image sequence.
S8. Based on the spatial trajectory and time interval of the potential target, determine its corresponding minimum bounding rectangle region on each frame of the original image to complete the target localization.
S9. Perform cross-modal feature alignment and weighted fusion on the located potential target region image patch and the corresponding antenna feature information to construct a joint discriminative feature vector, and input it into a pre-trained convolutional neural network classifier for adaptive classification to obtain the category confidence of the target and complete the target recognition.
[0019] In this embodiment, stable detection, localization, and identification of targets in complex environments are achieved primarily through joint processing of time-domain integrated image information and antenna detection information. As a preferred embodiment, image sequence data after time-domain integration processing is first acquired. Time-domain integration accumulates pixel intensity within a continuous time window, which enhances the temporal response of weak targets, improves the signal-to-noise ratio under low-light, low-contrast, or interfering background conditions, and makes target information more prominent.
[0020] In this embodiment, furthermore, for the image sequence, an optical flow analysis method is used to calculate the motion vector field of each pixel between adjacent time frames, to characterize the displacement trend, velocity change, and orientation distribution of the pixels in the time dimension. In some embodiments, the optical flow analysis method can be implemented using either a sparse optical flow method or a dense optical flow method to adapt to application scenarios with different resolutions and motion complexities. Through this motion vector field, regions with dynamic response characteristics can be initially separated from the image sequence.
[0021] Furthermore, it should be noted that this application does not rely solely on motion information from visual images for target identification. Instead, it further introduces antenna detection information acquired synchronously with the image sequence to constrain or weight the motion vector field. Thus, the signal strength distribution or spatial orientation information in the antenna detection information can provide additional spatial priors for the moving regions in the image, allowing vectors more consistent with the actual target position and direction of motion to receive higher weights, while noise motion, background disturbances, or false target vectors that do not match the antenna detection results are suppressed. In this way, the accuracy of candidate motion region extraction can be effectively improved.
[0022] In this embodiment, by performing amplitude and direction consistency analysis on the constrained or weighted motion vector field, connected regions with coherent motion patterns are segmented in the image, and these connected regions are used as candidate moving target regions. As can be seen, if the pixel motion within a certain region has high consistency in direction and amplitude, it indicates that the region is more likely to correspond to a real moving object, rather than random noise or local flicker interference. Therefore, this application improves the reliability of the candidate target extraction stage through motion consistency constraints.
[0023] Furthermore, for each candidate moving target region, its appearance feature sequence in multi-frame image data is extracted. This appearance feature sequence includes a color histogram, texture features, and edge gradient features. As a preferred embodiment, the color histogram describes the color distribution characteristics within the region, texture features describe changes in the region's surface structure, and edge gradient features describe changes in the target's contour and local shape. By continuously analyzing the above appearance feature sequence over time, it is possible to further determine whether a candidate region belongs to a real target.
[0024] In another specific embodiment of this application, it can be seen that when the appearance feature sequence corresponding to a candidate moving target region exhibits smooth changes or continuous evolution over multiple consecutive frames, the candidate region can be determined to have high feature stability. This shows that real targets typically maintain a certain degree of appearance continuity within a short timeframe; even with slight attitude changes, their color, texture, and edge information still exhibit relatively stable temporal correlation. Conversely, if the appearance feature sequence shows abrupt changes, intermittent disappearances, or periodic interruptions in the time dimension, the signal changes in the antenna detection information can be combined to further determine whether there is occlusion, rapid deformation, or the target temporarily deviating from the field of view.
[0025] In this embodiment, by jointly determining changes in appearance features and antenna signal changes, the adaptability to complex scenes can be improved. For example, when the features of an image region weaken briefly but the antenna signal remains present, it can be determined that the target may be in a partially occluded state; while when the image features change abruptly and the antenna direction or intensity changes significantly at the same time, it can be inferred that the target may have undergone rapid deformation or a sudden change in motion. Therefore, this application can effectively improve the robustness of continuous target perception.
[0026] After the above processing, candidate moving target regions with stable features are identified as potential targets, and their appearance time intervals and spatial trajectories throughout the image sequence are recorded. Further, based on the trajectory position of the potential target in each frame, its corresponding minimum bounding rectangle region is determined on each frame of the original image to complete target localization. Thus, the time interval information reflects the temporal characteristics of the target's appearance and disappearance, the spatial trajectory information reflects the target's motion path in image space, and the minimum bounding rectangle provides a standardized region input for subsequent recognition.
[0027] Finally, the image blocks of the located potential target regions are aligned and weighted with the corresponding antenna feature information across modal features to construct a joint discriminative feature vector. This vector is then input into a pre-trained convolutional neural network classifier for adaptive classification, thereby obtaining the target category confidence and completing target recognition.
[0028] S1 includes acquiring raw photoelectric data in continuous time frames collected by the sensor, performing time-domain integration on the raw photoelectric data to generate a time-domain integrated image sequence; extracting antenna detection information based on the frame synchronization timestamp of the time-domain integrated image sequence, and projecting the signal intensity distribution onto the time-domain integrated image sequence using the spatial direction information in the antenna detection information to obtain composite image sequence data; if the cumulative pixel intensity value of a region in the composite image sequence data is higher than a preset background threshold, then outputting the acquired image sequence data after time-domain integration processing and the synchronously acquired antenna detection information corresponding to the image sequence.
[0029] In this embodiment, based on the aforementioned embodiments, step S1 mainly achieves the acquisition, enhancement processing, and spatiotemporal alignment of the original sensing data, thereby providing high-quality input data for subsequent motion analysis and target recognition. First, raw photoelectric data of continuous time frames is acquired through an image sensor, where the raw photoelectric data reflects the instantaneous light intensity response of each pixel at discrete sampling times. As a preferred embodiment, a time-domain integration operation is performed on the raw photoelectric data; that is, the light intensity signal at the same pixel location is accumulated frame by frame, with or without per-frame weights, within a preset time window, thereby generating a corresponding time-domain integrated image sequence. In this way, weak signals are superimposed and enhanced in the time dimension, so that low-intensity targets that are difficult to identify in a single frame become visible after integration.
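The sliding-window accumulation described above can be sketched in a few lines. This is a minimal illustration assuming frames are equal-sized 2-D lists of intensities; the function name, the `window` parameter, and the optional per-frame `weights` are hypothetical.

```python
def integrate_time_domain(frames, window, weights=None):
    """Accumulate per-pixel intensity over a sliding window of consecutive frames.

    frames: list of 2-D lists (rows of pixel intensities), all the same shape.
    window: number of frames accumulated into each output frame.
    weights: optional per-frame weights (oldest first); defaults to 1.0 each.
    """
    if weights is None:
        weights = [1.0] * window
    if len(weights) != window:
        raise ValueError("weights must have one entry per frame in the window")
    integrated = []
    for end in range(window, len(frames) + 1):
        chunk = frames[end - window:end]
        rows, cols = len(chunk[0]), len(chunk[0][0])
        acc = [[0.0] * cols for _ in range(rows)]
        for w, frame in zip(weights, chunk):
            for r in range(rows):
                for c in range(cols):
                    acc[r][c] += w * frame[r][c]
        integrated.append(acc)
    return integrated

# A faint target (value 1) at pixel (0, 1) that persists across four frames.
frames = [[[0, 1], [0, 0]] for _ in range(4)]
result = integrate_time_domain(frames, window=3)
```

With a window of three frames, the faint pixel accumulates to three times its single-frame value while the empty background stays at zero, which is the enhancement effect the paragraph describes.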
[0030] In this embodiment, further, after obtaining the time-domain integral image sequence, antenna detection information that matches the time of each frame is extracted from the antenna detection system based on the frame synchronization timestamp corresponding to each frame. It should be noted that, to ensure the accuracy of data fusion, timestamp synchronization can be achieved through a unified clock source or trigger signal, thereby ensuring that image acquisition and antenna detection are performed under the same time reference. Therefore, the method of this application can achieve strict alignment of image data and antenna data in the time dimension, avoiding information mismatch problems caused by time deviations.
[0031] Furthermore, after time alignment, the spatial direction information from the antenna detection data is used to perform spatial mapping processing on the signal intensity distribution. In this embodiment, the direction vector and corresponding signal intensity detected by the antenna are projected onto the pixel coordinate system of the time-domain integral image through a spatial coordinate transformation relationship, thereby forming an intensity distribution corresponding to the electromagnetic signal on the image plane. As a preferred embodiment, this spatial mapping process can be implemented based on system calibration parameters, including the extrinsic parameter matrix and intrinsic parameter mapping relationship between the antenna coordinate system and the image coordinate system. In some embodiments, the projection result can also be smoothed using interpolation methods to improve the continuity of the spatial mapping. Thus, through the above processing, the originally independent antenna signal can be transformed into distribution information consistent with the image space, thereby constructing composite image sequence data.
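The spatial mapping from an antenna direction to image pixel coordinates can be sketched as follows. The example assumes a pinhole camera model and a far-field source, so only the rotation part of the extrinsic parameters is applied; the focal lengths `fx`, `fy` and principal point `cx`, `cy` are hypothetical calibration values.

```python
def project_direction(direction, rotation, fx, fy, cx, cy):
    """Map an antenna-frame direction vector to image pixel coordinates.

    direction: (x, y, z) direction detected by the antenna.
    rotation: 3x3 rotation from the antenna frame to the camera frame.
    Returns (column, row) in pixels, or None if the direction points
    behind the camera.
    """
    # Rotate the direction from the antenna frame into the camera frame.
    x, y, z = (sum(rotation[i][j] * direction[j] for j in range(3)) for i in range(3))
    if z <= 0:
        return None
    # Perspective division followed by the intrinsic mapping.
    return (fx * x / z + cx, fy * y / z + cy)

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
pixel = project_direction((0.1, -0.05, 1.0), identity, fx=800, fy=800, cx=320, cy=240)
behind = project_direction((0.0, 0.0, -1.0), identity, fx=800, fy=800, cx=320, cy=240)
```

Writing the measured signal intensity at each projected pixel, and interpolating between them, would give the image-plane intensity distribution that is combined with the integrated image.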
[0032] In addition, in this embodiment, after obtaining the composite image sequence data, pixel intensity cumulative value analysis is performed on specific regions within it. Specifically, the pixel intensity of each preset region or sliding window region in the composite image can be statistically calculated to obtain the corresponding cumulative intensity value. As a preferred embodiment, this cumulative value is compared with a preset background threshold, which can be obtained by statistically modeling historical background data, such as an adaptive threshold model based on mean and variance. As can be seen from the above, when the pixel intensity cumulative value of a certain region is higher than the background threshold, it indicates that the region exhibits significant enhancement features in both visual and antenna signals, thereby determining that the region contains potentially effective target information.
[0033] In another specific embodiment of this application, the background threshold can be dynamically adjusted according to the overall changes in the antenna signal. For example, when the antenna detects an overall signal enhancement, the threshold can be appropriately increased to suppress false detections; conversely, in a weak signal environment, the threshold can be decreased to improve detection sensitivity. Therefore, the method of this application can achieve an adaptive data filtering mechanism under different environmental conditions.
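The adaptive background threshold can be sketched as a mean-plus-k-standard-deviations rule with a scale factor driven by the overall antenna signal level. The factor `k`, the `antenna_scale` parameter, and the function names are assumptions for illustration; the source only states that the threshold rises with overall signal enhancement and falls in weak-signal conditions.

```python
from statistics import mean, pstdev

def background_threshold(history, k=3.0, antenna_scale=1.0):
    """Threshold from past background sums: (mean + k * std) * antenna_scale."""
    return (mean(history) + k * pstdev(history)) * antenna_scale

def region_passes(region, threshold):
    """True if the cumulative pixel intensity of the region exceeds the threshold."""
    return sum(sum(row) for row in region) > threshold

# Cumulative intensities of past background windows (illustrative values).
history = [10.0, 12.0, 11.0, 9.0, 13.0]
thr = background_threshold(history)
thr_strong = background_threshold(history, antenna_scale=1.5)

region = [[5, 5], [5, 5]]          # cumulative intensity 20
passes_normal = region_passes(region, thr)
passes_strong = region_passes(region, thr_strong)
```

The same region passes under the normal threshold but is rejected once the threshold is raised for a strong overall antenna signal, which shows how the scaling suppresses false detections.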
[0034] Finally, in this embodiment, when it is determined that a specific region in the composite image sequence data meets the intensity threshold condition, the corresponding time-domain integrated image sequence data and the antenna detection information acquired synchronously with the image sequence are output.
[0035] S2 includes acquiring the spatiotemporal gradient information of pixels in adjacent frames of an image sequence to obtain a gradient data set; constructing an optical flow constraint equation based on the gradient data set to obtain an underdetermined constraint relationship model, which contains two unknown velocity components; introducing a smoothing constraint term for the underdetermined constraint relationship model to construct an energy functional describing the global motion state, thus obtaining a global energy objective function; performing a minimization derivation of the global energy objective function to obtain a two-dimensional velocity component matrix; and performing vector synthesis on the two-dimensional velocity component matrix to generate a motion vector field describing the changing trend of pixels in the time dimension.
[0036] In this embodiment, based on the aforementioned embodiments, step S2 is mainly used to extract pixel-level motion information from the image sequence and construct a motion vector field describing the changing trend of each pixel in the time dimension, thereby providing a basis for the subsequent division of candidate target regions. First, the spatiotemporal gradient information of pixels in adjacent frames of the image sequence is obtained to obtain a gradient data set. Specifically, for any two adjacent frames in the image sequence, the grayscale change of each pixel in the horizontal, vertical, and temporal directions can be calculated respectively. The horizontal gradient reflects the brightness change of the pixel along the horizontal direction of the image, the vertical gradient reflects the brightness change of the pixel along the vertical direction of the image, and the temporal gradient reflects the brightness change of the same pixel between two consecutive frames. Thus, it can be seen that the original brightness changes in the image can be converted into a quantifiable gradient data set through the method of this application.
[0037] Furthermore, in this embodiment, an optical flow constraint equation is constructed from the gradient data set to obtain an underdetermined constraint relationship model. Specifically, the optical flow constraint equation rests on the brightness constancy assumption, that is, the brightness value of the same target point remains essentially unchanged during short-term continuous movement. Under this assumption, the spatial gradients of a pixel can be related to its temporal gradient, yielding a constraint expression that contains the velocity components. As a preferred embodiment, the velocity components comprise the horizontal and vertical velocity components of the pixel in the image plane. Because the brightness constancy relationship usually yields only one constraint equation per pixel while two velocity components are unknown, the resulting constraint relationship model is underdetermined.
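The text does not write the constraint out. In the standard formulation, with notation introduced here for illustration ($I$ for image brightness, $I_x, I_y, I_t$ for the horizontal, vertical, and temporal gradients, and $u, v$ for the horizontal and vertical velocity components), brightness constancy and a first-order Taylor expansion give:

$$
I(x + u\,\Delta t,\; y + v\,\Delta t,\; t + \Delta t) = I(x, y, t)
\quad\Longrightarrow\quad
I_x\,u + I_y\,v + I_t = 0 .
$$

This is one linear equation in the two unknowns $u$ and $v$ at each pixel, which is why the model is underdetermined.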
[0038] Furthermore, a smoothing constraint term is introduced into the underdetermined constraint relationship model to construct an energy functional describing the global motion state, resulting in a global energy objective function. In this embodiment, to address the problem of insufficient constraints on individual pixels, it is further assumed that the motion states of adjacent pixels in the image have a certain continuity within a local range, meaning that adjacent pixels typically have similar velocity distributions. Based on this continuity assumption, a smoothing constraint term is added to the original optical flow constraint relationship to limit the abrupt changes in velocity components within the spatial neighborhood. As a preferred embodiment, the energy functional includes at least a data consistency term and a smoothing constraint term. The data consistency term measures the degree to which the velocity components satisfy the optical flow constraint equation, while the smoothing constraint term measures the smoothness of velocity changes between adjacent pixels. By combining these two parts, an energy objective function describing the global motion state of the entire image can be obtained.
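The text does not give the functional explicitly. One widely used functional of this shape is the Horn-Schunck form, shown here only as an illustrative instance; the weighting parameter $\alpha$ and the symbols $I_x, I_y, I_t$ (spatiotemporal gradients) and $u, v$ (velocity components) are notation assumed for this sketch:

$$
E(u, v) = \iint \Big[ \underbrace{(I_x\,u + I_y\,v + I_t)^2}_{\text{data consistency term}}
 + \alpha^2 \underbrace{\big( \lVert \nabla u \rVert^2 + \lVert \nabla v \rVert^2 \big)}_{\text{smoothing constraint term}} \Big] \, dx \, dy .
$$

A larger $\alpha$ favours smoother velocity fields, while a smaller $\alpha$ lets the solution follow the per-pixel constraint more closely.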
[0039] Furthermore, in this embodiment, the global energy objective function is minimized to obtain a two-dimensional velocity component matrix. Specifically, the energy objective function can be solved by variational methods or by discretized iteration to obtain the velocity solution that minimizes it. This yields the horizontal and vertical velocity components of every pixel in the image, which together form the two-dimensional velocity component matrix: one matrix represents the horizontal motion of each pixel and the other represents the vertical motion. In some embodiments, to improve the stability and accuracy of the solution, iteration termination conditions, convergence thresholds, or multi-scale initial values can be set for the solution process. Thus, the method of this application turns the abstract global energy optimization into a pixel-by-pixel two-dimensional velocity solution.
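As an illustration of a discretized iteration, and assuming an energy with a squared optical flow residual plus an $\alpha^2$-weighted smoothness term (the Horn-Schunck form, which the text does not name), setting the first variation to zero and approximating the Laplacian by the difference between a local average and the pixel value gives the classical update. Here $\bar{u}^{k}, \bar{v}^{k}$ denote neighbourhood averages of the current estimates, and $I_x, I_y, I_t$ are the spatiotemporal gradients; all symbols are notation assumed for this sketch:

$$
u^{k+1} = \bar{u}^{k} - \frac{I_x \big( I_x \bar{u}^{k} + I_y \bar{v}^{k} + I_t \big)}{\alpha^2 + I_x^2 + I_y^2},
\qquad
v^{k+1} = \bar{v}^{k} - \frac{I_y \big( I_x \bar{u}^{k} + I_y \bar{v}^{k} + I_t \big)}{\alpha^2 + I_x^2 + I_y^2}.
$$

The iteration can stop when the largest per-pixel change between $k$ and $k+1$ falls below a convergence threshold or when a maximum iteration count is reached.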
[0040] In another specific embodiment of this application, after obtaining the two-dimensional velocity component matrix, vector synthesis is performed on the two-dimensional velocity component matrix to generate a motion vector field describing the changing trend of pixels in the time dimension. Specifically, the horizontal velocity component and the vertical velocity component corresponding to the same pixel position are combined to obtain the two-dimensional motion vector of that pixel; further, the amplitude and direction of this motion vector can be calculated, where the amplitude is used to characterize the strength of pixel motion, and the direction is used to characterize the orientation of pixel motion. In this way, the motion vectors of all pixels in the entire image together constitute the motion vector field. Thus, the brightness change relationship between adjacent frames is ultimately transformed into an intuitive expression of pixel-level motion trends.
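The vector synthesis step can be sketched as below, assuming the two velocity component matrices are available as nested lists `u` and `v` (hypothetical names) and that direction is expressed in radians.

```python
import math


def vector_field(u, v):
    """Combine per-pixel horizontal (u) and vertical (v) velocity components
    into (magnitude, direction) pairs; direction is atan2(v, u) in radians."""
    return [
        [(math.hypot(ux, vy), math.atan2(vy, ux)) for ux, vy in zip(u_row, v_row)]
        for u_row, v_row in zip(u, v)
    ]


u = [[3.0, 0.0], [-1.0, 0.0]]
v = [[4.0, 2.0], [0.0, 0.0]]
field = vector_field(u, v)
```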
[0041] In this embodiment, as can be seen from the above, this scheme forms a complete motion estimation process: first, the spatiotemporal gradient information of pixels in adjacent frames is obtained; next, the optical flow constraint equation is established from the gradient data; then, the underdetermined problem is converted into a global energy optimization problem by introducing the smoothing constraint; and finally, the two-dimensional velocity components are obtained by minimization and combined by vector synthesis to output the motion vector field.
[0042] S3 includes acquiring antenna detection information containing detection azimuth data, mapping the antenna detection information to construct a spatially distributed weight matrix; applying weighted constraints to the initial motion vector field using the spatially distributed weight matrix to obtain a modified motion vector field; calculating the directional consistency coefficient and amplitude change rate of the modified motion vector field, and generating a vector similarity map based on the directional consistency coefficient and amplitude change rate; performing a region growing algorithm on the vector similarity map to segment independent connected regions, and using the independent connected regions as candidate motion target regions.
[0043] In this embodiment, based on the aforementioned embodiments, step S3 is used to introduce antenna detection information constraints on the initial motion vector field and to extract candidate moving target regions through vector consistency analysis. First, antenna detection information containing detection azimuth data is acquired. This detection azimuth data characterizes the spatial pointing information of the antenna response to the target and corresponds to the signal strength distribution in different directions. In this embodiment, based on the spatial correspondence between the antenna coordinate system and the image coordinate system, the detection azimuth data is mapped to the image space coordinates, and the signal strength in each direction is allocated according to pixel position, thereby constructing a spatial distribution weight matrix that corresponds one-to-one with the image pixels. Therefore, the method of this application transforms the antenna detection results into a weight distribution form in image space, providing a spatial constraint basis for subsequent motion vector processing.
[0044] In this embodiment, the spatially distributed weight matrix is further used to apply weighted constraints to the initial motion vector field. Specifically, for the motion vector at each pixel position in the initial motion vector field, the weight value corresponding to that position is applied in the constraint calculation, so that the motion vector remains spatially consistent with the antenna detection azimuth. Through this processing, motion vectors located within the antenna detection direction are retained and participate in subsequent calculations, while motion vectors located outside the detection direction are suppressed or weakened. Thus, abnormal vectors generated by background changes, noise disturbances, or local non-target motion in the initial motion vector field are suppressed, resulting in the modified motion vector field.
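A minimal sketch of the weighting follows. It assumes a purely horizontal, linear calibration between antenna azimuth and image column, a single beam described by a centre azimuth and beamwidth, and a fixed attenuation factor outside the beam; none of these specifics come from the text, and `weight_matrix`, `apply_weights`, `fov_deg`, and `outside` are hypothetical names.

```python
def weight_matrix(height, width, fov_deg, azimuth_deg, beamwidth_deg, outside=0.1):
    """Weight 1.0 for columns whose azimuth lies inside the antenna beam and
    `outside` elsewhere. Column centres are spread linearly over the field of
    view from -fov/2 to +fov/2 (assumed calibration)."""
    row = []
    for col in range(width):
        col_az = -fov_deg / 2 + fov_deg * (col + 0.5) / width
        inside = abs(col_az - azimuth_deg) <= beamwidth_deg / 2
        row.append(1.0 if inside else outside)
    return [row[:] for _ in range(height)]


def apply_weights(u, v, w):
    """Scale both velocity components of every pixel by its weight."""
    def scale(comp):
        return [[c * wt for c, wt in zip(c_row, w_row)] for c_row, w_row in zip(comp, w)]
    return scale(u), scale(v)


w = weight_matrix(2, 4, fov_deg=40.0, azimuth_deg=10.0, beamwidth_deg=12.0)
u = [[1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0]]
v = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
u_mod, v_mod = apply_weights(u, v, w)
```

In this example the two right-hand columns fall inside the beam and keep their vectors, while the two left-hand columns are attenuated.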
[0045] In this embodiment, a consistency calculation is further performed on the modified motion vector field. Specifically, for each motion vector, multiple neighboring pixels within its neighborhood are selected, and their motion directions are compared with that of the current pixel to obtain a directional consistency coefficient. At the same time, the difference between the motion amplitude of the current pixel and the amplitudes of its neighboring pixels is calculated to obtain the amplitude change rate. The directional consistency coefficient characterizes how consistent the motion direction of the current pixel is with that of its neighbors, and the amplitude change rate characterizes how the motion intensity of the current pixel differs from that of its neighbors. Through these calculations, the directional and amplitude relationships between motion vectors are converted into quantifiable indicators.
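The text does not fix formulas for the two indicators. One plausible reading, shown here as an assumption, takes the directional consistency coefficient as the mean cosine between a pixel's vector and those of its 8-neighbours, and the amplitude change rate as the mean absolute amplitude difference relative to the pixel's own amplitude. `consistency_metrics` and `eps` are hypothetical names.

```python
import math


def consistency_metrics(u, v, r, c, eps=1e-9):
    """Return (directional consistency, amplitude change rate) at pixel (r, c)
    using the 8-neighbourhood; neighbours outside the image are skipped."""
    h, w = len(u), len(u[0])
    mag0 = math.hypot(u[r][c], v[r][c])
    cos_sum = amp_sum = 0.0
    n = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if (dr == 0 and dc == 0) or not (0 <= rr < h and 0 <= cc < w):
                continue
            mag = math.hypot(u[rr][cc], v[rr][cc])
            dot = u[r][c] * u[rr][cc] + v[r][c] * v[rr][cc]
            cos_sum += dot / max(mag0 * mag, eps)  # cosine of the angle between vectors
            amp_sum += abs(mag - mag0)
            n += 1
    if n == 0:
        return 0.0, 0.0
    return cos_sum / n, amp_sum / (n * max(mag0, eps))


zeros = [[0.0] * 3 for _ in range(3)]
uniform = [[1.0] * 3 for _ in range(3)]
opposed = [[-1.0, -1.0, -1.0], [-1.0, 1.0, -1.0], [-1.0, -1.0, -1.0]]
same_dir = consistency_metrics(uniform, zeros, 1, 1)
against = consistency_metrics(opposed, zeros, 1, 1)
```

A coefficient near 1 with a small change rate suggests the pixel moves together with its neighbours; a coefficient near -1 marks a vector pointing against them.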
[0046] Furthermore, based on the directional consistency coefficient and amplitude change rate, a comprehensive evaluation is performed on each pixel in the modified motion vector field, and a vector similarity map is generated. Specifically, the directional consistency result and amplitude change result of each pixel are fused, so that pixels that meet the conditions of directional consistency and continuous amplitude change form clustered regions in the map, and pixels that do not meet the above conditions form separated regions in the map. As can be seen from the above, the vector similarity map realizes a structured expression of the spatial distribution relationship of motion vectors.
[0047] Additionally, in this embodiment, a region growing algorithm is performed on the vector similarity map. Specifically, pixels in the vector similarity map that meet the set consistency conditions are first selected as seed points. Then, starting from each seed point, its neighboring pixels are traversed and evaluated step by step. When a neighboring pixel meets the preset similarity condition, it is merged into the current region, and the process continues to expand outward until no further pixels meeting the condition are found. Through this region growing process, the map is divided into multiple independent connected regions, which serve as the candidate moving target regions.
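The region growing procedure described above can be sketched as a breadth-first expansion. The two thresholds, the 4-connectivity, and the name `grow_regions` are assumptions made for this illustration; the text only requires a seed condition and a similarity condition.

```python
from collections import deque


def grow_regions(sim, seed_thresh=0.8, join_thresh=0.6):
    """Label 4-connected regions of a 2-D similarity map. Pixels with value
    >= seed_thresh start a region; neighbours with value >= join_thresh are
    merged into it. Label 0 means the pixel belongs to no region."""
    h, w = len(sim), len(sim[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if labels[r][c] or sim[r][c] < seed_thresh:
                continue
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not labels[ny][nx] and sim[ny][nx] >= join_thresh):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


sim = [
    [0.9, 0.7, 0.1, 0.0],
    [0.7, 0.2, 0.1, 0.9],
    [0.1, 0.1, 0.1, 0.7],
]
labels, count = grow_regions(sim)
```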
[0048] S4 includes acquiring multi-frame data of the image sequence and locating the foreground pixel set; determining the boundary coordinates of the candidate moving target region based on the foreground pixel set; segmenting the target sub-image based on the boundary coordinates, generating a color histogram by statistically analyzing its pixel distribution, and performing grayscale processing on the target sub-image to obtain a grayscale image; calculating the gray-level co-occurrence matrix based on the grayscale image to construct texture feature data, and applying a gradient operator to obtain edge gradient features; and concatenating the color histogram, texture feature data, and edge gradient features to extract the appearance feature sequence of the candidate moving target region in the multi-frame data of the image sequence.
[0049] In this embodiment, based on the aforementioned embodiments, step S4 is used to extract multi-dimensional appearance features of the candidate moving target region and construct a corresponding feature sequence in the time dimension. First, multi-frame data of the image sequence is acquired, and the foreground pixel set is located in each frame. The foreground pixel set originates from the candidate moving target region obtained in step S3. By marking all pixels within this region, a foreground pixel set in the corresponding frame is formed. Further, based on the spatial coordinates of each pixel in the foreground pixel set, its horizontal and vertical coordinate ranges in the image coordinate system are extracted, thereby determining the boundary coordinates of the candidate moving target region. Thus, it can be seen that the method of this application transforms discretely distributed foreground pixels into a target region representation with a clear spatial range.
[0050] In this embodiment, further, a region segmentation operation is performed on the original image based on the boundary coordinates to obtain a target sub-image corresponding to the candidate moving target region. For the target sub-image, the distribution of its pixels across each color channel is statistically processed. The pixel values are divided into preset intervals, and the number of pixels within each interval is counted to generate a color histogram, which is used to characterize the color distribution of the target region. Thus, the color information of the target region is converted into structured statistical data using the method of this application. Simultaneously, grayscale processing is performed on the target sub-image, converting the multi-channel image into a single-channel grayscale image, so that each pixel retains only brightness information, providing unified input data for subsequent texture feature extraction.
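A minimal sketch of the histogram and grayscale steps, assuming 8-bit RGB pixels supplied as a flat list of tuples, equal-width bins, and the ITU-R BT.601 luma weights for the grayscale conversion. The text does not specify the bin count or the conversion weights; `color_histogram` and `to_gray` are hypothetical names.

```python
def color_histogram(pixels, bins=4):
    """Per-channel histogram of 8-bit RGB pixels, concatenated in R, G, B
    order and normalized by the number of pixels."""
    hist = [0] * (3 * bins)
    bin_width = 256 // bins
    for px in pixels:
        for ch, value in enumerate(px):
            hist[ch * bins + min(value // bin_width, bins - 1)] += 1
    n = len(pixels)
    return [count / n for count in hist]


def to_gray(pixels):
    """Single-channel brightness using BT.601 weights (assumed choice)."""
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in pixels]


pixels = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)]
hist = color_histogram(pixels)
gray = to_gray(pixels)
```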
[0051] In this embodiment, a gray-level co-occurrence matrix is further calculated based on the grayscale image. Specifically, pixel pairs are selected in the grayscale image according to a preset spatial distance and direction, and the joint occurrence frequency of their gray-level values is counted to construct the gray-level co-occurrence matrix, which describes the joint spatial distribution of gray-level values. Further, texture feature data is calculated from the gray-level co-occurrence matrix and is used to characterize the structural distribution within the target region. Therefore, the method of this application converts the spatial structural information of an image into a quantifiable texture description.
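The co-occurrence counting can be sketched as below for a single offset. The text does not say which texture statistics are derived from the matrix; contrast and energy are used here as two common examples, and the function names are hypothetical. The sketch assumes the image is larger than the offset, so at least one pixel pair exists.

```python
def glcm(gray, levels, dx=1, dy=0):
    """Normalized (non-symmetric) gray-level co-occurrence matrix for the
    pixel offset (dx, dy). `gray` holds integer levels in [0, levels)."""
    counts = [[0] * levels for _ in range(levels)]
    h, w = len(gray), len(gray[0])
    total = 0
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                counts[gray[y][x]][gray[ny][nx]] += 1
                total += 1
    return [[c / total for c in row] for row in counts]


def contrast(p):
    """Weighted sum of squared gray-level differences."""
    return sum(p[i][j] * (i - j) ** 2 for i in range(len(p)) for j in range(len(p)))


def energy(p):
    """Sum of squared matrix entries (angular second moment)."""
    return sum(value * value for row in p for value in row)


gray = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(gray, levels=4)
texture = [contrast(p), energy(p)]
```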
[0052] In this embodiment, a gradient operator is further applied to the grayscale image to calculate the changes in pixel grayscale values in the spatial direction, obtaining the gradient magnitude and gradient direction of each pixel, thereby forming edge gradient features. These edge gradient features are used to characterize the contour position and boundary changes of the target region. Thus, through the grayscale gradient calculation process, the boundary information of the target region is transformed into computable feature data.
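The text does not name a particular gradient operator. As one example, the Sobel operator is used below to compute gradient magnitude and direction at interior pixels; the choice of Sobel and the zero-valued border are assumptions of this sketch.

```python
import math

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel(gray):
    """Gradient magnitude and direction (radians) for interior pixels of a
    2-D list image; border pixels are left at 0."""
    h, w = len(gray), len(gray[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            mag[y][x] = math.hypot(gx, gy)
            ang[y][x] = math.atan2(gy, gx)
    return mag, ang


# A vertical step edge: dark on the left, bright on the right.
gray = [
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
]
mag, ang = sobel(gray)
```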
[0053] Additionally, in this embodiment, the color histogram, texture feature data, and edge gradient features are processed uniformly. Specifically, the three types of features are concatenated in a fixed order to form a single feature vector, which characterizes the appearance information of the candidate moving target region in the current frame. Furthermore, across the multi-frame data of the image sequence, the feature vectors extracted from each frame are arranged in chronological order to form an appearance feature sequence.
[0054] S5 includes acquiring multiple frames of images corresponding to consecutive timestamps in the video stream data; extracting appearance feature vectors from candidate moving target regions in the multiple frames of images to generate an appearance feature sequence; calculating the similarity values of adjacent feature vectors in the appearance feature sequence to obtain an inter-frame difference sequence; performing numerical differentiation calculation on the inter-frame difference sequence to obtain gradient magnitude; generating a change rate sequence based on the gradient magnitude; if the inter-frame difference sequence is lower than a preset continuity threshold and the maximum gradient magnitude in the change rate sequence is less than a preset slow change threshold, then it is determined that the features of the candidate moving target region are stable.
[0055] In this embodiment, based on the aforementioned embodiments, step S5 is used to quantitatively analyze the appearance change process of the candidate moving target region in the time dimension, and to determine the feature stability based on the analysis results. First, multiple frames of images corresponding to consecutive timestamps in the video stream data are acquired. The multiple frames of images form a continuous image sequence in chronological order, and the timestamps are used to ensure the temporal correlation between each frame. In this embodiment, for the candidate moving target region in the multiple frames of images, its appearance feature vector is extracted frame by frame. Each appearance feature vector corresponds to the combination of color distribution, texture structure, and edge information of the target region in the current frame. Further, the appearance feature vectors extracted from each frame are arranged in chronological order to generate an appearance feature sequence. Thus, it can be seen that the appearance state of the candidate moving target region in a continuous time range is converted into an ordered feature sequence through the method of this application.
[0056] In this embodiment, adjacent feature vectors in the appearance feature sequence are further compared pairwise. Specifically, the feature vectors corresponding to frame t and frame t+1 in the sequence are compared, and the comparison results are arranged in chronological order to form an inter-frame difference sequence, in which a lower value indicates a smaller appearance change. The inter-frame difference sequence characterizes the degree of appearance change of the candidate moving target region between adjacent time frames. Thus, the distribution of appearance change in the time dimension is converted into a continuous numerical sequence.
[0057] In this embodiment, further, numerical differentiation calculation is performed on the inter-frame difference sequence. Specifically, the difference operation is performed on adjacent difference values in the inter-frame difference sequence to obtain the change amount corresponding to each time position, and this change amount is recorded as the gradient magnitude. Based on the gradient magnitude of each time position, a change rate sequence is generated in chronological order. Thus, by reprocessing the inter-frame difference sequence, the appearance change process is further transformed from "degree of change" to "speed of change," thereby realizing the expression of the change trend.
[0058] Additionally, in this embodiment, stability is determined jointly from the inter-frame difference sequence and the change rate sequence. Specifically, each value in the inter-frame difference sequence is first compared with a preset continuity threshold. When all difference values are lower than the continuity threshold, the appearance change between adjacent frames is determined to meet the continuity condition. Further, the maximum gradient magnitude in the change rate sequence is extracted and compared with a preset slow change threshold. When the maximum gradient magnitude is less than the slow change threshold, the appearance change rate is determined to meet the smoothness condition over the entire time interval. When both conditions are met simultaneously, the corresponding candidate moving target region is determined to be a feature-stable region.
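The joint determination can be sketched as follows, assuming the inter-frame difference sequence is already available as a list of non-negative values and taking the gradient magnitude as the absolute first difference of that sequence. The function name and threshold values are hypothetical.

```python
def is_feature_stable(differences, continuity_thresh, slow_change_thresh):
    """Return True when every inter-frame difference is below the continuity
    threshold and the largest change between consecutive differences is
    below the slow change threshold."""
    # Change rate sequence: absolute first difference of the difference sequence.
    rates = [abs(b - a) for a, b in zip(differences, differences[1:])]
    continuous = all(d < continuity_thresh for d in differences)
    smooth = max(rates, default=0.0) < slow_change_thresh
    return continuous and smooth


steady = is_feature_stable([0.10, 0.12, 0.11, 0.13], 0.2, 0.05)
jump = is_feature_stable([0.10, 0.12, 0.30], 0.2, 0.05)
jitter = is_feature_stable([0.02, 0.15, 0.03], 0.2, 0.05)
```

The third case stays under the continuity threshold at every step but changes too quickly, so it fails the slow change condition.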
[0059] S6 includes acquiring the video stream of the monitoring area and extracting texture vectors, and generating an appearance feature sequence based on the texture vectors; calculating the distance difference between texture vectors in the appearance feature sequence, and marking an abnormal time segment if the distance difference exceeds a threshold or a texture vector is zero; extracting radio frequency signal segments from the antenna detection information based on the abnormal time segments, and analyzing the radio frequency signal segments to obtain the signal strength attenuation evolution and the Doppler frequency shift characteristics; if the signal strength attenuation evolution shows a decrease in amplitude, determining that object occlusion exists; and if the signal strength attenuation evolution is stable and the Doppler frequency shift characteristics show spectral broadening, determining that rapid target deformation exists.
[0060] In this embodiment, based on the aforementioned embodiments, step S6 is used, when an abnormal change in appearance occurs in the candidate moving target region, to determine the cause of the anomaly by combining the antenna detection information and to distinguish between two states: object occlusion and rapid target deformation. First, a video stream of the monitoring area is acquired, and the texture vector of the target region in consecutive time frames is extracted from the video stream. The texture vector characterizes the texture distribution of the target region in the current frame. Further, the texture vectors corresponding to each time frame are arranged in chronological order to generate an appearance feature sequence. Thus, the method of this application converts the texture change of the target region over a continuous time range into a time-ordered appearance feature representation.
[0061] In this embodiment, the distance difference between texture vectors in the appearance feature sequence is further calculated. Specifically, for adjacent texture vectors in the appearance feature sequence, the difference value is calculated pairwise according to a unified distance metric, and the difference values are arranged in chronological order to characterize the change in texture state between adjacent time frames. According to the above processing, when the distance difference corresponding to a certain time position exceeds a preset threshold, it indicates that the texture distribution between the two frames before and after that time position has a sudden change; when the texture vector corresponding to a certain time position is zero, it indicates that the texture information of the target area at that time position is missing. Thus, the time intervals that meet any of the above conditions are marked to obtain abnormal time segments. As can be seen from the above, step S6 first completes the abnormal time location through texture vector difference analysis.
[0062] In this embodiment, furthermore, radio frequency (RF) signal segments within a corresponding time range are extracted from the antenna detection information based on the abnormal time segments. The abnormal time segments and the extracted RF signal segments maintain a correspondence on the time axis. Signal analysis processing is performed on the RF signal segments to extract the attenuation evolution information of the signal intensity over time, and to extract the Doppler frequency shift characteristics of the signal in the frequency domain. The signal intensity attenuation evolution is used to characterize the change state of the target's corresponding RF echo intensity within the abnormal time range, and the Doppler frequency shift characteristics are used to characterize the target's motion spectrum distribution state within the abnormal time range. Therefore, the method of this application transforms the electromagnetic response process within the abnormal time segment into two judgment criteria: intensity characteristics and frequency shift characteristics.
[0063] Additionally, in this embodiment, if the signal strength attenuation evolution shows a decrease in amplitude, object occlusion is determined. Specifically, within an abnormal time segment, when the radio frequency signal strength decreases along the time axis, it indicates that the propagation path between the target and the antenna is blocked, causing the received signal amplitude to attenuate. On this basis, it is determined that object occlusion exists within the monitoring area. Thus, a correspondence is established between image texture anomalies and decreases in radio frequency signal strength, completing the occlusion determination.
[0064] Further, in this embodiment, if the signal strength attenuation evolution is stable and the Doppler frequency shift characteristics show spectral broadening, it is determined that the target is undergoing rapid deformation. Specifically, within an abnormal time segment, when the radio frequency signal strength remains stable along the time axis, it indicates that the target echo channel is not affected by occlusion; at the same time, when the Doppler frequency shift characteristics show spectral broadening, it indicates that the target surface or local structure has changed rapidly within a short period, widening the distribution range of the corresponding echo frequency components, and the target is therefore determined to be in a rapid deformation state. Thus, through joint analysis of the signal strength state and the spectral distribution state, a deformation determination that is distinguishable from the occlusion state is achieved.
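The two determinations can be combined into one decision rule, sketched below. The text does not define how "decrease", "stable", or "spectral broadening" are quantified; this illustration assumes a least-squares slope of the signal strength samples, a tolerance on that slope, and a ratio test of the Doppler spectral width against a baseline width. All names and default values are hypothetical, and at least two signal samples are required.

```python
def rssi_slope(samples):
    """Least-squares slope of signal strength against sample index."""
    n = len(samples)
    t_mean = (n - 1) / 2
    s_mean = sum(samples) / n
    num = sum((t - t_mean) * (s - s_mean) for t, s in enumerate(samples))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def classify_anomaly(rssi, doppler_width, baseline_width,
                     slope_tol=0.5, broaden_factor=1.5):
    """Classify an abnormal time segment from its RF signal segment."""
    slope = rssi_slope(rssi)
    if slope < -slope_tol:
        return "occlusion"  # signal strength falls over the segment
    if abs(slope) <= slope_tol and doppler_width > broaden_factor * baseline_width:
        return "rapid_deformation"  # stable strength, broadened spectrum
    return "undetermined"


falling = classify_anomaly([-40.0, -45.0, -50.0, -55.0], 10.0, 10.0)
broadened = classify_anomaly([-40.0, -40.2, -39.9, -40.1], 30.0, 10.0)
quiet = classify_anomaly([-40.0, -40.2, -39.9, -40.1], 10.0, 10.0)
```

The "undetermined" outcome is an addition of this sketch for segments that match neither rule; the text itself only describes the two positive determinations.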
[0065] In this embodiment, as can be seen from the above, this solution forms a complete anomaly cause analysis process: first, texture vectors are extracted from the video stream of the monitoring area and an appearance feature sequence is generated; next, abnormal time segments are marked according to the texture vector distance difference and the zero state of the texture vector; then, the corresponding radio frequency signal segments are extracted from the antenna detection information, and the signal strength attenuation evolution and Doppler frequency shift characteristics are obtained from them; and finally, the object occlusion determination and the rapid target deformation determination are completed based on these radio frequency signal characteristics.
[0066] S7 includes acquiring an image sequence and extracting initial candidate regions, calculating the regional feature vectors of the initial candidate regions, determining stable candidate moving target regions by calculating the cosine similarity of the regional feature vectors, acquiring the centroid coordinates of stable candidate moving target regions and generating spatiotemporal data points, constructing an association linked list based on the spatiotemporal data points to determine potential targets, extracting the time intervals of potential targets and arranging the centroid coordinates to generate spatial trajectories, and recording the time intervals and spatial trajectories of potential targets appearing in the entire image sequence.
[0067] In this embodiment, based on the foregoing embodiments, step S7 is used to filter feature-stable regions within the candidate moving target regions and establish target associations in the time dimension, thereby determining potential targets and constructing their motion trajectories. First, an image sequence is acquired, and initial candidate regions are extracted in each frame. These initial candidate regions correspond to the candidate moving target regions output in step S3. Further, a region feature vector is extracted for each initial candidate region. This region feature vector is composed of the color features, texture features, and edge features of the pixels within the region, used to characterize the appearance attributes of the region. Thus, the method of this application converts candidate regions into a unified form of feature representation.
[0068] In this embodiment, cosine similarity is further calculated for the regional feature vectors of corresponding candidate regions in different time frames. Specifically, the regional feature vectors in adjacent time frames are calculated pairwise according to chronological order to obtain the corresponding cosine similarity values, and the calculation results are used to characterize the consistency of regional features in the time dimension. Based on the above calculation results, candidate regions that meet the similarity judgment criteria are screened to determine candidate moving target regions with stable features. Thus, the method of this application transforms the feature consistency of regions in the time dimension into a clear judgment result.
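The screening by cosine similarity can be sketched as follows. The requirement that every adjacent pair exceed a minimum similarity, and the value 0.9, are assumptions standing in for the unspecified "similarity judgment criteria"; the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_stable_region(features, min_similarity=0.9):
    """True when every pair of adjacent-frame feature vectors is similar enough."""
    return all(cosine_similarity(a, b) >= min_similarity
               for a, b in zip(features, features[1:]))


drifting = is_stable_region([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.8, 0.2, 0.0]])
switched = is_stable_region([[1.0, 0.0], [0.0, 1.0]])
```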
[0069] In this embodiment, further, the centroid coordinates of the identified stable candidate moving target regions are extracted. Specifically, the geometric center of the pixel set of the corresponding region in each frame is calculated to obtain the centroid position of the region in the image coordinate system. Further, the centroid coordinates are combined with the temporal information of the corresponding time frame to generate spatiotemporal data points. These spatiotemporal data points are used to describe the spatial position of the target at a specific time. Thus, by combining spatial coordinates and temporal information, the spatiotemporal representation of the target position is achieved.
[0070] Furthermore, in this embodiment, an association linked list is constructed based on the spatiotemporal data points. Specifically, all spatiotemporal data points are traversed in chronological order, and data points belonging to the same moving target are connected according to the continuity of their spatial locations and their temporal adjacency to form a chain structure. Through this association process, discretely distributed spatiotemporal data points are merged into multiple continuous data chains, each corresponding to a potential target. Therefore, the method of this application achieves the transformation from discrete detection results to a continuous target representation.
[0071] In this embodiment, the time interval of the potential target is further extracted from the association linked list. Specifically, the time corresponding to the first node in the linked list is taken as the target's appearance time, and the time corresponding to the last node is taken as the target's end time, thereby determining the time interval of the potential target in the image sequence. At the same time, the centroid coordinates corresponding to the nodes in the linked list are arranged in time order to generate the target's spatial trajectory. Thus, the method of this application provides a unified representation of the target's existence interval in the time dimension and its movement path in the spatial dimension.
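The association and trajectory extraction can be sketched as a greedy nearest-neighbour linker. A Python list stands in for each linked list, and the distance limit, frame gap limit, and function names are assumptions; the text only requires spatial continuity and temporal adjacency.

```python
import math


def link_points(points, max_dist=5.0, max_gap=1):
    """Group (frame, x, y) spatiotemporal points into chains. A point joins
    the chain whose last node is closest, provided that node lies in an
    earlier frame within max_gap frames and within max_dist pixels."""
    chains = []
    for frame, x, y in sorted(points):
        best, best_d = None, max_dist
        for chain in chains:
            last_frame, last_x, last_y = chain[-1]
            d = math.hypot(x - last_x, y - last_y)
            if 0 < frame - last_frame <= max_gap and d <= best_d:
                best, best_d = chain, d
        if best is None:
            chains.append([(frame, x, y)])  # start a new potential target
        else:
            best.append((frame, x, y))
    return chains


def summarize(chain):
    """Time interval (first frame, last frame) and centroid trajectory."""
    interval = (chain[0][0], chain[-1][0])
    trajectory = [(x, y) for _, x, y in chain]
    return interval, trajectory


points = [(0, 0, 0), (0, 50, 50), (1, 1, 1), (1, 51, 50), (2, 2, 2)]
chains = link_points(points)
interval, trajectory = summarize(chains[0])
```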
[0072] S8 includes acquiring the spatial trajectory and time interval of the potential target, and extracting the corresponding frame image from the original image sequence according to the time interval; using the geographic coordinate points in the spatial trajectory to perform coordinate mapping on the frame image to determine the initial center point, and combining the motion vectors between adjacent frame images to calculate the candidate region corresponding to the initial center point; if the pixel intensity distribution in the candidate region meets the preset threshold, the minimum bounding rectangle of the candidate region is determined by the boundary refinement algorithm to complete the target localization on each frame of the original image.
[0073] In this embodiment, based on the foregoing embodiments, step S8 is used to recover the target position frame by frame in the original image sequence and complete precise localization based on the potential target's temporal information and spatial trajectory. First, the spatial trajectory and time interval of the potential target are obtained. The spatial trajectory is composed of the centroid coordinate sequence recorded in step S7, and the time interval is determined by the start and end times of the potential target in the image sequence. Further, according to the time interval, frame images within the corresponding time range are extracted from the original image sequence and arranged in chronological order to form a set of frame images corresponding to the potential target. Thus, the method of this application establishes a correspondence between the temporal information of the potential target and the original image data.
[0074] In this embodiment, furthermore, coordinate mapping processing is performed on the frame image using geographic coordinate points in the spatial trajectory. Specifically, a temporal correspondence is established between the geographic coordinate points corresponding to each time node in the spatial trajectory and the frame image, and the geographic coordinates are converted into image pixel coordinates based on the calibration relationship between the geographic coordinate system and the image coordinate system. The converted pixel coordinates serve as the initial center point in the corresponding frame. Thus, it can be seen that through the coordinate mapping process, the spatial position of the potential target in the trajectory is converted into a specific pixel position in the image.
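The coordinate mapping can be sketched as follows. The application does not state the form of the calibration relationship, so this example assumes a simple 2x3 affine calibration; the matrix values in `CALIB` and the function names are hypothetical. A real system might need a full homography or a camera model instead.

```python
def geo_to_pixel(geo_point, calib):
    """Convert one geographic point to pixel coordinates with an affine calibration.

    calib is ((a, b, tx), (c, d, ty)), an assumed form of the calibration
    relationship between the geographic and image coordinate systems.
    """
    (a, b, tx), (c, d, ty) = calib
    gx, gy = geo_point
    return round(a * gx + b * gy + tx), round(c * gx + d * gy + ty)

def initial_centers(trajectory_by_time, calib):
    """Map {frame_time: geographic point} to {frame_time: initial pixel center}."""
    return {t: geo_to_pixel(p, calib) for t, p in trajectory_by_time.items()}

# Hypothetical calibration: scale by 2, flip the vertical axis, shift the origin.
CALIB = ((2.0, 0.0, 10.0), (0.0, -2.0, 200.0))
centers = initial_centers({0: (5.0, 20.0), 1: (6.0, 21.0)}, CALIB)
print(centers)   # {0: (20, 160), 1: (22, 158)}
```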
[0075] In this embodiment, the candidate region corresponding to the initial center point is further calculated by combining the motion vectors between adjacent frame images. Specifically, in the current frame, using the initial center point as a reference, the motion vector data between the current frame and adjacent frames is read. Based on the direction and displacement of the motion vectors, the initial center point is spatially expanded to form a region covering the target's motion range. This region is constructed around the initial center point and maintains a consistent relationship with the target's motion state in the time dimension. Thus, candidate regions are obtained through the joint constraints of spatial trajectory information and motion vector information.
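One possible reading of this expansion is sketched below: the candidate region is an axis-aligned box that covers the target's estimated previous, current, and next positions, padded by a margin. The box shape, the `half_size` margin, and the convention that `mv_prev` is the displacement from the previous frame to the current one are assumptions for illustration.

```python
from math import ceil, floor

def candidate_region(center, mv_prev, mv_next, half_size=4, shape=(480, 640)):
    """Return (x0, y0, x1, y1) covering the target's motion range around center.

    mv_prev: (dx, dy) displacement from the previous frame to the current frame.
    mv_next: (dx, dy) displacement from the current frame to the next frame.
    half_size and shape (height, width) are hypothetical parameters.
    """
    cx, cy = center
    xs = [cx - mv_prev[0], cx, cx + mv_next[0]]   # previous, current, next x
    ys = [cy - mv_prev[1], cy, cy + mv_next[1]]   # previous, current, next y
    height, width = shape
    x0 = max(0, floor(min(xs)) - half_size)
    y0 = max(0, floor(min(ys)) - half_size)
    x1 = min(width - 1, ceil(max(xs)) + half_size)
    y1 = min(height - 1, ceil(max(ys)) + half_size)
    return x0, y0, x1, y1

print(candidate_region((100, 50), mv_prev=(3, -2), mv_next=(4, 0)))   # (93, 46, 108, 56)
```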
[0076] Additionally, it should be noted that in this embodiment, a determination process is performed on the pixel intensity distribution within the candidate region. Specifically, the intensity values of all pixels within the candidate region are statistically analyzed, and the statistical results are compared with a preset threshold. When the pixel intensity distribution within the candidate region meets the preset threshold condition, it is determined that the region contains target response information, and subsequent boundary refinement processing begins. As can be seen from the above, the candidate region is filtered through the pixel intensity determination process to ensure that subsequent processing is performed on the effective target region.
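A minimal sketch of the determination process follows. The application does not say which statistic is compared with the preset threshold; the mean intensity is used here purely as an assumption, and the image is represented as a list of rows for simplicity.

```python
def region_has_target(image, box, threshold):
    """Return True if the mean intensity inside box reaches the threshold.

    image is a list of rows of intensity values; box is (x0, y0, x1, y1),
    inclusive. Using the mean as the statistic is an assumption.
    """
    x0, y0, x1, y1 = box
    values = [image[y][x] for y in range(y0, y1 + 1) for x in range(x0, x1 + 1)]
    return sum(values) / len(values) >= threshold

image = [
    [0, 0, 0, 0],
    [0, 8, 8, 0],
    [0, 8, 8, 0],
    [0, 0, 0, 0],
]
print(region_has_target(image, (1, 1, 2, 2), threshold=5))   # True
print(region_has_target(image, (0, 0, 1, 1), threshold=5))   # False
```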
[0077] In this embodiment, a boundary refinement algorithm is further performed on candidate regions that meet the conditions. Specifically, the boundary pixels of the candidate regions are traversed, and the region boundaries are shrunk and corrected according to the pixel intensity distribution and region connectivity, removing the boundary portions of non-target regions and retaining the region covering the target subject. After the boundary correction is completed, the minimum bounding rectangle is extracted from the corrected region. The minimum bounding rectangle is used to represent the target's location result in the current frame image. The above process is repeated for each frame within the time interval to complete the location process of the potential target in each frame of the original image.
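The refinement and bounding-rectangle extraction could look roughly like the sketch below. It swaps in a simple stand-in for the unspecified boundary refinement algorithm: pixels at or above an assumed intensity `level` are kept, the largest 4-connected component is taken as the target subject, and its axis-aligned bounding rectangle is returned.

```python
from collections import deque

def refine_box(image, box, level):
    """Shrink box to the bounding rectangle of its largest bright connected component.

    image is a list of rows; box is (x0, y0, x1, y1), inclusive. The intensity
    level and the 4-connectivity rule are assumptions for illustration.
    Returns None if no pixel in the box reaches the level.
    """
    x0, y0, x1, y1 = box
    foreground = {(x, y) for y in range(y0, y1 + 1) for x in range(x0, x1 + 1)
                  if image[y][x] >= level}
    seen, largest = set(), set()
    for seed in foreground:
        if seed in seen:
            continue
        component, queue = {seed}, deque([seed])
        seen.add(seed)
        while queue:                                    # breadth-first flood fill
            x, y = queue.popleft()
            for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if nb in foreground and nb not in seen:
                    seen.add(nb)
                    component.add(nb)
                    queue.append(nb)
        if len(component) > len(largest):
            largest = component
    if not largest:
        return None
    xs = [p[0] for p in largest]
    ys = [p[1] for p in largest]
    return min(xs), min(ys), max(xs), max(ys)

image = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 9, 9, 0, 0],
    [0, 0, 9, 9, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 9],
    [0, 0, 0, 0, 0, 0],
]
print(refine_box(image, (0, 0, 5, 5), level=5))   # (2, 1, 3, 2)
```

The isolated bright pixel at the lower right is discarded because it is not connected to the main component, which mirrors the removal of non-target boundary portions.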
[0078] S9 includes acquiring image patches of the potential target region after localization and their corresponding antenna feature information; matching image pixel coordinates with antenna signal strength distribution through spatial mapping to obtain a preliminarily aligned original multimodal dataset; extracting image patch texture details and antenna feature phase offsets from the original multimodal dataset; performing cross-modal feature alignment on feature spaces of different dimensions using an affine transformation matrix to obtain an aligned high-dimensional feature matrix; dynamically allocating the feature matrix according to a preset signal-to-noise ratio weight; deeply interacting visual semantic information and electromagnetic spectrum features through a weighted fusion algorithm to obtain a joint discriminative feature vector containing multidimensional attributes; inputting the joint discriminative feature vector into a pre-trained convolutional neural network classifier; if the activation value output by the classifier exceeds a preset discrimination threshold, then performing adaptive classification logic based on the joint discriminative feature vector to obtain the target's category confidence and complete target recognition.
[0079] In this embodiment, based on the aforementioned embodiments, step S9 is used to perform cross-modal feature alignment, fusion processing, and classification recognition on the target region after target localization, thereby outputting the target category result. First, the potential target region image patch and the corresponding antenna feature information are obtained after localization. The potential target region image patch is obtained by cropping the minimum bounding rectangle region obtained in step S8, and the antenna feature information consists of antenna detection data corresponding to the target in time interval and spatial location. Further, matching processing is performed on the image pixel coordinates and antenna signal strength distribution through spatial mapping relationship. Specifically, according to the calibration parameters between the image coordinate system and the antenna coordinate system, the position of each pixel in the image patch is mapped to the corresponding position in the antenna signal space, and the signal strength value of the corresponding position is extracted, so that the image data and antenna data establish a correspondence under the same spatial reference, thereby forming a preliminary aligned original multimodal dataset. It can be seen that, through the method of this application, a unified expression of visual information and antenna information in the spatial dimension is achieved.
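The pixel-to-antenna mapping can be illustrated with the sketch below. It assumes that the antenna signal strength distribution is available as a coarse 2D grid covering the same field of view as the image, so that the calibration reduces to proportional scaling; both the grid representation and the function name are hypothetical.

```python
def sample_antenna(patch_box, image_shape, strength_grid):
    """Return the antenna signal strength aligned to each pixel of an image patch.

    patch_box is (x0, y0, x1, y1), inclusive; image_shape is (height, width).
    strength_grid is an assumed coarse grid over the same field of view.
    """
    x0, y0, x1, y1 = patch_box
    height, width = image_shape
    grid_h, grid_w = len(strength_grid), len(strength_grid[0])
    aligned = []
    for y in range(y0, y1 + 1):
        gy = min(grid_h - 1, y * grid_h // height)      # pixel row -> grid row
        row = []
        for x in range(x0, x1 + 1):
            gx = min(grid_w - 1, x * grid_w // width)   # pixel column -> grid column
            row.append(strength_grid[gy][gx])
        aligned.append(row)
    return aligned

grid = [[1.0, 2.0],
        [3.0, 4.0]]
aligned = sample_antenna((1, 1, 2, 2), image_shape=(4, 4), strength_grid=grid)
print(aligned)   # [[1.0, 2.0], [3.0, 4.0]]
```

Pairing each patch pixel with `aligned[row][col]` gives one entry of the preliminarily aligned multimodal dataset.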
[0080] In this embodiment, further, image patch texture details and antenna feature phase shifts are extracted from the original multimodal dataset. Specifically, texture feature extraction processing is performed on the image patches to obtain texture detail data reflecting the target surface structure and local changes; simultaneously, phase analysis processing is performed on the antenna signal to obtain the corresponding phase shift features, which are used to characterize the electromagnetic response state of the target. Further, an affine transformation matrix is used to perform alignment processing on feature spaces of different dimensions. Specifically, based on the differences in coordinate representation and dimensional distribution between the visual feature space and the antenna feature space, an affine transformation matrix is constructed to uniformly map the two types of features, enabling texture features and phase features to be aligned in the same high-dimensional space. Based on the above processing, an aligned high-dimensional feature matrix is generated. Thus, cross-modal feature alignment enables features from different sources to have a unified representation.
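A minimal sketch of the alignment idea: each modality's feature vector is mapped into a shared space by its own affine transformation `y = A x + b`, and the results are stacked as rows of the aligned feature matrix. The matrices, offsets, and feature values below are made up for illustration; the application does not specify how the transformation is obtained.

```python
def affine_map(vec, matrix, offset):
    """Apply y = A x + b, where matrix is a list of rows and offset a list."""
    return [sum(a * v for a, v in zip(row, vec)) + b
            for row, b in zip(matrix, offset)]

# Hypothetical features: 3 texture values and 2 phase-offset values.
texture = [0.2, 0.4, 0.6]
phase = [0.4, 0.3]

# Hypothetical affine parameters mapping both modalities into a 2-D shared space.
A_TEX, B_TEX = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]], [0.0, 0.0]
A_PHASE, B_PHASE = [[0.5, 0.0], [0.0, 2.0]], [0.1, 0.0]

feature_matrix = [
    affine_map(texture, A_TEX, B_TEX),       # row 0: aligned visual features
    affine_map(phase, A_PHASE, B_PHASE),     # row 1: aligned antenna features
]
print(feature_matrix)
```

After the mapping both rows have the same length, which is what allows them to be weighted and fused component by component.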
[0081] In this embodiment, the high-dimensional feature matrix is further dynamically allocated according to a preset signal-to-noise ratio (SNR) weight. Specifically, based on the SNR parameters of the antenna signal and image signal during acquisition, corresponding weights are assigned to each feature component in the high-dimensional feature matrix, so that the proportion each feature occupies in the fusion process corresponds to the quality of the signal from which it was derived. Furthermore, a weighted fusion algorithm is used to perform joint calculations on visual semantic information and electromagnetic spectrum features. Specifically, texture features, structural features, and edge information are fused with signal strength, phase shift, and spectral features according to the weights, so that features of different modalities form associated representations in the same vector space. Based on the above processing, a joint discriminative feature vector containing multi-dimensional attributes is generated. Thus, through the feature weighting and fusion process, unified encoding of multi-modal information is achieved.
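The weighting and fusion can be sketched as follows. How the weights are derived from the SNR is not stated in the application; this example assumes that each modality's weight is its linear SNR normalized so the weights sum to one, and that fusion is a component-wise weighted sum of already aligned vectors. All names and values are hypothetical.

```python
def snr_weights(snr_db_by_modality):
    """Convert per-modality SNR in dB to normalized weights (assumed rule)."""
    linear = {name: 10 ** (db / 10.0) for name, db in snr_db_by_modality.items()}
    total = sum(linear.values())
    return {name: value / total for name, value in linear.items()}

def weighted_fusion(features, weights):
    """Component-wise weighted sum of aligned feature vectors of equal length."""
    names = list(features)
    length = len(features[names[0]])
    return [sum(weights[n] * features[n][i] for n in names) for i in range(length)]

features = {"image": [1.0, 3.0], "antenna": [3.0, 1.0]}
weights = snr_weights({"image": 10.0, "antenna": 10.0})   # equal SNR: equal weights
joint = weighted_fusion(features, weights)
print(weights)
print(joint)
```

With unequal SNR values, for example 20 dB for the image and 10 dB for the antenna, the image features dominate the joint vector.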
[0082] Additionally, it should be noted that in this embodiment, the joint discriminative feature vector is input into a pre-trained convolutional neural network classifier. The convolutional neural network classifier receives the joint discriminative feature vector as input, performs forward propagation calculations, and outputs activation values corresponding to each category. Further, a threshold determination process is performed on the activation values output by the classifier. When the activation value exceeds a preset discrimination threshold, a classification decision is made for the target. Specifically, based on the response results of the joint discriminative feature vector at each category output node, the category to which the target belongs is determined, and the corresponding category confidence is calculated. Thus, the target recognition process is completed, and the target category result is output.
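The threshold determination and confidence calculation can be illustrated on the classifier's output alone. The sketch below does not model the convolutional neural network; it assumes that the network has already produced one raw activation per category, that the threshold is applied to the largest activation, and that the category confidence is a softmax probability. The labels and values are invented.

```python
from math import exp

def decide(activations, labels, threshold):
    """Return (label, confidence) if the peak activation exceeds threshold, else None.

    Softmax is an assumed way of turning activations into a category confidence.
    """
    peak = max(activations)
    if peak <= threshold:
        return None                                    # no classification decision
    exps = [exp(a - peak) for a in activations]        # shifted for numerical stability
    index = activations.index(peak)
    return labels[index], exps[index] / sum(exps)

labels = ["vehicle", "background"]                     # hypothetical categories
result = decide([2.0, 0.0], labels, threshold=1.0)
print(result)
print(decide([0.5, 0.2], labels, threshold=1.0))       # None
```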
[0083] As shown in Figure 3, an electronic device is also provided, comprising at least one processor and at least one memory communicatively connected to the processor; wherein the memory stores program instructions executable by the processor, and the processor, by invoking the program instructions, can execute the time-domain integral image sensing method for fusing antenna domain information as described above. The electronic device of this application includes a processor 31, a memory 32, and a storage space 33 for storing program code. The storage space 33 contains program code 34 for executing the method according to this application, namely the aforementioned time-domain integral image sensing method for fusing antenna domain information.
[0084] As shown in Figure 4, this application also proposes a non-transitory computer-readable storage medium storing program code (computer instructions) 41 for executing the method according to this application. The aforementioned computer-readable storage medium may be any combination of one or more computer-readable media. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium.
[0085] Examples of computer-readable storage media include, but are not limited to, electrical, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or any combination thereof. More specific examples (a non-exhaustive list) of computer-readable storage media include: electrical connections having one or more wires; portable computer disks; hard disks; random access memory (RAM); read-only memory (ROM); erasable programmable read-only memory (EPROM); flash memory; optical fiber; portable compact disk read-only memory (CD-ROM); optical storage devices; magnetic storage devices; or any suitable combination thereof. In this document, a computer-readable storage medium can be any tangible medium containing or storing program code that can be used by or in connection with an instruction execution system, apparatus, or device. Computer-readable signal media can include data signals propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such propagated data signals can take many forms, including, but not limited to, electromagnetic signals, optical signals, or any suitable combination thereof. The computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium can be transmitted using any suitable medium, including—but not limited to—wireless, wired, optical fiber, RF, etc., or any suitable combination thereof. The computer program code for performing the methods of this application can be written in one or more programming languages or a combination thereof, including object-oriented programming languages—such as Java, Smalltalk, C++, Python, etc.—and conventional procedural programming languages—such as C or similar programming languages. 
The program code can be executed entirely on a user's computer, partially on a user's computer, as a standalone software package, partially on a user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving remote computers, the remote computer can be connected to the user's computer via any type of network—including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (e.g., via the Internet using an Internet service provider).
[0086] Although embodiments of the invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.
Claims
1. A time-domain integral image sensing method that fuses antenna domain information, characterized in that it comprises:
S1. Acquire the image sequence data after time-domain integration processing, and simultaneously acquire the antenna detection information corresponding to the image sequence, wherein the antenna detection information includes signal strength distribution or spatial direction information;
S2. Use an optical flow analysis method to calculate the motion vector field of each pixel in the image sequence between adjacent time frames, wherein the motion vector field describes the change trend of the pixel in the time dimension;
S3. Based on the antenna detection information, constrain or weight the motion vector field, and according to the consistency of the amplitude and direction of the motion vector field, divide the image into connected regions with coherent motion patterns, and use the connected regions as candidate moving target regions;
S4. For each candidate moving target region, extract its appearance feature sequence in the multi-frame data of the image sequence, wherein the appearance feature sequence includes a color histogram, texture features and edge gradient features;
S5. If the appearance feature sequence shows a continuous change pattern in the time dimension, determine that the features of the candidate moving target region are stable;
S6. If the appearance feature sequence jumps or disappears periodically in the time dimension, combine the signal changes in the antenna detection information to determine whether an object is blocking the target or the target is rapidly deforming;
S7. Identify candidate moving target regions with stable features as potential targets and record their time intervals and spatial trajectories throughout the entire image sequence.
2. The time-domain integral image sensing method for fusing antenna domain information according to claim 1, characterized in that S1 includes:
acquiring raw photoelectric data in continuous time frames collected by the sensor, and performing time-domain integration on the raw photoelectric data to generate a time-domain integral image sequence;
extracting antenna detection information based on the frame synchronization timestamp of the time-domain integral image sequence;
projecting the signal strength distribution onto the time-domain integral image sequence using the spatial direction information in the antenna detection information to obtain composite image sequence data;
if the cumulative pixel intensity of a region in the composite image sequence data is higher than the preset background threshold, outputting the image sequence data after time-domain integration processing together with the simultaneously acquired antenna detection information corresponding to the image sequence.
3. The time-domain integral image sensing method for fusing antenna domain information according to claim 1, characterized in that S2 includes:
obtaining the spatiotemporal gradient information of pixels in adjacent frames of the image sequence to obtain a gradient data set;
constructing optical flow constraint equations based on the gradient data set to obtain an underdetermined constraint relationship model containing two unknown velocity components;
introducing a smoothing constraint term into the underdetermined constraint relationship model to construct an energy functional describing the global motion state, thereby obtaining a global energy objective function;
minimizing the global energy objective function to obtain a two-dimensional velocity component matrix, and performing vector synthesis on the two-dimensional velocity component matrix to generate a motion vector field describing the change trend of pixels in the time dimension.
4. The time-domain integral image sensing method for fusing antenna domain information according to claim 1, characterized in that S3 includes:
obtaining antenna detection information containing detection azimuth data, and mapping the antenna detection information to construct a spatially distributed weight matrix;
weighting the initial motion vector field with the spatially distributed weight matrix to obtain a corrected motion vector field;
calculating the directional consistency coefficient and amplitude change rate of the corrected motion vector field, and generating a vector similarity map based on the directional consistency coefficient and amplitude change rate;
segmenting the vector similarity map into independent connected regions using a region growing algorithm, and using these independent connected regions as candidate moving target regions.
5. The time-domain integral image sensing method for fusing antenna domain information according to claim 1, characterized in that S4 includes:
acquiring multi-frame data of the image sequence and locating the foreground pixel set;
determining the boundary coordinates of the candidate moving target region based on the foreground pixel set;
segmenting the target sub-image based on the boundary coordinates, statistically analyzing its pixel distribution to generate a color histogram, and converting the target sub-image to grayscale to obtain a grayscale image;
calculating the gray-level co-occurrence matrix based on the grayscale image to construct texture feature data, and applying a gradient operator to obtain edge gradient features;
concatenating the color histogram, texture feature data and edge gradient features to obtain the appearance feature sequence of the candidate moving target region in the multi-frame data of the image sequence.
6. The time-domain integral image sensing method for fusing antenna domain information according to claim 1, characterized in that S5 includes:
acquiring multiple frames of images corresponding to consecutive timestamps in the video stream data, and extracting appearance feature vectors for candidate moving target regions in the multiple frames of images to generate appearance feature sequences;
obtaining the inter-frame difference sequence by calculating the similarity values of adjacent feature vectors in the appearance feature sequence;
performing numerical differentiation on the inter-frame difference sequence to obtain the gradient magnitude, and generating a change rate sequence based on the gradient magnitude;
if the inter-frame difference sequence is lower than the preset continuity threshold and the maximum gradient magnitude in the change rate sequence is less than the preset slow change threshold, determining that the features of the candidate moving target region are stable.
7. The time-domain integral image sensing method for fusing antenna domain information according to claim 1, characterized in that S6 includes:
acquiring video streams of the monitored area and extracting texture vectors, then generating appearance feature sequences based on the texture vectors;
calculating the distance difference between texture vectors in the appearance feature sequence, and marking an abnormal time segment if the distance difference exceeds the threshold or the texture vector returns to zero;
extracting radio frequency (RF) signal segments from the antenna detection information based on the abnormal time segments, and obtaining the signal strength attenuation evolution and Doppler frequency shift characteristics by analyzing the RF signal segments;
if the signal strength attenuation decreases, determining that an object is blocking the signal; if the signal strength attenuation is stable and the Doppler frequency shift characteristics show spectral broadening, determining that the target is rapidly deforming.
8. The time-domain integral image sensing method for fusing antenna domain information according to claim 1, characterized in that S7 includes:
acquiring image sequences and extracting initial candidate regions, calculating the region feature vectors of the initial candidate regions, and determining the feature-stable candidate moving target regions by calculating the cosine similarity of the region feature vectors;
obtaining the centroid coordinates of candidate moving target regions with stable features and generating spatiotemporal data points;
constructing an associated linked list based on the spatiotemporal data points to identify potential targets;
extracting the time intervals of potential targets and arranging the centroid coordinates to generate spatial trajectories;
recording the time intervals and spatial trajectories of potential targets throughout the entire image sequence.
9. The time-domain integral image sensing method for fusing antenna domain information according to claim 1, characterized in that it further includes S8, which determines the corresponding minimum bounding rectangle region on each frame of the original image based on the spatial trajectory and time interval of the potential target, thereby completing target localization, and which specifically includes:
obtaining the spatial trajectory and time interval of the potential target, and extracting the corresponding frame image from the original image sequence according to the time interval;
performing coordinate mapping on the frame image using the geographic coordinate points in the spatial trajectory to determine the initial center point, and calculating the candidate region corresponding to the initial center point by combining the motion vectors between adjacent frame images;
if the pixel intensity distribution within the candidate region meets a preset threshold, determining the minimum bounding rectangle of the candidate region by a boundary refinement algorithm to complete target localization on each frame of the original image.
10. The time-domain integral image sensing method for fusing antenna domain information according to claim 9, characterized in that it further includes S9, which performs cross-modal feature alignment and weighted fusion of the located potential target region image patch and the corresponding antenna feature information to construct a joint discriminative feature vector, and inputs it into a pre-trained convolutional neural network classifier for adaptive classification to obtain the target's category confidence and complete target recognition, and which specifically includes:
obtaining the image patch of the potential target region after localization and the corresponding antenna feature information, and matching the image pixel coordinates with the antenna signal strength distribution through a spatial mapping relationship to obtain a preliminarily aligned original multimodal dataset;
extracting image patch texture details and antenna feature phase offsets from the original multimodal dataset, and using an affine transformation matrix to perform cross-modal feature alignment on feature spaces of different dimensions to obtain an aligned high-dimensional feature matrix;
dynamically allocating the feature matrix according to the preset signal-to-noise ratio weights, and deeply interacting the visual semantic information and electromagnetic spectrum features through a weighted fusion algorithm to obtain a joint discriminative feature vector containing multi-dimensional attributes;
inputting the joint discriminative feature vector into a pre-trained convolutional neural network classifier, and, if the activation value output by the classifier exceeds the preset discrimination threshold, executing adaptive classification logic based on the joint discriminative feature vector to obtain the target's category confidence and complete target recognition.