Moving target optical detection and identification system under large airspace coverage
By combining optical detection arrays and image processing models, the contradiction between imaging accuracy, frame rate, and transmission rate in optical detection under large spatial coverage is resolved, enabling high-precision, high-frame-rate imaging and recognition of tiny moving targets, thus meeting the real-time requirements of high-speed optical detection.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- DALIAN UNIV OF TECH
- Filing Date
- 2025-01-06
- Publication Date
- 2026-06-26
Figure CN119942309B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of optical imaging and detection, and relates to an optical detection and recognition system for moving targets under large airspace coverage.
Background Technology
[0002] Optical inspection supports functions such as inspecting product surfaces, non-contact measurement of product dimensions, locating targets, and identifying target features. It is widely used in automated inspection, intelligent manufacturing, traffic monitoring, autonomous driving, logistics automation, and medical imaging. However, large spatial coverage and high-precision imaging conflict with each other. On the one hand, scenarios such as traffic monitoring and autonomous driving need large spatial coverage but do not demand centimeter-level imaging accuracy. On the other hand, scenarios such as intelligent manufacturing and medical imaging need high imaging accuracy but often cover only a small part of the surrounding space. Current optical inspection equipment struggles in scenarios that require both large spatial coverage and high imaging accuracy. High imaging accuracy also conflicts with high frame rate. High-speed scenarios such as bullet penetration, explosion effects, and fluid collisions require continuous, rapid capture of instantaneous material interactions, which lowers imaging accuracy; maintaining imaging accuracy then requires compromising on spatial coverage. Industrial cameras, which are widely used optical inspection devices, currently reach a maximum resolution of 150 million pixels, but at a frame rate of only 6.2 fps, which is insufficient for real-time optical inspection of moving targets under large spatial coverage. Even if spatial coverage is reduced so that imaging accuracy can be maintained with fewer pixels, a high frame rate still generates a large volume of optical image data and significantly increases the data transmission load. In optical inspection applications, spatial coverage, imaging accuracy, imaging frame rate, and data transmission rate therefore constrain one another, and the four must be traded off according to the actual imaging needs.
Summary of the Invention
[0003] To address the aforementioned problems in the prior art, this invention provides an all-optical long-distance transmission optical detection and recognition system for moving targets under large spatial coverage, achieving high-precision, high-frame-rate, and high-transmission-rate imaging and recognition of tiny moving targets under large spatial coverage, thus overcoming the mutual constraints between spatial coverage, imaging accuracy, frame rate, and data transmission rate in optical detection.
[0004] To achieve the above-mentioned technical objectives, the present invention adopts the following technical solution:
[0005] An optical detection and recognition system for moving targets under large airspace coverage includes: a data acquisition front-end, an optical domain transmission link, a detection and recognition back-end, and a real-time display terminal; the data acquisition front-end, the optical domain transmission link, and the detection and recognition back-end are connected sequentially via network cables and network ports that meet the data transmission rate requirements; the detection and recognition back-end transmits images to the real-time display terminal via a display cable;
[0006] The data acquisition front end is used to segment the large airspace containing moving targets, acquire optical images of each sub-airspace, convert them into a transmission data format, and then send them.
[0007] The optical domain transmission link is used to transmit the optical image data of each sub-spatial domain obtained by the data acquisition front end in a high-capacity, high-speed, long-distance, and low-loss manner.
[0008] The detection and identification backend is used to receive the optical image data of the sub-spatial domain, and use the optical image data of each sub-spatial domain to detect whether there is a moving target in the large spatial domain and to identify the moving target.
[0009] The real-time display terminal is used to receive and display the optical images of each sub-spatial domain after processing by the detection and recognition backend.
[0010] In some possible implementations, the data acquisition front end includes: a spatial coverage optical detection array and a data acquisition industrial control computer;
[0011] The airspace coverage optical detection array includes several optical detection devices for field-of-view segmentation of a large airspace. Each optical detection device is responsible for acquiring optical images of a portion of the airspace within the large airspace, obtaining several sub-airspace optical images, and converting the several sub-airspace optical images into a transmission format.
[0012] The data acquisition industrial control computer is provided in multiple units, and the number of units is the same as the number of optical detection devices in the airspace coverage optical detection array. The multiple data acquisition industrial control computers are connected to the multiple optical detection devices one by one through data acquisition lines, and are used to receive the optical image data of each sub-airspace converted into the transmission format and temporarily store it in the storage medium of the industrial control computer.
[0013] In some possible implementations, the optical domain transmission link includes: an optical network unit and an optical line terminal; the optical network unit transmits sub-spatial optical image data to the optical line terminal via an optical fiber;
[0014] The optical network unit is provided in multiple units, and the number of units is the same as the number of data acquisition industrial control computers. It is used to modulate the sub-spatial optical image data received by the data acquisition industrial control computers into the optical domain.
[0015] The optical line terminal is used to integrate and demodulate multiple optical domain signals.
[0016] In some possible implementations, the optical line terminal further includes: an optical port module and an electrical port module;
[0017] Multiple optical port modules are provided, equal in number to the optical network units. Each is inserted into an interface of the optical line terminal to receive optical image data from the sub-spatial domains.
[0018] The electrical port module is inserted into an interface of the optical line terminal to send the demodulated optical image data of each sub-spatial domain.
[0019] In some possible implementations, the detection and recognition backend includes an image processing server; the image processing server receives and stores optical image data from each sub-spatial domain for detecting whether there are moving targets in the large spatial domain and for recognizing the moving targets.
[0020] The beneficial effects of this invention are:
[0021] 1) By dividing the large airspace through an optical detection array, each optical detection device can cover part of the airspace with high imaging accuracy, breaking through the constraint between airspace coverage and imaging accuracy, and realizing large airspace optical imaging for small targets according to actual needs.
[0022] 2) Spatial segmentation greatly improves the imaging frame rate while ensuring imaging accuracy, providing a solution for high-precision optical detection of moving targets under large spatial coverage.
[0023] 3) Building a sub-spatial optical image data transmission channel through an optical domain transmission link can ensure high imaging frame rate requirements while also meeting the requirements for large capacity and high transmission rate.
[0024] 4) The detection and recognition backend processes and recognizes the optical images of each front-end sub-spatial domain in real time, which improves recognition accuracy and reduces the probability of false and missed detections. The image enhancement raises brightness and contrast while suppressing noise amplification and preserving detail, so that detection and recognition on the enhanced images is accurate and stable.
[0025] 5) The real-time display terminal presents real-time optical images of each sub-space and key information about small moving targets.
Attached Figure Description
[0026] Figure 1 is a schematic diagram of the structure of the all-optical long-distance transmission moving target optical detection and identification system under large airspace coverage of the present invention;
[0027] Figure 2 is a schematic diagram of the large spatial field of view segmentation and optical detection array arrangement according to an embodiment of the present invention;
[0028] Figure 3 is a schematic diagram of the optical image data transmission link for the sub-spatial domains under large spatial coverage according to an embodiment of the present invention;
[0029] Figure 4 is a schematic diagram of the algorithm for detecting and recognizing moving targets in a large airspace according to an embodiment of the present invention.
Detailed Implementation
[0030] The present invention will now be described in detail with reference to the accompanying drawings and embodiments. The accompanying drawings constitute a part of this application and are used together with the embodiments of the present invention to illustrate the principles of the present invention, but are not intended to limit the scope of the present invention.
[0031] Figure 1 is a schematic diagram of the structure of the all-optical long-distance transmission moving target optical detection and recognition system with large airspace coverage provided by the present invention. The system comprises: a data acquisition front-end, an optical domain transmission link, a detection and recognition back-end, and a real-time display terminal; wherein the data acquisition front-end, the optical domain transmission link, and the detection and recognition back-end are connected sequentially via network cables and network ports that meet the data transmission rate requirements; the detection and recognition back-end transmits the image to the real-time display terminal via a display cable;
[0032] The data acquisition front end is used to segment the large airspace containing moving targets, acquire optical images of each sub-airspace, convert them into a transmission data format, and then send them.
[0033] The optical domain transmission link is used for high-capacity, high-speed, long-distance, and low-loss transmission of optical image data from various sub-spatial domains acquired by the data acquisition front-end.
[0034] The detection and identification backend is used to receive optical image data of each sub-spatial domain, and use the received optical image data of each sub-spatial domain to detect whether there is a moving target in the large spatial domain and to identify the moving target.
[0035] The real-time display terminal is used to receive and display the optical images of each sub-spatial domain after processing by the detection and recognition backend.
[0036] In the above embodiment, the large airspace measures 12.0 m long × 11.3 m wide × 1.0 m high. The small moving target enters the airspace vertically at a maximum speed of 20 m/s, and its minimum size is 5 mm.
[0037] The data acquisition front-end includes an airspace coverage optical detection array and data acquisition industrial control computers. The optical detection array uses Hikvision MV-CH210-90YM/YC industrial cameras with a resolution of 5120×4096 and a frame rate of 222 fps. The designed field of view is 3.5 m × 2.8 m, which at the 5120×4096 resolution gives an imaging accuracy of 0.68 mm/pixel. When paired with a Hikvision MVL-AF3528M-M42 lens with a focal length of 35 mm, the working distance is 5.3 m, with an angle of view of 36.44° along the long side of the field of view (3.5 m) and 29.50° along the short side (2.8 m). When paired with a Hikvision MVL-AF5040M-M42 lens with a focal length of 50 mm, the working distance is 7.4 m, with an angle of view of 25.95° along the long side (3.5 m) and 20.88° along the short side (2.8 m). A layered optical detection array is built by using the same type of industrial camera with lenses of two different focal lengths. The upper layer consists of 10 industrial cameras arranged in pairs, facing each other, 5 meters above the large field of view. The lower layer consists of 10 industrial cameras installed in one-to-one correspondence with the upper-layer cameras, with a distance of 1 meter between the two layers. For a schematic diagram of the large field of view segmentation and optical detection array arrangement, please refer to Figure 2.
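The figures in this embodiment can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the patent: the variable and function names are illustrative, the angle of view uses a simple pinhole approximation (so it need not match the quoted lens angles exactly), and the frame count assumes the target crosses the 1.0 m dimension at its maximum speed.

```python
import math

# Values quoted in the embodiment; the names are illustrative.
RESOLUTION_PX = (5120, 4096)    # long side, short side
FIELD_OF_VIEW_M = (3.5, 2.8)    # long side, short side
CAMERA_FPS = 222
MAX_SPEED_M_S = 20.0
MIN_TARGET_MM = 5.0
AIRSPACE_HEIGHT_M = 1.0


def pixel_scale_mm(fov_m, pixels):
    """Object-space footprint of one pixel, in millimetres."""
    return fov_m * 1000.0 / pixels


def angle_of_view_deg(fov_m, working_distance_m):
    """Full angle subtended by one side of the field of view (pinhole model)."""
    return math.degrees(2.0 * math.atan(fov_m / (2.0 * working_distance_m)))


scale_mm = pixel_scale_mm(FIELD_OF_VIEW_M[0], RESOLUTION_PX[0])
target_px = MIN_TARGET_MM / scale_mm                      # smallest target, in pixels
step_per_frame_mm = MAX_SPEED_M_S / CAMERA_FPS * 1000.0   # travel between frames
frames_in_airspace = AIRSPACE_HEIGHT_M / MAX_SPEED_M_S * CAMERA_FPS
angle_35mm = angle_of_view_deg(FIELD_OF_VIEW_M[0], 5.3)   # 35 mm lens, long side
angle_50mm = angle_of_view_deg(FIELD_OF_VIEW_M[0], 7.4)   # 50 mm lens, long side

print(f"pixel scale: {scale_mm:.3f} mm/pixel")
print(f"minimum target spans about {target_px:.1f} pixels")
print(f"target moves {step_per_frame_mm:.1f} mm between frames")
print(f"frames captured while crossing: {frames_in_airspace:.1f}")
print(f"long-side angle of view: {angle_35mm:.2f} deg, {angle_50mm:.2f} deg")
```

Under these assumptions a 5 mm target covers only a handful of pixels and is captured in roughly eleven frames, which is why both the pixel scale and the frame rate matter in this design.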
[0038] The industrial camera and the acquisition card transmit sub-spatial optical image data via a CoaXPress cable. Each industrial camera is equipped with a data acquisition card, and each data acquisition card requires a data acquisition industrial control computer to temporarily store the sub-spatial optical image data.
[0039] Figure 3 is a schematic diagram of the optical image data transmission link for each sub-spatial domain under the large airspace coverage of this embodiment. The optical domain transmission link includes several optical network units and an optical line terminal. The sub-spatial optical image data is transmitted between each data acquisition industrial control computer and its optical network unit over Category 5e or Category 6 network cables. The sub-spatial optical image data is transmitted at 24 fps. Based on the resolution of the industrial camera, the transmission rate of a single data transmission channel needs to reach 480 Mbps. Therefore, each data acquisition industrial control computer and each optical network unit needs to be equipped with a gigabit network port.
[0040] The optical line terminal (OLT) also includes optical port modules and electrical port modules. The optical network unit, optical port modules, OLT, and 10 Gigabit electrical port modules can all be TP-LINK products, specifically models TL-NGP650-S2-4G, TL-SM610-OLT64, TL-NOLT800-16-24T2Q, and TL-SM510U, respectively. After the 20 channels of sub-spatial optical image data are integrated, the aggregate data volume reaches 9.6 Gb per second (20 × 480 Mbps). The maximum transmission rate of 10 Gbps therefore meets the requirement for real-time transmission of the sub-spatial optical image data to the image processing server in the detection and recognition backend. The image processing server needs a 10 Gigabit Ethernet port, and the OLT and the image processing server must be connected via a Cat 6e Ethernet cable to reach the 10 Gbps maximum transmission rate.
[0041] As shown in Figure 4, the optical image data of each sub-spatial domain, sent by the data acquisition front end, arrives over the optical domain transmission link and is stored in the storage medium of the image processing server. This data is used to detect the presence of moving targets in the large airspace under varying illumination conditions and to identify the moving targets. Sub-spatial optical images captured under conditions exceeding normal illumination are used as the reference illumination images. The specific implementation method is as follows:
[0042] S1: The sub-spatial optical images and the corresponding reference illumination images are labeled and preprocessed to obtain a small target dataset;
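The link budget described above can be checked numerically. This is a rough sketch using only the figures quoted in the text (20 channels, 480 Mbps per channel, 24 fps, 5120×4096 pixels, a 10 Gbps uplink); the variable names are illustrative. The implied budget of about one bit per pixel is an inference from those figures, not something the source states, and it suggests the transmission format is considerably more compact than raw 8-bit frames.

```python
# Figures quoted in the embodiment; the names are illustrative.
CHANNELS = 20
PER_CHANNEL_MBPS = 480
TRANSMIT_FPS = 24
WIDTH_PX, HEIGHT_PX = 5120, 4096
UPLINK_GBPS = 10

# Bit budget available for one transmitted frame on one channel.
bits_per_frame = PER_CHANNEL_MBPS * 1_000_000 / TRANSMIT_FPS
bits_per_pixel = bits_per_frame / (WIDTH_PX * HEIGHT_PX)

# Aggregate load on the OLT uplink once all channels are merged.
aggregate_gbps = CHANNELS * PER_CHANNEL_MBPS / 1000
headroom_gbps = UPLINK_GBPS - aggregate_gbps

# For comparison only: an uncompressed 8-bit monochrome stream (assumption).
raw_8bit_gbps = WIDTH_PX * HEIGHT_PX * 8 * TRANSMIT_FPS / 1e9

print(f"budget per frame: {bits_per_frame / 1e6:.1f} Mbit "
      f"({bits_per_pixel:.2f} bit/pixel)")
print(f"aggregate: {aggregate_gbps:.1f} Gbps, headroom: {headroom_gbps:.1f} Gbps")
print(f"raw 8-bit stream per channel would be {raw_8bit_gbps:.2f} Gbps")
```

The aggregate leaves only a small margin below the 10 Gbps uplink, so any protocol overhead would need to fit within that margin.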
[0043] S2: Construct a multi-scale enhancement model. Train the multi-scale enhancement model using sub-spatial optical images and corresponding reference illumination images from the small target dataset to obtain the MSEM model. The constructed MSEM model enhances each sub-spatial optical image, which helps to better preserve and transmit multi-scale feature information, complete the fusion of deep features, and capture more detailed features. By optimizing the network architecture of the MSEM model, redundancy is reduced and computational efficiency is improved, making the network more lightweight and easier to train, enabling recognition in complex environments.
[0044] S3: Construct a small target semantic segmentation model. Train the small target semantic segmentation model using the sub-spatial optical image enhanced by the MSEM model and the label map in the small target dataset to obtain the DR-UNet model, which can be used to segment and recognize the enhanced image. By embedding a channel spatial attention mechanism in the backbone network of the constructed DR-UNet model, the model's ability to perceive small targets is enhanced. The network model has a more in-depth structure, which improves its feature extraction and expression capabilities, enabling the model to better adapt to image segmentation tasks of different scales and complexities, improve the model's generalization ability, and optimize the network structure and performance.
[0045] S4: Perform contour detection on the segmented image and extract the target edge features of the segmented image;
[0046] S5: Based on the target edge features, the target center is located to obtain the position of the small target.
[0047] In step S2, the multi-scale enhancement model includes a decomposition module and an enhancement module;
[0048] The decomposition module comprises three sub-modules: shallow feature extraction, activation sequence, and final reconstruction layer. The shallow feature extraction sub-module contains a convolutional layer with 64 channels and a kernel size of 3, used for preliminary feature extraction from the input sub-spatial optical image and the corresponding reference illumination image. The activation sequence sub-module performs deeper feature extraction on the preliminarily extracted features, consisting of five convolutional blocks connected sequentially, each composed of a convolutional layer and a Leaky-ReLu activation function. Each convolutional layer has 64 channels and a kernel size of 3. The final reconstruction layer sub-module remaps the feature map extracted by the activation sequence sub-module back to the original image space to generate a reconstructed image or feature map. This final reconstruction layer sub-module contains a convolutional layer and a sigmoid function. The convolutional layer converts the feature map output by the activation sequence sub-module from 64 channels back to 4 channels, and the sigmoid function maps the output to the 0-1 range. The feature map is then divided into reflectance and illumination components.
[0049] The enhancement module comprises three sub-modules: feature extraction, multi-scale feature fusion, and output generation. The feature extraction sub-module includes one convolutional layer and three deep feature extraction layers. The convolutional layer has 64 channels and a kernel size of 3; it receives the reflectance and illumination components, concatenated along the channel dimension, as input. Through this layer, the network captures basic texture and edge information in the input image, completing the initial extraction of feature information. The output of the convolutional layer is the input to the first deep feature extraction layer, which downsamples it through a convolution with a stride of 2 to reduce the spatial resolution of the feature map and enlarge its receptive field. The second deep feature extraction layer receives the output of the first as input and downsamples it again through a convolution with a stride of 2. Compared to the first layer, this layer captures more abstract, higher-level feature information. The third deep feature extraction layer receives the output of the second as input and downsamples it through a third convolution with a stride of 2. Through this layer, the network further extracts and refines feature information to support the final enhancement. The multi-scale feature fusion sub-module upsamples the output of each deep feature extraction layer to the spatial resolution of the previous layer and concatenates it with the output of that previous layer along the channel dimension, fusing features at different scales and levels. The output generation sub-module outputs the fused feature map as the enhanced illumination component.
[0050] The enhanced target image is reconstructed by multiplying the reflectance component and the enhanced illumination component element by element.
[0051] In step S2, during the training of the multi-scale enhancement model, a pair consisting of a sub-spatial optical image and its reference illumination image is randomly selected for each training batch; the loss is calculated through forward propagation and backpropagated, and the optimizer updates the model parameters; at the end of every 5 epochs, the model performance is evaluated and the best model is recorded.
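To see what three successive stride-2 convolutions do to the feature map, the standard convolution output-size formula can be applied level by level. This is a small sketch under stated assumptions: the source gives only the stride, so the kernel size of 3 and padding of 1 for the downsampling convolutions are assumptions, as is feeding a full 5120×4096 frame rather than a cropped patch. The function names are illustrative.

```python
def conv_output_size(size, kernel=3, stride=2, padding=1):
    """Standard convolution output-size formula for one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pyramid_sizes(height, width, levels=3):
    """Spatial size of the input and of each successive stride-2 layer."""
    sizes = [(height, width)]
    for _ in range(levels):
        height = conv_output_size(height)
        width = conv_output_size(width)
        sizes.append((height, width))
    return sizes


for level, (h, w) in enumerate(pyramid_sizes(4096, 5120)):
    print(f"level {level}: {h} x {w}")
```

Each level halves both dimensions, so the fusion sub-module has to upsample by a factor of 2 per level before concatenating with the shallower feature map.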
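The reconstruction step is a per-pixel product, which the following sketch illustrates on plain nested lists. It is not the patented implementation: the split of the 4-channel decomposition output into three reflectance channels plus one illumination channel is an assumption (the source says only that the map is divided into the two components), and the helper names are hypothetical.

```python
import math


def sigmoid(value):
    """Map a raw network output to the 0-1 range."""
    return 1.0 / (1.0 + math.exp(-value))


def split_components(raw_pixel):
    """Split one 4-channel output pixel into (reflectance, illumination).

    Assumption: the first three channels are reflectance, the last is illumination.
    """
    activated = [sigmoid(v) for v in raw_pixel]
    return activated[:3], activated[3]


def reconstruct_pixel(reflectance, enhanced_illumination):
    """Element-wise product of reflectance and enhanced illumination."""
    return [r * enhanced_illumination for r in reflectance]


def reconstruct_image(reflectance_map, illumination_map):
    """Apply the per-pixel product over a whole image stored as nested lists."""
    return [
        [reconstruct_pixel(r, l) for r, l in zip(r_row, l_row)]
        for r_row, l_row in zip(reflectance_map, illumination_map)
    ]


reflectance, illumination = split_components([0.0, 1.0, -1.0, 2.0])
demo_pixel = reconstruct_pixel([0.2, 0.4, 0.8], 0.5)
demo_image = reconstruct_image([[[0.2, 0.4, 0.8]]], [[0.5]])
print(reflectance, illumination, demo_pixel)
```

Because both factors lie in the 0-1 range after the sigmoid, the product also stays in that range, so the enhanced image needs no additional clipping in this sketch.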
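The training schedule described above (random pair per batch, evaluation every 5 epochs, keep the best) can be outlined independently of any deep learning framework. The sketch below is a framework-free skeleton: `train_step` and `evaluate` are hypothetical callables standing in for the forward pass, loss, backpropagation, optimizer step, and validation metric, and it assumes that a higher evaluation score is better.

```python
import random


def train_with_periodic_eval(pairs, train_step, evaluate, epochs,
                             batches_per_epoch, eval_every=5, seed=0):
    """Run the schedule and return (best_epoch, best_score).

    pairs: list of (sub-spatial image, reference illumination image) items.
    train_step: hypothetical callable doing forward, loss, backward, update.
    evaluate: hypothetical callable returning a score (higher is better).
    """
    rng = random.Random(seed)
    best_epoch, best_score = None, float("-inf")
    for epoch in range(1, epochs + 1):
        for _ in range(batches_per_epoch):
            train_step(rng.choice(pairs))      # one randomly selected pair
        if epoch % eval_every == 0:            # end of every 5th epoch
            score = evaluate()
            if score > best_score:             # record the best model so far
                best_epoch, best_score = epoch, score
    return best_epoch, best_score


# Stub usage: the stubs only count calls and replay fixed scores.
demo_calls = []
demo_scores = iter([0.5, 0.7, 0.6])
demo_best = train_with_periodic_eval(
    pairs=[("image_a", "reference_a"), ("image_b", "reference_b")],
    train_step=demo_calls.append,
    evaluate=lambda: next(demo_scores),
    epochs=15,
    batches_per_epoch=4,
)
print(demo_best, len(demo_calls))
```

In a real training script, recording the best model would also mean saving its weights at that point; the skeleton tracks only the epoch and score.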
[0052] In step S3, the small target semantic segmentation model includes a backbone network DCSR and a feature fusion layer;
[0053] The backbone network DCSR includes an input layer, convolutional layers, and layers composed of multiple basic building blocks (CSResidual blocks). The input layer takes an enhanced sub-spatial optical image and a label image from the small target dataset as input. The convolutional layers perform preliminary feature extraction on the input image. The layers composed of basic building blocks comprise four layers: layer 1 contains 3 basic building blocks, layer 2 contains 4 basic building blocks, layer 3 contains 6 basic building blocks, and layer 4 contains 3 basic building blocks and a dilated spatial pyramid pooling module. Each basic building block consists of two DC modules connected sequentially. Each DC module includes a sequentially connected convolutional layer, batch normalization layer, and Leaky-ReLU activation function. In the second DC module, a channel attention module and a spatial attention module are inserted sequentially between the batch normalization layer and the Leaky-ReLU activation function to enhance the network's ability to capture important information in the image. In the DC module, the convolutional layer extracts image features: a convolution kernel is applied to each input channel separately, producing the same number of channels; these channels are stacked along the channel dimension; and a pointwise convolution then combines the stacked feature maps, weighting and merging the information from different channels to output the feature map. The batch normalization layer normalizes each channel of the output feature map to keep the distribution of the output data stable. The Leaky-ReLU activation function maps the normalized feature map to a new space to generate the final feature map. The dilated spatial pyramid pooling module processes the output features of layer 3 through four dilated convolutional layers to obtain four feature maps at different scales, and through a global average pooling branch to obtain a global context feature map. After upsampling restores the size of the original input feature map, the four feature maps at different scales and the global context feature map are concatenated along the channel dimension to obtain the concatenated feature map.
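The per-channel convolution followed by a pointwise convolution, as described for the DC module, matches what is commonly called a depthwise separable convolution. One reason to use it is the smaller weight count, which the sketch below compares against a standard convolution. The channel count of 64 and kernel size of 3 are assumptions for illustration (the source does not state the DC module's channel widths), and bias terms are ignored.

```python
def standard_conv_params(c_in, c_out, kernel=3):
    """Weights in a standard convolution: every output sees every input channel."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(c_in, c_out, kernel=3):
    """Weights in a per-channel (depthwise) conv plus a 1x1 pointwise conv."""
    depthwise = kernel * kernel * c_in   # one kernel per input channel
    pointwise = c_in * c_out             # 1x1 conv mixing the channels
    return depthwise + pointwise


C_IN = C_OUT = 64  # assumed widths, for illustration only
standard = standard_conv_params(C_IN, C_OUT)
separable = depthwise_separable_params(C_IN, C_OUT)
print(f"standard: {standard}, separable: {separable}, "
      f"ratio: {standard / separable:.1f}x")
```

With these assumed widths the separable form needs roughly an eighth of the weights, which is consistent with the description's emphasis on a lightweight network.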
[0054] The feature fusion layer adopts a dense connection mechanism. In each module that builds the overall network framework, the outputs of all previous modules are spliced together and input into the basic building blocks of the backbone network DCSR contained in the current layer, and then transmitted to the next layer.
[0055] The dense connection mechanism of the feature fusion layer is implemented as follows. When building the overall network framework of the small target semantic segmentation model, the input image is first fed into the backbone network DCSR, where the preliminary feature extraction module X0_0 applies convolution operations to extract basic feature information from the image. The extracted preliminary features are then fed into a series of deep feature extraction modules X1_0, X2_0, X3_0, and X4_0. These five modules form the backbone network DCSR, which gradually mines deeper feature information in the image by downsampling and further feature extraction. As the number of layers increases, the depth of feature extraction also increases, enabling the capture of more complex and subtle image features. The output of module X1_0 is upsampled, concatenated with the output of X0_0 along the channel dimension, and fed into module X0_1. Module X0_1 contains one basic building block of the backbone network DCSR and fuses and outputs feature information at different scales, enhancing the model's ability to capture detailed features. The other layers perform the same operations in sequence: each is concatenated with the outputs of the preceding layers, fed into the basic building block of the backbone network DCSR contained in that layer, and passed to the next layer. This completes the dense connection of the entire network, so that each layer is connected to all preceding layers and richer feature information is obtained.
[0056] In step S3, when training the small target semantic segmentation model, a loss function combining DiceLoss and FocalLoss is used to better address class imbalance and pixel-level imbalance; the loss function is:
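The connection pattern can be made concrete by listing the inputs of every module. The sketch below assumes a nested U-Net (UNet++) style layout, in which module Xi_j receives all earlier modules at the same level plus the upsampled output of the module one level deeper; the source spells this out only for X0_1, so the generalization to the other modules is an assumption, as are the function names.

```python
def node_inputs(level, column):
    """Inputs of module X{level}_{column} under the assumed nested layout."""
    if column == 0:
        # Backbone modules: the image, or the downsampled previous backbone module.
        return ["image"] if level == 0 else [f"down(X{level - 1}_0)"]
    same_level = [f"X{level}_{k}" for k in range(column)]
    deeper = f"up(X{level + 1}_{column - 1})"
    return same_level + [deeper]


def build_graph(depth=5):
    """Map every module name to the list of its inputs."""
    graph = {}
    for column in range(depth):
        for level in range(depth - column):
            graph[f"X{level}_{column}"] = node_inputs(level, column)
    return graph


graph = build_graph()
for name in ("X0_0", "X1_0", "X0_1", "X0_2"):
    print(name, "<-", graph[name])
```

With five backbone modules this layout yields fifteen modules in total, and the input list of X0_1 reproduces the concatenation of X0_0 with the upsampled X1_0 described in the text.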
[0057] Loss=βDice·DiceLoss+(1-βDice)·FocalLoss (1)
[0058] Here, βDice is a hyperparameter used to adjust the weights between the DiceLoss and FocalLoss loss functions;
[0059] The DiceLoss loss function is calculated from the Dice coefficient. The formulas for the Dice coefficient and DiceLoss are as follows:
[0060] Dice = 2|X∩Y| / (|X| + |Y|) (2)
DiceLoss = 1 - Dice (3)
[0061] Where X represents the predicted image obtained after inputting the sub-spatial optical image into the small target semantic segmentation model, Y represents the true label segmentation result, |X∩Y| represents the number of elements in the intersection between X and Y, and |X| and |Y| represent the number of elements in X and Y, respectively;
[0062] The FocalLoss loss function reduces the loss weight for easily classified samples and increases the loss weight for difficult-to-classify samples, allowing the model to focus more on samples that are difficult to classify (such as small objects). The formula for the FocalLoss loss function is as follows:
[0063] FocalLoss = -α(1-p)^λ log(p) (4)
[0064] Where p is the probability that the small object semantic segmentation model predicts it as the correct category; α is a sample weight used to adjust the weights of easy-to-classify and hard-to-classify samples, usually the reciprocal of the category frequency; λ represents the weight of hard-to-classify samples, used to measure the difference between hard-to-classify and easy-to-classify samples.
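The combined loss of equation (1) can be sketched on flat lists of per-pixel probabilities and 0/1 labels. This is an illustrative version only: the soft Dice form (products of probabilities instead of set counts) is a common differentiable relaxation and an assumption here, as are the small epsilon terms and the default values chosen for α, λ, and βDice.

```python
import math


def dice_loss(pred, target, eps=1e-7):
    """1 - Dice, with a soft intersection over predicted probabilities."""
    intersection = sum(p * t for p, t in zip(pred, target))
    dice = (2.0 * intersection + eps) / (sum(pred) + sum(target) + eps)
    return 1.0 - dice


def focal_loss(pred, target, alpha=0.25, lam=2.0, eps=1e-7):
    """Mean of -alpha * (1 - p)^lam * log(p), where p is the probability
    assigned to the correct class of each pixel."""
    total = 0.0
    for p, t in zip(pred, target):
        p_correct = p if t == 1 else 1.0 - p
        p_correct = min(max(p_correct, eps), 1.0 - eps)  # avoid log(0)
        total += -alpha * (1.0 - p_correct) ** lam * math.log(p_correct)
    return total / len(pred)


def combined_loss(pred, target, beta_dice=0.5):
    """Weighted sum of the two losses, following equation (1)."""
    return (beta_dice * dice_loss(pred, target)
            + (1.0 - beta_dice) * focal_loss(pred, target))


perfect = combined_loss([1.0, 0.0, 1.0, 0.0], [1, 0, 1, 0])
easy = focal_loss([0.9], [1])
hard = focal_loss([0.1], [1])
print(f"perfect: {perfect:.6f}, easy pixel: {easy:.6f}, hard pixel: {hard:.6f}")
```

The easy and hard single-pixel cases show the intended behavior of the focal term: a confidently correct pixel contributes almost nothing, while a badly misclassified pixel dominates the loss.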
[0065] In step S4, edge detection comprises noise-reduction filtering, gradient magnitude and direction calculation, non-maximum suppression, and hysteresis thresholding. The Canny operator is selected for contour detection of the segmented image. In the filtering stage, the traditional Canny algorithm smooths the image with a Gaussian function to reduce the impact of noise on edge detection. However, while Gaussian filtering removes noise, it may also blur or lose important edge information in the image. For small targets, blurred edge information can significantly affect the final localization and tracking. This embodiment therefore uses bilateral filtering instead of Gaussian filtering for image preprocessing, which removes noise interference while better preserving the target's edge information. The Sobel operator is used to calculate the gradient magnitude and direction, and a first-order differential operator, applied in the horizontal and vertical directions, is used to finely extract the image edges. Non-maximum suppression is then performed using the gradient information so that only local maxima along the edges are retained. Finally, adaptive thresholds are selected according to actual needs, and edge linking is performed on the processed image to obtain complete and accurate edge detection results. This suppresses false edges and noisy edges, improves edge detection performance, and supports accurate localization.
[0066] In step S5, to obtain the coordinates of the target center point, the target is treated as a two-dimensional object of uniform density, and its center point is determined by finding its centroid. In a two-dimensional image, the centroid of the target is the average position of all its pixels, also known as the center of mass. The centroid of the contour is located by calculating spatial moments. To stabilize the positioning results, Kalman filtering is applied for further smoothing and noise reduction, together with dynamic weight adjustment, to keep the center coordinates accurate.
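The reason for swapping Gaussian filtering for bilateral filtering can be shown on a one-dimensional step edge. The sketch below is a simplified 1D illustration, not the 2D image filter used in the embodiment; the signal values, window radius, and sigma parameters are all illustrative assumptions.

```python
import math


def gaussian_filter_1d(signal, sigma_s=1.0, radius=2):
    """Weighted average with weights depending only on spatial distance."""
    n, out = len(signal), []
    for i in range(n):
        num = den = 0.0
        for k in range(-radius, radius + 1):
            j = min(max(i + k, 0), n - 1)            # clamp at the borders
            w = math.exp(-(k * k) / (2.0 * sigma_s ** 2))
            num += w * signal[j]
            den += w
        out.append(num / den)
    return out


def bilateral_filter_1d(signal, sigma_s=1.0, sigma_r=10.0, radius=2):
    """Weights also fall off with intensity difference, so edges are kept."""
    n, out = len(signal), []
    for i in range(n):
        num = den = 0.0
        for k in range(-radius, radius + 1):
            j = min(max(i + k, 0), n - 1)
            spatial = math.exp(-(k * k) / (2.0 * sigma_s ** 2))
            diff = signal[j] - signal[i]
            similarity = math.exp(-(diff * diff) / (2.0 * sigma_r ** 2))
            num += spatial * similarity * signal[j]
            den += spatial * similarity
        out.append(num / den)
    return out


# Noisy step edge between index 4 and index 5 (illustrative values).
step = [10, 12, 9, 11, 10, 200, 198, 201, 199, 200]
smoothed_gauss = gaussian_filter_1d(step)
smoothed_bilat = bilateral_filter_1d(step)
edge_gauss = smoothed_gauss[5] - smoothed_gauss[4]
edge_bilat = smoothed_bilat[5] - smoothed_bilat[4]
print(f"edge height after Gaussian: {edge_gauss:.1f}, "
      f"after bilateral: {edge_bilat:.1f}")
```

The Gaussian filter averages across the step and lowers the edge height, while the bilateral filter smooths the noise on each side and leaves the step nearly intact, which is the property the embodiment relies on for small targets.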
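Centroid localization by spatial moments, followed by Kalman smoothing, can be outlined in a few lines. This sketch makes several simplifying assumptions that are not from the source: moments are computed over a filled binary mask rather than a contour, and the Kalman filter is a scalar random-walk model applied to one coordinate, whereas a fast-moving target would more plausibly call for a constant-velocity model. The noise variances are illustrative.

```python
def centroid_from_moments(mask):
    """Centroid (x, y) from the raw spatial moments m00, m10, m01."""
    m00 = m10 = m01 = 0.0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            m00 += value
            m10 += x * value
            m01 += y * value
    if m00 == 0:
        return None                      # empty mask, no target
    return m10 / m00, m01 / m00


def kalman_smooth(measurements, process_var=1e-2, meas_var=1.0):
    """Scalar Kalman filter with a random-walk state model."""
    estimate, error = measurements[0], 1.0
    smoothed = [estimate]
    for z in measurements[1:]:
        error += process_var                      # predict
        gain = error / (error + meas_var)         # weight given to new data
        estimate += gain * (z - estimate)         # update
        error *= 1.0 - gain
        smoothed.append(estimate)
    return smoothed


mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
center = centroid_from_moments(mask)
raw_x = [10.0, 10.4, 9.6, 10.2, 9.8]     # noisy centre coordinate over frames
smooth_x = kalman_smooth(raw_x)
print(center, [round(v, 3) for v in smooth_x])
```

The Kalman gain shrinks as the estimate becomes more certain, so later measurements are weighted less heavily; this is one plausible reading of the "dynamic weight adjustment" mentioned in the text.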
[0067] The image processed by the detection and recognition backend is transmitted to the display terminal in real time to show a large-area coverage moving target optical image.
[0068] The embodiments described above are preferred embodiments of the present invention and do not impose any other limitations on the present invention. Any person skilled in the art may make changes or imitations based on the above content. However, any changes made to the above embodiments based on the essence of the method of the present invention without departing from the scope of the present invention shall still fall within the protection scope of the present invention.
Claims
1. A moving target optical detection and identification system with large airspace coverage, characterized in that, It includes a data acquisition front-end, an optical domain transmission link, a detection and identification back-end, and a real-time display terminal; the data acquisition front-end, optical domain transmission link, and detection and identification back-end are connected sequentially via network cables and network ports that meet the data transmission rate requirements; the detection and identification back-end transmits the image to the real-time display terminal via a display cable; The data acquisition front end is used to segment the large airspace, including moving targets, acquire optical images of each sub-airspace, convert them into transmission data formats, and then send them. The optical domain transmission link is used to transmit the optical image data of each sub-spatial domain obtained by the data acquisition front end; The detection and identification backend is used to receive the optical image data of the sub-spatial domain, and use the optical image data of each sub-spatial domain to detect whether there is a moving target in the large spatial domain and to identify the moving target. The real-time display terminal is used to receive and display the optical images of each sub-spatial domain after processing by the detection and recognition backend. The detection and recognition backend achieves moving target detection and recognition based on the following methods: S1: The sub-spatial optical image and the corresponding reference illuminance image are labeled and preprocessed to obtain a small target dataset; S2: Construct a multi-scale enhancement model by training the multi-scale enhancement model using sub-spatial optical images and corresponding reference illumination images in the small target dataset to obtain the MSEM model; the multi-scale enhancement model includes a decomposition module and an enhancement module. 
S3: Construct a small target semantic segmentation model and train it using the sub-airspace optical images enhanced by the MSEM model and the label maps in the small target dataset to obtain the DR-UNet model, which is used to segment and identify the enhanced images. The small target semantic segmentation model includes a backbone network DCSR and a feature fusion layer.
The backbone network DCSR includes an input layer, convolutional layers, and four layers each composed of multiple basic building blocks. The input layer takes as input the enhanced sub-airspace optical images and the label maps from the small target dataset. The convolutional layers perform preliminary feature extraction on the input image. Of the four layers composed of basic building blocks, layer 1 contains 3 basic building blocks, layer 2 contains 4 basic building blocks, layer 3 contains 6 basic building blocks, and layer 4 contains 3 basic building blocks and a dilated spatial pyramid pooling module.
Each basic building block consists of two DC modules connected in sequence. Each DC module includes a convolutional layer, a batch normalization layer, and a Leaky-ReLU activation function connected in sequence. In the second DC module, a channel attention module and a spatial attention module are inserted in sequence between the batch normalization layer and the Leaky-ReLU activation function, improving the network's ability to capture important information in the image.
In the DC module, the convolutional layer extracts image features. A convolution kernel is applied to each input channel separately, generating the same number of channels, and these channels are stacked along the channel dimension. A pointwise convolution is then applied to the stacked multi-channel feature map, weighting and combining the information from different channels to output a new feature map.
The batch normalization layer normalizes each channel of the output feature map so that the output data has a stable distribution. The Leaky-ReLU activation function maps the normalized feature map to a new space, generating the final feature map.
The dilated spatial pyramid pooling module processes the output feature information of layer 3 through four dilated convolutional layers to obtain four feature maps of different scales, and through a global average pooling branch to obtain a global context feature map. After upsampling restores the size of the original input feature map, the four feature maps of different scales and the global context feature map are concatenated along the channel dimension to obtain the concatenated feature map.
The feature fusion layer adopts a dense connection mechanism. In each module that makes up the overall network framework, the outputs of all preceding modules are concatenated and input into the basic building blocks of the backbone network DCSR contained in the current layer, and the result is passed to the next layer.
S4: Perform contour detection on the segmented image and extract its target edge features.
S5: Locate the target center based on the target edge features to obtain the position of the small target.
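Steps S4 and S5 can be illustrated with a minimal sketch; it is not the claimed implementation. It assumes the segmented image is a binary mask (1 = target) containing a single target, treats a target pixel as a contour pixel when any of its 4-neighbours is background or lies outside the image, and takes the mean of the contour pixel coordinates as the target center. The function names `contour_pixels` and `target_center` are hypothetical.

```python
def contour_pixels(mask):
    """Return (row, col) of target pixels that touch background or the image border."""
    rows, cols = len(mask), len(mask[0])
    edge = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue  # background pixel, cannot be part of the contour
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                outside = not (0 <= nr < rows and 0 <= nc < cols)
                if outside or not mask[nr][nc]:
                    edge.append((r, c))
                    break
    return edge


def target_center(edge):
    """Assumed rule: the center is the mean of the contour pixel coordinates."""
    if not edge:
        return None  # no target in this sub-airspace image
    n = len(edge)
    return (sum(r for r, _ in edge) / n, sum(c for _, c in edge) / n)


# Toy segmented image: a 3x3 target inside a 5x5 frame.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
edge = contour_pixels(mask)
center = target_center(edge)
print(len(edge), center)
```

For several targets in one image, the same idea would be applied per connected region; that labeling step is left out here to keep the sketch short.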
2. The optical detection and identification system for moving targets under large airspace coverage according to claim 1, characterized in that the data acquisition front end includes an airspace coverage optical detection array and a data acquisition industrial control computer.
The airspace coverage optical detection array includes several optical detection devices. Each optical detection device acquires optical images of one portion of the large airspace, so that the array obtains several sub-airspace optical images, and converts its sub-airspace optical images into the transmission format.
The data acquisition industrial control computer is provided in multiple units, equal in number to the optical detection devices. The data acquisition industrial control computers are connected to the optical detection devices in one-to-one correspondence through data acquisition lines, and are used to receive the optical image data of each sub-airspace converted into the transmission format and temporarily store it in the storage medium of the industrial control computer.
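The division of the large airspace among the optical detection devices can be sketched as follows. The claim does not state how the airspace is partitioned; this sketch assumes a regular azimuth/elevation grid with one device per cell, and the function name `split_airspace`, the `device_id` numbering, and the angular ranges are all hypothetical.

```python
def split_airspace(az_range, el_range, n_az, n_el):
    """Split an azimuth/elevation window (degrees) into n_az x n_el equal sub-airspaces.

    Each sub-airspace is assigned to one optical detection device (assumed layout).
    """
    az0, az1 = az_range
    el0, el1 = el_range
    d_az = (az1 - az0) / n_az
    d_el = (el1 - el0) / n_el
    tiles = []
    for i in range(n_el):
        for j in range(n_az):
            tiles.append({
                "device_id": i * n_az + j,  # one device per sub-airspace
                "az": (az0 + j * d_az, az0 + (j + 1) * d_az),
                "el": (el0 + i * d_el, el0 + (i + 1) * d_el),
            })
    return tiles


# Hypothetical coverage: 120 degrees of azimuth by 60 degrees of elevation, 4 x 2 devices.
tiles = split_airspace((0.0, 120.0), (0.0, 60.0), n_az=4, n_el=2)
for t in tiles:
    print(t["device_id"], t["az"], t["el"])
```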
3. The optical detection and identification system for moving targets under large airspace coverage according to claim 2, characterized in that the optical domain transmission link includes an optical network unit and an optical line terminal. The optical network unit transmits sub-airspace optical image data to the optical line terminal via optical fiber.
The optical network unit is provided in multiple units, equal in number to the data acquisition industrial control computers, and is used to modulate the sub-airspace optical image data received by the data acquisition industrial control computers into the optical domain.
The optical line terminal is used to integrate and demodulate the optical domain signals of the multiple optical network units.
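The requirement that the link meet the data transmission rate can be made concrete with a short calculation. This is a sketch under stated assumptions: images are sent uncompressed, and the device count, resolution, bit depth, frame rate, and 10 Gbit/s link capacity below are hypothetical values, not figures from the patent. The function names are likewise illustrative.

```python
def raw_rate_bps(width, height, bits_per_pixel, fps):
    """Uncompressed data rate of one optical detection device, in bits per second."""
    return width * height * bits_per_pixel * fps


def aggregate_rate_bps(per_device_rates):
    """Rate the optical line terminal must carry: the sum over all optical network units."""
    return sum(per_device_rates)


def fits_link(total_bps, capacity_bps):
    """True if the aggregate rate does not exceed the link capacity."""
    return total_bps <= capacity_bps


# Hypothetical array: 4 devices, each 1920 x 1080, 8-bit monochrome, 100 fps.
rates = [raw_rate_bps(1920, 1080, 8, 100) for _ in range(4)]
total = aggregate_rate_bps(rates)
print(f"per device: {rates[0] / 1e9:.3f} Gbit/s, aggregate: {total / 1e9:.3f} Gbit/s")
print("fits a 10 Gbit/s link:", fits_link(total, 10e9))
```

Because the aggregate rate grows linearly with the number of devices, the capacity check is most useful when sizing the array against a given optical line terminal.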
4. The optical detection and identification system for moving targets under large airspace coverage according to claim 3, characterized in that the optical line terminal further includes an optical port module and an electrical port module.
The optical port module is provided in multiple units, equal in number to the optical network units, and is inserted into an optical line terminal interface to receive the optical image data of each sub-airspace.
The electrical port module is inserted into an optical line terminal interface to send the demodulated optical image data of each sub-airspace.
5. The optical detection and identification system for moving targets under large airspace coverage according to any one of claims 1-4, characterized in that the detection and identification back end includes an image processing server. The image processing server receives and stores the optical image data of each sub-airspace, which is used to detect whether a moving target is present in the large airspace and to identify the moving target.
6. The optical detection and identification system for moving targets under large airspace coverage according to claim 1, characterized in that the decomposition module comprises three sub-modules: shallow feature extraction, activation layer sequence, and final reconstruction layer.
The shallow feature extraction sub-module contains a convolutional layer with 64 channels and a kernel size of 3, used for preliminary feature extraction from the input sub-airspace optical image and the corresponding reference illumination image.
The activation layer sequence sub-module performs deeper feature extraction on the preliminarily extracted features. It consists of five convolutional blocks connected in sequence, each composed of a convolutional layer with 64 channels and a kernel size of 3 and a Leaky-ReLU activation function.
The final reconstruction layer sub-module maps the feature map extracted by the activation layer sequence sub-module back to the original image space to generate a reconstructed image or feature map. It contains a convolutional layer and a sigmoid function. The convolutional layer converts the feature map output by the activation layer sequence sub-module from 64 channels to 4 channels, and the sigmoid function maps the output to the range 0 to 1. The feature map is then divided into a reflectance component and an illumination component.
The enhancement module comprises three sub-modules: feature extraction, multi-scale feature fusion, and output generation.
The feature extraction sub-module includes one convolutional layer and three deep feature extraction layers. The convolutional layer has 64 channels and a kernel size of 3. It receives the reflectance and illumination components as input and concatenates them along the channel dimension. Through this layer, the network captures basic texture and edge information in the input image, completing the initial extraction of feature information.
The output of the convolutional layer is the input to the first deep feature extraction layer, which downsamples it through a convolution operation with a stride of 2, reducing the spatial resolution of the feature map and enlarging its receptive field. The second deep feature extraction layer receives the output of the first deep feature extraction layer as input and downsamples it again through a convolution operation with a stride of 2. Compared with the first layer, this layer captures more abstract, higher-level feature information. The third deep feature extraction layer receives the output of the second deep feature extraction layer as input and downsamples it through a third convolution operation with a stride of 2. Through this layer, the network further extracts and refines feature information in support of the final enhancement.
The multi-scale feature fusion sub-module applies an upsampling operation to the output of each deep feature extraction layer to restore the spatial resolution of the previous layer, and concatenates the result with the output of the previous layer along the channel dimension, completing the fusion of features at different scales and levels.
The output generation sub-module outputs the fused feature map as the enhanced illumination component.
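The effect of the three stride-2 convolutions on feature map size can be traced with the standard convolution output-size formula, out = floor((n + 2p - k) / s) + 1. This is a sketch: the kernel size of 3 comes from the claim, while the padding of 1 and the input sizes are assumptions, and the function names are hypothetical.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def encoder_sizes(size, levels=3):
    """Sizes after the first convolutional layer and each deep feature extraction layer."""
    sizes = [conv_out(size, stride=1)]  # first layer keeps the resolution (padding 1 assumed)
    for _ in range(levels):
        sizes.append(conv_out(sizes[-1], stride=2))  # each deep layer roughly halves the size
    return sizes


def doubling_restores(sizes):
    """For each deep layer, check whether 2x upsampling matches the previous layer's size."""
    return [2 * deep == prev for prev, deep in zip(sizes, sizes[1:])]


print(encoder_sizes(256), doubling_restores(encoder_sizes(256)))
print(encoder_sizes(100), doubling_restores(encoder_sizes(100)))
```

When the input size is not divisible by 8, a plain 2x upsampling no longer matches the previous layer, so the fusion step would need to resize to the exact size of the previous layer before concatenating along the channel dimension.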