An unmanned aerial vehicle integrated detection method for a complex operation site
By combining multimodal perception technology with image processing and millimeter-wave radar, the system enables comprehensive status monitoring and information transmission of drones in complex environments, solving the problem that existing drone systems have difficulty detecting trapped personnel in obstructed environments and providing effective rescue support.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- JIANGSU PROVINCIAL GRAIN RESERVE MANAGEMENT CO LTD
- Filing Date
- 2026-01-22
- Publication Date
- 2026-06-12
AI Technical Summary
Existing drone detection systems struggle to obtain accurate geographical location, vital signs, and specific attitude information in complex and dangerous environments, and are unable to effectively detect trapped personnel in obstructed environments, resulting in a lack of data support for rescue decisions.
Employing multimodal sensing technology, combined with airborne image processing, millimeter-wave radar, and wireless communication, it enables comprehensive status monitoring and positioning of building conditions, worker behavior, hazardous situations, and trapped targets, and transmits the information to the back-end command center via wireless communication.
It enables rapid detection of building defects and worker behavior in complex environments, identifies the location and vital signs of trapped targets, provides effective rescue suggestions, and solves the detection problem in high-altitude obstructed environments.
Figure CN122200318A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of unmanned aerial vehicle (UAV) technology, and in particular to an integrated UAV inspection method for complex work sites.
Background Technology
[0002] With the rapid development of drone technology, drones, thanks to their high mobility, flexibility, and excellent accessibility to complex and dangerous areas, have been widely used in building inspection, power line inspection, and emergency rescue. In on-site safety monitoring and management, drones equipped with high-definition cameras, infrared thermal imagers, and other sensors can monitor large areas in real time from an aerial perspective, effectively overcoming the limitations of traditional fixed ground monitoring and the low efficiency and high risk of manual inspections. By incorporating high-performance image processing modules, drones can perform preliminary analysis of the collected data, enabling the initial identification of surface defects in building structures and violations by workers, which to some extent improves the efficiency and safety of on-site supervision.
[0003] However, existing drone detection systems still face significant technical bottlenecks. First, most existing systems rely on single optical imaging sensors, whose detection performance is severely limited in complex environments such as smoke, darkness, severe weather, or structural obstructions. Especially in sudden dangerous situations like collapses, optical vision cannot penetrate obstacles, making it difficult to locate trapped personnel. Traditional life detection devices, such as audio-visual detectors, suffer from short detection ranges, susceptibility to environmental noise, and inefficient deployment by drones. Second, existing technologies often provide only simple assessments of target presence or coarse classifications of behavior, failing to acquire the precise geographical location, vital signs (such as breathing and heartbeat), and specific posture of trapped targets. This results in insufficient data support for rescue decisions. Therefore, developing an integrated drone detection method that combines multimodal perception technologies to achieve comprehensive status monitoring, hazard warning, and efficient post-disaster search and rescue in complex and high-risk operating environments has become a pressing technical challenge in this field.
Summary of the Invention
[0004] The purpose of this section is to outline some aspects of embodiments of the present invention and to briefly describe some preferred embodiments. Simplifications or omissions may be made in this section, as well as in the abstract and title of this application, to avoid obscuring the purpose of these documents; however, such simplifications or omissions should not be construed as limiting the scope of the invention.
[0005] In view of the aforementioned existing problems, this invention is proposed. Therefore, this invention provides an integrated UAV inspection method for complex work sites to solve the problems mentioned in the background art.
[0006] To address the aforementioned technical problems, this invention provides the following technical solution: an integrated UAV detection method for complex work sites, comprising: the UAV acquires images of the buildings at the work site using its onboard image acquisition equipment and processes the images to detect the building's condition; the UAV acquires images of on-site workers and analyzes them to detect the workers' behavioral status; the UAV processes images of the surrounding environment to identify hazardous situations such as collapses; when a hazardous situation is identified, the UAV searches the hazardous area to determine whether any targets are trapped and obtains their location information; after identifying a trapped target, the UAV performs status identification on the target and obtains its vital signs and posture information; the UAV transmits the acquired building status, worker status, hazard status, target location information, and target status information to the back-end command center via wireless communication.
[0007] As a preferred embodiment of the integrated UAV detection method for complex work sites described in this invention, the detection of the building's state includes: Building images are acquired using the image acquisition device, and the images are then subjected to convolution, activation, and max pooling processes in sequence for preliminary feature extraction. A convolutional block attention module is used to enhance the initially extracted features and improve the expression of key features; Perform nonlocal attention operations to capture global contextual information by calculating long-range dependencies between pixels, and combine residual connections to generate output feature maps of nonlocal blocks; The output feature map is dimensionally concatenated and upsampled to restore resolution, and the image segmentation result of building defects is obtained by classification output.
[0008] As a preferred embodiment of the integrated UAV detection method for complex work sites described in this invention, the detection of the behavioral state of the operators includes: The input images of on-site workers are preprocessed, including denoising, convolution, and non-linear activation. A hybrid aggregation network is used as the backbone network for feature extraction, which integrates multi-level feature information. Global feature vectors are extracted from the feature map using global average pooling. An enhanced feature map is generated by combining the global feature vector, and the enhanced feature map is input into a detection head that includes class prediction and bounding box regression to classify and output the non-standard behavior of the workers.
[0009] As a preferred embodiment of the integrated UAV detection method for complex work sites described in this invention, the identification of hazardous conditions includes: during the encoding stage, the input image is downsampled through multiple convolution and pooling operations to extract multi-scale features; in the bottleneck of the U-Net model, a convolutional attention module and an atrous spatial pyramid pooling (ASPP) module are connected in sequence, the former refining the features and the latter capturing contextual information under different receptive fields; during the decoding stage, the resolution of the feature map is gradually restored through upsampling and deconvolution operations, and the result is fused with the corresponding feature map from the encoding stage to output the identification result of the hazardous situation.
[0010] As a preferred embodiment of the integrated UAV detection method for complex work sites described in this invention, the target search includes: Visual detection methods are used to search for human bodies in unobstructed areas; When visual detection is obstructed or in dangerous situations, millimeter-wave radar detection is activated: the UAV transmits frequency-modulated continuous wave signals through a millimeter-wave radar MIMO array and receives radar echo signals from the target. The echo signal is processed to determine the position information of the trapped target relative to the UAV, including distance and orientation.
[0011] As a preferred embodiment of the integrated UAV detection method for complex work sites described in this invention, processing the echo signal to determine the position information of the trapped target relative to the UAV includes: sampling the echo signal and performing a fast Fourier transform on the sampled data to obtain distance information; constructing an azimuth matrix and an elevation matrix from the data received by the MIMO array; and solving the azimuth and elevation matrices with signal processing algorithms to estimate the target's azimuth and elevation angles, generating a heat map containing the target's distance, azimuth, and power information and thereby localizing the target.
[0012] As a preferred embodiment of the integrated UAV detection method for complex work sites described in this invention, the UAV performs state recognition on the target and acquires its vital signs and attitude information, including: The heat map generated based on the radar echo signal is input into the preset range segment modular network to extract the key point coordinates of the human target. The extracted joint coordinates are input into a multilayer perceptron classifier. The classifier analyzes the spatial distribution pattern of the joints to identify the target's posture state.
[0013] As a preferred embodiment of the integrated UAV detection method for complex work sites described in this invention, the vital signs recognition specifically refers to respiratory status recognition, including: Extract phase information related to the target's micro-motion from the radar echo signal; The phase information is subjected to finite first-order differential filtering to suppress noise and DC components; The filtered signal is standardized, and characteristic parameters related to the standard respiratory template signal are constructed. The characteristic parameters are compared with preset threshold values to determine whether the target is breathing.
[0014] As a preferred embodiment of the integrated UAV detection method for complex work sites described in this invention, the determination of the location information of the trapped target further includes calculating its absolute geographical location, including: the UAV acquires its own real-time latitude and longitude coordinates and altitude; it combines these with the target's known distance, azimuth, and pitch angle relative to the UAV; and it calculates the absolute latitude and longitude coordinates and altitude of the trapped target using a coordinate transformation algorithm.
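The coordinate transformation described above can be sketched as follows. This is a minimal illustration, not the patent's algorithm: it assumes a locally flat, spherical Earth, an azimuth measured clockwise from true north, a pitch (elevation) angle that is positive upward, and a radar frame already compensated for the UAV's own attitude. The function name and the Earth radius constant are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumption: mean radius of a spherical Earth


def target_geoposition(uav_lat_deg, uav_lon_deg, uav_alt_m,
                       range_m, azimuth_deg, elevation_deg):
    """Convert a radar measurement relative to the UAV into latitude, longitude, altitude."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    # Split the slant range into local north / east / up offsets in meters.
    horizontal = range_m * math.cos(el)
    north = horizontal * math.cos(az)
    east = horizontal * math.sin(az)
    up = range_m * math.sin(el)
    # Small-offset approximation: meters to degrees on a sphere.
    lat = uav_lat_deg + math.degrees(north / EARTH_RADIUS_M)
    lon = uav_lon_deg + math.degrees(
        east / (EARTH_RADIUS_M * math.cos(math.radians(uav_lat_deg))))
    return lat, lon, uav_alt_m + up
```

For ranges of a few hundred meters the flat-Earth approximation error is far below typical GNSS error; a production system would more likely use a geodetic library and an ellipsoidal model.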
[0015] As a preferred embodiment of the integrated UAV detection method for complex work sites described in this invention, the transmission via wireless communication includes: The UAV integrates detection and identification of various status information, including building status values, worker status values, danger status values, and the geographical coordinates, breathing status values, and attitude status values of the trapped target. The various types of status information are encapsulated into multi-parameter data frames according to a preset data frame structure; The data frame is digitally modulated, filtered, and amplified before being transmitted to the command center in the background via a radio frequency antenna.
[0016] Compared with existing technologies, the invention has the following advantages: it can quickly detect building defects and non-standard work behavior of personnel during on-site operations, solving the problem of ineffective supervision in complex work environments such as high-altitude operations; when hazardous situations such as collapses occur, it can identify and locate on-site workers using visual and MIMO radar detection and promptly report their status, solving the problem of detection failure in visually obstructed environments; by using the UAV to detect the target, calculate its position, and transmit that position to the command center, it solves the difficulty of determining a target's location in high-altitude or obstructed environments; furthermore, by recognizing personnel's breathing and posture, it can support more effective rescue suggestions, solving the problem of unclear personnel conditions.
Attached Figure Description
[0017] To more clearly illustrate the technical solutions of the embodiments of the present invention, the accompanying drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort. Wherein:
Figure 1 is a schematic diagram of the main functional components of an integrated UAV detection method provided in one embodiment of the present invention;
Figure 2 is a schematic diagram of building status detection provided in one embodiment of the present invention;
Figure 3 is a schematic diagram of operator detection provided in one embodiment of the present invention;
Figure 4 is a schematic diagram of hazardous state identification provided in one embodiment of the present invention;
Figure 5 is a schematic diagram of an array antenna configuration provided in one embodiment of the present invention;
Figure 6 is a schematic diagram of the target search location provided in one embodiment of the present invention;
Figure 7 is a diagram of a multilayer perceptron network for posture recognition provided in one embodiment of the present invention;
Figure 8 is a flowchart of a target search processing method provided in one embodiment of the present invention;
Figure 9 is a flowchart of radar detection processing provided in one embodiment of the present invention;
Figure 10 is a flowchart of target state recognition processing provided in one embodiment of the present invention;
Figure 11 is a flowchart of wireless transmission processing provided in one embodiment of the present invention;
Figure 12 is a diagram of the multi-parameter data frame structure at the work site provided in one embodiment of the present invention.
Detailed Implementation
[0018] To make the above-mentioned objects, features, and advantages of the present invention more apparent and understandable, specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, and not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the protection scope of the present invention.
[0019] Many specific details are set forth in the following description in order to provide a full understanding of the invention. However, the invention may also be practiced in other ways different from those described herein, and those skilled in the art can make similar extensions without departing from the spirit of the invention. Therefore, the invention is not limited to the specific embodiments disclosed below.
[0020] Secondly, the term "one embodiment" or "embodiment" as used herein refers to a specific feature, structure, or characteristic that may be included in at least one implementation of the present invention. The phrase "in one embodiment" appearing in different places in this specification does not necessarily refer to the same embodiment, nor is it a single or selective embodiment that is mutually exclusive with other embodiments.
[0021] This invention is described in detail with reference to the schematic diagrams. When detailing the embodiments of this invention, for ease of explanation, the cross-sectional views illustrating the device structure may be partially enlarged, not adhering to the usual scale. Furthermore, the schematic diagrams are merely examples and should not be construed as limiting the scope of protection of this invention. In actual fabrication, the three-dimensional spatial dimensions of length, width, and depth should be included.
[0022] Furthermore, in the description of this invention, it should be noted that the terms "upper," "lower," "inner," and "outer," etc., indicate the orientation or positional relationship based on the orientation or positional relationship shown in the accompanying drawings. These terms are used solely for the convenience of describing the invention and for simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation. Therefore, they should not be construed as limitations on the invention. In addition, the terms "first," "second," or "third" are used for descriptive purposes only and should not be construed as indicating or implying relative importance.
[0023] Unless otherwise explicitly specified and limited, the terms "installation," "connection," and "joining" in this invention should be interpreted broadly. For example, they can refer to fixed connections, detachable connections, or integral connections; similarly, they can refer to mechanical connections, electrical connections, or direct connections, or indirect connections through an intermediate medium, or internal connections between two components. Those skilled in the art can understand the specific meaning of the above terms in this invention based on the specific circumstances.
[0024] Example 1
Referring to Figures 1 to 12, this first embodiment of the present invention provides an integrated UAV detection method for complex work sites, including:
S1: The UAV collects building images at the work site through a camera, applies convolution with activation to the images, and then performs max pooling; it then uses a convolutional block attention module to enhance the features, performs non-local attention operations and feature map concatenation, and obtains segmentation results to detect the building status.
[0025] Specifically, step S1 includes: S11: The UAV captures images of the building site with a camera and first applies a 3×3 convolution with activation to the images; it then performs 2×2 max-pooling downsampling on the feature map of the first network layer, followed by a further convolution operation to extract the features of the next layer.
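The convolution, activation, and 2×2 max-pooling sequence of S11 can be sketched in plain Python. This is a minimal illustration that assumes a single-channel image, a 3×3 kernel, no padding, and ReLU as the activation; the function names and kernel values are hypothetical, since the text does not give them.

```python
def conv3x3_relu(image, kernel):
    """3x3 convolution without padding, followed by ReLU, on a 2D list."""
    height, width = len(image), len(image[0])
    out = []
    for r in range(height - 2):
        row = []
        for c in range(width - 2):
            acc = sum(image[r + i][c + j] * kernel[i][j]
                      for i in range(3) for j in range(3))
            row.append(max(0.0, acc))  # ReLU activation
        out.append(row)
    return out


def max_pool2x2(fmap):
    """2x2 max pooling with stride 2; a trailing odd row or column is dropped."""
    return [[max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, len(fmap[0]) - 1, 2)]
            for r in range(0, len(fmap) - 1, 2)]
```

A 6×6 input therefore becomes a 4×4 feature map after the convolution and a 2×2 map after pooling, which is the downsampling pattern the step describes.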
[0026] S12: After convolution and downsampling, the data are processed by the convolutional block attention module. In the corresponding formulas, two of the terms are both channel attention feature maps; MLP is a shared two-layer neural network; the activation is the Sigmoid function; the convolution is a standard convolution operation; and the remaining three terms are the input, intermediate, and output feature maps, respectively.
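For orientation, the widely published formulation of the convolutional block attention module is given below. It is an assumption that S12 follows this standard form; the symbols $F$, $F'$, $F''$ (input, intermediate, and output feature maps), $M_c$ and $M_s$ (channel and spatial attention maps), $\sigma$ (Sigmoid), and $f$ (a standard convolution) are chosen here to match the terms the paragraph names, not copied from the patent.

```latex
\begin{aligned}
M_c(F)  &= \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big) \\
F'      &= M_c(F) \otimes F \\
M_s(F') &= \sigma\big(f([\mathrm{AvgPool}(F');\ \mathrm{MaxPool}(F')])\big) \\
F''     &= M_s(F') \otimes F'
\end{aligned}
```

Here $\otimes$ denotes element-wise multiplication with broadcasting, and the two pooled descriptors inside $M_c$ share the same MLP weights.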
[0027] S13: Perform nonlocal attention operation, linearly map the feature map to generate three sets of features; then merge these three sets of feature channels and perform matrix multiplication; then calculate the autocorrelation between pixels and perform softmax normalization to obtain attention weights; then perform convolution operation to expand back to the original number of channels; finally perform residual connection to obtain the output feature map of the nonlocal block.
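The non-local attention steps of S13 can be sketched as follows. This is a minimal illustration assuming the feature map has been flattened to a list of per-pixel channel vectors and that the three linear mappings and the channel-restoring convolution are plain matrices (1×1 convolutions); all names are hypothetical.

```python
import math


def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(weights, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def non_local_block(pixels, w_theta, w_phi, w_g, w_out):
    """pixels: list of per-pixel channel vectors; returns vectors of the same shape."""
    theta = [matvec(w_theta, p) for p in pixels]  # three linearly mapped feature sets
    phi = [matvec(w_phi, p) for p in pixels]
    g = [matvec(w_g, p) for p in pixels]
    out = []
    for i, pixel in enumerate(pixels):
        # Pairwise correlation of pixel i with every pixel, softmax-normalized.
        scores = [sum(a * b for a, b in zip(theta[i], phi[j]))
                  for j in range(len(pixels))]
        attn = softmax(scores)
        agg = [sum(attn[j] * g[j][k] for j in range(len(pixels)))
               for k in range(len(g[0]))]
        restored = matvec(w_out, agg)  # expand back to the original channel count
        out.append([p + y for p, y in zip(pixel, restored)])  # residual connection
    return out
```

Because of the residual connection, setting `w_out` to zero returns the input unchanged, which is a common way to initialize such a block.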
[0028] S14: Divide the feature map of each channel into multiple sub-pixels, then stack them in one dimension to convert them into high-resolution feature map output. Perform dimensional stitching on the feature maps, and finally classify and output the image segmentation results such as "cracks" and "peeling".
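The sub-pixel rearrangement in S14 reads like pixel-shuffle upsampling, in which groups of r×r low-resolution channels are interleaved into one channel of r times the resolution. The sketch below assumes that interpretation and the channel ordering used by common deep learning frameworks; the function name is hypothetical.

```python
def pixel_shuffle(channels, r):
    """Rearrange C*r*r maps of size H x W into C maps of size (H*r) x (W*r)."""
    if len(channels) % (r * r) != 0:
        raise ValueError("channel count must be a multiple of r*r")
    height, width = len(channels[0]), len(channels[0][0])
    out = []
    for c in range(len(channels) // (r * r)):
        high_res = [[0] * (width * r) for _ in range(height * r)]
        for dy in range(r):
            for dx in range(r):
                source = channels[c * r * r + dy * r + dx]
                for y in range(height):
                    for x in range(width):
                        # Each source channel fills one sub-pixel position.
                        high_res[y * r + dy][x * r + dx] = source[y][x]
        out.append(high_res)
    return out
```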
[0029] S2: The UAV applies denoising, convolution, and ReLU activation to the images of on-site workers; a hybrid aggregation network is used as the feature extraction network, and feature vectors are extracted through global average pooling to generate an enhanced feature map. The enhanced feature map is fed to a detection head with class prediction and bounding box regression to detect non-standard behavior of workers.
[0030] Specifically, step S2 includes: S21: The input image undergoes denoising, implemented with convolutional layers and ReLU activation functions. In the corresponding formula, the terms denote the input feature map, the intermediate feature map, the non-linear activation function, the first convolutional layer, the second convolutional layer, and the activation function.
[0031] S22: A hybrid aggregation network architecture is employed to enhance feature representation. In the corresponding formula, the terms denote the feature map output by the Concat module in the neck network, the intermediate feature map obtained from it after 1×1 convolution, and the output feature map generated by the hybrid aggregation network module.
[0032] S23: Extract feature vectors from feature map F using global average pooling. In the corresponding formula, the terms denote the n-th channel element of the channel descriptor, the local feature matrix and the global feature matrix, the height of the feature map, and the feature map itself.
[0033] S24: Generate the enhanced feature map. In the corresponding formula, the terms denote the standard deviation, the learnable parameters of the hybrid aggregation network, the fusion operation, the enhanced feature map, and the width of the feature map.
[0034] Furthermore, the output is produced by a detection head that includes category prediction and bounding box regression. The K-means algorithm is used to set three anchor boxes that adapt to targets of different sizes, and states such as "not wearing a safety helmet" and "smoking" are classified.
[0035] Step S3: The UAV applies convolution and pooling operations to the image to downsample it. A convolutional attention module and an atrous spatial pyramid pooling (ASPP) module are added to the U-Net model; in the decoding stage, upsampling and deconvolution are performed to detect hazardous states.
[0036] Specifically, step S3 includes: S31: The input image undergoes convolution, pooling operations, and downsampling.
[0037] S32: Add a convolutional attention module to the U-Net model, consisting of a channel attention module (CAM) and a spatial attention module (SAM). The SAM processes the feature map through max pooling and average pooling operations, concatenates the results, applies a 7×7 convolution, and obtains the spatial attention feature map through the Sigmoid function; the CAM passes the pooled features through a shared MLP with ReLU activation and then obtains the channel attention feature map through the Sigmoid function.
[0038] S33: Add an atrous spatial pyramid pooling (ASPP) module, which applies several dilated convolutions with different dilation rates in parallel to capture contextual information at multiple scales.
[0039] S34: In the decoding stage, the feature map is upsampled and deconvolved, and classification results such as "collapse" and "falling rocks" are output.
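How different dilation rates widen the receptive field can be shown with a one-dimensional sketch. This is an illustration only: it assumes zero padding, an odd-length kernel, and three parallel branches with rates 1, 2, and 4, none of which are stated in the text; the function names are hypothetical.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Same-length 1D convolution whose kernel taps are spaced `dilation` apart."""
    k = len(kernel)
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for t in range(k):
            j = i + (t - k // 2) * dilation
            if 0 <= j < len(signal):  # positions outside the signal count as zero
                acc += kernel[t] * signal[j]
        out.append(acc)
    return out


def pyramid_1d(signal, kernel, rates=(1, 2, 4)):
    """Run the same kernel at several dilation rates and stack the branch outputs."""
    branches = [dilated_conv1d(signal, kernel, rate) for rate in rates]
    return [list(values) for values in zip(*branches)]
```

With a three-tap kernel, the branch at rate 4 sees samples eight positions apart, so a single impulse in the input influences outputs much farther away than in the rate-1 branch.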
[0040] Step S4: The UAV performs the target search. It uses vision to search for people and identify human bodies; at the same time, it uses a millimeter-wave radar MIMO array to transmit FMCW signals, detects the echo signals from the target, performs human target detection, locates the target, and identifies its posture. To judge the breathing state, it first applies finite first-order differential filtering to the echo signals, then standardizes the filtered signals and constructs signal correlation characteristic parameters.
[0041] Specifically, step S4 includes: S41: Visual personnel search uses YOLOv11 to detect human bodies. The input image is first preprocessed and enhanced; a backbone network then extracts features, a neck network fuses them, and a head network outputs the target type. If a human body is detected, visual ranging is performed and an alarm is triggered. The distance between the human body and the UAV is measured through image recognition. In the ranging formula, the terms denote the positions of the imaging point and the origin in the coordinate system, the camera focal length, and the physical dimensions of each pixel in the x and y directions of the imaging plane coordinate system; L is the actual width.
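Visual ranging from a detected bounding box can be sketched with the pinhole camera model. This is an assumption about the form of the ranging formula, which the text does not reproduce: the distance follows from similar triangles between the target's actual width L and its width on the image sensor. The function and parameter names are hypothetical.

```python
def distance_from_width(focal_length_m, pixel_size_m, bbox_width_px, real_width_m):
    """Pinhole-model range estimate: distance = f * L / (width on the sensor)."""
    if bbox_width_px <= 0:
        raise ValueError("bounding box width must be positive")
    width_on_sensor_m = bbox_width_px * pixel_size_m
    return focal_length_m * real_width_m / width_on_sensor_m
```

For example, a 0.5 m wide torso that spans 100 pixels of 2 µm each through a 4 mm lens is roughly 10 m away. The estimate degrades when the target is not facing the camera, because the assumed actual width no longer holds.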
[0042] S42: A millimeter-wave radar MIMO array is employed to transmit FMCW signals. Specifically, the FMCW signal is transmitted through the radio frequency module of the radar detection unit, with the i-th array element transmitting a signal characterized by the carrier frequency and the frequency modulation slope.
[0043] Furthermore, after the echo is mixed with the transmitted signal and low-pass filtered, the intermediate frequency signal is obtained; the two angle terms in its expression are the target's azimuth and elevation angles, respectively.
[0044] Furthermore, the filtered intermediate frequency signal is digitally sampled by the digital signal processing module. Within one frame, each array element emits M chirps, and the samples are processed as follows: (i) window the N sampling points within the m-th chirp and perform a K-point FFT; (ii) construct the azimuth matrix; (iii) construct the corresponding matrix; (iv) solve iteratively to obtain the signal power. Simultaneously, a distance-azimuth heat map is calculated. The pitch angle matrix is then constructed in the same way, the corresponding matrix is built, and the pitch angle grid power is obtained iteratively. The target's distance r and angles, together with parameters such as the number N, are displayed using the heat map.
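For reference, a textbook form of a linear FMCW chirp and the resulting beat frequency is shown below. It is an assumption that the patent's transmit signal takes this form; $f_c$ (carrier frequency) and $K$ (frequency modulation slope) correspond to the two quantities the paragraph names, while the amplitude $A$, chirp duration $T_c$, target range $r$, and speed of light $c$ are introduced here for illustration.

```latex
s_i(t) = A \exp\!\Big( j 2\pi \big( f_c t + \tfrac{1}{2} K t^2 \big) \Big), \qquad 0 \le t \le T_c
\qquad\Longrightarrow\qquad
f_b = \frac{2 K r}{c}
```

The beat frequency $f_b$ appears after the echo is mixed with the transmitted signal, which is why a Fourier transform of the intermediate frequency signal yields the target range.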
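Step (i), windowing the samples of one chirp and taking a K-point transform to obtain range, can be sketched as follows. This is a minimal illustration using a direct DFT in place of an FFT, a Hann window, a single target, and the standard FMCW relation r = c·f_b / (2·slope); the function names and all parameter values are assumptions, and the angle estimation of steps (ii) to (iv) is not covered.

```python
import cmath
import math

C = 299_792_458.0  # speed of light in m/s


def hann(n):
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft(samples, k_points):
    """Direct K-point DFT; an FFT would give the same bins faster."""
    return [sum(s * cmath.exp(-2j * math.pi * k * t / k_points)
                for t, s in enumerate(samples))
            for k in range(k_points)]


def estimate_range(if_samples, sample_rate_hz, slope_hz_per_s, k_points):
    """Window one chirp, locate the strongest beat-frequency bin, convert to meters."""
    window = hann(len(if_samples))
    spectrum = dft([s * w for s, w in zip(if_samples, window)], k_points)
    peak_bin = max(range(k_points // 2), key=lambda k: abs(spectrum[k]))
    beat_hz = peak_bin * sample_rate_hz / k_points
    return C * beat_hz / (2.0 * slope_hz_per_s)
```

The range resolution of this estimate is one frequency bin, that is c·fs / (2·slope·K); with a 10 MHz sample rate, a 30 MHz/µs slope, and K = 128, that is about 0.39 m.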
[0045] S43: Perform human target detection to locate the target and identify its posture.
[0046] Specifically, the distance and position of the human target are first used as constraints to establish a range segment modular network, which consists of a backbone module and a headnet module. With the input radar heat map denoted x, the output of the first residual block of the backbone module is given by a formula whose terms are the two layer weights of the residual block and the ReLU activation function.
[0047] Furthermore, after processing by the headnet module, a 3D probability confidence map of each joint is output, and the soft-argmax function is used to extract the joint coordinates of the human target. In the corresponding formula, the summation runs over the entire human body imaging space domain, and D, H, and W denote the corresponding distance, height, and width values.
[0048] Furthermore, a multilayer perceptron is used to classify the positions of the human joint points. It has three hidden layers, and each connection between a neuron and a neuron in the next layer carries a weight value. In the corresponding formula, the terms denote the output of the j-th neuron in the (h+1)-th layer, the output of the i-th neuron in the h-th layer, the weight linking the two neurons, and the corresponding bias.
[0049] Furthermore, the UAV simultaneously receives its own latitude and longitude values via GPS / BeiDou navigation modules and, given the altitude value h, calculates the target's latitude, longitude, and altitude.
S44: Perform finite first-order differential filtering on the echo signal; the filtered echo signal is then standardized. Furthermore, the signal correlation characteristic parameter is constructed with respect to a reference respiratory signal; when this parameter reaches the threshold value, a respiratory signal is detected.
[0050] Furthermore, the breathing state value takes one value when the target is breathing and another value when breathing is not present.
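Soft-argmax turns a confidence volume into continuous coordinates by taking the expectation of each index under a softmax over all voxels. The sketch below assumes the volume is indexed as distance, height, width (D, H, W) and adds a sharpness factor `beta` that the text does not mention; the function name is hypothetical.

```python
import math


def soft_argmax_3d(volume, beta=1.0):
    """Expected (d, h, w) index under a softmax over all voxels of volume[d][h][w]."""
    voxels = [(d, h, w, volume[d][h][w])
              for d in range(len(volume))
              for h in range(len(volume[0]))
              for w in range(len(volume[0][0]))]
    peak = max(v for _, _, _, v in voxels)
    total = 0.0
    acc = [0.0, 0.0, 0.0]
    for d, h, w, value in voxels:
        weight = math.exp(beta * (value - peak))  # shifted for numerical stability
        total += weight
        acc[0] += weight * d
        acc[1] += weight * h
        acc[2] += weight * w
    return tuple(a / total for a in acc)
```

Unlike a hard argmax, this operation is differentiable, which is why it is commonly used when joint coordinates are regressed end to end.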
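The layer-by-layer computation of the classifier can be sketched as a forward pass. This is a minimal illustration assuming ReLU hidden activations and a softmax output, neither of which is stated in the text; the posture labels are taken from the status list in step S51, and the function names and weight layout are hypothetical.

```python
import math

POSTURES = ("curled up", "standing", "lying flat")


def dense(inputs, weights, biases):
    """Each output neuron j computes sum_i(w_ji * x_i) + b_j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def classify_posture(joint_coords, layers):
    """layers: list of (weights, biases); the last entry is the output layer."""
    activations = list(joint_coords)
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    weights, biases = layers[-1]
    probabilities = softmax(dense(activations, weights, biases))
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return POSTURES[best], probabilities
```

In practice the input would be the flattened joint coordinates and the weights would come from training; the three hidden layers correspond to the first three entries of `layers`.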
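The breathing decision of S44 can be sketched as a first-order difference, standardization, and a correlation test against a reference respiratory template. This is a minimal illustration: it assumes the radar phase has already been extracted and unwrapped, uses a zero-lag Pearson correlation as the characteristic parameter, and picks a threshold of 0.6 arbitrarily; the function names are hypothetical.

```python
import math


def first_difference(phase):
    """First-order difference; removes the DC component and slow drift."""
    return [b - a for a, b in zip(phase, phase[1:])]


def standardize(values):
    """Zero mean, unit variance; returns None for a constant signal."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    if variance == 0.0:
        return None
    std = math.sqrt(variance)
    return [(v - mean) / std for v in values]


def breathing_detected(phase, template, threshold=0.6):
    """template must have the same length as the differenced phase signal."""
    filtered = standardize(first_difference(phase))
    reference = standardize(template)
    if filtered is None or reference is None:
        return False  # no micro-motion at all
    if len(filtered) != len(reference):
        raise ValueError("template length must match the filtered signal")
    correlation = sum(f * r for f, r in zip(filtered, reference)) / len(filtered)
    return correlation >= threshold
```

A zero-lag correlation only works when the template is time-aligned with the measurement; a practical detector would more likely search over lags or compare spectra.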
[0051] Step S5: The UAV collects the target's latitude and longitude coordinates and altitude, the building status value, the worker status value, the hazard status value, the target distance r, the target breathing state value, the target posture state value, and the serial number value. It assembles this on-site status information into a multi-parameter data frame, digitally modulates it onto frequency bands such as UHF / 4G / 5G, filters and amplifies the signal, and then transmits it through a radio frequency antenna.
[0052] Specifically, step S5 includes: S51: The UAV collects the target's latitude, longitude, and altitude coordinates, the building status value, the worker status value, the hazard status value, the target distance r, the target breathing state value, the target posture state value, and the serial number value, where: (1) the building status value distinguishes cracks, peeling, and normal condition; (2) the worker status value distinguishes not wearing a safety helmet, smoking, falling, and normal operation; (3) the hazard status value distinguishes a site collapse from no accident; (4) the breathing state value distinguishes breathing from no breathing; (5) the posture state value distinguishes curled up, standing, and lying flat; S52: The on-site status information is assembled into a multi-parameter data frame and digitally modulated onto frequency bands such as UHF / 4G / 5G; S53: After filtering and amplification, the signal is transmitted through the radio frequency antenna.
[0053] Specifically, in the above method, the multi-parameter data frame structure based on on-site detection includes longitude coordinates (in meters), latitude coordinates (in meters), altitude (in meters), building status, personnel work status, event status, the distance r of the trapped target (in meters), target breathing status, target posture status, and target sequence number. Building status is used to detect building defects; personnel work status is used to detect whether personnel are working in compliance; event status is used to detect whether a safety accident has occurred; target breathing status identifies whether the target is breathing; target posture status identifies the target's posture; and the sequence number is used to order the detected targets.
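Packing these fields into a fixed-length binary frame can be sketched as follows. The layout is hypothetical: field order, field widths, byte order, and the absence of a header or checksum are all assumptions, since the actual structure is defined in Figure 12 and not reproduced in the text. The status codes used in the example are likewise placeholders.

```python
import struct
from typing import NamedTuple

# Hypothetical layout: little-endian, no padding.
# 3 doubles, 3 unsigned bytes, 1 float, 2 unsigned bytes, 1 unsigned short.
FRAME_FORMAT = "<3d3BfBBH"


class SiteFrame(NamedTuple):
    longitude: float
    latitude: float
    altitude: float
    building_status: int
    worker_status: int
    event_status: int
    target_distance: float
    breathing_status: int
    posture_status: int
    sequence_number: int


def pack_frame(frame: SiteFrame) -> bytes:
    return struct.pack(FRAME_FORMAT, *frame)


def unpack_frame(data: bytes) -> SiteFrame:
    return SiteFrame(*struct.unpack(FRAME_FORMAT, data))
```

Under this layout each frame is 35 bytes, and the receiver at the command center can decode it with the same format string. The distance is stored as a 32-bit float, so values are rounded to single precision on the way through.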
[0054] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code. The solutions in the embodiments of this application can be implemented using various computer languages, such as the object-oriented programming language Java and the interpreted scripting language JavaScript.
[0055] This application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, produce means for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
[0056] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
[0057] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
[0058] Although preferred embodiments of this application have been described, those skilled in the art, upon learning the basic inventive concept, can make other changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments as well as all changes and modifications falling within the scope of this application.
[0059] Obviously, those skilled in the art can make various modifications and variations to this application without departing from the spirit and scope of this application. Therefore, if such modifications and variations fall within the scope of the claims of this application and their equivalents, this application also intends to include such modifications and variations.
Claims
1. An integrated UAV inspection method for complex work sites, characterized in that the method comprises: the UAV acquires images of the buildings at the work site using its onboard image acquisition equipment and processes the images to detect the buildings' condition; the UAV acquires images of on-site workers and analyzes the images to detect the workers' behavioral status; the UAV processes images of the surrounding environment to identify dangerous situations such as collapses; when a dangerous situation is identified, the UAV searches the dangerous area to determine whether there are any trapped targets and obtains their location information; after identifying a trapped target, the UAV performs status identification on the target and obtains its vital signs and attitude information; and the UAV transmits the acquired building status, worker status, dangerous situation, target location information, and target status information to the back-end command center via wireless communication.
2. The integrated UAV inspection method for complex work sites as described in claim 1, characterized in that detecting the building's condition comprises: acquiring building images using the image acquisition equipment and applying convolution, activation, and max pooling to the images in sequence for preliminary feature extraction; using a convolutional block attention module to enhance the initially extracted features and strengthen the representation of key features; performing a non-local attention operation that captures global contextual information by computing long-range dependencies between pixels, combined with a residual connection, to generate the output feature map of the non-local block; and concatenating the output feature maps dimension-wise, upsampling them to restore resolution, and obtaining the image segmentation result for building defects through a classification output.
3. The integrated UAV inspection method for complex work sites as described in claim 1, characterized in that detecting the workers' behavioral status comprises: preprocessing the input images of on-site workers, including denoising, convolution, and non-linear activation; using a hybrid aggregation network as the backbone network for feature extraction to integrate multi-level feature information; extracting a global feature vector from the feature map using global average pooling; and generating an enhanced feature map by combining the feature map with the global feature vector, then inputting the enhanced feature map into a detection head comprising class prediction and bounding box regression to classify and output the workers' non-compliant behavior.
4. The integrated UAV inspection method for complex work sites as described in claim 1, characterized in that identifying the dangerous situation comprises: in the encoding stage, downsampling the input image through multiple convolution and pooling operations to extract multi-scale features; in the bottleneck of a U-Net model, connecting a convolutional attention module and an atrous spatial pyramid pooling (ASPP) module in sequence, where the convolutional attention module refines the features and the ASPP module captures contextual information under different receptive fields; and in the decoding stage, gradually restoring the resolution of the feature map through upsampling and deconvolution operations and fusing it with the corresponding feature map from the encoding stage to output the identification result of the dangerous situation.
5. The integrated UAV inspection method for complex work sites as described in claim 1, characterized in that the target search comprises: searching for human bodies in unobstructed areas using visual detection; and, when visual detection is obstructed or a dangerous situation exists, activating millimeter-wave radar detection, in which the UAV transmits frequency-modulated continuous wave (FMCW) signals through a millimeter-wave radar MIMO array, receives the radar echo signals from the target, and processes the echo signals to determine the position information of the trapped target relative to the UAV, including distance and orientation.
6. The integrated UAV inspection method for complex work sites as described in claim 5, characterized in that processing the echo signal to determine the position information of the trapped target relative to the UAV comprises: sampling the echo signal and performing a fast Fourier transform on the sampled data to obtain distance information; constructing azimuth and elevation matrices from the data received by the MIMO array; and solving the azimuth and elevation matrices with signal processing algorithms to estimate the target's azimuth and elevation angles, generating a heat map containing the target's distance, azimuth, and power information, thereby localizing the target.
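The distance step of claim 6 can be sketched as follows for a single FMCW chirp. This is a minimal illustration under stated assumptions: the chirp slope, sampling rate, and sample count are hypothetical values, the beat signal is simulated as an ideal complex tone, and angle estimation from the MIMO array is not covered.

```python
import cmath
import math

C = 3.0e8  # speed of light in m/s


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def simulate_beat(target_range, fs, slope, n):
    """Ideal beat signal for one target: f_beat = 2 * slope * R / c."""
    f_beat = 2.0 * slope * target_range / C
    return [cmath.exp(2j * math.pi * f_beat * i / fs) for i in range(n)]


def estimate_range(samples, fs, slope):
    """Pick the strongest FFT bin (DC excluded) and convert it to meters."""
    n = len(samples)
    spectrum = fft(samples)
    peak_bin = max(range(1, n // 2), key=lambda k: abs(spectrum[k]))
    f_beat = peak_bin * fs / n
    return C * f_beat / (2.0 * slope)


# Hypothetical radar parameters, chosen only for illustration.
SLOPE = 30e12  # chirp slope in Hz/s
FS = 5e6       # ADC sampling rate in Hz
N = 256        # samples per chirp

estimated = estimate_range(simulate_beat(5.0, FS, SLOPE, N), FS, SLOPE)
```

With these parameters one FFT bin corresponds to roughly 0.098 m, so the estimate is quantized to that spacing. Finer resolution would need a wider sweep bandwidth or interpolation around the peak.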
7. The integrated UAV inspection method for complex work sites as described in claim 1, characterized in that the UAV performing status identification on the target and obtaining its vital signs and attitude information comprises: inputting the heat map generated from the radar echo signal into the preset range segment modular network to extract the key point (joint) coordinates of the human target; and inputting the extracted joint coordinates into a multilayer perceptron classifier, which analyzes the spatial distribution pattern of the joints to identify the target's posture state.
8. The integrated UAV inspection method for complex work sites as described in claim 1 or 5, characterized in that the vital signs identification is specifically respiratory status identification, comprising: extracting phase information related to the target's micro-motion from the radar echo signal; applying first-order finite difference filtering to the phase information to suppress noise and DC components; standardizing the filtered signal and constructing characteristic parameters that relate it to a standard respiratory template signal; and comparing the characteristic parameters with preset threshold values to determine whether the target is breathing.
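The respiratory decision chain of claim 8 (difference filter, standardization, template comparison, threshold) can be sketched as below. The sinusoidal template bank, the candidate breathing frequencies, the correlation-magnitude feature, and the threshold of 0.5 are all assumptions made for illustration; the application does not define the template or the characteristic parameters.

```python
import math

# Assumed candidate breathing frequencies in Hz (0.10 to 0.50 in 0.05 steps).
TEMPLATE_FREQS = [0.1 + 0.05 * k for k in range(9)]


def first_difference(phase):
    """First-order difference; removes the DC component of the phase."""
    return [b - a for a, b in zip(phase, phase[1:])]


def standardize(x):
    """Zero-mean, unit-variance scaling; a constant input maps to zeros."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    if var == 0.0:
        return [0.0] * n
    std = math.sqrt(var)
    return [(v - mean) / std for v in x]


def template_score(z, fs, freq):
    """Correlation magnitude between z and a sinusoidal template.

    Sine and cosine templates are combined so the score does not depend
    on where in the breathing cycle the recording starts.
    """
    n = len(z)
    sin_t = standardize([math.sin(2 * math.pi * freq * i / fs) for i in range(n)])
    cos_t = standardize([math.cos(2 * math.pi * freq * i / fs) for i in range(n)])
    corr_sin = sum(a * b for a, b in zip(z, sin_t)) / n
    corr_cos = sum(a * b for a, b in zip(z, cos_t)) / n
    return math.hypot(corr_sin, corr_cos)


def breathing_score(phase, fs, freqs=TEMPLATE_FREQS):
    """Best template match over the assumed breathing band."""
    z = standardize(first_difference(phase))
    return max(template_score(z, fs, f) for f in freqs)


def is_breathing(phase, fs, threshold=0.5):
    """Compare the characteristic parameter with a preset threshold."""
    return breathing_score(phase, fs) > threshold


# Simulated 20 s phase record at 20 Hz: 0.25 Hz chest motion on a DC offset.
FS = 20.0
breathing_phase = [2.0 + 0.5 * math.sin(2 * math.pi * 0.25 * i / FS + 0.7)
                   for i in range(400)]
static_phase = [2.0] * 400
```

Here `is_breathing(breathing_phase, FS)` should report breathing and `is_breathing(static_phase, FS)` should not. In practice the threshold would be tuned on measured radar data.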
9. The integrated UAV inspection method for complex work sites as described in claim 5, characterized in that determining the location information of the trapped target further comprises calculating its absolute geographical location, including: the UAV acquiring its own real-time latitude and longitude coordinates and altitude; combining these with the target's known distance, azimuth, and pitch angle relative to the UAV; and calculating the absolute latitude and longitude coordinates and altitude of the trapped target using a coordinate transformation algorithm.
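One possible form of the coordinate transformation in claim 9 is sketched below. It assumes that the azimuth is measured clockwise from true north in a frame already compensated for UAV attitude, that the pitch angle is positive above the horizontal, and that a spherical-Earth, short-range approximation is acceptable. The function name and these conventions are illustrative and are not taken from the application.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation


def locate_target(uav_lat, uav_lon, uav_alt, distance, azimuth_deg, pitch_deg):
    """Convert a radar range/azimuth/pitch fix into latitude, longitude, altitude.

    Latitude and longitude are in degrees; altitude and distance in meters.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(pitch_deg)

    # Local east-north-up offsets of the target relative to the UAV.
    horizontal = distance * math.cos(el)
    north = horizontal * math.cos(az)
    east = horizontal * math.sin(az)
    up = distance * math.sin(el)

    # Small-offset conversion from meters to degrees.
    lat = uav_lat + math.degrees(north / EARTH_RADIUS_M)
    lon = uav_lon + math.degrees(
        east / (EARTH_RADIUS_M * math.cos(math.radians(uav_lat)))
    )
    return lat, lon, uav_alt + up


# Placeholder UAV position; target 30 m away, due east, 30 degrees below horizontal.
target = locate_target(32.0, 118.0, 50.0, 30.0, 90.0, -30.0)
```

The approximation is reasonable for the short ranges of an on-site radar. Longer baselines or high latitudes would call for a proper geodetic library instead.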
10. The integrated UAV inspection method for complex work sites as described in claim 1, characterized in that the transmission via wireless communication comprises: the UAV collecting the detected and identified status information, including the building status value, the worker status value, the danger status value, and the geographical coordinates, breathing status value, and attitude status value of the trapped target; encapsulating the status information into a multi-parameter data frame according to a preset data frame structure; and digitally modulating, filtering, and amplifying the data frame before transmitting it to the back-end command center via a radio frequency antenna.
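The digital modulation step in claim 10 is not tied to a particular scheme in the application. As one hedged illustration, the sketch below maps frame bytes to Gray-coded QPSK baseband symbols and recovers them by hard decision. QPSK itself, the bit ordering, and all function names are assumptions; pulse shaping, filtering, up-conversion, and amplification are omitted.

```python
import math

# Gray-coded bit pair -> (I, Q) signs; neighboring points differ by one bit.
QPSK_MAP = {(0, 0): (1, 1), (0, 1): (-1, 1), (1, 1): (-1, -1), (1, 0): (1, -1)}
QPSK_UNMAP = {signs: bits for bits, signs in QPSK_MAP.items()}
SCALE = 1.0 / math.sqrt(2.0)  # normalizes each symbol to unit energy


def bytes_to_bits(data):
    """Expand bytes into a list of bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def bits_to_bytes(bits):
    """Pack a bit list (length a multiple of 8) back into bytes."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


def qpsk_modulate(data):
    """Map each pair of bits to one complex baseband symbol."""
    bits = bytes_to_bits(data)
    return [complex(*QPSK_MAP[(bits[i], bits[i + 1])]) * SCALE
            for i in range(0, len(bits), 2)]


def qpsk_demodulate(symbols):
    """Hard-decision demodulation based on the sign of I and Q."""
    bits = []
    for symbol in symbols:
        signs = (1 if symbol.real >= 0 else -1, 1 if symbol.imag >= 0 else -1)
        bits.extend(QPSK_UNMAP[signs])
    return bits_to_bytes(bits)


frame_bytes = b"\x01\x7f\xa5"          # stand-in for an encapsulated frame
symbols = qpsk_modulate(frame_bytes)   # four symbols per byte
recovered = qpsk_demodulate(symbols)
```

Each byte becomes four symbols, and the round trip returns the original bytes as long as channel noise does not push a symbol across a quadrant boundary.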