A method and device for accurate identification of distressed persons in multiple scenarios
By determining the search area and flight altitude, and collecting and processing video images for detail restoration and multi-scale feature extraction, the problem of low video data quality in traditional UAV search and rescue is solved, and efficient and accurate identification of people in distress is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- WUHAN UNIV OF TECH
- Filing Date
- 2025-04-25
- Publication Date
- 2026-06-30
AI Technical Summary
Traditional drones lack a systematic search system for water rescue, rely on human experience, and the quality of video data is greatly affected by environmental factors, resulting in the inability to obtain comprehensive information.
By acquiring drone flight data and distress information, the search area and optimal flight altitude are determined. Video images are collected and processed for detail restoration and degradation recovery. Combined with multi-scale feature extraction, YOLOv5 and the All-in-One Image Restoration network are used for identification.
It improved search efficiency, obtained high-quality video data, and was able to accurately identify people in distress and obtain comprehensive information.
Description
Technical Field
[0001] This invention relates to the field of water rescue technology, and in particular to a method and device for accurate identification of people in distress in multiple scenarios.
Background Technology
[0002] With the continuous expansion of maritime transport demand and the increasing density and complexity of ship traffic, maritime accidents are more likely to occur. In complex weather conditions such as fog, rain, and low light, or under heavy traffic, people in distress at sea are easily affected by wind, waves, and currents, causing them to drift and making search and rescue operations extremely difficult. Drones, due to their high flexibility, low cost, and strong controllability, are widely used in maritime emergency rescue missions.
[0003] However, traditional drones lack a systematic search system and the ability to autonomously detect people in distress at sea. They rely on human experience, resulting in disorganized and unsystematic operations that reduce the efficiency of search and rescue personnel. Furthermore, the video data collected by traditional drone imaging equipment is greatly affected by environmental factors, leading to low-quality video data and preventing search and rescue personnel from obtaining comprehensive information from it.
[0004] Therefore, there is an urgent need to propose a method and device for accurate identification of people in distress in multiple scenarios, in order to solve the technical problem that the video data of traditional drones is greatly affected by environmental factors, resulting in low video data quality and the inability of search and rescue personnel to obtain comprehensive information from the video data.
Summary of the Invention
[0005] In view of this, it is necessary to provide a method and device for accurate identification of distressed persons in multiple scenarios, in order to solve the technical problem that the video data of traditional drones is greatly affected by environmental factors, resulting in low video data quality and the inability of search and rescue personnel to obtain comprehensive information from the video data.
[0006] To address the aforementioned problems, in a first aspect, the present invention provides a method for accurate identification of persons in distress in multiple scenarios, comprising:
[0007] Obtaining flight data and mission requirements of the drone, as well as distress information of the person in distress;
[0008] The search area is determined based on the distress information and mission requirements, and the optimal flight altitude is determined based on the search area and the flight data.
[0009] When the drone reaches the optimal flight altitude of the search area, the current video image from the drone's current perspective is captured, and detail restoration and degradation recovery are performed on the current video image to obtain a clear video image;
[0010] Multi-scale feature extraction is performed on the clear video image to obtain the identification result of the person in distress.
[0011] In one possible implementation, the distress information includes a probability density distribution of the initial location of the distressed personnel; determining the search area based on the distress information and the mission requirements includes:
[0012] Based on the parallel line scanning search method and the task requirements, the scannable area of the UAV on the water is determined;
[0013] Based on the influence of multiple factors on the distressed persons and the probability density distribution, a distressed persons drift model is established.
[0014] The search area is obtained by simulating and optimizing the drift model of the distressed person and the scannable area.
[0015] In one possible implementation, the simulation and optimization of the distressed person drift model and the scannable area to obtain the search area includes:
[0016] The wind speed at a preset sea level is estimated based on a preset wind pressure model, and the disturbance coefficient is determined.
[0017] Based on the disturbance coefficient, the wind-induced drift velocity is obtained;
[0018] The flow velocity at a preset water depth is estimated to obtain the flow-induced drift velocity;
[0019] The drift velocity is obtained based on the wind-induced drift velocity and the flow-induced drift velocity;
[0020] The drift velocity is input into the distressed person's drift model to perform random particle simulation optimization on the scannable area, thereby obtaining the search area.
[0021] In one possible implementation, determining the optimal flight altitude based on the search area and the flight data includes:
[0022] The decision variables for the flight data in the search area are determined according to a preset probability relationship model; the decision variables include coverage area and flight route interval.
[0023] Based on the coverage area and the flight path interval, construct an objective function that maximizes the discovery probability;
[0024] The target scan width is determined by optimizing the objective function, the coverage area, and the flight path interval based on preset constraints and a preset parallel selection genetic algorithm.
[0025] The optimal search flight altitude is obtained based on the target scan width.
[0026] In one possible implementation, the step of restoring details and reverting degradation of the current video image to obtain a clear video image includes:
[0027] The current weather is determined based on the drone;
[0028] Based on the current weather, historical data sets are obtained from the severe weather image dataset;
[0029] Based on the historical dataset, detail restoration and degradation recovery are performed on the current video image to obtain a clear video image.
[0030] In one possible implementation, the process of generating the severe weather image dataset includes:
[0031] Fog images are generated based on atmospheric scattering models to obtain a fog image dataset;
[0032] Rainy day images are generated based on the rain map model, resulting in a rainy day image dataset;
[0033] Based on the fog image dataset and the rainy day image dataset, an inclement weather image dataset is obtained.
[0034] In one possible implementation, the step of performing detail restoration and degradation recovery on the current video image based on the historical dataset to obtain a clear video image includes:
[0035] The historical dataset and the current video image are compared using a contrastive degradation encoder to obtain a potential degradation representation;
[0036] The current video image and the potential degradation representation are input into the degradation-guided restoration network for degradation restoration to obtain a clear video image.
[0037] In one possible implementation, the step of performing multi-scale feature extraction on the clear video image to obtain the identification result of the person in distress includes:
[0038] A multi-scale target detection model is constructed; the multi-scale target detection model includes a backbone feature extraction network, a BiFormer attention module, a CBAM attention module, and a decoupling head module;
[0039] The backbone feature extraction network is used to extract features from the clear video image to obtain initial features;
[0040] The initial features are adaptively fused using the BiFormer attention module to obtain fused features.
[0041] The fused features are weighted according to the CBAM attention module to obtain refined features;
[0042] The refined features are identified based on the decoupling head module to obtain the identification result.
[0043] In one possible implementation, the step of comparing the historical dataset and the current video image according to the contrastive degradation encoder to obtain a potential degradation representation includes:
[0044] The contrastive degradation encoder compares the historical dataset and the current video image to determine positive and negative samples.
[0045] The potential degradation representation is obtained by maximizing the consistency among the positive samples and minimizing the consistency with the negative samples.
[0046] In a second aspect, the present invention also provides a device for accurate identification of persons in distress in multiple scenarios, comprising:
[0047] The information acquisition module is used to acquire the drone's flight data and mission requirements, as well as the distress information of the personnel in distress.
[0048] The area determination module is used to determine the search area based on the distress information and the mission requirements, and to determine the optimal flight altitude based on the search area and the flight data;
[0049] The image processing module is used to acquire the current video image from the current viewpoint of the drone when the drone reaches the optimal flight altitude of the search area, and to restore the details and restore the degradation of the current video image to obtain a clear video image.
[0050] The result recognition module is used to extract multi-scale features from the clear video image to obtain the recognition result of the person in distress.
[0051] The beneficial effects of this invention are: it can determine the search area and the optimal flight altitude based on the distress information of the distressed persons, the flight data of the UAV, and the mission requirements, so that when the UAV reaches the optimal flight altitude of the search area, it can collect the most suitable video image data from the current perspective, thus improving search efficiency; it can also restore details and revert degradation of the current video image, thereby obtaining a clear video image and improving the quality of video data; it can also perform multi-scale feature extraction on the clear video image to obtain the identification results of the distressed persons, thus obtaining comprehensive information about the distressed persons from the video data.
Attached Figure Description
[0052] Figure 1 is a schematic flowchart of an embodiment of the method for accurate identification of distressed persons in multiple scenarios provided by the present invention;
[0053] Figure 2 is a schematic diagram of an embodiment of step S102 in Figure 1 of the present invention;
[0054] Figure 3 is a schematic diagram of an embodiment of the drone search path provided by the present invention;
[0055] Figure 4 is a schematic diagram of an embodiment of the distressed person drift model provided by the present invention;
[0056] Figure 5 is a schematic diagram of an embodiment of the final search area provided by the present invention;
[0057] Figure 6 is a schematic flowchart of an embodiment of obtaining the clear video image provided by the present invention;
[0058] Figure 7 is a schematic diagram of an embodiment of the auxiliary rescue system for people in distress at sea provided by the present invention;
[0059] Figure 8 is a schematic diagram of an embodiment of the multi-scenario distress personnel accurate identification device provided by the present invention.
Detailed Implementation
[0060] Preferred embodiments of the present invention will now be described in detail with reference to the accompanying drawings, which form part of this application and are used together with the embodiments of the present invention to illustrate the principles of the present invention, but are not intended to limit the scope of the present invention.
[0061] In the fields of artificial intelligence and computer vision, YOLOv5 (You Only Look Once version 5) is a popular object detection algorithm, known for its speed and accuracy. YOLOv5 builds on the earlier models of the YOLO series (including YOLO, YOLOv2, YOLOv3, and YOLOv4), inheriting their advantages and incorporating several improvements. YOLOv5 employs a single-stage object detection method that directly classifies and locates objects in the image, rather than a traditional two-stage method (such as Faster R-CNN, which separates candidate region generation from classification / localization). This reduces computational cost and improves detection speed.
[0062] All-in-One Image Restoration (AiOIR) is a unified framework designed to solve various image degradation problems. By integrating advanced deep learning techniques, it can handle multiple degradation types, such as noise, blur, and weather effects, in a single network, thus providing a more convenient and universal solution.
[0063] As shown in Figure 1, a specific embodiment of the present invention discloses a method for accurate identification of persons in distress in multiple scenarios, including:
[0064] S101. Obtain flight data and mission requirements of the drone, as well as distress information of the personnel in distress.
[0065] The method for accurate identification of distressed persons in multiple scenarios provided in this application embodiment can be applied to a system for accurate identification of distressed persons in multiple scenarios. The system for accurate identification of distressed persons in multiple scenarios can be a software system running on a terminal device. The terminal device can be a server, tablet computer, augmented reality (AR) / virtual reality (VR) device, laptop computer, ultra-mobile personal computer (UMPC), netbook, personal digital assistant (PDA), mobile phone, etc. This application embodiment does not impose any restrictions on the specific type of terminal device.
[0066] This invention can be applied to situations where the search area is large and the location of the distressed person is uncertain, requiring comprehensive area coverage. The search area can be divided according to mission requirements, and a clear scanning path can be determined so that the drone flies continuously along parallel lines. This reduces operational complexity and improves work efficiency, and the drone's flight data can be acquired. When a person encounters danger at sea, they can send a distress signal, which the drone can receive. The distress signal may include information such as the distressed person's weight and initial location. With its high mobility, wide coverage, and high detection efficiency, the drone overcomes the limited field of view of human observers and of fixed monitoring equipment in ports and shipping, which cannot achieve full coverage of waterways in maritime search and rescue scenarios. The mission requirements can be set by rescue personnel based on the actual situation.
[0067] S102. Determine the search area based on the distress information and mission requirements, and determine the optimal flight altitude based on the search area and flight data.
[0068] Here, the drone can search at sea using the parallel line scanning search method. Parallel line scanning search is the most commonly used and simplest visual search method in maritime search and rescue, and is primarily applicable when the search area is large and the location of the distressed personnel is uncertain. The search area can thus be determined based on the distress information and mission requirements. Furthermore, the optimal flight altitude can be calculated based on the search area and the drone's flight data.
[0069] S103. When the drone reaches the optimal flight altitude of the search area, acquire the current video image from the drone's current perspective, and perform detail restoration and degradation recovery on the current video image to obtain a clear video image.
[0070] After the search area and optimal flight altitude are determined, the drone can be controlled to fly to the optimal flight altitude within the search area and capture the current video image from its current perspective. Detail restoration and degradation recovery are then performed on the video image, resulting in a clear video image. This provides high-quality visual information for drones during water rescues, helping rescuers to identify and locate people in distress more accurately.
[0071] S104. Perform multi-scale feature extraction on clear video images to obtain the identification results of the distressed persons.
[0072] After obtaining clear video images, multi-scale feature extraction can be performed on the clear video images based on the YOLOv5 object detection network to obtain more accurate identification results of people in distress. The YOLOv5 object detection network can overcome the shortcomings of traditional detection methods in multi-scale object detection, ensuring accurate detection under different shooting conditions and target scales, and improving the target recognition capability of drones in water rescue scenarios.
[0073] Compared with existing technologies, this embodiment provides a method to determine the search area and the optimal flight altitude based on the distress information of the distressed persons, the flight data of the UAV, and mission requirements. This allows the UAV to collect the most suitable video image data from the current perspective when it reaches the optimal flight altitude of the search area, thus improving search efficiency. Furthermore, it can restore details and revert degradation in the current video images, resulting in clear video images and improving video data quality. Moreover, it can perform multi-scale feature extraction on the clear video images to obtain the identification results of the distressed persons, thereby obtaining comprehensive information about the distressed persons from the video data.
[0074] In some embodiments of the present invention, the distress information includes the probability density distribution of the initial location of the distressed person; as shown in Figure 2, step S102 includes:
[0075] S201. Based on the parallel line scanning search method and mission requirements, determine the scannable area of the UAV on the water.
[0076] The parallel line scanning search method determines the area to be searched based on task requirements and divides it into scannable zones. Following a pre-set scanning path, the UAV flies continuously along parallel lines. Parallel line search reduces operational complexity and improves work efficiency by minimizing the number of turns. The UAV search path is shown in Figure 3. When the search area is rectangular, the search starting point is usually a vertex of that rectangle. In actual search operations, the UAV first moves from the search starting point by half the flight path interval S and then searches along a path parallel to the long side of the rectangle, thus avoiding continuous turns and saving time. Following the pre-set scanning path, the UAV flies continuously along the parallel lines until the search endpoint is reached.
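The parallel line path described above can be sketched as a waypoint generator. This is a minimal illustration, not the patented implementation: the function name `parallel_track_waypoints`, the local coordinate frame (one rectangle vertex at the origin, long side along x), and the example dimensions are all assumptions made here.

```python
from typing import List, Tuple


def parallel_track_waypoints(length: float, width: float,
                             spacing: float) -> List[Tuple[float, float]]:
    """Waypoints for a parallel-line sweep of a length x width rectangle.

    Assumed frame: one vertex at (0, 0), long side along the x axis.
    The first leg is offset by spacing / 2 from the long side and the
    following legs are `spacing` apart, alternating direction.
    """
    if length <= 0 or width <= 0 or spacing <= 0:
        raise ValueError("length, width and spacing must be positive")
    waypoints: List[Tuple[float, float]] = []
    y = spacing / 2.0          # half the flight path interval S
    left_to_right = True
    while y < width:
        start_x, end_x = (0.0, length) if left_to_right else (length, 0.0)
        waypoints.append((start_x, y))   # leg start
        waypoints.append((end_x, y))     # leg end
        left_to_right = not left_to_right
        y += spacing
    return waypoints


# Hypothetical 10 km x 4 km area swept with a 1 km flight path interval.
route = parallel_track_waypoints(length=10_000.0, width=4_000.0, spacing=1_000.0)
```

Each leg runs parallel to the long side, so the number of turns equals the number of legs minus one, which is the property the text relies on to save time.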
[0077] S202. Based on the influence of multiple factors on distressed persons and their probability density distribution, a distressed person drift model is established.
[0078] The force analysis considers the drifting motion of the distressed person on the sea surface under the influence of wind, waves, and ocean currents. To simplify the force analysis, the Coriolis force is ignored. A drift model is established by comprehensively considering the forces on the distressed person under multiple external influencing factors, which can include wind-induced drift and current-induced drift. The drift model of the distressed person is shown in Figure 4. Wind-induced drift, or wind pressure, is the directional movement, relative to the sea surface, of the part of a person exposed above the waterline, caused by sea winds (at a height of 10 meters). The magnitude of the wind pressure is the person's velocity relative to the sea surface, and the direction of the wind pressure is represented by the wind pressure angle. Therefore, the amount of wind-induced drift of a person can be expressed in terms of wind pressure. In Figure 4, the distressed person is affected by surface currents at the starting point of the drift, different wind pressure angles arise according to the direction of the wind pressure, and the drift position is calculated from the different wind pressures and wind pressure angles. The drift motion equation of the distressed person drift model is shown in formula (1):
[0079] $$m\frac{\mathrm{d}V}{\mathrm{d}t} = F_a + F_c + F_w \tag{1}$$
[0080] In the formula, $m$ is the mass of the person in distress; $V$ is the drift motion velocity; $F_a$ is the force exerted by wind; $F_c$ is the force exerted by ocean currents; $F_w$ is the force exerted by waves; and $t$ is the duration of action. The mass of the distressed person and the duration of action can be obtained from the distress information, while the forces exerted by wind, ocean currents, and waves can be analyzed based on actual conditions and work experience.
[0081] S203. Simulate and optimize the drift model of the distressed personnel and the scannable area to obtain the search area.
[0082] After establishing the distressed personnel drift model, random particles can be used to simulate the distressed personnel drift model, thereby simulating and optimizing the scannable area to obtain the optimized final search area.
[0083] In some embodiments of the present invention, step S203 includes:
[0084] The wind speed at a preset sea level is estimated based on a preset wind pressure model, and the disturbance coefficient is determined.
[0085] Here, the preset wind pressure model can be the distressed person drift model. The drift velocity can be obtained by analyzing the impact of wind-induced drift and flow-induced drift on the distressed person. The specific process is as follows. In an open sea search mission, ocean currents are an important factor in determining the location of the distressed person. The part of the distressed person below the waterline is affected by surface currents, generally those at a depth of about 0.3-1 meter. Since the target of this embodiment is a distressed person in a vertical posture, the current velocity at a depth of 0.5 meters is selected as the estimated input for predicting the drift motion of the distressed person. To quantify the uncertainty caused by errors in the wind-induced drift and flow-induced drift of the distressed person, random perturbations are introduced. Assuming that the wind speed perturbation follows a normal distribution, the wind speed at a height of 10 m above the sea surface is expressed as shown in formula (2):
[0086] $$\hat{W}_{10} = W_{10} + \varepsilon_W \tag{2}$$
[0087] In the formula, $\hat{W}_{10}$ is the estimated wind speed; $W_{10}$ is the predicted wind speed value; $\varepsilon_W$ is the wind speed disturbance, with $\varepsilon_W \sim N(0, \sigma_W^2)$, where $\sigma_W$ is the standard deviation of $\varepsilon_W$. For the 0-Model wind pressure model, the addition of the perturbation coefficient is shown in formula (3):
[0088] $$a' = a + \varepsilon_a \tag{3}$$
[0089] In the formula, $a'$ is the wind pressure coefficient after adding disturbance, i.e., the disturbance coefficient; $\varepsilon_a$ is a random perturbation, with $\varepsilon_a \sim N(0, \sigma_a^2)$, where $\sigma_a$ is the standard deviation of the wind pressure coefficient $a$.
[0090] The wind-induced drift velocity is obtained based on the disturbance coefficient.
[0091] Specifically, the disturbance coefficient is applied to the estimated wind speed to calculate the wind-induced drift velocity after adding the perturbation, as shown in formula (4):
[0092] $$V_L = a'\,\hat{W}_{10} \tag{4}$$
[0093] The flow velocity at a preset water depth is estimated to obtain the flow-induced drift velocity.
[0094] Here, the preset water depth can be set to 0.5 m. The current velocity at a water depth of 0.5 m is obtained from the ocean current field forecast at the location of the drowning event. After adding random perturbations, it can be expressed as shown in formula (5):
[0095] $$V_c = V_{0.5} + \varepsilon_c \tag{5}$$
[0096] In the formula, $V_c$ is the flow-induced drift velocity; $V_{0.5}$ is the forecast current velocity at a water depth of 0.5 m; $\varepsilon_c$ is a random perturbation of the flow velocity, with $\varepsilon_c \sim N(0, \sigma_c^2)$, where $\sigma_c$ is the standard deviation of $\varepsilon_c$.
[0097] The drift velocity is obtained based on the wind-induced drift velocity and the flow-induced drift velocity.
[0098] Here, based on the descriptions of wind-induced drift and flow-induced drift and the distressed person drift model in Figure 4, the drift velocity of a distressed person at sea can be expressed as the vector superposition of the wind-induced drift velocity and the ocean current velocity, as shown in formula (6):
[0099] $$V = V_L + V_c \tag{6}$$
[0100] In the formula, $V$ is the drift velocity, i.e., the drift motion velocity in formula (1); $V_L$ is the wind-induced drift velocity; and $V_c$ is the flow-induced drift velocity.
[0101] The drift velocity is input into the drift model of the distressed personnel, and random particle simulation optimization is performed on the scannable area to obtain the search area.
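The random particle simulation can be illustrated with a short Monte Carlo sketch. It assumes additive Gaussian perturbations on the wind speed, the wind pressure coefficient, and the current, and a wind-induced drift that is proportional to the perturbed wind speed and points directly downwind (the wind pressure angle is ignored for brevity). The function names, parameter names, and numeric values are hypothetical and are not taken from the source.

```python
import math
import random
from typing import List, Tuple


def sample_drift_velocity(rng, wind_speed, wind_dir, current, a,
                          sigma_w=0.0, sigma_a=0.0,
                          sigma_c=0.0) -> Tuple[float, float]:
    """One random draw of the drift velocity (m/s) as (east, north).

    wind_dir is the downwind direction in radians from the east axis.
    """
    w_hat = wind_speed + rng.gauss(0.0, sigma_w)   # perturbed 10 m wind speed
    a_pert = a + rng.gauss(0.0, sigma_a)           # perturbed wind pressure coefficient
    leeway = a_pert * w_hat                        # wind-induced drift speed (assumed form)
    cur_e = current[0] + rng.gauss(0.0, sigma_c)   # perturbed current at 0.5 m depth
    cur_n = current[1] + rng.gauss(0.0, sigma_c)
    # Vector sum of wind-induced and flow-induced drift.
    return (leeway * math.cos(wind_dir) + cur_e,
            leeway * math.sin(wind_dir) + cur_n)


def simulate_particles(n, steps, dt, init_sigma, seed=0,
                       **drift) -> List[Tuple[float, float]]:
    """Advance n particles for `steps` time steps of dt seconds each."""
    rng = random.Random(seed)
    particles = []
    for _ in range(n):
        # Initial position drawn around the reported location (assumed Gaussian).
        x, y = rng.gauss(0.0, init_sigma), rng.gauss(0.0, init_sigma)
        for _ in range(steps):
            vx, vy = sample_drift_velocity(rng, **drift)
            x, y = x + vx * dt, y + vy * dt
        particles.append((x, y))
    return particles


# Hypothetical scenario: 500 particles drifting for 2 hours in 10-minute steps.
cloud = simulate_particles(n=500, steps=12, dt=600.0, init_sigma=200.0,
                           wind_speed=8.0, wind_dir=0.0, current=(0.2, 0.1),
                           a=0.02, sigma_w=1.0, sigma_a=0.005, sigma_c=0.05)
```

The resulting particle cloud is the kind of input from which the share of particles falling inside a candidate search rectangle can be counted.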
[0102] The scannable area was determined through random particle simulation experiments, and the probability of containment (POC) of the distressed person within this area reaches its maximum of 100%. However, search missions are typically time-sensitive, demanding, and resource-constrained. An excessively large search area may not be covered effectively, leading to a lower success rate. Conversely, if the area is too small, the probability of the distressed person being within the search area decreases, resulting in a low POC; even if resources allow for complete coverage, the random nature of the search task then increases the risk of failure. Therefore, to improve the success rate and timeliness of search operations, the search area for the distressed person needs to be optimized appropriately.
[0103] To optimize and reduce the search area, the scannable area in which the particles of the distressed person drift model ultimately reside during the simulation period is divided into a grid of 225 square sub-regions of 2000 × 2000 m each, and the number of particles in each sub-region is counted. The initial particle location distribution map of the search area shows more particles in the center and fewer around the edges. The particles are most likely to be distributed in the center of the rectangular region, and particles distributed around the edges may drift out of the search area as the search progresses, which increases search time and decreases efficiency. Therefore, while maintaining a certain POC, the initial search area should be reduced to decrease search time and improve search efficiency. In this embodiment of the invention, the search area is expanded outward from the central sub-region with the highest number of particles until the POC is above 90%, at which point the rectangular region is determined as the final search area, as shown in Figure 5. In Figure 5, the x axis represents nautical miles in the east-west direction and the y axis represents nautical miles in the north-south direction. The rectangular area is the final search area after area optimization. Its POC reaches 95%, which is greater than the requirement of 90%. At the same time, the search area is a 9×9 sub-region grid, which is 64% smaller than the initial 15×15 search area. This greatly shortens the search time, improves the search efficiency, and solves the problem of determining the search area.
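A minimal sketch of this area reduction follows. It counts particles per grid cell and then grows a square window, one ring of cells at a time, around the densest cell until the contained share of particles reaches the target. The symmetric ring-by-ring growth rule and the names `count_grid` and `shrink_search_area` are assumptions for illustration; the source only states that the area is expanded outward from the densest sub-region until the threshold is met.

```python
from typing import List, Tuple


def count_grid(particles, cell: float, n_cells: int) -> List[List[int]]:
    """Count particles in each cell of an n_cells x n_cells grid at the origin."""
    counts = [[0] * n_cells for _ in range(n_cells)]
    for x, y in particles:
        i, j = int(x // cell), int(y // cell)
        if 0 <= i < n_cells and 0 <= j < n_cells:
            counts[j][i] += 1
    return counts


def shrink_search_area(counts: List[List[int]], total: int,
                       target: float = 0.9) -> Tuple[Tuple[int, int, int, int], float]:
    """Return ((i_min, i_max, j_min, j_max), share) of the reduced area."""
    if total <= 0:
        raise ValueError("total must be positive")
    n = len(counts)
    cells = [(j, i) for j in range(n) for i in range(n)]
    cj, ci = max(cells, key=lambda c: counts[c[0]][c[1]])   # densest cell
    bounds, share = (0, n - 1, 0, n - 1), 0.0
    for ring in range(n):
        i0, i1 = max(0, ci - ring), min(n - 1, ci + ring)
        j0, j1 = max(0, cj - ring), min(n - 1, cj + ring)
        inside = sum(counts[j][i]
                     for j in range(j0, j1 + 1) for i in range(i0, i1 + 1))
        bounds, share = (i0, i1, j0, j1), inside / total
        if share >= target:
            break
    return bounds, share


# Toy example on a 5 x 5 grid of unit cells (hypothetical particle positions).
demo = [(2.5, 2.5)] * 8 + [(1.5, 2.5), (0.5, 0.5)]
grid = count_grid(demo, cell=1.0, n_cells=5)
area, poc = shrink_search_area(grid, total=len(demo), target=0.9)
```

In the toy example the densest cell alone holds 80% of the particles, so the window grows by one ring to a 3 × 3 block that holds 90%.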
[0104] In some embodiments of the present invention, step S102 includes:
[0105] The decision variables for flight data in the search area are determined based on a pre-defined probability relationship model; the decision variables include coverage area and flight route interval.
[0106] Based on the coverage area and flight route interval, construct an objective function that maximizes the discovery probability.
[0107] The UAV relies on visible light / infrared payloads to scan a two-dimensional plane (the sea surface) from the air. The scanning width of the UAV is related to its flight altitude and can be described as $W = 2h\tan\alpha$, where $W$ is the scanning width, $h$ is the flight altitude, and $\alpha$ is half the field of view of the detection payload. In theory the scanning width could be increased by flying much higher, but this is impractical, because whether a person in distress on the water is detected by the drone is a probabilistic event. The detection rate of the payload depends on the drone's flight altitude; a higher altitude does not necessarily mean a higher probability of detection. The characteristics of the drone's payload and its pre-defined probability model must be considered, which can be represented as shown in formula (7):
[0108] (7)
[0109] In the formula, the quantities are the target detection rate, the preset target detection rate, and the failure alarm rate, together with two characteristic parameters of the payload. From formula (7), it can be seen that when the flight altitude h is within the optimal detection altitude, the target detection rate remains constant and equal to the preset target detection rate of 100%; when h exceeds this altitude, the target detection rate is linearly related to the flight altitude; and when h is greater than the maximum altitude, the target detection rate reaches its lowest level, making it impossible to complete the search and detection task. Therefore, while maximizing the scanning width, it is also necessary to consider the constraint on the UAV's target detection rate and the performance of the UAV itself, so as to optimize the flight altitude h and increase the success rate of the search.
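The trade-off between scan width and detection rate can be made concrete with a small sketch. The scan width uses the geometric relation between altitude and half field of view; the detection rate is a hypothetical piecewise-linear stand-in that only mirrors the qualitative behaviour described above (constant up to an optimal altitude, linear decline, lowest value beyond a maximum altitude). The parameter names and the numbers are assumptions, not the patent's formula (7).

```python
import math


def scan_width(h: float, half_fov_deg: float) -> float:
    """Scan width on the sea surface for altitude h and half field of view."""
    return 2.0 * h * math.tan(math.radians(half_fov_deg))


def altitude_for_width(w: float, half_fov_deg: float) -> float:
    """Invert the scan-width relation to recover the flight altitude."""
    return w / (2.0 * math.tan(math.radians(half_fov_deg)))


def detection_rate(h: float, h_opt: float, h_max: float,
                   p0: float = 1.0, p_low: float = 0.0) -> float:
    """Assumed detection rate: p0 up to h_opt, linear decline, p_low from h_max."""
    if h <= h_opt:
        return p0
    if h >= h_max:
        return p_low
    frac = (h - h_opt) / (h_max - h_opt)
    return p0 + frac * (p_low - p0)


# Hypothetical payload: 30 degree half field of view, optimal up to 100 m.
for h in (80.0, 200.0, 350.0):
    print(h, round(scan_width(h, 30.0), 1), detection_rate(h, 100.0, 300.0))
```

Flying higher widens the swath but, past the optimal altitude, lowers the chance that a person inside the swath is actually detected, which is why the altitude is optimized rather than simply maximized.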
[0110] With the UAV scan width W and the flight interval S as decision variables, an objective function for maximizing the discovery probability in UAV coverage trajectory planning is established, and the optimal search flight altitude of the UAV is determined. The objective function is shown in formula (8):
[0111] (8)
[0112] In the formula, the left-hand side represents the objective function of maximizing the discovery probability. Given a defined search area and a fixed probability of containing the person in distress at sea, the goal is to increase the success rate of the drone's search by maximizing the discovery probability. The discovery probability within the search region is calculated from the coverage $C$, which is jointly determined by the scanning width $W$ and the flight interval $S$ as $C = W/S$, with $W = 2h\tan\alpha$, where $h$ is the flight altitude and $\alpha$ is half the field of view of the UAV payload.
[0113] Based on preset constraints and a preset parallel selection genetic algorithm, the objective function is optimized to maximize the discovery probability, coverage area, and flight path interval, thereby determining the target scan width.
[0114] The optimal search flight altitude is determined based on the target scan width.
[0115] According to the requirements of the UAV search mission, the target detection rate of the UAV payload, the scan width, and the flight path interval are used as constraints on the UAV search trajectory planning. The preset constraints can be expressed as formula (8), formula (9) and formula (10):
[0116] (8)
[0117] (9)
[0118] (10)
[0119] In the formulas, the two parameters are the minimum target detection rate required by the search and the minimum warning flight altitude of the UAV.
[0120] Under the preset constraints, a parallel selection genetic algorithm is used to optimize the scan width W and the flight interval S. Specifically, the objective function that maximizes the discovery probability is fed into the parallel selection genetic algorithm, which iteratively optimizes W and S to yield the target scan width and the target flight path interval. The optimal flight altitude h is then obtained from the target scan width. Flying at the optimal altitude improves the target detection rate and thus increases the success rate of the search.
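The optimization loop can be sketched as a small real-coded genetic algorithm. Everything specific here is an assumption made for illustration: the discovery probability is taken as 1 - exp(-W/S), a form common in search theory; the scan width is tied to altitude by W = 2 h tan(theta) for a downward-looking payload; the detection-rate curve and all numeric limits are hypothetical; and plain elitist selection is used in place of the parallel selection scheme, whose details are not given in this text.

```python
import math
import random

THETA = math.radians(30.0)                 # assumed half field-of-view angle
H_WARN, H_OPT, H_MAX = 20.0, 60.0, 150.0   # assumed altitudes in metres
PD_MIN = 0.8                               # assumed minimum detection rate
S_MIN, S_MAX = 40.0, 200.0                 # assumed flight-interval bounds in metres
W_MIN = 2.0 * H_WARN * math.tan(THETA)
W_MAX = 2.0 * H_MAX * math.tan(THETA)


def detection_rate(h):
    """Assumed curve: constant up to H_OPT, then a linear decline to zero at H_MAX."""
    if h <= H_OPT:
        return 1.0
    if h >= H_MAX:
        return 0.0
    return 1.0 - (h - H_OPT) / (H_MAX - H_OPT)


def altitude_from_width(w):
    return w / (2.0 * math.tan(THETA))


def fitness(w, s):
    """Discovery probability, or 0 when a constraint is violated."""
    h = altitude_from_width(w)
    feasible = h >= H_WARN and detection_rate(h) >= PD_MIN and S_MIN <= s <= S_MAX
    return 1.0 - math.exp(-w / s) if feasible else 0.0


def clamp(value, low, high):
    return min(max(value, low), high)


def optimise(generations=80, pop_size=40, seed=0):
    rng = random.Random(seed)
    pop = [(rng.uniform(W_MIN, W_MAX), rng.uniform(S_MIN, S_MAX)) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda ind: fitness(*ind), reverse=True)
        elite = ranked[: pop_size // 4]    # survivors are carried over unchanged
        children = list(elite)
        while len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            t = rng.random()               # blend crossover plus Gaussian mutation
            w = t * a[0] + (1.0 - t) * b[0] + rng.gauss(0.0, 2.0)
            s = t * a[1] + (1.0 - t) * b[1] + rng.gauss(0.0, 2.0)
            children.append((clamp(w, W_MIN, W_MAX), clamp(s, S_MIN, S_MAX)))
        pop = children
    best = max(pop, key=lambda ind: fitness(*ind))
    return best, altitude_from_width(best[0])


(best_w, best_s), best_h = optimise()
print(f"scan width {best_w:.1f} m, interval {best_s:.1f} m, altitude {best_h:.1f} m")
```

Under these assumptions the search is pushed toward the highest altitude that still satisfies the detection-rate constraint, which mirrors the idea of deriving the flight altitude from the optimized scan width.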
[0121] In some embodiments of the present invention, the process of generating severe weather image datasets includes:
[0122] Fog images are generated based on atmospheric scattering models to obtain a fog image dataset;
[0123] Rainy day images are generated based on the rain map model, resulting in a rainy day image dataset;
[0124] Based on the fog image dataset and the rainy day image dataset, a severe weather image dataset is obtained.
[0125] Given the limited amount of comparative image data from the drone's perspective in complex scenes, which severely impacts the performance of image restoration networks, the dataset can be augmented by artificially synthesizing images from the drone's perspective under adverse weather conditions to enhance image restoration capabilities in complex scenarios. To further enrich the dataset, this invention employs two methods for synthesizing images under adverse weather conditions: synthesis based on traditional methods and synthesis based on GAN networks.
[0126] The fog image is generated using an atmospheric scattering model, which is shown in equation (11):
[0127] $$I(x) = J(x)\,t(x) + A\,\bigl(1 - t(x)\bigr) \qquad (11)$$
[0128] In the formula, $x$ is the pixel coordinate in the image; $I(x)$ is the artificially synthesized foggy image; $J(x)$ is the original clear image; $t(x)$ is the transmittance; and $A$ is the atmospheric light value. The transmittance at any point decays exponentially with the propagation distance, as shown in formula (12):
[0129] $$t(x) = e^{-\beta d(x)} \qquad (12)$$
[0130] In the formula, $\beta$ is the scattering coefficient arising from the scattering and absorption of light by atmospheric particles, and $d(x)$ is the distance from the scene to the camera.
[0131] The fog image dataset can be obtained through formulas (11) and (12).
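A minimal sketch of fog synthesis with the atmospheric scattering model is given below. It assumes the standard form of the model (foggy = clear × transmittance + airlight × (1 − transmittance), with transmittance exp(−β·d)); the function names, the default values of `beta` and `airlight`, and the use of nested lists for a single-channel image are illustrative choices only.

```python
import math


def transmittance(depth_m, beta):
    """Transmittance decays exponentially with scene depth."""
    return math.exp(-beta * depth_m)


def add_fog(clear, depth, beta=0.05, airlight=0.9):
    """Blend each clear pixel toward the airlight according to its depth.

    clear: 2-D list of intensities in [0, 1]
    depth: 2-D list of scene-to-camera distances in metres (same shape)
    """
    foggy = []
    for clear_row, depth_row in zip(clear, depth):
        row = []
        for j, d in zip(clear_row, depth_row):
            t = transmittance(d, beta)
            row.append(j * t + airlight * (1.0 - t))
        foggy.append(row)
    return foggy


# Near pixels keep their value; distant pixels converge to the airlight.
print(add_fog([[0.2, 0.2]], [[0.0, 1000.0]]))
```

Varying `beta` over a range of values is one simple way to produce several fog densities from the same clear frame.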
[0132] The rain map model is used to generate rainy day images, as shown in formula (13):
[0133] (13)
[0134] In the formula, the position variable indexes the pixels in the image. Superimposing the rain-streak layer on the clear scene reduces the sharpness of the scene, and the visibility of the rain streaks can be adjusted. By setting the density, length, and angle of the rain streaks, any level of rain that may occur in a real-world scene can be simulated. To simulate light rain, the linear superposition in the model is replaced by a weighted superposition.
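The rain synthesis step can be illustrated with the sketch below. It is not the patent's formula (13), which is not reproduced in this text; it assumes a binary streak layer that is either added to the clear image (linear superposition) or alpha-blended with a hypothetical `weight` (weighted superposition). The streak generator and all parameter names are illustrative.

```python
import math
import random


def rain_streaks(height, width, density, length, angle_deg, seed=0):
    """Draw straight streaks of the given length and angle (from vertical)."""
    rng = random.Random(seed)
    layer = [[0.0] * width for _ in range(height)]
    dy = math.cos(math.radians(angle_deg))
    dx = math.sin(math.radians(angle_deg))
    for _ in range(int(density * height * width)):
        y0, x0 = rng.randrange(height), rng.randrange(width)
        for step in range(length):
            y = int(round(y0 + step * dy)) % height   # wrap at the borders
            x = int(round(x0 + step * dx)) % width
            layer[y][x] = 1.0
    return layer


def add_rain(clear, streaks, weight=None):
    """weight=None: linear superposition; otherwise weighted superposition."""
    rainy = []
    for clear_row, streak_row in zip(clear, streaks):
        if weight is None:
            row = [min(1.0, b + s) for b, s in zip(clear_row, streak_row)]
        else:
            row = [(1.0 - weight * s) * b + weight * s
                   for b, s in zip(clear_row, streak_row)]
        rainy.append(row)
    return rainy


clear = [[0.5] * 16 for _ in range(16)]
streaks = rain_streaks(16, 16, density=0.02, length=4, angle_deg=15)
heavy = add_rain(clear, streaks)               # saturated streaks
light = add_rain(clear, streaks, weight=0.3)   # fainter streaks for light rain
```

With the weighted form, streak pixels are only partly brightened, which gives the softer appearance associated with light rain.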
[0135] The low-light image is generated using the Retinex model, which is shown in equation (14):
[0136] $$S(x, y) = R(x, y)\,L(x, y) \qquad (14)$$
[0137] In the formula, $(x, y)$ is the pixel position in the image; $S$, $R$, and $L$ denote, respectively, the image observed by the human eye, the reflectance of the object (the reflected component), and the illumination falling on the object (the incident component). According to Retinex theory, when the illumination is very weak, the incident light $L$ on the object's surface is very dim, whereas the reflectance $R$ is an inherent property of the object and is not affected by illumination. Therefore, as the formula shows, the final image obtained under weak illumination is darker.
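As a rough illustration of low-light synthesis under the Retinex view, the sketch below keeps the reflectance fixed and attenuates only the illumination. The gain and gamma curve applied to the illumination, and the assumption that an illumination map is already available, are hypothetical choices and not taken from the source.

```python
def darken(image, illumination, gain=0.2, gamma=1.5):
    """Synthesize a low-light image: observed = reflectance * illumination.

    image:        2-D list of observed intensities in [0, 1]
    illumination: 2-D list of assumed illumination values in (0, 1]
    """
    dark = []
    for obs_row, ill_row in zip(image, illumination):
        row = []
        for observed, light in zip(obs_row, ill_row):
            reflectance = observed / light if light > 0 else 0.0
            dim_light = gain * light ** gamma      # weakened illumination
            row.append(reflectance * dim_light)    # reflectance is unchanged
        dark.append(row)
    return dark


print(darken([[0.8, 0.4]], [[1.0, 0.5]]))
```

Because only the illumination term is scaled, object appearance is preserved up to brightness, which matches the statement that reflectance is an inherent property of the object.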
[0138] The rainy day image dataset can be obtained by formulas (13) and (14). Then, the fog image dataset and the rainy day image dataset can be merged to obtain the severe weather image dataset.
[0139] Furthermore, this embodiment of the invention utilizes generative adversarial networks to construct a deep style transfer model with powerful semantic information representation capabilities. This model can simultaneously extract global features, attention features, and edge features from video images of multiple scenes, and then perform style transfer based on the extracted feature maps using a multi-scale generator to simulate visible light video images affected by various weather conditions. To address the potential loss of detail information during style transfer, a multi-scale style generator based on multi-scale encoding and decoding and multi-receptive field residual blocks is constructed to facilitate the preservation of structural and texture information in video images.
[0140] In some embodiments of the present invention, step S103 includes:
[0141] The current weather is determined based on the drone;
[0142] A historical dataset is obtained from the severe weather image dataset based on the current weather;
[0143] Based on the historical dataset, detail restoration and degradation recovery are performed on the current video image to obtain a clear video image.
[0144] The current weather is determined from the weather conditions at the time the drone captured the video image. The severe weather image dataset is then searched for images of the corresponding weather, which form the historical dataset. This historical dataset is then used to perform detail restoration and degradation recovery on the current video image, yielding a clear video image.
[0145] In some embodiments of the present invention, as shown in Figure 6, performing detail restoration and degradation recovery on the current video image based on the historical dataset to obtain a clear video image includes:
[0146] The contrastive degradation encoder compares historical datasets with current video images to obtain potential degradation representations.
[0147] The general image enhancement model is based on the AirNet integrated image restoration network, a network model capable of enhancing images with various degradation types and degrees. Existing image restoration methods often can only handle images with specific degradation types and degrees, and require prior knowledge of the damage information. In real-world applications, degradation types and degrees are constantly changing, and models struggle to automatically identify the type of damage, making it difficult for existing methods to handle. However, AirNet is unaffected by prior knowledge such as degradation type and degree, using only observed degraded images for inference and restoration of various degraded images. The model consists of two modules: a contrastive-based degraded encoder (CBDE) and a degradation-guided restoration network (DGRN). It features integration and versatility across multiple scenarios, allowing AirNet to infer from degraded images in historical datasets, thereby restoring details from current video images and obtaining restored images. This significantly improves the success rate of UAV platforms performing tasks in complex scenarios.
[0148] In some embodiments of the present invention, a potential degradation representation is obtained by comparing a historical dataset and a current video image using a contrastive degradation encoder, including:
[0149] The contrastive degradation encoder compares historical datasets with current video images to determine positive and negative samples.
[0150] The potential degradation representation is obtained by maximizing the consistency between positive samples and minimizing the consistency with negative samples.
[0151] The CBDE, which consists of multiple convolutional layers (Conv), extracts the latent degradation representation from the input degraded image. For a given degraded image, AirNet randomly crops patches from it and treats patches from the same image as positive samples, since degradation within one image is consistent, while image patches from other images in the historical dataset are treated as negative samples. Using these contrasting image patches, CBDE obtains the latent degradation representation by maximizing the consistency between positive samples while minimizing the consistency with negative samples.
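The positive/negative objective can be sketched with a generic InfoNCE-style contrastive loss. This is an assumption for illustration: the exact loss used by the encoder is not stated in this text, and the function names, the cosine similarity, and the `temperature` value are hypothetical.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(query, positive, negatives, temperature=0.07):
    """Low when query matches the positive and differs from the negatives."""
    pos = math.exp(cosine(query, positive) / temperature)
    neg = sum(math.exp(cosine(query, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Embeddings of two patches from the same image (positive pair) and of a
# patch from a different image (negative); the vectors are made up.
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[-1.0, 0.0]])
confused = info_nce([1.0, 0.0], [-1.0, 0.0], [[1.0, 0.0]])
print(aligned < confused)
```

Minimizing such a loss pulls representations of patches with the same degradation together and pushes representations of differently degraded patches apart.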
[0152] The current video image and potential degradation representation are input into the degradation-guided restoration network for degradation restoration, resulting in a clear video image.
[0153] The Degradation-Guided Restoration Network (DGRN) uses the latent degradation representation learned by CBDE to restore input images with unknown degradation types and degrees into clear video images. DGRN consists of five Degradation-Guided Groups (DGGs), and each DGG is further composed of five Degradation-Guided Modules (DGMs).
[0154] The DGM mainly consists of a deformable convolution (DCN) layer and a spatial feature transform (SFT) layer. To adapt to different degradation types, the network merges the features output by the previous DGM with the latent degradation representation and feeds them into the DGM's convolutional layer to learn offsets and masks; the receptive field can then be adjusted dynamically according to the learned offsets and modulation masks. To reduce distribution discrepancies and achieve stronger multi-degradation restoration, the SFT layer learns a mapping function that outputs modulation parameters for a given latent degradation representation. The SFT layer applies an affine transformation, scaling and shifting the features output by the previous DGM with the modulation parameters, and outputs new features. This process is repeated through multiple DGMs, ultimately outputting a clear video image.
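The scale-and-shift step of the feature transform layer can be sketched as below. It covers only the affine modulation; the deformable convolution is omitted. The linear maps that turn the degradation representation into per-channel parameters, and all weights, are hypothetical stand-ins for learned layers.

```python
def linear(vector, weights, bias):
    """Plain matrix-vector product plus bias."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def sft_modulate(features, degradation, w_gamma, b_gamma, w_beta, b_beta):
    """Scale and shift each feature channel using parameters predicted
    from the degradation representation.

    features:    list of channels, each a flat list of activations
    degradation: latent degradation vector
    """
    gamma = linear(degradation, w_gamma, b_gamma)   # per-channel scale
    beta = linear(degradation, w_beta, b_beta)      # per-channel shift
    return [[g * value + b for value in channel]
            for channel, g, b in zip(features, gamma, beta)]


modulated = sft_modulate(
    features=[[1.0, 2.0], [3.0, 4.0]],
    degradation=[1.0, 0.0],
    w_gamma=[[2.0, 0.0], [0.0, 1.0]], b_gamma=[0.0, 1.0],
    w_beta=[[0.5, 0.0], [0.0, 0.0]], b_beta=[0.0, -1.0],
)
print(modulated)
```

Because the scale and shift depend on the degradation vector, the same restoration layers can behave differently for fog, rain, or low light.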
[0155] In some embodiments of the present invention, multi-scale feature extraction is performed on clear video images to obtain identification results of distressed persons, including:
[0156] S601. Construct a multi-scale target detection model; the multi-scale target detection model includes a backbone feature extraction network, a BiFormer attention module, a CBAM attention module, and a decoupling head module.
[0157] To address the issues of missed detections and false detections of small targets, this invention proposes a YOLOv5-based multi-scale target detection model. A dataset containing over 10,000 images of real people in distress at sea, simulated people in distress at sea, and drowning dummies was constructed. By training the YOLOv5-based multi-scale target detection network using this dataset, efficient and accurate detection of targets in distress at sea was achieved, effectively improving the search and rescue efficiency of aerial visual perception systems.
[0158] S602. Extract features from clear video images using the backbone feature extraction network to obtain initial features.
[0159] The multi-scale target detection module uses the backbone feature extraction network of YOLOv5 to extract features from the clear video images, obtaining the initial features. Feature extraction is then strengthened in the neck network using a feature pyramid, and finally the prediction network predicts the object corresponding to each feature point to obtain the recognition result. The added spatial pyramid pooling layer (SPPF) removes the need to fix the scale of the input image in convolutional networks, reducing the loss of image information. The neck network introduces the BiFormer attention module, which is built on dynamic sparse attention, and the CBAM attention module. The BiFormer attention module reduces the computational load of the network while preserving fine-grained features of the target. The CBAM attention module newly added to the neck network combines spatial and channel attention mechanisms to weight the features of image data of people in distress at sea, which improves both the detection efficiency and the detection accuracy for multi-scale targets in the water.
[0160] S603. Adaptive feature fusion is performed on the initial features using the BiFormer attention module to obtain fused features.
[0161] In the traditional YOLO model, feature maps are typically processed through convolutional layers and then predicted through fully connected layers. However, this approach loses detailed information in the image. The Bi-level Routing Attention Vision Transformer (BiFormer) introduces a two-layer routing attention mechanism while employing sparse sampling to extract feature information of targets in the image. BiFormer contains two attention layers. The first attention layer performs adaptive feature fusion on the input feature map to extract richer semantic information. The second attention layer performs weighted processing on the fused features to highlight important target regions. This two-layer attention mechanism can significantly improve the accuracy and robustness of target detection. However, while the traditional two-layer architecture design brings performance improvements, it also causes problems such as large memory consumption and high computational cost. BiFormer reduces the number of network parameters and computational cost by collecting key-value pairs in relevant windows and using sparsity operations to directly skip the computation of the least relevant regions.
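The region-selection idea (keeping only the most relevant windows and skipping the rest) can be sketched as a top-k routing step. This follows the general bi-level routing scheme of averaging tokens per region, scoring region pairs, and keeping the best matches; the function names and the toy vectors are illustrative, and the subsequent token-level attention inside the selected regions is omitted.

```python
def mean_vector(tokens):
    """Average a region's token vectors into one region-level vector."""
    count = len(tokens)
    return [sum(tok[i] for tok in tokens) / count for i in range(len(tokens[0]))]


def route_regions(queries, keys, top_k):
    """For every query region, return the indices of the top_k key regions.

    queries, keys: one entry per region, each a list of token vectors.
    """
    region_q = [mean_vector(region) for region in queries]
    region_k = [mean_vector(region) for region in keys]
    routes = []
    for q in region_q:
        scores = [sum(a * b for a, b in zip(q, k)) for k in region_k]
        ranked = sorted(range(len(scores)), key=lambda j: -scores[j])
        routes.append(ranked[:top_k])   # all other regions are skipped
    return routes


routes = route_regions(
    queries=[[[1.0, 0.0]], [[0.0, 1.0]], [[1.0, 1.0]]],
    keys=[[[1.0, 0.0]], [[0.0, 1.0]], [[-1.0, -1.0]]],
    top_k=2,
)
print(routes)
```

Fine-grained attention is then computed only between a region's tokens and the key-value pairs gathered from its routed regions, which is where the savings in computation come from.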
[0162] S604. The fused features are weighted according to the CBAM attention module to obtain refined features.
[0163] The Convolutional Block Attention Module (CBAM) combines spatial and channel attention mechanisms. In the network, the output of the convolutional layers is first processed by the channel attention module, which is used here to emphasize color features. The channel-weighted result is then processed by the spatial attention module, and the final weighted, refined features are obtained.
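The channel-then-spatial weighting can be sketched in a deliberately simplified form. In this illustration the learned layers are omitted: the channel gate is a sigmoid of pooled statistics rather than a shared MLP, and the spatial gate uses per-pixel statistics across channels rather than a convolution. It is therefore a gating sketch in the spirit of CBAM, not the module itself.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_gate(features):
    """Weight each channel by a gate built from its average and maximum."""
    gated = []
    for channel in features:
        flat = [v for row in channel for v in row]
        weight = sigmoid(sum(flat) / len(flat) + max(flat))
        gated.append([[v * weight for v in row] for row in channel])
    return gated


def spatial_gate(features):
    """Weight each pixel by a gate built from the cross-channel average and maximum."""
    rows, cols = len(features[0]), len(features[0][0])
    gated = [[[0.0] * cols for _ in range(rows)] for _ in features]
    for r in range(rows):
        for c in range(cols):
            column = [channel[r][c] for channel in features]
            weight = sigmoid(sum(column) / len(column) + max(column))
            for k, channel in enumerate(features):
                gated[k][r][c] = channel[r][c] * weight
    return gated


def cbam_like(features):
    return spatial_gate(channel_gate(features))


# Two channels of a 1x2 feature map; only one location is strongly active.
refined = cbam_like([[[0.0, 4.0]], [[0.0, 0.0]]])
print(refined)
```

The sequential order, channel weighting first and spatial weighting second, is the part of the design that this sketch preserves.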
[0164] S605. Based on the decoupling head module and the refined features, the recognition result is obtained.
[0165] The multi-scale target detection model, i.e. the improved YOLOv5 target detection network structure, can include a head structure (i.e., a decoupled head module). The head structure can include four detection heads, which can identify the four refined features output by the CBAM attention module respectively to obtain the recognition results. The specific recognition process can be set according to the actual situation. This embodiment of the invention does not limit it. This module can improve the network's detection accuracy for waterborne distress targets with specific color features.
[0166] In some embodiments of the present invention, after performing multi-scale feature extraction on clear video images to obtain the identification results of the distressed persons, the method further includes:
[0167] Based on the identification results, the drone is controlled to reach the location of the person in distress, and rescue is carried out when the drone descends to a preset distance.
[0168] In a specific embodiment of the present invention, wind is an unavoidable influencing factor in a water rescue environment, and traditional maritime drones struggle to maintain stable flight while releasing a lifebuoy. To address this issue, the present invention improves the lifebuoy release device for drones, reducing the impact of wind-induced lifebuoy swaying on drone stability. After the person in distress is located based on the identification results, the maritime drone is controlled to reach the person's location and descends to a preset distance (e.g., approximately 1.2 meters above the water surface) to release an inflatable lifebuoy, which helps sustain the person's life and thus enables rescue.
[0169] As shown in Figure 7, the maritime distress rescue system mainly consists of two parts: a search and rescue platform and a drone platform. When water rescue personnel receive a water rescue mission (i.e., a mission request) through the search and rescue platform, a search and rescue drone takes off urgently from its base and flies to the waters where the distressed person is located. The drone's search path is planned by an intelligent drone search path planning module. During flight, the drone acquires real-time video stream data of the water area through its onboard imaging device and transmits it back to the search and rescue platform, which improves the quality of the video stream data in real time through a visual perception enhancement module. A multi-scale target detection module then searches for the distressed person within the drone's field of view. Upon locating the distressed person, the system notifies the search and rescue platform and drops rescue items, such as inflatable lifebuoys, to extend the survival time and raise the survival probability of the distressed person, buying valuable time for rescue personnel to take further action. After receiving the information from the drone, rescue personnel dispatch a search and rescue vessel to rescue the distressed person. This system improves the search efficiency of search and rescue missions, enhances the rescue personnel's awareness of the incident scene, enables more accurate searches for distressed persons, and increases the success rate of search and rescue missions.
[0170] To better implement the multi-scenario distress person accurate identification method in the embodiments of the present invention, the embodiments of the present invention also provide, on the basis of that method, a multi-scenario distress person accurate identification device. As shown in Figure 8, the multi-scenario distress personnel accurate identification device 800 includes:
[0171] The information acquisition module 801 is used to acquire the flight data and mission requirements of the UAV, as well as the distress information of the personnel in distress.
[0172] The area determination module 802 is used to determine the search area based on the distress information and mission requirements, and to determine the optimal flight altitude based on the search area and flight data;
[0173] The image processing module 803 is used to acquire the current video image from the current view of the drone when the drone reaches the optimal flight altitude of the search area, and to restore the details and restore the degradation of the current video image to obtain a clear video image.
[0174] The result recognition module 804 is used to extract multi-scale features from clear video images to obtain the identification results of the distressed persons.
[0175] The multi-scenario distress personnel accurate identification device 800 provided in the above embodiments can realize the technical solutions described in the above multi-scenario distress personnel accurate identification method embodiments. The specific implementation principles of each module or unit can be found in the corresponding content in the above multi-scenario distress personnel accurate identification method embodiments, which will not be repeated here.
[0176] The above provides a detailed description of the method and apparatus for accurate identification of distressed persons in multiple scenarios provided by the present invention. Specific examples have been used to illustrate the principles and implementation methods of the present invention. The description of the above embodiments is only for the purpose of helping to understand the method and core ideas of the present invention. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the ideas of the present invention. Therefore, the content of this specification should not be construed as a limitation of the present invention.
Claims
1. A method for accurate identification of distressed persons in multiple scenarios, characterized in that it comprises:
acquiring flight data and mission requirements of a drone, as well as distress information of a person in distress;
determining a search area based on the distress information and the mission requirements, and determining an optimal flight altitude based on the search area and the flight data;
when the drone reaches the optimal flight altitude of the search area, capturing a current video image from the drone's current perspective, and performing detail restoration and degradation recovery on the current video image to obtain a clear video image;
performing multi-scale feature extraction on the clear video image to obtain an identification result of the person in distress;
wherein determining the optimal flight altitude based on the search area and the flight data comprises:
determining decision variables for the flight data in the search area according to a preset probability relationship model, the decision variables including coverage area and flight path interval;
constructing, based on the coverage area and the flight path interval, an objective function that maximizes the discovery probability;
optimizing the objective function, the coverage area, and the flight path interval based on preset constraints and a preset parallel selection genetic algorithm to determine a target scan width;
obtaining the optimal search flight altitude based on the target scan width;
wherein performing detail restoration and degradation recovery on the current video image based on a historical dataset to obtain a clear video image comprises:
comparing the historical dataset and the current video image using a contrastive degradation encoder to obtain a potential degradation representation;
inputting the current video image and the potential degradation representation into a degradation-guided restoration network for degradation restoration to obtain the clear video image;
wherein performing multi-scale feature extraction on the clear video image to obtain the identification result of the person in distress comprises:
constructing a multi-scale target detection model, the multi-scale target detection model including a backbone feature extraction network, a BiFormer attention module, a CBAM attention module, and a decoupling head module;
extracting features from the clear video image using the backbone feature extraction network to obtain initial features;
performing adaptive feature fusion on the initial features using the BiFormer attention module to obtain fused features;
weighting the fused features using the CBAM attention module to obtain refined features;
identifying the refined features using the decoupling head module to obtain the identification result.
2. The method for accurate identification of distressed persons in multiple scenarios according to claim 1, characterized in that the distress information includes a probability density distribution of the initial location of the person in distress, and determining the search area based on the distress information and the mission requirements comprises:
determining the scannable area of the UAV on the water based on a parallel line scanning search method and the mission requirements;
establishing a drift model of the person in distress based on the influence of multiple factors on the person in distress and on the probability density distribution;
performing simulation optimization on the drift model of the person in distress and the scannable area to obtain the search area.
3. The method for accurate identification of distressed persons in multiple scenarios according to claim 2, characterized in that performing simulation optimization on the drift model of the person in distress and the scannable area to obtain the search area comprises:
estimating the wind speed at a preset sea level based on a preset wind pressure model, and determining a disturbance coefficient;
obtaining a wind-induced drift velocity based on the disturbance coefficient;
estimating the flow velocity at a preset water depth to obtain a flow-induced drift velocity;
obtaining a drift velocity based on the wind-induced drift velocity and the flow-induced drift velocity;
inputting the drift velocity into the drift model of the person in distress to perform random particle simulation optimization on the scannable area, thereby obtaining the search area.
4. The method for accurate identification of distressed persons in multiple scenarios according to claim 1, characterized in that performing detail restoration and degradation recovery on the current video image to obtain a clear video image comprises:
determining the current weather based on the drone;
obtaining a historical dataset from a severe weather image dataset based on the current weather;
performing detail restoration and degradation recovery on the current video image based on the historical dataset to obtain the clear video image.
5. The method for accurate identification of distressed persons in multiple scenarios according to claim 4, characterized in that the process of generating the severe weather image dataset comprises:
generating fog images based on an atmospheric scattering model to obtain a fog image dataset;
generating rainy day images based on a rain map model to obtain a rainy day image dataset;
obtaining the severe weather image dataset based on the fog image dataset and the rainy day image dataset.
6. The method for accurate identification of distressed persons in multiple scenarios according to claim 1, characterized in that comparing the historical dataset and the current video image using the contrastive degradation encoder to obtain the potential degradation representation comprises:
comparing the historical dataset and the current video image using the contrastive degradation encoder to determine positive samples and negative samples;
obtaining the potential degradation representation by maximizing the consistency between the positive samples and minimizing the consistency with the negative samples.
7. A device for accurate identification of distressed persons in multiple scenarios, characterized in that it comprises:
an information acquisition module, used to acquire flight data and mission requirements of a drone, as well as distress information of a person in distress;
an area determination module, used to determine a search area based on the distress information and the mission requirements, and to determine an optimal flight altitude based on the search area and the flight data;
an image processing module, used to acquire a current video image from the drone's current perspective when the drone reaches the optimal flight altitude of the search area, and to perform detail restoration and degradation recovery on the current video image to obtain a clear video image;
a result recognition module, used to perform multi-scale feature extraction on the clear video image to obtain an identification result of the person in distress;
wherein determining the optimal flight altitude based on the search area and the flight data comprises:
determining decision variables for the flight data in the search area according to a preset probability relationship model, the decision variables including coverage area and flight path interval;
constructing, based on the coverage area and the flight path interval, an objective function that maximizes the discovery probability;
optimizing the objective function, the coverage area, and the flight path interval based on preset constraints and a preset parallel selection genetic algorithm to determine a target scan width;
obtaining the optimal search flight altitude based on the target scan width;
wherein performing detail restoration and degradation recovery on the current video image based on a historical dataset to obtain a clear video image comprises:
comparing the historical dataset and the current video image using a contrastive degradation encoder to obtain a potential degradation representation;
inputting the current video image and the potential degradation representation into a degradation-guided restoration network for degradation restoration to obtain the clear video image;
wherein performing multi-scale feature extraction on the clear video image to obtain the identification result of the person in distress comprises:
constructing a multi-scale target detection model, the multi-scale target detection model including a backbone feature extraction network, a BiFormer attention module, a CBAM attention module, and a decoupling head module;
extracting features from the clear video image using the backbone feature extraction network to obtain initial features;
performing adaptive feature fusion on the initial features using the BiFormer attention module to obtain fused features;
weighting the fused features using the CBAM attention module to obtain refined features;
identifying the refined features using the decoupling head module to obtain the identification result.