Ship night navigation environment intelligent identification system and method
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- TIANJIN RES INST FOR WATER TRANSPORT ENG M O T
- Filing Date
- 2026-05-27
- Publication Date
- 2026-06-30

Figure CN122313435A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of computer vision and intelligent navigation assistance technology, and in particular relates to an intelligent identification system and method for the navigation environment during nighttime ship navigation.
Background Technology
[0002] During nighttime navigation, ship safety is affected by factors such as insufficient lighting, large variations in target size, strong sea surface reflection, complex weather conditions, and numerous background interferences. Traditional manual lookout methods rely heavily on the experience of the pilot and are prone to missing dangerous targets due to visual fatigue and low visibility. With the development of infrared imaging and deep learning technologies, intelligent navigation perception based on nighttime infrared images is gradually becoming an important technical means to improve nighttime navigation safety.
[0003] Infrared imaging does not rely on external lighting and can stably capture the thermal radiation of targets at night. However, infrared images are usually single-channel grayscale images that suffer from low contrast, high noise, blurred edges, and conspicuous bright-spot interference. In maritime scenes in particular, distant targets are often small with weak signals, while nearby ships and shoreline obstacles are far larger. This wide range of target scales places higher demands on image enhancement and target detection.
[0004] In terms of intelligent recognition, YOLO-series networks are widely used in target detection owing to their single-stage structure, high speed, and ease of deployment. However, the original YOLO network was designed mainly for natural scenes and adapts poorly to complex navigation scenarios in nighttime single-channel infrared images, such as weak ship targets, buoys, bridge piers, floating obstacles, and shoreline clutter. In particular, when small distant targets and large nearby targets coexist near the horizon, fixed multi-scale fusion and unified bounding box regression cannot meet the recognition needs of targets at different scales.
Summary of the Invention
[0005] In view of this, the present invention aims to provide an intelligent identification system and method for the navigation environment during nighttime ship navigation, in order to solve the following problems in the prior art: nighttime infrared images are of poor quality; it is difficult to simultaneously brighten dark areas, suppress noise, preserve detail, and constrain high-brightness clutter; it is difficult to identify targets of different scales stably; the localization of extremely small targets is unstable and a unified detection paradigm adapts poorly; and there is no high-level semantic understanding of the navigation environment or intelligent decision-support capability.
[0006] To achieve the above objectives, the technical solution of the present invention is implemented as follows. In a first aspect, the present invention provides a method for intelligent identification of the navigation environment during nighttime ship navigation, comprising the following steps:
S1. Acquire single-channel infrared images of ship nighttime navigation scenes, and normalize and segment them into blocks to obtain image samples.
S2. Perform image enhancement on the image samples to obtain enhanced infrared images, using a brightness-noise joint adaptive and structure-constrained wavelet enhancement method for night navigation scenarios.
S3. Input the enhanced infrared image into the recognition network, which includes a backbone network, a horizon-constrained scale decoupling feature fusion module, and a point-box collaborative detection head. The backbone network extracts multi-scale features from the enhanced infrared image. The horizon-constrained scale decoupling feature fusion module applies position weighting to the multi-scale features based on horizon response information, and divides the weighted features into a small target feature branch and a large target feature branch for scale-decoupled fusion, yielding small target fusion features and large target fusion features respectively.
S4. The point-box collaborative detection head receives the small target fusion features and large target fusion features, outputs candidate results through the small target point detection branch and the large target box detection branch respectively, and filters and fuses the candidate results through the result fusion unit to obtain the final target detection result.
S5. Based on the final target detection result, the navigation environment discrimination unit outputs the navigation environment identification result.
[0007] Furthermore, in step S1, normalizing and segmenting the single-channel infrared image includes: linearly transforming the image with min-max normalization, where abnormal pixels outside the high and low percentiles are first truncated and the values are then mapped to the range 0 to 1; and dividing the normalized image into blocks with a sliding window, with the sample block size set to 640×640 and overlapping areas preserved between adjacent sample blocks.
[0008] Furthermore, step S2 employs a brightness-noise joint adaptive and structure-constrained wavelet enhancement method for night navigation scenarios, including:
S21. Perform robust dynamic range stretching on the normalized image;
S22. Jointly estimate the brightness and noise of the image, and adaptively determine the denoising intensity, brightness correction intensity, local contrast enhancement intensity, and wavelet detail enhancement intensity;
S23. Perform adaptive noise reduction, brightness correction, and local contrast enhancement on the image;
S24. Construct a structure confidence map from the gradient information and local variance information of the image, and construct a highlight suppression map from the distribution of highlight areas in the image;
S25. Perform multi-scale wavelet decomposition on the image after local contrast enhancement, enhance the wavelet detail subbands under the constraint of the structure confidence map and the highlight suppression map, and then perform inverse wavelet reconstruction to obtain the enhanced infrared image.
[0009] Furthermore, in step S3, the horizon-constrained scale decoupling feature fusion module processes the multi-scale features as follows: the second-layer features output by the backbone network are used to extract the horizontal strip response region in the image, generating one-dimensional horizon response information that is expanded along the image width to form a two-dimensional horizon response map; directional context enhancement, comprising horizontal strip feature extraction, local detail feature extraction, and large receptive field context feature extraction, is performed on the multi-scale features output by the backbone network; and the two-dimensional horizon response map is used to apply position weighting to the features at each scale after directional context enhancement.
[0010] Furthermore, in step S3, scale decoupling fusion includes: the small target feature branch receives the second-scale and third-scale features, upsamples the third-scale feature to the resolution of the second-scale feature, and fuses them to obtain the small target fused feature, using a detail-first adaptive competitive fusion method; the large target feature branch receives the fourth-scale and fifth-scale features, downsamples the fourth-scale feature to the resolution of the fifth-scale feature, and fuses them to obtain the large target fused feature, using a semantic-first adaptive competitive fusion method.
[0011] Furthermore, in step S4, the point-box collaborative detection head operates as follows: the small target point detection branch receives the small target fusion features, outputs center point response information and size compensation information, and converts them into small target candidate results; the large target box detection branch receives the large target fusion features and outputs large target candidate results; the result fusion unit represents the small and large target candidate results as a unified candidate result set, performs scale filtering based on candidate box size, resolves conflicts between candidate results of the same category whose overlap exceeds a preset threshold, and finally performs non-maximum suppression to obtain the final target detection result.
[0012] Furthermore, in step S5, the navigation environment discrimination unit jointly assesses channel clearance, obstacle density, and encounter risk in the night navigation scenario based on the target categories, number of targets, spatial distribution of targets, and density of dangerous targets.
[0013] In a second aspect, based on the same inventive concept, the present invention also provides an intelligent identification system for the navigation environment during nighttime ship navigation, comprising:
an image enhancement module, used to acquire single-channel infrared images of ships navigating at night, normalize and segment them into blocks, and enhance them with the brightness-noise joint adaptive and structure-constrained wavelet enhancement method for night navigation scenarios to obtain enhanced infrared images;
a feature extraction and fusion module, comprising a backbone network and a horizon-constrained scale decoupling feature fusion module, where the backbone network extracts multi-scale features from the enhanced infrared image, and the horizon-constrained scale decoupling feature fusion module applies position weighting to the multi-scale features based on horizon response information and performs scale-decoupled fusion, outputting small target fusion features and large target fusion features;
a detection module, comprising a point-box collaborative detection head, which receives the small target fusion features and large target fusion features, outputs candidate results through the small target point detection branch and the large target box detection branch, and fuses them to obtain the final target detection result; and
an environment discrimination module, used to output the navigation environment identification result based on the final target detection result.
[0014] Compared with existing technologies, the intelligent identification system and method for navigation environment during nighttime navigation of ships of the present invention have the following advantages: (1) Since the present invention adopts a brightness-noise joint adaptive and structure-constrained wavelet enhancement method for single-channel infrared images at night, it enhances the dark areas while constraining noise amplification and high-brightness clutter. Therefore, it can significantly improve the distinguishability of distant weak targets and low-contrast targets, and reduce the interference of sea surface clutter, shore light reflection and local bright spots on subsequent detection.
[0015] (2) Since the present invention introduces a horizon constraint mechanism in the feature fusion stage, it can selectively enhance features in the key regions of night navigation scenes, which are concentrated near the sea-sky boundary, thereby improving the response to distant weak targets near the horizon.
[0016] (3) Since the present invention provides a scale decoupling feature fusion module in which high-resolution features serve mainly distant weak targets and medium- and low-resolution features serve mainly nearby large targets, it overcomes the difficulty that traditional fixed multi-scale fusion cannot handle targets of different scales well when small distant targets and large nearby targets coexist.
[0017] (4) Since the present invention uses a point-box collaborative detection head, which applies point detection to distant point-like small targets and box detection to nearby large targets, it avoids the unstable localization of extremely small targets caused by unified bounding box regression while retaining accurate localization of nearby large targets.
[0018] (5) Since the present invention not only outputs target detection results but also performs semantic discrimination of the navigation environment, it can provide higher-level environmental cognition for night navigation decision support and has greater engineering application value.
Attached Figure Description
[0019] The accompanying drawings, which form part of this invention, are provided for a further understanding of the invention. The illustrative embodiments of the invention and their descriptions serve to explain the invention and do not unduly limit it. In the drawings:
Figure 1 is a schematic diagram of the overall system structure according to an embodiment of the present invention;
Figure 2 is a schematic diagram of the image enhancement process according to an embodiment of the present invention;
Figure 3 is a schematic diagram of the horizon response extraction module and the directional context enhancement module according to an embodiment of the present invention;
Figure 4 is a schematic diagram of the scale decoupling subunit according to an embodiment of the present invention;
Figure 5 is a schematic diagram of the point-box collaborative detection head according to an embodiment of the present invention;
Figure 6 is a simulation diagram of the enhanced infrared image according to an embodiment of the present invention;
Figure 7 is a simulation diagram of the navigation environment identification results in an embodiment of the present invention.
Detailed Implementation
[0020] It should be noted that, unless otherwise specified, the embodiments and features described in the present invention can be combined with each other.
[0021] The present invention will now be described in detail with reference to the accompanying drawings and embodiments.
[0022] As shown in Figures 1 to 5, this invention provides a method for intelligent identification of the navigation environment during nighttime ship navigation, comprising the following steps:
Step S1: Acquire a single-channel infrared image of a ship's nighttime navigation scene, and normalize and segment it into blocks to obtain image samples.
Step S2: Perform image enhancement on the image samples to obtain enhanced infrared images, using a brightness-noise joint adaptive and structure-constrained wavelet enhancement method for night navigation scenarios.
Step S3: Input the enhanced infrared image into the recognition network, which includes a backbone network, a horizon-constrained scale decoupling feature fusion module, and a point-box collaborative detection head. The backbone network extracts multi-scale features from the enhanced infrared image. The horizon-constrained scale decoupling feature fusion module applies position weighting to the multi-scale features based on horizon response information, and divides the weighted features into a small target feature branch and a large target feature branch for scale-decoupled fusion, yielding small target fusion features and large target fusion features respectively.
Step S4: The point-box collaborative detection head receives the small target fusion features and large target fusion features, outputs candidate results through the small target point detection branch and the large target box detection branch respectively, and filters and fuses the candidate results through the result fusion unit to obtain the final target detection result.
Step S5: Based on the final target detection result, the navigation environment discrimination unit outputs the navigation environment identification result.
[0023] Specifically, in step S1, normalizing and segmenting the single-channel infrared image includes: linearly transforming the image with min-max normalization, where, preferably, abnormal pixels outside the high and low percentiles are first truncated and the values are then mapped to between 0 and 1; and dividing the normalized image into blocks with a sliding window, with the sample block size set to 640×640 and an overlapping area retained between adjacent sample blocks.
[0024] Specifically, step S2 employs a brightness-noise joint adaptive and structure-constrained wavelet enhancement method for night navigation scenarios, which includes the following sub-steps:
S21. Perform robust dynamic range stretching on the normalized image;
S22. Jointly estimate the brightness and noise of the image, and adaptively determine the denoising intensity, brightness correction intensity, local contrast enhancement intensity, and wavelet detail enhancement intensity;
S23. Perform adaptive noise reduction, brightness correction, and local contrast enhancement on the image;
S24. Construct a structure confidence map from the gradient information and local variance information of the image, and construct a highlight suppression map from the distribution of highlight areas in the image;
S25. Perform multi-scale wavelet decomposition on the image after local contrast enhancement, enhance the wavelet detail subbands under the constraint of the structure confidence map and the highlight suppression map, and then perform inverse wavelet reconstruction to obtain the enhanced infrared image.
[0025] Specifically, in step S3, the horizon-constrained scale decoupling feature fusion module processes the multi-scale features as follows: the horizontal strip response region in the image is extracted from the second-layer features output by the backbone network to generate one-dimensional horizon response information, which is expanded along the image width to form a two-dimensional horizon response map; directional context enhancement, which includes horizontal strip feature extraction, local detail feature extraction, and large receptive field context feature extraction, is performed on the multi-scale features output by the backbone network; and the two-dimensional horizon response map is used to apply position weighting to the features at each scale after directional context enhancement.
[0026] Specifically, in step S3, scale decoupling fusion includes: the small target feature branch receives the second scale feature and the third scale feature, upsamples the third scale feature to the same resolution as the second scale feature, and then fuses them to obtain the small target fused feature; the small target feature branch adopts a detail-first adaptive competitive fusion method; the large target feature branch receives the fourth scale feature and the fifth scale feature, downsamples the fourth scale feature to the same resolution as the fifth scale feature, and then fuses them to obtain the large target fused feature; the large target feature branch adopts a semantic-first adaptive competitive fusion method.
[0027] Specifically, in step S4, the point-box collaborative detection head operates as follows: the small target point detection branch receives the small target fusion features, outputs center point response information and size compensation information, and converts them into small target candidate results; the large target box detection branch receives the large target fusion features and outputs the large target candidate results; the result fusion unit represents the small and large target candidate results as a unified candidate result set, performs scale screening according to candidate box size, resolves conflicts between candidate results of the same category whose overlap exceeds a preset threshold, and finally performs non-maximum suppression to obtain the final target detection result.
[0028] Specifically, in step S5, the navigation environment discrimination unit jointly assesses channel clearance, obstacle density, and encounter risk in the night navigation scenario based on the target categories, number of targets, spatial distribution of targets, and density of dangerous targets.
[0029] The specific implementation method is as follows: In a preferred embodiment of the present invention, in step S1, single-channel infrared images and their corresponding annotation information are read from the ship night navigation scene dataset, including three sets of data: port scene, near-shore scene and open waterway scene. Each set of data includes night infrared images and corresponding target annotations and navigation environment annotations.
[0030] The target labels include at least the categories of ships, buoys, bridge piers, shoreline obstacles, and floating objects; the navigation environment labels include at least the categories of unobstructed waterways, sparse obstacles, dense obstacles, encounter risks, and high-risk warnings.
[0031] In the dataset, the ratio of training set to test set is maintained at 3:1.
[0032] The input nighttime infrared images are normalized. Min-max normalization is used to linearly transform the original images, mapping the resulting values to between 0 and 1.
[0033] Since the brightness distribution of different scenes in nighttime infrared images varies greatly, and there may be a very small number of abnormal bright spots and abnormal dark spots in the images, it is preferable to first truncate the abnormal pixels outside the high and low percentiles, and then perform normalization processing so that the normalized image pixel values are stably distributed in the interval [0, 1].
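The percentile truncation and min-max mapping in [0032] and [0033] can be sketched in plain Python as follows. This is an illustrative sketch, not the patented implementation: the helper names `percentile` and `robust_minmax_normalize` and the 1st/99th percentile defaults are assumptions, since the source does not give the percentile values.

```python
def percentile(values, q):
    """Linearly interpolated percentile of a flat list, q in [0, 100]."""
    s = sorted(values)
    if not s:
        raise ValueError("percentile of empty input")
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def robust_minmax_normalize(image, low_q=1.0, high_q=99.0):
    """Truncate pixels outside [P_low, P_high], then min-max map them to [0, 1].

    `image` is a 2-D list of raw grey levels. The percentile defaults are
    illustrative assumptions, not values taken from the source.
    """
    flat = [p for row in image for p in row]
    lo, hi = percentile(flat, low_q), percentile(flat, high_q)
    span = hi - lo
    if span <= 0:  # constant image: nothing to stretch
        return [[0.0 for _ in row] for row in image]
    return [[(min(max(p, lo), hi) - lo) / span for p in row] for row in image]
```

Because the outlier (1000 in a frame whose other values lie in 0 to 40) is clipped before scaling, it no longer compresses the remaining grey levels into a narrow band near zero.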
[0034] After normalization, the image is divided into blocks using a sliding window method to obtain samples. Preferably, the sample block size is set to 640×640, and a certain overlap area is retained between adjacent sample blocks to avoid small targets near the horizon being cut off and lost.
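The sliding-window blocking in [0034] might be sketched as below, assuming a hypothetical 128-pixel overlap (the source only specifies "a certain overlap") and aligning the last window with the image border so that no strip near the edge is missed. Images smaller than 640 pixels on a side would additionally need padding, which is not shown.

```python
def tile_origins(length, tile=640, overlap=128):
    """Start offsets along one axis so that `tile`-sized windows cover `length`
    and neighbouring windows share at least `overlap` pixels."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] != length - tile:  # add a final window flush with the border
        starts.append(length - tile)
    return starts


def sliding_window_tiles(height, width, tile=640, overlap=128):
    """(top, left, bottom, right) of every sample block, in row-major order."""
    return [
        (y, x, min(y + tile, height), min(x + tile, width))
        for y in tile_origins(height, tile, overlap)
        for x in tile_origins(width, tile, overlap)
    ]
```

For a 1080×1920 frame this yields 2 × 4 = 8 blocks; the vertical pair overlaps by 200 pixels, which keeps a horizon band near the frame center inside at least one whole block.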
[0035] In a preferred embodiment of the present invention, in step S2, because distant targets in night navigation scenarios are small and have low contrast, and the sea surface background also contains dark noise, bright spots, and reflection interference, the single-channel infrared image is enhanced before being input into the recognition network. The enhancement process includes the following steps:
(2a) Perform robust dynamic range stretching on the normalized image to suppress the influence of a small number of abnormal bright and dark spots on the overall contrast distribution;
(2b) Jointly estimate the brightness and noise of the image, and adaptively determine the denoising intensity, brightness correction intensity, local contrast enhancement intensity, and wavelet detail enhancement intensity from the image's mean, standard deviation, median, bright percentile, and noise level;
(2c) Perform adaptive denoising and brightness correction on the image to improve the visibility of targets in dark areas;
(2d) Apply local contrast enhancement to increase the difference between the edges of ships, buoys, and obstacles and the background;
(2e) Construct a structure confidence map from the gradient information and local variance information of the image to characterize the regions more likely to belong to real target edges and valid structures; at the same time, construct a highlight suppression map from the distribution of highlight regions in the image to prevent over-amplification of shore lights, water surface reflections, and local bright spots in the subsequent enhancement;
(2f) Perform multi-scale wavelet decomposition on the image after local contrast enhancement, apply threshold suppression and detail enhancement to each detail sub-band under the constraint of the structure confidence map and highlight suppression map, and then perform inverse wavelet reconstruction to obtain the final enhanced image.
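The source does not give formulas for the two maps in (2e). One plausible construction, assumed here purely for illustration, takes the structure confidence as the geometric mean of the normalized gradient magnitude and the normalized 3×3 local variance, and the highlight suppression weight as a linear roll-off above a brightness threshold (the 0.9 threshold and 0.2 floor are hypothetical). Inputs are normalized images given as 2-D lists.

```python
def _at(img, y, x):
    """Read a pixel with border replication."""
    h, w = len(img), len(img[0])
    return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def _normalize(m):
    """Divide by the global maximum (all zeros stay zero)."""
    peak = max(max(row) for row in m)
    return [[v / peak if peak > 0 else 0.0 for v in row] for row in m]


def structure_confidence(img):
    """Geometric mean of normalised gradient magnitude and 3x3 local variance."""
    h, w = len(img), len(img[0])
    grad = [[0.0] * w for _ in range(h)]
    var = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = (_at(img, y, x + 1) - _at(img, y, x - 1)) / 2.0
            gy = (_at(img, y + 1, x) - _at(img, y - 1, x)) / 2.0
            grad[y][x] = (gx * gx + gy * gy) ** 0.5
            win = [_at(img, y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            mean = sum(win) / 9.0
            var[y][x] = sum((v - mean) ** 2 for v in win) / 9.0
    g, v = _normalize(grad), _normalize(var)
    return [[(g[y][x] * v[y][x]) ** 0.5 for x in range(w)] for y in range(h)]


def highlight_suppression(img, bright=0.9, floor=0.2):
    """1.0 up to `bright`, then a linear fall towards `floor` as intensity -> 1.0."""
    return [[1.0 if p <= bright else max(floor, 1.0 - (p - bright) / (1.0 - bright))
             for p in row] for row in img]
```

In (2f), maps of this kind would typically multiply the wavelet detail gain, raising it where structure confidence is high and lowering it inside highlight regions.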
[0036] The enhancement process used in this step comprises robust stretching, adaptive denoising, adaptive brightness correction, local contrast enhancement, and thresholded wavelet detail enhancement. It enhances distant weak targets while suppressing dark noise and bright clutter, providing a clearer input image for the subsequent recognition network.
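Step (2b) maps image statistics to enhancement strengths but does not disclose the mapping. The sketch below is one hypothetical choice: it estimates noise with the median absolute deviation of horizontal pixel differences (a standard robust estimator), picks a gamma that maps the median grey level to 0.5, and scales the contrast and detail gains inversely with the global standard deviation and the noise level. It uses the median, standard deviation, bright percentile, and noise level; every constant and key name is an assumption.

```python
import math
import statistics


def estimate_noise_sigma(img):
    """Robust noise level from horizontal first differences.

    For i.i.d. Gaussian noise a two-pixel difference has std sigma*sqrt(2), and
    MAD / 0.6745 estimates a Gaussian std, hence the two divisors.
    """
    diffs = [row[x + 1] - row[x] for row in img for x in range(len(row) - 1)]
    med = statistics.median(diffs)
    mad = statistics.median([abs(d - med) for d in diffs])
    return mad / 0.6745 / math.sqrt(2.0)


def _clamp(v, lo, hi):
    return min(max(v, lo), hi)


def adaptive_strengths(img, bright_q=0.99):
    """Map global statistics of a [0, 1] image to enhancement strengths.

    Every mapping and constant below is a hypothetical choice for illustration.
    """
    flat = sorted(p for row in img for p in row)
    std = statistics.pstdev(flat)
    median = statistics.median(flat)
    bright = flat[int(bright_q * (len(flat) - 1))]
    sigma = estimate_noise_sigma(img)

    # Brightness correction: gamma that maps the median grey level to 0.5
    # (gamma < 1 brightens; images that are already bright are left unchanged).
    if median >= 0.5:
        gamma = 1.0
    elif median <= 1e-6:
        gamma = 0.4
    else:
        gamma = max(0.4, math.log(0.5) / math.log(median))

    return {
        "noise_sigma": sigma,
        "denoise": _clamp(sigma / 0.05, 0.0, 1.0),                    # stronger when noisier
        "gamma": gamma,
        "contrast_gain": _clamp(1.0 + (0.2 - std) * 10.0, 1.0, 3.0),  # stronger when flat
        "detail_gain": _clamp(2.0 - sigma / 0.02, 1.0, 2.0),          # weaker when noisier
        "highlight_threshold": bright,                                # feeds highlight suppression
    }
```

Coupling the detail gain to the noise estimate is what makes the estimation "joint": a noisy frame gets stronger denoising and a smaller wavelet gain, so the enhancement does not amplify the noise it has just removed.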
[0037] In a preferred embodiment of the present invention, in step S3, the enhanced infrared image obtained in step S2 is input into the recognition network to build a target detection and navigation environment recognition model for night navigation scenarios.
[0038] The recognition network first includes a backbone network for extracting multi-scale features from the input image. The backbone network preferably uses the existing YOLO backbone structure and outputs 5 layers of feature maps.
[0039] Since the second output layer of the 5-layer feature maps retains high spatial resolution while being more stable than the shallowest-layer features, this invention preferably uses the second output layer features as input to the horizon response extraction module. Specifically, the horizontal strip response regions in the image are first extracted from the second output layer features to generate one-dimensional horizon response information, which is then expanded along the image width to form a two-dimensional horizon response map characterizing the area near the sea-sky boundary that is more likely to contain distant weak targets. The two-dimensional horizon response map is then resized to match the spatial resolution of the feature map at each scale, so that positional weighting can be applied at every scale. On this basis, channel unification and directional context enhancement are performed on the multi-scale features output by the backbone network. Directional context enhancement includes horizontal strip feature extraction, local detail feature extraction, and large receptive field context feature extraction: horizontal strip features enhance horizontally extended structures such as the horizon, coastline, and channel boundaries; local detail features preserve the edges and textures of small targets; and large receptive field context features strengthen overall semantic information in complex backgrounds. Finally, the horizon response map is used to apply positional weighting to the features at each scale, enhancing features related to small targets near the horizon while suppressing invalid background responses far from the key area.
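The horizon response extraction in [0039] is a learned module in the invention. As a hand-crafted stand-in, assumed for illustration only, the sketch below scores each row of a single-channel second-layer feature map by its mean absolute activation, min-max normalizes the scores into a 1-D horizon response, resizes it to each scale with max pooling over rows (so a narrow horizon band is not skipped when downsampling), broadcasts it along the width, and applies residual weighting F' = F·(1 + αR). The function names and α are hypothetical.

```python
def horizon_response_1d(feature):
    """Mean absolute activation per row, min-max normalised to [0, 1]."""
    rows = [sum(abs(v) for v in row) / len(row) for row in feature]
    lo, hi = min(rows), max(rows)
    if hi - lo < 1e-12:
        return [0.0] * len(rows)
    return [(r - lo) / (hi - lo) for r in rows]


def resize_rows(resp, out_h):
    """Resample a 1-D row response to out_h rows, taking the max over each source span."""
    n = len(resp)
    out = []
    for y in range(out_h):
        start = (y * n) // out_h
        end = max(start + 1, ((y + 1) * n) // out_h)
        out.append(max(resp[start:end]))
    return out


def horizon_map(resp, out_h, out_w):
    """Broadcast the resized 1-D response along the width into a 2-D map."""
    return [[r] * out_w for r in resize_rows(resp, out_h)]


def horizon_weight(feature, hmap, alpha=1.0):
    """Residual position weighting F' = F * (1 + alpha * R)."""
    return [[f * (1.0 + alpha * r) for f, r in zip(frow, rrow)]
            for frow, rrow in zip(feature, hmap)]
```

The residual form leaves off-horizon features unchanged in absolute terms and suppresses them only relative to the horizon band; a purely multiplicative map without the 1 would zero them entirely.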
[0040] Because distant, weak targets and close-range, large targets have different dependencies on features at different scales, this invention further sets up a scale decoupling subunit to process features at different scales separately. Specifically, the small target feature branch focuses on processing high-resolution features, upsampling the third-scale features to the same resolution as the second-scale features, and then fusing the two to obtain the small target fused feature. This branch prioritizes retaining high-resolution detail information in the second-scale features while using the third-scale features to supplement mid-level semantic information, used to describe distant, weak targets near the horizon. The large target feature branch focuses on processing mid-to-low-resolution features, downsampling the fourth-scale features to the same resolution as the fifth-scale features, and then fusing the two to obtain the large target fused feature. This branch prioritizes retaining the large receptive field semantic information in the fifth-scale features while using the fourth-scale features to supplement target contour and position information, used to describe close-range ships and large-scale obstacles.
[0041] During branch fusion, both the small target feature branch and the large target feature branch adopt adaptive competitive fusion. Specifically, the two feature paths to be fused are first unified, fusion weights are then generated automatically from the differences between the two input features, and the two features are fused with position-dependent weights. The small target feature branch adopts detail-first adaptive competitive fusion, while the large target feature branch adopts semantic-first adaptive competitive fusion, so that the network uses features differently for targets of different scales.
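The scale-decoupled branches in [0040] and the adaptive competitive fusion in [0041] can be sketched on single-channel maps as follows. The gate formula is an assumption: a per-position sigmoid of the magnitude difference between the two inputs plus a bias `prior` that tilts the competition toward the preferred path (the detail path P2 in the small target branch, the semantic path P5 in the large target branch). Nearest-neighbour upsampling and 2×2 average pooling are likewise illustrative choices; in a trained network the gate would come from learned layers.

```python
import math


def upsample2x(f):
    """Nearest-neighbour 2x upsampling of a 2-D list."""
    return [[v for v in row for _ in (0, 1)] for row in f for _ in (0, 1)]


def avgpool2x(f):
    """2x2 average pooling with stride 2 (an odd trailing row/column is dropped)."""
    return [[(f[y][x] + f[y][x + 1] + f[y + 1][x] + f[y + 1][x + 1]) / 4.0
             for x in range(0, len(f[0]) - 1, 2)]
            for y in range(0, len(f) - 1, 2)]


def competitive_fuse(primary, secondary, prior=1.0, sharpness=4.0):
    """Position-wise gate w = sigmoid(sharpness * (|p| - |s|) + prior),
    output w * p + (1 - w) * s. prior > 0 favours `primary` when the two
    inputs respond equally strongly."""
    out = []
    for prow, srow in zip(primary, secondary):
        row = []
        for p, s in zip(prow, srow):
            w = 1.0 / (1.0 + math.exp(-(sharpness * (abs(p) - abs(s)) + prior)))
            row.append(w * p + (1.0 - w) * s)
        out.append(row)
    return out


def small_target_branch(p2, p3):
    """Detail-first: P3 is upsampled to P2's grid and P2 is the favoured path."""
    return competitive_fuse(p2, upsample2x(p3), prior=1.0)


def large_target_branch(p4, p5):
    """Semantic-first: P4 is pooled to P5's grid and P5 is the favoured path."""
    return competitive_fuse(p5, avgpool2x(p4), prior=1.0)
```

Because the weight is computed per position, a strong mid-level response from P3 can still win locally in the small target branch even though P2 is favoured on average.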
[0042] Unlike the fixed multi-scale fusion used in ordinary YOLO, the module of this invention adaptively adjusts the contribution of features at each scale according to horizon position and target scale in night navigation scenarios, thereby improving the ability to simultaneously identify small distant targets and large nearby targets.
[0043] In a preferred embodiment of the present invention, in step S4, a point-box collaborative detection head is constructed on the output features of the horizon-constrained scale decoupling feature fusion module, so as to handle both distant point-like small targets and nearby large targets.
[0044] The point-box collaborative detection head includes a small target point detection branch, a large target box detection branch, and a result fusion unit.
[0045] The small target point detection branch primarily receives the small target fusion features and detects the center points of distant weak targets near the horizon. This branch first performs channel processing and local feature extraction on the input features, then outputs target center point response information and size compensation information, which a bounding box recovery unit converts into small target candidate results. The small target candidate results include at least the target category, target bounding box, and target confidence score.
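The conversion performed by the bounding box recovery unit in [0045] resembles CenterNet-style center-point decoding, which is assumed here. In this hypothetical sketch, the "size compensation information" is taken to be a per-cell (width, height) in input pixels plus a sub-cell center offset, the feature stride of 4 corresponds to an assumed second-scale map, and a 3×3 local-maximum test keeps one peak per target.

```python
def decode_points(heatmap, sizes, offsets, stride=4, score_thr=0.3, cls=0):
    """Convert a centre-point heatmap into box candidates.

    heatmap[y][x]: centre confidence in [0, 1] at feature cell (x, y)
    sizes[y][x]:   (w, h) in input pixels (assumed form of size compensation)
    offsets[y][x]: (dx, dy) sub-cell centre offset, in cells
    """
    h, w = len(heatmap), len(heatmap[0])
    results = []
    for y in range(h):
        for x in range(w):
            score = heatmap[y][x]
            if score < score_thr:
                continue
            neighbours = [heatmap[j][i]
                          for j in range(max(0, y - 1), min(h, y + 2))
                          for i in range(max(0, x - 1), min(w, x + 2))]
            if score < max(neighbours):  # keep local maxima only
                continue
            dx, dy = offsets[y][x]
            bw, bh = sizes[y][x]
            cx, cy = (x + dx) * stride, (y + dy) * stride
            results.append({
                "cls": cls,
                "box": (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2),
                "score": score,
                "source": "point",
            })
    return results
```

The key property for tiny targets is that the location comes from the heatmap peak rather than from regressing four box edges, so a target a few pixels wide still gets a stable center.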
[0046] The large target bounding box detection branch primarily receives large target fusion features and outputs large target candidate results using conventional target detection methods. The large target candidate results include at least the target category, target location bounding box, and target confidence score, and are mainly used to identify large-scale targets such as nearby ships, dock facilities, and bridge piers.
[0047] The result fusion unit is used to uniformly filter and fuse the output results of the point detection branch and the bounding box detection branch. Specifically, firstly, the candidate results for small targets and large targets are uniformly represented as a set of candidate results including target category, target location bounding box, target confidence score, and source information; secondly, the candidate results are scale-filtered according to the size of the candidate bounding box, with point detection branch results being prioritized for smaller targets and bounding box detection branch results being prioritized for larger targets, while targets in the transition scale range are retained for subsequent conflict resolution steps. Then, conflict resolution is performed on candidate results of the same category with an overlap higher than a preset threshold, where point detection branch results are prioritized for obviously small targets and bounding box detection branch results are prioritized for obviously large targets, and candidate results in the transition scale range are weighted and fused according to confidence score; finally, non-maximum suppression is performed on the fused candidate results to obtain the final detection result.
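The four stages of the result fusion unit in [0047] can be sketched as small functions. The transition scale range of 16 to 48 pixels, the IoU threshold of 0.5, and the dictionary layout of a candidate are hypothetical values chosen for illustration; the source only states that such thresholds exist.

```python
import math

SMALL_MAX, LARGE_MIN = 16.0, 48.0  # hypothetical transition range, in pixels


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def size_of(c):
    x1, y1, x2, y2 = c["box"]
    return math.sqrt(max(0.0, x2 - x1) * max(0.0, y2 - y1))


def scale_filter(cands):
    """Drop point results that are clearly large and box results that are clearly small."""
    return [c for c in cands
            if not (c["source"] == "point" and size_of(c) > LARGE_MIN)
            and not (c["source"] == "box" and size_of(c) < SMALL_MAX)]


def resolve_conflicts(cands, iou_thr=0.5):
    """Pair same-class point/box candidates with IoU > iou_thr and keep one result."""
    points = [c for c in cands if c["source"] == "point"]
    boxes = [c for c in cands if c["source"] == "box"]
    used, out = set(), []
    for p in points:
        match = next((i for i, b in enumerate(boxes)
                      if i not in used and b["cls"] == p["cls"]
                      and iou(p["box"], b["box"]) > iou_thr), None)
        if match is None:
            out.append(p)
            continue
        used.add(match)
        b = boxes[match]
        s = (size_of(p) + size_of(b)) / 2.0
        if s < SMALL_MAX:
            out.append(p)  # clearly small: trust the point branch
        elif s > LARGE_MIN:
            out.append(b)  # clearly large: trust the box branch
        else:              # transition range: confidence-weighted fusion
            sp, sb = p["score"], b["score"]
            wp, wb = (sp, sb) if sp + sb > 0 else (1.0, 1.0)
            box = tuple((wp * u + wb * v) / (wp + wb) for u, v in zip(p["box"], b["box"]))
            out.append({"cls": p["cls"], "box": box,
                        "score": max(sp, sb), "source": "fused"})
    out += [b for i, b in enumerate(boxes) if i not in used]
    return out


def nms(cands, iou_thr=0.5):
    """Greedy per-class non-maximum suppression."""
    kept = []
    for c in sorted(cands, key=lambda c: c["score"], reverse=True):
        if all(k["cls"] != c["cls"] or iou(k["box"], c["box"]) <= iou_thr for k in kept):
            kept.append(c)
    return kept


def fuse_results(point_cands, box_cands):
    return nms(resolve_conflicts(scale_filter(point_cands + box_cands)))
```

Running conflict resolution before NMS matters: plain NMS would simply keep whichever branch scored higher, whereas the scale-aware rule lets the point branch win for tiny targets even when its score is lower.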
[0048] The fused final detection results are further input into the navigation environment discrimination unit. This unit outputs the navigation environment identification result based on the target category, target quantity, target spatial distribution, and hazardous target density.
[0049] Point-box collaborative detection avoids the unstable localization that ordinary bounding box regression gives for extremely small targets, while retaining accurate localization of large targets at close range.
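The patent does not specify the discrimination rule. As a purely hypothetical illustration, a rule-based unit could map the final detections to the three-level "safe / warning / danger" status mentioned among the extended implementations. The following are all assumptions:

* the hazard class set;
* the per-megapixel density normalization;
* the "lower half of the frame means near" heuristic;
* the weights and thresholds.

```python
HAZARD_CLASSES = {"ship", "bridge_pier", "floating_obstacle"}  # hypothetical label set

def discriminate(dets, img_w, img_h, density_w=0.2, near_w=1.0, warn=1.0, danger=3.0):
    """dets: list of (class_name, (x1, y1, x2, y2)) final detections in pixels.
    Returns 'safe', 'warning' or 'danger'. All weights and thresholds are hypothetical."""
    hazards = [d for d in dets if d[0] in HAZARD_CLASSES]      # category filter
    density = len(hazards) / (img_w * img_h / 1e6)             # hazardous targets per megapixel
    # spatial-distribution heuristic: for a forward-looking camera, a box whose bottom
    # edge lies in the lower half of the frame is assumed to be close to the ship
    near = sum(1 for _, box in hazards if box[3] > img_h / 2)
    score = density_w * density + near_w * near
    if score >= danger:
        return "danger"
    return "warning" if score >= warn else "safe"
```

For example, a single distant ship near the top of a 640×640 block scores about 0.49 and stays "safe". Three ships in the lower half push the score past the danger threshold.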
[0050] In a preferred embodiment of the present invention, the image enhancement module, the horizon-constrained scale decoupling feature fusion module, and the point-box collaborative detection head used in steps S2, S3, and S4 are configured as follows:
- The image enhancement module uses a two-level wavelet decomposition.
- Robust dynamic range stretching uses high and low percentile cutoffs.
- Local contrast enhancement uses an adaptive block-wise enhancement method.
- Structure-constrained wavelet enhancement applies threshold suppression and gain enhancement to each detail subband.
- Highlight clutter suppression applies suppression constraints to the highlight areas in the image.
[0051] The backbone network outputs feature maps at 5 levels.

For horizon response extraction, the feature map of the second output level is used. The extracted one-dimensional horizon response information is expanded into a two-dimensional horizon response map, which is then resized to match the feature maps at each scale.

The directional context enhancement module includes three units:
- a horizontal strip feature extraction unit, which uses a 1×7 or 1×11 horizontal convolution;
- a local detail feature extraction unit, which uses a 3×3 convolution;
- a large receptive field context feature extraction unit, which uses a dilated convolution with a dilation rate of 2 or 3.

In the scale decoupling and fusion module, the small target feature branch uses the second-scale and third-scale features, with the third-scale features upsampled to the second scale before fusion. The large target feature branch uses the fourth-scale and fifth-scale features, with the fourth-scale features downsampled to the fifth scale before fusion.

The two scale branches are fused by adaptive competitive fusion: detail-first fusion in the small target branch and semantic-first fusion in the large target branch.
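The percentile cutoff used for robust stretching could look roughly like this numpy sketch. It is combined here with the min-max normalization and the 640×640 overlapping sliding-window blocking recited for step S1 in claim 2. The patent gives none of the following, so they are assumptions:

* the 1st and 99th percentiles;
* the 128-pixel overlap;
* aligning the last tile flush with the image edge;
* zero-padding images smaller than a tile;
* the function names.

```python
import numpy as np

def normalize(img, lo_pct=1.0, hi_pct=99.0):
    """Truncate pixels outside the [lo, hi] percentiles, then min-max map to [0, 1]."""
    lo, hi = np.percentile(img, [lo_pct, hi_pct])
    clipped = np.clip(img.astype(np.float64), lo, hi)
    span = hi - lo
    return (clipped - lo) / span if span > 0 else np.zeros_like(clipped)

def tile_starts(length, tile, stride):
    """Start offsets covering [0, length), with the last tile flush to the edge."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] != length - tile:
        starts.append(length - tile)
    return starts

def sliding_blocks(img, tile=640, overlap=128):
    """Split a normalized image into overlapping tile x tile blocks with their offsets."""
    stride = tile - overlap
    blocks = []
    for y in tile_starts(img.shape[0], tile, stride):
        for x in tile_starts(img.shape[1], tile, stride):
            block = img[y:y + tile, x:x + tile]
            # zero-pad blocks taken from images smaller than the tile
            pad = ((0, tile - block.shape[0]), (0, tile - block.shape[1]))
            blocks.append(((y, x), np.pad(block, pad)))
    return blocks
```

A 1080×1920 frame yields 2 × 4 = 8 blocks under these settings. The recorded offsets allow detections to be mapped back to full-frame coordinates.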
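The text does not state the kernel size of the dilated branch; assuming a 3×3 kernel, the window a single dilated convolution spans follows the standard relation below. Dilation rates 2 and 3 therefore cover 5×5 and 7×7 windows with the same nine weights. The 1×7 and 1×11 horizontal kernels instead cover 7 or 11 columns of a single row, which suits elongated structures such as the horizon.

```latex
k_{\mathrm{eff}} = k + (k - 1)(d - 1), \qquad
k = 3:\quad d = 2 \;\Rightarrow\; k_{\mathrm{eff}} = 5, \qquad d = 3 \;\Rightarrow\; k_{\mathrm{eff}} = 7
```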
[0052] In the point-box collaborative detection head:
- The small target point detection branch receives the small target fusion features and outputs center point response information and size compensation information.
- The large target bounding box detection branch receives the large target fusion features and outputs the target category, target location bounding box, and target confidence score.
- Small and large targets are distinguished by thresholds on target width and height. Preferably, targets whose width and height are both no more than 32 pixels are treated as small targets.
- The result fusion unit combines candidate result merging, scale screening, conflict resolution, and non-maximum suppression.
[0053] In a preferred embodiment of the present invention, the enhanced image sample blocks obtained in step S2 are used as the input of the recognition network. The target categories, target locations, and navigation environment categories in the training dataset are used as supervision. The error between the predictions and the ground-truth annotations is computed and backpropagated to optimize the parameters of the recognition network, yielding the trained recognition model.
[0054] For distant, small targets, the training process converts them into center point supervision information and size supervision information required for the point detection branch; for close-range, large targets, the training is performed using conventional bounding box supervision.
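A CenterNet-style way to build the center point and size supervision for small targets is sketched below. The patent only states that small targets are converted into center point and size supervision. The following are illustrative assumptions:

* the Gaussian heatmap;
* the fixed σ;
* the stride of 4 for the second-scale feature map;
* the function name `point_targets`.

```python
import numpy as np

def point_targets(boxes, classes, num_classes, feat_h, feat_w, stride=4, sigma=1.0):
    """boxes: small-target boxes (x1, y1, x2, y2) in input pixels; classes: int labels.
    Returns a per-class center heatmap, a (w, h) size map, and a mask of supervised cells."""
    heat = np.zeros((num_classes, feat_h, feat_w))
    size = np.zeros((2, feat_h, feat_w))
    mask = np.zeros((feat_h, feat_w), dtype=bool)
    ys, xs = np.mgrid[0:feat_h, 0:feat_w]
    for (x1, y1, x2, y2), c in zip(boxes, classes):
        # box center projected onto the feature grid
        ix, iy = int((x1 + x2) / 2 / stride), int((y1 + y2) / 2 / stride)
        if not (0 <= ix < feat_w and 0 <= iy < feat_h):
            continue
        g = np.exp(-((xs - ix) ** 2 + (ys - iy) ** 2) / (2 * sigma ** 2))
        heat[c] = np.maximum(heat[c], g)     # overlapping targets keep the stronger peak
        size[:, iy, ix] = (x2 - x1, y2 - y1)  # width/height supervised only at the peak
        mask[iy, ix] = True
    return heat, size, mask
```

Large targets would bypass this function and keep ordinary box supervision, as the paragraph above describes.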
[0055] In a preferred embodiment of the present invention, a test image sample block is input into the trained recognition model. An enhanced infrared image is first obtained through the image enhancement module. It is then processed by the backbone network, the horizon-constrained scale decoupling feature fusion module, and the point-box collaborative detection head. The model outputs the target categories, target locations, and the corresponding navigation environment recognition result for the test image.
[0056] In this way, distant, weak targets near the horizon are identified by the point detection branch, and large targets at close range are identified by the bounding box detection branch. This achieves stable identification of complex targets in nighttime navigation scenarios.
[0057] As shown in Figures 1 to 5, this invention further provides an intelligent recognition system for the navigation environment during nighttime ship navigation, comprising:
- an image enhancement module for acquiring single-channel infrared images of a ship's nighttime navigation scene, performing normalization and block processing, and enhancing the images with a brightness-noise joint adaptive and structure-constrained wavelet enhancement method for nighttime navigation scenes to obtain enhanced infrared images;
- a feature extraction and fusion module, including a backbone network and a horizon-constrained scale decoupling feature fusion module, where the backbone network extracts multi-scale features from the enhanced infrared image, and the horizon-constrained scale decoupling feature fusion module performs position weighting on the multi-scale features based on horizon response information and performs scale decoupling fusion to output small target fusion features and large target fusion features;
- a detection module, including a point-box collaborative detection head, which receives the small target fusion features and the large target fusion features, outputs candidate results through the small target point detection branch and the large target box detection branch, and fuses them to obtain the final target detection result; and
- an environment discrimination module for outputting the navigation environment recognition result based on the final target detection result.
[0058] Specifically, the image enhancement module is used to: perform robust dynamic range stretching of the image; perform joint estimation of brightness and noise, and adaptively determine enhancement parameters; perform adaptive denoising, brightness correction and local contrast enhancement; construct a structure confidence map and a highlight suppression map; perform multi-scale wavelet decomposition, combine the structure confidence map and the highlight suppression map to constrain and enhance wavelet details, and perform inverse wavelet reconstruction.
[0059] Specifically, the horizon constraint scale decoupling feature fusion module is used to: extract horizon response information using the features of the second output layer of the backbone network and expand it into a two-dimensional horizon response map; perform directional context enhancement on multi-scale features, including lateral strip feature extraction, local detail feature extraction and large receptive field context feature extraction; use the two-dimensional horizon response map to perform position weighting on the enhanced features; divide the weighted features into small target feature branches and large target feature branches, and fuse features of different scales respectively.
[0060] Specifically, the small target feature branch integrates the second-scale and third-scale features, while the large target feature branch integrates the fourth-scale and fifth-scale features.
[0061] Specifically, the point-box collaborative detection head includes:
- a small target point detection branch, which outputs center point response information and size compensation information;
- a large target bounding box detection branch, which outputs the target category, target location box, and target confidence; and
- a result fusion unit, which performs scale filtering, conflict resolution, and non-maximum suppression on the outputs of the small target point detection branch and the large target bounding box detection branch.
[0062] Example 1: Figure 1 shows the overall structure of the system.

The input single-channel nighttime infrared image first enters the image enhancement module for preprocessing. This improves the visibility of targets in dark areas and suppresses noise and high-brightness clutter. The enhanced image is then input into the backbone network for multi-scale feature extraction, producing feature maps at different resolutions.

Next, the system extracts horizon response information from the features of the second output layer of the backbone network. From it, the system generates a horizon response map that characterizes the key areas near the sea-sky boundary, which are more likely to contain distant, weak targets. At the same time, horizontal strip feature extraction, local detail feature extraction, and large receptive field context feature extraction are performed on the multi-scale features. This yields directional context enhancement features suited to nighttime navigation scenes.

The system then performs scale decoupling on the enhanced multi-scale features. High-resolution features go to the small target feature branch and medium-to-low-resolution features go to the large target feature branch, and each branch is fused adaptively to obtain the small target fusion features and the large target fusion features. The small target fusion features are input into the point detection branch, and the large target fusion features are input into the bounding box detection branch.

Finally, the outputs of both branches are input into the point-box result fusion unit. After non-maximum suppression, the final target detection result is output, from which the corresponding nighttime navigation environment recognition result is obtained.
[0063] Example 2: Figure 2 shows the image enhancement flowchart.

1. The input single-channel nighttime infrared image first undergoes robust dynamic range stretching.
2. Brightness and noise are then jointly estimated to generate an enhancement parameter set.
3. Under the control of this parameter set, the image undergoes adaptive denoising, brightness correction, and local contrast enhancement.
4. A structure confidence map and a highlight suppression map are generated from the enhanced image, and multi-scale wavelet decomposition is performed on it.
5. The wavelet subbands, the structure confidence map, the highlight suppression map, and the enhancement parameter set are input into the constrained wavelet detail enhancement module to enhance the wavelet details.
6. Finally, the enhanced image is output through inverse wavelet reconstruction.
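The constrained wavelet detail enhancement step might be sketched as follows. The patent states only that a two-level decomposition is used and that each detail subband undergoes threshold suppression and gain enhancement under the structure confidence and highlight suppression maps. The following are assumptions:

* the Haar wavelet;
* the hard threshold `thr` and the single `gain` factor;
* the gain rule: the gain is applied in proportion to structure confidence and cancelled inside highlight regions.

The brightness-noise joint estimation that would choose `thr` and `gain` is not shown, and the two maps are taken as given inputs.

```python
import numpy as np

def haar2(x):
    """One level of a 2-D Haar transform (averaging normalization); H and W must be even."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 4          # approximation subband
    d1 = (a - b + c - d) / 4          # difference across columns
    d2 = (a + b - c - d) / 4          # difference across rows
    d3 = (a - b - c + d) / 4          # diagonal difference
    return ll, (d1, d2, d3)

def ihaar2(ll, details):
    """Exact inverse of haar2."""
    d1, d2, d3 = details
    out = np.empty((ll.shape[0] * 2, ll.shape[1] * 2))
    out[0::2, 0::2] = ll + d1 + d2 + d3
    out[0::2, 1::2] = ll - d1 + d2 - d3
    out[1::2, 0::2] = ll + d1 - d2 - d3
    out[1::2, 1::2] = ll - d1 - d2 + d3
    return out

def block_mean(m, factor):
    """Average-pool a full-resolution map so it matches a subband's resolution."""
    h, w = m.shape
    return m.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def constrained_wavelet_enhance(img, structure, highlight, gain=1.5, thr=0.01, levels=2):
    """img, structure, highlight: HxW arrays in [0, 1]; H and W divisible by 2**levels.
    structure is ~1 on trusted edges; highlight is ~1 where bright clutter is suppressed."""
    ll, stack = img.astype(np.float64), []
    for lvl in range(1, levels + 1):
        ll, det = haar2(ll)
        s = block_mean(structure, 2 ** lvl)
        h = block_mean(highlight, 2 ** lvl)
        k = 1.0 + (gain - 1.0) * s * (1.0 - h)   # per-coefficient gain
        # threshold suppression (small coefficients treated as noise), then gain
        stack.append(tuple(np.where(np.abs(band) < thr, 0.0, band) * k for band in det))
    for det in reversed(stack):                  # inverse wavelet reconstruction
        ll = ihaar2(ll, det)
    return np.clip(ll, 0.0, 1.0)
```

With `gain=1` and `thr=0` the function reconstructs its input exactly. Setting the highlight map to 1 also disables the gain, which is the intended suppression behavior under these assumptions.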
[0064] Example 3: Figure 3 illustrates the horizon response extraction module and the directional context enhancement module.

The second-level features of the backbone network are first input into the horizon response extraction module. They pass through pooling, a 1×1 convolution, and ReLU activation to obtain horizon response features. These are reshaped into a two-dimensional horizon response map representing the horizontal band near the sea-sky boundary that is more likely to contain distant, weak targets.

Meanwhile, the features of any level extracted by the backbone network are input into the directional context enhancement module, which has three branches:
- The lateral feature extraction branch uses a 1×7 horizontal convolution to extract laterally extended structures such as the horizon, coastline, and channel boundaries.
- The local detail feature extraction branch uses a 3×3 convolution to extract target edges and local textures.
- The large receptive field context feature extraction branch uses a spatial convolution with a large receptive field to extract overall semantic information against complex backgrounds.

The three outputs are concatenated and fused by a 1×1 convolution to obtain the directional context enhancement features. Finally, the two-dimensional horizon response map is multiplied element-wise with the fused directional context enhancement features to strengthen target responses in the key areas near the horizon. The original fused features are preserved by residual addition, and the horizon-constrained directional context enhancement features are output.
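The horizon response extraction and position weighting described in Example 3 can be illustrated with a small numpy sketch. The learned 1×1 convolution is replaced by a fixed weight vector `w`. The patent leaves the following open, so they are assumptions:

* max-normalizing the response to [0, 1];
* nearest-row resampling of the response to other scales;
* the residual form `E + E * M`.

The three directional context branches are not shown.

```python
import numpy as np

def horizon_response(feat, w, b=0.0):
    """feat: (C, H, W) second-level features; w: (C,) weights of a 1x1 conv to one channel.
    Returns a 1-D row response of length H, scaled to [0, 1]."""
    pooled = feat.mean(axis=2)              # average-pool along the width -> (C, H)
    r = np.maximum(w @ pooled + b, 0.0)     # 1x1 conv + ReLU -> (H,)
    peak = r.max()
    return r / peak if peak > 0 else r      # hypothetical max-normalization

def expand_to_map(r, out_h, out_w):
    """Resample the row response to out_h rows (nearest) and repeat it along the width."""
    rows = np.floor(np.arange(out_h) * len(r) / out_h).astype(int)
    return np.repeat(r[rows][:, None], out_w, axis=1)

def horizon_weight(enhanced, r):
    """enhanced: (C, H, W) directional-context features at any scale."""
    m = expand_to_map(r, enhanced.shape[1], enhanced.shape[2])
    return enhanced + enhanced * m          # position weighting with residual addition
```

Because the map is built from a per-row response and broadcast across the width, it can only emphasize horizontal bands. This matches the band-like sea-sky region the text targets.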
[0065] Example 4: Figure 4 shows the scale decoupling subunit.

For small target feature fusion:
1. The third-level feature is upsampled to the resolution of the second-level feature.
2. The difference between the upsampled third-level feature and the second-level feature is computed and concatenated.
3. The concatenated features pass through a weight generation subunit consisting of a 1×1 convolution, a 3×3 convolution, and a 1×1 convolution in sequence. It outputs two fusion weights, which are normalized with the Softmax function.
4. The two normalized weights are multiplied by the second-level feature and the upsampled third-level feature, respectively, and the two weighted results are added to obtain the small target fusion feature.

This process preferentially retains the high-resolution detail of the second-level feature while using the third-level feature to supplement mid-level semantic information. This strengthens the representation of distant, weak targets near the horizon.

For large target feature fusion:
1. The fourth-level feature is downsampled to the resolution of the fifth-level feature.
2. The difference between the downsampled fourth-level feature and the fifth-level feature is computed and concatenated.
3. The concatenated features are input into a weight generation subunit with the same structure as in small target fusion, yielding two normalized fusion weights.
4. These weights are multiplied by the downsampled fourth-level feature and the fifth-level feature, respectively, and the weighted results are summed to obtain the large target fusion feature.

This process prioritizes preserving the high-level semantic information of the fifth-level feature while using the fourth-level feature to supplement target contour and position information. This strengthens the representation of nearby ships and large obstacles. In this way, the scale decoupling subunit performs feature alignment, adaptive competitive fusion, and feature enhancement for targets of different scales, improving recognition stability for both distant and nearby targets in nighttime navigation scenes.
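The adaptive competitive fusion of Example 4 can be sketched in numpy as below. This sketch makes several assumptions:

* The learned 1×1, 3×3, 1×1 weight generator is collapsed into a single hypothetical linear projection `proj`.
* The detail-first or semantic-first preference is modelled as a hypothetical bias `prior` on the first weight.
* Concatenating both features with their difference is one reading of the text.
* All scales share the channel count C.

```python
import numpy as np

def softmax2(logits):
    """Softmax over the two-source axis of a (2, H, W) array."""
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def upsample2(x):
    """Nearest-neighbour 2x upsampling, (C, H, W) -> (C, 2H, 2W)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def downsample2(x):
    """2x2 average pooling, (C, H, W) -> (C, H/2, W/2)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def competitive_fuse(primary, secondary, proj, prior):
    """primary, secondary: resolution-aligned (C, H, W) features.
    proj: (2, 3C) hypothetical stand-in for the learned weight generator.
    prior: (2,) hypothetical bias favouring the primary source."""
    x = np.concatenate([primary, secondary, primary - secondary], axis=0)  # (3C, H, W)
    logits = np.einsum("kc,chw->khw", proj, x) + prior[:, None, None]
    w = softmax2(logits)                      # two competing per-pixel weights summing to 1
    return w[0] * primary + w[1] * secondary, w
```

Usage under the same assumptions: the small target branch calls `competitive_fuse(p2, upsample2(p3), ...)` with a prior favoring `p2` (detail-first). The large target branch calls `competitive_fuse(p5, downsample2(p4), ...)` with a prior favoring `p5` (semantic-first).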
[0066] Example 5: Figure 5 shows the point-box collaborative detection head. It consists of a point detection branch, a box detection branch, and a point-box result fusion module.

The upper part is the point detection branch, whose input is the small target fusion feature. It first applies a 3×3 convolution for local feature extraction and then outputs a center heatmap and a size map through two separate 1×1 convolutions. After Sigmoid activation, the center heatmap represents the response intensity at the center of a distant, weak target; after ReLU activation, the size map represents the size of the corresponding target. The center heatmap is then pooled and thresholded to extract local maxima, which are combined with the size map for size recovery to produce small target candidate boxes.

The lower part is the box detection branch, whose input is the large target fusion feature. It first applies a 3×3 convolution for feature extraction and then outputs category information, target confidence, and box regression parameters through three separate 1×1 convolutions. The category information passes through Softmax to obtain the classification result, and the target confidence passes through Sigmoid to obtain the target existence probability. The box regression parameters are used to recover large target candidate boxes, forming the large target candidate results.

The right side shows the point-box result fusion module. It first applies dual-threshold scale filtering to the small and large target candidate boxes based on preset small and large target thresholds. It then resolves conflicts between overlapping candidates of the same category, and finally outputs the final detection result through non-maximum suppression.

With this structure, the present invention stably identifies distant, weak point-like targets by point detection while accurately locating large targets at close range by box detection. The point-box fusion strategy achieves collaborative detection of targets of different scales in nighttime navigation scenarios.
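Decoding the point branch outputs might look like the sketch below: 3×3 max pooling finds local maxima, a threshold removes weak peaks, and the size map recovers each box. The threshold, top-k limit, stride of 4, and function names are assumptions. No sub-cell center offset is predicted here; the optional offset output listed among the alternative implementations would refine the center.

```python
import numpy as np

def max_pool3(x):
    """3x3 max pooling with stride 1 and -inf padding; x: (H, W)."""
    p = np.pad(x, 1, constant_values=-np.inf)
    h, w = x.shape
    windows = [p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return np.max(windows, axis=0)

def decode_points(heat, size, stride=4, thr=0.3, top_k=100):
    """heat: (K, H, W) Sigmoid center heatmap; size: (2, H, W) ReLU width/height map in pixels.
    Returns (class, (x1, y1, x2, y2), score) small-target candidates in input-image pixels."""
    out = []
    for k in range(heat.shape[0]):
        # a cell is a peak if it equals its 3x3 neighbourhood maximum and passes the threshold
        peaks = (heat[k] == max_pool3(heat[k])) & (heat[k] >= thr)
        for y, x in zip(*np.nonzero(peaks)):
            w, h = size[0, y, x], size[1, y, x]
            cx, cy = x * stride, y * stride    # grid cell mapped back to the input image
            box = (float(cx - w / 2), float(cy - h / 2), float(cx + w / 2), float(cy + h / 2))
            out.append((k, box, float(heat[k, y, x])))
    out.sort(key=lambda t: t[2], reverse=True)
    return out[:top_k]
```

The max-pool comparison plays the role of a cheap non-maximum suppression on the heatmap, so only one candidate survives per local peak.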
[0067] Example 6: Figure 6 shows a simulated result of applying image enhancement to an image sample captured at 20:00 on April 6, 2026, yielding the enhanced infrared image. Figure 7 shows simulated navigation environment identification results:
- the top left panel shows the ships identified by the algorithm;
- the top right panel shows the identified navigation marks;
- the bottom left panel shows the identified lighthouses and ships;
- the bottom right panel shows the identified lighthouses and ships.
[0068] Alternative and Extended Implementation Methods: It should be understood that the above embodiments are merely preferred embodiments of the present invention and are not intended to limit the scope of protection of the present invention. Those skilled in the art can make equivalent substitutions or adaptive adjustments to the system architecture, algorithm modules, and parameter configurations without departing from the principles of the present invention, and these modifications also fall within the scope of protection of the present invention. Specifically, alternative or extended implementation methods of the present invention include, but are not limited to: (1) In terms of data acquisition, in addition to single-channel infrared images, the input images can also be far-infrared thermal images, mid-wave infrared images, near-infrared grayscale images or low-light black and white images. As long as they can characterize the target outline and thermal radiation differences or grayscale differences in nighttime scenes, they can be used to achieve the purpose of this invention.
[0069] (2) At the image enhancement level, robust dynamic range stretching can be replaced by piecewise linear stretching, Retinex brightness compensation, adaptive histogram transformation or local grayscale mapping; structural constraint wavelet enhancement can be replaced by multi-scale pyramid enhancement, frequency domain detail enhancement, guided filter detail enhancement or bilateral filter detail enhancement; high brightness clutter suppression can also be achieved by high brightness region threshold constraint, bright spot suppression filtering or local response limitation.
[0070] (3) At the horizon response extraction level, the horizon response information can be generated by the features of the second output layer of the backbone network, or by the features of the highest resolution output layer or by multiple high resolution features. The horizon response map can be constructed by one-dimensional row response expansion, or by strip region detection, horizontal response extraction or row direction weight mapping.
[0071] (4) At the scale decoupling and fusion level, the input scale of the small target feature branch and the large target feature branch is not limited to the combination of the second and third scales or the fourth and fifth scales. It can also be set to other high-resolution and low-resolution feature combinations according to the number of output layers and resolution of the backbone network. The resolution alignment method can be upsampling or downsampling, or it can be achieved by interpolation, pooling, stride convolution or deconvolution.
[0072] (5) Regarding the branch fusion method, adaptive competitive fusion can adopt either a weight generation method based on convolution or a weight generation method based on gating mechanism, attention mechanism, lightweight multilayer perceptron or dynamic convolution; the fusion of two features can be achieved by either weighted addition or weighted concatenation followed by compression fusion.
[0073] (6) At the point-box collaborative detection head level, the small target point detection branch can output center point response information and size compensation information, and can also output center point offset information. The large target box detection branch can use a decoupled detection head, a coupled detection head, an anchor-based detection head, or an anchor-free detection head. The result fusion unit can use candidate screening based on scale thresholds, or a method based on confidence ranking, category priority matching, weighted fusion, or soft suppression.
[0074] (7) At the navigation environment output level, the navigation environment identification can output discrete environment categories, continuous risk scores, or a three-level risk status of "safe / warning / danger". Its judgment criteria can be based on the number and categories of targets. They can further incorporate target distribution density, the proportion of dangerous targets, and a waterway navigability index.
[0075] (8) At the post-processing level, the final detection results can be further combined with target tracking, trajectory prediction, risk assessment, or prior constraints from the navigation channel to form a more complete intelligent auxiliary identification system for nighttime navigation.
[0076] The beneficial effects of this invention are: (1) Since the present invention adopts a brightness-noise joint adaptive and structure-constrained wavelet enhancement method for single-channel infrared images at night, it enhances the dark areas while constraining noise amplification and high-brightness clutter. Therefore, it can significantly improve the distinguishability of distant weak targets and low-contrast targets, and reduce the interference of sea surface clutter, shore light reflection and local bright spots on subsequent detection.
[0077] (2) Since the present invention introduces a horizon constraint mechanism in the feature fusion stage, it can focus on enhancing features in the key region of nighttime navigation scenes, namely the area near the sea-sky boundary. This improves the response to distant, weak targets near the horizon.
[0078] (3) The present invention provides a scale decoupling feature fusion module, in which high-resolution features mainly serve distant, weak targets and medium- and low-resolution features mainly serve close-range, large targets. It therefore overcomes the difficulty that traditional fixed multi-scale fusion has in recognizing targets of different scales when distant small targets and nearby large targets coexist.
[0079] (4) Since the present invention uses a point-box collaborative detection head, it uses point detection for distant point-like small targets and box detection for close-range large-scale targets. Therefore, it can avoid the problem of unstable localization of extremely small targets by unified bounding box regression, while maintaining the accurate localization capability of close-range large targets.
[0080] (5) Since the present invention not only outputs target detection results but also performs semantic discrimination of the navigation environment, it provides higher-level environmental cognition for nighttime navigation decision support and has greater engineering application value.
[0081] The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.
Claims
1. A method for intelligent identification of the navigation environment for night navigation of a ship, characterized by comprising the following steps:
S1. acquiring single-channel infrared images of ship nighttime navigation scenes, and normalizing and segmenting the single-channel infrared images to obtain image samples;
S2. performing image enhancement processing on the image samples to obtain enhanced infrared images, the image enhancement processing adopting a brightness-noise joint adaptive and structure-constrained wavelet enhancement method for nighttime navigation scenarios;
S3. inputting the enhanced infrared image into a recognition network, which includes a backbone network, a horizon-constrained scale decoupling feature fusion module, and a point-box collaborative detection head, wherein the backbone network extracts multi-scale features from the enhanced infrared image, and the horizon-constrained scale decoupling feature fusion module performs position weighting on the multi-scale features based on horizon response information and divides the weighted features into a small target feature branch and a large target feature branch for scale decoupling fusion, thereby obtaining small target fusion features and large target fusion features, respectively;
S4. the point-box collaborative detection head receiving the small target fusion features and the large target fusion features, outputting candidate results through the small target point detection branch and the large target box detection branch, respectively, and filtering and fusing the candidate results through the result fusion unit to obtain the final target detection result;
S5. outputting the navigation environment identification result through the navigation environment discrimination unit based on the final target detection result.
2. The intelligent identification method for navigation environment during nighttime navigation of ships according to claim 1, characterized in that: in step S1, normalizing and segmenting the single-channel infrared image comprises: linearly transforming the image by min-max normalization, in which abnormal pixels outside the high and low percentiles are first truncated and the image is then mapped to the range 0 to 1; and dividing the normalized image into blocks using a sliding window, with the sample block size set to 640×640 and overlapping areas preserved between adjacent sample blocks.
3. The intelligent identification method for navigation environment during nighttime navigation of ships according to claim 1, characterized in that: the brightness-noise joint adaptive and structure-constrained wavelet enhancement method for nighttime navigation scenarios in step S2 comprises:
S21. performing robust dynamic range stretching on the normalized image;
S22. jointly estimating brightness and noise in the image, and adaptively determining the denoising intensity, brightness correction intensity, local contrast enhancement intensity, and wavelet detail enhancement intensity;
S23. performing adaptive noise reduction, brightness correction, and local contrast enhancement on the image;
S24. constructing a structure confidence map from the gradient information and local variance information of the image, and constructing a highlight suppression map from the distribution of highlight areas in the image;
S25. performing multi-scale wavelet decomposition on the image after local contrast enhancement, constraining and enhancing the wavelet detail subbands using the structure confidence map and the highlight suppression map, and then performing inverse wavelet reconstruction to obtain the enhanced infrared image.
4. The intelligent identification method for navigation environment during nighttime navigation of ships according to claim 1, characterized in that: in step S3, the processing of the multi-scale features by the horizon-constrained scale decoupling feature fusion module comprises: using the second-layer features output by the backbone network to extract the horizontal band response region in the image, generating one-dimensional horizon response information, and expanding it along the width direction of the image to form a two-dimensional horizon response map; performing directional context enhancement on the multi-scale features output by the backbone network, the directional context enhancement including lateral strip feature extraction, local detail feature extraction, and large receptive field context feature extraction; and applying position weighting to the features at each scale after directional context enhancement using the two-dimensional horizon response map.
5. The intelligent identification method for navigation environment during nighttime navigation of ships according to claim 1, characterized in that: in step S3, scale decoupling fusion comprises: the small target feature branch receiving the second-scale feature and the third-scale feature, upsampling the third-scale feature to the resolution of the second-scale feature, and fusing them to obtain the small target fusion feature, the small target feature branch adopting a detail-first adaptive competitive fusion method; and the large target feature branch receiving the fourth-scale feature and the fifth-scale feature, downsampling the fourth-scale feature to the resolution of the fifth-scale feature, and fusing them to obtain the large target fusion feature, the large target feature branch adopting a semantic-first adaptive competitive fusion method.
6. The intelligent identification method for navigation environment during nighttime navigation of ships according to claim 1, characterized in that: in step S4, the processing of the point-box collaborative detection head comprises: the small target point detection branch receiving the small target fusion features, outputting center point response information and size compensation information, and converting them into small target candidate results; the large target bounding box detection branch receiving the large target fusion features and outputting large target candidate results; and the result fusion unit representing the small target candidate results and the large target candidate results as a unified candidate result set, performing scale filtering based on the candidate box size, resolving conflicts between candidate results of the same category whose overlap exceeds a preset threshold, and finally performing non-maximum suppression to obtain the final target detection result.
7. The intelligent identification method for navigation environment during nighttime navigation of ships according to claim 1, characterized in that: in step S5, the navigation environment discrimination unit comprehensively judges whether the channel is clear, the obstacle density, and the encounter risk in the nighttime navigation scenario based on the target category, target quantity, target spatial distribution, and density of dangerous targets.
8. An intelligent identification system for the navigation environment of ships at night, used to implement the intelligent identification method for the navigation environment of ships at night according to any one of claims 1-7, characterized by comprising:
an image enhancement module, used to acquire single-channel infrared images of ships navigating at night, perform normalization and block processing, and enhance the images with a brightness-noise joint adaptive and structure-constrained wavelet enhancement method for nighttime navigation scenarios to obtain enhanced infrared images;
a feature extraction and fusion module, including a backbone network and a horizon-constrained scale decoupling feature fusion module, wherein the backbone network is used to extract multi-scale features of the enhanced infrared images, and the horizon-constrained scale decoupling feature fusion module is used to perform position weighting on the multi-scale features based on horizon response information and to perform scale decoupling fusion, outputting small target fusion features and large target fusion features;
a detection module, including a point-box collaborative detection head, which is used to receive the small target fusion features and the large target fusion features, output candidate results through the small target point detection branch and the large target bounding box detection branch, and fuse them to obtain the final target detection result; and
an environment discrimination module, used to output the navigation environment identification result based on the final target detection result.