A ship height binocular measurement method for bridge ship collision risk early warning

By building a binocular vision measurement system and processing ship images with a YOLOv8 model that integrates a Transformer and deformable convolution (DC), the method addresses insufficient accuracy in ship size measurement and enables high-precision bridge collision warning, reducing the risk of bridge collisions.

CN120907441B · Active · Publication Date: 2026-06-19 · SOUTHEAST UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SOUTHEAST UNIV
Filing Date
2025-08-12
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing ship size measurement methods malfunction or lack accuracy in rainy or snowy weather, leading to an increased risk of bridge-to-ship collisions. Furthermore, there is insufficient research on binocular vision technology in the field of bridge-to-ship collision prevention.

Method used

A binocular vision measurement system is built, and a YOLOv8 model that integrates a Transformer and deformable convolution (DC) is used for ship image instance segmentation. Orientation-aware Gaussian filtering, structural residual adjustment, and a semantic attention mechanism are combined, edge detection is performed with a multi-scale pyramid structure, and image coordinates are converted into three-dimensional spatial coordinates in the world coordinate system for height measurement and early warning.

Benefits of technology

It achieves high-precision and highly adaptable ship height measurement, reduces the risk of bridge collisions, improves bridge navigation safety, and is suitable for real-time measurement and early warning in complex environments.

✦ Generated by Eureka AI based on patent content.

Abstract

This invention discloses a binocular measurement method for ship height in bridge collision risk early warning. Specifically, it involves: constructing a binocular vision measurement system oriented towards the ship navigation area; installing two camera devices on both banks of the waterway to form a binocular imaging structure with parallel optical axes; calibrating the system through displacement adjustment and distance measurement to establish a mapping relationship between the world coordinate system and the image pixel coordinate system; acquiring ship image information and extracting the key point position information of the ship target in the image; converting the image coordinates into spatial coordinates to obtain the ship's three-dimensional position and structural features; calculating the ship's height and other main dimensional parameters; and comparing the measured ship size data with the preset bridge clearance limits in the navigation area to issue early warnings for ships exceeding the height limit. This method can be widely applied in the field of bridge traffic safety, improving the bridge's proactive identification and response capabilities to ship collision risks, and providing stable and reliable technical support for bridge navigation safety.

Description

Technical Field

[0001] This invention belongs to the field of active bridge collision avoidance technology, and specifically relates to a binocular method for measuring ship height for bridge collision risk early warning.

Background Technology

[0002] With the rapid development of water transport, the density of ships in navigable areas has increased significantly, and the tonnage of ships has also shown a clear upward trend, which has led to a continuous increase in the risk of ship-bridge collisions. Incidents of bridges on navigable waterways being struck by ships occur frequently.

[0003] Current methods for measuring ship dimensions mainly include lidar, monocular vision, and binocular vision. However, lidar instruments can malfunction in rainy or snowy weather, so oversized vessels may go undetected and the risk of collision increases. The measurement accuracy of monocular vision is strongly affected by distance, and because ship images are taken at considerable distances from bridges, accuracy is further compromised. Binocular vision technology offers a new approach to these issues. However, research on binocular vision methods for bridge collision avoidance remains lacking, so ship dimension measurement methods based on binocular vision urgently need to be studied to enhance the active collision avoidance capabilities of bridges.

Summary of the Invention

[0004] Technical Problem: The purpose of this invention is to provide a binocular measurement method for ship height for bridge collision risk early warning, which does not rely on high-density labeled data and can be widely applied in the field of bridge navigation safety, providing reliable support for the bridge's ability to actively identify and respond to ship collision risks.

[0005] Technical Solution: To achieve the above objectives, the present invention provides a binocular measurement method for ship height for bridge collision risk early warning, comprising the following steps:

[0006] Step 1: Build a binocular vision measurement system, install and calibrate industrial cameras on both banks of the ship navigation area, obtain the intrinsic and extrinsic parameters of the industrial cameras, and establish the transformation relationship between the world coordinate system and the pixel coordinate system of the industrial cameras.

[0007] Step 2: Ship images are acquired using the industrial cameras. A YOLOv8 model that integrates a Transformer and deformable convolution (DC) is used for ship image instance segmentation. Edge consistency and filtering adaptability are improved by combining orientation-aware Gaussian filtering, structural residual adjustment, and a semantic attention mechanism. Window size prediction based on local texture complexity and a multi-scale pyramid structure are introduced to dynamically adjust the filtering window and suppress noise. On this basis, edge detection is carried out to extract target edge information, and a ship contour image with continuous structure and clear edges is finally output.

[0008] Step 3: Based on the calibration parameters of the binocular vision system and the ship's feature point information, the pixel coordinates (u, v) are converted to three-dimensional spatial coordinates (X, Y, Z) in the world coordinate system. The height, length, and width of the vessel are determined by calculating the three-dimensional spatial coordinates of the ship's key points in the world coordinate system. Finally, based on the maximum clearance height preset for the waterway, warnings are issued to vessels exceeding the height limit.

[0009] wherein,

[0010] In step 1, a binocular vision measurement system is set up, industrial cameras are installed in the ship navigation area, and the cameras' intrinsic and extrinsic parameters are calibrated. These parameters comprise the left camera focal length, the right camera focal length, the principal point coordinates (c_x, c_y), the rotation matrix R, and the translation vector T. The left and right camera focal lengths describe the optical characteristics of the camera lenses. The principal point coordinates (c_x, c_y) are the coordinates of the intersection of the optical axis and the image plane in the image coordinate system. The nine elements a, b, c, d, e, f, g, h, i of the rotation matrix R are the components of the direction vectors of the three coordinate axes X, Y, Z of the right camera coordinate system. The elements w_x, w_y, w_z of the translation vector T are the translations of the right camera relative to the left camera along the X, Y, and Z directions of the world coordinate system. The scale factor s normalizes the pixel coordinates. The calibration process is shown in the following formula:

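[0011] $s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} R & T \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$,

[0012] In the formula, $f_x$ is the focal length expressed in pixel units along the horizontal axis of the image, i.e., the u-axis, and $f_y$ is the focal length expressed in pixel units along the vertical axis of the image, i.e., the v-axis.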
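As an illustration of the calibration relationship in step 1, the following minimal Python sketch projects a world point to pixel coordinates. It assumes the standard pinhole camera model, in which the scale factor s equals the depth of the point in the camera frame. The function name `project_point` and the numeric values are hypothetical and are not taken from the patent.

```python
def project_point(world_xyz, fx, fy, cx, cy, rotation, translation):
    """Project a world point to pixel coordinates (u, v) with a pinhole model.

    Assumption: standard pinhole projection; rotation is a 3x3 nested list,
    translation a 3-element list, both mapping world to camera coordinates.
    """
    X, Y, Z = world_xyz
    # Camera-frame coordinates: R * [X, Y, Z]^T + T
    cam = [
        rotation[r][0] * X + rotation[r][1] * Y + rotation[r][2] * Z + translation[r]
        for r in range(3)
    ]
    s = cam[2]  # scale factor: depth along the optical axis
    if s <= 0:
        raise ValueError("point is not in front of the camera")
    # Dividing by s normalizes the homogeneous pixel coordinates
    u = (fx * cam[0] + cx * s) / s
    v = (fy * cam[1] + cy * s) / s
    return u, v, s


# Hypothetical example: camera at the world origin with no rotation
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
u, v, s = project_point((2.0, 1.0, 10.0), 1000.0, 1000.0, 1224.0, 1024.0,
                        IDENTITY, [0.0, 0.0, 0.0])
```

In practice the intrinsic and extrinsic parameters would come from a calibration procedure rather than being typed in by hand.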

[0013] In step 2, the ship images acquired by the industrial camera are processed by a model based on the YOLOv8 architecture. A Transformer module is embedded in the detection head to apply global context enhancement to the feature maps, improving the model's detection accuracy for dense and occluded targets. The multi-head attention mechanism in the Transformer module extracts, in parallel, the spatial dependency features of different semantic regions in the image. A deformable convolution module is introduced to enhance the model's ability to model target deformation and complex boundaries. This module introduces spatial offsets at the convolution sampling points of each output position, so that the convolutional region is dynamically adjusted to suit the feature extraction requirements of different structural forms. Each sampling point learns its position information independently, and all channels within a batch share the same set of offsets to ensure computational consistency and reduce redundancy. Here the convolution kernel size describes the size of the sampling region in the convolution operation.
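The idea of shifting each convolution sampling point by a learned offset can be sketched in a few lines of Python. This is only a single-channel, single-output-position illustration using bilinear interpolation, which is the usual way to read a feature map at fractional positions. The function names and the offset values are hypothetical, and the sketch is not the network layer used in the patent.

```python
import math


def bilinear(img, y, x):
    """Bilinearly sample a 2D list at fractional (y, x); outside pixels count as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy, wy in ((y0, 1 - (y - y0)), (y0 + 1, y - y0)):
        for xx, wx in ((x0, 1 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yy < h and 0 <= xx < w:
                value += wy * wx * img[yy][xx]
    return value


def deformable_response(img, weights, offsets, cy, cx):
    """One output value of a k x k deformable convolution centred at (cy, cx).

    weights: k x k kernel; offsets: k x k list of (dy, dx) shifts, which a
    real network would learn (here they are supplied by the caller).
    """
    k = len(weights)
    half = k // 2
    total = 0.0
    for i in range(k):
        for j in range(k):
            dy, dx = offsets[i][j]
            # Regular grid position plus the per-sampling-point offset
            total += weights[i][j] * bilinear(img, cy + i - half + dy, cx + j - half + dx)
    return total


# Hypothetical example: horizontal ramp image, 3 x 3 averaging kernel
ramp = [[float(x) for x in range(5)] for _ in range(5)]
mean_kernel = [[1.0 / 9.0] * 3 for _ in range(3)]
no_shift = [[(0.0, 0.0)] * 3 for _ in range(3)]
half_pixel_right = [[(0.0, 0.5)] * 3 for _ in range(3)]
regular = deformable_response(ramp, mean_kernel, no_shift, 2, 2)
shifted = deformable_response(ramp, mean_kernel, half_pixel_right, 2, 2)
```

With zero offsets the result equals an ordinary convolution; nonzero offsets move the sampling region, which is what lets the kernel follow irregular ship boundaries.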

[0014] In step 2, the edge detection employs an adaptive composite filtering method to remove noise. This method includes the following processing steps:

[0015] A local structural orientation field is introduced during the filtering process, and the covariance matrix of the Gaussian filter kernel is adjusted based on this orientation field. This aligns the principal axis of the Gaussian kernel with the structural orientation field, thereby improving the directional consistency of the filter in structural regions. The covariance matrix is defined as follows:

[0016] $\Sigma(x, y) = R_\theta \begin{bmatrix} \sigma_\parallel^2 & 0 \\ 0 & \sigma_\perp^2 \end{bmatrix} R_\theta^{T}$,

[0017] In the formula, $\Sigma(x, y)$ is the covariance matrix of the two-dimensional Gaussian kernel at pixel position (x, y) in the image; $\theta$ is the principal direction given by the structural orientation field at that position; $R_\theta$ is the two-dimensional rotation matrix that rotates the Gaussian kernel to the direction $\theta$; $\sigma_\parallel$ and $\sigma_\perp$ are the standard deviations parallel and perpendicular to that direction, respectively, controlling how far the filter kernel extends in each direction; and $R_\theta^{T}$ is the transpose of $R_\theta$;

[0018] Next, a structural residual adjustment branch is embedded. It minimizes the difference in edge response between the pixel value I(x, y) of the original image at position (x, y) and the pixel value of the predicted image at the same position, which enhances the fidelity of the image structure. The optimization objective function E is:

[0019] ,

[0020] In the formula, the summation runs over all coordinate positions (x, y) in the image and accumulates the error at each pixel; N is the total number of edge points involved in the calculation; ‖·‖ denotes the Euclidean norm; and E is the edge response fidelity error;
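One way to make the edge response fidelity error concrete is sketched below. The exact objective is not recoverable from this text, so the sketch assumes that the edge response is a central-difference image gradient and that E is the mean squared Euclidean distance between the responses of the original and predicted images over N edge points. Both choices, and the function names, are illustrative assumptions.

```python
def edge_response(img, y, x):
    """Central-difference gradient (gx, gy) at (y, x) with clamped borders.

    Assumption: a simple gradient stands in for the edge response operator.
    """
    h, w = len(img), len(img[0])
    gx = (img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]) / 2.0
    gy = (img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]) / 2.0
    return gx, gy


def edge_fidelity_error(original, predicted, edge_points):
    """Mean squared Euclidean distance between edge responses at the edge points."""
    total = 0.0
    for y, x in edge_points:
        ox, oy = edge_response(original, y, x)
        px, py = edge_response(predicted, y, x)
        total += (ox - px) ** 2 + (oy - py) ** 2
    return total / len(edge_points)  # N = number of edge points


# Hypothetical example: the predicted image doubles the horizontal slope
original = [[0.0, 1.0, 2.0] for _ in range(3)]
predicted = [[0.0, 2.0, 4.0] for _ in range(3)]
error_same = edge_fidelity_error(original, original, [(1, 1)])
error_diff = edge_fidelity_error(original, predicted, [(1, 1)])
```

A filter that blurs edges changes the gradient at edge points and therefore raises E, which is why minimizing it favors structure-preserving output.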

[0021] During the filtering process, a semantic distribution map of the image is extracted with a lightweight semantic segmentation network, and a semantic attention mask is constructed from it to dynamically adjust the filter strength. The adjustment process is defined as follows:

[0022] ,

[0023] In the formula, the left-hand side is the filter strength at position (x, y), which is obtained from the base filter strength and a function that maps semantic scores to intensity weights;

[0024] A structure confidence factor is introduced for each pixel. This factor is calculated by combining edge strength and semantic boundary information, and is used to weight and fuse the filtered output into the final filtered image, expressed as:

[0025] ,

[0026] In the formula, I(x, y) is the pixel value of the original image at position (x, y), and I_filtered(x, y) is the pixel value at position (x, y) in the filtered image.

[0027] In step 2, ship images are acquired using an industrial camera. During image processing, a window size prediction module based on local texture complexity is introduced to achieve adaptive adjustment of the filtering window. This specifically includes the following steps:

[0028] The local texture complexity is evaluated by calculating the local gray-level variance or the edge density of each pixel's neighborhood in the image, and the optimal size of the filtering window is dynamically predicted accordingly. The window size prediction function is defined as follows:

[0029] ,

[0030] In the formula, two weighting coefficients are applied, and the maximum values of the overall gray-level variance and of the edge density are used for normalization;
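The window size prediction can be illustrated with a small Python sketch. The exact prediction function is not recoverable from this text, so the form below is an assumption: a weighted sum of the normalized variance and normalized edge density gives a complexity score in [0, 1], and higher complexity selects a smaller odd-sized window. The weights `alpha` and `beta` and the window limits `w_min` and `w_max` are hypothetical.

```python
def predict_window_size(variance, edge_density, var_max, density_max,
                        alpha=0.5, beta=0.5, w_min=3, w_max=11):
    """Map local texture complexity to an odd filter window size.

    Assumed form: complexity = alpha * V / V_max + beta * D / D_max, then a
    linear map from complexity to window size (smaller window for busier texture).
    """
    complexity = alpha * variance / var_max + beta * edge_density / density_max
    complexity = min(max(complexity, 0.0), 1.0)  # clamp to [0, 1]
    size = int(round(w_max - (w_max - w_min) * complexity))
    if size % 2 == 0:
        size += 1  # keep the window centred on a pixel
    return size


# Hypothetical example values
flat_region = predict_window_size(0.0, 0.0, 100.0, 1.0)
busy_region = predict_window_size(100.0, 1.0, 100.0, 1.0)
mixed_region = predict_window_size(100.0, 0.0, 100.0, 1.0)
```

Smooth water regions then receive a wide window for strong denoising, while textured regions near the hull receive a narrow one that preserves detail.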

[0031] A multi-scale pyramid structure is designed to construct a pyramid image sequence with multiple resolution levels from the input image. The filtering window of the corresponding scale is applied at each scale level to perform feature extraction and noise suppression.

[0032] The calibration parameters of the binocular vision system and the ship feature point information mentioned in step 3 refer to the following: the measurement system employs a binocular stereo vision structure with collinear optical axes; the left camera is placed at the origin of the world coordinate system O-XYZ with no rotation applied, and the right camera is transformed into the world coordinate system through the rotation matrix R and the translation vector T, establishing a binocular stereo vision structure with collinear optical axes;

[0033] The left and right imaging planes each have their own optical center and focal length. A key point located in O-XYZ has corresponding projections on the two image planes, as shown in the following formula:

[0034] ,

[0035] In the formula, s is the scale factor used to normalize the pixel coordinates, and the remaining symbols denote the normalized coordinates of the spatial point in the left image, the normalized coordinates in the corresponding right image, and the depth value of the spatial point in the left camera coordinate system;

[0036] Because the two image planes are mirror-symmetric, the following relation holds:

[0037] ,

[0038] Therefore, the three-dimensional coordinates simplify to:

[0039] ,

[0040] In the formula, B is the distance between the two cameras, also known as the baseline.

[0041] Step 3 further includes measuring the object's position and depth along the Z-axis in the world coordinate system. In the calibrated binocular vision measurement model, the height of an object in the three-dimensional scene is calculated from the focal length, the baseline B, and the coordinates of the object's highest point in the two images. The imaging heights of this point on the left and right image planes are obtained separately, and in a binocular vision measurement system with collinear optical axes the following triangulation relationship is used:

[0042] ,

[0043] In the formula, the two distances are those between the ship and the left and right cameras, respectively.

[0044] At the same time, the required parameters, including the baseline B, are obtained through calibration, so the formula for calculating the ship's height simplifies to:

[0045] .

[0046] Beneficial effects: Compared with the prior art, the present invention has the following beneficial effects:

[0047] High measurement accuracy and strong adaptability: This invention adopts a binocular stereo vision measurement system with collinear optical axes. Combined with high-precision calibration parameters and spatial geometric models, it can accurately calculate the three-dimensional coordinate position of the ship in the world coordinate system, significantly improve the accuracy of height measurement, and avoid the errors caused by scale uncertainty in monocular vision measurement.

[0048] Strong edge detail preservation capability: The YOLOv8 model, which integrates Transformer and deformable convolution, is introduced to achieve high-precision instance segmentation. Combined with various edge enhancement strategies such as orientation-aware Gaussian filtering and structural residual adjustment, the edge consistency and image fidelity of structural regions are effectively improved, ensuring clear ship boundaries and continuous contours.

[0049] Equal emphasis on noise suppression and adaptive filtering: By introducing local texture complexity analysis and multi-scale pyramid structure, the size of the filtering window is dynamically and adaptively adjusted. Furthermore, by combining semantic attention mask and structural confidence factor, differentiated filtering strategies are applied to different regions, thereby balancing the image's denoising capability and edge preservation performance.

[0050] The bridge collision avoidance early warning system is highly practical: This invention does not rely on high-density labeled data, has a high degree of automation in the measurement process, and is suitable for real-time height measurement and early warning judgment of different types of ships in complex navigation environments. It can provide a reliable perception basis and early warning basis for bridge active collision avoidance systems, reduce the risk of bridge structure collisions, and improve navigation safety.

[0051] In summary, this invention is not only innovative in image processing algorithms, but also has good practicality and promotional value in engineering deployment, and can provide effective support for bridge structural safety monitoring and ship traffic management.

Description of the Drawings

[0052] Figure 1 is a flowchart of the present invention;

[0053] Figure 2 is a schematic diagram of the binocular vision system;

[0054] Figure 3 shows the instance segmentation results;

[0055] Figure 4 is a diagram of the multi-scale pixel pyramid;

[0056] Figure 5 is a schematic diagram of binocular imaging with collinear optical axes.

Detailed Implementation

[0057] The technical solution and beneficial effects of the present invention will be described in detail below with reference to the accompanying drawings.

[0058] As shown in Figure 1, the present invention provides a binocular method for ship height measurement for bridge collision risk early warning, comprising the following steps:

[0059] Step 1: Set up a binocular vision measurement system by installing and calibrating industrial cameras on both banks of the ship navigation area, as shown in Figure 2. This involves acquiring the intrinsic and extrinsic parameters of the industrial cameras and establishing the transformation relationship between the world coordinate system and the pixel coordinate system of the industrial cameras;

[0060] Step 2: Acquire ship images using the industrial cameras, and perform instance segmentation on the ship images using a YOLOv8 model that fuses a Transformer and deformable convolution (DC), as shown in Figure 3. Orientation-aware Gaussian filtering, structural residual adjustment, and a semantic attention mechanism are combined to improve edge consistency and filtering adaptability. Window size prediction based on local texture complexity and a multi-scale pyramid structure, as shown in Figure 4, are introduced to dynamically adjust the filtering window and suppress noise. Finally, a ship contour image with continuous structure and clear edges is output.

[0061] Step 3: Based on the calibration parameters of the binocular vision system and the ship's feature point information, the pixel coordinates (u, v) are converted to three-dimensional spatial coordinates (X, Y, Z) in the world coordinate system. The height, length, and width of the vessel are determined by calculating the three-dimensional spatial coordinates of the ship's key points in the world coordinate system. Finally, warnings are issued to vessels exceeding the height limit based on the maximum clearance height preset for the waterway.

[0062] Furthermore, the binocular vision measurement system in step 1 is constructed as shown in Figure 2. Industrial-grade CMOS cameras (model Basler acA2440-35um) are installed on both banks of the ship navigation area to form a binocular stereo vision system with a fixed baseline length of B = 1.460 km. The lens focal length is set to 24 mm, and the image resolution is 2448×2048 pixels. Camera intrinsic and extrinsic calibration is performed to obtain the principal point coordinates, the rotation matrix R, and the translation vector T, and to establish the conversion relationship between the world coordinate system (X, Y, Z) and the pixel coordinate system. The focal lengths of the left and right cameras describe the optical characteristics of the camera lenses. The principal point coordinates are the coordinates of the intersection of the optical axis and the image plane in the image coordinate system. The rotation matrix R represents the rotation relationship between the right camera coordinate system and the left camera coordinate system, and the translation vector T describes the translational displacement of the origin of the right camera coordinate system relative to the origin of the left camera coordinate system;

[0063] Furthermore, in step one, the camera calibration process is as follows:

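[0064] $s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} R & T \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$,

[0065] In the formula, s is the scale factor used for pixel coordinate normalization; $f_x$ is the focal length expressed in pixel units along the horizontal axis of the image, i.e., the u-axis; and $f_y$ is the focal length expressed in pixel units along the vertical axis of the image, i.e., the v-axis.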

[0066] As shown in Figure 3, in step 2 a Transformer module is embedded in the detection head of the YOLOv8 architecture to perform global context enhancement on the feature map, thereby improving the model's detection accuracy for dense and occluded targets. The multi-head attention mechanism in the Transformer module extracts, in parallel, the spatial dependency features of different semantic regions in the image. The multi-head attention dimension is 8 and the number of nested layers is 2, which strengthens spatial dependency modeling in densely occluded regions. A deformable convolution module is then introduced to enhance the model's ability to model target deformation and complex boundaries. This module introduces a spatial offset at the convolution sampling points of each output position, so that the convolutional region is dynamically adjusted to suit the feature extraction requirements of different structural forms. The backbone convolutional layer kernel is replaced, and the sampling offset dimension is ..., which improves the responsiveness to non-rigid boundaries. Each sampling point learns its position information independently, and all channels within a batch share the same set of offsets to ensure computational consistency and reduce redundancy.

[0067] Furthermore, the edge extraction process in step 2 employs an adaptive composite filtering method to remove noise. A local structural orientation field is introduced during the filtering process, and the covariance matrix of the Gaussian filter kernel is adjusted based on this orientation field. This aligns the principal axis of the Gaussian kernel with the structural orientation field, thereby improving the directional consistency of the filter in structural regions. The covariance matrix is defined as follows:

[0068] $\Sigma(x, y) = R_\theta \begin{bmatrix} \sigma_\parallel^2 & 0 \\ 0 & \sigma_\perp^2 \end{bmatrix} R_\theta^{T}$,

[0069] In the formula, $\Sigma(x, y)$ is the covariance matrix of the two-dimensional Gaussian kernel at pixel position (x, y) in the image; $\theta$ is the principal direction given by the structural orientation field at that position; $R_\theta$ is the two-dimensional rotation matrix that rotates the Gaussian kernel to the direction $\theta$; and $\sigma_\parallel$ and $\sigma_\perp$ are the standard deviations in the parallel and perpendicular directions, respectively, controlling how far the filter kernel extends in each direction. $\sigma_\parallel$ is set to 2.0 and $\sigma_\perp$ is set to 0.5.
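The orientation-aware covariance can be computed directly, as in the following Python sketch. It assumes the covariance is the usual rotated diagonal form, that is, a diagonal matrix of the two variances rotated by the local direction angle, and it uses the standard deviations 2.0 and 0.5 given in this embodiment. The function name `oriented_covariance` and the variable names are hypothetical.

```python
import math


def oriented_covariance(theta, sigma_par=2.0, sigma_perp=0.5):
    """2x2 covariance of a Gaussian kernel whose long axis points along theta.

    Assumed form: R(theta) * diag(sigma_par^2, sigma_perp^2) * R(theta)^T,
    with R(theta) the 2D rotation matrix [[cos, -sin], [sin, cos]].
    """
    c, s = math.cos(theta), math.sin(theta)
    a, b = sigma_par ** 2, sigma_perp ** 2
    return [
        [c * c * a + s * s * b, c * s * (a - b)],
        [c * s * (a - b), s * s * a + c * c * b],
    ]


# Edge running along the image x-axis, then along the y-axis
cov_horizontal = oriented_covariance(0.0)
cov_vertical = oriented_covariance(math.pi / 2)
```

With these values the kernel extends four times further along the edge than across it, so smoothing follows the ship contour instead of blurring it.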

[0070] Subsequently, a structural residual adjustment branch is embedded. It minimizes the difference in edge response between the pixel value I(x, y) of the original image at position (x, y) and the pixel value of the predicted image at the same position, which enhances the fidelity of the image structure. The objective function is:

[0071] ,

[0072] In the formula, the summation runs over all coordinate positions (x, y) in the image and accumulates the error at each pixel; N is the total number of edge points involved in the calculation; and ‖·‖ denotes the Euclidean norm.

[0073] Then, a semantic distribution map of the image is extracted with a lightweight semantic segmentation network (SegFormer-B0), and a semantic attention mask is constructed from it to dynamically adjust the filter strength. The adjustment process is defined as follows:

[0074] ,

[0075] In the formula, the left-hand side is the filter strength at position (x, y); the base filter strength is set to 1.2; and the mapping function converts semantic scores to intensity weights;

[0076] Finally, a structure confidence factor is introduced for each pixel. This factor is calculated by combining edge strength and semantic boundary information, and is used to weight and fuse the filtered output into the final filtered image, expressed as:

[0077] ,

[0078] In the formula, I(x, y) is the pixel value of the original image at position (x, y), and I_filtered(x, y) is the pixel value at position (x, y) in the filtered image.
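The semantic adjustment of the filter strength and the confidence-weighted fusion can be sketched together in Python. The base strength of 1.2 comes from this embodiment. Everything else is an assumption made for illustration: the linear mapping from semantic score to weight, the convention that a high structure confidence keeps more of the original pixel, and the function names.

```python
def filter_strength(semantic_score, base_strength=1.2):
    """Filter strength at one pixel: base strength times a semantic weight.

    Hypothetical mapping: ship-like pixels (score near 1) are filtered more
    gently than background pixels (score near 0).
    """
    weight = 1.0 - 0.5 * semantic_score
    return base_strength * weight


def fuse(original, filtered, confidence):
    """Per-pixel blend of original and filtered images.

    Assumed convention: confidence c in [0, 1] weights the original pixel and
    (1 - c) weights the filtered pixel, so strong structure is preserved.
    """
    return [
        [c * o + (1.0 - c) * f for o, f, c in zip(orow, frow, crow)]
        for orow, frow, crow in zip(original, filtered, confidence)
    ]


# Hypothetical example values
background_strength = filter_strength(0.0)
ship_strength = filter_strength(1.0)
fused = fuse([[10.0, 10.0]], [[20.0, 20.0]], [[0.25, 1.0]])
```

Under these assumptions a pixel with confidence 1.0 passes through unfiltered, while a low-confidence pixel takes most of its value from the denoised image.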

[0079] Furthermore, step 2 of the image processing also introduces a window size prediction module based on local texture complexity. This module assesses texture complexity by calculating the local gray-level variance and edge density of the pixel neighborhood, thereby dynamically predicting the filter window size. The window size prediction function is defined as follows:

[0080] ,

[0081] In the formula, the left-hand side is the optimal size of the filtering window; V is the local gray-level variance of each pixel's neighborhood in the image; the edge density is computed over the same neighborhood; the two weighting coefficients take the values ...; and the maximum values of the overall gray-level variance and of the edge density are used for normalization.

[0082] As shown in Figure 4, the multi-scale pyramid structure constructs a pyramid image sequence with multiple resolution levels from the input image. A 5-layer pyramid is constructed here, with each layer having a resolution equal to ... that of the original image. The filtering window of the corresponding scale is applied at each scale level to perform feature extraction and noise suppression.
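A 5-layer pyramid of this kind can be sketched in Python as follows. The per-layer scale factor is not recoverable from this text, so the sketch assumes the common choice of halving the resolution at each level by averaging 2 x 2 blocks. The function names are hypothetical.

```python
def downsample(img):
    """Halve the resolution by averaging non-overlapping 2 x 2 blocks."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [
        [
            (img[2 * y][2 * x] + img[2 * y][2 * x + 1]
             + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(w)
        ]
        for y in range(h)
    ]


def build_pyramid(img, levels=5):
    """Return [level 0 (input), level 1, ...], stopping early if the image gets too small."""
    pyramid = [img]
    for _ in range(levels - 1):
        if len(pyramid[-1]) < 2 or len(pyramid[-1][0]) < 2:
            break
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


# Hypothetical example: constant 16 x 16 image
image = [[3.0] * 16 for _ in range(16)]
pyramid = build_pyramid(image)
sizes = [(len(level), len(level[0])) for level in pyramid]
```

Each level can then be filtered with the window size predicted for that scale, so coarse levels suppress large-scale noise and fine levels keep edge detail.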

[0083] Furthermore, the binocular measurement system described in step 3 is a stereo structure with collinear optical axes, as shown in Figure 5. The left industrial camera is fixed at the origin of the world coordinate system O-XYZ without any rotation transformation applied, and the rotation matrix R and translation vector T between the right industrial camera and the left one are expressed as:

[0084] $R = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}, \quad T = \begin{bmatrix} w_x \\ w_y \\ w_z \end{bmatrix}$,

[0085] In the formula, the nine elements a, b, c, d, e, f, g, h, i are the components of the rotation matrix and represent the components of the direction vectors of the three coordinate axes X, Y, Z of the right camera coordinate system; $w_x$, $w_y$, $w_z$ are the translations of the right camera relative to the left camera along the X, Y, and Z directions of the world coordinate system, collectively called the translation vector T. The left and right imaging planes each have their own optical center and focal length. A spatial point located in O-XYZ has corresponding projections on the two image planes, as shown in the following formula:

[0086] ,

[0087] In the formula, s is the scale factor used to normalize the pixel coordinates, and the remaining symbols denote the normalized coordinates of the spatial point in the left image, the normalized coordinates in the corresponding right image, and the depth value of the spatial point in the left camera coordinate system;

[0088] The three-dimensional coordinates can be simplified to:

[0089] ,

[0090] In the formula, B is the distance between the two cameras, also known as the baseline.

[0091] Furthermore, step 3 also includes determining the object's position and depth along the Z-axis. The height is calculated from the imaging heights of the ship's highest point in the left and right images together with the binocular geometric calibration parameters, using the following triangulation relationship:

[0092] ,

[0093] In the formula, the two distances are those between the ship and the left and right cameras, respectively.

[0094] At the same time, the required parameters, including the baseline B, can be obtained through calibration, so the formula for calculating the ship's height can be simplified to:

[0095] .
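The height calculation and the early-warning comparison can be illustrated with a short derivation and a Python sketch. The formulas are not recoverable from this text, so the following is one plausible reading of the geometry, not the patent's own expression. It assumes the two cameras face each other along a common optical axis, share the same focal length f, and that the ship lies between them at distances Z_1 and Z_2 with Z_1 + Z_2 = B. Similar triangles then give

$\frac{h_l}{f} = \frac{H}{Z_1}, \quad \frac{h_r}{f} = \frac{H}{Z_2}, \quad Z_1 + Z_2 = B \;\Rightarrow\; H = \frac{B\, h_l\, h_r}{f\,(h_l + h_r)}$,

where h_l and h_r are the imaging heights on the left and right sensors and H is the ship height. The symbols h_l, h_r, H, Z_1, Z_2 and all function names below are hypothetical; B = 1460 m and f = 24 mm are taken from this embodiment.

```python
def ship_height(h_left, h_right, focal_length, baseline):
    """Ship height from the imaging heights on two facing cameras.

    Assumed model: H = B * h_l * h_r / (f * (h_l + h_r)); h_left, h_right and
    focal_length share one length unit, and the result is in the unit of baseline.
    """
    if h_left <= 0 or h_right <= 0:
        raise ValueError("image heights must be positive")
    return baseline * h_left * h_right / (focal_length * (h_left + h_right))


def exceeds_clearance(height, clearance_limit):
    """Early-warning check against the preset maximum clearance height."""
    return height > clearance_limit


# Hypothetical scene: a 30 m ship, 500 m from the left camera
B, F = 1460.0, 0.024          # baseline and focal length in metres
true_height, z_left = 30.0, 500.0
h_l = F * true_height / z_left           # imaging height on the left sensor
h_r = F * true_height / (B - z_left)     # imaging height on the right sensor
estimated = ship_height(h_l, h_r, F, B)
warning = exceeds_clearance(estimated, clearance_limit=24.0)
```

Under this reading the two depths cancel out, so the height depends only on the two measured imaging heights and the calibrated values of f and B. The clearance limit of 24 m in the example is an arbitrary illustrative value.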

Claims

1. A binocular measurement method for ship height for bridge collision risk early warning, characterized in that it comprises the following steps:

Step 1: Build a binocular vision measurement system: install and calibrate industrial cameras on both banks of the ship navigation area, obtain the intrinsic and extrinsic parameters of the industrial cameras, and establish the transformation relationship between the world coordinate system and the pixel coordinate system of the industrial cameras.

Step 2: Acquire ship images using the industrial cameras. A YOLOv8 model that integrates a Transformer and deformable convolution (DC) is used for ship image instance segmentation. Edge consistency and filtering adaptability are improved by combining orientation-aware Gaussian filtering, structural residual adjustment and a semantic attention mechanism. Window size prediction based on local texture complexity and a multi-scale pyramid structure are introduced to achieve dynamic adjustment of the filtering window and noise suppression. On this basis, edge detection is carried out to extract the target edge information, and a ship contour image with continuous structure and clear edges is finally output. The edge detection employs an adaptive composite filtering method to remove noise, which includes the following processing steps:

A local structural orientation field is introduced during the filtering process, and the covariance matrix of the Gaussian filter kernel is adjusted based on this field so that the principal axis of the Gaussian kernel aligns with the structural orientation field, thereby improving the directional consistency of the filter in structural regions. The covariance matrix is defined as follows:

$$\Sigma(x,y) = R_{\theta} \begin{bmatrix} \sigma_{\parallel}^{2} & 0 \\ 0 & \sigma_{\perp}^{2} \end{bmatrix} R_{\theta}^{T}$$

In the formula, $\Sigma(x,y)$ is the covariance matrix of the two-dimensional Gaussian kernel at pixel position (x,y) in the image; $\theta$ is the main direction corresponding to the structural orientation field; $R_{\theta}$ is the two-dimensional rotation matrix that rotates the Gaussian kernel to the direction $\theta$; $\sigma_{\parallel}$ and $\sigma_{\perp}$ are the standard deviations in the parallel and perpendicular directions, respectively, controlling the extent to which the filter kernel expands in the two directions; and $R_{\theta}^{T}$ is the transpose of $R_{\theta}$.

Next, a structural residual adjustment branch is embedded, which minimizes the difference in edge response between the pixel value of the original image at position (x,y) and the pixel value of the predicted image at position (x,y), so as to enhance the structural fidelity of the image. In its optimization objective function (the formula is not reproduced in this text), the sum traverses all coordinate positions (x,y) in the image and accumulates the error of each pixel; N is the total number of edge points involved in the calculation; $\lVert\cdot\rVert_{2}$ denotes the Euclidean norm; and E denotes the edge response fidelity error.

During the filtering process, a semantic distribution map of the image is extracted by a lightweight semantic segmentation network, and a semantic attention mask is constructed to dynamically adjust the filter strength. In the adjustment formula (not reproduced in this text), the filter strength at position (x,y) is obtained from the basic filter strength and a function that maps semantic scores to intensity weights.

A structure confidence factor is introduced for each pixel. This factor is calculated by combining edge strength and semantic boundaries and is used to weight and fuse the filtered output into the final filtered image. In the fusion formula (not reproduced in this text), I(x,y) is the pixel value of the original image at position (x,y), and I_filtered(x,y) is the pixel value at position (x,y) of the filtered image.

Step 3: Based on the calibration parameters of the binocular vision system and the ship's feature point information, convert the pixel coordinates (u,v) into three-dimensional spatial coordinates (X,Y,Z) in the world coordinate system, calculate the three-dimensional spatial coordinates of the ship's key points in the world coordinate system, and determine the height, length and width of the ship; finally, based on the maximum clearance height preset for the waterway, issue warnings to ships exceeding the height limit.
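
As an illustration of the orientation-aware Gaussian filtering in Step 2, the following minimal sketch builds a kernel whose covariance is the rotated diagonal matrix R(θ)·diag(σ∥², σ⊥²)·R(θ)ᵀ, which is the form implied by the definitions in the claim. The function names, the `radius` parameter and the use of plain Python lists are assumptions made for this example and are not taken from the patent.

```python
import math

def oriented_covariance(theta, sigma_par, sigma_perp):
    """Return the 2x2 matrix R(theta) * diag(sigma_par^2, sigma_perp^2) * R(theta)^T."""
    c, s = math.cos(theta), math.sin(theta)
    a, b = sigma_par ** 2, sigma_perp ** 2
    sxx = c * c * a + s * s * b
    sxy = c * s * (a - b)
    syy = s * s * a + c * c * b
    return [[sxx, sxy], [sxy, syy]]

def oriented_gaussian_kernel(theta, sigma_par, sigma_perp, radius):
    """Build a normalized (2*radius+1) x (2*radius+1) kernel elongated along theta."""
    (sxx, sxy), (_, syy) = oriented_covariance(theta, sigma_par, sigma_perp)
    det = sxx * syy - sxy * sxy
    # Entries of the inverse of the symmetric 2x2 covariance matrix.
    ixx, ixy, iyy = syy / det, -sxy / det, sxx / det
    kernel = []
    for dy in range(-radius, radius + 1):
        row = []
        for dx in range(-radius, radius + 1):
            # Quadratic form d^T * Sigma^-1 * d for the offset d = (dx, dy).
            q = ixx * dx * dx + 2 * ixy * dx * dy + iyy * dy * dy
            row.append(math.exp(-0.5 * q))
        kernel.append(row)
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]
```

With `theta = 0`, `sigma_par = 2` and `sigma_perp = 1`, the kernel spreads further along the x direction than along y, which is the smoothing-along-the-edge behaviour the claim describes.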

2. The binocular measurement method for ship height for bridge collision risk early warning as described in claim 1, characterized in that, in step 1, a binocular vision measurement system is set up, industrial cameras are installed in the ship navigation area, and the cameras' intrinsic and extrinsic parameters are calibrated, including the left camera focal length, the right camera focal length, the principal point coordinates $(c_x, c_y)$, the rotation matrix R and the translation vector T. The focal lengths of the left and right cameras are parameters describing the optical characteristics of the camera lenses; the principal point coordinates $(c_x, c_y)$ are the coordinates, in the image coordinate system, of the intersection of the optical axis with the image plane; the nine elements a, b, c, d, e, f, g, h, i of the rotation matrix R represent the three components of the direction vectors of the three coordinate axes (X, Y, Z) in the right camera coordinate system; the translation vector T contains $w_x$, $w_y$, $w_z$, which represent the translation of the right camera relative to the left camera in the X, Y and Z directions of the world coordinate system; and the scale factor s normalizes the pixel coordinates. The calibration process is shown in the following formula:

$$s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} a & b & c & w_x \\ d & e & f & w_y \\ g & h & i & w_z \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$

In the formula, $f_x$ is the focal length expressed in pixel units along the horizontal axis of the image, i.e., the u-axis, and $f_y$ is the focal length expressed in pixel units along the vertical axis of the image, i.e., the v-axis.
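
The mapping from world coordinates to pixel coordinates described in this claim can be sketched as a standard pinhole projection. The sketch assumes the usual form s·[u, v, 1]ᵀ = K·[R|T]·[X, Y, Z, 1]ᵀ with intrinsic matrix K built from f_x, f_y, c_x, c_y; the function name and the numeric values in the usage note are hypothetical.

```python
def project_point(world_pt, fx, fy, cx, cy, R, T):
    """Project a world point (X, Y, Z) to pixel coordinates.

    R is a 3x3 rotation matrix (list of rows) and T a 3-element translation.
    Returns (u, v, s), where s is the scale factor (depth in the camera frame).
    """
    X, Y, Z = world_pt
    # Camera-frame coordinates: R * [X, Y, Z]^T + T.
    xc = R[0][0] * X + R[0][1] * Y + R[0][2] * Z + T[0]
    yc = R[1][0] * X + R[1][1] * Y + R[1][2] * Z + T[1]
    zc = R[2][0] * X + R[2][1] * Y + R[2][2] * Z + T[2]
    if zc <= 0:
        raise ValueError("point is not in front of the camera")
    # Dividing by s = zc normalizes the homogeneous pixel vector.
    u = fx * xc / zc + cx
    v = fy * yc / zc + cy
    return u, v, zc
```

For example, with an identity rotation, zero translation, `fx = fy = 1000`, and principal point `(640, 360)`, the world point `(1, 2, 10)` lands at pixel `(740, 560)` with scale factor 10.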

3. The binocular measurement method for ship height for bridge collision risk early warning as described in claim 2, characterized in that, in step 2, ship images are acquired using an industrial camera. Based on the YOLOv8 architecture, a Transformer module is embedded in the detection head to perform global context enhancement on the feature map, thereby improving the model's detection accuracy for dense and occluded targets; the multi-head attention mechanism in the Transformer module extracts, in parallel, the spatial dependency features of different semantic regions in the image. A deformable convolution module is introduced to enhance the model's ability to model target deformation and complex boundaries: it adds a spatial offset to each convolution sampling point so that the k×k sampling region is adjusted dynamically and adapts to the feature extraction requirements of different structural forms. Each sampling point learns its position information independently, and all channels within a batch share the same set of offsets to ensure computational consistency and reduce redundancy. Here, k is the size of the convolution kernel, which describes the size of the sampling region in the convolution operation.
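
To make the deformable sampling idea concrete, the sketch below evaluates a single output pixel of a k×k deformable convolution on a single-channel image. It is only an illustration: in the claimed model the offsets are learned and shared across channels, whereas here they are supplied by hand, and the helper names `bilinear` and `deformable_conv_at` are hypothetical.

```python
import math

def bilinear(img, y, x):
    """Bilinearly interpolate img (a list of rows) at fractional (y, x).

    Samples that fall outside the image contribute zero.
    """
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                value += weight * img[yy][xx]
    return value

def deformable_conv_at(img, weights, offsets, cy, cx):
    """Evaluate one output pixel of a k x k deformable convolution.

    weights[i][j] is the kernel weight and offsets[i][j] = (dy, dx) is the
    shift applied to that sampling point of the regular k x k grid.
    """
    k = len(weights)
    r = k // 2
    out = 0.0
    for i in range(k):
        for j in range(k):
            dy, dx = offsets[i][j]
            out += weights[i][j] * bilinear(img, cy + i - r + dy, cx + j - r + dx)
    return out
```

With all offsets set to `(0, 0)` the function reduces to an ordinary k×k convolution; nonzero offsets move each sampling point off the regular grid, which is what lets the receptive field follow irregular ship contours.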

4. The binocular measurement method for ship height for bridge collision risk early warning as described in claim 3, characterized in that, in step 2, ship images are acquired using an industrial camera, and during image processing a window size prediction module based on local texture complexity is introduced to achieve adaptive adjustment of the filtering window. This specifically includes the following steps: the local texture complexity is evaluated by calculating the local gray-level variance or the edge density of each pixel's neighborhood in the image, and the optimal size of the filtering window is dynamically predicted accordingly. In the window size prediction function (the formula is not reproduced in this text), α and β are weighting coefficients, and the maximum values of the overall gray-level variance and of the edge density are used for normalization. A multi-scale pyramid structure is then designed to construct, from the input image, a pyramid image sequence with multiple resolution levels; the filtering window of the corresponding scale is applied at each scale level, and feature extraction and noise suppression are performed.
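
The window size prediction can be sketched as follows. Because the prediction function itself is not reproduced in this text, the sketch assumes one plausible form: the normalized variance and edge density are blended with α and β into a complexity score in [0, 1], and a higher score yields a smaller window so that fine structure is preserved. The bounds `w_min` and `w_max` (both assumed odd), the default weights and the function names are hypothetical.

```python
def local_variance(img, y, x, r):
    """Gray-level variance of the (2r+1) x (2r+1) neighborhood, clipped at borders."""
    vals = [
        img[j][i]
        for j in range(max(0, y - r), min(len(img), y + r + 1))
        for i in range(max(0, x - r), min(len(img[0]), x + r + 1))
    ]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

def predict_window_size(variance, edge_density, var_max, density_max,
                        alpha=0.5, beta=0.5, w_min=3, w_max=11):
    """Map local texture complexity to an odd filter window size (assumed form)."""
    complexity = alpha * variance / var_max + beta * edge_density / density_max
    complexity = min(max(complexity, 0.0), 1.0)
    # Flat regions get the largest window, highly textured regions the smallest.
    size = int(round(w_max - complexity * (w_max - w_min)))
    # Keep the window odd so that it has a well-defined center pixel.
    return size if size % 2 == 1 else size + 1
```

Under these assumptions a perfectly flat neighborhood (zero variance, zero edge density) is filtered with an 11×11 window, while a neighborhood at the maximum variance and edge density is filtered with a 3×3 window.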

5. The binocular measurement method for ship height for bridge collision risk early warning as described in claim 4, characterized in that the calibration parameters and ship feature point information based on the binocular vision system mentioned in step 3 are obtained by using, in the measurement system, a binocular stereo vision structure with collinear optical axes. The left camera is installed at the origin of the world coordinate system O-XYZ without any rotation transformation, and the right camera is transformed to the world coordinate system through the rotation matrix R and the translation vector T, thereby establishing the binocular stereo vision structure with collinear optical axes. Each of the left and right imaging planes has its own optical center and focal length, and a key point located in O-XYZ has corresponding projections on the two image planes. In the correspondence formula (not reproduced in this text), s is the scale factor that normalizes the pixel coordinates, and the remaining terms are the normalized coordinates of the spatial point in the left image, the normalized coordinates in the corresponding right image, and the depth value of the spatial point in the left camera coordinate system. Since the two image planes are mirror-symmetric, [R|T] takes a specific form (not reproduced in this text), and the three-dimensional coordinates of the key point therefore simplify to an expression (likewise not reproduced) in which B is the distance between the two cameras, also known as the baseline.

6. The binocular measurement method for ship height for bridge collision risk early warning as described in claim 5, characterized in that step 3, which uses the calibration parameters of the binocular vision system and the ship feature point information, further includes measuring the object's position and depth along the Z-axis in the world coordinate system. In the calibrated binocular vision measurement model, the height of an object in the three-dimensional scene is calculated from the focal length, the baseline B, and the y-coordinate of the object's highest point in the two images. Given the imaging heights on the left and right image planes, the binocular vision measurement system with collinear optical axes applies a triangulation relation (the formula is not reproduced in this text) that involves the distances between the ship and the left and right cameras, respectively. At the same time, the baseline B is obtained through calibration, and the formula for calculating the ship's height is simplified accordingly (the simplified formula is not reproduced in this text).
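
The final comparison against the preset clearance can be sketched as follows. Since the height formula of this claim is not reproduced in this text, the sketch starts from already-triangulated world coordinates of two key points (the ship's highest point and a waterline point) and does not implement the claimed triangulation. The choice of the Y axis as vertical, the optional safety margin and the function names are assumptions made for illustration.

```python
def ship_height(top_point, waterline_point, vertical_axis=1):
    """Height above the waterline from two world-coordinate key points (X, Y, Z).

    vertical_axis selects which coordinate is treated as vertical (assumed Y).
    """
    return abs(top_point[vertical_axis] - waterline_point[vertical_axis])

def clearance_warning(height_m, max_clearance_m, safety_margin_m=0.0):
    """Return True when the ship, plus an optional margin, exceeds the clearance."""
    return height_m + safety_margin_m > max_clearance_m
```

For instance, a ship whose highest point is at Y = 25.0 m and whose waterline point is at Y = 1.0 m has a height of 24.0 m, which triggers a warning for a bridge with a 22.0 m clearance limit.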

Citation Information

Patent Citations

  • CN119295316A: Super-resolution image processing method and device based on artificial intelligence
  • CN120355722A: Defect detection method for high-voltage equipment based on deep learning and multispectral image fusion