Unmanned ship-based sea cucumber cultivation dynamic monitoring system and method

What is AI technical title?
AI technical title is built by PatSnap AI team. It summarizes the technical point description of the patent document.
By decoupling and fusing asynchronous multi-rate sensor data and using a forward prediction attitude compensation method, the problem of underwater image sequence misalignment under the influence of wind and waves by unmanned vessels was solved, enabling centimeter-level identification and counting of individual sea cucumbers, thus improving the accuracy and efficiency of sea cucumber farming monitoring.

CN122244657APending Publication Date: 2026-06-19INST OF OCEANOLOGY - CHINESE ACAD OF SCI +1

View PDF 0 Cites 0 Cited by

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: INST OF OCEANOLOGY - CHINESE ACAD OF SCI
Filing Date: 2026-03-13
Publication Date: 2026-06-19

Application Information

Patent Timeline

13 Mar 2026

Application

19 Jun 2026

Publication

CN122244657A

IPC: G06V20/05; G06V20/56; G06V10/82; G06T5/80; G06T5/60; G06T7/73; G06F18/25; G06N3/0464

AI Tagging

Application Domain

Image enhancement Image analysis

Explore More Agents

Novelty Search
Search existing technologies and assess novelty
↗
FTO
Analyze whether a product may infringe others' patents
↗
Design FTO
Check prior-design risk for exterior design
↗
Drafting
Draft patent application text based on a technical solution
↗
Find Solutions with TRIZ
Generate feasible solution to solve your technical challenge
↗

Similar Technology Patents

Get free access to AI patent search and analysis

Check patentability, review prior art and ask IP Agent with full patent context.

Smart Images

Figure CN122244657A_ABST

Patent Text Reader

Abstract

This invention discloses a dynamic monitoring system and method for sea cucumber aquaculture based on an unmanned surface vessel (USV), belonging to the fields of underwater robotics and computer vision. The method includes: acquiring an asynchronous data stream generated by multimodal sensors mounted on the USV; based on the asynchronous data stream, using a dual-loop state estimator, decomposing the USV's six-degree-of-freedom pose estimation into low-frequency global position anchoring and high-frequency local attitude calculation to generate a real-time pose transformation matrix; and performing attitude compensation and coordinate system transformation on the underwater binocular images in the asynchronous data stream according to the real-time pose transformation matrix to generate a 3D point cloud in a unified world coordinate system. This invention decouples the inherent contradiction between low-frequency high-precision positioning and high-frequency attitude calculation by orthogonally decomposing the pose estimation problem. Without relying on an expensive attitude stabilization platform, it eliminates geometric distortion caused by hull swaying, achieving centimeter-level dynamic 3D reconstruction of underwater targets.

Need to check novelty before this filing date? Find Prior Art

Description

Technical Field

[0001] This application relates to the field of sea cucumber aquaculture technology, and in particular to a dynamic monitoring system and method for sea cucumber aquaculture based on unmanned vessels. Background Technology

[0002] As a high-value aquaculture species, sea cucumber's large-scale farming has created an urgent need for refined management of the underwater environment. Traditional monitoring methods for sea cucumber farming, such as manual observation by divers or fixed underwater cameras, have many limitations, including limited coverage, low efficiency, easy interference with the farming environment, and inability to provide quantitative data. In recent years, unmanned surface vessels (USVs) equipped with underwater vision systems have been introduced for automated monitoring, but they face a core technological bottleneck.

[0003] In existing technologies, unmanned surface vessels (USVs) experience severe roll, pitch, and yaw motions due to wind and waves while navigating on the water surface. This unstable platform attitude leads to significant spatial misalignment and geometric distortion in image sequences captured by underwater cameras. One baseline implementation scheme attempts to directly assign the coordinates of the low-frequency, high-precision positioning system (such as BeiDou RTK) onboard the USV to the underwater camera data through timestamp alignment. However, this scheme ignores the high-frequency attitude changes between two positioning data updates, resulting in significant broom-like distortion and layering in the generated 3D point cloud. The accuracy of the 3D model remains at the meter level or higher, completely failing to meet the centimeter-level accuracy requirements for identifying, counting, and measuring the size of individual sea cucumbers (typically 10-20 cm). Therefore, a fundamental technical contradiction has long existed in this field: how to reconcile the conflict between low-frequency, high-precision global positioning information and high-frequency, easily drifting local attitude information on a dynamically unstable, low-cost platform to achieve low-cost, high-precision underwater dynamic 3D reconstruction. Summary of the Invention

[0004] Firstly, this application provides a method for dynamic monitoring of sea cucumber farming based on unmanned vessels.

[0005] To address the technical problems of low accuracy and severe model distortion in underwater 3D reconstruction caused by neglecting high-frequency attitude changes of unmanned vessels in existing technologies, this application proposes a decoupled fusion and forward prediction attitude compensation method for asynchronous multi-rate sensor data. The core technical concept of this invention lies in deconstructing pose estimation from a single, coupled optimization problem into two orthogonally decomposed tasks that operate independently but collaboratively at different time scales: low-frequency global position anchoring and high-frequency local attitude calculation. This decoupled architecture fundamentally resolves the inherent contradiction between high-precision positioning and high-frequency attitude updates in existing technologies. By constructing a dual-loop state estimator, continuous attitude prediction using high-frequency inertial measurement data ensures the real-time performance and smoothness of attitude calculation; simultaneously, sparse but accurate low-frequency global positioning data is used to periodically correct the prediction results, effectively suppressing the cumulative drift of inertial sensors. Finally, a coordinate transformation matrix based on forward prediction interpolation is used to accurately compensate for high-frequency attitude in each underwater image acquisition, thereby eliminating the geometric distortion introduced by hull rolling at its source.

[0006] This application provides a method for dynamic monitoring of sea cucumber aquaculture based on an unmanned surface vessel (USV), comprising: acquiring an asynchronous data stream generated by multimodal sensors deployed on the USV, the asynchronous data stream including low-frequency global positioning data, high-frequency inertial measurement data, and underwater binocular image data; estimating the six-degree-of-freedom pose of the USV using a dual-loop state estimator based on the asynchronous data stream to generate a real-time pose transformation matrix that is temporally aligned with the underwater binocular image data, wherein the estimation of the six-degree-of-freedom pose includes: performing high-frequency prediction of the USV's attitude based on the high-frequency inertial measurement data to generate a prior attitude state; correcting the prior attitude state based on the low-frequency global positioning data to achieve global position anchoring; and performing attitude compensation and coordinate system transformation on the underwater binocular image data based on the real-time pose transformation matrix to generate a three-dimensional point cloud in a unified world coordinate system.

[0007] Optionally, the asynchronous data stream further includes visual odometry data; the estimation of the six-degree-of-freedom pose further includes: correcting the attitude changes caused by the unmanned vessel's roll and pitch in the prior attitude state based on the visual odometry data.

[0008] Optionally, the step of performing high-frequency attitude prediction on the unmanned vessel based on the high-frequency inertial measurement data to generate a prior attitude state includes: acquiring the latest inertial measurement unit sample and the posterior state vector of the previous moment; applying quaternion kinematic equations to integrate the angular velocity in the latest inertial measurement unit sample to update the attitude quaternion, thereby generating a prior state estimate for the current moment.

[0009] Optionally, the step of correcting the prior attitude state based on the low-frequency global positioning data to achieve global position anchoring includes: converting the latitude, longitude, and altitude coordinates in the low-frequency global positioning data into a position in the local geodetic coordinate system through a universal transverse Mercator projection; constructing a measurement vector containing the position and heading angle in the local geodetic coordinate system; and updating the prior attitude state using an extended Kalman filter to generate a globally corrected posterior pose estimate.

[0010] Optionally, generating a real-time pose transformation matrix that is temporally aligned with the underwater stereo image data includes: acquiring a timestamp of a frame of underwater stereo image data; searching for the poses of the two most recent moments on either side of the timestamp in a historical pose state queue; applying linear interpolation to the translation components of the poses at the two moments and applying spherical linear interpolation to the rotation quaternion components to generate an interpolated pose that is strictly aligned with the timestamp; and constructing the real-time pose transformation matrix based on the interpolated pose.

[0011] Optionally, the step of performing attitude compensation and coordinate system transformation on the underwater binocular image data based on the real-time pose transformation matrix to generate a 3D point cloud in a unified world coordinate system includes: performing stereo correction on the underwater binocular image data to generate a corrected image pair; calculating and generating a disparity map based on the corrected image pair using a stereo matching algorithm; reprojecting the 2D pixel coordinates into 3D coordinate points in the camera coordinate system according to the disparity map and camera intrinsic parameters; and applying the real-time pose transformation matrix to transform the 3D coordinate points in the camera coordinate system to the unified world coordinate system to generate the 3D point cloud.

[0012] Optionally, after generating the three-dimensional point cloud, the method further includes: inputting at least one image from the corrected image pair into a pre-trained underwater target detection model to obtain a two-dimensional bounding box of the sea cucumber target in the image; using the pixel coordinate range of the two-dimensional bounding box, extracting the corresponding three-dimensional point set from the three-dimensional point cloud, and attaching sea cucumber semantic labels to the three-dimensional point set.

[0013] Optionally, before acquiring the asynchronous data stream, the system further includes: performing system initialization and calibration, wherein the initialization and calibration includes: solving and storing the intrinsic parameters, distortion coefficients, and extrinsic parameter transformation matrix between the camera coordinate system and the inertial measurement unit coordinate system of the underwater binocular camera; and determining and storing the static zero bias of the inertial measurement unit.

[0014] Optionally, it also includes: online monitoring of the health status of the multimodal sensors; when the visual quality of any sensor is detected to be lower than a preset threshold or a hardware failure occurs, during the estimation of the six degrees of freedom pose, dynamically adjusting the measurement noise covariance matrix corresponding to the sensor or stopping the use of the sensor's data for status updates.

[0015] Secondly, this application provides a dynamic monitoring system for sea cucumber farming based on unmanned vessels.

[0016] This application provides a dynamic monitoring system for sea cucumber aquaculture based on an unmanned surface vessel (USV), comprising: a multimodal data acquisition module for acquiring asynchronous data streams generated by multimodal sensors deployed on the USV, the asynchronous data streams including low-frequency global positioning data, high-frequency inertial measurement data, and underwater binocular image data; a decoupled attitude fusion and localization module for estimating the six-degree-of-freedom pose of the USV using a dual-loop state estimator based on the asynchronous data streams, to generate a real-time pose transformation matrix that is temporally aligned with the underwater binocular image data, wherein the decoupled attitude fusion and localization module is configured to: perform high-frequency prediction of the USV's attitude based on the high-frequency inertial measurement data to generate a prior attitude state; and correct the prior attitude state based on the low-frequency global positioning data to achieve global position anchoring; and a real-time 3D reconstruction module for performing attitude compensation and coordinate system transformation on the underwater binocular image data according to the real-time pose transformation matrix to generate a 3D point cloud in a unified world coordinate system.

[0017] Thirdly, this application provides a computer device including a memory and a processor, the memory storing a computer program, and the processor executing the computer program to implement the method as described in any of the first aspects.

[0018] Fourthly, this application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the method described in any of the first aspects.

[0019] This application constructs a dual-loop state estimator by orthogonally decomposing the six-DOF pose estimation problem of an unmanned surface vessel into two independent optimization tasks: low-frequency global position anchoring and high-frequency local attitude calculation. This method effectively fuses asynchronous, multi-rate sensor data and accurately eliminates geometric distortions caused by hull rolling without relying on expensive attitude stabilization platforms. The proposed technical solution reduces the absolute positioning error of 3D reconstruction from the meter level to the centimeter level, meeting the requirements for identifying and measuring individual sea cucumbers. Attached Figure Description

[0020] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0021] Figure 1 This is a schematic diagram of the overall architecture of a sea cucumber farming dynamic monitoring system based on an unmanned vessel, provided as an embodiment of this application.

[0022] Figure 2 This is a flowchart illustrating a method for dynamic monitoring of sea cucumber aquaculture based on an unmanned vessel, as provided in one embodiment of this application.

[0023] Figure 3 This is a schematic diagram of the internal data flow and processing logic of the decoupled attitude fusion and positioning module (M200) in one embodiment of this application.

[0024] Figure 4 This is a schematic diagram of the internal data flow and processing logic of the real-time 3D reconstruction module (M300) in one embodiment of this application. Detailed Implementation

[0025] To make the objectives, technical solutions, and advantages of this application clearer, the technical solutions of this application will be clearly and completely described below in conjunction with the accompanying drawings and specific embodiments. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of them. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0026] Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of this application, "multiple" means two or more, unless otherwise explicitly specified.

[0027] In the description of this application, it should be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", "axial", "radial", "circumferential", etc., indicating the orientation or positional relationship based on the orientation or positional relationship shown in the accompanying drawings, are only for the convenience of describing this application and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation, and therefore should not be construed as a limitation of this application.

[0028] This embodiment provides a three-dimensional dynamic monitoring method for sea cucumber farming based on an unmanned surface vessel (USV). In a specific implementation, this method employs a technique of decoupling and fusing asynchronous multi-rate sensor data and forward prediction attitude compensation. It orthogonally decomposes the six-degree-of-freedom pose estimation problem of the USV into two independent optimization tasks operating at different time scales: low-frequency global position anchoring and high-frequency local attitude calculation. This allows for the accurate fusion of asynchronous, multi-coordinate system data streams from a low-frequency high-precision positioning system, a high-frequency inertial measurement unit, and a high-throughput underwater stereo vision system on an USV whose attitude is constantly changing due to wind and waves, using a low-cost sensor combination. This solves the technical problems of low underwater 3D reconstruction accuracy and severe model geometric distortion caused by ignoring high-frequency attitude changes of the USV in existing technologies. It overcomes the severe spatiotemporal misalignment caused by platform dynamic instability, thereby achieving the beneficial effect of real-time 3D reconstruction and dynamic tracking of individual sea cucumbers and their benthic environment with centimeter-level accuracy.

[0029] Figure 2 A flowchart of a sea cucumber aquaculture dynamic monitoring method based on an unmanned vessel according to one embodiment of this application is shown. (Refer to...) Figure 2 This method may specifically include the following steps: S100: Acquire asynchronous data streams generated by multimodal sensors deployed on the unmanned vessel, the asynchronous data streams including low-frequency global positioning data, high-frequency inertial measurement data, and underwater binocular image data.

[0030] In this embodiment, the physical basis of the entire monitoring system is an unmanned surface vessel (USV) on which a multimodal sensor system is integrated. The multimodal sensor system includes a low-frequency global positioning unit (GPS) for providing global position information, an inertial measurement unit (IMU) for providing high-frequency motion information, and an underwater binocular camera for capturing underwater visual information. These sensors operate at their own independent frequencies and clocks, resulting in asynchronous data streams with different data formats and coordinate system definitions.

[0031] The low-frequency global positioning data originates from a BeiDou receiver that supports Real-time Kinematic (RTK) technology. This receiver can output position information with centimeter-level accuracy, but its data update frequency is low; in a typical configuration, the update frequency is 10 Hz. The output data typically follows the NMEA-0183 protocol and includes information such as longitude, latitude, altitude, and heading angle.

[0032] The high-frequency inertial measurement data originates from an Inertial Measurement Unit (IMU). This unit integrates a three-axis gyroscope and a three-axis accelerometer, enabling it to measure the unmanned vessel's angular velocity and linear acceleration at frequencies far exceeding those of a global positioning unit (e.g., 200 Hz). This data is crucial for describing the unmanned vessel's drastic attitude changes over short time intervals, but its integration results will experience cumulative drift over time.

[0033] The underwater binocular image data originates from a waterproof-encapsulated binocular camera consisting of two cameras with a fixed baseline distance. The camera simultaneously captures image frames from both the left and right perspectives at a medium frequency (e.g., 30Hz).

[0034] To handle these asynchronous data streams, a unified, high-precision time base needs to be established. In one specific embodiment, the operating system kernel of the shipboard computing unit precisely synchronizes with the Pulse Per Second (PPS) signal provided by the global positioning unit via the Precision Time Protocol (PTP). When the system receives a hardware interrupt from any sensor, the interrupt service routine immediately appends a Coordinated Universal Time (UTC) timestamp with microsecond-level precision to the data packet. All timestamped raw data is then sent to three independent circular buffers, depending on its sensor source.

[0035] For example, at a specific point in time, the system might experience the following sequence of events: at timestamp T = 1000.000 milliseconds, a BeiDou RTK data packet (containing latitude, longitude, and altitude) arrives and is stored in the global positioning data buffer; at T = 1000.005 milliseconds, an IMU data packet (containing triaxial angular velocity and triaxial acceleration) arrives and is stored in the inertial measurement data buffer; at T = 1000.010 milliseconds, another IMU data packet arrives; at T = 1000.015 milliseconds, yet another IMU data packet arrives; until T = 1033.333 milliseconds, a pair of stereo image frames (left and right views) arrive and are stored in the image data buffer. This example demonstrates the asynchronous and multi-rate characteristics of the data stream, with all data linked together by a unified timestamp.

[0036] The data acquisition process may also include an online monitoring submodule for the health status of sensors. This submodule performs real-time checks on the raw data entering the buffer, such as checking whether the data format is correct and whether the values are within a reasonable range, to ensure the validity of the data used by subsequent processing modules.

[0037] S200: Based on the asynchronous data stream, a dual-loop state estimator is used to estimate the six-degree-of-freedom pose of the unmanned vessel to generate a real-time pose transformation matrix that is time-aligned with the underwater binocular image data.

[0038] The goal of this step is to estimate the complete six-degree-of-freedom (6-DoF) pose of the unmanned surface vessel in three-dimensional space in real time, namely position (X, Y, Z) and attitude (roll, pitch, yaw). This embodiment employs a decoupled extended Kalman filter (D-EKF) as the dual-loop state estimator. The design concept of this filter is to decompose the pose estimation problem into two loops operating at different frequencies: a high-frequency prediction loop and a low-frequency correction loop.

[0039] S210: Based on the high-frequency inertial measurement data, perform high-frequency prediction of the attitude of the unmanned vessel to generate a priori attitude state.

[0040] This is a high-frequency prediction loop, whose operating frequency is consistent with the sampling frequency of the inertial measurement unit, for example, 200Hz. The goal of this loop is to continuously calculate the attitude of the unmanned vessel using the high dynamic response data provided by the IMU.

[0041] Specifically, this step reads the latest IMU sample from the inertial measurement data buffer, which contains triaxial angular velocity measurements. and triaxial acceleration measurements Simultaneously, it also acquires the posterior state vector of the filter after correction at the previous time step (k-1). The state vector is a multi-dimensional vector containing the complete kinematic state of the unmanned vessel, which includes at least its position. ,speed and quaternions used to represent attitude .

[0042] This prediction step uses the current IMU measurement as input to update the state vector. Attitude is updated by adjusting the angular velocity. This is achieved through integration. To improve integration accuracy, the fourth-order Runge-Kutta method (RK4) is preferred for numerically solving the quaternion kinematic differential equations. The basic form of the equation is: ,in This represents quaternion multiplication. It is performed over a sampling interval. Integrating within (for example, 5 milliseconds) allows us to calculate the predicted attitude quaternion for the current moment. At the same time, using linear acceleration (After deducting the effect of gravity) the velocity and position are updated by integration.

[0043] After this series of calculations, the filter outputs the prior state estimate for the current moment. , and the associated updated prior covariance matrix This prior state is the best estimate of the unmanned vessel's current pose, but it includes accumulated errors introduced by the IMU's own drift. This high-frequency loop ensures that the system can continuously output pose estimates at a frequency of 200Hz, thereby capturing all rapid swaying motions of the hull.

[0044] For example, suppose that at time k-1, the attitude quaternion is (This indicates a rotation of approximately 10 degrees around the Y-axis), the angular velocity measured by the IMU at time k is... rad / s, sampling interval The prediction step calculates the rate of change of the quaternion and integrates it using the Runge-Kutta method to obtain a new attitude quaternion. .this This is a predicted value of the current ship attitude, which will be immediately used by downstream modules. Without external correction, this prediction process will continue at a frequency of 200Hz, and the error in the state vector will accumulate continuously.

[0045] S220: Based on the low-frequency global positioning data, the prior attitude state is corrected to achieve global position anchoring.

[0046] This is a low-frequency correction loop, which operates on an event-driven basis. It is triggered when the system detects a new data packet in the global positioning data buffer. Its operating frequency is consistent with the BeiDou RTK update frequency, for example, 10Hz. The goal of this loop is to utilize the sparse but high-precision global positioning information provided by BeiDou RTK to correct the errors accumulated in the high-frequency prediction loop, preventing unbounded drift in pose estimation results.

[0047] When a new BeiDou RTK data packet arrives, this step first extracts the longitude, latitude, altitude, and heading angle from it. To facilitate Euclidean geometric calculations within a local area, it is necessary to transform the latitude, longitude, and altitude coordinates from the geodetic coordinate system to a local, fixed Cartesian coordinate system. A preferred approach is to use the Universal Transverse Mercator (UTM) projection to convert the latitude, longitude, and altitude coordinates into position coordinates in a local geodetic coordinate system (e.g., the ENU, North-South coordinate system). To ensure that all reconstructed data resides in a unified and fixed coordinate system, when the system is first deployed in aquaculture ponds, it will use the first valid BeiDou RTK positioning point received as the permanent coordinate origin (0,0,0) for that pond and persistently store its latitude, longitude, and altitude coordinates.

[0048] Subsequently, a measurement vector is constructed using the transformed position coordinates and the received heading angle. This measurement vector constitutes a direct observation of the global pose of the unmanned surface vessel.

[0049] Next, the extended Kalman filter update step is performed. This step calculates the measured values. Prior state estimation with the filter The residuals (also known as innovations) between the corresponding components (i.e., the predicted position and heading angle) are then calculated based on the prior covariance matrix. and measurement noise covariance matrix Calculate the optimal Kalman gain This gain determines the extent to which the new measurement is trusted. Finally, using the Kalman gain and residuals, the prior state is... After weighted correction, the posterior state vector after global correction is obtained. and posterior covariance matrix This posterior state It is the optimal estimate of the unmanned vessel's pose at the current moment, which integrates the dynamic information of the IMU and the global accuracy of BeiDou RTK.

[0050] For example, suppose that at time k, the prior position estimate of the filter is... At this moment, a BeiDou RTK data packet arrives, and its converted local coordinates are... Meters. Measurement noise covariance matrix The diagonal elements are set according to the nominal precision of the RTK. The filter will calculate the position residual as... Meters. Based on the prior covariance at that time. Calculate a Kalman gain Then use The formula is used to update the state vector. The updated position... It will be very close. This effectively eliminates the drift error (20-30 cm in this example) caused by IMU integration.

[0051] In an optional embodiment, to further improve the robustness of attitude estimation, especially in scenarios where the unmanned vessel is swaying in place (when the global positioning data remains unchanged) or the global positioning signal is briefly lost, the asynchronous data stream may also include visual odometry data generated by an underwater binocular camera. In this case, the six-DOF pose estimation also includes an additional correction step: based on the visual odometry data, correcting for attitude changes in the prior attitude state caused by the unmanned vessel's roll and pitch.

[0052] Specifically, a visual-inertial odometry algorithm (e.g., ORB-SLAM3) runs in parallel, extracting and matching ORB feature points between consecutive stereo image frames to calculate the relative motion transformation matrix of the camera between two frames. The rotation part of this matrix can be converted into a relative rotation quaternion. This relative rotation quaternion is used as another external measurement input to the extended Kalman filter. It constructs an independent observation model specifically for correcting the roll and pitch angles in the state vector. When visual odometry data arrives, it triggers an update step similar to global positioning data correction, but this update primarily affects the attitude components. This approach achieves orthogonal correction between different sensors in their respective advantageous dimensions; that is, BeiDou primarily corrects position and yaw, while visual odometry primarily corrects roll and pitch, avoiding conflicts between different measurement sources and enhancing the overall system performance.

[0053] For example, suppose an unmanned surface vessel (USV) is stationary on the water's surface but is rolling in place due to waves. In this situation, the BeiDou RTK coordinate readings remain essentially unchanged, providing no attitude update information. However, a sequence of images captured by an underwater binocular camera clearly reflects this rolling motion. The visual odometry module outputs a relative rotation of a non-identity matrix, for example, indicating a 5-degree change in roll angle within 0.1 seconds. This information triggers an update to the EKF (Electronic Kinematics Function), precisely correcting for roll drift that may be caused by IMU integration, ensuring that the attitude estimation remains accurate.

[0054] After the aforementioned dual-loop estimation process is completed, a queue containing historical pose states is maintained in the system memory. However, the underwater stereo image capture timestamps... Typically, the time of the update will not perfectly coincide with the discrete update time of any filter. In order to match an absolutely accurate pose in time for each frame of the image, an interpolation step is required.

[0055] Specifically, when a new frame of underwater binocular image data arrives, the system obtains its precise timestamp. Subsequently, in the historical pose state queue, efficient algorithms such as binary search are used to find the two closest moments to the given timestamp. and ( The poses of (respectively) and ).

[0056] For the translation component (i.e., the position vector) in the pose The result can be calculated using simple linear interpolation. Location at any moment .

[0057] However, for rotational components (i.e., attitude quaternions) Linear interpolation can result in the interpolated quaternion having a non-zero modulus, thus losing the physical meaning of the rotation. Therefore, spherical linear interpolation (SLERP) must be enforced. SLERP guarantees that the interpolation result follows the shortest path (geodesic) on the hypersphere representing all rotations, thus obtaining a physically correct interpolation orientation. .

[0058] Finally, the interpolated position vector and attitude quaternions These components are combined to form a 4x4 homogeneous transformation matrix. This matrix is time-perfectly aligned with the image frame and is the real-time pose transformation matrix that transforms points in the ship (or camera) coordinate system to the world coordinate system. .

[0059] For example, suppose the image timestamp is ms, found historical pose queue ms and ms represents the pose at two time points. The position is The attitude quaternion is (Unit posture). The position is The attitude quaternion is (Rotated 10 degrees around the X-axis). Interpolation scaling factor. .

[0060] Location of linear interpolation calculation: rice.

[0061] Spherical linear interpolation for attitude calculation: The result calculated by this formula will be an attitude quaternion that rotates approximately 5 degrees around the X-axis.

[0062] The final pose transformation matrix This will accurately reflect the moment the unmanned vessel captures the image, i.e. The pose at ms.

[0063] S300: Based on the real-time pose transformation matrix, perform attitude compensation and coordinate system transformation on the underwater binocular image data to generate a three-dimensional point cloud in a unified world coordinate system.

[0064] This step utilizes the precise pose information generated in S200 to perform geometric correction and 3D reconstruction on the original visual data, thereby generating a high-precision 3D environment model in a unified fixed coordinate system.

[0065] S310: Perform stereo correction on the underwater binocular image data to generate a corrected image pair.

[0066] Before performing 3D reconstruction, geometric distortions introduced by the physical characteristics of the camera lens itself must be eliminated. This step involves receiving the original stereo image pairs. Using the camera intrinsic parameter matrix K (containing focal length and principal point coordinates) and distortion coefficients D (containing radial and tangential distortion parameters) pre-calibrated during the system initialization phase (detailed in subsequent step S000), the system calls functions from an image processing library (such as OpenCV) to perform distortion correction on the original image. Subsequently, based on the extrinsic parameters (rotation and translation relationship) between the two cameras obtained during the calibration phase, stereo correction is performed on the distortion-corrected image. The purpose of stereo correction is to achieve perfect horizontal alignment of the epipolar lines of the two images through reprojection transformation. After this step, the output corrected image... In the left image, any pixel has a corresponding matching point on the right image that lies on the same horizontal line. This greatly facilitates subsequent 3D matching algorithms, simplifying it from a two-dimensional search problem to a one-dimensional search problem.

[0067] S320: Based on the corrected image pair, a disparity map is generated by calculating using a stereo matching algorithm.

[0068] The core task of this step is to find the matching point on the corresponding epipolar line in the right image for each pixel in the left image, and to calculate the difference in horizontal pixel coordinates between them, i.e., disparity. The magnitude of disparity is inversely proportional to the depth of that point in the real world.

[0069] This embodiment preferably employs a semi-global block matching (SGBM) algorithm to generate a dense disparity map. The SGBM algorithm not only considers the local information of pixels (by calculating the similarity between small image patches), but also introduces global smoothness constraints through dynamic programming in multiple directions (typically 8 or 16 directions), thereby producing robust matching results even in regions with weak texture. The algorithm's output is a single-channel image of the same size as the left image, i.e., the disparity map D, where the grayscale value of each pixel represents the disparity value at that point.

[0070] S330: Based on the disparity map and camera intrinsic parameters, reproject the two-dimensional pixel coordinates into three-dimensional coordinate points in the camera coordinate system.

[0071] This step iterates through the disparity map. Each pixel in and its corresponding non-zero disparity value Based on the triangulation principle of binocular vision, the three-dimensional coordinates of the pixel in the camera coordinate system (with the optical center of the left camera as the origin) can be recovered. The specific calculation formula is as follows: in, It is the baseline length of the binocular camera (unit: meters). and These are the camera's normalized focal lengths in the x and y directions (in pixels). These are the principal point coordinates of the camera (in pixels). These parameters are all derived from the camera intrinsic parameter matrix pre-calibrated in step S010. Through this calculation, each pixel with an effective disparity value is transformed into a three-dimensional point in the camera coordinate system.

[0072] For example, suppose the calibration parameters of a stereo camera are: baseline length Meters, focal length in the intrinsic parameter matrix Pixels Pixels, principal point coordinates Pixels Pixel. Now, for the pixel coordinates in the left image... For a feature point at a given location, a stereo matching algorithm calculates its disparity value. Pixels. First, the depth value is calculated using parallax: The result indicates that this feature point is 1.2 meters away from the camera lens plane. Subsequently, its X and Y coordinates in the camera coordinate system were calculated: rice; Meters. Therefore, the three-dimensional spatial coordinates corresponding to this pixel are: (Unit: meters). Those skilled in the art can reproduce the conversion process from two-dimensional parallax to three-dimensional coordinates without ambiguity.

[0073] S340: Apply the real-time pose transformation matrix to transform the three-dimensional coordinate points in the camera coordinate system to the unified world coordinate system to generate the three-dimensional point cloud.

[0074] Point cloud obtained from S330 This is within the camera coordinate system that moves along with the unmanned vessel. To construct a globally consistent 3D map, these local point clouds must be transformed into a fixed, unified world coordinate system.

[0075] This step utilizes the real-time pose transformation matrix generated by S200, which is strictly aligned with the current image frame timestamp. By performing a standard homogeneous coordinate transformation, that is, for each three-dimensional point... Perform matrix multiplication: ,in It needs to be augmented to homogeneous coordinate form. .

[0076] After this transformation, all the 3D points reconstructed from images taken at different times, locations, and attitudes are unified into the same world coordinate system. This completely compensates for the geometric distortion introduced by the unmanned vessel's swaying. Finally, all the transformed points... When combined, these elements form a local 3D point cloud with precise attitude compensation in a fixed world coordinate system. To facilitate subsequent processing, the two-dimensional matrix topology of the point cloud will be strictly maintained in memory, allowing it to be indexed by pixel coordinates like an image.

[0077] In an optional embodiment, to achieve identification and quantitative analysis of individual sea cucumbers, after generating the 3D point cloud, the method further includes a semantic annotation step: S350: Input at least one image from the corrected image pair into a pre-trained underwater target detection model to obtain a two-dimensional bounding box of the sea cucumber target in the image.

[0078] The semantic segmenter integrates a pre-trained underwater sea cucumber target detection model and is configured to receive the corrected left-view image generated by step S310. The data format is an unsigned 8-bit integer matrix with width x height x 3 channels. The model's detailed architecture, preprocessing mechanism, and training strategy will be elaborated in detail below. Forward inference is performed using a unified convolutional neural network. Based on this forward inference process, the model outputs a structured list containing information on all detected sea cucumber targets in the image. Each list element is a data structure that contains at least the pixel coordinates of the target's two-dimensional bounding box. And a floating-point score representing the confidence level of the detection.

[0079] The underwater sea cucumber target detection model used in this embodiment is designed to accurately and in real-time identify all sea cucumber targets from a single input image in underwater environments with uneven lighting, turbid water, and complex backgrounds (such as seaweed, silt, and shells). The model consists of an image preprocessing module, a deep convolutional neural network based on the YOLOv8s architecture, and a specific training strategy.

[0080] Due to wavelength-selective absorption (red light attenuates much more than blue and green light) and scattering by suspended particles when light propagates underwater, raw underwater images generally suffer from low contrast, color distortion, and blurred details. To compensate for these physical degradations, an image preprocessing module is configured to execute an improved Underwater Dark Channel Prior (UDCP) algorithm before the image enters the neural network. This algorithm first needs to estimate the global underwater background light. and medium transmittance Unlike traditional dark channel prior algorithms applied to land images, this embodiment incorporates adaptive improvements for underwater environments: Considering the non-uniform attenuation of light at different wavelengths, this module estimates independent background light parameters for each of the RGB three channels. (in Specifically, for each channel, the top 0.1% of pixels with the highest brightness values in its dark channel image are selected, and the average value of these pixels in the corresponding channel of the original image is calculated as the background light for that channel. This measure can effectively compensate for the overall blue-green tint caused by severe red light attenuation.

[0081] In estimating transmittance At this time, a nonlinear compensation factor based on local depth information of the image is introduced. This factor corrects the problem that traditional dark channel priors tend to underestimate transmittance on near targets (where the water medium is thin), thus avoiding over-enhancement and color distortion in the recovered image at close range.

[0082] Based on the estimated background light and transmittance This module applies the following color channel separation image restoration formula to generate an image with balanced color and enhanced contrast. , to be used as input to the neural network: in, The original image in the channel pixel values, It is a lower limit of transmittance set to prevent the denominator from being zero (exemplarily, it can be set to 0.1, dimensionless).

[0083] The model adopts the YOLOv8s network architecture, which can be decomposed into three core functional components: the backbone network, the neck network, and the head.

[0084] Input tensor definition: The image output by the preprocessing module is normalized to 640x640 pixels and converted into a tensor of shape... The four-dimensional floating-point tensor, where 3 represents the batch size, and 3 represents the RGB color channels.

[0085] Backbone: This component is responsible for multi-scale feature extraction. Its structure mainly consists of multiple stacked C2f (Cross-Stage Partial-like module with 2 convolutions) modules, effectively extracting hierarchical features from the input image tensor from coarse to fine. The input is a preprocessed image tensor; after a series of convolutions and downsampling operations, the output is feature maps with three different spatial resolutions. Their shapes are respectively ,in This represents the number of channels for the corresponding layer. These feature maps capture the semantic information of targets of different sizes in the image.

[0086] Neck Network: This component is configured as a Path Aggregation Network (PANet) structure, its function being to fuse multi-scale features output from the backbone network. It includes a top-down upsampling path for passing strong semantic features from higher levels to lower levels, and a bottom-up downsampling path for passing strong localization features from lower levels to higher levels. This bidirectional feature fusion generates a fused feature map. It combines rich semantic information with precise spatial details, thereby improving the ability to detect sea cucumbers of different sizes.

[0087] Detection Head: This component is configured as an anchor-free, decoupled head structure, making predictions directly on the fused feature map output by the neck network. It contains a classification branch and a regression branch, used to predict the target class and bounding box location, respectively. The final output is a list of tensors, each corresponding to a feature map scale, whose shape encodes the predicted bounding box coordinates (4 values, typically center point coordinates and width and height), target confidence (1 value), and class probability (1 value for the single-class sea cucumber detection in this application) at each grid cell.

[0088] Dataset Construction: At least 10,000 raw underwater images were collected using an unmanned surface vessel under different aquaculture ponds, lighting conditions, and water turbidity levels. Precise bounding boxes were then manually labeled for sea cucumbers in the images. Data augmentation techniques were then used to expand the dataset to 50,000 images. Data augmentation operations included... Random rotation within the range, Random scaling by a factor of 1, and The brightness and contrast jitter are doubled.

[0089] Loss Function: The model training is driven by a multi-task loss function, which consists of three weighted components. The classification loss uses Binary Cross-Entropy Loss (BCE Loss) to determine whether the predicted bounding box contains the target. The regression loss uses Complete Intersection over Union (CIoU) Loss, which simultaneously considers the overlap area between the predicted and ground truth bounding boxes, the distance between their center points, and the aspect ratio, resulting in more stable bounding box regression results. Additionally, Distribution Focal Loss (DFL) is introduced to allow the network to focus more quickly on the numerical distribution near the ground truth bounding box coordinates.

[0090] Training parameters: The AdamW optimizer was used for model training, and its initial learning rate was set to... The momentum parameter is 0.937, and the weighted decay coefficient is... During training, a cosine annealing learning rate scheduling strategy was employed to ensure stable convergence in the later stages of training. The entire training process was conducted on a server configured with an NVIDIA A100 GPU, with a batch size of 32 and a total of 100 epochs.

[0091] S360: Using the pixel coordinate range of the two-dimensional bounding box, extract the corresponding three-dimensional point set from the three-dimensional point cloud, and attach sea cucumber semantic tags to the three-dimensional point set.

[0092] Due to the local point cloud generated in S340 The system retains a two-dimensional raster mapping relationship completely identical to the original image in memory. Therefore, the system can directly utilize the pixel coordinate range of the two-dimensional bounding box output by the S350 as a matrix slicing index, efficiently extracting data from the structured raster. Extract all 3D points that belong to the bounding box from the matrix.

[0093] The extracted 3D point subset is then assigned a semantic tag for the sea cucumber. The system can further analyze this point cloud subset, for example, by calculating its length, width, and height using Principal Component Analysis (PCA), estimating its volume using Poisson surface reconstruction or the Alpha Shape algorithm, and calculating its centroid coordinates in the world coordinate system. Ultimately, the system outputs a structured semantic point cloud block, which contains not only the geometric information of the environment but also the semantic information and quantified attributes of key targets.

[0094] Before the entire monitoring process begins, in order to ensure the accuracy and reliability of the algorithm, the method also includes a system initialization and calibration step (S000): S010: Solve and store the intrinsic parameters, distortion coefficients, and extrinsic parameter transformation matrix between the camera coordinate system and the inertial measurement unit coordinate system of the underwater binocular camera.

[0095] This step is typically performed in a controlled environment on land before the unmanned surface vessel (USV) is launched. Using a standard checkerboard calibration board, a series of images of the calibration board are acquired from different angles and distances, and an open-source camera-IMU joint calibration toolkit (e.g., Kalibr) is run. This toolkit can simultaneously and with high accuracy solve for the intrinsic parameter matrices of the stereo cameras. Distortion coefficient And the rigid body transformation relationship between the two camera coordinate systems and the IMU coordinate system (i.e., the extrinsic parameter transformation matrix). These calibration parameters are the cornerstone of all subsequent attitude estimation and 3D reconstruction algorithms. They are stored in a YAML-formatted configuration file on the shipboard computing unit for loading when the system starts up.

[0096] S020: Measure and store the static zero bias of the inertial measurement unit.

[0097] When stationary, the output of an IMU sensor is not strictly zero, but exhibits a small bias that varies with environmental factors such as temperature. To accurately subtract this error in dynamic estimation, online zero-bias estimation is required after each system startup. A simplified method is to allow the unmanned surface vessel to remain stationary for several minutes after launch. During this period, the system continuously acquires raw angular velocity and linear acceleration data from the IMU and calculates the average of the data over this time. This average is considered the static zero bias of the sensor at that operating temperature. A more precise approach is to incorporate the zero bias as part of the extended Kalman filter's state vector, performing online dynamic estimation and updates during operation. The calculated initial zero bias value is used as the initial value for the zero bias state vector in the filter.

[0098] To improve the system's robustness and ability to handle various real-world anomalies, this method also includes a health monitoring and adaptive adjustment mechanism: S400: Monitor the health status of the multimodal sensors online; when the visual quality of any sensor is detected to be lower than a preset threshold or a hardware failure occurs, dynamically adjust the measurement noise covariance matrix corresponding to the sensor or stop using the sensor's data for status update during the estimation of the six degrees of freedom pose.

[0099] For example, the system incorporates a visual quality monitoring mechanism. In areas with extremely dense aquatic plants or abnormally turbid water, the visual odometry may fail due to feature point tracking failures. This mechanism calculates the feature point tracking success rate of the visual odometry in real time. When this success rate remains below a preset threshold for an extended period, the system determines that the visual measurement information is unreliable and automatically and dynamically increases the measurement noise covariance matrix of the visual odometry in the EKF update step. This reduces the filter's reliance on visual measurements, making it depend more on IMU and global localization data, thus preventing the pose estimation from diverging due to erroneous visual information.

[0100] Another example is sensor hardware fault avoidance. The sensor health monitor within the M100 module performs online statistical analysis of the data stream. For instance, it calculates the variance of the IMU readings within a 1-second sliding window. If the variance is below a very small threshold (indicating a constant sensor output, i.e., stuck), or above a very large threshold (indicating pure noise from the sensor output), the IMU's health status flag is set to fault. Upon detecting this flag, the S200 module immediately stops using IMU data for EKF prediction-update loops to prevent erroneous data from contaminating the status estimate and triggers the highest level hardware fault alarm.

[0101] Through the coordinated efforts of the above series of steps, the method proposed in this application can achieve centimeter-level precision three-dimensional dynamic monitoring of the underwater aquaculture environment on a dynamically unstable unmanned vessel platform.

[0102] It should be noted that the core of this embodiment, namely the dual-loop state estimator used in S200, is specifically implemented as a decoupled extended Kalman filter.

[0103] The state vector of the filter Defined as a 16-dimensional column vector, its structure is as follows: This vector fully describes the kinematic state of the unmanned vessel, where It is the three-dimensional position of the unmanned vessel in a world coordinate system (e.g., ENU coordinate system), and the unit is meters (m). It is its three-dimensional velocity, with the unit being meters per second (m / s). It is a dimensionless attitude quaternion representing the rotation from the world coordinate system to the ship's coordinate system, and satisfies the following conditions: Constraints; It is the zero bias estimate of the accelerometer on the three axes, in meters per square second (m / s²). This is the zero-bias estimate of the gyroscope across its three axes, expressed in radians per second (rad / s). Incorporating the zero-bias as part of the state vector allows the filter to estimate and compensate for sensor drift online.

[0104] Filter initialization: To address the cold start issue, the filter performs the following initialization procedure after system startup: The system waits until it receives the first valid frame of BeiDou RTK data. The state vector is then initialized using the position and heading angle from this RTK data. The corresponding position in and attitude quaternions Components. Velocity component. IMU zero bias component and All values are initialized to zero. A relatively large initial covariance diagonal matrix is also set. This reflects the high degree of uncertainty in the initial state. For example, The diagonal elements can be set to: the value corresponding to the position component is 10.0. The value corresponding to the velocity component is 1.0. The value corresponding to the attitude component is 0.1. The value corresponding to the zero bias component is 0.01. This large initial uncertainty allows the filter to converge faster in the initial stage and trust external measurements.

[0105] Prediction steps (high-frequency cycling, 200Hz): This step is performed for each new IMU sample. drive.

[0106] State propagation: The function here It is a nonlinear rigid body kinematics model based on the Newton-Euler equations. Specifically, the position update formula is: ,in It is the posture quaternion from the previous moment. The resulting rotation matrix It is the gravity vector. The velocity update formula is: The attitude is updated as described in S210 by adjusting the angular velocity after deducting zero bias. Integrating, we obtain the result. The zero-biased state is assumed to be constant in this step, i.e. and .

[0107] Covariance propagation: .in It is a state transition function For the state vector The Jacobian matrix, in The linearization is performed at the point to obtain the result. This is the process noise covariance matrix, which models the noise and zero-bias random walk of the IMU sensor. The values of its diagonal elements can be calibrated by performing an Allan variance analysis on the static data of the IMU. For example, the gyroscope random walk noise value of a consumer-grade IMU can be set to... The accelerometer random walk noise value can be set to rad / s / √Hz. m / s² / √Hz. These values are discretized and then filled in. The diagonal of a matrix.

[0108] Update steps (event-driven, 10Hz or 30Hz): This step is triggered by the arrival of external measurement data (BeiDou RTK or visual odometry).

[0109] BeiDou RTK Update: When a BeiDou RTK data packet arrives, it provides measurement vectors. .

[0110] Observation model: It is a state vector A function that maps to the measurement space. It directly extracts the position component from the state vector. The attitude quaternion is then converted into Euler angles, and the yaw angle is extracted from it.

[0111] Jacobian matrix: It is an observation model For the state vector The Jacobian matrix.

[0112] Noise measurement: This is the measurement noise covariance matrix. For BeiDou RTK, its diagonal elements are set according to its officially stated accuracy, for example, diag(R_gps) = [0.05^2, 0.05^2, 0.1^2, (1^\circ \cdot \pi / 180)^2], with units of... and .

[0113] Visual odometry update: When the visual odometry outputs a relative rotation quaternion As a measurement vector .

[0114] Observation model: The calculation method is relatively complex. It needs to calculate the arrival time of the previous VIO data based on the state prediction. Up to the current VIO data arrival time The expected relative rotation between them. Its specific expression is as follows: ,in It is an attitude quaternion extracted or interpolated from the state history of the filter.

[0115] Residual calculation: Since the rotation space is not a linear space, the residuals cannot be simply subtracted, but are calculated using quaternion multiplication. .

[0116] Kalman gain, state, and covariance updates: Calculate the new information covariance: Calculate the Kalman gain: Update the state vector: (For VIO updates, the residual calculation method is as described above.) Update the covariance matrix: Wild point removal: To enhance the robustness of the filter and prevent erroneous measurement data (outliers) from contaminating the state estimation, the system introduces a threshold detection mechanism based on Mahalanobis distance. For each new measurement... Before performing the update step, first calculate the square of its Mahalanobis distance: This value follows a chi-square distribution with degrees of freedom equal to the dimension of the measurement vector. The system presets a confidence interval, for example, 95%. If the calculated... If the value is greater than the 95th percentile of the chi-square distribution for the corresponding degrees of freedom, the measurement is determined to be an outlier and is discarded without performing the update step.

[0117] Another embodiment of this application provides a three-dimensional dynamic monitoring system for sea cucumber farming based on an unmanned vessel, such as... Figure 1 As shown, the system can be used to perform the methods provided in any of the above embodiments.

[0118] In one specific embodiment, the system consists of an onboard computing unit deployed on an unmanned vessel and a remote cloud analytics platform. The onboard computing unit is the core of the system, and its physical carrier can be a high-performance embedded computing platform, such as the NVIDIA Jetson AGX Orin. Logically, this computing unit includes the following four core functional modules: M100: Multimodal Data Acquisition Module. This module is responsible for directly interacting with various sensor hardware deployed on the unmanned surface vessel. It contains several sub-components: a high-precision time synchronizer that synchronizes with the GPS pulse-of-seconds signal via the PTP protocol, providing a unified, microsecond-level timestamp for all incoming data packets; a multi-channel data acquisition unit that manages the drivers for devices such as the BeiDou RTK receiver, IMU sensors, and binocular cameras, responsible for reading their raw data streams (e.g., GGA statements in NMEA-0183 format, binary IMU data frames containing angular velocity and acceleration, and image frames from the left and right views); and a sensor health monitor that performs online statistical analysis and validity checks on the data stream, for example, monitoring the variance of IMU readings within a sliding window; if the variance is too low, the sensor is considered stuck; if it is too high, it is considered pure noise. This module provides upstream modules with a raw, timestamped sensor data stream and a structure containing health status flags for each sensor via Remote Procedure Call (RPC) or a publish / subscribe (Pub / Sub) message queue (e.g., ROS2-based topics).

[0119] M200: Decoupled Attitude Fusion and Localization Module. This module is responsible for executing the decoupled extended Kalman filter algorithm in S200. The computational tasks of this module are highly optimized to adapt to the resource constraints of the embedded platform; EKF matrix operations are configured to be accelerated in parallel using CUDA cores on the GPU. Internally, the module logically comprises two main sub-components: an IMU integrator running in a separate high-priority thread at 200Hz, dedicated to executing the high-frequency prediction steps in S210, ensuring the real-time and continuous nature of the attitude output; and a multi-source corrector implemented as an event-driven state machine that subscribes to BeiDou RTK data and visual odometry data published by M100 via a callback function mechanism. Upon receiving new valid data, the state machine is immediately awakened, triggering S220 or the corresponding visual correction update step. The key data structure in memory for this module is a state vector and covariance structure, containing a 16-dimensional array of state vectors and a 16x16 covariance matrix. This module eventually sends a message containing real-time six-degree-of-freedom pose information (e.g., a ROS2 message of type nav_msgs / Odometry) to other modules in the system at a frequency of 200Hz.

[0120] M300: Real-time 3D Reconstruction Module. This module is responsible for executing all steps in S300, transforming sensor data into a meaningful 3D environment model. This module relies on the GPU computing resources of the embedded platform. Internally, it contains several serial processing sub-components: an image preprocessor that performs image distortion correction and stereo rectification (S310); a stereo matching engine that efficiently executes the SGBM algorithm (S320) on the GPU; a point cloud projector that performs reprojection (S330) and coordinate transformation (S340), where matrix multiplication is also performed on the GPU; and a semantic segmenter that loads a pre-trained YOLOv8 model and performs forward inference on the GPU, executing point cloud segmentation and annotation (S350 and S360). This module processes and outputs a semantic point cloud block data object. This object is preferably encoded using an efficient binary serialization format (such as Protocol Buffers), and its structure is defined as: containing a metadata header (64-bit UTC timestamp, UAV ID, 6-DoF pose) and a data volume. The data body is an N×8 single-precision floating-point matrix, where each row represents a point and the eight columns are defined as [X, Y, Z, R, G, B, semantic tag ID, detection confidence].

[0121] M400: Data Compression and Upload Module. This module receives semantic point cloud blocks generated by the M300 and compresses them using an efficient 3D data compression algorithm (e.g., Google Draco algorithm) to reduce the amount of data required for network transmission. Subsequently, this module uploads the compressed data packets to the cloud analysis platform via the ship's onboard 4G or 5G communication module.

[0122] A cloud-based analytics platform. This platform is responsible for storing, integrating, and analyzing data uploaded from one or more unmanned vessels, ultimately constructing a 4D digital twin model covering the entire aquaculture pond. The platform continuously decompresses the received semantic point cloud blocks and inserts them into a four-dimensional data structure based on a sparse voxel grid, according to their spatiotemporal labels. For spatially overlapping voxels, a weighted average or the latest data is used for updating. Finally, the platform provides farmers with rich visualization and analysis functions through a web front-end, such as querying sea cucumber parameter variation curves for any area over any time period, generating sea cucumber distribution density heatmaps, and configuring early warning rules for abnormal aquaculture density, providing decision support for smart aquaculture.

[0123] This application also provides a computer device, which may be a server, a personal computer (PC), a tablet computer, a smartphone, etc. The computer device may include a processor and a memory, with a computer program stored in the memory. The processor executes the computer program stored in the memory to implement the methods described in any of the above embodiments.

[0124] This application also provides a computer-readable storage medium storing a computer program. When executed by a processor, the computer program can implement the methods described in any of the above embodiments. The computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), an optical disk, or a magnetic disk, etc.

[0125] The above description is merely a preferred embodiment of this application and is not intended to limit this application. Various modifications and variations can be made to this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the protection scope of this application.

[0126] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit described above can be implemented in hardware.

[0127] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any changes or substitutions within the technical scope disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

Claims

1. A method for dynamic monitoring of sea cucumber aquaculture based on unmanned vessels, characterized in that, include: Acquire asynchronous data streams generated by multimodal sensors deployed on an unmanned vessel, the asynchronous data streams including low-frequency global positioning data, high-frequency inertial measurement data, and underwater binocular image data; Based on the asynchronous data stream, a dual-loop state estimator is used to estimate the six-degree-of-freedom pose of the unmanned vessel to generate a real-time pose transformation matrix that is temporally aligned with the underwater binocular image data. The estimation of the six-degree-of-freedom pose includes: Based on the high-frequency inertial measurement data, the attitude of the unmanned vessel is predicted at high frequency to generate a prior attitude state. Based on the low-frequency global positioning data, the prior attitude state is corrected to achieve global position anchoring; Based on the real-time pose transformation matrix, the underwater binocular image data is subjected to pose compensation and coordinate system transformation to generate a three-dimensional point cloud in a unified world coordinate system.

2. The method according to claim 1, characterized in that, The asynchronous data stream also includes visual odometry data; The estimation of the six-degree-of-freedom pose also includes: Based on the visual odometry data, the attitude changes caused by the unmanned vessel's roll and pitch in the prior attitude state are corrected.

3. The method according to claim 1, characterized in that, The step of performing high-frequency attitude prediction on the unmanned vessel based on the high-frequency inertial measurement data to generate a prior attitude state includes: Obtain the latest inertial measurement unit sample and the posterior state vector of the previous time step; The quaternion kinematic equations are applied to integrate the angular velocities in the latest inertial measurement unit sample to update the attitude quaternions, thereby generating a priori state estimate for the current moment.

4. The method according to claim 1, characterized in that, The step of correcting the prior attitude state based on the low-frequency global positioning data to achieve global position anchoring includes: The latitude, longitude, and elevation coordinates in the low-frequency global positioning data are converted into positions in the local geodetic coordinate system using a universal transverse Mercator projection. A measurement vector containing the position and heading angle in the local geodetic coordinate system is constructed, and the prior attitude state is updated using an extended Kalman filter to generate a globally corrected posterior pose estimate.

5. The method according to claim 1, characterized in that, Generating a real-time pose transformation matrix that is temporally aligned with the underwater binocular image data includes: Get the timestamp of a frame of underwater binocular image data; In the historical pose state queue, find the poses of the two most recent moments on either side of the timestamp; Linear interpolation is used for the translation components of the poses at the two time points, and spherical linear interpolation is used for the rotation quaternion components to generate interpolated poses that are strictly aligned with the timestamps, and the real-time pose transformation matrix is constructed based on the interpolated poses.

6. The method according to claim 1, characterized in that, The step of performing attitude compensation and coordinate system transformation on the underwater binocular image data based on the real-time pose transformation matrix to generate a 3D point cloud in a unified world coordinate system includes: The underwater binocular image data is stereo-corrected to generate corrected image pairs; Based on the corrected image pair, a disparity map is generated by a stereo matching algorithm; Based on the disparity map and camera intrinsic parameters, the two-dimensional pixel coordinates are reprojected into three-dimensional coordinate points in the camera coordinate system; The real-time pose transformation matrix is applied to transform the three-dimensional coordinate points in the camera coordinate system to the unified world coordinate system to generate the three-dimensional point cloud.

7. The method according to claim 6, characterized in that, After generating the 3D point cloud, the process also includes: At least one image from the corrected image pair is input into a pre-trained underwater target detection model to obtain a two-dimensional bounding box of the sea cucumber target in the image; Using the pixel coordinate range of the two-dimensional bounding box, the corresponding three-dimensional point set is extracted from the three-dimensional point cloud, and sea cucumber semantic tags are attached to the three-dimensional point set.

8. The method according to claim 1, characterized in that, Before acquiring the asynchronous data stream, the process also includes: The system is initialized and calibrated, which includes: solving and storing the intrinsic parameters and distortion coefficients of the underwater binocular camera, as well as the extrinsic parameter transformation matrix between the camera coordinate system and the inertial measurement unit coordinate system; and determining and storing the static zero bias of the inertial measurement unit.

9. The method according to claim 1, characterized in that, Also includes: Monitor the health status of the multimodal sensor online; When the visual quality of any sensor is detected to be below a preset threshold or a hardware failure occurs, during the estimation of the six degrees of freedom pose, the measurement noise covariance matrix corresponding to the sensor is dynamically adjusted or the data from the sensor is stopped for state updates.

10. A dynamic monitoring system for sea cucumber aquaculture based on unmanned vessels, characterized in that, include: The multimodal data acquisition module is used to acquire asynchronous data streams generated by multimodal sensors deployed on the unmanned vessel. The asynchronous data streams include low-frequency global positioning data, high-frequency inertial measurement data, and underwater binocular image data. A decoupled attitude fusion and localization module is used to estimate the six-degree-of-freedom pose of the unmanned vessel based on the asynchronous data stream using a dual-loop state estimator, in order to generate a real-time pose transformation matrix that is temporally aligned with the underwater binocular image data. The decoupled attitude fusion and localization module is configured to: perform high-frequency prediction of the unmanned vessel's attitude based on the high-frequency inertial measurement data to generate a prior attitude state; and correct the prior attitude state based on the low-frequency global positioning data to achieve global position anchoring. The real-time 3D reconstruction module is used to perform attitude compensation and coordinate system transformation on the underwater binocular image data according to the real-time pose transformation matrix, so as to generate a 3D point cloud in a unified world coordinate system.