Autonomous positioning method and system for underwater robot, electronic device, storage medium, program product
By combining positioning with a standard camera and an event camera and applying time smoothing compensation and density maximization compensation, the method addresses the low positioning accuracy of underwater robots and enables high-precision autonomous positioning in harsh environments.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents (China)
- Current Assignee / Owner
- INST OF AUTOMATION CHINESE ACAD OF SCI
- Filing Date
- 2025-08-14
- Publication Date
- 2026-06-19
AI Technical Summary
Existing underwater robot positioning methods suffer from problems such as cumbersome installation, limited coverage, and error accumulation, making it difficult to achieve high-precision positioning, especially when DVL underwater tracking fails.
A method combining a standard camera and an event camera is adopted. Standard frames and event streams are acquired, and time smoothing compensation and density maximization compensation are performed on the event frames. The pose information of the underwater robot is then calculated by minimizing a joint error that includes visual errors and motion sensor measurement errors.
It improves the positioning accuracy of underwater robots, reduces the limitations of ambient lighting conditions, enables the extraction and tracking of feature points in extremely harsh environments, and has better robustness.
Smart Images

Figure CN120997287B_ABST
Abstract
Description
Technical Field
[0001] This disclosure generally relates to the field of underwater robots and, more specifically, to an underwater robot autonomous positioning method and system, an electronic device, a storage medium, and a program product.
Background Technology
[0002] With the rapid development of marine resource exploration and marine scientific research, underwater robots play a crucial role in underwater exploration and operations. Accurate positioning of underwater robots is a prerequisite for their autonomous underwater operations. Currently, mature underwater robot positioning methods mainly include acoustic positioning methods and trajectory estimation methods.
[0003] Acoustic localization methods primarily rely on acoustic localization devices such as ultra-short baseline (USBL) and long baseline (LBL) systems. However, these methods require acoustic devices to be deployed in advance outside the underwater robot. The installation process is cumbersome, the coverage area is limited, and the dependence on external infrastructure severely restricts the autonomous localization capability of underwater robots.
[0004] Although the trajectory estimation method combining an IMU (Inertial Measurement Unit) and a DVL (Doppler Velocity Log) can achieve autonomous localization for underwater robots, it suffers from error accumulation, which makes long-term high-precision positioning difficult. In particular, it often cannot effectively correct errors when DVL tracking fails.
Summary of the Invention
[0005] This disclosure provides an autonomous positioning method for an underwater robot, used to solve at least one of the above-mentioned problems.
[0006] According to a first aspect of the present disclosure, an autonomous localization method for an underwater robot is provided, comprising: acquiring a standard frame captured by a standard camera and an event stream captured by an event camera, wherein the event stream includes multiple events, each event representing a pixel whose brightness change exceeds a set value at a given moment; aggregating multiple consecutive events in the event stream into an original event frame according to a preset rule; performing time smoothing compensation processing and density maximization compensation processing on the original event frame to obtain a motion-compensated event frame, wherein the time smoothing compensation processing is used to reduce the change gradient of events in the same event frame in the time dimension, and the density maximization compensation processing is used to increase the distribution density of events in the same event frame in the spatial dimension; and obtaining the pose information of the underwater robot by minimizing the joint error, wherein the joint error includes motion sensor measurement error and visual errors of the motion-compensated event frame and the standard frame, the visual error including a reprojection error term, the motion sensor measurement error including an inertial error term, a velocity error term, and an acoustic odometry error term, and the pose information including position, motion speed, and attitude.
[0007] Optionally, the step of aggregating multiple consecutive events in the event stream into an original event frame according to a preset rule includes: determining the aggregation number based on the event generation speed and the movement speed of the underwater robot, wherein the event generation speed represents the speed at which the event stream generates events, the aggregation number is positively correlated with the event generation speed and negatively correlated with the movement speed; and aggregating the events in the event stream that are before the acquisition time of the current standard frame into the original event frame.
[0008] Optionally, the aggregation number is calculated by a formula whose variables are: the aggregation number (number of events) for the current original event frame; the aggregation number of the previous original event frame; the rate of change of slicing caused by the event stream; the acquisition period of the standard frame; the current moment; the earliest trigger time of the events to be aggregated; the velocity suppression coefficient; the reference speed; and the motion speed.
[0009] Optionally, the time smoothing compensation processing includes: iteratively optimizing the motion compensation parameters using gradient descent to obtain coarsely selected motion compensation parameters, wherein the objective function of the gradient descent is the spatial gradient of the average time frame, the value of each pixel in the average time frame is the average event time of the corresponding pixel in the motion-compensated event frame, and the average event time is the average acquisition time of the events aggregated at the corresponding pixel.
[0010] Optionally, the density maximization compensation process includes: with the goal of maximizing event pixel density, performing random sampling optimization of motion compensation parameters based on the coarsely selected motion compensation parameters to obtain the final motion compensation parameters, wherein the event pixel density represents the density of events aggregated on pixels in the event frame.
[0011] Optionally, the reprojection error term is obtained by performing feature tracking processing on the corresponding frame and computing, for each successfully tracked feature point, the error determined from the depth of the feature point in the corresponding frame and the image coordinate point corresponding to that feature point. The feature tracking processing includes: tracking existing feature points using optical flow, supplementing new feature points using a corner detector, and removing outlier feature points using a random sample consensus (RANSAC) algorithm. The depth of a feature point and its corresponding image coordinate point are obtained by triangulating continuously tracked feature points. The inertial error term is the deviation between the current pose detection value and the current pose prediction value. The current pose prediction value is obtained by pre-integrating multiple historical pose detection values and combining the pre-integration result with the pose detection value immediately preceding the current pose detection value. The detection time of each historical pose detection value lies between the acquisition times of the previous standard frame and the current standard frame. The velocity error term is the deviation between the current velocity detection value and the current velocity estimate, wherein the current velocity detection value is the velocity component of the current pose detection value, the pose detection value is expressed in the world coordinate system, and the current velocity estimate is obtained by transforming the current relative velocity detection value from the robot coordinate system to the world coordinate system. The acoustic odometry error term is the deviation between the current relative displacement change and the current estimated relative displacement change, wherein the current relative displacement change is the relative displacement change between the current pose detection value and the pose detection value of the previous standard frame, and the current estimated relative displacement change is obtained by track extrapolation from the acoustic odometry detection data.
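The velocity and acoustic odometry error terms described above reduce to simple vector differences. The sketch below is illustrative only: the function names, the pure-yaw rotation helper `rot_z`, and the omission of sensor lever arms, biases, and covariance weighting are all assumptions, not details taken from this disclosure.

```python
import numpy as np


def rot_z(yaw):
    """Hypothetical helper: rotation matrix about the z axis (body -> world)."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def velocity_residual(v_world_state, R_world_body, v_body_meas):
    """Velocity error term: velocity component of the pose state (world frame)
    minus the body-frame relative velocity measurement rotated into the world frame."""
    return v_world_state - R_world_body @ v_body_meas


def odometry_residual(p_curr, p_prev, delta_p_extrapolated):
    """Acoustic odometry error term: relative displacement between the current
    pose and the pose at the previous standard frame, minus the displacement
    extrapolated from the acoustic odometry data."""
    return (p_curr - p_prev) - delta_p_extrapolated
```

For example, a robot yawed by 90 degrees that measures 1 m/s forward in its own frame should be moving along the world y axis, so `velocity_residual(np.array([0.0, 1.0, 0.0]), rot_z(np.pi / 2), np.array([1.0, 0.0, 0.0]))` is approximately zero.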
[0012] According to a second aspect of the present disclosure, an electronic device is provided, comprising: at least one processor; and at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform an underwater robot autonomous localization method according to an exemplary embodiment of the present disclosure.
[0013] According to a third aspect of the present disclosure, an underwater robot autonomous positioning system is provided, comprising: a data acquisition device including a standard camera, an event camera, an inertial measurement unit, a velocity measurement unit, and an acoustic odometry unit; and an electronic device according to exemplary embodiments of the present disclosure.
[0014] According to a fourth aspect of the present disclosure, a computer-readable storage medium is provided, wherein instructions in the computer-readable storage medium, when executed by at least one processor, cause at least one processor to perform an underwater robot autonomous localization method according to an exemplary embodiment of the present disclosure.
[0015] According to a fifth aspect of the present disclosure, a computer program product is provided, including computer instructions that, when executed by at least one processor, cause at least one processor to perform an underwater robot autonomous localization method according to an exemplary embodiment of the present disclosure.
[0016] The technical solutions provided by the embodiments of this disclosure offer at least the following beneficial effects: Based on the underwater robot autonomous localization method and system, electronic device, storage medium, and program product of this disclosure, the large localization error caused by DVL tracking failure can be compensated for by visual sensors (i.e., standard camera and event camera), resulting in better localization accuracy. Furthermore, by using an event camera, the limitations imposed by ambient lighting conditions when using only a standard camera can be reduced, enabling feature point extraction and tracking in extremely dark and strongly reflective underwater environments, thus exhibiting better robustness.
[0017] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and are not intended to limit this disclosure.
Attached Figure Description
[0018] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this disclosure and, together with the description, serve to explain the principles of this disclosure, and are not intended to unduly limit this disclosure.
[0019] Figure 1 is a flowchart of an underwater robot autonomous localization method according to an exemplary embodiment of the present disclosure.
[0020] Figure 2 is a schematic diagram comparing the aggregation number with the acquisition period of the standard frame according to an exemplary embodiment of this disclosure.
[0021] Figure 3 is a schematic diagram illustrating the principle of calculating the joint error according to an exemplary embodiment of the present disclosure.
[0022] Figure 4 is a logical schematic diagram of an underwater robot autonomous localization method according to an exemplary embodiment of the present disclosure.
[0023] Figure 5 is a block diagram of an electronic device according to exemplary embodiments of the present disclosure.
[0024] Figure 6 is a schematic diagram of an underwater robot autonomous positioning system according to an exemplary embodiment of the present disclosure.
Detailed Implementation
[0025] To enable those skilled in the art to better understand the technical solutions of this disclosure, the technical solutions in the embodiments of this disclosure will be clearly and completely described below with reference to the accompanying drawings.
[0026] It should be noted that the terms "first," "second," etc., used in the specification, claims, and accompanying drawings of this disclosure are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this disclosure described herein can be implemented in orders other than those illustrated or described herein. The embodiments described in the following examples do not represent all embodiments consistent with this disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this disclosure as detailed in the appended claims.
[0027] It should be noted that the phrase "at least one of several items" in this disclosure refers to three parallel cases: "any one of the several items", "a combination of any number of the several items", and "all of the several items". For example, "including at least one of A and B" includes the following three parallel cases: (1) including A; (2) including B; (3) including A and B. Another example is "performing at least one of step one and step two", which means the following three parallel cases: (1) performing step one; (2) performing step two; (3) performing both step one and step two.
[0028] The following will describe in detail, with reference to the accompanying drawings, an underwater robot autonomous positioning method and system, electronic device, storage medium, and program product according to exemplary embodiments of the present disclosure.
[0029] Figure 1 is a flowchart of an autonomous localization method for an underwater robot according to an exemplary embodiment of the present disclosure. The method can be executed on an electronic device with sufficient computing power.
[0030] Referring to Figure 1, in step S101, standard frames captured by the standard camera and the event stream captured by the event camera are acquired.
[0031] A standard camera is used to take pictures at a certain frequency; these pictures are called standard frames. An event camera is a novel biomimetic vision sensor where each pixel has an independent brightness change detector. When the brightness change of a pixel exceeds a set value (e.g., including but not limited to ±5%), that pixel immediately outputs an event. That is, each event represents a pixel's brightness change exceeding the set value at a given moment. These output events constitute an event stream; that is, an event stream includes multiple events. It should be understood that the events in the event stream are arranged in chronological order, and each event has its own timestamp indicating its acquisition time. Similarly, each standard frame also has its own timestamp. The difference is that the time difference between two adjacent standard frames is a fixed value, which is the shooting cycle of the standard camera, while the time difference between two adjacent events is not fixed and may be 0 (i.e., two adjacent events occur simultaneously, meaning two different pixels experience events at the same time).
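To make the event definition concrete, the sketch below derives events from two brightness snapshots using the example ±5% threshold. It is a simplification under stated assumptions: a real event camera fires asynchronously per pixel (and commonly thresholds log intensity), whereas this hypothetical `events_from_frames` helper differences two whole images and stamps every event with one time `t`. That shared timestamp also illustrates why two adjacent events in the stream can have a time difference of 0.

```python
import numpy as np


def events_from_frames(prev, curr, t, threshold=0.05):
    """Return (x, y, t, polarity) for every pixel whose relative brightness
    change between two snapshots exceeds `threshold` (5% by default)."""
    prev = np.asarray(prev, dtype=float)
    curr = np.asarray(curr, dtype=float)
    # Relative change; the floor avoids division by zero on black pixels.
    rel_change = (curr - prev) / np.maximum(prev, 1e-6)
    ys, xs = np.nonzero(np.abs(rel_change) > threshold)
    # +1 for brightening, -1 for darkening.
    pol = np.sign(rel_change[ys, xs]).astype(int)
    return [(int(x), int(y), t, int(p)) for x, y, p in zip(xs, ys, pol)]
```

With `prev = [[100, 100], [100, 100]]` and `curr = [[110, 100], [100, 90]]`, the top-left pixel brightens by 10% and the bottom-right pixel darkens by 10%, so two events of opposite polarity share the same timestamp.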
[0032] As an example, the timestamp of the standard frame can be used as a reference, and the timestamp of the synchronized event stream can be used to represent the same moment, facilitating rapid comparison. As an example, if the event camera has the ability to output synchronized standard frames simultaneously, it is equivalent to having a standard camera embedded. In this case, the synchronized standard frame and event stream are actually obtained from the event camera, which is also an implementation method of this disclosure.
[0033] It should be understood that in order to achieve real-time autonomous positioning of the underwater robot, the process disclosed herein will run repeatedly during the movement of the underwater robot. In this process, step S101 will keep acquiring standard frames and event streams, while subsequent steps S102 to S104 are specifically executed for the latest acquired standard frame (hereinafter referred to as the current standard frame), and thus can be executed in response to acquiring the latest standard frame.
[0034] In step S102, multiple consecutive events in the event stream are aggregated into an original event frame according to a preset rule.
[0035] This step converts the event stream into event frames, which are structured representations of the event stream. Event frames turn the event stream into human-interpretable images (e.g., using different colors to represent event polarity: red for brightening, blue for darkening) and are readily compatible with existing image processing algorithms. Specifically, multiple consecutive events occurring before the acquisition time of the current standard frame can be aggregated into a single original event frame, ensuring a one-to-one correspondence between original event frames and standard frames.
[0036] Optionally, step S102 includes: determining the aggregation quantity based on the event generation speed and the underwater robot's movement speed, wherein the event generation speed represents the speed at which the event stream generates events, the aggregation quantity is positively correlated with the event generation speed and negatively correlated with the movement speed; and aggregating the events in the event stream that occur before the acquisition time of the current standard frame into an original event frame. By determining the appropriate event aggregation quantity for the current standard frame based on the actual event generation speed and the underwater robot's movement speed, the event window size can be adaptively and dynamically adjusted. Specifically, the event generation speed reflects texture complexity. Under the same motion conditions, the more events triggered within the same time period, i.e., the faster the events are generated, the more complex the texture. Aggregating more events in this case ensures that the aggregated original event frame contains sufficient feature information, and vice versa. The underwater robot's movement speed reflects the scene's motion. The faster the movement, the fewer events are aggregated, which reduces motion blur caused by too many events. This adaptive spatiotemporal slicing strategy can effectively acquire sufficient information in low-light, high-contrast, and underwater particulate-suspension scenarios, overcoming the limitations of traditional fixed-window methods in underwater environments.
[0037] Optionally, the number of aggregations is calculated using the following formula:
[0038]
[0039] In the above formula, the variables denote, in order: the aggregation number for the current original event frame; the aggregation number of the previous original event frame; the rate of change of slicing caused by the event stream; the acquisition period of the standard frame; the current moment; the earliest trigger time of the events to be aggregated; the velocity suppression coefficient; the reference speed; and the motion speed at time k. Some of these parameters are set in advance. The motion speed can be replaced by the speed in the pose information determined at the previous time step (i.e., time step k-1). One term of the formula is an adjustment that accounts for the change in aggregation number caused by different event generation rates; another term represents the change in aggregation number caused by the camera's movement speed, so that variations in camera speed are incorporated when adjusting the aggregation number. Figure 2 shows a comparison between the number of aggregated events and the acquisition period of the standard frame.
[0040] It should be understood that, because aggregation proceeds backwards from the current moment, the earliest trigger time of the aggregated events changes with the aggregation number: the larger the aggregation number, the more events are aggregated and the earlier the earliest trigger time. Therefore, in practice the earliest trigger time is not a fixed value; it is redetermined as the aggregation number changes. As an example, a minimum aggregation number σ can also be set, and the larger of the value calculated by the above formula and σ is used as the final aggregation number. This means that at least σ events are aggregated each time, ensuring that each original event frame contains a sufficient number of events.
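The exact aggregation formula is not reproduced in this text, so the sketch below does not implement it. `aggregation_number` is a hypothetical stand-in that only respects the stated properties (more events when the event generation rate is higher, fewer when the robot moves faster, never fewer than σ); `n_base`, `ref_rate`, and the functional form are assumptions. `select_events` shows the backward aggregation from the standard-frame timestamp and how the earliest trigger time moves earlier as the aggregation number grows.

```python
import numpy as np


def aggregation_number(event_rate, ref_rate, speed, ref_speed, n_base, beta, sigma):
    """HYPOTHETICAL stand-in for the disclosure's formula: increases with the
    event generation rate, decreases with motion speed, floored at sigma."""
    n = n_base * (event_rate / ref_rate) / (1.0 + beta * speed / ref_speed)
    return max(sigma, int(round(n)))


def select_events(timestamps, t_frame, n):
    """Aggregate backwards from the standard-frame timestamp: take the n most
    recent events at or before t_frame (timestamps assumed sorted ascending).
    Returns the chosen indices and the earliest trigger time among them."""
    ts = np.asarray(timestamps, dtype=float)
    if n <= 0:
        return np.array([], dtype=int), None
    candidates = np.nonzero(ts <= t_frame)[0]
    chosen = candidates[-n:]
    t_earliest = float(ts[chosen[0]]) if chosen.size else None
    return chosen, t_earliest
```

For instance, with timestamps `[0.0, 0.1, 0.2, 0.3, 0.4]` and a standard frame at 0.3, choosing 2 events gives an earliest trigger time of 0.2, while choosing 3 pushes it back to 0.1.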
[0041] As an example, after the events to be aggregated are determined from the acquisition time of the current standard frame and the aggregation number, the aggregation can be expressed as $I(\mathbf{x}) = \sum_{e_j \in \mathcal{E}} \delta(\mathbf{x} - \mathbf{x}_j)$, where $\mathcal{E}$ denotes the set of all events to be aggregated, $e_j$ denotes one of these events, $\mathbf{x}_j$ denotes its pixel coordinates, $\delta(\cdot)$ is the impulse function, whose value is 1 if the event is triggered at pixel $\mathbf{x}$ (i.e., $\mathbf{x} = \mathbf{x}_j$) and 0 otherwise, and $I(\mathbf{x})$ denotes the intensity value of the event frame. In other words, the intensity value $I(\mathbf{x})$ of the event frame at pixel $\mathbf{x}$ is the number of events triggered at that location.
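The counting rule above maps directly onto an unbuffered scatter-add. This is a minimal sketch assuming events are given as integer pixel coordinate arrays; `event_count_frame` is a hypothetical name.

```python
import numpy as np


def event_count_frame(xs, ys, height, width):
    """I(x) = sum_j delta(x - x_j): count the events triggered at each pixel."""
    frame = np.zeros((height, width), dtype=np.int32)
    # np.add.at is unbuffered, so repeated pixel indices accumulate correctly
    # (frame[ys, xs] += 1 would count a repeated pixel only once).
    np.add.at(frame, (np.asarray(ys), np.asarray(xs)), 1)
    return frame
```

Two events at pixel (x=1, y=0) and one at (x=2, y=1) give a frame whose entries are 2 and 1 at those locations and 0 elsewhere.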
[0042] However, when the number of events used to synthesize event frames is small, the generated event frame information is sparse, making it difficult to obtain sufficient information for feature point extraction and tracking. When a large number of events are selected directly for synthesizing event frames, simply accumulating this event information may cause motion blur. Although this disclosure can reduce motion blur to some extent by adaptively adjusting the aggregation number, there is still room for improvement in this reduction effect.
[0043] Therefore, to further remove motion blur, motion compensation can be performed on the aggregated original event frames. Although motion compensation schemes exist in related technologies, they often rely on IMU integration to estimate camera motion. However, in underwater environments, IMU acceleration signals have significant noise and the motion is often slow (low acceleration), causing IMU noise to significantly affect the accuracy of motion compensation.
[0044] To address this issue, referring again to Figure 1, in step S103, time smoothing compensation and density maximization compensation are performed on the original event frame to obtain a motion-compensated event frame.
[0045] Temporal smoothing compensation reduces the temporal gradient of events within the same event frame, while density maximization compensation increases the spatial distribution density of events within the same event frame. This two-stage compensation method first smooths the temporal information of events to initially fit the motion compensation parameters within the current event window, obtaining a coarse camera motion model. Then, by maximizing the spatial distribution density of events, it minimizes the spread of events on the correction plane, obtaining a more accurate camera motion model. This helps to restore events triggered at the same point in space (which may not be at the same pixel due to camera motion) to the same pixel as much as possible, improving image clarity and achieving motion compensation. Furthermore, this compensation method is IMU-independent and therefore unaffected by IMU noise, further improving motion compensation accuracy.
[0046] For each event e=(x,y,t), its pixel coordinates in the reference time plane are calculated based on the estimated camera motion model. The motion-compensated event frame generation formula can be simplified as follows:
[0047]
[0048] In the above formula, the compensated coordinates are the pixel coordinates of the event after motion compensation.
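Because the disclosure's camera motion model and compensation formula are not reproduced here, the sketch below assumes the simplest common parameterization: a constant image-plane velocity `flow = (vx, vy)` in pixels per second, under which each event is shifted back to the reference time by `x - vx * (t - t_ref)`. The function names, nearest-pixel rounding, and dropping of events warped outside the image are assumptions; the disclosure's parameters may also include rotation, scaling, or affine terms.

```python
import numpy as np


def warp_events(xs, ys, ts, flow, t_ref):
    """Move each event to the reference time t_ref under an ASSUMED constant
    image-plane velocity flow = (vx, vy), in pixels per second."""
    vx, vy = flow
    dt = np.asarray(ts, dtype=float) - t_ref
    return np.asarray(xs, dtype=float) - vx * dt, np.asarray(ys, dtype=float) - vy * dt


def compensated_frame(xs, ys, ts, flow, t_ref, height, width):
    """Motion-compensated event frame: count warped events per pixel using
    nearest-pixel rounding; events warped outside the image are dropped."""
    wx, wy = warp_events(xs, ys, ts, flow, t_ref)
    px, py = np.rint(wx).astype(int), np.rint(wy).astype(int)
    inside = (px >= 0) & (px < width) & (py >= 0) & (py < height)
    frame = np.zeros((height, width), dtype=np.int32)
    np.add.at(frame, (py[inside], px[inside]), 1)
    return frame
```

For a point moving right at 50 px/s that triggers six events at 20 ms intervals, the uncompensated frame (flow `(0, 0)`) shows a six-pixel streak, while flow `(50, 0)` collapses all six events onto one pixel.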
[0049] It should be understood that the camera motion model includes the motion parameters of the event camera, such as position, pose, translation, rotation angle, etc., while the motion compensation parameters are used to directly correct the original event frame, such as pixel-level translation, rotation matrix, scaling factor, affine transformation coefficient, etc.
[0050] It should also be understood that, after the first stage (time smoothing compensation) is completed, coarse motion compensation parameters and a corresponding coarse compensation event frame can be obtained. On this basis, in the second stage (density maximization compensation), final motion compensation parameters for the coarse compensation event frame can be determined and then used to perform motion compensation on the coarse compensation event frame, yielding the final motion-compensated event frame. Alternatively, only the coarse motion compensation parameters are obtained in the first stage, without performing motion compensation; in the second stage, these coarse parameters are further optimized to obtain the final motion compensation parameters, which are then used to perform motion compensation on the original event frame, yielding the final motion-compensated event frame. Both are implementations of this disclosure and fall within its scope of protection.
[0051] Optionally, the time smoothing compensation processing includes: iteratively optimizing the motion compensation parameters using gradient descent to obtain coarse motion compensation parameters. The objective function of the gradient descent is the spatial gradient of the average time frame. The value of each pixel in the average time frame is the average event time of the corresponding pixel in the motion-compensated event frame, and the average event time is the average acquisition time of the events aggregated at that pixel. It should be understood that each pixel in the event frame typically contains either no compensated events or one or more compensated events. The former indicates that no event was triggered at that pixel. In the latter case, if the compensation is correct, the events compensated to the same pixel should have been triggered by the same point in space, continuously over this period, so the average event times of different pixels should be close to one another. Thus, the smaller the variation in pixel values, the better the compensation; that is, the smaller the spatial gradient of the average time frame, the more accurately the motion compensation parameters are estimated. Based on this analysis, this embodiment uses gradient descent, with the spatial gradient of the average time frame as the objective function, to iteratively calculate the coarse motion compensation parameters, effectively smoothing the temporal information of events.
[0052] Specifically, during the iteration process, the original event frames are first motion-compensated using initial motion compensation parameters (which can be randomly determined or obtained using a certain calculation method; this disclosure does not impose any restrictions on this). This results in temporary motion-compensated event frames. Then, for each pixel in these temporary motion-compensated event frames, the average acquisition time of the aggregated events is calculated to obtain the average event time for each pixel. These average event time values are then projected onto the corresponding pixels to form temporary average time frames. Next, the spatial gradient of these temporary average time frames is calculated. For example, the sum of the gradients of the average time frames can be calculated using the Sobel operator, which serves as the objective function. Then, the gradient of the objective function with respect to each motion compensation parameter is calculated (here, gradient calculation refers to differentiation, not the spatial gradient mentioned earlier). The motion compensation parameters for this iteration can then be obtained using the gradient descent method. Finally, the process returns to the first step and proceeds to the next iteration. This next iteration will be based on the motion compensation parameters obtained in the previous iteration, but will still involve motion compensation of the original event frames. This iteration continues until the termination condition is met. The termination condition can use common termination conditions for the gradient descent method; this disclosure does not impose any restrictions on this.
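The iteration above can be sketched end to end under the same assumed constant-flow model used for illustration earlier. Everything named here is hypothetical: the average time frame stores per-pixel mean timestamps relative to `t_ref` (0 where no event landed), the objective is the summed absolute Sobel responses, the gradient with respect to the two flow parameters is taken by central finite differences (step `h`), and the best iterate seen is kept as a simple termination safeguard. The Sobel correlation is written out in NumPy to keep the block dependency-free.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T


def correlate3x3(img, kernel):
    """3x3 correlation with zero padding."""
    padded = np.pad(img, 1)
    h, w = img.shape
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def average_time_frame(xs, ys, ts, flow, t_ref, height, width):
    """Per-pixel mean (t - t_ref) of the warped events; 0 where no event lands."""
    dt = np.asarray(ts, dtype=float) - t_ref
    px = np.rint(np.asarray(xs, dtype=float) - flow[0] * dt).astype(int)
    py = np.rint(np.asarray(ys, dtype=float) - flow[1] * dt).astype(int)
    inside = (px >= 0) & (px < width) & (py >= 0) & (py < height)
    t_sum = np.zeros((height, width))
    count = np.zeros((height, width))
    np.add.at(t_sum, (py[inside], px[inside]), dt[inside])
    np.add.at(count, (py[inside], px[inside]), 1.0)
    return np.divide(t_sum, count, out=np.zeros_like(t_sum), where=count > 0)


def time_smoothness_cost(xs, ys, ts, flow, t_ref, height, width):
    """Objective: summed absolute Sobel gradients of the average time frame."""
    avg = average_time_frame(xs, ys, ts, flow, t_ref, height, width)
    return float(np.abs(correlate3x3(avg, SOBEL_X)).sum()
                 + np.abs(correlate3x3(avg, SOBEL_Y)).sum())


def coarse_flow(xs, ys, ts, t_ref, height, width,
                init=(0.0, 0.0), lr=100.0, h=5.0, iters=50):
    """Gradient descent on the flow parameters with finite-difference gradients;
    returns the best (lowest-cost) parameters seen and their cost."""
    def cost(f):
        return time_smoothness_cost(xs, ys, ts, f, t_ref, height, width)

    flow = np.array(init, dtype=float)
    best, best_cost = flow.copy(), cost(flow)
    for _ in range(iters):
        grad = np.array([(cost(flow + h * e) - cost(flow - h * e)) / (2 * h)
                         for e in np.eye(2)])
        flow = flow - lr * grad
        c = cost(flow)
        if c < best_cost:
            best, best_cost = flow.copy(), c
    return best, best_cost
```

On the six-event example of a point moving at 50 px/s, the objective is 3.84 for zero flow (a streak of increasing timestamps) but only 0.8 at the true flow (all events collapsed onto one pixel), which is the behavior the time smoothing stage relies on. Nearest-pixel rounding makes the objective piecewise constant, which is why this sketch uses a fairly large finite-difference step.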
[0053] Optionally, the density maximization compensation process includes: optimizing the motion compensation parameters by random sampling based on the coarsely selected motion compensation parameters, with the goal of maximizing event pixel density, to obtain the final motion compensation parameters. Here, event pixel density represents the density of events clustered at pixels within an event frame; the pixels refer to the entire event frame, not a single pixel. In other words, the more events cluster towards a smaller number of pixels in the event frame, the denser the event distribution, the higher the event pixel density, and the clearer the image; conversely, the more dispersed the event distribution, the lower the event pixel density, and the blurrier the image. By randomly sampling the motion compensation parameters and finding those that maximize event pixel density, motion blur can be significantly reduced.
[0054] As an example, before performing formal random sampling optimization, the event frames used as the basis for optimization (as mentioned earlier, these can be coarse-compensated event frames or original event frames) can be binarized. That is, if the number of events at a pixel is greater than 0, its value is set to 1; otherwise, it is set to 0. This simplifies subsequent compensation calculations. Subsequent compensation calculations can minimize the non-zero pixel regions of the binarized image, thereby maximizing the feature density of the event frames and obtaining clear event frames.
[0055] It should be understood that regardless of the type of event frame used as the basis for optimization, coarse motion compensation parameters are required. Specifically, the former requires the use of coarse motion compensation parameters to determine the coarse compensation event frame, while the latter requires the use of coarse motion compensation parameters to determine the range of values for random sampling. For example, the range of values can be obtained by fluctuating the coarse motion compensation parameters up or down by a certain percentage. The range of values for the former is independent of the coarse motion compensation parameters and can be determined by other reasonable methods. This disclosure does not impose any restrictions on this.
[0056] The specific random sampling optimization process can be as follows. First, determine the aforementioned random sampling value range and initialize a set of optimal parameters and their corresponding event pixel density. The initial optimal parameters can be randomly determined or empty, and their corresponding event pixel density can be 0. Next, randomly sample a set of parameters within the value range and use the currently sampled parameters to perform motion compensation on the event frame used as the optimization basis, obtaining a temporary motion-compensated event frame. Then, calculate the event pixel density of this temporary motion-compensated event frame. If the current event pixel density is greater than the event pixel density corresponding to the optimal parameters, replace the optimal parameters with the currently sampled parameters and update the recorded density accordingly. Repeat the random sampling and subsequent steps until the termination condition is met, and use the current optimal parameters as the final motion compensation parameters. It should be understood that performing motion compensation on the event frame used as the optimization basis with the final motion compensation parameters yields the final motion-compensated event frame.
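The random sampling loop can be sketched as follows, again assuming the illustrative constant-flow model rather than the disclosure's actual parameterization. The density metric (in-image events divided by the number of non-zero pixels of the binarized frame) is an assumed concrete form of "event pixel density". The sampling range (±20% of each coarse parameter, with a 1 px/s floor) is likewise an assumption. Initializing the optimum with the coarse parameters is a design choice; the text also allows a random or empty start with density 0.

```python
import numpy as np


def event_pixel_density(xs, ys, ts, flow, t_ref, height, width):
    """ASSUMED metric: in-image events per non-zero pixel of the binarized
    compensated frame (fewer occupied pixels means a sharper frame)."""
    dt = np.asarray(ts, dtype=float) - t_ref
    px = np.rint(np.asarray(xs, dtype=float) - flow[0] * dt).astype(int)
    py = np.rint(np.asarray(ys, dtype=float) - flow[1] * dt).astype(int)
    inside = (px >= 0) & (px < width) & (py >= 0) & (py < height)
    binary = np.zeros((height, width), dtype=bool)
    binary[py[inside], px[inside]] = True  # binarization: any event -> 1
    occupied = int(binary.sum())
    return float(inside.sum() / occupied) if occupied else 0.0


def refine_flow(xs, ys, ts, t_ref, height, width, coarse,
                rel_range=0.2, n_samples=200, seed=0):
    """Random-sampling search around the coarse parameters; keeps the sample
    with the highest event pixel density seen so far."""
    rng = np.random.default_rng(seed)
    coarse = np.asarray(coarse, dtype=float)
    half = np.maximum(np.abs(coarse) * rel_range, 1.0)  # half-width of the range
    best = coarse.copy()
    best_density = event_pixel_density(xs, ys, ts, best, t_ref, height, width)
    for _ in range(n_samples):
        cand = rng.uniform(coarse - half, coarse + half)
        d = event_pixel_density(xs, ys, ts, cand, t_ref, height, width)
        if d > best_density:  # replace the optimum with the better sample
            best, best_density = cand, d
    return best, best_density
```

In the six-event example, the density is 1.0 at zero flow (six occupied pixels), 2.0 at a coarse flow of `(30, 0)`, and 6.0 at the true flow `(50, 0)`. The returned density can never fall below that of the coarse parameters.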
[0057] In step S104, the pose information of the underwater robot is obtained by minimizing the joint error. The joint error includes the motion sensor measurement error and the visual errors of the motion-compensated event frame and the standard frame. The visual error includes reprojection error terms. The motion sensor measurement error includes an inertial error term, a velocity error term, and an acoustic odometry error term. Each of these represents the error between the corresponding motion sensor measurement and the pose information of the underwater robot to be optimized. The pose information includes position, motion speed, and attitude.
[0058] As an example, the joint error can be expressed as:
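[0059]
$$\min_{\mathcal{X}}\Big\{\sum_{(i,k)\in\zeta_e}\big\|r_e(z^{e}_{i,k},\mathcal{X})\big\|^2_{\Sigma_e}+\sum_{(i,k)\in\zeta_g}\big\|r_g(z^{g}_{i,k},\mathcal{X})\big\|^2_{\Sigma_g}+\sum_{k}\big\|r_I(z^{b_k}_{b_{k+1}},\mathcal{X})\big\|^2_{\Sigma_I}+\sum_{k}\big\|r_V(z^{v}_{k},\mathcal{X})\big\|^2_{\Sigma_V}+\sum_{k}\big\|r_D(z^{d}_{k,k+1},\mathcal{X})\big\|^2_{\Sigma_D}+\big\|r_p\big\|^2\Big\}$$
[0060] In the above formula, the six summed terms are, in order: the reprojection error term of the motion-compensated event frames, the reprojection error term of the standard frames, the inertial error term, the velocity error term, the acoustic odometry error term, and the marginalization prior error term. $\mathcal{X}$ denotes the state variables to be optimized. $r_e(z^{e}_{i,k},\mathcal{X})$ denotes the visual residual of feature point $i$ in the $k$-th event frame, and $\zeta_e$ denotes the set of event-frame visual observations. Similarly, $r_g(z^{g}_{i,k},\mathcal{X})$ denotes the standard visual residual of feature point $i$ in the $k$-th standard frame, and $\zeta_g$ is the corresponding observation set; together they constitute the visual residual. $r_I$ denotes the IMU residual. $r_V$ and $r_D$ denote the DVL velocity residual and the trajectory estimation (dead-reckoning) residual, respectively, and $r_p$ denotes the marginalization prior residual. $\Sigma_e$, $\Sigma_g$, $\Sigma_I$, $\Sigma_V$ and $\Sigma_D$ are the covariance matrices of the respective residuals. $\|\cdot\|_{\Sigma}$ denotes the Mahalanobis norm, $\|r\|^2_{\Sigma}=r^{\top}\Sigma^{-1}r$.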
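For a fixed state, the joint cost is a sum of squared Mahalanobis norms plus the unweighted prior term. The sketch below only evaluates that cost. The residual vectors and covariances passed in are hypothetical placeholders, and a nonlinear least-squares solver would still be needed to minimize the cost over the state.

```python
import numpy as np


def mahalanobis_sq(r, cov):
    """Squared Mahalanobis norm r^T Sigma^{-1} r (solve instead of an explicit inverse)."""
    r = np.atleast_1d(np.asarray(r, dtype=float))
    return float(r @ np.linalg.solve(np.atleast_2d(cov), r))


def joint_cost(weighted_terms, prior_residual=None):
    """Sum of covariance-weighted squared residuals plus an unweighted prior term.

    weighted_terms: iterable of (residual, covariance) pairs, for example the
    event-frame and standard-frame reprojection residuals and the IMU,
    DVL-velocity and dead-reckoning residuals (hypothetical inputs).
    """
    cost = sum(mahalanobis_sq(r, cov) for r, cov in weighted_terms)
    if prior_residual is not None:
        p = np.asarray(prior_residual, dtype=float)
        cost += float(p @ p)
    return cost
```

A term with a large covariance contributes little to the cost. This is how the weighting lets each sensor count in proportion to its assumed reliability.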
[0061] In this step, the position, velocity, and attitude of the underwater robot are obtained by error minimization. Jointly minimizing the weighted sum of the above error terms exploits the strengths of each sensor and fully integrates multi-source information. This overcomes the limitations of individual sensors in harsh underwater environments and achieves high-precision, low-drift, globally consistent state estimation.
[0062] The underwater robot autonomous localization method according to exemplary embodiments of this disclosure has advantages over trajectory estimation based on an IMU and a DVL. It uses visual sensors (i.e., the standard camera and the event camera) to compensate for the large localization errors that arise when the DVL fails to track the bottom, thereby achieving better localization accuracy. Furthermore, using an event camera reduces the dependence on ambient lighting that comes with using only a standard camera. This enables feature point extraction and tracking in extremely dark and highly reflective underwater environments and thus gives better robustness.
[0063] Optionally, for the error terms in the joint error, the reprojection error term is obtained by performing feature tracking on the corresponding frame. It is based on the error between the depth of each successfully tracked feature point in that frame and the image coordinate point corresponding to that feature point. The feature tracking includes three operations: tracking existing feature points with optical flow, supplementing new feature points with a corner detector, and removing outlier feature points with a random sample consensus (RANSAC) algorithm. The depth of a feature point and its corresponding image coordinate point are obtained by triangulating continuously tracked feature points. The spatial points obtained by triangulation are called landmark points. The corresponding frame is the frame whose reprojection error is to be calculated, namely a motion-compensated event frame or a standard frame. In other words, the reprojection error of a motion-compensated event frame is obtained from the image coordinates and three-dimensional coordinates (i.e., depth) of the landmark points in that event frame. Likewise, the reprojection error of a standard frame is obtained from the image coordinates and three-dimensional coordinates (i.e., depth) of the landmark points in that standard frame. The reprojection error of the standard frame can be expressed as:
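[0064]
$$ r_g(z^{g}_{j,k},\mathcal{X}) = z^{g}_{j,k} - \pi\big(T^{c}_{b}\,T^{b_k}_{w}\,P^{w}_{j}\big) $$
[0065] In the above formula:
- $z^{g}_{j,k}$ is the image coordinate of the $j$-th landmark in the $k$-th standard frame.
- $P^{w}_{j}$ is the coordinate of that landmark in the world coordinate system (in homogeneous form).
- $T^{b_k}_{w}$ is the transformation matrix from the world coordinate system to the IMU coordinate system at time $k$.
- $T^{c}_{b}$ is the transformation matrix from the IMU coordinate system to the camera coordinate system.
- $\pi(\cdot)$ is the camera projection function.

Similarly, the reprojection error of the motion-compensated event frame can be expressed as:
[0066]
$$ r_e(z^{e}_{j,k},\mathcal{X}) = z^{e}_{j,k} - \pi\big(T^{c}_{b}\,T^{b_k}_{w}\,P^{w}_{j}\big) $$
[0067] In the above formula, $z^{e}_{j,k}$ is the image coordinate of the $j$-th landmark in the $k$-th event frame.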
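The sketch below illustrates the two geometric steps behind the reprojection error: linear (DLT) triangulation of a landmark from two observations, and evaluation of the reprojection residual. It makes three assumptions. Image coordinates are normalized (intrinsics already removed), `project` is a pinhole model standing in for the projection function, and poses are 4×4 homogeneous matrices. The function names are hypothetical, and feature tracking itself (optical flow, corner detection, RANSAC) is not shown.

```python
import numpy as np


def triangulate_dlt(z1, z2, P1, P2):
    """Linear (DLT) triangulation from two normalized image observations.
    P1, P2: 3x4 world-to-camera projection matrices [R | t]."""
    A = np.stack([
        z1[0] * P1[2] - P1[0],
        z1[1] * P1[2] - P1[1],
        z2[0] * P2[2] - P2[0],
        z2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                       # null-space direction of A
    return X[:3] / X[3]              # dehomogenize to world coordinates


def project(p_cam):
    """Pinhole projection onto the normalized image plane (assumed camera model)."""
    return p_cam[:2] / p_cam[2]


def reprojection_residual(z, p_world, T_b_w, T_c_b):
    """r = z - pi(T_c_b @ T_b_w @ P_w); T_b_w maps world -> IMU at frame k,
    T_c_b maps IMU -> camera."""
    p_h = np.append(p_world, 1.0)
    p_cam = (T_c_b @ T_b_w @ p_h)[:3]
    return z - project(p_cam)
```

The same residual function serves both standard frames and motion-compensated event frames; only the observation `z` changes.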
[0068] Optionally, the inertial error term is the deviation between the current pose detection value and the current pose prediction value. The current pose prediction value is obtained by pre-integrating multiple historical pose detection values and combining the pre-integration result with the pose detection value preceding the current one. The multiple historical pose detection values are those detected between the acquisition times of the previous standard frame and the current standard frame.
[0069] As an example, refer to Figure 3. The IMU measures at a much higher frequency than the camera, so there are multiple IMU measurements between two standard frames. IMU pre-integration provides initial values for the position, attitude, and velocity increments between the two standard frames, and the IMU residual is:
[0070]
$$ r_I = \begin{bmatrix} \delta\alpha^{b_k}_{b_{k+1}}\\ \delta\beta^{b_k}_{b_{k+1}}\\ \delta\theta^{b_k}_{b_{k+1}}\\ \delta b \end{bmatrix} = \begin{bmatrix} (R^{w}_{b_k})^{\top}\big(p^{w}_{b_{k+1}} - p^{w}_{b_k} + \tfrac{1}{2}g^{w}\Delta t_k^{2} - v^{w}_{b_k}\Delta t_k\big) - \alpha^{b_k}_{b_{k+1}}\\ (R^{w}_{b_k})^{\top}\big(v^{w}_{b_{k+1}} + g^{w}\Delta t_k - v^{w}_{b_k}\big) - \beta^{b_k}_{b_{k+1}}\\ 2\big[(\gamma^{b_k}_{b_{k+1}})^{-1}\otimes(q^{w}_{b_k})^{-1}\otimes q^{w}_{b_{k+1}}\big]_{xyz}\\ b_{k+1} - b_k \end{bmatrix} $$
[0071] In the above formula, the IMU residual $r_I$ includes a position component $\delta\alpha^{b_k}_{b_{k+1}}$, a velocity component $\delta\beta^{b_k}_{b_{k+1}}$, a rotation component $\delta\theta^{b_k}_{b_{k+1}}$, and a bias component $\delta b$. The remaining symbols are:
- $\alpha^{b_k}_{b_{k+1}}$, $\beta^{b_k}_{b_{k+1}}$ and $\gamma^{b_k}_{b_{k+1}}$: the IMU pre-integration values of the robot.
- $R^{w}_{b_k}$: the rotation matrix from the robot coordinate system to the world coordinate system at time $k$.
- $p^{w}_{b_{k+1}}$ and $p^{w}_{b_k}$: the positions of the origin of the robot coordinate system in the world coordinate system at times $k+1$ and $k$, respectively.
- $v^{w}_{b_k}$ and $v^{w}_{b_{k+1}}$: the robot's velocities in the world coordinate system at times $k$ and $k+1$.
- $\Delta t_k$: the time interval between keyframes $k$ and $k+1$.
- $g^{w}$: the gravitational acceleration.
- $q^{w}_{b_k}$ and $q^{w}_{b_{k+1}}$: the quaternions of the robot's rotation in the world coordinate system at times $k$ and $k+1$.
- $b_{k+1}$ and $b_k$: the robot's IMU biases at times $k+1$ and $k$, respectively.
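The following is a minimal sketch of Euler-discretized pre-integration and the IMU residual of [0070]. It rests on three assumptions. The samples are bias-free. The accelerometer is assumed to measure specific force as $(R^{w}_{b})^{\top}(a^{w}+g^{w})$. For the rotational component, the SO(3) logarithm is used in place of the quaternion expression (the two agree to first order). Noise propagation, the covariance $\Sigma_I$, and bias Jacobians are omitted, and all names are hypothetical.

```python
import numpy as np


def skew(w):
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])


def so3_exp(phi):
    """Rodrigues' formula: rotation vector -> rotation matrix."""
    theta = np.linalg.norm(phi)
    K = skew(phi)
    if theta < 1e-10:
        return np.eye(3) + K
    return (np.eye(3) + np.sin(theta) / theta * K
            + (1.0 - np.cos(theta)) / theta**2 * K @ K)


def so3_log(R):
    """Rotation matrix -> rotation vector (valid away from theta = pi)."""
    cos_t = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_t)
    v = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    if theta < 1e-10:
        return 0.5 * v
    return theta / (2.0 * np.sin(theta)) * v


def preintegrate(acc, gyr, dt):
    """Euler pre-integration of bias-free IMU samples between two frames.
    Returns alpha (position), beta (velocity) and gamma (rotation) in frame b_k."""
    alpha, beta, gamma = np.zeros(3), np.zeros(3), np.eye(3)
    for a, w in zip(acc, gyr):
        alpha = alpha + beta * dt + 0.5 * gamma @ a * dt**2
        beta = beta + gamma @ a * dt
        gamma = gamma @ so3_exp(w * dt)
    return alpha, beta, gamma


def imu_residual(pre, R_wk, p_k, v_k, b_k, R_wk1, p_k1, v_k1, b_k1, dt_k, g_w):
    """Stacked [position, velocity, rotation, bias] residual between frames k and k+1."""
    alpha, beta, gamma = pre
    r_p = R_wk.T @ (p_k1 - p_k + 0.5 * g_w * dt_k**2 - v_k * dt_k) - alpha
    r_v = R_wk.T @ (v_k1 + g_w * dt_k - v_k) - beta
    r_q = so3_log(gamma.T @ R_wk.T @ R_wk1)    # SO(3) log instead of quaternion form
    r_b = b_k1 - b_k
    return np.concatenate([r_p, r_v, r_q, r_b])
```

For example, a robot that is stationary but yawing at a constant rate gives a zero residual when its states agree with the pre-integrated increments.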
[0072] Optionally, the velocity error term is the deviation between the current velocity detection value and the current velocity estimate, where the current velocity detection value is the velocity component in the current pose detection value, which is in the world coordinate system, and the current velocity estimate is obtained by transforming the current relative velocity detection value, which is in the robot coordinate system, to the world coordinate system.
[0073] As an example, refer to Figure 3. Based on the kinematic relationship between the DVL and the IMU in the world coordinate system, the velocity of the DVL in the robot coordinate system is expressed as:
[0074]
$$ v^{b}_{d} = R^{b}_{w}\,v^{w}_{b} + [\omega]_{\times}\,p^{b}_{d},\qquad v^{b}_{d} = R^{b}_{d}\,v^{d} $$
[0075] In the above formula:
- $v^{b}_{d}$ is the velocity of the DVL in the robot coordinate system.
- $R^{b}_{w}$ is the rotation matrix from the world coordinate system to the robot coordinate system.
- $v^{w}_{b}$ is the velocity of the robot in the world coordinate system.
- $p^{b}_{d}$ is the position of the DVL in the robot coordinate system.
- $v^{d}$ is the velocity measured by the DVL, i.e., the current relative velocity detection value mentioned above.
- $R^{b}_{d}$ is the rotation matrix from the DVL coordinate system to the robot coordinate system.

$[\omega]_{\times}$ is the skew-symmetric matrix of the IMU angular velocity:
[0076]
$$ [\omega]_{\times} = \begin{bmatrix} 0 & -\omega_z & \omega_y\\ \omega_z & 0 & -\omega_x\\ -\omega_y & \omega_x & 0 \end{bmatrix} $$
[0077] In the formula, $\omega_x$, $\omega_y$ and $\omega_z$ are the three angular velocity components measured by the IMU. The DVL velocity error term is then the deviation between two quantities: the velocity of the underwater robot estimated from the DVL measurement, and the velocity in the underwater robot's state variables (i.e., the pose information to be optimized):
[0078]
$$ r_V = R^{w}_{b}\big(R^{b}_{d}\,v^{d} - [\omega]_{\times}\,p^{b}_{d}\big) - v^{w}_{b},\qquad R^{w}_{b} = (R^{b}_{w})^{\top} $$
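The DVL velocity residual can be evaluated as below. This sketch assumes that the extrinsics `R_b_d` (DVL-to-robot rotation) and `p_b_d` (DVL lever arm in the robot frame) are known. The function names are hypothetical.

```python
import numpy as np


def skew(w):
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])


def dvl_velocity_residual(v_dvl_meas, R_b_d, p_b_d, omega_b, R_w_b, v_w):
    """r_V = R_w_b (R_b_d v_d - [omega]x p_b_d) - v_w

    v_dvl_meas: velocity measured by the DVL in its own frame
    R_b_d, p_b_d: DVL-to-robot rotation and DVL position in the robot frame
    omega_b: IMU angular velocity in the robot frame
    R_w_b, v_w: robot attitude and world-frame velocity from the state
    """
    v_body_est = R_b_d @ v_dvl_meas - skew(omega_b) @ p_b_d   # remove the lever-arm term
    return R_w_b @ v_body_est - v_w
```

Subtracting the lever-arm term matters whenever the DVL is mounted away from the IMU and the robot is rotating.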
[0079] Optionally, the acoustic odometry error term is the deviation between the current relative displacement change and the current estimated relative displacement change. The current relative displacement change is the change in relative displacement between the current pose detection value and the pose detection value at the previous standard frame. The current estimated relative displacement change is obtained by trajectory extrapolation (dead reckoning, DR) from the acoustic odometry detection data.
[0080] As an example, refer to Figure 3. Consider the DVL coordinate system within the dead-reckoning (DR) world coordinate system and the robot coordinate system within the world coordinate system. Based on the relationship between them, the relative pose change of the robot between two frames calculated by trajectory extrapolation is:
[0081]
$$ T^{b_k}_{b_{k+1}} = \big(T^{D}_{d_k}\,T^{d}_{b}\big)^{-1}\,T^{D}_{d_{k+1}}\,T^{d}_{b} $$
[0082] $T^{D}_{d_k}$ and $T^{D}_{d_{k+1}}$ represent the relative pose changes from the DVL coordinate system to the DR world coordinate system at times $k$ and $k+1$, respectively. $T^{d}_{b}$ represents the relative pose change from the robot coordinate system to the DVL coordinate system.
[0083] Each $T$ is a 4×4 homogeneous transformation matrix composed of a 3×3 rotation matrix $R$ and a 3×1 translation vector $p$, with bottom row $[0\;0\;0\;1]$. Accordingly:
[0084]
$$ R^{b_k}_{b_{k+1}} = \big(R^{D}_{d_k}R^{d}_{b}\big)^{\top} R^{D}_{d_{k+1}}R^{d}_{b} $$
[0085]
$$ p^{b_k}_{b_{k+1}} = \big(R^{D}_{d_k}R^{d}_{b}\big)^{\top}\big(p^{D}_{b_{k+1}} - p^{D}_{b_k}\big),\qquad p^{D}_{b_k} = R^{D}_{d_k}\,p^{d}_{b} + p^{D}_{d_k} $$
[0086] In the above formulas:
- $R^{D}_{d_{k+1}}$ and $R^{D}_{d_k}$ are the rotation matrices from the DVL coordinate system to the DR world coordinate system at times $k+1$ and $k$, respectively.
- $R^{d}_{b}$ is the rotation matrix from the robot coordinate system to the DVL coordinate system.
- $p^{d}_{b}$ is the position of the origin of the robot coordinate system in the DVL coordinate system.
- $p^{D}_{b_{k+1}}$ and $p^{D}_{b_k}$ are the positions of the origin of the robot coordinate system in the DR world coordinate system at times $k+1$ and $k$.
- $p^{D}_{d_k}$ is the translation part of $T^{D}_{d_k}$.
[0087] The relative pose change of the robot between the two frames, as estimated from the state variables, can then be calculated as follows:
[0088]
$$ \hat T^{b_k}_{b_{k+1}} = \big(T^{w}_{b_k}\big)^{-1} T^{w}_{b_{k+1}},\qquad \hat R^{b_k}_{b_{k+1}} = \big(R^{w}_{b_k}\big)^{\top} R^{w}_{b_{k+1}},\qquad \hat p^{b_k}_{b_{k+1}} = \big(R^{w}_{b_k}\big)^{\top}\big(p^{w}_{b_{k+1}} - p^{w}_{b_k}\big) $$
[0089] In the above formula:
- $T^{w}_{b_k}$ and $T^{w}_{b_{k+1}}$ are the poses from the robot coordinate system to the world coordinate system at times $k$ and $k+1$, respectively.
- $R^{w}_{b_k}$ and $R^{w}_{b_{k+1}}$ are the rotation matrices from the robot coordinate system to the world coordinate system at times $k$ and $k+1$.
- $p^{w}_{b_k}$ and $p^{w}_{b_{k+1}}$ are the positions of the origin of the robot coordinate system in the world coordinate system at times $k$ and $k+1$.
[0090] The relative pose change error term of the trajectory estimation compares two pose changes between adjacent frames: the one obtained by trajectory estimation and the one given by the underwater robot's state variables. It is expressed as:
[0091]
$$ r_D = \begin{bmatrix}\delta p\\ \delta\theta\end{bmatrix} = \begin{bmatrix} \hat p^{b_k}_{b_{k+1}} - p^{b_k}_{b_{k+1}}\\ 2\big[(q^{b_k}_{b_{k+1}})^{-1}\otimes \hat q^{b_k}_{b_{k+1}}\big]_{xyz}\end{bmatrix} $$
[0092] In the above formula, the trajectory estimation residual $r_D$ includes a translation component $\delta p$ and a rotation component $\delta\theta$. Variables marked with "^" belong to the underwater robot's state variables. Variables without "^" are obtained by trajectory extrapolation from the acoustic odometry data. $p^{b_k}_{b_{k+1}}$ is the position change between two adjacent frames, i.e., the position of the origin of the robot coordinate system at time $k+1$ expressed in the robot coordinate system at time $k$. $q^{b_k}_{b_{k+1}}$ is the quaternion corresponding to the rotation matrix $R^{b_k}_{b_{k+1}}$.
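The sketch below computes the trajectory estimation residual with 4×4 homogeneous matrices. It first composes the dead-reckoning relative pose, then the state-based relative pose, and finally takes their residual. As an assumption for illustration, the rotational component uses the SO(3) logarithm instead of the quaternion expression (the two agree to first order). All function names are hypothetical.

```python
import numpy as np


def make_T(R, p):
    """Assemble a 4x4 homogeneous transform with bottom row [0, 0, 0, 1]."""
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, p
    return T


def so3_log(R):
    """Rotation matrix -> rotation vector (valid away from theta = pi)."""
    cos_t = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_t)
    v = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    if theta < 1e-10:
        return 0.5 * v
    return theta / (2.0 * np.sin(theta)) * v


def dr_relative_pose(T_D_dk, T_D_dk1, T_d_b):
    """Robot motion between frames from dead reckoning: (T_D_dk T_d_b)^-1 T_D_dk1 T_d_b."""
    return np.linalg.inv(T_D_dk @ T_d_b) @ T_D_dk1 @ T_d_b


def state_relative_pose(T_w_bk, T_w_bk1):
    """Robot motion between frames implied by the state: (T_w_bk)^-1 T_w_bk1."""
    return np.linalg.inv(T_w_bk) @ T_w_bk1


def dr_residual(T_rel_dr, T_rel_state):
    """[translation difference, rotation difference] between state and DR motion."""
    dp = T_rel_state[:3, 3] - T_rel_dr[:3, 3]
    dtheta = so3_log(T_rel_dr[:3, :3].T @ T_rel_state[:3, :3])
    return np.concatenate([dp, dtheta])
```

Because only relative motion between frames k and k+1 is compared, a constant misalignment between the DR world frame and the world frame does not enter the residual.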
[0093] The marginalization prior residual $r_p$ is a prior residual generated by the marginalization process. In sliding-window optimization, old states (such as the poses and velocities of old keyframes) are removed, and $r_p$ retains the constraint information they imposed on the remaining states. It is essentially a "compressed" representation of historical observation data. It is added to the cost function as a Gaussian prior, which preserves as much information as possible while keeping computation efficient and the estimate accurate and consistent.
[0094] The marginalization prior error term $\|r_p\|^2$ is calculated as follows.
[0095] 1. The original optimization problem:
[0096]
$$ \begin{bmatrix} H_{mm} & H_{mr}\\ H_{rm} & H_{rr}\end{bmatrix}\begin{bmatrix}\delta x_m\\ \delta x_r\end{bmatrix} = \begin{bmatrix} b_m\\ b_r\end{bmatrix} $$
[0097] This is the block representation of the Hessian matrix:
- $H_{mm}$ is the sub-block related to the marginalized state variables and captures their self-related information.
- $H_{mr}$ and $H_{rm}$ are the coupling terms between the marginalized state variables and the state variables retained in the window, reflecting the constraints of joint observation.
- $H_{rr}$ is the sub-block related to the retained state variables and captures their self-related information.
- $\delta x_m$ is the increment of the marginalized state variables, and $\delta x_r$ is the increment of the retained state variables.
- $b=[b_m;\,b_r]$ is the residual gradient vector.

[0098] 2. Schur complement computation.
[0099] Eliminating $\delta x_m$ yields the marginalized prior Hessian matrix $H_p = H_{rr} - H_{rm}H_{mm}^{-1}H_{mr}$, which is the adjusted information matrix of the retained states.
[0100] The marginalized prior residual vector $b_p = b_r - H_{rm}H_{mm}^{-1}b_m$ represents the residual contribution of the marginalized states to the retained states.
[0101] 3. Calculate the marginalization prior error term.
[0102]
$$ H_p = J_p^{\top}J_p $$
[0103]
$$ r_p = J_p\,\delta x_r - \big(J_p^{\top}\big)^{-1} b_p $$
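Assume the sliding-window system $H\,\delta x = b$ is partitioned into marginalized states (the first `m` dimensions) and retained states. Under that assumption, the Schur complement and one possible factorization of the resulting prior can be sketched as follows. The eigen-decomposition used to obtain $J_p$ is an assumed choice, since any factor with $J_p^{\top}J_p = H_p$ would serve. The function names are hypothetical.

```python
import numpy as np


def marginalize(H, b, m):
    """Schur complement eliminating the first m state dimensions (convention H dx = b)."""
    H_mm, H_mr = H[:m, :m], H[:m, m:]
    H_rm, H_rr = H[m:, :m], H[m:, m:]
    b_m, b_r = b[:m], b[m:]
    H_p = H_rr - H_rm @ np.linalg.solve(H_mm, H_mr)
    b_p = b_r - H_rm @ np.linalg.solve(H_mm, b_m)
    return H_p, b_p


def prior_factor(H_p, b_p, eps=1e-8):
    """Factor the prior as r_p = J_p dx_r - e_p with J_p^T J_p = H_p."""
    w, V = np.linalg.eigh(H_p)
    keep = w > eps                          # drop numerically zero directions
    s = np.zeros_like(w)
    s_inv = np.zeros_like(w)
    s[keep] = np.sqrt(w[keep])
    s_inv[keep] = 1.0 / s[keep]
    J_p = s[:, None] * V.T                  # diag(sqrt(w)) V^T
    e_p = s_inv * (V.T @ b_p)               # (J_p^T)^{-1} b_p
    return J_p, e_p


def prior_residual(J_p, e_p, dx_r):
    return J_p @ dx_r - e_p
```

Minimizing $\|r_p\|^2$ alone reproduces the retained-state solution of the full system. This is the sense in which the prior "compresses" the removed states.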
[0104] Overall, refer to Figure 4. The underwater robot autonomous localization method according to an exemplary embodiment of this disclosure achieves autonomous localization by executing three modules in sequence: an event stream adaptive slicing module, an optimized motion-compensated event frame generation module, and a multimodal information joint optimization module.

The event stream adaptive slicing module dynamically adjusts the event window size based on the rate at which the event stream generates events and the robot's motion speed, and outputs a set of adaptive event slices.

The optimized motion-compensated event frame generation module operates independently of any motion sensor. It uses a two-stage optimization algorithm of time smoothing compensation and density maximization compensation. It first performs an initial compensation by minimizing the spatial gradient of the event timestamps. It then performs random sampling optimization with the event frame pixel density as the objective, ultimately generating motion-compensated event frames with clear texture features.

The multimodal information joint optimization module solves for the robot's position, attitude, and motion speed as a joint optimization problem. It minimizes the reprojection error terms of the event frames and standard frames together with the motion sensor measurement errors (the IMU inertial error term, the DVL velocity error term, and the DR error term).
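The first compensation stage can be sketched as gradient descent on the spatial gradient of the average time frame. Each pixel of that frame holds the mean timestamp of the events warped onto it. This is a minimal sketch, and the following are all illustrative assumptions rather than the disclosed method:
- a constant-flow warp `(vx, vy)`;
- bilinear voting;
- a mean-squared gradient over occupied pixels;
- a guard against pushing events off the sensor;
- central-difference gradients with a normalized, backtracking step.

```python
import numpy as np


def warp_events(xs, ys, ts, params, t_ref):
    """HYPOTHETICAL 2-parameter motion model: constant image-plane flow (vx, vy) in px/s."""
    vx, vy = params
    dt = ts - t_ref
    return xs - vx * dt, ys - vy * dt


def average_time_frame(xs, ys, ts, params, shape):
    """Per-pixel mean relative timestamp of the warped events (bilinear voting)."""
    t_ref = ts.min()
    wx, wy = warp_events(xs, ys, ts, params, t_ref)
    h, w = shape
    inside_frac = np.mean((wx >= 0) & (wx < w - 1) & (wy >= 0) & (wy < h - 1))
    x0, y0 = np.floor(wx).astype(int), np.floor(wy).astype(int)
    fx, fy = wx - x0, wy - y0
    num, den = np.zeros(shape), np.zeros(shape)
    for dx, dy, wgt in ((0, 0, (1 - fx) * (1 - fy)), (1, 0, fx * (1 - fy)),
                        (0, 1, (1 - fx) * fy), (1, 1, fx * fy)):
        xi, yi = x0 + dx, y0 + dy
        ok = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
        np.add.at(num, (yi[ok], xi[ok]), wgt[ok] * (ts[ok] - t_ref))
        np.add.at(den, (yi[ok], xi[ok]), wgt[ok])
    occupied = den > 1e-9
    avg = np.zeros(shape)
    avg[occupied] = num[occupied] / den[occupied]
    return avg, occupied, inside_frac


def time_gradient_objective(xs, ys, ts, params, shape):
    """Mean squared spatial gradient of the average time frame over occupied pixels."""
    avg, occupied, inside_frac = average_time_frame(xs, ys, ts, params, shape)
    n = np.count_nonzero(occupied)
    if n == 0 or inside_frac < 0.95:   # guard: do not "improve" by losing events
        return np.inf
    gy, gx = np.gradient(avg)
    return float(np.sum((gx**2 + gy**2)[occupied]) / n)


def coarse_compensation(xs, ys, ts, shape, init=(0.0, 0.0), iters=30, h=1e-2, step0=20.0):
    """Gradient descent with central differences and a backtracking line search."""
    def objective(p):
        return time_gradient_objective(xs, ys, ts, p, shape)

    params = np.asarray(init, dtype=float)
    f_cur = objective(params)
    for _ in range(iters):
        grad = np.array([(objective(params + h * e) - objective(params - h * e)) / (2 * h)
                         for e in np.eye(params.size)])
        norm = np.linalg.norm(grad)
        if not np.isfinite(norm) or norm == 0.0:
            break
        direction = grad / norm
        step = step0
        while step > 1e-6:               # halve the step until the objective drops
            candidate = params - step * direction
            f_cand = objective(candidate)
            if f_cand < f_cur:
                params, f_cur = candidate, f_cand
                break
            step *= 0.5
        else:
            break                        # no descent step found: treat as converged
    return params, f_cur
```

In this sketch, the returned parameters would play the role of the coarse motion compensation parameters handed to the random sampling stage.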
[0105] Figure 5 shows a structural block diagram of an electronic device 500 according to an exemplary embodiment of the present disclosure.
[0106] Referring to Figure 5, the electronic device 500 includes at least one memory 501 and at least one processor 502. The at least one memory 501 stores computer-executable instructions. When executed by the at least one processor 502, these instructions cause the at least one processor 502 to perform the underwater robot autonomous positioning method described in the exemplary embodiments above.
[0107] As an example, electronic device 500 may be a PC, a tablet, a personal digital assistant, a smartphone, or another device capable of executing the above set of instructions. Electronic device 500 need not be a single device; it may be any collection of devices or circuits capable of executing the above instructions (or instruction sets) individually or jointly. Electronic device 500 may also be part of an integrated control system or system manager. Alternatively, it may be configured as a portable electronic device that interconnects locally or remotely (e.g., via wireless transmission) through an interface.
[0108] In electronic device 500, processor 502 may include a central processing unit (CPU), a graphics processing unit (GPU), a programmable logic device, a dedicated processor system, a microcontroller, or a microprocessor. By way of example and not limitation, processor 502 may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, etc.
[0109] The processor 502 can execute instructions or code stored in the memory 501, which can also store data. Instructions and data can also be sent and received over a network via a network interface device, which can employ any known transmission protocol.
[0110] The memory 501 may be integrated with the processor 502, for example, by arranging RAM or flash memory within an integrated circuit microprocessor. Alternatively, the memory 501 may include a separate device, such as an external disk drive, a storage array, or other storage device usable by any database system. The memory 501 and the processor 502 may be operatively coupled, or may communicate with each other, for example, via I/O ports, network connections, etc., enabling the processor 502 to read files stored in the memory.
[0111] In addition, electronic device 500 may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of electronic device 500 can be interconnected via a bus and/or network.
[0112] Figure 6 shows an autonomous positioning system for an underwater robot according to an exemplary embodiment of the present disclosure.
[0113] Referring to Figure 6, the system includes a data acquisition device and the electronic device 500 described in the exemplary embodiments above. The data acquisition device includes a standard camera, an event camera, an inertial measurement unit (IMU), a velocity measurement unit (e.g., a Doppler velocity log, DVL), and an acoustic odometry unit.
[0114] The system's hardware components are as follows.

Event camera: an Inivation DAVIS346 with a spatial resolution of 346×260 and a dynamic range of up to 120 dB. It can stably acquire event stream data in low-light, high-contrast environments. It simultaneously outputs synchronized standard frames with a dynamic range of 55 dB at approximately 16 fps, providing high-contrast texture information under good lighting conditions.

Inertial measurement unit (IMU): a Lord 3DM-GX5, a standalone, high-precision six-axis inertial sensor that measures three-axis angular velocity and three-axis acceleration at high frequency. It supports time synchronization and pre-integration processing, providing inertial information with high temporal resolution for motion estimation.

Doppler velocity log (DVL): a WaterLinked A50, which supports high-precision velocity measurement at water depths of 0 to 50 meters with an update rate of 4 to 15 Hz. It provides absolute underwater velocity constraints for the system, effectively suppressing the drift caused by long-term IMU integration.

Onboard control system (serving as electronic device 500): an edge-central distributed architecture. The edge computing part is a Raspberry Pi 3B, mainly responsible for parsing high-level motion commands and driving the thrusters to close the low-level motion control loop. The central computing unit is an Intel NUC12 industrial computer, which serves as the main processing platform and runs ROS. It performs tasks such as time synchronization of multi-sensor data, generation of event frames and standard frames, feature extraction and tracking, sliding-window tightly coupled optimization-based positioning, and motion planning.

All system modules are connected through a unified time synchronization mechanism and a high-speed data bus. They are integrated onto the commercial BlueROV underwater robot platform, which has a 100-meter depth rating. In practical applications, the system fuses event streams, standard images, IMU, DVL, and acoustic odometry (DR) information. This allows it to achieve robust, high-precision autonomous positioning in complex underwater environments with extremely low light, high contrast, and dynamic lighting.
[0115] According to exemplary embodiments of this disclosure, a computer-readable storage medium storing instructions may also be provided. When executed by at least one processor, the instructions cause the at least one processor to perform the underwater robot autonomous localization method described in the exemplary embodiments above. Examples of computer-readable storage media herein include:
- read-only memory (ROM), programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM);
- random access memory (RAM), dynamic random access memory (DRAM), static random access memory (SRAM), flash memory, non-volatile memory;
- CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or optical disc storage;
- hard disk drives (HDD), solid-state drives (SSD);
- card storage (such as multimedia cards, secure digital (SD) cards, or extreme digital (xD) cards);
- magnetic tape, floppy disks, magneto-optical data storage devices, optical data storage devices;
- any other device configured to store a computer program and any associated data, data files, and data structures in a non-transitory manner and to provide them to a processor or computer so that the processor or computer can execute the computer program.

The computer program in the above computer-readable storage medium can run in an environment deployed in computer devices such as clients, hosts, agent devices, and servers. Furthermore, in one example, the computer program and any associated data, data files, and data structures are distributed across networked computer systems. They are thus stored, accessed, and executed in a distributed manner by one or more processors or computers.
[0116] According to exemplary embodiments of the present disclosure, a computer program product may also be provided, including computer instructions that, when executed by at least one processor, perform the underwater robot autonomous positioning method as described in the exemplary embodiments above.
[0117] Other embodiments of this disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of this disclosure that follow the general principles of this disclosure and include common knowledge or customary techniques in the art not disclosed herein. The specification and examples are to be considered exemplary only, and the true scope and spirit of this disclosure are indicated by the appended claims.
[0118] It should be understood that this disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of this disclosure is limited only by the appended claims.
Claims
1. An autonomous positioning method for an underwater robot, characterized in that it comprises: acquiring standard frames captured by a standard camera and an event stream captured by an event camera, wherein the event stream includes multiple events, each event indicating that the brightness change of a pixel at a certain moment exceeds a set value; aggregating, according to preset rules, multiple consecutive events in the event stream into an original event frame; performing time smoothing compensation and density maximization compensation on the original event frame to obtain a motion-compensated event frame, wherein the time smoothing compensation is used to reduce the temporal gradient of events within the same event frame, and the density maximization compensation is used to increase the spatial distribution density of events within the same event frame; and obtaining the pose information of the underwater robot by minimizing a joint error, wherein the joint error includes a motion sensor measurement error and visual errors of the motion-compensated event frame and the standard frame, the visual errors include reprojection error terms, the motion sensor measurement error includes an inertial error term, a velocity error term, and an acoustic odometry error term, and the pose information includes position, motion speed, and attitude. The time smoothing compensation process includes: iteratively optimizing motion compensation parameters by gradient descent to obtain coarse motion compensation parameters, wherein the objective function of the gradient descent is the spatial gradient of an average time frame, the value of each pixel in the average time frame is the average event time of the corresponding pixel in the motion-compensated event frame, and the average event time is the average acquisition time of the events aggregated at that pixel.
The density maximization compensation process includes: with the goal of maximizing event pixel density, performing random sampling optimization of the motion compensation parameters based on the coarse motion compensation parameters to obtain final motion compensation parameters, wherein the event pixel density represents the density of events aggregated on the pixels of the event frame.
2. The method of claim 1, wherein aggregating multiple consecutive events in the event stream into an original event frame according to preset rules includes: determining an aggregation quantity based on an event generation rate and a movement speed of the underwater robot, wherein the event generation rate represents the rate at which the event stream generates events, and the aggregation quantity is positively correlated with the event generation rate and negatively correlated with the movement speed; and aggregating that quantity of events in the event stream occurring before the acquisition time of the current standard frame into the original event frame.
3. The method of claim 2, wherein the aggregation quantity is calculated using the following formula, in which the variables denote, respectively: the aggregation quantity of the current original event frame; the aggregation quantity of the previous original event frame; the slice rate of change induced by the event stream; the acquisition period of the standard frame; the current time; the earliest trigger time of the events to be aggregated; a velocity suppression coefficient; a reference speed; and the movement speed.
4. The underwater robot autonomous positioning method according to any one of claims 1 to 3, characterized in that: the reprojection error term is obtained by performing feature tracking on the corresponding frame, based on the error between the depth of each feature point successfully tracked in the corresponding frame and the image coordinate point corresponding to that feature point, wherein the feature tracking includes tracking existing feature points using optical flow, supplementing new feature points using a corner detector, and removing outlier feature points using a random sample consensus (RANSAC) algorithm, and the depth of a feature point and its corresponding image coordinate point are obtained by triangulating continuously tracked feature points; the inertial error term is the deviation between the current pose detection value and the current pose prediction value, wherein the current pose prediction value is obtained by pre-integrating multiple historical pose detection values and combining the pre-integration result with the pose detection value preceding the current one, and the multiple historical pose detection values are detected between the acquisition times of the previous standard frame and the current standard frame; the velocity error term is the deviation between the current velocity detection value and the current velocity estimate, wherein the current velocity detection value is the velocity component, in the world coordinate system, of the current pose detection value, and the current velocity estimate is obtained by transforming the current relative velocity detection value from the robot coordinate system to the world coordinate system; and the acoustic odometry error term is the deviation between the current relative displacement change and the current estimated relative displacement change,
wherein the current relative displacement change is the relative displacement change between the current pose detection value and the pose detection value of the previous standard frame, and the current estimated relative displacement change is obtained by trajectory extrapolation (dead reckoning) from the acoustic odometry detection data.
5. An electronic device, characterized in that it comprises: at least one processor; and at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to execute the underwater robot autonomous positioning method according to any one of claims 1 to 4.
6. An autonomous positioning system for an underwater robot, characterized in that it comprises: data acquisition equipment, including a standard camera, an event camera, an inertial measurement unit, a velocity measurement unit, and an acoustic odometry unit; and the electronic device according to claim 5.
7. A computer-readable storage medium, characterized in that instructions in the computer-readable storage medium, when executed by at least one processor, cause the at least one processor to perform the underwater robot autonomous localization method according to any one of claims 1 to 4.
8. A computer program product comprising computer instructions, characterized in that the computer instructions, when executed by at least one processor, cause the at least one processor to perform the underwater robot autonomous positioning method according to any one of claims 1 to 4.