Lawn mower visual data processing method and computer storage medium

By employing a dual-predictive pose design in the lawnmower robot, combining visual and wheel speed sensor data, the positioning accuracy problem caused by wheel speed sensor noise was solved, achieving stable pose output and lawnmower path tracking in complex environments.

CN122244150A · Pending · Publication Date: 2026-06-19 · QINGTING INTELLIGENT TECHNOLOGY (SUZHOU) CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: QINGTING INTELLIGENT TECHNOLOGY (SUZHOU) CO LTD
Filing Date: 2026-03-16
Publication Date: 2026-06-19

Application Information

IPC: G06T7/73; G06T7/246

AI Tagging

Application Domain

Image analysis

AI Technical Summary

Technical Problem

In complex outdoor terrain, existing lawnmower robots suffer from wheel slippage, which causes the wheel speed meter data to exhibit non-Gaussian noise and makes it difficult for traditional fusion methods to achieve the desired positioning accuracy.

Method used

A dual-prediction pose design is adopted. Optical flow tracking is first initialized with the first predicted pose, derived from visual odometry. If tracking fails, it switches to the second predicted pose, derived from the wheel speed meter. The PnP pose is then verified against the wheel speed meter predicted pose to ensure the continuity and stability of the pose output.

Benefits of technology

In typical scenarios of lawnmower robots slipping and visual degradation, stable pose output was achieved, avoiding pose jumps caused by visual errors and ensuring that the lawnmower operates stably according to the planned path.

Abstract

This application relates to the field of garden robot technology and discloses a visual data processing method for a lawnmower and a computer storage medium. After receiving the current image, the method first obtains the corresponding wheel speed meter pose by interpolating the wheel speed meter data. It then calculates a first predicted pose from historical visual motion and a second predicted pose from wheel speed meter motion. During optical flow tracking, the projection under the first predicted pose is used preferentially as the initial value. If the number of tracked points is insufficient, the method switches to the second predicted pose and tracks again. If the second round of tracking is still insufficient, the second predicted pose is output directly as the current pose. Once enough points have been tracked, the visual pose is calculated using PnP (Perspective-n-Point), its deviation from the second predicted pose is checked, and the result is also verified against an inlier condition. The PnP result is adopted only if verification passes; otherwise, the method falls back to the second predicted pose. This application effectively avoids pose jumps when the wheels slip, ensuring the continuity and accuracy of the positioning output.

Description

Technical Field

[0001] This invention relates to the field of garden robot technology, and in particular to a method for processing visual data of a lawnmower and a computer storage medium.

Background Technology

[0002] In visual SLAM (Simultaneous Localization and Mapping), fusing wheel speed meter (wheel odometry) data to improve localization accuracy is a common technique. Existing fusion methods typically employ factor graph optimization or extended Kalman filter (EKF) frameworks. Specifically, the pose increment between two robot states is calculated from the wheel speed meter data, and this increment is either added to the factor graph as a constraint edge for optimization or introduced into the EKF framework as an observation for state estimation. All of these methods rest on the common assumption that the noise in the wheel speed meter data follows a Gaussian distribution. However, for devices such as lawnmowers operating in complex outdoor terrain, wheel slippage is frequent, causing the noise in the wheel speed meter output to exhibit significant non-Gaussian characteristics. This makes it difficult for traditional fusion methods to achieve the desired results in practical applications.

Summary of the Invention

[0003] Based on this, it is necessary to propose a visual data processing method for lawnmowers to address the technical problem of insufficient positioning accuracy in existing technologies.

[0004] In a first aspect, a method for processing visual data of a lawnmower is provided, the method comprising:
receiving the first-view image of the current frame and determining the current wheel speed meter pose corresponding to the first-view image of the current frame;
calculating a first predicted pose of the first-view image of the current frame based on the visual odometry poses of at least two historical first-view images;
calculating a second predicted pose of the first-view image of the current frame based on the wheel speed meter pose corresponding to the first-view image of the current frame and the wheel speed meter pose corresponding to the first-view image of the previous frame;
projecting, using the first predicted pose, the corner points with three-dimensional coordinates in the first-view image of the previous frame onto the first-view image of the current frame to obtain first projection points, and performing optical flow tracking on the corner points using the first projection points as initial values; if the number of successfully tracked corner points meets a first threshold condition, using the coordinates of the tracked corner points as the coordinates of the corner points with three-dimensional coordinates in the first-view image of the current frame;
if the number of successfully tracked corner points does not meet the first threshold condition, projecting, using the second predicted pose, the corner points with three-dimensional coordinates in the first-view image of the previous frame onto the first-view image of the current frame to obtain second projection points, and performing optical flow tracking on the corner points using the second projection points as initial values; if the number of successfully tracked corner points meets a second threshold condition, using the coordinates of the tracked corner points as the coordinates of the corner points with three-dimensional coordinates in the first-view image of the current frame; if the number of successfully tracked corner points does not meet the second threshold condition, using the second predicted pose as the visual odometry pose of the first-view image of the current frame, and using the coordinates of the successfully tracked corner points as the coordinates of the corner points with three-dimensional coordinates in the first-view image of the current frame;
calculating the PnP pose of the first-view image of the current frame using the corner point coordinates in the first-view image of the current frame and the corresponding three-dimensional coordinates, and determining whether the deviation between the PnP pose and the second predicted pose is less than a preset deviation threshold and whether the corresponding inliers satisfy an inlier condition; if so, using the PnP pose as the visual odometry pose of the first-view image of the current frame; otherwise, using the second predicted pose as the visual odometry pose of the first-view image of the current frame.

[0005] In a second aspect, a lawnmower visual data processing device is provided, the device comprising:
a determination module, used to receive the first-view image of the current frame and determine the current wheel speed meter pose corresponding to the first-view image of the current frame;
a calculation module, used to calculate a first predicted pose of the first-view image of the current frame based on the visual odometry poses of at least two historical first-view images, and also used to calculate a second predicted pose of the first-view image of the current frame based on the wheel speed meter pose corresponding to the first-view image of the current frame and the wheel speed meter pose corresponding to the first-view image of the previous frame;
a projection module, used to project, using the first predicted pose, the corner points with three-dimensional coordinates in the first-view image of the previous frame onto the first-view image of the current frame to obtain first projection points, and to perform optical flow tracking on the corner points using the first projection points as initial values; if the number of successfully tracked corner points meets a first threshold condition, the coordinates of the tracked corner points are used as the coordinates of the corner points with three-dimensional coordinates in the first-view image of the current frame;
a tracking module, configured to, if the number of successfully tracked corner points does not meet the first threshold condition, project, using the second predicted pose, the corner points with three-dimensional coordinates in the first-view image of the previous frame onto the first-view image of the current frame to obtain second projection points, and perform optical flow tracking on the corner points using the second projection points as initial values; if the number of successfully tracked corner points meets a second threshold condition, use the coordinates of the tracked corner points as the coordinates of the corner points with three-dimensional coordinates in the first-view image of the current frame; if the number of successfully tracked corner points does not meet the second threshold condition, use the second predicted pose as the visual odometry pose of the first-view image of the current frame, and use the coordinates of the successfully tracked corner points as the coordinates of the corner points with three-dimensional coordinates in the first-view image of the current frame;
an output module, used to calculate the PnP pose of the first-view image of the current frame using the corner point coordinates and the corresponding three-dimensional coordinates, and to determine whether the deviation between the PnP pose and the second predicted pose is less than a preset deviation threshold and whether the corresponding inliers satisfy an inlier condition; if so, the PnP pose is used as the visual odometry pose of the first-view image of the current frame; otherwise, the second predicted pose is used as the visual odometry pose of the first-view image of the current frame.

[0006] In a third aspect, a smart lawnmower is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the lawnmower visual data processing method described above.

[0007] In a fourth aspect, a computer storage medium is provided, which stores a computer program that, when executed by a processor, implements the lawnmower visual data processing method described above.

[0008] The beneficial effects of this application are as follows. First, this application calculates a first predicted pose based on historical visual motion and a second predicted pose based on the wheel speed meter motion increment. These two predicted poses provide dual assurance for subsequent tracking. The first predicted pose is used first to guide optical flow tracking; if the number of tracked corner points is insufficient, the system immediately switches to the second predicted pose and tracks again. In this way, when the visual prediction fails due to rapid rotation or an open scene, the controller can automatically use the wheel speed meter motion increment to supply an alternative initial value, thus avoiding pose loss due to tracking failure and ensuring the continuity of feature point tracking.

[0009] Second, this application does not directly use the visually calculated PnP pose. Instead, it compares the PnP pose with the second predicted pose to obtain a deviation and performs a double verification together with an inlier condition. Although wheel speed meter predictions contain slippage noise, their motion trend still has reference value over the short term. When the deviation between the PnP result and the wheel speed meter prediction is too large, it indicates that the visual calculation may have been affected by incorrect matches or interference from dynamic objects. In that case, this scheme discards the visual result and falls back to the second predicted pose as the output, which effectively suppresses pose jumps caused by visual errors.
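
The double verification described in this paragraph can be illustrated with a minimal sketch. The text above does not specify how the deviation is measured or what the inlier condition is, so everything concrete here is an assumption made for illustration: the deviation is taken as a rotation angle plus a translation distance, the inlier condition is a minimum inlier count, and the function names and default thresholds (`max_angle_rad`, `max_dist_m`, `min_inliers`) are hypothetical.

```python
import numpy as np

def pose_deviation(R_a, t_a, R_b, t_b):
    """Return (rotation angle in rad, translation distance) between two poses."""
    # Angle of the relative rotation R_a^T R_b, recovered from its trace.
    cos_angle = np.clip((np.trace(R_a.T @ R_b) - 1.0) / 2.0, -1.0, 1.0)
    angle = float(np.arccos(cos_angle))
    dist = float(np.linalg.norm(np.asarray(t_a) - np.asarray(t_b)))
    return angle, dist

def accept_pnp(R_pnp, t_pnp, R_pred, t_pred, n_inliers,
               max_angle_rad=0.1, max_dist_m=0.2, min_inliers=10):
    """Accept the PnP pose only if it stays close to the second predicted
    pose AND has enough inliers. All three thresholds are hypothetical."""
    angle, dist = pose_deviation(R_pnp, t_pnp, R_pred, t_pred)
    return angle < max_angle_rad and dist < max_dist_m and n_inliers >= min_inliers
```

When `accept_pnp` returns `False`, the caller would output the second predicted pose instead of the PnP pose, as the paragraph above describes.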

[0010] In summary, this application achieves stable pose output in typical lawnmower robot slippage and visual degradation scenarios through a progressive two-level design: dual prediction to ensure tracking continuity, and wheel speed meter verification to suppress visual jumps. This not only leverages the accurate short-term motion trend of the wheel speed meter, but also avoids contaminating the fusion results with its non-Gaussian noise.

Description of the Drawings

[0011] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0012] In the drawings: Figure 1 is an application environment diagram of the lawnmower visual data processing method in one embodiment; Figure 2 is a flowchart of the lawnmower visual data processing method in one embodiment; Figure 3 is a structural block diagram of a lawnmower visual data processing device in one embodiment; Figure 4 is a structural block diagram of a smart lawnmower in one embodiment.

Detailed Description

[0013] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0014] Figure 1 is a schematic diagram of an application scenario provided in an embodiment of this application. As shown in Figure 1, the scenario is a yard environment, with the smart lawnmower 100 taken as an example. The user places the lawnmower at the edge of the yard lawn and issues the "start mowing" command to the lawnmower through the control panel on the machine or the application on a terminal device.

[0015] Upon receiving the "start mowing" command, the lawnmower's controller immediately activates the visual odometry system. This system uses a binocular camera to capture image data of the lawn environment and simultaneously receives pose data from the wheel speed sensor. During operation, when the lawnmower travels to a relatively open lawn area or makes a turn in place, relying solely on visual odometry can easily lead to pose calculation errors due to degraded optical flow tracking quality, resulting in trajectory jumps.

[0016] If the technical solution provided in this application is adopted, the controller first obtains the wheel speed meter pose synchronized with the image by interpolating the wheel speed meter data, and then calculates the first predicted pose and the second predicted pose. In the optical flow tracking stage, if the number of corner points tracked from the first predicted pose is insufficient, the controller automatically switches to tracking from the second predicted pose, and after the PnP (Perspective-n-Point) pose has been solved it performs a consistency check against the wheel speed meter predicted pose. As a result, even if the wheels slip or the scene lacks texture, the controller can still output a smooth and continuous visual odometry pose, ensuring that the lawnmower operates stably along the planned path and avoiding missed or repeated mowing. After completing the mowing task for the entire lawn, the lawnmower automatically returns to the charging station.
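
As a rough illustration of the decision flow summarized in this paragraph, the sketch below wires the two tracking attempts and the final check together. It is not the application's implementation: the tracker, the PnP solver and the acceptance check are passed in as callables (`track`, `solve_pnp`, `pose_ok`, all hypothetical names), and the one-third threshold used for both tracking attempts is only the example value given later in the description.

```python
def enough_tracked(n_tracked, n_candidates):
    # Example threshold from the description: more than one third of the
    # corner points with 3D coordinates. Integer form avoids rounding issues.
    return 3 * n_tracked > n_candidates

def estimate_pose(first_pred, second_pred, track, solve_pnp, pose_ok, n_candidates):
    """Return (pose, source) for the current frame.

    track(pose)      -> list of successfully tracked corner points
    solve_pnp(pts)   -> (pnp_pose, n_inliers)
    pose_ok(pnp_pose, second_pred, n_inliers) -> bool
    """
    # First attempt: optical flow initialized from the visual prediction.
    tracked = track(first_pred)
    if not enough_tracked(len(tracked), n_candidates):
        # Second attempt: optical flow initialized from the wheel prediction.
        tracked = track(second_pred)
        if not enough_tracked(len(tracked), n_candidates):
            # Both attempts failed: output the wheel-based prediction directly.
            return second_pred, "wheel_fallback"
    pnp_pose, n_inliers = solve_pnp(tracked)
    if pose_ok(pnp_pose, second_pred, n_inliers):
        return pnp_pose, "pnp"
    # PnP result is inconsistent with the wheel prediction: reject it.
    return second_pred, "pnp_rejected"
```

Passing the components as callables keeps the sketch independent of any particular optical flow or PnP library.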

[0017] Referring to Figure 2, which is a flowchart of the lawnmower visual data processing method provided in this embodiment of the invention, the method includes the following steps:

S1. Receive the first-view image of the current frame and determine the current wheel speed meter pose corresponding to the first-view image of the current frame.

[0018] The controller receives wheel speed meter (WSM) data and stereo camera image data in real time. The output frequency of WSM data is typically higher than that of image data; for example, 50 Hz for WSM and 10 Hz for stereo cameras. The controller maintains a WSM data buffer that stores WSM poses within a recent time period in timestamp order. When a new first-view image frame is received, its timestamp is recorded, and this image is defined as the current frame's first-view image. To obtain the precise WSM pose corresponding to the current frame's first-view image moment, the controller searches the buffer for the first WSM data with a timestamp greater than the current frame's first-view image timestamp and retrieves its preceding WSM data. Subsequently, using the timestamps and poses of these two WSM data sets, the WSM pose at the current frame's first-view image moment is calculated using an interpolation method. Specifically, the translation vector uses linear interpolation, and the rotation matrix uses logarithmic mapping interpolation on a Lie group to ensure the smoothness and physical meaning of the rotation interpolation. The interpolated pose is the current WSM pose corresponding to the current frame's first-view image, used in subsequent steps.
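
The buffer lookup described in this paragraph can be sketched with a binary search. This is a minimal illustration that assumes the buffer's timestamps are kept in a sorted Python list; the function name `find_bracketing_samples` is hypothetical.

```python
from bisect import bisect_right

def find_bracketing_samples(timestamps, t_image):
    """Return indices (before, after) of the two buffered samples that
    bracket t_image, or None if the buffer does not cover that time.

    `timestamps` must be sorted in ascending order.
    """
    # Index of the first sample whose timestamp is strictly greater than t_image.
    i = bisect_right(timestamps, t_image)
    if i == 0 or i == len(timestamps):
        return None  # image is older or newer than everything in the buffer
    return i - 1, i
```

Returning `None` when the image timestamp falls outside the buffer leaves the caller to decide whether to wait for more wheel speed meter data or to skip the frame; the description does not say which.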

[0019] For ease of description, this application defines the two cameras in a binocular camera as a first image acquisition unit and a second image acquisition unit, respectively. Those skilled in the art will understand that the first image acquisition unit can be either a left or right camera, and the image it acquires is correspondingly called a first-view image. Similarly, the second image acquisition unit acquires a second-view image. The first-view image and the second-view image together constitute a stereoscopic image pair.

[0020] For example, the first-view image is the left-eye image. Assume the timestamp of the current frame's left-eye image is 10.00 seconds, and the buffer stores wheel speed meter data in which timestamp 9.98 seconds corresponds to pose A and timestamp 10.02 seconds corresponds to pose B. The controller first locates the first data point later than 10.00 seconds, i.e., pose B at timestamp 10.02 seconds, and retrieves the preceding data point, i.e., pose A at timestamp 9.98 seconds. It then calculates the interpolation coefficient from the time differences, here (10.00 − 9.98) / (10.02 − 9.98) = 0.5, interpolates between pose A and pose B, and obtains the wheel speed meter pose at 10.00 seconds. This pose is the current wheel speed meter pose corresponding to the current frame's left-eye image.
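
The interpolation in step S1 can be sketched as follows, under stated assumptions: each pose is represented as a 3×3 rotation matrix plus a translation vector, translation is interpolated linearly, and rotation is interpolated through the logarithm and exponential maps of SO(3). The helper names are hypothetical, and the simple logarithm used here assumes the rotation between adjacent samples is well below 180 degrees, which is reasonable at the 50 Hz rate mentioned above.

```python
import numpy as np

def so3_log(R):
    """Rotation matrix -> rotation vector (axis times angle)."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-9:
        return np.zeros(3)
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return theta / (2.0 * np.sin(theta)) * w

def so3_exp(phi):
    """Rotation vector -> rotation matrix (Rodrigues' formula)."""
    theta = np.linalg.norm(phi)
    if theta < 1e-9:
        return np.eye(3)
    k = phi / theta
    K = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def interpolate_pose(t_query, t_a, R_a, p_a, t_b, R_b, p_b):
    """Interpolate between pose A (at t_a) and pose B (at t_b) at time t_query."""
    alpha = (t_query - t_a) / (t_b - t_a)            # interpolation coefficient
    p = (1.0 - alpha) * p_a + alpha * p_b            # linear in translation
    R = R_a @ so3_exp(alpha * so3_log(R_a.T @ R_b))  # geodesic in rotation
    return R, p
```

With the timestamps from the example (9.98 s, 10.00 s, 10.02 s) the coefficient is 0.5, so the result lies halfway between pose A and pose B in both translation and rotation angle.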

[0021] S2. Based on the visual odometry poses of at least two historical first-view images, calculate the first predicted pose of the current first-view image.

[0022] Specifically, the controller saves the pose results calculated by visual odometry from each previous first-view image; this result is called the visual odometry pose. These historical first-view images and their corresponding visual odometry poses form the basis for predicting the pose of the current frame.

[0023] To predict the pose of the current frame of the first-view image, the controller selects the two most recent historical first-view images: the previous frame and the frame before it. The controller acquires the visual odometry poses corresponding to these two historical first-view images and calculates the relative motion from the frame before the previous frame to the previous frame, which reflects the visual motion trend over a short period. Specifically, this relative motion includes rotation and translation components and describes the pose change of the previous frame relative to the frame before it.

[0024] Subsequently, the controller applies this relative motion to the visual odometry pose of the previous frame's first-view image. The application process involves multiplying the rotation matrix of the relative motion by the rotation matrix of the previous frame, and then rotating the translation vector of the relative motion to the world coordinate system and adding it to the translation vector of the previous frame. The resulting pose is the first predicted pose of the current frame's first-view image. This first predicted pose is calculated entirely based on historical visual motion information, reflecting the visual odometry's own estimation of motion.

[0025] It should be noted that for the second frame of the first-view image in the entire process, since there are no previous two frames of first-view images, the controller cannot calculate the aforementioned relative motion. In this special case, the controller directly uses the visual odometry pose of the previous frame of the first-view image as the first predicted pose of the current frame of the first-view image.

[0026] For example, suppose the controller has processed the first frame of the left-eye image and obtained its visual odometry pose P1, and processed the second frame of the left-eye image and obtained its visual odometry pose P2. When the third frame of the left-eye image arrives, the controller retrieves the historical left-eye images, i.e., the visual odometry poses P2 and P1 corresponding to the second and first frames. The controller calculates the relative motion from P1 to P2, i.e., the pose change of the second frame relative to the first frame. Then, the controller applies this relative motion to P2 to calculate the first predicted pose of the third frame of the left-eye image.

[0027] If the currently arriving image is the second frame of the first-view image, then only the historical data of the first frame of the first-view image exists. Since the previous two frames are missing, the controller directly uses the visual odometry pose P1 of the first frame of the first-view image as the first predicted pose of the second frame of the first-view image.
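
Step S2 amounts to a constant-motion prediction, which can be sketched as below. The sketch assumes each visual odometry pose is a camera-to-world rotation matrix and translation vector, and that the relative motion is right-multiplied onto the previous pose; the description does not state the multiplication order explicitly, so that order is an assumption. The function name is hypothetical.

```python
import numpy as np

def predict_from_visual_history(R_prev, t_prev, R_prev2=None, t_prev2=None):
    """First predicted pose from the last one or two visual odometry poses.

    (R_prev, t_prev)   : pose of the previous frame
    (R_prev2, t_prev2) : pose of the frame before it, or None if unavailable
    """
    if R_prev2 is None or t_prev2 is None:
        # Special case (second frame of the run): reuse the previous pose.
        return R_prev.copy(), t_prev.copy()
    # Relative motion from the frame before last to the previous frame,
    # expressed in the older frame's coordinates.
    dR = R_prev2.T @ R_prev
    dt = R_prev2.T @ (t_prev - t_prev2)
    # Apply the same motion once more, starting from the previous frame.
    R_pred = R_prev @ dR
    t_pred = t_prev + R_prev @ dt   # rotate the increment into the world frame
    return R_pred, t_pred
```

For straight-line motion with P1 at the origin and P2 at 0.1 m along x, this predicts the third frame at 0.2 m along x with unchanged orientation.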

[0028] S3. Based on the wheel speed meter pose corresponding to the first-view image of the current frame and the wheel speed meter pose corresponding to the first-view image of the previous frame, calculate the second predicted pose of the first-view image of the current frame.

[0029] In step S1, the controller has obtained the current wheel speed meter pose corresponding to the first-view image of the current frame, and at the same time, it has saved the wheel speed meter pose corresponding to the first-view image of the previous frame and the visual odometry pose of the first-view image of the previous frame from historical data.

[0030] To calculate the second predicted pose of the current frame's first-view image, the controller first uses the wheel speed meter poses corresponding to the current frame's first-view image and the previous frame's first-view image to calculate the wheel speed meter motion increment from the previous frame to the current frame. This increment consists of two parts: a rotation matrix and a translation vector. Specifically, the rotation matrix of the current frame's wheel speed meter pose is left-multiplied by the transpose of the rotation matrix of the previous frame's wheel speed meter pose, yielding the rotation increment; the translation vector of the previous frame's wheel speed meter pose is subtracted from the translation vector of the current frame's wheel speed meter pose, and the difference is then rotated into the previous frame's wheel speed meter coordinate system, yielding the translation increment. This wheel speed meter motion increment reflects the robot's motion as measured by the wheel speed meter.

[0031] Subsequently, the controller applies the wheel speed meter motion increment to the visual odometry pose of the previous frame's first-view image. The application process is as follows: the rotation increment is left-multiplied by the rotation matrix of the previous frame's visual odometry pose to obtain the rotation matrix of the current frame's second predicted pose; the translation increment is rotated into the world coordinate system and added to the translation vector of the previous frame's visual odometry pose to obtain the translation vector of the current frame's second predicted pose. The resulting pose is the second predicted pose of the current frame's first-view image.

[0032] It should be noted that if the current frame is the first frame of the entire process, there is no first-view image of the previous frame. Therefore, step S3 will not be executed, but will be handled directly by the initialization step.
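
The two calculations in step S3 can be sketched as follows. The sketch assumes poses are rotation matrices with translation vectors, and it ignores any extrinsic offset between the wheel speed meter frame and the camera frame, which the description does not discuss. Function names are hypothetical.

```python
import numpy as np

def wheel_increment(R_w_prev, t_w_prev, R_w_cur, t_w_cur):
    """Motion increment between two wheel speed meter poses,
    expressed in the previous wheel speed meter frame."""
    dR = R_w_prev.T @ R_w_cur                 # rotation increment
    dt = R_w_prev.T @ (t_w_cur - t_w_prev)    # translation increment
    return dR, dt

def predict_from_wheel(R_vo_prev, t_vo_prev, dR, dt):
    """Second predicted pose: apply the wheel increment to the previous
    frame's visual odometry pose."""
    R_pred = R_vo_prev @ dR
    t_pred = t_vo_prev + R_vo_prev @ dt       # increment rotated to world frame
    return R_pred, t_pred
```

Because only the increment is taken from the wheel speed meter, accumulated wheel drift from earlier frames does not carry into the prediction; only the slip within the current inter-frame interval can affect it.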

[0033] S4. Using the first predicted pose, the corner points with three-dimensional coordinates in the previous frame first-view image are projected onto the current frame first-view image to obtain the first projection point, and the first projection point is used as the initial value to perform optical flow tracking on the corner points; if the number of successfully tracked corner points meets the first threshold condition, the coordinates of the tracked corner points are used as the coordinates of the corner points with three-dimensional coordinates in the current frame first-view image.

[0034] In this process, the controller has already calculated the first predicted pose of the first-view image in the current frame. Simultaneously, the controller stores information on all corner points from the previous first-view image. These corner points are divided into two categories, one of which has already had its corresponding 3D coordinates calculated through triangulation. Step S4 only processes this category of corner points with 3D coordinates.

[0035] The controller acquires each corner point with 3D coordinates from the previous frame's first-view image, along with the corresponding 3D coordinates. For each such corner point, the controller reprojects its 3D coordinates onto the imaging plane of the current frame's first-view image using the first predicted pose. This projection process requires the camera's intrinsic parameters; in essence, it determines where a 3D point would be imaged by a camera placed at the predicted pose of the current frame. The resulting series of pixel positions are called the first projection points. These first projection points are the initial search positions for finding each corner point in the current frame's first-view image, i.e., the initial values for optical flow tracking.
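
The reprojection described in this paragraph can be sketched with a pinhole camera model. The assumptions are: the predicted pose is camera-to-world (rotation `R_wc`, position `t_wc`), the intrinsics are the focal lengths `fx`, `fy` and principal point `cx`, `cy`, and lens distortion is ignored. None of these conventions is stated in the description, and the function name is hypothetical.

```python
import numpy as np

def project_points(points_w, R_wc, t_wc, fx, fy, cx, cy):
    """Project Nx3 world points into the image of a camera at pose (R_wc, t_wc).

    Returns (pixels, in_front): an Nx2 array of pixel coordinates and a
    boolean mask marking points that lie in front of the camera.
    """
    points_w = np.asarray(points_w, dtype=float)
    # World -> camera: p_c = R_wc^T (p_w - t_wc), written for row vectors.
    pts_cam = (points_w - t_wc) @ R_wc
    z = pts_cam[:, 2]
    in_front = z > 1e-6
    safe_z = np.where(in_front, z, 1.0)       # avoid division by zero
    u = fx * pts_cam[:, 0] / safe_z + cx
    v = fy * pts_cam[:, 1] / safe_z + cy
    return np.stack([u, v], axis=1), in_front
```

Only the pixels where `in_front` is true are meaningful as initial values for tracking; points behind the camera would be dropped by the caller.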

[0036] Subsequently, the controller uses the original pixel coordinates of these corner points in the previous frame's first-view image and their corresponding first projection points as input to perform optical flow tracking on the current frame's first-view image. The purpose of optical flow tracking is to find, in the current frame's first-view image, the position that best matches each corner pixel from the previous frame's first-view image. For each corner point, the optical flow algorithm outputs a tracking result, including its pixel coordinates in the current frame's first-view image and a flag indicating whether the tracking was successful.

[0037] After tracking is complete, the controller counts the number of successfully tracked corner points. If this number meets a preset first threshold condition, for example, if the number of successfully tracked corner points exceeds one-third of the total number of corner points with 3D coordinates in the previous frame's first-view image, then the visual tracking guided by the first predicted pose is considered to have achieved good results. At this point, the controller records the coordinates of those successfully tracked corner points in the current frame's first-view image as their new coordinates in the current frame's first-view image, retaining their 3D coordinate attribute for use in subsequent steps.

[0038] If the number of successfully tracked corner points does not meet the first threshold condition, the controller ends the execution of step S4 and proceeds to step S5.

[0039] For example, suppose there are 30 corner points with 3D coordinates in the previous frame's left-eye image. Their pixel coordinates and 3D coordinates are known. The controller uses the first predicted pose obtained in step S2 to project the 3D coordinates of these 30 corner points one by one onto the current frame's left-eye image, obtaining 30 first projection points. Each projection point corresponds to the estimated position of a corner point in the current frame's left-eye image.

[0040] Next, the controller uses the pixel coordinates of these 30 corner points in the previous frame as the starting point and the corresponding 30 first projection points as the initial tracking values to perform optical flow tracking on the left eye image of the current frame. After tracking, it was found that 25 corner points were successfully tracked to their precise positions, while the tracking of the other 5 corner points failed.

[0041] The preset first threshold condition is that the number of successfully tracked corner points must be greater than 33% of the total number of corner points with 3D coordinates in the previous frame, that is, greater than 30 multiplied by 33%, which is approximately 10. Since 25 is greater than 10, the condition is met. Therefore, the controller records the coordinates of these 25 successfully tracked corner points in the left eye image of the current frame as the new coordinates of these corner points in this frame, and marks them as still having corresponding 3D coordinates. After step S4 is completed, the subsequent steps will proceed. If the number of successfully tracked corner points is only 8, which does not meet the condition of being greater than 10, the controller will start step S5.
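
The arithmetic in this example can be checked with a few lines. This is only a sketch of the example threshold (more than one third of the candidate corner points); the function name and the `num`/`den` parameters are hypothetical, and integer arithmetic is used so that the boundary case of exactly one third is not affected by rounding.

```python
def meets_threshold(n_tracked, n_candidates, num=1, den=3):
    """True if n_tracked is strictly greater than num/den of n_candidates."""
    return den * n_tracked > num * n_candidates

# Values from the example above: 30 candidate corner points.
print(meets_threshold(25, 30))  # 25 tracked: condition met, continue after S4
print(meets_threshold(8, 30))   # 8 tracked: condition not met, go to step S5
```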

[0042] S5. If the number of successfully tracked corner points does not meet the first threshold condition, the corner points with three-dimensional coordinates in the previous frame's first-view image are projected onto the current frame's first-view image using the second predicted pose to obtain the second projection point, and optical flow tracking is performed on the corner points using the second projection point as the initial value; if the number of successfully tracked corner points meets the second threshold condition, the coordinates of the tracked corner points are used as the coordinates of the corner points with three-dimensional coordinates in the current frame's first-view image; if the number of successfully tracked corner points does not meet the second threshold condition, the second predicted pose is used as the visual odometry pose of the current frame's first-view image, and the successfully tracked corner points are used as the coordinates of the corner points with three-dimensional coordinates in the current frame's first-view image.

[0043] Specifically, step S5 is triggered when the number of successfully tracked corner points does not meet the first threshold condition. At this time, the controller abandons the first predicted pose and instead uses the second predicted pose calculated in step S3 to retry tracking the corner points in the previous frame's first-view image.

[0044] The controller acquires all corner points with 3D coordinates from the previous frame's first-view image, along with their respective 3D coordinates. For each such corner point, the controller reprojects its 3D coordinates onto the imaging plane of the current frame's first-view image using a second predicted pose, resulting in a series of second projection points. These second projection points serve as initial values for optical flow tracking, used to find the optimal matching position for each corner point in the current frame's first-view image.

[0045] Subsequently, the controller uses the original pixel coordinates of these corner points in the previous frame's first-view image and the corresponding second projection points as input to perform optical flow tracking on the current frame's first-view image. After optical flow tracking is completed, for each corner point, the controller obtains its pixel coordinates in the current frame's first-view image and a flag indicating whether the tracking was successful.

[0046] After tracking is complete, the controller counts the number of successfully tracked corner points and branches on whether this number meets the second threshold condition. In the first case, the number of successfully tracked corner points meets the second threshold condition; for example, the second threshold condition may be that the number of successfully tracked corner points exceeds one-third of the total number of corner points with three-dimensional coordinates in the first-view image of the previous frame. The controller then records the coordinates of the successfully tracked corner points in the first-view image of the current frame as the new coordinates of these corner points in this frame, and retains their attribute of having three-dimensional coordinates.

[0047] In the second case, if the number of successfully tracked corner points does not meet the second threshold condition, visual tracking guided by the second predicted pose has also failed. In this case, the controller no longer attempts visual tracking and directly uses the second predicted pose as the visual odometry pose of the current frame's first-view image. At the same time, the controller records the coordinates, in the current frame's first-view image, of the corner points that were successfully tracked in this round of optical flow tracking, and uses them as the coordinates of the corner points with three-dimensional coordinates in the current frame's first-view image, even if only a few such corner points remain.

[0048] It should be noted that the second threshold condition can be the same as the first threshold condition, or it can be set to different values according to actual needs. Its essence is to determine whether the optical flow tracking based on the current predicted pose has reached an acceptable quality level.

[0049] For example, suppose that in step S4, optical flow tracking based on the first predicted pose succeeds for only 8 corner points, which does not meet the first threshold condition. The controller proceeds to step S5, acquires the 30 corner points with three-dimensional coordinates from the previous frame's left-eye image, and projects their three-dimensional coordinates onto the current frame's left-eye image using the second predicted pose, obtaining 30 second projection points as new initial values.

[0050] The controller uses the pixel coordinates of these 30 corner points in the previous frame as the starting point and the 30 second projection points as initial values to perform optical flow tracking again on the left-eye image of the current frame. After tracking, 15 corner points are successfully tracked to their precise positions, while tracking of the other 15 corner points fails.

[0051] The preset second threshold condition is that the number of successfully tracked corner points must be greater than 33% of the total number of corner points with 3D coordinates in the previous frame, i.e., greater than 10. Since 15 is greater than 10, the condition is met. Therefore, the controller records the coordinates of these 15 successfully tracked corner points in the left eye image of the current frame as the new coordinates of these corner points in this frame. Subsequently, the process proceeds to step S6.

[0052] Suppose that the second tracking result only successfully tracks 5 corner points, which is less than 10 and does not meet the second threshold condition. In this case, the controller directly uses the second predicted pose as the visual odometry pose of the current frame's left-eye image. At the same time, the controller records the coordinates of these 5 successfully tracked corner points in the current frame's left-eye image as the corner point coordinates with three-dimensional coordinates in the current frame's left-eye image.
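As a non-limiting illustration, the branching between steps S4 and S5 in the example above can be sketched as follows. The names `meets_threshold`, `track_with_fallback` and the `track` callback are hypothetical and are not defined in this application; the callback stands in for any optical flow tracker that returns one result per corner point (a pixel position, or `None` on failure). The 33% ratio is taken from the example; everything else is an assumption of the sketch.

```python
from typing import Callable, Dict, List, Optional, Tuple

Point = Tuple[float, float]
# Hypothetical tracker interface (an assumption of this sketch): given one initial
# guess per corner, return the tracked pixel position, or None where tracking failed.
Tracker = Callable[[List[Point]], List[Optional[Point]]]


def meets_threshold(n_tracked: int, n_total: int, ratio: float = 0.33) -> bool:
    """True when the tracked count is strictly greater than ratio * n_total."""
    return n_tracked > ratio * n_total


def _successes(result: List[Optional[Point]]) -> Dict[int, Point]:
    """Keep only the successfully tracked corners, keyed by corner index."""
    return {i: p for i, p in enumerate(result) if p is not None}


def track_with_fallback(
    first_proj: List[Point],
    second_proj: List[Point],
    track: Tracker,
    ratio1: float = 0.33,
    ratio2: float = 0.33,
) -> Tuple[Dict[int, Point], str]:
    """Try the first projection points (S4), then the second ones (S5).

    Returns the tracked corners and the stage reached:
    'first', 'second', or 'odom_fallback' (second predicted pose used as-is).
    """
    n_total = len(first_proj)
    tracked = _successes(track(first_proj))      # step S4
    if meets_threshold(len(tracked), n_total, ratio1):
        return tracked, "first"
    tracked = _successes(track(second_proj))     # step S5
    if meets_threshold(len(tracked), n_total, ratio2):
        return tracked, "second"
    # Both attempts failed: the few tracked corners are kept, and the caller
    # is expected to adopt the second predicted pose as the visual odometry pose.
    return tracked, "odom_fallback"
```

In the `'first'` and `'second'` cases the pose is still determined afterwards in step S6; only in the `'odom_fallback'` case is the second predicted pose adopted directly.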

[0053] S6. Using the coordinates of the corner points with three-dimensional coordinates in the first-view image of the current frame and the corresponding three-dimensional coordinates, calculate the PnP pose of the first-view image of the current frame, and determine whether the deviation between the PnP pose and the second predicted pose is less than a preset deviation threshold and whether the corresponding inliers satisfy the inlier condition. If so, use the PnP pose as the visual odometry pose of the first-view image of the current frame; otherwise, use the second predicted pose as the visual odometry pose of the first-view image of the current frame.

[0054] Specifically, in step S4 or S5, the controller has obtained the coordinates of corner points with three-dimensional coordinates in the current frame's first-view image. These corner points possess two attributes: one is their two-dimensional pixel coordinates in the current frame's first-view image, and the other is their three-dimensional coordinates in the world coordinate system obtained through historical triangulation. The controller uses these paired two-dimensional and three-dimensional data to solve the pose of the current frame's first-view image relative to the world coordinate system using the PnP algorithm; this pose is called the PnP pose.

[0055] During the PnP algorithm solution process, the controller calculates the reprojection error for each pair of corner data. Specifically, the 3D coordinates of the corner are projected onto the imaging plane of the first-view image of the current frame according to the currently solved pose, obtaining the projected point. Then, the pixel distance between this projected point and the actual tracked corner coordinates is calculated; this distance is the reprojection error. The controller classifies corners with reprojection errors greater than a preset error threshold as outliers and discards them; corners with reprojection errors less than or equal to the preset error threshold are classified as inliers, and pose optimization is performed based on these inliers. The final PnP pose and inlier information are recorded together.

[0056] After obtaining the PnP pose, the controller compares it with the second predicted pose calculated in step S3. The comparison mainly focuses on the deviation between the translation vectors of the two poses, specifically calculating the Euclidean distance between the translation vector of the PnP pose and the translation vector of the second predicted pose. This distance reflects the degree of difference in position between the visual solution and the wheel speed sensor prediction.

[0057] Subsequently, the controller determines the final visual odometry pose based on two conditions. The first condition is whether the deviation of the aforementioned translation vector is less than a preset deviation threshold. The second condition is whether the inliers obtained during the PnP solution meet preset inlier conditions, which typically include the absolute number of inliers and the proportion of inliers to the total number of corner points involved in the solution. If both conditions are met simultaneously—that is, the translation deviation is sufficiently small and the inlier quality is sufficiently high—the controller considers the PnP solution result reliable and uses the PnP pose as the visual odometry pose for the first-view image of the current frame. If either condition is not met, the controller considers the PnP solution result may be affected by noise or mismatches and instead uses the second predicted pose as the visual odometry pose for the first-view image of the current frame to ensure the continuity and stability of the pose output. After determining the visual odometry pose, the controller stores it for use in subsequent frames.

[0058] For example, suppose there are 15 corner points with 3D coordinates in the first-view image of the current frame, each with its own 2D and 3D coordinates. The controller calls the PnP algorithm to solve for these 15 sets of data. During the solution process, the algorithm calculates the reprojection error of each corner point, and points with an error greater than 2 pixels are considered outliers. After elimination, 12 corner points are finally identified as inliers, and the PnP pose $T_{pnp}$ is optimized based on these 12 inliers.

[0059] The controller obtains the second predicted pose $T_{odom}$ of the current frame from step S3 and calculates the distance between the translation vectors of $T_{pnp}$ and $T_{odom}$, which is 0.1 meters. The preset deviation threshold is 0.5 meters; 0.1 meters is less than 0.5 meters, thus satisfying the first condition.

[0060] At the same time, the controller checks the inlier condition. The preset inlier condition is that the number of inliers is greater than 8 and the proportion of inliers is greater than 60%. Here, the number of inliers is 12, which is greater than 8; the proportion of inliers is 12 divided by 15, which equals 80% and is greater than 60%. Both sub-conditions are met, so the inlier condition is satisfied.

[0061] Since both the translation deviation condition and the inlier condition are satisfied, the controller uses $T_{pnp}$ as the visual odometry pose of the first-view image of the current frame and stores it.

[0062] In another scenario, the calculated translation deviation is 0.6 meters, exceeding the preset deviation threshold of 0.5 meters. Even if the inlier condition is met, the controller still considers that $T_{pnp}$ deviates too much from the wheel speed sensor's prediction, which may indicate a visual tracking error; it therefore abandons $T_{pnp}$ and instead uses $T_{odom}$ as the visual odometry pose of the first-view image of the current frame.
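The decision of step S6 illustrated by the two scenarios above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `select_vo_pose` and its parameters are hypothetical, the default thresholds (0.5 meters, more than 8 inliers, more than 60% inliers) are the example values only, and only the translation vectors of the two poses are compared, as described above.

```python
import numpy as np


def select_vo_pose(
    t_pnp,
    t_odom,
    n_inliers: int,
    n_points: int,
    max_deviation_m: float = 0.5,
    min_inliers: int = 8,
    min_inlier_ratio: float = 0.6,
) -> str:
    """Return 'pnp' if the PnP pose is accepted, otherwise 'odom'.

    t_pnp / t_odom: translation vectors of the PnP pose and of the second
    predicted pose (wheel-based prediction), in the same world frame.
    """
    # Condition 1: Euclidean distance between the two translation vectors.
    deviation = float(np.linalg.norm(np.asarray(t_pnp, float) - np.asarray(t_odom, float)))
    deviation_ok = deviation < max_deviation_m

    # Condition 2: enough inliers, both in absolute number and in proportion.
    inliers_ok = (
        n_points > 0
        and n_inliers > min_inliers
        and n_inliers / n_points > min_inlier_ratio
    )

    return "pnp" if (deviation_ok and inliers_ok) else "odom"


# Example values from the text: 12 inliers out of 15 corner points.
choice_small_dev = select_vo_pose([0.1, 0.0, 0.0], [0.0, 0.0, 0.0], 12, 15)
choice_large_dev = select_vo_pose([0.6, 0.0, 0.0], [0.0, 0.0, 0.0], 12, 15)
```

Falling back to the wheel-based prediction whenever either check fails keeps the pose output continuous at the cost of temporarily inheriting wheel slip error.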

[0063] In one possible embodiment, the method further includes: when the current frame first-view image is the first frame image, initializing the visual odometry pose of the current frame first-view image and setting the current frame first-view image as a keyframe. During initialization, the visual odometry pose of the current frame first-view image is set as follows: the position of the current frame first-view image is set to the zero vector, and the rotation matrix of the attitude is set to the identity matrix.

[0064] Specifically, when the entire visual odometry tracking process starts, the first frame of the first-view image received by the controller has no previous pose information for reference, so a special initialization operation must be performed. This initialization step is triggered when the controller determines that the first-view image being processed is the first frame of the entire image sequence.

[0065] The core of the initialization operation is to assign an initial visual odometry pose to the current frame's first-view image. Since there is no historical motion information or external reference at this point, the controller sets the position of the current frame's first-view image to the zero vector, meaning that the camera's optical center for this frame is located at (0, 0, 0) in the world coordinate system. At the same time, the controller sets the rotation matrix of the current frame's first-view image to the identity matrix. Using the identity matrix means that at this moment the camera's three coordinate axes are aligned with the three coordinate axes of the world coordinate system, without any rotation. Through this setting, the camera coordinate system of the current frame's first-view image is established as the world coordinate system, and the visual odometry poses of all subsequent frames are calculated relative to this initial coordinate system.

[0066] After completing the initial pose setting, the controller marks the current frame's first-view image as a keyframe. Keyframes are specific image frames used in visual odometry systems to build, maintain, and optimize maps; typically, representative frames are selected as keyframes.

[0067] In one possible embodiment, the method of this application further includes: Optical flow tracking is performed on corner points that do not have three-dimensional coordinates in the previous frame first-view image to obtain the coordinates of corner points that do not have three-dimensional coordinates in the current frame first-view image.

[0068] Specifically, the controller has completed the tracking of the corner points with 3D coordinates in the previous frame's first-view image and determined their positions in the current frame's first-view image. However, there are other corner points in the previous frame's first-view image. These corner points were only recently detected, or their 3D coordinates have not yet been obtained through triangulation due to previous tracking failures. Special processing is performed on these corner points to carry them over to the current frame so that their 3D coordinates can be established for them at an appropriate time later.

[0069] The controller first reads all corner points without 3D coordinates from the stored data of the previous frame's first-view image and obtains their pixel coordinates in the previous frame. For each such corner point, the controller directly uses its pixel coordinates from the previous frame as the initial search position for optical flow tracking; that is, it uses these coordinates as initial values to perform Lucas-Kanade optical flow tracking on the current frame's first-view image. Since these corner points have no 3D coordinates, they cannot be projected with a predicted pose in the way that corner points with 3D coordinates are; they can only be matched directly based on the grayscale consistency between images.

[0070] The optical flow tracking algorithm finds the optimal matching position in the current frame's first-view image for each input corner point from the previous frame and outputs a flag indicating whether the tracking was successful. For successfully tracked corner points, the controller records their precise pixel coordinates in the current frame's first-view image; these are the coordinates of the corner points without three-dimensional coordinates in the current frame's first-view image. Corner points that fail to track are discarded and not retained.

[0071] After tracking is complete, all corner points that successfully extend to the current frame but lack 3D coordinates, along with their corner point coordinates in the current frame, are stored in the corner point list of the current frame. These corner points, together with the corner points with 3D coordinates obtained in step S4 or S5, constitute the complete corner point set of the first-view image of the current frame. Subsequently, when the current frame is determined to be a keyframe, these corner points without 3D coordinates will have the opportunity to obtain 3D coordinates through stereo matching with the second-view image, thereby completing the conversion from new corner points to map points.

[0072] For example, suppose that in the left-eye image of the previous frame, in addition to the 30 corner points with 3D coordinates, there are 20 newly detected corner points whose 3D coordinates have not yet been obtained. In step S4, the controller has already processed the 30 corner points with 3D coordinates. The controller obtains the pixel coordinates of these 20 corner points without 3D coordinates in the previous frame, where the coordinates of the first corner point are (100, 150).

[0073] Using (100, 150) as the initial value, the controller runs the optical flow algorithm on the left-eye image of the current frame to find the new position of the corner point after its movement. The algorithm calculates the best-matching coordinates of the corner point in the current frame as (105, 148) and returns a successful tracking flag. The controller records these coordinates as the coordinates of a corner point without three-dimensional coordinates in the left-eye image of the current frame. For the other 19 corner points, optical flow tracking succeeds for some and fails for others. Assume that ultimately 15 corner points are successfully tracked and 5 corner points fail to track.

[0074] After tracking is complete, the left-eye image of the current frame will contain 30 corner points with 3D coordinates from step S4, and 15 corner points without 3D coordinates. These 45 corner points together constitute the feature point set of the current frame. If the current frame is subsequently selected as a keyframe, the controller will use the matching of these 15 new corner points between the left and right-eye images to calculate their 3D coordinates through triangulation, making them map points with 3D coordinates.

[0075] This embodiment ensures that new corner points in the image whose 3D coordinates have not yet been obtained can be stably tracked across consecutive image frames, preventing these potentially useful features from being discarded due to a temporary lack of depth information. This provides a continuous source of material for subsequent map point expansion, enabling visual odometry to continuously introduce new environmental features and adapt to changes in the scene.

[0076] In one possible embodiment, the method of this application further includes: Determine whether the current frame's first-view image is a keyframe; If so, update the three-dimensional coordinates of the corner point using the matching relationship between the first-view image of the current frame and the second-view image of the current frame.

[0077] Specifically, after obtaining the visual odometry pose of the first-view image of the current frame, the controller needs to determine whether the frame should be selected as a keyframe based on preset conditions. Keyframes are specific image frames used to build and optimize maps, and their selection is mainly based on the relative motion between the current frame and the previous keyframe, as well as the feature point tracking quality of the current frame.

[0078] The controller first acquires the visual odometry pose of the first-view image of the previous keyframe. It then calculates the positional distance between the visual odometry pose of the current frame's first-view image and the pose of the previous keyframe, as the Euclidean distance between the two translation vectors. It also calculates the rotation angle between the two poses, by applying the inverse cosine function to the trace of the relative rotation between the two rotation matrices. Furthermore, the controller counts the total number of successfully tracked corner points in the current frame's first-view image; these include both the corner points that already have 3D coordinates and those that do not yet have them. If the positional distance is greater than a preset distance threshold, or the rotation angle is greater than a preset angle threshold, or the number of successfully tracked corner points is less than a preset number threshold, the controller determines the current frame's first-view image to be a keyframe. If none of these three conditions is met, the current frame is a non-keyframe, and the controller ends processing for that frame and waits for the next frame.

[0079] If the current frame is determined to be a keyframe, the controller performs triangulation to update the 3D coordinates of the corner points. The controller first acquires all corner points in the first-view image of the current frame that have been successfully tracked and recorded. For each such corner point, the controller uses its pixel coordinates in the first-view image of the current frame as the initial search position and performs optical flow tracking on the second-view image of the current frame to find the corresponding point in the second-view image. After successful optical flow tracking, the controller obtains the precise pixel coordinates of the corner point in the second-view image of the current frame. Thus, for each corner point for which a left-right match is successfully established, the controller has the 2D coordinates of the corner point in the first-view (left-eye) image and in the second-view (right-eye) image, the visual odometry pose of the first-view image of the current frame, and the pre-calibrated extrinsic parameters between the left and right eyes.

[0080] Next, the controller uses the aforementioned data to perform triangulation calculations. The triangulation algorithm is based on multi-view geometry, solving for the world coordinates of a point in space using observations from two different viewpoints and their corresponding camera poses. After calculating the 3D coordinates, the controller also needs to verify their validity. The first step of the verification is to reproject the 3D coordinates onto multiple historical image frames where the corner point was successfully tracked, calculating the reprojection error between the pixel coordinates obtained from each projection and the pixel coordinates obtained from actual tracking, ensuring that all errors are less than a preset reprojection threshold. The second step of the verification is to check the depth value of the 3D coordinates in the left-eye camera coordinate system of the current frame, i.e., the Z-coordinate of the point in the camera coordinate system, ensuring that this depth value is greater than a preset depth threshold. Only corner points that pass both verifications are considered to have valid 3D coordinates. For corner points that already have 3D coordinates, the triangulation results will be used to update and optimize their original coordinates; for corner points that did not originally have 3D coordinates, this triangulation assigns them initial 3D coordinates, making them corner points with 3D coordinates. Corner points that fail the verification are not recorded in 3D; they are still marked as having no 3D coordinates and will be retried in subsequent frames. After the triangulation update is complete, the controller stores the updated corner point information for use in subsequent image frames.

[0081] For example, suppose the previous keyframe was frame 10, with a visual odometry pose of T10. The current frame is frame 15, with a visual odometry pose of T15 and a total of 25 successfully tracked corner points. The controller calculates the positional distance between T10 and T15 to be 0.4 meters and the rotation angle to be 20 degrees. The preset keyframe distance threshold is 0.3 meters, the angle threshold is 30 degrees, and the corner count threshold is 10. Since 0.4 meters is greater than 0.3 meters, the distance condition is met, and frame 15 is therefore determined to be a keyframe.
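The keyframe test in this example can be sketched as follows. This is an illustrative sketch only: the names `rot_z`, `rotation_angle_deg` and `is_keyframe` are hypothetical, the default thresholds are the example values, and the rotation angle is computed with the standard identity θ = arccos((tr(R_rel) − 1) / 2), where R_rel is the relative rotation between the two poses.

```python
import numpy as np


def rot_z(angle_deg: float) -> np.ndarray:
    """Rotation matrix about the z axis (helper used only to build examples)."""
    a = np.radians(angle_deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0,        0.0,       1.0]])


def rotation_angle_deg(R_a: np.ndarray, R_b: np.ndarray) -> float:
    """Angle of the relative rotation R_a^T R_b, via arccos of its trace."""
    R_rel = R_a.T @ R_b
    cos_theta = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))


def is_keyframe(R_kf, t_kf, R_cur, t_cur, n_tracked: int,
                dist_thresh_m: float = 0.3,
                angle_thresh_deg: float = 30.0,
                min_tracked: int = 10) -> bool:
    """Keyframe if the camera moved far enough, turned far enough,
    or too few corner points are still tracked."""
    distance = float(np.linalg.norm(np.asarray(t_cur, float) - np.asarray(t_kf, float)))
    angle = rotation_angle_deg(np.asarray(R_kf, float), np.asarray(R_cur, float))
    return bool(distance > dist_thresh_m
                or angle > angle_thresh_deg
                or n_tracked < min_tracked)


# Example from the text: 0.4 m travelled, 20 degrees turned, 25 corners tracked.
frame15_is_keyframe = is_keyframe(np.eye(3), np.zeros(3),
                                  rot_z(20.0), np.array([0.4, 0.0, 0.0]), 25)
```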

[0082] After being identified as a keyframe, the controller processes all corner points on the left-eye image of frame 15. Assuming there are 50 corner points on the left-eye image, 30 of which already have 3D coordinates, and 20 do not, the controller uses the coordinates of these 50 corner points on the left-eye image as initial values and performs optical flow tracking on the right-eye image. Ultimately, 45 corner points successfully find corresponding points, while tracking of 5 corner points fails. For each successfully matched corner point, the controller uses the left-eye pose T15, left and right-eye extrinsic parameters, and the coordinates of the matching points on both eyes to calculate the 3D coordinates of that corner point using a triangulation algorithm.

[0083] Taking one corner point as an example, the triangulated 3D coordinates are X. The controller retrieves the historical frames in which this corner point was successfully tracked, namely frames 12, 13, 14, and 15, and reprojects X onto each of these frames. The reprojection errors are 1.2 pixels, 1.5 pixels, 1.1 pixels, and 1.3 pixels respectively, all less than the preset threshold of 2 pixels. In addition, the depth of X in the left-eye camera coordinate system of the current frame is 5 meters, greater than the preset threshold of 0.5 meters. Therefore, the 3D coordinates of this corner point are accepted. For the 30 corner points that already have 3D coordinates, their coordinates are optimized and updated; the 15 corner points that originally had no 3D coordinates obtain 3D coordinates for the first time and become corner points with 3D coordinates. The remaining 5 corner points, whose tracking on the right-eye image failed, are not triangulated and remain in a state without 3D coordinates. The updated corner point information is stored for tracking and pose calculation in subsequent frames.
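A minimal sketch of the triangulation and the two validity checks is given below. It assumes a linear (DLT) triangulation from two 3x4 projection matrices, which is one common choice and is not stated to be the algorithm used in this application; the names `triangulate`, `project` and `is_valid_point` are hypothetical, and the thresholds of 2 pixels and 0.5 meters are the example values.

```python
import numpy as np


def triangulate(P_left, P_right, uv_left, uv_right) -> np.ndarray:
    """Linear (DLT) triangulation of one point from two 3x4 projection matrices."""
    A = np.stack([
        uv_left[0] * P_left[2] - P_left[0],
        uv_left[1] * P_left[2] - P_left[1],
        uv_right[0] * P_right[2] - P_right[0],
        uv_right[1] * P_right[2] - P_right[1],
    ])
    # The homogeneous solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X_h = vt[-1]
    return X_h[:3] / X_h[3]


def project(P, X) -> np.ndarray:
    """Project a 3D world point with a 3x4 projection matrix to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def is_valid_point(X, observations, R_cw, t_cw,
                   max_reproj_px: float = 2.0, min_depth_m: float = 0.5) -> bool:
    """Accept X only if it reprojects well in every frame and is deep enough.

    observations: list of (3x4 projection matrix, observed pixel) pairs for the
    frames in which the corner point was tracked.
    R_cw, t_cw: world-to-camera rotation and translation of the current left camera.
    """
    for P, uv in observations:
        if np.linalg.norm(project(P, X) - np.asarray(uv, float)) >= max_reproj_px:
            return False
    depth = (np.asarray(R_cw, float) @ X + np.asarray(t_cw, float))[2]
    return bool(depth > min_depth_m)


# Synthetic stereo pair (assumed values): identical intrinsics, 0.1 m baseline.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
P_l = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P_r = K @ np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])
X_true = np.array([0.2, -0.1, 5.0])
uv_l, uv_r = project(P_l, X_true), project(P_r, X_true)
X_est = triangulate(P_l, P_r, uv_l, uv_r)
accepted = is_valid_point(X_est, [(P_l, uv_l), (P_r, uv_r)], np.eye(3), np.zeros(3))
```

A corner point that fails either check would keep its "no 3D coordinates" status and be retried at a later keyframe, as described above.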

[0084] This embodiment employs a keyframe determination mechanism. This method performs computationally intensive triangulation operations only on frames that meet specific conditions, avoiding redundant processing on every image frame. This significantly reduces the system's computational burden and ensures real-time performance. The appropriate selection of keyframes ensures the representativeness and uniformity of map points, providing a foundation for subsequent pose optimization and loop closure detection.

[0085] In one possible embodiment, determining the current wheel speed sensor pose corresponding to the current frame first-view image includes: looking up in the cache the wheel speed sensor pose whose timestamp $t_o$ is greater than the timestamp $t_i$ of the current frame's first-view image, and the wheel speed sensor pose with timestamp $t_{o-1}$; calculating the interpolation coefficient based on the timestamps $t_i$, $t_o$ and $t_{o-1}$; and interpolating between the wheel speed sensor pose at timestamp $t_o$ and the wheel speed sensor pose at timestamp $t_{o-1}$ according to the interpolation coefficient to obtain the current wheel speed sensor pose corresponding to the current frame first-view image.

[0086] Specifically, the formula for calculating the interpolation coefficient λ is as follows: $\lambda = \dfrac{t_i - t_{o-1}}{t_o - t_{o-1}}$.

[0087] Based on the interpolation coefficient, the formula for interpolating between the wheel speed sensor pose at timestamp $t_o$ and the wheel speed sensor pose at timestamp $t_{o-1}$ to obtain the corresponding current wheel speed sensor pose is: $R_i = R_{o-1}\exp\left(\lambda\,\log\left(R_{o-1}^{\top} R_o\right)\right)$; $p_i = (1-\lambda)\,p_{o-1} + \lambda\,p_o$; where $(R_o, p_o)$ represents the rotation and position of the wheel speed sensor pose corresponding to timestamp $t_o$, $(R_{o-1}, p_{o-1})$ represents the rotation and position of the wheel speed sensor pose corresponding to timestamp $t_{o-1}$, and exp() and log() are the exponential and logarithmic maps defined on the three-dimensional special orthogonal Lie group SO(3), respectively.
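The timestamp interpolation can be sketched as follows. This is a sketch under stated assumptions rather than the formula of this application: it takes λ = (t_i − t_{o−1}) / (t_o − t_{o−1}), interpolates the rotation as R_{o−1}·exp(λ·log(R_{o−1}ᵀ R_o)) on SO(3), and interpolates the position linearly. The function names are hypothetical, and the logarithm below does not handle rotations close to 180 degrees, which are not expected between two consecutive wheel speed sensor samples.

```python
import numpy as np


def hat(w) -> np.ndarray:
    """Skew-symmetric matrix of a 3-vector."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])


def so3_exp(w) -> np.ndarray:
    """Exponential map: rotation vector -> rotation matrix (Rodrigues formula)."""
    w = np.asarray(w, float)
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3) + hat(w)          # first-order approximation near zero
    K = hat(w / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def so3_log(R) -> np.ndarray:
    """Logarithmic map: rotation matrix -> rotation vector (angle below pi)."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-12:
        return np.zeros(3)
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return theta / (2.0 * np.sin(theta)) * w


def interpolate_pose(t_i, t_prev, R_prev, p_prev, t_next, R_next, p_next):
    """Interpolate the wheel-based pose at image time t_i, t_prev <= t_i < t_next."""
    lam = (t_i - t_prev) / (t_next - t_prev)
    R_i = R_prev @ so3_exp(lam * so3_log(R_prev.T @ R_next))
    p_i = (1.0 - lam) * np.asarray(p_prev, float) + lam * np.asarray(p_next, float)
    return R_i, p_i


# Assumed sample: image halfway between two wheel samples, 90 degree turn about z.
R_next = so3_exp([0.0, 0.0, np.pi / 2])
R_mid, p_mid = interpolate_pose(0.05, 0.0, np.eye(3), np.zeros(3),
                                0.1, R_next, np.array([1.0, 0.0, 0.0]))
```

Interpolating the rotation on SO(3) rather than element-wise keeps the result a valid rotation matrix.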

[0088] In one possible embodiment, calculating the first predicted pose of the current frame of the first-view image based on the visual odometry pose of at least two historical first-view images includes: The visual motion increment is calculated based on the visual odometry pose corresponding to the first-view image two frames prior and the visual odometry pose corresponding to the first-view image of the previous frame. The first predicted pose of the first-view image of the current frame is calculated based on the visual motion increment.

[0089] Specifically, let the visual odometry pose corresponding to the first-view image two frames prior be $T_{i-2}$, and let the visual odometry pose corresponding to the first-view image of the previous frame be $T_{i-1}$. The visual motion increment $\Delta T_{vo}$ is calculated as $\Delta T_{vo} = T_{i-2}^{-1}\,T_{i-1}$. Then, the first predicted pose $T_{pred1}$ of the first-view image of the current frame is calculated according to the following formula: $T_{pred1} = T_{i-1}\,\Delta T_{vo}$.

[0090] In one possible embodiment, calculating the second predicted pose of the current frame's first-view image based on the wheel speed sensor pose corresponding to the current frame's first-view image and the wheel speed sensor pose corresponding to the previous frame's first-view image includes: calculating the wheel speed sensor motion increment based on the wheel speed sensor pose corresponding to the first-view image of the previous frame and the wheel speed sensor pose corresponding to the first-view image of the current frame; and calculating the second predicted pose of the first-view image of the current frame based on the wheel speed sensor motion increment.

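[0091] Specifically, let the wheel speed sensor pose corresponding to the first-view image of the previous frame be $T^{w}_{i-1}$, and let the wheel speed sensor pose corresponding to the first-view image of the current frame be $T^{w}_{i}$. The wheel speed sensor motion increment $\Delta T_{w}$ is calculated as $\Delta T_{w} = \left(T^{w}_{i-1}\right)^{-1} T^{w}_{i}$. The second predicted pose $T_{odom}$ of the first-view image of the current frame is then calculated based on the wheel speed sensor motion increment: $T_{odom} = T_{i-1}\,\Delta T_{w}$, where $T_{i-1}$ is the visual odometry pose corresponding to the first-view image of the previous frame.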

[0092] It should be noted that when i=2, that is, when the first-view image of the current frame is the second frame, the visual odometry pose corresponding to the first-view image two frames prior does not exist and is replaced by the visual odometry pose corresponding to the first-view image of the previous frame.
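The two predictions can be sketched with 4x4 homogeneous pose matrices as follows. This is an illustrative sketch under stated assumptions: each motion increment is taken as the relative transform between two consecutive poses and is applied on the right of the previous visual odometry pose, and the camera and wheel speed sensor frames are treated as coincident (no extrinsic calibration between them is modeled). All function names are hypothetical.

```python
from typing import Optional

import numpy as np


def translation(x: float, y: float, z: float) -> np.ndarray:
    """4x4 pose with identity rotation (helper used only to build examples)."""
    T = np.eye(4)
    T[:3, 3] = [x, y, z]
    return T


def first_predicted_pose(T_vo_prev2: Optional[np.ndarray],
                         T_vo_prev1: np.ndarray) -> np.ndarray:
    """Replay the last visual motion increment (constant-velocity assumption)."""
    if T_vo_prev2 is None:
        # Second frame of the sequence: no pose two frames back, so the previous
        # pose is used in its place and the increment becomes the identity.
        T_vo_prev2 = T_vo_prev1
    delta_vo = np.linalg.inv(T_vo_prev2) @ T_vo_prev1
    return T_vo_prev1 @ delta_vo


def second_predicted_pose(T_vo_prev1: np.ndarray,
                          T_wheel_prev: np.ndarray,
                          T_wheel_cur: np.ndarray) -> np.ndarray:
    """Apply the wheel-based motion increment to the previous visual odometry pose."""
    delta_wheel = np.linalg.inv(T_wheel_prev) @ T_wheel_cur
    return T_vo_prev1 @ delta_wheel


# Assumed straight-line motion: 0.1 m per frame visually, 0.3 m from the wheels.
pred1 = first_predicted_pose(translation(0.0, 0, 0), translation(0.1, 0, 0))
pred2 = second_predicted_pose(translation(0.1, 0, 0),
                              translation(5.0, 0, 0), translation(5.3, 0, 0))
```

Because only the increment of the wheel speed sensor pose is used, the absolute origin of the wheel-based trajectory does not need to coincide with that of the visual odometry.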

[0093] In this application, the corner points with three-dimensional coordinates are projected using the following formula: $u = \pi\left(K\,T^{-1}P\right)$; where $T$ is either the first predicted pose or the second predicted pose, $K$ denotes the camera intrinsic parameters, $P$ is the three-dimensional coordinates of the corner point, $u$ is the resulting projection point, and $\pi(\cdot)$ is the projection function. The camera intrinsic parameters are the set of parameters describing the internal optical imaging characteristics of a camera, used to establish the mapping between the coordinates of a point in three-dimensional space in the camera coordinate system and its pixel coordinates on the image plane. This parameter set typically includes the camera's focal lengths along the x and y axes expressed in pixels, reflecting the distance from the lens's optical center to the imaging plane; it also includes the principal point coordinates, i.e., the pixel position of the intersection of the camera's optical axis and the imaging plane, usually located near the image center. Furthermore, the intrinsic parameters may also include a skew factor describing the tilt of the image coordinate axes, and distortion coefficients used to correct radial and tangential lens distortion.
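A minimal sketch of such a projection is shown below. It assumes an ideal pinhole camera without skew or lens distortion, and a pose given as a 4x4 camera-to-world matrix, so that its inverse maps world coordinates into the camera coordinate system. The function name `project_point` and the intrinsic values are hypothetical.

```python
from typing import Optional

import numpy as np


def project_point(P_world, T_wc: np.ndarray, K: np.ndarray) -> Optional[np.ndarray]:
    """Project a 3D world point into the image of a camera with pose T_wc.

    T_wc: 4x4 camera-to-world pose (a predicted pose in this context).
    K:    3x3 intrinsic matrix [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
    Returns pixel coordinates (u, v), or None if the point is not in front
    of the camera.
    """
    P_h = np.append(np.asarray(P_world, float), 1.0)
    P_cam = (np.linalg.inv(T_wc) @ P_h)[:3]     # world -> camera coordinates
    if P_cam[2] <= 0.0:
        return None                              # behind or on the camera plane
    uvw = K @ P_cam                              # apply focal lengths and principal point
    return uvw[:2] / uvw[2]                      # perspective division


# Assumed intrinsics: fx = fy = 500 px, principal point (320, 240).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
pixel = project_point([1.0, 0.5, 5.0], np.eye(4), K)
```

The projected pixel is only an initial value for optical flow tracking, so moderate errors in the predicted pose can still be absorbed by the tracker.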

[0094] In one possible embodiment, after calculating the PnP pose of the current frame's first-view image, the method further includes: calculating the reprojection error $e$ for each corner point with three-dimensional coordinates: $e = \left\lVert u - \pi\left(K\,(R\,P + t)\right)\right\rVert$; where $u$ is the coordinates of the corner point in the first-view image of the current frame, $P$ is the three-dimensional coordinates of the corner point, $K$ denotes the camera intrinsic parameters, and $R$ and $t$ are the rotation matrix and translation vector calculated by the PnP algorithm; corner points whose reprojection error is greater than the preset reprojection threshold $e_{th}$ are removed as outliers, and the remaining corner points are used as inliers to evaluate the inlier condition; the inlier condition includes: the number of inliers is greater than a preset inlier number threshold $N_{th}$, and the ratio of the number of inliers to the number of corner points with three-dimensional coordinates in the current frame is greater than a preset ratio threshold $r_{th}$.

[0095] Specifically, after solving for the rotation matrix and translation vector of the first-view image of the current frame using the PnP (Perspective-n-Point) algorithm, the controller needs to evaluate the reliability of the solution and select those corner points with higher quality from the solution for subsequent pose optimization or as the basis for interior point condition judgment. This process is achieved by calculating the reprojection error of each corner point with three-dimensional coordinates.

[0096] For each corner point with 3D coordinates in the first-view image of the current frame, the controller knows the 2D pixel coordinates and the corresponding 3D coordinates of that corner point. The controller uses the rotation matrix and translation vector obtained by the PnP algorithm to transform the 3D coordinates of the corner point from the world coordinate system to the camera coordinate system of the current frame. Then, it projects these coordinates onto the image plane using the camera's intrinsic parameter matrix, obtaining the pixel coordinates of a projected point. Subsequently, the controller calculates the Euclidean distance between this projected point coordinate and the pixel coordinates actually tracked by the corner point; this distance is the reprojection error. The magnitude of the reprojection error reflects the degree of agreement between the 3D coordinates of the corner point and the currently solved camera pose.

[0097] The controller compares the reprojection error of each corner point with a preset reprojection threshold. If the reprojection error of a corner point is greater than the threshold, the corner point is considered an outlier, meaning its matching or 3D coordinates may have large errors and it is unsuitable for pose optimization, so it is discarded. If the reprojection error is less than or equal to the threshold, the corner point is considered an inlier and is retained.

[0098] After the above screening, the controller counts the number of inliers. To determine whether the current PnP solution is acceptable, the controller needs to check if the inlier condition is met. The inlier condition includes two sub-conditions: first, the number of inliers is greater than a preset inlier number threshold, to ensure that there are enough reliable points to support the pose solution; second, the ratio of the number of inliers to the total number of corner points with 3D coordinates participating in the PnP solution in the current frame is greater than a preset proportion threshold, to ensure that the proportion of inliers is high enough, avoiding a situation where the absolute number meets the standard but most points are outliers. Only when both of these sub-conditions are met is the current PnP solution considered reliable.

[0099] For example, suppose there are 20 corner points with three-dimensional coordinates in the first-view image of the current frame, and their two-dimensional pixel coordinates and corresponding three-dimensional coordinates are known. The controller obtains the rotation matrix $R_{pnp}$ and the translation vector $t_{pnp}$ using the PnP algorithm. Subsequently, the controller calculates the reprojection error for each corner point.

[0100] Taking one corner point as an example, its actual tracked pixel coordinates are (300, 250), and its 3D coordinates are $X$. The controller transforms $X$ to the camera coordinate system according to $R_{pnp}$ and $t_{pnp}$, then projects it onto the image plane using the intrinsic parameters $K$, obtaining projection point coordinates of (298, 248). The Euclidean distance between these two points is $\sqrt{2^2 + 2^2} \approx 2.83$ pixels. The preset reprojection threshold $T_{reproj}$ is 3 pixels; since 2.83 is less than 3, the corner point is determined to be an inlier.

[0101] After performing the above calculations on all 20 corner points, the controller determines that the reprojection error of 16 corner points is less than 3 pixels, meaning there are 16 inliers. The preset inlier number threshold $T_{inlier}$ is 10, and 16 is greater than 10, satisfying the first sub-condition. Meanwhile, the number of inliers (16) divided by the total number of corner points (20) equals 0.8, and the preset ratio threshold $TH_{ratio}$ is 0.6; since 0.8 is greater than 0.6, the second sub-condition is satisfied. Therefore, both sub-conditions of the inlier condition are satisfied, confirming that the current PnP solution is reliable.

[0102] If the number of inliers is only 8, which is less than 10, the inlier number condition is not met; if the number of inliers is 12 but the total number of corner points is 30, the ratio is 0.4, which is less than 0.6, and the ratio condition is not met. If either sub-condition is not met, the controller considers the PnP solution unreliable and switches to another pose, such as the second predicted pose.
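The screening and the two sub-conditions walked through in the example above can be sketched as follows. This is a minimal sketch, not the applicant's implementation: the function names are hypothetical, and the default thresholds are simply the example values (3 pixels, 10 inliers, ratio 0.6) quoted in the text.

```python
import math


def reprojection_error(observed, projected):
    """Euclidean pixel distance between the tracked corner and its projection."""
    return math.hypot(observed[0] - projected[0], observed[1] - projected[1])


def inlier_condition(errors, t_reproj=3.0, t_inlier=10, th_ratio=0.6):
    """Return (condition_met, inlier_count) for a list of reprojection errors.

    A corner is kept as an inlier when its error is less than or equal to
    t_reproj. The condition requires BOTH a sufficient absolute count and a
    sufficient ratio of inliers among all corners used in the PnP solution.
    """
    if not errors:
        return False, 0
    n_inliers = sum(1 for e in errors if e <= t_reproj)
    ratio = n_inliers / len(errors)
    return (n_inliers > t_inlier and ratio > th_ratio), n_inliers
```

With 16 inliers out of 20 corners the condition holds; with 8 out of 20, or 12 out of 30, it does not, matching the cases described above.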

[0103] This embodiment uses reprojection error calculation and inlier screening to eliminate erroneous corner points caused by mismatches, interference from dynamic objects, or inaccurate 3D coordinates, ensuring that high-quality corner points are used in subsequent decisions. This provides a basis for assessing the reliability of the PnP pose.

[0104] In one possible embodiment, the deviation between the PnP pose and the second predicted pose being less than a preset deviation threshold is specifically: $\lVert t_{pnp} - t_{pred2} \rVert_2 < TH_{dist}$; where $t_{pnp}$ is the translation vector of the PnP pose, $t_{pred2}$ is the translation vector of the second predicted pose, and $TH_{dist}$ is a preset distance threshold (example values omitted; the unit is meters).

[0105] Specifically, the translation vector is a three-dimensional vector used to describe the position of the camera coordinate system origin in the world coordinate system, that is, the offset from the world coordinate system origin to the camera optical center. In a visual odometry system, when the three-dimensional coordinates of a spatial point in the world coordinate system are known, it is rotated to the same orientation as the camera coordinate system using a rotation matrix, and then the translation vector is added to transform the point into the camera coordinate system. This transformed point is then projected onto the image plane using the camera's intrinsic parameters.

[0106] The preset distance threshold $TH_{dist}$ is determined mainly through statistical analysis of the deviation between the pose predicted by the wheel speed meter and the pose solved by vision. In practical applications, this threshold is usually determined from experimental data collected in advance during normal robot operation. Specifically, the distribution of the translation-vector deviation between the PnP pose and the second predicted pose is analyzed under good visual tracking conditions, and an upper limit that covers the normal fluctuation range is selected as the threshold. The preset distance threshold $TH_{dist}$ is set to 0.8 meters. This value is based on a comprehensive consideration of the lawnmower robot's movement speed, the wheel speed meter accuracy, and the visual calculation error. It can effectively identify visual calculation anomalies while avoiding frequent fallback to the wheel speed meter pose caused by an excessively small threshold, thus balancing positioning accuracy and robustness.
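As a minimal sketch of the deviation check, the comparison reduces to one Euclidean distance between two translation vectors. The function name and argument names are hypothetical; the default of 0.8 meters is the value stated above, and the sample vectors in the usage note are invented for illustration only.

```python
import math


def translation_deviation_ok(t_pnp, t_pred2, th_dist=0.8):
    """True when the PnP translation stays within th_dist meters of the
    translation of the second (wheel speed meter) predicted pose."""
    return math.dist(t_pnp, t_pred2) < th_dist
```

For instance, translations of (1.0, 2.0, 0.0) and (1.3, 2.4, 0.0) differ by 0.5 meters and pass the check, while a 1.0 meter difference does not.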

[0107] In one possible embodiment, determining whether the current frame's first-view image is a keyframe includes: obtaining the first-view image pose $(R_{KF}, t_{KF})$ of the previous keyframe; calculating the position distance $D_{KF}$ and the rotation distance $\theta_{KF}$ between the first-view image pose $(R_{cur}, t_{cur})$ of the current frame and the first-view image pose of the previous keyframe: $D_{KF} = \lVert t_{cur} - t_{KF} \rVert_2$; $\Delta R = R_{KF}^{T} R_{cur}$; $\theta_{KF} = \arccos\left(\frac{\mathrm{tr}(\Delta R) - 1}{2}\right)$; where tr() represents the trace of the matrix and arccos() represents the inverse cosine function; if any one of the following conditions is met, the current frame's first-view image is determined to be a keyframe: the position distance $D_{KF}$ is greater than the preset keyframe distance threshold $KF_{dist}$; the rotation distance is greater than a preset keyframe angle threshold; the number of successfully tracked corner points in the first-view image of the current frame is less than the preset keyframe corner point number threshold.

[0108] Specifically, the preset keyframe distance threshold $KF_{dist}$, the preset keyframe angle threshold, and the preset keyframe corner point number threshold are determined mainly by a trade-off among robot motion characteristics, map building requirements, and computational resources. The preset keyframe distance threshold $KF_{dist}$ controls the translation interval between keyframes; a typical value of 0.3 meters is set based on the translational speed of the lawnmower robot in a typical work scenario and the need to maintain sufficient parallax between adjacent keyframes to ensure triangulation accuracy. The preset keyframe angle threshold controls the rotation interval between keyframes; a typical value of 30 degrees is set based on the sensitivity of the camera view to angle changes when the robot turns and the need to avoid inserting redundant keyframes for excessively small rotations. The preset keyframe corner point number threshold ensures that keyframes have sufficient feature information to support map building and pose optimization; a typical value of 10 corner points is set based on the minimum number of feature points required to guarantee the reliability of triangulation calculations. In practical applications, those skilled in the art can adjust and optimize these three thresholds experimentally based on the robot's movement speed, the richness of scene textures, and the system's computational capacity.
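The keyframe test can be sketched as below, using the typical values given above (0.3 meters, 30 degrees, 10 corner points) as defaults. The function and parameter names are hypothetical, rotations are plain 3x3 lists of rows, and the rotation angle is obtained from the trace of the relative rotation, which is one standard way to measure rotation distance.

```python
import math


def rotation_angle_deg(R_a, R_b):
    """Angle in degrees of the relative rotation R_a^T * R_b.

    The trace of R_a^T * R_b equals the sum of element-wise products.
    """
    trace = sum(R_a[i][j] * R_b[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] so rounding noise cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))


def is_keyframe(t_cur, R_cur, t_kf, R_kf, n_tracked,
                kf_dist=0.3, kf_angle_deg=30.0, kf_corners=10):
    """True when any one of the three keyframe conditions is met."""
    d_kf = math.dist(t_cur, t_kf)
    theta = rotation_angle_deg(R_kf, R_cur)
    return d_kf > kf_dist or theta > kf_angle_deg or n_tracked < kf_corners
```

Because the conditions are joined with a logical OR, a frame with little motion still becomes a keyframe when too few corner points remain tracked.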

[0109] In one possible embodiment, updating the three-dimensional coordinates of the corner point using the matching relationship between the current frame's first-view image and the current frame's second-view image includes: (E1) using the coordinates of the corner points in the first-view image of the current frame as the initial values of the corresponding corner points in the second-view image of the current frame, and tracking with the optical flow algorithm to obtain the coordinates of the corresponding corner points in the second-view image of the current frame; (E2) for each corner point, obtaining the coordinates of the corner point in each frame of image and the pose of each frame of image, and calculating the three-dimensional coordinates of the corner point through the triangulation algorithm; (E3) calculating the reprojection error of the corner point in the first-view and second-view images of each frame, and the z coordinate in the corresponding camera coordinate system of each frame; (E4) if all reprojection errors are less than the preset triangulation reprojection threshold and all z coordinates are greater than the preset depth threshold, recording the three-dimensional coordinates of the corner point. Specifically, once the first-view image of the current frame is determined to be a keyframe, the controller performs stereo matching on the corner points in the image to provide the basic data for subsequent triangulation. For each corner point in the first-view image of the current frame, whether or not it already has three-dimensional coordinates, the controller finds its corresponding position in the second-view image of the current frame, i.e., the corresponding corner point.

[0110] Since the left and right cameras acquire images simultaneously and their extrinsic parameters are fixed, the imaging positions of the same spatial point in the first-view and second-view images have a definite geometric relationship. However, this relationship is not a simple pixel translation, so an image matching algorithm is needed to find the corresponding point. The controller uses an optical flow algorithm for this matching process. Specifically, for a corner point in the current frame's first-view image, the controller uses its pixel coordinates in the first-view image as the initial search position and performs Lucas-Kanade optical flow tracking in the current frame's second-view image. Based on the assumption of image grayscale invariance, the optical flow algorithm searches the second-view image for the region whose grayscale distribution is most similar to that around the corner point, and outputs the pixel coordinates of the matching point and a flag indicating whether the tracking was successful.

[0111] For successfully tracked corner points, the controller records the precise pixel coordinates of the corner point in the current frame's second-view image. These coordinates, together with the corner point's coordinates in the first-view image, form a stereo matching pair. For corner points that fail to track, the controller discards them and they are no longer involved in subsequent triangulation calculations. Through this step, the controller establishes a correspondence between the left and right eyes for as many corner points as possible in the current frame's first-view image.

[0112] For example, suppose there are 50 corner points in the current frame's first-view image, including 30 corner points that already have 3D coordinates and 20 corner points that do not yet have 3D coordinates. The controller processes these 50 corner points sequentially. Taking one corner point as an example, its pixel coordinates in the first-view image are (200, 150). The controller uses (200, 150) as the initial value and runs the optical flow algorithm on the current frame's second-view image. The algorithm finds the best matching position in the second-view image at (195, 148) and returns a tracking success flag. The controller records (195, 148) as the coordinates of the corresponding corner point in the second-view image. After all 50 corner points have been processed, suppose 45 corner points have successfully found corresponding corner points, and the other 5 corner points have failed to track and are discarded.

[0113] Specifically, after obtaining the coordinates of the matching point of the corner point in the second-view image of the current frame, the controller uses the observations of the corner point in additional historical image frames to calculate its three-dimensional coordinates, improving calculation accuracy and robustness. For each corner point for which left-right matching has been successfully established, the controller traces back all historical image frames in which the corner point was successfully tracked over the preceding period.

[0114] For each such historical frame, the controller acquires the visual odometry pose of that frame, as well as the pixel coordinates of the corner point in the first-view image of that frame. If left-right matching was also performed for the historical frame, the controller also acquires the pixel coordinates of the corner point in the second-view image of that frame and the corresponding pose of the second-view image. The pose of the second-view image can be calculated from the pose of the first-view (left) image and the pre-calibrated left-right extrinsic parameters. For example, the pose of the second-view image of the current frame is calculated from the visual odometry pose of the first-view image of the current frame and the pre-calibrated extrinsic parameters between the first-view and second-view images.
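The relationship between the two poses can be written out under an assumed convention. The sketch below assumes that a pose $(R, t)$ maps world coordinates to camera coordinates as $X_c = R X_w + t$, and that the extrinsic parameters $(R_{ext}, t_{ext})$ map first-view camera coordinates to second-view camera coordinates. The symbols $R_1, t_1, R_2, t_2, R_{ext}, t_{ext}$ are illustrative and are not taken from the original formula; if the poses are instead stored as camera-to-world transforms, the composition order is reversed.

```latex
X_{c2} = R_{ext} X_{c1} + t_{ext}
       = R_{ext} \left( R_1 X_w + t_1 \right) + t_{ext}
\quad\Longrightarrow\quad
R_2 = R_{ext} R_1, \qquad t_2 = R_{ext}\, t_1 + t_{ext}
```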

[0115] The controller gathers observation data of the corner point across a series of image frames, including the camera pose and 2D pixel coordinates for each frame. Then, the controller processes this data using a triangulation algorithm. The basic principle of the triangulation algorithm is to utilize epipolar geometry in multi-view geometry to solve for the 3D coordinates of a point by minimizing reprojection error. The algorithm comprehensively considers all observations to calculate the 3D coordinate point that best fits all the observation data.

[0116] Continuing with the previous example, the controller is processing a corner point for which left-right matching has been successfully established. This corner point was observed not only in the current frame (frame 15), but was also successfully tracked in the previous frames 12, 13, and 14. The controller retrieves the visual odometry pose and the coordinates of the corner point in the first-view image for each of frames 12, 13, and 14, and adds the visual odometry pose and the coordinates of the corner point in the first-view image of the current frame 15, along with the coordinates of the corner point in the second-view image of frame 15. The controller inputs these multiple sets of pose and 2D coordinate data into the triangulation algorithm, which calculates and outputs a 3D coordinate X as a preliminary estimate of the corner point in the world coordinate system.
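A multi-view triangulation of this kind can be sketched in pure Python. Note that this sketch swaps in a simpler technique than the one described: instead of minimizing reprojection error, it finds the least-squares point closest to all viewing rays (a midpoint-style method). The function names, the `K = (fx, fy, cx, cy)` layout, and the world-to-camera convention `Xc = R * Xw + t` are assumptions for illustration.

```python
import math


def pixel_ray(K, R, t, uv):
    """Return (camera center, unit ray direction) in world coordinates."""
    fx, fy, cx, cy = K
    d_cam = ((uv[0] - cx) / fx, (uv[1] - cy) / fy, 1.0)
    # Rotate the ray into the world frame with R^T; the center is -R^T t.
    d = [sum(R[j][i] * d_cam[j] for j in range(3)) for i in range(3)]
    c = [-sum(R[j][i] * t[j] for j in range(3)) for i in range(3)]
    n = math.sqrt(sum(x * x for x in d))
    return c, [x / n for x in d]


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def triangulate_rays(rays):
    """Least-squares point closest to all rays, or None without parallax.

    Each ray contributes (I - d d^T) to A and (I - d d^T) c to b; the
    solution of A X = b minimizes the summed squared distance to the rays.
    """
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for c, d in rays:
        for i in range(3):
            for j in range(3):
                p = (1.0 if i == j else 0.0) - d[i] * d[j]
                A[i][j] += p
                b[i] += p * c[j]
    det = det3(A)
    if abs(det) < 1e-9:
        return None  # rays are (nearly) parallel
    # Solve the 3x3 system with Cramer's rule.
    X = []
    for k in range(3):
        Ak = [[b[i] if j == k else A[i][j] for j in range(3)] for i in range(3)]
        X.append(det3(Ak) / det)
    return X
```

One ray is built per observation (each historical first-view frame, plus any second-view observation), and all rays are passed to `triangulate_rays` together.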

[0117] Specifically, after obtaining the 3D coordinates of the corner points through the triangulation algorithm, the controller needs to verify the accuracy of these coordinates to ensure consistency with physical reality and observation. Verification includes two aspects: reprojection error verification and depth value verification.

[0118] For reprojection error verification, the controller reprojects the 3D coordinates of the corner point sequentially onto each frame of the observed image. For each first-view image frame, the controller uses the visual odometry pose of that frame to transform the 3D coordinates to the camera coordinate system, then projects them onto the image plane using camera intrinsic parameters to obtain the coordinates of the projected point. The Euclidean distance between this projected point and the actual tracked corner point coordinates is then calculated; this distance represents the reprojection error for that frame. If there is a corresponding second-view image observation for that frame, the controller also needs to perform the same projection and error calculation using the pose of the second-view image. The controller iterates through all observed frames to obtain a set of reprojection error values.

[0119] For depth value verification, the controller transforms the 3D coordinates of the corner point to the camera coordinate system of the first-view image of the current frame, obtaining a 3D vector. The third component of this vector is the depth value of the corner point in the current frame's camera coordinate system, which is the Z coordinate. The controller checks whether this Z coordinate is positive and sufficiently large.

[0120] Continuing the previous example, the triangulation algorithm calculates the 3D coordinates of the corner point as X. The controller first performs the reprojection error check on frames 12, 13, 14, and 15, in which the corner point was observed. Taking the first-view image of frame 12 as an example, the controller projects X onto the image of that frame using the pose of frame 12, obtaining the projection point coordinates (98, 102). The actual tracked coordinates of this corner point in frame 12 are (100, 100), and the Euclidean distance between the two points is approximately 2.83 pixels. A similar calculation is performed on the second-view image of frame 12. The above process is repeated for frames 13, 14, and 15, yielding further reprojection errors of 2.2 pixels, 1.8 pixels, 2.5 pixels, and 1.9 pixels. Then, the depth value check is performed. The controller transforms X to the camera coordinate system of the current frame, i.e., frame 15, obtaining camera coordinates of (5.0, 3.2, 8.5), where the Z coordinate is 8.5 meters.

[0121] Specifically, the controller compares all reprojection errors calculated in step E3 with a preset triangulation reprojection threshold, and simultaneously compares the depth value with a preset depth threshold. The preset triangulation reprojection threshold is used to control the consistency of 3D coordinate projection under different viewpoints, and is usually set to a small pixel value. The preset depth threshold is used to exclude unreasonable depths caused by mismatches, such as unstable points caused by negative depth values or excessively small depths.

[0122] If the reprojection error of the corner point on all observation frames is less than the preset triangulation reprojection threshold, and the depth value of the corner point in the current frame's camera coordinate system is greater than the preset depth threshold, then the controller considers the 3D coordinates to be accurate and reliable, and records them. For corner points that already have 3D coordinates, the coordinates calculated this time will be used to update and optimize their original 3D coordinates; for corner points that did not originally have 3D coordinates, this calculation will assign them initial 3D coordinates, making them corner points with 3D coordinates.

[0123] If either of the above two conditions is not met—that is, if the reprojection error of a certain frame exceeds the threshold, or the depth value is less than or equal to the depth threshold—the controller considers the 3D coordinates unreliable and does not record them. The corner point remains in a state without 3D coordinates, waiting for subsequent frames to attempt triangulation again.

[0124] Continuing the previous example, the reprojection errors of this corner point are 2.83 pixels, 2.2 pixels, 1.8 pixels, 2.5 pixels, and 1.9 pixels, respectively. The preset triangulation reprojection threshold is 3 pixels, and all errors are less than 3 pixels, satisfying the reprojection error condition. Simultaneously, the depth value of this corner point in the current frame's camera coordinate system is 8.5 meters, and the preset depth threshold is 0.5 meters. 8.5 is greater than 0.5, satisfying the depth condition. Therefore, the controller determines that the 3D coordinate X is valid. If the corner point already has 3D coordinates, the controller updates the original coordinates with X; if the corner point is a newly appearing corner point in the 15th frame, the controller assigns it a 3D coordinate X, marks it as a corner point with 3D coordinates, and stores these coordinates for use in subsequent frames. Suppose that one of the reprojection errors of another corner point is 3.5 pixels, exceeding the 3-pixel threshold, then the 3D coordinates of this corner point are not recorded, and it remains in a state without 3D coordinates.
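The acceptance test in this example can be sketched as a single predicate. The function name is hypothetical, and the defaults are the example values from the text (3 pixels and 0.5 meters); an empty observation list is rejected here as a conservative assumption.

```python
def accept_triangulated_point(reproj_errors, depth_z,
                              t_tri_reproj=3.0, depth_min=0.5):
    """True when every reprojection error is below the threshold and the
    depth in the current camera frame exceeds the depth threshold."""
    if not reproj_errors:
        return False
    return all(e < t_tri_reproj for e in reproj_errors) and depth_z > depth_min
```

A single error of 3.5 pixels, or a negative or very small depth, is enough to leave the corner point without three-dimensional coordinates until a later frame.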

[0125] This embodiment constructs a complete and rigorous 3D coordinate update process for corner points. First, stereo matching is established through optical flow tracking between the left and right views, providing reliable 2D observation data for triangulation. Then, a multi-frame triangulation algorithm fuses historical observation information, improving the accuracy and noise resistance of the 3D coordinate calculation. Finally, a verification step applies both a reprojection error threshold and a depth threshold to eliminate erroneous map points caused by mismatches, dynamic objects, or calculation anomalies, ensuring the accuracy and reliability of the map.

[0126] As shown in Figure 3, in one embodiment, a lawnmower visual data processing device is provided, the device comprising: a determining module 301, used to receive the first-view image of the current frame and determine the current wheel speed meter pose corresponding to the first-view image of the current frame; a calculation module 302, used to calculate the first predicted pose of the current frame first-view image based on the visual odometry poses of at least two historical first-view images, and further configured to calculate the second predicted pose of the current frame first-view image based on the wheel speed meter pose corresponding to the current frame first-view image and the wheel speed meter pose corresponding to the previous frame first-view image; a projection module 303, used to project corner points with three-dimensional coordinates in the previous frame first-view image onto the current frame first-view image using the first predicted pose to obtain a first projection point, and to perform optical flow tracking on the corner points using the first projection point as the initial value; if the number of successfully tracked corner points meets the first threshold condition, the coordinates of the tracked corner points are used as the coordinates of the corner points with three-dimensional coordinates in the current frame first-view image; a tracking module 304, configured to, if the number of successfully tracked corner points does not meet the first threshold condition, project the corner points with three-dimensional coordinates in the previous frame first-view image onto the current frame first-view image using the second predicted pose to obtain a second projection point, and perform optical flow tracking on the corner points using the second projection point as the initial value; if the number of successfully tracked corner points meets the second threshold condition, the tracked corner point coordinates are used as the coordinates of the corner points with three-dimensional coordinates in the current frame first-view image; if the number of successfully tracked corner points does not meet the second threshold condition, the second predicted pose is used as the visual odometry pose of the current frame first-view image, and the successfully tracked corner points are used as the corner points with three-dimensional coordinates in the current frame first-view image; and an output module 305, used to calculate the PnP pose of the current frame first-view image using the coordinates of the corner points with three-dimensional coordinates and the corresponding three-dimensional coordinates, and to determine whether the deviation between the PnP pose and the second predicted pose is less than a preset deviation threshold and whether the corresponding inliers satisfy the inlier condition; if so, the PnP pose is used as the visual odometry pose of the current frame first-view image; otherwise, the second predicted pose is used as the visual odometry pose of the current frame first-view image.
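The control flow shared by modules 303 to 305 can be sketched as two small functions. This is a hedged sketch only: the function names are hypothetical, `track` is a caller-supplied callable standing in for projection plus optical flow, and the "at least N corners" form of the two threshold conditions with defaults of 20 and 15 is an assumption, since the text does not state the conditions numerically.

```python
def track_with_dual_prediction(track, first_pred, second_pred,
                               min_first=20, min_second=15):
    """Return (tracked_corners, seed_used, fall_back_to_second_pose).

    track(pose) is assumed to project the 3D corners with the given pose,
    run optical flow from those seeds, and return the corners that were
    tracked successfully.
    """
    tracked = track(first_pred)
    if len(tracked) >= min_first:
        return tracked, "first", False
    # First prediction failed: re-seed optical flow from the
    # wheel speed meter prediction.
    tracked = track(second_pred)
    if len(tracked) >= min_second:
        return tracked, "second", False
    # Both attempts failed: the second predicted pose is output directly.
    return tracked, "second", True


def select_output_pose(pnp_pose, second_pred, deviation_ok, inliers_ok):
    """Accept the PnP pose only when both checks pass; otherwise fall back."""
    return pnp_pose if (deviation_ok and inliers_ok) else second_pred
```

Keeping the fallback decision separate from the PnP acceptance check mirrors the split between the tracking module and the output module, so a pose is always output even when visual tracking degrades.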

[0127] For other details regarding the implementation of the above technical solution by each module in the above-mentioned lawnmower visual data processing device, please refer to the description in the lawnmower visual data processing method provided in the above-mentioned invention embodiments, which will not be repeated here.

[0128] In one embodiment, an intelligent lawnmower is provided, the internal structure of which may be as shown in Figure 4. The intelligent lawnmower includes a processor, memory, network interface, and database connected via a system bus. The processor provides computing and control capabilities. The memory includes non-volatile and/or volatile storage media and internal memory. The non-volatile storage media stores the operating system, computer programs, and database. The internal memory provides an environment for running the operating system and computer programs stored in the non-volatile storage media. The network interface is used to communicate with external clients via a network connection. When executed by the processor, the computer program implements the methods described in any of the foregoing embodiments of this application.

[0129] This application also provides a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, are used to implement the methods described in any of the foregoing embodiments of this application.

[0130] This application also provides a chip for executing instructions, which is used to perform the methods described in any of the foregoing embodiments executed by an electronic device as described in any of the foregoing embodiments of this application.

[0131] This application also provides a computer program product, which includes a computer program that, when executed by a processor, can implement the methods described in any of the foregoing embodiments executed by an electronic device as described in any of the foregoing embodiments of this application.

[0132] It should be noted that the functions or steps that the computer-readable storage medium or intelligent lawnmower can achieve are described in the relevant descriptions in the foregoing method embodiments. To avoid repetition, they will not be described one by one here.

[0133] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium. When executed, the computer program can include the processes of the embodiments of the above methods. Any references to memory, storage, databases, or other media used in the embodiments provided in this application can include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

[0134] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the above-described division of functional units and modules is used as an example. In practical applications, the above functions can be assigned to different functional units and modules as needed, that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above.

[0135] The above-described embodiments are only used to illustrate the technical solutions of the present invention, and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all be included within the protection scope of the present invention.

Claims

1. A lawnmower visual data processing method, characterized in that it comprises the following steps:
receiving a current-frame first-view image and determining the current wheel speed meter pose corresponding to the current-frame first-view image;
calculating a first predicted pose of the current-frame first-view image based on the visual odometry poses of at least two historical first-view images;
calculating a second predicted pose of the current-frame first-view image based on the wheel speed meter pose corresponding to the current-frame first-view image and the wheel speed meter pose corresponding to the previous-frame first-view image;
projecting the corner points that have three-dimensional coordinates in the previous-frame first-view image onto the current-frame first-view image using the first predicted pose to obtain first projection points, and performing optical flow tracking on the corner points with the first projection points as initial values;
if the number of successfully tracked corner points meets a first threshold condition, using the coordinates of the tracked corner points as the coordinates of the corner points that have three-dimensional coordinates in the current-frame first-view image;
if the number of successfully tracked corner points does not meet the first threshold condition, projecting the corner points that have three-dimensional coordinates in the previous-frame first-view image onto the current-frame first-view image using the second predicted pose to obtain second projection points, and performing optical flow tracking on the corner points with the second projection points as initial values;
if the number of successfully tracked corner points meets a second threshold condition, using the coordinates of the tracked corner points as the coordinates of the corner points that have three-dimensional coordinates in the current-frame first-view image;
if the number of successfully tracked corner points does not meet the second threshold condition, using the second predicted pose as the visual odometry pose of the current-frame first-view image, and using the successfully tracked corner points as the corner points that have three-dimensional coordinates in the current-frame first-view image;
calculating a PnP pose of the current-frame first-view image from the coordinates of the corner points that have three-dimensional coordinates in the current-frame first-view image and their corresponding three-dimensional coordinates, and determining whether the deviation between the PnP pose and the second predicted pose is less than a preset deviation threshold and whether the corresponding inliers satisfy an inlier condition; if so, using the PnP pose as the visual odometry pose of the current-frame first-view image; otherwise, using the second predicted pose as the visual odometry pose of the current-frame first-view image.
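
The control flow of claim 1 can be sketched as follows. This is a minimal illustration and not the applicant's implementation: the callbacks `track`, `solve_pnp` and `pose_close` are hypothetical stand-ins for optical flow tracking, PnP solving and the deviation test, each "threshold condition" is assumed to mean "count is at least a minimum", and the inlier condition is simplified to a minimum inlier count.

```python
def select_vo_pose(track, solve_pnp, pose_close,
                   first_pred, second_pred,
                   first_min, second_min, inlier_min):
    """Return (visual odometry pose, tracked corners, decision label).

    track(pose)        -> list of corners tracked when seeded by `pose` (hypothetical)
    solve_pnp(corners) -> (pnp_pose, inlier_count)                      (hypothetical)
    pose_close(a, b)   -> True if the deviation between a and b is small (hypothetical)
    """
    # Seed optical flow with the visual (first) predicted pose.
    tracked = track(first_pred)
    if len(tracked) < first_min:
        # Visual prediction failed: retry with the wheel (second) predicted pose.
        tracked = track(second_pred)
        if len(tracked) < second_min:
            # Both attempts failed: output the wheel prediction directly.
            return second_pred, tracked, "wheel_fallback"

    # Tracking succeeded: solve PnP and validate it against the wheel prediction.
    pnp_pose, inliers = solve_pnp(tracked)
    if pose_close(pnp_pose, second_pred) and inliers >= inlier_min:
        return pnp_pose, tracked, "pnp"
    return second_pred, tracked, "wheel_after_pnp"
```

In every branch a pose is returned, which is how the method keeps the pose output continuous even when tracking or PnP fails.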

2. The lawnmower visual data processing method according to claim 1, characterized in that the method further comprises: if the current-frame first-view image is the first frame image, initializing the visual odometry pose of the current-frame first-view image and setting the current-frame first-view image as a keyframe; during initialization, the visual odometry pose of the current-frame first-view image is as follows: the position of the current-frame first-view image is set to ..., and the rotation matrix of the attitude is set to the identity matrix.

3. The lawnmower visual data processing method according to claim 1, characterized in that the method further comprises: performing optical flow tracking on the corner points that do not have three-dimensional coordinates in the previous-frame first-view image, to obtain the coordinates of the corner points that do not have three-dimensional coordinates in the current-frame first-view image.

4. The lawnmower visual data processing method according to any one of claims 1 to 3, characterized in that the method further comprises: determining whether the current-frame first-view image is a keyframe; and, if so, updating the three-dimensional coordinates of the corner points using the matching relationship between the current-frame first-view image and the current-frame second-view image.

5. The lawnmower visual data processing method according to claim 1, characterized in that determining the current wheel speed meter pose corresponding to the current-frame first-view image comprises: searching the cache for the wheel speed meter pose whose timestamp $t_o$ is greater than the timestamp $t_i$ of the current-frame first-view image, and for the wheel speed meter pose with timestamp $t_{o-1}$; calculating an interpolation coefficient based on the timestamps $t_i$, $t_o$ and $t_{o-1}$; and, according to the interpolation coefficient, interpolating between the wheel speed meter pose at timestamp $t_o$ and the wheel speed meter pose at timestamp $t_{o-1}$ to obtain the wheel speed meter pose of the current-frame first-view image.
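
The timestamp interpolation in claim 5 might look like the sketch below. It rests on assumptions not stated in the claim: the wheel speed meter pose is a planar `(x, y, yaw)` tuple, the cache is a list sorted by timestamp, the interpolation coefficient is the linear ratio `(t_i - t_{o-1}) / (t_o - t_{o-1})`, and the yaw difference is wrapped to the shortest arc. The function name `interpolate_wheel_pose` is hypothetical.

```python
import math
from bisect import bisect_right


def interpolate_wheel_pose(cache, t_i):
    """Interpolate a planar wheel pose at image timestamp t_i.

    cache: list of (timestamp, (x, y, yaw)) sorted by ascending timestamp.
    """
    stamps = [stamp for stamp, _ in cache]
    o = bisect_right(stamps, t_i)  # index of the first timestamp strictly greater than t_i
    if o == 0 or o == len(cache):
        raise ValueError("t_i is not bracketed by cached wheel poses")

    (t_prev, (x0, y0, yaw0)), (t_next, (x1, y1, yaw1)) = cache[o - 1], cache[o]
    alpha = (t_i - t_prev) / (t_next - t_prev)  # interpolation coefficient in [0, 1)

    # Wrap the yaw difference to (-pi, pi] so interpolation follows the shortest arc.
    dyaw = math.atan2(math.sin(yaw1 - yaw0), math.cos(yaw1 - yaw0))
    return (x0 + alpha * (x1 - x0),
            y0 + alpha * (y1 - y0),
            yaw0 + alpha * dyaw)
```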

6. The lawnmower visual data processing method according to claim 1, characterized in that calculating the first predicted pose of the current-frame first-view image based on the visual odometry poses of at least two historical first-view images comprises: calculating a visual motion increment based on the visual odometry pose corresponding to the first-view image two frames earlier and the visual odometry pose corresponding to the previous-frame first-view image; and calculating the first predicted pose of the current-frame first-view image based on the visual motion increment.
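
One common way to realize the prediction in claim 6 is a constant-velocity model, sketched below. The claim does not fix the pose convention, so this sketch assumes poses are 4x4 homogeneous camera-to-world matrices stored as nested lists, and that the motion increment between the two previous frames is simply re-applied to the previous frame. The helper names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def invert_rigid(t):
    """Invert a 4x4 rigid transform [R | p] as [R^T | -R^T p]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [-sum(r_t[i][k] * t[k][3] for k in range(3)) for i in range(3)]
    return [r_t[i] + [p[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def predict_from_history(t_prev2, t_prev1):
    """First predicted pose from the visual odometry poses of the two previous frames."""
    # Visual motion increment between frame i-2 and frame i-1.
    delta = matmul(invert_rigid(t_prev2), t_prev1)
    # Assume the same motion repeats between frame i-1 and frame i.
    return matmul(t_prev1, delta)
```

For example, if the camera moved 0.5 m forward between the last two frames, the prediction places it another 0.5 m forward.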

7. The lawnmower visual data processing method according to claim 1, characterized in that calculating the second predicted pose of the current-frame first-view image based on the wheel speed meter pose corresponding to the current-frame first-view image and the wheel speed meter pose corresponding to the previous-frame first-view image comprises: calculating a wheel speed meter motion increment based on the wheel speed meter pose corresponding to the previous-frame first-view image and the wheel speed meter pose corresponding to the current-frame first-view image; and calculating the second predicted pose of the current-frame first-view image based on the wheel speed meter motion increment.
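
A planar sketch of the wheel-based prediction in claim 7 is given below. Several points are assumptions rather than statements of the claim: poses are `(x, y, yaw)` tuples, the wheel speed meter motion increment is applied to the previous frame's visual odometry pose, and any extrinsic offset between the camera and the wheel frame is ignored. The function names are hypothetical.

```python
import math


def se2_between(a, b):
    """Motion from pose a to pose b, expressed in the frame of a."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    c, s = math.cos(a[2]), math.sin(a[2])
    return (c * dx + s * dy, -s * dx + c * dy, b[2] - a[2])


def se2_compose(a, d):
    """Apply the local motion d to pose a."""
    c, s = math.cos(a[2]), math.sin(a[2])
    return (a[0] + c * d[0] - s * d[1],
            a[1] + s * d[0] + c * d[1],
            a[2] + d[2])


def predict_from_wheel(vo_prev, wheel_prev, wheel_curr):
    """Second predicted pose: previous visual pose advanced by the wheel increment."""
    increment = se2_between(wheel_prev, wheel_curr)
    return se2_compose(vo_prev, increment)
```

Using only the increment, rather than the absolute wheel pose, means accumulated wheel drift does not leak into the prediction.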

8. The lawnmower visual data processing method according to claim 1, characterized in that the projection is performed according to the following formula: $p = \pi(T, K, P)$, where $p$ is the projection point, $T$ is the first predicted pose or the second predicted pose, $K$ denotes the camera intrinsic parameters, $P$ is the three-dimensional coordinates of the corner point, and $\pi(\cdot)$ is the projection function [definition of the projection function not reproduced in this text].
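
Because the definition of the projection function is missing from this text, the sketch below substitutes a standard pinhole model. Everything in it is an assumption: the pose is taken as a world-to-camera rotation and translation, the intrinsics are `(fx, fy, cx, cy)` with no lens distortion, and the name `project` is hypothetical.

```python
def project(pose, intrinsics, point):
    """Project a 3D corner point into the image with a pinhole model.

    pose:       (rot, trans), assumed world-to-camera; rot is 3x3 nested lists
    intrinsics: (fx, fy, cx, cy), assumed pinhole parameters without distortion
    point:      (X, Y, Z) in world coordinates
    Returns pixel coordinates (u, v), or None if the point is not in front of the camera.
    """
    rot, trans = pose
    fx, fy, cx, cy = intrinsics
    # Transform the point into the camera coordinate system.
    xc, yc, zc = (sum(rot[i][k] * point[k] for k in range(3)) + trans[i]
                  for i in range(3))
    if zc <= 0.0:
        return None
    # Perspective division followed by the intrinsic mapping.
    return (fx * xc / zc + cx, fy * yc / zc + cy)
```

The returned pixel would serve as the initial value for optical flow tracking of that corner point.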

9. The lawnmower visual data processing method according to claim 1, characterized in that, after calculating the PnP pose of the current-frame first-view image, the method further comprises: calculating the reprojection error $e$ of each corner point that has three-dimensional coordinates: $e = \lVert u - \pi(K, R P + t) \rVert$, where $u$ is the coordinates of the corner point in the current-frame first-view image, $R$ and $t$ are the rotation matrix and translation vector calculated by the PnP algorithm, $P$ is the three-dimensional coordinates of the corner point, and $\pi(K, \cdot)$ is the projection with the camera intrinsic parameters $K$; removing corner points whose reprojection error is greater than a preset reprojection threshold as outliers, and using the remaining corner points as inliers to evaluate the inlier condition; the inlier condition comprises: the number of inliers is greater than a preset inlier number threshold, and the ratio of the number of inliers to the number of corner points that have three-dimensional coordinates in the current frame is greater than a preset ratio threshold.
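
The outlier rejection and inlier condition of claim 9 can be illustrated as follows. The sketch assumes the reprojected pixel positions have already been computed from the PnP rotation and translation, and that the reprojection error is the Euclidean pixel distance. The function and parameter names are hypothetical.

```python
import math


def split_inliers(observed, projected, reproj_thresh):
    """Return indices of corners whose reprojection error does not exceed the threshold."""
    return [i for i, (u, p) in enumerate(zip(observed, projected))
            if math.dist(u, p) <= reproj_thresh]


def inlier_condition(num_inliers, num_corners_3d, min_inliers, min_ratio):
    """Both tests must hold: enough inliers, and a high enough inlier ratio."""
    return (num_corners_3d > 0
            and num_inliers > min_inliers
            and num_inliers / num_corners_3d > min_ratio)
```

Requiring both an absolute count and a ratio guards against two different failures: too few points to constrain the pose, and a pose that agrees with only a small fraction of the tracked corners.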

10. The lawnmower visual data processing method according to claim 1, characterized in that the deviation between the PnP pose and the second predicted pose being less than a preset deviation threshold is specifically: $\lVert t_{pnp} - t_{pred} \rVert < \tau_{dist}$, where $t_{pnp}$ is the translation vector of the PnP pose, $t_{pred}$ is the translation vector of the second predicted pose, and $\tau_{dist}$ is a preset distance threshold.

11. The lawnmower visual data processing method according to claim 4, characterized in that determining whether the current-frame first-view image is a keyframe comprises: obtaining the first-view image pose $(R_{KF}, t_{KF})$ of the previous keyframe; calculating the position distance $D_{KF}$ and the rotation distance $\theta_{KF}$ between the pose $(R_{cur}, t_{cur})$ of the current-frame first-view image and the first-view image pose of the previous keyframe: $D_{KF} = \lVert t_{cur} - t_{KF} \rVert$; $\Delta R = R_{KF}^{T} R_{cur}$; $\theta_{KF} = \arccos\left(\frac{\mathrm{tr}(\Delta R) - 1}{2}\right)$; where tr() represents the trace of a matrix and arccos() represents the inverse cosine function; and, if any one of the following conditions is met, determining that the current-frame first-view image is a keyframe: the position distance $D_{KF}$ is greater than the preset keyframe distance threshold $KF_{dist}$; the rotation distance is greater than a preset keyframe angle threshold; the number of successfully tracked corner points in the current-frame first-view image is less than a preset keyframe corner point number threshold.
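
The keyframe test of claim 11 can be sketched as below, under the assumption that a pose is a `(rotation, translation)` pair with the rotation stored as 3x3 nested lists and that the rotation distance is the angle of the relative rotation recovered from its trace. All names and threshold parameters are hypothetical.

```python
import math


def rotation_angle(r_a, r_b):
    """Angle of the relative rotation r_a^T r_b, recovered from its trace."""
    # tr(r_a^T r_b) equals the sum of element-wise products of the two matrices.
    trace = sum(r_a[i][j] * r_b[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to protect acos from floating-point round-off.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.acos(cos_theta)


def is_keyframe(pose, last_kf_pose, num_tracked,
                dist_thresh, angle_thresh, min_tracked):
    """A frame is a keyframe if any one of the three conditions holds."""
    (rot, trans), (rot_kf, trans_kf) = pose, last_kf_pose
    return (math.dist(trans, trans_kf) > dist_thresh
            or rotation_angle(rot_kf, rot) > angle_thresh
            or num_tracked < min_tracked)
```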

12. The lawnmower visual data processing method according to claim 4, characterized in that updating the three-dimensional coordinates of the corner points using the matching relationship between the current-frame first-view image and the current-frame second-view image comprises: using the coordinates of the corner points in the current-frame first-view image as the initial values of the corresponding corner points in the current-frame second-view image, and tracking with an optical flow algorithm to obtain the coordinates of the corresponding corner points in the current-frame second-view image; for each corner point, obtaining the coordinates of the corner point in each image frame and the pose of each image frame, and calculating the three-dimensional coordinates of the corner point by a triangulation algorithm; calculating the reprojection error of the corner point in the first-view image and the second-view image of each frame, and its z-coordinate in the camera coordinate system of each frame; and, if all reprojection errors are less than a preset triangulation reprojection threshold and all z-coordinates are greater than a preset depth threshold, recording the three-dimensional coordinates of the corner point.
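
The final acceptance test in claim 12 is illustrated below. The triangulation itself is not shown; the sketch assumes the per-view reprojection errors and camera-frame z-coordinates have already been computed for one corner point, and the function name is hypothetical.

```python
def accept_triangulated_point(reproj_errors, depths, reproj_thresh, min_depth):
    """Keep a triangulated corner only if it is consistent in every view.

    reproj_errors: reprojection error of the point in each first-view and second-view image
    depths:        z-coordinate of the point in each frame's camera coordinate system
    """
    if not reproj_errors or not depths:
        return False  # nothing to validate against (choice made for this sketch)
    return (all(err < reproj_thresh for err in reproj_errors)
            and all(z > min_depth for z in depths))
```

The depth test rejects points that triangulate behind or too close to a camera, which typically indicates a wrong match rather than a real scene point.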

13. A computer storage medium, characterized in that the computer storage medium stores a computer program which, when executed by a processor, implements the lawnmower visual data processing method according to any one of claims 1 to 12.