Orchard robot binocular vision inertial positioning method, device, equipment and medium
By using binocular camera line feature detection and multi-source information fusion, the positioning accuracy problem of orchard robots in environments with missing textures and rugged roads has been solved, achieving higher precision orchard robot positioning and supporting intelligent agriculture.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SOUTH CHINA AGRICULTURAL UNIVERSITY
- Filing Date
- 2024-01-12
- Publication Date
- 2026-06-23
AI Technical Summary
Existing visual-inertial positioning methods struggle to maintain the positioning accuracy of orchard robots in areas with missing textures and on rugged roads.
Line feature detection was performed using a binocular camera. The minimum significant difference method and line feature length screening algorithm were combined to extract and match line features from orchard images. Inertial positioning was then performed through triangulation and multi-source information fusion.
It improves the positioning accuracy of orchard robots, reduces the impact of texture loss areas, optimizes pose estimation, provides more accurate camera pose data, and supports intelligent agriculture.
Figure CN117870661B_ABST
Description
Technical Field
[0001] This application relates to the field of binocular visual inertial positioning, and more particularly to a binocular visual inertial positioning method, corresponding device, electronic device and computer-readable storage medium for an orchard robot.
Background Technology
[0002] With the rapid development of the agricultural robot industry, orchard robots have gradually become one of the research hotspots in the field of robotics, with broad application prospects in orchard production, such as planting, weeding, disease treatment, and harvesting. In orchard operations, accurate positioning helps orchard robots achieve autonomous navigation and precise harvesting. Therefore, accurately obtaining real-time positioning information of robots in the orchard is a necessary condition for realizing intelligent and automated orchard operations.
[0003] Currently, the main localization methods for orchard robots include GNSS (Global Navigation Satellite System), lidar, and vision-based methods. In densely shaded orchard environments, GNSS-based methods are easily affected by tree canopies and other obstacles, making it difficult to provide long-term reliable positioning information. While lidar-based methods offer advantages such as wide coverage, high resolution, and high positioning accuracy, their high cost increases the hardware cost of orchard robots, limiting their commercial application.
[0004] Vision-based localization methods can capture rich features from images and calculate accurate localization information usable by robots. Their cost-effectiveness makes them significant for orchard robot localization, and they hold broad application prospects. However, current vision-based localization methods face challenges in orchard environments. Mainstream visual-inertial localization algorithms, including VINS-Fusion, generally employ feature point detection for localization. Orchards often have areas with missing textures, such as tree trunks and corners of fruit tree rows, resulting in poor quality of detected image features. Furthermore, the uneven terrain of orchard roads causes severe camera shake during operation, increasing the difficulty of feature point matching and tracking, and ultimately reducing the localization accuracy of the algorithm in orchards.
[0005] In summary, existing technologies suffer from poor image feature quality caused by texture loss in orchards and from camera shake caused by uneven orchard roads during operation, both of which reduce positioning accuracy. The applicant has explored solutions to these problems.
Summary of the Invention
[0006] The purpose of this application is to solve the above-mentioned problems by providing a binocular vision inertial positioning method for orchard robots, a corresponding device, electronic equipment, and a computer-readable storage medium.
[0007] To achieve the various objectives of this application, the following technical solution is adopted:
[0008] A binocular vision-inertial localization method for an orchard robot, proposed to meet one of the purposes of this application, includes:
[0009] Acquire orchard images from the left or right eye of a binocular camera, determine corresponding point feature matching pairs for the orchard images based on a preset point feature detection algorithm, extract line features from the orchard images based on the minimum significant difference method, and determine corresponding line features for the orchard images.
[0010] The line features are filtered by length based on a preset line feature length filtering algorithm to determine line features with a preset length threshold. Based on the line features with the preset length threshold, a preset line feature matching filtering algorithm is used to match the line features of the orchard image before and after the left or right view to determine the line feature matching pairs.
[0011] Keyframes of the orchard image are extracted based on point feature matching pairs and line feature matching pairs to determine the keyframes in the orchard image.
[0012] The line features of the keyframes are triangulated to construct line feature reprojection residuals. The cost functions of the line feature reprojection residuals, prior residuals, IMU residuals, and point feature reprojection residuals are fused to minimize the cost functions, thereby completing the binocular vision inertial localization of the orchard robot.
[0013] Optionally, the step of extracting line features from the orchard image based on the minimum significant difference method to determine the corresponding line features of the orchard image includes:
[0014] Pixels with similar gradient angles are grouped into line segment support regions. Each time a new pixel is detected, it is added to the support region. The direction of the support region is continuously updated iteratively to extract the line features of the orchard image.
[0015] Optionally, the step of filtering the line features based on a preset line feature length filtering algorithm to determine line features with a preset length threshold includes:
[0016] The line feature length filtering algorithm is as follows:
[0017] $L_{min} = \lfloor \mu \cdot \min(W_{img}, H_{img}) \rfloor$,
[0018] where $L_{min}$ is the length threshold, $\mu$ is the scaling factor, $\lfloor\cdot\rfloor$ denotes rounding down, and $W_{img}$ and $H_{img}$ are the width and height of the input image resolution, respectively.
[0019] Optionally, the step of using a preset line feature matching and filtering algorithm to perform line feature matching on the preceding and following frames of the orchard image for the left or right view based on the line features of the preset length threshold to determine line feature matching pairs includes:
[0020] For the same line segment feature in two consecutive image frames of an orchard image, the similarity between its descriptor in the current image frame and its descriptor in the previous image frame is calculated using the Hamming distance formula;
[0021] Detect whether the similarity of the line feature matching pairs is lower than a preset similarity threshold. If it is lower, remove the line features that are lower than the preset similarity threshold.
[0022] Calculate the vector connecting the start points of the line feature in the current frame and the previous frame, and the vector connecting its endpoints in the two frames. Compute the squared modulus of each vector and determine whether it exceeds a preset threshold; if so, remove the corresponding line feature matching pair. Line feature matching is thereby performed on the previous and next frames of the orchard image of the left or right view to determine the line feature matching pairs.
[0023] Optionally, the step of extracting keyframes from the orchard image based on point feature matching pairs and line feature matching pairs to determine keyframes in the orchard image includes:
[0024] The numbers of matched point features and line features are denoted $\varepsilon_p$ and $\varepsilon_l$, respectively, and the feature ratio is calculated as follows:
[0025] ,
[0026] where $\varepsilon_l$ and $\varepsilon_p$ are the numbers of matched line features and point features, respectively, $F_{total}$ is the total number of features, and $\lfloor\cdot\rfloor$ denotes rounding down;
[0027] The average disparity of the detected line features is calculated as follows:
[0028] ,
[0029] where du and dv are the differences in the horizontal and vertical directions of a feature between the two image frames, respectively, and $LF_{total}$ is the number of line features involved in the disparity calculation.
[0030] Optionally, the step of fusing the cost functions of the line feature reprojection residual with the prior residual, IMU residual, and point feature reprojection residual to minimize the cost function, in order to complete the binocular vision inertial localization of the orchard robot, includes:
[0031] Determine the reprojection residuals of the line features, and add the line feature information to the sliding window state variables as follows:
[0032] $\chi = [x_n, x_{n+1}, \ldots, x_{n+N}, \lambda_p, \lambda_{p+1}, \ldots, \lambda_{p+P}, O_l, O_{l+1}, \ldots, O_{l+L}]$,
[0033] Where x, λ, and O are the information of IMU, point features, and line features, respectively; n, p, and l are the indices of IMU, point features, and line features; and N, P, and L are the number of IMU, point features, and line features observed in the sliding window.
[0034] Based on the cost function that minimizes all information, the optimized state variables in the sliding window are as follows:
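[0035] $\min_{\chi} \left\{ \left\| e_{prior} \right\|^2 + \sum \left\| e_{imu} \right\|^2 + \sum \left\| e_{point} \right\|^2 + \sum \left\| e_{line} \right\|^2 \right\}$,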
[0036] where $e_{prior}$ is the prior information, $e_{imu}$ is the IMU measurement residual, $e_{point}$ is the point feature reprojection residual, and $e_{line}$ is the line feature reprojection residual.
[0037] Optionally, the step of triangulating the line features of the keyframe to construct the line feature reprojection residual includes:
[0038] The line features of the keyframe are triangulated to obtain the line feature space coordinates, which are then transformed into the camera coordinate system.
[0039] Project the straight line in the camera coordinate system onto the image plane and calculate the distance between the projection line $l$ and the endpoint of the matching line segment. The straight line $L_w$ is transformed from the world coordinate system to the camera coordinate system as follows:
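[0040] $L_c = \begin{bmatrix} n_c \\ d_c \end{bmatrix} = \begin{bmatrix} R_{cw} & [t_{cw}]_\times R_{cw} \\ 0 & R_{cw} \end{bmatrix} L_w$,
[0041] where $L_c$ is the Plücker coordinates of $L_w$ in $\Pi_c$, with normal vector $n_c$ and direction vector $d_c$; $R_{cw}$ is the rotation matrix, $t_{cw}$ is the translation vector, and $[\cdot]_\times$ denotes the skew-symmetric matrix of a vector;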
[0042] The projection line $l$, obtained by projecting $L_c$ from the camera coordinate system onto the image plane, is:
[0043] $l = [l_1, l_2, l_3]^\top = K n_c$,
[0044] Where K is the projection matrix of the line feature;
[0045] The reprojection residual of the determined line feature is calculated as follows:
[0046] $e_{line} = d(m, l) = \dfrac{m^\top l}{\sqrt{l_1^2 + l_2^2}}$,
[0047] Where m is the homogeneous coordinate of the midpoint of the line feature.
[0048] A binocular vision inertial positioning device for an orchard robot, provided for another purpose of this application, includes:
[0049] The line feature determination module is configured to acquire orchard images from the left or right eye of a binocular camera, determine the corresponding point feature matching pairs of the orchard images according to a preset point feature detection algorithm, extract line features from the orchard images based on the minimum significant difference method, and determine the corresponding line features of the orchard images.
[0050] The line feature matching pair determination module is configured to perform length filtering on the line features based on a preset line feature length filtering algorithm to determine line features with a preset length threshold, and to perform line feature matching on the previous and next frames of the orchard image of the left or right eye using a preset line feature matching filtering algorithm based on the line features with the preset length threshold to determine line feature matching pairs.
[0051] The keyframe determination module is configured to extract keyframes from the orchard image based on point feature matching pairs and line feature matching pairs, so as to determine the keyframes in the orchard image.
[0052] The inertial localization module is configured to triangulate the line features of the keyframes to construct line feature reprojection residuals, and fuse the cost functions of the line feature reprojection residuals with the prior residuals, IMU residuals and point feature reprojection residuals to minimize the cost functions, so as to complete the binocular vision inertial localization of the orchard robot.
[0053] An electronic device provided for another purpose of this application includes a central processing unit and a memory, the central processing unit being configured to invoke and run a computer program stored in the memory to perform the steps of the orchard robot binocular vision inertial positioning method of this application.
[0054] A computer-readable storage medium is provided for another purpose of this application, which stores, in the form of computer-readable instructions, a computer program implemented according to the orchard robot binocular vision inertial positioning method, which, when called by a computer, executes the steps included in the corresponding method.
[0055] Compared to existing technologies, this application addresses the problems in existing technologies, such as areas in orchards often exhibiting texture loss, leading to poor quality of detected image features, and the uneven terrain within orchards causing severe camera shake during operation, resulting in decreased positioning accuracy of the algorithm in the orchard. This application includes, but is not limited to, the following beneficial effects:
[0056] Firstly, the orchard robot binocular vision inertial localization method of this application uses the LSD algorithm to detect line features in the image, and optimizes the image line feature detection process by using the line segment length screening algorithm and the feature matching screening algorithm to remove inferior line features and LBD descriptors, thereby improving the reliability of line feature matching.
[0057] Secondly, the binocular vision inertial localization method for orchard robots in this application greatly reduces the redundancy of keyframes caused by texture loss areas, performs triangulation processing on line features, constructs line feature reprojection residuals and adds a sliding window to provide line feature constraints for the localization system, further optimizes the system's pose estimation, and significantly improves the localization accuracy of the orchard robot.
[0058] Furthermore, this application can quickly and accurately extract more effective line features and fuse them with point features and IMU pre-integration data to obtain globally consistent camera pose data. This can effectively avoid the problem of reduced positioning accuracy of the orchard robot's binocular camera in the orchard due to the unevenness of the roads in the orchard, greatly improve the quality of image detection, and lay a solid foundation for intelligent agriculture.
Attached Figure Description
[0059] The above and / or additional aspects and advantages of this application will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, wherein:
[0060] Figure 1 is a flowchart illustrating the binocular vision inertial positioning method for orchard robots in an embodiment of this application;
[0061] Figure 2 is an exemplary network architecture used in the orchard robot binocular vision inertial localization method of this application;
[0062] Figure 3 is a schematic diagram of the transformation and projection of line features in the embodiments of this application;
[0063] Figure 4 is a schematic diagram illustrating the calculation of the reprojection residual of the line feature in an embodiment of this application;
[0064] Figure 5 is a schematic diagram of the orchard robot binocular vision inertial positioning device in the embodiments of this application;
[0065] Figure 6 is a schematic diagram of the structure of the computer device in the embodiments of this application.
Detailed Implementation
[0066] The embodiments of this application are described in detail below. Examples of these embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and are only used to explain this application, and should not be construed as limiting this application.
[0067] Those skilled in the art will understand that, unless specifically stated otherwise, the singular forms “a,” “an,” “said,” and “the” used herein may also include the plural forms. It should be further understood that the term “comprising” as used in this application means the presence of the stated features, integers, steps, operations, elements, and / or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and / or groups thereof. It should be understood that when an element is said to be “connected” or “coupled” to another element, it can be directly connected or coupled to the other element, or there may be intermediate elements. Furthermore, “connected” or “coupled” as used herein can include wireless connections or wireless coupling. The term “and / or” as used herein includes all or any units and all combinations of one or more associated listed items.
[0068] It will be understood by those skilled in the art that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art to which this application pertains. It should also be understood that terms such as those defined in general dictionaries should be understood to have the same meaning as in the context of the prior art, and should not be interpreted in an idealized or overly formal sense unless specifically defined as herein.
[0069] Those skilled in the art will understand that the terms "client," "terminal," and "terminal device" as used herein include both devices with only a wireless signal receiver and no transmitting capability, and devices with receiving and transmitting hardware capable of bidirectional communication over a bidirectional communication link. Such devices may include: cellular or other communication devices, such as personal computers or tablets, with a single-line display, a multi-line display, or no multi-line display; PCS (Personal Communications Service) devices that can combine voice, data processing, fax, and / or data communication capabilities; PDAs (Personal Digital Assistants) that may include a radio frequency receiver, pager, internet / intranet access, web browser, notepad, calendar, and / or GPS (Global Positioning System) receiver; and conventional laptops and / or handheld computers or other devices that have and / or include radio frequency receivers. As used herein, "client," "terminal," and "terminal device" can be portable, transportable, installed in a means of transportation (air, sea, and / or land), or suitable and / or configured to operate locally and / or in a distributed manner at any other location on Earth and / or in space. "Client," "terminal," and "terminal device" as used herein can also be a communication terminal, an internet access terminal, or a music / video playback terminal, such as a PDA, a MID (Mobile Internet Device), and / or a mobile phone with music / video playback capabilities, or a smart TV, set-top box, etc.
[0070] The hardware referred to by the names "server," "client," and "service node" in this application is essentially an electronic device with the equivalent capabilities of a personal computer. It is a hardware device with the necessary components revealed by the von Neumann architecture, such as a central processing unit (including an arithmetic logic unit and a control unit), memory, input devices, and output devices. The computer program is stored in its memory, and the central processing unit loads the program stored in the secondary storage into the main memory to run it, execute the instructions in the program, and interact with the input and output devices to complete specific functions.
[0071] It should be noted that the concept of "server" used in this application can also be extended to the case of server clusters. Based on the network deployment principles understood by those skilled in the art, the servers should be logically divided. Physically, these servers can be independent of each other but accessible through interfaces, or they can be integrated into a single physical computer or a computer cluster. Those skilled in the art should understand this flexibility and should not use it to constrain the implementation of the network deployment method in this application.
[0072] One or more of the technical features of this application, unless explicitly specified herein, can be deployed on a server and accessed by a client remotely calling the online service interface provided by the server, or can be directly deployed and run on a client for access.
[0073] Unless otherwise specified, the neural network models referenced or potentially referenced in this application may be deployed on a remote server and invoked remotely on the client, or deployed on a client with the capability to invoke directly. In some embodiments, when running on the client, the corresponding intelligence may be acquired through transfer learning in order to reduce the requirements on the client's hardware resources and avoid excessive consumption of the client's hardware resources.
[0074] Unless otherwise specified, all data involved in this application may be stored remotely on a server or on a local terminal device, as long as it is suitable for use by the technical solution of this application.
[0075] Those skilled in the art will understand that although the various methods in this application are described based on the same concept and thus present commonality among them, they can be performed independently unless otherwise specified. Similarly, the various embodiments disclosed in this application are all based on the same inventive concept; therefore, concepts expressed in the same way, as well as concepts that are appropriately changed for convenience but are expressed differently, should be understood equivalently.
[0076] Unless otherwise expressly stated, the various embodiments disclosed in this application can be combined in a cross-cutting manner to flexibly construct new embodiments, as long as such combination does not depart from the inventive spirit of this application and can meet the needs of the prior art or solve a certain deficiency in the prior art. Those skilled in the art should be aware of such modifications.
[0077] Referring to Figure 1, in one embodiment of the orchard robot binocular vision inertial localization method of this application, the method includes:
[0078] Step S10: Obtain the orchard image from the left or right eye of the binocular camera, determine the corresponding point feature matching pair of the orchard image according to the preset point feature detection algorithm, extract the line features of the orchard image based on the minimum significant difference method, and determine the corresponding line features of the orchard image.
[0079] The terminal device in the orchard robot can acquire orchard images from the left or right eye of the binocular camera, determine the corresponding point feature matching pairs of the orchard image according to the preset point feature detection algorithm, and extract line features of the orchard image based on the minimum significant difference method to determine the corresponding line features of the orchard image.
[0080] The steps of extracting line features from the orchard image based on the minimum significant difference method and determining the corresponding line features of the orchard image include:
[0081] Pixels with similar gradient angles are grouped into line segment support regions. Each time a new pixel is detected, it is added to the support region. The direction of the support region is continuously updated iteratively to extract the line features of the orchard image.
[0082] Specifically, this application adds the Least Significant Difference (LSD) algorithm to the feature point detection to detect line segment features in the orchard image.
[0083] The LSD algorithm extracts line segment features by forming a line segment support region from pixels with similar gradient angles. Each time a new pixel is detected, it is added to the support region. The direction of the support region is iteratively updated using formula (1), as follows:
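[0084] $\theta_{region} = \arctan\left(\dfrac{\sum_i \sin(ang_i)}{\sum_i \cos(ang_i)}\right)$ (1)
[0085] where i indexes the pixels in the line segment support region formed by connecting pixels with similar gradient angles, and $ang_i$ is the angle of pixel i.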
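The iterative direction update described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: it assumes the update is the usual arctangent of the summed sines and cosines of the member pixels' angles, it grows a region over a flat list of angles rather than over image neighbours, and the names `SupportRegion`, `grow_region`, and the tolerance `tau` (defaulting to 22.5 degrees) are hypothetical.

```python
import math


class SupportRegion:
    """Running estimate of the direction of a line segment support region."""

    def __init__(self, seed_angle: float):
        self.sum_sin = math.sin(seed_angle)
        self.sum_cos = math.cos(seed_angle)
        self.size = 1

    @property
    def angle(self) -> float:
        # Region direction: arctangent of summed sines over summed cosines.
        return math.atan2(self.sum_sin, self.sum_cos)

    def accepts(self, pixel_angle: float, tau: float) -> bool:
        # Absolute angular difference wrapped into [0, pi].
        delta = pixel_angle - self.angle
        diff = abs(math.atan2(math.sin(delta), math.cos(delta)))
        return diff <= tau

    def add_pixel(self, pixel_angle: float) -> None:
        # Adding a pixel updates the sums, and therefore the region direction.
        self.sum_sin += math.sin(pixel_angle)
        self.sum_cos += math.cos(pixel_angle)
        self.size += 1


def grow_region(angles, tau=math.radians(22.5)):
    """Greedily add every angle that agrees with the current region direction."""
    region = SupportRegion(angles[0])
    for a in angles[1:]:
        if region.accepts(a, tau):
            region.add_pixel(a)
    return region
```

Keeping running sums of sines and cosines means each new pixel updates the direction in constant time, without revisiting pixels already in the region.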
[0086] Step S20: Based on a preset line feature length filtering algorithm, the line features are filtered by length to determine line features with a preset length threshold. Based on the line features with the preset length threshold, a preset line feature matching filtering algorithm is used to perform line feature matching on the previous and next frames of the orchard image of the left or right view to determine line feature matching pairs.
[0087] After determining the line features corresponding to the orchard image, the line features are filtered by length based on a preset line feature length filtering algorithm to determine line features with a preset length threshold. Based on the line features with the preset length threshold, a preset line feature matching filtering algorithm is used to perform line feature matching on the previous and next frames of the orchard image of the left or right view to determine line feature matching pairs.
[0088] Since excessively short line segments cannot provide geometric constraints for localization, a line segment length filtering algorithm is proposed to filter the line features by length to determine line features with a preset length threshold, as shown below:
[0089] $L_{min} = \lfloor \mu \cdot \min(W_{img}, H_{img}) \rfloor$
[0090] where $L_{min}$ is the length threshold, $\mu$ is the scaling factor, $\lfloor\cdot\rfloor$ denotes rounding down, and $W_{img}$ and $H_{img}$ are the width and height of the input image resolution, respectively.
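The length screening can be illustrated with a short sketch. The exact form of the threshold is an assumption here: the code takes the threshold to be the scaling factor times the shorter image side, rounded down. The function names, the segment layout `(x1, y1, x2, y2)`, and the default `mu=0.125` are hypothetical.

```python
import math


def min_line_length(width: int, height: int, mu: float) -> int:
    # Assumed form: floor(mu * min(W_img, H_img)).
    return math.floor(mu * min(width, height))


def filter_by_length(segments, width, height, mu=0.125):
    """Keep segments (x1, y1, x2, y2) at least as long as the threshold."""
    l_min = min_line_length(width, height, mu)
    return [
        s for s in segments
        if math.hypot(s[2] - s[0], s[3] - s[1]) >= l_min
    ]
```

For a 640 x 480 image and `mu=0.125`, the threshold under this assumption is 60 pixels, so a 50-pixel segment is discarded and a 100-pixel segment is kept.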
[0091] Furthermore, for ease of calculation, the LBD (line binary descriptor) is used to represent line segment features. An LBD descriptor consists of n binary digits, LBD = {d1, d2, ..., dn}. To avoid excessive differences between LBD matching pairs in consecutive frames caused by the rugged terrain and partial occlusion by leaves in the orchard, and to eliminate lower-quality matching pairs, a preset line feature matching filtering algorithm is proposed that matches line features across the preceding and following frames of the orchard image for the left or right view to determine the line feature matching pairs.
[0092] The steps of using a preset line feature matching and filtering algorithm to perform line feature matching on the preceding and following frames of the orchard image for the left or right view based on the preset length threshold to determine line feature matching pairs include:
[0093] Step S201: For the same line segment feature in two consecutive image frames of the orchard image, the similarity between its descriptor in the current image frame and its descriptor in the previous image frame is calculated using the Hamming distance formula;
[0094] Step S203: Detect whether the similarity of the line feature matching pairs is lower than a preset similarity threshold. If it is lower, remove the line features that are lower than the preset similarity threshold.
[0095] Step S205: Calculate the vector connecting the start points of the line feature in the current frame and the previous frame, and the vector connecting its endpoints in the two frames. Compute the squared modulus of each vector and determine whether it exceeds a preset threshold; if so, remove the corresponding line feature matching pair. Line feature matching is thereby performed on the previous and next frames of the orchard image of the left or right view to determine the line feature matching pairs.
[0096] Specifically, for the same line segment feature in two consecutive image frames of an orchard image, the similarity H between its descriptor in the current image frame and its descriptor in the previous image frame is calculated using the Hamming distance, expressed as follows:
[0097]
[0098] After calculating the similarity, line segment features that are below the preset similarity threshold are removed;
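A minimal sketch of the descriptor screening follows. It assumes binary descriptors stored as `bytes` and, as an illustrative choice, converts the Hamming distance into a similarity equal to the fraction of matching bits. That conversion, the function names, and the default `min_similarity=0.8` are assumptions, not taken from the application.

```python
def hamming_distance(desc_a: bytes, desc_b: bytes) -> int:
    """Number of differing bits between two equal-length binary descriptors."""
    if len(desc_a) != len(desc_b):
        raise ValueError("descriptors must have equal length")
    return sum(bin(a ^ b).count("1") for a, b in zip(desc_a, desc_b))


def similarity(desc_a: bytes, desc_b: bytes) -> float:
    # Assumed mapping: fraction of bits that agree (1.0 means identical).
    n_bits = 8 * len(desc_a)
    return 1.0 - hamming_distance(desc_a, desc_b) / n_bits


def filter_matches(matches, min_similarity=0.8):
    """Keep (current, previous) descriptor pairs whose similarity is high enough."""
    return [(a, b) for a, b in matches if similarity(a, b) >= min_similarity]
```

XOR followed by a bit count is the standard way to compute a Hamming distance on packed binary descriptors, which is why binary descriptors are cheap to compare.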
[0099] To ensure that the positional offset of the corresponding line segment in the image stays within a certain range, the start and end points of line segment feature l are used to calculate two vectors: one connecting its start points in the current frame and the previous frame, and one connecting its endpoints in the two frames. They are expressed as follows:
[0100]
[0101] where the first vector connects the start points of line feature l across the two frames and the second connects its endpoints across the two frames; each is defined by the corresponding point of the line feature in the previous frame and in the current frame.
[0102] Calculate the squared modulus of each vector and determine whether it exceeds a preset threshold ν; matching pairs whose vectors exceed ν are discarded.
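The endpoint displacement check can be sketched as below. The data layout (each segment as a pair of `(x, y)` points), the function names, and the choice to compare the squared modulus directly against the threshold `nu` are assumptions made for illustration.

```python
def squared_norm(p, q):
    """Squared length of the vector from point p to point q."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    return dx * dx + dy * dy


def filter_by_displacement(matches, nu):
    """Drop matches whose start or end point moved too far between frames.

    matches: list of (prev_segment, cur_segment), each ((xs, ys), (xe, ye)).
    nu: threshold applied to the squared modulus of each connecting vector.
    """
    kept = []
    for prev, cur in matches:
        start_sq = squared_norm(prev[0], cur[0])
        end_sq = squared_norm(prev[1], cur[1])
        if start_sq <= nu and end_sq <= nu:
            kept.append((prev, cur))
    return kept
```

Working with squared moduli avoids a square root per match; the threshold simply has to be expressed in squared pixels.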
[0103] Step S30: Extract keyframes from the orchard image based on point feature matching pairs and line feature matching pairs to determine the keyframes in the orchard image;
[0104] Improved keyframe criteria: the current frame number is less than 2; $F_r$ is less than the set threshold, or the number of new features is greater than $\alpha \cdot \varepsilon_l + \beta \cdot \varepsilon_p$, where α and β are the scaling coefficients of the features; and the average disparity of point and line features between the current frame and the latest keyframe is large enough.
[0105] The step of extracting keyframes from the orchard image based on point feature matching pairs and line feature matching pairs to determine the keyframes in the orchard image specifically includes:
[0106] The numbers of matched point features and line features are denoted $\varepsilon_p$ and $\varepsilon_l$, respectively, and the feature ratio is calculated as follows:
[0107] ,
[0108] where $\varepsilon_l$ and $\varepsilon_p$ are the numbers of matched line features and point features, respectively, $F_{total}$ is the total number of features, and $\lfloor\cdot\rfloor$ denotes rounding down;
[0109] The average disparity of the detected line features is calculated as follows:
[0110] ,
[0111] where $du$ and $dv$ represent the differences of the features in the horizontal and vertical directions between the two image frames, respectively, and $LF_{total}$ is the number of line features involved in the disparity calculation.
[0112] Keyframes in the orchard image are determined based on the average disparity of the line features.
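As a rough illustration of the disparity test, the sketch below assumes that the average disparity is the mean Euclidean magnitude of $(du, dv)$ over the $LF_{total}$ line features; the exact expression used in this application may differ. The function name `average_line_disparity` and the threshold name `min_disparity` are hypothetical.

```python
import numpy as np


def average_line_disparity(du, dv):
    """Mean magnitude of per-feature image displacement (assumed definition).

    du, dv: horizontal and vertical differences of each line feature
    between the two image frames, one entry per feature.
    """
    du = np.asarray(du, dtype=float)
    dv = np.asarray(dv, dtype=float)
    if du.size == 0:
        # No line features took part in the calculation.
        return 0.0
    return float(np.mean(np.hypot(du, dv)))


def disparity_large_enough(du, dv, min_disparity):
    """Hypothetical keyframe test: compare the average against a threshold."""
    return average_line_disparity(du, dv) > min_disparity


avg = average_line_disparity([3.0, 0.0], [4.0, 0.0])
```

Here the two features move by 5 pixels and 0 pixels, so the assumed average is 2.5 pixels.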
[0113] Step S40: Triangulate the line features of the keyframe to construct the line feature reprojection residual. Fuse the cost function of the line feature reprojection residual with the prior residual, IMU residual and point feature reprojection residual to minimize the cost function, so as to complete the binocular vision inertial localization of the orchard robot.
[0114] The step of triangulating the line features of the keyframe to construct the line feature reprojection residual includes:
[0115] The line features of the keyframe are triangulated to obtain the line feature space coordinates, which are then transformed into the camera coordinate system.
[0116] The line in the camera coordinate system is projected onto the image plane, and the distance between the projected line $l$ and the endpoints of the matching line segment is calculated. The line $L_w$ is transformed from the world coordinate system to the camera coordinate system as follows:
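[0117] $$L_c = \begin{bmatrix} n_c \\ d_c \end{bmatrix} = \begin{bmatrix} R_{cw} & [t_{cw}]_\times R_{cw} \\ 0 & R_{cw} \end{bmatrix} \begin{bmatrix} n_w \\ d_w \end{bmatrix},$$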
[0118] where $L_c$ is the Plücker coordinates of $L_w$ in $\Pi_c$, $R_{cw}$ is the rotation matrix, and $t_{cw}$ is the translation vector;
[0119] The projected line $l$, obtained by projecting $L_c$ from the camera coordinate system onto the image plane, is:
[0120] $l = [l_1, l_2, l_3]^T = K n_c$,
[0121] Where K is the projection matrix of the line feature;
[0122] The reprojection residual of the determined line feature is calculated as follows:
[0123] $$e_{line} = d(m, l) = \frac{m^T l}{\sqrt{l_1^2 + l_2^2}},$$
[0124] Where m is the homogeneous coordinate of the midpoint of the line feature.
[0125] The steps of fusing the cost functions of the line feature reprojection residual with the prior residual, IMU residual, and point feature reprojection residual to minimize the cost function, in order to complete the binocular vision inertial localization of the orchard robot, include:
[0126] Determine the reprojection residuals of the line features, and add the line feature information to the sliding window state variables as follows:
[0127] $\chi = [x_n, x_{n+1}, \ldots, x_{n+N}, \lambda_p, \lambda_{p+1}, \ldots, \lambda_{p+P}, O_l, O_{l+1}, \ldots, O_{l+L}]$,
[0128] Where x, λ, and O are the information of IMU, point features, and line features, respectively; n, p, and l are the indices of IMU, point features, and line features; and N, P, and L are the number of IMU, point features, and line features observed in the sliding window.
[0129] Based on the cost function that minimizes all information, the optimized state variables in the sliding window are as follows:
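[0130] $$\min_{\chi} \left( \|e_{prior}\|^2 + \sum \|e_{imu}\|^2 + \sum \|e_{point}\|^2 + \sum \|e_{line}\|^2 \right),$$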
[0131] where $e_{prior}$ is the prior information, $e_{imu}$ is the IMU measurement residual, $e_{point}$ is the point feature reprojection residual, and $e_{line}$ is the line feature reprojection residual.
[0132] After the line feature detection at the front end, in order to obtain the position of the line feature in the world coordinate system and construct the line feature reprojection residual, the line feature needs to be triangulated.
[0133] For line features, Plücker coordinates are used to represent spatial lines during triangulation, which facilitates the transformation and projection of line features. Referring to Figure 3, a straight line $L_w \in \Pi_w$ in space can be described by the Plücker coordinates $L_w = (n_w^T, d_w^T)^T$, where $n_w$ is the normal vector of the plane determined by the coordinate origin and the line, and $d_w$ is the direction vector of $L_w$ determined by its two endpoints.
[0134] Referring to Figure 4, when the line $L_w$ is observed from two different camera positions $c_1$ and $c_2$, each observation determines a plane from three points: the optical center of the camera and the two endpoints of the line's projection onto the camera plane. This gives two planes, $\Pi_1 = (c_1, L_w)$ and $\Pi_2 = (c_2, L_w)$, from which the dual Plücker matrix is obtained:
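[0135] $$L^* = \begin{bmatrix} [d_w]_\times & n_w \\ -n_w^T & 0 \end{bmatrix} = \Pi_1 \Pi_2^T - \Pi_2 \Pi_1^T \in \mathbb{R}^{4 \times 4} \quad (7)$$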
[0136] where $[\cdot]_\times$ denotes the antisymmetric matrix of a vector. The Plücker coordinates of the line $L_w$ can then be obtained from the dual Plücker matrix $L^*$.
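The two-plane triangulation can be sketched in a few lines. This is an illustrative example under stated assumptions, not the code of this application: the helper names `plane_from_points` and `triangulate_line` are hypothetical, planes are written as homogeneous 4-vectors $(a, b)$ with $a \cdot x + b = 0$, and the three points defining each plane are taken directly as 3D points in a common frame rather than back-projected from image endpoints.

```python
import numpy as np


def plane_from_points(p0, p1, p2):
    """Homogeneous plane (a, b), with a.x + b = 0, through three 3D points."""
    p0, p1, p2 = (np.asarray(p, dtype=float) for p in (p0, p1, p2))
    normal = np.cross(p1 - p0, p2 - p0)
    return np.append(normal, -normal @ p0)


def triangulate_line(plane1, plane2):
    """Plücker coordinates (n, d) of the line where two planes intersect."""
    plane1 = np.asarray(plane1, dtype=float)
    plane2 = np.asarray(plane2, dtype=float)

    # Dual Plücker matrix: [[ [d]x, n ], [ -n^T, 0 ]].
    dual = np.outer(plane1, plane2) - np.outer(plane2, plane1)

    # Read d from the skew-symmetric 3x3 block and n from the last column.
    d = np.array([dual[2, 1], dual[0, 2], dual[1, 0]])
    n = dual[:3, 3]
    return n, d


# A line through (0, 0, 1) and (1, 0, 1), seen from two camera centers.
p, q = [0.0, 0.0, 1.0], [1.0, 0.0, 1.0]
plane1 = plane_from_points([0.0, 0.0, 0.0], p, q)
plane2 = plane_from_points([0.0, 1.0, 0.0], p, q)
n, d = triangulate_line(plane1, plane2)
```

The result is defined only up to a common scale factor, so a useful sanity check is that `n` is orthogonal to `d` and that `np.cross(p, d)` equals `n` for a point `p` on the line.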
[0137] To obtain the reprojection residual of the line features, the triangulated line feature spatial coordinates are first transformed into the camera coordinate system; the line in the camera coordinate system is then projected onto the image plane, and the distance between the projected line $l$ and the endpoints of the matching line segment is calculated. First, the line $L_w$ is transformed from the world coordinate system $\Pi_w$ to the camera coordinate system $\Pi_c$. The specific calculation is as follows:
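[0138] $$L_c = \begin{bmatrix} n_c \\ d_c \end{bmatrix} = \begin{bmatrix} R_{cw} & [t_{cw}]_\times R_{cw} \\ 0 & R_{cw} \end{bmatrix} \begin{bmatrix} n_w \\ d_w \end{bmatrix} \quad (8)$$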
[0139] where $L_c$ is the Plücker coordinates of $L_w$ in $\Pi_c$, $R_{cw}$ is the rotation matrix, and $t_{cw}$ is the translation vector.
[0140] The projected line $l$ is obtained by projecting $L_c$ from the camera coordinate system onto the image plane:
[0141] $l = [l_1, l_2, l_3]^T = K n_c$ (9)
[0142] Where K is the projection matrix of the line feature.
[0143] Finally, the line feature reprojection residual is calculated as:
[0144] $$e_{line} = d(m, l) = \frac{m^T l}{\sqrt{l_1^2 + l_2^2}} \quad (10)$$
[0145] Where m is the homogeneous coordinate of the midpoint of the line feature.
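The chain from world-frame Plücker coordinates to a pixel-space residual can be sketched as follows. It is a minimal example under assumptions: the residual is taken as the signed point-to-line distance of the midpoint $m$ from the projected line, the line projection matrix is built from pinhole intrinsics `fx`, `fy`, `cx`, `cy` in the form commonly used for Plücker lines, and all function names are hypothetical.

```python
import numpy as np


def skew(t):
    """Antisymmetric matrix [t]x such that skew(t) @ v == np.cross(t, v)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])


def transform_line(n_w, d_w, R_cw, t_cw):
    """Move Plücker coordinates (n, d) from the world to the camera frame."""
    d_c = R_cw @ d_w
    n_c = R_cw @ n_w + skew(t_cw) @ d_c
    return n_c, d_c


def line_projection_matrix(fx, fy, cx, cy):
    """Projection matrix for the normal part n_c (pinhole model assumed)."""
    return np.array([[fy, 0.0, 0.0],
                     [0.0, fx, 0.0],
                     [-fy * cx, -fx * cy, fx * fy]])


def line_residual(midpoint_px, l):
    """Signed distance from the segment midpoint to the projected line l."""
    m = np.array([midpoint_px[0], midpoint_px[1], 1.0])
    return float(m @ l / np.hypot(l[0], l[1]))


# World line through (0, 0, 1) along the x axis; camera at the world origin.
n_w = np.array([0.0, 1.0, 0.0])
d_w = np.array([1.0, 0.0, 0.0])
n_c, d_c = transform_line(n_w, d_w, np.eye(3), np.zeros(3))
l = line_projection_matrix(100.0, 100.0, 50.0, 40.0) @ n_c
residual = line_residual((60.0, 43.0), l)
```

With these values the projected line is the image row `v = 40`, so a detected midpoint at `v = 43` gives a residual of 3 pixels.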
[0146] After obtaining the reprojection residuals of the line features, the line feature information is added to the original sliding window state variables of the algorithm:
[0147] $\chi = [x_n, x_{n+1}, \ldots, x_{n+N}, \lambda_p, \lambda_{p+1}, \ldots, \lambda_{p+P}, O_l, O_{l+1}, \ldots, O_{l+L}]$ (11)
[0148] Where x, λ, and O are the information of IMU, point features, and line features, respectively; n, p, and l are the indices of IMU, point features, and line features; and N, P, and L are the number of IMU, point features, and line features observed in the sliding window.
[0149] Finally, by minimizing the cost function of each piece of information, all state variables in the sliding window are optimized:
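[0150] $$\min_{\chi} \left( \|e_{prior}\|^2 + \sum \|e_{imu}\|^2 + \sum \|e_{point}\|^2 + \sum \|e_{line}\|^2 \right) \quad (12)$$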
[0151] where $e_{prior}$ is the prior information, $e_{imu}$ is the IMU measurement residual, $e_{point}$ is the point feature reprojection residual, and $e_{line}$ is the line feature reprojection residual.
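To make the fused objective concrete, the sketch below evaluates a cost of this kind for a given set of residuals. It is a simplified assumption, not the optimizer of this application: every term is an unweighted squared norm, with no covariance weighting or robust kernel, and the names `squared_norm` and `total_cost` are hypothetical. In a full system a nonlinear least-squares solver would adjust the sliding-window state to reduce this value.

```python
import numpy as np


def squared_norm(residual):
    """Squared Euclidean norm of a scalar or vector residual."""
    r = np.asarray(residual, dtype=float).ravel()
    return float(r @ r)


def total_cost(e_prior, e_imu, e_point, e_line):
    """Sum of squared residuals over the four information sources.

    e_prior: one prior residual vector.
    e_imu, e_point, e_line: iterables of residuals, one per IMU factor,
    point observation, and line observation in the sliding window.
    """
    cost = squared_norm(e_prior)
    cost += sum(squared_norm(r) for r in e_imu)
    cost += sum(squared_norm(r) for r in e_point)
    cost += sum(squared_norm(r) for r in e_line)
    return cost


cost = total_cost(
    e_prior=[1.0, 2.0],
    e_imu=[[1.0]],
    e_point=[[2.0, 0.0]],
    e_line=[3.0],
)
```

The example adds 5 from the prior, 1 from the IMU term, 4 from the point term, and 9 from the line term.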
[0152] As can be seen from the above embodiments, this application addresses the following problems in the prior art: texture-deficient areas in orchards lead to poor-quality detected image features, and uneven orchard roads cause severe camera shake during operation, both of which reduce the positioning accuracy of the algorithm in the orchard. The beneficial effects of this application include, but are not limited to, the following:
[0153] Firstly, the orchard robot binocular vision inertial localization method of this application uses the LSD algorithm to detect line features in the image, and optimizes the image line feature detection process by using the line segment length screening algorithm and the feature matching screening algorithm to remove inferior line features and LBD descriptors, thereby improving the reliability of line feature matching.
[0154] Secondly, the binocular vision inertial localization method for orchard robots in this application greatly reduces the redundancy of keyframes caused by texture loss areas, triangulates the line features, constructs line feature reprojection residuals, and adds them to the sliding window to provide line feature constraints for the localization system, which further optimizes the system's pose estimation and significantly improves the localization accuracy of the orchard robot.
[0155] Furthermore, this application can quickly and accurately extract more effective line features and fuse them with point features and IMU pre-integrated data to obtain globally consistent camera pose data. This can effectively avoid the problem of reduced positioning accuracy of the orchard robot's binocular camera in the orchard due to the unevenness of the roads, greatly improve the quality of image detection, and lay a solid foundation for intelligent agriculture.
[0156] Referring to Figure 5, this application provides a binocular vision-inertial positioning device for an orchard robot, comprising a line feature determination module 1100, a line feature matching pair determination module 1200, a keyframe determination module 1300, and an inertial positioning module 1400. The line feature determination module 1100 is configured to acquire an orchard image from the left or right view of the binocular camera, determine corresponding point feature matching pairs for the orchard image based on a preset point feature detection algorithm, and extract line features from the orchard image based on the line segment detector (LSD) method to determine the corresponding line features of the orchard image. The line feature matching pair determination module 1200 is configured to filter the line features by length based on a preset line feature length filtering algorithm to determine line features that meet a preset length threshold, and, based on those line features, to apply a preset line feature matching filtering algorithm to match line features between the preceding and following frames of the orchard image of the left or right view, so as to determine line feature matching pairs. The keyframe determination module 1300 is configured to extract keyframes of the orchard image based on the point feature matching pairs and line feature matching pairs to determine the keyframes in the orchard image. The inertial positioning module 1400 is configured to triangulate the line features of the keyframes to construct line feature reprojection residuals, and to fuse the cost function of the line feature reprojection residuals with those of the prior residuals, IMU residuals and point feature reprojection residuals and minimize it, so as to complete the binocular vision inertial localization of the orchard robot.
[0157] Based on any embodiment of this application, and referring to Figure 6, another embodiment of this application provides an electronic device, which can be implemented as a computer device. Figure 6 shows the internal structure of the computer device. The computer device includes a processor, a computer-readable storage medium, a memory, and a network interface connected via a system bus. The computer-readable storage medium stores an operating system, a database, and computer-readable instructions. The database may store control information sequences. When executed by the processor, the computer-readable instructions cause the processor to implement a binocular vision-inertial positioning method for an orchard robot. The processor provides computational and control capabilities and supports the operation of the entire computer device. The memory stores computer-readable instructions which, when executed by the processor, cause the processor to execute the binocular vision-inertial positioning method for an orchard robot described in this application. The network interface of the computer device is used for communication with a terminal. Those skilled in the art will understand that the structure shown in Figure 6 is merely a block diagram of the portion of the structure related to the present application and does not limit the computer device to which the present application is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different component arrangement.
[0158] In this embodiment, the processor executes the specific functions of each module and its sub-modules in Figure 5, and the memory stores the program code and various data required to execute these modules or sub-modules. The network interface is used for data transmission between the user terminal and the server. The memory in this embodiment stores the program code and data required to execute all modules / sub-modules of the orchard robot binocular vision inertial positioning device of this application, and the server can call this program code and data to execute the functions of all sub-modules.
[0159] This application also provides a storage medium storing computer-readable instructions, which, when executed by one or more processors, cause the one or more processors to perform the steps of the orchard robot binocular vision inertial positioning method described in any embodiment of this application.
[0160] This application also provides a computer program product, including a computer program / instructions that, when executed by one or more processors, implement the steps of the orchard robot binocular vision inertial positioning method described in any embodiment of this application.
[0161] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments of this application can be implemented by a computer program instructing related hardware. This computer program can be stored in a computer-readable storage medium, and when executed, it can include the processes of the embodiments of the methods described above. The aforementioned storage medium can be a magnetic disk, optical disk, read-only memory (ROM), or random access memory (RAM), etc.
[0162] The above description is only a partial embodiment of this application. It should be noted that for those skilled in the art, several improvements and modifications can be made without departing from the principle of this application, and these improvements and modifications should also be considered within the scope of protection of this application.
[0163] In summary, this application can quickly and accurately extract more effective line features and fuse them with point features and IMU pre-integration data to obtain globally consistent camera pose data. This effectively avoids the problem of reduced positioning accuracy of the orchard robot's binocular camera in the orchard due to the uneven terrain, greatly improving the quality of image detection and laying a solid foundation for intelligent agriculture.
Claims
1. A binocular vision-inertial positioning method for an orchard robot, characterized in that it includes:
Acquiring orchard images from the left or right view of a binocular camera, determining corresponding point feature matching pairs for the orchard images based on a preset point feature detection algorithm, and extracting line features from the orchard images based on the line segment detector (LSD) method to determine the corresponding line features of the orchard images;
Filtering the line features by length based on a preset line feature length filtering algorithm to determine line features that meet a preset length threshold, and, based on those line features, using a preset line feature matching filtering algorithm to match line features between the preceding and following frames of the orchard image of the left or right view, so as to determine line feature matching pairs;
Extracting keyframes of the orchard image based on the point feature matching pairs and line feature matching pairs to determine the keyframes in the orchard image;
Triangulating the line features of the keyframes to construct line feature reprojection residuals, which includes: triangulating the line features of the keyframe to obtain the line feature spatial coordinates and transforming them into the camera coordinate system; projecting the line in the camera coordinate system onto the image plane and calculating the distance between the projected line $l$ and the endpoints of the matching line segment, where the line $L_w$ is transformed from the world coordinate system to the camera coordinate system as follows:
$$L_c = \begin{bmatrix} n_c \\ d_c \end{bmatrix} = \begin{bmatrix} R_{cw} & [t_{cw}]_\times R_{cw} \\ 0 & R_{cw} \end{bmatrix} \begin{bmatrix} n_w \\ d_w \end{bmatrix},$$
where $L_c$ is the Plücker coordinates of $L_w$ in $\Pi_c$, $R_{cw}$ is the rotation matrix, and $t_{cw}$ is the translation vector; the projected line $l$, obtained by projecting $L_c$ from the camera coordinate system onto the image plane, is:
$$l = [l_1, l_2, l_3]^T = K n_c,$$
where $K$ is the projection matrix of the line feature; the reprojection residual of the line feature is calculated as follows:
$$e_{line} = d(m, l) = \frac{m^T l}{\sqrt{l_1^2 + l_2^2}},$$
where $m$ is the homogeneous coordinate of the midpoint of the line feature;
Fusing the cost function of the line feature reprojection residual with those of the prior residual, IMU residual, and point feature reprojection residual and minimizing it, thereby completing the binocular vision inertial localization of the orchard robot, which includes: determining the reprojection residuals of the line features, and adding the line feature information to the sliding window state variables as follows:
$$\chi = [x_n, x_{n+1}, \ldots, x_{n+N}, \lambda_p, \lambda_{p+1}, \ldots, \lambda_{p+P}, O_l, O_{l+1}, \ldots, O_{l+L}],$$
where $x$, $\lambda$, and $O$ are the information of the IMU, point features, and line features, respectively; $n$, $p$, and $l$ are the indices of the IMU, point features, and line features; and $N$, $P$, and $L$ are the numbers of IMU states, point features, and line features observed within the sliding window; and, by minimizing the cost function over all information, optimizing the state variables in the sliding window as follows:
$$\min_{\chi} \left( \|e_{prior}\|^2 + \sum \|e_{imu}\|^2 + \sum \|e_{point}\|^2 + \sum \|e_{line}\|^2 \right),$$
where $e_{prior}$ is the prior information, $e_{imu}$ is the IMU measurement residual, $e_{point}$ is the point feature reprojection residual, and $e_{line}$ is the line feature reprojection residual.
2. The orchard robot binocular vision inertial positioning method according to claim 1, characterized in that the step of extracting line features from the orchard image based on the line segment detector (LSD) method and determining the corresponding line features of the orchard image includes: grouping pixels with similar gradient angles into line segment support regions; each time a new pixel is detected, adding it to the support region; and iteratively updating the direction of the support region to extract the line features of the orchard image.
3. The orchard robot binocular vision inertial positioning method according to claim 1, characterized in that the step of using a preset line feature length filtering algorithm to filter the line features to determine line features that meet a preset length threshold includes: the line feature length filtering algorithm is as follows: , where $L_{min}$ represents the length threshold, $\mu$ is the scaling factor, $\lfloor \cdot \rfloor$ denotes rounding down, and the two remaining quantities represent the width and height of the input image resolution, respectively.
4. The orchard robot binocular vision inertial positioning method according to claim 1, characterized in that the step of using a preset line feature matching filtering algorithm, based on the line features that meet the preset length threshold, to match line features between the preceding and following frames of the orchard image of the left or right view to determine line feature matching pairs includes:
For the same line segment feature in two consecutive image frames of the orchard image, calculating the similarity of the line feature matching pair from its descriptor in the current image frame and its descriptor in the previous image frame using the Hamming distance formula;
Detecting whether the similarity of the line feature matching pair is lower than a preset similarity threshold and, if so, removing the line features whose similarity is lower than the preset similarity threshold;
Calculating the vector connecting the starting points of the line feature in the current frame and the previous frame, and the vector connecting the endpoints of the line feature in the current frame and the previous frame;
Calculating the squared modulus of each of the two vectors and determining whether it exceeds a preset threshold; if the preset threshold is exceeded, removing the line feature matching pairs corresponding to the vectors exceeding the preset threshold;
Performing line feature matching on the preceding and following frames of the orchard image of the left or right view to determine the line feature matching pairs.
5. The orchard robot binocular vision inertial positioning method according to claim 1, characterized in that the step of extracting keyframes from the orchard image based on point feature matching pairs and line feature matching pairs to determine the keyframes in the orchard image includes: the numbers of matched point features and line features are denoted $\varepsilon_p$ and $\varepsilon_l$, respectively, and the feature ratio is calculated as follows: , where $\varepsilon_l$ and $\varepsilon_p$ represent the number of matched line features and point features, respectively, $F_{total}$ is the total number of features, and $\lfloor \cdot \rfloor$ denotes rounding down; the average disparity of the detected line features is calculated as follows: , where $du$ and $dv$ represent the differences of the features in the horizontal and vertical directions between the two image frames, respectively, and $LF_{total}$ is the number of line features involved in the disparity calculation.
6. A binocular vision inertial positioning device for an orchard robot, characterized in that it includes:
A line feature determination module, configured to acquire orchard images from the left or right view of a binocular camera, determine corresponding point feature matching pairs of the orchard images according to a preset point feature detection algorithm, and extract line features from the orchard images based on the line segment detector (LSD) method to determine the corresponding line features of the orchard images;
A line feature matching pair determination module, configured to filter the line features by length based on a preset line feature length filtering algorithm to determine line features that meet a preset length threshold, and, based on those line features, to match line features between the preceding and following frames of the orchard image of the left or right view using a preset line feature matching filtering algorithm, so as to determine line feature matching pairs;
A keyframe determination module, configured to extract keyframes from the orchard image based on the point feature matching pairs and line feature matching pairs, so as to determine the keyframes in the orchard image;
An inertial positioning module, configured to triangulate the line features of the keyframe to construct a line feature reprojection residual, which includes: triangulating the line features of the keyframe to obtain the line feature spatial coordinates and transforming them into the camera coordinate system; projecting the line in the camera coordinate system onto the image plane and calculating the distance between the projected line $l$ and the endpoints of the matching line segment, where the line $L_w$ is transformed from the world coordinate system to the camera coordinate system as follows:
$$L_c = \begin{bmatrix} n_c \\ d_c \end{bmatrix} = \begin{bmatrix} R_{cw} & [t_{cw}]_\times R_{cw} \\ 0 & R_{cw} \end{bmatrix} \begin{bmatrix} n_w \\ d_w \end{bmatrix},$$
where $L_c$ is the Plücker coordinates of $L_w$ in $\Pi_c$, $R_{cw}$ is the rotation matrix, and $t_{cw}$ is the translation vector; the projected line $l$, obtained by projecting $L_c$ from the camera coordinate system onto the image plane, is:
$$l = [l_1, l_2, l_3]^T = K n_c,$$
where $K$ is the projection matrix of the line feature; the reprojection residual of the line feature is calculated as follows:
$$e_{line} = d(m, l) = \frac{m^T l}{\sqrt{l_1^2 + l_2^2}},$$
where $m$ is the homogeneous coordinate of the midpoint of the line feature;
The inertial positioning module is further configured to fuse the cost function of the line feature reprojection residual with those of the prior residual, IMU residual, and point feature reprojection residual and minimize it, thereby completing the binocular vision inertial localization of the orchard robot, which includes: determining the reprojection residuals of the line features, and adding the line feature information to the sliding window state variables as follows:
$$\chi = [x_n, x_{n+1}, \ldots, x_{n+N}, \lambda_p, \lambda_{p+1}, \ldots, \lambda_{p+P}, O_l, O_{l+1}, \ldots, O_{l+L}],$$
where $x$, $\lambda$, and $O$ are the information of the IMU, point features, and line features, respectively; $n$, $p$, and $l$ are the indices of the IMU, point features, and line features; and $N$, $P$, and $L$ are the numbers of IMU states, point features, and line features observed within the sliding window; and, by minimizing the cost function over all information, optimizing the state variables in the sliding window as follows:
$$\min_{\chi} \left( \|e_{prior}\|^2 + \sum \|e_{imu}\|^2 + \sum \|e_{point}\|^2 + \sum \|e_{line}\|^2 \right),$$
where $e_{prior}$ is the prior information, $e_{imu}$ is the IMU measurement residual, $e_{point}$ is the point feature reprojection residual, and $e_{line}$ is the line feature reprojection residual.
7. An electronic device comprising a central processing unit and a memory, characterized in that, The central processing unit is used to invoke and run a computer program stored in the memory to perform the steps of the method as described in any one of claims 1 to 5.
8. A computer-readable storage medium, characterized in that, It stores, in the form of computer-readable instructions, a computer program implemented according to any one of claims 1 to 5, which, when invoked by a computer, executes the steps included in the corresponding method.