A fruit recognition method of an intelligent picking robot in a dense planting environment
By acquiring fruit images and depth information in real time, and combining deep convolutional neural networks and hierarchical clustering algorithms, the fruit recognition and picking order are optimized, solving the problem of inaccurate recognition by intelligent picking robots in densely planted environments, and achieving efficient and accurate fruit picking.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- FOSHAN INTERSTELLAR CLOUD DIGITAL TECH CO LTD
- Filing Date
- 2025-11-19
- Publication Date
- 2026-06-12
AI Technical Summary
In densely planted environments, intelligent harvesting robots struggle to accurately identify and distinguish fruits at different depths, leading to the accidental picking of nearby, less mature fruits or the omission of distant, more mature fruits. Furthermore, they are unable to effectively plan harvesting paths, impacting harvesting efficiency and potentially causing fruit damage or equipment malfunction.
By acquiring fruit images and reference depth in real time, fruit regions are divided, the saliency and spatial label of each fruit are obtained, the true saliency is adjusted, and ResNet deep convolutional neural network and Mask R-CNN model are used for feature extraction and segmentation. Hierarchical clustering algorithm is combined to optimize the picking order and update the picking target in real time until the saliency is less than the threshold to stop picking.
This improves the accuracy and efficiency of fruit picking, reduces the frequency of robotic arm movement, avoids path switching and collisions, and ensures efficient and precise picking.

Figure CN121569664B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of intelligent fruit harvesting technology, specifically to a fruit recognition method for an intelligent harvesting robot in a densely planted environment.
Background Technology
[0002] Intelligent harvesting robots are a key development direction in the field of agricultural automation. Their core task is to autonomously identify, locate, and harvest fruits through a vision system.
[0003] Most current mainstream fruit recognition technologies build their algorithm models on two-dimensional image features such as color and texture. Saliency algorithms are commonly used for initial fruit screening because they highlight foreground targets through color contrast and texture differences. However, in densely planted, multi-layered canopies, the contrast between the background and fruits at different depths varies greatly. For example, fruits closer to the camera occupy a larger pixel area and show richer detail and texture, which easily produces an overly strong saliency response. More distant fruits are smaller and less distinct, so their saliency response is greatly weakened. As a result, the robot may mistakenly prioritize nearby, less mature fruits while missing distant, more mature ones. In addition, locating fruits from two-dimensional images alone cannot distinguish their layered relationship in three-dimensional space (for example, visible fruits on the surface versus fruits hidden in deeper layers). The robotic arm may therefore try to pick deeply hidden fruits first. This lowers picking efficiency and may also damage fruit or equipment through collisions with branches and leaves, so fruit picking becomes inaccurate and inefficient.
Summary of the Invention
[0004] To address the technical problem of inaccurate fruit recognition by intelligent harvesting robots, the present invention aims to provide a fruit recognition method for intelligent harvesting robots in densely planted environments. The specific technical solution adopted is as follows:
[0005] This invention provides a method for fruit recognition by an intelligent harvesting robot in a densely planted environment, the method comprising the following steps:
[0006] During a single fruit-picking process by an intelligent harvesting robot, real-time images of the fruit and the reference depth of each fruit in the image are acquired.
[0007] The fruits in the initial fruit image are divided into fruit regions according to their positional distribution. The harvesting degree of each fruit region is obtained from the saliency, degree of aggregation, and total number of the fruits in that region.
[0008] A spatial label is obtained for each fruit from its reference depth and visibility. The true saliency of each fruit is then obtained from its saliency, its spatial label, and the harvesting degree of its fruit region in the initial fruit image, and this determines the target fruit for the first harvest.
[0009] The true saliency of the remaining fruits in the target fruit region and in its preset neighboring fruit regions is then adjusted. The adjustment depends on two things: the spatial label difference between each of these fruits and the target fruit, and the distance between the target fruit region and each preset neighboring fruit region. The adjusted true saliency determines the target fruit for the second harvest. This process is repeated in real time to obtain each subsequent target fruit. When the true saliency of every fruit falls below a preset saliency threshold, fruit harvesting stops.
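The real-time loop in [0009] can be sketched as follows. This is a minimal illustration, not the patented implementation. `plan_harvest` and the `adjust` callback are hypothetical names. `adjust` stands in for re-imaging the scene and applying the saliency adjustment after each pick.

```python
from typing import Callable, Dict, List

Saliency = Dict[str, float]


def plan_harvest(
    true_saliency: Saliency,
    threshold: float,
    adjust: Callable[[str, Saliency], Saliency],
) -> List[str]:
    """Return fruit ids in picking order (hypothetical helper)."""
    remaining = dict(true_saliency)
    order: List[str] = []
    while remaining:
        target = max(remaining, key=remaining.get)  # highest true saliency
        if remaining[target] < threshold:
            break  # every remaining fruit is below the threshold: stop picking
        order.append(target)
        del remaining[target]  # the picked fruit leaves the scene
        remaining = adjust(target, remaining)  # re-score the remaining fruits
    return order


# With an identity adjustment, fruits are simply picked in descending saliency.
print(plan_harvest({"a": 0.9, "b": 0.4, "c": 0.7}, 0.5, lambda t, s: s))  # ['a', 'c']
```

The stop test compares only the current maximum against the threshold. If the maximum is below it, every remaining fruit is below it too.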
[0010] Furthermore, the method for obtaining the degree of harvesting is as follows:
[0011] For any fruit region, the mean initial saliency of all fruits in that region is used as the fruit saliency analysis value of the region.
[0012] The ratio of the sum of the complete areas of all fruits in the fruit region to the area of the smallest circumscribed circle of the region is used as the fruit aggregation analysis value of the region.
[0013] The normalized product of three quantities is taken as the harvesting degree of the fruit region: the fruit saliency analysis value, the fruit aggregation analysis value, and the total number of fruits in the region.
[0014] Furthermore, the method for obtaining the initial salience is as follows:
[0015] For any fruit in the fruit image, the corresponding region image of the fruit is input into a ResNet deep convolutional neural network to obtain a high-level feature map.
[0016] The high-level feature map is processed by a saliency algorithm to obtain the saliency value of each pixel corresponding to the fruit.
[0017] The average saliency value of all pixels corresponding to the fruit is used as the initial saliency of the fruit.
[0018] Furthermore, the method for obtaining the complete area is as follows:
[0019] The Mask R-CNN model is used to extract the complete region of each fruit in the fruit image, from which its complete area is obtained.
[0020] Furthermore, the spatial label is obtained as follows:
[0021] Arrange the reference depths of each fruit in the initial fruit image in ascending order to obtain a depth sequence;
[0022] The depth sequence is evenly divided into a preset number of local sequences, which are numbered sequentially from left to right in ascending order. The number of each local sequence is used as the reference label of every fruit whose reference depth falls within that local sequence.
[0023] The ratio of the visible area of each fruit in the fruit image to its full area is used as the visibility of each fruit.
[0024] For any fruit, when the visibility of the fruit is greater than the first preset visibility threshold, the product of the visibility of the fruit and the reference label is used as the first label adjustment value of the fruit; the difference between the reference label of the fruit and the first label adjustment value is rounded up and used as the spatial label of the fruit.
[0025] When the visibility of the fruit is less than the second preset visibility threshold, the product of the visibility of the fruit and the reference label is used as the second label adjustment value of the fruit; the result of adding the reference label of the fruit and the second label adjustment value and rounding up is used as the spatial label of the fruit.
[0026] When the visibility of the fruit is greater than or equal to the second preset visibility threshold and less than or equal to the first preset visibility threshold, the reference label of the fruit is used as the spatial label.
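The labelling rules of [0021] to [0026] can be sketched in Python as follows. The function names and the threshold parameters `t_high` (first preset visibility threshold) and `t_low` (second preset visibility threshold) are hypothetical. "Evenly divided" is read here as an assumption: the sorted depth sequence is split into chunks of near-equal length, and the earlier chunks take any remainder.

```python
import math
from typing import Dict, List


def reference_labels(depths: Dict[str, float], k: int = 5) -> Dict[str, int]:
    """Sort fruits by reference depth and split the sequence into k near-equal chunks.

    Chunk i (1-based, shallowest first) assigns reference label i.
    """
    ordered: List[str] = sorted(depths, key=depths.get)
    base, extra = divmod(len(ordered), k)
    labels: Dict[str, int] = {}
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # assumed remainder handling
        for fruit in ordered[start:start + size]:
            labels[fruit] = i + 1
        start += size
    return labels


def spatial_label(ref_label: int, visibility: float, t_high: float, t_low: float) -> int:
    """Adjust a reference label by visibility, rounding up as in [0024]-[0025]."""
    if visibility > t_high:
        # well exposed: subtract visibility * label, moving the fruit toward the surface
        return math.ceil(ref_label - visibility * ref_label)
    if visibility < t_low:
        # heavily occluded: add visibility * label, pushing the fruit deeper
        return math.ceil(ref_label + visibility * ref_label)
    return ref_label  # in between: keep the reference label
```

For example, a fruit with reference label 5 and visibility 0.9 (above a hypothetical `t_high` of 0.8) gets spatial label ceil(5 - 4.5) = 1.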
[0027] Furthermore, the true saliency is obtained as follows:
[0028] For any fruit, three terms are summed: its initial saliency, a negatively correlated mapping of its spatial label, and the harvesting degree of its fruit region. The normalized sum is used as the true saliency of the fruit.
[0029] Furthermore, the method for obtaining the target fruit is as follows:
[0030] The fruit with the highest true saliency is taken as the target fruit.
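The following is a sketch of [0028] and [0030]. It assumes `exp(-label)` as the negatively correlated mapping and min-max normalization across fruits. The source fixes neither choice, and all names are hypothetical.

```python
import math
from typing import Dict


def true_saliency(initial: Dict[str, float], spatial: Dict[str, int],
                  region_of: Dict[str, str], degree: Dict[str, float]) -> Dict[str, float]:
    """Initial saliency + exp(-spatial label) + regional harvesting degree, min-max normalized."""
    raw = {f: initial[f] + math.exp(-spatial[f]) + degree[region_of[f]] for f in initial}
    lo, hi = min(raw.values()), max(raw.values())
    span = hi - lo
    # with a single fruit (or all sums equal) every fruit is treated as maximally salient
    return {f: (v - lo) / span if span > 0 else 1.0 for f, v in raw.items()}


def pick_target(saliency: Dict[str, float]) -> str:
    """The fruit with the highest true saliency becomes the target fruit."""
    return max(saliency, key=saliency.get)
```

A shallower spatial label (smaller number) adds more through `exp(-label)`, so surface fruits are favored, as [0079] intends.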
[0031] Furthermore, the adjusted true saliency is obtained as follows:
[0032] The fruit region containing the target fruit is defined as the target fruit region. The distance from the center point of the target fruit region to the center point of each other fruit region is obtained as the first distance. When the normalized first distance is less than a preset distance threshold, the corresponding fruit region is defined as a preset neighboring fruit region of the target fruit region.
[0033] The first distance between the target fruit region and each of its preset neighboring fruit regions is mapped through a negative correlation and normalized. The result is used as the degree of influence of that preset neighboring fruit region. The degree of influence of the target fruit region itself is 1.
[0034] Consider any fruit other than the target fruit that lies in the target fruit region or in one of its preset neighboring fruit regions. The difference between the spatial label of that fruit and the spatial label of the target fruit is taken as the first difference.
[0035] The product of the degree of influence of the fruit region containing the fruit and a negatively correlated mapping of the first difference is normalized. The result is used as the saliency adjustment weight of the fruit.
[0036] The product of the true saliency of the fruit and its saliency adjustment weight is used as the saliency adjustment value of the fruit.
[0037] The sum of the true saliency of the fruit and its saliency adjustment value is taken as the adjusted true saliency of the fruit.
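The adjustment in [0032] to [0037] might look like the sketch below. It rests on several assumptions:

- First distances are normalized by dividing by the largest one.
- `exp(-x)` serves as the negative-correlation mapping for both the distance and the absolute label difference.
- Because `influence * exp(-diff)` already lies in (0, 1], it is used directly as the normalized weight.

All names are hypothetical.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, ...]


def adjust_saliency(target: str, saliency: Dict[str, float], labels: Dict[str, int],
                    region_of: Dict[str, str], centers: Dict[str, Point],
                    dist_threshold: float) -> Dict[str, float]:
    t_region = region_of[target]
    # first distance: target-region centre to every other region centre
    first = {r: math.dist(centers[t_region], c) for r, c in centers.items() if r != t_region}
    d_max = max(first.values(), default=0.0)
    influence = {t_region: 1.0}  # the target fruit region has influence 1
    for r, d in first.items():
        d_norm = d / d_max if d_max > 0 else 0.0  # assumed normalization
        if d_norm < dist_threshold:  # preset neighboring fruit region
            influence[r] = math.exp(-d_norm)  # assumed negative-correlation mapping
    adjusted: Dict[str, float] = {}
    for f, s in saliency.items():
        if f == target or region_of[f] not in influence:
            adjusted[f] = s  # outside the affected neighborhood: unchanged
            continue
        first_diff = abs(labels[f] - labels[target])
        weight = influence[region_of[f]] * math.exp(-first_diff)  # lies in (0, 1]
        adjusted[f] = s + s * weight  # true saliency + adjustment value
    return adjusted
```

Only fruits in the target's own and neighboring regions are re-scored. This matches the stated aim of limiting computation to the part of the scene a pick actually changes.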
[0038] Furthermore, the method for obtaining the fruit region is as follows:
[0039] Based on the distance between fruits in the initial fruit image, the fruits in the initial fruit image are clustered using a hierarchical clustering algorithm to obtain fruit clusters;
[0040] The region corresponding to each fruit cluster is taken as the fruit region.
[0041] Furthermore, the method for obtaining the reference depth is as follows:
[0042] For any fruit in the fruit image, the average depth data of all pixels corresponding to that fruit is used as the reference depth of that fruit.
[0043] The present invention has the following beneficial effects:
[0044] The invention first divides the fruits into regions according to their positional distribution in the initial fruit image, so that fruits are processed region by region. This reduces how often the robotic arm moves and switches paths between regions, which lowers operating costs at the source. It also avoids the priority confusion caused by evaluating the saliency of each fruit independently. Next, the harvesting degree of each fruit region is obtained from the saliency, degree of aggregation, and total number of its fruits. This reflects the harvesting status of each region, helps determine the harvesting order accurately, and improves the operating efficiency of the intelligent harvesting robot. A spatial label is then obtained for each fruit from its reference depth and visibility, which stratifies the fruits by layer and further supports an accurate harvesting order. The true saliency of each fruit is obtained from its saliency, its spatial label, and the harvesting degree of its fruit region in the initial fruit image, and it reflects how strongly each fruit should be prioritized. The fruit with the highest true saliency is taken as the target fruit for the first harvest, which establishes an initial static target. Harvesting a fruit changes the visual structure and spatial topology of the planting scene. The true saliency of the remaining fruits in the target fruit region and its preset neighboring regions is therefore adjusted according to their spatial label differences from the target fruit and the distances between these regions. The adjusted true saliency determines the target fruit for the second harvest, and so on, so that target fruits are obtained in real time.
This approach captures the effect of scene changes on true saliency while limiting the computation to the neighborhood of the target fruit, which keeps the method real-time. Harvesting stops when the true saliency of all fruits falls below a preset threshold, which improves both the accuracy and the efficiency of fruit harvesting.
Attached Figure Description
[0045] To more clearly illustrate the technical solutions and advantages in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0046] Figure 1 is a schematic flowchart of a fruit recognition method for an intelligent harvesting robot in a densely planted environment, provided in one embodiment of the present invention;
[0047] Figure 2 is a flowchart of a method for obtaining the harvesting degree, provided in one embodiment of the present invention;
[0048] Figure 3 is a structural diagram of a fruit recognition system for an intelligent harvesting robot in a densely planted environment, provided in one embodiment of the present invention;
[0049] Figure 4 is a schematic diagram of a computer device provided in one embodiment of the present invention.
Detailed Implementation
[0050] To further illustrate the technical means and effects adopted by the present invention to achieve its intended purpose, the following, in conjunction with the accompanying drawings and preferred embodiments, details the specific implementation, structure, features, and effects of a fruit identification method for an intelligent harvesting robot in a densely planted environment proposed according to the present invention. In the following description, different "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, specific features, structures, or characteristics in one or more embodiments can be combined in any suitable form.
[0051] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains.
[0052] The following description, in conjunction with the accompanying drawings, details the specific scheme of a fruit recognition method for an intelligent harvesting robot in a densely planted environment provided by the present invention.
[0053] Example 1:
[0054] This invention proposes a fruit recognition method for intelligent harvesting robots in densely planted environments. Please refer to Figure 1, which shows a schematic flowchart of a fruit recognition method for an intelligent harvesting robot in a densely planted environment according to an embodiment of the present invention. The method includes the following steps:
[0055] Step S1: During a single fruit picking process by the intelligent picking robot, real-time images of the fruit and the reference depth of each fruit in the image are acquired.
[0056] Specifically, in densely planted environments, the fruit recognition of intelligent harvesting robots is the core technology for achieving automated fruit harvesting, which requires the hardware configuration of the intelligent harvesting robot system and specific image processing logic. The hardware components of the intelligent harvesting robot include a walking platform, drive wheels, a robotic arm, an end effector, a fruit basket, an industrial camera, an industrial computer, and a display. The walking platform is the basic load-bearing component of the system, supporting the robotic arm, fruit basket, and industrial computer, providing stable support for the overall equipment. The drive wheels are installed at the bottom of the walking platform, responsible for the movement and navigation of the intelligent harvesting robot, ensuring that it can move autonomously between planting rows and adapt to the spatial layout of dense planting. The robotic arm is installed on the walking platform, possessing multiple degrees of freedom to achieve flexible spatial movement, providing position adjustment capabilities for the end effector. The end effector is installed at the end of the robotic arm, directly performing the grasping and cutting operations of the fruit, and is the key execution component for completing the harvesting action. The fruit basket is installed on the walking platform to temporarily store the harvested fruit, preventing damage or loss and supporting continuous harvesting operations. The industrial camera is fixed to the main body of the intelligent harvesting robot for real-time acquisition of fruit images. The industrial computer and the display are both installed on the walking platform. The industrial computer is used to process the acquired image data, control the collaborative work of various hardware components, and display key operational information through the display.
[0057] Considering the dense distribution of fruits in densely planted environments, to ensure accurate fruit picking by the intelligent harvesting robot, this embodiment sets the robot to remain stationary during the picking process. For example, after the robotic arm has picked all the fruits at a certain collection point, the robot will move forward a preset distance to the next collection point and control the robotic arm to start a new round of fruit picking. The preset distance is set by the operator based on actual conditions, ensuring that no fruits are missed between adjacent collection points; however, it is not limited here. It is known that the picking control process for the intelligent harvesting robot is the same at each collection point. Therefore, this embodiment uses one collection point as an example for analysis, considering the entire fruit picking process at that point as one fruit picking process. It should be noted that all subsequent fruit picking processes refer to this single fruit picking process.
[0058] To ensure accurate and efficient fruit harvesting by the intelligent harvesting robot, this embodiment acquires fruit images in real time during the harvesting process and preprocesses these images, including denoising, contrast enhancement, and color space standardization. This eliminates random noise caused by light fluctuations and camera sensor interference, preventing noise from being mistakenly identified as fruit features. It also enhances the color and texture differences between the fruit and the background (e.g., the contrast between the vibrant color of ripe fruit and the green leaves), improving the accuracy of subsequent feature extraction. Simultaneously, the fruit images are uniformly converted to a standard color space (e.g., RGB standard space) to eliminate the impact of color deviations under different lighting conditions on the recognition results, ensuring the consistency of image features. It should be noted that all subsequent fruit images are preprocessed images; furthermore, this embodiment acquires a fruit image after each fruit is harvested. The preprocessing of the fruit images is well-known and will not be described in detail here.
[0059] Furthermore, the Mask R-CNN model is used to extract features and identify fruit regions in the fruit images. The Mask R-CNN model, through deep convolution and instance segmentation techniques, accurately captures fruit features in the fruit images, obtaining the local region and the complete region of each fruit within the image. The Mask R-CNN model, deep convolution, and instance segmentation techniques are all well-known technologies and will not be elaborated upon further.
[0060] Furthermore, depth data corresponding to each pixel in the fruit image is acquired using a depth sensor. Then, for any fruit in the image, the average depth data of all pixels corresponding to that fruit is used as the reference depth for that fruit. Thus, the reference depth for each fruit in the image is obtained. The smaller the reference depth, the closer the corresponding fruit is to the robotic arm, and the easier it is to harvest.
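The reference depth is a per-fruit average over the fruit's mask. The minimal sketch below assumes that the depth map is registered pixel-for-pixel with the fruit image and that the mask is a boolean grid from the instance segmentation. `reference_depth` is a hypothetical name.

```python
from typing import Sequence


def reference_depth(depth_map: Sequence[Sequence[float]],
                    mask: Sequence[Sequence[bool]]) -> float:
    """Mean depth over the pixels where the fruit mask is True."""
    values = [d for d_row, m_row in zip(depth_map, mask)
              for d, m in zip(d_row, m_row) if m]
    if not values:
        raise ValueError("empty fruit mask")
    return sum(values) / len(values)


print(reference_depth([[1.0, 2.0], [3.0, 4.0]], [[True, False], [False, True]]))  # 2.5
```

With NumPy arrays, the same value is `depth[mask].mean()`.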
[0061] Step S2: Divide the fruit regions based on the location distribution of the fruits in the initial fruit image, and obtain the harvesting degree of each fruit region according to the salience, clustering and total number of fruits in each fruit region.
[0062] Specifically, in actual orchard environments, fruits are not isolated but often grow in clusters among branches and leaves. This distribution characteristic places specific demands on the operational efficiency of intelligent harvesting robots. If the robot operates on individual fruits one by one, the robotic arm needs to frequently move and switch paths, significantly increasing time and energy costs. It also tends to lead to a dispersed distribution of fruit salience during harvesting, making it difficult to establish clear harvesting priorities and affecting operational planning. It should be noted that higher fruit salience indicates higher fruit maturity. To adapt to the actual scenario of clustered fruit growth in orchards and the continuous harvesting requirements of the robot, this embodiment first divides fruit regions based on the positional distribution of fruits in the initial fruit image. Essentially, it groups spatially adjacent and densely distributed fruits into the same region, ensuring relative consistency in harvesting priority assessment within the same region. This allows the robot to concentrate on harvesting multiple fruits within the same region after entering it, reducing the frequency of robotic arm movements and path switching between different regions, fundamentally reducing operational costs, and avoiding priority confusion caused by independently assessing individual salience.
[0063] When the fruits in a fruit region are highly ripe, densely clustered, and numerous, that region should be harvested first. Therefore, this embodiment determines the harvesting degree of each fruit region from the saliency, degree of aggregation, and total number of its fruits. The higher the harvesting degree, the earlier the fruits in that region should be harvested. This helps determine the harvesting order accurately and improves the operating efficiency of the intelligent harvesting robot.
[0064] Preferably, in one feasible embodiment of this method, the fruit region is obtained as follows: based on the Euclidean distance between the corresponding center points of the fruits in the initial fruit image, a hierarchical clustering algorithm is used to cluster the fruits in the initial fruit image to obtain fruit clusters; the region corresponding to each fruit cluster is taken as the fruit region. It should be noted that the center point corresponding to each fruit is essentially the centroid of each fruit, and the center point of the fruit is represented in three-dimensional coordinates, where the z-axis in the three-dimensional coordinates corresponds to the reference depth. The methods for obtaining the centroid, Euclidean distance, and hierarchical clustering algorithm are all well-known and will not be elaborated further.
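The sketch below shows the clustering step using single linkage and a distance cut-off `cut`. Both are assumptions, since the source names neither the linkage criterion nor the stopping rule. In addition, x and y are pixel coordinates while z is a depth, so the axes may need rescaling to comparable units before clustering.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]  # (x, y, reference depth)


def cluster_fruits(centroids: Dict[str, Point], cut: float) -> List[List[str]]:
    """Single-linkage agglomerative clustering, stopped at distance `cut`."""
    clusters: List[List[str]] = [[f] for f in centroids]

    def linkage(a: List[str], b: List[str]) -> float:
        # single linkage: distance between the two closest members
        return min(math.dist(centroids[p], centroids[q]) for p in a for q in b)

    while len(clusters) > 1:
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > cut:
            break  # the closest clusters are too far apart: stop merging
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters
```

This loop is cubic in the number of fruits. `scipy.cluster.hierarchy.linkage(points, method="single")` followed by `fcluster(Z, t=cut, criterion="distance")` should yield the same flat clusters more efficiently.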
[0065] At this point, each fruit region in the initial fruit image has been obtained.
[0066] Preferably, in one feasible embodiment, the method for obtaining the harvesting degree is shown in Figure 2, which is a flowchart of the method for obtaining the harvesting degree provided in this embodiment. The method includes the following steps:
[0067] Step S201: For any fruit region, take the mean of the initial salience of all fruits in the fruit region as the salience analysis value of the fruit region.
[0068] The higher the fruit saliency analysis value, the more mature the fruits in that region, and the more urgently they need to be harvested.
[0069] In one possible implementation of this embodiment, the initial saliency is obtained as follows. For any fruit in the fruit image, the image of the region corresponding to that fruit is input into a ResNet deep convolutional neural network. A high-level feature map carrying semantic information is extracted through multiple layers of convolution, pooling, and residual connections. The high-level feature map has size H × W × C, where H is its height, W is its width, and C is its number of feature channels. Each channel captures key visual features of the fruit along a different dimension (such as the color of ripe fruit or the texture of its peel), providing the data basis for the later selection of key features. The high-level feature map is then processed by a saliency algorithm that generates channel importance weights in two steps: spatial information compression and nonlinear learning. Spatial information compression uses global average pooling to compress the spatial pixel information of each feature channel into a single value, which preserves the differences between channels while eliminating spatial redundancy. Nonlinear learning feeds the compressed channel features into a fully connected layer with a ReLU activation function and learns, through this nonlinear transformation, how much each channel contributes to the fruit saliency judgment. It finally outputs a set of values ranging from... as the importance weights of the channels. The higher a channel's importance weight, the stronger the influence of that channel's features on the fruit's saliency. For example, the red channel of a mature fruit receives a higher weight than a green-leaf channel.
Then, the channel importance weights are used to weight each channel of the high-level feature map, strengthening key features and suppressing irrelevant ones.
[0070] Further, the channel-weighted high-level feature map is averaged directly along the channel dimension. For the C weighted feature channels (each of size H × W), the pixel values at the same spatial location are summed and averaged, giving a feature map with spatial dimensions only. This produces an initial low-resolution saliency response map that reflects the saliency of different locations on the fruit. The low-resolution response map is then enlarged to the size of the original fruit region image by bilinear upsampling (a smooth interpolation that avoids pixel distortion), so that each saliency value corresponds to the actual spatial location on the fruit. Finally, the pixel values (saliency values) of the enlarged response map are normalized to a fixed interval. This produces a high-resolution fruit heatmap in which each saliency value represents the visual saliency of its location. The mean of all saliency values in the heatmap is used as the initial saliency of the fruit. The closer the initial saliency is to 1, the higher the overall visual saliency of the fruit and the higher its maturity. The ResNet deep convolutional neural network, saliency algorithm, global average pooling, ReLU activation function, and bilinear upsampling are all well-known techniques and will not be elaborated further.
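The channel-weighting pipeline of [0069] and [0070] resembles a squeeze-and-excitation block. The sketch below works on plain nested lists and assumes the ResNet feature map is already available. It also makes the following assumptions:

- Two hypothetical fully connected weight matrices `w1` and `w2` are applied as FC, ReLU, FC, then a sigmoid, as in squeeze-and-excitation. The source elides the exact output range.
- Upsampling uses align-corners bilinear interpolation.
- The heatmap is min-max normalized to [0, 1].

```python
import math
from typing import List

Grid = List[List[float]]


def _mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)


def channel_weights(fmap: List[Grid], w1: Grid, w2: Grid) -> List[float]:
    # spatial compression: global average pooling gives one value per channel
    z = [_mean([v for row in ch for v in row]) for ch in fmap]
    # nonlinear learning: FC + ReLU, then FC + sigmoid (assumed, as in SE blocks)
    h = [max(0.0, sum(w * x for w, x in zip(row, z))) for row in w1]
    return [1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(row, h)))) for row in w2]


def bilinear_resize(grid: Grid, out_h: int, out_w: int) -> Grid:
    # align-corners bilinear interpolation (assumed convention)
    in_h, in_w = len(grid), len(grid[0])
    out: Grid = []
    for i in range(out_h):
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        fy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            fx = x - x0
            top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
            bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def initial_saliency(fmap: List[Grid], w1: Grid, w2: Grid, out_h: int, out_w: int) -> float:
    weights = channel_weights(fmap, w1, w2)
    h, w = len(fmap[0]), len(fmap[0][0])
    # weight every channel, then average along the channel dimension
    response = [[_mean([weights[c] * fmap[c][y][x] for c in range(len(fmap))])
                 for x in range(w)] for y in range(h)]
    flat = [v for row in bilinear_resize(response, out_h, out_w) for v in row]
    lo, hi = min(flat), max(flat)
    heat = [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in flat]  # min-max (assumed)
    return _mean(heat)  # initial saliency = mean of the heatmap
```

In a real system, `w1` and `w2` would be learned jointly with the network rather than supplied by hand.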
[0071] Step S202: The ratio of the sum of the complete areas of all fruits in the fruit region to the area of the smallest circumscribed circle of the fruit region is used as the fruit aggregation analysis value of the fruit region.
[0072] A higher fruit aggregation analysis value means the following: the fruits in the region are more densely distributed, the robotic arm needs a smaller range of movement within the region, continuous harvesting is more convenient, and the positive contribution to harvesting is stronger.
[0073] It should be noted that the complete area of the fruit is the area corresponding to the complete region of each fruit in step S1. In this embodiment, the area of the smallest circumscribed circle of the complete region of each fruit is taken as the complete area of each fruit.
[0074] The method for obtaining the minimum circumcircle is a well-known technique and will not be elaborated further.
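The aggregation analysis value can be sketched with a brute-force minimum enclosing circle. This approach is exact but O(n^4), so it is only suitable for small point sets such as contour or hull points. Treating the region's point set as the union of its fruits' points is an assumption, and the function names are hypothetical. Following [0073], each fruit's complete area is taken as the area of its own minimum enclosing circle.

```python
import itertools
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Circle = Tuple[float, float, float]  # (cx, cy, r)
EPS = 1e-9


def _circle_from_two(a: Point, b: Point) -> Circle:
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2, math.dist(a, b) / 2)


def _circle_from_three(a: Point, b: Point, c: Point) -> Optional[Circle]:
    d = 2 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    if abs(d) < EPS:
        return None  # collinear points: no finite circumcircle
    sa, sb, sc = a[0] ** 2 + a[1] ** 2, b[0] ** 2 + b[1] ** 2, c[0] ** 2 + c[1] ** 2
    ux = (sa * (b[1] - c[1]) + sb * (c[1] - a[1]) + sc * (a[1] - b[1])) / d
    uy = (sa * (c[0] - b[0]) + sb * (a[0] - c[0]) + sc * (b[0] - a[0])) / d
    return (ux, uy, math.dist((ux, uy), a))


def min_enclosing_circle_area(points: Sequence[Point]) -> float:
    """The minimal circle is fixed by 2 or 3 boundary points, so try them all."""
    pts = list(points)
    if len(pts) < 2:
        return 0.0
    candidates: List[Circle] = [_circle_from_two(a, b) for a, b in itertools.combinations(pts, 2)]
    for trio in itertools.combinations(pts, 3):
        circle = _circle_from_three(*trio)
        if circle is not None:
            candidates.append(circle)

    def covers(c: Circle) -> bool:
        return all(math.dist((c[0], c[1]), p) <= c[2] + EPS for p in pts)

    r = min(c[2] for c in candidates if covers(c))
    return math.pi * r * r


def aggregation_value(fruits: Sequence[Sequence[Point]]) -> float:
    """Sum of per-fruit complete areas divided by the region's enclosing-circle area."""
    complete = sum(min_enclosing_circle_area(f) for f in fruits)
    region = [p for f in fruits for p in f]  # assumed: region = union of fruit points
    return complete / min_enclosing_circle_area(region)
```

For real masks, OpenCV's `cv2.minEnclosingCircle` returns the center and radius directly.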
[0075] Step S203: The normalized result of the product of the fruit saliency analysis value, the fruit aggregation analysis value, and the total number of fruits in the fruit region is taken as the harvesting degree of the fruit region.
[0076] Larger fruit saliency and aggregation analysis values indicate that the fruit region is more worth harvesting. In addition, the more fruits a region contains, the more fruits the intelligent harvesting robot can pick continuously within it, which gives higher operating efficiency and a stronger positive contribution to harvesting. Therefore, this embodiment multiplies the fruit saliency analysis value, the fruit aggregation analysis value, and the total number of fruits in the region, normalizes the product with a norm normalization function, and uses the result as the harvesting degree of that region.
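The following is a sketch of Step S201 to Step S203. It assumes the norm normalization divides each region's product by the largest one, which is the L∞ norm. An L2 norm would be an equally plausible reading. Names are hypothetical.

```python
from typing import Dict, List


def harvesting_degree(saliencies: Dict[str, List[float]],
                      aggregation: Dict[str, float]) -> Dict[str, float]:
    raw: Dict[str, float] = {}
    for region, values in saliencies.items():
        saliency_value = sum(values) / len(values)  # fruit saliency analysis value
        raw[region] = saliency_value * aggregation[region] * len(values)
    top = max(raw.values())
    # assumed L-infinity norm normalization: the best region scores 1.0
    return {r: v / top if top > 0 else 0.0 for r, v in raw.items()}
```

Because the mean saliency is multiplied by the fruit count, the product equals the summed saliency scaled by the aggregation value. Regions with many salient, tightly packed fruits therefore rank highest.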
[0077] At this point, the harvesting degree of each fruit region in the initial fruit image has been obtained.
[0078] Step S3: Obtain the spatial label of each fruit from its reference depth and visibility. Obtain the true saliency of each fruit from its saliency, its spatial label, and the harvesting degree of its fruit region in the initial fruit image, and determine the target fruit for the first harvest.
[0079] Specifically, in densely planted environments, the spatial distribution of fruits exhibits a significant layered structure, displaying a tiered arrangement from the outside in and from shallow to deep (for example, surface fruits are close to the outer perimeter of the plant, while deeper fruits are obscured by surface fruits or branches and leaves). This distribution characteristic determines that, due to physical space limitations, intelligent harvesting robots will inevitably prioritize contacting surface fruits that are unobstructed or less obstructed during the harvesting process. Only after the surface fruits are harvested and removed can they gradually contact the previously obscured deeper fruits. To make the fruit harvesting priority judgment more in line with this actual operational pattern, this embodiment first obtains the spatial label of each fruit based on its reference depth and visibility, essentially stratifying the fruits. The smaller the spatial label, the closer the corresponding fruit is to the surface.
[0080] To determine the picking order accurately and improve the picking efficiency of the intelligent picking robot, this embodiment obtains the true salience of each fruit from its initial salience, its spatial label, and the harvesting degree of its fruit region in the initial fruit image. The greater the true salience, the higher the picking priority of the corresponding fruit. This determines the target fruit for the first harvest in the current fruit picking process.
[0081] Preferably, in one feasible method of this embodiment, the spatial label is obtained as follows. The reference depths of the fruits in the initial fruit image are arranged in ascending order to obtain a depth sequence. The depth sequence is evenly divided into a preset number of local sequences, which are numbered sequentially from left to right in ascending order. The number of each local sequence is used as the reference label of the fruits whose reference depths fall within that local sequence. In this embodiment, the preset number is set to 5, so the numbers are 1, 2, 3, 4, and 5. The implementer can set the preset number and the number of labels according to the actual situation, which is not limited here. The smaller the reference label, the closer the fruit is to the surface and the easier it is for the robotic arm to harvest.
[0082] Reference labels alone do not reflect how visible a fruit actually is. If a fruit has a large reference label but is barely obstructed by other fruits, it is easy to pick, and its label should be decreased. Conversely, if a fruit has a small reference label but is severely obstructed by other fruits, it is difficult to pick, and its label should be increased. Therefore, this embodiment takes the ratio of the visible area of each fruit in the fruit image to its full area as the visibility of that fruit. The visible area is the number of pixels in the fruit's region in the fruit image.
[0083] For any fruit, when its visibility is greater than a first preset visibility threshold, the product of its visibility and its reference label is used as its first label adjustment value. The difference between its reference label and the first label adjustment value, rounded up, is then used as its spatial label. When its visibility is less than a second preset visibility threshold, the product of its visibility and its reference label is used as its second label adjustment value. The sum of its reference label and the second label adjustment value, rounded up, is then used as its spatial label. If this spatial label is greater than the maximum reference label, it is set to the maximum reference label. When its visibility is greater than or equal to the second preset visibility threshold and less than or equal to the first preset visibility threshold, its reference label is used as its spatial label. In this embodiment, the first preset visibility threshold is set to 0.8 and the second preset visibility threshold is set to 0.5. Implementers can set both thresholds according to actual conditions, which is not limited here.
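Paragraphs [0081] to [0083] can be sketched in Python as follows. The sketch assumes that "evenly divided" means splitting the sorted depths into groups of nearly equal size, with ties in depth broken arbitrarily. The source does not say what happens when a fully visible fruit (visibility 1) yields a spatial label of 0. This sketch clamps the result to 1 so that the reciprocal used later for the true salience stays defined; that clamp is an assumption. Function names are hypothetical.

```python
from __future__ import annotations

import math


def reference_labels(depths: list[float], num_bins: int = 5) -> list[int]:
    """Label 1 = shallowest group; sorted depths are split into num_bins near-equal groups."""
    n = len(depths)
    order = sorted(range(n), key=lambda i: depths[i])  # fruit indices by ascending depth
    labels = [0] * n
    for rank, idx in enumerate(order):
        labels[idx] = rank * num_bins // n + 1  # map rank 0..n-1 onto labels 1..num_bins
    return labels


def spatial_label(ref_label: int, visibility: float, max_label: int = 5,
                  high: float = 0.8, low: float = 0.5) -> int:
    """Adjust a reference label by visibility (visible area / full area)."""
    if visibility > high:
        # Mostly unoccluded: move the fruit toward the surface layer
        label = math.ceil(ref_label - visibility * ref_label)
    elif visibility < low:
        # Heavily occluded: move the fruit toward deeper layers
        label = math.ceil(ref_label + visibility * ref_label)
    else:
        label = ref_label
    # Cap at the deepest layer; clamp to 1 (assumption) so 1/label stays defined
    return max(1, min(label, max_label))
```

For example, a fruit with reference label 4 and visibility 0.4 gives ⌈4 + 1.6⌉ = 6, which the cap reduces to 5.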
[0084] At this point, the spatial label of each fruit in the initial fruit image is obtained. This prepares for the subsequent characterization of the true salience, i.e., the picking priority, of each fruit in the initial fruit image.
[0085] For any given fruit, the higher its initial salience, the smaller its spatial label, and the higher the harvesting degree of its fruit region, the higher its harvesting priority. Therefore, this embodiment takes the normalized sum of the fruit's initial salience, the negatively correlated mapping of its spatial label, and the harvesting degree of its fruit region as the true salience of the fruit. The reciprocal of the spatial label is used as the negatively correlated mapping, and the sum is normalized with the norm normalization function.
[0086] At this point, the true salience of each fruit in the initial fruit image is obtained.
[0087] Furthermore, the fruit corresponding to the highest true salience is selected as the target fruit for the first harvest in the current fruit picking process. It should be noted that if there are multiple fruits corresponding to the highest true salience in the initial fruit image, then any one of them is selected as the target fruit for the first harvest.
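A minimal sketch of the true salience and the first target selection from paragraphs [0085] to [0087] follows. Here `region_degree[i]` is assumed to hold the harvesting degree of the region containing fruit i. The normalization is again assumed to be division by the maximum value. Both the names and the normalization choice are illustrative assumptions.

```python
from __future__ import annotations


def true_saliences(initial: list[float], labels: list[int],
                   region_degree: list[float]) -> list[float]:
    """Normalized sum of initial salience, 1/spatial label and region harvesting degree."""
    raw = [s + 1.0 / lab + d for s, lab, d in zip(initial, labels, region_degree)]
    peak = max(raw)  # max-normalization is an assumption
    return [r / peak for r in raw]


def first_target(saliences: list[float]) -> int:
    """Index of the fruit with the highest true salience (the first one on ties)."""
    return max(range(len(saliences)), key=lambda i: saliences[i])


s = true_saliences([0.6, 0.9, 0.4], [1, 3, 2], [0.5, 1.0, 0.5])
target = first_target(s)
```

Returning the first of several tied fruits is one admissible choice, since the text allows any of them to be selected.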
[0088] Step S4: Adjust the true salience of the other fruits in the target fruit region and the preset neighboring fruit regions. Base the adjustment on the spatial label differences between the target fruit and each of these fruits, and on the distance between the target fruit region and each preset neighboring fruit region. Use the adjusted true salience to determine the target fruit for the second harvest. Continue in this way, determining each subsequent target fruit in real time, until the true salience of every fruit is less than the preset salience threshold, at which point fruit harvesting stops.
[0089] Specifically, in automated fruit harvesting operations, a single harvesting operation by a robotic arm directly alters the visual structure and spatial topology of the planting scene. When a target fruit is removed, fruits (or parts of the area) that were previously obscured become visible, causing a dynamic change in the accessibility of the remaining fruits. While a global recalculation strategy (recalculating the true salience of all remaining fruits after each harvest) can ensure the identification of the optimal mature fruit, it requires recalculating the entire process of feature extraction, hierarchical clustering, and true salience evaluation across all fruits, resulting in extremely high computational complexity and making it difficult to meet the real-time response requirements of harvesting operations. To balance the accuracy of true salience with the real-time nature of harvesting, this embodiment designs an incremental update mechanism based on fruit depth information. This means that the true salience is updated only for the local fruits affected by the harvesting action, rather than a global recalculation.
[0090] The incremental update fully reuses the computational results of the initial fruit image, relying on the completed hierarchical clustering results to retain the fruit region, spatial label, initial saliency, reference depth, and spatial adjacency relationships (such as inter-cluster distance and hierarchical association), without re-performing basic data calculations. After the harvesting action occurs, it only focuses on the local area where the occlusion relationship changes due to harvesting, analyzes the change pattern of the true saliency of the fruit in this area, and achieves dynamic prediction through local feature adjustments, significantly reducing the computational load. It is known that when the target fruit is harvested, its occlusion effect on surrounding fruits disappears; therefore, the areas most affected by the target fruit are the fruit region where the target fruit is located and its preset neighboring fruit regions.
[0091] Therefore, this embodiment adjusts the true salience of other fruits in the target fruit region and the preset neighboring fruit region based on the spatial label differences between the target fruit and each other fruit in the target fruit region and the preset neighboring fruit region, as well as the distance between the target fruit region and the preset neighboring fruit region. The adjusted true salience is then obtained, and the target fruit for the second harvest is determined. This allows for accurate capture of the impact of scene changes on the true salience while minimizing the calculation range, ensuring real-time performance, and fully adapting to the continuous operation requirements of automated harvesting.
[0092] Preferably, in one feasible method of this embodiment, the adjusted true salience is obtained as follows. The fruit region containing the target fruit is taken as the target fruit region. For each other fruit region, the Euclidean distance in the three-dimensional coordinate system between its center point (i.e., cluster center) and that of the target fruit region is obtained and taken as a first distance. When the normalized first distance is less than a preset distance threshold, the corresponding fruit region is taken as a preset neighboring fruit region of the target fruit region. In this embodiment, the preset distance threshold is set to 0.2; the implementer can set it according to the actual situation, which is not limited here. The first distance between the target fruit region and each preset neighboring fruit region is passed through a negatively correlated normalization. The result is taken as the degree of influence of that preset neighboring fruit region, and the degree of influence of the target fruit region itself is 1. In this embodiment, the negatively correlated normalization is an exponential function with the natural constant as its base and the negative of the first distance as its exponent. The greater the degree of influence, the more the target fruit affects the fruits in the corresponding preset neighboring fruit region. Likewise, the more the visibility of those fruits improves once the target fruit is harvested.
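The neighbor-region search and degree of influence in paragraph [0092] might look like the sketch below. The source does not specify how the first distance is normalized. This sketch assumes division by the largest first distance and feeds that normalized value into the exponential. Region centers are the 3D cluster centers, and `neighbor_influence` is a hypothetical name.

```python
from __future__ import annotations

import math


def neighbor_influence(centers: list[tuple[float, float, float]], target: int,
                       threshold: float = 0.2) -> dict[int, float]:
    """Map region index -> degree of influence for the target region and its neighbors."""
    dists = [math.dist(centers[target], c) for c in centers]  # first distances
    peak = max(dists) or 1.0  # assumed max-normalization; guards the single-region case
    influence = {target: 1.0}  # the target fruit region itself has influence 1
    for j, d in enumerate(dists):
        if j == target:
            continue
        d_norm = d / peak
        if d_norm < threshold:  # j is a preset neighboring fruit region
            influence[j] = math.exp(-d_norm)
    return influence


inf = neighbor_influence(
    [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (1.0, 0.0, 0.0), (0.15, 0.0, 0.0)], target=0
)
```

Regions outside the threshold are simply absent from the returned mapping, which keeps the later adjustment step restricted to the affected neighborhood.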
[0093] Furthermore, consider any fruit other than the target fruit in the target fruit region or its preset neighboring fruit regions. The absolute value of the difference between the spatial label of that fruit and the spatial label of the target fruit is taken as the first difference. The smaller the first difference, the more likely the fruit and the target fruit originally lay on the same line of sight. After the target fruit is picked and removed, the occlusion it caused disappears, and the salience of the fruit should increase markedly. Therefore, this embodiment takes the normalized product of the degree of influence of the fruit's region and the negatively correlated mapping of the first difference as the salience adjustment weight of the fruit. The negatively correlated mapping is an exponential function with the natural constant as its base and the negative of the first difference as its exponent, and the product is normalized with the norm normalization function. The product of the fruit's true salience and its salience adjustment weight is then taken as its salience adjustment value. The sum of the true salience and the salience adjustment value is taken as the adjusted true salience of the fruit. In this way, the adjusted true salience is obtained for every fruit other than the target fruit in the target fruit region and its preset neighboring fruit regions of the initial fruit image.
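The salience adjustment of paragraph [0093] can be sketched as below. It takes a degree-of-influence mapping keyed by region index as input. `region_of[i]`, the region index of fruit i, is a hypothetical input. Normalizing the raw weights by their maximum is an assumption standing in for the unspecified norm normalization.

```python
from __future__ import annotations

import math


def adjust_saliences(saliences: list[float], labels: list[int], region_of: list[int],
                     influence: dict[int, float], target: int) -> dict[int, float]:
    """Adjusted true salience of every affected fruit except the harvested target."""
    raw_weights: dict[int, float] = {}
    for i, region in enumerate(region_of):
        if i == target or region not in influence:
            continue  # skip the target itself and fruits outside the affected regions
        first_diff = abs(labels[i] - labels[target])
        raw_weights[i] = influence[region] * math.exp(-first_diff)
    if not raw_weights:
        return {}
    peak = max(raw_weights.values())  # assumed max-normalization of the weights
    # adjusted true salience = true salience + true salience * adjustment weight
    return {i: saliences[i] + saliences[i] * (w / peak) for i, w in raw_weights.items()}


adjusted = adjust_saliences([1.0, 0.5, 0.4, 0.3], [1, 1, 2, 3], [0, 0, 1, 2],
                            {0: 1.0, 1: 0.5}, target=0)
```

Fruit 3 lies in a region outside the influence mapping, so its true salience is left unchanged, matching the incremental-update idea of recomputing only locally affected fruits.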
[0094] Some fruits may have been completely obscured in the initial fruit image. To account for this, a second fruit image is acquired after the target fruit is harvested. Existing visual tracking technology is used to determine whether any new fruits appear in the second fruit image; if so, they are all taken as reference fruits. For any reference fruit, the Euclidean distance between its center point and the center point of each fruit region is calculated. The fruit region with the smallest distance is taken as the reference fruit region of that fruit, in preparation for obtaining its true salience. The true salience of each reference fruit is then obtained in the same way as for the fruits in the initial fruit image. Thus, the true salience of each reference fruit in the second fruit image is obtained.
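Assigning each newly revealed reference fruit to its nearest fruit region, as described in paragraph [0094], reduces to a nearest-center search. The sketch below assumes 3D center points, and `nearest_region` is a hypothetical helper.

```python
from __future__ import annotations

import math


def nearest_region(point: tuple[float, float, float],
                   centers: list[tuple[float, float, float]]) -> int:
    """Index of the fruit region whose center is closest to the reference fruit's center."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))


region = nearest_region((0.9, 0.1, 0.0), [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)])
```

The harvesting degree of the returned region would then serve as the region term when the reference fruit's true salience is computed.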
[0095] The following values are sorted together: the adjusted true salience of each fruit other than the target fruit in the target fruit region and its preset neighboring fruit regions, the true salience of the reference fruits, and the unadjusted true salience of the remaining fruits in the initial fruit image. The fruit with the highest true salience at this point is selected as the target fruit for the second harvest.
[0096] The target fruit for the third harvest is then determined on the basis of the second fruit image, which contains all the information of the initial fruit image. The process continues in this way, determining target fruits in real time until the true salience of every fruit is less than a preset salience threshold, at which point harvesting stops. In this embodiment, the preset salience threshold is set to 0.4. Implementers can adjust it according to actual circumstances; no limitation is imposed here.
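The re-ranking and stopping rule of paragraphs [0095] and [0096] can be sketched as follows. Salience values are kept in a dictionary keyed by a hypothetical fruit identifier. Adjusted values overwrite the previous ones, reference fruits are added, and harvested fruits are skipped. `None` signals the stop condition, which holds when every remaining value is below the 0.4 threshold. The function names are illustrative only.

```python
from __future__ import annotations


def merge_saliences(current: dict[int, float], adjusted: dict[int, float],
                    reference: dict[int, float]) -> dict[int, float]:
    """Combine unadjusted, locally adjusted and newly revealed fruits into one ranking pool."""
    merged = dict(current)
    merged.update(adjusted)   # locally adjusted fruits replace their previous values
    merged.update(reference)  # reference fruits from the new image are added
    return merged


def next_target(saliences: dict[int, float], harvested: set[int],
                threshold: float = 0.4) -> int | None:
    """Highest-salience unharvested fruit, or None when every value is below threshold."""
    remaining = {i: s for i, s in saliences.items() if i not in harvested}
    if not remaining or max(remaining.values()) < threshold:
        return None  # stop condition: harvesting ends
    return max(remaining, key=remaining.get)


pool = merge_saliences({0: 1.0, 1: 0.5, 2: 0.4, 3: 0.3}, {1: 1.0, 2: 0.47}, {4: 0.6})
second = next_target(pool, harvested={0})
```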
[0097] After the intelligent harvesting robot completes the visual recognition and spatial positioning of the target fruit, the system will activate the execution module. Through four coordinated operations—trajectory planning, robotic arm movement, fruit grasping, and transfer and collection—it will complete a single harvesting task in a densely planted environment. The entire process aims for collision-free operation, high efficiency, and low damage. The specific execution process is as follows:
[0098] (i) The motion planning algorithm generates an optimal collision-free trajectory.
[0099] Trajectory planning is a prerequisite for precise robotic arm operation. Its core task is to compute a safe and efficient movement path from the current state of the robotic arm and the position of the target fruit. The inputs to the motion planning algorithm are the current pose of the robotic arm and the three-dimensional coordinates of the target fruit. The pose includes the angle of each joint and the position and attitude of the end effector. The planned trajectory must satisfy two requirements simultaneously: it must be collision-free and optimal. Collision-free means that the trajectory avoids branches and leaves, other fruits, and the robot's own structure (such as the walking platform and fruit basket). Collision detection algorithms (such as bounding-box-based collision detection) are used to ensure path safety. Optimal means that the trajectory minimizes the movement distance and time of the robotic arm while reducing joint movement amplitude, energy consumption, and operational errors. The result is a continuous, smooth motion trajectory containing the target angle of each joint at each point in time and the attitude curve of the end effector. This trajectory provides precise control commands for the subsequent robotic arm movement.
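Axis-aligned bounding boxes (AABBs) are one common form of the bounding-box collision detection mentioned above. The sketch below is a minimal illustration under that assumption. It checks the boxes enclosing the arm links at each sampled waypoint against obstacle boxes, such as branches, other fruits, or the platform. It only tests discrete waypoints, so a real planner would also need to check the motion between them. All names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AABB:
    lo: tuple[float, float, float]  # minimum corner (x, y, z)
    hi: tuple[float, float, float]  # maximum corner (x, y, z)

    def overlaps(self, other: AABB) -> bool:
        # Two boxes intersect iff their intervals overlap on every axis
        return all(a_lo <= b_hi and b_lo <= a_hi
                   for a_lo, a_hi, b_lo, b_hi in zip(self.lo, self.hi, other.lo, other.hi))


def path_is_collision_free(link_boxes_per_waypoint: list[list[AABB]],
                           obstacles: list[AABB]) -> bool:
    """True if no arm-link box at any sampled waypoint touches an obstacle box."""
    return not any(box.overlaps(obs)
                   for boxes in link_boxes_per_waypoint
                   for box in boxes
                   for obs in obstacles)
```

Treating boxes that merely touch as colliding (the `<=` comparisons) is a deliberately conservative choice.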
[0100] (ii) Drive the robotic arm to move along the planned trajectory to the target position.
[0101] After receiving instructions from the trajectory planning module, the robotic arm control system drives the motors of each joint to execute actions. The system adopts a closed-loop control strategy, collecting position sensor data (such as encoder feedback) from each joint in real time, comparing it with the target position in the trajectory instructions, and dynamically adjusting the motor speed and torque to ensure that the deviation between the actual movement trajectory of the robotic arm and the planned trajectory is controlled within the allowable range. During the movement, the attitude of the end effector is adjusted synchronously (such as adjusting the gripping angle according to the fruit growth direction to ensure that the gripping surface is in contact with the fruit surface) to avoid subsequent gripping failure or fruit damage due to improper attitude. When the end effector reaches the preset gripping point (usually the optimal gripping position where the fruit and the stem are connected, which is pre-calibrated by the vision module), the system confirms the position signal through the position sensor, the robotic arm stops moving, and enters the gripping preparation state.
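The closed-loop principle in paragraph [0101] compares the encoder reading with the commanded position and corrects the motor command accordingly. It can be illustrated with a proportional controller for a single joint. This is a simplified sketch: it assumes an idealized joint that moves exactly by the commanded amount each control cycle, and it ignores dynamics, torque limits, and sensor noise. The gain and tolerance values are arbitrary, and `track_joint` is a hypothetical name.

```python
from __future__ import annotations


def track_joint(target: float, position: float, kp: float = 0.5,
                tolerance: float = 1e-3, max_steps: int = 1000) -> tuple[float, int]:
    """Drive one joint toward target with proportional feedback; return (position, cycles)."""
    for step in range(max_steps):
        error = target - position  # compare the encoder reading with the commanded angle
        if abs(error) <= tolerance:
            return position, step  # within the allowable deviation: stop correcting
        command = kp * error       # correction proportional to the remaining error
        position += command        # idealized joint response (assumption)
    return position, max_steps


final_position, cycles = track_joint(1.0, 0.0)
```

With a gain of 0.5, the error halves every cycle, so the joint settles within the tolerance after a bounded number of cycles.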
[0102] (iii) The end effector grasps the fruit based on its physical characteristics.
[0103] The gripping action of the end effector needs to be adapted to the physical characteristics of the fruit (such as hardness, size, and stem strength) to avoid damaging the fruit or causing stem breakage. Common operating methods are divided into shearing gripping and suction gripping. Shearing gripping is suitable for fruits with thick stems that need to be cut before picking (such as apples and pears). The end effector has a built-in small shearing mechanism (such as a micro-blade or clamping shears). When it reaches the gripping point, it first gently clamps the fruit with flexible claws (the inside of the claws is wrapped with soft material to prevent crushing the peel), and then the shearing mechanism is activated to cut the stem, ensuring that the fruit is completely removed from the plant. Suction gripping is suitable for fruits with soft peels that are not easy to clamp (such as tomatoes and strawberries) or fruits with thin stems that are easy to fall off. The end effector generates suction through a negative pressure generator, which smoothly adsorbs the surface of the fruit onto the gripping end (the suction force can be dynamically adjusted according to the weight of the fruit to avoid excessive suction that crushes the fruit or insufficient suction that causes it to fall). The gripping can be completed without contacting the stem.
[0104] (iv) The robotic arm transfers the fruit to the collection device to complete the harvesting.
[0105] After successfully grasping the fruit, the robotic arm switches to transfer mode and transports the fruit to the collection device (i.e., the fruit basket) along a preset path. The transfer path also needs to undergo collision detection to avoid obstructions along the way, and the path design needs to minimize the distance while preventing the fruit from falling due to shaking (e.g., using a low and stable moving speed). When the end effector moves to the designated release position above the fruit basket, it executes the release action according to the grasping method. The grippers of the shearing gripper release, and the negative pressure of the suction gripper is released, allowing the fruit to fall naturally into the fruit basket (some fruit baskets have built-in cushioning pads to reduce impact damage when the fruit falls). After the fruit is released, the robotic arm returns to the initial working position, and the system completes the status recording of a single picking task (such as the number of fruits picked and location information). Then, it can enter the next fruit recognition and picking cycle, adapting to the continuous operation needs in dense planting environments.
[0106] In summary, this embodiment acquires fruit images and reference depth in real time during fruit harvesting; divides fruit regions in the initial fruit image, and determines the harvesting degree based on the salience, clustering, and total number of fruits in each region; acquires spatial labels based on the reference depth and visibility of the fruits; determines the target fruit for the first harvest by obtaining the true salience based on the salience, spatial labels, and harvesting degree of the fruit region in the initial fruit image; determines the target fruit for the second harvest by obtaining the adjusted true salience of the affected fruits based on the impact of the target fruit harvest on other local fruits; and so on, acquiring the target fruit in real time until the stopping condition is met. This invention effectively improves the efficiency and accuracy of fruit harvesting by pre-identifying the target fruit.
[0107] Example 2:
[0108] This invention also proposes a fruit recognition system for an intelligent harvesting robot in a densely planted environment. Figure 3 is a structural diagram of this fruit recognition system according to an embodiment of the present invention. The system includes: a data acquisition module 10, a harvesting degree acquisition module 20, a static target fruit acquisition module 30, and a dynamic target fruit acquisition module 40.
[0109] The data acquisition module 10 is used to acquire fruit images and the reference depth of each fruit in the fruit image in real time during a single fruit picking process of the intelligent picking robot.
[0110] The harvesting degree acquisition module 20 is used to divide fruit regions based on the positional distribution of fruits in the initial fruit image, and to acquire the harvesting degree of each fruit region according to the salience, clustering and total number of fruits in each fruit region.
[0111] The static target fruit acquisition module 30 is used to acquire the spatial label of each fruit based on the reference depth and visibility of each fruit; and to acquire the true salience of each fruit based on the salience, spatial label and harvesting degree of the fruit area in the initial fruit image, and to determine the target fruit for the first harvest.
[0112] The dynamic target fruit acquisition module 40 is used to adjust the true significance of other fruits in the target fruit area and the preset neighboring fruit area based on the spatial label difference between the target fruit and each other fruit in the target fruit area and the preset neighboring fruit area, as well as the distance between the target fruit area and the preset neighboring fruit area. The adjusted true significance is then used to determine the target fruit for the second harvest. This process is repeated in real time to acquire the harvested target fruits until the true significance of all fruits is less than the preset significance threshold, at which point fruit harvesting stops.
[0113] It should be noted that the system provided in the above embodiments is only an example of the division of the above functional modules. In practical applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the computer device can be divided into different functional modules to complete all or part of the functions described above. In addition, the fruit recognition system of an intelligent harvesting robot in a dense planting environment and the fruit recognition method of an intelligent harvesting robot in a dense planting environment provided in the above embodiments belong to the same concept. The specific implementation process is detailed in the method embodiments and will not be repeated here.
[0114] Example 3:
[0115] This invention also proposes a fruit recognition device for an intelligent harvesting robot in a densely planted environment. The device includes a memory and a processor. The memory stores executable program code, and the processor calls and executes the executable program code to perform the fruit recognition method for an intelligent harvesting robot in a densely planted environment provided in the embodiments of this application. Specifically, the device may be a chip, component, or module. The chip may include a connected processor and memory; the memory stores instructions, and when the processor calls and executes the instructions, the chip can perform the fruit recognition method for an intelligent harvesting robot in a densely planted environment provided in the above embodiments.
[0116] Furthermore, this application also protects a computer device, as shown in Figure 4. The computer device includes a memory 401, a processor 402, and a computer program 403 stored in the memory 401 and executable on the processor 402. When the processor 402 executes the computer program 403, the computer device can execute any of the aforementioned fruit recognition methods for an intelligent harvesting robot in a densely planted environment.
[0117] Example 4:
[0118] This embodiment also provides a computer-readable storage medium storing computer program code. When the computer program code is run on a computer, the computer executes the above-mentioned method steps to realize the fruit recognition method of an intelligent harvesting robot in a dense planting environment provided in the above embodiment.
[0119] Example 5:
[0120] This embodiment also provides a computer program product. When the computer program product is run on a computer, it causes the computer to perform the above-mentioned related steps to realize the fruit recognition method of an intelligent harvesting robot in a dense planting environment provided in the above embodiment.
[0121] In this embodiment, the device, computer-readable storage medium, computer program product, and chip are all used to execute the corresponding methods provided above. Their beneficial effects are therefore the same as those of the corresponding methods provided above and will not be repeated here.
[0122] It should be noted that the order of the above embodiments of the present invention is merely for descriptive purposes and does not represent the superiority or inferiority of the embodiments. The processes depicted in the accompanying drawings do not necessarily require a specific or sequential order to achieve the desired result. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
[0123] The various embodiments in this specification are described in a progressive manner. The same or similar parts between the various embodiments can be referred to each other. Each embodiment focuses on describing the differences from other embodiments.
Claims
1. A method for fruit recognition using an intelligent harvesting robot in a densely planted environment, characterized in that the method includes the following steps. During a single fruit-picking process of the intelligent harvesting robot, fruit images and the reference depth of each fruit in the fruit image are acquired in real time. Fruit regions are divided based on the positional distribution of fruits in the initial fruit image. The harvesting degree of each fruit region is obtained based on the salience, aggregation, and total number of fruits in that region. The spatial label of each fruit is obtained based on its reference depth and visibility. The true salience of each fruit is obtained based on its initial salience, its spatial label, and the harvesting degree of its fruit region in the initial fruit image, and the target fruit for the first harvest is determined. The true salience of the other fruits in the target fruit region and the preset neighboring fruit regions is then adjusted. The adjustment is based on the spatial label differences between the target fruit and each of these fruits, and on the distance between the target fruit region and each preset neighboring fruit region. The adjusted true salience is used to determine the target fruit for the second harvest. This process is repeated in real time to obtain the target fruits for harvesting until the true salience of all fruits is less than the preset salience threshold, at which point fruit harvesting stops. The adjusted true salience is obtained as follows. The fruit region where the target fruit is located is defined as the target fruit region. The distance between the center point of the target fruit region and the center point of each other fruit region is obtained and used as a first distance.
When the normalized first distance is less than the preset distance threshold, the corresponding other fruit region is defined as a preset neighboring fruit region of the target fruit region. The first distance between the target fruit region and each of its preset neighboring fruit regions is passed through a negatively correlated normalization, and the result is used as the degree of influence of that preset neighboring fruit region; the degree of influence of the target fruit region is 1. For any fruit other than the target fruit in the target fruit region or its preset neighboring fruit regions, the absolute difference between the spatial label of that fruit and the spatial label of the target fruit is taken as the first difference. The normalized product of the degree of influence of the fruit region where the fruit is located and the negatively correlated mapping of the first difference is used as the salience adjustment weight of the fruit. The product of the true salience of the fruit and the salience adjustment weight is used as the salience adjustment value of the fruit. The sum of the true salience of the fruit and the salience adjustment value is taken as the adjusted true salience of the fruit.
2. The fruit recognition method for an intelligent harvesting robot in a densely planted environment as described in claim 1, characterized in that the method for obtaining the harvesting degree is as follows. For any fruit region, the mean initial salience of all fruits in that region is used as the fruit salience analysis value of that region. The ratio of the sum of the full areas of all fruits in the fruit region to the area of the minimum enclosing circle of the fruit region is used as the fruit aggregation analysis value of that region. The normalized product of the fruit salience analysis value, the fruit aggregation analysis value, and the total number of fruits in the fruit region is taken as the harvesting degree of the fruit region.
3. The fruit recognition method for an intelligent harvesting robot in a densely planted environment as described in claim 2, characterized in that the method for obtaining the initial salience is as follows. For any fruit in the fruit image, the region image corresponding to the fruit is input into a ResNet deep convolutional neural network to obtain a high-level feature map. The high-level feature map is processed by a saliency algorithm to obtain the saliency value of each pixel corresponding to the fruit. The mean saliency value of all pixels corresponding to the fruit is used as the initial salience of the fruit.
4. The fruit recognition method for an intelligent harvesting robot in a densely planted environment as described in claim 2, characterized in that the method for obtaining the full area is as follows: a Mask R-CNN model is used to extract the full area of each fruit in the fruit image.
5. The fruit recognition method for an intelligent harvesting robot in a densely planted environment as described in claim 4, characterized in that the method for obtaining the spatial label is as follows. The reference depths of the fruits in the initial fruit image are arranged in ascending order to obtain a depth sequence. The depth sequence is evenly divided into a preset number of local sequences, which are numbered sequentially from left to right in ascending order. The number of each local sequence is used as the reference label of the fruits whose reference depths fall within that local sequence. The ratio of the visible area of each fruit in the fruit image to its full area is used as the visibility of that fruit. For any fruit, when its visibility is greater than the first preset visibility threshold, the product of its visibility and its reference label is used as its first label adjustment value. The difference between its reference label and the first label adjustment value, rounded up, is then used as its spatial label. When its visibility is less than the second preset visibility threshold, the product of its visibility and its reference label is used as its second label adjustment value. The sum of its reference label and the second label adjustment value, rounded up, is then used as its spatial label. When its visibility is greater than or equal to the second preset visibility threshold and less than or equal to the first preset visibility threshold, its reference label is used as its spatial label.
6. The fruit recognition method for an intelligent harvesting robot in a densely planted environment as described in claim 3, characterized in that the method for obtaining the true salience is as follows: for any given fruit, the true salience of the fruit is the normalized sum of its initial salience, the negatively correlated mapping of its spatial label, and the harvesting degree of its fruit region.
7. The fruit recognition method for an intelligent harvesting robot in a densely planted environment as described in claim 1, characterized in that the method for obtaining the target fruit is as follows: the fruit with the highest true salience is taken as the target fruit.
8. The fruit recognition method for an intelligent harvesting robot in a densely planted environment as described in claim 1, characterized in that, The method for obtaining the fruit region is as follows: Based on the distance between fruits in the initial fruit image, the fruits in the initial fruit image are clustered using a hierarchical clustering algorithm to obtain fruit clusters; The region corresponding to each fruit cluster is taken as the fruit region.
9. The fruit recognition method for an intelligent harvesting robot in a densely planted environment as described in claim 1, characterized in that, The method for obtaining the reference depth is as follows: For any fruit in the fruit image, the average depth data of all pixels corresponding to that fruit is used as the reference depth of that fruit.