Autonomous intelligent fruit and vegetable picking robot equipped with a deep learning model
By incorporating a deep learning model into an autonomous intelligent fruit and vegetable harvesting robot and combining it with null-space projection control and edge computing, the invention addresses motion planning and obstacle avoidance for agricultural harvesting robots in complex environments, improves operational stability and safety, and enables efficient model updates and resource utilization.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications (China)
- Current Assignee / Owner
- SHANDONG UNIV
- Filing Date
- 2026-02-27
- Publication Date
- 2026-06-23
AI Technical Summary
Existing agricultural harvesting robots face limited motion planning capability, weak model generalization, and a high risk during online model updates in complex, unstructured environments. They struggle to coordinate the end effector's task with the robotic arm's obstacle avoidance requirements, resulting in insufficient operational stability and safety.
The invention provides an autonomous intelligent fruit and vegetable harvesting robot equipped with a deep learning model. Combining a null-space projection control strategy with edge computing, the robot acquires environmental data through a visual perception module, uses a reinforcement learning path planning module together with task allocation and conflict monitoring and resolution modules so that the robotic arm can flexibly avoid obstacles, and updates and optimizes its models online through edge-cloud collaborative interaction.
It improves the robot's motion stability and operational safety in complex unstructured environments, ensures the model's generalization ability and efficient utilization of computing resources, and achieves seamless policy updates and operational continuity.

Figure CN122250293A_ABST
Description
Technical Field
[0001] This invention relates to the field of agricultural robot technology, specifically to an autonomous intelligent fruit and vegetable harvesting robot equipped with a deep learning model.
Background Technology
[0002] As modern agriculture moves toward intensification and intelligence, agricultural robots are widely used in fruit and vegetable harvesting to relieve labor shortages and improve production efficiency. Existing harvesting robots typically consist of a wheeled or tracked chassis, a multi-degree-of-freedom robotic arm, and a dedicated end effector. Their standard operating procedure acquires the three-dimensional coordinates of the fruit with vision sensors, computes the target angle of each joint with an inverse kinematics algorithm, and then moves the end effector along a planned path to the target location to grasp, separate, and collect the fruit. This automated approach, based on pre-set models or offline planning, has already been applied with some success in highly structured settings such as greenhouses.
[0003] However, in typical unstructured environments such as natural orchards, intertwined branches and leaves and densely packed obstacles make motion planning for the robotic arm very difficult. Existing harvesting robots often struggle to coordinate the end effector's task with the obstacle avoidance requirements of the arm's links during harvesting. Specifically, when a collision risk is detected between an intermediate link of the arm and an obstacle such as a branch, traditional path planning methods typically change the arm's overall configuration. This often forces the end effector away from the fruit target it had locked onto, interrupting the operation or causing the pick to fail. Furthermore, when maneuvering in confined spaces, the arm can easily approach kinematically singular configurations, where conventional inverse kinematics algorithms become numerically unstable or diverge, producing sudden jumps in joint velocity or even system deadlock. As a result, the arm cannot use its redundant degrees of freedom to adjust its shape while holding the end effector's operating pose, which severely limits the robot's operational stability and safety in complex environments.
Summary of the Invention
[0004] The technical problem to be solved by this invention is to provide an autonomous intelligent fruit and vegetable harvesting robot equipped with a deep learning model, addressing the limited motion planning capability, weak model generalization, and high risk of online updates faced by existing agricultural harvesting robots in complex, unstructured environments.
[0005] The first aspect of this invention provides an autonomous intelligent fruit and vegetable harvesting robot equipped with a deep learning model. The robot includes a mobile frame, a multi-degree-of-freedom robotic arm, a harvesting end effector, a visual perception module, an edge computing control module, and a wireless communication module. The mobile frame serves as the robot's mobile base; the multi-degree-of-freedom robotic arm is mounted on the mobile frame; the harvesting end effector is connected to the end of the multi-degree-of-freedom robotic arm; the visual perception module is equipped with an RGB-D depth camera for acquiring environmental data; the wireless communication module is used for data transmission with a cloud server; and the edge computing control module is electrically connected to the mobile frame, the multi-degree-of-freedom robotic arm, the harvesting end effector, the visual perception module, and the wireless communication module. The edge computing control module integrates a visual perception data processing unit, an intelligent decision-making and planning unit, and an edge-cloud collaborative interaction unit; the intelligent decision-making and planning unit includes a reinforcement learning path planning module, a task allocation module, and a conflict monitoring and resolution module.
[0006] Furthermore, for motion control and obstacle avoidance, this invention employs a null-space projection control strategy to resolve the conflict between the end-effector task and obstacle avoidance. The conflict monitoring and resolution module builds capsule-shaped bounding volumes for the links of the multi-degree-of-freedom robotic arm and minimum oriented bounding boxes for the obstacles, and calculates the minimum Euclidean distance between them. When this distance is less than a preset safety interference threshold, the system calculates an obstacle avoidance velocity vector using artificial potential field logic. The module calculates the Jacobian matrix of the robotic arm and the corresponding null-space projection matrix, and multiplies the obstacle avoidance velocity vector by the null-space projection matrix to generate a self-motion joint velocity component that acts only in the null space of the arm. The system then superimposes this self-motion component onto the joint velocity command derived from the main task, so that the link configuration is adjusted to avoid obstacles while the end effector stays on the main task's trajectory. Meanwhile, the module calculates the manipulability index of the Jacobian matrix in real time. When the index indicates that the arm is approaching a singular configuration, a damped least squares solver is activated, which adds a regularizing damping factor to the diagonal of the product of the Jacobian matrix and its transpose to keep the inverse solution numerically stable. The module also provides an automatic rollback function, based on a buffer of historical trajectory points, for deadlock conditions.
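The control law in [0006] can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: a planar three-joint arm with hypothetical link lengths stands in for the 6/7-DOF arm so that only a 2 x 3 Jacobian is needed, the obstacle avoidance velocity is assumed to be already expressed in joint space, and the manipulability threshold w0, the maximum damping lam_max and the smooth damping schedule are illustrative choices. All function names are hypothetical.

```python
import math


def jacobian_planar(q, lengths):
    """2 x n position Jacobian of a planar serial arm (stand-in for the real 6/7-DOF arm)."""
    n = len(q)
    J = [[0.0] * n for _ in range(2)]
    for i in range(n):
        for k in range(i, n):
            ang = sum(q[: k + 1])  # absolute angle of link k
            J[0][i] -= lengths[k] * math.sin(ang)
            J[1][i] += lengths[k] * math.cos(ang)
    return J


def transpose(A):
    return [list(row) for row in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inv2(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]


def manipulability(J):
    """Yoshikawa manipulability w = sqrt(det(J J^T)); w -> 0 at singular configurations."""
    JJt = matmul(J, transpose(J))
    det = JJt[0][0] * JJt[1][1] - JJt[0][1] * JJt[1][0]
    return math.sqrt(max(det, 0.0))


def damped_pinv(J, lam):
    """J^T (J J^T + lam^2 I)^-1; lam = 0 gives the ordinary right pseudo-inverse."""
    JJt = matmul(J, transpose(J))
    JJt[0][0] += lam ** 2  # damping factor added on the diagonal of J J^T
    JJt[1][1] += lam ** 2
    return matmul(transpose(J), inv2(JJt))


def resolve_rates(q, xdot, qdot_avoid, lengths, w0=0.05, lam_max=0.1):
    """Primary end-effector task plus obstacle-avoidance self-motion in the null space."""
    J = jacobian_planar(q, lengths)
    w = manipulability(J)
    # Damping is zero away from singularities and grows smoothly as w -> 0.
    lam = 0.0 if w >= w0 else lam_max * math.sqrt(1.0 - (w / w0) ** 2)
    Jp = damped_pinv(J, lam)
    n = len(q)
    JpJ = matmul(Jp, J)
    # Null-space projector N = I - J^+ J (an exact projector only when lam == 0).
    N = [[(1.0 if r == c else 0.0) - JpJ[r][c] for c in range(n)] for r in range(n)]
    primary = matmul(Jp, [[v] for v in xdot])
    self_motion = matmul(N, [[v] for v in qdot_avoid])
    qdot = [primary[i][0] + self_motion[i][0] for i in range(n)]
    return qdot, w, lam
```

Away from singularities (lam = 0), J multiplied by the returned joint velocity reproduces the commanded end-effector velocity, and the self-motion term alone produces no end-effector motion. Near a singularity the damping trades a small tracking error for bounded joint velocities.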
[0007] A second aspect of this invention provides a method for edge-cloud collaborative data processing and model evolution for the aforementioned robot. The edge-cloud collaborative interaction unit screens hard examples at the edge based on an uncertainty measure and a value deviation measure.
[0008] For uncertainty screening, a lightweight policy network is deployed at the edge using model quantization. During inference, the information entropy of the action probability distribution is calculated. This entropy quantifies the policy network's uncertainty about the current decision: the more dispersed the action probability distribution, the higher the entropy. When the entropy exceeds a warning threshold, the current state is judged to lie at the boundary of the data distribution and is marked as an uncertainty hard example.
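A minimal sketch of the entropy test, assuming the policy outputs a discrete action distribution and that the warning threshold is expressed as a fraction of the maximum possible entropy log K for K actions (the 0.8 ratio is a hypothetical value, not one given in the patent):

```python
import math


def action_entropy(probs):
    """Shannon entropy H = -sum(p * log p), in nats, of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def is_uncertain(probs, warn_ratio=0.8):
    """Flag an uncertainty hard example when H exceeds the warning threshold.

    Assumption: the threshold is warn_ratio * log(K), so it does not depend on
    the number of actions K.
    """
    h_max = math.log(len(probs))
    return action_entropy(probs) > warn_ratio * h_max
```

A peaked distribution such as [0.97, 0.01, 0.01, 0.01] has low entropy and is not flagged, whereas a uniform distribution reaches log K and is flagged.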
[0009] For value deviation screening, a value deviation index is calculated as the absolute difference between the target value (constructed from the immediate reward and the predicted value of the next state) and the predicted value of the current state, that is, the absolute temporal-difference error. A large index indicates that the model's evaluation of the state is significantly biased, and the sample is marked as a value-deviation hard example. Combined with trigger-based screening on task outcomes (such as inverse kinematics failures or collision alarms), the system stores only these high-value hard examples and fault-scene data packets in a high-priority queue and uploads them to the cloud.
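The value deviation index is the absolute temporal-difference error |r + gamma * V(s') - V(s)|. The sketch below assumes a discount factor gamma, a hypothetical threshold of 0.5, and a simple in-memory priority queue standing in for the high-priority upload queue; giving fault events (such as an inverse kinematics failure or collision alarm) top priority is also an assumption. Class and method names are hypothetical.

```python
import heapq
import itertools


def value_deviation(reward, v_next, v_curr, gamma=0.99, done=False):
    """|r + gamma * V(s') - V(s)|; the bootstrap term is dropped at episode end."""
    target = reward + (0.0 if done else gamma * v_next)
    return abs(target - v_curr)


class HardExampleQueue:
    """Keeps only flagged samples and hands them out highest priority first."""

    def __init__(self, td_threshold=0.5):
        self.td_threshold = td_threshold
        self._heap = []
        self._count = itertools.count()  # tie-breaker so samples are never compared

    def offer(self, sample, td_error, uncertain=False, fault=None):
        """Store the sample only if it is a hard example; return True when stored."""
        if fault is not None:  # e.g. "ik_failure" or "collision_alarm"
            priority = float("inf")
        elif td_error > self.td_threshold or uncertain:
            priority = td_error
        else:
            return False  # ordinary sample: not uploaded
        heapq.heappush(self._heap, (-priority, next(self._count), sample))
        return True

    def pop_for_upload(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

Because ordinary samples are rejected at offer time, only fault scenes and hard examples consume wireless bandwidth.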
[0010] The cloud server uses the uploaded data to back-calibrate the physical parameters of the simulation environment (such as illumination and friction coefficients), narrowing the gap between the virtual and real domains. Knowledge distillation is then applied: the teacher policy network's unnormalized action log-probabilities (logits) are divided by a temperature coefficient to obtain a smoothed, softened action probability distribution, and the edge-side student policy network is trained to minimize the relative entropy (Kullback-Leibler divergence) between the teacher and student distributions, achieving high-performance model compression and transfer.
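The temperature-scaled distillation objective can be sketched as below. It follows the common soft-target formulation in which the KL divergence between the softened teacher and student distributions is multiplied by T^2; the T^2 factor and the default T = 4 are conventions assumed here, not values stated in the patent.

```python
import math


def softmax_t(logits, T=1.0):
    """Temperature-scaled softmax; a larger T gives a softer distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp((z - m) / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kl_div(p, q):
    """Relative entropy KL(p || q) = sum p * log(p / q)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def distillation_loss(teacher_logits, student_logits, T=4.0):
    """Soft-target loss T^2 * KL(softmax(z_t / T) || softmax(z_s / T))."""
    p_teacher = softmax_t(teacher_logits, T)
    p_student = softmax_t(student_logits, T)
    return T * T * kl_div(p_teacher, p_student)
```

The loss is zero when the student reproduces the teacher's logits and positive otherwise; raising T exposes more of the teacher's ranking over non-optimal actions to the student.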
[0011] A third aspect of this invention provides a method for the safe updating and deployment of the aforementioned robot model. The edge-cloud collaborative interaction unit employs a dual-buffered memory architecture and atomic switching technology to achieve uninterrupted hot updates.
[0012] On receiving an update notification that carries a digital signature and a hash checksum, the edge device downloads the encrypted model deployment package using resumable (breakpoint-resume) transfer and verifies its integrity. The system maintains two runtime buffers in memory: a main runtime area and a backup loading area. The new model is decompressed, its computation graph is built, and its memory is allocated in the backup loading area, where it waits in a hot-standby state without affecting the old model running in the main runtime area.
[0013] The switch is strictly constrained by safety conditions and atomicity. Only when the robotic arm is detected to be in a zero-speed holding state with no pending action commands does the system, under mutex protection, swap the function entry pointer within a single control cycle, instantly redirecting inference calls to the backup loading area. If performance anomalies appear after the new model is deployed, a rollback mechanism redirects the pointer to the previous model version stored in non-volatile memory, ensuring continuity and safety of operation.
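The dual-buffer hot update in [0012] and [0013] can be illustrated with a minimal Python sketch. Assumptions: a Python object reference plays the role of the function entry pointer, threading.Lock stands in for the mutex, the previous model is kept in memory rather than in non-volatile storage, and only the SHA-256 checksum is checked (digital-signature verification, decryption and resumable download are omitted). The class and parameter names are hypothetical.

```python
import hashlib
import hmac
import threading


class DualBufferPolicy:
    """Main runtime slot and backup loading slot with a lock-protected swap."""

    def __init__(self, initial_model):
        self._lock = threading.Lock()
        self._active = initial_model  # main runtime area
        self._standby = None          # backup loading area
        self._previous = None         # last known-good model, kept for rollback

    def infer(self, state):
        with self._lock:
            model = self._active  # read the current entry point under the lock
        return model(state)

    def stage(self, blob, expected_sha256_hex, build_model):
        """Verify the package checksum, then build the new model in the standby slot."""
        digest = hashlib.sha256(blob).hexdigest()
        if not hmac.compare_digest(digest, expected_sha256_hex.lower()):
            return False  # corrupted or tampered package: keep serving the old model
        self._standby = build_model(blob)  # old model keeps serving meanwhile
        return True

    def try_switch(self, arm_at_zero_speed, pending_commands):
        """Swap only while the arm holds still with no queued commands."""
        if self._standby is None or not arm_at_zero_speed or pending_commands:
            return False
        with self._lock:
            self._previous, self._active = self._active, self._standby
            self._standby = None
        return True

    def rollback(self):
        """Point inference back at the previous model after a detected anomaly."""
        with self._lock:
            if self._previous is None:
                return False
            self._active, self._previous = self._previous, None
            return True
```

Callers of infer never observe a half-loaded model: they see either the old entry point or the new one, and the swap is refused while the arm is moving or has pending commands.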
[0014] The autonomous intelligent fruit and vegetable harvesting robot equipped with a deep learning model provided by this invention has the following beneficial effects:
1. This invention constructs the null-space projection matrix of the robotic arm's Jacobian, maps the obstacle avoidance velocity vector into self-motion joint velocity components that act only in the null space, and superimposes them onto the main task command. It also uses the damped least squares method to introduce a regularizing damping factor near singularities. Together, these allow the link configuration to be adjusted flexibly to avoid obstacles while the harvesting end effector stays strictly on its predetermined trajectory, and they suppress sudden joint velocity changes near singular configurations, significantly improving the robot's motion stability and operational safety in complex unstructured agricultural environments.
[0015] 2. By calculating, at the edge, the information entropy of the action probability distribution and a value deviation index based on the temporal-difference error, this invention accurately screens hard examples at the model's cognitive boundary. Combined with temperature-scaled knowledge distillation in the cloud, the teacher network's optimal policy is transferred to the edge. The lightweight edge model can thus be iterated continuously on high-value data while the bandwidth consumed by wireless transmission is significantly reduced. This resolves the conflict between the limited computing resources of edge devices and the high computing demands of deep reinforcement learning, and steadily improves the model's generalization ability.
[0016] 3. This invention builds a dual-buffer architecture, consisting of a main runtime area and a backup loading area, in the memory of the edge computing control module. When the robotic arm is in a zero-speed holding state with no pending commands, a mutex lock is used to switch the function entry pointer atomically within a single control cycle. Combined with automatic rollback logic driven by performance monitoring, this enables seamless online hot updates of the policy model, avoids the operational interruptions caused by conventional shutdown-and-update procedures, and guards against control system crashes caused by latent defects in the new model. This ensures continuous operation and system reliability during long-term unattended operation.
Attached Figure Description
[0017] Figure 1 is a schematic diagram of the overall three-dimensional structure of the present invention; Figure 2 is a schematic diagram of the overall rear view of the present invention; Figure 3 is a schematic diagram of the system flow of the present invention.
[0018] In the figures: 10 is the mobile frame; 20 is the multi-degree-of-freedom robotic arm; 30 is the harvesting end effector; 40 is the visual perception module; 50 is the edge computing control module; and 60 is the wireless communication module.
Detailed Implementation
[0019] The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0020] Example: This invention provides an autonomous intelligent fruit and vegetable harvesting robot equipped with a deep learning model, including a mobile frame 10, a multi-degree-of-freedom robotic arm 20, a harvesting end effector 30, a visual perception module 40, an edge computing control module 50, and a wireless communication module 60.
[0021] The mobile frame 10 serves as the robot's mobile carrier and uses differential drive or Ackermann steering to cope with unstructured agricultural terrain. A drive wheel assembly mounted at the bottom of the mobile frame 10 is electrically connected to a motor driver, which receives motion commands from the edge computing control module 50 over a CAN bus or industrial Ethernet and drives the robot forward, backward, and through turns to position it within crop rows.
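For the differential-drive variant, converting a chassis velocity command into wheel speeds (and back, for odometry) follows standard differential-drive kinematics. A minimal sketch with hypothetical function names, where the track width and wheel radius are assumed parameters of the mobile frame:

```python
def diff_drive_wheel_speeds(v, omega, track_width, wheel_radius):
    """Chassis command (v in m/s, omega in rad/s) -> (left, right) wheel speeds in rad/s."""
    v_left = v - omega * track_width / 2.0
    v_right = v + omega * track_width / 2.0
    return v_left / wheel_radius, v_right / wheel_radius


def chassis_twist(w_left, w_right, track_width, wheel_radius):
    """Inverse mapping used for odometry: wheel speeds in rad/s -> (v, omega)."""
    v = wheel_radius * (w_left + w_right) / 2.0
    omega = wheel_radius * (w_right - w_left) / track_width
    return v, omega
```

Driving straight gives equal wheel speeds, and a pure rotation (v = 0) spins the wheels in opposite directions, which lets the frame turn in place between crop rows.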
[0022] The multi-degree-of-freedom robotic arm 20 is mounted on the upper surface of the mobile frame 10 and is used to adjust the spatial pose of the end effector 30. In this embodiment, the multi-degree-of-freedom robotic arm 20 includes a first robotic arm and a second robotic arm. Both the first and second robotic arms are six-degree-of-freedom or seven-degree-of-freedom serial articulated robotic arms, with each joint integrating a servo motor and a reducer. The bases of the first and second robotic arms are fixed at different height positions on the mobile frame 10, or they share the same mounting base to form a dual-arm cooperative configuration. A fixed rigid transformation relationship is maintained between the coordinate system of the robotic arm base and the coordinate system of the geometric center of the mobile frame 10.
[0023] The harvesting end effector 30 is mounted on the end flange of the multi-degree-of-freedom robotic arm 20. The harvesting end effector 30 integrates a harvesting gripper and a front-end shearing assembly. The harvesting gripper includes a drive motor, a transmission screw, and a contouring finger support. A flexible buffer layer is attached to the inner side of the contouring finger support for gripping the target fruit. A thin-film pressure sensor is installed on the inner side of the harvesting gripper to provide feedback on the gripping force data. The front-end shearing assembly is located on the side of the harvesting gripper and employs an electrically driven rotating blade or a reciprocating scissor structure, configured to cut the fruit stem along a predetermined trajectory after the harvesting gripper has secured the fruit.
[0024] The visual perception module 40 is configured to collect three-dimensional environmental data and image data of the working environment. The visual perception module 40 includes a global camera and a local camera. The global camera is fixedly mounted on the front support of the mobile frame 10 and is used to acquire global environmental point cloud data; the local camera is mounted on the end of the multi-degree-of-freedom robotic arm 20 and is used to acquire close-range fruit stem images.
[0025] To achieve precise harvesting, hand-eye calibration is performed between the visual perception module 40 and the multi-degree-of-freedom robotic arm 20. Specifically, the edge computing control module 50 stores a pre-calibrated coordinate transformation matrix that maps the pixel coordinates and depth of a target point, as captured by the visual perception module 40, into the base coordinate system of the multi-degree-of-freedom robotic arm 20 via a rigid transformation, so that the arm can move the end effector precisely to the transformed spatial coordinates.
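The pixel-to-base mapping can be sketched as a pinhole back-projection followed by the stored rigid transform. The sketch assumes an undistorted pinhole camera with intrinsics fx, fy, cx, cy and represents the calibrated camera-to-base transform as a 4 x 4 homogeneous matrix in row-major nested lists; these names and the layout are assumptions, since the patent only states that a pre-calibrated transformation matrix is stored.

```python
def pixel_to_camera(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth in metres into the camera frame (pinhole model)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return [x, y, depth]


def transform_point(T, p):
    """Apply a 4 x 4 homogeneous rigid transform T to the 3-D point p."""
    return [sum(T[r][c] * p[c] for c in range(3)) + T[r][3] for r in range(3)]
```

A target pixel is therefore mapped with transform_point(T_base_cam, pixel_to_camera(u, v, d, fx, fy, cx, cy)), where T_base_cam is the hypothetical name for the hand-eye calibration result.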
[0026] This invention provides a control system architecture for an autonomous intelligent fruit and vegetable harvesting robot. The control system architecture runs on an edge computing control module 50 and a cloud server, and mainly includes a visual perception data processing unit, an intelligent decision planning unit, a motion control execution unit, and an edge-cloud collaborative interaction unit.
[0027] The visual perception data processing unit is used to preprocess and extract features from the raw data input by the sensor. It receives RGB image data and depth point cloud data from the visual perception module 40. Internally, the unit runs a convolutional neural network model and point cloud processing algorithm to identify fruit targets, branch and leaf obstacles, and supporting structures within the field of view. The output data includes the three-dimensional center coordinates of the target fruit, fruit maturity classification labels, fruit growth posture vectors, and three-dimensional bounding box information of obstacles. The unit maps the output data to the coordinate system of the robotic arm base, constructing a current environmental state vector, which serves as the input for subsequent decision-making modules.
[0028] The intelligent decision-making and planning unit is the logical core of the system, used to generate the picking sequence and the robotic arm's motion trajectory. The intelligent decision-making and planning unit includes a reinforcement learning path planning module and a multi-arm task collaboration module.
[0029] The reinforcement learning path planning module receives the environmental state vector and uses a pre-trained deep neural network policy model to calculate the optimal action policy in the current state. The reinforcement learning path planning module models the picking operation as a sequential decision-making process, comprehensively considering path length, robotic arm energy consumption, and collision risk, and outputs a picking sequence for a specific fruit, as well as the corresponding end effector target pose and approach path point.
[0030] The multi-arm task collaboration module is used for task allocation and conflict detection in dual-arm or multi-arm operation modes. Based on the kinematic workspace of each robotic arm, the module divides the set of fruits to be harvested into task subsets. The module also models the geometric envelope swept by each robotic arm's planned trajectory over time and calculates the minimum Euclidean distance between the envelopes of different robotic arms. When this distance is less than a preset safety threshold, the module applies time delay or speed scaling commands to the lower-priority robotic arm, achieving conflict-free scheduling of parallel multi-arm operation.
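A minimal sketch of the envelope check, assuming each link is approximated as a capsule (a line segment plus a radius) and that one sampled time step of both trajectories is compared. The closest-point routine is the standard clamped segment-to-segment method; the hold/slow policy with a 0.5 slow-down factor is a hypothetical stand-in for the patent's time delay or speed scaling, and all names are hypothetical.

```python
def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _clamp01(x):
    return max(0.0, min(1.0, x))


def segment_distance(p1, q1, p2, q2, eps=1e-12):
    """Minimum distance between 3-D segments p1-q1 and p2-q2 (clamped closest points)."""
    d1, d2, r = _sub(q1, p1), _sub(q2, p2), _sub(p1, p2)
    a, e, f = _dot(d1, d1), _dot(d2, d2), _dot(d2, r)
    if a <= eps and e <= eps:  # both segments degenerate to points
        s = t = 0.0
    elif a <= eps:  # first segment is a point
        s, t = 0.0, _clamp01(f / e)
    else:
        c = _dot(d1, r)
        if e <= eps:  # second segment is a point
            s, t = _clamp01(-c / a), 0.0
        else:
            b = _dot(d1, d2)
            denom = a * e - b * b  # zero when the segments are parallel
            s = _clamp01((b * f - c * e) / denom) if denom > eps else 0.0
            t = (b * s + f) / e
            if t < 0.0:
                s, t = _clamp01(-c / a), 0.0
            elif t > 1.0:
                s, t = _clamp01((b - c) / a), 1.0
    c1 = [p + s * d for p, d in zip(p1, d1)]
    c2 = [p + t * d for p, d in zip(p2, d2)]
    diff = _sub(c1, c2)
    return _dot(diff, diff) ** 0.5


def capsule_clearance(cap_a, cap_b):
    """Clearance between capsules given as (p, q, radius); negative means overlap."""
    (pa, qa, ra), (pb, qb, rb) = cap_a, cap_b
    return segment_distance(pa, qa, pb, qb) - ra - rb


def min_clearance(links_a, links_b):
    """Smallest capsule-to-capsule clearance between two arms at one sampled time step."""
    return min(capsule_clearance(a, b) for a in links_a for b in links_b)


def speed_scale(clearance, safety, slow_factor=0.5):
    """Speed multiplier for the lower-priority arm: hold on contact, slow inside the margin."""
    if clearance <= 0.0:
        return 0.0  # envelopes overlap: hold (time delay) for this step
    if clearance < safety:
        return slow_factor
    return 1.0
```

Repeating min_clearance over the sampled time steps of both planned trajectories approximates the time-dimension envelope check; the lower-priority arm then uses the smallest resulting speed scale.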
[0031] The motion control execution unit is used to convert upper-level decision commands into lower-level motor drive signals. It receives Cartesian space trajectory point data from the intelligent decision planning unit and calculates the target angles and angular velocities of each joint axis using an inverse kinematics solver. The motion control execution unit also includes a closed-loop feedback controller, which compensates for the actual motion error of the robotic arm in real time based on feedback data from the joint encoder and torque sensor, and controls the clamping force and shearing timing of the end effector.
[0032] The edge-cloud collaborative interaction unit manages data flow and model iteration between the edge and the cloud. It monitors the confidence index of the model inference in real time. When the confidence index falls below a preset threshold, the unit marks the corresponding image frame and sensor data as low-confidence sample data. The unit uploads the low-confidence sample data to the cloud database for annotation and training via a wireless communication network, and periodically receives updated network model weight parameters from the cloud, performing online upgrades to the algorithm models in the local visual perception data processing unit and intelligent decision-making and planning unit.
[0033] The aforementioned units interact with each other through shared memory or message middleware, forming a closed-loop control logic from environmental perception, decision planning, action execution to model evolution.
[0034] This invention provides an operating method for the autonomous intelligent fruit and vegetable harvesting robot, in which the edge computing control module 50 coordinates the mobile frame 10, the visual perception module 40, the multi-degree-of-freedom robotic arm 20, and the harvesting end effector 30 to act in sequence.
[0035] The workflow begins with the navigation and positioning phase. The edge computing control module 50 sends a movement command to the mobile frame 10. The mobile frame 10 travels to the target work area according to the command and adjusts its position and heading angle so that the plants to be harvested fall within the preset reachable workspace of the multi-degree-of-freedom robotic arm 20, as determined by inverse kinematics. When the mobile frame 10 stops, it sends a position-arrival signal back to the edge computing control module 50.
[0036] During the environmental perception and global scanning phase, the edge computing control module 50 controls the global camera located at the front end of the mobile frame 10 to start. The global camera scans the plants within the current field of view, acquiring large-scale three-dimensional point cloud data and RGB image data, including fruits, branches, leaves, and support poles. The visual perception data processing unit receives the above data, runs a target detection algorithm to identify the three-dimensional coordinates and maturity status of the target fruits within the field of view, identifies the spatial distribution of obstacles, and constructs the environmental state vector at the current moment.
[0037] During the intelligent decision-making and path planning phase, the intelligent decision-making and planning unit receives the environmental state vector. The reinforcement learning path planning module inputs the environmental state vector into a pre-trained deep neural network model to calculate the picking target sequence that meets the criteria of lowest energy consumption or shortest path. For each target fruit in the picking target sequence, the intelligent decision-making and planning unit combines obstacle distribution information to generate a corresponding collision-free motion trajectory and end-approach angle.
[0038] During the multi-arm collaborative verification phase, when the system is configured in dual-arm or multi-arm operation mode, the multi-arm task collaboration module performs time-dimensional interference detection on the motion trajectories generated by each robotic arm. The multi-arm task collaboration module calculates the minimum distance between the spatial envelopes of different robotic arms at the same time. If the detected minimum distance is less than a preset safety threshold, the multi-arm task collaboration module adjusts the execution time or speed of the lower-priority robotic arm to generate the final collaborative control command.
[0039] During the motion execution and harvesting phase, the motion control execution unit parses the collaborative control commands into motor drive signals for each joint of the multi-degree-of-freedom robotic arm 20. The multi-degree-of-freedom robotic arm 20 drives the harvesting end effector 30 toward the target fruit along the planned path. During this process, the local camera at the end of the robotic arm continuously acquires close-range images, and the edge computing control module 50 calculates the pixel deviation between the current end pose and the target fruit and corrects the motion trajectory of the multi-degree-of-freedom robotic arm 20 in real time using a visual servo control algorithm. When the harvesting end effector 30 reaches the predetermined grasping position, the harvesting gripper closes to hold the fruit, and the front-end shearing assembly cuts the stem.
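The visual servo correction can be sketched as a proportional controller on the pixel offset between the fruit and the image center. Assumptions: the local camera's x and y axes are aligned with the image u and v axes, the fruit depth comes from the RGB-D sensor, and the gain and speed limit are hypothetical values; a full image-based visual servo would use the image interaction matrix instead of this simplified lateral correction.

```python
def servo_correction(pixel_offset, depth, fx, fy, gain=0.8, v_max=0.05):
    """Lateral camera velocity (m/s) that re-centres the fruit in the local camera image.

    pixel_offset = (u_fruit - cx, v_fruit - cy): fruit position relative to the principal point.
    """
    du, dv = pixel_offset
    # Pinhole model: convert the pixel offset into a metric offset at the fruit's depth.
    ex, ey = du * depth / fx, dv * depth / fy
    # Move the camera toward the offset so the fruit drifts back to the image centre.
    vx, vy = gain * ex, gain * ey
    speed = (vx * vx + vy * vy) ** 0.5
    if speed > v_max:  # saturate to keep the final approach gentle
        vx, vy = vx * v_max / speed, vy * v_max / speed
    return vx, vy
```

The correction vanishes once the fruit is centred, and the saturation keeps large initial offsets from producing abrupt end-effector motion near branches.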
[0040] During the closed-loop detection and state update phase, the harvesting end effector 30 places the separated fruit into the collection device. The edge computing control module 50 re-detects the target area via the visual perception module 40. If it is confirmed that the target fruit has been removed from the plant, the system updates the environmental state vector, marks the fruit as harvested, and continues to execute the harvesting task of the next target in the sequence; if the target fruit is still detected, the system marks this target as abnormal or triggers a retry mechanism.
[0041] During the edge-cloud collaboration and model evolution phase, the edge-cloud collaborative interaction unit continuously monitors the model inference output of each of the above stages. When the confidence index of visual recognition falls below a preset threshold or consecutive harvesting failures occur, the edge-cloud collaborative interaction unit captures the current sensor data and system logs and uploads them to the cloud server via the wireless communication module 60. The cloud server uses the uploaded data to train and update the model, and then sends the optimized model parameters to the edge computing control module 50 via the wireless communication module 60 to complete the iteration of the algorithm model.
[0042] The reinforcement learning path planning module in the intelligent decision-making and planning unit models the continuous operation of the autonomous picking robot as a Markov decision process consisting of a state space, an action space, and a reward function.
[0043] The reinforcement learning path planning module first constructs the state space. At each time step, the system collects the robot's motion state and perception information about the working environment and combines them into a current state vector. Specifically, the current state vector includes the following data components:
End-effector motion state data: the three-dimensional position coordinates of the end effector of the multi-degree-of-freedom robotic arm 20 in the base coordinate system and its current instantaneous velocity vector;
Target fruit feature data: the set of target fruits within the current field of view that have not yet been picked, where each fruit's features include its three-dimensional center coordinates relative to the base coordinate system and a maturity classification label;
Obstacle feature data: obtained by the visual perception data processing unit from segmentation of the depth point cloud, including the spatial bounding box parameters of branches, support poles, and unripe fruits.
[0044] Because the number of fruits in the target fruit feature data changes during harvesting, while the deep neural network model requires a fixed input dimension, the reinforcement learning path planning module arranges the target fruit set up to a preset maximum number and zero-pads the unused slots. It then normalizes the above data and concatenates the vectors into a high-dimensional state tensor that is fed into the deep neural network.
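The padding and normalization step can be sketched as below. Assumed details, none of which are specified in the patent: each fruit is (x, y, z, maturity) with maturity already scaled to [0, 1]; each obstacle box is six numbers (center and half extents); positions and sizes are divided by a workspace radius and velocities by a maximum speed; and fruits beyond the maximum count are truncated in list order. The function also returns a validity mask for the padded fruit slots. Function and parameter names are hypothetical.

```python
def pad(items, width, max_items):
    """Truncate or zero-pad a list of fixed-width feature tuples to max_items rows."""
    rows = [list(it) for it in items[:max_items]]
    mask = [1] * len(rows) + [0] * (max_items - len(rows))
    rows += [[0.0] * width for _ in range(max_items - len(rows))]
    return rows, mask


def build_state(ee_pos, ee_vel, fruits, obstacles, max_fruits=8, max_obstacles=4,
                workspace_radius=1.0, max_speed=0.5):
    """Flatten the observation into a fixed-length vector plus a fruit validity mask."""
    ee = [p / workspace_radius for p in ee_pos] + [v / max_speed for v in ee_vel]
    fruit_rows, mask = pad(fruits, 4, max_fruits)
    obs_rows, _ = pad(obstacles, 6, max_obstacles)
    # Positions are scaled; the maturity value (4th feature) is assumed to be in [0, 1] already.
    fruit_feat = [[x / workspace_radius, y / workspace_radius, z / workspace_radius, m]
                  for x, y, z, m in fruit_rows]
    obs_feat = [[c / workspace_radius for c in row] for row in obs_rows]
    flat = ee + [c for row in fruit_feat for c in row] + [c for row in obs_feat for c in row]
    return flat, mask
```

The output length is always 6 + 4 * max_fruits + 6 * max_obstacles, regardless of how many fruits remain on the plant.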
[0045] The reinforcement learning path planning module then constructs the action space. For the current state vector, the action instructions output by the reinforcement learning path planning module employ hybrid action encoding, specifically including a discrete decision variable and a continuous control variable: The discrete decision variable is used to select the index of the next target fruit from the remaining fruit set, thus establishing the current picking target; The continuous control variables are used to define the motion trajectory parameters of the multi-degree-of-freedom robotic arm 20 from its current position to the target fruit, specifically manifested as the coordinates of the path control point in Cartesian space or the angular velocity vector in joint space.
[0046] Correspondingly, the policy network model within the reinforcement learning path planning module is designed with two parallel output branches, which are used to output the probability distribution for the discrete decision variable and the regression value for the continuous control variable, respectively.
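How the two output branches could be combined into one hybrid action is sketched below, assuming the discrete branch emits one logit per fruit slot, already-picked or padded slots are masked out, and the continuous branch output is clipped to hypothetical per-parameter bounds. The greedy argmax shown is a deployment-time choice; during training the index would typically be sampled from the same distribution. Function names are hypothetical.

```python
import math


def masked_softmax(logits, mask):
    """Softmax over valid entries only; masked entries get probability 0 (needs one valid entry)."""
    zmax = max(z for z, m in zip(logits, mask) if m)
    exps = [math.exp(z - zmax) if m else 0.0 for z, m in zip(logits, mask)]
    s = sum(exps)
    return [e / s for e in exps]


def select_action(target_logits, mask, cont_out, low, high):
    """Greedy hybrid action: a discrete fruit index plus clipped continuous parameters."""
    probs = masked_softmax(target_logits, mask)
    idx = max(range(len(probs)), key=probs.__getitem__)
    params = [min(hi, max(lo, x)) for x, lo, hi in zip(cont_out, low, high)]
    return idx, params, probs
```

Masking guarantees that a fruit which has already been picked, or a zero-padded slot, can never be chosen as the next target even if its raw logit is the largest.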
[0047] During the state transition process, after the system executes the aforementioned action instructions, the visual perception data processing unit detects the picking result. If the target fruit is successfully separated, the system removes the target fruit from the remaining fruit set, collects the updated end effector position and velocity, and reconstructs a new state vector for the next round of decision-making.
[0048] This invention provides a deep neural network architecture for harvesting path planning. The architecture is deployed in the intelligent decision-making and planning unit of the edge computing control module 50 and is loaded and run by the reinforcement learning path planning module. It uses a multi-branch feature extraction, multi-output network structure to process the high-dimensional state tensor and map it to hybrid action instructions.
[0049] The deep neural network architecture includes a feature extraction subnetwork comprising three parallel encoding branches: an ontology state encoding branch (which encodes the robot body's own motion state), an environment perception encoding branch, and an obstacle encoding branch.
[0050] The ontology state encoding branch processes the end-effector motion state data. It uses a multilayer perceptron structure with rectified linear unit (ReLU) activation functions between layers to map the motion parameters in physical space into a high-dimensional ontology feature vector.
[0051] The environment perception encoding branch processes the target fruit feature data. Because the target fruit feature data is a fixed-dimensional tensor padded with zeros, this branch uses a multi-head self-attention layer, which extracts permutation-invariant global contextual information by computing weights for the interrelationships within the fruit feature sequence. The branch also applies a masking operation that sets the attention scores corresponding to zero-padded slots to negative infinity before normalization, so that their attention weights become zero and the invalid data does not interfere with feature extraction. The branch finally outputs an environmental context feature vector.
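A minimal, single-head sketch of masked self-attention over the padded fruit slots follows. It assumes queries, keys, and values are the raw fruit features (a trained layer would apply learned projection matrices, and a multi-head layer runs several such heads in parallel and concatenates their outputs), and it assumes mean-pooling over the valid slots to obtain one context vector. Self-attention by itself is permutation-equivariant; in this sketch it is the pooling step that makes the result independent of fruit order.

```python
import math


def softmax(scores):
    """Numerically stable softmax; a score of -inf yields a weight of exactly 0."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_self_attention(tokens, mask):
    """Single-head scaled dot-product self-attention with a padding mask.

    tokens: equal-length fruit feature vectors, used directly as queries, keys
    and values (a trained layer would apply learned projections first).
    mask: 1 for a real fruit, 0 for a zero-padded slot; at least one must be 1.
    """
    d = len(tokens[0])
    outputs = []
    for q in tokens:
        scores = []
        for k, valid in zip(tokens, mask):
            s = sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            # Masking: padded keys get -inf so softmax gives them zero weight.
            scores.append(s if valid else -math.inf)
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, tokens))
                        for j in range(d)])
    return outputs


def pooled_context(tokens, mask):
    """Mean-pool the attended outputs of real slots into one context vector."""
    attended = masked_self_attention(tokens, mask)
    valid = [row for row, m in zip(attended, mask) if m]
    return [sum(col) / len(valid) for col in zip(*valid)]
```

Because padded keys receive zero weight and padded query rows are excluded from pooling, the context vector does not change when the contents of a padded slot change.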
[0052] The obstacle encoding branch processes the obstacle feature data. It uses a one-dimensional convolutional neural network whose sliding convolution kernels extract features from the spatial bounding box parameters of the obstacles, and it outputs an obstacle distribution feature vector.
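The sliding-kernel feature extraction can be sketched as a plain valid 1-D convolution. The flattening of each bounding box into a parameter list, the ReLU activation, and the max-pooling over positions used to obtain a fixed-size obstacle distribution feature vector are assumptions for illustration; the source does not specify them.

```python
def conv1d(signal, kernel, bias=0.0, stride=1):
    """Valid (unpadded) 1-D convolution, computed as cross-correlation like CNN layers."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(0, len(signal) - k + 1, stride)]


def relu(xs):
    return [max(0.0, x) for x in xs]


def obstacle_features(boxes, kernels):
    """Hypothetical obstacle encoder: one output channel per kernel.

    boxes: list of obstacles, each a fixed-length list of bounding box
    parameters (for example centre and half-extents). The parameters are
    concatenated into one sequence, convolved, passed through ReLU and
    max-pooled over positions so the output length equals len(kernels).
    """
    signal = [p for box in boxes for p in box]
    return [max(relu(conv1d(signal, k))) for k in kernels]
```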
[0053] The deep neural network architecture includes a feature fusion layer. This layer is connected to the outputs of the ontology state encoding branch, the environment perception encoding branch, and the obstacle encoding branch. The feature fusion layer performs a vector concatenation operation, merging the ontology feature vector, the environment context feature vector, and the obstacle distribution feature vector into a single global state representation vector. Subsequently, the feature fusion layer reduces the dimensionality of this global state representation vector through a fully connected layer to extract the fused feature vector.
[0054] The deep neural network architecture includes an action policy output subnetwork. The action policy output subnetwork is connected to the output of the feature fusion layer and contains two independent decision branches.
[0055] The discrete policy branch generates fruit selection instructions. Its output end is connected to a softmax (normalized exponential) layer whose output dimension equals the preset maximum number of fruits. The branch outputs the probability that each fruit awaiting picking is selected as the next target, and the system determines the discrete decision variable from this probability distribution.
[0056] The continuous control branch generates the motion trajectory parameters of the robotic arm. Its output end is connected in sequence to a hyperbolic tangent activation layer and an amplitude scaling layer. The hyperbolic tangent layer limits the output values to the interval [-1, 1]; the amplitude scaling layer maps these values to the range bounded by the maximum allowable angular velocity or maximum acceleration of each joint of the robotic arm, yielding deterministic continuous control variables.
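A sketch of the two output heads follows. The softmax and the tanh-plus-scaling steps follow the text; masking padded fruit slots in the discrete head is an added assumption (so that zero-padded positions can never be selected), and `joint_limits` stands for hypothetical per-joint maximum angular velocities.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discrete_head(logits, mask):
    """Selection probability for each of the max_fruits slots.

    Masking padded slots (mask == 0) is an assumption added for this sketch.
    """
    masked = [z if valid else -math.inf for z, valid in zip(logits, mask)]
    return softmax(masked)


def continuous_head(raw, joint_limits):
    """tanh bounds each raw output to [-1, 1]; scaling maps it to [-limit, limit].

    joint_limits: hypothetical per-joint maximum angular velocities (rad/s).
    """
    return [math.tanh(x) * limit for x, limit in zip(raw, joint_limits)]
```

With the deterministic policy used at deployment ([0069]), the selected fruit is simply the index of the largest probability returned by the discrete head.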
[0057] The deep neural network architecture also includes a value evaluation subnetwork, which runs in parallel with the action policy output subnetwork and is likewise connected to the output of the feature fusion layer. Through a multilayer perceptron structure, the value evaluation subnetwork outputs a scalar that estimates the value of the current state; this estimate is used to calculate the advantage function during model training and thereby guides the parameter optimization of the action policy output subnetwork.
[0058] This invention provides a method for constructing a reward function to guide the training of a deep reinforcement learning model. The method is executed by the reinforcement learning path planning module within the edge computing control module 50. Based on the current environmental state vector and action instructions, the module calculates a comprehensive reward value for the current time step. The comprehensive reward value is a multi-objective linear weighted combination of four components: a target-guided reward, a posture constraint reward, a safety obstacle avoidance penalty, and a task efficiency penalty.
[0059] The reinforcement learning path planning module first calculates the target-guided reward, which includes a distance-guided component and a sparse success component. The module calculates, in real time, the spatial Euclidean distance between the end effector position of the multi-degree-of-freedom robotic arm 20 and the center coordinates of the currently selected target fruit. It uses the difference between the Euclidean distance at the previous time step and that at the current time step as the distance-guided component; this potential-based guidance term is positive whenever the end effector moves closer, encouraging it to keep approaching the target. When the spatial Euclidean distance is less than a preset grasping threshold and the visual perception data processing unit confirms successful fruit separation, the module adds a fixed positive sparse reward, marking the completion of a single harvesting task.
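The target-guided reward can be sketched as below. The grasp threshold and success bonus values are placeholders, and the `separated` flag stands in for the visual confirmation of fruit separation.

```python
import math


def target_guided_reward(prev_pos, cur_pos, target, separated=False,
                         grasp_threshold=0.02, success_bonus=10.0):
    """Distance-progress term plus a sparse success bonus.

    grasp_threshold (m) and success_bonus are placeholder values; `separated`
    stands in for the vision unit's confirmation that the fruit came off.
    """
    d_prev = math.dist(prev_pos, target)
    d_cur = math.dist(cur_pos, target)
    reward = d_prev - d_cur  # positive when the end effector moved closer
    if d_cur < grasp_threshold and separated:
        reward += success_bonus
    return reward
```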
[0060] The reinforcement learning path planning module then calculates the posture constraint reward. It extracts the fruit growth posture vector and the current axis vector of the end effector from the environmental state vector and calculates the cosine similarity between them. A higher cosine similarity means the end effector's approach direction is closer to parallel with the fruit growth axis, and the module assigns a correspondingly higher posture reward. This constrains the multi-degree-of-freedom robotic arm 20 to maintain an optimal operating posture aligned with the fruit's growth direction.
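The posture term is a cosine similarity between two axis vectors; a minimal sketch, with a hypothetical `weight` argument, is:

```python
import math


def posture_reward(fruit_axis, tool_axis, weight=1.0):
    """Cosine similarity between the fruit growth axis and the tool approach axis.

    Returns weight * cos(angle): 1 when parallel, 0 when perpendicular.
    """
    dot = sum(a * b for a, b in zip(fruit_axis, tool_axis))
    norms = math.hypot(*fruit_axis) * math.hypot(*tool_axis)
    return weight * dot / norms
```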
[0061] The reinforcement learning path planning module calculates a safety obstacle avoidance penalty. Based on the obstacle distribution feature vector, the module calculates the minimum distance between the links and end effector of the multi-degree-of-freedom robotic arm 20 and the nearest obstacle bounding box. When this minimum distance is less than a preset safety warning threshold, the module generates a negative penalty whose magnitude grows exponentially as the distance decreases. If a physical collision is detected in simulation or actual operation, the module applies a large terminal penalty and immediately ends the current training episode, forcing the policy network to learn to avoid obstacle regions.
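One possible form of the safety penalty, assuming an exponential shape that is zero at the warning threshold and grows as the distance shrinks, is sketched below; the threshold, scale, decay rate, and collision penalty are all illustrative values.

```python
import math


def safety_penalty(min_distance, collided=False, warn_threshold=0.10,
                   scale=1.0, decay=20.0, collision_penalty=-100.0):
    """Return (penalty, episode_done) for the current time step.

    All constants are illustrative. The penalty is 0 at the warning threshold
    and its magnitude grows exponentially as the clearance shrinks.
    """
    if collided:
        # Large terminal penalty and forced end of the training episode.
        return collision_penalty, True
    if min_distance >= warn_threshold:
        return 0.0, False
    return -scale * (math.exp(decay * (warn_threshold - min_distance)) - 1.0), False
```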
[0062] The reinforcement learning path planning module calculates a task efficiency penalty term, which includes a time step penalty and a smoothness penalty. The module applies a small fixed negative value as the time step penalty at each time step to guide the model toward the shortest task path. In addition, the module calculates the rate of change of the angular velocity of each joint of the multi-degree-of-freedom robotic arm 20 and uses the norm of this rate of change as the smoothness penalty, suppressing abrupt acceleration and deceleration and ensuring smooth motion.
[0063] Finally, the reinforcement learning path planning module computes a weighted sum of the above reward and penalty terms using preset importance weight coefficients to obtain the comprehensive reward value at the current time step. This comprehensive reward value is used together with the output of the value evaluation subnetwork to calculate the advantage function, which in turn drives the gradient updates of the policy network parameters.
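The task efficiency terms and the final weighted sum can be sketched as follows. The joint acceleration is estimated by a finite difference over the control period `dt`, the L2 norm is assumed for the smoothness penalty, and the term names, the constant `TIME_STEP_PENALTY`, and the weights are hypothetical.

```python
import math

TIME_STEP_PENALTY = -0.01  # hypothetical small constant applied every step


def smoothness_penalty(prev_joint_vel, cur_joint_vel, dt):
    """Negative L2 norm of the finite-difference joint angular acceleration."""
    accel = [(c - p) / dt for p, c in zip(prev_joint_vel, cur_joint_vel)]
    return -math.sqrt(sum(a * a for a in accel))


def total_reward(terms, weights):
    """Linear weighted combination of named reward and penalty terms."""
    return sum(weights[name] * value for name, value in terms.items())
```

For example, `total_reward({"target": 0.2, "posture": 0.9, "safety": 0.0, "time": TIME_STEP_PENALTY, "smooth": -5.0}, weights)` combines one step's terms; keeping the weights in a dictionary makes it easy to tune the relative importance of each objective.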
[0064] This invention provides a method for training and online inference of a policy model that combines virtual (simulated) and real-world operation. The method is implemented jointly by a cloud server and the edge computing control module 50, with the core algorithm logic executed by the reinforcement learning path planning module within the intelligent decision-making and planning unit. The training and inference strategy is divided into an offline simulation training phase, a model deployment and inference phase, and an online iterative update phase.
[0065] During the offline simulation training phase, the cloud server constructs a high-fidelity virtual simulation environment based on a physics engine. The reinforcement learning path planning module initializes the weight parameters of the policy network and the value network within this environment. To reduce the discrepancy between the simulation environment and the real physical environment, the module employs domain randomization: at the beginning of each training episode, it randomly samples physical parameters from preset uniform distributions and applies them to the virtual simulation environment. These physical parameters include ambient light intensity, the friction coefficient of object surfaces, the link masses of the multi-degree-of-freedom robotic arm 20, and sensor noise intensity. By introducing these parameter perturbations, the module enables the policy network to extract general features that are robust to environmental changes.
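Per-episode domain randomization amounts to drawing each perturbed parameter from its own uniform range. The parameter names and ranges below are hypothetical placeholders; only the uniform sampling at the start of each episode comes from the text.

```python
import random

# Hypothetical ranges; the source names the parameters but not their bounds.
RANDOMIZATION_RANGES = {
    "light_intensity_lux": (2000.0, 80000.0),
    "surface_friction": (0.3, 1.0),
    "link_mass_scale": (0.9, 1.1),  # multiplier on nominal link masses
    "sensor_noise_std": (0.0, 0.01),
}


def sample_episode_params(rng):
    """Draw one uniform sample per parameter at the start of an episode."""
    return {name: rng.uniform(low, high)
            for name, (low, high) in RANDOMIZATION_RANGES.items()}
```

Passing an explicit `random.Random` instance keeps the sequence of randomized episodes reproducible for a given seed.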
[0066] During data acquisition, the reinforcement learning path planning module controls the virtual robotic arm instance to interact with the environment. At each time step, the reinforcement learning path planning module outputs an action probability distribution through the policy network based on the current environmental state vector. To ensure sufficient exploration of the state space, the reinforcement learning path planning module performs random sampling based on the action probability distribution to generate exploratory action commands for execution. The reinforcement learning path planning module records the state vector, action commands, comprehensive reward value, and the state vector at the next time step during this process, encapsulates them into experience tuples, and stores them in the trajectory data buffer.
[0067] During parameter optimization, when the amount of data in the trajectory data buffer reaches a preset threshold, the reinforcement learning path planning module uses the value evaluation subnetwork to calculate the generalized advantage estimate for the stored states. The module then constructs a comprehensive loss function and optimizes it using stochastic gradient descent. The comprehensive loss function is a weighted combination of three parts: a policy gradient loss term, which uses the generalized advantage estimate to increase the output probability of high-advantage actions in the policy network; a value prediction loss term, which minimizes the error between the value predicted by the value evaluation subnetwork and the actual return; and a policy entropy regularization term, which maximizes the information entropy of the policy distribution so that the model retains its exploratory capability and the policy does not converge prematurely to a local optimum.
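The generalized advantage estimate and the three-part loss can be sketched as below, assuming the standard GAE recursion with discount `gamma` and smoothing factor `lam` (neither value is given in the source) and scalar loss inputs already averaged over a batch. The structure resembles proximal policy optimization, but the source does not name a specific policy-gradient algorithm, so the coefficients `c_value` and `c_entropy` are assumptions.

```python
def generalized_advantage_estimate(rewards, values, last_value, dones,
                                   gamma=0.99, lam=0.95):
    """GAE over one trajectory segment.

    values[t] = V(s_t); last_value = V(s_T) bootstraps the final step;
    dones[t] is True if the episode ended after step t.
    Returns (advantages, returns) with returns[t] = advantages[t] + values[t].
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error, then exponentially weighted accumulation.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def composite_loss(policy_loss, value_loss, entropy, c_value=0.5, c_entropy=0.01):
    """Weighted sum; entropy is subtracted so minimizing the loss maximizes entropy."""
    return policy_loss + c_value * value_loss - c_entropy * entropy
```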
[0068] The reinforcement learning path planning module calculates the gradient based on the comprehensive loss function and updates the weight parameters of the action policy output subnetwork and the value evaluation subnetwork simultaneously.
[0069] During the model deployment and inference phase, the converged network model is deployed to the physical edge computing control module 50. Unlike in the training phase, the reinforcement learning path planning module uses a deterministic policy during actual operation: for the discrete policy branch, it directly selects the fruit index with the highest probability; for the continuous control branch, it directly outputs the mean of the probability distribution as the deterministic trajectory parameter, without adding random noise. This deterministic policy ensures the motion stability and repeatability of the multi-degree-of-freedom robotic arm 20 during actual operation.
[0070] During the online iterative update phase, the edge-cloud collaborative interaction unit provides closed-loop feedback. When the edge computing control module 50 encounters samples whose visual recognition confidence is below a threshold, or when a harvesting attempt fails in actual operation, the edge-cloud collaborative interaction unit records the raw sensor data and the corresponding state vectors at that moment, marks them as hard examples, and uploads them to the cloud server. The cloud server uses these hard examples to fine-tune the parameters of the simulation environment and to incrementally train the policy model. The updated model parameters are periodically pushed to the edge computing control module 50 over the wireless communication network, enabling continuous evolution of the control strategy.
[0071] This invention provides a task allocation strategy that optimizes the picking order of multiple targets. The task allocation strategy is specifically executed by the intelligent decision-making and planning unit within the edge computing control module 50. The intelligent decision-making and planning unit includes a task allocation module, which spatially partitions and sorts the multiple targets within the global field of view before the reinforcement learning path planning module executes individual action planning.
[0072] The task allocation module first performs a reachability analysis of the workspace. The visual perception data processing unit outputs the 3D coordinates of all fruits to be picked within the global field of view. The task allocation module loads the kinematic model of the multi-DOF robotic arm 20 to obtain the maximum work envelope of the end effector at the current base position and maps the 3D coordinates of the fruits into this envelope. Using an inverse kinematics algorithm, the module eliminates invalid fruit targets that lie outside the maximum work envelope or at kinematic singularities, thereby obtaining a set of valid work targets.
[0073] The task allocation module then partitions the workspace into regions based on spatial density. Because fruits are unevenly distributed in the unstructured environment, the module uses a density-based spatial clustering strategy to group the set of valid work targets. It calculates the spatial Euclidean distance between each pair of fruit targets and assigns fruits whose distance is less than a preset clustering threshold to the same task sub-region. Each task sub-region is treated as an independent task cluster. The module calculates the geometric centroid of all fruit coordinates within each task cluster and marks it as the navigation anchor point of that sub-region.
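Grouping fruits whose pairwise distance is below a threshold, and taking each group's centroid as the navigation anchor, can be sketched with a union-find over all pairs. This single-linkage grouping is roughly what DBSCAN produces with a minimum cluster size of one; the source does not say which density-based algorithm is used, so treat this as an illustrative stand-in.

```python
import math


def threshold_clusters(points, eps):
    """Group fruits whose pairwise distance is below eps and return anchors.

    Returns a list of (member_indices, centroid) pairs. Any chain of
    close pairs ends up in one cluster (single linkage).
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < eps:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)

    clusters = []
    for members in groups.values():
        dim = len(points[members[0]])
        centroid = tuple(sum(points[m][k] for m in members) / len(members)
                         for k in range(dim))
        clusters.append((members, centroid))
    return clusters
```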
[0074] The task allocation module constructs an inter-cluster traversal sequence. Taking the current end effector position of the multi-DOF robotic arm 20 as the starting node and the navigation anchor points of the work sub-regions as target nodes, the module constructs a fully connected, undirected weighted graph. To balance movement distance against the range of joint adjustment, the module computes the weight of the edge between any two nodes so that it increases with both the spatial Euclidean distance between the two nodes and the average joint angle change of the multi-DOF robotic arm 20 between the postures corresponding to the two nodes. A preset posture cost balance coefficient adjusts the relative contribution of spatial distance and joint angle change to the weight. The module then applies a heuristic global path search algorithm (for example, a heuristic solver for the traveling salesman problem) to the graph, determines the processing order of the work sub-regions, and generates a global task queue.
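The edge weight and a simple ordering heuristic can be sketched as below. The weight adds the Euclidean distance to the mean absolute joint-angle change scaled by a posture coefficient, which matches the positive correlations described in the text; the joint configurations for each node are assumed to come from inverse kinematics. Greedy nearest-neighbor is used here as one simple traveling-salesman heuristic; the source does not specify which heuristic is used, and a refinement step such as 2-opt could be added.

```python
import math


def edge_weight(p_a, p_b, q_a, q_b, posture_coeff=0.5):
    """Spatial distance plus posture_coeff times mean absolute joint-angle change.

    q_a, q_b: joint configurations (rad) reaching each node, assumed to be
    supplied by an inverse kinematics solver; posture_coeff is hypothetical.
    """
    joint_change = sum(abs(a - b) for a, b in zip(q_a, q_b)) / len(q_a)
    return math.dist(p_a, p_b) + posture_coeff * joint_change


def nearest_neighbor_order(weights, start=0):
    """Greedy open tour over a complete graph given as a symmetric weight matrix.

    Node `start` is the current end-effector position; returns the visiting
    order of the remaining nodes (the cluster anchor points).
    """
    unvisited = set(range(len(weights))) - {start}
    order, current = [], start
    while unvisited:
        nxt = min(unvisited, key=lambda j: weights[current][j])
        order.append(nxt)
        unvisited.remove(nxt)
        current = nxt
    return order
```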
[0075] The task allocation module performs intra-cluster task sorting. For the task sub-region at the head of the global task queue, the module generates an intra-cluster picking sequence based on the vertical distribution of the fruits and the occlusion relationships of the fruit stalks. It compares the depth coordinates of the fruits in the cluster relative to the base coordinate system, places unoccluded fruits with the smallest depth coordinate at the beginning of the sequence, and places deeper or occluded fruits at the end. The module then sends the sorted fruit list for the current cluster to the reinforcement learning path planning module.
[0076] The task allocation module implements a dynamic replanning mechanism. During harvesting, if the visual perception data processing unit detects a significant change in the environment (for example, displacement of the target fruit caused by leaf rebound, or new obstacles appearing in the field of view), the task allocation module immediately freezes the current queue. Based on the updated environmental state data, it then repeats the reachability analysis and sequence optimization for only those task clusters that have not yet been executed, keeping the task allocation strategy adapted to the dynamic environment.
[0077] This invention provides a real-time control mechanism to ensure the safety of robotic arm operations. This real-time control mechanism is executed by the intelligent decision-making and planning unit within the edge computing control module 50, specifically through its internal conflict monitoring and resolution module. The conflict monitoring and resolution module is connected to the reinforcement learning path planning module and the visual perception data processing unit, and is used to perform motion verification and safety corrections before the action commands are issued to the underlying driver.
[0078] The conflict monitoring and resolution module constructs a real-time collision detection geometric model for the multi-DOF robotic arm 20. To reduce computational load and meet real-time requirements, the module approximates the physical bodies with bounding volumes: it abstracts the base, links, and end effector of the multi-DOF robotic arm 20 into several capsule bounding volumes and, from the obstacle point cloud data output by the visual perception data processing unit, constructs the minimum oriented bounding box enclosing each obstacle. Using geometric intersection testing algorithms (such as the separating axis theorem), the module calculates in real time the minimum Euclidean distance and the nearest-point coordinate pairs between each capsule bounding volume and all oriented bounding boxes.
[0079] The conflict monitoring and resolution module executes a redundancy-based obstacle avoidance strategy using null-space projection. When the minimum Euclidean distance is less than a preset safety interference threshold, the module uses artificial potential field logic to calculate an obstacle avoidance velocity vector that moves the affected link away from the obstacle along the outward surface normal. Because the multi-degree-of-freedom robotic arm 20 is kinematically redundant, the module synthesizes the final joint velocity command using gradient projection logic. It calculates the current Jacobian matrix of the multi-degree-of-freedom robotic arm 20 and constructs the corresponding null-space projection matrix. It maps the obstacle avoidance velocity vector into joint space and multiplies the result by the null-space projection matrix to generate a self-motion joint velocity component that acts only in the null space of the robotic arm. The module then superimposes this self-motion component onto the joint velocity command derived from the main task. This superposition achieves active obstacle avoidance by adjusting the link posture through the redundant degrees of freedom while keeping the end effector strictly on the main task's operating trajectory.
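In standard notation, the gradient-projection synthesis described above is commonly written as follows, where J is the end-effector Jacobian, J⁺ its Moore-Penrose pseudo-inverse (full row rank assumed), ẋ_d the main-task end-effector velocity, and q̇₀ the joint-space avoidance velocity. Mapping the Cartesian avoidance velocity v_avoid into joint space through the transpose of the Jacobian J_c of the closest point on the affected link, with a gain k, is an assumed choice; the source does not state how this mapping is performed.

```latex
\begin{align*}
\dot{\mathbf q} &= \mathbf J^{+}\dot{\mathbf x}_{d}
  + \left(\mathbf I - \mathbf J^{+}\mathbf J\right)\dot{\mathbf q}_{0},
  \qquad \mathbf J^{+} = \mathbf J^{\mathsf T}\left(\mathbf J\mathbf J^{\mathsf T}\right)^{-1},\\
\dot{\mathbf q}_{0} &= k\,\mathbf J_{c}^{\mathsf T}\,\mathbf v_{\mathrm{avoid}},\\
\mathbf J\left(\mathbf I - \mathbf J^{+}\mathbf J\right)
  &= \mathbf J - \mathbf J\mathbf J^{+}\mathbf J = \mathbf 0 .
\end{align*}
```

The last line shows why the self-motion term leaves the end-effector trajectory unchanged: the projector maps any joint velocity into motions that the Jacobian sends to zero end-effector velocity.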
[0080] The conflict monitoring and resolution module implements a singular configuration avoidance and handling mechanism. While the multi-degree-of-freedom robotic arm 20 is moving, the module performs singular value decomposition of the Jacobian matrix in real time and calculates the current manipulability index. When the manipulability index falls below a preset singularity warning value, this indicates that joint axes of the robotic arm are becoming collinear and the arm is approaching a kinematic singularity. The module then activates a damped least squares solver. When computing the inverse kinematics solution from the Jacobian matrix, the damped least squares solver adds a regularizing damping factor to the diagonal of the product of the Jacobian matrix and its transpose. This damping factor limits the norm of the joint-space velocity command, preventing sudden changes in joint angular velocity and control divergence caused by an ill-conditioned Jacobian matrix, and thus ensures motion stability at the cost of a small end-effector position error.
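One common way to realize the manipulability check and the damped least squares solve is sketched below, with J = U Σ Vᵀ the singular value decomposition of the m-row Jacobian. Taking the manipulability index w as Yoshikawa's measure (the product of the singular values σᵢ), using the singularity warning value as w₀, and scheduling the damping factor λ with a maximum λ_max are assumptions; the source only states that a damping factor is added to the diagonal of J Jᵀ.

```latex
\begin{align*}
w &= \sqrt{\det\left(\mathbf J\mathbf J^{\mathsf T}\right)} = \prod_{i=1}^{m}\sigma_i,\\
\dot{\mathbf q} &= \mathbf J^{\mathsf T}\left(\mathbf J\mathbf J^{\mathsf T} + \lambda^{2}\mathbf I\right)^{-1}\dot{\mathbf x}_{d}
  = \sum_{i=1}^{m}\frac{\sigma_i}{\sigma_i^{2}+\lambda^{2}}\,\mathbf v_i\,\mathbf u_i^{\mathsf T}\dot{\mathbf x}_{d},\\
\lambda^{2} &=
\begin{cases}
\lambda_{\max}^{2}\left(1-\left(w/w_{0}\right)^{2}\right), & w < w_{0},\\
0, & w \ge w_{0}.
\end{cases}
\end{align*}
```

The SVD form shows the stabilizing effect: each singular direction's gain σᵢ/(σᵢ² + λ²) never exceeds 1/(2λ), so joint velocities stay bounded even as σᵢ approaches zero, at the cost of a small tracking error.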
[0081] The conflict monitoring and resolution module performs deadlock detection and rollback operations. If the conflict monitoring and resolution module detects that the number of reciprocating oscillations of the multi-degree-of-freedom robotic arm 20 near the same position exceeds a preset threshold, or that a feasible solution satisfying obstacle avoidance constraints cannot be found within the joint limit range under the current configuration, the conflict monitoring and resolution module determines that the system is in a deadlock state. The conflict monitoring and resolution module sends an interrupt signal to the reinforcement learning path planning module and retrieves the safe configuration data from the previous moment from the historical trajectory buffer. The conflict monitoring and resolution module controls the multi-degree-of-freedom robotic arm 20 to perform rollback motion along the historical path until it exits the deadlock region, and triggers the task allocation module to replan the operation path.
[0082] This invention provides a method for real-time inference and hard example selection at the edge, which is executed by an edge-cloud collaborative interaction unit, a visual perception data processing unit, and an intelligent decision-making and planning unit within an edge computing control module 50. This method is used to execute deep reinforcement learning strategies under conditions of limited edge computing resources and to select high-value samples for model iteration.
[0083] The edge computing control module 50 first performs lightweight deployment of the policy model. The edge-cloud collaborative interaction unit receives the trained policy network model from the cloud server. Using model quantization technology, the edge-cloud collaborative interaction unit converts the floating-point weight parameters in the policy network model into half-precision floating-point or integer formats. The edge-cloud collaborative interaction unit then calls the inference acceleration engine to perform operator fusion and memory reuse optimization on the computation graph of the policy network model. This lightweight deployment enables the intelligent decision-making and planning unit to complete forward inference computation from environmental state vector input to action command output within a preset time period.
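Converting floating-point weights to an integer format is often done with symmetric per-tensor int8 quantization, sketched below. The int8 target, the symmetric scheme, and the single per-tensor scale are assumptions; the source mentions half-precision or integer formats without specifying the scheme, and operator fusion and memory reuse are left to the inference acceleration engine.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the symmetric int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floating-point weights from int8 codes."""
    return [scale * v for v in q]
```

Each dequantized weight differs from the original by at most half a quantization step (`scale / 2`), which is the accuracy traded for a four-fold reduction in storage relative to 32-bit floats.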
[0084] During online inference, the edge-cloud collaborative interaction unit runs an uncertainty-based hard example monitoring task in parallel. The intelligent decision-making and planning unit outputs an action probability distribution based on the current environmental state vector, and the edge-cloud collaborative interaction unit calculates the information entropy of this distribution. The information entropy quantifies the policy network's uncertainty about the current decision and increases with the dispersion of the action probability distribution. The edge-cloud collaborative interaction unit compares the information entropy with a preset uncertainty warning threshold. When the entropy exceeds the threshold, the unit determines that the current environmental state vector lies near the boundary of the training data distribution and marks it as an uncertainty hard example sample.
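The entropy-based uncertainty check reduces to a few lines. The threshold is a hypothetical parameter, and only the discrete (fruit selection) distribution is handled in this sketch.

```python
import math


def policy_entropy(probs):
    """Shannon entropy in nats; terms with p == 0 contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def is_uncertain_hard_example(probs, threshold):
    """Flag a state whose action distribution is more spread out than allowed."""
    return policy_entropy(probs) > threshold
```

A uniform distribution over n fruits gives the maximum entropy ln(n), while a confident one-hot choice gives 0, so the threshold sits somewhere between these two extremes.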
[0085] The edge-cloud collaborative interaction unit also uses the temporal difference error to screen samples along the value dimension. After the multi-degree-of-freedom robotic arm 20 performs an action, the unit obtains the immediate reward value from environmental feedback and calculates a value deviation index. The value deviation index equals the absolute difference between a target value, constructed from the immediate reward value and the predicted value of the next state, and the predicted value of the current state (that is, the absolute temporal difference error). When the value deviation index exceeds a preset threshold, the unit determines that the value evaluation subnetwork's assessment of that state deviates significantly, meaning the state sequence contains key features that the model has not yet mastered, and marks the corresponding state sequence as a value deviation hard example sample.
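The value deviation index is the absolute one-step temporal difference error. The discount factor `gamma` and the handling of terminal states below are assumptions, since the source only says the target is built from the immediate reward and the next-state prediction.

```python
def value_deviation(reward, v_next, v_current, gamma=0.99, terminal=False):
    """Absolute one-step TD error |r + gamma * V(s') - V(s)|."""
    target = reward + (0.0 if terminal else gamma * v_next)
    return abs(target - v_current)


def is_value_hard_example(reward, v_next, v_current, threshold, gamma=0.99,
                          terminal=False):
    """Flag a transition whose value estimate is off by more than threshold."""
    return value_deviation(reward, v_next, v_current, gamma, terminal) > threshold
```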
[0086] The edge-cloud collaborative interaction unit performs trigger-based filtering based on task results. It monitors the operational status feedback of the multi-DOF robotic arm 20 in real time. When it detects a failure in path planning, a null value returned by the inverse kinematics solver, an alarm signal issued by the collision detection module, or visual closed-loop feedback confirming fruit separation failure, the edge-cloud collaborative interaction unit triggers negative sample capture logic. It then backtracks and extracts the raw sensor data streams, internal state vector sequences, and control command logs before and after the anomaly occurred, constructing a fault scene data package.
[0087] The edge-cloud collaborative interaction unit performs structured encapsulation and asynchronous uploading of data. It stores the uncertainty hard example samples, the value deviation hard example samples, and the fault scene data packages in a local high-priority storage queue, and does not persistently store unmarked ordinary samples. When the unit detects that network communication quality meets preset transmission conditions, it sends the data in the high-priority storage queue to the cloud server via the wireless communication module. The cloud server uses the uploaded data to calibrate the simulation environment parameters and adds the hard examples to the experience replay buffer for targeted model retraining.
[0088] This invention provides a cloud-based model training and knowledge distillation method, which is executed by the cloud server together with the edge-cloud collaborative interaction unit within the edge computing control module 50. The method uses the high-value samples uploaded from the edge to update the simulation environment, and uses knowledge transfer to compress the capability of a high-computing-power model into the edge-side model, closing the system's iteration loop.
[0089] The cloud server first performs parameter calibration and domain adaptation for the simulation environment. It receives the uncertainty hard example samples, value deviation hard example samples, and fault scene data packages uploaded by the edge-cloud collaborative interaction unit. A simulation environment calibration module configured in the cloud server parses the raw sensor data and state vectors in these packages and extracts statistical features of the real physical environment. The calibration module calculates the distribution difference between the real working environment data and the data generated by the current virtual simulation environment and, based on this difference, uses system identification or generative adversarial network techniques to inversely correct the physical parameters of the virtual simulation environment. These physical parameters include light intensity distribution, texture features, object surface friction coefficients, and joint damping coefficients. Through this calibration, the cloud server reduces the domain gap between the virtual training environment and the real working environment, ensuring the effectiveness of subsequent training.
[0090] The cloud server performs incremental training on a high-complexity policy model. The cloud server maintains a teacher policy network with deeper layers and a larger parameter scale compared to the edge model. In a calibrated virtual simulation environment, the cloud server performs reinforcement learning training on the state scenarios corresponding to uploaded hard example samples. The cloud server updates the parameters of the teacher policy network using an augmented experience dataset containing the hard example samples. Due to ample computing resources and stronger feature extraction capabilities, the teacher policy network can learn optimal control policies for high-complexity hard example scenarios.
[0091] The cloud server then performs model compression and transfer based on knowledge distillation. To transfer the performance of the teacher policy network to a student policy network (i.e., a lightweight edge-side policy network) that can run on the edge computing control module 50, the cloud server constructs a policy distillation training framework, in which the same sequence of state vectors is input simultaneously to both the teacher policy network and the student policy network.
[0092] The cloud server performs policy transfer by matching temperature-scaled soft probability distributions. The cloud server obtains the unnormalized log probabilities (logits) of the actions output by the teacher policy network, divides them by a preset temperature coefficient, and applies a normalized exponential (softmax) operation to generate a softened action probability distribution. The temperature coefficient is greater than 1, which flattens the probability distribution and brings out the relative relationships among the non-maximum-probability actions. The cloud server applies the same temperature scaling to the output of the student policy network. It then calculates the relative entropy (Kullback-Leibler divergence) between the softened action probability distribution of the teacher policy network and that of the student policy network. Taking minimization of this relative entropy as the optimization objective, the cloud server performs gradient updates on the weight parameters of the student policy network, forcing the decision logic of the student policy network to approximate that of the teacher policy network.
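The temperature-scaled matching in [0092] is the soft-target form of knowledge distillation. The following sketch shows it for a single state under a simplifying assumption: the student's trainable parameters are taken to be its output logits, so the gradient of the relative entropy with respect to those logits, (q_i - p_i)/T, is applied directly; a real student network would backpropagate this gradient into its layers. All function names and the learning-rate and step-count values are hypothetical.

```python
"""Sketch of temperature-scaled policy distillation (teacher -> student).

Assumption (not from the patent): for brevity, the student's "weights" are its
output logits for one state, updated by plain gradient descent.
"""
import math
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Divide logits by T, then apply a numerically stable softmax."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Relative entropy KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def distill(teacher_logits: Sequence[float], student_logits: Sequence[float],
            temperature: float = 2.0, lr: float = 1.0, steps: int = 1000) -> List[float]:
    """Move the student's logits so its softened policy matches the teacher's."""
    target = softmax_with_temperature(teacher_logits, temperature)
    z = list(student_logits)
    for _ in range(steps):
        q = softmax_with_temperature(z, temperature)
        # d KL(target || q) / d z_i = (q_i - target_i) / T
        z = [zi - lr * (qi - ti) / temperature for zi, qi, ti in zip(z, q, target)]
    return z


if __name__ == "__main__":
    teacher, T = [2.0, 1.0, 0.1], 2.0
    student = distill(teacher, [0.0, 0.0, 0.0], temperature=T)
    p_teacher = softmax_with_temperature(teacher, T)
    before = kl_divergence(p_teacher, [1 / 3] * 3)
    after = kl_divergence(p_teacher, softmax_with_temperature(student, T))
    print(f"KL before: {before:.4f}, after: {after:.2e}")
```

Raising T above 1 shrinks the largest softened probability, which is why the non-maximum actions carry more of the training signal.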
[0093] The cloud server manages and distributes model versions. Once the performance metrics of the student policy network meet the preset convergence criteria, the cloud server packages the model parameters and assigns them a version. The cloud server then pushes the updated model parameter package to the edge computing control module 50 over the wireless communication network. Upon receiving the new model, the edge-cloud collaborative interaction unit performs online parameter replacement during idle periods when no task is being executed and loads the new policy model. This completes a closed-loop iteration of the control policy: fault data collection, scenario reproduction in cloud simulation, model capability enhancement, and deployment and application at the edge.
[0094] This invention further provides a model update and deployment method, executed jointly by the cloud server and the edge-cloud collaborative interaction unit of the edge computing control module 50. The method achieves seamless online updating of the policy model through a double-buffered memory mechanism and atomic switching logic.
[0095] The cloud server first packages the model and releases a version. After completing knowledge distillation and verifying that the performance indicators of the student policy network meet the preset requirements, the cloud server packages the updated neural network weight parameter file, the network structure configuration file, and the version description metadata into an encrypted model deployment package. The cloud server digitally signs the encrypted model deployment package using an asymmetric cryptographic algorithm, publishes the package to a content delivery network interface, and sends a version update notification message to the edge computing control module 50. The version update notification message includes the version number of the new model, the file size in bytes, and the SHA-256 hash checksum.
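The notification message in [0095] carries three fields: the version number, the size in bytes, and the SHA-256 checksum. A minimal sketch of assembling it with Python's standard library follows; the JSON encoding and the field names are assumptions, and the encryption and asymmetric signing steps are intentionally left out because the patent does not name the algorithms.

```python
"""Sketch of building the version update notification message on the cloud side.

Assumptions (not from the patent): the message is a JSON object with the
hypothetical field names below. Encryption and signing are omitted.
"""
import hashlib
import json
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_update_notification(package_path: str, version: str) -> str:
    """Version number, size in bytes, and SHA-256 checksum of the package."""
    message = {
        "version": version,
        "size_bytes": os.path.getsize(package_path),
        "sha256": sha256_of_file(package_path),
    }
    return json.dumps(message, sort_keys=True)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"encrypted model bytes")  # stand-in for the real deployment package
    print(build_update_notification(tmp.name, "1.4.0"))
    os.remove(tmp.name)
```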
[0096] The edge-cloud collaborative interaction unit performs incremental download and integrity verification. Upon receiving the version update notification message, it compares the version number of the model currently running in the edge computing control module 50 with that of the new model. If the new model is confirmed to be a newer version, the unit starts the download process, pulling the encrypted model deployment package from the cloud server using a file transfer protocol that supports resuming interrupted downloads. After the download completes, the unit calculates the hash value of the local file. Only when this local hash value exactly matches the SHA-256 hash checksum does the unit decrypt and decompress the encrypted model deployment package; otherwise, it deletes the corrupted file and triggers a re-download.
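The edge-side logic in [0096] reduces to a version comparison, a hash check, and delete-and-retry on mismatch. The sketch below assumes dotted integer version strings, abstracts the resumable transfer as a caller-supplied `download` function, and caps retries with a hypothetical `max_attempts` parameter that the patent does not mention; decryption and decompression are left to the caller.

```python
"""Sketch of the edge-side update check: version comparison, SHA-256
verification, and delete-and-retry on mismatch.

Assumptions (not from the patent): dotted integer versions, a caller-supplied
download(dest_path) function, and a retry cap (max_attempts).
"""
import hashlib
import os
import tempfile
from typing import Callable, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """'1.10.2' -> (1, 10, 2), so that 1.10 correctly sorts after 1.9."""
    return tuple(int(part) for part in version.split("."))


def is_newer(candidate: str, current: str) -> bool:
    return parse_version(candidate) > parse_version(current)


def file_sha256(path: str) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_verified(download: Callable[[str], None], dest: str,
                   expected_sha256: str, max_attempts: int = 3) -> bool:
    """Download until the local hash matches; delete the file after each mismatch."""
    for _ in range(max_attempts):
        download(dest)
        if file_sha256(dest) == expected_sha256.lower():
            return True  # only now is it safe to decrypt and decompress
        os.remove(dest)  # corrupted file: delete and re-download
    return False


if __name__ == "__main__":
    payloads = [b"truncated", b"model weights"]  # first attempt is corrupted

    def fake_download(path: str) -> None:  # hypothetical stand-in for the resumable transfer
        with open(path, "wb") as f:
            f.write(payloads.pop(0))

    dest = os.path.join(tempfile.mkdtemp(), "model_pkg.bin")
    if is_newer("1.10.0", "1.9.3"):
        ok = fetch_verified(fake_download, dest, hashlib.sha256(b"model weights").hexdigest())
        print("verified" if ok else "giving up")
```

Comparing parsed integer tuples rather than raw strings avoids the classic error where "1.10" is treated as older than "1.9".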
[0097] The edge-cloud collaborative interaction unit performs background preloading and computation-graph initialization. It maintains a double-buffered running area in the system memory of the edge computing control module 50, consisting of a main running area and a backup loading area. Meanwhile, the old policy model resides in the main running area and continues to control the multi-degree-of-freedom robotic arm 20. The edge-cloud collaborative interaction unit loads the decompressed new policy model into the backup loading area and calls the inference engine interface to build the computation graph and allocate memory for the new policy model, placing it in a hot-standby state.
[0098] The edge-cloud collaborative interaction unit performs atomic pointer switching under safe conditions. It reads the real-time motion state data of the multi-degree-of-freedom robotic arm 20. When it detects that the multi-degree-of-freedom robotic arm 20 is in a zero-velocity holding state with no pending action commands, it triggers the model switching logic. Under the protection of a mutex lock, the intelligent decision-making and planning unit swaps the function entry pointer: it instantly redirects the entry address of the inference call from the main running area to the backup loading area and marks the original main running area as pending release. The atomic pointer switch is completed within a single system control cycle, ensuring the continuity of the control signals.
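Paragraphs [0097] and [0098] describe a double buffer with hot standby and a guarded pointer swap. The following Python sketch mirrors that structure under stated assumptions: a model is any callable, the function entry pointer is an object reference protected by `threading.Lock`, a single warm-up call stands in for computation-graph construction and memory allocation, and the arm state is reduced to joint velocities plus a count of pending commands. The class and method names are hypothetical.

```python
"""Sketch of double-buffered hot standby plus a guarded entry-pointer swap.

Assumptions (not from the patent): a model is any callable, the inference
entry point is a reference guarded by threading.Lock, and the arm state is
reduced to joint velocities plus a pending-command count.
"""
import threading
from typing import Callable, Optional, Sequence

Model = Callable[[Sequence[float]], Sequence[float]]


class DoubleBufferedPolicy:
    def __init__(self, initial_model: Model) -> None:
        self._lock = threading.Lock()
        self._main: Model = initial_model             # main running area
        self._standby: Optional[Model] = None         # backup loading area
        self.pending_release: Optional[Model] = None  # old model awaiting release

    def infer(self, state: Sequence[float]) -> Sequence[float]:
        with self._lock:
            model = self._main  # read the entry pointer under the lock
        return model(state)

    def preload(self, new_model: Model, warmup_state: Sequence[float]) -> None:
        """Load into the backup area; the warm-up call stands in for graph setup."""
        new_model(warmup_state)
        self._standby = new_model

    def try_switch(self, joint_velocities: Sequence[float], pending_commands: int,
                   velocity_eps: float = 1e-6) -> bool:
        """Swap only when the arm is at zero velocity with no pending commands."""
        if self._standby is None or pending_commands > 0:
            return False
        if any(abs(v) > velocity_eps for v in joint_velocities):
            return False
        with self._lock:
            self.pending_release, self._main = self._main, self._standby
            self._standby = None
        return True


if __name__ == "__main__":
    policy = DoubleBufferedPolicy(lambda s: [1.0])  # hypothetical old model
    policy.preload(lambda s: [2.0], [0.0])          # hypothetical new model
    print(policy.try_switch([0.0, 0.0, 0.0], pending_commands=0), policy.infer([0.0]))
```

Keeping the previous model in `pending_release` rather than discarding it immediately is what makes a fast rollback of the kind described in [0099] cheap.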
[0099] The edge-cloud collaborative interaction unit implements runtime monitoring and automatic rollback. Within a preset monitoring window after the new policy model goes live, the unit collects statistics on the inference time and task success rate of the intelligent decision-making and planning unit. If it detects that the inference time exceeds a preset latency threshold or that task planning fails several times in a row, it determines that the new model is malfunctioning. It then immediately performs a rollback, resetting the function entry pointer to the previous model version stored in non-volatile memory, and generates an exception log file that is uploaded to the cloud server.
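The rollback rule in [0099] can be expressed as a small monitor fed with one record per inference call. This sketch assumes the monitoring window is measured in calls (`window_calls`), that "consecutive failures" means reaching a fixed count (`max_consecutive_failures`), and that the actual pointer reset and log upload are handled by an `on_rollback` callback; all of these names are hypothetical.

```python
"""Sketch of the post-switch monitor that decides when to roll back.

Assumptions (not from the patent): the window is counted in inference calls,
consecutive failures use a fixed limit, and the rollback itself is delegated
to a caller-supplied callback.
"""
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RollbackMonitor:
    latency_threshold_ms: float
    max_consecutive_failures: int
    window_calls: int
    on_rollback: Callable[[str], None]
    calls: int = 0
    consecutive_failures: int = 0
    rolled_back: bool = False
    log: List[str] = field(default_factory=list)

    def record(self, inference_ms: float, task_succeeded: bool) -> None:
        """Record one inference; trigger rollback on a latency or failure breach."""
        if self.rolled_back or self.calls >= self.window_calls:
            return  # already rolled back, or outside the monitoring window
        self.calls += 1
        self.consecutive_failures = 0 if task_succeeded else self.consecutive_failures + 1
        reason: Optional[str] = None
        if inference_ms > self.latency_threshold_ms:
            reason = f"latency {inference_ms:.1f} ms > {self.latency_threshold_ms} ms"
        elif self.consecutive_failures >= self.max_consecutive_failures:
            reason = f"{self.consecutive_failures} consecutive planning failures"
        if reason:
            self.rolled_back = True
            self.log.append(reason)  # exception log entry destined for the cloud
            self.on_rollback(reason)


if __name__ == "__main__":
    monitor = RollbackMonitor(20.0, 3, 100, on_rollback=print)
    for ok in (True, False, False, False):
        monitor.record(6.0, ok)
```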
[0100] Although embodiments of the invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.
Claims
1. An autonomous intelligent fruit and vegetable harvesting robot equipped with a deep learning model, characterized in that it comprises: a mobile frame (10) serving as the mobile base of the robot; a multi-degree-of-freedom robotic arm (20) for harvesting and pruning fruits and vegetables, the multi-degree-of-freedom robotic arm (20) being mounted on the mobile frame (10); a harvesting end effector (30) for approaching the target fruit along a planned path, the harvesting end effector (30) being fixedly connected to the end of the multi-degree-of-freedom robotic arm (20); a visual perception module (40) equipped with an RGB-D depth camera for acquiring environmental data, the visual perception module (40) being arranged on the front-end surface of the mobile frame (10); a wireless communication module (60) for exchanging data with a cloud server, the wireless communication module (60) being electrically connected to the upper surface of the mobile frame (10); and an edge computing control module (50) electrically connected to the mobile frame (10), the multi-degree-of-freedom robotic arm (20), the harvesting end effector (30), the visual perception module (40), and the wireless communication module (60), respectively; wherein the edge computing control module (50) integrates a visual perception data processing unit, an intelligent decision-making and planning unit, and an edge-cloud collaborative interaction unit; and the intelligent decision-making and planning unit is equipped with a reinforcement learning path planning module, a task allocation module, and a conflict monitoring and resolution module.
2. The autonomous intelligent fruit and vegetable harvesting robot equipped with a deep learning model according to claim 1, characterized in that the conflict monitoring and resolution module is configured to: construct capsule bounding volumes for the multi-degree-of-freedom robotic arm (20) and minimum oriented bounding boxes for the obstacles, and calculate the minimum Euclidean distance between them using a geometric intersection test algorithm; when the minimum Euclidean distance is less than a preset safety interference threshold, calculate an obstacle avoidance velocity vector based on artificial potential field logic; calculate the Jacobian matrix of the multi-degree-of-freedom robotic arm (20) and its corresponding null-space projection matrix; multiply the obstacle avoidance velocity vector by the null-space projection matrix to generate self-motion joint velocity components that act only in the null space of the robotic arm; and superimpose the self-motion joint velocity components on the joint velocity commands derived from the main task, so as to adjust the link configuration while keeping the end effector on the working trajectory of the main task.
3. The autonomous intelligent fruit and vegetable harvesting robot equipped with a deep learning model according to claim 2, characterized in that the conflict monitoring and resolution module is further configured to: calculate the manipulability index of the Jacobian matrix in real time and, when the manipulability index falls below a preset singularity warning value, activate a damped least squares solver that introduces a regularization damping factor on the diagonal of the product of the Jacobian matrix and its transpose; and monitor the number of oscillations of the multi-degree-of-freedom robotic arm (20) and, when the system is determined from this count to be in a deadlock state, retrieve the safe configuration data of the previous moment from a historical trajectory buffer and control the multi-degree-of-freedom robotic arm (20) to perform a retraction motion.
4. The autonomous intelligent fruit and vegetable harvesting robot equipped with a deep learning model according to claim 1, characterized in that the edge-cloud collaborative interaction unit is configured to perform uncertainty-based hard-example monitoring by: using model quantization to convert the policy network model distributed from the cloud into half-precision floating-point or integer format; during online simulation, calculating the information entropy of the action probability distribution output by the intelligent decision-making and planning unit, the information entropy quantifying the uncertainty of the policy network about the current decision and increasing with the dispersion of the action probability distribution; and, when the information entropy exceeds a preset uncertainty warning threshold, marking the current environmental state vector as an uncertainty hard-example sample.
5. The autonomous intelligent fruit and vegetable harvesting robot equipped with a deep learning model according to claim 4, characterized in that the edge-cloud collaborative interaction unit is further configured to: calculate a value deviation index equal to the absolute value of the difference between a target value, constructed from the immediate reward and the predicted value of the next state, and the predicted value of the current state, and, when the value deviation index exceeds a preset threshold, mark the sample as a value-deviation hard-example sample; when path planning fails to find a solution, the inverse kinematics solver returns a null value, the collision detection module issues an alarm signal, or visual closed-loop feedback confirms that fruit separation has failed, trigger negative-sample capture logic to construct a fault-scene data packet; and store only the uncertainty hard-example samples, the value-deviation hard-example samples, and the fault-scene data packets in a high-priority storage queue for upload through the wireless communication module (60).
6. The autonomous intelligent fruit and vegetable harvesting robot equipped with a deep learning model according to claim 5, characterized in that the edge-cloud collaborative interaction unit uploads the fault-scene data packet so that the cloud server can calibrate the parameters of the simulation environment; the parameter calibration is based on the distribution difference between real working environment data and data generated in the current virtual simulation environment, and uses system identification or generative adversarial network logic to inversely correct the light intensity distribution, texture features, object surface friction coefficient, and joint damping coefficient of the virtual simulation environment.
7. The autonomous intelligent fruit and vegetable harvesting robot equipped with a deep learning model according to claim 1, characterized in that the edge computing control module (50) runs a student policy network generated by the cloud server from a teacher policy network through knowledge distillation, the knowledge distillation comprising: introducing a temperature coefficient, dividing the unnormalized log probabilities of the actions output by the teacher policy network by the temperature coefficient, and then applying a normalized exponential (softmax) operation to generate a softened action probability distribution; and updating the weight parameters of the student policy network with the optimization objective of minimizing the relative entropy between the softened action probability distributions of the teacher policy network and the student policy network.
8. The autonomous intelligent fruit and vegetable harvesting robot equipped with a deep learning model according to claim 1, characterized in that the edge-cloud collaborative interaction unit is configured to perform model updates by: receiving a version update notification message containing a SHA-256 hash checksum; downloading an encrypted model deployment package using a file transfer protocol that supports resuming interrupted downloads; and calculating the local hash value of the downloaded file, and decrypting and decompressing the encrypted model deployment package only when the local hash value exactly matches the SHA-256 hash checksum.
9. The autonomous intelligent fruit and vegetable harvesting robot equipped with a deep learning model according to claim 8, characterized in that the edge-cloud collaborative interaction unit is further configured such that: the edge computing control module (50) maintains a double-buffered running area in its system memory, comprising a main running area and a backup loading area; and the new policy model is loaded into the backup loading area, and the inference engine interface is called to build the computation graph and allocate memory for the new policy model, placing it in a hot-standby state without affecting the operation of the old policy model residing in the main running area.
10. The autonomous intelligent fruit and vegetable harvesting robot equipped with a deep learning model according to claim 9, characterized in that the edge-cloud collaborative interaction unit is further configured to perform atomic pointer switching in a safe state: when it is detected that the multi-degree-of-freedom robotic arm (20) is in a zero-velocity holding state and there are no action commands pending execution, the function entry pointer swap is performed under the protection of a mutex lock mechanism; the function entry pointer swap is completed within a single system control cycle and redirects the entry address of the inference call from the main running area to the backup loading area; and if the inference time exceeds a preset latency threshold after the switch, a rollback operation is performed, pointing the function entry pointer to the previous model version stored in non-volatile memory.
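Claims 2 and 3 combine a primary end-effector velocity, a null-space self-motion term, and a damped least-squares fallback near singularities. The sketch below illustrates that combination for a planar three-link arm, an assumption made for brevity (the patent does not specify the arm's kinematics). The link lengths, the singularity warning value of 0.05, and the damping factor of 0.1 are illustrative; the obstacle-avoidance joint velocity is taken as an input rather than computed from a potential field; and all function names are hypothetical.

```python
"""Sketch of null-space obstacle avoidance with a damped least-squares fallback
(claims 2 and 3) for a planar 3-link arm.

Assumptions (not from the patent): planar kinematics, an (x, y) end-effector
velocity task, and a caller-supplied obstacle-avoidance joint velocity.
"""
import math
from typing import List, Sequence

Matrix = List[List[float]]


def jacobian(q: Sequence[float], lengths: Sequence[float]) -> Matrix:
    """2 x n positional Jacobian of a planar serial arm."""
    n = len(q)
    cum = [sum(q[: i + 1]) for i in range(n)]  # absolute link angles
    jx = [-sum(lengths[k] * math.sin(cum[k]) for k in range(i, n)) for i in range(n)]
    jy = [sum(lengths[k] * math.cos(cum[k]) for k in range(i, n)) for i in range(n)]
    return [jx, jy]


def _jjt(J: Matrix) -> List[float]:
    """Entries (a, b, d) of the symmetric 2x2 product J J^T."""
    a = sum(v * v for v in J[0])
    b = sum(u * v for u, v in zip(J[0], J[1]))
    d = sum(v * v for v in J[1])
    return [a, b, d]


def manipulability(J: Matrix) -> float:
    """Yoshikawa index sqrt(det(J J^T))."""
    a, b, d = _jjt(J)
    return math.sqrt(max(a * d - b * b, 0.0))


def pseudo_inverse(J: Matrix, damping: float = 0.0) -> Matrix:
    """J^T (J J^T + damping^2 I)^-1, i.e. damped least squares when damping > 0."""
    a, b, d = _jjt(J)
    a, d = a + damping ** 2, d + damping ** 2
    det = a * d - b * b
    inv = [[d / det, -b / det], [-b / det, a / det]]
    return [[J[0][i] * inv[0][c] + J[1][i] * inv[1][c] for c in range(2)]
            for i in range(len(J[0]))]


def joint_velocity(J: Matrix, xdot: Sequence[float], qdot_avoid: Sequence[float],
                   singularity_warning: float = 0.05, damping: float = 0.1) -> List[float]:
    """qdot = J# xdot + (I - J# J) qdot_avoid, switching to DLS near singularities."""
    lam = damping if manipulability(J) < singularity_warning else 0.0
    Jp = pseudo_inverse(J, lam)
    n = len(Jp)
    # Null-space projector N = I - J# J (only approximate when damping is active).
    N = [[(1.0 if i == j else 0.0) - sum(Jp[i][r] * J[r][j] for r in range(2))
          for j in range(n)] for i in range(n)]
    primary = [sum(Jp[i][r] * xdot[r] for r in range(2)) for i in range(n)]
    self_motion = [sum(N[i][j] * qdot_avoid[j] for j in range(n)) for i in range(n)]
    return [p + s for p, s in zip(primary, self_motion)]
```

With zero damping the self-motion term leaves the end-effector velocity unchanged because J(I - J^+J) = 0; once damping is active the projector is only approximate, the usual trade-off accepted in exchange for bounded joint velocities near singularities.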