Agricultural product sorting defect identification method based on reinforcement learning

A reinforcement learning mechanism that combines cell complex partitioning with Lie group manifold orthogonal correction addresses decision instability in agricultural product sorting strategy networks under complex environments, with the stated aim of efficient defect identification and improved sorting accuracy.

CN122252402A · Pending · Publication Date: 2026-06-23 · BEIJING WUFU JIYE AGRICULTURAL TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
BEIJING WUFU JIYE AGRICULTURAL TECHNOLOGY CO LTD
Filing Date
2026-03-31
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Existing agricultural product sorting strategy networks struggle to accurately define decision boundaries when faced with interference factors such as changes in lighting, shading, or blurred defect characteristics. This leads to significant fluctuations in sorting decisions and a lack of quantitative risk assessment for uncertain areas. Consequently, sorting strategies cannot be dynamically adjusted, which can result in the erroneous rejection of high-quality agricultural products or the missed detection of defective products, thus limiting the robustness and economic benefits of the sorting system.

Method used

A cell complex partition with boundary operator operations, based on homology group risk blind zone mapping, is used to extract topological invariants. A random point set is then generated and a Delaunay triangulation is constructed to screen stable confidence regions. Combined with an online parameter update mechanism based on Lie group manifold orthogonal correction, this improves the robustness and accuracy of defect identification.

Benefits of technology

It addresses the curse of dimensionality and initialization instability in high-dimensional parameter spaces, gives the sorting strategy network a geometrically optimal parameter structure, improves the robustness of defect identification and sorting accuracy, and provides a sorting solution with high generalization ability.

✦ Generated by Eureka AI based on patent content.

Figure CN122252402A_ABST

Abstract

The application discloses a reinforcement learning-based method for identifying defects in agricultural product sorting, relating to the technical field of computer vision. The method includes the following steps: S1, constructing a reinforcement learning state space; S2, generating a total reinforcement learning reward value; S3, outputting sorting strategy network initialization parameters; S4, outputting an initial sorting strategy network; S5, using an improved Crossformer model to construct a kernel matrix mapping that generates a difference sequence tensor, computing truncated log-signature reconstruction parameters, and generating an online sorting strategy network; S6, outputting sorting action instructions; S7, outputting the final online sorting strategy network. The application overcomes the manifold structure collapse and parameter drift caused by traditional gradient updates, improves defect identification robustness and sorting precision while maintaining the orthogonality of the network manifold, and provides a solution with high generalization ability for intelligent sorting of agricultural products.

Description

Technical Field

[0001] This invention relates to the field of computer vision technology, and in particular to a reinforcement learning-based method for identifying defects in agricultural product sorting.

Background Technology

[0002] As the demands for real-time performance and accuracy in agricultural product sorting continue to increase, deep learning-based sorting strategy networks excel at feature extraction but still face significant theoretical challenges in handling complex defect decisions. Existing online sorting strategy networks primarily rely on the geometric distance between feature vectors or on simple threshold segmentation to distinguish defects. This approach, based purely on numerical differences, ignores the complex topological structure and uncertainty inherent in the deep feature space of the image. When faced with interference such as changes in illumination, occlusion, or blurred defect features, traditional methods struggle to define decision boundaries accurately, leading to significant fluctuations in sorting decisions. Furthermore, existing sorting decision mechanisms often lack a quantitative risk assessment for uncertain regions and cannot dynamically adjust the sorting strategy according to risk cost when features are blurred. This can easily result in the erroneous rejection of high-quality agricultural products or the missed detection of defective products, limiting the robustness and economic efficiency of the sorting system.

[0003] Therefore, how to provide a reinforcement learning-based method for identifying defects in agricultural product sorting is a problem that urgently needs to be solved by those skilled in the art.

Summary of the Invention

[0004] This invention proposes a reinforcement learning-based method for identifying defects in agricultural product sorting. Through a reinforcement learning reward correction mechanism based on homology group risk blind zone mapping, the risk cost convex set is partitioned into a cell complex and boundary operator operations are applied to extract topological invariants, including Betti numbers. On this basis, potential risk blind zones in the decision space are mapped and used to correct the penalty vector. The corrected total reward value induces a probability density field from which a random point set is generated. A Delaunay triangulation is constructed to establish neighborhood relationships, stable confidence regions are screened, and tensor train decomposition is performed to update the core tensor sequence on a low-dimensional manifold. By establishing an optimization path from "topological risk assessment" to "low-dimensional manifold parameter initialization", this mechanism addresses the curse of dimensionality and initialization instability in high-dimensional parameter spaces, so that the sorting strategy network starts from a geometrically and topologically optimal parameter structure. Furthermore, through an online parameter update mechanism based on Lie group manifold orthogonal correction, polar decomposition is performed on the manually verified error tensor to obtain rotation and scale components. A normalized rotation matrix is constructed using the Lie algebra exponential map, and the network parameters are corrected by a mapping composed through Lie group multiplication. By constructing closed-loop feedback from manual error verification to manifold structure correction, this mechanism overcomes the manifold collapse and parameter drift caused by traditional gradient updates. It improves the robustness of defect identification and sorting accuracy while maintaining the orthogonality of the network manifold, providing a highly generalizable solution for intelligent sorting of agricultural products.

[0005] According to an embodiment of the present invention, a reinforcement learning-based method for identifying defects in agricultural product sorting specifically includes:
S1. Map the original agricultural product image into a sorting state hypercomplex matrix, extract defect features using hypercomplex convolution, and calculate the phase offset angle relative to the sorting environment parameters to construct the reinforcement learning state space.
S2. Partition the risk cost convex set over the reinforcement learning state space to construct a cell complex, extract homology group features using the boundary operator, calculate the Betti numbers to map risk blind zones, and correct the penalty vector to generate the total reinforcement learning reward value.
S3. Generate a random point set from the probability density field induced by the total reinforcement learning reward value, construct a Delaunay triangulation to establish neighborhoods, screen stable confidence regions and perform weighted smoothing, and output the sorting strategy network initialization parameters.
S4. Reorganize the sorting strategy network initialization parameters into a high-order parameter tensor and perform tensor train decomposition, update the core tensor sequence on the low-dimensional manifold using the cross approximation algorithm, and output the initial sorting strategy network.
S5. Based on the initial sorting strategy network, construct the fine-tuning signal time path; use the improved Crossformer model to calculate Hellinger distances and construct a kernel matrix mapping that generates the difference sequence tensor; capture long-range correlations through a matrix product operator decomposition; generate cross-dimensional dependency features through orthogonal projection with algebraic invariants; and, after normalization and mapping, reconstruct the parameters with the truncated log-signature to generate the online sorting strategy network.
S6. Input real-time agricultural product images into the online sorting strategy network, calculate the upper and lower approximation sets of the decision classes based on the defect feature equivalence relation, measure uncertainty using rough entropy, and output sorting action instructions according to the minimum risk principle.
S7. Execute the sorting action instructions and obtain manual verification results; obtain rotation and scale components from the polar decomposition of the verification error tensor; correct the direction using the exponential map and adjust the scale using the confidence level; compose the mapping by Lie group multiplication and output the final online sorting strategy network.

[0006] Optionally, S1 specifically includes:
S11. Map the RGB three-channel pixel values of the original agricultural product image to the imaginary parts of a hypercomplex number and, with the real part preset to zero, construct the sorting state hypercomplex matrix.
S12. Construct a multi-channel convolution kernel weight matrix, perform sliding-window multiply-accumulate operations between the sorting state hypercomplex matrix and the multi-channel convolution kernel weight matrix, and extract the hypercomplex feature map.
S13. Calculate the hypercomplex modulus and phase angle of each element in the hypercomplex feature map, and use the hypercomplex modulus as the defect feature amplitude.
S14. Obtain the sorting environment parameters and map them to complex environmental features. Calculate the difference between the feature phase angle corresponding to the defect feature amplitude and the environmental phase angle of the complex environmental features, and use it as the phase offset angle.
S15. Concatenate the defect feature amplitudes and phase offset angles into a vector and flatten it to generate the state feature vector. Define the state feature vector as the reinforcement learning state space perceived by the reinforcement learning agent.

[0007] Optionally, S2 specifically includes:
S21. Based on the multidimensional topological structure of the reinforcement learning state space, discretize the convex sets of missed detection risk and overkill cost and construct a cell complex structure.
S22. Define an addition operation on the cell complex structure to generate chain groups, use the boundary operator to calculate the boundary chains of the chain groups, and construct the chain complex connecting chain groups of different dimensions.
S23. Calculate the kernel space and image space of the boundary operator from the chain complex, and use homology group theory to extract the topological invariants of the quotient of the kernel space by the image space, generating a topological feature vector that reflects the boundary features of risk and cost.
S24. Calculate the Betti numbers from the topological feature vector to quantify the connected components and internal holes of the risk convex set, map the internal holes to potential risk blind zones in the decision space, use the Betti numbers as weighting coefficients to correct the comprehensive penalty vector, and output the total reinforcement learning reward value.

[0008] Optionally, S3 specifically includes:
S31. Construct a probability density field over the parameter space based on the total reinforcement learning reward value, generate a non-uniform random point set using a Poisson point process, and sample it to obtain the discrete distribution of candidate parameter particles.
S32. Construct a Delaunay triangulation of the random point set on the parameter manifold, and use the empty circumcircle property and geodesic distance to determine the local neighborhood relations and dual Voronoi regions of the parameter particles.
S33. Calculate the centroid coordinates and circumsphere radius of each Delaunay simplex and, combined with the fitness potential induced by the total reward value, select the neighborhoods of topologically stable simplices as the confidence regions for parameter search.
S34. Aggregate the parameter information of neighboring particles with weights based on the centroid coordinates, eliminate isolated noise points in the parameter space with a local topology smoothing operator, and output the sorting strategy network initialization parameters with the optimal geometric topology.
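The simplex quantities named in S33 can be illustrated for the two-dimensional case. The following is a minimal sketch, assuming each Delaunay simplex is a triangle given as three (x, y) vertices; the function names are hypothetical, and the source does not specify how the centroid and circumradius are combined with the fitness potential, so no screening rule is shown.

```python
import math

def centroid(tri):
    """Centroid of a triangle given as three (x, y) vertices."""
    xs, ys = zip(*tri)
    return sum(xs) / 3.0, sum(ys) / 3.0

def circumradius(tri):
    """Circumcircle radius R = abc / (4 * area); infinite for degenerate triangles."""
    (ax, ay), (bx, by), (cx, cy) = tri
    a = math.hypot(bx - cx, by - cy)
    b = math.hypot(ax - cx, ay - cy)
    c = math.hypot(ax - bx, ay - by)
    area = abs((bx - ax) * (cy - ay) - (cx - ax) * (by - ay)) / 2.0
    if area == 0.0:
        return math.inf
    return a * b * c / (4.0 * area)

# Example: a 3-4-5 right triangle (illustrative data only).
triangle = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
center = centroid(triangle)
radius = circumradius(triangle)
```

A small circumradius relative to edge length indicates a well-shaped simplex, which is one plausible ingredient of a stability criterion, though the patent text does not state the exact rule.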

[0009] Optionally, S4 specifically includes:
S41. Reorganize the sorting strategy network initialization parameters into a higher-order parameter tensor, perform tensor train decomposition on the higher-order parameter tensor, and generate a tensor train format composed of the core tensor sequence.
S42. Construct a low-dimensional parameter manifold based on the tensor train format, calculate the Riemannian gradient on the low-dimensional parameter manifold, and use the cross approximation algorithm to iteratively update the core tensor sequence.
S43. During the iterations of the cross approximation algorithm, calculate the residual norm of the core tensor sequence. When the residual norm falls below the preset convergence threshold, terminate the iteration and output the optimized core tensor sequence.
S44. Contract the optimized core tensor sequence to reconstruct the higher-order parameter tensor, and map the higher-order parameter tensor back to the network parameter space to generate the target strategy network parameters.
S45. Based on the target strategy network parameters, configure the neural network weights and biases to construct and output the initial sorting strategy network. This includes: setting the initial sorting strategy network as a multilayer perceptron with one input layer, two hidden layers, and one output layer; setting the number of input nodes to the dimension of the vector obtained by concatenating the defect feature amplitude and phase offset angle, so that it receives the reinforcement learning state space vector; setting the number of neurons in each hidden layer to 64 and fitting the nonlinear mapping from state to action with the ReLU function; setting the number of output nodes to the total number of sorting action instruction categories and outputting the probability distribution over sorting actions with the Softmax function; and defining the weight matrices and bias vectors as the trainable parameters of the initial sorting strategy network.
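The multilayer perceptron described in S45 (two hidden layers of 64 ReLU units and a Softmax output) can be sketched as follows. This is an illustrative NumPy forward pass only: the class name `SortingPolicyMLP`, the example dimensions, and the random initialization are assumptions, whereas in the patent the weights come from the reconstructed tensor train parameters.

```python
import numpy as np

class SortingPolicyMLP:
    """Input layer -> 64 ReLU -> 64 ReLU -> Softmax over sorting actions."""

    def __init__(self, state_dim, n_actions, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        sizes = [state_dim, hidden, hidden, n_actions]
        # Placeholder random initialization (assumption); the patent instead
        # loads weights reconstructed from the optimized core tensor sequence.
        self.weights = [
            rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n))
            for m, n in zip(sizes[:-1], sizes[1:])
        ]
        self.biases = [np.zeros(n) for n in sizes[1:]]

    def forward(self, state):
        h = np.asarray(state, dtype=float)
        # Two hidden layers with ReLU activation.
        for W, b in zip(self.weights[:-1], self.biases[:-1]):
            h = np.maximum(h @ W + b, 0.0)
        logits = h @ self.weights[-1] + self.biases[-1]
        # Numerically stable softmax.
        logits = logits - logits.max()
        probs = np.exp(logits)
        return probs / probs.sum()

# Example: an 8-dimensional state vector and 3 sorting actions (illustrative sizes).
policy = SortingPolicyMLP(state_dim=8, n_actions=3)
action_probs = policy.forward(np.ones(8))
```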

[0010] Optionally, the improved Crossformer model includes a probability distribution embedding layer, a tensor network operator interaction layer, an affine variety algebraic mixing layer, and an output projection layer:
The probability distribution embedding layer constructs the fine-tuning signal time path input tensor based on the initial sorting strategy network. Each sub-dimensional vector of the input tensor is assumed to follow an implicit probability distribution. The Hellinger distance between each pair of sub-dimensional vectors is calculated, the Hellinger kernel matrix is constructed, and the probability distribution difference dimension sequence tensor is generated by mapping through kernel principal component analysis.
The tensor network operator interaction layer receives the probability distribution difference dimension sequence tensor, constructs a parameterized interaction kernel in matrix product operator format, decomposes the high-dimensional feature interaction matrix into a low-dimensional core tensor sequence through tensor train decomposition, and uses the core tensor sequence to capture long-range correlations between dimensions and aggregate feature information, generating the global interaction representation tensor.
The affine variety algebraic mixing layer receives the global interaction representation tensor, treats the local temporal feature vectors as a point set in affine space, calculates the generators of the ideal that defines the feature manifold and the dimension of the variety, and uses algebraic geometric invariants to orthogonally project and fuse the features, generating the cross-dimensional dependency features.
The output projection layer receives the cross-dimensional dependency features output by the affine variety algebraic mixing layer, applies layer normalization to them, and maps them to the target dimension through a fully connected layer to output the cross-dimensional dependency feature vector. The truncated log-signature is then calculated and the parameter update is reconstructed to generate the online sorting strategy network.
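The distance used in the probability distribution embedding layer can be made concrete. The sketch below computes the Hellinger distance between discrete distributions and assembles a pairwise kernel matrix. The Gaussian-type kernel form and the `sigma` parameter are assumptions, since the source does not state how the kernel is built from the distance, and the step that turns each sub-dimensional vector into a distribution is not shown.

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete distributions of equal length."""
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(total) / math.sqrt(2.0)

def hellinger_kernel_matrix(dists, sigma=1.0):
    """Pairwise kernel matrix K[i][j] = exp(-H(p_i, p_j)^2 / (2 * sigma^2)).

    The Gaussian form is an assumption made for illustration.
    """
    return [
        [math.exp(-hellinger(p, q) ** 2 / (2.0 * sigma ** 2)) for q in dists]
        for p in dists
    ]

# Example: three distributions over two outcomes (illustrative data only).
distributions = [[0.5, 0.5], [0.9, 0.1], [1.0, 0.0]]
K = hellinger_kernel_matrix(distributions)
```

The Hellinger distance lies in [0, 1], reaching 1 for distributions with disjoint support, which keeps the kernel entries bounded regardless of feature scale.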

[0011] Optionally, S6 specifically includes:
S61. Input real-time agricultural product images into the online sorting strategy network, extract deep image features and map them to the decision information system, construct the defect feature equivalence relation based on the decision information system, and partition the decision space into granules.
S62. Calculate the upper approximation set and lower approximation set of each decision class based on the defect feature equivalence relation, and use the boundary region between the upper and lower approximation sets to quantify the uncertainty region of the sorting decision.
S63. Construct a rough entropy model based on the uncertainty region of the sorting decision, calculate the uncertainty measure of the system, and construct a minimum-risk Bayes decision rule in combination with the loss function of the sorting decision.
S64. Calculate the expected risk of each sorting action according to the minimum-risk Bayes decision rule, compare the expected risks and select the action with the minimum risk as the optimal decision, and convert the optimal decision into a control instruction to output the sorting action instruction.
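Steps S62 and S64 follow standard rough set and Bayes decision constructions, which can be sketched as below. The feature tuples, class labels, posterior probabilities, and loss values are made-up illustrations, and the rough entropy measure of S63 is omitted because the source does not give its formula.

```python
from collections import defaultdict

def approximations(objects, target):
    """Lower and upper approximations of `target` under feature equivalence.

    objects: dict mapping object id -> tuple of discretized features.
    target:  set of object ids belonging to the decision class.
    """
    classes = defaultdict(set)
    for obj_id, features in objects.items():
        classes[features].add(obj_id)  # identical features -> same granule
    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:
            lower |= block  # granule lies entirely inside the class
        if block & target:
            upper |= block  # granule overlaps the class
    return lower, upper

def min_risk_action(posteriors, loss):
    """Pick the action minimizing expected loss sum_s loss[a][s] * P(s)."""
    risks = {a: sum(loss[a][s] * p for s, p in posteriors.items()) for a in loss}
    return min(risks, key=risks.get), risks

# Illustrative data: four items, decision class "defective" = {1, 4}.
items = {1: ("dark", "soft"), 2: ("dark", "soft"), 3: ("light", "firm"), 4: ("dark", "firm")}
lower, upper = approximations(items, {1, 4})
boundary = upper - lower  # uncertainty region of the sorting decision

# Missing a defect is assumed to cost five times a false rejection.
loss = {"reject": {"defective": 0.0, "good": 1.0}, "accept": {"defective": 5.0, "good": 0.0}}
action, risks = min_risk_action({"defective": 0.3, "good": 0.7}, loss)
```

Items 1 and 2 share identical features but differ in class, so both fall in the boundary region; this is the kind of ambiguity the minimum-risk rule then resolves using the loss values.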

[0012] Optionally, S7 specifically includes:
S71. Execute the sorting action instruction and obtain the manual review result. Based on the manual review result, construct the error tensor of the strategy parameter matrix. Use the Lie algebra structure of the orthogonal group to perform polar decomposition on the error tensor and output the orthogonal rotation component and the positive definite scale component.
S72. Using the orthogonal rotation component, construct a normalized rotation matrix through the Lie algebra exponential map, perform geometric attitude correction on the direction of the strategy eigenvectors, and output the direction correction matrix.
S73. Using the positive definite scale component and the direction correction matrix, combined with the complex confidence level, construct an adaptive scaling factor, adjust the eigenvalues for regularization, and output the intensity optimization matrix.
S74. Compose the direction correction matrix and the intensity optimization matrix by Lie group multiplication, map the result back to the original parameter space topology, and output the final online sorting strategy network that maintains the orthogonality of the manifold.
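The two linear algebra building blocks named in S71 and S72 can be sketched with NumPy. The polar decomposition below is computed through the SVD, and the exponential map uses a truncated Taylor series; both are generic textbook constructions chosen for illustration, and the example matrices are arbitrary. How the patent combines them with the confidence level in S73 is not specified and is not shown.

```python
import numpy as np

def polar_decompose(E):
    """Polar decomposition E = R @ P with R orthogonal and P symmetric PSD."""
    U, s, Vt = np.linalg.svd(E)
    R = U @ Vt                      # orthogonal component
    P = Vt.T @ np.diag(s) @ Vt      # symmetric positive semidefinite scale component
    return R, P

def expm_skew(A, terms=30):
    """Matrix exponential by truncated Taylor series.

    For a skew-symmetric A (an element of the Lie algebra so(n)),
    the result is a rotation matrix in SO(n).
    """
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k         # A^k / k!
        result = result + term
    return result

# Arbitrary example error matrix.
E = np.array([[2.0, -1.0], [1.0, 3.0]])
R, P = polar_decompose(E)

# Exponential map of a 2x2 skew-symmetric generator with angle 0.5 rad.
theta = 0.5
Q = expm_skew(np.array([[0.0, -theta], [theta, 0.0]]))
```

Note that the orthogonal factor returned by the SVD route can have determinant -1 (a reflection) when the input has negative determinant, so a pure rotation is only guaranteed for inputs with positive determinant.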

[0013] The beneficial effects of this invention are: (1) This invention achieves efficient initialization and dimensionality reduction of the parameter space for reinforcement learning strategy networks by constructing a parameter initialization mechanism based on Delaunay triangulation and a manifold optimization mechanism based on tensor train decomposition. The total reward value induces a probability density field from which a random point set is generated. Neighborhood relationships are established and stable confidence regions are selected through Delaunay triangulation, ensuring the geometric stability of the initial parameters on the topological manifold. The higher-order parameter tensor is reorganized into a core tensor sequence using tensor train decomposition, Riemannian gradient updates are performed on the low-dimensional manifold, and the core tensors are iteratively optimized using the cross approximation algorithm. By combining geometric topological constraints with tensor algebraic decomposition, this mechanism addresses the curse of dimensionality in high-dimensional parameter spaces and provides the strategy network with an initial parameter solution that has a favorable manifold structure.

[0014] (2) This invention captures long-range correlations in the fine-tuning signals and extracts cross-dimensional dependency features by constructing an improved Crossformer model with an interaction mechanism based on algebraic geometric invariants. The Hellinger distance is used to construct a kernel matrix mapping that generates the probability distribution difference sequence tensor, and a matrix product operator decomposition is used to capture long-range correlations between dimensions. An affine variety algebraic mixing layer calculates the ideal generators and the dimension of the variety, and orthogonal projection and fusion operations generate cross-dimensional dependency features with algebraic invariance. This mechanism maps time-series fine-tuning signals to an algebraic geometric space and, through the combined effect of tensor network operators and algebraic invariants, improves the ability of the online sorting strategy network to represent complex defect features dynamically.

Description of the Drawings

[0015] The accompanying drawings are provided to further illustrate the invention and form part of the specification. Together with the embodiments of the invention, they serve to explain the invention and do not limit it. In the drawings:
Figure 1 is an overall flowchart of the reinforcement learning-based method for identifying defects in agricultural product sorting proposed in this invention.
Figure 2 is a flowchart illustrating the working principle of the improved Crossformer model in the reinforcement learning-based method for identifying defects in agricultural product sorting proposed in this invention.

Detailed Implementation

[0016] The invention will now be described in further detail with reference to the accompanying drawings. These drawings are simplified schematic diagrams, illustrating only the basic structure of the invention, and therefore only show the components relevant to the invention.

[0017] Referring to Figure 1 and Figure 2, a reinforcement learning-based method for identifying defects in agricultural product sorting specifically includes:
S1. Map the original agricultural product image into a sorting state hypercomplex matrix, extract defect features using hypercomplex convolution, and calculate the phase offset angle relative to the sorting environment parameters to construct the reinforcement learning state space.
S2. Partition the risk cost convex set over the reinforcement learning state space to construct a cell complex, extract homology group features using the boundary operator, calculate the Betti numbers to map risk blind zones, and correct the penalty vector to generate the total reinforcement learning reward value.
S3. Generate a random point set from the probability density field induced by the total reinforcement learning reward value, construct a Delaunay triangulation to establish neighborhoods, screen stable confidence regions and perform weighted smoothing, and output the sorting strategy network initialization parameters.
S4. Reorganize the sorting strategy network initialization parameters into a high-order parameter tensor and perform tensor train decomposition, update the core tensor sequence on the low-dimensional manifold using the cross approximation algorithm, and output the initial sorting strategy network.
S5. Based on the initial sorting strategy network, construct the fine-tuning signal time path; use the improved Crossformer model to calculate Hellinger distances and construct a kernel matrix mapping that generates the difference sequence tensor; capture long-range correlations through a matrix product operator decomposition; generate cross-dimensional dependency features through orthogonal projection with algebraic invariants; and, after normalization and mapping, reconstruct the parameters with the truncated log-signature to generate the online sorting strategy network.
S6. Input real-time agricultural product images into the online sorting strategy network, calculate the upper and lower approximation sets of the decision classes based on the defect feature equivalence relation, measure uncertainty using rough entropy, and output sorting action instructions according to the minimum risk principle.
S7. Execute the sorting action instructions and obtain manual verification results; obtain rotation and scale components from the polar decomposition of the verification error tensor; correct the direction using the exponential map and adjust the scale using the confidence level; compose the mapping by Lie group multiplication and output the final online sorting strategy network.

[0018] In this embodiment, S1 specifically includes: S11. Scan the pixel matrix of the original agricultural product image and read the intensity value of each pixel in the red, green, and blue channels in sequence. Set the red channel value as the first imaginary component of the hypercomplex number, the green channel value as the second imaginary component, and the blue channel value as the third imaginary component. Set the real component to 0, and combine the real component with the three imaginary components to construct the sorting state hypercomplex matrix.

[0019] S12. Construct a quaternion convolution kernel containing a real part and three imaginary parts. Slide the convolution window over the sorting state hypercomplex matrix and extract the hypercomplex pixels at each position within the window. Perform the Hamilton product, specifically: multiply the real part and three imaginary parts of the quaternion convolution kernel with those of the pixel; to obtain the output real part, subtract from the product of the kernel's real part and the pixel's real part the products of each imaginary part of the kernel with the corresponding imaginary part of the pixel; to obtain each output imaginary part, add the product of the kernel's real part and the pixel's imaginary part to the product of the kernel's corresponding imaginary part and the pixel's real part, and add the cross-product terms between the differing imaginary parts of the kernel and the pixel; accumulate the results at all positions within the window to generate the hypercomplex feature map.
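The Hamilton product in S12 can be written out component by component. The following is a minimal sketch in plain Python, assuming quaternions are stored as (real, i, j, k) tuples and that one window response is the sum of kernel-times-pixel products over the window positions; the function names and the example values are illustrative.

```python
def hamilton_product(q, p):
    """Hamilton product q * p of two quaternions given as (real, i, j, k)."""
    a1, b1, c1, d1 = q
    a2, b2, c2, d2 = p
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # first imaginary part
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # second imaginary part
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # third imaginary part
    )

def window_response(kernel, window):
    """Accumulate kernel[k] * window[k] over all positions in one window."""
    total = [0, 0, 0, 0]
    for weight, pixel in zip(kernel, window):
        prod = hamilton_product(weight, pixel)
        total = [t + x for t, x in zip(total, prod)]
    return tuple(total)

# RGB pixels become pure quaternions (real part 0), as in S11.
window = [(0, 10, 20, 30), (0, 1, 2, 3)]
kernel = [(1, 0, 0, 0), (2, 0, 0, 0)]   # real-valued weights for illustration
response = window_response(kernel, window)
```

The product is not commutative, so the order kernel-times-pixel matters; swapping the operands flips the sign of the cross-product terms.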

[0020] S13. Traverse each feature element in the hypercomplex feature map and read its real part and three imaginary parts. Calculate the sum of the squares of the real part and the three imaginary parts, take the square root of that sum to obtain the hypercomplex modulus, and use the modulus as the defect feature amplitude. Calculate the arctangent of the ratio of the real part to the first imaginary part to obtain the feature phase angle.
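The two quantities in S13 reduce to a few lines of arithmetic. In the sketch below, `math.atan2` is used instead of a plain arctangent of the ratio; this is an assumption made to avoid division by zero, and it agrees with the arctangent of the ratio whenever the first imaginary part is positive.

```python
import math

def modulus_and_phase(q):
    """Return (defect feature amplitude, feature phase angle) for a quaternion."""
    real, imag1, imag2, imag3 = q
    modulus = math.sqrt(real ** 2 + imag1 ** 2 + imag2 ** 2 + imag3 ** 2)
    # Arctangent of real / imag1; atan2 also handles imag1 == 0 (assumption).
    phase = math.atan2(real, imag1)
    return modulus, phase

# Example feature element (illustrative values).
amplitude, phase = modulus_and_phase((1.0, 2.0, 2.0, 4.0))
```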

[0021] S14. Collect the temperature and humidity values of the sorting environment, set the temperature weighting coefficient to 0.6 and the humidity weighting coefficient to 0.4, and calculate the environmental feature scalar as the weighted sum of the temperature and humidity values. Obtain the maximum and minimum values of the environmental feature scalar, subtract the minimum from the environmental feature scalar, and divide by the difference between the maximum and minimum to obtain the normalized value. Multiply the normalized value by 2π to obtain the environmental phase angle. Calculate the absolute value of the difference between the feature phase angle and the environmental phase angle, and use it as the phase offset angle.
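The arithmetic in S14 can be followed with a short worked example. The weights 0.6 and 0.4 come from the text; the temperature, humidity, and scalar bounds used below are made-up inputs, and the function names are illustrative.

```python
import math

def environment_phase(temp, humidity, s_min, s_max, w_temp=0.6, w_hum=0.4):
    """Map temperature and humidity to an environmental phase angle in [0, 2*pi]."""
    scalar = w_temp * temp + w_hum * humidity           # weighted sum
    normalized = (scalar - s_min) / (s_max - s_min)     # min-max normalization
    return 2.0 * math.pi * normalized

def phase_offset(feature_phase, env_phase):
    """Absolute difference between feature and environmental phase angles."""
    return abs(feature_phase - env_phase)

# Example: 25 degrees and 50 % humidity give a scalar of 35;
# with assumed bounds [20, 50] the normalized value is 0.5.
env_phase = environment_phase(25.0, 50.0, s_min=20.0, s_max=50.0)
offset = phase_offset(math.pi / 4, env_phase)
```

With these inputs the environmental phase angle is π and the phase offset for a feature phase of π/4 is 3π/4.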

[0022] S15. Concatenate the defect feature amplitudes and the phase offset angles in dimensional order. Flatten the concatenated two-dimensional array into a one-dimensional array to generate the state feature vector, and set the state feature vector as the reinforcement learning state space perceived by the reinforcement learning agent.

[0023] In this embodiment, S2 specifically includes: S21. Traverse all data points in the reinforcement learning state space, set the missed detection risk weight coefficient to 0.7 and the overkill cost weight coefficient to 0.3, and calculate the weighted risk value of each data point. Group data points whose weighted risk values fall in the same range into a sub-region, and construct a missed detection risk convex set and an overkill cost convex set composed of multiple sub-regions. Set the discretization grid size to 0.5, cut along the boundaries of the risk convex set and the cost convex set, and divide the continuous risk surface into a finite number of discrete cells. Each cell records the number of data points it contains and their average risk value, and the cells are combined to construct the cell complex structure.
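The weighting and grid discretization in S21 can be approximated by simple binning. The sketch below is a simplification under stated assumptions: data points are taken to have two coordinates, each carries its own missed detection risk and overkill cost, and cells are axis-aligned squares of side 0.5. The grouping into convex sub-regions described in the text is not reproduced.

```python
import math
from collections import defaultdict

def build_cells(points, w_miss=0.7, w_overkill=0.3, grid=0.5):
    """Bin points into grid cells; each cell stores (count, mean weighted risk).

    points: iterable of (x, y, miss_risk, overkill_cost) tuples (assumed layout).
    """
    acc = defaultdict(lambda: [0, 0.0])
    for x, y, miss, overkill in points:
        risk = w_miss * miss + w_overkill * overkill    # weighted risk value
        key = (math.floor(x / grid), math.floor(y / grid))
        acc[key][0] += 1
        acc[key][1] += risk
    return {key: (n, total / n) for key, (n, total) in acc.items()}

# Illustrative data: two points fall into cell (0, 0), one into cell (1, 0).
sample_points = [
    (0.1, 0.1, 1.0, 0.0),
    (0.4, 0.2, 0.0, 1.0),
    (0.6, 0.1, 1.0, 1.0),
]
cells = build_cells(sample_points)
```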

[0024] S22. Assign a unique index number to each discrete cell in the cell complex structure. Define the addition rule as merging the data point counts and risk values of adjacent cells. Traverse all adjacent cells and perform the addition operation to generate chain groups containing multiple cells. Construct the boundary operator matrix: determine the boundary index numbers of each cell in the chain group, set the entry at each boundary position to 1 and the entry at each interior position to 0. Calculate the boundary chains of the chain groups by matrix multiplication. Based on the connection relationships of the boundary chains, arrange them into the chain complex that connects chain groups of different dimensions.

[0025] S23. Read the boundary operator matrix of the current dimension and the boundary operator matrix of the next dimension from the chain complex. Construct the augmented matrix of the current-dimension boundary operator matrix, apply Gaussian elimination row operations to bring it to reduced row echelon form, and extract the set of solution vectors corresponding to the all-zero rows as the kernel space of the current dimension. Calculate the product of the next-dimension boundary operator matrix and its transpose, diagonalize it to extract the non-zero eigenvalues, and take the set of eigenvectors corresponding to the non-zero eigenvalues as the image space of the next dimension. Count the dimensions of the kernel space and the image space, and subtract the dimension of the image space from the dimension of the kernel space to obtain the homology group dimension. Extract the index positions of elements with values greater than 0.1 in the basis vectors of the kernel space, locate the corresponding cell coordinates in the cell complex structure, map the cell coordinates to the boundary feature coordinates of the risk convex set and the cost convex set, and combine them to generate the topological feature vector reflecting the boundary features of risk and cost.
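The dimension count at the heart of S23 (kernel dimension minus image dimension) can be checked on a tiny example. The sketch below uses exact rank computation over the rationals together with the rank-nullity theorem rather than the eigen-decomposition route in the text, and it uses signed simplicial boundary matrices for a triangle; both choices are assumptions made for illustration.

```python
from fractions import Fraction

def rank(matrix):
    """Rank over the rationals via Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def betti_numbers(n_vertices, d1, d2):
    """Return (beta0, beta1) from boundary matrices.

    d1: vertices x edges matrix; d2: edges x faces matrix (empty list if no faces).
    beta_k = dim ker(d_k) - rank(d_{k+1}), with dim ker = columns - rank.
    """
    n_edges = len(d1[0]) if d1 else 0
    rank_d1 = rank(d1) if n_edges else 0
    rank_d2 = rank(d2) if d2 and d2[0] else 0
    beta0 = n_vertices - rank_d1
    beta1 = (n_edges - rank_d1) - rank_d2
    return beta0, beta1

# Triangle with edges e01, e02, e12 (columns) over vertices v0, v1, v2 (rows).
d1 = [[-1, -1, 0],
      [1, 0, -1],
      [0, 1, 1]]
hollow = betti_numbers(3, d1, [])                    # boundary only: one hole
filled = betti_numbers(3, d1, [[1], [-1], [1]])      # face added: hole closed
```

The hollow triangle has one connected component and one hole, while filling in the face removes the hole, which matches the interpretation of the first Betti number as a count of internal holes.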

[0026] S24. Traverse the topological feature vectors, count the number of topological invariants with dimension 0 as the zeroth Betti number, and count the number of topological invariants with dimension 1 as the first Betti number. Set the zeroth Betti number to the number of independent connected components in the risk convex set, and set the first Betti number to the number of holes inside the risk convex set. Multiply the number of holes by a weight adjustment coefficient of 1.5 to calculate the risk blind zone penalty value. Add the risk blind zone penalty value to the basic penalty vector and update the comprehensive penalty vector. Set the weight coefficient of the basic reward value to 0.6 and the weight coefficient of the comprehensive penalty vector to 0.4. Multiply the basic reward value by the weight coefficient 0.6 and the comprehensive penalty vector value by the weight coefficient 0.4, calculate the sum of the two weighted results, and output the total reinforcement learning reward value.
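
The reward arithmetic in S24 can be sketched as follows. The sketch assumes the comprehensive penalty is handled as a single scalar value; the function and argument names are hypothetical, while the coefficients 1.5, 0.6 and 0.4 are those stated above.

```python
def total_reward(base_reward, base_penalty, betti_1,
                 hole_weight=1.5, w_reward=0.6, w_penalty=0.4):
    """Weighted sum of the basic reward and the hole-corrected penalty."""
    blind_zone_penalty = hole_weight * betti_1          # holes scaled by 1.5
    combined_penalty = base_penalty + blind_zone_penalty
    return w_reward * base_reward + w_penalty * combined_penalty

# Hypothetical values: basic reward 10, basic penalty -2, two holes detected.
reward = total_reward(base_reward=10.0, base_penalty=-2.0, betti_1=2)
```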

[0027] The risk topology feature extraction and reward generation process proposed in this step is similar to the traditional reinforcement learning reward shaping method in that it is based on the risk assessment theory of state space. That is, the penalty signal is constructed by analyzing the relative positional relationship between state features and risk regions, and the multi-dimensional risk indicators are integrated into a scalar reward value by weighted summation.

[0028] The difference lies in that this invention breaks away from the limitations of traditional methods that rely solely on Euclidean distance or simple probability models to construct reward functions and ignore the high-dimensional topological structure of the decision space. Instead of directly designing artificial penalty terms, this invention adds an algebraic topological analysis step, using cavity complexes to discretize the risk-cost convex set, rather than simply partitioning the space. In the feature extraction step, edge operators and homology group theory are used to extract the topological invariants of the kernel space quotient de-image space, rather than extracting shallow geometric or statistical features. Finally, in the reward generation step, the Betty number is calculated to quantify connected components and hole features, mapping holes to potential risk blind spots and using the Betty number to correct the penalty vector, rather than based on linear weighting with fixed weights.

[0029] The beneficial effect of the improvement is that the present invention can explicitly express the implicit connectivity and void structure in the decision space through the construction of cell cavity complexes and the extraction of homology group features. This breaks the limitation of traditional methods that only focus on local risks and ignore global topological blind spots, and realizes a dimensional leap from local empirical penalties to global topological constraints. This design uses topological invariants to accurately map risk blind spots, effectively solving the problem of missed detection or over-detection caused by the lack of knowledge of feature space, and significantly improving the safety and generalization ability of reinforcement learning strategies in complex sorting environments.

[0030] In this embodiment, S3 specifically includes: S31. Read the total reward value of reinforcement learning, read the weight matrix values ​​and bias vector values ​​of the pre-trained multilayer perceptron, and construct the coordinate matrix of the parameter space. The specific steps are as follows: arrange the values ​​of each row of the input layer weight matrix in order, arrange the values ​​of the hidden layer bias vector after the values ​​of the weight matrix, then concatenate the values ​​of each row of the output layer weight matrix in order, and finally concatenate the values ​​of the output layer bias vector to generate a one-dimensional parameter vector containing all parameters; arrange the parameter vectors after each update in chronological order, and take each parameter vector as a row of the matrix to generate a two-dimensional matrix with the number of rows equal to the number of updates and the number of columns equal to the total number of parameters, which is the coordinate matrix of the parameter space. The bandwidth parameter of the Gaussian kernel function is set to 1.0. The probability density field of the parameter space is constructed by the following steps: Calculate the difference between the coordinates of any sampling point in the parameter space and the coordinates of every known data point in the space in each dimension. Square the differences in each dimension and sum them to obtain the squared Euclidean distance between the two points. Square the value of the bandwidth parameter and multiply it by 2 to obtain the denominator value. Divide the squared Euclidean distance value by the denominator value to obtain the quotient value. Find the value of the natural constant e and calculate the negative quotient power of the natural constant e to obtain the single-point kernel function contribution value. 
For this sampling point, accumulate its single-point kernel function contribution values ​​with all known data points in the space. Set the sum of the accumulated values ​​as the probability density value at this sampling point. Iterate through all sampling points to complete the construction of the probability density field. Set the total number of sampled particles to 500, and perform non-uniform random sampling. The specific steps are as follows: calculate the ratio of the probability density value of each coordinate point in the probability density field to the sum of the probability densities of all points, and set the ratio as the sampled probability value of that coordinate point; generate a uniformly distributed random number between 0 and 1, calculate the cumulative sequence of the sampled probability values ​​of all coordinate points, find the coordinate point position in the cumulative sequence where the value is first greater than or equal to the random number, and set that position as the selected particle; repeat the random number generation and position search steps until the number of selected particles reaches 500, and generate a discrete distribution point set of parameter candidate particles.
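
The density field construction and the cumulative-probability sampling described in S31 can be sketched as below. This is an illustrative sketch only: it assumes the sampling points coincide with the rows of the coordinate matrix, and the function names and the random seed are hypothetical.

```python
import numpy as np

def gaussian_density(samples, data, bandwidth=1.0):
    """Sum of Gaussian kernel contributions of all data points at each sample."""
    diff = samples[:, None, :] - data[None, :, :]        # pairwise differences
    sq_dist = np.sum(diff ** 2, axis=-1)                 # squared Euclidean distance
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2)).sum(axis=1)

def sample_particles(points, density, n_particles=500, seed=0):
    """Draw particles with probability proportional to the density values."""
    rng = np.random.default_rng(seed)
    cumulative = np.cumsum(density / density.sum())      # cumulative sequence
    u = rng.random(n_particles)                          # uniform numbers in [0, 1)
    # First position whose cumulative value is >= the random number.
    idx = np.searchsorted(cumulative, u, side="left")
    idx = np.minimum(idx, len(points) - 1)               # guard against rounding
    return points[idx]
```

Here `np.searchsorted` performs the position search for all 500 random numbers at once instead of repeating the search in a loop.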

[0031] S32. Read the discrete distribution point set of candidate parameter particles, and select any three non-collinear parameter particles on the parameter manifold to construct an initial triangular mesh; traverse the remaining particles in the set and, for each inserted particle, search for a triangle containing it. The specific steps are: traverse all existing triangles, read the coordinates of the three vertices of a triangle, and calculate the center coordinates and radius of the circumcircle of the triangle; calculate the Euclidean distance between the coordinates of the inserted particle and the center coordinates, and compare the Euclidean distance with the radius of the circumcircle. If the Euclidean distance is less than the radius, it is determined that the inserted particle is located inside the circumcircle of the triangle, and the triangle is marked as a containing triangle; if no triangle containing the inserted particle is found, calculate the mean coordinates of the inserted particle, find the existing triangle closest to these mean coordinates, and set it as the containing triangle. Connect the inserted particle to the three vertices of the containing triangle to construct three new edges, delete the edges of the original containing triangle, and check whether the circumcircle of each newly constructed triangle contains other particles. If it does, swap the diagonals and reconstruct the triangles until no circumcircle of any triangle contains other particles, thus completing the Delaunay triangulation. Calculate the norm of the difference between the parameter vector of each parameter particle and those of its connected particles as the geodesic distance, select the particle pair with the smallest geodesic distance as the local neighborhood relationship, and construct the dual Voronoi region based on the adjacency relationship of the triangle edges.
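
The circumcircle test used when searching for the containing triangle can be sketched as follows. The sketch assumes two-dimensional coordinates for readability, whereas the parameter particles of the embodiment are higher-dimensional; the function names are hypothetical.

```python
import numpy as np

def circumcircle(a, b, c):
    """Center and radius of the circle through three 2-D points."""
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    d = 2.0 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    if abs(d) < 1e-12:
        raise ValueError("the three points are collinear")
    ux = ((a @ a) * (b[1] - c[1]) + (b @ b) * (c[1] - a[1]) + (c @ c) * (a[1] - b[1])) / d
    uy = ((a @ a) * (c[0] - b[0]) + (b @ b) * (a[0] - c[0]) + (c @ c) * (b[0] - a[0])) / d
    center = np.array([ux, uy])
    return center, float(np.linalg.norm(center - a))

def in_circumcircle(p, a, b, c):
    """True if point p lies strictly inside the circumcircle of triangle abc."""
    center, radius = circumcircle(a, b, c)
    return bool(np.linalg.norm(np.asarray(p, dtype=float) - center) < radius)
```

For a complete triangulation, an existing routine such as `scipy.spatial.Delaunay` may be used instead of implementing the insertion and diagonal-swap loop by hand.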

[0032] S33. Read the triangular mesh data after Delaunay triangulation, extract the coordinate vectors of the three vertices of each triangle, calculate the average value of the three vertex coordinate vectors in each dimension, and combine the average values ​​into a vector to obtain the centroid coordinates; calculate the Euclidean distance from the coordinates of the circumcircle center of the triangle to the coordinates of any vertex, and set the Euclidean distance value as the circumsphere radius; for each triangle vertex, find the historical data point with the closest Euclidean distance in the coordinate matrix of the original parameter space, read the total reinforcement learning reward value corresponding to the historical data point, and set it as the surrogate reward value of the vertex; read the surrogate reward values ​​of the three vertices, calculate the variance of the three reward values ​​as the fitness potential variance; set the circumsphere radius threshold to 2.0 and the variance threshold to 0.5, compare the circumsphere radius value of each triangle with the threshold value, compare the fitness potential variance value with the variance threshold value, retain the triangle region with the circumsphere radius less than 2.0 and the variance value less than 0.5, and set the retained region as the confidence region for parameter search.
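
The screening rule of S33 can be sketched as a boolean mask over triangles. The sketch assumes the circumsphere radii and the surrogate reward values of the three vertices have already been computed, and it uses the population variance (an assumption, since the embodiment does not state which variance is meant); the names are hypothetical.

```python
import numpy as np

def confidence_mask(circumradii, vertex_rewards, r_max=2.0, var_max=0.5):
    """Keep triangles whose radius and reward variance are both below threshold."""
    circumradii = np.asarray(circumradii, dtype=float)
    # One row per triangle, three surrogate reward values per row.
    variances = np.var(np.asarray(vertex_rewards, dtype=float), axis=1)
    return (circumradii < r_max) & (variances < var_max)

# Hypothetical triangles: only the first satisfies both thresholds.
mask = confidence_mask(
    circumradii=[1.0, 2.5, 1.5],
    vertex_rewards=[[1.0, 1.2, 0.9], [1.0, 1.0, 1.0], [0.0, 2.0, 4.0]],
)
```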

[0033] S34. Traverse all parameter particles within the confidence region, read the vertex parameter vector and centroid coordinate values ​​of each triangle in the neighborhood, multiply each dimension value of the vertex parameter vector by the weight value of the corresponding dimension of the centroid coordinate, and sum the weighted results of the three vertices to obtain the aggregate parameter vector; calculate the Euclidean distance between the parameter vector of each particle in the local neighborhood and the aggregate parameter vector, set the distance threshold to 0.1, compare the Euclidean distance value with the distance threshold, if the Euclidean distance value is greater than 0.1, determine that the particle is an isolated noise point, extract the parameter vectors of all non-noise particles in the neighborhood, calculate the arithmetic mean of the parameter vectors of non-noise particles in each dimension, replace the original parameter vector of the isolated noise point with the arithmetic mean vector; output the smoothed parameter vector as the initialization parameter of the sorting strategy network.

[0034] In this embodiment, S4 specifically includes: S41. Reorganize the initialization parameters of the sorting strategy network into a higher-order parameter tensor. Perform tensor train decomposition on the higher-order parameter tensor to generate a train decomposition format composed of the core tensor sequence. The specific steps are as follows: Read the smoothed parameter vector and count the total length of the parameter vector; determine that the tensor order is third-order; calculate the cube root of the total length and round it up to obtain the single-dimensional length value; calculate the cube of the single-dimensional length value to obtain the total number of filled elements; compare the total number of filled elements with the total length value. If the total number of filled elements is greater than the total length value, generate a filling vector with a value of zero. Concatenate the filling vector to the end of the parameter vector so that the length of the concatenated vector is equal to the total number of filled elements. Based on the single-dimensional length value, reshape the concatenated vector into a third-order tensor with a three-dimensional cube structure. Set the rank parameter of the Tensor Train decomposition to 10. Read the length value of the first dimension of the third-order tensor and initialize the first core tensor to a dimension of one row, a column of the specified length, and a three-dimensional structure with a depth of ten layers. Read the length value of the second dimension of the third-order tensor and initialize the second core tensor to a dimension of ten rows, a column of the specified length, and a three-dimensional structure with a depth of ten layers. Read the length value of the third dimension of the third-order tensor and initialize the third core tensor to a dimension of ten rows, a column of the specified length, and a three-dimensional structure with a depth of one layer. 
Perform the Tensor Train decomposition operation to divide the third-order tensor into several parts along the first dimension. For each column vector, perform singular value decomposition (SVD) on it, retaining the left singular vectors corresponding to the top ten largest singular values ​​to construct the first core tensor. Using the product of the singular value matrix and the right singular matrix as input, perform SVD along the second dimension, retaining the components corresponding to the top ten largest singular values ​​to construct the second core tensor. The remaining decomposition results are directly assigned to the third core tensor, completing the solution for the internal values ​​of the three core tensors. Extract the three decomposed core tensors and arrange them in order from the first to the third dimension to generate a core tensor sequence, which is the train decomposition format.
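
The padding, reshaping and sequential SVD steps of S41 can be sketched as below. This is a minimal sketch rather than the embodiment's implementation: the function names are hypothetical, and the retained rank is capped by the actual number of singular values, so for small tensors the cores can be smaller than the rank-10 shapes described above.

```python
import numpy as np

def pad_to_cube(vec):
    """Zero-pad a parameter vector and reshape it into an n x n x n tensor."""
    length = len(vec)
    n = max(1, int(round(length ** (1.0 / 3.0))))
    while n ** 3 < length:                   # correct floating-point rounding
        n += 1
    while n > 1 and (n - 1) ** 3 >= length:
        n -= 1
    padded = np.zeros(n ** 3)
    padded[:length] = vec
    return padded.reshape(n, n, n)

def tt_svd(tensor, max_rank=10):
    """Decompose a third-order tensor into three tensor train cores."""
    n1, n2, n3 = tensor.shape
    # SVD of the first unfolding; keep at most max_rank left singular vectors.
    u, s, vt = np.linalg.svd(tensor.reshape(n1, n2 * n3), full_matrices=False)
    r1 = min(max_rank, len(s))
    core1 = u[:, :r1].reshape(1, n1, r1)
    # Singular values times right singular vectors feed the second SVD.
    rest = (s[:r1, None] * vt[:r1]).reshape(r1 * n2, n3)
    u, s, vt = np.linalg.svd(rest, full_matrices=False)
    r2 = min(max_rank, len(s))
    core2 = u[:, :r2].reshape(r1, n2, r2)
    # The remainder is assigned directly to the third core.
    core3 = (s[:r2, None] * vt[:r2]).reshape(r2, n3, 1)
    return [core1, core2, core3]
```

When no singular values are discarded, contracting the three cores reproduces the padded tensor exactly; truncation to rank 10 introduces an approximation error only for larger tensors.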

[0035] S42. Construct a low-dimensional parametric manifold based on the tensor train decomposition format, calculate the Riemannian gradient on the low-dimensional parametric manifold, and use the cross-approximation algorithm to iteratively update the core tensor sequence. The specific steps are as follows: Read the core tensor sequence and count the total number of elements in all core tensors. Following the order of the core tensors in the sequence, read the first core tensor and arrange all of its element values into a row vector in row-major order, then read the subsequent core tensors in sequence and append their element values to the end of the row vector in row-major order, generating a low-dimensional coordinate vector containing all element values. Set the space in which the low-dimensional coordinate vector lies as the low-dimensional parametric manifold, with each value in the low-dimensional coordinate vector corresponding to a coordinate point on the manifold, completing the manifold construction and mapping. Construct the left contraction matrix sequence and the right contraction matrix sequence, and contract the remaining core tensors in the core tensor sequence, excluding the current core tensor, to generate the left environment matrix and the right environment matrix respectively. Set the objective function as the mean square error of the network output. Calculate the partial derivatives of the objective function with respect to each element of the current core tensor using the backpropagation algorithm, and arrange all partial derivatives according to their element positions to generate a Euclidean gradient tensor. Contract the Euclidean gradient tensor with the left and right environment matrices and scale the result to obtain the Riemannian gradient tensor. The cross-approximation algorithm is iterated from left to right: read the first core tensor in the core tensor sequence and mark the values of the second and third core tensors as fixed constants. The inner product of the current core tensor and the Riemannian gradient tensor is then calculated to generate the current search result, and the update step size is selected as follows: define a set of candidate step size values arranged in descending order; multiply each candidate step size value by the Riemannian gradient tensor; add the product to the current core tensor to generate multiple candidate update tensors; calculate the objective function value for each candidate update tensor; extract the candidate step size that minimizes the objective function value as the optimal update step size; subtract the product of the optimal update step size and the Riemannian gradient tensor from the current core tensor to obtain the updated core tensor value; repeat the above steps for the second and third core tensors until all core tensors have been updated once.

[0036] S43. During the iterative process of the cross-approximation algorithm, the residual norm of the core tensor sequence is calculated. The iteration is terminated when the residual norm is less than the preset convergence threshold, and the optimized core tensor sequence is output. The specific steps are as follows: After each iteration update, the core tensor sequences before and after the update are read, the sum of squares of the differences between corresponding elements in the core tensor sequences before and after the update is calculated, and the square root of the sum of squares is taken to obtain the residual norm value; the preset convergence threshold is set to 1e-5, and the relationship between the residual norm value and the convergence threshold is compared; if the residual norm value is greater than or equal to the convergence threshold, the algorithm is determined not to have converged, and the next iteration is continued; if the residual norm value is less than the convergence threshold, the algorithm is determined to have converged, and the iteration process is terminated; the core tensor sequence at the time of iteration termination is extracted and set as the optimized core tensor sequence.
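
The convergence check of S43 can be sketched in a few lines. The function names are hypothetical; the threshold 1e-5 is the value stated above.

```python
import numpy as np

def residual_norm(old_cores, new_cores):
    """Square root of the summed squared differences over all core tensors."""
    total = sum(np.sum((new - old) ** 2) for old, new in zip(old_cores, new_cores))
    return float(np.sqrt(total))

def has_converged(old_cores, new_cores, threshold=1e-5):
    """Iteration stops once the residual norm falls below the threshold."""
    return residual_norm(old_cores, new_cores) < threshold
```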

[0037] S44. Perform tensor train reconstruction on the optimized core tensor sequence to restore it to a higher-order parameter tensor, and map the higher-order parameter tensor back to the network parameter space to generate the target policy network parameters. The specific steps are as follows: Read the optimized core tensor sequence and initialize a temporary tensor variable with one row, a number of columns equal to the single-dimensional length value, and a depth of ten layers, whose value is equal to that of the first core tensor. Following the order of the core tensor sequence, read the second core tensor and calculate the contraction between the temporary tensor variable and the second core tensor: align the third-dimension depth of the temporary tensor variable with the first dimension of the second core tensor and perform a multiply-accumulate operation to generate a new temporary tensor variable. Read the third core tensor and calculate the contraction between the new temporary tensor variable and the third core tensor: align the third-dimension depth of the temporary tensor variable with the first dimension of the third core tensor and perform a multiply-accumulate operation, generating a final tensor with one row, a number of columns equal to the product of the lengths of all dimensions, and a depth of one layer. Remove the first and third dimensions, whose length is 1, from the final tensor and retain the second dimension, converting the tensor into a one-dimensional parameter vector containing the values of all parameters. Read the number of layers and the number of neurons in each layer of the neural network. Following the order of the network layers, extract data segments of the corresponding length starting from the beginning of the one-dimensional parameter vector, and reshape each extracted segment into a matrix as the weight matrix. Extract the subsequent data segments as bias vectors. Repeat this step until the parameters of all network layers have been extracted, generating the target policy network parameters.
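
The reconstruction in S44 amounts to contracting the cores in sequence and flattening the result. A minimal sketch follows, assuming cores of shape (left rank, dimension length, right rank) with outer ranks equal to 1; the function names and the `n_params` argument, which trims the zero padding added before decomposition, are hypothetical.

```python
import numpy as np

def tt_reconstruct(cores):
    """Contract tensor train cores back into the full tensor."""
    result = cores[0]
    for core in cores[1:]:
        # Align the last axis of the running tensor with the first axis of the core.
        result = np.tensordot(result, core, axes=([-1], [0]))
    return result.squeeze(axis=(0, -1))      # drop the two length-1 outer axes

def restore_parameter_vector(cores, n_params):
    """Flatten the reconstructed tensor and discard the zero padding."""
    return tt_reconstruct(cores).reshape(-1)[:n_params]
```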

[0038] S45. Based on the target policy network parameters, configure the neural network weights and biases, construct the initial sorting policy network, and output it. The specific steps are as follows: Read the one-dimensional parameter vector in the target policy network parameters. According to the arrangement order of the network layers, extract a data segment with a length equal to the product of the number of input layer nodes and the number of nodes in the first hidden layer from the starting position of the one-dimensional parameter vector. Reshape it into a 64x64 matrix as the weight matrix of the first hidden layer. Continue to extract a data segment with a length of 64 as the bias vector of the first hidden layer. Then extract a data segment with a length equal to the product of the number of nodes in the first hidden layer and the number of nodes in the second hidden layer. Reshape it into a 64x64 matrix as the weight matrix of the second hidden layer. Continue to extract a data segment with a length of 64 as the bias vector of the second hidden layer. Finally, extract a data segment with a length equal to the product of the number of nodes in the second hidden layer and the number of nodes in the output layer. Reshape it into a matrix with a length of 64 rows and 64 columns as the weight matrix of the output layer. Extract the remaining data segment as the bias vector of the output layer. The initial sorting strategy network is set as a multilayer perceptron structure, containing one input layer, two hidden layers, and one output layer. The number of nodes in the input layer is set to the dimension of the vector obtained by concatenating the magnitude of the defect feature and the phase offset angle, which is used to receive the reinforcement learning state space vector. The number of neurons in both hidden layers is set to 64. A linear transformation matrix is ​​constructed between the first and second hidden layers. 
A nonlinear transformation is performed on the linear transformation result. If the value is greater than 0, it remains unchanged; if the value is less than or equal to 0, it is set to 0, fitting the nonlinear mapping relationship from state to action. The number of nodes in the output layer is set to the total number of sorting action command categories. An exponential normalization operation is performed on the linear transformation result of the output layer, and the exponent value of each output value is calculated. All exponent values ​​are summed to obtain the denominator. The exponent value of each output value is divided by the denominator to obtain the probability value between 0 and 1, and the probability distribution of each sorting action is output. The weight matrix and bias vector are defined as the trainable parameters of the initial sorting strategy network, and the network construction is completed and output.
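
The parameter unpacking and forward pass of S45 can be sketched as follows. The sketch is illustrative only: the layer sizes are passed in as a hypothetical `sizes` tuple rather than fixed at 64, the function names are hypothetical, and the maximum output value is subtracted before exponentiation for numerical stability, which does not change the resulting probabilities.

```python
import numpy as np

def unpack_mlp(params, sizes):
    """Split a flat parameter vector into (weight, bias) pairs, layer by layer."""
    layers, pos = [], 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weight = params[pos:pos + n_in * n_out].reshape(n_in, n_out)
        pos += n_in * n_out
        bias = params[pos:pos + n_out]
        pos += n_out
        layers.append((weight, bias))
    return layers

def forward(state, layers):
    """Two hidden layers with ReLU, then exponential normalization (softmax)."""
    hidden = state
    for weight, bias in layers[:-1]:
        hidden = np.maximum(hidden @ weight + bias, 0.0)   # values <= 0 become 0
    weight, bias = layers[-1]
    logits = hidden @ weight + bias
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                                 # action probabilities
```

For example, `sizes=(6, 64, 64, 3)` would describe a hypothetical network with a 6-dimensional state vector, two hidden layers of 64 neurons, and 3 sorting action categories.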

[0039] In this embodiment, the improved Crossformer model includes a probability distribution embedding layer, a tensor network operator interaction layer, an affine cluster algebraic hybrid layer, and an output projection layer. The probability distribution embedding layer is used to scan the input port of the initial sorting strategy network. The sampling frequency is set to 100 Hz, and the time series data of the fine-tuning signal is read continuously. The time series data is divided into segments with a time step of 0.01 seconds, and the signal amplitude within each time step is extracted. The signal amplitudes are arranged into row vectors to construct the fine-tuning signal time path input tensor. The Hellinger distance between any two sub-dimensional vectors is calculated as follows: read the two target sub-dimensional vectors, and take the square root of each element of each vector to obtain the corresponding square root vectors; calculate the absolute value of the difference between corresponding elements of the two square root vectors and sum all the absolute differences; divide the sum by 2 to obtain an intermediate result and square the intermediate result, the final value being the Hellinger distance. The Hellinger kernel matrix is constructed as follows: First, initialize a zero matrix with N rows and N columns, where N is the total number of sub-dimensional vectors. Second, fill the Hellinger distance calculated between any two sub-dimensional vectors into the position of the zero matrix given by the corresponding row and column indices to generate the Hellinger kernel matrix. Third, set the principal component retention ratio for kernel principal component analysis to 95% and perform eigenvalue decomposition on the Hellinger kernel matrix: calculate the centered matrix of the Hellinger kernel matrix, then perform eigenvalue decomposition on the centered matrix to obtain the eigenvalue sequence and the corresponding eigenvector matrix. Fourth, arrange the eigenvalue sequence in descending order and calculate the ratio of the sum of the first K eigenvalues to the sum of all eigenvalues to obtain the cumulative contribution rate. Fifth, take as K the smallest number of eigenvalues for which the cumulative contribution rate first exceeds the preset ratio of 95%. Sixth, extract the first K columns of the eigenvector matrix to construct the projection matrix. The fine-tuning signal time path input tensor is then mapped to a low-dimensional space spanned by the principal component vectors: calculate the matrix product of the fine-tuning signal time path input tensor and the transpose of the projection matrix, and use the result as the probability distribution difference dimension sequence tensor.
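
The distance computation and the selection of K in the probability distribution embedding layer can be sketched as below. Several points are assumptions of the sketch and not statements of the embodiment: the sub-dimensional vectors are taken to be non-negative so that the square roots are defined, eigenvalues are ranked by magnitude because a centered distance matrix need not be positive semidefinite, and all function names are hypothetical. The sketch stops at the projection matrix.

```python
import numpy as np

def hellinger_style_distance(p, q):
    """Distance as described above: (sum |sqrt(p) - sqrt(q)| / 2) squared."""
    diff = np.abs(np.sqrt(p) - np.sqrt(q))
    return (diff.sum() / 2.0) ** 2

def kernel_matrix(vectors):
    """N x N matrix of pairwise distances between sub-dimensional vectors."""
    n = len(vectors)
    k = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            k[i, j] = hellinger_style_distance(vectors[i], vectors[j])
    return k

def projection_matrix(k, keep_ratio=0.95):
    """Center the matrix, eigendecompose it, and keep the leading K eigenvectors."""
    n = k.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    centered = centering @ k @ centering
    eigvals, eigvecs = np.linalg.eigh(centered)
    order = np.argsort(-np.abs(eigvals))             # descending by magnitude
    magnitudes = np.abs(eigvals[order])
    cumulative = np.cumsum(magnitudes) / magnitudes.sum()
    n_keep = min(n, int(np.searchsorted(cumulative, keep_ratio)) + 1)
    return eigvecs[:, order[:n_keep]]
```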

[0040] The tensor network operator interaction layer is used to read the probability distribution difference dimension sequence tensor and construct a parameterized interaction kernel in the form of a matrix product operator. The specific steps are as follows: set the rank parameter R to 15, initialize an identity matrix with R rows and R columns as the core tensor, and construct a sequence of three core tensors. Calculate the interaction coefficients between features of different dimensions in the probability distribution difference dimension sequence tensor as follows: read the feature vectors at the i-th and j-th positions in the tensor, calculate the dot product of the two feature vectors, divide the dot product by the dimension of the feature vector, and add a bias term of 0.1 to obtain the interaction coefficient; construct a high-dimensional feature interaction matrix from the interaction coefficients. Decompose the high-dimensional feature interaction matrix into a sequence of low-dimensional core tensors using tensor train decomposition as follows: reshape the high-dimensional feature interaction matrix into a third-order tensor; apply singular value decomposition (SVD) to the third-order tensor layer by layer with a truncation threshold of 0.01, retaining the components whose singular values are greater than the truncation threshold, to generate a sequence of three low-dimensional core tensors. Use the core tensor sequence to capture long-range correlations between dimensions: calculate the contraction between adjacent core tensors, extract the feature components at the i-th and (i+10)-th time steps, and calculate the covariance between the two components as the long-range correlation index. Sum and average the feature vectors obtained from the contraction to aggregate global feature information, generating a global interaction representation tensor.

[0041] The affine cluster algebraic hybrid layer is used to obtain the global interaction representation tensor, extract the feature vector corresponding to each time step in the global interaction representation tensor, and treat it as a set of discrete points in the affine space; calculate the ideal generator polynomial coefficients connecting the discrete point sets, the specific steps are as follows: set the highest degree of the polynomial to 3, construct the Vandermonde matrix, solve the coefficient vector of the Vandermonde matrix using the least squares method, count the number of non-zero elements in the coefficient vector as the number of ideal generators; count the number of non-zero terms in the polynomial coefficients as the dimension of the cluster. The computational order of algebraic geometric invariants is set to 2. The coordinate matrix of the discrete point set in affine space is calculated by: calculating the sum of squares and products of the coordinates of all points in the discrete point set to construct the second-order moment matrix; constructing an orthogonal projection matrix using the coordinate moments, specifically by: performing eigenvalue decomposition on the second-order moment matrix, extracting the two eigenvectors with the largest eigenvalues, and constructing the orthogonal projection matrix; projecting the eigenvectors onto a subspace spanned by the dimension of the cluster, calculating the cosine similarity between the eigenvectors before and after projection, and weighting and fusing features with similarity values ​​greater than 0.8 to generate cross-dimensional dependent features.

[0042] The output projection layer is used to read the cross-dimensional dependency features output by the affine cluster algebraic hybrid layer, calculate the mean and variance of the cross-dimensional dependency features in the feature dimension, and perform layer normalization on the features using the mean and variance values; construct a fully connected layer with an output dimension of 64 to map the normalized features to the target dimension and output the cross-dimensional dependency feature vector; calculate the truncated logarithmic signature of the cross-dimensional dependency feature vector, specifically: set the truncation depth to 5, map the cross-dimensional dependency feature vector to a path function, calculate the iterative integral of the path function in the interval [0,1], extract the absolute value of the iterative integral result and take the logarithm to the base of the natural constant e to obtain the truncated logarithmic signature value; extract the higher-order terms of the signature as key features, perform point-to-point addition operation between the key features and the parameter vector of the initial sorting strategy network, reconstruct the network parameter update, and generate the online sorting strategy network.

[0043] The improved Crossformer model proposed in this step is similar to the traditional Crossformer model in that it is based on the theory of cross-dimensional feature interaction. That is, by decomposing the input tensor into cross-dimensional embedding vectors, the attention mechanism is used to capture the dependencies between dimensions, and both use residual connections and normalization structures for deep feature extraction.

[0044] The difference lies in that this invention breaks away from the limitations of traditional Crossformer models, which rely solely on dot-product attention to calculate feature relevance and neglect probability distribution structure information. Instead of directly processing the original feature vectors, this invention adds a probability distribution embedding step, using the Hellinger distance to measure the distribution differences between sub-dimensional vectors and construct a kernel matrix, rather than directly using Euclidean distance or dot-product similarity. In the feature interaction step, a parameterized interaction kernel is constructed in the form of a matrix product operator, capturing long-range correlations through low-dimensional core tensor sequences rather than relying on traditional inner-product attention matrix operations. Finally, in the feature fusion step, an affine cluster algebraic hybrid layer is introduced, using algebraic geometric invariants to orthogonally project and fuse features and reconstructing parameters through truncated logarithmic signatures, rather than simple dimension concatenation or element-wise addition.

[0045] The beneficial effects of the improvements are that this invention, through Hellinger kernel matrix construction and matrix product operator decomposition, can accurately capture the differences in probability distribution in the feature space and the implicit long-range dependencies between dimensions, overcoming the high computational complexity of traditional methods and their difficulty in capturing deep correlations. It thereby upgrades shallow feature interaction to a deep tensor network structure. The orthogonal projection fusion and logarithmic signature reconstruction using algebraic geometric invariants preserve the intrinsic geometric structure information of the feature manifold, enhancing the model's ability to represent complex cross-dimensional dependencies and improving the accuracy and convergence speed of online fine-tuning of the sorting strategy network.

[0046] In this embodiment, S6 specifically includes: S61. Input the real-time agricultural product image into the online sorting strategy network and read the red, green, and blue color values of each pixel in the image; set the convolution kernel size to 3 rows and 3 columns, and initialize the weight values in the convolution kernel randomly from a Gaussian distribution; multiply the convolution kernel point by point with the color values of the neighborhood surrounding each image pixel and accumulate the products to obtain the pixel feature sum value; add the bias value of 0.1 to the feature sum value to obtain the convolution feature value; perform a nonlinear transformation on the convolution feature value, specifically: determine whether the convolution feature value is greater than 0; if it is greater than 0, keep the value unchanged; if it is less than or equal to 0, set the value to 0, thereby obtaining the deep feature response value of the image. The feature response values are arranged according to pixel position to generate a feature vector sequence, which is then mapped to the decision information system. The feature similarity threshold is set to 0.05, and the Euclidean distance between any two vectors in the feature vector sequence is calculated. If the distance value is less than the threshold of 0.05, the two vectors are determined to belong to the same category, and a defect feature equivalence relation is constructed. Based on the equivalence relation, the feature vector sequence is divided into several disjoint vector sets, and each vector set is defined as an equivalence class particle, thus completing the granular partitioning of the decision space.
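By way of non-limiting illustration, step S61 can be sketched as follows. Two points are assumptions of the sketch rather than statements of the method: the same 3x3 kernel is applied to each color channel so that every pixel yields a three-component feature vector, and the "distance below 0.05" relation is closed transitively with a union-find structure so that the resulting sets are genuinely disjoint. The function names are hypothetical.

```python
import numpy as np

def conv3x3_relu(image, kernel, bias=0.1):
    # Valid 3x3 convolution per channel, plus bias, followed by ReLU.
    h, w, c = image.shape
    out = np.zeros((h - 2, w - 2, c))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = image[i:i + 3, j:j + 3, :]   # 3x3 neighborhood, all channels
            out[i, j] = (patch * kernel[:, :, None]).sum(axis=(0, 1)) + bias
    return np.maximum(out, 0.0)                  # values <= 0 are set to 0

def partition(vectors, threshold=0.05):
    # Group vectors whose Euclidean distance is below the threshold (transitive closure).
    n = len(vectors)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(vectors[i] - vectors[j]) < threshold:
                parent[find(i)] = find(j)
    classes = {}
    for i in range(n):
        classes.setdefault(find(i), []).append(i)
    return list(classes.values())

rng = np.random.default_rng(0)
image = rng.random((5, 5, 3))                    # hypothetical RGB image scaled to [0, 1]
kernel = rng.normal(scale=0.1, size=(3, 3))      # Gaussian random initialization
features = conv3x3_relu(image, kernel).reshape(-1, 3)
granules = partition(features)                   # equivalence class particles (index lists)
toy_granules = partition(np.array([[0.0], [0.01], [0.04], [1.0]]))
```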

[0047] S62. Read the granularity partitioning results of the decision space, traverse each preset decision category label, filter out all equivalence class particles that are completely contained in the sample set corresponding to the decision category label, sum the sample counts within the filtered particles to generate the lower approximate set value of the decision category; filter out all equivalence class particles that intersect with the sample set corresponding to the decision category label, sum the sample counts within the filtered particles to generate the upper approximate set value of the decision category; calculate the difference between the upper approximate set value and the lower approximate set value, use the difference as the boundary domain value, and divide the boundary domain value by the total number of samples to calculate the proportion of the uncertainty region in the sorting decision.
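By way of non-limiting illustration, the lower approximation value, upper approximation value, and boundary proportion of step S62 can be sketched as follows. Particles are represented as lists of sample indices, and the function names and the toy data are assumptions introduced for the example.

```python
def approximations(granules, target):
    # Lower value: samples in particles fully contained in the decision class.
    # Upper value: samples in particles that intersect the decision class.
    target = set(target)
    lower = sum(len(g) for g in granules if set(g) <= target)
    upper = sum(len(g) for g in granules if set(g) & target)
    return lower, upper

def boundary_ratio(granules, target, total):
    # Proportion of the uncertainty region: (upper - lower) / total sample count.
    lower, upper = approximations(granules, target)
    return (upper - lower) / total

granules = [[0, 1], [2, 3], [4, 5]]   # hypothetical equivalence class particles
defective = {0, 1, 2}                 # hypothetical sample set of one decision class
lower, upper = approximations(granules, defective)
ratio = boundary_ratio(granules, defective, total=6)
```

In this toy case the particle [0, 1] lies entirely inside the class and [2, 3] only intersects it, so the lower value is 2, the upper value is 4, and one third of the samples fall in the uncertainty region.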

[0048] S63. Based on the uncertainty region of sorting decisions, construct a rough entropy model and calculate the system uncertainty measure. The specific steps are as follows: count the total number of equivalence class particles contained in the boundary domain, denoted as M; count the total number of samples in the universe of discourse, denoted as N; calculate the cardinality of each equivalence class particle, i.e., the number of samples contained in the particle; divide the cardinality of each equivalence class particle by N to obtain the probability distribution value of the particle; calculate the natural logarithm (base e) of the probability distribution value of each particle; multiply the particle probability distribution value by the corresponding logarithm to obtain the information entropy component of the particle; sum the information entropy components of all M particles in the boundary domain and negate the sum to obtain the rough entropy value, which serves as the system uncertainty measure; read the loss function parameters of sorting decisions, set the loss value of a correct sorting action to 0, the loss value of an incorrect sorting action to 1, and the loss value of a missed detection action to 0.5, combine the loss function parameters and the rough entropy value to calculate the conditional risk value of each decision action, and construct the minimum risk Bayesian decision rule.
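By way of non-limiting illustration, the rough entropy of step S63 can be sketched as follows. The sketch covers only the entropy over the M boundary particles; how the rough entropy value enters the conditional risk is not specified in this paragraph and is therefore not shown. The function name and the toy data are assumptions.

```python
import math

def rough_entropy(boundary_granules, total):
    # Negated sum of p * ln(p), where p = |particle| / N, over the boundary particles.
    entropy = 0.0
    for granule in boundary_granules:
        p = len(granule) / total
        entropy += p * math.log(p)
    return -entropy

boundary = [[2, 3]]                       # hypothetical boundary domain: M = 1 particle
entropy = rough_entropy(boundary, total=6)  # N = 6 samples in the universe of discourse
```

Here the single boundary particle has probability 1/3, so the rough entropy equals ln(3)/3.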

[0049] S64. Calculate the expected risk value of each sorting action according to the minimum risk Bayesian decision rule. The specific steps are as follows: read the conditional risk value corresponding to each sorting action, multiply the conditional risk value by the corresponding posterior probability value to obtain the risk product value, and sum the risk product values of all decision categories to obtain the expected risk value of the sorting action; compare the expected risk values of all sorting actions and select the action with the smallest expected risk value as the optimal decision; convert the category label corresponding to the optimal decision into a binary control instruction, set the instruction transmission baud rate to 9600 bps, and output the sorting action instruction through the communication interface.
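By way of non-limiting illustration, the minimum-risk selection of step S64 can be sketched as follows. The two actions, the two decision categories, the assignment of the loss values 0, 1, and 0.5 to particular action and category pairs, the posterior probabilities, and the one-byte instruction encoding are all assumptions introduced for the example; the serial transmission itself is not shown.

```python
ACTIONS = ["accept", "reject"]
CLASSES = ["good", "defective"]

# Assumed loss table: LOSS[action][true class].
LOSS = {
    "accept": {"good": 0.0, "defective": 0.5},   # accepting a defective item: missed detection
    "reject": {"good": 1.0, "defective": 0.0},   # rejecting a good item: incorrect sorting
}

def expected_risks(posterior):
    # Expected risk of an action: sum over categories of loss * posterior probability.
    return {a: sum(LOSS[a][c] * posterior[c] for c in CLASSES) for a in ACTIONS}

def optimal_action(posterior):
    risks = expected_risks(posterior)
    return min(risks, key=risks.get)

def encode_instruction(action):
    # Assumed encoding: the action index as a single byte.
    return bytes([ACTIONS.index(action)])

posterior = {"good": 0.7, "defective": 0.3}       # hypothetical posterior probabilities
risks = expected_risks(posterior)
best = optimal_action(posterior)
instruction = encode_instruction(best)
```

With these assumed values the expected risk of accepting is 0.15 and that of rejecting is 0.7, so the accept action is selected.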

[0050] In this embodiment, S7 specifically includes: S71. Execute the sorting action instruction and obtain the manual review result. Based on the manual review result, construct the error tensor of the strategy parameter matrix. The specific steps are as follows: read the correct sorting label vector given by the manual review, calculate the difference between the correct label vector and the predicted label vector output by the current strategy network, arrange the difference into a matrix format, and construct the error tensor of the strategy parameter matrix; use the Lie algebra structure of the orthogonal group to perform polar decomposition on the error tensor. The specific steps are as follows: reshape the error tensor into a square matrix, calculate the product of the square matrix and its transpose to obtain a symmetric positive definite matrix, and perform eigenvalue decomposition on the symmetric positive definite matrix to obtain the eigenvector matrix and the eigenvalue diagonal matrix; construct the orthogonal rotation component using the eigenvector matrix, and construct a diagonal matrix from the diagonal elements of the eigenvalue diagonal matrix and take its square root to obtain the positive definite scale component.
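By way of non-limiting illustration, the decomposition of step S71 can be sketched as a standard left polar decomposition E = P U computed through the eigendecomposition of E Eᵀ. Recovering the rotation component as P⁻¹E is an assumption of the sketch (the paragraph only states that the rotation is constructed from the eigenvector matrix), and it requires the square error matrix to be non-singular. The numeric error matrix is hypothetical.

```python
import numpy as np

def polar_decomposition(error):
    # Symmetric matrix E @ E.T and its eigendecomposition.
    gram = error @ error.T
    eigvals, eigvecs = np.linalg.eigh(gram)
    # Positive definite scale component: P = V * sqrt(Lambda) * V.T
    scale = eigvecs @ np.diag(np.sqrt(np.clip(eigvals, 0.0, None))) @ eigvecs.T
    # Orthogonal rotation component: U = P^{-1} E (assumes E is non-singular).
    rotation = np.linalg.solve(scale, error)
    return rotation, scale

# Hypothetical error tensor already reshaped into a square matrix.
error = np.array([[0.20, -0.10, 0.00],
                  [0.05, 0.30, -0.10],
                  [-0.10, 0.00, 0.25]])
rotation, scale = polar_decomposition(error)
```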

[0051] S72. Using the orthogonal rotation component, construct a normalized rotation matrix through the Lie algebra exponential mapping. The specific steps are as follows: read the orthogonal rotation component matrix and calculate the antisymmetric part of the matrix to obtain the Lie algebra basis vectors; according to the Lie algebra exponential mapping formula, calculate the matrix exponential of the antisymmetric matrix, specifically by computing its Taylor series expansion, truncating it to the first 5 terms, and summing them, and output the normalized rotation matrix; perform geometric orientation correction on the policy eigenvector direction. The specific steps are as follows: read the eigenvector matrix of the current policy network, calculate the matrix product of the eigenvector matrix and the normalized rotation matrix, use the product as the direction-corrected eigenvectors, and output the direction correction matrix.
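By way of non-limiting illustration, the truncated exponential mapping of step S72 can be sketched as follows. The first five Taylor terms are taken to be I, A, A²/2!, A³/3!, and A⁴/4!; the 2x2 input matrices and the function names are assumptions introduced for the example.

```python
import numpy as np

def skew_part(m):
    # Antisymmetric part of a square matrix.
    return 0.5 * (m - m.T)

def exp_taylor(a, terms=5):
    # Truncated Taylor series of the matrix exponential: sum of a^k / k! for k < terms.
    result = np.zeros_like(a)
    term = np.eye(a.shape[0])
    for k in range(terms):
        result = result + term
        term = term @ a / (k + 1)
    return result

component = np.array([[1.0, -0.1],
                      [0.1, 1.0]])          # hypothetical rotation component
generator = skew_part(component)             # [[0, -0.1], [0.1, 0]]
rotation = exp_taylor(generator)             # approximately a rotation by 0.1 rad

eigvecs = np.eye(2)                          # hypothetical policy eigenvector matrix
direction = eigvecs @ rotation               # direction correction matrix
```

Because the series is truncated, the result is only approximately orthogonal; the approximation is close when the antisymmetric matrix has a small norm, as in this example.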

[0052] S73. Using the positive definite scaling component and the orientation correction matrix, and combining the verification confidence, an adaptive scaling factor is constructed. The specific steps are as follows: read the values in the diagonal matrix of the positive definite scaling component and calculate the geometric mean of the diagonal elements; read the confidence value of the manual verification result and set the confidence threshold to 0.9; if the confidence value is greater than the threshold, set the weighting coefficient to 1; otherwise, set the weighting coefficient to 0.5; multiply the geometric mean by the weighting coefficient to obtain the adaptive scaling factor; perform regularization adjustment on the eigenvalues. The specific steps are as follows: read the singular value sequence of the orientation correction matrix, multiply each singular value by the adaptive scaling factor, reconstruct the diagonal matrix using the adjusted singular values, and combine it with the left and right singular vectors of the orientation correction matrix to output the intensity optimization matrix.
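By way of non-limiting illustration, the adaptive scaling factor and the intensity optimization matrix of step S73 can be sketched as follows. The function names and the numeric inputs are assumptions introduced for the example, and the diagonal elements are assumed to be strictly positive so that the geometric mean is well defined.

```python
import numpy as np

def adaptive_scale(scale_diag, confidence, threshold=0.9):
    # Geometric mean of the diagonal elements, weighted by the review confidence.
    geo_mean = float(np.exp(np.mean(np.log(scale_diag))))
    weight = 1.0 if confidence > threshold else 0.5
    return geo_mean * weight

def intensity_matrix(direction, factor):
    # Scale the singular values and recombine with the left and right singular vectors.
    u, s, vt = np.linalg.svd(direction)
    return u @ np.diag(s * factor) @ vt

scale_diag = np.array([1.0, 4.0])                        # hypothetical diagonal elements
factor_high = adaptive_scale(scale_diag, confidence=0.95)  # weight 1
factor_low = adaptive_scale(scale_diag, confidence=0.80)   # weight 0.5
intensity = intensity_matrix(np.eye(2), factor_high)
```

Since every singular value is multiplied by the same factor, the recombined matrix equals the orientation correction matrix multiplied by that factor.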

[0053] S74. The direction correction matrix and the intensity optimization matrix are synthesized using Lie group multiplication. The specific steps are: calculate the matrix product of the direction correction matrix and the intensity optimization matrix to obtain the synthesized parameter matrix; map the synthesized result back to the original parameter space topology, specifically: read the parameter space dimension values of the initial sorting strategy network and reshape the synthesized parameter matrix so that its shape is consistent with the initial parameter matrix; orthogonalize the reshaped matrix, specifically: use the Gram-Schmidt orthogonalization algorithm on the matrix column vectors so that they are pairwise orthogonal and each has a magnitude of 1, and output the final sorting strategy network that maintains the manifold orthogonality property.
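By way of non-limiting illustration, the synthesis and orthogonalization of step S74 can be sketched as follows. The sketch uses the modified Gram-Schmidt variant and assumes that the reshaped matrix has linearly independent columns and at least as many rows as columns; the function names and the numeric inputs are assumptions introduced for the example.

```python
import numpy as np

def gram_schmidt(m):
    # Orthonormalize the columns of m (modified Gram-Schmidt).
    q = np.zeros_like(m, dtype=float)
    for j in range(m.shape[1]):
        v = m[:, j].astype(float)
        for i in range(j):
            v = v - (q[:, i] @ v) * q[:, i]   # remove the component along column i
        q[:, j] = v / np.linalg.norm(v)        # normalize to magnitude 1
    return q

def synthesize(direction, intensity, shape):
    # Matrix product, reshape to the initial parameter shape, then orthogonalize.
    combined = direction @ intensity
    return gram_schmidt(combined.reshape(shape))

direction = np.array([[1.0, 1.0],
                      [0.0, 1.0]])     # hypothetical direction correction matrix
intensity = 2.0 * np.eye(2)            # hypothetical intensity optimization matrix
final_params = synthesize(direction, intensity, shape=(2, 2))
```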

[0054] The online sorting strategy network parameter update process proposed in this step, like the traditional fine-tuning method, is based on error backpropagation and parameter optimization theory. It uses error calculation to guide parameter correction in order to minimize losses.

[0055] The difference is that this invention overcomes the limitations of traditional methods that rely solely on Euclidean gradient descent while ignoring the manifold structure. It adds tensor polar decomposition and Lie group mapping steps to decouple the error into rotation and scale components, uses the Lie algebra exponential mapping to achieve geometric orientation correction, and completes the update through Lie group multiplication synthesis.

[0056] The beneficial effect of the improvement is that by decoupling the processing and preserving the structure mapping, the transformation from Euclidean unconstrained update to manifold space accurate update is realized, which ensures the geometric stability of the policy network, effectively alleviates catastrophic forgetting, and significantly improves the adaptability and robustness of the online sorting policy.

[0057] Example 1: To verify the feasibility of this invention in intelligent sorting and grading of agricultural products, the method of this invention was applied to the automated fruit and vegetable sorting production line of an agricultural technology company (hereinafter referred to as "Company A"). Traditional agricultural product sorting systems usually use simple color threshold segmentation or basic machine learning algorithms. These methods often struggle to accurately extract deep defect features when faced with fruit stem occlusion, irregular fruit surface defects, and drastic changes in light intensity. Furthermore, they lack a mechanism for quantifying the uncertainty of sorting decisions, which easily leads to missed detection of defective fruit or erroneous rejection of good fruit, resulting in economic losses. To solve the above problems, Company A decided to adopt the reinforcement learning-based agricultural product sorting defect identification method proposed in this invention.

[0058] During implementation, Company A first uses industrial cameras deployed above the sorting conveyor belt to acquire real-time image streams of agricultural products. It then extracts pixel feature sums through a 3x3 convolution operation, generates deep image feature response values through nonlinear transformation, and maps these values to the decision information system. It also calculates the Euclidean distance between any two vectors in the feature vector sequence, sets a feature similarity threshold of 0.05, constructs defect feature equivalence relations, and divides the feature vector sequence into several disjoint equivalence classes, thus completing the granular partitioning of the decision space.

[0059] Company A quantified the uncertainty region of sorting decisions by calculating the upper and lower approximation sets of decision classes and constructed a rough entropy model. Specifically, the rough entropy value was calculated as a measure of system uncertainty from the probability distribution values and logarithmic values of the equivalence class particles within the boundary domain. Combined with the set loss values of 0 for correct sorting, 1 for incorrect sorting, and 0.5 for missed detection, a minimum-risk Bayesian decision rule was constructed, outputting sorting action instructions based on the principle of minimizing expected risk.

[0060] During the core strategy iteration phase, Company A constructs an error tensor for the strategy parameter matrix using manual review results. It then performs polar decomposition of the error tensor using the Lie algebra structure of the orthogonal group, outputting the orthogonal rotation component and the positive definite scaling component. A normalized rotation matrix is constructed using the Lie algebra exponential mapping to perform geometric orientation correction on the strategy eigenvectors. An adaptive scaling factor is then constructed based on the manual review confidence level, outputting an intensity optimization matrix. Finally, the orientation correction matrix and the intensity optimization matrix are synthesized using Lie group multiplication and processed using the Gram-Schmidt orthogonalization algorithm to output the final sorting strategy network that preserves the orthogonal property of the manifold.

[0061] Company A's technical team discovered during implementation that, compared to traditional color sorting and basic deep learning methods, the method of this invention significantly improves the accuracy and robustness of agricultural product sorting. Traditional methods cannot effectively handle defective samples with ambiguous boundaries and lack quantitative control over the risk of false detections. In contrast, the method of this invention effectively solves the classification problem of uncertain samples through rough set granularity partitioning and Bayesian risk decision-making, and achieves stable optimization of policy parameters through Lie group manifold correction.

[0062] To further verify the actual performance of the method of the present invention, Company A conducted a detailed comparative test between the method of the present invention and the traditional method. The specific performance data are shown in Table 1.

Table 1. Performance Comparison of Online Sorting Methods for Agricultural Products by Company A

[0063] As shown in Table 1, the performance of the online agricultural product sorting system improved across all reported indicators after applying the method of this invention. The defect identification accuracy increased from 88.6% with traditional methods to 98.2%, and the rate of erroneous rejection of good fruit decreased from 5.5% to 0.8%, reducing the waste caused by discarding superior fruit. The sorting accuracy for uncertain samples increased from 75.0% to 95.5%, verifying the effectiveness of rough entropy and Bayesian decision-making in handling fuzzy samples. The decision-making time for sorting a single fruit or vegetable was reduced from 45 milliseconds to 22 milliseconds, meeting the real-time requirements of high-speed production lines. Furthermore, the number of strategy iteration convergence steps was reduced, indicating that the parameter correction method based on Lie group manifolds has better optimization efficiency. The economic loss caused by missed detections decreased from 250,000 yuan/year to 40,000 yuan/year, and the workload of manual review decreased by 75.0%, reducing operating costs. Customer satisfaction also increased from 89.0% to 98.5%.

[0064] Through the method of this invention, Company A has successfully achieved high-precision intelligent sorting and grading of agricultural products, effectively solving the problems of difficulty in extracting complex features and lack of quantification of decision uncertainty, ensuring the consistency of sorting quality, greatly improving the automation and intelligence level of the production line, significantly reducing the burden of manual intervention, enhancing the stability and robustness of the system, and providing strong technical support for the industrialization of smart agriculture.

[0065] The above description is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitutions or modifications made by those skilled in the art within the scope of the technology disclosed in the present invention, based on the technical solution and inventive concept of the present invention, should be covered within the scope of protection of the present invention.

Claims

1. A method for identifying defects in agricultural product sorting based on reinforcement learning, characterized in that it includes the following steps: S1. Map the original image of agricultural products into a sorting state hypercomplex matrix, use hypercomplex convolution to extract defect features and calculate the phase offset angle with sorting environment parameters to construct a reinforcement learning state space. S2. Construct a cell cavity complex by partitioning the risk cost convex set based on the reinforcement learning state space, extract homology group features using the boundary operator, calculate the Betti number to map the risk blind zone, and correct the penalty vector to generate the total reward value of reinforcement learning. S3. Generate random point sets based on the probability density field induced by the total reward value of reinforcement learning, construct a Delaunay triangulation to establish neighborhoods, screen stable confidence regions and perform weighted smoothing, and output the sorting strategy network initialization parameters. S4. Reorganize the initialization parameters of the sorting strategy network into a high-order parameter tensor and perform tensor train decomposition. Use the cross approximation algorithm to update the core tensor sequence in the low-dimensional manifold and output the initial sorting strategy network. S5. Based on the initial sorting strategy network, construct the fine-tuning signal time path, use the improved Crossformer model to calculate the Hellinger distance and construct a kernel matrix mapping to generate differential sequence tensors, use matrix product operator decomposition to capture long-range correlations, generate cross-dimensional dependent features through orthogonal projection of algebraic invariants, and reconstruct parameters by normalized mapping and truncated logarithmic signature to generate the online sorting strategy network. S6. Input real-time agricultural product images into the online sorting strategy network, calculate the upper and lower approximation sets of decision classes based on the defect feature equivalence relation, use rough entropy to measure uncertainty and output sorting action instructions according to the principle of minimum risk. S7. Execute sorting action instructions and obtain manual verification results. Obtain the rotation and scale components from the tensor polar decomposition of the verification error. Correct the direction using exponential mapping and adjust the scale using confidence. Synthesize the mapping using Lie group multiplication to output the final online sorting strategy network.

2. The method for identifying defects in agricultural product sorting based on reinforcement learning according to claim 1, characterized in that S1 specifically includes: S11. Map the RGB three-channel pixel values of the original image of agricultural products to the imaginary part of a hypercomplex number, and construct a hypercomplex matrix of sorting status by combining the preset zero imaginary part and real part. S12. Construct a multi-channel convolution kernel weight matrix, perform sliding window multiplication and addition operations on the sorting state hypercomplex matrix and the multi-channel convolution kernel weight matrix, and extract the hypercomplex feature map. S13. Calculate the hypercomplex modulus and phase angle of each element in the hypercomplex feature map, and use the hypercomplex modulus as the defect feature amplitude. S14. Obtain the sorting environment parameters and map them to complex environmental features. Calculate the difference between the feature phase angle corresponding to the defect feature amplitude and the environmental phase angle of the complex environmental features, and use it as the phase offset angle. S15. Concatenate the defect feature amplitude and phase offset angle into vectors and perform a dimension flattening operation to generate a state feature vector. Define the state feature vector as the reinforcement learning state space perceived by the reinforcement learning agent.

3. The method for identifying defects in agricultural product sorting based on reinforcement learning according to claim 1, characterized in that S2 specifically includes: S21. Based on the multidimensional topological structure of the reinforcement learning state space, the convex set of missed detection risk and overkill cost is discretized and a cell cavity complex structure is constructed. S22. Define addition operations on the cell cavity complex structure to generate chain groups, use boundary operators to calculate the boundary chains of the chain groups, and construct chain complex sequences connecting chain groups of different dimensions. S23. Calculate the kernel space and image space of the boundary operator based on the chain complex sequence, and use homology group theory to extract the topological invariants of the quotient of the kernel space by the image space to generate a topological feature vector that reflects the characteristics of risk and cost boundaries. S24. Calculate the Betti number of the topological feature vector to quantify the connected components and internal voids of the risk convex set, map the internal voids as potential risk blind spots in the decision space, use the Betti number as a weighting coefficient to correct the comprehensive penalty vector, and output the total reward value of reinforcement learning.

4. The method for identifying defects in agricultural product sorting based on reinforcement learning according to claim 1, characterized in that, S3 specifically includes: S31. Construct a probability density field for the parameter space based on the total reward value of reinforcement learning, generate a non-uniform random point set using the Poisson point process, and sample to obtain the discrete distribution of the parameter candidate particles. S32. Construct a Delaunay triangulation on a random point set on a parametric manifold, and use the empty circle property and geodesic distance to determine the local neighborhood relations and dual Voronoi regions of the parametric particles; S33. Calculate the centroid coordinates and circumsphere radius of each Delaunay simplex, and combine the fitness potential induced by the total reward value to select the neighborhood of the simplex with stable topology as the confidence region for parameter search. S34. Using the weighted aggregation of neighboring particle parameter information based on the centroid coordinates, the isolated noise points in the parameter space are eliminated through the local topology smoothing operator, and the initialization parameters of the sorting strategy network with the optimal geometric topology are output.

5. The method for identifying defects in agricultural product sorting based on reinforcement learning according to claim 1, characterized in that S4 specifically includes: S41. Reorganize the sorting strategy network initialization parameters into a higher-order parameter tensor, perform tensor train decomposition on the higher-order parameter tensor, and generate a train decomposition format composed of the core tensor sequence. S42. Construct a low-dimensional parametric manifold based on the train decomposition format, calculate the Riemannian gradient on the low-dimensional parametric manifold, and use the cross approximation algorithm to iteratively update the core tensor sequence. S43. During the iterative process of the cross approximation algorithm, the residual norm of the core tensor sequence is calculated. When the residual norm is less than the preset convergence threshold, the iteration is terminated and the optimized core tensor sequence is output. S44. Reconstruct the optimized core tensor sequence using tensor train to restore it to a higher-order parameter tensor. Map the higher-order parameter tensor back to the network parameter space to generate the target policy network parameters. S45. Based on the target policy network parameters, configure the neural network weights and biases to construct and output the initial sorting policy network. This includes setting the initial sorting policy network as a multilayer perceptron structure, containing one input layer, two hidden layers, and one output layer; setting the number of nodes in the input layer to the vector dimension value after concatenating the defect feature amplitude and phase offset angle, used to receive the reinforcement learning state space vector; setting the number of neurons in both hidden layers to 64, and fitting the nonlinear mapping relationship from state to action through the ReLU function; setting the number of nodes in the output layer to the total number of sorting action instruction categories, and outputting the probability distribution of each sorting action through the Softmax function; defining the weight matrix and bias vector as trainable parameters of the initial sorting policy network.

6. The method for identifying defects in agricultural product sorting based on reinforcement learning according to claim 1, characterized in that the improved Crossformer model includes a probability distribution embedding layer, a tensor network operator interaction layer, an affine cluster algebraic hybrid layer, and an output projection layer. The probability distribution embedding layer is used to construct a fine-tuning signal time path input tensor based on the initial sorting strategy network. It is assumed that each sub-dimensional vector of the fine-tuning signal time path input tensor follows an implicit probability distribution. The Hellinger distance between each pair of sub-dimensional vectors is calculated, the Hellinger kernel matrix is constructed, and the probability distribution difference dimension sequence tensor is generated by mapping through kernel principal component analysis. The tensor network operator interaction layer is used to obtain the probability distribution difference dimension sequence tensor, construct the parameterized interaction kernel in matrix product operator format, decompose the high-dimensional feature interaction matrix into a low-dimensional core tensor sequence through tensor train decomposition, and use the core tensor sequence to capture the long-range correlation between dimensions and aggregate feature information to generate a global interaction representation tensor. The affine cluster algebraic hybrid layer is used to obtain the global interaction representation tensor, treat the local temporal feature vector as a set of points in the affine space, calculate the dimension of the ideal generator and cluster that define the feature manifold, and use algebraic geometric invariants to orthogonally project and fuse the features to generate cross-dimensional dependent features. The output projection layer is used to obtain the cross-dimensional dependency features output by the affine cluster algebraic hybrid layer, perform layer normalization on the cross-dimensional dependency features, and map them to the target dimension through a fully connected layer to output the cross-dimensional dependency feature vector. The truncated logarithmic signature is calculated and the parameter update is reconstructed to generate the online sorting strategy network.

7. The method for identifying defects in agricultural product sorting based on reinforcement learning according to claim 1, characterized in that S6 specifically includes: S61. Input real-time agricultural product images into the online sorting strategy network, extract deep features of the images and map them to the decision information system, construct the defect feature equivalence relation based on the decision information system and perform granular partitioning of the decision space. S62. Calculate the upper approximation set and lower approximation set of the decision class based on the equivalence relation of defect features, and use the boundary domain of the upper approximation set and lower approximation set to quantify the uncertainty region of the sorting decision. S63. Construct a rough entropy model based on the uncertainty region of sorting decisions, calculate the system uncertainty measure, and construct a minimum risk Bayesian decision rule in combination with the loss function of sorting decisions. S64. Calculate the expected risk value of each sorting action according to the minimum risk Bayesian decision rule, compare the expected risk values and select the action with the minimum risk as the optimal decision, and convert the optimal decision into a control instruction to output the sorting action instruction.

8. The method for identifying defects in agricultural product sorting based on reinforcement learning according to claim 1, characterized in that S7 specifically includes: S71. Execute the sorting action instruction and obtain the manual review result. Based on the manual review result, construct the error tensor of the strategy parameter matrix. Use the Lie algebra structure of the orthogonal group to perform polar decomposition on the error tensor and output the orthogonal rotation component and positive definite scale component. S72. Using the orthogonal rotation component, construct a normalized rotation matrix through the Lie algebra exponential mapping, perform geometric orientation correction on the policy eigenvector direction, and output the direction correction matrix. S73. Using the positive definite scale component and the direction correction matrix, and combining them with the review confidence level, construct an adaptive scaling factor, perform regularization adjustment on the eigenvalues, and output the intensity optimization matrix. S74. The direction correction matrix and the intensity optimization matrix are synthesized by Lie group multiplication. The synthesis result is mapped back to the original parameter space topology, and the final sorting strategy network that maintains the orthogonality of the manifold is output.