Hybrid rice seed identity and health state combined identification method

By combining multi-level association analysis with an adaptive feature modulation mechanism, the method addresses the insufficient generalization ability of models that jointly identify hybrid rice seed identity and health status, and achieves accurate, robust identification results for both tasks.

CN122244855A · Pending · Publication Date: 2026-06-19 · WUHAN WUDA TIANYUAN BIOLOGY TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
WUHAN WUDA TIANYUAN BIOLOGY TECH CO LTD
Filing Date
2026-04-29
Publication Date
2026-06-19

Abstract

This invention discloses a method for joint identification of hybrid rice seed identity and health status, relating to the field of artificial intelligence image classification technology. It addresses the feature confusion and insufficient generalization ability that existing multi-task learning models for hybrid rice seed identification exhibit when the training data labels are correlated. Rice seed images are acquired and deep feature maps are extracted; channel attention weights are obtained from the identity and health status classification branches and used to calculate a semantic relevance matrix. Based on this matrix, highly relevant target feature channels are selected, and the spatial overlap between their feature activation maps under the two tasks is calculated. The target feature channels are then adaptively suppressed based on semantic relevance and spatial overlap, and the decoupled feature maps are input into the classification branches to obtain the final identification results. The method removes spurious statistical associations in the data, improves the model's identification accuracy and generalization ability on samples with unconventional label associations, and thus meets the practical need for high-reliability seed purity identification.

Description

Technical Field

[0001] This invention relates to the field of artificial intelligence image classification technology, and more specifically, to a method for joint identification of hybrid rice seed identity and health status.

Background Technology

[0002] Automated detection and classification of crop seeds using computer vision technology is an important application of artificial intelligence in agriculture. Image classification methods based on deep convolutional neural networks have been attempted for identifying individual rice seeds, for example, by training models to distinguish different varieties or determine their appearance and health status. Existing solutions offer a direct approach: constructing a multi-task learning neural network model to simultaneously complete the two related tasks of identification and health assessment. This model typically uses a shared low-level feature extraction network and separate classification branches for the two tasks at higher levels. The aim is to achieve simultaneous automated identification of hybrid rice purity and quality by jointly outputting variety and health status labels through a single forward propagation.

[0003] However, in the actual production and quality inspection of hybrid rice, there may be inherent and non-uniform correlations between specific varieties and specific health states. For example, certain parental lines may be more prone to certain mold characteristics because of their physiology. When a multi-task model is trained on a dataset with such inherent label correlations, learning in the shared feature layers is disturbed. The model tends to capture spurious statistical correlations that are irrelevant to the essence of either classification task, so the learned feature representations become confused between tasks. As a result, the model's high joint recognition accuracy on the training data reflects overfitting to surface correlations, and its generalization ability is insufficient. When faced with samples that make up only a small proportion of the training data or that follow a different correlation pattern, such as healthy susceptible parents or moldy hybrids, the model's recognition performance decreases significantly, making it difficult to meet the actual requirements of high-reliability seed purity identification.

Summary of the Invention

[0004] In order to overcome the above-mentioned defects of the prior art, the present invention provides a method for joint identification of hybrid rice seed identity and health status to solve the problems mentioned in the background art.

[0005] To achieve the above objectives, the present invention provides the following technical solution:

[0006] A method for joint identification of hybrid rice variety identity and health status includes:

[0007] S1. Obtain image data of single-grain hybrid rice seeds;

[0008] S2. Based on the image data, determine the key morphological regions and perform feature extraction and spatial weighting to obtain a depth feature map;

[0009] S3. Input the deep feature maps into the identity classification branch and the health status classification branch respectively, and obtain the channel attention weight information on which the decision depends from the two branches respectively.

[0010] S4. Based on the channel attention weight information of the identity classification branch and the health status classification branch, the semantic correlation matrix between channels in the deep feature map is calculated.

[0011] S5. Select the target feature channel from the deep feature map based on the semantic relevance matrix, and calculate the spatial overlap between the feature activation maps of the target feature channel corresponding to the identity classification and health status classification respectively.

[0012] S6. Based on the spatial overlap and semantic relevance matrix, suppress the target feature channels in the deep feature map. Input the suppressed deep feature map into the identity classification branch and the health status classification branch respectively, and output the final identity recognition result and health status recognition result.

[0013] Furthermore, image data of single-grain hybrid rice seeds are acquired, including:

[0014] The rice seed samples to be tested are placed under backlighting in a single-grain dispersed state.

[0015] Adjust the aperture, focal length, and exposure parameters of the image acquisition device so that the outline of the rice grains is clear and the chalkiness features inside are visible;

[0016] The image acquisition device is triggered to acquire digital images containing single rice seeds, thus obtaining image data of single hybrid rice seeds.

[0017] Furthermore, based on the image data, key morphological regions are identified and features are extracted and spatially weighted to obtain a depth feature map, including:

[0018] The image data is processed to identify the embryo region and chalky areas of the rice grains as key morphological regions;

[0019] A spatial weight map corresponding to the image data space is generated based on the location information of key morphological regions.

[0020] The image data is processed using a feature extraction network to obtain an initial feature map, and the spatial weight map is multiplied element-wise with the initial feature map to obtain a depth feature map.

[0021] Furthermore, the image data is processed to identify the embryo region and chalky areas of the rice grains as key morphological regions, including:

[0022] The ventral and dorsal positions of the grains are determined based on the grain outline;

[0023] Within the ventral region, areas with a high incidence of chalkiness, characterized by a milky-white color and loose texture, are segmented based on color and texture features;

[0024] The embryo region is identified at the end of the grain based on a preset geometric template.

[0025] Furthermore, the deep feature maps are input into the identity classification branch and the health status classification branch respectively, and the channel attention weight information on which the decision depends is obtained from the two branches respectively, including:

[0026] Inputting the deep feature map into the identity classification branch yields preliminary results for identity classification, and inputting the deep feature map into the health status classification branch yields preliminary results for health status classification.

[0027] Within the identity classification branch, based on the preliminary results of identity classification and the deep feature map, channel attention weight information on which the identity classification decision depends is generated through the channel attention mechanism.

[0028] Within the health status classification branch, based on the preliminary results of health status classification and the deep feature map, channel attention weight information on which the health status classification decision depends is generated through a channel attention mechanism.

[0029] Furthermore, based on the channel attention weight information of the identity classification branch and the health status classification branch, the semantic relevance matrix between channels in the deep feature map is calculated, including:

[0030] Obtain the channel attention weight vector that the identity classification decision depends on from the identity classification branch, and obtain the channel attention weight vector that the health status classification decision depends on from the health status classification branch.

[0031] The channel attention weight vectors on which identity classification decisions depend and the channel attention weight vectors on which health status classification decisions depend are normalized to obtain normalized identity attention weight vectors and normalized health status attention weight vectors.

[0032] Calculate the correlation measure between each pair of corresponding channel weights in the normalized identity attention weight vector and the normalized health status attention weight vector, and organize the correlation measures of all channel pairs into a semantic correlation matrix.
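The construction of the semantic relevance matrix can be illustrated with a minimal sketch. The text above does not fix the normalization or the correlation measure, so the sketch assumes min-max normalization and takes the product of two normalized weights as the correlation measure; the function name, the example weights, and the 0.5 threshold are hypothetical.

```python
import numpy as np

def semantic_relevance_matrix(w_identity, w_health, eps=1e-8):
    """Return a C x C matrix; entry (i, j) relates identity weight i to health weight j."""
    w_id = np.asarray(w_identity, dtype=float)
    w_he = np.asarray(w_health, dtype=float)
    # Assumed normalization: min-max scaling of each attention vector to [0, 1].
    n_id = (w_id - w_id.min()) / (w_id.max() - w_id.min() + eps)
    n_he = (w_he - w_he.min()) / (w_he.max() - w_he.min() + eps)
    # Assumed correlation measure: product of the two normalized weights.
    return np.outer(n_id, n_he)

# Hypothetical attention weights for a 4-channel feature map.
w_identity = np.array([0.9, 0.1, 0.8, 0.2])
w_health = np.array([0.85, 0.15, 0.1, 0.9])

R = semantic_relevance_matrix(w_identity, w_health)
per_channel = np.diag(R)                   # relevance of each channel with itself across tasks
targets = np.flatnonzero(per_channel > 0.5)  # 0.5 is an illustrative relevance threshold
```

Under these assumptions, the diagonal of the matrix gives one relevance value per channel, which is the quantity compared against the preset relevance threshold when target feature channels are selected.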

[0033] Furthermore, target feature channels are selected from the deep feature maps based on the semantic relevance matrix, and the spatial overlap between the feature activation maps corresponding to the target feature channels for identity classification and health status classification is calculated, including:

[0034] Channels with semantic relevance values higher than a preset relevance threshold are selected from the semantic relevance matrix as target feature channels;

[0035] Extract the feature activation map corresponding to the identity classification of the target feature channel from the deep feature map, and extract the feature activation map corresponding to the health status classification of the target feature channel from the deep feature map;

[0036] The spatial overlap is obtained by calculating, in the spatial dimension, the ratio of the overlap area to the union area of the feature activation map corresponding to the identity classification and the feature activation map corresponding to the health status classification.

[0037] Furthermore, calculating, in the spatial dimension, the ratio of the overlap area to the union area of the feature activation map corresponding to the identity classification and the feature activation map corresponding to the health status classification includes:

[0038] The feature activation map is converted into a binary activation region map through thresholding.

[0039] The number of overlapping pixels between the binarized activation region map corresponding to the identity classification and the binarized activation region map corresponding to the health status classification is calculated as the overlap area.

[0040] Count the pixels that are active in at least one of the two binarized activation region maps as the union area;

[0041] The ratio is obtained by dividing the overlapping area by the union area.
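The four steps above amount to an intersection-over-union computation on binarized activation maps. The following is a minimal sketch; the binarization threshold of 0.5, the assumption that activations are already scaled to [0, 1], and the choice to return 0 for an empty union are not specified in the text and are illustrative only.

```python
import numpy as np

def spatial_overlap(act_identity, act_health, threshold=0.5):
    """Ratio of overlap area to union area of two thresholded activation maps."""
    a = np.asarray(act_identity) >= threshold   # binarized identity activation map
    b = np.asarray(act_health) >= threshold     # binarized health status activation map
    intersection = np.logical_and(a, b).sum()   # pixels active in both maps
    union = np.logical_or(a, b).sum()           # pixels active in at least one map
    if union == 0:
        return 0.0                              # assumed convention for empty maps
    return float(intersection / union)

# Hypothetical 3 x 3 activation maps for one target feature channel.
act_identity = np.array([[0.9, 0.8, 0.1],
                         [0.7, 0.2, 0.0],
                         [0.1, 0.0, 0.0]])
act_health = np.array([[0.9, 0.1, 0.0],
                       [0.6, 0.7, 0.0],
                       [0.0, 0.0, 0.0]])

overlap = spatial_overlap(act_identity, act_health)  # 2 shared pixels / 4 union pixels
```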

[0042] Furthermore, the target feature channels in the deep feature map are suppressed based on the spatial overlap and semantic relevance matrix. The suppressed deep feature map is then input into the identity classification branch and the health status classification branch, respectively, and the final identity recognition result and health status recognition result are output, including:

[0043] For each target feature channel, the channel suppression coefficient is calculated based on its corresponding spatial overlap and semantic relevance value in the semantic relevance matrix.

[0044] The channel suppression coefficient is multiplied, channel by channel, with the activation values of the corresponding target feature channel in the deep feature map to suppress the target feature channel and obtain the suppressed deep feature map.

[0045] The suppressed deep feature map is input into the identity classification branch to obtain the final identity recognition result, and the suppressed deep feature map is input into the health status classification branch to obtain the final health status recognition result.

[0046] Furthermore, the channel suppression coefficient is calculated based on its corresponding spatial overlap and semantic relevance value in the semantic relevance matrix, including:

[0047] The spatial overlap is compared with a preset spatial overlap threshold.

[0048] When the spatial overlap is lower than the spatial overlap threshold, a channel suppression coefficient less than 1 is calculated based on the semantic relevance value.

[0049] When the spatial overlap is not lower than the spatial overlap threshold, the channel suppression coefficient is set to 1.
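The thresholding rule and the channel-wise multiplication can be sketched as follows. The text only requires a coefficient below 1 derived from the semantic relevance value; the linear mapping 1 − strength × relevance, the `strength` parameter, and the overlap threshold of 0.3 are assumptions made for illustration.

```python
import numpy as np

def suppression_coefficients(overlap, relevance, overlap_threshold=0.3, strength=0.8):
    """One coefficient per target channel: below 1 only when spatial overlap is low."""
    overlap = np.asarray(overlap, dtype=float)
    relevance = np.asarray(relevance, dtype=float)
    # Assumed mapping: higher relevance gives stronger suppression.
    # Target channels have relevance above a positive threshold, so this stays below 1.
    suppressed = 1.0 - strength * np.clip(relevance, 0.0, 1.0)
    return np.where(overlap < overlap_threshold, suppressed, 1.0)

def suppress_channels(features, target_idx, coeffs):
    """Scale the target channels of a C x H x W feature map by their coefficients."""
    out = np.array(features, dtype=float, copy=True)
    out[target_idx] *= np.asarray(coeffs)[:, None, None]
    return out

# Two hypothetical target channels (indices 0 and 2) of a 4-channel map.
coeffs = suppression_coefficients(overlap=[0.1, 0.6], relevance=[0.9, 0.8])
features = np.ones((4, 2, 2))
suppressed_map = suppress_channels(features, [0, 2], coeffs)
```

In this example channel 0 has low overlap and is scaled down, while channel 2 has overlap above the threshold and is left unchanged, matching the case where shared attention on the same spatial region is treated as beneficial sharing.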

[0050] Compared with the prior art, the present invention has the following beneficial effects:

[0051] 1. By introducing multi-level association analysis and adaptive feature modulation mechanisms, the accuracy and reliability of hybrid rice purity identification are effectively improved. By utilizing channel attention weights at the task decision level, a semantic relevance matrix is constructed to quantify how strongly the different tasks attend to, and are associated through, each feature channel, thereby revealing possible spurious statistical dependencies. Consistency tests of the spatial activation distribution are performed on highly correlated channels. By calculating spatial overlap, feature confusion is distinguished from beneficial sharing. Suspicious feature channels are then suppressed in a targeted manner based on both semantic and spatial criteria. This achieves adaptive decoupling of multi-task feature representations, enabling the model to remove interference caused by data bias. Even when faced with difficult samples with unconventional association patterns, such as healthy susceptible parents or moldy hybrids, the model can still make stable judgments based on essential distinguishing features, significantly enhancing the model's generalization ability and robustness in actual quality inspection scenarios.

[0052] 2. While improving recognition accuracy, the system also ensures overall practicality and efficiency. From feature extraction guided by key morphological regions to association analysis based on attention weights, and then to feature suppression based on clear rules, the entire process forms a complete and interpretable automated decision-making loop. It does not require complex model structures or a large number of additional parameters, and achieves targeted correction of the defects of existing multi-task learning frameworks. This not only reduces the risk of overfitting and improves the recognition rate of rare sample combinations, but also allows all operations to be integrated into a single forward propagation process, meeting the requirements of batch automated processing and high real-time performance in the seed quality inspection process, and providing a practical and feasible technical means for high-reliability seed purity identification.

Attached Figure Description

[0053] Figure 1 is a flowchart of a method for jointly identifying the identity and health status of hybrid rice varieties according to the present invention.

Detailed Implementation

[0054] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0055] Example: As shown in Figure 1, this invention provides a method for jointly identifying the identity and health status of hybrid rice varieties, comprising:

[0056] S1. Obtain image data of single-grain hybrid rice seeds;

[0057] S2. Based on the image data, determine the key morphological regions and perform feature extraction and spatial weighting to obtain a depth feature map;

[0058] S3. Input the deep feature maps into the identity classification branch and the health status classification branch respectively, and obtain the channel attention weight information on which the decision depends from the two branches respectively.

[0059] S4. Based on the channel attention weight information of the identity classification branch and the health status classification branch, the semantic correlation matrix between channels in the deep feature map is calculated.

[0060] S5. Select the target feature channel from the deep feature map based on the semantic relevance matrix, and calculate the spatial overlap between the feature activation maps of the target feature channel corresponding to the identity classification and health status classification respectively.

[0061] S6. Based on the spatial overlap and semantic relevance matrix, suppress the target feature channels in the deep feature map. Input the suppressed deep feature map into the identity classification branch and the health status classification branch respectively, and output the final identity recognition result and health status recognition result.

[0062] S1. Obtain image data of a single hybrid rice seed, as follows:

[0063] When collecting hybrid rice seed image data, prepare an image acquisition device that includes a uniform backlight source, a transparent platform, and a digital image acquisition device. The digital image acquisition device uses an industrial camera equipped with a fixed-focus lens and mounted on a vertically adjustable bracket. The operator spreads the hybrid rice seed samples to be tested one by one on the transparent platform by hand or with a vibrating feeder, ensuring that each seed is independent and does not contact each other, forming a single-seed dispersed state. Turn on the surface light source under the transparent platform so that the light penetrates the platform and the rice seeds from bottom to top, establishing backlight illumination conditions.

[0064] The operator adjusts the optical and electronic parameters of the industrial camera to optimize imaging. When adjusting the aperture, the lens aperture ring is rotated to set the aperture value within a range of, for example, F8 to F16, with a value such as F11 being selected. This range aims to obtain sufficient depth of field so that the overall outline of the rice grain is clearly imaged. When adjusting the focal length, the lens focus ring is rotated while the real-time preview image is observed until the outer edge of the rice grain shows the highest contrast and sharpness, at which point the focal length is considered correctly adjusted. When adjusting the exposure parameters, the shutter speed and ISO gain are set. The shutter speed is set within a range of, for example, 1/125 second to 1/500 second to avoid motion blur, and the ISO gain is set at a low level, for example, ISO 100 to ISO 400, to reduce noise. The specific combination of values is fine-tuned based on the real-time histogram or the brightness of the preview image. The goal of fine-tuning is to place the gray-level distribution of the main area of the rice grain in the mid-tones and to produce a discernible gray-level difference between the chalky area inside the rice grain and the surrounding endosperm area, so that the chalkiness features inside are visible in the image.

[0065] The operator triggers the digital image acquisition device via a computer command or a physical button. The industrial camera completes the exposure according to the set exposure parameters and converts the optical signal into a digital signal, generating a digital image containing a single rice grain. The digital image is saved in a format such as a 24-bit RGB JPEG file or a losslessly compressed TIFF file; this is the acquired image data of the single hybrid rice grain, which is stored in a designated path on the computer for subsequent processing. Backlighting makes the rice grain appear as a dark foreground against a bright background, enhancing the gradient at the contour edges. The precisely controlled combination of aperture, focal length, and exposure parameters ensures that details from the external morphology to the internal chalky structure are recorded completely and repeatably, providing high-quality input for subsequent analysis. The entire acquisition process eliminates interference from adhesion between grains, uneven lighting, and imaging blur, ensuring that each image corresponds to an independent rice sample with complete characterization information.

[0066] S2. Based on the image data, determine the key morphological regions and perform feature extraction and spatial weighting to obtain a depth feature map, as follows:

[0067] Image data of single-grain hybrid rice seeds is read from the storage path. This image data consists of digital images containing rice grains. Preprocessing is performed on the single-grain hybrid rice image data, including converting the image from the RGB color space to the Lab color space. This conversion process calculates the red, green, and blue components of each pixel into their corresponding lightness component L, color component a, and color component b, based on the standard color space conversion formula. Rice grain contours are then extracted from the preprocessed image using an adaptive thresholding method based on grayscale values. An example of an adaptive thresholding method is the Otsu method. The Otsu method iterates through all possible grayscale thresholds in the image and calculates the inter-class variance between the foreground and background, selecting the grayscale threshold that maximizes the inter-class variance as the final segmentation threshold. The image is then binarized based on this final segmentation threshold. An edge detection algorithm, such as the Canny operator, is then used to extract the outermost set of continuous pixels from the binary image. The extracted set of continuous pixels is defined as the rice grain contour.
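The Otsu thresholding step described above can be sketched in a few lines. This is a vectorized equivalent of iterating over all 256 candidate thresholds, assuming an 8-bit grayscale input; the function name and the synthetic test image are hypothetical, and the color space conversion and Canny edge extraction are not shown.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the gray level t that maximizes inter-class variance (classes: <= t and > t)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    levels = np.arange(256)
    omega0 = np.cumsum(prob)             # probability of the class with values <= t
    mu_cum = np.cumsum(prob * levels)    # cumulative first moment up to t
    mu_total = mu_cum[-1]
    denom = omega0 * (1.0 - omega0)
    valid = denom > 1e-12                # skip thresholds that leave one class empty
    sigma_b = np.zeros(256)
    sigma_b[valid] = (mu_total * omega0[valid] - mu_cum[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))

# Synthetic backlit image: dark grain (40) on a bright background (220).
gray = np.full((8, 8), 220, dtype=np.uint8)
gray[2:6, 2:6] = 40

t = otsu_threshold(gray)
grain_mask = gray <= t                   # under backlighting the grain is the dark class
```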

[0068] The ventral and dorsal positions of rice grains are determined based on the extracted grain outlines. The convex hull of the grain outline is calculated; the convex hull is the smallest convex polygon encompassing the entire grain outline. By comparing the positional relationship between the grain outline points and the convex hull points, concave arc segments on the grain outline are identified. The areas containing these concave arc segments are determined to be the ventral position of the grain, while the areas corresponding to relatively protruding arc segments with gentler curvature are determined to be the dorsal position. High-incidence chalkiness areas are then segmented within the identified ventral region. The segmentation process combines color and texture features. Color feature analysis extracts, for each pixel within the ventral region, the b-component value in the Lab color space, where the b-component represents the yellow-blue dimension. Texture feature analysis characterizes the degree of texture looseness by calculating the gray-level standard deviation within each pixel's neighborhood window, with the neighborhood window size being, for example, 5 pixels × 5 pixels. The color channel threshold and the texture variance threshold are set through statistical analysis of training sample images labeled with chalky regions. The statistical analysis calculates the mean and variance of the b-component values of all labeled chalky pixels, as well as the mean and variance of their neighborhood gray-level standard deviations, and determines the threshold ranges from these statistics. The color channel threshold range is, for example, between 20 and 40, and the texture variance threshold range is, for example, between 15 and 30. For any pixel within the ventral region, if its b-component value is higher than the color channel threshold and its neighborhood gray-level standard deviation is higher than the texture variance threshold, the pixel is identified as belonging to a high-incidence chalkiness area. Connecting all pixels identified in this way forms a connected high-incidence chalkiness area.
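The per-pixel chalkiness rule can be sketched as below, assuming the Lab b-component, a grayscale image, and a ventral-region mask are already available as arrays of equal size. The thresholds 30 and 20 are picked from the example ranges in the text; edge padding at the image border, the function names, and the synthetic data are assumptions, and the final connected-region step is not shown.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_std(gray, window=5):
    """Gray-level standard deviation in a window x window neighborhood of each pixel."""
    pad = window // 2
    padded = np.pad(gray.astype(float), pad, mode="edge")  # assumed border handling
    windows = sliding_window_view(padded, (window, window))
    return windows.std(axis=(-2, -1))

def chalky_mask(b_channel, gray, ventral_mask, b_thresh=30.0, std_thresh=20.0):
    """Pixels in the ventral region whose b value and local texture both exceed thresholds."""
    texture = local_std(gray)
    return ventral_mask & (b_channel > b_thresh) & (texture > std_thresh)

# Synthetic 10 x 10 example: a textured, yellowish patch in rows 3-6, columns 3-6.
gray = np.full((10, 10), 100.0)
ii, jj = np.indices((4, 4))
gray[3:7, 3:7] = np.where((ii + jj) % 2 == 0, 180.0, 60.0)  # checkerboard = loose texture
b_channel = np.full((10, 10), 10.0)
b_channel[3:7, 3:7] = 35.0
ventral_mask = np.zeros((10, 10), dtype=bool)
ventral_mask[:, :6] = True                                   # hypothetical ventral region

mask = chalky_mask(b_channel, gray, ventral_mask)
```

Only the part of the textured patch that lies inside the ventral mask is marked, which reflects the restriction of the search to the ventral region.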

[0069] The embryo region is identified at the end of the grain using a predefined geometric shape template matching method. A predefined geometric shape template for the embryo region is composed of basic geometric shapes, such as a combination of a circle and an adjacent rectangle. The geometric shape template is slid across the end region of the grain outline in the image, and the similarity between the geometric shape template and the current image sub-region is calculated. Similarity calculation uses either normalized cross-correlation or edge gradient direction matching. The shape matching similarity threshold is determined experimentally by testing the matching accuracy of different similarity thresholds on a set of images with known embryo locations. The lowest threshold achieving the required accuracy is selected as the shape matching similarity threshold; for example, 0.7 is used. When the similarity calculated at a certain location exceeds the shape matching similarity threshold, it is determined that an embryo region matching the geometric shape template exists at that location, and the position of the bounding rectangle of the embryo region is recorded as the embryo region. This completes the identification of the embryo region and chalky high-incidence areas of rice grains, which are collectively defined as key morphological regions.
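The sliding-template search with normalized cross-correlation can be sketched as follows. The sketch scans whatever image crop it is given (the caller is assumed to pass the grain-end region), uses the example threshold of 0.7, and returns the bounding rectangle of the best match; the template shape and all names are hypothetical, and edge gradient direction matching is not shown.

```python
import numpy as np

def ncc(patch, template, eps=1e-8):
    """Zero-mean normalized cross-correlation between two equally sized arrays."""
    p = patch - patch.mean()
    t = template - template.mean()
    return float((p * t).sum() / (np.sqrt((p * p).sum() * (t * t).sum()) + eps))

def match_template(image, template, threshold=0.7):
    """Slide the template over the image; return ((y, x, h, w), score) or (None, score)."""
    th, tw = template.shape
    best_score, best_pos = -1.0, None
    for y in range(image.shape[0] - th + 1):
        for x in range(image.shape[1] - tw + 1):
            score = ncc(image[y:y + th, x:x + tw], template)
            if score > best_score:
                best_score, best_pos = score, (y, x)
    if best_score > threshold:
        return (best_pos[0], best_pos[1], th, tw), best_score
    return None, best_score

# Hypothetical 3 x 3 template pasted into an otherwise empty 8 x 8 crop.
template = np.array([[0.0, 1.0, 0.0],
                     [1.0, 1.0, 1.0],
                     [0.0, 1.0, 1.0]])
image = np.zeros((8, 8))
image[4:7, 2:5] = template

box, score = match_template(image, template)
```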

[0070] A spatial weight map is generated based on the location information of the identified key morphological regions, at the same spatial resolution as the original image data. The spatial weight map is a two-dimensional matrix of the same size as the input image, where each value is the weight coefficient for that spatial location. The weight coefficients are set according to the importance of the key morphological regions for variety and health status identification. Agricultural experts regard these regions as crucial identification sites, so they receive higher weights to enhance their feature responses. The weight coefficients are assigned as follows: pixels identified as embryo regions or high-incidence chalkiness areas are assigned higher weight coefficients, with values ranging from 1.2 to 1.8; pixels outside the key morphological regions but within the grain outline are assigned a base weight coefficient of 1.0; and background pixels outside the grain outline are assigned a very low weight coefficient, such as 0.1. To achieve a smooth transition of weights from the key morphological regions to the surrounding regions, the initially assigned weight matrix is subjected to Gaussian filtering. For example, the kernel size of the Gaussian filter is set to 9 pixels × 9 pixels, and the standard deviation is set to 1.5. The matrix obtained after Gaussian filtering is the spatial weight map.
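The weight assignment and smoothing can be sketched as below, using the example kernel size of 9 and standard deviation of 1.5. The key-region weight of 1.5 is one value from the stated 1.2 to 1.8 range; the separable implementation, edge padding, and all names are assumptions.

```python
import numpy as np

def gaussian_kernel_1d(size=9, sigma=1.5):
    x = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(img, size=9, sigma=1.5):
    """Separable Gaussian filter with edge padding (assumed border handling)."""
    k = gaussian_kernel_1d(size, sigma)
    padded = np.pad(img, size // 2, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def spatial_weight_map(grain_mask, key_mask, key_weight=1.5, grain_weight=1.0, bg_weight=0.1):
    """Assign background, grain, and key-region weights, then smooth the transitions."""
    w = np.full(grain_mask.shape, bg_weight, dtype=float)
    w[grain_mask] = grain_weight
    w[key_mask] = key_weight
    return gaussian_blur(w)

# Hypothetical masks: a square grain with a smaller key region inside it.
grain_mask = np.zeros((20, 20), dtype=bool)
grain_mask[4:16, 4:16] = True
key_mask = np.zeros((20, 20), dtype=bool)
key_mask[8:12, 8:12] = True

wmap = spatial_weight_map(grain_mask, key_mask)
```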

[0071] An initial feature map is obtained by processing the original single-grain hybrid rice image data using a feature extraction network. The feature extraction network employs a deep convolutional neural network (DCNN) structure, such as a variant of ResNet or VGG. The original classification layer of the DCNN structure is removed, retaining the convolutional and pooling layers as feature extractors. The single-grain hybrid rice image data is input into the feature extraction network. The image data is first normalized to the pixel value range required during network pre-training, for example, 0 to 1 or -1 to 1. The data undergoes multiple convolutional, non-linear activation, and pooling operations in the feature extraction network, ultimately outputting a three-dimensional tensor at an intermediate layer. This three-dimensional tensor is the initial feature map, with dimensions of C channels × H height × W width. Here, C channels represent the responses of different feature filters, and H height and W width correspond to downsampled versions of the input image space.

[0072] The spatial weight map is multiplied element-wise with the initial feature map to obtain the depth feature map. The spatial weight map has the same size as the original single-grain hybrid rice image data, while the spatial size of the initial feature map is reduced by network downsampling; for example, the height and width each become one-sixteenth of the original. Therefore, the spatial weight map is downsampled by bilinear interpolation to exactly the height H and width W of the initial feature map. The downsampled spatial weight map is then given an extra leading dimension, changing it from a two-dimensional matrix of height H × width W into a three-dimensional tensor of 1 × height H × width W. Through a broadcasting mechanism, this tensor is multiplied element-wise with the initial feature map of C channels × height H × width W, so that every channel is scaled by the same weight at each spatial location. After multiplication, the activation values corresponding to the key morphological regions in the initial feature map are enhanced, while the activation values of the background regions are weakened. The product is the depth feature map, which retains the number of channels C and the spatial dimensions H and W of the initial feature map, but whose content has been spatially modulated according to the importance of the key morphological regions. The entire feature extraction and weighting process does not require retraining the core parameters of the feature extraction network; domain-knowledge-guided feature focusing is achieved simply by applying the spatial weight map.
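The downsampling and broadcast multiplication can be sketched as follows. The bilinear resize uses pixel-center alignment, which is an assumed convention, and samples only the four nearest source pixels without any anti-aliasing; the function names and the synthetic inputs are hypothetical.

```python
import numpy as np

def resize_bilinear(img, out_h, out_w):
    """Resize a 2-D array to (out_h, out_w) by bilinear interpolation."""
    in_h, in_w = img.shape
    ys = np.clip((np.arange(out_h) + 0.5) * in_h / out_h - 0.5, 0, in_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * in_w / out_w - 0.5, 0, in_w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

def apply_spatial_weights(features, weight_map):
    """Scale a C x H x W feature map by a full-resolution spatial weight map."""
    _, h, w = features.shape
    small = resize_bilinear(weight_map, h, w)
    return features * small[None, :, :]      # 1 x H x W broadcasts over all C channels

# 64 x 64 weight map, 8-channel feature map downsampled by a factor of 16.
weight_map = np.full((64, 64), 0.1)
weight_map[16:48, 16:48] = 1.5
features = np.ones((8, 4, 4))

depth_features = apply_spatial_weights(features, weight_map)
```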

[0073] S3. Input the deep feature maps into the identity classification branch and the health status classification branch respectively, and obtain the channel attention weight information on which the decision depends from the two branches, as follows:

[0074] The deep feature map generated by feature extraction and spatial weighting is read. The deep feature map is a three-dimensional tensor with C channels, H height, and W width. Both the identity classification branch and the health status classification branch are pre-built and trained neural network substructures. The network structure of the identity classification branch consists of a first fully connected layer, a second fully connected layer, and a Softmax classification layer connected in sequence. The network structure of the health status classification branch consists of a third fully connected layer, a fourth fully connected layer, and another Softmax classification layer connected in sequence. The number of output neurons in the first fully connected layer is set according to the number of identity categories and task complexity; for example, the number of output neurons in the first fully connected layer can be set to 512; the number of output neurons in the second fully connected layer can be set to 256; the number of output neurons in the third fully connected layer is set according to the number of health status categories; for example, the number of output neurons in the third fully connected layer can be set to 256; and the number of output neurons in the fourth fully connected layer can be set to 128. The specific process of inputting the deep feature map into the identity classification branch to obtain the preliminary results of identity classification is as follows: A global average pooling operation is performed on the deep feature map in terms of spatial dimensions height H and width W. This global average pooling operation involves summing the activation values ​​at all height H and width W positions for each channel in the deep feature map, dividing by the total number of spatial positions, height H multiplied by width W, to obtain a channel description vector with C elements. 
This channel description vector is then input into the first fully connected layer of the identity classification branch, undergoing linear transformation and ReLU nonlinear activation function processing. The output vector of the first fully connected layer is then input into the second fully connected layer of the identity classification branch for linear transformation. Finally, the output vector of the second fully connected layer is input into the Softmax classification layer of the identity classification branch. The Softmax classification layer converts the input vector into a probability distribution vector through exponential normalization. Each element of the probability distribution vector corresponds to the predicted probability of an identity category. This probability distribution vector is the preliminary result of identity classification. Simultaneously, the same deep feature map is input into the health status classification branch to obtain preliminary results of health status classification. Its processing flow is consistent with that of the identity classification branch, but the network parameters are independent. Specifically, the same global average pooling operation is performed on the same deep feature map to obtain the same channel description vector; the channel description vector is input into the third fully connected layer of the health status classification branch, and processed by the linear transformation and ReLU nonlinear activation function of the third fully connected layer; the output vector of the third fully connected layer is input into the fourth fully connected layer of the health status classification branch for linear transformation; the output vector of the fourth fully connected layer is input into the Softmax classification layer of the health status classification branch, and another probability distribution vector is obtained through exponential normalization. 
Each element of the probability distribution vector corresponds to the predicted probability of a health status category. This probability distribution vector is the preliminary result of health status classification.
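The forward pass described in [0074] can be sketched as follows. This is a minimal NumPy illustration, not the patented implementation: the random weights stand in for trained parameters, the helper names (`global_average_pool`, `branch_forward`, `random_branch_params`) are hypothetical, and the final projection from the second fully connected layer to the class scores (`w_out`, `b_out`) is an assumption, since the text does not state how the Softmax classification layer maps its input to the number of categories.

```python
import numpy as np

def global_average_pool(feature_map):
    """Average each channel of a (C, H, W) tensor over its H * W positions."""
    return feature_map.mean(axis=(1, 2))

def softmax(logits):
    """Exponential normalization; subtracting the max only improves stability."""
    exp = np.exp(logits - np.max(logits))
    return exp / exp.sum()

def branch_forward(feature_map, p):
    """Return (channel description vector, class probability vector)."""
    v = global_average_pool(feature_map)           # shape (C,)
    h1 = np.maximum(p["w1"] @ v + p["b1"], 0.0)    # first FC layer + ReLU
    h2 = p["w2"] @ h1 + p["b2"]                    # second FC layer, linear only
    logits = p["w_out"] @ h2 + p["b_out"]          # ASSUMED projection to class scores
    return v, softmax(logits)

def random_branch_params(rng, channels, hidden1, hidden2, n_classes):
    """Hypothetical helper: random weights standing in for trained parameters."""
    return {
        "w1": rng.normal(scale=0.1, size=(hidden1, channels)),
        "b1": np.zeros(hidden1),
        "w2": rng.normal(scale=0.1, size=(hidden2, hidden1)),
        "b2": np.zeros(hidden2),
        "w_out": rng.normal(scale=0.1, size=(n_classes, hidden2)),
        "b_out": np.zeros(n_classes),
    }

rng = np.random.default_rng(0)
feature_map = rng.random((64, 7, 7))               # C=64, H=7, W=7 (illustrative sizes)
identity_params = random_branch_params(rng, 64, 512, 256, n_classes=5)
health_params = random_branch_params(rng, 64, 256, 128, n_classes=3)

channel_vec, identity_probs = branch_forward(feature_map, identity_params)
_, health_probs = branch_forward(feature_map, health_params)
```

Both branches pool the same feature map, so they receive the same channel description vector; only their parameters differ, which matches the "consistent flow, independent parameters" description above.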

[0075] Within the identity classification branch, the channel attention weight information on which the identity classification decision depends is generated by a channel attention mechanism integrated into the branch. The channel attention mechanism is a sub-network that is trained end-to-end together with the other layers of the identity classification branch, with the objective of minimizing the cross-entropy loss of identity classification. After training, the parameters of the sub-network are fixed and used to generate weights during inference. The weight information is generated as follows: First, the channel description vector used to calculate the preliminary result of identity classification is concatenated with the probability distribution vector corresponding to that preliminary result. This concatenation links the two vectors end-to-end to form an expanded feature vector whose dimension is the number of channels C plus the number of identity categories. The expanded feature vector is input into the first fully connected layer of the channel attention mechanism sub-network, which reduces the input dimension to an intermediate dimension, for example, the number of channels C divided by 16. The output of the first fully connected layer is processed by the ReLU activation function. The processed feature vector is then input into the second fully connected layer of the channel attention mechanism sub-network, which restores the intermediate dimension to the original number of channels C. The output of the second fully connected layer is processed by the Sigmoid activation function, which maps each input value to the interval between 0 and 1. The final output of the Sigmoid activation function is a vector with C elements, each with a value between 0 and 1. The value of the i-th element represents the importance of the i-th channel in the deep feature map to the current preliminary identity classification decision. This vector is defined as the channel attention weight information on which the identity classification decision depends.

[0076] Within the health status classification branch, the channel attention weight information on which the health status classification decision depends is generated by a channel attention mechanism integrated into the branch. Its network structure mirrors that of the channel attention mechanism in the identity classification branch, but the parameters are not shared and are trained independently. The weight information is generated as follows: the channel description vector used to calculate the preliminary result of health status classification is concatenated with the probability distribution vector corresponding to that preliminary result. This concatenation links the two vectors end-to-end to form an expanded feature vector whose dimension is the number of channels C plus the number of health status categories. The expanded feature vector is input into the first fully connected layer of the dedicated channel attention mechanism sub-network in the health status classification branch, which reduces the input dimension to an intermediate dimension, for example, the number of channels C divided by 16. The output of the first fully connected layer is processed by the ReLU activation function; the processed feature vector is then input into the second fully connected layer of the channel attention mechanism sub-network, which restores the intermediate dimension to the original number of channels C; the output of the second fully connected layer is processed by the Sigmoid activation function; the final output of the Sigmoid activation function is a vector with C elements, each with a value between 0 and 1. The value of the j-th element represents the importance of the j-th channel in the deep feature map to the current preliminary health status classification decision. This vector is defined as the channel attention weight information on which the health status classification decision depends.
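The attention sub-network described in [0075] and [0076] has the same shape in both branches: concatenate, reduce, ReLU, restore, Sigmoid. A minimal sketch under stated assumptions follows; the function and parameter names are hypothetical, and the random weights stand in for the independently trained parameters of each branch.

```python
import numpy as np

def sigmoid(x):
    """Map each value to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(channel_vec, class_probs, w_reduce, b_reduce, w_restore, b_restore):
    """Return a length-C vector of channel attention weights."""
    expanded = np.concatenate([channel_vec, class_probs])      # length C + K
    hidden = np.maximum(w_reduce @ expanded + b_reduce, 0.0)   # reduce to C // 16, then ReLU
    return sigmoid(w_restore @ hidden + b_restore)             # restore to C, then Sigmoid

rng = np.random.default_rng(0)
C, K = 64, 5                      # channels and categories (illustrative sizes)
mid = C // 16                     # intermediate dimension from the example in the text
channel_vec = rng.random(C)       # stands in for the pooled channel description vector
class_probs = np.full(K, 1.0 / K) # stands in for the preliminary Softmax output

attention_weights = channel_attention(
    channel_vec, class_probs,
    rng.normal(scale=0.1, size=(mid, C + K)), np.zeros(mid),
    rng.normal(scale=0.1, size=(C, mid)), np.zeros(C),
)
```

Calling the same function with a second, separate set of weights and the health status probabilities gives the health status attention vector.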

[0077] Through the above process, the identity classification branch and the health status classification branch, while generating preliminary results for identity classification and health status classification respectively, also generate corresponding channel attention weight information for identity classification and health status classification decisions based on their respective classification decision logic. Both the channel attention weight information for identity classification and health status classification decisions are one-dimensional vectors of length C, which will be used in subsequent steps to calculate the semantic relevance matrix. The entire implementation process ensures that the generation of attention weights directly depends on the classification decision reasoning path of each branch, thereby capturing the differentiated attention patterns of different tasks to feature channels. The deep feature map, as the common input of the two branches, undergoes independent processing flows within each branch, ultimately producing weight information outputs closely bound to task decisions, providing clear and interpretable quantitative evidence for subsequent analysis of feature correlations between tasks.

[0078] S4. Based on the channel attention weight information of the identity classification branch and the health status classification branch, the semantic correlation matrix between channels in the deep feature map is calculated as follows:

[0079] The channel attention weight vectors for identity classification decisions are obtained from the output of the identity classification branch in the previous step. This vector is a one-dimensional vector generated by the channel attention mechanism within the identity classification branch, and its length is consistent with the total number of channels in the deep feature map. Similarly, the channel attention weight vectors for health status classification decisions are obtained from the output of the health status classification branch in the previous step. This vector is also a one-dimensional vector generated by the channel attention mechanism within the health status classification branch, and its length is also consistent with the total number of channels in the deep feature map. The channel attention weight vectors for identity classification decisions and health status classification decisions are two independent vectors, respectively encoding the importance assessments of each channel in the deep feature map by the identity classification task and the health status classification task.

[0080] The channel attention weight vector used for identity classification decisions is normalized to obtain a normalized identity attention weight vector. The normalization process uses a min-max normalization method. The calculation process involves first traversing all elements in the channel attention weight vector to find the minimum and maximum element values; then subtracting the minimum element value from each element value to obtain the first difference for each element; next, calculating the second difference between the maximum and minimum element values; and finally dividing the first difference by the second difference. After this calculation, all element values ​​in the normalized identity attention weight vector are linearly mapped to a closed interval between 0 and 1. The same normalization process is applied to the channel attention weight vector used for health status classification decisions to obtain a normalized health status attention weight vector. The process is similar: finding the minimum and maximum element values, calculating the third difference between each element and the minimum element value, calculating the fourth difference between the maximum and minimum element values, and dividing each third difference by the fourth difference. All elements in the normalized health status attention weight vector are also linearly mapped to a closed interval between 0 and 1. The purpose of normalization is to eliminate scale bias that may arise from the different numerical ranges of the two original attention weight vectors, ensuring that subsequent correlation measurement calculations are based on the same numerical scale.
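The min-max normalization above can be written in a few lines. This is a sketch only; the handling of a constant vector (maximum equal to minimum, which would make the second difference zero) is not specified in the text, so returning all zeros in that case is an assumption.

```python
def min_max_normalize(weights, eps=1e-12):
    """Linearly map the values in `weights` onto the closed interval [0, 1]."""
    lowest, highest = min(weights), max(weights)
    span = highest - lowest                 # the "second difference" in the text
    if span < eps:
        # ASSUMPTION: the source does not cover a constant vector; return zeros.
        return [0.0 for _ in weights]
    return [(w - lowest) / span for w in weights]  # "first difference" / "second difference"

normalized_identity = min_max_normalize([0.2, 0.5, 0.8])
```

The same function is applied unchanged to the health status attention weight vector.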

[0081] The correlation metric between each pair of corresponding channel weights in the normalized identity attention weight vector and the normalized health status attention weight vector is calculated. A corresponding channel refers to the channel position with the same index number in the two normalized vectors; each index number represents the sequence number of the same feature channel in the deep feature map. For each pair of channels with the same index number, the correlation metric is calculated as follows: obtain the element value of the channel in the normalized identity attention weight vector, denoted as the first normalized weight value; obtain the element value of the channel in the normalized health status attention weight vector, denoted as the second normalized weight value; calculate the square of the first normalized weight value, the square of the second normalized weight value, and twice the product of the first and second normalized weight values; finally, the correlation metric equals twice the product divided by the sum of the square of the first normalized weight value, the square of the second normalized weight value, and a very small positive constant. The very small positive constant is added to prevent the denominator from being zero. This formula has the following characteristics: the closer the first and second normalized weight values are to each other, and the further both are from 0, the closer the correlation metric is to 1; when the two values differ greatly, or one of them is 0, the correlation metric approaches 0. This calculation evaluates how consistently the two classification tasks attend to the same feature channel; the higher the consistency, the closer the correlation metric is to 1.
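Reading the description as "twice the product divided by the sum of the two squares and a small constant", the metric for weights a and b is 2ab / (a² + b² + ε). A minimal sketch follows; the function names and the default value of `eps` are illustrative assumptions, since the text only calls the constant "very small".

```python
def correlation_metric(a, b, eps=1e-8):
    """2ab / (a^2 + b^2 + eps) for two normalized weights of the same channel."""
    return 2.0 * a * b / (a * a + b * b + eps)

def channel_correlations(identity_weights, health_weights, eps=1e-8):
    """Apply the metric to every pair of weights that share a channel index."""
    if len(identity_weights) != len(health_weights):
        raise ValueError("both weight vectors must have one entry per channel")
    return [correlation_metric(a, b, eps)
            for a, b in zip(identity_weights, health_weights)]

metrics = channel_correlations([1.0, 0.5, 0.0, 1.0], [1.0, 0.5, 0.0, 0.1])
```

Equal non-zero weights give a value close to 1, a zero weight gives 0, and strongly differing weights (1.0 against 0.1 here) give a value near 0.2.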

[0082] The calculated correlation metrics for all channel pairs are organized into a semantic relevance matrix. This matrix is a two-dimensional square matrix whose number of rows and number of columns both equal the total number of channels in the deep feature map. The matrix is constructed by filling the elements on its main diagonal with the correlation metric calculated for each channel index and setting all elements off the main diagonal to 0. This is because this step analyzes the semantic association of the same feature channel from the perspectives of the two tasks, rather than the association between different channels. The final semantic relevance matrix is a square matrix with main diagonal elements ranging from 0 to 1 and all other elements equal to 0. Structurally, the matrix represents the degree of consistency between the identity classification importance assessment and the health status classification importance assessment of each feature channel in the deep feature map. Higher diagonal values indicate that the two tasks assess the channel more consistently; lower diagonal values indicate greater discrepancy. This semantic relevance matrix serves as one of the key inputs for selecting target feature channels and performing spatial overlap analysis in subsequent steps.
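Because only the main diagonal carries information, building the matrix is a short step. The sketch below uses plain nested lists and a hypothetical function name; it is one possible representation, not one prescribed by the text.

```python
def build_semantic_relevance_matrix(metrics):
    """Place per-channel metrics on the main diagonal; every other entry is 0."""
    n = len(metrics)
    return [[metrics[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

relevance_matrix = build_semantic_relevance_matrix([0.9, 0.1, 0.75])
```

The value for channel `i` is read back as `relevance_matrix[i][i]`.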

[0083] S5. Select the target feature channel from the deep feature map based on the semantic relevance matrix, and calculate the spatial overlap between the feature activation maps of the target feature channel corresponding to identity classification and health status classification, as follows:

[0084] Obtain the semantic relevance matrix generated by the semantic relevance matrix calculation step. The semantic relevance matrix is a two-dimensional square matrix whose number of rows and number of columns both equal the total number of channels in the deep feature map. Each element on the main diagonal of the semantic relevance matrix is the semantic relevance value of the channel with the corresponding index.

[0085] Channels with semantic relevance values ​​higher than a preset relevance threshold are selected as target feature channels from the semantic relevance matrix. The preset relevance threshold is set through a separate threshold determination process using a validation dataset independent of the training set. This validation dataset contains rice seed images with known identities and health status labels. In this threshold determination process, the complete joint recognition method is run on the validation dataset up to the step of generating the semantic relevance matrix. Then, a series of candidate preset relevance thresholds are tried, generated with fixed step sizes ranging from 0 to 1, for example, starting at 0.5, increasing to 0.9 with a step size of 0.05. For each candidate preset relevance threshold, target feature channels are selected from the semantic relevance matrix based on that candidate value, and all subsequent steps are completed until the final recognition result is obtained. The overall recognition accuracy of the final recognition result corresponding to each candidate preset relevance threshold on the validation dataset is calculated. From all candidate preset relevance thresholds, the candidate value that achieves the highest overall recognition accuracy is selected and formally determined as the preset relevance threshold for use in all subsequent analyses. The specific operation for selecting target feature channels is to traverse each semantic relevance value on the main diagonal of the semantic relevance matrix and compare each traversed semantic relevance value with a preset relevance threshold. If the semantic relevance value corresponding to a certain channel index is greater than the preset relevance threshold, then the channel index is added to the target feature channel list.
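The selection rule and the threshold search can be sketched as below. The callback `evaluate_accuracy` is hypothetical: it stands for running the remaining steps on the validation dataset with a given candidate threshold and returning the overall recognition accuracy, which is outside the scope of this sketch.

```python
def select_target_channels(relevance_matrix, threshold):
    """Indices whose diagonal relevance value is strictly greater than the threshold."""
    return [i for i in range(len(relevance_matrix))
            if relevance_matrix[i][i] > threshold]

def candidate_thresholds(start=0.5, stop=0.9, step=0.05):
    """Fixed-step candidates, e.g. 0.5, 0.55, ..., 0.9 (rounded to avoid float drift)."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + k * step, 10) for k in range(count)]

def choose_threshold(candidates, evaluate_accuracy):
    """Pick the candidate with the highest validation accuracy (first one on ties)."""
    return max(candidates, key=evaluate_accuracy)

matrix = [[0.2, 0.0, 0.0, 0.0],
          [0.0, 0.7, 0.0, 0.0],
          [0.0, 0.0, 0.95, 0.0],
          [0.0, 0.0, 0.0, 0.7]]
targets = select_target_channels(matrix, threshold=0.7)

# Toy accuracy curve peaking at 0.7, used only to exercise the search.
best = choose_threshold(candidate_thresholds(), lambda t: -abs(t - 0.7))
```

Note that the comparison is strict, so a channel whose relevance value equals the threshold is not selected.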

[0086] The feature activation map of the target feature channels corresponding to identity classification is extracted from the deep feature map. The extraction is performed based on the channel indices in the target feature channel list. A new three-dimensional tensor is created whose number of channels equals the number of indices in the target feature channel list and whose height and width are the same as those of the deep feature map. The entire two-dimensional data slice of each channel of the deep feature map that belongs to the target feature channel list is copied to the corresponding channel position in the new tensor according to its order in the list. The filled tensor is the feature activation map of the target feature channels corresponding to identity classification. The feature activation map of the target feature channels corresponding to health status classification is extracted from the same deep feature map, using the same target feature channel list. Following exactly the same steps as for identity classification, another new three-dimensional tensor is created and filled with data to obtain the feature activation map of the target feature channels corresponding to health status classification.
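With the feature map held as a NumPy array of shape (C, H, W), the copy described above reduces to integer-array indexing. This is a sketch with a hypothetical function name; the array library is an assumption, as the text does not name one.

```python
import numpy as np

def extract_target_activation(feature_map, target_channels):
    """Copy the listed channels, in list order, into a new (len(list), H, W) tensor."""
    indices = np.asarray(target_channels, dtype=int)
    return feature_map[indices]   # integer-array indexing returns a copy, not a view

feature_map = np.arange(24, dtype=float).reshape(4, 2, 3)   # C=4, H=2, W=3
activation = extract_target_activation(feature_map, [3, 1])
```

Because the result is a copy, later edits to `activation` leave `feature_map` untouched.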

[0087] The spatial overlap is calculated as the ratio of the overlap area to the union area of the feature activation map corresponding to identity classification and the feature activation map corresponding to health status classification. The first step is to convert the feature activation map corresponding to identity classification into a binarized activation region map through thresholding. Thresholding requires an activation threshold, which is derived from the global pixel value statistics of the feature activation map corresponding to identity classification. Specifically, the mean of all pixel values in that feature activation map is calculated, followed by the standard deviation of all pixel values. The activation threshold is set to the mean plus N times the standard deviation, where N is determined by analyzing the feature distribution; for example, N can be 1 or 2. The conversion traverses each spatial location in the feature activation map corresponding to identity classification and reads the pixel value at that location. If the pixel value is greater than or equal to the activation threshold, the corresponding location in the binarized activation region map is assigned a value of 1; if the pixel value is less than the activation threshold, it is assigned a value of 0. Using exactly the same activation threshold and conversion process, the feature activation map corresponding to health status classification is converted into a binarized activation region map corresponding to health status classification.

[0088] The overlap area is calculated by determining the number of overlapping pixels between the binarized activation region map corresponding to the identity classification and the binarized activation region map corresponding to the health status classification. The specific process for calculating the overlap area is as follows: A counter is initialized with an initial value of 0; then, every identical spatial location of the binarized activation region maps corresponding to the identity and health status classifications is traversed in parallel; at each traversed location, the value of the binarized activation region map corresponding to the identity classification at that location is checked to see if it equals 1, and simultaneously, the value of the binarized activation region map corresponding to the health status classification at that location is also checked to see if it equals 1; if both conditions are met, the counter value is incremented by 1; after the traversal is complete, the final value of the counter is the overlap area.

[0089] The union area is calculated from the total number of active pixels in the two binarized activation region maps. The specific process is as follows: First, count the number of pixels with a value of 1 in the binarized activation region map corresponding to identity classification, denoted as quantity A; then count the number of pixels with a value of 1 in the binarized activation region map corresponding to health status classification, denoted as quantity B; next, add quantity A and quantity B; finally, subtract the overlap area from the sum of quantity A and quantity B to obtain the union area.

[0090] The ratio is obtained by dividing the overlapping area by the union area. Before performing the division operation, a check is performed: if the value of the union area is 0, the spatial overlap degree is directly defined as 0; if the value of the union area is greater than 0, the spatial overlap degree is calculated as the overlapping area divided by the union area. The value of the spatial overlap degree is a real number between 0 and 1.
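The thresholding, overlap area, union area and guarded division of [0087] to [0090] can be sketched together. The function names and the use of NumPy are assumptions; `np.std` here computes the population standard deviation, which is one reasonable reading of "the standard deviation of all pixel values".

```python
import numpy as np

def activation_threshold(activation, n_std=1.0):
    """Mean plus N standard deviations over all pixel values."""
    return activation.mean() + n_std * activation.std()

def spatial_overlap(identity_act, health_act, n_std=1.0):
    """Overlap area divided by union area of the two binarized maps."""
    # The threshold is derived from the identity map and reused for the health map.
    thr = activation_threshold(identity_act, n_std)
    identity_mask = identity_act >= thr
    health_mask = health_act >= thr
    overlap = int(np.count_nonzero(identity_mask & health_mask))
    quantity_a = int(np.count_nonzero(identity_mask))
    quantity_b = int(np.count_nonzero(health_mask))
    union = quantity_a + quantity_b - overlap
    if union == 0:               # no active pixel in either map
        return 0.0
    return overlap / union

identity_act = np.zeros((4, 4))
identity_act[0, 0] = identity_act[0, 1] = 10.0
health_act = np.zeros((4, 4))
health_act[0, 1] = health_act[1, 1] = 10.0
overlap_value = spatial_overlap(identity_act, health_act)   # 1 shared pixel, 3 in the union
```

To obtain one value per target feature channel, the function can be called on each pair of two-dimensional channel slices in turn.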

[0091] S6. Suppress the target feature channels in the deep feature map based on the spatial overlap and semantic relevance matrix. Input the suppressed deep feature map into the identity classification branch and the health status classification branch respectively, and output the final identity recognition result and health status recognition result. The implementation is as follows:

[0092] Obtain the target feature channel list and the spatial overlap corresponding to each target feature channel, both generated by the spatial overlap calculation step. Obtain the semantic relevance matrix generated by the semantic relevance matrix calculation step. Obtain the deep feature map generated by the feature extraction and spatial weighting steps. The deep feature map is a three-dimensional tensor. The target feature channel list is a sequence recording the channel indices selected from the semantic relevance matrix. The spatial overlap is the value obtained by the spatial overlap calculation on the feature activation maps for each target feature channel index in the list. The semantic relevance matrix is a two-dimensional square matrix whose main diagonal elements are the semantic relevance values of the corresponding channels. The semantic relevance value of each target feature channel is read from the main diagonal of the semantic relevance matrix at that channel's index.

[0093] For each target feature channel index in the target feature channel list, the channel suppression coefficient of that target feature channel is calculated based on the spatial overlap corresponding to that target feature channel index and the semantic relevance value extracted from the semantic relevance matrix. Calculating the channel suppression coefficient requires a preset spatial overlap threshold. The spatial overlap threshold is set through a threshold optimization process based on a validation dataset. The validation dataset is a collection of images containing multiple single-grain hybrid rice seeds with accurate identity and health status labels, and this validation dataset is not involved in the main training process of the model. In the threshold optimization process, the joint recognition method is first run on the validation dataset until the spatial overlap of all samples is calculated, thus obtaining the spatial overlap value distribution of all target feature channels on the validation set. Then, a set of candidate spatial overlap threshold values ​​is set. This set of candidate values ​​is generated by uniform sampling within the interval 0 to 1 with a fixed step size, for example, starting from 0.1, increasing to 0.8 with a step size of 0.05. For each candidate spatial overlap threshold, channel suppression is applied to the deep feature maps of all samples in the validation set according to the channel suppression coefficient calculation rules described later, and the suppressed feature maps are used to complete the final classification. Next, the overall accuracy of the model for joint identification of identity and health status on the validation dataset is calculated for each candidate spatial overlap threshold. Finally, the candidate value that achieves the highest overall accuracy is selected from all candidate spatial overlap thresholds and formally determined as the preset spatial overlap threshold. 
The specific rule for calculating the channel suppression coefficient is as follows: the spatial overlap corresponding to the current target feature channel index is compared with the preset spatial overlap threshold. If the spatial overlap value is lower than the preset spatial overlap threshold, the suppression coefficient calculation branch is entered. In the suppression coefficient calculation branch, the channel suppression coefficient is calculated based on the semantic relevance value corresponding to the target feature channel index, and the result is a value less than 1. One calculation method is: the channel suppression coefficient equals the value 1 minus the semantic relevance value of the target feature channel. This calculation method results in channels with higher semantic relevance values ​​having smaller channel suppression coefficients, thus significantly reducing their activation values ​​during suppression operations. If the spatial overlap value is not lower than a preset spatial overlap threshold, the process proceeds to the retention branch. In the retention branch, regardless of the semantic relevance value of the target feature channel, its channel suppression coefficient is directly set to 1. A channel suppression coefficient of 1 means that in subsequent multiplication operations, the activation value of the channel will remain unchanged, i.e., it will not be suppressed.
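The two-branch rule above, using the "1 minus semantic relevance" variant that the text gives as one calculation method, can be sketched as follows. The function name is hypothetical.

```python
def channel_suppression_coefficient(overlap, relevance, overlap_threshold):
    """Suppress only when the spatial overlap is below the threshold."""
    if overlap < overlap_threshold:
        return 1.0 - relevance    # suppression branch: higher relevance, smaller coefficient
    return 1.0                    # retention branch: channel is left unchanged

suppressed_coeff = channel_suppression_coefficient(0.1, 0.8, overlap_threshold=0.3)
retained_coeff = channel_suppression_coefficient(0.5, 0.8, overlap_threshold=0.3)
```

A spatial overlap exactly equal to the threshold falls into the retention branch, since the text sends every value that is "not lower than" the threshold there.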

[0094] The channel suppression coefficient calculated for each target feature channel index is multiplied, channel by channel, with the activation values of the channel that the index points to in the deep feature map. This suppresses the target feature channels and yields the suppressed deep feature map. Specifically, each target feature channel index in the target feature channel list is traversed in sequence. For each index, the channel suppression coefficient calculated for that index is retrieved from the channel suppression coefficient calculation results. Next, the corresponding channel data is located in the deep feature map based on the target feature channel index. This channel data is a two-dimensional matrix of height H by width W. The channel suppression coefficient is then used as a scalar multiplier and multiplied by each element of the two-dimensional matrix, and the original activation value at each position is overwritten with the resulting product. This scalar multiplication is performed on all channels pointed to by indices in the target feature channel list. For channels in the deep feature map whose indices are not in the target feature channel list, no multiplication is performed and their activation values remain unchanged. After all target feature channels have been suppressed, the original deep feature map has been updated to a new three-dimensional tensor, which is the suppressed deep feature map. The suppressed deep feature map is identical to the original deep feature map in spatial height, spatial width, and total number of channels.
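The channel-wise scaling can be sketched as below. Unlike the in-place update described in the text, this sketch works on a copy so that the original tensor stays available for comparison; that choice, the function name, and the use of NumPy are assumptions.

```python
import numpy as np

def suppress_channels(feature_map, target_channels, coefficients):
    """Scale each listed channel of a (C, H, W) tensor by its suppression coefficient."""
    suppressed = np.array(feature_map, dtype=float, copy=True)
    for index, coefficient in zip(target_channels, coefficients):
        suppressed[index] *= coefficient   # every H x W element becomes value * coefficient
    return suppressed

feature_map = np.ones((3, 2, 2))
suppressed_map = suppress_channels(feature_map, target_channels=[1], coefficients=[0.25])
```

Channels that are not listed pass through unchanged, and a coefficient of 1 leaves a listed channel unchanged as well.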

[0095] The suppressed deep feature map is input into the identity classification branch to obtain the final identity recognition result. The identity classification branch is a pre-trained and fixed neural network substructure. The input and calculation process is as follows: First, a global average pooling operation is performed on the suppressed deep feature map. The global average pooling operation sums the activation values ​​of all spatial locations of each channel in the suppressed deep feature map and divides them by the total number of spatial locations to obtain a one-dimensional channel description vector. Then, this channel description vector is input into a series of fully connected layers in the identity classification branch for linear transformation and non-linear activation. Finally, the output vector of the fully connected layers is input into the Softmax classification layer of the identity classification branch. The Softmax classification layer calculates a probability distribution vector through exponential normalization, and this probability distribution vector is the final identity recognition result. The category identifier with the highest probability value is the rice seed identity category determined by the system. The same suppressed deep feature map is input into the health status classification branch to obtain the final health status recognition result. The health status classification branch is also a pre-trained and fixed neural network substructure. 
The input and calculation process is the same as that of the identity classification branch, but the parameters are independent: the same global average pooling operation is performed on the same suppressed deep feature map to obtain the same channel description vector; this channel description vector is input into the series of fully connected layers in the health status classification branch for processing; the processed vector is input into the Softmax classification layer of the health status classification branch to obtain another probability distribution vector, which is the final health status recognition result. The category with the highest probability value is the rice seed health status category determined by the system. The final identity recognition result and the final health status recognition result together serve as the complete analysis conclusion for the current single-grain hybrid rice seed sample.

[0096] All calculations involved in the embodiments are dimensionless numerical calculations, and the preset parameters and thresholds in the calculations are set by those skilled in the art according to the actual situation.

[0097] It should be noted that this invention can be deployed on the device itself to realize embedded applications, or it can run on a PC or other terminal with a user interface, thereby meeting various hardware environments and usage requirements.

[0098] The above embodiments can be implemented, in whole or in part, by software, hardware, firmware, or any combination thereof. When implemented using software, the above embodiments can be implemented, in whole or in part, as a computer program product. A computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, all or part of the processes or functions according to the embodiments of this application are produced. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. Computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired or wireless transmission; wired transmission methods include optical fiber, twisted pair, coaxial cable, etc.; wireless transmission methods include infrared, microwave, etc. A computer-readable storage medium can be any available medium that a computer can access, or a data storage device such as a server or data center that contains one or more sets of available media. Available media can be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media (e.g., solid-state drives).

[0099] Those skilled in the art will understand that, for convenience and brevity, the specific working processes of the systems, devices, and modules described above are not repeated here; reference may be made to the corresponding processes in the foregoing method embodiments.

[0100] In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into modules is only a logical functional division, and other divisions are possible in actual implementation: multiple modules or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or modules, and may be electrical, mechanical, or of another form.

[0101] The modules described as separate components may or may not be physically separate. The components shown as modules may or may not be physical modules; they may be located in one place or distributed across multiple network modules. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs.

[0102] In addition, the functional modules in the various embodiments of this application can be integrated into one processing module, or each module can exist physically separately, or two or more modules can be integrated into one module.

[0103] If a function is implemented as a software module and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0104] The above are merely specific embodiments of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

[0105] In conclusion, the above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.

Claims

1. A method for jointly identifying the identity and health status of hybrid rice varieties, characterized in that the method comprises:
S1. obtaining image data of single-grain hybrid rice seeds;
S2. based on the image data, determining key morphological regions and performing feature extraction and spatial weighting to obtain a deep feature map;
S3. inputting the deep feature map into an identity classification branch and a health status classification branch respectively, and obtaining from each of the two branches the channel attention weight information on which its decision depends;
S4. based on the channel attention weight information of the identity classification branch and the health status classification branch, calculating a semantic relevance matrix between channels in the deep feature map;
S5. selecting target feature channels from the deep feature map based on the semantic relevance matrix, and calculating the spatial overlap between the feature activation maps of each target feature channel corresponding to identity classification and to health status classification respectively;
S6. based on the spatial overlap and the semantic relevance matrix, suppressing the target feature channels in the deep feature map, inputting the suppressed deep feature map into the identity classification branch and the health status classification branch respectively, and outputting the final identity recognition result and the final health status recognition result.

2. The method for jointly identifying the identity and health status of hybrid rice varieties according to claim 1, characterized in that obtaining image data of single-grain hybrid rice seeds comprises:
placing the rice seed samples to be tested under backlighting in a single-grain dispersed state;
adjusting the aperture, focal length, and exposure parameters of an image acquisition device so that the outline of each rice grain is clear and the internal chalkiness features are visible;
triggering the image acquisition device to acquire digital images each containing a single rice seed, thereby obtaining the image data of single-grain hybrid rice seeds.

3. The method for jointly identifying the identity and health status of hybrid rice varieties according to claim 1, characterized in that determining key morphological regions based on the image data and performing feature extraction and spatial weighting to obtain a deep feature map comprises:
processing the image data to identify the embryo region and the chalky region of the rice grain as the key morphological regions;
generating, based on the location information of the key morphological regions, a spatial weight map corresponding to the spatial extent of the image data;
processing the image data with a feature extraction network to obtain an initial feature map, and multiplying the spatial weight map element-wise with the initial feature map to obtain the deep feature map.

4. The method for jointly identifying the identity and health status of hybrid rice varieties according to claim 3, characterized in that processing the image data to identify the embryo region and the chalky region of the rice grain as the key morphological regions comprises:
determining the ventral and dorsal positions of the grain based on the grain outline;
within the ventral region, segmenting, based on color and texture features, the chalkiness-prone areas characterized by a milky-white color and loose texture;
identifying the embryo region at the end of the grain based on a preset geometric template.

5. The method for jointly identifying the identity and health status of hybrid rice varieties according to claim 1, characterized in that inputting the deep feature map into the identity classification branch and the health status classification branch respectively, and obtaining from each of the two branches the channel attention weight information on which its decision depends, comprises:
inputting the deep feature map into the identity classification branch to obtain a preliminary identity classification result, and inputting the deep feature map into the health status classification branch to obtain a preliminary health status classification result;
within the identity classification branch, generating, through a channel attention mechanism and based on the preliminary identity classification result and the deep feature map, the channel attention weight information on which the identity classification decision depends;
within the health status classification branch, generating, through a channel attention mechanism and based on the preliminary health status classification result and the deep feature map, the channel attention weight information on which the health status classification decision depends.

6. The method for jointly identifying the identity and health status of hybrid rice varieties according to claim 1, characterized in that calculating the semantic relevance matrix between channels in the deep feature map based on the channel attention weight information of the identity classification branch and the health status classification branch comprises:
obtaining, from the identity classification branch, the channel attention weight vector on which the identity classification decision depends, and obtaining, from the health status classification branch, the channel attention weight vector on which the health status classification decision depends;
normalizing the two channel attention weight vectors to obtain a normalized identity attention weight vector and a normalized health status attention weight vector;
calculating a correlation measure between each pair of corresponding channel weights in the normalized identity attention weight vector and the normalized health status attention weight vector, and organizing the correlation measures of all channel pairs into the semantic relevance matrix.
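
By way of non-limiting illustration only, and forming no part of the claim, the computation recited in claim 6 may be sketched as follows. The claim does not fix the normalization or the correlation measure, so this sketch assumes min-max normalization and uses the product of the two normalized weights as the measure; the function names are hypothetical.

```python
import numpy as np

def min_max_normalize(weights, eps=1e-12):
    """Rescale a weight vector to [0, 1] (assumed normalization)."""
    w = np.asarray(weights, dtype=float)
    return (w - w.min()) / (w.max() - w.min() + eps)

def semantic_relevance_matrix(identity_weights, health_weights):
    """Entry (i, j) is the product of identity weight i and health weight j.

    The product is an assumed correlation measure: it is large only when
    both branches attend strongly to the channels in question.
    """
    a = min_max_normalize(identity_weights)
    b = min_max_normalize(health_weights)
    return np.outer(a, b)

# Hypothetical attention weights for three channels.
relevance = semantic_relevance_matrix([0.1, 0.9, 0.5], [0.2, 0.8, 0.2])
per_channel = np.diag(relevance)  # relevance of each channel with itself
```

Under these assumptions, the diagonal entry for a channel is high only when both the identity branch and the health status branch rely on that same channel, which is the situation the later selection step looks for.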

7. The method for jointly identifying the identity and health status of hybrid rice varieties according to claim 1, characterized in that selecting target feature channels from the deep feature map based on the semantic relevance matrix, and calculating the spatial overlap between the feature activation maps of each target feature channel corresponding to identity classification and to health status classification respectively, comprises:
selecting, from the semantic relevance matrix, the channels whose semantic relevance value is higher than a preset relevance threshold as the target feature channels;
extracting, from the deep feature map, the feature activation map of the target feature channel corresponding to identity classification and the feature activation map of the target feature channel corresponding to health status classification;
calculating, in the spatial dimension, the ratio of the overlap area to the union area of the feature activation map corresponding to identity classification and the feature activation map corresponding to health status classification, to obtain the spatial overlap.

8. The method for jointly identifying the identity and health status of hybrid rice varieties according to claim 7, characterized in that calculating, in the spatial dimension, the ratio of the overlap area to the union area of the feature activation map corresponding to identity classification and the feature activation map corresponding to health status classification comprises:
converting each feature activation map into a binarized activation region map by thresholding;
counting the pixels that are active in both the binarized activation region map corresponding to identity classification and the binarized activation region map corresponding to health status classification, as the overlap area;
counting the pixels that are active in either of the two binarized activation region maps, as the union area;
dividing the overlap area by the union area to obtain the ratio.
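
By way of non-limiting illustration only, and forming no part of the claims, the overlap computation recited in claims 7 and 8 may be sketched as follows. The binarization threshold of 0.5, the function name, and the convention of returning 0 when neither map has an active pixel are assumptions of this sketch; the claims leave them open.

```python
import numpy as np

def spatial_overlap(act_identity, act_health, threshold=0.5):
    """Ratio of overlap area to union area of two binarized activation maps."""
    mask_identity = np.asarray(act_identity) > threshold  # binarize by thresholding
    mask_health = np.asarray(act_health) > threshold
    overlap_area = np.logical_and(mask_identity, mask_health).sum()
    union_area = np.logical_or(mask_identity, mask_health).sum()
    if union_area == 0:
        return 0.0  # assumed convention: no active pixel in either map
    return float(overlap_area / union_area)

# Hypothetical 2x2 activation maps of one target feature channel.
identity_map = np.array([[0.9, 0.8],
                         [0.1, 0.0]])
health_map = np.array([[0.7, 0.2],
                       [0.6, 0.0]])
overlap = spatial_overlap(identity_map, health_map)  # 1 shared pixel / 3 active pixels
```

A value near 1 means both tasks respond to the same part of the grain; a value near 0 means the channel is driven by different regions for the two tasks.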

9. The method for jointly identifying the identity and health status of hybrid rice varieties according to claim 1, characterized in that suppressing the target feature channels in the deep feature map based on the spatial overlap and the semantic relevance matrix, inputting the suppressed deep feature map into the identity classification branch and the health status classification branch respectively, and outputting the final identity recognition result and the final health status recognition result comprises:
for each target feature channel, calculating a channel suppression coefficient based on its corresponding spatial overlap and its semantic relevance value in the semantic relevance matrix;
multiplying, channel by channel, the channel suppression coefficient with the activation values of the corresponding target feature channel in the deep feature map, so as to suppress the target feature channel and obtain the suppressed deep feature map;
inputting the suppressed deep feature map into the identity classification branch to obtain the final identity recognition result, and inputting the suppressed deep feature map into the health status classification branch to obtain the final health status recognition result.

10. The method for jointly identifying the identity and health status of hybrid rice varieties according to claim 9, characterized in that calculating the channel suppression coefficient based on the corresponding spatial overlap and the semantic relevance value in the semantic relevance matrix comprises:
comparing the spatial overlap with a preset spatial overlap threshold;
when the spatial overlap is lower than the spatial overlap threshold, calculating, based on the semantic relevance value, a channel suppression coefficient less than 1;
when the spatial overlap is not lower than the spatial overlap threshold, setting the channel suppression coefficient to 1.
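
By way of non-limiting illustration only, and forming no part of the claims, the suppression recited in claims 9 and 10 may be sketched as follows. The claims require only that the coefficient be less than 1 and depend on the semantic relevance value when the spatial overlap is below the threshold. The linear form `1 - strength * relevance`, the threshold of 0.3, the strength of 0.5, and the function names are all assumptions of this sketch.

```python
import numpy as np

def suppression_coefficient(overlap, relevance,
                            overlap_threshold=0.3, strength=0.5):
    """Return a coefficient below 1 for low overlap, otherwise exactly 1.

    Assumed form: 1 - strength * relevance. With 0 < strength <= 1 and a
    positive relevance value (target channels exceed the relevance
    threshold), the result lies in [1 - strength, 1).
    """
    if overlap < overlap_threshold:
        return 1.0 - strength * float(np.clip(relevance, 0.0, 1.0))
    return 1.0

def suppress_channels(feature_map, target_channels, overlaps, relevances):
    """Scale each target channel of a (C, H, W) map by its coefficient."""
    suppressed = np.array(feature_map, dtype=float, copy=True)
    for channel, overlap, relevance in zip(target_channels, overlaps, relevances):
        suppressed[channel] *= suppression_coefficient(overlap, relevance)
    return suppressed

# Hypothetical 3-channel map in which channel 1 is a target feature channel.
deep_map = np.ones((3, 2, 2))
suppressed_map = suppress_channels(deep_map, [1], overlaps=[0.1], relevances=[0.8])
```

In this sketch a channel with high semantic relevance but low spatial overlap is attenuated more strongly, while channels whose overlap meets the threshold, and all non-target channels, pass through unchanged.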