A rapid identification method for single soybean varieties based on portable fiber optic spectrometers

By acquiring multi-point spectra of single soybean seeds with a portable fiber optic spectrometer and combining multi-instance learning with an additive angular margin loss, the method addresses single-point spectral bias and the dilution caused by multi-point averaging, enabling rapid and accurate variety identification of single soybean seeds.

CN122306749A · Pending · Publication Date: 2026-06-30 · HEILONGJIANG BAYI AGRICULTURAL UNIVERSITY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
HEILONGJIANG BAYI AGRICULTURAL UNIVERSITY
Filing Date
2026-05-11
Publication Date
2026-06-30

Abstract

This invention relates to a rapid variety identification method for single soybean seeds based on a portable fiber optic spectrometer, belonging to the field of near-infrared spectroscopy. The method models the spectra of multiple measurement points on the same soybean seed as a single bag-level sample, introduces a gated attention mechanism to achieve adaptive weighted fusion of information from different measurement points, and incorporates an additive angular margin loss during the classification stage. This enhances intra-class compactness and inter-class separation within the normalized space and sharpens the decision boundaries, thus achieving stable variety identification of single soybean seeds. In the classification of more than 10 soybean varieties, the method significantly outperforms traditional chemometrics and conventional deep learning models.

Description

Technical Field

[0001] This invention relates to the field of near-infrared spectroscopy technology, specifically to a method for rapid variety identification of single soybean seeds based on a portable fiber optic spectrometer.

Background Technology

[0002] Soybeans are important oilseed and protein crops, widely used in edible oil processing, soy product production, animal feed, and seed distribution. Establishing rapid, accurate, and non-destructive methods for soybean variety identification is crucial for ensuring seed quality and safety, maintaining distribution order, supporting breeding screening, and quality supervision. Existing soybean variety identification methods mainly include morphological identification, field planting identification, and molecular marker identification. Morphological identification is easily affected by environmental conditions and human experience, and its ability to distinguish closely related varieties is limited; field planting identification is time-consuming and costly, making it difficult to meet the needs of rapid detection; while molecular marker methods have high accuracy, they usually require laboratory conditions, reagent consumption, and complex sample processing, limiting their application in rapid on-site screening and large-scale distribution.

[0003] Near-infrared spectroscopy (NIRS) offers advantages such as rapid detection, non-destructive sample handling, and suitability for online applications, and has been widely used in agricultural product component analysis and category identification. For seed samples, NIRS can acquire information related to chemical composition, tissue structure, and moisture status without damaging the seed structure, providing fundamental data for variety identification. Existing NIRS-based seed variety identification methods typically employ single-point measurements or perform equal-weighted averaging of spectra from multiple sampling points. However, the surface of a single soybean seed exhibits variations in hilum structure, curvature, seed coat texture, and uneven distribution of local components, resulting in differences in spectral characterization capabilities at different measurement points. Single-point measurements are susceptible to local scattering, probe contact conditions, and abnormal spectral shapes, making it difficult to represent the true attributes of the whole sample. Simply averaging spectra from multiple points assumes equal contributions from each point, easily diluting key discriminative information. Furthermore, the presence of noise or anomalies at local points further reduces overall identification stability. On the other hand, many existing classification models use the standard Softmax loss function during training, which mainly increases the probability of the correct class, but fails to explicitly constrain the distribution of features in the embedding space. When faced with identification tasks where the inter-class differences between soybean varieties are weak and the fine-grained features are similar, problems such as insufficient compactness of intra-class distribution and unclear inter-class boundaries easily arise, resulting in limited ability of the model to distinguish similar varieties.

[0004] Therefore, there is an urgent need for a variety identification method suited to multi-point near-infrared spectral data of single soybean seeds, one that both models the differences between measurement points and constrains fine-grained decision boundaries. To address the above problems, this invention proposes a rapid single-seed soybean variety identification method based on a portable fiber optic spectrometer. This method constructs bag-level samples from the spectra of multiple measurement points of a single soybean seed, uses a gated attention mechanism to achieve adaptive weighted fusion of information from different points, and combines an additive angular margin loss to sharpen the decision boundaries between categories, thereby improving the stability and accuracy of single-seed soybean variety identification.

Summary of the Invention

[0005] The purpose of this invention is to address the problems in existing technologies, such as significant bias in single-point near-infrared spectral characterization, dilution of key discriminative information by multi-point equal-weighted averaging, and unclear boundaries in fine-grained variety classification. This invention provides a rapid single-seed soybean variety identification method based on a portable fiber optic spectrometer. This method models the spectra from multiple measurement points of the same soybean seed as a single bag-level sample, introduces a gated attention mechanism to achieve adaptive weighted fusion of information from different points, and introduces an additive angular margin loss during the classification stage to enhance intra-class compactness and inter-class separation, thereby achieving stable identification of single-seed soybean varieties.

[0006] The technical solution adopted in this invention includes the following steps:

(1) Select the soybean seed samples to be identified, assign a unique seed number to each soybean seed, and collect near-infrared diffuse reflectance spectral data at multiple measurement points on the same soybean seed; during collection, record the variety label, seed number and point number corresponding to each spectrum to establish a multi-point near-infrared spectral dataset at the single-seed sample level.

(2) For each spectrum in the multi-point near-infrared spectral dataset, compare Savitzky-Golay smoothing, first derivative, second derivative, standard normal variate transformation, multiplicative scatter correction and standardization, applied individually and in pairwise combination. Use the MMGA-Net model to screen each preprocessing scheme, and use the five-fold cross-validation results for accuracy, precision, recall and F1 score as the comprehensive evaluation basis to determine the optimal preprocessing method for subsequent modeling, so as to correct noise interference, baseline drift, scattering differences and scale differences in the original spectrum.

(3) Based on multi-instance learning, model the preprocessed multi-point spectral data at the bag level. The multiple spectra of the same soybean seed collected at different measurement points are regarded as one bag-level sample, and the single spectrum corresponding to each measurement point in the bag is regarded as an instance sample. The instance samples of a bag share the same bag-level label, so that the spectra of all points of the same soybean seed are processed as one bag-level sample during both the training and inference stages.

(4) Input the spectral instances corresponding to each measurement point within the bag-level sample into a multilayer perceptron encoder with shared parameters. A unified nonlinear mapping structure performs instance-level feature extraction, feature compression, and embedding representation learning on the spectrum of each point, thereby obtaining the instance embedding representation $h_k$ that corresponds one-to-one with each measurement point.

(5) Input the instance embedding representation $h_k$ corresponding to each measurement point into a gated attention aggregation module. Within this module, a content branch and a gated branch are constructed for each instance's embedded features. The content branch generates a semantic response related to category discrimination, while the gated branch generates a gating signal that selectively modulates the semantic response. The outputs of the content branch and the gated branch are fused element-wise and then linearly mapped to obtain the attention score corresponding to each measurement point. The attention scores are then normalized to obtain the attention weight corresponding to each measurement point. Using these attention weights, the embedded features of the instances within the bag are weighted and summed to obtain a bag-level representation corresponding to a single soybean seed.

(6) Input the bag-level representation into the classification layer for variety discrimination training. During the training phase, an additive angular margin loss is introduced. The bag-level representation and the prototype vectors of each category are normalized, and an angular margin and a scaling factor are introduced into the classification output corresponding to the true category to construct a classification model for single-seed soybean variety identification. During the inference phase, the variety identification model no longer introduces the additive angular margin. Instead, it calculates the score of each category from the cosine similarity between the normalized bag-level representation and the normalized category prototype vectors, and scales the score by the scaling factor. The category with the maximum score is output as the variety of the single soybean seed to be identified, thus completing the construction of a rapid single-seed soybean variety identification model.

(7) Select the soybean seeds to be identified, collect near-infrared diffuse reflectance spectral data using a portable fiber optic spectrometer, preprocess the spectral data and input it into the identification model to complete the rapid variety identification of single soybean seeds.

[0007] As a further improvement of the present invention, the specific process of single soybean seed spectral sampling and bag-level modeling in steps (1) and (3) is as follows: Four representative measurement points are set at different surface structural regions of a single soybean seed: the hilum, the back of the hilum, the lateral side, and the top region. These four measurement points characterize the spectral differences between different surface parts of a single soybean seed. Near-infrared spectra of the same soybean seed are collected at the four measurement points, and the spectra of these four points are used as four instance samples in the same bag-level sample. A multi-instance learning approach is used to model the single soybean seed as a whole.
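The bag construction described above can be illustrated with a short NumPy sketch. It is not taken from the patent: the record layout `(seed_id, point, variety, spectrum)`, the point names in `POINTS`, and the function name `build_bags` are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical names for the four measurement points described in the text.
POINTS = ("hilum", "hilum_back", "side", "top")

def build_bags(records):
    """Group per-point spectra into one bag per seed.

    records: iterable of (seed_id, point, variety, spectrum) tuples.
    Returns (seed_ids, bags, labels); bags has shape (n_seeds, 4, n_wavelengths).
    """
    by_seed = {}
    for seed_id, point, variety, spectrum in records:
        entry = by_seed.setdefault(seed_id, {"variety": variety, "spectra": {}})
        if entry["variety"] != variety:
            raise ValueError(f"seed {seed_id!r} has conflicting variety labels")
        entry["spectra"][point] = np.asarray(spectrum, dtype=float)

    seed_ids, bags, labels = [], [], []
    for seed_id in sorted(by_seed):
        entry = by_seed[seed_id]
        missing = [p for p in POINTS if p not in entry["spectra"]]
        if missing:
            raise ValueError(f"seed {seed_id!r} is missing points: {missing}")
        # A fixed point order keeps instance k aligned with the same surface region.
        bags.append(np.stack([entry["spectra"][p] for p in POINTS]))
        labels.append(entry["variety"])
        seed_ids.append(seed_id)
    return seed_ids, np.stack(bags), labels
```

Every instance inherits the label of its bag, so the label is stored once per seed rather than once per spectrum.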

[0008] As a further improvement of the present invention, the gated attention aggregation module in step (5) works as follows: The gated attention aggregation module adopts a dual-branch structure consisting of a content branch and a gated branch. The content branch performs a linear transformation on the instance embedding features and applies a tanh activation function to extract semantic responses related to category discrimination. The gated branch performs a linear transformation on the instance embedding features and applies a sigmoid activation function to generate a gating signal that selectively modulates the semantic responses. The outputs of the two branches are multiplied element-wise and then linearly mapped to obtain the attention score $r_{ik}$ of the $k$-th measurement point of the $i$-th bag. A softmax function with temperature coefficient $\tau$ then normalizes the attention scores over the measurement points of the bag to obtain the attention weight $a_{ik}$ corresponding to each measurement point. Its expression is:

$$a_{ik} = \frac{\exp(r_{ik}/\tau)}{\sum_{j=1}^{K} \exp(r_{ij}/\tau)} \qquad (1)$$

where $K$ is the number of measurement points in the bag.
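As a rough illustration of this dual-branch aggregation, the following NumPy sketch computes the attention weights and the pooled bag representation for one bag. The matrix names `V`, `U`, the vector `w`, the omission of bias terms, and the convention of dividing the scores by the temperature are assumptions; the patent text does not give layer sizes or parameter names.

```python
import numpy as np

def gated_attention_pool(H, V, U, w, tau=1.0):
    """Gated attention pooling over the instances of one bag.

    H: (K, d) instance embeddings; V, U: (d, p) branch weights; w: (p,) score vector.
    Returns (z, a): the (d,) bag representation and the (K,) attention weights.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    content = np.tanh(H @ V)                  # content branch
    gate = 1.0 / (1.0 + np.exp(-(H @ U)))     # gated branch (sigmoid)
    scores = (content * gate) @ w             # one attention score per point
    scaled = scores / tau
    scaled = scaled - scaled.max()            # stabilise the softmax numerically
    a = np.exp(scaled)
    a = a / a.sum()
    z = a @ H                                 # attention-weighted sum of embeddings
    return z, a
```

A larger `tau` flattens the weights toward equal-weighted averaging, while a smaller `tau` concentrates them on the highest-scoring measurement point.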

[0009] As a further improvement of the present invention, the additive angular margin loss in step (6) is implemented as follows: The bag-level representation and the prototype vectors of each class are L2-normalized, so that classification is based on the normalized feature directions. During the training phase, the additive angular margin is introduced only for the classification output corresponding to the true class, and a scaling factor is used to scale the classification output, which tightens the clustering of same-class samples in the feature space and enlarges the angular separation between different classes. The classification output during the training phase satisfies:

$$o_{i,c} = \begin{cases} s\cos(\theta_{i,c} + m), & c = y_i \\ s\cos\theta_{i,c}, & c \neq y_i \end{cases} \qquad (2)$$

where $o_{i,c}$ is the output of the $i$-th sample for class $c$, $s$ is the scaling factor, $m$ is the additive angular margin, $\theta_{i,c}$ is the angle between the normalized bag-level representation of the $i$-th sample and the normalized prototype vector of class $c$, and $y_i$ is the true class of the $i$-th sample. During the inference phase, the additive angular margin is no longer explicitly introduced; instead, a score is computed from the cosine similarity between the normalized bag-level representation and the normalized category prototype vectors, and the category with the highest score is output.
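The training-time and inference-time outputs described above can be sketched in NumPy as follows. The function names and the default values `s=30.0` and `m=0.5` are assumptions (the patent text does not state the values used), and the sketch does not handle the case where the angle plus the margin exceeds π.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)

def aam_logits(Z, W, y=None, s=30.0, m=0.5):
    """Scaled cosine logits with an additive angular margin on the true class.

    Z: (n, d) bag-level representations; W: (C, d) class prototype vectors.
    y: (n,) integer labels for training, or None for inference (no margin).
    """
    cos = np.clip(l2_normalize(Z) @ l2_normalize(W).T, -1.0, 1.0)
    if y is None:
        return s * cos                      # inference: scaled cosine similarity
    y = np.asarray(y)
    theta = np.arccos(cos)
    theta[np.arange(len(y)), y] += m        # margin only on the true-class angle
    return s * np.cos(theta)

def cross_entropy(logits, y):
    """Mean softmax cross-entropy, computed with the log-sum-exp shift."""
    y = np.asarray(y)
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(y)), y].mean()
```

At inference, `aam_logits(Z, W).argmax(axis=1)` gives the predicted class index; the scaling factor does not change the argmax but keeps the scores on the same scale as during training.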

[0010] Compared with the prior art, the beneficial effects of the present invention are as follows: This invention collects near-infrared spectra from multiple representative measurement points on a single soybean seed and combines multi-instance learning, a gated attention mechanism, and an additive angular margin loss for unified modeling. Multi-instance learning uses a single-seed sample as the bag-level modeling unit, preventing the data leakage that occurs when spectra from different points on the same soybean seed appear in both the training and test sets. The gated attention mechanism adaptively weights and fuses information from different measurement points, avoiding the bias of single-point measurements and the dilution of key information caused by equal-weighted averaging of multiple points. The additive angular margin loss enhances intra-class compactness and inter-class separation in fine-grained soybean variety identification, thereby achieving rapid, non-destructive, stable, and highly accurate variety identification of single soybean seeds.

Attached Figure Description

[0011] Figure 1 is a flowchart of the soybean variety identification modeling process of the present invention; Figure 2 shows the original four-point spectra in the embodiment; Figure 3 shows the preprocessed four-point spectra in the embodiment; Figure 4 is a schematic diagram of the principle of the additive angular margin loss in the embodiment; Figure 5 shows the intra-class / inter-class distance distributions under the standard Softmax loss function and the additive angular margin loss in the embodiment; Figure 6 is a t-SNE feature visualization comparing the standard Softmax loss function and the additive angular margin loss in the embodiment; Figure 7 is a visualization of the attention weights in the embodiment.

Detailed Implementation

[0012] The present invention will now be described in further detail with reference to the accompanying drawings and specific embodiments.

[0013] This experiment selected single soybean seed samples from different soybean ecological regions in China as the research object. The samples covered the Northeast spring soybean region, the Huang-Huai-Hai summer soybean region, and the southern multi-cropping soybean region. Specific collection locations included Harbin and Heihe in Heilongjiang Province, Cangzhou in Hebei Province, Zhumadian in Henan Province, Linyi, Weifang, and Jinan in Shandong Province, Suqian in Jiangsu Province, and Hengyang in Hunan Province. Fifty intact soybean seeds were selected from each variety as samples. The samples were sealed and stored at room temperature under dry conditions. Before collection, the seeds were manually screened to remove damaged, moldy, insect-infested, and obviously shriveled samples, and surface dust was removed. To minimize the influence of environmental differences on the measurement results, all samples were kept in the same experimental environment for 2 hours so that temperature and humidity reached relative equilibrium before near-infrared spectral acquisition. The specific implementation steps of this invention are as follows:

S1: Diffuse reflectance spectra of individual soybean seeds were acquired using a near-infrared fiber optic spectrometer. The acquisition wavelength range was 1600 nm–2400 nm, with each spectrum containing 200 data points. The exposure time of a single acquisition was 1.27 ms, and the average of multiple scans at each measurement point was taken as the final spectrum for that point. Before formal acquisition, the spectral acquisition device was preheated, and white reference calibration was performed before and during acquisition to reduce the impact of ambient light changes, dark current drift, and light source fluctuations on the original spectral signal. Considering the hilum structure, curvature variations, seed coat texture differences, and uneven local component distribution on the surface of individual soybean seeds, four measurement points were set on each soybean seed: the hilum, the back of the hilum, the side, and the top. During acquisition, the probe was kept in close contact with the seed surface, and the contact pressure and measurement angle were kept as consistent as possible across measurement points to reduce random errors caused by manual operation. Four near-infrared diffuse reflectance spectra were collected for each soybean seed at the four measurement points mentioned above, and the variety label, seed number and point number corresponding to each spectrum were recorded simultaneously to form a multi-point near-infrared spectral dataset at the single-seed sample level. Figure 2 shows the original spectra at the four measurement points.
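The scan averaging and white reference calibration in S1 can be sketched as below. The patent text does not give a formula, so the common dark-corrected ratio (sample minus dark, divided by white minus dark) is assumed here, as are the function name `to_reflectance` and the evenly spaced wavelength axis.

```python
import numpy as np

# Assumed wavelength axis: 200 evenly spaced points over 1600-2400 nm.
WAVELENGTHS_NM = np.linspace(1600.0, 2400.0, 200)

def to_reflectance(raw_scans, white_ref, dark_ref, eps=1e-9):
    """Average repeated scans at one point and convert to relative reflectance.

    raw_scans: (n_scans, n_wavelengths) detector counts for one measurement point.
    white_ref, dark_ref: (n_wavelengths,) white reference and dark current counts.
    """
    raw = np.asarray(raw_scans, dtype=float).mean(axis=0)   # multi-scan average
    white = np.asarray(white_ref, dtype=float)
    dark = np.asarray(dark_ref, dtype=float)
    denom = np.maximum(white - dark, eps)                   # guard against division by zero
    return (raw - dark) / denom
```

Re-measuring `white_ref` during a session, as the text describes, limits the effect of lamp drift on this ratio.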

[0014] S2: The acquired raw spectra are input into the preprocessing module. The multi-point near-infrared spectral data are processed with Savitzky-Golay smoothing (SG), first derivative (FD), second derivative (SD), standard normal variate transformation (SNV), multiplicative scatter correction (MSC), or standardization, applied individually or in combination, to correct noise interference, baseline drift, scattering differences, and scale differences in the raw spectra. To determine the most suitable preprocessing scheme for this embodiment, the MMGA-Net model is used to evaluate both single and pairwise preprocessing methods, and the accuracy, precision, recall, and F1 score of the training-set cross-validation are used as comprehensive evaluation metrics. Table 1 presents the performance comparison of the different preprocessing methods. As shown in Table 1, the choice of preprocessing method has a significant impact on model performance. Across all indicators, the combination of first derivative and standardization (FD+Standardization) achieves the best overall performance with the MMGA-Net model. Its accuracy, precision, recall and F1 score reach 0.9660, 0.9714, 0.9660 and 0.9654, respectively, which are better than those of the other single and combined preprocessing methods. Therefore, in this embodiment, the combination of first derivative and standardization is preferred for preprocessing the original spectra. Figure 3 shows the preprocessed spectra at the four measurement points.

After preprocessing, the spectral data are grouped per soybean seed, so that the four spectra collected from the four measurement points of the same soybean seed constitute a single bag-level sample. This bag-level sample serves as the basic input unit for subsequent modeling and discrimination. The bag-level sample is modeled using a multi-instance learning approach. Each single spectrum corresponding to a measurement point within the bag is considered an instance sample, and the instance samples corresponding to the same soybean seed share the same bag-level label. This ensures that training, validation, and testing are all conducted with a single soybean seed as the smallest unit, so that spectra from different points of the same seed never appear in different data subsets.

Table 1 Preprocessing Results
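The preferred FD+Standardization preprocessing and the seed-level data split can be sketched as follows. Several details are assumptions because the patent text does not specify them: the first derivative is taken with `np.gradient` (a Savitzky-Golay derivative would be another option), standardization is a per-wavelength z-score fitted on the training bags only, and the folds are random rather than stratified by variety. All function names are hypothetical.

```python
import numpy as np

def first_derivative(bags):
    """Finite-difference first derivative along the wavelength axis."""
    return np.gradient(bags, axis=-1)

def fit_standardizer(train_bags, eps=1e-12):
    """Per-wavelength mean and standard deviation, fitted on training bags only."""
    flat = train_bags.reshape(-1, train_bags.shape[-1])
    return flat.mean(axis=0), np.maximum(flat.std(axis=0), eps)

def apply_standardizer(bags, mean, std):
    return (bags - mean) / std

def seed_level_folds(n_seeds, n_folds=5, seed=0):
    """Split seed (bag) indices into folds, so all spectra of a seed stay together."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_seeds), n_folds)
```

Because the folds index bags of shape `(n_seeds, 4, n_wavelengths)` rather than individual spectra, the four spectra of one seed always land in the same fold.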

[0015] S3: Based on the natural multi-instance structure formed by the multi-point near-infrared spectra of single soybean seeds, an MMGA-Net variety identification model is constructed. The MMGA-Net variety identification model is an end-to-end identification model integrating a multilayer perceptron, multi-instance learning, a gated attention mechanism, and an additive angular margin loss. It is used for bag-level modeling, feature aggregation, and variety discrimination of the multi-point near-infrared spectral data of single soybean seeds. The spectral instances corresponding to each measurement point within the bag are input into the shared-parameter multilayer perceptron encoder of the MMGA-Net variety identification model. The encoder adopts a nonlinear mapping structure composed of stacked linear layers, batch normalization layers, and rectified linear unit (ReLU) activation layers, performing instance-level feature extraction, feature compression, and embedding representation learning on the spectrum of each point. The input spectral data is first mapped to a 256-dimensional latent space and then further compressed into a 128-dimensional instance embedding representation $h_k$, which reduces the original spectral dimensionality while retaining the spectral shape information that is beneficial for class discrimination.
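A forward pass through such a shared encoder might look like the NumPy sketch below, shown in inference mode with stored batch normalization statistics. Only the 200 → 256 → 128 dimensions come from the text; the use of batch normalization and ReLU after the second linear layer, the initialization, and the names `init_block`, `block_forward`, and `encode_bag` are assumptions.

```python
import numpy as np

def init_block(rng, n_in, n_out):
    """Parameters of one linear + batch-norm + ReLU block (hypothetical layout)."""
    return {
        "W": rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)),
        "b": np.zeros(n_out),
        "gamma": np.ones(n_out), "beta": np.zeros(n_out),   # batch-norm affine terms
        "mean": np.zeros(n_out), "var": np.ones(n_out),     # batch-norm running stats
    }

def block_forward(x, p, eps=1e-5):
    h = x @ p["W"] + p["b"]
    h = p["gamma"] * (h - p["mean"]) / np.sqrt(p["var"] + eps) + p["beta"]
    return np.maximum(h, 0.0)                               # ReLU

def encode_bag(bag, params):
    """Apply the same blocks to every instance of a (K, n_wavelengths) bag."""
    h = bag
    for p in params:
        h = block_forward(h, p)
    return h                                                # (K, embedding_dim)
```

Since the rows of the bag pass through identical parameters, two measurement points with the same spectrum receive the same embedding, which is what parameter sharing means here.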

[0016] S4: Input the instance embedding representation $h_k$ corresponding to each measurement point into the gated attention aggregation module of the MMGA-Net variety identification model. The gated attention aggregation module includes a content branch and a gated branch. The content branch generates a semantic response related to category discrimination through a linear transformation followed by a tanh activation function; the gated branch generates a gating signal that selectively modulates the semantic response through a linear transformation followed by a sigmoid activation function. The outputs of the content branch and the gated branch are multiplied element-wise and then linearly mapped to obtain the attention score $r_{ik}$ for each measurement point. The attention scores at each measurement point are normalized into attention weights using the softmax function. A temperature coefficient is introduced during normalization to adjust the smoothness of the attention weight distribution over the measurement points. Finally, the embedded features of the instances within the bag are weighted and summed according to the attention weights of the measurement points to obtain the bag-level representation $z_i$ corresponding to a single soybean seed.

[0017] S5: The bag-level representation $z_i$ is input into the classification layer of the MMGA-Net variety identification model for variety discrimination training, and an additive angular margin loss is introduced. First, the bag-level representation $z_i$ and the prototype vectors of each category are L2-normalized, so that classification is based on the normalized feature directions. During the training phase, an additive angular margin is introduced only for the classification output corresponding to the true category, and the classification output is scaled by a scaling factor to tighten the clustering of same-class samples in the feature space and enlarge the angular separation between different categories. Figure 4 is a schematic diagram illustrating the principle of the additive angular margin loss. As Figure 4 shows, without the angular margin the classification boundaries of the samples lie close together, and the boundaries between different categories blur easily. With the additive angular margin, a sample of the true category must lie closer to the direction of its category center to be classified correctly, which pulls same-class samples further toward the category center in the normalized feature space and promotes larger angular separation between different categories. Since the additive angular margin can only be applied when the true category label is known, it is no longer explicitly introduced during the inference stage. Instead, a score is calculated from the cosine similarity between the normalized bag-level representation and the normalized category prototype vectors and scaled by the scaling factor $s$. The category with the highest score is output as the variety of the single soybean seed to be identified, thus completing the construction of a rapid single-seed soybean variety identification model.

[0018] S6: Select soybean seeds to be identified, collect near-infrared diffuse reflectance spectral data using a portable fiber optic spectrometer, preprocess the spectral data and input it into the identification model to complete the rapid identification of single soybean varieties.

[0019] To verify the classification performance of the MMGA-Net variety identification model, it was compared with other models, and the results are shown in Table 2. Furthermore, a comparative analysis was performed between the standard Softmax loss function and the additive angular margin loss, and the results are shown in Figure 5 and Figure 6. As shown in Table 2, the MMGA-Net variety identification model outperforms the other models in terms of accuracy, precision, recall, and F1 score. Specifically, compared to the MLP and 1D-CNN models, the MMGA-Net model improves accuracy by 6.22 percentage points and 29.00 percentage points, respectively, and precision by 5.63 percentage points and 24.84 percentage points, respectively, indicating that the MMGA-Net variety identification model has superior classification performance. In Figure 5, A is the intra-class / inter-class distance distribution under the standard Softmax loss function and B is the distribution under the additive angular margin loss. In Figure 6, A is the t-SNE feature visualization under the standard Softmax loss function and B is the visualization under the additive angular margin loss. Figure 5 shows that after adopting the additive angular margin loss, samples of the same class are more concentrated in the feature space, the overall intra-class distance is reduced, and the overlap between different classes is significantly reduced, indicating that this loss function effectively enhances intra-class compactness and improves inter-class separation. Figure 6 likewise shows that the clustering structure of each category in the feature space is clearer, the category boundaries are more distinct, and the mixing between categories is significantly reduced. This indicates that with the additive angular margin loss, the MMGA-Net variety identification model obtains clearer decision boundaries and stronger feature discrimination ability, thereby achieving better single-seed soybean variety identification results.

Table 2 Comparison of results from multiple models
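For reference, the four reported metrics can be computed from predicted and true class indices as in the sketch below. The patent text does not state how precision, recall, and F1 are averaged across varieties; macro averaging is assumed here, and the function name `macro_metrics` is hypothetical.

```python
import numpy as np

def macro_metrics(y_true, y_pred, n_classes):
    """Accuracy plus macro-averaged precision, recall and F1 from class indices."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)          # rows: true class, columns: predicted
    tp = np.diag(cm).astype(float)
    pred_count = cm.sum(axis=0)
    true_count = cm.sum(axis=1)
    precision = np.divide(tp, pred_count, out=np.zeros_like(tp), where=pred_count > 0)
    recall = np.divide(tp, true_count, out=np.zeros_like(tp), where=true_count > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": precision.mean(),
        "recall": recall.mean(),
        "f1": f1.mean(),
    }
```

A difference between two models in "percentage points" is then simply `100 * (metric_a - metric_b)` for the same metric.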

[0020] To analyze how the gated attention aggregation module uses information from the different points, the attention weight vectors corresponding to each measurement point are output and visualized. Figure 7 is a visualization of the attention weights, in which A represents the average attention weight under 10 random seeds, B represents the attention weight of correctly classified samples, C represents the attention weight of samples misclassified due to keypoint collapse, and D represents the attention weight of samples misclassified due to peripheral point shift. Figure 7 shows that the relative contributions of the different measurement points to single-seed soybean variety identification vary. The average weights of the hilum point and the lateral point are relatively high, indicating that these points provide strong discriminative information. When the classification is correct, the attention weight is usually more concentrated on high-contribution points; when the classification is incorrect, the attention distribution may shift towards low-contribution points. Therefore, the attention weights output by the gated attention aggregation module can be used to analyze the role of different points in single-seed soybean variety identification and provide a reference for subsequent point selection, sampling optimization, and lightweight deployment.

Claims

1. A method for rapid identification of single soybean varieties based on a portable fiber optic spectrometer, characterized in that it comprises the following steps:
(1) Select the soybean seed samples to be identified, assign a unique seed number to each soybean seed, and collect near-infrared diffuse reflectance spectral data at multiple measurement points on the same soybean seed; during collection, simultaneously record the variety label, seed number and point number corresponding to each spectrum, so as to establish a multi-point near-infrared spectral dataset at the single-seed sample level;
(2) For each spectrum in the multi-point near-infrared spectral dataset, compare the individual and pairwise combinations of Savitzky-Golay smoothing, first derivative, second derivative, standard normal variate transformation, multiplicative scatter correction and standardization as preprocessing methods; use the MMGA-Net model to screen each preprocessing scheme, taking the five-fold cross-validation results for accuracy, precision, recall and F1 score as the comprehensive evaluation basis, and determine the optimal preprocessing method for subsequent modeling, so as to correct noise interference, baseline drift, scattering differences and scale differences in the original spectra;
(3) Based on multi-instance learning, model the preprocessed multi-point spectral data at the package level: the multiple spectra of the same soybean seed collected at different measurement points are regarded as one package-level sample, the single spectrum corresponding to each measurement point in the package is regarded as an instance sample, and the multiple instance samples are assigned the same package-level label, so that the spectra of all points of the same soybean seed are processed as one package-level sample during both the training and inference stages;
(4) Input the spectral instances corresponding to each measurement point within the package-level sample into a multilayer perceptron encoder with shared parameters, and use this unified nonlinear mapping structure to perform instance-level feature extraction, feature compression and embedding representation learning on the spectrum of each point, thereby obtaining the instance embedding representation $h_k$ that corresponds one-to-one with each measurement point;
(5) Input the instance embedding representation $h_k$ corresponding to each measurement point into a gated attention aggregation module. Within this module, a content branch and a gated branch are constructed for the embedded features of each instance. The content branch generates a semantic response related to category discrimination, while the gated branch generates a gated signal that selectively modulates the semantic response. The outputs of the content branch and the gated branch are fused element-wise and then linearly mapped to obtain the attention score corresponding to each measurement point. The attention scores are then normalized to obtain the attention weight corresponding to each measurement point. Based on these attention weights, the embedded features of the instances within the package are weighted and summed to obtain the package-level representation corresponding to a single soybean seed;
(6) Input the package-level representation into the classification layer for variety discrimination training. During the training phase, an additive angular margin loss is introduced: the package-level representation and the prototype vector of each category are normalized, and an angular margin and a scaling factor are introduced into the classification output corresponding to the true category, so as to construct a classification model for single soybean variety identification. During the inference phase, the variety identification model no longer introduces the additive angular margin. Instead, it calculates the score of each category from the cosine similarity between the normalized package-level representation and the normalized category prototype vectors, scales the score by the scaling factor, and outputs the category with the maximum score as the variety of the single soybean seed to be identified, thus completing the construction of the rapid single soybean variety identification model;
(7) Select the soybean seeds to be identified, collect near-infrared diffuse reflectance spectral data using a portable fiber optic spectrometer, preprocess the spectral data and input it into the identification model to complete the rapid identification of single soybean varieties.

2. The method for rapid identification of single soybean varieties based on a portable fiber optic spectrometer according to claim 1, characterized in that the multiple measurement points in step (1) are four measurement points set according to different surface structure regions of a single soybean seed, namely the hilum, the back of the hilum, the side, and the top region. Four near-infrared diffuse reflectance spectra are collected for each soybean seed, one at each of the four measurement points. In step (3), these four near-infrared diffuse reflectance spectra are used as the four instance samples of the same package-level sample. The near-infrared spectra are collected using a near-infrared fiber optic spectrometer in diffuse reflectance mode. The wavelength range is 1600 nm to 2400 nm, each spectrum contains 200 data points, and the exposure time of a single acquisition is 1.27 ms. Each measurement point is scanned multiple times, and the average is taken as the final spectrum of that measurement point.

3. The method for rapid identification of single soybean varieties based on a portable fiber optic spectrometer according to claim 1, characterized in that the gated attention aggregation module in step (5) adopts a dual-branch structure consisting of a content branch and a gated branch. The content branch applies a linear layer followed by a tanh function to extract semantic responses related to category discrimination. The gated branch applies a linear layer followed by a sigmoid function to generate a gated signal that selectively modulates the semantic responses. The outputs of the two branches are multiplied element-wise and then linearly mapped to obtain the attention score $r_{ik}$ at the $k$-th measurement point, and a softmax function with temperature coefficient $\tau$ normalizes the attention scores of the measurement points to obtain the attention weight $a_{ik}$ corresponding to each measurement point, expressed as:

$$a_{ik} = \frac{\exp(r_{ik}/\tau)}{\sum_{j}\exp(r_{ij}/\tau)} \quad (1)$$

where $j$ runs over the measurement points of the $i$-th package-level sample.

4. The method for rapid identification of single soybean varieties based on a portable fiber optic spectrometer according to claim 1, characterized in that the additive angular margin loss in step (6) includes the following processing steps: L2 normalization is performed on the package-level representation and the prototype vector of each category, so that classification is based on the normalized feature direction; during the training phase, an additive angular margin is introduced only for the classification output corresponding to the true category, and a scaling factor is used to scale the classification output. The classification output during the training phase satisfies:

$$z_{ic} = \begin{cases} s\cos(\theta_{ic} + m), & c = y_i \\ s\cos\theta_{ic}, & c \neq y_i \end{cases} \quad (2)$$

where $z_{ic}$ is the output of the $i$-th sample for the $c$-th category, $s$ is the scaling factor, $m$ is the additive angular margin, $\theta_{ic}$ is the angle between the normalized package-level representation of the $i$-th sample and the $c$-th category prototype vector, and $y_i$ is the true category of the $i$-th sample; during the inference phase, scores are computed from the cosine similarity between the normalized package-level representation and the normalized category prototype vectors, and the category with the highest score is output.
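
The classification stage described in step (6) and claim 4 can be illustrated with the sketch below. It assumes that the training loss is cross-entropy over the margin-adjusted outputs, which is the usual choice for additive angular margin losses but is not stated in the claims. The function names, the values `s=30.0` and `m=0.5`, and the toy prototype vectors are hypothetical. The sketch also assumes non-zero vectors and omits the special handling some implementations apply when the angle plus the margin exceeds π.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def aam_logits(bag_repr, prototypes, s=30.0, m=0.5, true_class=None):
    """Scaled cosine outputs; the margin m is added only for true_class.

    With true_class=None (inference) no margin is applied.
    s and m are hypothetical illustration values.
    """
    z = l2_normalize(bag_repr)
    logits = []
    for c, proto in enumerate(prototypes):
        p = l2_normalize(proto)
        cos_theta = sum(a * b for a, b in zip(z, p))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard acos against rounding
        if true_class is not None and c == true_class:
            theta = math.acos(cos_theta)
            logits.append(s * math.cos(theta + m))  # training: true category
        else:
            logits.append(s * cos_theta)            # other categories / inference
    return logits


def cross_entropy(logits, true_class):
    """Softmax cross-entropy for one sample (assumed training objective)."""
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_sum - logits[true_class]


def predict(bag_repr, prototypes, s=30.0):
    """Inference: category with the largest scaled cosine similarity."""
    logits = aam_logits(bag_repr, prototypes, s=s)
    return max(range(len(logits)), key=logits.__getitem__)


# Toy example: three hypothetical category prototypes in two dimensions.
protos = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bag = [2.0, 0.1]

train_logits = aam_logits(bag, protos, true_class=0)
infer_logits = aam_logits(bag, protos)
print("training loss:", cross_entropy(train_logits, 0))
print("predicted category:", predict(bag, protos))
```

Because the margin lowers only the true-category output during training, the model has to pull each package-level representation closer in angle to its own prototype to reach the same loss, which is what produces the tighter intra-class clustering and wider inter-class separation described in the abstract.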