A propolis component intelligent identification and traceability system and method

By fusing near-infrared spectroscopy and HPLC data with CNN-LSTM models and combining them with blockchain technology, the accuracy and efficiency issues of propolis component identification and traceability have been solved, realizing intelligent propolis component identification and traceability, and enhancing market supervision and consumer trust.

CN120948407B · Active · Publication Date: 2026-06-23 · BEIJING ZHIFENGTANG PROD LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BEIJING ZHIFENGTANG PROD LTD
Filing Date
2025-09-23
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Existing propolis component identification technologies suffer from low accuracy and efficiency, difficulty in adapting to market changes, unreliable traceability information, and data processing defects, resulting in poor identification and traceability effects.

Method used

Near-infrared spectroscopy and HPLC multi-source data are fused, and data preprocessing is performed using Savitzky-Golay filtering and Asymmetric Least Squares algorithm. A CNN-LSTM hybrid model is constructed for identification. Blockchain technology is used to ensure the credibility and transparency of traceability information. Feature weights are optimized and the model is updated online through mutual information entropy optimization, thereby realizing intelligent component identification and traceability.

Benefits of technology

It achieves high accuracy and efficiency in identifying propolis components, adapts to market changes, enhances the credibility and transparency of the traceability system, protects consumer rights and market order, and improves regulatory efficiency and the practicality of data processing.

✦ Generated by Eureka AI based on patent content.

Abstract

The application discloses an intelligent identification and traceability system and method for propolis components, relating to the technical field of computer vision and pattern recognition. The system comprises the following modules: a data acquisition module that collects spectral data, chromatographic data, and traceability information such as origin and processing; a data preprocessing unit that performs filtering, baseline correction, and denoising, eliminates overlapping peaks, normalizes the data, and synchronizes timestamps; a component feature extraction module that screens 10 spectral characteristic peaks, calculates chromatographic features, concatenates the two, and reduces the result to 32 dimensions through principal component analysis; an intelligent identification module that determines purity grades and confidence levels using a CNN-LSTM model; a traceability management module that stores full-link information on a blockchain and manages query permissions in a distributed database; and a result output module that displays results visually and triggers an alarm when the confidence level or the traceability integrity falls below the required threshold. The application improves identification accuracy through multi-source fusion, the model can be updated online to adapt to new scenarios, and propolis quality and market order are safeguarded efficiently.

Description

Technical Field

[0001] This invention relates to the field of computer vision and pattern recognition technology, and in particular to a system and method for intelligent identification and traceability of propolis components.

Background Technology

[0002] Current propolis component identification technologies have significant limitations, failing to meet the demands for precision and efficiency. Traditional propolis identification relies heavily on human sensory perception and single detection methods: manual identification, based on observing color, smelling odor, and tasting, is highly subjective, easily influenced by experience, and struggles to detect minor adulteration; while near-infrared spectroscopy is convenient, it only captures surface spectral information and is greatly affected by sample particle size and humidity variations, leading to a high false positive rate when propolis and adulterants have similar spectral characteristics; high-performance liquid chromatography (HPLC), while accurately detecting active ingredient content, is complex, time-consuming, and dependent on professional personnel and laboratory conditions, failing to meet the needs for rapid on-site identification. Furthermore, existing identification models are mostly trained on fixed samples, exhibiting weak generalization ability. When faced with new adulteration methods (such as resin modification and the addition of synthetic flavonoids), model performance rapidly declines, making it difficult to adapt to the constantly evolving types of adulteration in the market, hindering regulatory authorities from effectively combating propolis counterfeiting.

[0003] The current propolis traceability system suffers from unreliable data and a lack of transparency, failing to guarantee consumers' right to know and maintain market order. Existing traceability systems often rely on centralized databases, allowing processing companies to unilaterally modify key data such as origin and processing techniques, posing a high risk of data tampering. Some companies, in an effort to reduce costs, fail to fully record processing parameters or even falsify test reports, leading to a disconnect between traceability information and reality. When consumers search for traceability information, they can only obtain the simple name of the origin, unable to view crucial details such as active ingredient content and testing processes, making it difficult to assess propolis quality. Regulatory authorities must manually verify the authenticity of traceability data, a cumbersome and inefficient process that hinders real-time monitoring of propolis samples across the entire region. When adulterated propolis enters the market, it is difficult to quickly trace the responsible party, resulting in delayed recalls of problematic propolis and harming consumer health and market trust.

[0004] Deficiencies in data processing and model application further hinder the effective implementation of propolis identification and traceability technologies. Existing systems lack a robust data quality control mechanism. Near-infrared spectroscopy acquisition is prone to low signal-to-noise ratios and large spectral fluctuations due to poor instrument stability. Overlapping peaks in HPLC data are not effectively removed, leading to decreased accuracy in subsequent feature extraction. Feature vector construction often fails to consider differences in feature importance, treating low-contribution features such as background spectra equally with high-contribution features such as active ingredient peak areas, increasing computational load and reducing identification accuracy. Furthermore, the models lack continuous optimization capabilities. When new propolis samples from new origins or new types of adulteration appear, extensive re-collection and full-scale training are necessary, which is time-consuming and labor-intensive. The impact of environmental interference on the model is not adequately considered during training, making it difficult for the model to maintain stable performance in practical applications and unable to provide reliable technical support for long-term propolis quality control.

Summary of the Invention

[0005] This invention proposes an intelligent identification and traceability system and method for propolis components to solve the problems mentioned in the prior art.

[0006] To achieve the above objectives, the present invention adopts the following technical solution: a propolis component intelligent identification and traceability system, comprising the following modules:

[0007] The data acquisition module includes a near-infrared spectroscopy acquisition unit, a high-performance liquid chromatography (HPLC) acquisition unit, and a traceability information acquisition unit. The near-infrared spectroscopy acquisition unit uses a Fourier transform near-infrared spectrometer, scanning each sample three times to obtain the average spectrum. The HPLC acquisition unit uses an ultraviolet detector, collecting data on five types of active ingredients according to a gradient elution program. The traceability information acquisition unit scans and reads the propolis packaging identification code and collects the information.

[0008] The data preprocessing unit uses Savitzky-Golay filtering to remove noise from near-infrared spectral data and then performs baseline correction using the Asymmetric Least Squares algorithm; it uses a Gaussian peak fitting algorithm to extract chromatographic peaks from high-performance liquid chromatography data and normalizes each of the two types of data to the [0,1] interval.

[0009] The component feature extraction module includes a spectral feature extraction unit and a chromatographic feature extraction unit. The spectral feature extraction unit selects 10 characteristic peaks and calculates peak parameters, while the chromatographic feature extraction unit calculates chromatographic parameters of 5 types of active ingredients. The two types of features are concatenated into a 60-dimensional initial vector, which is then reduced to 32 dimensions by principal component analysis.

[0010] The intelligent identification module includes a model training unit and a real-time identification unit. The model training unit constructs a "CNN-LSTM" model, trains it with 2000 propolis samples, and combines it with data augmentation. The real-time identification unit takes the 32-dimensional feature vector of the sample to be tested as input and outputs the purity level and confidence level.

[0011] The traceability management module includes a blockchain storage unit and a distributed database unit. The blockchain storage unit deploys 3 nodes and adopts a proof-of-authority (authorization proof) consensus mechanism to upload propolis information to the blockchain, with each record generating a unique hash value. The distributed database unit stores user query permission data, with different users corresponding to different query ranges.

[0012] The results output module includes a visualization terminal unit and an alarm unit. The visualization terminal unit displays the content of active ingredients, purity level, confidence level, and traceability link diagram of the propolis sample in real time. When the identification confidence level is <0.8 or the traceability information integrity is <90%, the alarm unit triggers an audio-visual prompt and pushes abnormal information to the regulatory department's terminal.

[0013] Furthermore, it also includes a component similarity calculation unit, which is used in the intelligent identification module to quantify the component matching degree between the sample to be tested and the standard sample. The calculation method is as follows: [formula not reproduced in the source text], where S is the component similarity; n is the feature dimension; A_i is the normalized value of the i-th feature of the sample to be tested; and B_i is the normalized value of the i-th feature of the standard genuine sample. When S ≥ 0.95, it is judged as "high similarity", corresponding to the purity level of "excellent" or "good"; when 0.8 ≤ S < 0.95, it is judged as "medium similarity", which needs to be verified by high-performance liquid chromatography of the active ingredient content; when S < 0.8, it is judged as "low similarity", and is directly marked as "adulterated" or "non-genuine".
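
The similarity formula itself is missing from this text, so the following is only a minimal sketch of the grading logic in [0013]. It assumes, purely for illustration, that S is the cosine similarity between the two normalized feature vectors; the patent's actual formula may differ. The function names `component_similarity` and `similarity_band` are hypothetical.

```python
import math


def component_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors.

    ASSUMPTION: the source does not reproduce the formula for S;
    cosine similarity is used here only as a stand-in.
    """
    if len(a) != len(b) or not a:
        raise ValueError("feature vectors must be non-empty and of equal length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector matches nothing
    return dot / (norm_a * norm_b)


def similarity_band(s):
    """Map S to the three bands stated in [0013]."""
    if s >= 0.95:
        return "high"    # purity level "excellent" or "good"
    if s >= 0.8:
        return "medium"  # requires HPLC verification of active ingredient content
    return "low"         # marked "adulterated" or "non-genuine"
```

Only the thresholds 0.95 and 0.8 come from the text; any other similarity measure could be substituted for `component_similarity` without changing `similarity_band`.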

[0014] Furthermore, the traceability management module also includes a traceability credibility assessment unit. The traceability credibility is calculated as C = w1E + w2P, where C is the traceability credibility; w1 and w2 are weighting coefficients; E is the matching degree of the place of origin environment; and P is the completeness of processing records. Through this assessment, when C ≥ 0.9, it is judged as "high credibility", and the traceability information is directly displayed to the public; when 0.7 ≤ C < 0.9, it is judged as "medium credibility", and the processing enterprise needs to supplement the missing records before displaying them; when C < 0.7, it is judged as "low credibility", and the sale and circulation of the propolis sample is restricted, and the regulatory authorities are notified for verification.
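
As a rough illustration of the credibility assessment in [0014], the sketch below computes C = w1E + w2P and maps it to the three stated actions. The source does not give values for w1 and w2; the equal weights used as defaults here are an assumption, as are the function names and the returned action labels.

```python
def traceability_credibility(e, p, w1=0.5, w2=0.5):
    """C = w1*E + w2*P.

    e: matching degree of the place-of-origin environment (0..1)
    p: completeness of processing records (0..1)
    ASSUMPTION: w1 = w2 = 0.5; the source does not state the weights.
    """
    return w1 * e + w2 * p


def credibility_action(c):
    """Map C to the handling described in [0014]."""
    if c >= 0.9:
        return "display"              # high credibility: shown to the public
    if c >= 0.7:
        return "supplement_records"   # medium: enterprise must fill in missing records
    return "restrict_and_notify"      # low: restrict circulation, notify regulators
```

For example, E = 1.0 and P = 0.8 with equal weights gives C of about 0.9, which lands at the edge of the high-credibility band.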

[0015] Furthermore, the component feature extraction module also includes a feature weight optimization unit, which uses mutual information entropy to calculate the importance of each feature to the identification result and optimizes the contribution ratio of the feature vector; specifically, it calculates the mutual information value I(X,Y) between each feature and the purity level label, where X is the feature value and Y is the purity level label, sorts them from largest to smallest according to the mutual information value I(X,Y), retains the top 25 high-importance features, and removes 7 low-importance features from the original 32-dimensional features.
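
The feature ranking in [0015] can be sketched as follows. The source does not say how I(X,Y) is estimated for continuous features; this example assumes a simple equal-width binning estimator with a hypothetical bin count of 4. The names `mutual_information` and `top_features` are illustrative.

```python
import math
from collections import Counter


def mutual_information(xs, ys, bins=4):
    """Estimate I(X;Y) in nats for one feature column and discrete labels.

    ASSUMPTION: X is discretized into `bins` equal-width bins; the source
    does not specify the estimator.
    """
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / bins or 1.0  # constant column: everything falls in one bin
    binned = [min(int((x - lo) / width), bins - 1) for x in xs]
    n = len(xs)
    px, py, pxy = Counter(binned), Counter(ys), Counter(zip(binned, ys))
    return sum(
        (c / n) * math.log((c / n) / ((px[bx] / n) * (py[y] / n)))
        for (bx, y), c in pxy.items()
    )


def top_features(columns, labels, keep=25):
    """Indices of the `keep` columns with the highest estimated I(X;Y)."""
    scores = [mutual_information(col, labels) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])
```

With 32 columns and `keep=25`, the 7 lowest-scoring features are dropped, which matches the counts given in the text.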

[0016] Furthermore, the intelligent identification module also includes an online model update unit, which extracts identification logs from a distributed database for the past 7 days each week. After verification and labeling of the true purity level by a professional testing institution, the logs are added to the training set. The professional testing institution uses liquid chromatography-mass spectrometry to confirm the components. An incremental training method is adopted, fixing the CNN layer parameters and updating only the weights of the LSTM layer and output layer. The learning rate is set to 1e-4, which is 1 / 5 of the initial training learning rate. The update is completed after 20 iterations. After each update, the accuracy is verified through a test set. When the accuracy improves by ≥2%, the new model is saved and the original model is replaced.
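
The update settings in [0016] can be collected into a small configuration object, sketched below. It assumes that "accuracy improves by ≥2%" means an absolute gain of two percentage points (the text does not say whether the gain is absolute or relative), and it derives the initial learning rate of 5e-4 from the statement that 1e-4 is 1/5 of it. All class, field, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IncrementalUpdateConfig:
    initial_learning_rate: float = 5e-4          # implied by "1e-4 is 1/5 of the initial rate"
    learning_rate: float = 1e-4                  # stated in [0016]
    iterations: int = 20                         # stated in [0016]
    frozen_layers: tuple = ("cnn",)              # CNN parameters stay fixed
    trainable_layers: tuple = ("lstm", "output")
    min_accuracy_gain: float = 0.02              # ASSUMPTION: absolute gain on the test set


def should_replace_model(old_accuracy, new_accuracy, cfg=IncrementalUpdateConfig()):
    """True when the updated model should replace the original one."""
    # A small tolerance avoids rejecting an exact 2-point gain due to float rounding.
    return (new_accuracy - old_accuracy) >= cfg.min_accuracy_gain - 1e-12
```

Under this reading, a candidate model that moves test accuracy from 0.90 to 0.93 is kept, while one that only reaches 0.91 is discarded.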

[0017] Furthermore, the data acquisition module also includes a data quality detection unit, which detects the signal-to-noise ratio and spectral smoothness of near-infrared spectral data. When the signal-to-noise ratio is <30dB or the fluctuation is >5%, the spectrometer is controlled to rescan the sample. For high-performance liquid chromatography data, the peak resolution and peak shape symmetry are detected. Peak shape symmetry requires a symmetry factor of 0.8-1.2. When the resolution is <1.5 or the symmetry factor is out of range, the mobile phase gradient is adjusted and the sample is re-injected. The completeness of the required fields of the traceability information is also checked, and a prompt for supplementation is issued when missing fields are found.
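
The three quality gates in [0017] reduce to simple threshold checks. The sketch below uses the stated limits; the function names and the example field names are hypothetical, and how the signal-to-noise ratio, fluctuation, resolution, and symmetry factor are measured is outside its scope.

```python
def nir_needs_rescan(snr_db, fluctuation_pct):
    """Rescan when SNR < 30 dB or spectral fluctuation > 5 %."""
    return snr_db < 30 or fluctuation_pct > 5


def hplc_needs_reinjection(resolution, symmetry_factor):
    """Re-inject when resolution < 1.5 or the symmetry factor is outside 0.8-1.2."""
    return resolution < 1.5 or not (0.8 <= symmetry_factor <= 1.2)


def missing_fields(record, required):
    """Required traceability fields that are absent or empty in `record`."""
    return [name for name in required if record.get(name) in (None, "")]
```

A call such as `missing_fields({"origin": "X", "harvest_time": ""}, ["origin", "harvest_time", "processor"])` lists the fields that would trigger a supplementation prompt.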

[0018] Furthermore, the result output module also includes a user interaction unit, supporting personalized queries and feedback for users with different roles; consumers can scan the code to view simplified traceability information and identification summaries, processing enterprises can view processing records and identification data of each batch through enterprise terminals, generate quality analysis reports, and regulatory authorities can view the identification anomaly statistics of propolis samples across the region through regulatory terminals, then initiate special verification and record the results into the blockchain.

[0019] Furthermore, a method for the intelligent identification and traceability of propolis components includes the following steps:

[0020] Step 1: Data acquisition and preprocessing. Diffuse reflectance spectra of propolis samples were acquired using a near-infrared spectrometer, with each sample scanned three times and the average spectrum taken. Chromatographic data of five types of active ingredients were acquired using high-performance liquid chromatography. Propolis identification codes were read using a barcode scanner to associate and collect traceability information. Savitzky-Golay filtering and Asymmetric Least Squares baseline correction were applied to the spectral data, and both types of data were normalized to the [0,1] interval.

[0021] Step 2: Component feature extraction. Ten spectral feature peaks were selected using partial least squares discriminant analysis, and the peak area, full width at half maximum (FWHM), and peak intensity of each feature peak were calculated. The retention time, peak height, and peak area of five active components in high-performance liquid chromatography (HPLC) were calculated. The spectral features and chromatographic features were concatenated to form an initial feature vector. Principal component analysis was used to reduce the dimensionality of the initial feature vector to a 32-dimensional feature vector. The feature weights were then optimized using mutual information entropy, retaining 25 highly important features.

[0022] Step 3: Intelligent identification. The optimized feature vector is input into the "CNN-LSTM" hybrid model, which outputs the purity level and confidence score; simultaneously, the component similarity S is calculated. When S≥0.95 and the confidence level is ≥0.8, the sample is judged as genuine; when S<0.8, it is directly marked as adulterated. Every week, the identification logs of the past 7 days are extracted from the distributed database, and low-confidence samples with a confidence level <0.7 are screened. After being reviewed and labeled by a professional testing agency, they are added to the training set, and the model parameters are updated by incremental training.
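
The decision rule and the weekly log screening in Step 3 might look like the sketch below. The step only defines the "genuine" and "adulterated" outcomes; routing every other case to a "needs_verification" result is an assumption, as are the log record layout (`timestamp`, `confidence`) and the function names.

```python
from datetime import datetime, timedelta


def verdict(similarity, confidence):
    """Apply the Step 3 thresholds to S and the model confidence."""
    if similarity < 0.8:
        return "adulterated"
    if similarity >= 0.95 and confidence >= 0.8:
        return "genuine"
    return "needs_verification"  # ASSUMPTION: the step does not name this case


def select_for_review(logs, now: datetime, days=7, max_confidence=0.7):
    """Logs from the past `days` days whose confidence is below `max_confidence`.

    Each log is assumed to be a dict with "timestamp" (datetime) and
    "confidence" (float) keys; this layout is hypothetical.
    """
    cutoff = now - timedelta(days=days)
    return [
        entry for entry in logs
        if entry["timestamp"] >= cutoff and entry["confidence"] < max_confidence
    ]
```

The selected entries would then go to the testing agency for labeling before being added to the training set.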

[0023] Step 4: Traceability Management. Upload the origin environment data, processing parameters, and test report number to the blockchain storage. The blockchain adopts a proof-of-authority (authorization proof) consensus mechanism, and each record generates a unique hash value. Calculate the traceability credibility: C = w1E + w2P. When C ≥ 0.9, the complete traceability link is displayed. When C < 0.7, the circulation of the propolis sample is restricted and the regulatory authorities are notified.

[0024] Step 5: Results Output. The active ingredient content, purity level, and traceability link of the propolis sample are displayed through a visual terminal. An audible and visual alarm is triggered when the identification confidence level is <0.8 or the traceability confidence level C is <0.7. Consumers can query a simplified version of the traceability information through mobile devices, processing enterprises can query quality analysis reports through enterprise terminals, and regulatory departments can query identification anomaly statistics and provide feedback on the verification results through regulatory terminals.

[0025] Furthermore, the training process of the "CNN-LSTM" hybrid model in step three also includes cross-validation and data augmentation optimization. Five-fold cross-validation is used, dividing the 2000 training samples into 5 groups. Each time, 4 groups are used as the training set and 1 group is used as the validation set. The cross-validation is completed in 5 cycles. The accuracy and loss value of each validation are calculated, and the average value is taken as the initial performance of the model. Data augmentation adds Gaussian noise and spectral rotation to simulate environmental interference in the actual acquisition process. After training, the model is validated on a test set containing 100 new samples.
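
The five-fold split in [0025] can be illustrated with index bookkeeping alone. This sketch assumes the samples are shuffled before splitting (the source does not say) and uses a hypothetical fixed seed for reproducibility; model training and scoring are left out.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of the k rounds.

    ASSUMPTION: samples are shuffled once with a fixed seed before splitting.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k groups of near-equal size
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation
```

For the 2000 training samples this produces 5 rounds of 1600 training and 400 validation samples; the per-round accuracy and loss would then be averaged as described.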

[0026] Furthermore, the process of uploading and reviewing traceability information in step four also includes multi-level verification. After the origin testing station uploads environmental data, it needs to submit soil testing reports and meteorological records to the regulatory department. The data is uploaded to the blockchain after being reviewed and approved by the regulatory department. When processing enterprises upload process parameters, they need to associate them with monitoring video clips of the production workshop. The blockchain node verifies that the video timestamp is consistent with the parameter upload time to confirm the validity of the data. When warehousing enterprises upload temperature and humidity data, they need to upload it once every 24 hours. The fluctuation of temperature and humidity data for three consecutive days is considered valid if it is controlled within ±2℃ and ±5%RH.

[0027] Compared with existing technologies, the beneficial effects of this invention are:

[0028] In terms of component identification, the system employs multi-source data fusion of near-infrared spectroscopy and HPLC. This leverages the rapid detection advantages of near-infrared spectroscopy while precisely capturing the characteristics of active ingredients through HPLC. Combined with data preprocessing techniques such as Savitzky-Golay filtering and baseline correction, it effectively eliminates interference from sample condition and environmental factors, significantly reducing the probability of misclassification of similar components. The "CNN-LSTM" hybrid model can simultaneously extract the spatial distribution and temporal correlation of features. With feature weight optimization, it strengthens the support of high-contribution features for identification, making the identification results more accurate. Even when faced with novel adulterants, the model can be quickly adapted through online updates, maintaining stable identification performance over the long term. This provides reliable technical support for regulatory authorities to combat counterfeiting and for enterprises to control the quality of raw materials.

[0029] The credibility and transparency of the traceability system have been significantly improved, effectively protecting consumer rights and maintaining market order. The system uses blockchain technology to store traceability information, with a multi-node consensus mechanism and unique hash values ensuring data immutability. Multiple entities, including processing enterprises, origin testing stations, and regulatory departments, participate in information uploading, preventing fraud by a single entity. The traceability credibility assessment and multi-level verification mechanism can filter out incomplete or inaccurate traceability data, restricting the circulation of low-credibility propolis and compelling enterprises to standardize processing parameters and improve origin information. Consumers can conveniently access detailed information about the active ingredient content, testing process, and responsible parties throughout the entire supply chain of propolis via mobile devices, moving beyond simply relying on the name of origin to judge quality and enhancing consumer confidence. Regulatory departments can view real-time statistics on abnormal propolis identification within their region, quickly locating high-risk enterprises and the source of problematic propolis, improving regulatory efficiency and maintaining a fair competitive market environment.

[0030] The system's optimizations in data processing and user adaptability enhance its practical value and implementation capabilities. The data quality detection unit monitors the validity of spectral and chromatographic data in real time, ensuring input data reliability through rescanning and adjusting the mobile phase gradient, laying a high-quality foundation for subsequent feature extraction and model identification. The user interaction unit is designed with differentiated functions for consumers, processing enterprises, and regulatory authorities. Consumers obtain simplified and easy-to-understand traceability summaries, enterprises generate quality analysis reports to assist production management, and regulatory authorities achieve rapid verification of abnormal data, meeting the actual needs of different roles. Overall, this invention achieves intelligent processing throughout the entire process of "component identification - traceability management - result application," solving the problems of low efficiency and unreliable traceability in traditional methods, while adapting to long-term market changes and multi-scenario application needs. It promotes the propolis industry towards quality control and information transparency, possessing both economic and social value.

Attached Figure Description

[0031] Figure 1 is a schematic block diagram of the intelligent identification and traceability system for propolis components proposed in this invention;

[0032] Figure 2 is a schematic block diagram of the intelligent identification and traceability method for propolis components proposed in this invention;

[0033] Figure 3 is a grouped bar chart comparing the accuracy of propolis identification under different types of adulteration;

[0034] Figure 4 is a line graph showing the change in the credibility of propolis traceability under different storage methods;

[0035] Figure 5 is a line graph showing the long-term accuracy changes of the model;

[0036] Figure 6 is a grouped bar chart comparing detection time under different application scenarios.

Detailed Implementation

[0037] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0038] In the description of this invention, it should be understood that the terms "center," "longitudinal," "lateral," "length," "width," "thickness," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," "clockwise," and "counterclockwise," etc., indicate the orientation or positional relationship based on the orientation or positional relationship shown in the accompanying drawings. They are only for the convenience of describing this invention and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation. Therefore, they should not be construed as limitations on this invention.

[0039] Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Thus, features defined with "first" and "second" may explicitly or implicitly include one or more of the stated features. In the description of this invention, "a plurality of" means two or more, unless otherwise explicitly specified. Furthermore, the terms "installed," "connected," and "linked" should be interpreted broadly; for example, they may refer to a fixed connection, a detachable connection, or an integral connection; they may refer to a mechanical connection or an electrical connection; they may refer to a direct connection or an indirect connection through an intermediate medium; and they may refer to the internal connection of two components. Those skilled in the art can understand the specific meaning of the above terms in this invention based on the specific circumstances. The invention will now be described in further detail with reference to the accompanying drawings.

[0040] Referring to Figures 1 to 6, an intelligent identification and traceability system for propolis components comprises the following modules:

[0041] The data acquisition module is used to collect component detection data and traceability information from propolis samples. This module includes a near-infrared spectroscopy acquisition unit, a high-performance liquid chromatography acquisition unit, and a traceability information acquisition unit. The near-infrared spectroscopy acquisition unit uses a Fourier transform near-infrared spectrometer with a resolution of 4 cm⁻¹ and a wavelength range of 800-2500nm; each sample is scanned 3 times, and the average spectrum of the 3 scans is taken as the final spectral data. The high-performance liquid chromatography (HPLC) acquisition unit uses an ultraviolet detector with a detection wavelength of 280nm and a methanol-0.1% phosphoric acid aqueous solution as the mobile phase, with the following gradient elution program: 30%-50% methanol at 0-10 minutes, 50%-70% methanol at 10-20 minutes, and 70%-90% methanol at 20-30 minutes. It collects chromatographic data of 5 major active ingredients in propolis, including flavonoids and phenolic acids. The traceability information acquisition unit reads the unique identification code on the propolis packaging with a barcode scanner and associates it with the latitude and longitude of the place of origin, the harvest time, the name and qualification number of the processing enterprise, and the storage temperature and humidity. The latitude and longitude of the place of origin are accurate to within 10m, the harvest time is accurate to the hour, and the storage temperature and humidity are controlled at 5-25℃ and 40%-60%RH.
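
To make the gradient elution program in [0041] concrete, the sketch below returns the methanol percentage of the mobile phase at a given time. It assumes the composition changes linearly within each segment, which the source implies but does not state; the names `GRADIENT` and `methanol_percent` are hypothetical.

```python
# (time in minutes, methanol %) breakpoints taken from [0041]
GRADIENT = [(0, 30), (10, 50), (20, 70), (30, 90)]


def methanol_percent(t_min):
    """Methanol content at time t_min, ASSUMING linear ramps between breakpoints."""
    if not 0 <= t_min <= 30:
        raise ValueError("the program is defined for 0-30 minutes only")
    for (t0, p0), (t1, p1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return p0 + (p1 - p0) * (t_min - t0) / (t1 - t0)
```

Under that assumption the methanol content rises by 2 percentage points per minute throughout the 30-minute run.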

[0042] The data preprocessing unit optimizes the quality of the acquired data. For near-infrared spectral data, Savitzky-Golay filtering is used to remove high-frequency noise. The filtering window size is 11 points, and the polynomial order is 2. Baseline correction is then performed using the Asymmetric Least Squares algorithm to eliminate baseline drift. For high-performance liquid chromatography data, a Gaussian peak fitting algorithm is used to extract chromatographic peaks, eliminating overlapping peaks with a resolution less than 1.5 and retaining the chromatographic peaks of the five main active ingredients. Both types of data are normalized to the [0,1] interval, with the timestamp synchronization error controlled within 5ms, thus achieving correlation between component data and traceability information.
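
The smoothing and normalization steps in [0042] can be sketched without external libraries. The coefficients below are the standard Savitzky-Golay smoothing weights for an 11-point window and a quadratic fit. Leaving the first and last five points unsmoothed is an assumption about edge handling, and the Asymmetric Least Squares baseline correction and Gaussian peak fitting are not shown. Function names are hypothetical.

```python
# Standard Savitzky-Golay smoothing weights: 11-point window, polynomial order 2.
SG_COEFFS = [-36, 9, 44, 69, 84, 89, 84, 69, 44, 9, -36]
SG_NORM = 429  # sum of the weights


def savgol_11_2(signal):
    """Smooth `signal` with an 11-point, order-2 Savitzky-Golay filter.

    ASSUMPTION: the 5 points at each end are returned unsmoothed.
    """
    half = len(SG_COEFFS) // 2
    if len(signal) < len(SG_COEFFS):
        raise ValueError("signal must contain at least 11 points")
    out = list(signal)
    for i in range(half, len(signal) - half):
        window = signal[i - half:i + half + 1]
        out[i] = sum(c * v for c, v in zip(SG_COEFFS, window)) / SG_NORM
    return out


def min_max_normalize(values):
    """Scale values linearly to the [0, 1] interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # a flat signal carries no range to scale
    return [(v - lo) / (hi - lo) for v in values]
```

Because the filter fits a quadratic in each window, a purely quadratic signal passes through unchanged, which is a convenient sanity check on the coefficients.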

[0043] The component feature extraction module is used to extract identification features of propolis samples, including a spectral feature extraction unit and a chromatographic feature extraction unit. The spectral feature extraction unit uses partial least squares discriminant analysis to screen 10 characteristic peaks, with corresponding wavelengths of 1120nm, 1380nm, 1510nm, 1760nm, 1940nm, 2100nm, 2250nm, 2320nm, 2410nm, and 2480nm, and calculates the peak area, full width at half maximum, and peak intensity of each characteristic peak. The chromatographic feature extraction unit calculates the retention time, peak height, and peak area of 5 types of active ingredients, with the retention time of caffeic acid phenethyl ester controlled at 14.5±0.2 minutes. The spectral and chromatographic features are concatenated to form an initial feature vector with a dimension of 60, and then principal component analysis is used to reduce the dimension to a 32-dimensional feature vector, removing redundant features.

[0044] The intelligent identification module is used to determine the purity and authenticity of propolis samples. It includes a model training unit and a real-time identification unit. The model training unit constructs a "CNN-LSTM" hybrid model. The CNN layer contains 3 convolutional blocks, each containing 2 3×3 convolutional layers and 1 2×2 pooling layer, to extract the spatial distribution features of the feature vectors. The LSTM layer has 128 hidden units and a dropout value of 0.2 to capture the temporal correlation between features. The training set contains 2000 propolis samples, of which 1500 are genuine samples covering 10 major producing areas and 500 are adulterated samples, with adulterants including gum, starch, and pigments. The model is trained using a cross-entropy loss function, and data augmentation is used to improve generalization ability; the augmentation includes spectral shifts of ±5nm and scaling of ±10%. The real-time identification unit inputs the 32-dimensional feature vector of the sample to be tested into the trained model and outputs the purity level and the corresponding confidence level. The purity level is divided into four categories: excellent, good, qualified, and adulterated. The excellent level corresponds to a total active ingredient content of ≥20%, the good level corresponds to 15%-20%, the qualified level corresponds to 10%-15%, and the adulterated level corresponds to <10% or the presence of non-propolis ingredients.
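
The four purity levels in [0044] are defined by total active ingredient content, so the labeling rule can be written down directly. In the system the level is predicted by the model; this sketch only shows the definition of the categories. Assigning the shared boundaries (exactly 15% and exactly 20%) to the higher level is an assumption, and the function name is hypothetical.

```python
def purity_grade(total_active_pct, contains_non_propolis=False):
    """Purity level from total active ingredient content in percent.

    ASSUMPTION: a value exactly on a shared boundary (15 or 20) is
    assigned to the higher level; the source leaves this open.
    """
    if contains_non_propolis or total_active_pct < 10:
        return "adulterated"
    if total_active_pct >= 20:
        return "excellent"
    if total_active_pct >= 15:
        return "good"
    return "qualified"
```

A sample with 22% total active ingredients but a detected non-propolis component is still graded "adulterated", because that check takes precedence.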

[0045] The traceability management module is used to store and query information about the entire propolis supply chain. It includes a blockchain storage unit and a distributed database unit. The blockchain storage unit deploys three nodes: the origin testing station, the processing enterprise, and the regulatory department. It adopts a proof-of-authority consensus mechanism to upload information such as the origin environment data, processing parameters, and test report numbers of the propolis to the blockchain. The origin environment data includes soil pH value of 5.5-7.5, annual average temperature of 15-25℃, and rainfall of 600-1200mm. The processing parameters include extraction temperature of 60-80℃, extraction time of 2-4 hours, and 2-3 purification cycles. Each record uses the SHA-256 algorithm to generate a unique hash value, ensuring that the data is tamper-proof. The distributed database unit stores user query permission data. Enterprise users can query the processing technology, consumers can query the origin and purity level, and regulatory departments can query the entire supply chain data. It supports multi-dimensional retrieval by unique identification code, origin, and processing time.
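The per-record SHA-256 hash can be illustrated with the Python standard library. This is a minimal sketch under stated assumptions: the field names, the sample values, and the JSON canonicalization (sorted keys, compact separators) are illustrative choices, not a serialization format specified by the source.

```python
import hashlib
import json


def record_hash(record: dict) -> str:
    """Return the SHA-256 hex digest of a traceability record.

    Assumption: the record is serialized as canonical JSON (sorted keys,
    no extra whitespace) so that the same content always gives the same hash.
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical record with illustrative field names and values.
record = {
    "soil_ph": 6.2,
    "annual_mean_temp_c": 18,
    "annual_rainfall_mm": 750,
    "extraction_temp_c": 70,
    "extraction_time_h": 3,
    "purification_cycles": 2,
    "test_report_no": "J20240510001",
}
digest = record_hash(record)

# Any change to a stored field yields a different digest, which is what
# makes tampering detectable when the digest is kept on the chain.
tampered = dict(record, soil_ph=6.3)

if __name__ == "__main__":
    print(digest)
    print(record_hash(tampered) != digest)
```

Hashing alone only detects modification; resistance to tampering comes from storing the digest on the consensus-controlled chain.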

[0046] The results output module is used to display the identification and traceability results, including a visualization terminal unit and an alarm unit. The visualization terminal unit displays the active ingredient content, purity level, confidence level, and traceability link diagram of the propolis sample in real time. The traceability link diagram covers the time nodes and responsible entities from harvesting, processing, testing, storage to sales. The active ingredient content includes the specific component proportions such as 1.2% galangin and 0.8% galangin. When the identification confidence level is <0.8 or the traceability information completeness is <90%, the alarm unit triggers an audible and visual prompt, with a buzzer frequency of 1.5kHz and a flashing yellow light, while simultaneously pushing abnormal information to the regulatory department's terminal.

[0047] This invention also includes a component similarity calculation unit, which is used in the intelligent identification module to quantify the component matching degree between the sample to be tested and the standard sample. The component similarity S is calculated from the normalized features of the two samples, where S is the component similarity, with a value range of [0,1]; n is the feature dimension, taking a value of 32, corresponding to the dimension of the feature vector after dimensionality reduction; A_i is the normalized value of the i-th feature of the sample to be tested; and B_i is the normalized value of the i-th feature of the standard genuine sample, where the standard sample is the mean feature value of genuine propolis from each major producing area. Using this calculation, when S ≥ 0.95, the sample is judged as "high similarity," corresponding to a purity level of "excellent" or "good"; when 0.8 ≤ S < 0.95, it is judged as "medium similarity," requiring further verification by combining high-performance liquid chromatography (HPLC) with active ingredient content analysis; when S < 0.8, it is judged as "low similarity" and directly labeled as "adulterated" or "non-genuine." This reduces misjudgments caused by relying on a single model, especially when adulterants have spectral characteristics similar to those of propolis; for example, both resin and propolis absorb at a wavelength of 1380 nm. The similarity measure quantifies the difference and improves the accuracy of identification.
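The similarity formula itself is not reproduced in the text. As an illustration only, the sketch below assumes cosine similarity, which stays in [0,1] for non-negative normalized features and so matches the stated range of S; the actual formula of the invention may differ. The function names are hypothetical.

```python
import math
from typing import Sequence


def component_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (ASSUMED formula)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    # Clamp to guard against rounding slightly above 1.0.
    return min(1.0, dot / (norm_a * norm_b))


def similarity_band(s: float) -> str:
    """Apply the thresholds stated in the text: 0.95 and 0.8."""
    if s >= 0.95:
        return "high"
    if s >= 0.8:
        return "medium"
    return "low"


if __name__ == "__main__":
    sample = [0.82, 0.75, 0.65]
    standard = [0.85, 0.77, 0.66]
    s = component_similarity(sample, standard)
    print(round(s, 4), similarity_band(s))
```

Only the thresholds in `similarity_band` come from the text; the vectors in the usage lines are made-up three-dimensional stand-ins for the 32-dimensional features.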

[0048] In this invention, the traceability management module further includes a traceability credibility assessment unit, used to assess the reliability of propolis traceability information. The calculation method is C = w1E + w2P; where C is the traceability credibility, with a value range of [0,1]; w1 and w2 are weight coefficients, where w1 takes a value of 0.6 and w2 takes a value of 0.4, and the sum of w1 and w2 is 1; E is the origin environment matching degree, which is the cosine similarity between the origin environment data associated with the sample to be tested and the standard environment data of the origin area in the database, with a value range of [0,1]; P is the processing record completeness, which is the ratio of the number of fields filled in the processing record to the total number of fields, the total number of fields includes 8 items such as extraction temperature, time, and purification times, with a value range of [0,1]. Based on this assessment, when C ≥ 0.9, it is judged as "highly reliable," and the traceability information is directly displayed to the public; when 0.7 ≤ C < 0.9, it is judged as "medium reliable," and the processing enterprise needs to supplement the missing records before it is displayed; when C < 0.7, it is judged as "low reliable," and the sale and circulation of the propolis sample is restricted. At the same time, the regulatory authorities are notified to verify the information to avoid misleading consumers with false traceability information.
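The weighted sum C = w1E + w2P and its three decision bands translate directly into code. The sketch below uses the weights and thresholds given above; the function names and the short band labels are illustrative.

```python
def traceability_credibility(e: float, p: float, w1: float = 0.6, w2: float = 0.4) -> float:
    """Compute C = w1*E + w2*P with E and P both in [0, 1]."""
    if not (0.0 <= e <= 1.0 and 0.0 <= p <= 1.0):
        raise ValueError("E and P must lie in [0, 1]")
    return w1 * e + w2 * p


def credibility_band(c: float) -> str:
    """Apply the thresholds stated in the text: 0.9 and 0.7."""
    if c >= 0.9:
        return "high"      # displayed to the public directly
    if c >= 0.7:
        return "medium"    # enterprise must supplement missing records
    return "low"           # circulation restricted, regulator notified


if __name__ == "__main__":
    # E = 0.93 with all 8 processing fields filled in.
    print(round(traceability_credibility(0.93, 8 / 8), 3))
    # E = 0.85 with 7 of 8 processing fields filled in.
    print(round(traceability_credibility(0.85, 7 / 8), 3))
```

The two usage lines reproduce the values worked out in the examples of this document: 0.6 × 0.93 + 0.4 × 1.0 = 0.958 and 0.6 × 0.85 + 0.4 × 0.875 = 0.86.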

[0049] In this invention, the component feature extraction module further includes a feature weight optimization unit, which uses mutual information entropy to calculate the importance of each feature to the identification result and optimizes the contribution ratio of the feature vector. Specifically, it calculates the mutual information value I(X,Y) between each feature and the purity level label, where X is the feature value and Y is the purity level label. The features are sorted from largest to smallest according to the mutual information value I(X,Y), and the top 25 high-importance features are retained. Seven low-importance features are removed from the original 32-dimensional features, and higher weights are assigned to the high-importance features. The high-importance features include the peak area of flavonoid components and the spectral intensity at a wavelength of 1510nm, with a weight coefficient of 1.2-1.5. The low-importance features are assigned a weight coefficient of 0.5-0.8, and the low-importance features include some background spectral features. The optimized feature vector is input into the intelligent identification model, which can reduce the interference of redundant features on the model. In the identification of adulterated samples, the false positive rate of the model is reduced from 8% before optimization to below 3%, and the model inference time is shortened from 500ms to 350ms.
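Ranking features by mutual information with the label can be sketched as follows. The estimator is an assumption: continuous feature values are discretized into equal-width bins and I(X,Y) is computed from the resulting joint frequencies, whereas the text does not say how the mutual information is estimated. The bin count, function names, and toy data are illustrative.

```python
import math
from collections import Counter
from typing import List, Sequence


def mutual_information(xs: Sequence[float], ys: Sequence[str], bins: int = 4) -> float:
    """Estimate I(X;Y) in nats after equal-width binning of X (assumed estimator)."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / bins or 1.0  # constant feature: put everything in bin 0
    binned = [min(int((x - lo) / width), bins - 1) for x in xs]
    n = len(xs)
    px, py, pxy = Counter(binned), Counter(ys), Counter(zip(binned, ys))
    mi = 0.0
    for (bx, y), count in pxy.items():
        # p(x,y) * log( p(x,y) / (p(x) p(y)) ), written with raw counts
        mi += (count / n) * math.log((count * n) / (px[bx] * py[y]))
    return mi


def rank_features(columns: List[Sequence[float]], labels: Sequence[str], keep: int) -> List[int]:
    """Return indices of the `keep` features with the largest mutual information."""
    scores = [mutual_information(col, labels) for col in columns]
    order = sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
    return order[:keep]


if __name__ == "__main__":
    labels = ["genuine"] * 4 + ["adulterated"] * 4
    informative = [0.9, 0.8, 0.85, 0.95, 0.1, 0.2, 0.15, 0.05]
    constant = [0.5] * 8
    print(rank_features([constant, informative], labels, keep=1))
```

In the setting described above, `columns` would hold the 32 reduced features and `keep` would be 25; the subsequent weighting with coefficients in the stated ranges is not shown here.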

[0050] In this invention, the intelligent identification module also includes an online model update unit, which periodically optimizes model parameters to adapt to new adulteration types and origin samples. It extracts identification logs from a distributed database for the past 7 days weekly, with a sample size ≥ 500. Low-confidence samples (<0.7) are selected, verified by a professional testing institution, and labeled with their true purity level before being added to the training set. The professional testing institution uses liquid chromatography-mass spectrometry to confirm the components. An incremental training method is used, fixing the CNN layer parameters and updating only the LSTM layer and output layer weights. The learning rate is set to 1e-4, which is 1 / 5 of the initial training learning rate. The update is completed after 20 iterations. After each update, the accuracy is verified using a test set containing 200 new samples not used in training. When the accuracy improves by ≥ 2%, the new model is saved and the original model is replaced. This unit avoids performance degradation caused by upgraded adulteration methods or the addition of new origin samples. For example, the model can still accurately identify new types of modified resin adulteration. During long-term operation, the model's identification accuracy remains above 95%.
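The weekly selection and the model replacement rule can be sketched without any training code. Two points are assumptions: the log entries are modeled as dictionaries with a `confidence` key, and "accuracy improves by ≥ 2%" is read as a gain of at least 2 percentage points on the same test set. Accuracy is compared through integer counts of correct predictions to avoid floating-point edge cases at the threshold.

```python
from typing import Dict, List


def select_for_review(logs: List[Dict], threshold: float = 0.7) -> List[Dict]:
    """Pick identification log entries whose confidence is below the threshold."""
    return [entry for entry in logs if entry["confidence"] < threshold]


def should_replace_model(old_correct: int, new_correct: int, n_test: int = 200,
                         min_gain_points: int = 2) -> bool:
    """Accept the updated model only if accuracy rises by >= min_gain_points.

    Assumption: the gain is measured in percentage points on one shared test set.
    """
    return (new_correct - old_correct) * 100 >= min_gain_points * n_test


if __name__ == "__main__":
    logs = [
        {"sample_id": "A", "confidence": 0.91},
        {"sample_id": "B", "confidence": 0.65},
        {"sample_id": "C", "confidence": 0.69},
    ]
    print([e["sample_id"] for e in select_for_review(logs)])
    print(should_replace_model(old_correct=186, new_correct=190))
```

On a 200-sample test set, 2 percentage points correspond to 4 additional correct predictions, so 186 to 190 passes and 186 to 189 does not.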

[0051] In this invention, the data acquisition module also includes a data quality detection unit to monitor the validity of the acquired data in real time. For near-infrared spectral data, the unit detects the signal-to-noise ratio (SNR) and spectral smoothness. The SNR is ≥30dB, calculated by the ratio of peak signal to noise signal. Spectral smoothness requires the spectral intensity fluctuation of 10 consecutive wavelength points to be ≤5%. When the SNR is <30dB or the fluctuation is >5%, the spectrometer is controlled to rescan the sample. For high-performance liquid chromatography (HPLC) data, the unit detects the peak resolution and peak shape symmetry. The peak resolution of the main active ingredient is ≥1.5, calculated by the ratio of the retention time difference of adjacent peaks to the average peak width. Peak shape symmetry requires a symmetry factor of 0.8-1.2. When the resolution is <1.5 or the symmetry factor is out of range, the mobile phase gradient is adjusted and the sample is re-injected. For traceability information, the unit detects the completeness of required fields, including latitude and longitude of origin, harvest time, and processing enterprise qualifications. If one or more fields are missing, the unit prompts the data acquisition personnel to supplement them. This unit ensures the reliability of data input to subsequent modules and reduces identification errors and missing traceability information caused by data quality issues.
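The acceptance checks for spectra and chromatograms can be written as small predicates. The thresholds (30dB, 1.5, 0.8-1.2) are taken from the paragraph above. Converting the peak-to-noise ratio to decibels with 20·log10 assumes the ratio is an amplitude ratio, which the text does not state; the function names are illustrative, and the smoothness check is omitted because its exact definition is not given.

```python
import math


def snr_db(peak_signal: float, noise_signal: float) -> float:
    """Signal-to-noise ratio in dB (assumes an amplitude ratio, hence 20*log10)."""
    return 20.0 * math.log10(peak_signal / noise_signal)


def spectrum_needs_rescan(snr: float, min_snr_db: float = 30.0) -> bool:
    """A spectrum with SNR below 30 dB is rescanned."""
    return snr < min_snr_db


def peak_resolution(t1: float, t2: float, w1: float, w2: float) -> float:
    """Retention time difference of adjacent peaks divided by their average width."""
    return (t2 - t1) / ((w1 + w2) / 2.0)


def hplc_acceptable(resolution: float, symmetry_factor: float) -> bool:
    """Resolution must be >= 1.5 and the symmetry factor within 0.8-1.2."""
    return resolution >= 1.5 and 0.8 <= symmetry_factor <= 1.2


if __name__ == "__main__":
    print(snr_db(100.0, 1.0), spectrum_needs_rescan(28.0))
    print(peak_resolution(10.0, 13.0, 2.0, 2.0), hplc_acceptable(1.2, 1.0))
```

A `False` from `hplc_acceptable` corresponds to the case where the mobile phase gradient is adjusted and the sample is re-injected.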

[0052] In this invention, the result output module also includes a user interaction unit, supporting personalized queries and feedback for users with different roles. Consumers can scan the unique identification code on the propolis packaging to view simplified traceability information and identification result summaries on their mobile devices. The simplified traceability information includes the place of origin, purity level, and testing institution name. Consumers can submit satisfaction ratings, with ratings ranging from 1 to 5 stars. Processing enterprises can view detailed processing records and batch identification data through their enterprise terminals, generating quality analysis reports. These reports include monthly pass rates and the proportion of raw materials from each production area. Regulatory authorities can view identification anomaly statistics for propolis samples across the entire region through their regulatory terminals. These anomaly statistics include the number of adulterated samples and a list of high-risk enterprises. Regulatory authorities can initiate special investigations and record the results on the blockchain. This unit can meet the information needs of different users, improving the system's usability and user participation.

[0053] This invention includes the following steps:

[0054] Step 1: Data Acquisition and Preprocessing. The diffuse reflectance spectrum of the propolis sample was acquired using a near-infrared spectrometer with a resolution of 4 cm⁻¹ and a wavelength range of 800-2500 nm. Each sample was scanned three times, and the average spectrum of the three scans was taken. High-performance liquid chromatography (HPLC) was used to collect chromatographic data of five types of active ingredients. The HPLC was equipped with an ultraviolet detector with a detection wavelength of 280 nm. The mobile phase was methanol-0.1% phosphoric acid aqueous solution, and a gradient elution program was used. The unique identification code of the propolis packaging was read by a barcode scanner to collect traceability information such as place of origin, processing, and storage. Savitzky-Golay filtering and Asymmetric Least Squares baseline correction were used for the spectral data. Gaussian peak fitting was used to extract the effective peaks for the chromatographic data. The two types of data were normalized to the [0,1] interval respectively, and the timestamp synchronization error was controlled within 5ms.
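Two of the operations in this step, averaging the three scans and scaling to [0,1], are simple enough to sketch directly. The sketch assumes min-max scaling, which is the usual way to normalize to the [0,1] interval but is not named in the text; filtering, baseline correction, and peak fitting are not shown. The function names are illustrative.

```python
from typing import List, Sequence


def average_scans(scans: Sequence[Sequence[float]]) -> List[float]:
    """Point-wise mean of repeated scans taken on the same wavelength grid."""
    n = len(scans)
    return [sum(values) / n for values in zip(*scans)]


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Scale values linearly to [0, 1] (assumed normalization method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # flat signal: no range to scale
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    scans = [[1.0, 2.0, 5.0], [2.0, 3.0, 6.0], [3.0, 4.0, 7.0]]
    mean_spectrum = average_scans(scans)
    print(mean_spectrum)
    print(min_max_normalize(mean_spectrum))
```

Spectral and chromatographic data are normalized separately, so `min_max_normalize` would be applied once per data type.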

[0055] Step 2: Component feature extraction. Ten spectral feature peaks were selected using partial least squares discriminant analysis, with corresponding wavelengths of 1120 nm, 1380 nm, etc. The peak area, full width at half maximum (FWHM), and peak intensity of each feature peak were calculated. The retention time, peak height, and peak area of five types of active components in high performance liquid chromatography (HPLC) were calculated. The spectral and chromatographic features were concatenated to form an initial feature vector. Principal component analysis was used to reduce the dimensionality of the initial feature vector to a 32-dimensional feature vector. The feature weights were then optimized using mutual information entropy, retaining 25 highly important features.

[0056] Step 3: Intelligent identification. The optimized feature vector is input into the "CNN-LSTM" hybrid model, which outputs the purity level and confidence score; simultaneously, the component similarity S is calculated. When S≥0.95 and confidence level≥0.8, the sample is judged as genuine; when S<0.8, it is directly marked as adulterated. Every week, the identification logs of the past 7 days are extracted from the distributed database, and low-confidence samples with confidence level <0.7 are screened. After being reviewed and labeled by a professional testing agency, they are added to the training set. The model parameters are updated by incremental training to maintain the model's identification accuracy.

[0057] Step 4: Traceability Management. Data on the origin environment, processing parameters, and test report numbers are uploaded to the blockchain for storage. Origin environment data includes soil pH and average annual temperature, while processing parameters include extraction temperature and time. The blockchain uses a proof-of-authority consensus mechanism, generating a unique hash value for each record. The traceability credibility C is calculated as C = w1E + w2P. When C ≥ 0.9, a complete traceability chain is displayed; when C < 0.7, the circulation of the propolis sample is restricted, and regulatory authorities are notified.

[0058] Step 5: Results Output. The active ingredient content, purity level, and traceability link of the propolis sample are displayed through a visual terminal. An audible and visual alarm is triggered when the identification confidence level is <0.8 or the traceability credibility C is <0.7. Consumers can query a simplified version of the traceability information through mobile devices, processing enterprises can query quality analysis reports through enterprise terminals, and regulatory departments can query identification anomaly statistics and provide feedback on the verification results through regulatory terminals.

[0059] In this invention, the training process of the "CNN-LSTM" hybrid model in Step 3 also includes cross-validation and data augmentation optimization. Five-fold cross-validation is used, dividing the 2000 training samples into 5 groups. Each time, 4 groups are used as the training set and 1 group as the validation set, and this process is repeated 5 times to complete the cross-validation. The accuracy and loss value of each validation are calculated, and the average value is taken as the initial performance of the model. The initial accuracy is 92%, and the loss value is 0.15. In addition to spectral translation and scaling, data augmentation also incorporates Gaussian noise and spectral rotation. The standard deviation of the Gaussian noise is 0.01, and the spectral rotation angle is ±2°, to simulate environmental interference during actual data acquisition, such as temperature fluctuations and differences in sample granularity, enhancing the model's adaptability to complex scenarios. After training, the model is validated on a test set containing 100 new samples. The test set accuracy reaches 95%, and the adulterated sample identification rate reaches 98%, meeting the actual identification requirements.
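The five-fold split described above can be sketched with index lists only. Shuffling with a fixed seed before splitting is an assumption (the text does not say how the 5 groups are formed), and the generator name is illustrative; model training and scoring are left out.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for each of the k folds.

    Assumption: samples are shuffled once with a fixed seed, then dealt
    into k groups of (nearly) equal size.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


if __name__ == "__main__":
    for fold_no, (train, validation) in enumerate(k_fold_indices(2000), start=1):
        print(fold_no, len(train), len(validation))
```

With 2000 samples, each round trains on 1600 samples and validates on 400; the reported initial accuracy and loss are the averages over the five rounds.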

[0060] In this invention, the upload and review process for traceability information in step four also includes multi-level verification. After the origin testing station uploads environmental data, it needs to submit soil testing reports and meteorological records to the regulatory department. Only data approved by the regulatory department can be uploaded to the blockchain, with a pass rate of ≥90%. When processing enterprises upload process parameters, they need to associate them with monitoring video clips from the production workshop. The video duration must be ≥5 minutes, and the video must include the temperature display of the extraction equipment. The blockchain node verifies that the video timestamp matches the parameter upload time to confirm the data validity. When warehousing enterprises upload temperature and humidity data, they need to upload it once every 24 hours. Temperature and humidity data fluctuations within ±2℃ and ±5%RH for three consecutive days are considered valid. This multi-level verification can prevent the falsification of traceability information, keep the blockchain-stored data authentic, and enhance consumers' trust in propolis traceability information.

[0061] The following two examples further illustrate the specific implementation of this system:

[0062] Example 1: Identification and traceability of raw materials entering the warehouse of propolis production enterprises (50 batches of raw materials processed daily, covering 8 major production areas)

[0063] 1. System Deployment and Technical Parameter Configuration

[0064] This embodiment is applied to the raw material receiving stage of a large propolis production enterprise. The enterprise receives propolis raw materials daily from eight major production areas, including Heilongjiang, Xinjiang, and Yunnan. It needs to quickly complete purity identification and traceability information binding to prevent adulterated raw materials from entering the production process. The system hardware configuration is as follows: The near-infrared spectroscopy acquisition unit uses a Thermo Scientific Antaris II Fourier transform near-infrared spectrometer (resolution 4 cm⁻¹, wavelength range 800-2500nm, weight 15kg, compatible with laboratory workbenches). The HPLC unit uses an Agilent 1260 Infinity II high-performance liquid chromatograph (UV detector model G1314F, detection wavelength 280nm, column oven temperature control accuracy ±0.1℃). The traceability information acquisition unit uses a Newland NLS-HR22 barcode scanner (scanning speed 300 times / second, supports UUID code and QR code recognition). The blockchain nodes are deployed in the company's raw material testing laboratory (1 local node), testing stations of cooperatives in various production areas (8 remote nodes), and local market supervision bureaus (1 regulatory node). It adopts the Hyperledger Fabric blockchain framework, with a consensus mechanism of Proof-of-Authority (PoA) and a block time of ≤3 seconds.

[0065] 2. Specific Implementation Process

[0066] (1) Data Acquisition and Preprocessing

[0067] When raw materials are received, three samples are randomly selected from each batch of propolis (each sample weighing 50g, pulverized to 80 mesh size to avoid uneven particle size affecting spectral acquisition). Near-infrared spectral acquisition: The sample is placed in a quartz cuvette (1mm thick) and placed on the spectrometer sample stage. Each sample is scanned three times (each scan lasting 15 seconds), and the average spectrum is taken as the spectral data of that sample. During the scanning process, the laboratory ambient temperature is controlled at 23±2℃ and the humidity at 45±5%RH to avoid environmental factors interfering with spectral stability. HPLC Acquisition: Take 5g of sample and extract with methanol by ultrasonication for 30 minutes (power 300W, temperature 40℃). Filter the extract through a 0.22μm organic filter membrane and inject (injection volume 10μL). The mobile phase gradient elution program is refined as follows: 0-5 minutes methanol 30%-40%, 5-10 minutes 40%-50%, 10-15 minutes 50%-60%, 15-20 minutes 60%-70%, 20-25 minutes 70%-80%, 25-30 minutes 80%-90%, flow rate 1.0mL / min, column temperature 30℃. Record the chromatographic peaks of 5 active ingredients (guarcin, galangin, caffeic acid phenethyl ester, apigenin, pinocembrin). Traceability Information Collection: The UUID code of each batch of propolis packaging is read by a barcode scanner (generated in advance by the cooperative in the production area and associated with the latitude and longitude of the production area, such as N45°23′ and E127°15′ in Heilongjiang Province, with an accuracy within 10m). The harvest time (accurate to the hour, such as 2024-05-10 09:30), the name of the cooperative (Heilongjiang Raohe Propolis Cooperative), and the storage temperature and humidity (the storage temperature in the production area is 15℃ and the humidity is 50%RH) are entered.

[0068] Preprocessing: Savitzky-Golay filtering (11-point window, polynomial order 2) was applied to the spectral data, significantly reducing high-frequency noise in the original spectra (such as baseline fluctuations near 2000nm wavelength), and improving the signal-to-noise ratio from 28dB to 35dB. Baseline correction was performed using the Asymmetric Least Squares algorithm to eliminate spectral baseline drift (baseline flatness improved by 40% after correction). Gaussian peak fitting algorithm was used to extract chromatographic peaks of 5 active ingredients from the HPLC data, removing one overlapping peak (the peak resolution of pinocembrin and apigenin at retention time 18.2 minutes was 1.2, less than 1.5), and retaining 4 effective peaks. The spectral and HPLC data were normalized to the [0,1] interval, and the timestamp synchronization error was controlled within 3ms to ensure that the component data of each batch of samples corresponded one-to-one with the traceability information.
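The 11-point, order-2 Savitzky-Golay smoothing used above can be sketched in pure Python with the closed-form coefficients for quadratic smoothing. For a half-window k = 5 these are the standard values (-36, 9, 44, 69, 84, 89, 84, 69, 44, 9, -36)/429. Leaving the first and last k points unsmoothed is a simplification made here; in practice a library routine such as `scipy.signal.savgol_filter` would normally be used, and the function names below are illustrative.

```python
from typing import List, Sequence


def savgol_coefficients(half_window: int) -> List[float]:
    """Closed-form smoothing weights for a quadratic fit over 2k+1 points."""
    k = half_window
    denom = (2 * k + 3) * (2 * k + 1) * (2 * k - 1)
    return [3 * (3 * k * k + 3 * k - 1 - 5 * i * i) / denom for i in range(-k, k + 1)]


def savgol_smooth(y: Sequence[float], half_window: int = 5) -> List[float]:
    """Smooth y with a (2k+1)-point quadratic Savitzky-Golay filter.

    Simplification: the first and last k points are copied unchanged.
    """
    k = half_window
    weights = savgol_coefficients(k)
    out = list(y)
    for center in range(k, len(y) - k):
        out[center] = sum(w * y[center + j] for w, j in zip(weights, range(-k, k + 1)))
    return out


if __name__ == "__main__":
    print([round(w * 429) for w in savgol_coefficients(5)])
    spectrum = [0.5 * x * x - 2.0 * x + 1.0 for x in range(30)]
    smoothed = savgol_smooth(spectrum)
    print(max(abs(a - b) for a, b in zip(spectrum, smoothed)))
```

Because the filter fits a quadratic in each window, a noise-free quadratic signal passes through unchanged, which is a convenient sanity check before applying it to real spectra.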

[0069] (2) Component feature extraction and intelligent identification

[0070] Feature extraction: The spectral feature extraction unit used partial least squares discriminant analysis (PLS-DA) to select 10 characteristic peaks from the normalized spectral data. Among them, 1120nm corresponds to the C-H stretching vibration of polysaccharide components in propolis, 1380nm corresponds to the O-H bending vibration of flavonoid components, and 1510nm corresponds to the skeletal vibration of the benzene ring. The peak area (e.g., 1380nm peak area 0.82), full width at half maximum (FWHM) (0.8nm), and peak intensity (0.75) of each characteristic peak were calculated. The chromatographic feature extraction unit calculated the retention time (phenethyl caffeate 14.4 min, guarin 16.2 min, galangin 19.5 min, apigenin 22.3 min), peak height (phenethyl caffeate peak height 0.65 AU), and peak area (0.78) of the four types of effective active ingredients. The initial 60-dimensional feature vector (10 spectral features × 3 parameters + 4 chromatographic features × 3 parameters) was dimensionality reduced by principal component analysis (PCA), and the top 32 principal components (cumulative contribution rate of 92%) were selected as the final feature vector.

[0071] Intelligent identification: The 32-dimensional feature vector is input into a CNN-LSTM hybrid model, which is deployed on an enterprise local server (CPU: Intel Xeon E5-2680, GPU: NVIDIA Tesla V100), with an inference time of 320ms / sample. The component similarity S is calculated at the same time, where A_i is the feature of the sample to be tested (e.g., the normalized value of the 1380nm peak area is 0.82) and B_i is the standard sample feature (the mean feature values of genuine propolis from the 8 major producing areas, with a normalized 1380nm peak area of 0.85); the calculated S = 0.96. Combined with the model output confidence score of 0.88, this batch of propolis is classified as "Good" (total active ingredient content 18.5%). For a batch of adulterated raw materials (containing 20% resin), the model output confidence score is 0.65 and S = 0.75; the batch is directly marked as "adulterated," an alarm is triggered, and it is rejected from storage.

[0072] (3) Traceability Management and Results Output

[0073] Traceability Management: The company's testing laboratory entered the production environment data of this batch of propolis (soil pH value of 6.2, average annual temperature of 18℃, and annual precipitation of 750mm in Raohe, Heilongjiang Province) and the warehousing test report number (J20240510001) into the system and submitted it to the blockchain through a blockchain node; the cooperative testing station in the production area had previously uploaded the information of the harvester (Zhang San, certificate number NY2023001) and the initial inspection data of the raw materials (active ingredient content of 17.8%); after the local regulatory bureau node verified the integrity of the data, it generated a unique hash value (SHA-256 algorithm, hash value of 0x7a3f...) and completed the on-chain storage. The traceability credibility C = w1E + w2P is calculated, where w1 = 0.6, w2 = 0.4, E is the matching degree of the production environment (the cosine similarity between the environmental data of the sample to be tested and the standard data of the Heilongjiang production area is 0.93), and P is the completeness of the processing record (8 fields such as temperature and time have been filled in, with a completeness of 1.0). The result is C = 0.6 × 0.93 + 0.4 × 1.0 = 0.958, which is judged as "high credibility", and the traceability information is displayed to the public.

[0074] Results output: The visualization terminal displays the content of active ingredients in this batch of propolis (2.1% phenethyl caffeate, 1.8% galangin, 1.5% apigenin, 1.2%), purity grade "Good", confidence level 0.88, and traceability chain diagram (harvesting by Raohe Cooperative → cold chain transportation → enterprise testing → warehousing); the enterprise's production department can view the identification results of each batch of raw materials through the terminal and generate a monthly raw material quality report (500 batches were warehoused in May, with a pass rate of 98% and 2 batches adulterated); consumers can scan the product packaging UUID code to view a simplified traceability information (place of origin: Heilongjiang, purity: Good, testing institution: a certain enterprise laboratory).

[0075] 3. Performance Comparison Data

[0076] Table 1: Comparison of Raw Material Identification and Traceability Performance of Propolis Production Enterprises

[0077]

[0078] Table 1 is based on 6 months of operational data from the company (a total of 12,000 batches of raw materials processed). Traditional methods rely on manual HPLC operation, requiring 90 minutes for identification per batch, and lack near-infrared rapid detection, resulting in extremely low efficiency. Adulteration identification relies solely on single chromatographic data, achieving a recognition rate of less than 50% for minor adulteration (e.g., 5% resin), leading to an 88% warehousing pass rate. This invention, through parallel acquisition of multi-source data, reduces the identification time to 15 minutes, improving efficiency by 83%. Multi-feature fusion and similarity verification achieve an adulteration identification rate of 98%, intercepting two batches of adulterated raw materials. Blockchain storage ensures 99% integrity of traceability information, avoiding the tampering problems of traditional centralized databases. The model is incrementally trained weekly with low-confidence samples (e.g., 30 samples / week), maintaining an accuracy of 95% after 6 months, far exceeding the 82% of traditional methods. This effectively adapts to changes in raw material batches and upgrades in adulteration methods, providing stable technical support for the company's raw material quality control.

[0079] Example 2: Market supervision department's random inspection of the distribution process (an average of 20 batches per day, covering supermarkets, pharmacies, and specialty stores)

[0080] 1. System Deployment and Technical Parameter Configuration

[0081] This embodiment is applied to the sampling inspection of propolis products in a prefecture-level city's market supervision bureau. Supervisory personnel conduct on-site sampling inspections of finished propolis products (such as propolis capsules and liquid propolis) in supermarkets, pharmacies, and specialty stores, requiring rapid component identification and traceability verification to combat counterfeit and substandard products. The system hardware configuration prioritizes portability: the near-infrared spectral acquisition unit uses an Ocean Optics NIRQuest512-2.5 near-infrared spectrometer (resolution 8 cm⁻¹, wavelength range 900-2500nm, weight 0.8kg, battery life 8 hours). The HPLC unit uses a Shimadzu Prominence-iLC-2030Plus portable high-performance liquid chromatograph (40×30×20cm, 12kg, compatible with vehicle power supply). The traceability information collection unit uses a Huawei Mate60 Pro mobile phone (with built-in QR code scanning function and supports real-time data upload via 5G network). The blockchain nodes are deployed in the municipal regulatory bureau's data center (1 main node), district regulatory offices (5 sub-nodes), and third-party testing institutions (2 verification nodes), adopting a consortium blockchain architecture with a data upload latency of ≤500ms.

[0082] 2. Specific Implementation Process

[0083] (1) Data Acquisition and Preprocessing

[0084] Regulatory personnel conducted random inspections of a certain brand of propolis capsules (0.5g / capsule, batch number 20240401) at a supermarket. Three capsules were randomly selected, and the contents (total 1.5g) were extracted, mixed thoroughly, and used as a sample. Near-infrared spectroscopy was performed: the sample was placed in the fiber optic probe of a portable spectrometer (sampling spot diameter 5mm), and scanned three times in the supermarket's ambient temperature environment (25℃, 50% RH), with each scan lasting 10 seconds. The spectral data was transmitted to a mobile phone via Bluetooth. To avoid ambient light interference, a light shield was used to cover the probe and sample during scanning. HPLC data collection: 0.5g sample was extracted with methanol using ultrasonic extraction for 20 minutes (portable ultrasonic instrument, power 200W, temperature 35℃). The extract was filtered through a 0.22μm filter and injected into the portable HPLC injection port (injection volume 5μL). The mobile phase gradient was simplified as follows: 0-10 minutes, methanol 30%-60%; 10-20 minutes, 60%-90%; flow rate 0.8mL / min; column temperature 28℃. Four types of active ingredients (phenethyl caffeate, galangin, apigenin, and caffeine) were detected. Traceability information collection: The UUID code on the capsule packaging was scanned with a mobile phone to read the associated place of origin (Yili, Xinjiang), processing company (a biotechnology company), and production date (2024-04-01). Real-time queries were performed on the blockchain to retrieve existing environmental data and processing parameters (extraction temperature 70℃, time 3h, purification times 2).

[0085] Preprocessing: The raw spectrum acquired by the portable spectrometer had a signal-to-noise ratio of 26dB. After applying Savitzky-Golay filtering (9-point window, polynomial order 2), the signal-to-noise ratio was improved to 32dB. The baseline was corrected using the Asymmetric Least Squares algorithm to eliminate baseline drift caused by supermarket lighting. The HPLC data were extracted by fitting Gaussian peaks to obtain the chromatographic peaks of four types of active ingredients, with no overlapping peaks (peak resolution ≥1.8). Both types of data were normalized to the [0,1] interval and synchronized to the Municipal Regulatory Bureau's data center via a 5G network, with a timestamp error ≤4ms.

[0086] (2) Component feature extraction and intelligent identification

[0087] Feature extraction: The spectral feature extraction unit screened 10 characteristic peaks (1120nm, 1380nm, etc.) using PLS-DA and calculated the peak area (1380nm peak area 0.78), half width at half maximum (0.9nm), and peak intensity (0.72); the chromatographic feature extraction unit calculated the retention time (phenethyl caffeate 14.6 min, succinate 16.3 min), peak height (0.6AU), and peak area (0.75) of four types of active ingredients; the initial 60-dimensional feature vector was reduced to 32 dimensions by PCA (cumulative contribution rate 90%).
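The three per-peak parameters named above (peak area, half width at half maximum, peak intensity) could be computed along these lines. The sketch assumes one isolated, baseline-corrected peak per window, trapezoidal integration, and linear interpolation at the half-maximum crossings; none of these choices is stated in the text, and `peak_parameters` is a hypothetical name.

```python
def peak_parameters(x, y):
    """Return (area, half_width_at_half_max, intensity) for one isolated peak.

    x: ascending positions (e.g. wavelength in nm); y: normalized signal values.
    """
    intensity = max(y)
    if intensity <= 0:
        raise ValueError("peak must have a positive maximum")
    apex = y.index(intensity)
    half_level = intensity / 2

    # Trapezoidal integration of the signal over the window.
    area = sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2 for i in range(len(x) - 1))

    def half_crossing(step):
        """Walk from the apex in direction `step` to the half-maximum crossing."""
        i = apex
        while 0 <= i + step < len(y) and y[i + step] > half_level:
            i += step
        j = i + step
        if not 0 <= j < len(y):
            return x[i]  # signal never drops to half maximum inside the window
        # Linear interpolation between sample i (above) and sample j (at or below).
        t = (y[i] - half_level) / (y[i] - y[j])
        return x[i] + t * (x[j] - x[i])

    full_width = half_crossing(+1) - half_crossing(-1)
    return area, full_width / 2, intensity


# Triangular test peak: apex 2.0 at x = 2, falling to 0 at x = 0 and x = 4.
area, hwhm, intensity = peak_parameters([0, 1, 2, 3, 4], [0.0, 1.0, 2.0, 1.0, 0.0])
```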

[0088] Intelligent identification: A CNN-LSTM model is deployed on the municipal regulatory bureau's edge server (NVIDIA Jetson AGX Orin, computing power 200 TOPS) and takes the 32-dimensional feature vector as input. The inference time is 380ms / sample, and the output purity level is "qualified" (total active ingredient content 12.5%) with a confidence level of 0.82. Component similarity is then calculated, where A_i denotes the features of the sampled specimen (peak area at 1380nm 0.78) and B_i denotes the standard sample features from the Yili production area of Xinjiang (peak area at 1380nm 0.80). This yields S=0.94, which, combined with the confidence level, determines the product to be genuine. Another sample of propolis liquid (batch number 20240315) from a pharmacy had a model output confidence level of 0.62 and S=0.72, indicating it was "adulterated." Regulatory personnel seized this batch of products (50 bottles in total) on-site.

[0089] (3) Traceability Management and Results Output

[0090] Traceability Management: Supervisory personnel checked the blockchain traceability information of the propolis capsules via mobile phone and found that the processing company had not uploaded the number of purification cycles (out of 8 fields, 7 filled, completeness 0.875). The precipitation data (550mm) in the production area data also deviated significantly from the standard data (650mm) for the Yili production area in Xinjiang, with an E = 0.85. The traceability reliability was calculated as C = w1E + w2P = 0.6 × 0.85 + 0.4 × 0.875 = 0.86, classifying it as "moderately reliable." A notification was sent to the processing company, requiring them to supplement the purification cycle record within 24 hours. A third-party testing agency then verified the production area environmental data.
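The credibility calculation in this paragraph can be reproduced with a short sketch. The weights 0.6 and 0.4, the value E = 0.85, and the 7-of-8 completeness come from the paragraph; the tier thresholds (0.9 and 0.7) follow claim 3. The field names and function names are hypothetical, and E is taken as given rather than recomputed.

```python
def processing_completeness(record, required_fields):
    """P: share of required processing-record fields that are filled in."""
    filled = sum(1 for name in required_fields if record.get(name) not in (None, ""))
    return filled / len(required_fields)


def traceability_credibility(e, p, w1=0.6, w2=0.4):
    """C = w1*E + w2*P, using the weights of this embodiment as defaults."""
    return w1 * e + w2 * p


def credibility_tier(c):
    """Map C to the three tiers (thresholds 0.9 and 0.7)."""
    if c >= 0.9:
        return "highly reliable"
    if c >= 0.7:
        return "moderately reliable"
    return "low reliability"


REQUIRED_FIELDS = [  # hypothetical names for the 8 processing-record fields
    "extraction_temperature", "extraction_time", "purification_cycles",
    "solvent", "operator", "equipment_id", "batch_number", "processing_date",
]
record = {name: "recorded" for name in REQUIRED_FIELDS}
record["purification_cycles"] = None  # the field missing in this embodiment

p = processing_completeness(record, REQUIRED_FIELDS)  # 7 / 8 = 0.875
c = traceability_credibility(e=0.85, p=p)             # 0.51 + 0.35
tier = credibility_tier(c)
```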

[0091] Results output: Regulatory personnel's mobile terminals display the sampling results (purity qualified, confidence level 0.82, S=0.94, C=0.86), and generate a sampling report (No. JC20240510002); the municipal regulatory bureau's data center provides real-time statistics on the day's sampling (20 batches, 1 batch adulterated, 2 batches reliable), and generates a quality analysis chart of the circulation process (supermarket pass rate 95%, pharmacy pass rate 90%); consumers scan the product UUID code to view simplified traceability information (origin Xinjiang, purity qualified, regulatory sampling qualified), and click "Traceability Verification" to view the summary of the regulatory sampling report.

[0092] 3. Performance Comparison Data

[0093] Table 2: Comparison of the performance of spot checks conducted by market supervision departments in the circulation sector

[0094]

[0095] Table 2 is based on 3 months of operational data from regulatory authorities (a total of 1800 batches sampled). Traditional sampling requires samples to be taken back to the laboratory for testing, so results cannot be obtained on-site. Each batch takes 4 hours to complete, and traceability information must be verified manually by telephone with processing enterprises and production area cooperatives, which is extremely inefficient. On-site adulteration identification relies solely on sensory judgment, with an identification rate of only 68%, so some adulterated products reach consumers. This invention uses portable devices to collect data on-site and edge servers for rapid inference, reducing sampling time from 4 hours to 40 minutes, an 83% reduction. Real-time blockchain querying and automatic calculation of traceability credibility improve verification efficiency by 23 times. Multi-source verification and a high-generalization model achieve an on-site adulteration identification rate of 97%, with 17 batches of adulterated products promptly seized. Simplified consumer query functions increase the traceability query rate from 15% to 65%, enhancing consumer trust. The time required to recall problematic products is reduced from 72 hours to 24 hours, shortening the circulation time of defective products. This embodiment demonstrates the system's practicality in on-site supervision scenarios, providing an efficient technical tool for maintaining market order.

[0096] Referring to Figure 3, the figure demonstrates the identification advantages of this invention's multi-source data fusion method. Traditional single near-infrared spectroscopy relies on surface spectral information, achieving an accuracy rate of only 45% for adulteration with similar components and only 58% for starch adulteration because the spectral features are indistinct, which is insufficient for precise identification. This invention combines near-infrared spectroscopy with HPLC, along with feature weight optimization and similarity verification (component similarity S). Even when faced with difficult-to-detect adulteration such as 8% modified resin, the accuracy rate still reaches 92%.

[0097] Referring to Figure 4, the diagram highlights the reliability and stability of the blockchain traceability system of this invention. In traditional centralized databases, a single party can tamper with or add data, so credibility fluctuates widely and sometimes even falls below the acceptable threshold, failing to guarantee the authenticity of traceability. This invention uses multi-node consensus and SHA-256 hash values to solidify data, keeping the credibility of each batch stable within the range of 0.91-0.95. Furthermore, it employs the traceability credibility formula C = w1E + w2P for real-time evaluation, avoiding fluctuations caused by data tampering and addition. This allows consumers and regulatory authorities to obtain authentic and continuous traceability information, enhancing their trust in the quality of propolis.
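How a SHA-256 hash value exposes later changes to a record can be illustrated with a small sketch. It assumes records are serialized as JSON with sorted keys before hashing (the patent does not specify a serialization) and leaves out consensus and node handling entirely; the function names and the example record fields are hypothetical.

```python
import hashlib
import json


def record_hash(record):
    """SHA-256 hex digest of a traceability record.

    Assumption: JSON with sorted keys, so identical content always yields
    identical bytes regardless of the order in which fields were entered.
    """
    payload = json.dumps(record, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def is_untampered(record, stored_hash):
    """Compare a record against the hash stored when it was first uploaded."""
    return record_hash(record) == stored_hash


record = {
    "batch": "20240401",
    "origin": "Yili, Xinjiang",
    "extraction_temperature_c": 70,
    "extraction_time_h": 3,
}
stored = record_hash(record)                           # value kept on-chain
tampered = dict(record, extraction_temperature_c=60)   # later edit to one field
```

Any change to a stored field produces a different digest, so `is_untampered(tampered, stored)` is false while the original record still verifies.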

[0098] Referring to Figure 5, the figure verifies the effectiveness of the online update mechanism of the model in this invention. Traditional fixed models cannot adapt to new scenarios and changes in sample distribution, so their accuracy gradually declines, dropping to 72% after 5 months and failing to meet long-term identification requirements. This invention extracts low-confidence samples weekly, performs incremental training after manual annotation, and maintains a stable monthly accuracy of 94%-96%, with an error of only ±1%. Even when faced with new adulterants and samples from new production areas, it can quickly adapt through updates, avoiding model performance degradation, ensuring long-term high-precision support for propolis identification, and reducing the cost and time of manual retraining.
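The bookkeeping around the weekly update can be sketched without any model code. The 7-day window and the confidence threshold of 0.7 are taken from the claims, as is the rule that a retrained model replaces the old one only when accuracy improves by at least 2%. Reading that 2% as percentage points is an assumption, and the log layout and function names are hypothetical.

```python
from datetime import datetime, timedelta


def select_for_relabeling(logs, now, max_confidence=0.7, window_days=7):
    """Pick identification logs from the last `window_days` with low confidence."""
    cutoff = now - timedelta(days=window_days)
    return [
        log for log in logs
        if log["timestamp"] >= cutoff and log["confidence"] < max_confidence
    ]


def should_replace_model(old_accuracy, new_accuracy, min_gain=0.02):
    """Keep the retrained model only if test accuracy rises by at least min_gain."""
    return new_accuracy - old_accuracy >= min_gain


now = datetime(2024, 5, 10)
logs = [
    {"batch": "20240315", "timestamp": datetime(2024, 5, 8), "confidence": 0.62},
    {"batch": "20240401", "timestamp": datetime(2024, 5, 8), "confidence": 0.82},
    {"batch": "20240101", "timestamp": datetime(2024, 4, 20), "confidence": 0.50},
]
to_label = select_for_relabeling(logs, now)  # only the recent, low-confidence log
```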

[0099] Referring to Figure 6, the diagram illustrates the high efficiency of the system in various scenarios, particularly its suitability for rapid on-site testing. Traditional testing methods rely on benchtop laboratory equipment and lack portable tools on-site, so testing times exceed 60 minutes in all scenarios, failing to meet the timeliness requirements of production warehousing and market supervision. This invention, through parallel near-infrared and portable HPLC acquisition and rapid inference via an edge server, reduces laboratory testing time to 15 minutes, on-site market sampling to only 40 minutes, and e-commerce sample testing to 18 minutes, significantly improving testing efficiency across scenarios. It meets the timeliness requirements of the entire production, distribution, and regulatory chain, reducing process delays caused by waiting for test results.

[0100] The above are merely preferred embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitutions or modifications made by those skilled in the art within the scope of the technology disclosed in the present invention, based on the technical solution and inventive concept of the present invention, should be covered within the scope of protection of the present invention.

Claims

1. An intelligent identification and traceability system for propolis components, characterized in that it includes the following modules: The data acquisition module includes a near-infrared spectroscopy acquisition unit, a high-performance liquid chromatography (HPLC) acquisition unit, and a traceability information acquisition unit. The near-infrared spectroscopy acquisition unit uses a Fourier transform near-infrared spectrometer, scanning each sample three times to obtain the average spectrum. The HPLC acquisition unit uses an ultraviolet detector and acquires data on five types of active ingredients according to a gradient elution program. The traceability information acquisition unit scans the propolis packaging identification code and collects the associated information. The data preprocessing unit uses Savitzky-Golay filtering to remove noise from the near-infrared spectral data and then performs baseline correction using the Asymmetric Least Squares algorithm; it uses a Gaussian peak fitting algorithm to extract chromatographic peaks from the HPLC data and normalizes each of the two types of data to the [0,1] interval. The component feature extraction module includes a spectral feature extraction unit and a chromatographic feature extraction unit. The spectral feature extraction unit screens 10 characteristic peaks and calculates peak parameters, including peak area, half width at half maximum, and peak intensity. The chromatographic feature extraction unit calculates chromatographic parameters for the 5 types of active ingredients, including retention time, peak height, and peak area. The two types of features are concatenated into a 45-dimensional initial vector (10 spectral peaks × 3 parameters + 5 active ingredients × 3 parameters), and principal component analysis reduces it to 32 dimensions.
The component feature extraction module also includes a feature weight optimization unit, which uses mutual information entropy to calculate the importance of each feature to the identification result and optimizes the contribution ratio of the feature vector; specifically, it calculates the mutual information value I(X,Y) between each feature and the purity level label, where X is the feature value and Y is the purity level label, sorts the features in descending order of I(X,Y), retains the top 25 high-importance features, and removes the 7 low-importance features from the original 32-dimensional features. The intelligent identification module includes a model training unit and a real-time identification unit. The model training unit constructs a "CNN-LSTM" model and trains it with 2000 propolis samples combined with data augmentation. The real-time identification unit takes the 25-dimensional feature vector of the sample to be tested as input and outputs the purity level and confidence level. The traceability management module includes a blockchain storage unit and a distributed database unit. The blockchain storage unit deploys 3 nodes and adopts an authorization proof consensus mechanism to upload propolis information to the blockchain, with each record generating a unique hash value. The distributed database unit stores user query permission data, with different users corresponding to different query ranges. The results output module includes a visualization terminal unit and an alarm unit. The visualization terminal unit displays the content of active ingredients, purity level, confidence level, and traceability link diagram of the propolis sample in real time. When the identification confidence level is <0.8 or the traceability credibility is <0.9, the alarm unit triggers an audio-visual prompt and pushes abnormal information to the regulatory department's terminal.
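The mutual-information ranking described in the feature weight optimization unit can be sketched as follows. The claim does not say how I(X,Y) is estimated for continuous features, so this sketch assumes equal-width binning into four bins and the plug-in estimate over the resulting counts; the function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """I(X;Y) in bits for two equal-length sequences of discrete values."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum of p(x,y) * log2( p(x,y) / (p(x) * p(y)) ) over observed pairs.
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def discretize(values, bins=4):
    """Equal-width binning of a continuous feature (an assumed estimator choice)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    return [min(int((v - lo) / (hi - lo) * bins), bins - 1) for v in values]


def top_k_features(columns, labels, k):
    """Return indices of the k feature columns with the highest I(X;Y)."""
    scores = [mutual_information(discretize(col), labels) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


labels = [0, 0, 1, 1]                 # toy purity-level labels
informative = [0.1, 0.2, 0.8, 0.9]    # tracks the label
noise = [0.1, 0.9, 0.1, 0.9]          # independent of the label
kept = top_k_features([noise, informative], labels, k=1)
```

With 32 PCA components as columns, `top_k_features(columns, labels, k=25)` would correspond to keeping 25 features and dropping 7.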

2. The intelligent identification and traceability system for propolis components according to claim 1, characterized in that it also includes a component similarity calculation unit, which is used in the intelligent identification module to quantify the component matching degree between the sample to be tested and the standard sample. The calculation method is as follows: where S is the component similarity; n is the feature dimension; A_i is the normalized value of the i-th feature of the sample to be tested; and B_i is the normalized value of the i-th feature of the standard genuine sample. When S ≥ 0.95, it is judged as "high similarity", corresponding to the purity level of "excellent" or "good"; when 0.8 ≤ S < 0.95, it is judged as "medium similarity", which needs to be verified by high performance liquid chromatography of active ingredient content; when S < 0.8, it is judged as "low similarity", and is directly marked as "adulterated" or "non-genuine".
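The three similarity tiers of this claim reduce to a simple threshold mapping, sketched below. The sketch takes S as an input because the expression for S itself is not reproduced in this text; `similarity_tier` is a hypothetical name.

```python
def similarity_tier(s):
    """Map component similarity S to the decision tiers of claim 2."""
    if s >= 0.95:
        return "high similarity"    # purity level "excellent" or "good"
    if s >= 0.8:
        return "medium similarity"  # verify active-ingredient content by HPLC
    return "low similarity"         # marked "adulterated" or "non-genuine"


capsule_tier = similarity_tier(0.94)  # the supermarket capsule sample
liquid_tier = similarity_tier(0.72)   # the pharmacy propolis liquid sample
```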

3. The intelligent identification and traceability system for propolis components according to claim 1, characterized in that the traceability management module also includes a traceability credibility assessment unit, and the traceability credibility is calculated as follows: C = w1E + w2P, where C is the traceability credibility; w1 and w2 are weighting coefficients with w1 + w2 = 1; E is the origin environment matching degree, which is the cosine similarity between the origin environment data associated with the sample to be tested and the standard environment data of that origin in the database; P is the processing record completeness, which is the ratio of the number of fields filled in the processing record to the total number of fields; based on this assessment, when C ≥ 0.9, it is judged as "highly reliable", and the traceability information is directly displayed to the public; when 0.7 ≤ C < 0.9, it is judged as "medium reliable", and the processing enterprise needs to supplement the missing records before it is displayed; when C < 0.7, it is judged as "low reliability", the sale and circulation of the propolis sample is restricted, and the regulatory authorities are notified to verify.

4. The intelligent identification and traceability system for propolis components according to claim 1, characterized in that the intelligent identification module also includes an online model update unit, which extracts the identification logs of the past 7 days from a distributed database every week. After a professional testing institution verifies and labels the true purity level, using liquid chromatography-mass spectrometry to confirm the components, the logs are added to the training set. An incremental training method is adopted, fixing the CNN layer parameters and updating only the weights of the LSTM layer and output layer, with the learning rate set to 1×10⁻⁴, which is 1/5 of the initial training learning rate, and the update is completed after 20 iterations. After each update, the accuracy is verified on the test set. When the accuracy improves by ≥2%, the new model is saved and replaces the original model.

5. The intelligent identification and traceability system for propolis components according to claim 1, characterized in that the data acquisition module also includes a data quality detection unit, which detects the signal-to-noise ratio and spectral smoothness of near-infrared spectral data. When the signal-to-noise ratio is <30dB or the fluctuation is >5%, the spectrometer is controlled to rescan the sample. For high-performance liquid chromatography data, the peak resolution and peak shape symmetry are detected, and the peak shape symmetry requires a symmetry factor of 0.8-1.2. When the resolution is <1.5 or the symmetry factor is out of range, the mobile phase gradient is adjusted and the sample is re-injected. The completeness of required fields for traceability information is checked, and a prompt for supplementation is issued when missing fields are found.
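The quality gates of this claim can be written as three small checks. The thresholds (30dB, 5%, resolution 1.5, symmetry factor 0.8-1.2) are those stated in the claim; how the signal-to-noise ratio, fluctuation, resolution, and symmetry factor are measured is outside this sketch, and the function and field names are hypothetical.

```python
def spectrum_action(snr_db, fluctuation_pct):
    """Return 'rescan' when the near-infrared spectrum fails the thresholds."""
    if snr_db < 30 or fluctuation_pct > 5:
        return "rescan"
    return "accept"


def chromatogram_action(resolution, symmetry_factor):
    """Return the corrective action when HPLC peak quality fails the thresholds."""
    if resolution < 1.5 or not 0.8 <= symmetry_factor <= 1.2:
        return "adjust gradient and re-inject"
    return "accept"


def missing_fields(record, required_fields):
    """List required traceability fields that are absent or empty."""
    return [name for name in required_fields if record.get(name) in (None, "")]


raw_scan = spectrum_action(snr_db=26, fluctuation_pct=2)
hplc_run = chromatogram_action(resolution=1.8, symmetry_factor=1.0)
gaps = missing_fields(
    {"origin": "Yili, Xinjiang", "purification_cycles": None},
    ["origin", "purification_cycles"],
)
```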

6. The intelligent identification and traceability system for propolis components according to claim 1, characterized in that the result output module also includes a user interaction unit, which supports personalized queries and feedback for users with different roles; consumers can scan the code to view a simplified version of the traceability information and identification summary; processing enterprises can view the processing records and identification data of each batch through enterprise terminals and generate quality analysis reports; and regulatory authorities can view the identification anomaly statistics of propolis samples across the region through regulatory terminals, initiate special verification, and record the results on the blockchain.

7. A method for the intelligent identification and traceability system for propolis components according to any one of claims 1-6, characterized in that it includes the following steps: Step 1: Data acquisition and preprocessing. Diffuse reflectance spectra of propolis samples are acquired using a near-infrared spectrometer, with each sample scanned three times and the average spectrum taken. Chromatographic data of five types of active ingredients are acquired using high-performance liquid chromatography. Propolis identification codes are read using a barcode scanner to associate and collect traceability information. Savitzky-Golay filtering and Asymmetric Least Squares baseline correction are applied to the spectral data, and both types of data are normalized to the [0,1] interval. Step 2: Component feature extraction. Ten spectral feature peaks are selected using partial least squares discriminant analysis, and the peak area, half width at half maximum, and peak intensity of each feature peak are calculated. The retention time, peak height, and peak area of the five active ingredients in the high-performance liquid chromatography data are calculated. The spectral features and chromatographic features are concatenated to form an initial feature vector. Principal component analysis is used to reduce the initial feature vector to a 32-dimensional feature vector. The feature weights are then optimized using mutual information entropy, retaining 25 highly important features. Step 3: Intelligent identification. The optimized feature vector is input into the "CNN-LSTM" hybrid model, which outputs the purity level and confidence level; simultaneously, the component similarity S is calculated. When S≥0.95 and confidence level≥0.8, the sample is judged as genuine; when S<0.8, it is directly marked as adulterated.
Every week, the identification logs of the past 7 days are extracted from the distributed database, and low-confidence samples with confidence level<0.7 are screened. After being reviewed and labeled by a professional testing agency, they are added to the training set, and the model parameters are updated by incremental training. Step 4: Traceability management. Data on the place of origin, processing parameters, and test report numbers are uploaded to the blockchain storage. The blockchain uses an authorization proof consensus mechanism, and each record generates a unique hash value. The traceability credibility is calculated as C = w1E + w2P. When C≥0.9, a complete traceability chain is displayed; when C<0.7, the circulation of the propolis sample is restricted and the regulatory authorities are notified. Step 5: Results output. The active ingredient content, purity level, and traceability link of the propolis sample are displayed through a visual terminal. An audible and visual alarm is triggered when the identification confidence level is <0.8 or the traceability credibility C is <0.9. Consumers can query a simplified version of the traceability information through mobile devices, processing enterprises can query quality analysis reports through enterprise terminals, and regulatory departments can query identification anomaly statistics and provide feedback on the verification results through regulatory terminals.

8. The method for intelligent identification and traceability system of propolis components according to claim 7, characterized in that the training process of the "CNN-LSTM" hybrid model in Step 3 also includes cross-validation and data augmentation optimization; 5-fold cross-validation is used, dividing the 2000 training samples into 5 groups, taking 4 groups as the training set and 1 group as the validation set each time, and repeating the process 5 times to complete the cross-validation. The accuracy and loss value of each validation are calculated, and the average value is taken as the initial performance of the model; data augmentation adds Gaussian noise and spectral rotation to simulate environmental interference in the actual data acquisition process. After training, the model is validated on a test set containing 100 new samples.
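The 5-fold split described in this claim (2000 samples, 4 groups for training and 1 for validation in each round) can be sketched as an index generator. Shuffling with a fixed seed before splitting is an assumption, and `k_fold_indices` is a hypothetical name; model training and the averaging of accuracy and loss are left out.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumption: samples are shuffled first
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


splits = list(k_fold_indices(2000, k=5))
```

Each round then trains on 1600 samples and validates on the remaining 400, and every sample is used for validation exactly once across the 5 rounds.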

9. The method for intelligent identification and traceability system of propolis components according to claim 7, characterized in that the process of uploading and reviewing traceability information in Step 4 also includes multi-level verification. After the origin testing station uploads environmental data, it needs to submit soil testing reports and meteorological records to the regulatory department. The data is uploaded to the blockchain after the regulatory department approves it. When processing enterprises upload process parameters, they need to link the monitoring video clips of the production workshop. The blockchain node verifies that the video timestamp is consistent with the parameter upload time to confirm the validity of the data. Warehousing enterprises need to upload temperature and humidity data once every 24 hours. The data are considered valid if the fluctuation over three consecutive days stays within ±2℃ and ±5%RH.