A Deep Learning-Based Method and System for Processing Experimental Data of Glass Instruments
By configuring traceability identifiers for data fields and generating evaluation parameters, the method addresses the reliability problem caused by opaque data completion operations. It makes the reliability of experimental results assessable and improves both the transparency of data processing and the credibility of results.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- NANTONG BAIAO GLASS INSTR CO LTD
- Filing Date
- 2026-06-03
- Publication Date
- 2026-06-30
Figure CN122309933A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of data processing technology, and more specifically, to a method and system for processing experimental data of glass instruments based on deep learning.
Background Technology
[0002] In chemistry, biology, and other laboratories, using deep learning models to process experimental data generated with glassware has become an important means of improving analytical efficiency. The conventional workflow typically involves data input, preprocessing, model inference, and result output. In practice, however, raw data often contains subtle defects. For example, time-series data may be truncated when a file transfer is interrupted, key units or sample batch numbers may be omitted during manual entry, or empty placeholders may appear because of format incompatibility during system integration. Faced with such incomplete raw data, existing preprocessing programs often fill in default values or interpolate automatically so that the downstream model can run. While this preserves the integrity of the data structure, it silently alters the original semantics of the data. A deep learning model that receives this repaired but semantically distorted data still produces a seemingly reasonable result. Because information about the completion operation is not associated with the processing result, researchers cannot determine how reliable the final result is; that is, they cannot tell whether it is based on actual original measurements or on values filled in by the preprocessing program. This lack of transparency significantly reduces the credibility of the results and poses a major risk to subsequent research analysis and quality control.
[0003] To address the aforementioned issues, existing technologies urgently need improvement.
Summary of the Invention
[0004] To address the shortcomings of existing technologies, this application provides a deep learning-based method and system for processing experimental data of glass instruments. This method and system can solve the technical problem in existing technologies where the lack of transparency in the numerical completion operation during data preprocessing makes it impossible to assess the reliability of the final experimental results.
[0005] In a first aspect, this application provides a deep learning-based method for processing experimental data from glass instruments, comprising: accessing the raw data from glass instrument experiments and configuring a traceability identifier for each data field in the raw data; preprocessing the raw data and, when a numerical completion operation occurs in a data field, recording the completion processing information in the corresponding traceability identifier; inputting the preprocessed raw data and the corresponding traceability identifiers into the deep learning data processing model to obtain the experimental processing results output by the model; identifying the multiple key data fields from which the experimental processing results were generated and, based on the traceability identifier of each key data field, determining the target data fields among them that have undergone a numerical completion operation; and generating evaluation parameters characterizing the reliability of the experimental processing results based on the target data fields, and outputting the experimental processing results in association with the evaluation parameters.
[0006] This technical solution establishes a complete identity for each data field from source to result—a traceability identifier—and records repair traces during data completion. Ultimately, by analyzing how much key data has been repaired, a quantitative reliability evaluation parameter is generated. This allows researchers to intuitively understand the credibility of the model's output, solving the problem of unassessable reliability due to opaque data processing.
[0007] Furthermore, the traceability identifier includes the original source of the data field, the access timestamp, the original numerical content, and the original acquisition status. The original acquisition status is used to identify the state of the data field when it is accessed, and the status includes the original complete state, the original missing state, the semantic placeholder state, or the non-explicit truncation state.
[0008] This technical solution refines the content of the traceability identifier, in particular by introducing the original acquisition status. The status distinguishes whether, at the time of access, a data field was complete, entirely absent, present only as a placeholder, or truncated. This gives subsequent completion operations and risk assessments more accurate context about the original data and improves the precision of traceability.
[0009] Furthermore, the numerical completion operation includes completing data fields according to preset completion logic, wherein the preset completion logic includes completing by default value, completing by time series at equal intervals, or completing by default conditions of experimental items; The completion processing information includes the preset completion logic used and the timestamp of the numerical completion operation.
[0010] This technical solution clarifies the specific methods for numerical completion and records the completion logic used. Different completion logics have varying degrees of impact on data accuracy; recording this information provides a basis for more accurately assessing the risks associated with different completion operations.
[0011] Furthermore, the step of generating evaluation parameters characterizing the reliability of the experimental processing results based on the target data fields includes: calculating the proportion of target data fields among the multiple key data fields; and weighting that proportion by a preset weight coefficient to obtain a confidence score, which is used as the evaluation parameter.
[0012] This technical solution provides a specific and operable method for generating evaluation parameters. By calculating the proportion of the completed key data field among all key fields and performing a weighted calculation, a preliminary confidence score can be quickly and intuitively quantified, providing a basic quantitative indicator for reliability assessment.
[0013] Furthermore, after the step of weighting the proportion by the preset weight coefficient to obtain the confidence score used as the evaluation parameter, the method includes: determining, through analysis, the contribution weight of each target data field to the experimental processing results; and applying a weighted adjustment to the evaluation parameters according to the contribution weight of each target data field.
[0014] This technical solution optimizes the calculation method of confidence scores. Instead of treating all completed fields equally, it introduces the concept of contribution weight, distinguishing which completed fields have a greater impact on the final result. By applying higher weights to the completed fields with greater impact, the evaluation parameters more accurately reflect the actual degree of influence of the completion operation on the result, leading to a more precise assessment.
[0015] Furthermore, the step of applying a weighted adjustment to the evaluation parameters according to the contribution weight of each target data field includes: obtaining the traceability identifier corresponding to the target data field and extracting the completion processing information from it; determining, from the completion processing information, the preset completion logic used for the target data field and obtaining the risk weight corresponding to that logic; and, if the contribution weight exceeds a preset weight threshold and the risk weight exceeds a preset risk threshold, executing a risk circuit breaker operation that forces the evaluation parameters below a preset warning line and generates a targeted warning message; otherwise, applying a weighted correction to the evaluation parameters based on the contribution weight and the risk weight.
[0016] This technical solution introduces a risk-based circuit breaker mechanism, establishing a reliable safety baseline. When critical data that has a decisive impact on the outcome is supplemented through a high-risk method, the system will directly determine that the result is unreliable and issue a warning. This prevents users from being misled by a seemingly reasonable value derived from high-risk speculation, greatly improving the system's security and rigor.
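The circuit breaker logic described above can be illustrated with a minimal Python sketch. The application does not disclose concrete thresholds, the warning line, or the formula for the weighted correction, so the default values below and the multiplicative correction `score * (1 - contribution * risk)` are assumptions chosen only for illustration; the function name `adjust_score` and the dictionary keys are likewise hypothetical.

```python
def adjust_score(score, targets, weight_threshold=0.5, risk_threshold=0.7,
                 warning_line=60.0):
    """Apply the circuit breaker or a weighted correction to a 0-100 score.

    Each target is a dict with 'name', 'logic', 'contribution' and 'risk'.
    All thresholds and the correction formula are illustrative assumptions.
    """
    warnings = []
    for t in targets:
        # Breaker condition: a high-impact field completed by a high-risk logic.
        if t["contribution"] > weight_threshold and t["risk"] > risk_threshold:
            warnings.append(
                f"Circuit breaker: key field '{t['name']}' was completed "
                f"using '{t['logic']}'"
            )
    if warnings:
        # Force the score strictly below the warning line (margin of 1 is assumed).
        return min(score, warning_line - 1.0), warnings
    for t in targets:
        # Assumed correction: shrink the score by contribution * risk per field.
        score *= 1 - t["contribution"] * t["risk"]
    return score, warnings


risky = [{"name": "dilution_factor", "logic": "default_value",
          "contribution": 0.8, "risk": 0.9}]
mild = [{"name": "humidity", "logic": "project_default_condition",
         "contribution": 0.2, "risk": 0.5}]

tripped_score, tripped_warnings = adjust_score(80.0, risky)
corrected_score, corrected_warnings = adjust_score(80.0, mild)
```

In this sketch a tripped breaker overrides any gradual correction, which mirrors the "otherwise" branch in the text: the weighted correction is applied only when no target field meets both threshold conditions.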
[0017] Furthermore, after the step of generating evaluation parameters characterizing the reliability of the experimental processing results based on each target data field, the method also includes: obtaining the physical carrier traceability information corresponding to the raw data, which includes the heat treatment record and storage time of the glass instrument before the experiment; determining, based on the physical carrier traceability information and a preset physical constraint model, the physical deformation parameters of the glass instrument at the time of the experiment; and generating a carrier efficiency coefficient from the physical deformation parameters and correcting the evaluation parameters with it.
[0018] This technical solution expands the dimensions of reliability assessment from the purely data-driven level to the level of the physical carrier that supports the experiment. Considering that physical deformation of the glass instrument itself due to heat treatment, aging, and other factors can also affect data quality, a carrier efficiency coefficient is introduced to correct the evaluation parameters, making the reliability assessment more comprehensive and multi-dimensional, integrating both data integrity and instrument condition.
[0019] Furthermore, after the step of correcting the evaluation parameters using the carrier efficiency coefficient, the method also includes: comparing the data fluctuation characteristics of the raw data across the historical experimental period; and, if those characteristics match preset physical defect characteristics of the glass instrument, adding a health warning mark to the corresponding traceability identifier; otherwise, adding no health warning mark.
[0020] This technical solution establishes an instrument health monitoring mechanism based on historical data. By analyzing long-term trends in data fluctuations, it's possible to infer whether potential physical defects have appeared in the glass instruments. Adding health warning markers to the traceability system enables early warnings of instrument conditions, facilitating preventative maintenance and ensuring data quality at its source.
[0021] Furthermore, after the experimental processing results and evaluation parameters are output in association, the method also includes: when the evaluation parameters are lower than a preset threshold, outputting the multiple key data fields and their traceability identifiers, including the completion processing information and health warning marks, for manual review and data tracing.
[0022] This technical solution provides users with a clear and actionable traceability path. When a result is deemed low reliability, the system not only provides a score but also proactively displays a detailed diagnostic report, clearly indicating which data was supplemented and how, and whether there are any instrument health issues. This greatly facilitates manual review, enabling users to quickly pinpoint the root cause of the problem and perform effective data correction or experiment redo.
[0023] In a second aspect, this application also discloses a deep learning-based experimental data processing system for glass instruments, used to execute the deep learning-based experimental data processing method for glass instruments as described in any of the preceding claims. The system includes: a data acquisition and configuration module, used to access the raw data of glass instrument experiments and configure a traceability identifier for each data field in the raw data; a preprocessing and recording module, used to preprocess the raw data and record the completion processing information in the corresponding traceability identifier when a numerical completion operation occurs in a data field; a model processing module, used to input the preprocessed raw data and the corresponding traceability identifiers into the deep learning data processing model and obtain the experimental processing results output by the model; a field identification module, used to identify the multiple key data fields from which the experimental processing results were generated, and to determine, based on the traceability identifier of each key data field, the target data fields among them that have undergone a numerical completion operation; and an evaluation parameter generation and output module, used to generate evaluation parameters that characterize the reliability of the experimental processing results based on each target data field, and to output the experimental processing results in association with the evaluation parameters.
[0024] The technical solution provided in this application achieves transparent recording of key operations throughout the data lifecycle by configuring a traceability identifier for each raw data field in glass instrument experiments and keeping it attached to the field throughout processing. Specifically, when the preprocessing stage performs numerical completion on missing or incomplete data, information such as the type and time of the operation is faithfully recorded in the traceability identifier. After the deep learning model outputs experimental results, the system can identify, based on the traceability identifiers, which key data contributing to the results originated from original measurements and which from subsequent completion. On this basis, the system can generate a quantitative evaluation parameter that characterizes the reliability of the experimental results. In existing technologies the data completion process is a black box, so users cannot judge the credibility of the results. By contrast, this application establishes a clear path from data completion behavior to result reliability assessment, so that each result output by the model comes with a credibility report. This greatly enhances the transparency of data processing and the reliability of the results, providing a more solid basis for subsequent scientific analysis.
Attached Figure Description
[0025] Figure 1 is a flowchart of a deep learning-based experimental data processing method for glass instruments provided in an embodiment of this application.
[0026] Figure 2 is a schematic structural diagram of a deep learning-based experimental data processing system for glass instruments provided in an embodiment of this application.
[0027] Reference numerals: 210, data acquisition and configuration module; 220, preprocessing and recording module; 230, model processing module; 240, field identification module; 250, evaluation parameter generation and output module.
Detailed Implementation
[0028] The technical solutions of this application will now be clearly and completely described with reference to the accompanying drawings. Obviously, the described embodiments are merely some embodiments of this application, and not all embodiments. The components of this application described and shown in the accompanying drawings can generally be arranged and designed in various different configurations. Therefore, the following detailed description of the embodiments of this application provided in the accompanying drawings is not intended to limit the scope of the claimed application, but merely to illustrate selected embodiments of this application. All other embodiments obtained by those skilled in the art based on the embodiments of this application without inventive effort are within the scope of protection of this application.
[0029] It should be noted that similar reference numerals and letters in the following figures indicate similar items; therefore, once an item is defined in one figure, it does not need to be further defined and explained in subsequent figures. Furthermore, in the description of this application, the terms "first," "second," etc., are used only to distinguish descriptions and should not be construed as indicating or implying relative importance.
[0030] In modern chemical analysis or biopharmaceutical laboratories, the level of automation and intelligence in data processing directly affects R&D efficiency and product quality. A typical scenario involves using high-performance liquid chromatography (HPLC) to determine the content of a specific active ingredient in a pharmaceutical product. Lab technicians process dozens or even hundreds of samples daily. Each sample generates a chromatogram containing multiple data points, including time, pressure, and detector signals, along with metadata such as sample batch number, technician ID, column type, and mobile phase ratio. This diverse data (CSV files exported directly from the instrument, sample information retrieved from the laboratory information management system, and manually entered electronic lab records) ultimately needs to be integrated and input into a deep learning model, which automatically identifies chromatographic peaks, calculates peak areas, and determines the concentration of the component. However, data integrity is frequently compromised in this process. For example, when a chromatographic data file is transmitted to a central server over a network, a momentary network interruption may cause data loss at the end of the file, resulting in incomplete chromatographic peaks. An experimenter in a hurry might forget to fill in a crucial experimental condition such as the mobile phase ratio. Or, during format conversion between systems, a field that should be numerical might be filled with a text placeholder such as N/A. Existing data processing workflows typically complete such data automatically so that the downstream model can run, but the completion process and the strategy used leave no trace on the final concentration result. This leads to a serious problem: an experimenter receives a concentration report, such as 10.2 mg/L, but has no way of knowing whether the result is based entirely on actual measurement data or on data that the system automatically corrected and that may deviate from reality. This lack of transparency makes the reliability of the results unknown, creating potential risks for subsequent quality assessment and scientific research.
[0031] In this regard, referring first to Figure 1, this application provides a deep learning-based method for processing experimental data of glass instruments, the method comprising:
S1. Access the raw data of the glass instrument experiment and configure a traceability identifier for each data field in the raw data;
S2. Preprocess the raw data, and when a numerical completion operation occurs in a data field, record the completion processing information in the corresponding traceability identifier;
S3. Input the preprocessed raw data and the corresponding traceability identifiers into the deep learning data processing model to obtain the experimental processing results output by the model;
S4. Identify the multiple key data fields from which the experimental processing results were generated, and determine, based on the traceability identifier of each key data field, the target data fields among them that have undergone a numerical completion operation;
S5. Generate evaluation parameters characterizing the reliability of the experimental processing results based on each target data field, and output the experimental processing results in association with the evaluation parameters.
[0032] The traceability identifier can be understood as a metadata packet or numerical tag attached to each independent data field. It is not the value of the data field itself, but a collection of information recording the origin of that value. This identifier is always bound to the data field throughout the entire processing flow, like an inseparable electronic file.
[0033] The data completion information is a record specifically generated in the traceability identifier when a data field is filled or corrected by the system due to missing, invalid, or other reasons. This record details the specific circumstances of the completion operation, such as the rules on which the completion was based and the time when the completion occurred.
[0034] Key data fields refer to those input data fields that significantly influence the formation of the final result during the inference process of a deep learning model. An experiment's input data may contain hundreds or even thousands of fields, but not all fields are equally important. For example, in chromatographic analysis, the signal values in the chromatographic peak region have a much greater impact on the final concentration calculation than the signal values in the baseline region.
[0035] The target data fields are a subset of the key data fields described above: fields that are both identified as key and whose traceability identifiers contain completion processing information. These fields are the main objects of review when assessing the reliability of the results.
[0036] The evaluation parameter, generated in this application, is an indicator used to quantify the reliability of the experimental results. It is not the experimental result itself, but rather a credibility score for the experimental result. The value of this parameter directly reflects the extent to which the final result is based on the original, unmodified data.
[0037] The method proposed in this application will be described in detail below, taking a specific application scenario as an example.
[0038] In an experiment that uses a spectrophotometer to measure the absorbance of a solution, the system needs to process data including sample number, wavelength setting, and absorbance values measured at a series of time points.
[0039] The first step is to access the raw data and configure the traceability identifier. When experimental data, such as a spreadsheet file with multiple rows and columns, is accessed by the system, the system iterates through the data fields represented by each cell in the file. For each field, the system creates a structured traceability identifier. For example, for the wavelength setting field with a value of 540nm, the system will generate a traceability identifier, the content of which may be represented in JSON format as: {Original Source: Instrument A Export File Batch 20231101.xlsx, Cell C2, Access Timestamp: 2023-11-01T10:00:05Z, Original Value Content: 540, Original Acquisition Status: Original Complete Status}. For a field that should record the sample dilution factor but was forgotten by the experimenter, its traceability identifier may be configured as: {Original Source: Manual Input Interface, Field DilutionFactor, Access Timestamp: 2023-11-01T10:00:10Z, Original Value Content: null, Original Acquisition Status: Original Missing Status}.
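As an illustration of this first step, the following Python sketch builds the two traceability identifiers from the example above. The class name `TraceabilityIdentifier`, the attribute names, and the helper `configure_identifier` are hypothetical; the application specifies only which pieces of information the identifier contains, not how it is represented. For brevity the helper distinguishes only the complete and missing states.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class AcquisitionStatus(Enum):
    """The four original acquisition states named in the application."""
    ORIGINAL_COMPLETE = "original_complete"
    ORIGINAL_MISSING = "original_missing"
    SEMANTIC_PLACEHOLDER = "semantic_placeholder"
    NON_EXPLICIT_TRUNCATION = "non_explicit_truncation"


@dataclass
class TraceabilityIdentifier:
    original_source: str
    access_timestamp: str
    original_value: Any
    acquisition_status: AcquisitionStatus
    # Stays None unless a numerical completion operation is recorded later.
    completion_info: Optional[dict] = None


def configure_identifier(source, timestamp, raw_value):
    """Create an identifier for one field (only two of the four states handled)."""
    status = (AcquisitionStatus.ORIGINAL_MISSING if raw_value is None
              else AcquisitionStatus.ORIGINAL_COMPLETE)
    return TraceabilityIdentifier(source, timestamp, raw_value, status)


wavelength_id = configure_identifier(
    "Instrument A Export File Batch 20231101.xlsx, Cell C2",
    "2023-11-01T10:00:05Z", 540)
dilution_id = configure_identifier(
    "Manual Input Interface, Field DilutionFactor",
    "2023-11-01T10:00:10Z", None)
```

Keeping `completion_info` as a separate optional attribute means the original value and status are never overwritten when a later step completes the field.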
[0040] The second step is to preprocess the raw data and record the completion information when numerical completion occurs. The system's data preprocessing module checks the integrity and standardization of the data. Continuing the previous example, when a missing dilution factor field is detected, the preprocessing module will complete it according to preset rules in order for the subsequent concentration calculation model to run. Assume the rule is that if the dilution factor is missing, it will be completed using the default value of 1. After filling the value of this field with 1, the system will immediately update its corresponding traceability identifier, appending the completion information to it. The updated traceability identifier becomes: {Original Source:..., Original Acquisition Status: Original Missing Status, Completion Information: {Preset Completion Logic Used: Complete with Default Value, Completed Value: 1, Timestamp of Numerical Completion Operation: 2023-11-01T10:05:20Z}}. In this way, although the value of the data field is modified, the modification action and the basis for the modification are permanently and immutably recorded.
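The default value completion just described might be sketched in Python as follows. The record layout (a `value` plus a `trace` dictionary), the key names, and the function `complete_with_default` are assumptions made for this example; only the recorded items (completion logic used, completed value, timestamp) come from the description.

```python
from datetime import datetime, timezone


def complete_with_default(record, default_value, now=None):
    """Fill a missing field with a default and log the operation in its identifier."""
    if record["value"] is not None:
        return record  # nothing to complete, so the identifier stays untouched
    now = now or datetime.now(timezone.utc)
    record["value"] = default_value
    # The original value and acquisition status are preserved; only a new
    # completion entry is appended to the identifier.
    record["trace"]["completion_info"] = {
        "preset_completion_logic": "default_value",
        "completed_value": default_value,
        "completion_timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    return record


dilution = {
    "value": None,
    "trace": {
        "original_source": "Manual Input Interface, Field DilutionFactor",
        "access_timestamp": "2023-11-01T10:00:10Z",
        "original_value": None,
        "acquisition_status": "original_missing",
    },
}
complete_with_default(
    dilution, 1, now=datetime(2023, 11, 1, 10, 5, 20, tzinfo=timezone.utc))
```

The optional `now` argument exists only so the example reproduces the timestamp used in the text; a real system would presumably also need a tamper-resistant store to make the record immutable, which this sketch does not attempt.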
[0041] The third step is to feed the preprocessed data and the traceability identifiers into the deep learning model. Note that the model computes on the preprocessed data values themselves, while the traceability identifiers are passed alongside the data as accompanying information and typically do not take part in the model's mathematical operations. The model, for example a recurrent neural network used to analyze kinetic curves, processes the input absorbance time series and outputs an experimental result, such as a reaction rate constant k of 0.05 s⁻¹.
[0042] The fourth step is to identify key data fields and determine target data fields. After the model outputs its result, the system launches a model interpretability analysis, for example using techniques such as SHAP or LIME. These techniques can quantify the contribution of each input field to the final result of 0.05 s⁻¹. The system ranks fields by contribution and defines the top 10% as key data fields. For example, the analysis may reveal that several absorbance values in the initial stage of the reaction, the reaction temperature, and the sample dilution factor have the greatest impact on the rate constant calculation. The system then checks the traceability identifiers of these key data fields one by one. It finds that the identifiers for the initial absorbance values and the reaction temperature contain no completion processing information, while the identifier for the dilution factor records that it was completed using the default value. The dilution factor field is therefore identified as a target data field.
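A minimal Python sketch of this selection step is given below. It assumes the per-field contribution scores have already been computed by an interpretability tool and are supplied as a plain dictionary; no SHAP or LIME call is shown. The field names, the scores, and the two helper functions are hypothetical.

```python
import math


def select_key_fields(contributions, top_fraction=0.10):
    """Return the names of the top fraction of fields by absolute contribution."""
    k = max(1, math.ceil(len(contributions) * top_fraction))
    ranked = sorted(contributions, key=lambda name: abs(contributions[name]),
                    reverse=True)
    return ranked[:k]


def select_target_fields(key_fields, identifiers):
    """Keep the key fields whose identifier records a completion operation."""
    return [name for name in key_fields
            if identifiers[name].get("completion_info") is not None]


# Hypothetical scores for 50 input fields: 5 influential ones plus 45 baseline points.
contributions = {
    "absorbance_t0": 0.30, "absorbance_t1": 0.25, "absorbance_t2": 0.20,
    "reaction_temperature": 0.15, "dilution_factor": 0.10,
}
contributions.update({f"baseline_{i}": 0.001 for i in range(45)})

# Only the dilution factor carries completion processing information.
identifiers = {name: {"completion_info": None} for name in contributions}
identifiers["dilution_factor"]["completion_info"] = {
    "preset_completion_logic": "default_value"}

key_fields = select_key_fields(contributions)
target_fields = select_target_fields(key_fields, identifiers)
```

With 50 input fields and a 10% cutoff, five fields are selected as key, matching the count used in the worked example of the fifth step.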
[0043] The fifth step is to generate and output evaluation parameters based on the target data fields. The system counts the target data fields among all key data fields. Assuming there are 5 key data fields and only the dilution factor is a target data field, the system can calculate an evaluation parameter using a simple preliminary algorithm, such as reliability = (1 - number of target data fields / number of key data fields) * 100%. In this example, the evaluation parameter is (1 - 1 / 5) * 100% = 80%. The result displayed to the user is therefore not the bare reaction rate constant k = 0.05 s⁻¹, but that value combined with the evaluation parameter, for example shown in a user interface as: reaction rate constant k = 0.05 s⁻¹ (Reliability rating: 80%). Users can click on this rating to view further details, where the system highlights that the dilution factor field is based on a default value.
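The preliminary algorithm in this step can be written out as a short Python sketch. The function name `preliminary_reliability` and the display string are illustrative; the formula is the one stated above, rearranged as 100 * (N_key - N_target) / N_key so that the worked example evaluates exactly in floating point.

```python
def preliminary_reliability(n_key, n_target):
    """Reliability in percent: (1 - n_target / n_key) * 100."""
    if n_key <= 0:
        raise ValueError("at least one key data field is required")
    if not 0 <= n_target <= n_key:
        raise ValueError("target fields must be a subset of key fields")
    # Algebraically identical to (1 - n_target / n_key) * 100.
    return 100 * (n_key - n_target) / n_key


score = preliminary_reliability(n_key=5, n_target=1)
display = f"reaction rate constant k = 0.05 s^-1 (Reliability rating: {score:.0f}%)"
print(display)
```

For the example of 5 key data fields with 1 target data field, the sketch yields the 80% rating quoted in the text.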
[0044] Through the above process, the technical solution of this application provides a transparent and traceable reliability certificate for the output of each deep learning model. Compared with the prior art, where users can only blindly accept a computational result of unknown origin, this application enables users to clearly understand the credibility of the result and quickly locate the source of potential data problems, thereby making a more informed judgment: whether to accept the result, conduct data verification, or even re-experiment.
[0045] Furthermore, the traceability identifier includes the original source of the data field, the access timestamp, the original numerical content, and the original acquisition status. The original acquisition status is used to identify the state of the data field when it is accessed, and the status includes the original complete state, the original missing state, the semantic placeholder state, or the non-explicit truncation state.
[0046] The detailed breakdown of the original acquired state greatly enhances the accuracy of source tracing and the targeted nature of subsequent processing. Specifically: The original, complete state refers to a data field that is a correctly formatted and valid value or string upon access, requiring no processing. For example, a temperature sensor records a value of 25.3.
[0047] The original missing state refers to a situation where the field is completely blank or does not exist in the original data source. For example, in a row of a spreadsheet, the cell where the sample source should be entered is empty.
[0048] The semantic placeholder status is a more subtle case. The field exists in form, but its content is a placeholder with no real meaning. This is very common when data is exported and integrated from different systems. For example, a database might automatically convert null values to the string NULL during export, or experimenters might habitually use -- or N/A to indicate that a value is not applicable or was not measured. Such fields might pass a simple NOT NULL check, but they are semantically invalid. Explicitly identifying this status prevents preprocessors from misinterpreting the placeholder as a valid string.
[0049] The non-explicit truncation status is primarily used for time series or other ordered data. For example, if a standard chromatographic analysis procedure is set to 15 minutes, there should in theory be 900 data points (one per second). If the system receives a data file and finds that its time series records only up to the 12th minute (720 points) before abruptly stopping, and the file has no explicit end marker, the system can mark the acquisition status of this time series as non-explicit truncation. This differs from explicitly missing data: it implies that the data is incomplete because of an unexpected interruption during transmission or storage, and that the data already received remains valid.
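The four acquisition states could be detected along the following lines. This Python sketch is only an assumption about how such checks might look: the placeholder token list contains just the examples mentioned above (NULL, --, N/A), and the function names and the `has_end_marker` flag are hypothetical.

```python
# Placeholder tokens taken from the examples in the text; a real deployment
# would presumably configure its own list.
PLACEHOLDER_TOKENS = {"null", "--", "n/a"}


def classify_scalar(raw):
    """Classify a single scalar field at access time."""
    if raw is None or (isinstance(raw, str) and raw.strip() == ""):
        return "original_missing"
    if isinstance(raw, str) and raw.strip().lower() in PLACEHOLDER_TOKENS:
        return "semantic_placeholder"
    return "original_complete"


def is_non_explicit_truncation(points, expected_points, has_end_marker):
    """True when a series stops short of its expected length without an end marker."""
    return len(points) < expected_points and not has_end_marker


# The 15-minute run sampled once per second, cut off after 12 minutes.
received = [0.0] * 720
truncated = is_non_explicit_truncation(received, expected_points=900,
                                       has_end_marker=False)
```

Note that `classify_scalar("N/A")` returns the placeholder state even though the value would pass a NOT NULL check, which is the distinction the description draws.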
[0050] Furthermore, the numerical completion operation includes completing data fields according to preset completion logic, wherein the preset completion logic includes completing by default value, completing by time series at equal intervals, or completing by default conditions of experimental items; the completion processing information includes the preset completion logic used and the timestamp of the numerical completion operation.
[0051] Some representative completion logics are described below to demonstrate the universality of the framework. Completion by default value is the simplest and most direct method. For example, in a chemical experiment, if the ambient humidity record is missing, the system can fill in a fixed default value, such as 45%, based on the laboratory's standard operating procedures. While this method allows the process to continue, it may introduce significant errors.
[0052] Completion by time series at equal intervals is suitable for ordered data. For example, in a dataset arranged by timestamps, if 0.5 seconds of consecutive data is lost due to signal interference, the system can use methods such as linear interpolation, polynomial interpolation, or spline interpolation to calculate and fill in the missing points from the data points before and after the gap, thus maintaining the continuity of the waveform. This approach is more reasonable than using a single default value.
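As an illustration of completion along a time series, the sketch below fills interior gaps by linear interpolation and builds the completion processing information (the logic used and a timestamp). The function names, the record layout, and the choice to leave gaps at the edges unfilled are assumptions made for this example only.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional, Sequence, Tuple


def fill_gaps_linear(times: Sequence[float],
                     values: Sequence[Optional[float]]
                     ) -> Tuple[List[Optional[float]], List[int]]:
    """Fill None entries by linear interpolation between the nearest known neighbours.

    Returns the filled series and the indices that were completed.
    Gaps at either edge have no neighbour on one side and are left as None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    filled_idx: List[int] = []
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None or right is None:
            continue
        frac = (times[i] - times[left]) / (times[right] - times[left])
        filled[i] = values[left] + frac * (values[right] - values[left])
        filled_idx.append(i)
    return filled, filled_idx


def completion_record(logic: str, filled_idx: List[int]) -> Dict[str, object]:
    """Completion processing information to store in the traceability identifier."""
    return {
        "logic": logic,
        "filled_indices": filled_idx,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```

Recording the filled indices alongside the logic is a design choice of this sketch: it lets a later step tell original samples apart from interpolated ones.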
[0053] Completing data based on default conditions for each experimental project is a more intelligent approach that integrates with business logic. For example, a large pharmaceutical company's R&D department might be working on multiple projects simultaneously, each using different solvents and catalysts. When a catalyst field is missing from an experimental dataset, the system can first read the associated experimental project number, then query a pre-defined project condition configuration table to find the corresponding standard catalyst model and use it to complete the data. This method leverages the experimental context, resulting in higher accuracy.
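The lookup described above might be sketched as follows. The configuration table, project numbers, and catalyst models are invented placeholders, and `complete_from_project` is a hypothetical helper rather than part of the application.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional, Tuple

# Hypothetical project condition configuration table.
PROJECT_CONDITIONS: Dict[str, Dict[str, str]] = {
    "PRJ-001": {"catalyst": "CAT-A"},
    "PRJ-002": {"catalyst": "CAT-B"},
}


def complete_from_project(record: Dict[str, Any], field: str,
                          table: Dict[str, Dict[str, str]]
                          ) -> Tuple[Dict[str, Any], Optional[Dict[str, str]]]:
    """Fill a missing field from the default conditions of the record's project.

    Returns the (possibly completed) record and the completion processing
    information, or None when nothing was completed.
    """
    if record.get(field) is not None:
        return record, None  # original value present, nothing to do
    default = table.get(record.get("project_id"), {}).get(field)
    if default is None:
        return record, None  # no configured default for this project and field
    completed = {**record, field: default}
    info = {
        "field": field,
        "logic": "project_default_condition",
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    return completed, info
```

The function returns a new record instead of mutating its input, so the original content stays available for the traceability identifier.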
[0054] The completion logic used is explicitly recorded in the completion processing information, providing crucial differentiating information for subsequent reliability assessments. Clearly, completion inferred from experimental context introduces less risk than completion with a blunt default value.
[0055] Furthermore, the step of generating evaluation parameters characterizing the reliability of the experimental processing results based on the target data fields includes: counting the proportion of target data fields among the multiple key data fields; and obtaining a confidence score by weighted calculation combining this quantity proportion with a preset weight coefficient, the confidence score being used as the evaluation parameter.
[0056] This provides a basic and quickly implemented method for calculating confidence scores. The core idea is that the more key data fields have been filled in by completion, the less reliable the result becomes. The specific calculation process is as follows: First, determine the total number of key data fields N_key and the number of target data fields N_target.
[0057] Secondly, calculate the proportion of the target data fields R = N_target / N_key.
[0058] Finally, the proportion R is converted into a confidence score S using a function. A simple linear function could be S = (1 - R * W) * 100, where W is a preset weighting coefficient, such as 1.2, that amplifies the impact of the proportion. If there are no target data fields, R = 0 and S = 100. If all key data fields have been completed, R = 1 and S = (1 - 1.2) * 100 = -20, which is clamped to a lower limit of 0. This method is simple and clear, providing a quick preliminary reliability assessment.
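The three steps above reduce to a few lines. The sketch below uses the example coefficient W = 1.2 and applies the lower limit of 0; the function name and the error handling for an empty key-field set are assumptions.

```python
def confidence_score(n_target: int, n_key: int, weight: float = 1.2) -> float:
    """Quantity-based confidence score S = (1 - R * W) * 100, floored at 0.

    n_target: number of key data fields that underwent numerical completion.
    n_key:    total number of key data fields.
    weight:   preset weighting coefficient W.
    """
    if n_key <= 0:
        raise ValueError("n_key must be positive")
    ratio = n_target / n_key  # R = N_target / N_key
    return max(0.0, (1.0 - ratio * weight) * 100.0)
```

For instance, `confidence_score(0, 5)` gives 100, and `confidence_score(5, 5)` evaluates to -20 before the floor and is therefore reported as 0.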
[0059] However, the simple calculation method based on quantity proportion has a significant limitation: it treats the influence of all key data fields equally. In reality, completing a field that has 90% influence on the result affects the reliability of the final result very differently from completing a field with only 1% influence. To address this issue, this application further proposes that, after obtaining a confidence score through weighted calculation combining the quantity proportion and the preset weighting coefficient as the evaluation parameter, the method includes: analyzing and determining the contribution weight of each target data field to the experimental processing result; and adjusting the evaluation parameter according to the corresponding contribution weight of each target data field.
[0060] The specific implementation of determining contribution weights can heavily rely on the aforementioned model interpretability techniques. For example, SHAP analysis can not only identify which fields are key fields but also provide a specific SHAP value for each key field. The absolute magnitude of this value directly reflects the extent to which the field contributes to the model output. The absolute SHAP values of all key fields can be normalized to obtain their respective contribution weights. For instance, in the calculation of the rate constant, the analysis shows that the contribution weight of the dilution factor is 0.4, the contribution weight of the reaction temperature is 0.35, and the combined contribution weight of the other absorbance values is 0.25.
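The normalization step can be illustrated without invoking any particular interpretability library. In the sketch below the per-field SHAP values are made-up inputs, chosen only so that the normalized weights reproduce the 0.4 / 0.35 / 0.25 split of the example; `contribution_weights` is a hypothetical helper.

```python
from typing import Dict


def contribution_weights(shap_values: Dict[str, float]) -> Dict[str, float]:
    """Normalize absolute per-field attribution values so that they sum to 1."""
    total = sum(abs(v) for v in shap_values.values())
    if total == 0:
        raise ValueError("all attribution values are zero; weights are undefined")
    return {name: abs(v) / total for name, v in shap_values.items()}


# Illustrative attribution values only (not taken from a real model run).
example = {
    "dilution_factor": 0.8,
    "reaction_temperature": -0.7,  # sign is discarded; only magnitude matters
    "absorbance_values": 0.5,
}
weights = contribution_weights(example)
```

Taking the absolute value reflects the statement that the magnitude of the attribution, not its direction, indicates how much a field contributes to the model output.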
[0061] After obtaining the contribution weights, the correction of the evaluation parameters can no longer be based on simple quantity, but on the accumulation of weights. For example, the corrected evaluation parameter can be calculated as follows: S_corrected = (1 - Sum(W_i)) * 100, where Sum(W_i) is the sum of the contribution weights of all target data fields (i.e., the key fields to be completed). In the example above, there is only one target data field, the dilution factor, with a contribution weight of 0.4. Therefore, the corrected evaluation parameter is (1 - 0.4) * 100 = 60 points. This score is much lower than the 80 points calculated based on quantity, because it accurately reflects that what was completed was precisely a very important field. This correction based on contribution weights allows the final evaluation parameter to more realistically and accurately reveal the actual impact of the numerical completion operation on the experimental results.
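A minimal sketch of this correction follows, assuming the contribution weights have already been normalized so that they sum to at most 1. The function name is hypothetical and the floor at 0 is a defensive assumption.

```python
from typing import Iterable


def corrected_score(completed_weights: Iterable[float]) -> float:
    """S_corrected = (1 - Sum(W_i)) * 100 over the completed key fields."""
    return max(0.0, (1.0 - sum(completed_weights)) * 100.0)
```

With only the dilution factor completed, `corrected_score([0.4])` is approximately 60, matching the worked example; with no completed fields the score stays at 100.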
[0062] While adjusting evaluation parameters based solely on the contribution weight of the completed fields improves accuracy, it still doesn't fully account for the inherent risk differences between different completion methods. For example, data completed using context interpolation is inherently more reliable than data completed using a generic default value. To incorporate the risk level of completion behavior and establish a more robust reliability firewall, this application further proposes that the step of weighted adjustment of the evaluation parameters based on the contribution weights of each target data field includes: obtaining the traceability identifier corresponding to the target data field, and extracting the completion processing information from the traceability identifier; based on the completion processing information, determining the preset completion logic used for the target data field and obtaining the risk weight corresponding to that preset completion logic; and, if the contribution weight exceeds the preset weight threshold and the risk weight exceeds the preset risk threshold, executing a risk circuit breaker operation that forcibly adjusts the evaluation parameters below the preset warning line and generates a directional warning message; otherwise, weighting and correcting the evaluation parameters based on the contribution weight and the risk weight.
[0063] First, the system pre-configures a mapping table of risk weights for the completion logics. The risk weight is a value between 0 and 1; a higher value indicates greater uncertainty introduced by the completion method. For example:
- Completion by default value: risk weight 0.9 (high risk)
- Completion by time series at equal intervals (linear interpolation): risk weight 0.4 (medium risk)
- Completion by default conditions of the experimental item: risk weight 0.2 (low risk)
Secondly, the system sets a contribution weight threshold (e.g., 0.3) and a risk weight threshold (e.g., 0.8). These two thresholds define the conditions for triggering a risk circuit breaker.
[0064] Continuing with the rate constant calculation scenario described earlier, the target data field is the dilution factor, its contribution weight is 0.4, and it was completed by default value, which corresponds to a risk weight of 0.9. The system check reveals that the contribution weight of 0.4 exceeds the threshold of 0.3, and the risk weight of 0.9 also exceeds the threshold of 0.8. At this point, the risk circuit breaker mechanism is triggered. The system no longer performs the conventional weighted correction calculation but directly executes the risk circuit breaker operation: forcibly adjusting the evaluation parameter, i.e., the confidence score, to below a preset warning line (e.g., 30 points), for example, setting it directly to 10 points. Simultaneously, the system generates a clear and directional warning message: "Severe Warning: The experimental results are highly dependent on the key parameter dilution factor, which is completed using a high-risk default value, resulting in extremely low reliability!" This mechanism ensures that when a crucial data point is filled in by a highly unreliable method, the system can decisively flag the entire result as unreliable, preventing users from being misled by a value that looks reasonable but has a weak foundation.
[0065] If the circuit breaker is not triggered, for example because the dilution factor is filled using the default conditions of the experimental item (risk weight 0.2, which does not exceed the risk threshold of 0.8), the system performs a more refined weighted adjustment. In this case, the calculation of the evaluation parameters considers both contribution weight and risk weight. For example, a score related to the product of the two is deducted from the base score of 100: Deduction score = Contribution weight * Risk weight * Penalty coefficient. In this example, with a penalty coefficient of 100, the deduction score = 0.4 * 0.2 * 100 = 8 points, and the final evaluation parameter is 92 points.
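The circuit breaker and the fallback deduction can be sketched for a single target data field as below. The thresholds, risk weights, breaker score, and penalty coefficient are the example values from the text; the function name, the logic keys, and the handling of only one field at a time are assumptions of this sketch.

```python
from typing import Optional, Tuple

# Example risk weights per completion logic (keys are hypothetical identifiers).
RISK_WEIGHTS = {
    "default_value": 0.9,
    "time_series_equal_interval": 0.4,
    "project_default_condition": 0.2,
}

WEIGHT_THRESHOLD = 0.3    # contribution weight threshold
RISK_THRESHOLD = 0.8      # risk weight threshold
WARNING_LINE = 30.0       # preset warning line
BREAKER_SCORE = 10.0      # forced score, chosen below the warning line
PENALTY_COEFFICIENT = 100.0


def evaluate_field(field: str, contribution: float,
                   logic: str) -> Tuple[float, Optional[str]]:
    """Return (evaluation parameter, warning message or None) for one completed field."""
    risk = RISK_WEIGHTS[logic]
    if contribution > WEIGHT_THRESHOLD and risk > RISK_THRESHOLD:
        # Risk circuit breaker: skip the weighted correction entirely.
        warning = (f"Severe Warning: the result depends heavily on '{field}', "
                   f"which was completed using high-risk logic '{logic}'.")
        return BREAKER_SCORE, warning
    deduction = contribution * risk * PENALTY_COEFFICIENT
    return 100.0 - deduction, None
```

For the dilution factor with contribution weight 0.4, default-value completion trips the breaker and yields 10 points with a warning, whereas project-condition completion yields roughly 92 points and no warning.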
[0066] So far, the evaluation system mainly focuses on the integrity of the data itself and the processing procedures. However, the quality of experimental data depends not only on data recording but also profoundly on the condition of the physical carrier of the experiment, namely the glassware itself. A volumetric flask that has undergone multiple high-temperature and high-pressure sterilizations or long-term storage may have experienced slight changes in its nominal volume. This kind of physical deviation cannot be detected by pure data analysis. Therefore, after generating evaluation parameters characterizing the reliability of experimental processing results based on each target data field, the following steps are also included: obtaining the physical carrier traceability information corresponding to the original data, where the physical carrier traceability information includes the heat treatment record and storage time of the glass instrument before this experiment; determining, based on the physical carrier traceability information and combined with the preset physical constraint model, the physical deformation parameters of the glass instrument at the time of the experiment; and generating the carrier efficiency coefficient based on the physical deformation parameters and correcting the evaluation parameters using the carrier efficiency coefficient.
[0067] The first step is to obtain the physical carrier traceability information. For each critical glass instrument, such as a 100 mL Class A volumetric flask used to prepare standard solutions, the system records its unique ID and links it to its history. This history is the physical carrier traceability information, which may include: {Instrument ID: VF-A-100-034, Last Calibration Date: 2023-01-15, Heat Treatment Record: [{Date: 2023-10-20, Method: Autoclaving, Temperature: 121°C, Duration: 20 minutes},...], Total Uses: 152, Storage Duration: 8 months}.
[0068] The second step involves calculating the physical deformation parameters using a physical constraint model. The system incorporates a physical constraint model based on materials science and engineering thermodynamics. This model takes the aforementioned traceability information as input and outputs the physical state deviation of the instrument at the current experimental moment. For example, based on the thermal expansion coefficient of borosilicate glass, the empirical formula for permanent deformation caused by stress release after multiple thermal cycles, and the volume shrinkage effect that may occur during long-term storage, the model calculates that the actual volume of the volumetric flask with ID VF-A-100-034 at the current experimental temperature of 25°C is not 100.00 mL, but rather 100.06 mL. Here, +0.06 mL or +0.06% represents the physical deformation parameter.
[0069] The third step involves generating a carrier efficiency coefficient and correcting the evaluation parameters. The physical deformation parameter is converted into a carrier efficiency coefficient, which reflects the impact of instrument condition on data accuracy. A simple conversion method is: Carrier efficiency coefficient = 1 - |relative deformation|. In this example, the coefficient = 1 - 0.0006 = 0.9994. Finally, this coefficient is used to correct the previously calculated evaluation parameters. For example, if the evaluation parameter calculated based on data integrity was 92 points, after physical carrier correction, the final parameter becomes 92 * 0.9994 ≈ 91.94 points. Although the magnitude of a single correction may be small, it systematically incorporates physical factors such as instrument aging and wear into the reliability assessment framework, making the assessment system more comprehensive and rigorous.
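The conversion and correction in this step amount to the following arithmetic. The sketch takes the nominal and actual volumes as given inputs; it does not attempt to model the physical constraint model that produces the actual volume, and the function names are hypothetical.

```python
def carrier_efficiency(nominal_ml: float, actual_ml: float) -> float:
    """Carrier efficiency coefficient = 1 - |relative deformation|."""
    relative_deformation = (actual_ml - nominal_ml) / nominal_ml
    return 1.0 - abs(relative_deformation)


def apply_carrier_correction(score: float, coefficient: float) -> float:
    """Scale the evaluation parameter by the carrier efficiency coefficient."""
    return score * coefficient


coefficient = carrier_efficiency(100.00, 100.06)        # about 0.9994
final_score = apply_carrier_correction(92.0, coefficient)  # about 91.94
```

Because the absolute value is used, a flask that has shrunk by 0.06% is penalized the same as one that has expanded by 0.06%.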
[0070] In addition to known instrument history, instruments may also develop unrecorded, sudden physical defects, such as minute scratches or cracks that are difficult to detect with the naked eye during cleaning or use. These defects can also affect data quality. To address this, this application proposes a mechanism for inferring instrument health status from data: After the step of correcting the evaluation parameters using the carrier efficiency coefficient, the following steps are also included: Compare the data fluctuation characteristics of the original data within the historical experimental period; If the data fluctuation characteristics of the original data match the preset physical defect characteristics of the glass instrument, then a health warning mark is added to the corresponding traceability identifier; otherwise, no health warning mark is added.
[0071] First, the system needs to establish a knowledge base of physical defect data characteristics. This knowledge base is obtained through extensive experiments or simulations, recording typical data anomaly patterns corresponding to different physical defects. For example: Defect: Fine scratches appear on the inner wall of the spectrophotometer cuvette. Characteristic: The baseline noise (standard deviation) of the measurement signal is significantly increased, for example, from the normal 0.0001 AU to more than 0.0005 AU.
[0072] Defect: The burette stopcock seal deteriorates, resulting in slow leakage. Characteristic: During the plateau phase of the titration curve, the reading is not completely horizontal, but rather exhibits a small, continuously decreasing slope.
[0073] When processing the experimental data, the system extracts the actual data fluctuation characteristics. For example, the system calculates the baseline noise standard deviation of this absorbance measurement to be 0.0006 AU.
[0074] The system then matches this actual feature with preset physical defect features in the knowledge base. It finds that 0.0006 AU > 0.0005 AU, which matches the characteristics of scratches on the inner wall of the cuvette.
[0075] Once a match is successful, the system will not directly modify the data or evaluation parameters. Instead, it will add a health warning marker to the traceability identifier of all data fields involved in this experiment. For example, the marker content might be: Health Warning Marker: Warning: Abnormal baseline noise in the data, suspected scratches on the inner wall of the cuvette used, which may affect measurement accuracy.
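The matching and marking steps for the cuvette example can be sketched as follows. Only the baseline-noise rule is modeled; the threshold is the example value of 0.0005 AU, and the function names and the dictionary layout of a traceability identifier are assumptions.

```python
import statistics
from typing import Dict, List, Optional, Sequence

# Example knowledge-base entry: baseline noise above this value suggests scratches.
SCRATCH_NOISE_THRESHOLD_AU = 0.0005


def baseline_noise(baseline_samples: Sequence[float]) -> float:
    """Sample standard deviation of the baseline segment (needs at least 2 samples)."""
    return statistics.stdev(baseline_samples)


def match_defect(noise_au: float) -> Optional[str]:
    """Return a health warning text if the noise matches the scratch characteristic."""
    if noise_au > SCRATCH_NOISE_THRESHOLD_AU:
        return ("Warning: abnormal baseline noise in the data, suspected scratches "
                "on the inner wall of the cuvette used.")
    return None


def add_health_warning(trace_ids: List[Dict[str, object]],
                       warning: Optional[str]) -> None:
    """Attach the warning to every traceability identifier; data and scores are untouched."""
    if warning is None:
        return
    for trace_id in trace_ids:
        trace_id["health_warning"] = warning
```

A measured noise of 0.0006 AU produces a warning that is attached to each identifier, while 0.0001 AU leaves the identifiers unchanged.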
[0076] After this series of analyses, calculations, and labelings, the system ultimately obtains an evaluation parameter that integrates multi-dimensional information such as data integrity, processing risks, instrument physical condition, and potential defects, together with a comprehensive traceability identifier. The final step is to present this information in a user-friendly and practically instructive manner. After associating and outputting the experimental processing results and evaluation parameters, the method also includes: when the evaluation parameters are lower than the preset threshold, outputting the multiple key data fields and the corresponding traceability identifiers, including the completion processing information and health warning markers, for manual review and data traceability.
[0077] This step is the final step in the entire technical solution, transforming the complex analysis results in the background into decision support information that front-end users can understand and act upon.
[0078] Suppose the system's final evaluation parameter output threshold is set to 70 points. In a certain experiment, because the critical sample concentration field was completed based on the default value (triggering a double penalty of contribution weight and risk weight), and data analysis showed abnormal baseline noise, the final evaluation parameter was calculated to be 58 points.
[0079] Because a score of 58 is below the threshold of 70, the system triggers a special detailed report mode when displaying the experimental results to the user. The user interface may display: Experimental Result: Pass (Evaluation Parameter: 58 points, low confidence, review recommended). When the user clicks to view details, the system will pop up a traceability report. Through this detailed report, the experimenter is no longer faced with a cold, unreliable number, but rather a complete diagnosis. They can immediately understand that this "pass" conclusion is based on a default, pre-filled concentration value and a potentially faulty instrument, making it extremely unreliable. This report directly points to the root cause of the problem, guiding them to verify the original sample information and check the cuvettes used, thereby enabling effective traceability and correction, and preventing incorrect research or production decisions based on erroneous data.
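The output rule described above can be sketched as a function that attaches a traceability report only when the evaluation parameter falls below the threshold. The field layout, key names, and `build_output` are hypothetical; the threshold of 70 is the example value.

```python
from typing import Any, Dict, List


def build_output(result: str, score: float,
                 key_fields: List[Dict[str, Any]],
                 threshold: float = 70.0) -> Dict[str, Any]:
    """Associate the result with its evaluation parameter; add details when below threshold."""
    output: Dict[str, Any] = {"result": result, "evaluation_parameter": score}
    if score < threshold:
        output["review_recommended"] = True
        output["traceability_report"] = [
            {
                "field": field["name"],
                "completion": field["traceability"].get("completion"),
                "health_warning": field["traceability"].get("health_warning"),
            }
            for field in key_fields
        ]
    return output


# Illustrative key field whose traceability identifier carries both kinds of annotation.
fields = [{
    "name": "sample_concentration",
    "traceability": {
        "completion": {"logic": "default_value"},
        "health_warning": "abnormal baseline noise",
    },
}]
report = build_output("Pass", 58.0, fields)
```

Here the score of 58 is below 70, so the report lists the completed concentration field and its health warning for manual review; a score at or above the threshold would return only the result and the evaluation parameter.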
[0080] Secondly, referring to Figure 2, this application also provides a deep learning-based experimental data processing system for glass instruments. This system is the entity that carries out the aforementioned data processing, analysis, and evaluation functions. The system includes:
- The data acquisition and configuration module 210, used to access the raw data of glass instrument experiments and configure traceability identifiers for each data field in the raw data.
- The preprocessing and recording module 220, used to preprocess the raw data and record the completion processing information in the corresponding traceability identifier when a numerical completion operation occurs in the data field.
- The model processing module 230, used to input the preprocessed raw data and the corresponding traceability identifier into the deep learning data processing model and obtain the experimental processing results output by the deep learning data processing model.
- The field recognition module 240, used to identify multiple key data fields generated from the experimental processing results, and to determine the target data field that has undergone a numerical completion operation among the multiple key data fields based on the traceability identifier corresponding to each key data field.
- The evaluation parameter generation and output module 250, used to generate evaluation parameters characterizing the reliability of the experimental processing results based on each target data field, and to associate and output the experimental processing results and evaluation parameters.
- The weighted correction module, used to calculate the confidence score by combining the quantity proportion and the preset weight coefficient as the evaluation parameter, to determine through analysis the contribution weight of each target data field to the experimental processing results, and to adjust the evaluation parameters by weighting according to the contribution weight of each target data field.
After the evaluation parameters characterizing the reliability of the experimental processing results have been generated based on each target data field, the system also obtains the physical carrier traceability information corresponding to the original data, where the physical carrier traceability information includes the heat treatment record and storage time of the glass instrument before this experiment; determines, based on the physical carrier traceability information and combined with the preset physical constraint model, the physical deformation parameters of the glass instrument at the time of the experiment; and generates the carrier efficiency coefficient based on the physical deformation parameters and corrects the evaluation parameters using the carrier efficiency coefficient.
[0081] This technical solution provides a physical system capable of implementing the aforementioned method, mapping each step in the method to specific functional modules, providing clear architectural support for the practical application of this technology, and demonstrating strong practicality.
[0082] The above description is merely an embodiment of this application and is not intended to limit the scope of protection of this application. Various modifications and variations can be made to this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the scope of protection of this application.
Claims
1. A method for processing experimental data of glass instruments based on deep learning, characterized in that, The method includes: Access the raw data from glass instrument experiments and configure traceability identifiers for each data field in the raw data; The raw data is preprocessed, and when a numerical completion operation occurs in the data field, the completion processing information is recorded in the corresponding traceability identifier; The preprocessed raw data and the corresponding traceability identifier are input into the deep learning data processing model to obtain the experimental processing results output by the deep learning data processing model; Identify multiple key data fields generated from the experimental processing results, and determine the target data field that has undergone the numerical completion operation among the multiple key data fields based on the traceability identifier corresponding to each key data field; Evaluation parameters characterizing the reliability of the experimental processing results are generated based on each of the target data fields, and the experimental processing results and the evaluation parameters are associated and output.
2. The method for processing experimental data of glass instruments based on deep learning according to claim 1, characterized in that, The traceability identifier includes the original source of the data field, the access timestamp, the original numerical content, and the original acquisition status. The original acquisition status is used to identify the state of the data field when it is accessed, and the status includes the original complete state, the original missing state, the semantic placeholder state, or the non-explicit truncation state.
3. The method for processing experimental data of glass instruments based on deep learning according to claim 1, characterized in that, The numerical completion operation includes completing the data field according to a preset completion logic, wherein the preset completion logic includes completing according to the default value, completing according to the time series at equal intervals, or completing according to the default conditions of the experimental item. The completion processing information includes the preset completion logic used and the timestamp of the numerical completion operation.
4. The method for processing experimental data of glass instruments based on deep learning according to claim 1, characterized in that, The step of generating evaluation parameters characterizing the reliability of the experimental processing results based on the target data fields includes: Calculate the percentage of the target data field among the multiple key data fields; The confidence score is obtained by weighting the quantity proportion and the preset weight coefficient, and is used as the evaluation parameter.
5. The method for processing experimental data of glass instruments based on deep learning according to claim 4, characterized in that, After the step of calculating the confidence score by combining the quantity proportion and the preset weighting coefficient, and using it as the evaluation parameter, the following steps are included: The contribution weight of each target data field to the experimental processing result is determined through analysis; The evaluation parameters are weighted and adjusted according to the contribution weights corresponding to each of the target data fields.
6. The method for processing experimental data of glass instruments based on deep learning according to claim 5, characterized in that, The step of weighting and correcting the evaluation parameters according to the contribution weights corresponding to each of the target data fields includes: Obtain the traceability identifier corresponding to the target data field, and extract the completion processing information from the traceability identifier; Based on the completion processing information, determine the preset completion logic used by the target data field, and obtain the risk weight corresponding to the preset completion logic; If the contribution weight exceeds a preset weight threshold and the risk weight exceeds a preset risk threshold, a risk circuit breaker operation is executed to forcibly adjust the evaluation parameters below a preset warning line and generate a directional warning message; otherwise, the evaluation parameters are weighted and corrected based on the contribution weight and the risk weight.
7. The method for processing experimental data of glass instruments based on deep learning according to claim 1, characterized in that, After the step of generating evaluation parameters characterizing the reliability of the experimental processing results based on each of the target data fields, the method further includes: Obtain the physical carrier traceability information corresponding to the original data. The physical carrier traceability information includes the heat treatment record and storage duration of the glass instrument before this experiment. Based on the physical carrier traceability information and combined with the preset physical constraint model, the physical deformation parameters of the glass instrument at the time of the experiment are determined. The carrier efficiency coefficient is generated based on the physical deformation parameters, and the evaluation parameters are corrected using the carrier efficiency coefficient.
8. The method for processing experimental data of glass instruments based on deep learning according to claim 7, characterized in that, After the step of correcting the evaluation parameters using the carrier efficiency coefficient, the method further includes: Compare the data fluctuation characteristics of the original data within the historical experimental period; If the data fluctuation characteristics of the original data match the preset physical defect characteristics of the glass instrument, a health warning mark is added to the corresponding traceability identifier; otherwise, the health warning mark is not added.
9. The method for processing experimental data of glass instruments based on deep learning according to claim 8, characterized in that, After associating and outputting the experimental processing results and the evaluation parameters, the method further includes: When the evaluation parameter is lower than the preset threshold, multiple key data fields and the corresponding traceability identifier, including the completion processing information and the health warning mark, are output for manual review and data tracing.
10. A deep learning-based experimental data processing system for glass instruments, used to execute the deep learning-based experimental data processing method for glass instruments as described in any one of claims 1 to 9, characterized in that, The system includes: The data acquisition and configuration module is used to access the raw data of glass instrument experiments and configure traceability identifiers for each data field in the raw data; The preprocessing and recording module is used to preprocess the original data and record the completion processing information in the corresponding traceability identifier when a numerical completion operation occurs in the data field. The model processing module is used to input the preprocessed raw data and the corresponding traceability identifier into the deep learning data processing model, and obtain the experimental processing results output by the deep learning data processing model. The field identification module is used to identify multiple key data fields generated from the experimental processing results, and to determine the target data field that has undergone the numerical completion operation among the multiple key data fields based on the traceability identifier corresponding to each key data field. The evaluation parameter generation and output module is used to generate evaluation parameters characterizing the reliability of the experimental processing results based on each of the target data fields, and to associate and output the experimental processing results and the evaluation parameters.