Method and system based on conditional semantic constraints and unit / dimensional gates

By combining knowledge graphs and OWL DL inference engines, a dimensional equivalence graph is constructed and the unit conversion path is optimized, solving the deep reasoning problem of unit and dimension checking in mass spectrometry data processing, and realizing efficient and accurate verification of mass spectrometry data quality control.

CN122242748A · Pending · Publication Date: 2026-06-19 · NATIONAL INSTITUTE OF METROLOGY CHINA

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
NATIONAL INSTITUTE OF METROLOGY CHINA
Filing Date
2026-03-19
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

In current mass spectrometry data processing, the lack of in-depth reasoning ability in unit and dimension checks leads to high data quality control costs and difficulty in continuous evolution. Static rules cannot effectively handle new instruments or special experimental modes, resulting in insufficient accuracy and dynamism in data quality control.

Method used

The mass spectrometry domain ontology is incrementally updated by a knowledge graph completion algorithm, a dimensional equivalence graph is constructed, and the unit conversion path is optimized. Hierarchical dimensional reasoning is performed with an OWL DL inference engine. The numerical distribution is learned by a kernel density estimation algorithm to generate dynamic numerical range constraints, and multi-field joint constraints are constructed for verification.

Benefits of technology

It improves the scalability and accuracy of mass spectrometry data verification, reduces the accuracy loss of single-step conversion, enhances the physical consistency verification of mass spectrometry data, and forms an automated metadata verification link.


Figure CN122242748A_ABST

Abstract

This invention discloses a method and system based on conditional semantic constraints and unit / dimensional gates, comprising: acquiring logical metadata fields of multi-source mass spectrometry data; incrementally updating a preset mass spectrometry domain ontology based on the logical metadata fields using a knowledge graph completion algorithm; expanding a QUDT dimensional library according to the updated fields and update relationships; constructing a dimensional equivalence graph based on derived SHACL constraint shapes; performing hierarchical dimensional reasoning on the dimensional types of the normalized logical metadata fields based on an OWL DL inference engine; learning historical data distribution based on the compatibility results of the reasoning using a kernel density estimation algorithm to generate dynamic numerical range constraints; and verifying the normalized values based on multi-field joint constraints and dynamic numerical range constraints. The method utilizes incremental ontology updates, hierarchical dimensional reasoning, unit conversion, and associated field constraints to improve the accuracy, adaptability, and scalability of mass spectrometry metadata verification, while also exhibiting good interpretability.

Description

Technical Field

[0001] This invention relates to the field of mass spectrometry data analysis technology, and in particular to a method and system based on conditional semantic constraints and unit / dimensional gates.

Background Technology

[0002] With the widespread application of mass spectrometry in life sciences, environmental monitoring, food safety, and pharmaceutical research and development, the accuracy, completeness, and reliability of mass spectrometry data have become key factors in the credibility of analytical results. The metadata generated during mass spectrometry detection covers multi-source heterogeneous data such as instrument parameters, sample information, and environmental conditions, which differ significantly in format specifications, unit expressions, and dimensional definitions. Data processing must cope with complex, changing experimental scenarios and their related logical fields, which steadily increases the complexity of data parsing and conversion. During conversion, the units, dimensions, and specific values of the original data must be verified accurately. For multi-source data, metadata verification relies mainly on predefined fixed constraint rules or manually written verification scripts. Static rules often fail to handle new instruments, special experimental methods, or rare anomalies, which leads to high maintenance costs and makes it difficult for data quality control to evolve continuously. Some studies have used domain ontologies to model metadata semantics uniformly and have implemented automated verification, unit conversion, and consistency checks with shape constraint languages such as SHACL. However, unit and dimension checks remain at the level of simple conversion: they lack deep reasoning over the hierarchical structure of dimension compatibility, and they lack gated verification of units, dimensions, and values at different levels. Numerical range verification also depends mainly on limits set from manual experience.

Therefore, how to improve the accuracy and dynamism of mass spectrometry data quality control has become an urgent problem to be solved.

Summary of the Invention

[0003] The purpose of this invention is to provide a method and system based on conditional semantic constraints and unit / dimensional gates.

[0004] To achieve the above objectives, the present invention is implemented according to the following technical solution: The first aspect of this invention provides a method based on conditional semantic constraints and unit / dimensional gates, comprising:

S1. Obtain logical metadata fields from multi-source mass spectrometry data, incrementally update the preset mass spectrometry domain ontology based on the logical metadata fields using a knowledge graph completion algorithm, and expand the QUDT dimensional library according to the updated fields and update relationships;

S2. Derive SHACL constraint shapes based on the updated mass spectrometry domain ontology, generate mass spectrometry-specific QUDT dimensional information corresponding to the logical metadata fields, and construct a dimensional equivalence graph based on the mass spectrometry-specific QUDT dimensional information;

S3. Extract the original unit from the logical metadata field, solve the optimal conversion path from the original unit to the desired unit in the SHACL constraint shape by the unit conversion path optimization algorithm based on the dimensional equivalence graph, and normalize the unit in the logical metadata field to the desired unit;

S4. Perform hierarchical reasoning on the dimension type of the normalized logical metadata field through the OWL DL inference engine to obtain the compatibility result between the dimension type and the expected dimension type in the SHACL constraint shape;

S5. Based on historical mass spectrometry data and the compatibility results, learn the normal numerical distribution of each logical metadata field through the kernel density estimation algorithm, generate dynamic numerical range constraints, and update them to the SHACL constraint shape;

S6. Construct a multi-field joint constraint by combining the retention time and peak shape characteristics of the logical metadata fields, determine whether the normalized value is within the dynamic numerical range constraint and satisfies the joint constraint, obtain the verification result, and record the SHACL constraint version and reasoning basis in the knowledge graph.

[0005] Furthermore, the method for obtaining the update field and update relationship includes: Obtain the logical metadata fields of multi-source mass spectrometry data. Based on the logical metadata fields, extract entities and relations according to the preset mass spectrometry domain ontology to obtain a candidate set of triples to be completed. Input the candidate set of triples into the RotatE model. Based on the RotatE model, calculate the completion score of the candidate triples through relation rotation operations in the complex vector space and in combination with the mass spectrometry domain feature weight matrix. Filter out invalid triples with scores lower than the preset threshold to obtain the entities and relations to be added. If a new entity or relation is detected, the new entity is classified into the corresponding concept level of the preset mass spectrometry domain ontology, the new relation is mapped to the object attribute in the ontology, and the axiom constraints and hierarchical structure of the ontology are updated. Based on the updated ontology, a mass spectrometry-specific dimensional subclass is defined on the basis of the mass spectrometry QUDT dimensional library according to the new fields and new relations, and a bidirectional mapping relationship between the new unit and the extended dimension is constructed. If the correlation degree of the bidirectional mapping objects is greater than or equal to 85%, the dimension library expansion is completed.
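The scoring and filtering step described above can be illustrated with a minimal sketch of the RotatE distance, in which each relation is a unit-modulus rotation in complex space and a triple is plausible when the rotated head lies close to the tail. The mapping from distance to a score in (0, 1], the function names, and the toy embeddings are assumptions made for illustration; the mass spectrometry domain feature weight matrix is left out because the text does not state how it enters the score.

```python
import cmath
import math

def rotate_distance(head, relation_phases, tail):
    """RotatE distance: sum of element-wise moduli of (head * rotation - tail)."""
    total = 0.0
    for h, phase, t in zip(head, relation_phases, tail):
        rotation = cmath.exp(1j * phase)  # unit-modulus complex number
        total += abs(h * rotation - t)
    return total

def completion_score(head, relation_phases, tail):
    """Assumed mapping of the distance to a score in (0, 1]; 1 means a perfect fit."""
    return 1.0 / (1.0 + rotate_distance(head, relation_phases, tail))

def filter_triples(candidates, threshold=0.8):
    """Keep candidate triples whose score reaches the preset threshold."""
    kept = []
    for name, head, phases, tail in candidates:
        score = completion_score(head, phases, tail)
        if score >= threshold:
            kept.append((name, score))
    return kept

# Toy two-dimensional embeddings (hypothetical values).
head = [1 + 0j, 0 + 1j]
phases = [math.pi / 2, math.pi]
good_tail = [0 + 1j, 0 - 1j]   # equals head rotated by the relation
bad_tail = [1 + 0j, 1 + 0j]    # far from the rotated head

candidates = [
    ("plausible", head, phases, good_tail),
    ("implausible", head, phases, bad_tail),
]
kept = filter_triples(candidates)
print(kept)
```

Only the triple whose tail matches the rotated head survives the threshold; a trained model would supply the embeddings and phases instead of the hand-picked values used here.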

[0006] Furthermore, the method for obtaining the dimensional equivalence diagram includes: Based on the updated mass spectrometry ontology, extract the conditional constraints of the logical metadata fields corresponding to the classes, generate SHACL NodeShapes for the logical metadata fields according to the conditional constraints, set the target class according to the concept class of the corresponding field in the ontology based on the SHACL NodeShapes, add PropertyShape constraints according to the field attributes, and supplement the corresponding SHACL NodeShapes with mass spectrometry-specific constraints. The conditional constraints include the expected unit, expected dimension type, data type and numerical range, and the mass spectrometry-specific constraints include positive real numbers, non-negative peak intensity and retention time matching experimental intervals. Based on the concentration, mass-to-charge ratio, and relative abundance of the logical metadata fields, the dimensional information is obtained by matching the exclusive dimensional subclass with the mass spectrometry-specific QUDT dimensional library. The dimensional information is then associated with the expected unit in the SHACL constraint. If the dimensional information is inconsistent with the ontology field definition, it is corrected through a two-way mapping relationship between units and dimensions. The dimensional information includes dimensional type, unit symbol, and dimensional exponent. Using the units in the mass spectrometry-specific QUDT dimensional library as nodes, add edges to each pair of units with a transformation relationship, assign compatibility weights to the edges based on the dimensional hierarchy, mark unit pairs with incompatible dimensions, remove invalid edges, and obtain a dimensional equivalence graph. 
If the dimensional exponents of the transformation paths in the dimensional equivalence graph are inconsistent, then correct the transformation coefficients or reconstruct the corresponding edges. The units include the original units, the expected units, and the intermediate conversion units. The edge attributes include conversion coefficients, conversion formulas, and dimensional compatibility relationships. The dimensional compatibility relationships include direct compatibility and subclass compatibility. The dimensional compatibility degree is inversely proportional to the edge weight.
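As a rough illustration of the graph construction just described, the sketch below stores units as nodes, adds an edge for each convertible pair, and rejects pairs whose dimensional exponents differ. The exponent tuples, the `scale` constant that turns compatibility into an edge weight, and all names are hypothetical choices for this example, not values taken from the QUDT library.

```python
# Dimensional exponents as (mass, length, time); illustrative values only.
UNIT_DIMENSIONS = {
    "ug/L": (1, -3, 0),
    "mg/L": (1, -3, 0),
    "g/L": (1, -3, 0),
    "min": (0, 0, 1),
}

def build_equivalence_graph(unit_dims, conversions, scale=0.05):
    """Build an adjacency map of convertible units.

    conversions: iterable of (src, dst, factor, compatibility), where
    value_in_dst = value_in_src * factor and compatibility is in (0, 1].
    The edge weight is taken as scale / compatibility, so a higher
    compatibility yields a lower weight (assumed form).
    """
    graph = {unit: {} for unit in unit_dims}
    rejected = []
    for src, dst, factor, compatibility in conversions:
        if unit_dims[src] != unit_dims[dst]:
            rejected.append((src, dst))  # incompatible dimensions: no edge
            continue
        weight = scale / compatibility
        graph[src][dst] = {"factor": factor, "weight": weight}
        graph[dst][src] = {"factor": 1.0 / factor, "weight": weight}
    return graph, rejected

conversions = [
    ("ug/L", "mg/L", 1e-3, 1.0),
    ("mg/L", "g/L", 1e-3, 1.0),
    ("mg/L", "min", 1.0, 1.0),  # deliberately invalid pair
]
graph, rejected = build_equivalence_graph(UNIT_DIMENSIONS, conversions)
print(sorted(graph["mg/L"]), rejected)
```

Storing the reverse edge with the reciprocal factor keeps the graph usable in both directions; the subclass-compatibility case would simply pass a compatibility below 1 and receive a larger weight.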

[0007] Furthermore, the method for obtaining the normalized logical metadata field includes: The original unit identifier is extracted from the metadata attributes of the logical metadata field. Based on the original unit identifier, the dimensional equivalence graph is queried to determine whether the corresponding unit is a valid node. If the corresponding unit does not exist, the unit mapping is corrected through the bidirectional mapping relationship between units and dimensions, and the corrected unit is taken as the valid original unit. With the valid original unit as the starting point and the desired unit in the SHACL constraint as the end point, the path search space, path length normalization coefficient, single-step conversion error coefficient, initial value of the confidence decay factor, and weight coefficients of the dimensional equivalence graph are initialized. The dimensional equivalence graph is then traversed using the unit conversion path optimization algorithm, the total cost of each feasible path is calculated, and the path with the minimum total cost is taken as the optimal conversion path from the original unit to the desired unit. The total cost of a path (the formula itself is missing from this text) is defined in terms of three weighted quantities: the path length weight applies to the ratio of the number of hops along the path to the maximum allowed number of hops; the error weight applies to the conversion error coefficients of the successive steps over the nodes of the path; and the confidence weight applies to the overall confidence of the path, which is derived from the percentage of data confidence retained after each conversion step.

The numerical conversion coefficients and conversion formulas are obtained according to the optimal conversion path, and the original values of the logical metadata fields are converted accordingly. If the conversion involves concentration or solubility, the experimental temperature and pressure parameters are extracted from the context information of the logical metadata fields, and the conversion results are corrected according to these parameters through the temperature-pressure coupling correction formula to obtain the normalized values and units.
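The path search can be sketched as follows. Because the cost formula is not legible in this text, the sketch assumes one simple form: the hop ratio, the sum of per-step error coefficients, and one minus the product of per-step confidence retention, each multiplied by its weight. That form, the toy graph, and the function name are assumptions, and the sketch is not expected to reproduce the total costs reported in the embodiment.

```python
def best_conversion_path(graph, start, goal, max_hops=4,
                         w_len=0.3, w_err=0.5, w_conf=0.2):
    """Enumerate simple paths up to max_hops and return (cost, path, factor).

    Assumed cost: w_len * hops / max_hops + w_err * sum(step errors)
                  + w_conf * (1 - product(step confidence retention)).
    Returns None when the goal cannot be reached.
    """
    if start == goal:
        return 0.0, [start], 1.0  # zero-step path costs nothing
    best = None
    stack = [([start], 1.0, 0.0, 1.0)]  # path, factor, error sum, confidence
    while stack:
        path, factor, err_sum, conf = stack.pop()
        if len(path) > max_hops:  # max_hops edges already used
            continue
        for nxt, edge in graph.get(path[-1], {}).items():
            if nxt in path:  # keep paths simple (no revisits)
                continue
            new_path = path + [nxt]
            new_factor = factor * edge["factor"]
            new_err = err_sum + edge["error"]
            new_conf = conf * edge["retention"]
            if nxt == goal:
                hops = len(new_path) - 1
                cost = (w_len * hops / max_hops + w_err * new_err
                        + w_conf * (1.0 - new_conf))
                if best is None or cost < best[0]:
                    best = (cost, new_path, new_factor)
            else:
                stack.append((new_path, new_factor, new_err, new_conf))
    return best

STEP = {"error": 0.01, "retention": 0.99}
GRAPH = {
    "ug/L": {"mg/L": {"factor": 1e-3, **STEP}, "ng/L": {"factor": 1e3, **STEP}},
    "ng/L": {"mg/L": {"factor": 1e-6, **STEP}},
    "mg/L": {},
}
cost, path, factor = best_conversion_path(GRAPH, "ug/L", "mg/L")
print(path, round(cost, 3), 350 * factor)
```

The direct edge wins over the two-hop detour because every extra hop adds to all three terms. Exhaustive enumeration is adequate for a small unit graph; a larger graph would call for a shortest-path algorithm adapted to the same cost.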

[0008] Furthermore, the method for obtaining the compatibility result includes: The current dimension type is extracted from the normalized logical metadata field, the desired dimension type is extracted from the SHACL constraint shape, the updated mass spectrometry domain ontology is loaded into the OWL DL inference engine, and the current and desired dimension types are aligned to the dimension class nodes in the ontology. Based on the OWL DL inference engine, the hierarchy of dimension classes is traversed, and the shortest hierarchical distance between the current dimension type and the desired dimension type is calculated. If the two types are the same class, the shortest hierarchical distance is zero. If the current dimension type is a direct subclass of the desired dimension type, the shortest hierarchical distance is 1 and the subclass affiliation is 1. If it is an indirect subclass, the shortest hierarchical distance increases with each level, and the subclass affiliation remains 1. If there is no inheritance relationship, the shortest hierarchical distance is a preset value and the subclass affiliation is zero. If the two types share a direct parent class and have no subclass relationship with each other, the subclass affiliation is set to 0.5. The overall compatibility score between the current dimension type and the expected dimension type is calculated from the shortest hierarchical distance, the attribute matching weight, the attribute matching degree, and the subclass affiliation degree (the formula itself is missing from this text). The compatibility score and reasoning path are recorded in the knowledge graph to obtain the compatibility result.
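The distance and affiliation rules above can be expressed compactly. In the sketch below a plain parent map stands in for the class hierarchy that an OWL DL reasoner would produce, and the score is computed as (w * attribute_match + (1 - w) * affiliation) / (1 + distance). That expression is an assumption: it is one form that agrees with the embodiment's figures (0.5 for a direct subclass and 1.0 for an identical class with w = 0.7), but other forms would agree as well. The preset distance of 10 is hypothetical.

```python
NO_RELATION_DISTANCE = 10  # hypothetical preset for unrelated classes

def distance_up(parents, child, ancestor):
    """Count subclass steps from child up to ancestor; None if not reachable."""
    steps, node = 0, child
    while node is not None:
        if node == ancestor:
            return steps
        node = parents.get(node)
        steps += 1
    return None

def compatibility(parents, current, expected, attr_match, attr_weight=0.7):
    """Return (distance, affiliation, score) under the assumed score formula."""
    steps = distance_up(parents, current, expected)
    if steps is not None:
        distance, affiliation = steps, 1.0      # same class or subclass
    else:
        distance = NO_RELATION_DISTANCE
        parent = parents.get(current)
        shares_parent = parent is not None and parent == parents.get(expected)
        affiliation = 0.5 if shares_parent else 0.0
    weighted = attr_weight * attr_match + (1.0 - attr_weight) * affiliation
    return distance, affiliation, weighted / (1.0 + distance)

PARENTS = {
    "AqueousMassConcentration": None,
    "AqueousMicrogramPerLiter": "AqueousMassConcentration",
    "AqueousPPM": "AqueousMassConcentration",
}
result_a = compatibility(PARENTS, "AqueousMicrogramPerLiter",
                         "AqueousMassConcentration", attr_match=1.0)
result_b = compatibility(PARENTS, "AqueousMassConcentration",
                         "AqueousMassConcentration", attr_match=1.0)
result_sibling = compatibility(PARENTS, "AqueousPPM",
                               "AqueousMicrogramPerLiter", attr_match=1.0)
print(result_a, result_b, result_sibling)
```

Sibling classes receive the preset distance here because neither is an ancestor of the other; the text does not say which distance applies in that case, so this is a further assumption.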

[0009] Furthermore, the method for obtaining the dynamic numerical range constraint includes: Historical samples matching the dimension type of the current logical metadata field are extracted from historical mass spectrometry data. High-compatibility samples are then selected from the historical samples using ROC curves based on the compatibility results. Based on the set of high-compatibility samples, the normal numerical probability density distribution of each logical metadata field is fitted using a kernel density estimation algorithm. The probability density function (the formula itself is missing from this text) is defined in terms of the following quantities: the numerical probability density value of the current logical metadata field; the target value to be evaluated; the overall compatibility score; the set of experimental context conditions of the target value; the high-compatibility sample set; the kernel density estimation bandwidth; the value of each high-compatibility sample; the kernel function; and the context weight of each historical sample, which is obtained from the Euclidean distance between the current context and the context of that historical sample.

The dynamic numerical range of each logical metadata field is calculated from the cumulative distribution function of the probability density function. The dynamic numerical range (formula likewise missing from this text) is defined in terms of its lower limit, its upper limit, an optimal-solution operator, the cumulative distribution function, and the minimum effective threshold of the probability density. The dynamic numerical range is used as a numerical constraint. The numerical constraint is updated to the SHACL constraint shape of the corresponding logical metadata field, and the associated information of the constraint update is synchronously recorded in the knowledge graph. The experimental context condition set includes experimental temperature, pressure, instrument model, and sample type. The kernel density estimation bandwidth is inversely proportional to the overall compatibility score.
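A context-weighted kernel density estimate of the kind described above can be sketched in a few lines. The Gaussian kernel, the weight 1 / (1 + distance), the bandwidth rule h = h0 * (1 - score) (which mirrors the embodiment's arithmetic 0.05 × (1 - 0.67)), and the grid-based range rule (the outermost grid points whose density reaches the threshold) are all assumptions for illustration; the text derives the range from the cumulative distribution function, whose exact formula is not legible here.

```python
import math

def gaussian(u):
    """Standard normal kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def context_weight(current, historical):
    """Assumed weight: decreases with Euclidean distance between contexts."""
    return 1.0 / (1.0 + math.dist(current, historical))

def density(x, samples, weights, bandwidth):
    """Weighted KDE: sum(w_i * K((x - x_i) / h)) / (h * sum(w_i))."""
    total = sum(weights)
    kernel_sum = sum(w * gaussian((x - s) / bandwidth)
                     for s, w in zip(samples, weights))
    return kernel_sum / (bandwidth * total)

def dynamic_range(samples, weights, bandwidth, threshold, steps=2000):
    """Outermost grid points whose density is at least the threshold."""
    lo = min(samples) - 5.0 * bandwidth
    hi = max(samples) + 5.0 * bandwidth
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    inside = [x for x in grid
              if density(x, samples, weights, bandwidth) >= threshold]
    return (inside[0], inside[-1]) if inside else None

samples = [0.30, 0.35, 0.40]          # hypothetical historical values, mg/L
weights = [1.0, 1.0, 0.7]             # matching contexts 1, deviating 0.7
bandwidth = 0.05 * (1.0 - 0.67)       # assumed rule h = h0 * (1 - score)
value_range = dynamic_range(samples, weights, bandwidth, threshold=0.005)
print(bandwidth, value_range)
```

With only three samples the resulting range is narrow; the point of the sketch is the mechanism, in which a higher compatibility score shrinks the bandwidth and therefore tightens the range around well-supported values.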

[0010] Furthermore, the method for obtaining the verification result includes: Based on the normalized logical metadata fields, the retention time and peak shape features are extracted. Based on the retention time and peak shape features, the corresponding constraint rules are extracted from the updated SHACL constraint shape to obtain the multi-field joint constraints. Based on the multi-field joint constraints and the dynamic numerical range constraints, the normalized values are checked to obtain the verification link and the verification result. If verification passes, the reason for success is recorded; if verification fails, the reason for failure and the logical metadata field that caused it are recorded. The verification result, SHACL constraint shape version, reasoning basis of the compatibility result, and verification link are then recorded in the knowledge graph.
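The final gate can be pictured as a set of interval checks applied together, with the outcome and its reasons stored alongside the shape version. The sketch below uses the constraint values given in the embodiment (dynamic range [0.02, 1.85] mg/L, retention time [4.8, 5.5] min, tailing factor [0.9, 1.2], peak purity ≥ 95%, signal-to-noise ratio ≥ 10) and the normalized fields of data source A; the field names, the shape version string, the dictionary layout, and the low-purity record are hypothetical.

```python
INF = float("inf")

# Interval constraints per field: (lower bound, upper bound), both inclusive.
CONSTRAINTS = {
    "value_mg_per_l": (0.02, 1.85),
    "retention_time_min": (4.8, 5.5),
    "tailing_factor": (0.9, 1.2),
    "peak_purity_pct": (95.0, INF),
    "signal_to_noise": (10.0, INF),
}

def verify(record, constraints, shape_version):
    """Check every field against its interval and collect failure reasons."""
    failures = []
    for field, (low, high) in constraints.items():
        value = record[field]
        if not low <= value <= high:
            failures.append(f"{field}={value} outside [{low}, {high}]")
    return {
        "passed": not failures,
        "reasons": failures or ["all constraints satisfied"],
        "shape_version": shape_version,
    }

source_a = {
    "value_mg_per_l": 0.35,
    "retention_time_min": 5.18,
    "tailing_factor": 1.03,
    "peak_purity_pct": 98.2,
    "signal_to_noise": 28,
}
low_purity = dict(source_a, peak_purity_pct=90.0)  # hypothetical failing record

result_ok = verify(source_a, CONSTRAINTS, shape_version="shape-v1")
result_bad = verify(low_purity, CONSTRAINTS, shape_version="shape-v1")
print(result_ok)
print(result_bad)
```

Returning the reasons together with the shape version gives the kind of record that could be written to the knowledge graph, so that a later reader can see which constraint version produced a given pass or fail.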

[0011] A second aspect of the present invention provides a system based on conditional semantic constraints and unit / dimensional gates, comprising:

Data acquisition module: used to acquire logical metadata fields of multi-source mass spectrometry data, incrementally update the preset mass spectrometry domain ontology based on the logical metadata fields using a knowledge graph completion algorithm, and expand the QUDT dimensional library according to the updated fields and update relationships;

Dimensional information acquisition module: used to derive SHACL constraint shapes based on the updated mass spectrometry domain ontology, generate mass spectrometry-specific QUDT dimensional information corresponding to the logical metadata fields, and construct a dimensional equivalence graph based on the mass spectrometry-specific QUDT dimensional information;

Desired unit conversion module: used to extract the original unit from the logical metadata field, solve the optimal conversion path from the original unit to the desired unit in the SHACL constraint shape through a unit conversion path optimization algorithm based on the dimensional equivalence graph, and normalize the unit in the logical metadata field to the desired unit;

Compatibility result inference module: used to perform hierarchical inference on the dimension type of the normalized logical metadata field using the OWL DL inference engine to obtain the compatibility result between the dimension type and the expected dimension type in the SHACL constraint shape;

Numerical distribution constraint module: used to learn the normal numerical distribution of each logical metadata field through a kernel density estimation algorithm based on historical mass spectrometry data and the compatibility results, generate dynamic numerical range constraints, and update them to the SHACL constraint shape;

Conditional semantic constraint verification module: used to construct multi-field joint constraints by combining the retention time and peak shape characteristics of the logical metadata fields, determine whether the normalized value is within the dynamic numerical range constraint and satisfies the joint constraint, obtain the verification result, and record the SHACL constraint version and reasoning basis in the knowledge graph.

[0012] Compared with the prior art, the embodiments of the present invention have at least the following advantages or beneficial effects: This invention incrementally updates the mass spectrometry domain ontology using a knowledge graph completion algorithm, identifies new entities and relationships, and expands the mass spectrometry-specific QUDT dimensional library, which improves the scalability of the verification link and the coverage of ontology fields. Triple completion scores are calculated with the RotatE model through rotation operations in complex vector space, and entity relationships are predicted in combination with the feature weight matrix of the mass spectrometry domain, which improves adaptability to asymmetric relationships among units, dimensions, and the mass spectrometry ontology and enhances the accuracy and reliability of ontology completion. Constructing a dimensional equivalence graph and calculating the optimal conversion path reduces the accuracy loss and suboptimal-path problems of single-step conversion. The OWL DL inference engine performs hierarchical reasoning on dimension types to comprehensively evaluate the dimensional compatibility of the fields to be verified. Multi-field joint constraints are used for association verification, which strengthens conditional verification of the physical consistency of mass spectrometry data. Together these form an automatic verification link for mass spectrometry metadata that spans ontology update, hierarchical dimensional reasoning, unit conversion, and associated field constraints.

Attached Figure Description

[0013] Figure 1 is a flowchart illustrating the steps of the method based on conditional semantic constraints and unit / dimensional gates in an embodiment of the present invention.

Detailed Implementation

[0014] The technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort are within the scope of protection of the present invention.

[0015] Reference Figure 1 As shown, the first aspect of the present invention provides a method based on conditional semantic constraints and unit / dimensional gates, comprising: S1. Obtain the logical metadata fields of multi-source mass spectrometry data, and incrementally update the preset mass spectrometry domain ontology based on the logical metadata fields using a knowledge graph completion algorithm. Expand the QUDT dimensional library according to the updated fields and update relationships. In the actual evaluation, data source A for the multi-source mass spectrometry data was a RAW format file output from a Thermo Q Exactive HF mass spectrometer. The parsed logical metadata fields were: original value 350, original unit μg / L, dimension type Aqueous Microgram PerLiter, retention time 5.18 min, peak shape characteristics 98.2% peak purity, tailing factor 1.03, and signal-to-noise ratio 28. The corresponding experimental context was: temperature 25℃, pressure 1 atm, and instrument model Thermo Q Exactive HF. Data source B was an Agilent 1290-6460 LC-MS / MS output .d format file. The parsed logical metadata fields were: original value 0.32, original unit mg / L, dimension type Aqueous Mass Concentration, retention time 5.25 min, peak shape characteristics 94.7% peak purity, tailing factor 1.18, and signal-to-noise ratio 12. The experimental context was: temperature 26℃, pressure 1 atm, and instrument model Agilent. 1290-6460; Data source C is CSV metadata exported from the laboratory LIMS system, with the original value 0.40, the original unit ppm, the unit type Aqueous PPM, the retention time 5.15 min, the peak characteristics as peak purity 97.9%, tailing factor 1.05, signal-to-noise ratio 25, and the corresponding experimental context as temperature 25℃, pressure 1 atm, and instrument model Shimadzu LCMS-8060; The preset mass spectrometry ontology V2.3 includes core classes such as AqueousMassConcentration (water phase mass concentration). 
The mass spectrometry QUDT dimensional library V2.1 initially includes units such as mg / L and μg / L, but lacks ppm water phase dimensional mapping. Historical mass spectrometry data includes 120 labeled samples with "normal" and "abnormal" tags. Entities and relationships are extracted from three sets of logical metadata fields to generate candidate triples (Shimadzu LCMS-8060, isInstrumentFor, AqueousMassConcentration) and (ppm, hasDimensionality, M·L). -3 ), (AqueousPPM, subClassOf, AqueousMassConcentration) Input the candidate triples into the RotatE model and calculate the completion score by combining the mass spectrometry domain weight matrix. The weight of the relationship between the instrument and the detection item is 0.8. If the score of triple 1 is 0.92 or higher than the preset threshold of 0.8, it is retained. If the score of triple 2 is 0.88 or higher than the preset threshold of 0.8, it is retained. If the score of triple 3 is 0.85 or higher than the preset threshold of 0.8, it is retained. Shimadzu LCMS-8060 is classified into the MassSpectrometer class. The adaptation relationship between ppm and AqueousMassConcentration is added. The AqueousPPM subclass belonging to AqueousMassConcentration is defined. A bidirectional mapping from ppm to AqueousPPM is constructed with a correlation degree of 92%, which is greater than 85%. The dimensional library is updated to V2.1.1. S2. Derive the SHACL constraint shape based on the updated mass spectrometry domain ontology, and generate mass spectrometry-specific QUDT dimensional information corresponding to the logical metadata field. Construct a dimensional equivalence graph based on the mass spectrometry-specific QUDT dimensional information. In practical evaluation, a SHACL NodeShape is derived from the updated mass spectrometry ontology, where the core constraints are the target class AqueousMassConcentration, the desired unit mg / L, and the desired dimension type M·L. 
-3 The data type is positive real numbers, peak intensity is non-negative, retention time interval is [4.8, 5.5] min, peak shape constraints are peak purity ≥ 95%, tailing factor [0.9, 1.2], signal-to-noise ratio ≥ 10. Units from the mass spectrometry-specific QUDT dimensional library are used as nodes. Edges are added to each pair of units with a conversion relationship. Based on the dimensional hierarchy, compatibility weights are assigned to the edges, and incompatible unit pairs are marked. Invalid edges are removed to obtain a dimensional equivalence graph, where μg / L, mg / L, and ppm are used as nodes, and the unit pair is μg / L. At mg / L, the corresponding conversion factor is 1000. Dimensionality compatibility is inversely proportional to the edge weight, therefore the edge weight is 0.05, ppm. mg / L, based on toluene density 0.86 g / cm³ 3 Calculating that 1ppm ≈ 0.86mg / L, the corresponding conversion factor is 0.86, and the edge weight is 0.10; S3. Extract the original unit based on the logical metadata field, and solve the optimal conversion path from the original unit to the desired unit in the SHACL constraint shape through the unit conversion path optimization algorithm based on the dimensional equivalence diagram, and normalize the unit of the logical metadata field to the desired unit. In the actual evaluation, based on the current original unit identifier, the dimensional equivalence graph is queried. All corresponding units are valid nodes and require no correction. The valid original units are used as the starting point, and the expected units in the SHACL constraint are used as the ending point. The path search space, path length normalization coefficient, single-step transformation error coefficient, initial value of the confidence decay factor, and weight coefficients of the dimensional equivalence graph are initialized based on the starting and ending points. 
The dimensional equivalence graph is traversed using a unit transformation path optimization algorithm to calculate the total cost of each feasible path and obtain the optimal transformation path. The initialized path parameters include a path length weight of 0.3, an error weight of 0.5, a confidence weight of 0.2, a single-step error of 0.01, and a data confidence retention ratio of 0.99 after transformation. The unit of data source A changes from μg / L to mg / L. The optimal conversion path has a total cost of 0.51. The unit of data source B changes from mg / L to mg / L, which is a zero-step path with a total cost of 0. The unit of data source C changes from ppm to mg / L, which is a single-step path with a total cost of 0.53. Based on the optimal conversion path, numerical conversion coefficients and conversion formulas are obtained. The original values ​​of the logical metadata fields are then converted using these coefficients and formulas. Specifically, data source A: 350μg / L ÷ 1000 = 0.35mg / L; data source B: 0.32mg / L (no conversion required); data source C: 0.40ppm × 0.86 = 0.344mg / L. The experimental temperature deviates from the standard conditions of 25℃ and 1 atm by ≤1℃, with a correction value ≤0.001mg / L, which is negligible. The normalized values ​​and units are then obtained. S4. The dimension type of the normalized logical metadata field is subjected to hierarchical reasoning through the OWL DL inference engine to obtain the compatibility result between the dimension type and the expected dimension type in the SHACL constraint shape. In the actual evaluation, the current dimension type is extracted based on the normalized logical metadata fields, and the expected dimension type is extracted based on the SHACL constraint shape. The updated mass spectrometry domain ontology is loaded into the OWL DL inference engine, and the current dimension type and expected dimension type are aligned to the dimension class nodes in the ontology. 
The HermiT reasoner is selected as the OWL DL inference engine. After alignment, the current dimension type of data source A is AqueousMicrogramPerLiter and the expected dimension type is AqueousMassConcentration, so the shortest hierarchical distance is 1 and the subclass affiliation is 1. For data source B, the current and expected dimension types are both AqueousMassConcentration, so the shortest hierarchical distance is 0 and the subclass affiliation is 1. For data source C, the current dimension type is AqueousPPM and the expected dimension type is AqueousMassConcentration, so the shortest hierarchical distance is 1 and the subclass affiliation is 1. The comprehensive compatibility score is calculated from the shortest hierarchical path length, attribute matching degree, and subclass affiliation. The comprehensive compatibility score of data source A is 0.5 × (0.7 × 1 + 0.3 × 1) = 0.5, that of data source B is 1.0, and that of data source C is 0.5.

S5. Based on historical mass spectrometry data and the compatibility results, the normal numerical distribution of each logical metadata field is learned with the kernel density estimation algorithm, and dynamic numerical range constraints are generated and written to the SHACL constraint shape. In the actual evaluation, ROC curves were plotted from 120 sets of historical mass spectrometry data, and the optimal threshold was found to be 0.45. A total of 102 historical samples with a comprehensive compatibility score ≥ 0.45 were selected as the high-compatibility sample set, and the normal numerical probability density distribution of each logical metadata field was fitted to this set using the kernel density estimation algorithm. One sample had a comprehensive compatibility score of 0.67.
Using the standard deviation of 0.18 and the interquartile range of 0.25 of the high-compatibility sample values, the initial bandwidth was calculated to be 0.063 by the Silverman rule. Taking 0.05 as the approximate initial bandwidth, the kernel density estimation bandwidth was 0.05 × (1 − 0.67) = 0.0165. Among the high-compatibility samples, those whose experimental conditions perfectly match the current context were given a context weight of 1, and those with deviations were given a weight of 0.7. Therefore, the probability density function was... Integrating the probability density function yields the cumulative distribution function F(x). At 0.35 mg/L the probability density value is 0.398, a typical value within the normal distribution range. The density threshold is 0.005. The calculated dynamic value range is [0.02 mg/L, 1.85 mg/L], which is then written to the SHACL constraint shape.

S6. Construct a multi-field joint constraint by combining the retention time and peak shape characteristics of the logical metadata fields. Based on the multi-field joint constraint, determine whether the normalized value lies within the dynamic value range constraint and satisfies the joint constraint. Obtain the verification result and record the SHACL constraint version and reasoning basis in the knowledge graph.
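The bandwidth arithmetic in step S5 can be retraced with a short sketch. It assumes the common form of Silverman's rule, h = 0.9 · min(σ, IQR / 1.34) · n^(−1/5), with n = 102 high-compatibility samples; the text does not state which variant or sample count was used, so the result only approximates the quoted 0.063. The function names are illustrative and not taken from the source.

```python
def silverman_bandwidth(std, iqr, n):
    """Silverman's rule of thumb: 0.9 * min(std, IQR / 1.34) * n^(-1/5)."""
    return 0.9 * min(std, iqr / 1.34) * n ** (-1 / 5)


def scaled_bandwidth(h0, compatibility):
    """Shrink the bandwidth as the compatibility score grows: h = h0 * (1 - score)."""
    return h0 * (1.0 - compatibility)


# Values quoted in the worked example; n = 102 is an assumption.
h_rule = silverman_bandwidth(std=0.18, iqr=0.25, n=102)
h_kde = scaled_bandwidth(h0=0.05, compatibility=0.67)
print(round(h_rule, 3), round(h_kde, 4))
```

With these inputs the rule gives roughly 0.064, close to the 0.063 quoted above, and the scaled bandwidth matches 0.05 × (1 − 0.67) = 0.0165.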

[0016] In the actual evaluation, retention time and peak shape features are extracted from the normalized logical metadata fields. Based on these features, the corresponding constraint rules are extracted from the updated SHACL constraint shape to obtain the multi-field joint constraints. The normalized values are then verified against the multi-field joint constraints and the dynamic numerical range constraints to obtain the verification link and verification results. For data source A, the combined constraints are: "(Comprehensive compatibility score 0.5 ≥ 0.45) ∧ (Normalized value 0.35 ∈ [0.02, 1.85]) ∧ (Retention time 5.18 ∈ [4.8, 5.5]) ∧ (Peak purity 98.2% ≥ 95% ∧ Tailing factor 1.03 ∈ [0.9, 1.2] ∧ Signal-to-noise ratio 28 ≥ 10)", so data source A passes verification of its units and dimensions. For data source B, they are: "(Comprehensive compatibility score 1 ≥ 0.45) ∧ (Normalized value 0.32 ∈ [0.02, 1.85]) ∧ (Retention time 5.25 ∈ [4.8, 5.5]) ∧ (Peak purity 94.7% ≥ 95% ∧ Tailing factor 1.18 ∈ [0.9, 1.2] ∧ Signal-to-noise ratio 12 ≥ 10)"; the actual peak purity is less than 95%, so the numerical verification of data source B fails, the failure is traced back to the "peak shape feature extraction stage", and it is marked as "insufficient mass spectrometry signal purity". For data source C, they are: "(Comprehensive compatibility score 0.5 ≥ 0.45) ∧ (Normalized value 0.344 ∈ [0.02, 1.85]) ∧ (Retention time 5.15 ∈ [4.8, 5.5]) ∧ (Peak purity 97.9% ≥ 95% ∧ Tailing factor 1.05 ∈ [0.9, 1.2] ∧ Signal-to-noise ratio 25 ≥ 10)", so data source C passes verification. The verification results, SHACL constraint shape version V1.3, compatibility result reasoning basis, and verification link are synchronously recorded in the knowledge graph.
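The joint check for the three data sources can be expressed as a small sketch. The thresholds and measured values are the ones quoted in this paragraph; the `Record` class, the `check` function, and the rule names are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Record:
    name: str
    compatibility: float
    value_mg_per_l: float
    retention_min: float
    peak_purity_pct: float
    tailing_factor: float
    signal_to_noise: float


def check(r, threshold=0.45, value_range=(0.02, 1.85)):
    """Evaluate the conjunction of all rules; return (passed, failed rule names)."""
    rules = {
        "compatibility": r.compatibility >= threshold,
        "value_range": value_range[0] <= r.value_mg_per_l <= value_range[1],
        "retention_time": 4.8 <= r.retention_min <= 5.5,
        "peak_purity": r.peak_purity_pct >= 95.0,
        "tailing_factor": 0.9 <= r.tailing_factor <= 1.2,
        "signal_to_noise": r.signal_to_noise >= 10.0,
    }
    failed = [name for name, ok in rules.items() if not ok]
    return (not failed, failed)


records = [
    Record("A", 0.5, 0.35, 5.18, 98.2, 1.03, 28),
    Record("B", 1.0, 0.32, 5.25, 94.7, 1.18, 12),
    Record("C", 0.5, 0.344, 5.15, 97.9, 1.05, 25),
]
results = {r.name: check(r) for r in records}
```

Returning the names of the failed rules, rather than only a boolean, is one way to support the traceback to a failure stage described above.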

[0017] In this embodiment, the method for obtaining the update field and update relationship includes: Obtain the logical metadata fields of multi-source mass spectrometry data. Based on the logical metadata fields, extract entities and relations according to the preset mass spectrometry domain ontology to obtain a candidate set of triples to be completed. Input the candidate set of triples into the RotatE model. Based on the RotatE model, calculate the completion score of the candidate triples through relation rotation operations in the complex vector space and in combination with the mass spectrometry domain feature weight matrix. Filter out invalid triples with scores lower than the preset threshold to obtain the entities and relations to be added. If a new entity or relation is detected, the new entity is classified into the corresponding concept level of the preset mass spectrometry domain ontology, the new relation is mapped to the object attribute in the ontology, and the axiom constraints and hierarchical structure of the ontology are updated. Based on the updated ontology, a mass spectrometry-specific dimensional subclass is defined on the basis of the mass spectrometry QUDT dimensional library according to the new fields and new relations, and a bidirectional mapping relationship between the new unit and the extended dimension is constructed. If the correlation degree of the bidirectional mapping objects is greater than or equal to 85%, the dimension library expansion is completed.
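As a rough illustration of the scoring step, the sketch below applies the usual RotatE idea: each relation rotates the head embedding element-wise in the complex plane, and a triple scores higher the closer the rotated head lies to the tail. The per-dimension `weights` argument is only a hypothetical stand-in for the "mass spectrometry domain feature weight matrix", whose form the text does not give; the embeddings, threshold, and function names are likewise invented for the example.

```python
import cmath
import math


def rotate_score(head, phases, tail, weights=None):
    """Negative weighted distance between the rotated head and the tail."""
    weights = weights or [1.0] * len(head)
    distance = 0.0
    for h, theta, t, w in zip(head, phases, tail, weights):
        rotated = h * cmath.exp(1j * theta)  # unit-modulus rotation by angle theta
        distance += w * abs(rotated - t)
    return -distance


def filter_triples(candidates, threshold):
    """Keep the names of candidate triples whose score reaches the threshold."""
    return [name for name, (h, r, t) in candidates.items()
            if rotate_score(h, r, t) >= threshold]


quarter_turn = [math.pi / 2, math.pi / 2]
candidates = {
    # Tail equals the head rotated by 90 degrees, so the distance is near zero.
    "plausible": ([1 + 0j, 0 + 1j], quarter_turn, [0 + 1j, -1 + 0j]),
    # Tail equals the unrotated head, so the rotated head lands far away.
    "implausible": ([1 + 0j, 0 + 1j], quarter_turn, [1 + 0j, 0 + 1j]),
}
kept = filter_triples(candidates, threshold=-0.5)
```

In a trained model the embeddings and phases would be learned; here they are fixed by hand so the effect of the threshold filter is easy to see.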

[0018] In this embodiment, the method for obtaining the dimensional equivalence diagram includes: Based on the updated mass spectrometry ontology, extract the conditional constraints of the logical metadata fields corresponding to the classes, generate SHACL NodeShapes for the logical metadata fields according to the conditional constraints, set the target class according to the concept class of the corresponding field in the ontology based on the SHACL NodeShapes, add PropertyShape constraints according to the field attributes, and supplement the corresponding SHACL NodeShapes with mass spectrometry-specific constraints. The conditional constraints include the expected unit, expected dimension type, data type and numerical range, and the mass spectrometry-specific constraints include positive real numbers, non-negative peak intensity and retention time matching experimental intervals. Based on the concentration, mass-to-charge ratio, and relative abundance of the logical metadata fields, the dimensional information is obtained by matching the exclusive dimensional subclass with the mass spectrometry-specific QUDT dimensional library. The dimensional information is then associated with the expected unit in the SHACL constraint. If the dimensional information is inconsistent with the ontology field definition, it is corrected through a two-way mapping relationship between units and dimensions. The dimensional information includes dimensional type, unit symbol, and dimensional exponent. Using the units in the mass spectrometry-specific QUDT dimensional library as nodes, add edges to each pair of units with a transformation relationship, assign compatibility weights to the edges based on the dimensional hierarchy, mark unit pairs with incompatible dimensions, remove invalid edges, and obtain a dimensional equivalence graph. 
If the dimensional exponents of the transformation paths in the dimensional equivalence graph are inconsistent, then correct the transformation coefficients or reconstruct the corresponding edges. The units include the original units, the expected units, and the intermediate conversion units. The edge attributes include conversion coefficients, conversion formulas, and dimensional compatibility relationships. The dimensional compatibility relationships include direct compatibility and subclass compatibility. The dimensional compatibility degree is inversely proportional to the edge weight.
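The graph construction described above can be sketched as follows. This is a minimal illustration, not the patented implementation: dimensions are represented as exponent tuples, ppm is treated as a mass concentration as in the worked example, and the `min` unit with its edge is a hypothetical invalid candidate added to show edge removal. Each factor is stored so that target value = source value × factor.

```python
# Dimension exponents as (mass, length, time). Treating ppm as a mass
# concentration follows the worked example and is an assumption here.
UNITS = {
    "ug/L": (1, -3, 0),
    "mg/L": (1, -3, 0),
    "ppm": (1, -3, 0),
    "min": (0, 0, 1),
}

# (source, target, factor, weight): value_target = value_source * factor.
CANDIDATE_EDGES = [
    ("ug/L", "mg/L", 1 / 1000, 0.05),
    ("ppm", "mg/L", 0.86, 0.10),
    ("min", "mg/L", 1.0, 0.90),  # deliberately invalid edge (hypothetical)
]


def build_graph(units, candidate_edges):
    """Keep edges between dimensionally consistent units; mark the rest."""
    graph = {u: {} for u in units}
    incompatible = []
    for src, dst, factor, weight in candidate_edges:
        if units[src] != units[dst]:
            incompatible.append((src, dst))  # marked and left out of the graph
            continue
        graph[src][dst] = {"factor": factor, "weight": weight}
        graph[dst][src] = {"factor": 1.0 / factor, "weight": weight}
    return graph, incompatible


graph, incompatible = build_graph(UNITS, CANDIDATE_EDGES)
```

Storing the reverse edge with the reciprocal factor keeps the graph usable in both directions; whether the patented graph is directed is not stated.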

[0019] In this embodiment, the method for obtaining the normalized logical metadata field includes: The original unit identifier is extracted from the metadata attributes of the logical metadata field. Based on the original unit identifier, the dimensional equivalence graph is queried to determine whether the corresponding unit is a valid node. If the corresponding unit does not exist, the unit mapping is corrected through the two-way mapping relationship between the unit and the dimension, and the corrected unit is taken as the valid original unit. Using the valid original unit as the starting point and the desired unit in the SHACL constraint as the ending point, the path search space, path length normalization coefficient, single-step conversion error coefficient, initial value of the confidence decay factor, and weight coefficients of the dimensional equivalence graph are initialized based on the starting and ending points. The dimensional equivalence graph is then traversed using the unit conversion path optimization algorithm to calculate the total cost of each feasible path, thereby obtaining the optimal conversion path. The total cost is calculated by a formula (not reproduced in this text) whose terms are, in order: the total cost; the optimal conversion path from the original unit to the desired unit, which is the path with the minimum total cost; the path length weight; the ratio of the number of hops along the path to the maximum allowed number of hops; the error weight; the number of nodes in the path; the conversion error coefficient from one step to the next; the percentage of data confidence maintained after each conversion step; the confidence weight; and the overall confidence level of the path. The numerical conversion coefficients and conversion formulas are obtained according to the optimal conversion path.
Based on the numerical conversion coefficients and conversion formulas, unit conversion is applied to the original values of the logical metadata fields. If the conversion involves concentration and solubility, the experimental temperature and pressure parameters are extracted from the context information of the logical metadata fields. The conversion results are corrected according to the experimental temperature and pressure parameters through the temperature-pressure coupling correction formula to obtain the normalized values and units.
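A path search of this kind might look like the sketch below. Because the total-cost formula is not available in this text, the cost used here is an assumed form: path length weight × (hops / maximum hops) + error weight × summed step error + confidence weight × (1 − retained confidence). It yields a cost of 0 for a zero-step path but is not expected to reproduce the 0.51 and 0.53 values of the worked example. The graph layout, `best_path`, and `max_hops=3` are all hypothetical.

```python
# Each entry maps a unit to its neighbours and the multiplicative factor.
GRAPH = {
    "ug/L": {"mg/L": 0.001},
    "mg/L": {"ug/L": 1000.0, "ppm": 1 / 0.86},
    "ppm": {"mg/L": 0.86},
}


def best_path(graph, start, goal, max_hops=3, w_len=0.3, w_err=0.5,
              w_conf=0.2, step_error=0.01, retention=0.99):
    """Enumerate simple paths up to max_hops; return (cost, path, factor)."""
    best = None
    stack = [(start, [start], 1.0)]
    while stack:
        node, path, factor = stack.pop()
        hops = len(path) - 1
        if node == goal:
            # Assumed cost form; see the lead-in above.
            cost = (w_len * hops / max_hops
                    + w_err * step_error * hops
                    + w_conf * (1.0 - retention ** hops))
            if best is None or cost < best[0]:
                best = (cost, path, factor)
            continue
        if hops == max_hops:
            continue
        for nxt, f in graph[node].items():
            if nxt not in path:  # avoid revisiting units
                stack.append((nxt, path + [nxt], factor * f))
    return best


cost_a, path_a, factor_a = best_path(GRAPH, "ug/L", "mg/L")
cost_b, path_b, factor_b = best_path(GRAPH, "mg/L", "mg/L")
cost_c, path_c, factor_c = best_path(GRAPH, "ppm", "mg/L")
print(350 * factor_a, 0.32 * factor_b, 0.40 * factor_c)
```

The accumulated factor along the chosen path is what converts the original value, which is how the 350 μg/L, 0.32 mg/L, and 0.40 ppm inputs of the worked example map to mg/L.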

[0020] In this embodiment, the method for obtaining the compatibility result includes: The current dimension type is extracted based on the normalized logical metadata field, the desired dimension type is extracted based on the SHACL constraint shape, the updated mass spectrometry domain ontology is loaded into the OWL DL inference engine, and the current dimension type and desired dimension type are aligned to the dimension class nodes in the ontology. Based on the OWL DL inference engine, the hierarchy of dimension classes is traversed, and the shortest hierarchical distance between the current dimension type and the desired dimension type is calculated. If the current dimension type and the desired dimension type are the same class, the shortest hierarchical distance is zero. If the current dimension type is a direct subclass of the desired dimension type, the shortest hierarchical distance and the subclass affiliation are both 1. If it is an indirect subclass, the shortest hierarchical distance increases with each level, and the subclass affiliation remains 1. If there is no inheritance relationship, the shortest hierarchical distance is a preset value, and the subclass affiliation is zero. If the current dimension type and the desired dimension type share a direct parent class and have no subclass relationship with each other, the subclass affiliation is set to 0.5. The overall compatibility score is calculated from the shortest hierarchical path length, attribute matching degree, and subclass affiliation degree by a formula (not reproduced in this text) whose terms are, in order: the overall compatibility score of the current dimension type and the desired dimension type; the shortest hierarchical distance; the attribute matching weight; the attribute matching degree; and the subclass affiliation degree. The compatibility score and reasoning path are recorded in the knowledge graph to obtain the compatibility result.
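The rules above can be sketched with a toy class hierarchy. The score formula itself is not available in this text, so the sketch assumes score = 1 / (1 + distance) × (w × attribute match + (1 − w) × subclass affiliation) with w = 0.7, a form chosen only because it agrees with the worked values 0.5 × (0.7 × 1 + 0.3 × 1) = 0.5 for a direct subclass and 1.0 for an identical class. The classes `MassConcentration`, `Time`, and `RetentionTime`, the sibling distance, and the preset distance of 10 are hypothetical.

```python
# Child -> parent links of a small, partly hypothetical dimension hierarchy.
PARENT = {
    "AqueousMicrogramPerLiter": "AqueousMassConcentration",
    "AqueousPPM": "AqueousMassConcentration",
    "AqueousMassConcentration": "MassConcentration",
    "RetentionTime": "Time",
}


def ancestors(cls):
    """Return [cls, parent, grandparent, ...] following PARENT links."""
    chain = [cls]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def distance_and_affiliation(current, expected, preset_distance=10):
    chain = ancestors(current)
    if expected in chain:
        # Same class gives distance 0; direct or indirect subclass gives 1, 2, ...
        return chain.index(expected), 1.0
    if current in PARENT and PARENT[current] == PARENT.get(expected):
        # Siblings under a shared direct parent; distance choice is an assumption.
        return preset_distance, 0.5
    return preset_distance, 0.0


def compatibility(current, expected, attr_match=1.0, w_attr=0.7):
    """Assumed score form; see the lead-in above."""
    d, affiliation = distance_and_affiliation(current, expected)
    return (w_attr * attr_match + (1.0 - w_attr) * affiliation) / (1 + d)


score_a = compatibility("AqueousMicrogramPerLiter", "AqueousMassConcentration")
score_b = compatibility("AqueousMassConcentration", "AqueousMassConcentration")
score_c = compatibility("AqueousPPM", "AqueousMassConcentration")
```

In the described method the hierarchy would come from the reasoner's classified ontology rather than a hand-written dictionary.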

[0021] In this embodiment, the method for obtaining the dynamic numerical range constraint includes: Historical samples matching the dimension type in the current logical metadata fields are extracted from historical mass spectrometry data. High-compatibility samples are then selected from the historical samples using ROC curves based on the compatibility results. Based on the set of high-compatibility samples, the normal numerical probability density distribution of each logical metadata field is fitted using a kernel density estimation algorithm. The probability density function is calculated by a formula (not reproduced in this text) whose terms are, in order: the value of the numerical probability density function of the current logical metadata field; the target value to be evaluated; the overall compatibility score; the set of experimental context conditions for the target value to be evaluated; the high-compatibility sample set; the kernel density estimation bandwidth; the value of each high-compatibility sample; the kernel function; and the context weight of each historical sample, which is obtained from the Euclidean distance between the current context and the context of that historical sample. The dynamic numerical range of each logical metadata field is calculated based on the cumulative distribution function of the probability density function by a formula (not reproduced in this text) whose terms are, in order: the dynamic numerical range; the lower limit of the dynamic numerical range; the upper limit of the dynamic numerical range; the optimal-solution operator; the cumulative distribution function; and the lowest effective threshold for the probability density. The dynamic numerical range is used as a numerical constraint.
The numerical constraint is updated to the SHACL constraint shape of the corresponding logical metadata field, and the associated information of the constraint update is synchronously recorded in the knowledge graph. The experimental context condition set includes experimental temperature, pressure, instrument model and sample type. The kernel density estimation bandwidth is inversely proportional to the comprehensive compatibility score.
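A context-weighted kernel density estimate of the kind described can be sketched as below. Several parts are assumptions: a Gaussian kernel, weights normalized by their sum, and a dynamic range found by scanning a grid for the outermost points whose density reaches the threshold. The scan is a simplified substitute for the CDF-based range formula, which is not available in this text. The sample values are invented; the weights 1 and 0.7, the bandwidth 0.0165, and the threshold 0.005 follow the worked example.

```python
import math


def gaussian(u):
    """Standard normal kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def weighted_kde(x, samples, weights, h):
    """Context-weighted kernel density estimate at x with bandwidth h."""
    total = sum(weights)
    kernel_sum = sum(w * gaussian((x - xi) / h)
                     for xi, w in zip(samples, weights))
    return kernel_sum / (total * h)


def dynamic_range(samples, weights, h, threshold, grid_step=0.001):
    """Outermost grid points whose density is at least the threshold."""
    lo, hi = min(samples) - 5 * h, max(samples) + 5 * h
    n = int(round((hi - lo) / grid_step))
    inside = [lo + k * grid_step for k in range(n + 1)
              if weighted_kde(lo + k * grid_step, samples, weights, h) >= threshold]
    return (inside[0], inside[-1]) if inside else None


samples = [0.30, 0.35, 0.40]   # hypothetical historical values in mg/L
weights = [1.0, 0.7, 1.0]      # 1 for matching context, 0.7 for deviating context
h = 0.05 * (1 - 0.67)          # bandwidth scaled by the compatibility score
value_range = dynamic_range(samples, weights, h, threshold=0.005)
```

Because the bandwidth shrinks as the compatibility score rises, highly compatible data produce a tighter density estimate and therefore a narrower admissible range.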

[0022] In this embodiment, the method for obtaining the verification result includes: Based on the normalized logical metadata fields, the retention time and peak shape features are extracted. Based on the retention time and peak shape features, the corresponding constraint rules are extracted from the updated SHACL constraint shape to obtain multi-field joint constraints. Based on the multi-field joint constraints and dynamic numerical range constraints, the normalized values are verified to obtain the verification link and verification results. If the verification passes, the reason for success is recorded; if the verification fails, the reason for failure is recorded. Based on the verification link and the logical metadata field of the failure reason, the verification result, SHACL constraint shape version, compatibility result reasoning basis, and verification link are recorded in the knowledge graph.

[0023] A second aspect of the present invention also provides a system based on conditional semantic constraints and unit / dimensional gates, comprising: Data acquisition module: used to acquire logical metadata fields of multi-source mass spectrometry data, incrementally update the preset mass spectrometry domain ontology based on the logical metadata fields using a knowledge graph completion algorithm, and expand the QUDT dimensional library according to the updated fields and update relationships; Dimensional information acquisition module: used to derive SHACL constraint shape based on the updated mass spectrometry domain ontology, generate mass spectrometry-specific QUDT dimensional information corresponding to the logical metadata field, and construct a dimensional equivalence graph based on the mass spectrometry-specific QUDT dimensional information; Desired Unit Conversion Module: This module is used to extract the original unit based on the logical metadata field, solve the optimal conversion path from the original unit to the desired unit in the SHACL constraint shape through a unit conversion path optimization algorithm based on the dimensional equivalence diagram, and normalize the unit in the logical metadata field to the desired unit. 
Compatibility Result Inference Module: Used to perform hierarchical inference on the dimension type of the normalized logical metadata field using the OWL DL inference engine to obtain the compatibility result between the dimension type and the expected dimension type in the SHACL constraint shape; Numerical distribution constraint module: Based on historical mass spectrometry data and compatibility results, it learns the normal numerical distribution of each logical metadata field through a kernel density estimation algorithm, generates dynamic numerical range constraints, and updates them to the SHACL constraint shape; Conditional semantic constraint verification module: It is used to construct multi-field joint constraints by combining the retention time and peak shape characteristics of the logical metadata fields, determine whether the normalized value is within the dynamic value range constraint and satisfies the joint constraint based on the multi-field joint constraints, obtain the verification result, and record the SHACL constraint version and reasoning basis to the knowledge graph.

[0024] The above description is merely an example and illustration of the structure of the present invention. Those skilled in the art can make various modifications or additions to the specific embodiments described, or use similar methods to replace them, as long as they do not deviate from the structure of the invention or exceed the scope defined in the claims, all of which should fall within the protection scope of the present invention.

Claims

1. A method based on conditional semantic constraints and unit / dimensional gates, characterized in that, Includes the following steps: Obtain logical metadata fields from multi-source mass spectrometry data, incrementally update the preset mass spectrometry domain ontology based on the logical metadata fields using a knowledge graph completion algorithm, and expand the QUDT dimensional library according to the updated fields and update relationships; Based on the updated mass spectrometry domain ontology, SHACL constraint shapes are derived, and mass spectrometry-specific QUDT dimensional information corresponding to the logical metadata fields is generated. A dimensional equivalence graph is constructed based on the mass spectrometry-specific QUDT dimensional information. The original unit is extracted from the logical metadata field. The optimal conversion path from the original unit to the desired unit in the SHACL constraint shape is solved by the unit conversion path optimization algorithm based on the dimensional equivalence diagram. The unit in the logical metadata field is then normalized to the desired unit. The dimension type of the normalized logical metadata field is subjected to hierarchical reasoning through the OWL DL inference engine to obtain the compatibility result between the dimension type and the expected dimension type in the SHACL constraint shape; Based on historical mass spectrometry data and compatibility results, the normal numerical distribution of each logical metadata field is learned through the kernel density estimation algorithm, dynamic numerical range constraints are generated, and updated to the SHACL constraint shape. By combining the retention time and peak shape characteristics of the logical metadata fields, a multi-field joint constraint is constructed. Based on the multi-field joint constraint, it is determined whether the normalized value is within the dynamic value range constraint and satisfies the joint constraint. 
The verification result is obtained, and the SHACL constraint version and reasoning basis are recorded in the knowledge graph.

2. The method based on conditional semantic constraints and unit / dimensional gates according to claim 1, characterized in that, The methods for obtaining the update field and update relationship include: Obtain the logical metadata fields of multi-source mass spectrometry data. Based on the logical metadata fields, extract entities and relations according to the preset mass spectrometry domain ontology to obtain a candidate set of triples to be completed. Input the candidate set of triples into the RotatE model. Based on the RotatE model, calculate the completion score of the candidate triples through relation rotation operations in the complex vector space and in combination with the mass spectrometry domain feature weight matrix. Filter out invalid triples with scores lower than the preset threshold to obtain the entities and relations to be added. If a new entity or relation is detected, the new entity is classified into the corresponding concept level of the preset mass spectrometry domain ontology, the new relation is mapped to the object attribute in the ontology, and the axiom constraints and hierarchical structure of the ontology are updated. Based on the updated ontology, a mass spectrometry-specific dimensional subclass is defined on the basis of the mass spectrometry QUDT dimensional library according to the new fields and new relations, and a bidirectional mapping relationship between the new unit and the extended dimension is constructed. If the correlation degree of the bidirectional mapping objects is greater than or equal to 85%, the dimension library expansion is completed.

3. The method based on conditional semantic constraints and unit / dimensional gates according to claim 1, characterized in that, The method for obtaining the dimensional equivalence diagram includes: Based on the updated mass spectrometry ontology, extract the conditional constraints of the logical metadata fields corresponding to the classes, generate SHACL NodeShapes for the logical metadata fields according to the conditional constraints, set the target class according to the concept class of the corresponding field in the ontology based on the SHACL NodeShapes, add PropertyShape constraints according to the field attributes, and supplement the corresponding SHACL NodeShapes with mass spectrometry-specific constraints. The conditional constraints include the expected unit, expected dimension type, data type and numerical range, and the mass spectrometry-specific constraints include positive real numbers, non-negative peak intensity and retention time matching experimental intervals. Based on the concentration, mass-to-charge ratio, and relative abundance of the logical metadata fields, the dimensional information is obtained by matching the exclusive dimensional subclass with the mass spectrometry-specific QUDT dimensional library. The dimensional information is then associated with the expected unit in the SHACL constraint. If the dimensional information is inconsistent with the ontology field definition, it is corrected through a two-way mapping relationship between units and dimensions. The dimensional information includes dimensional type, unit symbol, and dimensional exponent. Using the units in the mass spectrometry-specific QUDT dimensional library as nodes, add edges to each pair of units with a transformation relationship, assign compatibility weights to the edges based on the dimensional hierarchy, mark unit pairs with incompatible dimensions, remove invalid edges, and obtain a dimensional equivalence graph. 
If the dimensional exponents of the transformation paths in the dimensional equivalence graph are inconsistent, then correct the transformation coefficients or reconstruct the corresponding edges. The units include the original units, the expected units, and the intermediate conversion units. The edge attributes include conversion coefficients, conversion formulas, and dimensional compatibility relationships. The dimensional compatibility relationships include direct compatibility and subclass compatibility. The dimensional compatibility degree is inversely proportional to the edge weight.

4. The method based on conditional semantic constraints and unit / dimensional gates according to claim 1, characterized in that, The method for obtaining the normalized logical metadata field includes: The original unit identifier is extracted from the metadata attributes of the logical metadata field. Based on the original unit identifier, the dimensional equivalence graph is queried to determine whether the corresponding unit is a valid node. If the corresponding unit does not exist, the unit mapping is corrected through the two-way mapping relationship between the unit and the dimension, and the corrected unit is taken as the valid original unit. Using the valid original unit as the starting point and the desired unit in the SHACL constraint as the ending point, the path search space, path length normalization coefficient, single-step conversion error coefficient, initial value of the confidence decay factor, and weight coefficients of the dimensional equivalence graph are initialized based on the starting and ending points. The dimensional equivalence graph is then traversed using the unit conversion path optimization algorithm to calculate the total cost of each feasible path, thereby obtaining the optimal conversion path. The total cost is calculated by a formula (not reproduced in this text) whose terms are, in order: the total cost; the optimal conversion path from the original unit to the desired unit, which is the path with the minimum total cost; the path length weight; the ratio of the number of hops along the path to the maximum allowed number of hops; the error weight; the number of nodes in the path; the conversion error coefficient from one step to the next; the percentage of data confidence maintained after each conversion step; the confidence weight; and the overall confidence level of the path. The numerical conversion coefficients and conversion formulas are obtained according to the optimal conversion path. Based on the numerical conversion coefficients and conversion formulas, unit conversion is applied to the original values of the logical metadata fields. If the conversion involves concentration and solubility, the experimental temperature and pressure parameters are extracted from the context information of the logical metadata fields. The conversion results are corrected according to the experimental temperature and pressure parameters through the temperature-pressure coupling correction formula to obtain the normalized values and units.

5. The method based on conditional semantic constraints and unit / dimensional gates according to claim 1, characterized in that, The method for obtaining the compatibility result includes: The current dimension type is extracted based on the normalized logical metadata field, the desired dimension type is extracted based on the SHACL constraint shape, the updated mass spectrometry domain ontology is loaded into the OWL DL inference engine, and the current dimension type and desired dimension type are aligned to the dimension class nodes in the ontology. Based on the OWL DL inference engine, the hierarchy of dimension classes is traversed, and the shortest hierarchical distance between the current dimension type and the desired dimension type is calculated. If the current dimension type and the desired dimension type are the same class, the shortest hierarchical distance is zero. If the current dimension type is a direct subclass of the desired dimension type, the shortest hierarchical distance and the subclass affiliation are both 1. If it is an indirect subclass, the shortest hierarchical distance increases with each level, and the subclass affiliation remains 1. If there is no inheritance relationship, the shortest hierarchical distance is a preset value, and the subclass affiliation is zero. If the current dimension type and the desired dimension type share a direct parent class and have no subclass relationship with each other, the subclass affiliation is set to 0.5. The overall compatibility score is calculated from the shortest hierarchical path length, attribute matching degree, and subclass affiliation degree by a formula (not reproduced in this text) whose terms are, in order: the overall compatibility score of the current dimension type and the desired dimension type; the shortest hierarchical distance; the attribute matching weight; the attribute matching degree; and the subclass affiliation degree. The compatibility score and reasoning path are recorded in the knowledge graph to obtain the compatibility result.

6. The method based on conditional semantic constraints and unit / dimensional gates according to claim 1, characterized in that, The method for obtaining the dynamic numerical range constraint includes: Historical samples matching the dimension type in the current logical metadata fields are extracted from historical mass spectrometry data. High-compatibility samples are then selected from the historical samples using ROC curves based on the compatibility results. Based on the set of high-compatibility samples, the normal numerical probability density distribution of each logical metadata field is fitted using a kernel density estimation algorithm. The probability density function is calculated by a formula (not reproduced in this text) whose terms are, in order: the value of the numerical probability density function of the current logical metadata field; the target value to be evaluated; the overall compatibility score; the set of experimental context conditions for the target value to be evaluated; the high-compatibility sample set; the kernel density estimation bandwidth; the value of each high-compatibility sample; the kernel function; and the context weight of each historical sample, which is obtained from the Euclidean distance between the current context and the context of that historical sample. The dynamic numerical range of each logical metadata field is calculated based on the cumulative distribution function of the probability density function by a formula (not reproduced in this text) whose terms are, in order: the dynamic numerical range; the lower limit of the dynamic numerical range; the upper limit of the dynamic numerical range; the optimal-solution operator; the cumulative distribution function; and the lowest effective threshold for the probability density. The dynamic numerical range is used as a numerical constraint.
The numerical constraint is written to the SHACL constraint shape of the corresponding logical metadata field, and the information associated with the constraint update is synchronously recorded in the knowledge graph. The experimental context condition set includes experimental temperature, pressure, instrument model, and sample type. The kernel density estimation bandwidth is inversely proportional to the overall compatibility score.
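The context-weighted kernel density estimate and the range derived from its cumulative distribution function can be sketched in pure Python as follows. Several choices here are assumptions rather than details from the claim: a Gaussian kernel, weights that decay as `exp(-distance)` with the Euclidean context distance, contexts already encoded as numeric vectors (instrument model and sample type would need an encoding), and a central coverage interval used in place of the claim's threshold-based range formula. Only the inverse relation between bandwidth and compatibility score is taken directly from the text.

```python
import math


def gaussian_kernel(u: float) -> float:
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def bandwidth(base_h: float, score: float) -> float:
    """Bandwidth inversely proportional to the compatibility score (base_h is hypothetical)."""
    return base_h / score


def context_weights(current_ctx, sample_ctxs):
    """Normalized weights from Euclidean context distance (assumed exponential decay)."""
    raw = [math.exp(-math.dist(current_ctx, ctx)) for ctx in sample_ctxs]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_kde(x, values, weights, h):
    """Weighted kernel density estimate at x; weights are assumed to sum to 1."""
    return sum(w * gaussian_kernel((x - v) / h) for v, w in zip(values, weights)) / h


def dynamic_range(values, weights, h, coverage=0.95, grid_points=2001):
    """Central interval read off a numerically integrated CDF of the weighted KDE."""
    start = min(values) - 5.0 * h
    stop = max(values) + 5.0 * h
    step = (stop - start) / (grid_points - 1)
    xs = [start + i * step for i in range(grid_points)]
    dens = [weighted_kde(x, values, weights, h) for x in xs]
    # Trapezoidal cumulative integral, renormalized so the CDF ends at 1.
    cdf = [0.0]
    for i in range(1, grid_points):
        cdf.append(cdf[-1] + 0.5 * (dens[i - 1] + dens[i]) * step)
    total = cdf[-1]
    cdf = [c / total for c in cdf]
    tail = (1.0 - coverage) / 2.0
    lower = next(x for x, c in zip(xs, cdf) if c >= tail)
    upper = next(x for x, c in zip(xs, cdf) if c >= 1.0 - tail)
    return lower, upper
```

In this sketch a lower compatibility score widens the bandwidth, which in turn widens the accepted range, so less trusted sample sets yield looser constraints.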

7. The method based on conditional semantic constraints and unit / dimensional gates according to claim 1, characterized in that the method for obtaining the verification result includes: extracting retention time and peak shape features from the normalized logical metadata fields; based on the retention time and peak shape features, extracting the corresponding constraint rules from the updated SHACL constraint shape to obtain the multi-field joint constraints; checking the normalized values against the multi-field joint constraints and the dynamic numerical range constraints to obtain the verification link and the verification result; if the verification passes, recording the reason for success, and if it fails, recording the reason for failure; and, based on the verification link and the logical metadata field associated with the failure reason, recording the verification result, the SHACL constraint shape version, the reasoning basis of the compatibility result, and the verification link in the knowledge graph.
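The verification flow in this claim (range check, joint rules, recorded reasons, and a verification link) can be illustrated with a short Python sketch. The record field names, the rule format, and the example rule relating peak width to retention time are hypothetical; in the claimed method the rules come from the SHACL constraint shape and the output is written to the knowledge graph rather than returned as a dictionary.

```python
def verify(record, value_range, joint_rules, shape_version):
    """Check one normalized record and return a result with its verification link.

    record       -- dict of normalized fields (field names are hypothetical)
    value_range  -- (lower, upper) dynamic numerical range for record["value"]
    joint_rules  -- list of (rule_name, predicate) pairs over the whole record
    """
    link = []     # ordered trace of every check performed
    reasons = []  # human-readable reasons for success or failure

    lower, upper = value_range
    in_range = lower <= record["value"] <= upper
    link.append(("dynamic_range", in_range))
    reasons.append(
        f"value {record['value']} "
        f"{'within' if in_range else 'outside'} [{lower}, {upper}]"
    )

    for name, predicate in joint_rules:
        ok = bool(predicate(record))
        link.append((name, ok))
        reasons.append(f"joint rule '{name}' {'satisfied' if ok else 'violated'}")

    return {
        "passed": all(ok for _, ok in link),
        "reasons": reasons,
        "verification_link": link,
        "shape_version": shape_version,
    }


# Hypothetical joint rule: peak width must not exceed 10% of retention time.
EXAMPLE_RULES = [
    ("peak_width_vs_retention_time",
     lambda r: r["peak_width_s"] <= 0.1 * r["retention_time_s"]),
]
```

Recording every check, including the ones that pass, keeps the success and failure reasons traceable to a specific constraint version.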

8. A system based on conditional semantic constraints and unit / dimensional gates, for executing the method based on conditional semantic constraints and unit / dimensional gates as described in any one of claims 1 to 7, characterized in that the system includes: a data acquisition module, used to acquire logical metadata fields of multi-source mass spectrometry data, incrementally update the preset mass spectrometry domain ontology based on the logical metadata fields using a knowledge graph completion algorithm, and expand the QUDT dimensional library according to the updated fields and updated relationships; a dimensional information acquisition module, used to derive the SHACL constraint shape from the updated mass spectrometry domain ontology, generate the mass spectrometry-specific QUDT dimensional information corresponding to the logical metadata fields, and construct a dimensional equivalence graph based on the mass spectrometry-specific QUDT dimensional information; a desired unit conversion module, used to extract the original unit from the logical metadata field, solve the optimal conversion path from the original unit to the desired unit in the SHACL constraint shape through a unit conversion path optimization algorithm based on the dimensional equivalence graph, and normalize the unit in the logical metadata field to the desired unit;
a compatibility result inference module, used to perform hierarchical inference on the dimension type of the normalized logical metadata field using the OWL DL inference engine to obtain the compatibility result between that dimension type and the expected dimension type in the SHACL constraint shape; a numerical distribution constraint module, used to learn the normal numerical distribution of each logical metadata field from historical mass spectrometry data and the compatibility results through a kernel density estimation algorithm, generate dynamic numerical range constraints, and update them to the SHACL constraint shape; and a conditional semantic constraint verification module, used to construct multi-field joint constraints by combining the retention time and peak shape features of the logical metadata fields, determine whether the normalized value lies within the dynamic numerical range constraint and satisfies the multi-field joint constraints, obtain the verification result, and record the SHACL constraint shape version and the reasoning basis in the knowledge graph.
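The desired unit conversion module searches the dimensional equivalence graph for an optimal conversion path. A minimal Python sketch of that idea, using Dijkstra's algorithm, is shown below. The edge format `(source, target, factor, cost)`, the notion of a per-edge cost standing in for accuracy loss, and the example units and costs are assumptions for illustration; the claim does not specify the optimization algorithm or the cost model.

```python
import heapq


def best_conversion_path(edges, source, target):
    """Return (path, factor, cost) for the cheapest unit conversion, or None.

    edges -- iterable of (src_unit, dst_unit, factor, cost), meaning
             value_in_dst = value_in_src * factor; cost is a hypothetical
             accuracy-loss penalty. Each edge is also usable in reverse.
    """
    graph = {}
    for src, dst, factor, cost in edges:
        graph.setdefault(src, []).append((dst, factor, cost))
        graph.setdefault(dst, []).append((src, 1.0 / factor, cost))

    # Heap entries: (accumulated cost, unit, accumulated factor, path so far).
    heap = [(0.0, source, 1.0, [source])]
    settled = set()
    while heap:
        cost, unit, factor, path = heapq.heappop(heap)
        if unit == target:
            return path, factor, cost
        if unit in settled:
            continue
        settled.add(unit)
        for nxt, step_factor, step_cost in graph.get(unit, []):
            if nxt not in settled:
                heapq.heappush(
                    heap,
                    (cost + step_cost, nxt, factor * step_factor, path + [nxt]),
                )
    return None  # the two units are not connected, so the dimensions are incompatible


# Hypothetical time-unit edges; the direct min -> ms edge is given a higher cost.
EXAMPLE_EDGES = [
    ("min", "s", 60.0, 1.0),
    ("s", "ms", 1000.0, 1.0),
    ("min", "ms", 60000.0, 3.0),
]
```

With these example costs, `best_conversion_path(EXAMPLE_EDGES, "min", "ms")` prefers the two-step route through seconds over the direct edge, and a unit with no connection in the graph yields `None`, which can serve as the unit gate's rejection signal.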