Nucleic acid quantification admission determination method and system based on sequencing response feature auditing

A nucleic acid quantification admission determination method based on sequencing response feature auditing addresses the systemic failure of nucleic acid quantification in complex matrix samples. The method proactively identifies distortion in the initial physical quantification and defines the boundaries of the dynamic range, thereby improving the industrial rigor and clinical reliability of nucleic acid quantification results.

CN122201445A · Pending · Publication Date: 2026-06-12 · GUANGDONG MEIGE GENE TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
GUANGDONG MEIGE GENE TECH CO LTD
Filing Date
2026-03-05
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing nucleic acid quantification methods have several drawbacks when applied to complex matrix samples:
- errors in the initial physical quantification;
- a structural loss of independence in verification dimensions that depend on physical mass benchmarks;
- simultaneous distortion of both detection channels caused by co-source interference from the complex matrix;
- no automatic detection or circuit breaker mechanism for logic collapse.

Together these create a risk of systemic failure.

Method used

A nucleic acid quantification admission determination method based on sequencing response feature auditing is adopted. It combines an exogenous nucleic acid internal standard cluster, a sequencing response feature auditing mechanism, real-time auditing logic and dynamic range boundary definition. The ratio of the sample's raw sequencing yield to the system baseline yield is calculated as the audit offset vector. This enables active identification and penetration of distortion in the initial physical quantification.

Benefits of technology

It significantly improves the industrial rigor and clinical reliability of nucleic acid quantification results, and identifies and intercepts false quantitative data through a full-chain quality control barrier, ensuring the authenticity of the starting point of quantitative analysis and the accuracy of the results.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure FT_1
  • Figure FT_2
  • Figure FT_3

Abstract

The application discloses a nucleic acid quantification admission determination method and system based on sequencing response feature auditing. It belongs to the technical fields of high-throughput sequencing, biomonitoring and clinical molecular detection.

Absolute quantitative sequencing of complex matrices commonly suffers from industry pain points such as distortion of the initial physical load and ineffective external verification. To address them, the application proposes a real-time auditing and monitoring system that relies on absolute mass feedback from the internal standard and is independent of the quantitative calculation logic. By constructing an asymmetric decision matrix, the system automatically detects samples whose quantification logic has collapsed and circuit-breaks their processing. This significantly improves the robustness and data reliability of absolute quantification across different media.

Description

Technical Field

[0001] This invention relates to the fields of high-throughput sequencing, biomonitoring and clinical molecular detection. In particular, it relates to a nucleic acid quantification admission determination method and system based on sequencing response feature auditing.

Background Technology

[0002] In the field of absolute metagenomic quantification, the currently accepted standard method is to spike artificially synthesized sequences into the sample as internal standards.

The qmNGS framework proposed by Li et al. constructs a system-level quantitative model based on sequencing yield ($Y_{seq}$). It established the industry paradigm of using exogenous multi-gradient internal standards as quantitative benchmarks.

Subsequent technologies, represented by Tianhao's patent (CN118389641B), are essentially sample-level fitting models derived from the Li (2021) framework. Their core improvement is greater computational accuracy at the mathematical level, achieved by optimizing the linear regression over multi-gradient internal standards within a single sample.

However, through in-depth research and large-scale engineering applications, the inventors found that existing quantitative systems still carry the following systemic risks when facing complex real-world samples, because of "audit deficiencies":

[0003] However, existing research and inventions are mostly based on small numbers of clean samples and laboratory validation. Subsequent in-depth research and large-scale engineering applications have revealed that the existing "sample model" fitting logic still faces the risk of systemic failure when dealing with real-world complex matrices (such as soil, sewage, and clinical swabs) due to "missing dimensions" and "missing audit loops," exhibiting the following shortcomings:

[0004] 1. Inability to identify and intercept "initial physical quantification lies":

[0005] Current technologies assume that DNA concentrations measured by physical devices, such as a Qubit fluorometer or a spectrophotometer, are absolutely accurate. However, complex matrices such as soil, feces and clinical secretions commonly contain non-nucleic acid fluorescent interfering substances or free nucleotides from highly degraded DNA.

These impurities are "detectable in the physical dimension but unsequenceable in the biochemical dimension". They can significantly inflate physical quantification readings, yet cannot be sequenced because they lack a complete molecular structure.

Existing systems rely blindly on the initial sample loading mass $m_{sample}$ reported by the front-end physical device (such as Qubit). In industrialized constant-load library construction processes (for example, a fixed 300 ng input), the physical equipment cannot recognize "non-biomass loads" such as the fluorescent background of non-nucleic acid impurities.

Because of this "denominator lie", even a "sample model" whose standard curve fits with R² > 0.99 rests on a wrong calculation basis. The output is misleading data with a systematic shift.

[0006] 2. Structural loss of independence of verification dimensions due to dependence on physical mass benchmarks:

[0007] In existing technological systems, deviations in the initial physical load are typically addressed by external validation with molecular detection methods such as qPCR. However, this validation depends computationally on physical mass benchmarks, which makes it difficult to establish a truly independent audit dimension.

[0008] qPCR results are typically expressed as volume concentration (copies/μL) or per unit of reaction input volume. Industrial NGS processes, by contrast, control loading by absolute DNA mass (ng). In practical engineering applications, comparing and cross-verifying the two therefore requires converting volume concentration into copy number per unit mass:

[0009] Copies / ng = (Copies / μL) ÷ (ng / μL)

[0010] The denominator ng / μL usually comes from physical quantitative instruments such as Qubit or spectrophotometer.

[0011] When the physical mass measurements carry a systematic bias, the conversion formula passes this bias on to the validation results. The molecular detection results thus become mathematically dependent on the physical denominator. Even if qPCR amplification efficiency is normal, the converted per-mass value may still shift in step with the bias.
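The coupling described in [0009] to [0011] can be illustrated with a minimal Python sketch. The numbers are hypothetical and are not measurements from this application. They show how a bias in the physical ng/µL denominator carries into the converted copies/ng value, even though the qPCR reading itself is unchanged.

```python
def copies_per_ng(copies_per_ul: float, ng_per_ul: float) -> float:
    """Copies/ng = (Copies/uL) / (ng/uL); the ng/uL denominator comes from a
    physical instrument such as a fluorometer or spectrophotometer."""
    if ng_per_ul <= 0:
        raise ValueError("physical mass concentration must be positive")
    return copies_per_ul / ng_per_ul


# Hypothetical numbers, not measurements from this application.
qpcr_copies_per_ul = 1000.0  # qPCR result, assumed unaffected by the bias
true_ng_per_ul = 2.0         # "true" nucleic acid mass concentration
inflation = 2.0              # matrix background doubles the physical reading

true_value = copies_per_ng(qpcr_copies_per_ul, true_ng_per_ul)
biased_value = copies_per_ng(qpcr_copies_per_ul, true_ng_per_ul * inflation)

# The qPCR reading is identical in both cases, yet the per-mass value moves by
# exactly 1 / inflation: the bias in the denominator propagates unchanged.
print(true_value, biased_value)  # prints: 500.0 250.0
```

Nothing in the converted value indicates which of the two denominators was the true one. That is the sense in which this validation path is not independent of the physical reading.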

[0012] Therefore, within the existing technological framework, there is a structural coupling between molecular verification results and physical mass readings, making it difficult to achieve independent reverse verification of the initial loading's authenticity. The existing system lacks a quantitative auditing path that does not rely on the physical mass denominator.

[0013] 3. Synchronization distortion of dual detection channels caused by co-source interference from complex matrices.

[0014] Complex matrices such as soil, sewage, sludge, and clinical samples often contain free nucleotides, salt ions, protein residues, and other non-nucleic acid background components.

[0015] This type of matrix component may affect both types of detection channels simultaneously through different mechanisms:

[0016] (1) At the physical detection level, some components may bind fluorescent dyes or generate background signals, affecting the accuracy of DNA mass readings;

[0017] (2) At the molecular detection level, the above components may inhibit polymerase activity or affect amplification kinetics, thereby reducing the efficiency of qPCR or library preparation reaction.

[0018] When the same type of matrix factor acts on both physical and molecular detection, a "co-source interference" phenomenon may occur, that is, the two types of detection results may shift synchronously or in opposite directions, thereby weakening their cross-validation ability.

[0019] In this situation, even if a dual monitoring path of physical detection and molecular detection is set up, it may still be difficult to identify the true source of load shift through conventional comparison because the sources of interference are common.

[0020] Existing technologies lack independent identification and isolation mechanisms for common-source interference in complex matrices.

[0021] 4. Lack of automatic detection and circuit breaker mechanisms for "logic collapse":

[0022] Existing quantitative systems (such as Tianhao CN118389641B) remain limited to a "passive calculation" mode and lack proactive quality control decision-making capabilities. Samples may show the initial physical load distortion described above, or severely insufficient effective biomass (e.g., a nominal load of 300 ng but only 15.9 ng of actual DNA). Current technologies cannot recognize this "collapse of the underlying logic".

Without a real-time monitoring system independent of the calculation process, the system still forcibly outputs misleading data with drastic statistical fluctuations. It also cannot triage risk based on the offset vector fed back by the internal standard.

This "decision vacuum" prevents real-time interception of failed samples in industrial processes. It seriously undermines the industrial rigor of absolute quantitative detection and the clinical reliability of its results.

[0023] Therefore, there is an urgent need for a nucleic acid quantification admission determination method and system based on sequencing response feature auditing, which has real-time auditing logic, initial load correction and dynamic range boundary definition functions, in order to solve the above-mentioned technical problems.

Summary of the Invention

[0024] The purpose of this invention is to provide a nucleic acid quantification admission determination method and system based on sequencing response feature auditing, which has real-time auditing logic, initial load correction and dynamic range boundary definition functions.

[0025] To achieve the above objectives, the technical solution adopted by the present invention is as follows:

[0026] This invention provides a nucleic acid quantification admission determination method based on sequencing response feature auditing, comprising:

[0027] S1: Sample processing and initial mass calibration. Nucleic acid is extracted from test samples of different matrix types. The apparent mass of total nucleic acid in each sample is obtained by physical detection methods. An exogenous nucleic acid internal standard cluster is then added at an internal standard mass ratio $f_{IS}$, with a feasible range of 0.01%-1%. The cluster consists of M artificially synthesized double-stranded DNA sequences;

[0028] S2: Library construction and sequencing. The sample DNA and internal standard mixture are used to construct a library. The qualified library is sequenced to obtain raw sequencing data.

[0029] S3: Sequencing data processing and base counting. After quality control and sequence alignment of the raw sequencing data, the following base counts are obtained:
- the total number of effective sequencing bases $n_{TOT,s}$;
- the number of bases covered by the internal standard $n_{IS,s}$;
- the number of bases covered by the target gene $n_{G,s}$.

The internal standard coverage ratio $B_{IS,s}$ and the target gene coverage ratio $B_{G,s}$ are then calculated;
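The S3 coverage ratios can be sketched in a few lines. The text does not spell out the definition, so the sketch assumes that each ratio is the covered base count divided by the total effective base count, i.e. $B_{IS,s} = n_{IS,s}/n_{TOT,s}$ and $B_{G,s} = n_{G,s}/n_{TOT,s}$. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BaseCounts:
    """Base counts for one sample s after QC and alignment."""
    n_tot: int  # n_TOT,s: total effective sequencing bases
    n_is: int   # n_IS,s: bases covered by the internal standard cluster
    n_g: int    # n_G,s: bases covered by the target gene


def coverage_ratios(c: BaseCounts) -> tuple[float, float]:
    """Return (B_IS,s, B_G,s).

    ASSUMPTION: each coverage ratio is covered bases / total effective bases;
    the source names B_IS,s and B_G,s without giving this formula explicitly.
    """
    if c.n_tot <= 0:
        raise ValueError("n_TOT,s must be positive")
    if c.n_is > c.n_tot or c.n_g > c.n_tot:
        raise ValueError("covered bases cannot exceed total effective bases")
    return c.n_is / c.n_tot, c.n_g / c.n_tot


# Hypothetical counts for one sample.
b_is, b_g = coverage_ratios(BaseCounts(n_tot=3_000_000_000, n_is=3_000_000, n_g=150_000))
```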

[0030] S4: Raw sequencing yield calculation and deviation audit index establishment. Based on the internal standard coverage obtained from sequencing and on the initial addition ratio, the raw sequencing yield of the sample is calculated as $Y_{seq,s} = B_{IS,s} / f_{IS}$. The system baseline yield $Y_{ref}$ is retrieved, and the audit offset vector is calculated as $\eta = Y_{seq,s} / Y_{ref}$;
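The two S4 formulas translate directly into code. The sketch is illustrative only. Its inputs, including the placeholder baseline $Y_{ref} = 1.0$, are hypothetical and are not taken from the application.

```python
def raw_sequencing_yield(b_is: float, f_is: float) -> float:
    """Y_seq,s = B_IS,s / f_IS."""
    if not 0.0 < f_is < 1.0:
        raise ValueError("f_IS must be a fraction in (0, 1)")
    return b_is / f_is


def audit_offset(y_seq: float, y_ref: float) -> float:
    """eta = Y_seq,s / Y_ref (the audit offset vector)."""
    if y_ref <= 0:
        raise ValueError("Y_ref must be positive")
    return y_seq / y_ref


# Hypothetical inputs: f_IS = 0.1 % and a placeholder baseline Y_ref.
f_is = 0.001
b_is = 0.0012
y_ref = 1.0
y_seq = raw_sequencing_yield(b_is, f_is)
eta = audit_offset(y_seq, y_ref)
print(f"Y_seq,s = {y_seq:.3f}, eta = {eta:.3f}")  # prints: Y_seq,s = 1.200, eta = 1.200
```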

[0031] S5: Asymmetric audit admission determination based on η. A self-consistency threshold T1 and a circuit breaker threshold T2 are preset. The sample is classified into one of three levels according to the value range of η:
- high-confidence admission;
- risk admission and correction guidance;
- logic collapse circuit breaker.

The corresponding automated decision action is then executed.

[0032] Furthermore, in step S1, the initial mass calibration includes the following:
- The double-stranded DNA concentration $C_{ph}$ of the sample is determined by a fluorescent dye method.
- Impurity content is assessed from the A260/280 and A260/230 ratios determined by absorbance.
- The initial total physical mass is calculated as $m_E = C \times V$.
- The instrument is used only after at least two-point calibration with standard materials.
- The system synchronously records the sample matrix type and the A260/230 ratio. If A260/230 < 1.0, the sample is marked as a high impurity risk sample and linked to the subsequent judgment logic in S4 to help interpret the offset of η.

[0033] Furthermore, the double-stranded DNA sequence fragment of the exogenous nucleic acid internal standard cluster in S1 has a length range of 150-1500 bp, a GC content controlled at 40%-60%, and a purity ≥95% after HPLC purification. Moreover, the internal standard sequence has unique identifiability in the genomic background of the sample to be tested.

[0034] Furthermore, the system baseline yield $Y_{ref}$ in S4 is the arithmetic mean of $Y_{seq}$ obtained from multiple parallel determinations of standard mock DNA samples. These determinations are made under ideal experimental conditions, free of matrix inhibition and physical quantification interference.
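Under the definition in [0034], $Y_{ref}$ is simply a mean over replicate mock-DNA yields. The sketch uses Python's standard `statistics.mean`. The replicate values are hypothetical, and the requirement of at least two replicates is an assumption for illustration.

```python
from statistics import mean


def system_baseline_yield(mock_yields: list[float]) -> float:
    """Y_ref: arithmetic mean of Y_seq over parallel mock-DNA determinations.

    The mock runs are assumed to come from ideal conditions (no matrix
    inhibition, no physical quantification interference).
    """
    if len(mock_yields) < 2:
        raise ValueError("Y_ref needs multiple parallel determinations")
    if any(y <= 0 for y in mock_yields):
        raise ValueError("yields must be positive")
    return mean(mock_yields)


# Hypothetical replicate yields from mock DNA runs.
y_ref = system_baseline_yield([0.97, 1.01, 1.03, 0.99])
```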

[0035] Furthermore, in S5,

[0036] High-confidence admission applies when η lies within the range $T_{1low} \le \eta \le T_{1high}$. The system then determines that the initial physical quantification and the biochemical yield are highly consistent, automatically releases the data and directly outputs the quantification results;

[0037] Risk admission and correction guidance applies when $T_2 < \eta < T_{1low}$ or $T_{1high} < \eta < T_{max}$. The system then initiates the load reconfiguration process and calculates the reference effective DNA load as $m_{ref} = m_E / \eta$. It then performs a deep reconstruction calculation based on multidimensional error weights;

[0038] The logic collapse circuit breaker applies when $\eta < T_2$ or $\eta > T_{max}$. The system forcibly intercepts the data output stream, issues a warning and suggests dilution and retesting.
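The three-level decision of [0036] to [0038] can be sketched as a small classifier. Several choices here are assumptions:
- The threshold values in the example are purely illustrative, since the application does not fix them here.
- Values exactly equal to $T_2$ or $T_{max}$ are sent to the circuit breaker, because the stated intervals are open at those points.
- The "deep reconstruction calculation based on multidimensional error weights" is not specified in this section and is not modeled. Only $m_{ref} = m_E / \eta$ is implemented.

```python
from enum import Enum


class Admission(Enum):
    HIGH_CONFIDENCE = "high-confidence admission"
    RISK_CORRECTION = "risk admission and correction guidance"
    CIRCUIT_BREAKER = "logic collapse circuit breaker"


def admit(eta: float, t2: float, t1_low: float, t1_high: float, t_max: float) -> Admission:
    """Three-level asymmetric admission decision on the audit offset eta.

    ASSUMPTION: eta exactly equal to T2 or T_max goes to the circuit breaker
    (conservative choice, since the source intervals are open there).
    """
    if not 0 < t2 < t1_low <= t1_high < t_max:
        raise ValueError("expected 0 < T2 < T1low <= T1high < Tmax")
    if t1_low <= eta <= t1_high:
        return Admission.HIGH_CONFIDENCE
    if t2 < eta < t_max:  # outside [T1low, T1high] but inside (T2, Tmax)
        return Admission.RISK_CORRECTION
    return Admission.CIRCUIT_BREAKER


def reference_effective_load(m_e: float, eta: float) -> float:
    """m_ref = m_E / eta, the reference effective DNA load in the risk band."""
    if eta <= 0:
        raise ValueError("eta must be positive")
    return m_e / eta


# Illustrative thresholds only; the application leaves their values open here.
thresholds = dict(t2=0.5, t1_low=0.8, t1_high=1.25, t_max=2.0)
for eta in (1.0, 0.6, 1.6, 0.3, 2.4):
    print(eta, admit(eta, **thresholds).value)
```

The band around 1 need not be symmetric. $T_{1low}$ and $T_{1high}$, and likewise $T_2$ and $T_{max}$, are passed separately so that deviations below and above the baseline can be tolerated differently.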

[0039] Furthermore, in S2, library construction includes end repair, adapter ligation, PCR amplification, magnetic bead purification of the library and library quality control. Quality control requires a library fragment size of 200-500 bp and a Qubit-quantified library concentration of ≥1 nM. The library is loaded for sequencing at 1.0 nM, and the minimum sequencing depth is ≥10 M reads.

[0040] Furthermore, the quality control criteria for the raw sequencing data in S3 are as follows:
- remove adapter sequences, low-quality bases with Q < 20 and short fragments < 50 bp;
- for sequence alignment, the mismatch limit is ≤ 2 and the similarity is ≥ 90%;
- only uniquely aligned reads are retained for base counting.
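The S3 quality control criteria can be expressed as simple read and alignment filters. This is a simplified sketch, not a replacement for standard trimming and alignment tools. It makes these assumptions:
- adapter removal has already been done upstream;
- low-quality bases are trimmed only from the 3' end;
- the "≤ 2" alignment limit is read as a per-read mismatch count.

The `Alignment` record is a hypothetical structure.

```python
from dataclasses import dataclass

Q_MIN = 20            # bases with Phred quality Q < 20 are removed
MIN_LEN = 50          # reads shorter than 50 bp after trimming are discarded
MAX_MISMATCH = 2      # ASSUMPTION: "<= 2" read as mismatches per aligned read
MIN_IDENTITY = 0.90   # alignment similarity must be >= 90 %


def trim_low_quality_3prime(seq: str, quals: list[int], q_min: int = Q_MIN) -> str:
    """Trim bases from the 3' end while their Phred quality is below q_min."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    end = len(seq)
    while end > 0 and quals[end - 1] < q_min:
        end -= 1
    return seq[:end]


def keep_read(seq: str, quals: list[int]) -> bool:
    """Keep a read only if it is still >= MIN_LEN bp after quality trimming."""
    return len(trim_low_quality_3prime(seq, quals)) >= MIN_LEN


@dataclass
class Alignment:
    """Hypothetical summary of one read's alignment."""
    mismatches: int
    identity: float   # fraction of aligned bases that match (0 to 1)
    is_unique: bool   # True if the read aligns to exactly one locus


def keep_alignment(a: Alignment) -> bool:
    """Count only uniquely aligned reads within the mismatch and similarity limits."""
    return a.is_unique and a.mismatches <= MAX_MISMATCH and a.identity >= MIN_IDENTITY
```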

[0041] Furthermore, the samples to be tested include solid samples, fecal samples, liquid samples, and filter membrane enriched samples. Nucleic acid extraction is performed on different matrix samples using corresponding dedicated nucleic acid extraction kits.

[0042] This invention also provides a nucleic acid quantification admission determination system based on sequencing response feature auditing, comprising:

[0043] The sample processing module performs sample processing and initial mass calibration. It extracts nucleic acids from test samples of different matrix types and obtains the apparent mass of total nucleic acids by physical detection methods. It then adds an exogenous nucleic acid internal standard cluster at an addition mass ratio $f_{IS}$, with a feasible range of 0.01%-1%. The cluster consists of M artificially synthesized double-stranded DNA sequences;

[0044] The sequencing module is used for library construction and sequencing. It constructs a library by mixing sample DNA with an internal standard, and sequences qualified libraries to obtain raw sequencing data.

[0045] The data processing module performs sequencing data processing and base counting. After quality control and sequence alignment of the raw sequencing data, it counts:
- the total number of effective sequencing bases $n_{TOT,s}$;
- the number of bases covered by the internal standard $n_{IS,s}$;
- the number of bases covered by the target gene $n_{G,s}$.

It then calculates the internal standard coverage ratio $B_{IS,s}$ and the target gene coverage ratio $B_{G,s}$;

[0046] The audit judgment module performs raw sequencing yield calculation and deviation audit index establishment. Based on the internal standard coverage obtained from sequencing and on the initial addition ratio, it calculates the raw sequencing yield of the sample as $Y_{seq,s} = B_{IS,s} / f_{IS}$. It retrieves the system baseline yield $Y_{ref}$ and calculates the audit offset vector as $\eta = Y_{seq,s} / Y_{ref}$;

[0047] The early warning module performs the asymmetric audit admission determination based on η. It presets the self-consistency threshold T1 and the circuit breaker threshold T2. It classifies the sample into one of three levels according to the value range of η:
- high-confidence admission;
- risk admission and correction guidance;
- logic collapse circuit breaker.

It then executes the corresponding automated decision action.

[0048] Compared with the prior art, the technical solution disclosed in this invention has the following beneficial effects:

[0049] Compared with the prior art, this invention provides a method and system with real-time auditing logic, initial load correction and dynamic range boundary definition. It introduces an auditing mechanism based on exogenous nucleic acid internal standard clusters and sequencing response characteristics. This lets it proactively identify and penetrate the "denominator lie" in initial physical quantification.

Specifically, the ratio η of the sample's raw sequencing yield $Y_{seq,s}$ to the system baseline yield $Y_{ref}$ serves as the audit offset vector. It effectively quantifies the risk of "logic collapse" caused by factors such as matrix inhibition and physical quantification errors.

This dynamic audit logic based on sequencing response breaks open the traditional "logic black box" that relies on physical readings or qPCR. It builds a full-chain quality control barrier from sample input to data output, significantly improving the industrial rigor and clinical reliability of nucleic acid quantification results.

Attached Figure Description

[0050] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0051] Figure 1 is a schematic diagram of the nucleic acid quantification admission determination method based on sequencing response feature auditing provided in an embodiment of the present invention;

[0052] Figure 2 is a schematic diagram illustrating the principle of the nucleic acid quantification admission determination method based on sequencing response feature auditing provided in an embodiment of the present invention;

[0053] Figure 3 is a schematic diagram of the correlation between the percentage of DNA mass added as internal standard and the percentage of matched bases in actual sequencing in Example 1 of the present invention;

[0054] Figure 4 is a bar chart comparing the theoretical and measured abundance of representative strains in Example 1 of the present invention;

[0055] Figure 5 shows the differences in quantitative curves between the conventional quantification method and the method of the present invention for high-interference matrix samples in Example 2 of the present invention. Its panels are comparisons before and after correction of C1, C4, T2 and W6;

[0056] Figure 6 is a schematic diagram comparing the theoretical and predicted absolute copy numbers of the target gene in different complex matrix samples in Example 3 of the present invention.

Detailed Implementation

[0057] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0058] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments.

[0059] This invention provides a method and system for nucleic acid quantification admission determination based on sequencing response feature auditing. It features real-time auditing logic, initial load correction and dynamic range boundary definition. By introducing exogenous nucleic acid internal standard clusters and a sequencing response feature auditing mechanism, it proactively identifies and penetrates the "denominator lie" in initial physical quantification.

Specifically, the ratio η of the sample's raw sequencing yield $Y_{seq,s}$ to the system baseline yield $Y_{ref}$ serves as the audit offset vector. It effectively quantifies the risk of "logic collapse" caused by factors such as matrix inhibition and physical quantification errors.

This dynamic audit logic based on sequencing response breaks open the traditional "logic black box" that relies on physical readings or qPCR. It builds a full-chain quality control barrier from sample input to data output, significantly improving the industrial rigor and clinical reliability of nucleic acid quantification results.

[0060] As shown in Figures 1-2, this embodiment of the invention provides a nucleic acid quantification admission determination method based on sequencing response feature auditing. The method calculates the offset vector between the real-time sequencing yield of the sample and the system baseline yield. It then performs a three-level admission determination on the sample, intercepting at the source false quantitative data caused by matrix inhibition.

[0061] Construction of an audit benchmarking system

[0062] Internal standard calibrator: The method uses exogenous nucleic acid fragments as calibrators. The calibrator is not limited to a specific nucleic acid sequence, but is characterized by: the sequence having unique identifiability in the genomic background of the sample to be tested, and being able to simulate the biochemical response characteristics of the nucleic acid in the sample to be tested during extraction, library construction, and sequencing.

[0063] Audit baseline parameter library: a set of system baseline yield parameters $Y_{ref}$ is preset. $Y_{ref}$ characterizes the baseline efficiency with which a unit mass of exogenous nucleic acid marker is converted into effective sequencing reads or sequenced bases (bp) under preset standard matrix conditions.

[0064] The detailed steps are as follows:

[0065] Step S1: Sample processing and initial mass calibration. Nucleic acids are extracted from samples of different matrix types, and the apparent mass of total nucleic acids is obtained by physical detection methods. An exogenous nucleic acid internal standard cluster (consisting of M artificially synthesized double-stranded DNA sequences) is then added at a mass ratio $f_{IS}$, with a feasible range of 0.01%-1%.

[0066] Sample pretreatment: Select a conventional pretreatment method based on the sample matrix type.

[0067] Solid samples (sediments, soil, sludge): Take an equal amount of wet weight sample and use an environmental sample DNA extraction kit (such as QIAGEN DNeasy PowerSoil Kit) for homogenization, lysis, centrifugation, and DNA extraction;

[0068] Fecal samples: take an equal volume of fecal homogenate and extract it using a fecal DNA extraction kit (such as the QIAGEN QIAamp PowerFecal Pro DNA Kit) to avoid excessive interference from host DNA;

[0069] Liquid samples (water, serum, tissue fluid, bacterial culture): Take a preset volume of sample and extract DNA by centrifugation or directly using a nucleic acid extraction kit (such as Thermo Fisher MagMAX Viral / Pathogen Nucleic Acid Isolation Kit);

[0070] Filter membrane enriched samples (filtered water samples, air samples): cut the filter membrane into pieces, then extract the DNA adsorbed on it using a filter membrane DNA extraction kit (such as the MO BIO PowerWater DNA Isolation Kit).

[0071] Initial mass calibration (establishment of physical quantification and expected load capacity)

[0072] This step obtains the apparent mass of total nucleic acids in the sample by physical detection methods. This value serves as the input for the subsequent audit judgment. The specific operation is as follows:

[0073] (1) Sample concentration determination: Based on the required level and purity of the extracted nucleic acid, at least one of the following physical methods shall be selected for concentration determination:

[0074] Fluorescent dye method (high-specificity detection): using a Qubit fluorometer or a similar micro-volume fluorescence quantification instrument, determine the double-stranded DNA concentration (ng/µL) of the extract with a dsDNA-specific binding dye (such as the Qubit dsDNA HS Assay). This value is defined as $C_{ph}$.

[0075] Absorbance method (purity-assisted detection): using a NanoDrop or UV spectrophotometer, measure the absorbance of the sample at 260 nm, 280 nm and 230 nm. The A260/280 and A260/230 ratios are recorded as physicochemical parameters for assessing the content of matrix impurities in the sample (such as proteins, phenols and humic acids).

[0076] (2) Initial total physical mass $m_E$: the initial physically quantified total mass is calculated from the measured concentration C and the final extract volume V as $m_E = C \times V$. In this invention, the value of $m_E$ serves as the basis for the internal standard addition ratio $f_{IS}$. It is also the key denominator parameter for subsequent absolute abundance conversion.

[0077] (3) Environment and instrument calibration: all physical quantification operations must be performed at a constant temperature of 20°C to 25°C. Before use, the instrument should be calibrated at no fewer than two points using a standard substance (such as Lambda DNA Standard). This keeps the systematic deviation of the physical readings within ±5%.

[0078] (4) Physicochemical index recording: the system synchronously records the matrix type (E) and the A260/230 ratio of the sample. If A260/230 < 1.0, the system automatically flags the sample in the background as a "high impurity risk sample". The flag is linked to the judgment logic of the subsequent step S4 to help explain a possible offset of η (for example, a falsely high physical quantification).
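Items (2) to (4) can be combined into a small record-keeping sketch. The function and field names are hypothetical. The ±5% check assumes that "systematic deviation" means the relative error of each standard reading against its certified value, which the text does not define precisely.

```python
from dataclasses import dataclass

HIGH_IMPURITY_A260_230 = 1.0  # A260/230 below this flags a high impurity risk
MAX_CAL_DEVIATION = 0.05      # physical readings must stay within +/-5 %


def initial_physical_mass(c_ng_per_ul: float, v_ul: float) -> float:
    """m_E = C x V, in ng, from the measured concentration and extract volume."""
    if c_ng_per_ul < 0 or v_ul <= 0:
        raise ValueError("concentration must be >= 0 and volume > 0")
    return c_ng_per_ul * v_ul


def calibration_ok(measured: list[float], certified: list[float]) -> bool:
    """At least two standard points, each within +/-5 % of its certified value.

    ASSUMPTION: systematic deviation is taken as per-point relative error.
    """
    if len(measured) < 2 or len(measured) != len(certified):
        raise ValueError("need at least two paired calibration points")
    return all(abs(m - c) / c <= MAX_CAL_DEVIATION for m, c in zip(measured, certified))


@dataclass
class PhysicalRecord:
    """Hypothetical per-sample record kept alongside m_E for the S4 audit."""
    matrix_type: str
    m_e_ng: float
    a260_230: float

    @property
    def high_impurity_risk(self) -> bool:
        return self.a260_230 < HIGH_IMPURITY_A260_230


record = PhysicalRecord("soil", initial_physical_mass(10.0, 50.0), a260_230=0.8)
print(record.m_e_ng, record.high_impurity_risk)  # prints: 500.0 True
```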

[0079] Internal Standard Preparation and Joining:

[0080] Preparation of internal standard mixture: Select M artificially synthesized double-stranded DNA sequences with fragment lengths ranging from 150 to 1500 bp and GC content controlled at 40% to 60%. Purify by HPLC (purity ≥ 95%), verify the molecular weight consistency with the designed sequence by mass spectrometry, and confirm the sequence accuracy by random sampling and sequencing (internal standard structure identification completed).

[0081] Internal standard addition ratio control: quantify the extracted sample DNA to the target mass (e.g., 300 ng, adjustable within 100-500 ng according to library construction needs). Calculate the amount of internal standard to add according to Formula 1. The internal standard addition mass ratio $f_{IS}$ has a feasible range of 0.01%-1%. The optimal value is 0.1%, which balances detection sensitivity against interference of the internal standard with the target sequences;

[0082] The internal standard addition mass ratio $f_{IS}$ is calculated by the following Formula 1:

[0083] Formula 1: $f_{IS} = \dfrac{m_{IS}}{m_E + m_{IS}} = \dfrac{m_{IS}}{m_{TOT}}$;

[0084] Where:
- $f_{IS}$ is the mass fraction of internal standard (IS) DNA in the mixture of sample DNA and internal standard;
- $m_E$ is the mass of DNA used for library construction after sample extraction and purification;
- $m_{IS}$ is the mass of exogenous internal standard DNA added to the sample;
- IS denotes the exogenous DNA internal standard as a whole. It is a mixture of artificially synthesized DNA fragments, is uniquely identifiable within the sample DNA, and serves as the reference standard for calibrating the recovery efficiency of the entire sequencing process;
- $IS_1 \ldots IS_M$ are the M exogenous DNA internal standard sequences contained in the mixture. M can be chosen freely without special restriction, and this embodiment does not limit the sequence structure, length, GC content or chemical modification of the internal standards;
- $m_{TOT} = m_E + m_{IS}$ is the total DNA mass after the sample DNA is mixed with the exogenous internal standard.
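Read Formula 1 as $f_{IS} = m_{IS}/(m_E + m_{IS})$, which is what the definitions above (with $m_{TOT} = m_E + m_{IS}$) imply. Rearranging gives the internal standard mass to add: $m_{IS} = f_{IS}\,m_E/(1 - f_{IS})$. The minimal sketch below applies this to the 300 ng, 0.1% case. The function name is hypothetical.

```python
F_IS_MIN, F_IS_MAX = 0.0001, 0.01  # stated feasible range: 0.01 % to 1 %


def internal_standard_mass(m_e_ng: float, f_is: float = 0.001) -> float:
    """Mass of internal standard m_IS (ng) that yields mass fraction f_IS.

    From f_IS = m_IS / (m_E + m_IS):  m_IS = f_IS * m_E / (1 - f_IS).
    Default f_IS = 0.1 %, the value described as optimal.
    """
    if not F_IS_MIN <= f_is <= F_IS_MAX:
        raise ValueError("f_IS outside the feasible range 0.01%-1%")
    if m_e_ng <= 0:
        raise ValueError("m_E must be positive")
    return f_is * m_e_ng / (1.0 - f_is)


m_is = internal_standard_mass(300.0)  # 300 ng sample DNA, f_IS = 0.1 %
m_tot = 300.0 + m_is
print(f"m_IS = {m_is:.4f} ng, check f_IS = {m_is / m_tot:.4%}")
# prints: m_IS = 0.3003 ng, check f_IS = 0.1000%
```

Using $m_{TOT}$ rather than $m_E$ as the denominator explains the small difference from the naive 0.3000 ng (0.1% of 300 ng).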

[0085] Mixing procedure: Add the calculated amount of internal standard mixture to the sample DNA solution, vortex to mix thoroughly, and ensure that the internal standard and sample DNA are fully mixed.

[0086] Step S2: Library construction and high-throughput sequencing. The mixed DNA is used for library construction and high-throughput sequencing to obtain raw sequencing data. Specifically, library construction and sequencing involve constructing a library from the sample DNA and internal standard mixture, and sequencing the qualified library to obtain raw sequencing data. This includes:

[0087] 1. Library Construction: Use a standard high-throughput sequencing library construction kit (such as KAPA HyperPrep Kit, Illumina TruSeq NanoDNA Library Prep Kit) and follow the kit instructions. The core steps include:

[0088] End repair: performing blunt end repair and adding A tails to the 3' ends of DNA fragments;

[0089] Adapter ligation: Ligate the sequencing platform adapters (e.g., Illumina P5 / P7 adapters);

[0090] PCR amplification: Adjust the number of amplification cycles (10-18 cycles) according to the DNA concentration. The amplification system contains conventional components such as primers, DNA polymerase, and dNTPs.

[0091] Library purification: PCR products are purified with magnetic beads (e.g., AMPure XP beads) to remove primer dimers and non-specific amplification products;

[0092] Library quality control: Library fragment size (target range 200-500 bp) is checked on an Agilent 2100 Bioanalyzer, and library concentration is quantified by Qubit (required concentration ≥1 nM).

[0093] 2. High-throughput sequencing: Dilute the qualified library to the required concentration for sequencing (e.g., 1.0 nM) according to the sequencing platform requirements. Select a conventional high-throughput sequencing platform (e.g., Illumina NovaSeq6000, MiSeq, HiSeq), set the read length (e.g., PE150, PE250) and sequencing depth (adjusted according to the abundance of the target gene, minimum sequencing depth ≥ 10 M reads), and perform sequencing to obtain raw sequencing data (raw reads).

[0094] Step S3: Sequencing data processing and base counting. After quality control and sequence alignment of the raw sequencing data, count the total number of effective sequenced bases $n_{TOT,s}$, the number of bases covered by the internal standard $n_{IS,s}$, and the number of bases covered by the target gene $n_{G,s}$, then calculate the internal standard coverage ratio $B_{IS,s}$ and the target gene coverage ratio $B_{G,s}$;

[0095] This step aims to obtain the effective base coverage required for the calculation, which can be achieved using conventional bioinformatics software in this field. The specific operation is as follows:

[0096] 1. Data quality control: FastQC software was used to evaluate the quality of raw reads, and Trimmomatic software was used for quality control to remove adapter sequences, low-quality bases (bases with Q value < 20) and short fragments (length < 50 bp) to obtain clean reads;

[0097] 2. Reference sequence construction: Construct a reference sequence database in FASTA format containing all internal standard sequences ($ISF_1 \ldots ISF_M$) and the target gene G.

[0098] 3. Sequence alignment: Use software such as BWA-MEM, Bowtie2 or BLAST to align the clean reads to the above reference sequence database, set alignment parameters (e.g., ≤2 mismatches, ≥90% identity), and retain uniquely aligned reads;

[0099] 4. Base Statistics: The following data for each sample were statistically analyzed using Samtools software:

[0100] Total effective sequenced bases $n_{TOT,s}$: the total number of valid base pairs in the clean reads;

[0101] Bases covered by the internal standard $n_{IS,s}$: the sum of effective base pairs aligned to the internal standard fragments;

[0102] Bases covered by the target gene $n_{G,s}$: the number of effective base pairs covered by the target gene G in the sequencing results of sample s;
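One possible way to obtain $n_{IS,s}$ and $n_{G,s}$ from an alignment is to sum per-position read depth over each reference sequence, since summed depth approximates the number of aligned bases. The Python sketch below assumes tab-separated input in the layout printed by `samtools depth` for a single BAM (reference name, position, depth) and a hypothetical naming convention in which internal standard contigs start with `ISF` and the target gene contig is named `G`. $n_{TOT,s}$ comes from the clean reads themselves and is not computed here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def aligned_bases_by_reference(depth_lines: Iterable[str]) -> Dict[str, int]:
    """Sum per-position depth for each reference (approximates aligned bases).

    Assumes lines laid out as: reference name <TAB> position <TAB> depth.
    """
    totals: Dict[str, int] = defaultdict(int)
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        ref, _pos, depth = line.split("\t")[:3]
        totals[ref] += int(depth)
    return dict(totals)


def split_counts(totals: Dict[str, int], is_prefix: str = "ISF",
                 target_refs: Tuple[str, ...] = ("G",)) -> Tuple[int, int]:
    """Return (n_IS,s, n_G,s) under a hypothetical contig-naming convention."""
    n_is = sum(v for ref, v in totals.items() if ref.startswith(is_prefix))
    n_g = sum(v for ref, v in totals.items() if ref in target_refs)
    return n_is, n_g


demo = ["ISF-1\t1\t3", "ISF-1\t2\t4", "ISF-2\t1\t5", "G\t1\t10", "G\t2\t12"]
print(split_counts(aligned_bases_by_reference(demo)))  # (12, 22)
```

Depth-based counts depend on the counting options of the tool used (e.g., treatment of deletions and filtered reads), so the result should be read as an approximation of the aligned-base statistic described above.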

[0103] 5. Proportion calculation: Calculate the internal standard coverage ratio $B_{IS,s}$ and the target gene coverage ratio $B_{G,s}$ for each sample according to Formulas 2 and 3.

[0104] $B_{IS,s}$ is calculated by Formula 2:

[0105] Formula 2: $B_{IS,s} = \frac{n_{IS,s}}{n_{TOT,s}}$

[0106] Where: $B_{IS,s}$ is the internal standard coverage ratio, i.e., the proportion of internal standard-covered bases in the total effective sequenced bases; it is dimensionless.

[0107] $n_{TOT,s}$: the total number of effective sequenced bases in sample s, i.e., all effective base pairs contained in the clean reads of sample s, excluding base pairs that did not pass quality control.

[0108] $n_{IS,s}$: the effective coverage of all exogenous internal standard fragments (ISF) in sample s, in bp, i.e., after the clean reads of sample s are aligned to the reference containing the internal standard fragment set $ISF_1 \sim ISF_M$, the sum of effective base pairs successfully aligned to any internal standard fragment. Although the internal standard mixture may contain multiple fragments, the calculation uses only their total coverage $n_{IS,s}$; this method does not require each fragment to be modeled individually.

[0109] $B_{G,s}$ is calculated by Formula 3:

[0110] Formula 3: $B_{G,s} = \frac{n_{G,s}}{n_{TOT,s}}$

[0111] Where: $B_{G,s}$ is the target gene coverage ratio, i.e., the proportion of bases covered by the target gene in the total effective sequenced bases; it is dimensionless.

[0112] $n_{G,s}$: the number of effective base pairs covered by target gene G in the sequencing results of sample s;

[0113] $n_{TOT,s}$: the total number of effective sequenced base pairs in sample s, i.e., all effective base pairs contained in the clean reads of sample s, excluding base pairs that did not pass quality control.
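Formulas 2 and 3 share the same form, a covered-base count divided by the total effective bases of the same sample. A minimal Python sketch follows; the helper name and the per-sample counts are hypothetical values chosen only for illustration.

```python
def coverage_ratio(n_covered: int, n_tot: int) -> float:
    """Formulas 2 and 3: bases covered by a feature / total effective bases of sample s."""
    if n_tot <= 0:
        raise ValueError("n_TOT,s must be positive")
    if not 0 <= n_covered <= n_tot:
        raise ValueError("covered bases must lie between 0 and n_TOT,s")
    return n_covered / n_tot


# Hypothetical counts for one sample s (illustrative, not from the embodiments)
n_tot_s = 10_000_000_000
n_is_s = 4_900_000
n_g_s = 25_000_000

b_is_s = coverage_ratio(n_is_s, n_tot_s)  # B_IS,s (Formula 2)
b_g_s = coverage_ratio(n_g_s, n_tot_s)    # B_G,s (Formula 3)
print(b_is_s, b_g_s)
```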

[0114] Step S4: Calculation of the raw sequencing yield and establishment of the deviation audit indicator. Based on the internal standard coverage obtained from sequencing and the initial addition ratio, the raw sequencing yield of the sample is calculated as $Y_{seq,s} = B_{IS,s} / f_{IS}$; the system baseline yield $Y_{ref}$ is retrieved, and the audit offset vector is calculated as $\eta = Y_{seq,s} / Y_{ref}$;

[0115] This step compares the internal standard signal returned from sequencing with the initial physical input to establish a data audit benchmark for assessing sample quality. The specific steps are as follows:

[0116] S4-1: Calculate the real-time system sequencing response parameter $Y_{seq,s}$. Based on the internal standard coverage obtained from sequencing and the initial addition ratio, the system calculates the raw sequencing yield of sample s using Formula 4:

[0117] Formula 4: $Y_{seq,s} = \frac{B_{IS,s}}{f_{IS}}$

[0118] The meanings of each parameter are as follows:

[0119] $Y_{seq,s}$: system sequencing response (SSR), a parameter characterizing the intensity and consistency of the response of sample s to its internal standard addition ratio under a given library preparation-sequencing-analysis workflow. It is used for auditing and path decisions and is not equivalent to the true recovery rate or the true absolute content of the target nucleic acid.

[0120] $B_{IS,s}$: internal standard coverage ratio, i.e., the number of bases covered by the internal standard ($n_{IS,s}$) as a proportion of the total effective sequenced bases ($n_{TOT,s}$).

[0121] $f_{IS}$: internal standard addition mass fraction, i.e., the mass of the exogenous internal standard ($m_{IS}$) as a proportion of the total mass of the sample and internal standard mixture ($m_E + m_{IS}$).

[0122] S4-2: Establishing the system baseline yield $Y_{ref}$. The system presets, or runs in parallel, a reference baseline value $Y_{ref}$, which represents the expected standard recovery efficiency of the system under ideal experimental conditions (no matrix inhibition, no interference with physical quantification).

[0123] In a preferred embodiment of the present invention, multiple batches of parallel determinations are performed on standard mock DNA samples, and the average of the resulting $Y_{seq}$ values (e.g., 0.42) is used as the zero-point baseline for system auditing.

[0124] S4-3: Calculating the audit offset vector η. The system derives the decision offset vector from the ratio of the real-time yield to the baseline yield using Formula 5:

[0125] Formula 5: $\eta = \frac{Y_{seq,s}}{Y_{ref}}$
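Formulas 4 and 5 chain directly: the coverage ratio is normalized by the addition ratio, then by the baseline. The Python sketch below uses hypothetical function names and illustrative inputs ($B_{IS,s}$ = 0.084%, $f_{IS}$ = 0.1%, and the example baseline $Y_{ref}$ = 0.42 from [0123]); it is not data from the embodiments.

```python
def raw_sequencing_yield(b_is_s: float, f_is: float) -> float:
    """Formula 4: Y_seq,s = B_IS,s / f_IS (dimensionless system sequencing response)."""
    if f_is <= 0:
        raise ValueError("f_IS must be positive")
    return b_is_s / f_is


def audit_offset(y_seq_s: float, y_ref: float) -> float:
    """Formula 5: eta = Y_seq,s / Y_ref."""
    if y_ref <= 0:
        raise ValueError("Y_ref must be positive")
    return y_seq_s / y_ref


# Illustrative inputs only
y_seq = raw_sequencing_yield(8.4e-4, 1e-3)
eta = audit_offset(y_seq, 0.42)
print(f"Y_seq,s = {y_seq:.2f}, eta = {eta:.2f}")  # Y_seq,s = 0.84, eta = 2.00
```

With these inputs η = 2, which corresponds to the abnormal gain state (η > 1) described in S4-4.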

[0126] S4-4: Explanation of the physical meaning of the offset vector (for subsequent auditing decisions)

[0127] Matching state (η≈1): The initial physical quantity $m_E$ is judged authentic, and the biochemical response was normal throughout the entire experimental process.

[0128] Inhibition state (η<1): This indicates inhibitor interference in the sample (weak biochemical response), or a physical quantitative reading biased by impurity interference.

[0129] Abnormal gain state (η>1): The initial physical quantitative reading $m_E$ of the sample is judged to be severely distorted, overstating the effective template load (a "physical quantitative lie"). The nominal $f_{IS}$ in the denominator of Formula 4 is then invalid, and the calculated yield is logically inflated.

[0130] Step S5: Asymmetric audit admission determination based on the offset vector η. A self-consistency threshold T1 and a circuit-breaker threshold T2 are preset, and according to the value of η each sample is assigned to one of three levels: high-confidence admission, risk admission with correction guidance, or logic-collapse circuit breaker. The corresponding automated decision action is then executed.

[0131] This step aims to automatically determine the authenticity of sample data based on the offset vector η calculated in step S4, using a preset dynamic threshold matrix. The system presets two core audit thresholds, T1 (self-consistency threshold) and T2 (circuit breaker threshold), thereby defining three independent decision paths:

[0132] The specific judgment logic is as follows:

[0133] S5-1: High-confidence admission determination (Path A: Green clearance zone)

[0134] When the offset vector η lies in the neighborhood of unity, i.e., when $T_{1,low} \le \eta \le T_{1,high}$, the sample is assigned to Path A.

[0135] Technical diagnosis: The initial physical quantification and biochemical yield are highly consistent.

[0136] Recommended values: In a preferred embodiment for complex environmental matrices (such as sewage and soil), $T_{1,low}$ may be 0.75 and $T_{1,high}$ 1.25 (i.e., the interval 0.75-1.25, a fluctuation band of ±25%).

[0137] Decision action: Automatically release and directly output the result.

[0138] The system determines that the sample is in a high-confidence admission state: the initial physical quantitative mass $m_E$ is judged true and reliable, and the experimental biochemical response closely matches expectation. The system uses the raw yield $Y_{seq,s}$ directly for calculation.

[0139] S5-2: Risk access and correction guidance determination (Path B: Audit decision interval)

[0140] When the offset vector η lies outside the above interval but within the correctable boundary, i.e., $T_2 < \eta < T_{1,low}$ or $T_{1,high} < \eta < T_{max}$, the sample is assigned to Path B.

[0141] Technical diagnosis: Identified as a risk of deviation in physical quantification or moderate interference in biochemical yield, resulting in the distortion of the "nominal load denominator".

[0142] Recommended values: In the above preferred embodiment, $T_2$ is 0.10 and $T_{max}$ is 5.0 (i.e., the low-yield correction zone is 0.10-0.75 and the high-yield correction zone is 1.25-5.0).

[0143] Decision action: The system starts the effective load reconstruction procedure and generates a preliminary correction value, the reference effective load $m_{ref}$, calculated as follows:

[0144] Reference effective DNA load: $m_{ref} = m_E / \eta$

[0145] The system then passes $m_{ref}$ and the accompanying audit status flag to the downstream precision reconstruction module, which performs a deeper reconstruction based on multi-dimensional error weights.

[0146] S5-3: Logic-collapse circuit-breaker determination (Path C: safety circuit-breaker interval)

[0147] When the offset vector satisfies η < T2 or η > T max , it is determined as Path C.

[0148] Technical diagnosis: A logical collapse exists between the physical reading and the biochemical yield, with the two deviating by more than an order of magnitude (e.g., an extreme mismatch between 15.9 ng and 300 ng). The physical reading is judged to have failed completely, or an uncontrolled anomaly is judged to have occurred in the experiment.

[0149] Recommended value: In the preferred embodiment above, it is triggered when η<0.10 or η>5.0.

[0150] Decision action: System-level circuit breaker. The system forcibly intercepts the data output stream of this sample, preventing result generation and issuing a "logic collapse" or "load lie" warning to the audit terminal.

[0151] Judgment C1 (η>5.0): Identified as a "physical quantitative lie": the effective template load is far below the initial mass $m_E$, so the quantification logic based on $m_E$ has collapsed.

[0152] Judgment C2 (η<0.1): Identified as "biochemical response collapse", indicating that inhibitors caused signal loss and the data are unreliable. The system terminates report output for such samples and provides dilution and retesting suggestions based on the η value.
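The three-path logic of S5-1 to S5-3 can be expressed as a small decision function. The Python sketch below is an illustration, not the system's actual implementation: the class and function names are hypothetical, the default thresholds are the preferred-embodiment values (adjustable per S5-4), and because the text leaves η = $T_2$ and η = $T_{max}$ unassigned, the sketch places those boundary values in Path B by assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AuditThresholds:
    """Preferred-embodiment values; adjustable per platform and matrix (S5-4)."""
    t2: float = 0.10       # circuit-breaker threshold T2
    t1_low: float = 0.75   # lower self-consistency bound T1,low
    t1_high: float = 1.25  # upper self-consistency bound T1,high
    t_max: float = 5.0     # upper correctable bound T_max


@dataclass(frozen=True)
class AuditDecision:
    path: str                  # "A", "B" or "C"
    m_ref_ng: Optional[float]  # reference effective load m_ref (Path B only)
    action: str


def audit_admission(eta: float, m_e_ng: float,
                    th: AuditThresholds = AuditThresholds()) -> AuditDecision:
    """Assign a sample to Path A, B or C from its offset vector eta."""
    if eta < 0:
        raise ValueError("eta cannot be negative")
    if th.t1_low <= eta <= th.t1_high:
        return AuditDecision("A", None, "release: output result directly")
    if eta > th.t_max:
        return AuditDecision("C", None, "C1 physical quantitative lie: block output")
    if eta < th.t2:
        return AuditDecision("C", None,
                             "C2 biochemical response collapse: block output, "
                             "suggest dilution and retest")
    # Remaining cases: T2 <= eta < T1,low or T1,high < eta <= T_max (boundaries assumed).
    return AuditDecision("B", m_e_ng / eta,
                         "risk admission: pass m_ref to correction module")


for eta in (1.0, 2.0, 300.0 / 15.9, 0.05):
    d = audit_admission(eta, 300.0)
    print(f"eta={eta:.2f} -> Path {d.path}, m_ref={d.m_ref_ng}, {d.action}")
```

For a nominal 300 ng input with η = 2.0 the sketch returns Path B with $m_{ref}$ = 150 ng, while the 300 ng versus 15.9 ng mismatch of [0148] gives η ≈ 18.9 and trips Path C.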

[0153] S5-4: Threshold applicability. The audit thresholds (0.10, 0.75, 1.25, 5.0) of this invention are values taken in a set of preferred embodiments. Those skilled in the art should understand that, depending on sensitivity differences between sequencing platforms (such as Illumina, MGI, and Oxford Nanopore) and on sample matrix types (such as samples from special extreme environments), these thresholds can be adjusted without departing from the asymmetric audit logic of this invention. Such numerical fine-tuning based on the logic of this invention still falls within the protection scope of this invention.

[0154] This invention provides a quantitative sequencing audit admission method based on sequencing yield characteristic shift. By establishing a "real-time audit monitoring system" independent of the computational process, it has the following significant advantages and advancements compared to existing technologies (including but not limited to all methods of quantitative sequencing using internal standards, such as multi-gradient internal standard linear regression and single-point internal standard ratio conversion):

[0155] This fills the technical gap in the existing internal standard system regarding the inability to audit the "authenticity of initial sample addition".

[0156] Limitations of existing technology: All existing internal standard quantification methods share a "logical presupposition": they assume that the mass $m_E$ of nucleic acid entering the experimental system is accurate. Current technology focuses on calibrating proportional biases during amplification and sequencing, but it offers no defense against false initial physical readings (physical quantitative lies) caused by matrix effects.

[0157] Advantages of this invention: Through step S4, this invention introduces the real-time sequencing yield $Y_{seq,s}$ together with the offset vector η, using the biochemical signal returned from the sequencing end as the "sole benchmark" for judging the authenticity of the physical readings.

[0158] Technical Gains: This invention does not blindly follow physical quantitative readings. When η deviates significantly, the system can keenly detect the discrepancy between the physical reading and the actual load. Compared to traditional methods that perform futile linear calibration on "incorrect denominators," this invention can identify and intercept logically collapsed samples at the source (as in Example 2), ensuring that the starting point of quantitative analysis is the true value.

[0159] Solved the problem of representative assessment of effective biomass in the industrial process of "constant biomass construction".

[0160] Limitations of existing technology: Existing standardized NGS workflows typically require a constant input mass (e.g., 300 ng) for the library preparation reaction. Current methods cannot determine the ratio of real DNA to non-nucleic acid impurities within this "300 ng". If the impurity content is high, the real DNA template may be extremely scarce, and sequencing depth is diluted by the impurity background. If the real DNA load is low, the library cannot achieve sufficient biological representativeness, and concentrations calculated from the nominal DNA input are underestimated.

[0161] Advantages of this invention: This invention utilizes the yield offset vector η to achieve real-time quantitative auditing of effective biomass loading.

[0162] Technical Gain: Derived from technical point 3.3-S5-3. The system can automatically identify samples that nominally meet the input criteria but have a severely insufficient effective template quantity (e.g., only 15.9 ng). Even the most complex internal standard regression models in existing technology cannot compensate for the fact that such samples have lost their representativeness. This invention instead intercepts such highly volatile and unreliable data through the mandatory circuit breaker of Path C.

[0163] An asymmetric access determination decision logic was established, balancing the flexibility of modification with the rigidity of auditing.

[0164] Limitations of existing technology: Traditional internal standard methods are mostly "calculation formulas" rather than "audit strategies". When faced with abnormal data, they lack automated processing logic, which often leads to the incorrect release of biased data or the blind rejection of valid samples.

[0165] Advantages of this invention: This invention proposes a three-level decision matrix for path A / B / C, which decouples "auditing" from "computation".

[0166] Technical Gains: Through the risk compensation mechanism of Path B, this invention can salvage samples that are affected by biochemical inhibition but still have analytical value (such as the K1 swab sample in Example 4), restoring their true abundance through algorithmic weighting. This asymmetric admission logic maintains audit rigidity (tripping the circuit breaker on severe deviations) while providing correction flexibility (salvaging mild deviations), significantly improving the output rate of complex samples in actual industrial testing.

[0167] It reduces the reliance on the complexity of internal standard sequence design and has higher biochemical robustness.

[0168] Limitations of existing technologies: In order to pursue linearity, some existing methods (such as multigradient regression) require the introduction of a large number of high-concentration internal standard sequences into the sample. This not only consumes sequencing depth, but also easily causes primer competition or non-specific amplification in extremely complex samples.

[0169] Advantages of this invention: This invention adopts a macro audit logic based on the yield offset of the entire process, which does not require fine gradient-dependency modeling of individual internal standard fragments.

[0170] Technical gains: The performance evaluation of the entire sequencing system can be completed with a single offset vector η, which simplifies the computational power requirements for bioinformatics analysis and reduces the risk of cross-interference between internal standard sequences and complex matrix species, making it more applicable in scenarios with extremely high requirements for timeliness and stability, such as rapid clinical diagnosis.

[0171] Based on the same idea, this invention also provides a nucleic acid quantification admission determination system based on sequencing response feature auditing, comprising:

[0172] The sample processing module is used for sample processing and initial mass calibration. It extracts nucleic acids from test samples of different matrix types, obtains the apparent mass of total nucleic acids by physical detection methods, and adds an exogenous nucleic acid internal standard cluster in a specified proportion. The exogenous nucleic acid internal standard cluster consists of M artificially synthesized double-stranded DNA sequences, and the feasible range of the addition mass ratio $f_{IS}$ is 0.01%-1%;

[0173] The sequencing module is used for library construction and sequencing. It constructs a library by mixing sample DNA with an internal standard, and sequences qualified libraries to obtain raw sequencing data.

[0174] The data processing module is used for sequencing data processing and base counting. After quality control and sequence alignment of the raw sequencing data, it counts the total number of effective sequenced bases $n_{TOT,s}$, the number of bases covered by the internal standard $n_{IS,s}$, and the number of bases covered by the target gene $n_{G,s}$, and calculates the internal standard coverage ratio $B_{IS,s}$ and the target gene coverage ratio $B_{G,s}$;

[0175] The audit judgment module is used for raw sequencing yield calculation and deviation audit indicator establishment. Based on the internal standard coverage obtained from sequencing and the initial addition ratio, it calculates the raw sequencing yield of the sample as $Y_{seq,s} = B_{IS,s} / f_{IS}$, retrieves the system baseline yield $Y_{ref}$, and calculates the audit offset vector as $\eta = Y_{seq,s} / Y_{ref}$;

[0176] The early warning module performs the asymmetric audit admission determination based on η. It presets the self-consistency threshold T1 and the circuit-breaker threshold T2, assigns each sample according to the value of η to one of three levels (high-confidence admission, risk admission with correction guidance, or logic-collapse circuit breaker), and executes the corresponding automated decision action.

[0177] The specific embodiments provided by the present invention are as follows:

[0178] Example 1: Audit Admission and Accuracy Verification of the Benchmark Sample

[0179] S1: Experimental Materials and Design

[0180] 1. Sample selection and background setting. In this embodiment, a standard artificial mock microbial community DNA (e.g., ZymoBIOMICS Microbial Community DNA Standard, purchased from Zymo Research) was used.

[0181] Component characteristics: The sample contains genomic DNA from 8 bacteria and 2 fungi, with known species composition and theoretical absolute abundance.

[0182] Purpose of selection: Because this standard is a high-purity solution free of matrix inhibitors such as humic acid and salts, and its physical concentration is calibrated with high accuracy, it is used as the "audit zero point" verification sample of this invention to determine the system baseline yield $Y_{ref}$.

[0183] 2. Selection of Internal Standards and System Construction

[0184] This invention uses a set of artificially synthesized, non-natural double-stranded DNA fragments as exogenous internal standards (IS).

[0185] Internal standard composition: The internal standard mixture consists of 15 sequences of varying lengths, ranging from 94 to 421 bp, with GC content ranging from 26.8% to 63.8%, simulating common GC gradients in natural microbial genomes to cover amplification biases during sequencing. All internal standard sequences were purified by high-performance liquid chromatography (HPLC) to a purity of over 98%.

[0186] Method of addition: Based on the initial mass $m_E$ of the Zymo sample measured in S1-2, the internal standard mixture is precisely added according to a preset ratio (e.g., a total internal standard mass fraction of 0.1%).

[0187] 3. Experimental consumables and instruments

[0188] Nucleic acid extraction reagents:

[0189] Standard microbial communities or pure bacterial cultures: QIAGEN DNeasy Blood & Tissue Kit;

[0190] Sediment, soil, or sludge samples: QIAGEN DNeasy PowerSoil Kit;

[0191] Fecal or intestinal content samples: QIAGEN QIAamp PowerFecal Pro DNA Kit.

[0192] Physical quantitative and quality control instruments:

[0193] Accurate concentration determination: Qubit 4.0 fluorometer (Thermo Fisher Scientific).

[0194] Purity index monitoring (A260/280, A260/230): NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific).

[0195] Fragment integrity assessment: Agilent 2100 Bioanalyzer and its matching DNA kit were used.

[0196] Library construction and sequencing platform:

[0197] Library construction reagents: KAPA HyperPrep Kit;

[0198] High-throughput sequencing: The Illumina NovaSeq 6000 sequencing platform was selected.

[0199] Bioinformatics analysis software:

[0200] Sequencing quality assessment: FastQC v0.11.9;

[0201] Adapter removal and low-quality trimming: Trimmomatic v0.39;

[0202] Sequence alignment and reference genome mapping: BWA-MEM v0.7.17;

[0203] Alignment statistics and data processing: Samtools v1.15.

[0204] S2 Experimental Procedure:

[0205] 1. Sample preparation and internal standard addition:

[0206] DNA was extracted from each matrix sample according to the corresponding kit instructions, quantified with Qubit, and uniformly adjusted to 300 ng per sample.

[0207] The overall internal standard addition ratio was $f_{IS} = 0.1\%$, i.e., the mass of internal standard added to each sample was $m_{IS} = 300\ \text{ng} \times 0.1\% = 0.3\ \text{ng}$;

[0208] Add the internal standard mixture to the sample DNA solution, mix well, and incubate at room temperature for 5 minutes.

[0209] The audit-labeled nucleic acid is a combination of several exogenous double-stranded DNA fragments whose addition proportion in the sample is known; it is used to characterize the system sequencing yield response of the sample under the current library preparation-sequencing process.

[0210] In this embodiment, the audit-labeled nucleic acids are selected from a subset of sequences in an exogenous nucleic acid library. The specific number, numbering, and composition of these sequences do not constitute limitations of this invention.

[0211] 2. Library preparation and sequencing:

[0212] All samples were prepared using the same library preparation system (50 μL system), and PCR amplification was performed in 12 cycles. After purification with magnetic beads, the samples were quality checked.

[0213] The library was diluted to 1.0 nM and sequenced on the NovaSeq 6000 platform with a read length of PE150 and a sequencing depth of 10 Gb / sample.

[0214] 3. Data processing and parameter calculation:

[0215] Quality control: Remove adapters and bases with Q < 20 to obtain clean reads;

[0216] Alignment: Align the clean reads to the internal standard fragment sequences and the Zymo genomic DNA sequences;

[0217] Statistics: Calculate $n_{TOT,s}$, $n_{IS,s}$ and $n_{G,s}$ for each sample, and from these obtain $B_{IS,s}$, $B_{G,s}$ and $Y_{seq}$;

[0218] S3 Experimental Results and Analysis:

[0219] 1. System baseline yield ($Y_{ref}$) calibration data

[0220] This embodiment first establishes the baseline yield $Y_{ref}$ of the testing system under ideal conditions by testing Zymo standard samples. The experiment used the NovaSeq 6000 sequencing platform with a sequencing depth of 10 Gb per sample. The initial calibration data and sequencing output are shown in Table 1 below:

[0221] Table 1: Basic Parameters of the Benchmark Sample in Example 1

[0222]

| Parameter | Symbol | Measured / calculated value | Unit |
|---|---|---|---|
| Initial sample input mass | $m_E$ | 300.0 | ng |
| Total input mass of internal standard mixture | $m_{IS}$ | 0.3 | ng |
| Total internal standard mass fraction | $f_{IS}$ | 0.001 | - |
| Total sequencing data | $n_{TOT,s}$ | 9,925,386,101 | bp |

[0223] The fifteen internal standard fragments (ISF-1 to ISF-15), spanning different mass gradients, were aligned and analyzed using the bioinformatics workflow. The real-time yield data of each fragment are shown in Table 2 below.

[0224] Table 2: Real-time Yield Analysis Statistics of Each Fragment of Internal Standard in Example 1

[0225]

| Internal standard | Mass ratio $f_{IS,i}$ | Aligned bases $n_{IS,i}$ (bp) | Fragment yield $Y_{seq,i}$ |
|---|---|---|---|
| ISF-1 | 9.9E-05 | 406034 | 0.41 |
| ISF-2 | 1.1E-04 | 548604 | 0.48 |
| ISF-3 | 1.2E-04 | 574220 | 0.50 |
| ISF-4 | 1.7E-04 | 922541 | 0.53 |
| ISF-5 | 2.4E-04 | 1184876 | 0.49 |
| ISF-6 | 5.4E-05 | 201757 | 0.37 |
| ISF-7 | 2.0E-04 | 549388 | 0.28 |
| ISF-8 | 1.3E-06 | 6601 | 0.51 |
| ISF-9 | 1.5E-06 | 7551 | 0.52 |
| ISF-10 | 1.2E-06 | 5903 | 0.50 |
| ISF-11 | 1.2E-06 | 7581 | 0.64 |
| ISF-12 | 1.2E-06 | 7622 | 0.66 |
| ISF-13 | 1.9E-06 | 9549 | 0.51 |
| ISF-14 | 7.8E-07 | 3978 | 0.51 |
| ISF-15 | 1.0E-06 | 4743 | 0.48 |
| Average yield | | | 0.49 |

[0226] As shown in Figure 3, in the standard sample system of Example 1, multiple gradients of exogenous DNA internal standard incorporation ratio ($\log f_{IS}$) were set and the corresponding internal standard base coverage ratio ($\log B_{IS}$) was calculated from the sequencing data. The two exhibit a highly linear relationship ($R^2 = 0.988$).

[0227] These results indicate that, under ideal conditions of no significant matrix inhibition and accurate physical quantification, the internal standard exhibits stable and predictable system response characteristics from the addition of the standard to the sequencing recovery signal.
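The log-log relationship behind Figure 3 can be checked with an ordinary least-squares fit of $\log_{10} B_{IS,i}$ on $\log_{10} f_{IS,i}$, where $B_{IS,i} = n_{IS,i} / n_{TOT,s}$. The Python sketch below uses the values transcribed from Tables 1 and 2; a slope near 1 means coverage is proportional to the addition ratio. This is an illustrative re-analysis under the assumption that Table 2 reflects the Figure 3 data; since Table 2 lists $f_{IS,i}$ rounded to two significant figures, the $R^2$ it prints need not equal the reported 0.988.

```python
import math

N_TOT = 9_925_386_101  # total sequencing data from Table 1 (bp)

# (f_IS,i, n_IS,i) pairs transcribed from Table 2
TABLE2 = [
    (9.9e-05, 406034), (1.1e-04, 548604), (1.2e-04, 574220), (1.7e-04, 922541),
    (2.4e-04, 1184876), (5.4e-05, 201757), (2.0e-04, 549388), (1.3e-06, 6601),
    (1.5e-06, 7551), (1.2e-06, 5903), (1.2e-06, 7581), (1.2e-06, 7622),
    (1.9e-06, 9549), (7.8e-07, 3978), (1.0e-06, 4743),
]


def loglog_fit(pairs, n_tot):
    """OLS of log10(B_IS,i) on log10(f_IS,i); returns (slope, intercept, R^2)."""
    xs = [math.log10(f) for f, _ in pairs]
    ys = [math.log10(n / n_tot) for _, n in pairs]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


slope, intercept, r2 = loglog_fit(TABLE2, N_TOT)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r2:.3f}")
```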

[0228] As shown in Table 2, multiple exogenous DNA internal standard fragments spanning a mass-ratio gradient of more than two orders of magnitude (about 7.8E-07 to 2.4E-04) were introduced into the standard sample system of Example 1. The real-time yields $Y_{seq,i}$ of the individual internal standard fragments in the sequencing data ranged from 0.28 to 0.66, with most fragments (11 of 15) concentrated between 0.4 and 0.6, and no systematic drift with changes in the incorporation ratio $f_{IS,i}$ was observed.

[0229] These results indicate that, under ideal conditions without significant matrix inhibition or physical quantification bias, the sequencing system exhibits highly consistent end-to-end recovery characteristics for internal standard fragments at different input levels. Based on this consistency, this embodiment uses the arithmetic mean of the yields of all internal standard fragments as the system baseline yield $Y_{ref}$ (approximately 0.49 in this example) for audit offset determination of subsequent samples. This average yield serves only to establish the audit baseline zero point and is not assumed to hold in complex matrix samples; the offset vector η is used to audit in real time whether each independent sample remains within this consistency interval.

[0230] This invention establishes the system baseline yield parameter $Y_{ref}$ from this stable response range as the reference zero point for subsequent sample audit decisions. The linear relationship is likewise used only to establish the audit baseline and is not presupposed to hold in complex matrix samples; instead, the offset vector η is calculated in real time to determine whether each independent sample still lies within the linear response range.
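The baseline in [0229] is the arithmetic mean of the fifteen reported fragment yields, and each fragment yield follows from Formulas 2 and 4 applied per fragment, $Y_{seq,i} = (n_{IS,i} / n_{TOT,s}) / f_{IS,i}$. The Python sketch below reproduces both from Tables 1 and 2; the helper name is hypothetical. Because Table 2 rounds $f_{IS,i}$ to two significant figures, recomputing some fragments (e.g., ISF-2) does not reproduce the reported yield exactly, so the mean is taken over the reported values.

```python
from statistics import mean

N_TOT = 9_925_386_101  # total sequencing data from Table 1 (bp)

# Reported fragment yields Y_seq,i from Table 2
REPORTED_YIELDS = {
    "ISF-1": 0.41, "ISF-2": 0.48, "ISF-3": 0.50, "ISF-4": 0.53, "ISF-5": 0.49,
    "ISF-6": 0.37, "ISF-7": 0.28, "ISF-8": 0.51, "ISF-9": 0.52, "ISF-10": 0.50,
    "ISF-11": 0.64, "ISF-12": 0.66, "ISF-13": 0.51, "ISF-14": 0.51, "ISF-15": 0.48,
}


def fragment_yield(n_is_i: int, f_is_i: float, n_tot: int = N_TOT) -> float:
    """Per-fragment yield (n_IS,i / n_TOT,s) / f_IS,i (Formulas 2 and 4 per fragment)."""
    return (n_is_i / n_tot) / f_is_i


y_ref = mean(REPORTED_YIELDS.values())
print(f"Y_ref = {y_ref:.2f}")                                        # Y_ref = 0.49
print(f"ISF-1 recomputed = {fragment_yield(406034, 9.9e-5):.2f}")    # 0.41
```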

[0231] 2. Audit Judgment Logic Verification

[0232] Based on the statistical results in Table 2, an audit determination is made for this embodiment:

[0233] Establish the baseline: the arithmetic mean of the yields of the 15 fragments gives the system baseline yield $Y_{ref} = 0.49$.

[0234] Calculate the offset vector: according to the formula $\eta = Y_{seq} / Y_{ref}$, η = 1 in this embodiment.

[0235] Audit conclusion: The offset vector η=1.00 is within the high confidence threshold range [0.75, 1.25] set by this invention. The system determines that the physical quantification of the sample is true and approves the audit.

[0236] 3. Quantitative accuracy verification analysis

[0237] Assuming the system audit is successful, the absolute abundance of the three main bacterial strains in the sample was calculated using real-time yield data. The measured abundance (%) was compared with the official theoretical abundance (%), and the results are attached. Figure 4The attached figure (bar chart) shows the abundance comparison of three representative strains: Staphylococcus aureus, Lactobacillus fermentum, and Enterococcus faecalis. The blue bars (theoretical values) and orange bars (measured values) are at roughly the same height, visually verifying the detection accuracy of the auditing system of this invention under access conditions.

[0238] In Embodiment 1 of this invention, the Zymo standard was used for benchmark calibration of the system. The experimental results show that once the audit system determined the offset vector to be η = 1.00 (admission granted), the absolute quantification model of this invention was highly stable and accurate.

[0239] The abundances of representative species in the community were: *S. aureus* 12.4% measured (theoretical 12.0%), *E. faecalis* 11.1% measured (theoretical 12.0%), and *L. fermentum* 10.7% measured (theoretical 12.0%). The measured values agreed closely with the theoretical values, with a mean absolute deviation of only 0.87 percentage points. This indicates that the baseline Y_ref = 0.49 established in Example 1 has high calibration value and can support the audit decision logic for subsequent complex samples (such as samples with physical quantification bias or matrix inhibition).
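The 0.87 figure is the mean of the absolute differences between measured and theoretical abundances (0.4, 0.9, and 1.3 percentage points). The short check below recomputes it from the three values listed above; the dictionary and function names are illustrative only.

```python
from statistics import mean

# Measured and theoretical abundances (%) from Example 1
measured = {"S. aureus": 12.4, "E. faecalis": 11.1, "L. fermentum": 10.7}
theoretical = {"S. aureus": 12.0, "E. faecalis": 12.0, "L. fermentum": 12.0}


def mean_absolute_deviation(meas, theo):
    """Average |measured - theoretical| over the species, in percentage points."""
    return mean(abs(meas[k] - theo[k]) for k in meas)


print(round(mean_absolute_deviation(measured, theoretical), 2))  # 0.87
```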

[0240] Example 2: Audit Determination of "Physical Quantitative Lies" Caused by Highly Interfering Matrix Samples

[0241] 1. Purpose of Implementation

[0242] This embodiment verifies the ability of the auditing system of the present invention to identify false concentration readings (i.e., "physical quantitative lies") produced by the physical quantification device (Qubit) when it processes samples containing high concentrations of complex impurities such as humic acid and polysaccharides (e.g., high-impurity soil and sludge). By auditing the real-time yield and quantifying the offset vector η, it demonstrates that the present invention can effectively prevent the widespread distortion of microbial absolute abundance caused by distorted physical quantification.

[0245] 2. Experimental setup and standard operating procedures

[0246] Sample selection: Soil samples (T6, T2), sludge samples (W6), sediment samples (C1, C4), and animal feces samples (F1, F2) with typical environmental disturbance characteristics were selected.

[0247] Calibration benchmark: the real-time yield of the ZymoBIOMICS standard in a clean environment, Y_ref = 0.49, is used as the global calibration point of the system.

[0248] Pre-physical quantification and library construction:

[0249] The purity of nucleic acid extracts from each sample was determined using NanoDrop2000.

[0250] DNA mass concentration was determined using Qubit 4.0.

[0251] Based on the concentration values output by Qubit, the sequencing libraries of all samples were constructed with a constant DNA input of 300 ng.

[0252] Audit metric calculation: from the offline sequencing data, the system calculates the real-time yield Y_seq from the proportion of reads assigned to the internal standard sequences, and then the offset vector η = Y_seq / Y_ref.
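A minimal sketch of this metric calculation follows. It assumes, as a labeled assumption, that the coverage ratio is the internal-standard base count divided by the total effective base count (formula (3) is referenced in Example 3 but not reproduced in this section), and it uses Y_seq = B_IS / f_IS as stated in claim 1. The function names and the counts in the usage line are hypothetical.

```python
def coverage_ratio(n_feature_bases, n_total_bases):
    """Assumed form of the coverage ratio: B = n_feature / n_TOT."""
    if n_total_bases <= 0:
        raise ValueError("the total number of effective bases must be positive")
    return n_feature_bases / n_total_bases


def real_time_yield(n_is_bases, n_total_bases, f_is):
    """Y_seq = B_IS / f_IS, where f_IS is the internal-standard mass fraction (0.01 = 1%)."""
    if not 0 < f_is < 1:
        raise ValueError("f_IS must be a fraction between 0 and 1")
    return coverage_ratio(n_is_bases, n_total_bases) / f_is


# Hypothetical counts: 4,900 internal-standard bases among 1,000,000 effective bases, f_IS = 1%
y_seq = real_time_yield(4_900, 1_000_000, 0.01)
eta = y_seq / 0.49  # Y_ref = 0.49 from Example 1
print(round(y_seq, 2), round(eta, 2))  # 0.49 1.0
```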

[0253] 3. Summary table of audit judgment data

[0254] Table 3. Yield deviation analysis and path decision matrix verification for complex sample systems.

[0255]

| Sample | Sample type | A260/A280 | A260/A230 | Nominal library input (ng, Qubit) | Mean apparent yield Y | Offset vector η | Path hit | Audit action |
|---|---|---|---|---|---|---|---|---|
| B1 | Standard product | 1.63 | 2.37 | 300 | 0.49 | 1.0 | Path A | Green clearance |
| C1 | Sediment | 1.88 | 1.84 | 300 | 0.74 | 1.5 | Path B | Risk-restricted admission (corrected) |
| C4 | Sediment | 1.83 | 1.56 | 300 | 1.82 | 3.7 | Path C | Forced circuit breaker (high bias) |
| T2 | Soil | 1.86 | 2.14 | 300 | 2.11 | 4.3 | Path B | Risk-restricted admission (corrected) |
| T6 | Soil | 1.82 | 0.45 | 300 | 8.34 | 17.0 | Path B | Risk-restricted admission (corrected) |
| W6 | Sludge | 1.83 | 0.66 | 300 | 1.92 | 3.9 | Path B | Risk-restricted admission (corrected) |
| F1 | Stool | 1.84 | 1.92 | 300 | 0.53 | 1.1 | Path A | Green clearance |
| F2 | Stool | 1.83 | 1.91 | 300 | 0.49 | 1.0 | Path A | Green clearance |
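As a consistency check on Table 3, the η column can be recomputed from the mean apparent yield column with η = Y / Y_ref and Y_ref = 0.49. The sketch below rounds to one decimal place, as the table does, and reports any sample whose printed η disagrees; the variable names are illustrative.

```python
Y_REF = 0.49

# (mean apparent yield Y, printed offset vector eta), copied from Table 3
table3 = {
    "B1": (0.49, 1.0),
    "C1": (0.74, 1.5),
    "C4": (1.82, 3.7),
    "T2": (2.11, 4.3),
    "T6": (8.34, 17.0),
    "W6": (1.92, 3.9),
    "F1": (0.53, 1.1),
    "F2": (0.49, 1.0),
}


def recomputed_eta(mean_yield, y_ref=Y_REF):
    """eta = Y / Y_ref, rounded to one decimal place as in Table 3."""
    return round(mean_yield / y_ref, 1)


mismatches = {
    name: (recomputed_eta(y), eta)
    for name, (y, eta) in table3.items()
    if recomputed_eta(y) != eta
}
print(mismatches)  # {}
```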

[0256] 4. Detailed analysis of audit judgment logic and technical effectiveness

[0257] 4.1 Identification and interception of extreme "physical quantitative lies" (taking soil sample T6 as an example):

[0258] Failure mechanism: the A260/A230 ratio of sample T6 is only 0.45, indicating high concentrations of organic interferents such as humic acid. Complex matrices may contain components that bind fluorescent dyes or raise the background, or that form non-target structures recognized by the dye, producing systematically high fluorescence readings. Trace particles, colloids, and residual chemicals may also introduce baseline drift. Such inflated readings may not be fully identifiable from NanoDrop absorbance ratios.

[0259] Audit conflict identification: based on the Qubit reading, the experimenter loaded a nominal 300 ng of DNA into the library, but the audit system measured an η of 17.0.

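[0260] Causal deduction: because the internal standard is added at a fixed mass fraction, an η far above the threshold means that the internal standard occupies a far larger share of the sequencing output than the nominal load predicts; in other words, the sample DNA actually observed at the sequencing end is far lower than physically expected. Conversely, the amount of DNA actually loaded by the experimenter can be estimated as only about 300 ng / 17.0 ≈ 17.6 ng.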

[0261] Final action: the system determines that the physical quantification of the sample is completely invalid and executes Path C (forced circuit breaker). Without this audit interception, the absolute abundances of all target microorganisms in the sample would be erroneously amplified about 17-fold because of the false denominator (300 ng). This mechanism safeguards the authenticity of absolute quantification.

[0262] 4.2 Algorithm calibration for high-risk bias samples (taking soil sample T2 as an example):

[0263] Audit findings: Sample T2 has an η of 4.31, falling into path B (risk correction interval).

[0264] Logical deviation identification: before correction, the mean apparent yield was 2.11, deviating from the baseline (0.49) by 330.6%. This indicates a serious pathological deviation in the front-end physical quantification m_E. If 300 ng were used directly as the quantification denominator, the calculated absolute abundance would be underestimated by more than 4-fold.

[0265] Quantitative correction effect: the actual load of the sample is corrected with the formula m_ref = m_E / η, giving a corrected DNA load of 300 ng / 4.31 = 69.61 ng. The sequencing yield Y_seq of each internal standard was then recalculated from this corrected load, and the relationship between the internal standard mass ratio and the sequencing base ratio was refitted, as shown in Table 4. After correction, the mean Y_seq returned to 0.35, and the overall deviation from the baseline fell from +330.6% to -27.6% (the mean of the per-fragment deviations in Table 4), markedly improving the physical authenticity of the quantitative results.
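The load reconstruction of [0265] and the deviation figure of [0264] each reduce to a one-line formula. The sketch below, with hypothetical function names, reproduces the T2 numbers (69.61 ng and 330.6%) and applies the same reconstruction to T6 using the unrounded η = 8.34 / 0.49.

```python
def reconstructed_load(m_e_ng, eta):
    """Effective DNA load m_ref = m_E / eta, in ng."""
    if eta <= 0:
        raise ValueError("eta must be positive")
    return m_e_ng / eta


def deviation_from_baseline(y, y_ref=0.49):
    """Relative deviation of a yield from the baseline Y_ref, in percent."""
    return 100.0 * (y - y_ref) / y_ref


print(round(reconstructed_load(300, 4.31), 2))          # 69.61 (sample T2)
print(round(deviation_from_baseline(2.11), 1))          # 330.6 (T2 before correction)
print(round(reconstructed_load(300, 8.34 / 0.49), 1))   # 17.6 (sample T6)
```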

[0266] Although Path B achieves a first-order linear calibration of the magnitude, residual analysis reveals that certain internal standard fragments, such as ISF-7 (residual deviation -70.9%), still deviate from the regression line. This residual deviation reflects differences in nonlinear response among DNA fragments under complex matrix interference. The system treats these fragment-specific discrete characteristics as "guide signals" that trigger the subsequent precise reduction module (handled separately), which uses the coefficient of variation (CV) and weight matrix of the internal standard group for a secondary fine compensation, removing nonlinear noise that the first-order linear correction cannot cover.

[0267] Table 4. Audit correction parameters for complex sample T2 in Example 2

[0268]

| Internal standard ID | Apparent m_E (ng) | Offset vector η | Reconstructed load m_ref (ng) | Apparent Y_seq (before correction) | Corrected Y_seq | Deviation rate |
|---|---|---|---|---|---|---|
| ISF-1 | 300 | 4.31 | 69.61 | 1.58 | 0.30 | -39.8% |
| ISF-2 | 300 | 4.31 | 69.61 | 1.89 | 0.32 | -35.4% |
| ISF-3 | 300 | 4.31 | 69.61 | 1.88 | 0.34 | -30.3% |
| ISF-4 | 300 | 4.31 | 69.61 | 2.48 | 0.33 | -32.3% |
| ISF-5 | 300 | 4.31 | 69.61 | 2.25 | 0.22 | -55.7% |
| ISF-6 | 300 | 4.31 | 69.61 | 1.65 | 0.34 | -29.9% |
| ISF-7 | 300 | 4.31 | 69.61 | 1.17 | 0.14 | -70.9% |
| ISF-8 | 300 | 4.31 | 69.61 | 2.49 | 0.43 | -11.2% |
| ISF-9 | 300 | 4.31 | 69.61 | 2.41 | 0.41 | -15.5% |
| ISF-10 | 300 | 4.31 | 69.61 | 2.22 | 0.41 | -17.3% |
| ISF-11 | 300 | 4.31 | 69.61 | 2.99 | 0.55 | 11.6% |
| ISF-12 | 300 | 4.31 | 69.61 | 2.02 | 0.38 | -21.7% |
| ISF-13 | 300 | 4.31 | 69.61 | 2.80 | 0.40 | -18.9% |
| ISF-14 | 300 | 4.31 | 69.61 | 2.21 | 0.45 | -8.9% |
| ISF-15 | 300 | 4.31 | 69.61 | 1.59 | 0.31 | -37.2% |
| Mean (AVE) | 300 | 4.31 | 69.61 | 2.11 | 0.35 | -27.6% |
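The summary row of Table 4 and the "guide signal" of [0266] can be illustrated with a short sketch. It uses the per-fragment deviation rate as a simple proxy for the regression residual and computes a population CV of the corrected yields; both choices are assumptions, since the precise reduction module and its weight matrix are not specified in this section. Names are hypothetical.

```python
from statistics import mean, pstdev

# (corrected Y_seq, deviation rate %) per internal-standard fragment, from Table 4
table4 = {
    "ISF-1": (0.30, -39.8), "ISF-2": (0.32, -35.4), "ISF-3": (0.34, -30.3),
    "ISF-4": (0.33, -32.3), "ISF-5": (0.22, -55.7), "ISF-6": (0.34, -29.9),
    "ISF-7": (0.14, -70.9), "ISF-8": (0.43, -11.2), "ISF-9": (0.41, -15.5),
    "ISF-10": (0.41, -17.3), "ISF-11": (0.55, 11.6), "ISF-12": (0.38, -21.7),
    "ISF-13": (0.40, -18.9), "ISF-14": (0.45, -8.9), "ISF-15": (0.31, -37.2),
}


def summarize(rows):
    """Mean deviation rate, population CV of corrected yields, and the largest-|deviation| fragment."""
    deviations = [dev for _, dev in rows.values()]
    yields = [y for y, _ in rows.values()]
    cv = pstdev(yields) / mean(yields)
    worst = max(rows, key=lambda k: abs(rows[k][1]))
    return mean(deviations), cv, worst


avg_dev, cv, worst = summarize(table4)
print(round(avg_dev, 1), worst)  # -27.6 ISF-7
```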

[0269] 4.3 Load Reconfiguration Logic Verification

[0270] Figure 5 shows, for sediment sample C1, sediment sample C4, soil sample T2, and sludge sample W6, the relationship between the logarithm of the internal standard sequencing coverage ratio (log B_IS) and the logarithm of the theoretical internal standard ratio (log f_IS) before and after correction, compared with the system standard sample (Zymo standard).

[0271] Without audit correction (blue triangles), the log B_IS versus log f_IS fits of these high-interference matrix samples deviated as a whole from the system's standard trajectory, and their linear slopes were markedly lower than the system baseline slope of the Zymo standard (approximately 1.086): the pre-correction slope was 0.938 for T2, 0.946 for W6, and 0.950 for C4. This indicates that when the apparent Qubit DNA load was used as the denominator, the physical quantification error suppressed the overall sequencing response of the system.

[0272] After the offset vector η was calculated with the real-time yield auditing method of this invention and the nominal DNA load of each sample was reconstructed (orange dots), the log B_IS versus log f_IS relationship of each sample returned to the linear response range defined by the system standard: the slopes recovered to 1.025–1.067, closely matching the system baseline slope, with correlation coefficients of R² ≥ 0.999 in all cases.
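The slope and R² comparison in [0271] and [0272] is an ordinary least-squares fit of log B_IS against log f_IS. The sketch below shows that computation, assuming base-10 logarithms. The input data are synthetic (B_IS = 0.49 × f_IS exactly), chosen only to show that an undistorted proportional response yields a slope of 1 and R² of 1; they are not the measurements behind Figure 5.

```python
import math


def loglog_fit(f_is, b_is):
    """Least-squares fit of log10(B_IS) on log10(f_IS); returns (slope, intercept, r_squared)."""
    x = [math.log10(v) for v in f_is]
    y = [math.log10(v) for v in b_is]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Synthetic illustration only: proportional response with yield 0.49
f = [1e-6, 1e-5, 1e-4, 1e-3]
b = [0.49 * v for v in f]
slope, intercept, r2 = loglog_fit(f, b)
print(round(slope, 3), round(r2, 3))  # 1.0 1.0
```

With a slope of exactly 1, the intercept equals log10 of the yield, which links this plot back to Y_seq.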

[0273] The above results indicate that the quantitative bias observed in high-interference matrix samples does not originate from a failure of the sequencing system itself, but from a systematic shift introduced by distortion of the physical quantification denominator. By auditing the system's sequencing response parameters and reconstructing the load when necessary, this invention substantially restores the quantitative consistency between different matrix samples and the system baseline without altering the inherent response characteristics of the sequencing system.

[0274] Meanwhile, it can be observed that each sample still retains a slight residual bias after correction, reflecting the inherent differences in PCR amplification efficiency, fragment length and sequence composition of different internal standard fragments. This residual bias is within the normal response fluctuation range of the sequencing system and does not affect the function of this invention in identifying and blocking "physical quantitative lies".

[0275] Note that conventional quantitative sequencing methods usually assume that physical quantification results (such as the DNA mass measured with the fluorescent-dye-based Qubit assay) are true and reliable, and lack any audit of their validity. On that assumption, quantitative curves of internal standards or standard products are constructed directly to infer the absolute abundance of target nucleic acids in the sample from the sequencing signal.

[0276] However, in complex environmental matrix samples, the above preconditions often do not hold. When humic acid, polysaccharides, colloidal organic matter, or other interfering substances with fluorescence response or binding properties are present in the sample, the DNA concentration reading output by the physical quantification instrument may be systematically high, resulting in a significant deviation between the "nominal nucleic acid load" used for library construction and the actual amplifiable nucleic acid load in the sample.

[0277] In this situation, existing methods still use the quantitative curve constructed from the nominal abundance directly for absolute quantification, which amounts to fitting and applying a quantitative model with a distorted denominator. This type of bias is usually not exposed by curve linearity indicators (such as the correlation coefficient R²); it persists covertly as an overall slope change or systematic shift, so that the absolute abundance of the target gene is magnified or compressed several-fold in the results, creating a systematic false-positive or false-negative risk that conventional quality control struggles to detect.

[0278] To address the aforementioned issues, this invention introduces an auditing mechanism based on real-time sequencing yield. Before absolute quantification calculation, the consistency between the physical quantification results and the sequencing system response is assessed. This allows for the identification and prevention of quantification risks caused by physical quantification distortion before the quantification curve is actually invoked, ensuring that subsequent absolute quantification calculations are performed only under the premise of consistency between physical load and system sequencing response.

[0279] Example 3: Blind test verification of unknown target mass reconstruction in complex matrix environment

[0280] 1. Purpose of Implementation

[0281] This embodiment aims to verify:

[0282] Whether, in complex matrix environments where the absolute copy number of the target gene is unknown and the physical quantification of the sample is systematically distorted, the audit-load reconstruction method of this invention, based on the real-time sequencing yield offset, can reconstruct the absolute copy number of unknown target nucleic acids with high precision using only exogenous nucleic acid audit signals and without any prior quantitative information about the target gene.

[0283] 2. Experimental Design and Overall Approach

[0284] Test sample matrices: the soil sample (T2), sludge sample (W6), and sediment samples (C1, C4) with typical environmental disturbance characteristics from Example 2.

[0285] Library construction, DNA quality control, and sequencing were performed under the same conditions as in Example 2.

[0286] The sequencing reads were aligned bioinformatically to the internal standard and target DNA sequences, and the matching bases were counted.

[0287] In addition to the setup of Example 2, seven DNA fragments of known absolute copy number (mock targets) were added, with lengths of 150–420 bp and GC contents of 40%–64%.

[0288] 3. Detailed calculation and audit restructuring steps

[0289] S3-1: Derivation of the offset vector η

[0290] As in Example 2, the sequencing base coverage ratio B_IS of the audit group within the sample was read and, combined with its known added mass fraction f_IS, the average real-time sequencing yield of the audit group was calculated according to formula (4):

[0291] $$Y_{seq,s} = \frac{B_{IS,s}}{f_{IS}} \qquad (4)$$

[0292] Further, combined with the system baseline yield Y_ref = 0.49, the sample-specific offset vector is derived according to formula (5):

[0293] $$\eta = \frac{Y_{seq,s}}{Y_{ref}} \qquad (5)$$

[0294] Calculate the effective DNA load: m_ref = m_E / η.

[0295] S3-2: Count the number of effective matching bases n_G,s for each target in each sample, and obtain B_G,s through formula (3).

[0296] S3-4: Combining η from step S3-1 with the reconstructed reference load m_ref, calculate the DNA mass of each mock target:

[0297]

[0298] where m_G,i is the absolute mass (ng) of the i-th mock target in the sample DNA library.

[0299] Then convert to copy number:

[0300]
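The mass-to-copy-number conversion of [0299] is not reproduced in this section. A commonly used form is copies = m × N_A / (L × M_bp), where N_A is the Avogadro constant, L the fragment length in bp, and M_bp the mean mass per base pair of double-stranded DNA. The sketch below assumes M_bp = 660 g/mol (some protocols use 650) and is not necessarily the exact expression used in this embodiment; the 300 bp, 1 ng input is hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # mol^-1, exact SI value
BP_MASS = 660.0           # g/mol per dsDNA base pair (assumption; 650 is also common)


def copies_from_mass(mass_ng, length_bp, bp_mass=BP_MASS):
    """Copy number of a dsDNA fragment of length_bp bp present at mass_ng ng."""
    if mass_ng < 0 or length_bp <= 0:
        raise ValueError("mass must be non-negative and length must be positive")
    moles = (mass_ng * 1e-9) / (length_bp * bp_mass)
    return moles * AVOGADRO


# Hypothetical example: 1 ng of a 300 bp mock target (inside the 150-420 bp range used here)
copies = copies_from_mass(1.0, 300)
print(f"{copies:.3e} copies, log10 = {math.log10(copies):.2f}")
```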

[0301] Figure 6 shows the results of blind absolute copy number determination for seven unknown target genes (Target1–Target7) in complex matrix environments.

[0302] The horizontal axis represents the different target gene numbers, and the vertical axis represents the logarithm of the absolute copy number of the target gene (log10). Among them, the blue bars represent the theoretically added target gene copy number, and the orange, gray, yellow, and dark blue bars represent the predicted copy number results calculated by the audit-reconstruction method of this invention in different complex matrix samples (T2 soil, C1 sediment, W6 sludge, and C4 sediment).

[0303] The results showed that for all seven target genes, under different complex matrix conditions, the predicted copy numbers were highly consistent with the theoretical values, and the overall error was controlled within the same order of magnitude, with no systematic overestimation or underestimation. Especially in highly interfering samples (such as T2 soil and W6 sludge), although the physical quantification stage had been shown in Example 2 to have a significant risk of load distortion, the predicted copy numbers of each target gene still stably returned to the theoretical reference level after the real-time sequencing yield audit and load reconstruction of this invention.

[0304] Further comparison of different target genes reveals that this consistency does not depend on the copy number level of the target gene itself: both high-copy targets (log10 approximately 8.1–8.3) and low-copy targets (log10 approximately 6.2–6.4) exhibit similar prediction accuracy. This indicates that the method of this invention corrects sample-level system load shifts, rather than making empirical corrections for a specific target sequence.

[0305] The above results demonstrate that, in complex matrix environments where the absolute copy number of the target is unknown and the physical quantification of the sample is systematically distorted, the present invention can achieve stable and accurate absolute copy number restoration of independent target nucleic acids by relying solely on the system parameters derived from audited and calibrated nucleic acids. This verifies the universality and engineering reliability of the method in real-world application scenarios.

[0306] Research and development concept and technological breakthroughs of the present invention

[0307] To address the systemic deficiencies of existing technologies, this invention builds on the qmNGS yield-system correction model proposed by Li et al. (2021) and shifts the paradigm from an "arithmetic tool" to an "audit engine." Li et al. (2021) established the mathematical basis for describing the state of the sequencing system with the internal standard yield Y_seq, but for complex real-world matrices, simple mathematical fitting can no longer address underlying logical failures. Rather than passively accepting sample parameters, this invention constructs a decision-making architecture with proactive auditing capabilities. Its core technological breakthroughs are as follows:

[0308] 1. Introduce "offset vector η" as the core audit variable to expose physical quantitative lies.

[0309] The essence of innovation: Traditional technologies blindly rely on the physical readings of DNA samples as the denominator, while this invention introduces "logic self-consistency of sequencing yield" into the audit process for the first time.

[0310] Technological breakthrough: by calculating in real time the offset vector η between the internal standard feedback signal and the nominal sample load m_sample, the system can automatically identify non-biomass loads in the sample (such as free nucleotides and fluorescent impurities) that are "detectable but not sequenceable."

[0311] Addressing pain points: It completely eliminates the "denominator lie" caused by the initial physical reading being too high, and achieves accurate identification and load correction of "water-injected samples".

[0312] 2. Construct a backtracking mechanism based on the "mass dimension" to break the logical loop of qPCR validation.

[0313] The essence of innovation: It abandons the traditional logic of relying on "volume concentration" for verification and establishes a biochemical audit dimension based on "absolute mass load".

[0314] Technological breakthrough: using the returned signal strength of the multi-gradient internal standards, the true effective template load in the sample can be back-calculated. This logic does not rely on the initial readings of the front-end physical equipment, and thus enables a third-party biochemical audit independent of physical quantification.

[0315] Addressing pain points: It breaks down the "audit blind spot" caused by dimensional misalignment and denominator dependence in qPCR validation, transforming load deviation from an unverifiable black box into a quantifiable deterministic indicator.

[0316] 3. Introduce the "Path A / B / C" multi-path automatic traffic-splitting and decision circuit-breaker mechanism.

[0317] The essence of innovation: this represents a shift from "passive calculation" to "proactive quality control decision-making," giving the absolute quantification system autonomous auditing and risk interception (circuit breaking) capabilities.

Technological breakthroughs:

Path A (high-confidence admission): when the audit vector η lies within the ideal threshold range, the system determines that the physical load and the biochemical signal match closely and automatically outputs high-confidence data.

Path B (risk admission): when η shows a predictable systematic shift, the system identifies "background noise" in the front-end physical quantification and activates the denominator correction matrix, correcting the initial load error before outputting reliable data.

Path C (immediate blocking): when the audit detects a severely excessive offset vector η and the actual effective template quantity is extremely low (e.g., an extreme sample with a nominal load of 300 ng but only 15.9 ng of actual DNA), the system determines that the sample has undergone underlying logical collapse, immediately triggers a forced circuit-breaker command, and blocks misleading results.

Addressing pain points: an industrial-grade "safety valve" for absolute quantification is established, ending the practice in existing technologies of outputting data regardless of sample validity, and ensuring the underlying authenticity of every data set in clinical diagnosis and rigorous scientific research.
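The three-way routing can be sketched with the inequalities of claim 5: Path A for T1_low ≤ η ≤ T1_high, Path B for T2 < η < T1_low or T1_high < η < T_max, and Path C otherwise. In this minimal sketch, T1_low and T1_high follow the [0.75, 1.25] window of Example 1, while t2 and t_max are hypothetical placeholders (no numeric values are given for them) and are not fitted to Table 3. The function and enum names are illustrative.

```python
from enum import Enum


class Path(Enum):
    A = "high-confidence admission"
    B = "risk admission with load reconstruction"
    C = "logic-collapse circuit breaker"


def audit_path(eta, t1_low=0.75, t1_high=1.25, t2=0.25, t_max=10.0):
    """Route a sample to Path A, B or C from its offset vector eta.

    t2 and t_max are hypothetical placeholder thresholds.
    """
    if not t2 < t1_low <= t1_high < t_max:
        raise ValueError("thresholds must satisfy t2 < t1_low <= t1_high < t_max")
    if t1_low <= eta <= t1_high:
        return Path.A
    if t2 < eta < t_max:  # outside the window but inside the correctable band
        return Path.B
    return Path.C  # eta <= t2 or eta >= t_max (boundary values sent to C by choice)


for eta in (1.0, 4.31, 17.0):
    print(eta, audit_path(eta).name)  # prints 1.0 A, then 4.31 B, then 17.0 C
```

Claim 5 uses strict inequalities for the circuit-breaker band, so the treatment of η exactly equal to T2 or T_max is left open; the sketch routes such values to Path C as the conservative choice.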

[0318] Alternative solutions are as follows:

[0319] Algebraic substitution of audit criteria indicators (to prevent algorithmic circumvention)

[0320] In practical implementation, the calculation of the offset vector η can be replaced by the following equivalent or related indices:

[0321] Reads percentage deviation audit: instead of explicitly calculating the absolute yield Y_seq, directly monitor the deviation between the real-time proportion of internal standard reads among total effective reads and its theoretical sampling proportion.

[0322] Abundance ratio constancy audit: Without calculating yield, monitor whether the ratio between internal standard members with different sequence characteristics (such as different GC content or length) remains constant. If the ratio undergoes non-linear distortion with matrix changes, it is considered an audit anomaly.

[0323] Statistical probability distribution audit: Use machine learning models (such as support vector machines or random forests) to extract feature vectors of internal standard coverage, and use a classifier to determine whether a sample belongs to the "pass" or "risk" state, rather than using rigid algebraic thresholds.
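The abundance ratio constancy audit of [0322] can be sketched as a comparison between the observed coverage ratio of two internal standard members and their expected ratio. The 20% relative tolerance, the function name, and the example coverages below are hypothetical; the document does not specify a tolerance.

```python
def ratio_constancy_anomaly(cov_a, cov_b, expected_ratio, tolerance=0.2):
    """Return True when the observed coverage ratio of internal-standard members a and b
    departs from the expected ratio by more than `tolerance` (relative deviation)."""
    if cov_a < 0 or cov_b <= 0 or expected_ratio <= 0:
        raise ValueError("cov_a must be non-negative; cov_b and expected_ratio must be positive")
    observed = cov_a / cov_b
    return abs(observed / expected_ratio - 1.0) > tolerance


# Hypothetical members: a high-GC and a low-GC internal standard spiked at a 2:1 mass ratio
print(ratio_constancy_anomaly(0.0020, 0.0010, 2.0))  # False: ratio preserved
print(ratio_constancy_anomaly(0.0030, 0.0010, 2.0))  # True: ratio distorted by about 50%
```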

[0324] Reason for non-replaceability: the above indicators are merely different mathematical representations of η; their auditing essence, relying on sequencing feedback signals to verify the authenticity of the load, remains unchanged.

[0325] Logical Evolution and Alternatives of Audit Decision-Making Paths (Preventing Process Circumvention)

[0326] For the traffic splitting decision for paths A / B / C, the following alternative processes exist:

[0327] Tiered confidence scaling: The system does not forcibly interrupt (circuit break) the data, but instead assigns a "confidence score" determined by η to each set of output absolute quantitative data.

[0328] Biochemical feedback correction loop: When an audit detects an offset, the system automatically issues physical instructions such as "dilution and retest" or "increase sequencing depth" instead of performing corrections at the algorithm level.

[0329] Reason for irreplaceability: This "feedback improvement" is still based on the audit judgment logic proposed in this invention.

[0330] Physical / biochemical remedial attempts (comparative defect solutions)

[0331] To eliminate the influence of the matrix, existing technologies often attempt the following physical approaches, but none of them can replace the auditing system of this invention:

[0332] High-intensity matrix purification protocol: This protocol employs multiple rounds of inhibitor removal kits to attempt to eliminate matrix interference with physical quantification methods such as Qubit.

[0333] Matrix-specific standard curve compensation: A dedicated quantitative compensation model is established for each specific matrix type.

[0334] Limitations: Physical purification cannot quantify residual interference and results in the loss of low-abundance templates; dedicated standard curves cannot handle heterogeneous drift between samples. The "real-time audit logic" of this invention (i.e., independent auditing for each sample) has overwhelming advantages in terms of versatility, cost-effectiveness, and data consistency.

[0335] The basic principles of the present invention have been described above with reference to specific embodiments. However, it should be noted that the advantages, benefits, and effects mentioned in the present invention are merely examples and not limitations, and should not be considered as essential features of each embodiment of the present invention. Furthermore, the specific details disclosed above are for illustrative and facilitative purposes only, and are not limitations. These details do not limit the present invention to the necessity of employing the aforementioned specific details.

[0336] The block diagrams of components, apparatuses, devices, and systems involved in this invention are merely illustrative examples and are not intended to require or imply that they must be connected, arranged, or configured in the manner shown in the block diagrams. As those skilled in the art will recognize, these components, apparatuses, devices, and systems can be connected, arranged, and configured in any manner. Words such as "comprising," "including," "having," etc., are open-ended terms meaning "including but not limited to," and are used interchangeably with that phrase. The terms "or" and "and" as used herein mean "and / or," and are used interchangeably with it unless the context clearly indicates otherwise. The term "such as" as used herein means "such as but not limited to," and is used interchangeably with that phrase.

[0337] It should also be noted that in the apparatus, device, and method of the present invention, the components or steps can be disassembled and / or recombined. These disassemblies and / or recombinations should be considered as equivalent solutions of the present invention.

[0338] The above description of the disclosed aspects is provided to enable any person skilled in the art to make or use the invention. Various modifications to these aspects will be readily apparent to those skilled in the art, and the general principles defined herein can be applied to other aspects without departing from the scope of the invention. Therefore, the invention is not intended to be limited to the aspects shown herein, but rather to be carried out within the widest scope consistent with the principles and novel features disclosed herein.

[0339] It should be understood that the qualifying terms "first", "second", "third", "fourth", "fifth" and "sixth" used in the description of the embodiments of the present invention are only used to more clearly illustrate the technical solutions and are not intended to limit the scope of protection of the present invention.

[0340] The above description has been given for purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of the invention to the forms disclosed herein. Although numerous exemplary aspects and embodiments have been discussed above, those skilled in the art will recognize certain variations, modifications, alterations, additions, and sub-combinations therein.

Claims

1. A nucleic acid quantification admission determination method based on sequencing response feature auditing, characterized in that it comprises the following steps: S1: sample processing and initial mass calibration: nucleic acid is extracted from test samples of different matrix types; the apparent mass of total nucleic acid in each sample is obtained by a physical detection method; and an exogenous nucleic acid internal standard cluster is added in a defined proportion, the cluster consisting of M artificially synthesized double-stranded DNA sequences, with an internal standard mass addition ratio f_IS in the feasible range of 0.01%-1%; S2: library construction and sequencing: a library is constructed from the mixture of sample DNA and internal standard, and the qualified library is sequenced to obtain raw sequencing data; S3: sequencing data processing and base counting: after quality control and sequence alignment of the raw sequencing data, the total number of effective sequencing bases n_TOT,s, the number of bases covered by the internal standard n_IS,s, and the number of bases covered by the target gene n_G,s are counted, and the internal standard coverage ratio B_IS,s and the target gene coverage ratio B_G,s are calculated; S4: raw sequencing yield calculation and establishment of the deviation audit index: from the internal standard coverage data obtained by sequencing and the initial addition ratio, the raw sequencing yield of the sample is calculated by the formula Y_seq,s = B_IS,s / f_IS, the system baseline yield Y_ref is retrieved, and the audit offset vector η is calculated by the formula η = Y_seq,s / Y_ref; S5: asymmetric audit admission determination based on η: a self-consistency threshold T1 and a circuit-breaker threshold T2 are preset, the sample is assigned to one of three levels according to the value range of η (high-confidence admission; risk admission and correction guidance; logic-collapse circuit breaker), and the corresponding automated decision action is executed.

2. The method according to claim 1, characterized in that, in step S1, the initial quality calibration comprises: determining the double-stranded DNA concentration $C_{ph}$ of the sample by a fluorescent dye method; assessing impurity content from the A260/280 and A260/230 ratios determined by absorbance; and calculating the initial total physical mass as $m_E = C_{ph} \times V$, where V is the volume of the nucleic acid solution. The instruments are used only after at least two-point calibration with standard materials. The system synchronously records the sample matrix type and the A260/230 ratio; if A260/230 < 1.0, the sample is marked as a high-risk sample for impurities, and this flag is linked to the subsequent judgment logic of S4 to help interpret the offset of η.
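The calibration record of claim 2 can be sketched as a small data structure. This is a hypothetical illustration: the class name, the field names, and the unit choice (concentration in ng/µL, volume in µL, so that $m_E$ comes out in ng) are assumptions, not part of the claim.

```python
from dataclasses import dataclass


@dataclass
class InitialCalibration:
    """Hypothetical S1 calibration record for one sample."""
    matrix_type: str        # recorded alongside the absorbance ratios, e.g. "fecal"
    c_ph_ng_per_ul: float   # dsDNA concentration C_ph from the fluorescent dye assay
    volume_ul: float        # volume V (unit ASSUMED to be microlitres)
    a260_280: float         # absorbance ratio A260/280
    a260_230: float         # absorbance ratio A260/230

    @property
    def m_e_ng(self) -> float:
        """Initial total physical mass m_E = C_ph x V (ng under the assumed units)."""
        return self.c_ph_ng_per_ul * self.volume_ul

    @property
    def high_impurity_risk(self) -> bool:
        """Claim 2: A260/230 < 1.0 marks the sample as high-risk for impurities."""
        return self.a260_230 < 1.0
```

A fecal extract at 12.5 ng/µL in 40 µL would give $m_E$ = 500 ng, and an A260/230 of 0.8 would raise the impurity flag that later helps interpret η.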

3. The method according to claim 1, characterized in that the double-stranded DNA sequences of the exogenous nucleic acid internal standard cluster in S1 are 150-1500 bp in length, have a GC content of 40%-60%, and have a purity of ≥95% after HPLC purification; moreover, each internal standard sequence is uniquely identifiable against the genomic background of the sample to be tested.
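The length and GC constraints of claim 3 lend themselves to an in-silico screen before synthesis. The sketch below uses a hypothetical helper `passes_design_rules` that checks only those two properties. HPLC purity is a wet-lab property, and the uniqueness requirement would need an alignment search against the host genome; neither is modelled here. Non-ACGT characters are simply counted in the denominator.

```python
def gc_content(seq: str) -> float:
    """Fraction of G/C bases in a DNA sequence (case-insensitive)."""
    s = seq.upper()
    if not s:
        raise ValueError("empty sequence")
    return (s.count("G") + s.count("C")) / len(s)


def passes_design_rules(seq: str) -> bool:
    """Check the claim 3 length (150-1500 bp) and GC (40%-60%) constraints.

    Uniqueness against the sample's genomic background is NOT checked;
    that would require an alignment search.
    """
    return 150 <= len(seq) <= 1500 and 0.40 <= gc_content(seq) <= 0.60
```

A 200 bp candidate built from repeated `ACGT` has 50% GC and passes, while a pure AT repeat of the same length fails on GC content.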

4. The method according to claim 1, characterized in that the system baseline yield $Y_{ref}$ in S4 is the arithmetic mean of the raw sequencing yields $Y_{seq}$ obtained from multiple parallel determinations of a standard mock DNA sample under ideal experimental conditions, free of matrix inhibition and physical quantification interference.
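Claim 4 reduces to averaging replicate yields. A minimal sketch follows, assuming the per-replicate $Y_{seq}$ values have already been computed from the mock DNA runs; reading "multiple" as "at least two" is our assumption, and the function name is hypothetical.

```python
from statistics import fmean


def baseline_yield(replicate_y_seq: list[float]) -> float:
    """Y_ref as the arithmetic mean of replicate Y_seq values (claim 4).

    ASSUMPTION: "multiple parallel determinations" is enforced as at least two.
    """
    if len(replicate_y_seq) < 2:
        raise ValueError("need at least two parallel determinations")
    return fmean(replicate_y_seq)
```

`statistics.fmean` is used because it always returns a float and is faster than `statistics.mean` for plain float inputs.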

5. The method according to claim 1, characterized in that, in S5: high-confidence admission is determined when $T_{1,low} \le \eta \le T_{1,high}$; the system judges the initial physical quantification and the biochemical yield to be highly consistent, automatically releases the data, and directly outputs the quantification result. Risk admission and correction guidance is determined when $T_2 < \eta < T_{1,low}$ or $T_{1,high} < \eta < T_{max}$; the system starts the payload reconfiguration process, calculates the reference effective DNA load as $m_{ref} = m_E / \eta$, and performs a deep reduction calculation based on multidimensional error weights. The logic-collapse circuit breaker is determined when $\eta < T_2$ or $\eta > T_{max}$; the system forcibly intercepts the data output stream, issues a warning, and recommends dilution and retesting.
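The three-tier decision of claim 5 can be sketched as a classifier. Threshold values are not given in the claims, so the numbers in the usage note are hypothetical. The "deep reduction calculation based on multidimensional error weights" is not specified and is not modelled. Because the claim leaves η = T2 and η = T_max unassigned, this sketch assumes those exact values trip the circuit breaker.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Tier(Enum):
    HIGH_CONFIDENCE = "high-confidence admission"
    RISK_CORRECTION = "risk admission and correction guidance"
    CIRCUIT_BREAKER = "logic-collapse circuit breaker"


@dataclass(frozen=True)
class Thresholds:
    t2: float       # circuit-breaker lower threshold T2
    t1_low: float   # self-consistency window lower bound T_1,low
    t1_high: float  # self-consistency window upper bound T_1,high
    t_max: float    # circuit-breaker upper threshold T_max

    def __post_init__(self) -> None:
        if not (0 < self.t2 < self.t1_low <= self.t1_high < self.t_max):
            raise ValueError("expected 0 < T2 < T1_low <= T1_high < T_max")


def classify(eta: float, m_e: float, th: Thresholds) -> tuple[Tier, Optional[float]]:
    """Return the tier and, for the risk tier only, m_ref = m_E / eta."""
    if th.t1_low <= eta <= th.t1_high:
        return Tier.HIGH_CONFIDENCE, None
    if th.t2 < eta < th.t_max:
        # Outside the self-consistency window but within the breaker limits.
        return Tier.RISK_CORRECTION, m_e / eta
    # eta <= T2 or eta >= T_max; exact boundary values ASSUMED to trip the breaker.
    return Tier.CIRCUIT_BREAKER, None
```

With hypothetical thresholds T2 = 0.3, T1 window [0.8, 1.2], and T_max = 3.0, a sample with $m_E$ = 500 ng and η = 0.5 lands in the risk tier with $m_{ref}$ = 1000 ng, whereas η = 0.2 is intercepted.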

6. The method according to claim 1, characterized in that, in step S2, library construction comprises end repair, adapter ligation, PCR amplification, magnetic-bead purification of the library, and library quality control. The quality control requires a library fragment size of 200-500 bp and a Qubit-quantified library concentration of ≥1 nM; the sequencing loading concentration is 1.0 nM, and the sequencing depth is at least 10 M reads.
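The library acceptance criteria of claim 6 can be expressed as a simple gate. In this hypothetical sketch the fragment size is assumed to be a single summary value (for example the main peak from a fragment analyser), which the claim does not specify; class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class LibraryQC:
    fragment_size_bp: float  # ASSUMED single summary value, e.g. main peak size
    qubit_conc_nm: float     # library concentration quantified by Qubit, nM


def library_qc_failures(qc: LibraryQC) -> list[str]:
    """List the claim 6 library criteria that the library fails (empty = pass)."""
    failures = []
    if not 200 <= qc.fragment_size_bp <= 500:
        failures.append("fragment size outside 200-500 bp")
    if qc.qubit_conc_nm < 1.0:
        failures.append("library concentration below 1 nM")
    return failures


def depth_ok(read_count: int, min_reads: int = 10_000_000) -> bool:
    """Check the minimum sequencing depth of 10 M reads."""
    return read_count >= min_reads
```

Returning the list of failed criteria, rather than a single boolean, lets the report state why a library was rejected.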

7. The method according to claim 1, characterized in that the quality control criteria for the raw sequencing data in S3 are: removal of adapter sequences, of low-quality bases with a Q value < 20, and of short fragments < 50 bp in length; for sequence alignment, the mismatch rate is ≤ 2 and the similarity is ≥ 90%, and only uniquely aligned reads are retained for base counting.
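One possible reading of the claim 7 filters is sketched below. Several points are assumptions. Adapter removal is taken to have happened upstream. "Removing low-quality bases with Q < 20" is implemented as trimming from the 3' end (one common interpretation), with Phred+33 quality encoding. "Mismatch rate ≤ 2" is read as at most two mismatches per alignment. All function and class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


def phred_scores(qual: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 ASSUMED)."""
    return [ord(ch) - offset for ch in qual]


def trim_3prime(seq: str, qual: str, min_q: int = 20) -> tuple[str, str]:
    """Trim trailing bases whose Phred score is below min_q."""
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def keep_read(seq: str, qual: str, min_len: int = 50,
              min_q: int = 20) -> Optional[tuple[str, str]]:
    """Trim the read, then drop it if it is shorter than min_len bp."""
    s, q = trim_3prime(seq, qual, min_q)
    return (s, q) if len(s) >= min_len else None


@dataclass
class Alignment:
    mismatches: int    # ASSUMED reading of "mismatch rate <= 2" as a count
    similarity: float  # fraction of identical aligned bases, 0-1
    unique: bool       # True if the read aligns to a single locus


def keep_alignment(aln: Alignment) -> bool:
    """Retain only unique alignments with <= 2 mismatches and >= 90% similarity."""
    return aln.mismatches <= 2 and aln.similarity >= 0.90 and aln.unique
```

In practice these checks would usually be delegated to a dedicated trimmer and aligner; the sketch only makes the thresholds explicit.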

8. The method according to any one of claims 1-7, characterized in that the samples to be tested include solid samples, fecal samples, liquid samples, and filter-membrane-enriched samples, and nucleic acid extraction is performed on each matrix type using a corresponding dedicated nucleic acid extraction kit.

9. A nucleic acid quantification admission determination system based on sequencing response feature auditing, characterized in that it comprises:
a sample processing module, used for sample processing and initial quality calibration: it extracts nucleic acids from test samples of different matrix types, obtains the apparent mass of total nucleic acid by physical detection methods, and adds an exogenous nucleic acid internal standard cluster in a defined proportion, the cluster consisting of M artificially synthesized double-stranded DNA sequences with an addition mass ratio $f_{IS}$ in the feasible range of 0.01%-1%;
a sequencing module, used for library construction and sequencing: it constructs a library from the mixture of sample DNA and internal standard and sequences the qualified library to obtain raw sequencing data;
a data processing module, used for sequencing data processing and base counting: after quality control and sequence alignment of the raw sequencing data, it counts the total number of effective sequencing bases $n_{TOT,s}$, the number of bases covered by the internal standard $n_{IS,s}$, and the number of bases covered by the target gene $n_{G,s}$, and calculates the internal standard coverage ratio $B_{IS,s}$ and the target gene coverage ratio $B_{G,s}$;
an audit judgment module, used for raw sequencing yield calculation and establishment of the deviation audit index: from the internal standard coverage obtained by sequencing and the initial addition ratio, it calculates the raw sequencing yield of the sample as $Y_{seq,s} = B_{IS,s} / f_{IS}$, retrieves the system baseline yield $Y_{ref}$, and calculates the audit offset vector $\eta = Y_{seq,s} / Y_{ref}$;
an early warning module, used for asymmetric audit admission determination based on η: it presets the self-consistency threshold T1 and the circuit-breaker threshold T2, assigns the sample to one of three levels according to the value range of η (high-confidence admission; risk admission and correction guidance; logic-collapse circuit breaker), and executes the corresponding automated decision action.