Marker combination for assisting diagnosis of depression and its severity grading and application thereof

CN122189176A · Pending · Publication Date: 2026-06-12 · NANJING UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications (China)
Current Assignee / Owner
NANJING UNIV
Filing Date
2026-04-02
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

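Ambiguous symptom descriptions, differences in individuals' ability to report emotional experiences and somatic symptoms, and inconsistent interpretation of diagnostic criteria by clinicians often bias the assessment of disease severity.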

Benefits of technology

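1. Provides a highly accurate, objectively verifiable diagnostic tool that can diagnose depression and distinguish its severity, effectively compensating for the shortcomings of existing subjective assessment methods;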

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122189176A_ABST

Abstract

This application provides a marker combination for assisting in the diagnosis of depression and the grading of its severity, together with applications thereof. The markers comprise 17 circulating small RNAs, whose sequences are shown in SEQ ID No. 1 to SEQ ID No. 17, respectively. The technical solution provided by this application has the following notable beneficial effects: a biomarker combination of 17 circulating small RNAs is screened and validated for the first time; the combination offers high diagnostic performance, objective quantification, non-invasive detection, and high clinical feasibility; it provides a revolutionary tool for the accurate differential diagnosis of depression and the stratified management of mild and severe cases; and it has great clinical value and social significance.

Description

Technical Field

[0001] This application relates to the field of biotechnology, and specifically to a biomarker combination for the auxiliary diagnosis of depression and its severity grading, and applications thereof.

Background Art

[0002] Depression is a common mental disorder with marked clinical heterogeneity. Based on symptom severity, duration, and degree of functional impairment, depressive disorders can be subdivided into mild, moderate, and severe subtypes. Major depressive disorder (MDD), the most severe and disabling form, is one of the leading causes of disease burden and functional loss worldwide. Standardized tools such as the Diagnostic and Statistical Manual of Mental Disorders (DSM) currently provide a framework for diagnosing depression, and scales such as the Hamilton Depression Rating Scale (HAMD) are widely used to assess symptom severity. However, the current diagnostic and assessment system relies mainly on patients' subjective descriptions of their symptoms and on clinicians' experience-based judgment, without support from objective, quantifiable biological indicators. This dependence on subjective assessment shows clear limitations when distinguishing within the depression spectrum, that is, between ordinary depressive states and major depressive disorder. Ambiguous symptom descriptions, differences in individuals' ability to report emotional experiences and somatic symptoms, and inconsistent interpretation of diagnostic criteria by clinicians often bias the assessment of disease severity. As a result, the condition of some patients with major depressive disorder may be underestimated, so that they do not receive timely, intensive, and appropriate treatment; conversely, some individuals with mild or situational depression may be overdiagnosed with major depressive disorder and exposed to unnecessary medication and its potential side effects.
Although the HAMD score can reflect functional impairment and disease burden to some extent, it is essentially a quantification of subjective symptoms and cannot look beyond surface symptoms to the underlying biology that distinguishes levels of disease severity. This symptom-level assessment is particularly unreliable in primary care or non-psychiatric settings, making accurate stratified diagnosis and management difficult. Clinical practice therefore urgently needs biological tools that can objectively distinguish depression subtypes by severity and, in particular, accurately identify major depressive disorder. In recent years, circulating small RNAs (sRNAs) have been regarded as a promising source of biomarkers for neuropsychiatric diseases because of their stability in body fluids, disease-specific expression changes, and ease of non-invasive sampling. This study explores objective detection methods based on circulating sRNA expression profiles to compensate for the shortcomings of current subjective diagnostic systems in distinguishing the severity of depression, especially in reliably identifying major depressive disorder, and to provide key evidence for individualized treatment strategies.

Summary of the Invention

[0003] To address the technical limitations described above, this application proposes a biomarker combination for assisting in the diagnosis of depression and its severity grading, and applications thereof, thereby overcoming the deficiencies and defects described in the background art.

[0004] To achieve the above objectives, this application adopts the following technical solution. The first inventive point of this application is to provide a biomarker combination for assisting in the diagnosis of depression and its severity grading. The biomarker combination includes one or more of the following: rsRNA-#146, rsRNA-#69, rsRNA-#134, tsRNA-#98, rsRNA-#216, rsRNA-#22, rsRNA-#217, miR-151b, rsRNA-#267, miR-766-3p, piR-187496, miR-425-5p, rsRNA-#134, miR-128-3p, rsRNA-#182, miR-766-5p, and rsRNA-#82; their sequences are shown in SEQ ID No. 1 to SEQ ID No. 17, respectively.

[0005] The sequences of SEQ ID No. 1 to SEQ ID No. 17 are shown in Table 1.

[0006] Table 1

[0007] The second inventive point of this application is to provide the application of a primer combination in the preparation of a reagent for the auxiliary diagnosis of depression and its severity grading, wherein the primer combination detects a marker combination that includes one or more of the following: rsRNA-#146, rsRNA-#69, rsRNA-#134, tsRNA-#98, rsRNA-#216, rsRNA-#22, rsRNA-#217, miR-151b, rsRNA-#267, miR-766-3p, piR-187496, miR-425-5p, rsRNA-#134, miR-128-3p, rsRNA-#182, miR-766-5p, and rsRNA-#82; their sequences are shown in SEQ ID No. 1 to SEQ ID No. 17, respectively.

[0008] Optionally, in the above application, the primer combination includes primers for detecting at least one sRNA marker among rsRNA-#146, rsRNA-#69, rsRNA-#134, tsRNA-#98, rsRNA-#216, rsRNA-#22, rsRNA-#217, miR-151b, rsRNA-#267, miR-766-3p, piR-187496, miR-425-5p, rsRNA-#134, miR-128-3p, rsRNA-#182, miR-766-5p, and rsRNA-#82. The primer combination consists of reverse transcription primers and primer pairs, wherein: the nucleotide sequences of the reverse transcription primers for detecting rsRNA-#146, rsRNA-#69, rsRNA-#134, tsRNA-#98, rsRNA-#216, rsRNA-#22, rsRNA-#217, miR-151b, rsRNA-#267, miR-766-3p, piR-187496, miR-425-5p, rsRNA-#134, miR-128-3p, rsRNA-#182, miR-766-5p, and rsRNA-#82 are shown in SEQ ID No. 18 to SEQ ID No. 34; and the primer pair for detecting each of these markers consists of an upstream primer and a downstream primer, the nucleotide sequences of the upstream primers being shown in SEQ ID No. 35 to SEQ ID No. 51 and those of the downstream primers in SEQ ID No. 52 to SEQ ID No. 68.
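As a reading aid, the sketch below maps each marker's position in the list above to its SEQ ID numbers. It assumes that every primer SEQ ID range follows the same order as the marker list (the text states this ordering explicitly only for SEQ ID No. 1-17); `seq_ids` and the dictionary keys are illustrative names, not part of the application.

```python
# Marker order exactly as listed in the application (1-based positions).
# "rsRNA-#134" is listed at both position 3 and position 13, so lookups
# are done by position rather than by name.
MARKERS = [
    "rsRNA-#146", "rsRNA-#69", "rsRNA-#134", "tsRNA-#98", "rsRNA-#216",
    "rsRNA-#22", "rsRNA-#217", "miR-151b", "rsRNA-#267", "miR-766-3p",
    "piR-187496", "miR-425-5p", "rsRNA-#134", "miR-128-3p", "rsRNA-#182",
    "miR-766-5p", "rsRNA-#82",
]


def seq_ids(position: int) -> dict:
    """SEQ ID numbers for the marker at a 1-based position, assuming every
    SEQ ID range is assigned in the same order as the marker list."""
    if not 1 <= position <= len(MARKERS):
        raise ValueError("position must be between 1 and 17")
    return {
        "marker": MARKERS[position - 1],
        "target": position,                  # SEQ ID No. 1-17
        "rt_primer": 17 + position,          # SEQ ID No. 18-34
        "upstream_primer": 34 + position,    # SEQ ID No. 35-51
        "downstream_primer": 51 + position,  # SEQ ID No. 52-68
    }


print(seq_ids(1))
print(seq_ids(17))
```

Keying by position rather than by marker name avoids silently overwriting one of the two `rsRNA-#134` entries.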

[0009] The primers are summarized in Table 2 (primers supplied by GENERAL BIOL).

[0010] Table 2

[0011] The third inventive point of this application is to provide the application of a biomarker combination in the preparation of a kit for assisting in the diagnosis of depression and its severity grading, wherein the kit is a real-time quantitative reverse transcription PCR (qRT-PCR) kit and the biomarker combination is the one described above.

[0012] Optionally, in the above application, the kit includes a primer set for assisting in the diagnosis of depression and its severity grading. The primer set comprises primers for detecting at least one sRNA marker among rsRNA-#146, rsRNA-#69, rsRNA-#134, tsRNA-#98, rsRNA-#216, rsRNA-#22, rsRNA-#217, miR-151b, rsRNA-#267, miR-766-3p, piR-187496, miR-425-5p, rsRNA-#134, miR-128-3p, rsRNA-#182, miR-766-5p, and rsRNA-#82. The primer set consists of reverse transcription primers and primer pairs, wherein: the nucleotide sequences of the reverse transcription primers for detecting rsRNA-#146, rsRNA-#69, rsRNA-#134, tsRNA-#98, rsRNA-#216, rsRNA-#22, rsRNA-#217, miR-151b, rsRNA-#267, miR-766-3p, piR-187496, miR-425-5p, rsRNA-#134, miR-128-3p, rsRNA-#182, miR-766-5p, and rsRNA-#82 are shown in SEQ ID No. 18 to SEQ ID No. 34; and the primer pair for detecting each of these markers consists of an upstream primer and a downstream primer, the nucleotide sequences of the upstream primers being shown in SEQ ID No. 35 to SEQ ID No. 51 and those of the downstream primers in SEQ ID No. 52 to SEQ ID No. 68.

[0013] Optionally, in the above applications, the kit further includes a total sRNA extraction reagent set and an RNA reverse transcription reagent set.

[0014] Optionally, in the above application, the total sRNA extraction reagent set includes a protein-denaturing lysis buffer, chloroform, ethanol, and a wash buffer.

[0015] Optionally, in the above application, the RNA reverse transcription reagent set includes reverse transcription primers, a reverse transcription reaction buffer, and a reverse transcriptase mix.

[0016] Preferably, RNA is extracted from plasma samples using the RNA extraction kit (catalog no. DP419) from Tiangen Biotech (Beijing) Co., Ltd., according to the manufacturer's instructions.

[0017] RT-qPCR was performed using a single-channel, one-step method. The reaction system was prepared as shown in Table 3.

Table 3

[0018] Optionally, in the above applications, the kit further includes a primer for amplifying the internal reference gene U6; the sequence of the primer for amplifying the internal reference gene U6 is: Forward primer for internal reference gene U6: GCTTGGCAGCACATATACTAAA (SEQ ID No. 69); Reverse primer for internal reference gene U6: TTTGCGTGTCATCCTTGCG (SEQ ID No. 70).
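The text names U6 as the internal reference but does not state the quantification formula. Assuming the widely used 2^-ΔΔCt relative-quantification method, a minimal sketch of U6 normalization looks like this; the function name and the Ct values are hypothetical.

```python
def fold_change_ddct(ct_target, ct_u6, ct_target_ref, ct_u6_ref):
    """Relative expression by the 2^-ΔΔCt method, normalized to U6.
    Assumes close to 100% amplification efficiency for target and U6."""
    delta_ct_sample = ct_target - ct_u6               # ΔCt in the test sample
    delta_ct_reference = ct_target_ref - ct_u6_ref    # ΔCt in the reference (e.g. control) sample
    delta_delta_ct = delta_ct_sample - delta_ct_reference
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: ΔCt = 8 (sample) vs 9 (reference), so ΔΔCt = -1.
print(fold_change_ddct(28.0, 20.0, 30.0, 21.0))  # 2.0
```

If primer efficiencies differ markedly from 100%, an efficiency-corrected model would be needed instead of the plain power of 2.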

[0019] Optionally, in the above application, the reaction conditions for the one-step qRT-PCR are: 16℃ for 5 min; 50℃ for 15 min; 95℃ for 10 min; followed by 50 cycles of 95℃ for 15 s and 60℃ for 40 s, with the fluorescence signal acquired at 60℃.
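The cycling program can be written down as data, which makes the step order explicit and allows the total hold time to be computed. This is a sketch only: the purposes given for the 50 ℃ and 95 ℃ holds are assumptions based on typical one-step RT-qPCR programs, and temperature ramping is ignored.

```python
# Each step: (label, temperature in °C, hold time in seconds)
PRE_CYCLING = [
    ("step 1 (purpose not stated)", 16, 5 * 60),
    ("reverse transcription (assumed)", 50, 15 * 60),
    ("initial denaturation / enzyme activation (assumed)", 95, 10 * 60),
]
CYCLE = [
    ("denaturation", 95, 15),
    ("annealing/extension with fluorescence read", 60, 40),
]
N_CYCLES = 50


def total_hold_seconds(pre, cycle, n_cycles):
    """Sum of hold times only; ramping between temperatures is ignored."""
    return sum(t for _, _, t in pre) + n_cycles * sum(t for _, _, t in cycle)


seconds = total_hold_seconds(PRE_CYCLING, CYCLE, N_CYCLES)
print(seconds, round(seconds / 60, 1))  # 4550 75.8
```

The 30 min of pre-cycling holds plus 50 × 55 s of cycling gives about 76 min of hold time before instrument ramp overhead.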

[0020] Compared with the prior art, the technical solution provided in this application has the following significant advantages:
1. It provides a highly accurate and objectively verifiable diagnostic tool that can diagnose depression and differentiate its severity, effectively compensating for the shortcomings of existing subjective assessment methods. This application is the first to screen and validate a biomarker ensemble consisting of 17 circulating small RNAs. Developed in the sequencing-data discovery cohort, this ensemble model showed superior diagnostic performance in independent validation, achieving an AUC of 0.965 [0.930-0.974] for MDD diagnosis (Figure 9a).

[0021] In addition, for clinical application, a qPCR-based kit was developed and its performance remained stable, with an AUC of 0.948 [0.927-0.957] for the same task (Figure 9b). This performance is comparable to that of the current symptom-based structured diagnostic interview (the gold standard). The method provides molecular biological evidence for diagnosing depression (MDD) and differentiating its severity without relying on subjective judgment, and could substantially reduce the rate of clinical misdiagnosis.
2. It enables objective and quantitative assessment of disease severity.

[0022] This application overcomes the limitation of traditional scales (such as HAMD-24), which rely on clinicians' subjective assessment. The study found that sRNA expression levels were significantly correlated with HAMD-24 scores, and an elastic net regression model predicted severity well (R² = 0.705). Furthermore, a classifier based on the 17 circulating sRNAs achieved an AUC of 0.940 [0.831-0.966] in distinguishing severe from mild depression (using HAMD-24 as the reference standard, with a severity cutoff of 20), and an AUC of 0.933 [0.916-0.998] based on PC analysis of expression levels. This demonstrates the considerable potential of the technique for objectively stratifying symptom severity, providing a quantitative basis for developing treatment strategies and monitoring efficacy.

[0023] 3. It offers excellent clinical translatability and practical feasibility.

[0024] To facilitate widespread clinical application, this application successfully transformed a combination of 17 sRNA biomarkers into a one-step quantitative reverse transcription polymerase chain reaction (qRT-PCR) assay. This assay maintained high diagnostic accuracy in independent cohorts. Furthermore, compared to costly and complex high-throughput sequencing technologies, qRT-PCR offers advantages such as lower cost, faster turnaround time, and standardized operation, making it highly suitable for adoption in routine clinical settings in primary healthcare facilities and specialist clinics. This achieves a smooth transition from basic research to clinical application.

[0025] 4. Advanced and unbiased research strategies were adopted to ensure the reliability and innovativeness of the findings.

[0026] Unlike previous studies, which were mostly limited by small samples, relied on hypothesis-driven approaches, and focused solely on miRNAs, this application uses a large-scale (n = 1270), multicenter design and an unbiased discovery framework that systematically incorporates four classes of sRNAs (miRNA, tsRNA, rsRNA, and piRNA) into a comprehensive analysis. By combining a machine learning-based multi-objective prioritization framework with statistical differential analysis, even the selected concise biomarker combinations (e.g., the top 3-10 features) maintain strong classification performance, demonstrating both the methodological advance and the core value of the discovered biomarker combinations.

[0027] 5. This represents a significant breakthrough in the field of precision psychiatry and has broad application prospects.

[0028] This application not only provides a high-precision diagnostic model, but more importantly, it constructs a scalable, non-invasive, and objective diagnostic framework. This framework, by combining molecular biomarkers with clinical symptom assessment, opens new pathways for precision diagnosis and treatment of mental illnesses. This combination of sRNA biomarkers is expected to serve as an effective complement to existing diagnostic tools, significantly improving the objectivity, accuracy, and consistency of MDD diagnosis, ultimately improving patient clinical outcomes.

[0029] In summary, this application provides a complete solution from discovery and validation to clinical translation and application. Its core biomarker combination offers high diagnostic efficacy, objective quantification, non-invasive detection, and extremely high clinical feasibility, providing a revolutionary tool for the accurate differential diagnosis and stratified management of major depressive disorder, with significant clinical value and social significance.

Description of the Drawings

[0030] Figures 1-6 show an embodiment of the present application, illustrating the diagnostic performance of each sRNA class individually and in combination.

[0031] Figures 1a-1e show the performance of nine machine learning models (random forest, AdaBoost, support vector machine, LightGBM, GBDT, XGBoost, CatBoost, logistic regression, and MLP), ranked by MOMP priority, in distinguishing MDD from controls based on four individual sRNA classes, miRNA (a), rsRNA (b), tsRNA (c), and piRNA (d), and on their integrated combination (e). The analysis was conducted in the discovery cohort (204 MDD patients and 427 controls, including 417 healthy controls and 10 individuals with other mental illnesses). The curves represent the mean area under the curve (AUC) obtained by 10-fold cross-validation; 95% confidence intervals were estimated by bootstrapping with 1,000 iterations.
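Several figure legends report AUCs with 95% confidence intervals from 1,000 bootstrap iterations. A minimal NumPy sketch of that procedure follows, assuming a percentile bootstrap over resampled subjects and a rank-based (Mann-Whitney) AUC; the application does not say which bootstrap variant was used, and the data here are synthetic.

```python
import numpy as np


def auc_mann_whitney(y_true, scores):
    """AUC as P(score_pos > score_neg) + 0.5 * P(tie)."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point AUC plus a percentile bootstrap confidence interval."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    n = len(y_true)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample subjects with replacement
        if y_true[idx].min() == y_true[idx].max():
            continue  # skip resamples that contain only one class
        aucs.append(auc_mann_whitney(y_true[idx], scores[idx]))
    lo, hi = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return auc_mann_whitney(y_true, scores), (float(lo), float(hi))


# Synthetic demo: 50 cases (label 1) and 50 controls (label 0).
rng = np.random.default_rng(1)
y = np.r_[np.ones(50, dtype=int), np.zeros(50, dtype=int)]
s = y + rng.normal(0, 1, size=100)
auc, (lo, hi) = bootstrap_auc_ci(y, s)
print(round(float(auc), 3), round(lo, 3), round(hi, 3))
```

The pairwise comparison is O(n_pos × n_neg) per resample, which is fine for cohorts of a few hundred subjects.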

[0032] Figures 2a-2e show model performance in an independent validation cohort (139 MDD patients vs. 109 controls, including 83 healthy controls and 26 individuals with other mental illnesses) based on four individual sRNA classes, miRNA (a), rsRNA (b), tsRNA (c), and piRNA (d), and on their integrated combination (e); 95% confidence intervals were generated by bootstrapping with 1,000 replicates.

[0033] Figures 3a-3e show the confusion matrices of the MOMP-optimized models in the independent validation cohort. Each matrix shows the numbers of true positives, true negatives, false positives, and false negatives, providing a visual summary of classification accuracy. The color gradient reflects the proportions of correctly and incorrectly classified cases.
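The four counts in each confusion matrix, and the proportions that could drive its color gradient, can be computed as in this sketch. Normalizing within each true class is an assumption (the text does not specify the denominator), and the labels are toy values.

```python
import numpy as np


def confusion_counts(y_true, y_pred):
    """Return (TN, FP, FN, TP) for binary labels (1 = MDD, 0 = control)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return tn, fp, fn, tp


def row_normalized(tn, fp, fn, tp):
    """Proportions within each true class (rows: true 0, true 1)."""
    m = np.array([[tn, fp], [fn, tp]], dtype=float)
    return m / m.sum(axis=1, keepdims=True)


y_true = [1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)  # (3, 1, 1, 2)
print(row_normalized(*counts))
```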

[0034] Figures 4a-4e are SHAP (SHapley Additive exPlanations) summary plots showing the 10 features that contributed most to the MOMP-optimized models in the independent validation cohort. Features are sorted by mean absolute SHAP value. Each point represents an individual sample, and the color indicates the relative expression level of each sRNA (orange = high, blue = low). The position on the x-axis is the SHAP value, reflecting the contribution of each feature to the model output. Positive SHAP values (right side) are associated with an increased risk of MDD, whereas negative values (left side) are associated with the control state.
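Ranking features by mean absolute SHAP value, as in these summary plots, reduces to a column-wise average over a samples-by-features SHAP matrix. In this sketch the SHAP matrix is a hypothetical input; computing it would require a SHAP explainer for the fitted model, which is not shown.

```python
import numpy as np


def top_features_by_mean_abs_shap(shap_values, feature_names, k=10):
    """Rank features by mean |SHAP| across samples (rows = samples)."""
    shap_values = np.asarray(shap_values, dtype=float)
    importance = np.abs(shap_values).mean(axis=0)
    order = np.argsort(-importance, kind="stable")[:k]
    return [(feature_names[i], float(importance[i])) for i in order]


# Hypothetical SHAP matrix: 3 samples x 4 features.
S = [[0.2, -0.5, 0.0, 0.1],
     [-0.4, 0.3, 0.1, 0.1],
     [0.3, -0.4, 0.0, -0.1]]
print(top_features_by_mean_abs_shap(S, ["a", "b", "c", "d"], k=2))
```

Taking the absolute value before averaging keeps features that push predictions strongly in both directions from cancelling out.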

[0035] Figures 5a-5e show the prediction of HAMD-24 scores by elastic net regression with 10-fold cross-validation, based on individual sRNA classes (Figures 5a-5d) and on the integrated sRNA combination (Figure 5e). Model performance was evaluated using the Pearson correlation coefficient, the coefficient of determination (R²), and the mean absolute error.
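The three metrics named above can be computed as follows. Here R² is taken as 1 - SS_res/SS_tot, which can differ from the square of the Pearson r when predictions are biased or mis-scaled; the text does not state which definition was used. Inputs are toy values.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Pearson r, coefficient of determination, and mean absolute error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    r = float(np.corrcoef(y_true, y_pred)[0, 1])               # Pearson r
    ss_res = float(np.sum((y_true - y_pred) ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot                                 # R² = 1 - SS_res/SS_tot
    mae = float(np.mean(np.abs(y_true - y_pred)))              # mean absolute error
    return {"pearson_r": r, "r2": r2, "mae": mae}


print(regression_metrics([10, 20, 30, 40], [12, 18, 33, 37]))
```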

[0036] Figure 6a shows the ROC curve for stratifying depression severity (HAMD-24 > 20 vs. ≤ 20) using a machine learning classifier based on the integrated sRNA combination in an independent validation cohort (n = 205); 95% confidence intervals were estimated from 1,000 bootstrap replicates. Figure 6b is the SHAP summary plot of the MOMP model in Figure 6a, showing the 10 most influential sRNA features. Each point represents an individual sample, and the color indicates the sRNA expression level (orange = high, blue = low). The x-axis shows the SHAP value, which indicates the contribution of each feature to the model output. Positive SHAP values (right side) indicate a contribution toward major (more severe) depression, whereas negative values (left side) indicate an association with lower severity.

[0037] Figures 7a-7d show Spearman correlation analyses between HAMD-24 scores and the plasma expression levels of individual sRNAs, based on 342 MDD and 238 HC samples. sRNAs with P < 0.05 and |r| > 0.2 were considered significantly correlated. Red dots indicate sRNAs positively correlated with HAMD scores; blue dots indicate negatively correlated sRNAs. Figures 7e-7h are Venn diagrams showing, for each of the four sRNA classes, the overlap between sRNAs significantly associated with HAMD scores and sRNAs differentially expressed between MDD and HC.
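The selection rule (P < 0.05 and |r| > 0.2) followed by the Venn-style overlap with differentially expressed sRNAs can be sketched in NumPy. Because the text does not say how P was computed, this sketch substitutes a two-sided permutation test for an analytical p-value; the data, the sRNA names, and the differential set are synthetic or hypothetical.

```python
import numpy as np


def average_ranks(x):
    """Ranks 1..n, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho = Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for Spearman's rho (stand-in for an
    analytical p-value)."""
    rng = np.random.default_rng(seed)
    rx, ry = average_ranks(x), average_ranks(y)
    observed = abs(np.corrcoef(rx, ry)[0, 1])
    hits = sum(abs(np.corrcoef(rx, rng.permutation(ry))[0, 1]) >= observed
               for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)


def hamd_associated(expr, hamd, names, p_cut=0.05, r_cut=0.2):
    """Names of sRNAs with P < p_cut and |rho| > r_cut (expr: samples x sRNAs)."""
    expr = np.asarray(expr, dtype=float)
    selected = set()
    for j, name in enumerate(names):
        rho = spearman_rho(expr[:, j], hamd)
        if abs(rho) > r_cut and permutation_p(expr[:, j], hamd) < p_cut:
            selected.add(name)
    return selected


# Synthetic demo: one sRNA tracks HAMD-24, one is noise.
rng = np.random.default_rng(0)
hamd = rng.uniform(0, 40, size=40)
expr = np.column_stack([hamd + rng.normal(0, 5, size=40), rng.normal(0, 1, size=40)])
found = hamd_associated(expr, hamd, ["sRNA_A", "sRNA_B"])
de_srnas = {"sRNA_A", "sRNA_C"}  # hypothetical MDD-vs-HC differential set
print(found & de_srnas)  # Venn-style overlap
```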

[0038] MOMP screening identified random forest as the optimal model, which was subsequently used for model training and validation.

[0039] Figure 8 describes the process of obtaining the 17-sRNA panel. A random forest model was used for sequential forward selection on the integrated sRNA panel of the discovery cohort. The bar charts show the feature importance of the integrated sRNAs (averaged and determined from SHAP values) for distinguishing MDD from the control group (…) (Figure 8a) and for stratifying depression severity (HAMD-24 > 20 vs. ≤ 20) (Figure 8b). The line graph depicts the cumulative AUC (right axis) as sRNAs are added one by one in successive iterations. The left graph shows the performance of the top 10-, 5-, and 3-sRNA models for each classification task in the independent validation cohort; AUCs and 95% confidence intervals were calculated from 1,000 bootstrap repetitions. The bottom right graph shows the confusion matrix of each minimal sRNA model for the three diagnostic tasks in the independent validation cohort; the color gradient represents the proportions of correctly and incorrectly classified cases, providing a visual summary of model accuracy.
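The cumulative-AUC line in Figure 8 comes from adding sRNAs one at a time in importance order and re-evaluating the model. A sketch of that loop follows; `evaluate_auc` is a hypothetical callback standing in for fitting and scoring the random forest, and the toy evaluator only produces a saturating curve for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def cumulative_auc_curve(
    ranked_features: Sequence[str],
    evaluate_auc: Callable[[List[str]], float],
) -> List[Tuple[str, float]]:
    """Add features one by one in importance order; after each addition,
    record the AUC returned by `evaluate_auc` for the current feature set."""
    curve, selected = [], []
    for feature in ranked_features:
        selected.append(feature)
        curve.append((feature, evaluate_auc(list(selected))))
    return curve


def top_k_auc(curve, ks=(3, 5, 10)):
    """AUC of the top-k models (first k features), read off the curve."""
    return {k: curve[k - 1][1] for k in ks if k <= len(curve)}


# Hypothetical evaluator standing in for "fit a random forest and score it".
def toy_evaluator(features):
    return 0.5 + 0.45 * (1 - 0.7 ** len(features))


ranked = [f"sRNA_{i}" for i in range(1, 11)]
curve = cumulative_auc_curve(ranked, toy_evaluator)
print(top_k_auc(curve))
```

In a real run, each call to `evaluate_auc` would need its own cross-validation so that the curve is not optimistically biased.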

[0040] The top 10 sRNAs selected for each of the two tasks were integrated into a 17-sRNA panel, which was highly effective for diagnosing MDD and for differentiating between mild and severe cases.
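Assuming the 17-sRNA panel is exactly the union of the two top-10 lists, the counts imply that 3 sRNAs are shared between the tasks (10 + 10 - 17 = 3). An order-preserving union over hypothetical names illustrates this:

```python
def merge_panels(*ranked_lists):
    """Order-preserving union of ranked feature lists (duplicates kept once)."""
    panel, seen = [], set()
    for ranked in ranked_lists:
        for name in ranked:
            if name not in seen:
                seen.add(name)
                panel.append(name)
    return panel


# Hypothetical top-10 lists; three names are shared between the two tasks.
top10_diagnosis = [f"sRNA_{i}" for i in range(1, 11)]
top10_severity = ["sRNA_1", "sRNA_4", "sRNA_7"] + [f"sRNA_{i}" for i in range(11, 18)]
panel = merge_panels(top10_diagnosis, top10_severity)
print(len(panel))  # 17
```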

[0041] Figure 9 shows the validation results of the 17-sRNA panel in the external validation cohort (Figure 9a) and in the PCR cohort (Figure 9b).

Detailed Description

[0042] To make the objectives, technical solutions, and advantages of this application clearer, a more detailed description is provided below. However, it should be understood that the description herein is merely for explaining this application and is not intended to limit its scope.

[0043] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the scope of this application. All reagents and instruments used herein are commercially available, and the characterization methods involved can be found in relevant descriptions in the prior art, and will not be repeated here.

[0044] Abbreviations: MDD, major depressive disorder; BD, bipolar disorder; Control, non-depressed control group; HC, healthy control group; Other, control group with other mental disorders; PCoA, principal coordinate analysis.

[0045] 1. Design and Methodology
From May 2021 to December 2024, this study collected 1270 plasma samples from four clinical centers, from individuals diagnosed with major depressive disorder (MDD) or bipolar disorder (BD) and from non-depressed controls. Of these, 977 samples were used for high-throughput small RNA sequencing to establish a comprehensive circulating sRNA expression profile. The discovery cohort included 204 MDD patients, 37 BD patients, and 427 non-depressed controls (417 healthy individuals and 10 individuals with other mental illnesses), all recruited from Nanjing Brain Hospital and Nanjing University School of Medicine Affiliated Gulou Hospital (Table 4). A total of 375 individuals (MDD patients and healthy controls) had 24-item Hamilton Depression Rating Scale (HAMD-24) scores, of whom 168 scored >20, indicating moderate to severe depressive symptoms. The independent validation cohort included 139 MDD patients, 61 BD patients, and 109 non-depressed controls (83 healthy controls and 26 individuals with other mental illnesses), recruited from Guangzhou Brain Hospital Affiliated to Guangzhou Medical University and Lianyungang Fourth People's Hospital (Table 4). To further evaluate the diagnostic potential of circulating sRNAs, blinded qRT-PCR was performed in a testing cohort of 81 MDD patients, 67 BD patients, and 145 healthy controls, all recruited at initial clinical diagnosis from Nanjing Brain Hospital and Guangzhou Brain Hospital Affiliated to Guangzhou Medical University.
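The cohort sizes above are internally consistent, which a quick tally confirms, assuming the three cohorts are disjoint: the sequenced discovery and validation cohorts sum to 977, and adding the qPCR testing cohort gives the 1270 samples collected.

```python
# Participant counts as reported in the text (sequenced cohorts and qPCR cohort).
sequenced = {
    "discovery": {"MDD": 204, "BD": 37, "healthy": 417, "other": 10},
    "validation": {"MDD": 139, "BD": 61, "healthy": 83, "other": 26},
}
qpcr_testing = {"MDD": 81, "BD": 67, "healthy": 145}

per_cohort = {name: sum(groups.values()) for name, groups in sequenced.items()}
sequenced_total = sum(per_cohort.values())
overall_total = sequenced_total + sum(qpcr_testing.values())
print(per_cohort, sequenced_total, overall_total)
# {'discovery': 668, 'validation': 309} 977 1270
```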

[0046] The clinical and demographic characteristics of the study participants are shown in Table 4.

[0047] Table 4

[0048] 2. Subject Recruitment
All patients (MDD, BD, and other mental illnesses) were diagnosed by trained psychiatrists through structured clinical interviews according to the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition, Text Revision (DSM-5-TR). Clinical assessments were conducted independently by two senior psychiatrists. Patients were eligible regardless of family history of mental illness. Participants were divided into four groups based on the following criteria: major depressive disorder, bipolar disorder, other mental illnesses (schizophrenia), and healthy controls. (1) MDD group. Inclusion criteria: (i) diagnosed with MDD according to DSM-5 criteria and with a score of ≥20 on the 24-item Hamilton Depression Rating Scale; (ii) aged 18-60 years; (iii) Han Chinese ethnicity; (iv) written informed consent obtained from the participant and their legal guardian.

[0049] Exclusion criteria: (i) mental symptoms secondary to a neurological disorder, organic brain disease or other physical illness; (ii) a disease diagnosed according to DSM-5 other than MDD; (iii) current or past use, abuse or dependence on psychoactive substances other than tobacco or alcohol; (iv) an active physical illness of clinical significance; (v) pregnancy or lactation.

[0050] (2) BD group. Inclusion criteria: (i) diagnosed with BD according to DSM-5 criteria; (ii) aged 18–60 years; (iii) Han Chinese ethnicity; (iv) obtained written informed consent from the participant and their legal guardian.

[0051] Exclusion criteria: Same as for the MDD group, but applicable to BD-specific DSM-5 diagnoses.

[0052] (3) Other mental illnesses (schizophrenia). Inclusion criteria: (i) diagnosed with schizophrenia or a schizophrenia spectrum disorder according to DSM-5; (ii) aged 18–60 years; (iii) Han Chinese ethnicity; (iv) obtained written informed consent from the participant and their legal guardian.

[0053] Exclusion criteria: Same as for the BD group.

[0054] (4) Healthy control group. Inclusion criteria: (i) no current or previous diagnosis of a mental disorder as defined in DSM-5; (ii) age 18–60 years; (iii) Han Chinese ethnicity; (iv) provision of written informed consent.

[0055] Exclusion criteria: (i) family history of mental disorder or suicide within three generations on the paternal or maternal side; (ii) presence of neurological disorder; (iii) use of any medication within the past two weeks; (iv) clinically significant active physical illness; (v) current or past suicidal ideation or behavior; (vi) pregnancy or lactation.

[0056] 3. Sample collection and small RNA isolation
All samples in this experiment were plasma. Blood was collected between 7:00 AM and 10:00 AM, and participants were fasting (had not eaten since 8:00 PM the previous night). Venous blood was drawn from the antecubital vein using a 21-gauge needle and collected in 5 mL BD Vacutainer plasma separation tubes containing PPT gel and K2EDTA. Samples were transported vertically at room temperature and processed within 30 minutes of collection. Plasma was separated by centrifugation at 3,000 × g for 10 minutes (20°C). 1 mL of plasma was aliquoted into 1.5 mL low-protein binding tubes, flash-frozen in liquid nitrogen, and stored at –80°C until use. Circulating small RNAs were extracted from 300 μL plasma samples using the TIANGEN RNAsimple Total RNA Kit, strictly following the manufacturer's instructions.

[0057] 4. Small RNA library construction and sequencing
All sRNA library construction and deep sequencing were performed by BGI Genomics (Shenzhen, China). Briefly, libraries were constructed using the TruSeq Small RNA Sample Prep Kit and, after quality validation, sequenced on an Illumina HiSeq 4000 platform. Raw reads were filtered to remove low-quality sequences according to the following criteria: (i) reads with more than 4 bases of quality score <10 or more than 6 bases of quality score <13; (ii) reads with 5' primer contamination or lacking the 3' primer; (iii) reads without the insert tag; (iv) reads containing poly(A) sequences; and (v) reads shorter than 18 nucleotides. The remaining clean reads were retained for downstream analysis.
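The quality-based criteria above translate directly into a per-read filter. The sketch assumes Phred+33 quality encoding and uses a hypothetical poly(A) definition (a run of 10 A's), since the text does not define one; the adapter-based criteria (ii) and (iii) need adapter sequences that are not given and are left out.

```python
PHRED_OFFSET = 33   # assumed Phred+33 (Sanger / Illumina 1.8+) quality encoding
MIN_LENGTH = 18     # criterion (v)
POLY_A_RUN = 10     # hypothetical poly(A) definition (not given in the source)


def phred_scores(quality_string):
    """Convert an ASCII quality string to integer Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in quality_string]


def passes_filters(sequence, quality_string):
    """Apply criteria (i), (iv) and (v); the adapter-based criteria (ii) and
    (iii) need the library's adapter sequences and are not modelled here."""
    if len(sequence) < MIN_LENGTH:
        return False
    scores = phred_scores(quality_string)
    if sum(q < 10 for q in scores) > 4 or sum(q < 13 for q in scores) > 6:
        return False
    if "A" * POLY_A_RUN in sequence.upper():
        return False
    return True


read = "TGAGGTAGTAGGTTGTATAGTT"                  # 22 nt
print(passes_filters(read, "I" * 22))             # True: all bases Q40
print(passes_filters(read, "#" * 5 + "I" * 17))   # False: 5 bases below Q10
print(passes_filters(read[:15], "I" * 15))        # False: shorter than 18 nt
```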

[0058] 5. Small RNA Data Processing and Annotation
Reads with a quality score below 25 or a length of less than 18 nucleotides were excluded. The resulting clean reads were then aligned sequentially against the following non-coding RNA databases: (i) the miRNA database miRBase v22; (ii) the genomic tRNA database GtRNAdb (based on hg19); (iii) an rRNA database assembled from the National Center for Biotechnology Information (NCBI) Nucleotide and Gene databases; and (iv) the piRNA database piRBase release v3.0.

[0059] sRNAs were annotated against the above databases: miRDeep2 was used to match precursor and mature miRNAs; piRNAs were annotated using Bowtie, accepting only candidate sequences with no more than three mismatches; and tsRNA and rsRNA sequences were annotated using SPORTS 1.1 (an updated version of SPORTS 1.0 that uses Bowtie).

[0060] 6. Standardization and Filtration
For normalization, the total read count of each sRNA class in each sample was scaled to 1,000,000, and the values were then transformed as log10(x + 1). To ensure data integrity, sRNAs absent in more than 50% of the samples were excluded from subsequent analyses, and batch effects were corrected using the SVA package. After filtering, the following sRNAs were retained for downstream analysis: 367 miRNAs, 291 rsRNAs, 99 tsRNAs, and 66 piRNAs, for a total of 823 integrated sRNAs. The normality of each sRNA's distribution was assessed with the Shapiro–Wilk test using the stats.shapiro function (SciPy) in Python 3.11.5. For concise presentation, sRNA names have been simplified or numbered in this document; the full list of names is provided in Table 5.
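The per-class normalization and the 50% presence filter can be sketched with NumPy as follows. Treating a count of zero as "absent" is an assumption, the SVA batch-correction step is not shown, and the count matrix is a toy example.

```python
import numpy as np


def normalize_class(counts):
    """counts: samples x sRNAs raw read counts for ONE sRNA class.
    Scale each sample's class total to 1,000,000, then apply log10(x + 1)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # leave all-zero samples at zero instead of dividing by 0
    return np.log10(counts / totals * 1_000_000 + 1.0)


def presence_mask(counts, max_absent_fraction=0.5):
    """True for sRNAs that are absent (count == 0) in at most half of the samples."""
    absent_fraction = (np.asarray(counts) == 0).mean(axis=0)
    return absent_fraction <= max_absent_fraction


counts = np.array([[90, 10, 0],
                   [50, 50, 0],
                   [0, 100, 5]])
kept = presence_mask(counts)
print(kept)  # [ True  True False]
print(normalize_class(counts)[:, kept].round(3))
```

Normalizing before filtering, as the text describes, means each sample's scaled values are computed from the full class total, including sRNAs that are later dropped.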

[0061] Table 5

[0062] 7. Correlation analysis
Spearman correlation analysis was used to explore the relationship between individual sRNAs and HAMD scores; P < 0.05 was considered statistically significant. A predictive model for HAMD scores was developed by a multi-step process using elastic net regression. The model parameters α (controlling the balance between the lasso and ridge penalties) and λ (overall penalty strength) were optimized by 5-fold cross-validation. α was searched linearly from 0 to 1 in increments of 0.05, and λ was selected from a combination of logarithmically and linearly spaced values. The optimal α and λ were those that minimized the mean squared error during training.
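A minimal NumPy sketch of elastic net fitting with the described α grid and 5-fold cross-validation follows. It assumes the glmnet-style objective implied by the text (α mixes the lasso and ridge penalties, λ scales them) and fits by cyclic coordinate descent; the λ grid values and the synthetic data are illustrative, not taken from the application.

```python
import numpy as np


def soft_threshold(z, gamma):
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def elastic_net_cd(X, y, alpha, lam, n_iter=200, tol=1e-6):
    """Cyclic coordinate descent for
    (1/2n)||y - Xb||^2 + lam * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||_2^2).
    X should be standardised and y centred (no intercept term)."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.astype(float).copy()          # current residual y - Xb
    z = (X ** 2).sum(axis=0) / n        # per-feature curvature (1 if standardised)
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            rho = X[:, j] @ r / n + z[j] * b[j]
            new = soft_threshold(rho, lam * alpha) / (z[j] + lam * (1 - alpha))
            step = new - b[j]
            if step != 0.0:
                r -= X[:, j] * step
                b[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return b


def cv_mse(X, y, alpha, lam, k=5, seed=0):
    """Mean squared error over k folds; scaling is fitted on training folds only."""
    idx = np.random.default_rng(seed).permutation(len(y))
    errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        mu, sd = X[train].mean(axis=0), X[train].std(axis=0)
        sd[sd == 0] = 1.0
        y_mean = y[train].mean()
        b = elastic_net_cd((X[train] - mu) / sd, y[train] - y_mean, alpha, lam)
        pred = ((X[fold] - mu) / sd) @ b + y_mean
        errors.append(np.mean((y[fold] - pred) ** 2))
    return float(np.mean(errors))


# alpha: 0 to 1 in steps of 0.05 (as described); lambda: hypothetical mixed grid.
ALPHAS = np.round(np.arange(0.0, 1.0 + 1e-9, 0.05), 2)
LAMBDAS = np.unique(np.r_[np.logspace(-3, -1, 3), np.linspace(0.25, 1.0, 4)])

# Synthetic data: 80 samples, 10 features, two true effects.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 10))
y = 20 + 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.5, size=80)

best_mse, best_alpha, best_lam = min(
    (cv_mse(X, y, a, lam), a, lam) for a in ALPHAS for lam in LAMBDAS
)
print(best_alpha, best_lam, round(best_mse, 3))
```

Standardizing inside each training fold keeps the held-out fold out of the scaling statistics, so the cross-validated error is not leaked.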

[0063] Step 1: Stability analysis. The selected sRNA dataset is resampled 100 times into different training (80%) and test (20%) sets. An elastic net regression model is fitted to each of 100 further subsamples of the training set (80% randomly selected each time), and the features selected by each model in each iteration are stored.

[0064] Step 2: sRNA feature constraints. For each subsample, 101 iterations are performed, restricting the features available for model construction according to the percentage of iterations in which each feature was selected in Step 1. The constraint starts at 100% and decreases by 10% in each iteration. For each constraint level, the correlation between predicted and true HAMD scores is calculated on the held-out 20% test set, and the constraint level giving the highest correlation is used for final feature selection.

[0065] Step 3: Final Model Construction and Validation. The final elastic net model was built on the complete dataset, using only the sRNA features that passed the selection-frequency threshold chosen in Step 2. This process used 10-fold cross-validation, and model performance was evaluated by the Pearson correlation coefficient, R², and mean absolute error.
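Step 3 can be sketched with out-of-fold predictions from 10-fold cross-validation, scored with the three metrics named above. Two choices here are assumptions: the metrics are computed on pooled out-of-fold predictions rather than averaged per fold, and the penalty values are placeholders.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import KFold, cross_val_predict


def evaluate_final_model(X, y, selected, alpha=0.1, l1_ratio=0.5, seed=0):
    """10-fold cross-validated predictions for an elastic net restricted to the `selected` columns."""
    model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, max_iter=10_000)
    cv = KFold(n_splits=10, shuffle=True, random_state=seed)
    pred = cross_val_predict(model, X[:, selected], y, cv=cv)  # each sample predicted once, out of fold
    r, _ = pearsonr(y, pred)
    return {"pearson_r": float(r), "r2": r2_score(y, pred), "mae": mean_absolute_error(y, pred)}


rng = np.random.default_rng(2)
X = rng.normal(size=(150, 8))
hamd = 20 + 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=150)
scores = evaluate_final_model(X, hamd, selected=[0, 1])
```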

[0066] 8. Machine Learning Model Construction. To exploit the diagnostic potential of the various sRNAs in MDD, a dataset of 823 sRNAs spanning four sRNA types was compiled. Machine learning models were developed to help healthcare professionals distinguish MDD from non-depressed controls and from bipolar disorder (BD), and to assess MDD severity using a HAMD-24 score threshold of 20. The diagnostic performance of nine machine learning algorithms was evaluated on each individual sRNA type and on the integrated sRNA dataset: random forest, adaptive boosting, support vector machine, LightGBM, gradient boosting decision tree, extreme gradient boosting, CatBoost, logistic regression, and multilayer perceptron.

[0067] To reduce selection bias and ensure that the model's performance is not affected by the characteristics of a single hospital or population, participants from Nanjing Brain Hospital and Nanjing University School of Medicine Affiliated Gulou Hospital were designated as the discovery cohort, while participants from Guangzhou Medical University Affiliated Brain Hospital and Lianyungang Fourth People's Hospital were designated as the independent validation cohort.

[0068] Hyperparameters were tuned within the discovery cohort using five-fold cross-validation, with a grid search over the parameter space specific to each algorithm. Because some classification tasks are inherently class-imbalanced, appropriate class-imbalance correction techniques were incorporated into model development. Parameter ranges are listed in Table 6.
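For one of the nine algorithms, the tuning step can be sketched as a grid search with stratified five-fold cross-validation. This is an illustration under assumptions. The grid is hypothetical (Table 6 gives the actual ranges), and `class_weight="balanced"` is only one possible imbalance correction, since the text does not say which technique was used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Hypothetical grid for illustration; the ranges actually used are listed in Table 6.
PARAM_GRID = {"n_estimators": [50, 100], "max_depth": [None, 5], "min_samples_leaf": [1, 3]}


def tune_random_forest(X, y, seed=0):
    """Grid search with stratified 5-fold CV; class_weight='balanced' up-weights the minority class."""
    base = RandomForestClassifier(class_weight="balanced", random_state=seed)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    search = GridSearchCV(base, PARAM_GRID, scoring="roc_auc", cv=cv)
    return search.fit(X, y)


# Imbalanced toy data (about 70% / 30%) standing in for a discovery cohort.
X, y = make_classification(n_samples=200, n_features=10, n_informative=4, weights=[0.7], random_state=0)
search = tune_random_forest(X, y)
```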

[0069] Table 6

[0070] The models were developed in Python 3.11.7 with the scikit-learn library, supplemented by the xgboost, lightgbm, and catboost libraries.

[0071] 9. Model Performance Evaluation and MOMP Construction. To comprehensively evaluate and enhance the potential of circulating sRNAs for diagnosing MDD, a multi-objective model prioritization (MOMP) framework was developed for constructing robust MDD diagnostic models and discovering biomarkers. An initial model pool was generated by training nine machine learning algorithms on the discovery cohort, and tenfold cross-validation was applied to mitigate overfitting. The dataset was divided into ten equal-sized, non-overlapping subsets that preserved the original class distribution, maintaining statistical representativeness. In each fold, one subset served as the internal validation set and the remaining nine were used for model training, so that every sample served as a validation instance exactly once. Performance metrics were averaged across folds to give a stable estimate of the model's generalization ability. Independent external validation cohorts from Guangzhou Brain Hospital Affiliated to Guangzhou Medical University and the Fourth People's Hospital of Lianyungang City were then used to further evaluate the model's generalization ability and stability.
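The tenfold procedure described above maps onto scikit-learn's `StratifiedKFold`, which keeps class proportions in every fold and places each sample in exactly one validation fold. This is a minimal sketch: a balanced random forest stands in for the model pool, and AUC is the only metric shown.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold


def tenfold_auc(X, y, seed=0):
    """Stratified 10-fold CV; returns mean AUC, per-fold AUCs, and how often each sample was validated."""
    skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    fold_aucs = []
    times_validated = np.zeros(len(y), dtype=int)
    for train_idx, val_idx in skf.split(X, y):
        model = RandomForestClassifier(n_estimators=100, class_weight="balanced", random_state=seed)
        model.fit(X[train_idx], y[train_idx])
        prob = model.predict_proba(X[val_idx])[:, 1]  # probability of the positive class
        fold_aucs.append(roc_auc_score(y[val_idx], prob))
        times_validated[val_idx] += 1
    return float(np.mean(fold_aucs)), fold_aucs, times_validated


X, y = make_classification(n_samples=300, n_features=10, n_informative=4, weights=[0.7], random_state=1)
mean_auc, fold_aucs, times_validated = tenfold_auc(X, y)
```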

[0072] To evaluate model performance, the numbers of true positives, true negatives, false positives, and false negatives were calculated. Six key evaluation metrics were then derived from them: sensitivity, specificity, accuracy, positive predictive value, negative predictive value, and Matthews correlation coefficient. The calculation formulas are shown below:
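The six metrics follow their standard confusion-matrix definitions. A minimal Python rendering is shown below, assuming MDD is the positive class. The counts in the usage line are hypothetical.

```python
import math


def diagnostic_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Standard confusion-matrix metrics for a binary classifier (MDD = positive class)."""
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),              # true positive rate
        "specificity": tn / (tn + fp),              # true negative rate
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "ppv": tp / (tp + fp),                      # positive predictive value
        "npv": tn / (tn + fn),                      # negative predictive value
        "mcc": (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0,  # Matthews correlation coefficient
    }


m = diagnostic_metrics(tp=90, tn=80, fp=20, fn=10)  # hypothetical counts
```

With these counts, sensitivity is 0.9, specificity is 0.8, accuracy is 0.85, and MCC is 7000 / sqrt(110 × 100 × 100 × 90), about 0.70.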

[0073] Receiver operating characteristic (ROC) curves were constructed by plotting sensitivity against 1 − specificity, and the area under the curve (AUC) was calculated. A bootstrap resampling strategy with 1,000 iterations was used, and results are reported as the mean with 95% confidence interval.
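The bootstrap AUC estimate can be sketched as follows. Two choices here are assumptions, because the text does not state its interval method: the 95% interval uses the percentile method, and resamples that contain only one class are redrawn. AUC is computed through the rank-sum (Mann–Whitney) identity, which counts ties as one half.

```python
import numpy as np
from scipy.stats import rankdata


def auc_mann_whitney(y, score):
    """ROC AUC via the rank-sum (Mann-Whitney) identity; tied scores count as one half."""
    y = np.asarray(y).astype(bool)
    ranks = rankdata(score)  # average ranks for tied scores
    n_pos, n_neg = y.sum(), (~y).sum()
    return (ranks[y].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc(y, score, n_boot=1000, seed=0):
    """Mean AUC and percentile 95% CI over bootstrap resamples that contain both classes."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    score = np.asarray(score, dtype=float)
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.integers(0, len(y), size=len(y))
        if len(np.unique(y[idx])) < 2:
            continue  # AUC is undefined when a resample contains only one class
        aucs.append(auc_mann_whitney(y[idx], score[idx]))
    aucs = np.asarray(aucs)
    low, high = np.percentile(aucs, [2.5, 97.5])
    return float(aucs.mean()), (float(low), float(high))


rng = np.random.default_rng(3)
y = np.r_[np.zeros(100), np.ones(100)]
score = y + rng.normal(scale=0.8, size=200)
mean_auc, (low, high) = bootstrap_auc(y, score)
```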

[0074] The final algorithm uses a random forest machine learning model.

[0075] 10. Feature Explanation. For feature interpretation, SHAP (SHapley Additive exPlanations) analysis was applied to the validation dataset. SHAP plots were used to visualize feature effects: color gradients represent effect magnitude, and direction indicates the likelihood of a particular outcome. This method identified sRNAs with high predictive importance.

[0076] 11. Importance Ranking of sRNAs. Random forest was selected as the optimal model because it achieved the best AUC trade-off across the multiple classification tasks. To refine the feature set, hierarchical clustering based on Spearman rank correlation was performed on the integrated dataset of 823 sRNAs, eliminating redundant sRNAs that exhibited multicollinearity. From each cluster, the sRNA with the highest AUC was retained, as determined on the training set of the random forest model built on the full feature set. The remaining features were then ranked by their contribution to classification performance. The top-ranked features that together accounted for more than 90% of the cumulative model gain were selected. Each feature's contribution was quantified as its mean information gain and SHAP value, metrics that reflect how well each sRNA distinguishes depressed subjects from controls.
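The redundancy filter and the 90% cumulative-gain cut can be sketched with SciPy's hierarchical clustering. Several details are illustrative assumptions, not details from the text: the distance is 1 − |Spearman ρ|, average linkage is used, the cluster cut height is 0.3, and the per-feature AUCs and importances are placeholders. The function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import rankdata


def cluster_representatives(X, auc_per_feature, cut=0.3):
    """Cluster sRNAs on 1 - |Spearman rho| and keep the highest-AUC member of each cluster."""
    rho = np.corrcoef(rankdata(X, axis=0), rowvar=False)  # Spearman = Pearson on ranks
    dist = 1.0 - np.abs(rho)
    np.fill_diagonal(dist, 0.0)
    dist = np.clip((dist + dist.T) / 2, 0.0, None)  # enforce exact symmetry before condensing
    Z = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(Z, t=cut, criterion="distance")
    keep = [int(max(np.flatnonzero(labels == c), key=lambda j: auc_per_feature[j]))
            for c in np.unique(labels)]
    return sorted(keep), labels


def cumulative_importance(importances, frac=0.9):
    """Indices of top-ranked features whose importances first reach `frac` of the total."""
    order = np.argsort(importances)[::-1]
    cum = np.cumsum(importances[order]) / importances.sum()
    n = int(np.searchsorted(cum, frac)) + 1
    return order[:n]


rng = np.random.default_rng(4)
base = rng.normal(size=200)
X = np.column_stack([base, base + rng.normal(scale=0.05, size=200), rng.normal(size=200)])
auc_per_feature = np.array([0.60, 0.80, 0.70])  # hypothetical single-feature AUCs
keep, labels = cluster_representatives(X, auc_per_feature)
top = cumulative_importance(np.array([0.5, 0.3, 0.15, 0.05]))
```

Here the first two columns are nearly collinear, so only the one with the higher AUC (column 1) survives alongside the independent column 2.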

[0077] 12. Establishment of Integrated sRNA Combinations. To enhance the clinical applicability of the results, feature selection was further refined to determine the smallest sRNA combination that maximizes discriminative performance for each diagnostic category. Specifically, a sequential forward selection strategy was implemented. Starting from the top-ranked features, sRNAs were added to the classifier one at a time, and the features were re-ordered according to the updated importance ranking. Selection terminated when two consecutive DeLong tests showed no significant improvement in classification performance. Ultimately, sRNA combinations containing 3, 5, and 10 features were constructed. Final validation was performed on independent external test sets to ensure robustness and to provide several options for potential clinical applications.
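Below is a sketch of forward selection with a DeLong stopping rule. It uses a self-contained implementation of the DeLong test for two correlated AUCs measured on the same samples, and it rests on stated assumptions:

- An addition is kept only if it significantly raises the AUC.
- The ranking is held fixed rather than re-ordered after each step.
- `score_fn` is a hypothetical callback that returns scores for a feature subset. In the text this role is played by the random forest.
- The column-sum scorer in the usage lines is a simple stand-in.

```python
import math
import numpy as np


def _placements(pos, neg):
    """DeLong structural components: V10 (one per positive) and V01 (one per negative)."""
    diff = pos[:, None] - neg[None, :]
    psi = (diff > 0) + 0.5 * (diff == 0)  # 1 if the positive scores higher, 0.5 for ties
    return psi.mean(axis=1), psi.mean(axis=0)


def delong_test(y, score_a, score_b):
    """Two-sided DeLong test for two correlated AUCs computed on the same samples."""
    y = np.asarray(y).astype(bool)
    aucs, v10, v01 = [], [], []
    for s in (np.asarray(score_a, float), np.asarray(score_b, float)):
        a10, a01 = _placements(s[y], s[~y])
        aucs.append(a10.mean())
        v10.append(a10)
        v01.append(a01)
    cov = np.cov(np.vstack(v10)) / y.sum() + np.cov(np.vstack(v01)) / (~y).sum()
    var_diff = cov[0, 0] + cov[1, 1] - 2 * cov[0, 1]
    if var_diff <= 0:
        return aucs[0], aucs[1], 1.0  # identical rankings: no detectable difference
    z = (aucs[0] - aucs[1]) / math.sqrt(var_diff)
    return aucs[0], aucs[1], math.erfc(abs(z) / math.sqrt(2))


def forward_select(ranked, score_fn, y, alpha=0.05, patience=2):
    """Add ranked sRNAs one at a time; stop after `patience` consecutive non-significant additions."""
    selected = [ranked[0]]
    current = score_fn(selected)
    misses = 0
    for feat in ranked[1:]:
        candidate = selected + [feat]
        new = score_fn(candidate)
        auc_old, auc_new, p = delong_test(y, current, new)
        if auc_new > auc_old and p < alpha:
            selected, current, misses = candidate, new, 0
        else:
            misses += 1
            if misses >= patience:
                break
    return selected


rng = np.random.default_rng(5)
y = np.r_[np.zeros(300), np.ones(300)]
X = np.column_stack([
    y + rng.normal(scale=1.5, size=600),      # weak marker
    2 * y + rng.normal(scale=0.5, size=600),  # strong marker
    rng.normal(size=600),                     # noise
    rng.normal(size=600),                     # noise
])
panel = forward_select([0, 1, 2, 3], lambda cols: X[:, cols].sum(axis=1), y)
```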

[0078] 13. One-Step Stem-Loop RT-qPCR Detection of Small RNAs. The reaction mixture for SYBR Green one-step stem-loop RT-qPCR consists of 2 μL RNA template, 0.3 μL Super M-MLV reverse transcriptase, 0.6 μL hot-start HiTaq DNA polymerase, 6 μL 5× one-step RT-PCR buffer, 3 μL 10× Solution I, 0.25 μL dNTPs, and 0.3 μL RNase inhibitor. It also includes 1 μL SYBR Green dye, 0.5 μL stem-loop reverse transcription primer, 1 μL reverse amplification primer, and 1 μL forward amplification primer. The final volume is adjusted to 30 μL with nuclease-free water. Primers were synthesized by Universal Biotech, and all other reagents were purchased from Phytobio.
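The reaction-mix arithmetic can be checked directly. The listed components add up to 15.95 μL, so 14.05 μL of nuclease-free water brings each reaction to 30 μL. Both concentrated buffers end up at 1× by C1·V1 = C2·V2. The helper names below are illustrative only.

```python
# Component volumes (µL) for one reaction, as listed in [0078]; nuclease-free water makes up the rest.
COMPONENTS_UL = {
    "RNA template": 2.0,
    "Super M-MLV reverse transcriptase": 0.3,
    "Hot-start HiTaq DNA polymerase": 0.6,
    "5x one-step RT-PCR buffer": 6.0,
    "10x Solution I": 3.0,
    "dNTPs": 0.25,
    "RNase inhibitor": 0.3,
    "SYBR Green dye": 1.0,
    "Stem-loop RT primer": 0.5,
    "Reverse amplification primer": 1.0,
    "Forward amplification primer": 1.0,
}
FINAL_VOLUME_UL = 30.0


def water_to_add(components: dict, final_volume: float) -> float:
    """Nuclease-free water (µL) needed to bring the listed components up to the final volume."""
    total = sum(components.values())
    if total > final_volume:
        raise ValueError("components exceed the final reaction volume")
    return round(final_volume - total, 2)


def working_strength(stock_fold: float, volume_ul: float, final_volume: float) -> float:
    """Final 'x' strength of a concentrated stock after dilution into the reaction (C1*V1 = C2*V2)."""
    return stock_fold * volume_ul / final_volume


water = water_to_add(COMPONENTS_UL, FINAL_VOLUME_UL)  # 30 - 15.95 = 14.05 µL
buffer_x = working_strength(5, COMPONENTS_UL["5x one-step RT-PCR buffer"], FINAL_VOLUME_UL)  # 1.0x
solution_i_x = working_strength(10, COMPONENTS_UL["10x Solution I"], FINAL_VOLUME_UL)  # 1.0x
```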

[0079] To further understand this application, the following detailed description is provided in conjunction with the preferred embodiments.

[0080] Example 1. In this study, circulating small RNAs (sRNAs) from 977 individuals were deeply sequenced to identify features relevant to diagnosing depression and differentiating its severity. A classification model was developed and validated using a machine learning-based prioritization framework, which identified a biomarker panel of 17 sRNAs. This panel was further validated in an independent cohort using quantitative reverse transcription polymerase chain reaction (qRT-PCR). The results provide a scalable, non-invasive diagnostic framework and represent a potential breakthrough in precision psychiatry by enabling objective detection of major depressive disorder (MDD).

[0081] (1) A multicenter cohort containing 1270 plasma samples with clinical diagnostic records was analyzed, of which 977 samples underwent small RNA sequencing for diagnostic model development and external validation; another independent cohort was used for qRT-PCR-based validation.

[0082] (2) The MOMP framework includes the following: sRNA-seq data preprocessing; systematic evaluation of nine machine learning classifiers through parameter grid search, 10-fold cross-validation in the discovery cohort, and external validation; and use of a random forest model in the validation phase to balance diagnostic accuracy against clinical deployability, thereby selecting the best diagnostic model for MDD.

[0083] (3) Based on combined feature importance and differential expression analysis, sRNAs were screened that distinguish MDD from controls, MDD from BD, and the severity subgroups from each other, and an MDD-specific diagnostic combination was constructed.

[0084] (4) The candidate sRNAs in the combination were experimentally validated by qRT-PCR in an independent cohort, demonstrating their consistency with the sequencing results and their potential for clinical translation.

[0085] Identification of circulating small RNA characteristics in MDD: A total of 1270 plasma samples were collected from four hospitals. After quality control, 977 samples were used for small RNA sequencing, including 343 patients with MDD, 500 healthy controls, 36 patients with other mental illnesses, and 98 patients with BD. An additional 293 samples were used for independent validation of selected biomarkers via qRT-PCR. To reduce selection bias and improve generalization, the sequencing cohort was divided into a discovery cohort and an external validation cohort based on hospital location. Clinical and demographic information is summarized in Table 4.

[0086] High-throughput sRNA sequencing identified four major classes of sRNAs: microRNAs (miRNAs), ribosomal RNA-derived sRNAs (rsRNAs), transfer RNA-derived sRNAs (tsRNAs), and PIWI-interacting RNAs (piRNAs). The length distributions of these sRNA subtypes showed subtle but consistent differences among MDD patients, non-depressed controls (healthy individuals and patients with other mental illnesses), and BD patients. miRNAs were the most abundant, with a narrow peak at 21–23 nucleotides (nt). rsRNAs were more widely distributed (18–28 nt), while tsRNAs and piRNAs spanned longer lengths (28–35 nt). Compared with controls and BD, MDD samples showed lower miRNA abundance, a narrower length range, and shorter tsRNA fragments. After data preprocessing, 823 high-confidence circulating sRNAs were identified, comprising 367 miRNAs, 291 rsRNAs, 99 tsRNAs, and 66 piRNAs (Table 5). Principal coordinate analysis based on normalized expression values revealed partially distinct sRNA expression patterns among MDD patients, non-depressed controls, and BD individuals. Despite some separation between groups, substantial overlap remained, highlighting the need for specific molecular markers to distinguish MDD.

[0087] Differential abundance analysis between MDD patients and non-depressed controls identified 447 significantly altered circulating sRNAs (54.3%): 192 miRNAs, 168 rsRNAs, 52 tsRNAs, and 35 piRNAs (fold change > 1.5 or < 0.67; corrected P < 0.05) (Table 7). Of these, 151 sRNAs were downregulated in MDD, and 96 were upregulated. A separate comparison between MDD and BD detected 603 differentially abundant sRNAs, of which 268 were upregulated and 335 downregulated in MDD. Notably, miRNAs were markedly downregulated in both comparisons, revealing a distinctive sRNA expression pattern that separates MDD from BD and from controls.
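The fold-change and corrected-P filter can be sketched as follows. The text does not name the statistical test or the correction method, so this sketch assumes a two-sided Mann–Whitney U test with Benjamini–Hochberg correction. It also assumes fold change is the ratio of linear-scale group means, recovered from the log10(x+1) values. All function names are hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def benjamini_hochberg(p):
    """Benjamini-Hochberg adjusted P values (false discovery rate)."""
    p = np.asarray(p, float)
    n = len(p)
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]  # enforce monotonicity from the top down
    q = np.empty(n)
    q[order] = np.minimum(q_sorted, 1.0)
    return q


def differential_abundance(log_mdd, log_ctrl, fc_up=1.5, fc_down=0.67, q_cut=0.05):
    """Boolean masks (up, down) of sRNAs passing fold-change and adjusted-P thresholds.

    Inputs are log10(x + 1) matrices (samples x sRNAs); fold change uses linear-scale means.
    """
    mean_mdd = (10 ** log_mdd - 1).mean(axis=0)
    mean_ctrl = (10 ** log_ctrl - 1).mean(axis=0)
    fc = (mean_mdd + 1e-9) / (mean_ctrl + 1e-9)
    p = [mannwhitneyu(log_mdd[:, j], log_ctrl[:, j], alternative="two-sided").pvalue
         for j in range(log_mdd.shape[1])]
    sig = benjamini_hochberg(p) < q_cut
    return sig & (fc > fc_up), sig & (fc < fc_down)


rng = np.random.default_rng(6)
ctrl = rng.normal(100, 5, size=(40, 3))
mdd = rng.normal(100, 5, size=(40, 3)) * np.array([3.0, 0.3, 1.0])  # up, down, unchanged
up, down = differential_abundance(np.log10(mdd + 1), np.log10(ctrl + 1))
```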

[0088] Table 7


[0094] To investigate the relationship between sRNA expression and depression severity, Spearman correlation analysis was performed on HAMD-24 scores and sRNA levels in 342 MDD patients and 238 controls (screening criteria: P < 0.05; |r| > 0.2). The results are shown in Figure 7.

[0095] Development of a diagnostic model for MDD based on circulating sRNA: To evaluate the diagnostic potential of circulating sRNA in MDD, a multi-objective model prioritization framework integrating nine machine learning algorithms was developed.

[0096] Using this framework, each of the four major sRNA classes achieved high diagnostic accuracy in internal 10-fold cross-validation. A single classifier integrating all 823 sRNAs performed better still: the best model reached an AUC of 0.999 in distinguishing MDD from non-depressed controls.

[0097] Independent validation of the MDD diagnostic model based on circulating sRNA: To assess the robustness of the sRNA-based diagnostic models, their performance was evaluated in an external validation cohort. The miRNA-based models showed strong accuracy, with AUC values ranging from 0.851 to 0.943 (Figure 2a). The model optimized using MOMP (the multi-objective model prioritization framework) achieved an AUC of 0.943 (95% confidence interval: 0.913 to 0.967), with a sensitivity of 89.21% and a specificity of 82.57% (Figure 3a). SHAP analysis confirmed that the key discriminative miRNA was downregulated in MDD, consistent with findings from the discovery phase (Figure 4a).

[0098] rsRNA-based models also performed well, with AUC values ranging from 0.696 to 0.923. The MOMP model achieved an AUC of 0.923 (95% confidence interval: 0.887 to 0.950), with a sensitivity of 85.61% and a specificity of 82.57% (Figure 2b, Figure 3b). The most informative rsRNAs were negatively correlated with MDD status, indicating that lower expression levels were associated with a higher probability of disease (Figure 4b).

[0099] The tsRNA-based model demonstrated strong diagnostic utility: the MOMP model had an AUC of 0.955 (95% confidence interval: 0.933 to 0.971), a sensitivity of 88.49%, and a specificity of 84.40% (Figure 2c, Figure 3c). SHAP analysis identified a set of high-influence tsRNAs that contributed significantly to model predictions (Figure 4c).

[0100] For piRNAs, diagnostic performance was relatively moderate (AUC 0.870; 95% confidence interval: 0.825 to 0.909), with a sensitivity of 81.29% and a specificity of 79.25% (Figure 2d, Figure 3d), although the top-ranked features remained predictive (Figure 4d).

[0101] The model integrating all four classes of sRNAs achieved the highest overall accuracy, with an AUC of 0.960 (95% confidence interval: 0.938 to 0.979), a sensitivity of 87.77%, and a specificity of 83.96% (Figure 2e, Figure 3e). Across all models, rsRNAs contributed the most to classification performance, both alone and within integrated combinations (Figure 4e).

[0102] Circulating sRNA biomarkers can diagnose depression and differentiate its severity: To investigate the relationship between sRNA expression and depression severity, Spearman correlation analysis was performed on HAMD-24 scores and sRNA levels in 342 MDD patients and 238 controls (screening criteria: P < 0.05; |r| > 0.2). A substantial proportion of sRNAs were significantly associated with symptom burden, including 147 miRNAs (40.1%), 54 rsRNAs (18.6%), 24 tsRNAs (24.2%), and 11 piRNAs (16.7%). Most of the top-ranked sRNAs were negatively correlated with severity, such as miR-151b (r = -0.26, P = 4.78 × 10⁻⁶). -9 ), rsRNA-#22 (r = -0.50), rsRNA-#146 (r = -0.42), rsRNA-#69 (r = -0.44), and piR-187496 (r = -0.20). Although fewer in number than the differentially expressed sRNAs, many severity-associated sRNAs overlapped with those used to distinguish MDD from controls, suggesting a common biological basis (Figure 7).

[0103] To further evaluate their predictive value, elastic net regression combined with 10-fold cross-validation was applied. A strong linear correlation between predicted and actual HAMD-24 scores was observed for all sRNA categories, and the model integrating all sRNA types performed best.

[0104] To assess severity stratification, patients were divided into severe and non-severe subgroups using a HAMD-24 cutoff of 20. The integrated model achieved AUC values ranging from 0.696 to 0.768 across the nine algorithms. Notably, many of the features used for severity stratification also appeared in the diagnostic combination, indicating a common molecular basis for disease classification and severity assessment. These findings support the use of circulating sRNAs as biomarkers for MDD diagnosis and symptom stratification.

[0105] Random forest was selected as the optimal model, and it was used for model training and validation.

[0106] Figure 8 shows the process used to obtain the 17-sRNA panel. A random forest model was used for sequential forward selection on the integrated sRNA panel of the discovery cohort. Bar plots show the feature importance of the integrated sRNAs for distinguishing MDD from controls (a) and for stratifying depression severity (HAMD-24 > 20 vs ≤ 20) (b). Importance was obtained from the mean information gain and assessed by SHAP values. Line plots show the cumulative AUC (right axis) as sRNAs are added one at a time in each iteration. The left plot shows the performance of the top 10-, 5-, and 3-sRNA models for each classification task in the independent validation cohort.

[0107] The top 10 sRNAs selected for each of the two tasks were merged into a single 17-sRNA panel. This panel was highly effective both in diagnosing MDD and in distinguishing mild from severe cases.

[0108] Figure 9 shows the validation of the 17 sRNAs in the external validation cohort and in the PCR cohort, respectively.

[0109] The above description is merely a preferred embodiment of this application and is not intended to limit this application. Any modifications, equivalent substitutions, or improvements made within the spirit and principles of this application should be included within the protection scope of this application.

Claims

1. A combination of biomarkers for assisting in the diagnosis and severity grading of depression, characterized in that, The biomarker combination includes one or more of the following: rsRNA-#146, rsRNA-#69, rsRNA-#134, tsRNA-#98, rsRNA-#216, rsRNA-#22, rsRNA-#217, miR-151b, rsRNA-#267, miR-766-3p, piR-187496, miR-425-5p, rsRNA-#134, miR-128-3p, rsRNA-#182, miR-766-5p, and rsRNA-#82; their sequences are shown in SEQ ID No. 1-SEQ ID No. 17, respectively.

2. The application of a primer combination in the preparation of reagents for the auxiliary diagnosis of depression and its severity classification, characterized in that, The biomarker combination includes one or more of the following: rsRNA-#146, rsRNA-#69, rsRNA-#134, tsRNA-#98, rsRNA-#216, rsRNA-#22, rsRNA-#217, miR-151b, rsRNA-#267, miR-766-3p, piR-187496, miR-425-5p, rsRNA-#134, miR-128-3p, rsRNA-#182, miR-766-5p, and rsRNA-#82; their sequences are shown in SEQ ID No. 1-SEQ ID No. 17, respectively.

3. The application according to claim 2, characterized in that, The primer set includes primers for detecting at least one sRNA marker among rsRNA-#146, rsRNA-#69, rsRNA-#134, tsRNA-#98, rsRNA-#216, rsRNA-#22, rsRNA-#217, miR-151b, rsRNA-#267, miR-766-3p, piR-187496, miR-425-5p, rsRNA-#134, miR-128-3p, rsRNA-#182, miR-766-5p, and rsRNA-#82. The primer set consists of reverse transcription primers and primer pairs, wherein: The nucleotide sequences of the reverse transcription primers used to detect rsRNA-#146, rsRNA-#69, rsRNA-#134, tsRNA-#98, rsRNA-#216, rsRNA-#22, rsRNA-#217, miR-151b, rsRNA-#267, miR-766-3p, piR-187496, miR-425-5p, rsRNA-#134, miR-128-3p, rsRNA-#182, miR-766-5p, and rsRNA-#82 are shown in SEQ ID No. 18-SEQ ID No. 34. The primer pair used to detect rsRNA-#146, rsRNA-#69, rsRNA-#134, tsRNA-#98, rsRNA-#216, rsRNA-#22, rsRNA-#217, miR-151b, rsRNA-#267, miR-766-3p, piR-187496, miR-425-5p, rsRNA-#134, miR-128-3p, rsRNA-#182, miR-766-5p, and rsRNA-#82 consists of an upstream primer and a downstream primer. The nucleotide sequences of the upstream primer are shown in SEQ ID No. 35-SEQ ID No. 51, and the nucleotide sequences of the downstream primer are shown in SEQ ID No. 52-SEQ ID No. 68.

4. The application of a biomarker combination in the preparation of a reagent kit for the auxiliary diagnosis and severity grading of depression, characterized in that, The kit is a real-time quantitative qRT-PCR kit, and the biomarker combination is the biomarker combination described in claim 1.

5. The application according to claim 4, characterized in that, The kit includes a primer set for assisting in the diagnosis of depression and its severity grading. The primer set comprises primers for detecting at least one sRNA marker among rsRNA-#146, rsRNA-#69, rsRNA-#134, tsRNA-#98, rsRNA-#216, rsRNA-#22, rsRNA-#217, miR-151b, rsRNA-#267, miR-766-3p, piR-187496, miR-425-5p, rsRNA-#134, miR-128-3p, rsRNA-#182, miR-766-5p, and rsRNA-#82. The primer set consists of reverse transcription primers and primer pairs, wherein: The nucleotide sequences of the reverse transcription primers used to detect rsRNA-#146, rsRNA-#69, rsRNA-#134, tsRNA-#98, rsRNA-#216, rsRNA-#22, rsRNA-#217, miR-151b, rsRNA-#267, miR-766-3p, piR-187496, miR-425-5p, rsRNA-#134, miR-128-3p, rsRNA-#182, miR-766-5p, and rsRNA-#82 are shown in SEQ ID No. 18-SEQ ID No. 34. The primer pair used to detect rsRNA-#146, rsRNA-#69, rsRNA-#134, tsRNA-#98, rsRNA-#216, rsRNA-#22, rsRNA-#217, miR-151b, rsRNA-#267, miR-766-3p, piR-187496, miR-425-5p, rsRNA-#134, miR-128-3p, rsRNA-#182, miR-766-5p, and rsRNA-#82 consists of an upstream primer and a downstream primer. The nucleotide sequences of the upstream primer are shown in SEQ ID No. 35-SEQ ID No. 51, and the nucleotide sequences of the downstream primer are shown in SEQ ID No. 52-SEQ ID No. 68.

6. The application according to claim 5, characterized in that, The kit also includes a total sRNA extraction reagent set and an RNA reverse transcription reagent set.

7. The application according to claim 6, characterized in that, The total sRNA extraction reagent kit includes protein denaturation lysis buffer, chloroform, ethanol, and washing buffer.

8. The application according to claim 6, characterized in that, The RNA reverse transcription reagent kit includes reverse transcription primers, reverse transcription reaction buffer, and reverse transcriptase mixture.

9. The application according to claim 8, characterized in that, The kit also includes primers for amplifying the internal reference gene U6; the sequence of the primers for amplifying the internal reference gene U6 is as follows: Forward primer for internal reference gene U6: GCTTCGGCAGCACATATACTAAA; Reverse primer for internal reference gene U6: TTTGCGTGTCATCCTTGCG.

10. The application according to claim 9, characterized in that, The reaction conditions for the one-step real-time quantitative qRT-PCR method are as follows: 16℃ for 5 min; 50℃ for 15 min; 95℃ for 10 min; 95℃ for 15 s; 60℃ for 40 s; 50 cycles, with fluorescence signal detected and acquired at 60℃.