A data mining-based traditional Chinese medicine auxiliary diagnosis and treatment system and method for post-stroke depression

CN122245730A · Pending · Publication Date: 2026-06-19 · FIRST AFFILIATED HOSPITAL OF ANHUI UNIV OF CHINESE MEDICINE

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: FIRST AFFILIATED HOSPITAL OF ANHUI UNIV OF CHINESE MEDICINE
Filing Date: 2026-05-08
Publication Date: 2026-06-19

IPC: G16H50/20; G16H50/70; G16H20/90; G16H20/10; G06N5/022; G06F18/10; G06F18/15; G06F18/23213; G06F18/25; G06F18/213; G06F18/2413; G06F18/214; G06F18/243; G06N5/01; G06N20/00

Abstract

This invention provides a data mining-based TCM-assisted diagnosis and treatment system for post-stroke depression. The system comprises a data acquisition module, a data preprocessing module, a data mining analysis module, a TCM diagnosis and treatment knowledge generation module, a human-computer interaction-assisted diagnosis and treatment module, and a dynamic knowledge base update module, all working in tandem. When providing TCM-assisted diagnosis and treatment for post-stroke depression, the system collects and standardizes patient clinical data, mines the data with a combination of multi-dimensional algorithms, generates disease-specific diagnosis and treatment rules, recommends treatment plans in real time, and continuously updates the knowledge base based on clinical feedback. The result is a dedicated TCM-assisted diagnosis and treatment system for post-stroke depression that addresses the problems of reliance on a single data mining method and lagging knowledge base updates, and improves the standardization, accuracy, and iterative efficiency of diagnosis and treatment.

Description

Technical Field

[0001] This invention relates to the fields of medical information technology and artificial intelligence-assisted diagnosis and treatment in traditional Chinese medicine, and in particular to a data mining-based traditional Chinese medicine-assisted diagnosis and treatment system and method for post-stroke depression.

Background Technology

[0002] Post-stroke depression is one of the most common mental complications following stroke. Clinical manifestations are mainly characterized by depressed mood, loss of interest, slowed thinking, and taciturnity, and may be accompanied by multi-systemic physical symptoms. It has been reported that more than one-third of stroke survivors suffer from post-stroke depression, which seriously affects their neurological function recovery, quality of life, and overall prognosis.

[0003] Traditional Chinese medicine (TCM) treatment of post-stroke depression has advantages such as multi-target regulation and fewer toxic side effects, and has accumulated rich clinical experience. Post-stroke depression falls under the combined category of "stroke" and "depression" in TCM, with the disease location in the brain. The basic pathogenesis is the disorder of Qi and blood leading to dysfunction of the brain's function of governing the mind. Currently, common clinical patterns include liver Qi stagnation, liver Qi stagnation and spleen deficiency, heart-kidney disharmony, Qi stagnation transforming into fire, and phlegm and blood stasis obstructing the collaterals.

[0004] However, existing technologies have the following shortcomings: (1) Lack of a dedicated TCM-assisted diagnosis and treatment system for post-stroke depression: most existing TCM-assisted diagnosis and treatment systems are general-purpose platforms and lack a data mining model dedicated to this disease. (2) Limited data mining methods: existing research has explored diagnostic models for specific syndrome types of post-stroke depression (such as liver stagnation and spleen deficiency) based on decision tree algorithms such as CART, CHAID, and QUEST, and has used the Apriori algorithm to mine medication patterns, but most studies are limited to a single algorithm and lack a framework for multi-dimensional, multi-method joint mining. (3) Lagging updates to the diagnosis and treatment knowledge base: existing systems lack adaptive learning based on clinical feedback, making it difficult to keep optimizing diagnosis and treatment rules as clinical practice evolves.

[0005] Therefore, it is necessary to provide a data mining-based TCM-assisted diagnosis and treatment system and method for post-stroke depression to solve the above-mentioned technical problems.

Summary of the Invention

[0006] This invention provides a TCM-assisted diagnosis and treatment system and method for post-stroke depression based on data mining, which solves three problems of the existing technology: the lack of a dedicated TCM-assisted diagnosis and treatment system for post-stroke depression, the reliance on a single data mining method, and the lagging updates of the diagnosis and treatment knowledge base.

[0007] To address the aforementioned technical problems, this invention provides a data mining-based TCM-assisted diagnosis and treatment system for post-stroke depression, comprising:

[0008] The data acquisition module is used to acquire data on stroke type and disease stage, four diagnostic methods, TCM syndrome classification, treatment prescriptions and efficacy evaluation indicators of post-stroke depression patients, forming a standardized clinical dataset.

[0009] The data preprocessing module is used to handle missing values, remove outliers, standardize data, and align traditional Chinese medicine terms in the collected data to build a database of medical records of post-stroke depression.

[0010] The data mining and analysis module includes a frequency statistics analysis submodule, an association rule mining submodule, a clustering analysis submodule, a decision tree modeling submodule, and a machine learning prediction submodule. The data mining and analysis module is configured to perform multidimensional data mining on the medical record database to generate frequency statistics results, association rules, clustering core prescriptions, diagnostic decision tree models, and efficacy prediction models.

[0011] The TCM diagnosis and treatment knowledge generation module is used to integrate the output results of the data mining and analysis module to generate a TCM auxiliary diagnosis and treatment rule library for different syndromes of post-stroke depression.

[0012] The human-computer interaction-assisted diagnosis and treatment module automatically matches and recommends syndrome differentiation, Chinese medicine prescriptions, and reference prognostic information based on the current patient's four diagnostic methods, and receives feedback from clinicians on adoption or modification.

[0013] The dynamic knowledge base update module is used to collect the feedback information and feed it back to the TCM diagnosis and treatment knowledge generation module to achieve adaptive updates of the knowledge base.

[0014] Preferably, the association rule mining submodule uses the Apriori algorithm and the improved mutual information method to mine the association rules of Chinese medicine compatibility separately for each syndrome type, calculates the support, confidence and lift, and generates the core drug pairs and corner drug (three-herb combination) information under each syndrome type.

[0015] Preferably, the clustering analysis submodule uses K-means clustering and hierarchical clustering algorithms to perform clustering analysis on high-frequency drugs, and determines the optimal number of clusters by using the elbow method to form several core prescription clusters, so as to identify the regular treatment strategies for various types of post-stroke depression.

[0016] Preferably, the decision tree modeling submodule uses the CART algorithm and the C5.0 algorithm to construct a decision tree diagnostic model with the four diagnostic methods as input feature variables and the TCM syndrome classification as the target variable. The model performance is evaluated by the accuracy and AUC index, and the optimal model is selected and deployed to the human-computer interaction assisted diagnosis and treatment module.

[0017] Preferably, the machine learning prediction submodule uses the random forest algorithm to construct an efficacy prediction model with patient baseline information and intervention measures as input features and HAMD score improvement rate as the prediction target variable. The number of decision trees is set to 100, and the maximum number of features is the square root of the number of input features.

[0018] Preferably, the TCM auxiliary diagnosis and treatment rule base includes syndrome-symptom mapping information, syndrome-treatment mapping information, syndrome-core prescription compatibility information, medication dosage reference information, and efficacy prediction reference information.

[0019] Preferably, the dynamic knowledge base update module is configured to periodically collect feedback data adopted or modified by clinicians, review and confirm the feedback data, and inject the confirmed new rules into the TCM diagnosis and treatment knowledge generation module through a reasoning mechanism.

[0020] To address the aforementioned problems, this invention also provides a data mining-based TCM-assisted diagnosis and treatment method for post-stroke depression, applied to any of the aforementioned data mining-based TCM-assisted diagnosis and treatment systems for post-stroke depression, characterized by comprising the following steps:

[0021] S1. Collect clinical data of patients with post-stroke depression and preprocess them to construct a medical record database of post-stroke depression.

[0022] S2. The medical record database is analyzed from multiple dimensions using five data mining methods: frequency statistical analysis, association rule mining, cluster analysis, decision tree modeling, and random forest prediction. This generates frequency statistical results, association rules, clustered core prescriptions, diagnostic decision tree models, and efficacy prediction models.

[0023] S3. Integrate the data mining and analysis results to construct a TCM auxiliary diagnosis and treatment rule library for different syndrome types of post-stroke depression;

[0024] S4. Receive the current patient's four-diagnostic-method information, automatically match and recommend syndrome types through the decision tree diagnostic model, provide prognostic judgment through the efficacy prediction model, and generate recommended prescriptions and compatibility analysis through association rules and clustered prescriptions.

[0025] S5. Collect feedback from clinicians on the adoption or modification of the recommendations, and feed back the confirmed new rules to the TCM auxiliary diagnosis and treatment rule base to achieve dynamic updates of the knowledge base.

[0026] Preferably, in step S2, the association rule mining includes: setting the minimum support to 0.1, the minimum confidence to 0.6, and the minimum lift to 1.2, using the Apriori algorithm to perform association rule mining stratified by syndrome type, and outputting strong association rules under each syndrome type.

[0027] Preferably, in step S2, the decision tree modeling uses the CART algorithm and the C5.0 algorithm to construct a diagnostic model, with the four diagnostic methods as input and the TCM syndrome classification as output. The model is evaluated by accuracy, recall and AUC indicators, and the optimal diagnostic model is selected for deployment and application. The efficacy prediction uses the random forest algorithm, with the number of decision trees set to 100 and the maximum number of features taken as the square root of the number of input features. The HAMD score improvement rate is used as the prediction target to output the prognostic evaluation result.

[0028] Compared with related technologies, the TCM-assisted diagnosis and treatment system and method for post-stroke depression based on data mining provided by this invention has the following beneficial effects:

[0029] This invention provides a TCM-assisted diagnosis and treatment system for post-stroke depression based on data mining. Through the coordinated structure of a data acquisition module, a data preprocessing module, a data mining and analysis module, a TCM diagnosis and treatment knowledge generation module, a human-computer interaction-assisted diagnosis and treatment module, and a dynamic knowledge base update module, this system can collect and standardize patient clinical data for TCM-assisted diagnosis and treatment of post-stroke depression. It performs in-depth mining through multi-dimensional joint algorithms to generate specific diagnosis and treatment rules and recommend treatment plans in real time. Simultaneously, it continuously updates the knowledge base based on clinical feedback, thereby creating a dedicated TCM-assisted diagnosis and treatment system for post-stroke depression. This system addresses the problems of single data mining methods and lagging knowledge base updates, improving the standardization, accuracy, and iterative efficiency of diagnosis and treatment.

Attached Figure Description

[0030] Figure 1 is a flowchart illustrating a preferred embodiment of the data mining-based TCM-assisted diagnosis and treatment system for post-stroke depression provided by the present invention;

[0031] Figure 2 is a structural block diagram of the data mining and analysis module provided by the present invention;

[0032] Figure 3 is a structural block diagram of the TCM auxiliary diagnosis and treatment rule library provided by the present invention;

[0033] Figure 4 is a flowchart illustrating a preferred embodiment of the data mining-based traditional Chinese medicine-assisted diagnosis and treatment method for post-stroke depression provided by the present invention.

Detailed Implementation

[0034] The present invention will be further described below with reference to the accompanying drawings and embodiments.

[0035] Please refer to Figure 1, Figure 2, and Figure 3, in which Figure 1 is a flowchart illustrating a preferred embodiment of the data mining-based TCM-assisted diagnosis and treatment system for post-stroke depression provided by the present invention; Figure 2 is a structural block diagram of the data mining and analysis module provided by the present invention; and Figure 3 is a structural block diagram of the TCM auxiliary diagnosis and treatment rule library provided by the present invention.

[0036] A data mining-based TCM-assisted diagnosis and treatment system for post-stroke depression includes: a data acquisition module, used to acquire data on stroke type and disease stage, four diagnostic methods, TCM syndrome classification, treatment prescriptions and efficacy evaluation indicators of post-stroke depression patients, forming a standardized clinical dataset;

[0037] The data preprocessing module is used to handle missing values, remove outliers, standardize data, and align traditional Chinese medicine terms in the collected data to build a database of medical records of post-stroke depression.

[0038] The data mining and analysis module includes a frequency statistics analysis submodule, an association rule mining submodule, a clustering analysis submodule, a decision tree modeling submodule, and a machine learning prediction submodule. The data mining and analysis module is configured to perform multidimensional data mining on the medical record database to generate frequency statistics results, association rules, clustering core prescriptions, diagnostic decision tree models, and efficacy prediction models.

[0039] The TCM diagnosis and treatment knowledge generation module is used to integrate the output results of the data mining and analysis module to generate a TCM auxiliary diagnosis and treatment rule library for different syndromes of post-stroke depression.

[0040] The human-computer interaction-assisted diagnosis and treatment module automatically matches and recommends syndrome differentiation, Chinese medicine prescriptions, and reference prognostic information based on the current patient's four diagnostic methods, and receives feedback from clinicians on adoption or modification.

[0041] The dynamic knowledge base update module is used to collect the feedback information and feed it back to the TCM diagnosis and treatment knowledge generation module to achieve adaptive updates of the knowledge base.

[0042] In the specific implementation of the data acquisition module, a structured electronic case report form (e-CRF) was used for multi-source data collection. Stroke type was recorded according to the TOAST classification and the Oxfordshire Community Stroke Project classification. The course of the disease was divided into acute phase (≤14 days), recovery phase (15 days to 6 months), and sequelae phase (>6 months). The four diagnostic methods included inspection (appearance and morphology, tongue appearance), auscultation (sound, odor), inquiry (21 core symptoms such as chief complaint, bowel movements, sleep, and mood), and palpation (pulse). The TCM syndrome classification referred to common syndromes in the "Expert Consensus on TCM Diagnosis and Treatment of Post-Stroke Depression": liver qi stagnation type, liver qi stagnation and spleen deficiency type, heart-kidney disharmony type, qi stagnation transforming into fire type, and phlegm and blood stasis obstructing the collaterals type. Treatment prescriptions recorded the name, dosage, and usage of Chinese herbal medicines. Efficacy evaluation indicators included the Hamilton Depression Rating Scale (HAMD-24), the National Institutes of Health Stroke Scale (NIHSS), and TCM syndrome scores. The collected data was stored uniformly in JSON format, and its integrity was ensured through validation rules.
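
As an illustration only, the disease-stage cut-offs and the JSON validation step described above could be sketched in Python as follows. The field names, the `validate_record` helper, and the approximation of "6 months" as 180 days are assumptions made for this sketch and are not specified in the source.

```python
import json

# Assumption: "6 months" is approximated as 180 days; the source gives no day count.
SIX_MONTHS_DAYS = 180

# Hypothetical field names for one e-CRF record (not defined in the source).
REQUIRED_FIELDS = (
    "patient_id", "stroke_type", "days_since_onset",
    "four_diagnostics", "syndrome", "prescription", "hamd24",
)


def disease_stage(days_since_onset):
    """Map days since stroke onset to the three stages named in the text."""
    if days_since_onset < 0:
        raise ValueError("days_since_onset must be non-negative")
    if days_since_onset <= 14:
        return "acute"
    if days_since_onset <= SIX_MONTHS_DAYS:
        return "recovery"
    return "sequelae"


def validate_record(raw_json):
    """Parse one JSON record, check required fields, and attach the stage."""
    record = json.loads(raw_json)
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    record["disease_stage"] = disease_stage(record["days_since_onset"])
    return record
```

A real deployment would likely validate value types and coded vocabularies as well; this sketch only checks field presence.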

[0043] The data preprocessing module performs the following operations. Missing values are handled with multiple imputation: samples whose missing rate for key variables (such as HAMD scores) exceeds 30% are removed, while missing values for other variables are imputed using K-nearest neighbor imputation (K=5). Outliers are identified and processed based on the interquartile range (IQR). Data standardization includes unifying the names of traditional Chinese medicines to the standardized names in the 2020 edition of the Chinese Pharmacopoeia, and mapping symptom terms to the Traditional Chinese Medicine Clinical Terminology Set (TCMLS). TCM terminology alignment uses a BERT-based semantic similarity model to normalize similar symptoms described differently by different physicians (such as "poor appetite," "lack of appetite," and "loss of appetite"). The preprocessed structured data is stored in a MySQL database to construct a post-stroke depression medical record database. Each record includes the patient's anonymous ID, demographic information, clinical characteristics, syndrome type, prescription, and follow-up efficacy.
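
The two simplest preprocessing steps above, dropping samples with too many missing key variables and flagging IQR outliers, might be sketched as below. This is a minimal stdlib-only illustration; the function names, the use of `None` to mark a missing value, and the conventional 1.5 × IQR fence are assumptions, since the source does not state them.

```python
from statistics import quantiles


def missing_rate(record, key_vars):
    """Fraction of key variables that are missing (None) in one record."""
    missing = sum(1 for name in key_vars if record.get(name) is None)
    return missing / len(key_vars)


def drop_sparse(records, key_vars, max_rate=0.30):
    """Keep only records whose key-variable missing rate is at most max_rate."""
    return [r for r in records if missing_rate(r, key_vars) <= max_rate]


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the conventional fence and an assumption here.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]
```

K-nearest neighbor imputation and BERT-based term alignment are omitted because they depend on third-party libraries and model choices that the source does not pin down.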

[0044] The human-computer interaction-assisted diagnosis module is implemented as a web-based clinical decision support interface. Doctors input the patient's four diagnostic methods (symptoms, tongue and pulse information, or natural language input followed by NLP analysis). The system automatically invokes a decision tree diagnostic model, outputting the most probable syndrome type and its confidence level. Simultaneously, based on association rules and clustered prescriptions in the knowledge base, it generates recommended traditional Chinese medicine prescriptions and displays compatibility analysis (e.g., "Bupleurum and White Peony are the core herb pair for liver qi stagnation syndrome, working together to soothe and soften the liver"). A prognostic assessment is provided through a random forest prediction model ("The probability of HAMD improvement exceeding 50% after 8 weeks of treatment is 72%"). "Accept" and "Modify" buttons are located at the bottom of the interface, allowing doctors to edit the syndrome type and prescription, and provide reasons for modification. All interaction logs (patient ID, input information, recommendation results, doctor feedback, final prescription) are recorded in a feedback database and periodically uploaded to the dynamic knowledge base update module.

[0045] The association rule mining submodule uses the Apriori algorithm and the improved mutual information method to mine the association rules of Chinese medicine compatibility separately for each syndrome type, calculates support, confidence and lift, and generates core drug pairs and corner drug (three-herb combination) information under each syndrome type.

[0046] For each syndrome type (e.g., liver qi stagnation), the corresponding Chinese herbal prescriptions are extracted from the medical record database to construct a transactional dataset. The minimum support is set to 0.1, the minimum confidence to 0.6, and the minimum lift to 1.2. The Apriori algorithm is used to generate frequent itemsets and filter for strong association rules. An improved mutual information method is also introduced to calculate the mutual information value between two Chinese herbal medicines A and B, $I = \log\frac{P(AB)}{P(A)\,P(B)}$, as a supplementary measure of association strength. The output includes the top 10 herb pairs with the highest support for each syndrome type (e.g., Bupleurum-White Peony, Turmeric-Albizia Bark, etc.), and corner drugs, each consisting of three herbs (e.g., Bupleurum-White Peony-Angelica). The output is presented in tabular and network diagram formats for use by the subsequent knowledge generation module.
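
To make the thresholds and the mutual information value concrete, the following sketch computes support, confidence, lift, and the pairwise value log(P(AB) / (P(A)P(B))) for herb pairs. It is deliberately restricted to two-herb itemsets and is not a full Apriori implementation; the function name `pair_rules` and the returned dictionary layout are assumptions of this sketch.

```python
from itertools import combinations
from math import log


def pair_rules(transactions, min_support=0.1, min_confidence=0.6, min_lift=1.2):
    """Return strong rules X -> Y over herb pairs for one syndrome type.

    Each transaction is the list of herbs in one prescription.
    """
    sets = [set(t) for t in transactions]
    n = len(sets)
    herbs = sorted(set().union(*sets))
    p = {h: sum(h in s for s in sets) / n for h in herbs}
    rules = []
    for a, b in combinations(herbs, 2):
        p_ab = sum(a in s and b in s for s in sets) / n
        if p_ab == 0 or p_ab < min_support:
            continue  # infrequent pair; also avoids log(0)
        mutual_info = log(p_ab / (p[a] * p[b]))
        for x, y in ((a, b), (b, a)):
            confidence = p_ab / p[x]
            lift = confidence / p[y]
            if confidence >= min_confidence and lift >= min_lift:
                rules.append({
                    "antecedent": x, "consequent": y, "support": p_ab,
                    "confidence": confidence, "lift": lift,
                    "mutual_info": mutual_info,
                })
    return rules
```

Extending this to three-herb combinations would follow the usual Apriori pattern of growing candidates only from itemsets that already meet the minimum support.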

[0047] The clustering analysis submodule uses K-means clustering and hierarchical clustering algorithms to perform clustering analysis on high-frequency drugs, and determines the optimal number of clusters through the elbow method to form several core prescription clusters in order to identify regular treatment strategies for various types of post-stroke depression.

[0048] First, traditional Chinese medicines (TCMs) with a usage frequency exceeding 10% within each syndrome type are selected, and a drug-prescription matrix is constructed (rows: prescriptions, columns: drugs, values: 0 / 1 indicating whether the drug is used). K-means clustering (distance metric: Jaccard distance) and hierarchical clustering (inter-cluster distance: Ward's method, distance metric: Euclidean distance) are each applied. Following the elbow method, the within-cluster sum of squared errors (SSE) is calculated for different values of K, and the inflection point at which the decrease in SSE levels off is selected as the optimal number of clusters (usually K=3~6). For each cluster, the top 8 most frequent TCMs in the prescriptions within the cluster are extracted as the core prescription for that cluster. Simultaneously, the silhouette coefficient of each cluster is calculated to evaluate the clustering quality (requirement >0.25). The output is the set of core prescription clusters for each syndrome type; for example, for liver stagnation and spleen deficiency syndrome, the clusters might include "Xiaoyao San (modified)" and "Chaihu Shugan San (modified)". These core prescriptions are used to assist prescription recommendation in the diagnostic module.
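
The helper steps around the clustering itself can be sketched without any clustering library. The block below shows the Jaccard distance between two prescriptions, one common way to locate an elbow in a list of SSE values (the largest second difference, which is an assumption since the source does not say how the inflection point is detected), and the extraction of the most frequent herbs in a cluster. All function names are hypothetical.

```python
from collections import Counter


def jaccard_distance(a, b):
    """Jaccard distance between two prescriptions given as sets of herbs."""
    union = a | b
    if not union:
        return 0.0
    return 1 - len(a & b) / len(union)


def elbow_k(sse_by_k):
    """Pick the K where the SSE curve bends most (largest second difference)."""
    ks = sorted(sse_by_k)
    if len(ks) < 3:
        raise ValueError("need SSE values for at least three K")
    best_k, best_bend = ks[1], float("-inf")
    for prev, k, nxt in zip(ks, ks[1:], ks[2:]):
        bend = (sse_by_k[prev] - sse_by_k[k]) - (sse_by_k[k] - sse_by_k[nxt])
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


def core_prescription(prescriptions, top_n=8):
    """Top-N most frequent herbs among the prescriptions of one cluster."""
    counts = Counter(h for p in prescriptions for h in set(p))
    return [herb for herb, _ in counts.most_common(top_n)]
```

The SSE values passed to `elbow_k` would come from whatever K-means implementation is used; the sketch does not perform the clustering.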

[0049] The decision tree modeling submodule uses the CART algorithm and the C5.0 algorithm to construct a decision tree diagnostic model with the four diagnostic methods as input feature variables and the TCM syndrome classification as the target variable. The model performance is evaluated by the accuracy and AUC index, and the optimal model is selected and deployed to the human-computer interaction assisted diagnosis and treatment module.

[0050] Samples with clearly labeled syndrome types (sample size ≥ 500) were extracted from the medical record database and divided into training and test sets in a 7:3 ratio. Input features included one-hot-encoded symptom variables (e.g., chest tightness, frequent sighing, bitter taste in the mouth, insomnia, etc.), tongue appearance variables (red tongue body, yellow and greasy tongue coating, etc.), and pulse appearance variables (wiry pulse, thin pulse, etc.), totaling approximately 60-80 features. The CART algorithm used the Gini index as the splitting criterion, with the minimum number of samples per leaf node set to 5 and the maximum depth set to 10, and applied post-pruning (cost-complexity pruning). The C5.0 algorithm used the information gain ratio, with the number of boosting iterations set to 10 and global pruning enabled. Model evaluation metrics included accuracy, recall, precision, F1 score, and macro-average AUC. The performance of the two algorithms on the test set was compared, and the decision tree with a higher AUC and a suitable model size was selected (e.g., CART is preferred when it generates a simpler tree structure). The final output is an interpretable set of decision tree rules, such as "IF low mood AND chest tightness AND frequent sighing AND wiry pulse THEN liver qi stagnation syndrome". This set of rules is stored in the knowledge base in IF-THEN format. At the same time, the decision tree model is serialized and deployed to the human-computer interaction module.
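
The Gini-based splitting criterion mentioned for CART can be illustrated on one-hot features with a few lines of Python. This sketch only evaluates a single split and picks the best feature; it does not build or prune a tree, and the function names and toy feature names are assumptions.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(rows, labels, feature):
    """Weighted Gini impurity after splitting on a 0/1 feature."""
    present = [y for row, y in zip(rows, labels) if row[feature] == 1]
    absent = [y for row, y in zip(rows, labels) if row[feature] == 0]
    n = len(labels)
    return len(present) / n * gini(present) + len(absent) / n * gini(absent)


def best_feature(rows, labels):
    """Feature whose split yields the lowest weighted Gini impurity."""
    return min(rows[0], key=lambda f: split_gini(rows, labels, f))
```

A split with weighted impurity 0 separates the syndrome types perfectly on the given samples, which is what a rule such as "wiry pulse THEN liver qi stagnation" would correspond to in a toy dataset.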

[0051] The machine learning prediction submodule uses the random forest algorithm to construct an efficacy prediction model with patient baseline information and intervention measures as input features and HAMD score improvement rate as the prediction target variable. The number of decision trees is set to 100, and the maximum number of features is the square root of the number of input features.

[0052] Samples with complete baseline information (age, gender, stroke type, disease stage, total HAMD score at enrollment), intervention measures (traditional Chinese medicine prescription, whether combined with Western medicine, whether combined with acupuncture), and post-treatment HAMD scores (after 4 weeks and 8 weeks of treatment) were selected from the medical record database. The prediction target was defined as HAMD score improvement rate = (total score before treatment - total score after treatment) / total score before treatment × 100%; an improvement rate ≥ 50% was defined as "effective," and < 50% as "ineffective." There were 15-20 input features, including: age (continuous), gender (binary), stroke type (multi-class), disease stage (ordinal), total pre-treatment HAMD score (continuous), whether antidepressant Western medicine was used (binary), and core traditional Chinese medicine prescription features (5 principal components after PCA dimensionality reduction). The random forest algorithm was configured with 100 decision trees, a maximum number of features of sqrt(number of input features), a minimum number of samples per leaf node of 5, and a maximum depth of 20. Five-fold cross-validation was used to evaluate the model, with accuracy, sensitivity, specificity, and AUC as the evaluation metrics. Feature importances were output and ranked to identify the factors with the greatest impact on treatment efficacy (such as disease stage and core traditional Chinese medicines). After deployment, the model outputs a "predicted effective probability" (e.g., 78%) for each newly entered patient, assisting clinicians in assessing prognosis and adjusting treatment plans.
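
The prediction target defined above is simple arithmetic and can be written directly. The function names below are hypothetical; the formula and the 50% threshold are taken from the text.

```python
def hamd_improvement_rate(score_before, score_after):
    """HAMD improvement rate in percent: (before - after) / before * 100."""
    if score_before <= 0:
        raise ValueError("pre-treatment HAMD total score must be positive")
    return (score_before - score_after) / score_before * 100.0


def efficacy_label(score_before, score_after, threshold=50.0):
    """Label a case 'effective' when the improvement rate reaches the threshold."""
    rate = hamd_improvement_rate(score_before, score_after)
    return "effective" if rate >= threshold else "ineffective"
```

For example, a patient whose total score falls from 24 to 12 has an improvement rate of exactly 50% and is labeled "effective", while a fall from 24 to 13 (about 45.8%) is labeled "ineffective".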

[0053] The TCM auxiliary diagnosis and treatment rule base includes information on syndrome-symptom mapping, syndrome-treatment mapping, syndrome-core prescription compatibility, medication dosage reference, and efficacy prediction reference.

[0054] The TCM auxiliary diagnosis and treatment rule base is stored in a graph database (Neo4j), containing the following five types of nodes: syndrome nodes, symptom nodes, treatment method nodes, Chinese herbal medicine nodes, and efficacy prediction nodes. Relationship types include: syndrome-symptom mapping (HAS_SYMPTOM), syndrome-treatment method mapping (TREAT_WITH), syndrome-core prescription compatibility information (PRESCRIBE, attributes include drug pairs, corner drugs, and core prescription clusters), medication dosage reference information (DOSAGE_REF, including commonly used dosage ranges, such as Bupleurum chinense 6-12g), and efficacy prediction reference information (PROGNOSIS, storing a table of predicted effectiveness probabilities under different baseline feature combinations). The knowledge generation module integrates the output results of the data mining and analysis module (frequency statistics, association rules, clustered prescriptions, decision tree rules, random forest feature importance), transforms them into nodes and relationships in the graph database through an ETL process, and provides a RESTful API for other modules to call.

[0055] The dynamic knowledge base update module is configured to periodically collect feedback data adopted or modified by clinicians, review and confirm the feedback data, and inject the confirmed new rules into the TCM diagnosis and treatment knowledge generation module through a reasoning mechanism.

[0056] An automatic update task is set up every two weeks. First, new feedback records are extracted from the feedback database, and records that have been "modified" and whose modified prescriptions have been used for treatment with effective efficacy evaluations are selected. Two TCM brain disease experts with associate senior professional titles or above review the modified rules, with review criteria including: whether the rule conflicts with TCM theory and whether there are duplicate validation samples. Rules that pass the review are added to a temporary knowledge base. Then, the Incremental Association Rule Mining Algorithm (IncApriori) is used to merge the new rules with the original rules, recalculating support and confidence. New rules with significantly improved lift (Δlift > 0.2) are directly injected into the Neo4j knowledge base; rules that conflict with existing rules are marked as pending arbitration and discussed and decided by an expert panel. After the update is complete, the system automatically triggers a rule cache refresh in the human-computer interaction module to ensure that doctors have access to the latest knowledge in real time.
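
The routing of reviewed rules at the end of this update cycle might look like the sketch below. The `CandidateRule` structure and the `triage` function are hypothetical. The source specifies only two outcomes (direct injection when Δlift > 0.2, and pending arbitration on conflict), so the third outcome, "hold", is an assumption added for rules that meet neither condition.

```python
from dataclasses import dataclass


@dataclass
class CandidateRule:
    """A reviewed rule after incremental re-mining (hypothetical structure)."""
    antecedent: str
    consequent: str
    old_lift: float
    new_lift: float
    conflicts_with_existing: bool = False


def triage(rule, min_delta_lift=0.2):
    """Route a reviewed rule according to the update policy in the text."""
    if rule.conflicts_with_existing:
        return "pending_arbitration"  # decided later by the expert panel
    if rule.new_lift - rule.old_lift > min_delta_lift:
        return "inject"  # written to the knowledge base
    return "hold"  # assumption: the source does not say what happens otherwise
```

In this sketch a conflict takes precedence over a lift improvement, which is also an assumption about how the two conditions interact.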

[0057] The working principle of the data mining-based TCM-assisted diagnosis and treatment system for post-stroke depression provided by this invention is as follows:

[0058] The system first acquires raw diagnostic and treatment data of post-stroke depression patients from clinical electronic medical records or research databases through a data acquisition module. This data includes stroke information, four diagnostic methods (inspection, auscultation and olfaction, inquiry, and palpation), TCM syndrome differentiation, treatment prescriptions, and efficacy evaluation indicators. The data preprocessing module cleans, normalizes, and aligns the terminology of the raw data to construct a high-quality medical record database. Subsequently, the data mining and analysis module runs five mining algorithms in parallel: frequency statistics analysis reveals high-frequency drugs and symptoms; association rule mining generates core drug pairs and corner drugs for each syndrome; cluster analysis forms core prescription clusters; decision tree modeling generates interpretable syndrome diagnosis models; and random forest constructs efficacy prediction models. These mining results are fed into a TCM diagnostic and treatment knowledge generation module, where they are integrated to form a structured TCM auxiliary diagnostic and treatment rule base.

[0059] In the application phase, clinicians enter the patient's four-diagnosis information (inspection, auscultation and olfaction, inquiry, and palpation) through the human-computer interaction-assisted diagnosis and treatment module. The system automatically calls the decision tree diagnostic model for syndrome differentiation matching, uses the association rules and clustered prescription results to generate recommended prescriptions, and calls the efficacy prediction model to output a prognostic assessment. Doctors can adopt or modify the recommended treatment plan, and this feedback is collected by the dynamic knowledge base update module. The update module regularly reviews and incrementally mines the feedback data and feeds confirmed effective new rules back into the knowledge base, achieving closed-loop adaptive optimization of the system. In this way the system learns from historical data and continues to evolve in clinical use, providing continuously optimized intelligent assistance for the TCM diagnosis and treatment of post-stroke depression.

[0060] Compared with related technologies, the TCM-assisted diagnosis and treatment system for post-stroke depression based on data mining provided by this invention has the following beneficial effects:

[0061] By integrating modules such as data acquisition, data preprocessing, data mining and analysis, TCM diagnosis and treatment knowledge generation, human-computer interaction-assisted diagnosis and treatment, and dynamic knowledge base update, this system can collect and standardize patient clinical data for TCM-assisted diagnosis and treatment of post-stroke depression. Through multi-dimensional combined algorithms, it performs in-depth data mining to generate specialized diagnosis and treatment rules and recommend treatment plans in real time. Simultaneously, it continuously updates the knowledge base based on clinical feedback, thus creating a dedicated TCM-assisted diagnosis and treatment system for post-stroke depression. This addresses the issues of limited data mining methods and lagging knowledge base updates, improving the standardization, accuracy, and iteration efficiency of diagnosis and treatment.

[0062] Referring to Figure 4 of the accompanying drawings, and to address the aforementioned problems, this invention also provides a data mining-based TCM-assisted diagnosis and treatment method for post-stroke depression, applied to any of the aforementioned data mining-based TCM-assisted diagnosis and treatment systems for post-stroke depression, characterized by comprising the following steps:

[0063] S1. Collect clinical data of patients with post-stroke depression and preprocess them to construct a medical record database of post-stroke depression.

[0064] S2. The medical record database is analyzed from multiple dimensions using five data mining methods: frequency statistical analysis, association rule mining, cluster analysis, decision tree modeling, and random forest prediction. This generates frequency statistical results, association rules, clustered core prescriptions, diagnostic decision tree models, and efficacy prediction models.

[0065] S3. Integrate the data mining and analysis results to construct a TCM auxiliary diagnosis and treatment rule library for different syndrome types of post-stroke depression.

[0066] S4. Receive the current patient's four-diagnosis information, automatically match and recommend syndrome types through the decision tree diagnostic model, provide prognostic judgment through the efficacy prediction model, and generate recommended prescriptions and compatibility analysis through association rules and clustered prescriptions.

[0067] S5. Collect feedback from clinicians on the adoption or modification of the recommendations, and feed back the confirmed new rules to the TCM auxiliary diagnosis and treatment rule base to achieve dynamic updates of the knowledge base.

[0068] In step S2, the association rule mining includes: setting the minimum support to 0.1, the minimum confidence to 0.6, and the minimum lift to 1.2, using the Apriori algorithm to perform hierarchical association rule mining, and outputting the strong association rules under each syndrome type.
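As an illustration of how the three thresholds act together, the following is a minimal pure-Python sketch of level-wise Apriori mining applied separately to the prescriptions of each syndrome type. Reading "hierarchical" as "one mining run per syndrome type" is an interpretation, and the function names, the record layout and the placeholder herb names are hypothetical.

```python
from itertools import combinations

MIN_SUPPORT, MIN_CONFIDENCE, MIN_LIFT = 0.1, 0.6, 1.2  # values from step S2


def frequent_itemsets(transactions, min_support=MIN_SUPPORT):
    """Level-wise Apriori: return {itemset: support} for frequent itemsets."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)
    current = {frozenset([item]) for t in sets for item in t}
    frequent, k = {}, 1
    while current:
        level = {}
        for cand in current:
            support = sum(1 for t in sets if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets, then prune every
        # candidate that has an infrequent (k-1)-subset.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        current = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def strong_rules(transactions):
    """Return (antecedent, consequent, support, confidence, lift) tuples."""
    freq = frequent_itemsets(transactions)
    rules = []
    for itemset, support in freq.items():
        for r in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), r):
                ante = frozenset(ante)
                cons = itemset - ante
                confidence = support / freq[ante]
                lift = confidence / freq[cons]
                if confidence >= MIN_CONFIDENCE and lift >= MIN_LIFT:
                    rules.append((ante, cons, support, confidence, lift))
    return rules


def rules_by_syndrome(records):
    """records: iterable of (syndrome_type, herbs); mine each type separately."""
    grouped = {}
    for syndrome, herbs in records:
        grouped.setdefault(syndrome, []).append(herbs)
    return {syndrome: strong_rules(t) for syndrome, t in grouped.items()}


# Placeholder prescriptions for illustration only, not data from the patent.
records = [
    ("syndrome_X", {"herb_A", "herb_B"}),
    ("syndrome_X", {"herb_A", "herb_B"}),
    ("syndrome_X", {"herb_A", "herb_B", "herb_C"}),
    ("syndrome_X", {"herb_C"}),
    ("syndrome_X", {"herb_C", "herb_D"}),
]
rules = rules_by_syndrome(records)
```

In this toy data the rule herb_A -> herb_B survives all three filters, while herb_A -> herb_C is dropped because its confidence falls below 0.6.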

[0069] In step S2, the decision tree modeling uses the CART algorithm and the C5.0 algorithm to construct a diagnostic model. The four diagnostic methods are used as input and the TCM syndrome classification is used as output. The model is evaluated by accuracy, recall and AUC. The optimal diagnostic model is selected for deployment and application. The efficacy prediction uses the random forest algorithm. The number of decision trees is set to 100 and the maximum number of features is the square root of the number of input features. The HAMD score improvement rate is used as the prediction target to output the prognostic evaluation result.
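The evaluation and selection step can be sketched in a few lines of Python. This is only an illustration under several assumptions: the metrics are computed for one syndrome type against all others (a binary view of a multi-class problem), the "optimal" model is taken to be the one with the highest AUC, and the square root is rounded down. None of these choices is stated in the description, and all function names are hypothetical.

```python
import math


def accuracy(y_true, y_pred):
    """Share of cases whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Share of truly positive cases that the model labels positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)


def auc(y_true, scores, positive=1):
    """Probability that a random positive case outscores a random negative
    case, counting ties as one half (pairwise definition of AUC)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def select_best(results, metric="auc"):
    """Pick the model name with the highest value of the chosen metric."""
    return max(results, key=lambda name: results[name][metric])


def max_features(n_features):
    """Square root of the number of input features, rounded down (assumed)."""
    return max(1, math.isqrt(n_features))


# Toy labels and scores for illustration only, not results from the patent.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
metrics = {
    "accuracy": accuracy(y_true, y_pred),
    "recall": recall(y_true, y_pred),
    "auc": auc(y_true, scores),
}
```

With 16 input features, `max_features(16)` gives 4 candidate features per split for each of the 100 trees.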

[0070] The above description is merely an embodiment of the present invention and does not limit the patent scope of the present invention. Any equivalent structural or procedural transformations made based on the content of the present invention specification and drawings, or direct or indirect applications in other related technical fields, are similarly included within the patent protection scope of the present invention.

Claims

1. A traditional Chinese medicine auxiliary diagnosis and treatment system for post-stroke depression based on data mining, characterized in that it comprises: a data acquisition module, used to acquire data on the stroke type and disease stage, four diagnostic methods, TCM syndrome classification, treatment prescriptions and efficacy evaluation indicators of post-stroke depression patients, forming a standardized clinical dataset; a data preprocessing module, used to handle missing values, remove outliers, standardize data, and align traditional Chinese medicine terms in the collected data to build a medical record database of post-stroke depression; a data mining and analysis module, which includes a frequency statistics analysis submodule, an association rule mining submodule, a clustering analysis submodule, a decision tree modeling submodule, and a machine learning prediction submodule, and which is configured to perform multidimensional data mining on the medical record database to generate frequency statistics results, association rules, clustered core prescriptions, diagnostic decision tree models, and efficacy prediction models; a TCM diagnosis and treatment knowledge generation module, used to integrate the output results of the data mining and analysis module to generate a TCM auxiliary diagnosis and treatment rule library for different syndromes of post-stroke depression; a human-computer interaction-assisted diagnosis and treatment module, which automatically matches and recommends syndrome differentiation, Chinese medicine prescriptions, and reference prognostic information based on the current patient's four-diagnosis information, and receives feedback from clinicians on adoption or modification; and a dynamic knowledge base update module, used to collect the feedback information and feed it back to the TCM diagnosis and treatment knowledge generation module to achieve adaptive updates of the knowledge base.

2. The TCM-assisted diagnosis and treatment system for post-stroke depression based on data mining as described in claim 1, characterized in that the association rule mining submodule uses the Apriori algorithm and the improved mutual information method to mine the association rules of Chinese medicine compatibility in a hierarchical manner according to syndrome type, calculates support, confidence and lift, and generates core drug pairs and corner drug information under each syndrome type.

3. The TCM-assisted diagnosis and treatment system for post-stroke depression based on data mining as described in claim 1, characterized in that the clustering analysis submodule uses K-means clustering and hierarchical clustering algorithms to perform clustering analysis on high-frequency drugs, and determines the optimal number of clusters through the elbow method to form several core prescription clusters, in order to identify regular treatment strategies for the various types of post-stroke depression.

4. The TCM-assisted diagnosis and treatment system for post-stroke depression based on data mining as described in claim 1, characterized in that the decision tree modeling submodule uses the CART algorithm and the C5.0 algorithm to construct a decision tree diagnostic model with the four diagnostic methods as input feature variables and the TCM syndrome classification as the target variable; the model performance is evaluated by the accuracy and AUC indices, and the optimal model is selected and deployed to the human-computer interaction-assisted diagnosis and treatment module.

5. The TCM-assisted diagnosis and treatment system for post-stroke depression based on data mining as described in claim 1, characterized in that the machine learning prediction submodule uses the random forest algorithm to construct an efficacy prediction model with patient baseline information and intervention measures as input features and the HAMD score improvement rate as the prediction target variable; the number of decision trees is set to 100, and the maximum number of features is the square root of the number of input features.

6. The TCM-assisted diagnosis and treatment system for post-stroke depression based on data mining as described in claim 1, characterized in that the TCM auxiliary diagnosis and treatment rule library includes information on syndrome-symptom mapping, syndrome-treatment mapping, syndrome-core prescription compatibility, medication dosage reference, and efficacy prediction reference.

7. The TCM-assisted diagnosis and treatment system for post-stroke depression based on data mining as described in claim 1, characterized in that the dynamic knowledge base update module is configured to periodically collect feedback data on recommendations adopted or modified by clinicians, review and confirm the feedback data, and inject the confirmed new rules into the TCM diagnosis and treatment knowledge generation module through a reasoning mechanism.

8. A data mining-based TCM-assisted diagnosis and treatment method for post-stroke depression, applied to the data mining-based TCM-assisted diagnosis and treatment system for post-stroke depression as described in any one of claims 1-7, characterized in that it includes the following steps: S1. Collect clinical data of patients with post-stroke depression and preprocess them to construct a medical record database of post-stroke depression. S2. Analyze the medical record database from multiple dimensions using five data mining methods: frequency statistical analysis, association rule mining, cluster analysis, decision tree modeling, and random forest prediction, generating frequency statistical results, association rules, clustered core prescriptions, diagnostic decision tree models, and efficacy prediction models. S3. Integrate the data mining and analysis results to construct a TCM auxiliary diagnosis and treatment rule library for different syndrome types of post-stroke depression. S4. Receive the current patient's four-diagnosis information, automatically match and recommend syndrome types through the decision tree diagnostic model, provide prognostic judgment through the efficacy prediction model, and generate recommended prescriptions and compatibility analysis through association rules and clustered prescriptions. S5. Collect feedback from clinicians on the adoption or modification of the recommendations, and feed the confirmed new rules back to the TCM auxiliary diagnosis and treatment rule library to achieve dynamic updates of the knowledge base.

9. The method for TCM-assisted diagnosis and treatment of post-stroke depression based on data mining as described in claim 8, characterized in that, in step S2, the association rule mining includes: setting the minimum support to 0.1, the minimum confidence to 0.6, and the minimum lift to 1.2, using the Apriori algorithm to perform hierarchical association rule mining, and outputting the strong association rules under each syndrome type.

10. The method for TCM-assisted diagnosis and treatment of post-stroke depression based on data mining as described in claim 8, characterized in that, in step S2, the decision tree modeling uses the CART algorithm and the C5.0 algorithm to construct a diagnostic model, with the four diagnostic methods as input and the TCM syndrome classification as output; the model is evaluated by accuracy, recall and AUC, and the optimal diagnostic model is selected for deployment and application. The efficacy prediction uses the random forest algorithm, with the number of decision trees set to 100 and the maximum number of features set to the square root of the number of input features; the HAMD score improvement rate is used as the prediction target to output the prognostic evaluation result.