Data integration method for external control of traditional Chinese medicine clinical trials
By integrating external control data of traditional Chinese medicine using propensity score analysis and a Bayesian power prior framework, the difficulties in recruiting control groups for both traditional Chinese and Western medicine in randomized controlled trials (RCTs) and the challenges in data integration were solved. This approach achieved efficient and scientific data integration and bias control, thereby improving the research efficiency and evidence generation capabilities of clinical trials of traditional Chinese medicine.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- 北京市中医药研究所
- Filing Date
- 2026-03-04
- Publication Date
- 2026-06-12
AI Technical Summary
RCTs in the field of traditional Chinese medicine face difficulties in recruiting patients for the Western medicine control group. Existing methods cannot effectively integrate heterogeneous datasets, leading to unreliable efficacy estimates. Furthermore, the lack of a scientific data integration framework makes it impossible to control for biases caused by population differences and unmeasured confounding factors.
Using propensity score and Bayesian power prior framework, we constructed a method for integrating external control data of traditional Chinese medicine by obtaining common key baseline covariates, matching external data with RCT trial groups, dynamically adjusting the weight of external data, using likelihood function for data fusion, and controlling bias through robustness analysis.
It enables the use of real-world data resources while controlling the risk of bias, improves research efficiency, and provides a scientific research paradigm for external controlled trials. It is applicable to research scenarios where patient recruitment is difficult, and it improves the efficiency of generating and translating clinical evidence in traditional Chinese medicine.
Figure CN122201818A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to data processing methods, and more particularly to a data integration method for external controls in traditional Chinese medicine clinical trials.
Background Technology
[0002] Randomized controlled trials (RCTs) are widely recognized as the gold standard for evaluating the efficacy of interventions because they can balance known and unknown confounding factors across groups through randomization. However, patients choosing traditional Chinese medicine (TCM) treatment often exhibit selective preferences for TCM therapies, leading to difficulties in recruiting and enrolling Western medicine control group patients in TCM RCTs. An external controlled trial (ECT), a study design that compares patients receiving treatment in the experimental group with patients outside the trial, expands the control group sample size by incorporating external data, providing a new approach to solving these problems. In recent years, with the rise of real-world studies (RWS) and the construction of medical big data platforms, high-quality data resources available for external controls in TCM have become increasingly abundant, providing a data foundation for the application of ECT.
[0003] While ECT offers potential advantages, its core methodological challenge lies in effectively integrating heterogeneous datasets from different sources with varying design features and data quality. External control groups typically differ from the current experimental group in terms of population characteristics, treatment criteria, and data collection. If these differences are not adequately corrected, they can directly introduce bias, leading to unreliable efficacy estimates and even erroneous conclusions.
[0004] Currently, ECT in traditional Chinese medicine (TCM) lacks a mature and widely accepted methodological framework. Existing methods mostly employ traditional multivariate regression analysis, failing to consider the "similarity" between external data and current experimental data and its impact on the degree of information borrowing, and also failing to adequately handle unknown or unmeasured confounding factors. Furthermore, the evaluation of TCM efficacy involves multidimensional and complex variables such as syndrome scores and constitution patterns, further increasing the difficulty of data integration. Specifically: at the data assessment level, there is no framework for evaluating "exchangeability"; at the data integration level, simple statistical methods cannot dynamically adjust the degree of borrowing from external data based on data similarity; and at the bias control level, it is difficult to quantify and correct the impact of unmeasured confounding factors. Therefore, how to scientifically assess the applicability of external data, how to quantitatively balance key prognostic variables, how to dynamically adjust the degree of borrowing from external information based on data similarity, and how to assess the potential impact of unmeasured confounding factors are the key technical problems that need to be solved in constructing a TCM ECT data integration method.
Summary of the Invention
[0005] To address the technical problems existing in the prior art, the present invention aims to provide a data integration method for external controls in clinical trials of traditional Chinese medicine, which can improve research efficiency by making full use of existing real-world data resources while minimizing the risk of bias caused by population differences and unmeasured confounding.
[0006] To achieve the above-mentioned objectives, this invention provides a data integration method for external controls in traditional Chinese medicine clinical trials, comprising the following steps:
[0007] S1. Acquire external data and target experimental data, wherein the target experimental data includes RCT experimental group data and RCT internal control group data; and select multiple common key baseline covariates from the external data and RCT experimental group data;
[0008] S2. When the preset exchangeability condition is met, a subset of variables is selected from the multiple key baseline covariates to obtain multiple key prognostic variables; and based on the key prognostic variables, the propensity scores of the external control group and the RCT trial group are calculated respectively.
[0009] S3. Within a preset propensity score difference threshold, select the external sample from the external data that is closest to the propensity score of the RCT experimental group; calculate the standardized mean difference between the matched external sample and the RCT experimental group on each key prognostic variable; and after the standardized mean difference meets the preset equilibrium requirement, integrate the matched external sample data as external control group data with the RCT internal control group data to form a mixed control group.
[0010] S4. Based on the Bayesian power prior framework, the power parameter is determined by the distribution similarity between the mixed control group and the RCT experimental group; the likelihood function of the external control group data is dynamically discounted using the power parameter; finally, based on the discounted external control group data and the target experimental data, the estimated value of the treatment effect between the RCT experimental group and the mixed control group is calculated.
[0011] S5. Evaluate the robustness of the estimated treatment effect;
[0012] If the robustness meets the preset requirements, the data integration is completed and the estimated value of the therapeutic effect is output.
[0013] If the robustness does not meet the preset requirements, then adjust at least one of the preset exchangeability condition, the preset propensity score difference threshold, or the power parameter.
[0014] According to one technical solution of the present invention, step S1 specifically includes:
[0015] Missing values in both the external data and the target experimental data are imputed to obtain complete analysis data;
[0016] From the complete analysis data, multiple common key baseline covariates that are related to the preset disease and present in both the RCT trial group data and the external data are identified.
[0017] According to one technical solution of the present invention, the preset exchangeability condition specifically includes:
[0018] If the key baseline covariates of the target experimental data and the external data are not statistically different, and the required items of the Pocock principle are satisfied although not all items are satisfied, then the target experimental data and the external data are deemed to be conditionally exchangeable.
[0019] According to one technical solution of the present invention, the key baseline covariates show no statistically significant differences, specifically:
[0020] The balance test values of the target experimental data and external data on each key baseline covariate are all greater than the preset balance test threshold.
[0021] According to a technical solution of the present invention, satisfying the preset equalization requirement in step S3 includes:
[0022] Calculate the standardized mean difference between the matched external sample and the RCT experimental group on each of the key prognostic variables; if at least one of the standardized mean differences is greater than or equal to a preset equilibrium threshold, then reduce the preset propensity score difference threshold according to a preset adjustment value and re-execute the matching operation until the external sample data that makes the standardized mean difference of each of the key prognostic variables less than the preset equilibrium threshold is obtained.
[0023] If the preset propensity score difference threshold reaches the preset minimum value, and there is still at least one standardized mean difference greater than or equal to the preset equilibrium threshold, then the key prognostic variables are re-determined, and the propensity score is recalculated and the matching operation is performed based on the re-determined key prognostic variables, until the external sample data that makes the standardized mean difference of each key prognostic variable less than the preset equilibrium threshold is obtained.
[0024] If, after redetermining and rematching the key prognostic variables, at least one of the standardized mean differences is still greater than or equal to the preset equilibrium threshold, the corresponding key prognostic variable or the corresponding sample subgroup is removed, so as to retain the external sample data that makes the standardized mean difference of each key prognostic variable less than the preset equilibrium threshold.
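The balance check described above can be illustrated with a minimal sketch. It assumes continuous covariates, uses the common pooled-standard-deviation form of the standardized mean difference, and takes 0.2 as the equilibrium threshold (the value used in the embodiment); the function names and the row layout (one dict per subject) are hypothetical and not part of the claimed method.

```python
import math
import statistics


def standardized_mean_difference(treated, control):
    """Absolute SMD using the pooled standard deviation of the two groups."""
    mean_diff = statistics.fmean(treated) - statistics.fmean(control)
    pooled_sd = math.sqrt(
        (statistics.variance(treated) + statistics.variance(control)) / 2
    )
    if pooled_sd == 0:
        # Both groups are constant: balanced only if the means coincide.
        return 0.0 if mean_diff == 0 else math.inf
    return abs(mean_diff) / pooled_sd


def unbalanced_variables(trial_rows, external_rows, variables, threshold=0.2):
    """Return {variable: SMD} for every variable whose SMD is >= threshold."""
    flagged = {}
    for name in variables:
        smd = standardized_mean_difference(
            [row[name] for row in trial_rows],
            [row[name] for row in external_rows],
        )
        if smd >= threshold:
            flagged[name] = smd
    return flagged


# Hypothetical toy data: age is balanced, baseline systolic pressure is not.
trial = [{"age": 60, "sbp": 150}, {"age": 62, "sbp": 155}, {"age": 64, "sbp": 160}]
external = [{"age": 61, "sbp": 140}, {"age": 62, "sbp": 142}, {"age": 63, "sbp": 144}]
flagged = unbalanced_variables(trial, external, ["age", "sbp"])
```

An empty result would correspond to the case where every key prognostic variable meets the equilibrium requirement; a non-empty result would trigger the threshold reduction or variable re-selection described above.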
[0025] According to one technical solution of the present invention, step S4 specifically includes:
[0026] Based on the key prognostic variables of the mixed control group and the RCT experimental group, a difference index reflecting the similarity of the distribution of the key prognostic variables between the two groups was calculated using a preset distribution similarity calculation method.
[0027] The power parameter is calculated using the following formula:
[0028] Where α is a power parameter, ranging from 0 to 1; k is an adjustment coefficient, ranging from 0.1 to 10; and p is the difference index.
[0029] The likelihood function of the external control group data is discounted by raising it to the power parameter, and the prior distribution of the effect size of the RCT internal control group is constructed:
[0030] $$\pi(\theta_c \mid D_0, \alpha) \propto L(\theta_c \mid D_0)^{\alpha}\, \pi_0(\theta_c)$$
[0031] where $\theta_c$ represents the effect value of the RCT internal control group; $D_0$ is the data from the external control group; $\pi(\theta_c \mid D_0, \alpha)$ is the prior function of the RCT internal control group effect value $\theta_c$ based on the external control group data $D_0$; $L(\theta_c \mid D_0)$ is the likelihood function of the external control group data $D_0$; and $\pi_0(\theta_c)$ is the initial non-informative prior;
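[0032] By adjusting the power parameter, the prior distribution is fused with the likelihood function of the target experimental data to calculate the posterior distribution of the treatment effect:
$$\pi(\delta \mid D, D_0, \alpha) \propto L(\delta \mid D)\, \pi(\theta_c \mid D_0, \alpha)$$
[0033] where $\delta$ is the therapeutic effect between the RCT trial group and the mixed control group; $\pi(\delta \mid D, D_0, \alpha)$ denotes the posterior distribution of the therapeutic effect; $D$ represents the target trial data; $L(\delta \mid D)$ is the likelihood function of the target experimental data; and $\pi(\theta_c \mid D_0, \alpha)$ is the power prior of the RCT internal control group effect value $\theta_c$ constructed from the external control group data $D_0$ with power parameter $\alpha$;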
[0034] The estimated value of the treatment effect is extracted based on the posterior distribution.
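To make the discounting concrete, the following is a minimal sketch of a power prior in the simplest conjugate setting. It assumes a normally distributed continuous outcome with a known common standard deviation `sigma` and flat initial priors, none of which is specified in the source; under those assumptions, raising the external likelihood to the power α is equivalent to counting the external sample as α × n external subjects. The power parameter `alpha` is taken as an input here, because the formula that maps the difference index p to α is not reproduced in this text. All function and argument names are hypothetical.

```python
import math


def power_prior_control_posterior(ext_mean, ext_n, int_mean, int_n, sigma, alpha):
    """Posterior mean and SD of the control-group mean.

    Assumes a normal outcome with known SD `sigma` and a flat initial prior,
    so the discounted external data contribute alpha * ext_n effective subjects.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    effective_n = alpha * ext_n + int_n
    if effective_n <= 0:
        raise ValueError("no control information available")
    post_mean = (alpha * ext_n * ext_mean + int_n * int_mean) / effective_n
    post_sd = sigma / math.sqrt(effective_n)
    return post_mean, post_sd


def treatment_effect_posterior(trt_mean, trt_n, ctrl_post_mean, ctrl_post_sd, sigma):
    """Posterior mean and SD of (treatment mean - control mean), flat prior on treatment."""
    effect = trt_mean - ctrl_post_mean
    effect_sd = math.sqrt(sigma ** 2 / trt_n + ctrl_post_sd ** 2)
    return effect, effect_sd


# Hypothetical numbers: change in diastolic blood pressure (mmHg).
no_borrowing = power_prior_control_posterior(-6.0, 100, -4.0, 100, 8.0, alpha=0.0)
full_borrowing = power_prior_control_posterior(-6.0, 100, -4.0, 100, 8.0, alpha=1.0)
effect, effect_sd = treatment_effect_posterior(-10.0, 100, *full_borrowing, 8.0)
```

With α = 0 the external data are ignored and the control estimate equals the internal control mean; with α = 1 the external and internal data are pooled. Intermediate values interpolate between the two, which is the behavior the dynamic borrowing step relies on.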
[0035] According to one technical solution of the present invention, step S5 specifically includes:
[0036] Based on the estimated therapeutic effect, a robustness value characterizing the influence of unmeasured confounding factors on the effect estimate is obtained using the E-value calculation formula; and using the robustness value as a reference, a probabilistic bias analysis of the external controlled trial of traditional Chinese medicine is conducted using Monte Carlo simulation.
[0037] If the robustness value is greater than or equal to a preset robustness threshold, and the fluctuation of the simulated treatment effect estimate obtained through the probabilistic bias analysis is within a preset allowable range, then the treatment effect estimate is determined to be robust, and the treatment effect estimate is output.
[0038] If the robustness value is less than the preset robustness threshold, or if the fluctuation of the simulated treatment effect estimate obtained through the probabilistic bias analysis exceeds the preset allowable range, then the treatment effect estimate is determined to be not robust, and one of the following backtracking adjustment operations is performed:
[0039] Going back to step S2, re-screen the key baseline covariates or redetermine the key prognostic variables;
[0040] Going back to step S3, reduce the preset propensity score difference threshold and re-execute the matching operation;
[0041] Going back to step S4, readjust the adjustment coefficient k and calculate the power parameter.
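The robustness check can be sketched as follows. The E-value function uses the standard formula E = RR + sqrt(RR × (RR − 1)) for a risk ratio RR ≥ 1 (taking the reciprocal first when RR < 1); how a continuous effect such as a blood pressure change is converted to a risk ratio is not stated in the source and is left outside this sketch. The Monte Carlo part is only an illustration: it assumes the unmeasured-confounding bias is drawn from a normal distribution centred at zero with a user-chosen standard deviation `bias_sd`, which is an assumption rather than the patent's simulation design. All names and thresholds are hypothetical.

```python
import math
import random


def e_value(risk_ratio):
    """E-value for a point estimate expressed as a risk ratio."""
    if risk_ratio <= 0:
        raise ValueError("risk ratio must be positive")
    rr = risk_ratio if risk_ratio >= 1 else 1.0 / risk_ratio
    return rr + math.sqrt(rr * (rr - 1.0))


def simulate_bias_adjusted_interval(effect, effect_sd, bias_sd, n_draws=5000, seed=0):
    """95% simulation interval of the effect after subtracting a random bias term.

    Assumption: bias ~ Normal(0, bias_sd); effect ~ Normal(effect, effect_sd).
    """
    rng = random.Random(seed)
    draws = sorted(
        rng.gauss(effect, effect_sd) - rng.gauss(0.0, bias_sd)
        for _ in range(n_draws)
    )
    return draws[int(0.025 * n_draws)], draws[int(0.975 * n_draws) - 1]


def is_robust(e_val, e_threshold, interval, allowed_range):
    """Robust if the E-value passes and the simulated interval stays in range."""
    low, high = interval
    return e_val >= e_threshold and allowed_range[0] <= low and high <= allowed_range[1]


interval = simulate_bias_adjusted_interval(effect=-5.0, effect_sd=1.0, bias_sd=0.5)
robust = is_robust(e_value(2.0), 2.0, interval, allowed_range=(-9.0, -1.0))
```

If `robust` were false, the procedure would return to one of the backtracking operations listed above rather than output the estimate.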
[0042] According to one technical solution of the present invention, the preset disease is hypertension;
[0043] The common variables include: age, sex, body mass index, comorbidities, baseline systolic blood pressure, baseline diastolic blood pressure, baseline fasting blood glucose, baseline triglycerides, baseline total cholesterol, baseline hemoglobin, TCM syndrome score, complete blood count, urinalysis, liver function, and kidney function.
[0044] Key prognostic variables include at least one of the following: age, sex, body mass index, comorbidities, baseline systolic blood pressure, baseline diastolic blood pressure, baseline fasting blood glucose, baseline total cholesterol, and baseline hemoglobin.
[0045] The present invention also provides an electronic device, comprising: one or more processors, one or more memories, and one or more computer programs; wherein the processor is connected to the memory, and the one or more computer programs are stored in the memory. When the electronic device is running, the processor executes the one or more computer programs stored in the memory to cause the electronic device to perform the above-described data integration method.
[0046] The present invention also provides a computer-readable storage medium, characterized in that it is used to store computer instructions, which, when executed by a processor, implement the above-described data integration method.
[0047] This invention provides a data integration method for external controls in traditional Chinese medicine clinical trials, which has the following beneficial effects:
[0048] By integrating propensity score-based covariate balancing with a dynamic borrowing mechanism based on Bayesian power priors, a structured and reproducible methodology for integrating external controlled data was constructed. Only patient demographics, medical history, laboratory indicators, TCM syndrome characteristics, and outcome indicators need to be input to complete data exchangeability assessment, key prognostic variable balancing, automatic adjustment of external data weights, and unmeasured confounding sensitivity analysis. This method can improve research efficiency by fully utilizing existing real-world data resources while minimizing the risk of bias caused by population differences and unmeasured confounding. As a standardized methodological tool, it provides an efficient, scientific, and TCM-compatible paradigm for evaluating the clinical efficacy of TCM, particularly suitable for research scenarios where patient recruitment is difficult or traditional RCT implementation is limited, thus accelerating the generation and translation of TCM clinical evidence.
Description of the Drawings
[0049] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the embodiments will be briefly described below. Obviously, the drawings described below are merely some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
[0050] Figure 1 is a flowchart of a data integration method for external controls in a traditional Chinese medicine clinical trial according to an embodiment of the present invention;
[0051] Figure 2 is a flowchart of propensity score matching for covariate balancing in the data integration method according to an embodiment of the present invention;
[0052] Figure 3 shows a comparison of covariate characteristics before and after balancing in the data integration method according to an embodiment of the present invention;
[0053] Figure 4 is a flowchart of dynamic borrowing implemented with the Bayesian power prior in the data integration method according to an embodiment of the present invention;
[0054] Figure 5 shows a comparison of the estimated therapeutic effect values under different borrowing weights in the data integration method according to an embodiment of the present invention, taking the diastolic blood pressure decrease value (DBPchange) as an example; the horizontal axis represents the value of the power parameter α, and the vertical axis represents the estimated therapeutic effect value and its 95% confidence interval.
Detailed Implementation
[0055] The description of the embodiments in this specification should be taken in conjunction with the accompanying drawings, which should form part of the complete specification. In the drawings, the shape or thickness of the embodiments may be exaggerated and may be indicated in a simplified or convenient manner. Furthermore, parts of the various structures in the drawings will be described separately; it is worth noting that elements not shown in the figures or not described in words are in a form known to those skilled in the art.
[0056] The descriptions of the embodiments herein, including any references to directions and orientations, are for ease of description only and should not be construed as limiting the scope of the invention. The following description of preferred embodiments involves combinations of features, which may exist independently or in combination; the invention is not particularly limited to the preferred embodiments. The scope of the invention is defined by the claims.
Specific Implementation Method 1 (as shown in Figures 1-5)
[0058] This embodiment of a data integration method for external controls in traditional Chinese medicine clinical trials is characterized by the following steps:
[0059] S1. Acquire external data and target experimental data, wherein the target experimental data includes RCT experimental group data and RCT internal control group data; and select multiple common key baseline covariates from the external data and RCT experimental group data;
[0060] S2. When the preset exchangeability condition is met, a subset of variables is selected from the multiple key baseline covariates to obtain multiple key prognostic variables; and based on the key prognostic variables, the propensity scores of the external control group and the RCT trial group are calculated respectively.
[0061] S3. Within a preset propensity score difference threshold, select the external sample from the external data that is closest to the propensity score of the RCT experimental group; calculate the standardized mean difference between the matched external sample and the RCT experimental group on each key prognostic variable; and after the standardized mean difference meets the preset equilibrium requirement, integrate the matched external sample data as external control group data with the RCT internal control group data to form a mixed control group.
[0062] S4. Based on the Bayesian power prior framework, the power parameter is determined by the distribution similarity between the mixed control group and the RCT experimental group; the likelihood function of the external control group data is dynamically discounted using the power parameter; finally, based on the discounted external control group data and the target experimental data, the estimated value of the treatment effect between the RCT experimental group and the mixed control group is calculated.
[0063] S5. Evaluate the robustness of the estimated treatment effect;
[0064] If the robustness meets the preset requirements, the data integration is completed and the estimated value of the therapeutic effect is output.
[0065] If the robustness does not meet the preset requirements, then adjust at least one of the preset exchangeability condition, the preset propensity score difference threshold, or the power parameter.
[0066] In this embodiment, the data integration method is a data integration method based on covariate balancing and dynamic borrowing of external controls from TCM clinical trials. The overall implementation process is as follows:
[0067] Step 1: Determine the type of experimental design and the data source for the external control group;
[0068] Specifically, first determine the type of trial design, adopting a supplementary randomized controlled trial; and determine the data source for the external control group, which is historical data from completed similar clinical trials or real-world data that meets the preset quality standards.
[0069] Step 2: Based on the Pocock principle, assess the interchangeability of data between the external control group and the target trial, and identify potential sources of bias;
[0070] Specifically, based on Pocock's six basic principles (consistent inclusion criteria, consistent interventions, consistent treatment outcomes, coverage of key variables, comparable study timelines, and homogeneous study environments), the exchangeability of the external data and the target trial data is pre-assessed, and the balance of key covariates between the external and target trial data is tested. If the external and target trial data are assessed as conditionally exchangeable, Pocock's principles can also be used to identify potential sources of bias (selection bias, active treatment bias, information bias, confounding bias, measurement bias, and unmeasured confounding bias), providing a reference for the subsequent sensitivity analysis in step S5 (Step 5 of this process).
[0071] Step 3: Based on the differences in key prognostic variables between the RCT trial group and the external control group, propensity score matching was used to balance patient (sample) characteristics and establish a mixed control group;
[0072] Step 4: Using the Bayesian power prior method, the borrowing degree of the external control group data is dynamically adjusted based on the similarity between the external control group data and the experimental data, so as to achieve automatic weight reduction as internal experimental information accumulates and complete data fusion.
[0073] Step 5: Perform sensitivity analysis on unmeasured confounding factors using methods such as quantitative bias analysis to verify the robustness of the results;
[0074] Step 6: Output the therapeutic effect estimate and its robustness assessment, thereby establishing a data integration method for external controls in traditional Chinese medicine clinical trials.
Specific Implementation Method 2
[0076] This embodiment is a further explanation of embodiment one. In this embodiment, step S1 specifically includes:
[0077] Missing values in both the external data and the target experimental data are imputed to obtain complete analysis data;
[0078] From the complete analysis data, multiple common key baseline covariates that are related to the preset disease and present in both the RCT trial group data and the external data are identified.
[0079] In this embodiment, after acquiring external data and target experimental data, before conducting analysis, multiple imputation processing is performed on variables with missing values in the two sets of data. The Markov Chain Monte Carlo (MCMC) algorithm is used to generate multiple imputation datasets, and the optimal set is selected for subsequent analysis to obtain the completed analysis data.
[0080] After handling missing values, common variables were extracted from the RCT trial group and the external control group, including demographic characteristics, comorbidity information, disease baseline characteristics, TCM syndrome characteristics, combined treatment information, and laboratory test data. Then, combined with the characteristics of the target disease, the clinical experience of experts in related fields, and authoritative literature reports, key baseline covariates that have a significant impact on disease outcomes were finally screened to form confounding variables that fit the current trial, which were used to construct the propensity score model and subsequently obtain the propensity score.
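A propensity score model of this kind is commonly a logistic regression fitted by maximum likelihood. The sketch below is a minimal, dependency-free illustration that maximizes the log-likelihood by plain gradient ascent; it assumes the covariates have already been standardized, that the label is 1 for the RCT trial group and 0 for external subjects, and that the learning rate and iteration count are adequate for the toy data. In practice a statistical package would normally be used; the function names here are hypothetical.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def _linear(weights, x):
    """Intercept plus weighted sum of covariates."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], x))


def fit_propensity_model(rows, labels, learning_rate=0.1, n_iter=2000):
    """Fit logistic regression weights (intercept first) by gradient ascent."""
    n = len(rows)
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(n_iter):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = y - _sigmoid(_linear(weights, x))
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w + learning_rate * g / n for w, g in zip(weights, grad)]
    return weights


def propensity_scores(rows, weights):
    """Estimated probability of belonging to the RCT trial group."""
    return [_sigmoid(_linear(weights, x)) for x in rows]


# Hypothetical standardized covariate (one column) and group labels.
rows = [[-1.0], [-0.5], [0.0], [0.5], [1.0], [-0.8], [0.3], [0.9]]
labels = [0, 0, 1, 0, 1, 0, 1, 1]
weights = fit_propensity_model(rows, labels)
scores = propensity_scores(rows, weights)
```

The resulting `scores` are the quantities that the matching step compares between RCT trial subjects and external subjects.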
[0081] The propensity score model described above was constructed using logistic regression. The model incorporates the key baseline covariates obtained from the screening process and calculates the conditional probability (i.e., propensity score) of each study subject being assigned to the experimental group using the maximum likelihood estimation method. This probability is then used for subsequent matching and confounding correction.
Specific Implementation Method 3
[0083] This embodiment is a further explanation of embodiment one or two. In this embodiment, the preset exchangeability condition is specifically as follows:
[0084] If there are no statistically significant differences in the key baseline covariates between the target experimental data and the external data, and the required items of the Pocock principle are met although not all items are met, then the target experimental data and the external data are deemed to be conditionally exchangeable.
[0085] In this embodiment, if all six Pocock principles are met and the balance test value of every key baseline covariate is greater than the preset balance test threshold, the external data and the target experimental data are determined to be completely exchangeable and can be directly included for subsequent data integration, i.e., step S5.
[0086] If the core items of the Pocock principle, namely the mandatory items (consistent inclusion criteria, interventions, and treatment outcomes), are met, there are only slight differences in the study timeline or study environment, and there are no statistically significant differences in key covariates, then the external data and the target trial data are conditionally exchangeable and can be included in subsequent steps after propensity score matching correction.
[0087] If any of the core Pocock items (consistent inclusion criteria, interventions, and treatment outcomes) is not met, or if multiple key covariates show significant differences that cannot be corrected by matching, then the external data is not exchangeable with the target trial data, and the external data source is deemed unusable. A new, usable external data source needs to be selected.
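The three-way decision above can be written as a small rule, sketched here under stated assumptions. The dictionary keys naming the six Pocock items are hypothetical labels, the balance test is represented only by its P values, and the sketch simplifies one point: any covariate with P at or below the threshold is treated as making the data not exchangeable, whereas the text reserves that outcome for multiple covariates whose differences cannot be corrected by matching.

```python
# Hypothetical labels for the Pocock items described in the text.
CORE_ITEMS = ("inclusion_criteria", "intervention", "treatment_outcome")
OTHER_ITEMS = ("key_variable_coverage", "study_timeline", "study_environment")


def classify_exchangeability(pocock, balance_p_values, p_threshold=0.05):
    """Classify external data as fully, conditionally, or not exchangeable.

    pocock: dict mapping each item name to True (met) or False (not met).
    balance_p_values: dict mapping each key baseline covariate to its P value.
    """
    core_met = all(pocock[item] for item in CORE_ITEMS)
    all_met = core_met and all(pocock[item] for item in OTHER_ITEMS)
    balanced = all(p > p_threshold for p in balance_p_values.values())

    if all_met and balanced:
        return "fully exchangeable"
    if core_met and balanced:
        return "conditionally exchangeable"
    return "not exchangeable"


# Hypothetical assessment: core items met, study timeline differs slightly.
assessment = {
    "inclusion_criteria": True,
    "intervention": True,
    "treatment_outcome": True,
    "key_variable_coverage": True,
    "study_timeline": False,
    "study_environment": True,
}
p_values = {"age": 0.41, "baseline_sbp": 0.12}
status = classify_exchangeability(assessment, p_values)
```

A "conditionally exchangeable" result corresponds to proceeding with propensity score matching before the data are integrated.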
[0088] If the external data and the target trial data are deemed exchangeable after evaluation, potential sources of bias (selection bias, active treatment bias, information bias, confounding bias, measurement bias, and unmeasured confounding bias) can be further identified based on the Pocock principle mentioned above. This can provide a reference direction for the subsequent sensitivity analysis in step S5. For example, if selection bias exists, it suggests conducting sensitivity analysis around inclusion / exclusion criteria and baseline population characteristics; if active treatment bias exists, the impact of treatment differences can be assessed through subgroup analysis based on dimensions such as intervention intensity, dosing regimen, and concomitant medications; if information bias exists, scenario analysis can be conducted on the definition of outcome indicators, judgment time points, and measurement tools to examine whether conclusions are consistent under different efficacy criteria; if confounding bias exists, the stability of results using different covariate schemes, weighted models, and variable screening strategies can be verified; if measurement bias exists, differences in detection methods, follow-up periods, and assessment time points can be considered, and the interference caused by measurement variation can be assessed through time-series stratification or extreme value interval replacement; if unmeasured confounding bias exists, quantitative bias analysis and other methods can be used to quantify the degree of influence of unmeasured confounding on effect size.
Specific Implementation Method 4
[0090] This implementation method is a further explanation of implementation method three. In this implementation method, there are no statistically significant differences in the key baseline covariates, specifically:
[0091] The balance test values of the target experimental data and external data on each key baseline covariate are all greater than the preset balance test threshold.
[0092] In this embodiment, a balance test value of P > 0.05 (the preset balance test threshold) for each key baseline covariate is used as the quantitative criterion indicating that there is no statistically significant difference in the distribution of the key baseline covariates.
Specific Implementation Method 5
[0094] This embodiment is a further explanation of embodiment four. In this embodiment, satisfying the preset equalization requirement in step S3 includes:
[0095] Calculate the standardized mean difference between the matched external samples and the RCT experimental group on each key prognostic variable; if there is at least one standardized mean difference greater than or equal to the preset equilibrium threshold, then reduce the preset propensity score difference threshold according to the preset adjustment value and re-execute the matching operation until external sample data that makes the standardized mean difference of each key prognostic variable less than the preset equilibrium threshold is obtained.
[0096] If the preset propensity score difference threshold reaches the preset minimum value, and there is still at least one standardized mean difference greater than or equal to the preset equilibrium threshold, then the key prognostic variables are redefined, and the propensity score is recalculated and the matching operation is performed based on the redefined key prognostic variables, until external sample data is obtained in which the standardized mean difference of each key prognostic variable is less than the preset equilibrium threshold.
[0097] If, after redefining and rematching the key prognostic variables, at least one standardized mean difference is still greater than or equal to the preset equilibrium threshold, the corresponding key prognostic variable or the corresponding sample subgroup is removed, so as to retain external sample data that makes the standardized mean difference of each key prognostic variable less than the preset equilibrium threshold.
[0098] In this embodiment, specifically, a nearest-neighbor propensity score matching method based on a preset caliper value (the preset propensity score difference threshold) is adopted. Within the caliper range, the external subjects with the smallest total propensity score distance are matched to the RCT experimental group, balancing patient characteristics. Matching uses a 1:1 strategy without replacement, with the caliper value set to 0.2. The final matched external data (external control group data) and the RCT internal control group data form the mixed control group.
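The matching described in this paragraph can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it uses a greedy pairing (closest admissible pairs first) as a stand-in for minimizing the total propensity score distance, it applies the caliper directly to the raw propensity score difference, and the function name `caliper_match` is hypothetical.

```python
from typing import Dict, List


def caliper_match(trial_ps: List[float],
                  external_ps: List[float],
                  caliper: float = 0.2) -> Dict[int, int]:
    """Greedy 1:1 nearest-neighbor matching without replacement.

    Returns {trial_index: external_index} for every trial subject that found
    an unused external subject whose propensity score lies within the caliper.
    """
    # Enumerate all admissible (distance, trial, external) pairs, closest first.
    pairs = sorted(
        (abs(t - e), i, j)
        for i, t in enumerate(trial_ps)
        for j, e in enumerate(external_ps)
        if abs(t - e) <= caliper
    )
    matched: Dict[int, int] = {}
    used_external = set()
    for _, i, j in pairs:
        # Each trial subject and each external subject is used at most once.
        if i not in matched and j not in used_external:
            matched[i] = j
            used_external.add(j)
    return matched


if __name__ == "__main__":
    # Illustrative scores only; the third trial subject has no neighbor in range.
    print(caliper_match([0.30, 0.50, 0.90], [0.32, 0.58, 0.45, 0.10]))
```

Trial subjects left unmatched are simply absent from the returned mapping, so the caller can decide whether to drop them or widen the caliper.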
[0099] As shown in Figures 2-3, if after matching the standardized mean difference of every key prognostic variable is lower than the preset equilibrium threshold of 0.2, the covariates are well balanced and the subsequent analysis can proceed directly.
[0100] If, after matching, the standardized mean difference of one or more key prognostic variables is still higher than 0.2, the caliper value can be tightened to 0.1 or 0.05 and matching performed again. If balance is still not achieved after tightening, the key prognostic variables can be re-selected from the common key baseline covariates, or at least one further variable can be selected from the common key baseline covariates and added to the original set of key prognostic variables; the propensity score model is then rebuilt and matching is performed again. If balance still cannot be achieved after these two corrections, the unbalanced key prognostic variable or the corresponding anomalous sample subgroup can be removed, and the balanced, valid samples retained to form the final mixed control group.
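The balance check and caliper tightening described above can be sketched as follows. The pooled-standard-deviation form of the standardized mean difference is a common convention assumed here, since the text does not state which form is used. The `rematch` callable and all function names are hypothetical; `rematch(caliper)` is expected to return the matched trial and external covariate values for a given caliper.

```python
from math import sqrt
from statistics import mean, variance
from typing import Callable, Dict, Optional, Sequence, Set, Tuple

Covariates = Dict[str, Sequence[float]]


def smd(treated: Sequence[float], control: Sequence[float]) -> float:
    """Absolute standardized mean difference using a pooled SD (assumed form)."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    diff = abs(mean(treated) - mean(control))
    if pooled_sd == 0:
        return 0.0 if diff == 0 else float("inf")
    return diff / pooled_sd


def unbalanced(trial: Covariates, external: Covariates,
               threshold: float = 0.2) -> Set[str]:
    """Names of key prognostic variables whose SMD is still >= threshold."""
    return {v for v in trial if smd(trial[v], external[v]) >= threshold}


def tighten_until_balanced(
    rematch: Callable[[float], Tuple[Covariates, Covariates]],
    calipers: Sequence[float] = (0.2, 0.1, 0.05),
    threshold: float = 0.2,
) -> Tuple[Optional[float], Set[str]]:
    """Try each caliper in turn; stop at the first one that balances everything.

    Returns (caliper, set()) on success, or (None, still_unbalanced) when even
    the tightest caliper fails, signalling that variables must be re-selected.
    """
    bad: Set[str] = set()
    for caliper in calipers:
        trial, external = rematch(caliper)
        bad = unbalanced(trial, external, threshold)
        if not bad:
            return caliper, bad
    return None, bad
```

A return value of `(None, {...})` corresponds to the fallback in the text: rebuild the propensity score model with a revised variable set, and only then consider removing the listed variables or the anomalous subgroup.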
[0101] Screening or addition criteria: based on the standardized mean differences and the post-matching balance test, covariates whose standardized mean difference remains higher than 0.2 after the caliper value has been adjusted, and for which inter-group balance is difficult to achieve through matching, can be removed. Covariates with a standardized mean difference lower than 0.2 that are closely related to disease prognosis and treatment grouping and have clear clinical significance can be retained or added.
Specific Implementation Method Six
[0103] This embodiment is a further explanation of one of embodiments one, two, four, and five. In this embodiment, step S4 specifically includes:
[0104] Based on the key prognostic variables of the mixed control group and the RCT experimental group, a difference index reflecting the distribution similarity of the key prognostic variables between the two groups is calculated using a preset distribution similarity calculation method.
[0105] The power parameter is calculated using the following formula:
[0106] Where α is a power parameter, ranging from 0 to 1; k is an adjustment coefficient, ranging from 0.1 to 10; and p is a difference index.
[0107] The power parameter is used to apply a power discount to the likelihood function of the external control group data, and a prior distribution of the effect size of the control group within the RCT is constructed:
[0108] $$\pi(\theta_c \mid D_e, \alpha) \propto L(\theta_c \mid D_e)^{\alpha}\, \pi_0(\theta_c)$$
[0109] where $\theta_c$ is the effect size of the control group within the RCT; $D_e$ is the external control group data; $\pi(\theta_c \mid D_e, \alpha)$ is the prior of $\theta_c$ based on the external control group data $D_e$; $L(\theta_c \mid D_e)$ is the likelihood function of the external control group data $D_e$; and $\pi_0(\theta_c)$ is the initial non-informative prior;
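[0110] The prior distribution, adjusted by the power parameter, is fused with the likelihood function of the target experimental data to calculate the posterior distribution of the treatment effect:
$$\pi(\theta \mid D, D_e, \alpha) \propto L(\theta \mid D)\, \pi(\theta_c \mid D_e, \alpha)$$
[0111] where $\theta$ is the treatment effect between the RCT trial group and the mixed control group; $\pi(\theta \mid D, D_e, \alpha)$ is the posterior distribution of the treatment effect; $D$ is the target trial data; $L(\theta \mid D)$ is the likelihood function of the target experimental data; and $\pi(\theta_c \mid D_e, \alpha)$ is the power prior constructed above;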
[0112] The estimated value of the treatment effect is extracted from the posterior distribution.
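For intuition, the power prior update has a closed form in the simplest conjugate case. The sketch below assumes a continuous outcome that is normally distributed with a known common standard deviation `sigma`, a flat initial prior, a fixed power parameter `alpha`, and an independent flat prior on the trial arm mean. None of these modelling choices is stated in the source, and all names are hypothetical.

```python
from math import sqrt
from typing import NamedTuple


class Normal(NamedTuple):
    mean: float
    sd: float


def control_posterior(ext_mean: float, n_ext: int,
                      ctrl_mean: float, n_ctrl: int,
                      sigma: float, alpha: float) -> Normal:
    """Posterior of the control mean theta_c under a power prior.

    Raising the external likelihood to the power alpha is equivalent, in this
    normal model, to counting the external group as alpha * n_ext subjects.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    prec_ext = alpha * n_ext / sigma ** 2    # discounted external information
    prec_ctrl = n_ctrl / sigma ** 2          # internal RCT control information
    prec = prec_ext + prec_ctrl
    post_mean = (prec_ext * ext_mean + prec_ctrl * ctrl_mean) / prec
    return Normal(post_mean, sqrt(1.0 / prec))


def treatment_effect_posterior(trt_mean: float, n_trt: int,
                               control: Normal, sigma: float) -> Normal:
    """Posterior of theta = (trial arm mean) - (mixed control mean)."""
    var_trt = sigma ** 2 / n_trt             # flat prior on the trial arm mean
    return Normal(trt_mean - control.mean, sqrt(var_trt + control.sd ** 2))


if __name__ == "__main__":
    # Illustrative numbers only, not data from the source.
    ctrl = control_posterior(ext_mean=10.0, n_ext=100,
                             ctrl_mean=12.0, n_ctrl=50,
                             sigma=10.0, alpha=0.5)
    print(ctrl, treatment_effect_posterior(16.0, 100, ctrl, sigma=10.0))
```

With `alpha = 0` the external data are ignored entirely and the control posterior reduces to the internal RCT control group alone; with `alpha = 1` the two control sources are pooled at full weight.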
[0113] In this embodiment, as shown in Figure 4, a Bayesian power prior framework is used. The prior distribution of the effect size $\theta_c$ of the control group within the RCT is first constructed from the external control group data:
[0114] $$\pi(\theta_c \mid D_e, \alpha) \propto L(\theta_c \mid D_e)^{\alpha}\, \pi_0(\theta_c)$$
[0115] where $\pi_0(\theta_c)$ is the initial non-informative prior; $L(\theta_c \mid D_e)$ is the likelihood function of the external control group data $D_e$; and $\alpha$ is the power parameter, which quantifies the similarity between the external control group and the RCT experimental group and the control group within the RCT. It applies a power discount to the likelihood function of the external control group data, adjusting the degree to which the current clinical trial borrows external control group data, and ranges from 0 to 1;
[0116] The value of $\alpha$ is determined from the distribution difference index p (the result of the distribution similarity test between the external control group data and the RCT experimental group data on the main prognostic indicators) and is calculated dynamically through a continuous function. As shown in Figure 5, the smaller the value of p, the higher the similarity between groups, the larger $\alpha$, and the greater the degree of borrowing;
[0117] The aforementioned pre-defined method for calculating distribution similarity is as follows: based on the key prognostic variables of the mixed control group and the RCT experimental group, the standardized mean difference of the covariates of each group is calculated, and the mean of the absolute values of the standardized mean differences of all key prognostic variables is used as the distribution difference index p.
[0118] If the distribution difference index p is greater than or equal to the preset difference threshold (e.g., 1), then there is a serious gap between the external control group data and the target experimental data. The robustness of data integration can be ensured by triggering backtracking adjustment operations such as step S5.
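The distribution difference index p and the threshold check can be sketched as follows. The pooled-standard-deviation form of the standardized mean difference is an assumption, the threshold of 1 is the example value given in the text, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import Dict, Sequence

Covariates = Dict[str, Sequence[float]]


def smd(a: Sequence[float], b: Sequence[float]) -> float:
    """Absolute standardized mean difference using a pooled SD (assumed form)."""
    pooled_sd = sqrt((variance(a) + variance(b)) / 2)
    diff = abs(mean(a) - mean(b))
    if pooled_sd == 0:
        return 0.0 if diff == 0 else float("inf")
    return diff / pooled_sd


def difference_index(trial: Covariates, mixed_control: Covariates) -> float:
    """p = mean of the absolute SMDs over all key prognostic variables."""
    return mean(smd(trial[v], mixed_control[v]) for v in trial)


def serious_gap(p: float, threshold: float = 1.0) -> bool:
    """True when p reaches the preset difference threshold (example value: 1)."""
    return p >= threshold


if __name__ == "__main__":
    # Illustrative values only.
    trial = {"age": [50, 60, 70], "sbp": [150, 160, 170]}
    mixed = {"age": [60, 70, 80], "sbp": [150, 160, 170]}
    p = difference_index(trial, mixed)
    print(p, serious_gap(p))
```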
[0119] Furthermore, according to the formula for the power parameter, $\alpha$ depends not only on the distribution difference index p but also on the cumulative data volume n of the target trial, with which it is negatively correlated. That is, the adjustment coefficient k is set as an increasing function of n. When an increase in n raises k, the calculated $\alpha$ decreases for the same distribution difference index p. This ensures that in the later stages of the trial, as data accumulate, the weight given to borrowed external data automatically decreases, thereby reducing interference.
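The qualitative behaviour described here can be illustrated with a stand-in function. The actual formula for the power parameter is not reproduced in this text, so the form `exp(-k * p)` below is purely an assumption chosen because it stays within (0, 1] and decreases as either p or k grows. The linear, clipped mapping from n to k and the reference size `n_ref` are likewise hypothetical.

```python
from math import exp


def adjustment_coefficient(n: int, n_ref: float = 20.0,
                           k_min: float = 0.1, k_max: float = 10.0) -> float:
    """Hypothetical increasing map from accumulated sample size n to k.

    The result is clipped to the stated range of k, 0.1 to 10.
    """
    return min(k_max, max(k_min, n / n_ref))


def power_parameter(p: float, k: float) -> float:
    """ASSUMED stand-in alpha = exp(-k * p); not the formula of the source."""
    if p < 0 or k <= 0:
        raise ValueError("p must be non-negative and k positive")
    return exp(-k * p)


if __name__ == "__main__":
    p = 0.15  # illustrative difference index
    for n in (10, 50, 200):
        k = adjustment_coefficient(n)
        print(n, k, power_parameter(p, k))
```

Under these assumptions, the borrowing weight shrinks as the target trial accumulates data, which is the behaviour the paragraph describes; any other decreasing function of k and p bounded by 0 and 1 would show the same pattern.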
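[0120] Using the above power prior, the posterior distribution of $\theta$ is expressed as follows:
$$\pi(\theta \mid D, D_e, \alpha) \propto L(\theta \mid D)\, \pi(\theta_c \mid D_e, \alpha)$$
The data are thereby fused to obtain an estimate of the therapeutic effect.
Detailed Implementation Method Seven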
[0122] This embodiment is a further explanation of embodiment six. In this embodiment, step S5 specifically includes:
[0123] Based on the estimated therapeutic effect, a robustness value characterizing the influence of unmeasured confounding factors on the effect estimate is obtained using the E-value calculation formula; and, using the robustness value as a reference, a probability bias analysis of external controlled trials of traditional Chinese medicine is conducted using Monte Carlo simulation.
[0124] If the robustness value is greater than or equal to the preset robustness threshold, and the fluctuation of the simulated treatment effect estimate obtained through probability bias analysis is within the preset allowable range, then the treatment effect estimate is determined to be robust, and the treatment effect estimate is output.
[0125] If the robustness value is less than the preset robustness threshold, or if the fluctuation of the simulated treatment effect estimate obtained through probability bias analysis exceeds the preset allowable range, the treatment effect estimate is determined to be not robust, and one of the following backtracking adjustment operations is performed:
[0126] Go back to step S2 and re-screen key baseline covariates or redetermine key prognostic variables;
[0127] Go back to step S3, reduce the preset propensity score difference threshold, and re-execute the matching operation;
[0128] Going back to step S4, readjust the adjustment coefficient k and calculate the power parameter.
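The decision rule of step S5 can be sketched as follows. The E-value formula used is the standard one for an effect expressed as a risk ratio, so the sketch assumes the treatment effect has been put on that scale. The threshold of 1.5 follows the value this embodiment mentions as usual; the allowable fluctuation of 0.15 and the function names are hypothetical.

```python
from math import sqrt


def e_value(rr: float) -> float:
    """E-value for a risk ratio: RR + sqrt(RR * (RR - 1)).

    Protective effects (RR < 1) are inverted first so the formula applies.
    """
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1.0 / rr
    return rr + sqrt(rr * (rr - 1.0))


def s5_decision(robustness_value: float, fluctuation: float,
                robustness_threshold: float = 1.5,
                allowed_fluctuation: float = 0.15) -> str:
    """Return the next action: output the estimate or backtrack."""
    if robustness_value >= robustness_threshold and fluctuation <= allowed_fluctuation:
        return "output estimate"
    # Any one of the three backtracking operations may be chosen.
    return "backtrack to S2, S3, or S4"


if __name__ == "__main__":
    e = e_value(2.0)  # illustrative risk ratio
    print(round(e, 3), s5_decision(e, fluctuation=0.10))
```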
[0129] In this embodiment, taking unmeasured confounding bias as an example, quantitative bias analysis is used for the sensitivity analysis: the E-value assesses the impact of unmeasured confounding factors on the effect estimate, and is combined with Monte Carlo simulation to establish a probability bias analysis method for external controlled trials of traditional Chinese medicine.
[0130] If the E-value (robustness value) is greater than or equal to the preset robustness threshold (usually set to 1.5) and the probability bias analysis shows that the fluctuation of the effect estimate is within the allowable range, the impact of unmeasured confounding bias on the effect estimate is acceptable and the current data fusion results are robust. The estimated treatment effect and its 95% confidence interval can then be output together with a robustness evaluation, completing the data integration of external controls in traditional Chinese medicine clinical trials.
[0131] The therapeutic effect of the external controlled trial of traditional Chinese medicine is reported with its 95% confidence interval and the width of that interval. Robustness is evaluated by assessing the fluctuation range and stability of the estimate under different bias analysis scenarios. The final output includes the estimated therapeutic effect and the corresponding robustness evaluation indicators.
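A Monte Carlo probability bias analysis for unmeasured confounding might look like the sketch below. It is an illustration under stated assumptions rather than the method of the source: the effect is taken to be a risk ratio greater than 1, the two confounder associations are drawn from uniform distributions between 1 and a chosen maximum, and each draw applies the standard bounding factor RR_EU * RR_UD / (RR_EU + RR_UD - 1) as a worst-case correction. All names are hypothetical.

```python
import random
from typing import List


def bounding_factor(rr_eu: float, rr_ud: float) -> float:
    """Maximum bias from a confounder with exposure association rr_eu and
    outcome association rr_ud (both >= 1)."""
    return rr_eu * rr_ud / (rr_eu + rr_ud - 1.0)


def simulate_adjusted_rr(rr_obs: float, max_confounder_rr: float,
                         n_draws: int = 10_000, seed: int = 0) -> List[float]:
    """Draw confounder strengths and return bias-adjusted risk ratios."""
    rng = random.Random(seed)  # seeded for reproducibility
    adjusted = []
    for _ in range(n_draws):
        rr_eu = rng.uniform(1.0, max_confounder_rr)
        rr_ud = rng.uniform(1.0, max_confounder_rr)
        adjusted.append(rr_obs / bounding_factor(rr_eu, rr_ud))
    return adjusted


def max_relative_shift(rr_obs: float, adjusted: List[float]) -> float:
    """Largest relative change of the estimate across all simulated scenarios."""
    return max(abs(rr_obs - a) / rr_obs for a in adjusted)


if __name__ == "__main__":
    draws = simulate_adjusted_rr(rr_obs=2.0, max_confounder_rr=2.5)
    print(max_relative_shift(2.0, draws))
```

The fluctuation summarized by `max_relative_shift` is one possible quantity to compare against the preset allowable range; a percentile interval of the adjusted values would be a less conservative alternative.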
[0132] If the E-value is less than the preset robustness threshold, or the results show that the fluctuation of the effect estimate exceeds the allowable range, the unmeasured confounding bias has a significant impact. The analysis must then return to one of the following steps: S2 (screening more homogeneous external data, i.e., more homogeneous key baseline covariates), S3 (optimizing the propensity score matching variables), or S4 (reducing the weight of heterogeneous data).
Detailed Implementation Method Eight
[0134] This embodiment is a further explanation of embodiment two. In this embodiment, the preset disease is hypertension.
[0135] Common variables include: age, sex, body mass index, comorbidities, baseline systolic blood pressure, baseline diastolic blood pressure, baseline fasting blood glucose, baseline triglycerides, baseline total cholesterol, baseline hemoglobin, TCM syndrome score, complete blood count, urinalysis, liver function, and kidney function;
[0136] Key prognostic variables include at least one of the following: age, sex, body mass index, comorbidities, baseline systolic blood pressure, baseline diastolic blood pressure, baseline fasting blood glucose, baseline total cholesterol, and baseline hemoglobin.
[0137] In this embodiment, the clinical trial of traditional Chinese medicine is an external controlled trial to evaluate the efficacy of a traditional Chinese medicine preparation of the liver-calming and yang-subduing type in the treatment of essential hypertension;
[0138] The trial adopts a randomized controlled design supplemented with external controls;
[0139] The patients were grade 2 essential hypertension patients, and the RCT trial group consisted of patients who received this traditional Chinese medicine treatment;
[0140] The data source for the external control group was the electronic medical record data of the past 10 years from the HIS database (Hospital Information System) of a tertiary hospital. The data inclusion criteria included: diagnosed with grade 2 essential hypertension; age 18-75 years; complete TCM syndrome score record; and at least one follow-up blood pressure data.
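The inclusion criteria for the external control records can be expressed as a simple filter. The record structure and field names below are hypothetical, since the layout of the HIS export is not described in the source.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EmrRecord:
    """Hypothetical shape of one patient record exported from the HIS database."""
    diagnosis: str
    age: int
    tcm_syndrome_score: Optional[float]  # None when the score is not recorded
    followup_bp_count: int               # number of follow-up blood pressure readings


def is_eligible(r: EmrRecord) -> bool:
    """Apply the four inclusion criteria listed for the external control group."""
    return (
        r.diagnosis == "grade 2 essential hypertension"
        and 18 <= r.age <= 75
        and r.tcm_syndrome_score is not None
        and r.followup_bp_count >= 1
    )


def select_external_candidates(records: List[EmrRecord]) -> List[EmrRecord]:
    """Keep only the records that satisfy every inclusion criterion."""
    return [r for r in records if is_eligible(r)]
```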
[0141] Based on the characteristics of hypertension, consultation with cardiovascular specialists in both traditional Chinese and Western medicine, and a review of the literature, key baseline covariates that may influence medication and disease outcomes were identified, including:
[0142] Demographic characteristics: age, sex, body mass index, etc.;
[0143] Comorbidities: Diabetes, hyperlipidemia, coronary heart disease, etc.
[0144] Baseline characteristics of the disease: baseline systolic blood pressure, baseline diastolic blood pressure, baseline fasting blood glucose, baseline triglycerides, baseline total cholesterol, and baseline hemoglobin;
[0145] Traditional Chinese Medicine (TCM) syndrome characteristics: TCM syndrome score;
[0146] Concomitant treatment information: Western medicine treatment and traditional Chinese medicine treatment, the latter including Chinese herbal prescriptions, prepared Chinese medicine prescriptions, and non-drug techniques, covering treatment names, dosages, dosage forms, methods of administration, and usage;
[0147] Laboratory test data: complete blood count, urinalysis, liver function, kidney function, etc.
[0148] Ten key baseline covariates were identified as key prognostic variables for propensity score matching, including: sex, presence of other diseases, age, body mass index, baseline systolic blood pressure, baseline diastolic blood pressure, baseline fasting blood glucose, baseline triglycerides, baseline total cholesterol, and baseline hemoglobin.
[0149] This method evaluates the performance of data integration methods for external controlled trials of traditional Chinese medicine from multiple dimensions, including assessing the matching quality of the internal and external control groups through standardized mean difference and balance test plots, evaluating the goodness of fit of the model through posterior prediction test and information criterion, assessing the robustness of the results through sensitivity analysis and E-value analysis, and evaluating the clinical applicability through effect estimation accuracy and result interpretability.
[0150] An external controlled trial of traditional Chinese medicine (TCM) for the treatment of essential hypertension was established using the aforementioned data integration method. Regarding matching quality, the standardized mean difference after matching was below the threshold of 0.2 for all key prognostic variables, and the balance test plot showed that the covariate distributions of the experimental group and the external control group largely overlapped after matching. Regarding model fit, the posterior prediction test showed a mean deviation of 0.12 between the observed data and the model's predicted data, and the information criterion assessment gave a WAIC value of -152.3 and a DIC value of -148.7. Regarding robustness, E-value analysis showed that the relative risk associated with unmeasured confounding factors would need to exceed 2.8 for the treatment effect to become statistically insignificant, and probability bias analysis showed that, with that relative risk not exceeding 2.5, the bias in the effect estimate was less than 15%. Regarding clinical applicability, the 95% CI width was 3.6 mmHg for the decrease in systolic blood pressure and 2.6 mmHg for the decrease in diastolic blood pressure, indicating high precision in effect estimation. Overall, the data integration method performs well in terms of matching quality, model fit, robustness of results, and clinical applicability.
[0151] Furthermore, it should be noted that the present invention can be provided as a method, apparatus, or computer program product. Therefore, embodiments of the present invention can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Moreover, embodiments of the present invention can take the form of a computer program product implemented on one or more computer-usable storage media containing computer-usable program code.
[0152] Embodiments of the present invention are described with reference to flowchart illustrations and / or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram.
[0153] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means, which implement the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram. These computer program instructions may also be loaded onto a computer or other programmable data processing terminal equipment to cause a series of operational steps to be performed on the computer or other programmable terminal equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable terminal equipment provide steps for implementing the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram.
[0154] It should also be noted that, in this document, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or terminal device that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or terminal device that includes said element.
[0155] Finally, it should be noted that the above description represents a preferred embodiment of the present invention. It should be pointed out that although preferred embodiments have been described, those skilled in the art, once they understand the basic inventive concept of the present invention, can make various improvements and modifications without departing from the principles described herein. These improvements and modifications should also be considered within the scope of protection of the present invention. Therefore, the appended claims are intended to be interpreted as including both the preferred embodiments and all changes and modifications falling within the scope of the embodiments of the present invention.
Claims
1. A data integration method for external controls in traditional Chinese medicine clinical trials, characterized in that the steps are as follows: S1. Acquire external data and target experimental data, wherein the target experimental data includes RCT experimental group data and RCT internal control group data; and select multiple common key baseline covariates from the external data and the RCT experimental group data; S2. When the preset conditional exchangeability condition is met, select a subset of variables from the multiple key baseline covariates to obtain multiple key prognostic variables; and, based on the key prognostic variables, calculate the propensity scores of the external control group and the RCT trial group respectively; S3. Within a preset propensity score difference threshold, select from the external data the external sample whose propensity score is closest to that of the RCT experimental group; calculate the standardized mean difference between the matched external sample and the RCT experimental group on each key prognostic variable; and, after the standardized mean difference meets the preset equilibrium requirement, integrate the matched external sample data as external control group data with the RCT internal control group data to form a mixed control group; S4. Based on the Bayesian power prior framework, determine the power parameter from the distribution similarity between the mixed control group and the RCT experimental group; dynamically discount the likelihood function of the external control group data using the power parameter; and, based on the discounted external control group data and the target experimental data, calculate the estimated value of the treatment effect between the RCT experimental group and the mixed control group; S5. Evaluate the robustness of the estimated treatment effect; if the robustness meets the preset requirements, complete the data integration and output the estimated value of the treatment effect; if the robustness does not meet the preset requirements, adjust at least one of the preset conditional exchangeability condition, the preset propensity score difference threshold, or the power parameter.
2. The data integration method for external controls in traditional Chinese medicine clinical trials according to claim 1, characterized in that step S1 specifically includes: imputing missing values in both the external data and the target experimental data to obtain complete analysis data; and identifying, from the complete analysis data, multiple common key baseline covariates that are related to the preset disease and are present in both the RCT trial group data and the external data.
3. The data integration method for external controls in traditional Chinese medicine clinical trials according to claim 1 or 2, characterized in that the preset conditional exchangeability condition is specifically as follows: if the key baseline covariates of the target experimental data and the external data are not statistically different, and the required items of the Pocock principle are satisfied even though not all items are satisfied, then the target experimental data and the external data are deemed to be conditionally interchangeable.
4. The data integration method for external controls in traditional Chinese medicine clinical trials according to claim 3, characterized in that, The key baseline covariates showed no statistically significant differences, specifically: The balance test values of the target experimental data and external data on each key baseline covariate are all greater than the preset balance test threshold.
5. The data integration method for external controls in traditional Chinese medicine clinical trials according to claim 4, characterized in that satisfying the preset equilibrium requirement in step S3 includes: calculating the standardized mean difference between the matched external sample and the RCT experimental group on each of the key prognostic variables; if at least one of the standardized mean differences is greater than or equal to a preset equilibrium threshold, reducing the preset propensity score difference threshold according to a preset adjustment value and re-executing the matching operation until external sample data is obtained that makes the standardized mean difference of each of the key prognostic variables less than the preset equilibrium threshold; if the preset propensity score difference threshold reaches the preset minimum value and at least one standardized mean difference is still greater than or equal to the preset equilibrium threshold, re-determining the key prognostic variables, recalculating the propensity score, and performing the matching operation based on the re-determined key prognostic variables, until external sample data is obtained that makes the standardized mean difference of each key prognostic variable less than the preset equilibrium threshold; and if, after re-determining and rematching the key prognostic variables, at least one of the standardized mean differences is still greater than or equal to the preset equilibrium threshold, removing the corresponding key prognostic variable or the corresponding sample subgroup, so as to retain the external sample data that makes the standardized mean difference of each key prognostic variable less than the preset equilibrium threshold.
6. The data integration method for external controls in traditional Chinese medicine clinical trials according to any one of claims 1, 2, 4, and 5, characterized in that step S4 specifically includes: based on the key prognostic variables of the mixed control group and the RCT experimental group, calculating a difference index reflecting the similarity of the distribution of the key prognostic variables between the two groups using a preset distribution similarity calculation method; calculating the power parameter using the following formula: where α is a power parameter with an initial value ranging from 0 to 1; k is an adjustment coefficient with a value ranging from 0.1 to 10; and p is the difference index; applying a power discount to the likelihood function of the external control group data using the power parameter, and constructing the prior distribution of the effect size of the internal control group in the RCT: $\pi(\theta_c \mid D_e, \alpha) \propto L(\theta_c \mid D_e)^{\alpha}\, \pi_0(\theta_c)$, where $\theta_c$ is the effect size of the control group within the RCT; $D_e$ is the external control group data; $\pi(\theta_c \mid D_e, \alpha)$ is the prior of $\theta_c$ based on the external control group data $D_e$; $L(\theta_c \mid D_e)$ is the likelihood function of the external control group data $D_e$; and $\pi_0(\theta_c)$ is the initial non-informative prior; fusing the prior distribution, adjusted by the power parameter, with the likelihood function of the target experimental data to calculate the posterior distribution of the treatment effect: $\pi(\theta \mid D, D_e, \alpha) \propto L(\theta \mid D)\, \pi(\theta_c \mid D_e, \alpha)$, where $\theta$ is the treatment effect between the RCT trial group and the mixed control group; $\pi(\theta \mid D, D_e, \alpha)$ is the posterior distribution of the treatment effect; $D$ is the target trial data; and $L(\theta \mid D)$ is the likelihood function of the target experimental data; and extracting the estimated value of the treatment effect from the posterior distribution.
7. The data integration method for external controls in traditional Chinese medicine clinical trials according to claim 6, characterized in that step S5 specifically includes: based on the estimated therapeutic effect, obtaining a robustness value characterizing the influence of unmeasured confounding factors on the effect estimate using the E-value calculation formula; and, using the robustness value as a reference, conducting a probability bias analysis of the external controlled trial of traditional Chinese medicine using Monte Carlo simulation; if the robustness value is greater than or equal to a preset robustness threshold, and the fluctuation of the simulated treatment effect estimate obtained through the probability bias analysis is within a preset allowable range, determining the treatment effect estimate to be robust and outputting it; if the robustness value is less than the preset robustness threshold, or the fluctuation of the simulated treatment effect estimate obtained through the probability bias analysis exceeds the preset allowable range, determining the treatment effect estimate to be not robust and performing one of the following backtracking adjustment operations: going back to step S2 and re-screening the key baseline covariates or re-determining the key prognostic variables; going back to step S3, reducing the preset propensity score difference threshold, and re-executing the matching operation; or going back to step S4, readjusting the adjustment coefficient k, and recalculating the power parameter.
8. The data integration method for external controls in traditional Chinese medicine clinical trials according to claim 2, characterized in that, The preset disease is hypertension; The common variables include: age, sex, body mass index, comorbidities, baseline systolic blood pressure, baseline diastolic blood pressure, baseline fasting blood glucose, baseline triglycerides, baseline total cholesterol, baseline hemoglobin, TCM syndrome score, complete blood count, urinalysis, liver function, and kidney function. Key prognostic variables include at least one of the following: age, sex, body mass index, comorbidities, baseline systolic blood pressure, baseline diastolic blood pressure, baseline fasting blood glucose, baseline total cholesterol, and baseline hemoglobin.
9. An electronic device, characterized by comprising: one or more processors, one or more memories, and one or more computer programs; wherein the processor is connected to the memory, the one or more computer programs are stored in the memory, and when the electronic device is running, the processor executes the one or more computer programs stored in the memory to cause the electronic device to perform the data integration method as described in any one of claims 1 to 8.
10. A computer-readable storage medium, characterized in that, Used to store computer instructions, which, when executed by a processor, implement the data integration method as described in any one of claims 1 to 8.