A method and device for intelligent recruitment of trial participants based on AI
By acquiring multi-source medical data of candidates, decomposing natural language inclusion and exclusion criteria with an AI model, and calculating the matching score and evidence quality score of candidates relative to each inclusion and exclusion screening criterion in combination with time intervals and priorities, the method addresses the problem of outdated data in the screening of trial participants and achieves accurate screening.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CENT SOUTH UNIV
- Filing Date
- 2026-04-02
- Publication Date
- 2026-06-23
AI Technical Summary
In the current technology for screening trial participants, data that is not timely affects the accuracy of the screening, resulting in inaccurate screening results.
By acquiring multi-source medical data of candidates, the matching score and evidence quality score of candidates relative to each inclusion and exclusion screening criterion are calculated. An AI model is used to decompose the natural language inclusion and exclusion criteria, and a weighted fusion score is calculated by combining time intervals and priorities to determine whether a candidate is a trial participant to be recruited.
Taking into account the timeliness of data, the system achieved precise screening of trial participants, reduced the impact of outdated data, and improved the accuracy and automation of screening.
Figure CN121964022B_ABST (abstract drawing, image not reproduced)
Description
Technical Field
[0001] This application relates to the fields of biomedical and clinical research informatics technology, and in particular to a method and apparatus for intelligent recruitment of trial participants based on AI.
Background Technology
[0002] In fields such as medical device testing, biological sample collection, and medical research, the selection of candidates is the prerequisite and foundation for the smooth progress of the entire research project. It directly affects the project's progress, the scientific validity of the data, the reliability of the test results, and the control of research costs.
[0003] Currently, candidate enrollment screening primarily matches candidates' clinical information against inclusion / exclusion criteria data and designates those who meet the criteria as potential candidates. However, this approach allows outdated data to affect participant selection. For example, patent application CN115662554A matches potential candidates' clinical information and multi-omics testing results separately against inclusion / exclusion criteria data, and designates those whose clinical information and multi-omics testing results both meet the criteria as target potential candidates. Because the age of the data is not considered, outdated data can influence candidate selection and reduce screening accuracy. Similarly, although patent application CN121260507A considers time, it directly filters structured data by generation time, which can also make the final scores less accurate.
Summary of the Invention
[0004] Therefore, it is necessary to provide a method and apparatus for intelligent recruitment of trial participants based on AI to address the above-mentioned technical problems. This method can reduce the impact of outdated data on the screening of trial participants and achieve accurate screening of trial participants.
[0005] A method for intelligent recruitment of trial participants based on AI, the method comprising:
[0006] S1. Obtain multi-source medical data of a candidate and multiple inclusion / exclusion screening criteria, wherein the multi-source medical data includes clinical data, and the inclusion / exclusion screening criteria are obtained by decomposing natural language inclusion / exclusion criteria using an AI model;
[0007] S2. Determine the time interval between the generation time of each item of clinical data and the current time;
[0008] S3. Based on the multi-source medical data, calculate the matching score of the candidate relative to each inclusion / exclusion screening criterion, and calculate the evidence quality score of the candidate relative to each inclusion / exclusion screening criterion according to the time interval of each item of clinical data;
[0009] S4. Calculate the weighted fusion score of the candidate relative to each inclusion / exclusion screening criterion based on the matching score and the evidence quality score of the candidate relative to that criterion;
[0010] S5. Based on the weighted fusion score of each inclusion / exclusion screening criterion, determine whether the candidate is a trial participant to be recruited.
[0011] In this application, multi-source medical data of candidates and multiple inclusion / exclusion screening criteria are acquired, and the time interval between the generation time of each item of clinical data and the current time is determined. Based on the multi-source medical data, the matching score of the candidate relative to each inclusion / exclusion screening criterion is calculated, and according to the time interval of each item of clinical data, the evidence quality score of the candidate relative to each inclusion / exclusion screening criterion is calculated. This reduces the impact of outdated data on candidate selection. At the same time, the time interval is quantified into a specific evidence quality score, so that the evidence quality score represents how far the value of the clinical data has decayed. As a result, the weighted fusion score obtained from the matching score and the evidence quality score of the candidate relative to each inclusion / exclusion screening criterion is more accurate. Thus, when judging whether a candidate is a trial participant to be recruited based on the weighted fusion score of each inclusion / exclusion screening criterion, trial participants can be accurately screened while taking the timeliness of the data into account.
[0012] In one embodiment, the inclusion / exclusion screening criteria include inclusion / exclusion factors and reference ranges for the inclusion / exclusion factors, and the inclusion / exclusion screening criteria are numerical inclusion / exclusion screening criteria. In step S3, calculating the matching score of the candidate relative to each inclusion / exclusion screening criterion includes:
[0013] The reference range of the inclusion / exclusion factor in the numerical inclusion / exclusion screening criterion is matched against first target data in the multi-source medical data, wherein the type of the inclusion / exclusion factor to which the reference range belongs is consistent with the type of the first target data, and the inclusion / exclusion factor is a feature used to determine whether a candidate meets the inclusion / exclusion screening criterion;
[0014] When the first target data is within the reference range, the matching score of the candidate relative to the numerical inclusion / exclusion screening criterion is determined to be a first preset value;
[0015] When the first target data is not within the reference range, the matching score of the candidate relative to the numerical inclusion / exclusion screening criterion is determined to be a second preset value, the first preset value being greater than the second preset value.
[0016] In this application, when the first target data is within the reference range, the matching score of the candidate relative to the numerical inclusion / exclusion screening criterion is determined to be the first preset value, and when the first target data is not within the reference range, the matching score is determined to be the second preset value. This achieves accurate scoring of the candidate relative to the numerical inclusion / exclusion screening criteria.
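The numerical matching rule above can be illustrated with a minimal sketch. The values 100 and 0 for the first and second preset values, the function name, and the `inclusive` flag for boundary handling are assumptions made for illustration; the text only requires that the first preset value be greater than the second.

```python
from typing import Optional

FIRST_PRESET = 100.0   # assumed score when the value lies within the reference range
SECOND_PRESET = 0.0    # assumed score otherwise (must be lower than FIRST_PRESET)


def numeric_match_score(value: float,
                        low: Optional[float],
                        high: Optional[float],
                        inclusive: bool = True) -> float:
    """Score a single numerical factor against its reference range.

    `low` or `high` may be None for a one-sided range (e.g. "age >= 18").
    `inclusive` controls whether a value exactly on a bound counts as
    within the range; this boundary choice is an assumption of the sketch.
    """
    if low is not None:
        if value < low or (not inclusive and value == low):
            return SECOND_PRESET
    if high is not None:
        if value > high or (not inclusive and value == high):
            return SECOND_PRESET
    return FIRST_PRESET


# Example: an age criterion with reference range 18 to 65.
age_score = numeric_match_score(45, 18, 65)
```

Keeping the two preset values as named constants makes the only stated constraint (first greater than second) easy to check in one place.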
[0017] In one embodiment, the inclusion / exclusion screening criteria are semantic inclusion / exclusion screening criteria, and in step S3, calculating the matching score of the candidate relative to each inclusion / exclusion screening criterion includes:
[0018] Use NLP to calculate the semantic similarity between the multi-source medical data and the semantic inclusion / exclusion screening criterion;
[0019] Based on the semantic similarity, calculate the matching score of the candidate relative to the semantic inclusion / exclusion screening criterion through a formula (not reproduced in this text), where $s_i$ represents the matching score of the candidate relative to semantic inclusion / exclusion screening criterion $i$, $\mathrm{Sim}$ is the semantic similarity, and the formula's two remaining parameters are a steepness control parameter and a semantic matching threshold.
[0020] In this application, the matching score of the candidate relative to the semantic inclusion / exclusion screening criteria is calculated from the semantic similarity through the above formula. This smooths the boundary between the semantic inclusion / exclusion screening criteria and the multi-source medical data and avoids false judgments caused by hard thresholds, so that the resulting matching score is free of such interference.
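Because the formula itself is not reproduced in this text, the following is only a sketch under a stated assumption: a logistic (sigmoid) curve is one common function that has exactly a steepness parameter and a threshold and that replaces a hard cut-off with a smooth transition. The logistic form, the 0 to 100 scale, and the default parameter values are all assumptions, not the formula of this application.

```python
import math


def semantic_match_score(sim: float,
                         steepness: float = 10.0,   # hypothetical steepness control parameter
                         threshold: float = 0.7,    # hypothetical semantic matching threshold
                         scale: float = 100.0) -> float:
    """Map a semantic similarity to a smooth matching score.

    Assumed form: scale / (1 + exp(-steepness * (sim - threshold))).
    The score is scale/2 at sim == threshold and approaches 0 or `scale`
    as sim moves away from the threshold.
    """
    return scale / (1.0 + math.exp(-steepness * (sim - threshold)))


# A similarity slightly below the threshold still receives partial credit
# instead of being cut to zero by a hard threshold.
near_miss = semantic_match_score(0.65)
```

A larger steepness value makes the curve approach a hard threshold, while a smaller one widens the transition band around the threshold.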
[0021] In one embodiment, the inclusion / exclusion screening criteria have priorities and inclusion / exclusion factors, and in step S3, calculating the evidence quality score of the candidate relative to each inclusion / exclusion screening criterion includes:
[0022] Determine the priority of the inclusion / exclusion screening criterion and the target type of its inclusion / exclusion factor;
[0023] From the clinical data, determine second target data whose type is the target type, and determine the time interval of the second target data;
[0024] Based on the time interval of the second target data and the priority of the inclusion / exclusion screening criterion, calculate the evidence quality score of the candidate relative to the inclusion / exclusion screening criterion through a formula (not reproduced in this text), where $q_i$ is the evidence quality score of the candidate relative to inclusion / exclusion screening criterion $i$, and the formula's two remaining parameters are the time interval and the attenuation coefficient corresponding to the priority of the inclusion / exclusion screening criterion.
[0025] In this application, the evidence quality score of the candidate relative to the inclusion / exclusion screening criteria is calculated from the time interval of the second target data and the priority of the inclusion / exclusion screening criterion through the above formula. This introduces a timeliness indicator, namely the time interval, so that older clinical data receives a lower evidence quality score, thereby reducing the interference of older clinical data with inclusion / exclusion screening.
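The decay formula is not reproduced in this text, so the sketch below assumes a plain exponential decay, which is one function that takes a time interval and an attenuation coefficient and yields lower scores for older data. The exponential form, the priority labels, and the per-day coefficients in `DECAY_BY_PRIORITY` are hypothetical illustration values.

```python
import math

# Hypothetical attenuation coefficients per day, keyed by criterion priority.
# The assumption that higher-priority criteria decay faster is illustrative only.
DECAY_BY_PRIORITY = {"P1": 0.010, "P2": 0.005, "P3": 0.002}


def evidence_quality_score(interval_days: float, priority: str) -> float:
    """Return a score in (0, 1] that decreases as the clinical data ages.

    Assumed form: exp(-attenuation * interval_days).
    """
    if interval_days < 0:
        raise ValueError("interval_days must be non-negative")
    attenuation = DECAY_BY_PRIORITY[priority]
    return math.exp(-attenuation * interval_days)


# The same 90-day-old laboratory result is discounted more for a P1
# criterion than for a P3 criterion under these assumed coefficients.
q_p1 = evidence_quality_score(90, "P1")
q_p3 = evidence_quality_score(90, "P3")
```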
[0026] In one embodiment, the inclusion / exclusion screening criteria have priorities, and the weighted fusion score in step S4 is calculated by the following formula:
[0027] (formula not reproduced in this text);
[0028] where $w_i$ is the weighted fusion score of the candidate relative to inclusion / exclusion screening criterion $i$, $s_i$ is the matching score of the candidate relative to inclusion / exclusion screening criterion $i$, $q_i$ is the evidence quality score of the candidate relative to inclusion / exclusion screening criterion $i$, and $k_i$ is the priority coefficient corresponding to the priority of inclusion / exclusion screening criterion $i$.
[0029] In this application, the weighted fusion score of the candidate relative to the inclusion / exclusion screening criteria is calculated by the above formula. This introduces exponential penalties or weights for higher-priority inclusion / exclusion screening criteria and strengthens their impact on the weighted fusion score. If a candidate's multi-source medical data violates a higher-priority inclusion / exclusion screening criterion, the weighted fusion score decreases, so the higher-priority inclusion / exclusion screening criteria are enforced mathematically.
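Since the fusion formula is not reproduced in this text, the following sketch shows only one possible way to combine the three named quantities so that the priority coefficient acts as an exponential penalty. The form $w_i = (s_i q_i)^{k_i}$, the normalization of both scores to the range 0 to 1, and the requirement that the coefficient be at least 1 are assumptions for illustration, not the formula of this application.

```python
def weighted_fusion_score(match_score: float,
                          evidence_quality: float,
                          priority_coeff: float) -> float:
    """Fuse matching score and evidence quality under an assumed power law.

    Assumed form: (match_score * evidence_quality) ** priority_coeff,
    with both inputs normalized to [0, 1] and priority_coeff >= 1.
    A larger coefficient (higher priority) pushes imperfect evidence
    further toward 0, while a perfect match with fresh data stays at 1.
    """
    if not (0.0 <= match_score <= 1.0 and 0.0 <= evidence_quality <= 1.0):
        raise ValueError("scores must be normalized to [0, 1]")
    if priority_coeff < 1.0:
        raise ValueError("priority_coeff is assumed to be >= 1")
    return (match_score * evidence_quality) ** priority_coeff


# The same partial match is penalized more under a high-priority criterion.
w_high_priority = weighted_fusion_score(0.8, 0.9, 3.0)
w_low_priority = weighted_fusion_score(0.8, 0.9, 1.0)
```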
[0030] In one embodiment, the inclusion / exclusion screening criteria include criteria with a first priority, criteria with a second priority, and criteria with a third priority, wherein the third priority is lower than the second priority and the first priority. Step S5 includes:
[0031] Calculate the fuzzy fusion score of the candidate relative to each inclusion / exclusion screening criterion with the third priority;
[0032] Based on the weighted fusion scores of the inclusion / exclusion screening criteria with the first priority, the weighted fusion scores of the inclusion / exclusion screening criteria with the second priority, and the fuzzy fusion scores of the inclusion / exclusion screening criteria with the third priority, calculate the raw weighted total score (RawSum) of the candidate through a formula (not reproduced in this text), where $i$ denotes an inclusion / exclusion screening criterion in the first condition set, which consists of the inclusion / exclusion screening criteria with the first priority (P1) and the second priority (P2); $w_i$ is the weighted fusion score of inclusion / exclusion screening criterion $i$ in the first condition set; $j$ denotes an inclusion / exclusion screening criterion with the third priority (P3); and $f_j$ is the fuzzy fusion score of the candidate relative to inclusion / exclusion screening criterion $j$ with the third priority.
[0033] Based on the candidate's raw weighted total score and the weighted fusion scores / matching scores of the inclusion / exclusion screening criteria, determine whether the candidate is a trial participant to be recruited.
[0034] In this application, the raw weighted total score of the candidate is calculated through the above formula from the weighted fusion scores of the first-priority and second-priority inclusion / exclusion screening criteria and the fuzzy fusion scores of the third-priority inclusion / exclusion screening criteria, so that an accurate raw weighted total score is obtained.
[0035] In one embodiment, determining whether the candidate is a trial participant to be recruited, based on the candidate's raw weighted total score and the weighted fusion scores / matching scores of the inclusion / exclusion screening criteria, includes:
[0036] Calculate the conflict intensity $c$ of the candidate through a formula (not reproduced in this text), where $C$ is the second condition set, consisting of the inclusion / exclusion screening criteria whose matching scores are greater than the upper threshold; $D$ is the third condition set, consisting of the inclusion / exclusion screening criteria whose matching scores are less than the lower threshold; $k_i$ is the priority coefficient corresponding to the priority of inclusion / exclusion screening criterion $i$ in the second condition set; $k_j$ is the priority coefficient corresponding to the priority of inclusion / exclusion screening criterion $j$ in the third condition set; and $k_m$ is the priority coefficient corresponding to the priority of inclusion / exclusion screening criterion $m$ in the third and fourth condition sets;
[0037] When the conflict intensity is greater than a third preset value, calculate the comprehensive score $M$ through a first formula (not reproduced in this text);
[0038] When the conflict intensity is between the third preset value and a fourth preset value, calculate the comprehensive score $M$ through a second formula (not reproduced in this text), where the third preset value is greater than the fourth preset value and $y$ is a preset multiple;
[0039] Based on the candidate's comprehensive score and the weighted fusion scores / matching scores of the inclusion / exclusion screening criteria, determine whether the candidate is a trial participant to be recruited.
[0040] In this application, the conflict intensity of the candidate is calculated and the comprehensive score is determined according to the conflict intensity. This allows a refined, quantitative assessment of whether a candidate is a trial participant to be recruited, and effectively identifies and penalizes logical contradictions or inconsistencies among the results for the inclusion / exclusion screening criteria, thereby improving the accuracy, safety, and automation of the screening process.
[0041] In one embodiment, calculating the fuzzy fusion score of the candidate relative to the inclusion / exclusion screening criterion with the third priority includes:
[0042] Obtain the first membership function, second membership function, and third membership function of each preset membership level, and determine the target semantic similarity and target evidence quality score of the candidate relative to the inclusion / exclusion screening criterion with the third priority. The first membership function is associated with the target semantic similarity, the second membership function is associated with the target evidence quality score, and the third membership function is associated with the fuzzy fusion score.
[0043] From the first membership function, at least one first function corresponding to the target semantic similarity is determined; from the second membership function, at least one second function corresponding to the target evidence quality score is determined; based on the target semantic similarity and the target evidence quality score, the target level of the fuzzy fusion score is determined; and from the third membership function, at least one third function corresponding to the target level is determined.
[0044] Based on the target semantic similarity, a first membership degree is calculated using each of the first functions; based on the target evidence quality score, a second membership degree is calculated using each of the second functions.
[0045] Based on the minimum value of the first membership degree and the second membership degree of each membership degree level, the corresponding third function is trimmed to obtain the fuzzy function of each membership degree level.
[0046] The aggregation function is obtained by superimposing the fuzzy functions of each membership level;
[0047] Calculate the centroid of the closed region enclosed by the aggregation function and the horizontal axis, and determine the fuzzy fusion score of the candidate relative to the inclusion / exclusion screening criterion with the third priority based on the coordinate value of the centroid.
[0048] In this application, the target semantic similarity and target evidence quality score of the candidate relative to the inclusion / exclusion screening criterion with the third priority are determined; at least one first function corresponding to the target semantic similarity is determined from the first membership functions, and at least one second function corresponding to the target evidence quality score is determined from the second membership functions; based on the target semantic similarity and the target evidence quality score, the target level of the fuzzy fusion score is determined, and at least one third function corresponding to the target level is determined from the third membership functions; based on the target semantic similarity, a first membership degree is calculated using each first function; based on the target evidence quality score, a second membership degree is calculated using each second function; based on the minimum of the first and second membership degrees at each membership level, the corresponding third function is trimmed to obtain the fuzzy function of each membership level; the fuzzy functions of the membership levels are superimposed to obtain the aggregation function; and the centroid of the closed region enclosed by the aggregation function and the horizontal axis is calculated, with the fuzzy fusion score of the candidate relative to the inclusion / exclusion screening criterion with the third priority determined from the coordinate value of the centroid. In this way, a multi-dimensional fusion of "semantic similarity + evidence quality score" is completed based on fuzzy logic, which both ensures objectivity and conforms to the "weakest link effect" in clinical screening; at the same time, the output fuzzy fusion score has high discriminative power and strong interpretability.
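The fuzzy fusion steps above (membership degrees, minimum per level, trimming, superposition, centroid) can be sketched as follows. The three levels, the triangular membership shapes on the range 0 to 1, the reuse of the same shapes for all three membership functions, and the rule that each level of the inputs maps to the same level of the output are assumptions made for illustration; the application does not specify them in this text.

```python
from typing import Callable, Dict

Membership = Callable[[float], float]


def tri(a: float, b: float, c: float) -> Membership:
    """Triangular membership function with feet a and c and peak b.

    a == b or b == c gives a left or right shoulder.
    """
    def mu(x: float) -> float:
        if x < a or x > c:
            return 0.0
        if x == b:
            return 1.0
        if x < b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu


# Hypothetical membership levels, shared by both inputs and the output.
LEVELS: Dict[str, Membership] = {
    "low": tri(0.0, 0.0, 0.5),
    "medium": tri(0.0, 0.5, 1.0),
    "high": tri(0.5, 1.0, 1.0),
}


def fuzzy_fusion_score(sim: float, quality: float, steps: int = 1000) -> float:
    """Fuse semantic similarity and evidence quality into one score in [0, 1]."""
    # Firing strength per level: minimum of the two membership degrees.
    strength = {lvl: min(mu(sim), mu(quality)) for lvl, mu in LEVELS.items()}
    num = 0.0
    den = 0.0
    for n in range(steps + 1):
        x = n / steps
        # Trim each output function at its firing strength, then superimpose
        # the trimmed functions by taking the pointwise maximum.
        y = max(min(strength[lvl], mu(x)) for lvl, mu in LEVELS.items())
        num += x * y
        den += y
    if den == 0.0:
        return 0.0  # no level fired for both inputs under this simplified rule base
    return num / den  # horizontal coordinate of the centroid


score = fuzzy_fusion_score(0.8, 0.6)
```

Taking the minimum of the two membership degrees is what produces the weakest-link behavior: a high similarity cannot compensate for stale evidence at the same level. A fuller rule base would also cover mixed-level inputs, which this sketch deliberately leaves out.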
[0049] In one embodiment, the process of obtaining the multiple inclusion / exclusion screening criteria in step S1 includes:
[0050] Obtain natural language inclusion / exclusion criteria, and perform word segmentation, cleaning, and redundancy removal on them to obtain processed inclusion / exclusion criteria;
[0051] Use NLP to extract key entities from the processed inclusion / exclusion criteria, and standardize the extracted key entities, wherein the key entities include inclusion / exclusion factors, reference ranges, and data sources;
[0052] Assemble the key entities according to a preset inclusion / exclusion screening criterion format to obtain multiple initial inclusion / exclusion screening criteria, and revise the initial inclusion / exclusion screening criteria to obtain the multiple inclusion / exclusion screening criteria.
[0053] In this application, natural language inclusion / exclusion criteria are obtained and subjected to word segmentation, cleaning, and redundancy removal to obtain processed inclusion / exclusion criteria. NLP is used to extract key entities from the processed inclusion / exclusion criteria, and the extracted key entities are standardized. The key entities include inclusion / exclusion factors, reference ranges, and data sources. The key entities are assembled according to the preset inclusion / exclusion screening criterion format to obtain multiple initial inclusion / exclusion screening criteria, which are then revised to obtain the multiple inclusion / exclusion screening criteria. This converts the natural language inclusion / exclusion criteria into a form that computers can process.
[0054] An AI-based device for intelligent recruitment of trial participants, the device comprising:
[0055] The information acquisition module is used to acquire multi-source medical data of a candidate and multiple inclusion / exclusion screening criteria. The multi-source medical data includes clinical data. The inclusion / exclusion screening criteria are obtained by decomposing natural language inclusion / exclusion criteria using an AI model.
[0056] The time interval acquisition module is used to determine the time interval between the generation time of each item of clinical data and the current time.
[0057] The first calculation module is used to calculate the matching score of the candidate relative to each inclusion / exclusion screening criterion based on the multi-source medical data, and to calculate the evidence quality score of the candidate relative to each inclusion / exclusion screening criterion according to the time interval of each item of clinical data.
[0058] The second calculation module is used to calculate the weighted fusion score of the candidate relative to each inclusion / exclusion screening criterion based on the matching score and the evidence quality score of the candidate relative to that criterion.
[0059] The judgment module is used to determine whether the candidate is a trial participant to be recruited based on the weighted fusion score of each inclusion / exclusion screening criterion.
[0060] The aforementioned AI-based device for intelligent recruitment of trial participants acquires multi-source medical data of candidates and multiple inclusion / exclusion screening criteria, determines the time interval between the generation time of each item of clinical data and the current time, calculates the matching score of the candidate relative to each inclusion / exclusion screening criterion based on the multi-source medical data, and calculates the evidence quality score of the candidate relative to each inclusion / exclusion screening criterion based on the time interval of each item of clinical data. This reduces the impact of outdated data on candidate selection and quantifies the time interval into a specific evidence quality score that represents how far the value of the clinical data has decayed. As a result, the weighted fusion score obtained from the matching score and evidence quality score of the candidate relative to each inclusion / exclusion screening criterion is more accurate. Thus, when determining whether a candidate is a trial participant to be recruited based on the weighted fusion score of each inclusion / exclusion screening criterion, trial participants can be accurately selected while the timeliness of the data is taken into account.
Attached Figure Description
[0061] Figure 1 is an application environment diagram of an AI-based intelligent recruitment method for trial participants in one embodiment;
[0062] Figure 2 is a flowchart of a method for intelligent recruitment of trial participants based on AI in one embodiment;
[0063] Figure 3 is a schematic diagram of the overall process of an AI-based intelligent recruitment method for trial participants in one embodiment;
[0064] Figure 4 is a structural block diagram of a device for intelligent recruitment of trial participants based on AI in one embodiment;
[0065] Figure 5 is an internal structural diagram of a computer device in one embodiment.
Detailed Implementation
[0066] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.
[0067] The AI-based intelligent recruitment method for trial participants provided in this application can be applied in the application environment shown in Figure 1, in which terminal 102 interacts with server 104 via a wired / wireless channel. A data storage system can store the data that server 104 needs to process. Server 104 acquires multi-source medical data of candidates and multiple inclusion / exclusion screening criteria. The multi-source medical data includes clinical data, and the inclusion / exclusion screening criteria are obtained by decomposing natural language inclusion / exclusion criteria using an AI model. Server 104 determines the time interval between the generation time of each item of clinical data and the current time. Based on the multi-source medical data, server 104 calculates the matching score of the candidate relative to each inclusion / exclusion screening criterion, and calculates the evidence quality score of the candidate relative to each inclusion / exclusion screening criterion based on the time interval of each item of clinical data. Server 104 calculates the weighted fusion score of the candidate relative to each inclusion / exclusion screening criterion based on the matching score and evidence quality score of the candidate relative to that criterion. Based on the weighted fusion score of each inclusion / exclusion screening criterion, server 104 determines whether the candidate is a trial participant to be recruited. Terminal 102 can be, but is not limited to, a personal computer, laptop, smartphone, tablet, or IoT device. Server 104 can be a single server, a server cluster consisting of multiple servers, or a cloud computing center consisting of multiple servers.
[0068] In one embodiment, as shown in Figure 2, an AI-based method for intelligent recruitment of trial participants is provided. Taking its application to server 104 in Figure 1 as an example, the method includes the following steps:
[0069] S1. Obtain multi-source medical data of a candidate and multiple inclusion / exclusion screening criteria. The multi-source medical data includes clinical data. The inclusion / exclusion screening criteria are obtained by decomposing natural language inclusion / exclusion criteria using an AI model.
[0070] Multi-source medical data refers to data authorized for use by candidates. Multi-source medical data encompasses multiple dimensions of medical data. Based on its organization, format standardization, and ease of machine parsing and processing, multi-source medical data can be categorized into structured and unstructured data. Structured data includes basic information from the HIS (Hospital Information System), laboratory test indicators from the LIS (Laboratory Information System), and mutation status from gene testing systems. Basic information includes, but is not limited to, the candidate's age, gender, and hospital ID (Identification). Laboratory test indicators include, but are not limited to, liver and kidney function tests and complete blood counts, and include the test timestamp. Mutation status includes, but is not limited to, EGFR (Epidermal Growth Factor Receptor), ALK (Anaplastic Lymphoma Kinase), and PD-L1 TPS (Programmed Death-Ligand 1 Tumor Proportion Score), and includes the report issuance date. Unstructured data includes medical records, surgical descriptions, and pathology reports from EMR (Electronic Medical Record) systems, and imaging diagnostic conclusions from PACS (Picture Archiving and Communication System), such as "no intracranial metastases" or "NSCLC Stage IV." Medical records, surgical descriptions, pathology reports, and imaging diagnostic conclusions all include the report issuance time. Both the detection timestamp and the report issuance time refer to the generation time.
[0071] Clinical data refers to all structured or unstructured information records generated during clinical practice, including medical treatment, nursing, prevention, and rehabilitation, concerning the health status and treatment activities of candidates. Examples include medical records, surgical descriptions, pathology reports, laboratory indicators, mutation status, and imaging diagnostic conclusions.
[0072] AI (Artificial Intelligence) models are large language models capable of decomposing natural language text. For example, Qwen-Med and DeepSeek are both AI models. Natural language inclusion / exclusion criteria are unstructured textual descriptions.
[0073] Furthermore, the multi-source medical data is preprocessed before matching scores are calculated. Data preprocessing includes data cleaning and standardization. For example, based on the reference range for creatinine, errors such as "creatinine > 1000 μmol / L" are corrected; the unit "mg / dL" is uniformly converted to "mmol / L"; redundant information in medical record text is removed using NLP (Natural Language Processing) tools; typos and medical terminology abbreviations are corrected, for example "NSCLC" is expanded to "non-small cell lung cancer". As further examples, pathological diagnosis text (such as "advanced non-small cell lung cancer") is associated with its ICD-10 code (C34.900); "wildtype" and "no definite mutation" are uniformly labeled "negative", and "fusion mutation" and "point mutation" are uniformly labeled "positive"; and date information such as "last treatment time" and "recurrence time" is converted to "YYYY-MM-DD" format for easier calculation of time intervals. Furthermore, when redundant information is removed, the information with the smallest time interval between its generation time and the current time is retained.
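Two of the standardization steps named above, unifying mutation labels and converting dates to "YYYY-MM-DD", can be sketched as follows. The function names, the lowercase lookup, and the list of accepted input date formats are assumptions for illustration; only the label mapping and the target date format come from the text.

```python
from datetime import datetime
from typing import Tuple

# Label mapping taken from the preprocessing description.
MUTATION_LABELS = {
    "wildtype": "negative",
    "no definite mutation": "negative",
    "fusion mutation": "positive",
    "point mutation": "positive",
}

# Hypothetical set of input date layouts that a source system might emit.
INPUT_DATE_FORMATS: Tuple[str, ...] = ("%Y-%m-%d", "%Y/%m/%d", "%d.%m.%Y", "%Y%m%d")


def normalize_mutation(label: str) -> str:
    """Map a raw mutation description to 'negative' or 'positive'.

    Unknown labels are returned unchanged so they can be reviewed manually.
    """
    return MUTATION_LABELS.get(label.strip().lower(), label)


def normalize_date(text: str) -> str:
    """Convert a date string in one of the accepted layouts to YYYY-MM-DD."""
    cleaned = text.strip()
    for fmt in INPUT_DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


status = normalize_mutation("Wildtype")
last_treatment = normalize_date("2025/03/07")
```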
[0074] It should be noted that the multi-source medical data of the candidates involved in this application are all information and data authorized by the candidates.
[0075] S2. Determine the time interval between the generation time of each clinical data point and the current time.
[0076] Here, "generation time" refers to the time when the clinical data was obtained. "Current time" refers to the time at which it is determined whether the candidate is a trial participant to be recruited, that is, the time at which the candidate is evaluated against the inclusion / exclusion screening criteria to ascertain eligibility for participation in a clinical trial. "Time interval" refers to the difference between the generation time and the current time. For example, if the clinical data was generated on day A of this year and the current time is day B of this year, then the time interval for this clinical data is B − A days.
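The time interval defined above is a plain date difference. A minimal sketch follows, assuming that both times are available as calendar dates and that the interval is counted in whole days; the function name is hypothetical.

```python
from datetime import date


def time_interval_days(generation_time: date, current_time: date) -> int:
    """Return the number of days between data generation and evaluation."""
    if generation_time > current_time:
        raise ValueError("generation_time must not be later than current_time")
    return (current_time - generation_time).days


# Example: a laboratory report issued on 10 January, evaluated on 2 April.
interval = time_interval_days(date(2026, 1, 10), date(2026, 4, 2))
```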
[0077] S3. Based on multi-source medical data, calculate the matching score of candidates relative to each inclusion / exclusion screening condition, and calculate the evidence quality score of candidates relative to each inclusion / exclusion screening condition according to the time interval of each clinical data.
[0078] Furthermore, the matching score of a candidate relative to each inclusion / exclusion criterion can be determined based on the similarity between the candidate's multi-source medical data and the inclusion / exclusion criterion, or on whether the multi-source medical data falls within the reference range of the inclusion / exclusion criterion. For example, NLP can be used to calculate the semantic similarity Sim between the multi-source medical data $d_j$ and the inclusion / exclusion criterion: if Sim satisfies a first threshold condition, the matching score $s_i = 100$; if Sim satisfies a second threshold condition, the matching score $s_i = 50$; otherwise the matching score $s_i = 0$. As another example, if the candidate's age falls within the reference range of the inclusion / exclusion criterion, then the matching score $s_i = 100$; if the candidate's age is outside the reference range or at the boundary of the reference range, then the matching score $s_i = 0$.
[0079] Furthermore, the evidence quality score $q_i$ is calculated through the formula $q_i = 100 \cdot e^{-\lambda t}$, where $\lambda$ is the attenuation coefficient and $t$ is the time interval.
[0080] Furthermore, the evidence completeness score of each candidate relative to each inclusion / exclusion criterion is calculated. A total score is then calculated based on the candidate's matching score and evidence completeness score relative to each inclusion / exclusion criterion, and is used to determine whether the candidate is a potential trial participant. Specifically, in the formula for calculating the total score, N is the total score; hard rules refer to numerical inclusion / exclusion criteria, semantic rules refer to semantic inclusion / exclusion criteria, and all rules refer to all inclusion / exclusion criteria; $s_r$ is the matching score for inclusion / exclusion criterion r; $s_{r(\text{evidence})}$ is the evidence completeness score for inclusion / exclusion criterion r; $n_{\text{hard}}$ is the number of numerical inclusion / exclusion criteria; $n_{\text{semantic}}$ is the number of semantic inclusion / exclusion criteria; and $n_{\text{all}}$ is the total number of inclusion / exclusion criteria.
[0081] S4. Calculate the weighted fusion score of the candidate relative to each inclusion / exclusion criterion based on the candidate's matching score and evidence quality score relative to each inclusion / exclusion criterion.
[0082] Furthermore, the weighted fusion score of a candidate relative to the inclusion / exclusion criteria can be obtained by weighted summing of the candidate's matching score and evidence quality score relative to the inclusion / exclusion criteria. The weights of the matching score and evidence quality score of the inclusion / exclusion criteria can be preset.
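The weighted summation described in [0082] might look like the following sketch. The default weights 0.6 and 0.4 are hypothetical presets chosen only for illustration, and the requirement that the weights sum to 1 is an assumption of the sketch.

```python
def weighted_fusion(match_score: float, evidence_score: float,
                    w_match: float = 0.6, w_evidence: float = 0.4) -> float:
    """Weighted sum of a matching score and an evidence quality score.

    The weights are preset values; 0.6 / 0.4 are illustrative only.
    """
    if abs(w_match + w_evidence - 1.0) > 1e-9:
        raise ValueError("weights are assumed to sum to 1 in this sketch")
    return w_match * match_score + w_evidence * evidence_score
```

With these illustrative weights, a matching score of 100 backed by evidence scored at 50 yields a fusion score of 80.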
[0083] S5. Based on the weighted fusion score of each inclusion / exclusion screening criterion, determine whether the candidate is a trial participant to be recruited.
[0084] Furthermore, the weighted fusion scores of candidates relative to each inclusion / exclusion screening criterion are weighted and summed to obtain a comprehensive score. This comprehensive score is used to determine whether a candidate is a potential participant in the trial. For example, if the comprehensive score is greater than the upper limit threshold, the candidate is identified as a potential participant; if the comprehensive score is less than the lower limit threshold, the candidate is not identified as a potential participant; if the comprehensive score is greater than the lower limit threshold but less than the upper limit threshold, the candidate is marked as pending review, and the review results will determine whether the candidate is a potential participant. Both the upper and lower limit thresholds are preset values, with the upper limit threshold being greater than the lower limit threshold.
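The three-way decision in [0084] can be sketched as below. The threshold values are preset by the operator and are passed in as parameters; the text does not say how a score exactly equal to a threshold is handled, so this sketch assumes such scores go to manual review.

```python
def classify(score: float, lower: float, upper: float) -> str:
    """Map a comprehensive score to one of three screening outcomes."""
    if lower >= upper:
        raise ValueError("upper threshold must exceed lower threshold")
    if score > upper:
        return "recruit"
    if score < lower:
        return "reject"
    # Between the thresholds (and, by assumption, exactly on them).
    return "pending_review"
```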
[0085] The aforementioned AI-based method for intelligent recruitment of trial participants involves acquiring multi-source medical data and multiple inclusion / exclusion criteria for candidates, determining the time interval between the generation time and the current time for each clinical data point, calculating the matching score of candidates relative to each inclusion / exclusion criterion based on the multi-source medical data, and calculating the evidence quality score of candidates relative to each inclusion / exclusion criterion based on the time interval of each clinical data point. This reduces the impact of outdated data on candidate selection and quantifies the time interval into a specific evidence quality score, representing the degree of decay in the value of clinical data. This makes the weighted fusion score obtained from the matching score and evidence quality score of candidates relative to each inclusion / exclusion criterion more accurate. Thus, when determining whether a candidate is a trial participant to be recruited based on the weighted fusion score of each inclusion / exclusion criterion, trial participants can be accurately selected while considering the timeliness of the data.
[0086] In one embodiment, the inclusion / exclusion criteria include inclusion / exclusion factors and reference ranges for the inclusion / exclusion factors. The inclusion / exclusion criteria are numerical. Step S3, calculating the matching score of the candidate relative to each inclusion / exclusion criterion, includes:
[0087] The reference range in the numerical inclusion / exclusion screening criteria is matched with the first target data in the multi-source medical data. The type of the inclusion / exclusion factor to which the reference range belongs is consistent with the type of the first target data. The inclusion / exclusion factor is a feature used to determine whether a candidate meets the inclusion / exclusion screening criteria.
[0088] When the first target data is within the reference range, the matching score of the candidate relative to the numerical inclusion and exclusion screening criteria is determined as the first preset value;
[0089] When the first target data is not within the reference range, the matching score of the candidate relative to the numerical inclusion / exclusion criteria is determined as the second preset value, and the first preset value is greater than the second preset value.
[0090] Here, numerical inclusion / exclusion criteria refer to inclusion / exclusion criteria whose reference range is a numerical value.
[0091] Both the first and second preset values are set in advance. Further, the second preset value is 0 and the first preset value is 100, and the matching score $s_i$ can be expressed as $s_i = 100 \times I$, where I is an indicator function with I(true) = 1 and I(false) = 0; I(true) indicates that the first target data is within the reference range, and I(false) indicates that the first target data is not within the reference range.
[0092] The inclusion / exclusion factor to which the reference range belongs is the inclusion / exclusion factor of the numerical inclusion / exclusion screening criterion. Inclusion / exclusion factors include, but are not limited to, age, test indicators, and TPS scores. For example, if the reference range in the numerical inclusion / exclusion screening criterion is 17 to 85 years old, then the age data in the multi-source medical data is used as the first target data for matching.
[0093] Furthermore, if the numerical inclusion / exclusion criteria have multiple inclusion / exclusion factors, and each inclusion / exclusion factor is associated with a reference range, then first target data of the same type as each inclusion / exclusion factor are determined from the multi-source medical data, and the matching score of each first target data is determined; the matching scores of each first target data are weighted and summed to obtain the final matching score of the candidate relative to the numerical inclusion / exclusion criteria.
[0094] In this embodiment, when the first target data is within the reference range, the matching score of the candidate relative to the numerical inclusion / exclusion screening criterion is determined to be the first preset value. When the first target data is not within the reference range, the matching score is determined to be the second preset value. This achieves accurate scoring of the candidate relative to the numerical inclusion / exclusion screening criteria.
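A minimal sketch of the numerical matching score of [0088] to [0093] follows. It uses the preset values 100 and 0 from [0091]. Because [0078] treats values on the boundary of the reference range as non-matching, the comparison here is strict; scoring a missing data item as 0 and the per-factor weights are assumptions of the sketch.

```python
from typing import Mapping, Optional, Tuple


def indicator_score(value: float, low: float, high: float,
                    first: float = 100.0, second: float = 0.0) -> float:
    """s = first if low < value < high, else second (boundary excluded)."""
    return first if low < value < high else second


def numerical_match(data: Mapping[str, Optional[float]],
                    ranges: Mapping[str, Tuple[float, float]],
                    weights: Mapping[str, float]) -> float:
    """Weighted sum of per-factor indicator scores for one criterion."""
    total = 0.0
    for factor, (low, high) in ranges.items():
        value = data.get(factor)
        # Assumption: a factor with no data contributes the second preset value.
        score = 0.0 if value is None else indicator_score(value, low, high)
        total += weights[factor] * score
    return total
```

For a criterion with two equally weighted factors where only one is satisfied, the final matching score is 50.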
[0095] In one embodiment, the inclusion / exclusion criteria are semantic inclusion / exclusion criteria, and step S3 involves calculating the matching score of the candidate relative to each inclusion / exclusion criterion, including:
[0096] Use NLP to calculate the semantic similarity between the multi-source medical data and the semantic inclusion / exclusion criterion;
[0097] Based on the semantic similarity, calculate the matching score of the candidate relative to the semantic inclusion / exclusion criterion through $s_i = \dfrac{100}{1 + e^{-\alpha(\mathrm{Sim} - \theta)}}$, where $s_i$ represents the matching score of the candidate relative to semantic inclusion / exclusion criterion i, Sim represents the semantic similarity, $\alpha$ is the steepness control parameter, and $\theta$ is the semantic matching threshold.
[0098] Here, semantic inclusion / exclusion criteria refer to inclusion / exclusion criteria whose reference range is textual information. Inclusion / exclusion criteria include inclusion / exclusion factors and the reference ranges of those factors. The reference range of an inclusion / exclusion factor is not limited to a numerical interval; it also includes a reference range expressed as textual information.
[0099] Calculating semantic similarity between multi-source medical data and semantic inclusion / exclusion criteria using NLP involves: extracting semantic information from the multi-source medical data that matches the type of inclusion / exclusion factor in the semantic inclusion / exclusion criteria; and calculating the semantic similarity between the semantic information and the reference range of the inclusion / exclusion factor in the semantic inclusion / exclusion criteria using NLP. For example, if the reference range of the inclusion / exclusion factor in the semantic inclusion / exclusion criteria is "cisplatin + etoposide chemotherapy in 2025", and the semantic information in the candidate's multi-source medical data that belongs to the same type as the inclusion / exclusion factor is "cisplatin + etoposide chemotherapy in 2023", then the similarity between "cisplatin + etoposide chemotherapy in 2025" and "cisplatin + etoposide chemotherapy in 2023" is calculated.
[0100] The formula is a Sigmoid function. Calculating the matching score of the candidate relative to the semantic inclusion / exclusion criterion in this way smooths the boundary between the semantic inclusion / exclusion criterion and the multi-source medical data, that is, the boundary used in fuzzy inference to determine whether the semantic inclusion / exclusion criterion is satisfied.
[0101] Furthermore, the semantic similarity is between 0 and 1, the steepness control parameter is 10, and the semantic matching threshold is 0.7. This allows the matching score to drop rapidly to 0 when the semantic similarity is below 0.7, and significantly improves the matching score when the semantic similarity reaches or exceeds 0.8.
[0102] In this embodiment, the matching score of the candidate relative to the semantic inclusion / exclusion criterion is calculated from the semantic similarity using the Sigmoid formula. This smooths the boundary between the semantic inclusion / exclusion criterion and the multi-source medical data, avoids misjudgments caused by hard thresholds, and thus yields a matching score free from such interference.
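The Sigmoid mapping of this embodiment can be sketched as follows, assuming the logistic form scaled to the 0 to 100 range used by the other matching scores. The defaults 10 and 0.7 are the steepness control parameter and semantic matching threshold given in [0101]; the function name and the scaling factor are assumptions.

```python
import math


def semantic_match_score(sim: float, steepness: float = 10.0,
                         threshold: float = 0.7) -> float:
    """Logistic mapping of a similarity in [0, 1] to a score in (0, 100)."""
    if not 0.0 <= sim <= 1.0:
        raise ValueError("semantic similarity must lie in [0, 1]")
    return 100.0 / (1.0 + math.exp(-steepness * (sim - threshold)))
```

Under these assumptions a similarity equal to the threshold maps to 50, a similarity of 0.8 maps to roughly 73, and a similarity of 0.3 maps to below 2, which matches the qualitative behavior described in [0101].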
[0103] In one embodiment, the inclusion / exclusion criteria have priority and inclusion / exclusion factors. Step S3, calculating the evidence quality score of a candidate relative to each inclusion / exclusion criterion, includes:
[0104] Determine the priority of the inclusion / exclusion screening criteria and the target type of the inclusion / exclusion factors;
[0105] Identify second target data of the target type from clinical data, and determine the time interval for the second target data;
[0106] Based on the time interval of the second target data and the priority of the inclusion / exclusion screening criterion, calculate the evidence quality score of the candidate relative to the inclusion / exclusion criterion through $q_i = 100 \cdot e^{-\lambda t}$, where $q_i$ is the evidence quality score of the candidate relative to inclusion / exclusion screening criterion i, $t$ is the time interval, and $\lambda$ is the attenuation coefficient corresponding to the priority of the inclusion / exclusion screening criterion.
[0107] The priority of inclusion / exclusion screening criteria can be set by staff. Further, inclusion / exclusion screening criteria are divided into semantic inclusion / exclusion criteria and numerical inclusion / exclusion criteria. Numerical inclusion / exclusion criteria have a priority of first or second, while semantic inclusion / exclusion criteria have a priority of second or third. First priority is higher than second priority, and second priority is higher than third priority. First-priority inclusion / exclusion criteria directly determine a candidate's eligibility for trial participation. Examples of first-priority inclusion / exclusion criteria include "EGFR / ALK / ROS1 negative" and "NSCLC stage IIIb-IV". Second-priority inclusion / exclusion criteria significantly affect a candidate's suitability for trial participation. Examples of second-priority inclusion / exclusion criteria include "PD-L1 TPS ≥ 1%" and "ECOG score 0-1". Third-priority inclusion / exclusion criteria need to be combined with other inclusion / exclusion screening criteria to comprehensively determine whether a candidate is a suitable trial participant. Examples of third-priority inclusion / exclusion criteria include "no history of infection in the past month" and "normal blood routine indicators".
[0108] Inclusion / exclusion factors are the features used in inclusion / exclusion screening to determine whether a candidate meets the specific inclusion / exclusion criteria. Inclusion / exclusion factors include, but are not limited to, age, test indicators, and TPS scores. For example, the target type for age as an inclusion / exclusion factor is age, the target type for test indicators is the test indicator, and the target type for TPS scores is the TPS score.
[0109] Furthermore, the attenuation coefficient corresponding to the first priority is less than that corresponding to the second priority, and the attenuation coefficient corresponding to the second priority is less than that corresponding to the third priority. For example, if the attenuation coefficient corresponding to the first priority is 0.005, the attenuation coefficient corresponding to the second priority is 0.01, and the attenuation coefficient corresponding to the third priority is 0.02, the evidence quality score is 98.5 when the priority of the inclusion / exclusion screening condition is the first priority and the time interval is 3 days; and the evidence quality score is 13.5 when the priority of the inclusion / exclusion screening condition is the third priority and the time interval is 100 days.
[0110] In this embodiment, the evidence quality score of the candidate relative to the inclusion / exclusion criterion is calculated from the time interval of the second target data and the priority of the inclusion / exclusion screening criterion. This introduces a timeliness indicator, namely the time interval, so that older clinical data receives a lower evidence quality score, thereby reducing the interference of older clinical data with inclusion / exclusion screening.
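The priority-dependent decay can be sketched as below, assuming the exponential form q = 100 · exp(−λ · t). This form is an inference: it reproduces both worked values in [0109] (98.5 for first priority at 3 days, 13.5 for third priority at 100 days). The attenuation coefficients are the example values from [0109]; the names are illustrative.

```python
import math

# Example attenuation coefficients from [0109], keyed by priority level.
DECAY = {1: 0.005, 2: 0.01, 3: 0.02}


def evidence_quality(days: float, priority: int) -> float:
    """Evidence quality score that decays with the age of the data."""
    if days < 0:
        raise ValueError("time interval must be non-negative")
    return 100.0 * math.exp(-DECAY[priority] * days)
```

Because the coefficient grows as the priority drops, data supporting a third-priority criterion loses value faster than data supporting a first-priority criterion over the same interval.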
[0111] In one embodiment, the inclusion / exclusion screening criteria have priorities, and the formula for calculating the weighted fusion score in step S4 is:
[0112] ;
[0113] where $w_i$ is the weighted fusion score of the candidate relative to inclusion / exclusion screening criterion i, $s_i$ is the matching score of the candidate relative to inclusion / exclusion screening criterion i, $q_i$ is the evidence quality score of the candidate relative to inclusion / exclusion screening criterion i, and $k_i$ is the priority coefficient corresponding to the priority of inclusion / exclusion screening criterion i.
[0114] Furthermore, the priorities include first priority, second priority, and third priority, with third priority being lower than second priority and first priority, and second priority being lower than first priority. Furthermore, the priority coefficient for first priority is 3, the priority coefficient for second priority is 2, and the priority coefficient for third priority is 1.
[0115] In this embodiment, the weighted fusion score of the candidate relative to each inclusion / exclusion criterion is calculated using the formula. This introduces exponential penalties or weights for higher-priority inclusion / exclusion criteria, strengthening their impact on the weighted fusion score. If a candidate's multi-source medical data violates a higher-priority inclusion / exclusion criterion, the weighted fusion score decreases, so these higher-priority criteria are enforced mathematically.
[0116] In one embodiment, the inclusion / exclusion screening criteria have priorities, including a first priority, a second priority, and a third priority, with the third priority being lower than the second and first priorities. Step S5 includes:
[0117] Calculate the fuzzy fusion score of the candidate relative to each inclusion / exclusion screening criterion whose priority is the third priority;
[0118] Based on the weighted fusion scores of the first-priority inclusion / exclusion screening criteria, the second-priority inclusion / exclusion screening criteria, and the third-priority inclusion / exclusion screening criteria, calculate the raw weighted total score RawSum of the candidate through $\mathrm{RawSum} = \sum_{r_i \in P1 \cup P2} w_{r_i} + \sum_{r_j \in P3} f_{r_j}$; where $r_i$ represents inclusion / exclusion screening criterion i in the first condition set, which consists of the inclusion / exclusion screening criteria with priority P1 (first priority) and priority P2 (second priority); $w_{r_i}$ is the weighted fusion score of inclusion / exclusion screening criterion i in the first condition set; $r_j$ represents inclusion / exclusion screening criterion j with priority P3 (third priority); and $f_{r_j}$ is the fuzzy fusion score of the candidate relative to inclusion / exclusion screening criterion j with priority P3.
[0119] Based on the candidate's raw weighted total score and the weighted fusion score / matching score of the inclusion / exclusion screening criteria, determine whether the candidate is a trial participant to be recruited.
[0120] Specifically, determining whether a candidate is a trial participant to be recruited based on the candidate's raw weighted total score and the weighted fusion score / matching score of the inclusion / exclusion screening criteria includes: determining whether the candidate is a trial participant to be recruited based on the candidate's raw weighted total score and the weighted fusion score of the inclusion / exclusion screening criteria, or determining whether the candidate is a trial participant to be recruited based on the candidate's raw weighted total score and the matching score of the inclusion / exclusion screening criteria.
[0121] Furthermore, if a candidate's raw weighted total score is greater than the upper limit threshold of the raw weighted total score, and the weighted fusion score of the first-priority inclusion / exclusion screening criterion is greater than the fusion score threshold, the candidate is determined to be a trial participant to be recruited; if a candidate's raw weighted total score is greater than the upper limit threshold of the raw weighted total score, and the matching score of the first-priority inclusion / exclusion screening criterion is greater than the upper limit threshold of the matching score, the candidate is determined to be a trial participant to be recruited.
[0122] Furthermore, if a candidate's raw weighted total score is between the lower limit threshold and the upper limit threshold of the raw weighted total score, or if the weighted fusion score of a third-priority inclusion / exclusion screening criterion is less than the fusion score threshold, or if the matching score of a third-priority inclusion / exclusion screening criterion is less than the lower limit threshold of the matching score, then the candidate is marked as pending review, and the review result determines whether the candidate is a trial participant to be recruited.
[0123] Furthermore, if the candidate's raw weighted total score is less than the lower limit threshold, or if the weighted fusion score of a first-priority inclusion / exclusion screening criterion is less than the fusion score threshold, or if the matching score of a first-priority inclusion / exclusion screening criterion is less than the lower limit threshold of the matching score, then the candidate is determined not to be a trial participant to be recruited.
[0124] In this embodiment, the raw weighted total score of the candidate is calculated from the weighted fusion scores of the first-priority, second-priority, and third-priority inclusion / exclusion screening criteria, so that an accurate raw weighted total score is obtained.
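The decision rules of [0121] to [0123] can be sketched as follows for the weighted-fusion-score variant. Two points are assumptions rather than statements of the application: the raw total is taken to be the plain sum of the first- and second-priority weighted fusion scores and the third-priority fuzzy fusion scores, and the rules are evaluated in the order reject, pending review, recruit, since the text does not fix an order when several conditions hold at once. All names and thresholds are illustrative.

```python
from typing import Sequence


def raw_sum(fusion_p1_p2: Sequence[float], fuzzy_p3: Sequence[float]) -> float:
    """Assumed form: sum of P1/P2 weighted fusion scores and P3 fuzzy scores."""
    return sum(fusion_p1_p2) + sum(fuzzy_p3)


def decide(raw: float, p1_scores: Sequence[float], p3_scores: Sequence[float],
           lower: float, upper: float, fusion_threshold: float) -> str:
    """Three-way decision with a first-priority veto."""
    # [0123]: a low total or any failing first-priority criterion rejects.
    if raw < lower or any(s < fusion_threshold for s in p1_scores):
        return "reject"
    # [0122]: a failing third-priority criterion sends the case to review.
    if any(s < fusion_threshold for s in p3_scores):
        return "pending_review"
    # [0121]: a high total with all first-priority criteria passing recruits.
    if raw > upper and all(s > fusion_threshold for s in p1_scores):
        return "recruit"
    return "pending_review"
```

Evaluating the rejection rule first makes a failed first-priority criterion act as a veto regardless of how high the total score is.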
[0125] In one embodiment, determining whether a candidate is a trial participant to be recruited based on the candidate's raw weighted total score and the weighted fusion score / matching score of the inclusion / exclusion screening criteria includes:
[0126] Calculate the conflict intensity c of the candidate, where C is the second condition set, consisting of the inclusion / exclusion screening criteria whose matching scores are greater than the upper limit threshold; D is the third condition set, consisting of the inclusion / exclusion screening criteria whose matching scores are less than the lower limit threshold; $k_i$ is the priority coefficient corresponding to the priority of inclusion / exclusion screening criterion i in the second condition set; $k_j$ is the priority coefficient corresponding to the priority of inclusion / exclusion screening criterion j in the third condition set; and $k_m$ is the priority coefficient corresponding to the priority of inclusion / exclusion screening criterion m in the third and fourth condition sets;
[0127] When the conflict intensity is greater than the third preset value, calculate the overall score by the formula for that case;
[0128] When the conflict intensity is between the fourth preset value and the third preset value, calculate the overall score by the formula for that case, where the third preset value is greater than the fourth preset value and y is a preset multiple.
[0129] Based on the candidate's overall score and the weighted fusion score / matching score of the inclusion and exclusion screening criteria, it is determined whether the candidate is a trial participant to be recruited.
[0130] Among them, determining whether a candidate is a trial participant to be recruited based on the candidate's overall score and the weighted fusion score / matching score of the inclusion and exclusion screening criteria includes: determining whether a candidate is a trial participant to be recruited based on the candidate's overall score and the weighted fusion score of the inclusion and exclusion screening criteria, or determining whether a candidate is a trial participant to be recruited based on the candidate's overall score and the matching score of the inclusion and exclusion screening criteria.
[0131] Furthermore, if a candidate's overall score is greater than the upper limit threshold and the weighted fusion score of the first-priority inclusion / exclusion screening criterion is greater than the fusion score threshold, then the candidate is determined to be a trial participant to be recruited; if a candidate's overall score is greater than the upper limit threshold and the matching score of the first-priority inclusion / exclusion screening criterion is greater than the upper limit threshold of the matching score, then the candidate is determined to be a trial participant to be recruited.
[0132] Furthermore, if a candidate's overall score is between the lower limit threshold and the upper limit threshold, or if the weighted fusion score of a third-priority inclusion / exclusion screening criterion is less than the fusion score threshold, or if the matching score of a third-priority inclusion / exclusion screening criterion is less than the lower limit threshold of the matching score, then the candidate is marked as pending review, and the review result determines whether the candidate is a trial participant to be recruited.
[0133] Furthermore, if a candidate's overall score is less than the lower limit threshold, or if the weighted fusion score of a first-priority inclusion / exclusion screening criterion is less than the fusion score threshold, or if the matching score of a first-priority inclusion / exclusion screening criterion is less than the lower limit threshold of the matching score, then the candidate is determined not to be a trial participant to be recruited.
[0134] Furthermore, when the conflict intensity is less than or equal to the fourth preset value, the candidate is determined not to be a participant in the trial to be recruited.
[0135] Furthermore, when the inclusion and exclusion criteria are numerical inclusion and exclusion criteria, the matching score of the candidate relative to the numerical inclusion and exclusion criteria is a first preset value or a second preset value. The first preset value is greater than the upper limit threshold of the matching score, and the second preset value is less than the lower limit threshold of the matching score.
[0136] In this embodiment, the conflict intensity of the candidate is calculated and the overall score is determined according to the conflict intensity. This allows a refined, quantitative assessment of whether a candidate is a trial participant to be recruited, effectively identifying and penalizing logical contradictions or inconsistencies among the inclusion / exclusion screening criteria, thereby improving the accuracy, safety, and automation of the screening process.
[0137] In one embodiment, calculating the fuzzy fusion score of a candidate relative to an inclusion / exclusion screening criterion whose priority is the third priority includes:
[0138] Obtain the first membership function, second membership function, and third membership function of each preset membership level, and determine the target semantic similarity and target evidence quality score of the candidate relative to the third-priority inclusion / exclusion screening criterion. The first membership function is associated with the target semantic similarity, the second membership function is associated with the target evidence quality score, and the third membership function is associated with the fuzzy fusion score.
[0139] From the first membership function, determine at least one first function corresponding to the target semantic similarity; from the second membership function, determine at least one second function corresponding to the target evidence quality score; based on the target semantic similarity and the target evidence quality score, determine the target level of the fuzzy fusion score; and from the third membership function, determine at least one third function corresponding to the target level.
[0140] Based on the target semantic similarity, the first membership degree is calculated using each first function; based on the target evidence quality score, the second membership degree is calculated using each second function.
[0141] Based on the minimum value of the first and second membership degrees of each membership level, the corresponding third function is trimmed to obtain the fuzzy function of each membership level.
[0142] By superimposing the fuzzy functions of each membership level, the aggregation function is obtained;
[0143] Calculate the centroid of the closed region enclosed by the aggregation function and the horizontal axis, and determine the fuzzy fusion score of the candidate relative to the third-priority inclusion / exclusion screening criterion based on the coordinate value of the centroid.
[0144] The membership levels include three grades: low, medium, and high. For example, a separate first membership function is defined for each of the low, medium, and high grades.
[0145] Target semantic similarity can be calculated using NLP. Specifically, NLP is used to calculate the target semantic similarity between the candidate's multi-source medical data and the third-priority inclusion / exclusion screening criterion.
[0146] The target evidence quality score can be obtained through $q_i = 100 \cdot e^{-\lambda t}$, where $q_i$ is the evidence quality score of the candidate relative to third-priority inclusion / exclusion criterion i, $t$ is the time interval, and $\lambda$ is the attenuation coefficient corresponding to the third priority.
[0147] The association between the first membership function and the target semantic similarity means that the first membership function is calculated based on the target semantic similarity.
[0148] The association between the second membership function and the target evidence quality score means that the second membership function is calculated based on the target evidence quality score.
[0149] The association between the third membership function and the fuzzy fusion score means that the third membership function is a function used to output the fuzzy fusion score.
[0150] At least one first function corresponding to the target semantic similarity refers to a first membership function whose first membership degree is not zero after the target semantic similarity is substituted into it. For example, continuing the example above, if the target semantic similarity is 0.85, then the first functions are those first membership functions whose first membership degree at 0.85 is not zero.
[0151] Determining the target level of the fuzzy fusion score based on target semantic similarity and target evidence quality score includes: determining the target level of the fuzzy fusion score according to preset rules in the fuzzy rule base, based on the membership levels of the first function corresponding to the target semantic similarity and the second function corresponding to the target evidence quality score. For example, the fuzzy rule base example is: if the membership level of the first function corresponding to the target semantic similarity is high and the membership level of the second function corresponding to the target evidence quality score is high, then the target level is high; if the membership level of the first function corresponding to the target semantic similarity is medium and the membership level of the second function corresponding to the target evidence quality score is medium, then the target level is medium; if the membership level of the first function corresponding to the target semantic similarity is low and the membership level of the second function corresponding to the target evidence quality score is low, then the target level is low.
[0152] At least one third function corresponding to the target level refers to the third membership function whose membership level is consistent with the target level.
[0153] The first membership degree is obtained by substituting the target semantic similarity into the first function. The second membership degree is obtained by substituting the target evidence quality score into the second function.
[0154] The number of minimum values is the same as the number of first memberships or the number of second memberships. The membership level of the first membership is the membership level of the first function that yields that membership, and the membership level of the second membership is the membership level of the second function that yields that membership.
[0155] For each membership level, the first and second membership degrees correspond to a third function. The corresponding third function is the third function of the target level determined from the target semantic similarity and target evidence quality score that produced the first and second membership degrees. For example, if a first membership degree A and a second membership degree B both belong to the medium level, the target level determined from the corresponding target semantic similarity and target evidence quality score is medium, and the third function of the medium level is the third function corresponding to A and B.
[0156] Trimming refers to removing the portion of the third function whose output value is higher than the minimum value. The remaining portion after trimming is the fuzzy function.
[0157] Superimposing the fuzzy functions of the membership levels combines them into a single aggregation function; each individual fuzzy function itself remains unchanged.
[0158] Furthermore, the fuzzy fusion score of the candidate relative to the third-priority inclusion / exclusion screening condition is calculated from the x-axis or y-axis coordinate value of the centroid.
[0159] At least one second function corresponding to the target evidence quality score refers to the second membership function in which the second membership degree output after substituting the target evidence quality score is not 0.
[0160] In this embodiment, the target semantic similarity and target evidence quality score of the candidate relative to the third-priority inclusion / exclusion screening condition are determined; at least one first function corresponding to the target semantic similarity is determined from the first membership functions, and at least one second function corresponding to the target evidence quality score is determined from the second membership functions; based on the target semantic similarity and target evidence quality score, the target level of the fuzzy fusion score is determined, and at least one third function corresponding to the target level is determined from the third membership functions; based on the target semantic similarity, a first membership degree is calculated using each first function; based on the target evidence quality score, a second membership degree is calculated using each second function; based on the minimum of the first and second membership degrees at each membership level, the corresponding third function is trimmed to obtain the fuzzy function of each membership level; the fuzzy functions of the membership levels are superimposed to obtain the aggregation function; the centroid of the closed region enclosed by the aggregation function and the horizontal axis is calculated, and the fuzzy fusion score of the candidate relative to the third-priority inclusion / exclusion screening condition is determined from the coordinate value of the centroid. In this way, a multi-dimensional fusion of "semantic similarity + evidence quality score" is completed based on fuzzy mathematical logic, which not only ensures objectivity but also conforms to the "weakest link effect" in clinical screening; at the same time, the output fuzzy fusion score has both high discriminative power and strong interpretability.
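The trimming, superposition, and centroid steps described above correspond to a standard Mamdani-style fuzzy inference, which can be sketched as follows. This is only an illustration under stated assumptions: the triangular shape of the membership functions, the value ranges (0 to 1 for similarity, 0 to 100 for the quality and fusion scores), the three level names, and the rule that mixed input levels map to the lower of the two levels are all hypothetical, since the text only gives the three matching-level rules.

```python
from typing import Callable, Dict

LEVELS = ("low", "medium", "high")


def tri(a: float, b: float, c: float) -> Callable[[float], float]:
    """Triangular membership function with feet at a and c and peak at b."""
    def mu(x: float) -> float:
        if x == b:
            return 1.0
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x < b else (c - x) / (c - b)
    return mu


def make_levels(lo: float, hi: float) -> Dict[str, Callable[[float], float]]:
    """Three overlapping triangles covering [lo, hi] (assumed shapes)."""
    mid, half = (lo + hi) / 2, (hi - lo) / 2
    return {
        "low": tri(lo - half, lo, mid),
        "medium": tri(lo, mid, hi),
        "high": tri(mid, hi, hi + half),
    }


SIM = make_levels(0.0, 1.0)     # first membership functions (semantic similarity)
QUAL = make_levels(0.0, 100.0)  # second membership functions (evidence quality)
OUT = make_levels(0.0, 100.0)   # third membership functions (fuzzy fusion score)


def fuzzy_fusion(sim: float, quality: float, steps: int = 1000) -> float:
    """Clip each output function at its rule strength, aggregate, take the centroid."""
    strength = {lvl: 0.0 for lvl in LEVELS}
    for i, sim_level in enumerate(LEVELS):
        for j, qual_level in enumerate(LEVELS):
            # Rule strength is the minimum of the two membership degrees.
            fire = min(SIM[sim_level](sim), QUAL[qual_level](quality))
            # Assumed rule base: the weaker input level sets the target level.
            target = LEVELS[min(i, j)]
            strength[target] = max(strength[target], fire)
    num = den = 0.0
    for n in range(steps + 1):
        y = 100.0 * n / steps
        # Trim each third function at its strength, then superimpose with max.
        agg = max(min(OUT[lvl](y), strength[lvl]) for lvl in LEVELS)
        num += y * agg
        den += agg
    # Horizontal coordinate of the centroid of the aggregated region.
    return num / den if den else 0.0
```

With these assumed shapes, a candidate with high similarity but stale evidence scores below one with high similarity and fresh evidence, which is the "weakest link" behaviour the paragraph describes.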
[0161] In one embodiment, the process of obtaining the multiple inclusion / exclusion screening conditions in step S1 includes:
[0162] Obtain the natural language inclusion / exclusion criteria, and perform word segmentation, cleaning, and redundancy removal on them to obtain the processed inclusion / exclusion criteria;
[0163] NLP is used to extract key entities from the processed inclusion / exclusion criteria, and the extracted key entities are standardized. The key entities include inclusion / exclusion factors, reference ranges, and data sources.
[0164] The key entities are assembled according to the preset inclusion / exclusion screening condition format to obtain multiple initial inclusion / exclusion screening conditions. The initial inclusion / exclusion screening conditions are then modified to obtain the multiple inclusion / exclusion screening conditions.
[0165] In this embodiment, the multiple inclusion / exclusion screening conditions are obtained through an AI model.
[0166] The data sources are the medical systems from which the data is acquired. For example, "age comes from HIS", "PD-L1 TPS comes from the pathology testing system", "brain metastasis status comes from PACS", and "treatment history comes from EMR".
[0167] Standardization refers to unifying the names or descriptions of key entities. Specifically, the extracted key entities are mapped to standard terminology sets such as UMLS (Unified Medical Language System) / SNOMED-CT (Systematized Nomenclature of Medicine – Clinical Terms) to obtain unified key entities.
[0168] The key entities are assembled according to the preset inclusion / exclusion screening condition format, and each of the resulting initial inclusion / exclusion screening conditions has the form {inclusion / exclusion factor, reference range, data source}.
[0169] The specific steps for modifying the initial inclusion / exclusion screening conditions are as follows: a rule engine or an LLM (Large Language Model) verifies the consistency between the initial inclusion / exclusion screening conditions and the natural language inclusion / exclusion criteria, so as to correct ambiguities in the initial inclusion / exclusion screening conditions and obtain the multiple inclusion / exclusion screening conditions.
[0170] Furthermore, the inclusion / exclusion criteria include multiple items. In response to a checkbox operation, the selected items are determined, and word segmentation, cleaning, and redundancy removal are then performed on the selected items to obtain the processed inclusion / exclusion criteria. Further, the selected items are stored as an inclusion / exclusion criteria template, so that the template can be called directly for reuse when the same criteria are needed later, reducing the checkbox steps and saving time.
[0171] In this embodiment, the natural language inclusion / exclusion criteria are obtained, and word segmentation, cleaning, and redundancy removal are performed on them to obtain the processed inclusion / exclusion criteria. NLP is used to extract key entities from the processed inclusion / exclusion criteria, and the extracted key entities are standardized. The key entities include inclusion / exclusion factors, reference ranges, and data sources. The key entities are assembled according to the preset inclusion / exclusion screening condition format to obtain multiple initial inclusion / exclusion screening conditions, which are then modified to obtain the multiple inclusion / exclusion screening conditions. This allows natural language inclusion / exclusion criteria to be processed by a computer.
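The assembly step of this embodiment can be sketched as below, assuming the key entities have already been extracted. The class name `ScreeningCondition`, the functions `standardize` and `assemble`, and the small synonym table are hypothetical; the table merely stands in for a lookup against a terminology set such as UMLS or SNOMED-CT, which is not implemented here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ScreeningCondition:
    """One condition in the form {factor, reference range, data source}."""
    factor: str
    reference_range: str
    data_source: str


# Hypothetical synonym table standing in for a standard terminology lookup.
SYNONYMS = {"pd-l1 tps": "PD-L1 TPS", "pdl1 tps": "PD-L1 TPS", "age": "age"}


def standardize(name: str) -> str:
    """Map an extracted entity name to one unified name (illustrative only)."""
    key = " ".join(name.lower().split())
    return SYNONYMS.get(key, name.strip())


def assemble(entities: List[Dict[str, str]]) -> List[ScreeningCondition]:
    """Build conditions from extracted entities and drop exact duplicates."""
    seen = set()
    conditions = []
    for entity in entities:
        condition = ScreeningCondition(
            factor=standardize(entity["factor"]),
            reference_range=entity["reference_range"].strip(),
            data_source=entity["data_source"].strip(),
        )
        if condition not in seen:
            seen.add(condition)
            conditions.append(condition)
    return conditions


extracted = [
    {"factor": "Age", "reference_range": "18-75 years", "data_source": "HIS"},
    {"factor": "PDL1  TPS", "reference_range": ">=1%",
     "data_source": "pathology testing system"},
    {"factor": "pd-l1 tps", "reference_range": ">=1%",
     "data_source": "pathology testing system"},
]
conditions = assemble(extracted)
```

The consistency check against the original criteria by a rule engine or LLM is left out of the sketch, since the text does not specify how it is performed.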
[0172] In a specific application scenario, the overall flowchart of the AI-based method for intelligent recruitment of trial participants is shown in Figure 3. Specifically, the server acquires the candidates' multi-source medical data and the natural language inclusion / exclusion criteria. The server preprocesses the multi-source medical data, including but not limited to cleaning, standardization, and time interval labeling. The server uses generative AI to decompose the natural language inclusion / exclusion criteria into multiple inclusion / exclusion screening conditions. The server matches the reference range in each numerical inclusion / exclusion screening condition with the first target data in the multi-source medical data, where the type of the inclusion / exclusion factor to which the reference range belongs is consistent with the type of the first target data, and the inclusion / exclusion factor is a feature used to determine whether a candidate meets the inclusion / exclusion screening condition. When the first target data is within the reference range, the matching score of the candidate relative to the numerical inclusion / exclusion screening condition is determined to be a first preset value; when the first target data is not within the reference range, the matching score is determined to be a second preset value. The server uses NLP to calculate the semantic similarity between the multi-source medical data and each semantic inclusion / exclusion screening condition and, based on the semantic similarity, calculates the matching score of the candidate relative to the semantic inclusion / exclusion screening condition using the corresponding formula. The server determines the priority of each inclusion / exclusion screening condition and the target type of its inclusion / exclusion factor;
[0173] identifies second target data of the target type from the clinical data, and determines the time interval of the second target data;
[0174] and, based on the time interval of the second target data and the priority of the inclusion / exclusion screening condition, calculates the evidence quality score of the candidate relative to the inclusion / exclusion screening condition using the corresponding formula. The server calculates the weighted fusion score of each candidate relative to each inclusion / exclusion screening condition. The server calculates the fuzzy fusion score of each candidate relative to each third-priority inclusion / exclusion screening condition and, based on the weighted fusion scores of the first-priority and second-priority inclusion / exclusion screening conditions and the fuzzy fusion scores of the third-priority inclusion / exclusion screening conditions, calculates the initial comprehensive score of each candidate. The conflict intensity of each candidate is then calculated. When the conflict intensity is greater than the third preset value, the initial comprehensive score is used as the comprehensive score. When the conflict intensity is between the third and fourth preset values, the initial comprehensive score is multiplied by a preset multiple to obtain the comprehensive score. The third preset value is greater than the fourth preset value. Based on the candidate's comprehensive score and the weighted fusion score / matching score of the inclusion / exclusion screening conditions, it is determined whether the candidate is a trial participant to be recruited. A visual report containing the screening process and the screening result is output. The screening process is the process of calculating the comprehensive score, and the screening result is whether the candidate is a trial participant to be recruited.
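Two of the rule-based steps in this flow, the numerical matching score and the conflict-dependent comprehensive score, can be sketched as follows. The function names, the default preset values of 100 and 0, and the inclusive treatment of range and threshold boundaries are assumptions made for illustration. The text does not say what happens when the conflict intensity is below the fourth preset value, so the sketch returns `None` in that case rather than guessing.

```python
from typing import Optional, Tuple


def numeric_match_score(value: float, reference: Tuple[float, float],
                        first_preset: float = 100.0,
                        second_preset: float = 0.0) -> float:
    """Return the first preset value inside the reference range, else the second."""
    low, high = reference
    return first_preset if low <= value <= high else second_preset


def comprehensive_score(initial: float, conflict: float, third_preset: float,
                        fourth_preset: float, multiple: float) -> Optional[float]:
    """Apply the conflict-intensity rule to the initial comprehensive score."""
    if conflict > third_preset:
        # Above the third preset value the initial score is used unchanged.
        return initial
    if fourth_preset <= conflict <= third_preset:
        # Between the two preset values the score is scaled by the preset multiple.
        return initial * multiple
    return None  # case not described in the text
```

For example, `numeric_match_score(62, (18, 75))` gives the first preset value for a 62-year-old candidate against an age range of 18 to 75 years.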
[0175] In a specific application, the inclusion / exclusion criteria are S, and the candidate's multi-source medical data set includes detection timestamps. The steepness control parameter is 10 and the semantic matching threshold is 0.7. For the first-priority inclusion / exclusion screening conditions, the attenuation coefficient is 0.005 and the priority coefficient is 3; for the second-priority inclusion / exclusion screening conditions, the attenuation coefficient is 0.01 and the priority coefficient is 2; for the third-priority inclusion / exclusion screening conditions, the attenuation coefficient is 0.02 and the priority coefficient is [value missing]. The set of inclusion / exclusion screening conditions consists of conditions r_i, where r_i = {inclusion / exclusion factor, reference range, data source, priority}. Taking the inclusion / exclusion screening condition r1 = {genetic testing report, EGFR negative, P1} with a time interval of 30 days as an example, the score results are [values missing]. Taking the inclusion / exclusion screening condition r2 = {EMR records, no infection history in the past month, P2} with a time interval of 25 days and a semantic similarity of 0.85 as an example, the calculation results are [values missing]. If there is an inclusion / exclusion screening condition r3 = {LIS report, "normal blood routine", P3} with s3 = 0, the conflict set is [value missing] and the conflict intensity is 0.67. Since 0.67 is greater than the third preset value of 0.3, the final comprehensive score is [value missing], where f3 is the fuzzy fusion score of the third-priority inclusion / exclusion screening condition, assumed here to be f3 = 30.
[0176] In a specific application, the inclusion / exclusion criteria are (EGFR / ALK / ROS1 negative, NSCLC stage IIIb-IV, P1), (age 18-75 years, PD-L1 TPS ≥1%, ECOG 0-1, P2), and (no infection in the past month, normal blood count, P3). ROS1 is a proto-oncogene, NSCLC refers to non-small cell lung cancer, and ECOG refers to the Eastern Cooperative Oncology Group performance status score. The candidate data is multi-source medical data. The steepness control parameter is 10 and the semantic matching threshold is 0.7. For the first-priority inclusion / exclusion screening conditions, the attenuation coefficient is 0.005 and the priority coefficient is 3; for the second-priority inclusion / exclusion screening conditions, the attenuation coefficient is 0.01 and the priority coefficient is 2; for the third-priority inclusion / exclusion screening conditions, the attenuation coefficient is 0.02 and the priority coefficient is [value missing]. The candidate's data for the first-priority P1 conditions are {EGFR / ALK / ROS1 negative (interval of 30 days), NSCLC stage IV (interval of 15 days)}; the candidate's data for the second-priority P2 conditions are {age 62 years (interval of 1 day), PD-L1 TPS = 60% (interval of 20 days), ECOG = 1 (interval of 7 days)}; the candidate's data for the third-priority P3 conditions are {no infection in the past month (EMR record, Sim = 0.9, interval of 25 days), normal blood routine (interval of 5 days)}.
[0177] The scores for the first-priority P1 inclusion / exclusion screening conditions are as follows: [values missing]. The total weighted fusion score of the first-priority P1 inclusion / exclusion screening conditions is 23394.8 + 25215.9 = 48610.7.
[0178] The scores for the second-priority P2 inclusion / exclusion screening conditions are as follows: [values missing]. The total weighted fusion score of the second-priority P2 inclusion / exclusion screening conditions is 9900 + 8187 + 9324 = 27411.
[0179] The fuzzy fusion scores for the third-priority P3 inclusion / exclusion screening conditions are as follows: [values missing]. The total weighted fuzzy fusion score of the third-priority P3 inclusion / exclusion screening conditions is 2223.5 + 3329.7 = 5553.2. The conflict intensity is greater than the third preset value.
[0180] The final overall score is [value missing]. A candidate whose weighted fusion scores for the first-priority inclusion / exclusion screening conditions are greater than the fusion score threshold is determined to be a trial participant to be recruited.
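The three second-priority figures can be reproduced numerically under a few assumptions, which offers a plausibility check on the time-decay behaviour. The sketch below assumes an evidence quality score of the form 100 × e^(−decay × days), a per-day attenuation coefficient of 0.01 for the second priority, a matching score of 100 for each item, and a priority factor of 1. The exponential form is inferred from the listed numbers and from the 86.07 evidence quality score quoted for a 30-day interval; it is not taken from the formula itself, which is not reproduced in this text.

```python
import math


def evidence_quality(days: float, decay: float) -> float:
    """Assumed time-decay form: 100 * exp(-decay * days)."""
    return 100.0 * math.exp(-decay * days)


P2_DECAY = 0.01       # assumed attenuation coefficient for the second priority
MATCH_SCORE = 100.0   # assumed matching score for a satisfied condition

# Time intervals in days for the second-priority items of the example.
intervals = {"age": 1, "PD-L1 TPS": 20, "ECOG": 7}

weighted = {name: MATCH_SCORE * evidence_quality(days, P2_DECAY)
            for name, days in intervals.items()}
total = sum(round(score) for score in weighted.values())

for name, score in weighted.items():
    print(f"{name}: {score:.0f}")
print("total:", total)
```

Under the same assumed form with a coefficient of 0.005, a 30-day interval gives an evidence quality score of about 86.07, in line with the value quoted for the EGFR item in the report example.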
[0181] Output a visualization report, which includes detailed annotations for each item in the form "Inclusion / Exclusion Screening Condition - Priority - Matching Score - Evidence Quality Score - Calculation Process", for example, "EGFR negative (P1, matching score 100, evidence quality score 86.07, weighted fusion score 23394.8, calculation logic: 100 × 86.07 × e...)". For the third-priority inclusion / exclusion screening conditions, the semantic similarity, matching score, evidence quality score, and the report number corresponding to the data source of each indicator are labeled.
[0182] It should be understood that although the steps in the flowcharts of the embodiments described above are shown sequentially according to the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowcharts of the embodiments described above may include multiple steps or multiple stages. These steps or stages are not necessarily completed at the same time, but can be executed at different times. The execution order of these steps or stages is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the steps or stages of other steps.
[0183] Based on the same inventive concept, this application also provides an apparatus for implementing the AI-based intelligent recruitment of trial participants method described above. The solution provided by this apparatus is similar to the implementation scheme described in the above method; therefore, the specific limitations of one or more apparatus embodiments for AI-based intelligent recruitment of trial participants provided below can be found in the limitations of the AI-based intelligent recruitment of trial participants method described above, and will not be repeated here.
[0184] In one embodiment, as shown in Figure 4, an AI-based device for intelligent recruitment of trial participants is provided, comprising:
[0185] The information acquisition module is used to acquire multi-source medical data and multiple inclusion / exclusion screening criteria for candidates. The multi-source medical data includes clinical data. The inclusion / exclusion screening criteria are obtained by decomposing the natural language inclusion / exclusion criteria using an AI model.
[0186] The time interval acquisition module is used to determine the time interval between the generation time of each clinical data point and the current time.
[0187] The first calculation module is used to calculate the matching score of candidates relative to each inclusion and exclusion screening condition based on multi-source medical data, and to calculate the evidence quality score of candidates relative to each inclusion and exclusion screening condition according to the time interval of each clinical data.
[0188] The second calculation module is used to calculate the weighted fusion score of the candidate relative to each inclusion and exclusion criteria based on the candidate's matching score and evidence quality score relative to each inclusion and exclusion criteria.
[0189] The judgment module is used to determine whether a candidate is a trial participant to be recruited, based on the weighted fusion score of each inclusion / exclusion screening condition.
[0190] The various modules in the aforementioned AI-based intelligent recruitment device for trial participants can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in or independent of the processor in a computer device, or stored in the memory of a computer device as software, so that the processor can call and execute the corresponding operations of each module.
[0191] In one embodiment, a computer device is provided, which may be a server, and its internal structure may be as shown in Figure 5. The computer device includes a processor, memory, and a network interface connected via a system bus. The processor provides computing and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database stores various types of data. The network interface communicates with external terminals via a network connection. When executed by the processor, the computer program implements an AI-based method for intelligent recruitment of trial participants.
[0192] Those skilled in the art will understand that the structure shown in Figure 5 is merely a block diagram of a portion of the structure related to the present application and does not constitute a limitation on the computer device to which the present application is applied. Specific computer devices may include more or fewer components than those shown in the figure, or combine certain components, or have different component arrangements.
[0193] In one embodiment, a computer device is also provided, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps in the above method embodiments.
[0194] In one embodiment, a computer-readable storage medium is provided having a computer program stored thereon that, when executed by a processor, implements the steps in the above method embodiments.
[0195] In one embodiment, a computer program product is provided, including a computer program that, when executed by a processor, implements the steps in the above method embodiments.
[0196] It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this application are all information and data authorized by the user or fully authorized by all parties.
[0197] Those skilled in the art will understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium. When executed, the computer program can include the processes of the embodiments described above. Any references to memory, databases, or other media used in the embodiments provided in this application can include at least one of non-volatile and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetic random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, etc. Volatile memory can include random access memory (RAM) or external cache memory, etc. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases involved in the embodiments provided in this application may include at least one type of relational database and non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors involved in the embodiments provided in this application may be general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, quantum computing-based data processing logic devices, etc., and are not limited to these.
[0198] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.
[0199] The embodiments described above are merely illustrative of several implementation methods of this application, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of this patent application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this application should be determined by the appended claims.
Claims
1. A method for intelligent recruitment of trial participants based on AI, characterized in that the method includes:
S1. Obtaining multi-source medical data of a candidate and multiple inclusion / exclusion screening conditions, wherein the multi-source medical data includes clinical data, and the inclusion / exclusion screening conditions are obtained by decomposing natural language inclusion / exclusion criteria using an AI model;
S2. Determining the time interval between the generation time and the current time for each item of the clinical data;
S3. Based on the multi-source medical data, calculating the matching score of the candidate relative to each of the inclusion / exclusion screening conditions, and calculating the evidence quality score of the candidate relative to each of the inclusion / exclusion screening conditions according to the time interval of each item of the clinical data;
S4. Calculating the weighted fusion score of the candidate relative to each of the inclusion / exclusion screening conditions based on the matching score and the evidence quality score of the candidate relative to each of the inclusion / exclusion screening conditions;
S5. Based on the weighted fusion score of each of the inclusion / exclusion screening conditions, determining whether the candidate is a trial participant to be recruited;
wherein the inclusion / exclusion screening conditions have priorities and inclusion / exclusion factors, and calculating the evidence quality score of the candidate relative to each inclusion / exclusion screening condition in step S3 includes:
determining the priority of the inclusion / exclusion screening condition and the target type of its inclusion / exclusion factor;
determining, from the clinical data, second target data of the target type, and determining the time interval of the second target data;
based on the time interval of the second target data and the priority of the inclusion / exclusion screening condition, calculating the evidence quality score of the candidate relative to the inclusion / exclusion screening condition through the formula [formula missing], where q_i is the evidence quality score of the candidate relative to inclusion / exclusion screening condition i, and the other two quantities in the formula are the time interval and the attenuation coefficient corresponding to the priority of the inclusion / exclusion screening condition.
2. The method according to claim 1, characterized in that the inclusion / exclusion screening conditions include inclusion / exclusion factors and reference ranges for the inclusion / exclusion factors, the inclusion / exclusion screening conditions are numerical inclusion / exclusion screening conditions, and calculating the matching score of the candidate relative to each inclusion / exclusion screening condition in step S3 includes:
matching the reference range of the inclusion / exclusion factor in the numerical inclusion / exclusion screening condition with the first target data in the multi-source medical data, wherein the type of the inclusion / exclusion factor to which the reference range belongs is consistent with the type of the first target data, and the inclusion / exclusion factor is a feature used to determine whether a candidate meets the inclusion / exclusion screening condition;
when the first target data is within the reference range, determining the matching score of the candidate relative to the numerical inclusion / exclusion screening condition to be a first preset value;
when the first target data is not within the reference range, determining the matching score of the candidate relative to the numerical inclusion / exclusion screening condition to be a second preset value, wherein the first preset value is greater than the second preset value.
3. The method according to claim 1, characterized in that the inclusion / exclusion screening conditions are semantic inclusion / exclusion screening conditions, and calculating the matching score of the candidate relative to each of the inclusion / exclusion screening conditions in step S3 includes:
using NLP to calculate the semantic similarity between the multi-source medical data and the semantic inclusion / exclusion screening condition;
based on the semantic similarity, calculating the matching score of the candidate relative to the semantic inclusion / exclusion screening condition through the formula [formula missing], where s_i is the matching score of the candidate relative to semantic inclusion / exclusion screening condition i, Sim is the semantic similarity, and the other two parameters in the formula are the steepness control parameter and the semantic matching threshold.
4. The method according to claim 1, characterized in that the inclusion / exclusion screening conditions have priorities, and the weighted fusion score in step S4 is calculated through the formula [formula missing], where w_i is the weighted fusion score of the candidate relative to inclusion / exclusion screening condition i, s_i is the matching score of the candidate relative to inclusion / exclusion screening condition i, q_i is the evidence quality score of the candidate relative to inclusion / exclusion screening condition i, and k_i is the priority coefficient corresponding to the priority of inclusion / exclusion screening condition i.
5. The method according to claim 1, characterized in that the inclusion / exclusion screening conditions include inclusion / exclusion screening conditions with a first priority, inclusion / exclusion screening conditions with a second priority, and inclusion / exclusion screening conditions with a third priority, wherein the third priority is lower than the second priority and the first priority, and step S5 includes:
calculating the fuzzy fusion score of the candidate relative to each inclusion / exclusion screening condition with the third priority;
based on the weighted fusion scores of the inclusion / exclusion screening conditions with the first priority, the weighted fusion scores of the inclusion / exclusion screening conditions with the second priority, and the fuzzy fusion scores of the inclusion / exclusion screening conditions with the third priority, calculating the raw weighted total score RawSum of the candidate through the formula [formula missing], where r_i represents inclusion / exclusion screening condition i in the first condition set, which consists of the inclusion / exclusion screening conditions with the first priority P1 and the second priority P2; w_i is the weighted fusion score of inclusion / exclusion screening condition i in the first condition set; r_j represents inclusion / exclusion screening condition j with the third priority P3; and f_j is the fuzzy fusion score of the candidate relative to inclusion / exclusion screening condition j with the third priority P3;
based on the candidate's raw weighted total score and the weighted fusion score / matching score of the inclusion / exclusion screening conditions, determining whether the candidate is a trial participant to be recruited.
6. The method according to claim 5, characterized in that determining whether the candidate is a trial participant to be recruited, based on the candidate's raw weighted total score and the weighted fusion score / matching score of the inclusion / exclusion screening conditions, includes:
calculating the conflict intensity c of the candidate through the formula [formula missing], where C is the second condition set, consisting of the inclusion / exclusion screening conditions whose matching scores are greater than the upper limit threshold; D is the third condition set, consisting of the inclusion / exclusion screening conditions whose matching scores are less than the lower limit threshold; k_i is the priority coefficient corresponding to the priority of inclusion / exclusion screening condition i in the second condition set; k_j is the priority coefficient corresponding to the priority of inclusion / exclusion screening condition j in the third condition set; and k_m is the priority coefficient corresponding to the priority of inclusion / exclusion screening condition m in the third and fourth condition sets;
when the conflict intensity is greater than a third preset value, calculating the comprehensive score M through the formula [formula missing];
when the conflict intensity is between the third preset value and a fourth preset value, calculating the comprehensive score M through the formula [formula missing], where the third preset value is greater than the fourth preset value, and y is a preset multiple;
based on the candidate's comprehensive score and the weighted fusion score / matching score of the inclusion / exclusion screening conditions, determining whether the candidate is a trial participant to be recruited.
7. The method according to claim 5, characterized in that calculating the fuzzy fusion score of the candidate relative to the inclusion/exclusion screening condition of the third priority includes: obtaining a first membership function, a second membership function, and a third membership function for each preset membership level, and determining the target semantic similarity and the target evidence quality score of the candidate relative to the inclusion/exclusion screening condition of the third priority, wherein the first membership function is associated with the target semantic similarity, the second membership function is associated with the target evidence quality score, and the third membership function is associated with the fuzzy fusion score; determining, from the first membership functions, at least one first function corresponding to the target semantic similarity; determining, from the second membership functions, at least one second function corresponding to the target evidence quality score; determining the target level of the fuzzy fusion score based on the target semantic similarity and the target evidence quality score; determining, from the third membership functions, at least one third function corresponding to the target level; calculating a first membership degree from the target semantic similarity using each first function, and a second membership degree from the target evidence quality score using each second function; clipping the corresponding third function at the minimum of the first membership degree and the second membership degree of each membership level to obtain the fuzzy function of that membership level; superimposing the fuzzy functions of all membership levels to obtain an aggregation function; and calculating the centroid of the closed region enclosed by the aggregation function and the horizontal axis, and determining the fuzzy fusion score of the candidate relative to the inclusion/exclusion screening condition of the third priority from the coordinate value of the centroid.
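The steps of claim 7 follow the general shape of Mamdani-style fuzzy inference with centroid defuzzification, which can be sketched as below. This is a minimal illustration, not the claimed implementation: the three membership levels, the triangular membership functions, the rule table that takes the lower of the two input levels as the target level, and the names `tri`, `LEVELS`, and `fuzzy_fusion_score` are all assumptions made for the example.

```python
ORDER = ["low", "medium", "high"]  # assumed preset membership levels


def tri(a, b, c):
    """Return a triangular membership function with feet a, c and peak b."""
    def mu(x):
        if x == b:
            return 1.0
        if x <= a or x >= c:
            return 0.0
        if x < b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu


# Assumption: the same three shapes serve as first, second, and third
# membership functions (inputs and output all live on [0, 1]).
LEVELS = {
    "low": tri(-0.5, 0.0, 0.5),
    "medium": tri(0.0, 0.5, 1.0),
    "high": tri(0.5, 1.0, 1.5),
}


def fuzzy_fusion_score(similarity, quality, steps=1000):
    """Fuse semantic similarity and evidence quality into one score in [0, 1]."""
    if not (0.0 <= similarity <= 1.0 and 0.0 <= quality <= 1.0):
        raise ValueError("inputs must lie in [0, 1]")

    # Firing strength per target level: minimum of the two membership degrees.
    strength = {name: 0.0 for name in ORDER}
    for i, sim_level in enumerate(ORDER):
        for j, qual_level in enumerate(ORDER):
            target = ORDER[min(i, j)]  # hypothetical rule table
            fire = min(LEVELS[sim_level](similarity), LEVELS[qual_level](quality))
            strength[target] = max(strength[target], fire)

    # Clip each output function at its strength, superimpose with max,
    # and take the centroid of the aggregated shape by numerical summation.
    num = 0.0
    den = 0.0
    for k in range(steps + 1):
        x = k / steps
        y = max(min(strength[name], LEVELS[name](x)) for name in ORDER)
        num += x * y
        den += y
    return num / den
```

Under these assumptions, `fuzzy_fusion_score(1.0, 1.0)` lands in the upper part of the range, while a high similarity paired with low-quality evidence, `fuzzy_fusion_score(1.0, 0.0)`, is pulled toward the low end because the rule table is conservative.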
8. The method according to claim 1, characterized in that obtaining the multiple inclusion/exclusion screening conditions in step S1 includes: obtaining the natural-language inclusion/exclusion criteria, and performing word segmentation, cleaning, and redundancy removal on them to obtain processed inclusion/exclusion criteria; extracting key entities from the processed inclusion/exclusion criteria using NLP, and standardizing the extracted key entities, wherein the key entities include inclusion/exclusion factors, reference ranges, and data sources; and assembling the key entities according to a preset inclusion/exclusion screening condition format to obtain multiple initial inclusion/exclusion screening conditions, and revising the initial inclusion/exclusion screening conditions to obtain the multiple inclusion/exclusion screening conditions.
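The flow of claim 8 (preprocess, extract and standardize entities, assemble into a fixed format) can be illustrated with a toy sketch. Everything here is an assumption for illustration: a regular expression stands in for the NLP entity extractor, clause splitting stands in for word segmentation, and the input syntax, the dictionary format, and the names `preprocess`, `PATTERN`, and `assemble` are hypothetical.

```python
import re


def preprocess(criteria_text):
    """Split into clauses, normalize whitespace, and drop duplicate clauses."""
    seen = set()
    clauses = []
    for clause in re.split(r"[;\n]+", criteria_text):
        cleaned = " ".join(clause.split()).strip(" .")
        key = cleaned.lower()
        if cleaned and key not in seen:
            seen.add(key)
            clauses.append(cleaned)
    return clauses


# Toy extractor for clauses such as
# "Inclusion: age >= 18 years (source: EHR)".
PATTERN = re.compile(
    r"(?P<kind>inclusion|exclusion):\s*(?P<factor>[A-Za-z0-9 ]+?)\s*"
    r"(?P<op>>=|<=|>|<)\s*(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>[A-Za-z/%]*)"
    r"(?:\s*\(source:\s*(?P<source>[^)]+)\))?",
    re.IGNORECASE,
)


def assemble(clauses):
    """Build screening conditions in an assumed preset format."""
    conditions = []
    for clause in clauses:
        m = PATTERN.match(clause)
        if m is None:
            continue  # a real system would route these to manual revision
        conditions.append({
            "kind": m["kind"].lower(),
            "factor": m["factor"].strip().lower(),                           # standardized factor
            "range": (m["op"], float(m["value"]), m["unit"].lower()),  # reference range
            "source": (m["source"] or "unspecified").strip(),               # data source
        })
    return conditions


text = (
    "Inclusion: age >= 18 years (source: EHR); "
    "Exclusion: HbA1c > 9 % (source: lab report); "
    "inclusion:  age >= 18 years (source: EHR)"
)
conditions = assemble(preprocess(text))
```

The duplicated age clause is removed during preprocessing, so `conditions` holds one inclusion condition and one exclusion condition. The final revision step of the claim is not modeled.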
9. An apparatus for intelligent recruitment of trial participants based on AI, used to perform the method according to any one of claims 1-8, characterized in that the apparatus includes: an information acquisition module, used to acquire multi-source medical data of a candidate and multiple inclusion/exclusion screening conditions, wherein the multi-source medical data includes clinical data; a time interval acquisition module, used to determine, for each item of the clinical data, the time interval between its generation time and the current time, wherein the inclusion/exclusion screening conditions are obtained by decomposing natural-language inclusion/exclusion criteria using an AI model; a first calculation module, used to calculate the matching score of the candidate relative to each of the inclusion/exclusion screening conditions based on the multi-source medical data, and to calculate the evidence quality score of the candidate relative to each of the inclusion/exclusion screening conditions according to the time interval of each item of the clinical data; a second calculation module, used to calculate the weighted fusion score of the candidate relative to each of the inclusion/exclusion screening conditions based on the matching score and the evidence quality score of the candidate relative to that condition; and a judgment module, used to determine whether the candidate is a trial participant to be recruited based on the weighted fusion score of each of the inclusion/exclusion screening conditions.
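The data flow through the modules of claim 9 can be sketched as a chain of small functions. The claim does not give the scoring formulas, so the ones below are assumptions chosen only to show how older data lowers a candidate's score: evidence quality is modeled as a half-life decay, the weighted fusion score as priority weight times matching score times evidence quality, and the judgment as a threshold on the weight-normalized sum. The names `ClinicalRecord`, `evidence_quality`, `weighted_fusion`, and `is_recruit` are hypothetical, and the matching score is taken as precomputed.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ClinicalRecord:
    condition_id: str   # screening condition this record is evidence for
    match_score: float  # assumed precomputed matching score in [0, 1]
    generated_on: date  # generation time of the clinical data


def time_interval_days(record, today):
    """Time interval acquisition: days between generation time and now."""
    return max((today - record.generated_on).days, 0)


def evidence_quality(interval_days, half_life_days=180.0):
    """First calculation (assumed form): quality halves every half_life_days."""
    return 0.5 ** (interval_days / half_life_days)


def weighted_fusion(match_score, quality, priority_weight):
    """Second calculation (assumed form): priority-weighted product."""
    return priority_weight * match_score * quality


def is_recruit(records, priority_weights, today, threshold=0.5):
    """Judgment (assumed form): normalized fused score must reach threshold."""
    total = 0.0
    weight_sum = 0.0
    for record in records:
        weight = priority_weights[record.condition_id]
        quality = evidence_quality(time_interval_days(record, today))
        total += weighted_fusion(record.match_score, quality, weight)
        weight_sum += weight
    return weight_sum > 0 and total / weight_sum >= threshold
```

With these assumed settings, a perfectly matching record generated today passes the threshold, while the same record generated two years earlier does not, which is the timeliness effect the apparatus is meant to capture.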